debug-trl-grpo

v1.0
machine-learning
hard
Jiajun Bao
#post-training
#trl
#debugging
#reasoning-model
#grpo

# debug-trl-grpo

I am training my model on a countdown math task with GRPO using TRL, but the trained model shows no improvement. The TRL source code is at /app/trl. Check whether it contains any bugs and fix them.

DO NOT modify the training script (/app/train_grpo.py) or the reward function (/app/reward_fn.py).
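
For reference while inspecting the library, here is a minimal sketch of the group-relative advantage that GRPO is generally described as computing: each completion's reward is normalized against the other completions sampled for the same prompt. This is not TRL's code. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, group_size, eps=1e-4):
    """Hypothetical helper: normalize each reward against its own group.

    `rewards` is a flat list ordered so that the `group_size` completions
    for one prompt are contiguous. `eps` is an assumed stabilizer.
    """
    if len(rewards) % group_size != 0:
        raise ValueError("number of rewards must be a multiple of group_size")

    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu = mean(group)
        # Population std is an assumption; a library may use the sample std.
        sigma = pstdev(group)
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


# Two prompts, four completions each.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
adv = group_relative_advantages(rewards, group_size=4)
```

Under this formula, a group whose completions all receive the same reward yields zero advantages, so it contributes no learning signal. Advantages that come out all zero, or rewards grouped across the wrong prompts, are therefore things worth ruling out when a run shows no improvement.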