
Those don't cover DPO/GRPO, which arguably made some parts of RL obsolete.


Check out Stanford's CS 336; it covers DPO/GRPO and the other parts needed to train LLMs.


It's also covered by CS329H.


I can assure you that lacking knowledge of DPO (and especially GRPO, which is just stripped-down PPO) is not a dealbreaker.
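
To make the "stripped-down PPO" point concrete, here is a rough sketch of the part that differs: instead of a learned value function, GRPO scores each sampled completion against the other completions for the same prompt. The function names, the `eps` constant, and the 0.2 clip value are my own choices for illustration, and the KL penalty against a reference model is left out, so treat this as a toy and not anyone's actual implementation.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. This replaces the critic / value baseline used in PPO.
    `eps` is an illustrative constant to avoid division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip=0.2):
    """PPO-style clipped objective for a single sample.

    `ratio` is new_policy_prob / old_policy_prob. This part is the same
    idea as in PPO; only the source of `advantage` changes.
    """
    clipped = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Four completions for one prompt, two of them rewarded.
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print(advantages)
    print(clipped_surrogate(1.5, advantages[0]))
```

If you already know PPO, the only new piece is the first function, which is why not having studied GRPO separately costs you very little.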



