
Apparently (according to the blog post) that's a result of the RL human-preference fine-tuning: the human rankers preferred longer, more in-depth answers.


Wasn't the instruct model created using that same strategy?



