

make3 on Nov 30, 2022 | on: OpenAI ChatGPT: Optimizing language models for dia...

Apparently (according to the blog post) that's a result of the RL human-preference fine-tuning: the human rankers preferred longer, more in-depth answers.

TheCaptain4815 on Nov 30, 2022

Wasn't the instruct model created using that same strategy?
