
> RLHF works on problems that are difficult to specify yet easy to judge.

But that's the thing: everyone here on HN (and elsewhere) seems to find it easy to judge the flaws of AI-generated code, and those judgments seem relatively consistent. So if we started feeding these critiques back in as RLHF at scale, we should be able to bring LLM output up to the level where further feedback becomes hard to give (or at least inconsistent), right?
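
Concretely, critiques like these would typically enter an RLHF pipeline as pairwise preference labels used to fit a reward model. Here's a minimal sketch of the standard Bradley-Terry preference loss, assuming a toy scalar reward head over precomputed response embeddings; the names (RewardModel, embed_dim) and the random tensors are purely illustrative, not any real pipeline:

    import torch
    import torch.nn as nn

    # Toy reward model: maps a response embedding to a scalar score.
    class RewardModel(nn.Module):
        def __init__(self, embed_dim: int = 768):
            super().__init__()
            self.score = nn.Linear(embed_dim, 1)

        def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
            return self.score(response_embedding).squeeze(-1)

    def preference_loss(reward_chosen: torch.Tensor,
                        reward_rejected: torch.Tensor) -> torch.Tensor:
        # Bradley-Terry objective: maximize the log-probability that the
        # human-preferred response outscores the rejected one.
        return -torch.nn.functional.logsigmoid(
            reward_chosen - reward_rejected).mean()

    # Toy batch: embeddings of responses reviewers preferred vs. flagged.
    model = RewardModel()
    chosen = torch.randn(4, 768)    # stand-ins for "good" code responses
    rejected = torch.randn(4, 768)  # stand-ins for critiqued code responses
    loss = preference_loss(model(chosen), model(rejected))
    loss.backward()  # gradients push the model to rank preferred code higher

Note how this frames the ceiling the comment describes: once raters start disagreeing, the chosen/rejected pairs carry no consistent signal and the loss stops improving the model, which is exactly the point where "further feedback is inconsistent."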
