You don't know what you are talking about. Obviously refusal circuitry does not live in one layer, but the repo is built on a paper with sound foundations from an Anthropic scholar working with a DeepMind interpretability mentor: https://scholar.google.com/citations?view_op=view_citation&h...