
I've really been enjoying their series on mech interp. Does anyone have any other good recs?


"Transformers Represent Belief State Geometry in their Residual Stream":

https://www.lesswrong.com/posts/gTZ2SxesbHckJ3CkF/transforme...

Basically, the finding is that transformers don't just store a world-model in the sense of "what does the world that produced the observed inputs look like?"; they store a "Mixed-State Presentation": a weighted set of possible worlds that could have produced the observed inputs.
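
For a concrete picture of what a "mixed state" is, here's a minimal sketch of the Bayesian belief update that the post's Mixed-State Presentation formalizes, over a made-up two-state hidden Markov model (the matrices are arbitrary placeholders, not from the post): after each observed token, you reweight which hidden world-states could have produced the sequence so far.

    import numpy as np

    # Hypothetical 2-state HMM (numbers are arbitrary):
    # T[i, j] = P(next state j | current state i)
    # E[i, o] = P(observation o | state i)
    T = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    E = np.array([[0.7, 0.3],
                  [0.1, 0.9]])

    belief = np.array([0.5, 0.5])  # uniform prior over hidden states

    def update(belief, obs):
        # Predict: push the belief forward through the transition dynamics...
        predicted = belief @ T
        # ...then condition on the new observation (Bayes rule) and renormalize.
        posterior = predicted * E[:, obs]
        return posterior / posterior.sum()

    for obs in [0, 0, 1, 1, 1]:
        belief = update(belief, obs)
        print(belief)  # the "mixed state": weights over possible worlds

The paper's claim, as I read it, is that the geometry of these belief vectors shows up linearly in the residual stream.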


The Othello-GPT and Chess-GPT lines of work.

It was the first research that clued me in to what Anthropic's work today has ended up demonstrating.
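
For anyone curious what that line of work actually does mechanically: the core move is a probe trained to read the board state out of the model's residual-stream activations. A toy sketch with random placeholder data and hypothetical shapes (not the papers' actual code):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Placeholder data: in the real setup, activations[i] would be the
    # residual-stream vector at some layer for position i of a game, and
    # labels[i] the contents of one board square (0 = empty, 1 = mine,
    # 2 = theirs, as in the Othello-GPT follow-up work).
    rng = np.random.default_rng(0)
    activations = rng.normal(size=(1000, 512))   # (n_positions, d_model)
    labels = rng.integers(0, 3, size=1000)       # (n_positions,)

    X_tr, X_te, y_tr, y_te = train_test_split(
        activations, labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    # On this random data the score sits at chance (~0.33); on real
    # activations, held-out accuracy well above chance is the evidence
    # that the model is tracking the board internally.
    print(probe.score(X_te, y_te))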



