Hacker News

I wonder whether you would want to use an earlier layer rather than the penultimate one. I would imagine the LLM uses the penultimate layer to "prepare" for the final dimensionality reduction, cleaning up the signal so that it scores well on the loss function.


