Hacker News
Attention Residuals (arxiv.org)
2 points by djhemath 3 days ago | hide | past | favorite | 1 comment



This paper by the Kimi team describes a technique that lets us add more depth to a model without losing information/context. Although it improves efficiency by just over 1%, at training scale the total savings could reach millions of dollars. Or at least, it would let us build models with more layers for the same cost as today.
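For context, the "depth without losing information" idea rests on residual connections: each layer adds its output to its input, so the original signal survives many stacked layers. Below is a generic, minimal sketch of a residual connection wrapped around a toy self-attention step. This is illustrative background only, not the paper's actual method; all names and the identity-projection attention are my own simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    # toy single-head self-attention with identity Q/K/V projections
    scores = softmax(x @ x.T / np.sqrt(x.shape[-1]))
    return scores @ x

def block_with_residual(x):
    # residual: output = input + attention(input), so the information
    # in x is carried forward even through many stacked layers
    return x + attention(x)

x = np.random.randn(4, 8)   # 4 tokens, dimension 8
y = block_with_residual(x)
```

The paper presumably adds extra residual paths inside the attention computation itself (hence "attention residuals"), but the mechanism preserving context across depth is the same additive skip shown here.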


