The computations in transformers are actually generalized tensor-tensor contractions implemented as matrix multiplications. Their efficient implementation on GPU hardware involves many algebraic gems and is a work of art. You can get a taste of the complexity involved in their design in this YouTube video: https://www.youtube.com/live/ufa4pmBOBT8
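
To make the "contraction as matrix multiplication" point concrete, here is a toy sketch in plain Python. It is not how any GPU library does it. The shapes and the function names (matmul, contract_direct, contract_via_matmul) are made up for illustration. It contracts a 3-index tensor A[b][i][k] with a matrix W[k][j] over k, first with explicit loops and then by merging (b, i) into a single row index so that the whole thing becomes one ordinary matrix product.

```python
# Toy sketch with hypothetical names and shapes; pure Python, no GPU involved.

def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    inner = len(Y)
    cols = len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]

def contract_direct(A, W):
    """C[b][i][j] = sum over k of A[b][i][k] * W[k][j], as explicit loops."""
    K, J = len(W), len(W[0])
    return [[[sum(A_bi[k] * W[k][j] for k in range(K)) for j in range(J)]
             for A_bi in A_b] for A_b in A]

def contract_via_matmul(A, W):
    """Same contraction: merge (b, i) into one row index, multiply, split back."""
    I = len(A[0])
    flat = [row for A_b in A for row in A_b]                 # shape (B*I, K)
    out = matmul(flat, W)                                    # shape (B*I, J)
    return [out[b * I:(b + 1) * I] for b in range(len(A))]   # shape (B, I, J)

A = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # shape (2, 2, 2)
W = [[1, 0, 2], [0, 1, 3]]                 # shape (2, 3)

C_direct = contract_direct(A, W)
C_matmul = contract_via_matmul(A, W)
print(C_direct == C_matmul)
```

The flattening step is the whole trick: once the free indices are grouped into "rows" and "columns" and the contracted indices into the shared inner dimension, a single matrix multiplication routine can serve many different contractions.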

