
It does remind me of a project [1] by Andrej Karpathy, writing a neural network and its training code in ~600 lines (although a network's logic is easier to code than a compiler's).

[1] https://github.com/karpathy/nanoGPT



This is an implementation of GPT using the PyTorch library. It is not meant to be the shortest implementation of a trainable GPT, but it is very clean code. PyTorch does a lot of the heavy lifting, especially when it comes to training on multiple GPUs. This implementation only supports distributed data parallel (DDP) training, where every GPU holds a full copy of the model, so one could not train models the size of GPT-4 with it out of the box.
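
To make the DDP point concrete, here is a rough sketch of the multi-GPU training pattern that nanoGPT's train.py builds on. The toy model, dataset, and hyperparameters below are made up for illustration; only the DDP plumbing (torchrun env vars, init_process_group, the DDP wrapper) reflects the standard PyTorch API. Launch with something like: torchrun --nproc_per_node=2 ddp_sketch.py

  # ddp_sketch.py -- minimal DDP training loop (illustrative, not nanoGPT's code)
  import os
  import torch
  import torch.distributed as dist
  from torch.nn.parallel import DistributedDataParallel as DDP

  def main():
      # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process.
      dist.init_process_group(backend="nccl")
      local_rank = int(os.environ["LOCAL_RANK"])
      torch.cuda.set_device(local_rank)

      # Each process holds a *full* copy of the model; DDP only shards the data
      # and averages gradients, which is why the model must fit on a single GPU.
      model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # toy stand-in for a GPT
      model = DDP(model, device_ids=[local_rank])
      optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

      for step in range(10):
          # In real training, each rank would read a different shard of the data.
          x = torch.randn(8, 1024, device=local_rank)
          y = torch.randn(8, 1024, device=local_rank)
          loss = torch.nn.functional.mse_loss(model(x), y)
          optimizer.zero_grad(set_to_none=True)
          loss.backward()  # gradients are all-reduced across ranks here
          optimizer.step()

      dist.destroy_process_group()

  if __name__ == "__main__":
      main()

Since every rank keeps the whole model in memory, scaling past single-GPU model size needs tensor or pipeline parallelism, which nanoGPT deliberately leaves out.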


Perhaps they were thinking of https://github.com/karpathy/micrograd
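
For anyone who hasn't seen micrograd: it is a scalar-valued autograd engine plus a tiny neural net library, small enough that the core idea fits in a comment. The snippet below is a simplified, self-contained sketch of that idea (only + and * are supported), not micrograd's actual code.

  class Value:
      def __init__(self, data, _parents=()):
          self.data = data
          self.grad = 0.0
          self._parents = _parents
          self._backward = lambda: None

      def __add__(self, other):
          out = Value(self.data + other.data, (self, other))
          def _backward():
              self.grad += out.grad
              other.grad += out.grad
          out._backward = _backward
          return out

      def __mul__(self, other):
          out = Value(self.data * other.data, (self, other))
          def _backward():
              self.grad += other.data * out.grad
              other.grad += self.data * out.grad
          out._backward = _backward
          return out

      def backward(self):
          # Topologically sort the graph, then apply the chain rule node by node.
          order, seen = [], set()
          def visit(v):
              if v not in seen:
                  seen.add(v)
                  for p in v._parents:
                      visit(p)
                  order.append(v)
          visit(self)
          self.grad = 1.0
          for v in reversed(order):
              v._backward()

  # d = a*b + a  =>  dd/da = b + 1 = -2,  dd/db = a = 2
  a, b = Value(2.0), Value(-3.0)
  d = a * b + a
  d.backward()
  print(a.grad, b.grad)  # -2.0 2.0

Layering a Neuron/Layer/MLP on top of a Value class like this is what gets micrograd to a full trainable network in on the order of a hundred lines.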



