
It does remind me of a project [1] by Andrej Karpathy, writing a neural network and its training code in ~600 lines (although a network's logic is easier to code than a compiler's).

[1] https://github.com/karpathy/nanoGPT



This is an implementation of GPT using the PyTorch library. It is not meant to be the shortest implementation of a trainable GPT, but it is very clean code. PyTorch does a lot of the heavy lifting, especially when it comes to training on multiple GPUs. This implementation only supports distributed data parallel (DDP) training, where every GPU holds a full copy of the model, so one could not train models the size of GPT-4 with it out of the box.
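
To make the DDP point concrete, here is a rough sketch of the multi-GPU training pattern that nanoGPT's train.py builds on. The toy model, dataset, and hyperparameters below are made up for illustration; only the DDP plumbing (torchrun env vars, init_process_group, the DDP wrapper) reflects the standard PyTorch API. Launch with something like: torchrun --nproc_per_node=2 ddp_sketch.py

  # ddp_sketch.py -- minimal DDP training loop (illustrative, not nanoGPT's code)
  import os
  import torch
  import torch.distributed as dist
  from torch.nn.parallel import DistributedDataParallel as DDP

  def main():
      # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process.
      dist.init_process_group(backend="nccl")
      local_rank = int(os.environ["LOCAL_RANK"])
      torch.cuda.set_device(local_rank)

      # Each process holds a *full* copy of the model; DDP only shards the data
      # and averages gradients, which is why the model must fit on a single GPU.
      model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # toy stand-in for a GPT
      model = DDP(model, device_ids=[local_rank])
      optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

      for step in range(10):
          # In real training, each rank would read a different shard of the data.
          x = torch.randn(8, 1024, device=local_rank)
          y = torch.randn(8, 1024, device=local_rank)
          loss = torch.nn.functional.mse_loss(model(x), y)
          optimizer.zero_grad(set_to_none=True)
          loss.backward()  # gradients are all-reduced across ranks here
          optimizer.step()

      dist.destroy_process_group()

  if __name__ == "__main__":
      main()

Since every rank keeps the whole model in memory, scaling past single-GPU model size needs tensor or pipeline parallelism, which nanoGPT deliberately leaves out.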


Perhaps they were thinking of https://github.com/karpathy/micrograd
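
For anyone who hasn't seen micrograd: it is a scalar-valued autograd engine plus a tiny neural net library, small enough that the core idea fits in a comment. The snippet below is a simplified, self-contained sketch of that idea (only + and * are supported), not micrograd's actual code.

  class Value:
      def __init__(self, data, _parents=()):
          self.data = data
          self.grad = 0.0
          self._parents = _parents
          self._backward = lambda: None

      def __add__(self, other):
          out = Value(self.data + other.data, (self, other))
          def _backward():
              self.grad += out.grad
              other.grad += out.grad
          out._backward = _backward
          return out

      def __mul__(self, other):
          out = Value(self.data * other.data, (self, other))
          def _backward():
              self.grad += other.data * out.grad
              other.grad += self.data * out.grad
          out._backward = _backward
          return out

      def backward(self):
          # Topologically sort the graph, then apply the chain rule node by node.
          order, seen = [], set()
          def visit(v):
              if v not in seen:
                  seen.add(v)
                  for p in v._parents:
                      visit(p)
                  order.append(v)
          visit(self)
          self.grad = 1.0
          for v in reversed(order):
              v._backward()

  # d = a*b + a  =>  dd/da = b + 1 = -2,  dd/db = a = 2
  a, b = Value(2.0), Value(-3.0)
  d = a * b + a
  d.backward()
  print(a.grad, b.grad)  # -2.0 2.0

Layering a Neuron/Layer/MLP on top of a Value class like this is what gets micrograd to a full trainable network in on the order of a hundred lines.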



