There are two ways to implement autograd: reverse-mode and forward-mode. Reverse-mode is what minigrad uses, and what most ML libraries use by default these days, since it computes the gradients of all inputs (wrt one output) in a single pass. It's exactly what you describe in your second paragraph.
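A toy sketch of the idea (made-up class, not minigrad's actual API): each value remembers its parents and the local derivatives, and one backward pass walks the graph in reverse topological order, accumulating gradients for every input at once.

```python
# Minimal reverse-mode autograd on scalars -- illustrative only.
class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Topologically order the graph so each node's grad is fully
        # accumulated before it is pushed back to its parents.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0  # seed d(output)/d(output)
        for node in reversed(topo):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad  # chain rule

x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0 -- both input grads from one backward pass
```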
Forward-mode autograd is the technique that can use dual numbers. It computes all the gradients of one input (wrt all outputs) in a single pass. Dual numbers are a pretty neat mathematical trick, but I'm not aware of anyone who actually uses them to compute gradients.
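Here's roughly how the dual-number trick looks in code (again, illustrative names, not any real library's API): every number carries a derivative alongside its value, and the arithmetic rules propagate both forward together.

```python
# Minimal forward-mode autograd via dual numbers -- illustrative only.
class Dual:
    def __init__(self, val, eps=0.0):
        self.val = val  # the regular value
        self.eps = eps  # coefficient of the infinitesimal part

    def __add__(self, other):
        return Dual(self.val + other.val, self.eps + other.eps)

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps^2 = 0
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

def f(x, y):
    return x * y + x

# Seed eps=1 on the one input we're differentiating with respect to.
x, y = Dual(2.0, 1.0), Dual(3.0, 0.0)
out = f(x, y)
print(out.val, out.eps)  # 8.0 4.0 -> f(2,3)=8, df/dx=4
```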
The most approachable explanation of dual numbers I've seen is in Aurelien Geron's book Hands-On Machine Learning (Appendix D). There are articles online, but I found them more technical.
Thanks for checking out the project!