
Essentially a computer neural network is just a lot of addition and multiplication of floating point numbers (organized as matrix multiplications). The parameters are the "strength" or "weights" of the connections between neurons on different layers and the "bias" of each neuron. If neuron Alice is connected to neuron Bob, Alice has a value of 0.7, and the weight of Alice's connection to Bob is 0.5, then the value sent from Alice to Bob is 0.7 * 0.5 = 0.35. This value and the values from all the other incoming connections are summed, and the neuron's bias is added.
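
For concreteness, here's the Alice/Bob example as a few lines of Python (the bias value of -0.1 is a made-up number just to complete the example):

    # Alice -> Bob, with the values from the paragraph above;
    # the bias of -0.1 is an arbitrary made-up value.
    activation_alice = 0.7
    weight_alice_to_bob = 0.5
    bias_bob = -0.1

    # Value sent along the connection: 0.7 * 0.5 = 0.35
    incoming = activation_alice * weight_alice_to_bob

    # Bob sums all incoming values (just one here) and adds his bias.
    pre_activation_bob = incoming + bias_bob
    print(pre_activation_bob)  # 0.25

In a real network this happens for every neuron in a layer at once, which is exactly what the matrix multiplication computes.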

I highly recommend checking out 3Blue1Brown's series on how neural nets, gradient descent, and the dot product (implemented as matrix multiplication) all tie together: https://www.youtube.com/watch?v=aircAruvnKk



To add to this excellent reply, I'll also point out that the reason folks want the weights is that they are the result of a massive search operation, akin to finding the right temperature to bake a cake from all possible floats. It takes a lot of wall-clock time, a lot of GPU energy, and a lot of input examples and counter-examples to find the "right" numbers. Thus it really is better -- all things being equal -- to publish the results of that search and keep everyone else from having to repeat it themselves.


> a massive search operation, akin to finding the right temperature to bake a cake from all possible floats

...for each of 13 billion (for a model with that many parameters) different cakes, except that they aren't like cakes because the "best" temperature for each depends on the actual temperatures chosen for the others.


It's 2^(16*13,000,000,000) possible combinations of temperatures.
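
For a sense of scale, that's a number with roughly 63 billion decimal digits. A quick back-of-envelope in Python (assuming 16 bits per parameter, as above):

    import math

    params = 13_000_000_000   # 13B parameters
    bits_per_param = 16       # 16-bit weights
    # 2^(16 * 13e9) has this many decimal digits:
    digits = params * bits_per_param * math.log10(2)
    print(f"~{digits:.3g} decimal digits")  # ~6.26e+10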


Way better than paperclips.


Why would a 4-bit quantized model be less accurate than a 16-bit one?


My lay-person's understanding is that it's due to the problem one is trying to solve with a deep learning model: drawing a curve through a high-dimensional space that separates "good" from "bad" activation values. The lower the resolution of the curve, the higher the likelihood that it fits in some places and veers off into erroneous space in others (the toy example below tries to make this concrete).

Imagine trying to draw the blue line on the right using only Lego blocks: https://youtu.be/QDX-1M5Nj7s?t=1202

discussion: https://news.ycombinator.com/item?id=35405338
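
Here's a toy one-neuron example in Python, with made-up numbers, showing how rounding a weight to lower precision moves the decision boundary:

    # A single "neuron" classifies x as "good" when w*x + b > 0,
    # so the boundary sits at x = -b/w. All values are made up.
    w, b = 0.2371, -0.1

    def boundary(weight, bias):
        return -bias / weight

    print(boundary(w, b))    # full precision: x ~= 0.4218
    print(boundary(0.2, b))  # w rounded to a coarse grid: x = 0.5

    # Every input between ~0.42 and 0.5 now lands on the wrong
    # side of the line -- the "erroneous space" above.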


Because 4 bits specify the value of a parameter less precisely than 16 bits do.
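
A toy illustration of that precision gap, rounding a weight onto a uniform 4-bit vs 16-bit grid over [-1, 1] (real quantization schemes are more elaborate than this, but the rounding-error idea is the same):

    def quantize(x, bits, lo=-1.0, hi=1.0):
        # Round x to the nearest of 2**bits evenly spaced values in [lo, hi].
        step = (hi - lo) / (2 ** bits - 1)
        return round((x - lo) / step) * step + lo

    w = 0.2371                # an arbitrary example weight
    print(quantize(w, 4))     # ~0.2      (only 16 representable values)
    print(quantize(w, 16))    # ~0.23709  (65,536 representable values)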



