
Essentially a computer neural network is just a lot of addition and multiplication of floating point numbers (organized as matrix multiplications). The parameters are the "strength" or "weights" of the connections between neurons on different layers and the "bias" of each neuron. If neuron Alice is connected to neuron Bob, Alice has a value of 0.7, and the weight of Alice's connection to Bob is 0.5, then the value sent from Alice to Bob is 0.7 * 0.5 = 0.35. This value and the values from all the other incoming connections are summed, and the neuron's bias is added.
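
For concreteness, here's the Alice/Bob example as a few lines of Python (the bias value of -0.1 is a made-up number just to complete the example):

    # Alice -> Bob, with the values from the paragraph above;
    # the bias of -0.1 is an arbitrary made-up value.
    activation_alice = 0.7
    weight_alice_to_bob = 0.5
    bias_bob = -0.1

    # Value sent along the connection: 0.7 * 0.5 = 0.35
    incoming = activation_alice * weight_alice_to_bob

    # Bob sums all incoming values (just one here) and adds his bias.
    pre_activation_bob = incoming + bias_bob
    print(pre_activation_bob)  # 0.25

In a real network this happens for every neuron in a layer at once, which is exactly what the matrix multiplication computes.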

I highly recommend checking out 3Blue1Brown's series on how neural nets, gradient descent, and the dot product (implemented as matrix multiplication) all tie together: https://www.youtube.com/watch?v=aircAruvnKk



To add to this excellent reply, I'll also point out that the reason folks want the weights is that they are the result of a massive search operation, akin to finding the right temperature to bake a cake from all possible floats. It takes a lot of wall-clock time, a lot of GPU energy, and a lot of input examples and counter-examples to find the "right" numbers. Thus it really is better -- all things being equal -- to publish the results of that search and keep everyone else from having to repeat it themselves.


> a massive search operation, akin to finding the right temperature to bake a cake from all possible floats

...for each of 13 billion (for a model with that many parameters) different cakes, except that they aren't like cakes because the "best" temperature for each depends on the actual temperatures chosen for the others.


It's 2^(16*13,000,000,000) possible combinations of temperatures.
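
For a sense of scale, that's a number with roughly 63 billion decimal digits. A quick back-of-envelope in Python (assuming 16 bits per parameter, as above):

    import math

    params = 13_000_000_000   # 13B parameters
    bits_per_param = 16       # 16-bit weights
    # 2^(16 * 13e9) has this many decimal digits:
    digits = params * bits_per_param * math.log10(2)
    print(f"~{digits:.3g} decimal digits")  # ~6.26e+10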


Way better than paperclips.


Why would a 4-bit quantized model be less accurate than a 16-bit one?


My lay-person's understanding is that it's due to the problem one is trying to solve with a deep learning model: drawing a curve through a high-dimensional space that separates "good" from "bad" activation values. The lower the resolution of the curve, the higher the likelihood that it fits in some places and veers off into erroneous space in others (the toy example below tries to make this concrete).

Imagine trying to draw the blue line on the right using only Lego blocks: https://youtu.be/QDX-1M5Nj7s?t=1202

discussion: https://news.ycombinator.com/item?id=35405338
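
Here's a toy one-neuron example in Python, with made-up numbers, showing how rounding a weight to lower precision moves the decision boundary:

    # A single "neuron" classifies x as "good" when w*x + b > 0,
    # so the boundary sits at x = -b/w. All values are made up.
    w, b = 0.2371, -0.1

    def boundary(weight, bias):
        return -bias / weight

    print(boundary(w, b))    # full precision: x ~= 0.4218
    print(boundary(0.2, b))  # w rounded to a coarse grid: x = 0.5

    # Every input between ~0.42 and 0.5 now lands on the wrong
    # side of the line -- the "erroneous space" above.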


Because 4 bits specify the value of a parameter less precisely than 16 bits do.
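
A toy illustration of that precision gap, rounding a weight onto a uniform 4-bit vs 16-bit grid over [-1, 1] (real quantization schemes are more elaborate than this, but the rounding-error idea is the same):

    def quantize(x, bits, lo=-1.0, hi=1.0):
        # Round x to the nearest of 2**bits evenly spaced values in [lo, hi].
        step = (hi - lo) / (2 ** bits - 1)
        return round((x - lo) / step) * step + lo

    w = 0.2371                # an arbitrary example weight
    print(quantize(w, 4))     # ~0.2      (only 16 representable values)
    print(quantize(w, 16))    # ~0.23709  (65,536 representable values)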



