A large array of uniquely-set floating point values. (AKA "parameters".)

In a language model, a word is put in one end (as a numerical index into a wordlist), then it and the weights are multiplied together, and then a new word comes out (again as an index).

Numbers in, numbers out, and a small bit of logic that maps words to numbers and back at either end. ("Encodings".)
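A minimal sketch of that numbers-in, numbers-out pipeline, with hypothetical toy sizes and plain numpy standing in for a real transformer (real models stack many layers with attention and nonlinearities in between):

    import numpy as np

    vocab = ["the", "cat", "sat", "on", "mat"]   # the wordlist
    V, D = len(vocab), 8                         # vocab size, hidden size

    rng = np.random.default_rng(0)
    embed = rng.normal(size=(V, D))       # weights: word index -> vector
    W = rng.normal(size=(D, D))           # weights: the middle of the model
    unembed = rng.normal(size=(D, V))     # weights: vector -> word scores

    idx_in = vocab.index("cat")           # word -> number (the "encoding")
    h = embed[idx_in] @ W                 # multiply it by the weights
    idx_out = int(np.argmax(h @ unembed)) # highest score wins
    print(vocab[idx_out])                 # number -> word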

"Training" is the typically expensive process of feeding huge amounts of data into the model, to get it to choose the magic values for its weights that allow it to do useful stuff that looks and feels like that training data.

Something else that can be done with weights is "fine-tuning": tweaking them slightly to get different overall results out of the model, tailoring it to some new use-case. Often the model gets a new name afterward.

In this case, what's been released is not actually the weights. It's a set of these tweaks ("deltas"), which are intended to be added to Meta's LLaMA model weights to end up with the final intended LLaMA-based model, called "Vicuna".
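Conceptually, applying the deltas is one addition per weight tensor. A sketch with hypothetical file names; in practice you'd use the conversion script from the Vicuna release rather than anything like this:

    import numpy as np

    # Hypothetical .npz files standing in for the real checkpoint format.
    llama = np.load("llama-13b.npz")
    delta = np.load("vicuna-13b-delta.npz")

    # base weights + deltas = the final Vicuna weights, tensor by tensor
    vicuna = {name: llama[name] + delta[name] for name in llama.files}
    np.savez("vicuna-13b.npz", **vicuna)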



> A large array of uniquely-set floating point values.

How large? How many elements?


It's in the name of the model - "Vicuna-13B" implies there are 13 billion parameters.
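Which gives a rough sense of size. Back-of-the-envelope, assuming 2 bytes per parameter (16-bit floats):

    params = 13_000_000_000
    print(params * 2 / 1e9, "GB")   # ~26 GB of raw weights, before any overhead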


The way these LLMs work, is there a weight for each parameter? 13 billion weights? What is an example of a parameter?


A parameter is a variable for which a weight (a floating point value) is the concrete value.


A weight is an example of a parameter.

So is a bias, and presumably the biases are also in the same file as the weights.
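Right - for a single layer computing y = W·x + b, every entry of the weight matrix W and the bias vector b counts as one parameter. A sketch with hypothetical layer sizes:

    import numpy as np

    d_in, d_out = 4096, 4096       # hypothetical layer sizes
    W = np.zeros((d_out, d_in))    # weight matrix
    b = np.zeros(d_out)            # bias vector
    print(W.size + b.size)         # parameters in this one layer: 16,781,312
    # a 13B model is many such layers' W and b entries, all saved together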



