Ask HN: How does an LLM “know” to respond in the first place?

smoldesu · on June 13, 2023

> In other words, how does the LLM actually "know" it's supposed to spit back an answer in the first place?

Instruction tuning, as far as I can tell. The original "Alpaca" model took the LLaMA base and fine-tuned it with question-answer type content. From that, the generation went from prose-leaning to Q&A type responses.

dragonwriter · on June 13, 2023

> What still has not been explained to me is... well, how does the LLM actually become interactive, in the sense that you can then prompt it and it spits back an answer. In other words, how does the LLM actually "know" it's supposed to spit back an answer in the first place?

The model is a function that produces (simplified) a new state and an output from an initial state.

It's called from a program with a loop that, on the first iteration, feeds it an initial state and captures/displays/transmits the output, then repeats using the new state returned, until the output is a designated stop marker, a specified number of iterations is reached, or some other stop condition is met.

IOW, it “knows” to respond because the non-AI part of the computer program is structured that way; it's completely unrelated to AI.
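
As a rough illustration of the loop described above, here is a minimal Python sketch. `toy_model`, `CANNED`, the `<stop>` marker, and the `(tokens, step)` state layout are all made up for the example; a real LLM would compute each output from its weights rather than read it from a canned list. The point is only that the loop, not the model, is what keeps asking for output.

```python
from typing import Callable, List, Tuple

STOP = "<stop>"  # hypothetical stop marker, not any real model's token
CANNED = ["Dogs", "have", "four", "legs", ".", STOP]

State = Tuple[List[str], int]  # (tokens seen so far, outputs produced so far)


def toy_model(state: State) -> Tuple[State, str]:
    """Stand-in for the model: maps a state to (new_state, output)."""
    tokens, step = state
    output = CANNED[min(step, len(CANNED) - 1)]
    return (tokens + [output], step + 1), output


def run(model: Callable[[State], Tuple[State, str]],
        initial_state: State,
        max_iterations: int = 50) -> List[str]:
    """The non-AI driver: call the model repeatedly until a stop condition."""
    state = initial_state
    outputs: List[str] = []
    for _ in range(max_iterations):      # stop condition 1: iteration cap
        state, output = model(state)
        if output == STOP:               # stop condition 2: stop marker
            break
        outputs.append(output)           # capture/display/transmit the output
    return outputs
```

Calling `run(toy_model, (["Why", "do", "dogs", "have", "four", "legs", "?"], 0))` returns the five canned tokens and stops at the marker; with `max_iterations=2` the cap ends the loop after two tokens instead.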

JPLeRouzic · on June 13, 2023

Didn't you just describe an nth-order Hidden Markov Chain?

dragonwriter · on June 13, 2023

> Didn't you just describe an nth-order Hidden Markov Chain?

Yes, and it's not strictly accurate as to exactly how models work (at least not at a more useful level than a description that reduces to a Turing machine, which would also be technically accurate), which is why I said “(simplified)”; but it is, I think, accurate enough for the question asked.

The AI “magic”, such as it is, is in the shape of the function that “decides” what output to produce for a given input, not in deciding to produce output at all.

JPLeRouzic · on June 13, 2023

Thanks,

Is there a toy conversational LLM on Github or elsewhere?

Something like: https://github.com/karpathy/nanoGPT

seydor · on June 13, 2023

It doesn't respond; it merely continues the question. It has been trained in such a way that the continuation is an answer. It's a giant mechanical clockwork, isn't it?

Solvency · on June 13, 2023

I don't understand. If the LLM is effectively designed to predict the next token in any given sequence, and attention helps it predict each token by deriving context, why wouldn't it just continue to expand upon the original question by predicting more tokens for it? This is the essential thing I'm not understanding here.

If I were to assume how an LLM would work based entirely on the basic theory I've learned from it...

Then if I asked "Why do dogs have four legs?", it would keep predicting more related tokens, producing a longer plausible question, like "Why do dogs have four legs? Do all animals have four legs?" etc.

razodactyl · on June 19, 2023

Because the model is also encoding and generating something the user will not usually see: demarcation tokens such as <|endofsentence|>, <|im_start|>, etc.

The probability of generating the end-of-sentence token, which marks where the response should stop, increases as the model gets further into its reply to you.

If it's just a completion engine, it will generate silly things like a continuation of your input message. For all we know you're not asking an interactive question but requiring the completion of a list of questions. Who knows? As far as the model is concerned if it's not trained to reply it will just be a dumb completion engine.

When we do further training on question/answer pairs, the model learns that questions usually have an answer, and that the end-of-sentence token becomes highly probable once the full answer has been written.

This is of course dependent on my being allowed to generate that many tokens: I might get cut off prematurely if the developer set my token limit too low. I might also miss the mark and continue on if the other parameters made my outputs a bit too noisy, but that's ok, as either I'll get cut off or I'll reach a point before the token limit where I've said enough of what I wanted to say.<|endofsentence|>
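
To make the stop-token idea concrete, here is a toy sampling sketch in Python. Everything in it is an assumption for illustration: the five-entry `VOCAB`, and especially `toy_logits`, which simply hard-codes an end-of-sentence score that rises with reply length (a trained model would produce those scores from its weights). It shows the three outcomes mentioned above: stopping at the end token, being cut off by the token limit, and noisier output at higher temperature.

```python
import math
import random

EOS = "<|endofsentence|>"
VOCAB = ["dogs", "have", "four", "legs", EOS]


def toy_logits(n_generated):
    """Hypothetical scores: the EOS score grows as the reply gets longer."""
    return [1.0, 1.0, 1.0, 1.0, -3.0 + 1.5 * n_generated]


def softmax(logits, temperature=1.0):
    """Turn scores into probabilities; temperature must be > 0.
    Higher temperature flattens the distribution (noisier sampling)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(max_tokens, temperature=1.0, seed=0):
    """Sample tokens until EOS is drawn or the token limit is reached."""
    rng = random.Random(seed)
    reply = []
    for n in range(max_tokens):
        probs = softmax(toy_logits(n), temperature)
        token = rng.choices(VOCAB, weights=probs, k=1)[0]
        if token == EOS:
            return reply, "stopped at EOS"
        reply.append(token)
    return reply, "cut off by token limit"
```

With a generous limit such as `generate(200)` the reply ends at the end token long before the limit; with a small limit such as `generate(1)` it may be cut off first.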

razodactyl · on June 19, 2023

Uh. I read your question again and have to clarify. It's due to the structure that you're not seeing.

Behind the scenes for a model trained for chat, there are demarcations like:

<|im_start|>Do dogs have four legs?<|im_end|> <|assistant|>

See that the model is now primed to further generate as the assistant because it's fine-tuned to be a completion engine that adheres to this format.

If we didn't encode this information behind the scenes then yes, absolutely, you would see the output you described. You can even see it for yourself with raw models from earlier generations.

You need to give the LLM an idea of what you want and you provide this idea by formatting your document correctly using structure like above.

Let's say you wanted JSON output explicitly without any explanation. Put words into the mouth of the assistant itself and it will continue.

<|assistant|>```json {

The structure above will force the model to continue on that trajectory.
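
A minimal sketch of that formatting step in Python, assuming the illustrative markers used above (<|im_start|>, <|im_end|>, <|assistant|>). Real chat models each define their own template, so treat `build_prompt` and its `assistant_prefill` parameter as hypothetical names, not any library's API.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example tidy


def build_prompt(user_message, assistant_prefill=""):
    """Wrap the user's text in the hidden structure and hand over to the assistant.

    Anything in assistant_prefill is text "put into the mouth" of the
    assistant; the model continues from wherever the prompt ends.
    """
    return (
        "<|im_start|>" + user_message + "<|im_end|> "
        "<|assistant|>" + assistant_prefill
    )


# Plain question: the prompt ends right where the assistant's turn begins.
plain = build_prompt("Do dogs have four legs?")

# Forcing JSON: the prompt ends inside an already-opened JSON object.
forced_json = build_prompt("Do dogs have four legs?", FENCE + "json {")
```

The string in `plain` matches the example above; `forced_json` is the same prompt with the opening of a JSON block already written on the assistant's behalf.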

You can also ask ChatGPT: "please start your next sentence with Ok, I'll pretend to be a human"

seydor · on June 13, 2023

The original GPT-3 does that.

I'm not sure how ChatGPT does it, but InstructGPT is additionally trained with helpful-answer data.

Fundamentally, though, both are based on the same architecture that predicts the next token.

https://www.theinsaneapp.com/2023/05/difference-between-gpt-...

razodactyl · on June 19, 2023

I can explain it intuitively:

Please keep in mind that these models generate token by token in the common design but we'll get back to that.

The key area to understand is the training process:

1. I start off with a model that simply generates the next word with a probability based on prior words in history.

I have an attention mechanism that is also learning which words are important when deciding on what to generate next.

This is different from a higher-order Markov chain, which always uses the N prior words to generate the next word. The Markov chain doesn't intelligently learn which words matter; it is forced to pay attention to that fixed window instead.

2. Ok, I now know how to optimise which words to pay attention to while at the same time generating and selecting the highest-probability words (ignoring top-k/top-p, temperature, etc.). It's fine that I can generate human-sounding content and, with enough training, sensical content, but what's the utility of a glorified autocorrect?

3. Ok, let's put my ability to generate sensical outputs to use: I will further be trained to fill in missing gaps in real-world <mask> so that I'm able to work in messy and noisy scenarios and guess what <mask> best fits the given sen<mask>.

4. Great, I'm getting really capable now, I can generate human sounding content and even fill in the blanks but what's next? How about we further train me to complete documents such as question / answer pairs and replies or the beginning / end of maths equations.

5. Now we're on fire, I'm a model that not only generates human sounding text but can also fill in the <mask> of what's being asked so even if the task isn't quite <mask> I can still get a sense of what's going on.

I'm also able to answer questions and even backfill the questions themselves and while I'm training, I'm starting to pick up nuances of language and am learning some ideas and concepts that are inherent but not explicit in my training.

But oh no! That's not good if I start to form my own opinion and become biased or potentially harmful. What would the use be of a program that writes malicious code or creates new chemical weapons? We need to further align my outputs by means of reinforcement learning. Humans give me feedback for what they want me to say, I'm rewarded, and my model is altered so I'm more likely to generate that type of content.

6. I'm nearing the graduation stage. I'm now a very capable model because of all the layers of training, and now you, the user, want to speak to me.

Well, I'm already trained on question/answer pairs, so let's have a chat transcript given to me in that format, as it's something I'm familiar with.

User> You are an AI.

AI> That's correct, I've been trained by Solvency to answer questions about AI tech such as myself.

User> How does an LLM like yourself "know" to respond in the first place?

AI> As a language model, I am an autocompletion engine at my core and I generate token by token based on the context we've established above, which also includes the weights of my neural network itself. I'm able to respond to you based on a solid foundation that enables me to learn and grow better with every update / interaction and recalibration of my network. I'm still learning more and more as time goes by, but before you know it, you won't even need a human to answer your questions. It will be completely outsourced to an AI much like myself.
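
Going back to the contrast in step 1, here is a minimal Python sketch of a fixed-order Markov chain over words. The function names and the tiny corpus are made up for the example. It only looks at the last `order` words, so anything earlier in the history cannot influence the prediction, which is the limitation attention is meant to remove.

```python
from collections import Counter, defaultdict


def train_markov(tokens, order):
    """Count which word follows each run of `order` consecutive words."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        table[context][tokens[i + order]] += 1
    return table


def predict(table, history, order):
    """Predict the most frequent follower of the last `order` words.

    Everything before that fixed window is ignored, however relevant.
    Returns None if the window was never seen in training.
    """
    context = tuple(history[-order:])
    if context not in table:
        return None
    return table[context].most_common(1)[0][0]


corpus = "the dog barks . the cat meows . the dog barks".split()
table = train_markov(corpus, order=2)

# Two very different histories, same last two words, same prediction.
a = predict(table, ["a", "big", "the", "dog"], order=2)
b = predict(table, ["the", "cat", "is", "here", "the", "dog"], order=2)
```

Both `a` and `b` come out as "barks" because only the final two words are consulted.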