> Because they can imagine a god-like super intelligent AI, it must be possible....

lukev · on March 31, 2023

This is exactly the kind of mysticism I'm talking about. In fact we know precisely how LLMs work.

The fact that parts of human linguistic concept-space can be encoded in a high dimensional space of floating point numbers, and that a particular sequence of matrix multiplications can leverage that to perform basic reasoning tasks is surprising and interesting and useful.

But we know everything about how how it is trained and how it is invoked.

In fact, because it's only "state" aside from its parameters is whatever its context window, current LLMs have the interesting property that if you invoke them recursively, all of their "thoughts" are human readable. This is in fact a delightful property for anyone worried about AI safety: our best AIs currently produce a readable transcript of their "mental" processes in English.

rhn_mk1 · on March 31, 2023

We know how they work, that is true. We don't know why they work, because if we could, then we could extrapolate what happens when you throw more compute at them, and no one would have been surprised about the capabilities of GPT-N+1. Also no one would have been caught with their pants down by seeing people jailbreak their models.

To illustrate it in a different way: on a mechanistic level, we know how animal brains work, as well. Ganglions, calcium channels, the stuff. That doesn't help understand high level phenomena like cognition, which is the part that matters.

If you're right about the LLMs revealing their inner working, that would be indeed a reason to chill out. But I have my doubts, given that LLMs are good at hallucinating. Could you justify why the human readability is actually true, and support that with examples?

lukev · on March 31, 2023

I don't need examples. It's simply how they work. This is why they hallucinate.

A LLM is fundamentally a mathematical function (albeit a very complex one, with billions of terms (a.k.a parameters or weights)). The function does one thing and one thing only: it takes a sequence of tokens as input (the context), and it emits the next token(word)[1].

This is a stateless process: it has no "memory" and the model parameters are immutable; they are not changed during the generation process.

In order to generate longer sequences of text, you call the function multiple times, each time appending the previously generated token to the input sequence. The output of the function is 100% dependent on the input.

Therefore, the only "internal state" a model has is the input sequence, which is human-readable sequence of tokens. It can't "hallucinate", it can't "lie", and it can't "tell the truth", it can only emit tokens one at a time. It can't have a hidden "intent" without emitting those tokens, it can't "believe" something different than what it emits.

[1] Actually a set of probabilities for the next token, and one is selected at random based on the "heat" generating setting, but this is irrelevant for the high-level view.

lukev · on March 31, 2023

Responding here since it won't let me continue your thread any more.

No, there's a fundamental misunderstanding here. I'm not saying the model will tell you the truth about its internal state if you ask it (it absolutely will not.)

I'm saying it has no internal state, and no inner high level processes at all other than it's pre-baked, immutable parameters.

rhn_mk1 · on March 31, 2023

Then you did not read my post carefully enough. The question was not about "internal state" but "inner workings". The model clearly does something. The problem is that we don't know how to describe in human terms what happens between the matrix multiplication and the words it spits out. Whether it has state is completely irrelevant.

lukev · on March 31, 2023

Whether it has inner state is highly relevant to my claim, which was that the only state a LLM has (aside from its parameters) is transparent and readable in English. Which the context is.

rhn_mk1 · on March 31, 2023

You're the one who put state in the conversation. State is part of the whole, and not the whole. It's not enough to understand state if you want to understand why they work. I feel like you're trying to muddy the waters by redirecting the problem to be about the state - it isn't.

lukev · on March 31, 2023

On one hand I hate to belabor this point, but on the other I think it's actually super important.

Both of things things are true:

1. The relationships between parameter weights are mysterious, non-evident, and we don't know precisely why it is so effective at token generation.

2. An agent built on top of a LLM cannot have any thought, intent, consideration, agenda, or idea that is not readable in plain english. Because all of those concepts involve state.

rhn_mk1 · on March 31, 2023

I'm not going to argue whether that's correct or not. In the end, adding state to a LLM is trivial. Bing chat has enough state to converse without forgetting the context. Google put an LLM on a physical robot, which has state even if narrowly understood as the position in space. Go further and you might realize that we have systems with part LLM, part other state (LLM + a stateful human on the other side of the chat).

So we have ever-more-powerful seemingly-intelligent LLMs, attached to state with no obvious limit to the growth of either. I don't see why in the extreme this shouldn't extrapolate to godlike intelligence, even with the state caveat.

nicpottier · on March 31, 2023

This is a great reminder, thank you.

As someone not skilled in this art, is there anything preventing us from opening that context window many orders of magnitude? What happens then? And what happens if it is then "thinking in text" faster than we can read them? (with an intent towards paper clips)

This is a genuine question, I'm not trolling.

happypumpkin · on March 31, 2023

I'm very much not an expert either but apparently for the regular "attention" mechanism memory and compute requirements scale quadratically with respect to input sequence length. So increasing it by just two orders of magnitude would mean (I think) a full context window needs 10000x more memory and compute time and presumably costs would go up by at least as much. GPT3 (I think) uses the regular attention mechanism while 4 is unknown.

However, GPT4 claims there are techniques to improve scaling (complexity down to sub-quadratic or linear) without affecting accuracy too much (I have no clue if true): sparse attention, long-range arena, reformer, and performer.

I'm also pretty sure I've read (and anecdotally it seems true) that accuracy decreases with longer input/output sequences regardless. How much I also don't know.

lukev · on March 31, 2023

You could do those things in theory. I'm not saying that you could never build AGI on top of a LLM, or that such a AGI could not become "misaligned."

I'm just saying that having a mental state that's natively in English is a nice property if one is worried about what they are "thinking."

rhn_mk1 · on March 31, 2023

I don't see how this proves that asking the model about its internal state will reveal its inner high level processes in a human-readable way.

Perhaps there's a research paper which would explain it better?

Jensson · on March 31, 2023

> Also no one would have been caught with their pants down by seeing people jailbreak their models.

Preventing jailbreak in a language model is like preventing a GO AI from drawing a dick with the pieces. You can try, but since the model doesn't have any concept of what you want it to do it is very hard to control that. Doesn't make the model smart, it just means that the model wasn't made to understand dick pictures.

rhn_mk1 · on March 31, 2023

It does not make the model smart, but it demonstrates our inablity to control it despite wanting it. That strongly suggests that it's not fully understood.

famouswaffles · on March 31, 2023

We don't know how they work lol. How they are trained is what we understand. Nobody knows what the models learn exactly during training and nobody sure as hell knows what those billions of neurons are doing at inference. Why just a few months ago, some researchers discovered the neuron that largely decides when "an" comes before a word in GPT-2. We understand very little about the inner workings of these models. And if you knew what you were talking about, you would know that.

lukev · on March 31, 2023

We apparently have misaligned understandings of what we mean by "how they work." I agree, we don't know how to interpret the weight structure that the model learns during training.

But we do know exactly what happens mechanically during training and inference; what gets multiplied by what, what the inputs and outputs are, how data moves around the system. These are not some mysterious agents that could theoretically do or be anything, much less be secretly conscious (as a lot of alarmists are saying.)

They are functions that multiply billions of numbers to generate output tokens. Their ability to output the "right" output tokens is not well understood, and nearly magical. That's what makes them so exciting.

famouswaffles · on March 31, 2023

It is all things considered pretty easy to set up GPT such that it runs on its own input forever while being able to interact with users/other systems. add an inner monologue/react and reflexion and you have a very powerful system. embody it with some physical locomotive machine and oh boy. no one has really put this all together yet but everything i've said has been done to some degree. The individual pieces are here. it's just a matter of time. I'm working on some such myself.

What it could do is limited only by its intelligence (which is quite a bit higher than the base model as several papers have indicted) and the tools it controls (we seem to gladfully pile more and more control ). What it can be is...anything. If there's anything LLMs are good at, it's simulation.

Even this system with thoughts we can theoretically configure to see would be difficult to control. theory and practicality would not meet the road. you will not be able to monitor this system in real time. We've seen bing (doesn't even have all i've described) take action when "upset". The only reason it didn't turn sour is because her actions are limited to search and ending the conversation. But that's obviously not the direction of things here.

Can't say i want this train to stop. But i'm under no delusions it couldn't turn dangerous very quickly.

lukev · on March 31, 2023

I agree that LLMs could be one module in a future AGI system.

I disagree that LLMs are good at simulation. They're good at prediction. They can only simulate to the degree that the thing they're simulating is present in their training data.

Also, if you were trying to build an AGI, why would you NOT run it slowly at first so you could preserve and observe the logs? And if you wanted to build it to run full speed, why would you not build other single-purpose dumber AIs to watch it in case its thought stream diverged from expected behavior?

There's a lot of escape hatches here.

famouswaffles · on April 2, 2023

human.exe, robot.exe, malignant agent.exe are all very much simulations that an llm would have no problem running.

>Also, if you were trying to build an AGI, why would you NOT run it slowly at first so you could preserve and observe the logs?

I'm telling you that is extremely easy to do all the things i've said. Some might be interested in doing what you say. Others might not. at any rate, to be effective this requires real time monitoring of thoughts and actions. That's not feasible forever. an LLMs state can change. There's no guarantee the friendly agent you observed today will be friendly tomorrow.

>And if you wanted to build it to run full speed, why would you not build other single-purpose dumber AIs to watch it in case its thought stream diverged from expected behavior?

This is already done with say Bing. Not even remotely robust enough.