I think of an LLM as a crystallised mathematical snapshot of intelligence... like a cell on a microscope slide, a dead and mounted form of output from the living process of intelligence...
This paper makes me wonder whether, in a very fuzzy sense, we could give #LLMs access to some similarly crystallised analog of emotion or emotional valence, below the level of language
"Intelligence" is a continuous process. Without a continuous feedback loop, LLMs will never be more than a compression algorithm we bullied into being a chatbot.
OpenAI as a mega-organism might be intelligent, but the LLMs definitely are not.
The "compressed capture of semantic relationships" is a new thing we don't have a word for.
Funnily enough, there is a mathematical link between data compression and AGI [1]. I believe a paper circulated some time ago that compared GPT-2 to gzip, with interesting results.
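For anyone curious what "comparing an LLM to gzip" means mechanically, here's a minimal sketch of the usual measurement; the text sample and framing are stand-ins of mine, not taken from that paper:

```python
import gzip

# Stand-in text sample; a real comparison would use a large natural-language
# corpus (e.g. enwik8) rather than a short repetitive string.
text = "The quick brown fox jumps over the lazy dog. " * 200
raw = text.encode("utf-8")

compressed = gzip.compress(raw, compresslevel=9)
print(f"gzip: {8 * len(compressed) / len(raw):.2f} bits per character")

# A language model is scored the same way: its average cross-entropy on the
# text, measured in bits per character, is roughly the rate an arithmetic
# coder driven by the model's next-token probabilities would achieve.
# Lower bits per character means better compression, which is the sense in
# which the two can be compared head to head.
```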
Would you say with equal confidence that they don't exemplify their intelligence by their ability to repeatedly select an often-successful next action from a set of possible next actions, based on a set of input observations?
It still doesn’t make sense for dogs. It might make some sense given a higher-level goal (hiding a toy under the bed)[1] but it doesn’t make much sense for selecting the goals (“I should hide this toy because the other dog keeps stealing it”). In building an AI dog it doesn’t work to elevate these higher-level goals into individual tokens, because real dogs form goals dynamically according to their environment and the set of possible goals is effectively unbounded. (Note that LLM agents also struggle badly with this; generating goals token-by-token means the goals themselves are prone to hallucination.)
[1] It still doesn’t make much sense to view this as a statistical process; dogs can generalize far better than transformers, as perhaps best seen with seeing-eye dogs. I believe dogs’ powers of causal reasoning exceed what is possible from mere surface statistics: e.g. they innately understand object permanence as puppies, whereas transformers still don’t understand it after viewing thousands of dogs’ lifetimes of experience.
I've not been able to find any way to distinguish "mere surface statistics" from the deeper, richer, and more meaningful something it is meant to be contrasted with, except that "surface statistics" are uncompressed. For example, surface statistics might be the set of output measurements generated by a compact process, such as the positions of planets over time; knowing the laws of gravity means we can generate gigabytes of these statistics correctly and easily, and they will accurately match future observations.
But then going the other way, from statistics to a causal model, is just an inverse problem: like going from a set of noisy magnetic field measurements at the boundary of a container to a pattern of electric current flow inside a volume, or going from planet positions to orbit shapes and periods to an inverse-square law of gravity. Generating a compressed inverse model from surface statistics is exactly the sort of thing that deep learning has proven to be very good at. And by now we've seen no shortage of evidence that LLMs and other deep networks contain stateful world models, which is exactly what you'd expect, because for all their parameters, they aren't nearly big enough to contain more than a tiny fraction of the statistics they were trained on.
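To make the planetary example above concrete, here's a toy version of that inverse step (approximate textbook orbit numbers, purely illustrative): a little table of "surface statistics" compresses into a single fitted exponent.

```python
import numpy as np

# "Surface statistics": approximate orbital radii (AU) and periods (years)
# for six planets, Mercury through Saturn; illustrative textbook values.
radius = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537])
period = np.array([0.241, 0.615, 1.000, 1.881, 11.862, 29.457])

# The inverse problem: recover the compact law T ~ r^k behind the data by
# fitting a straight line in log-log space.
k, _ = np.polyfit(np.log(radius), np.log(period), 1)
print(f"fitted exponent k = {k:.3f}")  # ≈ 1.5, i.e. Kepler's third law,
                                       # which an inverse-square force law predicts
```

The fitted exponent regenerates every row of the table and predicts rows that aren't in it; that's the sense in which the compact model is more than the statistics it was fit to.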
So I think it's overly dismissive to regard LLMs as mere surface statistics.
> So I think it's overly dismissive to regard LLMs as mere surface statistics.
It's literally what they are though.
Yes, those probabilities embed human knowledge, but that doesn't mean the LLM itself is intelligent. It's why every LLM today fails at anything that isn't centred around rote learning.
It's what they input and output, but it's not literally what they are. The only way to squeeze that many statistics into a compact model is to curve-fit an approximation of the generating process itself. While it fits stochastic sequences (of any type, but usually text), it's conceptually no different from any other ML model. It's no more surface statistics than a deep neural network trained for machine vision would be.
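As a rough back-of-envelope version of the "not nearly big enough" point (round illustrative numbers, not tied to any particular model):

```python
# Round, illustrative numbers only; not a claim about any specific model.
params = 70e9          # e.g. a 70B-parameter model
bytes_per_param = 2    # fp16/bf16 storage
model_bytes = params * bytes_per_param        # ~140 GB of weights

tokens = 15e12         # order of magnitude of a large training corpus
bytes_per_token = 4    # a subword token of English is roughly 4 characters
corpus_bytes = tokens * bytes_per_token       # ~60 TB of training text

print(f"weights ~{model_bytes / 1e9:.0f} GB, "
      f"training text ~{corpus_bytes / 1e12:.0f} TB, "
      f"ratio ~1:{corpus_bytes / model_bytes:.0f}")
# The weights are hundreds of times smaller than the text they were fit to,
# so they can't be a lookup table of the training statistics; whatever they
# store has to be a compressed approximation of the generating process.
```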
Meh. I'm sometimes curious about the different conversations that are possible in different places, I guess? One sometimes hears from different ppl, but maybe wants cross-talk
Seemed easy, and I thought harmless, tho maybe not
https://x.com/patcon_/status/1866549080127893613?s=46