> So I think it's overly dismissive to regard LLMs as mere surface statistics.
It's literally what they are though.
Yes, those probabilities embed human knowledge, but that doesn't mean the LLM itself is intelligent. That's why every LLM today fails at anything that isn't centred on rote learning.
It's what they input and output, but it's not literally what they are. The only way to squeeze that many statistics into a compact model is to curve-fit an approximation of the generating process itself. What it fits happens to be stochastic sequences (of any type, but usually text), but conceptually it's no different from any other ML model. It's no more "surface statistics" than a deep neural network trained for machine vision would be.
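
To make the compression point concrete, here's a back-of-envelope sketch. The vocabulary size, context length, and parameter count below are illustrative assumptions, not figures from any particular model:

```python
# Back-of-envelope comparison: storing raw "surface statistics" vs. a
# parametric model. All numbers are illustrative assumptions.

VOCAB = 50_000   # roughly a typical BPE vocabulary size (assumption)
CONTEXT = 8      # even a tiny 8-token context window (assumption)

# A literal lookup table of next-token statistics needs one row per
# possible context: VOCAB**CONTEXT rows.
table_entries = VOCAB ** CONTEXT
print(f"lookup-table rows for 8-token contexts: {table_entries:.2e}")
# ~3.91e+37 rows -- far beyond anything that could ever be stored

# By contrast, a 7B-parameter model at 2 bytes/parameter fits in ~14 GB.
model_bytes = 7e9 * 2
print(f"7B-parameter model: {model_bytes / 1e9:.0f} GB")

# Covering that context space in so little memory is only possible by
# compressing: fitting a function that approximates the generating
# process, i.e. the "curve-fitting" described above.
```

If the model were genuinely just a table of surface statistics, it couldn't fit in memory at all; the size gap is the whole argument.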