You can learn a lot from a model when you ask about its sizing, although not necessarily anything about the sizing.
For instance, you can learn how much introspection has been trained in during RL, and you can also learn (sometimes) if output from other models has been incorporated into the RL.
I think of the self-knowledge conversations with models as a nicety that's recent, and stand by my assessment that this model is not trained using modern frontier RL workflows.
> you can’t use software to figure out the “process” used to manufacture the chip it is running on.
This seems so incorrect that I don't even know where to start parsing it. All chips are designed and analyzed by software; all chip analysis, say of an unknown chip, starts with etching away layers and imaging them using software, then analyzing the layers, using software. But maybe another way to say that is "I don't understand your analogy."
If it helps, the key part is: "that it is running on".
You can't use software to analyse images of disassembled chips that it is running on because disassembled chips can't run software!
A surgeon can learn about brain surgery by inspecting other brains, but the smartest brain surgeon in the world can't possibly figure out how many neurons or synapses their own brains have just by thinking about it.
Your meat substrate is inaccessible to your thoughts in the exact same manner that the number of weights, model architecture, runtime stack, CUDA driver version, etc, etc... are totally inaccessible to an LLM.
It can be told, after the fact, in the same manner that a surgeon might study how brains work in a series of lectures, but that is fundamentally distinct.
PS: Most ChatGPT models didn't know what they were called either, and tended to say the name and properties of their predecessor model, which was in their training set. Open AI eventually got fed up with people thinking this was a fundamental flaw (it isn't), and baked this specific set of metadata into the system prompt and/or the post-training phase.
> For instance, you can learn how much introspection has been trained in during RL,
That's not introspection: that's a simulacrum of it. Introspection allows you to actually learn things about how your mind functions, if you do it right (which I can't do reliably, but I have done on occasion – and occasionally I discover something that's true for humans in general, which I can later find described in the academic literature), and that's something that language models are inherently incapable of. Though you probably could design a neural architecture that is capable of observing its own function, by altering its operation: perhaps a recurrent or spiking neural network might learn such a behaviour, under carefully-engineered circumstances, although all the training processes I know of would have the model ignore whatever signals it was getting from its own architecture.
> all chip analysis, say of an unknown chip, starts with etching away layers
Good luck running any software on that chip afterwards.
Introspection: all heard. As a practical matter, you can rl or prompt inject information about the model into context and most major models do this, not least I expect because they’d like to be able to complain when that output is taken for rl by other model training firms.
I agree that an intermediate non anthropomorphic but still looking at one’s own layers sort of situation isn’t in any architecture I’m aware of right now. I don’t imagine it would add much to a model.
Chip etching: yep. If you’ve never seen an unknown chip analyzed in anger, it’s pretty cool.
You must be a stupid brain if you don’t even know that!
Similarly: you can’t use software to figure out the “process” used to manufacture the chip it is running on.