> This is a wild test, because LLMs get really pushy and insistent that the dog only has 4 legs.
Most human beings, if they see a dog with 5 legs, will quickly assume they are hallucinating and that the dog really has only 4 legs, unless the fifth leg is really, really obvious. It is weird how humans are biased like that:
1. You can look directly at something and not see it because your attention is focused elsewhere (on the expected four legs).
2. Our pre-existing knowledge (dogs have four legs) shapes, top-down, how we interpret the bottom-up visual information.
3. Our brain actively filters out "unimportant" details that don't align with our expectations or the main "figure" of the dog.
Attention should fix this, however: if you specifically ask the AI to count the dog's legs, it shouldn't go nuts.
A straight-up "dumber" computer algorithm that isn't trained extensively on real and realistic image data is going to get this right more often than a transformer that was.
Yes, it's all evolution. Five-legged dogs aren't very common, so we don't specifically look for them, just as we aren't looking for humans with six fingers.
I get it: the parent's litmus test is meant to show that the AI is smarter than a human, not merely as smart as one. Can the AI recognize details that are difficult for normal people to see, even though it has been trained on normal data just like humans have?
I think the LLM is just trying to be useful, not omniscient. Binary thinkers are probably not going to be able to appreciate the difference, however.
If you want the AI to identify a dog, we are done. If you want the AI to identify subtle differences from reality, then you are going to have to use a different technique.