I'm not sure that's as impressive as it seems. It's good at predicting sequences it has seen before, so STS (semantic text similarity) on steroids.
It's good at filtering invalid JSON because it has seen that done many times, and is basically acting as a very good semantic-text-similarity engine with generative output.
If you ask it to act in a way that's niche enough that it hasn't seen much of it, it fails horribly.
We haven't figured out exactly what it's encoding. I don't think this empirical example is proof of that; whether the model actually understands what 'red' is the way a human does is yet to be determined.