haha fair point, you can get the expected results with the right prompt, but I think it still reveals a general lack of true reasoning ability (or something)
Or it just shows that it tries to overcorrect the prompt which is generally a good idea in the most cases where the prompter is not intentionally asking a weird thing.
This happens all the time with humans. Imagine you're at a call center and get all sorts of weird descriptions of problems with a product: every human is expected to not expect the caller is an expert and actually will try to interpolate what they might mean by the weird wording they use