LLM mediocrity is just a reflection of human mediocrity, and my bet is that the average LLM will improve far faster than the average human doing the same work.
Agreed, but on mediocrity: Mistral barely passes as usable, GPT-4 is barely better than Googling, and nothing else I've tried is even close to production-ready. So the model's design, weights/embeddings, and training data clearly matter a lot.
Only fine-tuned models are producing impressive work, because calling something impressive by definition means it departs from the status quo: the model must be tuned toward some bias, aesthetic or otherwise, to stand out from the rest. Generic models like GPT or Stable Diffusion will always be generic; they won't be biased toward particular tastes or truths. They'll stay mostly neutral, which is exactly what we want for general research or internet search.
So there's an interesting tension: to get genuinely excellent work out of AI, you have to make it specific, but to do that, you have to train it on the work of humans. I think for this reason AI will always ultimately lag behind humans, though it will of course displace a lot of the work we do, which is significant.