Prompt engineering exists because a) LLMs are trained to optimize for the statistical average of their training data and b) Sturgeon's law: "ninety percent of everything is crap". Therefore, out of the box, LLMs will give worse-than-ideal results by design.
The initial proof that prompt engineering worked dates back to the VQGAN + CLIP days, when simply adding "world-famous" or "trending on ArtStation" to a prompt was more than enough to objectively improve generated image quality.
The supposed workaround for prompt engineering is RLHF/alignment of the LLM, but everyone who has played around with ChatGPT knows it isn't sufficient.
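To make the contrast concrete, here is a minimal sketch of prompt engineering against a chat model, assuming the `openai>=1.0` Python client and a placeholder model name; the specific system prompt is illustrative, not a prescribed recipe.

```python
# A minimal sketch: the same question asked with and without an engineered
# system prompt. Assumes OPENAI_API_KEY is set and "gpt-3.5-turbo" is available.
from openai import OpenAI

client = OpenAI()

question = "Explain what a decorator is in Python."

# Out of the box: the model falls back to its statistically average,
# RLHF-flavored answer.
baseline = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": question}],
)

# Prompt-engineered: an explicit system prompt constrains persona, tone,
# and length, steering the output away from the average response.
engineered = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a terse senior Python engineer. Answer in at most "
                "three sentences, with one short code example and no preamble."
            ),
        },
        {"role": "user", "content": question},
    ],
)

print(baseline.choices[0].message.content)
print(engineered.choices[0].message.content)
```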