These problems have been well known for a long time, especially when one simply asks an LLM about a changing fact, such as who the current pope is.
But there is also a simple technique that reduces these issues almost to zero: thinking mode combined with an explicit request for grounding.
For example, asking any LLM "Who is the current pope?" could give a wrong answer, because Pope Francis died in April 2025 and the cut-off date of these models may be earlier than that. A simple question triggers simple associations, so the answer could be wrong. But if you turn on thinking mode and instruct the model to ground its answer, the LLM will answer correctly.
For the above example, ask instead: "Who is the current pope? Ground your answer on trustworthy external sources only", with thinking mode on or with an explicit "think harder for a better answer". All popular AIs (ChatGPT 5+, Gemini 2.5 Flash, Claude 4+, Grok 4+) will then answer correctly, albeit sometimes with a long thinking time (28 s for ChatGPT 5, for example).
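To make the technique concrete, here is a minimal sketch in Python showing how the grounding instruction can be wrapped into an API call. The OpenAI chat completions client is used only as an example; the model name "gpt-5" and the assumption that the chosen model can consult external sources are illustrative, not prescriptive.

```python
# Minimal sketch, assuming the OpenAI Python SDK and an API key in the
# environment. The model name below is a placeholder; substitute whatever
# model (and web-search/thinking options) your provider actually exposes.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GROUNDED_PROMPT = (
    "Who is the current pope? "
    "Ground your answer on trustworthy external sources only."
)

response = client.chat.completions.create(
    model="gpt-5",  # hypothetical model name for illustration
    messages=[
        # Nudge the model toward deliberate reasoning and verification
        # of time-sensitive facts before it commits to an answer.
        {
            "role": "system",
            "content": (
                "Think harder for a better answer. Verify time-sensitive "
                "facts against external sources before answering."
            ),
        },
        {"role": "user", "content": GROUNDED_PROMPT},
    ],
)

print(response.choices[0].message.content)
```

The key point is the prompt wording, not the SDK: the same grounding instruction pasted into any chat interface with thinking enabled has the same effect.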
Without explicit instructions, the accuracy of the result depends heavily on the cut-off date and default settings of each model. Grok 4, for example, will run a search in auto mode and then answer correctly, but Grok 3 will not.