These OCR improvements will almost certainly make their way into Google Books, which is great. Long term, they could make it possible to compress every non-digitized rare book into a corpus small enough to store for less than $5,000.[0] It would also be great for archive.org to move from Tesseract to this; I wonder what that would cost, both running it directly and through a paid API.
Not always: you can improve the loop by putting something real inside it, like a code execution tool, a search engine, a human, other AIs, or an API. As long as the model can make use of that external environment, the data it generates can keep improving. By the same logic, a human isolated from other humans for a long time might also end up going crazy.
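A minimal sketch of what I mean, assuming a hypothetical model.generate API and tasks that carry their own tests; the environment here is plain code execution, and only outputs it accepts get kept as new data:

    import subprocess
    import tempfile
    import textwrap

    def passes_real_check(code: str, test: str, timeout: int = 10) -> bool:
        # The pass/fail signal comes from actually running the code,
        # not from the model grading itself.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(textwrap.dedent(code) + "\n" + textwrap.dedent(test))
            path = f.name
        try:
            result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
            return result.returncode == 0
        except subprocess.TimeoutExpired:
            return False

    def collect_verified_data(model, tasks, attempts: int = 4):
        # Only candidates accepted by the external environment are kept,
        # so whatever flows back into training is grounded outside the model.
        verified = []
        for task in tasks:
            for _ in range(attempts):
                candidate = model.generate(task.prompt)   # hypothetical model API
                if passes_real_check(candidate, task.test):
                    verified.append((task.prompt, candidate))
                    break
        return verified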
A practical example: using LLMs to create deep research reports. The model pulls over 500 sources into a single analysis, and after all that compiling and contrasting it generates an article with references, like a wiki page. That text is probably superior in quality to most of its sources. It does not trust any one source completely, and it does not even pretend to present the truth; it only summarizes the distribution of information it found on the topic. Imagine scaling Wikipedia 1000x by deep-reporting every conceivable topic.
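Roughly, with search, fetch, and llm standing in for whatever real services get used, the shape of such a pipeline is something like this (a sketch of the idea, not any actual product's implementation):

    def deep_research(topic, search, fetch, llm, n_sources=500):
        # Gather sources, take per-source notes, then synthesize one
        # referenced article that contrasts them rather than trusting any one.
        urls = search(topic, limit=n_sources)          # placeholder search service
        notes = []
        for i, url in enumerate(urls):
            text = fetch(url)                          # placeholder page fetcher
            summary = llm("Summarize the key claims in:\n" + text[:4000])
            notes.append("[%d] %s (%s)" % (i, summary, url))
        prompt = ("Write a wiki-style article on '%s'. Compare and contrast the "
                  "numbered notes below, flag disagreements, and cite them as [i].\n\n"
                  % topic) + "\n".join(notes)
        return llm(prompt)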
[0] https://annas-archive.org/blog/critical-window.html