This one is about compression and reliable communication. While interesting and well-written, I don't think it matches the original request for "papers on AI, ML, ...".
I had a good quick skim. It has a language model (OK, trigrams!) plus the cross-entropy formula and the reasoning behind it. On my reading list for sure! We did some information theory at uni, but I don’t recall all of this stuff; maybe I just forgot.
A research paper on AI doesn't necessarily mean a meme title and chasing a dubious "SOTA" status on some benchmark. I would say foundational work is more worth reading.