Oh yes, I'm very familiar with these. All they do, though, is extract information; they don't immediately make it useful. So there's a massive gulf in the middle, a step that hasn't been done yet. Textract gets close-ish to that, but it's prohibitively expensive.
Even with Amazon Textract, the middle step of curating the extracted information into some form of meaning is still missing. I didn't realize this was still an unsolved problem.
There's lots of missing context in these sheets that has to be interpreted (i.e., how do you taxonomize each field of information?). Then asking questions of these documents is another step on top: "Is the allegation about sexual violence?", "What is the name and rank of the person being accused?", "Is anything anomalous in the review process?", "Has this person's rank changed in the past 5 years?", etc.
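To make that middle step concrete, here's a rough sketch for a single form type. Everything in it is made up for illustration: the label aliases, the `ComplaintRecord` fields, and the keyword list are assumptions, not anything Textract or any other tool gives you. The input is just the kind of raw label/value pairs an extraction tool might hand back.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical taxonomy: raw form labels (as an extractor might return them)
# mapped to canonical field names. Someone has to write this per form type.
FIELD_ALIASES = {
    "officer name": "accused_name",
    "name of accused": "accused_name",
    "rank": "accused_rank",
    "rank/title": "accused_rank",
    "allegation": "allegation_text",
    "nature of complaint": "allegation_text",
}

# Hypothetical keyword list; a stand-in for real classification.
SEXUAL_VIOLENCE_TERMS = ("sexual assault", "rape", "sexual misconduct")


@dataclass
class ComplaintRecord:
    accused_name: Optional[str] = None
    accused_rank: Optional[str] = None
    allegation_text: Optional[str] = None


def normalize(raw_pairs: dict) -> ComplaintRecord:
    """Map raw label/value pairs onto the canonical record, dropping unknown labels."""
    record = ComplaintRecord()
    for label, value in raw_pairs.items():
        # Lowercase and trim trailing colons/spaces so "Officer Name:" matches.
        key = label.lower().strip().rstrip(": ")
        canonical = FIELD_ALIASES.get(key)
        if canonical is not None:
            setattr(record, canonical, value.strip())
    return record


def mentions_sexual_violence(record: ComplaintRecord) -> bool:
    """Naive keyword check against the allegation text."""
    text = (record.allegation_text or "").lower()
    return any(term in text for term in SEXUAL_VIOLENCE_TERMS)


raw = {
    "Officer Name:": " J. Doe ",
    "Rank/Title": "Sergeant",
    "Nature of Complaint": "Alleged sexual assault during a traffic stop",
    "Badge #": "1234",  # no alias defined, so it is silently dropped
}
record = normalize(raw)
print(record.accused_name, record.accused_rank, mentions_sexual_violence(record))
```

Note how the unmapped "Badge #" field just disappears, and how the keyword check would miss any phrasing not on the list. That's the missing-context problem in miniature, and this is only one hand-written mapping for one form.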
Now expand this problem to hundreds of thousands of different types of document.