Like the review which allowed them to ignore licenses while ingesting all public repos on GitHub? And yes, true, the T&C allow them to ignore the license, but it is questionable whether everyone who uploaded stuff to GitHub actually had the rights the T&C assume (for example, someone uploading an older project with many contributors).
Different threat profile. They don’t have the TOS protection for training data, and Microsoft is a juicy target for a huge copyright infringement lawsuit.
Yeah, that's an interesting point. But I think with appropriate RAG techniques and proper citations, a future LLM can get around the copyright issues.
The problem right now with GPT-4 is that it's not citing its sources (for non-search-based stuff), which is immoral and maybe even valid grounds for a lawsuit.
I wouldn’t count on that if Microsoft’s legal team does a review of the training data.