Another great tool that solves exactly the problem we're trying to solve, using an external service we can't use.
No company of a decent size (the ones whose documentation actually reaches some complexity) will be okay with exfiltrating confidential information to an external service they have no contract or NDA with. Sure, OpenAI is easy to integrate, but it's also an absolute showstopper for a company.
We don't need state-of-the-art LLMs with 800k context; we need confidentiality.
I'm kinda confused by this. Every company already keeps its data in Google Docs, Notion, Slack, Confluence, Jira, or any number of other providers. When you sign up for one of these services, there's always a compliance step to make sure it's OK. OpenAI's TOS says they don't use API data for training. So what makes sending this data to OpenAI different from sending it to any of the above providers? This is an honest question; I don't understand the difference.
> Every company already keeps their data in Google Docs
The TOS for the (paid) enterprise products such as Google Workspace are totally different from the (free) consumer versions. For example, Google can't use the data for AI training.
The TOS of the OpenAI API (which tools like this use) do not allow for model training on the data either. You might be confusing their API with ChatGPT, which has a different policy.
The important point being: with Google, Notion, Slack, Confluence, etc., your company has an actual contract with the vendor, properly signed, with provisions about data handling that your company (unlike you as an individual) can effectively enforce. There's an actual relationship created here, with benefits and losses flowing both ways.
The Terms of Service? They're worth less than it costs to print them out.
Case in point: right now, Microsoft is repackaging OpenAI models on its Azure platform and raking it in - the main value proposition here is literally just that it's "OpenAI, but with a proper contract and an SLA". But companies happily pay up, because that's what makes the difference between "reliable and safe to use at work" and "violating internal and external data safety standards, and in some cases flat-out illegal".
So if the product from OP used Azure OpenAI, it would be okay? You say "companies happily pay up", but the pricing is exactly the same (source: my company is paying for both APIs).
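For what it's worth, switching between the two really is mostly a config change in the same SDK, which is part of why the "same price, different contract" framing holds. A minimal sketch (Python, openai>=1.0 SDK; the endpoint, key, and deployment name below are placeholders, not real values):

    # Both clients expose the same chat-completions call; the difference is
    # who you have the contract with, not the application code.
    from openai import OpenAI, AzureOpenAI

    # Direct OpenAI API: governed by OpenAI's TOS only.
    oai = OpenAI(api_key="sk-...")  # placeholder key

    # Azure OpenAI: same models, behind an enterprise contract and SLA.
    az = AzureOpenAI(
        azure_endpoint="https://example-resource.openai.azure.com",  # hypothetical
        api_key="...",             # from the Azure portal
        api_version="2024-02-01",  # one of the published API versions
    )

    # With Azure, "model" names your deployment rather than a public model.
    resp = az.chat.completions.create(
        model="my-gpt4-deployment",  # hypothetical deployment name
        messages=[{"role": "user", "content": "Summarize our onboarding doc."}],
    )
    print(resp.choices[0].message.content)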
It's been quite clear for some time that, between OAI and MS, they've very neatly split the market: OAI handles early development and direct customers, and MS handles enterprises. It would require OAI to be a much bigger company than it is right now to properly handle enterprises, and MS already has all that infrastructure (legal, support, etc.). Seems like a sensible setup to me, and I don't see the need for enterprises to run open source models themselves (in this context, that is - I do see the value in other respects, like avoiding lock-in and enabling specialization), especially if they are already on Azure.
IANAL, but I read the OpenAI API TOS earlier today, and they keep data for up to 30 days for "review", and multiple people can get access to it. If I had confidential data I would not send it to them. Microsoft, on the other hand, seems to have an option where absolutely no data is stored for their OpenAI service.
You use "review" in quotes, but I don't see that word used in reference to the 30 day policy. This is what I see:
> Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law). [0]
The word "review" implies humans reading your data, but this wording only says it's retained for "monitoring". That could mean other things.
Or are you seeing the "review" wording somewhere else?
It is true that the word "review" was my own; it was my interpretation of this paragraph:
> OpenAI retains API data for 30 days for abuse and misuse monitoring purposes. A limited number of authorized OpenAI employees, as well as specialized third-party contractors that are subject to confidentiality and security obligations, can access this data solely to investigate and verify suspected abuse.
For our part, we self-host Confluence and GitLab, have tons of internal documentation and web pages, and are prohibited from using external tools unless they can be hosted internally in a sandboxed manner. There's no way on the planet they would approve connecting to an OpenAI API for trawling through internal documentation.
Trust. OpenAI ignored everyone's copyright and legal usage terms for the rest of its training data; what lawyer is going to trust them to follow their contractual terms?
Why would you send your data to the company that built its value by slurping up everyone's data without consent? It doesn't matter what they promise now; they have shown that they don't care about intellectual property, copyright, or any of that. They literally cannot be trusted.
It doesn't matter either way. What matters is that Google offers proper enterprise contracts. Contracts that are enforceable and transfer a lot of legal liability to the vendor. OpenAI, generally, does not offer such things.
Google Search itself is a somewhat special case - it gets a free pass because of its utility and because you're unlikely to paste anything confidential into a search box. But there are many places where even Google Search is banned on data-security grounds.
OpenAI's offerings - ChatGPT, the playground, and the API - all very much encourage pasting large amounts of confidential information into them, which is why any organization with a minimum of legal sense is banning or curtailing their use.
Two weeks ago I finished a project for a client who wanted a "talk to your documents" application without using OpenAI or other third-party APIs, instead running open-source models on their own infrastructure.
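Roughly, the core is small: embed document chunks locally, retrieve by similarity, and generate with a locally hosted model, so nothing leaves the machine. A minimal sketch of the idea (not the client's actual code; model names, paths, and the sample docs are illustrative), assuming sentence-transformers for embeddings and llama-cpp-python for generation:

    import numpy as np
    from sentence_transformers import SentenceTransformer
    from llama_cpp import Llama

    # Illustrative internal document chunks.
    docs = [
        "Deploys go through the internal GitLab CI pipeline.",
        "Confluence is self-hosted at wiki.internal.example.",
        "On-call rotations are documented in the runbook space.",
    ]

    # Small local embedding model; runs on CPU, no external calls.
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = embedder.encode(docs, normalize_embeddings=True)

    def retrieve(question, k=2):
        # Cosine similarity (vectors are normalized), top-k chunks.
        q = embedder.encode([question], normalize_embeddings=True)[0]
        scores = doc_vecs @ q
        return [docs[i] for i in np.argsort(-scores)[:k]]

    # Any locally hosted GGUF model works here; the path is a placeholder.
    llm = Llama(model_path="models/llama-2-7b-chat.Q4_K_M.gguf")
    question = "Where is our documentation hosted?"
    context = "\n".join(retrieve(question))
    out = llm(
        "Answer using only this context:\n"
        f"{context}\n\nQ: {question}\nA:",
        max_tokens=128,
    )
    print(out["choices"][0]["text"])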
If you're interested in something similar, send me an email.