Another great tool that solves exactly the problem we're trying to solve, using an external service we can't use.
No company of a decent size (the ones whose documentation actually reaches some complexity) will be okay with exfiltrating confidential information to an external service they have no contract or NDA with. Sure, OpenAI is easy to integrate, but it's also an absolute showstopper for a company.
We don't need state-of-the-art LLMs with 800k context; we need confidentiality.
I'm kinda confused by this. Every company already keeps its data in Google Docs, Notion, Slack, Confluence, Jira, or any number of other providers. When you sign up for one of these services, there's always a compliance step to make sure it's OK. OpenAI's TOS says they don't use API data for training. So what makes sending this data to OpenAI different from sending it to any of the above providers? This is an honest question; I don't understand the difference.
> Every company already keeps their data in Google Docs
The TOS for the (paid) enterprise products such as Google Workspace are totally different from the (free) consumer versions. For example, Google can't use the data for AI training.
The TOS of the OpenAI API (which tools like this use) do not allow for model training on the data either. You might be confusing their API with ChatGPT, which has a different policy.
The important point being: with Google, Notion, Slack, Confluence, etc., your company has an actual contract with the vendor, properly signed, with provisions about data handling that your company (unlike you as an individual) can effectively enforce. There's an actual relationship created here, with benefits and losses flowing both ways.
The Terms of Service? They're worth less than it costs to print them out.
Case in point: right now, Microsoft is repackaging OpenAI models on its Azure platform and raking it in - the main value proposition here is literally just that it's "OpenAI, but with a proper contract and an SLA". But companies happily pay up, because that's what makes the difference between "reliable and safe to use at work" and "violating internal and external data safety standards, and in some cases flat-out illegal".
So if the product from OP used Azure OpenAI, it would be okay? You say "companies happily pay up", but the pricing is exactly the same (source: my company is paying for both APIs).
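For what it's worth, switching between the two really is mostly a config change in the same SDK, which is part of why the "same price, different contract" framing holds. A minimal sketch (Python, openai>=1.0 SDK; the endpoint, key, and deployment name below are placeholders, not real values):

    # Both clients expose the same chat-completions call; the difference is
    # who you have the contract with, not the application code.
    from openai import OpenAI, AzureOpenAI

    # Direct OpenAI API: governed by OpenAI's TOS only.
    oai = OpenAI(api_key="sk-...")  # placeholder key

    # Azure OpenAI: same models, behind an enterprise contract and SLA.
    az = AzureOpenAI(
        azure_endpoint="https://example-resource.openai.azure.com",  # hypothetical
        api_key="...",             # from the Azure portal
        api_version="2024-02-01",  # one of the published API versions
    )

    # With Azure, "model" names your deployment rather than a public model.
    resp = az.chat.completions.create(
        model="my-gpt4-deployment",  # hypothetical deployment name
        messages=[{"role": "user", "content": "Summarize our onboarding doc."}],
    )
    print(resp.choices[0].message.content)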
It's been quite clear for some time that, between OAI and MS, they've very neatly split the market: OAI handles early development and direct customers, and MS handles enterprises. It would require OAI to be a much bigger company than it is right now to properly handle enterprises, and MS already has all that infrastructure (legal, support, etc.). Seems like a sensible setup to me, and I don't see the need for enterprises to run open source models themselves (in this context, that is - I do see the value in other respects, like avoiding lock-in and enabling specialization), especially if they are already on Azure.
IANAL, but I read the OpenAI API TOS earlier today, and they keep data for up to 30 days for "review", and multiple people can get access to it. If I had confidential data I would not send it to them. Microsoft, on the other hand, seems to have an option where absolutely no data is stored for their OpenAI service.
You use "review" in quotes, but I don't see that word used in reference to the 30 day policy. This is what I see:
> Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law). [0]
The word "review" implies humans reading your data, but this wording only says it's retained for "monitoring". That could mean other things.
Or are you seeing the "review" wording somewhere else?
It is true that the word "review" was my own; it was my interpretation of this paragraph:
> OpenAI retains API data for 30 days for abuse and misuse monitoring purposes. A limited number of authorized OpenAI employees, as well as specialized third-party contractors that are subject to confidentiality and security obligations, can access this data solely to investigate and verify suspected abuse.
For our part, we self-host Confluence and GitLab, have tons of internal documentation and web pages, and are prohibited from using external tools unless they can be hosted internally in a sandboxed manner. There's no way on the planet they would approve connecting to an OpenAI API for trawling through internal documentation.
Trust. OpenAI ignored everyone's copyright and legal usage terms for the rest of its training data; what lawyer is going to trust them to follow their contractual terms?
Why would you send your data to the company that built its value by slurping up everyone's data without consent? It doesn't matter what they promise now; they have shown that they don't care about intellectual property, copyright, or any of that. They literally cannot be trusted.
It doesn't matter either way. What matters is that Google offers proper enterprise contracts. Contracts that are enforceable and transfer a lot of legal liability to the vendor. OpenAI, generally, does not offer such things.
Google Search itself is a somewhat special case - it gets a free pass because of its utility and because you're unlikely to paste anything confidential into a search box. But there are many places where even Google Search is banned on data-security grounds.
OpenAI's offerings - ChatGPT, the playground, and the API - all very much encourage pasting large amounts of confidential information into them, which is why any organization with a minimum of legal sense is banning or curtailing their use.
Two weeks ago I finished a project for a client who wanted a "talk to your documents" application without using OpenAI or other third-party APIs, instead running open-source models on their own infrastructure.
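Roughly, the core is small: embed document chunks locally, retrieve by similarity, and generate with a locally hosted model, so nothing leaves the machine. A minimal sketch of the idea (not the client's actual code; model names, paths, and the sample docs are illustrative), assuming sentence-transformers for embeddings and llama-cpp-python for generation:

    import numpy as np
    from sentence_transformers import SentenceTransformer
    from llama_cpp import Llama

    # Illustrative internal document chunks.
    docs = [
        "Deploys go through the internal GitLab CI pipeline.",
        "Confluence is self-hosted at wiki.internal.example.",
        "On-call rotations are documented in the runbook space.",
    ]

    # Small local embedding model; runs on CPU, no external calls.
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = embedder.encode(docs, normalize_embeddings=True)

    def retrieve(question, k=2):
        # Cosine similarity (vectors are normalized), top-k chunks.
        q = embedder.encode([question], normalize_embeddings=True)[0]
        scores = doc_vecs @ q
        return [docs[i] for i in np.argsort(-scores)[:k]]

    # Any locally hosted GGUF model works here; the path is a placeholder.
    llm = Llama(model_path="models/llama-2-7b-chat.Q4_K_M.gguf")
    question = "Where is our documentation hosted?"
    context = "\n".join(retrieve(question))
    out = llm(
        "Answer using only this context:\n"
        f"{context}\n\nQ: {question}\nA:",
        max_tokens=128,
    )
    print(out["choices"][0]["text"])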
If you're interested in something similar, send me an email.