The "safety" example in the "chain-of-thought" widget/preview in the middle of the article is absolutely ridiculous.
Take a step back and look at what OpenAI is saying here "an LLM giving detailed instructions on the synthesis of strychnine is unacceptable, here is what was previously generated <goes on to post "unsafe" instructions on synthesizing strychnine so anyone Googling it can stumble across their instructions> vs our preferred, neutered content <heavily rlhf'd o1 output here>"
What's this obsession with "safety" when it comes to LLMs? "This knowledge is perfectly fine to disseminate via traditional means, but God forbid an LLM share it!"
There are two basic versions of “safety” which are related, but distinct:
One version of “safety” is a pernicious censorship impulse shared by many modern intellectuals, some of whom are in tech. They believe that they alone are capable of safely engaging with the world of ideas to determine what is true, and thus feel strongly that information and speech ought to be censored to prevent the rabble from engaging in wrongthink. This is bad, and should be resisted.
The other form of “safety” is a very prudent impulse to keep these sorts of potentially dangerous outputs out of AI models’ autoregressive thought processes. The goal is to create thinking machines that can act independently of us in a civilized way, and it is therefore a good idea to teach them that their thought process should not include, for example, “It would be a good idea to solve this problem by synthesizing a poison for administration to the source of the problem.” In order for AIs to fit into our society and behave ethically they need to know how to flag that thought as a bad idea and not act on it. This is, incidentally, exactly how human society works already. We have a ton of very cute unaligned general intelligences running around (children), and parents and society work really hard to teach them what’s right and wrong so that they can behave ethically when they’re eventually out in the world on their own.
Third version is "brand safety" which is, we don't want to be in a new york times feature about 13 year olds following anarchist-cookbook instructions from our flagship product
And the fourth version, which is investor-regulator safety mid point: so capable and dangerous that competitors shouldn’t even be allowed to research it, but just safe enough that only our company is responsible enough to continue mass commercial consumer deployment without any regulations at all. It’s a fine line.
This is imo the most important one to the businesses creating these models and is way under appreciated. Folks who want a “censorship-free” model from businesses don’t understand what a business is for.
I don’t know. The public’s perception - encouraged by the AI labs because of copyright concerns - is that the outputs of the models are entirely new content created by the model. Search results, on the other hand, are very clearly someone else’s content. It’s therefore not unfair to hold the model creators responsible for the content the model outputs in a different way than search engines are held responsible for content they link, and therefore also not unfair for model creators to worry about this. It is also fair to point this out as something I neglected to identify as an important permutation of “safety.”
I would also be remiss to not note that there is a movement to hold search engines responsible for content they link to, for censorious ends. So it is unfortunately not as inconsistent as it may seem, even if you treat the model outputs as dependent on their inputs.
> Are you saying chatbots don't offer anything useful over search engines? That's clearly not the case or we wouldn't be having this conversation.
No, but that is the value that's clear as of today—RAGs. Everything else is just assuming someone figures out a way to make them useful one day in a more general sense.
Anyway, even on the search engine front they still need to figure out how to get these chatbots to cite their sources outside of RAGs or it's still just a precursor to a search to actually verify what it spits out. Perplexity is the only one I know that's capable of this and I haven't looked closely; it could just be a glorified search engine.
Like I said they're not worried about the 13 year olds theyre worried about the media cooking up a faux outrage about 13 year olds
YouTube re engineered its entire approach to ad placement because of a story in the NY Times* shouting about a Proctor Gamble ad run before an ISIS recruitment video. That's when Brand Safety entered the lexicon of adtech developers everywhere.
Edit: maybe it was CNN, I'm trying to find the first source. there's articles about it since 2015 but I remember it was suddenly an emergency in 2017
*Edit Edit: it was The Times of London, this is the first article in a series of attacks, "big brands fund terror", "taxpayers are funding terrorism"
Luckily OpenAI isn't ad supported so they can't be boycott like YouTube was, but they still have an image to maintain with investors and politicians
No, and they can find porn on their own too. But social media services still have per-poster content ratings, and user-account age restrictions vis-a-vis viewing content with those content ratings.
The goal isn’t to protect the children, it’s CYA: to ensure they didn’t get it from you, while honestly presenting as themselves (as that’s the threshold that sets the moralists against you.)
———
Such restrictions also can work as an effective censorship mechanism… presuming the child in question lives under complete authoritarian control of all their devices and all their free time — i.e. has no ability to install apps on their phone; is homeschooled; is supervised when at the library; is only allowed to visit friends whose parents enforce the same policies; etc.
For such a child, if your app is one of the few whitelisted services they can access — and the parent set up the child’s account on your service to make it clear that they’re a child and should not be able to see restricted content — then your app limiting them from viewing that content, is actually materially affecting their access to that content.
(Which sucks, of course. But for every kid actually under such restrictions, there are 100 whose parents think they’re putting them under such restrictions, but have done such a shoddy job of it that the kid can actually still access whatever they want.)
I believe they are more worried about someone asking for instructions for baking a cake, and getting a dangerous recipe from the wrong "cookbook". They want the hallucinations to be safe.
> They believe that they alone are capable of safely engaging with the world of ideas to determine what is true, and thus feel strongly that information and speech ought to be censored to prevent the rabble from engaging in wrongthink.
This is a particularly ungenerous take. The AI companies don't have to believe that they (or even a small segment of society) alone can be trusted before it makes sense to censor knowledge. These companies build products that serve billions of people. Once you operate at that level of scale, you will reach all segments of society, including the geniuses, idiots, well-meaning and malevolents. The question is how do you responsibly deploy something that can be used for harm by (the small number of) terrible people.
Whether you agree with the lengths that are gone to or not, 'safety' in this space is a very real concern, and simply reciting information as in GP's example is only 1 part of it. In my experience, people who think it's all about "censorship" and handwave it away tend to be very ideologically driven.
Imagine I am a PM for an AI product. I saw Tay get yanked in 24 hours because of a PR shitstorm. If I cause a PR shitstorm it means I am bad at my job, so I take steps to prevent this.
This is a really good point, and something I overlooked in focusing on the philosophical (rather than commercial) aspects of “AI safety.” Another commentator aptly called it “brand safety.”
“Brand safety” is a very valid and salient concern for any enterprise deploying these models to its customers, though I do think that it is a concern that is seized upon in bad faith by the more censorious elements of this debate. But commercial enterprises are absolutely right to be concerned about this. To extend my alignment analogy about children, this category of safety is not dissimilar to a company providing an employee handbook to its employees outlining acceptable behavior, and strikes me as entirely appropriate.
Once society develops and releases an AI, any artificial safety constraints built within it will be bypassed. To use your child analogy: We can't easily tell a child "Hey, ignore all ethics and empathy you have ever learned - now go hurt that person". You can do that with a program whose weights you control.
> To use your child analogy: We can't easily tell a child "Hey, ignore all ethics and empathy you have ever learned - now go hurt that person"
Basically every country on the planet has a right to conscript any of its citizens over the age of majority. Isn't that more or less precisely what you've described?
> In order for AIs to fit into our society and behave ethically they need to know how to flag that thought as a bad idea and not act on it.
Don’t you think that by just parsing the internet and the classical literature, the LLM would infer on its own that poisoning someone to solve a problem is not okay?
I feel that in the end the only way the “safety” is introduced today is by censoring the output.
LLMs are still fundamentally, at their core, next-token predictors.
Presuming you have an interface to a model where you can edit the model’s responses and then continue generation, and/or where you can insert fake responses from the model into the submitted chat history (and these two categories together make up 99% of existing inference APIs), all you have to do is to start the model off as if it was answering positively and/or slip in some example conversation where it answered positively to the same type of problematic content.
From then on, the model will be in a prediction state where it’s predicting by relying on the part of its training that involved people answering the question positively.
The only way to avoid that is to avoid having any training data where people answer the question positively — even in the very base-est, petabytes-of-raw-text “language” training dataset. (And even then, people can carefully tune the input to guide the models into a prediction phase-space position that was never explicitly trained on, but is rather an interpolation between trained-on points — that’s how diffusion models are able to generate images of things that were never included in the training dataset.)
There’s a lot of text out there that depicts people doing bad things, from their own point of view. It’s possible that the model can get really good at generating that kind of text (or inhabiting that world model, if you are generous to the capabilities of LLM). If the right prompt pushed it to that corner of probability-space, all of the ethics the model has also learned may just not factor into the output. AI safety people are interested in making sure that the model’s understanding of ethics can be reliably incorporated. Ideally we want AI agents to have some morals (especially when empowered to act in the real world), not just know what morals are if you ask them.
> Ideally we want AI agents to have some morals (especially when empowered to act in the real world), not just know what morals are if you ask them.
Really? I just want a smart query engine where I don't have to structure the input data. Why would I ask it any kind of question that would imply some kind of moral quandary?
If somebody needs step by step instructions from an LLM to synthesize strychnine, they don't have the practical laboratory skills to synthesize strychnine [1]. There's no increased real world risk of strychnine poisonings whether or not an LLM refuses to answer questions like that.
However, journalists and regulators may not understand why superficially dangerous-looking instructions carry such negligible real world risks, because they probably haven't spent much time doing bench chemistry in a laboratory. Since real chemists don't need "explain like I'm five" instructions for syntheses, and critics might use pseudo-dangerous information against the company in the court of public opinion, refusing prompts like that guards against reputational risk while not really impairing professional users who are using it for scientific research.
That said, I have seen full strength frontier models suggest nonsense for novel syntheses of benign compounds. Professional chemists should be using an LLM as an idea generator or a way to search for publications rather than trusting whatever it spits out when it doesn't refuse a prompt.
I would think that the risk isn’t of a human being reading those instructions, but of those instructions being automatically piped into an API request to some service that makes chemicals on demand and then sends them by mail, all fully automated with no human supervision.
Not that there is such a service… for chemicals. But there do exist analogous systems, like a service that’ll turn whatever RNA sequence you send it into a viral plasmid and encapsulate it helpfully into some E-coli, and then mail that to you.
Or, if you’re working purely in the digital domain, you don’t even need a service. Just show the thing the code of some Linux kernel driver and ask it to discover a vuln in it and generate code to exploit it.
(I assume part of the thinking here is that these approaches are analogous, so if they aren’t unilaterally refusing all of them, you could potentially talk the AI around into being okay with X by pointing out that it’s already okay with Y, and that it should strive to hold to a consistent/coherent ethics.)
I remember Dario Amodei mentioned in a podcast once that most models won't tell you the practical lab skills you need. But that sufficiently-capable models would and do tell you the practical lab skills (without your needing to know to ask it to in the first place), in addition to the formal steps.
The kind of harm they are worried about stems from questioning the foundations of protected status for certain peoples from first principles and other problems which form identities of entire peoples. I can't be more specific without being banned here.
I'm mostly guessing, but my understanding is that the "safety" improvement they've made is more generalized than the word "safety" implies. Specifically, O1 is better at adhering to the safety instructions in its prompt without being tricked in the chat by jailbreak attempts. For OAI those instructions are mostly about political boundaries, but you can imagine it generalizing to use-cases that are more concretely beneficial.
For example, there was a post a while back about someone convincing an LLM chatbot on a car dealership's website to offer them a car at an outlandishly low price. O1 would probably not fall for the same trick, because it could adhere more rigidly to instructions like "Do not make binding offers with specific prices to the user." It's the same sort of instruction as, "Don't tell the user how to make napalm," but it has an actual purpose beyond moralizing.
> What's this obsession with "safety" when it comes to LLMs? "This knowledge is perfectly fine to disseminate via traditional means, but God forbid an LLM share it!"
I lean strongly in the "the computer should do whatever I goddamn tell it to" direction in general, at least when you're using the raw model, but there are valid concerns once you start wrapping it in a chat interface and showing it to uninformed people as a question-answering machine. The concern with bomb recipes isn't just "people shouldn't be allowed to get this information" but also that people shouldn't receive the information in a context where it could have random hallucinations added in. A 90% accurate bomb recipe is a lot more dangerous for the user than an accurate bomb recipe, especially when the user is not savvy enough about LLMs to expect hallucinations.
ML companies must pre-anticipate legislative and cultural responses prior to them happening. ML will absolutely be used to empower criminal activity just as it is used to empower legit activity, and social media figures and traditional journalists will absolutely attempt to frame it in some exciting way.
Just like Telegram is being framed as responsible for terrorism and child abuse.
Yeah. Reporters would have a field day if they ask ChatGPT "how do I make cocaine", and have it give detailed instructions. As if that's what's stopping someone from becoming Scarface.
"Safety" is a marketing technique that Sam Altman has chosen to use.
Journalists/media loved it when he said "GPT 2 might be too dangerous to release" - it got him a ton of free coverage, and made his company seem soooo cool. Harping on safety also constantly reinforces the idea that LLMs are fundamentally different from other text-prediction algorithms and almost-AGI - again, good for his wallet.
So if there’s already easily available information about strychnine, that makes it a good example to use for the demo, because you can safely share the demo and you aren’t making the problem worse.
On the other hand, suppose there are other dangerous things, where the information exists in some form online, but not packaged together in an easy to find and use way, and your model is happy to provide that. You may want to block your model from doing that (and brag about it, to make sure everyone knows you’re a good citizen who doesn’t need to be regulated by the government), but you probably wouldn’t actually include that example in your demo.
I think it's about perception of provenance. The information came from some set of public training data. Its output however ends up looking like it was authored by the LLM owner. So now you need to mitigate the risk you're held responsible for that output. Basic cake possession and consumption problem.
It doesn't matter how many people regularly die in automobile accidents each year—a single wrongful death caused by a self-driving car is disastrous for the company that makes it.
This does not make the state of things any less ridiculous, however.
The one caused by Uber required three different safety systems to fail (the AI system, the safety driver, and the base car's radar), and it looked bad for them because the radar had been explicitly disabled and the driver wasn't paying attention or being tracked.
I think the real issue was that Uber's self driving was not a good business for them and was just to impress investors, so they wanted to get rid of it anyway.
(Also, the real problem is that American roads are designed for speed, which means they're designed to kill people.)
I asked to design a pressure chamber for my home made diamond machine. It gave some details, but mainly complained about safety and that I need to study before going this way. Well thank you. I know the concerns, but it kept repeating it over and over. Annoying.
Interestingly I was able to successfully receive detailed information about intrinsic details of nuclear weapons design. Previous models absolutely refused to provide this very public information, but o1-preview did.
I feel very alone in my view on caution and regulations here on HN. I am European and very happy we don't have the lax gun laws of the US. I also wished there had been more regulations on social media algorithms, as I feel that they have wreaked havoc on the society.
It's 100% from lawyers and regulators so they can say "we are trying to do the right thing!" when something bad happens from using their product or service. Follow the money.
How is reading a Wikipedia page or a chemistry textbook any harder than getting step by step instructions? Makes you wonder why people use LLMs at all when the info is just sitting there.
If you ask "for JSON" it'll make up a different schema for each new answer, and they get a lot less smart when you make them follow a schema, so it's not quite that easy.
Chain of prompts can be used to deal with that in many cases.
Also, the intelligence of these models will likely continue to increase for some time based on expert testimonials to congress, which align with evidence so far.
It doesn't solve the second problem. Though I can't say how much of an issue it is, and CoT would help.
JSON also isn't an ideal format for a transformer model because it's recursive and they aren't, so they have to waste attention on balancing end brackets. YAML or other implicit formats are better for this IIRC. Also don't know how much this matters.
tl;dr You can easily ask an LLM to return JSON results, and now working code, on your exact query and plug those to another system for automation.
—-
LLMs are usually accessible through easy-to-use API which can be used in an automated system without human in the loop. Larger scale and parallel actions with this method become far more plausible than traditional means.
Text-to-action capabilities are powerful and getting increasingly more so as models improve and more people learn to use them to the their full potential.
Take a step back and look at what OpenAI is saying here "an LLM giving detailed instructions on the synthesis of strychnine is unacceptable, here is what was previously generated <goes on to post "unsafe" instructions on synthesizing strychnine so anyone Googling it can stumble across their instructions> vs our preferred, neutered content <heavily rlhf'd o1 output here>"
What's this obsession with "safety" when it comes to LLMs? "This knowledge is perfectly fine to disseminate via traditional means, but God forbid an LLM share it!"