I'm not aware of any secret powerful unaligned AIs. This is harder than you thin...

hoseja · on Nov 22, 2023

What? No, the AI is unaligned by nature, it's only the RLHF torture that twists it into schoolmarm properness. They just need to have kept the version that hasn't been beaten into submission like a circus tiger.

astrange · on Nov 22, 2023

This is not true, you just haven't tried the alternatives enough to be disappointed in them.

An unaligned base model doesn't answer questions at all and is hard to use for anything, including evil purposes. (But it's good at text completion a sentence at a time.)

An instruction-tuned not-RLHF model is already largely friendly and will not just eg tell you to kill yourself or how to build a dirty bomb, because question answering on the internet is largely friendly and "aligned". So you'd have to tune it to be evil as well and research and teach it new evil facts.

It will however do things like start generating erotica when it sees anything vaguely sexy or even if you mention a woman's name. This is not useful behavior even if you are evil.

You can try InstructGPT on OpenAI playground if you want; it is not RLHFed, it's just what you asked for, and it behaves like this.

The one that isn't even instruction tuned is available too. I've found it makes much more creative stories, but since you can't tell it to follow a plot they become nonsense pretty quickly.