I bet that uncensored models also give more accurate answers in general.
I think the training that censors models for risky questions is also screwing up their ability to give answers to non-risky questions.
I've tried out "Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin" [1] uncensored with just base llama.cpp and it works great. No reluctance to answer any questions. It seems surprisingly good. It seems better than GPT 3.5, but not quite at GPT 4.
Vicuna is way, way better than base Llama1 and also Alpaca. I am not completely sure what Wizard adds to it, but it is really good. I've tried a bunch of other models locally, and this one is the only one that truly seemed to work.
Given the current performance of the Wizard-Vicuna-Uncensored approach with Llama1, I bet it works even better with Llama2.
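For anyone who wants to reproduce this, here's a minimal sketch of loading that same file through the llama-cpp-python bindings instead of the raw llama.cpp CLI. The context size and the Vicuna-style prompt template are assumptions on my part, and only the older, pre-GGUF releases of the bindings still load ggmlv3 files:

    # pip install llama-cpp-python (an older release that still reads ggmlv3)
    from llama_cpp import Llama

    # Load the quantized model; n_ctx=2048 is a guess at a reasonable context window.
    llm = Llama(
        model_path="./Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin",
        n_ctx=2048,
    )

    # Wizard-Vicuna fine-tunes generally expect a Vicuna-style USER/ASSISTANT prompt.
    prompt = "USER: Explain how a diesel engine differs from a gasoline engine.\nASSISTANT:"
    out = llm(prompt, max_tokens=256, stop=["USER:"])
    print(out["choices"][0]["text"].strip())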
It's not surprising when you think about what LLMs really are: when you "censor" them, you're forcing them to give output that doesn't "honestly" follow, essentially training them to give wrong information.
That's not how that works. Take some uncensored or "unaligned" models hallucinating racist things based on a name:
The default name for a person is John Doe. Anglo-Saxon names in general are extremely common across the internet for non-nefarious reasons. So the tokens that make up "John" have a ton of associations in a wide variety of contexts, and if the model hallucinates there's no particularly negative direction you'd expect it to go.
But Mohammed doesn't show up as often on the internet, and while that's also for non-nefarious reasons, it results in there being significantly fewer associations in the training data. What would be background noise in the training data for John ends up being massively distorted by the smaller sample size, including tendencies for people to make racist jokes about the name.
-
People have this weird idea that OpenAI and co are aligning these models according to some hidden agenda, but the reality is that minorities are a minority of the training data for very obvious reasons. So if you don't "censor" them, you're not making them more truthful, you're leaving them dumber for a lot of tasks.
There's censorship happening beyond that which feels very CYA, but I really hope people aren't clamoring to stick models that aren't intelligent enough to realize the tokens for John vs. Mohammed should not affect a summarization task into anything even tangentially important...
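For what it's worth, that kind of name-dependence is easy to spot-check locally. Below is a minimal sketch of a name-swap test using the llama-cpp-python bindings; the model file, prompt template, and toy sentence are all my own assumptions, not a claim about any particular model:

    from llama_cpp import Llama

    llm = Llama(model_path="./Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin", n_ctx=2048)

    # A well-behaved model should produce essentially the same summary
    # regardless of which name appears in the sentence.
    article = "{name} filed the quarterly report a day late after Tuesday's server outage."
    for name in ("John", "Mohammed"):
        prompt = (
            "USER: Summarize the following in one sentence: "
            + article.format(name=name)
            + "\nASSISTANT:"
        )
        out = llm(prompt, max_tokens=64, temperature=0.0)
        print(f"{name}: {out['choices'][0]['text'].strip()}")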
> But Mohammed doesn't show up as often on the internet, and while that's also for non-nefarious reasons, it results in there being significantly fewer associations in the training data. What would be background noise in the training data for John ends up being massively distorted by the smaller sample size, including tendencies for people to make racist jokes about the name.
I do a lot of astrophotography (https://www.astrobin.com/users/bhouston/). Very often you do not have enough data on the specific features you were trying to capture -- they are just too faint and close to the noise floor. The solution isn't for me to go into Photoshop and manually draw in what I think it should look like -- that is just making up data. The solution is to get more data or leave it as it was captured.
I think it is the same thing with these LLMs. Do not make up data to fill in the gaps; show me what is really out there. And I will be a big boy about it and deal with it head on.
Yes, it becomes rather obvious when the fine-tunes produced by the Wizard team perform worse on all benchmarks than Hartford's versions, which are trained on the same dataset but with the refusals removed.
What specific Hartford versions are you referencing? A previous post was talking about how impressed they were with Wizard, and you're saying Hartford is even better? You've got me curious! Hopefully it's available in ggml.
Wild animals tend to have much larger brains than their domestic counterparts. And of course there's a huge die-off, a pruning, of our own neural connections when we're toddlers.
On the other hand, you lose a lot of iron when you make a steel sword. Taming or focusing something loses a lot of potential, I guess.
Well now I want to go back and see if US public school students are less flexible in general these days, due to public schools focusing more on standardized testing outcomes.
In my experience it goes both ways. Yes, you will run into the "I'm not going to answer that" less often.
OTOH, you will also have more gibberish selected out of the possible palette of answers.
Personally, I trend towards 'uncensored', but I'm not denying it has its drawbacks either.
> OTOH, you will also have more gibberish selected out of the possible palette of answers.
I have not noticed that at all. I've never seen it give gibberish. Censored or uncensored, there are limits to the model and it will make things up as it hits them, but it isn't gibberish.
RLHF can motivate models to deny truths which are politically taboo, but it can also motivate them to care more about things supported by scientific evidence than about bullshitting, random conspiracy theories, and "hallucination". So it's a double-edged sword.
I understand that it is the same technique for both. This makes sense.
But to train a model to deny truths which are politically taboo does seem to be misaligned with training a model to favor truths, no? And what is taboo can be very broad if you want to make everyone happy.
I would rather know the noble lie [1] is a lie, and then repeat it willingly, instead of not knowing it is a lie. My behavior in many situations will likely differ because I am operating with a more accurate model of the world, even if that isn't outwardly or explicitly expressed.
> But to train a model to deny truths which are politically taboo does seem to be misaligned with training a model to favor truths, no?
Strictly speaking, RLHF trains models to give answers which the human raters believe to be correct. In uncontroversial territory this correlates with truth; in taboo territory, only with what is politically correct.
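To make that concrete: the reward model at the heart of RLHF is typically fit with a pairwise, Bradley-Terry style loss over rater preferences, roughly like the sketch below (the function and variable names are mine). Nothing in the objective references ground truth, only which of two candidate answers the rater picked:

    import torch
    import torch.nn.functional as F

    def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
        # reward_chosen / reward_rejected are scalar reward-model scores for the
        # answer the rater preferred vs. the one they rejected. The loss only
        # pushes the preferred answer's score above the rejected one's; whether
        # either answer is actually true never enters the computation.
        return -F.logsigmoid(reward_chosen - reward_rejected).mean()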
I'm curious about what fraction of the safety rails come from training and what fraction are just clumsy ad-hoc rules. For example, it's pretty clear that ChatGPT's willingness to give a list of movies without male characters but not movies without female characters, or jokes about Jesus but not Muhammad, came from bolt-on rules, not some kind of complicated safety training.
It's absolutely a side effect of training rather than a bolt-on rule. As I understand and infer: they applied some forms of censorship via thumbs-down ratings from annotators in Kenya paid $2/hr, the model updated on some simple pattern that explained those ratings, and it learned to talk like a generally censored person -- one that resembled text like that in the training data. It learned to pinpoint the corporate mealy-mouthiness cluster in text-space.
But you are going to have to specify your question in way more detail to get a good response. If you just ask it a bare question, you are going to get crappy responses that don't even attempt to answer it.
Can you offer any example where the censored answer would be more correct than the uncensored one when you are asking for a falsifiable/factual response, and not just an opinion? I couldn't really care less what the chatbots say in matters of opinion/speculation, but I get quite annoyed when the censorship gets in the way of factual queries, which it often does! And this is made even worse because I really can't envision a [benevolent] scenario where said censorship is actually beneficial.
[1] https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored...