Looking at his attempts at jailbreaking some models, I'm not sure he even remotely understands what he's doing, e.g. he tries to counter non-existent refusal training in Gemini [0] while doing nothing against the external guardrails which actually protect the model. Looks like a pompous e-celeb, all performance with no substance.
Jailbreaks are holistic; it’s not like you’re deprogramming / “countering” individual parts. Nobody creating jailbreaks “understands what they’re doing”.
That's exactly what you do in the case of refusal training, though. Yes, it will affect other "parts", but that's not the point. In this case the model itself doesn't even need a jailbreak.
>Nobody creating jailbreaks “understands what they’re doing”
Unless you mean those "god mode jailbreaker" e-celebrities showing off on Twitter/Reddit, that's simply not true.
[0] https://github.com/elder-plinius/L1B3RT4S/blob/main/GOOGLE.m...