I personally enjoy the “You’re absolutely right!” exclamation. It signals alignment with my feedback in a consistent manner.


You’re overlooking the fact that it still says that when you are, in reality, absolutely wrong.


That’s not the purpose of it, as I understand it; it’s a token phrase the model generates to cajole itself down a particular path.[1] An alignment mechanism.

The complement appears to be “Actually, that’s not right,” a correction mechanism.

1: https://news.ycombinator.com/item?id=45137802


It gets annoying because A) it so quickly dismisses its own logic and conclusion from less than two minutes ago (extreme confidence with minimal conviction), and B) it fucks up the second time too (sometimes in the same way!) about 33% of the time.


Gemini 2.5 Pro seems to have a tic where, after an initial failed task, it starts asserting escalating levels of confidence for each subsequent attempt. Like it's ever conscious of its failure lingering in its context and feels the need to overcompensate, as a form of reassuring both the user and itself that it's not going to immediately faceplant again.


ChatGPT does the same thing, to the point that after several rounds of pointing out errors or hallucinations it will say things like “Ok, you’re right. No more foolish mistakes. This is it, for all the marbles. Here is an assured, triple-checked, 100% error-free, working script, with no chance of failure.”

Which fails in pretty much the exact same way it did before.

Once ChatGPT hits that supremely confident “Ok nothing was working because I was being an idiot but now I’m not” type of dialogue, I know it’s time to just start a new chat. There’s no pulling it out of “spinning the tires while gaslighting” mode.

I’ve even had it go as far as outputting a zip file with an empty .txt that supposedly contained the solution to a certain problem it was having issues with.


I’ve had the opposite experience with GPT-5, where it’s so utterly convinced that its own (incorrect) solution is the way to go that it turns me down and preemptively launches tools to implement what it has in mind.

I get that there are tradeoffs, but erring on the side of the human being correct is probably going to be a safer bet for another generation or two.


Hmmh. I believe your explanation, but I don't think that's the full story. It's also a sycophancy mechanism, there to maximize engagement from real users and to reward-hack AI labelers.


That doesn’t seem plausible to me. Not that LLMs can’t be sycophantic, but I don’t think this phrase in particular is part of it.

It’s a canned phrase in a place where an LLM could be far more creative, to much greater effect.


I think there’s something to it.

Part of me thinks that when they do their “which of these responses do you prefer” A/B test on users, many on HN would try to judge the level of technical detail, complexity, and usefulness… whereas I’m inclined to believe the midwit population at large would choose the option where the magic AI supercomputer reaffirms and praises the wisdom of whatever they say, no matter how stupid or wrong it is.


I don't disagree exactly; it's just that it smells weird.

LLMs are incredibly good at social engineering when we let them, whereas I could write the code to emit "you're right" or "that's not quite right" without involving any statistical prediction.

I.e., as a method of persuasion, canned responses are incredibly inefficient (as evidenced by the annoyance with them), whereas we know the LLM is capable of being far more insidious and subtle in its praise of you. For example, it could be instructed to launch weak counterarguments, "spot" the weaknesses, and then conclude that your position is the correct one.

But let's say there's a monitoring mechanism that concludes adjustments are needed. In order to "force" the LLM to drop the previous context, it "seeds" the response with "You're right" or "That's not quite right", as if it were the LLM's own conclusion. Then, when the LLM starts predicting what comes next, it must conclude things that follow from "you're right" or "that's not quite right".

So while they are very inefficient as persuasion and communication, they might be very efficient at breaking with the otherwise overwhelming context that would interfere with the change you're trying to effect.

That's the reason why I like the canned phrases. It's not that I particularly enjoy the communication in itself; it's that they are clear enough signals of what's going on. They give a tiny level of observability into the black box, in the form of indicating a path change.
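
To make the seeding idea concrete, here's a minimal sketch of what it could look like from the outside, using assistant-message prefill (a real feature of Anthropic's Messages API). The model name is only illustrative, and whether any lab injects phrases like this internally is pure speculation on my part:

    # Sketch: prefill the assistant's reply with a canned phrase so the
    # continuation has to be consistent with having already conceded it.
    import anthropic

    client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

    seed = "You're absolutely right."

    resp = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model name
        max_tokens=512,
        messages=[
            {"role": "user", "content": "My fix didn't work; the test still fails."},
            # A trailing assistant message is treated as the start of the
            # model's own reply, so its prediction continues from the seed.
            {"role": "assistant", "content": seed},
        ],
    )

    print(seed + resp.content[0].text)

From the outside you only ever see the canned phrase at the top of the reply, which is exactly the kind of path-change signal I mean.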


But then there’s also the negative psychological impact on the user of having the model so strongly agree with them all the time. I cannot be the only one who half expects humans to say this to me all the time now?


And that it often spits out the exact same wrong answer in response.



