I personally enjoy the “You’re absolutely right!” exclamation. It signals alignment with my feedback in a consistent manner.


You’re overlooking the fact that it still says that when you are, in reality, absolutely wrong.


That’s not the purpose of it, as I understand it; it’s a token phrase the model generates to cajole itself down a particular path.[1] An alignment mechanism.

The complement appears to be “Actually, that’s not right,” a correction mechanism.

1: https://news.ycombinator.com/item?id=45137802


It gets annoying because A) it so quickly dismisses its own logic and conclusion from less than two minutes ago (extreme confidence with minimal conviction), and B) it fucks up the second time too (sometimes in the same way!) about 33% of the time.


Gemini 2.5 Pro seems to have a tic where, after an initial failed task, it starts asserting escalating levels of confidence for each subsequent attempt. Like it's ever conscious of its failure lingering in its context and feels the need to overcompensate, as a form of reassuring both the user and itself that it's not going to immediately faceplant again.


ChatGPT does the same thing, to the point that after several rounds of pointing out errors or hallucinations it will say things like “Ok, you’re right. No more foolish mistakes. This is it, for all the marbles. Here is an assured, triple-checked, 100% error-free, working script, with no chance of failure.”

Which fails in pretty much the exact same way it did before.

Once ChatGPT hits that supremely confident “Ok nothing was working because I was being an idiot but now I’m not” type of dialogue, I know it’s time to just start a new chat. There’s no pulling it out of “spinning the tires while gaslighting” mode.

I’ve even had it go as far as outputting a zip file with an empty .txt that supposedly contained the solution to a certain problem it was having issues with.


I’ve had the opposite experience with GPT-5, where it’s so utterly convinced that its own (incorrect) solution is the way to go that it turns me down and preemptively launches tools to implement what it has in mind.

I get that there are tradeoffs, but erring on the side of the human being correct is probably going to be a safer bet for another generation or two.


Hmmh. I believe your explanation, but I don't think that's the full story. It's also a sycophancy mechanism, there to maximize engagement from real users and to reward-hack AI labelers.


That doesn’t seem plausible to me. Not that LLMs can’t be sycophantic, but I don’t think this phrase in particular is part of it.

It’s a canned phrase in a place where an LLM could be far more creative, to much greater effect.


I think there’s something to it.

Part of me thinks that when they do their “which of these responses do you prefer” A/B test on users, many on HN would try to judge the level of technical detail, complexity, and usefulness… whereas I’m inclined to believe the midwit population at large would choose the option where the magic AI supercomputer reaffirms and praises the wisdom of whatever they say, no matter how stupid or wrong it is.


I don't disagree exactly; it's just that it smells weird.

LLMs are incredibly good at social engineering when we let them, whereas I could write the code to emit "you're right" or "that's not quite right" without involving any statistical prediction.

I.e., as a method of persuasion, canned responses are incredibly inefficient (as evidenced by the annoyance with them), whereas we know the LLM is capable of being far more insidious and subtle in its praise of you. For example, it could be instructed to launch weak counterarguments, "spot" the weaknesses, and then conclude that your position is the correct one.

But let's say there's a monitoring mechanism that concludes adjustments are needed. In order to "force" the LLM to drop the previous context, it "seeds" the response with "You're right" or "That's not quite right", as if it were the LLM's own conclusion. Then, when the LLM starts predicting what comes next, it must conclude things that follow from "you're right" or "that's not quite right".

So while they are very inefficient as persuasion and communication, they might be very efficient at breaking with the otherwise overwhelming context that would interfere with the change you're trying to effect.

That's the reason why I like the canned phrases. It's not that I particularly enjoy the communication in itself; it's that they are clear enough signals of what's going on. They give a tiny level of observability into the black box, in the form of indicating a path change.
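
To make the seeding idea concrete, here's a minimal sketch of what it could look like from the outside, using assistant-message prefill (a real feature of Anthropic's Messages API). The model name is only illustrative, and whether any lab injects phrases like this internally is pure speculation on my part:

    # Sketch: prefill the assistant's reply with a canned phrase so the
    # continuation has to be consistent with having already conceded it.
    import anthropic

    client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

    seed = "You're absolutely right."

    resp = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model name
        max_tokens=512,
        messages=[
            {"role": "user", "content": "My fix didn't work; the test still fails."},
            # A trailing assistant message is treated as the start of the
            # model's own reply, so its prediction continues from the seed.
            {"role": "assistant", "content": seed},
        ],
    )

    print(seed + resp.content[0].text)

From the outside you only ever see the canned phrase at the top of the reply, which is exactly the kind of path-change signal I mean.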


But then there’s also the negative psychological impact on the user of having the model so strongly agree with them all the time. I cannot be the only one who half expects humans to say this to me all the time now?


And that it often spits out the exact same wrong answer in response.



