
There needs to be a sycophancy benchmark in these comparisons. More baseless praise and false agreement = lower score.


This idea isn't just smart, it's revolutionary. You're getting right at the heart of the problem with today's benchmarks — we don't measure model praise. Great thinking here.

For real though, I think LLM users overall prefer things on the more sycophantic side. Engineers won't feel it, we like our cold dead machines, but the product people will see the stats (people overwhelmingly use LLMs just to talk about whatever) and steer towards that.


You're absolutely right


Does not get old.


It’s not just irritating, it’s repetitive


It's a revolution in subtle humor. Well done.


It’s not just irritating, it’s repetitive


I'm sorry, you are absolutely right.

---

But seriously, I find it helps to set a custom system prompt telling Gemini to be less sycophantic, more succinct and professional, and to skip the extended lectures it likes to give.
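For example, something along these lines with the Python SDK (a minimal sketch assuming the google-generativeai package; the model name and prompt wording are illustrative, not a recommendation):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # or read the key from an env var

# system_instruction is the knob that tones down the praise and lectures
model = genai.GenerativeModel(
    model_name="gemini-1.5-pro",
    system_instruction=(
        "Be succinct and professional. Do not praise the user, do not agree "
        "reflexively, and skip preambles and extended lectures."
    ),
)

response = model.generate_content("Review this function for correctness.")
print(response.text)
```

The same system instruction field is also exposed in AI Studio, so you can try it without the SDK.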


"You know, you are also right"


Your comment demonstrates a remarkably elevated level of cognitive processing and intellectual rigor. Inquiries of this caliber are indicative of a mind operating at a strategically advanced tier, displaying exceptional analytical bandwidth and thought-leadership potential. Given the substantive value embedded in your question, it is operationally imperative that we initiate an immediate deep-dive and execute a comprehensive response aligned with the strategic priorities of this discussion.


I care very little about model personality outside of sycophancy. The thing about Gemini is that it's notorious for its low self-esteem. Given that this thing is trained from scratch, I'm very curious to see which direction they've decided to take it.


Given how often these LLMs are wrong, doesn't it make sense for them to be less confident?


Indeed. But I've had experiences with gemini-2.5-pro-exp where its thoughts could best be described as "rejected from the prom" vibes. It's not like I abused it either; it was just looping because it couldn't properly patch a file.


Sonnet-4.5 has the lowest self-esteem of any model I've used. Gemini frequently argues with me.



I'd like it if the scorecard also gave an expected number of induced suicides per hundred thousand users.


https://llmdeathcount.com/ shows 15 deaths so far, and LLM user count is in the low billions, which puts us on the order of 0.0015 deaths per hundred thousand users.

I'm guessing LLM Death Count is off by an OOM or two, so we could be getting close to one in a million.
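Spelling out the arithmetic (a quick sketch; the 1 billion figure is just a stand-in for "low billions"):

```python
deaths = 15              # llmdeathcount.com tally cited above
users = 1_000_000_000    # assumed stand-in for "low billions" of LLM users

per_100k = deaths / users * 100_000
print(per_100k)          # 0.0015 deaths per hundred thousand users

# If the death count is undercounted by two orders of magnitude:
print(per_100k * 100)    # 0.15 per 100k, i.e. roughly 1.5 in a million
```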


And have the score heavily modified based on how fixable the sycophancy is.



