Notably, although Gemini 3 Pro seems to have much better benchmark scores than oth...

lifthrasiir · 2025-11-18T12:49:54 1763470194

Probably because many models from Anthropic would have been optimized for agentic coding in particular...

EDIT: Don't disagree that Gemini CLI has a lot of rough edges, though.

siva7 · 2025-11-18T14:26:51 1763476011

> I wonder why that is.

That's because coding is currently the only reliable benchmark where reasoning capabilities transfer and predict capabilities in other professions like law. Coding is the only area where they are shy about releasing numbers. All these exam scores can be faked by gaming the benchmarks.

BoredPositron · 2025-11-18T12:49:06 1763470146

Gemini performs better if you use it with Claude Code than with Gemini CLI. It still has some odd problems with tool calling, but a lot of the performance loss comes from the Gemini CLI app itself.

decster · 2025-11-18T12:47:14 1763470034

From my experience, the quality of gemini-cli isn't great; I keep running into a lot of stupid bugs.

spwa4 · 2025-11-18T14:03:31 1763474611

Google is constantly laying people off right now. Everyone who really excels has jumped ship, and the people who remain ... are not top of the class anymore.

Not that Google didn't use to have problems shipping useful things. But it's gotten a lot worse.

Lionga · 2025-11-18T12:50:54 1763470254

Because benchmarks are a useless comparison and have nothing to do with reality. It's just jerk material for AI fanboys.