No… seriously. Every model release is accused. Including Opus, GPT-5.4, whatever. And yes, including smaller models that are not at the top of every benchmark.
I would almost be tempted to call it benchmaxed if that term weren’t such a joke at this point. It is a deeply unserious term these days.
Gemma 4 performs worse in agentic workflows than its benchmarks suggest. The Qwen3.x models are much better, and not benchmaxed. I have tested this extensively in my own workflows. Google really needs to release Gemma 4.1 ASAP. I really hope they’re not planning to wait another calendar year, like they did for Gemma 3 -> 4, with no intermediate updates.