We did both -- we did a number of UI iterations (e.g. improving thinking loading states, making it more clear how many tokens are being downloaded, etc.). But we also reduced the default effort level after evals and dogfooding. The latter was not the right decision, so we rolled it back after finding that UX iterations were insufficient (people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this).
Having a "Recovery Mode"/"Safe Boot" flag to disable our configurations (or progressively re-enable them) to see how Claude Code responds would be nice. Sometimes I get worried some old flag I set is breaking things. Maybe the flag already exists? I tried Claude doctor but it wasn't quite the solution.
For instance:
Is Haiku supposed to hit a warm system-prompt cache in a default Claude Code setup?
I had `DISABLE_TELEMETRY=1` in my env and found the haiku requests would not hit a warm-cached system prompt. E.g. on first request just now w/ most recent version (v2.1.118, but happened on others):
w/ telemetry off - input_tokens:10 cache_read:0 cache_write:28897 out:249
w/ telemetry on - input_tokens:10 cache_read:24344 cache_write:7237 out:243
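To put those two lines side by side, here is a rough sketch that turns them into a cache hit ratio. It assumes the total prompt size is input_tokens + cache_read + cache_write, which is my reading of the counters and not something I've confirmed; `cache_hit_ratio` is just a helper name I made up for this.

```python
def cache_hit_ratio(input_tokens: int, cache_read: int, cache_write: int) -> float:
    """Share of prompt tokens served from cache.

    Assumption: the three counters are disjoint and sum to the full prompt size.
    """
    total = input_tokens + cache_read + cache_write
    return cache_read / total if total else 0.0


# Numbers copied from the two requests above.
telemetry_off = cache_hit_ratio(input_tokens=10, cache_read=0, cache_write=28897)
telemetry_on = cache_hit_ratio(input_tokens=10, cache_read=24344, cache_write=7237)

print(f"telemetry off: {telemetry_off:.1%} of prompt read from cache")
print(f"telemetry on:  {telemetry_on:.1%} of prompt read from cache")
```

Under that assumption, the telemetry-off request reads nothing from cache, while the telemetry-on request reads roughly three quarters of its prompt from cache.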
I used to think having so many users was leading to people hitting a lot of edge cases: 3 million users is 3 million different problems. Everyone can't be on the happy path. But then I started hitting weird edge cases and started thinking the permutations might not be under control.
> people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this
UI is UI. It is naive to expect that you build some UI but users will "just magically" find out that they should use it as a terminal in the first place.
It took you a month to revert after multiple complaints. You still blamed users for using the product exactly as you advertised it. And all of your official channels were completely quiet for two months, whether it was about new draconian peak hour limits, or about the new defaults, or about exponentially increasing token costs.
People literally started seeing issues immediately as you changed the defaults: https://x.com/levelsio/status/2029307862493618290 And despite a huge amount of reports you still kept it for a whole month.
And then you shipped a completely untested feature with prompt cache misses and literally gaslit users and blamed users for using the product as advertised.
Now an untold number of people have been hit by these changes, so as an apology you reset usage limits three hours before they would reset anyway.
Good job.
Edit. By the way, a very telling sentence from the report:
--- start quote ---
We’ll ensure that a larger share of internal staff use the exact public build of Claude Code (as opposed to the version we use to test new features); and we'll make improvements to our Code Review tool that we use internally
--- end quote ---
Translation: no one is using or even testing the product we ship, and we blindly trust Claude Code to review and find bugs for us. Last one isn't even a translation: https://x.com/bcherny/status/2017742750473720121
Off topic, but I'm hoping you'll maybe see this. There's been an issue with the VS Code extension that makes it pretty much impossible to use (PreToolUse can't intercept permission requests anymore, and using PermissionRequest hooks always opens the diff viewer and steals focus):