The bump from 4.6 to 4.7 is not very noticeable to me in improved capabilities so far, but the faster consumption of limits is very noticeable.
I hit my 5-hour limit within 2 hours yesterday. Initially I tried the batched mode for a refactor, but cancelled after seeing it take 30% of the limit within 5 minutes. A serial approach consumed less (took ~50 minutes at xhigh effort, ~60% of the remaining allocation IIRC), but still clearly burned through the limit much faster than 4.6 did.
It feels like every exchange takes ~5% of the 5 hour limit now, when it used to be maybe ~1-2%. For reference I'm on the Max 5x plan.
For now I can tolerate it since I still have plenty of headroom in my limits (used ~5% of my weekly, I don't use claude heavily every day so this is OK), but I hope they either offer more clarity on this or improve the situation. The effort setting is still a bit too opaque to really help.
The most frustrating part is the quality loss caused by the forced adaptive thinking. It eats 5-10% of my Max 5x usage and churns for ten minutes, only to come back with totally untrustworthy results. It lazily hand-waves issues away in order to avoid reading my actual code and doing real reasoning work on it. Opus simply cannot be trusted if adaptive thinking is enabled.
You don't have to use adaptive thinking. It had been turned off on my main work computer. I was using a different computer on a trip and started getting angry at Claude for doing a bad job. I eventually figured out it was adaptive thinking and set it to "hard", and it started working again. At the time I think "hard" was the top choice. With 4.7, my computer now shows "xhard", which I assume is the equivalent setting. There is one higher setting than this, which I haven't tried yet. I would tell you how to change these settings, but I don't remember. By the way, I have been happy with 4.7 so far. I actually did not like 4.6 and preferred 4.5, and used that most of the time until this new release.
"With Opus 4.6, extended thinking was a toggle you managed: turn it on for hard stuff, off for quick stuff. If you left it on, every question paid the thinking tax whether it needed to or not. Now, with Opus 4.7, extended thinking becomes adaptive thinking. "
You want extended thinking? It's now adaptive thinking, and Opus will turn it on if it thinks it needs to. But according to user reports it probably won't, since tokens are expensive. Except Opus 4.7 now uses 35% more tokens and outputs more thinking tokens anyway.
From what I understand you shouldn't wait more than 5min between prompts without compacting or clearing or you'll pay for reinitializing the cache. With compaction you still pay but it's less input tokens.
(Is compaction itself free?)
Only if you set `ENABLE_PROMPT_CACHING_1H`, which was mentioned in the release notes for a recent Claude Code release but doesn't seem to be in the official docs.
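Since it isn't in the official docs, here's how I'd guess it gets enabled; treat the env-var form as an assumption based on the release-note mention:

```shell
# Assumed usage (not officially documented): enable the 1-hour
# prompt-cache TTL for Claude Code by exporting the flag before launch.
export ENABLE_PROMPT_CACHING_1H=1
```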
That'd be awesome but it doesn't reflect what I see. Do you have a source for that?
What I see is that if I take a quick break, the session loses ~5% right at the start of the next prompt's processing. (I'm currently on Max 5x.)
Not at my workstation right now, but simply ask Claude to analyze the JSONL transcript of any session. There are two cache keys there: one is 5m, the other 1h. Only the 1h one gets set. There are also entries that will tell you whether a request was a cache hit or miss, or whether a cache rewrite happened. I've had one Claude test another, and on the Max 5x subscription a cache miss only happened if a message was sent after 1h, or if the session was resumed using /resume or --resume (a bug that has existed since January: all session resumes cause a full cache rewrite).
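A stdlib-only sketch of the kind of transcript check described above. The field names (`cache_read_input_tokens`, `cache_creation_input_tokens`) are assumptions about the JSONL schema, so adjust them to whatever your transcripts actually contain:

```python
import json

# Hypothetical JSONL transcript lines; real Claude Code transcripts may use
# different field names, so treat this schema as an assumption.
SAMPLE = """\
{"usage": {"input_tokens": 12, "cache_read_input_tokens": 9000, "cache_creation_input_tokens": 0}}
{"usage": {"input_tokens": 8000, "cache_read_input_tokens": 0, "cache_creation_input_tokens": 8000}}
"""

def classify(line: str) -> str:
    """Label a transcript entry as a cache hit, a cache (re)write, or uncached."""
    usage = json.loads(line).get("usage", {})
    if usage.get("cache_read_input_tokens", 0) > 0:
        return "hit"
    if usage.get("cache_creation_input_tokens", 0) > 0:
        return "write"
    return "uncached"

labels = [classify(l) for l in SAMPLE.splitlines()]
print(labels)  # → ['hit', 'write']
```

A run of mostly "write" labels mid-session would be the smoking gun for the resume bug described above.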
However, cache being hit doesn't necessarily mean Anthropic won't just subtract usage from you as if it wasn't hit. It's Anthropic we're talking about. They can do whatever they want with your usage and then blame you for it.
I have heard that if you have telemetry disabled the cache is 5 minutes, otherwise 1h. No clue how true that is however my experience (with telemetry enabled) has been the 1h cache.
It's true as far as I can tell, just by my own checking using `/status`. You can also tell by when the "clear" reminder hint shows up. Also if you look at the leaked claude code you can see that almost everything in the main thread is cached with 1H TTL (I believe subagents use 5 minute TTL)
Isn't that how the KV cache currently works? Of course they could decide to hold on to cache items for longer than an hour, but the storage requirements are pretty significant while the chance of session resumption shrinks rapidly.
The storage requirements for large-model KV caches are actually comparatively tiny: the per-token size grows far less than model parameters. Of course, we're talking "tiny" for stashing them on bulk storage and slowly fetching them back to RAM. But that should still be viable for very long context, since the time for running prefill is quadratic.
We only have open models to go by, so looking at GLM 5.1 for instance, we're talking about almost 300 GB of kv-cache for a full context window of 200k tokens.
The point of prompt caching is to save on prefill which for large contexts (common for agentic workloads) is quite expensive per token. So there is a context length where storing that KV-cache is worth it, because loading it back in is more efficient than recomputing it. For larger SOTA models, the KV cache unit size is also much smaller compared to the compute cost of prefill, so caching becomes worthwhile even for smaller context.
If I have a conversation with claude then come back 30 minutes later to resume the conversation, the KV values for that prefill prefix are going to be exactly the same. That's the whole point of this caching in the first place.
If you're willing to incur a latency penalty on a "cold resume" (which is fine for most use cases), why couldn't they just move it to disk? The size of the KV cache should scale on the order of something like (context_length * n_layers * residual_length). I think for a standard V3-style MoE model at 1M token length, this should be on the order of 100 GB at FP16? And you can surely play tricks with KV compression (e.g. the recent TurboQuant paper). It doesn't seem like an outrageous amount of data to put onto cheap scratch HDD (and it doesn't grow indefinitely, since really old conversations can be discarded).
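Rough arithmetic behind that estimate; all the model dimensions below are illustrative placeholders, not any real model's config:

```python
def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_val=2):
    """Size of a standard GQA KV cache: keys + values (2x) at fp16 (2 bytes)."""
    return context_len * n_layers * n_kv_heads * head_dim * 2 * bytes_per_val

# Placeholder config loosely in the ballpark of a large MoE model with GQA.
size = kv_cache_bytes(context_len=1_000_000, n_layers=60, n_kv_heads=4, head_dim=128)
print(size / 1e9, "GB")  # → 122.88 GB, i.e. on the order of 100 GB
```

Latent-attention schemes (MLA) compress the per-token cache further, which is how some models fit much larger contexts in the same budget.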
> If I have a conversation with claude then come back 30 minutes later to resume the conversation, the KV values for that prefill prefix are going to be exactly the same.
Correct, when you’re using the API you can choose between 60 minute or 5 minute cache writes for this reason, but I believe the subscription doesn’t offer this. 60 minute cache writes are about 25% more expensive than regular cache writes.
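To make the trade-off concrete, here's a break-even sketch; the multipliers are assumptions modeled on typical published per-token pricing (cache writes at a premium over base input, cache reads at a small fraction of it):

```python
def break_even_reads(write_mult: float, read_mult: float) -> float:
    """Number of cached reads needed before caching beats recomputing.

    write_mult: cache-write cost as a multiple of the base input price.
    read_mult:  cache-read cost as a multiple of the base input price.
    """
    # Premium paid up front, divided by the savings per subsequent read.
    return (write_mult - 1.0) / (1.0 - read_mult)

# Assumed multipliers: writes at 1.25x base, reads at 0.1x base.
print(break_even_reads(1.25, 0.1))  # → ~0.278: caching pays off after one read
```

Even at a higher write premium for long-TTL caches, a single reuse of a large prefix usually covers the cost, which is why agentic workloads lean on it so heavily.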
I don’t have insights into internals at Anthropic so I don’t know where the pain point is for increasing cache sizes.
Ah I can see how my phrasing might be misleading, but these prompts were made within 5 minutes of each other, the timing I mentioned were what Claude spent working.
is it 5 mins between constant prompting/work, or 5 mins as in if i step away from the computer for 5 mins and come back and prompt again i'm not subject to reinit?
if it's the latter that's crazy. i dont even know what to do there, compactions already feel like a memory wipe
My dad in India gets prescribed antibiotics whenever he's sick. Despite my constant explanations, he insists that this is how it should be, because when you're sick your immunity is lowered.
On the other hand, the last time I got prescribed antibiotics was probably almost 10 years ago when I ended up in the hospital from an abscess.
Granted, my dad is old, but that part of the world still seems to expect doctors to do more for a common cold than just tell you to rest for a week and take an acetaminophen or phenylephrine if/when needed (even when that's really all you need).
> Granted, my dad is old, but that part of the world still seems to expect doctors to do more for a common cold than just tell you to rest for a week and take an acetaminophen or phenylephrine if/when needed (even when that's really all you need).
FYI phenylephrine is effectively a placebo and the FDA has proposed ending its use in OTC drugs. (There've been HN threads on the subject, with many comments.)
Phenylephrine is a placebo for nasal congestion, but it’s a solid drug for raising blood pressure. Used all the time in anesthesia (obviously not an OTC use).
CadQuery and build123d have been very handy for prototyping stuff for 3d printing. AI still isn't quite good enough to generate correct scripts, but AI autocomplete at least helps with putting together small snippets.
My last project involved making a cosplay helmet. I modeled the shell in Blender as a low-poly design, exported it to an OBJ, then put together some Python to load the OBJ, give the triangles some configurable thickness, etc. Then I used it to explore how to print the helmet in such a way that the outer surface would be too clean to tell it's FDM printed, without needing to do any sanding.
Initially I explored having cadquery put a number on the back of each triangle and I'd assemble it like a puzzle, but that didn't work out. Eventually I figured out how to cut it up into parts that would also eliminate the need for painting and outer surfaces would be clean, and because it was in code, changing which part a triangle belonged to was a matter of moving the corresponding index into another list.
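A minimal stdlib sketch of the workflow described above: parse an OBJ's triangles, then assign them to named parts by moving indices between lists. (The thicken/solidify step is what the CAD library would handle; the OBJ content and part names here are made up.)

```python
def parse_obj(text):
    """Parse vertices and triangular faces from minimal OBJ text."""
    verts, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            verts.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "f":
            # OBJ face indices are 1-based and may carry /vt/vn suffixes.
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:4]))
    return verts, faces

OBJ = """\
v 0 0 0
v 1 0 0
v 0 1 0
v 1 1 0
f 1 2 3
f 2 4 3
"""
verts, faces = parse_obj(OBJ)

# Group triangles into printable parts by index; reassigning a triangle
# is just moving its index from one list to another.
parts = {"front": [0], "back": [1]}
parts["front"].append(parts["back"].pop(0))  # move triangle 1 into "front"
print(len(verts), len(faces), parts)  # → 4 2 {'front': [0, 1], 'back': []}
```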
I probably could've managed it all in Blender too, but being much more comfortable with code, I found it easier to play with normals and manually turn each piece into a solid.
I also go for it for functional designs because, again, tweaking code is more comfortable to me than dealing with constraints and sketches and multiple planes in, say, FreeCAD.
yeah -- have been playing with this as well, ai's spatial reasoning is not quite there yet but with precise construction instructions it can often do the job
for shapes that are hard to print with a traditional slicer, LLMs are also surprisingly good at generating gcode with fullcontrolxyz if you're specific
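For the curious, raw G-code really is just text. A toy sketch that emits a single-layer polygon perimeter (the feed rate and extrusion factor are made-up placeholders, not tuned values, and this isn't the fullcontrolxyz API):

```python
import math

def polygon_gcode(radius, sides, z=0.2, feed=1200, e_per_mm=0.05):
    """Emit G1 moves tracing a regular polygon at height z (toy example)."""
    lines = [f"G1 Z{z:.2f} F{feed}"]
    e = 0.0
    prev = (radius, 0.0)
    for i in range(1, sides + 1):
        a = 2 * math.pi * i / sides
        x, y = radius * math.cos(a), radius * math.sin(a)
        e += math.dist(prev, (x, y)) * e_per_mm  # extrude proportionally to travel
        lines.append(f"G1 X{x:.3f} Y{y:.3f} E{e:.4f} F{feed}")
        prev = (x, y)
    return "\n".join(lines)

print(polygon_gcode(radius=10, sides=6))
```

Driving the toolpath directly like this is what makes non-planar or vase-style tricks possible when a slicer's layer model gets in the way.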
Maybe referring to it as welfare is odd, but these points are important. It isn't a good look to have a model that tends to get into self-deprecating loops like one of Google's older models, it's an even worse look and potential legal liability if your model becomes associated with a suicide. An overly negative chat model would also just be unpleasant to use.
With the weights being mostly opaque, these kinds of evaluations are an important piece of reducing the harm an AI model can cause.
I feel that anthropomorphizing the model is also potentially very harmful. We've seen that in the LLM interactions that end in tragedy. It's the wording that bothers me.
Interesting to see the benchmark numbers, though at this point I find these incremental seeming updates hard to interpret into capability increases for me beyond just "it might be somewhat better".
Maybe I've skimmed too quickly and missed it, but does calling it 4.7 instead of 5 imply that it's the same as 4.6, just trained with further refined data/fine tuned to adapt the 4.6 weights to the new tokenizer etc?
This seems needlessly cynical. I don't think they said they never planned to release it.
They seemed to make it clear that they expect other labs to reach that level sooner or later, and they're just holding it back until they've helped patch enough vulnerabilities.
Recognizing that certain mutations very blatantly reduce a person's quality of life and making it possible to revert those mutations does not require treating the people who have not had those mutations reverted as lesser.
Thinking of them as lesser leads to a society that prefers to drag each other down instead of lifting each other up.
I had an uncle with Down syndrome. He was the sweetest and funniest person; we remember him every day, more than 10 years after he passed away. Down syndrome carries a lot of physical health problems, like heart and lung diseases, which can make life very painful. He suffered from lung problems from the age of 18 until he passed away at 49, living in a lot of pain and being a big burden on my mum and my grandma, who took care of him. Still, it's true, he never lost his smile and loved his sister and mother back as much as possible, giving all of us who lived with him a lot of joy.
I am very conflicted about these kinds of issues, but I think I am of the opinion that it's better to prevent this suffering; once people are already here, though, we should make their lives as easy as possible.
I chose to call it quality of life because I don't think that simply being happy is enough to have quality of life, but I don't agree that it's about valuing intelligence over happiness.
It's a condition they, and their family, have to live with their entire life. You can't really be permanently sad about a condition you have literally been born with and can't expect to change.
Meanwhile, there are conditions that significantly decrease quality of life even though one's intelligence is unaffected. I think the factor is better described as choice. There are a large number of things a person with Downs just does not have the choice to do differently.
I know that I'm in the small minority of people that read Flowers for Algernon and didn't think the ending was a sad one. His life was interrupted with some brief magic and resolved into what it was always meant to be.
People have gotten emotional with me about my take on that, and that's just fiction. I guess my point is I don't think there is a clear morality play here. This is more like a trolley problem where you have to decide for yourself how much control you're comfortable with.
This is incorrect in that the term was not neutral before WW2, nor was Nazi Germany's eugenics program really unique. Taking these claims one at a time:
>the term was value neutral.
By the late 1930s the academic community had largely moved on from eugenics: the Catholic church denounced it in 1930 with Casti Connubii, the Eugenics Record Office closed in 1935, and Laughlin (the leading eugenicist) retired in 1939. By the 1930s, being a eugenicist was viewed much like being a homeopath is today.
>Until a certain Austrian painter decided to practice eugenics in a uniquely negative way,
Eugenics in the United States saw the rise of the "moron laws" and the mass sterilization of marginalized communities. In fact, Nazi Germany's eugenics policies were largely inspired by US eugenics legislation and actively promoted by US eugenicists (particularly in California). Heck, mass sterilization programs in the US didn't even die with WW2, continuing into the mid-1970s.
I'm troubled by this thread because the vibe I'm getting is that eugenics was only bad because the science wasn't there yet and the Nazis did it, and that this time will be different. No: the aspects which made eugenics dangerous were inherently political, and they are every bit as relevant today as they were a hundred years ago. (Who decides which traits should be "edited" out? What traits should be "edited" in? What policies should be legislated? Who is primarily impacted by these policies? How much agency do the people impacted by these policies have in the situation?)
Powerpoint will continue to persist because other people need to be able to edit your slide deck without understanding your HTML.
My employer blocks office plugins, so I can't try Claude for PowerPoint, but sometimes I get Claude to generate Python scripts, which produce PowerPoint slides via python-pptx. This also benefits from being able to easily read and generate figures from raw data.
I don't really like the way Claude tends to format slides (too much marketing speak and flowcharts), but it has good ideas often enough that it's still worth it to me. So I treat this as a starting point and replace the bad parts.
Most flights are available at high frequencies (on the order of days, weeks) compared to concerts (once a year or so). You also don't care as much about sitting together on a plane.
I disagree: if you can't get seats with your friends at a concert, you might just not go, because the social aspect is part of the experience. But if you can't get neighboring seats on a plane, you'd (or at least I would) just tolerate it, since you'd still get to be together at the main event (the destination).