The bump from 4.6 to 4.7 is not very noticeable to me in improved capabilities so far, but the faster consumption of limits is very noticeable.
I hit my 5-hour limit within 2 hours yesterday. Initially I tried the batched mode for a refactor, but cancelled after seeing it take 30% of the limit within 5 minutes. A serial approach consumed less (took ~50 minutes at xhigh effort, ~60% of the remaining allocation IIRC), but still clearly burned through the limit much faster than 4.6 did.
It feels like every exchange takes ~5% of the 5 hour limit now, when it used to be maybe ~1-2%. For reference I'm on the Max 5x plan.
For now I can tolerate it since I still have plenty of headroom in my limits (used ~5% of my weekly, I don't use claude heavily every day so this is OK), but I hope they either offer more clarity on this or improve the situation. The effort setting is still a bit too opaque to really help.
The most frustrating part is the quality loss caused by the forced adaptive thinking. It eats 5-10% of my Max 5x usage and churns for ten minutes, only to come back with totally untrustworthy results. It lazily hand-waves issues away in order to avoid reading my actual code and doing real reasoning work on it. Opus simply cannot be trusted if adaptive thinking is enabled.
You don't have to use adaptive thinking. It had been turned off on my main work computer. I was using a different computer on a trip and started getting angry at Claude for doing a bad job. I eventually figured out it was adaptive thinking and set it to "hard", and it started working again. At the time I think "hard" was the top choice. With 4.7, my computer now shows "xhard", which I assume is the equivalent setting. There is one higher setting than this, which I haven't tried yet. I would tell you how to change these settings, but I don't remember. By the way, I have been happy with 4.7 so far. I actually did not like 4.6 and preferred 4.5, and used that most of the time until this new release.
"With Opus 4.6, extended thinking was a toggle you managed: turn it on for hard stuff, off for quick stuff. If you left it on, every question paid the thinking tax whether it needed to or not. Now, with Opus 4.7, extended thinking becomes adaptive thinking. "
You want extended thinking? It's now adaptive thinking, and Opus will turn it on if it thinks it needs to. But according to user reports it probably won't, since tokens are expensive. Except Opus 4.7 now uses 35% more tokens and outputs more thinking tokens anyway.
From what I understand you shouldn't wait more than 5min between prompts without compacting or clearing or you'll pay for reinitializing the cache. With compaction you still pay but it's less input tokens.
(Is compaction itself free?)
Only if you set `ENABLE_PROMPT_CACHING_1H`, which was mentioned in the release notes for a recent Claude Code release but doesn't seem to be in the official docs.
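Since it isn't in the official docs, here's how I'd guess it gets enabled; treat the env-var form as an assumption based on the release-note mention:

```shell
# Assumed usage (not officially documented): enable the 1-hour
# prompt-cache TTL for Claude Code by exporting the flag before launch.
export ENABLE_PROMPT_CACHING_1H=1
```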
That'd be awesome but it doesn't reflect what I see. Do you have a source for that?
What I see is that if I take a quick break, the session loses ~5% right at the start of the next prompt's processing. (I'm currently on Max 5x.)
Not at my workstation right now, but simply ask Claude to analyze the JSONL transcript of any session. There are two cache keys there: one is 5m, the other 1h. Only the 1h one gets set. There are also entries that will tell you whether a request was a cache hit or miss, or whether a cache rewrite happened. I've had one Claude test another, and on the Max 5x subscription a cache miss only happened if a message was sent after 1h, or if the session was resumed using /resume or --resume (a bug that has existed since January: all session resumes cause a full cache rewrite).
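A stdlib-only sketch of the kind of transcript check described above. The field names (`cache_read_input_tokens`, `cache_creation_input_tokens`) are assumptions about the JSONL schema, so adjust them to whatever your transcripts actually contain:

```python
import json

# Hypothetical JSONL transcript lines; real Claude Code transcripts may use
# different field names, so treat this schema as an assumption.
SAMPLE = """\
{"usage": {"input_tokens": 12, "cache_read_input_tokens": 9000, "cache_creation_input_tokens": 0}}
{"usage": {"input_tokens": 8000, "cache_read_input_tokens": 0, "cache_creation_input_tokens": 8000}}
"""

def classify(line: str) -> str:
    """Label a transcript entry as a cache hit, a cache (re)write, or uncached."""
    usage = json.loads(line).get("usage", {})
    if usage.get("cache_read_input_tokens", 0) > 0:
        return "hit"
    if usage.get("cache_creation_input_tokens", 0) > 0:
        return "write"
    return "uncached"

labels = [classify(l) for l in SAMPLE.splitlines()]
print(labels)  # → ['hit', 'write']
```

A run of mostly "write" labels mid-session would be the smoking gun for the resume bug described above.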
However, cache being hit doesn't necessarily mean Anthropic won't just subtract usage from you as if it wasn't hit. It's Anthropic we're talking about. They can do whatever they want with your usage and then blame you for it.
I have heard that if you have telemetry disabled the cache is 5 minutes, otherwise 1h. No clue how true that is however my experience (with telemetry enabled) has been the 1h cache.
It's true as far as I can tell, just by my own checking using `/status`. You can also tell by when the "clear" reminder hint shows up. Also if you look at the leaked claude code you can see that almost everything in the main thread is cached with 1H TTL (I believe subagents use 5 minute TTL)
Isn't that how the KV cache currently works? Of course they could decide to hold on to cache items for longer than an hour, but the storage requirements are pretty significant while the chance of session resumption shrinks rapidly.
The storage requirements for large-model KV caches are actually comparatively tiny: the per-token size grows far less than model parameters. Of course, we're talking "tiny" for stashing them on bulk storage and slowly fetching them back to RAM. But that should still be viable for very long context, since the time for running prefill is quadratic.
We only have open models to go by, so looking at GLM 5.1 for instance, we're talking about almost 300 GB of kv-cache for a full context window of 200k tokens.
The point of prompt caching is to save on prefill which for large contexts (common for agentic workloads) is quite expensive per token. So there is a context length where storing that KV-cache is worth it, because loading it back in is more efficient than recomputing it. For larger SOTA models, the KV cache unit size is also much smaller compared to the compute cost of prefill, so caching becomes worthwhile even for smaller context.
If I have a conversation with claude then come back 30 minutes later to resume the conversation, the KV values for that prefill prefix are going to be exactly the same. That's the whole point of this caching in the first place.
If you're willing to incur a latency penalty on a "cold resume" (which is fine for most use cases), why couldn't they just move it to disk? The size of the KV cache should scale on the order of something like (context_length * n_layers * residual_length). I think for a standard V3-style MoE model at 1M token length, this should be on the order of 100 GB at FP16? And you can surely play tricks with KV compression (e.g. the recent TurboQuant paper). It doesn't seem like an outrageous amount of data to put onto cheap scratch HDD (and it doesn't grow indefinitely, since really old conversations can be discarded).
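Rough arithmetic behind that estimate; all the model dimensions below are illustrative placeholders, not any real model's config:

```python
def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_val=2):
    """Size of a standard GQA KV cache: keys + values (2x) at fp16 (2 bytes)."""
    return context_len * n_layers * n_kv_heads * head_dim * 2 * bytes_per_val

# Placeholder config loosely in the ballpark of a large MoE model with GQA.
size = kv_cache_bytes(context_len=1_000_000, n_layers=60, n_kv_heads=4, head_dim=128)
print(size / 1e9, "GB")  # → 122.88 GB, i.e. on the order of 100 GB
```

Latent-attention schemes (MLA) compress the per-token cache further, which is how some models fit much larger contexts in the same budget.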
> If I have a conversation with claude then come back 30 minutes later to resume the conversation, the KV values for that prefill prefix are going to be exactly the same.
Correct, when you’re using the API you can choose between 60 minute or 5 minute cache writes for this reason, but I believe the subscription doesn’t offer this. 60 minute cache writes are about 25% more expensive than regular cache writes.
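To make the trade-off concrete, here's a break-even sketch; the multipliers are assumptions modeled on typical published per-token pricing (cache writes at a premium over base input, cache reads at a small fraction of it):

```python
def break_even_reads(write_mult: float, read_mult: float) -> float:
    """Number of cached reads needed before caching beats recomputing.

    write_mult: cache-write cost as a multiple of the base input price.
    read_mult:  cache-read cost as a multiple of the base input price.
    """
    # Premium paid up front, divided by the savings per subsequent read.
    return (write_mult - 1.0) / (1.0 - read_mult)

# Assumed multipliers: writes at 1.25x base, reads at 0.1x base.
print(break_even_reads(1.25, 0.1))  # → ~0.278: caching pays off after one read
```

Even at a higher write premium for long-TTL caches, a single reuse of a large prefix usually covers the cost, which is why agentic workloads lean on it so heavily.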
I don’t have insights into internals at Anthropic so I don’t know where the pain point is for increasing cache sizes.
Ah I can see how my phrasing might be misleading, but these prompts were made within 5 minutes of each other, the timing I mentioned were what Claude spent working.
is it 5 mins between constant prompting/work, or 5 mins as in if i step away from the computer for 5 mins and come back and prompt again i'm not subject to reinit?
if it's the latter that's crazy. i dont even know what to do there, compactions already feel like a memory wipe
My dad in India gets prescribed antibiotics whenever he's sick. Despite my constant explanations, he insists that this is how it should be, because when you're sick your immunity is lowered.
On the other hand, the last time I got prescribed antibiotics was probably almost 10 years ago when I ended up in the hospital from an abscess.
Granted, my dad is old, but that part of the world still seems to expect doctors to do more for a common cold than just tell you to rest for a week and take an acetaminophen or phenylephrine if/when needed (even when that's really all you need).
> Granted, my dad is old, but that part of the world still seems to expect doctors to do more for a common cold than just tell you to rest for a week and take an acetaminophen or phenylephrine if/when needed (even when that's really all you need).
FYI phenylephrine is effectively a placebo and the FDA has proposed ending its use in OTC drugs. (There've been HN threads on the subject, with many comments.)
Phenylephrine is a placebo for nasal congestion, but it’s a solid drug for raising blood pressure. Used all the time in anesthesia (obviously not an OTC use).
CadQuery and build123d have been very handy for prototyping stuff for 3d printing. AI still isn't quite good enough to generate correct scripts, but AI autocomplete at least helps with putting together small snippets.
My last project involved making a cosplay helmet. I modeled the shell in Blender as a low-poly design, exported it to an OBJ, then put together some Python to load the OBJ, give the triangles some configurable thickness, etc. Then I used it to explore how to print the helmet in such a way that the outer surface would be too clean to tell it's FDM printed, without needing to do any sanding.
Initially I explored having cadquery put a number on the back of each triangle and I'd assemble it like a puzzle, but that didn't work out. Eventually I figured out how to cut it up into parts that would also eliminate the need for painting and outer surfaces would be clean, and because it was in code, changing which part a triangle belonged to was a matter of moving the corresponding index into another list.
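A minimal stdlib sketch of the workflow described above: parse an OBJ's triangles, then assign them to named parts by moving indices between lists. (The thicken/solidify step is what the CAD library would handle; the OBJ content and part names here are made up.)

```python
def parse_obj(text):
    """Parse vertices and triangular faces from minimal OBJ text."""
    verts, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            verts.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "f":
            # OBJ face indices are 1-based and may carry /vt/vn suffixes.
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:4]))
    return verts, faces

OBJ = """\
v 0 0 0
v 1 0 0
v 0 1 0
v 1 1 0
f 1 2 3
f 2 4 3
"""
verts, faces = parse_obj(OBJ)

# Group triangles into printable parts by index; reassigning a triangle
# is just moving its index from one list to another.
parts = {"front": [0], "back": [1]}
parts["front"].append(parts["back"].pop(0))  # move triangle 1 into "front"
print(len(verts), len(faces), parts)  # → 4 2 {'front': [0, 1], 'back': []}
```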
I probably could've managed it all in Blender too, but being much more comfortable with code, I found it easier to play with normals and manually turn each piece into a solid.
I also go for it for functional designs because, again, tweaking code is more comfortable to me than dealing with constraints and sketches and multiple planes in, say, FreeCAD.
yeah -- have been playing with this as well, ai's spatial reasoning is not quite there yet but with precise construction instructions it can often do the job
for shapes that are hard to print with a traditional slicer, LLMs are also surprisingly good at generating gcode with fullcontrolxyz if you're specific
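For the curious, raw G-code really is just text. A toy sketch that emits a single-layer polygon perimeter (the feed rate and extrusion factor are made-up placeholders, not tuned values, and this isn't the fullcontrolxyz API):

```python
import math

def polygon_gcode(radius, sides, z=0.2, feed=1200, e_per_mm=0.05):
    """Emit G1 moves tracing a regular polygon at height z (toy example)."""
    lines = [f"G1 Z{z:.2f} F{feed}"]
    e = 0.0
    prev = (radius, 0.0)
    for i in range(1, sides + 1):
        a = 2 * math.pi * i / sides
        x, y = radius * math.cos(a), radius * math.sin(a)
        e += math.dist(prev, (x, y)) * e_per_mm  # extrude proportionally to travel
        lines.append(f"G1 X{x:.3f} Y{y:.3f} E{e:.4f} F{feed}")
        prev = (x, y)
    return "\n".join(lines)

print(polygon_gcode(radius=10, sides=6))
```

Driving the toolpath directly like this is what makes non-planar or vase-style tricks possible when a slicer's layer model gets in the way.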
Maybe referring to it as welfare is odd, but these points are important. It isn't a good look to have a model that tends to get into self-deprecating loops like one of Google's older models, it's an even worse look and potential legal liability if your model becomes associated with a suicide. An overly negative chat model would also just be unpleasant to use.
With the weights being mostly opaque, these kinds of evaluations are an important piece of reducing the harm an AI model can cause.
I feel that anthropomorphizing the model is also potentially very harmful. We've seen that in the LLM interactions that end in tragedy. It's the wording that bothers me.
Interesting to see the benchmark numbers, though at this point I find these incremental seeming updates hard to interpret into capability increases for me beyond just "it might be somewhat better".
Maybe I've skimmed too quickly and missed it, but does calling it 4.7 instead of 5 imply that it's the same as 4.6, just trained with further refined data/fine tuned to adapt the 4.6 weights to the new tokenizer etc?
This seems needlessly cynical. I don't think they said they never planned to release it.
They seemed to make it clear that they expect other labs to reach that level sooner or later, and they're just holding it back until they've helped patch enough vulnerabilities.
Recognizing that certain mutations very blatantly reduce a person's quality of life and making it possible to revert those mutations does not require treating the people who have not had those mutations reverted as lesser.
Thinking of them as lesser leads to a society that prefers to drag each other down instead of lifting each other up.
I had an uncle with Down syndrome. He was the sweetest and funniest person; we remember him every day, more than 10 years after he passed away. Down syndrome carries a lot of physical health problems, like heart and lung diseases, which can make life very painful. He suffered from lung problems from the age of 18 until he passed away at 49, living in a lot of pain and being a big burden on my mum and my grandma, who took care of him. Still, it's true, he never lost his smile and loved his sister and mother back as much as possible, giving all of us who lived with him a lot of joy.
I am very conflicted about these kinds of issues, but I think I am of the opinion that it's better to prevent this suffering; once people are already here, though, we should make their lives as easy as possible.
I chose to call it quality of life because I don't think that simply being happy is enough to have quality of life, but I don't agree that it's about valuing intelligence over happiness.
It's a condition they, and their family, have to live with their entire life. You can't really be permanently sad about a condition you have literally been born with and can't expect to change.
Meanwhile, there are conditions that significantly decrease quality of life even though one's intelligence is unaffected. I think the factor is better described as choice. There are a large number of things a person with Downs just does not have the choice to do differently.
I know that I'm in the small minority of people that read Flowers for Algernon and didn't think the ending was a sad one. His life was interrupted with some brief magic and resolved into what it was always meant to be.
People have gotten emotional with me about my take on that, and that's just fiction. I guess my point is I don't think there is a clear morality play here. This is more like a trolley problem where you have to decide for yourself how much control you're comfortable with.
This is incorrect in that the term was not neutral before WW2, nor was Nazi Germany's eugenics program really unique. Taking these claims one at a time:
>the term was value neutral.
By the late 1930s the academic community had largely moved on from eugenics: the Catholic church denounced it in 1930 with Casti Connubii, the Eugenics Record Office closed in 1935, and Laughlin (the leading eugenicist) retired in 1939. By the 1930s, being a eugenicist was viewed much like being a homeopath is today.
>Until a certain Austrian painter decided to practice eugenics in a uniquely negative way,
Eugenics in the United States saw the rise of the "moron laws" and the mass sterilization of marginalized communities. In fact, Nazi Germany's eugenics policies were largely inspired by US eugenics legislation and actively promoted by US eugenicists (particularly in California). Heck, mass sterilization programs in the US didn't even die with WW2, continuing into the mid-1970s.
I'm troubled by this thread because the vibe I'm getting is that eugenics was only bad because the science wasn't there yet and the Nazis did it, and that this time will be different. No: the aspects which made eugenics dangerous were inherently political, and they are every bit as relevant today as they were a hundred years ago. (Who decides which traits should be "edited" out? What traits should be "edited" in? What policies should be legislated? Who is primarily impacted by these policies? How much agency do the people impacted by these policies have in the situation?)
Powerpoint will continue to persist because other people need to be able to edit your slide deck without understanding your HTML.
My employer blocks office plugins, so I can't try Claude for PowerPoint, but sometimes I get Claude to generate Python scripts, which produce PowerPoint slides via python-pptx. This also benefits from being able to easily read and generate figures from raw data.
I don't really like the way Claude tends to format slides (too much marketing speak and flowcharts), but it has good ideas often enough that it's still worth it to me. So I treat this as a starting point and replace the bad parts.
Most flights are available at high frequencies (on the order of days, weeks) compared to concerts (once a year or so). You also don't care as much about sitting together on a plane.
I disagree: if you can't get seats with your friends at a concert, you might just not go, because the social aspect is part of the experience. But if you can't get neighboring seats on a plane, you'd (or at least I would) just tolerate it, since you'd still get to be together at the main event (the destination).