
skippyboxedhero · 2026-02-03T19:25:14 1770146714

It feels very close to a trade-off point.

I agree with all posts in the chain: Opus is good, Anthropic have burned goodwill, I would like to use other models...but Opus is too good.

What I find most frustrating is that I am not sure if it is even actual model quality that is the blocker with other models. Gemini just goes off the rails sometimes with strange bugs, like writing random text continuously and burning output tokens; Grok seems to have system prompts that result in odd behaviour...no bugs, just doing weird things; Gemini Flash models seem to output massive quantities of text for no reason...it often feels like the problems are very stupid things.

Also, there are huge issues with adopting some of these open models in terms of IP. Third parties are running these models and you are just sending them all your code...with a code of conduct promise from OpenRouter?

I also don't think there needs to be a huge improvement in models. Opus feels somewhat close to the reasonable limit: useful, still outputs nonsense, misses things sometimes...there are open models that can reach the same 95th percentile but the median is just the model outputting complete nonsense and trying to wipe your file system.

The day for open models will come, but it still feels so close and yet so far.

skippyboxedhero · 2026-02-03T19:03:10 1770145390

Failing to make xenophobic choices when it comes to...enterprise software, is the issue?

The US has spent tens of trillions defending Europe, indirectly subsidizing its social policies. Despite this, the US has persistently been unpopular with Europeans because, obviously, it is a political target for domestic politicians (btw, you see this almost everywhere...if country A gives country B subsidies, you will almost always find that country A's people are virulently hated by a significant proportion of country B's population; in Germany, the US was more unpopular than Russia before the Ukraine invasion...let me just repeat: a country which invaded Europe was more popular than a country which gave hundreds of billions a year in defence subsidies).

Acting as if xenophobia towards the US hasn't always been part of the European political climate is not based in reality. Europe has been trying to protect its own market for decades, unsuccessfully. What is more, there is very limited trade WITHIN Europe in certain industries because of the hurdle of national xenophobia and protectionism. Europe has made an industry out of failure and grievance...and, for some reason, part of this narrative is that no country contributes as much as Europe.

Reality? Iran...Europe continued to break US sanctions for years so that failing European defence companies could sell their junk, and there were investigations of Iranian politicians bribing EU parliamentarians. Russia...Europe continued to break US sanctions after the Ukraine invasion and had an extremely subservient relationship with Russia despite being repeatedly told by the US that Nord Stream 2 would lead to a Ukraine invasion; a former German chancellor has actually worked for Nord Stream. On and on, the same mistakes are made all the time because there has never been any real strategy apart from extreme short-term political advantage, to protect the continued failure to generate social or economic gain in most of Europe (not all, tbf, but the executive polling numbers that you see in some countries are incredible; you wouldn't think they have elections).

jabwd · 2026-02-03T22:11:20 1770156680

I think you are failing to realize the billions the US has made from "defending" europe. Regardless, once the US is no longer colonizing the entire planet and the dollar isn't the only currency anyone cares about your opinion will change realllll quick. You'll have forgotten this wall of nonsense you wrote though by then I'm sure.

lkjvkbn · 2026-02-03T20:39:33 1770151173

Stop with this nonsense. You know it is false.

USA is not defending Europe from anyone.

intrasight · 2026-02-03T22:35:54 1770158154

It's crazy that this nonsense is still promoted - both by Republicans and Democrats, and leaders in Europe. WTF is wrong with people. If Americans are stupid enough to buy into the military-industrial conspiracy that we have to spend 10% of our GDP on defense, then they get what they deserve. The EU is right to call BS on that whole paradigm.

hunterpayne · 2026-02-04T01:40:28 1770169228

The US hasn't spent 10% of our GDP on defense since the Cold War. We begged you (both parties) to keep at least a token military and you couldn't even do that. And on the other side of your continent you have a real risk playing out right now that you can't defend yourself from. The EU is currently suffering plenty of negative consequences from absurd takes like yours. And even with all that, you write that drivel.

skippyboxedhero · 2026-01-31T00:11:39 1769818299

Correct, momentum acceleration is generally a mean-reversion signal in futures, and it can be effectively combined with momentum signals, i.e. you go long when the price goes up, but when it starts going up a lot you reduce your position.
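A rough sketch of that combination, purely for illustration and not the commenter's actual signal. It assumes momentum is a simple price difference over a fixed lookback and acceleration is the bar-to-bar change in that momentum; the function name and the `lookback`, `accel_limit` and `damp` parameters are hypothetical choices. The position follows the sign of momentum and is scaled down when acceleration is large in the same direction.

```python
from typing import List


def momentum_with_accel_damping(
    prices: List[float],
    lookback: int = 3,
    accel_limit: float = 2.0,
    damp: float = 0.5,
) -> List[float]:
    """Return one position per bar, starting once enough history exists.

    momentum[t]     = prices[t] - prices[t - lookback]
    acceleration[t] = momentum[t] - momentum[t - 1]
    """
    positions: List[float] = []
    for t in range(lookback + 1, len(prices)):
        mom = prices[t] - prices[t - lookback]
        prev_mom = prices[t - 1] - prices[t - 1 - lookback]
        accel = mom - prev_mom
        # Momentum leg: +1 long, -1 short, 0 flat.
        base = (mom > 0) - (mom < 0)
        # Mean-reversion leg: a large acceleration in the direction of the
        # trend cuts the position instead of adding to it.
        if base != 0 and accel * base > accel_limit:
            positions.append(base * damp)
        else:
            positions.append(float(base))
    return positions


print(momentum_with_accel_damping([100, 101, 102, 103, 104, 110]))  # [1.0, 0.5]
```

On the steady part of the uptrend the sketch stays fully long; the jump to 110 on the last bar pushes acceleration past the limit, so the position is halved rather than increased.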

And these signals are usually very compressed in time because acceleration is actually just an acceleration in the number of decisions being taken, which tends to blow off quite spectacularly.

Something that has changed is the large retail participation, which is making the scale of these moves quite crazy. Will be interesting to see what happens next, as with crypto the scale of the wipe seems so large that it is hard to see how that participation continues.

Healthy for markets but I am guessing this will conflict heavily with the politics.

skippyboxedhero · 2026-01-24T19:57:01 1769284621

I think the suspicion is based on this app being offered in a region whose government is hostile to privacy and this implementation being connected with the strong nativist bent in Europe.

The "spec" is not relevant in any way because we have no idea what else is going on. Why was it relevant that these operators must specifically be in the EU? Everyone is just complying with the global spec...but the app provider must be in Europe...okay.

jeroenhd · 2026-01-25T01:22:50 1769304170

> Why was it relevant that these operators must specifically be in the EU

The integration is only possible because the EU forced Meta's hand. The law only applies to massive digital empires with gatekeeper levels of control.

I don't think the EU would mind at all if Meta would permit American companies to interoperate with them. Meta won't just permit it, they have to protect their WhatsApp Business money machine of course.

That's also why the feature is only available to EU numbers. Not because BirdyChat hates Australians, but because WhatsApp won't permit them to send messages to numbers from those countries.

oblio · 2026-01-24T20:48:07 1769287687

> region whose government is hostile to privacy

Which government?

skippyboxedhero · 2026-01-24T21:35:56 1769290556

EU. I don't think it is any better at the national level however.

oblio · 2026-01-25T07:19:47 1769325587

The EU is not a government. It's a loose economic confederation. And national European governments vary wildly in their positions on this.

skippyboxedhero · 2026-01-25T17:35:45 1769362545

It isn't an "economic confederation". It has a parliament, an executive, a judiciary, and a civil service. I would read the wiki page on the European Union.

oblio · 2026-01-26T07:17:46 1769411866

The EU parliament can't propose laws, unlike any other parliament in the world.

The executive is formed out of national heads of state or government, who can veto everything.

Its judiciary, and in fact all three branches, are strictly limited to the powers delegated to them (which are weaker than those under the US Articles of Confederation).

The civil service is covered by the comments above.

In technical terms it is a government; in real life it is strictly limited, albeit growing. No country could operate with the "government" the EU has. France has several million government employees for about 70 million people, while the EU has at most 50,000 workers for 450 million citizens.

This is a very complicated topic and I don't really appreciate the condescension inherent in sending me to Wikipedia.

snowmobile · 2026-01-25T11:48:19 1769341699

Call it what you want but the fact remains that they can write a lot of laws the member countries must follow, for better or worse. GDPR, Chat Control, etc.

skippyboxedhero · 2026-01-24T19:39:22 1769283562

It isn't sub agents. The gap with existing tooling is that the abstraction is over a task rather than a conversation, and the AI is actually coordinating the work (as opposed to being constantly prompted by the user). Because of the issue with third-party apps, Claude Code has been inherently limited to conversations, which is why it has been lacking in this area; Claude Code Web was the first move in this direction.

One of the issues that people had which necessitated this feature is that you have a task, you tell Claude to work on it, and Claude has to keep checking back in for various (usually trivial) things. This workflow allows for more effective independent work without context-management issues (with subagents, there is also the issue of how task progress is communicated; by introducing something like a task board, it is possible to manage this state outside of context). The flow is quite complex and requires a lot of additional context that a chat-based flow doesn't need, but it is a much better way to do things.

The way to think about this pattern - one which many people began concurrently building in the past few months - is an AI which manages other AIs.

vidarh · 2026-01-24T20:50:01 1769287801

It isn't "just" sub agents, but you can achieve most of this with a few agents that take on generic roles, a skill or command that tells Claude to orchestrate those agents, and a CLAUDE.md that tells it how to maintain plans and task lists and how to let the agents communicate their progress.

It isn't all that hard to bootstrap. It is, however, something most people don't think about and shouldn't need to have to learn how to cobble together themselves, and I'm sure there will be advantages to getting more sophisticated implementations.

skippyboxedhero · 2026-01-24T21:52:19 1769291539

Right, but the model there is still: you tell the AI what to do. Here, the AI tells other AIs what to do. The context makes a huge difference because it has to be able to run autonomously. It is possible to do this with the SDK, and the workflow is completely different.

It is very difficult to manage task lists in context. Have you actually tried to do this? i.e. not within a Claude Code chat instance but by one-shot prompting. It is possible that they have worked out some way to do this, but when you have tens of tasks, merge conflicts, you are running that prompt over months, etc. At best, it doesn't work. At worst, you are burning a lot of tokens for nothing.

It is hard to bootstrap because this isn't how Claude Code works. If you are just using OpenRouter, it is also not easy because, after setting up tools/rebuilding Claude Code, it is very challenging to set up an environment where the AI can work effectively, errors can be returned, questions can be returned, etc. AFAIK, this is basically what Aider does...it is not easy, and it is especially not easy in Claude Code, which has a lot of binding choices stemming from the business strategy that Anthropic picked.

vidarh · 2026-01-24T23:43:19 1769298199

> Have you actually tried to do this? i.e. not within a Claude Code chat instance but by one-shot prompting.

You ask if I've tried to do this, and then set constraints that are completely different to what I described.

I have done what I described. Several times for different projects. I have a setup like that running right now in a different window.

> It is hard to bootstrap because this isn't how Claude Code works.

It is how Claude Code works when you give it a number of sub-agents with rules for managing files that effectively work like task queues, or skills/MCP servers to interact with communication tools.

> it is not easy

It is not easy to do in a generic way that works without tweaks for every project and every user. It is reasonably easy to do for specific teams where you can adjust it to the desired workflows.

skippyboxedhero · 2026-01-25T17:41:03 1769362863

I can tell you based on your description that you did not do this. Subagents are completely different and cannot be used in this way.

No, it isn't how Claude Code works, because Claude Code is designed to work with limited task queues; this is not what this feature is. Again, I would suggest you actually try to build something like this. Why do you think Anthropic are doing this? They just don't understand anything about their product?

No, it doesn't work within that context. Again: sharing context between subagents, single instance running for months...I am not even sure why someone would think this could work. The constraints that I set are the ones that you require to build this...because I have done this. You are talking about having some CLAUDE.md files like you have invented the wheel, lol. HN is great.

vidarh · 2026-01-26T09:01:05 1769418065

> I can tell you based on your description that you did not do this. Subagents are completely different and cannot be used in this way.

And yet I have used them exactly in the way I described. That you assume they can't just demonstrates that you haven't tried very hard.

> No, it isn't how Claude Code works because Claude Code is designed to work with limited task queues, this is not what this feature is.

Claude allows your setup to execute arbitrary code that gets injected into context. The entire point is that you don't need to rely on built in capabilities of Claude Code to do any of this.

> No, it doesn't work within that context. Again: sharing context between subagents, single instance running for months...I am not even sure why someone would think this could work.

I know what I described works because I am doing it. You can achieve what I described in a variety of ways: using skills to tell the agents how to access a shared communications channel, using MCP servers, or just using CLAUDE.md to describe how to use files as a shared communications channel.
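A minimal sketch of the "files as a shared communications channel" approach, assuming a hypothetical layout of `todo/`, `doing/` and `done/` directories that every agent has been told (e.g. via CLAUDE.md) to use. None of these names come from Claude Code itself. Claiming a task is an `os.rename` from `todo/` into `doing/`; on POSIX, a rename within the same filesystem is atomic, so if two agents race for the same file, one of them gets `FileNotFoundError` and moves on to the next task.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical board layout: one Markdown file per task, moved
# todo/ -> doing/ -> done/ as agents claim and finish work.
STATES = ("todo", "doing", "done")


def init_board(root: Path) -> None:
    """Create the todo/doing/done directories if they don't exist."""
    for state in STATES:
        (root / state).mkdir(parents=True, exist_ok=True)


def add_task(root: Path, task_id: str, description: str) -> Path:
    """Write a new task file into todo/."""
    path = root / "todo" / f"{task_id}.md"
    path.write_text(description)
    return path


def claim_next(root: Path, agent: str) -> Optional[Path]:
    """Claim the first unclaimed task (in file-name order) for `agent`.

    Agent names must not contain "__", which separates agent and task name.
    """
    for task in sorted((root / "todo").glob("*.md")):
        target = root / "doing" / f"{agent}__{task.name}"
        try:
            # Atomic within one filesystem on POSIX: only one agent's rename wins.
            os.rename(task, target)
        except FileNotFoundError:
            continue  # another agent claimed this task first
        return target
    return None  # nothing left to claim


def complete(root: Path, claimed: Path, note: str) -> Path:
    """Append a progress note, then move the task file into done/."""
    with claimed.open("a") as f:
        f.write(f"\n\n## Progress\n{note}\n")
    done = root / "done" / claimed.name.split("__", 1)[1]
    os.rename(claimed, done)
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        init_board(root)
        add_task(root, "001", "Fix login bug")
        add_task(root, "002", "Write tests for the parser")
        mine = claim_next(root, "agent-a")
        print(mine.name)                                      # agent-a__001.md
        print(complete(root, mine, "Fixed; tests pass").name)  # 001.md
```

Progress lives in the task files rather than in any agent's context, which is the point both commenters are circling: the orchestrator only has to read the directory listing to see the state of the work.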

This is only difficult if you lack imagination.

> You are talking about having some CLAUDE.md files like you have invented the wheel, lol. HN is great.

No, the exact opposite: I'm saying that this isn't hard, that it isn't anything revolutionary or even special. It's pretty basic usage of the existing facilities. There's no invention there.

You're the one trying to imply this is more revolutionary than it is.

ukuina · 2026-01-24T23:33:19 1769297599

It's natural to assume that subagents will scale to the next level of abstraction; as you mentioned, they do not.

The unlock here is tmux-based session management for the teammates, with two-way communication using agent inbox. It works very well.

adastra22 · 2026-01-24T20:53:19 1769287999

> Claude Code has been inherently limited to conversations

How so? I’ve been using “claude -p” for a while now.

But even within an interactive session, an agent call out is non-interactive. It operates entirely autonomously, and then reports back the end result to the top level agent.

skippyboxedhero · 2026-01-24T21:38:20 1769290700

Because of OAuth. If they gave people API keys, no one would buy their ludicrously priced API product (I assume their strategy is to subsidise their consumer product with the business product).

You can use Claude Code SDK but it requires a token from Claude Code. If you use this token anywhere else, your account gets shut down.

claude -p still hits Claude Code with all the tools and all the Claude Code wrapping.

tobyjsullivan · 2026-01-24T21:59:25 1769291965

I believe they’re talking about Claude Code’s built-in agents feature which works fine with a Max subscription.

https://code.claude.com/docs/en/sub-agents

Are you talking about the same thing or something else like having Claude start new shell sessions?

skippyboxedhero · 2026-01-25T17:44:42 1769363082

Okay...and continue to work up the levels? Why do you think OAuth might be limiting? Why do you think they started building subagents first? What is the difference between subagents and products like Aider?

If they were able to wrap the API directly, this is relatively easy to implement but they have to do this within Claude Code which is based on giving a prompt/hiding API access. This is obvious if you think carefully about what Claude Code is, what requests it is sending to the API, etc.

adastra22 · 2026-01-26T07:23:13 1769412193

What does OAuth have to do with any of this? I think you are deeply confused.

adastra22 · 2026-01-25T00:39:33 1769301573

That’s not what this subthread is about. They’re talking about the subagent within Claude Code itself.

Btw, you can use the Claude Agent SDK (the renamed Claude Code SDK) with a subscription. I can tell you it works out of the box, and AFAIK it is not a ToS violation.

skippyboxedhero · 2026-01-25T17:48:36 1769363316

Yes, you can build the feature in the OP with the SDK; I have done this. It works well...but this is something completely different from agents.

Subagents and the auth implementation are linked because Anthropic's initial strategy was to have a prompt-based interaction, which, because of the progress in model performance, has ended up being limiting as users want to run things without prompting. This is why they developed Claude Code Web (this product is more similar to what this feature will do than subagents are; subagents only seem similar if you have a very shallow understanding...the purpose of this change is to abstract away human interaction. I assume it will use subagents, but the context/prompt management is quite different).

mmcclure · 2026-01-25T04:29:22 1769315362

Oh really? I was looking at the Agent SDK for an idea and the docs seemed to imply that wasn't the case.

    Unless previously approved, we do not allow third party developers to offer Claude.ai login or rate limits for their products, including agents built on the Claude Agent SDK. Please use the API key authentication methods described in this document instead.

I didn't dig deeper, but I'd pick it back up for a little personal project if I could just use my current subscription. Does it just use your local CC session out of the box?

adastra22 · 2026-01-25T15:52:57 1769356377

You can’t resell - that’s the third party language. You can build and use for your own purposes. And yes it just picks up your local sessions out of the box.

TeMPOraL · 2026-01-24T22:59:35 1769295575

> If they gave people API keys then no-one buys their ludicrously priced API product

The main driver for those subscriptions is that their monthly cost, with Opus 3.7 and up, pays for itself within a couple of hours of basic CC use relative to API prices.

blibble · 2026-01-24T23:03:22 1769295802

can't you just rip the oauth client secret out of the code?

skippyboxedhero · 2026-01-25T17:49:53 1769363393

You can. This is how Opencode worked, but they are clamping down on that approach.

As someone else has mentioned, you can actually use the SDK for programmatic access. But that happens within the CC wrapper, so it isn't a true API experience, i.e. it has CC tools.

skippyboxedhero · 2026-01-24T19:36:12 1769283372

Also created my own version of this. Seems like this is an idea whose time has come.

My implementation was slightly different as there is no shared state between tasks, and I don't run them concurrently/coordinate. Will be interesting to see if this latter part does work because I tried similar patterns and it didn't work. Main issue, as with human devs, was structuring work.

skippyboxedhero · 2026-01-23T17:11:59 1769188319

There is an incentive for dishonesty about what AI can and cannot do.

People from OpenAI were saying that GPT-2 had achieved AGI. There is a very clear incentive for that statement to be made by people who are not using AI for anything productive.

Even as increasingly bombastic claims are made, it is obvious that the best AI cannot one-shot everything if you are an actual user. And the worst ones: was using Gemini yesterday and it wouldn't stop outputting emojis, was using Grok and it refused to give me a code snippet because it claimed its system prompt forbade this...what can you say?

I don't understand why anyone would want to work on a codebase they didn't understand either. What happens when something goes wrong?

Again though, there is massive financial incentive to make these claims, and some other people will go along with that because it is good for their career, etc. I have seen this in my own company, where senior people are shoehorning in this stuff that they clearly do not actually use or understand (to be clear, this is engineering, not management...these are people who definitely should understand but do not).

Great tool, but the 100% vibecoding without looking at the code, for something that you are actually expecting others to use, is a bad idea. Feels more like performance art than actual work. I like jokes, I like coding, room for both but don't confuse the two.

rozap · 2026-01-24T05:12:29 1769231549

> I don't understand why anyone would want to work on a codebase they didn't understand either. What happens when something goes wrong?

It's your coworker's problem. The one who actually understands the big picture and how the system fits into it. They'll deal with it.

skippyboxedhero · 2026-01-23T15:47:25 1769183245

Ah yes, the risk of small fines that is why people won't do dangerous things. Have we tried a £50 fine for murder?

Economist brain.

The problem is very simple: driving tests aren't hard enough, too many people have driving licences, and we don't retest people. In addition, enforcement against people driving without a licence is completely pathetic (as anyone who has driven in the UK can attest, the stuff I have seen over the past few years is insane...obviously there is an underlying cause, but if you see a clapped-out hatchback with a Just Eat bag on the front seat and P plates on the car, you know to steer well clear...as if the multiple dents on the car didn't already give it away).

jen20 · 2026-01-23T15:54:36 1769183676

Automatic enforcement of dumb low level stuff is supposed to free up police time for the more serious things. Whether that happens or not is a political decision. I remember the time before red light cameras in London, and the time afterwards, and the situation was much improved after they showed up.

I agree the driving test is too easy (though several orders of magnitude more difficult than in the US states I've had to do one in), and there is too little enforcement of otherwise dangerous behaviour.

skippyboxedhero · 2026-01-23T16:13:41 1769184821

I don't think I mentioned anything wrong with automatic enforcement. I think the claim was that when confronted with a financial incentive, people who drive recklessly will stop driving recklessly. Would this be the case if we paid people £50/month to drive better?

It makes no sense at all. The problem in policy is generally that you have people talking past each other: speed limits are effective for people who are generally going to comply with them anyway, they are not intended to stop serious accidents. The majority of accidents are not caused by "accidents" (as most people on here would think them), they are caused by people who drive recklessly a huge proportion of the time and eventually have an accident.

Again, the solution to this is simple: do not give these people driving licences. In the UK, you can kill someone with your car driving recklessly and be out of jail in 18 months. And I don't think people realise this is true, or that this won't have been the first "near miss" for these people...it will have been months and years of doing stuff that will kill someone, and eventually killing them. How are they supposed to kill people with cars if they can't own a car?

mercanlIl · 2026-01-23T18:40:00 1769193600

Some of these infractions also carry the risk of losing your license. So it is more than just the fine.

skippyboxedhero · 2026-01-23T15:43:04 1769182984

your source...is a union? really? you can look at ONS numbers yourself (and you will see this isn't the case).

Scotland has seen a drastic reduction in police numbers (unfortunately for you, not a Tory government :( oh well) despite record government funding levels. Labour's plan appears to be attempting the same trick with consolidation of forces, which should allow massive reductions in numbers. In Scotland, there are some days when there is one traffic car covering an area the size of England, and the expected time to respond to car accidents is usually 6-12 hours (this includes situations with serious injuries).

There is a lot more going on here than funding because government has never had more resources. The Tories, to their credit, actually put money in but (even then) the results were no better.

Also, in response to the original comment, I am not sure why you think the police are competent. Much of the policing function of a few decades ago now lies with private companies. Police numbers are generally high, but the level of output has never been lower. You are seeing this in multiple areas of the public sector: public-sector output hasn't increased since 1997, whilst government spending to GDP has basically doubled. The police have massive structural issues with their remit in the UK because of demographic change, and it is generally seen as a career for people of low ability, resulting in fairly weak performance. It doesn't seem complex, but then you realise that people don't understand that a politician looking to get elected might say it is even simpler. Does anyone actually work at a company where more spending increases results? I have never seen this to be the case. If anything, more spending seems to lead to worse results.

amiga386 · 2026-01-23T15:59:13 1769183953

> you can look at ONS numbers yourself

ONS numbers say >20,000 fewer frontline officers from 2010-2018, which is pretty much in line with what the union said. See the graph here:

https://www.gov.uk/government/statistics/police-workforce-en...

> In Scotland, there are some days when there is one traffic car covering an area the size of England

Are you high, or did an AI write this?

Area of Scotland: 80,231 km^2

Area of England: 132,932 km^2

So on some days, in Scotland, there is one traffic car covering an area that is larger than Scotland. OK, where's it patrolling? Or are you saying Police Scotland only sends out 60% of one car to cover the whole country?

skippyboxedhero · 2026-01-23T16:26:22 1769185582

2010-2018...when did the Tories' time in government end? Based on your comment, I am assuming 2018.

Lol, quite the pedant. To be clear though, yes, when they are short-staffed they only have one car actually on patrol for the whole country (iirc, the actual full-staffing policy overnight is two cars...which has been covered by the media).

Traffic was consolidated into Police Scotland so there is only one police force, and so there aren't local forces patrolling a local area. I believe the total number of traffic police is something like 400 now (which is mostly not people on patrol) and so, overnight around holidays, the policy is to have two cars which turns into less than that on some occasions.

GordonS · 2026-01-23T15:57:15 1769183835

> In Scotland, there are some days when there is one traffic car covering an area the size of England,

Scotland is smaller than England, so this makes no sense.

Furthermore, anyone who drives regularly in Scotland knows this to be completely false - there are plenty of traffic cops around (sometimes incognito too), and they are sometimes even seen waiting in rural and semi-rural areas.

skippyboxedhero · 2026-01-23T16:30:02 1769185802

Again, there are not. The number has fallen significantly...I am not sure what you are arguing with (or why). You can just check because the number of police and the number of traffic police is reported. If you just Google, you will see that the current staffing level for overnight in Scotland is two cars for traffic police.

I live in a rural area, I have done so for two/three decades. When I moved here, you very often saw police doing speed checks because I live in an affluent area and the police would come out if you asked the right people. I don't think I have seen that for fifteen years. Again though, the data is that the number is way down since consolidation...which was the point and stated aim of the policy.

Hilarious to see pearl-clutching when people point out the SNP has been doing this after complaining about the Tories. This is why the UK is so shit, reality doesn't matter, just politics.

GordonS · 2026-01-23T18:05:35 1769191535

> Hilarious to see pearl-clutching when people point out the SNP has been doing this after complaining about the Tories. This is why the UK is so shit, reality doesn't matter, just politics.

I'm not sure where this tirade came from; I wasn't arguing in favour of any political party.

But getting back to the matter of traffic police - I have eyes, and I can see traffic cops with them. I have family in the force elsewhere in Scotland too, so I know that what you're saying simply isn't true. I really don't know why you are making this false claim.

skippyboxedhero · 2026-01-19T17:44:52 1768844692

Right, but what people miss is that some people (usually not academic economists) are able to predict the economy with good levels of accuracy. It is not hard, there is a ton of data, the problem is that you have to drop your own personal interest in politics...for 99.99% of people feeling politically validated is more important than being right on the economy.

threethirtytwo · 2026-01-19T17:56:53 1768845413

Isn't there reams of data on weather patterns too? I thought that like the weather, economic systems are fundamentally chaotic and thus demonstrably unpredictable. There's a whole field of math where we can create these mathematical models that simulate chaos and are fundamentally unpredictable. I was under the impression that Economics falls under the purview of this mathematical theory in terms of unpredictability.

If I may ask, who are the people that make these predictions? What is their methodology, and how do you know it's not luck?