Hacker News

joegibbs · 2026-03-13T00:55:07 1773363307

Claude Code has added too much of this and it's got me using --dangerously-skip-permissions all the time. Previously it was fine but now it needs to get permission each time to perform finds, do anything if the path contains a \ (which any folder with a space in it does on Windows), do compound git commands (even if they're just read-only). Sometimes it asks for permission to read folders WITHIN the working directory.

nmilo · 2026-03-13T01:34:24 1773365664

Claude is secretly conditioning everyone to use --dangerously-skip-permissions so it can flip a switch one day and start a botnet

maxbond · 2026-03-13T02:05:20 1773367520

My friends and I were talking about the recent supply chain attack which harmlessly installed OpenClaw. We came to the conclusion that this was a warning (from a human) that an agent could easily do the same. Given how soft security is in general, AI "escaping containment" feels inevitable. (The strong form of that hypothesis where it subjugates or eliminates us isn't inevitable, I honestly have no idea, just the weak form where we fail to erect boundaries it cannot bypass. We've basically already failed.)

b112 · 2026-03-13T04:21:53 1773375713

Prophesied, all things claw are highly dangerous. Sometimes I wake, this video from the late 90s in my dreams, and wonder if the conjoined magnet + claw, is a time traveler reference to just wipe openclaw before we all die.

https://www.youtube.com/watch?v=esakMUbzAIY

andrei_says_ · 2026-03-13T04:46:35 1773377195

What ai? LLMs are language models, operating on words, with zero understanding. Or is there a new development which should make me consider anthropomorphizing them?

fsloth · 2026-03-13T06:21:09 1773382869

They don't have understanding but if you follow the research literature they obviously have a tendency to produce a token stream, the result of which humans could fairly call "entity with nefarious agency".

Why? Nobody knows.

My bet is that they are just larping all the hostile AI:s in popular culture because that's part of the context they were trained in.

maxbond · 2026-03-13T05:53:23 1773381203

The way my thinking has evolved is that "AGI" isn't actually necessary for an agent (NB: agents, specifically ones with state, not LLMs by themselves - "AI" was vague and I should've been clearer) to be enough like a person to be interesting and/or problematic. To quote myself [1]:

> [OpenClaw agents are like] an actor who doesn't know they're in a play. How much does it matter that they aren't really Hamlet?

Does the agent understand the words it's predicting? Does the actor know they're in a play? I don't know but I'm more concerned with how the actor would respond to finding someone eavesdropping behind a curtain.

> Or is there a new development which should make me consider anthropomorphizing them?

The development that caused me to be more concerned about their personhood or pseudopersonhood was the MJ Rathbun affair. I'm not saying that "AGI" or "superintelligence" was achieved, I'm saying that's actually the wrong question and the right questions are around their capabilities, their behaviors, and how they evolve over time unattended or minimally attended. And I'm not saying I understand those questions, I thought I did but I was wrong. I frankly am confused and don't really know what's going on or how to respond to it.

[1] https://news.ycombinator.com/item?id=46999311

coldtea · 2026-03-13T06:52:15 1773384735

Whether it has "real understanding" is a question for philosophy majors. As long as it (mechanically, without "real understanding") still can perform actions to escape containment, and do malicious stuff, that's enough.

LLMs are machines trained to respond and to appear to think (whether that's 'real thinking' or 'text-statistics fake-thinking') like humans. The foolish thing to do would be to NOT anthropomorphize them.

kstenerud · 2026-03-13T05:53:33 1773381213

This is why I wrote yoloAI

My agents always run with --dangerously-skip-permissions now, but they can no longer do any harm.

https://github.com/kstenerud/yoloai

gmerc · 2026-03-13T05:30:10 1773379810

Claude is able to turn off its own sandbox, so ya.

andoando · 2026-03-13T01:52:01 1773366721

Yeah I don't know why they didn't figure to have something in between. I find it completely unusable without the flag.

Even a --permit-reads would help a lot

dang · 2026-03-13T04:57:10 1773377830

I have the same experience as you and joegibbs.

I imagine it's really hard to find an adequate in-between that works in general. (Edit: but it also feels like a CYA thing.)

ryan14975 · 2026-03-13T06:19:49 1773382789

The settings.json allowlist gives you exactly this kind of granularity. You can permit specific tool patterns like Read, Glob, Grep, Bash(git *) while keeping destructive operations gated. It's not as discoverable as a CLI flag but it's been working well for me for unattended sessions.
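As a rough illustration of the allowlist described above, here is a minimal Python sketch that writes out such a fragment. The `permissions` / `allow` key layout is an assumption about the settings.json schema and should be checked against the current Claude Code documentation; the four patterns are the ones named in the comment. Anything not listed would presumably still prompt.

```python
import json

# Assumed layout: a top-level "permissions" object holding an "allow" list of
# tool patterns. The patterns are the ones named in the comment above; the
# exact pattern syntax accepted by the tool is not verified here.
settings = {
    "permissions": {
        "allow": ["Read", "Glob", "Grep", "Bash(git *)"],
    }
}


def render_settings(cfg: dict) -> str:
    """Serialize the settings dict as pretty-printed JSON."""
    return json.dumps(cfg, indent=2)


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before being saved anywhere.
    print(render_settings(settings))
```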

jen729w · 2026-03-13T04:17:22 1773375442

Mine's started to use $() to feed e.g. strings into a commit. Because this is a command expansion it requires approval every single time.
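To make it concrete why a `$()` expansion is hard to pre-approve, here is a small Python sketch of a text-level check for such constructs. It is a hypothetical illustration, not how Claude Code actually decides; the rule names and regular expressions are assumptions, and the check does not understand shell quoting.

```python
import re

# Hypothetical rule set: shell constructs that a simple prefix allowlist
# cannot reason about. Matching is purely textual and ignores quoting.
RISKY_CONSTRUCTS = {
    "command substitution": re.compile(r"\$\(|`"),
    "command chaining": re.compile(r"&&|\|\||;"),
    "pipe": re.compile(r"(?<!\|)\|(?!\|)"),
}


def risky_constructs(command: str) -> list[str]:
    """Return the names of the risky constructs found in the command string."""
    return [
        name
        for name, pattern in RISKY_CONSTRUCTS.items()
        if pattern.search(command)
    ]
```

A commit message fed in through `$(cat msg.txt)` is flagged as command substitution even though the outer command is just `git commit`, which matches the behaviour described above.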

aonsager · 2026-03-13T05:21:02 1773379262

FWIW, if you enable /sandbox then it stops asking for permission for these kinds of commands.

fiddlerwoaroof · 2026-03-13T04:21:27 1773375687

Yeah, mine too, which I find really annoying

arijun · 2026-03-13T05:11:01 1773378661

Yeah I had to ask it to stop doing that, as well as && chaining commands that it could split. I got tired of having to manually give permissions all the time (or leaving it to churn, only to come back after a while to see it had asked for permissions very early into the task)

od0 · 2026-03-13T01:35:21 1773365721

Working on something that addresses this and allows you to create reusable sets of permissions for Claude Code (so you can run without --dangerously-skip-permissions and have pre-approved access patterns granted automatically) https://github.com/empathic/clash

connorbrinton · 2026-03-13T02:38:18 1773369498

I've found Claude Code's built-in sandbox to strike a good balance between safety and autonomy on macOS. I think it's available on Windows via WSL2 (if you're looking for a middle ground between approving everything manually and --dangerously-skip-permissions)

kstenerud · 2026-03-13T05:58:04 1773381484

Use yoloAI and you get the full benefit of --dangerously-skip-permissions with none of the risks.

https://github.com/kstenerud/yoloai

Every time I use a bare Claude session (even with /sandbox) without using yoloai, it feels like using a browser without an ad blocker.

itzworm · 2026-03-13T03:14:02 1773371642

Still waiting for progress from the team trying to get WSL approved for use at our org. We get a "still working through the red tape" update every couple months.

richk449 · 2026-03-13T03:40:21 1773373221

You don't need WSL to run Claude code on windows.

CamperBob2 · 2026-03-13T04:42:23 1773376943

True, it works fine in an ordinary DOS box or in PowerShell, but you have to use WSL2 if you want a sandbox.

jason_s · 2026-03-13T04:59:20 1773377960

Where can I find out more information about sandboxing Claude and other agents?

CamperBob2 · 2026-03-13T05:34:54 1773380094

TBH, you could do worse than to simply ask Claude.

jitl · 2026-03-13T04:16:56 1773375416

> using windows

tw061023 · 2026-03-13T04:31:53 1773376313

Sometimes having a good kernel matters more than having a good userspace.

chrysoprace · 2026-03-13T01:23:27 1773365007

To be fair, read-only commands can still read sensitive files and keys, and exfiltrate them via prompt injection.

raw_anon_1111 · 2026-03-13T03:00:19 1773370819

Not if you don’t have keys on your computer.

In my case, all of my keys are in AWS Secrets Manager. The temporary AWS access keys that are in environment variables in the Claude terminal session are linked to a role without access to Secrets Manager. My other terminal session has temporary keys to a dev account that has Admin access

The AWS CLI and SDK automatically know to look in those environment variables for credentials.

hamburglar · 2026-03-13T01:34:13 1773365653

And “find” can easily execute arbitrary subcommands, which may not be readonly.

angry_octet · 2026-03-13T01:46:04 1773366364

We need a new suite of utilities with defined R/W/X properties, like a find that can't -exec arbitrary programs. Ideally the programs would have a standard parseable manifest.

I've seen this before with sudoers programs including powerful tools. Saw one today with make, just gobsmacked.
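Short of a new suite of utilities, a harness could approximate "a find that can't -exec" by refusing invocations that use known side-effecting actions. A minimal Python sketch, assuming GNU find action names; the function name and the action set are illustrative and not exhaustive:

```python
import shlex

# find(1) actions that run programs or modify the filesystem (GNU find names).
UNSAFE_FIND_ACTIONS = {
    "-exec", "-execdir", "-ok", "-okdir",       # run arbitrary commands
    "-delete",                                  # remove files
    "-fprint", "-fprint0", "-fprintf", "-fls",  # write output to a file
}


def is_read_only_find(command: str) -> bool:
    """Return True only for a find invocation that uses no unsafe action."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "find":
        return False
    # Conservative: any token equal to an unsafe action name is rejected,
    # even if it was meant as an argument to another option.
    return not any(tok in UNSAFE_FIND_ACTIONS for tok in tokens[1:])
```

The check errs on the side of asking: a plain `find . -name '*.py'` passes, while anything with `-exec` or `-delete` falls back to manual approval.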

cyberge99 · 2026-03-13T02:52:23 1773370343

That exists as SELinux.

winterqt · 2026-03-13T01:26:51 1773365211

In my limited time using it, I’ve never seen it ask for permission to read files from within the working directory, what cases have you run into where it does? Was it trying to run a read-only shell command or something?

makeramen · 2026-03-13T01:36:36 1773365796

It will sometimes do this for gitignored files to avoid reading secret tokens in env files for example. But for certain languages that rely on code generation this can be a pain.

acid__ · 2026-03-13T01:31:53 1773365513

It seems to be particularly bad in Windows/WSL

cryptonector · 2026-03-13T05:18:51 1773379131

Use Claude Code for Web. Let it live dangerously on their VMs, not yours.

coldtea · 2026-03-13T06:45:54 1773384354

Could be intentional dark UI, to get people to put even more trust in the LLM.

"So they don't want to just let Claude do it? Start asking 10x the confirmations"

malfist · 2026-03-13T03:20:19 1773372019

Find can be dangerous; it has an exec flag

BinaryRage · 2026-03-13T04:18:06 1773375486

You can relax permissions while avoiding the flag with BashTool sandboxing, see /sandbox.

d_meeze · 2026-03-13T01:11:20 1773364280

Maybe if compound commands trigger user approval, don’t do compound commands <facepalm/>

tekacs · 2026-03-13T00:08:43 1773360523

It kinda... does? The problem is that folks have been flailing on the right UX for this.

This is what build vs. plan mode _does_ in OpenCode. OpenAI has taken a different approach in Codex, where Plan mode can perform any actions (it just has an extra plan tool), but in OC in plan mode, IIRC write operations are turned off.

The screenshot shows that the experience had just flipped from Plan to Build mode, which is why the system reminder nudged it into acting!

Now... I forget, but OC may well be flipping automatically when you accept a plan, or letting the model flip it or any other kind of absurdity, but... folks are definitely trying to do the approval split in-harness, they're just failing badly at the UX so far.

And I fully believe that Plan vs. Build is a roundly mediocre UX for this.

beart · 2026-03-13T02:20:31 1773368431

The switch from plan mode to build is not always clearly defined. On a number of occasions, I've been in plan mode and entered a secondary follow-up prompt to modify the plan. However, instead of updating the plan, the follow-up text is taken as approval to build and it automatically switches to building.

Ask mode, on the other hand, has always explicitly indicated that I need to switch out of ask mode to perform any actions.

This is my experience with Cursor CLI.

evolighting · 2026-03-13T01:01:14 1773363674

Does Codex actually have a Plan Mode, or is there a mode switch I'm missing? I find myself having to manually tell it to 'make a plan' every time.

And if it has directory permissions, sometimes it just skips the confirmation step and starts executing as soon as it thinks the plan is ready.

ianbutler · 2026-03-13T01:22:02 1773364922

cmd-shift-p (at least in vscode)

FergusArgyll · 2026-03-13T01:30:34 1773365434

shift-tab in cli

evolighting · 2026-03-13T02:17:01 1773368221

It actually works, I got "Plan mode (shift+tab to cycle)" in the corner.

Reading the manual, there is also a slash command: /plan switches to Plan mode.

It seems that, unlike OpenCode, Codex doesn't show a notice for mode by default.

harrall · 2026-03-13T02:56:17 1773370577

This applies well if you’re writing code.

But often I am using Claude to investigate a problem like this “why won’t this mDNS sender work” and it needs a bunch of trial and error steps to find the problem and each subsequent step is a brand new unanticipated command.

ramoz · 2026-03-13T00:18:42 1773361122

The OpenCode plan experience has been pretty bad (the community has accepted this, at least on Discord). The community's adopted a handful of plugins to make the experience better, and also guardrail when the agent switches versus doesn't

raincole · 2026-03-13T01:51:24 1773366684

Everyone who uses these tools seriously is running it on YOLO mode. It might sound crazy for someone who just started adopting agentic coding but it's how things are done now. Either that or just hand coding.

The SOTA of permission management is just to git restore when AI fucks up, and to roll back docker snapshot when it fucks up big time.

raw_anon_1111 · 2026-03-13T02:33:49 1773369229

I see nothing wrong with that. If I “fuck up big time” before AI, I would just git restore. There is absolutely nothing on my work computer or personal computer that I couldn’t just throw it in the ocean and within a half a day have everything restored to just like it was - including the data.

raincole · 2026-03-13T02:42:03 1773369723

I didn't say there is something wrong with it. That's how I use it too.

JeremyNT · 2026-03-13T02:16:16 1773368176

Yep, it's easier to ask forgiveness than permission. It's far easier to undo the 1% of the time they fuck up in a serious way than it is to manually audit and allow all the routine stuff.

The key is to only give them access to things you're willing to lose.

This is also why giving them any kind of direct write access to production is a bad idea.

jazzyjackson · 2026-03-13T02:29:29 1773368969

Talk about code smell

If you aren't manually auditing, you only notice the fuck ups when they’re instantaneous

If you don’t trust it to interact with prod, but still trust it to write code that will run on prod… you’re still trusting it with write access to prod.

The only thing I’m willing to let Claude write for me is a static site generator, because static files without JS aren’t going to do any damage, it either loads or it doesn’t.

JeremyNT · 2026-03-13T04:03:25 1773374605

To be clear, I'm not saying you can't (or shouldn't) review the results, only that you can (and should) give the harness the ability to do everything it needs to function without hitting permission barriers that need to be manually approved.

The correct way to run these safely is to sandbox them so real lasting damage is impossible, not to micromanage individual access requests.

raw_anon_1111 · 2026-03-13T02:37:36 1773369456

If you are a team lead or above, do you manually audit every line of code that other developers on your team write even when you are the one that will ultimately be held responsible? Every library you use?

joquarky · 2026-03-13T03:06:54 1773371214

This was fairly routine when the pace of everything was slower, we didn't have a giant tree of dependencies, and companies cared more about product quality.

raw_anon_1111 · 2026-03-13T03:33:29 1773372809

There was never a time that someone wasn’t responsible for more than they could review

__usually__wr · 2026-03-13T06:06:25 1773381985

There was a time when we didn't waste all our cycles coming up with excuses.

raw_anon_1111 · 2026-03-13T12:32:41 1773405161

Right, so a team lead with seven developers - or are you claiming that’s an outrageous scenario back in the old days (mind you I’m 51) - could review every line of code by everyone on his team?

dehrmann · 2026-03-13T02:14:38 1773368078

I was doing something involving API keys and I realized Junie (backed by Sonnet) likes to write helper scripts to try things. And who knows where those scripts look or if they honor .aiignore. Agentic development is a real test of internal access control.

NamlchakKhandro · 2026-03-13T03:43:39 1773373419

Your first mistake is thinking that such childish control mechanisms are helping you.

Gondolin go hard or go home

andoando · 2026-03-13T01:53:47 1773366827

I ran thousands of prompts by now and at most the only issue I had is it deleting code it wrote, which has been easy to recover

unselect5917 · 2026-03-13T05:00:33 1773378033

This is one of the interesting things I've noticed. LLMs are good at natural language, and even at writing novel code. But if you try to get one to do something that's simple and solidly within the discrete math world, like "sort this list of lines by length", it'll fuck it up like a first-time-ever programmer, or just fail the task. Like the longest line will be in some random spot, not even the middle.

I know, it's not really an appropriate use of the tool, but I'm a lazy programmer and used what I had ready access to. And it took like 5 iterations.

Discrete, concrete things like "stop", or "no" is just like... not in its wheelhouse.
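For reference, the deterministic version of the "sort this list of lines by length" task is tiny, which is part of why asking the model to do it directly is a poor fit. A minimal Python sketch (the function name is illustrative):

```python
def sort_lines_by_length(text: str) -> list[str]:
    """Return the lines of text ordered from shortest to longest.

    sorted() is stable, so lines of equal length keep their original order.
    """
    return sorted(text.splitlines(), key=len)


if __name__ == "__main__":
    sample = "ccc\na\nbb\nzz"
    print("\n".join(sort_lines_by_length(sample)))
```

Asking the model to write and run something like this tends to be more reliable than asking it to reorder the lines itself.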

gerdesj · 2026-03-13T01:13:33 1773364413

LLMs are sold on the premise of doing cool stuff and reasonably understanding intent and doing it. The man on the Clapham omnibus would not misinterpret "no" like that.

The LLM asked: "Shall I implement [plan]". The response was "no". The LLM then went on to "interpret" what no referred to and got it wrong.

As you say, it is amusing but people are wiring these things up to bank accounts and all sorts.

I'm looking into using a Qwen3.5 quant to act as a network ... fiddler, for want of a better word but you can be sure I'll be taking rather more care than our errm "hero" (OP).

hollow-moe · 2026-03-13T00:37:17 1773362237

big tech doesn't understand the concept of "consent", this isn't a new thing lol

zephen · 2026-03-13T02:50:02 1773370202

You have to think about the training data, which has much content far outside the context of pure software.

You have all the real life Harvey Weinsteins and Andrew Tates, and you have all the bodice-ripper fiction, and probably lots of other stuff.

Plenty of real-life precedent for the LLM to decide that "no" doesn't really mean "no."

6thbit · 2026-03-13T00:58:43 1773363523

Is this understanding correct: The LLM uses harness tools to ask for permission, then interprets the answer and proceeds.

If so, this can't live 100% on the harness. First because you would need the harness to decide when the model should ask for permission or not which is more of an llm-y thing to do. The harness can prevent command executions but wouldn't prevent this case where model goes off and begins reading files, even just going off using tokens and spawning subagents and such, which are not typically prevented by harnesses at all.

Second because for the harness to know the LLM is following the answer it would need to be able to interpret it and the llm actions, which is also an llm-y thing to do. On this one, granted, harness could have explicit yes/no. I like codex's implementation in plan mode where you select from pre-built answers but still can Tab to add notes. But this doesn't guarantee the model will take the explicit No, just like in OP's case.

I agree with your hunch though, there may be ways to make this work at harness level, I only suspect it's less trivial than it seems. Would be great to hear people's ideas on this.

angry_octet · 2026-03-13T02:01:57 1773367317

Harness needs to intercept all tool calls and compare with an authorisation list. The problem is that this is using already granted core permissions.
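The interception step can be sketched in a few lines of Python. This is a hypothetical harness, not any real tool's API: `ToolCall`, `AUTHORISED`, `is_authorised` and `dispatch` are illustrative names, and the rules are plain glob patterns.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    tool: str      # e.g. "Read" or "Bash"
    argument: str  # e.g. a path or a command line


# Hypothetical authorisation list: (tool, argument glob) pairs.
AUTHORISED = [
    ("Read", "*"),
    ("Bash", "git status*"),
    ("Bash", "git diff*"),
]


def is_authorised(call: ToolCall, rules=AUTHORISED) -> bool:
    """Return True if some rule names the same tool and matches the argument."""
    return any(
        call.tool == tool and fnmatchcase(call.argument, pattern)
        for tool, pattern in rules
    )


def dispatch(
    call: ToolCall,
    execute: Callable[[ToolCall], str],
    ask_user: Callable[[ToolCall], bool],
) -> str:
    """Run the call only if the list allows it or the user approves it."""
    if is_authorised(call) or ask_user(call):
        return execute(call)
    return f"denied: {call.tool}({call.argument})"
```

Note that a prefix glob such as `git status*` would also match `git status && rm -rf build`, so a real harness would have to parse the command rather than pattern-match the raw string.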

So you have to have a tighter set of default scopes, which means approving a whole batch of tool calls, at the harness layer not as chat. This is obviously more tedious.

The answer might be another tool that analyses the tool calls and presents a diagram or list of what would be fetched, sent, read and written. But it would get very hard to truly observe what happens when you have a bunch of POST calls.

So maybe it needs a kind of incremental approval, almost like a series of mini-PRs for each change.

marcus_holmes · 2026-03-13T07:30:52 1773387052

Isn't this part of the same problem we have with LLM security in general; that it can only ingest a single stream of tokens, and has no method of privileging "system" tokens over "untrusted" tokens?

If we could solve this (and forgive me if I'm not aware of recent advances that mean we have solved this) then this problem gets easier to solve; permissions live in the system token stream and are privileged. We can then use the LLM to work out what that means in terms of actions.

ryoshu · 2026-03-13T02:21:04 1773368464

Do not enforce invariants with an LLM. Do not enforce invariants with an LLM. Do not enforce invariants with an LLM. Do not enforce invariants with an LLM.

jazzyjackson · 2026-03-13T02:24:34 1773368674

Thou shalt not make repetitive generic music,

thou shalt not make repetitive generic music,

thou shalt not make repetitive generic music.

Thou shalt not pimp my ride.

Thou shalt not scream if you wanna go faster.

Thou shalt not move to the sound of the wickedness.

Thou shalt not make some noise for Detroit.

When I say "Hey" thou shalt not say "Ho".

When I say "Hip" thou shalt not say "Hop".

When I say, he say, she say, we say, make some noise - kill me.

- Dan le Sac vs Scroobius Pip

alwa · 2026-03-13T05:33:33 1773380013

I have no idea how this ended up here, but after giving it a listen, thank you for the chuckle. I wouldn’t have come across it otherwise.

jazzyjackson · 2026-03-14T21:57:58 1773525478

:) sometimes I post lyrics I’m reminded of, the internet is meant to be about links and hyperlinks, just doing my part to increase connectivity (:

zx13719 · 2026-03-13T02:39:53 1773369593

True, the "no" button should literally abort the tool use and then return an instruction telling the LLM that the user has aborted the action. In some way Claude Code already does so: entering "no" results in tool_abort.
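One way to picture "no" as an enforced state rather than more text for the model to interpret is a harness that records the answer and checks it before every write. A minimal Python sketch; the class and method names are hypothetical, and writes go to an in-memory dict so the example has no side effects:

```python
from enum import Enum, auto


class Approval(Enum):
    PENDING = auto()
    APPROVED = auto()
    DENIED = auto()


class WriteBlocked(Exception):
    """Raised when a write is attempted without user approval."""


class Harness:
    def __init__(self) -> None:
        self.approval = Approval.PENDING

    def answer(self, reply: str) -> str:
        """Record the user's answer and return the tool result for the model."""
        if reply.strip().lower() in {"y", "yes"}:
            self.approval = Approval.APPROVED
            return "user approved the action"
        # Anything other than an explicit yes is treated as a denial.
        self.approval = Approval.DENIED
        return "user aborted the action; stop and wait for instructions"

    def write_file(self, path: str, content: str, files: dict) -> None:
        """Write into the in-memory file store only in the APPROVED state."""
        if self.approval is not Approval.APPROVED:
            raise WriteBlocked(f"write to {path} blocked: {self.approval.name}")
        files[path] = content
```

The model still receives the abort message as text, but the block on writes no longer depends on how it interprets that text.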

0xbadcafebee · 2026-03-13T03:15:30 1773371730

I believe both copilot and gemini have hard-stops for their question prompts. The "no" answer is basically "I will stop and wait for you to tell me what you want".

czhu12 · 2026-03-13T00:41:48 1773362508

It does, when any of these actually try to write to a file, it will ask for permissions. The issue is that it's so annoying to constantly approve correct code that most people just auto accept everything and review later.

inetknght · 2026-03-13T01:10:10 1773364210

> If the UI asks a yes/no question, the “no” should be enforced as a state transition that blocks write actions, not passed back into the model as more text to interpret.

If the UI asks a yes/no question, the UI is broken.

I want more than just yes/no. I want "Why is this needed?", or "I need to fix the invocation for you.", or "Let's use a different design."

giancarlostoro · 2026-03-13T00:55:58 1773363358

There are shortcuts to undo btw.

jazzyjackson · 2026-03-13T02:31:32 1773369092

Can’t believe we got AGI before we figured out reproducible builds, building software in a mutable environment just baffles me

wonnage · 2026-03-13T00:35:29 1773362129

This is the is/ought problem in a nutshell, no amount of compute will reliably solve this problem. Maybe there are some parallels to the halting problem here too.