athrowaway3z · 2026-02-05T18:15:02 1770315302

This benchmark inspired me to have codex/claude build a DnD battlemap tool with SVGs.

They got surprisingly far, but I did need to iterate a few times to have them build tools that would check for things like: don't put walls on roads or water.
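To give an idea of what such a check could look like, here is a minimal sketch in Python. The grid layout, the terrain names, and the `Wall` type are all assumptions made up for illustration, not what the agents actually produced.

```python
from dataclasses import dataclass

# Hypothetical terrain labels for a grid-based battlemap (assumption).
BLOCKED_FOR_WALLS = {"road", "water"}


@dataclass(frozen=True)
class Wall:
    x: int
    y: int


def find_wall_conflicts(terrain: list[list[str]], walls: list[Wall]) -> list[str]:
    """Return one readable message per wall that sits on a forbidden cell."""
    problems = []
    for wall in walls:
        # Report walls outside the grid too, so the agent gets feedback on them.
        if not (0 <= wall.y < len(terrain) and 0 <= wall.x < len(terrain[wall.y])):
            problems.append(f"wall at ({wall.x}, {wall.y}) is outside the map")
            continue
        kind = terrain[wall.y][wall.x]
        if kind in BLOCKED_FOR_WALLS:
            problems.append(f"wall at ({wall.x}, {wall.y}) sits on {kind}")
    return problems


# Rows are y, columns are x.
terrain = [
    ["grass", "road", "grass"],
    ["water", "road", "grass"],
]
walls = [Wall(0, 0), Wall(1, 0), Wall(0, 1), Wall(5, 5)]
conflicts = find_wall_conflicts(terrain, walls)
```

The point of returning messages rather than a boolean is that the agent can read them and fix the map itself.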

What I think might be the next obstacle is self-knowledge. The new agents seem to have picked up ever more vocabulary about their context and compaction, etc.

As a next benchmark you could try taking one agent and telling it to use a coding agent (via tmux) to build you a pelican.

athrowaway3z · 2026-02-04T09:03:37 1770195817

You're defending X/Grok as if it's a public social platform.

It is a privately controlled public-facing group chat. Being a chat-medium does not grant you the same rights as being a person. France isn't America.

If a company operates to the detriment and against the values of a nation, e.g. not paying their taxes or littering in the environment, the nation will ask them to change their behavior.

If there is a conspiracy of contempt, at some point things escalate.

athrowaway3z · 2026-02-03T20:00:57 1770148857

I'm launching a SaaS to create yet another solution to the AI sandboxing problem in Linux.

My friends and I have spent a lot of time quietly injecting support down into the kernel without anybody raising a flag, and we finally have the infrastructure in place to solve this problem.

We have also poisoned all the LLMs' training data with our approach, so our marketing is primed and we won't even need to teach Claude to use our tool.

We’re planning a soft launch this month, or maybe next month. Depending on how "in the vibe" (our new word for flow :) our team gets.

We’re calling it `useradd`.

Yes, the man page is intimidating, and the documentation is terrible. But once you're over the learning curve, it puts your machine into a kind of 'main frame' mode where multiple 'virtual teletypes' and users can operate on the same machine.

DM me if you want a beta key.

---

Sorry for the snark, but I cringe at the monuments to complexity I see people building. At least this solution is relatively simple and free. Still, I don't really see what it buys me.

tasuki · 2026-02-03T20:06:51 1770149211

Well done. It took me all the way up to `useradd`...

Edit: too bad about your edit. The comment was just fine without it.

athrowaway3z · 2026-02-03T21:10:57 1770153057

I wrote my comment to vent my disdain for all the circus projects filled with marketing blurbs and features lists for their overengineered vibeslop.

OP is just sharing the cool utility he found, and how it solved a problem for him.

It felt bad to leave them with the message that they shouldn't have shared it, or that they're a big part of the problem.

senko · 2026-02-03T21:13:03 1770153183

OP here, no worries, loved the comment and appreciate the feeling :)

CuriouslyC · 2026-02-03T21:10:37 1770153037

I get where this is coming from, and it's not a terrible solution, but VMs are still better in terms of security and isolation. Typical workstation systems are not designed to be secure from their own users, and frontier models are going to get scary good at cracking systems soon.

carsoon · 2026-02-03T21:59:03 1770155943

Fully sandboxed VMs are more secure, but not everyone is looking for the most secure option. They are looking for the option that works best for them. I want to be able to share my development environment with the agent. I have a project with thirty 1 GB SQLite databases and one 30 GB SQLite database. I back them up daily and they can all be reconstructed from the code, but it takes a long time. When things change I don't want to have to copy them into a separate VM, bloating my storage and using excess resources, and then have to reconcile them. I want to share the same environment with my agent so we can work side by side.

I would rather just have the agent not accidentally delete files outside of its working environment, but I am not worried about malicious prompt injection or someone stealing my code.

For me I see the LLM as a dumb but positive actor that is trying to do its best but sometimes makes mistakes, so I want to put training wheels on it while still allowing it to share my working space.

mystifyingpoi · 2026-02-03T20:24:11 1770150251

`useradd` doesn't restrict network access.

kaffekaka · 2026-02-03T20:30:26 1770150626

I have used a separate user, but lately I have been using rootless podman containers instead for this reason. But I know too little about container escapes. So I am thinking about a combination.

Would a podman container run by a separate user provide any benefit over the two by themselves?

eikenberry · 2026-02-04T00:28:51 1770164931

Without any credentials does network access matter?

senko · 2026-02-03T20:41:50 1770151310

I love using different users for separating services I run on the same box!

For development, I want to be able to access/run/modify/delete the files alongside the AI agent. This can be done if groups and group permissions are set correctly (and the agent correctly chmods everything...), but that feels more fiddly than just isolating it with bubblewrap, systemd, or whatever, and preserving the uid/gid.

Just my 2c - it's great that we have options!

necovek · 2026-02-03T22:55:36 1770159336

Hey Senko, did you consider using ZFS or BTRFS snapshotting feature to simplify some of the things you need?

For GH auth tokens, you could also pull them outside the sandbox: have the agent push to a local clone exposed to the host, and have the host (with no agent) automatically push on inotify events inside the repo. E.g. the agent has access to your /agents/scratchpad/my-git-repo, and the sync to the actual git hosting service like GH (or Launchpad ;) happens with a simple script outside it.

athrowaway3z · 2026-02-02T05:47:10 1770011230

> sandboxing agents is difficult

I use this amazingly niche and hipster approach of giving the agent its own account, which through inconceivably highly complex arcane tweaking and configuration can lock down what it can and can't do.
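For anyone who wants to see it spelled out, here is a rough sketch in Python of running a command as a dedicated account. It assumes an account named `agent` was already created (for example with `useradd --create-home agent`) and that the caller is root; the account name, the `PATH` value, and both helper names are assumptions for illustration only.

```python
import pwd
import subprocess


def agent_env(username: str, home: str) -> dict[str, str]:
    """Build a minimal environment so nothing leaks from the caller's session."""
    return {
        "HOME": home,
        "USER": username,
        "LOGNAME": username,
        "PATH": "/usr/local/bin:/usr/bin:/bin",  # assumption: adjust to taste
    }


def run_as_agent(username: str, argv: list[str]) -> subprocess.CompletedProcess:
    """Run argv as `username`. Needs root, otherwise the uid switch fails."""
    account = pwd.getpwnam(username)  # KeyError if the account does not exist
    return subprocess.run(
        argv,
        user=account.pw_uid,
        group=account.pw_gid,
        extra_groups=[account.pw_gid],  # replace the caller's supplementary groups
        cwd=account.pw_dir,
        env=agent_env(username, account.pw_dir),
        check=False,
    )


# Example (as root): run_as_agent("agent", ["id"])
```

After that, ordinary file permissions decide what the process can read or write; the sketch does nothing about network access.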

---

Can somebody for the love of god tell me why articles keep claiming this is so difficult?

NitpickLawyer · 2026-02-02T07:13:24 1770016404

I have antigravity in its own account and that has worked pretty well so far. I also use devcontainers for the cli agents and that has also worked out well. It's one click away in my normal dev flow (I was using this anyway before for python projects).

chrisjj · 2026-02-02T10:41:41 1770028901

Why? Because the purported benefit is to make everything easier and faster. Not safe.

fragmede · 2026-02-02T06:58:26 1770015506

It's a bunch of work, that takes a bunch of time, and I want it nowwwww-owwwww!

...is how I imagine that conversation goes.

athrowaway3z · 2026-02-01T13:52:57 1769953977

The solution to the security issue is using `useradd`.

I would add subagents though. They allow for the pattern where the top agent directs/observes a subagent executing a step in a plan.

The top agent is better at directing a subagent, and delegating keeps its context clean of details that don't matter; otherwise they'd be in the same step in the plan.
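A minimal sketch of that pattern in Python, assuming an agent is just a function from prompt to text. The `Agent` type, `run_plan`, and the stand-in functions are hypothetical and not part of pi or any other harness.

```python
from typing import Callable

Agent = Callable[[str], str]  # hypothetical: takes a prompt, returns text


def run_plan(steps: list[str], subagent: Agent, summarize: Agent) -> list[str]:
    """Delegate each step to a subagent and keep only a short summary of it."""
    context: list[str] = []
    for step in steps:
        transcript = subagent(step)  # the noisy detail never enters `context`
        context.append(summarize(f"Step: {step}\nResult:\n{transcript}"))
    return context


# Stand-ins so the sketch runs without any model behind it.
def noisy_subagent(step: str) -> str:
    return f"{step}: " + "log line\n" * 100


def first_line(text: str) -> str:
    return text.splitlines()[0]


context = run_plan(["write tests", "fix bug"], noisy_subagent, first_line)
```

Only the summaries accumulate at the top level, which is the "clean context" part.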

badlogic · 2026-02-01T14:15:12 1769955312

There are lots of ways of doing subagents. It mostly depends on your workflow. That's why pi doesn't ship with anything built in. It's pretty simple to write an extension to do that.

Or you use any of the packages people provide, like this one: https://github.com/nicobailon/pi-subagents

athrowaway3z · 2026-02-02T21:36:03 1770068163

Yeah, I jumped on board.

Built https://github.com/offline-ant/pi-tmux and am happy with it.

CuriouslyC · 2026-02-02T01:23:48 1769995428

The simple approach is great, chef's kiss, don't change a thing. Orchestration at the harness level tends not to be great anyhow, it's not built for the type of review that's needed.

athrowaway3z · 2026-02-01T13:10:09 1769951409

I don't understand what these two have to do with anything. The DB use is almost trivial, and SQLite can be embedded. Why would we want wasted effort and configuration complexity on supporting Postgres?

flammafex · 2026-02-01T13:23:31 1769952211

[flagged]

prmoustache · 2026-02-01T13:29:27 1769952567

With that kind of logic you wouldn't need headscale and would just ask your favorite LLM to write a similar tool for you with your own requirements and nothing else.

flammafex · 2026-02-01T13:49:39 1769953779

No, it's not really necessary to extrapolate the logic any further. You have deemed a very specific and focused task "wasted effort." So the logic leads to putting in the effort you do not find "wasteful" and outsourcing the remainder to the LLM to do this very specific thing.

athrowaway3z · 2026-01-31T18:46:08 1769885168

I'm very critical of all the schemes proposed but this is just a fundamental misconception on your part.

> If there was a legitimate drive to protect kids from the worst of the Internet

As with any disease, the impact heavily depends on virality.

The worst the internet has to offer to children is not the gore or porn for the few that look for it (usually individually). The worst it does to children is the attention algorithm that captures practically everybody.

pfdietz · 2026-01-31T18:53:01 1769885581

"But think of the children" has always been the go-to excuse for tossing freedom out the window.

expedition32 · 2026-01-31T21:56:18 1769896578

Children are the survival of the species; our DNA wires us to protect them.

machomaster · 2026-01-31T23:23:20 1769901800

That's why people need to be especially careful when others try to use such effective methods of manipulation.

Noaidi · 2026-01-31T19:00:16 1769886016

So in this case, do we just stop thinking about the children in totality?

rudhdb773b · 2026-02-01T01:40:12 1769910012

In the context of government legislation on personal behavior, yes.

Parents should be the ones setting up rules for their children.

slavik81 · 2026-01-31T19:08:03 1769886483

If manipulative algorithms are the problem, then perhaps we should consider regulations that would protect everyone.

XorNot · 2026-01-31T20:41:06 1769892066

Exactly. The problem is no one wants to address that maybe some of these business models just need to go extinct.

Like maybe ad supported infinite feeds can't be done in a socially responsible way and just need to be banned. If that takes down or substantially limits certain web service sizes...so be it.

hn_throwaway_99 · 2026-01-31T19:20:12 1769887212

While I agree with this, I also find that the "but think of the children" ironic retort also usually ignores the very real problems that technology can cause children (and society at large). In this issue in particular, if banning social media for children makes it less likely for adults to use it, I see it as pretty much a win-win.

rudhdb773b · 2026-02-01T01:36:50 1769909810

Would you also want the government to ban junk food and recreational drugs? What about unprotected premarital sex?

I'd much rather live in a society with personal freedoms than a "healthier" one with government mandates on personal behavior.

hn_throwaway_99 · 2026-02-01T05:12:31 1769922751

Literally every society mandates tons of restrictions for children, because we understand that children aren't yet developed enough to be able to understand the full consequences of personal freedoms.

rudhdb773b · 2026-02-01T08:24:30 1769934270

I agree that children need restrictions, but that's the role of their parents, not the government.

GreenWatermelon · 2026-02-01T12:26:40 1769948800

Should it also be the role of parents to prevent their children from being kidnapped by crime syndicates? Maybe we should also abolish schools because it should be the parents' role to educate their children.

This individualistic line of thinking is downright insane. It's preposterous. We live in a fucking society, no one can do anything on their own. For God's sake, parents shouldn't be expected to fight alone against MULTI-FUCKING TRILLION CORPORATIONS.

Fat load of help all that anti-regulation talk did when the current US Gov can just get all the data it wants from those megacorps.

Yeah, let's also abolish laws preventing the sale of tobacco and alcohol to children. This will surely lead to a prosperous nation.

No wonder the US is collapsing as we speak.

rudhdb773b · 2026-02-01T12:51:14 1769950274

> No wonder the US is collapsing as we speak.

If you look at the long-term trend in government intrusion into our personal lives, you'll see it's largely increasing, so if anything, the cause of any "collapse" would be the opposite of what you're purporting.

athrowaway3z · 2026-01-31T08:12:11 1769847131

> there is no universal measure of what makes a codebase good or great.

We practically found one though. The measure is: how well can an AI operate in/with your codebase.

I regularly find myself wondering if skeptics throwing around their empirical failure are obscuring their bad code/docs/setup.

athrowaway3z · 2026-01-29T10:34:04 1769682844

You can get pretty decent initial results if you explicitly tell them to first make a detailed description with exact coordinates and then feed the description back into them to build the SVG.
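Roughly, the two passes look like this. It is a sketch in Python where `ask_model` stands in for whatever model call you use; the prompt wording, the 200x200 canvas, and the `fake_model` stub are assumptions for illustration.

```python
from typing import Callable


def two_pass_svg(subject: str, ask_model: Callable[[str], str]) -> str:
    """Ask for a coordinate-level description first, then for the SVG itself."""
    description = ask_model(
        f"Describe {subject} as a numbered list of shapes. "
        "Give exact coordinates and sizes on a 200x200 canvas. Do not write SVG yet."
    )
    return ask_model(
        "Write a complete SVG document that implements exactly this description, "
        f"using the same coordinates:\n{description}"
    )


# Stand-in model so the sketch runs offline; it only records what it was asked.
seen_prompts: list[str] = []


def fake_model(prompt: str) -> str:
    seen_prompts.append(prompt)
    if len(seen_prompts) == 1:
        return "1. circle, center (100, 120), radius 40"
    return '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 200"></svg>'


svg = two_pass_svg("a pelican", fake_model)
```

The second prompt carries the full description from the first pass, so the model is only translating coordinates into markup rather than planning and drawing at once.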

athrowaway3z · 2026-01-27T14:56:57 1769525817

I do wonder if the compression step makes sense at this layer instead of the filesystem layer.

aabbcc1241 · 2026-01-28T00:16:09 1769559369

Interesting take. I'm using btrfs (instead of ext4) with compression enabled (using zstd), so most of the files are compressed "transparently": the files appear as normal files to the applications, but on disk they are compressed, and the applications don't need to do the compression/decompression.