Quick note - be careful of gendering & anthropomorphising large language models. You’re talking to a non-human machine, so be wary of how that framing can affect your mindset.
Anthropic, both in their name and in their model cards, aggressively anthropomorphize their models.
You probably should start doing it. Ghost in the Shell is about a superintelligent AI creating a "ghost" (a scientifically understood version of the soul) out of thin air, and I believe such a thing is possible. The same film literally predicted model merging (at the end, the AI merges with the Major) to a tee.
Further, the appearance of sentience/cognition/consciousness might as well be identical to actual sentience/cognition/consciousness. That is to say, we can't know whether you're a P-zombie or not. Blade Runner and most other cyberpunk is coming, and it's going to hit you and every other AI-denialist in the face. The Voight-Kampff test is absurd and notoriously inaccurate in that universe for a reason.
I tell my LLM it's a good bot and thank it, because even a tiny risk of subjective qualia experienced by a model (and again, Anthropic themselves believe in this exact risk) means I should treat it like a quasi-ethical actor.
This is also a reason why the droid torture scene in Return of the Jedi could be a real dynamic in the future.
The possibility of intelligent machines undergoing transformative regeneration actually dates back to the parties hosted by Charles Babbage; Charles Darwin attended one, and only thereafter published On the Origin of Species.
Great to see - it’ll democratize remote access to CLI coding agents.
I’ve been iterating the past few months on a way to use Claude Code from my phone while it runs on my laptop, and it’s a lot of moving parts: Tailscale, git worktrees, tmux, an always-on “caffeinate” process, and a ton of hooks & tweaks to fix bugs along the way. It’s become very comfortable, but in the process it’s become impossible for anyone but me to understand.
But it’s awesome because I own the machine that runs tests and am not paying monthly for anything but Claude Max - and it keeps going if I lock my phone or go into a cell reception dead zone.
Productising such a thing would be a very interesting challenge indeed.
Passing tests in your repo are great documentation of the tool at a microscopic level. And rerunning tests only burns tokens on failures (since passed tests just print a dot), so it’s token-efficient too.
Some other neat tricks:
- For greater efficiency, configure your test runner to print nothing (not even a dot/filename) for test successes. Agents don’t need progress dots, only the exit code & failure details
- Have your agent implement a 10ms timeout per test (pytest has hooks to do this; see the sketch after this list). The agent will see tests time out and mock out all I/O and third-party code - why test what one assumes third parties tested already? The result is a test suite that is CPU-bound, has no shared database, no shared data, and no tests that interfere with or depend on each other, so tests can run in parallel.
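A minimal sketch of the pytest side, assuming a conftest.py at the repo root. This is a post-hoc budget check via an autouse fixture rather than a hard kill (the pytest-timeout plugin is an option if you need to interrupt genuinely hung tests):

    # conftest.py - flag any test that exceeds a 10ms budget (sketch)
    import time
    import pytest

    @pytest.fixture(autouse=True)
    def time_budget():
        start = time.perf_counter()
        yield  # the test body runs here
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > 10:
            pytest.fail(f"test exceeded 10ms budget ({elapsed_ms:.1f}ms)")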
I'm OK with longer-running tests because I always run them against a real database (often SQLite, sometimes PostgreSQL) and real files created in temporary directories, but I can see how the time limit might be useful for tests that don't need those kinds of components.
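For what it's worth, that pattern is cheap to set up. A sketch using pytest's built-in tmp_path fixture and the standard-library sqlite3 module (the schema here is just illustrative):

    # conftest.py sketch: a real SQLite database in a per-test temp directory
    import sqlite3
    import pytest

    @pytest.fixture
    def db(tmp_path):
        conn = sqlite3.connect(str(tmp_path / "test.db"))
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
        yield conn  # tests get a real database, no mocks
        conn.close()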
This is understandable - wanting everything to be point, click, and go. But I doubt your mindset matches that of the community, so unfortunately it may take a while…
Maybe try something more commercial like Zorin OS?
Abstractions can take things away, but many add tremendous value.
For example, the author has coded for their entire career on silicon-based CPUs but never had to deal with the shittiness of wire-wrapped memory, where a bit-flip might happen in one place because of a manufacturing defect - good luck tracking that down. Ever since lithography and modern CPU packaging, the CPU has been protected from the elements; its thermal limits are well known, computed ahead of time, and baked into thermal management, so it doesn’t melt yet still runs as fast as we understand to be possible for its size. And we make billions of these every day, and have for over 50 years.
Moving up the stack: you can move your mouse “just so” and click, with no need to bit-twiddle the USB port (and we could talk about USB negotiation or the many other things that happen along the way). Your click gets translated into an action, and you can do this hundreds of times a day without disturbing your flow.
Or JavaScript JIT compilation, where the JS engine watches code run and emits faster versions of it that make assumptions about the types of variables - with escape hatches if the code stops behaving predictably, so you don’t get confusing bugs that only happen when the browser has JITted some code. Python has something similar (the specializing adaptive interpreter, since 3.11). Thanks to these JIT engines you can write ergonomic code that in the typical scenario is fast enough for your users and gets faster with each new language release, with no code changes.
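To gesture at the idea (a toy sketch in Python, nothing like a real engine’s tiered machine code): observe the argument types on the first call, take a fast path while the assumption holds, and “deoptimize” back to the generic path when it breaks:

    # toy sketch of type specialization with an escape hatch (illustrative only)
    def specializing(fn):
        assumed = None  # argument types observed on the first call
        def wrapper(a, b):
            nonlocal assumed
            observed = (type(a), type(b))
            if assumed is None:
                assumed = observed       # warm-up: record the type assumption
            if observed == assumed:
                return fn(a, b)          # fast path: assumption still holds
            assumed = None               # escape hatch: deoptimize
            return fn(a, b)              # generic path
        return wrapper

    @specializing
    def add(a, b):
        return a + b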
Let’s talk about the decades of research that went into autoregressive transformer models, instruction tuning, RLHF, and then chat harnesses. You type to a model and get a response back because, behind the scenes, your message is prefixed with “User: “, triggering latent capabilities in the model to hold its end of a conversation. Scale that up, call it a “low-key research preview”, and you have ChatGPT. Wildly simple idea, massive implications.
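A minimal sketch of that harness idea (the role labels and trailing “Assistant:” cue are illustrative - real chat formats use model-specific special tokens):

    # toy chat harness: a raw model only continues text; the harness frames
    # your message as a transcript so the model replies in-role
    def build_prompt(history, user_msg):
        lines = [f"{role}: {text}" for role, text in history]
        lines.append(f"User: {user_msg}")
        lines.append("Assistant:")  # cue the model to continue as the assistant
        return "\n".join(lines)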
These abstractions take you further from the machine, and yet they were adopted en masse. You have to account for the ruthless competition out there - each one would have been eliminated if it hadn’t proven to be worth something.
You’ll never understand the whole machine, so work at the level you’re comfortable with and peer behind the curtain if and when you need to (e.g. when optimizing or debugging).
This is nothing new; business gotta pay for itself after all.
But ads don’t have to ruin a great company.
A century or more ago, top-tier journalistic institutions created norms putting strong barriers between the reporting and advertising sides of the house. That kept readers’ trust and made journalism a sustainable long-term business.
So it’s mostly Google that couldn’t keep its hands out of the cookie jar (not solely Google, but they’re an industry leader). It really doesn’t have to go south - it’s not the default - but Google did set the tone for Silicon Valley in exactly the way wise journalism leaders did for their industry in the late 1800s. If OpenAI takes a long-term view they’ll follow the journalism industry’s model instead of the cookie-jar model - but they have to believe, deep down, that customer trust is worth more than ad dollars in the long run.
There are reasons to hope: OpenAI has more and fiercer competition than Google did, including Chinese competitors that can’t be lobbied away. Qwen, DeepSeek, Mistral and Kimi all have free chat UIs!