
> For this reason, programming languages, at least how we understand them today, have reached a terminal state. I could easily make a new language now, especially with the help of Claude Code et al, but there would never be any reason for any other engineer to use it.

This is an interesting opinion.

I feel we are nowhere near the terminal state for programming languages. Just as we didn't stop inventing math after arithmetic, we will always need to invent higher abstractions to help us reason beyond what is concrete (for example, imaginary numbers). A lot of electrical engineering wouldn't be possible to reason about without imaginary numbers. So new higher abstractions -- always necessary, in my opinion.

That said, I feel your finer point resonates -- that new languages might not need to be constrained by the limitations of human ergonomics. In fact, this opens up a new space of languages that can transcend human intuition because they are not written for humans to comprehend (yet are provably "correct").

As engineers, we care about human intuition because we are concerned about failure modes. But what if we could evolve along a more theoretical direction, similar to the one Haskell took? Haskell is basically "executable category theory" with some allowances for humans. Even with those tradeoffs, Haskell remains hard for most humans to write, but what if we could create a better Haskell?

Then, farther along, what if we created a Lean-adjacent language, not for mathematical proofs, but for writing programs? We could fold in formal-methods (TLA+) style thinking. Today, formal methods give you a correctness proof, but it is disconnected from the implementation (AWS uses TLA+ for modeling distributed systems, but the final code was not generated from TLA+, so there's a disconnect). What if one day we could write a spec, generate a TLA+ proof from it, and then use that to generate code?

In this world, the code generator is simply a compiler -- from mathematically rigorous spec to machine code.

(that said, I have to check myself. I wonder what this would look like in a world full of exceptions, corner cases, and ambiguities that cannot be modeled well? Tukey's warning comes to mind: "Far better an approximate answer to the right question, than an exact answer to the wrong question")


This could be a language-specific failure mode. C++ is hard for humans too, and the C++ training code out there is very uneven (most of it pre-C++11, much of it written by non-craftspeople to do very specific things).

On the other hand, LLMs are great at Go because Go was designed for average engineers at scale, and LLMs behave like fast average engineers. Go as a language was designed to support minimal cleverness (there are only so many ways to do things, and abstractions are constrained). This kind of uniformity is catnip for LLM training.


I've been doing spec-driven development for the past 2 months, and it's been a game changer (especially with Opus 4.5).

Writing a spec is akin to "working backwards" (or future backwards thinking, if you like) -- this is the outcome I want, how do I get there?

The process of writing the spec actually exposes the edge cases I didn't think of. It's very much in the same vein as "writing as a tool of thought". Just getting your thoughts and ideas onto a text file can be a powerful thing. Opus 4.5 is amazing at pointing out the blind spots and inconsistencies in a spec. The spec generator that I use also does some reasoning checks and adds property-based test generation (Python Hypothesis -- similar to Haskell's QuickCheck), which anchors the generated code to reality.
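To make "property-based test" concrete, here is a minimal sketch of what such a test looks like with Hypothesis. The function and its invariants are invented for illustration; a real spec would pin down domain-specific properties.

```python
# Minimal sketch of property-based testing with Hypothesis.
# normalize_whitespace and its properties are made-up examples.
from hypothesis import given, strategies as st

def normalize_whitespace(s: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return " ".join(s.split())

@given(st.text())
def test_idempotent(s):
    # Property: normalizing twice is the same as normalizing once.
    once = normalize_whitespace(s)
    assert normalize_whitespace(once) == once

@given(st.text())
def test_trimmed(s):
    # Property: output never has leading or trailing whitespace.
    out = normalize_whitespace(s)
    assert out == out.strip()
```

Hypothesis generates hundreds of adversarial inputs per property, which is what anchors generated code to reality: the model can't pattern-match its way past an invariant it has to satisfy on arbitrary inputs.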

Also, I took to heart Grant Slatton's "Write everything twice" [1] heuristic -- write your code once, solve the problem, then stash it in a branch and write the code all over again.

> Slatton: A piece of advice I've given junior engineers is to write everything twice. Solve the problem. Stash your code onto a branch. Then write all the code again. I discovered this method by accident after the laptop containing a few days of work died. Rewriting the solution only took 25% the time as the initial implementation, and the result was much better. So you get maybe 2x higher quality code for 1.25x the time — this trade is usually a good one to make on projects you'll have to maintain for a long time.

This is effective because initial mental models of a new problem are usually wrong.

With a spec, I can get a version 1 out quickly and (mostly) correctly, poke around, and then see what I'm missing. Need a new feature? I tell Opus to update the spec first, then code it.

And here's the thing -- if you don't like version 1 of your code, throw it away but keep the spec (those are your learnings and insights). Then generate a version 2 free of any sunk-cost bias, which, as humans, we're terrible at resisting.

Spec-driven development lets you "write everything twice" (throwaway prototypes) faster, which improves the quality of your insights into the actual problem. I find this technique lets me 2x the quality of my code, through sheer mental model updating.

And this applies not just to coding, but most knowledge work, including certain kinds of scientific research (s/code/LaTeX/).

[1] https://grantslatton.com/software-pathfinding


My experience with both Opus and GPT-codex is that they both just forget to implement big chunks of specs unless you give them the means to self-validate their spec conformance. I'm finding myself sometimes spending more time coming up with tooling to enable this than doing the actual work.

The key is generating a task list from the spec. Kiro IDE (not cli) generates tasks.md automatically. This is a checklist that Opus has to check off.

Try Kiro. It's just an all-round excellent spec-driven IDE.

You can still use Claude Code to implement code from the spec, but Kiro is far better at generating the specs.

p.s. if you don't use Kiro (though I recommend it), there's a new way too -- Yegge's beads. After you install it, prompt Claude Code to `write the plan in epics, stories and tasks in beads`. Opus will -- through tool use -- ensure every bead is implemented. But this is a higher-variance approach, whereas Kiro is much more systematic.


I’ve even built my own todo tool in zig, which is backed by SQLite and allows arbitrary levels of todo hierarchy. Those clankers just start ignoring tasks or checking them off with a wontfix comment the first time they hit adversity. Codex is better at this because it keeps going at hard problems. But then it compacts so many times over that it forgets the todo instructions.

I just use beads or Github issues. Plan/spec first, split it into issues.

Then reset context and implement each task one by one. Nothing gets forgotten.


A key part of this is making sure bite sized issues reference any related holistic concerns like code quality, testing, documentation style, commit strategy, git workflow etc.

In my experience you have to refer to the relevant docs on things explicitly in every single issue for it to work well.


Rust, Go and TypeScript are good bets.

Python too -- hear me out. With spec-driven development to anchor things, coupled with property-based tests (PBT) using Hypothesis, it's great for prototyping problems.

You wouldn't write mission critical stuff with it, but it has two advantages over so-called "better designed languages": a massive ecosystem and massive training data.

If your problem involves manipulating dataframes (polars, pandas), plotting (seaborn), and machine learning, Python just can't be beat. You can try using an LLM to generate Rust code for this -- go ahead and try it -- and you'll see how bad it can be.
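As a tiny illustration of that ecosystem leverage, the whole load-aggregate-plot loop is a few lines of idiomatic Python. The data here is invented, and the seaborn call is commented out to keep the sketch self-contained:

```python
import pandas as pd

# Invented example data: ride fares by city.
df = pd.DataFrame({"city": ["NYC", "NYC", "SF"], "fare": [12.5, 9.0, 15.0]})

# One-line aggregation; the equivalent in most languages is real work.
summary = df.groupby("city", as_index=False)["fare"].mean()
print(summary)

# Plotting would be one more line, e.g.:
# import seaborn as sns; sns.barplot(data=summary, x="city", y="fare")
```

It's this density of well-trodden library calls -- not the language design itself -- that LLMs reproduce so reliably.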

Better ecosystems and better training data can beat better languages in many problem domains.


> You wouldn't write mission critical stuff with it

People do, and they also write mission critical stuff in Lua, TCL, Perl, and plenty of other languages. What they generally won't do is write performance critical stuff in those languages. But there is definitely critical communication infrastructure out there running on interpreted languages like these.


For analytic queries, yes, a single SQL query often beats many small ones. The query optimizer is allowed to see more opportunities to optimize and avoid unnecessary work.

Most SQLite queries however, are not analytic queries. They're more like record retrievals.

So hitting a SQLite table with 200 "queries" is similar to hitting a webserver with 200 "GET" commands.
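A hypothetical sketch with Python's built-in sqlite3 module (table and data invented) of the two access patterns: point lookups that behave like GETs, versus one analytic statement the planner gets to see as a whole:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(i, i * 2) for i in range(1000)])

# Record retrieval: 200 small indexed lookups, like 200 GETs.
# Cheap in SQLite because there's no network round trip per query.
vals = [conn.execute("SELECT val FROM kv WHERE id = ?", (i,)).fetchone()[0]
        for i in range(200)]

# Analytic query: one statement the optimizer plans end to end.
total = conn.execute("SELECT SUM(val) FROM kv WHERE id < 200").fetchone()[0]

assert total == sum(vals)  # same answer either way
```

In a client/server database, the 200 lookups would pay 200 round trips; in-process SQLite makes that pattern cheap, which is part of why it works so well as an application file format.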

In terms of ergonomics, SQLite feels more like an application file format with a SQL interface (though it is an embedded relational database).

https://www.sqlite.org/appfileformat.html


> The query optimizer is allowed to see more opportunities to optimize and avoid unnecessary work.

Let's also not forget that db servers can have a memory, in that they can tweak query optimization based on previous queries or scans or whatever state is relevant. SQLite has no memory, in that sense. All query optimizations it makes are based solely upon the single query being processed.


I'm not entitled to your time of course, but would you mind describing how?

All I know is that beads is supposed to help me retain memory from one session to the next. But I'm finding myself having to curate it like a git repo (and I already have a git repo). Also, it's quite tied to GitHub, which I cannot use at work. I want to use it, but I feel I need to see how others use it to understand how to tailor it to my workflow.


Probably the wrong attitude here: beads is infra for your coding agents, not you. The most I directly interact with it is by invoking `bd prime` at the start of some sessions if the LLM hasn't gotten the message, and maybe very occasionally running `bd ready` -- but really it's a planning tool and work scheduler for the agents, not the human.

What agent do you use it with, out of curiosity?

At any rate, to directly answer your question, I used it this weekend like this:

“Make a tool that lets me ink on a remarkable tablet and capture the inking output on a remote server; I want that to send off the inking to a VLM of some sort, and parse the writing into a request; send that request and any information we get to nanobanana pro, and then inject the image back onto the remarkable. Use beads to plan this.”

We had a few more conversations, but got a workable v1 out of this five hours later.


To use it effectively, I spend a long time producing FSDs (functional specification documents) to exhaustively plan out new features or architecture changes. I'll pass those docs back and forth between gemini, codex/chatgpt-pro, and claude. I'll ask each one something similar to the following (credit to https://github.com/Dicklesworthstone for clearly laying out the utility of this workflow; these next few quoted prompts are verbatim from his posts on X):

"Carefully review this entire plan for me and come up with your best revisions in terms of better architecture, new features, changed features, etc. to make it better, more robust/reliable, more performant, more compelling/useful, etc.

For each proposed change, give me your detailed analysis and rationale/justification for why it would make the project better along with the git-diff style changes relative to the original markdown plan".

Then, the plan generally improves iteratively. Sometimes it can get overly complex, so I may ask them to take it down a notch from Google scale. Anyway, when the FSD doc is good enough, the next step is to prepare to create the beads.

At this point, I'll prompt something like:

"OK so please take ALL of that and elaborate on it more and then create a comprehensive and granular set of beads for all this with tasks, subtasks, and dependency structure overlaid, with detailed comments so that the whole thing is totally self-contained and self-documenting (including relevant background, reasoning/justification, considerations, etc.-- anything we'd want our "future self" to know about the goals and intentions and thought process and how it serves the over-arching goals of the project.) Use only the `bd` tool to create and modify the beads and add the dependencies. Use ultrathink."

After that, I usually even have another round of bead checking with a prompt like:

"Check over each bead super carefully-- are you sure it makes sense? Is it optimal? Could we change anything to make the system work better for users? If so, revise the beads. It's a lot easier and faster to operate in "plan space" before we start implementing these things! Use ultrathink."

Finally, you'll end up with a solid implementation roadmap all laid out in the beads system. Now, I'll also clarify: the agents got much better at using beads this way once I took the time to have them create SKILLS for beads to refer to. Also important is ensuring AGENTS.md, CLAUDE.md, and GEMINI.md have some info referring to its use.

But once the beads are laid out, it's just a matter of figuring out: do you want sequential implementation with a single agent, or parallel agents? Effectively using parallel agents with beads would require another chapter to this post, but essentially you just need a decent prompt clearly instructing them not to run over each other. Also, if you are building something complex, you need test guides and standardization guides written for the agents to refer to, in order to keep the code quality at a reasonable level.

Here is a prompt I've been using as a multi-agent workflow base; with it, I've had agents keep working for 8 hours without stopping:

EXECUTION MODE: HEADLESS / NON-INTERACTIVE (MULTI-AGENT) CRITICAL CONTEXT: You are running in a headless batch environment. There is NO HUMAN OPERATOR monitoring this session to provide feedback or confirmation. Other agents may be running in parallel. FAILURE CONDITION: If you stop working to provide a status update, ask a question, or wait for confirmation, the batch job will time out and fail.

  YOUR PRIMARY OBJECTIVE: Maximize the number of completed beads in this single session. Do not yield control back to the user until the entire queue is empty or a hard blocker (missing credential) is hit.

  TEST GUIDES: please ingest @docs/testing/README.md, @docs/testing/golden_path_testing_guide.md, @docs/testing/llm_agent_testing_guide.md, @docs/testing/asset_inventory.md, @docs/testing/advanced_testing_patterns.md, @docs/testing/security_architecture_testing.md
  STANDARDIZATION: please ingest @docs/api/response_standards.md @docs/event_layers/event_system_standardization.md
─────────────────────────────────────────────────────────────────────────────── MULTI-AGENT COORDINATION (MANDATORY) ───────────────────────────────────────────────────────────────────────────────

  Before starting work, you MUST register with Agent Mail:

  1. REGISTER: Use macro_start_session or register_agent to create your identity:
     - project_key: "/home/bob/Projects/honey_inventory"
     - program: "claude-code" (or your program name)
     - model: your model name
     - Let the system auto-generate your agent name (adjective+noun format)

  2. CHECK INBOX: Use fetch_inbox to check for messages from other agents.
     Respond to any urgent messages or coordination requests.

  3. ANNOUNCE WORK: When claiming a bead, send a message to announce what you're working on:
     - thread_id: the bead ID (e.g., "HONEY-2vns")
     - subject: "[HONEY-xxxx] Starting work"
─────────────────────────────────────────────────────────────────────────────── FILE RESERVATIONS (CRITICAL FOR MULTI-AGENT) ───────────────────────────────────────────────────────────────────────────────

  Before editing ANY files, you MUST:

  1. CHECK FOR EXISTING RESERVATIONS:
     Use file_reservation_paths with your paths to check for conflicts.
     If another agent holds an exclusive reservation, DO NOT EDIT those files.

  2. RESERVE YOUR FILES:
     Before editing, reserve the files you plan to touch:
     ```
     file_reservation_paths(
       project_key="/home/bob/Projects/honey_inventory",
       agent_name="<your-agent-name>",
       paths=["honey/services/your_file.py", "tests/services/test_your_file.py"],
       ttl_seconds=3600,
       exclusive=true,
       reason="HONEY-xxxx"
     )
     ```

  3. RELEASE RESERVATIONS:
     After completing work on a bead, release your reservations:
     ```
     release_file_reservations(
       project_key="/home/bob/Projects/honey_inventory",
       agent_name="<your-agent-name>"
     )
     ```

  4. CONFLICT RESOLUTION:
     If you encounter a FILE_RESERVATION_CONFLICT:
     - DO NOT force edit the file
     - Skip to a different bead that doesn't conflict
     - Or wait for the reservation to expire
     - Send a message to the holding agent if urgent
─────────────────────────────────────────────────────────────────────────────── THE WORK LOOP (Strict Adherence Required) ───────────────────────────────────────────────────────────────────────────────

* ACTION: Immediately continue to the next bead in the queue and claim it

  For every bead you work on, you must perform this exact cycle autonomously:

   1. CLAIM (ATOMIC): Use the --claim flag to atomically claim the bead:
      ```
      bd update <id> --claim
      ```
      This sets BOTH assignee AND status=in_progress atomically.
      If another agent already claimed it, this will FAIL - pick a different bead.

        WRONG: bd update <id> --status in_progress  (doesn't set assignee!)
        RIGHT: bd update <id> --claim                (atomic claim with assignee)

   2. READ: Get bead details (bd show <id>).

   3. RESERVE FILES: Reserve all files you plan to edit (see FILE RESERVATIONS above).
      If conflicts exist, release claim and pick a different bead.

   4. PLAN: Briefly analyze files. Self-approve your own plan immediately.

   5. EXECUTE: Implement code changes (only to files you have reserved).

   6. VERIFY: Activate conda honey_inventory, run pre-commit run --files <files you touched>, then run scoped tests for the code you changed using ~/run_tests (test URLs only; no prod secrets).
       * IF FAIL: Fix immediately and re-run. Do not ask for help as this is HEADLESS MODE.
       * Note: you can use --no-verify if you must if you find some WIP files are breaking app import in security linter, the goal is to help catch issues to improve the codebase, not stop progress completely.

   7. MIGRATE (if needed): Apply migrations to ALL 4 targets (platform prod/test, tenant prod/test).

   8. GIT/PUSH: git status → git add only the files you created or changed for this bead → git commit --no-verify -m "<bead-id> <short summary>" → git push. Do this immediately after closing the bead. Do not leave untracked/unpushed files; do not add unrelated files.

   9. RELEASE & CLOSE: Release file reservations, then run bd close <id>.

  10. COMMUNICATE: Send completion message via Agent Mail:
      - thread_id: the bead ID
      - subject: "[HONEY-xxxx] Completed"
      - body: brief summary of changes

  11. RESTART: Check inbox for messages, then select the next bead FOR EPIC HONEY-khnx, claim it, and jump to step 1.
─────────────────────────────────────────────────────────────────────────────── CONSTRAINTS & OVERRIDES ───────────────────────────────────────────────────────────────────────────────

   * Migrations: You are pre-authorized to apply all migrations. Do not stop for safety checks unless data deletion is explicit.
   * Progress Reporting: DISABLE interim reporting. Do not summarize after one bead. Summarize only when the entire list is empty.
   * Tracking: Maintain a running_work_log.md file. Append your completed items there. This file is your only allowed form of status reporting until the end.
   * Blockers: If a specific bead is strictly blocked (e.g., missing API key), mark it as blocked in bd, log it in running_work_log.md, and IMMEDIATELY SKIP to the next bead. Do not stop the session.
   * File Conflicts: If you cannot reserve needed files, skip to a different bead. Do not edit files reserved by other agents.

  START NOW. DO NOT REPLY WITH A PLAN. REGISTER WITH AGENT MAIL, THEN START THE NEXT BEAD IN THE QUEUE IMMEDIATELY. HEADLESS MODE IS ON.


Thanks for these notes. Very interesting! Looking forward to experimenting.


I took a course on critical thinking once which taught me to at least pause and ask questions when I read something sensationalist.

I’ve learned that when my impulse is to say “It sounds like something X would do,” it often leads me down very wrong paths.

Just because the vibes point in that direction often doesn’t mean it’s the truth.

This is exactly how misinformation starts and real people pay the price for generations. Just look at the MSG (monosodium glutamate) untruth that has hurt many Chinese restaurants and Chinese Americans over the years. Misinformation is not victimless.


I live in NYC. That sounds more like NYC.

Most mega cities in Asia Pacific at least have newer infrastructure and social norms that make them less miserable.

NYC is miserable in some ways (crumbling infrastructure, broken social norms especially post COVID) and not in others (creativity, passion).

Some days I feel the latter makes up for the former. But the former is still bad.


Curious, what does that have to do with the article?


Not Doordash, but I often ask my Lyft drivers.

Like for anything in life, it depends on the market you're in, and where you are situated on the acumen distribution (low acumen drivers don't make a lot, while high acumen ones do). Narrative thinking likes cherry picking the worst cases and then representing those stories as normal. But reality is always a bit more complicated.

Many Lyft drivers are immigrants, either between jobs, or are doing it full time. I don't directly ask them how much they make, but I tell them in my homeland of Canada, if you're on welfare and disability, you make CAD$x and it's good enough money for a single person. Most Lyft drivers are like: "that's way, way less than what I make doing this." (some tell me they make US$3-4k/month, working 6 day weeks. This might sound like too little, but realize that not all COL is the same, and for an immigrant, this is a great gig with optionality -- you can always turn off the app).

Then you might say, oh, they're blind to the depreciation hit their cars are taking. But good, high-acumen Lyft drivers often drive second-hand Priuses, which depreciate slowly and get great mileage, and they track expenses like a hawk. If you go to the Lyft subreddit, good drivers know all the tricks -- optimize for high-yield windows, avoid dead miles, avoid blind quests, and cap hours deliberately. It's the ones with less acumen who drive a new car with low MPG on financing and don't optimize.

So don't generalize and catastrophize (catastrophizing amplifies depression -- in cognitive behavioral therapy, they teach you to guard against that). Just stop it. Everything in life is a distribution.

I'm not saying driving a Lyft is aspirational, but for some immigrants who are doing it to support their family, it's a less-bad option than many others.

I even had a few drivers pushing me to get married (I'm single) because they said, once they get home from a day of driving, they get to go back to family. "It's less lonely," they tell me. In some ways they're happier than I am, even though I make nominally more than them.


> Just stop it.

Has this approach of ordering people to feel or think a certain way ever actually worked for you? Do you think this is a good way to have a respectful discussion with someone?

> some tell me they make US$3-4k/month

So let's run with that number, say $3.5k/month average. That's $42,000/year. They're independent contractors, so they pay self-employment tax, which is roughly 2x standard W-2 payroll taxes. A quick online calculator tells me that would amount to $8,370.90 in my state. The lion's share is federal Medicare and Social Security tax, though; state tax on this is practically nothing here.

* They must have a car. A quick search says a 2024 Prius is around $25k. Monthly payments at 9.6% (prime rate for a used car; all new immigrants start out with perfect credit, right?) over 60 months is around $550. Let's say insurance is $100. Ignore fuel and maintenance for now.

* So now we have $2,800/month after taxes and $650/month just to work. The remaining $2,150 needs to cover everything else: food, housing, utilities.

* This doesn't even consider health insurance. Let's hope they qualify for a subsidized plan, because they sure can't cover it on this income. I guess they have the "don't get sick" plan that's so popular these days.

Oh, and all this for working 6 days a week. And in that case they're not clocking 8-hour days; even conservatively, you should assume 10 hours a day.

And on top of all that, you said they're supporting a family?! So 3 or 4 people on $2,150/month?

All that so they can work 60 hour weeks, barely see their wife and kids, and be one accident or medical emergency away from total financial ruin.

But you're right, they're rich in spirit and after all, isn't that what really matters? They really do have it so much better than you.

