I have to say I’m in the exact camp the author is complaining about. I’ve shipped non-trivial greenfield products that I started back when it was only ChatGPT, and it was shitty. I started using Claude by copying and pasting back and forth between the web chat and Xcode. Then I discovered Cursor. It left me with a lot of annoying build errors, but my productivity was still at least 3x. Now that agents are better and Claude 4 is out, I barely ever write code, and I don’t mind. I’ve leaned into the Architect/Manager role and direct the agent with my specialized knowledge when I need to.
I started a job at a demanding startup and it’s been several months and I still have not written a single line of code by hand. I audit everything myself before making PRs and test rigorously, but Cursor + Sonnet is just insane with their codebase. I’m convinced I’m their most productive employee, and that’s not by measuring lines of code, which don’t matter; people who are experts in the codebase ask me for help with niche bugs I can narrow in on in 5-30 minutes as someone who’s fresh to their domain. I had to stop taking work away from the front-end dev (front-end work being something I’ve avoided my whole career) because I was stepping on his toes, fixing little problems as I saw them thanks to Claude. It’s not vibe coding - there’s a process of research and planning and perusing in careful steps, and I set the agent up for success. Domain knowledge is necessary. But I’m just so floored how anyone could not be extracting the same utility from it. It feels like there’s two articles like this every week now.
Look, the person who wrote that comment doesn't need to prove anything to you just because you're hopped up after reading a blog post that has clearly given you a temporary dopamine bump.
People who understand their domains well and are excellent written communicators can craft prompts that will do what we used to spend a week spinning up. It's self-evident to anyone in that situation, and the only thing we see when people demand "evidence" is that you aren't using the tools properly.
We don't need to prove anything because if you are working on interesting problems, even the most skeptical person will prove it to themselves in a few hours.
Feeling triggered? Feeling afraid? And yes, every claim needs to be proven; otherwise those making the claims will only convince four-year-olds.
>People who understand their domains well and are excellent written communicators can craft prompts that will do what we used to spend a week spinning up. It's self-evident to anyone in that situation, and the only thing we see when people demand "evidence" is that you aren't using the tools properly.
You have no proof of this, so I guess you chose your camp already?
Same experience here, probably with a slightly different kind of work (PhD student). I was extremely skeptical of LLMs; Claude Code has completely transformed the way I work.
It doesn't take away the requirement of _curation_ - that remains firmly in my camp (partially what a PhD is supposed to teach you! to be precise and reflective about why you are doing X, what you hope to show with Y, etc. -- break down every single step, explain those steps to someone else -- this is a tremendous soft skill, and it's even more important now because these agents do not have persistent world models / immediately forget the goal of a sequence of interactions, even with clever compaction).
If I'm on my game with precise communication, I can use CC to organize computation in a way that has never been possible before.
It's not easier than programming (if you care about quality!), but it is different, and it comes with different idioms.
I find that the quality of the code LLMs output is pretty bad. I end up going through so many iterations that it ends up being faster to do it myself. What I find agents actually useful for is doing large-scale mechanical refactors. Instead of trying to figure out the perfect vim macro or AST rewrite script, I'll throw an agent at it.
I disagree strongly at this point. The code is generally good if the prompt was reasonable, but also every possible test is now being written, every UI element has all the required traits, every function has the correct documentation attached, the million little refactors to improve the codebase are being done, etc.
Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that. Those many little things together make a strong statement about quality. Our codebase has gone up in quality significantly with AI, whereas we’d let the little things slide due to understaffing before.
Have to disagree with this too - ask an LLM to architect a project or propose a cleaner solution, and it usually does a good job.
Where it still sucks is doing both at once. Thus the shift to integrating to-do lists in Cursor. My flow has shifted to "design this feature" then "continue to implement" 10 times in a row, with code review between each step.
The auditing is not quick. I prefer Cursor to Claude Code because I can more easily review its changes while it’s going, and stop and redirect it if it starts to veer off course (which is often, but that’s the cost of doing business). Over time I still gain an understanding of the codebase that I can use to inform my prompts or redirection, so it’s not like I’m blindly asking it to do things.

Yes, I do ask it to write unit tests a lot of the time. But I don’t have it spin off and just iterate until the unit tests pass - that’s a recipe for it to do whatever it needs to do to pass them and is counterproductive. I plan what I want the set of tests to look like and have it write functions in isolation without mentioning tests, and if tests fail I go through a process of auditing the failing code and then the tests themselves to make sure nothing was missed. It’s exactly how I would treat a coworker’s code that I review.

My prompts range from a few sentences to a few paragraphs, and nowadays I construct a large .md file with a checklist that we iterate on for larger refactors and projects to manage context.
I recently worked with a weird C flavor (Monkey C), and it hallucinated every single method, all the time, every single time.
I know it's likely just a question of time. However, that was soooo far from helpful. And it was so sure it was doing it right, again and again, without ever consulting the docs.
Please re-read the article. Especially the first list of things we don't know about you, your projects etc.
Your specific experience cannot be generalized. And I'm speaking as the author, who (as written in the article) is literally using these tools every day.
> But I’m just so floored how anyone could not be extracting the same utility from it. It feels like there’s two articles like this every week now.
This is where we learn that you haven't actually read the article. Because it very clearly states, with links, that I am extracting value from these tools.
And the article is also very clearly not about extracting or not extracting value.
I did read the entire article before commenting and acknowledge that you are using them to some effect, but the line about 50% of the time it works 50% of the time is where I lost faith in the claims you’re making. I agree it’s very context-dependent, but, in the same way, you did not outline your approaches and practices in how you use AI in your workflow. The same lack of context exists on the other side of the argument.
I agree about the 50/50 thing. It's about how much Claude helped me, and I use it daily too.
I'll give some context, though.
- I use OCaml and Python/SQL, on two different projects.
- Both are single-person.
- The first project is a real-time messaging system, the second one is logging a bunch of events in an SQL database.
In the first project, Claude has been... underwhelming. It casually uses C idioms, overuses records and procedural programming, ignores basic stuff about the OCaml standard library, and even gave me some data structures that slowed me down later down the line. It also casually lies about what functions do.
A real example: `Buffer.add_utf_8_uchar` adds the ASCII representation of a UTF-8 char to a buffer, so it adds something that looks like `\123\456` for non-ASCII.
I had to scold Claude for using this function to add a UTF-8 character to a Buffer so many times I've lost count.
In the second project, Claude really shone: it made most of the SQL database, moved most of the logic into the SQL engine, wrote coherent and readable Python code, etc.
I think the main difference is that the first one is an arcane project in an underdog language. The second one is a special case of a common "shovel through lists of stuff and stuff it into SQL" problem, in the most common language.
Just FYI, try adding a comment to that function saying what it is intended to be used for, because without more info LLMs will rely heavily on function names. Heck, have the LLM add comments to every function and I bet it will start to do better.
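A minimal sketch of what that could look like, using a hypothetical helper around the function mentioned above (the name and behavior are made up for illustration); the doc comment states exactly what the helper does, so an agent reading the code doesn't have to guess from the name:

```ocaml
(** [add_escaped buf u] appends a decimal-escaped form of [u] to [buf]:
    one backslash-decimal escape per byte of its UTF-8 encoding,
    e.g. ["\195\169"] for U+00E9. It never appends raw UTF-8 bytes. *)
let add_escaped (buf : Buffer.t) (u : Uchar.t) : unit =
  let tmp = Buffer.create 4 in
  (* Encode the character into a scratch buffer first... *)
  Buffer.add_utf_8_uchar tmp u;
  (* ...then write each byte of that encoding as a "\nnn" escape. *)
  String.iter
    (fun c -> Buffer.add_string buf (Printf.sprintf "\\%d" (Char.code c)))
    (Buffer.contents tmp)
```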
> but the line about 50% of the time it works 50% of the time is where I lost faith in the claims you’re making.
It's a play on the Anchorman joke that I slightly misremembered: "60% of the time it works 100% of the time"
> is where I lost faith in the claims you’re making.
Ah yes. You lost faith in mine, but I have to have 100% faith in your 100% unverified claim about "job at a demanding startup" where "you still haven't written a single line of code by hand"?
Why do you assume that your word and experience is more correct than mine? Or why should anyone?
> you did not outline your approaches and practices in how you use AI in your workflow
No one does. And if you actually read the article, you'd see that is literally the point.
> …the line about 50% of the time it works 50% of the time is where I lost faith in the claims you’re making…
That's where the author lost me as well. I'd really be interested in a deep dive on their workflow/tools to understand how I've been so unbelievably lucky in comparison.
It’s not. It’s like I used to play baseball professionally and now I’m a coach or GM building teams and getting results. It’s a different set of skills. I’m working mostly in idea space and seeing my ideas come to life with a faster feedback loop, and the toil is mostly gone.
Otherwise, 99% of my code these days is LLM-generated; there's a fair number of visible commits from my open source on my profile: https://github.com/wesen
A lot of it is more on the system side of things, although there are a fair number of one-off webapps, now that I can do frontends that don't suck.
I’d like to, but I’m purposefully using a throwaway account. It’s an iOS app rated 4.5 stars on the App Store and it has a nice community. Modest userbase, in the hundreds.
Mean time to shipping features of various estimated difficulty. It’s subjective and not perfect, but generally speaking I need to work way less. I’ll be honest, one thing I think I could have done faster without AI was implementing CRDT-based cloud sync for a project I have going. I think I’ve tried to utilize AI too much for this. It’s good at writing vector clock implementations, but not at preventing race conditions.
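A rough sketch of the sort of self-contained thing a vector clock implementation means here, in OCaml only because it matches the other code example in this thread; the module and function names are hypothetical, and the surrounding sync and conflict-resolution code, where the race conditions actually live, is deliberately not shown:

```ocaml
module VClock = struct
  module M = Map.Make (String)

  (* A clock maps a replica id to the count of events seen from it. *)
  type t = int M.t

  let empty : t = M.empty

  (* Record one local event on [replica]. *)
  let tick (replica : string) (c : t) : t =
    M.update replica (function None -> Some 1 | Some n -> Some (n + 1)) c

  (* Pointwise maximum: the clock after merging two replicas' histories. *)
  let merge (a : t) (b : t) : t =
    M.union (fun _ x y -> Some (max x y)) a b

  (* [happened_before a b] holds iff every counter in [a] is <= its
     counterpart in [b] and the two clocks differ somewhere. *)
  let happened_before (a : t) (b : t) : bool =
    M.for_all (fun k v -> v <= Option.value (M.find_opt k b) ~default:0) a
    && not (M.equal ( = ) a b)
end
```

The hard part is everything around a piece like this: deciding when to merge, what to do with concurrent edits, and making sure two devices don’t clobber each other mid-sync.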
> there’s a process of research and planning and perusing in careful steps, and I set the agent up for success
Are there any good articles you can share, or maybe your process? I’m really trying to get good at this, but I don’t find myself great at using agents and I honestly don’t know where to start. I’ve tried the memory bank in Cline and tried using more thinking directives, but I find I can’t get it to do complex things and it ends up being a time sink for me.
More anecdata: +1 for “LLMs write all my production code now”. 25+ years in industry, as expert as it’s possible to be in my domain. 100% agree LLMs fail hilariously badly, often, and dangerously. And still, they write ~all my code.
No agenda here, not selling anything. Just sitting here towards the later part of my career, no need to prove anything to anyone, stating the view from a grey beard.
Crypto hype was shilling from grifters pumping whatever bag-holding scam they could, which was precisely what the behavioral economic incentives drove. GenAI dev is something else. I’ve watched many people working with it; your mileage will vary. But in my opinion (and it’s mine, you do you), hand coding is becoming an anachronistic skill. The only part I wonder about is how far up and down the system/design/architecture stack the power-tooling is going to go. My intuition and empirical findings incline in a direction that I think would fuel a flame war. But I’m just a grey-beard Internet random, and hey look, no evidence, just more baseless claims. Nothing to see here.
Disclosure: I hold no direct shares in Mag 7, nor do I work for one.