I have to say I’m in the exact camp the author is complaining about. I’ve shipped non-trivial greenfield products that I started back when it was only ChatGPT, and it was shitty. I started using Claude by copying and pasting back and forth between the web chat and Xcode. Then I discovered Cursor. It left me with a lot of annoying build errors, but my productivity was still at least 3x. Now that agents are better and Claude 4 is out, I barely ever write code, and I don’t mind. I’ve leaned into the Architect/Manager role and direct the agent with my specialized knowledge when I need to.
I started a job at a demanding startup and it’s been several months and I still have not written a single line of code by hand. I audit everything myself before making PRs and test rigorously, but Cursor + Sonnet is just insane with their codebase. I’m convinced I’m their most productive employee, and that’s not by measuring lines of code, which don’t matter; people who are experts in the codebase ask me for help with niche bugs I can narrow in on in 5-30 minutes as someone who’s fresh to their domain. I had to stop taking work away from the front-end dev (front-end work being something I’ve avoided my whole career) because I was stepping on his toes, fixing little problems as I saw them thanks to Claude. It’s not vibe coding - there’s a process of research and planning and perusing in careful steps, and I set the agent up for success. Domain knowledge is necessary. But I’m just so floored how anyone could not be extracting the same utility from it. It feels like there’s two articles like this every week now.
Look, the person who wrote that comment doesn't need to prove anything to you just because you're hopped up after reading a blog post that has clearly given you a temporary dopamine bump.
People who understand their domains well and are excellent written communicators can craft prompts that will do what we used to spend a week spinning up. It's self-evident to anyone in that situation, and the only thing we see when people demand "evidence" is that you aren't using the tools properly.
We don't need to prove anything because if you are working on interesting problems, even the most skeptical person will prove it to themselves in a few hours.
Feeling triggered? Feeling afraid? And yes, every claim needs to be proven; otherwise those making the claims will only convince four-year-olds.
>People who understand their domains well and are excellent written communicators can craft prompts that will do what we used to spend a week spinning up. It's self-evident to anyone in that situation, and the only thing we see when people demand "evidence" is that you aren't using the tools properly.
You have no proof of this, so I guess you chose your camp already?
Same experience here, probably with a slightly different kind of work (PhD student). I was extremely skeptical of LLMs; Claude Code has completely transformed the way I work.
It doesn't take away the requirement of _curation_ - that remains firmly in my camp (partially what a PhD is supposed to teach you! to be precise and reflective about why you are doing X, what you hope to show with Y, etc. -- break down every single step, explain those steps to someone else -- this is a tremendous soft skill, and it's even more important now because these agents do not have persistent world models / immediately forget the goal of a sequence of interactions, even with clever compaction).
If I'm on my game with precise communication, I can use CC to organize computation in a way that has never been possible before.
It's not easier than programming (if you care about quality!), but it is different, and it comes with different idioms.
I find that the quality of the code LLMs output is pretty bad. I end up going through so many iterations that it ends up being faster to do it myself. What I find agents actually useful for is doing large-scale mechanical refactors. Instead of trying to figure out the perfect vim macro or AST rewrite script, I'll throw an agent at it.
I disagree strongly at this point. The code is generally good if the prompt was reasonable, but also every possible test is now being written, every UI element has all the required traits, every function has the correct documentation attached, the million little refactors to improve the codebase are being done, etc.
Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that. Those many little things together make a strong statement about quality. Our codebase has gone up in quality significantly with AI, whereas we’d let the little things slide due to understaffing before.
Have to disagree with this too - ask an LLM to architect a project or propose a cleaner solution, and it usually does a good job.
Where it still sucks is doing both at once. Thus the shift to integrating to-do lists in Cursor. My flow has shifted to "design this feature" then "continue to implement" 10 times in a row, with code review between each step.
The auditing is not quick. I prefer Cursor to Claude Code because I can more easily review its changes while it’s going, and stop and redirect it if it starts to veer off course (which is often, but that’s the cost of doing business). Over time I still gain an understanding of the codebase that I can use to inform my prompts or redirection, so it’s not like I’m blindly asking it to do things.

Yes, I do ask it to write unit tests a lot of the time. But I don’t have it spin off and just iterate until the unit tests pass - that’s a recipe for it to do whatever it needs to do to pass them and is counterproductive. I plan what I want the set of tests to look like and have it write functions in isolation without mentioning tests, and if tests fail I go through a process of auditing the failing code and then the tests themselves to make sure nothing was missed. It’s exactly how I would treat a coworker’s code that I review.

My prompts range from a few sentences to a few paragraphs, and nowadays I construct a large .md file with a checklist that we iterate on for larger refactors and projects to manage context.
I recently worked with a weird C flavor (Monkey C), and it hallucinated every single method, all the time, every single time.
I know it's likely just a question of time. However, that was soooo far from helpful. And it was so sure it was doing it right, again and again, without ever consulting the docs.
Please re-read the article. Especially the first list of things we don't know about you, your projects etc.
Your specific experience cannot be generalized. And I'm speaking as the author, who (as written in the article) is literally using these tools every day.
> But I’m just so floored how anyone could not be extracting the same utility from it. It feels like there’s two articles like this every week now.
This is where we learn that you haven't actually read the article. Because it very clearly states, with links, that I am extracting value from these tools.
And the article is also very clearly not about extracting or not extracting value.
I did read the entire article before commenting and acknowledge that you are using them to some effect, but the line about 50% of the time it works 50% of the time is where I lost faith in the claims you’re making. I agree it’s very context-dependent, but, in the same way, you did not outline your approaches and practices in how you use AI in your workflow. The same lack of context exists on the other side of the argument.
I agree about the 50/50 thing. It's about how much Claude helped me, and I use it daily too.
I'll give some context, though.
- I use OCaml and Python/SQL, on two different projects.
- Both are single-person.
- The first project is a real-time messaging system, the second one is logging a bunch of events in an SQL database.
In the first project, Claude has been... underwhelming. It casually uses C idioms, overuses records and procedural programming, ignores basic stuff about the OCaml standard library, and even gave me some data structures that slowed me down later down the line. It also casually lies about what functions do.
A real example: `Buffer.add_utf_8_uchar` adds the ASCII representation of a UTF-8 char to a buffer, so it adds something that looks like `\123\456` for non-ASCII.
I had to scold Claude for using this function to add a UTF-8 character to a Buffer so many times I've lost count.
In the second project, Claude really shone: it made most of the SQL database, moved most of the logic into the SQL engine, wrote coherent and readable Python code, etc.
I think the main difference is that the first one is an arcane project in an underdog language. The second one is a special case of a common "shovel through lists of stuff and stuff it into SQL" problem, in the most common language.
Just FYI, try adding a comment to that function saying what it is intended to be used for, because without more info LLMs will rely heavily on function names. Heck, have the LLM add comments to every function and I bet it will start to do better.
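A minimal sketch of what that could look like, using a hypothetical helper around the function mentioned above (the name and behavior are made up for illustration); the doc comment states exactly what the helper does, so an agent reading the code doesn't have to guess from the name:

```ocaml
(** [add_escaped buf u] appends a decimal-escaped form of [u] to [buf]:
    one backslash-decimal escape per byte of its UTF-8 encoding,
    e.g. ["\195\169"] for U+00E9. It never appends raw UTF-8 bytes. *)
let add_escaped (buf : Buffer.t) (u : Uchar.t) : unit =
  let tmp = Buffer.create 4 in
  (* Encode the character into a scratch buffer first... *)
  Buffer.add_utf_8_uchar tmp u;
  (* ...then write each byte of that encoding as a "\nnn" escape. *)
  String.iter
    (fun c -> Buffer.add_string buf (Printf.sprintf "\\%d" (Char.code c)))
    (Buffer.contents tmp)
```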
> but the line about 50% of the time it works 50% of the time is where I lost faith in the claims you’re making.
It's a play on the Anchorman joke that I slightly misremembered: "60% of the time it works 100% of the time"
> is where I lost faith in the claims you’re making.
Ah yes. You lost faith in mine, but I have to have 100% faith in your 100% unverified claim about "job at a demanding startup" where "you still haven't written a single line of code by hand"?
Why do you assume that your word and experience is more correct than mine? Or why should anyone?
> you did not outline your approaches and practices in how you use AI in your workflow
No one does. And if you actually read the article, you'd see that is literally the point.
> …the line about 50% of the time it works 50% of the time is where I lost faith in the claims you’re making…
That's where the author lost me as well. I'd really be interested in a deep dive on their workflow/tools to understand how I've been so unbelievably lucky in comparison.
It’s not. It’s like I used to play baseball professionally and now I’m a coach or GM building teams and getting results. It’s a different set of skills. I’m working mostly in idea space and seeing my ideas come to life with a faster feedback loop, and the toil is mostly gone.
Otherwise, 99% of my code these days is LLM-generated; there's a fair number of visible commits from my open source on my profile: https://github.com/wesen
A lot of it is more on the system side of things, although there are a fair number of one-off webapps, now that I can do frontends that don't suck.
I’d like to, but I’m purposefully using a throwaway account. It’s an iOS app rated 4.5 stars on the App Store and it has a nice community. Modest userbase, in the hundreds.
Mean time to shipping features of various estimated difficulty. It’s subjective and not perfect, but generally speaking I need to work way less. I’ll be honest, one thing I think I could have done faster without AI was implementing CRDT-based cloud sync for a project I have going. I think I’ve tried to utilize AI too much for this. It’s good at writing vector clock implementations, but not at preventing race conditions.
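A rough sketch of the sort of self-contained thing a vector clock implementation means here, in OCaml only because it matches the other code example in this thread; the module and function names are hypothetical, and the surrounding sync and conflict-resolution code, where the race conditions actually live, is deliberately not shown:

```ocaml
module VClock = struct
  module M = Map.Make (String)

  (* A clock maps a replica id to the count of events seen from it. *)
  type t = int M.t

  let empty : t = M.empty

  (* Record one local event on [replica]. *)
  let tick (replica : string) (c : t) : t =
    M.update replica (function None -> Some 1 | Some n -> Some (n + 1)) c

  (* Pointwise maximum: the clock after merging two replicas' histories. *)
  let merge (a : t) (b : t) : t =
    M.union (fun _ x y -> Some (max x y)) a b

  (* [happened_before a b] holds iff every counter in [a] is <= its
     counterpart in [b] and the two clocks differ somewhere. *)
  let happened_before (a : t) (b : t) : bool =
    M.for_all (fun k v -> v <= Option.value (M.find_opt k b) ~default:0) a
    && not (M.equal ( = ) a b)
end
```

The hard part is everything around a piece like this: deciding when to merge, what to do with concurrent edits, and making sure two devices don’t clobber each other mid-sync.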
> there’s a process of research and planning and perusing in careful steps, and I set the agent up for success
Are there any good articles you can share, or maybe your process? I’m really trying to get good at this, but I don’t find myself great at using agents and I honestly don’t know where to start. I’ve tried the memory bank in Cline and tried using more thinking directives, but I find I can’t get it to do complex things and it ends up being a time sink for me.
More anecdata: +1 for “LLMs write all my production code now”. 25+ years in industry, as expert as it’s possible to be in my domain. 100% agree LLMs fail hilariously badly, often, and dangerously. And still, they write ~all my code.
No agenda here, not selling anything. Just sitting here towards the later part of my career, no need to prove anything to anyone, stating the view from a grey beard.
Crypto hype was shilling from grifters pumping whatever bag-holding scam they could, which was precisely what the behavioral economic incentives drove. GenAI dev is something else. I’ve watched many people working with it; your mileage will vary. But in my opinion (and it’s mine, you do you), hand coding is becoming an anachronistic skill. The only part I wonder about is how far up and down the system/design/architecture stack the power-tooling is going to go. My intuition and empirical findings incline in a direction that I think would fuel a flame war. But I’m just a grey-beard Internet random, and hey look, no evidence, just more baseless claims. Nothing to see here.
Disclosure: I hold no direct shares in Mag 7, nor do I work for one.