Eh, accuracy and reliability are a different topic, hashed out many times on HN. This thread is about productivity. I’m a staff engineer and I don’t know a single person not using AI. My senior engineers estimate 40% gains in productivity.
And every time the issue is side-stepped by chatbot proponents.
Accuracy and reliability are necessary to know real productivity. If you have produced code that doesn't work right, you haven't "produced" anything (except in the economic sense of managing to get someone to pay for it).
For example, if you produce 5x more code at 5% reliability, the net result is a -75% change in productivity (ignoring the overhead cost of detecting the unreliable code).
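To make that arithmetic explicit, here is a back-of-the-envelope sketch. The numbers (5x volume, 5% reliability) are the hypothetical figures from the comment above, not measured data:

```python
# Premise: only working code counts as output.
baseline_output = 1.0     # working code per unit time, without AI
volume_multiplier = 5.0   # hypothetical: 5x more code produced
reliability = 0.05        # hypothetical: 5% of that code actually works

# Effective output is volume scaled by the fraction that works.
effective_output = baseline_output * volume_multiplier * reliability
change = (effective_output - baseline_output) / baseline_output

print(f"effective output: {effective_output:.2f}x of baseline")  # 0.25x
print(f"net productivity change: {change:+.0%}")                 # -75%
```

Under these assumptions the break-even reliability for a 5x volume multiplier is 20%; anything below that is a net loss even before counting review overhead.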
Exactly this. High productivity, if all you do is generate slop and brainrot videos. If you are going to generate code with it... well, how productive was the genius at AWS who used Kiro to cause that December outage? Three years ago that would have been a career-ending choice of productivity tools.
We’ve all been waiting for the reliability shoe to drop for, what, a year now?
It’s only slop if you don’t understand the code, the prompt, and the result, and you skip code reviews. You can have large productivity gains without reducing quality standards.
> It’s only slop if you don’t understand the code, the prompt, and the result, and you skip code reviews. You can have large productivity gains without reducing quality standards.
So essentially it's like delegating all work to a beginner programmer, only 10x more frustrating? Well, that's not what I would classify as a "Pocket PhD" or a "Nation of PhDs in a datacenter", which is the bullshit propaganda the AI CEOs are relentlessly pushing. We should not have to figure this out for them - they were saying this will write ALL code within 6 months of "now", the most recent "now" being January 2026, so a little over 4.5 months away. No, we should not be fixing this mess, f*k understanding the prompts and doing code reviews of the AI slop. Why does it not work as advertised?
I’m not here to defend bs propaganda. I don’t think I’ve seen anyone defend that stuff. I don’t know if you’re shifting goalposts or if that’s what you’ve always been worried about.
I’m just saying the productivity gains are real, even in serious production-level and life-critical systems.
If you are only able to think in binaries, no-AI or phd-AI, that’s a you problem.
> I’m just saying the productivity gains are real, even in serious production level and life critical systems.
Again, neither serious studies (see the METR study on dev productivity) nor the ever-increasing rate of major incidents caused by AI supports your statement. Not to mention the absolute lack of well-known AI-produced products.
> If you are only able to think in binaries, no-AI or phd-AI, that’s a you problem.
No, you see, if I were the CEO of a public company and I lied through my teeth to investors and the general public about the capabilities of my product, I would normally go to jail. The CEOs of major AI companies are making claims that do not seem to be confirmed by reality. They have burned several hundred billion dollars so far in pursuit of "god-level intelligence". What came out instead is "your prompting sucks" or similar-level nonsense.
I am only holding them to the standards they have repeatedly, boldly, and insistently set for themselves. You should be too.
Yes, I’ve seen it. It was certainly interesting at the time. If you refresh yourself on the study, it admits to reflecting a narrow point in time, on a narrow task type and toolset.
Last July most people I knew weren’t automating Jira tickets, pull requests, comment addressing, design docs, multi-repo research, or rule-set customization. Now everyone I know does, and each of these incrementally speeds up productivity.
> Not to mention the absolute lack of well-known AI-produced products.
This is a strange comment. We have a well-known example in openclaw, which is notoriously vibe-coded, and which, again, if you follow the thread, I’m not defending. Meanwhile, I know senior and staff engineers at most FAANG companies, and every single one uses AI to code, so many, many products you know are being written with AI.
I don’t wanna dox myself, but last year my company developed a greenfield product with a pretty large engineering headcount (multiple teams) using an AI-first development workflow. Now, that doesn’t mean the 20 engineers just stood around twiddling their thumbs; they were doing real engineering and software development work with heavy agentic AI use. They shipped it in six months, and it’s been in prod for months. If you can’t see how AI is being used, I don’t know what to tell you.
> This is a strange comment. We have a well-known example in openclaw, which is notoriously vibe-coded, and which, again, if you follow the thread, I’m not defending. Meanwhile, I know senior and staff engineers at most FAANG companies, and every single one uses AI to code, so many, many products you know are being written with AI.
Oh, it's a product? What does it do? Leak data and delete inboxes? I would not call that a "product", at least not in the commercial sense.
> I don’t wanna dox myself, but last year my company developed a greenfield product with a pretty large engineering headcount (multiple teams) using an AI-first development workflow
Yeah, you sure are not "doxxing" yourself with this generic statement. I am sure you guys built something with the "AI-first" workflow. The point is, based on what the AI CEOs and AI boosters are saying, this should have been a project with one person organising a "fleet of agents". Why wasn't it? If it still requires a large engineering headcount, what's the point of using the AI?
> AI CEOs and AI boosters are saying
You really enjoy arguing about something I’m not arguing about. I literally could not care less what they’re saying. I use the tools available to me in my profession, to the extent that they are useful. Anyone who thinks AI can one-shot a successful product is clueless, but that has nothing to do with the actual ways AI and agents are being used today. And anyone incapable of understanding that difference is equally clueless.