Hacker News

You probably just don't have the hang of it yet. It's very good, but it's not a mind reader, and if you have something specific you want, it's best to articulate that exactly, as best you can ("I want a test harness for <specific_tool>, which you can find <here>"). You need to explain that you want tests that assert on observable outcomes and state rather than internal structure, real objects instead of mocks, property-based testing for invariants, etc. It's a feedback loop between yourself and the agent that you must develop a bit before you start seeing "magic" results. A typical session for me looks like:

- I ask for something highly general, and Claude explores a bit and responds.

- We go back and forth a bit on precisely what I'm asking for. Maybe I correct it a few times, and maybe it has a few ideas I didn't know about or think of.

- It writes some kind of plan to a markdown file. In a fresh session I tell a new instance to execute the plan.

- After it's done, I skim the broad strokes of the code and point out any code/architectural smells.

- I ask it to review its own work and then critique that review, etc. We write tests.

Perhaps that sounds like a lot, but typically this process takes around 30-45 minutes of intermittent focus, and the result will be several thousand lines of pretty good, working code.
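The testing style described above (assert on observable outcomes and invariants, not internal structure) can be sketched in a few lines. This is a hand-rolled property test using only the stdlib; the function and invariants are invented for illustration, and a real setup would more likely use a library like Hypothesis:

```python
import random

def dedupe_keep_order(items):
    # Function under test: drop duplicates, keeping first occurrences.
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

def test_dedupe_invariants():
    # Property test: check invariants across many generated inputs,
    # never peeking at how dedupe_keep_order works internally.
    rng = random.Random(42)
    for _ in range(200):
        data = [rng.randint(0, 10) for _ in range(rng.randint(0, 30))]
        out = dedupe_keep_order(data)
        # Invariant 1: no duplicates in the output.
        assert len(out) == len(set(out))
        # Invariant 2: same set of elements -- nothing invented or lost.
        assert set(out) == set(data)
        # Invariant 3: output is a subsequence of the input (order kept).
        it = iter(data)
        assert all(any(x == y for y in it) for x in out)

test_dedupe_invariants()
```

Because the assertions only touch inputs and outputs, the implementation can be rewritten freely (by a human or an agent) without rewriting the tests.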




I absolutely have the hang of Claude, and I still find that it can make those ridiculous mistakes, like replicating logic into a test rather than testing the function directly, or talking to a local pg instance that was stale or still running, etc. I have a ton of skills and pre-written prompts for testing practices, but over longer contexts it will forget and do these things, or get confused, etc.

You can minimize these problems with TLC but ultimately it just will keep fucking up.


Don't know what to tell you. Sounds like you're holding it wrong. Based on the current state of things I would try to get better at holding it the right way.

I can't tell if you're joking?

My favorite is when you need to rebuild/restart outside of claude and it will "fix the bug" and argue with you about whether or not you actually rebuilt and restarted whatever it is you're working on. It would rather call you a liar than realize it didn't do anything.

This is a pretty annoying problem -- I just solve it by asking Claude to always run the right build command after each batch of modifications, etc.
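One way to make that rule stick is a standing instruction in the project's CLAUDE.md. A hypothetical sketch -- the commands are placeholders, and how reliably the model follows it over long contexts varies:

```markdown
<!-- CLAUDE.md -- hypothetical standing instructions -->
## After every batch of code changes
1. Run the project's build command (e.g. `make build`) and show the output.
2. Restart the running service before testing the change.
3. Never claim a fix works based on logs produced before the rebuild.
```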

"That's an old run, rebuild and the new version will work" lol

With the back-and-forth refining, I find it very useful to tell Claude to 'ask questions when uncertain' and/or to 'suggest a few options for how to solve this and let me choose / discuss'.

This has made my planning / research phase so much better.


Yes, pretty much my workflow. I also keep all my task.md files around as part of the repo, and they get filled up with work details as the agent closes the gates. At the end of each one I update the project memory file; this ensures I can always resume any task in a few tokens (memory file + task file == full info to work on it).
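A sketch of that pairing (file names, gates, and fields are invented for illustration, not taken from the comment):

```markdown
<!-- tasks/add-auth.md -- hypothetical task file -->
## Goal
Add token auth to the admin API.
## Gates
- [x] Plan agreed
- [x] Implementation
- [ ] Tests green
## Work log (appended by the agent as gates close)
...

<!-- MEMORY.md -- hypothetical project memory file -->
Active: tasks/add-auth.md (next gate: tests green)
Conventions: pytest, real DB in tests, no mocks
```

Loading just these two files gives a fresh session enough context to pick the task back up.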

Pretty good workflow. But you need to change the order and have it write the tests first (TDD).
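TDD in miniature (a generic illustration; `slugify` is an invented example, not something from the thread): the test is written first and fails, then the minimal implementation is written to make it pass.

```python
# Step 1: write the test first. At this point slugify doesn't exist,
# so running the test fails -- that failing test defines "done".
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  extra  spaces  ") == "extra-spaces"

# Step 2: write the minimal implementation that makes the test pass.
import re

def slugify(text):
    # Lowercase, collapse runs of non-alphanumerics into single
    # hyphens, and strip hyphens from both ends.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

test_slugify()
```

With an agent, the same ordering means reviewing and locking the tests before letting it touch the implementation.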

I mean, I’ve been using AI for close to 4 years now, and I’ve been using agents off and on for over a year. What you’re describing is exactly what I’m doing.

I’m not seeing anyone at work either out of hundreds of devs who is regularly cranking out several thousand lines of pretty good working code in 30-45 minutes.

What’s an example of something you built today like this?


Fair, that's optimistic, and it depends what you're doing. Looking at a personal project, I had a PR this week at +3000 -500 that I feel quite good about; it took about two nights of roughly an hour per session to shape it into what I needed (a control plane for a Polymarket trading engine). Though if I'm being fair, this was an outlier, only possible because I very carefully built the core of the engine to support this in advance: most of the 3K LoC was "boilerplate" in the sense that I'm just manipulating existing data structures, not building entirely new abstractions. There are definitely some very hard-fought +175 -25 changes in this repo as well.

Definitely for my day job it's more like a few hundred LoC per task, and they take longer. That said, at work there are structural factors preventing larger changes: code review, needing to get design/product/coworker input for sweeping additions, etc. I fully believe it would be possible to go faster and maintain quality.


Those numbers are much more believable, but now we’re well into maybe a 2-3x speedup. I can easily write 500 LoC in an hour if I know exactly what I’m building (ignoring that LoC is a terrible metric).

But now I have to spend more time understanding what it wrote, so best case scenario we’re talking maybe a 50% speedup to a part of my job that I spend maybe 10-20% of my time on.

Making very big assumptions that this doesn’t add long term maintenance burdens or result in a reduction of skills that makes me worse at reviewing the output, it’s cool technology.

On par with switching to a memory-managed language, or maybe going from J2EE to Ruby on Rails.


Thinking in terms of a "speedup multiplier" undersells it completely. The speedup on a task I would never have even attempted is infinite. For my recent +3000 PR on my Polymarket engine control plane, I had no idea how these types of things are typically done. It would have taken me many hours to think through an implementation and hours of research online to assemble an understanding of typical best practices. Now with AI I can dispatch many parallel agents to examine virtually all public resources for this at once.

Basically if it's been done before in a public facing way, you get a passable version of that functionality "for free". That's a huge deal.


1. You think you have something following typical best practices. You have no way to verify that without taking the time to understand the problem and solution yourself.

2. If you’d done 1, you’d have the knowledge yourself next time the problem came up and could either write it yourself or skip the verification step.

I’m not saying there aren’t problems out there where the problem is hard to solve but easy to verify. And for those use cases LLMs are terrific.

But many problems have the inverse property. And many problems that look like the first type are actually the second.

LLMs are also shockingly good at generating solutions that look plausible, independent of correctness or suitability, so it’s almost always harder to do the verification step than it seems.


The control plane is already operational and does what I need. Copying public designs solved a few problems I didn't even know I had (awkward command and control UX) and seems strictly superior to what I had before. I could have taken a lot longer on this - probably at least a week, to "deeply understand the problem and solution". But it's unclear what exactly that would have bought me. If I run into further issues I will just solve them at that time.

So what is the issue exactly? This pattern just seems like a looser form of using a library versus building from scratch.


For one I’d argue that you shouldn’t just use a library without understanding what it does and verifying it does what it says.

But a library has been used by multiple people who have verified that it does what it says it does as long as you pick something popular.

You have no idea what this code does. Maybe it has a huge security flaw? Or maybe it’s just riddled with bugs that you don’t know enough to expose.

Maybe it “follows best practices” that your agents uncovered or maybe it doesn’t.

If you expose customer data, or you fuck up in a way that costs customers money, the AI isn’t liable for that; you are.

Now if this is just a toy app where no one can be harmed, sure, who cares.



