Hacker News

You probably just don't have the hang of it yet. It's very good, but it's not a mind reader, and if you have something specific you want, it's best to articulate that exactly, as best you can ("I want a test harness for <specific_tool>, which you can find <here>"). You need to explain that you want tests that assert on observable outcomes and state rather than internal structure, real objects instead of mocks, property-based testing for invariants, etc. It's a feedback loop between yourself and the agent that you must develop a bit before you start seeing "magic" results. A typical session for me looks like:

- I ask for something highly general, and Claude explores a bit and responds.

- We go back and forth a bit on precisely what I'm asking for. Maybe I correct it a few times, and maybe it has a few ideas I didn't know about or think of.

- It writes some kind of plan to a markdown file. In a fresh session I tell a new instance to execute the plan.

- After it's done, I skim the broad strokes of the code and point out any code/architectural smells.

- I ask it to review its own work and then critique that review, etc. We write tests.

Perhaps that sounds like a lot, but typically this process takes around 30-45 minutes of intermittent focus, and the result will be several thousand lines of pretty good, working code.
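The testing style described above (assert on observable outcomes and invariants, not internal structure) can be sketched in a few lines. This is a hand-rolled property test using only the stdlib; the function and invariants are invented for illustration, and a real setup would more likely use a library like Hypothesis:

```python
import random

def dedupe_keep_order(items):
    # Function under test: drop duplicates, keeping first occurrences.
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

def test_dedupe_invariants():
    # Property test: check invariants across many generated inputs,
    # never peeking at how dedupe_keep_order works internally.
    rng = random.Random(42)
    for _ in range(200):
        data = [rng.randint(0, 10) for _ in range(rng.randint(0, 30))]
        out = dedupe_keep_order(data)
        # Invariant 1: no duplicates in the output.
        assert len(out) == len(set(out))
        # Invariant 2: same set of elements -- nothing invented or lost.
        assert set(out) == set(data)
        # Invariant 3: output is a subsequence of the input (order kept).
        it = iter(data)
        assert all(any(x == y for y in it) for x in out)

test_dedupe_invariants()
```

Because the assertions only touch inputs and outputs, the implementation can be rewritten freely (by a human or an agent) without rewriting the tests.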




I absolutely have the hang of Claude, and I still find that it can make those ridiculous mistakes, like replicating logic into a test rather than testing the function directly, or talking to a local pg instance that was stale or still running, etc. I have a ton of skills and pre-written prompts for testing practices, but over longer contexts it will forget and do these things, or get confused, etc.

You can minimize these problems with TLC but ultimately it just will keep fucking up.


Don't know what to tell you. Sounds like you're holding it wrong. Based on the current state of things I would try to get better at holding it the right way.

I can't tell if you're joking?

My favorite is when you need to rebuild/restart outside of claude and it will "fix the bug" and argue with you about whether or not you actually rebuilt and restarted whatever it is you're working on. It would rather call you a liar than realize it didn't do anything.

This is a pretty annoying problem -- I just solve it by asking Claude to always run the right build command after each batch of modifications, etc.
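One way to make that rule stick is a standing instruction in the project's CLAUDE.md. A hypothetical sketch -- the commands are placeholders, and how reliably the model follows it over long contexts varies:

```markdown
<!-- CLAUDE.md -- hypothetical standing instructions -->
## After every batch of code changes
1. Run the project's build command (e.g. `make build`) and show the output.
2. Restart the running service before testing the change.
3. Never claim a fix works based on logs produced before the rebuild.
```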

"That's an old run, rebuild and the new version will work" lol

With the back-and-forth refining, I find it very useful to tell Claude to 'ask questions when uncertain' and/or to 'suggest a few options for how to solve this and let me choose / discuss'.

This has made my planning / research phase so much better.


Yes, pretty much my workflow. I also keep all my task.md files around as part of the repo, and they get filled up with work details as the agent closes the gates. At the end of each one I update the project memory file; this ensures I can always resume any task in a few tokens (memory file + task file == full info to work on it).
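A sketch of that pairing (file names, gates, and fields are invented for illustration, not taken from the comment):

```markdown
<!-- tasks/add-auth.md -- hypothetical task file -->
## Goal
Add token auth to the admin API.
## Gates
- [x] Plan agreed
- [x] Implementation
- [ ] Tests green
## Work log (appended by the agent as gates close)
...

<!-- MEMORY.md -- hypothetical project memory file -->
Active: tasks/add-auth.md (next gate: tests green)
Conventions: pytest, real DB in tests, no mocks
```

Loading just these two files gives a fresh session enough context to pick the task back up.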

Pretty good workflow. But you need to change the order and have it write the tests first (TDD).
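TDD in miniature (a generic illustration; `slugify` is an invented example, not something from the thread): the test is written first and fails, then the minimal implementation is written to make it pass.

```python
# Step 1: write the test first. At this point slugify doesn't exist,
# so running the test fails -- that failing test defines "done".
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  extra  spaces  ") == "extra-spaces"

# Step 2: write the minimal implementation that makes the test pass.
import re

def slugify(text):
    # Lowercase, collapse runs of non-alphanumerics into single
    # hyphens, and strip hyphens from both ends.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

test_slugify()
```

With an agent, the same ordering means reviewing and locking the tests before letting it touch the implementation.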

I mean, I’ve been using AI for close to 4 years now, and I’ve been using agents off and on for over a year. What you’re describing is exactly what I’m doing.

I’m not seeing anyone at work either out of hundreds of devs who is regularly cranking out several thousand lines of pretty good working code in 30-45 minutes.

What’s an example of something you built today like this?


Fair, that's optimistic, and it depends what you're doing. Looking at a personal project, I had a PR this week at +3000 -500 that I feel quite good about; it took about two nights of roughly an hour per session to shape it into what I needed (a control plane for a Polymarket trading engine). Though if I'm being fair, this was an outlier, only possible because I very carefully built the core of the engine to support this in advance: most of the 3K LoC was "boilerplate" in the sense that I'm just manipulating existing data structures, not building entirely new abstractions. There are definitely some very hard-fought +175 -25 changes in this repo as well.

Definitely for my day job it's more like a few hundred LoC per task, and they take longer. That said, at work there are structural factors preventing larger changes: code review, needing to get design/product/coworker input for sweeping additions, etc. I fully believe it would be possible to go faster and maintain quality.


Those numbers are much more believable, but now we’re well into maybe a 2-3x speedup. I can easily write 500 LoC in an hour if I know exactly what I’m building (ignoring that LoC is a terrible metric).

But now I have to spend more time understanding what it wrote, so best case scenario we’re talking maybe a 50% speedup to a part of my job that I spend maybe 10-20% of my time on.

Making very big assumptions that this doesn’t add long term maintenance burdens or result in a reduction of skills that makes me worse at reviewing the output, it’s cool technology.

On par with switching to a memory-managed language, or maybe going from J2EE to Ruby on Rails.


Thinking in terms of a "speedup multiplier" undersells it completely. The speedup on a task I would never have even attempted is infinite. For my recent +3000 PR on my Polymarket engine control plane, I had no idea how these types of things are typically done. It would have taken me many hours to think through an implementation and hours of research online to assemble an understanding of typical best practices. Now with AI I can dispatch many parallel agents to examine virtually all public resources for this at once.

Basically if it's been done before in a public facing way, you get a passable version of that functionality "for free". That's a huge deal.


1. You think you have something following typical best practices. You have no way to verify that without taking the time to understand the problem and solution yourself.

2. If you’d done 1, you’d have the knowledge yourself next time the problem came up and could either write it yourself or skip the verification step.

I’m not saying there aren’t problems out there where the problem is hard to solve but easy to verify. And for those use cases LLMs are terrific.

But many problems have the inverse property. And many problems that look like the first type are actually the second.

LLMs are also shockingly good at generating solutions that look plausible, independent of correctness or suitability, so it’s almost always harder to do the verification step than it seems.


The control plane is already operational and does what I need. Copying public designs solved a few problems I didn't even know I had (awkward command and control UX) and seems strictly superior to what I had before. I could have taken a lot longer on this - probably at least a week, to "deeply understand the problem and solution". But it's unclear what exactly that would have bought me. If I run into further issues I will just solve them at that time.

So what is the issue exactly? This pattern just seems like a looser form of using a library versus building from scratch.


For one I’d argue that you shouldn’t just use a library without understanding what it does and verifying it does what it says.

But a library has been used by multiple people who have verified that it does what it says it does as long as you pick something popular.

You have no idea what this code does. Maybe it has a huge security flaw? Or maybe it’s just riddled with bugs that you don’t know enough to expose.

Maybe it “follows best practices” that your agents uncovered or maybe it doesn’t.

If you expose customer data, or you fuck up in a way that costs customers money, the AI isn’t liable for that; you are.

Now if this is just a toy app where no one can be harmed, sure, who cares.



