> And it doesn't freak you out that you're relying on thousands of lines of code that you've never looked at?
I was a product manager for 15 years. I helped sell products to customers who paid thousands or millions of dollars for them. I never looked at the code. Customers never looked at the code. The overwhelming majority of people in the world are constantly relying on code they've never looked at. It's mostly fine.
> How do you verify the end result?
That's the better question, and the answer is a few things. First, when it makes changes to my ad accounts, I spot check them in the UI. Second, I look at ad reporting pretty often, since it's a core part of running my business. If there were suddenly some enormous spike in spend, it wouldn't take me long to catch it.
It's thousands of lines of variation on my own hand-tooling, run through tests I designed, automated by the sort of onboarding docs I should have been writing years ago.
Why wouldn't you test? That sounds like a bad thing.
Me? I use AI to write tests just as I use it to write everything else. I pay a lot of attention to what's being done, including code quality, but I am no more insecure about trusting those thousands of tested lines than I am about trusting the bytecode generated from the "strings of code".
We have just moved up another level of abstraction, as we have done many times before. It will take time to perfect but it's already amazing.
So they don't know if it has the right behavior to begin with, or even if the tests are testing the right behavior.
This is what people are talking about. This is why nobody responsible wants to uberscale a serious app this way. It's ridiculous to see so much hype in this thread, people claiming they've built entire businesses without looking at any code. Keep your business away from me, then.
I've been doing agentic work for companies for the past year, and first of all, error rates have dropped to 1-2% with the leading Q3 and Q4 models... 2026's Q1 models are blowing those out of the water while also being cheaper in some ways
but second of all, even when error rates were 20%, the time savings still meant a viable business. Actually a much more viable business, a scarily viable one, with many annoyed customers getting slop of some sort and a human in the loop correcting things from the LLM before they went out to consumers
Agentic LLM coders are better than your co-workers. They can also write tests. They can do stress testing, load testing, and end-to-end testing, and in my experience that's not even what course-corrects LLMs all that well, so we shouldn't be trying to replicate processes made for humans with them. Like a human, the LLM is prone to just "fix" the test, assuming the test relies on a deprecated assumption, rather than recognizing that a product change broke the test and revealed a regression.
In my experience, type errors, compiler errors, deployment logs, and database entries have made the LLM correct its approach more than tests have. DevOps and data science feedback, more than QA.
Do you trust the assembly your compiler puts out? The machine code your assembler puts out? The virtual machine it runs on? Thousands of lines of code you've never looked at...
We agree then that you can verify, test, and trust the deterministic code an LLM produces without ever looking at it.
> That's one reason we test
That's one way we can trust and verify code produced by an LLM. You can't stop doing all the other things that aren't coding.
I get that there's a difference. Shitty code can be produced by LLMs or humans, and LLMs really can pump out the shitty code. I just think the argument that you can't trust code you haven't viewed is not a good argument. I very much trust a lot of code I've never seen, and yes, I've been bitten by it too.
Not trying to be an ass, more trying to figure out how I'm going to deal for the next decade before retirement age. It's going to be a lot of testing and verification, I guess.
The compiler works without an internet connection and requires too few resources to be secretly running a local model. (Also, you can inspect the source code.)
> You know humans can hallucinate?
We are talking about compilers…
> We agree then that you can verify, test, and trust the deterministic code an LLM produces without ever looking at it.
Unlike a compiler, an LLM does not produce code in a deterministic way, so it’s not guaranteed to do what the input tells it to.
Compiler theory and implementation are based on mathematical and logical principles, and hence much more provable and trustworthy than an LLM that's stitching together pieces of text based on 'training'.
Also you really do have to know how the underlying assembly integer operations work or you can get yourself into a world of hurt. Do they not still teach that in CS classes?
I wouldn't trust thousands of lines of code from one of my co-workers without testing