
Try it on a million-line code base, where it's not so cut and dried to even determine whether the code is running correctly, or what "correctly" means when it changes day to day.


"A tool is only useful if I can use it in every situation".

LLMs don't need to find every bug in your code - even if they found an additional 10% of genuine bugs compared to existing tools, that's still a pretty big improvement to code analysis.

In reality, I suspect the figure is much higher than 10%.


If it takes you longer to vet hallucinations than it would to just test your code better, is it an improvement? If you accept a fix for a hallucinated bug because you were too lazy to check it, having grown dependent on the AI to do the analysis for you, and that "fix" itself causes other unforeseen issues or fails to recognize why an exception in this case might be worth preserving, is it really an improvement?


What if it takes you longer to vet false positives from a static analysis tool rather than just testing your code better?


What if indeed. Most static analysis tools (disclaimer: anecdotal) have very few false positives these days. This may be much worse in C/C++ land, though; I don't know.


Is it better or worse than a human, though?


It’s slightly worse than a junior developer, and just as confidently incorrect, but much faster to iterate.

Either is better than no assistant at all. With circumstantial caveats.


Sounds like it will go far!


I would imagine worse, because a human has a much, much, much larger context size.


But also a much, much shorter attention span and tolerance for BS.

If you ask the LLM to analyze those 1,000,000 lines 1,000 at a time, 1,000 times over, it'll do it with the same diligence and attention to detail across all 1,000 chunks.

Ask a human to do it and their patience will be tested. Their focus will waver, they’ll grow used to patterns and miss anomalies, and they’ll probably skip chunks that look fine at first glance.

Sure, the LLM won’t find big-picture issues at that scale. But it’ll find plenty of code smells and minor logic errors that deserve a second look.
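
For what it's worth, here's a minimal Python sketch of that chunk-at-a-time review loop. review_chunk() is a placeholder for whatever LLM call you'd actually use (not a specific product's API), and the 1,000-line chunk size just mirrors the numbers above:

    # Sketch: review a large file in fixed-size chunks with an LLM.
    # review_chunk() is a placeholder, not a real API - wire in your own client.
    from pathlib import Path

    CHUNK_LINES = 1000  # mirrors the 1000-lines-at-a-time example above

    def review_chunk(chunk_text: str, start_line: int) -> str:
        """Placeholder: send one chunk to an LLM with a code-review prompt
        and return its findings as plain text."""
        raise NotImplementedError("plug in your LLM client here")

    def review_file(path: str) -> list[str]:
        lines = Path(path).read_text().splitlines()
        findings = []
        for start in range(0, len(lines), CHUNK_LINES):
            chunk = "\n".join(lines[start:start + CHUNK_LINES])
            findings.append(review_chunk(chunk, start_line=start + 1))
        return findings

Each chunk gets the same fresh prompt, which is exactly why it never gets bored - and also why anything that spans chunk boundaries gets missed.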


Ok, why don't you run this experiment on a large public open source code base? We should be drowning in valuable bug reports right now, but all I hear is hype.


While true, on the other hand an AI is a tool: it can have a much larger context size, and it can apply all of it at once. It also isn't limited by availability or time constraints - i.e. if you have only one developer who can do a review, tooling or AI that catches 90% of what that developer would catch is still worth having.


I separated a 5,000-line class into smaller domains yesterday. It didn't provide the end solution, and it wasn't perfect, but it gave me a good plan for where to place what.

Once it is capable of processing larger context windows, it will become impossible to ignore.


You can’t; it has a context window of 8192 tokens. That’s roughly 1000 lines, depending on the programming language.
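
Back-of-the-envelope for that figure (the tokens-per-line ratio is an assumption; real ratios vary a lot by language and coding style):

    # Rough estimate of how many lines of code fit in an 8192-token window.
    # TOKENS_PER_LINE is an assumed average, not a measured value.
    CONTEXT_WINDOW_TOKENS = 8192
    TOKENS_PER_LINE = 8  # assumption: a typical source line is ~8 tokens

    lines_per_window = CONTEXT_WINDOW_TOKENS // TOKENS_PER_LINE
    print(lines_per_window)  # 1024 -- roughly the "1000 lines" cited above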



