All LLMs hit a ceiling of complexity beyond which they cease to understand the code base. Greenfield projects show this particularly well, because the LLM works until it doesn’t. There are lots of greenfield projects that should exist! And lots of ways that someone can manage context and their own understanding of code to push the LLM further than its current limits, although not indefinitely far.
I just vibecoded 2 silly projects over the weekend to test this for the first time (prior to this, all my AI coding was at the function level, which has been enormously beneficial).
The first app was to scrape some data using a browser. The LLM did an excellent job here. It went down one wrong path that it obsessed over (a good idea in theory that should have worked), but in the end it produced a fully working tool that exceeded my requirements, with a UI way more polished than I would have bothered with for a tool only I would use.
The second app is a DHT crawler. It has gone down so many dead ends on this one. The docs for the torrent tools don't match the code, I guess, so it gets horribly confused (as do GPT, Grok, Claude, Gemini). It's still not working 100%, and I've wasted way more time than it probably would have taken to learn the protocols and write it from scratch.
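For a sense of what "the protocols" involve, here's roughly what a single DHT ping looks like (a minimal sketch, not my actual crawler code; the bootstrap host and the random node id are just illustrative choices):

```python
# Rough sketch of a single KRPC "ping" query (BEP 5): a bencoded dict over UDP.
import os
import socket

def bencode(obj):
    # Minimal bencoder covering only the types this query needs.
    if isinstance(obj, bytes):
        return str(len(obj)).encode() + b":" + obj
    if isinstance(obj, str):
        return bencode(obj.encode())
    if isinstance(obj, int):
        return b"i" + str(obj).encode() + b"e"
    if isinstance(obj, dict):
        out = b"d"
        for key in sorted(obj):  # bencoded dict keys must be sorted
            out += bencode(key) + bencode(obj[key])
        return out + b"e"
    raise TypeError(type(obj))

node_id = os.urandom(20)  # our 20-byte DHT node id
ping = {"t": b"aa", "y": "q", "q": "ping", "a": {"id": node_id}}

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(5)
sock.sendto(bencode(ping), ("router.bittorrent.com", 6881))
print(sock.recvfrom(2048))  # bencoded response containing the remote node's id
```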
The main issue is -- I have no idea what the code looks like or really how it works. When I write code myself, I almost always have a complete mental map of the entire codebase and where all the functions live in which files. Here I know none of that. I've tried opening the code for the DHT app and it is mentally exhausting. I nope out and just go back to the agent window and try poking it instead, which is a huge time waster.
So, mixed feelings on this. The scraper app saved me a bunch of time, but it was mostly a straightforward project. The DHT app was more complicated and it broke the system in a bunch of ways.
> All LLMs hit a ceiling of complexity beyond which they cease to understand the code base
That's a sign that you need to refactor/rearchitect/better-modularize the initial code. My experience is that with no existing code patterns to follow, the LLM will generate a sprawl that isn't particularly cohesive. That's fine for prototyping, but when the complexity of the code gets to be too much for its context, taking a day or so to reorganize everything more cleanly pays off, because it lets the LLM make assumptions about how particular parts of the code work without actually having to read them.
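To make that concrete, a toy sketch of the kind of boundary I mean (module and names invented for illustration): once a module exposes a narrow signature and docstring like this, the agent can call it without ever reading the SQL underneath.

```python
# scraper/storage.py -- hypothetical module, names invented for this example
import sqlite3
from dataclasses import dataclass

@dataclass
class Listing:
    url: str
    title: str
    price_cents: int

def save_listings(listings: list[Listing], db_path: str) -> int:
    """Upsert listings into the SQLite file at db_path; return rows written.

    The rest of the codebase treats this as the only way data is persisted,
    so nothing outside this module ever needs to see the SQL below.
    """
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS listings "
            "(url TEXT PRIMARY KEY, title TEXT, price_cents INTEGER)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO listings (url, title, price_cents) "
            "VALUES (?, ?, ?)",
            [(x.url, x.title, x.price_cents) for x in listings],
        )
        return conn.total_changes
```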
The greenfield phase feels amazing, and if it stopped there I would have been really happy. This is where LLMs are really good. It's also a lot more fun than gluing tools together, so I can see more people getting into it. It's not something one does professionally, though. It's more like an extra. I guess we'll have to see how far they can push the ceiling...
Every 6 months since ChatGPT launched, everyone keeps telling me that LLMs are going to be amazing a year from now and that they'll replace programmers. Just you wait.
They're getting better, but a lot of the improvement was driven by increases in the training data. These models have now consumed literally all available information on the planet - where do they go from here?
The "time to amazingness" is falling quickly, though. It used to be "just a few years" a few years ago, and has been steady around 6 months for the last year or so.
I'm waiting for the day when every comment section on the internet will be full of people predicting AGI tomorrow.
As far as I understand, the coding ability of AIs is now driven almost entirely by RL, as well as by synthetic data generated with inference-time compute combined with code-execution tool use.
Coding is arguably the single thing least affected by a shortage of training data.
We're still in the very early steps of this new cycle of AI coding advancements.
Yeah... There are improvements to be made by increasing the context window and having agents reference documentation more. Half the issues I see are with agents just doing their own thing instead of following the established best practices they could/should be referencing in the codebase, or looking up the documentation.
Which, believe it or not, is the same issue I see in my own code.
Give the LLM access to a VM with a compiler and have it generate code for itself to train on. They're great at Next.js but not as good with Swift. So have it generate a million Swift programs, along with tests to verify they actually work, and add those to the private training data set.
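Something like this loop, roughly sketched (generate_program is a stand-in for whichever model API you'd call, and swiftc is assumed to be on PATH; only candidates whose embedded checks pass get kept):

```python
# Sketch of a generate-verify-keep loop for harvesting synthetic Swift training data.
import json
import pathlib
import subprocess
import tempfile

def generate_program(prompt: str) -> str:
    """Placeholder: ask your model of choice for Swift source with assertions baked in."""
    raise NotImplementedError

def passes(source: str) -> bool:
    """Compile and run the candidate; exit code 0 counts as 'tests passed'."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "main.swift"
        src.write_text(source)
        binary = pathlib.Path(tmp) / "candidate"
        try:
            build = subprocess.run(["swiftc", str(src), "-o", str(binary)],
                                   capture_output=True, timeout=120)
            if build.returncode != 0:
                return False
            run = subprocess.run([str(binary)], capture_output=True, timeout=60)
            return run.returncode == 0
        except subprocess.TimeoutExpired:
            return False

def harvest(prompts: list[str], out_path: str) -> int:
    """Append verified (prompt, program) pairs to a JSONL file; return how many were kept."""
    kept = 0
    with open(out_path, "a") as out:
        for prompt in prompts:
            source = generate_program(prompt)
            if passes(source):
                out.write(json.dumps({"prompt": prompt, "program": source}) + "\n")
                kept += 1
    return kept
```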
As you can see from the other commenters on here, any perceived limitation is no longer the fault of the LLM. So where we go from here is gaslighting. Never mind that the LLM should be good at refactoring; you need to keep doing that for it until it works, you see. Or the classic "you're prompting it wrong", etc.
Let's hope you are right, that AI will never get past its current limitations, and that all of us get to live out our full natural lifespans without AI x-risk.
The fundamental question is "will the LLM get better before your vibecoded codebase becomes unmaintainable, or before you need a feature that is beyond the LLM's ceiling?" It's an interesting race.