1. "Vibe coding" is a spectrum of how much human supervision (and/or scaffolding in the form of human-written tests and/or specs) is involved.
2. The problem with "bad code" has nothing to do with the short-term success of the product but with the ability to evolve it successfully over time. In other words, it's about long-term success, not short-term success.
3. Perhaps most importantly, Claude Code is a fairly simple product at its core, and almost all its value comes from the model, not from its own code (and the same is true on the cost side). Claude Code is a relatively low-stakes product. This means that the problems caused by bad code matter less in this instance, and they're further mitigated by Claude Code not being at the extreme "vibey" end of the spectrum.
So AI aside, Claude Code is proof that if you pour years and many billions into a product, it can be a success even if the code in the narrow and small UI layer isn't great.
There's this definition: LLM generation + "no thorough review or testing".
And there's the more normative one: just LLM generation.[1][2][3]
"Not even looking at it" is very difficult as part of a definition. What if you look at it once? Or just glance at it? Is it now no longer vibe coding? What if I read a diff every ten commits? Or look at the code when something breaks?
At which point is it no longer vibe coding according to this narrower definition?
If you do not know the code at all, and are going off of "vibes", it's vibecoding. If you can get a deep sense of what is going on in the code based off of looking at a diff every ten commits, then that's not vibe coding (I, myself, am unable to get a sense from that little of a look).
If you actually look at the code and understand it and you'd stand by it, then it's not vibecode. If you had an LLM shit it out in 20 minutes and you don't really know what's going on, it's vibecode. Which, to me, is not derogatory. I have a bunch of stuff I've vibecoded and a bunch of stuff where I've actually read the code and fixed it, either by hand or with LLM assistance. And ofc, all the code that was written by me prior to ChatGPT's launch.
You're repeating the broader definition, great. But your post leaves me with the same question about degrees.
You say there are two cases, no review and full review with a "deep sense of the code", and that one is vibe coding and one is not.
What about the degrees in between? At what point does vibe coding become something else?
For example, I would not say "looking at the diffs" is ever enough review to get a deep sense of what's been done. You need to look at diagrams and systematically presented output to understand any complex system.
Is one person's vibe coding then another person's deeply-understood non-vibe coding?
If you can answer this question you may be able to convince me.
You're right that it's a spectrum. Just like anything else, you can be 'mostly' vibe coding or 'somewhat' vibe coding. But the threshold where it stops being vibe coding isn't entirely subjective.
If you are trusting the AI's logic and primarily verifying the output (the app runs, the button works), you are vibe coding. If you are reading the diffs, verifying the architecture, you are transitioning back toward engineering. Any sincere developer knows where they are sitting on that spectrum.
You say the threshold is not entirely subjective, but then you describe a subjective (you just know it) and ambiguous (transitioning back toward engineering) threshold.
Sure seems to me like it's subjective.
Also, I've never heard so much talk about "verifying architecture" as when people talk about vibe coding.
That's not something you usually do. The architecture is the overall structure of a design, and has to be elaborated into functional designs and interface contracts before you have something you can verify in actual code. The architecture itself is very much an intangible thing. "Verifying architecture" in diffs is nonsense, and is definitely not engineering.
Hm, you could do like five degrees of vibecoding. Level one: you laboriously still look at the code and the diffs being generated.
Level two: you sometimes look at the code being generated. Level three: you have a feel for how the classes are architected together but don't know the details. Level four: you're aware of the classes and files in use, but beyond that, you have no idea what's going on.
Level five: you just spit stuff at the LLM and have it shit out code that you have zero clue what it's doing. You don't even know if you're using React or not!
It's a bit absurd that a semantic debate is happening over a term coined in someone's shower thought tweet. Maybe the real problem is that it's just a stupid phrase that should never have been taken so seriously. But here we are...
I think it's perfectly serviceable. Prompting software into existence is a vibes-based activity, and it's completely at odds with engineering. Which is why it's good that there's a term that conveys this.
1 is definitely false right now. I gave specs, tests, full datasets, and reference code to translate to an LLM, and it still produced garbage code / fell flat on its face. I just spent one week translating a codebase from Go to C++, and I had to throw the whole thing out because it put in some horrible bugs that it could not fix, even after burning $500 worth of tokens and me babysitting it. As I said, it had everything at its disposal: tests, a reference impl, lots of data to work with. I finally got my lazy ass to implement it, and lo and behold, I did it in 2 days with no bugs (that I know of), and the code quality is miles better than that undigested vomit. The codebase was a protocol library for decoding network traffic that used a lot of bit twiddling, flow control, huffman table compression: mildly complicated stuff. So no, if you want working non-trivial code that you can rely on, then definitely don't use an LLM to do it. Use it for autocomplete and small bits of code, but never let the damn thing do the thinking for you.
Oh, I agree. Anthropic themselves proved that even with a full spec and thousands of human-crafted tests, unsupervised agents couldn't produce even something as relatively simple as a workable C compiler, even when the model was trained on the spec, the tests, the theory, and a reference implementation, and even when given that reference implementation as an oracle.
But my point was that I don't think the development of Claude Code itself is unsupervised, hence it's not really "vibe coded".