I agree: LLMs definitely sand off a lot of personality, and you can see it in writing the most. At this point I'm sure tons of people are subconsciously trained to lower their trust in anything where they recognize the typical patterns.
With code, especially interfaces, the results will be similar -- more standardized palettes, more predictable patterns.
To be fair, this converging force has been at work pretty much forever -- e.g. radio and TV led to a lot of local accents disappearing -- and our world is heavily globalized.
If the models don't get to the point where they can correct fixes on their own, then yeah, everything will keep falling apart. There is just no way around increasing entropy.
The only way to harness it is to somehow package code-producing LLMs into an abstraction and then validate the output. Until we achieve that, imo it doesn't matter how closely people watch the output -- things will keep getting worse.
> If the models don't get to the point where they can correct fixes on their own
Depending on what you're working on, they are already at that point. I'm not into any kind of AI-maximalist "I don't read code" BS (I read a lot of code), but I've been building a fairly extensive web app to manage my business using Astro + React, and I have yet to find a bug or usability issue that Claude Code can't fix much faster than I would have (+). I've been able to build, in a month, a fully TDD app that would conservatively have taken me a year by myself.
(+) Except for making the UI beautiful. It's crap at that.
The key that made it click is exactly what the person describes here: specs that describe the key architecture and use cases of each section. So I have docs/specs with files like layout.md (overall site shell info), ui-components.md, auth.md, database.md, data.md, and many more, one for each section of functionality in the app. If I'm doing work that touches UI, I reference layout.md and ui-components.md so the agent doesn't invent a custom button component. If I'm doing database work, I reference database.md so it knows we're using drizzle + libsql, and so on.
This extends up to higher-level components, where the spec also briefly explains the actual goal.
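The doc-selection habit described above can be sketched as a tiny helper. The mapping of areas to spec files is an illustrative assumption (only the file names come from the comment above); in practice this is something you'd keep in your head or in a CLAUDE.md, not in code.

```typescript
// Hypothetical index: which spec docs cover which area of the app.
// File names mirror the docs/specs layout described above; the mapping
// itself is an illustrative assumption, not a real tool.
const specIndex: Record<string, string[]> = {
  ui: ["docs/specs/layout.md", "docs/specs/ui-components.md"],
  auth: ["docs/specs/auth.md", "docs/specs/database.md"],
  database: ["docs/specs/database.md", "docs/specs/data.md"],
};

// Given the areas a task touches, collect the specs to reference in the
// prompt, de-duplicated and in a stable order.
function specsFor(areas: string[]): string[] {
  const seen = new Set<string>();
  for (const area of areas) {
    for (const doc of specIndex[area] ?? []) seen.add(doc);
  }
  return [...seen].sort();
}

console.log(specsFor(["ui", "database"]));
```

The point is only that referencing specs is a lookup, not a judgment call: a task touching UI and the database pulls in exactly four docs, nothing more.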
Then each feature-building session follows a pattern: brainstorm and create a design doc + initial spec (updated or new files) -> write a technical plan clearly following TDD, designed for batches of parallel subagents to work on -> have Claude implement the technical plan -> manual testing (I'll often identify problems and request changes here) -> automated testing (much stricter linting, knip, etc. than I would use for myself) -> finally, update the spec docs based on the work actually done.
My role is less about writing code and more about providing strict guardrails. The spec docs are an important part of that.
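The TDD step in a session like this can be as small as one red-green cycle: the assertion exists before the implementation does. `slugify` here is a hypothetical helper, not from the app described above.

```typescript
// A minimal test-first sketch: the test is written before the
// implementation, then the implementation is filled in until it passes.

function assertEqual<T>(actual: T, expected: T, msg: string): void {
  if (actual !== expected) throw new Error(`${msg}: got ${actual}`);
}

// Step 1: the failing test, written first.
function testSlugify(): void {
  assertEqual(slugify("Hello, World!"), "hello-world", "basic slug");
  assertEqual(slugify("  spaced  out  "), "spaced-out", "trim + collapse");
}

// Step 2: the implementation, written only to satisfy the test.
function slugify(input: string): string {
  return input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // runs of non-alphanumerics become hyphens
    .replace(/^-+|-+$/g, "");    // strip leading/trailing hyphens
}

testSlugify();
console.log("tests passed");
```

The guardrail value is that the agent's work is judged by the pre-written assertions, not by whether its output looks plausible.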
I think this is the logical next step -- instead of manually steering the model, rely on the acceptance criteria and some E2E test suite (that part is tricky, since you need to verify the test suite itself).
I personally think we are not that far from it, but it will need something built on top of current CLI tools.
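The shape of such a tool is roughly a fixpoint loop over the acceptance suite. This is a toy sketch of that control flow only: `generateFix` is a stub standing in for an LLM call, and nothing here is a real agent API.

```typescript
// Run the acceptance suite, feed failures back to a code-generating step,
// and repeat until everything passes or a budget runs out.

type Suite = () => string[]; // returns names of failing acceptance tests

function runLoop(
  suite: Suite,
  generateFix: (failures: string[]) => void,
  maxIters: number,
): boolean {
  for (let i = 0; i < maxIters; i++) {
    const failures = suite();
    if (failures.length === 0) return true; // acceptance criteria met
    generateFix(failures); // would call the model with the failure output
  }
  return false; // budget exhausted; escalate to a human
}

// Toy demo: a "codebase" whose bugs get fixed one per iteration.
let bugs = ["login redirects to 404", "cart total off by one"];
const passed = runLoop(
  () => bugs,
  () => { bugs = bugs.slice(1); },
  5,
);
console.log(passed); // true: both bugs cleared within the budget
```

Everything hard hides inside the two callbacks -- a trustworthy suite and a fix step that converges -- which is exactly the "tricky part" noted above.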
All your thoughts and experiences are real and in some ways pretty unique. However, the circumstances are usually well-defined and expected (our lives are generally very standardized), so the responses can be generalized successfully.
You can see it here as well -- discussions under similar posts touch the same topics again and again, so you can predict what will be discussed when the next similar idea hits the front page.
So what if we are quite predictable? That doesn't mean we are "trying" to predict the next word, or "trying" to be predictable, which is what LLMs are doing.
Over a large population, trends emerge. An LLM is not a member of the population, it is a replicator of trends in a population, not a population of souls but of sentences, a corpus.
I've personally never seen anything as bad as Xcode, though granted, I haven't used really old IDEs (I always preferred plain editors). Last year I built a few small apps using both Xcode/Swift and Visual Studio/C#, and using the MS stack felt like living in the future, despite WinUI having a very dated UI syntax.
> I don't know what jobs have been impacted yet, but there will likely be pressure for all content creators and knowledge workers to use the tools to get more work done.
You claimed it already happened to illustrators and artists, and while I'm sure they use it one way or another, I don't think it has transformed the industry. Now, I'm not saying it won't amount to anything in software; I just don't think it is ready right now outside of greenfield projects, mostly because the scope is limited.
I am pretty positive that at some point we'll have a tool that automates generation -> code review -> fixing (multiple loops) -> releasing, without people. Currently people are the bottleneck, and imo a better way is to exclude people completely outside of the initial problem statement and accepting the result. Otherwise it is just too janky; that 10x comes with a huge asterisk that can unironically slow you down after all is said and done.
I think this approach is fundamentally flawed for anything more complex than a simple endpoint. AI is already really good for throwaway code, that much is clear, and it is also decent if you watch it like a hawk.
However, complexity is still not handled well: you need to spend more time in code review and testing to make sure all edge cases are covered and the modules connect sensibly. Ideally we want to modularize and keep the breaking surface very small, but often that is not possible.
I think the next step is to fully remove people, as manually accepting changes is just too brittle; I also think it is probably possible with current tools, but it needs a very different approach from the current meta of highly specific docs.
It's still not JS-level/JS-compatible GC (yet?), and it is still quite low-level -- more about corralling buffers of bytes than objects, closer to OS-level page management than to JS- or C#-level GC -- since it is intended to be lower-level than most languages need, so that different languages can build different things on top of it. It is also a small stepping stone toward better memory sharing with JS APIs (and the eventual goal of WASM "direct DOM"), but still not finished on that front; more steps remain.
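The bytes-vs-objects distinction is visible even from plain JS. Here is a hedged illustration: a two-field struct laid out manually in an ArrayBuffer (the kind of bookkeeping a language compiling to classic, non-GC WASM linear memory does for itself), next to the engine-managed object the GC proposal moves toward. Offsets and layout are illustrative, not from any real toolchain.

```typescript
// Manual layout: struct Point { x: i32; y: i32; } at byte offset 0.
// Your code owns the offsets, the lifetime, and the endianness.
const memory = new ArrayBuffer(16);
const view = new DataView(memory);

function writePoint(offset: number, x: number, y: number): void {
  view.setInt32(offset, x, true);     // little-endian, as in WASM
  view.setInt32(offset + 4, y, true);
}
function readX(offset: number): number {
  return view.getInt32(offset, true);
}

writePoint(0, 10, 20);
console.log(readX(0)); // 10

// The GC-level equivalent: the engine owns layout and lifetime, not you.
const point = { x: 10, y: 20 };
console.log(point.x); // 10
```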
The summary is often incorrect in at least some subtle details, which is invisible to a lot of people who do not understand LLM limitations.
Now, we can argue that a typical SEO-optimized garbage article is no better, but I feel the average person's trust in those was lower.
Marketing departments are already speaking of GEO - generative engine optimization. When a user asks an AI for the best X, you want it to say your X is the best, and you'll do whatever it takes to achieve that.
But in a lot of cases you can't know all the dependencies, so you lean on the community, trusting that a package solves the problem well enough that you can abstract it away.
You can pin the dependency and review the changes for security reasons, but fully grasping the logic is non-trivial.
Smaller dependencies are fine to copy in at first, but at some point the codebase becomes too big, so you abstract them out -- and at that point each becomes a self-maintained dependency. Which is a fair decision, but it is all about tradeoffs, and sometimes too costly.
You'd get those benefits from traditional dependencies if you copy them in and never update. Is an AI dependency going to have the equivalent of "upstream fixes"?