I think PG said something about sitting down and hacking being how you understand the problem, and it’s right. You can write UML after you’ve got your head round it, but the feedback loop when hacking is essential.
yes, they must be killing it hundreds of times per day, maybe it's time for a 'please rewrite opencode, but don't touch anything, you can only use `cp`' kind of prompt
Interesting, why impossible? We studied compiler construction at uni. I might have to dig out a few books, but I’m confident I could write one. I can’t imagine anyone on my course of 120 nerds being unable to do this.
You are underestimating the complexity of the task, as do other people in this thread. It's not trivial to implement a working C compiler, and it's far harder to implement one that proves its worth by successfully compiling one of the largest open-source code repositories ever, which, by the way, isn't even written in a plain ISO C dialect.
You thought your course mates would be able to write a C compiler that builds the Linux kernel?
Huh. Interesting. Like the other guy pointed out, compiler classes often get students to write toy C compilers. I think a lot of students don't understand the meaning of the word "toy". I think this thread is FULL of people like that.
If it helps, I did a PhD in computer science, went to plenty of seminars on languages and fuzz testing compilers, and reviewed for conferences like PLDI. I’m not an expert, but I think I know enough to say: this is conceptually within reach, if a PITA.
I took a compilers course 30 years ago. I have near zero confidence anyone (including myself) could do it. The final project was some sort of toy language for programming robots with an API we were given. Lots of yacc, bison, etc.
Hey! I built a Lego Technic car once 20 years ago. I am fully confident that I can build an actual roadworthy electric vehicle. It's just a couple of edge cases and a bit bigger, right? /s
That's really helpful, actually, as you may be able to give me some other ideas for projects.
So, things you don't think I or my coursemates could do include writing a C compiler that builds a Linux kernel.
What else do you think we couldn't do? I ask because there are various projects I'll probably get to at some point.
Things on that list include (a) writing an OS microkernel and some of the other components of an OS. I don't know how far I'll take it, but certainly a working microkernel for one machine; if I have time I'll build most of the stack up to a window manager. (b) Implementing an LLM training and inference stack. I don't know how close to the metal I'd go; I did some CUDA a long time ago, when it was very new and low-level, so it depends on time. I'll probably start the LLM stuff pretty soon as I'm keen to learn.
Are these also impossible? What other things would you add to the impossible list?
Building a microkernel based OS feels feasible because it’s actually quite open ended. An “OS” could be anything from single user DOS to a full blown Unix implementation, with plenty in between.
Amiga OS is basically a microkernel and that was built 40 years ago. There are also many other examples, like Minix. Do I think most people could build a full microkernel based mini Unix? No. But they could get “something” working that would qualify as an OS.
On the other hand, there are not many C compilers that build Linux. There are many implementations of C compilers, however. The goal of “build Linux” is much more specific.
Point 1 is saying results may not generalise, which is not a counter claim. It’s just saying “we cannot speak for everyone”.
Point 4 is saying there may be other techniques that work better, which again is not a counter claim. It’s just saying “you may find better methods.”
Those are standard scientific statements giving scope to the research. They are in no way contradicting their findings. To contradict their findings, you would need similarly rigorous work that perhaps fell into those scenarios.
Not pushing an opinion here, but if we’re talking about research then we should be rigorous and rational, and respond by posting counter-evidence. Anyone who has done serious research in software engineering knows the difficulties involved and that this study represents one set of data. But it is at least a rigorous set and not anecdata or marketing.
I for one would love a rigorous study that showed a reliable methodology for gaining generalised productivity gains with the same or better code quality.
Reasoning by analogy is usually a bad idea, and nowhere is this worse than talking about software development.
It’s just not analogous to architecture, or cooking, or engineering. Software development is just its own thing. So you can’t use analogy to get yourself anywhere with a hint of rigour.
The problem is, AI is generating code that may be buggy, insecure, and unmaintainable. As a community we have spent decades trying to avoid producing that kind of code. And now we are being told that productivity gains mean we should abandon those goals and accept poor quality, as evidenced by Moltbook’s security problems.
It’s a weird cognitive dissonance and it’s still not clear how this gets resolved.
Now then, Moltbook is a pathological case. Either it remains a pathological case or our whole technological world is gonna stumble HARD as all the fundamental things collapse.
I prefer to think Moltbook is a pathological case and unrepresentative, but I've also been rethinking a sort of game idea from computer-based to entirely paper/card based (tariffs be damned) specifically for this reason. I wish to make things that people will have even in the event that all these nice blinky screens are ruined and go dark.
Just the first system coded by AI that I could think of. Note this is unrelated to the fact that its users are LLMs - the problem was in the development of Moltbook itself.
I'm not arguing that LLMs are at a point today where we can blindly trust their outputs in most applications; I just don't think that 100% correct output is necessarily a requirement for that. What it needs is to be correct often enough that the cost of reviewing the output far outweighs the expected cost of any errors left in it, just like with a compiler.
This even applies to human-written code and human mistakes: as the expected cost of errors goes up, we spend more time having multiple people review the code and we worry more about carefully designing tests.
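To make that trade-off concrete, here's a minimal back-of-the-envelope sketch (the numbers, names, and shape of the cost model are my own made-up assumptions, purely for illustration):

```typescript
// Toy model of the review-vs-skip trade-off described above.
// All figures are invented assumptions, not measurements.

interface ReviewTradeoff {
  reviewCostPerChange: number; // minutes of human review per change
  errorProbability: number;    // chance an unreviewed change ships a bug
  errorCostIfShipped: number;  // minutes lost if that bug does ship
}

// Reviewing every change is worth it only while it costs less than the
// expected cost of the errors it would catch.
function reviewIsWorthIt(t: ReviewTradeoff): boolean {
  const expectedErrorCost = t.errorProbability * t.errorCostIfShipped;
  return t.reviewCostPerChange < expectedErrorCost;
}

// Example: 15 minutes of review vs. a 1-in-1000 chance of a 60-minute bug.
console.log(reviewIsWorthIt({
  reviewCostPerChange: 15,
  errorProbability: 0.001,
  errorCostIfShipped: 60,
})); // false: at these assumed numbers, review costs more than it saves
```

The same inequality is why nobody reads compiler output: the error probability is so low that reviewing it could never pay for itself.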
If natural language is used to specify work to the LLM, how can the output ever be trusted? You'll always need to make sure the program does what you want, rather than what you said.
>"You'll always need to make sure the program does what you want, rather than what you said."
Yes, making sure the program does what you want. Which is already part of the existing software development life cycle. Just as using natural language to specify work already is: it's where things start and return to over and over throughout any project. Further: LLMs frequently understand what I want better than other developers. Sure, lots of times they don't. But they're a lot better at it than they were 6 months ago, and a year ago they barely did so at all save for scripts of a few dozen lines.
That's exactly my point, it's a nice tool in the toolbox, but for most tasks it's not fire-and-forget. You still have to do all the same verification you'd need to do with human written code.
Just create a prompt so specific and so detailed that it effectively becomes a set of instructions, and you've come up with the most expensive programming language ever.
You trust your natural language instructions a thousand times a day. If you ask for a large black coffee, you can trust that is more or less what you’ll get. Occasionally you may get something so atrocious that you don’t dare to drink it, but generally speaking you trust the coffee shop knows what you want. If you insist on a specific amount of coffee brewed at a specific temperature, however, you need tools to measure.
AI tools are similar. You can trust them because they are good enough, and you need a way (testing) to make sure what is produced meets your specific requirements. Of course they may fail for you; that doesn’t mean they aren’t useful in other cases.
What’s to stop the barista putting sulphuric acid in your coffee? Well, mainly they don’t because they need a job and don’t want to go to prison. AIs don’t go to prison, so you’re hoping they won’t do it because you’ve prompted them well enough.
The person I'm replying to believes that there will be a point when you no longer need to test (or review) the output of LLMs, similar to how you don't think about the generated asm/bytecode/etc of a compiler.
That's what I disagree with - everything you said is obviously true, but I don't see how it's related to the discussion.
I don't necessarily think we'll ever reach that point and I'm pretty sure we'll never reach that point for some higher risk applications due to natural language being ambiguous.
There are however some applications where ambiguity is fine. For example, I might have a recipe website where I tell a LLM to "add a slider for the user to scale the number of servings". There's a ton of ambiguity there but if you don't care about the exact details then I can see a future where LLMs do something reasonable 99.9999% of the time and no one does more than glance at it and say it looks fine.
How long until we reach that point, and whether we ever do, is of course still up for debate, but I don't think it's completely unrealistic.
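To picture what "something reasonable" might look like for the servings-slider example above, here is a hypothetical sketch of the kind of code an LLM might produce; the element IDs, data shape, and ingredient values are my own assumptions, not taken from any real site:

```typescript
// Hypothetical servings slider for a recipe page.
// Assumes the page has <input type="range" id="servings-slider"> and
// <ul id="ingredient-list">; both IDs are made up for this sketch.

interface Ingredient {
  name: string;
  amountPerServing: number;
  unit: string;
}

const ingredients: Ingredient[] = [
  { name: "flour", amountPerServing: 125, unit: "g" },
  { name: "milk", amountPerServing: 150, unit: "ml" },
];

// Re-render the ingredient list scaled to the chosen number of servings.
function renderIngredients(servings: number): void {
  const list = document.getElementById("ingredient-list")!;
  list.innerHTML = ingredients
    .map(i => `<li>${i.amountPerServing * servings} ${i.unit} ${i.name}</li>`)
    .join("");
}

const slider = document.getElementById("servings-slider") as HTMLInputElement;
slider.addEventListener("input", () => renderIngredients(Number(slider.value)));
renderIngredients(Number(slider.value)); // initial render
```

The point isn't that this exact code is right; it's that any of a dozen reasonable variants would satisfy the request, so a glance is probably enough review.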
The challenge this line of reasoning doesn't address is the sheer scale of output validation required on the back end of LLM-generated code. Hand-written code was no great shakes on the validation front either, but the smaller scale hid the problem.
I’m hopeful that what used to be tedious about the software development process (like correctness proving or documentation) becomes tractable enough with LLMs to make the scale more manageable for us. That’s exciting to contemplate; think of the complexity categories we can feasibly challenge now!
If the author is here, please could you also confirm you’ve never been paid by any AI company, marketing representative, or community programme, in any shape or form?
He explicitly said "I don't work for, invest in, or advise any AI companies." in the article.
But yes, Hashimoto is a high-profile CEO/CTO who may well have an indirect, or near-future, interest in talking up AI. Articles on HN extolling the productivity gains of Claude do generally tend to be from older, managerial types (make of that what you will).
Probably exhausting to be that way. The author is well respected and well known and has a good track record. My immediate reaction wasn’t to question that he spoke in good faith.
I don’t know the author, and am suspicious of the amount of astroturfing that has gone on with AI. This article seems reasonable, so I looked for a disclaimer and found it oddly worded, hence the request for clarification.
What evidence do you have for that? Your point about Saudi is literally mentioned by the parent as one of the few negative points.
I’m not saying this is how it will play out, but this reads as lazy cynicism - which is a self-realising attitude and something I really don’t admire about our nerd culture. We should be aiming higher.
I think there’s also the classic “I can build Zoom in a day” - they get video working between two machines. But it’s the last 80% of the app that takes 99% of the time. Nerds don’t see the whole product, just the happy path of a wee technical challenge.
I would guess, and it is a guess, that there are two reasons Apple is “behind” in AI. First, they have nowhere near the talent pool or capability in this area. They’re not a technical research lab. For the same reason you don’t expect Apple to win the quantum race, they will not lead on AI. Second, AI is a half-baked product right now and Apple try to ship products that properly work. Even Vision Pro is remarkably polished for a first version. AI, on the other hand, is likely to suffer catastrophic security problems, embarrassing behaviour, and distinctly family-unfriendly output.
Apple probably realised they were hugely behind and then spent time hand-wringing over whether to remain cautious or get into the brawl. And they decided to watch from the sidelines, buy in some tech, and see how it develops.
So far that looks entirely reasonable as a decision. If Claude wins, for example, Apple need only make sure Claude tools work on the Mac to avoid losing users, and they can second-move once things are not so chaotic.