The author's headline starts with "LLMs are a failure"; it's hard to take the author seriously with such hyperbole, even if the second part of the headline ("A new AI winter is coming") might be right.
But it can work well even if you don't know what you are doing (or don't look at the impl).
For example, build a TUI or GUI with Claude Code while only giving it feedback on the UX/QA side. I've done it many times despite 20 years of software experience. -- Some stuff just doesn't justify me spending my time credentializing in the impl.
Hallucinations that lead to code that doesn't work just get fixed. Most code I write isn't like "now write an accurate technical essay about hamsters," where hallucinations can sneak through unless I scrutinize it; the code would simply fail to work and trigger the LLM's feedback loop to fix it when it tries to run/lint/compile/typecheck it.
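To make that concrete, here's a rough sketch of the kind of check loop I mean (hypothetical Python; ruff/mypy/pytest are just placeholder tools, nothing specific to Claude Code): any hallucinated code that fails a check produces error output that goes straight back into the model's context as the next "fix this" prompt.

    # Minimal sketch of an automated feedback loop (hypothetical; assumes a
    # Python project with ruff, mypy, and pytest installed).
    import subprocess

    CHECKS = [
        ["ruff", "check", "."],  # lint
        ["mypy", "."],           # typecheck
        ["pytest", "-q"],        # run the test suite
    ]

    def collect_failures() -> list[str]:
        """Run each check and return the output of any that fail."""
        failures = []
        for cmd in CHECKS:
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                failures.append(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}")
        return failures

    if __name__ == "__main__":
        # In practice, anything printed here gets fed back to the LLM so it
        # can fix the code it just wrote.
        for failure in collect_failures():
            print(failure)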
But the idea that you can only build with LLMs if you have a software engineer copilot isn't true and inches further away from true every month, so it kinda sounds like a convenient lie we tell ourselves as engineers (and understandably so: it's scary).
> Hallucinations that lead to code that doesn't work just get fixed
How about hallucinations that lead to code that doesn't work outside of the specific conditions that happen to be true in your dev environment? Or, even more subtly, hallucinations that lead to code which works but has critical security vulnerabilities?
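For instance (a made-up Python/sqlite3 sketch, not from any real codebase): a lookup that runs cleanly, passes every happy-path test, and still ships a classic injection hole that no run/lint/typecheck loop will flag on its own.

    import sqlite3

    def find_user_unsafe(conn: sqlite3.Connection, username: str):
        # "Works" in the dev environment: runs, lints, and returns the right
        # rows for every normal username the prompter tries...
        query = f"SELECT id, role FROM users WHERE name = '{username}'"
        return conn.execute(query).fetchall()

    def find_user_safe(conn: sqlite3.Connection, username: str):
        # ...but only someone who knows to look for it catches the injection;
        # the parameterized query is what should have been generated.
        return conn.execute(
            "SELECT id, role FROM users WHERE name = ?", (username,)
        ).fetchall()

    # find_user_unsafe(conn, "x' OR '1'='1")  # quietly returns every user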
Replace "hallucination" with "oversight" or "ignorance" and you have the same issue when a human writes the code.
A lot of that will come down to the prompter's own foresight, much like the vigilance of a beginner developer who knows they are working on a part of the system that is particularly sensitive to get right.
That said, only a subset of software needs an authentication solution or has zero tolerance for a bug in some codepath. Those concerns don't apply to the vast majority of the apps/TUIs/GUIs I've built over the last few months.
If you have to restrict the domain to those cases for LLMs to be "disastrous", then I'll grant that for this convo.
> A lot of that will come to the prompter's own foresight
And, on the current trend, how on earth are prompters supposed to develop this foresight, this expertise, this knowledge?
Sure, fine, we have them now, in the form of experienced devs, but these people will eventually be lost via attrition, lost even faster if companies actually make good on their threat to replace a team of 10 devs with a team of three prompters (former senior devs).
The short-sightedness of this, the ironic lack of foresight, is troubling. You're talking about shutting off the pipeline that will produce these future prompters.
The only way through, I think, will be if (very big if) the LLMs get so much better at coding (not code-gen) that you won't need a skilled prompter.
Far more is being done than simply throwing more GPUs at the problem.
GPT-5 required less compute to train than GPT-4.5. Data, RL, architectural improvements, etc. all contribute to the rate of improvement we're seeing now.
Because intelligence is so much more than stochastically repeating stuff you've been trained on.
It needs to learn new information, create novel connections, be creative. We are utterly clueless as to how the brain works and how intelligence is created.
We just took one cell, a neuron, made the simplest possible model of it, made some copies of it, and you think it will suddenly spark into life if you throw enough GPUs at it?
Nope. It can't learn anything beyond its training data, only within the very narrow context window.
Any novel connections come from randomness, hence hallucinations rather than useful connections grounded in background knowledge of the systems or concepts involved.
As for creativity, see my previous point. If I spit out words that happen to go next to each other, that isn't creativity. Creativity implies a goal or a purpose (or sometimes chance), combined with systematic thinking and an understanding of the world.
I was considering refuting this point by point, but it seems your mind is already made up.
I feel that many people who deny the current utility and abilities of large language models will continue to do so long after the models have exceeded human intelligence, because the perception that they are fundamentally limited, regardless of whether they actually are or whether the criticisms make any sense, is necessary for some load-bearing part of their sanity.
If AGI is built from LLMs, how could we trust it? It's going to "hallucinate", so I'm not sure the AGI future people are clamoring for will really be all that good if it's built on LLMs.
Humans who repeatedly deny LLM capabilities, despite the numerous milestones the models have surpassed, seem more like stochastic parrots themselves.
The same arguments are always brought up, often as short, pithy one-liners without much clarification. This argument first emerged when LLMs could barely write functional code; now that LLMs have reached gold-medal performance on the IMO, it is still being made with little interrogation of its potential faults or clarification of the precise boundary of intelligence LLMs will never be able to cross.
> Claim: gpt-5-pro can prove new interesting mathematics.
> Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof; it's correct.
I think the original commenter meant that the LLM can't be called "wrong" because that concept presupposes understanding. Still, I think it would be fine to call the LLM's response incorrect.
That's not what Occam's razor means. It means that after you have exhausted all options to rule out competing hypotheses, you choose the simplest one that remains, for the time being.
Consider some explanations that are consistent with the evidence presented so far. And remember that the purpose of the investigation is to come up with actionable conclusions.
1. One of the pilots randomly flipped and crashed the plane for no reason. In this case, nothing can be done. It could have happened to anyone at any time, and we were extraordinarily unlucky that the person in question was in position to inflict massive casualties.
2. Something was not right with one of the pilots, the airline failed to notice it, and the pilot decided to commit a murder-suicide. If this was the case, signs of the situation were probably present, and changes in operating procedures may help to avoid similar future accidents.
3. One of the pilots accidentally switched the engines off. The controls are designed to prevent that, but it's possible that improper training taught the pilot to override the safeties instinctively. In this case, changes to training and/or cockpit design could prevent similar accidents in the future.
Because further investigation may shed light on hypotheses 2 and 3, it's premature to make conclusions.
Extremely unlikely, since we can hear the other pilot ask why he turned off the fuel switches. If it had been an electrical glitch, the switches would still have been physically in the run position, so he couldn't have seen them in the cutoff position.
All we know is that the pilot flying asked the pilot monitoring whether he had cut off.
- We don't know if he meant the switch specifically at all. He could also have meant the engines or thrust in general. There are many other visual signals and UX indicators that show the engines are spinning down: thrust levels, RPM, falling speed, change in angle of attack, rate of climb, even engine noise and the vibrations you expect at full thrust.
- We also don't know if the switch was physically in the cutoff position in the first place, or whether the pilot even noticed that specific visual signal and meant it when he spoke.
If it was a software issue, it's possible the switches were properly positioned while the software cut the fuel anyway; the display screens and other indicators would still show the engines shutting down.
In such a scenario, the pilots would likely have first checked with each other whether either of them had done something, as heard in the audio, and then manually tried restarting the engines, as they seem to have done.
I am not saying it is a bug or any specific fault scenario, just that it's too early; we don't yet have enough information to say what is likely at all.
I think there are a couple of factors that disprove these theories:
- The specific mention of "cut off" in the CVR is very telling. If both pilots were genuinely surprised, you'd expect them to say something like "engine failure" or "loss of thrust" first. No one's knee-jerk reaction to a sudden loss of thrust is to think the engines have been shut down.
- If investigators had the slightest indication there's a software or hardware bug out there that randomly causes dual engine failures, an emergency airworthiness directive would have been issued by now. This hasn't happened.
> an emergency airworthiness directive would have been issued by now. This hasn't happened.
The 737 MAX incidents proved that isn't always the case.
This is also not the NTSB or FAA doing the direct investigation. Without certainty, no one is issuing a directive; at this stage it is simply too early and only a possibility.
I wouldn't read so much intent into what two non-native speakers said during a high-stress phase of takeoff.
> The 737 MAX incidents proved that isn't always the case.
> This is also not the NTSB or FAA doing the direct investigation. Without certainty, no one is issuing a directive; at this stage it is simply too early and only a possibility.
You are mistaken. The first MAX crash resulted in emergency directives being issued barely a week after the crash. That investigation was conducted by the Indonesian authorities, not US ones.
Emergency directives aren't issued when there's complete certainty, quite the opposite. Hence the "emergency" bit.
> I wouldn't read so much intent into what two non-native speakers said during a high-stress phase of takeoff.
I agree there's some fuzziness since the exact transcription wasn't provided. But "why did you cut off the engines" is by no means a normal question when facing sudden thrust loss.
Respectfully, media reports on what the investigation is focusing on should be taken with a grain of salt unless said media is known to be reputable and have credible sources.
If they had a credible indication of a technical failure that causes engines to randomly shut down, they would have already grounded 787 fleets, which hasn't happened.
I keep reading "muscle memory" but the theory that one pilot shut down the engines instead of performing another action has nothing to do with muscle memory.
Muscle memory allows you to perform both actions effectively but doesn't make you confuse them. Especially when the corresponding sequence of callouts and actions is practiced and repeated over and over.
All of us have muscle memory for activating the left blinker in our car and pulling the handbrake, but has anyone pulled the handbrake when they wanted to signal left?