The author's headline starts with "LLMs are a failure"; it's hard to take the author seriously with such hyperbole, even if the second part of the headline ("A new AI winter is coming") might be right.
But it can work well even if you don't know what you are doing (or don't look at the impl).
For example, build a TUI or GUI with Claude Code while only giving it feedback on the UX/QA side. I've done it many times despite 20 years of software experience. -- Some stuff just doesn't justify me spending my time credentializing in the impl.
Hallucinations that lead to code that doesn't work just get fixed. Most code I write isn't like "now write an accurate technical essay about hamsters," where hallucinations can sneak through unless I scrutinize it; the code would simply fail to work and trigger the LLM's feedback loop to fix it when it tries to run/lint/compile/typecheck it.
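To make that concrete, here's a rough sketch of the kind of check loop I mean (hypothetical Python; ruff/mypy/pytest are just placeholder tools, nothing specific to Claude Code): any hallucinated code that fails a check produces error output that goes straight back into the model's context as the next "fix this" prompt.

    # Minimal sketch of an automated feedback loop (hypothetical; assumes a
    # Python project with ruff, mypy, and pytest installed).
    import subprocess

    CHECKS = [
        ["ruff", "check", "."],  # lint
        ["mypy", "."],           # typecheck
        ["pytest", "-q"],        # run the test suite
    ]

    def collect_failures() -> list[str]:
        """Run each check and return the output of any that fail."""
        failures = []
        for cmd in CHECKS:
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                failures.append(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}")
        return failures

    if __name__ == "__main__":
        # In practice, anything printed here gets fed back to the LLM so it
        # can fix the code it just wrote.
        for failure in collect_failures():
            print(failure)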
But the idea that you can only build with LLMs if you have a software engineer copilot isn't true and inches further away from true every month, so it kinda sounds like a convenient lie we tell ourselves as engineers (and understandably so: it's scary).
> Hallucinations that lead to code that doesn't work just get fixed
How about hallucinations that lead to code that doesn't work outside of the specific conditions that happen to be true in your dev environment? Or, even more subtly, hallucinations that lead to code which works but has critical security vulnerabilities?
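For instance (a made-up Python/sqlite3 sketch, not from any real codebase): a lookup that runs cleanly, passes every happy-path test, and still ships a classic injection hole that no run/lint/typecheck loop will flag on its own.

    import sqlite3

    def find_user_unsafe(conn: sqlite3.Connection, username: str):
        # "Works" in the dev environment: runs, lints, and returns the right
        # rows for every normal username the prompter tries...
        query = f"SELECT id, role FROM users WHERE name = '{username}'"
        return conn.execute(query).fetchall()

    def find_user_safe(conn: sqlite3.Connection, username: str):
        # ...but only someone who knows to look for it catches the injection;
        # the parameterized query is what should have been generated.
        return conn.execute(
            "SELECT id, role FROM users WHERE name = ?", (username,)
        ).fetchall()

    # find_user_unsafe(conn, "x' OR '1'='1")  # quietly returns every user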
Replace "hallucination" with "oversight" or "ignorance" and you have the same issue when a human writes the code.
A lot of that will come down to the prompter's own foresight, much like the vigilance of a beginner developer who knows they are working on a part of the system that is particularly sensitive to get right.
That said, only a subset of software needs an authentication solution or has zero tolerance for a bug in some codepath. Those concerns don't apply to the vast majority of the apps/TUIs/GUIs I've built over the last few months.
If you have to restrict the domain to those cases for LLMs to be "disastrous", then I'll grant that for this convo.
> A lot of that will come to the prompter's own foresight
And, on the current trend, how on earth are prompters supposed to develop this foresight, this expertise, this knowledge?
Sure, fine, we have them now, in the form of experienced devs, but these people will eventually be lost via attrition, lost even faster if companies actually make good on their threat to replace a team of 10 devs with a team of three prompters (former senior devs).
The short-sightedness of this, the ironic lack of foresight, is troubling. You're talking about shutting off the pipeline that will produce these future prompters.
The only way through, I think, will be if (very big if) the LLMs get so much better at coding (not code-gen) that you won't need a skilled prompter.
Far more is being done than simply throwing more GPUs at the problem.
GPT-5 required less compute to train than GPT-4.5. Data, RL, architectural improvements, etc. all contribute to the rate of improvement we're seeing now.
Because intelligence is so much more than stochastically repeating stuff you've been trained on.
It needs to learn new information, create novel connections, be creative. We are utterly clueless as to how the brain works and how intelligence is created.
We just took one cell, a neuron, made the simplest possible model of it, made some copies of it, and you think it will suddenly spark into life if you throw enough GPUs at it?
Nope. It can't learn anything beyond its training data, only within the very narrow context window.
Any novel connections come from randomness, hence hallucinations rather than useful connections grounded in background knowledge of the systems or concepts involved.
As for creativity, see my previous point. If I spit out words that happen to go next to each other, that isn't creativity. Creativity implies a goal or a purpose (or sometimes chance), combined with systematic thinking and an understanding of the world.
I was considering refuting this point by point, but it seems your mind is already made up.
I feel that many people who deny the current utility and abilities of large language models will continue to do so long after the models have exceeded human intelligence, because the perception that they are fundamentally limited, regardless of whether they actually are or whether the criticisms make any sense, is necessary for some load-bearing part of their sanity.
If AGI is built from LLMs, how could we trust it? It's going to "hallucinate", so I'm not sure the AGI future people are clamoring for will really be all that good if it's built on LLMs.
Humans who repeatedly deny LLM capabilities, despite the numerous milestones the models have surpassed, seem more like stochastic parrots themselves.
The same arguments are always brought up, often as short, pithy one-liners without much clarification. This argument first emerged when LLMs could barely write functional code; now that LLMs have reached gold-medal performance on the IMO, it is still being made with little interrogation of its potential faults or clarification of the precise boundary of intelligence LLMs will never be able to cross.
> Claim: gpt-5-pro can prove new interesting mathematics.
> Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof; it's correct.
I think the original commenter meant that the LLM can't be called "wrong" because that concept presupposes understanding. Still, I think it would be fine to call the LLM's response incorrect.
That's not what Occam's razor means. It means that after you have exhausted all options to rule out competing hypotheses, you choose the simplest one that remains, for the time being.
Consider some explanations that are consistent with the evidence presented so far. And remember that the purpose of the investigation is to come up with actionable conclusions.
1. One of the pilots randomly flipped and crashed the plane for no reason. In this case, nothing can be done. It could have happened to anyone at any time, and we were extraordinarily unlucky that the person in question was in position to inflict massive casualties.
2. Something was not right with one of the pilots, the airline failed to notice it, and the pilot decided to commit a murder-suicide. If this was the case, signs of the situation were probably present, and changes in operating procedures may help to avoid similar future accidents.
3. One of the pilots accidentally switched the engines off. The controls are designed to prevent that, but it's possible that improper training taught the pilot to override the safeties instinctively. In this case, changes to training and/or cockpit design could prevent similar accidents in the future.
Because further investigation may shed light on hypotheses 2 and 3, it's premature to make conclusions.
Extremely unlikely, since we can hear the other pilot ask why he turned off the fuel switches. If it had been an electrical glitch, the switches would still have been physically in the run position, so he couldn't have seen them in the cutoff position.
All we know is that the pilot flying asked the pilot monitoring whether he had cut off.
- We don't know if he meant the switch specifically at all. He could also have meant the engines or thrust in general. There are many other visual signals and UX indicators that show the engines are spinning down: thrust levels, RPM, falling speed, change in angle of attack, rate of climb, even engine noise and the vibrations you expect at full thrust.
- We also don't know if the switch was physically in the cutoff position in the first place, or whether the pilot even noticed that specific visual signal and meant it when he spoke.
If it was a software issue, it's possible the switches were properly positioned while the software cut the fuel anyway; the display screens and other indicators would still show the engines shutting down.
In such a scenario, the pilots would likely have first checked with each other whether either of them had done something, as heard in the audio, and then manually tried restarting the engines, as they seem to have done.
I am not saying it is a bug or any specific fault scenario, just that it's too early; we don't yet have enough information to say what is likely at all.
I think there are a couple of factors that disprove these theories:
- The specific mention of "cut off" in the CVR is very telling. If both pilots were genuinely surprised, you'd expect them to say something like "engine failure" or "loss of thrust" first. No one's knee-jerk reaction to a sudden loss of thrust is to think the engines have been shut down.
- If investigators had the slightest indication there's a software or hardware bug out there that randomly causes dual engine failures, an emergency airworthiness directive would have been issued by now. This hasn't happened.
> an emergency airworthiness directive would have been issued by now. This hasn't happened.
The 737 MAX incidents proved that isn't always the case.
This is also not the NTSB or FAA doing the direct investigation. Without certainty, no one is issuing a directive; at this stage it is simply too early and only a possibility.
I wouldn't read so much intent into what two non-native speakers said during a high-stress phase of takeoff.
> The 737 MAX incidents proved that isn't always the case.
> This is also not the NTSB or FAA doing the direct investigation. Without certainty, no one is issuing a directive; at this stage it is simply too early and only a possibility.
You are mistaken. The first MAX crash resulted in emergency directives being issued barely a week after the crash. That investigation was conducted by the Indonesian authorities, not US ones.
Emergency directives aren't issued when there's complete certainty, quite the opposite. Hence the "emergency" bit.
> I wouldn't read so much intent into what two non-native speakers said during a high-stress phase of takeoff.
I agree there's some fuzziness since the exact transcription wasn't provided. But "why did you cut off the engines" is by no means a normal question when facing sudden thrust loss.
Respectfully, media reports on what the investigation is focusing on should be taken with a grain of salt unless said media is known to be reputable and have credible sources.
If they had a credible indication of a technical failure that causes engines to randomly shut down, they would have already grounded 787 fleets, which hasn't happened.
I keep reading "muscle memory" but the theory that one pilot shut down the engines instead of performing another action has nothing to do with muscle memory.
Muscle memory allows you to perform both actions effectively but doesn't make you confuse them. Especially when the corresponding sequence of callouts and actions is practiced and repeated over and over.
All of us have muscle memory for activating the left blinker in our car and pulling the handbrake, but has anyone pulled the handbrake when they wanted to signal left?