I am curious if LLM evangelists understand how off-putting it is when they knee-jerk rationalize how badly these tools are performing. It makes it seem like it isn't about technological capabilities: it is about a religious belief that "competence" is too much to ask of either them or their software tools.
I wonder how many of those evangelists have some dumb AI startup that'll implode once the hype dies down (or are a software engineer who feels smart when he follows their lead). One thing that's been really off-putting about the technology industry is how pervasive fake-it-till-you-make-it has become.
We live in a post-truth society. This means that, unfortunately, most of society has learned that it doesn't matter if what you're saying is true. All that matters is that the words that you speak cause you or your cause to gain power.
This is why I'm so dismissive of self-styled political moderates who argue that the path to political comity is to talk things out with political opponents, meet them halfway, etc. You cannot have political comity with people who don't value truth and don't adhere to rational methods of argument. Such people will lie about their premises, repudiate arguments they previously agreed to (either on their own initiative or because their political weathervane of choice has changed direction), and their promises are meaningless because they don't see any shame in breaking a promise with people they don't respect. Basically, about 1/3 of the US has taken on the traits of narcissistic personality disorder at a group level.
I partially agree; it seems a lot have shifted the argument to news media criticism or something else. But this study is also questionable, and for anyone who reads actual academic studies that should be immediately obvious. I don't understand why the bar is this low for some paid Ipsos study versus a peer-reviewed paper in some IEEE journal.
Like, for a study like this I expect, as a bare minimum: clearly stated model variants, R@k recall numbers measuring retrieval, and something like BLEU or ROUGE measuring summarization accuracy against some baseline, on top of their human evaluation metrics. If this is useless for the field itself, I don't understand how it can be useful for anyone outside the field.
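To be concrete, these aren't exotic metrics. Here's a rough sketch of both (my own toy illustration with made-up example data, not anything from the Ipsos study, and a simplified ROUGE-1 rather than the full official scorer):

```python
from collections import Counter

def recall_at_k(ranked_ids, relevant_ids, k):
    """R@k: fraction of the known-relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: clipped unigram overlap divided by reference length."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    if not ref_counts:
        return 0.0
    overlap = sum(min(cnt, cand_counts[tok]) for tok, cnt in ref_counts.items())
    return overlap / sum(ref_counts.values())

# Toy examples: one retrieval query, one summary.
print(recall_at_k(["d3", "d7", "d1", "d9", "d2"], {"d1", "d4"}, k=5))  # 0.5
print(rouge1_recall("the cat sat", "the cat sat on the mat"))          # 0.5
```

The point is that numbers like these, reported per model variant against a baseline, are trivial to produce and are the minimum needed to make a claim like the study's checkable.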
Anyone and everyone who has bought stock in the circular Ponzi pyramid has this knee-jerk response to rationalise LLM failure modes.
They want to believe that a statistical distribution over meaningless tokens is real machine cognition; or, failing that, that it works flawlessly in most cases; or, failing that, that it is usable enough to be valued at trillions of dollars collectively.
I actually believe in the idea of machine cognition (with a bunch of caveats which I'm not going to type out here) but fully agree it's being used to hype the market through a combination of cynicism and naivete.
> statistical distribution of meaningless tokens
As an aside, the biggest argument for the possibility of machine consciousness is the depressing fact that so many humans are uncritical bullshit spreaders themselves.
Is that just an LLM thing? I thought that as a society, we decided a long time ago that competence doesn't really matter.
Why else would we be giving high school diplomas to people who can't read at a 5th grade level? Or offshore call center jobs to people who have poor English skills?
It's been a 50-year downward slope. We're in the harvest phase of that crop. All the people we raised to believe their incompetence was just as valid as other people's facts are now confidently running things, because they think magical thinking works.
I'm curious if LLM skeptics bother to click through and read the details on a study like this, or if they just reflexively upvote it because it confirms their priors.
This is a hit piece by a media brand that's either feeling threatened or is just incompetent. Or both.
Because yours is anecdotal evidence. A study like this should clear a higher bar than that and actually be useful to support your experience, but it doesn't do that. It doesn't even say which exact models they evaluated, ffs.
They've conned themselves with the LLMs they use, and are desperate to keep the con going: "The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic’s con"
>people are convinced that language models, or specifically chat-based language models, are intelligent... But there isn’t any mechanism inherent in large language models (LLMs) that would seem to enable this...
And it says it must be a con, but then how come they pass most of the exams designed to test humans better than humans do?
And there are mechanisms, like transformers, that may be doing something like human intelligence.