I’ll tell you why this happens. You might use ChatGPT for a bit and your initial impressions will be great: it does what I ask of it! You might be aware that it makes mistakes sometimes, but when you use it interactively you don’t really notice them, because you catch and correct them as you go.
Now if LLMs are just as effective as your experience suggests, they are indeed extremely useful and you absolutely should see if they can help you.
It’s only when you attempt to build a product — and it could be one person writing one Python script — that uses LLMs in an automated way with minimal human input that you really get insight into LLMs’ strengths and their limitations. You realize they can be useful, but you sometimes have to baby them a lot (a rough sketch of what that babying looks like is below).
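To make the "babying" concrete, here is a minimal sketch of the kind of guard rails unattended use tends to need. Everything in it (call_llm, extract_fields, the expected JSON keys) is a hypothetical stand-in for whatever client and schema you actually use, not any particular product's code:

    import json

    def call_llm(prompt: str) -> str:
        # Hypothetical placeholder for whatever LLM client you use.
        raise NotImplementedError

    def extract_fields(text: str, max_retries: int = 3) -> dict:
        # Ask the model for structured JSON, then "baby" it:
        # validate the reply and retry when it comes back malformed.
        prompt = (
            "Return ONLY a JSON object with keys 'name' and 'date' "
            "extracted from this text:\n" + text
        )
        for _ in range(max_retries):
            raw = call_llm(prompt)
            try:
                data = json.loads(raw)
                if isinstance(data, dict) and "name" in data and "date" in data:
                    return data
            except json.JSONDecodeError:
                pass  # model ignored the format; ask again
        raise ValueError("model never produced valid output")

Used interactively you would just rephrase and re-ask; in an unattended script, this validate-and-retry loop is the babysitting.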
How many people get to step two? That’s a select few. Most people are stuck in the dreamy phase of trying out interactive LLMs.
This is a recurring issue with all new technology. Heck, it happens with new software frameworks.
The other problem I find is that LLMs are changing so fast that what you evaluated 6-12 months ago might be completely different now with newer models.
So an assessment of strengths and weaknesses can quickly become outdated, as the strengths grow and the weaknesses diminish.
The first batch of LLMs people tried in 2023 had a lot of weaknesses. At the end of 2024, we can see improvements in speed and in the complexity of the output, and people are creating frameworks on top of the LLMs that further increase their value. We went from thousands of tokens of context to millions pretty fast.
I can see myself dividing problems up into 4 groups:
1. LLMs currently solve the problem
2. They don't solve it now, but we are within a couple of iterations of next-generation models or frameworks being able to solve it
3. LLMs are still years away from being able to solve this effectively, so wait and implement it when they can.
4. LLMs will never solve this.
I think a lot of people building products are in group 2 right now.
This definitely resonates, but I'm left wondering why there hasn't been a collective "sobering up" on this front. Not on a personal/team/company level, but just in terms of the general push to cram AI into everything. For how much longer will new AI features assault us in software where they ostensibly won't be that useful?
It seems that the effort required to make an LLM work robustly within a single context (spreadsheet, Word doc, email, whatever) is so gargantuan (honestly) that the returns, or even the initial manpower, wouldn't be there. So any new AI feature feels more or less like bloat, and if not fully useless, then at least a bit anxiety-inducing, in that you have no clue how much you can rely on it.
Very few managers get quick promotions for NOT rolling out a high-visibility AI enhancement. LLMs can theoretically fit into an amazing diversity of products. Even if just 10% of managers say yes and the other 90% say no, that's still a lot of shoehorning every year in an attempt to book a “win” for a promotion.
Totally. And every time someone sobers up, there is a cabal of people saying "we've sunk however many $$$ into this, it's the core feature of the xx roll-out... drink up, the hype party continues, like it or not..." So now you see phenomena like the one-time premier-tier-subscriber-only feature of Copilot on GitHub now pushed to everyone, prompts to use the generative AI in iStock on every page, and compulsory "use Copilot to write your draft" prompts on every new doc in MS Word, because I don't think companies are able to grok the widespread disinterest in much of it. I'm still waiting for one that will be non-networked and sit on my desktop to do my tax returns and haggle with phone company bots.