Hacker News | Philip-J-Fry's comments

What do you mean?

LMStudio is listed as an alternative. It offers a chat UI, a model server supporting OpenAI, Anthropic and LMStudio API interfaces. It supports loading the models on demand or picking what models you want loaded. And you can tweak every parameter.

And it uses llama.cpp which is the whole point of the blog post.


Thanks for pointing that out. From the description in the blog post it sounded like it was GUI only without an API, and I didn't bother looking into it because of that. But it looks pretty nice, so I'll give it a try.

I think the difference is that with LLMs, in a lot of cases you do see some diminishing returns.

I won't deny that the latest Claude models are fantastic at just one shotting loads of problems. But we have an internal proxy to a load of models running on Vertex AI and I accidentally started using Opus/Sonnet 4 instead of 4.6. I genuinely didn't know until I checked my configuration.

AI models will get to this point where for 99% of problems, something like Gemma is gonna work great for people. Pair it up with an agentic harness on the device that lets it open apps and click buttons and we're done.

I still can't fathom that we're in 2026 in the AI boom and I still can't ask Gemini to turn shuffle mode on in Spotify. I don't think model intelligence is as much of an issue as people think it is.


100% agree here. The actual practical bottleneck is harness and agentic abilities for most tasks.

It's the biggest thing that stuck out to me using local AI with open source projects vs Claude's client. The model itself is good enough I think - Gemma 4 would be fine if it could be used with something as capable as Claude.

And that's gonna stay locked down, unfortunately, especially on mobile and in cars - it needs access to APIs to do that stuff - and not just regular APIs that were built for traditional invocation.

The same way that websites are getting llms.txt files, I think APIs will also evolve.


Agree on the diminishing returns; the Opus 4.6 anecdote is a good signal.

I'm not sure I understand your last paragraph? The two sentences seem to contradict?

GPT 3.5 was intelligent enough to understand that command and turn it into a correct shaped JSON object: the platforms don't have tight enough integration to take advantage of the intelligence
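To make the point concrete, here's a toy sketch (every name and the JSON schema are invented for illustration, not any real assistant API) of the kind of structured intent even an older model could emit for "turn shuffle mode on in Spotify", plus the platform-side dispatcher that's still missing:

```python
import json

# Hypothetical structured intent an LLM might emit for the command
# "turn shuffle mode on in Spotify" (schema invented for illustration).
model_output = '''
{
  "app": "spotify",
  "action": "set_shuffle",
  "parameters": {"enabled": true}
}
'''

# A toy dispatcher standing in for the platform integration that doesn't exist.
def dispatch(intent_json: str) -> str:
    intent = json.loads(intent_json)
    if intent["app"] == "spotify" and intent["action"] == "set_shuffle":
        state = "on" if intent["parameters"]["enabled"] else "off"
        return f"shuffle {state}"
    return "unsupported intent"

print(dispatch(model_output))  # shuffle on
```

Producing the JSON is the easy half; the hard half is the dispatcher having real hooks into the app, which is exactly the integration gap being described.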

I think security is the issue. AI is good at circumventing it. For example, AI can read paywalled articles that you cannot. Do you really want AI to have 'free range'?

I mean to me even difference between Opus and Sonnet is as clear as day and night, and even Opus and the best GPT model. Opus 4.6 just seems much more reliable in terms of me asking it to do something, and that to actually happen.

It depends what you're asking it though. Sure, in a software development environment the difference between those two models is noticeable.

But think about the general user. They're using the free Gemini or ChatGPT. They're not using the latest and greatest. And they're happy using it.

And I am willing to bet that a lot of paying users would be served perfectly fine by the free models.

If a capable model is able to live on device and solve 99% of people's problems, then why would the average person ever need to pay for ChatGPT or Gemini?


But consider other tasks too, like research, where dates are important, little details and connections are important, and reasoning is important: background research activities or tool usage outside of software development. This is where I'm finding LLMs most useful in my life.

Even Opus makes mistakes with dates, or with placing news in the correct chronological context, and it would be even worse with smaller, less performant models.

Scheduling, planning, researching products, shopping, trip plans, etc...


You're quick to say "to me" in your comparison.

My experience is very different than yours. Codex and CC yield very different results, both because of the harness differences and the model differences, but neither is noticeably better than the other.

Personally, I like Codex better just because I don't have to mess with any sort of planning mode. If I imply that it shouldn't change code yet, it doesn't. CC is too impatient to get started.


I guess yes, that's a harness difference, and you can also configure CC as a harness to behave very differently. But even with the same harness and guidance, "to me" there's still a difference between Opus 4.6 and e.g. GPT 5.4 (or which GPT model do you use?). I've been using Claude Code, Codex and OpenCode as harnesses presently, but for serious long-running implementation I feel like I can only really rely on CC + Opus 4.6.

Yes 5.4

Perhaps Opus is superior and I'm just jaded.

I come from Cursor before having adopted the TUI tools. Opus was nothing short of pathetic in their environment compared to the -codex models. I would only use it for investigations and planning because it was faster.

Like you've said, though, that could just be a harness issue.


I have the opposite experience. Codex gets to work much faster than Claude Code. Also I've never seen the need to use planning mode for Claude. If it thinks it needs a plan it will make one automatically.

I'll drink to the idea that it's all in my head.

We have AI companies constantly fear-mongering that their next model is somehow too dangerous to release. But they just continue to go on an acquisition spree.

This just confirms to me that we are nowhere near AI being able to write any complicated software. I mean, if it could, wouldn't OpenAI just prompt it into existence? ;)


> We have AI companies constantly fear-mongering that their next model is somehow too dangerous to release

I'm guessing you're referring to this recent report of the security vulnerabilities Mythos found and submitted patches for? That just seems like they don't want the negative press and/or liability if their new model ends up being used to create 0-days that cause widespread damage.

https://red.anthropic.com/2026/mythos-preview/


I don't buy this at all. Code quality will always matter. Context is king with LLMs, and when you fill that context up with thousands of lines of spaghetti, the LLM will (and does) perform worse. Garbage in, garbage out, that's still the truth from my experience.

Spaghetti code is still spaghetti code. Something that should be a small change ends up touching multiple parts of the codebase. Not only does this increase costs, it just compounds the next time you need to change this feature.

I don't see why this would be a reality that anyone wants. Why would you want an agent going in circles, burning money and eventually finding the answer, if simpler code could get it there faster and cheaper?

Maybe one day it'll change. Maybe there will be a new AI technology which shakes up the whole way we do it. But if the architecture of LLMs stays as it is, I don't see why you wouldn't want to make efficient use of the context window.


I didn't say that you "want" spaghetti code or that spaghetti code is good.

I said that (a) apps are getting simpler and smaller in scope and so their code quality matters less, and (b) AI is getting better at writing good code.


Apps are getting bigger and more ambitious in scope as developers try to take advantage of any boost in production LLMs provide them.

Every metric I've seen points to there being an explosion in (a) the number of apps that exist and (b) the number of people making applications.

What relevance do either of those claims have to the claim of the comment you are responding to?

Are you trying to imply that having more things means that each of them will be smaller? There are more people than there were 500 years ago - are they smaller, or larger?

Also, the printing press did lead to much longer works. There are many continuous book series that have run for decades, with dozens of volumes and millions of words. This is a direct result of the printing press. Just as there are television shows that have run with continuous plots for thousands of hours. This is a consequence of video recording and production technologies; you couldn't do that with stage plays.

You seem to be trying to slip "smaller in scope" into your statement without backing, even though I'd insist that applications individuals wrote being "smaller in scope" was an obvious consequence of the tooling available. I can't know everything, so I have to keep the languages and techniques limited to the ones that I do know, and I can't write fast enough to make things huge. The problems I choose to tackle are based on those restrictions.

Those are the exact things that LLMs are meant to change.


The average piece written and published today is much shorter than the average piece from the past. Look at Twitter. Social media in general. Internet forums. Blog posts. Emails. Chats. Etc. The amount of this content DWARFS other content.

The same is true of most things that get democratized. Look at video. TikTok, YouTube, YouTube shorts.

Look at all the apps people are building for themselves with AI. They are typically not building Microsoft Word.

Of course there will be some apps that are bigger and more ambitious than ever. I myself am currently building an app that's bigger and more ambitious than I would have tried to build without AI. I'm well aware of this use case.

But as many have pointed out, AI is worse at these than at smaller apps. And pretending that these are the only apps that matter is what's leading developers, imo, to over-value the importance of code quality. What's happening right now, invisible to most professional engineers, is an explosion in the number of small, bespoke personal applications being quickly built by non-developers that are going to chip away at people's reasons to buy and use large, bloated, professional software with hundreds of thousands of users.


> Look at all the apps people are building are building for themselves with AI.

The apps those people were making before LLMs became ubiquitous were no apps at all. So by definition they are now larger and more ambitious.


There's already been an explosion of apps - and most of them suck, are spam, or worse, will steal your data.

We don't need more slop apps, we already have that and have for years.


Yes, some people are pushing everything-apps.

But just as many are creating One More Habit Tracker or Todo App, so many that Apple had to change their review guidelines to block the surge of low-tier app slop.

Internally people are creating bespoke tools for themselves to fix issues in their daily workflows that would've either been a 100k€ software project that lasts for 6 months or required an expensive SaaS system with 420 extra features they didn't need - and the price to match.


The Jevons paradox says otherwise. As producing apps becomes cheaper, we will not be able to help ourselves: we will make them larger until they fill all available space and cost just as much to produce and maintain.

That's the incorrect application of the Jevons Paradox. We won't get bigger apps, we'll get more apps.

Think about what happened to writing when we went from scribes to the printing press, and from the printing press to the web. Books and essays didn't get bigger. We just got more people writing.


Parrots can't "talk". They just mimic noises they've heard before.


This reminds me of being told dogs don't feel emotions by someone who never owned one. Parrots most definitely can talk. Their language is extremely primitive, but if you've ever been around a grey and its owner for some time, they definitely talk to each other. The parrot will readily communicate observations and desires.


Isn't that what humans do too? We mimic noises we've heard before and we associate meaning with the noises. Parrots can do that. Our quaker parrot would bite you, then say 'not supposed to bite'. He clearly associated some kind of meaning with that phrase.


Not to make an argument against parrots understanding, but humans understand noises before they mimic them. Children are often able to learn and express themselves in sign language (if taught obviously) earlier than they can learn to speak, and they can respond to spoken word in sign language before they can speak.


Or maybe he just learned that's what people say when he bites them, so he started saying that himself.


https://en.wikipedia.org/wiki/Alex_(parrot)

Common misconception. Parrots are much more than just mimicry machines. There is also Apollo the parrot, who shows this in detail, following on from Irene Pepperberg's research with Alex.


Many animals can communicate.

Parrots can't speak fluent English, which shouldn't be surprising. Last I checked, no human is fluent in Parrot or Dolphin.

Though, at least one parrot may have demonstrated an ability to understand language at more than a surface level.


Bumblebee (the Transformer) might have an objection here. Purposeful mimicry can be used for talking at a certain level of complexity. It doesn't have to be human-level to be communication.


This is also what toddlers do until bit by bit they're repeating everything you say back to you in context.


So what you’re saying is that parrots are stochastic parrots.


You've just described most of the information economy.


This thread is going to end with Monty Python jokes.


So do we, otherwise we would all speak our own individual language.


Like Starlings do.


I mean, isn’t that just what you’re doing too? If you see a cow, and you’ve been taught that ‘cow’ is the sound that describes a cow, don’t you say “cow?”


>The use of IEnumerable as a "generator" mechanic is quite a good hack though.

Is that a hack? Is that not just exactly what IEnumerable and IEnumerator were built to do?


It feels hacky because you have to (had to?) use it as the async/await tool, and because of that, the types you're generating and how they are handled are a huge mess.

Really you're generating the vague concept of a yield instruction, but you can also return other coroutines that are implicitly run, nesting your execution. Because of this you can't wait less than a frame, so things are often needlessly complicated and slow.

It's like using a key to jam a door shut. Sure a key is for keeping doors closed but...
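For readers less familiar with the pattern being debated: here's a rough Python analogue of the yield-based "generator as coroutine" trick (Unity's C# version uses `IEnumerator` and `yield return`, but the shape is the same, including the once-per-frame tick mentioned above; this scheduler is a toy, not Unity's actual implementation):

```python
# A minimal cooperative scheduler in the spirit of Unity coroutines:
# each generator yields to hand control back, and the scheduler
# "ticks" every live coroutine once per frame.
def blink(times):
    for i in range(times):
        yield f"blink {i}"  # one step per frame

def run_frames(coroutines):
    frames = []
    while coroutines:
        frame = []
        still_alive = []
        for co in coroutines:
            try:
                frame.append(next(co))  # advance one step
                still_alive.append(co)
            except StopIteration:
                pass  # coroutine finished; drop it
        if frame:
            frames.append(frame)
        coroutines = still_alive
    return frames

print(run_frames([blink(2), blink(3)]))
```

The key property, and the source of the complaint, is visible here: a coroutine can only make progress when the scheduler ticks it, so nothing it does can take less than one frame.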


Conceptually that's no different to any security measures that prevent you from accessing data you're not supposed to? At the end of the day with all data that is colocated you're trusting that some permission feature somewhere is preventing you from accessing data you're not supposed to.

We trust that Amazon or Google or Microsoft are successful in protecting customer data for example. We trust that when you log into your bank account the money you see is yours, and when you deposit it we trust that the money goes into your account. But it's all just mostly logical separation.


> At the end of the day with all data that is colocated you're trusting that some permission feature somewhere is preventing you from accessing data you're not supposed to.

Right but ideally more than one.

> But it's all just mostly logical separation.

Yes, ideally multiple layers of this. You don't all share one RDS instance and then get row level security.


Can you give an example of more than one layer of logical separation at the data layer?

We all know that authentication should have multiple factors. But that's a different problem. Fundamentally, at the point you're reading or writing data, you're asking the question "does X have permission to read/write Y".

I don't see what you're getting at.


I don't know their use case enough to understand what would or would not be an appropriate mitigation. For example, with regards to financial data, you could have client side encryption on values where those keys are brokered separately. I can't exactly design their system for them, but they're describing a system in which every employee has direct database access and the database holds financial information.


Right, encryption would protect the data. But still, at the end of the day you're trusting the permission model of the database. Encryption won't prevent you updating a row or deleting a row if the database permission model failed.


Well, I think we basically agree? My suggestion is merely that a database holding financial data should have more than a single layer of security. Granting direct access to a database is a pretty scary thing. A simple example: any vulnerability in the database is directly accessible. Even just placing a broker between users and the database would make me feel a lot better, and now I'd have a primitive for layering on additional security measures.

Encryption is an extremely powerful measure for this use case. If the data does not need to be indexed, you could literally take over the database process entirely and still not have access, it definitely doesn't rely on the permission model of the db because the keys would be brokered elsewhere.
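A toy sketch of the layering idea (all class and field names invented; the "store" stands in for a database with row-level filtering, the "broker" for an intermediary service): two independent checks, so a failure in either layer alone doesn't expose data.

```python
# Two independent layers of logical separation (illustrative only):
# 1) the store itself filters rows by owner (like row-level security),
# 2) a broker in front additionally checks an allow-list before the
#    query ever reaches the store.

class RowFilteredStore:
    def __init__(self, rows):
        self.rows = rows  # each row: {"owner": ..., "balance": ...}

    def query(self, user):
        # Layer 1: per-row ownership filter.
        return [r for r in self.rows if r["owner"] == user]

class Broker:
    def __init__(self, store, allowed_users):
        self.store = store
        self.allowed = allowed_users

    def query(self, user):
        # Layer 2: refuse users outside the allow-list, regardless
        # of what the store would have returned.
        if user not in self.allowed:
            raise PermissionError(f"{user} may not query at all")
        return self.store.query(user)

store = RowFilteredStore([
    {"owner": "alice", "balance": 100},
    {"owner": "bob", "balance": 50},
])
broker = Broker(store, allowed_users={"alice"})

print(broker.query("alice"))  # [{'owner': 'alice', 'balance': 100}]
```

Client-side encryption with keys brokered elsewhere would be a third, stronger layer on top of this; it's omitted here because real crypto shouldn't be sketched by hand.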


> My suggestion is merely that a database holding financial data should have more than a single layer of security.

We require SSO (Azure via Vault) to authenticate to the DB. We also don't expose PostgreSQL to the public internet. We aren't complete monsters :)

> Granting direct access to a database is a pretty scary thing.

For you maybe, because you were taught it's scary or it just seems different? I dunno. I'm very surprised with all the pushback about it being a single layer. Every other data access architecture will be a single layer too, it just can be made to look like it isn't. Or people think their bespoke access control system will be better because they have more control. Our experience taught us that's just bad thinking.

We've been doing direct access to PostgreSQL since 1993 without many issues. Though RLS is "recent" in terms of deployment (it came about in PG 10, I think). Before that we had a bespoke solution (written with lots of views and some C/pgsql code; it was slow and kind of sucked). RLS was a little buggy when it first was released, but within a year or so it was reliable, and we moved everything over as quick as we could and haven't looked back.

> Encryption is an extremely powerful measure for this use case.

We do this with some data in some tables, but it's a PITA to do it right, so its use is quite limited. We use Hashicorp Vault (now OpenBao) to hold the encryption/decryption keys.


I'm not sure where this "it's always one layer" thing is coming from, that's just not true. Nor do I see where I've said you should toss out RLS for a bespoke system - I see myself saying the opposite a few times.

> For you maybe, because you were taught it's scary or it just seems different?

Over a decade in computer security and software engineering. Nothing I'm saying is contentious. For some reason when I say "Having one boundary is bad" you say "There's only ever one boundary", which... is not true.


What is this app and what does it do? Can we see it?

I find it very hard to believe anyone could code anything complicated with Claude that 5-6 competent developers could do.

I am currently working on a relatively complicated UI on an internal tool and Claude constantly just breaks it. I tried asking it to build it step by step, adding each functionality I need piece by piece. But the code it eventually got was complete garbage. Each new feature it added would break an existing one. It was averse to refactoring the code to make it easier to add future features. I tried to point it in the right direction and it still failed.

It got to the point where I took a copy of the code, cut it back to basics and just wrote it myself. I basically halved the amount of code it wrote, added a couple of extra features, and it was human readable. And if I had started with this, it would have taken less time!


I had trouble in my early days with the quality of things I made.

One of the things I found helped a lot is building on top of a well-structured stack. Make yourself a scaffold. Make sure it is exactly how you like your code structured, etc. Work with Claude to document the things you like about it (I call mine polyArch2.md).

The scaffold will serve as a seed crystal. The document will serve as a contract. You will get much better results.


It's a financial asset management system, and it's for proprietary use only. Maybe I'll do some YT insights in the future.

> I find it very hard to believe anyone could code anything complicated with Claude that 5-6 competent developers could do.

I should have put a disclaimer - I'm not a layman; I have 25y+ of IT experience. Without my prior experience, I think this project wouldn't have come into existence.


>Some new value will be discovered in the code itself - maybe conceptual clarity, algorithmic novelty, structural cleanliness, readability, succinctness, etc. Those values will become the new foundations for future gatekeeping.

It's a nice idea, but I feel like that's only going to be the case for very small companies or open source projects. Or places that pride themselves on not using AI. Artisan code I call it.

At my company the prevailing thought is that code will only be written by AI in the future. Even if today that's not the case, they feel it's inevitable. I'm skeptical of this given the performance of AI currently. But their main point is, if the code solves the business requirements, passes tests and performs at an adequate level, it's as good as any hand written code. So the value of readable, succinct, novel code is completely lost on them. And I fear this will be the case all over the tech sector.

I'm hopeful for a bit of an anti-AI movement where people do value human created things more than AI created things. I'll never buy AI art, music, TV or film.


The work involved in maintaining a standard library is things like bug fixes. A larger standard library (or multi versions) means there's more likely to be bugs. You also have performance improvements, and when new versions of the language come out which has features to improve performance, you will most likely want to go back through and refactor some code to take advantage of it. You will also want to go through and refactor to make code easier to maintain. All of this just gets harder with a larger surface.

And the more stuff you pack into the standard library the more expertise you need on the maintenance team for all these new libraries. And you don't want a standard library that is bad, because then people won't use it. And then you're stuck with the maintenance burden of code that no one uses. It's a big commitment to add something to a standard library.

So it's not that things just suddenly break.

