LLMs in silicon are the future. It won't be long until you can just plug an LLM chip into your computer and talk to it at 100x the speed of current LLMs. Capability will be lower, but the speed will make up for it.
You can always delegate sub-agents to cloud-based infrastructure for things that need more intelligence. But the future is to keep the core interaction loop on the local device, always ready for your input.
A lot of what we ask of these models isn't all that hard. Summarize this, parse that, call this tool, look that up, etc. 99.999% of it really isn't about implementing complex algorithms, solving important math problems, or working your way through a benchmark of leetcode-style programming exercises. You also really don't need these models to know everything. It's nice if they can hallucinate a decent answer to most questions, but the smarter way is to look up the right answer and then summarize it. Good enough goes a long way. Speed and latency are becoming a key selling point. You need enough capability locally to know when to escalate to something slower and more costly.
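That escalation logic is simple enough to sketch. Here's a toy version of the loop, with made-up stand-ins (`local_model`, `cloud_model`, `CONFIDENCE_THRESHOLD`) rather than any real API; the point is just that the local model handles the fast path and only punts when it isn't confident:

```python
# Toy sketch of a local-first loop with cloud escalation.
# All names here are hypothetical; no real model API is assumed.

CONFIDENCE_THRESHOLD = 0.8

def local_model(prompt):
    # Stand-in for a small on-device model: returns (answer, confidence).
    # Here it's "confident" on short prompts just to make the demo work.
    confidence = 0.9 if len(prompt) < 200 else 0.4
    return ("summary of: " + prompt, confidence)

def cloud_model(prompt):
    # Stand-in for a slower, more capable hosted model.
    return "detailed answer for: " + prompt

def answer(prompt):
    text, confidence = local_model(prompt)
    if confidence >= CONFIDENCE_THRESHOLD:
        return text               # fast path: stays on device
    return cloud_model(prompt)    # escalation: slower and more costly

print(answer("parse this log line"))
```

In practice the interesting part is the confidence signal itself (logprobs, a router model, explicit tool-use rules), but the shape of the loop is the same.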
This will drive an overdue increase in the memory size of phones and laptops. Laptops especially have been stuck at the same common base level of 8-16GB for about 15 years now. Apple still sells laptops with just 8GB. I had a 16GB MacBook Pro in 2012, and at the time that wasn't even that special. My current one has 48GB; enough for some of the nicer models. You can get as much as 256GB today.
> This will drive an overdue increase in memory size of phones and laptops.
DRAM costs are still skyrocketing, so no, I don't think so. It's more likely that we'll bring back wear-resistant persistent memory as formerly seen with Intel Optane.
Standard pork cycle in economics. Production capacity eventually goes up to meet demand and prices come down again. RAM has been going through cycles like this for decades. Every time it happens, people seem to have no memory whatsoever of the previous cycles. Just wait a few years for it to become cheap again.
I think there are drastic differences between computer vision models and LLMs that you’re not considering. LLMs are huge relative to vision models, and require gobs of fast memory. For this reason a little USB dongle isn’t going to cut it.
Put another way, there already exist add-in boards like this, and they’re called GPUs.
Sure, but that’s somewhat orthogonal to the point I was making, which is that LLMs are huge in size. Even in the case of a custom “LLM chip,” you’ll need huge amounts of very fast storage of some sort (likely DRAM), which places constraints on the size, power consumption, and cost of such a device. This device, if it existed, would not in any way resemble the Coral TPU product that the GP was referencing; I think in fact it would be closer in size, price, and form factor to a GPU.
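The back-of-envelope math makes the constraint concrete. Assuming (my numbers, purely illustrative) a 70B-parameter model at 4-bit quantization, and that generating each token reads the full set of weights once, a given tokens-per-second target directly implies a memory bandwidth target:

```python
# Rough sizing for a hypothetical "LLM chip". All figures are assumptions
# for illustration, not vendor specs.

params_b = 70          # model size in billions of parameters (assumed)
bytes_per_param = 0.5  # 4-bit quantized weights
tokens_per_sec = 100   # an assumed speed target

weights_gb = params_b * bytes_per_param        # DRAM needed just for weights
bandwidth_gb_s = weights_gb * tokens_per_sec   # weights read once per token

print(f"{weights_gb:.0f} GB of weights, ~{bandwidth_gb_s:.0f} GB/s of bandwidth")
```

That works out to tens of GB of DRAM and thousands of GB/s of bandwidth, which is exactly the size, power, and cost envelope of a GPU, not a USB dongle.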