I'm running Fedora Silverblue as my host OS; this is the kernel:
$ uname -a
Linux fedora 6.18.9-200.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Feb 6 21:43:09 UTC 2026 x86_64 GNU/Linux
You also need to set a few kernel command-line parameters to allow the GPU to use most of your memory as graphics memory. I have the following in my kernel command line; those are each 110 GiB expressed as a number of pages (I figure leaving 18 GiB or so for CPU memory is probably a good idea):
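A minimal sketch of what this looks like, assuming the usual TTM knobs for AMD unified-memory machines (ttm.pages_limit and ttm.page_pool_size; 110 GiB at 4 KiB per page works out to 28835840 pages). On Silverblue, kernel arguments are managed through rpm-ostree:

$ rpm-ostree kargs --append="ttm.pages_limit=28835840" \
    --append="ttm.page_pool_size=28835840"
$ systemctl reboot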
Then I'm running llama.cpp via the official llama.cpp Docker containers. The Vulkan one works out of the box. I had to build the ROCm container myself: the official llama.cpp container ships ROCm 7.0, but I need 7.2 to be compatible with my kernel. I haven't directly compared speeds between Vulkan and ROCm yet; I'm pretty much at the point where I've just gotten everything working.
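Launching the Vulkan container looks something like this (the model path and port are illustrative; the GPU is passed through via /dev/dri):

$ docker run --rm -it --device /dev/dri \
    -v ~/models:/models -p 8080:8080 \
    ghcr.io/ggml-org/llama.cpp:server-vulkan \
    -m /models/model.gguf --host 0.0.0.0 --port 8080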
(As mentioned, I'm still just getting this set up, so I've been moving between using `-hf` to pull directly from HuggingFace and using `uvx hf download` in advance; sorry that these commands are a bit messy. The problem with using `-hf` in llama.cpp is that you'll sometimes get surprise updates where it has to download many gigabytes before starting up.)
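To illustrate the two approaches (the repo and file names are placeholders):

# pull straight from Hugging Face at startup; may re-download when the repo updates
$ llama-server -hf some-org/some-model-GGUF

# or fetch in advance and point at the local file
$ uvx hf download some-org/some-model-GGUF --local-dir ~/models
$ llama-server -m ~/models/some-model.gguf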
This is huge. There have been 3rd-party Signal libraries for this for years -- and for some reason I can't determine, the developers have opted NOT to do this.
In the 80s I read all the comic compilations from the late 50s -> 70s; that was the golden age of the strip. It was an amazing comic, and you'll see why all the strip creators since then were inspired by it.
For all my constant freak-outs about AI in general, it turned out to be a godsend last year when my wife's mom was hospitalized (and passed away a few weeks later). Multimodal ChatGPT had just become available on mobile, so being able to feed it photos of her vital sign monitors to figure out what was going on, have it translate what the doctors were telling us in real time, and explain things clearly made an incredible difference. I even used it to interpret legal documents and compare them with what the attorneys were telling us -- again, super helpful.
And when the bills started coming in, it helped there too. Hard to say if we actually saved anything — but it certainly didn’t hurt.
Doubters say it's not as accurate, or that it could hallucinate. But the thing about hiring professionals is that you have to trust them blindly, because you'd need a professional level of knowledge yourself to judge who is competent.
LLMs are a good way to double-check whether the service you're getting is about right, or to steer them back onto the right hypothesis when they have some confirmation bias. This assumes you know how to prompt with plenty of information and open questions that don't contain leading presuppositions.
An LLM read my wife's blood lab results and found something the doctor was ignoring.
This post resurfaced a thought I had. MSFT is really, really pushing AI. It would be really cool if someone attempted, with any of the coding models / agents, to recreate Windows from "scratch". THAT would be very interesting, and useful -- on many levels.
When I ran the Python Meetup here in Phoenix, an engineer from Intel's compilers group would show up all the time. I remember he was constantly frustrated that Intel management would purposely downplay and cripple advances in the Atom processor line because they thought it would be "too good" and cannibalize their desktop lines. This was over 15 years ago -- I was hearing this in real time. He flat-out said that Intel considered the mobile market a joke.