Oh does llama.cpp use MLX or whatever? I had this question, wonder if you know? ...

irusensei · 2026-03-31T08:12:59 1774944779

>Oh does llama.cpp use MLX or whatever?

No. It runs on macOS but uses Metal instead of MLX.

zozbot234 · 2026-03-31T08:47:04 1774946824

ANE-powered inference (at least for prefill, which is a key bottleneck on pre-M5 platforms) is also in the works, per https://github.com/ggml-org/llama.cpp/issues/10453#issuecomm...

OkGoDoIt · 2026-03-31T08:58:00 1774947480

Is that better or worse?

irusensei · 2026-03-31T10:54:00 1774954440

Depends.

MLX is faster because it has better integration with Apple hardware. On the other hand, GGUF is a far more popular format, so there will be more programs and model variety.

So it's kinda like having a very specific diet that you swear is better for you, but you can only order food from a few restaurants.

drob518 · 2026-03-31T11:59:47 1774958387

But you can always fall back to GGUF while waiting for the world to build a few more MLX restaurants. Or something like that; the analogy is a bit stretched.

irusensei · 2026-03-31T23:02:39 1774998159

Yeah I'm terrible with analogies.

LoganDark · 2026-03-31T08:25:57 1774945557

llama.cpp uses GGML, which uses Metal directly.