
like someone said above: brew install llama.cpp

llama-server -hf ggml-org/gemma-3n-E4B-it-GGUF --port 8000 (with MCP support and web chat interface)

and you have an OpenAI-compatible API on the same port (8000). (https://github.com/ggml-org/llama.cpp/tree/master/tools/serv... lists the endpoints)
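
For example, a rough sketch assuming the server is up on localhost:8000 with the default /v1 routes (the model field can usually be omitted, since the server answers with whatever model it has loaded):

  curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'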



And why do I use ggml-org/gemma-3n-E4B-it-GGUF instead of one of the 162 other models that can be found under the ggml-org namespace? And how do I even know that this is the namespace to look at?

That's what I meant by model management. I'm too tired to scroll through a bazillion models that all have very cryptic names and abbreviations just to find the one that works well on my system with my software stack.

I want a simple interface that a fool like me can scroll through, click on, and end up with a model that works well enough. If I have to put in that much brainpower just to get my LLM working, I might as well do the work myself instead of using an LLM in the first place.


1. Go to HF

2. Choose the model they recommend

3. Run the one-liner the site gives you (sketched below)

Bonus: faster access to latest models and better memory usage
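
For a GGUF repo, the one-liner is usually just the -hf form from above; something like this (a sketch, and the :Q4_K_M quant tag is only an example, pick whatever fits your RAM):

  llama-server -hf ggml-org/gemma-3n-E4B-it-GGUF:Q4_K_M --port 8000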


The first model I see on the HF homepage is this one: MiniMaxAI/MiniMax-M2

Do you think that this 229B parameter model will work on my consumer PC?
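
(Back-of-the-envelope: even at 4-bit quantization that's roughly 229B × 0.5 bytes ≈ 115 GB for the weights alone, before any KV cache, so no.)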

Stop pretending like HF is in any way beginner friendly.



