
Just to confirm, Ollama's naming is very confusing on this. Only the `deepseek-r1:671b` tag on Ollama is actually DeepSeek-R1. The other, smaller sizes are distilled models based on Qwen and Llama.

https://ollama.com/library/deepseek-r1
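
A minimal sketch of what this means in practice, assuming the official `ollama` Python client and a running Ollama server (the tags are the ones listed on the library page above):

    import ollama  # official Python client, talks to a running Ollama server

    # The bare name resolves to the default tag (a small distill per the
    # library page), not the full model:
    ollama.pull("deepseek-r1")        # small distilled model
    ollama.pull("deepseek-r1:671b")   # the actual DeepSeek-R1

    resp = ollama.chat(
        model="deepseek-r1:671b",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp["message"]["content"])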



Which, according to the Ollama team, seems to be on purpose, to avoid people accidentally downloading the proper version. Verbatim quote from Ollama:

> Probably better for them to misunderstand and run 7b than run 671b. [...] if you don't like how things are done on Ollama, you can run your own object registry, like HF does.


It’s definitely on purpose, but if the purpose were to help users make good choices, they could actually provide that information and explain what is what instead of hiding it.


I read that they don't merge PRs for Intel or AMD hardware, so the project seems a bit shady in general.


Just use llama.cpp directly.


Could you expand on this? Is there any disadvantage to continuing with Ollama?

I use Ollama for prototyping and then move what I can to a vLLM setup.
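
A minimal sketch of the vLLM side of that workflow, assuming its offline `LLM` API; the repo id is just an example of a distilled checkpoint, not a recommendation:

    from vllm import LLM, SamplingParams

    # Example Hugging Face repo id for one of the distilled checkpoints.
    llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
    params = SamplingParams(temperature=0.6, max_tokens=512)

    outputs = llm.generate(["Why is the sky blue?"], params)
    print(outputs[0].outputs[0].text)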


I think if you find Ollama useful, use it regardless of what others say. I did give it a try, but found it lands in a weird place of "meant for developers, marketed to non-developers": llama.cpp sits at one extreme, apps like LM Studio sit at the other, and Ollama lands somewhere in the middle.

I think the main point that turned me off was their custom way of storing weights/metadata on disk, which makes it too complicated to share models between applications. I much prefer being able to use the same weights across all the applications I use, since some of them end up being around 50GB.

I ended up using llama.cpp directly (since I am a developer) for prototyping and recommending LM Studio for people who want to run local models but aren't developers.
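
A minimal sketch of that setup, assuming llama.cpp's Python bindings (`llama-cpp-python`); the path and filename are placeholders for a GGUF file kept in a shared directory:

    from llama_cpp import Llama  # llama.cpp's Python bindings

    # Path and filename are placeholders; any llama.cpp-based app can point
    # at the same GGUF file, so the weights only exist once on disk.
    llm = Llama(
        model_path="/models/shared/some-model-Q4_K_M.gguf",
        n_ctx=8192,        # context window
        n_gpu_layers=-1,   # offload all layers to the GPU if one is available
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])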

But again, if you find Ollama useful, I don't think there is any reason to drop it immediately.


Yeah, I made the same argument but they seem convinced it's better to just provide their own naming instead of separating the two. Maybe marketing gets a bit easier when people believe them to be the same?

    ollama has their own way of releasing their models. 
    when you download r1 you get 7b. 
    this is due to not everyone is able to run 671b. 
    if its missleading then more likely due to user not reading.  
I'm not super convinced by their argument of blaming users for not reading, but it is their project, after all.


If nothing is specified, the rule of least surprise would suggest the full, vanilla version, I would say.


The conspiracy theorist in me thinks that it's deliberate sabotage of a Chinese model.


No, those checkpoints have also been provided by DeepSeek.


It is very interesting how salty many in the LLM community are over DeepSeek.

DeepSeek had more or less been ignored for a very long time before this.


> It is very interesting how salty many in the LLM community are over DeepSeek

You think Ollama is purposefully using misleading naming because they're mad about DeepSeek? What benefit would there be for Ollama to be misleading in this way?


The quote would imply some crankiness. But yeah, it could just be general nerd crankiness too, of course. Maybe I shouldn't imply or speculate too much about the reason in this specific case.

There is no benefit I think.


It's also not helping the confusion that the distills themselves were made and released by DeepSeek.

If you want an actual "lighter version" of the model done the usual way, i.e. via third-party quants, there's a bunch of "dynamic quants" of the bona fide (non-distilled) R1 here: https://unsloth.ai/blog/deepseekr1-dynamic. The smallest of them can just barely run on a beefy desktop, at less than 1 token per second.
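
A minimal sketch of fetching one of those dynamic quants, assuming the `huggingface_hub` client; the repo id and file pattern follow the linked Unsloth post and should be verified there:

    from huggingface_hub import snapshot_download

    # Repo id and pattern are taken from the linked Unsloth post; check them
    # there before downloading, since even the smallest variant is very large.
    snapshot_download(
        repo_id="unsloth/DeepSeek-R1-GGUF",
        allow_patterns=["*UD-IQ1_S*"],   # smallest dynamic quant variant
        local_dir="DeepSeek-R1-GGUF",
    )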


Also, Ollama is traditionally very sloppy with the chat templates they use, which does impact model performance.


> Ollama is traditionally very sloppy with the chat templates they use

Not that I don't believe you (I do, and I think I've seen them correct this before too), but do you happen to have specific examples of when this happened?


https://github.com/ollama/ollama/issues/1977

More recently, DeepSeek V2's template had a space after the assistant turn, causing issues with output quality and language: https://www.reddit.com/r/LocalLLaMA/comments/1dko6rp/if_your...
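
A minimal sketch of how to render the reference template for comparison, assuming the Hugging Face `transformers` tokenizer; the repo id is only an example, and the template itself ships in the model's `tokenizer_config.json`:

    from transformers import AutoTokenizer

    # Example repo id; any model that ships a chat_template works the same way.
    tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

    messages = [{"role": "user", "content": "Hello"}]
    prompt = tok.apply_chat_template(
        messages,
        tokenize=False,             # return the rendered string, not token ids
        add_generation_prompt=True, # append the assistant turn header
    )

    # Compare this string, whitespace included, with what your runtime actually
    # sends to the model; a stray space after the assistant tag is exactly the
    # kind of mismatch described above.
    print(repr(prompt))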


I feel this particularly when I use GGUF support.

How do you get accurate information on the template structure?



