Hacker News

No new model? Maybe after the Qwen 3 release today they decided to hold back on Llama 4 Thinking until it benchmarks more competitively.


Beyond solid benchmarks, Alibaba's power move was dropping a bunch of models available to use and run locally today. That's disruptive already, and the slew of fine-tunes to come will be good for all users and builders.

https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2...


> Beyond solid benchmarks, Alibaba's power move was dropping a bunch of models available to use and run locally today.

I agree, the advantage of the Qwen3 family is the plethora of sizes and architectures to choose from. Another is the ease of fine-tuning for downstream tasks.

On the other hand, I'd say it's "in spite" of their benchmarks, because there's obviously something wrong with either the published results or the way they were measured. Early impressions do not support those benchmarks at all. At one point they even had a 4B model beating their previous-gen 72B model, which was pretty solid on its own. Take benchmarks with a huge boulder of salt.

Something is messing with recent benchmarks, and I don't know exactly what, but I have a feeling that distilling + RL + something in these pipelines is making benchmark data creep into the models, either by reward hacking or by other signals getting leaked (i.e. previous-gen models optimized for one benchmark are "distilling" those signals into newer, smaller models). No, a 4B model is absolutely not going to be better than 4o/Sonnet 3.7, whatever the benchmarks say.


What's the minimum GPU/NPU hardware and memory to run Qwen3 locally?


There is a 0.6B model, so basically nothing.

And the MoE 30B one has a decent shot at running OK without a GPU. I'm on a 5800X3D, so two generations old, and it's still very usable.


I'm running 4B on my 8GB AMD 7600 via ollama


`model.safetensors` for Qwen3-0.6B is a single 1.5GB file.

Qwen3-235B-A22B has 118 `.safetensors` files at 4GB each.

There are a bunch of models and quants between those.
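Those shard sizes line up with plain BF16 storage, i.e. roughly 2 bytes per parameter (embeddings and metadata add a little on top). A quick back-of-envelope sketch, with all sizes approximate:

```python
# Sanity-check the published safetensors sizes against BF16 storage:
# BF16 uses 2 bytes per parameter, so size ~ param count x 2.

BYTES_PER_PARAM_BF16 = 2

def bf16_size_gb(params_billions: float) -> float:
    """Approximate on-disk size in GB for a BF16 checkpoint."""
    return params_billions * BYTES_PER_PARAM_BF16

# Qwen3-235B-A22B: 118 shards at ~4 GB each, as reported above.
shard_total_gb = 118 * 4            # 472 GB from the file listing
estimate_gb = bf16_size_gb(235)     # 470 GB from the parameter count

print(shard_total_gb, estimate_gb)  # 472 470
```

The two numbers agreeing within a few GB is a decent hint that the released weights are unquantized BF16; quantized GGUF builds would come in at a fraction of that.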


Does it run in 8x80G? Or does the KV cache and other buffers push it over the edge?
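A back-of-envelope check, assuming BF16 weights and ignoring framework overhead (these are rough estimates, not measurements):

```python
# Does a 235B-param BF16 model fit in 8x80 GB? Rough arithmetic only:
# real deployments also need activation buffers, CUDA context, and
# per-framework overhead on top of weights + KV cache.

gpus = 8
gb_per_gpu = 80
total_gb = gpus * gb_per_gpu         # 640 GB of HBM across the node

weights_gb = 235 * 2                 # ~470 GB at 2 bytes/param (BF16)
headroom_gb = total_gb - weights_gb  # ~170 GB left for KV cache etc.

print(total_gb, weights_gb, headroom_gb)  # 640 470 170
```

So the weights alone fit with ~170 GB to spare; whether that headroom is enough depends on batch size and context length, since the KV cache grows linearly in both.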


Qwen3 is a family of models. The very smallest are only a few GB and will run comfortably on virtually any computer from the last 10 years, or a recent-ish smartphone. The largest - well, that depends how fast you want it to run.


There are models down to 0.6B and you can even run Qwen3 30B-A3B reasonably fast on CPU only.
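Decode speed on CPU is roughly bounded by memory bandwidth divided by the bytes streamed per token, which is why the MoE design helps: only the ~3B active parameters are touched per step, not all 30B. A rough sketch where the bandwidth and quantization figures are illustrative assumptions, not benchmarks:

```python
# Why 30B-A3B is usable on CPU: per-token work scales with ACTIVE
# parameters. All numbers below are illustrative assumptions.

def rough_tokens_per_sec(active_params_b: float,
                         bytes_per_param: float,
                         mem_bandwidth_gbs: float) -> float:
    """Upper bound on decode speed: bandwidth / bytes streamed per token."""
    gb_per_token = active_params_b * bytes_per_param
    return mem_bandwidth_gbs / gb_per_token

# Assume ~3B active params, 4-bit quant (~0.5 bytes/param),
# and ~50 GB/s of dual-channel DDR4 bandwidth.
moe_ceiling = rough_tokens_per_sec(3, 0.5, 50)     # ~33 tok/s
dense_ceiling = rough_tokens_per_sec(30, 0.5, 50)  # ~3.3 tok/s for dense 30B

print(round(moe_ceiling, 1), round(dense_ceiling, 1))  # 33.3 3.3
```

These are ceilings, not predictions; real throughput will be lower, but the order-of-magnitude gap between the MoE and an equally sized dense model is the point.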


They released the Llama 4 suite three weeks ago.



