Hacker News

At 4-bit quantization the weights only take half the RAM. You need a good chunk for context as well, but in my limited testing Qwen3-30B ran well on a single RTX 3090 (24 GB VRAM).
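The back-of-the-envelope math can be sketched as below. The ~4.5 bits/weight figure is an assumption (4-bit quants carry some overhead for scales/zero-points); actual usage depends on the quantization scheme and runtime.

```python
# Rough VRAM estimate for quantized model weights.
# All figures are approximations, not measured values.

def weight_vram_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB: params * bits / 8 bytes per param."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# Assumed effective bit widths (including quantization metadata overhead).
q4 = weight_vram_gib(30, 4.5)  # ~15.7 GiB
q8 = weight_vram_gib(30, 8.0)  # ~27.9 GiB

print(f"4-bit: {q4:.1f} GiB, 8-bit: {q8:.1f} GiB")
```

At 4 bits, a 30B model's weights fit in ~16 GiB, leaving ~8 GiB of a 3090's 24 GiB for KV cache and activations; at 8 bits the weights alone already exceed the card.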

