Hacker News
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (arxiv.org)
43 points by titaniumtown on Dec 13, 2023 | 12 comments


> affordable commodity hardware, like a single server with 4x NVIDIA A6000 or 8x NVIDIA 3090 GPUs

I need to seriously revise my definition of affordable commodity hardware


You can rent such a system for less than $4/hr. That sounds pretty affordable to me!


That's nearly minimum wage in most western countries, or a really nice living in some countries.


Speaking of the US, it's roughly the price of a hamburger.

If you can't afford a hamburger, your problems are likely not in compressing trillion-parameter models.


Same if you're going to afford a hamburger every hour.


Right, that's about $3k/month.
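
For anyone who wants to check that figure, here is a quick sketch of the arithmetic. The $4/hr rate is the one quoted upthread; the 730 hours per month (365 * 24 / 12) and the function name `monthly_cost` are my own assumptions for illustration, and real rental pricing will vary.

```python
# Back-of-the-envelope rental cost for running the machine around the clock.
# HOURLY_RATE_USD comes from the comment upthread; HOURS_PER_MONTH is an
# assumed average month (365 * 24 / 12 = 730 hours).
HOURLY_RATE_USD = 4.0
HOURS_PER_MONTH = 365 * 24 / 12


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Return the cost of renting continuously for one average month."""
    return hourly_rate * hours


print(f"${monthly_cost(HOURLY_RATE_USD):,.0f}/month")  # prints $2,920/month
```

So "about $3k/month" holds if the box runs 24/7; anything less than full-time use scales down linearly.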


Running humongous models for the price of a small car? Yes, it's absolutely affordable. It's peanuts for all except the smallest, self-bootstrapped startups. Amortized, it's way less than the cost of the data scientists and developers who can actually make full use of the cards.


NVIDIA's price gouging has distorted people's idea of "affordable"


> Concretely, QMoE can compress the 1.6 trillion parameter SwitchTransformer-c2048 model to less than 160GB (20x compression, 0.8 bits per parameter) at only minor accuracy loss, in less than a day on a single GPU.

I'm not in the field. Can someone explain how the sub-1-bit part works--are they also reducing the number of parameters as part of the compression?
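
The numbers in the quoted abstract can be sanity-checked with a few lines. This is only a sketch: the parameter count and the 0.8 bits per parameter come from the quote, while the 16-bit uncompressed baseline is an assumption I am making to reproduce the 20x figure, and `size_gb` is a hypothetical helper.

```python
# Sanity check of the abstract's size figures (rounded values, so expect
# approximate agreement only).
PARAMS = 1.6e12        # parameter count quoted in the abstract
BITS_PER_PARAM = 0.8   # compressed rate quoted in the abstract
BASELINE_BITS = 16     # assumption: uncompressed weights are 16-bit floats


def size_gb(params: float, bits_per_param: float) -> float:
    """Storage in gigabytes (1 GB = 1e9 bytes) at a given average bit rate."""
    return params * bits_per_param / 8 / 1e9


compressed = size_gb(PARAMS, BITS_PER_PARAM)
baseline = size_gb(PARAMS, BASELINE_BITS)
print(f"compressed: {compressed:.0f} GB")
print(f"baseline:   {baseline:.0f} GB")
print(f"ratio:      {baseline / compressed:.0f}x")
```

Under that assumption the ratio works out without dropping any parameters: the count stays at 1.6 trillion and only the average number of bits stored per parameter goes down.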


It takes a 2-bit or 1.5-bit quantized model, groups parameters together, then exploits the low entropy of those parameters to compress them further, a bit like text compression. It only got below 1 bit per parameter for the ultra-large model; I guess the weights of the smaller ones were closer to random and so less compressible.

It’ll be interesting to see if it works on the new Mistral MoE model, which is less sparse and probably trained more per parameter than these.
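
To make the "lack of entropy" point concrete, here is a small sketch of the Shannon entropy of a three-valued (ternary) weight. The two probability distributions are made up for illustration and are not taken from the paper; `entropy_bits` is a hypothetical helper.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits per symbol for a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical distributions over the ternary values {-1, 0, +1}.
uniform = [1 / 3, 1 / 3, 1 / 3]   # every value equally likely
mostly_zero = [0.05, 0.90, 0.05]  # 90% of weights quantize to zero

print(f"uniform:     {entropy_bits(uniform):.3f} bits/weight")      # 1.585
print(f"mostly zero: {entropy_bits(mostly_zero):.3f} bits/weight")  # 0.569
```

Entropy is a lower bound on what any lossless coder can reach on average, so a ternary model only has room to go below 1 bit per weight when its values are heavily skewed, for example toward zero.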


Sparse means there are a lot of zeros.

Think of it like a bog-standard compression algorithm.
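
A rough way to see this with an off-the-shelf compressor: the sketch below uses zlib as a stand-in (it is not the codec from the paper) on synthetic ternary weights stored one per byte. The 90% zero rate, the seed, and the helper name `bits_per_weight` are all assumptions chosen for illustration.

```python
import random
import zlib


def bits_per_weight(weights):
    """Compress ternary weights with zlib and report average bits per weight."""
    raw = bytes(w + 1 for w in weights)  # map {-1, 0, 1} to bytes {0, 1, 2}
    return 8 * len(zlib.compress(raw, 9)) / len(weights)


rng = random.Random(0)  # fixed seed so runs are repeatable
n = 100_000

# Synthetic data: equally likely values vs. mostly zeros.
dense = rng.choices([-1, 0, 1], weights=[1, 1, 1], k=n)
sparse = rng.choices([-1, 0, 1], weights=[5, 90, 5], k=n)

print(f"dense:  {bits_per_weight(dense):.2f} bits/weight")
print(f"sparse: {bits_per_weight(sparse):.2f} bits/weight")
```

The mostly-zero array should come out noticeably smaller per weight than the dense one, which is the same effect the sub-1-bit result relies on, just with a general-purpose compressor instead of a format built for fast decoding on GPUs.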


Nice!



