Running humongous models for the price of a small car? Yes, it's absolutely affordable. It's peanuts for everyone except the smallest, self-bootstrapped startups. Amortized, it's way less than the expense of the data scientists and developers who can actually make full use of the cards.
> Concretely, QMoE can compress the 1.6 trillion parameter SwitchTransformer-c2048 model to less than 160GB (20x compression, 0.8 bits per parameter) at only minor accuracy loss, in less than a day on a single GPU.
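For scale, a quick back-of-the-envelope check of those figures against a 16-bit baseline (my own arithmetic, not from the paper):

```python
params = 1.6e12                           # SwitchTransformer-c2048 parameter count

compressed_gb = params * 0.8 / 8 / 1e9    # 0.8 bits per parameter -> bytes -> GB
baseline_tb = params * 16 / 8 / 1e12      # 16-bit (bf16/fp16) weights -> bytes -> TB

print(f"compressed: ~{compressed_gb:.0f} GB")      # ~160 GB
print(f"16-bit baseline: ~{baseline_tb:.1f} TB")   # ~3.2 TB
print(f"ratio: ~{16 / 0.8:.0f}x")                  # ~20x
```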
I'm not in the field. Can someone explain how the sub-1-bit part works--are they also reducing the number of parameters as part of the compression?
It takes a 2-bit/1.5-bit quantized model, groups parameters together, and then exploits the low entropy of the quantized values to compress them further, a bit like text compression. It only went below 1 bit for the ultra-large model; I guess the smaller ones' weights weren't quite as redundant.
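A toy sketch of that idea (not the paper's actual dictionary coder; the ternary values, the 90%-zeros distribution, and the use of zlib are all illustrative assumptions):

```python
import numpy as np
import zlib

rng = np.random.default_rng(0)

# Stand-in for one expert's weights after ternary (~1.6-bit) quantization:
# assume most quantized values come out zero, so the symbol stream has low entropy.
n = 1_000_000
weights = rng.choice([-1, 0, 1], size=n, p=[0.05, 0.90, 0.05]).astype(np.int8)

# Shannon entropy of that distribution = the theoretical floor in bits per parameter.
p = np.array([0.05, 0.90, 0.05])
entropy = -(p * np.log2(p)).sum()

# A generic byte-level compressor stands in for QMoE's dictionary-based coding;
# it exploits the same redundancy, just less efficiently.
compressed = zlib.compress(weights.tobytes(), level=9)
bits_per_param = len(compressed) * 8 / n

print(f"entropy floor: {entropy:.2f} bits/param")         # ~0.57 for this distribution
print(f"zlib achieves: {bits_per_param:.2f} bits/param")  # well under naive 2-bit packing
```

The point is just that once the quantized values are this skewed, an entropy or dictionary coder can push storage below the nominal bit width without dropping any parameters.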
It'll be interesting to see if it works on the new Mistral MoE model, which is less sparse and probably trained on more tokens per parameter than these.
I need to seriously revise my definition of affordable commodity hardware