> it doesn’t offer any advantages over 3- or 4-bit quantization.
"zero-shot accuracy retention at 4- and 3-bit compression to be on par with or better than state-of-the-art methods, while maintaining performance comparable to FP16 baselines."
My reading of that is FP16 accuracy at Q3 or Q4 size and memory bandwidth, which is a huge advantage.
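To make the size advantage concrete, here's a back-of-envelope sketch of weight memory at different bit widths. This assumes a dense model where the weights dominate the footprint, and it ignores KV cache, activations, and any per-group quantization metadata (scales, zero-points), so real numbers will be somewhat higher:

```python
# Rough weight-memory estimate for a dense LLM at different precisions.
# Ignores KV cache, activations, and quantization metadata overhead.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory needed for n_params weights at the given bit width, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

for label, bits in [("FP16", 16), ("Q4", 4), ("Q3", 3)]:
    print(f"8B model at {label}: {weight_memory_gb(8e9, bits):.1f} GB")
# FP16: 16.0 GB, Q4: 4.0 GB, Q3: 3.0 GB
```

Since decode-time inference is typically memory-bandwidth bound, a roughly 4-5x smaller weight footprint translates fairly directly into faster token generation as well as fitting on smaller GPUs.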
I don't see any comparable numbers on the page you linked; it only seems to have numbers for 1B and 3B parameter models. The comparisons to AWQ and OmniQuant in Table 3 look quite favorable, though, with SeedLM showing 10%-50% better performance.
Also seems like the techniques may be possible to combine.
As a rule of thumb, the bigger the model, the more gracefully it degrades under quantisation. So you may assume the performance loss for an 8B model would be lower than for a 3B model. (I know that doesn't make up for the missing numbers in the link, just FYI.)
"zero-shot accuracy retention at 4- and 3-bit compression to be on par with or better than state-of-the-art methods, while maintaining performance comparable to FP16 baselines."
My reading of that says FP16 accuracy at Q3 or Q4 size / memory bandwidth. Which is a huge advantage.