Hacker News
lordofgibbons | 42 days ago | on: Cerebras Code now supports GLM 4.6 at 1000 tokens/...
At what quantization? And if it is in fact quantized below fp8, how is performance impacted across the various benchmarks?
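
(For context on why the question matters: quantization trades precision for speed and memory, and fewer bits mean larger reconstruction error in the weights, which is what can show up as benchmark degradation. A minimal illustrative sketch below uses symmetric integer quantization rather than fp8's floating-point format, and all names and values in it are made up for illustration; it is not Cerebras' or anyone's actual inference pipeline.)

```python
import numpy as np

def quantize_dequantize(w, bits):
    # Symmetric per-tensor quantization: scale weights into a signed
    # integer grid with `bits` of precision, round, then map back to float.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=10_000).astype(np.float32)  # toy weight tensor

for bits in (8, 4):
    w_hat = quantize_dequantize(w, bits)
    rms_err = np.sqrt(np.mean((w - w_hat) ** 2))
    print(f"int{bits}: RMS quantization error = {rms_err:.6f}")
```

Running this shows the 4-bit reconstruction error is roughly an order of magnitude larger than the 8-bit one, which is the intuition behind asking whether sub-fp8 quantization hurts benchmark scores.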
antonvs | 42 days ago
They claim they don't use quantization.
The reason for their speed is this chip:
https://www.cerebras.ai/chip