
read_if_gay_ on Dec 8, 2023 | on: Mistral "Mixtral" 8x7B 32k model [magnet]

yes I read that. do you think it's reasonable to assume that the same expert will be selected so consistently that model swapping times won't dominate total runtime?

tarruda on Dec 8, 2023

No idea TBH, we'll have to wait and see. Some say it might be possible to efficiently swap the expert weights if you can fit everything in RAM: https://x.com/brandnarb/status/1733163321036075368?s=20
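
For a rough feel of the question, here is a toy simulation, not a measurement of the real model. It assumes 8 experts with 2 picked per token (the top-2 routing is an assumption here), a single layer, and an LRU cache of expert slots in fast memory. The `stickiness` knob is hypothetical: it is the probability that a token reuses the previous token's experts, which is exactly the unknown being asked about. All names are made up for illustration.

```python
import random
from collections import OrderedDict


def simulate_miss_rate(num_tokens, num_experts=8, top_k=2,
                       cache_slots=2, stickiness=0.0, seed=0):
    """Return the fraction of expert activations that miss an LRU cache.

    Each miss stands for one expert swap-in. Routing is random, not learned:
    with probability `stickiness` a token reuses the previous token's
    experts, otherwise it picks `top_k` experts uniformly at random.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # expert id -> True, ordered oldest to newest
    previous = rng.sample(range(num_experts), top_k)
    misses = 0
    total = 0
    for _ in range(num_tokens):
        if rng.random() < stickiness:
            chosen = previous
        else:
            chosen = rng.sample(range(num_experts), top_k)
        for expert in chosen:
            total += 1
            if expert in cache:
                cache.move_to_end(expert)  # mark as most recently used
            else:
                misses += 1
                cache[expert] = True
                if len(cache) > cache_slots:
                    cache.popitem(last=False)  # evict least recently used
        previous = chosen
    return misses / total


if __name__ == "__main__":
    for s in (0.0, 0.5, 0.9):
        rate = simulate_miss_rate(10_000, cache_slots=2, stickiness=s)
        print(f"stickiness={s}: miss rate {rate:.3f}")
```

If every expert fits in the cache (`cache_slots=8`), only the initial loads miss, which is the "fit everything in RAM" case. With only 2 slots and no stickiness, most activations trigger a swap, so whether swapping dominates comes down to how sticky real routing turns out to be.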
