read_if_gay_ on Dec 8, 2023 | on: Mistral "Mixtral" 8x7B 32k model [magnet]
Yes, I read that. Do you think it's reasonable to assume the same expert will be selected so consistently that model-swapping time won't dominate total runtime?
tarruda on Dec 8, 2023
No idea TBH, we'll have to wait and see. Some say it might be possible to efficiently swap the expert weights if you can fit everything in RAM:
https://x.com/brandnarb/status/1733163321036075368?s=20
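For a rough feel of how much routing locality matters for swap cost, here is a toy sketch, not a measurement. It assumes a single layer with 8 experts and 2 picked per token, plus a small LRU cache of "resident" experts; `miss_rate` and `uniform_routes` are made-up helper names, and uniform random routing is only a pessimistic stand-in for whatever the real router does.

```python
from collections import OrderedDict
import random


def miss_rate(routes, cache_slots):
    """Fraction of expert lookups that would trigger a weight load.

    routes: iterable of per-token expert id lists (hypothetical input format).
    cache_slots: how many experts fit in fast memory at once (assumed LRU).
    """
    cache = OrderedDict()  # expert id -> True, ordered oldest to newest use
    misses = 0
    lookups = 0
    for experts in routes:
        for e in experts:
            lookups += 1
            if e in cache:
                cache.move_to_end(e)  # mark as most recently used
            else:
                misses += 1  # this expert would have to be swapped in
                cache[e] = True
                if len(cache) > cache_slots:
                    cache.popitem(last=False)  # evict least recently used
    return misses / lookups if lookups else 0.0


def uniform_routes(n_tokens, n_experts=8, top_k=2, seed=0):
    """Assumption: every token picks top_k distinct experts uniformly at random."""
    rng = random.Random(seed)
    return [rng.sample(range(n_experts), top_k) for _ in range(n_tokens)]


if __name__ == "__main__":
    routes = uniform_routes(10_000)
    for slots in (2, 4, 6, 8):
        print(slots, "slots ->", round(miss_rate(routes, slots), 3))
```

If every token reuses the same two experts, only the first two lookups miss; if routing keeps alternating between disjoint pairs with only two slots, every lookup misses. Where the real model lands between those extremes is exactly the open question in this thread.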