I don't think that's the case; for full speed you still need all the experts resident, which works out to roughly (5B*8)/2 + 2 plus a few B of overhead.
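In case the arithmetic is unclear, here it is written out as a quick Python sketch. It reads the "/2" as roughly half a byte per parameter (i.e. ~4-bit quantized weights), and uses the same 5B-per-expert / 2B-shared figures mentioned later in this thread, so treat the numbers as illustrative rather than measured.

```python
# Rough RAM estimate if every expert has to stay resident (illustrative numbers).
# Assumes ~5B parameters per expert, 8 experts, ~2B shared parameters,
# and ~0.5 bytes per parameter (roughly 4-bit quantized weights).
num_experts = 8
params_per_expert_b = 5.0   # billions of parameters per expert
shared_params_b = 2.0       # billions of shared parameters (attention, router, embeddings)
bytes_per_param = 0.5       # ~4-bit quantization

weights_gb = (num_experts * params_per_expert_b + shared_params_b) * bytes_per_param
print(f"weights alone: ~{weights_gb:.0f} GB")  # ~21 GB
print("plus a few GB for KV cache and runtime overhead")
```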
I think the experts are chosen per-token? That means that yes, technically you only need two experts (plus the router/overhead) in VRAM per token, but you'll constantly be loading in different experts unless you can fit them all, which would still be terrible for performance.
So you'll still be limited by PCIe/RAM speed unless you can fit all of the experts into memory (or get really lucky and only ever need the same two experts).
No, it doesn't work that way. Experts can change per token, so for interactive speeds you need all of them in memory, unless you want to wait for model swaps between tokens.
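To make the "experts change per token" point concrete, here's a minimal top-2 routing sketch. Everything in it (the gating math, `num_experts`, the random weights) is illustrative and not taken from any particular model's code; the point is just that each token's router scores can pick a different pair of experts, so all of them need to be loaded.

```python
import numpy as np

# Minimal sketch of per-token top-2 routing in an MoE layer (illustrative only).
rng = np.random.default_rng(0)
num_experts, hidden = 8, 16
router_w = rng.normal(size=(hidden, num_experts))

def route(token_hidden_state):
    logits = token_hidden_state @ router_w   # router scores for this one token
    top2 = np.argsort(logits)[-2:]           # the 2 experts this token is sent to
    return sorted(top2.tolist())

# Different tokens routinely pick different expert pairs, so over a whole
# sequence every expert tends to get touched -- they all have to be in memory.
tokens = rng.normal(size=(5, hidden))
for i, tok in enumerate(tokens):
    print(f"token {i}: experts {route(tok)}")
```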
2B for the attention layers and 5B from each of the 2 active experts.
It should be able to run slightly faster than a 13B dense model, in as little as 16 GB of RAM with room to spare.
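For what it's worth, here's that per-token estimate written out, using the same 2B + 2×5B figures as above (purely illustrative). Note it counts the parameters touched per token, which is what governs speed; the memory footprint question the other replies raise is separate.

```python
# Parameters actually used per token vs. a dense model (illustrative numbers).
shared_params_b = 2.0      # billions: attention/router weights used by every token
active_experts = 2         # experts the router picks per token
params_per_expert_b = 5.0  # billions per expert

active_b = shared_params_b + active_experts * params_per_expert_b
print(f"~{active_b:.0f}B parameters touched per token (vs. 13B for a dense model)")
# -> ~12B, so per-token compute is in the same ballpark as a 13B dense model,
#    but the weights for all 8 experts still need to live in memory somewhere.
```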