For people wanting to run it locally: the 7b model fits (just) into a 24GB VRAM GPU (e.g. a 3090/4090). The 3b model is much more reasonable to run, but I'd say its output is... of limited quality, based on the few tests I've run so far.
Here's a gist that mostly just takes the notebook Stability AI have in the GitHub repo and turns it into a script you can run locally after installing a few dependencies from pip:
I suspect the community will start creating lower-precision/quantized versions of the model very quickly. LLaMA 30b quantized to 4 bits is runnable on a 3090/4090.
It's very slow, and for the 7b model you're still looking at a pretty hefty RAM hit whether it's on CPU or GPU. The model download alone is something like 40GB.
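For a rough sense of why those numbers work out the way they do, here's a back-of-envelope sketch of weight memory at different precisions. This counts only the weights themselves (it ignores activations, KV cache, and framework overhead, which is why a 14GB fp16 model only "just" fits in 24GB):

```python
def model_memory_gb(params_billions, bits_per_param):
    """Approximate memory for the weights alone, in decimal GB."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

# 7b model at fp16: ~14 GB of weights -> just fits in 24 GB VRAM
print(model_memory_gb(7, 16))   # 14.0

# 30b model quantized to 4 bits: ~15 GB -> also fits on a 3090/4090
print(model_memory_gb(30, 4))   # 15.0

# 7b model at fp32 (what a naive download/load can cost): ~28 GB
print(model_memory_gb(7, 32))   # 28.0
```

The fp32 figure also lines up with the ~40GB download: checkpoints often ship full-precision weights plus optimizer/extra tensors, so the on-disk size can exceed what you actually need in memory after casting down to fp16.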