Hacker News

For people wanting to run it locally: you can (just) fit the 7b model into a GPU with 24GB of VRAM (e.g. 3090/4090). The 3b model is much more reasonable to run, but I'd say its output is of limited quality based on the few tests I've run so far.
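For a rough sense of why the 7b model only "just" fits: a back-of-the-envelope estimate of the weight footprint alone (a sketch; actual usage is higher because of activations, the KV cache, and framework overhead, and the exact dtype the script loads in may differ):

```python
# Rough estimate of GPU/CPU memory needed just to hold the weights.
# Illustrative only -- real inference adds activations and overhead.
def weight_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """GiB of memory for the raw weight tensors."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

print(weight_gb(7, 2))  # 7b in fp16: ~13 GiB, fits a 24 GB card
print(weight_gb(7, 4))  # 7b in fp32: ~26 GiB, would not fit
print(weight_gb(3, 2))  # 3b in fp16: much more comfortable
```

This also matches the intuition that the 3b model is far easier to run on commodity hardware.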


Here's a gist that mostly just takes the notebook Stability AI have in their GitHub repo and turns it into a script you can run locally after installing a few dependencies from pip:

https://gist.github.com/cmsj/2d6b247ad4fc8f15011105feeda763e...


I suspect the community will start creating lower-precision/quantized versions of the model very quickly. LLaMA 30b quantized to 4 bits is runnable on a 3090/4090.


You don't need a GPU to run the model; you can use your CPU and system RAM, but it will be slow.


It's very slow, and for the 7b model you're still looking at a pretty hefty memory hit whether it runs on CPU or GPU. The model download is something like 40GB.


There's already support in llama.cpp. It runs faster than ChatGPT on my old laptop CPU.


7B quantized down to 4 bits will run on a 2060.




