Hacker News

For people wanting to run it locally: you can (just) fit the 7b model into a GPU with 24GB of VRAM (e.g. 3090/4090). The 3b model is much more reasonable to run, but I'd say its output is of limited quality based on the few tests I've run so far.
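For a rough sense of why the 7b model only "just" fits: a back-of-the-envelope estimate of the weight footprint alone (a sketch; actual usage is higher because of activations, the KV cache, and framework overhead, and the exact dtype the script loads in may differ):

```python
# Rough estimate of GPU/CPU memory needed just to hold the weights.
# Illustrative only -- real inference adds activations and overhead.
def weight_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """GiB of memory for the raw weight tensors."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

print(weight_gb(7, 2))  # 7b in fp16: ~13 GiB, fits a 24 GB card
print(weight_gb(7, 4))  # 7b in fp32: ~26 GiB, would not fit
print(weight_gb(3, 2))  # 3b in fp16: much more comfortable
```

This also matches the intuition that the 3b model is far easier to run on commodity hardware.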


Here's a gist that mostly just takes the notebook Stability AI have in their GitHub repo and turns it into a script you can run locally after installing a few dependencies from pip:

https://gist.github.com/cmsj/2d6b247ad4fc8f15011105feeda763e...


I suspect the community will start creating lower-precision/quantized versions of the model very quickly. LLaMA 30b quantized to 4 bits is runnable on a 3090/4090.


You don't need a GPU to run the model; you can use your CPU and system RAM, but it will be slow.


It's very slow, and for the 7b model you're still looking at a pretty hefty memory hit whether it runs on CPU or GPU. The model download is something like 40GB.


There's already support in llama.cpp. It runs faster than ChatGPT on my old laptop CPU.


7B quantized down to 4 bits will run on a 2060.




