Are you getting anything besides gibberish out of it? I tried their recommended command line and it's dog slow, even though I built their llama.cpp fork with AVX2 enabled. This is what I get:
$ ./build/bin/llama-cli -hf prism-ml/Bonsai-8B-gguf -p "Explain quantum computing in simple terms." -n 256 --temp 0.5 --top-p 0.85 --top-k 20 -ngl 99
> Explain quantum computing in simple terms.
\( ,
None ( no for the. (,./. all.2... the ..... by/
EDIT: It runs fine in their Colab notebook. Looking at that, you have to run git checkout prism in the llama.cpp repo before you build. That instruction is missing if you go straight to their fork of llama.cpp. Works fine now.
UPDATE: I was using the llama.cpp CPU backend and was still getting gibberish; on Google Colab they're running with CUDA. I turned Claude loose on it and it found a bug in the llama.cpp CPU backend where a float was being converted to an int and basically truncated to 0. Now it runs fine locally with the CPU backend.
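For the curious, the failure mode is easy to reproduce in isolation. This is a minimal sketch of the bug class, not the actual code from the fork (the function names and the 0.0038 scale are made up): a per-block scale below 1.0 gets assigned into an int, truncates to 0, and every dequantized weight in the block comes out as 0, so the model emits noise.

// Hypothetical sketch of the bug class, not the fork's actual code.
// A sub-1.0 per-block scale truncates to 0 when converted to int,
// zeroing out every dequantized weight in the block.
#include <stdio.h>

float dequant_buggy(signed char q, float scale) {
    int s = scale;     // BUG: float -> int truncation, 0.0038 -> 0
    return q * s;      // always 0 whenever |scale| < 1
}

float dequant_fixed(signed char q, float scale) {
    return q * scale;  // keep the scale as a float
}

int main(void) {
    printf("buggy: %f\n", dequant_buggy(3, 0.0038f));  // 0.000000
    printf("fixed: %f\n", dequant_fixed(3, 0.0038f));  // 0.011400
    return 0;
}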
Then I found out they didn't implement AVX2 for their Q1_0_g128 CPU kernel. I added that and I'm getting ~12 t/s, which isn't shabby for this old machine.
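For anyone wanting to do the same, here's roughly the shape such a kernel takes. This is my own sketch under assumed details, not the fork's actual Q1_0_g128 code: I'm guessing each 128-weight group is 16 bytes of packed sign bits (bit set = +1, clear = -1) plus one fp32 scale, with int8 activations; the real block layout almost certainly differs. Compile with -mavx2.

// Rough AVX2 sketch of a 1-bit, group-of-128 dot product.
// Layout and names are assumptions, not the fork's actual code.
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

// Expand 32 packed sign bits into 32 bytes of +1 / -1 (LSB-first).
static inline __m256i signs_from_bits(uint32_t bits) {
    const __m256i shuf = _mm256_setr_epi8(
        0,0,0,0,0,0,0,0, 1,1,1,1,1,1,1,1,
        2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3);
    const __m256i sel = _mm256_set1_epi64x(0x8040201008040201ULL);
    __m256i b = _mm256_shuffle_epi8(_mm256_set1_epi32((int)bits), shuf);
    __m256i m = _mm256_cmpeq_epi8(_mm256_and_si256(b, sel), sel);
    // m is 0xFF where the bit is set; map to +1 / -1
    return _mm256_sub_epi8(_mm256_and_si256(m, _mm256_set1_epi8(2)),
                           _mm256_set1_epi8(1));
}

// dot(weights, y) for n weights, n a multiple of 128.
// Assumes activations stay in [-127, 127] so sign_epi8 can't overflow.
float dot_q1_g128_avx2(const uint8_t *packed, const float *scales,
                       const int8_t *y, int n) {
    const __m256i ones8  = _mm256_set1_epi8(1);
    const __m256i ones16 = _mm256_set1_epi16(1);
    float total = 0.0f;
    for (int g = 0; g < n / 128; g++) {
        __m256i acc = _mm256_setzero_si256();
        for (int i = 0; i < 4; i++) {          // 4 x 32 = 128 weights
            uint32_t bits;
            memcpy(&bits, packed + g * 16 + i * 4, 4);
            __m256i w  = signs_from_bits(bits);
            __m256i yv = _mm256_loadu_si256(
                (const __m256i *)(y + g * 128 + i * 32));
            // p = +/- y per byte, then widen and accumulate to int32
            __m256i p   = _mm256_sign_epi8(yv, w);
            __m256i s16 = _mm256_maddubs_epi16(ones8, p);
            acc = _mm256_add_epi32(acc, _mm256_madd_epi16(s16, ones16));
        }
        // horizontal sum of the 8 int32 lanes, then apply group scale
        __m128i lo = _mm256_castsi256_si128(acc);
        __m128i hi = _mm256_extracti128_si256(acc, 1);
        __m128i s  = _mm_add_epi32(lo, hi);
        s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0x4E));
        s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0xB1));
        total += scales[g] * (float)_mm_cvtsi128_si32(s);
    }
    return total;
}

The sign_epi8 + maddubs_epi16 pairing is the standard AVX2 idiom llama.cpp uses elsewhere for signed int8 dot products, so a new kernel along these lines fits the codebase's existing style.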
Cool model.