> I can run it on my Macbook Air at 12tkps, can't wait to try this on my desktop...

MyFirstSass · on Dec 8, 2023

Thanks for the tip. I'm on the M2 Air with 16 GB of RAM.

If anyone gets faster than 12 tkps on an Air, let me know.

I'm using the LM Studio GUI over llama.cpp with the "Apple Metal GPU" option. Increasing the number of CPU threads doesn't seem to help either when Metal is off.

RAM usage hovers around 5.5 GB with a q5_k_m quantization of Mistral.

M4v3R · on Dec 8, 2023

Try different quantization variants. I got vastly different speeds depending on which one I chose. I believe q4_0 worked very well for me, although for a 7B model q8_0 runs just fine too, with better quality.
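
For a rough feel of how much the choice changes the weight footprint, here's a back-of-the-envelope sketch. It assumes the llama.cpp block layouts for q4_0 and q8_0 (32 weights plus one 16-bit scale per block) and a round 7e9 parameter count, so treat the numbers as estimates only. The helper name `estimate_weights_gb` is made up for this example, and it ignores the KV cache and runtime overhead.

```python
# Rough size estimate for block-quantized weights (all figures are estimates).

# Bits per weight, assuming each block stores 32 weights plus one 16-bit scale.
BITS_PER_WEIGHT = {
    "q4_0": (32 * 4 + 16) / 32,  # 4.5 bits per weight
    "q8_0": (32 * 8 + 16) / 32,  # 8.5 bits per weight
}


def estimate_weights_gb(n_params: float, quant: str) -> float:
    """Approximate weight size in GB (10^9 bytes); ignores KV cache and overhead."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


N_PARAMS = 7e9  # assumed round figure for a "7B" model

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimate_weights_gb(N_PARAMS, quant):.1f} GB")
```

Under those assumptions q4_0 comes out at a bit under 4 GB and q8_0 at roughly 7.4 GB, so q8_0 should still fit in 16 GB. Less data to read per token is one plausible reason the smaller quantizations run faster, though I haven't measured that here.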

ukuina · on Dec 9, 2023

llamafile typically outperforms LM Studio and even Ollama.