
The ggllm.cpp fork seems to be the leading option for running Falcon right now [1]

It comes with its own GGML sub-format, "ggcv1", but there are quants available on HF [2]

Although if you have a GPU, I'd go with the newly released AWQ quantization instead [3]; the performance is better.
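For anyone curious what these quant formats boil down to: a toy sketch of block-wise 4-bit quantization, the rough idea behind GGML-style quants (this is an illustration only, not the actual q4/ggcv1 on-disk format, which packs bits and picks scales differently):

```python
# Toy block-wise 4-bit quantization: one float scale per block,
# weights rounded to signed 4-bit ints in [-7, 7].

def quantize_block(block):
    # Per-block scale so the largest magnitude maps to +/-7.
    scale = max(abs(x) for x in block) / 7 or 1.0
    q = [round(x / scale) for x in block]
    return scale, q

def dequantize_block(scale, q):
    return [scale * v for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07, 0.41, -0.88, 0.02]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)

# Round-trip error is bounded by half the scale step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(max_err <= scale / 2)
```

AWQ differs mainly in *how* it picks the scales: it uses activation statistics to protect the weights that matter most, which is why it tends to lose less quality at the same bit width.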

(I may or may not have a mild local LLM addiction - and video cards cost more than drugs)

[1] https://github.com/cmp-nct/ggllm.cpp

[2] https://huggingface.co/TheBloke/falcon-7b-instruct-GGML

[3] https://huggingface.co/abhinavkulkarni/tiiuae-falcon-7b-inst...
