I'm trying to disable "thinking" in llama.cpp, but nothing I've tried works. Neither the usual `--reasoning-budget 0` nor `--chat-template-kwargs '{"enable_thinking":false}'` changes anything (both tried with `--jinja`). Am I missing something?
EDIT: Ok, looks like there's yet another new flag for that in llama.cpp, and this one seems to work in this case: `--reasoning off`.
FWIW, I'm doing some initial tries of unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL, and for writing some Nix I'm VERY impressed: it seems significantly better than qwen3.5-35b-a3b for me so far. Example command line on a MacBook Air M4 with 32 GB RAM:
llama-cli -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL --temp 1.0 --top-p 0.95 --top-k 64 -fa on --no-mmproj --reasoning-budget 0 -c 32768 --jinja --reasoning off
(at release b8638, compiled with Nix)
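Since the reasoning-related flags seem to change between releases, it can be worth checking what your own build actually accepts before copying the line above. A rough sketch, assuming `llama-cli` is on your PATH and lists its options via `--help`; `reasoning_flags` is just a helper name I made up, not part of llama.cpp:

```shell
# Hypothetical helper: reads help text on stdin and keeps only the
# lines that mention reasoning/thinking options (case-insensitive).
reasoning_flags() {
  grep -iE -- 'reasoning|thinking'
}

# Usage (assumes llama-cli is installed and on PATH):
#   llama-cli --help 2>&1 | reasoning_flags
```

If `--reasoning` doesn't show up in the output, the build likely predates that flag, and only the older options will be available.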