That's what they say but I just spent 10 minutes searching the git repo, reading...

zhwu · on April 3, 2023

If you follow this command in their instruction, the delta will be automatically downloaded and applied to the base model. https://github.com/lm-sys/FastChat#vicuna-13b: `python3 -m fastchat.model.apply_delta --base /path/to/llama-13b --target /output/path/to/vicuna-13b --delta lmsys/vicuna-13b-delta-v0`

viraptor · on April 3, 2023

This can be then quantized to the llama.cpp/gpt4all format, right? Specifically, this only tweaks the existing weights slightly, without changing the structure?

_0lhw · on April 3, 2023

I may have missed the detail, but it also expects the pytorch conversion rather than original LLaMa model.

zhwu · on April 3, 2023

Yes, you need to convert the original LLaMA model to the huggingface format, according to https://github.com/lm-sys/FastChat#vicuna-weights and https://huggingface.co/docs/transformers/main/model_doc/llam...

MMMercy2 · on April 3, 2023

You can use this command to apply the delta weights. (https://github.com/lm-sys/FastChat#vicuna-13b) The delta weights are hosted on huggingface and will be automatically downloaded.

superkuh · on April 3, 2023

Thanks! https://huggingface.co/lmsys/vicuna-13b-delta-v0

Edit, later: I found some instructive pages on how to use the vicuna weights with llama.cpp (https://lmsysvicuna.miraheze.org/wiki/How_to_use_Vicuna#Use_...) and pre-made ggml format compatible 4-bit quantized vicuna weights, https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/tree/ma... (8GB ready to go, no 60+GB RAM steps needed)

eurekin · on April 3, 2023

I did try, but got:

``` ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported. ```

superkuh · on April 3, 2023

> Unfortunately there's a mismatch between the model generated by the delta patcher and the tokenizer (32001 vs 32000 tokens). There's a tool to fix this at llama-tools (https://github.com/Ronsor/llama-tools). Add 1 token like (C controltoken), and then run the conversion script.

DrSiemer · on April 4, 2023

Just rename it in the tokenconfig.json

eurekin · on April 4, 2023

Thanks, that indeed worked!

This and using conda in wsl2, instead on bare windows