I'm totally new to AI. If I take, for example, Llama 3.1 (the small 8B size), what's the rough budget to fine-tune it on, say, 1GB of extra text data using any cloud GPU service? (If compute time is not a problem, I can wait.)
Let's assume that the average token in your 1GB file is 4 characters long (which is roughly what the OpenAI tokenizer gets; I assume the Llama tokenizer is similar). 4 characters is 4 bytes, assuming UTF-8 text that's mostly ASCII, so your training data is about 250MM tokens.
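For what it's worth, here's that arithmetic as a quick Python sketch. The file name is a placeholder, the 4-bytes-per-token ratio is just the rule of thumb above, and the optional measurement assumes you have access to the gated Llama 3.1 repo on Hugging Face:

```python
import os

def estimate_tokens_from_size(path: str, bytes_per_token: float = 4.0) -> int:
    """Rough estimate: file size divided by an assumed bytes-per-token ratio."""
    return int(os.path.getsize(path) / bytes_per_token)

def measure_ratio(path: str, sample_bytes: int = 1_000_000) -> float:
    """Tokenize the first ~1MB with the Llama tokenizer and return actual bytes per token."""
    from transformers import AutoTokenizer  # gated repo; needs approved access to Llama 3.1
    tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
    with open(path, "rb") as f:
        sample = f.read(sample_bytes).decode("utf-8", errors="ignore")
    return len(sample.encode("utf-8")) / len(tok(sample)["input_ids"])

# "corpus.txt" stands in for your 1GB text file.
print(estimate_tokens_from_size("corpus.txt"))  # ~250MM tokens at 4 bytes/token
```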
Let's assume you're doing a single-epoch LoRA training run. A single H100 is enough to train Llama 3.1 8B, and it should crank through ~250MM tokens in a couple of hours, IMO. Since you're not doing multi-GPU training, a PCIe H100 is fine (you don't need the slightly pricier SXM variant), and the PCIe versions go for about $2.50/hr on Runpod.
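If it helps to see the shape of it, a single-GPU LoRA run with plain Hugging Face transformers + peft looks roughly like the sketch below. This isn't a tuned recipe: the hyperparameters, sequence length, and file name are illustrative guesses, and something like Unsloth would be faster. A proper run would also pack the corpus into fixed-length blocks rather than truncating per line.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Meta-Llama-3.1-8B"  # gated repo; requires approved access
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto")
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]))

# Raw-text dataset: each line of "corpus.txt" becomes one example, truncated to 2048 tokens.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama31-lora",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        num_train_epochs=1,          # single epoch, as above
        learning_rate=2e-4,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("llama31-lora-adapter")
```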
So, about $5 for a custom model that's probably the best in the world at whatever your task is! (Even if it might be a little dumber at other tasks.) Insanely cheap when you think about it.
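As a back-of-envelope check on that figure (the throughput number below is an assumption for illustration, not a benchmark; plug in whatever your actual run achieves):

```python
def lora_cost(tokens: float, tokens_per_sec: float = 30_000, usd_per_hour: float = 2.50):
    """Wall-clock hours and on-demand cost for a training run at a given throughput."""
    hours = tokens / tokens_per_sec / 3600
    return hours, hours * usd_per_hour

hours, dollars = lora_cost(250e6)
print(f"{hours:.1f} h, ${dollars:.2f}")  # ~2.3 h, ~$5.79 under these assumptions
```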
TPUs won't beat H100s on price for on-demand personal use cases, but for reserved capacity (i.e. businesses) they're slightly cheaper.
You can dump in 1GB of data (Unsloth supports raw-text training), but whether you'd get good results or a useless model is a different issue. I doubt you'd get a good result unless you combine it with question/answer training as well, assuming raw-text training is even useful for your scenario in the first place.
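To make "question/answer training" concrete, here's a small sketch of rendering Q/A pairs into training strings via the chat template. The example pair and field names are made up, and I'm using the Instruct checkpoint's tokenizer so a chat template is available:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

qa_pairs = [
    {"question": "What does the Foo pipeline ingest?",  # hypothetical domain Q/A pair
     "answer": "Raw server logs, which it deduplicates and indexes nightly."},
]

def to_sft_example(pair: dict) -> str:
    messages = [
        {"role": "user", "content": pair["question"]},
        {"role": "assistant", "content": pair["answer"]},
    ]
    # Renders the pair into Llama 3.1's chat format as a single training string.
    return tok.apply_chat_template(messages, tokenize=False)

print(to_sft_example(qa_pairs[0]))
```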