Hot take, but Mistral 7B is the actual state of the art of LLMs.
ChatGPT 4 is amazing, yes, and I've been a day-one subscriber, but it's huge, runs on server farms far away, and is more or less a black box.
Mistral is tiny, amazingly coherent and useful for its size for both general questions and code, uncensored, and a leap I wouldn't have believed possible in just a year.
I can run it on my MacBook Air at 12 tk/s; can't wait to try this on my desktop.
State of the art for something you can run on a MacBook Air, but not state of the art for LLMs overall, or even for open source. Yi 34B and Llama 2 70B still beat it.
True, but it's ahead of the competition when size is considered, which is why I really look forward to their 13B, 33B, etc. models; if those are as potent, who knows what leaps we'll take soon.
I remember running LLaMA 1 33B eight months ago; as I recall it was on Mistral 7B's level, while other 7B models at the time were a rambling mess.
Given that 50% of all information consumed on the internet was produced in the last 24 hours, smaller models could hold a serious advantage over bigger ones.
If an LLM or a SmallLM can be retrained or fine-tuned constantly, every week or every day, to incorporate recent information, then outdated models trained a year or two back have no chance of keeping up. Dunno about the licensing, but OpenAI could incorporate a smaller model like Mistral 7B into their GPT stack, retrain it from scratch every week, and charge the same as GPT-4. Some users might well prefer the weaker, albeit up-to-date, model.
It's much easier to do RAG (retrieval-augmented generation) than to try to shoehorn the entirety of the universe into 7B parameters every 24 hours. Mistral's great at being coherent and processing info at 7B, but you wouldn't want it as an oracle.
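For concreteness, a minimal sketch of the retrieve-then-prompt loop. embed() and generate() are hypothetical stand-ins for an embedding model and a local LLM such as Mistral 7B, not any specific library's API:

    import numpy as np

    # Hypothetical stubs: plug in a real embedding model and a local LLM here.
    def embed(text: str) -> np.ndarray:
        raise NotImplementedError

    def generate(prompt: str) -> str:
        raise NotImplementedError

    def build_index(docs: list[str]) -> np.ndarray:
        # 1. Index: embed the up-to-date documents once (or incrementally as they change).
        return np.stack([embed(d) for d in docs])

    def answer(question: str, docs: list[str], doc_vecs: np.ndarray, k: int = 2) -> str:
        # 2. Retrieve: cosine similarity between the question and every document.
        q = embed(question)
        sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
        context = "\n\n".join(docs[i] for i in np.argsort(sims)[-k:])
        # 3. Generate: the model answers from the retrieved text, not from whatever
        #    happened to be baked into its weights at training time.
        return generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}")

The point is that only the index needs to be fresh; the model itself can stay frozen.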
I didn't know about RAG, thanks for sharing. I'm not sure outdated information can be tackled with RAG though, especially in coding.
Just today I asked GPT and Bard (Gemini) to write code using Slint; neither of them had any idea what Slint is. Slint is a relatively new library, roughly two and a half years (0.1 release) to one and a half years (0.2 release) old [1], so it's not something they trained on.
Natural language doesn't change that much over a handful of years, but in coding, two years back may as well be a century. My argument is that SmallLMs are not only relevant, they're actually desirable, if the best solution is to retrain from scratch.
If, on the other hand, a billion-token context window proves practical, or RAG covers most use cases, then big LLMs might suffice. Could this RAG technique stay aware of millions of git commits daily, across several projects, and keep its knowledge base up to date? I don't know about that.
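As far as I understand it, keeping the index current would mostly mean appending embeddings as commits land, rather than retraining anything. A rough sketch, reusing a hypothetical embed() like the one above; only `git log` is a real interface here, the rest is illustrative:

    import subprocess
    import numpy as np

    def recent_commits(repo: str, since: str = "1 day ago") -> list[str]:
        # Pull recent commit subjects and bodies out of an existing local clone.
        out = subprocess.run(
            ["git", "-C", repo, "log", f"--since={since}", "--pretty=format:%h %s%n%b%n"],
            capture_output=True, text=True, check=True,
        )
        # Crude blank-line split; good enough for a sketch.
        return [c.strip() for c in out.stdout.split("\n\n") if c.strip()]

    def update_index(repo: str, docs: list[str], doc_vecs: np.ndarray):
        # Append the new commits to the existing document list and embedding matrix.
        new = recent_commits(repo)
        if not new:
            return docs, doc_vecs
        new_vecs = np.stack([embed(c) for c in new])  # embed() as in the sketch above
        return docs + new, np.vstack([doc_vecs, new_vecs])

Whether that scales to millions of commits a day across many projects is the open question, but the update itself is cheap compared to any kind of retraining.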
Thanks for letting me know. I didn't use GPT-4, but I was under the impression that the cutoff date was the same, or almost the same, across all GPTs. The code is correct, yes.
I don't have a GPT-4 subscription; I didn't bother because it's so slow, has limited queries, etc. If the cutoff date improves, like being updated periodically, I may think about it. (Late response, forgot about the comment!)
Yes, it's much better now in all those areas; I think you'll be surprised if your last experience was a few months ago. The difference in ability between 3.5 and 4 is significant.
I'm with you on this. Mistral 7B is amazingly good. There are finetunes of it (the Intel one, and Berkeley's Starling) that feel like they're within throwing distance of GPT-3.5 Turbo... at only 7B!
I was really hoping for a 13B Mistral. I'm not sure if this MoE will run on my 3090 with 24 GB. Fingers crossed that quantization + offloading + future tricks will make it runnable.
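Napkin math, weights only; the ~47B total-parameter figure for the 8x7B MoE is my assumption, and the effective bits per weight for the quants are rounded:

    # Rough VRAM estimate for the weights alone; ignores KV cache and activations.
    total_params = 47e9  # assumed total size of the 8x7B MoE
    for name, bits_per_weight in [("fp16", 16), ("q8_0", 8.5), ("q4_0", 4.5)]:
        gb = total_params * bits_per_weight / 8 / 1e9
        print(f"{name}: ~{gb:.0f} GB")
    # fp16: ~94 GB, q8_0: ~50 GB, q4_0: ~26 GB, so even 4-bit doesn't fully fit
    # in 24 GB, hence the hope for CPU/GPU offloading.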
True. I've been using the OpenOrca finetune and just downloaded the new UNA Cybertron model, both tuned on the Mistral base.
They're not far from GPT-3 logic-wise, I'd say, if you consider the breadth of data: very little fits into ~7 GB, so they're missing other languages, niche topics, prose styles, etc.
I honestly wouldn't be surprised if a 13B were indistinguishable from GPT-3.5 on some levels. And if that's the case, then coupled with the latest developments in decoding (UltraFastBERT, speculative, Jacobi, lookahead decoding, etc.), I wouldn't be surprised to see local LLMs at current GPT-4 level within a few years.
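Of those decoding tricks, speculative decoding is the simplest to sketch: a small draft model guesses a few tokens ahead, the big model checks them, and you keep the agreeing prefix. Greedy toy version below; draft_model and target_model are hypothetical objects with a next_token() method, and a real implementation verifies all the draft tokens in one batched forward pass rather than a Python loop:

    def speculative_step(target_model, draft_model, tokens: list[int], k: int = 4) -> list[int]:
        # 1. The small, fast draft model proposes k tokens autoregressively.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_model.next_token(ctx)   # hypothetical: argmax of next-token logits
            draft.append(t)
            ctx.append(t)

        # 2. The big, slow target model checks the proposals; keep the agreeing prefix.
        accepted, ctx = [], list(tokens)
        for t in draft:
            expected = target_model.next_token(ctx)
            if expected != t:
                accepted.append(expected)     # first mismatch: take the target's token and stop
                break
            accepted.append(t)
            ctx.append(t)
        return tokens + accepted

When the draft model agrees often, you get several tokens per expensive target-model pass instead of one.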
> I can run it on my MacBook Air at 12 tk/s; can't wait to try this on my desktop.
That seems kinda low; are you using Metal GPU acceleration with llama.cpp? I don't have a MacBook, but I saw some llama.cpp benchmarks suggesting it can reach close to 30 tk/s with GPU acceleration.
Try different quantization variants. I got vastly different speeds depending on which quantization I chose; I believe q4_0 worked very well for me, although for a 7B model q8_0 runs just fine too, with better quality.
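If it helps, a minimal llama-cpp-python sketch; the filename is a placeholder for whichever quantized GGUF you grabbed, and n_gpu_layers=-1 offloads every layer to the GPU (Metal on Apple silicon builds):

    from llama_cpp import Llama  # pip install llama-cpp-python, built with Metal on macOS

    llm = Llama(
        model_path="mistral-7b-instruct.Q4_0.gguf",  # placeholder path to your quantized model
        n_gpu_layers=-1,  # offload all layers to the GPU; lower it if you run out of memory
        n_ctx=4096,
    )
    out = llm("Q: What is retrieval-augmented generation?\nA:", max_tokens=128)
    print(out["choices"][0]["text"])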
It really is. It feels at the very least equal to Llama 2 13B.
If a Mistral 70B existed and was as much of an improvement over Llama 2 70B as Mistral is at the 7B size, it would definitely be on par with GPT-3.5.
Not a hot take, I think you're right. If it were scaled up to 70B, I think it would be better than Llama 2 70B. Maybe if it were then scaled up to 180B and turned into a MoE, it would be better than GPT-4.