fblgit's comments

fblgit · on Dec 8, 2023

Correct. UNA can align the MoE at multiple layers and experts, nearly any part of the neural network, I would say. Xaberius 34B v1 "BETA" is the king, and it's just that: the beta. I'll be focusing on Mixtral. It's a Christmas gift, modular in that way. Thanks for the lab, @mistral!

brucethemoose2 · on Dec 8, 2023

Do a Yi 200K version as well! That would make my Christmas, as the Mistral MoE is only maybe 32K.

inciampati · on Dec 12, 2023

Do you have any docs describing the method?

fblgit · on Dec 8, 2023

UNA: Uniform Neural Alignment. Haven't you noticed yet? Each model that I uniform behaves like a pre-trained one, and you can likely fine-tune it again without damaging it.

If you've chatted with them, you know that strange sensation, and you know what it is: intelligence. Xaberius-34B is the highest performer on the board, and is NOT contaminated.

valine · on Dec 8, 2023

How much data do you need for UNA? Is a typical fine tuning dataset needed or can you get away with less than that?

brucethemoose2 · on Dec 8, 2023

In addition to what was said, if it's anything like DPO you don't need a lot of data, just a good set. For instance, DPO requires "good" and "bad" responses for each given prompt.

fblgit · on Dec 8, 2023

It doesn't require much data; on a 7B model it can take a couple of hours or so.

valine · on Dec 8, 2023

That’s cool. A couple of hours on a single GPU, or on something like 8x A100s?