
Mixed precision is the default method for pretraining and full fine-tuning right now. It is especially effective for transformers, because their memory bottleneck is in activations (outputs of intermediate layers stored for backprop), and running the forward pass in fp16/bf16 cuts activation VRAM roughly in half (and speeds up the forward pass as well).
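For anyone who hasn't used it, here's a minimal sketch of what that looks like in PyTorch with autocast plus a loss scaler for fp16 (the toy model, shapes, and hyperparameters are just placeholders for illustration):

  import torch
  import torch.nn as nn

  # Toy model and data, purely illustrative.
  model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512)).cuda()
  optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

  # GradScaler is only needed for fp16; bf16 usually trains stably without it.
  scaler = torch.cuda.amp.GradScaler()

  x = torch.randn(32, 512, device="cuda")
  target = torch.randn(32, 512, device="cuda")

  for step in range(10):
      optimizer.zero_grad(set_to_none=True)

      # Forward pass runs in half precision, so the activations saved for
      # backprop take roughly half the memory; master weights stay in fp32.
      with torch.autocast(device_type="cuda", dtype=torch.float16):
          out = model(x)
          loss = nn.functional.mse_loss(out, target)

      # Scale the loss so small fp16 gradients don't underflow to zero.
      scaler.scale(loss).backward()
      scaler.step(optimizer)
      scaler.update()

With bf16 you can typically drop the GradScaler and just use autocast with dtype=torch.bfloat16, since bf16 has the same exponent range as fp32.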

