WithinReason 20 days ago \| on: MegaTrain: Full Precision Training of 100B+ Parame...

I was wondering how well this would work :) You can definitely push this further; the question is how well the gradients and updates compress.