

1aurent29 2 days ago | on: MegaTrain: Full Precision Training of 100B+ Parame...

Sounds very similar to https://docs.pytorch.org/docs/stable/distributed.fsdp.fully_... I wonder how much of this could be replicated using only this PyTorch primitive.


RandyOrion 1 day ago

Check out Fig. 6 in this paper; it compares the proposed method with PyTorch's native FSDP offload method.
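
For context on why offloading comes up at all at this scale, here is a rough back-of-envelope sketch of the memory held by training state. Nothing in it comes from the paper or from PyTorch: the function names are hypothetical, and it assumes 4-byte (fp32) values and an Adam-style optimizer that keeps weights, gradients, and two moment buffers per parameter, with activations ignored.

```python
# Back-of-envelope estimate only. The constants are assumptions:
# fp32 (4 bytes per value) and 4 stored values per parameter
# (weight, gradient, and two Adam-style moment buffers).
GIB = 1024 ** 3


def training_state_bytes(num_params: int,
                         bytes_per_value: int = 4,
                         values_per_param: int = 4) -> int:
    """Bytes needed for weights, gradients, and optimizer state.

    Activations and temporary buffers are deliberately excluded.
    """
    return num_params * bytes_per_value * values_per_param


def per_device_gib(num_params: int, num_devices: int) -> float:
    """GiB per device if the training state is sharded evenly."""
    return training_state_bytes(num_params) / num_devices / GIB


if __name__ == "__main__":
    n = 100 * 10**9  # a 100B-parameter model
    print(f"total training state: {training_state_bytes(n) / GIB:.1f} GiB")
    for devices in (1, 8, 64):
        print(f"{devices:>3} devices: {per_device_gib(n, devices):.1f} GiB each")
```

Under these assumptions, even an even 8-way shard of the training state exceeds the memory of a single current accelerator, which is why both approaches in the comparison move state to host memory.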
