Hacker News
1aurent29
2 days ago
on: MegaTrain: Full Precision Training of 100B+ Parame...
Sounds very similar to
https://docs.pytorch.org/docs/stable/distributed.fsdp.fully_...
I wonder how much of this could be replicated using only this PyTorch primitive.
RandyOrion
1 day ago
Check out Fig. 6 in this paper; it compares the proposed method with PyTorch's native FSDP offload method.