
It depends on how the parallelism is implemented, e.g. distributed data parallel (DDP), which synchronizes gradients across replicas: https://pytorch.org/tutorials/intermediate/ddp_tutorial.html

It's a rabbit hole I stay away from for pragmatic reasons.
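For anyone curious, here's a minimal sketch of the DDP pattern from that tutorial: spawn one process per replica, wrap the model in DDP, and backward() all-reduces the gradients so every rank steps identically. The model, port, and world size are placeholder choices, not anything prescribed by the tutorial.

    import os
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    from torch.nn.parallel import DistributedDataParallel as DDP

    def worker(rank, world_size):
        # Each process joins the same process group; "gloo" works on CPU.
        os.environ["MASTER_ADDR"] = "127.0.0.1"
        os.environ["MASTER_PORT"] = "29500"  # arbitrary free port
        dist.init_process_group("gloo", rank=rank, world_size=world_size)

        # Toy model. DDP broadcasts the initial weights from rank 0 and
        # all-reduces gradients during backward(), keeping replicas in sync.
        model = DDP(torch.nn.Linear(10, 1))
        opt = torch.optim.SGD(model.parameters(), lr=0.01)

        for _ in range(3):
            x = torch.randn(32, 10)   # in practice, each rank gets its own shard
            loss = model(x).pow(2).mean()
            opt.zero_grad()
            loss.backward()           # gradients averaged across ranks here
            opt.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        world_size = 2
        mp.spawn(worker, args=(world_size,), nprocs=world_size)

The key point is that the synchronization is hidden inside backward(), which is exactly why it can become a rabbit hole once you need to reason about what gets reduced and when.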


