minimaxir on Dec 29, 2024 | on: All You Need Is 4x 4090 GPUs to Train Your Own Mod...
It depends on how the parallelism is implemented, e.g. using distributed data parallel (DDP) to synchronize gradients across GPUs:
https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
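For illustration, a minimal sketch of the DDP training loop that tutorial describes; the toy linear model, batch size, and learning rate here are placeholders, not anything from the article:

    # Minimal DDP sketch (PyTorch). Launch one process per GPU with torchrun.
    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
        dist.init_process_group("nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Each process holds a full replica of the model on its own GPU.
        model = nn.Linear(10, 1).cuda(local_rank)
        ddp_model = DDP(model, device_ids=[local_rank])

        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()

        for _ in range(10):
            optimizer.zero_grad()
            inputs = torch.randn(32, 10, device=local_rank)
            targets = torch.randn(32, 1, device=local_rank)
            loss = loss_fn(ddp_model(inputs), targets)
            # backward() triggers an all-reduce that averages gradients
            # across all ranks, keeping the replicas in sync.
            loss.backward()
            optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Run it with something like torchrun --nproc_per_node=4 train.py: each process drives one GPU, and the gradient synchronization happens inside backward().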
It's a rabbit hole I stay away from for pragmatic reasons.