These companies innovate in all of those areas. They direct those resources towards building hyper-scale custom infrastructure, including CPUs, TPUs, GPUs, and custom networking hardware for the largest cloud systems, and they conduct research and development on new compilers and operating system components to exploit that hardware.
They're building it for themselves and employ world-class experts across the entire stack.
How can NVIDIA develop "more integrated" solutions when they are primarily building for these companies, as well as many others?
Examples of these companies doing things you mention as being somehow unique to or characteristic of NVIDIA:
Complex kernel drivers or modules:
- AWS: Nitro, ENA/EFA, Firecracker, NKI, Bottlerocket
- Google: gasket/apex, gve, binder
- Meta: Katran, bpfilter, cgroup2, oomd, btrfs
Hardware simulators:
- AWS: Neuron; Annapurna builds simulations for Nitro, Graviton, and Inferentia, and validates AWS instances built for EDA services
- Google: Goldfish, Ranchu, Cuttlefish
- Meta: Arcadia, MTIA, CFD for thermal management
Optimizing compilers:
- Amazon: NNVM, Neo-AI
- Google: MLIR, XLA, IREE
- Meta: Glow, Triton, LLM Compiler
Acceleration libraries:
- Amazon: NeuronX, aws-ofi-nccl
- Google: JAX, TensorFlow
- Meta: FBGEMM, QNNPACK