If H800 is a memory-constrained model that NVIDIA built to avoid the Chinese export ban on H100 with equivalent fp8 performance,
it makes zero sense to believe Elon Musk, Dario Amodei and Alexandr Wang's claims that DeepSeek smuggled H100s.
The only reason why a team would allocate time on memory optimizations and writing NVPTX code rather than focusing on posttraining is if they severely struggled with memory during training.
This is a massive trick pulled by Jensen: take the H100 design whose sales are regulated by the government, make it look 40x weaker and call it H800, while conveniently leaving 8-bit computation as fast as H100. Then bring it to China and let companies stockpile without disclosing production or sales numbers, and have no export controls.
Eventually, after 7 months, the US govt starts noticing the H800 sales and introduces new export controls, but it's too late. By this point, DeepSeek has started research using fp8. They slowly build bigger and bigger models, work on the bandwidth and memory consumption, until they make R1, their reasoning model.
Especially since he seems intent on everyone talking about him all the time. I find it questionable when a person wants to be the centre of attention no matter what. Perhaps attention is not all we need.
He's like a broken smart network switch, smart as in managed. Packets with the switch's MAC on them are all broken, but erroneously forwarded ones often have valuable data. We, looking through L3, don't know which one is which.
You should start a blog... or maybe not - pursue the battle in academia/work and occasionally drop nuggets of wisdom like this somewhere. But do not delete them.
As the (tautological) saying goes: everyone is doing their best. My interest is whether this can be improved - perhaps at some point when AI gets closer to challenging us for cognitive supremacy we will awake from our slumber.
It is a question. I tried to put what my opinion is on a few statements but I absolutely cannot summarize 160 pages (Business Insider did using GPT, which I find insulting and funny) nor have a 100% opinion on something that involves national security, secrets and other stuff that I don't have access to.
He's an atheist psychiatrist. However, he enjoys how natural selection, social dynamics and reputation can also be modeled by the moral rules of most religions. For example, going to therapy isn't that different from practicing confessions in a church.
Therapy is only similar to confession in a descriptive sense, from an external observer that doesn't believe in the religion in question. For the believer, it's a very different thing to be confessing sins to a representative of the divine.
He also wrote Unsong, a sprawling work of kabbalistic magical realism, full of deep references to Judaism and Christianity (and groan-inducing puns.) Highly recommend if you're a lapsed Catholic or a secular Jew.
> For example, going to therapy isn't that different from practicing confessions in a church.
Depends on the kind of therapy. Some are more dogmatic and one might even call them "religious" (overloading the word), while other forms of therapy are more grounded and based on empiricism -- the latter ones are far removed from religious confession.
> For example, going to therapy isn't that different from practicing confessions in a church.
If your therapist responds to your sharing by suggesting that straying from their advice, or some kooky ancient text, results in eternal damnation then run far away. Churches can be devastating to one's mental health. Do not recommend.
> Churches can be devastating to one's mental health. Do not recommend.
No issue with your first clause: indeed, churches can be devastating to one's mental health (as can ~anything). That observation does not support your second clause. Religiosity is generally associated with fewer mental health problems [0]. Correlation is not causation, of course; it is difficult to do a prospective study in this field.
This is as silly a strawman as creating a scenario where, in the confession booth, your priest goes over the BioPsychoSocial model and how your mental health is important for your overall well-being.
Fun fact: jgc (John Graham-Cumming) is the person that started the petition that forced the UK government to publicly apologize for its harsh treatment and persecution of Alan Turing.
The character limit incentivizes content that's polarizing and evokes strong emotions, because you are competing for attention without the ability to express subtleties/details.
I mean, take a look at the numbers:
https://www.fibermall.com/blog/nvidia-ai-chip.htm#A100_vs_A8...