
"Oh sorry guys, we made the mistake again that saves us X% in compute cost, we will fix it soon!"

I find it a bit too introspective; living in a specific period creates urgency, and there is still an (opportunity) deadline.

My concerns are more practical:

- How do we make sure power-wrestling bad leaders don't halt generational turnover? Currently, old age gives us a soft reset.

- How do we cope with suffering? With no clear end in sight, I expect more mental health problems. And I'm unsure we can remove those issues without stripping the humanity out of people; I would for sure hate to be turned into a calculator with qualia.


For the lazy, he says this on repeat using 2000 words:

...

In the CPB Digital Cosmos, the system first locked into a strange ratio: two thirds consciousness, one third physics.

...

That anomaly appeared as the missing 0.1 spark.

For the first time the system stabilized. Life emerged.


I don’t think it was ever an option, since it had ties with the French government early on (Cédric O) and Macron’s party is quite pro-EU.


They let so many important French companies down. So, yes, it could happen despite this beginning.


Just in time for the monthly EU bashing


Weekly.


Daily.


I don't think so. I expect that system to use Spanner, so my best guess is that the user generated an image at the end of the credit reset window (which is around noon EST).


I’m guessing the argument is that LLMs get worse on problems they haven’t seen before, so you may assume they think for problems that are commonly discussed on the internet or seen on GitHub, but once you step out of that zone, you get plausible but logically false results.

That, or a reductive fallacy. In either case I’m not convinced; IMO they are just not smart enough (either due to a lack of complexity in the architecture or bad training that didn’t help them generalize reasoning patterns).


Probably refers to jobs that are unrelated to his interests, like cashier, just to get by for now.


This. Earning a living by not breaking the law.


To make matters worse, the RTX 3090 was released during the crypto craze, so a decent amount of the second-hand market could contain overused GPUs that won’t last long. Even if the 3xxx-to-4xxx performance difference is not that high, I would avoid the 3xxx series entirely, if only for resale value.


I bought 2 ex-mining 3090s ~3 years ago. They’re in an always-on PC that I remote into. Haven’t had a problem. If there were mass failures of GPUs due to mining, I would expect to have heard more about it.


I have a rig of 7 3090s that I bought from crypto bros; they are lasting quite alright and have been chugging along fine for the last 2 years. GPUs are electronic devices, not mechanical devices, and they rarely blow up.


How do you have a rig that fits that many cards?? Those things take 3 slots apiece.

Pictures, or it never happened! :D


You get a motherboard designed for the purpose (many PCIe slots) and a case (usually an open frame) that holds that many cards. Riser cables are used so every card doesn't plug directly into the motherboard.


I've noticed on eBay there are a lot of 3090s for sale that seem to have rusted or corroded heatsinks. I actually can't recall seeing this with used GPUs before, but maybe I just haven't been paying attention. Does this have to do with running them flat out in a basement or something?


Run them near a saltwater source without AC and that will happen.


The way I explained to myself in the past why so many of the CUDA algorithms don't care much about numerical stability is that, in deep learning, the error is a form of regularization (i.e. less overfitting of the data).


I am not quite sure what that means! :)

But reasons why deep learning training is very robust to moderate inaccuracy in gradients:

1. Locally, sigmoid and similar functions are the simplest, smoothest possible non-linearity to propagate gradients through.

2. Globally, outside of deep recurrent networks, there is no recursion, which makes the total function smooth and well-behaved.

3. While the perfect gradient indicates the ideal direction to adjust parameters for fastest improvement, all that is really needed to reduce error is to move parameters in the direction of the gradient signs, with a small enough step. That is a very low bar (see the small sketch after this list).

It's like telling an archer they just need to shoot an arrow so it lands closer to the target than where the archer is standing, but not worry about hitting it!

4. Finally, the perfect first-order gradient is only meaningful at one point of the optimization surface. Move away from that point, i.e. update the parameters at all, and the gradient changes quickly.

So we are in gradient-heuristic land even with "perfect" first-order gradients. The most perfectly calculated gradient isn't actually "accurate" to begin with.

To actually get an accurate gradient over a parameter step would take fitting the local surface with a second- or third-order polynomial, i.e. not just first- but second- and third-order derivatives, at vastly greater computational and working-memory cost.
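
To make point 3 concrete, here is a minimal sketch (my own toy setup, not taken from anything above): on a small least-squares problem, updating parameters with only the per-coordinate sign of the gradient, signSGD-style, still drives the loss way down even though the step direction is "wrong" almost everywhere.

    import numpy as np

    # Toy least-squares problem: minimize ||A x - b||^2 (numbers are arbitrary).
    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 10))
    x_true = rng.normal(size=10)
    b = A @ x_true

    def loss(x):
        r = A @ x - b
        return float(r @ r)

    # signSGD-style updates: throw away the gradient magnitudes, keep the signs.
    x = np.zeros(10)
    step = 1e-2
    for _ in range(2000):
        grad = 2 * A.T @ (A @ x - b)   # exact gradient of the quadratic loss
        x -= step * np.sign(grad)      # crude, direction-only update

    print(loss(np.zeros(10)), "->", loss(x))  # loss drops dramatically despite the crude updates

The final iterate just oscillates in a small band around the optimum (set by the step size), but that is exactly the "lands closer to the target than where the archer is standing" bar described above.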

--

The only critical issue for calculating gradients is that there is enough precision that at least directional gradient information makes it from the errors back to the parameters being updated. If precision is too low, the variable-magnitude rounding inherent to floating-point arithmetic can completely drop directional information for smaller gradients. Without accurate gradient signs, learning stalls.
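
A tiny sketch of that failure mode (values picked arbitrarily): in float16 a small update added to a much larger parameter rounds away entirely, sign and all, which is part of why mixed-precision training usually keeps higher-precision master copies of the weights.

    import numpy as np

    # In fp16, an update smaller than half an ulp of the parameter simply vanishes:
    p16 = np.float16(1.0)
    u16 = np.float16(1e-4)                 # lr * grad: tiny, but with a definite sign
    print(p16 + u16 == p16)                # True: the step (and its sign) is lost

    # The same step survives in fp32, so the directional information is preserved.
    p32 = np.float32(1.0)
    print(p32 + np.float32(1e-4) == p32)   # False: the update registers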


Typically, for matrix multiplication there is a wide range of algorithms you could use: on one extreme you could use numerically stable summation, and on the other extreme you could have a tiled matmul with FP8. The industry trend seems to be moving further away from numerically stable algorithms, without much quality drop it seems. My claim is probably unfair since it ignores the scale you gain from the speed/precision tradeoff, so I assumed numerical stability is not that beneficial compared to something precision-heavy like physics simulation in HPC.
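
As a rough illustration of the two extremes (my own toy numbers, not a real kernel): the same dot product accumulated naively in fp16 versus with compensated (Kahan) summation in fp32.

    import numpy as np

    rng = np.random.default_rng(1)
    a = rng.normal(size=10_000).astype(np.float16)
    b = rng.normal(size=10_000).astype(np.float16)

    # Naive accumulation entirely in fp16 (the aggressive, "fast" extreme).
    acc = np.float16(0.0)
    for x, y in zip(a, b):
        acc = np.float16(acc + x * y)

    # Kahan (compensated) summation in fp32 (the numerically stable extreme).
    s = np.float32(0.0)
    c = np.float32(0.0)
    for x, y in zip(a, b):
        t = np.float32(x) * np.float32(y) - c
        new_s = s + t
        c = (new_s - s) - t
        s = new_s

    exact = float(a.astype(np.float64) @ b.astype(np.float64))
    print("naive fp16 error:", abs(float(acc) - exact))
    print("Kahan fp32 error:", abs(float(s) - exact))

In practice the gap is smaller than this makes it look, since FP8/FP16 tensor-core matmuls typically accumulate partial sums in a wider format; the point is just that the field keeps trading summation accuracy for throughput.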


> I assumed numerical stability is not that beneficial compared to something precision heavy like physics simulation in HPC.

Yes, exactly.

For physics, there is a correct result. I.e. you want your simulation to reflect reality with high accuracy, over a long chain of calculations. Extremely tight constraint.

For deep learning, you don't have any specific constraints on parameters, except that you want to end up with a combination that fits the data well. There are innumerable combinations of parameter values that will do that; you just need to find one good-enough combination.

Wildly different.

