Hacker News

Without exception, every AI company is a play for your data. AI requires a continuing supply of new data to train on; it does not "get better" merely by using the existing trainsets with more compute.

Furthermore, synthetic data is a flawed concept. At a minimum, it tends to propagate and amplify biases in the model generating the data. If you ignore that, there's also the fundamental issue that data doesn't exist purely to run more gradient descent, but to provide new information that isn't already compressed into the existing model. Providing additional copies of the same information cannot help.



> it does not "get better" merely by using the existing trainsets with more compute.

Pretty sure it does - that’s the whole point of using more test-time compute. Also, a lot of research effort goes into improving data efficiency.



