
Good article, but I feel it misses an important property of diffusion models: they model the score function (the gradient of the log probability) [1], and diffusion sampling is akin to Langevin dynamics [2]. IMO these explain why diffusion models are easier to train than GANs: the modeling objective is easier.

[1] https://yang-song.net/blog/2021/score/

[2] https://lilianweng.github.io/posts/2021-07-11-diffusion-mode...
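
To make the Langevin picture concrete, here is a minimal sketch (mine, not from either post) of unadjusted Langevin dynamics driven by a score function. The analytic score of a standard Gaussian, score(x) = -x, stands in for a learned score network; langevin_sample, the step size, and the step count are all illustrative choices, not anything prescribed by the posts.

    import numpy as np

    def langevin_sample(score_fn, x0, step_size=1e-2, n_steps=2000, rng=None):
        # Unadjusted Langevin dynamics:
        #   x_{t+1} = x_t + eps * score(x_t) + sqrt(2 * eps) * z,  z ~ N(0, I)
        # where score(x) = grad_x log p(x).
        rng = np.random.default_rng() if rng is None else rng
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(n_steps):
            z = rng.standard_normal(x.shape)
            x = x + step_size * score_fn(x) + np.sqrt(2.0 * step_size) * z
        return x

    # Toy check: for a standard Gaussian the score is analytic (-x), and the
    # chain's stationary distribution should be N(0, I).
    rng = np.random.default_rng(0)
    samples = np.stack([langevin_sample(lambda x: -x, np.zeros(2), rng=rng)
                        for _ in range(500)])
    print(samples.mean(axis=0), samples.std(axis=0))  # roughly 0 and 1

With enough samples the empirical mean and standard deviation come out near 0 and 1, matching the target Gaussian; swapping in a learned score network gives the sampler used in score-based models.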



Yes, those blog posts offer a different perspective on diffusion models from the "projection onto data" view described in this post. You can see them as different ways of interpreting the same training objective and sampling process. In our perspective, diffusion models are easier to train because instead of predicting the gradient of the _exact_ distance function, the training objective predicts the gradient of a _smoothed_ distance function. Sampling from the diffusion model then amounts to taking multiple approximate gradient steps.
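
To illustrate the smoothed-distance point, here is a toy sketch (my own, with an assumed three-point 1-D dataset; smoothed_projection and the annealing schedule are illustrative, not from the post). The posterior mean E[x0 | x] under Gaussian noise plays the role of the model's prediction, and stepping toward it is an approximate gradient step on a smoothed distance to the data, annealed from coarse to fine:

    import numpy as np

    data = np.array([-2.0, 0.5, 3.0])  # assumed toy 1-D dataset

    def smoothed_projection(x, sigma):
        # Posterior mean E[x0 | x] when x = x0 + N(0, sigma^2) noise and x0
        # is uniform over `data`. The displacement (E[x0 | x] - x) equals
        # sigma^2 times the score of the Gaussian-smoothed data distribution,
        # i.e. the negative gradient of a smoothed squared distance to data.
        logits = -(x - data) ** 2 / (2.0 * sigma ** 2)
        w = np.exp(logits - logits.max())  # numerically stable weights
        w = w / w.sum()
        return float((w * data).sum())

    # Annealed sampling: repeatedly take a partial step toward the smoothed
    # projection while shrinking sigma -- a crude analogue of diffusion
    # sampling as multiple approximate gradient steps.
    rng = np.random.default_rng(0)
    x = rng.normal(scale=4.0)
    for sigma in np.geomspace(4.0, 0.05, num=50):
        x += 0.5 * (smoothed_projection(x, sigma) - x)
    print(x)  # lands near one of the data points

At large sigma the smoothed distance has a single broad basin, so early steps are well behaved even far from the data; as sigma shrinks, the steps refine toward an individual data point.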

To gain a deeper understanding of diffusion models, I encourage everyone to read all of these blog posts and learn about the different interpretations :)



