Living papers are a useful vision, but we have a long way to go to get there.
Even notebooks are still problematic. For example, this study found that only about 25% of Jupyter notebooks could be executed without errors, and only about 4% actually reproduced the same results.
http://www.ic.uff.br/~leomurta/papers/pimentel2019a.pdf
One compromise is to evaluate the paper separately from its artifacts, which are reviewed for availability, reproducibility, and reusability. In software engineering conferences, this is becoming standard practice, and while evaluating artifacts places a heavy burden on reviewers, I think it does take us in the right direction. So in this case, we also submitted our paper's artifacts for evaluation.