Hacker News

Code cannot and should not be self-documenting at scale. You cannot document "the why" with code. In my experience, "self-documenting code" is only ever used by lazy developers as an excuse not to write actual documentation or use comments thoughtfully in the codebase.


this always starts out right, but over the years the code changes and its documentation seldom does, even on the best of teams. the amount of code documentation I have seen that is just plain wrong (it was right at some point) far outnumbers the amount that was actually in sync with the code. 30 years in the industry, so large sample size. now I prefer no code documentation in general


The good thing about having documentation in the (version-controlled) code is that it allows you to retrace when it was correct (using git blame or equivalent), and that gives you background about why certain things are the way they are. I 100% prefer outdated documentation in the code to no documentation.
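That retracing workflow can be sketched with plain git. Everything below is hypothetical for illustration: a throwaway repo, a made-up file, and a made-up comment.

```shell
# Build a throwaway repo so the commands run end to end.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
printf '# retries capped at 3: upstream rate-limits beyond that\nMAX_RETRIES = 3\n' > config.py
git add config.py
git -c user.name=demo -c user.email=demo@example.com \
    commit -qm "cap retries at 3 (upstream rate limit)"

# Who last touched line 1, and in which commit?
blame_out=$(git blame -L 1,1 config.py)
echo "$blame_out"

# The full story of that line, commit messages included:
log_out=$(git log -L 1,1:config.py --oneline)
echo "$log_out"
```

Even if the comment later rots, `git log -L` still shows the commit in which it was true, which is exactly the background described above.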


Are there any good systems that somehow enforce consistency between documentation and code? Maybe the problem is fundamentally ill-posed.


Simon Willison had this idea of "Documentation unit tests" in 2018: https://simonwillison.net/2018/Jul/28/documentation-unit-tes...

It's not a massively complex AI monstrosity (it's from 2018 after all) or a perfect solution, but it's a good jumping off point.

With a slight sprinkling of LLM this could be improved quite a bit. Not necessarily by having the agent write the documentation, but by having it check the parity between code and docs and flag mismatches for users.

For example, a CI job that checks that relevant documentation has been created or updated when new functionality is added or old functionality is changed.
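A minimal, non-LLM version of such a documentation unit test can be written with plain introspection. Everything here is illustrative: `json` stands in for your own module, `DOCS` stands in for the contents of your docs directory, and the matching is naive substring matching that a real test would tighten up.

```python
import inspect
import json  # stand-in for the module under test

# Stand-in for the rendered documentation.
DOCS = """
The `dumps` function serializes an object to a JSON string.
The `loads` function parses a JSON string back into an object.
"""

def undocumented_functions(module, docs):
    """Return public module-level functions the docs never mention."""
    return [
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_") and name not in docs
    ]

# In CI this would be an assertion that fails the build:
missing = undocumented_functions(json, DOCS)
print("undocumented:", missing)
```

Simon Willison's version goes further and fails with the exact list of names to add, but the shape is the same: the test reads both the docs and the code, and CI complains when they drift apart.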


interesting that they don't mention doctest, which has been a python built-in for quite a while.

It allows you to write simple unit tests directly in your docstrings, essentially by copying the REPL output, so it doubles as an example.

combined with something like sphinx, that is almost exactly what you're looking for.

doctest kind of sucks for anything where you need to set up state, but if you’re writing functional code it is often a quick and easy way to document and test your code/documentation at the same time.

https://docs.python.org/3/library/doctest.html
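For reference, a minimal doctest sketch (the function and values are made up): the docstring examples are copied from a REPL session, and `doctest` re-runs them as tests.

```python
import doctest

def slugify(title):
    """Lower-case a title and join its words with hyphens.

    >>> slugify("Hello World")
    'hello-world'
    >>> slugify("  Spaces   everywhere ")
    'spaces-everywhere'
    """
    return "-".join(title.lower().split())

# Re-run every example embedded in this module's docstrings:
results = doctest.testmod()
print(results)
```

Run as a script, this reports two passing examples; `python -m doctest yourfile.py` does the same without the boilerplate.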


Doctest is for writing unit tests in docstrings.

That system is a unit test that checks that functions are documented in the documentation. Nothing to do with docstrings.


right, but docstrings are documentation, so if your doctest is working, then at least that part of the documentation is correct.

Even without doctest, generating your documentation from docstrings is much easier to keep updated than writing your documentation somewhere else, because it is right there as you are making changes.
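The "right there as you are making changes" point is just that docstrings travel with the function object, so doc generators read them at import time. A tiny sketch with the stdlib's pydoc (the function is made up; Sphinx autodoc works on the same principle):

```python
import pydoc

def parse_config(path):
    """Read a config file at *path* and return a dict of settings."""
    raise NotImplementedError  # body irrelevant to the docs

# The docstring is attached to the function object itself...
print(parse_config.__doc__)

# ...so generators can render documentation straight from the code:
print(pydoc.render_doc(parse_config, renderer=pydoc.plaintext))
```

Change the function and the doc source is sitting in the same diff, which is what makes this style easier to keep current.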


If the documentation and code could be kept in sync, then the documentation would just be code, like type hints. But the kind of good documentation the parent is talking about cannot be kept in sync that way.

Programming languages can't understand semantics, and that's why we program in the first place. I can't tell a computer "I would like a program to achieve this goal", instead I have to instruct it how to achieve the goal. Then, I would need to document elsewhere what the goal is and why I'm doing it.

LLMs change that, we can now legitimately ask the model "I would like a program for this goal". But the documentation is lost in the code if we don't save comments or save the prompt.

Git commits are also a good source of documentation. They shouldn't describe what we're doing, because I can just read the code. But often, I come across code and I'm thinking "why are we doing this? Can I change this? If I change it, what are the side effects?" If I'm lucky, the git blame will answer those questions for me.


I am not saying it doesn't matter, because it does, but how much does it matter now that we can get documentation on the fly?

I started working on something today I hadn't touched in a couple years. I asked for a summary of code structure, choices I made, why I made them, required inputs and expected outputs. Of course it wasn't perfect, but it was a very fast way to get back up to speed. Faster than picking through my old code to re-familiarize myself for sure.


We cannot get full documentation on the fly, though. We can get "what this does" level of documentation for the system that AI is looking at. And if all you are doing is writing some code, maybe that is enough. But AI cannot offer the bigger picture of where it fits in the overall infrastructure, nor the business strategy. It cannot tell you why technical debt was chosen on some feature 5-10 years ago. And those types of documentation are far more important these days, as people write less of the code by hand.

This is the same discussion that goes round ad nauseam about comments. Nobody needs comments to tell us what the code does. We need comments to explain why choices were made.


Keeping the documentation in the repo (Markdown files) and using an AI coding agent to update the code seems to work quite well for keeping documentation up to date (especially if you have an AGENTS.md/CLAUDE.md in the repo telling it to always make sure the documentation is up to date).


Ultimately the code is the documentation.


Code can only ever document the "what" by definition, never the "why". If it could document the "why", no computer programmers would exist. So we have to supplement the "why" using natural language, and that intent is lost entirely when we convert it to code.


This is correct. Comments serve a purpose too, but they should only be used when code fails to self-document, which should be the exception.



