
> But the truth is that all kinds of things disappear all the time in all aspects of life. The web is no different at all.

Glad we have the Wayback Machine then. But if you don't want your blog mirrored by Wayback you can declare that in your `robots.txt` file. Do this:

    User-agent: *
    Disallow: /
But that doesn't mean crawlers/bots will honor the request, so presume that any content you post publicly will be backed up somewhere. If not somewhere on the net, then on someone's hard drive!
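For illustration, here is how a well-behaved crawler evaluates those rules, sketched with Python's standard-library `urllib.robotparser`. The rules string mirrors the Disallow-everything snippet above; `ia_archiver` is the user-agent token historically associated with the Internet Archive's crawler (any token would behave the same under `User-agent: *`):

```python
import urllib.robotparser

# The same blanket-exclusion rules as in the robots.txt above.
rules = """\
User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# With "Disallow: /" under "User-agent: *", every URL on the site
# is off-limits to every crawler that honors robots.txt.
print(rp.can_fetch("ia_archiver", "https://example.com/blog/post"))  # False
```

The point of the thread stands, though: this check is voluntary, and nothing stops a crawler from skipping it entirely.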


The Internet Archive does not respect robots.txt - https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...


The blog post you are linking to is outdated. They now honor robots.txt files. From the FAQ:

> Some sites are not available because of robots.txt or other exclusions. What does that mean? Such sites may have been excluded from the Wayback Machine due to a robots.txt file on the site or at a site owner’s direct request.

If you exclude them in your robots.txt file, they will also retroactively remove your site from the index.

- https://news.ycombinator.com/item?id=16965575

- https://help.archive.org/help/using-the-wayback-machine/


I would absolutely love an option that meant "archive and make available forever from this point backwards" to protect against domain expirations and re-registration (possibly by domain squatters or content farms).


I hope you're right! The lack of an update on that post, combined with the FAQ saying the opposite thing, makes it even harder for me to know what their policy is. Respecting robots.txt is a civilized thing to do and I hope they do it.


I hope they don't. If you don't want things archived, don't put them out there.



