
Having worked on bot detection in the past, I can say some really simple, old-fashioned attacks happened by doing the opposite of what the robots.txt file says.

While I doubt it does much today, that file really only matters to those who want to play by the rules, and on the free web that is not a lot of the web anymore, I'm afraid.
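To make the "do the opposite of robots.txt" pattern concrete, here is a minimal sketch (hypothetical paths and log format, not anyone's production detector) of flagging clients that request exactly the paths a site's robots.txt asks crawlers to avoid, including a decoy "honeypot" path no well-behaved visitor would ever reach:

  from urllib.robotparser import RobotFileParser

  # Our own robots.txt, including a decoy path that only a rule-ignoring
  # crawler would ever discover and fetch.
  ROBOTS_TXT = """\
  User-agent: *
  Disallow: /admin/
  Disallow: /honeypot/
  """

  parser = RobotFileParser()
  parser.parse(ROBOTS_TXT.splitlines())

  def is_suspicious(path: str) -> bool:
      # A request to a Disallow'd path suggests a client doing the
      # opposite of what robots.txt asks.
      return not parser.can_fetch("*", path)

  # Toy access-log entries: (client_ip, requested_path)
  access_log = [
      ("203.0.113.7", "/index.html"),
      ("198.51.100.9", "/honeypot/creds.txt"),
  ]

  for ip, path in access_log:
      if is_suspicious(path):
          print(f"possible rule-ignoring bot: {ip} requested {path}")

The decoy entry is the whole trick: legitimate crawlers never fetch it, regular users never see a link to it, so any hit on it is a strong signal.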



That was the first thing I learned about the robots.txt file. Even RFC 9309, the Robots Exclusion Protocol (https://www.rfc-editor.org/rfc/rfc9309.html), mentions:

> These rules are not a form of access authorization.

Meaning that these rules are not enforced in any way; they cannot actually prevent you from accessing anything.
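A small sketch of why it is purely advisory (example.com and the user agent string are placeholders): urllib.robotparser can tell a client what robots.txt asks, but nothing stops the client from making the request anyway.

  from urllib.request import urlopen
  from urllib.robotparser import RobotFileParser

  rp = RobotFileParser("https://example.com/robots.txt")
  rp.read()  # fetch and parse the site's robots.txt

  url = "https://example.com/"  # hypothetical target URL
  print("robots.txt allows:", rp.can_fetch("SomeCrawler/1.0", url))

  # Whether the answer above is True or False, this request still goes
  # through; robots.txt is consulted only if the client chooses to.
  with urlopen(url) as resp:
      print(resp.status, len(resp.read()), "bytes fetched")

The check and the fetch are entirely independent; enforcement has to happen server-side (authentication, rate limiting, blocking), not in the robots.txt file.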

I think the only approach that could work in this scenario would be to find out which companies disregard robots.txt and bring it to the attention of the technical community. Practices like these could make a company look shady and untrustworthy if found out. That could be one way to keep them accountable, even though there is still no guarantee they will abide by it.



