
Every well-behaved bot (i.e., almost all of them) has its own user agent string. You can just grep -v those out.
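For anyone who wants the same filter outside the shell, here is a minimal Python sketch of the grep -v idea. It assumes access logs in the common "combined" format, where the user agent is the last double-quoted field on each line. The marker list and the function names are illustrative assumptions, not a complete bot list.

```python
import re

# Illustrative substrings only (an assumption); real crawlers use many more tokens.
BOT_MARKERS = ("bot", "crawler", "spider")

# Assumes the combined log format: the user agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def is_declared_bot(log_line: str) -> bool:
    """Return True if the line's user agent contains a known bot marker."""
    match = UA_FIELD.search(log_line)
    if match is None:
        return False  # no quoted user agent field found; keep the line
    user_agent = match.group(1).lower()
    return any(marker in user_agent for marker in BOT_MARKERS)


def drop_declared_bots(lines):
    """Keep only lines whose user agent does not declare itself as a bot."""
    return [line for line in lines if not is_declared_bot(line)]
```

This only removes bots that identify themselves. A crawler sending a browser user agent passes straight through.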


I wish that were true, but I'd estimate that less than half of the bot hits on my website properly identify themselves as bots. Most of the crawlers running on DigitalOcean/AWS/etc. seem to use a user-agent string lifted from one of the common browsers.


Check the order of the headers and the TLS protocols the client claims to support; both are useful for identification.
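A rough sketch of the header-order half of that check, in Python. It assumes you can capture the header names of a request in the order they arrived, and that you have recorded a reference order for the browser the user agent claims to be. The reference list in the usage note is a hypothetical placeholder, not a measured browser fingerprint, and the function name is my own. The TLS half needs access to the handshake itself and is not covered here.

```python
def relative_order_matches(observed, expected):
    """Return True if headers present in both lists share the same relative order.

    Header names are compared case-insensitively. Headers that appear in only
    one of the two lists are ignored. Assumes each header name appears at most
    once per list.
    """
    observed_lower = [name.lower() for name in observed]
    expected_lower = [name.lower() for name in expected]
    shared = set(observed_lower) & set(expected_lower)
    observed_shared = [name for name in observed_lower if name in shared]
    expected_shared = [name for name in expected_lower if name in shared]
    return observed_shared == expected_shared
```

Usage would look something like `relative_order_matches(request_header_names, reference_order)`, where `reference_order` is a list such as `["Host", "User-Agent", "Accept"]` that you recorded yourself from a real browser. A mismatch is a hint that the client is not the browser it claims to be, not proof.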



