Hacker News

Got any reports/statistics to back this up? I highly doubt websites don't want major search engines to index them. AFAIK it's been standard practice to use `User-agent: *` for a long time. The other anti-crawling measures exist because the bad crawlers aren't going to respect your robots.txt anyway.


It's not necessarily a lot of sites that block less-popular bots, but it's often big sites (e.g. content-centric sites such as social media): think Yelp, Twitter, LinkedIn, Instagram, etc.

That can add up to a serious percentage of the web.
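To illustrate the pattern being described, here's a minimal sketch using Python's stdlib `urllib.robotparser` against a hypothetical robots.txt (the rules shown are an assumption for illustration, not copied from any of the sites named above): major engines get a blanket allow, while the catch-all `User-agent: *` group disallows everything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the style described: named major crawlers
# are allowed everywhere (empty Disallow), everyone else is blocked.
robots_txt = """\
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A named major crawler is allowed; an unlisted bot falls through
# to the "User-agent: *" group and is blocked.
print(rp.can_fetch("Googlebot", "/some/page"))       # True
print(rp.can_fetch("SomeObscureBot", "/some/page"))  # False
```

Of course, as the parent comment notes, this only constrains crawlers that choose to honor robots.txt in the first place.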



