Got any reports/statistics to back this up? I highly doubt websites don't want major search engines to index them. AFAIK it's been standard practice to use `User-agent: *` for a long time. Other anti-crawling measures exist because the bad crawlers aren't going to respect your robots.txt anyway.
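For reference, the conventional setup looks something like this, with one wildcard rule covering every crawler (the disallowed paths are just illustrative):

```
# One rule for all crawlers; only a few private paths are off-limits.
User-agent: *
Disallow: /admin/
Disallow: /search
```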
It's not necessarily a lot of sites that block less-popular bots, but it's often the big ones (e.g. content-centric sites like social media). Think Yelp, Twitter, LinkedIn, Instagram, etc.
That can add up to a serious percentage of the web.
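The pattern on those sites tends to be an allowlist: name the major crawlers explicitly, then deny everyone else by default. A sketch of that pattern (Googlebot and Bingbot are real user-agent tokens, but the rules here are illustrative, not copied from any of those sites):

```
# Let the major search engines crawl everything...
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

# ...and block every other crawler by default.
User-agent: *
Disallow: /
```

Note that an empty `Disallow:` means "allow everything" for that agent, while `Disallow: /` blocks the entire site.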