It is difficult to say this is what consumers want when, right now, consumers are getting the best of both worlds: the ease of AI agents without the long-term negative consequence of destroying the publishers who created all the high-quality training data in the first place.
I think in the long term the highest quality content creators are going to find ways to keep their information out of AI training data, and put it behind walled gardens.
The AI isn't "reading the web" though; it is reading the top hits in the search results, free-riding on the access that Google/Bing gets in exchange for sending actual user traffic to those sites. Many webmasters specifically opt their pages out of the search results (via robots.txt and/or "noindex" directives) when they judge that the cost of the bot traffic isn't worth the user traffic they might get from being listed.
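Worth noting that those opt-outs only work because well-behaved crawlers check them voluntarily. A minimal sketch of that check, using Python's standard urllib.robotparser (the bot name and URLs here are hypothetical):

```python
from urllib import robotparser

# Hypothetical user-agent string; real crawlers publish theirs
# (e.g. "Googlebot") so webmasters can target them in robots.txt rules.
BOT_NAME = "ExampleAIBot"

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

url = "https://example.com/some-article"
if rp.can_fetch(BOT_NAME, url):
    print("allowed to crawl:", url)
else:
    print("webmaster opted out; a polite crawler stops here")
```

Nothing technically stops a crawler from skipping this check, which is exactly the complaint.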
One of my websites that gets a decent amount of traffic has pretty close to a 1:1 ratio of Googlebot accesses to real user visits referred from Google. As a webmaster I'm happy with this and continue to allow Google to access the site.
If ChatGPT gives my website 100 bot accesses (or more) for every actual user it sends my way, I should very much have the right to decline its access.
> If ChatGPT gives my website 100 bot accesses (or more) for every actual user it sends my way
Are you trying to collect ad revenue from the actual users? Otherwise, a chatbot reading your page because it found it by searching Google, and then relaying the info, with a link, to the user who asked for it, seems reasonable.
While yes, I am attempting to collect ad revenue from users, and yes, I don't want somebody competing with me and cutting me out of the loop, a large part of it is controlling my content. I'm not arguing about whether the AI chatbot has the legal right to access the page; I'm not a legal scholar. What I'm saying is that the leading search engines have the same right to access whatever content they want, and yet they all give webmasters the following tools (the first two are sketched right after the list):
- Ability to prevent their crawlers from accessing URLs via robots.txt
- Ability to prevent a page from being indexed (noindex tag)
- Ability to remove existing pages that you don't want indexed (webmaster tools)
- Ability to remove an entire domain from the search engine (webmaster tools)
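For concreteness, here is what the first two tools look like in practice (the bot name is hypothetical; the directive syntax is the standard one):

```
# robots.txt at the site root: tells a named crawler to stay out entirely
User-agent: ExampleAIBot
Disallow: /

# per-page opt-out, placed in the page's <head>:
# <meta name="robots" content="noindex">
```

The point is that the search engines both honor these and layer removal tools on top of them.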
It is really impolite for the AI chatbots to go around flouting all these existing conventions, and they do it because they know webmasters would restrict their access: the trade is much less beneficial to a site than the one it gets from existing search engines.
In the long run, all this is going to lead to is more anti-bot countermeasures, more content behind logins (which can carry legally binding anti-AI access restrictions), and less new original content. The victims will be all the humans who aren't using a chatbot, to the slight benefit of the ones who are.
And again, I'm not suggesting that AI chatbots should not be allowed to load webpages, just that webmasters should be able to opt out of it.
> While yes, I am attempting to collect ad revenue from users, and yes, I don't want somebody competing with me and cutting me out of the loop, a large part of it is controlling my content.
> It is really impolite for the AI chatbots to go around flouting all these existing conventions, and they do it because they know webmasters would restrict their access: the trade is much less beneficial to a site than the one it gets from existing search engines.
I agree with you about the long-run effects on the internet at large, but I still don't understand the horse you have in this race personally. I read you as saying (1) it's less about ad revenue than content control, but (2) content control is based on an analysis of benefits, i.e. ad revenue?
> Well you have no rights when you expose a server to the internet.
Technically you don’t, but there are still laws that affect what you can legally do when accessing the web. Beyond the copyright issues that have been outlined by people a lot more qualified than me, I think you could also make the point that AI crawlers actively cause direct and indirect financial harm.
A datacenter could consume a lot of water with evaporative cooling. I don't know how prevalent it is, but given how cheap and efficient evaporative cooling is, I'd guess datacenters use it a lot where possible (probably in combination with other cooling methods).
Coincidentally, I had a garage door spring professionally replaced today, and I asked the repairman this question. Here is what he said:
1. The springs lift the door from the bottom and from each side, which puts less load on the door itself than lifting the entire weight from the top middle every time.
2. The motors can be smaller, quieter and use less power
3. In case of a power failure, the door is much more functional and safer the less apparent weight it has.
Also, the springs themselves are very unlikely to be dangerous (as long as you don't try to replace them yourself), because he said they almost always break when the door is in the closed position, since that is when they are under the most tension. So on the whole, the springs pose little practical safety risk, while greatly increasing the safety of the door in its normal operation and reducing wear and tear on it. They also allow people to have heavier types of doors if they want them.
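A rough sketch of why the counterbalance lets the motor be small (all numbers invented for illustration, not from the repairman):

```python
# Illustrative counterbalance arithmetic; real doors and springs vary.
door_weight_lbs = 150          # a typical steel sectional door, roughly
spring_assist_fraction = 0.90  # springs are tuned to carry most of the weight

residual_load_lbs = door_weight_lbs * (1 - spring_assist_fraction)
print(f"Opener lifts ~{residual_load_lbs:.0f} lbs instead of {door_weight_lbs} lbs")
# ~15 lbs: hence a smaller, quieter motor, and a door a person can
# lift by hand when the power is out.
```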
For companies raising 9-figure later-stage rounds? That's not obvious to me.
And relevant to this case: investors will often agree to a higher valuation (artificially minting a unicorn, etc.) for optics/vanity reasons, which eats an additional 1+ years of future growth and eliminates the relevance of a discount here.
And for folks who may not have followed the terms above: investors get preferred shares, with rights over these discounted common shares. These include things like veto rights over acquisitions, first money out ("if $200M raised, no one else sees any $ until that $200M is paid back"), and for high-valuation unicorn rounds, often something like a participation multiple ("guaranteed extra $100M profit, so no one sees anything till $300M paid"), a high interest rate on convertible debt portions, etc. So beyond the obvious dilution hit of new investors, there are a lot of these gotchas that trade a bigger bank account for heightened exit-value risk for employees.
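To make the "first money out" and participation math concrete, here's a toy waterfall using the numbers above (a sketch only; real preference stacks have multiple series, caps, conversion math, and so on):

```python
def toy_waterfall(exit_value, preference, participation_bonus=0.0):
    # Preferred takes its preference (plus any guaranteed participation
    # bonus) off the top; common splits whatever is left. Grossly simplified.
    to_preferred = min(exit_value, preference + participation_bonus)
    to_common = max(0.0, exit_value - to_preferred)
    return to_preferred, to_common

# "$200M raised, no one else sees any $ until that $200M is paid back":
print(toy_waterfall(exit_value=250e6, preference=200e6))
# -> preferred gets $200M, common splits the remaining $50M

# With a guaranteed extra $100M participation, nothing reaches common
# until $300M has been paid out:
print(toy_waterfall(exit_value=250e6, preference=200e6,
                    participation_bonus=100e6))
# -> preferred takes all $250M, common gets $0
```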
The people who come up with 409A prices have every incentive to make them as low as possible, provided they are somewhat defensible to the IRS.
I assure you they can get more creative than saying the last preferred price was $X, therefore our hands are tied and the common must be close to that. They can take into consideration the preferred stock's preferences, the current state of the business, the time since the last round, etc. For example, the 409A value can keep going down and down if the value of the business is (defensibly) going down and down, regardless of the last fundraising round.
This is a thing I'd love to see data for: the strike-price discount at the time of acquisition for later-stage companies. These same megaround companies probably have stock trading on secondary markets, which might be a good proxy for some of this.
And totally agree wrt creative arguments being viable... just not clear what ends up happening in practice. Ex: I can imagine a split between paper unicorns vs ones with revenue backing the valuation up being closer to market, and those latter ones often switching to RSUs. So genuine curiosity here.
Self-driving electric cars are much better than car shares, as they'll have significantly higher utilization rates and eliminate much of the need for parking spots in dense downtown areas.
If you believe car shares are part of an efficient transportation future, then self-driving cars are part of it!
I'd just like to note the irony of saying how bad Google is while at the same time saying that Google's general purpose search engine is still better than any of the specific search engines of these individual sites.
Perhaps the issue isn't that Google is bad; perhaps it's that search is incredibly hard.
No, I disagree, because this simply wasn't the case a few years ago. Back then I could just search something, Google would give me results, and that was it. It wasn't too long ago that googling was a proper skill to be learned, and it felt like it could get you anywhere on the web. Google was exceptional once.
Now Google won't even honor exact-match quotes ("") anymore, and having to hold its hand and guide it toward a single website, one I already need to be aware of, is pretty pathetic compared to what Google was once able to do. There's also the fact that it returns so much spam and even puts it at the top of the results.
> Google's general purpose search engine is still better than any of the specific search engines of these individual sites
This is only partially true. Google's search is definitely better than Reddit's, but that is a low bar (I need to emphasize this: Reddit's search is really bad, unless it is old.reddit, which is at least somewhat OK). For many other sites, the reason to pick Google is just convenience.
> Perhaps the issue isn't that Google is bad, and the issue is that search is incredibly hard.
I think the issue is more that Google deliberately allows and pushes all that spam, because users who immediately find what they're looking for spend less time on the site. Otherwise I find it hard to explain this drastic drop in quality. It would also explain why they are taking away all the useful search and query tools.
Or the people responsible for working on it just don't have the skill anymore, who knows.
You hit it exactly. It never was this bad, and now it's all junk results. And even worse, every page is the same junk results. It just repeats. It's maddening.
The answer is somewhere in between pushing people to click ads, the zealous bias towards automation, the pivot towards AI (no, the one that happened in 2016), and (tinfoil hat time) the influence of entities that saw Google's previous adroitness at quickly connecting the average person with specific and accurate information as a threat.
It's not bad because they can't build a decent search engine. Building a decent search engine is a solved problem, which they solved.
It's bad because their incentives aren't aligned with their users'. The shit results they are giving aren't because they can't give good ones; they're because they don't want to.
They are bad now, but they were exceptional a few years ago. Same thing with Gmail: now I get obvious spam in my inbox and real email in the spam folder. Looks like they gave up.
Generally speaking, the people who live in SF aren't comparing SF to Boise; they are comparing SF to other elite cities like NYC, LA, Boston, Austin, etc. And the COL is not 2-3x higher in SF than in those other cities.
Also, as a little story about how crime-infested SF is: a couple of years ago my wife accidentally left her key ring in the lock of our front door (including the keys to a car parked right out front), where it stayed for the next ~18 hours, in obvious public view from the street, until we noticed the next morning. And lo and behold, the crime was so bad that the keys were right there the next morning, and we had a nice laugh about it.