
Irrespective of whether major search engines might use language models, fake web sites will use them. This could make it increasingly difficult to find valid information, and may precipitate some sort of arms race between algorithms that detect algorithms.

A pathetic scenario, but somehow consistent with the rules (or lack thereof) of the game.



It seems to me (or I hope) this isn't really much different from the current situation: we already have a ton of useless quasi-content made up only for search engines, just look for something like "best android phone" or whatever.

Search engines will have to rely more on signals outside the content, such as links from other authoritative sources, but it does not look like a qualitatively different world.


I agree it is a quantitative difference at its core, but cheap automation can dramatically lower the signal-to-noise ratio. Crossing a certain quality threshold may eventually precipitate binary behavioral changes in users (i.e., concluding that certain online tools are unusable / untrustworthy).

I also agree that authoritative sources become critical. Yet those typically rely on very human assessments (with their own pitfalls and controversies) and are in any case much slower / costlier to develop.

How exactly this will all play out is not clear (to me). But the naive technosolutionism of deploying "AI at scale" and believing that it will just work as advertised seems misplaced. The human condition is very reflexive.


It will be an interesting arms race, but I'm hopeful for truth there.

I can imagine good enough AI being able to spot truth even better than humans do - by verifying sites and commenters against sources of real information to estimate their credibility.

E.g., in a theme similar to PageRank, you could have an AI that treats some sites as a source of objective truth (Wikipedia, science journals, reputable sources of news, etc.), and then uses that as a basis for estimating the trustworthiness of material.

Also, AI could find, for a given subject, opposing opinions, and estimate which ones are possibly fake, and which ones are real.

In essence - do what current fact-checkers do, but for every single website and comment in existence.
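The seed-based trust idea above resembles TrustRank: concentrate initial trust on a hand-picked seed set and let it propagate along links, so pages near trusted sources score high and isolated spam scores low. A minimal sketch, assuming a toy link graph (the domain names and seed set are made up for illustration):

```python
# TrustRank-style sketch: propagate trust from hand-picked seed pages
# along outgoing links. The graph and seed set below are hypothetical.

def trust_rank(links, seeds, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}; seeds: trusted pages."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    # Initial trust mass is concentrated entirely on the seed set.
    base = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(base)
    for _ in range(iterations):
        # Each page keeps a (1 - damping) share of its seed trust...
        new = {p: (1 - damping) * base[p] for p in pages}
        # ...and the rest flows out evenly along its links.
        for page, outs in links.items():
            if outs:
                share = damping * trust[page] / len(outs)
                for target in outs:
                    new[target] += share
        trust = new
    return trust

links = {
    "wikipedia.org": ["journal.example", "news.example"],
    "journal.example": ["wikipedia.org"],
    "news.example": ["spamfarm.example"],
    "spamfarm.example": [],  # no trusted page links back to it
}
scores = trust_rank(links, seeds={"wikipedia.org"})
# The seed scores highest; the spam farm, reachable only via one
# outbound hop and never linked back to, scores lowest.
```

The key difference from vanilla PageRank is the teleportation vector: instead of jumping to a uniformly random page, the random surfer jumps back to a trusted seed, so trust decays with link distance from the seeds.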


> some sites as a source of objective truth (Wikipedia, science journals, reputable sources of news etc)

the irony is that finding "objective truth" is a very non-trivial human game, and in all cases costly. E.g., journalism has been decimated after losing its traditional ad revenue; Wikipedia and science journals survive because they rely on informal and formal public funding, etc.


> I can imagine good enough AI being able to spot truth even better than what humans do

The thing that makes me question that is the data used to train those models to begin with. To dissect truth on the internet, can you use the internet as a source of truth to train it?


Wikipedia is not always a source of truth.

Look at the English version of the article about Nord Stream... Compare it with any other language (no need to know these other languages).

There is something fishy going on here.


What did you find fishy about them? They seem roughly the same, aside from the countries involved with the projects having a bit more local information.


I think they're referring to the length of the articles, e.g.:

https://en.wikipedia.org/wiki/Nord_Stream

https://nl.wikipedia.org/wiki/Nord_Stream

https://de.wikipedia.org/wiki/Nord_Stream

The Dutch and German articles are a lot longer than the quite short English version. But that's just a matter of organisation: in English the editors chose to make separate "Nord Stream {1,2}" articles, while other languages folded it all into one article. The German one in particular is just two huge sections.

In short, it's fishy in the same way that bread tastes like fish: not at all.


I should have checked just before posting.

One month ago, there was no English version available from the French page on the article; only a three-line "Simple English" version was linked.


I have absolutely no idea whether it applied to that case, too, but I've found that Wikipedia's language mapping sometimes breaks down when there's no easy 1:1 mapping between articles in differing languages.


I think Gödel would have something interesting to say here; or maybe not, I'm not really sure.


Sure, but humans face the same issue when fact checking. Even more so, because we can't browse the whole of humanity's knowledge as well as machines can.


"Fake web sites" have all of the same ranking and reputation problems as they did before.


you mean like how you can submit a post to reddit with literally any title you want and people will accept it as true fact?


/me goes to register gptoverflow.com



