If you watch the following video from Google (about five minutes starting from the marked location), it looks like these algorithms have a pretty big blind spot: the unknown.

https://www.youtube.com/watch?v=sphFCJE1HkI#t=7m50s

So you can use Google itself to find words that are clearly not real: construct a dictionary of random-character strings and check how many search results each one returns. Then take a passage of clean text, randomly replace its nouns with these garbage words (capitalizing the first character), and post it to your social network. The surrounding text will lend the passage legitimacy, but the garbage words will probably throw off the algorithms. Do it often enough, and someone at Google will probably investigate the issue :-)
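A minimal sketch of the noun-swapping idea in Python, assuming NLTK is available for part-of-speech tagging (the comment doesn't specify any tooling); verifying that each garbage word returns few or no Google results is left as a manual step, since doing it programmatically would require a search API:

    import random
    import string

    import nltk  # assumed available; requires the tokenizer/tagger data below
    # nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

    def garbage_word(length=8):
        """Return a random lowercase string that is almost certainly not a real word."""
        return ''.join(random.choice(string.ascii_lowercase) for _ in range(length))

    def poison_text(text, rate=0.3):
        """Replace roughly `rate` of the nouns in `text` with capitalized garbage words."""
        tokens = nltk.word_tokenize(text)
        tagged = nltk.pos_tag(tokens)  # Penn Treebank tags; noun tags start with 'NN'
        out = []
        for word, tag in tagged:
            if tag.startswith('NN') and random.random() < rate:
                out.append(garbage_word().capitalize())
            else:
                out.append(word)
        # Joining on spaces mangles punctuation spacing a bit, which is fine for a sketch.
        return ' '.join(out)

    print(poison_text("The cat sat on the mat and watched the birds."))
    # e.g.: The Qzkrvbtu sat on the mat and watched the Xwplmnoq .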

Or get public domain text full of words that are no longer in common use and fill up your FB/Google/Twitter feeds with it. My view is that such data would very likely throw off the existing algorithms, and if people did it at sufficient scale, we might discover that ML algorithms are only as smart as their training data set.
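As a rough sketch of how you might measure how much out-of-vocabulary material such a text contains (the file names here are hypothetical), diff its vocabulary against a modern word list:

    import re

    # Hypothetical inputs: an old public-domain text (e.g. something from
    # Project Gutenberg) and a list of common modern English words, one per line.
    with open('old_public_domain_text.txt') as f:
        old_words = set(re.findall(r"[a-z']+", f.read().lower()))

    with open('modern_wordlist.txt') as f:
        modern_words = {line.strip().lower() for line in f}

    # Words appearing in the old text but absent from modern usage --
    # candidates for feed-filling material.
    archaic = sorted(old_words - modern_words)
    print(len(archaic), archaic[:20])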


