
I hear this a lot, but I haven't personally noticed any real problems with quoted search (other than the occasional difficulty actually finding the content on the resultant webpages).

I have noticed a general decline in search quality over the last couple of years, but nothing specific to quotes.

I realise I'm not saying anything particularly useful with this comment, but I just thought I'd add another datapoint.

Edit: Thinking more, my biggest issue is when the quoted text occurs in the "recommended similar posts" section of a page (particularly common with reddit). That section gets re-rendered on each view, so it probably won't be there once I click the result.



Quoted search is provably broken for some queries. Try a Google search for "[::]" (with the double quotes): it has no results. Similarly, try a search for 'linux next hop "[::]"' (with the double quotes): none of the results will contain [::].

Proof: https://archive.ph/AAa6k and https://archive.ph/9WGe7


> none of the results will contain

More and more frequently I was getting this for the actual search terms, in quotes or not, to the point where I would Control+F any words just to find none of them existed on the page. It's the reason I switched to dumber search engines.

I've assumed the "fast path" is to search for "phrases with similar meaning", rather than actual words. But that really destroys technical searches.


If you haven't read our post, I'd encourage doing so. Quotes do work to find the exact terms specified. But control-F won't locate some of the terms we find when fully rendering a doc -- that's why the list explains using developer tools to search if control-F comes up with nothing.


It might be useful to offer people a way to search for content that is rendered in the page, rather than content that is only visible in developer tools.


The content is rendered on the page. For example, say someone has an email sign-up box. When the page renders, the box appears and it might list all the countries in the world, so that you can pick your country from the list. All those countries are rendered, available if you use the box. But if you ctrl-f search, you might not see that text even though it did render. Real case I looked into which prompted the tip of using developer tools.
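A rough sketch of that kind of check, assuming you have the page's HTML as a string: it searches every text node and reports the enclosing tag, so text that lives inside a collapsed `<select>` still shows up. The `TextLocator` class, the `locate` helper, and the sample page are all made up for illustration, and this only covers static markup; anything added by scripts would still need the browser's developer tools.

```python
from html.parser import HTMLParser


class TextLocator(HTMLParser):
    """Records the enclosing tag of every text node containing `needle`."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle.lower()
        self.stack = []  # currently open tags, innermost last
        self.hits = []   # enclosing tag name for each match

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.needle in data.lower():
            self.hits.append(self.stack[-1] if self.stack else None)


def locate(html, needle):
    """Return the tag names whose text contains `needle` (case-insensitive)."""
    parser = TextLocator(needle)
    parser.feed(html)
    parser.close()
    return parser.hits


# Hypothetical sign-up form with a country picker.
page = """<form><p>Sign up for updates</p>
<select name="country"><option>France</option><option>Poland</option></select></form>"""

print(locate(page, "Poland"))   # ['option']
print(locate(page, "Sign up"))  # ['p']
```

A hit inside `option` is the sort of match that is present in the document even if find-in-page does not highlight it.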


This isn't evidence of anything changing. Google has always ignored punctuation, treating it as whitespace, as mentioned in the article.



Interesting - looks like they're doing this via a bunch of special-case rules.

To any google engineers reading:

Please add `really-verbatim` mode, indicated by backtick quotes, which also requires strict matching of punctuation.


I'm a Google engineer way too far away organisationally to ever have any say in this.

I wonder if that will ever be worth the hardware cost. Back when I did some coursework on information retrieval, it seemed that you get superlinear savings by reducing the cardinality of tokens. So you'd do stemming, remove all punctuation, drop words that are too frequent ("do", "be", "and", "or", ...)... Basically remove all grammar. You do the same to your search query and the index. This intuitively reduces your compute by at least an order of magnitude, especially for languages with rich grammar (e.g. stemming nouns in Polish reduces the cardinality of tokens by a factor of 7 and verbs by a factor of 162).
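To make the cardinality point concrete, here is a toy sketch of that normalisation pipeline. The stop-word list, the suffix-stripping `stem` function, and the sample documents are all assumptions for illustration; a real system would use a proper stemmer, and nothing here reflects how Google actually does it.

```python
import re

# Illustrative stop words; real lists are much longer.
STOP_WORDS = {"do", "be", "and", "or", "the", "a", "to"}
# Crude suffix stripping, standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def raw_tokens(text):
    """Whitespace split only: case and punctuation are preserved."""
    return text.split()


def normalize(text):
    """Lowercase, treat punctuation as whitespace, drop stop words, stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


docs = [
    "Searching, searched, and searches: the index.",
    "Indexes index the search; do search!",
]

raw_vocab = {t for d in docs for t in raw_tokens(d)}
norm_vocab = {t for d in docs for t in normalize(d)}

print(len(raw_vocab))      # 11 distinct raw tokens
print(sorted(norm_vocab))  # ['index', 'search']
```

The same `normalize` call is applied to the query, so `normalize("Searches")` gives `['search']` and matches both documents. The cost is exactly the one being discussed: punctuation and word forms are gone from the index.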


No way they'll inflate their indexes even 20% and add complexity to their algorithms for 0.1% of queries that won't bring any additional income.


They don't necessarily have to inflate their indexes. Backtick-quoted results ought to be a subset of double-quoted results, so they can use the standard quoted search algorithm, and then filter out imperfect matches from those results.
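A minimal sketch of that two-pass idea, assuming the first pass is a punctuation-insensitive phrase match and the second is a plain case-sensitive substring check on the document text. The function names and sample documents are made up for illustration.

```python
import re


def tokens(text):
    """Lowercase and treat all punctuation as whitespace."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(doc, phrase):
    """Standard quoted search: the phrase tokens appear consecutively."""
    d, p = tokens(doc), tokens(phrase)
    if not p:
        return False
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


def verbatim_search(docs, phrase):
    """Pass 1: normalised phrase match. Pass 2: exact substring filter."""
    candidates = [d for d in docs if phrase_match(d, phrase)]
    return [d for d in candidates if phrase in d]


docs = [
    "set the next-hop address",
    "set the next hop address",
    "hop to the next address",
]

print(verbatim_search(docs, "next-hop"))  # ['set the next-hop address']
```

This works here because "next-hop" still leaves two tokens for the first pass to match on. It relies on the query containing at least one indexable token.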


Google searches ignore punctuation, so it's not even indexed. There's no way to search for punctuation without inflating the index.


Read what I said. They can use the standard index, then filter the results as a last pass.


I did read what you said. Imagine trying to search for

  [::]
as 8192kjshad09- suggested earlier in the thread. What standard index results are you going to filter? Since "[::]" isn't in the index, you won't have anything to go on. To do your back-tick really-verbatim searches, the index has to be enlarged.
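A toy inverted index shows the problem, assuming punctuation is treated as whitespace at indexing time. The helper names and the two sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercase and treat all punctuation as whitespace."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for tok in tokens(text):
            index[tok].add(doc_id)
    return index


def candidates(index, query):
    """Documents containing every query token; empty if no tokens survive."""
    toks = tokens(query)
    if not toks:
        return set()
    result = set(index.get(toks[0], set()))
    for tok in toks[1:]:
        result &= index.get(tok, set())
    return result


docs = [
    "ip route add default via [::] dev eth0",
    "linux next hop routing",
]
index = build_index(docs)

print(tokens("[::]"))                              # []
print(candidates(index, "[::]"))                   # set()
print(candidates(index, 'linux next hop "[::]"'))  # {1}
```

The first document does contain `[::]`, but the index has no posting list that could ever lead to it from that query, so there is nothing for a last-pass filter to work on. The mixed query falls back to its word tokens and returns a document without `[::]` in it.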


Ah, sorry, I see what you mean.


I work for Google Search. We did look at this, and we'll keep looking to see if we can improve, but it turns out to be a very hard lift.


The post explains that we see some punctuation as spaces, so that query is a search for nothing, which is why it fails.


In my experience, quoted search works very well. I do a lot of really, really obscure searches with quoted search and Google finds the pages for me. Maybe other people get different results, but I'm puzzled by all the complaints about quoted search. You're all using double quotes ", opening and closing, right?



