[HN Gopher] Tell HN: Google doesn't work anymore for exact matches
       ___________________________________________________________________
        
       Tell HN: Google doesn't work anymore for exact matches
        
       It's been a while since I have felt that Google's results have
       deteriorated. It takes a lot of tricks to find what I am looking
       for. Today an interesting case occurred that frustrated me a lot
       and is worth telling HN.  First, I was looking for a song and
       searched for: "here were the dreams are born" (I know I mistyped).
       One of the first results I found was this interesting story (Google
       results https://imgur.com/a/gUq4XVZ):
       https://mechahuggermr.tripod.com/id66.html  I took the following
       sentence from this story and used it in the readme of an internal
       project:  "David, we have been expecting you - this is what you
       have been searching for - this place, David, is where dreams are
       born"  Some people wanted to know where this quote came from and
       could not find it on Google.  I also tested and cannot enter any
       combination of parameters into Google to find this page. I tried
       quotation marks, literal search and no hyphen. Nothing, it is
       impossible to find it.  Does anyone know what is going on here? Can
       someone do a magic call and find this page on Google?  Has Google's
       AI/BERT Enhanced Search reached a point where indexed pages can not
       be found?  All results were tested with a Brazilian connection and
       replicated in a Private Session on a US VPN.
        
       Author : bratao
       Score  : 112 points
       Date   : 2022-01-29 21:14 UTC (1 hour ago)
        
       | ergonaught wrote:
       | Google finds it for: "David, is where dreams are born."
       | 
       | And: "The voice was deep and melodious when it spoke."
       | 
       | And most other things. Examine the raw HTML for that area and you
       | might give them a pass when searching for an exact phrase that
       | doesn't actually exist in the document itself.
        
         | [deleted]
        
         | GistNoesis wrote:
         | You are probably right. In the HTML there are some "br" line
         | returns between each line of the citation. It can find the
         | citation from parts of each of these lines but not from the
         | whole citation.
        
         | Retric wrote:
         | I don't think so. Google dates to 1996. Stripping whitespace,
         | line breaks, etc. should be part of basic parsing. Consider
         | someone typing in a poem or song lyrics: a few extra <br> tags
         | should be expected, especially back then.
        
       | lelandfe wrote:
       | Curiously, searching directly on the site with that quote
       | produces "No results found," and then shows an inexact match with
       | just that quote underneath. This is clearly a real bug on
       | Google's side.
       | 
       | https://imgur.com/a/2XFogU5
        
         | lelandfe wrote:
         | I may have figured it out. The site is committing hijinks with
         | the text. They're manually wrapping text with `<br>`'s and then
         | manually wrapping the _source_ with spaces. Here's the HTML of
         | the lines in question:
         | 
         |     <DIV>The voice was deep and melodious when it spoke. &#8220;David, we have been
         |     <BR>expecting you - this is what you have been searching for - this place,
         |     <BR>David, is where dreams are born.&#8221; It was at this moment David realized
         |     <BR>the being was speaking to him with its own voice, not by thought. David
         |     <BR>stood unmoving. He realized he had never dreamed before                                    or even had ever
         |     <BR>slept.
         | 
         | If you search for same-line sentence fragments you'll find the
         | page: https://www.google.com/search?q=%22The+voice+was+deep+and
         | +me.... Not an excuse: this is a case Google should handle.
         | 
         | For posterity: https://imgur.com/a/DAUpLit
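A minimal sketch of the normalization being asked for here, assuming a simple text pipeline that turns `<br>` into a space, decodes entities, and collapses whitespace runs before phrase matching. This is not Google's pipeline. `RAW`, `QUOTE`, `normalize` and `contains_phrase` are names made up for illustration, and the exact positions of the newlines and space runs in `RAW` are guesses based on the snippet above.

```python
import html
import re

# Markup modeled on the snippet above; newline and space placement is assumed.
RAW = (
    "<DIV>The voice was deep and melodious when it spoke. &#8220;David, we have been \n"
    "<BR>expecting you - this is what you have                 been searching for - this place, \n"
    "<BR>David, is where dreams are born.&#8221; It was at this moment David realized \n"
    "<BR>the being was speaking to him with its own voice, not by thought."
)

QUOTE = (
    "David, we have been expecting you - this is what you have been "
    "searching for - this place, David, is where dreams are born"
)


def normalize(markup: str) -> str:
    """Reduce markup to plain text with single spaces between words."""
    text = re.sub(r"<br\s*/?>", " ", markup, flags=re.IGNORECASE)  # line break -> space
    text = re.sub(r"<[^>]+>", "", text)       # drop any remaining tags
    text = html.unescape(text)                # &#8220; -> left curly quote, etc.
    return re.sub(r"\s+", " ", text).strip()  # collapse spaces and newlines


def contains_phrase(markup: str, phrase: str) -> bool:
    """Case-insensitive exact-phrase check on the normalized text."""
    return normalize(phrase).casefold() in normalize(markup).casefold()


print(contains_phrase(RAW, QUOTE))
```

With this kind of normalization the full quote matches even though it spans two `<BR>` tags and a run of source-level spaces.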
        
           | xyzzyz wrote:
           | When every site was full of <br>s and &nbsp;s back in the
           | day, Google had not been at all confused by it.
        
             | dnissley wrote:
             | Are we sure about that? My recollection is the same, but it
             | would be nice to have some way of ensuring my memory isn't
             | faulty...
        
               | capableweb wrote:
               | Just to remind you of how things were when Google first
               | launched (1996): W3C just started with the recommendation
               | of CSS level 1 (https://www.w3.org/Press/CSS1-REC-
               | PR.html), people were using dl, dt, ul, li and blockquote
               | elements for "styling" (layouting really) websites,
               | Internet Explorer 1.0 was launched the year before and
               | most people who wrote HTML documents were amateurs at
               | best. It's a 100% bet that the markup of yore was messed
               | up compared to today's "standards".
        
       | smt88 wrote:
       | This (shitty NLP) has been bad for a while, but I did notice it
       | get worse recently in a way that feels crippling to me. I don't
       | have a functional search engine anymore.
        
         | Liquix wrote:
         | Does anyone have insight into _why_ google search has
         | deteriorated so rapidly over the last ~6-12 months? Optimizing
         | for NLP or websites learning SEO don't seem like they would
         | have this big of an impact. Everyone seems to agree [0] [1] [2]
         | that this is a problem yet it keeps getting worse
         | 
         | [0] https://news.ycombinator.com/item?id=27379083
         | 
         | [1] https://news.ycombinator.com/item?id=29794372
         | 
         | [2] https://news.ycombinator.com/item?id=29414562
        
           | causality0 wrote:
           | The voice recognition has gone to shit as well, to the point
           | where it may as well be editorializing. Apparently I'm not
           | allowed to begin a sentence with the word "our" because no
           | matter what pronunciation I use it becomes "how". I just
           | don't get it. I learned my "computer voice" talking to
           | garbage voice command systems in the early 2000s that
           | insisted on crystal-clear speech and had absolutely no issues
           | with Apple or Google voice typing until probably 2018. Since
           | then it's been a steady decline into near-unusability. I
           | _dare_ anyone to successfully get Google to voice-type the
           | word "o'clock".
        
       | noobermin wrote:
       | As of now, searching the quote brings up this thread. I feel like
       | Google now prioritizes certain websites (like HN) and essentially
       | skips things like tripod websites.
        
         | fault1 wrote:
         | Hasn't Google more or less prioritized "authority" since
         | PageRank?
         | 
         | Of course, the exact heuristics to weight authority are in
         | continuous flux.
        
       | dnissley wrote:
       | Fwiw, the original page is formatted oddly. The line breaks seem
       | like they're part of the content? As opposed to them just being
       | one big paragraph that is wrapped by a single tag?
       | 
       | E.g. try doing this search, with each individual line quoted
       | separately:
       | https://www.google.com/search?q=%22David%2C+we+have+been%22+...
       | 
       | My question at this point is -- did this literal search ever work
       | on Google?
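To make the "quote each line separately" trick concrete, here is a small sketch that builds such a query URL. The fragment boundaries follow the `<br>` positions reported elsewhere in this thread. `quoted_fragment_query` is a hypothetical helper, and the resulting URL is only an illustration of the approach, not a reconstruction of the truncated link above.

```python
from urllib.parse import quote_plus


def quoted_fragment_query(fragments):
    """Quote each same-line fragment separately and build a search URL."""
    terms = " ".join(f'"{f.strip()}"' for f in fragments if f.strip())
    return "https://www.google.com/search?q=" + quote_plus(terms)


# One fragment per rendered line of the quote (line breaks assumed from the thread).
fragments = [
    "David, we have been",
    "expecting you - this is what you have been searching for - this place,",
    "David, is where dreams are born",
]

url = quoted_fragment_query(fragments)
print(url)
```

Each fragment stays inside one rendered line, so none of the quoted phrases has to match across a `<br>`.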
        
       | bbarnett wrote:
       | Because Google is... annoying, and silly, try verbatim search
       | (Tools > Verbatim) after you get search results.
        
         | Someone1234 wrote:
         | Verbatim helps with Google silently altering your query
         | (essentially an alternative to the now-required quoting of
         | everything), but it doesn't solve the massive spam issue that
         | has infected Google.
         | 
         | Google, as a company, feels a lot like IBM at the end of its
         | glory days. Google won't suddenly disappear but much like IBM
         | they will slowly shrink in relevance forever.
        
       | hamiltonians wrote:
       | google hardly works for anything
        
         | User23 wrote:
         | It's still not bad at getting Wikipedia links.
        
       | josefcullhed wrote:
       | Interesting, in Sweden I only got this story when I made the same
       | searches: https://imgur.com/a/k1Avbtm
        
       | capableweb wrote:
       | I agree with your general point that the search quality has gone
       | down, quotes don't even always work anymore to get exact
       | results.
       | 
       | Looking into your suggested example: That turned out to be
       | interesting and unexpected.
       | 
       | So, the exact string you put here was "David, we have been
       | expecting you - this is what you have been searching for - this
       | place, David, is where dreams are born", which is what you get
       | when you copy the text from the website. It's correct that
       | searching Google for it verbatim doesn't work.
       | 
       | The actual DOM of the snippet looks like this:
       | "David, we have been <br>expecting you - this is what you have
       | been searching for - this place, <br>David, is where dreams are
       | born."
       | 
       | If you take any snippet of text that doesn't do a line-break, it
       | seems exact searches do work, like "expecting you - this is what
       | you have been searching for - this place" or "deep and melodious
       | when it spoke".
       | 
       | If you do take a snippet that does a line-break, then it cannot
       | find anything, like "David, we have been expecting you" or "this
       | place, David, is where "
       | 
       | It seems that Google has unlearned how to treat different types
       | of whitespace, especially when the author/software has introduced
       | manual line-breaks via the <br/> HTML tag.
       | 
       | I'm sure they have at one point introduced some "quality filter"
       | that gives higher score based on how well the markup is made by
       | the websites, for one reason or another, and eventually it got so
       | "improved" or established that even if it's the only relevant hit
       | for a human, the computer simply ignores the result for low
       | scoring, since the markup is not 100% correct.
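The pattern of hits and misses above is consistent with a toy model in which every `<br>` is a hard phrase boundary. The sketch below only reproduces the symptom under that assumption; it says nothing about how Google's indexer is actually built. `SNIPPET`, `segments` and `found_within_one_segment` are illustrative names.

```python
import html
import re

# The DOM snippet quoted above, with the <br> tags in the same places.
SNIPPET = (
    '"David, we have been <br>expecting you - this is what you have '
    'been searching for - this place, <br>David, is where dreams are born."'
)


def segments(markup: str):
    """Split markup at <br> and tidy whitespace inside each piece."""
    parts = re.split(r"<br\s*/?>", markup, flags=re.IGNORECASE)
    return [re.sub(r"\s+", " ", html.unescape(p)).strip() for p in parts]


def found_within_one_segment(markup: str, phrase: str) -> bool:
    """True only if the phrase fits entirely inside one <br>-delimited piece."""
    needle = re.sub(r"\s+", " ", phrase).strip()
    return any(needle in seg for seg in segments(markup))


for phrase in (
    "expecting you - this is what you have been searching for - this place",
    "David, we have been expecting you",
    "this place, David, is where ",
):
    print(found_within_one_segment(SNIPPET, phrase), repr(phrase))
```

Under this model the first phrase is found and the other two are not, which matches the searches described above.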
        
         | pcthrowaway wrote:
         | Can someone confirm if it's also broken then for bits of text
         | that are wrapped in inline elements? I don't have a suitable
         | example to try to search for off hand, but for example:
         | 
         |     <div>
         |       this is the <span className="bold">best</span> day of my life
         |     </div>
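For the inline-element case, a rough sketch of what a text extractor could do, using Python's standard `html.parser`: inline tags such as `<span>` contribute nothing to the text, while a small set of tags is treated as a word break. The `BREAKING_TAGS` set and the `TextExtractor` class are assumptions made for illustration; whether Google treats inline elements this way is exactly the open question.

```python
import re
from html.parser import HTMLParser

# Tags treated as word breaks in this sketch (an assumption, not a standard list).
BREAKING_TAGS = {"br", "p", "div", "li"}


class TextExtractor(HTMLParser):
    """Collect text; inline tags add nothing, breaking tags add a space."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BREAKING_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BREAKING_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.chunks)).strip()


MARKUP = '<div>\n  this is the <span className="bold">best</span>\n  day of my life\n</div>'
print(visible_text(MARKUP))
```

An extractor like this yields one contiguous sentence for the example, so an exact-phrase match across the `<span>` would succeed.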
        
         | sorokod wrote:
         | Perhaps "don't attribute to cleverness something that can be
         | explained by incompetence" applies here.
        
       | michaelcampbell wrote:
       | DDG does much better with quotes for required/exact matches.
        
       | ColinWright wrote:
       | Clickables:
       | 
       | https://imgur.com/a/gUq4XVZ
       | 
       | https://mechahuggermr.tripod.com/id66.html
       | 
       | I tested this, putting in the exact phrase.
       | 
       | DDG finds the source. Google doesn't, and instead finds this
       | submission.
        
       | guerrilla wrote:
       | I stopped using Google entirely. I honestly feel violated every
       | time it strips out words that I asked it to search for on the
       | very first page. NO, I said search for this, do not do something
       | ELSE you piece of shit.
       | 
       | I actually use DuckDuckGo exclusively now, not because it got
       | better (it did a tiny bit), but because Google got so absolutely
       | horrible that DDG is now actually better! I have the habit of
       | trying Google if I can't find something with DuckDuckGo, but
       | honestly I don't even know why I bother because not once has it
       | helped since this degradation started.
       | 
       | I do wonder why though. I got the feeling that maybe they just
       | gave up. Maybe they don't have to care anymore being a _de facto_
       | monopoly and having so many other projects. It's hard not to
       | think that spammers run the internet now... Ad networks run
       | everything and then content is just generated shit spammed into
       | results and feeds.
       | 
       | </rant>
        
         | Tempest1981 wrote:
         | While searching for info on a virus DLL, Google was pathetic. I
         | only had luck with yandex.com
        
           | guerrilla wrote:
           | Pro-tip: Yandex image search is pretty amazing actually.
        
         | moistly wrote:
         | So many bot-authored and SEO-tweaked garbage listicles and
         | advertiser-funded "reviews" and poorly-written "TIL"/"learn
         | from me" blogs bloated by advertising. In a few ways the web is
         | better now than it was a few decades ago, and in very many ways
         | it is much worse. Advertising has basically leeched almost all
         | the value out of the web.
        
       | jeffbee wrote:
       | Web indexing and search is a constant battle between space and
       | time, so it does not really surprise me that results for any
       | given input may not be stable over time. Generalizing from single
       | examples, however, is illogical.
        
       | [deleted]
        
       | MattGaiser wrote:
       | Replicated the problem here in Canada too. Bing does not find it
       | either though.
        
       | monkeybutton wrote:
       | I was looking for a specific person recently and searched: <name
       | of person> Canada
       | 
       | I guess they were pretty obscure so Google in all their wisdom
       | displayed the results for Canada, with the entire name struck
       | through. Fantastic. Defaulting to the most generic term in a
       | query to the point of absolute uselessness.
        
         | laurent92 wrote:
         | I've always found "lemming" ridiculous, especially in all
         | software that copied Google despite not being generalist.
         | "We've seen you are searching for 'Phillips screw 24x17', I
         | won't tell you that we don't have any but here are results for
         | 'Screwdrivers', just in case you want to use a screwdriver
         | instead of a screw. Also here are a few Phillips TVs, in case
         | this might help you fix your car."
        
           | monkeybutton wrote:
           | Product search on websites for traditional brick and mortar
           | stores is the worst for this. I guess they weren't born with
           | the challenge of "if customers can't find the product they
           | want, you will die" that online-only businesses have, but
           | still, it's not like online shopping is a new thing. And
           | people might like to know if the store even has what they
           | need before heading out!
        
       | braddeicide wrote:
       | Google results are low quality unless you enable verbatim. Tools,
       | all results, verbatim.
       | 
       | It blows my mind this isn't the default. I can only assume
       | they've adopted the opinion of search engines before them that
       | they could benefit from showing lower quality results to keep the
       | users on their site longer.
        
       ___________________________________________________________________
       (page generated 2022-01-29 23:00 UTC)