[HN Gopher] Tell HN: Google doesn't work anymore for exact matches
___________________________________________________________________
Tell HN: Google doesn't work anymore for exact matches
For a while now I have felt that Google's results have
deteriorated. It takes a lot of tricks to find what I am looking
for. Today an interesting case occurred that frustrated me a lot
and is worth telling HN. First, I was looking for a song and
searched for: "here were the dreams are born" (I know I mistyped).
One of the first results I found was this interesting story (Google
results https://imgur.com/a/gUq4XVZ):
https://mechahuggermr.tripod.com/id66.html I took the following
sentence from this story and used it in the readme of an internal
project: "David, we have been expecting you - this is what you
have been searching for - this place, David, is where dreams are
born" Some people wanted to know where this quote came from and
could not find it on Google. I also tested, and no combination of
search parameters I entered into Google finds this page. I tried
quotation marks, literal search and no hyphen. Nothing, it is
impossible to find it. Does anyone know what is going on here? Can
someone do a magic call and find this page on Google? Has Google's
AI/BERT Enhanced Search reached a point where indexed pages can not
be found? All results were tested with a Brazilian connection and
replicated in a Private Session on a US VPN.
Author : bratao
Score : 112 points
Date : 2022-01-29 21:14 UTC (1 hours ago)
| ergonaught wrote:
| Google finds it for: "David, is where dreams are born."
|
| And: "The voice was deep and melodious when it spoke."
|
| And most other things. Examine the raw HTML for that area and you
| might give them a pass when searching for an exact phrase that
| doesn't actually exist in the document itself.
| [deleted]
| GistNoesis wrote:
| You are probably right. In the HTML there are some "br" line
| returns between each line of the citation. It can find the
| citation from parts of each of these lines but not from the
| whole citation.
| Retric wrote:
| I don't. Google dates to 1996. Stripping whitespace, line
| breaks, etc. should be part of basic parsing. Consider someone
| typing in a poem or song lyrics: a few extra <br> tags should be
| expected, especially back then.
| lelandfe wrote:
| Curiously, searching directly on the site with that quote
| produces "No results found," and then shows an inexact match with
| just that quote underneath. This is clearly a real bug on
| Google's side.
|
| https://imgur.com/a/2XFogU5
| lelandfe wrote:
| I may have figured it out. The site is committing hijinks with
| the text. They're manually wrapping text with `<br>`'s and then
| manually wrapping the _source_ with spaces. Here's the HTML of
| the lines in question: <DIV>The voice was
| deep and melodious when it spoke. “David, we have been
| <BR>expecting you - this is what you have
| been searching for - this place, <BR>David, is where dreams are
| born.” It was at this moment David realized <BR>the
| being was speaking to him with its own voice, not by thought.
| David <BR>stood unmoving. He realized he had never dreamed
| before or even had ever
| <BR>slept.
|
| If you search for same-line sentence fragments you'll find the
| page: https://www.google.com/search?q=%22The+voice+was+deep+and
| +me.... Not an excuse: this is a case Google should handle.
|
| For posterity: https://imgur.com/a/DAUpLit
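|
| A minimal sketch of the behaviour described above, not Google's
| actual indexing. It assumes two hypothetical tokenization models:
| in model A every <br> starts a new segment and a quoted phrase may
| not span segments; in model B every tag, <br> included, is treated
| as plain whitespace. The HTML constant is a simplified copy of the
| fragment quoted above (straight quotes, no entities), and all
| function names are made up for illustration.

```python
import re

# Simplified stand-in for the page fragment quoted in the thread.
HTML = (
    '<DIV>The voice was deep and melodious when it spoke. '
    '"David, we have been\n'
    '<BR>expecting you - this is what you have been searching for - this place, '
    '<BR>David, is where dreams are born."</DIV>'
)


def normalize(text):
    """Lowercase and collapse every run of whitespace to a single space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def segments_split_on_br(html):
    """Model A: split at each <br>, then strip the remaining tags per segment."""
    parts = re.split(r"(?i)<br\s*/?>", html)
    return [normalize(re.sub(r"<[^>]+>", " ", part)) for part in parts]


def text_br_as_space(html):
    """Model B: replace every tag, including <br>, with a space."""
    return normalize(re.sub(r"<[^>]+>", " ", html))


def found_model_a(phrase, html):
    """Exact-phrase match that cannot cross a <br> boundary."""
    needle = normalize(phrase)
    return any(needle in segment for segment in segments_split_on_br(html))


def found_model_b(phrase, html):
    """Exact-phrase match over the whole whitespace-normalized text."""
    return normalize(phrase) in text_br_as_space(html)


same_line = "deep and melodious when it spoke"
crosses_br = "David, we have been expecting you"
print(found_model_a(same_line, HTML), found_model_a(crosses_br, HTML))
print(found_model_b(same_line, HTML), found_model_b(crosses_br, HTML))
```

| Under model A only the same-line fragment matches, which is the
| pattern reported in this thread; under model B both phrases match.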
| xyzzyz wrote:
| When every site was full of <br>s and &nbsp;s back in the
| day, Google was not at all confused by it.
| dnissley wrote:
| Are we sure about that? My recollection is the same, but it
| would be nice to have some way of ensuring my memory isn't
| faulty...
| capableweb wrote:
| Just to remind you of how things were when Google first
| launched (1996): W3C just started with the recommendation
| of CSS level 1
| (https://www.w3.org/Press/CSS1-REC-PR.html), people were using
| dl, dt, ul, li and blockquote
| elements for "styling" (layouting really) websites,
| Internet Explorer 1.0 was launched the year before and
| most people who wrote HTML documents were amateurs at
| best. It's a 100% bet that the markup of yore was messed
| up compared to today's "standards".
| smt88 wrote:
| This (shitty NLP) has been bad for a while, but I did notice it
| get worse recently in a way that feels crippling to me. I don't
| have a functional search engine anymore.
| Liquix wrote:
| Does anyone have insight into _why_ google search has
| deteriorated so rapidly over the last ~6-12 months? Optimizing
| for NLP or websites learning SEO don't seem like they would
| have this big of an impact. Everyone seems to agree [0] [1] [2]
| that this is a problem, yet it keeps getting worse.
|
| [0] https://news.ycombinator.com/item?id=27379083
|
| [1] https://news.ycombinator.com/item?id=29794372
|
| [2] https://news.ycombinator.com/item?id=29414562
| causality0 wrote:
| The voice recognition has gone to shit as well, to the point
| where it may as well be editorializing. Apparently I'm not
| allowed to begin a sentence with the word "our" because no
| matter what pronunciation I use it becomes "how". I just
| don't get it. I learned my "computer voice" talking to
| garbage voice command systems in the early 2000s that
| insisted on crystal-clear speech and had absolutely no issues
| with Apple or Google voice typing until probably 2018. Since
| then it's been a steady decline into near-unusability. I
| _dare_ anyone to successfully get Google to voice-type the
| word "o'clock".
| noobermin wrote:
| As of now, searching the quote brings up this thread. I feel like
| Google now prioritizes certain websites (like HN) and essentially
| skips things like tripod websites.
| fault1 wrote:
| Hasn't Google more or less prioritized "authority" since
| Pagerank?
|
| Of course, the exact heuristics to weight authority are in a
| continuous flux.
| dnissley wrote:
| Fwiw, the original page is formatted oddly. The line breaks seem
| like they're part of the content? As opposed to them just being
| one big paragraph that is wrapped by a single tag?
|
| E.g. try doing this search, with each individual line quoted
| separately:
| https://www.google.com/search?q=%22David%2C+we+have+been%22+...
|
| My question at this point is -- did this literal search ever work
| on Google?
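|
| A small sketch of how such a per-line query could be built. The
| helper names are hypothetical, the three lines are taken from the
| page's rendered text, and the truncated URL above is left as it
| is; whether Google returns the page for the resulting query is
| not something this code can tell you.

```python
from urllib.parse import quote_plus


def per_line_query(lines):
    """Quote each rendered line separately so no phrase spans a line break."""
    return " ".join('"{}"'.format(line.strip()) for line in lines if line.strip())


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query the same way the links in this thread are encoded."""
    return base + "?q=" + quote_plus(query)


rendered_lines = [
    "David, we have been",
    "expecting you - this is what you have been searching for - this place,",
    "David, is where dreams are born",
]

print(search_url(per_line_query(rendered_lines)))
```

| The encoded URL starts with ?q=%22David%2C+we+have+been%22,
| matching the beginning of the link above.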
| bbarnett wrote:
| Because Google is... annoying, and silly, try verbatim search
| (Tools > Verbatim) after you get search results.
| Someone1234 wrote:
| Verbatim helps with Google silently altering your query
| (essentially an alternative to the now-required quoting
| everything) but it doesn't solve the massive spam issue that
| has infected Google.
|
| Google, as a company, feels a lot like IBM at the end of its
| glory days. Google won't suddenly disappear but much like IBM
| they will slowly shrink in relevance forever.
| hamiltonians wrote:
| google hardly works for anything
| User23 wrote:
| It's still not bad at getting Wikipedia links.
| josefcullhed wrote:
| Interesting, in Sweden I only got this story when I made the same
| searches: https://imgur.com/a/k1Avbtm
| capableweb wrote:
| I agree with your general point that the search quality has gone
| down; quotes don't even always work anymore to get exact
| results.
|
| Looking into your suggested example: That turned out to be
| interesting and unexpected.
|
| So, the exact string you put here was "David, we have been
| expecting you - this is what you have been searching for - this
| place, David, is where dreams are born", which is what you get
| when you copy the text from the website. You're right that
| searching Google for it verbatim doesn't work.
|
| The actual DOM of the snippet looks like this:
| "David, we have been <br>expecting you - this is what you have
| been searching for - this place, <br>David, is where dreams are
| born."
|
| If you take any snippet of text that doesn't do a line-break, it
| seems exact searches do work, like "expecting you - this is what
| you have been searching for - this place" or "deep and melodious
| when it spoke".
|
| If you do take a snippet that does a line-break, then it cannot
| find anything, like "David, we have been expecting you" or "this
| place, David, is where "
|
| It seems that Google has unlearned how to treat different types
| of whitespace, especially when the author/software has introduced
| manual line-breaks via the <br/> HTML tag.
|
| I'm sure they have at one point introduced some "quality filter"
| that gives higher score based on how well the markup is made by
| the websites, for one reason or another, and eventually it got so
| "improved" or established that even if it's the only relevant hit
| for a human, the computer simply ignores the result for low
| scoring, since the markup is not 100% correct.
| pcthrowaway wrote:
| Can someone confirm if it's also broken then for bits of text
| that are wrapped in inline elements? I don't have a suitable
| example to try to search for off hand, but for example:
| <div> this is the <span className="bold">best</span>
| day of my life </div>
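|
| One way to reason about this case is a text extractor that
| distinguishes line-breaking tags from inline ones. The sketch
| below uses Python's standard html.parser; the BREAK_TAGS set and
| the LineExtractor class are assumptions made for illustration and
| say nothing about what Google's crawler really does.

```python
from html.parser import HTMLParser

# Assumption: these tags end the current line; everything else is inline.
BREAK_TAGS = {"br", "p", "div"}


class LineExtractor(HTMLParser):
    """Collect text into lines, starting a new line at each break tag."""

    def __init__(self):
        super().__init__()
        self.lines = [[]]

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <BR> and <br> are treated alike.
        if tag in BREAK_TAGS:
            self.lines.append([])

    def handle_endtag(self, tag):
        if tag in BREAK_TAGS:
            self.lines.append([])

    def handle_data(self, data):
        self.lines[-1].append(data)


def extract_lines(html):
    """Return the non-empty, whitespace-normalized lines of an HTML fragment."""
    parser = LineExtractor()
    parser.feed(html)
    parser.close()
    joined = (" ".join("".join(chunks).split()) for chunks in parser.lines)
    return [line for line in joined if line]


inline_case = '<div> this is the <span className="bold">best</span> day of my life </div>'
br_case = "David, we have been <br>expecting you"
print(extract_lines(inline_case))
print(extract_lines(br_case))
```

| In this model the <span> leaves the sentence on one line, so an
| exact phrase across it would still match, while the <br> splits
| the text into two lines.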
| sorokod wrote:
| Perhaps "don't attribute to cleverness something that can be
| explained by incompetence" applies here.
| michaelcampbell wrote:
| DDG does much better with quotes for required/exact matches.
| ColinWright wrote:
| Clickables:
|
| https://imgur.com/a/gUq4XVZ
|
| https://mechahuggermr.tripod.com/id66.html
|
| I tested this, putting in the exact phrase.
|
| DDG finds the source. Google doesn't, and instead finds this
| submission.
| guerrilla wrote:
| I stopped using Google entirely. I honestly feel violated every
| time it strips out words that I asked it to search for on the
| very first page. NO, I said search for this, do not do something
| ELSE you piece of shit.
|
| I actually use DuckDuckGo exclusively now, not because it got
| better (it did a tiny bit), but because Google got so absolutely
| horrible that DDG is now actually better! I have the habit of
| trying Google if I can't find something with DuckDuckGo, but
| honestly I don't even know why I bother because not once has it
| helped since this degradation started.
|
| I do wonder why though. I got the feeling that maybe they just
| gave up. Maybe they don't have to care anymore being a _de facto_
| monopoly and having so many other projects. It's hard not to
| think that spammers run the internet now... Ad networks run
| everything and then content is just generated shit spammed into
| results and feeds.
|
| </rant>
| Tempest1981 wrote:
| While searching for info on a virus DLL, Google was pathetic. I
| only had luck with yandex.com
| guerrilla wrote:
| Pro-tip: Yandex image search is pretty amazing actually.
| moistly wrote:
| So many bot-authored and SEO-tweaked garbage listicles and
| advertiser-funded "reviews" and poorly-written "TIL"/"learn
| from me" blogs bloated by advertising. In a few ways the web is
| better now than it was a few decades ago, and in very many ways
| it is much worse. Advertising has basically leeched almost all
| the value out of the web.
| jeffbee wrote:
| Web indexing and search is a constant battle between space and
| time, so it does not really surprise me that results for any
| given input may not be stable over time. Generalizing from single
| examples, however, is illogical.
| [deleted]
| MattGaiser wrote:
| Replicated the problem here in Canada too. Bing does not find it
| either though.
| monkeybutton wrote:
| I was looking for a specific person recently and searched: <name
| of person> Canada
|
| I guess they were pretty obscure so Google in all their wisdom
| displayed the results for Canada, with the entire name struck
| through. Fantastic. Defaulting to the most generic term in a
| query to the point of absolute uselessness.
| laurent92 wrote:
| I've always found "lemming" ridiculous, especially in all
| software that copied Google despite not being generalist.
| "We've seen you are searching for 'Phillips screw 24x17', I
| won't tell you that we don't have any but here are results for
| 'Screwdrivers', just in case you want to use a screwdriver
| instead of a screw. Also here are a few Phillips TVs, in case
| this might help you fix your car."
| monkeybutton wrote:
| Product search on websites for traditional brick and mortar
| stores is the worst for this. I guess they weren't born with
| the challenge of "if customers can't find the product they
| want, you will die" that online-only businesses have, but
| still, it's not like online shopping is a new thing. And
| people might like to know if the store even has what they
| need before heading out!
| braddeicide wrote:
| Google results are low quality unless you enable verbatim. Tools,
| all results, verbatim.
|
| It blows my mind this isn't the default. I can only assume
| they've adopted the opinion of search engines before them that
| they could benefit from showing lower quality results to keep the
| users on their site longer.
___________________________________________________________________
(page generated 2022-01-29 23:00 UTC)