[HN Gopher] How not to break a search engine
___________________________________________________________________
How not to break a search engine
Author : akpa1
Score : 37 points
Date : 2021-07-03 19:43 UTC (1 day ago)
(HTM) web link (about.sourcegraph.com)
(TXT) w3m dump (about.sourcegraph.com)
| walrus01 wrote:
| If you want to see a very bad example of a broken search engine,
| use Google Assistant's voice search for "how many raccoons can
| fit".
| Jap2-0 wrote:
| For something a little more innocuous, type "Cicero" into the
| address bar in Chrome or Firefox and see what it autocompletes
| to.
| Dah00n wrote:
| I'm not sure that is "broken" so much as inappropriate. I
| read an article with a list of bad default results (that is,
| with no filter bubble) that are far worse. For example,
| searching for "tween" on Bing shows autosuggestions like
| "tween swimsuits inappropriate" and "tween budding images".
| The same was true of every search engine that uses Bing, such
| as Brave and DDG (at least when I read about it). It seems
| pretty obvious what those searches are looking for.
| fsckboy wrote:
| I just did a regular text search for it and the top results
| were "how many raccoons can fit up your butt"... no voice
| assistant involved, and the destination URLs were not Google's.
| oogali wrote:
| Wow, I was not expecting _that_ result.
| walrus01 wrote:
| Neither were the raccoons
| tantalor wrote:
| No live traffic experiment? Seems risky to launch without one.
| amelius wrote:
| I want to know how to _benchmark_ a search engine.
| chii wrote:
| I'm sure there are tonnes of research and prior art on this
| subject, but it's an interesting question.
|
| Off the top of my head, there are two meanings: performance
| benchmarking (how fast the search results come back), and
| accuracy or fitness-for-purpose benchmarking (how good it is at
| finding what the user intends).
|
| Performance is easy. It's accuracy/fitness-for-purpose that
| would be an interesting benchmark.
|
| I wonder if you have to use an empirical measurement for
| accuracy: that is, give a random sample of people a target
| piece of code (or file) to find, and see how long or how many
| queries it takes them to find it.
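| chii calls performance the easy half. A minimal sketch of a
| latency benchmark in Python, assuming `search` is any callable
| that runs one query synchronously (a hypothetical stand-in for
| a real client, not a Sourcegraph API). It reports percentiles
| rather than a mean, since a few slow queries can hide behind a
| good average.
|
```python
import statistics
import time
from typing import Callable, Dict, Iterable


def latency_percentiles(
    search: Callable[[str], object],
    queries: Iterable[str],
    repeats: int = 5,
) -> Dict[str, float]:
    """Time each query `repeats` times; return p50/p95/p99/max in ms."""
    queries = list(queries)  # materialize so generators can be replayed
    samples = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search(query)  # result is ignored; only wall-clock time matters
            samples.append((time.perf_counter() - start) * 1000.0)
    if len(samples) < 2:
        raise ValueError("need at least two timed queries")
    # method="inclusive" keeps every cut point inside the observed range.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98], "max": max(samples)}


if __name__ == "__main__":
    def fake_search(query: str) -> list:
        time.sleep(0.002)  # hypothetical stand-in for a real backend call
        return []

    print(latency_percentiles(fake_search, ["repo:foo", "lang:go error"], repeats=10))
```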
| toast0 wrote:
| For quality, you really do need human qualitative measures to
| get the full picture, with all of the fun that involves.
|
| However, you can do things like generate search terms from
| your top N documents through some method, then run the queries
| and confirm that the document each term was generated from
| shows up in the top M results.
|
| This can be circular though if you're not careful; the top N
| documents may not include important documents that nobody
| could find.
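| A minimal sketch of the known-item check toast0 describes, in
| Python, assuming documents are an id-to-text mapping and that
| `search` returns ranked document ids. `known_item_recall`,
| `make_toy_search` and the first-words query generators are
| hypothetical illustrations, not part of any real engine; the toy
| engine just ranks documents by how many query terms they contain.
|
```python
from typing import Callable, Dict, List, Sequence


def known_item_recall(
    docs: Dict[str, str],
    search: Callable[[str], Sequence[str]],
    make_query: Callable[[str], str],
    top_m: int = 10,
) -> float:
    """Fraction of documents found in the top `top_m` results for a query
    generated from their own text (a "known-item" check)."""
    if not docs:
        return 0.0
    found = 0
    for doc_id, text in docs.items():
        query = make_query(text)
        # Only the first top_m results count as a hit.
        if doc_id in list(search(query))[:top_m]:
            found += 1
    return found / len(docs)


def make_toy_search(corpus: Dict[str, str]) -> Callable[[str], List[str]]:
    """Hypothetical stand-in engine: rank documents by how many query terms
    they contain (ties broken by id), dropping documents with no match."""
    tokenized = {doc_id: set(text.lower().split()) for doc_id, text in corpus.items()}

    def search(query: str) -> List[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & toks), doc_id) for doc_id, toks in tokenized.items()]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for score, doc_id in scored if score > 0]

    return search


if __name__ == "__main__":
    docs = {
        "a.go": "func parseQuery tokenizes the search input",
        "b.go": "func runSearch executes the parsed query",
        "c.go": "type Result holds a matched file",
    }
    engine = make_toy_search(docs)
    # Hypothetical query generators: the first word(s) of each document.
    first_word = lambda text: text.split()[0]
    first_two_words = lambda text: " ".join(text.split()[:2])
    print(known_item_recall(docs, engine, first_two_words, top_m=1))
    print(known_item_recall(docs, engine, first_word, top_m=1))
```

| With the one-word queries, "func" ties a.go and b.go, so b.go
| misses the top 1 and recall drops to 2/3. Because the queries are
| derived from the documents under test, a high score says nothing
| about documents outside `docs`, which is the circularity toast0
| warns about.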
| sqs wrote:
| @rijnard (the blog post author) is awesome, and all of the code
| changes he talks about in the blog post are public. You can see
| all of his recent changes to search code in
| https://sourcegraph.com/search?q=context:global+repo:%5Egith...
| in case you want to follow along by just reading the code (that
| query shows all of his diffs that touch paths containing
| `search`).
| 1970-01-01 wrote:
| I played around with breaking them a long time ago. Here are
| some interesting searches that seem to do almost nothing:
|
| Quoted Single ASCII Chars:
|
| https://search.yahoo.com/search?p="~"
|
| https://www.google.com/search?&q="~"
|
| https://www.bing.com/search?q="~"
|
| https://search.yahoo.com/search?p="`"
|
| https://www.google.com/search?&q="`"
|
| https://www.bing.com/search?q="`"
|
| https://search.yahoo.com/search?p="!"
|
| https://www.google.com/search?&q="!"
|
| https://www.bing.com/search?q="!"
|
| https://search.yahoo.com/search?p="@"
|
| https://www.google.com/search?&q="@"
|
| https://www.bing.com/search?q="@"
|
| https://search.yahoo.com/search?p="#"
|
| https://www.google.com/search?&q="#"
|
| https://www.bing.com/search?q="#"
|
| https://search.yahoo.com/search?p="$"
|
| https://www.google.com/search?&q="$"
|
| https://www.bing.com/search?q="$"
|
| https://search.yahoo.com/search?p="%"
|
| https://www.google.com/search?&q="%"
|
| https://www.bing.com/search?q="%"
|
| https://search.yahoo.com/search?p="^"
|
| https://www.google.com/search?&q="^"
|
| https://www.bing.com/search?q="^"
|
| https://search.yahoo.com/search?p="&"
|
| https://www.google.com/search?&q="&"
|
| https://www.bing.com/search?q="&"
|
| https://search.yahoo.com/search?p="*"
|
| https://www.google.com/search?&q="*"
|
| https://www.bing.com/search?q="*"
|
| https://search.yahoo.com/search?p="("
|
| https://www.google.com/search?&q="("
|
| https://www.bing.com/search?q="("
|
| https://search.yahoo.com/search?p=")"
|
| https://www.google.com/search?&q=")"
|
| https://www.bing.com/search?q=")"
|
| https://search.yahoo.com/search?p="-"
|
| https://www.google.com/search?&q="-"
|
| https://www.bing.com/search?q="-"
|
| https://search.yahoo.com/search?p="_"
|
| https://www.google.com/search?&q="_"
|
| https://www.bing.com/search?q="_"
|
| https://search.yahoo.com/search?p="+"
|
| https://www.google.com/search?&q="+"
|
| https://www.bing.com/search?q="+"
|
| https://search.yahoo.com/search?p="="
|
| https://www.google.com/search?&q="="
|
| https://www.bing.com/search?q="="
|
| https://search.yahoo.com/search?p="%"
|
| https://search.yahoo.com/search?p="\"
|
| https://search.yahoo.com/search?p=%22\%22
|
| https://search.yahoo.com/search?p="#"
|
| https://search.yahoo.com/search?p=":"
|
| https://www.bing.com/search?q="%28%29"
|
| https://www.bing.com/search?q="*.*"
|
| https://www.bing.com/search?q="=="
|
| https://www.bing.com/search?q="$"
|
| https://www.bing.com/search?q="+"
|
| https://www.google.com/search?&q="~"
|
| https://www.google.com/search?&q="`"
|
| https://www.google.com/search?&q="!"
|
| https://www.google.com/search?&q="!!"
|
| https://www.google.com/search?&q="@"
|
| https://www.google.com/search?&q="#"
|
| https://www.google.com/search?&q="$"
|
| https://www.google.com/search?&q="%"
|
| https://www.google.com/search?&q="^"
|
| https://www.google.com/search?&q="&"
|
| https://www.google.com/search?&q=* #quotes do nothing here
|
| https://www.google.com/search?&q="\*" #same as above
|
| https://www.google.com/search?&q=":)"
|
| https://www.google.com/search?&q="\"
|
| https://www.google.com/search?&q=")"
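|
| Several characters in this list (`#`, `&`, `%`, `+`) also have
| special meaning inside a URL: `#` starts the fragment, which the
| browser does not send to the server; `&` separates query
| parameters; `%` starts a percent-escape; and `+` decodes to a
| space in form-encoded queries. So some of these links, pasted
| as-is, may not send the query they appear to. A minimal sketch,
| assuming the base URLs and parameter names visible above (`p`
| for Yahoo, `q` for Google and Bing), that builds percent-encoded
| versions with Python's standard `urllib.parse.urlencode`; the
| helper name `quoted_char_urls` is hypothetical.
|
```python
from urllib.parse import urlencode

# Base URLs and query-parameter names as they appear in the links above.
ENGINES = {
    "yahoo": ("https://search.yahoo.com/search", "p"),
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
}


def quoted_char_urls(chars):
    """Return {(engine, char): url} for an exact-phrase search of each char.

    urlencode percent-encodes the value, so '#', '&', '%' and '+' reach the
    server as part of the query instead of being read as URL syntax.
    """
    urls = {}
    for ch in chars:
        for engine, (base, param) in ENGINES.items():
            urls[(engine, ch)] = base + "?" + urlencode({param: f'"{ch}"'})
    return urls


if __name__ == "__main__":
    for (engine, ch), url in quoted_char_urls("~`!@#$%^&*()-_+=\\").items():
        print(engine, repr(ch), url)
```

| For example, the quoted `#` search for Google comes out as
| `https://www.google.com/search?q=%22%23%22`. The leading `&` in
| the original Google links is dropped here; it only adds an empty
| parameter.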
___________________________________________________________________
(page generated 2021-07-04 23:00 UTC)