[HN Gopher] How not to break a search engine
       ___________________________________________________________________
        
       How not to break a search engine
        
       Author : akpa1
       Score  : 37 points
       Date   : 2021-07-03 19:43 UTC (1 days ago)
        
 (HTM) web link (about.sourcegraph.com)
 (TXT) w3m dump (about.sourcegraph.com)
        
       | walrus01 wrote:
       | If you want to see a very bad example of a broken search engine,
       | use the Google voice Assistant to search for "how many raccoons
       | can fit"
        
         | Jap2-0 wrote:
         | For something a little more innocuous, type "Cicero" in the
         | address bar in Chrome or Firefox and see what it autocompletes
         | it to.
        
         | Dah00n wrote:
         | I'm not sure that is "broken" so much as inappropriate. I
         | read an article with a list of bad default results (as in with
         | no filter bubble) that are way worse. Like searching for
         | "tween" on Bing shows auto suggestions like "tween swimsuits
         | inappropriate" and "tween budding images". Same for all search
         | engines that use Bing, like Brave and DDG (at least at the time
         | I read about it). It seems pretty obvious what those searches
         | are looking for.
        
         | fsckboy wrote:
         | I just did a regular text search for it and the top results
         | were "how many raccoons can fit up your butt"... no voice
         | assistant involved, and the destination urls were not google's.
        
         | oogali wrote:
         | Wow, I was not expecting _that_ result.
        
           | walrus01 wrote:
           | Neither were the raccoons
        
       | tantalor wrote:
       | No live traffic experiment? Seems risky to launch without one.
        
       | amelius wrote:
       | I want to know how to _benchmark_ a search engine.
        
         | chii wrote:
         | I'm sure there's tonnes of research and prior art on this
         | subject, but it's an interesting inquiry.
         | 
         | Off the top of my head, there are two meanings: performance
         | benchmarking (how fast the search results come back), and
         | accuracy/fit-for-purposeness benchmarking (how good it is at
         | finding something the user intends).
         | 
         | Performance is easy. It's the accuracy/fit-for-purposeness that
         | would be an interesting benchmark.
         | 
         | I wonder if you have to use an empirical measurement for
         | accuracy - that is, give a random sample of people a target
         | piece of code (or file) to find, and see how long or how many
         | queries it takes to find it.
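
A minimal sketch of the "performance is easy" half of chii's comment: time repeated queries and report latency percentiles instead of a single average. `fake_search` and the query list are hypothetical stand-ins for whatever engine is being measured, not anything from the article.

```python
import statistics
import time


def fake_search(query):
    """Hypothetical stand-in for the engine under test."""
    return [line for line in ("foo bar", "bar baz", "qux") if query in line]


def latency_percentiles(search_fn, queries, repeats=50):
    """Run every query `repeats` times and summarize wall-clock latency."""
    samples = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            search_fn(q)
            samples.append(time.perf_counter() - start)
    # quantiles(n=100) returns 99 cut points; index 49 is the median.
    cuts = statistics.quantiles(samples, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    print(latency_percentiles(fake_search, ["bar", "qux"]))
```

Tail percentiles (p95, p99) tend to say more about a search engine than the mean does, since a few slow queries can dominate how it feels to use.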
        
           | toast0 wrote:
           | For quality, you really do need to do human qualitative
           | measures to get a full measure, with all of the fun that
           | involves.
           | 
           | However, you can do things like generate search terms from
           | your top N documents through some method, and then do the
           | queries and confirm the document you generated the term from
           | shows up in the top M results.
           | 
           | This can be circular though if you're not careful; the top N
           | documents may not include important documents that nobody
           | could find.
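
The check toast0 describes can be sketched roughly as below. Everything here is an assumption made for illustration: the three-file corpus, the "pick the rarest terms" query generator (one possible reading of "some method"), and the toy TF-IDF `search` function that stands in for the real engine.

```python
import math
from collections import Counter

# Hypothetical corpus; a real run would use the top N documents of a live index.
DOCS = {
    "parser.go": "parse query tokens into an expression tree",
    "index.go": "build inverted index from repository files",
    "rank.go": "rank search results by relevance score",
}


def tokenize(text):
    return text.lower().split()


def doc_freq(docs):
    """Number of documents each term appears in."""
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return df


def generate_query(text, df, k=2):
    """Use the k rarest terms of a document as its query."""
    terms = sorted(set(tokenize(text)), key=lambda t: (df[t], t))
    return terms[:k]


def search(query_terms, docs, df):
    """Toy stand-in for the engine under test: summed TF-IDF, best first."""
    n = len(docs)
    scores = {}
    for name, text in docs.items():
        tf = Counter(tokenize(text))
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in query_terms if t in tf)
        if score > 0:
            scores[name] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


def known_item_hit_rate(docs, m=1):
    """Fraction of documents found in the top m results for their own query."""
    df = doc_freq(docs)
    hits = 0
    for name, text in docs.items():
        if name in search(generate_query(text, df), docs, df)[:m]:
            hits += 1
    return hits / len(docs)


if __name__ == "__main__":
    print(known_item_hit_rate(DOCS, m=1))
```

The circularity toast0 warns about shows up in how `DOCS` is chosen: if the sample only contains documents people already find, the hit rate says nothing about the ones nobody can find.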
        
       | sqs wrote:
       | @rijnard (the blog post author) is awesome, and all of the code
       | changes he talks about in the blog post are public. You can see
       | all of his recent changes to search code in
       | https://sourcegraph.com/search?q=context:global+repo:%5Egith...
       | in case you want to follow along by just reading the code (that
       | query shows all of his diffs that touch paths containing
       | `search`).
        
       | 1970-01-01 wrote:
       | I was playing with breaking them a long time ago. Here are some
       | interesting searches that seem to do almost nothing:
       | 
       | Quoted Single ASCII Chars:
       | 
       | https://search.yahoo.com/search?p="~"
       | 
       | https://www.google.com/search?&q="~"
       | 
       | https://www.bing.com/search?q="~"
       | 
       | https://search.yahoo.com/search?p="`"
       | 
       | https://www.google.com/search?&q="`"
       | 
       | https://www.bing.com/search?q="`"
       | 
       | https://search.yahoo.com/search?p="!"
       | 
       | https://www.google.com/search?&q="!"
       | 
       | https://www.bing.com/search?q="!"
       | 
       | https://search.yahoo.com/search?p="@"
       | 
       | https://www.google.com/search?&q="@"
       | 
       | https://www.bing.com/search?q="@"
       | 
       | https://search.yahoo.com/search?p="#"
       | 
       | https://www.google.com/search?&q="#"
       | 
       | https://www.bing.com/search?q="#"
       | 
       | https://search.yahoo.com/search?p="$"
       | 
       | https://www.google.com/search?&q="$"
       | 
       | https://www.bing.com/search?q="$"
       | 
       | https://search.yahoo.com/search?p="%"
       | 
       | https://www.google.com/search?&q="%"
       | 
       | https://www.bing.com/search?q="%"
       | 
       | https://search.yahoo.com/search?p="^"
       | 
       | https://www.google.com/search?&q="^"
       | 
       | https://www.bing.com/search?q="^"
       | 
       | https://search.yahoo.com/search?p="&"
       | 
       | https://www.google.com/search?&q="&"
       | 
       | https://www.bing.com/search?q="&"
       | 
       | https://search.yahoo.com/search?p="*"
       | 
       | https://www.google.com/search?&q="*"
       | 
       | https://www.bing.com/search?q="*"
       | 
       | https://search.yahoo.com/search?p="("
       | 
       | https://www.google.com/search?&q="("
       | 
       | https://www.bing.com/search?q="("
       | 
       | https://search.yahoo.com/search?p=")"
       | 
       | https://www.google.com/search?&q=")"
       | 
       | https://www.bing.com/search?q=")"
       | 
       | https://search.yahoo.com/search?p="-"
       | 
       | https://www.google.com/search?&q="-"
       | 
       | https://www.bing.com/search?q="-"
       | 
       | https://search.yahoo.com/search?p="_"
       | 
       | https://www.google.com/search?&q="_"
       | 
       | https://www.bing.com/search?q="_"
       | 
       | https://search.yahoo.com/search?p="+"
       | 
       | https://www.google.com/search?&q="+"
       | 
       | https://www.bing.com/search?q="+"
       | 
       | https://search.yahoo.com/search?p="="
       | 
       | https://www.google.com/search?&q="="
       | 
       | https://www.bing.com/search?q="="
       | 
       | https://search.yahoo.com/search?p="%"
       | 
       | https://search.yahoo.com/search?p="\"
       | 
       | https://search.yahoo.com/search?p=%22\%22
       | 
       | https://search.yahoo.com/search?p="#"
       | 
       | https://search.yahoo.com/search?p=":"
       | 
       | https://search.yahoo.com/search?p="%"
       | 
       | https://www.bing.com/search?q="%28%29"
       | 
       | https://www.bing.com/search?q="_. _"
       | 
       | https://www.bing.com/search?q="=="
       | 
       | https://www.bing.com/search?q="$"
       | 
       | https://www.bing.com/search?q="+"
       | 
       | https://www.google.com/search?&q="~"
       | 
       | https://www.google.com/search?&q="`"
       | 
       | https://www.google.com/search?&q="!"
       | 
       | https://www.google.com/search?&q="!!"
       | 
       | https://www.google.com/search?&q="@"
       | 
       | https://www.google.com/search?&q="#"
       | 
       | https://www.google.com/search?&q="$"
       | 
       | https://www.google.com/search?&q="%"
       | 
       | https://www.google.com/search?&q="^"
       | 
       | https://www.google.com/search?&q="&"
       | 
       | https://www.google.com/search?&q=_ #quotes do nothing here
       | 
       | https://www.google.com/search?&q="\*" #same as above
       | 
       | https://www.google.com/search?&q=":)"
       | 
       | https://www.google.com/search?&q="~"
       | 
       | https://www.google.com/search?&q="\"
       | 
       | https://www.google.com/search?&q=")"
       | 
       | https://www.google.com/search?&q="%"
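
One caveat when trying the URLs above: several of these characters are reserved in URLs. A raw `#` starts the fragment, `&` separates parameters, `+` is commonly decoded as a space in a query string, and `%` begins an escape sequence, so a pasted link may not send the query it appears to. Whether that explains any of the empty results is not something the thread establishes. A small sketch of building the same queries percent-encoded follows; the `ENGINES` table and the helper name are illustrative, with endpoints and parameter names taken from the links in the comment.

```python
from urllib.parse import quote

# Endpoints and query parameter names as they appear in the comment above.
ENGINES = {
    "yahoo": "https://search.yahoo.com/search?p=",
    "google": "https://www.google.com/search?q=",
    "bing": "https://www.bing.com/search?q=",
}


def quoted_query_url(engine, char):
    """Build a URL that searches for `char` wrapped in double quotes."""
    # safe="" makes quote() escape every reserved character, '/' included.
    return ENGINES[engine] + quote(f'"{char}"', safe="")


if __name__ == "__main__":
    for ch in "#&%+":
        print(quoted_query_url("google", ch))
```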
        
       ___________________________________________________________________
       (page generated 2021-07-04 23:00 UTC)