[HN Gopher] A look at search engines with their own indexes (2021)
       ___________________________________________________________________
        
       A look at search engines with their own indexes (2021)
        
       Author : mnem
       Score  : 68 points
       Date   : 2024-06-09 17:28 UTC (5 hours ago)
        
 (HTM) web link (seirdy.one)
 (TXT) w3m dump (seirdy.one)
        
       | danielcampos93 wrote:
       | It needs updating to include you.com, Perplexity, etc. Most of
       | those are Google reskins/emulators, but they are there
       | nonetheless.
        
         | marginalia_nu wrote:
         | > a look at search engines _with their own indexes_
        
         | simonw wrote:
         | Perplexity have their own index now, though it's not clear to
         | me how much they use that over Bing in their core experience.
         | 
         | It's also hard to find information about it (they really need
         | to write more about it), but it's mentioned in this article:
         | https://thenewstack.io/more-than-an-openai-wrapper-perplexit...
        
       | Waterluvian wrote:
       | Is there some 80/20 rule for web indexing?
       | 
       | I'm not saying having deep per-page indexing of Reddit, for
       | example, isn't useful. But is there any value in a breadth-
       | focused index that is far cheaper to maintain?
        
         | marginalia_nu wrote:
         | Almost certainly. Internet search is above all a problem of
         | improving the signal to noise ratio.
         | 
         | There's an inordinate number of documents that will never be a
         | good search result for any query: trivial pages that have
         | barely anything to index in them, but also sign-up forms,
         | cookie policies, and redundant information (e.g. any given man
         | page exists in dozens if not hundreds of identical copies on
         | the web).
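
The filtering idea in the comment above can be sketched roughly as follows. This is only an illustration, not how Marginalia or any other engine actually works: the `MIN_WORDS` threshold, the `normalize` and `filter_documents` helpers, and the choice of exact-match hashing are all assumptions made for the example.

```python
import hashlib

# Assumed cutoff: pages with fewer words than this are treated as having
# "barely anything to index". The value is arbitrary, chosen for illustration.
MIN_WORDS = 20


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def filter_documents(docs, min_words=MIN_WORDS):
    """Return URLs worth indexing from an iterable of (url, text) pairs.

    Drops near-empty pages and exact duplicates (after normalization).
    The first copy of a duplicated document is the one that is kept.
    """
    seen_digests = set()
    kept = []
    for url, text in docs:
        norm = normalize(text)
        if len(norm.split()) < min_words:
            continue  # too little content to be a useful result
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen_digests:
            continue  # identical copy of something already kept
        seen_digests.add(digest)
        kept.append(url)
    return kept
```

Exact hashing only catches byte-identical copies after normalization. Catching near-duplicates (the same man page wrapped in different site chrome) would need a different technique, such as shingling or locality-sensitive hashing, which this sketch does not attempt.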
        
           | reddalo wrote:
           | > cookie policies
           | 
           | Unless you're specifically searching for other websites'
           | cookie policies (e.g. to understand how they work, or to do
           | research on them, or just to plainly copy them...)
        
       | jeffreyw128 wrote:
       | Missed exa.ai! Embeddings-based search engine with its own index
        
         | HeatrayEnjoyer wrote:
         | How does an embeddings-based search work without hallucinating
         | bad links?
        
           | janalsncm wrote:
           | Not sure what they are doing but embeddings and hallucination
           | are completely separable imo (you can have hallucination even
           | without embedding-based retrieval). Likely you have an
           | embedding for the query which is close to the embedding of
           | the doc for some measure of similarity. That could be
           | semantic similarity or even user behavior.
        
           | cyanydeez wrote:
           | Embeddings aren't generative AI.
           | 
           | They're just vectors of arbitrary dimension, and similarity
           | is calculated by an n-dimensional function.
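
A minimal sketch of the retrieval step the two replies above describe, assuming cosine similarity as the similarity function. Nothing here reflects how exa.ai actually works: the URLs, the three-dimensional vectors, and the `cosine_similarity` and `search` helpers are made up for illustration. A real system would get its vectors from an embedding model and use an approximate nearest-neighbor index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=3):
    """Rank the URLs in `index` (url -> embedding) by similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), url) for url, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:top_k]]


# Hypothetical toy index: made-up URLs with hand-written 3-d "embeddings".
TOY_INDEX = {
    "https://example.com/cats": [0.9, 0.1, 0.0],
    "https://example.com/dogs": [0.7, 0.3, 0.1],
    "https://example.com/tax-law": [0.0, 0.1, 0.9],
}

results = search([1.0, 0.0, 0.0], TOY_INDEX, top_k=2)
```

Every result is a key that already exists in the index, so this step can return an irrelevant link but cannot invent one. Hallucinated links would have to come from a generative layer placed on top of the retrieval.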
        
       | dang wrote:
       | Related:
       | 
       |  _A look at search engines with their own indexes (2021)_ -
       | https://news.ycombinator.com/item?id=31820149 - June 2022 (114
       | comments)
        
       | mrweasel wrote:
       | I have been somewhat impressed by Mojeek, but it does have two
       | obvious flaws:
       | 
       | 1) It's not really good for localized search, though it might be
       | if you're local to the US or UK.
       | 
       | 2) No !bangs. Coming from Ecosia I frequently just do !w !maps
       | !yt because I know where I want the answer to come from
       | 
       | For English-language searches, it's completely usable, but not
       | quite as good as Bing or Google. I really wanted to try to use
       | Mojeek as my default for an extended period of time, but the lack
       | of good local search makes it a bit annoying.
        
         | marginalia_nu wrote:
         | Local search and location-aware search is probably Google's
         | biggest moat against smaller search engines. Bing does it
         | passably, but it's arguably still pretty bad.
         | 
         | What's worse is that it's probably hard to ever get working
         | well without the internet-scale profiling Google has access to.
        
           | reddalo wrote:
           | > Local search and location-aware search is probably Google's
           | biggest moat
           | 
           | The European Union, at least, has limited that a bit by
           | preventing Google from linking Google Maps from their SERP.
           | 
           | So now, if you're in the EU, local results will display a map
           | but you can't click on it.
        
       | raytopia wrote:
       | A little tangential but does anyone know if there are any modern
       | web directories?
       | 
       | I'm wondering because it seems like due to the amount of spam on
       | the web there needs to be more human curation as opposed to
       | algorithms deciding what websites are valuable or not.
        
         | marginalia_nu wrote:
         | https://ooh.directory/ is one
        
         | reddalo wrote:
         | Ohh I remember Google Directory, good times. They closed it
         | down for good in 2011.
         | 
         | Dmoz was also closed, but it seems like there's a "new" Dmoz
         | called Curlie [1], founded by some of the original team
         | members.
         | 
         | [1] https://curlie.org
        
       | wakawaka28 wrote:
       | Can we get a list for 2024?
        
         | marginalia_nu wrote:
         | This is a living document. Last updated a few weeks ago.
         | 
         | https://git.sr.ht/~seirdy/seirdy.one/log/master/item/content...
        
       ___________________________________________________________________
       (page generated 2024-06-09 23:01 UTC)