[HN Gopher] A look at search engines with their own indexes (2021)
___________________________________________________________________
A look at search engines with their own indexes (2021)
Author : mnem
Score : 68 points
Date : 2024-06-09 17:28 UTC (5 hours ago)
(HTM) web link (seirdy.one)
(TXT) w3m dump (seirdy.one)
| danielcampos93 wrote:
| It needs updating to include you.com, Perplexity, etc. Most of
| those are Google reskins/emulators, but they are there
| nonetheless.
| marginalia_nu wrote:
| > a look at search engines _with their own indexes_
| simonw wrote:
| Perplexity have their own index now, though it's not clear to
| me how much they use that over Bing in their core experience.
|
| It's also hard to find information about it (they really need
| to write more about it), but it's mentioned in this article:
| https://thenewstack.io/more-than-an-openai-wrapper-perplexit...
| Waterluvian wrote:
| Is there some 80/20 rule for web indexing?
|
| I'm not saying having deep per-page indexing of Reddit, for
| example, isn't useful. But is there any value in a breadth-
| focused index that is far cheaper to maintain?
| marginalia_nu wrote:
| Almost certainly. Internet search is above all a problem of
| improving the signal to noise ratio.
|
| There's an inordinate number of documents that will never be a
| good search result for any query: trivial pages that have barely
| anything to index in them, but also sign-up forms, cookie
| policies, and redundant information (e.g. any given man page
| exists in dozens if not hundreds of identical copies on the
| web).
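|
| A minimal sketch of that kind of pre-index filtering, assuming
| pages arrive as plain text keyed by URL. The MIN_WORDS threshold
| and the filter_documents helper are hypothetical, chosen only for
| illustration; this catches exact copies after normalization, not
| near-duplicates, and is not how any particular engine does it.

```python
import hashlib
import re

# Hypothetical cutoff for pages with "barely anything to index".
MIN_WORDS = 5


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def filter_documents(docs: dict[str, str]) -> list[str]:
    """Return the URLs worth indexing, skipping thin pages and exact copies."""
    seen: set[str] = set()
    kept: list[str] = []
    for url, text in docs.items():
        body = normalize(text)
        # Skip pages that are too short to be a useful result.
        if len(body.split()) < MIN_WORDS:
            continue
        # Skip pages whose normalized content was already seen.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(url)
    return kept
```

| Only the first copy of each distinct page survives, so hundreds
| of mirrors of the same man page would cost one index entry.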
| reddalo wrote:
| > cookie policies
|
| Unless you're specifically searching for other websites'
| cookie policies (e.g. to understand how they work, or to do
| research on them, or just to plainly copy them...)
| jeffreyw128 wrote:
| Missed exa.ai! Embeddings-based search engine with its own index
| HeatrayEnjoyer wrote:
| How does an embeddings-based search work without hallucinating
| bad links?
| janalsncm wrote:
| Not sure what they are doing but embeddings and hallucination
| are completely separable imo (you can have hallucination even
| without embedding-based retrieval). Likely you have an
| embedding for the query which is close to the embedding of
| the doc for some measure of similarity. That could be
| semantic similarity or even user behavior.
| cyanydeez wrote:
| Embeddings aren't generative AI.
|
| They're just vectors of arbitrary dimension, and similarity is
| calculated by an n-dimensional function.
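|
| A rough sketch of the retrieval step described above, assuming
| the query and every indexed document have already been turned
| into vectors by some embedding model (not shown). Cosine
| similarity is one common choice of similarity function; the rank
| helper and the example.com URLs are made up for illustration and
| say nothing about what exa.ai actually does.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank(query_vec: list[float],
         doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score every indexed URL against the query, best match first."""
    scored = [(url, cosine_similarity(query_vec, vec))
              for url, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy two-dimensional "index"; real embeddings have many more dimensions.
index = {
    "https://example.com/a": [1.0, 0.0],
    "https://example.com/b": [0.0, 1.0],
    "https://example.com/c": [1.0, 1.0],
}
results = rank([1.0, 0.1], index)
```

| Every result is a URL that already exists in the index, so this
| step can return an irrelevant link but not an invented one.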
| dang wrote:
| Related:
|
| _A look at search engines with their own indexes (2021)_ -
| https://news.ycombinator.com/item?id=31820149 - June 2022 (114
| comments)
| mrweasel wrote:
| I have been somewhat impressed by Mojeek, but it does have two
| obvious flaws:
|
| 1) It's not really good for localized search; it might be if
| you're local to the US or UK.
|
| 2) No !bangs. Coming from Ecosia I frequently just do !w !maps
| !yt because I know where I want the answer to come from
|
| For English language searches, it's completely usable, but not
| quite as good as Bing or Google. I really wanted to try to use
| Mojeek as my default for an extended period of time, but the lack
| of good local search makes it a bit annoying.
| marginalia_nu wrote:
| Local search and location-aware search is probably Google's
| biggest moat against smaller search engines. Bing does it
| passably, but it's arguably still pretty bad.
|
| What's worse is that it's probably hard to ever get working
| well without the internet-scale profiling Google has access to.
| reddalo wrote:
| > Local search and location-aware search is probably Google's
| biggest moat
|
| The European Union, at least, has limited that a bit by
| preventing Google from linking Google Maps from their SERP.
|
| So now, if you're in the EU, local results will display a map
| but you can't click on it.
| raytopia wrote:
| A little tangential but does anyone know if there are any modern
| web directories?
|
| I'm wondering because it seems like due to the amount of spam on
| the web there needs to be more human curation as opposed to
| algorithms deciding what websites are valuable or not.
| marginalia_nu wrote:
| https://ooh.directory/ is one
| reddalo wrote:
| Ohh I remember Google Directory, good times. They closed it
| down for good in 2011.
|
| Dmoz was also closed, but it seems like there's a "new" Dmoz
| called Curlie [1], founded by some of the original team
| members.
|
| [1] https://curlie.org
| wakawaka28 wrote:
| Can we get a list for 2024?
| marginalia_nu wrote:
| This is a living document. Last updated a few weeks ago.
|
| https://git.sr.ht/~seirdy/seirdy.one/log/master/item/content...
___________________________________________________________________
(page generated 2024-06-09 23:01 UTC)