[HN Gopher] I made a search engine worse than Elasticsearch (2024)
       ___________________________________________________________________
        
       I made a search engine worse than Elasticsearch (2024)
        
       Author : softwaredoug
       Score  : 119 points
       Date   : 2025-06-05 18:37 UTC (1 days ago)
        
 (HTM) web link (softwaredoug.com)
 (TXT) w3m dump (softwaredoug.com)
        
       | niazangels wrote:
       | Learnt a lot from this! Thank you for the write up.
        
       | neuroelectron wrote:
       | This is worth more than Alphabet
        
         | sph wrote:
         | How? Alphabet already has a search engine worse than
         | Elasticsearch.
        
           | endymion-light wrote:
           | alphabet have a search engine? i thought it was just an ad
           | machine at this point
        
             | softwaredoug wrote:
             | An ad machine that's a search engine, just optimized for ad
             | relevance not just search relevance :)
        
             | mrguyorama wrote:
             | It is a search engine. You enter a search string and it
             | returns all the ads that are associated with that search
             | and your user.
        
       | sh34r wrote:
       | I feel like this is a rite of passage for all engineers: messing
       | around with things like Lucene long enough to realize that
       | search-for-humans is a relatively hard problem, even at small
       | scale.
       | 
       | Improving your simple website's search function will take days or
       | weeks, not hours. If you make your own search engine, it's almost
       | guaranteed to be worse than Elasticsearch.
        
         | bob1029 wrote:
         | You can get pretty far with Lucene primitives. That's the level
         | of abstraction I prefer to work at. Running search in a
         | different process or container means I lose the advantages of
         | tight integration of search/indexer logic with business logic.
         | Keeping indexes on the local disk (just like SQLite) is a
         | really simple deployment model too.
         | 
         | I agree that implementing something like Lucene from scratch
         | would be an uphill battle. Probably not worth the time.
        
         | jillesvangurp wrote:
         | It's not a reason to not take on such a project and learn
         | something. But it is a good reason to approach the subject with
         | some humility. There are posts here every few months/weeks of
         | someone boasting that they are running circles around Lucene in
         | some way. BTW, Elasticsearch uses Lucene. Lucene is where all
         | the cool stuff it does is implemented.
         | 
         | Implementing your own search is indeed a bit of a rite of
         | passage. Usually, if you go look at such implementations,
         | you'll find they implemented 1% of the features, cut lots of
         | corners and then came up with some benchmark that proves they
         | are faster for some toy dataset. WAND would be a good example
         | of something most of these things don't do.
         | 
         | Doug is of course a search relevance expert who has published
         | several books on the subject. So, this is not some naive person
         | implementing BM25 but just somebody building tools they need to
         | do bigger things. Sometimes Elasticsearch/Lucene are just
         | overkill and it is worth having your own implementation.
         | 
         | You can find my own vibe coded version here:
         | https://github.com/jillesvangurp/querylight. Nice embeddable
         | search engine for kotlin multiplatform (works in kotlin-js,
         | android, ios, wasm, and of course jvm). I use it in some
         | browser based apps.
         | 
         | If I need a proper search engine, I use Elasticsearch or
         | Opensearch.
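
For readers who have not met the BM25 ranking function mentioned in the comment above, here is a minimal sketch of textbook Okapi BM25 in Python. It is not taken from the linked article, querylight, or Lucene. The function name `bm25_scores`, the whitespace tokenizer, and the defaults `k1=1.2` and `b=0.75` are assumptions made for illustration, and the IDF uses the common `ln(1 + ...)` smoothing.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document (hypothetical helper, toy tokenizer)."""
    if not docs:
        return []
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = query.lower().split()
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed inverse document frequency, always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["lucene search engine", "postgres full text search", "ad machine"]
scores = bm25_scores("lucene search", docs)
```

The sketch scores every document on every query, which is exactly the brute-force approach that optimizations such as WAND are meant to avoid.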
        
           | fucalost wrote:
           | +1 for OpenSearch, especially with UltraWarm nodes
        
         | cha42 wrote:
         | I use PostgreSQL full-text search with GIN indexing and often
         | find it good enough and fast enough, without the hassle of
         | running a second engine just for search.
        
         | stuaxo wrote:
         | Having Elasticsearch around as this resource-hungry,
         | slow-to-update, JVM-based thing always seems so horrible in
         | Django-based projects.
         | 
         | In that world, using haystack and choosing a backend based on
         | C++ is so much less hassle for deployment.
         | 
         | Although for many things just FTS in Postgres is fine too.
         | 
         | I'm sure for planet scale stuff ES is fine, but otherwise I've
         | only found it brings pain in the kind of dev I get to do.
        
         | moralestapia wrote:
         | I made mine and it performs way better for my specific use
         | case. Also, single digit ms latencies.
         | 
         | I might actually open source it, it's a single file anyway.
        
         | pphysch wrote:
         | > Improving your simple website's search function will take
         | days or weeks, not hours.
         | 
         | Full-text search, sure, but you can easily provide a better
         | overall search experience by creating a custom wrapping
         | algorithm that provides shortcuts for common access patterns of
         | _your_ users in _your_ application, in addition to full-text
         | search.
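
One way to read the "custom wrapping algorithm" idea above is a thin function that tries cheap, application-specific shortcuts before falling back to full-text search. The sketch below is only an illustration under stated assumptions: the `ITEMS` catalogue, the `SKU-1234` identifier format, and the toy `full_text_search` stand-in are all hypothetical and would be replaced by the real backend in practice.

```python
import re

# Hypothetical catalogue; a real application would query its own data store.
ITEMS = {
    "SKU-1001": "red running shoes",
    "SKU-1002": "blue trail running shoes",
    "SKU-2001": "wool hiking socks",
}

# Assumed access pattern: users often paste an exact item identifier.
SKU_PATTERN = re.compile(r"^sku-\d+$", re.IGNORECASE)


def full_text_search(query):
    """Toy stand-in for a full-text backend: rank by count of matching terms."""
    terms = set(query.lower().split())
    hits = []
    for item_id, text in ITEMS.items():
        overlap = len(terms & set(text.split()))
        if overlap:
            hits.append((overlap, item_id))
    # Highest overlap first; ties broken by identifier for stable output.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [item_id for _, item_id in hits]


def search(query):
    """Try the identifier shortcut first, then fall back to full-text search."""
    q = query.strip()
    if SKU_PATTERN.match(q):
        item_id = q.upper()
        if item_id in ITEMS:
            return [item_id]
    return full_text_search(q)
```

The shortcut list is where knowledge of your own users goes: exact identifiers, recently viewed items, or named saved searches can all be checked before the general-purpose engine is asked.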
        
       | Alifatisk wrote:
       | This made me so thankful for Elasticsearch's existence
        
       | stuaxo wrote:
       | I mean.. I hate having to use elasticsearch, so this is quite a
       | feat.
       | 
       | (To be fair, I've only worked on projects that use ES where it is
       | entirely unnecessary.)
        
       | nchmy wrote:
       | Folks should check out Manticoresearch. It evolved out of Sphinx
       | search, which is older than Lucene and powers things like
       | Craigslist.
       | 
       | Much easier to deal with and faster than Elasticsearch.
       | 
       | https://manticoresearch.com/
        
         | 0xC0ncord wrote:
         | The problem I quickly ran into with Manticoresearch is it's
         | missing a bunch of the API that most Elasticsearch clients
         | expect. It certainly is fast, though.
        
           | Imustaskforhelp wrote:
           | It isn't that big of a dealbreaker for me personally, but
           | surely the Manticoresearch team could add this, right? It
           | doesn't seem that bad given the performance gains of at least
           | 2x over Elasticsearch, which is already pretty performant in
           | my opinion. You also get to be stress-free about whether
           | Elasticsearch will change its license again, given their
           | license pull, if I remember correctly.
        
         | Imustaskforhelp wrote:
         | Very interesting. Thanks for the share! Appreciate it.
        
       | 0xB0UNCE00 wrote:
       | And so what if it's worse than elasticsearch, it's the playing
       | around and learning that counts.
        
       | fucalost wrote:
       | I actually _really_ like Elasticsearch. It's very powerful,
       | there's a healthy ecosystem of tools (increasingly for OpenSearch
       | too), and the query language makes sense to me.
       | 
       | Sure it's computationally expensive, inefficient even, but for
       | many use-cases it just works.
       | 
       | I'd add that for production deployments, AWS has developed a new
       | instance family that enables OpenSearch data to be stored on S3
       | [1], bringing significant cost savings.
       | 
       | [1] https://docs.aws.amazon.com/opensearch-
       | service/latest/develo...
        
       | amai wrote:
       | More search engines worse than Elasticsearch:
       | 
       | - https://www.meilisearch.com/
       | 
       | - https://typesense.org/
       | 
       | - https://github.com/Sygil-Dev/whoosh-reloaded
        
       ___________________________________________________________________
       (page generated 2025-06-06 23:02 UTC)