[HN Gopher] I made a search engine worse than Elasticsearch (2024)
___________________________________________________________________
I made a search engine worse than Elasticsearch (2024)
Author : softwaredoug
Score : 119 points
Date : 2025-06-05 18:37 UTC (1 day ago)
(HTM) web link (softwaredoug.com)
(TXT) w3m dump (softwaredoug.com)
| niazangels wrote:
| Learnt a lot from this! Thank you for the write-up.
| neuroelectron wrote:
| This is worth more than Alphabet
| sph wrote:
| How? Alphabet already has a search engine worse than
| Elasticsearch.
| endymion-light wrote:
| Alphabet has a search engine? I thought it was just an ad
| machine at this point.
| softwaredoug wrote:
| An ad machine that's a search engine, just optimized for ad
| relevance not just search relevance :)
| mrguyorama wrote:
| It is a search engine. You enter a search string and it
| returns all the ads that are associated with that search
| and your user.
| sh34r wrote:
| I feel like this is a rite of passage for all engineers: messing
| around with things like Lucene long enough to realize that
| search-for-humans is a relatively hard problem, even at small
| scale.
|
| Improving your simple website's search function will take days or
| weeks, not hours. If you make your own search engine, it's almost
| guaranteed to be worse than Elasticsearch.
| bob1029 wrote:
| You can get pretty far with Lucene primitives. That's the level
| of abstraction I prefer to work at. Running search in a
| different process or container means I lose the advantages of
| tight integration of search/indexer logic with business logic.
| Keeping indexes on the local disk (just like SQLite) is a
| really simple deployment model too.
|
| I agree that implementing something like Lucene from scratch
| would be an uphill battle. Probably not worth the time.
| jillesvangurp wrote:
| It's not a reason to not take on such a project and learn
| something. But it is a good reason to approach the subject with
| some humility. Every few weeks or months there is a post here from
| someone boasting that they are running circles around Lucene in
| some way. BTW, Elasticsearch uses Lucene. Lucene is where all
| the cool stuff it does is implemented.
|
| Implementing your own search is indeed a bit of a rite of
| passage. Usually, if you go look at such implementations,
| you'll find they implemented 1% of the features, cut lots of
| corners and then came up with some benchmark that proves they
| are faster for some toy dataset. WAND would be a good example
| of something most of these things don't do.
|
| Doug is of course a search relevance expert who has published
| several books on the subject. So, this is not some naive person
| implementing BM25 but just somebody building tools they need to
| do bigger things. Sometimes Elasticsearch/Lucene are just
| overkill and it is worth having your own implementation.
|
| You can find my own vibe-coded version here:
| https://github.com/jillesvangurp/querylight. Nice embeddable
| search engine for Kotlin Multiplatform (works in Kotlin/JS,
| Android, iOS, Wasm, and of course the JVM). I use it in some
| browser-based apps.
|
| If I need a proper search engine, I use Elasticsearch or
| OpenSearch.
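For readers unfamiliar with the BM25 scoring mentioned in the comment above, here is a minimal sketch in Python. It is not the article author's code and not how Lucene implements it: the class name `TinyBM25`, the whitespace tokenizer, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for illustration, and it skips everything that makes a real engine fast (WAND, compressed postings, segment merging).

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split. Real analyzers do far more.
    return text.lower().split()


class TinyBM25:
    """Toy in-memory BM25 over a list of strings (illustrative only)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1 = k1
        self.b = b
        self.doc_lens = []
        # Inverted index: term -> {doc_id: term frequency in that doc}
        self.postings = {}
        for doc_id, doc in enumerate(docs):
            tokens = tokenize(doc)
            self.doc_lens.append(len(tokens))
            for term, tf in Counter(tokens).items():
                self.postings.setdefault(term, {})[doc_id] = tf
        total = sum(self.doc_lens)
        self.avg_len = total / len(self.doc_lens) if self.doc_lens else 0.0

    def idf(self, term):
        # Smoothed inverse document frequency; always non-negative.
        n = len(self.postings.get(term, {}))
        total_docs = len(self.doc_lens)
        return math.log(1 + (total_docs - n + 0.5) / (n + 0.5))

    def search(self, query, top_k=3):
        scores = Counter()
        for term in tokenize(query):
            idf = self.idf(term)
            for doc_id, tf in self.postings.get(term, {}).items():
                # Length normalization: long documents are penalized.
                norm = 1 - self.b + self.b * self.doc_lens[doc_id] / self.avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        # List of (doc_id, score) pairs, best first.
        return scores.most_common(top_k)


engine = TinyBM25([
    "the quick brown fox",
    "lucene is a search library",
    "elasticsearch uses lucene for search",
])
hits = engine.search("lucene search")
```

Here `hits` contains only the two documents that mention the query terms. A sketch like this covers the scoring formula and little else, which is roughly the "1% of the features" the comment describes.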
| fucalost wrote:
| +1 for OpenSearch, especially with UltraWarm nodes
| cha42 wrote:
| I use PostgreSQL full-text search with GIN indexing and often
| find it to be good enough and fast enough, without the hassle of
| running a second engine just for search.
| stuaxo wrote:
| Having Elasticsearch, this resource-hungry, slow-to-update,
| JVM-based thing, always seems so horrible in Django-based
| projects.
|
| In that world, using Haystack and choosing a backend based on
| C++ is so much less hassle for deployment.
|
| Although for many things just FTS in Postgres is fine too.
|
| I'm sure for planet-scale stuff ES is fine, but otherwise I've
| only found it brings pain in the kind of dev I get to do.
| moralestapia wrote:
| I made mine and it performs way better for my specific use
| case. Also, single digit ms latencies.
|
| I might actually open source it; it's a single file anyway.
| pphysch wrote:
| > Improving your simple website's search function will take
| days or weeks, not hours.
|
| Full-text search, sure, but you can easily provide a better
| overall search experience by creating a custom wrapping
| algorithm that provides shortcuts for common access patterns of
| _your_ users in _your_ application, in addition to full-text
| search.
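One way to read the "custom wrapping algorithm" idea above is a thin dispatcher that tries application-specific shortcuts first and falls back to full-text search. The Python sketch below is only an illustration under that reading: `make_search`, the order-number pattern, and the sample `ORDERS` and `PAGES` data are all hypothetical, and `naive_full_text` is a stand-in for whatever real engine sits behind it.

```python
import re


def make_search(shortcuts, full_text_search):
    """Build a search function that tries shortcuts before full-text search.

    shortcuts: list of (compiled_regex, handler) pairs; a handler takes the
    regex match and returns a list of results (empty if nothing was found).
    """
    def search(query):
        q = query.strip()
        for pattern, handler in shortcuts:
            match = pattern.fullmatch(q)
            if match:
                hits = handler(match)
                if hits:
                    return hits
        # No shortcut applied (or it found nothing): use full-text search.
        return full_text_search(q)
    return search


# Hypothetical application data, for illustration only.
ORDERS = {"1042": "Order #1042: 3 x widget", "2001": "Order #2001: 1 x gadget"}
PAGES = ["How to return a widget", "Gadget setup guide", "Shipping times"]


def lookup_order(match):
    # Shortcut: a bare four-digit number is treated as an order ID.
    order = ORDERS.get(match.group(1))
    return [order] if order else []


def naive_full_text(query):
    # Stand-in for a real engine: every query term must appear in the page.
    terms = query.lower().split()
    return [page for page in PAGES if all(t in page.lower() for t in terms)]


search = make_search(
    [(re.compile(r"#?(\d{4})"), lookup_order)],
    naive_full_text,
)
```

With this setup, `search("#1042")` goes straight to the order lookup, while `search("widget")` falls through to the full-text path. Which shortcuts are worth adding depends entirely on what a given application's users actually type.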
| Alifatisk wrote:
| This made me so thankful for Elasticsearch's existence.
| stuaxo wrote:
| I mean... I hate having to use Elasticsearch, so this is quite a
| feat.
|
| (To be fair, I've only worked on projects that use ES where it is
| entirely unnecessary.)
| nchmy wrote:
| Folks should check out Manticoresearch. It evolved out of Sphinx
| search, which is older than Lucene and powers things like
| Craigslist.
|
| Much easier to deal with, and faster than Elasticsearch.
|
| https://manticoresearch.com/
| 0xC0ncord wrote:
| The problem I quickly ran into with Manticoresearch is it's
| missing a bunch of the API that most Elasticsearch clients
| expect. It certainly is fast, though.
| Imustaskforhelp wrote:
| That isn't a big dealbreaker for me personally, and surely the
| Manticoresearch team could add it, right? It doesn't seem that
| bad given performance gains of at least 2x over Elasticsearch,
| which is already pretty performant in my opinion. You also don't
| have to worry about whether Elasticsearch will change its
| license again, given their license pull, if I remember
| correctly.
| Imustaskforhelp wrote:
| Very interesting. Thanks for the share! Appreciate it.
| 0xB0UNCE00 wrote:
| And so what if it's worse than Elasticsearch? It's the playing
| around and learning that counts.
| fucalost wrote:
| I actually _really_ like Elasticsearch. It's very powerful,
| there's a healthy ecosystem of tools (increasingly for OpenSearch
| too), and the query language makes sense to me.
|
| Sure, it's computationally expensive, inefficient even, but for
| many use cases it just works.
|
| I'd add that for production deployments, AWS has developed a new
| instance family that enables OpenSearch data to be stored on S3
| [1], bringing significant cost savings.
|
| [1] https://docs.aws.amazon.com/opensearch-service/latest/develo...
| amai wrote:
| More search engines worse than Elasticsearch:
|
| - https://www.meilisearch.com/
|
| - https://typesense.org/
|
| - https://github.com/Sygil-Dev/whoosh-reloaded
___________________________________________________________________
(page generated 2025-06-06 23:02 UTC)