[HN Gopher] OpenSearch 3.0 Released
       ___________________________________________________________________
        
       OpenSearch 3.0 Released
        
       Author : kmaliszewski
       Score  : 86 points
       Date   : 2025-05-07 15:38 UTC (7 hours ago)
        
 (HTM) web link (opensearch.org)
 (TXT) w3m dump (opensearch.org)
        
       | simple10 wrote:
       | Just learning about OpenSearch. Looks like it's a fork of
       | Elasticsearch from 2021 when Elasticsearch changed licensing
       | model. https://github.com/opensearch-project/OpenSearch
       | 
        | Anyone know if it's still a drop-in replacement for
       | Elasticsearch? And how does it compare on performance and
       | features?
        
         | __s wrote:
          | It is not a drop-in replacement (but almost is)
         | 
         | 1.x is compatible with ES 7.10
        
         | lockhead wrote:
          | It's slower on the same hardware, but fine. Stay away if you
          | need the UI, though: the Kibana fork is hellishly slow and
          | riddled with bugs.
        
           | darkamaul wrote:
            | It's slightly more complex than that. Both OpenSearch and
           | Elasticsearch have workflows where they excel.
           | 
           | My company did a fairly comprehensive benchmark of the two
            | products [0] if you are interested in comparing performance.
           | 
           | [0] https://blog.trailofbits.com/2025/03/06/benchmarking-
           | opensea...
        
         | Y-bar wrote:
         | It's worth noting that in September 2024 Elasticsearch once
          | again returned to a fully open source license (AGPLv3).
        
           | Salgat wrote:
           | Fool me once...
        
         | jillesvangurp wrote:
          | I maintain a Kotlin client for both Elasticsearch and
         | Opensearch (jillesvangurp/kt-search). There are some
         | differences but they are mostly still API compatible for most
         | of the commonly used features.
         | 
          | There are some exceptions to this, and vector search would
          | be one of them; the feature was added post-fork. There are a
          | few other things of course. E.g. search_after works slightly
          | differently on the two; my client works around that. And
          | there are a lot of newer features on both sides that are
          | annoyingly different. Both have some SQL querying
          | capabilities now, but they each have their own take on it.
         | 
         | Elastic still has the edge on features IMHO. Especially Kibana
         | has a lot more features than Amazon's fork. And on the
         | aggregation front, Elastic has done quite a bit of feature and
         | optimization work in the last few years (that's what powers the
         | dashboards). For performance it depends what you do. But they
         | both heavily lean on Lucene which remains the open source
         | search library both products use. Elastic cloud is a bit better
         | than opensearch in AWS from what I've seen. If you self host
         | and tune, both should be very similar.
         | 
         | Elastic also just tagged version 9.0, which uses the same new
         | version of Lucene as Opensearch 3.0. I have support for both
         | new versions in my client already (added that a few weeks ago).
         | It now works with Elasticsearch v7, 8, and 9 and Opensearch
         | 1,2, & 3.
         | 
         | A lot of my consulting clients seem to prefer Opensearch
         | lately. That's mainly because of the less complicated licensing
          | and the AWS support. If you have a legacy Elasticsearch setup,
         | switching it to Opensearch should be doable (depending on what
         | you use). But expect to reindex all your data. I don't think a
         | direct migration is possible. If you use Elastic's client
         | libraries, you may need to switch to Opensearch specific ones.
         | This is generally a bit painful (package names, feature
         | differences, etc.). That's why I created kt-search a few years
         | ago.
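
         The reindex-style migration described above can be sketched in
         Python with the two client libraries (a minimal sketch, not
         kt-search itself; the client objects and index names are
         placeholders, and error handling is omitted):

```python
# Hedged sketch: migrate a legacy Elasticsearch index to OpenSearch by
# scrolling the source and bulk-indexing into the target. Assumes the
# `elasticsearch` and `opensearch-py` packages; names are placeholders.

def to_bulk_actions(hits, target_index):
    """Rewrap scroll hits as bulk "index" actions for the target index."""
    for hit in hits:
        yield {
            "_op_type": "index",
            "_index": target_index,
            "_id": hit["_id"],
            "_source": hit["_source"],
        }

def migrate(es_client, os_client, source_index, target_index):
    """One-shot copy; both clients are assumed to be already connected."""
    from elasticsearch.helpers import scan    # scroll over every document
    from opensearchpy.helpers import bulk     # batched writes to the target
    hits = scan(es_client, index=source_index,
                query={"query": {"match_all": {}}})
    return bulk(os_client, to_bulk_actions(hits, target_index))
```

         The interesting work is in the mappings, which usually have to
         be recreated by hand on the OpenSearch side before the copy.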
        
           | Salgat wrote:
           | That's what we ended up doing for our migrations. We actually
           | had a bunch of old Elasticsearch 2.3 databases (ancient), so
           | we stood up an OpenSearch database in parallel for each and
           | on service startup did a one-time automatic index and bulk
           | copy over of all the data. So far very happy with OpenSearch.
        
           | simple10 wrote:
           | Ah thanks for the detail! Super useful comment.
        
         | blueelephanttea wrote:
          | > Anyone know if it's still a drop-in replacement for
         | Elasticsearch?
         | 
          | As you point out, it was forked a number of years ago, so it
         | started from the same place (7.10). Elasticsearch is now on
         | 9.0+ and has 27,000 more commits than OpenSearch. So I doubt it
         | is a drop-in replacement anymore.
         | 
          | I have no idea how many of those 27K commits are key
          | features, but the divergence is clear.
        
         | ignoramous wrote:
         | > Just learning about OpenSearch. Looks like ...
         | 
         |  _OpenSearch_ was once a personal search results aggregator
          | conceived at A9 (Amazon's Silicon Valley subsidiary):
         | https://github.com/dewitt/opensearch
        
           | Blackthorn wrote:
           | Sometimes, the same name refers to multiple things.
        
       | aabhay wrote:
       | Does anyone use OpenSearch for its knn and vector capabilities?
       | Is it any good? It's always hard to know with systems like this
       | whether it works at scale until your team is fighting fires.
        
         | alex_duf wrote:
         | It works with some caveats. I've seen it handle searches with
          | millions of documents no problem, but the KNN search
          | requires loading the entire embedding graph into memory. So
          | watch your RAM consumption.
         | 
         | The quality of your results will depend mostly on the quality
          | of your embeddings.
        
         | seanhunter wrote:
         | Irrespective of opensearch, if the dimension of your vector
         | embedding is reasonably large you'll probably want an
          | approximate nearest neighbours approach like HNSW rather
          | than exact knn.
         | 
         | https://docs.opensearch.org/docs/1.2/search-plugins/knn/appr...
         | 
         | For whatever an endorsement from a random stranger is worth,
         | we've been using opensearch for a vectordb for hybrid search
         | across text and multimodal embeddings as well as traditional
          | metadata, and it's been great. We're not "full production"
          | yet, though, so I can't really speak to scale; but it's
          | opensearch, so I expect the scale to be fine.
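
         For what it's worth, approximate k-NN in OpenSearch is mostly
         an index-mapping exercise. A minimal sketch of an index body
         (the field name `embedding` and the HNSW parameters are
         illustrative; exact options vary by OpenSearch version and
         engine, so check the docs for yours):

```python
# Hedged sketch: an OpenSearch index body enabling approximate k-NN
# (HNSW) on a vector field. Field name, dimension, and the HNSW
# parameters "m" / "ef_construction" here are illustrative defaults.

def knn_index_body(dimension, m=16, ef_construction=128):
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": "lucene",
                        "parameters": {"m": m,
                                       "ef_construction": ef_construction},
                    },
                }
            }
        },
    }

# e.g. with opensearch-py (placeholder index name):
# client.indices.create(index="docs", body=knn_index_body(768))
```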
        
         | antirez wrote:
          | I don't know about the OpenSearch implementation, but I
          | recently implemented Vector Sets for Redis from scratch
          | using HNSW as the data structure, and there are many other
          | stores that use the same one. When HNSWs are well
          | implemented, you can rest assured they scale very well for
          | the task at hand, but you can expect insertion speed only on
          | the order of a few thousand per second if you are hitting a
          | _single_ HNSW. Reads are much faster; in Redis I get 80k/s
          | easily (but it uses multiple cores).
         | 
          | So if you want to build a very, very large index using HNSWs,
         | you have to understand if you normally have many writes that
         | accumulate evenly, or if your index is a mostly read-only thing
         | that is rebuilt from time to time. Mass-insertion the first
         | time is going to be very slow. You can parallelize it if you
         | build N parallel HNSWs, since the searches can be composed as
         | the union of the results (sorted by cosine similarity). But
         | often the bottleneck is the embedding model itself.
         | 
          | What is really not super scalable is the _size_ of HNSWs.
          | Their memory use is big (Redis by default uses 8-bit
          | quantization for this reason), and on disk they require
          | seeks. If you have
         | large vectors, like 1024 components, quantization is a must.
        
         | binarymax wrote:
          | I use it all the time. Whether it's "good" depends more on
          | your embedding model, but you do need to know a bit to tune
          | the index. Whatever algo you choose, read the paper.
         | 
          | If you're using Lucene HNSW, it will scale but will eat lots
          | and lots of heap RAM. If you're using the FAISS or nmslib
          | plugins, keep an eye on JNI RAM consumption as well, as it's
          | outside the heap.
         | 
         | Overall, I'd say that it is a challenge to easily scale ANN
         | past 100M vectors unless it's given significant attention from
         | the team.
        
       | unethical_ban wrote:
       | I just want a quick log ingestion tool that can parse syslog
       | easily and graph/search fields for me.
       | 
       | Setting up a simple log ingestion on Opensearch or ELK felt like
       | a true journey, in a bad way.
        
         | binarymax wrote:
         | It's surprising how challenging this is for both Elastic and
         | Opensearch. The problem is that it's all configuration and no
         | convention, so you need to roll everything yourself. There
         | should be prescribed recipes to make this simpler. If you're
            | using something like opentelemetry you can find help more
            | easily, but it's still annoying.
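
           Rolling the parsing step yourself is not much code, for what
           it's worth. A hedged sketch of pulling fields out of RFC
           3164-style syslog lines before handing documents to the bulk
           API (the regex and field names are illustrative, not a full
           RFC implementation):

```python
# Hedged sketch: turn classic "<PRI>Mmm dd hh:mm:ss host tag[pid]: msg"
# syslog lines into JSON-ready documents with searchable fields.
import re

SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"                              # priority
    r"(?P<timestamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) " # e.g. Oct 11 22:14:15
    r"(?P<host>\S+) "
    r"(?P<tag>[\w\-/.]+)(?:\[(?P<pid>\d+)\])?:? ?"
    r"(?P<message>.*)$"
)

def parse_syslog(line):
    m = SYSLOG_RE.match(line)
    if not m:
        return {"message": line}   # fall back to indexing the raw line
    doc = m.groupdict()
    pri = int(doc.pop("pri"))
    doc["facility"], doc["severity"] = pri // 8, pri % 8
    return doc
```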
        
           | nullify88 wrote:
            | It's possible, but you need to buy into the Elastic
            | ecosystem. Tools like *beats, logstash, etc. can configure
            | all sorts of index templates and ingest pipelines
            | depending on what you've configured them to receive.
           | 
            | These days, getting data in and out of Elasticsearch is
            | quite easy with dynamic field mapping. It's keeping it
            | performant that's tricky.
        
           | dbacar wrote:
           | I think both these tools are more on the easy side of setting
           | up if you follow their guidelines. You can be up and running
           | very quickly. The problems arise when you need some custom
            | logic in processing log files. If you have simple shipping
            | requirements you can bypass logstash altogether. Elastic
            | and opensearch are not the right tools for application
            | metrics though, in my opinion; for that use case just use
            | prometheus and grafana.
        
         | wingmanjd wrote:
         | Have you tried out Graylog? Their core product does pretty
         | decently at my $DAYJOB.
        
       ___________________________________________________________________
       (page generated 2025-05-07 23:01 UTC)