[HN Gopher] OpenSearch 3.0 Released
___________________________________________________________________
OpenSearch 3.0 Released
Author : kmaliszewski
Score : 86 points
Date : 2025-05-07 15:38 UTC (7 hours ago)
(HTM) web link (opensearch.org)
| simple10 wrote:
| Just learning about OpenSearch. Looks like it's a fork of
| Elasticsearch from 2021 when Elasticsearch changed licensing
| model. https://github.com/opensearch-project/OpenSearch
|
| Anyone know if it's still a drop in replacement for
| Elasticsearch? And how does it compare on performance and
| features?
| __s wrote:
| It is not a drop-in replacement (but it almost is).
|
| 1.x is compatible with ES 7.10
| lockhead wrote:
| It's slower on the same hardware, but fine. Stay away if you
| need the UI, though: the Kibana fork is hellishly slow and
| riddled with bugs.
| darkamaul wrote:
| It's slightly more complex than this. Both OpenSearch and
| Elasticsearch have workflows where they excel.
|
| My company did a fairly comprehensive benchmark of the two
| products [0] if you are interested in comparing performances.
|
| [0] https://blog.trailofbits.com/2025/03/06/benchmarking-
| opensea...
| Y-bar wrote:
| It's worth noting that in September 2024 Elasticsearch once
| again returned to a fully open source license (AGPLv3).
| Salgat wrote:
| Fool me once...
| jillesvangurp wrote:
| I maintain a Kotlin client for both Elasticsearch and
| Opensearch (jillesvangurp/kt-search). There are some
| differences but they are mostly still API compatible for most
| of the commonly used features.
|
| There are some exceptions to this and vector search would be
| one of those. The feature was added post fork. There are a few
| other things of course. E.g. search_after works slightly
| differently on both. My client works around that. And there are a
| lot of newer features on both sides that are annoyingly
| different. Both have some SQL querying capabilities now, but
| they both have their own take on that.
|
| Elastic still has the edge on features IMHO. Especially Kibana
| has a lot more features than Amazon's fork. And on the
| aggregation front, Elastic has done quite a bit of feature and
| optimization work in the last few years (that's what powers the
| dashboards). For performance it depends what you do. But they
| both heavily lean on Lucene which remains the open source
| search library both products use. Elastic Cloud is a bit better
| than OpenSearch on AWS from what I've seen. If you self-host
| and tune, both should be very similar.
|
| Elastic also just tagged version 9.0, which uses the same new
| version of Lucene as Opensearch 3.0. I have support for both
| new versions in my client already (added that a few weeks ago).
| It now works with Elasticsearch v7, 8, and 9 and OpenSearch
| 1, 2, and 3.
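Since a client like this talks to both products, it has to tell them apart at runtime. One way to do that, sketched in Python and not taken from kt-search itself, is to inspect the root (`GET /`) response. The sketch assumes OpenSearch reports `"distribution": "opensearch"` inside `version` while Elasticsearch omits that field; `detect_backend` is a hypothetical helper, and fetching the response over HTTP is left out.

```python
from typing import Any


def detect_backend(root_info: dict[str, Any]) -> tuple[str, int]:
    """Return (product, major_version) from the parsed JSON of `GET /`.

    Assumption: OpenSearch sets version.distribution to "opensearch";
    Elasticsearch does not report that field.
    """
    version = root_info.get("version", {})
    if version.get("distribution") == "opensearch":
        product = "opensearch"
    else:
        product = "elasticsearch"
    # "3.0.0" -> 3, "9.0.1" -> 9
    major = int(str(version.get("number", "0")).split(".")[0])
    return product, major


# Trimmed example responses (real responses carry more fields).
print(detect_backend({"version": {"distribution": "opensearch", "number": "3.0.0"}}))
print(detect_backend({"version": {"number": "9.0.0", "build_flavor": "default"}}))
```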
|
| A lot of my consulting clients seem to prefer Opensearch
| lately. That's mainly because of the less complicated licensing
| and the AWS support. If you have a legacy Elasticsearch setup
| switching it to Opensearch should be doable (depending on what
| you use). But expect to reindex all your data. I don't think a
| direct migration is possible. If you use Elastic's client
| libraries, you may need to switch to Opensearch-specific ones.
| This is generally a bit painful (package names, feature
| differences, etc.). That's why I created kt-search a few years
| ago.
| Salgat wrote:
| That's what we ended up doing for our migrations. We actually
| had a bunch of old Elasticsearch 2.3 databases (ancient), so
| we stood up an OpenSearch database in parallel for each and
| on service startup did a one-time automatic index and bulk
| copy over of all the data. So far very happy with OpenSearch.
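A bulk copy like this typically goes through the `_bulk` endpoint, which takes newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end of the body. A minimal Python sketch of building those request bodies in batches; the index name `products`, the sample documents, and the helper names are hypothetical, and reading from the old cluster and sending the HTTP requests are omitted.

```python
import json
from typing import Iterable, Iterator


def chunks(docs: Iterable, size: int) -> Iterator[list]:
    """Yield lists of at most `size` items so each bulk request stays small."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(index: str, docs: Iterable[tuple[str, dict]]) -> str:
    """Build an NDJSON body for the _bulk endpoint.

    Each document becomes an action line followed by its source line;
    the body must end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Hypothetical documents read from the old cluster.
old_docs = [("1", {"name": "widget"}), ("2", {"name": "gadget"}), ("3", {"name": "gizmo"})]
for batch in chunks(old_docs, 2):
    body = bulk_body("products", batch)
    # POST `body` to <new-cluster>/_bulk with Content-Type: application/x-ndjson
    print(body, end="")
```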
| simple10 wrote:
| Ah thanks for the detail! Super useful comment.
| blueelephanttea wrote:
| > Anyone know if it's still a drop in replacement for
| Elasticsearch?
|
| As you point out it was forked a number of years ago so it
| started from the same place (7.10). Elasticsearch is now on
| 9.0+ and has 27,000 more commits than OpenSearch. So I doubt it
| is a drop-in replacement anymore.
|
| I have no idea how many of those 27K commits are key features,
| but it is clear divergence.
| ignoramous wrote:
| > Just learning about OpenSearch. Looks like ...
|
| _OpenSearch_ was once a personal search results aggregator
| conceived at A9 (Amazon's Silicon Valley subsidiary):
| https://github.com/dewitt/opensearch
| Blackthorn wrote:
| Sometimes, the same name refers to multiple things.
| aabhay wrote:
| Does anyone use OpenSearch for its knn and vector capabilities?
| Is it any good? It's always hard to know with systems like this
| whether it works at scale until your team is fighting fires.
| alex_duf wrote:
| It works with some caveats. I've seen it handle searches over
| millions of documents with no problem, but k-NN search requires
| loading the entire embedding graph into memory. So watch your
| RAM consumption.
|
| The quality of your results will depend mostly on the quality
| of your embeddings.
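To size RAM before loading a graph, the OpenSearch k-NN documentation gives a rule of thumb for HNSW of roughly `1.1 * (4 * dimension + 8 * m)` bytes per vector. This sketch assumes that formula and the default `m = 16`; check the docs for your version and engine before relying on it. For one million 768-dimensional float32 vectors it works out to about 3.5 GB (3.28 GiB):

```python
def hnsw_memory_bytes(num_vectors: int, dimension: int, m: int = 16) -> float:
    """Rough memory estimate for an HNSW graph over float32 vectors.

    Rule of thumb (assumed from the OpenSearch k-NN docs):
    1.1 * (4 * dimension + 8 * m) bytes per vector. The 4 * dimension term is
    the raw float32 vector; 8 * m approximates the neighbour-list storage.
    """
    return 1.1 * (4 * dimension + 8 * m) * num_vectors


# One million 768-dimensional embeddings with m = 16.
estimate = hnsw_memory_bytes(1_000_000, 768)
print(f"{estimate / 2**30:.2f} GiB")
```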
| seanhunter wrote:
| Irrespective of opensearch, if the dimension of your vector
| embedding is reasonably large you'll probably want an
| approximate nearest neighbours approach like HNSW rather than
| exact kNN.
|
| https://docs.opensearch.org/docs/1.2/search-plugins/knn/appr...
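For contrast, exact kNN has to score the query against every stored vector, so each query costs O(N * d) for N vectors of dimension d. A minimal brute-force sketch in Python using Euclidean distance (the metric and the toy data are arbitrary choices); this full scan is what HNSW's graph traversal avoids, at the price of sometimes missing a true neighbour:

```python
import heapq
import math


def exact_knn(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Exact k-nearest neighbours by brute force (Euclidean distance).

    Every stored vector is scored, so each query costs O(N * d);
    approximate indexes such as HNSW exist to avoid this full scan.
    """
    return heapq.nsmallest(
        k, range(len(vectors)), key=lambda i: math.dist(query, vectors[i])
    )


vectors = [[1.0, 1.0], [0.0, 0.5], [3.0, 3.0]]
print(exact_knn([0.0, 0.0], vectors, 2))  # [1, 0]
```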
|
| For whatever an endorsement from a random stranger is worth,
| we've been using opensearch for a vectordb for hybrid search
| across text and multimodal embeddings as well as traditional
| metadata, and it's been great, but we're not "full production"
| yet, so I can't really speak to scale. But it's OpenSearch, so I
| expect scale will most likely be fine.
| antirez wrote:
| I don't know about OpenSearch implementation, but recently I
| implemented from scratch Vector Sets for Redis using the HNSW
| as a data structure, and there are many other stores that use
| the same data structure. When HNSWs are well implemented, you
| can rest assured they scale very well for the task at hand, but
| you can expect insertion speed only on the order of a few
| thousand per second if you are hitting a _single_ HNSW. Reads
| are much faster: in Redis I get 80k/s easily (but it uses
| multiple cores).
|
| So if you want to build a very, very large index using HNSWs,
| you have to understand whether you normally have many writes
| that accumulate evenly, or whether your index is a mostly
| read-only thing
| that is rebuilt from time to time. Mass-insertion the first
| time is going to be very slow. You can parallelize it if you
| build N parallel HNSWs, since the searches can be composed as
| the union of the results (sorted by cosine similarity). But
| often the bottleneck is the embedding model itself.
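To put the insertion rate in perspective: at an assumed 3,000 inserts/s, a single graph would need about 3,333 s (roughly 55 minutes) for 10 million vectors, which is why splitting the load across N graphs helps. The query side of that split is a scatter-gather: take the top k from each graph and keep the best k of the union by cosine similarity. A minimal Python sketch of the merge, with a sequential brute-force scan standing in for each HNSW query (function names and data are hypothetical):

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def search_shard(shard: list[tuple[str, list[float]]], query: list[float], k: int):
    """Stand-in for one HNSW query: brute-force top k of (similarity, id)."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in shard]
    return heapq.nlargest(k, scored)


def search_all(shards, query: list[float], k: int) -> list[str]:
    """Query every shard, union the hits, keep the global top k by similarity."""
    candidates = [hit for shard in shards for hit in search_shard(shard, query, k)]
    return [doc_id for _, doc_id in heapq.nlargest(k, candidates)]


shards = [
    [("a", [1.0, 0.0]), ("b", [0.0, 1.0])],
    [("c", [1.0, 0.1]), ("d", [-1.0, 0.0])],
]
print(search_all(shards, [1.0, 0.0], k=2))  # ['a', 'c']
```

Because each shard returns its own top k, the true global top k is always among the candidates when the per-shard search is exact; with real HNSW graphs each list is already approximate, so the merged result is too.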
|
| What is really not super scalable is the _size_ of HNSWs. Their
| memory use is big (Redis by default uses 8-bit quantization
| for this reason), and on disk they require seeks. If you have
| large vectors, like 1024 components, quantization is a must.
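The memory arithmetic behind that advice: a 1024-component float32 vector takes 4096 bytes, while the same vector at 8 bits per component takes 1024 bytes plus a scale factor. A minimal Python sketch of one common scheme, symmetric per-vector int8 quantization; this is an illustration and not necessarily what Redis Vector Sets implement:

```python
from array import array


def quantize_int8(vec: list[float]) -> tuple[list[int], float]:
    """Symmetric 8-bit quantization: the largest |component| maps to +/-127."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid dividing by zero for an all-zero vector
    return [round(x / scale) for x in vec], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Approximate reconstruction; each component is off by at most scale / 2."""
    return [v * scale for v in q]


dims = 1024
float32_bytes = array("f", [0.0] * dims).itemsize * dims  # 4 bytes per component
int8_bytes = array("b", [0] * dims).itemsize * dims       # 1 byte per component
print(float32_bytes, int8_bytes)  # 4096 1024
```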
| binarymax wrote:
| I use it all the time. If it's "good" depends more on your
| model for embeddings, but you do need to know a bit to tune the
| index. Whatever algo you choose, read the paper.
|
| If you're using Lucene HNSW, it will scale but will eat lots
| and lots of heap RAM. If you're using the FAISS or nmslib
| plugins, keep an eye on JNI RAM consumption as well, since it's
| outside the heap.
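In OpenSearch the engine is chosen per field in the `knn_vector` mapping. A sketch of an index-creation body as a Python dict, with field names based on the OpenSearch 2.x k-NN plugin docs (verify against the 3.0 docs before use); the field name `embedding`, the dimension, the `l2` space type, and the `m`/`ef_construction` values are illustrative assumptions:

```python
import json


def knn_index_body(field: str, dimension: int, engine: str = "lucene") -> dict:
    """Index-creation body for an HNSW-backed knn_vector field.

    Per the comment above, the engine choice decides whether graph memory
    shows up on the JVM heap ("lucene") or as native JNI memory ("faiss").
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "space_type": "l2",
                        "engine": engine,
                        "parameters": {"m": 16, "ef_construction": 128},
                    },
                }
            }
        },
    }


print(json.dumps(knn_index_body("embedding", 768, engine="faiss"), indent=2))
```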
|
| Overall, I'd say that it is a challenge to easily scale ANN
| past 100M vectors unless it's given significant attention from
| the team.
| unethical_ban wrote:
| I just want a quick log ingestion tool that can parse syslog
| easily and graph/search fields for me.
|
| Setting up a simple log ingestion on Opensearch or ELK felt like
| a true journey, in a bad way.
| binarymax wrote:
| It's surprising how challenging this is for both Elastic and
| Opensearch. The problem is that it's all configuration and no
| convention, so you need to roll everything yourself. There
| should be prescribed recipes to make this simpler. If you're
| using something like OpenTelemetry you can find help more easily,
| but it's still annoying.
| nullify88 wrote:
| It's possible, but you need to buy into the Elastic
| ecosystem. Stuff like *beats, logstash, etc, they can
| configure all sorts of index templates, and ingest pipelines
| depending on what you've configured it to receive.
|
| These days, getting data in and out of Elasticsearch is quite
| easy with dynamic field mapping. It's keeping it performant
| that's tricky.
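One way to keep dynamic mapping from getting expensive is an index template that decides up front how new fields are mapped. A sketch of a composable index template body (the `_index_template` API exists in Elasticsearch 7.8+ and in OpenSearch) as a Python dict; the template pattern `syslog-*`, the field names, and the shard and replica counts are hypothetical:

```python
import json

# Body for PUT _index_template/<name>. Pattern, fields and counts are hypothetical.
syslog_template = {
    "index_patterns": ["syslog-*"],
    "template": {
        "settings": {"number_of_shards": 1, "number_of_replicas": 1},
        "mappings": {
            # Any new string field is mapped as keyword only, instead of the
            # default dynamic mapping of text plus a .keyword sub-field.
            "dynamic_templates": [
                {
                    "strings_as_keyword": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword", "ignore_above": 1024},
                    }
                }
            ],
            # Fields that need full-text search or date handling are declared explicitly.
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {"type": "text"},
            },
        },
    },
}

print(json.dumps(syslog_template, indent=2))
```

By default, a dynamically mapped string becomes a `text` field with a `.keyword` sub-field, so every string is indexed twice; mapping unknown strings as `keyword` only trades full-text search on those fields for smaller indexes.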
| dbacar wrote:
| I think both these tools are more on the easy side of setting
| up if you follow their guidelines. You can be up and running
| very quickly. The problems arise when you need some custom
| logic in processing log files. If you have simple shipping
| requirements you can bypass Logstash altogether. Elastic and
| OpenSearch are not the right tools for application metrics,
| though, in my opinion; for that use case just use Prometheus
| and Grafana.
| wingmanjd wrote:
| Have you tried out Graylog? Their core product does pretty
| decently at my $DAYJOB.
___________________________________________________________________
(page generated 2025-05-07 23:01 UTC)