[HN Gopher] We upgraded an old, 3PB large, Elasticsearch cluster...
___________________________________________________________________
We upgraded an old, 3PB large, Elasticsearch cluster without
downtime
Author : ollieparsley
Score : 108 points
Date : 2022-11-11 15:54 UTC (7 hours ago)
(HTM) web link (underthehood.meltwater.com)
(TXT) w3m dump (underthehood.meltwater.com)
| metadat wrote:
| I've heard horror stories from friends about working at
| meltwater. Setting that aside for a moment, this is an amazing
| software engineering achievement.
|
| Pulling off this level of scale with Elasticsearch is no easy
| feat and very impressive from a technical perspective. When
| you're running ES with petabytes of mission critical data as a
| core service powering the universe of a business, cluster
| rebuilds aren't an option (or maybe they are, as a last resort,
| but absolutely will not be acceptable on an ongoing basis).
|
| Relying on Elasticsearch mega-clusters in this manner is akin to
| running an ultra-marathon with a really sharp pair of scissors
| glued in each hand. Or maybe even more extreme than my
| (admittedly lame) analogy.
|
| Running nodes with such high shard counts is an appreciably
| precarious proposition, because there is a fair amount of
| overhead in the Elasticsearch management protocol. I wonder what
| the performance testing strategy entailed.
|
| I have a lot of respect for the engineers working to make this
| project and service a success story. When it comes to
| Elasticsearch at scale, such outcomes are the exception.
| karlney wrote:
| Thank you for those kind words.
|
| And yes we have had our fair share of pain with the old cluster
| for sure.
|
| The new version (7.17) is still behaving a lot better so far
| and feels a lot more predictable.
| trendy0 wrote:
| What were the stories?
| karlney wrote:
| One time, a few years ago, a particularly nasty query was
| executed over and over again, and it took a few hours to find
| it and then block it.
|
| During that time so many nodes had become slow and
| unresponsive that another (for us) previously unseen memory
| leak started to occur.
|
| Nodes kept building up queues of unanswered ping requests on
| them. And the requests contained our 100 MB cluster state,
| so the heaps filled up and even more nodes became
| unresponsive.
|
| And from then on the whole thing turned into a death spiral
| of doom.
|
| After trying, and failing, to get it under control for 48
| hours, we gave up and rebuilt the whole cluster from scratch,
| using the snapshots we store on S3.
|
| The recovery took another 90 hours or so. That was not a fun
| week.
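
A rough back-of-the-envelope sketch of the failure mode described above: if every queued ping request holds its own copy of the cluster state, only a few hundred of them are needed to exhaust a node's heap. The cluster state is taken as roughly 100 megabytes, following the comment; the 30 GiB heap size and the function name are assumptions made purely for illustration, since the thread does not state them.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def pings_to_fill_heap(heap_bytes: int, cluster_state_bytes: int,
                       headroom: float = 0.0) -> int:
    """Return how many queued requests fit in the heap if each one
    holds a full copy of the cluster state.

    headroom is the fraction of the heap already used by everything else.
    """
    usable = heap_bytes * (1.0 - headroom)
    return int(usable // cluster_state_bytes)


CLUSTER_STATE = 100 * MB  # size mentioned in the comment above
ASSUMED_HEAP = 30 * GB    # assumption: the heap size is not given in the thread

if __name__ == "__main__":
    print(pings_to_fill_heap(ASSUMED_HEAP, CLUSTER_STATE))
    print(pings_to_fill_heap(ASSUMED_HEAP, CLUSTER_STATE, headroom=0.5))
```

Under these assumptions a backlog in the low hundreds of pings per node is enough, which is consistent with slow nodes dragging down their neighbours.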
| andrelaszlo wrote:
| I usually don't explain my downvotes. I thought that your
| comment was good overall, but the "horror stories from friends
| about working at meltwater" without explaining what they are
| just makes it a bit unfair.
|
| As criticism, it's very vague, and as someone who doesn't work
| at Meltwater (for the last 5 years or so at least) it doesn't
| give me any information either. Well except that there are
| rumors about Meltwater, but that would be true about any large
| corporation.
|
| Maybe I misunderstood and the horror stories were about ES, but
| I got it as being about the company itself. Could you expand?
| What type of stories? :)
| krallja wrote:
| While we're telling ES war stories:
|
| FogBugz was still on twelve Elasticsearch 1.6 nodes when I left
| in 2018. We also had a custom plugin (essentially requesting
| facets that weren't stored in Elasticsearch back from FogBugz),
| which was the main reason we hadn't spent much time thinking
| about upgrading it. To keep performance adequate, we scheduled
| cache flush operations that, even at the time, we knew were
| pants-on-head crazy to be doing in production. I can't remember
| if we were running 32-bit or 64-bit with Compressed OOPs.
|
| Kiln was on an even older version, v1.4 if I remember correctly.
| And one of the shards had a corruption warning, yet it didn't
| seem to affect stability or results. But that wasn't a fun
| cluster to operate, since it refused to do certain types of
| maintenance because of the supposed corruption.
|
| Hopefully the newer versions are easier to migrate between. I
| don't remember what exactly was preventing us from upgrading, but
| I'm sure part of it was wanting to avoid a full reindex.
| rjh29 wrote:
| It's good to hear stories of real-world systems. If you only
| look at blog posts you get the idea that everyone is doing
| everything perfectly, but of course it's not really like that
| at all...
| andrelaszlo wrote:
| Congrats on finishing that monster migration!
|
| > In order to control how queries are executed, we have built a
| plugin which exposes a set of custom query types. We use these
| query types to provide functionality and performance
| optimisations not available in stock Elasticsearch. For example,
| we have implemented wildcards within phrases, with support for
| executing within SpanNear queries. We optimise "*" to a match-
| all-query. And a whole lot of other things.
|
| Did you port your in-house plugins? Seems like a big blocker.
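
To make the quoted `"*"` optimisation concrete, here is a minimal sketch of that kind of rewrite expressed on Elasticsearch query DSL dictionaries. It is not Meltwater's plugin, which the article describes as custom query types inside Elasticsearch; `rewrite_wildcard` is a hypothetical helper. Note that the rewrite is strictly equivalent only when every document has a value for the field.

```python
from typing import Any, Dict


def rewrite_wildcard(query: Dict[str, Any]) -> Dict[str, Any]:
    """Replace a wildcard clause whose pattern is just "*" with match_all.

    Handles both DSL forms:
      {"wildcard": {"field": {"value": "*"}}}
      {"wildcard": {"field": "*"}}
    Any other query is returned unchanged.
    """
    wildcard = query.get("wildcard")
    if isinstance(wildcard, dict) and len(wildcard) == 1:
        (spec,) = wildcard.values()
        pattern = spec.get("value") if isinstance(spec, dict) else spec
        if pattern == "*":
            # A bare "*" scans every term of the field; match_all avoids that.
            return {"match_all": {}}
    return query


if __name__ == "__main__":
    print(rewrite_wildcard({"wildcard": {"body": {"value": "*"}}}))
    print(rewrite_wildcard({"wildcard": {"body": {"value": "elastic*"}}}))
```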
| karlney wrote:
| Thank you. Yes it was a massive project.
|
| I don't want to spoil the other blog posts but we managed to
| solve almost all of our custom use cases without modifying
| Elasticsearch itself. We still have one custom plugin but only
| to enhance functionality, not for performance and stability
| reasons.
| semi-extrinsic wrote:
| While I fully understand why you run this thing with 300+
| nodes as you do, I have to wonder, just for fun - could you
| actually fit this whole thing on a single large server? Looks
| like something with 16 TiB RAM and 2 PiB SSD storage is
| actually a server you could theoretically buy today?
| karlney wrote:
| We feel that ~300 nodes strikes a good balance in the
| cattle vs pets philosophy.
|
| Going up to i3en.12xlarge (or equivalent) would probably
| have worked as well.
|
| But after that the impact of losing just one node would be
| too big.
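
The trade-off in the comment above can be put in numbers with a small sketch. It assumes the 3 PB from the title is spread evenly over the nodes and ignores replicas; the 75-node comparison and the helper name are hypothetical and not taken from the thread.

```python
from typing import Tuple

TB = 10 ** 12
PB = 10 ** 15


def per_node_share(total_bytes: int, node_count: int) -> Tuple[float, float]:
    """Return (bytes held per node, fraction of the cluster lost with one node),
    assuming data is spread evenly across nodes."""
    return total_bytes / node_count, 1.0 / node_count


if __name__ == "__main__":
    for nodes in (300, 75):  # 75 is a made-up "fewer, bigger nodes" scenario
        size, fraction = per_node_share(3 * PB, nodes)
        print(f"{nodes} nodes: {size / TB:.0f} TB per node, "
              f"{fraction:.2%} of the data affected by one node loss")
```

With fewer, larger nodes, every single failure means more data to re-replicate and a larger share of capacity gone at once.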
| andrelaszlo wrote:
| Cool! Will stay tuned for the next post :)
| permb wrote:
| Such an amazing engineering team that the world doesn't know
| about (based in Gothenburg, Sweden).
|
| Disclaimer: I was once part of it
| taf2 wrote:
| I did an upgrade with the team from 1.7 to 7.5.2 a few years ago.
| We used Terraform to build the 7.5.2 cluster with about 28 nodes.
| First we did a snapshot to upgrade the data from 1.7 to 2.4, and
| we synced by having our applications write to both. To get them
| to a synced state right before snapshotting, we set a Redis key
| that told our application servers to start writing every document
| changed or created to a Redis set, so we would have a set of all
| things changed since the snapshot. This was to account for the
| time between snapshotting and getting the new cluster up. Once we
| had the set of changes synced, we could test queries by switching
| a customer account to read from 2.4 via another Redis set of
| upgrade accounts. Once we were confident and saw no new
| deprecations, we did the process again for 5.6 and then 7.5. As
| I recall, we could skip 6.x. It was an intense few weeks but
| definitely worth it for us. We also cleaned up our deployment to
| have a dedicated set of master, data and client nodes.
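
The snapshot-plus-change-set procedure described above can be sketched roughly as follows. This is an illustration only, not taf2's code: plain dicts and sets stand in for the two Elasticsearch clusters and the Redis key and sets, every class and method name is hypothetical, and deletes and concurrent writers are ignored.

```python
from typing import Dict, Optional, Set


class DualWriteMigrator:
    """In-memory stand-in for the migration flow described in the comment."""

    def __init__(self) -> None:
        self.old: Dict[str, dict] = {}          # stand-in for the old cluster
        self.new: Dict[str, dict] = {}          # stand-in for the new cluster
        self.tracking = False                   # stand-in for the Redis key
        self.changed_ids: Set[str] = set()      # stand-in for the Redis change set
        self.dual_write = False                 # enabled once the clusters agree
        self.upgrade_accounts: Set[str] = set() # accounts that read from `new`

    def write(self, doc_id: str, doc: dict) -> None:
        self.old[doc_id] = doc
        if self.dual_write:
            self.new[doc_id] = doc
        elif self.tracking:
            # Remember what changed after the snapshot was taken.
            self.changed_ids.add(doc_id)

    def snapshot(self) -> Dict[str, dict]:
        # Start tracking right before snapshotting so no change is missed.
        self.tracking = True
        return dict(self.old)

    def restore(self, snap: Dict[str, dict]) -> None:
        self.new = dict(snap)

    def replay_changes(self) -> int:
        """Copy every document changed since the snapshot, then dual-write."""
        replayed = 0
        while self.changed_ids:
            doc_id = self.changed_ids.pop()
            self.new[doc_id] = self.old[doc_id]
            replayed += 1
        self.dual_write = True
        return replayed

    def read(self, account: str, doc_id: str) -> Optional[dict]:
        cluster = self.new if account in self.upgrade_accounts else self.old
        return cluster.get(doc_id)


if __name__ == "__main__":
    m = DualWriteMigrator()
    m.write("a", {"v": 1})
    snap = m.snapshot()
    m.write("a", {"v": 2})   # lands in the change set
    m.restore(snap)
    print(m.replay_changes(), m.new == m.old)
```

Routing reads per account, as in the comment, lets a single customer be switched to the new cluster and back without touching anyone else.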
| yeldarb wrote:
| Apologies for the shameless plug, but it strikes me that this might
| be the most relevant place on the Internet right now to reach a
| bunch of Elasticsearch experts who might be interested... we're
| using Elasticsearch to index over 100M images for multimodal
| vector search & looking to expand our team:
| https://www.ycombinator.com/companies/roboflow/jobs/fYL4yzG-...
| endisneigh wrote:
| Is there no other search database that can be persisted other
| than Elastic/Lucene/Solr?
|
| I get that there's little money to be made in these things but
| it's surprising. Seems like most full-text search options are relatively
| simple plug-ins to existing databases or in memory only.
| bratao wrote:
| Yes, there is. We moved from ES to Vespa (vespa.ai) and never
| looked back. We got better results, speed and WAY lower
| maintenance costs. I really don't understand how underrated
| this project is.
| murkt wrote:
| How do you deal with Vespa's query language, YQL?
| bratao wrote:
| I was also suspicious: "Great, another one that wants to
| reinvent SQL". But in practice it works very well, to the
| point that I enjoy it.
| morelisp wrote:
| Nearly a decade ago (oh god) I converted some overdesigned
| five-node ES mess to https://github.com/mchaput/whoosh. It's
| (obviously) not the fastest or anything, but it was more than
| good enough for low-dozens of GBs of mostly static data.
___________________________________________________________________
(page generated 2022-11-11 23:01 UTC)