[HN Gopher] Zero downtime Postgres migration, done right
___________________________________________________________________
Zero downtime Postgres migration, done right
Author : rigaspapas
Score : 281 points
Date : 2021-06-11 14:25 UTC (8 hours ago)
(HTM) web link (engineering.theblueground.com)
(TXT) w3m dump (engineering.theblueground.com)
| ckboii89 wrote:
| does anyone know if this works when the target database is a
| replica/standby? The downside of using pg_dump is that it
| acquires a lock on the table it's dumping, and doing this on
| production may cause some slowness
| jeffbee wrote:
| Bucardo has no performance impact, it just adds a trigger to
| every mutation. Negligible!
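|
| For a concrete sense of that overhead, here is a minimal,
| hypothetical sketch (not Bucardo's actual schema; the table and
| function names are made up) of the kind of row-change trigger a
| trigger-based replicator attaches to every table, so that every
| write also pays for an extra insert into a delta table:
|
|   import psycopg2
|
|   # Every INSERT/UPDATE/DELETE on "users" now also writes a row
|   # to "delta_log" -- that extra write is the latency cost.
|   DDL = """
|   CREATE TABLE IF NOT EXISTS delta_log (
|       tbl text, op text, row_data jsonb,
|       captured_at timestamptz DEFAULT now()
|   );
|   CREATE OR REPLACE FUNCTION capture_delta() RETURNS trigger AS $$
|   BEGIN
|       INSERT INTO delta_log (tbl, op, row_data)
|       VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(COALESCE(NEW, OLD)));
|       RETURN COALESCE(NEW, OLD);
|   END;
|   $$ LANGUAGE plpgsql;
|   DROP TRIGGER IF EXISTS users_delta ON users;
|   CREATE TRIGGER users_delta AFTER INSERT OR UPDATE OR DELETE ON users
|       FOR EACH ROW EXECUTE PROCEDURE capture_delta();
|   """
|
|   with psycopg2.connect("dbname=app host=old-db") as conn:
|       with conn.cursor() as cur:
|           cur.execute(DDL)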
|
| I really think articles of this kind are not useless, but they
| need to explicitly narrow their audience to set expectations at
| the start. This particular topic is PG replication for users
| who aren't very sensitive to write latency.
| Nextgrid wrote:
| Internet Archive link in case the original is overloaded:
| https://web.archive.org/web/20210611143214/https://engineeri...
| jlmorton wrote:
| If you are in AWS, or have connectivity available, AWS Database
| Migration Service makes this relatively trivial.
|
| DMS for Postgres is based on Postgres Logical Replication, which
| is built-in to Postgres, and the same thing Bucardo is using
| behind the scenes. But AWS DMS is very nearly point-and-click to
| do this sort of migration.
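|
| As a rough illustration (a sketch, not a complete DMS setup; the
| ARNs are placeholders and the endpoints/instance are assumed to
| exist already), the migration boils down to one
| full-load-plus-CDC replication task via boto3:
|
|   import json
|   import boto3
|
|   dms = boto3.client("dms", region_name="us-east-1")
|
|   # Include every table in every schema.
|   table_mappings = json.dumps({"rules": [{
|       "rule-type": "selection", "rule-id": "1",
|       "rule-name": "include-all",
|       "object-locator": {"schema-name": "%", "table-name": "%"},
|       "rule-action": "include",
|   }]})
|
|   task = dms.create_replication_task(
|       ReplicationTaskIdentifier="pg-zero-downtime-migration",
|       SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",    # placeholder
|       TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",    # placeholder
|       ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",  # placeholder
|       MigrationType="full-load-and-cdc",  # bulk copy, then stream changes
|       TableMappings=table_mappings,
|   )
|   dms.start_replication_task(
|       ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
|       StartReplicationTaskType="start-replication",
|   )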
| chousuke wrote:
| The key here is Postgres 9.5. AWS DMS does not support it
| because DMS requires logical replication support.
|
| A few years back I migrated a PostgreSQL 9.2 database to AWS
| and wasn't able to use RDS because logical replication was not
| available.
|
| I did try to use Bucardo but ultimately didn't trust myself to
| configure it such that it wouldn't lose data (my first attempt
| left nearly all BLOBs unreplicated, because the data isn't
| actually in the tables you set triggers on).
|
| Physical replication to a self-built instance was easy, I was
| 100% confident it wouldn't be missing data, and the downtime
| from cutover was about 15 minutes (it involved restarting piles
| of slow Java applications).
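|
| (A sanity check along these lines would have caught that BLOB
| problem early -- large objects live in the pg_largeobject
| catalog, not in the user tables that triggers watch. A minimal
| sketch, with hypothetical DSNs:)
|
|   import psycopg2
|
|   def large_object_count(dsn):
|       # pg_largeobject_metadata has one row per large object.
|       with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
|           cur.execute("SELECT count(*) FROM pg_largeobject_metadata;")
|           return cur.fetchone()[0]
|
|   old = large_object_count("host=old-db dbname=app")
|   new = large_object_count("host=new-db dbname=app")
|   assert old == new, f"BLOBs missing on target: {old} != {new}"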
| booleanbetrayal wrote:
| I think DMS generally lags RDS releases. Last I checked, there
| still wasn't a way to replicate from Postgres RDS 12 -> 13 with
| DMS.
| endisneigh wrote:
| I wonder how much easier software engineering would be if there
| were a period where things are simply not available.
|
| Which problems that are currently very difficult would become
| trivial if 6 hours of downtime every Sunday were acceptable?
| 10PM-4AM EST
| legohead wrote:
| Easier, sure, but zero downtime migration is a fun engineering
| problem (albeit stressful). Just need an employer who is
| willing to pay for it.
| yupper32 wrote:
| I find it very hard to come up with a use case where a weekly 6
| hours of downtime at night EST would be acceptable.
|
| US government websites already often do this, and I find that
| completely unacceptable since these services need to be
| available to anyone, regardless of work and life schedules.
|
| Any website that's international would also suffer greatly for
| an EST centralized scheduled downtime.
|
| Maybe a very localized website that doesn't have much impact on
| real life?
| endisneigh wrote:
| Why wouldn't it be acceptable? People aren't available all
| the time either, nor are stores. In fact software is unique
| in its availability.
| yupper32 wrote:
| Frankly, because it can be. Weekly scheduled downtime is
| arbitrary, manufactured, and lazy.
|
| Of course, having to schedule the occasional downtime for a
| database migration is fine. There's probably a few times a
| year that you'd need to do it if you don't have the
| bandwidth to do fancy zero-downtime solutions. It's the
| weekly arbitrary downtime that I'm firmly against.
| endisneigh wrote:
| Why is it lazy? When things are up people have to work.
| Do you believe people should be working all of the time?
|
| I think regular downtime is only natural. If you had to
| choose between 95% availability and 100% availability apart
| from the aforementioned downtime, which would you choose?
| yupper32 wrote:
| Sorry I'm a bit confused by your argument.
|
| > When things are up people have to work.
|
| That's not true. Monitoring software and on-call
| rotations are well established things, and wouldn't even
| go away in your world. Believe it or not, people who run
| websites already sleep at night.
|
| > Do you believe people should be working all of the
| time?
|
| I'm not sure where you got that. Besides, you're already
| having people in the office in the middle of the night
| every Sunday. Or are you saying that the downtime is when
| your team gets to sleep?
|
| > If you had to choose between 95% availability or 100%
| availability other than the before mentioned downtime
| which would you choose?
|
| Again, not sure what you're getting at. There's only a
| few reasons why you'd need to schedule a downtime.
| Database migrations are a good one. All of the reasons
| should only happen a few times a year, likely. Way better
| than 52 times a year.
| endisneigh wrote:
| I'm not making a particular argument. I'm just wondering
| if having regular downtime is better or worse in terms of
| reducing engineering complexity.
| kabdib wrote:
| What part of the world?
|
| One service I worked on (with customers around the globe) had
| about 10 minutes of scheduled downtime a week. Peak user load
| was, interestingly enough, during our core hours, when there
| were a lot more hands on deck to handle any issues. So that's
| the time we chose.
|
| Naturally people still complained bitterly about that ten
| minutes.
|
| Getting to zero didn't seem worth the significant engineering
| effort required. "You're not Amazon, and you're not 911"
| [emergency number in the US]
| doctor_eval wrote:
| Setting aside the other comments, I think it depends on your
| definition of "easier".
|
| Yes - it is easier in terms of up-front investment to have a
| scheduled (daily, weekly) downtime. But that often means the
| downtime is not automated - it's someone doing the work, often
| out of hours when they're not fresh, things can go wrong, etc.
| And it also means that the duration of the work is limited to
| the scheduled downtime. Some large upgrades just won't fit in
| the allotted time, and you need to schedule an outage.
|
| On the other hand, creating a system designed for zero downtime
| (ie, in-place upgrades, removal of big batch jobs etc) is a lot
| more of an investment up front, and makes upgrade planning more
| complex, but ultimately it saves time and effort because you
| can do all the changes during working hours, and "upgrades"
| become just how you deploy code - part of the core skill set of
| your team - and you do them 10 times a day.
|
| The main difference that I've seen is that the scheduled
| downtime approach tends to create a much more conservative
| upgrade cycle, much bigger upgrades, and more risk of an
| upgrade failing - which means that a pile of features don't get
| released. Also, certain customers will push back and demand a
| ton of documentation, detailed release notes, and advance
| warning of the upgrades etc - this creates loads of
| unproductive work for a small company. A continuous deployment
| model tends to reduce these problems because you're not
| externalising your upgrade schedule and you don't have a single
| big upgrade that can go bang, so maybe one feature doesn't get
| released but everything else does.
|
| Having taken both approaches (and even moved a company from
| scheduled to continuous) I would never go back to a scheduled
| downtime approach.
| maccard wrote:
| Your team is now working 10pm to 4am est every Sunday, and
| anyone who uses your services in any time zone around the world
| is without your service. 10pm est is lunch time in Australia.
| 4am est is working hours in many parts of western Europe.
|
| You don't have to be huge to have customers all around the
| world.
| yifanl wrote:
| I think the idea is that this world is one where the customer
| accepts that there may be an outage every Sunday because
| feature work can be done far more cheaply.
| endisneigh wrote:
| You're correct, but if there happened to be an outage during
| that time now you'd have to have your team working then
| anyway.
|
| The existence of a planned downtime doesn't necessarily mean
| deployments, or work, have to happen during that time.
| yupper32 wrote:
| > but if there happened to be an outage during that time
| now you'd have to have your team working then anyway.
|
| So you're having them work late 52 times a year AND when
| there's an outage vs only when there's an outage. They're
| working late WAY more in your plan.
| endisneigh wrote:
| The existence of planned downtime doesn't mean they have
| to work during said time. I'm not following your
| reasoning
| kabdib wrote:
| Er, you bring the service down so you can do work on it.
| _Somebody_ is doing that work.
|
| I guarantee that you'll have trouble hiring qualified
| people for a role that includes core hours of "2AM-5AM
| Sunday mornings" (or whatever other oddball time you've
| decided). Updating systems is precisely when you may need
| to have all-hands-on-deck responses. Been there, multiple
| times.
| sigstoat wrote:
| they might not have to work during every planned
| downtime, but they're going to have to work during some
| of them, or what was the point of the downtime?
| endisneigh wrote:
| The point is to create a predictable period in which
| downtime, if any, would occur
| xmprt wrote:
| This predictability only matters for frequent, repeat
| customers. But those are the exact customers that you
| could just email saying "the website is going to be down
| on Sunday for 1 hour for planned maintenance" a few times
| a year when it's actually required.
|
| If you run a business (eg. HFT where you run certain
| cleanup scripts after trading hours) where regular
| planned downtime is fine then go for it but most of the
| time, you don't need a downtime schedule every single
| week because you're not doing planned maintenance work
| every single week.
| endisneigh wrote:
| I think people are getting too caught up in my particular
| example. It could be a month, biweekly, etc.
| i_have_an_idea wrote:
| Interestingly, this is unacceptable in the US and probably most
| of Western Europe. However, I know of more than one bank in
| Eastern Europe where online banking simply doesn't work after
| midnight, until about 6am.
| ivan888 wrote:
| I didn't read this article, but I really hate the tagline "done
| right". It expresses such a poor sense of humility, which is
| one of the most important traits - perhaps the most important -
| in the world of software
| doctor_eval wrote:
| Yeah I got the same feeling but couldn't quite put my finger on
| it.
|
| "Here's how we did this. Feedback welcome" would seem more
| appropriate.
|
| Perhaps it doesn't fit the hyper-aggressive mould of the
| would-be titan of industry.
|
| That said their engineering blog seems to be down so
| ¯\_(ツ)_/¯
| dkhenry wrote:
| This is one of the areas where Postgres is so far behind MySQL
| that it's embarrassing. Zero-downtime migrations in MySQL have
| been a common method for over 10 years. This solution is far
| from ideal due to its use of triggers, which can greatly
| increase the load on the database and slow down transactions.
| If you don't have a lot of load on your DB that's fine, but if
| you are pushing your DB this will bring it down.
|
| In MySQL they started with triggers with pt-osc, but now there
| is gh-ost, which does it with replication. You can do something
| like this with Postgres by using logical replication, but it
| still requires hand-holding, and to my knowledge there is no way
| to "throttle" the migration like you can with gh-ost. Where I
| work now we are building all this out so we can have first-class
| online migrations, but the chasm is still pretty big.
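|
| For reference, the native logical replication route looks roughly
| like this -- a sketch assuming PG 10+, sufficient privileges, and
| hypothetical hosts (the missing piece, as noted above, is
| throttling):
|
|   import psycopg2
|
|   # On the old (source) server: publish every table.
|   src = psycopg2.connect("host=old-db dbname=app user=admin")
|   src.autocommit = True
|   with src.cursor() as cur:
|       cur.execute("CREATE PUBLICATION migration_pub FOR ALL TABLES;")
|
|   # On the new (target) server: subscribe. CREATE SUBSCRIPTION
|   # cannot run inside a transaction block, hence autocommit.
|   dst = psycopg2.connect("host=new-db dbname=app user=admin")
|   dst.autocommit = True
|   with dst.cursor() as cur:
|       cur.execute("""
|           CREATE SUBSCRIPTION migration_sub
|           CONNECTION 'host=old-db dbname=app user=replicator'
|           PUBLICATION migration_pub;
|       """)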
| merb wrote:
| > but it still requires hand-holding, and to my knowledge
| there is no way to "throttle" the migration like you can with
| gh-ost
|
| eh? you just do a basebackup and then you can begin the logical
| replication. at some point you then do a failover?
|
| chtitux basically described the process, which is extremely
| simple.
| booleanbetrayal wrote:
| Has anyone here leveraged pglogical for this before? Looking at a
| similar migration but it has native extension support in AWS RDS.
| Would love to hear any success / horror stories!
| coolspot wrote:
| I just don't get all the "just shut down the db for an hour,
| you're not Netflix" comments.
|
| If you can do things properly, as an engineer, you absolutely
| should, even if your company serves "just" hundreds of thousands
| instead of hundreds of millions users.
|
| It is not like they wrote their own database for that, they just
| used an open source tool.
| mswtk wrote:
| My understanding of doing things "properly" as an engineer, is
| picking the solution with the right tradeoffs for my use case.
| If the cost to the business of having some amount of scheduled
| downtime occasionally is significantly less than the
| engineering cost of maintaining several 9s worth of
| availability over major migrations, then I consider the former
| to be "done right".
| perrygeo wrote:
| If you do things properly as an engineer, you've already
| negotiated and committed to service-level agreements with
| specific downtime objectives, right? Right?
|
| If you've got say 5 minutes of downtime budget per month, do
| you really think it's a good investment to spend a million
| dollars of engineering effort and opportunity costs to get
| database downtime to zero seconds? Or you could use off-the-
| shelf techniques for orders of magnitude less investment and
| suffer a few minutes of downtime at worst, well within your
| SLO. It's an economic decision, not a technical one.
|
| If zero downtime at any cost is non-negotiable, well you'd
| better hire, plan and budget accordingly. And hope those few
| extra minutes per year are worth the cost.
| coolspot wrote:
| I agree that it is an economic decision. In my understanding
| it didn't cost them millions of dollars to do it with zero
| downtime. Maybe $10k-$50k in development/admin/test hours
| to make it so.
|
| Also engineers get better (and happier) when they do
| challenging tasks!
| holoduke wrote:
| Looking for something similar for a mariadb setup. Anyone know
| of any resources?
| nick__m wrote:
| You can do multi-master replication with conflict detection
| using SymmetricDS. The migration would be similar to the
| procedure described in the article.
| mmanulis wrote:
| Can this be done using existing PostgreSQL functionality around
| replicas? I think there's a plugin for PostgreSQL that supports
| master-master replication as well.
| luhn wrote:
| Braintree (IIRC) had a really clever migration strategy, although
| I can't seem to find the blog post now. They paused all traffic
| at the load balancer, cut over to the new DB, and then resumed
| traffic. No requests failed, just a slight bump in latency while
| the LBs were paused.
|
| This app apparently had robust enough retry mechanisms that
| they were able to just eat the errors and not have customer
| issues--color me impressed! I'm not sure how many teams can
| make that claim; that's a hard thing to nail down.
| scrollaway wrote:
| I remember reading something like that about Adyen. That might
| be why you're unable to find it.
| nezirus wrote:
| pgbouncer has a PAUSE command, which can be used for seamless
| restarts/failovers or similar
| rattray wrote:
| I wonder if that was it (Adyen and Braintree are
| competitors).
|
| Here's the post:
| https://www.adyen.com/blog/updating-a-50-terabyte-postgresql...
|
| Discussed here: https://news.ycombinator.com/item?id=26535357
|
| They architected their application to be able to tolerate
| 15-30min of postgres downtime.
| simonw wrote:
| I think that's covered in this talk (I've not watched the video
| though):
| https://www.braintreepayments.com/blog/ruby-conf-australia-h...
| pgr0ss wrote:
| It was also covered in:
| https://www.braintreepayments.com/blog/switching-datacenters...
| luhn wrote:
| Aha, that's the post I was looking for!
| xyst wrote:
| probably had a good circuit breaker design. would love to read
| the article though.
| silviogutierrez wrote:
| Very interesting article. But I have to ask: would taking down
| the system for a couple of hours be _that_ bad?
|
| I looked at the company, and while they seem rather large,
| they're not Netflix or AWS.
|
| I imagine they need to be up for people to be able to check in,
| etc. But they could just block out the planned maintenance
| windows from check-in times far in advance. I'm sure there are
| a million other edge cases, but those can be thought out and
| weighed against the engineering effort.
|
| Don't get me wrong, this is very cool.
|
| But I wonder what the engineering cost was. I'd think easily in
| the hundreds of thousands of dollars.
| jeffbee wrote:
| The problem happens when you estimate it will only take a few
| hours and at the end of the first hour it's 1% done, or it
| nears completion and crashes, in a loop. Now what?
| [deleted]
| maccard wrote:
| I think it's always worth questioning both sides. Why is
| downtime acceptable? People on this site routinely complain
| about Windows needing a restart for system updates while
| boasting about their Linux servers' uptime.
|
| People talk about how kubernetes is overkill for many people,
| but it gives you rolling deployments for your applications out
| of the box.
|
| There's also the "slippery slope" argument. A 0 downtime
| migration means 0 downtime. A 5 minute migration creeps up to a
| 30 minute migration occasionally, and then regression to the
| mean happens, causing standard migrations to be 30 minutes.
| Daishiman wrote:
| A lot of 0-downtime migrations end up turning into a months-
| long ordeal and can still fail.
|
| Allowing maintenance windows means you can do things in a
| much simpler manner. Need to take a snapshot of a DB without
| running into issues with a production system adding data?
| Sure, go ahead, you just saved two months.
| hinkley wrote:
| It's the old can/should argument.
|
| The principle of something like zero downtime or switching
| a vendor out to save 20% may not stack up against the
| reality in opportunity costs, employee burnout, and
| customer turnover.
|
| My company is reorganizing and so everyone is trying to
| settle old philosophical debates during The Churn. But
| everyone is overextended and a number of us have started
| pushing back. Just the meetings to discuss changing vendors
| might push the break-even point out 6 months, before you
| even get into implementation costs. The more time we spend
| working on infrastructure instead of customer-facing
| features and bugs, the more likely those customers are to
| wander off.
| xmprt wrote:
| I think the slippery slope argument is almost always a
| fallacy. In practice you can say "we accept 99.99%
| availability" (which is about 1 hour every year) and budget
| for that. Your service might go down for unplanned reasons
| such as an outage but it could also go down for planned
| maintenance and as long as you're within that budget it seems
| ok.
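|
| (The budget arithmetic is easy to sanity-check; 99.99% works out
| to a bit under an hour a year:)
|
|   # Downtime allowed per year at each availability target.
|   for pct in (99.0, 99.9, 99.99, 99.999):
|       minutes = (1 - pct / 100) * 365.25 * 24 * 60
|       print(f"{pct}%: {minutes:,.1f} minutes/year")
|   # 99.99% -> ~52.6 minutes/year, i.e. roughly the hour quoted above.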
| munk-a wrote:
| > Why is downtime acceptable?
|
| At least where I'm working downtime is acceptable because our
| customer base is composed of professionals in north america
| that work 9-5 thus taking the DB offline for five minutes at
| midnight PST has essentially no cost to our client base. To
| contrast that, spending three months developing a rollout
| plan costs our company dearly in competitive advantage.
|
| I agree that it's always worth examining from both sides but
| I also think that 0 downtime migration is both a shiny
| interesting problem and a point of pride for developers that
| can cause a company to vastly over invest in a complex
| technical solution to a problem they never had.
| fullstop wrote:
| I work for a small company and it would be devastating if our
| database was down for a few hours.
| silviogutierrez wrote:
| Unplanned? Probably. But maybe it's not as dire as you
| think? Azure Active Directory (auth as a service) went down
| for a _while_, globally, and life went on. Same with GCP and
| occasionally AWS [1]
|
| I'm not saying there's no downside; I'm weighing it _against_
| the downside of engineering cost. And that itself carries a
| risk of failure when you go live. It's not guaranteed.
|
| [1] AWS is _so_ ubiquitous, though, that half the internet
| would be affected, which makes any one outage less
| individually noticeable.
| dividedbyzero wrote:
| So don't host on AWS, wait out their next big outage and
| then take down your app for that big migration, so you can
| hide among all the other dead apps. Half joking, of course.
| silviogutierrez wrote:
| That's brilliant. Have the PR ready to go and
| merge/deploy only when that happens. Just host on us-east-1
| and it'll happen eventually.
| jbverschoor wrote:
| I doubt that.. aws is down sometimes too. Card payments are
| sometimes down, even 112(911) is sometimes down.
|
| I hope you have everything redundant, including the CEO
| fullstop wrote:
| The database has a hot standby and two warm standby
| instances, redundant hardware, network infrastructure, all
| of that.
| xcambar wrote:
| > I hope you have everything redundant, including the CEO
|
| Made me laugh, I'll sure reuse it!
| hobs wrote:
| OTOH I worked for several small companies that had "business
| hours" software - if you had to take downtime on the weekend
| or after hours you'd be looking at impacting <1% of users.
| willcipriano wrote:
| In my view that's just saying at some point in the future the
| company will be devastated. Nothing has 100% uptime.
| chefkoch wrote:
| Then hopefully everything is HA in your company.
| fullstop wrote:
| It absolutely is, BGP with geographically diverse paths,
| databases, app servers, etc, are all redundant. It's hosted
| in-house, so there is a cold standby database in AWS which
| would only be used if, say, an aircraft crashed into our
| server rooms.
| ikiris wrote:
| It always amazes me when someone says they can't ever be
| down, and then says they only serve from one physical
| location actively.
|
| One of these things is not like the other.
| fullstop wrote:
| We have everything in place to run from AWS if needed but
| do not operate from there because of cost.
|
| This goes both ways -- there have been many AWS outages
| which have not affected us. I hear what you're saying,
| but we've had only one instance of extended (hours)
| downtime in the last 20 years.
| xcambar wrote:
| The more experienced I become, the more such down to
| earth solutions seem OMG SO MUCH MORE reasonable than all
| the bells and whistles of "modern" engineering practices.
| jnpatel wrote:
| Could you share what domain your company is in? Just
| curious what has such HA requirements
| sharadov wrote:
| pg_upgrade with the --link (hard link) option will upgrade the
| db in minutes; I've done a 2TB db in less than a minute. Yes,
| there is no rollback, but if you do enough testing prior it can
| be really smooth. Am in the middle of doing this across an
| environment where we have tens of hosts, from a few GB to a
| couple TB.
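|
| For the curious, the invocation is roughly this (a sketch with
| made-up paths and versions; both clusters must be stopped first):
|
|   import subprocess
|
|   # --link hard-links data files into the new cluster instead of
|   # copying them, which is why a multi-TB upgrade takes minutes.
|   # Once the new cluster starts, the old one must not be used
|   # again -- that's the "no rollback" caveat.
|   subprocess.run([
|       "/usr/lib/postgresql/13/bin/pg_upgrade",
|       "--old-bindir", "/usr/lib/postgresql/12/bin",
|       "--new-bindir", "/usr/lib/postgresql/13/bin",
|       "--old-datadir", "/var/lib/postgresql/12/main",
|       "--new-datadir", "/var/lib/postgresql/13/main",
|       "--link",
|   ], check=True)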
| zozbot234 wrote:
| You could take a filesystem-level snapshot for rollback.
| data_ders wrote:
| way cool!
|
| fyi the hyperlink "found here" to setup_new_database.template is
| broken.
| andrewmcwatters wrote:
| What is the equivalent for MySQL?
| tbrock wrote:
| It's insane that it has to be this complex and require
| third-party software to accomplish...
|
| Most modern rdbms/nosql database vendors allow a rolling upgrade
| where you roll in new servers and roll out the old ones
| seamlessly.
|
| Also, the fact that AWS RDS doesn't do this with zero downtime
| by default, through automating it this way, is crazy. Why pay
| for hosted when the upgrade story is incomplete? Take downtime
| to upgrade a DB in 2021? Everyone must be joking.
| Pxtl wrote:
| Well, Postgres is F/OSS, so I expect the solution to problems
| to be "lots of small tools"... but I see the same kind of
| herculean battle-plans for MS SQL Server work, and I get
| shocked. That is a paid product, what on Earth are we paying
| for?
| marcosdumay wrote:
| > what on Earth are we paying for?
|
| The answer is the same for any relational DBMS: lock-in.
| worewood wrote:
| You're paying for an "if something goes wrong and it's the
| software's fault you can sue us" license
| senko wrote:
| You're not. Approximately 25% of SQL Server EULA text deals
| with ways in which the warranty is limited. The best the
| license gives you is your money back if you prove that it
| was the software's fault.
|
| Of course, you can always sue. A $2T company. Good luck
| with that.
| doctor_eval wrote:
| Went to post the same thing but HN had a little downtime
| burp.
|
| Would only add that the main reason people buy these
| licenses, apart from familiarity, is so they can shift
| the blame if something goes wrong. It's all about ass
| covering.
| salmo wrote:
| Well, this and:
|
| 1) the illusion of "support"
|
| 2) OSS-fearing companies' love of indemnification
| 0xbadcafebee wrote:
| > Most modern rdbms/nosql database vendors
|
| "Modern" = "not a traditional RDBMS". Expecting your 1983
| Toyota Corolla to _fly_ , regardless of the fact that it's now
| 2021, is unrealistic. But personal helicopters have been around
| for a while.
| defaultname wrote:
| This is a complex task, fraught on _any_ platform. Most systems
| base such clustering on a shared filesystem, which is obviously
| orthogonal to the shared nothing designs most try to pursue.
| rhacker wrote:
| "the zero downtime" + this site is definitely down is making me
| laugh.
| zild3d wrote:
| next post "zero downtime static site"
| chtitux wrote:
| If you can afford a one-off ~1 second of latency for your SQL
| queries, then using logical replication with pgbouncer seems way
| easier:
|
| - setup logical replication between the old and the new server
| (limitations exist on what is replicated, read the docs)
|
| - PAUSE the pgbouncer (virtual) database. Your app will hang, but
| not disconnect from pgbouncer
|
| - Copy the sequences from the old to new server. Sequences are
| not replicated with logical replication
|
| - RESUME the pgbouncer virtual database.
|
| You're done. If everything is automated, your app will see a
| temporary increase in SQL latency, but it will keep its TCP
| connections, so there is virtually no outage.
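|
| A rough sketch of that cutover, assuming PG 10+ (for the
| pg_sequences view), psycopg2, and hypothetical hosts; "app" is
| the pgbouncer database entry, and the admin console is the
| special "pgbouncer" database:
|
|   import psycopg2
|
|   def connect(dsn):
|       conn = psycopg2.connect(dsn)
|       conn.autocommit = True  # admin commands and setval run outside a tx
|       return conn
|
|   admin = connect("host=pgbouncer-host port=6432 dbname=pgbouncer user=pgbouncer")
|   old = connect("host=old-db dbname=app user=admin")
|   new = connect("host=new-db dbname=app user=admin")
|
|   # 1. Block new queries; clients keep their TCP connections.
|   with admin.cursor() as cur:
|       cur.execute("PAUSE app;")  # waits for in-flight queries to finish
|
|   # 2. Copy sequence positions, which logical replication skips.
|   with old.cursor() as cur:
|       cur.execute("SELECT schemaname, sequencename, last_value"
|                   " FROM pg_sequences WHERE last_value IS NOT NULL;")
|       seqs = cur.fetchall()
|   with new.cursor() as cur:
|       for schema, name, last_value in seqs:
|           cur.execute("SELECT setval(%s, %s);",
|                       (f'"{schema}"."{name}"', last_value))
|
|   # 3. Repoint pgbouncer at the new server (config edited out of
|   # band), reload, and let traffic flow again.
|   with admin.cursor() as cur:
|       cur.execute("RELOAD;")
|       cur.execute("RESUME app;")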
| nhumrich wrote:
| I use pgbouncer and had no idea it supported logical
| replication. I can't find anything about it in the docs. Do you
| have something you can link me to to read more?
| aidos wrote:
| I'm assuming they mean to use Postgres logical replication.
| nhumrich wrote:
| Which is only possible if you are using a version of
| postgres that is new enough and isn't restricted, as it is
| on some versions of RDS. Which explains the whole original
| post.
| booleanbetrayal wrote:
| This is precisely the migration I'm planning on doing in the
| next few weeks, with pglogical under the hood for replication.
| Seems like the atomic switch is much easier than dealing with
| any sort of conflict or data duplication errors that could
| stem from a bi-directional replication strategy.
| foobarbazetc wrote:
| That's how you do it.
| stingraycharles wrote:
| Exactly this. The OP's approach reminded me so much of the days
| of Slony, and I wondered why a simpler approach with logical
| replication would not just suffice.
|
| Rather than pgbouncer, I did this in the actual application
| code once (write to both databases at the same time; once
| everything is in sync and you're confident the new server
| works well, fail over to the new one), but it depends upon how
| much control you can exercise over the application code.
|
| Any approach that is based on triggers makes me shiver,
| however.
| asenchi wrote:
| Yep, you can also prepare the new database by using a snapshot
| of the primary's volume and use pg_rewind to get them in sync.
| Altogether the tooling can make migrations super easy with
| minimal downtime.
| fullstop wrote:
| This works wonderfully. If you have any long running queries,
| though, the PAUSE won't pause until they have finished.
|
| I love pgbouncer, it is such a great tool.
| twunde wrote:
| To be fair, the approach on this page was used to migrate
| versions of postgres __prior__ to the introduction of logical
| replication. Logical replication makes this significantly
| easier (ie you no longer need the triggers)
| jbverschoor wrote:
| > Blueground is a real estate tech company offering flexible and
| move-in ready furnished apartments across three continents and 12
| of the world's top cities. We search high and low for the best
| properties in the best cities, then our in-house design team
| transforms these spaces into turnkey spaces for 30 days or
| longer.
|
| Seriously, how big can that db be, and how bad would an hour of
| reduced availability / downtime be?
|
| Seems like a lot of wasted engineering effort. "You are not
| google"
| cryptoz wrote:
| You cut out the part in About where they hint at it being
| important here even. "But we go a step further, merging
| incredible on-the-ground support with app-based services for a
| seamless experience for all our guests"
|
| It sounds like, across multiple timezones, with a tech stack
| that backs the business in a specific way, downtime could be a
| problem: it would undercut the "seamless" offering and the
| reliability people count on when moving.
|
| If twitter goes down and you can't tweet, only really Twitter
| loses money and you can try again later. But if you're moving
| into an apartment you don't want to be standing outside trying
| to get the keys but the system is down. Edit: And you also
| don't want the service to tell you that you can't move in from
| 11am-1pm because of 'maintenance'
| svaha1728 wrote:
| Agree that it's a lot of wasted engineering effort for
| companies that can have planned downtimes.
|
| My guess is the conversation was "To be agile, we need to be
| 100% CI/CD". Next thing you know, everything needs to get
| pushed straight to prod continuously with no downtime.
| majormajor wrote:
| You can often get away with some downtime, but that's not the
| same as not spending any engineering effort.
|
| What kills you is when you take an hour of scheduled downtime,
| communicate it out, get everyone on board... and then are
| down for a day. If you don't have a good, well-rehearsed plan,
| you might be unexpectedly screwed even then. As a rule of
| thumb... something usually doesn't quite go how you expect it
| to!
| Hallucinaut wrote:
| I was trying to get on Zoopla (and, I think, Rightmove) two
| nights ago for over an hour after midnight, and the site kept
| telling me it was down for maintenance and to try again once
| the non-night owls were up in the morning. Pretty sure their
| market cap is ten, if not eleven, figures.
___________________________________________________________________
(page generated 2021-06-11 23:00 UTC)