[HN Gopher] Fire declared in OVH SBG2 datacentre building
___________________________________________________________________
Fire declared in OVH SBG2 datacentre building
Author : finniananderson
Score : 1074 points
Date : 2021-03-10 03:11 UTC (19 hours ago)
(HTM) web link (travaux.ovh.net)
(TXT) w3m dump (travaux.ovh.net)
| bigiain wrote:
| There is no cloud, there is just other people's computers. And
| they're on fire.
| quickthrower2 wrote:
| This is a literal kubectl delete pod
| cookiengineer wrote:
| Ah, that explains why there are so many dark patterns in the
| cloud! /s
| Fordec wrote:
| Your data was in the cloud, now it's in the smoke.
| herpderperator wrote:
| ...which has now become an actual cloud.
| Rapzid wrote:
| You might even say, it's _gone with the wind_...
|
| Yeeeeeahhhhhhhhhh!!!!!
| Psype wrote:
| Been there, done that.
|
| The 2nd worst thing, IF you happen to catch it soon and control
| it, is that the temperature rise triggers many alerts and
| automatic controls.
|
| So even when it's controlled, it's still a real nightmare.
| Here, the firemen could not even control it...
| termau wrote:
| Most modern data centres wouldn't have had this issue. At least
| in Australia they use Argonite suppression systems, which
| suppress fire by flooding the data hall with a mixture of
| argon and nitrogen that depletes the oxygen.
| exikyut wrote:
| I'm seeing quite a lot of repeated sentiment throughout the
| comments that Halon is illegal and is no longer used.
|
| Is the situation "Halon is legal in Australia" or "Halon isn't
| actually illegal per se if you don't use a lot of it"?
| aaronmdjones wrote:
| Halon and Argonite are unrelated.
| divingdragon wrote:
| Halons are being phased out because, being CFCs, they deplete
| the ozone layer.
|
| Other gases like argon or FM200 are not Halons.
| LIV2 wrote:
| I don't think Aussie DCs are all the same. Global Switch in
| Sydney uses Inergen, but Equinix uses water, at least at SY1
| (though I'm reasonably sure GS is much older).
| emersion wrote:
| Picture of the fire from last night:
| https://twitter.com/BobZeHareng/status/1369563084277374980
| r4ibOm wrote:
| I see several people talking about advanced backup systems for
| businesses. I don't have a company; I work as a freelancer. I'm
| Brazilian, and the 70 euros a month I was paying was already
| eating into my income, since my local currency is quite
| devalued against the dollar. So imagine my situation: my
| websites are down, and the only backup I have is a copy of the
| VPS I made in November last year, when I was planning to set up
| a server at home because keeping this server at OVH was getting
| expensive. It would be unacceptable for a company of this size
| not to back up its servers, or to keep the backups in the same
| location as the incident, since their networks are all
| connected. I hope they find a satisfactory and quick solution
| to this problem.
| MegaThorx wrote:
| I had my server in SBG2, and sadly my backups had been failing
| since the end of January. Yes, it's my mistake for not checking
| them. Now I've lost about a month of data.
|
| The only good thing is that my backup was offsite.
|
| Does OVH offer automatic snapshots for VPS? I know Hetzner
| does, for an additional 20% of the cost of the server. If OVH
| does, the next question is whether those snapshots were
| destroyed too.
| kuschku wrote:
| They do offer automatic backups, and they're offsite (RBX in
| this case).
|
| Here's more information on them:
| https://docs.ovh.com/gb/en/vps/using-automated-backups-
| on-a-...
|
| With OVH, backups can double your cost depending on the
| situation: the price depends on size, so a 3EUR VPS may end
| up with a 3EUR backup configuration.
| Symbiote wrote:
| If OVH backed up everything, the cost of the service would be
| double.
|
| Many customers don't need a backup, so it's up to each customer
| to arrange their own backups -- perhaps with tools and services
| provided by the hosting company, or their own solution.
|
| Running a company with no backup (for cost or any other reason)
| is very risky, as some people will have found out today.
| jiofih wrote:
| Well, not having managed backups is obviously part of choosing
| to go bare-metal. They do have triple-redundancy backups in
| their cloud offerings. Nobody to blame but yourself.
|
| Also, if you're hosting clients' static websites, you were
| burning your money; there are way cheaper options out there
| (and fully managed).
| Pick-A-Hill2019 wrote:
| For the curious - this is what it looks like when a fire
| suppression system activates in a (small) server room.
|
| https://youtu.be/DrDU4UQUwKg?t=60
| KirillPanov wrote:
| The cloud is on fire.
| Shank wrote:
| A status update on the OVH tracker for a different datacenter
| (LIM-1 / Limburg) says "We are going to intervene in the rack to
| replace a large number of power supply cables that could have an
| insulation defect." [0][1] The same type of issue is "planned" in
| BHS [3] and GRA [2].
|
| Eerie timing: do they possibly suspect some bad cables?
|
| [0]: http://travaux.ovh.net/?do=details&id=49016
|
| [1]: http://travaux.ovh.net/?do=details&id=49017
|
| [3]: http://travaux.ovh.net/?do=details&id=49462
|
| [2]: http://travaux.ovh.net/?do=details&id=49465
| mmauri wrote:
| We have several bare-metal servers in GRA/Gravelines and
| RBX/Roubaix. Three weeks ago we had 3 hours of downtime on RBX
| because they were replacing power cords without prior
| notification. Maybe they were aware this could happen and were
| in the process of fixing it.
| dylan604 wrote:
| >Eerie timing: do they possibly suspect some bad cables?
|
| Why not? Cables rated below the load they carry are a prime
| cause of electrical fires. If the load stays too high for long
| enough, the insulation melts away, and if it
| is close enough for other material to catch fire then that's
| the ball game. It's a common cause for home electrical fires.
| Some lamp with poor wiring catches the drapes on fire, etc.
| Wouldn't think a data center would have flammable curtains
| though.
| jfrunyon wrote:
| They're waiting an awful long time to do the one at BHS-7 if
| so: 14 days from now?
| terom wrote:
| https://www.google.com/search?q=site%3Ahttp%3A%2F%2Ftravaux....
| there's quite a few of these
|
| http://travaux.ovh.net/?do=details&id=47840 earliest one that I
| found was back in December
| [deleted]
| [deleted]
| trailmonster wrote:
| As an industry we like to think we're transparent, honest, and
| perhaps even based on merit, but I can't find any of that in this
| messaging.
|
| "We are currently facing a major incident in our Strasbourg
| datacentre, with a fire declared in the SBG2 building.
| Firefighters intervened immediately on the spot but were unable
| to control the SBG2 fire. As a precautionary measure, the
| electricity was cut off on the whole site, which impacts all our
| services at SBG1, SBG2, SBG3 and SBG4. If your production is in
| Strasbourg, we recommend that you activate your Business Recovery
| Plan."
|
| Incidents are faced, not caused. It's made clear that they called
| the fire department as soon as they should have, and they did
| what they could as well, but in reality, your disaster recovery
| plan and how well you implemented it is what's really the
| question now, isn't it?
|
| I think it's far easier to ask whether you've done the right
| thing, user, than it is to ask why a fire managed to take out an
| entire facility that was designed to prevent that exact scenario,
| but only if you're OVH.
| Kye wrote:
| The best time to test your backups is before the production
| server dies in a fire.
| curiousgal wrote:
| Came across this on r/France
|
| https://i.imgur.com/epj1Lue.png
|
| Translation :
|
| We lost our Gitlab and backups...
|
| And the automatic backups that had been put in place no longer
| worked so a priori we lost everything...
| geocrasher wrote:
| The classic "lp0 on fire" error message comes to mind:
| https://en.wikipedia.org/wiki/Lp0_on_fire
|
| Really though, I feel truly awful for anyone affected by this.
| The post recommends implementing a disaster recovery plan. The
| truth is that most people don't have one. So, let's use this post
| to talk about Disaster Recovery Plans!
|
| Mine: I have 5 servers at OVH (not at SBG) and they all back up
| to Amazon S3 or Backblaze B2, and I also have a dedicated server
| (also OVH/Kimsufi) that gets the backups. I can redeploy in less
| than a day on fresh hardware, and that's good enough for my
| purposes. What's YOUR Disaster Recovery Plan?
| tomxor wrote:
| Basically the same (offsite backups), but the details are in
| the what and how, which is subjective... For my purposes I
| decided that offsite backups should only contain user data,
| and that all server configuration should be 100% scripted,
| with some interactive parts to speed up any customization,
| including restoring backups. I also run my own backup servers
| rather than using a service, and implement immutable
| incremental backups with rotated ZFS snapshots (this is way
| simpler than it sounds). I can highly recommend ZFS as an
| extremely reliable incremental backup solution, but you must
| enable block-level deduplication and expect it to gobble up
| all the server's RAM to be effective (which is why I dedicate
| a server to it and don't need masses of cheap slow
| storage)... The backup server itself is restorable by script
| and only relies on having at least one of the mirrored block
| devices intact, which I make a local copy of occasionally.
|
| I'm not sure how common this strategy is outside of container
| land, but I like just using scripts: they are simple and
| transparent, if you take the time and care to write them well.
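The snapshot rotation described above can be sketched in a few lines of shell. The dataset name and 14-snapshot retention below are hypothetical, and the actual `zfs` invocations are left as comments so that only the pruning logic runs:

```shell
#!/usr/bin/env bash
# Sketch of rotated ZFS snapshots for incremental backups.
# Dataset name (tank/backup) and retention (14) are hypothetical.
set -euo pipefail

DATASET="tank/backup"
KEEP=14

# Given snapshot names sorted oldest-first (one per line), print
# those that fall outside the retention window.
snapshots_to_prune() {
  head -n -"$1"   # GNU head: everything except the newest $1 lines
}

# A real run would take a dated snapshot and prune old ones:
#   zfs snapshot "${DATASET}@$(date +%F)"
#   zfs list -t snapshot -o name -s creation -H "$DATASET" \
#     | snapshots_to_prune "$KEEP" \
#     | xargs -r -n1 zfs destroy
```

The immutability comes from the snapshots themselves: a compromised or misbehaving client can overwrite the live dataset but not the read-only snapshots.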
| mwcampbell wrote:
| This sounds like what I want to do for the new infrastructure
| I'm setting up in one of OVH's US-based data centers. Are you
| running on virtual machines or bare metal? What kind of
| scripting or config management are you using?
| tomxor wrote:
| VPS, although there is no dependency on VPS-manager stuff,
| so I don't see any issue with running on bare metal. No
| config managers, just bash scripts.
|
| They basically install and configure packages using sed or
| heredocs with a few user prompts here and there for setting
| up domains etc.
|
| If you are constantly tweaking stuff this might not suit
| you, but if you know what you need and only occasionally do
| light changes (which you must ensure the scripts reflect)
| then this could be an option for you.
|
| It does take some care to write reliable, clear bash
| scripts, and there are some critical choices like `set -e`,
| so that you can walk away knowing the script either ran to
| the end or stopped at the first error, rather than failing
| somewhere in the middle without you noticing.
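A minimal skeleton of that style of script, assuming nothing about the parent's actual setup; the `sshd_config` step in the comment is purely illustrative:

```shell
#!/usr/bin/env bash
# Plain-bash provisioning skeleton: fail-fast flags plus idempotent
# steps, so a re-run after a mid-script failure is safe.
set -euo pipefail   # abort on any error, unset variable, or failed pipe

# Append a config line only if it is not already present, so the
# script can be re-run without duplicating configuration.
ensure_line() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null \
    || printf '%s\n' "$line" >> "$file"
}

# Example use (hypothetical hardening step):
#   ensure_line 'PasswordAuthentication no' /etc/ssh/sshd_config
```

With `set -euo pipefail`, a failed package install or typo'd variable stops the run immediately instead of letting later steps operate on a half-configured machine.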
| fy20 wrote:
| I have three servers (1 OVH - different location, 2 DO). The
| only thing I backup is the DB, which is synced daily to S3.
| There's a rule to automatically delete files after 30 days to
| handle GDPR and stop the bucket and costs spiralling out of
| control.
|
| Everything is managed with Ansible and Terraform (on DO side),
| so I could probably get everything back up and running in less
| than an hour if needed.
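The 30-day expiry rule mentioned above is a standard S3 lifecycle configuration. A sketch, with a made-up bucket name and rule ID; the `aws` invocation is shown as a comment:

```shell
# S3 lifecycle rule: expire every object in the bucket after 30 days,
# capping both GDPR retention and storage cost.
cat > lifecycle.json <<'EOF'
{
  "Rules": [{
    "ID": "expire-old-dumps",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Expiration": {"Days": 30}
  }]
}
EOF

# Applied with the AWS CLI:
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket my-db-backups \
#     --lifecycle-configuration file://lifecycle.json
```

The empty `Prefix` applies the rule bucket-wide; a prefix like `"dumps/"` would scope it to one key space.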
| sverhagen wrote:
| > probably
|
| That makes it sound like you didn't try/practice. I imagine
| that in a real-life scenario things will be a little more
| painful than in one's imagination.
| dylan604 wrote:
| Exactly. Having a plan is only part of it. Good disaster
| plans do dry runs a couple of times a year (when time
| changes is always convenient reminder). If you rehearse the
| recovery when you're not panicked, you have a better chance
| of not skipping a step when the timing is much more
| crucial. Also, some sort of guide with steps given
| procedurally is a great idea.
| slaymaker1907 wrote:
| I don't think this is necessarily true for all parts of a
| disaster plan. Some mechanisms may be untestable because
| it is unknown how to actually trigger it (think certain
| runtime assertions, but on a larger scale).
|
| Even if it possible to trigger and test, actually using
| the recovery mechanism may have some high cost either
| monetarily or maybe losing some small amount of data.
| These mechanisms should almost always be an additional
| layer of defense and only be invoked in case of true
| catastrophe.
|
| In both cases, the mechanisms should be tested as
| thoroughly as possibly, either through artificial
| environments that can simulate improbable scenarios or in
| the latter case on a small test environment to minimize
| cost.
| lesquivemeau wrote:
| Personally, my email hosting is down, but thankfully my web
| hosting and Nextcloud instance were both at GRA2 (Gravelines).
|
| But I have a friend who potentially lost important uni work
| hosted on his Nextcloud instance... on SBG2.
|
| A rough reminder that backups are really important, even if you
| are just an individual.
| Ceezy wrote:
| A lot of prayers....
| NicoJuicy wrote:
| Snapshots, db backups and data backups.
|
| Rolling backups with a month of retention to Box using rsync.
|
| A network drive to Box is created by default when I boot my
| desktop.
|
| I have some scripts for putting production DBs into test, and
| for when I want them locally.
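A sketch of that kind of rolling rsync scheme, with made-up paths and a 30-day window; the rsync step is shown as a comment, while the retention logic is runnable:

```shell
#!/usr/bin/env bash
# Rolling, date-stamped rsync backups with roughly a month of
# retention. Paths and the 30-day window are hypothetical.
set -euo pipefail

# Each run lands in a date-stamped directory, hard-linking unchanged
# files against the newest previous copy to save space:
#   prev="$(ls /mnt/box/backups | sort | tail -n 1)"
#   rsync -a --link-dest="/mnt/box/backups/$prev" /srv/data/ \
#     "/mnt/box/backups/$(date +%F)"

# Retention: remove backup directories older than ~30 days.
prune_old() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
}
```

`--link-dest` means each dated directory looks like a full backup but only changed files consume new space.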
| cm2187 wrote:
| Also stupid things not to forget: make sure your dns provider
| is independent otherwise you won't be able to point to your new
| server (or have a secondary DNS provider). Make sure any email
| required for 2FA or communicating with your hosting service
| managing your infrastructure isn't running on that same
| infrastructure.
| pmlnr wrote:
| I only have a personal server running in Hetzner but it's
| mirrored onto a tiny local computer at home.
|
| They both run postfix + dovecot, so mail is synced via dovecot
| replication. Data is rsync-ed daily, and everything has ZFS
| snapshots. MySQL is not set up for replication (my home
| internet breaks often enough to cause serious issues);
| instead, every day I drop everything, import a full dump
| from the main server, and do a local dump as a backup on
| both sides.
|
| I don't have automatic failover set up.
| clan wrote:
| Not saying that you should never do a full mysql dump. Nor
| that you should not ensure that you can import a full dump.
|
| But when you already use ZFS you can do a very speedy full
| backup with:
|
|     mysql << EOF
|     FLUSH TABLES WITH READ LOCK;
|     system zfs snapshot data/db@snapname
|     UNLOCK TABLES;
|     EOF
|
| Transfer the snapshot off-site (and test!), either as a
| simple file copy (the snapshot ensured a consistent database)
| or, a little more advanced, with zfs send/receive. This is
| much quicker and less painful than mysqldump, especially with
| sizeable databases.
| pmlnr wrote:
| Good point, but my DB is tiny, so for now, I can afford the
| mysqldump. But I'll keep this in mind.
| mwcampbell wrote:
| Do you even need to flush the tables and grab a read lock
| while taking the ZFS snapshot? My understanding was that
| since ZFS snapshots are point-in-time consistent, taking a
| snapshot without flushing tables or grabbing a read lock
| would be safe; restoring from that snapshot would be like
| rebooting after losing power.
| tetha wrote:
| At work, there are several layers.
|
| As an immediate plan, the 2-3 business critical systems are
| replicating their primary storages to systems in a different
| datacenter. This allows us to kick off the configuration
| management in a disaster, and we need somewhere between 1-4
| hours to set up the necessary application servers and
| middleware to get critical production running again.
|
| Regarding backups, backups are archived daily to 2 different
| borg repo hosts on different cloud providers. We could lose an
| entire hoster to shenanigans and the damage would be limited to
| ~2 days of data loss at worst. Later this year, we're also
| considering exporting some of these archives to our sister
| team, so they can place a monthly or weekly backup on tape in
| a safe, to have a proper offline backup.
|
| Regarding restores - there are daily automated restore tests
| for our prod databases, which are then used for a bunch of
| other tests after anonymization. On top, we've built most
| database handling on top of the backup/restore infra in order
| to force us to test these restores during normal business
| processes.
|
| As I keep saying, installing a database is not hard. Making
| backups also isn't hard. Ensuring you can restore backups, and
| ensuring you are not losing backups almost regardless of what
| happens... that's hard and expensive.
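One small piece of such an automated restore test can be sketched as follows. The freshness check is runnable; the dump path, scratch host, and sanity query in the comments are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of a daily automated restore test: first verify the dump is
# recent (a stale dump means the backup job itself silently broke),
# then restore into a scratch instance and probe it.
set -euo pipefail

# Fail unless the file is newer than the given number of hours.
dump_is_fresh() {
  local dump="$1" max_hours="$2"
  local age_h=$(( ($(date +%s) - $(stat -c %Y "$dump")) / 3600 ))
  [ "$age_h" -le "$max_hours" ]
}

# A real run would then restore and run a sanity query, e.g.:
#   dump_is_fresh /backups/latest.sql 26
#   mysql -h scratch-db < /backups/latest.sql
#   rows="$(mysql -h scratch-db -N -e 'SELECT COUNT(*) FROM orders')"
#   [ "$rows" -gt 0 ]
```

The point of the freshness gate is that a restore test which happily restores a month-old dump every night proves nothing about today's backups.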
| tialaramex wrote:
| If you are a corporate entity of some kind, the final layer of
| your plan should always be "Go bankrupt". You can't
| successfully recover from every possible disaster and you
| shouldn't try to. In the event of a sufficiently unlikely
| event, your business fails and every penny spent attempting the
| impossible will be wasted, move on and let professional
| administrators salvage what they can for your creditors.
|
| Lots of people plan for specific elements they can imagine and
| forget other equally or even more important things they are
| going to need in a disaster. Check out how many organisations
| that doubtless have 24/7 IT support in case a web server goes
| down somehow had _no plan_ for what happens if it's unsafe for
| their 500 call centre employees to sit in tiny cubicles
| answering phones all day even though pandemic respiratory
| viruses are _so_ famously likely that Gates listed them
| consistently as the #1 threat.
| jfrunyon wrote:
| IMHO, the part they had no plan for was being unable to just
| require their employees to come in anyway...
| osmano807 wrote:
| Just lobby the government to put call centers in "essential
| services". In my state they are open even with a partial
| lockdown.
| tialaramex wrote:
| The more insecure your workers, the easier it is to get
| them to come in, regardless of what the supposed rules may
| or may not be.
|
| Fast Fashion for example often employs workers in more or
| less sweatshop conditions close to the customers (this
| makes commercial sense, if you make the hot new items in
| Bangladesh you either need to expensively air freight them
| to customers or they're going to take weeks to arrive after
| they're first ordered - there's a reason it isn't called
| "Slow fashion"). These jobs are poorly paid, many workers
| have dubious right-to-work status, weak local language
| skills, may even be paid in cash - and so if you tell them
| they must come in, none of them are going to say "No".
|
| In fact the slackening off in R for the area where my
| sister lives (today the towering chimneys and cavernous
| brick factories are just for tourists, your new dress was
| made in an anonymous single story building on an industrial
| estate) might be driven more by people not needing to own
| new frocks every week when they've been no further than
| their kitchen in a month than because it would actually be
| illegal to staff their business - if nobody's buying what
| you make then suddenly it makes sense to take a handout
| from the government and actually shut rather than pretend
| making mauve turtleneck sweaters or whatever is
| "essential".
| Mauricebranagh wrote:
| For non-UK residents: "frock" is a regional term for dress,
| quite common in the Midlands.
| namibj wrote:
| Just to clarify: trans-atlantic shipments take a week
| port-to-port, e.g. Newark, NJ, USA to Antwerp, Belgium.
| (Bangladesh to Italy via the Suez Canal looks like a 2-week
| voyage, or 3 weeks to the US west coast. Especially the
| latter would probably have quite a few stops on the way
| along the Asian coast.) You get better economics than
| shipping via air-freight from one full pallet and up.
| Overland truck transport to and from the port is still
| cheaper than air freight, at least in the US and central
| Europe.
|
| For these major routes, there are typically at least bi-
| weekly voyages scheduled, so for this kind of distance,
| you can expect about 11 days pretty uniformly distributed
| +-2 days, if you pay to get on the next ship.
|
| This may mean (committing to) paying for the spot on the
| ship when your pallet is ready for pickup at the factory,
| not when it arrives at the port, and using low-delay
| overland trucking services, which operate e.g. in
| lockstep with the port processing to get your pallet on
| the move within half a day of the container being
| unloaded from the ship, ideally having containers pre-
| sorted at the origin to match truck routes at the
| destination. So they can go on a trailer directly from
| the ship and rotate drivers on the delivery tour,
| spending only a few minutes at each drop-off.
|
| Because those can't rely on customers to be there and get
| you unloaded in less than 5 minutes, they need locations
| they can unload at with on-board equipment. They'd notify
| the customer with a GPS-based ETA display, so the
| customer can be ready and immediately move the delivery
| inside. Rely on 360-degree "dashcam" coverage and
| encourage the customer to have the drop-off point under
| video surveillance, just to easily handle potential
| disputes. Have the delivery person use some suitable
| high-res camera with a built-in light to get some full-
| surface-coverage photographic evidence of the condition
| it was delivered in.
|
| I'd guess with a hydraulic lift on the trailer's back and
| some kind of folding manual pallet jack stuck on that
| (fold-up) lift, so they drive up to the location, unlock
| the pallet jack, un-fold the lift, lower the lift almost
| to the ground, detach the pallet jack to drop it the last
| inch/few cm to the ground, pull the jack out, lower the
| lift the rest of the way, drive it on to the lift, open
| the container, get up with the pallet jack, drive the
| pallets (one-by-one) for this drop-off out of the
| container and leave them on the ground, close and lock
| the container, re-arm the jack's hooks, shove the jack
| back under the slightly-lowered folding lift, make it
| hook back in, fold it up, lock the hooking mechanism
| (against theft at a rest stop (short meal and toilet
| breaks exist, but showering can be delayed for the up to
| 2 nights)), fold it all the way up, and go on to drive to
| their next drop-off point.
| jacquesm wrote:
| The final layer is call the insurance company.
| corty wrote:
| Not really, the insurance won't make things right in an
| instant. They will usually compensate you financially, but
| often only after painstaking evaluation of all
| circumstances, weighing their chances in court to get out
| of paying you and maybe a lengthy court battle and a race
| against your bankruptcy.
|
| So yes, getting insurance can be a good idea to offset some
| losses you may have, as long as they are somewhat limited
| compared to your company's overall assets and income. But
| as soon as the insurance payout matches a significant part
| of your net worth, the insurance might not save you.
| jacquesm wrote:
| Fair enough.
| tomatocracy wrote:
| There are always uninsurable events and for large enough
| companies/risks there are also liquidity limits to the size
| of coverage you can get from the market even for insurable
| events.
|
| As such, it makes sense to make the level of risk you plan
| to accept (by not being insured against it and not
| mitigating) a conscious economic decision rather than
| pretending you've covered everything.
| jacquesm wrote:
| As long as you have no outside shareholders you can decide
| that. If you do have them, you'd be surprised at how they
| respond to an attitude like that. After all: you can
| decide the levels of risk that you personally are
| comfortable with leading to extinguishing of the
| business, but a typical shareholder is looking at you to
| protect their investment and not insuring against a known
| risk which at some point in time materializes is an
| excellent way to find yourself in the crosshairs of a
| minority shareholder lawsuit against a (former) company
| executive.
| tomatocracy wrote:
| In my work life I am a professional investor, so I've
| been through the debate on insure/prepare or not many
| times. It's always an economic debate when you get into
| "very expensive" territory (cheap and easy is different
| obviously).
|
| The big example of this which springs to mind is business
| interruption cover - it's ruinously expensive so it's
| extremely unusual to have the max cover the market might
| be prepared to offer. It's a pure economic decision.
| jacquesm wrote:
| Yes, but it is an informed decision, typically taken at
| the board level; very few CEOs who are not 100% owners
| would be comfortable leaving an existential risk uncovered
| without the full approval of all those involved, which is
| kind of logical.
|
| Usually you'd have to show your homework (offers from
| insurance companies proving that it really is
| unaffordable). I totally get the trade-off, and that if the
| business could not exist while properly insured, plenty of
| companies will simply take their chances.
|
| We also both know that in case something like that does
| go wrong everybody will be looking for a scapegoat, so
| for the CEO's own protection it is quite important to
| play such things by the book, on the off chance the risk
| one day does materialize.
| tomatocracy wrote:
| Absolutely - but that's kind of my point. You should make
| the decision consciously. The corporate governance that
| goes around that _is_ the company making that decision
| consciously.
| jacquesm wrote:
| And this is the heart of the problem: a lot of the time
| these decisions are made by people who shouldn't be making
| them, or they aren't made at all: they happen by default,
| without bringing the fact that a decision is required to
| the level of scrutiny normally associated with such
| decisions.
|
| This has killed quite a few otherwise very viable
| companies, it is fine to take risks as long as you do so
| consciously and with full approval of all stakeholders
| (or at least: a majority of all stakeholders).
| Interesting effects can result: a smaller investor may
| demand indemnification, then one by one the others also
| want that indemnification and ultimately the decision is
| made that the risk is unacceptable anyway (I've seen this
| play out), other variations are that one shareholder ends
| up being bought out because they have a different risk
| appetite than the others.
| brmgb wrote:
| "Go bankrupt" is not a plan. Becoming insolvent might be the
| end result of a situation but it's not going to help you deal
| with it.
|
| Let's take an example which might lead to bankruptcy. A
| typical answer to a major disaster (say your main and sole
| building burning down) for an SME would be to cease
| activity, furlough employees, and stop or defer every
| payment you can while you claim insurance and assess your
| options. Well, none of these things are easy to do,
| especially if all your archives and documents just burnt.
| If you think about it (which you should), you will quickly
| realise that you at least need an offsite way to contact
| all your employees, your bank, and your counsel (which
| would most likely be the accountant certifying your results
| rather than a lawyer if you are an SME in my country).
| That's the heart of disaster planning: having solutions at
| the ready for what was easy to foresee, so you can better
| focus on what wasn't.
| Twisell wrote:
| Hence the "final layer" statement.
|
| Bankruptcy when dealt with correctly is a process not an
| end.
|
| If everything else fails, it's better to file for
| bankruptcy while there is still something to recover with
| the help of others than to burn everything to ashes out of
| vanity.
|
| At least that's how I understood parent's comment.
| Sanzig wrote:
| As a quick interlude, since this may be confusing to non-
| US readers: bankruptcy in the United States in the
| context of business usually refers to two concepts,
| whereas in many other countries it refers to just one.
|
| There are two types of bankruptcies in the US used most
| often by insolvent businesses: Chapter 7, and Chapter 11.
|
| A Chapter 7 bankruptcy is what most people in other
| countries think of when they hear "bankruptcy" - it's the
| total dissolution of a business and liquidation of its
| assets to satisfy its creditors. A business does not
| survive a Chapter 7. This is often referred to as a
| "bankruptcy" or "liquidation" in other countries.
|
| A Chapter 11 bankruptcy, on the other hand, is a process
| by which a business is given court protection from its
| creditors and allowed to restructure. If the creditors
| are satisfied with the reorganisation plan (which may
| include agreeing to change the terms of outstanding
| debts), the business emerges from Chapter 11 protection
| and is allowed to continue operating. Otherwise, if an
| agreement can't be reached, the business may end up in
| Chapter 7 and get liquidated. Most countries have an
| equivalent to a Chapter 11, but the name for it varies
| widely. For example, Canada calls it a "Division 1
| Proposal," Australia and the UK call it "administration,"
| and Ireland calls it "examinership."
|
| Since there's a lot of international visitors to HN I
| just thought I'd jump in and provide a bit of clarity so
| we can all ensure we're using the same definition of
| "bankruptcy." A US Chapter 7 bankruptcy is not a plan,
| it's the game over state. A US Chapter 11 bankruptcy, on
| the other hand, can definitely be a strategic maneuver
| when you're in serious trouble, so it can be part of the
| plan (hopefully far down the list).
| breakfastduck wrote:
| This helps a lot, thanks. I think most people
| internationally would assume bankruptcy = game over.
| brmgb wrote:
| > Bankruptcy when dealt with correctly is a process not
| an end.
|
| Yes, that's why "Go bankrupt" is _not_ a plan, which was
| the entire point of my reply. That's like saying that
| your disaster recovery plan is "solve the disaster".
| corty wrote:
| Going bankrupt is a plan. However, it is a somewhat more
| involved one than it sounds, at first. That's why there
| should be a corporate lawyer advising on stuff like company
| structure, liabilities, continuance of pension plans,
| ordering and reasons for layoffs, etc.
| dragonwriter wrote:
| > "Go bankrupt" is not a plan.
|
| Yes it is. (Though it's better, as GP suggested, as a final
| layer of a plan and not the only layer.)
|
| > Becoming insolvent might be the end result of a situation
| but it's not going to help you deal with it.
|
| Insolvency isn't bankruptcy. Becoming insolvent is a
| consequence, sure. Bankruptcy absolutely does help you deal
| with that impact, that's rather the point of it.
| physicsguy wrote:
| It's not quite that simple: the data you hold may be needed
| for compliance or regulatory reasons. Having no backup
| strategy might make you personally liable, depending on the
| country!
| shog_hn wrote:
| Self-hosted Kubernetes and a FreeNAS storage system at home,
| and a couple of VMs in the cloud. I've got a mixed strategy,
| but it covers everything to remote locations.
|
| I use S3 API compatible object storage platforms for remote
| backup. E.g. BackBlaze B2. I wrote about my backup scripts for
| FreeNAS (jail that runs s3cmd to copy files to B2) here:
| https://www.shogan.co.uk/cloud-2/cheap-s3-cloud-backup-with-...
|
| For Kubernetes I use Velero, which can be configured with an S3
| storage backend target:
| https://www.shogan.co.uk/kubernetes/kubernetes-backup-on-ras...
| vbsteven wrote:
| In my case:
|
| * All my services are dockerized and have gitlab pipelines to
| deploy on a kubernetes cluster (RKE/K3s/baremetal-k8s)
|
| * Git repos containing the build scripts/pipelines are
| replicated on my GitLab instance and multiple work computers
| (laptop & desktop)
|
| * Data and databases are regularly dumped and stored in S3 and
| my home server
|
| * Most of the infrastructure setup (AWS/DO/Azure, installing
| kubernetes) is in Terraform git repositories. And a bit of
| Ansible for some older projects.
|
| Because of the above, if anything happens all I need to restore
| a service is a fresh blank VM/dedicated machine or a cloud
| account with a hosted Kubernetes offering. From there it's just
| configuring terraform/ansible variables with the new hosts and
| executing the scripts.
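The recovery flow in the list above amounts to a short runbook; sketched here in pseudocode, with illustrative repo and file names:

```
1. provision a fresh VM / dedicated box or a hosted Kubernetes cluster
2. git clone the infra repos           (Terraform + Ansible, replicated
                                        on GitLab and both work machines)
3. edit terraform variables            (point them at the new hosts)
4. terraform init && terraform apply   (recreate AWS/DO/Azure resources)
5. ansible-playbook site.yml           (the older projects)
6. re-run the GitLab pipelines         (redeploy the dockerized services)
7. restore the latest dumps from S3 / home server into the new databases
```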
| jrib wrote:
| How often do you test starting from a clean slate?
| kuschku wrote:
| I have a similar setup: I recreate everything for every
| major kubernetes update.
| traveler01 wrote:
| Most people/companies don't have the money to set up those
| disaster plans. Those require you to have a similar server
| ready to go and also a backup solution like Amazon S3.
|
| I was affected: my personal VPS is safe but down, and I know
| nothing about the other VPSes I was managing. I have the backups
| and right now I'd love for them to just set me up a new VPS so
| I can restore the backups and restore the services.
| rainmaking wrote:
| Spin it up now and refuse to pay for the old one later.
| cbozeman wrote:
| > What's YOUR Disaster Recovery Plan?
|
| Prayer and hope, usually.
| temp8964 wrote:
| Do you have access to your control panel when the servers are
| down?
| Cthulhu_ wrote:
| Personal: I run a webserver for some website (wordpress +
| xenforo), I've set up a cronjob that creates a backup of
| /var/www, /etc and a mysql database dump, then uploads it to an
| S3 bucket (with automatic Glacier archiving after X period set
| up). It should be fairly straightforward to rent a new server
| and set things back up. I still dislike having to set up a
| webserver + PHP manually though; I don't get why that hasn't
| been streamlined yet.
|
| My employer has a single rack of servers at HQ. It's positioned
| at a very specific angle with an AC unit facing it, their exact
| positions are marked out on the floor in tape. The servers
| contain VMs that most employees work on, our git repository,
| issue trackers, and probably customer admin as well. They say
| they do off-site backups, but honestly, when (not if) that
| thing goes it'll be a pretty serious impact on the business.
| They don't like people keeping their code on their take-home
| laptop either (I can't fathom how my colleagues work and how
| they can stand working in a large codebase using barebones vim
| over ssh), but I've employed some professional disobedience
| there.
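A sketch of the kind of cron job described in the first paragraph above. Paths and the bucket name are assumptions; the demo tars a throwaway temp tree so it can run anywhere, and the mysqldump/aws steps are guarded so they are skipped when those tools aren't present:

```shell
#!/bin/sh
# Nightly job: dump MySQL, tar the web root + config, push off-site.
# /var/www is a stand-in built under a temp dir; the bucket name is
# invented. Glacier archiving would be a bucket lifecycle rule.
set -eu
WORK=$(mktemp -d)
STAMP=$(date +%Y-%m-%d)
mkdir -p "$WORK/src/var-www" "$WORK/out"
echo '<h1>demo</h1>' > "$WORK/src/var-www/index.html"

# 1. Database dump (skipped when mysqldump isn't installed/reachable).
if command -v mysqldump >/dev/null 2>&1; then
    mysqldump --all-databases > "$WORK/out/db-$STAMP.sql" || true
fi

# 2. Archive the tree (a real job would tar /var/www and /etc).
tar -czf "$WORK/out/files-$STAMP.tar.gz" -C "$WORK/src" .

# 3. Ship to S3 (skipped when the aws CLI isn't installed/configured).
if command -v aws >/dev/null 2>&1; then
    aws s3 cp "$WORK/out/files-$STAMP.tar.gz" s3://my-backup-bucket/ || true
fi
```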
| catbuttes wrote:
| Have you considered writing an ansible playbook to set all
| that up? You could even have it pull down the backup and do a
| full restore for you...
| walrus01 wrote:
| Video of the aftermath:
|
| https://twitter.com/PompiersFR/status/1369544801817944071
|
| https://twitter.com/abonin_DNA
| ahmedalsudani wrote:
| My disaster recovery plan: we shall rebuild.
| edoceo wrote:
| F
| julianwachholz wrote:
| Finally we get to rewrite everything from scratch!
| philpem wrote:
| The spice will flow, and the tech debt will go!
| SergeAx wrote:
| I don't have to "backup servers" for a long time now. I have an
| Ansible playbook to deploy and orchestrate services, which, in
| turn, are mostly dockerized. So my recovery plan is to turn on
| "sorry, maintenance" banner via CDN, spin up a bunch of new
| VPSes, run Ansible scenario for deployment and restore database
| from hidden replica or latest dump.
| dboreham wrote:
| > restore database from hidden replica or latest dump
|
| You do have backup servers.
| [deleted]
| jaywalk wrote:
| He said he doesn't have _to_ backup servers, not that he
| doesn't have backup servers.
| el-salvador wrote:
| Pictures from the fire:
|
| https://www.dna.fr/amp/faits-divers-justice/2021/03/10/stras...
| siod wrote:
| This looks like yet another aluminium composite panel fire...
| rkachowski wrote:
| is that entirely cargo containers? is that common for a data
| center?
| Jon_Lowtek wrote:
| No, SBG2 was a building in the "tower design", as is SBG3
| behind it. The containers in the foreground are SBG1, from
| the time when OVH didn't know if Strasbourg was going to be
| a permanent thing.
| jfrunyon wrote:
| https://en.wikipedia.org/wiki/Modular_data_center
|
| Container DCs were a big thing for a while. Even Google did
| a whole PR thing about how they used them.
| donalhunt wrote:
| Funnily enough, I think it was the fire risk that caused
| them to ditch the idea and move to their current design.
| Though I know modular design is highly likely to be used
| by all players as edge nodes spring up worldwide.
| jeffbee wrote:
| It was also that the container had literally no
| advantages. It was just a meme that did not survive
| rational analysis. The building in which the datacenter
| is located is the simplest, cheapest part of the design.
| Dividing it up into a bunch of inconveniently-sized
| rectangles solves nothing.
| pulse7 wrote:
| Uff... it looks like half of the containers in this picture
| were on fire...
| jannes wrote:
| Are you making a joke about docker containers or am I
| missing something?
| dgellow wrote:
| Part of OVH datacenters are literal, physical containers
| with racks, power supply, vents, etc.
|
| You can see more details here:
| https://baxtel.com/data-center/ovh-strasbourg-campus
| Mauricebranagh wrote:
| Do the containers have fire suppressant systems
| installed?
| [deleted]
| simonke wrote:
| Without AMP: https://www.dna.fr/faits-divers-
| justice/2021/03/10/strasbour...
| jacquesm wrote:
| Nobody hurt. That's a bit of good news.
| burmanm wrote:
| And here's some more from firefighters, while it was burning:
|
| https://twitter.com/xgarreau/status/1369559995491172354
|
| Looks glowing red to me.
| kijin wrote:
| One of my backup servers used to be in the same datacenter as
| the primary server. I only recently moved it to a different
| host. It's still in the same city, though, so I'm considering
| other options. I'm not a big fan of just-make-a-tarball-of-
| everything-and-upload-it-to-the-cloud backup methodology, I
| prefer something a bit more incremental. But with Backblaze B2
| being so cheap, I might as well just upload tarballs to B2. As
| long as I have the data, the servers can be redeployed in a
| couple of hours at most.
|
| The SBG fire illustrates the importance of geographical
| redundancy. Just because the datacenters have different numbers
| at the end doesn't mean that they won't fail at the same time.
| Apart from a large fire or power outage, there are lots of
| things that can take out several datacenters in close vicinity
| at the same time, such as hurricanes and earthquakes.
| sebmellen wrote:
| Duplicity is your best bet for incremental backups using B2.
| I use this for my personal server and it works brilliantly.
| gingerlime wrote:
| I thought so too for a long while. Until I was trying to
| restore something (just to test things), and wasn't able
| to... it might have been specific to our GPG or an older
| version or something... but I decided to switch to restic
| and am much happier now.
|
| Restic has a single binary that takes care of everything.
| It feels more modern and seems to work really well. Never
| had any issue restoring from it.
|
| Just one data point. Stick to whatever works for you. But
| important to test not only your backups, but also restores!
| remram wrote:
| I've been using Duplicati forever. The fact that it's C#
| is a bit of a pain (some distros don't have recent Mono),
| but running it in Docker is easy enough. Being able to
| check the status of backups and restore files from a web
| UI is a huge plus, so is the ability to run the same app
| on all platforms.
|
| I've found duplicity to be a little simplistic and
| brittle. Purging old backups is also difficult: you
| basically have to make a full backup (i.e. non-
| incremental) before you can do that, which increases
| bandwidth and storage cost.
|
| Restic looks great feature-wise, but still feels like the
| low-level component you'd use to build a backup system,
| not a backup system in itself. It's also pre-1.0.
| sebmellen wrote:
| Interesting, I will check Restic out, I've heard other
| good things about it. Duplicity is a bit of a pain to set
| up and Restic's single binary model is more
| straightforward (Go is a miracle). Thanks for the
| recommendation!
|
| GPG is a bit quirky but I do regularly check my backups
| and restores (if once every few months counts as
| regular).
| nix23 wrote:
| +1 for Restic
|
| It's brilliant, works like a charm on FreeBSD, Windows and
| an RPi with Linux for over 2 years now.
| geocrasher wrote:
| I'm using rclone, it works very well for the purpose too.
| nucleardog wrote:
| Ditto. Moved to rclone after having a bunch of random
| small issues with Duplicity that on their own weren't
| major but made me lose faith in something that's going to
| be largely operating unsupervised except for a monthly
| check-in.
| uncledave wrote:
| I'd stay away from duplicity. I've had serious problems
| with it and large inode counts where it'll bang the CPU at
| 100% and never complete.
|
| Have moved to using rdiff-backup over SSH.
| bombcar wrote:
| I've taken to uploading entire copies via rsync or similar -
| tarballs use the whole bandwidth each time, but rsync on
| files brings only the changes.
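A minimal demonstration of the difference, using throwaway temp directories: the second rsync run transfers only the changed file, where re-uploading a tarball would resend everything.

```shell
#!/bin/sh
# Mirror a tree twice: the first run copies everything, the second
# transfers only the changed file. Paths are throwaway temp dirs.
set -eu
SRC=$(mktemp -d)
DST=$(mktemp -d)
echo v1 > "$SRC/site.conf"

if command -v rsync >/dev/null 2>&1; then
    # -a preserves permissions/times, --delete mirrors removals.
    rsync -a --delete "$SRC/" "$DST/"    # full copy
    echo v2 > "$SRC/site.conf"
    rsync -a --delete "$SRC/" "$DST/"    # only the delta crosses the wire
fi
```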
| Datagenerator wrote:
| One up for rclone, it's parallel and supports many
| endpoints.
| megous wrote:
| I use tarballs because it allows me to not trust the backup
| servers. ssh is set up such that the backup servers' keys
| are certified to run only a single command: a backup script
| that just returns the encrypted data, and nothing else.
|
| It's very easy to use spare storage in various places to do
| backups this way, as ssh, gpg and cron are everywhere, and
| you don't need to install any complicated backup solutions
| or trust the backup storage machines much.
|
| All you have to manage centrally is private keys for backup
| encryption, and CA for signing the ssh keys + some
| occasional monitoring/tests.
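That arrangement might look something like this sketch; the key, paths, and script name are all illustrative, not the commenter's actual config:

```
# The commenter describes certified keys; with an SSH CA you'd bake the
# restriction into the certificate itself (illustrative names/paths):
#   ssh-keygen -s backup_ca -I backup-host -n backup \
#       -O force-command="/usr/local/bin/backup-dump" -O no-pty \
#       backup_host_key.pub
#
# The plain-key equivalent is a pinned authorized_keys entry on the
# machine being backed up - the backup host's key may run exactly one
# command, whatever it asks for:
command="/usr/local/bin/backup-dump",no-pty,no-port-forwarding,no-agent-forwarding ssh-ed25519 AAAA... backup@offsite

# Either way, /usr/local/bin/backup-dump just streams ciphertext:
#   tar -cz /srv/data | gpg --encrypt --recipient backups@example.org
# so the backup box stores data it cannot read, and can run nothing
# else on the source machine.
```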
| dylan604 wrote:
| Can't you add only changes to a tar?
| teddyh wrote:
| Indeed you can; see the --listed-incremental and
| --incremental options:
|
| https://www.gnu.org/software/tar/manual/tar.html#Incremen
| tal...
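A runnable illustration, assuming GNU tar: the snapshot file records what the first archive already contains, so the second archive carries only the new file.

```shell
#!/bin/sh
# Full (level-0) then incremental (level-1) archive with GNU tar.
set -eu
WORK=$(mktemp -d)
mkdir "$WORK/data"
echo one > "$WORK/data/a.txt"

if tar --version 2>/dev/null | grep -q 'GNU tar'; then
    # Full backup; "snap" records per-file metadata for later comparison.
    tar --listed-incremental="$WORK/snap" -czf "$WORK/full.tar.gz" \
        -C "$WORK" data

    # Add a file, then archive only what changed since the snapshot.
    echo two > "$WORK/data/b.txt"
    tar --listed-incremental="$WORK/snap" -czf "$WORK/incr.tar.gz" \
        -C "$WORK" data

    # incr.tar.gz lists data/b.txt; a.txt's contents are not re-sent.
    tar -tzf "$WORK/incr.tar.gz"
fi
```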
| iamd3vil wrote:
| Another option for incremental backups is Restic [0]. It has
| support to backup to Backblaze B2, Amazon S3 and lots of
| other places.
|
| [0] https://restic.net/
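A hedged sketch of a restic workflow; the demo uses a local repository so it is self-contained, and the comments note the B2 form (bucket name and env vars are placeholders):

```shell
#!/bin/sh
# Init a repository, take a snapshot, list its contents. For B2 the
# repo would be "-r b2:bucketname:path" with B2_ACCOUNT_ID and
# B2_ACCOUNT_KEY exported; a local directory keeps the sketch portable.
set -eu
if command -v restic >/dev/null 2>&1; then
    export RESTIC_PASSWORD=demo-only        # use a real secret in practice
    REPO=$(mktemp -d)/repo
    SRC=$(mktemp -d)
    echo "important" > "$SRC/note.txt"

    restic -r "$REPO" init
    restic -r "$REPO" backup "$SRC"         # later runs are incremental
    restic -r "$REPO" ls latest             # shows note.txt
fi
```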
| [deleted]
| paulmd wrote:
| > I'm not a big fan of just-make-a-tarball-of-everything-and-
| upload-it-to-the-cloud backup methodology, I prefer something
| a bit more incremental.
|
| pretty much a textbook use-case for zfs with some kind of
| snapshot-rolling utility. Snap every hour, send backups once
| a day, prune your backups according to some timetable.
| Transfer as incrementals against the previous stored
| snapshot. Plus you get great data integrity checking on top
| of that.
|
| "but linus said..."
| raverbashing wrote:
| You can do the same with Ext4
| geococcyxc wrote:
| Ext4 has no snapshot feature, do you mean with lvm?
| raverbashing wrote:
| Yes LVM, sorry
| nix23 wrote:
| >"but linus said..."
|
| Yes, I still don't understand him, as he calls himself a
| "filesystem guy". Also I don't understand why no one ever
| mentions NILFS2.
| dave_sullivan wrote:
| I am literally in SBG2 so that has been fun.
|
| Turns out, our disaster recovery plan is pretty good.
|
| Datacenter burned down and I still was up 4 hours later in
| another data center with zero data loss. Good times.
| sebastianconcpt wrote:
| And how's SBG1 doing?
| BadBadJellyBean wrote:
| About 1/3 destroyed
| jfrunyon wrote:
| Servers are at a mix of "cloud" providers, and on-site. Most
| data (including system configs!) is backed up on-site nightly,
| and to B2 nightly with historical copies - and critical data is
| also live-replicated to our international branches. (Some "meh"
| data is backed up only to B2, like our phone logs; we can get
| most of the info from our carrier anyway).
|
| Our goal and the reason we have a lot of stuff backed up on-
| prem is to have our most time-critical operations back up
| within a couple of hours - unless the building is destroyed, in
| which case that's a moot point and we'll take what we can get.
|
| A dev wiped our almost-monolithic
| sales/manufacturing/billing/etc MySQL database a month or two
| ago. (I have been repeatedly overruled on the topic of taking
| access to prod away from devs) We were down for around an hour.
| Most of that time was spent pulling gigs of data out of the
| binlog without also wiping it all again. Because our nightly
| backups had failed a couple weeks prior - after our most recent
| monthly "glance at it".
| jacquesm wrote:
| It's true: most companies do not have a disaster recovery plan,
| and many of them confuse a breach protocol with a disaster
| recovery plan ('we have backups').
|
| Fires in DCs aren't rare at all, I know of at least three, one
| of those in a building where I had servers. This one seems to
| be worse than the other two. Datacenters concentrate a lot
| of flammable stuff, throw a ton of current through it, and
| do so 24x7. The risk of a fire is definitely not
| imaginary, which is why most DCs have fire suppression
| mechanisms. Whether those work as advertised depends on the
| nature of the fire. An exploding on prem transformer took out a
| good chunk of EV1's datacenter in the early 2000's, and it
| wasn't so much the fire that caused problems for their
| customers, but the fact that someone got injured (or even died,
| I don't recall exactly), and it took a long time before
| the investigation was completed and the DC was released to
| the owners again.
|
| Being paranoid and having off-site backups is what allowed us
| to be back online before the fire was out. If not for that I
| don't know if our company would have survived.
| tothrowaway wrote:
| I'm at OVH as well (in the BHS datacenter, fortunately). I run
| my entire production system on one beefy machine. The apps and
| database are replicated to a backup machine hosted with Hetzner
| (in their Germany datacenter). I also run a tiny VM at OVH
| which proxies all traffic to Hetzner. I use a failover IP to
| point at the big rig at OVH. If the main machine fails, I move
| the failover IP to the VM, which sends all traffic to Hetzner.
|
| If OVH is totally down, and the failover IP doesn't work, I
| have a fairly low TTL on the DNS.
|
| I back up the database state to S3 every day.
|
| Since I'm truly paranoid, I have an Intel NUC at my house that
| also replicates the DB. I like knowing that I have a complete
| backup of my entire business within arm's reach.
| eb0la wrote:
| Are you truly paranoid?
|
| If my money and/or job depended on having something running
| without (or with minimal) disruption I would be as paranoid
| as you, too.
|
| BTW - Some people call this business recovery plan, not plain
| paranoia ;-)
| michaelt wrote:
| Enterprise-level projects often have only light protection
| against wrongful hosting account termination, reasoning
| that spending a lot of money and having an account manager
| keeps them safe from clumsy automated systems.
|
| So they might have their primary and replica databases at
| different DCs from the same hosting provider, and only
| their nightly backup to a different provider. Four copies
| to four different providers is a step above three copies
| with two providers!
|
| A large enterprise would probably be using a filesystem
| with periodic snapshots, or streaming their redo log to a
| backup, to protect against a fat-fingered DBA deleting the
| wrong thing. Of course, filesystem snapshots provide no
| protection against loss of DC or wrongful hosting account
| termination, so you might not count them as true backup
| copies.
| numbsafari wrote:
| This is why you should have a "Cloud 3-2-1" backup plan.
| Have 3 copies of your data, two with your primary
| provider, and 1 with another.
|
| e.g., if you are an AWS customer, have your back ups in
| S3 and use simple replication to sync that to either GCS
| or Azure, where you can get the same level of compliance
| attestation as from AWS.
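One hedged way to implement the cross-provider leg is rclone (mentioned elsewhere in this thread); the remote names here assume a hypothetical rclone config with an S3 and a GCS remote:

```shell
#!/bin/sh
# Mirror the S3 backup bucket to a second provider: third copy, second
# vendor. "s3-primary" and "gcs-offsite" are remotes from a
# hypothetical rclone.conf, not real endpoints.
set -eu

sync_cmd() { printf 'rclone sync s3-primary:backups gcs-offsite:backups\n'; }

if command -v rclone >/dev/null 2>&1; then
    $(sync_cmd) || echo "sync failed (remotes not configured?)"
else
    sync_cmd   # rclone not installed: just show what cron would run
fi
```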
| buran77 wrote:
| It's not paranoia if you're right. All of the risks GP is
| protecting against are things that happen to someone every
| day, and they should be seen like wearing the seat belt in
| a car.
| klingon78 wrote:
| I have a reliability and risk avoidance mindset, but I've
| had to stand back because my mental gas tank for trying
| to keep things going is near empty.
|
| I've really struggled working with others that either are
| both ignorant and apathetic about the business's ability
| to deal with risk or believe that it's their job to keep
| putting duct tape over the duct tape that breaks multiple
| times a day while users struggle.
|
| I like seeing these comments reminding others to wear a
| seat belt or have backups for their backups, but I don't
| know whether I should care more about reliability. I work
| in an environment that's a constant figurative fire.
|
| I also like to spend time with my family. I know it's
| just a job, and it would be even if I were the only one
| responsible for it; that doesn't negate the importance of
| reliability, but there is a balance.
|
| If you are dedicated to reliability, don't let this deter
| you. Some have a full gas tank, which is great.
| CraigJPerry wrote:
| This resonates with me. I notice my gas tank rarely
| depletes because of technology. It doesn't matter how
| brain dead the 00's oracle forms app with absurd
| unsupported EDI submission excel thinga-ma-bob that
| requires a modem ... <fill in the rest of the dumpster
| fire as your imagination deems>. Making a tech stack safe
| is a fun challenge.
|
| Apathetic people though, that can be really tough going.
| It's just that way "because". Or my favourite "oh we
| don't have permission to change that", how about we make
| the case and get permission? __horrified looks__
| sometimes followed by pitch forks.
| buran77 wrote:
| Reliability is there to keep your things running smoothly
| during normal operations. Backups are there for when you
| reach the end of your reliability rope. Neither is really
| a good replacement for the other. The most reliable
| systems will still fail eventually, and the best of
| backups can't run your day to day operations.
|
| At the end of the day you have a budget (of any kind) and
| a list of priorities on which to spend it. It's up to you
| or your management to set a reasonable budget, and to set
| the right priorities. If they refuse, leave, or you'll
| just burn the candle at both ends and fade out.
| klingon78 wrote:
| Backups are a reliability tool, yes.
|
| A backup on its own is of little worth if unused.
|
| When a backup is used to bring something back up, downtime
| is reduced. That is reliability: keeping things usable and
| functioning more often than not.
| jt2190 wrote:
| Consider:
|
| > ... [F]inance is fundamentally about moving money and
| risk through a network. [1]
|
| Your employer has taken on many, many risks as part of
| their enterprise. If _every_ risk is addressed the
| company likely can't operate profitably. In this context,
| your business needs to identify every risk, weigh the
| likelihood and the potential impact, decide whether to
| address or accept the risk, and finally, if they decide
| to address the risk, whether to address it in-house or
| outsource it.
|
| You've identified a risk that is currently being
| "accepted" by your employer, one that you'd like to
| address in-house. Perhaps they've taken on the risk
| unintentionally, out of ignorance.
|
| As a professional the best I can do is to make sure that
| the business isn't ignorant about the risk they've taken
| on. If the risk is too great I might even leave. Beyond
| that I accept that life is full of risks.
|
| [1] Gary Gensler, "Blockchain and money", Introduction
| https://ocw.mit.edu/courses/sloan-school-of-
| management/15-s1...
| venj wrote:
| Are your domains at OVH too? If so, I'd consider changing
| that: this morning the manager was quite flooded and the DNS
| service was down for some time...
| notaharvardmba wrote:
| Bacula has some really cool features for cloud backups.
|
| https://bacula.org
| yongjik wrote:
| Ow what an unfortunate name.
|
| https://en.wikipedia.org/wiki/Baculum
| schoen wrote:
| It just means "stick" in the original Latin!
| 867-5309 wrote:
| but fortunately also within arm's reach
| waheoo wrote:
| LGTM
| johnchristopher wrote:
| It seems your setup follows the 3-2-1 backup rule, with at
| least two copies in different physical locations.
| https://www.nakivo.com/blog/3-2-1-backup-rule-efficient-
| data...
| extrasolar wrote:
| this is the way. I do the same. Not paranoid at all.
| vanviegen wrote:
| Are you me, by any chance? :-)
|
| I also run our entire production system on one beefy machine
| at OVH, and replicate to a similar machine at Hetzner. In
| case of a failure, we just change DNS, which has a 1 hour
| TTL. We've needed to do an unplanned fail-over only once in
| over 10 years.
|
| And like you, I have an extra replica at the office, because
| it feels safe having a physical copy of the data literally at
| hand.
| ta988 wrote:
| Same, but with a regular offline physical copy (cheap NAS).
| One of my worries is malicious destruction of the backups
| if anything worms its way into my network.
| sandworm101 wrote:
| Which is why "off" is still a great security tool. A copy
| on a non-powered device, even if that device is attached
| to the network, is immune to worms. There is something to
| be said for a NAS solution that requires a physical act
| to turn on and perform an update.
| extrasolar wrote:
| hetzner has storage boxes and auto snapshots. so even if
| someone deletes the backups remotely there are still
| snapshots which they can't get unless they have control
| panel access.
| 7ewis wrote:
| Not done any research into it, but I always thought OVH was
| supposed to be a very budget VPS service primarily for
| personal use rather than business. I thought it was almost
| akin to having a Raspberry Pi plugged in at home.
|
| Again, I may be completely wrong but why would you not use
| AWS/GCP? Even if it's complexity, Amazon have Lightsail, or
| if it's cost I thought DigitalOcean was one of the only
| reputable business-grade VPS providers.
|
| I just can't imagine many situations where a VPS would be
| superior to embracing the cloud and using cloud functions,
| containers, instances with autoscaling/load balancers etc.
| Symbiote wrote:
| OVH is a European equivalent to Digital Ocean.
|
| It has twice the revenue, and is the third largest
| hosting provider in the world.
| Thaxll wrote:
| I would call it the EU version of Rackspace. Except without
| the insane Rackspace prices.
| nixgeek wrote:
| Twice the revenue of DigitalOcean still puts it < $1B
| ARR, or am I missing something? I can't see how that's
| the third largest in the world, or does your definition
| of "hosting provider" exclude clouds?
| Symbiote wrote:
| I took it from the top of their Wikipedia page.
|
| In any case, they aren't "primarily for personal use".
|
| https://en.wikipedia.org/wiki/OVH
| olivierduval wrote:
| OVH STARTED as a budget VPS service some 20 years ago...
| but they have grown a lot in the last 6-7 years, adding more
| "cloud" services and capabilities, even if not on par with
| the main players...
|
| Why not use AWS/GCP? From my personal point of view: as a
| French citizen, I'm more and more convinced that I can't
| completly trust the (US) big boys for my own safety.
| Trump showed that "US interest" is far more important
| than "customer interest" or even "ally interest". And
| moreover, Google is showing quite regularly that it's not
| a reliable business partner (AWS look better for this).
| extrasolar wrote:
| Price; also, elaborate hardware customizations are not
| possible, and then you are still running on a hypervisor vs
| bare metal.
| posix_me_less wrote:
| > Google is showing quite regularly that it's not a
| reliable business partner
|
| Interesting, any examples?
| sergiosgc wrote:
| I'm not the OP, but I'd imagine it is the combination of
| no support line and algorithmic suspension of business
| accounts. It is a relevant risk.
| olivierduval wrote:
| Yeah, I was thinking about all the horror stories that
| can be found on this site.
|
| As a customer (or maybe an "involuntary data provider"),
| I do as much as I can to avoid Google being my SPOF, not
| technically (it's really technically reliable) but on the
| business side. I had to set up my own mail server just to
| avoid any risk of a google-ban, for example... just in case.
| I won't use Google Authenticator for the same reason.
| I'm happy to have left Google Photos some years ago, to
| avoid problems of Google shutting it down. And the list
| could go on...
|
| As a business, I like to program Android apps but the
| Google Store is really a risk too. The risk of having any
| Google account blacklisted because some algorithm thought
| I did something wrong. And no appeal.
|
| Maybe all this doesn't apply to GCP customers. Maybe GCP
| customers have a human direct line, with someone to
| really help and the capacity to do it. Or maybe it's just
| Google: as long as it works, enjoy. If it doesn't, go to
| (algorithmic) hell.
| ev1 wrote:
| OVH is one of the largest providers in the world. They
| run a sub brand for personal use (bare metal for $5/m,
| hardware replacements in 30 min or less usually).
|
| ..and they do support all of those things you just
| listed, not just API-backed bare metal.
| treesknees wrote:
| Is that a typo? I only see OVH bare metal starting at
| >$50. How could a provider offer a bare metal server for
| $5?
| tecleandor wrote:
| Kimsufi has tiny cheap Atom servers with self built or
| made to order racks and hardware:
|
| (This is 2011, I think it looks fancier now)
|
| https://lafibre.info/ovh-datacenter/data-center-ovh-
| roubaix-...
|
| Edit: Seems like they stopped publishing videos for that
| datacenter, but this seems to be a 2013 video of the
| datacenter that burned down:
| https://www.youtube.com/watch?v=Y47RM9zylFY
| ev1 wrote:
| OVH has new servers.
|
| Their sub-brand soyoustart has older servers (that are
| still perfectly fine), roughly E3 Xeon/16-32GB/3x2TB to
| 4x2TB for $40/m ex vat.
|
| Their other sub brand kimsufi for personal servers has
| Atom low-power bare metal with 2TB HDD (in reality it is
| advertised as 500GB/1TB, but they don't really have any of
| those in stock left, if your drive fails they replace it
| with a 2T - so far this has been my exp) for $5.
|
| All of this is powered by automation, you don't really
| get any support and you are expected to be competent. If
| your server is hacked you get PXE-rebooted into a rescue
| system and can scp/rsync off your contents before your
| server is reinstalled. OS installs, reboots, provisioning
| are all automated, there's essentially no human contact.
|
| PS: Scaleway, in Paris, used to offer $2 bare metal
| (ultra low voltage, weaker than an Atom, 2GB ram), but
| pulled all their cheap machines, raised prices on
| existing users, and rebranded as enterprisey. The offer
| was called 'kidechire'
|
| --
|
| It is kind of interesting that on the US side everyone is
| in disbelief, or like "why not use AWS" - while most of
| the European market knows of OVH, Hetzner, etc.
|
| My own reason for using OVH? It's affordable and I would
| not have gotten many projects (and the gaming community I
| help out with) off the ground otherwise. I can rent bare
| metal with NVMe, and several terabytes of RAM for less
| than my daily wage for the whole month, and not worry
| about per-GB billing or attacks. In the gaming world you
| generally do not ever want to use usage based billing -
| made the mistake of using Cloudfront and S3 once and
| banned script kiddies would wget-loop the largest
| possible file from the most expensive region botnet
| repeatedly in a money-DoS.
|
| I legitimately wouldn't have been able to do my "for-fun-
| and-learning" side projects (no funding, no accelerator
| credits, ...) without someone like them. The equivalent
| of a digitalocean $1000/m VM is about $100 on OVH.
| iagovar wrote:
| I like scalingo too. If you need a bit more, they have
| DBaaS, APP Containers and Networking.
| ev1 wrote:
| Scalingo is EUR552.96/m for 16GB of memory.
|
| 32c xeon/256GB ECC/500GB SSD 8TB HDD is $100/m at OVH.
| The difference is amusing.
| yannski wrote:
| you're comparing a PaaS with a piece of hardware. It's
| absolutely not comparable.
|
| Yann, CEO at Scalingo
| sudosysgen wrote:
| It's not a typo. OVH runs Kimsufi, which has bare metal
| servers for as low as 5$. It is pretty insane.
| treesknees wrote:
| Thank you. TIL!
| veeti wrote:
| AWS is a total and utter ripoff compared to the
| price/performance, DDoS protection & unmetered bandwidth
| provided by OVH.
| api wrote:
| If all you need is compute, storage, and a pipe, all the
| big cloud providers are a total ripoff and you should
| look elsewhere. The big ones only make sense if you are
| leveraging their managed features or if you need extreme
| elasticity with little chance of a problem scaling up in
| real time.
|
| OVH is one of the better deals for bare metal, but there
| are even better ones for bandwidth. You have to shop
| around a lot.
|
| Also be sure you have a recovery plan... even with the
| big providers. These days risks include not only physical
| stuff but some stupid bot shutting you off because it
| thinks you violated TOS or is reacting to a possibly
| malicious complaint.
|
| We had a bot at AWS gank some test systems once because
| it thought we were cryptocurrency mining with free
| credits. We weren't, but we were doing very CPU intensive
| testing. I've heard of this and worse happening
| elsewhere. DDOS detector and IDS bots are particularly
| notorious.
| extrasolar wrote:
| no, OVH has dedicated servers; lots of big companies use
| it to build out private clouds, much cheaper than Amazon
| or google
| omnimus wrote:
| You can't imagine it, yet a big chunk of the independent
| internet runs on small VPS servers. There isn't much
| difference between DO and OVH, Hetzner, Vultr, Linode...
| not sure why DO would be better. I mean, it's a US company
| doing marketing right. That's the difference. Plus
| OVH/Hetzner have only EU locations.
|
| I think small businesses like smaller, simpler providers
| instead of big clouds. It's a different philosophy: if you
| are afraid of extreme centralisation of the internet, it
| makes sense.
| Sanzig wrote:
| OVH has at least one large North American datacenter in
| Beauharnois, located just south of Montreal. I've used
| them before for cheap dedicated servers. They may have
| others.
| omnimus wrote:
| Yes, I didn't know, and I was generalizing too much.
|
| But I assume they are less known in the US.
| eloff wrote:
| I can think of a lot of big differences. For one you can
| get much larger machines at OVH and Hetzner with fancy
| storage configurations for your database if desired (e.g.
| Optane for your indices, magnetic drives for your
| transaction log, and raided SSDs for the tables)
|
| They also don't charge for bandwidth, although some of
| those other providers have a generous free bandwidth and
| cheap overage.
| omnimus wrote:
| So you are saying they might be even better than DO
| depending on requirements.
|
| I didn't know.
| eloff wrote:
| Much cheaper and better performance at the high end.
| Doesn't compete at all at the low-end, except through
| their budget brand Kimsufi. I don't see them really as
| targeting the same market.
| sweeneyrod wrote:
| I don't know about OVH but Hetzner beats DO at the lower
| end: for $5/month you get 2 CPUs vs 1, 2 GB RAM vs 1, 40
| GB disk vs 25 and 20 TB traffic vs 1. They have an even
| lower-end package for 2.96 Euro/month as well.
| ArchOversight wrote:
| I rent a server from OVH for $32 a month. It's their So
| You Start line... doesn't come with fancy enterprise
| support and the like.
|
| It's a 4 core 8 thread Xeon with 3x 1TB SATA with 32GB of
| ECC RAM IIRC (E3-SAT-1-32, got it during a sale with a
| price that is guaranteed as long as I keep renewing it)
|
| The thing is great, I can run a bunch of VMs on it, it
| runs my websites and email.
|
| Overall to get something comparable elsewhere I would be
| paying 3 to 4 times as much.
|
| I would consider $50 a month or less low end pricing.
| ¯\_(ツ)_/¯
| eloff wrote:
| Yeah, I forgot they also have the so you start brand.
| It's probably more expensive than the majority of what
| digital ocean sells, but there is some overlap for sure.
| mwcampbell wrote:
| > Optane for your indices
|
| At OVH? If so, their US data centers don't seem to have
| that option.
|
| Not that I need it. The largest database I run could
| easily fit in RAM on a reasonably sized dedicated box.
| eloff wrote:
| I didn't realize they had US datacenters before now. It's
| possible that's no longer an option. It was on the
| largest servers in the Montreal datacenter when I specced
| that out.
| extrasolar wrote:
| they have 2 data centers in the US
| bleuarff wrote:
| OVH is not Europe-only, it has datacenters in America,
| Asia and Australia[1].
|
| [1] https://www.ovh.com/world/us/about-us/datacenters.xml
| sudosysgen wrote:
| OVH has a location in Canada, now.
| AmericanChopper wrote:
| > I like knowing that I have a complete backup of my entire
| business within arm's reach.
|
| It could also provide a burglar a fantastic opportunity to
| pivot into a career in data breaches.
| tgsovlerkhgsel wrote:
| This problem is usually solved through encryption.
| AmericanChopper wrote:
| If I were to ask my CISO if I was allowed to bring the
| production database home, I'm pretty sure his answer
| wouldn't be "as long as you encrypt it".
| brmgb wrote:
| That's because he doesn't trust you with this data. That
| has nothing to do with encryption safety.
|
| There is nothing magical about data centers making them
| safe while your local copy isn't.
| AmericanChopper wrote:
| > There is nothing magical about data centers making them
| safe while your local copy isn't.
|
| Is this a serious comment? My house is not certified as
| being compliant with any security standards. Here's the
| list that the 3rd party datacenter we use is certified as
| complaint with:
|
| https://aws.amazon.com/compliance/programs/
|
| The data centers we operate ourselves are audited against
| several of those standards too. I guess you're right that
| there's nothing magic about security controls, but it has
| nothing to do with trust. Sensitive data should generally
| never leave a secure facility, outside of particularly
| controlled circumstances.
| brmgb wrote:
| Of course it's serious.
|
| You are entirely missing the point by quoting the
| compliance programs followed by AWS, whose sole business
| is being a third-party hoster.
|
| For most business, what you call sensitive data is
| customers and orders listing, payment history, inventory
| if you are dealing in physical goods and HR related
| files. These are not state secrets. Encryption and a
| modicum of physical security go a long way.
|
| I personally find the idea that you shouldn't store a
| local backup of this kind of data out of security concern
| entirely laughable. But that's me.
| AmericanChopper wrote:
| This is quite a significant revision to your previous
| statement that there's nothing about a data center that
| makes it more secure than your house.
|
| This attitude that your data isn't very important, so
| it's fine to not be very concerned about its security,
| while not entirely uncommon, is something most
| organisations try to avoid when choosing vendors. It's
| something consumers are generally unconcerned about,
| until a breach occurs, and The Intercept write an article
| about it. At which point I'm sure all the people ITT who
| are saying it's fine to take your production database
| home would be piling on with how stupid the company was
| for doing ridiculous things like taking a copy of their
| production database home.
| brmgb wrote:
| > This is quite a significant revision to your previous
| statement that there's nothing about a data center that
| makes it more secure than your house.
|
| I said there was nothing magical about data centers
| security, a point I stand with.
|
| It's all about proper storage (encryption) and physical
| security. Obviously, the physical security of an AWS data
| center will be tighter than your typical SME's, but in a way
| which is of no significance to storing backups.
|
| > This attitude that your data isn't very important
|
| You are once again missing the point.
|
| It's not that your data isn't important. It's that
| storing it encrypted in a sensible place (and to be clear
| by that I just mean not lying around - a drawer in an
| office or your server room seems perfectly adequate to
| me) is secure enough.
|
| The benefits of having easily available backups by far
| trump the utterly far fetched idea that someone might
| break into your office to steal your encrypted backups.
| logifail wrote:
| > It's that storing it encrypted in a sensible place (and
| to be clear by that I just mean not lying around - a
| drawer in an office or your server room seems perfectly
| adequate to me) is secure enough.
|
| In the SME space some things are "different", and if
| you've not worked there it can be hard to get one's head
| around it:
|
| A client of mine was burgled some years ago.
|
| Typical small business, offices on an industrial estate
| with no residential housing anywhere nearby. Busy in the
| daytime, quiet as the grave during the night. The
| attackers came in the wee small hours, broke through the
| front door (the locks held, the door frame didn't), which
| must have made quite a bit of noise. The alarm system was
| faulty and didn't go off (later determined to be a 3rd
| party alarm installer error...)
|
| All internal doors were unlocked, PCs and laptops were
| all in plain sight, servers in the "comms room" - that
| wasn't locked either.
|
| The attacker(s) made a cursory search at every desk, and
| the _only_ thing that was taken _at all_ was a light
| commercial vehicle which was parked at the side of the
| property, its keys had been kept in the top drawer of one
| of the desks.
|
| The guy who looked after the vehicle - and who'd lost
| "his" ride - was extremely cross, everyone else (from the
| MD on downwards) felt like they'd dodged a bullet.
|
| Physical security duly got budget thrown at it - stable
| doors and horses, the way the world usually turns.
| mikepurvis wrote:
| But how many of those splashy breaches ended up being
| because of the off-site backup copy of the database at
| the CEO's house?
| sneak wrote:
| Once you're big enough to afford a CISO, you're likely
| big enough to afford office space with decent physical
| security to serve as a third replicated database site to
| complement your two datacenters.
|
| These solutions are not one-size-fits-all. What works for
| a small startup isn't appropriate for a 100+ person
| company.
| AmericanChopper wrote:
| Yes, I agree. Small companies typically are very bad at
| security.
| sneak wrote:
| I am really good at security and I too keep encrypted
| backups on-site in my house.
| _joel wrote:
| Not in my experience. Worked at some small shops that
| were lightyears ahead in terms of policy, procedures and
| attitude compared to places I've worked with 50k+
| employees globally.
| AmericanChopper wrote:
| Large organisations tend not to achieve security
| compliance with overly sophisticated systems of policy
| and controls. They tend to do it using bureaucracy, which
| while usually rather effective at implementing the level
| of control required, will typically leave a lot to be
| desired in regards to UX and productivity. Small
| organisations tend to ignore the topic entirely until
| they encounter a prospective client or regulatory barrier
| that demands it. At which point they may initially
| implement some highly elegant systems. Until they grow
| large enough that they all devolve into bureaucratic
| mazes.
| _joel wrote:
| I'm aware, but that's not been my experience. I've been
| in large places where there's been a laissez-faire
| attitude because it was "another team's job", and general
| bikeshedding over smaller features because the bigger-
| picture security wasn't their area, or where X was forced
| by diktat from above because they're on the board, whilst
| X is completely unfit for purpose. There's no pushback.
| However, I've worked at small ISPs where we took security
| extremely seriously. Appropriate background checks and
| industry policy, but more so the attitude... we wanted to
| offer customers security because we had pride in our
| work.
| croon wrote:
| Well, it's not because the encryption is insecure.
| axaxs wrote:
| LUKS is your friend.
| Thorrez wrote:
| But it does protect somewhat against ransomware on the
| servers.
| dredmorbius wrote:
| For small firms, CEO / CTO maintaining off-sites at a
| residence is reasonable and not an uncommon practice. As
| with all security / risk mitigation practices, there is a
| balance of risks and costs involved.
|
| And as noted, encrypted backups would be resistant to
| casual interdiction, or even strongly-motivated attempts.
| Data loss is the principal risk mitigated by off-site,
| on-hand backups.
| baggy_trough wrote:
| It all depends on your paranoia level of data hacking
| burglars vs. vaporized data centers.
| ethanpil wrote:
| Does anyone have experience with lvarchive or FSArchiver (or
| similar) to backup images of live systems instead of file based
| backup solutions?
| lgeorget wrote:
| We have two servers at OVH (RBX and GRA, not SBG). I make
| backups of all containers and VMs every day and keep the last
| three, plus one each month. Backups are stored in a
| separate OVH storage disk and also downloaded to a NAS on-
| premise. In case of a disaster, we'd have to rent a new server,
| reprovision the VMs and containers and restore the backups.
| About two days of work to make sure everything works fine and
| we could lose about 24 hours of data.
|
| It's not the best in terms of Disaster Recovery Plan but we
| accept that level of risk.
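The retention scheme described above (keep the last three dailies, plus, on my reading of "plus one each month", the newest backup of each month) can be sketched in a few lines. This is an illustrative sketch, not lgeorget's actual tooling; the function name is made up:

```python
from datetime import date

def select_backups_to_keep(backup_dates, keep_daily=3):
    """Pick which backups to keep under a 'last N dailies plus
    the newest backup of each month' policy.

    backup_dates: iterable of datetime.date, one per backup.
    Returns the set of dates to keep; the rest may be pruned.
    """
    ordered = sorted(backup_dates, reverse=True)  # newest first
    keep = set(ordered[:keep_daily])              # most recent dailies
    seen_months = set()
    for d in ordered:
        month = (d.year, d.month)
        if month not in seen_months:
            seen_months.add(month)
            keep.add(d)                           # newest backup of that month
    return keep
```

Whatever the function does not return is safe to delete in the same cron job that creates the new daily backup.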
| neurostimulant wrote:
| Nothing too crazy, just a simple daily cron to sync user
| data and database dumps on our OVH boxes to Backblaze and
| rsync.net. This simple setup has already saved our asses
| a few times.
| jbeales wrote:
| My recovery plan: tarball & upload to Object Store. I'm going
| to check out exactly how much replication the OVH object store
| offers, and see about adding a second geographic location, and
| maybe even a second provider, tomorrow.
|
| (My servers aren't in SBG either - phew!)
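The tarball-and-upload plan above can be sketched roughly like this; the upload step itself (OVH Object Storage, or a second provider as suggested elsewhere in the thread) is omitted, and the function name and layout are assumptions, not jbeales's actual script. The checksum lets you verify the object after upload:

```python
import hashlib
import os
import tarfile
from datetime import datetime, timezone

def make_backup_tarball(src_dir, out_dir):
    """Create a timestamped .tar.gz of src_dir and return its path
    plus a SHA-256 checksum to compare against the uploaded copy."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = os.path.join(out_dir, f"backup-{stamp}.tar.gz")
    with tarfile.open(path, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return path, digest.hexdigest()
```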
| nucleardog wrote:
| If your primary data is on OVH, I'd look at using another
| company's object store if feasible (S3, B2, etc). If
| possible, on another payment method. (If you want to be
| really paranoid, something issued under another legal
| entity.)
|
| There's a whole class of (mostly non-technical) risks that
| you solve for when you do this.
|
| If anything happens with your payment method (fails and you
| don't notice in time; all accounts frozen for investigation),
| OVH account (hacked, suspended), OVH itself (sudden
| bankruptcy?), etc, then at least you have _one_ other copy.
| It's not stuff that's likely to happen, but the cost of
| planning for it at least as far as "haven't completely lost
| all my data even if it's going to be a pain to restore" here
| is relatively minimal.
| dfsegoat wrote:
| We test rolling over the entire stack to another AWS DR region
| (just one we don't normally use) from S3 backups, etc. We do
| this annually and try to introduce some variations to the
| scenarios. It takes us about 18 hours realistically.
|
| Documentation / SOPs that have been tested thoroughly by
| various team members are really important. It helps work out
| any kinks in interpretation, syntax errors etc.
|
| It does feel a little ridiculous at the time for all the effort
| involved, but incidents like this show why it's so important.
| juangacovas wrote:
| Less than a day for disaster recovery on fresh hardware? Same
| as my case. As you say, good enough for most purposes, but I'm
| also looking for improvement. I have offsite realtime replicas
| for data and mariaDBs, and offsite nightly backups (combo of
| rsnapshot, lsyncd, mariaDB multi-source replication, and a
| post-new-install script that setups almost everything in case
| you have to recover on bare-metal, i.e. no available VM
| snapshots).
|
| Currently trying to reduce that "less than a day" though.
| Recently discovered "ReaR" (Relax and Recover) from RedHat and
| sounds really nice for bare-metal servers. Not everybody runs
| on virtualized/cloud (being able to recover from VM snapshots
| is really a plus). Let's share experiences :)
| KronisLV wrote:
| Here's what i do for my homelab setup that has a few machines
| running locally and some VPSes "in the cloud":
|
| I personally have almost all of the software running in
| containers with an orchestrator on top (Docker Swarm in my
| case, others may also use Nomad, Kubernetes or something else).
| That way, rescheduling services on different nodes becomes less
| of a hassle in case of any one of them failing, since i know
| what should be running and what configuration i expect it to
| have, as well as what data needs to be persisted.
|
| At the moment i'm using Time4VPS ( affiliate link:
| https://www.time4vps.com/?affid=5294 ) for the stuff that needs
| decent availability and because they're cheaper than almost all
| of the alternatives i've looked at (DigitalOcean, Vultr,
| Scaleway, AWS, Azure) and that matters to me.
|
| Now, in case the entire data centre disappears, all of my data
| would still be available on a few HDDs under my desk (which are
| then replicated to other HDDs with rsync locally), given that i
| use BackupPC for incremental scheduled backups with rsync:
| https://backuppc.github.io/backuppc/
|
| For simplicity, the containers also use bind mounts, so all of
| the data is readable directly from the file system, for
| example, under /docker (not really following some of the *nix
| file system layout practices, but this works for me because
| it's really easy to tell where the data that i want is).
|
| I actually had to migrate over to a new node a while back, took
| around 30 minutes in total (updating DNS records included).
| Ansible can also really help with configuring new nodes. I'm
| not saying that my setup would work for most people or even
| anything past startups, but it seems sufficient for my
| homelab/VPS needs.
|
| My conclusions:
|
| - containers are pretty useful for reproducing software
| across servers
| - knowing exactly which data you want to preserve (such
| as /var/lib/postgresql/data/pgdata) is also pretty
| useful, even though a lot of software doesn't really play
| nicely with the idea
| - backups and incremental backups are pretty doable even
| without relying on a particular platform's offerings;
| BackupPC is more than competent, and buying HDDs is far
| more cost effective than renting that space
| - automatic failover (both DNS and moving the data to a
| new node) seems complicated, as does using distributed
| file systems; those are probably useful but far beyond
| what i actually want to spend time on in my homelab
| - you should still check your backups
| sparrish wrote:
| Got burned once (no pun intended), learned my lesson.
|
| Hot spare on a different continent with replicated data along
| with a third box just for backups. The backup box gets offsite
| backups held in a safe with another redundant copy in another
| site in another safe.
|
| Restores are tested quarterly.
|
| Keep backups of backups. Once bitten, twice shy.
| Twirrim wrote:
| > Hot spare on a different continent
|
| Just be cautious about data locality laws (not likely to
| affect you as joe average, more for businesses)
| Mauricebranagh wrote:
| A few years ago I worked on the British Telecom Worldwide
| intranet team and we had a matrix mapping various countries
| encryption laws.
|
| This was so we remained legal in all of the countries BT
| worked in, which required a lot of behind-the-scenes work
| to make sure we didn't serve "illegally encrypted" data.
| bongoman37 wrote:
| yeah, there's lots of countries with regulations that
| certain data can't leave the geographical boundary of the
| country. Often, it is the most sensitive data.
| mike_d wrote:
| These laws generally don't work how people think they do.
|
| For example, the Russian data residency law states that a
| copy of the data must be stored domestically, not that it
| can't be replicated outside the country.
|
| The UAE has poorly written laws that have different
| regulations for different types of data - including fun
| stuff like only being subject to specific requirements if
| the data enters a 270 acre business park in Dubai.
|
| Don't even get me started on storing encrypted data in
| one country and the keys in another...
| gameshot911 wrote:
| Have you been bitten, personally? If so, story time?
| pcl wrote:
| > Restores are tested quarterly.
|
| Probably this is the most important part of your plan. It's
| not the backup that matters; it's the restore. And if you
| don't practice it from time to time, it's probably not going
| to work when you need it.
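One cheap way to make a restore test concrete is to diff checksum manifests of the original tree and the restored tree. This is an illustrative sketch under that assumption, not pcl's actual procedure; `tree_manifest` and `restore_matches` are made-up names:

```python
import hashlib
import os

def tree_manifest(root):
    """Map each file's path (relative to root) to its SHA-256,
    so a restored tree can be diffed against the original."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            manifest[rel] = h.hexdigest()
    return manifest

def restore_matches(original_root, restored_root):
    """True only if both trees contain the same files with
    identical contents."""
    return tree_manifest(original_root) == tree_manifest(restored_root)
```

Running this against a restore onto scratch hardware catches silent corruption and missing files, which a "the backup job exited 0" check never will.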
| neo34 wrote:
| "Everyone is safe. Fire has destroyed SBG2. A part of SBG1 is
| destroyed. The firefighters are protecting SBG3. No impact on
| SBG4". Tweet from Octave Klaba, founder of OVHcloud. "All our
| clients on this site are possibly impacted"
|
| "Everyone is safe and sound. The fire destroyed SBG2.
| Part of SBG1 is destroyed. The firefighters are currently
| protecting SBG3. No impact on SBG4," tweeted Octave
| Klaba, the founder of OVHcloud, referring to the
| different parts of the site. "All our clients on this
| data centre are liable to be impacted," the company
| stated on Twitter.
| cbg0 wrote:
| This event reminded me of the fire at The Planet a while back:
| https://www.datacenterknowledge.com/archives/2008/06/01/expl...
| archit3cture wrote:
| in case there were not posted before, here are pictures of SBG2
| in flames taken by the firemen.
| https://twitter.com/xgarreau/status/1369559995491172354
|
| This puts an image on the sentence "SBG2 is destroyed". Do not
| expect any recovery from SBG2.
| benlumen wrote:
| Holy hell. Are these "datacenters" really just shipping
| containers? That's what it looks like.
| switch007 wrote:
| Aren't they a bargain-basement provider? You get what you pay
| for I guess?
| tantalor wrote:
| https://en.wikipedia.org/wiki/Modular_data_center
| quickthrower2 wrote:
| Insert docker joke...
| lucb1e wrote:
| Status shows greens across the board
|
| http://status.ovh.com/vms/index_sbg2.html
|
| Am I looking at the wrong thing or am I right to wonder why we
| still bother with public status pages if it never shows the
| real status?
|
| Edit: nvm just saw another comment pointing out the same
| further down the thread (I randomly came across this page while
| looking for the physical location of another DC)
| helge9210 wrote:
| Almost 11 years ago (March 27, 2010), the Ukrainian
| datacenter of the company Hosting.ua went up in flames as
| clients watched their systems go unresponsive across
| various rows of the datacenter.
|
| The fire-suppression system didn't kick in. The reason?
| For a couple of days the system had been detecting a
| little smoke from one of the devices. Operators weren't
| able to pinpoint the exact location, considered it a
| false alarm, and manually switched the fire-suppression
| system off.
| plasma wrote:
| " Update 7:20am Fire is over. Firefighters continue to cool the
| buildings with the water. We don't have the access to the site.
| That is why SBG1, SBG3, SBG4 won't be restarted today."
|
| https://mobile.twitter.com/olesovhcom/status/136953578757072...
| rgj wrote:
| With water, they said.
| optimalsolver wrote:
| Liquid cooling.
| rgj wrote:
| No, as in "firehose". https://mobile.twitter.com/abonin_DNA
| /status/136953802824345...
| lovedswain wrote:
| > In the case of Roubaix 4, the Datacenter is made with a lot of
| wood:
|
| > Finally, we have other photos of the floor of the OVH "Roubaix
| 4" tower. It is clearly wood! Hope it's fireproof wood! A wooden
| datacenter ... is certainly novel, we must admit.
|
| > In France, data centers are mainly regulated by the labor code,
| by ICPE recommendations (with authorization or declaration) and
| by insurers. At the purely regulatory level, the only things that
| are required are:
|
| > - Mechanical or natural smoke extraction for blind premises or
| those covering more than 300m2
|
| > - The fire compartmentalization beyond a certain volume / m2
|
| > - Emergency exits accessible with a certain width
|
| > - Ventilation giving a minimum of fresh air per occupant
|
| > - Firefighter access from the facade for premises where
| the floor of the last level is more than 8 meters up
|
| > - 1 toilet for 10 people (occupying a position considered
| "fixed")
|
| https://lafibre.info/ovh-datacenter/ovh-et-la-protection-inc...
| MayeulC wrote:
| Ah, interesting pics and discussions (in French) in the
| neighboring thread: https://lafibre.info/datacenter/incendie-
| sur-un-site-ovh-a-s...
| malobre wrote:
| "1 chiotte" -> "1 toilet", not "puppy" (that's "chiot")
| SilasX wrote:
| Late to the party, but ... context? Everyone is talking like OVH
| or the SBG2 buildings are well-known and common knowledge.
| Symbiote wrote:
| https://letmegooglethat.com/?q=OVH
| walrus01 wrote:
| Reminder to not only have backups, but also have some periodic
| OFFLINE backups.
|
| If your primary is set up with credentials to automatically
| transfer a copy to the backup destination over the network, what
| happens if your primary gets pwned and the access is used to
| encrypt or delete the backup?
|
| Secondly, test doing restores of your backups, and have
| methods/procedures in place for exactly what a restore looks
| like.
| [deleted]
| qeternity wrote:
| > what happens if your primary gets pwned and the access is
| used to encrypt or delete the backup?
|
| Append-only permissions. We do this in S3 for that specific
| reason. S3 lifecycle rules take care of pruning old backups.
|
| You can also build a pull-based system where auth resides on
| the backup system, not the production system.
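A minimal sketch of what append-only credentials can look like as an AWS-style policy document; the bucket name is a placeholder, and the exact statements (ideally combined with bucket versioning so overwrites can't destroy old versions, plus the lifecycle rules mentioned above for pruning) depend on your setup:

```python
import json

BUCKET = "example-backup-bucket"  # placeholder name

# The backup writer may add objects but never delete them;
# deletion happens only via the bucket's lifecycle rules.
append_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowWriteAndList",
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        },
        {
            "Sid": "DenyDelete",
            "Effect": "Deny",
            "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

print(json.dumps(append_only_policy, indent=2))
```

With this shape of policy, a compromised production box can push new (possibly garbage) objects but cannot destroy the history already in the bucket.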
| dbrgn wrote:
| Wow, SBG3 seems to be OK: <<Update 11:20am: All servers in SBG3
| are okey. They are off, but not impacted. We create a plan how to
| restart them and connect to the network. no ETA. Now, we will
| verify SBG1.>>
|
| https://twitter.com/olesovhcom/status/1369592437585412097
| molszanski wrote:
| Is that a New Serverless Platform everyone is talking about
| recently?
| rctay89 wrote:
| Puzzles and puzzle storm is down on Lichess:
|
| > Due to a fire at one of our data centres, a few of our
| > servers are down and may be down permanently. We are
| > restoring these servers from backups and will enable
| > puzzles and storm as soon as possible.
| >
| > We hope that everyone who is dealing with the fire is
| > safe, including the firefighters and everyone at OVH.
| > <3
| gnulinux wrote:
| I think playing "from position" is also broken? I was playing
| chess with my friend and we usually play "from position" but it
| wasn't working just now so we're playing standard instead. It
| might be an unrelated bug.
| ericra wrote:
| I opened two tabs to relax tonight: Hacker News and Lichess.
| This was the top HN thread, and Lichess is having issues
| because of the fire.
|
| I didn't know what OVH was before 10 minutes ago, but this
| seems really impactful. I hope everyone there is safe and that
| the immediate disaster gets resolved quickly.
| sofixa wrote:
| Look them up, they're one of the biggest hosting providers in
| the world ( especially in France), and due to cheap prices
| are especially pop-up with smaller scale stuff.
| Macha wrote:
| Yeah, they scale down to their kimsufi line which used to
| have quite powerful dedicated servers for the price of
| basic VPSes from other providers.
|
| e.g. They have a 4core, 16gb ram server for $22/mo which is
| 25% of what my preferred provider, Linode, charges.
|
| Now, it comes with older consumer hardware (that one is a
| sandy bridge i5), and about as much support as the price
| tag suggests, as well as a dated management interface, but
| when I used to run a modded minecraft server as a college
| student, which needed excessive amounts of RAM and could be
| easily upset by load spikes on other clients, then it was a
| no-brainer, even if I would expect the modern-ish Xeons
| Linode uses to win on a core for core basis.
| ev1 wrote:
| Dated? They're probably the only place without $comedy
| "bare metal cloud" pricing that not only has an API for
| their $5/m servers, but also a panel that is a reasonably
| modern SPA implementing that API and using OAuth for
| login.
| Macha wrote:
| Has it been replaced since I last used it in ~2016? This
| is not the interface I had to use at all.
|
| This is the interface they had when I used them last:
|
| https://www.youtube.com/watch?v=h5-J_DO_FS0
| ev1 wrote:
| Yeah this is not at all what you use now. It's an Angular
| SPA.
|
| It is still a mess of separate accounts, but you can use
| email addr to log in instead of random-numbers generated
| handle.
|
| The OVHCloud US is completely separated for legal
| purposes, from what I remember. No account sharing,
| different staff, OVH EU cannot help you at all with US
| accounts.
|
| https://www.youtube.com/watch?v=I2G6TkKg0gQ
| littlestymaar wrote:
| Yes it changed. It was replaced by a more modern version
| a few years ago (but the transition was painful, as not
| everything was implemented in the new version when they
| started to deploy it).
| ev1 wrote:
| for me, at times you'd not bother trying to remember your
| NIC-handle and just curl the API instead out of laziness
| 112233 wrote:
| Photos and video from the scene:
|
| https://www.dna.fr/faits-divers-justice/2021/03/10/strasbour...
| betamaxthetape wrote:
| This is certainly a good reminder to have regular backups. I have
| (had?) a VPS in SBG1, the shipping-container-based data centre in
| the Strasbourg site, and the latest I know is that out of the 12
| containers, 8 are OK but 4 are destroyed [1]. Regardless, I
| imagine it will be weeks / probably months before even the OK
| servers can be brought back online.
|
| Naturally, I didn't do regular backups. Most of the data I have
| from a 2019 backup, but there's a metadata database that I don't
| have a copy of, and will need to reconstruct from scratch.
| Thankfully for my case reconstructing will be possible - I know
| that's not the case for everyone.
|
| Right now I'm feeling pretty stupid, but I only have myself to
| blame. For their part, OVH have been really good at keeping
| everyone updated (particularly the Founder and CEO, Octave
| Klaba).
|
| I believe that when I signed up back in 2017, the Strasbourg
| location was advertised as one of the cheapest, so I can imagine
| a lot of people with a ~$4 / month OVH VPS are in the same
| situation, desperately scrambling to find a backup.
|
| (For those that have a OVH VPS that's down right now, you can
| find what location it is in by logging onto the OVH control
| panel.)
|
| [1] https://twitter.com/olesovhcom/status/1369598998441558016
| [deleted]
| pwned1 wrote:
| To be honest, this gives me a little schadenfreude. OVH is the
| most notorious host that refuses to act on abuse complaints for
| phishing sites.
| pmontra wrote:
| Some pictures of the building with firefighters at work
|
| https://www.dna.fr/faits-divers-justice/2021/03/10/strasbour...
|
| Edit:
|
| Video at https://www.youtube.com/watch?v=a9jL_THG58U
|
| Satellite view of the site on Google Maps
| https://goo.gl/maps/L2T6YNFCtiyDdiNv7
| [deleted]
| ilkkao wrote:
| Easy to see now that a lightly constructed five-story
| cube might not be fully fireproof.
| Biganon wrote:
| How do you see that?
| mike_d wrote:
| Would you humor us with a link to a fully fire proof
| datacenter?
| dividedbyzero wrote:
| I think Microsoft had some experimental datacenter
| containers they submerged in the northern Atlantic for
| passive cooling, and I believe those were filled with an
| inert gas as well. I guess that would come very close to an
| actual fireproof datacenter.
| reasonabl_human wrote:
| Yep you're right, learning about this was part of some
| onboarding training we had to complete...
|
| it was an interesting proof of concept, but finding the
| right people to maintain the infra with both IT and Scuba
| skills was a narrow niche to nail down ;)
| potemkinhr wrote:
| I don't think opening it at any point before
| decommissioning it completely is even an afterthought
| with that. They just write off any failures and roll with
| it as long as it's viable.
| reasonabl_human wrote:
| Yeah that was one of the show-stopping issues, inability
| to repair / hotswap etc...
| namibj wrote:
| Well, you only need to get shared infrastructure reliable
| enough that you can afford to not design it with repair
| in mind. The cloud servers are already designed without
| unit-level maintenance work in mind, which saves money by
| eliminating rails and similar. They get populated racks
| from the factory and just cart them from the dock to
| their place, plug them in (maybe run some self-test to
| make sure there are no loose connectors or so), and
| eventually decommission them after some years.
| verytrivial wrote:
| From the outside (and from a position lacking all inside
| knowledge) it looks highly interconnected and very well
| ventilated. I'm not sure where you'd put an inert gas
| supression system or beefy firewalls to slow the fire
| progress.
| paco3346 wrote:
| Newly constructed datacenters in the US tend to be all
| metal with a full building clean suppression agent.
| https://www.fike.com/products/ecaro-25-clean-agent-fire-
| supp...
|
| I used to work for a provider whose 2 main datacenters of
| 8k+ sq ft could pull all oxygen out of the building in 60
| seconds.
| Twirrim wrote:
| Data centres I used to work in back in the early 2000s
| had argonite gas dumps in place (prior to argonite, halon
| used to be popular but is an ozone depleting gas so was
| phased out)
|
| In the case of a fire, it would dump a lot of argonite
| gas in, displacing a large amount of the oxygen in the
| room and depriving the fire of the oxygen it needs. It's
| also safe and
| leaves minimal clean-up work afterwards, doesn't harm
| electronics etc. unlike sprinklers and the like.
|
| The amount of oxygen left is sufficient for human life,
| but not for fires, though my understanding is that it can
| be quite unpleasant when it happens. You won't want to
| hang around.
| paco3346 wrote:
| One of ours had a giant red button you could hold to
| pause the 60 second timer before all the oxygen was
| displaced. Every single engineer was trained to
| immediately push that if people were in the room because
| it was supposedly a guaranteed death if you got stuck
| inside once the system went off.
| namibj wrote:
| Well, yeah, these normal inert gas fire suppression
| systems don't do a good job if humans can still breathe.
| The Novec 1230 based ones can actually be sufficiently
| effective for the typical flammability properties you can
| cheaply adhere to in a datacenter, but even then you
| would, iirc, want to add some extra oxygen along with it,
| because the nitrogen in the air is much more effective at
| suffocating humans than at suffocating fire. The stuff
| itself is just a really, really heavy gas that's liquid
| below about 49 C (though it boils easily), and the heat
| capacity of gases is roughly proportional to their
| density.
|
| Flames are extinguished by this cooling effect (identical
| to water in that regard), but humans rely on catalytic
| processes that aren't affected by the cooling effect.
|
| If you could keep the existing oxygen inside, while
| adding Novec 1230, humans could continue to breathe while
| the flames would still be extinguished, but this would
| require the building/room to be a pressure chamber that
| holds about half an atmosphere of extra pressure. I'm
| pretty sure just blowing in some extra oxygen with the
| Novec 1230 would be far cheaper to do safely and
| reliably.
|
| I mean, in principle, if you gave re-breathers to the
| workers and have some airlocks, you could afford to keep
| that atmosphere permanently, but it'd have to be a bit
| warm (~30 C I'd guess). Don't worry, the air would be
| breathable, but long-term it'd probably be unhealthy to
| breathe in such high concentrations and humans breathing
| would slightly pollute the atmosphere (CO2 can't stay if
| it's supposed to remain breathable).
|
| Just to be clear: in an effective argonite extinguishing
| system you'd have about a minute or two until you pass
| out and need to be dragged out, ideally get oxygen, get
| ventilated (no brain, no breathing) and potentially also
| be resuscitated (the heart stops shortly after your brain
| from a lack of oxygen, so if you're ventilated fast
| enough, it never stops and you wake up a few externally-
| forced breaths later). Having an oxygen bottle to
| supplement your breaths would fix that problem for as
| long as it's not empty.
| gpm wrote:
| > I mean, in principle, if you gave re-breathers to the
| workers and have some airlocks, you could afford to keep
| that atmosphere permanently,
|
| At this point I feel like it would be cheaper just to not
| have workers go there. Fill the place completely full of
| nitrogen with an onsite nitrogen generator (and only 1atm
| pressure). Have 100% of regular maintenance and as much
| irregular maintenance as possible be done by robots. If
| something happens that requires strength/dexterity beyond
| the robots (e.g. a heavy object falling over), either
| have humans go in in some form of scuba gear, or if you
| can work around it just don't fix it.
| namibj wrote:
| That seems reasonable. But just to clarify what I meant
| with airlock: some thick plastic bag with a floor-to-
| ceiling zipper on the "inner" and "outer" ends, and for
| entry, it's first collapsed by a pump sucking the air out
| of it. Then you open the zipper on the outer end, step
| in, close the zipper, let the pump suck away the air
| around you, and open the inner zipper (they should
| probably be automatically operated, as you can't move
| well/much when you are "vacuum bagged").
|
| For exit, basically just the reverse, with the pump
| pumping the air around the person to wherever the person
| came from.
|
| The general issue with unbreathable atmospheres is that a
| failure in their SCBA gear easily kills them.
|
| And re-breathers that are only there so you don't have to
| scrub the atmosphere in the room as often shouldn't be
| particularly expensive. You may even get away with just
| putting a CO2 scrubber on the exhaust path, and giving
| them slightly oxygen-enriched bottled air so you can keep
| e.g. a 50:50 oxygen:nitrogen ratio inside (so e.g. 20%
| O2, 20% N2, 60% Novec 1230). And it doesn't even need to
| be particularly effective, as you can breathe in quite a
| bit of the ambient air without being harmed, and the
| environment can tolerate some of your CO2. Like, as long
| as it scrubs half of your exhausted CO2 it won't even
| feel stuffy in there (you could handle the ambient air
| you'd have without re-breathers being used, as it'd be
| just 1.6% CO2, but you'd almost immediately get a
| headache).
|
| They'd have an exhaust vent for pressure equalization,
| which would chill the air to condense and re-cycle the
| Novec 1230. For pressure equalization in the other
| direction, they'd probably just boil off some of that
| recycled Novec 1230.
|
| So yeah, re-breather not needed, if you just get a
| mouth+nose mask to breathe bottled 50:50 oxygen:nitrogen
| mix. That 50% oxygen limit (actually 500 mBar) is due to
| long-term toxicity, btw. Prolonged exposure to higher
| levels causes lung scarring and myopia/retina detachment,
| so not really fun.
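| (Editor's aside: a quick check of the partial-pressure claim
| above, using standard constants rather than anything from the
| thread. 50% O2 at one atmosphere of total pressure sits right
| at the ~500 mbar long-term limit the comment mentions.)

```python
# Editor's illustration: O2 partial pressure of a 50:50 O2:N2 mix
# at 1 atm, versus the ~500 mbar long-term oxygen toxicity limit
# mentioned in the comment above.

ATM_MBAR = 1013.25            # 1 standard atmosphere, in millibar
o2_fraction = 0.50            # 50% oxygen by volume
o2_partial_mbar = o2_fraction * ATM_MBAR

print(f"O2 partial pressure: {o2_partial_mbar:.0f} mbar")  # ~507 mbar
```

| So a 50:50 mix at sea-level pressure lands essentially at the
| stated limit, which is presumably why the comment picks 50%.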
| pulkitanand wrote:
| Cloud going up in smoke.
| userbinator wrote:
| That is a disturbingly large amount of flame. I was expecting a
| datacenter to not have much in the way of flammable material.
|
| ...then I read in some other comments here that they used
| _wood_ in the interior construction.
| Symbiote wrote:
| Wood is a reasonable construction material.
|
| It takes a good while to start burning, and even when
| significantly charred it still retains most of its strength.
| owenmarshall wrote:
| Wood is a reasonable construction material for my house, or
| an office - but is it for a building with that much energy
| kept that close together?
| joshuamorton wrote:
| Treated lumber is generally considered to be fairly
| fire-resistant (comparable to steel or concrete, though
| with different precise failure modes). It depends on
| exactly what kind of wood is being used. A treated 12x12
| beam is very fire resistant, plywood is less so.
|
| The issue is you'll have lots of plastic (cabling) in a
| DC, and plastic will burn
| namibj wrote:
| There is self-extinguishing cable insulation, though. I'm
| actually surprised this (DC flammability) is still an
| issue, and not already solved by making components self-
| extinguishing and banning non-tiny batteries inside the
| equipment. If you want to have a battery for your raid
| controller, put something next to it that will stop your
| system from burning down its surroundings.
| account42 wrote:
| Your Google Maps link seems to be off by a bit.
| Gengis wrote:
| Hi,
|
| Does anyone recommend a mail provider that implements IMAP
| failover and replication across different sites?
|
| My mail account is hosted by OVH, it has been down for hours and
| from what I read I may have to wait for another 1 or 2 days.
|
| Thanks!
| rgj wrote:
| Video here. This will be Disaster Recovery 101 material.
|
| https://mobile.twitter.com/abonin_DNA/status/136953802824345...
| Rapzid wrote:
| Or disaster prevention. I'm curious how their fire suppression
| system failed so spectacularly..
| rgj wrote:
| I think they failed to install it.
|
| They only mention smoke detection:
| https://www.ovh.com/world/us/about-us/datacenters.xml
| lorenzhs wrote:
| In French but with pictures: https://lafibre.info/ovh-
| datacenter/ovh-et-la-protection-inc... - probably a
| different OVH data centre, but they clearly have sprinklers
| there. The argument they made against gas extinguishers is
| that by the time they have to use it, the data is gone anyway,
| and it's only going to trigger in the affected areas. It's
| also far safer for the people working there.
| Rapzid wrote:
| That's a very interesting find! Google Translate seemed to
| do a good enough job on it for me. 8 years ago and there
| are people in that thread ripping on what they see in the
| photos.
| raverbashing wrote:
| Ouch, that's not pretty... but it seems that the fire was
| constrained to 1 or 2 sectors (inside the building) - per their
| updates
|
| Not sure how good the fire suppression systems of the
| building were.
| baggy_trough wrote:
| Looks like a lot of those containers/compartments are melted.
| worldofmatthew wrote:
| https://twitter.com/olesovhcom/status/1369504527544705025
|
| "Update 5:20pm. Everybody is safe. Fire destroyed SBG2. A part of
| SBG1 is destroyed. Firefighters are protecting SBG3. no impact
| SBG4."
| [deleted]
| akamaka wrote:
| For some context:
|
| "SBG1, the first Strasbourg data center, consisting of twelve
| containers, came online in 2012. The 12 containers had a
| capacity of 12,000 servers.
|
| SBG2 is a non-container data center built in 2016, using its "Tower"
| design with a capacity of 30,000 servers.
|
| SBG3 tower was built in 2017 with a capacity of 30,000 servers.
|
| SBG4 was built in 2013 as several containers to augment
| capacity, but was decommissioned in 2018 and moved to SBG3"
|
| https://baxtel.com/data-center/ovh-strasbourg-campus
| Scoundreller wrote:
| and a 2017 (!) article where they were planning to remove 1
| and 2 because of power issues!
|
| > OVH to Disassemble Container Data Centers after Epic Outage
| in Europe
|
| > "This is probably the worst-case scenario that could have
| happened to us."
|
| > OVH [...] is planning to shut down and disassemble two of
| the three data centers on its campus in Strasbourg, France,
| following a power outage that brought down the entire campus
| Friday, causing prolonged disruption to customer applications
| that lasted throughout the day and well into the evening.
|
| https://www.datacenterknowledge.com/uptime/ovh-
| disassemble-c...
| Macha wrote:
| It sounds like the plan was to shutdown 1 and 4, the latter
| of which happened and the former which did not.
| Scoundreller wrote:
| It's confusing as they say they're shutting down 2 of the
| 3, but then there's 4...
| AdamJacobMuller wrote:
| > Fire destroyed SBG2
|
| This is crazy.
|
| SBG2 was HUGE and if this isn't a translation error on the part
| of Octave (which I could understand given the stress and ESL) I
| have a hard time fathoming what kind of fire could destroy a
| whole facility with nearly 1000 racks of equipment spread out
| across separated halls.
|
| I'm really hoping "destroyed" here means "we lost all power and
| network core and there's smoke/fire/physical damage to SOME of
| that"
|
| I can't even fathom a worst-case scenario of a transformer
| explosion (which does occur and I've seen the aftermath of)
| having this big of an impact. Datacenters are built to contain
| and mitigate these kinds of issues. Fire breaks, dry-pipe
| sprinkler systems and fire-extinguishing gas systems are all
| designed to prevent a fire from becoming large-scale.
|
| Really glad nobody was hurt. OVH is gonna have a bad time
| cleaning all this up.
| vinay_ys wrote:
| I wonder if they had batteries inside the containers. That
| can make a bad situation really worse.
| ikiris wrote:
| This was effectively posted on the outages list as well by
| someone trustworthy. The pictures also look pretty bad from
| the outside.
| exikyut wrote:
| Which outages list? Sounds interesting.
| AnssiH wrote:
| I presume they mean this:
| https://puck.nether.net/mailman/listinfo/outages
| exikyut wrote:
| Ah, thanks!
| sofixa wrote:
| OVH design their own datacenters, so it's possible that they
| missed something or some system or another didn't work as
| intended, thus the heavy damage.
| rgj wrote:
| They did not have a fire suppression system, only smoke
| detection. So yeah, they missed something.
| lorenzhs wrote:
| There are photos of the fire suppression system in one of
| their data centres in this (French) forum thread:
| https://lafibre.info/ovh-datacenter/ovh-et-la-protection-
| inc.... They have sprinklers, with the reasoning being
| that the burning racks' data is gone anyway if there's a
| fire, and at least sprinklers don't accidentally kill the
| technicians.
| 35fbe7d3d5b9 wrote:
| > They have sprinklers, with the reasoning being that the
| burning racks' data is gone anyway if there's a fire
|
| I think the real problem, per that post, is this:
|
| >> They are simple sprinklers that spray with water. It
| has nothing to do with very high pressure misting
| systems, where water evaporates instantly, and which save
| servers. Here, it's watering, and all the servers are
| dead. It's designed like that. Astonishing, isn't it?
|
| >> Obviously, they rely above all on humans to extinguish
| a possible fire, unlike all conventional data centers.
|
| (all thanks to Google Translate)
|
| This strikes me as a _terrible safety system_ because
| even if a human managed to detect the fire, they have to
| make a big call: is the risk of flooding the facility and
| destroying a ton of gear worth putting out a fire? By the
| time the human decides "yes, it is", it may well be too
| late for the sprinklers.
|
| > and at least sprinklers don't accidentally kill the
| technicians.
|
| Not a real risk with modern 1:1 argon:nitrogen systems -
| the goal is to pump in inert gases and reduce oxygen
| content to around 13%, a point where the fire is
| suppressed and people can survive. You wouldn't _want_ to
| be in a room breathing 13% oxygen for a long time, but it
| won't kill you.
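| (Editor's aside: a back-of-the-envelope check on the 13%
| figure. Diluting normal air from ~20.9% O2 down to 13% means
| the injected inert gas has to make up roughly 38% of the
| room's atmosphere, which is why these systems dump such large
| volumes of gas.)

```python
# Editor's illustration: what fraction of the room's atmosphere
# must the injected inert gas occupy to dilute oxygen from 20.9%
# down to the ~13% suppression target mentioned above?
# Assumes ideal, uniform mixing at constant pressure.

O2_AIR = 0.209     # O2 fraction in normal air
O2_TARGET = 0.13   # typical inert-gas suppression target

# With inert fraction x added, the remaining air fraction is
# (1 - x), so the O2 fraction becomes O2_AIR * (1 - x).
x = 1 - O2_TARGET / O2_AIR

print(f"inert gas fraction needed: {x:.1%}")  # ~37.8%
```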
|
| All in all, it looks like this was a "normal accident"[1]
| for a hosting company that aggressively competes on
| price. The data center was therefore built with less
| expensive safeties and carried a higher risk of
| catastrophic failure.
|
| [1]: https://en.wikipedia.org/wiki/Normal_Accidents
| Faaak wrote:
| Sprinklers are only present in the OVH Canada datacenter!
| There are no sprinklers in the European ones.
| mwcampbell wrote:
| Given that they're headquartered in Europe and most
| popular there, why is the satellite location better? Is
| it because the Canadian data center is newer, because
| Canada has stronger regulations in this area, or
| something else? Also, does anyone know how the OVH US
| data centers compare?
| sofixa wrote:
| According to a 2013 forum post by the now CEO of
| Scaleway, a competitor of OVH, it's due to North American
| building regulations that basically force you into
| sprinklers and stuff for insurance reasons.
|
| Source in French: https://lafibre.info/ovh-
| datacenter/ovh-et-la-protection-inc...
| kuschku wrote:
| Which is understandable, as Halon based fire suppression
| systems have been illegal for quite some time :/
| weeeeelp wrote:
| There are non-Halon systems, using agents such as FM-200.
| Those are not as toxic and do not destroy the ozone
| layer.
| iagovar wrote:
| Where did you get this?
| rgj wrote:
| https://www.ovh.com/world/us/about-us/datacenters.xml
| draugadrotten wrote:
| They say different here: "The rooms are also equipped
| with state of the art fire detection and extinction
| systems."
|
| https://www.ovh.com/world/solutions/centres-de-
| donnees.xml
|
| and here
|
| "Every data center room is fitted with a fire detection
| and extinction system, as well as fire doors. OVHcloud
| complies with the APSAD R4 rule for the installation of
| mobile and portable fire extinguishers and has the N4
| certificate of conformity for all our data centers."
| https://us.ovhcloud.com/about/company/security
| sofixa wrote:
| That's for its North America DCs, not European ones.
| AdamJacobMuller wrote:
| > They did not have a fire suppression system
|
| I find it very hard to believe that that would pass code
| anywhere in the US/EU or most of the world. They may not
| have had sprinklers but that doesn't mean there isn't
| fire suppression.
| sofixa wrote:
| According to a forum post discussing the building of said
| DC, with photos and such, there's no visible fire
| suppression system:
|
| https://lafibre.info/ovh-datacenter/ovh-et-la-protection-
| inc...
| rgj wrote:
| No, "destroyed" looks like this: https://mobile.twitter.com/a
| bonin_DNA/status/136953802824345...
|
| EDIT a picture from earlier this night https://pbs.twimg.com/
| media/EwGqV17XMAMF_wa?format=jpg&name=...
| baq wrote:
| I'd say destroyed is an adequate word, yes.
| AdamJacobMuller wrote:
| I'm really shocked, that's an incredible amount of damage.
| Lots of people lost data.
| baggy_trough wrote:
| The pictures seem pretty clear that a lot of it is gone,
| judging by the blackened holes in the walls with firehoses
| being sprayed into.
| hedora wrote:
| Data centers frequently burn down, or are destroyed by
| natural disaster.
|
| These days, fire suppression systems need to be non-lethal,
| so inert gasses are out. Water is too, for obvious reasons.
| Last I checked, they flooded the inside of the DC with
| airborne powder that coats everything (especially the inside
| of anything with a running fan). Once that deploys, the
| machines in the data center are a write off even if the fire
| was minor.
| nixgeek wrote:
| Most datacenters I've worked with in the last 5 years use
| water.
| qbasic_forever wrote:
| Just guessing, but maybe a fire suppression system going off
| could wipe out all the machines?
|
| The couple datacenters I've been inside were small, old and
| used halon gas which wasn't supposed to destroy the machines.
| No idea how it works in big places these days.
| atkbrah wrote:
| A few years back there was an incident in Sweden where
| noise coming from the gas based fire suppression system
| going off destroyed hard drives [1].
|
| 1. https://www.theregister.com/2018/04/26/decibels_destroy_
| disk...
| yread wrote:
| We had the same issue at a customer site. To add: since
| the noise level was outside the rated environment, the
| warranty on the hard disks was void and they had to be
| replaced even if they were not destroyed.
| abrookewood wrote:
| I've also seen a weird video (lost to time
| unfortunately), where someone showed that they could yell
| at their servers in a data centre and introduce errors
| (or something similar). Was very strange to see, but they
| had a console up clearly showing that it was having an
| impact.
| regecks wrote:
| https://www.youtube.com/watch?v=tDacjrSCeq4&fmt=18
| lrem wrote:
| Because of course it was Bryan :D
| jeffrallen wrote:
| Brian just giving his opinion on anything would give hard
| drives errors... That guy loves to run his mouth (and I
| love to listen).
| mwcampbell wrote:
| Actually, although the video is on Bryan Cantrill's
| YouTube account, that was Brendan Gregg.
| bcantrill wrote:
| I was merely the videographer! For anyone who is curious,
| I discussed the backstory of that video with Ben
| Sigelman.[0]
|
| [0] https://www.youtube.com/watch?v=_IYzD_NR0W4#t=28m46s
| atkbrah wrote:
| I actually planned to include a link to this video in my
| original response but then left it out. Thanks for
| posting!
| gcbirzan wrote:
| ING in Romania had the same issue:
| https://www.datacenterdynamics.com/en/news/noise-from-
| fire-d...
| dylan604 wrote:
| If a fire suppression system kicks in or the fire
| department shows up with their hoses, would they still say
| the fire destroyed it, or destroyed due to fire and water
| damage?
|
| Also, fire suppression systems do fail. There was an infamous
| incident in LA for one of the studios. They built a warehouse
| to be a tape vault with tapes going back to the 80s. A fire
| started, but the suppression system failed because there was
| not enough pressure in the system. Total loss. Got to keep
| your safety equipment properly tested!
| lrem wrote:
| There's a problem with testing sprinklers: engaging them
| can be damaging to contents and even structures. So, we're
| talking about completely emptying the facility, then taking
| it offline to dry for a time. I've never heard about this
| being done to anything that was already operational (but I
| wasn't researching this either).
| dylan604 wrote:
| There are methods of testing the water pressure in the
| pipes without actually engaging the sprinkler heads. It
| is part of the normal checks done during the
| maintenance/inspection a business is supposed to have
| done. In fact, one place I was in had sensors, and would
| sound the actual fire alarm if the pressure fell below
| tolerance at any time. The lack of pressure in the
| Universal vault was 100% inexcusable.
| ygra wrote:
| Isn't something like Halon used in data centers for that
| reason? That can probably be tested without damaging
| infrastructure.
| weeeeelp wrote:
| Halon has been pretty much banned for years now; other
| agents have been introduced. Sadly, making an actual full test
| of a gaseous extinguishing system (such as FM200 or Novec
| 1230) can be prohibitively expensive (mainly costs of the
| "reloading" of the system with new gas). Those are just
| mostly tested for correct pressure in tanks and if the
| detection electronics are working fine, making an actual
| dump would be very impractical (evacuation of personnel,
| ventilating the room afterwards etc.)
| proggy wrote:
| It wasn't just "tapes going back to the 80s." Those were
| just the media Universal initially admitted to losing. No,
| that building was the mother lode. It had film archives
| going back over 50 years, and worst of all -- unreleased
| audio tape masters for thousands upon thousands of
| recording artists. The amount of "remastered" album
| potential that fire destroyed is probably in the billions
| of dollars, let alone the historical loss of all those
| recordings by historical persons that will never be heard
| due to a failure of a fire prevention system. Fascinating
| case study in why you should never put all your eggs in one
| basket.
|
| https://en.m.wikipedia.org/wiki/2008_Universal_Studios_fire
| AdamJacobMuller wrote:
| This is why I'm such a fan of digitizing. If you have 1
| film master you effectively have none. Do some 8K or 16K
| scans of that master and effectively manage the resulting
| data and you're effectively 100% immune from future loss
| in perpetuity.
|
| Losing parts of history like that is tragic.
| arp242 wrote:
| In 1996 Surinam lost many of their government archives
| after a fire burned the building down.
|
| I can find surprisingly few English-language resources
| for this (only Dutch ones); guess it's a combination of a
| small country + before the internet really took off.
| jen20 wrote:
| An easy mistake (in the quoted tweet) to make given the
| circumstances, but it probably should be 5:20am rather than pm.
| rsync wrote:
| This is a literal nightmare for me.
|
| I can remember several San Diego fires that threatened the
| original JohnCompanies datacenter[1] circa mid 2000s and thinking
| about all of the assets and invested time and care that went into
| every rack in the facility.
|
| Very interested to read the post-mortem here ... even more
| interested in any actionable takeaways from what is a very rare
| event ...
|
| [1] Castle Access, as it was known, at the intersection of Aero
| Dr. and the 15 ... was later bought by Redit, then Kio ...
| [deleted]
| sebmellen wrote:
| Man, 2006 was a freaky time to be in San Diego.
| MrQuimico wrote:
| I have a VPS in SBG1.
|
| So far I haven't got any communication from OVH alerting me about
| this. I think that's the first thing they should do, alerting
| their customers that something is happening.
|
| Anyway I was running a service that I was about to close, so this
| may be it. I do have a recovery plan, but I don't know if it is
| worth it at this point.
|
| I'm never using OVH again. A fire can happen, but before you
| ask me about my recovery plan: what about yours?
| Symbiote wrote:
| There's a big link from their homepage:
| https://www.ovh.ie/news/press/cpl1786.fire-our-strasbourg-si...
|
| It says they've alerted customers, but I expect some have been
| missed, through inaccurate email records, email hosted on the
| systems that have been destroyed etc.
| jjjeii3 wrote:
| White collar jobs are not paid well in the EU. If a saleswoman
| from a nearby shop has the same salary, why should I care about
| the quality of my work at all? The entire data center burned
| down? Fine! There are a lot of other places with the same tiny
| salary.
|
| As you can see below, a developer expert is only making 24K-63K
| Euro per year at OVH (in US dollar it's almost the same amount):
|
| https://www.glassdoor.com/Salary/OVHcloud-France-Salaries-EI...
|
| After paying taxes you will only get half of that amount.
| ogaj wrote:
| Wow. Weren't they just angling for an IPO? I wonder how much of
| this was insured and what the impact is to their overall
| operations.
| moooo99 wrote:
| I think they announced their IPO plans yesterday [1], which is
| probably the worst timing one can have (if there even is a good
| time for a datacenter to burn down, which there probably isn't).
|
| If they have good insurance, I'm confident this will have
| little impact on their operations, I really hope they do. I
| host a few components on OVH/SoYouStart dedicated servers,
| luckily not mission critical, but still had rather good
| experience with them, especially in terms of price to
| performance.
|
| [1] https://www.reuters.com/article/amp/idUSKBN2B01FM
| toyg wrote:
| The publicity damage alone will be on par with (if not
| bigger than) their replacement costs. I wouldn't be
| surprised if they had to rebrand.
|
| The timing is so bad that it becomes almost suspicious. When
| is the best time to sabotage your competitor? When they are
| the most visible.
| brmgb wrote:
| > The publicity damage alone will be on par with (if not
| bigger) their replacement costs. I wouldn't be surprised if
| they had to rebrand.
|
| Honest question: which publicity damage?
|
| A fire in a datacenter is very much part of the things you
| should expect to see happen when you operate a large number
| of datacenters and will obviously cause some disruption to
| your customers hosting physical servers there.
|
| Provided the disruption doesn't significantly extend to
| their cloud customers and doesn't affect people paying for
| guaranteed availability (which it shouldn't - OVH operates
| datacenters throughout the world), this seems to me to be
| an unfortunate incident but not a business threatening one.
| marcinzm wrote:
| Most people I feel would expect fire suppression to kick
| in and prevent the whole data center (and the adjacent
| ones) from catching on fire. The fact that it didn't is
| concerning regarding their operations since they build
| their own custom data centers. The fire isn't the issue,
| how much damage it did is the issue. So one can ask
| whether there was a systematic set of planning mistakes
| of which this is just the first to surface.
| karambahh wrote:
| SBG is a small part of their operations. I doubt it will
| have a lasting impact on their image.
| broknbottle wrote:
| It's a Thermal Event, not a fire
| Diederich wrote:
| Back in the late 90s, I implemented the first systematic
| monitoring of WalMart Store's global network, including all of
| the store routers, hubs (not switches yet!) and 900mhz access
| points. Did you know that WalMart had some stores in Indonesia?
| They did until 1998.
|
| So when https://en.wikipedia.org/wiki/May_1998_riots_of_Indonesia
| started happening, we heard some harrowing stories of US
| employees being abducted, among other things.
|
| Around that same time, the equipment in the Jakarta store started
| sending high temperature alerts prior to going offline. Our NOC
| wasn't able to reach anyone in the store.
|
| The alerts were quite accurate: that was one of the many
| buildings that had been burned down in the riots. I guess it's
| slightly surprising that electrical power to the equipment in
| question lasted long enough to allow temperature alerting. Most
| of our stores back then used satellite for their permanent
| network connection, so it's possible telcom died prior to the
| fire reaching the UPC office.
|
| In a couple of prominent places in the home office, there were
| large cutouts of all of the countries WalMart was in at the time
| up on the walls. A couple of weeks after this event, the
| Indonesia one was taken down over the weekend and the others re-
| arranged.
| avereveard wrote:
| ovh monitoring is even better, everything is green on the
| destroyed dc http://status.ovh.com/vms/index_sbg2.html
| 1MachineElf wrote:
| Thanks for sharing this interesting story. Part of my family
| immigrated from Indonesia due to those riots, but I was unaware
| up until today of the details covered by the Wikipedia article
| you linked.
|
| I remember during the 2000s and 2010s that WalMart in the USA
| earned a reputation for its inventories primarily consisting
| of Chinese-made goods. I'm not sure if that reputation goes all
| the way back to 1998, but it makes me wonder if WalMart was
| especially targeted by the anti-Chinese element of the
| Indonesian riots because of it.
| Diederich wrote:
| I can't recall (and probably didn't know at the time...it was
| far from my area) where products were sourced for the
| Indonesia stores.
|
| Prior to the early 2000s, WalMart had a strong 'buy American'
| push. It was even in their advertising at the time, and
| literally written on the walls at the home office in
| Bentonville.
|
| Realities changed, though, as whole classes of products were
| more frequently simply not available from the United States,
| and that policy and advertising approach were quietly
| dropped.
|
| Just for the hell of it, I did a quick youtube search:
| "walmart buy american advertisement" and this came up:
| https://www.youtube.com/watch?v=XG-GqDeLfI4 "Buy American -
| Walmart Ad". Description says it's from the 1980s, and that
| looks about right.
| Diederich wrote:
| What the hell, here's another story. The summary to catch
| your attention: in the early 2000s, I first became aware of
| WalMart's full scale switch to product sourcing from China by
| noting some very unusual automated network to site mappings.
|
| Part of what my team (Network Management) did was write code
| and tools to automate all of the various things that needed
| to be done with networking gear. A big piece of that was
| automatically discovering the network. Prior to our auto
| discovery work, there was no good data source for or
| inventory of the routers, hubs, switches, cache engines,
| access points, load balancers, VOIP controllers...you name
| it.
|
| On the surface, it seems scandalous that we didn't know what
| was on our own network, but in reality, short of
| comprehensive and accurate auto discovery, there was no way
| to keep track of everything, for a number of reasons.
|
| First was the staggering scope: when I left the team, there
| were 180,000 network devices handling the traffic for tens of
| millions of end nodes across nearly 5,000 stores, hundreds of
| distribution centers and hundreds of home office
| sites/buildings in well over a dozen countries. The main US
| Home Office in Bentonville, Arkansas was responsible for
| managing all of this gear, even as many of the international
| home offices were responsible for buying and scheduling the
| installation of the same gear.
|
| At any given time, there were a dozen store network equipment
| rollouts ongoing, where a 'rollout' is having people visit
| some large percentage of stores intending to make some kind
| of physical change: installing new hardware, removing old
| equipment, adding cards to existing gear, etc.
|
| If store 1234 in Lexington, Kentucky (I remember because it
| was my favorite unofficial 'test' store :) was to get some
| new switches installed, we would probably not know what day
| or time the tech to do the work was going to arrive.
|
| ANYway...all that adds up to thousands of people coming in
| and messing with our physical network, at all hours of the
| day and night, all over the world, constantly.
|
| Robust and automated discovery of the network was a must, and
| my team implemented that. The raw network discovery tool was
| called Drake, named after this guy:
| https://en.wikipedia.org/wiki/Francis_Drake and the tool that
| used many automatic and manual rules and heuristics to map
| the discovered networking devices to logical sites (ie, Store
| 1234, US) was called Atlas, named after this guy:
| https://en.wikipedia.org/wiki/Atlas_(mythology)
|
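The Atlas half of that pipeline - mapping discovered devices to logical sites via rules and heuristics - can be sketched roughly like this. The hostname formats and rules below are invented for illustration; the actual conventions Atlas matched against aren't described in the comment:

```python
import re

# Hypothetical naming conventions -- the real formats Atlas matched
# against are not described in the comment above.
SITE_RULES = [
    # e.g. "sw-store-1234-us-01" -> ("Store 1234", "US")
    (re.compile(r"^\w+-store-(\d+)-([a-z]{2})-\d+$"),
     lambda m: (f"Store {m.group(1)}", m.group(2).upper())),
    # e.g. "rtr-dc-042-mx-02" -> ("Distribution Center 042", "MX")
    (re.compile(r"^\w+-dc-(\d+)-([a-z]{2})-\d+$"),
     lambda m: (f"Distribution Center {m.group(1)}", m.group(2).upper())),
]

def map_device_to_site(hostname: str):
    """Map a discovered device hostname to a logical site tuple, or
    None if no rule matches (flag those for manual review)."""
    for pattern, build in SITE_RULES:
        m = pattern.match(hostname)
        if m:
            return build(m)
    return None
```

A "leak" like the one described later in the comment would show up here as one rule (or one site) matching far more devices than its site type should ever have.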
| All of that background aside, the interesting story.
|
| In the late 90s and early 2000s, Drake and Atlas were doing
| their thing, generally quite well and with only a fairly
| small amount of care and feeding required. I was snooping
| around and noticed that a particular site of type
| International Home Office had grown enormously over the
| course of a few years. When I looked, it had hundreds of
| network devices and tens of thousands of nodes. This was
| around 2001 or 2002, and at that time, I knew that only US
| Home Office sites should have that many devices, and thought
| it likely that Atlas had a 'leak'. That is, as Atlas did its
| recursive site mapping work, sometimes the recursion would
| expand much further than it should, and incorrectly map
| things.
|
| After looking at the data, it all seemed fine. So I made some
| inquiries, and lo and behold, that particular international
| home office site had indeed been growing explosively.
|
| The site's mapped name was completely unfamiliar to me, at
| the time at least. You might have heard of it:
| https://en.wikipedia.org/wiki/Shenzhen
|
| I was seeing fingerprints in our network of WalMart's
| wholesale switch to sourcing from China.
| hobojones wrote:
| In the early 2000s I was working as a field engineer
| installing/replacing/fixing network equipment for Walmart
| at all hours. It's pretty neat to hear the other side of
| the process! If I remember correctly there was some policy
| that would automatically turn off switch ports that found
| new, unrecognized devices active on the network for an
| extended period of time, which meant store managers
| complaining to me about voip phones that didn't function
| when moved or replaced.
| Diederich wrote:
| Ah neat, so you were an NCR tech! (I peeked at your
| comment history a bit.) My team and broader department
| spent a lot of hours working with, sometimes not in the
| most friendly terms, people at different levels in the
| NCR organization.
|
| You're correct, if Drake (the always running discovery
| engine) didn't detect a device on a given port over a
| long enough time, then another program would shut that
| port down. This was nominally done for PCI compliance,
| but of course having open, unused ports, especially in the
| field, is just a terrible security hole in general.
|
| In order to support legit equipment moves, we created a
| number of tools that the NOC and I believe Field Support
| could use to re-open ports as needed. I _think_ we
| eventually made something that authorized in-store people
| could use too.
|
| As an aside, a port being operationally 'up' wasn't by
| itself sufficient for us to mark the port as being
| legitimately used. We had to see traffic coming from it
| as well.
|
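The policy described above - a port only counts as in use if it is both link-up and passing traffic, and ports idle too long get shut - might look something like this minimal sketch (the 30-day cutoff and the data layout are assumptions, not WalMart's actual values):

```python
import time

IDLE_CUTOFF = 30 * 24 * 3600  # assumed: shut ports idle for 30 days

def ports_to_shut(ports, now=None):
    """Pick switch ports to disable. A port must be both
    operationally up AND recently passing traffic to count as in
    use; link state alone is not enough.

    `ports` maps port name -> dict with:
      oper_up      - operational link state (bool)
      last_traffic - epoch seconds of last nonzero traffic counters
    """
    now = time.time() if now is None else now
    shut = []
    for name, p in ports.items():
        active = p["oper_up"] and (now - p["last_traffic"]) < IDLE_CUTOFF
        if not active:
            shut.append(name)
    return sorted(shut)
```

To support legitimate equipment moves, a real system would pair this with a tool for the NOC to re-enable ports on demand, as the comment describes.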
| You mentioned elsewhere that you're working with a big,
| legacy Perl application, porting it to Python. 99% of the
| software my team at WalMart built was in Perl. (: I'd be
| curious to know, if you can share, what company/product
| you were working on.
| 1MachineElf wrote:
| Epic story! Thank you for sharing it. I appreciate the
| detail you included there.
| Diederich wrote:
| You're quite welcome. In my experience, the right details
| often make a story far more interesting.
| [deleted]
| jeffrallen wrote:
| Hugops to OVH folks, hang in there, you're my favorite European
| data center operator.
| lovedswain wrote:
| He posted an update, seems SBG2 is totally destroyed.
|
| Ouch https://twitter.com/Onepamopa/status/1369484420982407173
| meepmorp wrote:
| > DID ANYTHING SURVIVE? ANYTHING AT ALL?????? OUR DATA IS IN
| SBG2. WHAT DO WE DO NOW ?!?!
|
| Double check those backups, folks.
| mhh__ wrote:
| One of the things I'm still extremely grateful for is that I
| learnt the basics of computer science from an ex-oracle guy
| turned secondary school teacher, who wasn't the best
| programmer, let's say, but who absolutely drilled into us (I
| was probably the only one listening but still) the importance
| of code quality, backups, information security etc.
|
| Nothing fancy, but it's the kind of bread and butter
| intuition you need to avoid walking straight off a cliff.
|
| He also let me sit at the back writing a compiler instead of
| learning VB.Net, top dude
| ikiris wrote:
| what do we do? Well for starters don't rely on a single
| geographic location _chuckles_
| monkeybutton wrote:
| Trust but verify. As a developer it doesn't matter what
| sysadmins or anyone else says about backups of your data; if
| you haven't run your DR plan and verified the results, it
| doesn't exist.
| mhh__ wrote:
| > Trust but verify.
|
| If it's business critical should you even trust at all?
| casi wrote:
| yeah backups are a case of "don't trust, verify"
| londons_explore wrote:
| Did OVH claim that SBG1 and SBG2 were isolated failure domains?
| Despite them just being different rooms in the same building?
| ev1 wrote:
| They are different physical buildings. OVH does not generally
| claim anything regarding AZs or distance. SBG1, 2, 3, etc.
| just denote the building your server is in - they are not
| AWS-style AZs or similar, quite literally just building
| addresses.
|
| I have used them for years and I don't believe they've ever
| said anything like deploy in both SBG1 and SBG2 for safety or
| availability, because you don't get that choice.
|
| When you provision a machine (e.g. via API) they tell you "SBG 5
| min, LON 5 min, BHS 72h" and you pick SBG and get assigned
| first-available. There is no "I want to be in SBG4" generally.
| kuschku wrote:
| In fact, OVH themselves host e.g. backups of SBG in RBX.
| kuschku wrote:
| They're separate buildings, with separate power systems, just
| standing next to one another. Next to SBG2 are also SBG3 and
| SBG4.
| londons_explore wrote:
| Those buildings weren't quite as far apart as they should
| have been if a fire in one requires all of the others to be
| turned off...
| Leo_Verto wrote:
| Not disagreeing with you but the firefighters probably shut
| down all power onsite as soon as they arrived.
| jiofih wrote:
| Apparently they are more like (temporary) extensions of the
| main building than separate DCs.
| rarefied_tomato wrote:
| Was the building made from stacked shipping containers?
| Containers are such a budget-friendly and trendy structural
| building block these days. They even click with the software
| engineers - "Hey, it's like Docker".
|
| Containers would seem to be at a disadvantage when it comes to
| dissipating, rather than containing, heat. I hope improved
| thermal management and fire suppression designs can be
| implemented.
| qeternity wrote:
| OVH use a custom water cooling solution they claim enables
| these niche designs and increases rack density.
|
| As another comment says, that density probably exacerbated the
| issue here.
| AnssiH wrote:
| I could be wrong, but I understood from other comments that
| only SBG1 used containers.
| bdz wrote:
| Rust (the video game) lost all EU server data w/o restore
| https://twitter.com/playrust/status/1369611688539009025
| chrisandchris wrote:
| Someone forgot the 3-2-1 rule for backups.
|
| I don't get why people don't do offsite backup today, as it's
| basically free. AWS Glacier costs next to nothing.
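For reference, the 3-2-1 rule is: at least 3 copies of the data, on at least 2 different media, with at least 1 offsite. A toy compliance check, just to make the rule concrete (the inventory format is invented for illustration):

```python
def satisfies_3_2_1(copies):
    """Check a backup inventory against the 3-2-1 rule:
    >= 3 copies, on >= 2 distinct media types, >= 1 offsite.
    Each copy is a dict with 'media' and 'offsite' keys."""
    return (len(copies) >= 3
            and len({c["media"] for c in copies}) >= 2
            and any(c["offsite"] for c in copies))
```

A primary plus a same-building replica fails this check; adding, say, a cold-storage archive in another region passes it.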
| rplnt wrote:
| Now that is weird. I fully understand not having a backup
| service in place, but no data backup either?
| Voloskaya wrote:
| TBF "data" here means the state of the gameworld which anyway
| resets entirely every 2 weeks or every month, so it's not
| exactly a big deal; everyone constantly starts over in Rust.
| rplnt wrote:
| Ah, I see. That makes sense then. I thought it's an
| everlasting game world, thanks for clarification.
| phtrivier wrote:
| Seems like data.gouv.fr [1], the government platform for open
| data, is impacted; we might not get the nice COVID-19 graphs
| from non-governmental sites ([2], [3]) today.
|
| I can't wait for the conspiracy theories about how the fire is a
| "cover up" to "hide" bad COVID-19 numbers...
|
| [1]
| https://twitter.com/GuillaumeRozier/status/13695724905996902...
|
| [2] https://covidtracker.fr/
|
| [3] https://www.meteo-covid.com/trouillocarte (Just wanted to
| share the "trouillocarte" - which roughly translates to "'how
| badly is shit hitting the fan today' map" ;) )
| Shadonototro wrote:
| it has nothing to do with covid:
|
| - https://www.usine-digitale.fr/article/le-health-data-hub-
| heb...
|
| - https://www.genethique.org/la-cnam-refuse-le-transfert-
| globa...
|
| - https://france3-regions.francetvinfo.fr/bretagne/donnees-
| med...
|
| Here is a more plausible conspiracy theory: USA/Microsoft is
| behind the cyberattacks and the fire
| brainzap wrote:
| my VPS is gone, hmm
| benlumen wrote:
| Someone told me that there were _a lot_ of warez sites hosted
| with OVH.
|
| Anyone know of any casualties?
| [deleted]
| pm90 wrote:
| Unfortunately, a lot of people are going to find out the hard way
| today why AWS/GCP/Big Expensive Cloud is so expensive (Hint: they
| have redundancy and failover procedures which drive up costs).
|
| Keep in mind I'm talking not of "downtime" but of actual data
| loss which might affect business continuity.
|
| This is really tragic. I'm hoping they have some kind of
| multi-regional backup/replication and not just multiple
| zones (although from the twitters it appears that only one
| of the zones was destroyed; however, the others don't seem
| to be operational atm).
| the_duke wrote:
| I encourage you to have a look at the operating income that AWS
| rakes in.
|
| Sure, the amount of expertise, redundancy and breadth of
| service offerings they provide is worth a markup, but they are
| also significantly more expensive than they need to be.
|
| Thanks to being the leader in an oligopoly, and due to patterns
| like making network egress unjustifiably expensive to keep you
| (/your data) from leaving.
| pm90 wrote:
| I think the question here, then is of subjective value.
|
| AWS may charge more for egress, but that's not high enough
| for it to be a concern for most clients.
|
| A bigger, independent concern is probably that there should
| be sufficient redundancy, backups and such that allows for
| business continuity. (Note again that I'm not saying that all
| companies make full use of these features, but those that
| care for such things do. Additionally, I've honestly never
| heard of an AWS DC burning down. Either it doesn't happen
| frequently or it doesn't have enough of an effect on regular
| customers; both situations are equivalent for my case).
|
| Most businesses choose to prioritize the second aspect. Even
| if they have to pay extra for egress sometimes, it's just not
| big enough of a concern compared to business continuity.
| hhw wrote:
| I've never heard of any data centre burning down (and I
| work in this industry), so never hearing of an AWS DC
| burning down isn't really saying anything about AWS.
| ev1 wrote:
| I remember the hosting.ua DC mysteriously "catching fire"
| opsunit wrote:
| An availability zone (AZ) in AWS eu-west-2 was flooded by a
| fire protection system going off within the last year. It
| absolutely did affect workloads in that AZ. That shouldn't
| have had a large impact on their customers, since AWS
| promotes multi-AZ architectures and makes them about as
| trivial as is viable.
|
| Put another way: one is guided towards making good
| operational choices rather than being left to discover them
| yourself. This is a value proposition of public clouds,
| since it commoditises that specialist knowledge.
| tomwojcik wrote:
| Hm, I can't find anything in google about this flooding
| incident. Can you share some details / source?
| CodesInChaos wrote:
| What surprised me most about today's fire is that their
| datacenters have so little physical separation. I
| expected them to be far enough apart to act as separate
| availability zones.
| x3sphere wrote:
| If the data is that critical, surely you would be backing it up
| frequently and also mirror it on at least one geographically
| separate server?
|
| I use a single server at OVH, and I'm not in the affected DC,
| but if this DID happen to me I could get back up and running
| fairly quickly. All our data is mirrored on S3 and off site
| backups are made frequently enough it wouldn't be an issue.
|
| Plus, you still need to plan for a scenario like this even with
| AWS or any other cloud provider. It is less likely to happen
| with those, given the redundancy, but there is still a chance
| you lose it all without a backup plan.
| zelly wrote:
| Yup, I've never heard of a fire taking out a Big Cloud DC. They
| actually know what they're doing and don't put server racks in
| shipping containers stacked on top of each other. If you want
| quality in life, sometimes you have to pay for it.
|
| Personally I'll continue to use these third world cloud
| providers. But I like to live on the edge.
| jiofih wrote:
| Apple fire: https://www.datacenterdynamics.com/en/news/fire-
| rages-throug...
|
| Google fire: https://www.google.nl/amp/s/gigazine.net/amp/en/
| 20060313_goo...
|
| AWS fire: https://money.cnn.com/2015/01/09/technology/amazon-
| data-cent...
| hashhar wrote:
| AWS and GCP are also prone to same kind of data loss if the AZ
| you are operating in goes down.
|
| They don't automatically geo-replicate things. You still need a
| backup for the torched EC2 instance to be able to relaunch in
| another AZ/region.
| nnx wrote:
| That's true, but it seems the whole SBG region for OVH is
| within the same disaster radius for one fire... with SBG2
| destroyed and SBG1 partly damaged.
|
| "The whole site has been isolated, which impacts all our
| services on SBG1, SBG2, SBG3 and SBG4. "
|
| Wonder if those SBGx were advertised as being the same as
| "Availability Zones" - whereas other cloud providers ensure
| zones are distant enough from each other (~1km at least) to
| likely survive events such as fire.
| brmgb wrote:
| > but it seems whole of SBG region for OVH is within same
| disaster radius for one fire
|
| SBG is for Strasbourg. That's not a region. It's a city.
| Obviously, SBG1 to 4 are in the radius of one fire. It's
| four different buildings on the same site.
| hashhar wrote:
| That's a fair point. If OVH does market them as AZs then
| it's disingenuous and liable to suits IMO.
| cbg0 wrote:
| No, it isn't, as there's no clear cut definition of what
| an availability zone is.
| ev1 wrote:
| They were not, and were never advertised as, anything similar
| to an AZ. You could not deploy in SBG1, 2, 3, etc. You only
| pick
| city = Strasbourg at deploy time. It's merely a building
| marker.
| tyingq wrote:
| The buildings are VERY close to one another.
|
| https://cdn.baxtel.com/data-center/ovh-strasbourg-
| campus/pho...
| Zevis wrote:
| That seems, uh, problematic.
| leajkinUnk wrote:
| "Big cloud" has had fires take out clusters, and somehow they
| manage to keep it out of the news. In spite of the redundancy
| and failover procedures, keeping your data centers running when
| one of the clusters was recently *on fire* is something that is
| often only possible due to heroic efforts.
|
| When I say "heroic efforts", that's in contrast to "ordinary
| error recovery and failover", which is the way you'd want to
| handle a DC fire, because DC fires happen often enough.
|
| The thing is, while these big companies have a much larger base
| of expertise to draw on and simply more staff time to throw at
| problems, there are factors which incentivize these employees
| to *increase risk* rather than reduce it.
|
| These big companies put pressure on all their engineers to
| figure out ways to drive down costs. So, while a big cloud
| provider won't make a rookie mistake--they won't forget to run
| disaster recovery drills, they won't forget to make backups and
| run test restores--they *will* do a bunch of calculations to
| figure out how close to disaster they can run in order to save
| money. The real disaster will then reveal some false, hidden
| assumption in their error recovery models.
|
| Or in other words, the big companies solve all the easy
| problems and then create new, hard problems.
| exikyut wrote:
| I'm curious what references or leads I might follow to learn
| more about these fires and other events you mention.
| leajkinUnk wrote:
| Get a job working at these companies and go out for drinks
| with the old-timers.
| [deleted]
| pm90 wrote:
| You know, those are excellent observations. But they don't
| change the decision calculus in this case. Using bigger cloud
| providers doesn't eliminate all risk, it just creates a
| different kind of risk.
|
| What we call "progress" in humanity is just putting our best
| efforts into reducing or eliminating the problems we know how
| to solve without realizing the problems they may create
| further down the line. The only way to know for sure is to
| try it, see how it goes, and then re-evaluate later.
|
| California had issues with many forest fires. They put out
| all fires. Turns out, that solution creates a bigger problem
| down the line with humongous uncontrollable fires which would
| not have happened if the smaller fires had not been put out
| so frequently. Oops.
| crubier wrote:
| The key differentiator of OVH is the very compact datacenters
| they achieve thanks to water cooling. Some OVH execs were
| touting that in a recent podcast.
|
| Interestingly, in this case, having a very compact datacenter
| was probably an aggravating factor. This shows how complex
| these technical choices are: you have to weigh operating
| savings against the gravity of black swan events...
| nnx wrote:
| Interesting. That said, the technique is not the issue here,
| losing a whole datacenter can always happen. This event would
| have been much less serious if all the four SBG* datacenters
| were not all so close to each other on the same plot of land.
|
| They are so close to each other that they are basically the
| same physical datacenter with 4 logical partitions.
| delfinom wrote:
| They are all annexes built up over time and a victim of their
| own success (the site was meant to be more of an edge node in
| size). The container-based annexes were meant to be
| dismantled 3 years ago, but profit probably got in the way.
| nikanj wrote:
| According to the official status page, the whole datacenter is
| still green http://status.ovh.com/vms/index_sbg2.html
| aetherspawn wrote:
| You're completely right. IMO this is the best comment in this
| whole thread. Their status page must be broken, or it's a lie.
| bootloop wrote:
| If there is a SLA with consequences associated with it every
| status page is going to be a lie.
| aetherspawn wrote:
| Well, it sucks to catch fire and I care for the employees
| and the firemen, but if their status page is a lie then I
| have a whole lot less sympathy for the business. That's
| shady business and they should feel bad.
|
| I can appreciate an honest mistake though, like the status
| page server cron is hosted in the same cluster that caught
| fire and hence it burnt down and can't update the page
| anymore.
| Ploskin wrote:
| Is the status page relevant though? At the very least,
| OVH immediately made a status announcement on their
| support page and they've been active on Twitter. I don't
| see anything shady here. From their support page:
|
| > The whole site has been isolated, which impacts all our
| services on SBG1, SBG2, SBG3 and SBG4. If your production
| is in Strasbourg, we recommend to activate your Disaster
| Recovery Plan
|
| What more could you want?
| [deleted]
| kahrl wrote:
| To have a status page that reflects actual statuses? To
| know that I'm not being lied to or taken advantage of? To
| know that my SLA is being honored?
| PudgePacket wrote:
| > Is the status page relevant though?
|
| What's the point of a status page then if it does not
| show you the status? I don't want to be chasing down
| twitter handles and support pages during an outage.
| Scoundreller wrote:
| Still better than Amazon where their status page
| describes little and fat chance of anyone official
| sharing anything on social media either.
|
| I wonder if a server fire would cause Amazon to go to
| status red. So far anything and everything has fallen
| under yellow.
| whalesalad wrote:
| lol... that's how most status pages are
| tweetle_beetle wrote:
| I feel like there should be place to report infrastructure
| suppliers with misleading status pages, some kind of
| crowdsourced database. Without this information, you only find
| out that they are misleading when something goes very wrong.
|
| At best you might be missing out on some SLA refunds, but at
| worst it could be disastrous for a business. I've been on the
| wrong side of a update-by-hand status system from a hosting
| provider before and it wasn't fun.
| vntok wrote:
| https://downdetector.fr/
| hedora wrote:
| Wtf is this disclaimer on Down Detector for? (Navigate to
| OVH page.). It sits in front of user comments, I think:
|
| > _Unable to display this content to due missing consent.
| By law, we are required to ask your consent to show the
| content that is normally displayed here._
| Symbiote wrote:
| It's a Disqus widget. If you denied consent for third
| party tracking, they can't load it.
|
| I've usually seen this with embedded videos rather than
| comments.
| tweetle_beetle wrote:
| Thanks, can't believe it's taken me 8 years to learn about
| that.
| globular-toast wrote:
| Who monitors the monitors?
|
| Agreed, though. A fake status page is worse than no status
| page. I don't mind if the status page states that it's
| manually updated every few hours as long as it's honest. But
| don't make it look like it's automated when it's not.
| kkwtfeliz wrote:
| The weather map is interesting: http://weathermap.ovh.net/
|
| No traffic whatsoever between sbg-g1 and sbg-g2 and their
| peers.
| deno wrote:
| Screenshot: https://i.imgur.com/2rpWv0P.png
| exikyut wrote:
| https://archive.is/LOHVS
|
| http://web.archive.org/web/20210310100021/http://status.ovh....
| damsta wrote:
| Four hours later still green.
| sgeisler wrote:
| It seems to be a static site, which is reasonable since it
| aggregates a lot of data and might encounter high load when
| something goes wrong, so generating it live without caching is
| not viable. So maybe the server that normally updates it is
| down too (not that this would be a good excuse)?
| keraf wrote:
| 10+ hours cache on a status page doesn't look like real-time
| monitoring to me.
|
| I think this is probably linked to a manual reporting system
| and they got bigger fish to fry at the moment than updating
| this status page.
| sigotirandolas wrote:
| Counterpoint: There's a constantly updating timestamp at
| the top of the page that suggests it's automated and real
| time.
| jakub_g wrote:
| Also, 3 years ago they had an outage in Strasbourg, and the
| status page was down apparently as a result of the outage.
|
| https://news.ycombinator.com/item?id=15661218
|
| They are not the only ones though. All too common. Well, it's
| tricky to set this up properly. The only proper way would be to
| use external infra for the status page.
| lebaux wrote:
| Isn't that why things like statuspage.io exist, though?
| dolmen wrote:
| Where is statuspage.io hosted?
| cortesoft wrote:
| I work for a CDN, and we had to change our status page
| provider once when they became our customer.
| formerly_proven wrote:
| In the cloud of course. Why do you ask?
| [deleted]
| kalleboo wrote:
| It's not difficult to make a status page with minimal false
| negatives. Throw up a server on another host that shows red
| when it doesn't get a heartbeat. But then instead you end up
| with false positives. And people will use false positives
| against you to claim refunds against your SLA.
|
| So nobody chooses to make an honest status page.
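The heartbeat scheme described above amounts to a dead-man's switch: a status page on separate infrastructure flips to red whenever the monitored service stops checking in. A minimal sketch (the 120-second timeout is an arbitrary assumption):

```python
import time

HEARTBEAT_TIMEOUT = 120  # seconds; assumed, tune to the heartbeat interval

def status(last_heartbeat, now=None):
    """Dead-man's-switch status: mark a service red when its
    heartbeat goes stale. False positives (e.g. network blips to
    the status host) are the cost; silently-green pages are
    avoided."""
    now = time.time() if now is None else now
    return "green" if (now - last_heartbeat) <= HEARTBEAT_TIMEOUT else "red"
```

This illustrates the trade-off in the comment: the check itself is trivial, but every false positive becomes an SLA-refund argument, which is why honest automated pages are rare.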
| jakub_g wrote:
| Yep. I guess what could be done is a two-tiered status
| page: automated health check which shows "possible outage,
| we're investigating" and then a manual update (although
| some would say it looks lame to say "nah, false positive"
| which is probably why this setup is rare).
| toast0 wrote:
| As someone who maintained a status page (poorly), I'm sorry
| on behalf of all status pages.
|
| But, they're usually manual affairs because sometimes the
| system is broken even when the healthcheck looks ok, and
| sometimes writing the healthcheck is tricky, and always you
| want the status page disconnected from the rest of the
| system as much as possible.
|
| It is a challenge to get 'update the status page' into the
| runbook. Especially for runbooks you don't review often
| (like the one for the building is on fire, probably).
|
| Luckily my status page was not quite public; we could show
| a note when people were trying to write a customer service
| email in the app; if you forget to update that, you get
| more email, but nobody posts the system is down and the
| status page says everything is ok.
| gpm wrote:
| > Legend: Servers down: 0 1+ 4+ 6+ 8+ 10+ 15+
|
| If they don't have any servers anymore, how can they be down ;)
| refraincomment wrote:
| I'm never gonna financially recover from this...
| kzrdude wrote:
| Lichess was affected by this fire:
| https://twitter.com/lichess/status/1369543554255757314
|
| But they seem to be back up
| odiroot wrote:
| Interesting! I got the news from my local package courier
| website. They warn, their services can be unreliable due to the
| fire at OVH.
|
| It's all connected.
| [deleted]
| simplecto wrote:
| This is horrible and a sobering reminder to do the things we
| don't enjoy or consider -- disaster recovery.
|
| How many of us here plan Fire Drills within our teams and larger
| organizations?
| tester34 wrote:
| holy shit my 3$ VPS!!
|
| nvm not this dc!
| canadianfella wrote:
| I hereby declare this to be a fire.
| martinald wrote:
| Interesting. I wonder if the cladding was a major problem here?
| It looks like it has all burnt out and could have had the fire
| spread extremely rapidly on the outside.
| jiofih wrote:
| The cladding was metal so very unlikely it contributed to the
| fire spreading.
| mot2ba wrote:
| Ovh needed a better firewall :(
| rjsw wrote:
| The photos look like the building had external cladding, wonder
| if that contributed to the size of the blaze [1].
|
| [1] https://en.wikipedia.org/wiki/Grenfell_Tower_fire
| [deleted]
| thbb21 wrote:
| My VPS in SBG3 stopped pinging around 9am.
|
| My impression is that they tried very hard to maintain uptime,
| which was probably a bad idea when we see the extent of the
| damages. This VPS just hosts external facing services and is easy
| to set back up.
| zoobab wrote:
| In my experience, backups in companies are barely done.
|
| Companies want quick money, they push people to skip important IT
| operations, like disaster recovery plans.
|
| And backups are the least monitored systems.
| mwcampbell wrote:
| I just recently started moving some services for my business to
| one of OVH's US-based data centers. Should I take this fire as
| evidence that OVH is incompetent and get out? I really don't want
| AWS, or the big three hyperscalers in general, to be the only
| option.
| lenartowski wrote:
| IMO you should take this fire as evidence that you need to
| have (working!) backups wherever you host your data. AWS,
| GCP and Azure are not fire-resistant, same as OVH. I don't
| know if OVH is more or less competent than the big three; I
| choose to trust no one.
| ghosty141 wrote:
| I read multiple times that they didn't even have sprinklers,
| only smoke detectors in their EU datacenter(s). I'm 100% sure
| AWS, Azure and Google have better fire prevention.
| Symbiote wrote:
| This thread has people saying they have sprinklers, don't
| have sprinklers, have / don't have gas suppression, and
| have puppies / actually have toilets.
|
| Wait for the misinformation hose to dry up, and decide in a
| few weeks.
|
| https://us.ovhcloud.com/about/company/security
| SamLicious wrote:
| This is such an eye-opening story, didn't know about any of
| this. Thank you for sharing!
| jbeales wrote:
| > If your production is in Strasbourg, we recommend to activate
| your Disaster Recovery Plan.
|
| Ouf.
| aw4y wrote:
| elliot..is it you?
| drpgq wrote:
| Like something out of Mr Robot
| Aldipower wrote:
| Looks a little bit like Fukushima. I hope the clean up doesn't
| take that long though..
| MattGaiser wrote:
| > We recommend to activate your Disaster Recovery Plan.
|
| What percentage of organizations have these?
| RantyDave wrote:
| I guess we're about to find out (or, rather, they are).
| ikiris wrote:
| Some plans have a single step: 1) panic.
| dylan604 wrote:
| Short answer: not enough.
| ohnonotagain9 wrote:
| (site a)---[replicate local LUNs/shares to remote storage
| arrays]--->(site b)
| (site a)---[replicates local VMs to remote HCI]--->(site b)
| (site a)---[local backups to local data archive]--->(site a)
| (site a)---[local data archive replicates to remote data
| archive]--->(site b)
| (site b)---[remote data archive replicates to remote air
| gapped data archive]--->(site b)
| (site a)---[replicates to cold storage on
| aws/gcp/azure]--->(site c)
| (site c)---[replicate to another geo site on cloud]--->(site
| d)
|
| scenario 1: site a is down. plan: recover to site b by most
| convenient means
|
| scenario 2: site b is down. plan: restore services, operate
| without redundancy out of site a
|
| scenario 3: site c is down. plan: restore services, catch up
| later. continue operating out of site a
|
| scenario 4: site b and c down. plan: restore services,
| operate without redundancy out of site a
|
| scenario 5: site a and b down. plan: cross fingers, restore
| to new site from cold storage on expensive cloud VM instances
|
| scenario 6: data archive corrupted by ransomware. plan:
| restore from air gapped data archive, hope ransomware was
| identified within 90 days
|
| scenario 7: site b and c down, then site a down. plan: quit
|
| scenario 8: staff hates job and all quit. plan: outsource
|
| scenario 9: and so on...
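The scenario list above is essentially a lookup from "which sites are down" to a plan. As a sketch (site letters and plan wording taken from the list; the dispatch logic itself is an invented illustration):

```python
# Toy dispatcher over the scenario table above; site names and plan
# summaries are the commenter's, the mapping logic is illustrative.
PLANS = {
    frozenset({"a"}): "recover to site b",
    frozenset({"b"}): "operate without redundancy out of site a",
    frozenset({"c"}): "continue out of site a, catch up later",
    frozenset({"b", "c"}): "operate without redundancy out of site a",
    frozenset({"a", "b"}): "restore to a new site from cloud cold storage",
}

def dr_plan(down_sites):
    """Return the pre-agreed plan for a set of down sites, or an
    escalation marker for combinations nobody planned for."""
    return PLANS.get(frozenset(down_sites), "escalate: unplanned scenario")
```

The point of writing it down this way is the default case: any outage combination that isn't in the table is, by definition, an unrehearsed disaster.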
| ricardobeat wrote:
| > At 00:47 on Wednesday, March 10, 2021, a fire broke out in a
| room in one of our 4 datacenters in Strasbourg, SBG2. Please note
| that the site is not classified as a Seveso site.
|
| > Firefighters immediately intervened to protect our teams and
| prevent the spread of the fire. At 2:54 am they isolated the site
| and closed off its perimeter.
|
| > By 4:09 am, the fire had destroyed SBG2 and continued to
| present risks to the nearby datacenters until the fire brigade
| brought the fire under control.
|
| > From 5:30 am, the site has been unavailable to our teams for
| obvious security reasons, under the direction of the prefecture.
| The fire is now contained.
| [deleted]
| nerdbaggy wrote:
| Dang, I have a lot of respect for Octave and what he has created.
| https://twitter.com/olesovhcom?s=21
| de6u99er wrote:
| After the total loss of one data-center I would tend to
| disagree with this statement.
| redisman wrote:
| Wow. That equipment is going to be very hard to replace right now
| too.
| TrueDuality wrote:
| There have been a handful of talks at computer security
| conferences talking about setting up physical traps in server
| chassis (such as this one:
| https://www.youtube.com/watch?v=XrzIjxO8MOs). Since seeing those
| I've been waiting for some idiot to try something like that in a
| physical server and burn down a data center.
|
| There is NO evidence that is what happened here, and I don't
| think OVH allows customers to bring their own equipment,
| making it even less likely. Still, I wait and hope to hear a
| root cause from this one.
| IceWreck wrote:
| Always, always keep local backups folks.
| coolspot wrote:
| "Local" meaning on the same server, just in another folder,
| right?
| IceWreck wrote:
| Sorry, I should've phrased it better.
|
| Local as in your home/office. While your application may run
| in AWS or on whatever remote server, it's necessary to have
| copies of your data that you can physically touch and access.
|
| One main deployment, one remote backup and one onsite
| physically accessible backup.
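| That three-copy layout (the classic 3-2-1 rule) can be sketched
| in a few lines. This is my own illustrative sketch, not anything
| from the thread: paths are made-up placeholders, and the "remote"
| copy is just a second directory standing in for another provider.

```python
import shutil
from datetime import date
from pathlib import Path

def backup_remote(data: Path, remote: Path) -> None:
    """Mirror the live data to a 'remote' location (in practice, a
    different provider/region than the one running production)."""
    if remote.exists():
        shutil.rmtree(remote)          # full re-mirror for simplicity
    shutil.copytree(data, remote)

def backup_onsite(data: Path, onsite_root: Path) -> Path:
    """Keep dated onsite snapshots on hardware you can physically
    touch, so one bad sync can't silently wipe the history."""
    snapshot = onsite_root / date.today().isoformat()
    if snapshot.exists():
        shutil.rmtree(snapshot)
    shutil.copytree(data, snapshot)
    return snapshot
```

| In real deployments the mirroring would be rsync/restic/borg to an
| actual second host; the point is only the layout: one deployment,
| one remote copy, one onsite copy.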
| k_sze wrote:
| I sent this story to my colleagues and one of them asked "where
| is the FM200?"
|
| I don't really know how FM200 systems work in data centres, but
| I'm guessing that if the fire didn't start from within the actual
| server room, FM200 might not save you? e.g. if a fire started
| elsewhere and went out of control, it would be able to burn
| through the walls/ceiling/floor of the server room, in which case
| no amount of FM200 gas can save you, right?
|
| Another possibility, of course, is that the FM200 system simply
| failed to trigger even though the fire started from within the
| server room.
|
| There are no published investigation details about this incident
| yet, I believe. Can somebody chime in about past incidents where
| FM200 failed to save the day?
| _up wrote:
| Apparently they don't use gas at all.
|
| https://lafibre.info/ovh-datacenter/ovh-et-la-protection-inc...
| k_sze wrote:
| Ah, so it's possible that they also used a water sprinkler
| system at SBG2. But still, I wonder how the fire protection
| system (water sprinkler, FM200, or otherwise) failed to save SBG2.
|
| It doesn't really surprise me that the machines are dead, but
| the whole place being _destroyed_ is much more surreal.
| etiennemarcel wrote:
| I think most of these gases are or will eventually be banned in
| Europe because of their impact on the environment. I've seen
| newer datacenters use water mist sprays.
| mike_d wrote:
| Nitrogen makes up 78% of the atmosphere, so I doubt it will
| be banned. Most datacenters don't actually use halocarbons
| despite the common "FM200" name.
| divingdragon wrote:
| You might be thinking of Halons, which are CFCs that deplete
| the ozone layer? They are mostly phased out worldwide but
| existing installations might still be in use.
|
| FM200 is something else that is often used in modern builds
| (not just datacenters).
| etiennemarcel wrote:
| It seems that HFC are being phased out too:
| https://ec.europa.eu/clima/policies/f-gas/legislation_en
| divingdragon wrote:
| I've heard that one. I thought it mostly affects
| refrigerants, but I didn't notice that FM200 is also an
| HFC. There are other fire suppression gases with a low
| global warming potential, which probably can still be
| used in the future.
| exikyut wrote:
| How... what. What if the fire is electrical? You can't just
| go "well the triple interlocked electrical isolation will
| trip and cut the current" if a random fully-charged UPS
| decides to get angry...
| giis wrote:
| I don't know whether my 10+ year side project with 225,000
| users (www.webminal.org) is gone forever! :(
|
| I have backup snapshots but they're stored in OVH itself :( hoping
| for a miracle!
| quickthrower2 wrote:
| Really sorry to hear, hope you get it restored. I cannot judge
| - I use the default backup options in Azure and hope they store
| it in another data centre but never thought to check too hard.
| This is very bad luck.
|
| Hopefully you had the code in GitHub but that still leaves the
| DB. It looks like your site has something to do with command-line
| or Linux lessons, so I'm not sure how much user data is critical.
| Maybe you can get this up and running again to some extent.
| XCSme wrote:
| What was the reason for storing the backup on the same server?
| To allow for rollbacks in case of data corruption or some
| changes gone wrong?
| giis wrote:
| Yes, I thought rollbacks would be much easier in case of data
| loss. :sob:
| Operyl wrote:
| I think they're talking about the backup services provided by
| OVH, which I believe are stored in RBX.
| kuschku wrote:
| If your backup snapshots are stored through OVH's normal backup
| functionality, then create a new server at e.g. RBX now, and
| restore from those backups. That'll take a few hours and it'll
| all be up again quickly.
| nerdbaggy wrote:
| Here are some pics of what it looked like before, SBG 1-4, and
| its history
|
| https://baxtel.com/data-center/ovh-strasbourg-campus
| gameshot911 wrote:
| Is it just my laptop or is only ~40% of vertical screen real
| estate dedicated to the actual content? :<
| softblush wrote:
| It's just you
| aetherspawn wrote:
| I have the same issue on my XPS 13 (4K screen), the header
| takes up a good 30% or so of the height of the screen and
| it's like reading through a mailbox slit.
| AnssiH wrote:
| Nope, exactly the same here (on landscape phone).
| bmurray7jhu wrote:
| SBG3 is almost adjacent to SBG2. It is impressive that the
| firefighters saved SBG3 with minimal damage.
| eb0la wrote:
| I'm surprised to see how close the DCs are to the river.
| Fortunately it's in the high part of the river, less prone to
| overflow.
| RantyDave wrote:
| Maybe quite handy for water cooling?
| anilakar wrote:
| Finally a legitimate reason for BSD sysadmins to run poweroff -n!
| awalias wrote:
| not to be a conspiracist, but are they still hosting wikileaks
| data? https://en.wikipedia.org/wiki/OVH#WikiLeaks
| lgleason wrote:
| https://www.youtube.com/watch?v=1EBfxjSFAxQ
| mhh__ wrote:
| I'm not sure if it's because my tolerance of Graham Linehan has
| snapped or not, but I barely laugh at the IT Crowd any more. As
| with other GL shows, I find it's mostly held together by
| the cast's delivery and such.
|
| The laugh track and the writing are honestly dated, even by
| the standards of Dad's Army.
| jrockway wrote:
| I don't remember the details, but I think that season 2 kind
| of retroactively ruined season 1. They used to have all those
| O'Reilly and EFF stickers, and working at a help desk at the
| time, it felt very authentic. Then everything got super nice
| in season 2 -- leather couches, people were dressing nicely,
| etc. It kind of lost its charm. You can't rewatch it because
| you know Denholm is just going to randomly jump out of a
| window.
|
| (Having said that, I think "Fire" was a memorable episode
| that is still amusing. The 0118911881999119 song, "it's off,
| so I'll turn it on... AND JUST WALK AWAY".)
|
| It might have been ahead of its time. Silicon Valley was well
| received and is as nerdy and intricately detailed as Season 1
| of the IT Crowd. "Normal people" thought it was far out and
| zany. People that work in tech have been to all those
| meetings. And, a major character was named PG!
| wott wrote:
| > They used to have all those O'Reilly and EFF stickers,
| and working at a help desk at the time, it felt very
| authentic. Then everything got super nice in season 2 --
| leather couches, people were dressing nicely, etc. It kind
| of lost its charm.
|
| That sounds like a pretty realistic allegory of the last
| two decades in Free Software (or software in general, or
| the web...)
| kuschku wrote:
| I only just realized the Paul Graham/Peter Gabriel easter
| egg...
| muglug wrote:
| The IT Crowd's comedy became dated incredibly quickly, just
| like Father Ted's.
|
| Comedies that came later ditched the laugh track. They had to
| work harder to get viewers at home to laugh, but ultimately a
| bunch of them (starting with The UK Office) hold up much
| better as a result.
| Tepix wrote:
| Last update
|
| https://twitter.com/olesovhcom/status/1369535787570724864?s=...
|
| "Update 7:20am Fire is over. Firefighters continue to cool the
| buildings with the water. We don't have the access to the site.
| That is why SBG1, SBG3, SBG4 won't be restarted today."
| NorwegianDude wrote:
| I can't see anything about a fire suppression system mentioned?
| Doesn't OVH have one, except for colocation datacenters?
|
| A fire detection system using e.g. lasers, plus Inergen (or
| Argonite) to put the fire out, is commonly used in
| datacenters. The gas
| fills the room and reduces the amount of oxygen in the room so
| most fires are put out within a minute.
|
| The cool thing is that the gas is designed to be used in rooms
| with people, so that it can be triggered at any time. It is however
| quite loud, and some setups have been known to be too loud, even
| destroying hard drives.
| monsieurbanana wrote:
| > even destroying hard drives
|
| So loud that it destroys hard drives... That's scary, are
| people's eardrums much more resistant?
| [deleted]
| MattGaiser wrote:
| Are fires common in data centres? Specialized fire suppression
| tech seems to indicate that they are.
| jhugo wrote:
| The decision to design and implement specialised tech comes
| from a combination of how likely the risk is and the
| magnitude of the potential loss. Fires are not that common in
| DCs, but the potential loss can be enormous (as OVH is
| currently finding out).
| jabroni_salad wrote:
| They aren't super common, but halon and other gas systems are
| just the right tool for the job. It can get inside the server
| chassis and doesn't damage equipment like a chemical
| application would. We won't know what went wrong at OVH until
| a proper post mortem comes out. These systems work by
| suppressing the flame reaction, but if the actual source was
| not addressed, it could reignite after a while.
| perlgeek wrote:
| Yes, mostly due to two reasons:
|
| * overall high energy density (lots of current flowing
| everywhere)
|
| * the batteries for backup power are dangerous, and can
| easily(ish) overheat when activated.
| torh wrote:
| A few years ago the company I work for installed suppressors on
| the Inergen system. It did trigger from time to time, which was
| tracked to the humidifiers. And yes -- it did destroy
| hard drives because of the pressure/sound waves before we
| installed the suppressors. Haven't had any incidents after we
| fixed the humidifiers.
|
| But Inergen (like other gases) is more or less useless if you
| allow it to escape too quickly. So the cooling system should be
| a fairly closed circuit.
|
| Edit: I'm also a Norwegian dude. :)
| pmontra wrote:
| What's the point of a fire suppression system that destroys
| what it should protect?
| rini17 wrote:
| When it destroys one room but prevents fire spreading to
| whole building.
| qeternity wrote:
| In a data center the fire suppression is mostly there to
| protect the servers.
| jhugo wrote:
| At a datacentre I used to visit years ago, part of the site
| induction was learning that if the fire suppression alarm went
| off, you had a certain amount of time to get out of the room
| before the argon would deploy, so you should always have a path
| to the nearest exit in mind. The implication was that it wasn't
| safe to be in the room once it deployed, but I don't know for
| sure.
| NorwegianDude wrote:
| Inergen is used to lower oxygen to ~12.5 %, from normal 21 %.
| Most fires need more than 16 % oxygen.
|
| Inergen consists of 52 % nitrogen, 40 % argon and 8 % carbon
| dioxide.
|
| Carbon dioxide might sound strange, but it stimulates faster,
| deeper breathing, which compensates for the lowered amount of
| oxygen.
|
| The whole point of Inergen is to quickly put out fires where
| water/foam/powder isn't usable and the room might contain
| people.
|
| It's cool stuff, and I was under the impression that
| basically all datacenters used it.
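| The dilution arithmetic behind those numbers can be checked with
| a few lines. This is my own back-of-envelope sketch assuming the
| agent is oxygen-free and mixes ideally with the room air; the
| function names are mine, and the constants are the percentages
| quoted above:

```python
# Back-of-envelope check of the Inergen figures quoted above.
AIR_O2 = 0.21     # oxygen fraction of normal air (~21%)
AGENT_CO2 = 0.08  # Inergen is ~8% CO2; the rest is N2/Ar, no oxygen

def o2_after_flood(agent_fraction: float) -> float:
    """Oxygen fraction once `agent_fraction` of the room volume has
    been replaced by the (oxygen-free) agent, assuming ideal mixing."""
    return AIR_O2 * (1 - agent_fraction)

def agent_needed(target_o2: float) -> float:
    """Fraction of room volume the agent must occupy to dilute the
    oxygen down to `target_o2`."""
    return 1 - target_o2 / AIR_O2

c = agent_needed(0.125)                               # target: ~12.5% O2
print(f"agent fraction needed: {c:.1%}")              # ~40.5% of the room
print(f"resulting CO2 level:   {AGENT_CO2 * c:.1%}")  # ~3.2% CO2
```

| So reaching ~12.5% oxygen means replacing roughly 40% of the
| room's volume with agent in well under a minute, which goes some
| way to explaining why the discharge is so violently loud.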
| saagarjha wrote:
| Presumably you would have trouble breathing as it displaced
| the oxygen in the room?
| edejong wrote:
| I heard that the pressure difference might rupture your ear
| drums.
| jhugo wrote:
| Yeah, that was always my understanding. GP saying "the gas
| is designed to be used in rooms with people, so that [it]
| can be triggered any time" made me second-guess that
| though.
|
| Maybe there is a concentration of oxygen that is high
| enough for humans to survive, yet too low to sustain
| combustion?
| thspimpolds wrote:
| The "gas" is more like a fog of capsules. FM-200 is a
| common one. Basically it has a fire suppression agent
| inside crystals which are blasted into the room by
| compressed air. These crystals melt when they get over a
| certain temperature and therefore won't kill you;
| however, breathing that in isn't really pleasant.
|
| Source: I've been in an FM-200 discharge
| lol768 wrote:
| > Maybe there is a concentration of oxygen that is high
| enough for humans to survive, yet too low to sustain
| combustion?
|
| Yes, supposedly (12% ish?). Can't say I'd be thrilled at
| the idea of testing it.
| growt wrote:
| So how is it safe with people if there is no oxygen left to
| breathe? Reminds me of my first trip to a datacenter, where the
| guy who accompanied us said: "In the event of a fire this room
| is filled with nitrogen in 20 seconds. But don't worry:
| nitrogen is not toxic!" Well, I was a little worried :)
| tallanvor wrote:
| Newer systems like the ones mentioned above are designed to
| reduce the amount of oxygen in the room to around 12% (down
| from around 21%). That's low enough to extinguish fires, but
| allows people to safely evacuate and prevents them from
| suffocating if they're incapacitated.
| yalogin wrote:
| I don't know what OVH is, and going to the site points me to a
| speed test with no information.
| MattGaiser wrote:
| French AWS/GCP is my understanding.
| tyingq wrote:
| They are really more like Hetzner. They have "cloud", but
| most of the business is dedicated servers. They also operate
| kimsufi.com and soyoustart.com.
|
| They do have APAC, Canadian, and US data centers as well.
| beckler wrote:
| They're a web/hosting service provider, like AWS or GCP, but
| with less services. They're much more popular in European
| countries.
| tecleandor wrote:
| You went to ovh.net instead of ovh.com.
|
| Largest hosting provider in Europe, probably top 5 or top 10 in
| the world.
| kuschku wrote:
| Imagine AWS, with fewer features, but 10x-100x lower prices. And
| now you know why until a few years ago they were larger in
| traffic, customers, and number of servers than even AWS.
| qwertykb wrote:
| Honestly just for the memes. https://isovhonfire.com
| donatj wrote:
| Did this already exist or was this thrown up insanely fast?
| qwertykb wrote:
| Threw it up really quick.
| navanchauhan wrote:
| whois[0] shows it was registered on March 10, so thrown up
| insanely fast.
|
| [0] https://who.is/whois/isovhonfire.com
| bithaze wrote:
| Probably not a good look for this to be the first thing another
| hosting company thinks to put up in response.
| qwertykb wrote:
| /shrug/ We have our infrastructure partially in OVH, I see it
| as a friendly jab at them and a way to get updates without
| having to navigate to twitter.
| sofixa wrote:
| OVH aren't nice in that regard, and have trolled competitors
| to "leave it to the pros" before when there were serious
| incidents (of which OVH have had their fair share), so it's
| not surprising.
| manishsharan wrote:
| And this is why the big 3 will continue to dominate. AWS,
| Microsoft and Google can throw a lot more money at their
| physical infrastructure than any other cloud provider.
|
| After this sorry episode, I don't think any CTO or CIO of any
| public company will be able to even consider using the other
| guys.
|
| edit: I am not implying that we put all eggs in one basket with
| no failover and DR. I am implying the big cos will pay a 2x
| premium on infrastructure to project reliability.
| fooyc wrote:
| I could replicate my whole infrastructure on 3 different OVH
| datacenters, with enough provision to support twice the peak
| load - it would still be cheaper than a single infrastructure
| at AWS, and I would get a better uptime than AWS:
| https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
| manishsharan wrote:
| I agree with you 100%. However, I did state *any CTO or CIO
| of any public company*... the executives don't worry about
| costs, they worry about being able to *project reliability*.
| fooyc wrote:
| Executives that worry about reliability would insist on
| deploying on multiple data centers, which would make the
| project more reliable than any single AWS availability
| zone.
|
| Also, cost matters if the AWS bill is one of the company's
| top expenses.
| joshuamorton wrote:
| reserved a1.large instances are about half the price of OVH's
| b2-7 instance. a1.xlarge are still cheaper (and larger). So
| you get more raw compute per dollar on AWS.
|
| What?
| kuschku wrote:
| If you need machines that large, and are willing to use
| reserved instances, you'd go with dedis on OVH instead of
| VMs, which is significantly cheaper.
| joshuamorton wrote:
| OVH dedicated instances start at about the size of an
| a1.metal instance, which is ~30% more than the comparable
| OVH instance, but you can get discounts in various ways.
|
| Or you could use t4g.2xlarge, which is cheaper. There's
| no situation where OVH is 3x cheaper (I mean maybe if
| bandwidth is your thing, but IDK).
| jiofih wrote:
| Performance of those AWS instances comes nowhere close
| despite the specs.
| christophilus wrote:
| With Google arbitrarily killing accounts, and with Amazon
| showing that they'll do the same if it's politically expedient,
| I'm not sure I'd trust the big three, either. It's a case of
| "pick your poison".
| gilrain wrote:
| If all of your infrastructure is in one data center, you're on
| a disaster clock no matter who you choose.
___________________________________________________________________
(page generated 2021-03-10 23:01 UTC)