[HN Gopher] Fire declared in OVH SBG2 datacentre building
       ___________________________________________________________________
        
       Fire declared in OVH SBG2 datacentre building
        
       Author : finniananderson
       Score  : 1074 points
       Date   : 2021-03-10 03:11 UTC (19 hours ago)
        
 (HTM) web link (travaux.ovh.net)
 (TXT) w3m dump (travaux.ovh.net)
        
       | bigiain wrote:
       | There is no cloud, there is just other people's computers. And
       | they're on fire.
        
         | quickthrower2 wrote:
         | This is a literal kubectl delete pod
        
         | cookiengineer wrote:
         | Ah, that explains why there are so many dark patterns in the
         | cloud! /s
        
         | Fordec wrote:
         | Your data was in the cloud, now it's in the smoke.
        
           | herpderperator wrote:
           | ...which has now become an actual cloud.
        
           | Rapzid wrote:
           | You might even say, it's _gone with the wind_...
           | 
           | Yeeeeeahhhhhhhhhh!!!!!
        
       | Psype wrote:
       | Been there, done that.
       | 
        | The 2nd worst thing, IF you happen to catch it soon and
        | control it, is that the temperature rise triggers many alerts
        | and automatic controls.
       | 
       | So when it's controlled, it's still a real nightmare. Here,
       | firemen could not even control it...
        
       | termau wrote:
        | Most modern data centres wouldn't have had this issue. At
        | least in Australia they use Argonite suppression systems,
        | which work by flooding the data hall with a mixture of argon
        | and nitrogen that suppresses fire by depleting oxygen.
        
         | exikyut wrote:
         | I'm seeing quite a lot of repeated sentiment throughout the
         | comments that Halon is illegal and is no longer used.
         | 
         | Is the situation "Halon is legal in Australia" or "Halon isn't
         | actually illegal per se if you don't use a lot of it"?
        
           | aaronmdjones wrote:
           | Halon and Argonite are unrelated.
        
           | divingdragon wrote:
            | Halons are being phased out because, like CFCs, they
            | deplete the ozone layer.
           | 
           | Other gases like argon or FM200 are not Halons.
        
         | LIV2 wrote:
          | I don't think Aussie DCs are all the same. Global Switch in
          | Sydney uses Inergen, but Equinix is using water, at least
          | at SY1 - though I'm reasonably sure GS is much older.
        
       | emersion wrote:
       | Picture of the fire from last night:
       | https://twitter.com/BobZeHareng/status/1369563084277374980
        
       | r4ibOm wrote:
        | I see several people talking about advanced backup systems
        | for businesses. I don't have a company; I work as a
        | freelancer, I am Brazilian, and the 70 euros a month I was
        | already paying was compromising my income, since my local
        | currency is quite devalued against the dollar. So imagine
        | the situation: my websites are down, and the only backup I
        | have is a copy of the VPS that I made in November last year,
        | when I was planning to set up a server at my house because
        | maintaining this server at OVH was getting expensive. It
        | would be unacceptable for a company of this size not to back
        | up its servers, or to keep the backups in the same location
        | as the incident, since their networks are all connected. I
        | hope they come up with a satisfactory and quick solution to
        | this problem.
        
         | MegaThorx wrote:
          | I had my server in SBG2, and sadly my backups had been
          | failing since the end of January. Yep, it's my mistake for
          | not checking the backups. Now I've lost about a month of
          | data.
          | 
          | The only good thing is that my backup was offsite.
          | 
          | Does OVH offer automatic snapshots for VPS? I know Hetzner
          | does, for an additional 20% of the server cost. If they
          | do, the next question is whether those backups would have
          | been destroyed too.
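
A freshness check would have caught this failure mode long before the fire. A minimal sketch (the directory name and the 24-hour threshold are assumptions; the "demo setup" lines only exist so the script runs anywhere):

```shell
#!/bin/sh
# Sketch of a backup-freshness check: alert when the newest file in
# the backup directory is older than a threshold. Directory name and
# threshold are assumptions; wire the ALERT branch into real alerting.
set -eu

BACKUP_DIR=./backups
MAX_AGE_HOURS=24

# Demo setup (replace with your real backup target).
mkdir -p "$BACKUP_DIR"
touch "$BACKUP_DIR/db-dump.sql.gz"

# Any file modified within the window counts as a fresh backup.
recent=$(find "$BACKUP_DIR" -type f -mmin -$((MAX_AGE_HOURS * 60)) | head -n 1)
if [ -z "$recent" ]; then
    echo "ALERT: no backup newer than ${MAX_AGE_HOURS}h in $BACKUP_DIR" >&2
    exit 1
fi
echo "OK: fresh backup found: $recent"
```

Run from cron and route a non-zero exit to whatever paging or email you already have; the point is that the check is independent of the backup job itself.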
        
           | kuschku wrote:
           | They do offer automatic backups, and they're offsite (RBX in
           | this case).
           | 
           | Here's more information on them:
           | https://docs.ovh.com/gb/en/vps/using-automated-backups-
           | on-a-...
           | 
            | With OVH the price can, depending on the situation,
            | double your cost: e.g. a 3EUR VPS plus a 3EUR backup
            | option (the price depends on size).
        
         | Symbiote wrote:
         | If OVH backed up everything, the cost of the service would be
         | double.
         | 
         | Many customers don't need a backup, so it's up to each customer
         | to arrange their own backups -- perhaps with tools and services
         | provided by the hosting company, or their own solution.
         | 
         | Running a company with no backup (for cost or any other reason)
         | is very risky, as some people will have found out today.
        
         | jiofih wrote:
         | Well, not having managed backups is obviously part of choosing
         | to go bare-metal. They do have triple-redundancy backups in
         | their cloud offerings. Nobody to blame but yourself.
         | 
         | Also, if you're hosting clients' static websites, you were
         | burning your money, there are way cheaper options out there
         | (and fully managed).
        
       | Pick-A-Hill2019 wrote:
       | For the curious - this is what it looks like when a fire
       | suppression system activates in a (small) server room.
       | 
       | https://youtu.be/DrDU4UQUwKg?t=60
        
       | KirillPanov wrote:
       | The cloud is on fire.
        
       | Shank wrote:
       | A status update on the OVH tracker for a different datacenter
       | (LIM-1 / Limburg) says "We are going to intervene in the rack to
       | replace a large number of power supply cables that could have an
       | insulation defect." [0][1] The same type of issue is "planned" in
       | BHS [3] and GRA [2].
       | 
       | Eerie timing: do they possibly suspect some bad cables?
       | 
       | [0]: http://travaux.ovh.net/?do=details&id=49016
       | 
       | [1]: http://travaux.ovh.net/?do=details&id=49017
       | 
       | [3]: http://travaux.ovh.net/?do=details&id=49462
       | 
       | [2]: http://travaux.ovh.net/?do=details&id=49465
        
         | mmauri wrote:
          | We have several bare-metal servers at GRA/Gravelines and
          | RBX/Roubaix. 3 weeks ago we had a 3h downtime on RBX
          | because they were replacing power cords without prior
          | notification. Maybe they were aware this could happen and
          | were in the process of fixing it.
        
         | dylan604 wrote:
         | >Eerie timing: do they possibly suspect some bad cables?
         | 
          | Why not? Cables rated lower than the load they carry are a
          | prime cause of electrical fires. If the load is too high
          | for long enough, the insulation melts away, and if anything
          | flammable is close enough to catch fire, that's the ball
          | game. It's a common cause of home electrical fires - some
          | lamp with poor wiring catches the drapes on fire, etc. You
          | wouldn't think a data center would have flammable curtains,
          | though.
        
         | jfrunyon wrote:
         | They're waiting an awful long time to do the one at BHS-7 if
         | so: 14 days from now?
        
         | terom wrote:
         | https://www.google.com/search?q=site%3Ahttp%3A%2F%2Ftravaux....
         | there's quite a few of these
         | 
         | http://travaux.ovh.net/?do=details&id=47840 earliest one that I
         | found was back in December
        
       | [deleted]
        
         | [deleted]
        
       | trailmonster wrote:
       | As an industry we like to think we're transparent, honest, and
       | perhaps even based on merit, but I can't find any of that in this
       | messaging.
       | 
       | "We are currently facing a major incident in our Strasbourg
       | datacentre, with a fire declared in the SBG2 building.
       | Firefighters intervened immediately on the spot but were unable
       | to control the SBG2 fire. As a precautionary measure, the
       | electricity was cut off on the whole site, which impacts all our
       | services at SBG1, SBG2, SBG3 and SBG4. If your production is in
       | Strasbourg, we recommend that you activate your Business Recovery
       | Plan."
       | 
       | Incidents are faced, not caused. It's made clear that they called
       | the fire department as soon as they should have, and they did
       | what they could as well, but in reality, your disaster recovery
       | plan and how well you implemented it is what's really the
       | question now, isn't it?
       | 
       | I think it's far easier to ask whether you've done the right
       | thing, user, than it is to ask why a fire managed to take out an
       | entire facility that was designed to prevent that exact scenario,
       | but only if you're OVH.
        
       | Kye wrote:
       | The best time to test your backups is before the production
       | server dies in a fire.
        
       | curiousgal wrote:
       | Came across this on r/France
       | 
       | https://i.imgur.com/epj1Lue.png
       | 
       | Translation :
       | 
        | We lost our GitLab and backups...
        | 
        | And the automatic backups that had been put in place no
        | longer worked, so apparently we lost everything...
        
       | geocrasher wrote:
       | The classic "lp0 on fire" error message comes to mind:
       | https://en.wikipedia.org/wiki/Lp0_on_fire
       | 
       | Really though, I feel truly awful for anyone affected by this.
       | The post recommends implementing a disaster recovery plan. The
       | truth is that most people don't have one. So, let's use this post
       | to talk about Disaster Recovery Plans!
       | 
       | Mine: I have 5 servers at OVH (not at SBG) and they all back up
       | to Amazon S3 or Backblaze B2, and I also have a dedicated server
       | (also OVH/Kimsufi) that gets the backups. I can redeploy in less
       | than a day on fresh hardware, and that's good enough for my
       | purposes. What's YOUR Disaster Recovery Plan?
        
         | tomxor wrote:
         | Basically the same (offsite backups), but the details are in
         | the what and how which is subjective... For my purposes I
         | decided that offsite backups should only comprise user data and
         | that all server configuration be 100% scripted with some
         | interactive parts to speed up any customization including
         | recovering backups. I also have my own backup servers rather
         | than using a service, and implement immutable incremental
         | backups with rotated ZFS snapshots (this is way simpler than it
         | sounds) - I can highly recommend ZFS as an extremely reliable
         | incremental backup solution but you must enable block level
         | deduplication and expect it to gobble up all the server RAM to
         | be effective (but that's why I dedicate a server to it and
         | don't need masses of cheap slow storage)... also the backup
         | server is restorable by script and only relies on having at
         | least one of the mirrored block devices in tact which I make a
         | local copy of occasionally.
         | 
         | I'm not sure how normal this strategy is outside of container
         | land but I like just using scripts, they are simple and
         | transparent - if you take time and care to write them well.
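
The retention half of such a scheme is generic. A sketch of "keep the newest N" rotation, with plain timestamped files standing in for snapshots so it runs anywhere; with ZFS the listing would become `zfs list -H -t snapshot -o name -s creation` and the deletion `zfs destroy pool/dataset@name` (dataset names are assumptions):

```shell
#!/bin/sh
# Generic "keep the newest N" rotation, as used for rotated snapshots.
set -eu

SNAPDIR=./snapshots
KEEP=3

# Demo setup: five fake snapshots, named so lexical order = age order.
mkdir -p "$SNAPDIR"
for d in 01 02 03 04 05; do
    touch "$SNAPDIR/snap-202103$d"
done

# Drop everything except the newest $KEEP.
total=$(ls "$SNAPDIR" | wc -l)
drop=$((total - KEEP))
if [ "$drop" -gt 0 ]; then
    ls "$SNAPDIR" | sort | head -n "$drop" | while read -r name; do
        rm "$SNAPDIR/$name"
    done
fi
```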
        
           | mwcampbell wrote:
           | This sounds like what I want to do for the new infrastructure
           | I'm setting up in one of OVH's US-based data centers. Are you
           | running on virtual machines or bare metal? What kind of
           | scripting or config management are you using?
        
             | tomxor wrote:
             | VPS although there is no dependency on VPS manager stuff so
             | I don't see any issue with running on bare metal. No config
             | managers, just bash scripts.
             | 
             | They basically install and configure packages using sed or
             | heredocs with a few user prompts here and there for setting
             | up domains etc.
             | 
             | If you are constantly tweaking stuff this might not suit
             | you, but if you know what you need and only occasionally do
             | light changes (which you must ensure the scripts reflect)
             | then this could be an option for you.
             | 
             | It does take some care to write reliable clear bash
             | scripts, and there are some critical choices like `set -e`
             | so that you can walk away and have it hit the end and know
             | that it didn't just error in the middle without you
             | noticing.
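
A minimal sketch of that fail-fast style (the config file, its contents, and the sed edit are placeholders, not a real service; the point is `set -eu` plus an explicit verification step at the end):

```shell
#!/bin/sh
set -eu   # any failed command or unset variable aborts the whole run

step() { printf '==> %s\n' "$1"; }

step "write base config"
printf 'server_name example\nport 8080\n' > demo.conf

step "patch the config (sed writes a new file; no in-place flags)"
sed 's/example/demo.invalid/' demo.conf > demo.conf.new
mv demo.conf.new demo.conf

step "verify before declaring success"
grep -q 'demo.invalid' demo.conf
echo "provisioning finished"
```

If the script prints "provisioning finished", every step before it succeeded; if anything failed, it stopped right there rather than erroring silently in the middle.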
        
         | fy20 wrote:
          | I have three servers (1 OVH - different location, 2 DO).
          | The only thing I back up is the DB, which is synced daily
          | to S3. There's a rule to automatically delete files after
          | 30 days, both to handle GDPR and to stop the bucket size
          | and costs spiralling out of control.
         | 
         | Everything is managed with Ansible and Terraform (on DO side),
         | so I could probably get everything back up and running in less
         | than an hour if needed.
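
Such an expiry rule can be sketched with the AWS CLI; the bucket name here is a placeholder, and the same rule can equally be set from the S3 console or Terraform. Guarded so it is a no-op on machines without the CLI:

```shell
#!/bin/sh
# Hypothetical sketch: expire objects in a backup bucket after 30 days.
command -v aws >/dev/null || { echo "aws CLI not installed; skipping"; exit 0; }

aws s3api put-bucket-lifecycle-configuration \
  --bucket example-db-backups \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-old-dumps",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 30}
    }]
  }'
```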
        
           | sverhagen wrote:
           | > probably
           | 
           | That makes it sound like you didn't try/practice. I imagine
           | that in a real-life scenario things will be a little more
           | painful than in one's imagination.
        
             | dylan604 wrote:
              | Exactly. Having a plan is only part of it. Good
              | disaster plans include dry runs a couple of times a
              | year (the clock changes are always a convenient
              | reminder). If you rehearse the recovery when you're not
              | panicked, you have a better chance of not skipping a
              | step when the timing is much more crucial. Also, some
              | sort of guide with the steps laid out procedurally is a
              | great idea.
        
               | slaymaker1907 wrote:
               | I don't think this is necessarily true for all parts of a
               | disaster plan. Some mechanisms may be untestable because
               | it is unknown how to actually trigger it (think certain
               | runtime assertions, but on a larger scale).
               | 
                | Even if it is possible to trigger and test, actually
                | using the recovery mechanism may have a high cost,
                | either monetary or perhaps losing some small amount
                | of data.
               | These mechanisms should almost always be an additional
               | layer of defense and only be invoked in case of true
               | catastrophe.
               | 
                | In both cases, the mechanisms should be tested as
                | thoroughly as possible, either through artificial
                | environments that can simulate improbable scenarios
                | or, in the latter case, on a small test environment
                | to minimize cost.
        
         | lesquivemeau wrote:
          | Personally my email hosting is down, but thankfully my web
          | hosting and Nextcloud instance were both at GRA2
          | (Gravelines).
         | 
         | But i have a friend who potentially lost important uni work
         | hosted on his nextcloud instance... On SBG2.
         | 
         | A rough reminder that backups are really important, even if you
         | are just an individual
        
         | Ceezy wrote:
         | A lot of prayers....
        
         | NicoJuicy wrote:
         | Snapshots, db backups and data backups.
         | 
          | Rolling backups with a month's retention to box using
          | rsync.
          | 
          | A network drive to box is mounted by default when I boot
          | my desktop.
          | 
          | I have some scripts for putting production DBs into test,
          | and for when I want them locally.
        
         | cm2187 wrote:
         | Also stupid things not to forget: make sure your dns provider
         | is independent otherwise you won't be able to point to your new
         | server (or have a secondary DNS provider). Make sure any email
         | required for 2FA or communicating with your hosting service
         | managing your infrastructure isn't running on that same
         | infrastructure.
        
         | pmlnr wrote:
         | I only have a personal server running in Hetzner but it's
         | mirrored onto a tiny local computer at home.
         | 
          | They both run postfix + dovecot, so mail is synced via
          | dovecot replication. Data is rsync-ed daily, and everything
          | has ZFS snapshots. MySQL is not set up for replication - my
          | home internet breaks often enough to cause serious issues -
          | so instead, every day I drop everything and import a full
          | dump from the main server, and do a local dump as backup on
          | both sides.
         | 
         | I don't have automatic failover set up.
        
           | clan wrote:
           | Not saying that you should never do a full mysql dump. Nor
           | that you should not ensure that you can import a full dump.
           | 
            | But when you already use ZFS you can do a very speedy
            | full backup with:
            | 
            |     mysql << EOF
            |     FLUSH TABLES WITH READ LOCK;
            |     system zfs snapshot data/db@snapname
            |     UNLOCK TABLES;
            |     EOF
           | 
           | Transfer the snapshot off-site (and test!). Either as a
           | simple filecopy (the snapshot ensured a consistent database)
           | or a little more advanced with zfs send/receive. This is much
           | quicker and more painless than mysql dump. Especially with
           | sizeable databases.
        
             | pmlnr wrote:
             | Good point, but my DB is tiny, so for now, I can afford the
             | mysqldump. But I'll keep this in mind.
        
             | mwcampbell wrote:
             | Do you even need to flush the tables and grab a read lock
             | while taking the ZFS snapshot? My understanding was that
             | since ZFS snapshots are point-in-time consistent, taking a
             | snapshot without flushing tables or grabbing a read lock
             | would be safe; restoring from that snapshot would be like
             | rebooting after losing power.
        
         | tetha wrote:
         | At work, there are several layers.
         | 
         | As an immediate plan, the 2-3 business critical systems are
         | replicating their primary storages to systems in a different
         | datacenter. This allows us to kick off the configuration
         | management in a disaster, and we need something in between 1-4
         | hours to setup the necessary application servers and
         | middlewares to get critical production running again.
         | 
         | Regarding backups, backups are archived daily to 2 different
         | borg repo hosts on different cloud providers. We could lose an
         | entire hoster to shenanigans and the damage would be limited to
          | ~2 days of data loss at worst. Later this year, we're also
          | considering exporting some of these archives to our sister
          | team, so they can place a monthly or weekly backup on tape
          | in a safe, in order to have a proper offline backup.
         | 
          | Regarding restores - there are daily automated restore
          | tests for our prod databases, which are then used for a
          | bunch of other tests after anonymization. On top of that,
          | we've built most database handling on top of the
          | backup/restore infra, in order to force us to test these
          | restores during normal business processes.
         | 
         | As I keep saying, installing a database is not hard. Making
         | backups also isn't hard. Ensuring you can restore backups, and
         | ensuring you are not losing backups almost regardless of what
         | happens... that's hard and expensive.
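
An automated restore test can be sketched generically: restore the latest dump into a scratch area and fail unless sanity checks pass. Here a gzipped text file stands in for the database dump so the sketch runs anywhere; with a real database the restore line would be something like `mysql scratch_db < dump.sql`:

```shell
#!/bin/sh
set -eu

# Demo setup: pretend last night's backup job produced this dump.
mkdir -p dumps scratch
printf 'users:42\norders:117\n' | gzip > dumps/latest.gz

# 1. Restore into scratch, never into production.
gunzip -c dumps/latest.gz > scratch/restored.txt

# 2. Sanity-check the restored data, not just the exit code.
grep -q '^users:' scratch/restored.txt
users=$(sed -n 's/^users://p' scratch/restored.txt)
[ "$users" -gt 0 ]

echo "restore test passed (users=$users)"
```

The checks matter as much as the restore: an empty or truncated dump restores "successfully" unless something asserts on the contents.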
        
         | tialaramex wrote:
         | If you are a corporate entity of some kind, the final layer of
         | your plan should always be "Go bankrupt". You can't
         | successfully recover from every possible disaster and you
         | shouldn't try to. In the event of a sufficiently unlikely
         | event, your business fails and every penny spent attempting the
         | impossible will be wasted, move on and let professional
         | administrators salvage what they can for your creditors.
         | 
         | Lots of people plan for specific elements they can imagine and
         | forget other equally or even more important things they are
          | going to need in a disaster. Check out how many
          | organisations that doubtless have 24/7 IT support in case a
          | web server goes down somehow had _no plan_ for what happens
          | if it's unsafe for their 500 call centre employees to sit
          | in tiny cubicles answering phones all day, even though
          | pandemic respiratory viruses are _so_ famously likely that
          | Gates consistently listed them as the #1 threat.
        
           | jfrunyon wrote:
           | IMHO, the part they had no plan for was being unable to just
           | require their employees to come in anyway...
        
             | osmano807 wrote:
             | Just lobby the government to put call centers in "essential
             | services". In my state they are open even with a partial
             | lockdown.
        
             | tialaramex wrote:
             | The more insecure your workers, the easier it is to get
             | them to come in, regardless of what the supposed rules may
             | or may not be.
             | 
             | Fast Fashion for example often employs workers in more or
             | less sweatshop conditions close to the customers (this
             | makes commercial sense, if you make the hot new items in
             | Bangladesh you either need to expensively air freight them
             | to customers or they're going to take weeks to arrive after
             | they're first ordered - there's a reason it isn't called
             | "Slow fashion"). These jobs are poorly paid, many workers
             | have dubious right-to-work status, weak local language
             | skills, may even be paid in cash - and so if you tell them
             | they must come in, none of them are going to say "No".
             | 
              | In fact the slackening off in R for the area where my
              | sister lives (today the towering chimneys and cavernous
              | brick factories are just for tourists; your new dress
              | was made in an anonymous single-storey building on an
              | industrial estate) might be driven more by people not
              | needing to own new frocks every week when they've been
              | no further than their kitchen in a month than by it
              | actually being illegal to staff the business. If
              | nobody's buying what you make, then suddenly it makes
              | sense to take a handout from the government and
              | actually shut, rather than pretend that making mauve
              | turtleneck sweaters or whatever is "essential".
        
               | Mauricebranagh wrote:
               | For non uk residents "Frock" is a regional term for dress
               | - quite common in the Midlands.
        
               | namibj wrote:
               | Just to clarify: trans-atlantic shipments take a week
               | port-to-port, e.g. Newark, NJ, USA to Antwerp, Belgium.
               | (Bangladesh to Italy via Suez-channel looks like a 2-week
               | voyage, or 3 weeks to the US west coast. Especially the
               | latter would probably have quite a few stops on the way
               | along the Asian coast.) You get better economics than
               | shipping via air-freight from one full pallet and up.
               | Overland truck transport to and from the port is still
               | cheaper than air freight, at least in the US and central
               | Europe.
               | 
               | For these major routes, there are typically at least bi-
               | weekly voyages scheduled, so for this kind of distance,
               | you can expect about 11 days pretty uniformly distributed
               | +-2 days, if you pay to get on the next ship.
               | 
                | This may mean (committing to) paying for the spot on
                | the ship when your pallet is ready for pickup at the
                | factory, not when it arrives at the port, and using
                | low-delay overland trucking services, which operate
                | e.g. in lockstep with the port processing to get your
                | pallet on the move within half a day of the container
                | being unloaded from the ship, ideally having
                | containers pre-sorted at the origin to match truck
                | routes at the destination. So they can go on a
                | trailer directly from the ship and rotate drivers on
                | the delivery tour, spending only a few minutes at
                | each drop-off.
               | 
               | Because those can't rely on customers to be there and get
               | you unloaded in less than 5 minutes, they need locations
               | they can unload at with on-board equipment. They'd notify
               | the customer with a GPS-based ETA display, so the
               | customer can be ready and immediately move the delivery
               | inside. Rely on 360-degree "dashcam" coverage and
               | encourage the customer to have the drop-off point under
               | video surveillance, just to easily handle potential
               | disputes. Have the delivery person use some suitable
               | high-res camera with a built-in light to get some full-
               | surface-coverage photographic evidence of the condition
               | it was delivered in.
               | 
                | I'd guess with a hydraulic lift on the trailer's back
                | and some kind of folding manual pallet jack stuck on
                | that (fold-up) lift. So they drive up to the
                | location, unlock the pallet jack, unfold the lift,
                | lower it almost to the ground, detach the pallet jack
                | to drop it the last inch/few cm to the ground, pull
                | the jack out, and lower the lift the rest of the way.
                | Then they drive the jack onto the lift, open the
                | container, go up with the pallet jack, drive the
                | pallets (one-by-one) for this drop-off out of the
                | container, and leave them on the ground. Finally they
                | close and lock the container, re-arm the jack's
                | hooks, shove the jack back under the slightly-lowered
                | folding lift, make it hook back in, fold it up, lock
                | the hooking mechanism (against theft at a rest stop -
                | short meal and toilet breaks exist, but showering can
                | be delayed for up to 2 nights), fold it all the way
                | up, and drive on to their next drop-off point.
        
           | jacquesm wrote:
           | The final layer is call the insurance company.
        
             | corty wrote:
             | Not really, the insurance won't make things right in an
             | instant. They will usually compensate you financially, but
             | often only after painstaking evaluation of all
             | circumstances, weighing their chances in court to get out
             | of paying you and maybe a lengthy court battle and a race
             | against your bankruptcy.
             | 
              | So yes, getting insurance can be a good idea to offset
              | some losses, as long as they are somewhat limited
              | compared to your company's overall assets and income.
              | But as soon as the insurance payout matches a
              | significant part of your net worth, the insurance might
              | not save you.
        
               | jacquesm wrote:
               | Fair enough.
        
             | tomatocracy wrote:
             | There are always uninsurable events and for large enough
             | companies/risks there are also liquidity limits to the size
             | of coverage you can get from the market even for insurable
             | events.
             | 
             | As such, it makes sense to make the level of risk you plan
             | to accept (by not being insured against it and not
             | mitigating) a conscious economic decision rather than
             | pretending you've covered everything.
        
               | jacquesm wrote:
                | As long as you have no outside shareholders you can
                | decide that. If you do, you'd be surprised how they
                | respond to an attitude like that. After all: you can
                | decide the levels of risk that you personally are
                | comfortable with leading to the extinguishing of the
                | business, but a typical shareholder is looking to you
                | to protect their investment, and not insuring against
                | a known risk which at some point materializes is an
                | excellent way to find yourself in the crosshairs of a
                | minority-shareholder lawsuit against a (former)
                | company executive.
        
               | tomatocracy wrote:
               | In my work life I am a professional investor, so I've
               | been through the debate on insure/prepare or not many
               | times. It's always an economic debate when you get into
               | "very expensive" territory (cheap and easy is different
               | obviously).
               | 
               | The big example of this which springs to mind is business
               | interruption cover - it's ruinously expensive so it's
               | extremely unusual to have the max cover the market might
               | be prepared to offer. It's a pure economic decision.
        
               | jacquesm wrote:
                | Yes, but it is an informed decision and typically
                | taken at the board level. Very few CEOs that are not
                | 100% owners would be comfortable with the decision to
                | leave an existential risk uncovered without the full
                | approval of all those involved, which is kind of
                | logical.
               | 
                | Usually you'd have to show your homework (offers from
                | insurance companies proving that it really is
                | unaffordable). I totally get the trade-off, and the
                | fact that if the business could not exist while
                | properly insured, plenty of companies will simply
                | take their chances.
               | 
               | We also both know that in case something like that does
               | go wrong everybody will be looking for a scapegoat, so
               | for the CEO's own protection it is quite important to
               | play such things by the book, on the off chance the risk
               | one day does materialize.
        
               | tomatocracy wrote:
               | Absolutely - but that's kind of my point. You should make
               | the decision consciously. The corporate governance that
               | goes around that _is_ the company making that decision
               | consciously.
        
               | jacquesm wrote:
               | And this is the heart of the problem: a lot of times
                | these decisions are made by people who shouldn't be
                | making them, or they aren't made at all: they are made
                | by default, without bringing the fact that a decision
                | is required to the level of scrutiny normally
                | associated with such decisions.
               | 
               | This has killed quite a few otherwise very viable
               | companies, it is fine to take risks as long as you do so
               | consciously and with full approval of all stakeholders
               | (or at least: a majority of all stakeholders).
               | Interesting effects can result: a smaller investor may
               | demand indemnification, then one by one the others also
               | want that indemnification and ultimately the decision is
               | made that the risk is unacceptable anyway (I've seen this
               | play out), other variations are that one shareholder ends
               | up being bought out because they have a different risk
               | appetite than the others.
        
           | brmgb wrote:
           | "Go bankrupt" is not a plan. Becoming insolvent might be the
           | end result of a situation but it's not going to help you deal
           | with it.
           | 
           | Let's take an example which might lead to bankruptcy. A
           | typical answer to a major disaster (let's say your main and
           | sole building burning as a typical case) for an SME would be
            | to cease activity, furlough employees and stop or defer
            | every payment you can while you claim insurance and assess
            | your options. Well, none of these things are obvious to
            | do, especially if all your archives and documents just
            | burnt. If you think about it (which you should), you will
            | quickly realise that you at least need an offsite way to
            | contact all your employees, your bank and your counsel
            | (which would most likely be the accountant certifying your
            | results rather than a lawyer if you are an SME in my
            | country). That's the
           | heart of disaster planning: having solutions at the ready for
           | what was easy to foresee so you can better focus on what
           | wasn't.
        
             | Twisell wrote:
             | Hence the "final layer" statement.
             | 
             | Bankruptcy when dealt with correctly is a process not an
             | end.
             | 
              | If everything else fails, it's better to file for
              | bankruptcy while there is still something to recover
              | with the help of others than to burn everything to ashes
              | out of vanity.
             | 
             | At least that's how I understood parent's comment.
        
               | Sanzig wrote:
               | As a quick interlude, since this may be confusing to non-
               | US readers: bankruptcy in the United States in the
               | context of business usually refers to two concepts,
               | whereas in many other countries it refers to just one.
               | 
               | There are two types of bankruptcies in the US used most
               | often by insolvent businesses: Chapter 7, and Chapter 11.
               | 
               | A Chapter 7 bankruptcy is what most people in other
               | countries think of when they hear "bankruptcy" - it's the
                | total dissolution of a business and liquidation of its
               | assets to satisfy its creditors. A business does not
               | survive a Chapter 7. This is often referred to as a
               | "bankruptcy" or "liquidation" in other countries.
               | 
               | A Chapter 11 bankruptcy, on the other hand, is a process
               | by which a business is given court protection from its
               | creditors and allowed to restructure. If the creditors
               | are satisfied with the reorganisation plan (which may
               | include agreeing to change the terms of outstanding
               | debts), the business emerges from Chapter 11 protection
               | and is allowed to continue operating. Otherwise, if an
               | agreement can't be reached, the business may end up in
               | Chapter 7 and get liquidated. Most countries have an
               | equivalent to a Chapter 11, but the name for it varies
               | widely. For example, Canada calls it a "Division 1
                | Proposal," Australia and the UK call it
                | "administration," and Ireland calls it "examinership."
               | 
                | Since there are a lot of international visitors to HN
                | I
               | just thought I'd jump in and provide a bit of clarity so
               | we can all ensure we're using the same definition of
               | "bankruptcy." A US Chapter 7 bankruptcy is not a plan,
               | it's the game over state. A US Chapter 11 bankruptcy, on
               | the other hand, can definitely be a strategic maneuver
               | when you're in serious trouble, so it can be part of the
               | plan (hopefully far down the list).
        
               | breakfastduck wrote:
                | This helps a lot, thanks. I think most people
                | internationally would assume bankruptcy = game over.
        
               | brmgb wrote:
               | > Bankruptcy when dealt with correctly is a process not
               | an end.
               | 
                | Yes, that's why "Go bankrupt" is _not_ a plan, which
                | was the entire point of my reply. That's like saying
                | that your disaster recovery plan is "solve the
                | disaster".
        
             | corty wrote:
              | Going bankrupt is a plan. However, it is somewhat more
              | involved than it sounds at first. That's why there
             | should be a corporate lawyer advising on stuff like company
             | structure, liabilities, continuance of pension plans,
             | ordering and reasons for layoffs, etc.
        
             | dragonwriter wrote:
             | > "Go bankrupt" is not a plan.
             | 
             | Yes it is. (Though it's better, as GP suggested, as a final
             | layer of a plan and not the only layer.)
             | 
             | > Becoming insolvent might be the end result of a situation
             | but it's not going to help you deal with it.
             | 
             | Insolvency isn't bankruptcy. Becoming insolvent is a
             | consequence, sure. Bankruptcy absolutely does help you deal
             | with that impact, that's rather the point of it.
        
           | physicsguy wrote:
           | It's not quite that simple, the data you might have may be
           | needed for compliance or regulatory reasons. Having no backup
           | strategy might make you personally liable depending on the
           | country!
        
         | shog_hn wrote:
          | Self-hosted Kubernetes and a FreeNAS storage system at home,
         | and a couple of VMs in the cloud. I've got a mixed strategy,
         | but it covers everything to remote locations.
         | 
         | I use S3 API compatible object storage platforms for remote
         | backup. E.g. BackBlaze B2. I wrote about my backup scripts for
         | FreeNAS (jail that runs s3cmd to copy files to B2) here:
         | https://www.shogan.co.uk/cloud-2/cheap-s3-cloud-backup-with-...
         | 
          | For Kubernetes I use Velero, which can be configured with an
          | S3
         | storage backend target:
         | https://www.shogan.co.uk/kubernetes/kubernetes-backup-on-ras...
        
         | vbsteven wrote:
         | In my case:
         | 
         | * All my services are dockerized and have gitlab pipelines to
         | deploy on a kubernetes cluster (RKE/K3s/baremetal-k8s)
         | 
          | * git repos containing the build scripts/pipelines are
         | replicated on my gitlab instance and multiple work computers
         | (laptop & desktop)
         | 
         | * Data and databases are regularly dumped and stored in S3 and
         | my home server
         | 
         | * Most of the infrastructure setup (AWS/DO/Azure, installing
         | kubernetes) is in Terraform git repositories. And a bit of
         | Ansible for some older projects.
         | 
         | Because of the above, if anything happens all I need to restore
         | a service is a fresh blank VM/dedicated machine or a cloud
         | account with a hosted Kubernetes offering. From there it's just
         | configuring terraform/ansible variables with the new hosts and
         | executing the scripts.
        
           | jrib wrote:
           | How often do you test starting from a clean slate?
        
             | kuschku wrote:
             | I have a similar setup: I recreate everything for every
             | major kubernetes update.
        
         | traveler01 wrote:
          | Most people/companies don't have the money to set up those
          | disaster plans. They require you to have a similar server
          | ready to go and also a backup solution like Amazon S3.
         | 
          | I was affected: my personal VPS is safe but down, and I
          | don't know anything about the other VPSes I was managing. I
          | have the backups, and right now I'd love for them to just
          | set me up a new VPS so I can restore the backups and the
          | services.
        
           | rainmaking wrote:
           | Spin it up now and refuse to pay for the old one later.
        
         | cbozeman wrote:
         | > What's YOUR Disaster Recovery Plan?
         | 
         | Prayer and hope, usually.
        
         | temp8964 wrote:
         | Do you have access to your control panel when the servers are
         | down?
        
         | Cthulhu_ wrote:
         | Personal: I run a webserver for some website (wordpress +
         | xenforo), I've set up a cronjob that creates a backup of
         | /var/www, /etc and a mysql database dump, then uploads it to an
         | S3 bucket (with automatic Glacier archiving after X period set
         | up). It should be fairly straightforward to rent a new server
          | and set things back up. I still dislike having to set up a
          | webserver + PHP manually, though; I don't get why that
          | hasn't been streamlined yet.
         | 
         | My employer has a single rack of servers at HQ. It's positioned
         | at a very specific angle with an AC unit facing it, their exact
         | positions are marked out on the floor in tape. The servers
         | contain VMs that most employees work on, our git repository,
         | issue trackers, and probably customer admin as well. They say
         | they do off-site backups, but honestly, when (not if) that
         | thing goes it'll be a pretty serious impact on the business.
         | They don't like people keeping their code on their take-home
         | laptop either (I can't fathom how my colleagues work and how
         | they can stand working in a large codebase using barebones vim
         | over ssh), but I've employed some professional disobedience
         | there.
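          | One way a job like that is often wired up, sketched here
          | with the AWS CLI (the paths, database name and bucket are
          | invented examples, not the commenter's actual setup):

```shell
#!/bin/sh
# Nightly backup sketch, e.g. dropped into /etc/cron.daily/.
set -eu
stamp=$(date +%F)

# Archive the web root and /etc, and dump the database.
tar -czf "/tmp/site-$stamp.tar.gz" /var/www /etc
mysqldump --single-transaction exampledb | gzip > "/tmp/db-$stamp.sql.gz"

# Upload to S3; a lifecycle rule on the bucket can move objects
# to Glacier automatically after a set number of days.
aws s3 cp "/tmp/site-$stamp.tar.gz" "s3://example-backups/$stamp/"
aws s3 cp "/tmp/db-$stamp.sql.gz" "s3://example-backups/$stamp/"

rm -f "/tmp/site-$stamp.tar.gz" "/tmp/db-$stamp.sql.gz"
```

          | A cron-script fragment like this depends on external
          | services (MySQL, S3), so it is shown only as a sketch.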
        
           | catbuttes wrote:
           | Have you considered writing an ansible playbook to set all
           | that up? You could even have it pull down the backup and do a
           | full restore for you...
        
         | walrus01 wrote:
         | Video of the aftermath:
         | 
         | https://twitter.com/PompiersFR/status/1369544801817944071
         | 
         | https://twitter.com/abonin_DNA
        
         | ahmedalsudani wrote:
         | My disaster recovery plan: we shall rebuild.
        
           | edoceo wrote:
           | F
        
           | julianwachholz wrote:
           | Finally we get to rewrite everything from scratch!
        
             | philpem wrote:
             | The spice will flow, and the tech debt will go!
        
         | SergeAx wrote:
         | I don't have to "backup servers" for a long time now. I have an
         | Ansible playbook to deploy and orchestrate services, which, in
         | turn, are mostly dockerized. So my recovery plan is to turn on
         | "sorry, maintenance" banner via CDN, spin up a bunch of new
         | VPSes, run Ansible scenario for deployment and restore database
         | from hidden replica or latest dump.
        
           | dboreham wrote:
           | > restore database from hidden replica or latest dump
           | 
           | You do have backup servers.
        
             | [deleted]
        
             | jaywalk wrote:
              | He said he doesn't have _to_ backup servers, not that
              | he doesn't have backup servers.
        
         | el-salvador wrote:
         | Pictures from the fire:
         | 
         | https://www.dna.fr/amp/faits-divers-justice/2021/03/10/stras...
        
           | siod wrote:
           | This looks like yet another aluminium composite panel fire...
        
           | rkachowski wrote:
           | is that entirely cargo containers? is that common for a data
           | center?
        
             | Jon_Lowtek wrote:
              | No, SBG2 was a building in the "tower design", as is
              | SBG3 behind it. The containers in the foreground are
              | SBG1, from the time when OVH didn't know if Strasbourg
              | was going to be a permanent thing.
        
             | jfrunyon wrote:
             | https://en.wikipedia.org/wiki/Modular_data_center
             | 
             | Container DCs were a big thing for a while. Even Google did
             | a whole PR thing about how they used them.
        
               | donalhunt wrote:
               | Funnily enough, I think it was the fire risk that caused
               | them to ditch the idea and move to their current design.
               | Though I know modular design is highly likely to be used
               | by all players as edge nodes spring up worldwide.
        
               | jeffbee wrote:
               | It was also that the container had literally no
               | advantages. It was just a meme that did not survive
               | rational analysis. The building in which the datacenter
               | is located is the simplest, cheapest part of the design.
               | Dividing it up into a bunch of inconveniently-sized
               | rectangles solves nothing.
        
           | pulse7 wrote:
           | Uff... it looks like half of the containers on this picture
           | were on fire...
        
             | jannes wrote:
             | Are you making a joke about docker containers or am I
             | missing something?
        
               | dgellow wrote:
               | Part of OVH datacenters are literal, physical containers
               | with racks, power supply, vents, etc.
               | 
                | You can see more details here:
                | https://baxtel.com/data-center/ovh-strasbourg-campus
        
               | Mauricebranagh wrote:
               | Do the containers have fire suppressant systems
               | installed?
        
             | [deleted]
        
           | simonke wrote:
           | Without AMP: https://www.dna.fr/faits-divers-
           | justice/2021/03/10/strasbour...
        
           | jacquesm wrote:
           | Nobody hurt. That's a bit of good news.
        
           | burmanm wrote:
           | And here's some more from firefighters, while it was burning:
           | 
           | https://twitter.com/xgarreau/status/1369559995491172354
           | 
           | Looks glowing red to me.
        
         | kijin wrote:
         | One of my backup servers used to be in the same datacenter as
         | the primary server. I only recently moved it to a different
         | host. It's still in the same city, though, so I'm considering
         | other options. I'm not a big fan of just-make-a-tarball-of-
         | everything-and-upload-it-to-the-cloud backup methodology, I
         | prefer something a bit more incremental. But with Backblaze B2
         | being so cheap, I might as well just upload tarballs to B2. As
         | long as I have the data, the servers can be redeployed in a
         | couple of hours at most.
         | 
         | The SBG fire illustrates the importance of geographical
         | redundancy. Just because the datacenters have different numbers
         | at the end doesn't mean that they won't fail at the same time.
         | Apart from a large fire or power outage, there are lots of
         | things that can take out several datacenters in close vicinity
         | at the same time, such as hurricanes and earthquakes.
        
           | sebmellen wrote:
           | Duplicity is your best bet for incremental backups using B2.
           | I use this for my personal server and it works brilliantly.
        
             | gingerlime wrote:
             | I thought so too for a long while. Until I was trying to
             | restore something (just to test things), and wasn't able
             | to... it might have been specific to our GPG or an older
             | version or something... but I decided to switch to restic
             | and am much happier now.
             | 
             | Restic has a single binary that takes care of everything.
             | It feels more modern and seems to work really well. Never
             | had any issue restoring from it.
             | 
             | Just one data point. Stick to whatever works for you. But
             | important to test not only your backups, but also restores!
        
               | remram wrote:
               | I've been using Duplicati forever. The fact that it's C#
               | is a bit of a pain (some distros don't have recent Mono),
               | but running it in Docker is easy enough. Being able to
               | check the status of backups and restore files from a web
               | UI is a huge plus, so is the ability to run the same app
               | on all platforms.
               | 
               | I've found duplicity to be a little simplistic and
               | brittle. Purging old backups is also difficult, you
               | basically have to make a full backup (i.e. non-
               | incremental) before you can do that, which increases
               | bandwidth and storage cost.
               | 
               | Restic looks great feature-wise, but still feels like the
               | low-level component you'd use to build a backup system,
               | not a backup system in itself. It's also pre-1.0.
        
               | sebmellen wrote:
               | Interesting, I will check Restic out, I've heard other
               | good things about it. Duplicity is a bit of a pain to set
               | up and Restic's single binary model is more
               | straightforward (Go is a miracle). Thanks for the
               | recommendation!
               | 
               | GPG is a bit quirky but I do regularly check my backups
               | and restores (if once every few months counts as
               | regular).
        
               | nix23 wrote:
               | +1 for Restic
               | 
                | It's brilliant; it has worked like a charm on
                | FreeBSD, Windows and an RPi with Linux for over 2
                | years.
        
             | geocrasher wrote:
             | I'm using rclone, it works very well for the purpose too.
        
               | nucleardog wrote:
               | Ditto. Moved to rclone after having a bunch of random
               | small issues with Duplicity that on their own weren't
               | major but made me lose faith in something that's going to
               | be largely operating unsupervised except for a monthly
               | check-in.
        
             | uncledave wrote:
             | I'd stay away from duplicity. I've had serious problems
             | with it and large inode counts where it'll bang the CPU at
             | 100% and never complete.
             | 
             | Have moved to using rdiff-backup over SSH.
        
           | bombcar wrote:
            | I've taken to uploading entire copies via rsync or
            | similar - tarballs use the whole bandwidth each time, but
            | rsync on files transfers only the changes.
        
             | Datagenerator wrote:
             | One up for rclone, it's parallel and supports many
             | endpoints.
        
             | megous wrote:
             | I use tarballs because it allows me to not trust the backup
              | servers. ssh is set up so that the backup server's keys
              | are certified to run only a backup script that returns
              | the encrypted data, and nothing else.
             | 
             | It's very easy to use spare storage in various places to do
             | backups this way, as ssh, gpg and cron are everywhere, and
             | you don't need to install any complicated backup solutions
             | or trust the backup storage machines much.
             | 
             | All you have to manage centrally is private keys for backup
             | encryption, and CA for signing the ssh keys + some
             | occasional monitoring/tests.
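              | A sketch of that forced-command arrangement (the key
              | material, paths and script name are invented):

```shell
# ~/.ssh/authorized_keys on the machine being backed up: no matter
# what command the backup server's key requests, sshd runs only
# the backup script, which writes gpg-encrypted data to stdout.
command="/usr/local/bin/backup-dump.sh",no-port-forwarding,no-pty,no-agent-forwarding,no-X11-forwarding ssh-ed25519 AAAA... backup@offsite

# /usr/local/bin/backup-dump.sh could then be as small as:
#   tar -cz /srv/data | gpg --encrypt --recipient backup-key
```

              | This is an authorized_keys configuration fragment, not
              | a runnable script.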
        
             | dylan604 wrote:
             | Can't you add only changes to a tar?
        
               | teddyh wrote:
                | Indeed you can; see the --listed-incremental and
                | --incremental options:
               | 
               | https://www.gnu.org/software/tar/manual/tar.html#Incremen
               | tal...
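                | A quick runnable sketch of those options (all paths
                | are scratch directories):

```shell
# Scratch area so nothing outside it is touched.
tmp=$(mktemp -d)
mkdir -p "$tmp/data"
echo one > "$tmp/data/a.txt"
echo two > "$tmp/data/b.txt"

# Level-0 (full) backup; GNU tar records file state in the
# snapshot file for later comparison.
tar --listed-incremental="$tmp/snap.snar" \
    -C "$tmp" -czf "$tmp/full.tar.gz" data

# Change one file, then archive again against the same snapshot
# file: only the changed file's contents are stored.
sleep 1   # ensure the mtime visibly moves past the snapshot
echo three > "$tmp/data/a.txt"
tar --listed-incremental="$tmp/snap.snar" \
    -C "$tmp" -czf "$tmp/incr.tar.gz" data

tar -tzf "$tmp/incr.tar.gz"   # data/ and data/a.txt; b.txt unchanged
```

                | Note that --listed-incremental updates snap.snar in
                | place, so keep a copy if you want to repeat the
                | level-0 run.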
        
           | iamd3vil wrote:
           | Another option for incremental backups is Restic [0]. It has
           | support to backup to Backblaze B2, Amazon S3 and lots of
           | other places.
           | 
           | [0] https://restic.net/
        
             | [deleted]
        
           | paulmd wrote:
           | > I'm not a big fan of just-make-a-tarball-of-everything-and-
           | upload-it-to-the-cloud backup methodology, I prefer something
           | a bit more incremental.
           | 
           | pretty much a textbook use-case for zfs with some kind of
           | snapshot-rolling utility. Snap every hour, send backups once
           | a day, prune your backups according to some timetable.
           | Transfer as incrementals against the previous stored
           | snapshot. Plus you get great data integrity checking on top
           | of that.
           | 
           | "but linus said..."
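            | A command-level sketch of that timetable (pool, dataset
            | and host names are made up; tools like sanoid or
            | zfs-auto-snapshot automate the naming and pruning):

```shell
# Hourly: take a cheap, atomic snapshot.
zfs snapshot tank/data@2021-03-10-1400

# Daily: send only the delta between the newest snapshot the
# backup host already has and the newest local one.
zfs send -i tank/data@2021-03-09-1400 tank/data@2021-03-10-1400 \
  | ssh backup-host zfs receive -u backup/data

# Prune local snapshots according to the retention timetable.
zfs destroy tank/data@2021-02-01-0000
```

            | These commands require a live ZFS pool and a remote
            | receiver, so they are shown only as a sketch.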
        
             | raverbashing wrote:
             | You can do the same with Ext4
        
               | geococcyxc wrote:
               | Ext4 has no snapshot feature, do you mean with lvm?
        
               | raverbashing wrote:
               | Yes LVM, sorry
        
             | nix23 wrote:
             | >"but linus said..."
             | 
              | Yes, I still don't understand him, as he calls himself
              | a "filesystem guy". I also don't understand why no one
              | ever mentions NILFS2.
        
         | dave_sullivan wrote:
         | I am literally in SBG2 so that has been fun.
         | 
         | Turns out, our disaster recovery plan is pretty good.
         | 
          | Datacenter burned down and I was back up 4 hours later in
          | another datacenter with zero data loss. Good times.
        
           | sebastianconcpt wrote:
           | And how's SBG1 doing?
        
             | BadBadJellyBean wrote:
             | About 1/3 destroyed
        
         | jfrunyon wrote:
         | Servers are at a mix of "cloud" providers, and on-site. Most
         | data (including system configs!) is backed up on-site nightly,
         | and to B2 nightly with historical copies - and critical data is
         | also live-replicated to our international branches. (Some "meh"
         | data is backed up only to B2, like our phone logs; we can get
         | most of the info from our carrier anyway).
         | 
         | Our goal and the reason we have a lot of stuff backed up on-
         | prem is to have our most time-critical operations back up
         | within a couple of hours - unless the building is destroyed, in
         | which case that's a moot point and we'll take what we can get.
         | 
         | A dev wiped our almost-monolithic
         | sales/manufacturing/billing/etc MySQL database a month or two
         | ago. (I have been repeatedly overruled on the topic of taking
         | access to prod away from devs) We were down for around an hour.
         | Most of that time was spent pulling gigs of data out of the
         | binlog without also wiping it all again. Because our nightly
         | backups had failed a couple weeks prior - after our most recent
         | monthly "glance at it".
        
         | jacquesm wrote:
         | It's true: most companies do not have a disaster recovery plan,
         | and many of them confuse a breach protocol with a disaster
         | recovery plan ('we have backups').
         | 
         | Fires in DCs aren't rare at all, I know of at least three, one
         | of those in a building where I had servers. This one seems to
          | be worse than the other two. Datacenters concentrate a lot
          | of flammable stuff, throw a ton of current through it, and
          | do so 24x7. The risk of a fire is definitely not imaginary,
          | which is why most DCs have fire suppression mechanisms.
          | Whether those work as advertised depends on the nature of
          | the fire. An exploding on-prem transformer took out a good
          | chunk of EV1's datacenter in the early 2000s, and it wasn't
          | so much the fire that caused problems for their customers,
          | but the fact that someone got injured (or even died, I
          | don't recall exactly), and it took a long time before the
          | investigation was completed and the DC was released back to
          | the owners.
         | 
         | Being paranoid and having off-site backups is what allowed us
         | to be back online before the fire was out. If not for that I
         | don't know if our company would have survived.
        
         | tothrowaway wrote:
         | I'm at OVH as well (in the BHS datacenter, fortunately). I run
         | my entire production system on one beefy machine. The apps and
         | database are replicated to a backup machine hosted with Hetzner
         | (in their Germany datacenter). I also run a tiny VM at OVH
         | which proxies all traffic to Hetzner. I use a failover IP to
         | point at the big rig at OVH. If the main machine fails, I move
         | the failover IP to the VM, which sends all traffic to Hetzner.
         | 
          | If OVH is totally down and the failover IP doesn't work, I
          | have a fairly low TTL on the DNS.
         | 
          | I back up the database state to S3 every day.
         | 
         | Since I'm truly paranoid, I have an Intel NUC at my house that
         | also replicates the DB. I like knowing that I have a complete
         | backup of my entire business within arm's reach.
        
           | eb0la wrote:
           | Are you truly paranoid?
           | 
           | If my money and/or job depended on having something running
           | without (or with minimal) disruption I would be as paranoid
           | as you, too.
           | 
           | BTW - Some people call this business recovery plan, not plain
           | paranoia ;-)
        
             | michaelt wrote:
             | Enterprise-level projects often have only light protection
             | against wrongful hosting account termination, reasoning
             | that spending a lot of money and having an account manager
             | keeps them safe from clumsy automated systems.
             | 
             | So they might have their primary and replica databases at
             | different DCs from the same hosting provider, and only
             | their nightly backup to a different provider. Four copies
              | at four different providers is a step above three copies
             | with two providers!
             | 
             | A large enterprise would probably be using a filesystem
             | with periodic snapshots, or streaming their redo log to a
             | backup, to protect against a fat-fingered DBA deleting the
             | wrong thing. Of course, filesystem snapshots provide no
             | protection against loss of DC or wrongful hosting account
             | termination, so you might not count them as true backup
             | copies.
        
               | numbsafari wrote:
               | This is why you should have a "Cloud 3-2-1" backup plan.
               | Have 3 copies of your data, two with your primary
               | provider, and 1 with another.
               | 
                | e.g., if you are an AWS customer, have your backups in
                | S3 and use simple replication to sync them to either
                | GCS
               | or Azure, where you can get the same level of compliance
               | attestation as from AWS.
        
             | buran77 wrote:
             | It's not paranoia if you're right. All of the risks GP is
             | protecting against are things that happen to someone every
             | day, and they should be seen like wearing the seat belt in
             | a car.
        
               | klingon78 wrote:
               | I have a reliability and risk avoidance mindset, but I've
               | had to stand back because my mental gas tank for trying
               | to keep things going is near empty.
               | 
               | I've really struggled working with others that either are
               | both ignorant and apathetic about the business's ability
               | to deal with risk or believe that it's their job to keep
               | putting duct tape over the duct tape that breaks multiple
               | times a day while users struggle.
               | 
               | I like seeing these comments reminding others to a wear
               | seat belt or have backups for their backups, but I don't
               | know whether I should care more about reliability. I work
               | in an environment that's a constant figurative fire.
               | 
               | I also like to spend time with my family. I know it's
               | just a job, and it would be even if I were the only one
               | responsible for it; that doesn't negate the importance of
               | reliability, but there is a balance.
               | 
               | If you are dedicated to reliability, don't let this deter
               | you. Some have a full gas tank, which is great.
        
               | CraigJPerry wrote:
               | This resonates with me. I notice my gas tank rarely
               | depletes because of technology. It doesn't matter how
               | brain dead the 00's oracle forms app with absurd
               | unsupported EDI submission excel thinga-ma-bob that
                | requires a modem ... <fill in the rest of the dumpster
               | fire as your imagination deems>. Making a tech stack safe
               | is a fun challenge.
               | 
               | Apathetic people though, that can be really tough going.
               | It's just that way "because". Or my favourite "oh we
               | don't have permission to change that", how about we make
               | the case and get permission? __horrified looks__
               | sometimes followed by pitch forks.
        
               | buran77 wrote:
               | Reliability is there to keep your things running smoothly
               | during normal operations. Backups are there for when you
               | reach the end of your reliability rope. Neither is really
               | a good replacement for the other. The most reliable
               | systems will still fail eventually, and the best of
               | backups can't run your day to day operations.
               | 
               | At the end of the day you have a budget (of any kind) and
               | a list of priorities on which to spend it. It's up to you
               | or your management to set a reasonable budget, and to set
                | the right priorities. If they refuse, leave, or you'll
                | just burn the candle at both ends and fade out.
        
               | klingon78 wrote:
               | Backups are a reliability tool, yes.
               | 
               | A backup on its own is of little worth if unused.
               | 
                | When a backup is used to restore something, the amount
                | of downtime may be decreased. When it is, that is
                | reliability: we keep things usable and functioning
                | more often than not.
        
               | jt2190 wrote:
               | Consider:
               | 
               | > ... [F]inance is fundamentally about moving money and
               | risk through a network. [1]
               | 
               | Your employer has taken on many, many risks as part of
               | their enterprise. If _every_ risk is addressed the
               | company likely can't operate profitably. In this context,
               | your business needs to identify every risk, weigh the
               | likelihood and the potential impact, decide whether to
               | address or accept the risk, and finally, if they decide
                | to address the risk, whether to address it in-house or
                | outsource it.
               | 
               | You've identified a risk that is currently being
               | "accepted" by your employer, one that you'd like to
               | address in-house. Perhaps they've taken on the risk
               | unintentionally, out of ignorance.
               | 
               | As a professional the best I can do is to make sure that
               | the business isn't ignorant about the risk they've taken
               | on. If the risk is too great I might even leave. Beyond
               | that I accept that life is full of risks.
               | 
               | [1] Gary Gensler, "Blockchain and money", Introduction
               | https://ocw.mit.edu/courses/sloan-school-of-
               | management/15-s1...
        
           | venj wrote:
            | Are your domains at OVH too? If yes, I'd consider changing
            | this: this morning the Manager was quite flooded and the
            | DNS service was down for some time...
        
           | notaharvardmba wrote:
           | Bacula has some really cool features for cloud backups.
           | 
           | https://bacula.org
        
             | yongjik wrote:
             | Ow what an unfortunate name.
             | 
             | https://en.wikipedia.org/wiki/Baculum
        
               | schoen wrote:
               | It just means "stick" in the original Latin!
        
               | 867-5309 wrote:
               | but fortunately also within arm's reach
        
               | waheoo wrote:
               | LGTM
        
           | johnchristopher wrote:
            | It seems your setup follows the 3-2-1 backup rule, with at
            | least two copies in different physical locations.
            | https://www.nakivo.com/blog/3-2-1-backup-rule-efficient-
            | data...
        
           | extrasolar wrote:
           | this is the way. I do the same. Not paranoid at all.
        
           | vanviegen wrote:
           | Are you me, by any chance? :-)
           | 
           | I also run our entire production system on one beefy machine
           | at OVH, and replicate to a similar machine at Hetzner. In
           | case of a failure, we just change DNS, which has a 1 hour
           | TTL. We've needed to do an unplanned fail-over only once in
           | over 10 years.
           | 
           | And like you, I have an extra replica at the office, because
           | it feels safe having a physical copy of the data literally at
           | hand.
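A DNS-repointing failover like the one described usually hinges on a debounce rule, so a single flaky probe against the primary doesn't trigger a switch. A minimal sketch of that decision (the three-strikes threshold is an assumption for illustration, not OP's actual setup):

```python
def should_fail_over(checks, threshold=3):
    """Decide whether to repoint DNS at the replica.

    `checks` is the history of health probes against the primary
    (True = healthy), oldest first. Fail over only once the last
    `threshold` probes have all failed, so a single blip is ignored.
    """
    return len(checks) >= threshold and not any(checks[-threshold:])

print(should_fail_over([True, True, False]))           # False: one blip
print(should_fail_over([True, False, False, False]))   # True: sustained outage
```

With a 1-hour TTL, clients can keep hitting the dead primary for up to an hour after the switch, so the probe interval times the threshold should stay small relative to the TTL.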
        
             | ta988 wrote:
             | Same but with a regular offline physical copy (cheap nas).
             | One of my worries is a malicious destruction of the backups
              | if anything worms its way into my network
        
               | sandworm101 wrote:
               | Which is why "off" is still a great security tool. A copy
               | on a non-powered device, even if that device is attached
               | to the network, is immune to worms. There is something to
               | be said for a NAS solution that requires a physical act
               | to turn on and perform an update.
        
               | extrasolar wrote:
                | Hetzner has storage boxes and auto snapshots. So even
                | if someone deletes the backups remotely, there are
                | still snapshots which they can't get unless they have
                | control panel access.
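Snapshot schemes like that come down to a small retention policy. A hedged sketch of one common shape (keep the newest N daily snapshots plus one per ISO week for M older weeks; the actual policy a given provider applies is not specified in the thread):

```python
from datetime import date, timedelta

def prune(snapshots, keep_daily=7, keep_weekly=4):
    """Return the snapshot dates to keep: the newest `keep_daily`
    snapshots, plus the newest snapshot in each of the next
    `keep_weekly` older ISO weeks. Everything else may be deleted."""
    snaps = sorted(snapshots, reverse=True)        # newest first
    keep = set(snaps[:keep_daily])                 # recent dailies
    weeks_kept = []
    for d in snaps[keep_daily:]:
        week = d.isocalendar()[:2]                 # (ISO year, week no.)
        if week not in weeks_kept and len(weeks_kept) < keep_weekly:
            weeks_kept.append(week)
            keep.add(d)
    return keep

# 14 consecutive daily snapshots, keeping 3 dailies + 2 weeklies:
snaps = [date(2021, 3, 1) + timedelta(days=i) for i in range(14)]
print(sorted(prune(snaps, keep_daily=3, keep_weekly=2)))
# keeps Mar 12-14 (daily) plus Mar 11 and Mar 7 (one per older week)
```

The security property the comment points at is orthogonal to the schedule: what matters is that the snapshots are not deletable with the same credentials that can delete the backups.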
        
             | 7ewis wrote:
             | Not done any research into it, but I always thought OVH was
             | supposed to be a very budget VPS service primarily for
              | personal use rather than business. Although I thought it
              | was akin to having a Raspberry Pi plugged in at home.
             | 
             | Again, I may be completely wrong but why would you not use
             | AWS/GCP? Even if it's complexity, Amazon have Lightsail, or
             | if it's cost I thought DigitalOcean was one of the only
             | reputable business-grade VPS providers.
             | 
             | I just can't imagine many situations where a VPS would be
             | superior to embracing the cloud and using cloud functions,
             | containers, instances with autoscaling/load balancers etc.
        
               | Symbiote wrote:
               | OVH is a European equivalent to Digital Ocean.
               | 
               | It has twice the revenue, and is the third largest
               | hosting provider in the world.
        
               | Thaxll wrote:
                | I would call it the EU version of Rackspace. Except
                | that it doesn't have Rackspace's insane pricing.
        
               | nixgeek wrote:
               | Twice the revenue of DigitalOcean still puts it < $1B
               | ARR, or am I missing something? I can't see how that's
               | the third largest in the world, or does your definition
               | of "hosting provider" exclude clouds?
        
               | Symbiote wrote:
               | I took it from the top of their Wikipedia page.
               | 
               | In any case, they aren't "primarily for personal use".
               | 
               | https://en.wikipedia.org/wiki/OVH
        
               | olivierduval wrote:
                | OVH STARTED as a budget VPS service some 20 years
                | ago... but they have grown a lot in the last 6-7
                | years, adding more "cloud" services and capabilities,
                | even if not on par with the main players...
               | 
                | Why not use AWS/GCP? From my personal point of view: as a
                | French citizen, I'm more and more convinced that I can't
                | completely trust the (US) big boys for my own safety.
                | Trump showed that "US interest" is far more important
                | than "customer interest" or even "ally interest". And
                | moreover, Google is showing quite regularly that it's not
                | a reliable business partner (AWS looks better for this).
        
               | extrasolar wrote:
                | Price; also, elaborate hardware customizations are not
                | possible, and you are still running on a hypervisor vs
                | bare metal.
        
               | posix_me_less wrote:
                | > Google is showing quite regularly that it's not a
                | reliable business partner
               | 
               | Interesting, any examples?
        
               | sergiosgc wrote:
               | I'm not the OP, but I'd imagine it is the combination of
               | no support line and algorithmic suspension of business
               | accounts. It is a relevant risk.
        
               | olivierduval wrote:
               | Yeah, I was thinking about all the horror stories that
               | can be found on this site.
               | 
                | As a customer (or maybe an "involuntary data
                | provider"), I do as much as I can to avoid Google
                | being my SPOF, not technically (it's really reliable
                | technically) but on the
               | business side. I had to setup my own mail server just to
               | avoid any risk of google-ban for example... just in case.
                | I won't use Google Authenticator for the same reason.
               | I'm happy to have left Google Photos some years ago, to
               | avoid problems of Google shutting it down. And the list
               | could go on...
               | 
               | As a business, I like to program Android apps but the
               | Google Store is really a risk too. Risk to have any
               | Google account blacklisted because some algorithm thought
               | I did something wrong. And no appeal.
               | 
               | Maybe all this doesn't apply to GCP customers. Maybe GCP
               | customers have a human direct line, with someone to
               | really help and the capacity to do it. Or maybe it's just
                | Google: as long as it works, enjoy. If it doesn't, go to
               | (algorithmic) hell.
        
               | ev1 wrote:
               | OVH is one of the largest providers in the world. They
               | run a sub brand for personal use (bare metal for $5/m,
               | hardware replacements in 30 min or less usually).
               | 
               | ..and they do support all of those things you just
               | listed, not just API-backed bare metal.
        
               | treesknees wrote:
               | Is that a typo? I only see OVH bare metal starting at
               | >$50. How could a provider offer a bare metal server for
               | $5?
        
               | tecleandor wrote:
                | Kimsufi has tiny cheap Atom servers with self-built or
                | made-to-order racks and hardware:
               | 
               | (This is 2011, I think it looks fancier now)
               | 
               | https://lafibre.info/ovh-datacenter/data-center-ovh-
               | roubaix-...
               | 
               | Edit: Seems like they stopped publishing videos for that
                | datacenter, but this seems to be a video of the
                | burned-down datacenter in 2013:
               | https://www.youtube.com/watch?v=Y47RM9zylFY
        
               | ev1 wrote:
               | OVH has new servers.
               | 
               | Their sub-brand soyoustart has older servers (that are
               | still perfectly fine), roughly E3 Xeon/16-32GB/3x2TB to
               | 4x2TB for $40/m ex vat.
               | 
               | Their other sub brand kimsufi for personal servers has
               | Atom low-power bare metal with 2TB HDD (in reality it is
               | advertised 500GB/1TB, but they don't really have any of
               | those in stock left, if your drive fails they replace it
               | with a 2T - so far this has been my exp) for $5.
               | 
               | All of this is powered by automation, you don't really
               | get any support and you are expected to be competent. If
               | your server is hacked you get PXE-rebooted into a rescue
               | system and can scp/rsync off your contents before your
               | server is reinstalled. OS installs, reboots, provisioning
               | are all automated, there's essentially no human contact.
               | 
               | PS: Scaleway, in Paris, used to offer $2 bare metal
               | (ultra low voltage, weaker than an Atom, 2GB ram), but
               | pulled all their cheap machines, raised prices on
               | existing users, and rebranded as enterprisey. The offer
               | was called 'kidechire'
               | 
               | --
               | 
               | It is kind of interesting that on the US side everyone is
               | in disbelief, or like "why not use AWS" - while most of
               | the European market knows of OVH, Hetzner, etc.
               | 
               | My own reason for using OVH? It's affordable and I would
               | not have gotten many projects (and the gaming community I
               | help out with) off the ground otherwise. I can rent bare
               | metal with NVMe, and several terabytes of RAM for less
               | than my daily wage for the whole month, and not worry
               | about per-GB billing or attacks. In the gaming world you
               | generally do not ever want to use usage based billing -
               | made the mistake of using Cloudfront and S3 once and
                | banned script kiddies would wget-loop the largest
                | possible file from the most expensive region with a
                | botnet, repeatedly, in a money-DoS.
               | 
               | I legitimately wouldn't have been able to do my "for-fun-
               | and-learning" side projects (no funding, no accelerator
               | credits, ...) without someone like them. The equivalent
               | of a digitalocean $1000/m VM is about $100 on OVH.
        
               | iagovar wrote:
               | I like scalingo too. If you need a bit more, they have
               | DBaaS, APP Containers and Networking.
        
               | ev1 wrote:
               | Scalingo is EUR552.96/m for 16GB of memory.
               | 
               | 32c xeon/256GB ECC/500GB SSD 8TB HDD is $100/m at OVH.
               | The difference is amusing.
        
               | yannski wrote:
               | you're comparing a PaaS with a piece of hardware. It's
               | absolutely not comparable.
               | 
               | Yann, CEO at Scalingo
        
               | sudosysgen wrote:
               | It's not a typo. OVH runs Kimsufi, which has bare metal
               | servers for as low as 5$. It is pretty insane.
        
               | treesknees wrote:
               | Thank you. TIL!
        
               | veeti wrote:
               | AWS is a total and utter ripoff compared to the
               | price/performance, DDoS protection & unmetered bandwidth
               | provided by OVH.
        
               | api wrote:
               | If all you need is compute, storage, and a pipe, all the
               | big cloud providers are a total ripoff and you should
               | look elsewhere. The big ones only make sense if you are
               | leveraging their managed features or if you need extreme
               | elasticity with little chance of a problem scaling up in
               | real time.
               | 
               | OVH is one of the better deals for bare metal, but there
               | are even better ones for bandwidth. You have to shop
               | around a lot.
               | 
               | Also be sure you have a recovery plan... even with the
               | big providers. These days risks include not only physical
               | stuff but some stupid bot shutting you off because it
               | thinks you violated TOS or is reacting to a possibly
               | malicious complaint.
               | 
               | We had a bot at AWS gank some test systems once because
               | it thought we were cryptocurrency mining with free
               | credits. We weren't, but we were doing very CPU intensive
               | testing. I've heard of this and worse happening
               | elsewhere. DDOS detector and IDS bots are particularly
               | notorious.
        
               | extrasolar wrote:
               | no, OVH has dedicated servers, lot of big companies use
               | it to build out private clouds, much cheaper then amazon
               | or google
        
               | omnimus wrote:
               | You cant imagine it yet big chunk of the independent
               | internet runs on small vps servers. There isnt much
               | difference between DO and OVH, Hetzner, Vultr, Linode...
               | not sure why DO would be better. I mean its US company
               | doing marketing right. Thats the difference. Plus
               | ovh/hetzner have only EU locations.
               | 
                | I think small businesses like smaller, simpler
                | providers instead of big clouds. It's a different
                | philosophy; if you are afraid of extreme
                | centralisation of the internet, it makes sense.
        
               | Sanzig wrote:
               | OVH has at least one large North American datacenter in
               | Beauharnois, located just south of Montreal. I've used
               | them before for cheap dedicated servers. They may have
               | others.
        
               | omnimus wrote:
                | Yes, I didn't know and I was generalizing too much.
                | 
                | But I assume they are less known in the US.
        
               | eloff wrote:
               | I can think of a lot of big differences. For one you can
               | get much larger machines at OVH and Hetzner with fancy
               | storage configurations for your database if desired (e.g.
               | Optane for your indices, magnetic drives for your
               | transaction log, and raided SSDs for the tables)
               | 
               | They also don't charge for bandwidth, although some of
               | those other providers have a generous free bandwidth and
               | cheap overage.
        
               | omnimus wrote:
                | So you are saying they might be even better than DO
                | depending on requirements.
                | 
                | I didn't know.
        
               | eloff wrote:
               | Much cheaper and better performance at the high end.
               | Doesn't compete at all at the low-end, except through
               | their budget brand Kimsufi. I don't see them really as
               | targeting the same market.
        
               | sweeneyrod wrote:
               | I don't know about OVH but Hetzner beats DO at the lower
               | end: for $5/month you get 2 CPUs vs 1, 2 GB RAM vs 1, 40
               | GB disk vs 25 and 20 TB traffic vs 1. They have an even
               | lower-end package for 2.96 Euro/month as well.
        
               | ArchOversight wrote:
               | I rent a server from OVH for $32 a month. It's their So
               | You Start line... doesn't come with fancy enterprise
               | support and the like.
               | 
               | It's a 4 core 8 thread Xeon with 3x 1TB SATA with 32GB of
               | ECC RAM IIRC (E3-SAT-1-32, got it during a sale with a
               | price that is guaranteed as long as I keep renewing it)
               | 
               | The thing is great, I can run a bunch of VM's on it, it
               | runs my websites and email.
               | 
               | Overall to get something comparable elsewhere I would be
               | paying 3 to 4 times as much.
               | 
               | I would consider $50 a month or less low end pricing.
                | ¯\\_(ツ)_/¯
        
               | eloff wrote:
               | Yeah, I forgot they also have the so you start brand.
               | It's probably more expensive than the majority of what
               | digital ocean sells, but there is some overlap for sure.
        
               | mwcampbell wrote:
               | > Optane for your indices
               | 
               | At OVH? If so, their US data centers don't seem to have
               | that option.
               | 
               | Not that I need it. The largest database I run could
               | easily fit in RAM on a reasonably sized dedicated box.
        
               | eloff wrote:
               | I didn't realize they had US datacenters before now. It's
               | possible that's no longer an option. It was on the
               | largest servers in the Montreal datacenter when I specced
               | that out.
        
               | extrasolar wrote:
               | they have 2 data centers in the US
        
               | bleuarff wrote:
               | OVH is not Europe-only, it has datacenters in America,
               | Asia and Australia[1].
               | 
               | [1] https://www.ovh.com/world/us/about-us/datacenters.xml
        
               | sudosysgen wrote:
               | OVH has a location in Canada, now.
        
           | AmericanChopper wrote:
           | > I like knowing that I have a complete backup of my entire
           | business within arm's reach.
           | 
           | It could also provide a burglar a fantastic opportunity to
           | pivot into career in data breaches.
        
             | tgsovlerkhgsel wrote:
             | This problem is usually solved through encryption.
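As a concrete illustration of that point (assuming the third-party Python `cryptography` package, which the thread does not specify; commenters below mention LUKS for full-disk encryption instead): with authenticated symmetric encryption, a stolen backup disk holds only ciphertext.

```python
# Assumes `pip install cryptography`; Fernet is its authenticated
# symmetric-encryption recipe (AES-CBC + HMAC under the hood).
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # store this away from the backup media!
f = Fernet(key)

backup = b"customers.sql contents..."
token = f.encrypt(backup)     # this is what lands on the offsite disk

assert token != backup        # unreadable without the key
assert f.decrypt(token) == backup
```

The operational catch is key management: the key must survive the same disaster as the data, but must not travel with the disk, or the burglar scenario above is back.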
        
               | AmericanChopper wrote:
               | If I were to ask my CISO if I was allowed to bring the
               | production database home, I'm pretty sure his answer
               | wouldn't be "as long as you encrypt it".
        
               | brmgb wrote:
               | That's because he doesn't trust you with this data. That
               | has nothing to do with encryption safety.
               | 
               | There is nothing magical about data centers making them
               | safe while your local copy isn't.
        
               | AmericanChopper wrote:
               | > There is nothing magical about data centers making them
               | safe while your local copy isn't.
               | 
               | Is this a serious comment? My house is not certified as
               | being compliant with any security standards. Here's the
               | list that the 3rd party datacenter we use is certified as
                | compliant with:
               | 
               | https://aws.amazon.com/compliance/programs/
               | 
               | The data centers we operate ourselves are audited against
               | several of those standards too. I guess you're right that
               | there's nothing magic about security controls, but it has
               | nothing to do with trust. Sensitive data should generally
               | never leave a secure facility, outside of particularly
               | controlled circumstances.
        
               | brmgb wrote:
               | Of course it's serious.
               | 
                | You are entirely missing the point by quoting the
                | compliance programs followed by AWS, whose sole
                | business is being a third-party host.
               | 
               | For most business, what you call sensitive data is
               | customers and orders listing, payment history, inventory
               | if you are dealing in physical goods and HR related
               | files. These are not state secrets. Encryption and a
               | modicum of physical security go a long way.
               | 
               | I personally find the idea that you shouldn't store a
               | local backup of this kind of data out of security concern
               | entirely laughable. But that's me.
        
               | AmericanChopper wrote:
               | This is quite a significant revision to your previous
               | statement that there's nothing about a data center that
               | makes it more secure than your house.
               | 
               | This attitude that your data isn't very important, so
                | it's fine to not be very concerned about its security,
               | while not entirely uncommon, is something most
               | organisations try to avoid when choosing vendors. It's
               | something consumers are generally unconcerned about,
               | until a breach occurs, and The Intercept write an article
               | about it. At which point I'm sure all the people ITT who
               | are saying it's fine to take your production database
               | home would be piling on with how stupid the company was
               | for doing ridiculous things like taking a copy of their
               | production database home.
        
               | brmgb wrote:
               | > This is quite a significant revision to your previous
               | statement that there's nothing about a data center that
               | makes it more secure than your house.
               | 
               | I said there was nothing magical about data centers
               | security, a point I stand with.
               | 
               | It's all about proper storage (encryption) and physical
               | security. Obviously, the physical security of an AWS data
                | center will be tighter than at your typical SME, but in a way
               | which is of no significance to storing backups.
               | 
               | > This attitude that your data isn't very important
               | 
               | You are once again missing the point.
               | 
               | It's not that your data isn't important. It's that
               | storing it encrypted in a sensible place (and to be clear
               | by that I just mean not lying around - a drawer in an
               | office or your server room seems perfectly adequate to
               | me) is secure enough.
               | 
               | The benefits of having easily available backups by far
               | trump the utterly far fetched idea that someone might
               | break into your office to steal your encrypted backups.
        
               | logifail wrote:
               | > It's that storing it encrypted in a sensible place (and
               | to be clear by that I just mean not lying around - a
               | drawer in an office or your server room seems perfectly
               | adequate to me) is secure enough.
               | 
               | In the SME space some things are "different", and if
               | you've not worked there it can be hard to get one's head
               | around it:
               | 
               | A client of mine was burgled some years ago.
               | 
               | Typical small business, offices on an industrial estate
               | with no residential housing anywhere nearby. Busy in the
               | daytime, quiet as the grave during the night. The
               | attackers came in the wee small hours, broke through the
               | front door (the locks held, the door frame didn't), which
               | must have made quite a bit of noise. The alarm system was
               | faulty and didn't go off (later determined to be a 3rd
               | party alarm installer error...)
               | 
               | All internal doors were unlocked, PCs and laptops were
               | all in plain sight, servers in the "comms room" - that
               | wasn't locked either.
               | 
               | The attacker(s) made a cursory search at every desk, and
               | the _only_ thing that was taken _at all_ was a light
               | commercial vehicle which was parked at the side of the
               | property, its keys had been kept in the top drawer of one
               | of the desks.
               | 
               | The guy who looked after the vehicle - and who'd lost
               | "his" ride - was extremely cross, everyone else (from the
               | MD on downwards) felt like they'd dodged a bullet.
               | 
               | Physical security duly got budget thrown at it - stable
               | doors and horses, the way the world usually turns.
        
               | mikepurvis wrote:
               | But how many of those splashy breaches ended up being
               | because of the off-site backup copy of the database at
               | the CEO's house?
        
               | sneak wrote:
               | Once you're big enough to afford a CISO, you're likely
               | big enough to afford office space with decent physical
               | security to serve as a third replicated database site to
               | complement your two datacenters.
               | 
               | These solutions are not one-size-fits-all. What works for
               | a small startup isn't appropriate for a 100+ person
               | company.
        
               | AmericanChopper wrote:
               | Yes, I agree. Small companies typically are very bad at
               | security.
        
               | sneak wrote:
               | I am really good at security and I too keep encrypted
               | backups on-site in my house.
        
               | _joel wrote:
               | Not in my experience. Worked at some small shops that
               | were lightyears ahead in terms of policy, procedures and
               | attitude compared to places I've worked with 50k+
               | employees globally.
        
               | AmericanChopper wrote:
               | Large organisations tend not to achieve security
               | compliance with overly sophisticated systems of policy
               | and controls. They tend to do it using bureaucracy, which
               | while usually rather effective at implementing the level
               | of control required, will typically leave a lot to be
               | desired in regards to UX and productivity. Small
               | organisations tend to ignore the topic entirely until
               | they encounter a prospective client or regulatory barrier
               | that demands it. At which point they may initially
               | implement some highly elegant systems. Until they grow
               | large enough that they all devolve into bureaucratic
               | mazes.
        
               | _joel wrote:
                | I'm aware, but that's not been my experience. I've been
                | in large places with a laissez-faire attitude because it
                | was "another team's job", and general bikeshedding over
                | smaller features because big-picture security wasn't
                | their area, or a diktat from above forced X because
                | they're on the board, whilst X is completely unfit for
                | purpose. There's no pushback. However, I've worked at
                | small ISPs where we took security extremely seriously.
                | Appropriate background checks and industry policy, but
                | more so the attitude: we wanted to offer customers
                | security because we had pride in our work.
        
               | croon wrote:
               | Well, it's not because the encryption is insecure.
        
             | axaxs wrote:
             | LUKS is your friend.
        
             | Thorrez wrote:
             | But it does protect somewhat against ransomware on the
             | servers.
        
             | dredmorbius wrote:
             | For small firms, CEO / CTO maintaining off-sites at a
             | residence is reasonable and not an uncommon practice. As
             | with all security / risk mitigation practices, there is a
             | balance of risks and costs involved.
             | 
             | And as noted, encrypted backups would be resistant to
             | casual interdiction, or even strongly-motivated attempts.
              | Data loss being the principal risk mitigated by off-site,
             | on-hand backups.
        
             | baggy_trough wrote:
             | It all depends on your paranoia level of data hacking
             | burglars vs. vaporized data centers.
        
         | ethanpil wrote:
         | Does anyone have experience with lvarchive or FSArchiver (or
         | similar) to backup images of live systems instead of file based
         | backup solutions?
        
         | lgeorget wrote:
         | We have two servers at OVH (RBX and GRA, not SBG). I make
         | backups of all containers and VMs every day and keep the last
          | three, plus one each month. Backups are stored in a
         | separate OVH storage disk and also downloaded to a NAS on-
         | premise. In case of a disaster, we'd have to rent a new server,
         | reprovision the VMs and containers and restore the backups.
         | About two days of work to make sure everything works fine and
         | we could lose about 24 hours of data.
         | 
         | It's not the best in terms of Disaster Recovery Plan but we
         | accept that level of risk.
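The "keep the last three dailies, plus one per month" retention above can be sketched as a small shell script. This is a self-contained demo, not lgeorget's actual setup: the `backup-YYYY-MM-DD.tar.gz` naming convention and the temporary directory are assumptions.

```shell
# Demo of "keep last 3 dailies + first backup of each month" pruning.
# The naming convention backup-YYYY-MM-DD.tar.gz is an assumption.
BACKUP_DIR=$(mktemp -d)            # stand-in for the real backup disk
cd "$BACKUP_DIR" || exit 1
for d in 2021-01-01 2021-01-15 2021-02-01 2021-03-01 2021-03-05 \
         2021-03-09 2021-03-10; do
    touch "backup-$d.tar.gz"
done

ls backup-*.tar.gz | sort > all.lst
tail -n 3 all.lst > keep.lst                      # three newest dailies
# First entry per year+month (fields 2 and 3 when split on '-' and '.')
awk -F'[-.]' '!seen[$2 $3]++' all.lst >> keep.lst
sort -u keep.lst -o keep.lst
comm -23 all.lst keep.lst | xargs -r rm --        # drop the rest
rm -f all.lst keep.lst
ls backup-*.tar.gz    # 01-01, 02-01, 03-01, 03-05, 03-09, 03-10 survive
```

Only `backup-2021-01-15.tar.gz` is deleted here: it is neither among the three newest nor the first backup of its month.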
        
         | neurostimulant wrote:
          | Nothing too crazy, just a simple daily cron job to sync user
          | data and database dumps on our OVH boxes to Backblaze and
          | rsync.net. This simple setup has already saved our asses a
          | few times.
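A setup like that is often just a crontab fragment. The commands, paths, hosts and bucket names below are placeholders (assuming `pg_dump`, `rsync`, and the Backblaze `b2` CLI), not the commenter's actual config:

```
# m  h  dom mon dow  command
30 3  *   *   *   pg_dump -Fc myapp > /var/backups/myapp-$(date +\%F).dump
45 3  *   *   *   rsync -az /var/backups/ user@rsync.net:backups/
55 3  *   *   *   b2 sync /var/backups b2://example-bucket/backups
```

Note the `\%` escape: cron treats an unescaped `%` in the command field as a newline.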
        
         | jbeales wrote:
         | My recovery plan: tarball & upload to Object Store. I'm going
         | to check out exactly how much replication the OVH object store
         | offers, and see about adding a second geographic location, and
         | maybe even a second provider, tomorrow.
         | 
         | (My servers aren't in SBG either - phew!)
        
           | nucleardog wrote:
           | If your primary data is on OVH, I'd look at using another
           | company's object store if feasible (S3, B2, etc). If
           | possible, on another payment method. (If you want to be
           | really paranoid, something issued under another legal
           | entity.)
           | 
           | There's a whole class of (mostly non-technical) risks that
           | you solve for when you do this.
           | 
           | If anything happens with your payment method (fails and you
           | don't notice in time; all accounts frozen for investigation),
           | OVH account (hacked, suspended), OVH itself (sudden
           | bankruptcy?), etc, then at least you have _one_ other copy.
           | It's not stuff that's likely to happen, but the cost of
           | planning for it at least as far as "haven't completely lost
           | all my data even if it's going to be a pain to restore" here
           | is relatively minimal.
        
         | dfsegoat wrote:
         | We test rolling over the entire stack to another AWS DR region
          | (just one we don't normally use) from S3 backups, etc. We do
         | this annually and try to introduce some variations to the
         | scenarios. It takes us about 18 hours realistically.
         | 
         | Documentation / SOPs that have been tested thoroughly by
         | various team members are really important. It helps work out
         | any kinks in interpretation, syntax errors etc.
         | 
         | It does feel a little ridiculous at the time for all the effort
         | involved, but incidents like this show why it's so important.
        
         | juangacovas wrote:
         | Less than a day for disaster recovery on fresh hardware? Same
         | as my case. As you say, good enough for most purposes, but I'm
         | also looking for improvement. I have offsite realtime replicas
         | for data and mariaDBs, and offsite nightly backups (combo of
         | rsnapshot, lsyncd, mariaDB multi-source replication, and a
          | post-new-install script that sets up almost everything in case
         | you have to recover on bare-metal, i.e. no available VM
         | snapshots).
         | 
         | Currently trying to reduce that "less than a day" though.
         | Recently discovered "ReaR" (Relax and Recover) from RedHat and
         | sounds really nice for bare-metal servers. Not everybody runs
         | on virtualized/cloud (being able to recover from VM snapshots
          | is really a plus). Let's share experiences :)
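For the multi-source MariaDB replication piece, the shape is roughly the following, run on the offsite replica. Connection names, hosts and credentials are placeholders; this is MariaDB-specific syntax (named connections, `START ALL SLAVES`):

```sql
-- Each named connection follows one primary.
CHANGE MASTER 'web1' TO
  MASTER_HOST = 'web1.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'changeme',
  MASTER_USE_GTID = slave_pos;

CHANGE MASTER 'web2' TO
  MASTER_HOST = 'web2.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'changeme',
  MASTER_USE_GTID = slave_pos;

START ALL SLAVES;
SHOW ALL SLAVES STATUS\G
```

`MASTER_USE_GTID = slave_pos` lets the replica resume from its own recorded position after a restart, which matters when the replica is your disaster-recovery copy.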
        
         | KronisLV wrote:
         | Here's what i do for my homelab setup that has a few machines
         | running locally and some VPSes "in the cloud":
         | 
         | I personally have almost all of the software running in
         | containers with an orchestrator on top (Docker Swarm in my
         | case, others may also use Nomad, Kubernetes or something else).
         | That way, rescheduling services on different nodes becomes less
          | of a hassle in case of any one of them failing, since I know
          | what should be running and what configuration I expect it to
         | have, as well as what data needs to be persisted.
         | 
          | At the moment I'm using Time4VPS (affiliate link:
          | https://www.time4vps.com/?affid=5294 ) for the stuff that needs
          | decent availability and because they're cheaper than almost all
          | of the alternatives I've looked at (DigitalOcean, Vultr,
         | Scaleway, AWS, Azure) and that matters to me.
         | 
         | Now, in case the entire data centre disappears, all of my data
         | would still be available on a few HDDs under my desk (which are
          | then replicated to other HDDs with rsync locally), given that I
         | use BackupPC for incremental scheduled backups with rsync:
         | https://backuppc.github.io/backuppc/
         | 
         | For simplicity, the containers also use bind mounts, so all of
         | the data is readable directly from the file system, for
         | example, under /docker (not really following some of the *nix
         | file system layout practices, but this works for me because
          | it's really easy to tell where the data that I want is).
         | 
         | I actually had to migrate over to a new node a while back, took
         | around 30 minutes in total (updating DNS records included).
         | Ansible can also really help with configuring new nodes. I'm
         | not saying that my setup would work for most people or even
         | anything past startups, but it seems sufficient for my
         | homelab/VPS needs.
         | 
          | My conclusions:
          | 
          | - containers are pretty useful for reproducing software
          | across servers
          | - knowing exactly which data you want to preserve (such as
          | /var/lib/postgresql/data/pgdata) is also pretty useful, even
          | though a lot of software doesn't really play nicely with the
          | idea
          | - backups and incremental backups are pretty doable even
          | without relying on a particular platform's offerings;
          | BackupPC is more than competent and buying HDDs is far more
          | cost-effective than renting that space
          | - automatic failover (both DNS and moving the data to a new
          | node) seems complicated, as does using distributed file
          | systems; those are probably useful but far beyond what I
          | actually want to spend time on in my homelab
          | - you should still check your backups
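The bind-mount convention described above looks roughly like this in a compose/stack file. Service name, image tag and host paths are examples, not the commenter's actual files:

```yaml
version: "3.8"
services:
  db:
    image: postgres:13
    environment:
      # Keep the data in a subdirectory so the volume root stays clean
      PGDATA: /var/lib/postgresql/data/pgdata
    volumes:
      # Bind mount: everything worth backing up lives under /docker on
      # the host, so BackupPC/rsync can read it straight off the disk
      - /docker/db/pgdata:/var/lib/postgresql/data/pgdata
```

The trade-off versus named volumes is exactly the one described: the data is trivially visible to host-side backup tools, at the cost of tying the service to a specific host path.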
        
         | sparrish wrote:
         | Got burned once (no pun intended), learned my lesson.
         | 
         | Hot spare on a different continent with replicated data along
         | with a third box just for backups. The backup box gets offsite
         | backups held in a safe with another redundant copy in another
         | site in another safe.
         | 
         | Restores are tested quarterly.
         | 
         | Keep backups of backups. Once bitten, twice shy.
        
           | Twirrim wrote:
           | > Hot spare on a different continent
           | 
           | Just be cautious about data locality laws (not likely to
           | affect you as joe average, more for businesses)
        
             | Mauricebranagh wrote:
             | A few years ago I worked on the British Telecom Worldwide
             | intranet team and we had a matrix mapping various countries
             | encryption laws.
             | 
             | This was so we remained legal in all of the countries BT
             | worked in which required a lot of behind the scenes work to
              | make sure we didn't serve "illegally encrypted" data.
        
             | bongoman37 wrote:
              | Yeah, there are lots of countries with regulations that
              | certain data can't leave the geographical boundary of the
              | country. Often, it is the most sensitive data.
        
               | mike_d wrote:
               | These laws generally don't work how people think they do.
               | 
               | For example, the Russian data residency law states that a
               | copy of the data must be stored domestically, not that it
               | can't be replicated outside the country.
               | 
               | The UAE has poorly written laws that have different
               | regulations for different types of data - including fun
               | stuff like only being subject to specific requirements if
               | the data enters a 270 acre business park in Dubai.
               | 
               | Don't even get me started on storing encrypted data in
               | one country and the keys in another...
        
           | gameshot911 wrote:
           | Have you been bitten, personally? If so, story time?
        
           | pcl wrote:
           | > Restores are tested quarterly.
           | 
           | Probably this is the most important part of your plan. It's
           | not the backup that matters; it's the restore. And if you
           | don't practice it from time to time, it's probably not going
           | to work when you need it.
        
       | neo34 wrote:
       | "Everyone is safe. Fire has destroyed SBG2. A part of SBG1 is
       | destroyed. The firefighters are protecting SBG3. No impact on
       | SBG4". Tweet from Octave Klaba, founder of OVHcloud. "All our
       | clients on this site are possibly impacted"
        
       | cbg0 wrote:
       | This event reminded me of the fire at The Planet a while back:
       | https://www.datacenterknowledge.com/archives/2008/06/01/expl...
        
       | archit3cture wrote:
        | In case these were not posted before, here are pictures of SBG2
        | in flames taken by the firemen.
       | https://twitter.com/xgarreau/status/1369559995491172354
       | 
       | This puts an image on the sentence "SBG2 is destroyed". Do not
       | expect any recovery from SBG2.
        
         | benlumen wrote:
         | Holy hell. Are these "datacenters" really just shipping
         | containers? That's what it looks like.
        
           | switch007 wrote:
           | Aren't they a bargain-basement provider? You get what you pay
           | for I guess?
        
           | tantalor wrote:
           | https://en.wikipedia.org/wiki/Modular_data_center
        
             | quickthrower2 wrote:
             | Insert docker joke...
        
         | lucb1e wrote:
         | Status shows greens across the board
         | 
         | http://status.ovh.com/vms/index_sbg2.html
         | 
         | Am I looking at the wrong thing or am I right to wonder why we
         | still bother with public status pages if it never shows the
         | real status?
         | 
         | Edit: nvm just saw another comment pointing out the same
         | further down the thread (I randomly came across this page while
         | looking for the physical location of another DC)
        
       | helge9210 wrote:
        | Almost 11 years ago (March 27, 2010), the Ukrainian datacenter
        | of the company Hosting.ua went up in flames as clients watched
        | their systems go unresponsive, row after row across the
        | datacenter.
        | 
        | Anti-fire systems didn't kick in. The reason? For a couple of
        | days the system had been detecting a little bit of smoke from
        | one of the devices. Operators weren't able to pinpoint the
        | exact location, considered it a false alarm and manually
        | switched the anti-fire system off.
        
       | plasma wrote:
       | " Update 7:20am Fire is over. Firefighters continue to cool the
       | buildings with the water. We don't have the access to the site.
       | That is why SBG1, SBG3, SBG4 won't be restarted today."
       | 
       | https://mobile.twitter.com/olesovhcom/status/136953578757072...
        
         | rgj wrote:
         | With water, they said.
        
           | optimalsolver wrote:
           | Liquid cooling.
        
             | rgj wrote:
             | No, as in "firehose". https://mobile.twitter.com/abonin_DNA
             | /status/136953802824345...
        
       | lovedswain wrote:
       | > In the case of Roubaix 4, the Datacenter is made with a lot of
       | wood:
       | 
       | > Finally, we have other photos of the floor of the OVH "Roubaix
       | 4" tower. It is clearly wood! Hope it's fireproof wood! A wooden
       | datacenter ... is still original, we must admit.
       | 
       | > In France, data centers are mainly regulated by the labor code,
       | by ICPE recommendations (with authorization or declaration) and
       | by insurers. At the purely regulatory level, the only things that
       | are required are:
       | 
       | > - Mechanical or natural smoke extraction for blind premises or
       | those covering more than 300m2
       | 
       | > - The fire compartmentalization beyond a certain volume / m2
       | 
       | > - Emergency exits accessible with a certain width
       | 
       | > - Ventilation giving a minimum of fresh air per occupant
       | 
        | > - Firefighter access from the facade for premises whose last
        | level's floor is more than 8 meters up
       | 
       | > - 1 toilet for 10 people (occupying a position considered
       | "fixed")
       | 
       | https://lafibre.info/ovh-datacenter/ovh-et-la-protection-inc...
        
         | MayeulC wrote:
         | Ah, interesting pics and discussions (in French) in the
         | neighboring thread: https://lafibre.info/datacenter/incendie-
         | sur-un-site-ovh-a-s...
        
         | malobre wrote:
         | "1 chiotte" -> "1 toilet" not puppy
        
       | SilasX wrote:
       | Late to the party, but ... context? Everyone is talking like OVH
       | or the SBG2 buildings are well-known and common knowledge.
        
         | Symbiote wrote:
         | https://letmegooglethat.com/?q=OVH
        
       | walrus01 wrote:
       | Reminder to not only have backups, but also have some periodic
       | OFFLINE backups.
       | 
       | If your primary is set up with credentials to automatically
       | transfer a copy to the backup destination over the network, what
       | happens if your primary gets pwned and the access is used to
       | encrypt or delete the backup?
       | 
       | Secondly, test doing restores of your backups, and have
       | methods/procedures in place for exactly what a restore looks
       | like.
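An offline-backup round trip, including the restore drill the comment asks for, can be sketched in a few lines of shell. This is a self-contained demo: the paths are temporary stand-ins for real data and a real offline disk, and the naming scheme is an assumption.

```shell
# Sketch: archive, checksum, then actually restore and compare.
SRC=$(mktemp -d)/site; mkdir -p "$SRC"
echo "important data" > "$SRC/index.html"
DEST=$(mktemp -d)                      # pretend this is the offline disk
STAMP=$(date +%F)

# Backup: dated tarball plus a recorded checksum next to it.
tar czf "$DEST/site-$STAMP.tar.gz" -C "$(dirname "$SRC")" site
( cd "$DEST" && sha256sum "site-$STAMP.tar.gz" > "site-$STAMP.sha256" )

# Restore drill: verify integrity, unpack somewhere fresh, diff.
( cd "$DEST" && sha256sum -c --quiet "site-$STAMP.sha256" ) || exit 1
SCRATCH=$(mktemp -d)
tar xzf "$DEST/site-$STAMP.tar.gz" -C "$SCRATCH"
diff -r "$SRC" "$SCRATCH/site" && echo "restore OK"
```

The point is the second half: a backup you have never unpacked and diffed is a hope, not a backup. For true offline copies, $DEST would be a disk that is physically disconnected between runs.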
        
         | [deleted]
        
         | qeternity wrote:
         | > what happens if your primary gets pwned and the access is
         | used to encrypt or delete the backup?
         | 
         | Append-only permissions. We do this in S3 for that specific
         | reason. S3 lifecycle rules take care of pruning old backups.
         | 
         | You can also build a pull-based system where auth resides on
         | the backup system, not the production system.
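One way to get that append-only shape in S3 (bucket name is an example): give the backup-pushing credential an IAM policy that can write but not read, delete, or reconfigure, and rely on bucket versioning plus a lifecycle expiration rule for pruning. A minimal sketch of such a policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PushBackupsOnly",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::example-backup-bucket/*"
    }
  ]
}
```

With versioning enabled on the bucket, even an overwriting PutObject from a compromised host only adds a new version; the old data survives until the lifecycle rule expires it, and deleting it outright would need `s3:DeleteObject*` permissions this credential never holds.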
        
       | dbrgn wrote:
       | Wow, SBG3 seems to be OK: <<Update 11:20am: All servers in SBG3
       | are okey. They are off, but not impacted. We create a plan how to
       | restart them and connect to the network. no ETA. Now, we will
       | verify SBG1.>>
       | 
       | https://twitter.com/olesovhcom/status/1369592437585412097
        
       | molszanski wrote:
       | Is that a New Serverless Platform everyone is talking about
       | recently?
        
       | rctay89 wrote:
       | Puzzles and puzzle storm is down on Lichess:
       | 
       | > Due to a fire at one of our data centres, a few of our servers
       | are down and may be down permanently. We are restoring these
       | servers from backups and will enable puzzles and storm as soon as
        | possible.
        | 
        | > We hope that everyone who is dealing with the fire is safe,
        | including the firefighters and everyone at OVH. <3
        
         | gnulinux wrote:
         | I think playing "from position" is also broken? I was playing
         | chess with my friend and we usually play "from position" but it
         | wasn't working just now so we're playing standard instead. It
         | might be an unrelated bug.
        
         | ericra wrote:
         | I opened two tabs to relax tonight: Hacker News and Lichess.
         | This was the top HN thread, and Lichess is having issues
         | because of the fire.
         | 
         | I didn't know what OVH was before 10 minutes ago, but this
         | seems really impactful. I hope everyone there is safe and that
         | the immediate disaster gets resolved quickly.
        
           | sofixa wrote:
           | Look them up, they're one of the biggest hosting providers in
            | the world (especially in France), and due to cheap prices
            | are especially popular with smaller-scale stuff.
        
             | Macha wrote:
             | Yeah, they scale down to their kimsufi line which used to
             | have quite powerful dedicated servers for the price of
             | basic VPSes from other providers.
             | 
             | e.g. They have a 4core, 16gb ram server for $22/mo which is
             | 25% of what my preferred provider, Linode, charges.
             | 
             | Now, it comes with older consumer hardware (that one is a
             | sandy bridge i5), and about as much support as the price
             | tag suggests, as well as a dated management interface, but
             | when I used to run a modded minecraft server as a college
             | student, which needed excessive amounts of RAM and could be
             | easily upset by load spikes on other clients, then it was a
             | no-brainer, even if I would expect the modern-ish Xeons
             | Linode uses to win on a core for core basis.
        
               | ev1 wrote:
                | Dated? They're probably the only place without $comedy
                | "bare metal cloud" pricing that not only has an API for
                | their $5/mo servers, but also a panel that is a
                | reasonably modern SPA implementing that API, with OAuth
                | for login.
        
               | Macha wrote:
               | Has it been replaced since I last used it in ~2016? This
               | is not the interface I had to use at all.
               | 
               | This is the interface they had when I used them last:
               | 
               | https://www.youtube.com/watch?v=h5-J_DO_FS0
        
               | ev1 wrote:
               | Yeah this is not at all what you use now. It's an Angular
               | SPA.
               | 
                | It is still a mess of separate accounts, but you can use
                | your email address to log in instead of the randomly
                | generated numeric handle.
               | 
               | The OVHCloud US is completely separated for legal
               | purposes, from what I remember. No account sharing,
               | different staff, OVH EU cannot help you at all with US
               | accounts.
               | 
               | https://www.youtube.com/watch?v=I2G6TkKg0gQ
        
               | littlestymaar wrote:
               | Yes it changed. It was replaced by a more modern version
               | a few years ago (but the transition was painful, as not
               | everything was implemented in the new version when they
               | started to deploy it).
        
               | ev1 wrote:
               | for me, at times you'd not bother trying to remember your
               | NIC-handle and just curl the API instead out of laziness
        
       | 112233 wrote:
       | Photos and video from the scene:
       | 
       | https://www.dna.fr/faits-divers-justice/2021/03/10/strasbour...
        
       | betamaxthetape wrote:
       | This is certainly a good reminder to have regular backups. I have
       | (had?) a VPS in SBG1, the shipping-container-based data centre in
       | the Strasbourg site, and the latest I know is that out of the 12
       | containers, 8 are OK but 4 are destroyed [1]. Regardless, I
       | imagine it will be weeks / probably months before even the OK
       | servers can be brought back online.
       | 
       | Naturally, I didn't do regular backups. Most of the data I have
       | from a 2019 backup, but there's a metadata database that I don't
       | have a copy of, and will need to reconstruct from scratch.
       | Thankfully for my case reconstructing will be possible - I know
       | that's not the case for everyone.
       | 
       | Right now I'm feeling pretty stupid, but I only have myself to
       | blame. For their part, OVH have been really good at keeping
       | everyone updated (particularly the Founder and CEO, Octave
       | Klaba).
       | 
       | I believe that when I signed up back in 2017, the Strasbourg
       | location was advertised as one of the cheapest, so I can imagine
       | a lot of people with a ~$4 / month OVH VPS are in the same
       | situation, desperately scrambling to find a backup.
       | 
       | (For those that have a OVH VPS that's down right now, you can
       | find what location it is in by logging onto the OVH control
       | panel.)
       | 
       | [1] https://twitter.com/olesovhcom/status/1369598998441558016
        
       | [deleted]
        
       | pwned1 wrote:
       | To be honest, this gives me a little schadenfreude. OVH is the
       | most notorious host that refuses to act on abuse complaints for
       | phishing sites.
        
       | pmontra wrote:
       | Some pictures of the building with firefighters at work
       | 
       | https://www.dna.fr/faits-divers-justice/2021/03/10/strasbour...
       | 
       | Edit:
       | 
       | Video at https://www.youtube.com/watch?v=a9jL_THG58U
       | 
       | Satellite view of the site on Google Maps
       | https://goo.gl/maps/L2T6YNFCtiyDdiNv7
        
         | [deleted]
        
         | ilkkao wrote:
          | Easy to see now that a lightly constructed five-story cube
          | might not be fully fireproof.
        
           | Biganon wrote:
           | How do you see that?
        
           | mike_d wrote:
           | Would you humor us with a link to a fully fire proof
           | datacenter?
        
             | dividedbyzero wrote:
              | I think Microsoft had some experimental datacenter
              | containers (Project Natick) they submerged in the northern
              | Atlantic for passive cooling, and I believe those were
              | filled with an inert gas as well. I guess that would come
              | very close to an actual fireproof datacenter.
        
               | reasonabl_human wrote:
               | Yep you're right, learning about this was part of some
               | onboarding training we had to complete...
               | 
               | it was an interesting proof of concept, but finding the
               | right people to maintain the infra with both IT and Scuba
               | skills was a narrow niche to nail down ;)
        
               | potemkinhr wrote:
                | I don't think opening it at any point before
                | decommissioning it completely is even an afterthought
                | with that. They just
               | write off any failures and roll with it as long as it's
               | viable.
        
               | reasonabl_human wrote:
               | Yeah that was one of the show-stopping issues, inability
               | to repair / hotswap etc...
        
               | namibj wrote:
               | Well, you only need to get shared infrastructure reliable
               | enough that you can afford to not design it with repair
                | in mind. The cloud servers are already designed without
               | unit-level maintenance work in mind, which saves money by
               | eliminating rails and similar. They get populated racks
               | from the factory and just cart them from the dock to
               | their place, plug them in (maybe run some self-test to
               | make sure there are no loose connectors or so), and
               | eventually decommission them after some years.
        
             | verytrivial wrote:
             | From the outside (and from a position lacking all inside
             | knowledge) it looks highly interconnected and very well
              | ventilated. I'm not sure where you'd put an inert gas
              | suppression system or beefy firewalls to slow the fire
             | progress.
        
             | paco3346 wrote:
             | Newly constructed datacenters in the US tend to be all
             | metal with a full building clean suppression agent.
             | https://www.fike.com/products/ecaro-25-clean-agent-fire-
             | supp...
             | 
             | I used to work for a provider whose 2 main datacenters of
             | 8k+ sq ft could pull all oxygen out of the building in 60
             | seconds.
        
               | Twirrim wrote:
               | Data centres I used to work in back in the early 2000s
               | had argonite gas dumps in place (prior to argonite, halon
               | used to be popular but is an ozone depleting gas so was
               | phased out)
               | 
                | In the case of a fire, it would dump a lot of argonite
                | gas in and displace a large amount of the oxygen in the
                | room, depriving the fire of the oxygen it needs. It's
                | also safe and
               | leaves minimal clean-up work afterwards, doesn't harm
               | electronics etc. unlike sprinklers and the like.
               | 
               | The amount of oxygen left is sufficient for human life,
               | but not for fires, though my understanding is that it can
               | be quite unpleasant when it happens. You won't want to
               | hang around.
        
               | paco3346 wrote:
               | One of ours had a giant red button you could hold to
               | pause the 60 second timer before all the oxygen was
               | displaced. Every single engineer was trained to
               | immediately push that if people were in the room because
               | it was supposedly a guaranteed death if you got stuck
               | inside once the system went off.
        
               | namibj wrote:
               | Well, yeah, these normal inert gas fire suppression
               | systems don't do a good job if humans can still breathe.
               | The Novec 1230 based ones can actually be sufficiently
               | effective for typical flammability properties you can
               | cheaply adhere to in a datacenter, but even then you iirc
               | would want to add both that and some extra oxygen,
               | because the nitrogen in the air is much more effective at
               | suffocating humans than at suffocating fire. This stuff
               | is just a really, really heavy gas that's liquid below
               | about body temperature (boils easily though), and the
               | heat capacity of gasses is mostly proportional to their
               | density.
               | 
               | Flames are extinguished by this cooling effect (identical
               | to water in that regard), but humans rely on catalytic
               | processes that aren't affected by the cooling effect.
               | 
               | If you could keep the existing oxygen inside, while
               | adding Novec 1230, humans could continue to breathe while
               | the flames would still be extinguished, but this would
               | require the building/room to be a pressure chamber that
               | holds about half an atmosphere of extra pressure. I'm
               | pretty sure just blowing in some extra oxygen with the
               | Novec 1230 would be far cheaper to do safely and
               | reliably.
               | 
               | I mean, in principle, if you gave re-breathers to the
               | workers and have some airlocks, you could afford to keep
               | that atmosphere permanently, but it'd have to be a bit
               | warm (~30 C I'd guess). Don't worry, the air would be
               | breathable, but long-term it'd probably be unhealthy to
               | breathe in such high concentrations and humans breathing
               | would slightly pollute the atmosphere (CO2 can't stay if
               | it's supposed to remain breathable).
               | 
               | Just to be clear: in an effective argonite extinguishing
               | system you'd have about a minute or two until you pass
               | out and need to be dragged out, ideally get oxygen, get
               | ventilated (no brain, no breathing) and potentially also
               | be resuscitated (the heart stops shortly after your brain
               | from a lack of oxygen, so if you're ventilated fast
               | enough, it never stops and you wake up a few externally-
               | forced breaths later). Having an oxygen bottle to
               | supplement your breaths would fix that problem for as
               | long as it's not empty.
        
               | gpm wrote:
               | > I mean, in principle, if you gave re-breathers to the
               | workers and have some airlocks, you could afford to keep
               | that atmosphere permanently,
               | 
               | At this point I feel like it would be cheaper just to not
               | have workers go there. Fill the place completely full of
               | nitrogen with an onsite nitrogen generator (and only 1atm
               | pressure). Have 100% of regular maintenance and as much
               | irregular maintenance as possible be done by robots. If
               | something happens that requires strength/dexterity beyond
               | the robots (e.g. a heavy object falling over), either
               | have humans go in in some form of scuba gear, or if you
               | can work around it just don't fix it.
        
               | namibj wrote:
               | That seems reasonable. But just to clarify what I meant
               | with airlock: some thick plastic bag with a floor-to-
               | ceiling zipper on the "inner" and "outer" ends, and for
               | entry, it's first collapsed by a pump sucking the air out
               | of it. Then you open the zipper on the outer end, step
               | in, close the zipper, let the pump suck away the air
               | around you, and open the inner zipper (they should
               | probably be automatically operated, as you can't move
               | well/much when you are "vacuum bagged").
               | 
               | For exit, basically just the reverse, with the pump
               | pumping the air around the person to wherever the person
               | came from.
               | 
               | The general issue with unbreathable atmospheres is that a
               | failure in a worker's SCBA gear can easily kill them.
               | 
               | And re-breathers that are only there so you don't have to
               | scrub the atmosphere in the room as often shouldn't be
               | particularly expensive. You may even get away with just
               | putting a CO2 scrubber on the exhaust path, and giving
               | them slightly oxygen-enriched bottled air so you can keep
               | e.g. a 50:50 oxygen:nitrogen ratio inside (so e.g. 20%
               | O2, 20% N2, 60% Novec 1230). And it doesn't even need to
               | be particularly effective, as you can breathe in quite a
               | bit of the ambient air without being harmed, and the
               | environment can tolerate some of your CO2. Like, as long
               | as it scrubs half of your exhausted CO2 it won't even
               | feel stuffy in there (you could handle the ambient air
               | you'd have without re-breathers being used, as it'd be
               | just 1.6% CO2, but you'd almost immediately get a
               | headache).
               | 
               | They'd have an exhaust vent for pressure equalization,
               | which would chill the air to condense and re-cycle the
               | Novec 1230. For pressure equalization in the other
               | direction, they'd probably just boil off some of that
               | recycled Novec 1230.
               | 
               | So yeah, re-breather not needed, if you just get a
               | mouth+nose mask to breathe bottled 50:50 oxygen:nitrogen
               | mix. That 50% oxygen limit (actually 500 mBar) is due to
               | long-term toxicity, btw. Prolonged exposure to higher
               | levels causes lung scarring and myopia/retina detachment,
               | so not really fun.
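
The partial-pressure arithmetic behind that 500 mbar figure is simple to check. A quick sketch (editor's illustration; the 50:50 mix and standard sea-level pressure are assumptions taken from the comment above, not OVH specifics):

```python
# Sanity check on the ~500 mbar long-term oxygen-toxicity limit: partial
# pressure is just volume fraction times total pressure, so the proposed
# 50:50 O2:N2 breathing mix sits right at the limit at sea level.
# (Editor's illustration; values come from the comment, not a standard.)

ATM_MBAR = 1013.25   # standard sea-level pressure in millibar
O2_FRACTION = 0.50   # proposed 50:50 oxygen:nitrogen mix

pp_o2 = O2_FRACTION * ATM_MBAR
print(f"ppO2 = {pp_o2:.0f} mbar")  # ~507 mbar, at the stated limit
```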
        
         | pulkitanand wrote:
         | Cloud going up in smoke.
        
         | userbinator wrote:
         | That is a disturbingly large amount of flame. I was expecting a
         | datacenter to not have much in the way of flammable material.
         | 
         | ...then I read in some other comments here that they used
         | _wood_ in the interior construction.
        
           | Symbiote wrote:
           | Wood is a reasonable construction material.
           | 
           | It takes a good while to start burning, and even when
           | significantly charred it still retains most of its strength.
        
             | owenmarshall wrote:
             | Wood is a reasonable construction material for my house, or
             | an office - but is it for a building with that much energy
             | kept that close together?
        
               | joshuamorton wrote:
               | Treated lumber is generally considered fairly fireproof
               | (comparable to steel or concrete, though with different
               | failure modes). It depends on exactly what kind of wood
               | is used. A treated 12x12 beam is very fire resistant;
               | plywood is less so.
               | 
               | The issue is you'll have lots of plastic (cabling) in a
               | DC, and plastic will burn.
        
               | namibj wrote:
               | There is self-extinguishing cable insulation, though. I'm
               | actually surprised this (DC flammability) is still an
               | issue, and not already solved by making components self-
               | extinguishing and banning non-tiny batteries inside the
               | equipment. If you want a battery for your RAID
               | controller, put something next to it that will stop your
               | system from burning down its surroundings.
        
         | account42 wrote:
         | Your Google Maps link seems to be off by a bit.
        
       | Gengis wrote:
       | Hi,
       | 
       | Does anyone recommend a mail provider that implements IMAP
       | failover and replication across different sites ?
       | 
       | My mail account is hosted by OVH, it has been down for hours and
       | from what I read I may have to wait for another 1 or 2 days.
       | 
       | Thanks !
        
       | rgj wrote:
       | Video here. This will be Disaster Recovery 101 material.
       | 
       | https://mobile.twitter.com/abonin_DNA/status/136953802824345...
        
         | Rapzid wrote:
         | Or disaster prevention. I'm curious how their fire suppression
         | system failed so spectacularly..
        
           | rgj wrote:
           | I think they failed to install it.
           | 
           | They only mention smoke detection:
           | https://www.ovh.com/world/us/about-us/datacenters.xml
        
             | lorenzhs wrote:
             | In French but with pictures:
             | https://lafibre.info/ovh-datacenter/ovh-et-la-protection-inc...
             | - probably a different OVH data centre, but they clearly
             | have sprinklers there. The argument they made against gas
             | extinguishers is that by the time they have to use it, the
             | data is gone anyway, and it's only going to trigger in the
             | affected areas. It's also far safer for the people working
             | there.
        
               | Rapzid wrote:
               | That's a very interesting find! Google Translate seemed
               | to do a good enough job on it for me. 8 years ago, and
               | there are people in that thread ripping on what they see
               | in the photos.
        
         | raverbashing wrote:
         | Ouch, that's not pretty... but it seems that the fire was
         | constrained to 1 or 2 sectors (inside the building) - per their
         | updates
         | 
           | Not sure how good the building's fire suppression systems
           | were.
        
           | baggy_trough wrote:
           | Looks like a lot of those containers/compartments are melted.
        
       | worldofmatthew wrote:
       | https://twitter.com/olesovhcom/status/1369504527544705025
       | 
       | "Update 5:20pm. Everybody is safe. Fire destroyed SBG2. A part of
       | SBG1 is destroyed. Firefighters are protecting SBG3. no impact
       | SBG4."
        
         | [deleted]
        
         | akamaka wrote:
         | For some context:
         | 
         | "SBG1, the first Strasbourg data center, consisting of twelve
         | containers, came online in 2012. The 12 containers had a
         | capacity of 12,000 servers.
         | 
         | SBG2 is a non-container data center in 2016 using its "Tower"
         | design with a capacity of 30,000 servers.
         | 
         | SBG3 tower was built in 2017 with a capacity of 30,000 servers.
         | 
         | SBG4 was built in 2013 as several containers to augment
         | capacity, but was decommissioned in 2018 and moved to SBG3"
         | 
         | https://baxtel.com/data-center/ovh-strasbourg-campus
        
           | Scoundreller wrote:
           | and a 2017 (!) article where they were planning to remove 1
           | and 2 because of power issues!
           | 
           | > OVH to Disassemble Container Data Centers after Epic Outage
           | in Europe
           | 
           | > "This is probably the worst-case scenario that could have
           | happened to us."
           | 
           | > OVH [...] is planning to shut down and disassemble two of
           | the three data centers on its campus in Strasbourg, France,
           | following a power outage that brought down the entire campus
           | Friday, causing prolonged disruption to customer applications
           | that lasted throughout the day and well into the evening.
           | 
           | https://www.datacenterknowledge.com/uptime/ovh-disassemble-c...
        
             | Macha wrote:
             | It sounds like the plan was to shut down 1 and 4; the
             | latter happened and the former did not.
        
               | Scoundreller wrote:
               | It's confusing as they say they're shutting down 2 of the
               | 3, but then there's 4...
        
         | AdamJacobMuller wrote:
         | > Fire destroyed SBG2
         | 
         | This is crazy.
         | 
         | SBG2 was HUGE and if this isn't a translation error on the part
         | of Octave (which I could understand given the stress and ESL) I
         | have a hard time fathoming what kind of fire could destroy a
         | whole facility with nearly 1000 racks of equipment spread out
         | across separated halls.
         | 
         | I'm really hoping "destroyed" here means "we lost all power and
         | network core and there's smoke/fire/physical damage to SOME of
         | that"
         | 
         | I can't even fathom a worst-case scenario of a transformer
         | explosion (which does occur and I've seen the aftermath of)
         | having this big of an impact. Datacenters are built to contain
         | and mitigate these kinds of issues. Fire breaks, dry-pipe
         | sprinkler systems and fire-extinguishing gas systems are all
         | designed to prevent a fire from becoming large-scale.
         | 
         | Really glad nobody was hurt. OVH is gonna have a bad time
         | cleaning all this up.
        
           | vinay_ys wrote:
           | I wonder if they had batteries inside the containers. That
           | can make a bad situation even worse.
        
           | ikiris wrote:
           | This was effectively posted on the outages list as well by
           | someone trustworthy. The pictures also look pretty bad from
           | the outside.
        
             | exikyut wrote:
             | Which outages list? Sounds interesting.
        
               | AnssiH wrote:
               | I presume they mean this:
               | https://puck.nether.net/mailman/listinfo/outages
        
               | exikyut wrote:
               | Ah, thanks!
        
           | sofixa wrote:
           | OVH design their own datacenters, so it's possible that they
           | missed something or some system or another didn't work as
           | intended, thus the heavy damage.
        
             | rgj wrote:
             | They did not have a fire suppression system, only smoke
             | detection. So yeah, they missed something.
        
               | lorenzhs wrote:
               | There are photos of the fire suppression system in one of
               | their data centres in this (French) forum thread:
               | https://lafibre.info/ovh-datacenter/ovh-et-la-protection-inc...
               | They have sprinklers, with the reasoning being that the
               | that the burning racks' data is gone anyway if there's a
               | fire, and at least sprinklers don't accidentally kill the
               | technicians.
        
               | 35fbe7d3d5b9 wrote:
               | > They have sprinklers, with the reasoning being that the
               | burning racks' data is gone anyway if there's a fire
               | 
               | I think the real problem, per that post, is this:
               | 
               | >> They are simple sprinklers that spray with water. It
               | has nothing to do with very high pressure misting
               | systems, where water evaporates instantly, and which save
               | servers. Here, it's watering, and all the servers are
               | dead. It's designed like that. Astonishing, isn't it?
               | 
               | >> Obviously, they rely above all on humans to extinguish
               | a possible fire, unlike all conventional data centers.
               | 
               | (all thanks to Google Translate)
               | 
               | This strikes me as a _terrible safety system_ because
               | even if a human managed to detect the fire, they have to
               | make a big call: is the risk of flooding the facility and
               | destroying a ton of gear worth putting out a fire? By the
               | time the human decides "yes, it is", it may well be too
               | late for the sprinklers.
               | 
               | > and at least sprinklers don't accidentally kill the
               | technicians.
               | 
               | Not a real risk with modern 1:1 argon:nitrogen systems -
               | the goal is to pump in inert gases and reduce oxygen
               | content to around 13%, a point where the fire is
               | suppressed and people can survive. You wouldn't _want_ to
               | be in a room breathing 13% oxygen for a long time, but it
               | won't kill you.
               | 
               | All in all, it looks like this was a "normal accident"[1]
               | for a hosting company that aggressively competes on
               | price. The data center was therefore built with less
               | expensive safeties and carried a higher risk of
               | catastrophic failure.
               | 
               | [1]: https://en.wikipedia.org/wiki/Normal_Accidents
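
The parent comment's ~13% figure implies a concrete amount of inert gas. A back-of-envelope sketch (editor's illustration; the 20.9% ambient O2 and ideal constant-pressure mixing are assumptions, and this describes generic inert-gas flooding, not OVH's actual systems):

```python
# Rough estimate: what fraction of a room's volume must be displaced by
# an argon/nitrogen mix to dilute oxygen from ambient (20.9%) down to
# the ~13% level where most fires are suppressed but people can survive.
# Assumes ideal mixing at constant pressure; real systems add margin.

AMBIENT_O2 = 0.209  # O2 volume fraction in normal air
TARGET_O2 = 0.13    # inert-gas design point cited in the comment above

remaining_air = TARGET_O2 / AMBIENT_O2  # fraction of original air left
inert_fraction = 1.0 - remaining_air    # room volume taken by inert gas

print(f"remaining original air: {remaining_air:.1%}")   # ~62.2%
print(f"inert gas to inject:    {inert_fraction:.1%}")  # ~37.8%
```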
        
               | Faaak wrote:
               | Sprinklers are only present in OVH's Canada datacenter!
               | There are no sprinklers in the European ones.
        
               | mwcampbell wrote:
               | Given that they're headquartered in Europe and most
               | popular there, why is the satellite location better? Is
               | it because the Canadian data center is newer, because
               | Canada has stronger regulations in this area, or
               | something else? Also, does anyone know how the OVH US
               | data centers compare?
        
               | sofixa wrote:
               | According to a 2013 forum post by the now CEO of
               | Scaleway, a competitor of OVH, it's due to North American
               | building regulations that basically force you into
               | sprinklers and stuff for insurance reasons.
               | 
               | Source in French:
               | https://lafibre.info/ovh-datacenter/ovh-et-la-protection-inc...
        
               | kuschku wrote:
               | Which is understandable, as Halon based fire suppression
               | systems have been illegal for quite some time :/
        
               | weeeeelp wrote:
               | There are non-Halon systems, using agents such as FM-200.
               | Those are not as toxic and do not destroy the ozone
               | layer.
        
               | iagovar wrote:
               | Where did you get this?
        
               | rgj wrote:
               | https://www.ovh.com/world/us/about-us/datacenters.xml
        
               | draugadrotten wrote:
               | They say otherwise here: "The rooms are also equipped
               | with state of the art fire detection and extinction
               | systems."
               | 
               | https://www.ovh.com/world/solutions/centres-de-donnees.xml
               | 
               | and here
               | 
               | "Every data center room is fitted with a fire detection
               | and extinction system, as well as fire doors. OVHcloud
               | complies with the APSAD R4 rule for the installation of
               | mobile and portable fire extinguishers and has the N4
               | certificate of conformity for all our data centers."
               | https://us.ovhcloud.com/about/company/security
        
               | sofixa wrote:
               | That's for its North America DCs, not European ones.
        
               | AdamJacobMuller wrote:
               | > They did not have a fire suppression system
               | 
               | I find it very hard to believe that that would pass code
               | anywhere in the US/EU or most of the world. They may not
               | have had sprinklers but that doesn't mean there isn't
               | fire suppression.
        
               | sofixa wrote:
               | According to a forum post discussing the building of said
               | DC, with photos and such, there's no visible fire
               | suppression system:
               | 
               | https://lafibre.info/ovh-datacenter/ovh-et-la-protection-inc...
        
           | rgj wrote:
           | No, "destroyed" looks like this:
           | https://mobile.twitter.com/abonin_DNA/status/136953802824345...
           | 
           | EDIT: a picture from earlier this night:
           | https://pbs.twimg.com/media/EwGqV17XMAMF_wa?format=jpg&name=...
        
             | baq wrote:
             | I'd say destroyed is an adequate word, yes.
        
             | AdamJacobMuller wrote:
             | I'm really shocked, that's an incredible amount of damage.
             | Lots of people lost data.
        
           | baggy_trough wrote:
           | The pictures seem pretty clear that a lot of it is gone,
           | judging by the blackened holes in the walls with firehoses
           | being sprayed into.
        
           | hedora wrote:
           | Data centers frequently burn down, or are destroyed by
           | natural disaster.
           | 
           | These days, fire suppression systems need to be non-lethal,
           | so inert gases are out. Water is too, for obvious reasons.
           | Last I checked, they flooded the inside of the DC with
           | airborne powder that coats everything (especially the inside
           | of anything with a running fan). Once that deploys, the
           | machines in the data center are a write-off even if the fire
           | was minor.
        
             | nixgeek wrote:
             | Most datacenters I've worked with in the last 5 years use
             | water.
        
           | qbasic_forever wrote:
           | Just guessing, but maybe a fire suppression system going off
           | could wipe out all the machines?
           | 
           | The couple of datacenters I've been inside were small and
           | old, and used halon gas, which wasn't supposed to destroy
           | the machines. No idea how it works in big places these days.
        
             | atkbrah wrote:
             | A few years back there was an incident in Sweden where
             | noise coming from the gas based fire suppression system
             | going off destroyed hard drives [1].
             | 
             | 1. https://www.theregister.com/2018/04/26/decibels_destroy_disk...
        
               | yread wrote:
               | We had the same issue at a customer site. To add: since
               | the decibels were outside the rated environment, the
               | warranty on the hard disks was void and they had to be
               | replaced even if they were not destroyed.
        
               | abrookewood wrote:
               | I've also seen a weird video (lost to time
               | unfortunately), where someone showed that they could yell
               | at their servers in a data centre and introduce errors
               | (or something similar). Was very strange to see, but they
               | had a console up clearly showing that it was having an
               | impact.
        
               | regecks wrote:
               | https://www.youtube.com/watch?v=tDacjrSCeq4&fmt=18
        
               | lrem wrote:
               | Because of course it was Bryan :D
        
               | jeffrallen wrote:
               | Brian just giving his opinion on anything would give hard
               | drives errors... That guy loves to run his mouth (and I
               | love to listen).
        
               | mwcampbell wrote:
               | Actually, although the video is on Bryan Cantrill's
               | YouTube account, that was Brendan Gregg.
        
               | bcantrill wrote:
               | I was merely the videographer! For anyone who is curious,
               | I discussed the backstory of that video with Ben
               | Sigelman.[0]
               | 
               | [0] https://www.youtube.com/watch?v=_IYzD_NR0W4#t=28m46s
        
               | atkbrah wrote:
               | I actually planned to include a link to this video in my
               | original response but then left it out. Thanks for
               | posting!
        
               | gcbirzan wrote:
               | ING in Romania had the same issue:
               | https://www.datacenterdynamics.com/en/news/noise-from-fire-d...
        
           | dylan604 wrote:
           | If fire suppression kicks in, or the fire department shows
           | up with their hoses, would they still say the fire destroyed
           | it, or just say it was destroyed by fire and water damage?
           | 
           | Also, fire suppression systems do fail. There was an
           | infamous incident in LA involving one of the studios. They
           | built a warehouse as a tape vault, with tapes going back to
           | the 80s. A fire started, but the suppression system failed
           | because there was not enough pressure in the system. Total
           | loss. You've got to keep your safety equipment properly
           | tested!
        
             | lrem wrote:
             | There's a problem with testing sprinklers: engaging them
             | can be damaging to contents and even structures. So, we're
             | talking about completely emptying the facility, then taking
             | it offline to dry for a time. I've never heard about this
             | being done to anything that was already operational (but I
             | wasn't researching this either).
        
               | dylan604 wrote:
               | There are methodds of testing the water pressure in the
               | pipes without actually engaging the sprinkler heads. It
               | is part of the normal checks done during the
               | maintenence/inspection a business is supposed to have
               | done. In fact, one place I was in had sensors, and would
               | sound the actual fire alarm if the pressure fell below
               | tolerance at any time. The lack of pressure in the
               | Universal vault was 100% unexcusable.
        
               | ygra wrote:
               | Isn't something like Halon used in data centers for that
               | reason? That can probably be tested without damaging
               | infrastructure.
        
               | weeeeelp wrote:
               | Halon has been pretty much banned for years now; other
               | agents have been introduced. Sadly, an actual full test
               | of a gaseous extinguishing system (such as FM200 or Novec
               | 1230) can be prohibitively expensive (mainly the cost of
               | "reloading" the system with new gas). These are mostly
               | tested for correct pressure in the tanks and for working
               | detection electronics; an actual dump would be very
               | impractical (evacuation of personnel, ventilating the
               | room afterwards, etc.).
        
             | proggy wrote:
             | It wasn't just "tapes going back to the 80s." Those were
             | just the media Universal initially admitted to losing. No,
             | that building was the mother lode. It had film archives
             | going back over 50 years, and worst of all -- unreleased
             | audio tape masters for thousands upon thousands of
             | recording artists. The amount of "remastered" album
             | potential that fire destroyed is probably in the billions
             | of dollars, let alone the historical loss of all those
             | recordings by historical persons that will never be heard
             | due to a failure of a fire prevention system. Fascinating
             | case study in why you should never put all your eggs in one
             | basket.
             | 
             | https://en.m.wikipedia.org/wiki/2008_Universal_Studios_fire
        
               | AdamJacobMuller wrote:
               | This is why I'm such a fan of digitizing. If you have one
               | film master, you effectively have none. Do some 8K or 16K
               | scans of that master, manage the resulting data well, and
               | you're essentially immune to future loss in perpetuity.
               | 
               | Losing parts of history like that is tragic.
        
               | arp242 wrote:
               | In 1996 Surinam lost many of their government archives
               | after a fire burned the building down.
               | 
               | I can find surprisingly few English-language resources
               | for this (only Dutch ones); guess it's a combination of a
               | small country + before the internet really took off.
        
         | jen20 wrote:
         | An easy mistake (in the quoted tweet) to make given the
         | circumstances, but it probably should be 5:20am rather than pm.
        
       | rsync wrote:
       | This is a literal nightmare for me.
       | 
       | I can remember several San Diego fires that threatened the
       | original JohnCompanies datacenter[1] circa mid 2000s and thinking
       | about all of the assets and invested time and care that went into
       | every rack in the facility.
       | 
       | Very interested to read the post-mortem here ... even more
       | interested in any actionable takeaways from what is a very rare
       | event ...
       | 
       | [1] Castle Access, as it was known, at the intersection of Aero
       | Dr. and the 15 ... was later bought by Redit, then Kio ...
        
         | [deleted]
        
         | sebmellen wrote:
         | Man, 2006 was a freaky time to be in San Diego.
        
       | MrQuimico wrote:
       | I have a VPS in SBG1.
       | 
       | So far I haven't got any communication from OVH alerting me
       | about this. I think that's the first thing they should do:
       | alert their customers that something is happening.
       | 
       | Anyway, I was running a service that I was about to close, so
       | this may be it. I do have a recovery plan, but I don't know if
       | it is worth it at this point.
       | 
       | I'm never using OVH again. Fires can happen, but don't ask me
       | about my recovery plan; what about yours?
        
         | Symbiote wrote:
         | There's a big link from their homepage:
         | https://www.ovh.ie/news/press/cpl1786.fire-our-strasbourg-si...
         | 
         | It says they've alerted customers, but I expect some have been
         | missed, through inaccurate email records, email hosted on the
         | systems that have been destroyed etc.
        
       | jjjeii3 wrote:
       | White collar jobs are not paid well in the EU. If a saleswoman
       | from a nearby shop has the same salary, why should I care about
       | the quality of my work at all? The entire data center burned
       | down? Fine! There are a lot of other places with the same tiny
       | salary.
       | 
       | As you can see below, a developer expert is only making 24K-63K
       | Euro per year at OVH (in US dollars it's almost the same amount):
       | 
       | https://www.glassdoor.com/Salary/OVHcloud-France-Salaries-EI...
       | 
       | After paying taxes, you only keep about half of that amount.
        
       | ogaj wrote:
       | Wow. Weren't they just angling for an IPO? I wonder how much of
       | this was insured and what the impact is to their overall
       | operations.
        
         | moooo99 wrote:
         | I think they announced their IPO plans yesterday [1], which is
         | probably the worst timing one can have (if there even is a good
         | timing for a datacenter burning down; probably there isn't).
         | 
         | If they have good insurance, I'm confident this will have
         | little impact on their operations; I really hope they do. I
         | host a few components on OVH/SoYouStart dedicated servers,
         | luckily nothing mission critical, but I've still had a rather
         | good experience with them, especially in terms of price to
         | performance.
         | 
         | [1] https://www.reuters.com/article/amp/idUSKBN2B01FM
        
           | toyg wrote:
           | The publicity damage alone will be on par with (if not
           | bigger) their replacement costs. I wouldn't be surprised if
           | they had to rebrand.
           | 
           | The timing is so bad that it becomes almost suspicious. When
           | is the best time to sabotage your competitor? When they are
           | the most visible.
        
             | brmgb wrote:
             | > The publicity damage alone will be on par with (if not
             | bigger) their replacement costs. I wouldn't be surprised if
             | they had to rebrand.
             | 
             | Honest question: which publicity damage?
             | 
             | A fire in a datacenter is very much part of the things you
             | should expect to see happen when you operate a large number
             | of datacenters and will obviously cause some disruption to
             | your customers hosting physical servers there.
             | 
             | Provided the disruption doesn't significantly extend to
             | their cloud customers and doesn't affect people paying for
             | guaranteed availability (which it shouldn't - OVH operates
             | datacenters throughout the world), this seems to me to be
             | an unfortunate incident but not a business threatening one.
        
               | marcinzm wrote:
               | Most people I feel would expect fire suppression to kick
               | in and prevent the whole data center (and the adjacent
               | ones) from catching on fire. The fact that it didn't is
               | concerning regarding their operations since they build
               | their own custom data centers. The fire isn't the issue,
                | how much damage it did is the issue. So one can ask
                | whether there was a systematic set of planning mistakes
                | of which this is just the first to surface.
        
             | karambahh wrote:
             | SBG is a small part of their operations. I doubt it will
              | have a lasting impact on their image.
        
       | broknbottle wrote:
       | It's a Thermal Event, not a fire
        
       | Diederich wrote:
       | Back in the late 90s, I implemented the first systematic
       | monitoring of WalMart Store's global network, including all of
       | the store routers, hubs (not switches yet!) and 900 MHz access
       | points. Did you know that WalMart had some stores in Indonesia?
       | They did until 1998.
       | 
       | So when https://en.wikipedia.org/wiki/May_1998_riots_of_Indonesia
       | started happening, we heard some harrowing stories of US
       | employees being abducted, among other things.
       | 
       | Around that same time, the equipment in the Jakarta store started
       | sending high temperature alerts prior to going offline. Our NOC
       | wasn't able to reach anyone in the store.
       | 
       | The alerts were quite accurate: that was one of the many
       | buildings that had been burned down in the riots. I guess it's
       | slightly surprising that electrical power to the equipment in
       | question lasted long enough to allow temperature alerting. Most
       | of our stores back then used satellite for their permanent
       | network connection, so it's possible telecom died prior to the
       | fire reaching the UPC office.
       | 
       | In a couple of prominent places in the home office, there were
       | large cutouts of all of the countries WalMart was in at the time
       | up on the walls. A couple of weeks after this event, the
       | Indonesia one was taken down over the weekend and the others re-
       | arranged.
        
         | avereveard wrote:
         | ovh monitoring is even better, everything is green on the
         | destroyed dc http://status.ovh.com/vms/index_sbg2.html
        
         | 1MachineElf wrote:
         | Thanks for sharing this interesting story. Part of my family
         | immigrated from Indonesia due to those riots, but I was unaware
         | up until today of the details covered by the Wikipedia article
         | you linked.
         | 
         | I remember during the 2000s and 2010s that WalMart in the USA
          | earned a reputation for its inventories primarily consisting
         | of Chinese-made goods. I'm not sure if that reputation goes all
         | the way back to 1998, but it makes me wonder if WalMart was
         | especially targeted by the anti-Chinese element of the
          | Indonesian riots because of it.
        
           | Diederich wrote:
           | I can't recall (and probably didn't know at the time...it was
           | far from my area) where products were sourced for the
           | Indonesia stores.
           | 
           | Prior to the early 2000s, WalMart had a strong 'buy American'
           | push. It was even in their advertising at the time, and
           | literally written on the walls at the home office in
           | Bentonville.
           | 
           | Realities changed, though, as whole classes of products were
           | more frequently simply not available from the United States,
           | and that policy and advertising approach were quietly
           | dropped.
           | 
           | Just for the hell of it, I did a quick youtube search:
           | "walmart buy american advertisement" and this came up:
           | https://www.youtube.com/watch?v=XG-GqDeLfI4 "Buy American -
           | Walmart Ad". Description says it's from the 1980s, and that
           | looks about right.
        
           | Diederich wrote:
           | What the hell, here's another story. The summary to catch
           | your attention: in the early 2000s, I first became aware of
           | WalMart's full scale switch to product sourcing from China by
           | noting some very unusual automated network to site mappings.
           | 
           | Part of what my team (Network Management) did was write code
           | and tools to automate all of the various things that needed
           | to be done with networking gear. A big piece of that was
           | automatically discovering the network. Prior to our auto
           | discovery work, there was no good data source for or
           | inventory of the routers, hubs, switches, cache engines,
           | access points, load balancers, VOIP controllers...you name
           | it.
           | 
           | On the surface, it seems scandalous that we didn't know what
           | was on our own network, but in reality, short of
           | comprehensive and accurate auto discovery, there was no way
           | to keep track of everything, for a number of reasons.
           | 
           | First was the staggering scope: when I left the team, there
           | were 180,000 network devices handling the traffic for tens of
           | millions of end nodes across nearly 5,000 stores, hundreds of
           | distribution centers and hundreds of home office
           | sites/buildings in well over a dozen countries. The main US
           | Home Office in Bentonville, Arkansas was responsible for
           | managing all of this gear, even as many of the international
           | home offices were responsible for buying and scheduling the
           | installation of the same gear.
           | 
           | At any given time, there were a dozen store network equipment
           | rollouts ongoing, where a 'rollout' is having people visit
           | some large percentage of stores intending to make some kind
           | of physical change: installing new hardware, removing old
           | equipment, adding cards to existing gear, etc.
           | 
           | If store 1234 in Lexington, Kentucky (I remember because it
           | was my favorite unofficial 'test' store :) was to get some
           | new switches installed, we would probably not know what day
           | or time the tech to do the work was going to arrive.
           | 
           | ANYway...all that adds up to thousands of people coming in
           | and messing with our physical network, at all hours of the
           | day and night, all over the world, constantly.
           | 
           | Robust and automated discovery of the network was a must, and
           | my team implemented that. The raw network discovery tool was
           | called Drake, named after this guy:
           | https://en.wikipedia.org/wiki/Francis_Drake and the tool that
           | used many automatic and manual rules and heuristics to map
           | the discovered networking devices to logical sites (ie, Store
           | 1234, US) was called Atlas, named after this guy:
           | https://en.wikipedia.org/wiki/Atlas_(mythology)
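            | 
            | The recursive walk Drake did can be sketched as a breadth-
            | first search over device neighbours. A toy illustration
            | (in Python rather than the Perl the real tools were
            | written in, with a made-up topology and a hypothetical
            | neighbors() callback):

```python
from collections import deque

def discover(seed, neighbors):
    """Breadth-first discovery of a network from a seed device.

    neighbors(device) returns the devices adjacent to device; a
    real tool would read these from CDP/SNMP neighbour tables."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        device = queue.popleft()
        for peer in neighbors(device):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

# Toy topology: a core router, two store switches, one access point.
topology = {
    "core-rtr": ["store1234-sw1", "store1234-sw2"],
    "store1234-sw1": ["core-rtr", "store1234-ap1"],
    "store1234-sw2": ["core-rtr"],
    "store1234-ap1": ["store1234-sw1"],
}
found = discover("core-rtr", lambda d: topology.get(d, []))
print(sorted(found))  # every device reachable from the core
```

            | 
            | Seeded from a few known core devices, a walk like this
            | converges on everything reachable, which is what made a
            | manual inventory unnecessary.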
           | 
           | All of that background aside, the interesting story.
           | 
           | In the late 90s and early 2000s, Drake and Atlas were doing
           | their thing, generally quite well and with only a fairly
           | small amount of care and feeding required. I was snooping
           | around and noticed that a particular site of type
           | International Home Office had grown enormously over the
           | course of a few years. When I looked, it had hundreds of
           | network devices and tens of thousands of nodes. This was
           | around 2001 or 2002, and at that time, I knew that only US
           | Home Office sites should have that many devices, and thought
           | it likely that Atlas had a 'leak'. That is, as Atlas did its
           | recursive site mapping work, sometimes the recursion would
           | expand much further than it should, and incorrectly map
           | things.
           | 
           | After looking at the data, it all seemed fine. So I made some
           | inquiries, and lo and behold, that particular international
           | home office site had indeed been growing explosively.
           | 
           | The site's mapped name was completely unfamiliar to me, at
           | the time at least. You might have heard of it:
           | https://en.wikipedia.org/wiki/Shenzhen
           | 
           | I was seeing fingerprints in our network of WalMart's whole
           | scale switch to sourcing from China.
        
             | hobojones wrote:
             | In the early 2000s I was working as a field engineer
             | installing/replacing/fixing network equipment for Walmart
             | at all hours. It's pretty neat to hear the other side of
             | the process! If I remember correctly there was some policy
             | that would automatically turn off switch ports that found
             | new, unrecognized devices active on the network for an
             | extended period of time, which meant store managers
             | complaining to me about voip phones that didn't function
             | when moved or replaced.
        
               | Diederich wrote:
               | Ah neat, so you were an NCR tech! (I peeked at your
               | comment history a bit.) My team and broader department
               | spent a lot of hours working with, sometimes not in the
               | most friendly terms, people at different levels in the
               | NCR organization.
               | 
               | You're correct, if Drake (the always running discovery
               | engine) didn't detect a device on a given port over a
               | long enough time, then another program would shut that
               | port down. This was nominally done for PCI compliance,
                | but of course having open, unused ports, especially in
               | the field is just a terrible security hole in general.
               | 
               | In order to support legit equipment moves, we created a
               | number of tools that the NOC and I believe Field Support
               | could use to re-open ports as needed. I _think_ we
               | eventually made something that authorized in-store people
               | could use too.
               | 
               | As an aside, a port being operationally 'up' wasn't by
                | itself sufficient for us to mark the port as being
               | legitimately used. We had to see traffic coming from it
               | as well.
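                | 
                | The policy can be sketched as: shut a port only when
                | discovery has seen neither a device nor traffic on it
                | for long enough. A toy sketch with made-up port names
                | and threshold (the real tooling and thresholds
                | differed):

```python
import time

STALE_AFTER = 30 * 24 * 3600  # hypothetical threshold: 30 days

def ports_to_disable(last_activity, now=None):
    """Return ports whose last observed activity is older than the
    threshold. last_activity maps port name -> unix timestamp of
    the last time a device *and* traffic were seen on that port."""
    now = time.time() if now is None else now
    return sorted(port for port, ts in last_activity.items()
                  if now - ts > STALE_AFTER)

now = 100 * 24 * 3600
activity = {
    "Gi0/1": now - 2 * 24 * 3600,   # recently active: leave open
    "Gi0/2": now - 45 * 24 * 3600,  # long idle: shut down
    "Gi0/3": now - 31 * 24 * 3600,  # just past threshold: shut down
}
print(ports_to_disable(activity, now=now))  # → ['Gi0/2', 'Gi0/3']
```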
               | 
               | You mentioned elsewhere that you're working with a big,
               | legacy Perl application, porting it to Python. 99% of the
               | software my team at WalMart built was in Perl. (: I'd be
               | curious to know, if you can share, what company/product
               | you were working on.
        
             | 1MachineElf wrote:
             | Epic story! Thank you for sharing it. I appreciate the
             | detail you included there.
        
               | Diederich wrote:
               | You're quite welcome. In my experience, the right details
               | often make a story far more interesting.
        
       | [deleted]
        
       | jeffrallen wrote:
       | Hugops to OVH folks, hang in there, you're my favorite European
       | data center operator.
        
       | lovedswain wrote:
       | He posted an update, seems SBG2 is totally destroyed.
       | 
       | Ouch https://twitter.com/Onepamopa/status/1369484420982407173
        
         | meepmorp wrote:
         | > DID ANYTHING SURVIVE? ANYTHING AT ALL?????? OUR DATA IS IN
         | SBG2. WHAT DO WE DO NOW ?!?!
         | 
         | Double check those backups, folks.
        
           | mhh__ wrote:
           | One of the things I'm still extremely grateful for is that I
           | learnt the basics of computer science from an ex-oracle guy
           | turned secondary school teacher, who wasn't the best
           | programmer let's say but who absolutely drilled into us (I
           | was probably the only one listening but still) the importance
           | of code quality, backups, information security etc.
           | 
           | Nothing fancy, but it's the kind of bread and butter
           | intuition you need to avoid walking straight off a cliff.
           | 
           | He also let me sit at the back writing a compiler instead of
           | learning VB.Net, top dude
        
           | ikiris wrote:
           | what do we do? Well for starters don't rely on a single
           | geographic location _chuckles_
        
           | monkeybutton wrote:
           | Trust but verify. As a developer it doesn't matter what
           | sysadmins or anyone else says about backups of your data; if
           | you haven't run your DR plan and verified the results, it
           | doesn't exist.
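          | 
          | "Verified" can start as simply as checksumming a test
          | restore against the source. A minimal sketch (illustrative
          | only; a real DR drill also verifies that services come back
          | up, not just that files match):

```python
import hashlib
from pathlib import Path

def sha256(path):
    """Stream a file through SHA-256 so large backups fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(source_dir, restored_dir):
    """Compare every file under source_dir against the restored tree;
    return the relative paths that are missing or mismatched."""
    src, dst = Path(source_dir), Path(restored_dir)
    bad = []
    for f in src.rglob("*"):
        if f.is_file():
            rel = f.relative_to(src)
            r = dst / rel
            if not r.is_file() or sha256(f) != sha256(r):
                bad.append(str(rel))
    return sorted(bad)
```

          | 
          | An empty list from verify_restore() is the "it exists"
          | evidence; anything else is a backup you only thought you
          | had.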
        
             | mhh__ wrote:
             | > Trust but verify.
             | 
             | If it's business critical should you even trust at all?
        
               | casi wrote:
               | yeah backups are a case of "don't trust, verify"
        
       | londons_explore wrote:
       | Did OVH claim that SBG1 and SBG2 were isolated failure domains?
       | Despite them just being different rooms in the same building?
        
         | ev1 wrote:
         | They are different physical buildings. OVH does not generally
         | claim anything regarding AZ or distance. SBG1, 2, 3, etc is
         | just denoting the building your server is in - they are not
         | like AWS style AZ or similar, quite literally just building
         | addresses.
         | 
         | I have used them for years and I don't believe they've ever
         | said anything like deploy in both SBG1 and SBG2 for safety or
         | availability, because you don't get that choice.
         | 
         | When you provision a machine (eg via API) they tell you "SBG 5
         | min, LON 5 min, BHS 72h" and you pick SBG and get assigned
         | first-available. There is no "I want to be in SBG4" generally.
        
           | kuschku wrote:
           | In fact, OVH themselves host e.g. backups of SBG in RBX.
        
         | kuschku wrote:
         | They're separate buildings, with separate power systems, just
          | standing next to one another. Next to SBG2 are also SBG3 and
          | SBG4.
        
           | londons_explore wrote:
           | Those buildings weren't quite as far apart as they should
           | have been if a fire in one requires all 4 of the others to be
           | turned off...
        
             | Leo_Verto wrote:
             | Not disagreeing with you but the firefighters probably shut
             | down all power onsite as soon as they arrived.
        
             | jiofih wrote:
              | Apparently they are more like (temporary) extensions of the
             | main building than separate DCs.
        
       | rarefied_tomato wrote:
       | Was the building made from stacked shipping containers?
       | Containers are such a budget-friendly and trendy structural
       | building block these days. They even click with the software
       | engineers - "Hey, it's like Docker".
       | 
       | Containers would seem to be at a disadvantage when it comes to
       | dissipating, rather than containing, heat. I hope improved
       | thermal management and fire suppression designs can be
       | implemented.
        
         | qeternity wrote:
         | OVH use a custom water cooling solution they claim enables
         | these niche designs and increases rack density.
         | 
         | As another comment says, that density probably exacerbated the
         | issue here.
        
         | AnssiH wrote:
         | I could be wrong, but I understood from other comments that
         | only SBG1 used containers.
        
       | bdz wrote:
       | Rust (the video game) lost all EU server data w/o restore
       | https://twitter.com/playrust/status/1369611688539009025
        
         | chrisandchris wrote:
         | Someone forgot the 3-2-1 rule for backups.
         | 
          | I don't get why people don't do offsite backups today, as
          | they're practically free. AWS Glacier costs almost nothing.
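          | 
          | For reference, 3-2-1 means: three copies of the data, on
          | two different media, at least one of them offsite. A toy
          | checker (media names are illustrative):

```python
def satisfies_3_2_1(copies):
    """Check a backup plan against the 3-2-1 rule: at least 3
    copies, on at least 2 different media types, with at least 1
    copy offsite. Each copy is a (media, is_offsite) tuple."""
    media = {m for m, _ in copies}
    offsite = [o for _, o in copies if o]
    return len(copies) >= 3 and len(media) >= 2 and len(offsite) >= 1

# Live data on a server disk, a local NAS copy, and an offsite
# object-storage copy (e.g. Glacier).
plan = [("disk", False), ("nas", False), ("object-storage", True)]
print(satisfies_3_2_1(plan))  # → True
```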
        
         | rplnt wrote:
         | Now that is weird. I fully understand not having a backup
         | service in place, but no data backup either?
        
           | Voloskaya wrote:
           | TBF "data" here means the state of the gameworld which anyway
           | resets entirely every 2 weeks or every month, so it's not
              | exactly a big deal; everyone constantly starts over in Rust.
        
             | rplnt wrote:
             | Ah, I see. That makes sense then. I thought it's an
             | everlasting game world, thanks for clarification.
        
       | phtrivier wrote:
       | Seems like data.gouv.fr [1], the government platform for open
       | data, is impacted; we might not get the nice COVID-19 graphs from
       | non-governmental sites ([2], [3]) today.
       | 
       | I can't wait for the conspiracy theories about how the fire is a
       | "cover up" to "hide" bad COVID-19 numbers...
       | 
       | [1]
       | https://twitter.com/GuillaumeRozier/status/13695724905996902...
       | 
       | [2] https://covidtracker.fr/
       | 
       | [3] https://www.meteo-covid.com/trouillocarte (Just wanted to
       | share the "trouillocarte" - which roughly translates to "'how
       | badly is shit hitting the fan today' map" ;) )
        
         | Shadonototro wrote:
         | it has nothing to do with covid:
         | 
         | - https://www.usine-digitale.fr/article/le-health-data-hub-
         | heb...
         | 
         | - https://www.genethique.org/la-cnam-refuse-le-transfert-
         | globa...
         | 
         | - https://france3-regions.francetvinfo.fr/bretagne/donnees-
         | med...
         | 
          | Here is a more plausible conspiracy theory: USA/Microsoft is
         | behind the cyberattacks and the fire
        
       | brainzap wrote:
       | my VPS is gone, hmm
        
       | benlumen wrote:
       | Someone told me that there were _a lot_ of warez sites hosted
       | with OVH.
       | 
       | Anyone know of any casualties?
        
       | [deleted]
        
       | pm90 wrote:
       | Unfortunately, a lot of people are going to find out the hard way
       | today why AWS/GCP/Big Expensive Cloud is so expensive (Hint: they
       | have redundancy and failover procedures which drive up costs).
       | 
       | Keep in mind I'm not talking about "downtime" but about actual
       | data loss, which might affect business continuity.
       | 
       | This is really tragic. I'm hoping they have some kind of multi
       | regional backup/replication and not just multi zones (although
       | from the twitters it appears that only one of the zones was
       | destroyed; however, the others don't seem to be operational atm).
        
         | the_duke wrote:
         | I encourage you to have a look at the operating income that AWS
         | rakes in.
         | 
         | Sure, the amount of expertise, redundancy and breadth of
         | service offerings they provide is worth a markup, but they are
         | also significantly more expensive than they need to be.
         | 
         | Thanks to being the leader in an oligopoly, and due to patterns
         | like making network egress unjustifiably expensive to keep you
         | (/your data) from leaving.
        
           | pm90 wrote:
           | I think the question here, then is of subjective value.
           | 
           | AWS may charge more for egress, but that's not high enough
           | for it to be a concern for most clients.
           | 
           | A bigger, independent concern is probably that there should
           | be sufficient redundancy, backups and such that allows for
           | business continuity. (Note again that I'm not saying that all
           | companies make full use of these features, but those that
           | care for such things do. Additionally, I've honestly never
           | heard of an AWS DC burning down. Either it doesn't happen
           | frequently or it doesn't have enough of an effect on regular
              | customers; both situations are equivalent for my case).
           | 
           | Most businesses choose to prioritize the second aspect. Even
           | if they have to pay extra for egress sometimes, it's just not
           | big enough of a concern as compared to businesses continuity.
        
             | hhw wrote:
             | I've never heard of any data centre burning down (and I
             | work in this industry), so never hearing of an AWS DC
             | burning down isn't really saying anything about AWS.
        
               | ev1 wrote:
               | I remember hosting.ua DC mysteriously "catch fire"
        
             | opsunit wrote:
             | An availability zone (AZ) in AWS eu-west-2 was flooded by a
             | fire protection system going off within the last year. It
             | absolutely did affect workloads in that AZ. That shouldn't
              | have had a large impact on their customers, since AWS
              | promote multi-AZ architectures and make them about as
              | easy to adopt as is viable.
             | 
              | Put another way: one is guided towards making good
              | operational choices rather than being left to discover them
             | yourself. This is a value proposition of public clouds
             | since it commoditises that specialist knowledge.
        
               | tomwojcik wrote:
               | Hm, I can't find anything in google about this flooding
               | incident. Can you share some details / source?
        
               | CodesInChaos wrote:
               | What surprised me most about today's fire is that their
               | datacenters have so little physical separation. I
               | expected them to be far enough apart to act as separate
               | availability zones.
        
         | x3sphere wrote:
         | If the data is that critical, surely you would be backing it up
          | frequently and also mirroring it on at least one geographically
         | separate server?
         | 
         | I use a single server at OVH, and I'm not in the affected DC,
         | but if this DID happen to me I could get back up and running
         | fairly quickly. All our data is mirrored on S3 and off site
         | backups are made frequently enough it wouldn't be an issue.
         | 
         | Plus, you still need to plan for a scenario like this even with
         | AWS or any other cloud provider. It is less likely to happen
         | with those, given the redundancy, but there is still a chance
         | you lose it all without a backup plan.
        
         | zelly wrote:
         | Yup, I've never heard of a fire taking out a Big Cloud DC. They
         | actually know what they're doing and don't put server racks in
         | shipping containers stacked on top of each other. If you want
         | quality in life, sometimes you have to pay for it.
         | 
         | Personally I'll continue to use these third world cloud
         | providers. But I like to live on the edge.
        
           | jiofih wrote:
           | Apple fire: https://www.datacenterdynamics.com/en/news/fire-
           | rages-throug...
           | 
           | Google fire: https://www.google.nl/amp/s/gigazine.net/amp/en/
           | 20060313_goo...
           | 
           | AWS fire: https://money.cnn.com/2015/01/09/technology/amazon-
           | data-cent...
        
         | hashhar wrote:
          | AWS and GCP are also prone to the same kind of data loss if the
          | AZ you are operating in goes down.
         | you are operating in goes down.
         | 
         | They don't automatically geo-replicate things. You still need a
         | backup for the torched EC2 instance to be able to relaunch in
         | another AZ/region.
        
           | nnx wrote:
            | That's true, but it seems the whole SBG region for OVH is
            | within the same disaster radius for one fire... with SBG2
           | destroyed and SBG1 partly damaged.
           | 
           | "The whole site has been isolated, which impacts all our
           | services on SBG1, SBG2, SBG3 and SBG4. "
           | 
           | Wonder if those SBGx were advertised as being the same as
           | "Availability Zones" - when other cloud providers ensure
           | zones are distanced enough from each other (~1km at least) to
           | likely survive events such as fire.
        
             | brmgb wrote:
             | > but it seems whole of SBG region for OVH is within same
             | disaster radius for one fire
             | 
             | SBG is for Strasbourg. That's not a region. It's a city.
             | Obviously, SBG1 to 4 are in the radius of one fire. It's
             | four different buildings on the same site.
        
             | hashhar wrote:
              | That's a fair point. If OVH does market them as AZs, then
             | it's disingenuous and liable to suits IMO.
        
               | cbg0 wrote:
                | No, it isn't, as there's no clear-cut definition of what
               | an availability zone is.
        
             | ev1 wrote:
             | They were not and never advertised as anything similar to
             | AZ. You could not deploy in SBG1,2,3, etc. You only pick
             | city = Strasbourg at deploy time. It's merely a building
             | marker.
        
             | tyingq wrote:
             | The buildings are VERY close to one another.
             | 
             | https://cdn.baxtel.com/data-center/ovh-strasbourg-
             | campus/pho...
        
               | Zevis wrote:
               | That seems, uh, problematic.
        
         | leajkinUnk wrote:
         | "Big cloud" has had fires take out clusters, and somehow they
         | manage to keep it out of the news. In spite of the redundancy
         | and failover procedures, keeping your data centers running when
         | one of the clusters was recently *on fire* is something that is
         | often only possible due to heroic efforts.
         | 
         | When I say "heroic efforts", that's in contrast to "ordinary
         | error recovery and failover", which is the way you'd want to
         | handle a DC fire, because DC fires happen often enough.
         | 
         | The thing is, while these big companies have a much larger base
         | of expertise to draw on and simply more staff time to throw at
         | problems, there are factors which incentivize these employees
         | to *increase risk* rather than reduce it.
         | 
         | These big companies put pressure on all their engineers to
         | figure out ways to drive down costs. So, while a big cloud
         | provider won't make a rookie mistake--they won't forget to run
         | disaster recovery drills, they won't forget to make backups and
         | run test restores--they *will* do a bunch of calculations to
         | figure out how close to disaster they can run in order to save
         | money. The real disaster will then reveal some false, hidden
         | assumption in their error recovery models.
         | 
         | Or in other words, the big companies solve all the easy
         | problems and then create new, hard problems.
        
           | exikyut wrote:
           | I'm curious what references or leads I might follow to learn
           | more about these fires and other events you mention.
        
             | leajkinUnk wrote:
             | Get a job working at these companies and go out for drinks
             | with the old-timers.
        
           | [deleted]
        
           | pm90 wrote:
           | You know, those are excellent observations. But they don't
           | change the decision calculus in this case. Using bigger cloud
           | providers doesn't eliminate all risk, it just creates a
           | different kind of risk.
           | 
           | What we call "progress" in humanity is just putting our best
           | efforts into reducing or eliminating the problems we know how
           | to solve without realizing the problems they may create
           | further down the line. The only way to know for sure is to
           | try it, see how it goes, and then re-evaluate later.
           | 
           | California had issues with many forest fires. They put out
           | all fires. Turns out, that solution creates a bigger problem
           | down the line with humongous uncontrollable fires which would
           | not have happened if the smaller fires had not been put out
           | so frequently. Oops.
        
       | crubier wrote:
       | The key differentiator of OVH is the very compact datacenters
       | they achieve thanks to water cooling. Some OVH execs were
       | touting that in a recent podcast.
       | 
       | Interestingly in this case, having a very compact data center was
       | probably an aggravating factor. This shows how complex these
       | technical choices are, you have to think of operating savings,
       | with a trade off on the gravity of black swan events...
        
         | nnx wrote:
         | Interesting. That said, the technique is not the issue here,
         | losing a whole datacenter can always happen. This event would
          | have been much less serious if the four SBG* datacenters
         | were not all so close to each other on the same plot of land.
         | 
         | They are so close to each other that they are basically the
         | same physical datacenter with 4 logical partitions.
        
           | delfinom wrote:
            | They are all annexes built up over time and a victim of their
            | own success (the site was meant to be more of an edge node in
            | size). The container-based annexes were meant to be
            | dismantled 3 years ago, but profit probably got in the way.
        
       | nikanj wrote:
       | According to the official status page, the whole datacenter is
       | still green http://status.ovh.com/vms/index_sbg2.html
        
         | aetherspawn wrote:
         | You're completely right. IMO this is the best comment in this
         | whole thread. Their status page must be broken, or it's a lie.
        
           | bootloop wrote:
           | If there is a SLA with consequences associated with it every
           | status page is going to be a lie.
        
             | aetherspawn wrote:
             | Well, it sucks to catch fire and I care for the employees
             | and the firemen, but if their status page is a lie then I
             | have a whole lot less sympathy for the business. That's
             | shady business and they should feel bad.
             | 
             | I can appreciate an honest mistake though, like the status
             | page server cron is hosted in the same cluster that caught
             | fire and hence it burnt down and can't update the page
             | anymore.
        
               | Ploskin wrote:
               | Is the status page relevant though? At the very least,
               | OVH immediately made a status announcement on their
               | support page and they've been active on Twitter. I don't
               | see anything shady here. From their support page:
               | 
               | > The whole site has been isolated, which impacts all our
               | services on SBG1, SBG2, SBG3 and SBG4. If your production
               | is in Strasbourg, we recommend to activate your Disaster
               | Recovery Plan
               | 
               | What more could you want?
        
               | [deleted]
        
               | kahrl wrote:
               | To have a status page that reflects actual statuses? To
               | know that I'm not being lied to or taken advantage of? To
               | know that my SLA is being honored?
        
               | PudgePacket wrote:
               | > Is the status page relevant though?
               | 
               | What's the point of a status page then if it does not
               | show you the status? I don't want to be chasing down
               | twitter handles and support pages during an outage.
        
               | Scoundreller wrote:
                | Still better than Amazon, whose status page
                | describes little, and fat chance of anyone official
                | sharing anything on social media either.
               | 
               | I wonder if a server fire would cause Amazon to go to
               | status red. So far anything and everything has fallen
               | under yellow.
        
           | whalesalad wrote:
           | lol... that's how most status pages are
        
         | tweetle_beetle wrote:
         | I feel like there should be place to report infrastructure
         | suppliers with misleading status pages, some kind of
         | crowdsourced database. Without this information, you only find
         | out that they are misleading when something goes very wrong.
         | 
          | At best you might be missing out on some SLA refunds, but
          | at worst it could be disastrous for a business. I've been
          | on the wrong side of an update-by-hand status system from a
          | hosting provider before and it wasn't fun.
        
           | vntok wrote:
           | https://downdetector.fr/
        
             | hedora wrote:
             | Wtf is this disclaimer on Down Detector for? (Navigate to
             | OVH page.). It sits in front of user comments, I think:
             | 
             | > _Unable to display this content to due missing consent.
             | By law, we are required to ask your consent to show the
             | content that is normally displayed here._
        
               | Symbiote wrote:
               | It's a Disqus widget. If you denied consent for third
               | party tracking, they can't load it.
               | 
               | I've usually seen this with embedded videos rather than
               | comments.
        
             | tweetle_beetle wrote:
              | Thanks, can't believe it's taken me 8 years to learn
              | about that.
        
           | globular-toast wrote:
           | Who monitors the monitors?
           | 
           | Agreed, though. A fake status page is worse than no status
           | page. I don't mind if the status page states that it's
           | manually updated every few hours as long as it's honest. But
           | don't make it look like it's automated when it's not.
        
         | kkwtfeliz wrote:
         | The weather map is interesting: http://weathermap.ovh.net/
         | 
          | No traffic whatsoever between sbg-g1 and sbg-g2 and their
          | peers.
        
         | deno wrote:
         | Screenshot: https://i.imgur.com/2rpWv0P.png
        
         | exikyut wrote:
         | https://archive.is/LOHVS
         | 
         | http://web.archive.org/web/20210310100021/http://status.ovh....
        
         | damsta wrote:
         | Four hours later still green.
        
         | sgeisler wrote:
          | It seems to be a static site, which is reasonable since it
          | aggregates a lot of data and might encounter high load when
          | something goes wrong, so generating it live without caching
          | is not viable. Maybe the server that normally updates it is
          | down too (not that this would be a good excuse)?
        
           | keraf wrote:
           | 10+ hours cache on a status page doesn't look like real-time
           | monitoring to me.
           | 
           | I think this is probably linked to a manual reporting system
           | and they got bigger fish to fry at the moment than updating
           | this status page.
        
             | sigotirandolas wrote:
             | Counterpoint: There's a constantly updating timestamp at
             | the top of the page that suggests it's automated and real
             | time.
        
         | jakub_g wrote:
         | Also, 3 years ago they had an outage in Strasbourg, and the
         | status page was down apparently as a result of the outage.
         | 
         | https://news.ycombinator.com/item?id=15661218
         | 
         | They are not the only ones though. All too common. Well, it's
         | tricky to set this up properly. The only proper way would be to
         | use external infra for the status page.
        
           | lebaux wrote:
           | Isn't that why things like statuspage.io exist, though?
        
             | dolmen wrote:
             | Where is statuspage.io hosted?
        
               | cortesoft wrote:
               | I work for a CDN, and we had to change our status page
               | provider once when they became our customer.
        
               | formerly_proven wrote:
               | In the cloud of course. Why do you ask?
        
               | [deleted]
        
           | kalleboo wrote:
           | It's not difficult to make a status page with minimal false
           | negatives. Throw up a server on another host that shows red
           | when it doesn't get a heartbeat. But then instead you end up
           | with false positives. And people will use false positives
           | against you to claim refunds against your SLA.
           | 
           | So nobody chooses to make an honest status page.
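The heartbeat approach described in this comment can be sketched in a few lines. This is a minimal illustration only; the class name, the 60-second threshold, and the status strings are my assumptions, not any provider's actual setup.

```python
"""Sketch of a heartbeat-based status page on independent infra:
the monitored site periodically phones home, and the page turns
red when the heartbeat stops."""
import time

HEARTBEAT_TIMEOUT = 60  # seconds of silence before assuming an outage


class StatusPage:
    def __init__(self):
        self.last_heartbeat = None  # monotonic timestamp of last ping

    def record_heartbeat(self):
        # Called whenever the monitored site checks in, e.g. via a
        # cron-driven HTTP request to this server.
        self.last_heartbeat = time.monotonic()

    def status(self):
        if self.last_heartbeat is None:
            return "unknown"
        age = time.monotonic() - self.last_heartbeat
        return "up" if age < HEARTBEAT_TIMEOUT else "possible outage"
```

The "possible outage" string deliberately hedges, since a missed heartbeat may be a network blip rather than a real outage: exactly the false-positive trade-off the comment describes.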
        
             | jakub_g wrote:
             | Yep. I guess what could be done is a two-tiered status
             | page: automated health check which shows "possible outage,
             | we're investigating" and then a manual update (although
             | some would say it looks lame to say "nah, false positive"
             | which is probably why this setup is rare).
        
             | toast0 wrote:
             | As someone who maintained a status page (poorly), I'm sorry
             | on behalf of all status pages.
             | 
             | But, they're usually manual affairs because sometimes the
             | system is broken even when the healthcheck looks ok, and
             | sometimes writing the healthcheck is tricky, and always you
             | want the status page disconnected from the rest of the
             | system as much as possible.
             | 
             | It is a challenge to get 'update the status page' into the
             | runbook. Especially for runbooks you don't review often
             | (like the one for the building is on fire, probably).
             | 
             | Luckily my status page was not quite public; we could show
             | a note when people were trying to write a customer service
             | email in the app; if you forget to update that, you get
             | more email, but nobody posts the system is down and the
             | status page says everything is ok.
        
         | gpm wrote:
         | > Legend: Servers down: 0 1+ 4+ 6+ 8+ 10+ 15+
         | 
         | If they don't have any servers anymore, how can they be down ;)
        
       | refraincomment wrote:
        | I will never financially recover from this..
        
       | kzrdude wrote:
       | Lichess was affected by this fire:
       | https://twitter.com/lichess/status/1369543554255757314
       | 
       | But they seem to be back up
        
       | odiroot wrote:
        | Interesting! I got the news from my local package courier's
        | website. They warn that their services can be unreliable due
        | to the fire at OVH.
       | 
       | It's all connected.
        
       | [deleted]
        
       | simplecto wrote:
       | This is horrible and a sobering reminder to do the things we
       | don't enjoy or consider -- disaster recovery.
       | 
       | How many of us here plan Fire Drills within our teams and larger
       | organizations?
        
       | tester34 wrote:
       | holy shit my 3$ VPS!!
       | 
       | nvm not this dc!
        
       | canadianfella wrote:
       | I hereby declare this to be a fire.
        
       | martinald wrote:
        | Interesting. I wonder if the cladding was a major problem
        | here? It looks like it has all burnt out and could have let
        | the fire spread extremely rapidly on the outside.
        
         | jiofih wrote:
         | The cladding was metal so very unlikely it contributed to the
         | fire spreading.
        
       | mot2ba wrote:
       | Ovh needed a better firewall :(
        
       | rjsw wrote:
       | The photos look like the building had external cladding, wonder
       | if that contributed to the size of the blaze [1].
       | 
       | [1] https://en.wikipedia.org/wiki/Grenfell_Tower_fire
        
       | [deleted]
        
       | thbb21 wrote:
       | My VPS in SBG3 stopped pinging around 9am.
       | 
       | My impression is that they tried very hard to maintain uptime,
       | which was probably a bad idea when we see the extent of the
       | damages. This VPS just hosts external facing services and is easy
       | to set back up.
        
       | zoobab wrote:
        | In my experience, backups in companies are barely done.
       | 
       | Companies want quick money, they push people to skip important IT
       | operations, like disaster recovery plans.
       | 
       | And backups are the least monitored systems.
        
       | mwcampbell wrote:
       | I just recently started moving some services for my business to
       | one of OVH's US-based data centers. Should I take this fire as
       | evidence that OVH is incompetent and get out? I really don't want
       | AWS, or the big three hyperscalers in general, to be the only
       | option.
        
         | lenartowski wrote:
         | IMO you should take this fire as evidence, that you need to
         | have (working!) backups wherever you host your data. AWS, GCP
         | Azure are not fire resistant, same as OVH. I don't know if OVH
         | is more or less competent than big three, I choose to trust no
         | one.
        
           | ghosty141 wrote:
           | I read multiple times that they didn't even have sprinklers,
           | only smoke detectors in their EU datacenter(s). I'm 100% sure
           | AWS, Azure and Google have better fire prevention.
        
             | Symbiote wrote:
             | This thread has people saying they have sprinklers, don't
             | have sprinklers, have / don't have gas suppression, and
             | have puppies / actually have toilets.
             | 
             | Wait for the misinformation hose to dry up, and decide in a
             | few weeks.
             | 
             | https://us.ovhcloud.com/about/company/security
        
       | SamLicious wrote:
        | This is such an eye-opening story; I didn't know about any
        | of this. Thank you for sharing!
        
       | jbeales wrote:
       | > If your production is in Strasbourg, we recommend to activate
       | your Disaster Recovery Plan.
       | 
       | Ouf.
        
       | aw4y wrote:
       | elliot..is it you?
        
       | drpgq wrote:
       | Like something out of Mr Robot
        
       | Aldipower wrote:
       | Looks a little bit like Fukushima. I hope the clean up doesn't
       | take that long though..
        
       | MattGaiser wrote:
       | > We recommend to activate your Disaster Recovery Plan.
       | 
       | What percentage of organizations have these?
        
         | RantyDave wrote:
         | I guess we're about to find out (or, rather, they are).
        
         | ikiris wrote:
         | Some plans have a single step: 1) panic.
        
         | dylan604 wrote:
         | Short answer: not enough.
        
       | ohnonotagain9 wrote:
        | (site a)---[replicate local LUNs/shares to remote storage
        | arrays]--->(site b)
        | (site a)---[replicate local VMs to remote HCI]--->(site b)
        | (site a)---[local backups to local data archive]--->(site a)
        | (site a)---[local data archive replicates to remote data
        | archive]--->(site b)
        | (site b)---[remote data archive replicates to remote air
        | gapped data archive]--->(site b)
        | (site a)---[replicates to cold storage on
        | aws/gcp/azure]--->(site c)
        | (site c)---[replicate to another geo site on
        | cloud]--->(site d)
        | 
        | Scenario 1: site a is down. Plan: recover to site b by most
        | convenient means.
        | 
        | Scenario 2: site b is down. Plan: restore services, operate
        | without redundancy out of site a.
        | 
        | Scenario 3: site c is down. Plan: restore services, catch up
        | later; continue operating out of site a.
        | 
        | Scenario 4: sites b and c down. Plan: restore services,
        | operate without redundancy out of site a.
        | 
        | Scenario 5: sites a and b down. Plan: cross fingers, restore
        | to a new site from cold storage on expensive cloud VM
        | instances.
        | 
        | Scenario 6: data archive corrupted by ransomware. Plan:
        | restore from air-gapped data archive, hope the ransomware
        | was identified within 90 days.
        | 
        | Scenario 7: sites b and c down, then site a down. Plan: quit.
        | 
        | Scenario 8: staff hates job and all quit. Plan: outsource.
        | 
        | Scenario 9: and so on...
        
       | ricardobeat wrote:
       | > At 00:47 on Wednesday, March 10, 2021, a fire broke out in a
       | room in one of our 4 datacenters in Strasbourg, SBG2. Please note
       | that the site is not classified as a Seveso site.
       | 
       | > Firefighters immediately intervened to protect our teams and
       | prevent the spread of the fire. At 2:54 am they isolated the site
       | and closed off its perimeter.
       | 
       | > By 4:09 am, the fire had destroyed SBG2 and continued to
       | present risks to the nearby datacenters until the fire brigade
       | brought the fire under control.
       | 
       | > From 5:30 am, the site has been unavailable to our teams for
       | obvious security reasons, under the direction of the prefecture.
       | The fire is now contained.
        
       | [deleted]
        
       | nerdbaggy wrote:
       | Dang, I have a lot of respect for Octave and what he has created.
       | https://twitter.com/olesovhcom?s=21
        
         | de6u99er wrote:
         | After the total loss of one data-center I would tend to
         | disagree with this statement.
        
       | redisman wrote:
       | Wow. That equipment is going to be very hard to replace right now
       | too.
        
       | TrueDuality wrote:
       | There have been a handful of talks at computer security
       | conferences talking about setting up physical traps in server
       | chassis (such as this one:
       | https://www.youtube.com/watch?v=XrzIjxO8MOs). Since seeing those
       | I've been waiting for some idiot to try something like that in a
       | physical server and burn down a data center.
       | 
        | There is NO evidence that is what happened here, and I don't
        | think OVH allows customers to bring their own equipment,
        | making it even less likely. Still, I wait and hope to hear a
        | root cause from this one.
        
       | IceWreck wrote:
       | Always, always keep local backups folks.
        
         | coolspot wrote:
         | "Local" meaning on the same server, just in another folder,
         | right?
        
           | IceWreck wrote:
           | Sorry, I should've phrased it better.
           | 
            | Local as in your home/office. While your application may
            | run in AWS or whatever remote server, it's necessary to
            | have copies of your data that you can physically touch
            | and access.
           | 
           | One main deployment, one remote backup and one onsite
           | physically accessible backup.
        
       | k_sze wrote:
       | I sent this story to my colleagues and one of them asked "where
       | is the FM200?"
       | 
       | I don't really know how FM200 systems work in data centres, but
       | I'm guessing that if the fire didn't start from within the actual
       | server room, FM200 might not save you? e.g. if a fire started
       | elsewhere and went out of control, it would be able to burn
       | through the walls/ceiling/floor of the server room, in which case
       | no amount of FM200 gas can save you, right?
       | 
       | Another possibility, of course, is that the FM200 system simply
       | failed to trigger even though the fire started from within the
       | server room.
       | 
        | There are no published investigation details about this
        | incident yet, I believe. Can somebody chime in about past
        | incidents where FM200 failed to save the day?
        
         | _up wrote:
         | Apparently they don't use gas at all.
         | 
         | https://lafibre.info/ovh-datacenter/ovh-et-la-protection-inc...
        
           | k_sze wrote:
            | Ah, so it's possible that they also used a water
            | sprinkler system at SBG2. But still, I wonder how the
            | fire protection system (water sprinkler, FM200, or
            | otherwise) failed to save SBG2?
           | 
           | It doesn't really surprise me that the machines are dead, but
           | the whole place being _destroyed_ is much more surreal.
        
         | etiennemarcel wrote:
         | I think most of these gases are or will eventually be banned in
         | Europe because of their impact on the environment. I've seen
         | newer datacenters use water mist sprays.
        
           | mike_d wrote:
           | Nitrogen makes up 78% of the atmosphere, so I doubt it will
           | be banned. Most datacenters don't actually use halocarbons
           | despite the common "FM200" name.
        
           | divingdragon wrote:
           | You might be thinking of Halons, which are CFCs that depletes
           | the ozone layer? They are mostly phased out worldwide but
           | existing installations might still be in use.
           | 
           | FM200 is something else that is often used in modern builds
           | (not just datacenters).
        
             | etiennemarcel wrote:
             | It seems that HFC are being phased out too:
             | https://ec.europa.eu/clima/policies/f-gas/legislation_en
        
               | divingdragon wrote:
               | I've heard that one. I thought it mostly affects
               | refrigerants, but I didn't notice that FM200 is also an
               | HFC. There are other fire suppression gasses with a low
               | global warming potential, which probably can still be
               | used in the future.
        
           | exikyut wrote:
           | How... what. What if the fire is electrical? You can't just
           | go "well the triple interlocked electrical isolation will
           | trip and cut the current" if a random fully-charged UPS
           | decides to get angry...
        
       | giis wrote:
        | I don't know whether my 10+ year side project with 225,000
        | users (www.webminal.org) is gone forever! :(
        | 
        | I have backup snapshots, but they're stored in OVH itself :(
        | Hoping for a miracle!
        
         | quickthrower2 wrote:
          | Really sorry to hear that, hope you get it restored. I
          | cannot judge - I use the default backup options in Azure
          | and hope they store it in another data centre, but never
          | thought to check too hard. This is very bad luck.
          | 
          | Hopefully you had the code in GitHub, but that still
          | leaves the DB. It looks like yours has something to do
          | with command-line or Linux lessons, so I'm not sure how
          | much user data is critical? Maybe you can get this up and
          | running again to some extent.
        
         | XCSme wrote:
         | What was the reason of storing the backup on the same server?
         | To allow for rollbacks in case of data corruption or some
         | changes gone wrong?
        
           | giis wrote:
            | Yes, I thought rollbacks would be much easier in case of
            | data loss. :sob:
        
           | Operyl wrote:
           | I think they're talking about the backup services provided by
           | OVH, in which I believe they're stored in RBX.
        
         | kuschku wrote:
         | If your backup snapshots are stored through OVH's normal backup
         | functionality, then create a new server at e.g. RBX now, and
         | restore from those backups. That'll take a few hours and it'll
         | all be up again quickly.
        
       | nerdbaggy wrote:
        | Here are some pics of what it looked like before, SBG 1-4,
        | and its history.
       | 
       | https://baxtel.com/data-center/ovh-strasbourg-campus
        
         | gameshot911 wrote:
            | Is it just my laptop, or is only ~40% of vertical screen
            | real estate dedicated to the actual content? :<
        
           | softblush wrote:
           | It's just you
        
           | aetherspawn wrote:
           | I have the same issue on my XPS 13 (4K screen), the header
           | takes up a good 30% or so of the height of the screen and
           | it's like reading through a mailbox slit.
        
           | AnssiH wrote:
           | Nope, exactly the same here (on landscape phone).
        
         | bmurray7jhu wrote:
          | SBG3 is almost adjacent to SBG2. It is impressive that the
          | firefighters saved SBG3 with minimal damage.
        
         | eb0la wrote:
         | I'm surprised to see how close the DCs are to the river.
         | Fortunately it's in the high part of the river, less prone to
         | overflow.
        
           | RantyDave wrote:
           | Maybe quite handy for water cooling?
        
       | anilakar wrote:
       | Finally a legitimate reason for BSD sysadmins to run poweroff -n!
        
       | awalias wrote:
       | not to be a conspiracist, but are they still hosting wikileaks
       | data? https://en.wikipedia.org/wiki/OVH#WikiLeaks
        
       | lgleason wrote:
       | https://www.youtube.com/watch?v=1EBfxjSFAxQ
        
         | mhh__ wrote:
          | I'm not sure if it's because my tolerance of Graham
          | Linehan has snapped or not, but I barely laugh at the IT
          | Crowd any more. As with other GL shows, I find it's mostly
          | held together by the cast's delivery and such.
          | 
          | The laugh track and the writing are honestly dated, even
          | by the standards of Dad's Army.
        
           | jrockway wrote:
           | I don't remember the details, but I think that season 2 kind
           | of retroactively ruined season 1. They used to have all those
           | O'Reilly and EFF stickers, and working at a help desk at the
           | time, it felt very authentic. Then everything got super nice
           | in season 2 -- leather couches, people were dressing nicely,
           | etc. It kind of lost its charm. You can't rewatch it because
           | you know Denholm is just going to randomly jump out of a
           | window.
           | 
           | (Having said that, I think "Fire" was a memorable episode
           | that is still amusing. The 0118911881999119 song, "it's off,
           | so I'll turn it on... AND JUST WALK AWAY".)
           | 
           | It might have been ahead of its time. Silicon Valley was well
           | received and is as nerdy and intricately detailed as Season 1
           | of the IT Crowd. "Normal people" thought it was far out and
           | zany. People that work in tech have been to all those
           | meetings. And, a major character was named PG!
        
             | wott wrote:
             | > They used to have all those O'Reilly and EFF stickers,
             | and working at a help desk at the time, it felt very
             | authentic. Then everything got super nice in season 2 --
             | leather couches, people were dressing nicely, etc. It kind
             | of lost its charm.
             | 
             | That sounds like a pretty realistic allegory of the last
             | two decades in Free Software (or software in general, or
             | the web...)
        
             | kuschku wrote:
             | I only just realized the Paul Graham/Peter Gabriel easter
             | egg...
        
           | muglug wrote:
           | The IT Crowd's comedy became dated incredibly quickly, just
           | like Father Ted's.
           | 
           | Comedies that came later ditched the laugh track. They had to
           | work harder to get viewers at home to laugh, but ultimately a
           | bunch of them (starting with The UK Office) hold up much
           | better as a result.
        
       | Tepix wrote:
       | Last update
       | 
       | https://twitter.com/olesovhcom/status/1369535787570724864?s=...
       | 
       | "Update 7:20am Fire is over. Firefighters continue to cool the
       | buildings with the water. We don't have the access to the site.
       | That is why SBG1, SBG3, SBG4 won't be restarted today."
        
       | NorwegianDude wrote:
        | I can't see any mention of a fire suppression system.
        | Doesn't OVH have one, except for colocation datacenters?
        | 
        | A fire detection system using e.g. lasers, with Inergen (or
        | Argonite) for putting the fire out, is commonly used in
        | datacenters. The gas fills the room and reduces the amount
        | of oxygen in the room, so most fires are put out within a
        | minute.
        | 
        | The cool thing is that the gas is designed to be used in
        | rooms with people, so that it can be triggered at any time.
        | It is however quite loud, and some setups have been known to
        | be too loud, even destroying harddrives.
        
         | monsieurbanana wrote:
         | > even destroying harddrives
         | 
         | So loud that it destroys hard drives... That's scary, are
         | people's eardrums much more resistant?
        
           | [deleted]
        
         | MattGaiser wrote:
         | Are fires common in data centres? Specialized fire suppression
         | tech seems to indicate that they are.
        
           | jhugo wrote:
           | The decision to design and implement specialised tech comes
           | from a combination of how likely the risk is and the
           | magnitude of the potential loss. Fires are not that common in
           | DCs, but the potential loss can be enormous (as OVH is
           | currently finding out).
        
           | jabroni_salad wrote:
           | They aren't super common, but halon and other gas systems are
           | just the right tool for the job. It can get inside the server
           | chassis and doesn't damage equipment like a chemical
           | application would. We won't know what went wrong at OVH until
           | a proper post mortem comes out. These systems work by
           | suppressing the flame reaction, but if the actual source was
           | not addressed, it could reignite after awhile.
        
           | perlgeek wrote:
           | Yes, mostly due to two reasons:
           | 
           | * overall high energy density (lots of current flowing
           | everywhere)
           | 
           | * the batteries for backup power are dangerous, and can
           | easily(ish) overheat when activated.
        
         | torh wrote:
         | A few years ago the company I work for installed suppressors on
         | the Inergen system. It did trigger from time to time, which was
         | tracked to the humidifiers. And yes -- it did destroy
         | harddrives because of the pressure/sound waves before we
         | installed the supressors. Haven't had any incidents after we
         | fixed the humidifiers.
         | 
          | But Inergen (and other gases) is more or less useless if
          | you allow it to escape too quickly. So the cooling system
          | should be a fairly closed circuit.
         | 
         | Edit: I'm also a Norwegian dude. :)
        
         | pmontra wrote:
         | What's the point of a fire suppression system that destroys
         | what it should protect?
        
           | rini17 wrote:
           | When it destroys one room but prevents fire spreading to
           | whole building.
        
           | qeternity wrote:
           | In a data center the fire suppression is mostly there to
           | protect the servers.
        
         | jhugo wrote:
         | At a datacentre I used to visit years ago, part of the site
         | induction was learning that if the fire suppression alarm went
         | off, you had a certain amount of time to get out of the room
         | before the argon would deploy, so you should always have a path
         | to the nearest exit in mind. The implication was that it wasn't
         | safe to be in the room once it deployed, but I don't know for
         | sure.
        
           | NorwegianDude wrote:
           | Inergen is used to lower oxygen to ~12.5 %, from normal 21 %.
           | Most fires need more than 16 % oxygen.
           | 
           | Inergen consists of 52 % nitrogen, 40 % argon and 8 % carbon
           | dioxide.
           | 
           | Carbon dioxide might sound strange, but it causes the heart
           | to beat faster to compensate for the lowered amounts of
           | oxygen.
           | 
           | The whole point of Inergen is to quickly put out fires where
           | water/foam/powder isn't usable and the room might contain
           | people.
           | 
           | It's cool stuff, and I was under the impression that
           | basically all datacenters used it.
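
[Editor's note: the dilution figures above can be sanity-checked with a little arithmetic. Below is my own back-of-the-envelope sketch, assuming ideal mixing in a sealed room where the discharged agent simply adds oxygen-free volume; this is a simplification, not the actual Inergen design method.]

```python
# Back-of-the-envelope check of the oxygen-dilution figures quoted above.
# Assumption (mine, not from the thread): a sealed, well-mixed room where
# the discharged inert agent simply adds oxygen-free volume.

def o2_after_discharge(agent_fraction: float, ambient_o2: float = 0.21) -> float:
    """O2 fraction after adding agent equal to `agent_fraction` of the
    room volume to a sealed, well-mixed room."""
    return ambient_o2 / (1.0 + agent_fraction)

def agent_needed(target_o2: float, ambient_o2: float = 0.21) -> float:
    """Agent volume (as a fraction of room volume) needed to dilute
    oxygen from `ambient_o2` down to `target_o2`."""
    return ambient_o2 / target_o2 - 1.0

f = agent_needed(0.125)  # target: the ~12.5 % figure quoted above
print(f"agent needed: {f:.0%} of room volume")             # 68%
print(f"O2 after discharge: {o2_after_discharge(f):.1%}")  # 12.5%
```

[Real installations vent overpressure during discharge and are designed to standards such as NFPA 2001, so actual agent quantities differ, but the order of magnitude is consistent with the thread's numbers.]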
        
           | saagarjha wrote:
           | Presumably you would have trouble breathing as it displaced
           | the oxygen in the room?
        
             | edejong wrote:
             | I heard that the pressure difference might rupture your ear
             | drums.
        
             | jhugo wrote:
             | Yeah, that was always my understanding. GP saying "the gas
             | is designed to be used in rooms with people, so that [it]
             | can be triggered any time" made me second-guess that
             | though.
             | 
             | Maybe there is a concentration of oxygen that is high
             | enough for humans to survive, yet too low to sustain
             | combustion?
        
               | thspimpolds wrote:
               | The "gas" is more like a fog of capsules. FM-200 is a
                | common one. Basically it has a fire suppression agent
                | inside crystals which are blasted into the room by
               | compressed air. These crystals melt when they get over a
               | certain temperature and therefore won't kill you;
               | however, breathing that in isn't really pleasant.
               | 
               | Source: I've been in an FM-200 discharge
        
               | lol768 wrote:
               | > Maybe there is a concentration of oxygen that is high
               | enough for humans to survive, yet too low to sustain
               | combustion?
               | 
               | Yes, supposedly (12% ish?). Can't say I'd be thrilled at
               | the idea of testing it.
        
         | growt wrote:
         | So how is it safe with people if there is no oxygen left to
         | breathe? Reminds me of my first trip to a datacenter, where the
         | guy who accompanied us said: "In the event of a fire this room
         | is filled with nitrogen in 20 seconds. But don't worry:
         | nitrogen is not toxic!" Well, I was a little worried :)
        
           | tallanvor wrote:
           | Newer systems like the ones mentioned above are designed to
           | reduce the amount of oxygen in the room to around 12% (down
            | from around 21%). That's low enough to extinguish fires,
            | but high enough that people can safely evacuate, and it
            | prevents anyone from suffocating if they're incapacitated.
        
       | yalogin wrote:
        | I don't know what OVH is, and going to the site points me to
        | a speed test with no information.
        
         | MattGaiser wrote:
         | French AWS/GCP is my understanding.
        
           | tyingq wrote:
           | They are really more like Hetzner. They have "cloud", but
           | most of the business is dedicated servers. They also operate
           | kimsufi.com and soyoustart.com.
           | 
           | They do have APAC, Canadian, and US data centers as well.
        
         | beckler wrote:
          | They're a web/hosting service provider, like AWS or GCP, but
          | with fewer services. They're much more popular in European
          | countries.
        
         | tecleandor wrote:
         | You went to ovh.net instead of ovh.com.
         | 
          | Largest hosting provider in Europe, and probably top 5 or
          | top 10 in the world.
        
         | kuschku wrote:
          | Imagine AWS, with fewer features but 10x-100x lower prices.
          | And now you know why, until a few years ago, they were larger
          | than even AWS in traffic, customers, and number of servers.
        
       | qwertykb wrote:
       | Honestly just for the memes. https://isovhonfire.com
        
         | donatj wrote:
          | Did this already exist, or was this thrown up insanely fast?
        
           | qwertykb wrote:
           | Threw it up really quick.
        
           | navanchauhan wrote:
           | whois[0] shows it was registered on March 10, so thrown up
           | insanely fast.
           | 
           | [0] https://who.is/whois/isovhonfire.com
        
         | bithaze wrote:
         | Probably not a good look for this to be the first thing another
         | hosting company thinks to put up in response.
        
           | qwertykb wrote:
           | /shrug/ We have our infrastructure partially in OVH, I see it
           | as a friendly jab at them and a way to get updates without
           | having to navigate to twitter.
        
           | sofixa wrote:
            | OVH aren't nice in that regard, and have trolled
            | competitors with "leave it to the pros" before when there
            | were serious incidents (of which OVH have had their fair
            | share), so it's not surprising.
        
       | manishsharan wrote:
        | And this is why the big 3 will continue to dominate. AWS,
        | Microsoft, and Google can throw a lot more money at their
        | physical infrastructure than any other cloud provider.
       | 
        | After this sorry episode, I don't think any CTO or CIO of a
        | public company will be able to even consider using the other
        | guys.
       | 
        | edit: I am not implying that we put all eggs in one basket
        | with no failover and DR. I am implying the big companies will
        | pay a 2x premium on infrastructure to project reliability.
        
         | fooyc wrote:
         | I could replicate my whole infrastructure on 3 different OVH
         | datacenters, with enough provision to support twice the peak
         | load - it would still be cheaper than a single infrastructure
         | at AWS, and I would get a better uptime than AWS:
         | https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
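
[Editor's note: the arithmetic behind that claim is simple. The sketch below uses made-up placeholder prices (not real OVH or AWS quotes) just to show the shape of the comparison.]

```python
# Illustrative only -- the two monthly prices are placeholders, not
# real OVH or AWS quotes.
ovh_monthly = 100.0   # hypothetical price of one OVH dedicated server
aws_monthly = 700.0   # hypothetical price of a comparable AWS setup

datacenters = 3       # replicate across 3 OVH datacenters
headroom = 2          # provision twice the peak load in each one

ovh_total = datacenters * headroom * ovh_monthly
print(f"OVH, fully replicated: {ovh_total:.0f}/mo vs AWS: {aws_monthly:.0f}/mo")
# With these placeholder numbers: 600/mo vs 700/mo
```

[The point being made: even a 6x overprovisioned deployment can come in under a single unreplicated deployment if the per-unit price gap is large enough.]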
        
           | manishsharan wrote:
            | I agree with you 100%. However, I did state *any CTO or
            | CIO of any public company*... the executives don't worry
            | about costs, they worry about being able to *project
            | reliability*.
        
             | fooyc wrote:
             | Executives that worry about reliability would insist on
             | deploying on multiple data centers, which would make the
             | project more reliable than any single AWS availability
             | zone.
             | 
             | Also, cost matters if the AWS bill is one of the company's
             | top expenses.
        
           | joshuamorton wrote:
            | Reserved a1.large instances are about half the price of
            | OVH's b2-7 instance. a1.xlarge instances are still cheaper
            | (and larger). So you get more raw compute per dollar on
            | AWS.
           | 
           | What?
        
             | kuschku wrote:
              | If you need machines that large, and are willing to use
              | reserved instances, you'd go with dedis on OVH instead
              | of VMs, which is significantly cheaper.
        
               | joshuamorton wrote:
                | OVH dedicated instances start at about the size of an
                | a1.metal instance, which costs ~30% more than the
                | comparable OVH instance, but you can get discounts in
                | various ways.
               | 
               | Or you could use t4g.2xlarge, which is cheaper. There's
               | no situation where OVH is 3x cheaper (I mean maybe if
               | bandwidth is your thing, but IDK).
        
               | jiofih wrote:
               | Performance of those AWS instances comes nowhere close
               | despite the specs.
        
         | christophilus wrote:
         | With Google arbitrarily killing accounts, and with Amazon
         | showing that they'll do the same if it's politically expedient,
         | I'm not sure I'd trust the big three, either. It's a case of
         | "pick your poison".
        
         | gilrain wrote:
         | If all of your infrastructure is in one data center, you're on
         | a disaster clock no matter who you choose.
        
       ___________________________________________________________________
       (page generated 2021-03-10 23:01 UTC)