Post AZy7AiXYqF7hO2hvs0 by apgarcia@fosstodon.org
(DIR) Post #AZy0mINTrXF5ZOxpsO by stefano@mastodon.bsd.cafe
2023-09-20T18:38:01Z
0 likes, 1 repeats
How many of you remember that night/day? I had 142 servers there. Luckily, all the backups were outside and by noon everything was up and running again (in another datacenter, of course). I started immediately after reading the keywords "fire" and "prepare your disaster recovery plan". #Fire #SysAdmin #IT #OVH #OVHFire #Backup #DisasterRecovery
(DIR) Post #AZy1XG06WAnpoAQHj6 by lukem@hachyderm.io
2023-09-20T18:46:10Z
0 likes, 1 repeats
@stefano I remember. Nothing suffered on my end but at some point I started learning Ansible so that I could redeploy my servers anywhere as quickly as possible.
(DIR) Post #AZy4lqN8O0TiWqqHAm by fedops@fosstodon.org
2023-09-20T19:22:25Z
0 likes, 1 repeats
@stefano cool. That is a pretty impressive real world DR plan verification, congratulations!
(DIR) Post #AZy50TY9WNDQ5fyNlo by stefano@mastodon.bsd.cafe
2023-09-20T19:25:24Z
0 likes, 0 repeats
@fedops Thank you! But I was hoping I wouldn't have to test it in the real world...
(DIR) Post #AZy5AP4L2HNp5alGoy by fedops@fosstodon.org
2023-09-20T19:26:52Z
0 likes, 1 repeats
@stefano that's the introduction sentence to every good DRP: "We hope we'll never need this; however..." 😄
(DIR) Post #AZy5Gv3F2UYW1BM6CW by stefano@mastodon.bsd.cafe
2023-09-20T19:28:23Z
0 likes, 0 repeats
@fedops I had a call, some hours ago, and I described how I recovered all those servers in just a few hours. They were shocked (as they lost data and many of their services have been down for days). Experience is the best teacher.
(DIR) Post #AZy5XbMpTebntMTLHs by grant@social.zerotier.com
2023-09-20T19:31:03Z
0 likes, 0 repeats
@stefano We had transitioned off of OVH a few years prior to the fire due to another incident that knocked out 2 of the 3 data centers we were operating in.
(DIR) Post #AZy7AiXYqF7hO2hvs0 by apgarcia@fosstodon.org
2023-09-20T19:49:19Z
0 likes, 0 repeats
@stefano At my previous job, we lost a lot fewer servers than that in a fire, and it took us from Friday afternoon until Monday morning to recover. We improved our DR plan afterwards, but still, "a few hours"? That is just damn impressive.
(DIR) Post #AZy8PPsYbeFvo86LFw by stefano@mastodon.bsd.cafe
2023-09-20T20:03:32Z
0 likes, 0 repeats
@grant yes, not a great time for OVH...
(DIR) Post #AZy8ohPiY6rhQkfRzs by stefano@mastodon.bsd.cafe
2023-09-20T20:08:05Z
0 likes, 0 repeats
@apgarcia thank you. All the servers had an external backup. Some were FreeBSD hosts with jails and bhyve VMs, so it's just been a matter of zfs send and receive to other hosts. Others were Linux hosts with some VMs. We had the backups of the single VMs, so it's just been a matter of firing up a new VM elsewhere, rsyncing everything from the backup, fixing grub and the IP address, and booting. Processed one after another, everything went quite easily. The most difficult part has been identifying hardware and VMs to store/run them, but we did it.
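A minimal sketch of the two restore paths described in the post above, assuming hypothetical host, dataset, and path names (none of them come from the thread). It only builds the command lines and does not run them. The flags are an assumption about a typical setup, not the author's actual invocation: `zfs send -R` for a recursive replication stream, `zfs receive -F` to force the target dataset to match, and `rsync -aHAX --numeric-ids` to preserve hard links, ACLs, extended attributes, and numeric ownership. The grub and IP address fixes are left out because they depend on the distribution.

```python
import shlex


def zfs_replication_pipeline(snapshot, target_host, target_dataset):
    """Build a shell pipeline that streams a ZFS snapshot to another host.

    All arguments are hypothetical examples supplied by the caller.
    """
    # -R sends the snapshot together with its descendant datasets.
    send = ["zfs", "send", "-R", snapshot]
    # -F lets the receiving side roll back to accept the incoming stream.
    recv = ["ssh", target_host, "zfs", "receive", "-F", target_dataset]
    return shlex.join(send) + " | " + shlex.join(recv)


def rsync_restore_command(backup_source, vm_root):
    """Build an rsync argument list that copies a VM backup into a new root.

    Trailing slashes make rsync copy the directory contents, not the
    directory itself.
    """
    return [
        "rsync",
        "-aHAX",          # archive mode plus hard links, ACLs, xattrs
        "--numeric-ids",  # keep uid/gid numbers instead of mapping names
        backup_source.rstrip("/") + "/",
        vm_root.rstrip("/") + "/",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(zfs_replication_pipeline("tank/jails@dr", "newhost", "tank/jails"))
    print(shlex.join(rsync_restore_command("backuphost:/backups/vm01", "/mnt/newvm")))
```

Building the commands as data first makes it possible to review or log them before anything touches a disk, which helps when many hosts are restored one after another.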
(DIR) Post #AZy8vChN97RWCD1jxw by stefano@mastodon.bsd.cafe
2023-09-20T20:09:16Z
0 likes, 0 repeats
@grant I'm still using OVH but less than before and with more backups 😉
(DIR) Post #AZyEE4zSWrI9YuJ600 by bekopharm@social.tchncs.de
2023-09-20T21:08:19Z
0 likes, 0 repeats
@stefano had none there but we were watching of course. So many bits moved into the cloud that day. It was a very moving event. And yes, that off-site backup plan was worth it 👍
(DIR) Post #AZyFkb1B5OZm7CBDFY by stefano@mastodon.bsd.cafe
2023-09-20T21:25:47Z
0 likes, 0 repeats
@bekopharm absolutely! I'm old school, I feel safe when data are away (and when a copy is under my desk)