Post B0OtQAk5Tmc2qT7CoS by HerraBRE@mastodon.xyz
(DIR) Post #B0OtQAk5Tmc2qT7CoS by HerraBRE@mastodon.xyz
2025-11-19T02:06:22Z
1 likes, 1 repeats
The #CloudFlare outage follows a pattern we have seen before. Was it Google last time?

1. Generate an exciting config file
2. Auto-deploy the file everywhere
3. Everything everywhere crashes

All these big systems want to be able to react quickly to certain types of events, so they probably all have this failure mode baked in. Because security! Or some such.

"Obviously" the files deployed this way "should" be validated carefully. And there "should" be canaries and staged roll-outs... Should.
(DIR) Post #B0OtQIk1jtJleb9GpU by HerraBRE@mastodon.xyz
2025-11-19T02:14:59Z
1 likes, 1 repeats
There's a fun "#devops is hard" lesson here (#CloudFlare).

1. Because Security, you want to be able to deploy global changes very quickly
2. Because Reliability, you want staged roll-outs that pause or even auto-revert if key metrics get worse

You can't have both 1 and 2 at the same time.

And the temptation to go fast sometimes WILL prove irresistible.

So if you're looking for ways to globally cripple a big cloud, this is the pattern to look for: what is too urgent for staged roll-outs?
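
To make the trade-off concrete, here is a rough Python sketch of a staged roll-out with auto-revert, plus the "urgent" switch that skips the staging. Everything in it is hypothetical and only for illustration: the rollout() function, the urgent flag, the host names and the always-failing health check are assumptions, not how CloudFlare or anyone else actually deploys.

```python
from typing import Callable, List, Sequence


def rollout(
    stages: Sequence[Sequence[str]],
    deploy: Callable[[str], None],
    revert: Callable[[str], None],
    healthy: Callable[[Sequence[str]], bool],
    urgent: bool = False,
) -> bool:
    """Deploy stage by stage; revert every touched host if a health check fails.

    urgent=True (hypothetical flag) collapses all stages into one, so the
    first health check only runs after the whole fleet has the change.
    """
    if urgent:
        stages = [[host for stage in stages for host in stage]]
    touched: List[str] = []
    for stage in stages:
        for host in stage:
            deploy(host)
            touched.append(host)
        # Pause point: check key metrics before moving to the next stage.
        if not healthy(touched):
            for host in reversed(touched):
                revert(host)
            return False
    return True


def peak_blast_radius(urgent: bool) -> int:
    """Push a config that crashes every host; return how many ran it at once."""
    stages = [["canary"], ["eu-1", "eu-2"], ["us-1", "us-2", "us-3"]]
    running = set()
    peak = 0

    def deploy(host: str) -> None:
        nonlocal peak
        running.add(host)
        peak = max(peak, len(running))

    def revert(host: str) -> None:
        running.discard(host)

    def healthy(hosts: Sequence[str]) -> bool:
        return False  # assumption: the new config breaks whatever it touches

    rollout(stages, deploy, revert, healthy, urgent=urgent)
    return peak
```

With staging, the bad config only ever reaches the canary before it is reverted (peak_blast_radius(False) is 1). With the urgent path it reaches all six hosts before anyone looks at a metric (peak_blast_radius(True) is 6). Same code, same config; the only difference is whether something was deemed too urgent to stage.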