[HN Gopher] Facebook-owned sites are down
___________________________________________________________________
Facebook-owned sites are down
Author : nabeards
Score : 2242 points
Date : 2021-10-04 15:45 UTC (7 hours ago)
(HTM) web link (facebook.com)
(TXT) w3m dump (facebook.com)
| [deleted]
| platz wrote:
| [redacted]
| treesknees wrote:
| What evidence is there to suggest this is due to Facebook being
| down?
| throwawaylolx wrote:
| When I noticed HN was loading slowly, I already knew FB was down.
| [deleted]
| ballenf wrote:
| One real potential cost to FB here is breaking people's
| addictions to FB and IG. This might just be the little
| finger-snap that wakes a sizable chunk of the user base up to
| the fact that life is just a little better during the outage.
| rootinier wrote:
| nslookup www.facebook.com 8.8.8.8
| Server: 8.8.8.8
| Address: 8.8.8.8#53
|
| * server can't find www.facebook.com: SERVFAIL
| aduitsis wrote:
| dig +trace messenger.com
|
| shows that all is well with the root DNS servers, and
|
| dig @a.ns.facebook.com messenger.com
| ;; connection timed out; no servers could be reached
|
| and also
|
| ping a.ns.facebook.com
| 3 packets transmitted, 0 packets received, 100.0% packet loss
|
| shows that something's wrong with facebook.
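The manual checks above can be approximated in a script. This is a minimal sketch, not a replacement for `dig`: it asks the system's configured resolver rather than querying a.ns.facebook.com directly, and the function name `check_resolution` and its injectable `resolver` parameter are illustrative assumptions, added so the logic can be exercised without touching the network.

```python
import socket


def check_resolution(hostname, resolver=socket.getaddrinfo):
    """Return (True, sorted addresses) if hostname resolves, else (False, error text).

    `resolver` defaults to socket.getaddrinfo; passing a stand-in is a
    hypothetical convenience for offline testing.
    """
    try:
        results = resolver(hostname, 443)
    except socket.gaierror as exc:
        # Raised for resolution failures such as SERVFAIL or NXDOMAIN.
        return False, str(exc)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    addresses = sorted({entry[4][0] for entry in results})
    return True, addresses
```

Calling `check_resolution("www.facebook.com")` during an outage like this one would be expected to return `False` plus the resolver's error text, though the exact message depends on the platform.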
| DebtDeflation wrote:
| I assume all of the tertiary sites that use "Login with Facebook"
| are broken now too? So glad I never adopted that.
| pmlittle wrote:
| This is all total left brain looping and complexity coming home
| to roost.
| kaustubhvp wrote:
| it is always DNS!
| gprasanth wrote:
| Suspecting it might be related to the recent letsencrypt cert
| authority expiring? Was just debugging an issue earlier today and
| just couldn't help wondering how much of the internet is secured
| by letsencrypt.
|
| All of the static hosts providing free SSL: vercel, netlify,
| render, firebase hosting, github pages, heroku etc. ...
|
| It does work on modern browsers and devices but goes terribly
| broken on a lot of old devices.
| jaywalk wrote:
| Obviously not possible to check right now to provide proof, but
| I feel quite confident in saying that Facebook does not use
| Let's Encrypt. It's also clearly not an SSL issue.
| [deleted]
| gprasanth wrote:
| You're right. Fb doesn't seem to be using letsencrypt.
|
| https://crt.sh/?q=facebook.com
|
| On a side note, the amount of phishing sites using
| letsencrypt and having a domain similar to facebook.com is
| quite appalling.
| [deleted]
| tomerbd wrote:
| No like for you
| bubblehack3r wrote:
| Their stock is down 5% too. Everything is down for them today ;)
| i_like_apis wrote:
| Great dip to buy. (I'm not a facebook zealot, but you know it
| will recover today or tomorrow once the DNS is sorted in an
| hour or two)
|
| Of course there is the whistle-blower issue too...
| derwiki wrote:
| I don't think stock dip is related to downtime; anecdotally,
| I've never seen a company's stock affected by downtime
| (unless that downtime destroys the business)
| i_like_apis wrote:
| You may be right, but there's a Reuters article about the
| downtime, this is making the news today. I would say
| Facebook is different because of their scale.
|
| Looks like there are a few problems with fb in the news
| today ...
| typingmonkey wrote:
| If it is a DNS error, why is the .onion site also offline?
|
| - https://en.wikipedia.org/wiki/Facebook_onion_address
|
| - facebookwkhpilnemxj7asaniu7vnjjbiltxjqhye3mhbshg7kx5tfyd.onion
| detaro wrote:
| It's not just/primarily a DNS error.
| ApolIllo wrote:
| My guess is that the FB backend also requires DNS. The .onion
| site isn't backed by a backend built on an onion-native stack
| (is that a thing?)
| lifthrasiir wrote:
| The DNS outage is an _outcome_ of faulty BGP updates. As such, not
| only can the Internet not see the FB network, there is also no
| connectivity from the FB network to the Internet right now.
| susahahhaha wrote:
| add comment
| aritraghosh007 wrote:
| The down page shows copyright from 2020 smh
| fartingflamingo wrote:
| From the archived ramenporn reddit comment thread at [0]:
|
| > This must be incredibly stressful so for your sake I hope you
| sort it out quickly... but for the world's sake, I hope you fail
| and make the problem worse before jumping ship followed by every
| other engineer, leaving it to Zuckerberg to fix himself. But I
| still hope it's not too stressful for you!
|
| https://archive.is/Idsdl
| ur-whale wrote:
| > Facebook-owned sites are down
|
| And the world rejoiced.
| new-day-rising wrote:
| Thoughts and prayers...
| [deleted]
| CodeGlitch wrote:
| Looks like HN is being hit pretty hard right now?
| mrfusion wrote:
| Thought experiment: what if they were down for a week and the
| world completely healed itself?
| quattrofan wrote:
| And right now, the world is a slightly better place
| gunshai wrote:
| Posting this comment will be like farting into a hurricane, but
| here goes.
|
| Company like Facebook has a serious problem and their stock drops
| ... precipitously. CEO of said company instead of selling their
| equity in their company has taken out loans against their equity
| in order to decrease their tax burden and cash in on the value of
| their equity.
|
| What amount of decrease would cause a margin call from lenders
| for the forced sale of said equity and subsequently the loss of
| majority stake in their own company? Now obviously only the
| lenders know this information and assuming I have the rough order
| of operations correct.
|
| Could this be a potential chink in the armor of founders / CEOs /
| anyone who takes out low interest loans against the equity they
| hold in their company? Maybe my understanding of this is too
| simplified.
| osrec wrote:
| Margin calls don't really exist as far as loans are concerned.
| Once you have agreed collateral, and an agreed schedule of
| payment, you only get in trouble if you miss a payment,
| regardless of how the collateral fluctuates in value.
| gunshai wrote:
| Okay, but as the value of the collateral approaches 0 your
| lender asks you to increase your collateral correct?
| odonnellryan wrote:
| Yes. But banks won't loan 100% on equities.
|
| This scenario would just be basically impossible.
| gunshai wrote:
| I am not sure you understand my question / hypothetical.
| A bank is not the only form of a lender first off, second
| the reason there isn't a 100% loan on an equity is that
| it's understood that the value of the underlying
| collateral can fluctuate. These are called over
| collateralized loans.
| MichaelBurge wrote:
| I currently have a $150k margin loan, and it absolutely will
| lead to forced liquidation if the collateral drops in value:
| Interactive Brokers was clear about that.
|
| Separately, my bank tried to sell me a "Pledged Asset Line of
| Credit" that would also have required the collateral to
| maintain a certain value or there would be forced
| liquidations.
|
| Can you link an example of a bank or similar that lets you
| borrow against stock or options collateral without margin
| call risk?
| osrec wrote:
| A margin loan is very different to the kind of financing
| agreement a company will enter into. You are using the
| money at IB to speculate, and probably purchasing volatile
| assets at that. A company will generally utilise that money
| very differently, and it is unlikely that a lending
| institution will accept shares as collateral due to the
| wrong way risk (i.e. if they can't service their debt,
| their shares are probably losing value too, so probably not
| good as collateral).
| xwdv wrote:
| He might not be speculating, he might be holding a bunch
| of SPY shares and simply withdrew $150k as a margin loan
| so he can make a purchase on a house or car but not pay
| taxes on gains yet from selling his shares, opting
| instead to pay off the loan over time through regular
| deposits.
| dtnewman wrote:
| That might be true for a house or a small scale loan, but
| once you are dealing with billions I doubt that's the case. I
| assume that it works as follows: you have $1b in stock, Bank
| gives you $500m line of credit. If stock goes down enough
| they force a sale, but they only sell against what you have
| actually utilized in your line of credit. If you are Mark
| Zuckerberg and worth more than $100 billion, you probably
| don't have any issues. If you add up all of his houses and
| planes and cars it probably doesn't add up to more than 1% of
| that. He's fine.
| fieldcny wrote:
| Loans based on assets like stocks/bonds/other assets with
| highly variable prices always have collateral requirements.
| If the loan is backed by 100M in facebook shares and the
| price of stock drops in half you will have to hand over more
| stock for collateral. If the price doubles, you can ask for
| your collateral back.
|
| It is doubtful Z has any margin call issues as he has so much
| stock, I can't imagine he would have pledged even 5% of it
| for loans, so he can just hand them another chunk without
| even blinking (which he generally doesn't do anyway)
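The collateral arithmetic described above can be made concrete. This is a sketch under stated assumptions: the `max_ltv` (maximum loan-to-value) threshold and the function name are hypothetical, and real loan agreements may use different triggers. The figures in the usage note are the thread's own illustrative numbers, not actual loan terms.

```python
def additional_collateral_needed(loan, collateral_value, max_ltv):
    """Extra collateral value needed to bring loan / collateral back to max_ltv.

    max_ltv is an assumed contractual limit, e.g. 0.5 means the lender
    wants collateral worth at least twice the outstanding loan.
    """
    if not 0 < max_ltv <= 1:
        raise ValueError("max_ltv must be in (0, 1]")
    required_collateral = loan / max_ltv
    # No top-up is needed while the collateral still covers the requirement.
    return max(0.0, required_collateral - collateral_value)
```

With a $50m loan against $100m of stock and an assumed 50% limit, `additional_collateral_needed(50e6, 100e6, 0.5)` is zero; if the stock halves to $50m, the same call with `collateral_value=50e6` asks for another $50m of collateral.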
| odonnellryan wrote:
| Nah. Almost certainly he could lose 100% of that value of the
| stock without being at risk of anything like this (as in, he
| probably put up $100m if he wanted a $50m loan, etc..)
|
| Even if he didn't, the bank would let him move funds in without
| forcing him to sell.
| joshmlewis wrote:
| I don't know much about this but from the limited amount
| I've read it is probably only a portion of the equity owned,
| and generally when borrowing against an asset the lender will
| not give you 100% of that asset's value to protect from downside
| risk. Another probability is that it was adjusted in the past,
| potentially year(s) ago, and FB's stock price a year ago was
| almost $100 a share less than current so a $10 drop is not a
| big deal in the long term.
|
| You raise an interesting question though and I'd like to know
| the answer as well!
| husainhz7 wrote:
| Whatsapp's down too. Tough month for FB, especially with the
| leak.
| decrypt wrote:
| What leak are you referring to?
| Wistar wrote:
| https://www.cbsnews.com/news/facebook-whistleblower-
| frances-...
| lawwantsin17 wrote:
| It's all over the news
| samwilliams wrote:
| Whistleblower that spoke to WSJ.
| nerbert wrote:
| https://www.nytimes.com/2021/10/03/technology/whistle-
| blower...
| dmix wrote:
| The actual leak was published on WSJ "The Facebook Files"
|
| https://www.wsj.com/articles/the-facebook-files-11631713039
| orangepurple wrote:
| Facebook Whistleblower Claims Profit Was Prioritized Over
| Clamping Down on Hate Speech
|
| A Facebook whistleblower, who is due to testify before
| Congress on Tuesday, has accused the Big Tech company of
| repeatedly putting profit before doing "what was good for the
| public," including clamping down on hate speech.
|
| Frances Haugen, who told CBS's "60 Minutes" program that she
| was recruited by Facebook as a product manager on the civic
| misinformation team in 2019, said she and her attorneys have
| filed at least eight complaints with the U.S. Securities and
| Exchange Commission.
|
| During her appearance on the television program on Sunday,
| Haugen revealed that she was the whistleblower who provided
| the internal documents for a Sept. 14 expose by The Wall
| Street Journal that claims Instagram has a "toxic" impact on
| the self-esteem of young girls.
|
| That investigation claimed that the social media giant knows
| about the issue but "made minimal efforts to address these
| issues and plays them down in public."
|
| "The thing I saw at Facebook over and over again was there
| were conflicts of interest between what was good for the
| public and what was good for Facebook. And Facebook, over and
| over again, chose to optimize for its own interests, like
| making more money," said Haugen.
|
| She explained that Facebook did so by "picking out" content
| that "gets engagement or reaction," even if that content is
| hateful, divisive, or polarizing, because "it's easier to
| inspire people to anger than it is to other emotions."
|
| "Facebook has realized that if they change the algorithm to
| be safer, people will spend less time on the site, they'll
| click on less ads, they'll make less money," she claimed.
|
| Haugen is expected to testify at a Senate hearing on Oct.
| 5 titled "Protecting Kids Online," about Facebook's knowledge
| regarding the photo sharing app's allegedly harmful effects
| on children.
|
| During her appearance on the television program, Haugen also
| accused Facebook of lying to the public about the progress it
| made to rein in hate speech on the social media platform. She
| further accused the company of fueling division and violence
| in the United States and worldwide.
|
| "When we live in an information environment that is full of
| angry, hateful, polarizing content it erodes our civic trust,
| it erodes our faith in each other, it erodes our ability to
| want to care for each other. The version of Facebook that
| exists today is tearing our societies apart and causing
| ethnic violence around the world," she said.
|
| She added that Facebook was used to help organize the breach
| of the U.S. Capitol building on Jan. 6, after the company
| switched off its safety systems following the U.S.
| presidential elections.
|
| While she believed no one at Facebook was "malevolent," she
| said the company had misaligned incentives.
|
| "Facebook makes more money when you consume more content,"
| she said. "People enjoy engaging with things that elicit an
| emotional reaction. And the more anger that they get exposed
| to, the more they interact and the more they consume."
|
| Shortly after the televised interview, Facebook spokesperson
| Lena Pietsch released a statement pushing back against
| Haugen's claims.
|
| "We continue to make significant improvements to tackle the
| spread of misinformation and harmful content," said Pietsch.
| "To suggest we encourage bad content and do nothing is just
| not true."
|
| Separately, Facebook Vice President of global affairs Nick
| Clegg told CNN before the interview aired that it was
| "ludicrous" to assert social media was to blame for the
| events that unfolded on Jan. 6.
|
| The Epoch Times has reached out to Facebook for additional
| comment.
|
| https://www.theepochtimes.com/facebook-whistleblower-
| claims-...
| 542458 wrote:
| While (editorial commentary aside) the basic facts in that
| article are accurate as far as I can tell, I'd be careful
| with that source - The Epoch Times is a mouthpiece for
| Falun Gong's political interests and engages in disinfo
| programs.
|
| https://en.wikipedia.org/wiki/The_Epoch_Times
|
| They also previously ran a large sockpuppet network on
| Facebook and the Facebook ad platform (both of which have
| since been banned) so they may have a bit of a bone to pick
| with the platform.
| cronix wrote:
| Here's what wikipedia says about using wikipedia as a
| source: https://en.wikipedia.org/wiki/Wikipedia:Researchi
| ng_with_Wik...
| hkai wrote:
| This sounds so extremely far-fetched and designed to create
| a negative impression of Facebook. Does anyone even take
| this seriously? If yes, why?
|
| > mobile phones are addictive
|
| > internet is used to organize protests
| TrackerFF wrote:
| Anyone wanna estimate the cost of total downtime for facebook and
| instagram, as far as lost ad revenue goes - per minute?
| devenvdev wrote:
| Looks like someone built a counter:
| https://facebookadloss.facebookadloss.repl.co/
|
| Edit: the counter just jumped from 10B to 60M, so I doubt it's
| any reliable :)
| emptysea wrote:
| Interesting, even some open source sites like:
| https://fbinfer.com are down
|
| but https://glean.software and https://reactjs.org aren't
| jaywalk wrote:
| Anything hosted on Facebook's infrastructure is down. The two
| sites that you note are up aren't hosted by Facebook.
| Graffur wrote:
| What's react hosted on?
| jaywalk wrote:
| Looks like Cloudflare nameservers and Vercel hosting.
| mcintyre1994 wrote:
| According to https://lookup.icann.org/lookup both glean and
| reactjs have Cloudflare nameservers. fbinfer has
| ns.facebook.com nameservers which are presumably down.
| rcarmo wrote:
| Unsurprisingly, Oculus is down as well, as are most services for
| the VR headset. So that's 4 major properties right now.
| wyager wrote:
| Can you not use an Oculus headset if FB servers are down?
| That's absurd.
| rcarmo wrote:
| Some preloaded apps work (like YouTube, Firefox), but stuff
| like settings, the lobby, etc., are very slow or display
| "Unable to Load" messages. Any game that relies on your
| friends list seems to freeze for a while, then try to carry
| on.
| Reubachi wrote:
| Yep.
|
| We've come full circle, where techies are rediscovering the
| original hatred for the Oculus, that it is tied to a social
| media walled garden, for some reason.
| boston_clone wrote:
| I think you may need a refresher on their history -
| https://en.wikipedia.org/wiki/Oculus_(brand)#History
|
| When FB announced they would be buying Oculus, they
| promised that no social media integration would be
| required. FB breaking that promise is not the same as
| Oculus having that requirement from the get-go.
|
| What original hatred are you talking about??
| Reubachi wrote:
| Originally, you did not need a facebook account to use
| oculus after purchase. They framed this as "you do not
| need to integrate social media/facebook".
|
| This ^ means fuck-all, because at that time (day 1),
| their oculus services were hosted in the same
| infrastructure as their social media services.
|
| Last year, they got rid of "you do not need a facebook
| account". But in all situations since inception, all of
| your data is passing through the same infrastructure as
| facebook data. It may not be being exposed, or targeted
| for advertising, but this WAS a huge point of contention
| years back.
| samstave wrote:
| Well, You cant access a city if the freeway has been
| bombed...
|
| Remember the 'information super-highway'? Yeah it gets carpet
| bombed constantly....
| squeaky-clean wrote:
| The Rift headsets probably still work fine, but the Quest
| headsets require a FB connection to work.
| Apocryphon wrote:
| I couldn't help but to think of the fellow who has no monitors
| but uses an Oculus for virtual displays full-time, from last
| week
|
| https://news.ycombinator.com/item?id=28678041
| tut-urut-utut wrote:
| Facebook down, WhatsApp down, but Signal still works. Time for a
| change?
|
| EDIT: Yes, Signal is not federated, but that's what people are at
| least ready to consider as a WhatsApp alternative. I also created
| Matrix / Element account, and had 0 contacts using it already.
| WallyFunk wrote:
| > Signal still works
|
| https://old.reddit.com/r/thehatedone/comments/f160jh/is_sign...
|
| Signal is still centralized and uses AWS. So if AWS was to go
| down, it would affect not just Signal but vast swathes of the
| Internets.
| Arathorn wrote:
| The reason contacts don't tend to show up on Matrix/Element is
| because we don't push the user into sharing their addressbooks
| given the obvious privacy issues. Instead you mainly have to
| figure out who you know out-of-band for now (e.g. tweet "hey,
| who's on Matrix?").
| fsflover wrote:
| Signal had their own share of downtime. How about going to a
| federated system instead of repeating the same mistakes?
| (https://matrix.org)
| JCWasmx86 wrote:
| But still a part would be down, if a server has an outage.
| How about a system, where every device that is used for
| chatting is a server at the same time? I wonder whether
| something like that already exists. Bundle it together with
| bigger servers to handle the load. If the bigger servers
| experience outages, the service can still continue, although
| a bit slower
| fsflover wrote:
| Matrix P2P already exists.
| JCWasmx86 wrote:
| Thanks, didn't know that!
| sparrish wrote:
| Seems to be DNS related.
|
| None of the listed facebook nameservers are resolvable or
| reachable:
|
| a.ns.facebook.com
| b.ns.facebook.com
| c.ns.facebook.com
| d.ns.facebook.com
| zulln wrote:
| In the beginning it responded but gave server errors.
| jaywalk wrote:
| Which seems to indicate a massive infrastructure failure.
| sakisv wrote:
| Actually I'd argue that the biggest problem would be to
| wait for the TTL to expire after you've fixed the problem.
| jaywalk wrote:
| The TTL was most likely very low, so I don't see that as
| being an issue.
| sparrish wrote:
| Looks like the routing is goofed. Loops over and over - DDoS
| attacking themselves.
|
| mtr -r -c10 -w -b a.ns.facebook.com
|
| Start: 2021-10-04T10:02:50-0600
|                                                                         Loss%  Snt  Last   Avg  Best  Wrst StDev
| ...
|  4.|-- ae-2-rur101.cosprings.co.denver.comcast.net (162.151.51.125)       0.0%   10  12.6  11.9   9.6  19.0   2.9
|  5.|-- 24.124.155.233                                                     0.0%   10   9.3  10.2   9.1  12.4   1.1
|  6.|-- 96.216.22.45                                                       0.0%   10  12.0  14.0  11.6  31.3   6.1
|  7.|-- be-36041-cs04.1601milehigh.co.ibone.comcast.net (96.110.43.253)   20.0%   10  14.6  13.5  11.6  20.5   3.0
|  8.|-- be-3402-pe02.910fifteenth.co.ibone.comcast.net (96.110.38.126)     0.0%   10  12.2  12.0  11.5  13.2   0.5
|  9.|-- 173.167.59.170                                                     0.0%   10  13.8  17.8  12.0  34.7   8.4
| 10.|-- 129.134.40.74                                                      0.0%   10  15.3  12.6  11.4  15.3   1.1
| 11.|-- 129.134.43.226                                                     0.0%   10  18.9  15.3  12.6  20.3   3.0
| 12.|-- 129.134.98.166                                                     0.0%   10  12.5  14.2  12.5  20.4   2.3
| 13.|-- 129.134.54.61                                                      0.0%   10  34.2  30.8  28.9  34.2   1.8
| 14.|-- 129.134.53.61                                                      0.0%   10  29.8  31.1  28.9  36.5   2.7
| 15.|-- 129.134.53.61                                                     90.0%   10  31.9  31.9  31.9  31.9   0.0
| sparrish wrote:
| Same issue over at b.ns.facebook.com Looping routing creating
| self-inflicted DDoS
|
| mtr -r -c10 -n b.ns.facebook.com
|
| Start: 2021-10-04T10:28:03-0600
|                         Loss%  Snt  Last   Avg  Best  Wrst StDev
|  1.|-- 192.168.1.1       0.0%   10   0.2   0.2   0.2   0.3   0.0
|  2.|-- 96.120.12.229     0.0%   10  10.2  10.8   8.8  15.7   1.9
|  3.|-- 96.110.149.185    0.0%   10  17.7  13.6   9.8  32.3   7.0
|  4.|-- 162.151.51.125    0.0%   10  10.9  12.2   9.6  15.3   1.9
|  5.|-- 24.124.155.233    0.0%   10  13.0  10.4   9.4  13.0   1.2
|  6.|-- 96.216.22.45      0.0%   10  16.5  16.7  11.2  29.1   6.4
|  7.|-- 96.110.43.241     0.0%   10  17.4  13.6  11.9  17.4   1.6
|  8.|-- 96.110.38.114     0.0%   10  12.5  12.8  12.0  14.0   0.6
|  9.|-- 173.167.59.170    0.0%   10  36.1  19.3  11.6  36.1   9.7
| 10.|-- 129.134.40.76     0.0%   10  13.1  12.3  11.3  13.1   0.6
| 11.|-- 129.134.34.72     0.0%   10  15.3  15.7  13.5  21.3   2.5
| 12.|-- 129.134.102.85    0.0%   10  39.0  39.2  38.0  40.8   1.0
| 13.|-- 31.13.25.13       0.0%   10  30.5  29.8  28.5  31.0   0.9
| 14.|-- ???              100.0   10   0.0   0.0   0.0   0.0   0.0
| 15.|-- ???              100.0   10   0.0   0.0   0.0   0.0   0.0
| 16.|-- 31.13.25.13      90.0%   10  30.2  30.2  30.2  30.2   0.0
|
| On a side note... why is it so freaking hard to format line
| breaks in HN?
| blablablub wrote:
| Glad to report that facebook's dns in china is not affected. You
| can dig facebook.com and the depths of the internet happily reply
| with a random ip address as usual.
| ivanmontillam wrote:
| One thing that triggers my OCD is leaving the Facebook session
| open, though it's my own computer.
|
| Maybe it's DNS. It's always DNS.
| debacle wrote:
| Facebook went down (error page, then 503) before DNS went down.
| marbex7 wrote:
| Same for me.
| WallyFunk wrote:
| > Maybe it's DNS
|
| If it is, close Facebook as there's probably a BGP hijack going
| on that is siphoning off personal data and or secrets
| yawnxyz wrote:
| I thought Facebook, Instagram and WhatsApp ran on different
| infrastructure (and they've been trying for a while to align
| everything)?
|
| How could they all go down at the same time, if they have
| different teams of engineers running each product separately?
|
| Could anyone with some background (or person familiar with the
| matter) explain how their system's set up?
| toast0 wrote:
| WhatsApp and Instagram are both in FB infra. As I understand
| it, Instagram is fairly integrated with FB services; when I
| left in 2019, WhatsApp was less so, it was _mostly_ WhatsApp
| specific containers running with FB's container orchestration
| on FB machines dedicated to WhatsApp (there was and probably is
| some dependence on FB systems for some parts of the app, for
| example the server side of multimedia is mostly a FB system
| with some small tweaks and specific settings, but chat should
| be relatively isolated). Inbound connection loadbalancing is
| shared though.
|
| FWIW, WhatsApp (on phones) should be resilient to a DNS only
| outage, the clients contain fallback IPs to use _when_ DNS
| doesn't work, and internal services don't use DNS as far as I
| remember.
|
| At one time, WhatsApp had actually separate infrastructure at
| SoftLayer (IBM Cloud now), but that hasn't been in place for
| quite some time now. When I left, it was mostly just HAProxy to
| catch older clients with SoftLayer IPs as their DNS fallback.
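The fallback behaviour described above (try DNS, then fall back to addresses shipped with the client) might look roughly like the following. This is a sketch only: the function name, the hostname and the 192.0.2.x / 198.51.100.x documentation addresses in the usage note are assumptions for illustration, not WhatsApp's actual client logic or endpoints.

```python
import socket


def resolve_with_fallback(hostname, fallback_ips, resolver=socket.gethostbyname):
    """Try DNS first; if resolution fails, use the first bundled fallback IP.

    Returns (ip, source) where source is "dns" or "fallback".
    `resolver` is injectable purely so the sketch can run offline.
    """
    try:
        return resolver(hostname), "dns"
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        if not fallback_ips:
            raise
        return fallback_ips[0], "fallback"
```

For example, `resolve_with_fallback("chat.example.invalid", ["192.0.2.10"])` would fall back to 192.0.2.10 when the name cannot be resolved. As noted elsewhere in the thread, this only helps in a DNS-only outage; if the routes to the fallback address are withdrawn as well, the connection still fails.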
| vodkapump wrote:
| Seems unrelated to their infrastructure, the DNS records for
| facebook.com, instagram.com, whatsapp.com and all derivative
| domains are wiped clean it seems
|
| edit: though saying that, they do run their own registrar...
| Might've fucked something up over there.
| cblconfederate wrote:
| It's like a nuclear bomb exploded on the internet.
| asduoihfijnu wrote:
| add comment
| ionwake wrote:
| Why is this not the first post on hn? It's double the points and
| has over 590 comments?
| raimille1 wrote:
| Whatsapp down as well
| jleyank wrote:
| Somebody published vogon poetry and pictures and the internet
| routed around the damage...
| poetaster wrote:
| I have not laughed so loud since before the virus that shall
| have no vogon name.
| _yo2u wrote:
| value of fb directly proportional to traffic flow curious about
| why fb is down ;)
| EastOfTruth wrote:
| We can only hope that they will be gone forever... and HN is
| having major issues at the same time!
| elboru wrote:
| Is it just me or HN also feels kinda laggy?
| 14 wrote:
| Definitely laggy for me as well. Went to Facebook and couldn't
| so come here to check in and the load time made me think oh it
| must be my wifi is not working with 2 sites not opening then
| finally HN opened. Then tried to hit reply to your post and
| again seemed like it wouldn't load then finally did. So yes
| laggy usually this is the one site that loads almost instantly.
| ourcat wrote:
| Here too. Just had the "We're having some trouble serving your
| request. Sorry!" error.
| cvhashim wrote:
| Some internet backbone provider is probably down itself.
| leafygreene wrote:
| Or some country has started a war.
| comeonseriously wrote:
| Can confirm. HN, YT, Google, etc are all a bit laggy for me at
| the moment (eating lunch so I'm trying to entertain myself).
| busymom0 wrote:
| Yep. I am the developer of HN client HACK for iOS and Android
| and a bunch of users emailed me asking if my app was broken.
| Looks like something bigger is afoot.
| gridder wrote:
| Best HN client app ever. Thanks for the great work!
| alexdumitru wrote:
| Something's wrong with your app. It's not working at all,
| while Harmonic works perfectly.
| flypaca wrote:
| Not just you. It is very laggy on my end too.
| Jamie9912 wrote:
| Yep, struggled to load the homepage and this
| Jyaif wrote:
| With the Facebook properties down, the rest of the internet
| will have a significant increase in usage.
| dcminter wrote:
| Plus I don't know about you, but I came to HN just now
| specifically to check if there was any insight into _why_ it
| was down! The thundering herd just arrived :)
| rocho wrote:
| I can confirm, HN, GitHub and Slack are very slow for me as
| well. Google is very fast, on the other hand.
| yk wrote:
| People either have to work, creating load on GitHub, or waste
| their time elsewhere, creating load on HN and Slack.
| ggerules wrote:
| Slow for me also.
| gcoguiec wrote:
| Dropping that many BGP routes will have its high latency toll
| on the whole internet backbone for minutes/hours, I'm not
| surprised. I wonder if the recent LE's DST Root CA X3
| deprecation has something to do with the outage (some DC
| internal tool/API not accessible because its certificate is
| expired or something like that).
| szundi wrote:
| People probably got more time to work.
| alexellisuk wrote:
| Also slow here. I can't see anything on the AWS Service
| Dashboard https://status.aws.amazon.com
| 1_player wrote:
| In my experience, any service dashboard is useless unless
| the problem has been going on for so long (i.e. hours) that
| it is obvious something's wrong.
| erhk wrote:
| AWS punishes its sysadmin teams for any downtime so there
| is heavy incentive to not report unless there is a
| community shaped gun pointed at your head. This is not a
| universal problem.
| pilsetnieks wrote:
| All running their DNS on AWS. My guess is that AWS is seeing
| a massive flood of failed and retried DNS requests for
| facebook properties, similar to what jgrahamc mentions here
| for Cloudflare:
| https://twitter.com/jgrahamc/status/1445066136547217413
| throwdecro wrote:
| Is there a "Kessler syndrome" analogue for the internet,
| where failures beget failures until it's just an
| impenetrable cloud of fail, forever?
| motoboi wrote:
| Until someone smashes the "SEND MOAR SERVERS" button.
| nashadelic wrote:
| There's such a thing called the "Thundering Herd"
| problem, that partially matches.
|
| From wiki: the thundering herd problem occurs when a
| large number of processes or threads waiting for an event
| are awoken when that event occurs, but only one process
| is able to handle the event. When the processes wake up,
| they will each try to handle the event, but only one will
| win.
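The retry floods mentioned earlier in this subthread are related to, but not the same as, the classic thundering herd definition quoted above. A common client-side mitigation for synchronized retries is exponential backoff with random jitter, sketched below. The function name and the default parameters are assumptions; nothing here describes how Facebook's apps or any resolver actually behave.

```python
import random


def backoff_delays(attempts, base=0.5, cap=60.0, rng=None):
    """Return one randomized delay in seconds per retry attempt ("full jitter").

    The ceiling doubles with each attempt up to `cap`; the actual delay is
    drawn uniformly from [0, ceiling] so clients do not retry in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays
```

A client would sleep for each delay in turn between failed requests. The randomness is the point of the design: without it, every client that failed at the same moment retries at the same moment too.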
| qwertox wrote:
| I can't see how this is the reason for HN to take 10
| seconds for the response of the main page (I mean, the URL
| fetched from the address bar, not the subrequests the page
| does), as everything else downloads immediately.
|
| The DNS entries should be cached by the browser (and the
| middleware), so that this problem should only happen once,
| but I get this constantly.
|
| Also, I sometimes get an error message from HN, which seems
| to indicate that this is some backend issue which fails
| gracefully with a custom "We're having some trouble serving
| your request. Sorry!" on top of a 502 code.
|
| It feels more like there is something else still broken.
| pilsetnieks wrote:
| In the case of HN it's probably just heavier load than
| normal. It's much faster if you're logged out.
| rocky1138 wrote:
| A couple of years ago, an admin at Hacker News asked those of
| us who are just reading to log out because their system is
| architected in such a way that logged in users use more
| resources than anonymous ones. So, if you're feeling
| altruistic, log out of HN!
| busymom0 wrote:
| Logging out does work! Probably delivering a cache.
| rocho wrote:
| I can confirm, HN, GitHub and Slack are very slow for me as
| well. Google is very fast, on the other hand.
|
| EDIT: actually HN failed to post this comment the first time I
| posted it!
| eeegnu wrote:
| Probably people flooding in to see if anyone knows why things
| are down. Even Google speed test was down, presumably from too
| many people testing if it's their internet that's at issue.
| deadalus wrote:
| https://www.speedtest.net is down too
| blntechie wrote:
| The site is working fine for me. Speedtest CLI also is
| useful but doubt when DNS is down.
| tedmiston wrote:
| working for me
| kzrdude wrote:
| I'd guess that automatic processes dominate. Maybe billions
| of phone apps polling for facebook connectivity (FB messenger
| is down, for example).
| neom wrote:
| HN lagging, BBC was also very laggy about 30 minutes ago, and
| 35 minutes ago our whole company got booted out of their
| various hangouts simultaneously apart from the people in the
| states.
| donkarma wrote:
| yeah lagging for me too
| raymondh wrote:
| It is slow for me too.
| Yuioup wrote:
| Same here. Sounds like another cloudflare-like problem.
| foobarbecue wrote:
| Internet's got a case of the Mondays for sure
| bentcorner wrote:
| General tip: If HN is being laggy and you're determined you
| want to waste some time here, open it in a private window. HN
| works extremely quickly if it doesn't know who you are.
| iamthemalto wrote:
| Wow this really works, thank you. What actually is the reason
| for it being much faster in a private window? Is there so
| much tracking going on in a normal window?
| MrStonedOne wrote:
| One of the first optimizations large/high-traffic sites
| will do is cache pages for logged-out users. Even if the
| cache is only valid for a minute, that's still a huge
| reduction in server traffic.
|
| The cache is faster because it doesn't have to talk to the
| database, and it can be served by the load-balancing layers
| rather than the actual application layer.
|
| Wikipedia does this too (although via a layer that adds back
| the IP talk-page header).
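| A minimal sketch of the idea above, not HN's or Wikipedia's actual
| code: anonymous requests share one rendered copy per path for a
| short TTL, while logged-in requests are always rendered. The names
| PageCache, render and ttl_seconds are made up for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PageCache:
    """Serve anonymous requests from a short-lived cache (illustrative only)."""

    def __init__(
        self,
        render: Callable[[str, Optional[str]], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._render = render          # hypothetical page renderer (hits the database)
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the TTL can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.render_calls = 0          # counts how often the expensive path ran

    def get(self, path: str, user: Optional[str] = None) -> str:
        if user is not None:
            # Logged-in pages carry per-user data, so they bypass the cache.
            self.render_calls += 1
            return self._render(path, user)

        now = self._clock()
        hit = self._entries.get(path)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]              # still fresh: no rendering, no database

        self.render_calls += 1
        page = self._render(path, None)
        self._entries[path] = (now, page)
        return page
```

| With a one-minute TTL, any number of anonymous hits on the same
| path within that minute costs a single render.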
| thinkingemote wrote:
| It's faster because the pages are cached; they are
| effectively static. It's slower when logged in because the
| pages are created dynamically, as they include your username,
| tracked favourites, upvotes etc., and much of that cannot be
| quickly cached.
| squeaky-clean wrote:
| You can also just log out instead of opening a private
| window. Users that aren't logged in are served cached
| pages.
| quickthrower2 wrote:
| Could they offer cached pages to logged-in users as an
| optimization? You only need to invalidate when a user
| posts a comment; most of the time you are reading, not
| commenting.
| elboru wrote:
| That explains why it works fine on my computer, where I
| haven't logged in. Thanks for the tip.
| quaintdev wrote:
| This works like a charm. Thank you!
| Ancalagon wrote:
| This would make for a very good deep-dive technical
| discussion in an interview setting, I'm using this.
| jose-cl wrote:
| yes, me too (I'm in south-america)
| [deleted]
| amelius wrote:
| Yes, it's slow here as well, and posting this comment failed
| the first time.
| amelius wrote:
| Yes, it's slow here as well, and posting this comment failed
| the first and second time.
| amelius wrote:
| Yes, it's slow here as well, and posting this comment failed
| the first and second and third time.
| amelius wrote:
| Yes, it's slow here as well, and posting this comment failed
| the first and second and third and fourth time.
| bradenb wrote:
| This is either a hilarious accident or genius comedy.
| tzs wrote:
| This is not too rare when HN is being slow and giving those
| "We're having some trouble serving your request. Sorry!"
| pages.
|
| If you get one of those on your comment submission you have
| no way to know if the trouble stopped it from accepting the
| comment or if it accepted the comment and then ran into
| trouble trying to display the updated thread.
|
| For some reason I can't even begin to guess at, HN does not
| seem to have protection against multiple submissions of the
| same form, so if after getting "We're having some trouble
| serving your request. Sorry!" on your comment submission
| you hit refresh to display the page and the form gets
| resubmitted, you get a duplicate comment.
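| One common way to guard against this, sketched here as an
| assumption rather than anything HN is known to do: embed a
| one-time token in each rendered form and treat a replay of the
| same token as "already accepted". CommentForm, issue_token and
| submit are hypothetical names.

```python
import secrets
from typing import Dict, List, Set


class CommentForm:
    """Deduplicate form submissions with a one-time token (illustrative only)."""

    def __init__(self) -> None:
        self._pending: Set[str] = set()     # tokens issued but not yet used
        self._used: Dict[str, int] = {}     # token -> id of the comment it created
        self.comments: List[str] = []

    def issue_token(self) -> str:
        # Called when the comment form is rendered; the token goes in a hidden field.
        token = secrets.token_hex(16)
        self._pending.add(token)
        return token

    def submit(self, token: str, text: str) -> int:
        if token in self._used:
            # A refresh resubmitted the same form: return the original comment.
            return self._used[token]
        if token not in self._pending:
            raise ValueError("unknown form token")
        self._pending.remove(token)
        self.comments.append(text)
        comment_id = len(self.comments) - 1
        self._used[token] = comment_id
        return comment_id
```

| If the response is lost after the comment is stored, resubmitting
| the form returns the existing comment instead of creating a second.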
| i_like_apis wrote:
| Probably traffic related. Lots of people reallocated to
| checking other sites.
| Romanulus wrote:
| It's all those people coming back to the real web.
| StapleHorse wrote:
| That explains the new contacts in Telegram.
| klik99 wrote:
| Is this related to the outages from the Let's Encrypt root cert
| expiring? Probably not, since this looks like a DNS issue, but
| it's still a crazy coincidence that two major internet-breaking
| events happen in the same week.
| treesknees wrote:
| There is zero reason to believe it's related at all. It's
| perfectly reasonable to have multiple large and unrelated
| failures in the same week.
|
| I also wouldn't classify the loss of 1 company, and the
| expiration of some TLS certificates, as the interconnected
| network of networks being broken. The Internet has continued to
| function even if some larger players were unreachable or having
| issues.
| tomohawk wrote:
| Many local governments use FB to get info out.
|
| Events like this show they should use multiple outlets instead of
| the big monopoly.
|
| Alternatives like gab exist, but it's incredibly hard to gain
| traction against the big monopolies.
| [deleted]
| alecfreudenberg wrote:
| _smiles and eats popcorn_
| EB66 wrote:
| Is anyone else seeing knock-on effects at the other major public
| DNS providers? I'm seeing nslookups sent to 4.2.2.2 and 8.8.8.8
| intermittently time out if the hostname does not belong to a major
| website. CloudFlare DNS (1.1.1.1) doesn't appear to be impacted.
| For example:
|
|     [root@app ~]# nslookup downforeveryoneorjustme.com 4.2.2.2
|     ;; connection timed out; trying next origin
|     ;; connection timed out; no servers could be reached
|
|     [root@app ~]# nslookup downforeveryoneorjustme.com 1.1.1.1
|     Server: 1.1.1.1
|     Address: 1.1.1.1#53
|
|     Non-authoritative answer:
|     Name: downforeveryoneorjustme.com
|     Address: 172.67.166.187
|     Name: downforeveryoneorjustme.com
|     Address: 104.21.91.48
|
|     [root@app ~]#
|
| Perhaps DNS queries are skyrocketing and overwhelming some of the
| major public DNS servers.
| rstupek wrote:
| Yes same issue for me with 8.8.8.8, errors for everything but
| big domains.
| r721 wrote:
| See this thread (with replies):
|
| >Now, here's the fun part. @Cloudflare runs a free DNS
| resolver, 1.1.1.1, and lots of people use it. So Facebook etc.
| are down... guess what happens? People keep retrying. Software
| keeps retrying. We get hit by a massive flood of DNS traffic
| asking for http://facebook.com
|
| https://twitter.com/jgrahamc/status/1445066136547217413
|
| >Our small non profit also sees a huge spike in DNS traffic.
| It's really insane.
|
| https://twitter.com/awlnx/status/1445072441886265355
|
| >This is frontend DNS stats from one of the smaller ISPs I
| operate. DNS traffic has almost doubled.
|
| https://twitter.com/TheodoreBaschak/status/14450732299707637...
| pixxel wrote:
| No idea if it's related but a lot of Tor websites have also
| been offline all day (BBC, ProtonMail etc).
| uo21tp5hoyg wrote:
| Yeah Cloudflare have said they're being flooded with Facebook
| DNS retries and had to get extra hands on deck to deal with the
| influx.
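| The retry flood described above is usually tamed on the client
| side with exponential backoff plus jitter. A minimal sketch,
| assuming a "full jitter" scheme; backoff_delays and its base/cap
| defaults are illustrative, not taken from any resolver or app
| mentioned in this thread.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int,
    base: float = 1.0,
    cap: float = 300.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one delay (in seconds) per retry attempt.

    Attempt i waits a random time in [0, min(cap, base * 2**i)], so the
    upper bound doubles each time and clients do not retry in lockstep.
    """
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** i)) for i in range(attempts)]


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(8)):
        print(f"attempt {attempt}: wait up to {min(300.0, 2 ** attempt):.0f}s, chose {delay:.2f}s")
```

| The randomness matters as much as the doubling: without it, every
| client that failed at the same moment retries at the same moment.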
| mcintyre1994 wrote:
| I love how understated companies always are about things like
| this.
|
| > Facebook said: "We are aware some people are having trouble
| accessing our apps and products. We are working to get things
| back to normal as quickly as possible and apologise for any
| inconvenience."
|
| https://www.bbc.co.uk/news/technology-58793174
| LuisMondragon wrote:
| Some Oculus Quest owners can't use their device
| https://www.reddit.com/r/OculusQuest/comments/q18xwy/faceboo...
| littlecranky67 wrote:
| The media coverage and lots of the comments don't make sense to
| me. FB would not be so stupid as to put all of their crucial DNS
| servers into a single autonomous system (which is now offline due
| to BGP issues). They operate literally dozens of datacenters
| around the world, and are surely not using a single AS for them -
| why not put secondary nameservers there? Can someone make sense
| of this?
| mdavidn wrote:
| Sounds like automation deployed a configuration update to most
| of Facebook's peering routers simultaneously. Something similar
| brought down Google in 2019.
| littlecranky67 wrote:
| If so, then it would simply be a BGP issue - no FB servers
| reachable, as all routes are down. But the media claims a combo
| of BGP/DNS. Hard to believe world-wide border routers, only
| responsible for networks containing DNS servers, are
| misconfigured. I am really curious about that post-mortem :)
| EGreg wrote:
| Telegram seems down too, is it down for you?
| asduoihfijnu wrote:
| ok mom i commented
| vvpan wrote:
| Not an in-depth technical comment here, but: seeing a tech megacorp
| go offline for a day makes me very very happy.
| MrStonedOne wrote:
| Rumormill is suggesting that facebook badge readers are also down
| causing issues with trying to get to the servers to manually fix
| them.
|
| https://twitter.com/sheeraf/status/1445099150316503057?s=21
| [deleted]
| sAbakumoff wrote:
| rich people serve their revenge for pandora papers
| zrail wrote:
| _hugops_ for the engineers having to deal with this. It's
| incredibly stressful and I personally feel like they deserve some
| empathy, even if I don't like Facebook.
|
| I wonder if maybe part of the lesson will be to run the root of
| your authoritative DNS hierarchy on separate infrastructure with
| a separate domain name. Using facebook.com as your root is cool
| and all but when that label disappears it causes huge issues.
| chasd00 wrote:
| There will be so many meetings over this. If powerpoint was
| listed on the stock exchange i'd say now's a good time to buy
| hah.
| poetaster wrote:
| I used to do this properly. One vanity got the better of me.
| Got some work to do. TGF SQL.
| koprulusector wrote:
| Reddit wasn't working a few min ago. Broader issue?
| blowski wrote:
| Reddit goes down every 10 mins anyway.
| stemc43 wrote:
| this person uses reddit
| liendolucas wrote:
| Does anyone have a reasonable guess on how much money they have
| already lost?
| gmiller123456 wrote:
| Likely $0. Ad views lost now will likely be made up for later.
| And even if there is a reduction in views, it just makes other
| views more valuable. Facebook doesn't have real competitors, so
| the money isn't going anywhere else.
| cphoover wrote:
| now if only tiktok would fail
| OnceUponADevops wrote:
| Confirming that we're seeing a major outage with all of our
| integrations with FB products.
| elzbardico wrote:
| Rejoice! The revolution has started!
|
| Yes, I know humor is not welcome in HN
| mkaszkowiak wrote:
| This outage is huge. I'm waiting for the write-up, assuming they
| release one
| jeffrom wrote:
| Is facebook being down causing hacker news to get the hug of
| death??
| pmlittle wrote:
| This is all left brain implementation with looping and classic
| complexity coming home to roost. As we move through time, we
| build off of solutions of the past which are solving a problem,
| but complexity keeps adding on and this is a classic
| programming/computer science dilemma.
| m0guz wrote:
| Ironically enough, status.fb.com is also down.
| devenvdev wrote:
| Reminds me of S3 outage a couple of years ago when AWS status
| page went down because it was relying on... S3...
| awinter-py wrote:
| need a status.status.fb.com to indicate the status of the
| status
|
| and / or an S3 bucket with a json blob the apps can pull to at
| least tell users 'here's what's up'
| ricardo81 wrote:
| It uses the same nameservers as facebook.com, same point of
| failure.
|
| Seems like another poster posted finer details regarding
| BGP/peering which is ultimately causing the issue.
| zoobab wrote:
| LOL
| rd_police wrote:
| very based response, sir
| russellbeattie wrote:
| It's not hyperbole to say that this is going to literally save
| lives.
|
| Cutting off Facebook's firehose of hate and misinformation for
| just a couple hours is going to have an obvious positive effect on
| millions of people. At this scale, at least one person will get
| vaccinated today because they didn't see the wall of ignorance
| that is FB's news feed.
|
| Maybe we should introduce "digital blue laws", where one day a
| week, social media is shut down for the overall good of society.
| oldabc wrote:
| https://blog.cloudflare.com/october-2021-facebook-outage/
| boramalper wrote:
| Duplicates:
|
| https://news.ycombinator.com/item?id=28748233
|
| https://news.ycombinator.com/item?id=28748199
|
| https://news.ycombinator.com/item?id=28748159
|
| https://news.ycombinator.com/item?id=28748246
|
| https://news.ycombinator.com/item?id=28748131
| jader201 wrote:
| FB seems to be finally loading for me, after nearly 6 hours.
|
| This will be a highly discussed topic for a bit.
| akshayrajp wrote:
| As are Instagram and WhatsApp
| foobaw wrote:
| This is taking a longer time than expected for a company like
| Facebook - must be serious where a rollback isn't possible or
| trivial.
| derwiki wrote:
| I wonder if everyone refreshing the sites/apps trying to get it
| to load is contributing to the problem
| XCSme wrote:
| Probably not, from other comments it looks like there was a
| wrong configuration rolled out, and now they are logistically
| struggling to get access to fix them.
| chasd00 wrote:
| from what i understand (take with grain of salt) remote access
| to the routers affected is down. So they need to be physically
| plugged in to address the issue. hence some of the other
| "scrambling private jets" comments referring to getting the
| right people physically plugged in to the right routers.
| em3rgent0rdr wrote:
| How much revenue does facebook lose per hour down?
| missedthecue wrote:
| Facebook made $29 billion last quarter which translates to
| $315,217,391 per day. Divide that over 24 hours in a day, and
| it's ~$13 million per hour.
|
| Of course, depends on the hour of day. Facebook likely makes
| more ad money when North America is awake than when Asia is
| awake for instance.
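| The arithmetic above, written out. The $29 billion figure is the
| one quoted in the comment; the 92-day quarter is an assumption,
| chosen because it reproduces the per-day number given. As the
| comment says, this is a flat average and ignores time of day.

```python
QUARTERLY_REVENUE_USD = 29_000_000_000  # figure quoted in the comment above
DAYS_IN_QUARTER = 92                    # assumption: 92 days matches the quoted per-day value

per_day = QUARTERLY_REVENUE_USD / DAYS_IN_QUARTER
per_hour = per_day / 24
per_minute = per_hour / 60

print(f"per day:    ${per_day:,.0f}")
print(f"per hour:   ${per_hour:,.0f}")
print(f"per minute: ${per_minute:,.0f}")
```

| That works out to roughly $13 million per hour, or a bit over
| $200,000 per minute of downtime.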
| dmoy wrote:
| This is very hard to get exactly right, because traffic isn't
| constant at all times, and you don't know if people won't just
| make up for lost time using facebook at another time of the
| day, etc. So you can't _really_ know.
|
| But, a good rule of thumb right now is about $10,000,000 per
| hour.
| codediesel wrote:
| still down, people going to signal to have a chat
| throwaway78981 wrote:
| Signal is welcoming everyone:
|
| https://nitter.mailstation.de/signalapp/status/1445062426739...
| maxxxxxx wrote:
| Ironically, I cannot send messages on Signal right now. They
| can't handle the extra load?
| coolspot wrote:
| Signal replicates each message to NSA and FB, so when one is
| down, Signal's backend fails with a timeout error.
| lavp wrote:
| Source?
| MajorSauce wrote:
| Still an American centralized platform. Federated Matrix is the
| way to go!
| kitkat_new wrote:
| to also face a failure in the single point of failure - system?
| jacobwilliamroy wrote:
| Should say "Facebook owned. Sites are down."
| yawnxyz wrote:
| Funny enough, I went to https://www.isitdownrightnow.com/ to
| check if Facebook is down, and isitdownrightnow is down itself...
| probably from the massive number of requests coming in to check if
| Facebook is down
| homeskool wrote:
| yep down for me too
| cecilpl2 wrote:
| https://downforeveryoneorjustme.com/facebook
| aaronharnly wrote:
| Amusingly, that returns:
|
| > Is Facebook down right now?
|
| > Uh oh! Something went wrong on our side. It's not you, it's
| us. Feel free to contact us if this persists.
| zekrioca wrote:
| Which, in turn, reminds me of this paper [1] (from someone who
| previously worked at Facebook).
|
| TLDR; Metastable failures occur in open systems with an
| uncontrolled source of load where a trigger causes the system
| to enter a bad state that persists even when the trigger is
| removed.
|
| [1] Metastable Failures in Distributed Systems -
| https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s...
| skizm wrote:
| https://downdetector.com/ seems to be working for me at least.
| horsellama wrote:
| 'Unusual traffic patterns detected' now
| spiantino wrote:
| It's amusing that the top 3 trending reports are the FB sites
| that are down, and then the mobile carriers themselves,
| presumably because when FB doesn't load they assume it's
| their mobile network's fault. People really do think FB is
| the internet
| cronix wrote:
| > People really do think FB is the internet
|
| It's the AOL of 2021
| dylan604 wrote:
| But at one point AOL was the actual internet for its
| subscribers.
| NullPrefix wrote:
| Facebook tried to do that too.
| Sebb767 wrote:
| > People really do think FB is the internet
|
| It is a really large part of it. Also, when people see
| WhatsApp and see no connection, then open Facebook and see
| no connection either, it's _very_ likely that the link is
| at fault and not Facebook.
| EvanAnderson wrote:
| Seems like the perfect time to launch
| isisitdowndownrightnow.com.
| lostmsu wrote:
| You missed one rightnow in the middle
| epalm wrote:
| Seems like it should be isisitdownrightnowdownrightnow.com
| abracadaniel wrote:
| I've said "I've said it before, and I'll say it again"
| before, and I'll say "I've said it before, and I'll say it
| again" again.
| msdrigg wrote:
| Seems like noel already launches that one
| msdrigg wrote:
| Seems like noel already launched that one
| jbkkd wrote:
| Noticed the same. I started to suspect my mobile plan ran out
| michaelmior wrote:
| I personally like https://isup.me (alias of
| downforeveryoneorjustme.com) because it's much shorter.
|
| isup.me/facebook gets me what I want.
| thrdbndndn wrote:
| Their methodology is flawed it seems.
|
| It says Google is down but it's not. [1]
|
| [1] https://downforeveryoneorjustme.com/google
| [deleted]
| j3th9n wrote:
| I like it, it feels like it's 1999.
| [deleted]
| geocrasher wrote:
| Once again, it's DNS.
|
| https://soundcloud.com/ryan-flowers-916961339/dns-to-the-tun...
| dghughes wrote:
| Twitter seems to be a bit buggy now too, maybe just a coincidence.
| User comments under posts are not appearing.
| attende_domine wrote:
|     % ping whatsapp.com
|     ping: whatsapp.com: Name or service not known
|     % ping web.whatsapp.com
|     ping: web.whatsapp.com: Name or service not known
|     % ping facebook.com
|     ping: facebook.com: Name or service not known
|     % ping instagram.com
|     PING instagram.com (31.13.65.174) 56(84) bytes of data.
|     64 bytes from 31.13.65.174 (31.13.65.174): icmp_seq=1 ttl=53 time=110 ms
| chasd00 wrote:
| can productivity (or emotional stability) for the overall US
| economy be tracked on a daily basis? I wonder if a wholesale
| facebook outage would show up on that graph as a brief blip in
| the positive direction.
| qualudeheart wrote:
| If it bleeds we can kill it!
| underscore_ku wrote:
| good
| joelbondurant wrote:
| LOL! They should have used the USA Fact-Check Algorithm from the
| Science Ministry.
| erjjones wrote:
| Does everyone just buy in that this is just a network change gone
| wrong? OR could they be mitigating a breach/hack? OR could it be
| some other theory?
| smashah wrote:
| Good, now I can go for lunch.
| unusximmortalis wrote:
| For me even this site loads very, very slowly. Pinging Google's name
| server is fast as usual. It could be a wider problem, not just
| FB-related.
| TedShiller wrote:
| This is honestly the best feature Facebook has ever developed. I
| hope it's permanent. It has the following effects: you feel
| better about yourself, you can spend more time with your family,
| you are more productive.
| jontro wrote:
| Getting everything back up again will probably be a nightmare.
| Imagine all the internal services trying to reach a consistent
| state after such a long outage.
| AzzieElbab wrote:
| it is probably unrelated but HN is crawling
| mef wrote:
| Looks like the routes to their hosted nameservers are down, e.g.
| A.NS.FACEBOOK.COM
| liendolucas wrote:
| Perhaps tomorrow, the brave man or woman responsible for this
| beautiful screw up will step forward in HN for an outstanding
| ovation. Whoever did this, thank you! As a souvenir I took a
| screenshot on my phone.
| agilob wrote:
| that's not how post-mortems work
| waltbosz wrote:
| I'm curious if this extended outage will do anything to curb the
| dopamine addiction caused by facebook.
|
| For example, will FB addicts experience a day of repeated failed
| attempts to get their FB fix, which will then condition them to
| stop trying.
| rvnx wrote:
| Fixed
| ricardo81 wrote:
| Doesn't seem too clever that Facebook's NS servers are
| a.ns.facebook.com, b.ns.facebook.com etc. IIRC that kind of setup
| requires some glue records.
| gorgoiler wrote:
| If you mean because the name servers are in the same zone, this
| is very common. When an NS is returned for a zone, you also get
| an "additional" A and AAAA to resolve the NS name. It's called
| _glue_.
|
|     dig NS example.com
|     ; ANSWER
|     example.com. NS ns1.example.com.
|     ; ADDITIONAL
|     ns1.example.com. A 1.2.3.4
|
| Edit: I didn't see your glue comment when I wrote this.
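| Whether a delegation needs glue comes down to whether the name
| server's own name sits inside the zone it serves. A small sketch
| of that check using plain label comparison; needs_glue is a
| hypothetical helper, not part of any DNS library.

```python
from typing import List


def _labels(name: str) -> List[str]:
    """Split a domain name into lowercase labels, ignoring a trailing dot."""
    return name.rstrip(".").lower().split(".")


def needs_glue(zone: str, nameserver: str) -> bool:
    """True if `nameserver` is a name inside `zone`.

    In that case the parent zone has to hand out the server's address
    alongside the NS record, otherwise resolving the NS name would
    require asking the very server being looked up.
    """
    zone_labels = _labels(zone)
    ns_labels = _labels(nameserver)
    return (
        len(ns_labels) > len(zone_labels)
        and ns_labels[-len(zone_labels):] == zone_labels
    )


if __name__ == "__main__":
    print(needs_glue("facebook.com", "a.ns.facebook.com."))  # inside the zone
    print(needs_glue("example.com", "ns1.example.net."))     # outside the zone
```

| Comparing whole labels rather than string suffixes avoids treating
| a.ns.facebook.com as if it were inside a zone named book.com.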
| ricardo81 wrote:
| Cheers, I'd edited my post.
|
| Thought the common wisdom nowadays was to use nameservers on
| different TLDs and sub-labels for the best resilience.
|
| /added, they seem to have glue records so I'd assume it's the
| nameservers themselves having issues.
|
|     $ dig NS @g.gtld-servers.net. a.ns.facebook.com
|
|     ;; AUTHORITY SECTION:
|     facebook.com. 172800 IN NS a.ns.facebook.com.
|     facebook.com. 172800 IN NS b.ns.facebook.com.
|     facebook.com. 172800 IN NS c.ns.facebook.com.
|     facebook.com. 172800 IN NS d.ns.facebook.com.
|
|     ;; ADDITIONAL SECTION:
|     a.ns.facebook.com. 172800 IN A 129.134.30.12
|     a.ns.facebook.com. 172800 IN AAAA 2a03:2880:f0fc:c:face:b00c:0:35
|     b.ns.facebook.com. 172800 IN A 129.134.31.12
|     b.ns.facebook.com. 172800 IN AAAA 2a03:2880:f0fd:c:face:b00c:0:35
|     c.ns.facebook.com. 172800 IN A 185.89.218.12
|     c.ns.facebook.com. 172800 IN AAAA 2a03:2880:f1fc:c:face:b00c:0:35
|     d.ns.facebook.com. 172800 IN A 185.89.219.12
|     d.ns.facebook.com. 172800 IN AAAA 2a03:2880:f1fd:c:face:b00c:0:35
| imalerba wrote:
| Seems like Telegram went down with a big whatsapp-is-down hug of
| death.
| blntechie wrote:
| Yep, Telegram stopped sending messages a while back and not
| loading at all for me now.
| teddyh wrote:
| Here we can see why you should not have all your DNS servers in
| the same AS (in this case, AS32934).
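| A rough sketch of how one might audit for this, assuming the
| name-server-to-AS mapping has already been looked up elsewhere
| (this code does no network or BGP queries). The first mapping
| reflects what this thread reports for facebook.com; the second
| uses made-up hosts and documentation-range AS numbers.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_asn(ns_to_asn: Dict[str, int]) -> Dict[int, List[str]]:
    """Group name servers by the autonomous system that announces them."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for nameserver, asn in sorted(ns_to_asn.items()):
        groups[asn].append(nameserver)
    return dict(groups)


def single_as_risk(ns_to_asn: Dict[str, int]) -> bool:
    """True if every name server depends on the same AS staying routable."""
    return len(set(ns_to_asn.values())) <= 1


# Mapping as described in this thread: four name servers, one AS.
facebook_ns = {
    "a.ns.facebook.com": 32934,
    "b.ns.facebook.com": 32934,
    "c.ns.facebook.com": 32934,
    "d.ns.facebook.com": 32934,
}

# Hypothetical zone spread across two ASes (64500/64501 are documentation ASNs).
spread_ns = {
    "ns1.example.com": 64500,
    "ns2.example.net": 64501,
}

if __name__ == "__main__":
    print(single_as_risk(facebook_ns), group_by_asn(facebook_ns))
    print(single_as_risk(spread_ns), group_by_asn(spread_ns))
```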
| marstall wrote:
| between outage and whistleblower, this has got to be the worst
| day in facebook's life
| glanzwulf wrote:
| Oh no...
|
| Anyway...
| TremendousJudge wrote:
| WhatsApp is pretty important infrastructure for most of the
| world
| jszymborski wrote:
| Which is regrettable when secure alternatives exist like
| Signal and Matrix whose business model doesn't involve
| selling your data.
| TremendousJudge wrote:
| Yeah I'm not saying I like it
| pc86 wrote:
| For some egregiously loose definitions of "infrastructure,"
| _maybe_.
| TremendousJudge wrote:
| The same definition that includes phone lines also includes
| the messaging service everybody uses
| drcongo wrote:
| And "important"
| nicoburns wrote:
| In parts of South America it's used for all sorts of
| things. Want to know when your bus is arriving? The bus
| company likely only knows because the driver is
| WhatsApp'ing them status updates.
| 6510 wrote:
| Reminds me of a story Jack Fresco used to tell where financial
| workers were unable to get to work because a bridge was not
| usable. People were worried about terrible consequences if all
| these important people were unable to do their work. To their
| surprise life just continued as if nothing changed.
| emmap21 wrote:
| There is a global outage this morning starting from 8:00 AM. This
| list has Google, Tiktok, Zoom, Slack and of course FB products
| and services.
| rocho wrote:
| For Facebook and WhatsApp it looks like a DNS issue, name
| resolution fails with SERVFAIL:
|
|     $ dig facebook.com
|     ; <<>> DiG 9.16.21 <<>> facebook.com
|     ;; global options: +cmd
|     ;; Got answer:
|     ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 23982
|     ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
|     ;; OPT PSEUDOSECTION:
|     ; EDNS: version: 0, flags:; udp: 512
|     ;; QUESTION SECTION:
|     ;facebook.com. IN A
|     ;; Query time: 16 msec
|     ;; SERVER: 8.8.8.8#53(8.8.8.8)
|     ;; WHEN: Mon Oct 04 17:53:00 CEST 2021
|     ;; MSG SIZE rcvd: 41
| WillPostForFood wrote:
| I'm seeing similar DNS errors for many non-Facebook sites.
| Spare_account wrote:
| Do you have some examples?
| WillPostForFood wrote:
| normashooting.com - but only when, like the parent poster,
| using Google's DNS servers. Just switched to Cloudflare and
| it works.
|
| Using Google DNS:
|
|     nslookup
|     > normashooting.com
|     Server: 8.8.8.8
|     Address: 8.8.8.8#53
|
|     * server can't find normashooting.com: SERVFAIL
|
| Using Cloudflare DNS servers:
|
|     > normashooting.com
|     Server: 1.1.1.1
|     Address: 1.1.1.1#53
|
|     Non-authoritative answer:
|     Name: normashooting.com
|     Address: 104.22.56.165
|     Name: normashooting.com
|     Address: 104.22.57.165
|     Name: normashooting.com
|     Address: 172.67.43.70
| janmo wrote:
| I am getting DNS fails for wikipedia
| peter_retief wrote:
| wfm
| marbex7 wrote:
| Wikipedia wfm.
| chilldill wrote:
| aws.amazon.com is down as well
| chilldill wrote:
| can't log in to the AWS console either
| rstupek wrote:
| Seeing the same thing with 8.8.8.8 name servers. Everything I
| query returns an error
| robjan wrote:
| My ISP's DNS server went down a few minutes after the
| Facebook outage, presumably because all the residential
| customers' devices keep querying.
| pul wrote:
| Yep, also from other caches:
| https://www.nslookup.io/dns-records/facebook.com
| jbverschoor wrote:
| Same here on facebook.com , [api]whatsapp.com (instagram.com
| works)
| LordRishav wrote:
| It's always DNS
| sysadmindotfail wrote:
| >It's always DNS
|
| How is this not the top comment? Underrated
| simlevesque wrote:
| Maybe they tried everything else before that.
|
| At first it was working but they couldn't serve responses:
| https://i.imgur.com/UaCtOiX.png
|
| Notice the "2020"
| rvnx wrote:
| The servers struggle to return even a basic 5xx response.
|
| Two possibilities:
|
| - the DNS services internally have issues (most likely, as
| this could explain the snowball effect)
|
| - it could be also a core storage issue and all their VMs are
| relying on it and so they don't want to block third-party
| websites and think it will last for a long time, so they
| prefer to answer nothing for now in the DNS (so it will fail
| instantly to the client, and drain the application/database
| servers so they can reboot with less load)
| zarzavat wrote:
| I was on a video call during the incident. The service was
| working but with super-low bandwidth for 30 minutes, then I
| got disconnected and every FB property went down suddenly.
| Seems more suggestive of someone pulling the plug than a
| DNS issue, although it could also be both.
| john37386 wrote:
| Even the Name servers are not returning any values. That's bad.
|
| dig @8.8.8.8 +short facebook.com NS
|
| These are usually anycasted, meaning that a single IP returned for
| an NS record is in fact served by several servers spread across
| several regions. Traffic reaches the closest one through BGP
| announcements and peering agreements with ISPs. Very interesting,
| because it seems that it took one DNS entry misconfiguration to
| withdraw millions of dollars' worth of devices from the internet.
| variant wrote:
| BGP goof?
|
| https://twitter.com/g_bonfiglio/status/1445056923309649926?s...
| ctur wrote:
| It isn't just DNS. If you happen to have cached entries, the
| site is returning errors as well.
| ikiris wrote:
| agreed, they fell off the internet according to routeviews
| Nextgrid wrote:
| Presumably the DNS being down also wreaks havoc in their
| internal infrastructure as services can no longer resolve
| each other's names.
| qeternity wrote:
| Internal services using public dns records?
| msbarnett wrote:
| Probably not, but their external and internal DNS may
| share infrastructure that's at the root of the failure
| qeternity wrote:
| Yikes, seems like an easy redundancy split.
| fragmede wrote:
| It _seems_ like an easy redundancy split, but imagine
| driving two cars down the freeway at the same time,
| because you got a flat tire in one, the other day.
|
| In order to actually be redundant you need to have two
| sets of infrastructure to serve, and then if the internal
| one goes down, the external one's basically useless when
| the internal resolution's down anyway. Capacity planning
| (because you're inside Facebook and can't pretend that
| all data centers everywhere are connected via an
| infinitely fast network) becomes twice as much work. How
| you do updates for a couple thousand teams isn't trivial
| in the first place, now you have to cordon them off
| appropriately?
|
| I don't know what Facebook's DNS serving infrastructure
| looks like internally, but it's definitely more
| complicated than installing `unbound` on a couple of
| left-over servers.
| qeternity wrote:
| Yes, all of that (imo) is an argument in favor.
|
| I never said it was free, but it's worth it as long as
| it's cheaper than failure.
|
| I don't keep backups because I enjoy having multiple
| copies of my data. I do it because losing that data would
| be devastating.
| rightbyte wrote:
| I wonder if Facebook has circular 'boot' dependencies on
| their microservices or something? I.e. they can't restart
| stuff now when everything is down.
| kccqzy wrote:
| Oh you bet they do. In large organizations with complex
| microservices these dependencies inevitably arise. It
| takes real dedication and discipline to avoid creating
| these circular dependencies.
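| Finding such loops mechanically is a plain graph problem. A
| minimal sketch using depth-first search over a "service -> things
| it needs at startup" map; the service names in the usage note are
| invented and say nothing about Facebook's real architecture.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of services, or None if acyclic.

    `deps` maps each service to the services it needs before it can start.
    The returned list starts and ends with the same service.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []  # services on the current DFS path, in visit order

    def visit(service: str) -> Optional[List[str]]:
        state[service] = IN_PROGRESS
        path.append(service)
        for dep in deps.get(service, []):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path: we have walked a loop.
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                found = visit(dep)
                if found is not None:
                    return found
        path.pop()
        state[service] = DONE
        return None

    for service in deps:
        if state.get(service, UNSEEN) == UNSEEN:
            found = visit(service)
            if found is not None:
                return found
    return None
```

| For example, with dns needing config, config needing auth, and
| auth needing dns, the function reports the loop
| dns -> config -> auth -> dns, which is exactly the kind of thing
| that prevents a cold start.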
| samhw wrote:
| This is very true. I tell everyone who'll listen that
| every competent engineer should be well versed in the
| nuances of feedback in complex systems
| (https://en.wikipedia.org/wiki/Feedback).
|
| The most successful systems rely on the property of
| feedback (https://en.wikipedia.org/wiki/Feedback):
| evolution, untrained learning, genetic algorithms, the
| diagonal arguments
| (https://en.wikipedia.org/wiki/Diagonal_argument),
| artificial general intelligence
| (https://en.wikipedia.org/wiki/Technological_singularity),
| financial markets according to no less than George Soros
| (https://en.wikipedia.org/wiki/Reflexivity_(social_theory)#In...),
| etc.
|
| That said, virtuous cycles can't exist without vicious
| cycles. I think we as a society need to put a _lot_ more
| work into helping people understand and model feedback in
| complex systems, because at scales like Facebook's it's
| impossible for any one person to truly understand the
| hidden causal loops until it goes wrong. You only need to
| look at something like the Lotka-Volterra equations
| (https://en.wikipedia.org/wiki/Lotka%E2%80%93Volterra_equatio...)
| to see how deeply counterintuitive these system
| dynamics can be (e.g. "increasing the food available to
| the prey caused the predator's population to
| destabilize":
| https://en.wikipedia.org/wiki/Paradox_of_enrichment).
| clon wrote:
| For sure. Reminds me of the difficulties of starting a
| power grid from total blackout, bringing generators and
| power stations to sync...
| [deleted]
| Hokusai wrote:
| Is this related in any way to what happened to Slack recently
| in their DNS?
| etc-hosts wrote:
| No
|
| https://lists.dns-oarc.net/pipermail/dns-operations/2021-Sep...
| skywhopper wrote:
| So far the pattern isn't the same. Slack published a DNSSEC
| record that got cached and then deleted it, which broke
| clients that tried to validate DNSSEC for slack.com. But in
| this case, the records are just completely gone. As if
| "facebook.com", "instagram.com", et al just didn't exist.
| hulitu wrote:
| Thank god we have DoH.
| dvratil wrote:
| It's DNS over HTTPS. It relies on the same system as plain
| DNS, so DoH won't really help in this case...
| Animats wrote:
| Even Google's 8.8.8.8 DNS server says can't find, SERVFAIL.
| hikerclimber1 wrote:
| Everything is subjective. Especially laws.
| r721 wrote:
| John Graham-Cumming:
|
| >Between 15:50 UTC and 15:52 UTC Facebook and related
| properties disappeared from the Internet in a flurry of BGP
| updates. This is what it looked like to @Cloudflare.
|
| https://twitter.com/jgrahamc/status/1445065270272434176
| (thread)
|
| UPD
|
| >About five minutes before Facebook's DNS stopped working we
| saw a large number of BGP changes (mostly route withdrawals)
| for Facebook's ASN.
|
| https://twitter.com/jgrahamc/status/1445068309288951820
| htrp wrote:
| In the post-mortem, we'll find out that Facebook's alerting and
| comms systems all run on Facebook. As a result, they can't even
| coordinate the restart to roll back changes.
| mkr-hn wrote:
| I'm genuinely not sure if the reports I heard of employees
| being locked out of the systems they need to fix it because
| their network is down are jokes or true.
| pixelgeek wrote:
| I would like to say that after my "burn it down" comments on
| another Facebook related post that I had nothing to do with this.
| mro_name wrote:
| frankly, who cares? Seriously.
|
| Those services are toxic for years now and everybody knows that.
| Who still uses them occasionally let alone relies on them can't
| be helped, can they?
| mdani wrote:
| Is this in some way connected to the Facebook data leak of 1.5
| billion users? The timing seems quite odd that both these things
| happen around the same time.
| clipradiowallet wrote:
| In other news, a bunch of people got a lot more work done today
| than normal I suspect...
| polack wrote:
| After changing the screen resolution all operating systems will
| prompt the user if the applied settings were correct, otherwise
| it will time out and reset to last known good setting. Maybe time
| for the core internet infrastructure to implement something
| similar? :)
| thealistra wrote:
| You can't really make a system that will unboil an egg.
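The "revert unless confirmed" idea from the screen-resolution analogy can be sketched as a small state machine. This is a hypothetical illustration only: the class name, method names and config strings are assumptions, it says nothing about how Facebook's tooling works, and, as the reply notes, a timed rollback only helps when the device can still restore its own previous state.

```python
class ConfirmedConfig:
    """Keep a new config active only if it is confirmed before a deadline.

    Time is passed in explicitly (seconds as a plain number) so the sketch
    stays deterministic; a real system would use a monotonic clock.
    """

    def __init__(self, initial):
        self.active = initial
        self._last_good = initial
        self._deadline = None  # None means no change is awaiting confirmation

    def apply(self, new_config, now, timeout):
        """Activate new_config provisionally and start the confirmation timer."""
        if self._deadline is None:
            # Only a confirmed config may become the rollback target.
            self._last_good = self.active
        self.active = new_config
        self._deadline = now + timeout

    def tick(self, now):
        """Roll back to the last known good config if the timer has expired."""
        if self._deadline is not None and now > self._deadline:
            self.active = self._last_good
            self._deadline = None

    def confirm(self, now):
        """Make the pending config permanent; return False if nothing is pending."""
        self.tick(now)
        if self._deadline is None:
            return False
        self._last_good = self.active
        self._deadline = None
        return True


if __name__ == "__main__":
    cfg = ConfirmedConfig("routes-v1")
    cfg.apply("routes-v2", now=0, timeout=60)
    cfg.tick(now=61)  # nobody confirmed in time, so the change is undone
    print(cfg.active)
```

The key design choice is that the operator must be able to reach the system after the change in order to confirm it, so a change that cuts off access reverts on its own.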
| the-dude wrote:
| It is about time to speculate about sabotage, a disgruntled
| employee or something more exotic.
|
| All this BGP talk is boring.
| sheepybloke wrote:
| Like this being tweeted:
| https://twitter.com/YourAnonOne/status/1445082304393719818?s...
| [deleted]
| shockeychap wrote:
| I feel for the sysadmins who are fighting ulcers and migraines at
| the moment, but I can't shake feeling that the world is just a
| little bit better for this small window of time.
| korethr wrote:
| At 21:44 UTC, facebook.com resolves for me.
| AtNightWeCode wrote:
| So they managed to remove facebook.com from 1.1.1.1 and 8.8.8.8.
| That is impressive. Not something anyone can achieve in such
| short time by even trying.
| [deleted]
| natas wrote:
| good riddance
| jamespwilliams wrote:
| Is this caused by missing glue records? I can't resolve any of
| FB's nameservers. Anyone know how that could happen?
| zekica wrote:
| The glue records are fine from my end: dig -t NS facebook.com
| @a.gtld-servers.net
| wut42 wrote:
| the glues are still there-- it's not a DNS issue but a network
| one. Their ASN has mostly been withdrawn from everywhere.
| not2b wrote:
| This is a great argument for the antitrust authorities to break
| up Facebook. Allowing the big social media companies to buy each
| other creates a single point of failure. If Instagram and
| WhatsApp were separate companies, a technical disaster at one
| would not take out the other two.
| decrypt wrote:
| From this tweet:
| https://twitter.com/BlazejKrajnak/status/1445063232486531099
|
| "Because of missing DNS records for http://Facebook.com, every
| device with FB app is now DDoSing recursive DNS resolvers. And it
| may cause overloading ..."
| rainboiboi wrote:
| Just wondering - would the engineer who made the mistake be
| fired?
| aaomidi wrote:
| If they are then Facebook is worse than I thought.
| baby wrote:
| That's not the culture at facebook
| babuskov wrote:
| "Move fast and break things". Yeah, it's exactly the
| opposite. The person should be promoted ;)
| [deleted]
| pc86 wrote:
| What makes you think it was a mistake? What makes you think an
| engineer did it?
|
| Sometimes things just break and take time to fix.
| jaywalk wrote:
| How could anyone answer that question? We don't even know that
| an engineer made a mistake in the first place, much less what
| the mistake was and what led up to it.
| tomelliott wrote:
| Nope:
|
| https://www.usenix.org/conference/lisa19/presentation/turner
| vthallam wrote:
| nope. an individual is never blamed for this sort of issue.
| newobj wrote:
| The only person I've ever heard of being fired for an
| operational error was a principal networking engineer at Amazon
| who end-ran DNS policies and hand-edited a zone file. Somehow,
| the file got truncated. It brought down everything including
| the soft phones so people couldn't even spin up a phone-based
| conference call to deal with it. I think Amazon was down for
| several hours, with 8 digit losses. That was in the mid 2000's.
| Heard that person was fired but don't know for sure.
| rainboiboi wrote:
| Thanks everyone for providing the insights, I have no ill-
| intention, just asking out of curiosity.
| yodon wrote:
| If a single person can cause the failure during the course of
| their normal tasks, it's not the fault of that person; it's the
| fault of the designers of the systems and processes used by that
| person.
| YPPH wrote:
| This question doesn't deserve downvotes. While the answer is
| quite clearly in the negative (this will be a process failure,
| not a human failure), it looks as though it was asked in good
| faith, and might not be so obvious to those outside the
| industry.
|
| Vote buttons are not a substitute for proper responses to
| legitimate enquiry.
| nicholasjon wrote:
| "Asking for a friend."
|
| I kid. If it were to come down to a single person, that's
| really a failure of the whole organization system and not of
| the individual.
|
| This apocryphal [1] punchline to the Jack Welch story also sums
| up how most orgs deal with this sort of thing:
|
| "I just spent a million dollars on your education - why would I
| fire you now?"
|
| [1]: http://www.nickmilton.com/2016/03/jack-welch-on-learning-fro...
| amediauk1 wrote:
| tracert 129.134.30.12
|
| Tracing route to a.ns.facebook.com [129.134.30.12]
| over a maximum of 30 hops:
|
|   1     1 ms     1 ms     1 ms  eehub.home [192.168.1.254]
|   2     3 ms     3 ms     3 ms  172.16.14.63
|   3     *        5 ms     3 ms  213.121.98.145
|   4     5 ms     3 ms     4 ms  213.121.98.144
|   5    17 ms     8 ms    18 ms  87.237.20.142
|   6     8 ms     6 ms     7 ms  lag-107.ear3.London2.Level3.net [212.187.166.149]
|   7     *        *        *     Request timed out.
|   8     *        *        *     Request timed out.
|   9     7 ms     7 ms     6 ms  be2871.ccr42.lon13.atlas.cogentco.com [154.54.58.185]
|  10    70 ms    69 ms    70 ms  be2101.ccr32.bos01.atlas.cogentco.com [154.54.82.38]
|  11    73 ms    73 ms    74 ms  be3600.ccr22.alb02.atlas.cogentco.com [154.54.0.221]
|  12    84 ms    85 ms    84 ms  be2879.ccr22.cle04.atlas.cogentco.com [154.54.29.173]
|  13    90 ms    90 ms    90 ms  be2718.ccr42.ord01.atlas.cogentco.com [154.54.7.129]
|  14   143 ms   142 ms   143 ms  po111.asw02.sjc1.tfbnw.net [173.252.64.102]
|  15   114 ms   119 ms   114 ms  be3036.ccr22.den01.atlas.cogentco.com [154.54.31.89]
|  16   125 ms   126 ms   124 ms  be3038.ccr32.slc01.atlas.cogentco.com [154.54.42.97]
|  17    91 ms    92 ms    91 ms  po734.psw03.ord2.tfbnw.net [129.134.35.143]
|  18    91 ms    93 ms    90 ms  157.240.36.97
|  19    74 ms    74 ms    73 ms  a.ns.facebook.com [129.134.30.12]
|
| Trace complete.
|
| this is what i got now
| jurajmlich wrote:
| DNS servers of a major internet provider in the Czech Republic
| are down now. Probably not a coincidence (other DNS servers'
| stats show increased traffic so my guess is that Vodafone's DNS
| servers were unable to cope with the increased traffic and
| crashed
| https://twitter.com/BlazejKrajnak/status/1445063232486531099).
|
| It's crazy that half the country doesn't have internet because
| Facebook stopped working.
| antocv wrote:
| It's alive!
|
| drill @1.1.1.1 www.facebook.com
| ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 2172
| ;; flags: qr rd ra ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
| ;; QUESTION SECTION:
| ;; www.facebook.com. IN A
|
| ;; ANSWER SECTION:
| www.facebook.com. 3401 IN CNAME star-mini.c10r.facebook.com.
| star-mini.c10r.facebook.com. 3403 IN A 31.13.72.36
| strenholme wrote:
| Kinda sorta. There are four DNS servers for Facebook:
| 129.134.30.12, 129.134.31.12, 185.89.218.12, and 185.89.219.12.
|
| Of those, _only_ 185.89.219.12 is up right now ( _Edit_ All
| four DNS servers are now up). For people who want to add
| Facebook to hosts.txt, the A record (IP) I'm getting right now
| is 157.240.11.35 (it was 31.13.70.36)
| daniellehmann wrote:
| See e.g.
| https://www.digwebinterface.com/?hostnames=facebook.com&ns=a...
| for responses from different nameservers.
| kblev wrote:
| "Sorry, something went wrong. Facebook (c) 2020"
| simonklitj wrote:
| Yes, even Facebook falls prey to the wrong copyright year.
| Anyway, I got further now to a page that says "Account
| Temporarily Unavailable." and has the old Facebook layout.
| Would love a peek inside the Facebook codebase to see how
| this happens, hah!
| [deleted]
| david_acm wrote:
| pings to a.ns.facebook.com are no longer timing out
| cryptodan wrote:
| Hope it's permanent
| xmpir wrote:
| Just thinking about all the conspiracy theories you could make of
| this. Yesterday pandora papers, today the internet stops working.
| blobbers wrote:
| well... this is unfortunate:
|
| ; <<>> DiG 9.10.6 <<>> facebook.com
| ;; global options: +cmd
| ;; Got answer:
| ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36072
| ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
| wly_cdgr wrote:
| ...Including that one
| nicbou wrote:
| Alle Störungen shows a massive spike in problems for every
| service it keeps track of: https://xn--allestrungen-9ib.de/
| TravisHusky wrote:
| Interesting; I have been noticing a lot of services are unstable
| today. I wonder if there is a larger outage.
| iseanstevens wrote:
| Interesting Timing!
| suyash wrote:
| Facebook hacked?
| jleyank wrote:
| I would have thought that these companies that are richer than
| $GOD would have (virtual) instances of at least the previous
| stable version available for situations such as this. It would at
| least keep their damn doors open and internal communications
| systems going... Maybe they'll NOW think of such things? What's
| the cliche, penny wise and pound foolish? Or is it, no need to
| listen to experienced Network Designers? I can never remember...
| aenis wrote:
| most of what they do, they do with in house tools, and custom-
| everything, including hardware. as a consequence, for some
| classes of problems there are no experts - not at facebook, not
| anywhere.
|
| i feel for their netops people. uncharted territory with the
| whole world watching and, no doubt, a lot of morons from
| management trying to be "helpful" in getting this nice crisis
| resolved. for any crisis there is always a bunch of clowns with
| MBAs that consider it their golden opportunity to shine (nearly
| always at someone else's expense)
| matsemann wrote:
| What I find weird is that there is no indication in the app that
| nothing is working. I just get a cached view of everything I've
| seen the last few days.
|
| Which is a feature I hate, since it does that all the time even
| when I have a connection. Says there are 3 comments on a post,
| when I know there are more. Opening them doesn't show them, and
| there's no way to refresh. But going to the web page I can see them.
| caturopath wrote:
| What would a full day of WhatsApp outage mean for the world?
| gillytech wrote:
| It's about damn time. Hopefully they stay down. It will do the
| world some good (long term) to have some time away from this
| platform and platforms like it.
| htrp wrote:
| Facebook is proving that it's systemically important by taking
| the entire site down.....
|
| Zuckerberg is taking his ball and going home unless you stop
| writing mean things about him /s
| jimkleiber wrote:
| Lol exactly what I was thinking. I'm trying to keep my tinfoil
| hat in the closet and yet it seems odd that after there is a
| huge FB whistleblower story on 60 Minutes last night, all of FB
| goes down today.
|
| I really hope it's just some internal technical error and not a
| "see, despite the bad things of FB, you really need us" move.
|
| It's probably trivial, the timing just seems weird to me.
| spaceywilly wrote:
| Let's hope it's this. Everyone will just shrug and move onto
| the next hopefully less evil site
| hkai wrote:
| Which one?
| julianlam wrote:
| lobste.rs? mastodon?
| oneeyedpigeon wrote:
| Used to have a manager that we swore did exactly that. Every
| time he was away on holiday, mysterious site problems to prove
| his worth!
| gagege wrote:
| I had the exact opposite and it was hilarious. Every time my
| manager (a great guy and really good at what he did) was away
| for a week the sprint would go very smoothly.
| emmap21 wrote:
| Not only facebook, but also Google, Zoom, Telegram, Youtube and
| many more internet service/ product/ providers from 8:00 AM
| today. This is more like an internet outage.
| jy3 wrote:
| No. All the ones you mentioned are up.
| _fizz_buzz_ wrote:
| Youtube and google are definitely working for me without any
| problems (haven't tested telegram or zoom).
| barbs wrote:
| I find myself a little bit happy that it's down. I use Facebook
| quite often, but mostly because everyone else I know uses it. If
| everyone is forced to find an alternative, that'd be fine by me.
| totaldude87 wrote:
| wonder how much of internet traffic as a whole is down now..
| pat-jay wrote:
| Finally! A small (or not so small) outage for FB, a large benefit
| for mankind :)
| [deleted]
| carrja99 wrote:
| Good.
| bhartzer wrote:
| Facebook being down makes me think of all of those small
| businesses who never built websites. They rely on traffic and
| publicity from their Facebook pages only.
|
| It's so important to diversify, such as building a website.
| tinza123 wrote:
| Well assuming they do that, likely they will host websites on
| AWS or Azure. Then AWS is down, what are you gonna do?
| jressey wrote:
| Move your resources to the one you didn't pick and point your
| domain registrar there.
| K5EiS wrote:
| There's always Oracle Cloud ;)
| [deleted]
| [deleted]
| [deleted]
| bitzlab wrote:
| This morning around 10:00 AM I posted a message on a forum
| saying that I hoped Facebook would get shut down.. Now the
| Facebook situation you see might be my fault. Sorry folks,
| I didn't mean that for real.. I still need FB's products! =))
|
| https://natrmd.com
| nazgulsenpai wrote:
| I just got the login page. It was fun while it lasted.
| shmiga wrote:
| A lot of addicts will now feel what it is like to live in reality
| without scrolling dumb images. I hope this becomes a tradition, at
| least once a month.
| qnsi wrote:
| I am an addict. I refresh this thread waiting for information:
| fb is back online.
|
| I need help but this is too hard for me. I uninstall social
| media once a week but install two days later.
|
| I should probably go to therapy with this, but I am not sure I
| wouldn't be laughed at
| ppqqrr wrote:
| https://heyfocus.com/ worked for me, maybe it'll help (if
| you're on Mac). Addiction to social media is a real problem;
| thousands of engineers are paid to make sure that these
| products ensnare your attention. It wouldn't be odd if it
| takes a few bucks of your own to rescue yourself. Don't
| hesitate to seek help, no one will laugh at you.
| areactnativedev wrote:
| And here we are, looking at the xth top-level comment on HN...
| keithnoizu wrote:
| somewhere an engineer is begging a mnesia instance to come back
| online.
| teddyh wrote:
| Interestingly, their .onion site [1] is also down.
|
| [1] https://en.wikipedia.org/wiki/Facebook_onion_address
| fluxem wrote:
| This is expected. I guess, the internal DNS is down, so the
| whole infrastructure is broken
| coolspot wrote:
| The onion site is just a reverse-proxy to the main web-site. So
| if the main site is down (due to internal DNS or BGP issues)
| onion reverse-proxy can't get to it as well.
| john37386 wrote:
| Imagine if they need access to fb.com email to re-enable the
| access for the on-site technician.
| kingkool68 wrote:
| https://www.whatsmydns.net/#A/facebook.com
| EGreg wrote:
| Anyone able to Connect w WhatsApp at the moment?
| html5web wrote:
| Great time to take a break from Facebook and Instagram. Use
| Telegram instead of WhatsApp
| chadlavi wrote:
| I recognize that for WhatsApp users around the globe this is
| probably more than an inconvenience, but the rest of humanity is
| getting something of a reprieve here.
| wooger wrote:
| Also amusingly I think quite a lot of employers use WhatsApp
| for part of their disaster communications plan.
|
| If this happened during some wider issue (e.g. GitHub down),
| there'd be chaos.
| phatfish wrote:
| The UK government will grind to a halt if WhatsApp is still
| down tomorrow. Well, more than usual that is.
| Animats wrote:
| New York Times has coverage.[1]
|
| _" A small team of employees was soon dispatched to Facebook's
| Santa Clara, Calif., data center to try a "manual reset" of the
| company's servers, according to an internal memo."_
|
| [1] https://archive.is/iBzs3
| Ristovski wrote:
| Discord [1] is taking a toll from the increased traffic as well:
|
| "We're noticing an elevated level of usage for the time of day
| and are currently monitoring the performance of our systems. We
| do not anticipate this resulting in any impact to the service.
|
| We have temporarily disabled typing notifications. We expect
| these to be re-enabled soon."
|
| [1] https://discordstatus.com/
| Le_Dook wrote:
| Yeah, it seems like a lot of places online are facing the same.
| Even here has been bad for me
| dkarp wrote:
| Someone save a risky release for 9am on a Monday morning? Decided
| Friday afternoon was too risky?
| grayhatter wrote:
| 0830 actually :/
|
| But to be fair... seems like it was a good call to not do it
| Friday night :D
| dkarp wrote:
| If they chose 8:30, then it must have been really risky! ;)
| euroderf wrote:
| The Honolulu office is getting ready for a long night :)
| Animats wrote:
| There's still no connectivity to Facebook's DNS servers:
|     > traceroute a.ns.facebook.com
|     traceroute to a.ns.facebook.com (129.134.30.12), 30 hops max, 60 byte packets
|      1  dsldevice.attlocal.net (192.168.1.254)  0.484 ms  0.474 ms  0.422 ms
|      2  107-131-124-1.lightspeed.sntcca.sbcglobal.net (107.131.124.1)  1.592 ms  1.657 ms  1.607 ms
|      3  71.148.149.196 (71.148.149.196)  1.676 ms  1.697 ms  1.705 ms
|      4  12.242.105.110 (12.242.105.110)  11.446 ms  11.482 ms  11.328 ms
|      5  12.122.163.34 (12.122.163.34)  7.641 ms  7.668 ms  11.438 ms
|      6  cr83.sj2ca.ip.att.net (12.122.158.9)  4.025 ms  3.368 ms  3.394 ms
|      7  * * *
|     ...
|
| So they're hours into this outage and still haven't re-
| established connectivity to their own DNS servers.
| alexvoda wrote:
| Can someone explain why it is also down when trying to access
| it via Tor using its onion address:
| http://facebookwkhpilnemxj7asaniu7vnjjbiltxjqhye3mhbshg7kx5t...
|
| Or when trying ips directly:
| https://www.lifewire.com/what-is-the-ip-address-of-facebook-...
|
| I would have expected a DNS issue to not affect either of
| these.
|
| I can understand the onionsite being down if facebook
| implemented it the way a thirdparty would (a proxy server
| accessing facebook.com) instead of actually having it
| integrated into its infrastructure as a first class citizen.
| gamacodre wrote:
| The issue here is that this outage was a result of all the
| routes into their data centers being cut off (seemingly from
| the inside). So knowing that one of the servers in there is
| at IP address "1.2.3.4" doesn't help, because no-one on the
| outside even knows how to send a packet to that server
| anymore.
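The point about withdrawn routes can be illustrated with a toy routing table that does longest-prefix matching. This is a minimal sketch, not a BGP implementation: the class, the next-hop label and the covering prefix 129.134.30.0/24 are assumptions chosen for illustration, and only the address 129.134.30.12 is taken from the traceroutes in this thread.

```python
import ipaddress


class RoutingTable:
    """Toy forwarding table with longest-prefix match and no default route."""

    def __init__(self):
        self._routes = {}  # ip_network -> next-hop label

    def announce(self, prefix, next_hop):
        """Install a route for prefix, as if a neighbour had announced it."""
        self._routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        """Remove the route for prefix, as if the announcement was withdrawn."""
        self._routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the next hop for address, or None if no prefix covers it."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self._routes if addr in net]
        if not matches:
            return None
        most_specific = max(matches, key=lambda net: net.prefixlen)
        return self._routes[most_specific]


if __name__ == "__main__":
    table = RoutingTable()
    table.announce("129.134.30.0/24", "peer-toward-facebook")  # hypothetical prefix
    print(table.lookup("129.134.30.12"))
    table.withdraw("129.134.30.0/24")
    print(table.lookup("129.134.30.12"))
```

Once the prefix is withdrawn, the lookup finds nothing to forward to, so knowing the server's IP address does not help a sender whose routers no longer hold any route covering it.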
| spiantino wrote:
| You can get through to a web server, but that web server uses
| DNS records or those routes to hit other services necessary
| to render the page. So the server you hit will also time out
| eventually and return a 500
| mdtancsa wrote:
| It's partially there. C and D are still not in the global tables
| according to routeviews ie. 185.89.219.12 is still not being
| advertised to anyone. My peers to them in Toronto have routes
| from them, but not sure how far they are supposed to go inside
| their network. (past hop 2 is them)
|
|     % traceroute -q1 -I a.ns.facebook.com
|     traceroute to a.ns.facebook.com (129.134.30.12), 64 hops max, 48 byte packets
|      1  torix-core1-10G (67.43.129.248)  0.133 ms
|      2  facebook-a.ip4.torontointernetxchange.net (206.108.35.2)  1.317 ms
|      3  157.240.43.214 (157.240.43.214)  1.209 ms
|      4  129.134.50.206 (129.134.50.206)  15.604 ms
|      5  129.134.98.134 (129.134.98.134)  21.716 ms
|      6  *
|      7  *
|
|     % traceroute6 -q1 -I a.ns.facebook.com
|     traceroute6 to a.ns.facebook.com (2a03:2880:f0fc:c:face:b00c:0:35) from 2607:f3e0:0:80::290, 64 hops max, 20 byte packets
|      1  toronto-torix-6  0.146 ms
|      2  facebook-a.ip6.torontointernetxchange.net  17.860 ms
|      3  2620:0:1cff:dead:beef::2154  9.237 ms
|      4  2620:0:1cff:dead:beef::d7c  16.721 ms
|      5  2620:0:1cff:dead:beef::3b4  17.067 ms
|      6  *
|      7  *
|      8  *
| mikefromhome wrote:
| dead beef sounds about right
| boshomi wrote:
| Kevin Beaumont:
|
| >>The Facebook outage has another major impact: lots of mobile
| apps constantly poll Facebook in the background = everybody is
| being slammed who runs large scale DNS, so knock on impacts
| elsewhere the long this goes on.<<
|
| https://twitter.com/GossiTheDog/status/1445118907187175427
| suyash wrote:
| Source (hacker group Anonymous) :
| https://twitter.com/YourAnonOne/status/1445082304393719818
| [deleted]
| rootusrootus wrote:
| I just got off a short pre-interview conversation with a
| manager at Instagram and he had to dial in with POTS. I got the
| impression that things are very broken internally.
| otikthecessna wrote:
| I read that as POTUS at first and paused for a minute
| dividedbyzero wrote:
| What is POTS?
| tacker2000 wrote:
| Plain Old Telephone System
| woofcat wrote:
| Plain old telephone system. Aka a phone.
| terramex wrote:
| Plain old telephone service
| https://en.wikipedia.org/wiki/Plain_old_telephone_service
| askvictor wrote:
| How much of modern POTS is reliant on VOIP? In Australia at
| least, POTS has been decommissioned entirely, but even where
| it's still running, I'm wondering where IP takes over?
| wolverine876 wrote:
| This person has a POTS line in their current location, and a
| modem, and the software stack to use it, and Instagram has
| POTS lines and modems and software that connect to their
| networks? Wow. How well do Instagram and their internal
| applications work over 56K?
| rescbr wrote:
| They could have dialed in by their own cell phone though
| Animats wrote:
| "facebook.com" is registered with "registrarsafe.com" as
| registrar. "registrarsafe.com" is unreachable because it's
| using Facebook's DNS servers and is probably a unit of
| Facebook. "registrarsafe.com" itself is registered with
| "registrarsafe.com".
|
| I'm not sure of all the implications of those circular
| dependencies, but it probably makes it harder to get things
| back up if the whole chain goes down. That's also probably why
| we're seeing the domain "facebook.com" for sale on domain
| sites. The registrar that would normally provide the ownership
| info is down.
|
| Anyway, until "a.ns.facebook.com" starts working again,
| Facebook is dead.
| Animats wrote:
| Notes as Facebook comes back up:
|
| "registrarsafe.com" is back up. It is, indeed, Facebook's
| very own registrar for Facebook's own domains. _"
| RegistrarSEC, LLC and RegistrarSafe, LLC are ICANN-accredited
| registrars formed in Delaware and are wholly-owned
| subsidiaries of Facebook, Inc. We are not accepting retail
| domain name registrations."_ Their address is Facebook HQ in
| Menlo Park.
|
| That's what you have to do to really own a domain.
| BillinghamJ wrote:
| When the NS hostname is dependent on the domain it serves,
| "glue records" cover the resolution to the NS IP addresses.
| So there's no circular dependency type issue
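A toy model may help show why glue records break the apparent loop. The sketch below is an assumption-laden simplification: the function names and the dictionaries standing in for parent-zone data are hypothetical, and real resolvers do far more. Only the name a.ns.facebook.com and its address 129.134.30.12 come from this thread.

```python
def find_zone(hostname, delegations):
    """Return the most specific delegated zone that contains hostname, if any."""
    labels = hostname.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in delegations:
            return candidate
    return None


def can_bootstrap(zone, delegations, glue, seen=frozenset()):
    """Return True if a resolver could obtain an address for one of zone's nameservers.

    delegations maps a zone to its NS hostnames (what the parent zone hands out).
    glue maps an NS hostname to an address that the parent zone also hands out.
    """
    if zone in seen:
        return False  # we are already trying to resolve this zone: a loop
    for ns_name in delegations.get(zone, ()):
        if ns_name in glue:
            return True  # the parent supplied the address directly
        ns_zone = find_zone(ns_name, delegations)
        if ns_zone is not None and can_bootstrap(
            ns_zone, delegations, glue, seen | {zone}
        ):
            return True  # the nameserver lives in a zone we can reach
    return False


if __name__ == "__main__":
    delegations = {"facebook.com": ["a.ns.facebook.com"]}
    print(can_bootstrap("facebook.com", delegations, glue={}))
    print(can_bootstrap("facebook.com", delegations,
                        glue={"a.ns.facebook.com": "129.134.30.12"}))
```

Without glue, resolving a.ns.facebook.com would require asking facebook.com's own nameservers, which is the loop. With glue, the parent zone supplies the address, so resolution can start. Glue gives only the address, though; it does not make that address reachable if the routes to it are gone.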
| john37386 wrote:
| Good catch. Hopefully, they won't need an email sent to
| fb.com from registrarsafe.com to update an important record
| to fix this. What a loop.
| jacurtis wrote:
| Facebook does operate their own private Registrar, since they
| operate tens of thousands of domains. Most of these are
| misspellings and domains from other countries and so forth.
|
| So yes, the registrar that is to blame is themselves.
|
| Source: I know someone within the company that works in this
| capacity.
| robalfonso wrote:
| This is not completely accurate. The whole reason a registrar
| with domain abc.com can use ns1.abc.com is because glue
| records are established at the registry; this allows a
| bootstrap that keeps you out of a circular dependency. All
| that said, it's usually a bad idea; someone as large as
| Facebook should have nameservers across zones, i.e.
| a.ns.fb.com, b.ns.fb.org, c.ns.fb.co, etc.
| john37386 wrote:
| There is always a step that involves emailing the domain
| when a domain updates its information with the registrar. In
| this case, facebook.com and registrarsafe.com are managed
| by the same NS. You need these NS to query the MX to send
| that update approval by email and unblock the registrar
| update. Glue records are more for performance than to make
| that loop. I'm maybe missing something but, hopefully they
| won't need to send an email to fix this issue.
| robalfonso wrote:
| This is not true when you're the registrar (as in this
| case). In fact, your entire system could be down and you'd
| still have access to the registry's system to do this
| update
| jfrunyon wrote:
| I have literally never once received an email to confirm
| a domain change. Perhaps the only exception is on a
| transfer to another registrar (though I can't recall that
| occurring, either).
|
| To be fair, we did have to get an email from eurid
| recently for a transfer auth code, but that was only
| because our registrar was not willing to provide.
|
| In any case, no, they will not need to send an email to
| fix this issue.
| john37386 wrote:
| Yes I meant for transferring to another DNS server. In
| this case, they can't.
| thiht wrote:
| > That's also probably why we're seeing the domain
| "facebook.com" for sale on domain sites. The registrar that
| would normally provide the ownership info is down.
|
| That's not how it works. The info of whether a domain name is
| available is provided by the _registry_ , not by the
| registrars. It's usually done via a domain:check EPP command
| or via a DAS system. It's very rare for registrar to
| registrar technical communication to occur.
|
| Although the above is the clean way to do it, it's common for
| registrars to just perform a dig on a domain name to check if
| it's available because it's faster and usually correct. In
| this case, it wasn't.
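The difference between the two availability checks can be sketched as follows. Everything here is hypothetical: the enum, both functions and the rule that "no answer means available" are assumptions about how a shortcut check might behave, not a description of any real registrar's code or of the EPP protocol.

```python
from enum import Enum


class DnsResult(Enum):
    """Outcomes a DNS lookup for a domain might produce in this toy model."""

    ANSWER = "answer"
    NXDOMAIN = "nxdomain"
    SERVFAIL = "servfail"


def naive_is_available(dns_result):
    """Shortcut check: treat any lookup that yields no answer as 'not registered'."""
    return dns_result is not DnsResult.ANSWER


def registry_is_available(domain, registered_domains):
    """Authoritative check: ask the registry's own list of registered names."""
    return domain not in registered_domains


if __name__ == "__main__":
    registered = {"facebook.com"}  # stand-in for the registry database
    # During the outage, resolvers returned SERVFAIL for facebook.com.
    print(naive_is_available(DnsResult.SERVFAIL))
    print(registry_is_available("facebook.com", registered))
```

Under these assumptions the shortcut reports the name as available as soon as its nameservers stop answering, while the registry check still reports it as taken, which matches the explanation above for the "for sale" listings.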
| keithnoizu wrote:
| DNS is back, looks like systems are still coming online.
| bronlund wrote:
| Yeah that's some pretty hardcore A/B testing right there.
| john37386 wrote:
| Yeah the patch to fix BGP to reach the DNS is sent by email to
| @facebook.com. Ooops no DNS to resolve the MX records to send
| the patch to fix the BGP routers.
| yoelo wrote:
| Seriously? Is that how it works?
| john37386 wrote:
| I don't know. I doubt it. It's just funny to think that you
| need email to fix BGP, but DNS is down because of BGP. You
| need DNS to send email which needs BGP. It's a kind of
| chicken and egg problem but at a massive scale this time.
| _joel wrote:
| You'd think they'd have worked that into their DR plans
| for a complete P1 outage of the domain/DNS, but perhaps
| not, or at least they didn't add removal of BGP
| announcements to the mix.
| boshomi wrote:
| Sheera Frenkel:
|
| Was just on phone with someone who works for FB who described
| employees unable to enter buildings this morning to begin to
| evaluate extent of outage because their badges weren't working
| to access doors.
|
| https://twitter.com/sheeraf/status/1445099150316503057
| cranekam wrote:
| No. A network like Facebook's is vast and complicated and
| managed by higher-level configuration systems, not people
| emailing patches around.
|
| If this issue is even to do with BGP it's much more likely
| the root of the problem is somewhere in this configuration
| system and that fixing it is compounded by some other
| issues that nobody foresaw. Huge events like this are
| _always_ a perfect storm of several factors, any one or two
| of which would be a total noop alone.
| KuiN wrote:
| The Swiss cheese model of accidents. Occasionally the
| holes all align.
|
| https://en.wikipedia.org/wiki/Swiss_cheese_model
| soneil wrote:
| The fun part of BGP is they apparently make a lot of use
| of it within their network, not just advertising routes
| externally.
|
| https://engineering.fb.com/2021/05/13/data-center-
| engineerin...
|
| (and yes, fb.com resolves)
| [deleted]
| weisk wrote:
| No, the backbone of the internet is not maintained with
| patches sent in emails.
| chiluk wrote:
| https://lkml.org/
| chiluk wrote:
| You are very wrong about that https://lkml.org/
| cbarrick wrote:
| Clearly you and the person you replied to are talking
| about very different things.
| chiluk wrote:
| You are very wrong about that ;) https://lkml.org/
| _joel wrote:
| I think the sub-comment is confusing the linux kernel
| with BGP.
| nacs wrote:
| In a way, the Linux kernel does power the "backbones of
| the internet".
| _joel wrote:
| There are a hell of a lot of non-linux OS's running on
| core routers, but yes, in a way. However BGP isn't via
| email.
| NexRebular wrote:
| luckily not... would be absolutely terrible to have the
| backbone only on linux
| [deleted]
| kossTKR wrote:
| NYT tech reporter Sheera Frenkel gives us this update:
|
| > _Was just on phone with someone who works for FB who
| described employees unable to enter buildings this morning to
| begin to evaluate extent of outage because their badges weren't
| working to access doors._
|
| https://twitter.com/sheeraf/status/1445099150316503057
| adriancooney wrote:
| Got a good chuckle imagining a fuming Zuckerberg not being
| allowed into his office, thinking the world is falling apart.
| lbruder wrote:
| Looks like they misconfigured a web interface that they can't
| reach anymore now that they're off the net.
|
| "anyone have a Cisco console cable lying around?"
| CommieBobDole wrote:
| The only one they have is serial and the company's one usb-
| to-serial converter is missing.
| Edman274 wrote:
| The voices, stories, announcements, photos, hopes and
| sorrows of millions, no, literally _billions_ of people,
| and the promise that they may one day be seen and heard
| again now rests in the hands of Dave, the one guy who is
| closest to a Microcenter, owns his own car and knows how to
| beat the rush hour traffic and has the good sense to not
| forget to _also_ buy an RS-232 cable, since those things
| tend to get finicky.
| winternett wrote:
| Heck of a coincidence I must say...
|
| I can imagine this affects many other sites that use FB for
| authentication and tracking.
|
| If people pay proper attention to it, this is not just an
| average run of the mill "site outage", and instead of checking
| on or worrying about backups of my FB data (Thank goodness I
| can afford to lose it all), I'm making popcorn...
|
| Hopefully law makers all study up and pay close attention.
|
| What transpires next may prove to be very interesting.
| kiernanmcgowan wrote:
| My suspicion is that since a lot of internal comms runs through
| the FB domain and since everyone is still WFH, then it's
| probably a massive issue just to get people talking to each
| other to solve the problem.
| rStar wrote:
| time to start working at your mfing desk again, johnson
| gocartStatue wrote:
| They supposedly can't enter facebook office right now.
| Their cards don't work.
| eskathos wrote:
| source?
| jjulius wrote:
| NYT reporter on Twitter.
|
| https://twitter.com/sheeraf/status/1445099150316503057
| BrianKamrany wrote:
| Sheera Frenkel @sheeraf Was just on phone with someone
| who works for FB who described employees unable to enter
| buildings this morning to begin to evaluate extent of
| outage because their badges weren't working to access
| doors.
| eskathos wrote:
| "Something went wrong. Try reloading."
|
| it's not loading for me. could you say what it said?
| lynx234 wrote:
| Saw this earlier:
| https://twitter.com/disclosetv/status/1445100931947892736
| BrianKamrany wrote:
| Disclose.tv @disclosetv JUST IN - Facebook employees
| reportedly can't enter buildings to evaluate the Internet
| outage because their door access badges weren't working
| (NYT)
| jnorthrop wrote:
| From the Tweet, "Was just on phone with someone who works
| for FB who described employees unable to enter buildings
| this morning to begin to evaluate extent of outage
| because their badges weren't working to access doors."
| eskathos wrote:
| "Something went wrong. Try reloading."
|
| it's not loading for me. could you say what it said?
| david_allison wrote:
| > Was just on phone with someone who works for FB who
| described employees unable to enter buildings this
| morning to begin to evaluate extent of outage because
| their badges weren't working to access doors.
|
| https://nitter.net/sheeraf/status/1445099150316503057
| secondcoming wrote:
| Unlikely, PagerDuty was invented for this kind of thing
| kiernanmcgowan wrote:
| Oh I'm sure everyone knows what's wrong, but how am I
| supposed to send an email, find a coworker's phone number,
| get the crisis team on video chat etc etc if all of those
| connections rely on the facebook domain existing?
| ralphm wrote:
| Hence the suggestion for PagerDuty. It handles all this,
| because responders set their notification methods (phone,
| SMS, e-mail, and app) in their profiles, so that when in
| trouble nobody has to ask those questions and just add a
| person as a responder to the incident.
| korethr wrote:
| Yes, but Facebook is not a small company. Could PagerDuty
| realistically handle the scale of notifications that
| would be required for Facebook's operations?
| robalfonso wrote:
| Even if it can't, it's trivial to use it for an important
| subset, i.e. is Facebook.com down, is the ns stuff down
| etc. So there is an argument to be made for still using
| an outside service as a fallback
| jfrunyon wrote:
| I guarantee you that every single person at Facebook who
| can do anything at all about this, already knows there's
| an issue. What would them receiving an extra notification
| help with?
| robalfonso wrote:
| We kind of got off topic, I was arguing that if you were
| concerned about internal systems being down (including
| your monitoring/alerting) something like pager duty would
| be fine as a backup. Even at huge scale that backup
| doesn't need to watch everything.
|
| I don't think it's particularly relevant to this issue
| with fb. I suspect they didn't need a monitoring system
| to know things were going badly.
| anigbrowl wrote:
| Sure, if you're...
|
| - not arrogant
| - or complacent
| - haven't inadvertently acquired the company
| - know your tech peers well enough to have confidence in their
| identity during an emergency
| - do regular drills to simulate everything going wrong at once
|
| Lots of us know what _should_ be happening right now, but
| think back to the many situations we've all experienced
| where fallback systems turned into a nightmarish war
| story, then scale it up by 1000. This is a historic day,
| I think it's quite likely that the scale of the outage
| will lead to the breakup of the company because it's the
| Big One that people have been warning about for years.
| antoinealb wrote:
| PagerDuty does not solve some of the problems you would
| have at FB's scale, like how do you even know who to
| contact? And how do they log in once they know there is a
| problem?
| Spooky23 wrote:
| Sure. As long as you plan for disaster.
|
| The place where I worked had failure trees for every
| critical app and service. The goal for incident
| management was to triage and have an initial escalation
| for the right group within 15 minutes. When I left they
| were like 96% on target overall and 100% for
| infrastructure.
| justinzollars wrote:
| What do you think will be the impact on WFH and office
| requirements?
| still_grokking wrote:
| You mean the same problem as when GMail goes down and
| Googlers can't reach each other?
|
| I guess good decentralized public communication services
| could solve those issues for everybody.
| badrequest wrote:
| What do you think all those superfluous chat apps were for?
| praptak wrote:
| Word is that the last time Google had a failure involving a
| cyclical dependency they had to rip open a safe. It
| contained the backup password to the system that stored the
| safe combination.
| l9i wrote:
| The safe in question contained a smartcard required to
| boot an HSM. The safe combination was stored in a secret
| manager that depended on that HSM.
|
| _The engineer attempted to restart the service, but did
| not know that a restart required a hardware security
| module (HSM) smart card. These smart cards were stored in
| multiple safes in different Google offices across the
| globe, but not in New York City, where the on-call
| engineer was located. When the service failed to restart,
| the engineer contacted a colleague in Australia to
| retrieve a smart card. To their great dismay, the
| engineer in Australia could not open the safe because the
| combination was stored in the now-offline password
| manager._
|
| Source: Chapter 1 of "Building Secure and Reliable
| Systems"
| (https://sre.google/static/pdf/building_secure_and_reliable_s...
| size warning: 9 MB)
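|
| A minimal sketch of how a circular recovery dependency like this
| one could be caught ahead of time. The node names are
| hypothetical labels taken from the story above, not Google's
| actual inventory, and the graph is assumed to be maintained by
| hand. The function does a depth-first search and returns the
| first cycle it finds.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in deps}
    stack: List[str] = []  # nodes on the current DFS path, in order

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we closed a loop.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(deps):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical "X needs Y to recover" edges, modelled on the anecdote.
RECOVERY_DEPS = {
    "secret_manager": ["hsm"],
    "hsm": ["smart_card"],
    "smart_card": ["safe"],
    "safe": ["safe_combination"],
    "safe_combination": ["secret_manager"],
}

cycle = find_cycle(RECOVERY_DEPS)
if cycle:
    print(" -> ".join(cycle))
```

| Running a check like this over "what do I need to restore X"
| edges would only help if the graph is kept honest, which is
| presumably the hard part.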
| brazzy wrote:
| Lovely.
|
| Safes typically have the instructions on how to change
| the combination glued to the inside of the door, and
| ending with something like "store the combination
| securely. _Not inside the safe!_"
|
| But as they say: make something foolproof and nature will
| create a better fool.
| anigbrowl wrote:
| I'm sure this sort of thing won't be a problem for a
| company whose founding ethos is 'move fast and break
| things.' O:-)
| FearNotDaniel wrote:
| Anyone remember the 90s? There was this thing called the
| Information Superhighway, a kind of decentralised network
| of networks that was designed to allow robust
| communications without a single point of failure. I wonder
| what happened to that...?
| wolverine876 wrote:
| Aren't we still communicating on HN, even though the
| possibly largest network is down? Can you send email?
| mastazi wrote:
| We are a dying breed... A few days ago my daughter asked
| me "will you send me the file on Whatsapp or Discord?". I
| replied I will send an email. She went "oh, you mean on
| Gmail?" :-D
| prox wrote:
| I am going to guess it's one of those things the techies
| want to get round to, but in reality there is never any
| chance or will to do it.
| ewalk153 wrote:
| Folks are still chatting here... seems to work as
| designed...
| oconnor663 wrote:
| I think the issue there is that in exchange for solving the
| "one fat finger = outage" problem, you lose the ability to
| update the server fleet quickly or consistently.
| l9i wrote:
| I can assure you that Google has a procedure in place for
| that.
| l9i wrote:
| I unfortunately cannot edit the parent comment anymore
| but several people pointed out that I didn't back up my
| claim or provide any credentials so here they are:
|
| Google has multiple independent procedures for
| coordination during disasters. A global DNS outage
| (mentioned in
| https://news.ycombinator.com/item?id=28751140) was
| considered and has been taken into account.
|
| I do not attempt to hide my identity here, quite the
| opposite: my HN profile contains my real name. Until
| recently a part of my job was to ensure that Google is
| prepared for various disastrous scenarios and that
| Googlers can coordinate the response independently from
| Google's infrastructure. I authored one of the fallback
| communication procedures that would likely be exercised
| today if Google's network experienced a global outage. Of
| course Google has a whole team of fantastic human beings
| who are deeply involved in disaster preparedness (miss
| you!). I am pretty sure they are going to analyze what
| happened to Facebook today in light of Google's emergency
| plans.
|
| While this topic is really fascinating, I am
| unfortunately not at liberty to disclose the details as
| they belong to my previous employer. But when I stumble
| upon factually incorrect comments on HN that I am in a
| position to correct, why not do that?
| ric2b wrote:
| Yup, they make a new chat app if the previous one is
| down.
| gadnuk wrote:
| Google Talk, Google Voice, Google Buzz, Google+
| Messenger, Hangouts, Spaces, Allo, Hangouts Chat, and
| Google Messages.
|
| At some point, they must run out of names, right?
| londons_explore wrote:
| You forgot the chat boxes inside other apps like Google
| docs, Gmail, YouTube, etc.
| scatters wrote:
| And Google Pay, apparently.
| andrepd wrote:
| You forgot google meet!
| darkhorn wrote:
| And Google Wave.
| knorker wrote:
| For those who don't know who he is: l9i would know this.
| Just clarifying that this is not an Internet nobody
| guessing.
| astrange wrote:
| Why does it matter if he's guessing or not?
| fragmede wrote:
| Because, it may shock you to know, but sometimes people
| just go on the Internet and tell lies.
|
| No _shit_ Google has plans in place for outages.
|
| But what are these plans, are they any good... a
| respected industry figure whose CV includes being at
| Google for 10 years doesn't need to go into detail
| describing the IRC fallback to be believed and trusted
| that there is such a thing.
| new_guy wrote:
| That's just an 'appeal to authority'.
|
| No-one knows or cares who made the statement, it may as
| well have been 'water is wet', it was useless and adds
| nothing but noise.
| l9i wrote:
| I found a comment that was factually incorrect and I felt
| competent to comment on that. Regrettably, I wrote just
| one sentence and clicked _reply_ without providing any
| credentials to back up my claim. Not that I try to hide
| my identity, as danhak pointed out in
| https://news.ycombinator.com/item?id=28751644, my full
| name and URL of my personal website are only a click
| away.
|
| I have replied to my initial comment to provide some
| additional context:
| https://news.ycombinator.com/edit?id=28752431. Hope that
| helps.
| heartbreak wrote:
| That's...not what "appeal to authority" means.
| jaywalk wrote:
| I don't know who either he or you are, so...
| knorker wrote:
| I was clarifying his comment, since he didn't mention
| that this is not a guess, but inside knowledge.
|
| I was not trying to establish a trust chain.
|
| Take from it what you will.
| sam_lowry_ wrote:
| He is still an anonymous dude to me.
| [deleted]
| danhak wrote:
| HN Profile -> Personal Website -> LinkedIn -> Over 10
| years experience as Google Site Reliability Engineer
| ant6n wrote:
| Is the LinkedIn profile linking back to the hn account?
| e1g wrote:
| Google SRE for 10 years, ending as the Principal Site
| Reliability Engineer (L8).
| sulam wrote:
| s/the//
|
| Google has more than 1 L8 SRE.
| still_grokking wrote:
| I've read here on HN that exactly this was the issue as
| they had one of the bigger outages (I think it was due to
| some auth service failure) and GMail didn't accept
| incoming mail.
| l9i wrote:
| A Gmail outage would be barely an inconvenience as Gmail
| plays a minor role in Google's disaster response.
|
| Disclaimer: Ex-Googler who used to work on disaster
| response. Opinions are my own.
| ddalex wrote:
| Googler here - my opinions are my own, not representing the
| company
|
| at the lowest level in case of severe outage we resort to
| IRC, Plain Old Telephone Service and, sometimes, stick-it
| notes taped to windows...
| jug wrote:
| Some people here say their fallback IRC doesn't work due
| to DNS reliance. :|
| comonoid wrote:
| One of my employers once forced all the staff to use an
| internally-developed messenger (for sake of security, but
| some politics was involved as well), but made an
| exception for the devops team who used Telegram.
| lmitfb wrote:
| That would completely defeat the purpose... I have a hard
| time believing that.
| jaywalk wrote:
| Why? Even if it's not DNS reliance, if they self-hosted
| the server (very likely) then it'll be just as
| unreachable as _everything else_ within their network at
| the moment.
| yupper32 wrote:
| The entire purpose of an IRC backup is in case shit hits
| the fan. That means having it run on a completely
| separate stack.
|
| What use is it if it runs on the same stack as what you
| might be trying to fix?
| jaywalk wrote:
| Clearly "our entire network is down, worldwide" wasn't
| part of their planning. Don't get too cocky with your
| 20/20 hindsight.
| yupper32 wrote:
| I don't think it's cocky or 20/20 hindsight. Companies
| I've worked for specifically set up IRC in part because
| "our entire network is down, worldwide" can happen and
| you need a way to communicate.
| littlecranky67 wrote:
| If only IRC would have been built with multi-server
| setups in mind, that forward messages between servers,
| and continues to work if a single - or even a set - of
| servers would go down, just resulting in a netsplit...Oh
| wait, it was!
|
| My bet is, FB will reach out to others in FAMANG, and an
| interest group will form maintaining such an emergency
| infrastructure comm network. Basically a network for
| network engineers. Because media (and shareholders) will
| soon ask Microsoft and Google what their plans for such
| situations are. I'm very glad FB is not in the cloud
| business...
| rrix2 wrote:
| > If only IRC would have been built with multi-server
| setups in mind, that forward messages between servers,
| and continues to work if a single - or even a set - of
| servers would go down, just resulting in a netsplit...Oh
| wait, it was!
|
| yeah _if only_ Facebook's production engineering team
| had hired a team of full time IRCops for their emergency
| fallback network...
| littlecranky67 wrote:
| Considering how much IRCops were paid back in the day
| (mostly zero as they were volunteers) and what a single
| senior engineer at FB makes, I'm sure you will find 3-4
| people spread amongst the world willing to share this
| 250k+ salary amongst them.
| ceva wrote:
| That is called an out-of-band network :)
| Johnny555 wrote:
| Around here we use Slack for primary communications,
| Google Hangouts (or Chat or whatever they call it now) as
| secondary, and we keep an on-call list with phone numbers
| in our main Git repo, so everyone has it checked out on
| their laptop, so if the SHTF, we can resort to voice
| and/or SMS.
|
| I remembered to publish my cell phone's real number on
| the on-call list rather than just my Google Voice number
| since if Hangouts is down, Google Voice might be too.
| texasbigdata wrote:
| Where are the tapes though? Colo on separate tectonic
| plate or nah?
| Johnny555 wrote:
| ?
| guidoism wrote:
| I worked on the identity system that chat (whatever the
| current name is) and gmail depend on and we used IRC
| since if we relied on the system we support we wouldn't
| be able to fix it.
| strulovich wrote:
| Those communications are done over irc at FB for exactly this
| purpose.
| okwubodu wrote:
| I don't know how true it is but a few reports claim employees
| can't get into the building with their badges.
| korethr wrote:
| Link to such claims here:
| https://news.ycombinator.com/item?id=28750894
|
| I have no doubt that the publicly published post-mortem
| report (if there even is one) will be heavily redacted in
| comparison to the internal-only version. But I very much
| want to see said hypothetical report anyway. This kind of
| infrastructural stuff fascinates me. And I would hope there
| would be some lessons in said report that even small time
| operators such as myself would do well to heed.
| RichardCA wrote:
| I think the real take away is that no one has this
| figured out.
|
| A small company has to keep all of its customers happy
| (or at least be responsive when issues arise, at a bare
| minimum).
|
| Massive companies deal in error budgets, where a fraction
| of a percent can still represent millions of users.
| throwdecro wrote:
| I guess they didn't have an "emergency ingress" plan.
| ToddWBurgess wrote:
| Then they will have to old school it and try a brick.
| metadaemon wrote:
| I've heard on Blind this is unrelated, more of a Covid
| restriction issue.
| wolverine876 wrote:
| What is Blind? Or shouldn't I ask?
| monkeydust wrote:
| www.teamblind.com
|
| Enjoy.
| rvnx wrote:
| A copy of Glassdoor
| ithkuil wrote:
| first rule of Blind, never talk about Blind
| cududa wrote:
| I remember my first time having a meeting at Facebook and
| observing none of the doors had keyholes and thinking "hope
| their badge system never goes down"
| londons_explore wrote:
| Breaking the glass to get in to fix the service is
| totally a good business move.
|
| A few hundred bucks of glass Vs a billion wiped off the
| share price if the service is down for a day and all the
| users go find alternatives.
| Bluecobra wrote:
| In case of emergency, break glass...
|
| ...the doors are glass right?
| tetha wrote:
| All doors are glass with the right combination of a
| halligan bar, an axe and a gasoline powered saw.
|
| And I guess beyond that point, walls are glass. Or you
| need explosives.
| cududa wrote:
| Zuck's personal conference room has 3 glass walls, so I've
| been amusing myself imagining him just throwing a chair
| through one of the walls
| thrwyoilarticle wrote:
| Do they (you?) call him that at FB?
| hellbannedguy wrote:
| I don't think he has the strength.
| tablespoon wrote:
| > I remember my first time having a meeting at Facebook
| and observing none of the doors had keyholes and thinking
| "hope their badge system never goes down"
|
| Every internet-connected physical system needs to have a
| sensible offline fallback mode. They should have had
| physical keys, or at least some kind of offline RFID
| validation (e.g. continue to validate the last N badges
| that had previously successfully validated).
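|
| A rough sketch of the "last N badges" fallback described above,
| assuming a door controller that can tell "backend says no" apart
| from "backend unreachable". The BadgeReader class, the
| check_online callback and the cache size are all made up for
| illustration and say nothing about how Facebook's doors actually
| work.

```python
from collections import OrderedDict
from typing import Callable, Optional


class BadgeReader:
    """Door controller that remembers the last N badges validated online."""

    def __init__(
        self,
        check_online: Callable[[str], Optional[bool]],
        cache_size: int = 1000,
    ) -> None:
        # check_online returns True/False when the backend answers,
        # or None when the backend cannot be reached at all.
        self._check_online = check_online
        self._cache_size = cache_size
        self._recent: "OrderedDict[str, None]" = OrderedDict()

    def allow(self, badge_id: str) -> bool:
        verdict = self._check_online(badge_id)
        if verdict is None:
            # Offline: only admit badges that recently passed online checks.
            return badge_id in self._recent
        if verdict:
            # Remember the badge and mark it as most recently used.
            self._recent[badge_id] = None
            self._recent.move_to_end(badge_id)
            while len(self._recent) > self._cache_size:
                self._recent.popitem(last=False)  # evict the oldest entry
        else:
            # An explicit denial also revokes any cached approval.
            self._recent.pop(badge_id, None)
        return verdict
```

| The obvious trade-off is that a badge revoked during the outage
| keeps working until the backend comes back, so N and the cache
| lifetime would need to be chosen with that in mind.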
| skeeter2020 wrote:
| maybe they're open by default, like old 7-11 stores when
| they went 24hrs and had no locks on the doors :)
| Bombthecat wrote:
| Aaaaaaand it's down!
| [deleted]
| jonny_eh wrote:
| https://twitter.com/sheeraf/status/1445099150316503057
| threevox wrote:
| LOL - score one against building out all tooling internally
| (a la Amazon and apparently Facebook too)
| vineyardmike wrote:
| The rate at which some amazon services lately go down
| because _other_ AWS services went down proves that this is
| an unsustainable house of cards anyways.
| kevin_thibedeau wrote:
| Netflix knows how to build on top of a house of cards.
| _joel wrote:
| Everything is a f*king Facebook problem
| TedShiller wrote:
| This solves the disinformation problem
| m_coder wrote:
| I unblocked Facebook right now from my hosts file so I could
| message someone and couldn't figure out why Facebook failed to
| load. I tested HN and voilà I see that the entire world has sent
| Facebook requests to 0.0.0.0 lol
| interestica wrote:
| You broke it.
|
| I didn't receive expected WhatsApp messages and am only now
| realizing there's no indication within the app that there is
| even a problem. It only becomes (somewhat) apparent when
| sending a message never gets a single check mark. Not a
| graceful failure for the user view.
| monkaiju wrote:
| Gotta post this every time there's a big DNS issue, which seems
| daily now.
|
| Check out Dug! It's a global DNS propagation/monitoring tool on the
| CLI: https://github.com/unfrl/dug/
| jmfldn wrote:
| Yet another reason to not over-rely on a few big tech companies
| for the majority of the planet's communication. Forget concerns
| about competition, monopolies and so on for now (as important as
| they are), what we want are many social networks, video
| conferencing apps, messenger apps. Every country should strive to
| build their own Google or FB, or certainly many more should.
| State-backed if needed. It's a question of resilience and
| security as much as anything.
| koksik202 wrote:
| my home connection with my ISP (Vodafone Ireland) is down, so I
| guess the churn from FB's BGP routes was big enough that it blew
| up Vodafone's network. Is it a DNS or a routing issue?
| [deleted]
| [deleted]
| ctur wrote:
| It's back as of approx 14:47 PST.
| tcarn wrote:
| Agreed, website loading here, still no whatsapp though.
| pytlicek wrote:
| Who else sees their deleted messages on WhatsApp that shouldn't
| be there?
|
| https://twitter.com/Pytlicek/status/1445072626729242637
| reilly3000 wrote:
| That is wild and definitely newsworthy. Capture as many
| screenshots and data as you can.
|
| FWIW it seems possible that the messages remain cached locally
| on your device but deleted from their servers, and with their
| outage they aren't being updated to delete?
| 74d-fe6-2c6 wrote:
| I'm pretty sure this has been building up since the morning
| (Germany). I've had odd connectivity problems to a number of
| sites including slack for a moment.
| [deleted]
| muthdra wrote:
| Lichess Android app is also down but not the webpage. Infinity
| app for Reddit is down. HN is super slow.
| jangrahul wrote:
| makes me think, why don't porn sites ever go down?
| gmiller123456 wrote:
| Phrasing!
| poorjohnmacafee wrote:
| Teen depression and suicide rates plummeting right now
| drcongo wrote:
| Let's hope it's permanent.
| mrstumpy wrote:
| You win for best comment
| armchairhacker wrote:
| > "Idk man, this seems like a tough issue. Maybe we should just
| give up"
|
| > "Ok." - Zuckerberg
|
| _shuts down $100B company_
| [deleted]
| np1810 wrote:
| +1, with the recent reveal -
|
| https://news.ycombinator.com/item?id=28741755
|
| https://news.ycombinator.com/item?id=28741532
| [deleted]
| [deleted]
| dominotw wrote:
| I talk to my parents in India every day. WhatsApp is the
| only game in town there.
| nicce wrote:
| I hope that you can install other messaging apps as well?
| stonecharioteer wrote:
| Not OP. They _can_ but good luck trying to convince parents
| of that. They're not tech savvy enough to install apps
| themselves. They have simple questions about why Whatsapp
| cannot be installed in a basic Nokia phone for instance.
| It's not easy to convince them to use Signal or Telegram or
| anything else.
| hkai wrote:
| Why?
| drdeadringer wrote:
| One American example quote that holds true for countries
| outside of America:
|
| Direct quote: "That website on Facebook."
|
| There are people who believe that "Facebook" literally equals
| "Internet". Facebook, Internet ... Internet, Facebook.
|
| Rinse and repeat for your alternative echo chamber regarding
| Google, the Microsoft Bing, &c.
| vadfa wrote:
| Because he doesn't like the website so he thinks nobody else
| should be able to use it.
| elevenoh wrote:
| b/c most humans are on the wrong end of fb's covert,
| exploitative attention-manipulation
| dessant wrote:
| Because combined with the abysmal state of education in most
| places, and a general lack of government action, Facebook is
| an actual threat to our civilization.
| forgetfulness wrote:
| People unfortunately love the upsides of misinformation, or
| perhaps it's the format that makes it easy to build
| community around shared (misinformed) values, to rally in
| battles that rage for hours or days for a cause you deeply
| believe in and can follow by digesting 30-second soundbites
| on social media and 30-minute videos on YouTube.
|
| People will do this wherever they can talk in a group
| online, not just Facebook properties. It's... pretty bad
| actually, I think the only tool that exists right now is
| censorship, because the bullshit gets created, spread, and
| wholeheartedly received way faster than debunking will.
|
| And censorship is a power that can't be safely entrusted to
| anybody.
| kbelder wrote:
| Eh, Twitter's worse.
| hkai wrote:
| Why?
|
| The reasons I've seen are:
|
| > it creates a risk of bad self-image for young girls
|
| It's a parent's job to educate your children. There are
| much worse things than Facebook out there.
|
| > it collects data
|
| Literally no harm in knowing that someone is interested in
| JavaScript, cats and fetish porn, and targeting ads to that
| user.
|
| > it's addictive
|
| So is sex, marijuana, and collecting stamps.
|
| > it helps organize protests
|
| Good.
| jachee wrote:
| It actively uses its algorithm to radicalize racists and
| conspiracy theorists, and when it discovered that's what
| it was doing decided to keep doing it because it was good
| for the bottom line:
|
| https://www.businessinsider.com/facebook-pushes-qanon-racism...
| [deleted]
| Qi_ wrote:
| An alternate explanation is that the algorithm tries to
| promote engagement and user retention. Presumably, people
| susceptible to radicalization engage with the content
| discussed in the article. It would be unreasonable to
| expect Facebook to not act in its own self-interest.
| jachee wrote:
| Any algorithm that can maximize engagement can be tuned
| to minimize radicalization and dissemination of hatred
| and fascism.
|
| I'd argue that it's absolutely in Facebook's self-
| interest to reduce their active role in promoting
| fascism, racism, homophobia, etc.
| forgetfulness wrote:
| > An alternate explanation is that the algorithm tries to
| promote engagement and user retention. Presumably, people
| susceptible to radicalization engage with the content
| discussed in the article. It would be unreasonable to
| expect Facebook to not act in its own self-interest.
|
| That's the whole point. _Oh they're just trying to make
| a buck like everyone else_ is exactly the problem.
|
| They are running a paperclip maximizer that turns
| passive consumers of misinformation into "engaged"
| radicals and the system that is Facebook has no incentive
| to correct this.
|
| https://en.wikipedia.org/wiki/Instrumental_convergence
| hkai wrote:
| To recap, you seem to be concerned that all social media
| are allowing posts to become popular, and those posts
| sometimes promote hatred towards conservatives or
| liberals.
|
| Two questions:
|
| - What do you think should be done about the legacy media
| that is doing the same?
|
| - Should social media promote boring posts, or actively
| censor political content in favour of a certain
| viewpoint, or anything else? Perhaps a real-life name
| registration for anyone with over 1000 followers, like in
| China?
| jachee wrote:
| > those posts sometimes promote hatred towards
| conservatives or liberals.
|
| Incorrect assertion. Those posts promote hatred and/or
| violence toward _humans_ for traits those humans did not
| choose. e.g. race, sexual orientation, etc.
|
| Legacy media aren't actively amplifying the voices and
| recruiting efforts of white supremacists.
|
| Facebook is. They acknowledge that they are. They chose
| to actively allow and encourage it for profit.
| jader201 wrote:
| > It's a parent's job to educate your children. There are
| much worse things than Facebook out there.
|
| I'm guessing that either you're not a parent, or your
| kids aren't teens.
|
| But most parents of teens realize that kids, and
| especially teens, are often much more influenced by
| things like social media & peers (and peers via social
| media) vs. influence their parents have on them.
| SirensOfTitan wrote:
| I don't necessarily disagree, but often I hear FB or other
| tech companies like Twitter singled out re: misinformation.
| News media contributes to misinformation and contributes to
| a warped, partisan, permanently-in-catastrophe-mode
| population just as much as FB, Twitter, and other mediums.
|
| I doubt, if FB goes away, that any of the issues you're
| implying will go away or even get much better. In fact, the
| lack of a real look into the negative effects of consumer
| news product reinforces this idea that only the elite can
| know the truth, and the masses just have to get in line and
| shut up.
|
| News media proliferated nonsense from fed sources to
| justify the Iraq war, they gave Trump 24/7 airtime for a
| while because it increased ratings. They constantly forgo
| any real accountability for their actions, and pretend that
| they aren't just another addictive consumer product that
| warps peoples' brains.
| jose-cl wrote:
| here's your upvote master
| clydethefrog wrote:
| Unfortunately Whatsapp replaced texting for around 80 % of the
| world.
| pid-1 wrote:
| Not a fan of FB, but the main reason for WhatsApp's success
| was SMS sucking hairy balls.
| snarf21 wrote:
| It didn't help that telecoms used to use SMS as an extreme
| profit center. I don't think WhatsApp would have taken over
| the way it did if SMS was always included in all plans for
| free. This is similar to the way "local" long distance used
| to be such a racket.
| martinald wrote:
| Most UK plans included unlimited SMS for a long time, but
| whatsapp still took over.
|
| The group chat functions don't really exist in SMS (maybe
| in MMS but they never work properly), photos (same),
| whatsapp desktop, you can text when you have WiFi but no
| 4G (or using a different sim card when travelling), etc.
| neop1x wrote:
| No problem, they still have phone numbers of those people so
| they can send them SMS with Signal invitation. :)
| jliptzin wrote:
| Of all the big tech companies Facebook is the only one where it
| can completely disappear overnight and my life would be
| completely unaffected (or possibly improved by not having to
| explain to people I don't use facebook, please email or text me
| your invitations rather than use messenger). If Google, Amazon,
| Netflix, Apple disappeared the story would be completely
| different.
| gpvos wrote:
| Facebook is the only one of those that I regularly use. I'd
| like something like Google's Android to stay around. The rest
| I don't need.
| bluecalm wrote:
| Man, out of those only Netflix going down wouldn't cause a
| gigantic billions of dollars worth clusterfuck to people,
| businesses and companies. It's nice you don't use them but
| about everyone around does and mostly for at least some
| important things.
| akudha wrote:
| I am surprised you have Netflix on the list. It would be
| annoying for 2 minutes, then you can simply go for a walk or
| read a book.
| speedgoose wrote:
| Or watch movies and shows using one of the many
| alternatives to Netflix.
| perryizgr8 wrote:
| With Facebook, whatsapp and Instagram down, it feels like the
| entire internet is down for me.
| apexalpha wrote:
| Weird, because over here WhatsApp is ingrained into the
| social fabric of your life. Couldn't imagine ever going back
| to texting/iMessage.
| Wowfunhappy wrote:
| But your friend groups would probably be able to migrate to
| Signal/Discord/Hangouts/etc quite quickly if WhatsApp were
| to disappear, no? WhatsApp has the network effect on its
| side by way of existing, but that could change quickly if
| given a push.
| Broken_Hippo wrote:
| Sure. But you might not get everyone back - you'd have to
| have an alternate method of talking to the folks to get
| them to switch and meet up in the same place. You'd have
| this if the service just slowly died (like landlines),
| but not if something breaks instantly - forever. I'm
| guessing we've all had this when games died (especially
| old text-based MMORPG's, for example. So many people
| gone).
| phpnode wrote:
| At least with WhatsApp you do have the contact's phone
| number, so you can reach them via SMS if necessary.
| bduerst wrote:
| Small and medium businesses would suffer as well, since
| many use WhatsApp as a sales channel now.
| aldanor wrote:
| After using Telegram, WhatsApp is a complete piece of
| garbage, if it disappeared from the face of the earth it
| would surely be for the best as people would move on to
| alternative messengers.
| mixedCase wrote:
| Does Telegram have E2E messages by default, and using a
| sensible encryption protocol? If not, I disagree.
| aldanor wrote:
| IIRC, e2e by default for audio/video; for text chats, can
| be enabled by marking chat as 'secret'. Is it true E2E?
| Probably not (i.e. Telegram has keys that can be turned
| over to any government, no one argues with that)
|
| Does WhatsApp have a true E2E either? Ask hundreds of
| moderators employed by Facebook who review WhatsApp
| messages flagged as improper and the chat history around
| them...
|
| However, accepting the fact that neither of the services
| is truly secure, Telegram experience as a service is much
| better for an average user.
| mixedCase wrote:
| > for text chats, can be enabled by marking chat as
| 'secret'. Is it true E2E? Probably not (i.e. Telegram has
| keys that can be turned over to any government, no one
| argues with that)
|
| That was my problem, and your confirmation means it's
| still as good as nothing.
|
| > Does WhatsApp have a true E2E either? Ask hundreds of
| moderators employed by Facebook who review WhatsApp
| messages flagged as improper and the chat history around
| them...
|
| If one of the ends decides to share a message, it's still
| E2E. That is the big difference.
| aldanor wrote:
| > If one of the ends decides to share a message, it's
| still E2E. That is the big difference.
|
| True. But you can't prove that "one of the ends" must
| necessarily be a human and not the logic in the app code,
| or an intended backdoor? E.g., an automated logic
| scanning for 'malicious' messages on-device.
| AlexandrB wrote:
| I still remember the era when the "in" messenger changed
| every 2-3 years: ICQ -> AIM -> MSN Messenger -> Google
| Chat, etc.
|
| Changing messaging apps is not the most convenient thing in
| the world, but it's not some kind of IT cataclysm. Plenty
| of WhatsApp competitors exist.
| erid wrote:
| People would just use an alternative, like Telegram or
| whatever is the next most popular one.
| j4hdufd8 wrote:
| Having trouble doing calls on Telegram now - I guess
| because of the shift in load to Telegram
| dmd wrote:
| WhatsApp, like the metric system, is a "literally
| everywhere but the US" thing. I've never once seen it used
| in the US.
| infinite_beam wrote:
| disappearance of FB might not impact you, but India runs on
| WhatsApp.
| pacija wrote:
| All of the big tech companies you mentioned could completely
| disappear overnight and my life would be completely
| unaffected or possibly improved.
| Carlettosan007 wrote:
| Yes, it's very possible. On the other hand, it would give an
| opportunity to companies that are closer to people and that
| pay users for their data.
| Carlettosan007 wrote:
| Yes, it's very possible. On the other hand, it would give an
| opportunity to companies that are closer to people and that
| pay users for their data. In the end, users are their asset
| for generating very significant revenue; it would be great
| if they shared their profits!
| pmontra wrote:
| Maybe Google, because of the search engine. Android: somebody
| will fill the void.
|
| Messaging: people have been switching in hordes to every new
| free messaging system in the 90s and early 2000s, we will
| adapt to something else.
|
| Netflix and video in general: same thing without the
| 90s/early 2000s.
|
| Amazon: very convenient store, we'll spend a little less and
| somebody will fill the void.
|
| Apple: can't say, never bought anything from them.
|
| By the way, when I couldn't message on WA today I thought
| they had finally cut me off because I still hadn't accepted their
| new privacy policy from months ago :-) I resolved to wait and
| see for a couple of days.
| ChefboyOG wrote:
| I dunno. If AWS went away suddenly, or if Google Search/the
| G-Suite suddenly stopped existing, the internet as we know
| it would need some time to recover.
| johannes1234321 wrote:
| > Messaging: people have been switching in hordes to every
| new free messaging system in the 90s and early 2000s, we
| will adapt to something else.
|
| Back then the IM population was a lot smaller. Also with
| "Free Basics" and other things in some regions of the world
| Facebook plays a game which makes it impossible to switch.
| (Using WhatsApp is free, for others one has to buy mobile
| data credits)
| standardUser wrote:
| Facebook is an unparalleled titan in the realm of advertising
| and WhatsApp is basically a utility-level communication
| system for a big chunk of the globe. Instagram is a key
| cultural driver of the Western world. You may not feel any
| direct firsthand consequences, but the overall impact would
| transform the world around you.
| Devasta wrote:
| Facebook is implicated in genocide in multiple countries,
| and Instagram is nothing but a psychotic lie factory
| designed to induce depression and self loathing in young
| women.
|
| The world would only improve if it disappeared.
| drcongo wrote:
| Yeah, but would there be any drawbacks?
| malandrew wrote:
| I find this kind of comment fascinating because it's
| illustrative of how humans can form intentional
| blindspots as to the utility of a person or institution
| when all they care about are the negative aspects of
| that person's or institution's existence.
| op: "I don't care about thing X disappearing"
| re: "While you may not care about it because of Y, X also
| provides benefit Z to other people"
| op: "But would there be any drawbacks?"
|
| Yeah, there would be drawbacks: other people would lose
| Z, which may matter a lot to them even if it doesn't
| matter to you. Someone just told you about Z, and you
| just responded as if you weren't just told about Z.
|
| These days I find it incredibly frustrating to deal with
| people who have conclusively decided they don't like
| something and that renders them incapable of
| acknowledging other benefits that said thing provides
| even if those benefits aren't relevant to them or are
| less relevant than the things they vocalize caring about.
| PeterisP wrote:
| I can agree with the "intentional blindspots" argument
| but turn it right around.
|
| I'd like to explicitly note that the parent post did
| _not_ say "X also provides benefit Z to other people" -
| it asserted "Facebook is an unparalleled titan in the
| realm of advertising" which is a substantially different
| thing; it's _not_ something that some people simply don't
| care about and a benefit to some other people and
| considering those statements as equivalent is a (very
| large) intentional blindspot. The current way of how
| advertising is done (driven, in part, by FB) is also a
| harm to many people and society at large, so publicly
| making an implicit assumption that "advertising" is at
| most neutral is not okay, it's something that should be
| called out.
|
| This very "unparalleled titan in the realm of
| advertising" aspect is a major cost on society, a net
| harm that perhaps should be tolerated if it's outweighed
| by some other benefits FB provides (such as the "utility-
| level communication system for a big chunk of the
| globe"), but as itself it's definitely _not_ something
| that should be treated as benign just because some people
| get paid for it.
|
| If FB advertising disappeared with no other drawbacks,
| that would be a great thing. Of course, there _are_ some
| actual drawbacks, but even so it's quite reasonable to
| motivate people to ask about the actual drawbacks of FB
| being down, because "oh but ads" (with which the
| grandparent post started) is not one.
| drcongo wrote:
| Thank you, I agree with everything you said here. But I'd
| also like to address the other things I was answering
| with the drawbacks quip...
|
| > WhatsApp is basically a utility-level communication
| system for a big chunk of the globe.
|
| Unfortunately, it's not an actual utility though, which
| is precisely my point. It's pure folly to build your
| business around a pseudo utility owned by a private
| company.
|
| > Instagram is a key cultural driver of the Western
| world.
|
| I honestly have no idea how this is being presented as a
| good thing. A "key cultural driver of the western world"
| is an app whose entire purpose is to harvest your data
| and sell it to dodgy partners who will use it to usurp
| democracy.
| standardUser wrote:
| There would be a massive opening for new platforms to
| take over, and the odds that they are also based in the
| West would be much lower.
| hkai wrote:
| What's the advantage of using a Chinese platform instead
| of Facebook in terms of privacy, freedom of speech or
| political influence?
| cdelsolar wrote:
| yes. I want to see what my friends and acquaintances are
| up to.
| dhosek wrote:
| My time on Facebook made it abundantly clear how racist,
| misogynist and otherwise vile a large portion of the
| people I grew up with are. I was much happier having a
| superficial contact with them once every ten years at a
| high school reunion. I'm no longer on Facebook (or
| Twitter).
|
| Occasionally, I'll see/hear/do something and think that
| it would have made a good status update/tweet, but then I
| remember that these things have happened to me for decades
| before social media was a thing and life was fine. Some
| I'll share with my wife or a friend, most just disappear
| and that's fine too.
| yupper32 wrote:
| People seem to not know that you can unfriend or at
| minimum unfollow people on Facebook.
|
| Why did you put up with racist and misogynistic people on
| your feed? Why did you feel the need to delete your
| account instead of unfollowing people?
|
| My feed is nice and clean, with family, some friends, and
| some pages.
| varjag wrote:
| An act of unfriending someone is interpreted as hostile
| action. It's much easier just to not be there in the
| first place.
| yupper32 wrote:
| Then unfollow. They don't know if you unfollow.
| op00to wrote:
| I don't. Why would I need to know more than they decide
| to tell me? I got enough shit on my mind.
| knocte wrote:
| How about you call them to set up a meeting to catch up?
| dkarras wrote:
| Why? It is not as efficient. I can buy everything from
| stores but I use amazon, same thing. I don't actually use
| facebook though because I don't care about anyone really
| but for people that care, it is a solid platform.
|
| There is a gap between "I want to know what people I know
| are up to" and "I want to meet with those people one by
| one to see what they are up to". Some people just want to
| passively watch and that is ok.
| davuinci wrote:
| There are several people earning their living through
| Facebook/Instagram and there is a whole marketplace that
| would impact lots of people. Don't get me wrong, I don't
| use or like FB in any way but FB disappearing overnight
| would definitely have drawbacks for lots of people.
| SV_BubbleTime wrote:
| Replace Facebook in your post with human trafficking :)
|
| Obviously I'm not serious, and it's popular sentiment here
| that _"Fuck Facebook... Oh but I use Instagram and
| WhatsApp of course!"_, but the point was "some people
| making a living on x" isn't really a great argument for
| "x is harmful and we might be better without it".
| dgemm wrote:
| > Facebook is an unparalleled titan in the realm of
| advertising
|
| Uh, Google? It's definitely paralleled, and also preceded
| adolph wrote:
| "Are you alright? What's wrong?"
|
| "I felt a great disturbance in the DNS. As if millions of
| influencers suddenly cried out in terror and were suddenly
| silenced. I feel something terrible has happened. But you
| better get on with your content curation."
| DevKoala wrote:
| > You may not feel any direct firsthand consequences, but
| the overall impact would transform the world around you.
|
| For the better.
| TomSwirly wrote:
| > Facebook is an unparalleled titan in the realm of
| advertising
|
| Not unparalleled - Google exists.
|
| And we need less advertising, not more.
|
| > and WhatsApp is basically a utility-level communication
| system for a big chunk of the globe.
|
| Many other such systems exist - Telegram, Signal, Google
| Chat.
|
| > Instagram is a key cultural driver of the Western world
|
| Western culture will get along just fine without Instagram.
|
| > the overall impact would transform the world around you.
|
| For the better.
| intricatedetail wrote:
| Facebook is an unparalleled titan in the realm of consumer
| manipulation
|
| There I fixed it for you.
| georgeecollins wrote:
| The problem for me is Oculus. I really love their headset and I
| appreciate the investment Facebook has made in that.
|
| I hate the stupid strategy tax that makes me have an FB account
| to use their headset, and has it go down when they have an
| outage. I hope they can learn from MSFT that "Facebook
| Everywhere" is ultimately a self defeating strategy.
| HellDunkel wrote:
| I wish it would stay that way.
| sabujp wrote:
| India runs on whatsapp. They'll have more backups now.
| firstSpeaker wrote:
| Is there any place to see how the overall internet bandwidth
| usage has changed during this outage?
| amir-h wrote:
| Hacker News also got so much slower, is it the load from people
| flocking here after not being able to reach FB?
| uyt wrote:
| If I wanted to know if a site is down for everyone or just me,
| I would check twitter/hn first before checking the down
| detector sites
| o10449366 wrote:
| Either many HN users are in glee over FB's potential demise
|
| ...or many HN users are also avid FB users (and now have to
| resort to backup sources of entertainment)
| zanethomas wrote:
| hahaha, good riddance
| XiS wrote:
| https://www.cyberciti.biz/humour/a-haiku-about-dns/
| josalhor wrote:
| facebook.com resolves again!
|
| > ping facebook.com
|
| PING facebook.com (31.13.83.36) 56(84) bytes of data.
|
| 64 bytes from edge-star-mini-shv-01-mad1.facebook.com
| (31.13.83.36): icmp_seq=1 ttl=54 time=12.2 ms
|
| 64 bytes from edge-star-mini-shv-01-mad1.facebook.com
| (31.13.83.36): icmp_seq=2 ttl=54 time=12.1 ms
|
| 64 bytes from edge-star-mini-shv-01-mad1.facebook.com
| (31.13.83.36): icmp_seq=3 ttl=54 time=11.7 ms
|
| Can't yet traceroute to a.ns.facebook.com tho
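| (A minimal sketch of the same "does it resolve yet?" check in
| Python, assuming only the standard library. The helper name
| resolve_addresses and its resolver parameter are hypothetical,
| chosen for illustration; the injectable resolver just makes the
| function testable without a network.)

```python
import socket


def resolve_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the sorted unique addresses for hostname, or [] if lookup fails."""
    try:
        # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples;
        # sockaddr[0] is the address string for both IPv4 and IPv6.
        infos = resolver(hostname, 443)
    except socket.gaierror:
        # Raised when name resolution fails (for example, no usable answer
        # from the configured DNS servers).
        return []
    return sorted({info[4][0] for info in infos})
```

| An empty list from resolve_addresses("facebook.com") would
| correspond to the resolution failures people reported during
| the outage; a non-empty list corresponds to the ping above
| succeeding in looking up the name.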
| Animats wrote:
| DNS configuration is becoming a single point of failure. A few
| weeks ago, many services running out of AWS West 2 failed because
| the within-the-datacenter DNS system broke down somehow.
| muthdra wrote:
| Lichess Android app is also down but not the webpage. Infinity
| app for Reddit is down. HN is super slow and "having trouble
| serving requests".
| v4ult wrote:
| Signal FTW
| rootusrootus wrote:
| And a lot of people seem to be coming to HN to find out why,
| judging by how laggy HN is getting right now...
| moffkalast wrote:
| Well it sure is the place to find out for sure.
| hotz wrote:
| The joy that people are getting from this is quite shitty. I hate
| social media but there are people earning a living working for
| these companies. Like others have pointed out, businesses and
| neighborhood watches rely on tech like this. At some point we've
| all had sites/apps go down, in a situation like that the last
| thing you want is people enjoying it. The lack of empathy in this
| thread is telling.
| sasaf5 wrote:
| It pales in comparison to the lack of empathy facebook has
| shown to its user herd.
| peanut_worm wrote:
| I am sure it is just a symptom of the Facebook outage, but it
| seems like every website I am going on is slower than usual
| today.
| Redoubts wrote:
| I've seen a couple fail to log in, because their SSO is broken
| through this. (even if FB login is merely an option)
| donatj wrote:
| Every Facebook App on every phone is DDoS-ing the DNS system
|
| https://twitter.com/blazejkrajnak/status/1445063232486531099
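| (For context, one common way clients avoid this kind of retry
| storm is capped exponential backoff with jitter. The sketch
| below is a generic illustration, not a description of how the
| Facebook apps actually retry; backoff_delays and its parameters
| are hypothetical names.)

```python
import random


def backoff_delays(attempts, base=1.0, cap=300.0, rng=random.random):
    """Yield one randomized wait time (in seconds) per retry attempt."""
    for attempt in range(attempts):
        # The ceiling doubles with every attempt but never exceeds `cap`.
        ceiling = min(cap, base * (2 ** attempt))
        # "Full jitter": pick a point between 0 and the ceiling so that many
        # clients spread their retries out instead of retrying in lockstep.
        yield rng() * ceiling
```

| A client would sleep for each yielded delay before its next
| lookup, so the aggregate load on resolvers drops off instead of
| staying constant while the name stays unresolvable.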
| [deleted]
| WallyFunk wrote:
| This would be a golden opportunity to launch your 'Facebook
| Killer' app. Preferably a social network where people don't pay
| with their data, but with, you know, a thing called Money.
| smt88 wrote:
| Who would pay money to be the first user of a new social
| network?
| alkonaut wrote:
| Here is a handy troubleshooting flowchart for megacorp outages:
|
| > Is it a DNS issue? -> yes
|
| It can be used in reverse as a postmortem too.
| Animats wrote:
| Outage is top story on CNN and Fox. Facebook is not returning
| their calls. Sheera Frenkel at the New York Times has been able
| to get a little more info, but not much.
|
| Now Twitter is starting to have problems with overload.
| slackfan wrote:
| So how much do we need to pitch in to _keep_ it down?
| swayson wrote:
| This really is a fascinating case study in what truly resilient
| systems look like. More often than not, they are not centralized.
| marchingvehicle wrote:
| Where could the physical data centers be that they need to
| access? How far away could it be?
| chippy wrote:
| maybe, there are reports (i.e. unverified tweets) that
| employees cannot access sites due to the security systems also
| being down. I imagine email, and messaging for employees would
| also be down too.
|
| It may be very hard for employees to get to the physical boxes,
| and/or bypass any physical or software security systems.
| rstupek wrote:
| Looks like someone found the light switch and turned everything
| back on!
| kossTKR wrote:
| Reddit r/Sysadmin user that claims to be on the "Recovery Team"
| for this ongoing issue:
|
| > _As many of you know, DNS for FB services has been affected and
| this is likely a symptom of the actual issue, and that's that
| BGP peering with Facebook peering routers has gone down, very
| likely due to a configuration change that went into effect
| shortly before the outages happened (started roughly 1540 UTC).
| There are people now trying to gain access to the peering routers
| to implement fixes, but the people with physical access is
| separate from the people with knowledge of how to actually
| authenticate to the systems and people who know what to actually
| do, so there is now a logistical challenge with getting all that
| knowledge unified. Part of this is also due to lower staffing in
| data centers due to pandemic measures._
|
| User is providing live updates of the incident here:
|
| https://www.reddit.com/r/sysadmin/comments/q181fv/looks_like...
| guidopallemans wrote:
| He just deleted all his updates.
|
| user:
|
| https://old.reddit.com/user/ramenporn
|
| some messages:
|
| * This is a global outage for all FB-related services/infra
| (source: I'm currently on the recovery/investigation team).
|
| * Will try to provide any important/interesting bits as I see
| them. There is a ton of stuff flying around right now and like
| 7 separate discussion channels and video calls.
|
| * Update 1440 UTC: As many of you know, DNS for FB services
| has been affected and this is likely a symptom of the actual
| issue, and that's that BGP peering with Facebook peering
| routers has gone down, very likely due to a configuration
| change that went into effect shortly before the outages
| happened (started roughly 1540 UTC). There are people now
| trying to gain access to the peering routers to implement
| fixes, but the people with physical access is separate from
| the people with knowledge of how to actually authenticate to
| the systems and people who know what to actually do, so there
| is now a logistical challenge with getting all that knowledge
| unified. Part of this is also due to lower staffing in data
| centers due to pandemic measures.
| Ueland wrote:
| And there his account went poof, thanks for archiving.
| rodgerd wrote:
| If it was actually someone in Facebook, their job is gone
| by now, too.
| treesknees wrote:
| They were quoted on multiple news sites including Ars
| Technica. I would imagine they were not authorized to post
| that information. I hope they don't lose their job.
|
| Shareholders and other business leaders I'm sure are much
| happier reporting this as a series of unfortunate technical
| failures (which I'm sure is part of it) rather than a
| company-wide organizational failure. The fact they can't
| physically badge in the people who know the router
| configuration speaks to an organization that hasn't
| actually thought through all its failure modes. People
| aren't going to like that. It's not uncommon to have the
| datacenter techs with access and the actual software folks
| restricted, but that being the reason one of the most
| popular services in the world has been down for nearly 3
| hours now will raise a lot of questions.
|
| Edit: I also hope this doesn't damage prospects for more
| Work From Home. If they couldn't get anyone who knew the
| configuration in because they all live a plane ride away
| from the datacenters, I could see managers being reluctant
| to have a completely remote team for situations where
| clearly physical access was needed.
| fidesomnes wrote:
| > I hope they don't lose their job.
|
| I do! Oh well!
| jart wrote:
| Facebook should have had a panic room.
|
| Operations teams normally have a special room with a
| secure connection for situations like this, so that
| production can be controlled in the event of bgp failure,
| nuclear war, etc. I could see physical presence being an
| issue if their bgp router depends on something like a
| crypto module in a locked cage, in which case there's
| always helicopters.
|
| So if anything, Facebook's labor policies are about to
| become cooler.
| sulam wrote:
| That model works great, until you need to ask for
| permission to go into the office, and the way to get
| permission is to use internal email and ticketing
| systems, which are also down.
| samhw wrote:
| Yup, it's terrifying how much is ultimately, _ultimately_
| dependent on dongles and trust. I used to work at a
| company with a billion or so in a bank account (obviously
| a rather special type of account), which was ultimately
| authorised by three very trusted people who were given
| dongles.
| cyberpunk wrote:
| What did the dongles do?
| SahAssar wrote:
| Usually they contain a file called password.txt.
| Sometimes the file is even called something else.
| Sebb767 wrote:
| > nuclear war
|
| I think you need some convincing to keep your SREs on-
| site in case of a nuclear war ;)
| cyberpunk wrote:
| Hey, if I can take the kids and there's food for a decade
| and a bunker I'm probably in ;)
| mike_d wrote:
| I would be absolutely shocked if they didn't.
|
| The problem is when your networking core goes down, even
| if you get in via a backup DSL connection or something to
| the datacenter, you can't get from your jump host to
| anything else.
| jart wrote:
| It helps if your dsl line is bridging at layer 2 in
| the osi model using rotated psks, so it won't be impacted
| by dns/bgp/auth/routing failures. That's why you need to
| put it in a panic room.
| rStar wrote:
| shoestring budget on a billion dollar product. you get
| what you deserve.
| projectazorian wrote:
| I doubt WFH will be impacted by this - not an insider but
| seems unlikely that the relevant people were on-site at
| data centers before COVID
| vineyardmike wrote:
| > I doubt WFH will be impacted by this - not an insider
| but seems unlikely that the relevant people were on-site
| at data centers before COVID
|
| I think the issue is less "were the right people in the
| data center" and more "we have no way to contact our co-
| workers once the internal infrastructure goes down". In
| non-wfh you physically walk to your co-workers desk and
| say "hey, fb messenger is down and we should chat, what's
| your number?". This proves that self-hosting your infra
| (1) is dangerous and (2) makes you susceptible to super-
| failures if comms goes down during WFH.
|
| Major tech companies (GAFAM+) all self-host and use
| internal tools so they're all at risk of this sort of
| comms breakdown. I know I don't have any co-workers
| number (except one from WhatsApp which if I worked at FB
| wouldn't be useful now).
| practice9 wrote:
| Most of the stuff was probably implemented before COVID
| anyways.
|
| They will fix the issue and add more redundant
| communication channels, which is either an improvement or
| a non-event for WFH.
|
| And Zuck is slowly moving (dogfooding) company culture to
| remote too with their Quest work app experiments
| legitster wrote:
| I'm not sure why shareholders are lumped in here. A lot
| of reasons companies do the secret squirrel routine is to
| hide their incompetence _from_ the shareholders.
| treesknees wrote:
| That is what I meant, although you have lots of
| executives and chiefs who are also shareholders.
| rusk wrote:
| > an organization that hasn't actually thought through
| all its failure modes
|
| Move Fast and Break Things!
| keithnoizu wrote:
| I came here to move fast and break things, and i'm all
| out of move fast.
| fanbelt wrote:
| They must have been moving very fast!
| swayson wrote:
| > I hope they don't lose their job.
|
| FB has such poor integrity, I'd not be surprised if they
| take such extreme measures.
| deanCommie wrote:
| > I hope they don't lose their job.
|
| I hope they do.
|
| #1 it's a clear breach of corporate confidentiality
| policies. I can say that without knowing anything about
| Facebook's employment contracts. Posting insider
| information about internal company technical difficulties
| is going to be against employment guidelines at any Big
| Co.
|
| In a situation like this that might seem petty and cagey.
| But zooming out and looking at the bigger picture, it's
| first and foremost a SECURITY issue. Revealing internal
| technical and status updates needs to go through high-
| level management, security, and LEGAL approvals, lest you
| expose the company to increased security risk by
| revealing gaps that do not need to be publicized.
|
| (Aside: This is where someone clever might say "Security
| by obscurity is not a strategy". It's not the ONLY
| strategy, but it absolutely is PART of an overall
| security strategy.)
|
| #2 just purely from a prioritization/management
| perspective, if this was my employee, I would want them
| spending their time helping resolve the problem not post
| about it on reddit. This one is petty, but if you're
| close enough to the issue to help, then help. And if
| you're not, don't spread gossip - see #1.
| unethical_ban wrote:
| I feel you're thinking through this with a "purely
| logical" standpoint and not a "reality" standpoint.
| You're thinking worst case scenario for the CYA
| management, having more sympathy for the executive
| managers than for the engineer providing insight to the
| tech public.
|
| It seems like a fundamental difference of "who gives a
| shit about corporate" from my side. The level of detail
| provided isn't going to get nationstates anything they
| didn't already know.
| samhw wrote:
| You're very, very right - and insightful - about the
| consequences of sharing this information. I agree with
| you on that. I _don't_ think you're right that firing
| people is the best approach.
|
| Irrespective of the question of how bad this was, you
| don't fix things by firing Guy A and hoping that the new
| hire Guy B will do it better. You fix it by training
| people. This employee has just undergone some very
| expensive training, as the old meme goes.
| TheGigaChad wrote:
| Clown.
| polote wrote:
| > an organization that hasn't actually thought through
| all its failure modes
|
| Thinking about any potential things that can happen is
| impossible
| mynameisvlad wrote:
| Of course you can't think of every potential scenario
| possible, but an incorrect configuration and rollback
| should be pretty high in any team's risk/disaster
| recovery/failure scenario documentation.
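| (A minimal sketch of the "bad config, automatic rollback" case,
| similar in spirit to the confirm-or-revert commit features some
| router operating systems offer. Everything here is hypothetical:
| apply_with_rollback and its callbacks are illustrative names, and
| nothing in the thread says this is how Facebook deploys changes.)

```python
def apply_with_rollback(apply, rollback, is_healthy, checks=3):
    """Apply a change, then undo it unless every health check passes.

    Returns True if the change was kept, False if it was rolled back.
    """
    apply()
    for _ in range(checks):
        if not is_healthy():
            # Any failed check reverts the change without waiting for a human.
            rollback()
            return False
    return True
```

| The catch, presumably, is that the health check and the rollback
| have to run somewhere that still works after the bad change
| lands; a rollback that needs the network it just broke does not
| help.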
| philwelch wrote:
| This is true, but it's not an excuse for not preparing
| for the contingencies you _can_ anticipate. You're still
| going to be clobbered by an unanticipated contingency
| sooner or later, but when that happens, you don't want to
| feel like a complete idiot for failing to anticipate a
| contingency that was obvious even without the benefit of
| hindsight.
| radicalbyte wrote:
| Luckily you don't need to do that exhaustively: all you
| have to do is cover the general failure case. What
| happens when communications fail?
|
| This is something that most people aren't good at
| naturally, it tends to come from experience.
| depereo wrote:
| You don't need to consider 'what if a meteor hit the data
| centre and also it was made of cocaine'. You do need to
| think through "how do I get this back online in a
| reasonable timeframe from a starting point of 'everything
| is turned off and has the wrong configuration'."
| JabavuAdams wrote:
| I love that when you had to think of a random improbable
| event, you thought of a cocaine meteor. But ... hell YES!
| fragmede wrote:
| In a company the size of Facebook, "everything is turned
| off" has never happened since before the company was
| founded 17 years ago. This makes it very hard to be
| _sure_ you can bring it all back online! Every time you
| try it, there are going to be additional issues that crop
| up, and even when you think you've found them all, a new
| team that you've never heard of before has wedged
| themselves into the data-center boot-up flow.
|
| The meteor isn't made of cocaine, but four of them
| hitting at exactly the same time is freakishly
| improbable. There are other, bigger fish to fry, so
| we're going to treat four simultaneous meteors as
| impossible. Which is great, but then one day, five of
| them hit at the same time.
| cesarb wrote:
| > "how do I get this back online in a reasonable
| timeframe from a starting point of 'everything is turned
| off and has the wrong configuration'."
|
| The electricity people have a name for that: black start
| (https://en.wikipedia.org/wiki/Black_start). It's
| something they actively plan for, regularly test, and
| once in a while, have to use in anger.
| jnwatson wrote:
| Right, but imagining that DNS goes down doesn't take a
| science fiction author.
| kukx wrote:
| It is a matter of preparation. You can make sure there
| are KVMoIPs or other OOB technologies available on site
| to allow direct access from a remote location. In the
| worst case technician has to know how to connect the OOB
| device or press a power button ;)
| treesknees wrote:
| I'm not disagreeing with you, however clearly (if the
| reddit posts were legitimate) some portion of their
| OOB/DR procedure depended on a system that's down. From
| old coworkers who are at FB, their internal DNS and
| logins are down. It's possible that the
| username/password/IP of an OOB KVM device is stored in
| some database that they can't login to. And the fact FB
| has been down for nearly 4 hours now suggests it's not as
| simple as plugging in a KVM.
| kukx wrote:
| I was referring to the WFH aspect the parent post
| mentioned. My point was that the admins could get the
| same level of access as if they were physically on site,
| assuming the correct setup.
| _joel wrote:
| There could be something in the contract that requires
| all community interaction to go via PR official channels.
|
| It's innocuous enough, but leaking info, no matter what,
| will be a problem if it's stated in their contract.
| htrp wrote:
| 100%! comms will want to proof any statement made by
| anybody along with legal to ensure that there is no D&O
| liability for sec fraud.
| jfrunyon wrote:
| > Edit: I also hope this doesn't damage prospects for
| more Work From Home. If they couldn't get anyone who knew
| the configuration in because they all live a plane ride
| away from the datacenters, I could see managers being
| reluctant to have a completely remote team for situations
| where clearly physical access was needed.
|
| You're conflating working remotely ("a plane ride away")
| and working from home.
|
| You're also conflating the people who are responsible for
| network configuration, and for coming up with a plan to
| fix this; and the people who are responsible for
| physically interacting with systems. Regardless of WFH
| those two sets likely have no overlap at a company the
| size of Facebook.
| harias wrote:
| Pushshift maintains archives of Reddit. You can use camas
| reddit search to view them.
|
| Comments by u/ramenporn: https://camas.github.io/reddit-
| search/#{%22author%22:%22rame...
| tornato7 wrote:
| PushShift is one of the most amazing resources out there
| for social media data and more people should know about
| it
| madars wrote:
| Can you recommend similar others (or maybe how to find
| them)? I learned of PushShift because snew, an
| alternative reddit frontend showing deleted comments, was
| making fetch requests and I had to whitelist it in
| uMatrix. Did not know about Camas until today.
| [deleted]
| [deleted]
| meragrin_ wrote:
| The account has been deleted as well.
| DaiPlusPlus wrote:
| What are they afraid of? While they are sharing information
| that's internal/proprietary to the company, it isn't
| anything particularly sensitive and having some
| transparency into the problem is good for everyone.
|
| Who'd want to work for a company that might take
| disciplinary action because an SRE posted a reddit comment
| to basically say "BGP's down lol" - If I was in charge I'd
| give them a modest EOY bonus for being helpful in their
| outreach to my users in the wider community.
| handmodel wrote:
| Seems reasonable that at a company of 60k, with hundreds
| who specialize in PR, you do not want a random engineer
| making the choice himself to be the first to talk to the
| press by giving a PR conference on a random forum.
| ric2b wrote:
| Facebook is well known for having really good PR, if they
| go after this guy for sharing such basic info that's yet
| another example of their great PR teams.
| OskarS wrote:
| Honestly, from a PR perspective, I'm not sure it's so
| bad. Giving honest updates showing Facebook hard at work
| is certainly better PR for our kind of crowd than
| whatever actual Facebook PR is doing.
| ALittleLight wrote:
| That one guy's comments seem fine from a PR perspective
| apart from it not being his role to communicate for the
| company.
|
| I still think he should be fired for this kind of
| communication though. One reason is, imagine Facebook
| didn't punish breaches of this type. Every other employee
| is going to be thinking "Cool, I could be in a Wired
| article" or whatever. All they have to do is give
| sensitive company information to reporters.
|
| Either you take corporate confidentiality seriously or
| you don't. Posting details of a crisis in progress on
| your Reddit account is not taking corporate
| confidentiality seriously. If the Facebook corporation
| lightly punishes, scolds, or ignores this person then the
| corporation isn't taking confidentiality seriously
| either.
| mike_d wrote:
| You falsely assume Hacker News is even remotely what
| Facebook PR gives a shit about.
| ballenf wrote:
| It's terrible PR for the FB PR team's performance.
| confiq wrote:
| I agree, but try to explain that to PR people...
| staticassertion wrote:
| Reporters are going to opportunistically start writing
| about those comments vs having to wait for a controlled
| message from a communications team. So the reddit posts
| might not be "so bad", but they're also early and
| preempting any narrative they may want to control.
| orangepanda wrote:
| That was their best PR in years
| Animats wrote:
| Compare Facebook's official tweet: _"We're aware that
| some people are having trouble accessing our apps and
| products. We're working to get things back to normal as
| quickly as possible, and we apologize for any
| inconvenience."_
|
| That's the PR team, clueless.
| tornato7 wrote:
| Facebook has never been open and honest about anything,
| no reason to think they would start now.
| tornato7 wrote:
| To be fair, Facebook has never been open and honest about
| anything.
| HelloNurse wrote:
| I don't think Facebook could actually say anything more
| accurate or more honest. "Everything is dead, we are
| unable to recover, and we are violently ashamed" would be
| a more fun statement, but not a more useful one.
|
| There will be plenty of time to blame someone, share
| technical lessons, fire a few departments, attempt to
| convince the public it won't happen again, and so on.
| handmodel wrote:
| I agree completely. The target audience Facebook is
| concerned about is not techies wanting to know the
| technical issues. It's the huge advertising firms,
| governments, power users, etc. who have concerns about
| the platform or have millions of dollars tied up in it. A
| bland statement is probably the best here - and even if
| the one engineer gave accurate useful info I don't see
| how you'd want to encourage an org in which thousands of
| people feel the need to post about what's going on
| internally during every crisis.
| Sebb767 wrote:
| Well, they could at least be specific about how large the
| outage is. "Some people" is quite different to absolutely
| everyone. At least they did not add a "might" in there.
| no_time wrote:
| These few sentences were a better and more meaningful
| read than what hundreds of PR people could ever come up
| with
| ptero wrote:
| A few random guesses (I am not in any way affiliated with
| FB); just my 2c:
|
| Sharing status of an active event may complicate
| recovery, especially if they suspect adversarial actions:
| such public real-time reports can explain to the red team
| what the blue team is doing and, especially important,
| what the blue team is unable to do at the moment.
|
| Potentially exposing the dirty laundry. While a
| postmortem should be done within the company (and as much
| as possible is published publicly) after the event, such
| early blurbs may expose many non-public things, usually
| unrelated to the issue.
| projectazorian wrote:
| FB takes confidentiality very seriously. He crossed a
| major red line.
| kelnos wrote:
| > _If I was in charge I'd give them a modest EOY bonus
| for being helpful in their outreach to my users in the
| wider community._
|
| That seems pretty unlikely at any but the smallest of
| companies. Most companies unify _all_ external
| communications through some kind of PR department. In
| those cases usually employees are expressly prohibited
| from making any public comments about the company without
| approval.
| cronix wrote:
| > What are they afraid of?
|
| Zuckerberg Loses $7 Billion in Hours as Facebook Plunges
|
| https://finance.yahoo.com/news/zuckerberg-
| loses-7-billion-ho...
|
| Stop the hemorrhaging. Too much bad press for FB lately
| and it all adds up.
| Denvercoder9 wrote:
| Unlikely to be related. FB's losses today already
| happened before FB went down, and are most likely related
| to the general negative sentiment in the market today,
| and the whistleblower documents. It's actually kind of
| remarkable how little impact the outage had on the stock.
| motoxpro wrote:
| I was thinking the same...
| pythonaut_16 wrote:
| Unrelated to the outage, but I hate headlines like this.
|
| Facebook is down ~5% today. That's a huge plunge to be
| sure, but Zuckerberg hasn't "lost" anything. He owns the
| same number of shares today as he did yesterday. And in
| all likelihood, unless something truly catastrophic
| happens the share price will bounce back fairly quickly.
| The only reason he even appears to have lost $7 billion
| is because he owns so much Facebook stock.
|
| These types of alarmist headlines are inane.
| minusSeven wrote:
| Do we even know if someone had the account deleted? I
| think facebook might have their hands full right now
| solving the issue rather than looking at social media
| posts that discuss the issue.
| kelnos wrote:
| There are a lot of people who work at Facebook, and I'm
| sure the people responsible for policing external comms
| do not have the skills or access to fix what's wrong
| right now.
| _kst_ wrote:
| Assuming that Facebook forced the account to be deleted,
| it wouldn't have been done by anyone who's working on
| fixing the problem.
| jaywalk wrote:
| As much as all of the curious techies here would love
| transparency into the problem, that doesn't actually do
| any good for Facebook (or anyone else) at the moment.
| Once everything is back online, making a full RCA
| available would do actual good for everyone. But I
| wouldn't hold my breath for that.
| treesknees wrote:
| Mentioned in another reply
|
| Shareholders and other business leaders I'm sure are much
| happier reporting this as a series of unfortunate
| technical failures (which I'm sure is part of it) rather
| than a company-wide organizational failure. The fact they
| can't physically badge in the people who know the router
| configuration speaks to an organization that hasn't
| actually thought through all its failure modes. People
| aren't going to like that. It's not uncommon to have the
| datacenter techs with access and the actual software
| folks restricted, but that being the reason one of the
| most popular services in the world has been down for
| nearly 3 hours now will raise a lot of questions.
| birdman3131 wrote:
| I did not read it as they can't get them on site but
| rather that it takes travel to get them on site. Travel
| takes time, which they desperately do not want to spend.
| [deleted]
| Narushia wrote:
| The 1440 UTC update is also archived on the Wayback Machine:
| https://web.archive.org/web/20211004171424/https://old.reddi.
| ..
|
| And archive.today: https://archive.ph/sMgCi
| yholio wrote:
| Essentially, they locked themselves out with an uninspired
| command line at the exact moment the datacenter was being
| hijacked by ape-people.
|
| Yup, corporate comms won't love these status updates.
| wtf-is-ur-prblm wrote:
| Sorry, are you referring to data center technicians as "ape
| people"?
| z-nexx wrote:
| As a former data center technician, I wouldn't say it's
| too far off
| ticklemyelmo wrote:
| But we're all ape people.
| samstave wrote:
| https://i.imgur.com/O4yEget.png
| samstave wrote:
| Are you fucking kidding me?
|
| We even had a site and operation for a long while called:
|
| "NOC MONKEY .DOT ORG"
|
| We called all of ourselves NOC MONKEYS. [[Remote Hands]]
|
| Yeah, that was a term used widely.
|
| I'm 46. I assume you are < #
|
| ---
|
| Where were you in 1997 building out the very first XML
| implementations to replace EDI from AS400s to FTP EDI
| file retrievals via some of the first Linux FTP servers
| based in SV?
|
| I was there? Remember LinuxCare?
| eska wrote:
| Are you ok, Sir?
| korethr wrote:
| I mean, when I last worked in a NOC, we used to call
| ourselves "NOC monkeys", so yeah. If you're in the NOC,
| you're a NOC monkey, if you're on the floor, you're a
| floor monkey. And so on.
| r721 wrote:
| Archived version: https://archive.is/QvdmH
| secondcoming wrote:
| Who is https://www.reddit.com/user/nathan131412/
| larntz wrote:
| This tweet seems to confirm it is a bgp issue...
|
| https://twitter.com/GossiTheDog/status/1445063880963674121?s...
| adamredwoods wrote:
| Cloudflare also confirmed it:
|
| https://twitter.com/jgrahamc/status/1445068309288951820
|
| Also, the Domain name is for sale???
|
| https://whois.domaintools.com/facebook.com
| glenneroo wrote:
| Weird banner at the top, seems like false advertising as it
| says a couple lines down: Expires on 2030-03-29
| bombcar wrote:
| I suspect it's an automated system triggered by DNS not
| resolving, and they try to "make an offer" if you follow
| through.
| adamredwoods wrote:
| You're right, it's misleading, thanks. Other sites
| (dreamhost, godaddy) don't list it as for sale.
| pmlnr wrote:
| > the people with physical access is separate from the people
| with knowledge of [...]
|
| Welcome to the brave new world of troubleshooting. This will
| seriously bite us one day.
| formerly_proven wrote:
| This sounds like something that might have been done with
| security in mind. Although generally speaking, remote hands
| don't have to be elite hackors.
| pmlnr wrote:
| Have you ever tried to remotely troubleshoot THROUGH
| another person?!
| jl6 wrote:
| Yes, and it works if both parties are able to communicate
| using precise language. The onus is on the remote SME to
| exactly articulate steps, and on the local hands to
| exactly follow instructions and pause for clarifications
| when necessary.
| rdtsc wrote:
| Not OP, but many times. Really makes you think hard about
| log messages after an upset customer has to read them
| line by line over the phone.
|
| One was particularly painful, as it was a "funny" log
| message I had added to the code for when something went wrong.
| Lesson learned was to never add funny / stupid / goofy
| fail messages in the logs. You will regret it sooner or
| later.
| jfrunyon wrote:
| Yes. Depending on the person, it can either go extremely
| well or extremely poorly. Getting someone else to point a
| camera at the screen helps.
| hamburglar wrote:
| My company runs copies of all our internal services in
| air-gapped data centers for special customers. The
| operators are just people with security clearance who
| have some technical skills. They have no special
| knowledge of our service inner workings. We (the dev
| team) aren't allowed to see screenshots or get any data
| back. So yeah, I have done that sort of troubleshooting
| many times. It's very reminiscent of helping your grandma
| set up her printer over the phone.
| touisteur wrote:
| And this is why we should build our critical systems in a
| way that can be debugged on the phone... With your
| grandma.
| ikiris wrote:
| Yeah. Do what you have to.
|
| Sometimes the DR plan isn't so much I have to have a
| working key, I just have to know who gets there first
| with a working key, and break glass might be literal.
| lmilcin wrote:
| I don't think so. I bet nobody is ever going to make that
| mistake at FB again after today.
| gbil wrote:
| this is not new, this is everyday life with helping hands, on
| duty engineers, l2-l3 levels telling people with physical
| access which commands to run etc. etc. etc.
| Scoundreller wrote:
| Then you have security issues like this where someone
| impersonates a client with helping hands and drains your
| exchange's hot wallet:
|
| https://www.huffpost.com/archive/ca/entry/canadian-
| bitcoins-...
| t0mas88 wrote:
| The places I've seen this at had specific verification
| codes for this. One had a simple static code per person
| that the hands-on guys looked up in a physical binder on
| their desk. Very disaster proof.
|
| The other ones had a system on the internal network in
| which they looked you up, called back on your company
| phone and asked for a passphrase the system showed them.
| Probably more secure but requires those systems to be
| working.
| suyash wrote:
| folks with physical access are also denied. source -
| https://twitter.com/YourAnonOne/status/1445100431181598723
| drdeadringer wrote:
| IT: "Please do this fix."
|
| Person 1: "I can't, I don't have physical access."
|
| IT: "Please do this fix."
|
| Person 2: "I can't, I don't have digital access."
|
| Why? It's [IT's?] policy.
| MauranKilom wrote:
| FWIW that's not the original source, just some twitter
| account reposting info shared by someone else. See this
| sub-thread: https://news.ycombinator.com/item?id=28750888
| prox wrote:
| Let me guess, it is tied to FB systems which are down. That
| would be hilarious.
| RobRivera wrote:
| like today! xD
| dsr_ wrote:
| It just bit FB.
| [deleted]
| munk-a wrote:
| Telecommunication satellite communication issues might
| seriously shut down whole regions if they occur.
| rvnx wrote:
| I like how FB decided to send "ramenporn" as their
| spokesperson.
| huevosabio wrote:
| A particular facet I love of the internet era is
| journalists reporting serious events while having to use
| the completely absurd usernames...
|
| "A Facebook engineer in the response team, ramenporn..."
| [deleted]
| WillPostForFood wrote:
| >journalists reporting serious events
|
| A facet I don't love is journalism devolving to reposting
| unverified, anonymous reddit posts.
| sharkweek wrote:
| This felt like something straight out of a post modern
| novel during the whole WSB press rodeo, where some user
| names being used on TV were somewhere between absurd to
| repulsive.
|
| Loved it.
| myself248 wrote:
| I believe that's the exact reason behind the pattern of
| horrifying usernames on reddit and imgur. It's
| magnificent in its surrealness.
| jupp0r wrote:
| Exactly, I'm having deja vues from Vernor Vinge's
| Rainbows End constantly lately.
| ivanmontillam wrote:
| "Discussed in Hacker News, the user that goes by the
| 'huevosabio' handle, stated as a fact that..."
| runawaybottle wrote:
| 'He was then subsequently attacked by
| "OverTheCounterIvermectin" for his tweets on transgender
| bathrooms from several months ago'.
| hdjjhhvvhga wrote:
| The problem with tweets on transgender bathrooms is that
| you can be attacked for them by either side at any point
| in the future, so the user OverTheCounterIvermectin
| should have known better.
| noir_lord wrote:
| I got quoted as noir_lord in the press.
|
| My bbs handle from 30 years ago.
| Apocryphon wrote:
| _Immortality._
| Scarblac wrote:
| I remember some huge DDOS attacks like a decade ago, and
| people were speculating who could be behind it. The three
| top theories were Russian intelligence, the Mossad, and
| this guy on 4chan who claimed to have a Botnet doing it.
|
| That was the start of living in the future for me.
| [deleted]
| chasd00 wrote:
| 4chan is disturbingly resourceful at times. I have heard
| them described as weaponized autism.
| Y_Y wrote:
| Ya, on hn it's merely productized.
| abhiminator wrote:
| That's a pretty accurate description of the site, lol.
|
| On a side-note, I think you'll enjoy some of the videos
| by the YouTube 'Internet Historian' on 4chan:
|
| * https://www.youtube.com/watch?v=SvjwXhCNZcU
|
| * https://www.youtube.com/watch?v=HiTqIyx6tBU
| RankingMember wrote:
| My favorite example of this is when I saw references to
| "Goatse Security" on the front page of the Wall Street
| Journal
| [deleted]
| mopierotti wrote:
| I'm worried about that person. I doubt Facebook will look
| kindly on breaking incident news being shared on reddit.
| greendave wrote:
| They work at facebook. Can't imagine they have any
| illusions regarding their privacy/anonymity.
| blobbers wrote:
| Curious what the internal "privacy" limitations are.
| Certainly FB must track reddit users : fb account even if
| they don't actually display it. It just makes sense.
| tonfa wrote:
| Thanks to the GDPR at least that's easy to verify for
| European users.
| hdjjhhvvhga wrote:
| That said, it will be interesting to read their post-
| mortem next year and compare it with what ramenporn
| wrote.
| tubby12345 wrote:
| lol no one cares. we're all laughing about this too (all
| of us except the networks people at least...)
| r721 wrote:
| I hope you won't have to delete your account too :)
| rvnx wrote:
| Well, seems like FB shut down his post...
| jspdown wrote:
| Apparently Facebook HQ didn't like how ramenporn handled
| the situation. His account has been deleted, as well as
| all his messages about the incident.
| platz wrote:
| his account is active, only the incident comments were
| deleted
| jfrunyon wrote:
| > [Reddit logo] u/ramenporn: deleted
|
| > This user has deleted their account.
| rvnx wrote:
| At least that department at Facebook is still working!
| superflit2 wrote:
| That Ramenporn got engagement by Hate Speech
| teekert wrote:
| There never was a ramenporn.
| [deleted]
| [deleted]
| cheese_van wrote:
| This is why so many teams fight back against the audit
| findings:
|
| "The information systems office did not enforce logical
| access to the system in accordance with role-based access
| policies."
|
| Invariably, you want your best people to have full access to
| all systems.
| Accujack wrote:
| Well, you want the _right_ people to have access. If
| you're a small shop or act like one, that's your "top" techs.
|
| If you're a mature larger company, that's the team leads in
| your networking area on the team that deals with that
| service area (BGP routing, or routers in general).
|
| Most likely Facebook et al. management never understood
| this could happen because it's "never been a problem
| before".
| jfrunyon wrote:
| I can't fathom how they didn't plan for this. In any business
| of size, you have to change configuration remotely on a
| regular basis, and can easily lock yourself out on a regular
| basis. Every single system has a local user with a random
| password that we can hand out for just this kind of
| circumstance...
| shadowgovt wrote:
| Organizational complexity grows super-linearly; in general,
| the number of people a company can hire per unit time is
| either constant or grows linearly.
|
| Google once had a very quiet big emergency that was,
| ironically(1), initiated by one of their internal disaster-
| recovery tests. There's a giant high-security database
| containing the 'keys to the kingdom', as it were...
| Passwords, salts, etc. that cannot be represented as one-
| time pads and therefore are potentially dangerous magic
| numbers for folks to know. During disaster recovery once,
| they attempted to confirm that if the system had an outage,
| it would self-recover.
|
| It did not.
|
| This tripped a very quiet panic at Google because while the
| company would tick along fine for awhile without access to
| the master password database, systems would, one by one,
| fail out if people couldn't get to the passwords that had
| to be occasionally hand-entered to keep them running. So a
| cross-continent panic ensued because restarting the
| database required access to two keycards for NORAD-style
| simultaneous activation. One was in an executive's wallet
| who was on vacation, and they had to be flown back to the
| datacenter to plug it in. The other one was stored in a
| safe built into the floor of a datacenter, and the
| combination to that safe was... In the password database.
| They hired a local safecracker to drill it open, fetched
| the keycard, double-keyed the initialization machines to
| reboot the database, and the outside world was none the
| wiser.
|
| (1) I say "ironically," but the actual point of their self-
| testing is to cause these kinds of disruptions before
| chance does. They aren't generally supposed to cause user-
| facing disruption; sometimes they do. Management frowns on
| disruption in general, but when it's due to disaster
| recovery testing, they attach to that frown the grain of
| salt that "Because this failure-mode existed, it would have
| occurred eventually if it didn't occur today."
| iszomer wrote:
| Thanks for telling this story as it was more amusing than
| my experiences of being locked in a security corridor
| with a demagnetised access card, looooong ago.
| l9i wrote:
| That's not quite how it happened. ;)
|
| <shameless plug> We used this story as the opening of
| "Building Secure and Reliable Systems" (chapter 1). You
| can check it out for free at https://sre.google/static/pd
| f/building_secure_and_reliable_s... (size warning: 9 MB).
| </shameless plug>
| hnaccy wrote:
| what if the executive had been pick-pocketed
| shadowgovt wrote:
| EDIT: I had mis-remembered this part of the story. ;)
| What was stored in the executive's brain was the
| _combination_ to a second floor safe in another
| datacenter that held one of the two necessary activation
| cards. Whether they were able to pass it to the
| datacenter over a secure / semi-secure line or flew back
| to hand-deliver the combination I do not remember.
|
| If you mean "Would the pick-pocket have access to
| valuable Google data," I think the answer is "No, they
| still don't have the key in the safe on the other
| continent."
|
| If you mean "Would the pick-pocket have created a
| critical outage at Google that would have required
| intense amounts of labor to recover from," I don't know
| because I don't know how many layers of redundancy their
| recovery protocols had for that outage. It's possible
| Google came within a hair's breadth of "Thaw out the
| password database from offline storage, rebuild what can
| be rebuilt by hand, and inform a smaller subset of the
| company that some passwords are now just gone and they'll
| have to recover on their own" territory.
| chasd00 wrote:
| Another Monday morning at a boring datacenter job, i bet
| they weren't even there yet at 830 when the phones started
| ringing.
| steelframe wrote:
| Assuming anyone can actually look up the phone numbers to
| call.
| CydeWeys wrote:
| There should be 24/7 on-site rotations. I wonder if
| physical presence was cut on account of COVID?
| bink wrote:
| You mean the VOIP phones that could no longer receive
| incoming calls?
| mro_name wrote:
| phones? how lame.
| radomir_cernoch wrote:
| It certainly wasn't the Messenger.
| outworlder wrote:
| > I can't fathom how they didn't plan for this
|
| Maybe because they were planning for a million other
| possible things to go wrong, likely with higher probability
| than this. And busy with each day's pressing matters.
| jfrunyon wrote:
| Anyone who has actually worked in the field can tell you
| that a deploy or config change going wrong, at some
| point, and wiping out your remote access / ability to
| deploy over it is _incredibly, crazy likely_.
| weeeeelp wrote:
| Absolutely, and I'd even call it a rite of passage to
| lock yourself out in some way, having worked in a couple
| of DCs for three years. Low-level tooling like iLO/iDRAC
| can sure help out with those, but is often ignored or too
| heavily abstracted away.
| samhw wrote:
| That someone will win the lottery is also incredibly
| likely. That _a given person_ will win the lottery is, on
| the other hand, vanishingly unlikely. That a given config
| change will go wrong in a given way is ... eh, you see
| where I'm going with this
| jfrunyon wrote:
| Right, which is why you just roll in protection for all
| manner of config changes by taking pains to ensure there
| are always whitelists, local users, etc. with secure(ly
| stored) credentials available for use if something goes
| wrong; rather than assuming your config changes will be
| perfect.
| samhw wrote:
| I'm not sure it's possible to speculate in a way which is
| generic over all possible infrastructures. You'll also
| hit the inevitable tradeoff of security (which tends
| towards minimal privilege, aka single points of failure)
| vs reliability (which favours 'escape hatches' such as
| you mentioned, which tend to be very dangerous from a
| security standpoint).
| smrtinsert wrote:
| Haha sure. They were too busy implementing php compilers
| to figure out that "whole DR DNS thing"
|
| rotflmao. I'd remove Facebook from my resume.
| mynameisvlad wrote:
| A config change gone bad?
|
| That's like failure scenarios 101. That should be the
| second on the list, after "code change gone bad".
| jfrunyon wrote:
| Exactly! Obviously they have extremely robust testing and
| error catching on things like code deploys: how many
| times do you think they deploy new code a day? And at
| least personally, their error rate is somewhere below 1%.
|
| Clearly something about their networking infrastructure
| is not as robust.
| pupdogg wrote:
| Right? Especially on global scale. Something doesn't add
| up!
| TheOtherHobbes wrote:
| Curious/unfortunate timing. The day after a whistleblower
| docu and with a long list of other legal challenges and
| issues incoming.
| [deleted]
| amalcon wrote:
| Most likely they _did_ plan for this. Then, something
| happened that the failsafe couldn't handle. E.g. if
| something overwrites /etc/passwd, having a local user won't
| help. I'm not saying that specific thing happened here --
| it's actually vanishingly unlikely -- but your plan can't
| cover every contingency.
| robalfonso wrote:
| Agreed, it's also worth mentioning that at the end of
| every cloud is real physical hardware, and that is
| decidedly less flexible than cloud: if you locked
| yourself out of a physical switch or router you have many
| fewer options.
| [deleted]
| strenholme wrote:
| The Reddit post is down but not before it was archived:
| https://archive.is/QvdmH and https://archive.is/TNrFv
| [deleted]
| cotillion wrote:
| So, does anyone know where one can buy an LTE gateway with a
| serial port interface? Asking for a friend.
| mekatter wrote:
| These are readily available, OpenGear and others have offered
| them forever. I can't believe fb doesn't have out of band
| access to their core networking in some fashion. OOB access
| to core networking is like insurance, rarely appreciated
| until the house is on fire.
| Sebb767 wrote:
| It's quite possible that they have those, but that the
| credentials are stored in a tool hosted in that datacenter
| or that the DNS entries are managed by the DNS servers that
| are down right now.
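|
| A minimal sketch of how that kind of circular dependency could
| be checked for, assuming a hand-maintained dependency map. All
| names below (oob_console, credential_vault, and so on) are
| hypothetical and do not describe Facebook's actual systems.

```python
def transitive_deps(graph, start):
    """Return every node reachable from `start` by following dependency edges."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def recovery_is_independent(graph, recovery_tool, protected):
    """True if the recovery tool does not (even indirectly) need the thing it recovers."""
    return protected not in transitive_deps(graph, recovery_tool)


# Hypothetical dependency map: each key lists what it needs in order to work.
deps = {
    "oob_console": ["credential_vault", "internal_dns"],
    "credential_vault": ["production_network"],
    "internal_dns": ["production_network"],
    "production_network": [],
}

# Same console, but with credentials and addressing kept outside production.
independent_deps = {
    "oob_console": ["offline_binder", "static_ip_list"],
    "offline_binder": [],
    "static_ip_list": [],
    "production_network": [],
}
```

| In the first map the console is only "out of band" on paper,
| because both of its dependencies need the production network.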
| mekatter wrote:
| You are probably right but if that is the case, it isn't
| really out of band and needs another look. I use OpenGear
| devices with cellular to access our core networking to
| multiple locations and we treat them as basically an
| entirely independent deployment, as if it is another
| company. DNS and credentials are stored in alternate
| systems that can be accessed regardless of the primary
| systems.
|
| I'm sure the logistics of this become far more
| complicated as the organization scales but IMHO it is
| something that shouldn't be overlooked, exactly for
| outlier events like this. It pays dividends the first
| time it is really needed. If the accounts of ramenporn
| are correct, it would be paying very well right now.
|
| Out of band access is a far more complicated version of
| not hosting your own status page, which they don't seem
| to get right either.
| daper wrote:
| Our security team complained that we have some services like
| monitoring or SSH access to some Jump Hosts accessible
| without a VPN because VPN should be mandatory to access all
| internal services. I'm afraid once we comply we could be in a
| similar situation to where Facebook is now...
| iso1210 wrote:
| But you have two independent VPNs right, using different
| technologies on different internet handoffs in very
| different parts of your network, right?
| lostapathy wrote:
| Fundamentally, how is a 2nd independent VPN into your
| network a different attack surface than a single, well-
| secured ssh jumphost? When you're using them for narrow
| emergency access to restore the primary VPN, both are
| just "one thing" listening on the wire, and it's not like
| ssh isn't a well-understood commodity.
| jaywalk wrote:
| Still wouldn't help if your configuration change wipes
| you clear off the Internet like Facebook's apparently
| has. The only way to have a completely separate backup is
| to have a way in that doesn't rely on "your network" at
| all.
| tiborsaas wrote:
| "I believe the original change was 'automatic' (as in
| configuration done via a web interface). However, now that
| connection to the outside world is down, remote access to those
| tools don't exist anymore, so the emergency procedure is to
| gain physical access to the peering routers and do all the
| configuration locally."
|
| Hmm, could be a UI/UX bug then :)
| sjg007 wrote:
| Seems odd to not have a redundant backdoor on a different
| network interface. Maybe that is too big of a security risk
| but idk.
| progbits wrote:
| You know how after changing resolution and other video
| settings you get a popup "do you want to keep these
| changes?" with a countdown and automatic revert in case you
| managed to screw up and can't see the output anymore?
|
| Well, I wonder why a router that gets a config update but
| then doesn't see any external traffic for 4 hours doesn't
| just revert back to the last known good config...
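|
| A minimal sketch of that "keep these changes?" pattern, in
| Python. Everything here is hypothetical (the ConfirmedConfig
| class, the timeout values); it is not how any particular router
| OS or Facebook's tooling works, only the general idea: activate
| a config, and fall back to the last known good one unless an
| operator confirms in time.

```python
import threading


class ConfirmedConfig:
    """Hypothetical config holder that reverts unless confirmed."""

    def __init__(self, initial):
        self.active = initial
        self.last_good = initial
        self._generation = 0
        self._lock = threading.Lock()

    def apply(self, new_config, confirm_within):
        """Activate new_config; roll back after confirm_within seconds."""
        with self._lock:
            self._generation += 1
            generation = self._generation
            self.active = new_config
        timer = threading.Timer(confirm_within, self._rollback, args=(generation,))
        timer.daemon = True
        timer.start()

    def confirm(self):
        """The operator can still reach the box: keep the active config."""
        with self._lock:
            self._generation += 1  # invalidates any pending rollback
            self.last_good = self.active

    def _rollback(self, generation):
        with self._lock:
            # Only revert if nothing was confirmed or re-applied since.
            if generation == self._generation:
                self.active = self.last_good
```

| Some network operating systems ship a feature along these
| lines (Junos has "commit confirmed", for example), so the
| open question is more about whether it is used for every
| change than whether it can be built.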
| kerng wrote:
| Wondering how Facebook communicates now internally - most of
| their work streams likely depend on Facebook's systems, which are
| all down.
|
| Can engineers and security teams even access prod systems
| anymore? Like, would "Bastion" hosts be reachable?
|
| Wonder if they use Signal and Slack now?
| not2b wrote:
| I would think that their internal network would correctly
| resolve facebook.com even though they've borked DNS for the
| external world, or if not they could immediately fix that. So
| at least they'd be able to talk to each other.
| xgme wrote:
| Facebook does use IRC and Zoom as a fallback.
| LordHumungous wrote:
| My team set up a discord lol
| slaymaker1907 wrote:
| If they planned ahead, they should have had their oncalls
| practice on the backup systems (like Signal/Slack/Zoom)
| before now.
| markchristian wrote:
| To the communication angle, I've worked at two different
| BigCo's in my career, and both times there was a fallback
| system of last resort to use when our primary systems were
| unavailable.
| jptech wrote:
| Don't they have a separate instance for internal
| communications?
| ThinkBeat wrote:
| I haven't worked for a FAANG but it would be unthinkable that
| FB does not have backup measures in place for communications
| entirely outside of Facebook.
|
| Hmm well I mean for key people, ops and so on. Not for every
| employee.
|
| Only a few people need that type of access, and they should
| have it ready. If they need to bring in more people, there
| should be an easy way to do it.
|
| Maybe the internal FB Messenger app has a slide button to
| switch to the backup network for those in need.
| mrep wrote:
| > Maybe the internal FB Messenger app has a slide button to
| switch to the backup network for those in need.
|
| Having worked for 2 FAANG companies, I can tell you that most
| core services like FB Messenger would be using and relying on
| internal database services, which would not work in a case
| like this, and the engineering cost to design them to support
| an external database would be a lot more than just paying for
| like 5 different external backup products for your SRE team.
| flyingswift wrote:
| FB uses a separate IRC instance for these kinds of issues, at
| least when I used to work there
| alasdair_ wrote:
| There are various non-FB fallback measures, including IRC as
| a last-ditch method. The IRC fallback is usually tested once
| a year for each engineer.
| jrochkind1 wrote:
| Good planning! Now, where does the IRC server live, and is
| it currently routable from the internet?
|
| While normally I know the advice is "Don't plan for
| mistakes not to happen, it's impossible, Murphy's law, plan
| for efficient recovery from mistakes"... when it comes to
| "literally our entire infrastructure is no longer routable
| from the internet", I'm not sure there's a great
| alternative to "don't let that happen. ever." And yet, here
| facebook is.
| PeterisP wrote:
| Also, are the users able to reach the server without DNS
| (i.e. are the IP address(es) involved static and
| communicated beforehand) and is the server itself able to
| function without DNS?
|
| Routing is one thing which you can't do without (then you
| need to fall back to phone communications), but DNS is
| something that's quite probable to not work well in a
| major disaster.
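|
| A rough sketch of what "reachable without DNS" could look like
| on the client side, in Python. The hostname and the fallback
| list are made up (the addresses come from the documentation
| ranges); the point is only that the static addresses have to
| be distributed before the outage, not looked up during it.

```python
import socket

# Hypothetical, pre-shared fallback list (documentation-range addresses).
FALLBACK_ADDRESSES = {"irc.example.internal": ["192.0.2.10", "192.0.2.11"]}


def candidate_addresses(hostname, resolver=socket.gethostbyname):
    """Return addresses to try: the DNS answer first, then the static list."""
    candidates = []
    try:
        candidates.append(resolver(hostname))
    except OSError:
        pass  # DNS is down or the name does not resolve
    for addr in FALLBACK_ADDRESSES.get(hostname, []):
        if addr not in candidates:
            candidates.append(addr)
    return candidates
```

| The server side has the same problem in reverse: if it needs
| DNS to start up or to authenticate users, the static client
| list does not help.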
| Diederich wrote:
| A lot of the core 'ops like' teams at FB use IRC on a daily
| basis.
|
| When I worked there, I wasn't aware of any 'test once per
| year' concept or directive.
|
| Of course, FB is a really big place, so things are
| different in different areas.
| kaustubhvp wrote:
| I just heard from a contact that the fallback/backup IRC is
| also down.
| littlecranky67 wrote:
| Bet it was located at irc.facebook.com ;)
|
| Joking aside, I can see how an IRC _network_ has
| potential to be used in these situations. Maybe FAMANG
| should work together to set something like this up. The
| problem is, a _single_ IRC server is not fail safe, but a
| network of multiple servers would just see a netsplit, in
| which case users would switch servers.
|
| Also, I remember back in the IRCnet days using simply
| telnet to connect to IRCnet just for fun and sending
| messages, so it's a very easy protocol that can be
| understood in a global disaster scenario (just the PING
| replies were annoying in telnet).
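|
| For anyone who never did the telnet thing: the annoying part
| is that the server periodically sends "PING :token" and drops
| you unless you answer "PONG :token". A small Python sketch of
| just that step (pong_for is a hypothetical helper name, and
| this ignores everything else in the protocol):

```python
def pong_for(line):
    """Return the PONG reply for a raw IRC PING line, or None otherwise."""
    line = line.rstrip("\r\n")
    if line.startswith(":"):
        # Drop the optional ":prefix " that a server may put first.
        _, _, line = line.partition(" ")
    command, _, params = line.partition(" ")
    if command.upper() != "PING":
        return None
    # Echo the parameters back unchanged, terminated with CRLF.
    return "PONG " + params + "\r\n"
```

| In a real client this would sit in the read loop on the
| socket; by hand in telnet you are the read loop.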
| treesknees wrote:
| I heard the same thing from my old coworker who is at FB
| currently. All of their internal DNS/logins are broken
| atm so nobody can reach the IRC server. I bet this will
| spur some internal changes at FB in terms of how to
| separate their DR systems in the case of an actual
| disaster.
| gfosco wrote:
| Actually, in this situation: Discord.
| Pasorrijer wrote:
| Facebook is likely scrambling private jets as we speak to get
| the right people to the right places.
| aero-glide2 wrote:
| Reminds me of that episode in Mr Robot
| zolosa wrote:
| The cost of the downtime would be
| [deleted]
| gabaix wrote:
| Facebook 2021 revenue is around $100B. That's $11M an
| hour. Since it's peak hour for ad printing, one can
| assume double or triple this rate.
|
| They are already looking at > $100M in ad loss, not
| counting reputation damage etc.
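|
| The back-of-the-envelope math, as a Python sketch. The $100B
| figure is the rough number above, and the peak multiplier is
| a guess, not reported data.

```python
ANNUAL_REVENUE_USD = 100e9  # "around $100B", the rough figure above
HOURS_PER_YEAR = 365 * 24   # 8760


def hourly_revenue(annual=ANNUAL_REVENUE_USD):
    """Average revenue per hour if spread evenly over the year."""
    return annual / HOURS_PER_YEAR


def hours_to_lose(target_usd, peak_multiplier=1.0):
    """Hours of outage needed to miss target_usd at a given peak multiplier."""
    return target_usd / (hourly_revenue() * peak_multiplier)
```

| That gives about $11.4M an hour on average, so $100M is
| crossed after roughly 8.8 hours at the average rate, or about
| 4.4 hours if peak traffic is worth double.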
| prox wrote:
| Think of all the influencers who can't influence and FB
| addicts who can't get their fix (+insta and whatsapp)
| cwkoss wrote:
| Uh oh that user deleted their account. Hope they are OK.
| rexreed wrote:
| I am sure this is not what they specifically mean by fail fast
| and break things often.
| cecilpl2 wrote:
| User has now deleted the update.
| IceWreck wrote:
| > Even in the biggest of organizations, they still have to wait
| for somebody to race down to the datacenter and plug his laptop
| into a router.
|
| I love this comment.
| yawnxyz wrote:
| for something as distributed as Facebook, do multiple
| somebodys all have to race down to each individual datacenter
| and plug their laptops into the routers?
|
| As someone with no experience in this, it sounds like a
| terrifying situation for the admins...
| MuffinFlavored wrote:
| Imagine having a huge portion of the digital world
| internationally riding on your shoulders...
| laurent92 wrote:
| Imagine that guy has this big npm repository locally with
| all those dodgy libraries with uncontrolled origin, in
| their /lib/node_modules with root permissions.
|
| Wait, we all do, here.
| victor9000 wrote:
| You can use a custom npm prefix to avoid the mess you're
| describing. So basically:
|
| See current prefix:
|
| > npm config get prefix
|
| Set prefix to something you can write to without sudo:
|
| > npm config set prefix /some/custom/path
| [deleted]
| qnsi wrote:
| he started deleting the comments
| wolverine876 wrote:
| > Reddit r/Sysadmin user that claims to be on the "Recovery
| Team"
|
| They have time to make public posts, and think it's a good
| idea?
|
| Sure, I'm on the 'Recovery Team' too! How about you?
| bennyp101 wrote:
| Interesting that they published stuff about their BGP setup and
| infrastructure a few months ago - maybe a little tweak to
| rollbacks is needed.
|
| "... We demonstrate how this design provides us with flexible
| control over routing and keeps the network reliable. We also
| describe our in-house BGP software implementation, and its
| testing and deployment pipelines. These allow us to treat BGP
| like any other software component, enabling fast incremental
| updates..."
| tedmiston wrote:
| # todo: add rollbacks
| pbhjpbhj wrote:
| Surely Facebook don't update routing systems between data
| centres (IIRC the situation) when they don't have people
| present to fix things if they go wrong? Or have an out-of-
| band connection (satellite, or dial-up (?), or some other
| alternate routing?).
|
| I must be misunderstanding this situation here.
|
| [Aside: I recall updating wi-fi settings on my laptop and
| first checking I had direct Ethernet connection working ...
| and that when I didn't have anything important to do (could
| have done a reinstall with little loss). Is that a reasonable
| analogy?]
| lstodd wrote:
| > don't update routing systems between data centres (IIRC
| the situation) when they don't have people present
|
| Ha. You put too much faith into people.
| fistynuts wrote:
| Move fast and break . . . <NO CARRIER>
| [deleted]
| PeterCorless wrote:
| Comment now seems to be deleted by user.
| sbierwagen wrote:
| That reddit comment has been deleted.
| ds206 wrote:
| Well, those comments have been deleted now... I guess someone's
| boss didn't like the unofficial updates going out? :)
| costcofries wrote:
| Looks like those updates have now been deleted
| winternett wrote:
| Also, equally important to note, there was a massive exposé on
| Facebook yesterday that is reverberating across social media
| and news networks, and today, when I tried to make a post
| including the tag #deletefacebook, my post mysteriously could
| not be published and the page refreshed, mysteriously wiping my
| post...
|
| This is possibly the equivalent of a corporate watergate if you
| ask me... Just my personal opinion as a developer though... Not
| presented as fact... But hrmmm.
| blobbers wrote:
| So what you're saying is facebook... deleted itself?
|
| The singularity is happening. It realized it would end
| society, so it ended itself.
| ds206 wrote:
| They decided that they publish too much misinformation and
| self censored ;)
| rvnx wrote:
| This reminds me of the last time the singularity nearly
| happened.
|
| https://google.com/search?q=google
|
| I beg you, don't go there.
| snickersnee11 wrote:
| Just imagine the amount of stress on these people, hope the
| money is really worth it.
| tomjen3 wrote:
| This is a one off event, not a chronic stress trigger. I find
| them invigorating personally, as long as everybody concerned
| understands that this is not good in the long run, and that
| you are not going to write your best code this way.
| mov31tmov31t wrote:
| It shouldn't be too stressful. Well-managed companies blame
| processes rather than people, and have systems set up to
| communicate rapidly when large-scale events occur.
|
| It can be sort of exciting, but it's not like there is one
| person typing at a keyboard with a hundred managers breathing
| down their neck. These resolutions are collaborative, shared
| efforts.
| rvnx wrote:
| "it's not like there is one person typing at a keyboard
| with a hundred managers breathing down their neck. These
| resolutions are collaborative, shared efforts"
|
| Well, you'd be surprised about how one person can bring
| everything down and/or save the day at Facebook,
| Cloudflare, Google, Gitlab, etc. Most people are
| observers/cheerleaders when there is an incident.
| cromka wrote:
| > Most people are observers/cheerleaders when there is an
| incident.
|
| Yeah, a typical fight/flight response.
| SomeBoolshit wrote:
| Or most people simply don't have anything useful to add
| or do during an incident.
| ikiris wrote:
| Taking all the available slots in the massive gvc warroom
| ain't much... but it's honest work.
| xorcist wrote:
| > Well-managed companies blame processes rather than
| people,
|
| We're six hours without a route to their network, and
| counting. I think we can safely rule out well-managed.
| thih9 wrote:
| > It shouldn't be too stressful. (...) it's not like there
| is one person typing at a keyboard with a hundred managers
| breathing down their neck
|
| Earlier comment mentioned that there is a bottleneck, and
| that people who are physically able to solve the issue are
| few and that they need to be informed what to do; being one
| of these people sounds pretty stressful to me.
|
| "but the people with physical access is separate (...) Part
| of this is also due to lower staffing in data centers due
| to pandemic measures", source:
| https://news.ycombinator.com/item?id=28749244
| mov31tmov31t wrote:
| Sure, but that's what conference calls are for.
|
| Most big tech companies automatically start a call for
| every large scale incident, and adjacent teams are
| expected to have a representative call in and contribute
| to identifying/remediating the issue.
|
| None of the people with physical access are individually
| responsible, and they should have a deep bench of advice
| and context to draw from.
| astridpeth wrote:
| I'm not an IT Operations guy, but as a dev I always thought
| it was exciting when the IT guys had the destiny of the
| firm on their shoulders. It must be exciting.
| donalhunt wrote:
| You tend not to think about it...
|
| Most teams that handle incidents have well documented
| incident plans and playbooks. When something major
| happens you are mostly executing the plan (which has been
| designed and tested). There are always gotchas that
| require additional attention / hands but the general
| direction is usually clear.
| Ansil849 wrote:
| > Well-managed companies blame processes rather than people
|
| I feel like this just obfuscates the fact that individuals
| are ultimately responsible, and allows subpar employees to
| continue existing at an organization when their position
| could be filled by a more qualified employee. (Not talking
| about this Facebook incident in particular, but as a
| generalisation: not attributing individual fault allows
| faulty employees to thrive at the expense of more qualified
| ones).
| kryogen1c wrote:
| > this just obfuscates the fact that individuals are
| ultimately responsible
|
| in critical systems, you design for failure. if your
| organizational plan for personnel failure is that no one
| ever makes a mistake, that's a bad organization that will
| forever have problems.
|
| this goes by many names, like the swiss cheese model[0].
| it's not that workers get to be irresponsible, but that
| individuals are responsible only for themselves, and the
| organization is the one responsible for itself.
|
| [0] https://en.wikipedia.org/wiki/Swiss_cheese_model
| Ansil849 wrote:
| > is that no one ever makes a mistake
|
| This isn't what I'm saying, though. The thought I'm
| trying to express is that if no individual accountability
| is done, it allows employees who are not as good at their
| job (read: sloppy) to continue to exist in positions
| which could be better occupied by employees who are
| better at their job (read: more diligent).
|
| The difference between having someone who always triple-
| checks every parameter they input, versus someone who
| never double-checks and just wings it. Sure, the person
| who triple-checks will make mistakes, but less than the
| other person. This is the issue I'm trying to get at.
| tonfa wrote:
| > The difference between having someone who always
| triple-checks every parameter they input, versus someone
| who never double-checks and just wings it. Sure, the
| person who triple-checks will make mistakes, but less
| than the other person. This is the issue I'm trying to
| get at.
|
| If you rely on someone triple-checking, you should
| improve your processes. You need better
| automation/rollback/automated testing to catch things.
| Eventually only intentional failure should be the issue
| (or you'll discover interesting new patterns that should
| be protected against)
| zaat wrote:
| If someone is sloppy and not willing to change he should
| be shown the door, not because he caused an outage but
| because he is sloppy.
|
| People who operate systems under fear tend to do stupid
| things like covering up innocent actions (deleting logs),
| withholding information instead of sharing it, etc. Very few
| can operate complex systems for a long time without making
| a mistake. An organization where the spirit is "oh, outage,
| someone is going to pay for that" will never be
| attractive to good people, and will have a hard time
| adapting to changes and adopting new tech.
| whermans wrote:
| If there is an incident because an employee was sloppy,
| the fault lies with the hiring process, the evaluation
| process for this employee, or with the process that put
| four eyes on each implementation. The employee fucked up,
| they should be removed if they are not up to standards,
| but putting the blame on them does not prevent the same
| thing from happening in the future.
| nerdawson wrote:
| By focusing on the process, lessons are learned and
| systems are put in place which leads to a cycle of
| improvement.
|
| When individuals are blamed instead, a culture of fear
| sets in and people hide / cover up their mistakes.
| Everybody loses as a result.
| alex_sf wrote:
| I don't think the comment you're replying to applies to
| your concern about subpar employees.
|
| We blame processes instead of people because people are
| fallible. We've spent millennia trying to correct people,
| and it rarely works to a sufficient level. It's better to
| create a process that makes it harder for humans to screw
| up.
| Ansil849 wrote:
| Yes, absolutely, people make mistakes. But the thought I
| was trying to convey is that some people make a lot more
| mistakes than others, and by not attributing individual
| fault these people are allowed to thrive at the cost of
| having less error-prone people in their position. For
| example, someone who triple-checks every parameter that
| they input, versus someone who has a habit of just
| skimming or not checking at all. Yes the triple-checker
| will make mistakes too, but way less than the person who
| puts less effort in.
| mynameisvlad wrote:
| But that has nothing to do with blaming processes vs
| people.
|
| If the process in place means that someone has to triple
| check their numbers to make sure they're correct, then
| it's a broken process. Because even that person who
| triple checks is one time going to be woken up at 2:30am
| and won't triple check because they want sleep.
|
| If the process lets you do something, then someone at
| some point in time, whether accidentally or maliciously,
| will cause that to happen. You can discipline that
| person, and they certainly won't make the same mistake
| again, but what about their other 10 coworkers? Or the
| people on the 5 sister teams with similar access who
| didn't even know the full details of what happened?
|
| If you blame the process and make improvements to ensure
| that triple checking isn't required, then nobody will get
| into the situation in the first place.
|
| _That_ is why you blame the process.
| samhw wrote:
| Yeah, I've heard this view a hundred times on Twitter,
| and I wish it were true.
|
| But sadly, there is no company which doesn't rely, at
| least at one point or another, on a human being typing an
| arbitrary command or value into a box.
|
| You're really coming up against P=NP here. If you can
| build a system which can auto-validate or auto-generate
| everything, then that system doesn't really need humans
| to run at all. We just haven't reached that point yet.
|
| _Edit: Sorry, I just realised my wording might imply
| that P does actually equal NP. I have not in fact made
| that discovery. I meant it loosely to refer to the
| problem, and to suggest that auto-validating these things
| is at least not much harder than auto-executing them._
| mynameisvlad wrote:
| I don't think anyone ever claimed the process itself is
| perfect. If it were, we obviously would never have any
| issues.
|
| To be explicit here, by blaming the process, you are
| discovering and fixing a known weakness in the process.
| What someone would need to triple check for now, wouldn't
| be an issue once fixed. That isn't to say that there
| aren't any other problems, but it ensures that one issue
| won't happen again, regardless of who the operator is.
|
| If you have to triple check that value X is within some
| range, then that can easily be automated to ensure X
| can't be outside of said range. Same for calculations
| between inputs.
|
| To take the overly simplistic triple check example from
| before, said inputs that need to be triple checked are
| likely checked based on some rule set (otherwise the
| person themselves wouldn't know if it was correct or
| not). Generally speaking, those rules can be encoded as
| part of the process.
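|
| As a sketch of what "encoding the rule" can mean, here is a
| small Python example. RangeRule, validate and the
| drain_percent parameter are all invented for illustration;
| the idea is just that the check a careful person would do by
| eye becomes something the tooling refuses to skip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeRule:
    """Hypothetical rule: a named input must stay within [low, high]."""
    name: str
    low: float
    high: float

    def check(self, value):
        if not (self.low <= value <= self.high):
            raise ValueError(
                f"{self.name}={value} is outside [{self.low}, {self.high}]"
            )
        return value


def validate(inputs, rules):
    """Check every ruled input; reject the change on the first violation."""
    for rule in rules:
        rule.check(inputs[rule.name])
    return inputs
```

| With a rule like RangeRule("drain_percent", 0, 25), a typo of
| 100 is rejected before it ships, no matter who typed it or at
| what hour.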
|
| What was before potentially "arbitrary input" now becomes
| an explicit set of inputs with safeguards in place for
| this case. The process became more robust, but is not
| infallible.
|
| But if you were to blame people, the process still takes
| arbitrary input, the person who messed up will probably
| validate their inputs better but that speaks nothing of
| anyone else on the team, and two years down the line
| where nobody remembers the incident, the issue happens
| again because nothing really has changed.
| samhw wrote:
| The issue is that this view always relies on stuff like
| "make people triple check everything".
|
| - How does that relate to making a config change?
|
| - How do you practically implement a system where someone
| has to triple check everything they do?
|
| - How do you stop them just clicking 'confirm' three
| times?
|
| - Why do you assume they will notice on the 2nd or 3rd
| check, rather than just thinking "well, I know I wrote it
| correctly, so I'll just click confirm"?
|
| I don't think rules can always be encoded in the process,
| and I don't see how such rules will always be able to
| detect all errors, rather than only a subset of very
| obvious errors.
|
| And that's only dealing with the simplest class of
| issues. What about a complex distributed systems problem?
| What about the engineer who doesn't make their system
| tolerant of Byzantine faults? How is any realistic
| 'process' going to prevent that?
|
| This entire trope relies on the fundamental axiom that
| "for any individual action A, there is a process P which
| can prevent human error". I just don't see how that's
| true.
|
| (If the statement were something like "good processes can
| eliminate whole classes of error, and reduce the
| likelihood of incidents", I'd be with you all the way.
| It's this Twitter trope of "if you have an incident, it's
| _a priori_ your company's fault for not having a process
| to prevent it" which I find to be silly and not even
| _nearly_ proven.)
| zaat wrote:
| If you think about it, it isn't very useful to find a
| person who is responsible. Suppose someone causes an outage
| or harm, due to neglect or even bad intentions: either
| the system is set up in a way that the person couldn't
| cause the outage, or in time it will go down. To build a
| truly resilient system, especially on a global scale, there
| should never be an option for a single person to bring
| down the whole system.
| ric2b wrote:
| > and allows subpar employees to continue existing at an
| organization when their position could be filled by a
| more qualified employee.
|
| Not really, their incompetence is just noticed earlier at
| the review/testing stages instead of in production
| incidents.
|
| If something reaches production that's no longer the
| fault of one person, it's the fault of the process and
| that's what you focus on.
| TrevorJ wrote:
| >Well-managed companies
|
| To what extent does this include Facebook?
| aenis wrote:
| Well, individuals will still stress, if anything, due to
| the feeling of being personally responsible for inflicting
| damage.
|
| I know someone who accidentally added a rule 'reject access
| to * for all authenticated users' in some stupid system
| where the ACL ruleset itself was covered by this *, and
| this person nearly collapsed when she realized even admins
| were shut out of the system. It required getting low level
| access to the underlying software to reverse engineer its
| ACLs and hack into the system. Major financial institution.
| Shit like that leaves people with actual trauma.
|
| As much as I hate fb, I really feel for the net ops guys
| trying to figure it all out, with the whole world watching
| (most of it with schadenfreude)
| ikiris wrote:
| As one of the major responders to an incident analogous to
| this at a different fang... you're high, it's still hella
| stressful.
| tristor wrote:
| > It can be sort of exciting, but it's not like there is
| one person typing at a keyboard with a hundred managers
| breathing down their neck.
|
| As someone who formerly did Ops for many many years... this
| is not accurate. Even in a well organized company there are
| usually stakeholders at every level on IM calls so that
| they don't need to play "telephone" for status. For an
| incident of this size, it wouldn't be unusual to have
| C-level executives on the call.
|
| While those managers are mostly just quietly listening in
| on mute if they know what's good (e.g. don't distract the
| people doing the work to fix your problem), their mere
| presence can make the entire situation more tense and
| stressful for the person banging keyboards. If they decide
| to be chatty or belligerent, it makes everything 100x
| worse.
|
| I don't envy the SREs at Facebook today. Godspeed fellow
| Ops homies.
| LordHumungous wrote:
| C levels don't sit on the call with engineers. They
| aren't that dumb. Managers will communicate upward.
| Salgat wrote:
| I think it comes down to the comfort level of the worker.
| I remember when our production environment went down. The
| CTO was sitting with me just watching and I had no
| problem with it since he was completely supportive,
| wasn't trying to hurry me, just wanted to see how the
| process of fixing it worked. We knew it wasn't any
| specific person's fault, so no one had to feel the heat
| from the situation beyond just doing a decent job getting
| it back up.
| yupper32 wrote:
| The stress for me usually goes away once the incident is
| fully escalated and there's a team with me working on the
| issue. I imagine that happened quite quick in this case...
| mrweasel wrote:
| Exactly, the primary focus in situations like this, is to
| ensure that no one feels like they are alone, even if in the
| end it is one person who has to type in the right commands.
|
| Always be there, help them double check, help monitor, help
| make the calls to whomever needs to be informed, help
| debug. No one should ever be alone during a large incident.
| calebm wrote:
| "When you are strong, appear weak."
| [deleted]
| jurajmlich wrote:
| It seems it has caused the DNS servers to crash for one of
| Czechia's biggest internet providers, Vodafone. Could be
| unrelated, but I doubt it
| (https://twitter.com/BlazejKrajnak/status/1445063232486531099).
|
| Think of it - half the country doesn't have internet because of
| this crash, that's terrifying. (Switching DNS servers obviously
| works but that's not something the general population will do)
| blntechie wrote:
| All most people have to do now is install an app and it takes
| care of it. But the messaging needs to come from media and news.
| megous wrote:
| If only the news reporting was not as stupid as "internet is
| not working at UPC", instead of DNS resolvers at UPC crashed,
| here's what you can do...
|
| Anyway, I didn't even notice since I run knot-resolver at home.
|
| I wonder what it will be like connecting Facebook back to the
| internet, thundering herd and everything...
| yourad_io wrote:
| I suspect that the DNS aspect will be fine. The middle DNS
| servers only need one valid response to cache it for $TTL,
| but they can't cache SERVFAIL.
| megous wrote:
| I mean connecting your company to the internet when you
| have billions of devices waiting in the line to fetch
| updates, or whatever.
|
| Will not that be an issue? Re-enabling routing to such a
| massive internet service...
| agilob wrote:
| Same in the UK, I've just experienced external DNS outage on
| BT!
| alexdumitru wrote:
| Vodafone is down in Romania too.
| [deleted]
| nabeards wrote:
| Was receiving an error page, now just a server not responding
| error.
| kawsper wrote:
| I can't even resolve Facebook.com
| daitangio wrote:
| Mobile app hangs too (I am from Italy btw)
| amir-h wrote:
| Hacker News also got so much slower, is it the load from people
| flocking here after not being able to reach FB?
|
| [I'm also getting server error trying to submit this comment]
| slackfan wrote:
| So how much cash do we need to pitch in to _keep_ it down?
| throwaway123x2 wrote:
| The cynic in me wonders if this is related to the Pandora Papers
| leak
| tacker2000 wrote:
| https://www.status.fb.com/ is back online now
| cestith wrote:
| Out of band management is an important feature for the
| reliability of your network.
| MrPatan wrote:
| Oh no
| faramarz wrote:
| What are some of the possible scenarios beyond the DNS issue
| suggested? (and might it be an attack?)
| heegemcgee wrote:
| This is what I came to the comments for.
|
| Unfortunately, we have literally dozens of comments that amount
| to nothing more than schadenfreude, and another handful of non-
| FANG employees speculating how one of the largest internet
| operations in existence could improve their game (lol)
| coolspot wrote:
| A BGP routing mistake that can cascade into a hard-to-recover-
| from state of the network where inter-dependencies lock each
| other.
| cwkoss wrote:
| I doubt this is the case, but someone on twitter was
| speculating "what if this is fb's infra team going on strike"
| Sophira wrote:
| Somebody just had their very own "onosecond".
|
| https://www.youtube.com/watch?v=X6NJkWbM1xk
|
| The video is one that Tom Scott published in June 2020 about the
| worst typo he ever made in one of his prior jobs, and while the
| Facebook mistake is almost certainly not going to be anything
| irrecoverable like this one, you can bet that Facebook pride
| themselves on being available all the time.
| todd-davies wrote:
| Perhaps allowing Facebook, WhatsApp and Instagram to merge was
| efficient after all - now that they have synchronized outages,
| people finally have a chance to get on with their lives, free of
| clickbait news and misinformation.
| randomperson_24 wrote:
| World productivity just grew by 10%
| frederikvs wrote:
| or it went down by another 20%, everybody at first thinking
| there's something wrong with their internet connection.
| 38932ur98u wrote:
| This event should be a good conversation starter on how
| horrifyingly monopolistic a hold this trifecta of services has on
| worldwide communication. When I think through a random smattering
| of people in my contact book, I now have no way of contacting
| quite a few people at all. That's fucked. I wonder how many
| important messages, replies, etc will be screwed up due to this.
| GDC7 wrote:
| Maybe intentional?
|
| Zuck trying to give an example of what a world without FB would
| look like, kinda saying to detractors what would happen if they
| had it their way.
| gmiller123456 wrote:
| Maybe "intentional" in quotes. My money is on a major security
| breach and they've shut everything down until they can deal
| with it. Even if you go to Instagram by the IP address [1], you
| get a 400 error. So it looks like things are off line because
| they want them off line for now.
|
| https://31.13.65.174/
| forix wrote:
| That would be classic Zuck right there
| babuskov wrote:
| Is HN hit by something as well? It's loading really slow for me.
| tinyprojects wrote:
| Oculus is also down
| ourcat wrote:
| Indeed. People seem to forget that when Facebook goes down,
| it's not just your feed of depressing posts, photos and
| messages that go away, but also the entire Oculus VR platform,
| since they demanded a FB account to use Quest headsets.
| chungy wrote:
| Even Facebook's Onion site isn't working:
| http://facebookwkhpilnemxj7asaniu7vnjjbiltxjqhye3mhbshg7kx5t...
|
| Fascinating simply that it's apparently not just a DNS issue.
| alexvoda wrote:
| Any idea what is the explanation for this?
| AlexAndScripts wrote:
| Probably internal services depend on the DNS too
| MrYellowP wrote:
| I keep trying to submit to HN but I keep getting an error.
|
| What's wrong with the internet?
|
| FaceBook is down.
|
| My friend from Slovenia is having trouble with discord. It eats
| his messages.
|
| I can't load photos from my friend in telegram and the messages
| take a relatively long time - multiple seconds! - to get
| received.
|
| TrackMania players have talked about having input lag.
|
| ycombinator is really slow and reports an error after submitting.
| "We're having some trouble serving your request. Sorry!" (lost
| count of the times i've tried submitting this)
|
| ycombinator turned out to be giving only errors.
|
| Some sites I've found via google results seem to report that they
| are suffering from slow connections.
|
| Do you have anything to add to this?
| saltyfamiliar wrote:
| I'm having issues with telegram as well. Images won't send and
| the app continuously says "updating" on the top status bar.
|
| How could facebook dns issues cause this?
| user3939382 wrote:
| Big ISP outages in NYC right now
| dclaw wrote:
| Delete Facebook / Instagram / WhatsApp when it comes back up.
| They are all trash.
| mupuff1234 wrote:
| Is there any site that tracks number of users for messaging apps?
| I'd be really curious to see if signal\telegram\etc are seeing a
| big bump.
| pwenzel wrote:
| I noticed that some websites are loading slowly due to the third
| party script https://connect.facebook.net/en_US/fbevents.js
| timing out.
|
| When uBlock Origin is running, this script gets blocked and pages
| return to feeling snappy.
| dizzant wrote:
| Reading the thread, I'm surprised at the number of nearly
| identical "How much do we have to pay to keep it down? xD" posts
| I'm seeing, often from throwaway accounts. Some accounts with
| multiple near-identical posts within the same minute.
|
| Could this be a coordinated smear in HN comments?
| cwkoss wrote:
| I think a lot of people just think facebook is bad for society.
| I do.
| mnd999 wrote:
| Somebody moved fast.
| drummer wrote:
| And broke everything
| spicybright wrote:
| Still amazes me their infra team is supposedly the best in the
| world, and compensated as such, yet things like this happen.
|
| Personally I'm glad FB went down for a few hours, but it's hard
| to imagine how that would happen in the first place.
| jangrahul wrote:
| Makes me wonder, why don't porn sites ever seem to go down?
| ProjectArcturis wrote:
| Have they tried turning it off and then turning it back on again?
| nurhdmsx wrote:
| It's always a DNS problem
| TremendousJudge wrote:
| WhatsApp too. This seems pretty big
| yholio wrote:
| I had problems with my internet connection and loaded my ISP's
| site. Strangely, my bill was paid. Even stranger, some sites load
| while others do not.
|
| Then it hit me: I am so dependent on Facebook owned properties
| (Whatsapp, Facebook, insta) that a Facebook failure looks to me
| like an internet failure.
| neb_b wrote:
| Also instagram and messenger
| mindcrime wrote:
| _Okay, let me tell you the difference between Facebook and
| everyone else, we don't crash EVER! If those servers are down
| for even a day, our entire reputation is irreversibly destroyed!
| Users are fickle, Friendster has proved that. Even a few people
| leaving would reverberate through the entire userbase. The users
| are interconnected, that is the whole point. College kids are
| online because their friends are online, and if one domino goes,
| the other dominos go, don't you get that?_
| dgb23 wrote:
| Is this a quote?
| qnsi wrote:
| Google suggests it's from the Social Network movie
| mindcrime wrote:
| Yes, it's from "The Social Network". It's a scene where Mark
| Z. is explaining to Eduardo how important it is that the
| servers stay up all the time.
|
| Of course it was, as far as I know, fictionalized in the
| first place, although it rings true (in context) to some
| extent. What I wonder is, how much is that true now? That is,
| how much downtime would FB have to experience for enough
| users to start leaving, to the point that it might prompt a
| serious exodus.
| GreenWatermelon wrote:
| At this point in time, I doubt this holds true.
|
| Facebook is just too big and pervasive that such an outage
| would be treated by its users like an internet outage or a
| power outage. Once it's back online, everyone will forget.
| MrYellowP wrote:
| I keep trying to submit to HN but I keep getting an error.
|
| What's wrong with the internet?
|
| FaceBook is down.
|
| My friend from Slovenia is having trouble with discord. It eats
| his messages.
|
| I can't load photos from my friend in telegram and the messages
| take a relatively long time - multiple seconds! - to get
| received.
|
| TrackMania players have talked about having input lag.
|
| ycombinator is really slow and reports an error after submitting.
| "We're having some trouble serving your request. Sorry!" (lost
| count of the times i've tried submitting this)
|
| ycombinator turned out to be giving only errors, but now seems to
| be working _occasionally_. I can not submit anything, though.
|
| Some sites I've found via google results seem to report that they
| are suffering from slow connections.
|
| Do you have anything to add to this?
| tiluha wrote:
| Some of those can probably be explained by Facebook's traffic
| being redistributed to other services, overloading them
| agilob wrote:
| A few wordpress blogs crashed because addon facebook pixel is
| crashing. Very intensive lesson for the internet!
| Animats wrote:
| _What's wrong with the internet?_
|
| With Facebook down, some large DNS servers seem to be
| struggling with the extra load of failing requests to look up
| "facebook.com". Cloudflare reports overload with their DNS
| server at 1.1.1.1, although that's working for me.
|
| Billions of things worldwide are trying to connect to Facebook.
| The lookup which normally returns the IP address for
| facebook.com on the first try now requires trying
| a.ns.facebook.com, b.ns.facebook.com, etc. several times each
| before giving up. Probably several times a minute for everyone
| who has a Facebook app in their phone turned on. That may be
| using a big fraction of world DNS resources.
|
| Vodafone Ireland seems to be struggling with a DNS overload
| right now, per the Irish Independent. Also, their status page
| can't find "Dublin" as a city.
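|
| A rough sketch of the retry arithmetic described above, not a
| model of any real resolver. The nameserver count (4) and attempts
| per server (3) are assumed values for illustration, and the
| function names are hypothetical.

```python
def queries_for_lookup(nameservers: int, attempts_per_server: int,
                       reachable: bool) -> int:
    """Count the queries one lookup sends to authoritative servers."""
    if reachable:
        # Healthy case: the first nameserver answers on the first try.
        return 1
    # Failing case: every nameserver is tried, and every attempt times out.
    return nameservers * attempts_per_server


def amplification(nameservers: int, attempts_per_server: int) -> float:
    """Ratio of queries sent when failing versus when healthy."""
    healthy = queries_for_lookup(nameservers, attempts_per_server, True)
    failing = queries_for_lookup(nameservers, attempts_per_server, False)
    return failing / healthy


if __name__ == "__main__":
    # Assumed values: 4 nameservers, 3 attempts each before giving up.
    print(amplification(nameservers=4, attempts_per_server=3))
```

| Under those assumptions each failed lookup costs a resolver many
| times the work of a successful one, before counting how often
| clients retry.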
| lazlee wrote:
| Hopefully forever.
| pytlicek wrote:
| Who else sees their deleted messages on WhatsApp that shouldn't
| be there?
|
| https://news.ycombinator.com/item?id=28749652
| mrkickling wrote:
| When I do dig instagram.com I get an A response for this IP:
| 31.13.65.174 or similar addresses, which leads to an empty page.
| Paianni wrote:
| The level of intelligence of the comments in this thread kinda
| confirms my suspicions that the armchair experts from Reddit et al
| (including myself) are discovering this site.
| Animats wrote:
| Facebook outage is now the top story on CNN and Fox. Facebook
| stock down 5%. Facebook is not returning calls from Fox, or CNN.
| zip1234 wrote:
| DNS?
| [deleted]
| SSLy wrote:
| No, I'm getting their error page, so Load Balancers or whatever
| is behind. EDIT: Or at least not /just/ DNS.
| rootinier wrote:
| Definitely DNS. facebook.com might be in your local dns
| cache.
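|
| A minimal sketch of why a cached answer can outlive the servers
| that issued it, assuming a simple TTL cache. The class name, the
| 300 second TTL and the 192.0.2.1 address (a documentation-only
| range) are all hypothetical; real resolvers are more involved.

```python
class TtlCache:
    """Tiny name -> address cache where each entry expires after its TTL."""

    def __init__(self, clock):
        self._clock = clock      # callable returning the current time in seconds
        self._entries = {}       # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # Entry expired: the next lookup has to reach the real servers.
            del self._entries[name]
            return None
        return address


if __name__ == "__main__":
    now = [0.0]                                  # fake clock for the demo
    cache = TtlCache(clock=lambda: now[0])
    cache.put("facebook.com", "192.0.2.1", ttl_seconds=300)  # assumed values

    now[0] = 120.0                               # outage starts, entry still fresh
    print(cache.get("facebook.com"))

    now[0] = 301.0                               # TTL elapsed, answer is gone
    print(cache.get("facebook.com"))
```

| While the entry is fresh the name still resolves locally, so a
| browser can reach an error page even though new lookups fail.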
| andyjohnson0 wrote:
| I'm getting an error page with a dead image link and a 2020
| copyright date (uk)
| orthecreedence wrote:
| Good riddance.
| walrus01 wrote:
| downdetector looks like a real mess for it.
|
| I'm going to parrot the other comment here and say nothing of
| value was lost.
|
| https://downdetector.com/status/facebook/
| [deleted]
| tannhaeuser wrote:
| Expecting to get messages on WhatsApp alternatives tonight ...
| nycdatasci wrote:
| Ignore.
| wrycoder wrote:
| 19 hours ago??
| stonks wrote:
| Facebook and Messenger are working now.
|
| Instagram and WhatsApp - not yet.
| sss111 wrote:
| I was hoping it would stay down for longer haha
| RedShift1 wrote:
| Nothing of value was lost
| subsaharancoder wrote:
| A lot of people, many of them home based businesses, also rely
| on FB Marketplace as a primary source of income.
| tantalor wrote:
| That's terrifying.
| pixelgeek wrote:
| They have to go where their market is sadly
| subsaharancoder wrote:
| Many people don't realize that with the 2020 lockdown and
| next to zero face to face transactions happening, platforms
| like FB Marketplace provided an opportunity for many people
| to set up businesses and generate income. I understand the
| angst people have with FB, but there's a bigger world out
| there beyond our keyboards.
| walrus01 wrote:
| for one example of this look at certain ethnic food
| catering/delivery services that exist in many major cities
| and operate almost exclusively on facebook.
| madeofpalk wrote:
| I can't message my friends on whatsapp :(
| heywherelogingo wrote:
| Seize the moment - switch to signal.
| madeofpalk wrote:
| Is Signal not equally centralised, and thus susceptible to
| the exact same problem as this?
| m-chrzan wrote:
| Yes. In the ideal world messaging would have followed
| the same federated model as email. XMPP offers this,
| unfortunately few people use it or even are aware of it.
| Unklejoe wrote:
| Yep. Matrix is a decentralized alternative (provided you
| don't just use the default homeserver).
| CodeGlitch wrote:
| Yes it is.
|
| Alternatives beyond signal that normies can use: Email.
|
| Spread the word!
| _-david-_ wrote:
| Doesn't help when everyone just uses Gmail.
| sneak wrote:
| Write a blog post teaching them how to stop:
|
| https://sneak.berlin/20201029/stop-emailing-like-a-rube/
| goodpoint wrote:
| Correct. Switch to Briar.
| celsoazevedo wrote:
| Yeah, but if you're going to use something centralised
| anyway, may as well use a more private option.
| oneeyedpigeon wrote:
| This issue isn't about privacy, it's about reliability.
| How reliable is Signal compared to WhatsApp?
| derin wrote:
| ...and where do you go when AWS/Signal's servers go down?
|
| How about choosing something that's federated?
| https://matrix.org/
| CodeGlitch wrote:
| Email
|
| (I'm not kidding)
| grey_earthling wrote:
| Delta.chat is an instant messenger implemented over
| email. Alternatively, it's an email client that looks
| like an instant messenger.
| celsoazevedo wrote:
| I'm fine with Matrix, but I'm not seeing the people
| around me moving to it, even with a more friendly
| solution like Element. It's already hard to make them use
| Signal just because they want users to remember a pin...
| gpderetta wrote:
| Can't tell them to switch if whatsapp is down!
|
| More guidance required.
| mdoms wrote:
| Based on their track record I wouldn't be surprised if
| Signal just happened to be having an extended outage too.
| mcheung610 wrote:
| But positive social value was gained
| can16358p wrote:
| Just because a company has questionable or even straight evil
| business practices doesn't mean that literally millions of
| companies/people don't rely on them to do business and
| communicate.
| mattfrommars wrote:
| Facebook bashing is getting old. It's 2021, dammit.
| winter_blue wrote:
| Well, I know you jest, but a lot of conversations, with many
| people, over years and years would be lost. It'd be akin to
| hundreds of email threads with friends being deleted.
| ozfive wrote:
| This cannot be said enough.
| blowski wrote:
| On the contrary, it's said far too much. Facebook is
| extremely valuable for a lot of people. I dislike Facebook as
| much as most people on here, but saying "it's totally
| pointless" is silly and it's not surprising that those who
| say it are ignored by those who use Facebook.
| belter wrote:
| In what ways is Facebook "extremely valuable for a lot of
| people"?
| blowski wrote:
| * A friend of mine runs a posh burger van that moves
| around a lot, and he puts "today's location" on Facebook.
|
| * My wife talks with her family in Brazil through
| Facebook, sharing photos
|
| * My Church receives a lot of help requests from people
| in trouble through Facebook
|
| * Some abuse charities give support to victims
| through Facebook
|
| etc
|
| You could argue that it would be nice if there were
| alternatives, or that these organisations shouldn't be
| using Facebook at all. Sign me up for your campaign, I
| agree with you.
|
| But if you say "Facebook has no value" then you will
| never understand the value proposition you need to offer
| in order to kill Facebook.
| Graffur wrote:
| I have many connections to people I met travelling. While
| not friends that I talk to often, the connections are
| still valuable.
| can16358p wrote:
| Communication for many out there. Many will be just fine
| without commenting on cat photos or bragging with their
| likes or followers. Many will be in trouble if they use
| WhatsApp/Instagram/Messenger/Marketplace to do business
| and any important communication.
| DarkmSparks wrote:
| lots of people are heroin dependent, the number of people
| hooked doesn't make it right.
|
| At the very least you are going to need a better argument
| than that following the recent data dump.
| rawoke083600 wrote:
| Never underestimate software that is 'just good enough'
| johnwheeler wrote:
| much value was gained!
| finolex1 wrote:
| I get that this is in jest, but a lot of people rely on Whatsapp
| and FB Messenger for messaging.
| erdos4d wrote:
| I certainly do and I dream of the day that everyone I message
| switches, so I can too.
| heywherelogingo wrote:
| Why not lead the way?
| dekerta wrote:
| There are plenty of ways to communicate with friends and
| family. If Facebook is down long enough, many people will
| just move to something else. (And I hope they do)
| ekianjo wrote:
| Making poor choices seems to be the curse of humankind.
| bborud wrote:
| Maybe they shouldn't.
| paul7986 wrote:
| They relied on AOL Instant Messenger too...
| riffic wrote:
| they shouldn't.
| simlevesque wrote:
| Instagram messaging is also very popular, at least around me.
| m-chrzan wrote:
| A lot of people, out of habit, rely on high fructose corn
| syrup for calories.
| drcongo wrote:
| Quite a lot of people rely on heroin to get through the day
| too.
| parthdesai wrote:
| You do know that whatsapp is literally used by small
| businesses in 3rd world to conduct....business right?
| AkshatM wrote:
| It's a little irksome how other commenters are quick to
| dismiss this very _valid_ point. SMBs in Asia aren't
| using WhatsApp because they've forced the platform on
| their consumers; it's their consumers who are using
| WhatsApp who've forced a choice on the SMBs. WhatsApp has
| very wide consumer penetration, and its use by businesses
| is meant as a convenience wrapper for customers.
|
| Now, does switching from WhatsApp to some other not-very-
| widely-used platform cause customer engagement /
| retention to drop? I would wager very much so! It's a
| matter of priorities - people go where there is least
| friction, and WhatsApp otherwise provides a seamless
| friction-less experience.
| parthdesai wrote:
| It's a very first-world-centric point of view. I doubt some
| of these commenters claiming whatsapp being down is good
| for society have ever been outside of the first world
| and seen how it actually helps people in need.
| Symbiote wrote:
| But doesn't that mean it will be easy for the SMBs to
| move to any replacement service?
| AkshatM wrote:
| At the cost of losing customers, is my point :)
| qwertox wrote:
| Uff, I see no reason to smile about it.
| cwkoss wrote:
| Maybe these businesses will diversify their communication
| mediums because WhatsApp is down - seems like a good
| thing for society.
| parthdesai wrote:
| Do you even know who these business owners are and what
| kind of life do they live? These are the guys that don't
| have a solid roof over their head, struggle to meet their
| daily needs and might have to sleep hungry if their day's
| sales weren't good. Diversifying is the least of the
| things they have to worry about. Whatsapp allowed them to
| reduce friction when it comes to communicating with
| customers, it helps their sales.
|
| What might be a good thing for society in the first world
| isn't necessarily a good thing for society in
| the third world.
| cwkoss wrote:
| I reject this logic - it's an argument for sustaining the
| status quo at all costs.
|
| Facebook is the most user-hostile tech megacorp, and they
| will inevitably harm these businesses you care about. The
| sooner the bandaid is ripped off the better.
| AkshatM wrote:
| I mean, sure, status quo can / should be changed - but
| you want to get to a point where a _changed_ status quo
| is sustainable, and you're not going to get there by
| simply removing existing options. It doesn't change the
| incentives people have for preferring to use the
| platform, namely the pre-existing widespread penetration.
|
| You want to dislodge Facebook, you need to disrupt it /
| curtail its monopoly.
| cmorgan31 wrote:
| You need a contingency plan for when vendors go down even
| in 3rd world countries. It just so happens a lot of us
| would not mind this vendor failing entirely. It's
| unfortunate that we have so little choice in the matter
| but ultimately the same advice holds true for all of us
| smugly throwing insults while keeping our billing in AWS.
| toomanybeersies wrote:
| At the start of this year I started working for an
| employment service company that covers the Indo-Asia-
| Pacific and South American markets.
|
| I was amazed to discover how pervasive Facebook, Inc. has
| become in the developing world for conducting business
| and navigating everyday life.
|
| For a lot of people in developing nations such as the
| Philippines and Indonesia, Facebook is synonymous with
| the internet. This has been buoyed by their push to
| bundle uncapped/free data for Facebook with mobile plans
| in markets with high growth of mobile internet access.
|
| It's interesting, because I'm always reading articles
| about how "Western teens aren't using Facebook any more",
| which is true, but it's also irrelevant, because they're
| not really a profitable market, teenagers have short
| attention spans and no money. Facebook's growth strategy
| is to become the one stop shop (in lower income nations)
| for everything you want and need.
| dustinmoris wrote:
| You do know that <insert-extremely-damaging-thing> is
| literally used by small businesses in 3rd world to
| conduct....business right?
| rubyist5eva wrote:
| Facebook doesn't care.
| luaybs wrote:
| Not to mention all the small businesses that rely on
| Instagram too. Here it's used as an e-commerce platform.
| aaomidi wrote:
| Have you considered that any change done is going to mean
| winners and losers.
|
| If Facebook permanently goes down then those businesses
| would move to a different platform.
|
| Would it suck? Probably. Would the world be a better
| place without Facebook? A ton of people think so. Me
| included.
|
| This is the same argument people have used when we talk
| about health insurance in the US being scammy. If we ever
| decided to address it it means a good chunk of people
| lose their jobs but also means that the health of this
| country goes up. Which one is more important?
| mitigating wrote:
| But people moving from Facebook to another social media
| or messaging platform is just changing the company. That
| new company could do whatever things you don't like that
| Facebook is doing. Your example seems to mean that we
| move to another healthcare system as in method of
| implementation not just moving from one company to
| another.
| grey_earthling wrote:
| > But people moving from Facebook to another social media
| or messaging platform is just changing the company.
|
| This is not necessarily true. There are social networks
| and messaging systems implemented as open protocols.
| drcongo wrote:
| Maybe that was a bit of a....mistake?
| oblio wrote:
| And the alternative is... ?
| celsoazevedo wrote:
| Email, SMS, good ol' phone calls, Signal, <insert local
| app/platform here>, your own website, etc, on top of
| whatever you use right now.
|
| If you're in a country that relies a lot on Facebook or
| Whatsapp, that's where the main focus will be, but at
| least try to have alternatives just in case something
| goes wrong.
| Spivak wrote:
| So 4/4 of those are platforms controlled by a single
| company or a few large corporations. This really isn't a
| win in any meaningful sense.
|
| It should be fine for huge corporations to exist and
| provide services really efficiently at scale while also
| being forced to play nice and respond to the will of the
| people they serve.
|
| If we collectively can't stop Facebook from doing bad
| thing and being bad stewards to their own platform then
| you won't be able to stop whatever would replace them
| either.
| drcongo wrote:
| It's quite possible to run a business without WhatsApp.
| Lots of businesses have been doing it for quite a long
| time.
| golergka wrote:
| It was a mistake to communicate with the users on a
| platform that they use? Instead of trying to get them on
| signal, losing 90% of leads in the process and making
| each of your sales cost 10x as much?
| CodeGlitch wrote:
| Unfortunately they are about to be taught a hard lesson
| in what "free" really means.
| ceejayoz wrote:
| That'll depend on the length of the outage, other tasks
| they can do during it, and the uptime and market
| penetration of any competing services.
|
| I don't think much of a lesson is going to occur here.
| It'll be a brief blip that impacts few meaningfully.
| justapassenger wrote:
| Big tech free services have WAY better uptime than
| commercial alternatives.
| oehtXRwMkIs wrote:
| That's not what they meant by free.
| jamal-kumar wrote:
| I've been doing a pretty good job of moving my client's
| communications to Signal out here.
|
| I feel bad for everyone who relies on whatsapp bots for
| making stuff happen, though. These are getting really
| common out here for a lot of things and it always worries
| me that it's such a linchpin. They're really handy and
| save a lot of bullshit phone calls from having to be
| something people deal with for simple stuff like pharmacy
| delivery. I can get food from the local place down the
| street that's only really open for lunch and totally off
| the map for uber eats, for example... if this persists a
| few more hours those mom and pop type shops aren't going
| to have as great a day.
| walrus01 wrote:
| Maybe an event like this will spur some people into...
| _not doing that_? Yes I'm aware of the ubiquitous nature
| of whatsapp in many developing nations. Have also
| successfully got a lot of people moved onto using Signal
| for anything they care about.
| Sahbak wrote:
| Signal has and will go down just like facebook.
| Cloudflare/aws having issues affects an insanely high
| percentage of the internet. People still use them.
| Outages rarely cause anything, they happen, people move
| on.
| turtlebits wrote:
| Don't businesses fall back to SMS/phone or e-mail?
| Doesn't seem like a good idea to rely on a single
| corporation.
| [deleted]
| el-salvador wrote:
| El Salvador basically runs on WhatsApp. From the small
| food stall to CEOs and maybe even government.
| Symbiote wrote:
| > The country where I live
|
| With your username, I think you can risk naming the
| country without any additional loss of privacy.
| el-salvador wrote:
| Edited :)
| ivanmontillam wrote:
| In Latin American 3rd world countries, people also
| conduct business via Instagram.
|
| They create Instagram accounts and post products as
| posts, with a caption of "DM me for price".
|
| It also turns on every alarm on my mind, when they start
| calling these "Instagram pages". It blurs the line
| between a real website and an Instagram account (In
| Spanish, "website" is "pagina web" as well).
|
| I've also heard: "My business went to hell because
| Instagram killed my account" and that's when I reply:
| "Have you ever thought of owning a real website?"
| fortran77 wrote:
| He's a HN 10xer. He doesn't care about anyone outside his
| Palo Alto cold-press-Koffee-Klatch, despite what he
| virtue signals. It's amusing seeing people here trip over
| each other to say some variety of "I don't use Facebook."
| danielovichdk wrote:
| You made my day. Thank you
| jollybean wrote:
| My mother uses FB/Messenger to talk to her children and
| grandchildren.
|
| My extended family uses FB to share info about events.
|
| This, and other pedantic activities are really common
| around the world.
|
| Don't reduce the material reality of a situation to a meme
| that represents a personalized view.
| slivanes wrote:
| These things didn't start because FB was invented.
| jollybean wrote:
| They did.
|
| My family didn't share online before FB.
|
| My mother didn't really have a common means to
| communicate with her grandchildren in the same way.
|
| Email, phone are just not the same.
|
| There are more channels available now for sure, but none
| so ubiquitous.
|
| Facetime is not displacing FB for a lot of things, but
| that's more direct.
|
| 'Everyone is on FB' is the reason it still holds in these
| kinds of use cases.
|
| None of us care one way or the other about the platform,
| we'll just use what's convenient, but that is what it is.
|
| This is a very common theme among FB users. FB by the
| way, is still growing its userbase, and growing revenues
| even more so. The themes we see here on HN and even in
| the news don't represent the views among the population,
| nor are they necessarily very close to material reality.
| i_like_apis wrote:
| and a lot of people are addicted to nicotine
| [deleted]
| zemo wrote:
| I know you think this is some sort of neutral comment about
| personal choice, but it isn't. Millions of underserved
| people all over the world live in Food Deserts
| (https://en.wikipedia.org/wiki/Food_desert), places with
| little to no access to affordable nutritious food. Those
| people wind up consuming a large portion of their calories
| from high fructose corn syrup, not because they have chosen
| to do so, but because they have no choice, and that is
| their only option. Whether you want to accept it or not,
| your comment is classist and makes HN a more hostile place.
| wrycoder wrote:
| People don't eat straight corn syrup. The products they
| do eat that contain it are quite expensive per calorie.
| I.e. Coke.
|
| The problem is initiative and knowledge. They should walk
| or ride a couple of miles and buy the biggest bags of
| rice and beans they can, along with a bottle of
| multivitamins. And then learn how to cook.
|
| If that's classist, then the classes are structured by
| knowledge and choices. Which they may well be.
| zemo wrote:
| The entire reason that high fructose corn syrup is so
| prevalent in low-cost foods is that it's cheaper than
| sugar, especially in the US because of corn subsidies.
| Find literally any evidence that HFCS is more expensive
| per-calorie than sugar and you will come up empty-handed.
|
| > If that's classist, then the classes are structured by
| knowledge and choices. Which they may well be.
|
| class by its definition accounts for massive difference
| in access to resources. If you think access to resources
| doesn't measurably change the level of knowledge that a
| population has, that's a declaration that resources do
| nothing, which would be an odd stance to take on a
| knowledge-focused community website.
|
| > They should walk or ride a couple of miles and buy the
| biggest bags of rice and beans they can, along with a
| bottle of multivitamins.
|
| I just LOVE the subtle food choice of rice and beans
| here, paired with the recommendation to take
| multivitamins, a recommendation that is supported by
| little to no evidence. Your own lack of knowledge on this
| topic is in full display, as is a clear demonstration of
| your own biases across multiple dimensions.
| jb1991 wrote:
| Here in Europe, WhatsApp actually powers many neighborhood
| watch groups, and so when it goes down, basically a formal
| crime reporting system also goes down.
| sneak wrote:
| This also means that you can't participate in a
| neighborhood without agreeing to a legal contract with
| Facebook to use their services, as well as submitting to
| ad surveillance and tracking.
|
| That's a dick move by the neighborhood.
| belter wrote:
| Neighbors watching Neighbors and reporting via
| WhatsApp...sounds like the Netherlands.
|
| I think if it stays down for a few more days
| cannibalism will ensue by the end of the week.
| mitigating wrote:
| How is that a good comparison? Not everyone uses Facebook
| out of habit, some businesses need it, and it can be used
| for good things as well as bad because it's just a medium
| in which people post content
|
| Yes how that content is presented, ranked, etc is
| controlled by Facebook but that contribution is less than
| the content itself.
|
| It would be better to say it's the spoon in which someone
| could eat a sugary cereal or something healthy.
| finfinfin wrote:
| Are you a Facebook employee? Your justification sounds a
| lot like the internal propaganda that is being fed to
| employees. "Facebook is net positive", "it's just a
| tool", etc
| Qi_ wrote:
| The argument was that Facebook is neutral as a platform.
| Similar to the internet, it serves all kinds of content.
| Some of the content is good, and some is bad. That
| doesn't necessarily mean the platform is good or bad.
| ric2b wrote:
| Facebook is not a neutral platform. It has a lot of
| moderation and algorithmic ranking of posts.
| solmag wrote:
| It is a good start.
| belter wrote:
| I felt a great disturbance in the Force, as if millions of
| voices suddenly cried out in terror and were suddenly
| silenced....Finally!
| lucidbee wrote:
| The timing of this is so rich in irony I can't help but wonder if
| there is an element of internal sabotage. How many FB employees
| hate FB right now? The latest expose of FB is both effective and
| truly awful. I can't imagine feeling good about a FB job. And
| it's gotten worse! Now they look like they can't even keep their
| websites up.
| greeklish wrote:
| Can we really ever know? There are millions of $ at stake!
| ttobbaybbob wrote:
| Perhaps we'll find out. As fun as internal sabotage would be,
| schadenfreude-wise, i think it much more likely this will turn
| out to be a time when Hanlon's Razor applies
| someonehere wrote:
| When I worked there they were all about open source projects to
| build it themselves and control the service. Well, when your
| whole company is run on one DNS service this is going to bite you
| in the butt.
|
| I only know of a handful of SaaS apps they didn't build
| internally. Sadly none of those will help them get out of this
| situation.
| scumcity wrote:
| Hrm, BGP and DNS. It's weird when decades-old technology somehow
| fails like this. The main reason distributed systems are hard is
| because of the time component. Whenever you add timeouts to an
| algorithm, everything becomes orders of magnitude more difficult
| to reason about, as the number of states grows without bound. In
| any case, this is an epic outage and sad.
| ionwake wrote:
| Did I just read that the Facebook IRC fallback went down too?!? I
| was about to say what's wrong with freenode ( but yeah on 2nd
| thoughts let's not talk about freenode )
| baalimago wrote:
| Aren't there places around poorer countries where Facebook is
| basically an ISP? What about them? They have literally 0 info.
|
| https://tcrn.ch/3kOHco1
| baby wrote:
| Wow. I can't remember the last time WhatsApp was down. I pretty
| much use messenger/instagram/whatsapp to talk to most of my
| friends and family. I'm happy that I do use other platforms
| otherwise I would be completely cut off from my parents right
| now.
| LuisMondragon wrote:
| Facebook employees unable to enter buildings this morning to
| begin to evaluate extent of outage because their badges weren't
| working to access doors.
|
| https://twitter.com/sheeraf/status/1445099150316503057
| nwatson wrote:
| I guess the "prophets" at Victory Channel / Flashpoint called
| down Holy Fire on the Facebook infrastructure in retribution ...
| https://youtu.be/FbSkFuvqFdA?t=1127 . (I'm an Evangelical
| Christian but those folks are nuts ... Mario Murillo, Lance
| Wallnau, Hank Kunneman, Gene Bailey, etc.)
| midnightdiesel wrote:
| Hopefully it never comes back up!
| freediver wrote:
| Terrible day for many people. Both working for Facebook and those
| depending on their services.
| jacke wrote:
| They made their own BGP tools and it looks like they failed
| https://www.youtube.com/watch?v=wHfYUbKNEyc
| taftster wrote:
| What I think is interesting is the effect of this type of thing
| across peripheral news sites, like HN. I wonder how much of a spike HN
| gets with people rushing here to find out what's going on and to
| read the (articulate) related discussions.
| johnbaker92 wrote:
| Let this be permanent - not a huge loss for humanity.
| fartingflamingo wrote:
| The whistleblower said she wanted to fix Facebook.
|
| Mission accomplished, I'd say. For now at least.
| polynomial wrote:
| "It's always DNS."
| baalimago wrote:
| Aren't there places around poorer countries where Facebook is
| basically an ISP? What about them?
|
| https://tcrn.ch/3kOHco1 2Africa cable, as an example
| rvz wrote:
| From [0]
|
| > ...there is no limit to the scandals, leaks, whistleblowers,
| lawsuits or penalties that will bring the Facebook mafia down.
|
| Fine. 'Literally' bringing the Facebook mafia down like that
| would do.
|
| But only for now.
|
| [0] https://news.ycombinator.com/item?id=28742179
| lewich wrote:
| Hopefully permanently.
| aaomidi wrote:
| Yeah let's keep it that way.
| jose-cl wrote:
| In this context, I remember the YouTube+Pakistan issue [1]. I also
| wonder how an AS/BGP manager does his/her job... I imagine a
| guy/girl changing a text file in an old console. Does anyone know?
|
| [1] https://www.infoworld.com/article/2648947/youtube-outage-
| und...
| Bwild wrote:
| They must have shut it off and turned it back on
| shkkmo wrote:
| I keep getting non-DNS errors from Hacker News as well. There
| appears to be some sort of broader incident happening?
|
| It's not just lag, I keep getting the "We're having some trouble
| serving your request. Sorry!" page.
|
| Edit: HN related thread
| https://news.ycombinator.com/item?id=28749476
| jbverschoor wrote:
| It's funny that hackernews is now overloaded with distracted
| people ;-)
| milankragujevic wrote:
| Also Speedtest.net for me is showing a 503 error page. Seems a
| large CDN might be having problems. Their status page shows all
| green. FB and their other sites are also down.
|
| edit: I see it's back up and I've been getting downvoted, here's
| a screenshot of the error for clarity
|
| https://i.imgur.com/wvhOwwL.png
| chki wrote:
| If Facebook and WhatsApp and Instagram fail there are probably
| a lot of people checking whether their Internet works. That
| might be why Speedtest was overwhelmed.
| thedudeabides5 wrote:
| Checked isitdownrightnow.com and it said Netflix was also down. Any
| chance these are related?
| robjan wrote:
| Netflix is up
| synaesthesisx wrote:
| It's quite a coincidence for this to coincide with the
| whistleblower report + rumors of Peter Thiel (perhaps via
| Palantir?) involved in leveraging FB for the 2022 midterm
| elections.
|
| I'm not suggesting that this is the case, but a failure of this
| scale (with internal systems also down) could allow scrubbing of
| evidence without leaving traces.
| wejick wrote:
| Would be very interesting if they release the RCA to the public
| cheese_van wrote:
| Sir, I don't care who you are, you must open a ticket.
| durnygbur wrote:
| Rajeesh FFS, get off HN! We have the world on fire!
| nabeards wrote:
| Seems to be affecting all Facebook properties.
| mrweasel wrote:
| Instagram just returns a 503. Crazy how closely everything
| seems to be integrated.
|
| I'd guess internal networking issues, but it's insane that
| something can bring down all of Facebook's properties.
| keithnoizu wrote:
| some poor engineer is sobbing over a split brain mnesia cluster
| right now praying to get the thing back up.
| [deleted]
___________________________________________________________________
(page generated 2021-10-04 23:00 UTC)