[HN Gopher] Slack is down
___________________________________________________________________
Slack is down
Author : gpmcadam
Score : 830 points
Date : 2021-01-04 15:17 UTC (7 hours ago)
(HTM) web link (status.slack.com)
(TXT) w3m dump (status.slack.com)
| tmsh wrote:
| My take last time: https://news.ycombinator.com/item?id=24690175
| anonymou2 wrote:
| Everyone back to IRC!
| betaby wrote:
| Relevant https://blog.apnic.net/2020/09/21/why-irc-is-still-
| good-in-c...
| clashmeifyoucan wrote:
| Seems like Notion has a service interruption going on as well:
| https://status.notion.so/
|
| Potentially related?
| koenigdavidmj wrote:
| Other data points:
|
| * iMessage was taking its sweet time sending a few texts this
| morning.
|
| * I had momentary trouble trying to call a business from my
| Verizon phone, and someone I know had trouble calling from AT&T.
|
| Could just be a coincidence, but I wonder if something larger-
| scale is happening.
| lm2s wrote:
| Yes, I'm wondering that also.
|
| Todoist was having issues and iOS app launching from Xcode
| started taking a lot of time in the middle of the day (which
| reminds me of the app online check fiasco not so long ago).
|
| Even HN seems to be a bit slower.
| rntrg wrote:
| I came here wondering the same thing as I've encountered a
| handful of availability issues this morning.
|
| * Todoist MacOS app is having trouble talking to its API
| mam2 wrote:
| Why would you use cloud for that ? We have mattermost at my
| startup
| [deleted]
| fermienrico wrote:
| Slack/Notion. Stop with the features already. You're killing
| yourself in slow motion. Focus on performance, robustness of your
| infrastructure.
| bengale wrote:
| Not a great first day back for their ops team.
| rexreed wrote:
| I'm migrating to Rocket.chat on Digital Ocean as we speak. Has
| anyone else made the move or tested Rocket.chat?
| why5s wrote:
| My old company used a mix of Slack and RocketChat.
| Functionally, it's fine but I was never a big fan of the UI and
| how attachments were handled. Also, cross-channel search was
| kinda bad. Mind you, this was well over a year ago so I'm sure
| things have improved.
| [deleted]
| 2mol wrote:
| Zoom as well, no? Beautiful to see the whole team grinding to a
| halt. Or maybe everybody finally is getting some time for deep
| focused work.
| lol768 wrote:
| These events seem to be happening almost on a monthly basis now.
| IRC was never this unreliable and at least with netsplits it was
| obvious what had happened because you'd see the clients
| disconnect.
|
| IME messages just fail to send with Slack, then you can retry but
| they're not properly idempotent and you end up sending the
| messages twice.
|
| It's really poor.
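|
| A minimal sketch of what "properly idempotent" retries could look
| like, assuming the client attaches one key per logical message and
| the server deduplicates on it. MessageStore and send_with_retry are
| hypothetical names for illustration, not Slack's API.
|
```python
import uuid


class MessageStore:
    """Toy server-side store that deduplicates on a client-supplied key."""

    def __init__(self):
        self._by_key = {}   # idempotency key -> stored message
        self.messages = []  # messages in delivery order

    def send(self, idempotency_key, text):
        # A retry that reuses the same key gets the original message back
        # instead of appending a duplicate.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        message = {"id": len(self.messages) + 1, "text": text}
        self._by_key[idempotency_key] = message
        self.messages.append(message)
        return message


def send_with_retry(store, text, attempts=3):
    # The key is generated once per logical message, not once per attempt.
    # The loop simulates a client that never saw the acknowledgement and
    # therefore resends the same message several times.
    key = str(uuid.uuid4())
    result = None
    for _ in range(attempts):
        result = store.send(key, text)
    return result
```
|
| With this shape, three attempts at the same message leave exactly
| one entry in store.messages.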
| coldcode wrote:
| My feeling is this is an AWS issue. Our services hosted in AWS
| are not working either.
| tcgv wrote:
| Downdetector indicates a possible correlation between Slack
| issues and AWS reports, even though Slack peaks at 13960
| problems and AWS at 111:
|
| - https://downdetector.com/status/slack/
|
| - https://downdetector.com/status/aws-amazon-web-services/
| worldsayshi wrote:
| Most people that are affected by AWS outages wouldn't
| report it as such...
| the_duke wrote:
| What issues are you seeing on AWS?
|
| The dashboards are all green. (which doesn't mean that much
| ... I'm aware...)
| tedmiston wrote:
| Do you have more info like services, regions etc? I see all
| green checks on the AWS Status page.
| floatingatoll wrote:
| Down detector shows quite a lot of issues across a broad
| spectrum of services, including AWS _and_ Google.
| coldcode wrote:
| Our prod systems seem to be working, but our lower
| environments seem not to be working. I don't know enough
| about where these things come from. I wonder if the real
| problem is regional. Some connections work and some don't.
| dijit wrote:
| > I see all green checks on the AWS Status page.
|
| I'm sure you know this already, but that status page isn't
| worth the cycles on your CPU, you would be better served
| asking the toaster if AWS is functioning properly than
| checking that status page.
| tedmiston wrote:
| Of course, yeah, but at least you can sometimes see a
| yellow and infer it really means red :/.
| WrtCdEvrydy wrote:
| yellow requires an issue that every customer is aware of
| and red requires a thermonuclear strike.
| yreg wrote:
| If one's smart toaster depends on AWS one might very well
| do that.
| baytrailcat wrote:
| I thought Slack was not on AWS, but Oracle.
| zwily wrote:
| You may be thinking of Zoom, who signed a massive Oracle
| contract.
| bananaoomarang wrote:
| I have also been having intermittent issues with Twitter this
| morning (can't load tweets etc) and was wondering if it
| was connected.
| slezyr wrote:
| Naaah, Twitter always fails to load for me. It's more
| surprising if it loads from the first attempt.
| derwiki wrote:
| Which AZ?
| daze42 wrote:
| Availability zones are unique for each account. So my zone
| A could be your zone C, for example.
| boie0025 wrote:
| I never knew this, but I think it makes sense. Is there
| any documentation that explains why this is the case? I
| suspect it is to distribute bias to the first option, but
| I'd love to read about it.
|
| [edit] Never mind, I just needed the right combination of
| terms to find it:
| https://docs.aws.amazon.com/ram/latest/userguide/working-
| wit...
| deepakhj wrote:
| This is so everyone doesn't launch in one zone, "us-
| east-1a".
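|
| A toy illustration of the idea above: each account gets its own
| mapping from zone names to physical zones, so "a" does not mean the
| same place for everyone. The zone IDs and the hash-seeded shuffle are
| assumptions made for this sketch; it is not how AWS actually assigns
| the mapping.
|
```python
import hashlib
import random

# Hypothetical physical zone identifiers, for illustration only.
PHYSICAL_ZONES = ["zone-1", "zone-2", "zone-3"]
LETTERS = "abc"


def az_names_for_account(account_id, region="us-east-1"):
    """Return a stable, per-account mapping of AZ name -> physical zone."""
    # Seed a private RNG from the account ID so the shuffle is repeatable.
    digest = hashlib.sha256(account_id.encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    zones = PHYSICAL_ZONES[:]
    random.Random(seed).shuffle(zones)
    return {f"{region}{letter}": zone for letter, zone in zip(LETTERS, zones)}
```
|
| Calling az_names_for_account with two different account IDs will
| often (not always) give different orders, which spreads "just pick
| the first zone" launches across the physical zones.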
| derwiki wrote:
| Woah, thanks for clarifying--I had no idea!
| karmicthreat wrote:
| EFnet was always splitting every few hours. I don't really miss
| IRC compared to modern chat systems.
| dkdk8283 wrote:
| I spent a few hours setting up a chat client for a reason.
| Slack takes all this away from me.
| pmlnr wrote:
| I do miss them, terribly. Lightweight, fast, brutally simple.
| Even with splits, it was better, and ever since IRC bouncers
| exist, like ZNC, they are rock solid.
| randylahey wrote:
| I'm the opposite. Back in my early teens, friends and I
| would attempt to hijack opposing groups' channels via
| takeovers during net-splits (and of course having the same
| done to us). What a time to be alive.
| stormcode wrote:
| Oh yeah, those were the days. Causing server splits to get
| your nick back that was stolen in a previous server
| split...
| libraryatnight wrote:
| In the early battle.net days competing clans would split
| and steal channels. It was tons of fun. Taught me lots
| about bots, proxies, simple scripting, in the process too.
| bdcravens wrote:
| Our company uses Cliq. I wouldn't say that it's as good as
| Slack, but it's probably 80-90%, and even has a few unique
| features (integration into Zoho's suite, remote work checkin,
| integrated bot development environment, etc)
| 013a wrote:
| It's especially strange when you think about how unoriginal
| Slack's product domain is, and how comparable, and in some
| cases small, their userbase is.
|
| * iMessage, which likely handles something in the range of
| 750M-1B monthly actives.
|
| * WhatsApp, 2B users [1], though no clarity on "active" users.
|
| * Telegram, 400M monthly actives [2]
|
| * Discord, 100M monthly actives [3]
|
| * Slack, 12M daily actives [4]
|
| * Teams, which is certainly more popular than Slack, but I
| shudder to list it because its stability may actually be worse.
|
| The old piece of wisdom that "real-time chat is hard" is
| something I've always taken at face-value as being true,
| because it _is_ hard, but some of the most stable, highest
| scale services I've ever interfaced with are chat services.
| iMessage NEVER goes down. I have to conclude that Slack's
| unacceptable instability, even relative to more static services
| like Jira, is less the product of the difficulty of their
| product domain, and more so something far deeper and more
| unfixable.
|
| I would not assume that this will improve after they are fully
| integrated with Salesforce. If your company is on Slack, it's
| time to investigate an alternative, and I'm fearful of the fact
| that there are very few strong ones in the enterprise world.
|
| [1] https://blog.whatsapp.com/two-billion-users-connecting-
| the-w...
|
| [2] https://techcrunch.com/2020/04/24/telegram-
| hits-400-million-...
|
| [3] https://wersm.com/discord-reaches-100m-monthly-active-
| users-...
|
| [4] https://www.cnbc.com/2019/10/10/slack-says-it-
| crossed-12-mil... (this was also announced on Slack's blog, but
| that's down).
| jjice wrote:
| I didn't realize that Discord has way more active users than
| Slack. I'm glad, Discord is a fantastic service in my
| experience. It's a shame they got shoehorned into a mostly
| gaming-oriented service. I've never had a class or worked
| somewhere where Discord was a considered solution instead of
| Slack, but I can't think of anything that Slack does better
| (in my experience). In general, I think Discord has the best
| audio and video service that I've used, especially kicking
| Zoom to the curb.
| gruez wrote:
| >I didn't realize that Discord has way more active users
| than Slack
|
| Keep in mind you're comparing daily active users vs monthly
| active users. I'd guess most slack users are online on weekdays
| for pretty much the entire day (because it's for work and
| your boss expects you to be online), whereas a good chunk
| of discord users are only logging in a few hours a week
| when they're gaming.
| 013a wrote:
| At 12:00pm EDT on a workday:
|
| * Minecraft official server: 190k online users.
| * Fortnite official server: 180k online users.
| * Valorant official server: 170k online users.
| * Jet's Dream World (community): 130k online users.
| * CallMeCarson server (YouTuber): 100k online users.
| * Call of Duty official server: 90k online users.
| * Rust (the game) official discord: 80k online users.
| * League of Legends official server: 60k online users.
| * Among Us official server: 50k online users.
|
| Their scale is insane. Even with their usage spiking
| during after-hours gaming in major countries, their
| baseline usage at every hour of the day, globally, makes
| it one of the most used web services ever created.
|
| Slack's DAU and MAU numbers are probably pretty close to
| one-another. Discord's MAU/DAU ratio is probably bigger
| than Slack's. That just means that Discord is, again,
| solving a harder problem; they have much bigger (and more
| unpredictable) spikes in usage than Slack. Yet, it's a far
| more stable and pleasant product.
| jhgg wrote:
| Our secret sauce is Elixir/BEAM and Rust :)
|
| Well for the real time side, I can't tell you how big a
| boon it's been to build our platform on top of
| Elixir/BEAM. Hands down the best runtime / VM for the job
| - and a big big secret to our success. Where we couldn't
| get BEAM fast enough - we lean on rust and embed it into
| the VM via NIFs.
|
| 2021 is the year of rust - with the async ecosystem
| continuing to mature (tokio 1.0 release) we will be
| investing heavily in moving a lot of our workloads from
| Python to Rust - and using Rust in more places, for
| example, as backend data services that sit in front of
| our DBs. We have already piloted this last year for our
| messages data store and have implemented such things as
| concurrency throttles and query coalescing to keep the
| upstream data layer stable. It has helped tremendously
| but we still have a lot of work to do!
|
| To help scale those super large servers, in 2020 we
| invested heavily in making sure our distributed system
| can handle the load.
|
| Did you know that all those mega servers you listed run
| within our distribution on the same hardware and clusters
| as every other discord server - with no special tenancy
| within our distribution? The largest servers are
| scheduled amongst the smallest servers and don't get any
| special treatment. As a server grows - it of course is
| able to consume a larger share of resources within our
| distribution - and automatically transitions to a mode
| built for large servers (we call this "relays"
| internally.) At any hour, over a hundred million BEAM
| processes are concurrently scheduled within our
| distributed system. Each with specific jobs within their
| respective clusters. A process may run your presence,
| websocket connection, session on discord, voice chat
| server, go live stream, your 1:1/group DM call, etc. We
| schedule/reschedule/terminate processes at a rate of a
| few hundred thousand per minute. We are able to scale by
| adding more nodes to each cluster - and processes are
| live migrated to the new nodes. This is an operation we
| perform regularly - and actually is how we deploy updates
| to our real time system.
|
| I was responsible for building and architecting much of
| these systems. It's been super cool to work on - and -
| it's cool to see people acknowledge the scale we now run
| at! Thank you!! It's been a wild ride haha.
|
| As for scale, our last public number perhaps comparable
| to Slack is ~650 billion messages sent in 2020, and a few
| trillion minutes of voice/video chat activity. However
| given the crazy growth that has happened last year due to
| COVID - the daily message send volumes are well over the
| 2 billion/day average.
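|
| The "concurrency throttles and query coalescing" mentioned above can
| be sketched roughly as follows. The comment describes a Rust service;
| this is a Python asyncio illustration of the general technique only,
| and Coalescer, slow_fetch and demo are hypothetical names, not
| Discord's code.
|
```python
import asyncio


class Coalescer:
    """Share one in-flight fetch among all concurrent callers of a key."""

    def __init__(self, fetch, max_concurrency=2):
        self._fetch = fetch
        self._in_flight = {}  # key -> asyncio.Task for the pending fetch
        # Throttle: at most max_concurrency fetches hit the backend at once.
        self._throttle = asyncio.Semaphore(max_concurrency)

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the fetch; later callers
            # that arrive while it is pending await the same task.
            task = asyncio.ensure_future(self._run(key))
            self._in_flight[key] = task
        return await task

    async def _run(self, key):
        try:
            async with self._throttle:
                return await self._fetch(key)
        finally:
            # Forget the task once it finishes so later reads refetch.
            self._in_flight.pop(key, None)


async def demo():
    calls = []

    async def slow_fetch(key):
        # Stand-in for a database query.
        calls.append(key)
        await asyncio.sleep(0.01)
        return f"row-for-{key}"

    coalescer = Coalescer(slow_fetch)
    results = await asyncio.gather(
        *(coalescer.get("channel-1") for _ in range(5))
    )
    return results, calls
```
|
| Running asyncio.run(demo()) issues five concurrent reads for the same
| key, but the stand-in backend is only queried once.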
| kevinqi wrote:
| Assuming this is real - very interesting read, thanks for
| sharing.
| davidwparker wrote:
| We use Discord exclusively at my day job.
|
| We have a few bots we've integrated with things
| (deployment, stats, etc).
|
| We use it for all our voice/video calls.
|
| Edit: We've got roles setup well for things like
| contractors, devs, marketing, etc, so it's easy to lock
| down different conversations in channels.
|
| It's been fantastic.
|
| The only thing I'm not a huge fan of is the (IMO) poor
| implementation of threaded discussions.
|
| Edit: it definitely has issues with connectivity from time-
| to-time too, but not bad overall.
|
| TBH, I'm not sure why companies use Slack (I use it for
| other organizations, so have experience with it too, but
| not extensive).
| pr0zac wrote:
| I really agree Discord is amazing and wish I could use it
| for work instead of Slack.
|
| I think the big things that prevent it from being adopted
| more for professional use is the lack of a threading model
| (even though I hate it when people use threads in Slack)
| and the whole everyone in every channel except for role-
| based privacy settings. The second one especially is a big
| deal because you can't do things like team-only channels
| without a prohibitive amount of overhead.
|
| That said (with zero knowledge of their architecture) I
| have to feel like both of those missing features aren't too
| terribly hard to build. It's very likely Discord is growing
| as a business fast enough on the gaming and community
| spaces they don't feel the added overhead of expanding into
| enterprise (read: support, SLAs, SOC, etc) makes sense and
| are waiting until they need a boost to play that card.
| joshstrange wrote:
| > I think the big things that prevent it from being
| adopted more for professional use is the lack of a
| threading model
|
| They do have a threading model now (if you are talking
| about replying to a message in a channel and having your
| reply clearly show what you are responding to). If you
| are talking about 1-on-1 chats with other people in your
| same server then yes, that is still lacking IMHO in
| discord. The whole "you have to be friends" to start a
| chat (or maybe that's just for a on-the-fly group) is
| annoying.
| 013a wrote:
| Discord gives every user an identity that is persistent
| beyond the server; you have a Discord account, not a
| server account. Slack does the opposite. Enterprises
| would hate Discord's model, as they prefer to control the
| entire identity of every user in their systems, such that
| when they leave the company they can destroy any notion
| of that identity ever existing.
| joshstrange wrote:
| Absolutely agree. I like the 1 main discord account but I
| wish I could have 1 "identity" per-server as well. I
| don't love that I am in some discords that I don't want
| tied to my real name and others where I've known these
| people for over a decade and would see in person multiple
| times a week (before the pandemic). I know you can set
| your name per-server but you can't hide your discord
| username (or make it per-server) which sucks.
| LambdaComplex wrote:
| Agreed completely. Discord has always been much smoother
| for me than Slack, and the voice/video chat quality is
| literally the best I've ever seen anywhere. If they made
| their branding a bit more professional and changed the
| permission model from the (accurate) garbage you
| described to something closer to Slack then I think Slack
| would be doomed.
| 013a wrote:
| Discord is definitely in the same realm of scale as Slack,
| and probably bigger (they publish different metrics, so it's
| hard to say for sure).
|
| The really impressive thing about Discord's scale is the
| size of their subscriber pools in the pub-sub model.
| Discord is slightly different than Slack in the sense that
| every User on a Server receives every message from every
| Channel; you don't opt-in to Channels as in Slack, and you
| can't opt-out (though some channels can be restricted to
| only certain roles within the Server, this is the minority
| of Channels).
|
| Some of the largest Discord servers have over 1 million
| ONLINE users actively receiving messages; this is mostly
| the official servers for major games, like Fortnite,
| Minecraft, and League of Legends.
|
| In other words, while the MAU/DAU counts may be within the
| same order of magnitude, Discord's DAUs are more
| centralized into larger servers, and also tend to be
| members of more servers than an average Slack DAU. It's a
| _far_ harder problem.
|
| The chat rooms are oftentimes unusable, but most of these
| users only lurk. Nonetheless, think about that scale for a
| second; when a user sends a message, it is delivered (very
| quickly!) to a million people. That's insane. Then combine
| that with insanely good, low latency audio, and best-in-
| class stability; Discord is a very impressive product,
| possibly one of the most impressive, and does not get
| nearly enough credit for what they've accomplished.
|
| For comparison; a "Team" in Microsoft Teams (roughly
| equivalent to a Discord Server or Slack Workspace) is still
| limited to 5,000 people.
| gilbetron wrote:
| It isn't just # of users, though - SlackOps is probably
| unique to Slack in that list (minus Teams, I guess) - so # of
| messages per month is a better metric. Not that I'm letting
| Slack off the hook, it still may be that their codebase
| and/or dev process is just nasty.
| gmmeyer wrote:
| Slack and the others have different contractual guarantees
| and different regulatory environments. Comparing them is not
| really fair because the reality is that these other services
| probably just lose tons of messages and slack/teams can't do
| that! They have to have better guarantees.
| johnmaguire2013 wrote:
| IME, Slack is far more likely to lose my message than
| iMessage. I believe that's part of the point being made
| above.
| gmmeyer wrote:
| I've never had slack lose a message when it's up
| JMTQp8lwXL wrote:
| To be fair, IRC doesn't do a lot of things Slack does. Where are
| the logging and audit trails, access control, search, etc.?
| ghostpepper wrote:
| At least when irc goes down you can still access your logs
| welterde wrote:
| Several IRC servers do have support for authentication and
| access control (and audit trails as well I suppose).
|
| Only centralized history/logging and search would need to be
| bolted on if needed. In the non-centralized case your IRC
| client takes care of all of that.
| capitainenemo wrote:
| You might be interested in IRCv3. https://ircv3.net/irc/
| https://ircv3.net/software/servers
| https://ircv3.net/software/clients
| dkdk8283 wrote:
| Lack of logs and history is a feature not a bug.
| JMTQp8lwXL wrote:
| For business users, there are regulatory requirements.
| You need to keep information around for some period of
| time, but not forever. History and searching are useful
| for spreading tribal knowledge throughout an
| organization.
| welterde wrote:
| Does that actually extend to Slack/slack-like things
| though?
|
| Since I would see Slack more as a replacement for phone
| calls or hallway discussions. Neither of which typically
| has any logs or recordings (and I wouldn't want to work
| somewhere that did keep such logs).
| pr0zac wrote:
| It does yes. This is why for example message history data
| export is a paid feature. It's a requirement for certain
| types of compliance.
| welterde wrote:
| In what areas would you find such requirements? And
| shouldn't the default position be that it is illegal to
| keep those logs? Especially those involving direct
| messages between employees.
| spicybright wrote:
| I'm still dreaming of a world where everyone uses IRC through
| an interface identical to Slack or Discord or whatever, and
| features like these are implemented.
| thinkmassive wrote:
| You might appreciate matrix.org / element.io if you haven't
| seen them yet
| djsumdog wrote:
| I set up my own Matrix homeserver recently with several
| bridges to all my current chat services:
|
| https://battlepenguin.com/tech/matrix-one-chat-protocol-
| to-r...
|
| It works fairly well.
| dijit wrote:
| You might like irccloud; it's a web client (similar to
| slack) and bouncer, with support for image uploads, has a
| decent app, preserving history and I think it supports
| search too.
| welterde wrote:
| Not really a fan of the Slack or Discord user interface
| myself, but there are modern looking web clients for IRC
| such as thelounge[0] or kiwiirc[1] that might be what you
| are after.
|
| [0] https://thelounge.chat/ [1] https://kiwiirc.com/
| curryst wrote:
| I agree in principle, but IRC is a poor way to do this. I
| love IRC for its simplicity, but that makes it hard to do
| more advanced features. It's a text-only protocol (other
| than DCC), so if you want to do something like allow users
| to click phone numbers to dial them then you have to regex
| it and hope for the best. Any kind of link is the same way.
| If you want to show images inline, you'll have to search
| for links, then either do another regex to see if the link
| is an image or prefetch the page to see if it's an image.
| Most servers still implement user authentication as a
| secondary service (i.e. it isn't part of the IRC server
| itself) afaik. I think the newer IRC specs include those,
| but support for it is missing in many servers.
|
| Really a huge part of IRC's difficulty and beauty is in not
| having a markup language, but most of that beauty is for
| the eyes of the developer, not the user.
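|
| As a rough illustration of the "regex it and hope for the best"
| approach described above, a client might scan plain-text messages
| for URLs and guess from the file extension whether each is an image.
| The patterns and the classify_links helper are assumptions for this
| sketch and will misjudge some real-world links.
|
```python
import re

# Deliberately simple patterns; real clients need many more cases.
URL_RE = re.compile(r"https?://[^\s<>]+")
IMAGE_EXT_RE = re.compile(r"\.(?:png|jpe?g|gif|webp)$", re.IGNORECASE)


def classify_links(message):
    """Return (url, kind) pairs, where kind is 'image' or 'link'."""
    found = []
    for match in URL_RE.finditer(message):
        # Trailing punctuation is usually part of the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)")
        # Ignore query string and fragment when checking the extension.
        path = url.split("?", 1)[0].split("#", 1)[0]
        kind = "image" if IMAGE_EXT_RE.search(path) else "link"
        found.append((url, kind))
    return found
```
|
| Anything the extension check cannot decide would still need the
| prefetch step mentioned above.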
|
| I like the concept of Matrix. That's kind of what they're
| trying to do by creating an open protocol, but when I
| looked at implementing a client it was non-trivial. For
| IRC, you can usually send someone a telnet log of you
| joining an IRC server and they could implement a client. I
| don't get the impression that that's true for Matrix.
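|
| To make the "telnet log" point concrete, this sketch builds the few
| lines a minimal client sends to register and join a channel, plus the
| PING reply that keeps the connection alive. The commands follow the
| classic IRC client protocol; the helper names, nick and channel are
| hypothetical, and no network connection is made here.
|
```python
def registration_lines(nick, channel, realname="Example User"):
    """Lines a minimal client sends after connecting (CRLF-terminated)."""
    return [
        f"NICK {nick}\r\n",
        f"USER {nick} 0 * :{realname}\r\n",
        # A careful client waits for the server's welcome reply before
        # joining; the line is listed here only to show its shape.
        f"JOIN {channel}\r\n",
    ]


def reply_to(line):
    """Answer a server PING with the matching PONG, else return None."""
    if line.startswith("PING "):
        return "PONG " + line[len("PING "):].rstrip("\r\n") + "\r\n"
    return None
```
|
| Writing these strings to a TCP socket on port 6667 and printing what
| comes back is most of a bare-bones client.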
| Arathorn wrote:
| https://news.ycombinator.com/item?id=20948530 is my
| attempt to demonstrate that implementing a Matrix client
| is almost as trivial as telnetting to port 6667 on an IRC
| server, fwiw :)
| seibelj wrote:
| Down Detector showed a lot of different services suffering
| downtime at the same time https://downdetector.com/
|
| I wonder if it's an AWS region issue
| partiallypro wrote:
| I have stopped using Down Detector as an accurate measure
| because a lot of "outages" are just people having issues with a
| service unrelated to the service they are reporting as down.
| Ex: AT&T outage in Nashville caused people to report Xbox Live
| as down, when it wasn't actually down, etc.
| staticelf wrote:
| I have issues reaching a lot of sites, especially American ones.
| Downdetector, Hacker News and others load extremely slowly
| or not at all. Downdetector had a bunch of failed resources for
| me.
| clashmeifyoucan wrote:
| > We're still investigating the ongoing connectivity issues with
| Slack. There's no additional information to share just yet, but
| we'll follow up in 30 minutes. Thanks for bearing with us.
|
| Seems to be working intermittently, however.
| exabrial wrote:
| we run a copy of this as a super duper backup:
| https://github.com/shazow/ssh-chat
| ibraheemdev wrote:
| > Customers may have trouble loading channels or connecting to
| Slack at this time. Our team is investigating and we will follow
| up with more information as soon as we have it. We apologize for
| any disruption caused.
|
| - Jan 4, 10:14 AM EST
|
| The status for messaging and connection services has been marked
| as [incident]
|
| https://status.slack.com/
| ibraheemdev wrote:
| All services have now been marked as [outages]:
|
| > We're continuing to investigate connection issues for
| customers, and have upgraded the incident on our side to
| reflect an outage in service. All hands are on deck on our end
| to further investigate. We'll be back in a half hour to keep
| you posted.
|
| - Jan 4, 11:20 AM EST
|
| > There are no changes to report as of yet. We're still all
| hands on deck and continuing to dig in on our side. We'll
| continue to share updates every 30 minutes until the incident
| has been downgraded
|
| - Jan 4, 11:52 AM EST
| Tistel wrote:
| I just got an invite to join my company's Google chat.
| lwedel wrote:
| As an alternative, Cisco has Webex. It's less known, but does the
| job: https://status.webex.com/service/status?lang=en_US
| decisionsmatter wrote:
| Just seems slightly disingenuous to me to have "100% uptime" on
| the same page that says there is a current major outage.
| ajakate wrote:
| Especially when it's "uptime for the current quarter." A three
| hour outage since Jan 1 is already a 3% downtime automatically
| I would think...
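|
| The arithmetic behind that estimate, as a small sketch. It assumes
| the quarter started on Jan 1 at 00:00 and that roughly 3.75 days had
| elapsed when the comment was written; both figures are assumptions,
| not numbers taken from the status page.
|
```python
HOURS_PER_DAY = 24


def downtime_percent(outage_hours, elapsed_hours):
    """Share of the elapsed window spent down, as a percentage."""
    return 100.0 * outage_hours / elapsed_hours


# Three hours down out of about 3.75 days of the quarter so far.
so_far = downtime_percent(3, 3.75 * HOURS_PER_DAY)

# The same outage measured against the whole quarter:
# Q1 2021 has 31 + 28 + 31 = 90 days.
full_quarter = downtime_percent(3, 90 * HOURS_PER_DAY)
```
|
| Measured against the quarter so far, that is about 3.3% downtime.
| Measured against the full quarter it is about 0.14%, or roughly
| 99.86% uptime, so neither reading rounds to 100%.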
| dustymcp wrote:
| We have a Discord server as a backup for when Teams is down. It
| seems to have gotten worse, with whole days of downtime, and we
| have to resort to Discord voice, which always seems to be up.
| vehemenz wrote:
| I was seeing issues about 10 minutes before their system status
| page was updated. I'm surprised they don't have automatic
| monitoring of some kind.
| loginatnine wrote:
| Status pages are probably manually updated. You don't want a
| false positive/bug in your monitoring to affect your public
| metrics.
| vehemenz wrote:
| Fair enough. Though I'm not sure how I'd feel about the whole
| world knowing about my service's outage before I do.
| histriosum wrote:
| I'm positive that they have internal monitoring, and
| probably knew about the issues well before they decided to
| manually update their status page to reflect the issue.
| Manually updating the status page does not equal no
| monitoring, after all.
| [deleted]
| kamyarg wrote:
| At least their status page works ¯\_(ツ)_/¯ (Looking at you,
| AWS).
|
| I am really looking forward to a better competitor taking over
| their market share. I presume things will only get worse after
| the Salesforce acquisition.
| pwned1 wrote:
| This site seems to be lagging as well. Or is it just me?
| kulor wrote:
| Witnessed this when Google went down in December. Seems like a
| thundering herd problem with tech folk flocking to HN for
| updates/discussion/gloating
| dang wrote:
| Yes: https://news.ycombinator.com/item?id=25635115
| mrlala wrote:
| Struggling for me as well.. reddit is as well. But I'm finding
| other sites are just fine.
| MattGaiser wrote:
| Not just you. It is slow for me as well (Calgary, Canada).
| ChrisMarshallNY wrote:
| Me too (New York).
| fleaaa wrote:
| Same here (Berlin).
| stefan_ wrote:
| It's struggling. All the people freed up that think chatting in
| Slack is productivity?
| alexiacob wrote:
| Same here (London)
| tannhaeuser wrote:
| Other than it being days of slow news, with top stories
| seemingly pinned for days now and boring, no ;) you know, where
| Slack being down is considered newsworthy (yawn).
| rootusrootus wrote:
| It's quite fast for me.
|
| Edit: Not consistently, I guess. 9 out of 10 times it responds
| instantly, then it lags once in a while.
| MattGaiser wrote:
| It is the replying that is particularly slow for me.
| [deleted]
| Scandiravian wrote:
| Several sites and services are lagging or slower than usual for
| me (facebook messenger, news sites, google)
| easton wrote:
| It isn't just you, but due to Hacker News' right-sized[0]
| infrastructure, you should sign out unless you need to comment.
| That way you hit the caches instead of getting the server to
| make you a new page.
|
| 0: https://news.ycombinator.com/item?id=12911461
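|
| A minimal sketch of why signing out can help, assuming a server that
| caches rendered pages only for anonymous visitors and renders a fresh
| page for every logged-in request. PageServer is a hypothetical
| illustration, not HN's actual code, and it omits cache expiry.
|
```python
class PageServer:
    """Serve cached pages to anonymous users, fresh pages to logged-in ones."""

    def __init__(self, render):
        self._render = render  # callable(path, user) -> page text
        self._cache = {}       # path -> page rendered for anonymous users
        self.renders = 0       # how many times the expensive render ran

    def get(self, path, user=None):
        # Anonymous requests can share one rendered copy of the page.
        if user is None and path in self._cache:
            return self._cache[path]
        # Logged-in pages are personalised, so they are rendered each time.
        self.renders += 1
        page = self._render(path, user)
        if user is None:
            self._cache[path] = page
        return page
```
|
| Under this assumption, a hundred anonymous readers of the front page
| cost one render, while a hundred logged-in readers cost a hundred.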
| btmiller wrote:
| That's the wrong way to look at it. If HN struggles in
| certain situations then it is not right-sized. You don't beg
| of users to walk an unintuitive happy path (i.e. logout when
| not commenting).
| kortilla wrote:
| It is "right-sized" if it happens so rarely that it
| realistically causes no problems.
| btmiller wrote:
| Rare is not never.
| minitoar wrote:
| Is this still true? Asking because that comment from dang is
| 4 years old.
| dagmx wrote:
| In the other threads, there's been some theorizing that some
| central infrastructure is down or struggling. Perhaps AWS or
| the like.
| Scandiravian wrote:
| It could just be an effect of people switching to
| "alternatives" from slack, effectively DDOSing those
| services. Notion just went down as well
| paulcarroty wrote:
| Good reason to try matrix or even setup a reserve (matrix)
| channel.
| smurda wrote:
| #hugops
| ChrisMarshallNY wrote:
| I don't use Slack that much, but I know plenty of people that
| work on teams that are probably at a standstill, right now.
|
| HN is also pretty slow...
| dkarl wrote:
| I think there are a lot of sites that would benefit from
| automatically scaling up whenever Slack goes down.
| CobsterLock wrote:
| I noticed the HN slowdown too. Maybe it isn't only a Slack
| issue.
| MattGaiser wrote:
| My company runs heavily on Slack. Part of my team got together
| in a video chat, but I have no idea what happened to everyone
| else in the company.
| Deukhoofd wrote:
| HN becomes slow because people notice a service is down, and go
| to HN to check for more info. When Google was down for an hour
| a couple weeks ago, HN became almost unusable.
| beamatronic wrote:
| I'm going to get so much work done today
| [deleted]
| xkeysc0re wrote:
| Looks like my vacation continues!
| blntechie wrote:
| I have always had this fantasy of what happens when one of these
| major services never comes back online after an outage, i.e.
| in this outage Slack loses the info for all its accounts, users,
| messages etc.
|
| How would people react? What would engineers do to recover? I
| always found that idea fascinating.
|
| Imagine Google saying tomorrow that they lost all the accounts
| and emails. What kind of impact would that have on the world?
| JadoJodo wrote:
| My tangential thought in that regard is what if this is a
| really bad outage that causes Slack to tank (i.e. A large
| number of companies switch to Microsoft, Zulip, etc). Equally
| interesting a thought.
| codingdave wrote:
| That scenario is what Disaster Recovery plans are for. Every
| large company I've worked for has had recovery plans in place,
| including scenarios as disturbing as "All data centers and
| offices explode simultaneously, and all staff who know how it
| all works are killed in the blasts."
|
| You not only have backups in place, you have documentation in
| place, including a back-up vendor who has copies of the
| documentation and can staff up workers to get it up and running
| again without any help from existing staff.
|
| And we tested those scenarios. I'm not sure which dry runs were
| less fun - when you got paged at 3 AM to go to the DR site and
| restore the entire infrastructure from scratch... or when you
| got paged at 3 AM and were instructed to stay home and not
| communicate with anyone for 24 hours to prove it can be done
| without you. (OK, so staying home was definitely more fun, but
| disturbing.)
| blntechie wrote:
| Those are some really good thoughts on DR planning. I had never
| thought of DR going to such an extent.
|
| How many companies really plan for an event where their
| entire infrastructure goes offline and their entire team gets
| killed? Do even companies like Google plan for this kind of
| event?
| noir_lord wrote:
| The last company I worked for where I was (de facto) in
| charge of IT (small company so I wore lots of hats) could
| have recovered if both sites burnt down and I got hit by a
| bus since I made sure that all code, data and instructions
| to re-up everything existed off site, and that both of the
| most senior managers understood how to access everything, at
| least well enough to hand it to a competent firm with a
| memory stick and a password.
|
| In some ways losing your ERP and its backups would be
| harder to recover from than both sites burning down,
| insurance would cover that at least.
| jacques_chester wrote:
| Yes, Google plans extensively and runs regular drills.
|
| It's hearsay, but I was once told that achieving "black
| start" capability was a program that took many years and
| about a billion dollars. But they (probably) have it now.
| blntechie wrote:
| So 'black start' is a program to start over from scratch?
| The scale required for it itself would be amazing.
| jcranmer wrote:
| "Black start" is a term that refers to bringing up
| services when literally everything is down.
|
| It's most often referred to in the electricity sector,
| where bringing power up after a major regional blackout
| (think 2003 NE blackout) is extremely nontrivial, since
| the normal steps to turn on a power plant usually
| require power: for example, operating valves in a hydro
| plant or blowers in a coal/gas/oil plant, synchronizing
| your generation with grid frequency, having something to
| consume the power; even operating the relays and circuit
| breakers to connect to the grid may require grid power.
|
| The idea here is presumably that Google services have so
| many mutual dependencies that if everything were to go
| down, restarting would be nontrivial because every
| service would be blocked on starting up due to some other
| service not being available.
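|
| A minimal sketch of that dependency problem, assuming a made-up
| set of services (the names below are hypothetical, not Google's).
| It uses Python's standard graphlib module to compute a start
| order and to report a cycle, which is the case where every
| service is blocked waiting on another.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical services; each maps to the services it needs before it can start.
DEPENDENCIES = {
    "auth": {"storage"},
    "storage": {"network"},
    "network": set(),
    "deploy": {"auth", "storage"},
}


def startup_order(dependencies):
    """Return a start order in which every service comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def find_cycle(dependencies):
    """Return one dependency cycle as a list of service names, or None if there is none."""
    try:
        TopologicalSorter(dependencies).prepare()
    except CycleError as err:
        # graphlib stores the detected cycle as the second element of args.
        return err.args[1]
    return None
```

| If find_cycle() returns a list, no start order exists until one
| of the listed services can come up without the others, which is
| presumably what a black start capability has to provide.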
| twistedpair wrote:
| "black start" for GCP would be something to see. Since
| the global root keys for Cloud KMS are kept on physical
| encrypted keys locked in safes, accessible to only a few
| core personnel, that would be interesting, akin to a
| missile silo launch.
| jacques_chester wrote:
| It would be amazing to see. But I hope we never have to.
| eecks wrote:
| The company I work for plans for that and it's definitely
| not FAANG. In fact, DR planning and testing is far more
| important than stuff like continuous integration, build
| pipelines, etc.
| twistedpair wrote:
| > How many companies really plan for an event where their
| entire infrastructure goes offline and their entire team
| gets killed?
|
| Since 9/11, more than you might think. For example Empire
| Blue Cross Blue Shield [1] had its HQ in the WTC.
|
| [1] https://www.computerworld.com/article/2585046/empire-blue-cross-it-group-undaunted-by-wtc-attack--anthrax-scare.html
| codingdave wrote:
| The two I've worked for that took it that far were a
| Federal bank, and an energy company. I have no idea how far
| Google or other large software companies take their plans.
|
| But based on my experience, the initial recovery planning
| is the hard part. The documentation to tell a new team how
| to do it isn't so painful once the base plan exists,
| although you do need to think ahead to make sure somebody
| at your back-up vendor has an account with enough access to
| set up all the other accounts that will need to be created,
| including authorization to spend money to make it happen.
| NyxWulf wrote:
| This scenario isn't as far fetched as people think. I was
| running a global deployment in 2012 when hurricane Sandy hit
| the east coast. The entire eastern seaboard went offline and
| was off for several days. Some data centers were down for
| weeks. Our plan had covered that contingency and we failed
| all of our US traffic to the two west coast regions of
| Amazon. Our downtime on the east coast was around two minutes.
| Yet a sister company had only one data center in downtown New
| York, and they were offline for weeks, scrambling to get a
| backup loaded and online.
| dharmab wrote:
| I worked for a regional company in the oil and gas industry
| and the HQ and both datacenters were in the same earthquake
| zone. A twice per century earthquake had a real risk of
| taking down both DCs and the HQ. The plan would have been
| for every gas station in the vertical to switch to a
| contingency plan distributing critical emergency supplies
| and selling non-essential supplies using off-grid
| procedures.
| tclancy wrote:
| We might start to see actual legislation around implied SLAs in
| the US which would cause Google to rethink everyone's 20%
| project being rolled out for 2 years.
| ab_io wrote:
| It would be a mass customer extinction event for said service,
| and would effectively result in a windfall for competing
| services
| blntechie wrote:
| Services like Slack are replaceable to a large extent. How does
| one even replace a service like Google easily? There are
| like-for-like services available for Google but the data is
| where it becomes tricky. Almost 1bn people losing their email
| addresses could cause massive issues.
| lsaferite wrote:
| This should actually be part of your Disaster Recovery plan.
| You should have at least some plan for the loss of all of your
| service providers. Even if that plan is to sit in the corner
| and cry (j/k).
| Macha wrote:
| Happened to Ma.gnolia, which was the number 2 bookmarking site
| behind Del.icio.us in that era:
| https://en.wikipedia.org/wiki/Gnolia
|
| HN comments at the time:
| https://news.ycombinator.com/item?id=487497
|
| The site relaunched a month later and shut down for good a year
| after that
| MattGaiser wrote:
| Google would be catastrophic because so much is stored there.
|
| Slack is mostly real time communication, at least for me. There
| are a few bits and bobs that really should be documented that
| are in the messages though.
| thesuitonym wrote:
| If this thread is to be believed, apparently a lot of
| engineers use slack for alerting and don't know how to check
| their monitoring software manually.
| blntechie wrote:
| Yeah, Google would easily top the list of companies which can
| have catastrophic impact. Microsoft, Apple, Salesforce,
| Dropbox would be the next in the list I guess if we leave out
| the utility companies and internet providers etc.
| jon-wood wrote:
| Just look at the impact a 40 minute outage of Google Auth
| had last month, I wouldn't be surprised if the global
| productivity hit during that outage was in the billions of
| dollars, and that was for a relatively short outage without
| any data loss.
| sjg007 wrote:
| AWS outages have basically crippled a few businesses. The
| longest I know of was 8-10 hours the day before
| Thanksgiving. Some Bay Area food company got hit by it and
| couldn't deliver thanksgiving dinners.
| signed0 wrote:
| In 2011 a small amount (0.02%) of Gmail users had all their
| emails deleted due to a bug:
| https://gmail.googleblog.com/2011/02/gmail-back-soon-for-eve...
| They ended up having to restore them from tape backup, which
| took several days. Affected users also had all their incoming
| mail bounce for 20 hours.
| teagee wrote:
| This reminds me of what happened to the financial services
| Cantor Fitzgerald after 9/11, just replacing a system with
| hundreds of lost employees:
|
| https://www.nytimes.com/2014/11/19/magazine/the-secret-life-...
| nobody9999 wrote:
| I was at CF (at new offices, obviously) briefly a couple
| weeks after 9/11.
|
| They had backups and were able to recover data and systems.
|
| By the time I got there, they were somewhat functional.
|
| The biggest problems were the lack of knowledgeable
| personnel, not lost data or systems.
| teagee wrote:
| Thanks for sharing, for some reason I think about this
| story a lot. It must have been such an emotionally
| difficult time for everyone involved in piecing back
| together their processes.
| TwoPhotons wrote:
| That wouldn't be ideal.
| alexsey wrote:
| Being in DR, I live my life wondering about that too. I spend
| a lot of extra time checking accounts and making sure that I
| print out (yes, sneakernet) important data as well as have
| manual copies of passwords. It's old school, but it removes the
| risk to my business in case of a total loss of a global service
| and lowers the risk of a heart attack and related stress.
|
| The rest of the world may not be so energetic re: their
| accounts and data, so it would be painful for many; it depends
| on how much risk they are willing to accept.
|
| Being in DR, I see that it is very difficult for businesses to
| allocate the time and resources to good planning - for many, DR
| is an insurance policy. Engineering and development staff are
| focused on putting out fires; however, a real DR event is more
| than most companies can handle if they have not planned
| accordingly or practiced through testing failover/normalization
| processes as well as performing component-level testing.
| harikb wrote:
| Can we please trade in centralized Slack and single-sign-on and
| get back the netsplit of IRC :) ? At least I can chat with half
| of my colleagues :)
| exabrial wrote:
| For a product that is so simple, there are no good self-hosted
| alternatives. Mattermost and RocketChat are written very poorly,
| they are unreliable, and getting your data out is impossible.
|
| Slack goes down so often we're thinking of writing a very boring
| clone that uses ActiveMQ and MySQL, just because chat should be
| boring and needs to "just work".
| wrycoder wrote:
| Zulip can be self-hosted - have you looked at that? I like the
| threads implementation.
| ginja wrote:
| I was just considering setting up a Mattermost instance for our
| company since I used it for a year at a previous job without
| any issues (I was just a user though, I didn't deploy or
| maintain it). Just curious, why do you think it's poorly
| written or unreliable?
| exabrial wrote:
| We tried running it, so we have a lot of experience with it, and it
| wasn't great. It barely stayed online.
|
| For something so simple, you have to run a massive server,
| like gigs of RAM and multiple cores, even with a very modest
| user load. Take a look at the codebase, it's also a mess and
| impossible to fix any bugs. Finally, if you want to get your
| data out or report on the message activity, good luck, you'd
| be better off passing paper notes around. The open source
| version is nerfed a bit too, no LDAP authentication for
| instance, so it creates a lot of problems there too.
| offtop5 wrote:
| Can we all accept that, like human beings, every single service
| can have a couple of off days a year?
| bob1029 wrote:
| The entire point of all of the engineering we talk about around
| here is to produce services with inhuman capabilities and
| resilience.
| offtop5 wrote:
| Doesn't that feel like an oxymoron?
|
| If a human created it, it can never have inhuman
| capabilities.
| oriettaxx wrote:
| I am so glad that at least today I do not hear that annoying
| Slack sound. I really do think Slack is not helping me, at all,
| to concentrate on my job (system administrator): synchronous
| messages are really the worst, ever, while working: email is
| much, much better
| AnIdiotOnTheNet wrote:
| Honestly I'm fairly sure the vast majority of "technology"
| we've deployed, as an industry, in the past 10-15 years has
| actively made life worse. I don't know about anyone else, but
| that's the opposite of why I got into technology.
| irrational wrote:
| My business coworkers are freaking out over Slack being down. But
| all my technical coworkers are nonplussed. It's interesting how
| those of us with a technical background are not too disturbed by
| things breaking.
| tmoertel wrote:
| Interestingly, _nonplussed_ is one of those words having two
| meanings that are at odds with each other. According to Google,
| those two meanings are:
|
| 1. (of a person) surprised and confused so much that they are
| unsure how to react. "he would be completely nonplussed and
| embarrassed at the idea"
|
| 2. INFORMAL, NORTH AMERICAN (of a person) not disconcerted;
| unperturbed.
| irrational wrote:
| I wasn't aware of definition 1. I did mean the informal North
| American definition.
| wrycoder wrote:
| I'm 79, born in the mid-west, living in New England for the
| last 60 years. Worked in tech. I've never heard the word
| used except as in Def. 1.
| irrational wrote:
| I've never lived in the mid-west or the New England
| region of the USA. Maybe it's a regional usage (I've
| lived in Florida, Texas, California, Colorado, Utah,
| Oregon, and Washington). I'm not sure where I picked up
| my usage from. My dad is from Colorado and my mom from
| California. Maybe I picked it up from one of them ;-)
| mepiethree wrote:
| I'm "plussed," because an app that I manage uses slackclient,
| and some people depend on it to get paid. Obviously it's my
| fault for not handling the error, and I hotfixed it, but still,
| wah.
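|
| A minimal sketch of the kind of error handling described above,
| assuming the chat notification is optional and the payment step
| is not. The send and pay callables are placeholders, not the
| slackclient API; the point is only that a failed notification is
| logged and swallowed rather than allowed to abort the run.

```python
import logging

logger = logging.getLogger("payroll")


def notify_best_effort(send, message):
    """Call send(message); log and swallow any failure instead of raising.

    `send` is a placeholder for whatever function posts to chat.
    Returns True if the notification went out, False otherwise.
    """
    try:
        send(message)
        return True
    except Exception:  # the chat service being down must not stop the caller
        logger.exception("notification failed: %r", message)
        return False


def run_payment(pay, send, amount):
    """Run the critical step first, then notify without letting it fail the run."""
    receipt = pay(amount)
    notified = notify_best_effort(send, f"paid {amount}")
    return receipt, notified
```

| With a send function that raises, run_payment() still returns
| the receipt and reports notified as False.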
| kuon wrote:
| This is like the first time in a year I need slack, call that bad
| luck.
|
| I hope matrix/element will rise more.
| lawrjone wrote:
| Having been in this situation before, with a totally-down-and-
| not-coming-back-up outage of a payments system, I really feel for
| their incident response team.
|
| I'll take this moment to remind everyone of their human tendency
| to read meaning into random events. There's no evidence to
| suggest New Year traffic has caused this, and outages like this
| can happen in spite of professional and competent preparation.
|
| Hugops for their team, I hope they get it back soon.
| yjftsjthsd-h wrote:
| > I'll take this moment to remind everyone of their human
| tendency to read meaning into random events. There's no
| evidence to suggest New Year traffic has caused this, and
| outages like this can happen in spite of professional and
| competent preparation.
|
| On the one hand, sure we don't specifically know what's going
| on. On the other hand, it's the first Monday in the new year
| and they went down shortly after the start of the business day
| Eastern time; it _could_ be coincidence, but it would be a
| remarkable coincidence.
| xeromal wrote:
| Another likely scenario is that they had a deployment that was
| risky that they waited to push until after the holidays.
| taneliv wrote:
| It seemingly worked ok in UTC-2 in the morning and early
| afternoon, then started having issues and is now a bit
| intermittent (or fixed, there's not much traffic on my
| channels, as it's evening already). Do they have that much
| more traffic on US east coast than in Europe?
| martinald wrote:
| Probably, but it was only 2-3pm UK time when it started
| falling over so there would be all the Europe traffic plus
| the East Coast traffic starting to sign in.
| lawrjone wrote:
| There are a load of ways NY might have contributed to this,
| but it may not be a direct cause. What's more likely, Slack
| forgetting to scale their deployment back up after too much
| mulled wine, or a number of people on holiday meaning a
| simple failure has developed into something more serious?
|
| It could be anything really- my post was more about how
| situations like this can happen to even the most prepared.
| The assumption it has something to do with NY tends to assume
| very trivial, silly mistakes. Especially with no information,
| that seems a bit uncharitable.
| [deleted]
| not2b wrote:
| I'm able to connect to Slack at the moment. My company doesn't
| use it, but a hobby group I belong to uses it for discussion
| forums and their instance is up and functional. So it isn't down
| completely as I write this.
| lylo wrote:
| This is an excellent reminder of the danger of being locked into
| closed systems.
|
| I wonder how many companies (like mine) have literally ground to
| a halt because of this? Do other companies have a risk-documented
| backup plan B for times like this? Presumably the default is for
| everyone to resort to email?
|
| More worryingly is the number of ChatOps processes and
| alerting/observability systems that are in place around Slack.
|
| Not being able to chat with co-workers for an hour or two is
| fine, but not being able to safely manage CI/CD/deployments is a
| big risk.
| [deleted]
| Scandiravian wrote:
| I just finished a 5 hour debugging session on what turned out
| to be several cascading bugs in one of the older systems at my
| company
|
| Can't deploy the fix, because
|
| - developers trigger deployments through slack and I don't have
| access to the underlying deployment system
|
| - infrastructure guys who have access aren't responding to my
| emails
| vlod wrote:
| Sounds like a great opportunity to work on not having Slack
| as the only trigger mechanism. :)
|
| Or at least document how to call the Slack bot manually.
| (assuming it's just an HTTP endpoint)
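|
| A minimal sketch of a second trigger path, assuming the deploy
| logic can live in one function that both the chat bot and a
| command-line tool call. deploy(), the command format and the
| return value are all hypothetical; a real version would call
| the actual deployment system.

```python
import argparse


def deploy(service, version):
    """Hypothetical core deploy step; a real one would call the deployment system."""
    return f"deploying {service}@{version}"


def handle_chat_command(text):
    """Chat entry point: parse a message like 'deploy api 1.4.2'."""
    parts = text.split()
    if len(parts) != 3 or parts[0] != "deploy":
        raise ValueError(f"unrecognised command: {text!r}")
    return deploy(parts[1], parts[2])


def main(argv=None):
    """Command-line entry point that reaches the same deploy() without the chat bot."""
    parser = argparse.ArgumentParser(description="Deploy without going through chat")
    parser.add_argument("service")
    parser.add_argument("version")
    args = parser.parse_args(argv)
    return deploy(args.service, args.version)
```

| Calling main(["api", "1.4.2"]) takes the same path as the chat
| message "deploy api 1.4.2", so the chat service being down does
| not block a deploy for anyone with access to the tool.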
| Scandiravian wrote:
| I agree 100%! Though I think it might be dangerous to
| prepare for the "last disaster". It'll be some other system
| breaking next time, so I think we instead should identify
| which systems do not have some kind of redundancy and
| determine the blast radius of those crashing
|
| I'm good at not panicking about things I can't change, but
| I worry about some of my colleagues who find it difficult
| to not have control in these situations
|
| I can't do anything to help them at the moment, so for now
| I'm heading to my couch with my analogue book :)
| hn_throwaway_99 wrote:
| > This is an excellent reminder of the danger of being locked
| into closed systems.
|
| Do you honestly think a self managed solution or open source
| solution would be more reliable for most companies?
| NationalPark wrote:
| When application engineers say stuff like this, they're also
| implying that there's a giant infra/ops team who will be
| willing and able to do all the work for them. Nobody
| _actually_ wants to be responsible for this stuff.
| lylo wrote:
| Not at all, I think closed private systems are far better
| (better products, support, service) but when an entire
| company runs its operations on a single system like Slack,
| there is a big risk when it goes away and you need
| contingency.
|
| I'd still rather be on Slack and suffer a day of lost
| productivity than force people to use only email or IRC.
| nicioan wrote:
| Let's take a moment and express solidarity towards the fellow
| engineers that are currently working like crazy under a lot of
| stress to fix this.
| davesque wrote:
| It's funny that this isn't considered an "Outage" by their status
| page's standards.
| HowardStark wrote:
| Seems that it is now. It was originally just Messaging and
| Connections that had an "incident", so I wonder if something
| else happened or they manually changed the status to at least
| own that all their services went FUBAR.
| [deleted]
| papito wrote:
| Here is to hoping it's not SolarWinds related eh!
| Trex_Egg wrote:
| me too
| paulintrognon wrote:
| It's already being discussed here:
| https://news.ycombinator.com/item?id=25632346
| ilkkao wrote:
| I just hope slack itself has a backup chat tool for incidents
| like this.
| the_duke wrote:
| A great opportunity to try out Element, an open source client for
| the federated and open Matrix network [1]: https://element.io/
|
| (edit: to clarify: not affiliated in any way, just a fan)
|
| [1] https://matrix.org/
| crubier wrote:
| Lol reckless marketing man aha
|
| (edit: ok I thought you were promoting your own service here, I
| found it a bit spicy but fun nonetheless aha)
| julianlam wrote:
| I've found that element.io itself is really slow to load.
|
| That said, we're pushing up against the limits of our free plan
| with Slack and will likely deploy a matrix server in due
| course.
| xfalcox wrote:
| The good thing about that is that if you want a fast client
| there are quite a few native clients to pick.
|
| https://matrix.org/clients/
|
| For example, Mirage is Python/QT and quite fast in my
| experience. There are Rust clients, C++ clients, terminal
| based ones, etc.
| Shared404 wrote:
| Do you happen to know of any desktop clients that support
| encryption/cross-signing?
|
| I'd like to get off of Element desktop/web for a couple of
| reasons, but I need those features. I'd help implement them
| myself, but that's beyond my skill level.
|
| Edit: For anyone else wondering, matrix-commander [0] looks
| like it may be workable if a cli tool is acceptable for
| your usecase.
|
| [0] https://matrix.org/docs/projects/client/matrix-
| commander/
|
| I'm planning on looking through the GUI ones at some point,
| but don't have time now.
| the_duke wrote:
| Fluffychat [1] is built with Flutter and apparently
| supports e2e encryption.
|
| Note: I wanted to try it out for a while, but haven't
| yet.
|
| [1] https://gitlab.com/famedly/fluffychat
| racl101 wrote:
| Welcome to the party, pal!
| steveharman wrote:
| "It was working fine when we sold it to you"
| sytse wrote:
| At GitLab our fallback from Slack is Zoom
| https://about.gitlab.com/handbook/communication/#emergency-c...
|
| I'm posting this because I found a lot of people don't know that
| Zoom includes a complete chat client that includes channels.
|
| And #HugOps to the engineers at Slack working on this. I
| appreciate that they posted a periodic update even when there was
| no news to report: "There are no changes to report as of yet.
| We're still all hands on deck and continuing to dig in on our
| side. We'll continue to share updates every 30 minutes until the
| incident has been downgraded."
| bww wrote:
| Just a reminder that it's probably not a wise idea for anyone
| to get further in bed with Zoom than they already are.
|
| https://www.washingtonpost.com/technology/2020/12/18/zoom-he...
| tehjoker wrote:
| Imagine if they did that for the US government, which is
| easier to compel since they are in US territory.
| xibalba wrote:
| One nation is currently operating concentration camps and
| arrests and seizes the property of prominent citizens who
| criticize the government. Are you sure that's an
| equivalence you want to draw.
| fhrifjr wrote:
| Like Guantanamo Bay or prosecution of Assange for his
| journalistic work to expose wrongdoing of government? Or
| maybe you're talking about for-profit prison system and
| mass incarceration practices? But you're probably talking
| about China, right?
| bananabreakfast wrote:
| Once again, that is a false equivalence.
|
| No one imprisoned in Guantanamo Bay is a US Citizen and
| neither is Assange.
|
| The US prison system is super fucked up but it is not the
| same as ethnic cleansing.
|
| You are comparing apples to concentration camps.
| thereare5lights wrote:
| > No one imprisoned in Guantanamo Bay is a US Citizen and
| neither is Assange.
|
| That's a glib retort.
|
| A takeaway from your position is that it's ok so long as
| you do it to citizens of other countries.
|
| > it is not the same as ethnic cleansing.
|
| See the above.
|
| That's always been the difference between the US and
| China and why so many countries have hatred for us and
| yet little to none for China. They don't fuck with other
| countries on the level that we do.
| dmkolobov wrote:
| We have thousands of brown people in camps along the
| border, in brutal conditions, without access to
| healthcare (unless you count forced sterilizations as
| healthcare). Do you consider those to be apples as well?
| g00gler wrote:
| Why are they in camps along the border? Why are the
| Uighur? Did the "brown people" break any laws? Did the
| Uighurs?
|
| Are the "brown people" in camps along the border a
| single, ethnic minority? Are all "brown people" in the
| country subject to arrest and under surveillance just for
| being "brown"?
| dmkolobov wrote:
| Well, yeah. People _are_ subject to arrest and
| surveillance for being brown/black in the US.
| throwawaygulf wrote:
| No they aren't.
| vorpalhex wrote:
| That forced sterilization claim was entirely debunked and
| was misleading to start with:
|
| https://www.channel4.com/news/factcheck/factcheck-were-
| mass-...
|
| https://www.snopes.com/ap/2020/09/18/more-migrant-women-
| say-...
|
| And 70% of those people in those camps are released
| within 30 days, often times within one week back to their
| country of origin (or given asylum).
| geocar wrote:
| > No one imprisoned in Guantanamo Bay is a US Citizen and
| neither is Assange.
|
| I think you should know that I, and probably others, am
| reading this as "b-b-but, they're not US Citizens, so
| they don't deserve [the same] rights"
|
| I hope that's not what you mean, because if it is, that's
| really fucked up.
| thereare5lights wrote:
| That's exactly how I read it. And that's probably the
| same position of lots of Americans, which in and of
| itself is quite fucked up.
| jethro_tell wrote:
| yes, I remember when I got my trump kidney from a poor
| anti-fascist liberal. /s
|
| America is fucked up, that doesn't mean that other
| countries aren't also fucked up or aren't doing worse
| things with the data they collect.
| tehjoker wrote:
| Yea, but you live here and so you should think about the
| implications of this for yourself and your countrymen and
| not through the lens of international competition. That
| is a distraction.
|
| Essentially, the China case proved Zoom is willing to
| cooperate with a nation state. The US is the nation state
| we live in, Zoom is HQ'd here. Therefore, the risk to us
| is high.
|
| As an aside, the organ harvesting idea comes from the
| Falun Gong, who are similar to Chinese Scientologists. It
| is not clear to me that their claims are accurate.
| tehjoker wrote:
| Yes. The Chinese state and the US state are both proven
| to spy on their citizens. For reference, see the heroic
| Edward Snowden's 2013 leaks.
| [deleted]
| Havoc wrote:
| And yet a surprising number of firms with sensitive info
| continue to use it. Law firms etc
| schoolornot wrote:
| The lesser of two evils and the product just works. They
| might have a few governance issues they need to fix. But at
| the end of the day, they signed a BAA with us and will take
| the liability and fallout of a breach.
| dylan604 wrote:
| Is this an attempt to refute the claim that using Zoom is
| bad, or an indictment against those still using it?
| xevrem wrote:
| an indictment that so many people who should know better,
| still use a tainted and non-benign product.
| yabuttslivnwds wrote:
| Elaine Chao's sister is married to Xi, while Elaine, as
| transportation secretary under Trump, was busted inviting
| family with business ties to the CCP to official US
| government meetings.
|
| The fear on this forum is imagined political thriller
| more than realistic.
|
| Every technologist is grifting off the military
| industrial complex.
| KaoruAoiShiho wrote:
| What an overwrought headline, the employee in question has
| already been fired.
| criley2 wrote:
| It's weird that you describe the headline as "overwrought"
| and call the person an "employee" when the headline is more
| accurate than you.
|
| This was an executive, not just an employee. That's a huge
| distinction and I can't help but think you intentionally
| downgraded his position to cover-up his behavior. "Just an
| employee" "Not a big deal"
|
| But when you read the allegations, they seem like a very
| big deal that an executive was spying on users, giving
| their information to the Chinese government explicitly for
| oppressive purposes, including folks who are not in China,
| and went out of his way to personally censor non-Chinese
| groups meeting to discuss the Massacre-Which-Cannot-Be-
| Mentioned.
|
| I would say the headline understates the gravity (it's very
| much a 'by-the-books' headline that you KNOW went through
| ten levels of Legal), and that your hand waving here feels
| much more dishonest than the headline.
| btown wrote:
| Regardless of intent, it's undeniable that at some point
| there were _insufficient controls_ to prevent this
| executive, or any executive in the future, from gaining
| this level of surveillance access.
|
| And it's also undeniable that the consequences for Zoom
| (really, just needing to fire a few people, and not even
| the people who designed those controls if there were any)
| are so minimal that they have no _incentive_ to
| strengthen those controls.
|
| For some organizations (mine included) the benefits of
| Zoom outweigh the risks of Zoom having proven itself to
| not have those controls, namely the possibility of both
| political and corporate espionage. As with all things,
| YMMV.
| MattGaiser wrote:
| Not only that, but this line stuck out to me.
|
| > and other employees have been placed on administrative
| leave until the investigation is complete.
|
| Zoom at least suspects he did not act alone.
| kemonocode wrote:
| Sorry, but an executive is not just "an employee" and any
| alarms are rightfully justified. Took a little bit of
| cajoling in my company but we've successfully moved to
| self-hosted tools for the most part (Jitsi and Rocket.chat)
| with just a couple of projects with outside contractors
| using Slack.
| fermienrico wrote:
| The optics are still very, very bad for Zoom. I have zero
| trust in them.
| noir_lord wrote:
| There are remarkably few organisations I somewhat trust
| (even then on a sliding scale) but on that spectrum Zoom
| sits at the "wouldn't touch them with someone else's
| bargepole" end.
| alisonkisk wrote:
| The company in question is still operating. We don't know
| if the employee was just a scapegoat.
| [deleted]
| Frost1x wrote:
| "Business takes the easy and ethically questionable route to
| continue making money" news at 11.
|
| I'm not condoning Zoom's actions but this is hardly a problem
| unique to Zoom. Few _if any_ businesses will stand up for
| consumers and citizens unless it's directly aligned with
| their profit motive. In this case, the business choice is to
| operate or not in mainland China. If they choose to stand up
| against the Chinese government they're going to have
| difficulty continuing to operate in China and risk losing
| that entire market.
|
| Google played this PR game many years ago in China (rejecting
| some of the governmental policies) and ultimately caved to
| Chinese policies to do business there.
|
| Businesses are not the organizations we should look to for
| empowering people, that's simply not their goal no matter how
| much their marketing team may want to sell that idea by
| following trending (popular) social movements that they've
| already done market studies on to assess potential fallback.
| tw04 wrote:
| I think it's a pretty bold claim to state that Zoom's
| actions aren't unique.
|
| What other business in this space has given China
| unfettered access to US users and data? I'm not aware of it
| occurring with Webex, Teams or go2meeting. The "one rogue
| employee" thing falls flat pretty quickly when they're the
| only ones that had this issue.
|
| This feels like their encryption thing all over again,
| there's an "oversight" that is equivalent to a backdoor
| that only gets fixed when they get caught.
| Frost1x wrote:
| I didn't realize they shared any user data outside China
| (misread the WP portion). It appears they did share 10
| users' data which is a bit questionable but I'd hardly
| call that unfettered access to US data.
|
| The fact is all of the US businesses operating in China
| give surveillance ability to the Chinese government for
| the Chinese users and are operating in an ethically
| questionable space being primarily based outside of
| China, at least in my opinion.
|
| It's really not too different than the businesses sharing
| US citizen data to the US government, much of which
| Snowden and others before him exposed. I suspect there's
| a lot more surveillance going on everywhere than the
| general public know about and the businesses best
| positioned to do the surveillance are probably doing it.
| trykondev wrote:
| Oh wow -- after years of using Zoom I definitely did not know
| about this. Thank you for pointing it out!
| vorpalhex wrote:
| Please reconsider using or supporting Zoom in any way.
| https://www.nytimes.com/2020/06/11/technology/zoom-china-tia...
| cgh wrote:
| Unpopular opinion, but WebEx beats the pants off Zoom. Of
| course, it's neither free nor open. But it does support strong
| end to end encryption and authentication and has regulatory
| compliance to a bunch of things, if that's important to you. I
| get that there is WebEx hate because "enterprise" etc, but we
| use it around here and it works quite well.
| [deleted]
| lima wrote:
| +1, both are a PITA but Webex at least has a really good web
| client.
| sergiotapia wrote:
| At Papa we use Discord as backup.
| isodev wrote:
| Nice tip! Zoom chat is cool (although chats without gifies are
| way too productive).
| [deleted]
| bovermyer wrote:
| As someone who has to use Zoom Chat to interact with a client
| on a daily basis, please, do not recommend Zoom Chat to anyone
| except as an example of how not to do chat software.
|
| --
|
| Though, I do agree wholeheartedly with your sentiment that the
| Slack team needs all the positive vibes they can get right now.
| meffie wrote:
| As someone who has to use Zoom Chat every day, this a
| thousand times. (We still run an XMPP server on the side just
| to avoid the horror that is Zoom chat.)
| [deleted]
| ourcat wrote:
| I'm quite surprised that you don't use Mattermost as a 'Slack
| fallback' at GitLab.
| ekianjo wrote:
| Indeed, if you champion FOSS, why would you recommend a
| proprietary piece of software as fallback?
| res0nat0r wrote:
| I'm guessing because they don't want the support burden on
| a rarely used but necessary fallback solution vs. something
| plug and play.
|
| This is the reason these "closed" ecosystem apps like
| Slack/Zoom are multi billion dollar companies and have
| massive uptake. Simple and easy to use.
| [deleted]
| [deleted]
| ourcat wrote:
| My point was really that it's GitLab's own product. "GitLab
| Mattermost" [https://docs.gitlab.com/omnibus/gitlab-mattermost/]
|
| I'm amazed they use Slack at all. Let alone as a fallback.
| dvdbloc wrote:
| Or use mattermost with Slack as the fallback
| OJFord wrote:
| No I'm pretty sure that's just a sort of 'integration',
| Mattermost shipped with GitLab?
|
| https://about.gitlab.com/blog/2015/08/18/gitlab-loves-
| matter...
|
| > Like many companies in the last year we've switched to
| using Slack to improve internal communication. [...]
| Since Slack doesn't offer an on-premises version, we
| searched for other options. We found Mattermost to be the
| leading open source Slack-alternative and suggested a
| collaboration to the Mattermost team.
|
| I'm not really sure why it's 'GitLab Mattermost' and not
| (at your link) 'GitLab Nginx' et al. though.
| ourcat wrote:
| Ah I see now, having read more of the history. Calling it
| that seems pretty misleading/odd.
|
| (We use GitLab and Mattermost (integration) where I work.
| I've been 'remote / WFH' for the past 7 years.)
| sytse wrote:
| I agree it can be improved and created
| https://gitlab.com/gitlab-org/omnibus-
| gitlab/-/merge_request... to do so.
| tempest_ wrote:
| They posted a giant list of the services they use
| recently.
|
| They use a ton of services.
|
| Likely you don't want your backup to be one of your
| systems and another part of the company probably uses
| Zoom already so it is probably easy to fail over to that.
| sytse wrote:
| Here is the list of services that we use
| https://about.gitlab.com/handbook/business-ops/tech-stack/
|
| This includes many proprietary ones, we generally choose
| the product that will work best for us, considering the
| benefits of open source, but not excluding proprietary
| software.
|
| Mattermost is not part of the single application that
| GitLab is. There is a good integration with
| GitLab and our Omnibus installer allows you to easily
| install it. But it is a separate application from a
| separate company.
| daniellarusso wrote:
| Maybe my DevOps folks should not be privy to all internal
| communications?
|
| That is one reason we did not go with Mattermost.
| trastknast wrote:
| Slack won't protect you from this as it's possible for
| admins to export even private DMs.
|
| https://www.nbcnews.com/better/business/slack-updates-
| privac...
| Hnrobert42 wrote:
| Only Slack Workspace Owners can export, not Slack Admins.
| daniellarusso wrote:
| You also have to be on the 'Plus' plan, otherwise it is a
| roach motel.
| Jolter wrote:
| Devops should have nothing to do with your chat server.
| It should be your IT department, just as with the email
| server.
| Spivak wrote:
| DevOps at a lot of small companies also manage the
| internal IT stack and sometimes even take on most of the
| IT duties. Once you get larger you start having "IT" as
| something separate from DevOps but with the actual
| infrastructure managed by operations. Once you're really
| big the teams are truly separate and IT owns their own
| infra.
| arpa wrote:
| If you do not trust your own devops, why are you trusting
| someone else's devops?
| toomuchtodo wrote:
| Low maturity risk management functions.
| lostcolony wrote:
| Because someone else's devops can't use it against you
| institutionally. Nor is going to insist on having an
| opinion on things that they're unaffected by.
|
| This isn't a slam at devops, it's about the need for
| institutional information hiding; not everyone needs to
| know about and weigh in on every decision being made.
| Spivak wrote:
| We structure our company similarly. With effort DevOps is
| god on everything except HR, Sales, Finance, Chat, and
| C-level management which are operated with 3rd party
| services controlled by the individual departments and
| "owned/managed" by the C-suite.
| ericbarrett wrote:
| It's just due diligence. Think of what you have access to
| if you have "god mode" on corporate chat: HR, the CFO's
| DMs, private messages between other coworkers, and so on.
| Most won't fall for this temptation, but even those with
| strong anti-spying morals can be weakened by
| circumstances. Best to remove the temptation by design.
| mxuribe wrote:
| I continue to be impressed by GitLab's operations and
| documentation! While, yes, others may have similar backup
| plans, as an outsider, it feels like GitLab's handbook seems
| cooler even if only for their publishing, and making public of
| their practices and processes. I'll caveat that I'm not really
| a fan of zoom/slack/hangouts (I'm an unashamed fanboy of matrix
| and its numerous clients), but gitlab's approach is still
| really neat! Kudos to gitlab!
| kburman wrote:
| Why not Mattermost or Flock?
| ff333ttee wrote:
| Aren't you worried about so many security vulnerabilities found
| in Zoom?
| jayd16 wrote:
| Is this accessible from the web? Can't seem to find it with
| this Chromebook.
| Xmax wrote:
| Surprisingly mobile client is working for me..!!
| djsumdog wrote:
| It's off and on. The connectivity is spotty.
| mariusseufzer wrote:
| And there we have it: Relying on big companies sucks. It's great
| as long as it works. Once a system breaks thousands, or even
| millions, of businesses suffer. (Of course they are also
| beneficial and a private server can also crash at any time + I
| don't wanna blame Slack, but we always have to keep this in
| mind).
| [deleted]
| yreg wrote:
| If a big company has a million customers and the big company
| experiences an outage per quarter, then a million businesses
| suffer every quarter.
|
| If a thousand small companies have a thousand customers each, and
| these small companies experience an outage per quarter, then a
| million businesses suffer every quarter.
|
| As the end-user-business, is it better to suffer the outage at
| the same time as other businesses? Is it worse?
|
| Surely there are valid arguments against relying on big
| companies, but I don't think this is one of them.
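|
| A quick sketch of that arithmetic, assuming exactly one outage per
| provider per quarter and the customer counts above. The function
| name is made up for illustration:

```python
def affected_per_quarter(providers, customers_each, outages_per_quarter=1):
    """Count business-outages per quarter across all providers.

    Assumes every provider has the same number of customers and the
    same outage rate, as in the comparison above.
    """
    return providers * customers_each * outages_per_quarter


# One big provider with a million customers.
big = affected_per_quarter(providers=1, customers_each=1_000_000)

# A thousand small providers with a thousand customers each.
small = affected_per_quarter(providers=1_000, customers_each=1_000)

# The totals match; only the number of separate incidents differs
# (1 large incident versus 1,000 small ones).
print(big, small, big == small)
```

| Under these assumptions the totals are identical; what changes is
| whether the outages hit everyone at once or are spread out.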
| bricss wrote:
| https://status.webex.com/service/status?lang=en_US
| 00adefff574 wrote:
| Hopefully it stays down forever
| jennyyang wrote:
| Could this be some sort of data corruption? I find it hard to
| believe that Slack could be down for this long without something
| that is exceedingly hard to rollback. Even if some services are
| completely overwhelmed with traffic, they could block a certain
| percentage of traffic to decrease load, and then force servers up
| across their datacenters and then unblock traffic. It has the
| hallmarks to me of some sort of datastore is down, but obviously
| just a random guess.
| zaptheimpaler wrote:
| It hasn't been that long and lots of other web services are
| behaving a bit strangely or are down as well -
| https://downdetector.com/.
|
| So it's probably a wider issue affecting everyone - network
| level is my guess.
| mey wrote:
| Why does the slack client not show connection issues instead of
| just hard locking up?
| jaywalk wrote:
| When it went down fully and I had the Windows client open, it
| went to a page that basically said "Slack is down, we don't
| know why, try restarting and see if that fixes it. Here's the
| status page."
|
| It would be nice if they could fix it so that a fresh start
| also goes to that page, at the very least.
| mey wrote:
| How do you have the Slack app installed? I currently have it
| installed via the Windows/Microsoft Store, and I suspect that
| is a significant part of the problem.
| jaywalk wrote:
| Direct download from their site.
| bengale wrote:
| The client on my Mac showed a page that said it was having
| issues connecting with a link to their status page.
| [deleted]
| mey wrote:
| The Windows desktop app is less fortunate.
| djmetzle wrote:
| Should be an interesting post-mortem...
| louiechristie wrote:
| https://twitter.com/louiechristie/status/1346213038924427265...
| itsdrewmiller wrote:
| It's pretty embarrassing for their 45 minute update to be "not
| sure what's wrong!"
| yreg wrote:
| The status still says "We're continuing to investigate", but
| they tweeted[0] that they have found the issue.
|
| [0] - https://twitter.com/SlackHQ/status/1346132040249470979
| thesuitonym wrote:
| I'd like to take this moment to mention self-hosted, open source,
| and federated alternatives like XMPP and Matrix.
|
| I'd like to, but unfortunately I don't feel like I can in good
| faith. Matrix is woefully immature, and suffers from a lot of
| issues, but I think is closer to being a functional Slack/Discord
| alternative. XMPP is much more mature, and works very well for
| chat, but doesn't have a nice package that does all the Slack
| stuff--at least not that I'm aware of. I'd love to be proven
| wrong there. I know it _can_ be done, but if it can't be
| deployed quickly by an already overstressed team member, what
| chance does it have?
| halukakin wrote:
| If the benefit we are looking for is better up time, that will
| not happen. The main benefit is going to be knowing why the
| system is down, and the eta to being up again.
| dheera wrote:
| > self-hosted
|
| How often is Slack/Discord down? I mean it's not perfect, but I
| really honestly don't think I could match their uptime by self-
| hosting, as well as more on-call rotations for something that's
| not core product.
|
| I very much prefer that for something that isn't core product,
| if it goes down I need to do exactly nothing for it to come
| back up, and that the engineers at Slack will be starting to
| work on it likely before I even realize it's down.
| darkwater wrote:
| This is a tale SaaS vendors (which have strong presence in
| online tech communities like HN because they are software
| companies) sold very well, and it's probably true for many
| small startups, but for medium sized companies managing their
| own platform for something like Slack is completely doable
| and you will not have those big downtimes compared to Slack.
| Sure, you have to dedicate time and resources to it, and
| obviously is not "core business" although a chat platform is
| a pretty important component in an online company.
| welterde wrote:
| I would be surprised if you couldn't match or exceed Slack's
| uptime running whatever alternative you want (IRC,
| mattermost, rocketchat, etc.) on a random dedicated server.
|
| Hardware is quite reliable these days. And updates can be
| scheduled to be at a convenient time for the team.
| dheera wrote:
| Yes, but what if you're taking a few days off to backpack
| in the wilderness with no signal while it goes down? Who
| deals with the downtime?
| welterde wrote:
| If you are the only technical person on your team then
| it's of course not ideal and would require some further
| thought into making things redundant. But even that is
| easy enough to do with IRC (setup two servers, link the
| irc servers together, single DNS record that points to
| both servers - job done).
|
| If there are other people on the team that have _some_
| technical skills then they can fix it..
|
| IRC lacks quite a few features compared to other
| solutions, but the reduced complexity does bring very low
| operational complexity.
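|
| A minimal client-side sketch of the "single DNS record that points
| to both servers" idea: resolve every address behind one name and
| use the first server that accepts a connection. The hostname in the
| usage note is hypothetical, and this only covers the client's view;
| linking the IRC servers themselves is server configuration that is
| not shown here.

```python
import socket


def resolve_all(host, port):
    """Return each distinct (ip, port) pair the DNS name resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        addr = (sockaddr[0], sockaddr[1])
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def connect_first_available(addresses, connect=socket.create_connection,
                            timeout=5.0):
    """Try each address in order and return the first successful connection.

    `connect` is injectable so the failover logic can be exercised
    without a network.
    """
    errors = []
    for addr in addresses:
        try:
            return connect(addr, timeout)
        except OSError as exc:
            # This server is unreachable; remember why and try the next one.
            errors.append((addr, exc))
    raise ConnectionError(f"no server reachable: {errors}")
```

| Usage would look like
| connect_first_available(resolve_all("irc.example.org", 6667)).
| Python's socket.create_connection already walks all resolved
| addresses when given a hostname; the loop is spelled out here only
| to make the failover visible.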
| dheera wrote:
| IRC will be incredibly hard to use for non-technical
| people on your team. Mobile clients for IRC look like
| crap, and have horrible-looking ad bars. No integration
| with Google Drive, Github, or other things.
|
| It's just not a business-friendly tool.
|
| I'm an engineer and personally I'm fine with IRC, I'm
| just trying to be realistic here.
| throwaway201103 wrote:
| Who deals with the downtime if any other on-premises
| system goes down?
|
| If you are running networks and software on site, and
| they are business-critical, you have people and a plan
| for this. Or you don't, and suffer the consequences.
| spicyramen wrote:
| Facebook and other vendors killed XMPP, we lived in a non
| federated world in Enterprise and consumer. No interest of
| companies to change this
| lallysingh wrote:
| Spam killed xmpp
| spicyramen wrote:
| I remember using clients like pidgin with all my accounts
| it was a great experience. Now I need to have like 100 apps
| pmlnr wrote:
| BS
|
| The only thing that "killed" XMPP was that proprietary made
| money, XMPP didn't.
|
| Apart from that, it's alive and well. See Conversations for
| android, Prosody for server.
| josephg wrote:
| XMPP killed XMPP. It's just not very good. It doesn't work
| well between different clients and servers. The protocol is a
| horribly overcomplicated mess of overlapping, partially
| supported extensions for basic functionality. And it doesn't
| work at all with low power mobile delivery. (It was invented
| before the iphone.)
|
| There might have been political reasons why google dropped
| XMPP, but it would also make sense as a purely technical
| decision.
| MattJ100 wrote:
| The problem is that XMPP and Matrix are protocols, not
| products.
|
| Element (the primary Matrix software) definitely has Slack and
| Discord in its sights.
|
| I don't think there are any serious "self-hosted Slack-like"
| contenders that are XMPP-based right now. You can piece
| components together (yay, standards!) and I did exactly this
| for the IETF's XMPP deployment recently. But it's far from
| being a cohesive easy-to-deploy product. Simply because nobody
| is building that right now. It takes time and resources and
| there's no money in it.[1]
|
| People who do set out to build Slack clones (projects like
| Mattermost and Rocket Chat) and earn money don't have features
| such as federation on their priority list and don't build on
| top of Matrix/XMPP. They roll their own custom protocols and as
| far as I can see they are fairly content with that decision.
|
| [1] There's even less money in it, but nevertheless I am currently
| working on such a self-hostable "package" for XMPP. However
| rather than focusing on the team chat use-case (Slack/etc.) I'm
| focusing on personal messaging (WhatsApp/etc.):
| https://snikket.org/ if you're interested. It's possible I will
| broaden the scope one day.
|
| EDIT: typo
| networkimprov wrote:
| It's largely overlooked that the success of Slack & MS Teams
| is partly due to the cybercrime portal that email has become.
| IOW, you don't get phished in your org's Slack chats. To
| prevent phishing, any chat service will suffice; an open
| protocol isn't necessary, as you don't intend to engage with
| ppl outside your org.
|
| The essential problem IMO is how to replace SMTP. No one has
| proposed _and implemented_ an alternative, to my knowledge.
| So I decided to[1]. The current draft omits federation
| (although I wouldn't rule it out in all cases yet).
|
| [1]
| https://github.com/networkimprov/mnm/blob/master/Protocol.md
| dathinab wrote:
| No, email has fundamentally bad UX for a lot of the use cases
| Slack and similar are used for.
|
| > problem IMO is how to replace SMTP.
|
| Sadly SMTP is probably one of the parts of Mail which have
| aged _best_. Enforcing the usage of some (currently by
| design optional) features wrt. authentication and similar
| at the cost of backwards compatibility and you have all you
| need from the delivery protocol.
|
| BUT:
|
| - IMAP and similar is much worse.
|
| - Mail bodies are a _big mess_; it's always fascinating for
| me that mail interoperability works at all in practice
| (again you can clean it up a lot, theoretically, but
| backwards compatibility would be gone).
|
| - DMARC, DKIM and SPF which handle mail authenticity have
| a lot of rough corners and again for backward compatibility
| are optional. Again it's not too hard to improve on but
| would break backwards compatibility.
|
| The main reason mail still matters is because of its
| backwards compatibility, not just with older software but
| also with new software still using old patterns because of
| the (relative to the gain) insane amount of work you need
| to put into all kinds of mail related components. But then
| exactly that backwards compatibility is what.
|
| (Yes, I have read the "Why TMTP?" link and I have written
| software for many parts around mail including SMTP, and
| mail encoding. The idea that SMTP is at the root of the
| problem seems to me very strange. Especially given that
| like I mentioned literally every other part of mail is
| worse than SMTP by multiple degrees...)
|
| EDIT: Just to prevent misunderstandings one core feature of
| mail is the separation of mail delivery and mail
| authenticity, in the sense that you don't need the mailman
| to prove the authenticity of a mail. At most the
| legal/correct/authentic delivery.
| megous wrote:
| Why would you replace it? Will not disabling all public un-
| authenticated submissions on your mail server suffice? You
| can also prevent delivery to outside world (and error out
| on submission so that users are notified) if you really
| like. Result will be your own private mail server.
|
| And you can keep using all the normal MUA's on desktop and
| mobile.
| networkimprov wrote:
| Changing your SMTP server configuration that way would
| break things, so the question is whether to set up a new,
| company-internal SMTP server, and give your employees new
| addresses there. But that won't quickly stop the
| phishing, because your ppl still need to get email via
| the public network from clients and suppliers.
|
| Setting up a new server isn't easy unless you hire an
| outside service provider, and if you're willing to do
| that, Slack et al offer a nicer UX than the well known
| email/webmail clients.
|
| Orgs with sufficient IT resources commonly do run
| internal SMTP servers.
| megous wrote:
| I meant that as a suggestion compared to designing a new
| protocol.
| throwaway201103 wrote:
| Yes I'm old enough to remember when organizations had
| email but it was internal-only. Probably less for
| security reasons at the time than that they simply didn't
| have an internet provider. There were also mainframe-
| based email systems that were internal to that network.
| welterde wrote:
| IRC may be out these days, but at least deploying a small IRC
| server for the own team is really not that much effort anymore
| and doesn't incur that much ongoing maintenance work either.
| Unklejoe wrote:
| I can only offer my own personal experience: Matrix has been
| working well for me for a couple years now. However, I probably
| have a more narrow use case than you're thinking of.
|
| I run a small homeserver and use it to communicate with a group
| of about 20 friends. Most of them aren't "technical" people. We
| use it mostly for chatting and image/video sharing. We never
| use live calling (audio or video).
|
| There have been a few bugs in the mobile apps, but for the most
| part, everything has been working fine.
|
| The biggest issue is the UX. It's not as polished as the big
| players.
| thesuitonym wrote:
| This is actually the use case I've been trying to get to for
| some time. Unfortunately, I need it to "just work" to get my
| non-techy friends interested, otherwise they'll go right back
| to Discord.
|
| Like I said, it's close, I just don't think it's there yet.
| Unklejoe wrote:
| I'd say it's almost in "just works" territory for everyone
| except the person who has to actually administer the
| homeserver (me). I absorb a lot of the complexity for my
| friends.
|
| The only thing that's a little cumbersome is requiring them
| to enter a custom server URL when they register/log in for
| the first time.
| etherealG wrote:
| Actually that's the same with slack, each slack server
| has a unique url too.
| m12k wrote:
| I've heard lots of good about Zulip - haven't tried it myself
| yet though
| vector_spaces wrote:
| Yep, I've stopped recommending Matrix because
|
| 1. There is virtually zero user-facing documentation. Need to
| know how to backup keys, verify another user, or what E2EE
| means? Ask your server operator. Basically the onus is on
| operators to document this stuff for their users. Except the
| stuff we're documenting is hard even for server operators, and
| especially challenging to document in a way that both
| nontechnical and technical users can understand.
|
| 2. Because this stuff is challenging even for more technically
| minded users to understand, it leads to a kind of burnout for
| interested non-technical users where they learn all they can
| about some feature and how it works at a high level from out of
| date random blogs, try to use the (complex, multi-step)
| feature, but then something won't work, and it isn't clear
| whether it was because the user did something wrong or because
| the clients or server implementations are broken
|
| 3. Issues where core functionality is broken (e.g. two mutually
| verified users on my homeserver haven't been able to talk to
| each other in months -- see [1], [2], [3]) languish for months
| with zero response from maintainers.
|
| 4. While core functionality is both broken and undocumented,
| the maintainers announce rabbit hole features that no one asked
| for and seem very much like distractions, like their recently-
| announced microblogging view/client[4]
|
| In short the Element maintainers have shown little interest in
| making the platform accessible to the people who need its
| differentiating features the most, and have prioritized the
| "mad science"/technical aspect of their platform at the expense
| of the human element (end-users and operators).
|
| It'd be cool if Element used their resources to hire some UX
| folks and community advocates whose sole focus is addressing
| the horrid accessibility of their platform. I think most users
| would rather see that than further "mad science".
|
| [1] https://github.com/vector-im/element-ios/issues/3762
|
| [2] https://github.com/vector-im/element-ios/issues/3572
|
| [3] https://github.com/vector-im/element-ios/issues/3393
|
| [4] https://matrix.org/blog/2020/12/18/introducing-cerulean
| pmlnr wrote:
| This is disturbingly good summary. I remember Matrix being
| presented as less bloated compared to XMPP... sure.
| pachico wrote:
| I can hardly see those as alternatives to Slack. Maybe
| https://mattermost.com/ is what you were thinking about?
| thesuitonym wrote:
| Matrix with Element (Riot) as the front-end is pretty close.
| It does what slack does, it's just not very good. XMPP is
| arguable. It _can_ be a Slack alternative, if you stitch
| enough other servers on top of it. Personally, I don't think
| XMPP will ever be more than chat, but some of its adherents
| believe differently.
|
| Mattermost is certainly not what I meant. That's just trading
| one Slack for another.
| PhilippGille wrote:
| This thread is about a Slack outage, which you have no
| control over. Mattermost and similar software is self-
| hosted, which of course doesn't mean you're getting 100%
| uptime, but you have (more) control over it.
| pas wrote:
| RocketChat works pretty well for simple team comms. I have no
| idea if it can do XMPP and/or Matrix.
| rexreed wrote:
| I suggested RocketChat when the outage was announced and HN
| community downvoted it quite heavily. I'm not sure why. [0]
|
| We ended up making the switch and committed to Discord. We're
| now looking at Rocket.chat as a backup in case Discord goes
| down. But Slack is now completely out of the picture for our
| team.
|
| [0] https://news.ycombinator.com/item?id=25633047
| projectileboy wrote:
| You bring up a good point, however, which is that we _could_
| use open source, non-centralized alternatives for many of the
| online products we consume, but we choose not to, and so we
| increasingly become slaves to corporations that actively seek
| to narrow our choices. Another example of this is the push from
| big sites like Reddit to use their apps rather than just use a
| browser - it's not about functionality, it's about destroying
| the free and open web.
| ende wrote:
| Or, or... and bear with me here... or, packaged click-button
| solutions with paid (contractually obligated) dedicated
| product support is a better use of our short time, more often
| than not.
| megous wrote:
| That only works if you only need to use Slack alone or
| whatever. The moment you have to use more of these annoying
| services at once and manage N different stupid client apps
| for Y different platforms (desktop/mobile), the lack of
| open/shared protocol becomes a major issue. Let alone if
| you want to use them on emerging mobile OSes that are not a
| hellhole of data thievery.
| nix23 wrote:
| >support is a better use of our short time, more often than
| not
|
| Not when it's down.
| thewebcount wrote:
| Which open source solutions never go down?
| megous wrote:
| Everything goes down. But it looks like huge complicated
| distributed services shared by huge amounts of people,
| that are continuously updated and developed, and are
| constantly trying to attract more users/load, seem to go
| down more than a simple service on a simple server.
|
| No hard data though. My mail server only ever went down
| when I upgraded the server and didn't check that
| everything was still working right away, or similar
| maintenance induced incidents. It never went down by
| itself.
|
| Such systems only ever go down unpredictably on HW
| issues, or when overloaded/out of resources. Neither is
| very likely, because you're not trying to grow your
| service in any sense similar to VC backed enterprises.
| Most of the time it has constant very low load and
| resource use. And you can simply stop introducing changes
| to the system if you need more stability for some time.
| (stop updating, for example)
| nix23 wrote:
| The one solution with PLANNED downtime.
| thewebcount wrote:
| > You bring up a good point, however, which is that we
| _could_ use open source, non-centralized alternatives for
| many of the online products we consume, but we choose not to,
| and so we increasingly become slaves to corporations that
| actively seek to narrow our choices.
|
| That doesn't happen for no reason. The vast majority of open
| source products I've used have terrible usability. I simply
| don't want to use them. I don't want to be beholden to
| corporations and walled gardens, but for me, the existing
| alternatives are worse in too many ways.
| mnky9800n wrote:
| Remember when Facebook messenger was xmpp based? Lol.
| rattray wrote:
| Have you tried http://quill.chat/ ? Younger startup (invite-
| only) but very slick.
| corytheboyd wrote:
| Thanks for sharing! They definitely nailed the marketing
| page, I'll keep in my list of products to follow up on :)
| zenexer wrote:
| XMPP is supported by a large number of clients, but running a
| server and getting everyone on clients with comparable
| featuresets is a nightmare. It's a cluster of disparate
| standards, and it's overwhelming. I'm sure it's doable if you
| have the time to invest, but it's not straightforward if you've
| never done it before.
|
| Matrix is pretty straightforward on the server side of things,
| but the client UX is invariably mediocre. Vector--the official
| client--exemplifies everything that is wrong with Electron
| apps. Slow, clunky, poor UI, poor platform integration. With
| the default home server, it can take seconds for a message to
| go through. At least it's far more customizable than Slack; it
| has an option for everything, which, as a power user, I quite
| like.
|
| I haven't tried Mattermost, but it looks like some of the
| important features aren't FOSS, at which point it's just
| another Slack as far as I'm concerned. I'll gladly pay for
| support, but for SSO? Meh, might as well stick with Slack; at
| least everyone and their dog knows how to use it. (This is, of
| course, an opinion that stems partially from ignorance; I
| haven't actually tried Mattermost, and if I do, I might fall in
| love with it. But my time is limited, and I can only evaluate
| so many products in a day.)
|
| Not that Slack is much better here: their threading system has
| so many UI/UX issues. Ever had a thread with hundreds of
| messages? For your own sanity, I hope you haven't. Ever tried
| to send an image to a thread from iOS? It's possible, but only
| by pasting the image into the text field; the normal attachment
| button isn't available, and Share buttons in other apps can't
| send to threads. And, of course, the recent uptime issues.
| Arathorn wrote:
| Element (formerly Riot/Vector) has improved loads over the
| years, and the default matrix.org average send time is around
| 100ms these days rather than multiple seconds:
| https://matrix.org/blog/2020/11/03/how-we-fixed-synapses-
| sca... has details. I suspect you (and the parent) may be
| running off stale data.
|
| That said, Element could certainly use less RAM, irrespective
| of Electron - and http://hydrogen.element.io is our project
| to experiment with minimum-footprint Matrix clients (it uses
| ~100x less RAM than Element).
| rattray wrote:
| > it uses ~100x less RAM than Element
|
| Wow - congrats!!
|
| What have been the most important architectural decisions
| to achieve this?
| Arathorn wrote:
| Rather than storing state from the server in the JS heap,
| new state gets stored immediately in indexeddb
| transactionally and is pulled out strictly on demand. So,
| my account (which is admittedly large, with around 3000
| rooms and 350K users visible) uses 1.4GB of JS heap on
| Element/Web, and 14MB on Hydrogen. It's also lightning
| fast, as you might expect given it's not having to wade
| around shuffling gigabytes of javascript heap around the
| place.
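|
| A rough illustration of the store-first, load-on-demand idea
| described above. This is not Hydrogen's code: it is a Python sketch
| that uses the standard-library sqlite3 module as a stand-in for
| IndexedDB, and the RoomStore class, table layout, and method names
| are all hypothetical.

```python
import json
import sqlite3


class RoomStore:
    """Hypothetical store: state lives on disk, not in a long-lived in-memory dict."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS rooms (room_id TEXT PRIMARY KEY, state TEXT)"
        )

    def apply_sync(self, updates):
        """Write a batch of server updates in a single transaction."""
        with self.db:  # commits on success, rolls back if an exception is raised
            self.db.executemany(
                "INSERT OR REPLACE INTO rooms (room_id, state) VALUES (?, ?)",
                [(room_id, json.dumps(state)) for room_id, state in updates.items()],
            )

    def load_room(self, room_id):
        """Read one room's state only when the caller actually needs it."""
        row = self.db.execute(
            "SELECT state FROM rooms WHERE room_id = ?", (room_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


store = RoomStore()
store.apply_sync({"!a:example.org": {"name": "Ops"}, "!b:example.org": {"name": "Lounge"}})
print(store.load_room("!a:example.org"))
```

| Only the room that is asked for is deserialised; everything else
| stays in the database until it is requested.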
| zenexer wrote:
| It has, and I've been using it since its early days. I
| still use it. It's still terrible, just slightly less
| terrible. And, no, messages don't consistently send in
| 100ms on the default home server; there are regularly
| disruptions that cause significant delays, sometimes as
| much as 10-20sec. That's a big problem for a federated chat
| platform.
|
| Edit 1: I _want_ to love it; the design is everything I
| could ever hope for in a chat platform. I even tried to
| contribute to Vector, but it was such a mess that I
| eventually gave up.
|
| Edit 2:
|
| > That said, Element could certainly use less RAM,
| irrespective of Electron - and http://hydrogen.element.io
| is our project to experiment with minimum-footprint Matrix
| clients (it uses ~100x less RAM than Element).
|
| I'm not sure why this is a priority. Techies complain about
| RAM usage a lot, but if we have to choose between
| performance+power and a small memory footprint, we're going
| to choose the former almost every time. Take Telegram, for
| example: they have a bunch of native clients that perform
| amazingly well, although they do gobble RAM. Most of my
| technical friends use it as their primary social platform.
| It's not without issues, but it's really hard to go from
| something like Telegram Desktop or the Swift-based macOS
| Telegram client to Vector. And those clients aren't made by
| large teams--most (all?) first-party Telegram clients are
| each maintained by a single developer, if I'm not mistaken.
| feanaro wrote:
| It's weird that you're calling it Vector when it's now
| called Element and it was called Riot for years before
| that.
| ryanSrich wrote:
| This is actually impressive, in a bad way. I just have become so
| used to being able to run highly resilient cross region
| infrastructure for millions of users with just a handful of
| people that I forget what real downtime looks like.
|
| For their app to just go completely offline is unacceptable. Bugs
| and degraded services I get. But this is catastrophic.
| MuffinFlavored wrote:
| I can't even begin to guess what went wrong. What are your
| guesses? How many screaming executives are there at Slack
| saying "just roll it back"?
| eecks wrote:
| Mass server migration?
| aphextron wrote:
| >I can't even begin to guess what went wrong. What are your
| guesses? How many screaming executives are there at Slack
| saying "just roll it back"?
|
| Doubtful it's a code issue causing a total system outage. I'm
| assuming they have a bunch of auto scaling infrastructure
| that wound down over the holidays and couldn't take the spike
| this morning.
| glouwbug wrote:
| Well, they did hand off slack to Salesforce
| Xmax wrote:
| Surprisingly mobile client is working for me
| unethical_ban wrote:
| I am genuinely surprised that Slack wasn't ready for people to
| come back from holiday, to view increased queues of unread
| messages, to have to manually login vs. having auth tokens or
| cookies, etc. Either that, or they had a cosmically coincidental
| outage on a really bad Monday to have it.
|
| It's bad enough team comms go over Slack so much now, at least we
| have email fallback. What scares me is for the teams that use
| Slack for system alerting.
| tqi wrote:
| Do you know that was the root cause or are you making an
| assumption and running with it?
| [deleted]
| t-writescode wrote:
| Slack has been in business for several years and has survived
| several December to January transitions, including several
| people stopping using their product before Christmas and then
| returning early January.
|
| It seems a bit presumptuous to assume that's at fault here,
| given their age.
| rrrrrrrrrrrryan wrote:
| Does it? Don't you think their users might be leaning on it
| more heavily this year due to working from home?
| twblalock wrote:
| Now is a good time to recommend to your engineering org that
| they should have multiple alerting methods, e.g. Slack plus
| Pagerduty, or Slack plus email.
|
| Hopefully email won't be your backup. I've seen that done.
| Alerts get filtered and ignored, often by accident.
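|
| A minimal sketch of the "more than one alerting method" idea,
| assuming each channel is just a callable that raises on failure.
| The channel functions here are placeholders and do not call any
| real Slack or PagerDuty API.

```python
def send_alert(message, channels):
    """Try every channel; return the names of those that accepted the alert."""
    delivered = []
    for name, send in channels:
        try:
            send(message)
            delivered.append(name)
        except Exception as exc:  # one failing channel must not block the others
            print(f"alert channel {name} failed: {exc}")
    if not delivered:
        raise RuntimeError("alert could not be delivered on any channel")
    return delivered


def chat_down(message):
    """Placeholder chat channel that simulates an outage."""
    raise ConnectionError("cannot connect")


pager_log = []  # placeholder paging channel: just records the message

channels = [("chat", chat_down), ("pager", pager_log.append)]
result = send_alert("disk full on db-1", channels)
print(result)
```

| The alert still reaches the pager even though the chat channel
| raised, which is the whole point of not depending on one provider.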
| djxfade wrote:
| We get our primary alerts through Slack. However we also have
| SMS and phone call backups through PagerDuty
| CSDude wrote:
| (from Opsgenie) I would imagine it would be the other way
| around for most people.
| djsumdog wrote:
| We have an alert channel in Slack, but it's mostly ignored.
| Our primary alerts come via SMS/VictorOps.
|
| At one of my old jobs, we had SMS via two physical/hardware
| devices in our data center. One had a Telstra SIM card and
| the other had an Optus SIM card. (They were plugged into the
| same machine, but we had plans to put a second one in another
| data center before I left).
|
| If you really care about alerts, you should have physical
| hardware sending your SMS messages via two different points of
| presence.
| dysfunction wrote:
| My coworker's theory was someone was waiting for the holiday's
| end to deploy something risky.
|
| And I'm in that boat of depending on Slack for alerting... in
| fact my team was also waiting over the holidays to deploy more
| robust non-Slack-based alerting (in our defense the product is
| only a few months old and only now starting to scale to any
| real volume).
| bitbuilder wrote:
| I wouldn't be surprised if it's actually a combination of a
| new feature being recently rolled out, along with the sudden
| spike in load this morning.
|
| The holidays are actually the perfect time for Slack to roll
| out a risky deployment, as it has to be their lowest usage
| time. So it would make sense if something was pushed out last
| week or the week before. And everything probably seemed fine.
|
| And then this morning they suddenly realize this new feature
| does not perform under load. And to make matters worse, the
| new feature has been out long enough to make any sort of
| rollback very tricky, if not impossible. Which means they'd
| need engineers to desperately hack out, test and deploy a
| code fix.
|
| If this is the scenario, I do not envy them at all.
| rrrrrrrrrrrryan wrote:
| Holidays are a good time for a _company_ to do a risky
| deployment, but a bad time for an individual employee to do
| a risky deployment, assuming one doesn't want to work
| overtime over the holiday fixing things.
| ellisv wrote:
| This depends on how easy/difficult the rollback strategy is.
| spurdoman77 wrote:
| Depends on how well compensated holiday overtime is.
| There are some employees happy to work overtime if their
| hourly pay is doubled or tripled. However, there are also
| those who wouldn't do that for any price.
| iso1631 wrote:
| Depends how bad it goes wrong. My org is a 24/7 one, but
| one Christmas back in the 90s (way before my time) some
| work was done on Christmas eve, I think it was on the
| phone system, in the days before widespread mobile
| phones.
|
| It broke, which was a major problem, this meant that
| senior management were being phoned (ho), and relatively
| high middle managers were on site to deal with the fall
| out. Of course most suppliers were also closed so
| everything was harder to fix.
|
| There's good reasons not to do changes when places are
| closed, or at least skeletoned, for 2 weeks.
| hinkley wrote:
| Not a bad theory.
|
| I used to work for a place that had a FY that ended in
| summer. We had a lot fewer problems with stuff being shoveled
| out the door at Thanksgiving and Christmas because nobody was
| trying to finish their year-end performance goals over the
| Holidays.
|
| I think what I'm implying is that management creates this
| issue, but we are complicit.
| spelunker wrote:
| Yeah, I think it's this rather than load. Slack should be
| able to handle load fine (probably), but since this is the
| first weekday post-holidays I imagine some deployment broke
| something.
| hinkley wrote:
| The two cliched sources of this problem are 1) someone pushed
| something out over the holidays that could have waited until
| January, or 2) peak capacity was negatively affected since the
| last time a spike happened, nobody had a way to monitor it, and
| so this has been broken since the end of May. On further
| reflection, someone will admit that they noticed a notch-up in
| response times and did not connect the dots.
| raverbashing wrote:
| It might have been the increased usage due to the pandemic
| (since it didn't happen from 2019 to 2020) + the sudden inflow
| of people at the same time.
| majewsky wrote:
| This would be a good explanation for an outage in March or
| April 2020, not so much in January 2021.
| T-hawk wrote:
| There may well have been a bigger delta in usage from
| Sunday Jan 3 to Monday Jan 4 2021, than for any particular
| pair of days in March-April 2020.
|
| Of course 2020 saw an increase, but it was smeared over a
| week or a month rather than being a big jump in a single
| day after everyone's holidays.
| ShaneMcGowan wrote:
| The status page says apps / API are fine, yet I am trying to work
| on a Slack app and can't because of these errors. A bit annoying.
|
| edit: it is now showing as a total outage on the status page
| bmhin wrote:
| It seems weird to say there are issues with connections but
| everything else is working fine. Like is the API technically
| fine on their system metrics but no one can connect to use it
| so it stays as green? Doesn't really help much in practice it
| seems if connections are having issues and everything is
| unusable in practice to keep them as green.
|
| Would be similar if auth was down. You can connect to us, you
| just can't authenticate so can't actually do anything.
|
| Edit: Looks like they updated the status to properly show an
| across the board outage
| brayhite wrote:
| Notion is sluggish as well. That combined with the reports of HN
| potentially being slow, is there some larger network issue at
| play affecting a region of servers, potentially?
| realrocker wrote:
| Same for me. Data is missing in tables too.
| coldcode wrote:
| My feeling is some common infrastructure is failing or
| flailing, like some part of AWS, or some backbone provider. Too
| many flaky things going on at the same time to be independent
| failures.
| joana035 wrote:
| There are many reports of issues with ec2 and console on down
| detector, doesn't surprise me that aws status page is still
| green.
| jmartens wrote:
| My company monitors EC2 performance and availability across
| North America, and EC2 has been fine this morning,
| according to our data (that said, they had some
| intermittent issues the last 3 days).
| coldcode wrote:
| Maybe another internet routing issue, where a bunch of
| traffic is going through some guy's router in Albania. Or even
| someone is actively interfering with a root server.
| krisdol wrote:
| Lever has been down for about the same amount of time as well
| (job recruiting platform).
| someonehere wrote:
| Well, there goes the credibility one team has in arguing that
| Slack makes a great knowledge repository.
| jdc0589 wrote:
| people actually argue this? slack is a great comms tool, and a
| great BACKUP if you can't find something in a real
| documentation/knowledge/etc. repository.
| [deleted]
| aledalgrande wrote:
| I think this is due to AWS. Not only Slack is down (e.g. Notion).
| AWS status page didn't show anything yet, but wouldn't be the
| first time. The last Kinesis crisis didn't show up for hours.
| scrose wrote:
| I find it amazing that we can be about an hour and a half into a
| service being completely unusable (i.e. Slack telling me it 'cannot
| connect'), yet it's still marked as an 'incident' instead of an
| 'outage' in their own status page
| [deleted]
| DarkContinent wrote:
| It's marked as an outage now.
| ghostpepper wrote:
| and yet they're still proudly proclaiming: "Uptime for the
| current quarter: 100%"
| ajkjk wrote:
| Every time this kind of thing happens HNers love to gripe
| about how the status pages aren't correct yet. It's so
| weird -- like the people freaking out about the outage are
| going to be updating their uptime trackers right now or
| something. Who cares? It'll be fixed later.
| mumblemumble wrote:
| This is entirely in line with my experience dealing with
| outages. 85% of the time to fix consists of fielding
| requests for status updates.
|
| It's like when people push the elevator button repeatedly
| if it's taking a while to arrive, only pushing the
| elevator button doesn't cause it to take even longer.
| michaelt wrote:
| Well, who consults a service's status page when it _isn't_
| down? During an outage is literally the only time a
| service status page has any function.
|
| A status page that doesn't get updated during an outage
| is about as much use as a solar-powered flashlight
| (without built in power storage).
| Merman_Mike wrote:
| I think the point is that a "Status Page" should show the
| accurate, current status of the system. Not a place
| holder for "we'll fix it later". People look at a status
| page to know what's happening _now_.
| ajkjk wrote:
| I wasn't talking about the status page, I was talking
| about the uptime % tracker.
|
| edit: oh, sorry, i did say 'status page' in the first
| part. But I kinda meant uptime % tracker like the parent.
| ric2b wrote:
| Enterprise contracts have SLAs about uptime, so it's
| definitely relevant.
| blibble wrote:
| what's the point in a status page that only updates after
| the outage has been resolved?
| pluto9 wrote:
| It doesn't. The status page is currently showing
| information about the outage. And the 100% uptime number
| is probably still correct, since it's only been out for a
| couple of hours.
| yjftsjthsd-h wrote:
| > And the 100% uptime number is probably still correct,
| since it's only been out for a couple of hours.
|
| It's listed as "Uptime for the current quarter"; if they
| mean that as "calendar quarter", i.e. since the start of
| the year, then we aren't even 100 hours into the quarter
| so we should be well below 100% by now.
| hunter2_ wrote:
| You might be correct, but why would anyone care about
| quarter-to-date as opposed to a rolling quarter ending
| now? The latter would mean that an outage of X duration
| will always reduce this statistic by the same amount
| regardless of how close the nearest calendar quarter
| boundary is, which seems like a superior quality for such
| a statistic to have.
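|
| To put rough numbers on the two definitions, here is a small
| calculation. The inputs are assumptions, not Slack's figures: a
| 2-hour outage ("a couple of hours"), 89 hours elapsed since the
| start of the calendar quarter (Jan 1 00:00 to Jan 4 17:00 UTC),
| and no other downtime in either window.

```python
def uptime_percent(window_hours, downtime_hours):
    """Share of the window that was not downtime, as a percentage."""
    return 100.0 * (window_hours - downtime_hours) / window_hours


outage_hours = 2.0            # assumption: "a couple of hours"
quarter_to_date_hours = 89.0  # assumption: Jan 1 00:00 to Jan 4 17:00 UTC
rolling_quarter_hours = 91 * 24

qtd = uptime_percent(quarter_to_date_hours, outage_hours)
rolling = uptime_percent(rolling_quarter_hours, outage_hours)
print(f"quarter-to-date: {qtd:.2f}%")   # 97.75%
print(f"rolling 91 days: {rolling:.2f}%")  # 99.91%
```

| The same outage costs over two points on a quarter-to-date basis
| this early in January, but under a tenth of a point on a rolling
| 91-day basis.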
| yjftsjthsd-h wrote:
| That would be a completely fair metric to publish, but it
| doesn't _look_ like what Slack is publishing. Of course,
| it's possible that it is and it's just phrased somewhat
| poorly.
| pluto9 wrote:
| Fair point.
| overlordalex wrote:
| Interestingly their uptime for the quarter is still 100%
| despite a full-red dashboard. I wonder if that's something
| that is calculated only after an outage is resolved
| sofixa wrote:
| Well until the issue is resolved you can't know how long
| you've been down for, so you can't actually update the
| uptime.
| FartyMcFarter wrote:
| Why not? It could be updated second by second
| automatically if they wanted to.
|
| Probably not a priority though.
| mumblemumble wrote:
| Building out the infrastructure to automatically give
| real-time updates to your uptime figure sounds like a
| terrible use of company resources. Who knows how many
| person hours to spend on implementing and maintaining a
| feature that would remove maybe a few minutes of manual
| work from the incident post-mortem checklist, just for
| the sake of delighting people who need something else to
| look at for a workplace distraction now that Slack is
| down.
| deathanatos wrote:
| Well, now the outage is marked as resolved. And the
| uptime is _still_ "100%".
| Havoc wrote:
| Potentially an AWS issue?
|
| Slack, notion and AWS all at same time seems unlikely
|
| https://downdetector.com/status/aws-amazon-web-services/
| heroHACK17 wrote:
| Just wanted to come here and say, hey! How is everyone doing? How
| was your Holiday break?
| J5892 wrote:
| I managed to completely forget everything I knew about my job.
|
| Send help.
| coldcode wrote:
| I had 1 day off, so basically working all the time.
| thismodernlife wrote:
| I wondered about getting credits for the outage but you can't
| view the SLA page because the app is down.
|
| https://slack.com/intl/en-gb/terms/service-level-agreement
| glouwbug wrote:
| Time to slack
| louiechristie wrote:
| Breaking news: Productivity hits sky high today as tech workers
| forced to work at home and not use Slack.
|
| https://twitter.com/louiechristie/status/1346213038924427265...
| Snitch-Thursday wrote:
| Using our team's backup chatroom in a competing service. One of
| these days P2P Matrix will reach GA, then I plan to make a backup
| for my backups, Starfleet style.
| TeMPOraL wrote:
| That's one obscure reference, I love it.
|
| https://www.youtube.com/watch?v=UaPkSU8DNfY
| GILORA: Starfleet code requires a second backup?
| O'BRIEN: In case the first backup fails.
| GILORA: What are the chances that both a primary system and its
| backup would fail at the same time?
| O'BRIEN: It's very unlikely, but in a crunch I wouldn't like to
| be caught without a second backup.
| iso1631 wrote:
| Makes perfect sense for O'Brien, DS9 had serious backup
| issues in the first couple of years
|
| The Forsaken (season 1 episode 17)
| LOJAL: I've been reading the reports of your Chief of Operations,
| Doctor. They gave me the impression that he was a competent
| engineer.
| BASHIR: Chief O'Brien? One of the best in Starfleet.
| LOJAL: Then why aren't the backup systems functioning?
| BASHIR: Well, you know, out here on the edge of the frontier,
| it's one adventure after another. Why don't I escort you back to
| your quarters where I'm sure we can all wait this out.
|
| Rivals (season 2 episode 11)
| KIRA: My terminal just self-destructed.
| DAX: What?
| KIRA: I lost an evaluation report I've been working on for weeks.
| DAX: Even the backups?
| KIRA: Even the backups.
|
| There's a reason to have a backup to the backup by Destiny
| (season 3 episode 15)
| djsumdog wrote:
| I forgot about that. Starfleet really was in good shape back
| in.
| TeMPOraL wrote:
| Late 2300s were the golden years for Starfleet and the
| Federation. Sad to see they went downhill later on.
| nikolay wrote:
| Can't handle the post-holiday surge, or people wanted to justify
| their long holiday and pushed something only to witness their
| holiday optimism head-crashing on the surface of the reality?
| peter_d_sherman wrote:
| For some reason, today, HN seems exceedingly exceedingly slow (to
| me) after logging in...
|
| Without being logged in, things are as fast as they usually are
| -- but post log-in, _SLOWWWER THAN MOLASSES_...
|
| I tried this several times; why this is, I can only wonder...
|
| To quote Bill and Ted... "Strange things are afoot at the
| Circle-K..."
| [deleted]
| hartator wrote:
| Everyone switched from Slack to HN. :)
| erk__ wrote:
| It is easier to cache stuff for users who are not logged in, as
| it is the same for everyone. And everyone is looking at
| Hacker News at the moment to see what is wrong with Slack, which
| is probably the cause of the slowness.
| manquer wrote:
| While it is true for most applications. HN does not do any
| customisation of the content. I don't notice I am not logged
| in until commenting
| tyingq wrote:
| The point count for most articles is consistently lower on
| a view of the non-logged-in homepage. I assume that means
| they are cached more aggressively for non-logged-in.
| There's also the username and karma count in the top-right.
| reaperducer wrote:
| _While it is true for most applications. HN does not do any
| customisation of the content_
|
| I don't think that's true. For example, if you hide a
| thread while logged in, it remains hidden when you return.
| tayo42 wrote:
| You would cache the rendered html, the front page has your
| username and points and stuff. The whole page will be
| unique to you because of that
| jsteemann wrote:
| From the status page
| (https://status.slack.com/2021-01/9ecc1bc75347b6d1), updated just
| now:
|
| > We're continuing to investigate connection issues for
| customers, and have upgraded the incident on our side to reflect
| an outage in service. All hands are on deck on our end to further
| investigate. We'll be back in a half hour to keep you posted.
| > Jan 4, 5:20 PM GMT+1
| aliljet wrote:
| Slack has been a uniquely iffy service. I wonder if there's a
| solid decentralized alternative.
| fwip wrote:
| https://cabal.chat/ is a good program. It does not support all
| of slack's features, but is truly peer-to-peer so there's no
| central points of failure or servers that can go down. (Well, I
| suppose if they released a buggy version of the software and
| you updated, that's a central source, but that's true of most
| software.)
| cuspycode wrote:
| I've had good experiences with ngircd. It's an IRC server that
| is very easy to self-host, and it can be installed via APT on
| any debian/ubuntu/raspbian etc system, and I'm sure on many
| others.
| waihtis wrote:
| Obligatory BGP hijack prediction, since there seems to be a bunch
| of other sites down too.
| _nickwhite wrote:
| I read somewhere that Slack's yearly uptime SLA is 99.99%, which
| has already been exceeded on January 4th.
|
| Sending big hugs to their ops team.
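|
| For scale, the downtime a 99.99% yearly target allows can be
| worked out directly. The 99.99% figure is the commenter's
| recollection and is not verified here; the function name is
| illustrative.

```python
def downtime_budget_minutes(sla_percent, period_days):
    """Minutes of downtime allowed by an uptime target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


yearly_budget = downtime_budget_minutes(99.99, 365)
print(f"{yearly_budget:.2f} minutes per year")  # 52.56 minutes per year
```

| An outage lasting a couple of hours would use up more than a
| full year's budget under that target.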
| ririyad wrote:
| Notion is also down right now.
| dhbradshaw wrote:
| We just had to route around this so we're trying out
| chat.google.com for the first time. Seems ok.
| king_magic wrote:
| Slack has been failing - hard - the past few months. Yeah, I get
| it, lots of remote workers - but Slack has had months now to
| prepare for an onslaught given the trends with COVID. Simply not
| acceptable.
| marricks wrote:
| Does Google use slack? Wanted to start my year with some extra
| strength tinfoil and it'd just be great if the day a unionizing
| initiative started the major way workers could talk about said
| unionizing initiative went down.
|
| EDIT: according to a random quora post they do, so keep the
| tinfoil out!
| nicioan wrote:
| Funny how status.slack.com has reported Incidents and Outages for
| a while now, but still the "Uptime for the current quarter" is
| reported at 100% on the bottom right of the status table.
| spelunker wrote:
| I would think that number will be updated once the fire is put
| out.
| ProAm wrote:
| One of those affects money via SLAs. Slack is still up, just
| not usable.
| deathanatos wrote:
| > Slack is still up, just not usable.
|
| I.e., it's down.
|
| (And if you're saying that according to the legal blah blah
| blah of the SLA that this isn't _technically_ "down", then
| there might as well not be an SLA.)
| ProAm wrote:
| > And if you're saying that according to the legal blah
| blah blah of the SLA that this isn't technically "down",
| then there might as well not be an SLA.
|
| I am because I've had these exact conversations with cloud
| hosted providers/products. Never once have we been refunded
| according to the SLA in our contracts. Never really down
| (according to legal).
| rocho wrote:
| Up means working. It does not mean that something is
| displayed on the screen.
| MattGaiser wrote:
| Other than the status page, I can't get anything displayed
| on the screen.
| benjaminwai wrote:
| It may depend on how they define the "quarter". If they take
| the quarter as the last 91 days and round the number to the
| closest percent, you might not see it changed unless the
| outages go more than 91x24x0.5% or 10.92 hours. It's quite
| subjective and a guess.
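|
| The arithmetic behind that guess, written out. The 91-day window
| and rounding to the nearest whole percent are the comment's
| assumptions, not documented behaviour of the status page.

```python
window_hours = 91 * 24                  # 2184 hours in a 91-day window
threshold_hours = window_hours * 0.005  # 0.5% of the window


def displayed_uptime(downtime_hours):
    """Uptime over the window, rounded to the nearest whole percent."""
    return round(100.0 * (window_hours - downtime_hours) / window_hours)


print(f"{threshold_hours:.2f} hours")  # 10.92 hours
print(displayed_uptime(2))             # 100
print(displayed_uptime(12))            # 99
```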
| Justsignedup wrote:
| I'm setting up backups for our company on discord. That way maybe
| some webhooks won't be working, but communication resumes.
| tsar_nikolai wrote:
| Todoist reports that it is down as well [0]. I wonder if it could
| be connected in any way, shape or form.
|
| [0] https://status.todoist.net
| djtriptych wrote:
| Could it be the obvious? Everyone signing on / loading slack
| clients at the same time?
| adwww wrote:
| Nice of you to ignore the majority of the world's population
| who have been up and working long before America woke up.
| buzzerbetrayed wrote:
| Relax. GP is clearly referring to an increase in people
| signing on due to the holidays ending and everyone coming back
| to work.
|
| Also, Slack has significantly more users in the US than in
| any other country[1], and it really isn't even close. So the
| offense you're taking is unwarranted anyway.
|
| 1: https://saasscout.com/statistics/slack-stats/
| kevindong wrote:
| Slack makes ~61% of its revenue from US customers which only
| has 4 time zones compared to the remainder of their revenue
| being spread out across ~20 time zones. It's not an
| unreasonable hypothesis.
|
| See page 12 of the document (which is page 14 of the PDF):
| https://d18rn0p25nwr6d.cloudfront.net/CIK-0001764925/70df834...
| blntechie wrote:
| Slack most likely has more US customers but
|
| - Revenue is not the same as users. Slack has tons of free
| users and some countries also have lower priced plans.
|
| - Many companies like Amazon etc. are probably counted as US
| revenue for Slack but they have more than 30% of their
| employees outside the US. This should not be huge numbers
| but significant.
| Karawebnetwork wrote:
| My gut feeling is everyone coming out of the holidays reading
| back weeks of notifications.
| jordache wrote:
| then wouldn't this happen every monday morning?
| unethical_ban wrote:
| Perhaps it is a large number of people checking into channels
| that are backlogged with lots of bot message notifications.
| floatingatoll wrote:
| A lot of organizations essentially took the last two weeks
| off from work, which is long enough for a 10-day autoscale
| window to spin down servers, and then get confronted by a
| load spike that wasn't pre-spun for.
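|
| A toy model of that failure mode, under loudly stated
| assumptions: capacity is sized from the peak load seen in a
| trailing window plus some headroom. The policy, the load
| numbers, and the function name are all hypothetical and do not
| describe how AWS, GCP, or Slack actually scale.

```python
def provisioned_capacity(daily_load, day, window_days=10, headroom=1.2):
    """Capacity for `day`, sized from the peak load in the preceding window."""
    start = max(0, day - window_days)
    recent = daily_load[start:day]
    return max(recent) * headroom if recent else 0.0


# One normal week, two quiet holiday weeks, then everyone returns at once.
daily_load = [100] * 7 + [20] * 14 + [100]
return_day = len(daily_load) - 1

capacity = provisioned_capacity(daily_load, return_day)
print(capacity, "units provisioned vs", daily_load[return_day], "units of load")
```

| With a 10-day window the pre-holiday peak has aged out, so the
| model provisions for the quiet period; a window longer than the
| holiday would still remember the old peak.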
| derwiki wrote:
| I would be shocked if Slack operations wasn't aware of this
| return to work spike and didn't pre-scale in anticipation.
| ABeeSea wrote:
| That doesn't mean they chose the right number to scale
| to.
|
| See for example, Amazon Prime day:
|
| https://www.cnbc.com/2018/07/19/amazon-internal-
| documents-wh...
| floatingatoll wrote:
| I wouldn't, since my personal theory is that the outage
| is due to AWS and GCP autoscale capacity exhaustion.
| We'll find out soon enough!
|
| EDIT: And down goes Notion, too:
| https://news.ycombinator.com/item?id=25634159
| yreg wrote:
| >AWS and GCP autoscale capacity
|
| What does this mean? What do cloud providers do when
| customers scale down their services? Do the providers
| literally power down servers? Do they sell the capacity
| to new customers?
| rocho wrote:
| They sell unused capacity at a much lower price (spot
| instances on AWS, preemptible VMs on GCP).
|
| I don't know if they power down some servers if usage
| stays low for a very long time.
| delfinom wrote:
| They rate limit how fast you can auto scale which is
| dependent on a slew of factors.
| spicybright wrote:
| This is after many had a week vacation. I'm sure most
| weekends some people pop in and out, and logins are more
| staggered on a typical monday morning.
|
| Just a theory though.
| obiefernandez wrote:
| This is definitely going to catalyze a nascent move over to
| Discord for my team. (~80 person consulting agency, distributed)
| floatingatoll wrote:
| Which would be unfortunate if based only on evidence of Slack
| being down today, given how many other sites are down as well.
| (Discord _is_ up, though!)
| obiefernandez wrote:
| it's not based on only that. Slack costs a lot of money and
| moving off of it is something that has continually come up
| over the last year or two. We even had a Rocketchat server up
| and running for awhile.
| pmlnr wrote:
| That's it, I'll move from [closed source, centralized, paid
| service] to [closed source, centralized, paid service]!
| SkyPuncher wrote:
| My biggest frustration with these outages is they're hard outages
| across all of Slack. There are no reasonable workarounds or
| fallback features.
|
| A plaintext web interface would keep my team moving along while
| they resolve their issues.
| zucked wrote:
| Nothing like a reminder of how dependent you've become on Slack
| for communication (and archival of conversations) like an
| outage on the Monday after the holidays when you're not on your
| A-Game yourself.
|
| "Let's see, I'll look up so and so's name with Sla.... shoot"
|
| "Okay, I'll just find that thing I .... nevermind"
| itisit wrote:
| The joys of multi-tenancy.
| ketamine__ wrote:
| Maybe we can chat with coworkers here. Is there a Carl around?
| capableweb wrote:
| I'm a Carl. I'm also looking for a coworker who was trying to
| contact me. If it's about last saturday, I promise nothing
| really happened between me and her, but I'm sure she already
| told you.
| djsumdog wrote:
| Do you guys not have e-mail?
|
| _looks through Inbox of 850 new aws, batch job and logging
| messages_
|
| oh yea, that's right..
| tinco wrote:
| _Don't you guys have e-mail filters?_
|
| "Hey, our site has been down for 2 hours, why aren't you
| doing anything"
|
| _Looks at 850 unread messages in ops-notifications folder_
|
| ooh yeah, that's right..
| iso1631 wrote:
| "Looks at 850 unread messages in ops-notifications
| folder"
|
| In my organization it's spelt "deleted items"
| glouwbug wrote:
| Hey it's me, your Carl, send me your code
| thom wrote:
| I have tried to sell my organisation on a shared Google Chat
| doc for 90s style realtime ICQ chat in times like these, but
| there has been little uptake.
| hn_throwaway_99 wrote:
| G Suite actually has an entire Slack clone, chat.google.com.
| I've been on G Suite (now annoyingly renamed to Google
| Workspace) for years and actually just recently found it
| existed from another comment on HN.
| thom wrote:
| Yeah, this is what we actually use as a fallback, and I did
| push for this as a full-time alternative given we'd get it
| free, but people dislike it for all sorts of frivolous
| reasons.
| J5892 wrote:
| Well now's the time for a big push!
|
| Oh wait, how would you share the link...
| J5892 wrote:
| I'm Carl.
|
| I lost the login for our shared AWS account. Mind sending it to
| me here?
| politelemon wrote:
| Yes it's _______
| jakejarvis wrote:
| root / hunter2
| newman8r wrote:
| A good time to host your own slack-like chat with mattermost
| instead
| ryanisnan wrote:
| While Slack is down, let's remind ourselves that it is not the
| end of the world. To their ops team, good luck in sorting out the
| root cause(s), to mitigating their re-occurrence, and to emerging
| the other side a stronger team. You've got this.
| ghostpepper wrote:
| Hopefully they have a backup system for internal comms
| manquer wrote:
| It is not the end of the world if you are just using Slack for
| intra-team communications.
|
| However, a lot of the monitoring that alerts on Slack, along with
| other automatic notifications, is critical for many teams.
| bombcar wrote:
| Critical systems dependent on another system are just as
| reliable as the third-party system; so this may be a good
| wake-up call for many.
| peeters wrote:
| I'll concede that it's possible to not know what the problem is
| by now, but I won't concede that this should not be called an
| "outage" at this point.
| aerovistae wrote:
| I initially misread this as saying that you won't concede it
| shouldn't be called an outrage.
| richardwhiuk wrote:
| Huh? It's definitely an outage.
| ghshephard wrote:
| Amidst all the double negatives, I think that's what the
| parent poster was saying.
| deathanatos wrote:
| It's a red "do not enter" esque thing now, but when the
| parent posted, I think it was still a yellow triangle.
|
| But also, the status page still proudly proclaims that the
| "Uptime for the current quarter: 100%" -- which is clearly
| false at this point.
| sweezyjeezy wrote:
| That's what they mean - double negative. It has also been
| upgraded on their side to 'outage'.
| keehun wrote:
| Assuming this is a bad deployment--not hardware/network issues:
| It will be interesting to read their post-mortem, on why rollback
| still has not happened yet after 2 hours of outage. You would
| hope that a service the level/popularity of Slack would plan for
| deployment-related outages and be able to roll back a deployment.
| hn_throwaway_99 wrote:
| Just a note, if your company uses G Suite, chat.google.com exists
| and is basically an entire Slack clone. We use it as a backup
| when Slack goes down (obviously doesn't help for bots and ChatOps
| we've set up, but works well for realtime work chat).
| hacker_newz wrote:
| Calling it a clone is a stretch. There are a ton of features
| missing.
| twistedpair wrote:
| Our org just failed over to GChat as well. Piece of cake.
|
| Quite glad we never moved any critical ops work into Slack
| bots, since we don't control Slack.
| mahdyarhp wrote:
| Give Telegram a chance. It's worth it! telegram.org
| jamespwilliams wrote:
| Do you recommend it for use by teams?
| ARandomerDude wrote:
| > Customers may have trouble connecting or using Slack
|
| I can't stand how marketing speak pervades every sphere of the
| world. Their entire system is offline (inconvenient certainly,
| but it happens) and they can't bring themselves to say "Slack is
| down. We're working on it and will be back ASAP." or something
| similar. Instead we _may_ have trouble.
| rflrob wrote:
| I'm still logged in on mobile and can communicate with people
| from my team, but cannot log in from desktop. With so few
| people able to connect, it's also unclear whether Slack is
| eating my messages or there's just no one to respond. So I'd
| certainly rank that as "trouble using slack" rather than "the
| system is completely down".
| the_duke wrote:
| Well, most outages start with issues that increasingly get
| worse.
|
| That apparently was also the case here. I started having
| smaller connectivity issues before it went down completely.
| tshaddox wrote:
| Why do you consider that to be "marketing speak?" It appears to
| be concise, direct, and accurate. The phrase "Slack is down,"
| even if true by some interpretations (it hasn't been
| "completely down" from what I have seen), is imprecise and
| informal.
| ekianjo wrote:
| It was pretty clear the 'may' is a euphemism when your whole
| system is down.
| [deleted]
| danepowell wrote:
| There's a wide gulf between "some customers may have trouble
| using Slack" and "most/all customers are completely unable to
| use Slack". Putting aside formality, I'd say "Slack is down"
| is in fact more accurate here (assuming that it is true that
| most users can't use it, which is true for our company at
| least).
| ChrisRR wrote:
| Because it's not that you "may" have trouble
|
| If their service is down, you will have trouble. The service
| will be absolutely inaccessible. Don't give people hope with
| "may"
| tshaddox wrote:
| But 1) it has apparently not been the case that the service
| was "absolutely inaccessible" and 2) "Slack is down" is
| still very imprecise and not a great alternative even if
| the service had been "absolutely inaccessible."
| Lammy wrote:
| To me it's mildly irksome in the same way as people who say
| "may or may not". Like, yes, those are the two possibilities,
| thank you.
| briffle wrote:
| I agree with you in principle, but I have had no problem
| connecting to Slack today (I have a free one I use with
| friends, not a business account) so to say they are down would
| also be inaccurate.
| mepiethree wrote:
| The funniest part to me is that their status page still says
| "Uptime for the current quarter: 100%". These uptime messages
| are so BS. Heroku reports 6 9s of uptime for this month, even
| though _their own status page_ shows multiple days with
| incidents >6 hours
| Havoc wrote:
| Yeah the amount of airgapped uptime dashboards in SV currently
| is insane.
|
| Even the major clouds...hn is going wild about it yet the
| dashboard says all good.
| grecy wrote:
| Someone's performance bonus depends on it, you can bet there
| is going to be A LOT of heel dragging when it comes to
| updating those statuses!
| pluc wrote:
| Well that's what happens when Legal joins the fun and starts
| defining what "downtime" means
| jeffbee wrote:
| How do you know it's down completely? Maybe it's down for you
| and maybe even down for a majority but still up for some
| subset. Happens with many products.
| BurningFrog wrote:
| Yeah, I _don't_ know, because the Slack status page is so
| vague.
| [deleted]
| ARandomerDude wrote:
| https://status.slack.com/
|
| Every service is marked as "Outage" as of now (also when I
| wrote the comment).
| ellisv wrote:
| yet also "Uptime for the current quarter: 100%"
| delecti wrote:
| Maybe that's just for the outage tracker. _It's_ up.
| jf22 wrote:
| I don't see this as a big deal. Not all metrics have to
| be real time.
| gog wrote:
| The outage is not for everybody, I can connect.
| politician wrote:
| Thundering herd. If you can avoid it, don't connect.
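| For clients that do have to reconnect, a rough sketch of the usual
| mitigation: exponential backoff with random jitter, so that clients
| which dropped at the same moment do not all retry at the same
| moment. The function name and parameters are hypothetical and say
| nothing about how Slack's own client behaves.
|
```python
import random


def backoff_delays(attempts, base=1.0, cap=60.0, rng=None):
    """Return one randomized wait (in seconds) per reconnect attempt.

    "Full jitter": each wait is drawn uniformly from
    [0, min(cap, base * 2**attempt)], which spreads retries out
    instead of synchronizing them.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


if __name__ == "__main__":
    for i, delay in enumerate(backoff_delays(6)):
        print(f"attempt {i}: wait {delay:.2f}s before reconnecting")
```
|
| The cap keeps the worst-case wait bounded; the jitter is what
| actually breaks up the herd.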
| jonwachob91 wrote:
| It's been a rollercoaster for me the last few hours,
| sometimes servers are up sometimes they are down. Point
| being, they are intermittently up :/
| remyp wrote:
| I'm willing to bet this is influenced more by SLAs and Slack's
| lawyers than marketing speak.
| tvorm wrote:
| As someone in marketing, it's a little bit of this, and a
| little bit of determining what the most default, catch-all
| statement could be well ahead of time to make "crisis comms"
| that much smoother.
| Fauntleroy wrote:
| Probably just some technicality to try and escape litigation
| wrt SLAs for their big corporate contracts.
| elbrian wrote:
| Very strange to be so upset by an accurate and concise
| statement, while offering an alternative that isn't even
| factually true.
| Reebz wrote:
| Not marketing. That type of language comes from legal and the
| "never proactively admit fault" mantra
| MildlySerious wrote:
| The status page might just lack a branch for when everything is
| down entirely and only differentiates between "all green" and
| "not all green".
|
| I assume this doesn't happen all that often.
| anon34234 wrote:
| This is probably for legal reasons, i.e. Service Level
| Agreements. "May" leaves the door open to other interpretations
| and reporting from other systems.
| ak217 wrote:
| It never went down for me.
| res0nat0r wrote:
| It's not entirely offline though. I was connected via my phone
| ~90 minutes ago when I first got online today and never had any
| issues and was able to tell folks at work my PC connectivity
| may be spotty for a while. When I signed in via my Mac laptop I
| wasn't able to connect for about 20 minutes, and was redirected
| to the status page. I've been online for about an hour now.
| cozzyd wrote:
| I did manage to receive a message a few minutes ago, so it
| might be just mostly dead.
| NyxWulf wrote:
| If it's all dead, there is only one thing you can do.
| unreal37 wrote:
| It's not down for me...
| x3n0ph3n3 wrote:
| It's not down completely as I'm chatting with my coworkers on
| it now.
| tannhaeuser wrote:
| "We're experiencing increased service degradation" is so
| 201x-ish
| AlotOfReading wrote:
| I find it hilarious that the status page is still saying the
| uptime for the current quarter is 100%. I'd think it'd have
| lost at least one 9 by any obvious definition of "current
| quarter".
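| A back-of-the-envelope check of that claim. The inputs are
| assumptions: roughly 2 hours of downtime (the figure mentioned
| elsewhere in this thread), measured either against a full 90-day
| quarter or against the few days of the quarter elapsed so far.
|
```python
import math


def availability(downtime_hours, period_days):
    """Fraction of the period during which the service was up."""
    total_hours = period_days * 24
    return 1.0 - downtime_hours / total_hours


def nines(avail):
    """Number of leading nines, e.g. 0.9995 -> 3 (i.e. 99.9x%)."""
    if avail >= 1.0:
        return math.inf
    return math.floor(-math.log10(1.0 - avail))


if __name__ == "__main__":
    # Assumed scenarios: full quarter vs. quarter-to-date (about 3.7 days).
    for label, days in [("full 90-day quarter", 90), ("quarter to date", 3.7)]:
        a = availability(2, days)
        print(f"{label}: {a * 100:.3f}% uptime, {nines(a)} nine(s)")
```
|
| Either way the figure cannot still be a flat 100%.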
| jf22 wrote:
| Maybe it's not updated in real time? I wouldn't publish my
| team's uptime metrics while a crisis was happening...
| ourcat wrote:
| "Something's not quite right"
|
| Another classic.
| kaszanka wrote:
| "Oopsie woopsie!"
| wpm wrote:
| "Shit's fucked yo, send whiskey"
| war1025 wrote:
| For the record, I am logged in and have exchanged messages with
| at least one other person. The rest of my team does seem to be
| unable to get in though. Maybe it's because I have just had the
| Slack tab left open in my browser since before I left for
| Christmas?
| [deleted]
| orthecreedence wrote:
| "It will replace email."
| minitoar wrote:
| I mean...wasn't gmail (which effectively IS email for many,
| many people) down recently?
| jolmg wrote:
| > which effectively IS email for many, many people
|
| Doesn't have to be, though. One person doesn't even have to
| tie their address to a single provider, and seeing past
| received messages doesn't even need internet connectivity.
| jll29 wrote:
| Just a note to say "thanks" to the Slack team for the uptime when
| Slack is _not_ down, it's been incredibly useful as a tool to me
| when other enterprise systems (Teams, Outlook & co.) have been
| down over the last couple of years, and especially throughout
| 2020.
|
| Somehow Slack is very resilient in general. I also appreciate its
| UX/UI being far superior to Teams.
|
| Ultimately, the cloud is often a single point of failure that
| companies become over-dependent on. So I'd favour a free (as in
| freedom) and open source self-hosted/deployed alternative if
| there was one (even if it was from Slack and for pay). I agree
| with most on here that there isn't such a thing yet - but it's
| well worth building! So those of you out there who are
| considering implementing "yet another text editor", maybe this is
| something to work on.
| deeblering4 wrote:
| I wonder if this is one of the larger natural drops/spikes of
| legit users that their infrastructure has seen?
|
| * lots of users are coming back to work after the holidays today
|
| * lots of users take the holidays off and fully disconnect
|
| * significant new users added in 2020, with so many teams going
| remote
|
| Sounds like a possible recipe for infra scaling issues and/or
| cascading failures to me
| barathvutukuri wrote:
| Should I contact Salesforce?
| keehun wrote:
| Looks like Slack just updated their status page to show a
| complete outage, not just an incident for "Messaging" and
| "Connections."
| musing-penguin wrote:
| IMO it's really poor Slack took 1 hour to update this to an
| outage, given the impact this seemingly had right from the off.
|
| It's also extremely bad that we're 1 hour in, and they are
| still "investigating", with no more details than that.
| lordnacho wrote:
| I'd be a little nervous if I'd recently bought Slack for $20B.
|
| It's not like there aren't alternatives. You could even imagine
| someone has a live bridge between Mattermost and their Slack
| team, making the switchover seamless.
| jmartens wrote:
| Why be nervous? Outages happen. If this were a string of major
| issues over a few weeks or months, that might be cause for
| concern, but a single incident is not.
| wrycoder wrote:
| c-suite politics are brutal. There is always a reason to be
| nervous, it's just a matter of degree.
| joana035 wrote:
| Lots of services quite red on https://downdetector.com
| MR4D wrote:
| Interesting that PG&E is on the list for power outage in SF.
|
| Wonder if that's related?
|
| https://downdetector.com/status/pge/
| ceejayoz wrote:
| Down Detector largely seems to track daily workplace usage
| patterns more than meaningful outages.
| joana035 wrote:
| Which is great to detect common issues across many companies.
| For example, clicking on the cards shows that many of them
| are related to "network connection".
| ceejayoz wrote:
| It's not, though. They're all spiking because everyone got
| back from the weekend. Just look at the comments.
|
| H&R Block's page there has this as the most recent comment,
| from an hour ago:
|
| > My sister was able to have a bank pull money off her card
| yes her old card dunno which bank ill find out in bit
|
| Two hours ago:
|
| > I went to atm and thought I was crazy my pin wasnt
| working.
|
| These reports are entirely useless.
| jmartens wrote:
| Unfortunately, Down Detector doesn't actually monitor these
| services, so we don't know if they are truly down. Down
| detector relies on human behavior, and we all know humans don't
| act rationally.
| geerlingguy wrote:
| I got dropped into not-dark-mode with a connection issue message
| in each of my workspaces.
|
| I guess everyone hopping back online over the course of a few
| hours for the new year is too much to handle!
| joshxyz wrote:
| Early 2021 downtime jeez good luck ops team i believe in you
| exhaze wrote:
| When I was at Uber, we noticed that most incidents are directly
| caused by human actions that modify the state of the system.
| Therefore, a large "backlog" of human actions that modify the
| system state has a much higher chance of causing an incident.
|
| My bet is that this incident is caused by a big release after a
| post-holiday "code freeze".
| cratermoon wrote:
| I have definitely worked in places where the times right before
| and right after a change freeze were the most unstable, so that
| could be it. However, as others have mentioned, it's pretty
| early on the west coast of the US. Unless some engineer was up
| extra early (perhaps at the behest of an anxious project
| manager) it seems unlikely to be a release.
|
| What it could be is some engineer somewhere coming in after the
| holiday, noticing a slightly flaky thing, and thinking, "I'll
| reboot/redeploy/refresh this thing so the flakiness doesn't get
| worse". Only it turns out the flaky thing was a signal of
| something else falling over. Or maybe the redeploy was the
| wrong version because of bad CI/CD, or maybe the person just
| fat-fingered it.
| ikiris wrote:
| Most releases are automated with time lockouts.
| cratermoon wrote:
| In what companies?
| ikiris wrote:
| Competent ones like those you'd hear about being down on
| HN.
|
| At least that's how it worked at one FAANG
| savo92 wrote:
| Or unless that engineer was not in the US
| cratermoon wrote:
| Very possible. I don't know what Slack's workforce
| distribution is. In places I've worked there have
| definitely been some incidents in US off-hours triggered by
| someone on the other side of the world.
| alfalfasprout wrote:
| This is very likely a broken release. The timing lines up with
| pacific time too well.
| NewEntryHN wrote:
| Slack does progressive roll-outs. The broken release
| hypothesis seems very unlikely.
| kevinmchugh wrote:
| They declared the issue at 7:14AM PST. How long is their
| deploy process?
|
| That sounds pretty early to think somebody on the west coast
| did something, other than maybe acknowledge the pages and
| declare the incident.
| [deleted]
| exhaze wrote:
| To elaborate a bit more on this point, you have to think about
| it like any complex system failure - it's almost never one
| thing, but rather a combination of many different factors. The
| factors around post NYE releases:
|
| - high risk changes that weren't released pre-holidays get
| released. Depending on the company, this could mean a 1-week to
| 1-month delay between implementation and release. The greater
| that interval, the greater the divergence between the world of
| production and the world of the new feature
|
| - lots of new hires (new year = new hiring budget). New hires
| are missing some tribal knowledge about the system and make a
| production-breaking release.
|
| I tried to think of other reasons, but these two overwhelmingly
| stand out as the two biggest reasons. Would love to hear from
| others.
| brundolf wrote:
| Sudden surge of traffic as all their users return to work?
| ciceryadam wrote:
| Could be, it's the perfect time overlap between US-West,
| US-East, and Europe.
| johnmaguire2013 wrote:
| Yes - I wondered if they took some servers down prior to
| the break as a cost saving measure, and forgot to reinstate
| them.
| fragmede wrote:
| Doubtful. It's not _impossible_ a company the size of
| Slack would be reliant on a specific engineer logging on
| in the morning before a traffic spike so the service can
| handle the spike in load, but that's a misuse of modern
| distributed cloud-based computing.
|
| Hate on the cloud all you want, but AWS has (several
| flavors of) load balancers and various ways to
| automatically scale up and down resources (and if you're
| conservative, you can disable the 'down' part). If you're
| operating a major SaaS company like Slack and not taking
| advantage of them, something's gone wrong.
| adrianpike wrote:
| I think you're right on the first bullet, but not the second.
| If it was mid-Feb, then maybe, but the next FY hasn't even
| started yet for a ton of companies, let alone onboarding
| newbies to production.
| lwedel wrote:
| I would add here the potential scaling issue - holidays were
| a dry season - fewer meetings. So if they have some automation
| for scaling down to reduce cost, it may have bitten them in
| their arses now.
|
| People came back to work, and most of them start around the
| same time (US wise at least).
|
| Hence kids - a vital lesson for all of us - don't start the
| call at a full hour, give it 3-7 min to make your coworkers
| confused and give some time for the systems to auto-scale ;)
| kevinmchugh wrote:
| If new hires tend to break production, it's not in the first
| business day of the calendar year. December gets really quiet
| for recruiting, typically, as candidates get busy with their
| social lives, and scheduling interviews gets harder.
|
| January is busy for recruiting, but given a week or two of
| interviewing and negotiating, two weeks notice, it's probably
| February before new employees are starting, and they're not
| making big, production-damaging deploys for a week or two
| after that.
| likpok wrote:
| You will also get a pause in new hires in late December for
| the same reason. I've certainly accepted an offer late in
| the year and then didn't start until the new year.
|
| Probably not as big of a rush as the end of school year
| rush in summer though.
|
| I also doubt that new people will be breaking production on
| day one. Even at a fast moving startup I'd expect it to
| take a bit to go through the onboarding paperwork, get a
| laptop and actually try pushing a change to production.
| rwc wrote:
| Seems to be more than that. Even slack.com in an incognito
| browser fails.
| zwily wrote:
| What does an incognito browser have to do with anything?
| johannes1234321 wrote:
| It means that one is not sending a session cookie of any
| kind, thus should be sent to a 100% cached version. No "Are
| you XYZ and want to log into ABC's Slack again?" box.
| sbilstein wrote:
| non-logged in user may not go through all the same
| codepaths as a user with cookies present.
| SQueeeeeL wrote:
| That means it's not a user auth error
| derin wrote:
| An incognito browser would ignore all client-side cookies,
| so the Slack web client would not try to - say - resume a
| previous user's session or re-use any previously saved
| data.
|
| Likewise, incognito mode will also ignore most cached web
| content, meaning all assets on the Slack web app will get
| loaded again from scratch. This "clean state" start could,
| theoretically, get around issues with old - potentially
| incorrect/outdated - assets being loaded, even though that
| really shouldn't happen under most circumstances.
| zwily wrote:
| Sure, but why does that indicate the issue probably
| wasn't related to a code push, like the person I
| responded to said?
| NewEntryHN wrote:
| Another common cause is resource exhaustion as a result of
| poorly monitored resources (or bugged monitoring). For example
| Google's authentication was down because their system reported
| (wrongly) available quota of 0. The last two incidents at my
| company were also related to resource exhaustion.
| ThePadawan wrote:
| This is one of the original reasons to go capital-A Agile.
| Make smaller releases more often, so at least if something
| breaks, it's (hopefully) something small, and at least it's
| easier to trace.
|
| (I'm not making a statement if that's good or bad or if it
| works or whatever. Please don't read an opinion into it.)
| erik_seaberg wrote:
| This. If you roll many changes into a single deployment, you
| don't know which change broke what. But if you have two or
| three weeks of commits waiting, it's hard to do otherwise.
| nkassis wrote:
| I would bet it's just the influx of traffic post holiday with
| systems that haven't been updated in so long maybe some
| annoying memory leaks have crept up and gone unnoticed or some
| other bad state that was exacerbated by return to work day for
| most NA folks. Code freezes were good at identifying bugs that
| only show up after long periods.
|
| Doubt anyone releasing big changes Monday morning.
| hnlmorg wrote:
| That might be true but when you take the global usage of
| Slack and their respective time zones, more than half the
| world would have signed into Slack this morning before SV had
| and I certainly didn't notice any downtime this morning in my
| time zone.
| radicalbyte wrote:
| It was ropey before SV woke up, I thought it was just my
| (normally rock solid thanks to using Ubiquiti) network
| having issues.
|
| Guess it was Slack being Slack.
| [deleted]
| bobthepanda wrote:
| What would make that strange? Where I work it is frowned upon
| to do releases on weekends and so bad changes due to buildups
| happen on Monday.
|
| Although, we also don't close the pipeline for just any
| holiday break. In fact low holiday traffic is a good time to
| keep pipelines open, since changes will impact fewer people.
| exhaze wrote:
| I haven't worked at Slack, so I can't speak with high
| confidence. A traffic spike is a possible reason, but I'm
| willing to bet that it's not the reason:
|
| > Doubt anyone releasing big changes Monday morning.
|
| This is definitely an engineering best practice, and by best
| practice, I mean something that Uber's, I mean Slack's SRE
| team strongly pushed for, and got politely overruled on.
| After a code freeze is lifted, it's quite common for lots of
| promotion-eager engineers to release big changes.
| throwaway201103 wrote:
| Interesting, I've never worked anywhere where engineers
| decide when to release changes. That's a product decision,
| and there is a process of review and approval at both the
| code level and the functional/end-user-experience level
| that has to happen first.
|
| Did you mean that literally? E.g. is it common at Uber that
| engineers can release changes to production on their own?
| agrippanux wrote:
| In my experience it's not promotion-eager engineers that
| want to push after a code freeze, it's antsy product
| managers. YMMV tho.
| glouwbug wrote:
| What's there to change in Slack, though? It's arguably a
| messaging system, and that feature is tried and tested.
| That, and giphys, to be honest.
|
| EDIT: Guys it was a joke, chill
| VectorLock wrote:
| HN's tolerance for jokes and sarcasm is extremely low.
| [deleted]
| brlewis wrote:
| I'm not sure about that. I feel like I get more upvotes
| from sarcasm and jokes than from insight. In this
| instance, I think it's because when people hear something
| dumb said seriously in real life, they're not going to
| readily recognize online that it's a joke.
| Cederfjard wrote:
| Yeah, Poe's law applies here. That's definitely something
| someone less informed might say in earnest.
| godot wrote:
| IMO it really doesn't have to be promotion-eager
| engineers or antsy product managers. I'm fairly satisfied
| with my role and comp and work type with where my
| career/life-stage is. I just did a code release first
| thing this morning, not because I am promotion-eager, but
| just because I'm picking back up where I left off, like
| any normal day. Granted I work at a much smaller company
| than Slack with orders of magnitude less traffic.
| Thaxll wrote:
| You just don't deploy something major the first day after a
| two-week vacation; it does not make any sense.
| matsemann wrote:
| Why? I had a rewrite of some core logic the last day
| before Christmas that I didn't deploy, as it wasn't time
| critical to get out and I didn't want to be disturbed
| during holidays. Today it was perfect to deploy, as I can
| watch it the whole week if needed.
| hacky_engineer wrote:
| Yeah, I do this all the time. I don't want to be bothered
| on the weekend, so I push releases at the beginning of
| the week when possible.
| devilduck wrote:
| lol Good Luck!!
| johnmaguire2013 wrote:
| Well, I think it probably depends on where you work. At
| my work, people just took 2-3 weeks of time off. It takes
| a moment to get your head back in the game.
| adrianpike wrote:
| Everywhere I've worked often has a massive backlog of
| things that get released after a moratorium or extended
| holiday week. Those are usually the worst weeks to be
| oncall since things are under so much churn.
| beamatronic wrote:
| It depends on the goal you're trying to accomplish. Are
| you going for a promotion or bonus? Or instead is your
| goal to maximize uptime?
| adambyrtek wrote:
| I doubt that regularly releasing breaking changes that
| reduce uptime is a good strategy to get a bonus or
| promotion.
| Aperocky wrote:
| Does Uber/Slack not release in CI/CD? At least in backend?
|
| I don't see any need to deploy a big change at once in the
| software world today. At worst feature gate the thing you
| want to do and run it in a beta environment, but still push
| the actual code down the pipeline.
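| A minimal sketch of the feature-gating idea, assuming a
| percentage-based rollout keyed on a stable hash of the user ID. The
| flag name and the `in_rollout` helper are hypothetical; real systems
| usually read the percentage from a config service rather than a
| function argument.
|
```python
import hashlib


def in_rollout(flag, user_id, percent):
    """Deterministically decide whether a user sees a gated feature.

    The same (flag, user_id) pair always lands in the same bucket,
    so raising `percent` only ever adds users, never flips them.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < percent


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    enabled = sum(in_rollout("new-composer", u, 10) for u in users)
    print(f"{enabled} of {len(users)} users see the gated code path")
```
|
| The code ships dark at 0% and the gate is opened gradually, which
| keeps each deploy small even when the feature itself is large.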
| exhaze wrote:
| > run it in a beta environment
|
| Every Uber/ex-Uber engineer is nervously chuckling at
| this comment right now
| aeyes wrote:
| For those that don't know what this comment is about:
| https://eng.uber.com/multitenancy-microservice-architecture/
| xtracto wrote:
| Aaah the wonders of not having to be PCI or SOC2
| compliant...
| yjftsjthsd-h wrote:
| I'm actually more confused after reading that. I assumed
| that you meant that tested in production on purpose, but
| it _sounds_ , at a skim, like they do non-prod testing
| environments - in fact, it looks like they've gone to
| having _multiple_ beta environments of every service?
| aeyes wrote:
| My understanding is that they have a "tenancy" variable
| in every service call which can take a different code
| path. They seem to only have one environment for
| everything and do tests/experiments at code level based
| on this variable.
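| A toy sketch of that reading, not Uber's actual code: every request
| carries a tenancy value, and the handler branches on it so test
| traffic runs through the same deployment but lands in isolated
| state. All names here (`RequestContext`, `charge_rider`, the
| "production"/"testing" values) are made up for illustration.
|
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    user_id: str
    tenancy: str = "production"  # hypothetical values: "production", "testing"


def charge_rider(ctx, amount_cents, ledger):
    """Record a charge; non-production tenancy goes to a sandbox bucket."""
    bucket = "prod" if ctx.tenancy == "production" else "sandbox"
    ledger.setdefault(bucket, []).append((ctx.user_id, amount_cents))
    return bucket


if __name__ == "__main__":
    ledger = {}
    charge_rider(RequestContext("rider-1"), 1250, ledger)
    charge_rider(RequestContext("test-bot", tenancy="testing"), 999, ledger)
    print(ledger)
```
|
| The catch is that every service on the call path has to propagate
| and respect the tenancy value.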
| yjftsjthsd-h wrote:
| Ah, thanks; that explains it nicely
| dang wrote:
| All: large threads are paginated, especially today when our
| server is steaming. Click More at the bottom of the thread for
| more comments, or like this:
|
| https://news.ycombinator.com/item?id=25632346&p=2
|
| https://news.ycombinator.com/item?id=25632346&p=3
|
| (Yes, these comments are an annoying workaround. Their hidden
| agenda is to goad me into finishing some performance improvements
| we're badly in need of.)
| davesque wrote:
| Considering how ubiquitous slack use seems to be in a lot of
| major tech companies, I wonder if it's reasonable to ask whether
| or not the stock market's performance this morning is somehow
| correlated?
| korethr wrote:
| Well, they're mostly back up now. I'm curious to see what will
| come of the postmortem, and if that report will be made public,
| even if only in part.
| gavnewalkar wrote:
| My slack (desktop + mobile) has been down for the past 30~ mins.
| Strangely I can still receive messages/alerts on my phone.
| ajb wrote:
| Me too. However it won't let me open DMs to certain people.
| curlypaul924 wrote:
| I cannot get to Slack on my phone or in the browser.
|
| I wonder if this is because I haven't used the phone app in a
| few days, so I was already logged out, but you and others were
| still logged in?
| dhagz wrote:
| Same, but when I go to view the message it hangs.
| traumivator wrote:
| Same, I still see my colleagues typing but that's it.
| Topgamer7 wrote:
| I too am in this same boat.
| [deleted]
| tedmiston wrote:
| I've experienced this with Slack before where the push
| notifications come through but opening them fails to load.
|
| I imagine their infrastructure to send push notifications is
| decoupled from their infrastructure for chat services
| themselves.
|
| It'd be interesting to know if they have a master switch to
| disable notifications in times like this where they aren't
| usable anyway.
| rocho wrote:
| This happens to me almost every day, when there are no
| incidents/outages. And it's not a network issue, the other
| apps work fine (e.g. WhatsApp).
| kmichler wrote:
| We're using Google Chat again. Feels ancient.
| https://chat.google.com/
| igetspam wrote:
| If you're using gsuite already, it's a usable failover. I send
| all my alert notifications there, as a fallback already.
| Dragging people in was trivial. It's better than the group SMS
| that one person tried to use.
| adrianpike wrote:
| Same - and it's just... weird. The "everything must be in a
| thread" model feels really clunky.
| kmichler wrote:
| That's a good point, however threads do tend to help with
| keeping things organized when in a channel.
|
| It actually might be a good thing that everyone doesn't feel
| the need to look at slack every X minutes.
| politelemon wrote:
| Subjective... I've found Slack and co's interspersed
| conversations far too chaotic, and temporal; threading is a
| great way of organising many different concurrent topics.
|
| And to be clear I don't mean Slack's implementation of
| threads which is hiding it away in a separate panel and which
| doesn't get used by everyone either.
| johnc1231 wrote:
| I find Zulip to be a nice balance. Threads are much more
| prominent than they are in Slack, but aren't clunky.
| elyseum wrote:
| Slack down, productivity up!
| antisthenes wrote:
| Service Interruption as a Service.
| killjoywashere wrote:
| I noticed something about dead slack channels on the GCP console
| last night, which I thought was odd. Anyone see something
| similar?
| [deleted]
| obventio56 wrote:
| I'm just enjoying it while it lasts
| tuckerpo wrote:
| Time to talk to my coworkers in person... _dry heaves_
|
| /s
| forgetfulness wrote:
| I'm fairly certain that dry heaves are not a COVID-19 symptom,
| at least.
| tempest_ wrote:
| Who knew Salesforce could work this fast :p
| [deleted]
| chrisseaton wrote:
| It's snarky - snarky comments are against the guidelines
| here.
| hackerpain wrote:
| Indeed.
| btbuildem wrote:
| We were just joking with the work mates -- SF bought Tableau 2
| years ago and haven't ruined it yet, only because it takes them
| that long to do anything ;)
| ellisv wrote:
| Can't ruin something that's already ruined.
| tmsh wrote:
| They had the same issue 3 months ago:
| https://news.ycombinator.com/item?id=24687957
| technick wrote:
| Yay Salesforce!
| siruncledrew wrote:
| $28B well spent!
| joshuaellinger wrote:
| And... it is back (for us at least.)
| holler wrote:
| If anyone is looking for an alternative way for fast and seamless
| chat with colleagues, friends, or strangers, you're welcome to
| check out Sqwok (https://sqwok.im)
|
| Although it's built as a live news discussion site versus a team
| messaging app, the topics can be about anything, are public, and
| inviting others is as simple as sharing the url of the post
| (mobile/desktop web).
|
| Example (reposted this hn post to sqwok):
| https://sqwok.im/p/Q3-1AZFLCSpjew
| abanayev wrote:
| I feel obligated to mention this, which was posted a mere 8 days
| ago.
|
| https://news.ycombinator.com/item?id=25550685
| cheschire wrote:
| PACE = Primary, Alternate, Contingency, Emergency
|
| If you haven't been able to justify testing your PACE plan with
| your bosses lately, now's a great time to go ask again.
| platetone wrote:
| oh good, it's not just me and my already bad first day of the
| year... is it too early to start drinking?
| louffoster wrote:
| no
| iso1631 wrote:
| You should probably ask Jimmy Buffett
|
| https://www.youtube.com/watch?v=BPCjC543llU
| _underfl0w_ wrote:
| > is it too early to start drinking?
|
| Depends on the timezone you're in, though one could
| theoretically cite a disparity between physical and
| mental/emotional/temporal time zones...
| joeblau wrote:
| I have friends at many multi-billion dollar companies who are all
| just twiddling their thumbs right now.
| itisit wrote:
| Me too. Although that's occurring irrespective of Slack's
| system status.
| joeblau wrote:
| LOL! Nice, yeah mine is directly related to Slack being down.
| A lot of text messages right now.
| hivacruz wrote:
| Never deploy a new release on Friday and Monday!
| schoolornot wrote:
| How many more outages until all trust is eroded and competing
| services differentiate themselves on the basis of uptime?
| Thaxll wrote:
| None because all competing services have some problems at some
| point.
| dijit wrote:
| If you're asking genuinely then I can tell you my experience
| when I was part of a SaaS shop, though the times have changed a
| lot and "my metric is not necessarily your metric".
|
| But it was roughly "one large impact a month, for six months",
| with large caveats that upper management for whatever company
| had to be working with the product during that month.
|
| Large companies don't care if X service went out during the
| night and impacted someone not in their timezone.
|
| If the CTO notices that he can't use something with the same
| regularity that he gets paid, then it doesn't take long for it
| to stick in their mind. But migrating everything is _so
| painful_ that the majority of large companies will do anything
| they can to avoid moving away.
| xibalba wrote:
| > But migrating everything is _so painful_
|
| This is a key point in the popularity amongst VCs of
| investing in B2B SaaS. I take their (and your) word for it.
| But honestly, I don't actually understand this.
|
| Why is migration so hard?
| gen220 wrote:
| There are plenty of UX reasons (learning new interfaces,
| etc). The burden here is generally distributed and diffuse.
|
| The really big one, for companies of a certain size / cash
| flow, is compliance. Companies spend a lot of time
| developing compliant work flows around a service like
| Slack.
|
| Migrating to another service requires rewriting the
| compliance narrative. The current compliance people might
| not have the confidence or willpower to do that
| effectively, and can raise legal objections to any such
| migration indefinitely.
| notsureaboutpg wrote:
| It's not, honestly.
|
| I worked at a huge corp which employed many non-technical
| people and had global offices and all sorts of contractors
| / full-time employees at various levels.
|
| But they invested in organizational / business software
| early (think SAP-type stuff) and so at any moment every
| employee / hired hand is accounted for, has a number, has a
| position in the org chart, has access to a wiki-type
| platform where they can be trained and informed of any
| changes to the workplace software suite and guided through
| any migrations.
|
| I've seen the company migrate off Slack onto Microsoft
| Teams. I've seen the company migrate to MS Sharepoint from
| Box. I've seen them migrate everyone onto a platform called
| SuccessFactors (still don't really know what it does, it's
| for tracking your career progress I think).
|
| There's work involved. Someone has to write a guide to get
| users to sign up with the new service (even with SSO linked
| to your corporate account it's not easy for most non-
| technical people). In the case of Slack, any hooks or bots
| created for any teams need to be turned off and people need
| to be informed well in advance of the shift and multiple
| times. Optional in-person trainings need to be provided.
| Some employees may have issues with the change (a missing
| feature in the new software that they rely on) and they
| need a forum (or even just someone to contact) where they
| can lay out these issues and get them resolved.
|
| But that's not that bad honestly. If moving from a platform
| gives you serious gains in uptime, it's not that bad. I
| think Slack's downtime problems are not so bad that most
| people will move yet, but they may soon get there.
| twh270 wrote:
| Many reasons, almost none of them technical. Off the top of
| my head, a few:
|
| * Getting out of the Enterprise Contract, or waiting for
| the year to end.
|
| * Training people on new software.
|
| * Loss of productivity. (1) Learning a new UI, processes,
| workflows -- both individually and organizationally. A
| feature or concept in "Tool A" may exist in a completely
| different form in "Tool B". Or not exist, and then people
| need to adapt to and work around the missing feature. (2)
| Missing out on needed information due to the above.
| Ultimately, software exists to move and transform data, and
| when you change the software people have to adjust.
| Sometimes that doesn't go great. "Oh, I didn't realize I
| needed to check this checkbox".
|
| Another way to say this is "organizational inertia", which
| is a fancy term that means "it's hard for people to adjust
| to change".
|
| And you might think developers and other technical people
| would have an easier time of it. They (we) do, but not to
| the extent you may expect. I've been on the front lines of
| a handful of migrations that affected only the IT staff,
| and it was a long and arduous process each time.
| tpxl wrote:
| > Loss of productivity. (1) Learning a new UI
|
| Man it bothers me so much when applications change their
| UIs on updates for no apparent reason other than "it
| looks better".
|
| IntelliJ changed the way build and debug buttons looked
| in some update and it took me days to get used to it before
| I could find them in a snap again. Slack did a couple of
| no-reason changes as well.
| danpalmer wrote:
| Medium sized team on Slack. We'd need to move ~60 full time
| in-house employees, ~10 remote contractors who aren't on
| other comms channels, ~20 infrequent freelance contributors
| who may not check messages often, ~5 custom bots and apps,
| and ~15 3rd party integrations (of which some won't support
| any given choice of alternative).
|
| This is not to mention the fact that half our staff aren't
| hugely technical, so have actively _learnt_ how to use
| Slack and its features around notification control (things
| that may come "naturally" to the tech-savvy crowd on HN),
| @-things, bots, etc, and they would need to re-learn a new
| tool that is going to work in a different way.
|
| This would be a substantial effort for us, and we're a
| small company. Are there ways to materially minimise this
| cost?
| f0ff wrote:
| >Why is migration so hard?
|
| Why would anyone make it easy
| dijit wrote:
| I think there should be a new computer science law (if
| this one doesn't exist already):
|
| Things that are easy to migrate from get replaced by
| things that are hard to migrate from, eventually.
|
| IRC is incredibly easy to migrate from.
| ludjer wrote:
| IRC is easy to migrate from since there is nothing to
| migrate other than chat history. IRC is also missing so
| many features that slack provides out of the box. And a law
| like that would not work since you would need to write
| complicated transformation scripts to transform between
| services. Also not all services are a 1-1 mapping. I like
| IRC but it has its limitations. That is why slack
| succeeded where IRC did not.
| tpxl wrote:
| > And a law like that would not work
|
| The parent meant a law as in "a law of physics", not a
| piece of legislation.
| x86_64Ubuntu wrote:
| We can call it the "Law Of Lotus Notes". I'm not sure if
| it's hard to migrate from, I can only assume that it is
| impossible to migrate from.
| dhagz wrote:
| Getting workflows re-established, any integrations you had
| developed or otherwise come to depend on may not work, you
| will probably lose history, etc.
|
| Plus, it will just take a long time to get everyone on
| board and using the replacement system. My department is
| slowly plodding towards using Teams over Slack, but there
| are enough hold-outs (my sub-department being one of them)
| that it still doesn't have wide-spread adoption.
| Frost1x wrote:
| Training, integration with proprietary internal systems,
| sheer momentum in the employee base, justifying or even
| creating a metric to show cost savings of a migration
| effort, business processes that rely on a specific feature
| of existing infrastructure needing to be met, the
| uncertainty of new vs the certain and known instability of
| something you have....
|
| If you had a small shop with a dozen tech-savvy people and
| Slack became a problem which was used exclusively for quick
| business chats, you could probably push a change to another
| chat platform the next day. You might struggle when you
| have thousands of employees, some that needed training to
| use Slack and still aren't that proficient.
| dkdk8283 wrote:
| I feel a little good every time Slack has an issue - it has
| brought social media communications to my otherwise social
| media free (excluding hn) life.
| NewEntryHN wrote:
| You seem to ask the question about the absolute number of
| outages, whereas uptime is about the number of outages per
| units of time.
| jrockway wrote:
| I say this every time Slack is down, but they just seem so
| shady to me. Nobody can connect right now, and their status
| site says "100% uptime in the last quarter". Maybe it's close
| to 100%, but it ain't 100%.
|
| I think we should push for a metric where "up" means 100% of
| people that want to use the service are able to use the
| service. If 1% of users can't send messages, then that should
| count as a full-blown outage and should start counting against
| whatever SLA they advertise.
|
| The underlying problem here is that apparently everyone lies
| about uptime, so if you don't, that looks bad to potential
| customers. I fear that we will have to push for some legal
| regulation if we want accurate data, and ... people will
| probably be opposed to that.
| teraflop wrote:
| > If 1% of users can't send messages, then that should count
| as a full-blown outage and should start counting against
| whatever SLA they advertise.
|
| Google published a paper last year describing this approach
| to measuring uptime:
| https://blog.acolyer.org/2020/02/26/meaningful-availability/
|
| The idea is to define availability as "the probability that
| the site 'appeared' to be down for a random user, averaged
| over a time window of size w". You can choose a particular
| value of w and look at trends over time, or you can plot
| availability as a _function_ of w to understand patterns of
| downtime.
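|
| A minimal sketch of the windowed idea above, not the paper's exact
| method. It assumes per-minute samples of (users who could use the
| service, users who tried), which is a hypothetical input shape,
| and reports the availability of every window of w consecutive
| minutes plus the worst one:

```python
from typing import List, Tuple

# Each sample is (successful_users, active_users) for one minute.
# This input shape is an assumption made for illustration.
Sample = Tuple[int, int]


def window_availability(samples: List[Sample], w: int) -> List[float]:
    """Availability of every window of w consecutive samples."""
    if w <= 0 or w > len(samples):
        raise ValueError("w must be between 1 and len(samples)")
    result = []
    for start in range(len(samples) - w + 1):
        window = samples[start:start + w]
        ok = sum(s for s, _ in window)
        total = sum(t for _, t in window)
        # A window with no active users is treated as fully available.
        result.append(ok / total if total else 1.0)
    return result


def worst_window(samples: List[Sample], w: int) -> float:
    """Lowest availability seen over any window of size w."""
    return min(window_availability(samples, w))
```

| Calling worst_window for several values of w gives the
| "availability as a function of w" view: a short outage drags down
| small windows sharply and large windows only slightly.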
| gwright wrote:
| Seems silly to worry about quarterly stats several hours into
| an outage. The most obvious explanation is quarterly stats
| aren't generated in real-time -- which isn't "shady" to me.
| de_Selby wrote:
| They should at least update the status site to reflect issues
| currently happening.
|
| I was wondering why the link from a Jira wasn't opening in
| slack, the page eventually timed out and gave me a link to
| status.slack.com where it told me everything was peachy. Cue
| me wasting time trying it again because apparently there was
| no issue with slack..
| nkassis wrote:
| You'll just end up with no SLA or pay a hefty amount to use
| services because that's an impossible standard to support for
| any service of a size like this.
| scrose wrote:
| Isn't this the problem? Companies like Slack set SLAs that
| they _only_ meet by lying about their uptime. It's as good
| as having no SLA, except you're likely paying a premium
| based on the SLA they set.
| jrockway wrote:
| I'm not demanding 100% uptime, I'm asking that they say
| "99.94% uptime" when there has been an outage.
|
| Honestly, I could live with a 99.50% SLA, if that's what it
| really was. After today's probably full-day outage, they'd
| just have to be extra careful for the rest of the year (or
| pay me money). Kind of sucks when it's 1/4 that you blow
| your year's SLA budget though.
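|
| For scale, a rough worked calculation. It assumes a 365-day year
| and an SLA measured over the whole year (both assumptions; the
| actual SLA terms are not stated here). Under those assumptions a
| 99.50% SLA allows about 43.8 hours of downtime per year and 99.94%
| allows about 5.3 hours:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years (assumption)


def downtime_budget_hours(sla_percent: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime allowed in a period at the given SLA percentage."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period_hours * (100 - sla_percent) / 100


def uptime_percent(downtime_hours: float,
                   period_hours: float = HOURS_PER_YEAR) -> float:
    """Uptime percentage implied by a given amount of downtime."""
    return 100 * (1 - downtime_hours / period_hours)
```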
| helper wrote:
| That number is almost certainly updated manually. Check back
| tomorrow and see what it says.
|
| If you look at the history page you can see it's not 100% for
| every month: https://status.slack.com/calendar
| derefr wrote:
| > I think we should push for a metric where "up" means 100%
| of people that want to use the service are able to use the
| service.
|
| I mean, that's nice to say, but how do you measure/prove it?
|
| Certainly, having the SLAed party check _themselves_ is
| silly. But what are the other options? If it was up to the
| customer, customers could make up faults to get free service.
| (Since it'd be up to the customer to prove, and customers are
| generally less technical than vendors, you'd have to
| expect/accept very non-technical -- and thus non-evidentiary! --
| forms of "proof", e.g. "I dunno, we weren't able to reach it
| today." Things that could have just as well been their own
| ISP, or even operator error on their side.)
|
| IMHO, contractual SLAs _should_ be based on the checks of
| some agreed-upon neutral-third-party auditor (e.g. any of the
| many status/uptime monitoring services.) If the third party
| says the service is up, it's up in SLA terms; if the third
| party says the service is down, it's down in SLA terms.
|
| (And, of course, if the third party _themselves_ go down, or
| experience connectivity issues that cause them to see false
| correlated failures among many services, that _should_ be
| explicitly written into the SLA as a condition where the
| customer isn't going to get a remedial award against the SLA,
| even if the SLAed service _does_ go down during that time. If
| the Internet backbone falls over, that's the equivalent of
| what insurance providers call an "act of God.")
|
| But in a neutral-third-party observer setup, you aren't going
| to get 100% coverage for customer-seen problems. An uptime
| service isn't going to see the service the way _every single_
| customer does. Only the way one particular customer would. So
| it's not going to notice these spurious some-customers-see-
| it-some-don't faults.
|
| So, again: what kind of input _would_ feed this hypothetical
| "100% of customers are being served successfully" metric?
|
| ETA: maybe you could get _closer_ to this ideal by ensuring
| that the monitoring service 1. is effectively running a full
| integration test suite, not just hitting trivial APIs; and 2.
| if gradual-rollout experiments ala "hash the user's ID to
| land them in an experiment hash-ring position, and assign
| feature flags to sections of the hash ring" are in use by the
| SLAed service, then the monitoring service should be given N
| different "probe users" that together cover the complete
| hash-ring of possible generated-feature-flag combinations. Or
| given special keys that get randomly assigned a different
| combination of feature-flags every time they're used.
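|
| A rough sketch of the probe-user idea in the ETA. It assumes the
| service buckets users by hashing their ID onto a ring with a fixed
| number of slots and maps contiguous slot ranges to feature-flag
| segments. The ring size, the segment boundaries, the use of
| SHA-256 and the "probe-N" naming are all hypothetical, and it
| covers segments rather than every flag combination:

```python
import hashlib
from typing import Dict, List

RING_SLOTS = 1000  # hypothetical ring size


def ring_position(user_id: str, slots: int = RING_SLOTS) -> int:
    """Map a user ID to a slot on the ring via a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % slots


def segment_of(position: int, boundaries: List[int]) -> int:
    """Index of the segment containing position.

    boundaries holds the sorted, exclusive upper bound of each segment;
    the last bound must equal the ring size.
    """
    for index, upper in enumerate(boundaries):
        if position < upper:
            return index
    raise ValueError("position outside the ring")


def pick_probe_users(boundaries: List[int],
                     max_candidates: int = 100_000) -> Dict[int, str]:
    """Brute-force search for one candidate user ID per segment."""
    probes: Dict[int, str] = {}
    for n in range(max_candidates):
        user_id = f"probe-{n}"
        segment = segment_of(ring_position(user_id), boundaries)
        probes.setdefault(segment, user_id)  # keep the first hit per segment
        if len(probes) == len(boundaries):
            return probes
    raise RuntimeError("could not cover every segment")
```

| A monitoring service given the IDs from pick_probe_users would
| then exercise each segment of the ring at least once per check.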
| swsieber wrote:
| Some companies do this, though probably not publishing data.
| Any customer downtime is treated the same - for one, for
| many, for all (in theory, ha ha). But they take it pretty
| seriously.
| ceejayoz wrote:
| I wouldn't be shocked if businesses saw _increased_
| productivity during these.
| falcolas wrote:
| Not all - many workflows these days rely on Slack or its ilk.
| Benderbot, Jira/etc. connectors, calendar connectors, remote
| communication/standups, alerting...
| ryanSrich wrote:
| If you use slack primarily as a water cooler then yes.
|
| However, I drive everything through slack - GitHub, linear,
| calendars, Notion, support emails, etc. I have notifications
| turned off for every service we use except for slack. This
| allows me to effectively ignore everything except for slack.
| These types of outages destroy that workflow for me.
| postalrat wrote:
| I'm sitting here not sure if I should deploy code since most
| communication with the rest of my team has been cut off.
| derwiki wrote:
| If something went awry, and it caused more pain because
| Slack was down, how would you feel? If you're missing
| comms/observability then waiting to deploy seems prudent.
| dbbk wrote:
| So the answer is... people should stop working?
| derwiki wrote:
| I can't speak for you, but I can:
|
| * work on code
|
| * update JIRA
|
| * complete required trainings
|
| * work on my peer reviews (Workday/Okta are up)
|
| * review tech specs
|
| Deploying is actually a very small part of my job.
| postalrat wrote:
| Of course there are other things to do. But the things I
| had planned for the morning are all being delayed.
| dbcurtis wrote:
| Absolutely! Before the holiday shutdown, I Slacked myself a
| huge reminder list of things to jump on as soon as we started
| up again, so that I could hit the ground running in the new
| year. Oh, wait....
| saxonww wrote:
| I have less of an excuse not to be more personally
| productive, but I can't help anyone else (easily) if my
| primary method of communication is down. Not only because
| it's harder to contact you, but also because it's impossible
| for you to just ask in a channel and have me notice you.
|
| There's also this perverse incentive to Slack all the things.
| Lots of CI notifications are sent through it. Some org
| processes are implemented as workflows. There's been talk of
| how wonderful it would be to hook up tasking and work
| tracking to slash commands. I and others often use Slack
| instead of the 'official' tool to video call each other.
|
| An outage like this is still really disruptive. It's not like
| everyone realizes what's going on immediately or at the same
| time; we have backup tools, but our turn radius is pretty
| wide. Some of us can't even communicate effectively without
| memes, too, and backup tools don't have a giphy integration.
|
| EDIT: Do your CI integrations fail if Slack can't be
| contacted? Do those failures fail your pipeline? Whoops!
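|
| One way to avoid that failure mode, sketched under assumptions:
| treat the chat notification as best-effort so an unreachable
| webhook logs a warning instead of failing the pipeline. The
| function names and the {"text": ...} payload shape are
| illustrative, not any particular CI system's API:

```python
import json
import urllib.request
from typing import Callable


def post_to_webhook(url: str, text: str, timeout: float = 5.0) -> None:
    """POST a JSON message to a webhook URL; raises on any failure."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout):
        pass  # the response body is not needed


def notify_best_effort(send: Callable[[], None]) -> bool:
    """Run a notification step; report failure instead of raising."""
    try:
        send()
        return True
    except Exception as error:  # broad on purpose: chat must not fail the build
        print(f"warning: notification skipped: {error}")
        return False
```

| A pipeline step would call something like
| notify_best_effort(lambda: post_to_webhook(WEBHOOK_URL, "build
| passed")), with WEBHOOK_URL a placeholder, and ignore the result.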
| ghostpepper wrote:
| Particularly on a Monday morning after a holiday, there are
| tasks that I know I need to be working on but cannot
| because relevant details were never transposed from slack
| to our actual work scheduling tools like google docs, jira,
| etc. and I cannot access Slack history.
| rexreed wrote:
| Any opinions on using Discord vs. Slack?
| FerdSlav wrote:
| I personally have found that one of Discord's major shortcomings
| is the lack of support for threaded message chains. For those
| times when you may have 2+ parallel conversations in a channel
| you end up dramatically reducing the ability to effectively
| communicate.
| Cuuugi wrote:
| Seems to be back up for me? Status page has not updated yet
| though.
| clubdorothe wrote:
| which tool did they use to communicate during the outage?
| cadence- wrote:
| Slack is down. This is shaping up to be my most productive day in
| a long while.
| elwell wrote:
| Duplicate of https://news.ycombinator.com/item?id=25632048
|
| I think HN is hiding these posts. Maybe status threads are
| discouraged now? But they're much more useful than
| status.slack.com etc.
| detaro wrote:
| > _Maybe status threads are discouraged now_
|
| They always have been, since they clearly don't fit the
| guidelines for what a good submission is and usually leave
| little for interesting discussions. (unlike postmortems of past
| outages, which often are good)
| yreg wrote:
| Yet they are usually incredibly useful for most people here.
| They should be allowed at least while the event is ongoing.
| noir_lord wrote:
| Agreed, when a major service goes down HN is the most
| accurate overview, often a useful sanity check when it's AWS
| or Slack size orgs before I open an incident with whichever
| party.
| floatingatoll wrote:
| They've been discouraged for some number of years, but
| community upvoting manages to get them to the front page now
| and then regardless.
| geerlingguy wrote:
| HN is where we all go when the Internet (or large portions of
| it) are down. It's more reliable than all the
| 'downforeveryoneorjustme' or 'downtime monitor' services.
| spicybright wrote:
| Absolutely. I come here for comments to get an idea from
| other engineers of what's actually going on. Way more useful
| than an "is it down" site.
| clappski wrote:
| It's the first page I try when I think I'm having connection
| issues at least, to verify it's some service that's broken
| rather than my local network
| iamben wrote:
| Notion is down for me (and others) as well. Is there a cloud
| outage somewhere?
| api wrote:
| Salesforce bought Slack, so maybe it's enterprise now.
| zander312 wrote:
| time for nap
| mugivarra69 wrote:
| oof.
| valevk wrote:
| > While the issue is largely still ongoing, we believe some
| customers may see improvement in connecting to Slack after a
| refresh (CTRL/CMD + R).
|
| Nice.
| tschellenbach wrote:
| Chat infrastructure at this level of scale is not easy to build
| and maintain, I appreciate all the hard work that the engineers
| at Slack are putting in to resolve this.
| jastingo wrote:
| Anyone else having a fantastic morning/afternoon without the
| constant pinging? Just saying - there's always a silver lining.
| jaywalk wrote:
| It has made coming back from a long Christmas vacation a lot
| easier. Once I got my emails taken care of, I was able to get
| to work without distractions. It's been nice.
| eastbayjake wrote:
| It's a nice opportunity to go for a quick walk!
| tumidpandora wrote:
| Now an outage
| -------------
|
| We're continuing to investigate connection issues for customers,
| and have upgraded the incident on our side to reflect an outage
| in service. All hands are on deck on our end to further
| investigate. We'll be back in a half hour to keep you posted.
|
| Jan 4, 8:20 AM PST
| amir734jj wrote:
| This is my SignalR alternative with end-to-end encryption. Choose
| a password and the file and message will be encrypted on the
| client side using that password.
|
| URL: https://symmetric-crypto-chat-room.herokuapp.com/
|
| Repo: https://github.com/amir734jj/SymmetricCryptoChatRoom
___________________________________________________________________
(page generated 2021-01-04 23:00 UTC)