[HN Gopher] GitHub incident: 2022/03/24
       ___________________________________________________________________
        
       GitHub incident: 2022/03/24
        
       Author : rsstack
       Score  : 141 points
       Date   : 2022-03-24 14:46 UTC (8 hours ago)
        
 (HTM) web link (www.githubstatus.com)
 (TXT) w3m dump (www.githubstatus.com)
        
       | [deleted]
        
       | djschnei wrote:
       | Another day, another Github incident...
        
         | Cthulhu_ wrote:
         | Name one service with as many users, traffic, and features that
         | doesn't have any issues.
        
           | grumple wrote:
           | Many of us run applications with more users, traffic, data,
           | and/or features without having such a high frequency of
           | outages.
        
           | HL33tibCe7 wrote:
           | Conversely, can you name a service that has this many users
           | that has this many issues?
        
           | floodle wrote:
           | GitHub before Microsoft?
        
             | meibo wrote:
             | GitHub before Microsoft notoriously shied away from
             | innovating/iterating on their core business for years. We
             | had half as many features before then.
             | 
             | IMO feature wise, GitHub really has improved tenfold. I
             | remember having to fight third party CIs and project
             | trackers, now we're running them all on GitHub and it's a
             | well-integrated pleasure. If it does work, which it usually
             | does and hopefully goes back to doing.
             | 
             | I also really enjoy their live vscode IDE, which lets you
             | e.g. make small changes to PRs in a web vscode by pressing
             | . on the PR page. This is different from their hosted VM
             | offering and free, confusingly.
        
               | [deleted]
        
         | [deleted]
        
       | raybb wrote:
       | How does GitLab compare in terms of its CI/CD reliability?
        
         | jasonm89 wrote:
         | GitLab seems to have been having their own set of issues this
         | month.
         | https://status.gitlab.com/pages/history/5b36dc6502d06804c083...
        
         | fernandotakai wrote:
         | i honestly prefer gitlab to github wrt CI/CD -- but i'm biased,
         | i have ~4y of gitlab ci/cd usage, while i've only been using
         | github for 6mo.
         | 
         | gitlab feels easier to use, easier to write and i prefer the
         | interface.
        
         | Beltiras wrote:
         | I was a heavy GitLab CI/CD user till a couple of weeks ago when
         | I switched jobs. Loved it for the 3 1/2 years I was a user.
         | Sometimes a bit tricky to set up and understand (especially
         | after we switched the executors to docker-in-docker, was a pain
         | to figure out), but once it's running it goes off without a
         | hitch and runs very reliably. My new position entails Github, I
         | can give a comparison in a couple of months when I'm
         | acclimated.
         | 
         | EDIT: I see there are people down-thread that are discussing
         | the SaaS version; ours was an on-site deployment, so we could
         | pretty much just spin up more iron if the executors were
         | straining.
        
         | colinmhayes wrote:
         | My company recently switched to github actions from Gitlab CI
         | because gitlab had so much downtime.
        
           | pzduniak wrote:
           | Were you using self-hosted runners or the SaaS per-minute
           | ones? My experience with Gitlab.com is probably irrelevant at
           | this point, but years ago I didn't have any issues whatsoever
           | with the combination of a Gitlab.com org with self-hosted
           | runners... on AWS, I think?
        
         | skwirl wrote:
         | GitLab has its share of downtime. I don't use GitHub regularly
         | so I can't really compare the two, but you will occasionally
         | have a few hours where GitLab is down. Maybe 2 or 3 times per
         | year?
        
         | globular-toast wrote:
         | Our on-prem instance hasn't had any downtime I'm aware of in
         | the past year at least.
        
           | chockchocschoir wrote:
           | Same uptime as my development environment of GitHub then!
           | (And similarly, my statement is as relevant as yours when it
           | comes to the service's CI uptime)
        
         | phphphphp wrote:
         | I'm a big fan of GitHub Actions so take this with that in mind.
         | GitLab (self-hosted) is very reliable however GitLab CI/CD is
         | very different in design to GitHub Actions: it's much more a
         | bunch of shell scripts stuck together with YAML. Personally,
         | I'd never go back (unless they redesign it).
        
           | danogentili wrote:
           | Personally, I hate Github Actions from the bottom of my
           | heart.
           | 
           | Ever since Travis CI sold its soul and I've been forced to
           | use Github Actions, I've come to hate every single moment
           | I have to work with some broken or buggy action, with the
           | inability to restart only failed jobs (they added this vital
           | feature only a few weeks ago, 2+ years after launching
           | actions), with slow job spin-up times, which are only
           | slightly reduced when using the self-hosted, RAM-hogging
           | action runners which are only capable of running one job per
           | instance, forcing me to waste RAM setting up multiple
           | instances of the runners, each bundling its own copy of node
           | and the .NET runtime.
           | 
           | I've been using gitlab CI @ work, and I absolutely love its
           | reliability and ease of use and configuration.
        
             | rezonant wrote:
             | I've been very happy with CircleCI
        
               | stnmtn wrote:
               | CircleCI has had multiple outages for us this past week,
               | causing a lot of frustration.
        
             | deathanatos wrote:
             | ... yeah, we just moved some stuff to Actions, and it's
             | been rather disappointing as onboarding goes. While the
             | YAML represents jobs as a DAG, it seems like Actions can't
             | actually _evaluate_ them as such, and it inserts
             | dependencies where none exist, i.e., it inserts stalls in
             | the pipeline. Action builds aren't cached, so they're
             | slow. The default token can't be extended to allow
             | cross-repository access, forcing you to write your own app
             | that can. The jobs, not the workflow, are the CI status/result,
             | so you can't branch protect on a workflow. (You have to
             | just have a dummy job ... but you can't have an empty job,
             | no, it has to run `true`...)
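
The DAG point above can be made concrete with a small sketch. It assumes a hypothetical set of jobs, dependencies, and durations (none of them come from the thread or from GitHub Actions itself) and computes when each job could finish if it started as soon as its own dependencies were done. Any extra wait beyond these times would be a stall of the kind described.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the set of jobs it needs.
needs = {
    "lint": set(),
    "build": set(),
    "unit-tests": {"build"},
    "package": {"build", "lint"},
    "deploy": {"unit-tests", "package"},
}

# Hypothetical run times in minutes, for illustration only.
durations = {"lint": 1, "build": 5, "unit-tests": 3, "package": 2, "deploy": 1}


def earliest_finish(needs, durations):
    """Return each job's earliest finish time under true DAG evaluation.

    A job starts the moment all of its own dependencies have finished,
    never waiting on unrelated jobs.
    """
    finish = {}
    # static_order() yields every job after all of its dependencies.
    for job in TopologicalSorter(needs).static_order():
        start = max((finish[dep] for dep in needs[job]), default=0)
        finish[job] = start + durations[job]
    return finish


finish_times = earliest_finish(needs, durations)
for job, minute in sorted(finish_times.items(), key=lambda item: item[1]):
    print(f"{job}: done at minute {minute}")
```

Comparing these times against the timestamps of a real run is one way to check whether a pipeline is waiting on dependencies that the job graph does not declare.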
        
       | shrikant wrote:
       | I came up with what I thought was an appropriate riff on XKCD
       | 303, but I fear we might be running out of excuses now :(
       | 
       | https://twitter.com/shr1k/status/1504464470462017542
        
       | iBotPeaches wrote:
       | Well this time looks different than the last.
       | 
       | * We get a non-machine message that migrations to GitHub are
       | temporarily disabled.
       | 
       | * We get an exact message that the delivery of webhooks and
       | action kickoff is delayed.
       | 
       | So while there are no 500s - I just have to wait a few more
       | minutes for builds to kick off. So at least I can still review
       | code in the meantime.
        
       | nimbius wrote:
       | https://www.githubstatus.com/
       | 
       | at the time of posting, webhooks and actions are still struggling
       | :(
       | 
       | there have now been 23 incident outages in 90 days. if anyone
       | from Microsoft is reading this, you need to start considering
       | this degradation as the company's highest priority. At this
       | point, Gitea, Gitlab and others (self hosted or otherwise) are
       | beginning to gain traction as a more reliable and performant
       | alternative.
       | 
       | You're facing the very real possibility of spending nearly eight
       | billion dollars to capture the developer mindshare of the
       | internet, only to lose it in just four years.
        
         | [deleted]
        
         | rvz wrote:
         | > At this point, Gitea, Gitlab and others (self hosted or
         | otherwise) are beginning to gain traction as a more reliable
         | and performant alternative.
         | 
         | Well there you go. The lesson is to not go _'all in'_ or
         | _'centralize everything'_ on GitHub, which is what everyone was
         | doing and why it has now become increasingly unreliable, as I
         | predicted years ago it would in the long run. [0]
         | 
         | Perhaps it is time to self-host then?
         | 
         | [0] https://news.ycombinator.com/item?id=22867803
        
         | oneepic wrote:
         | Did you see the following post? I can't prove the underlying
         | issue caused _this_ issue but I'd wager it's likely. And I'd
         | wager they ARE prioritizing this.
         | https://github.blog/2022-03-23-an-update-on-recent-service-d...
        
         | awill wrote:
         | Isn't that basically what happened to Skype? Microsoft paid
         | $8.5B for Skype when it was on top of the world, and now I
         | don't know a single person who uses it.
        
           | tonyedgecombe wrote:
           | €5.4bn for Nokia and that didn't work out for them either.
        
           | iratewizard wrote:
           | They got patent trolled and flinched. They removed the things
           | that made Skype best in class and turned it into just another
           | low quality corporate messenger.
        
             | pzduniak wrote:
             | Was it really due to that lawsuit? IIRC the main issue with
             | P2P was that you could discover anyone's IP address by
             | knowing their Skype username. I recall that people used
             | this to DoS people in multiplayer video games - they looked
             | up accounts with roughly the same name as the in-game one
             | and proceeded to drop their connection. It was especially
             | common in League of Legends.
        
           | queuebert wrote:
           | When my organization uses a Microsoft video conferencing
           | product, we use Teams, not "Skype for Business". I have no
           | idea why, as I don't schedule the meetings, but I can say
           | that I can't imagine something sucking worse than Teams.
        
             | shafyy wrote:
             | Teams is the worst. I constantly miss messages, the video
             | calling feature randomly freezes videos of participants,
             | and sometimes my audio doesn't work for no apparent reason.
             | Their Wiki feature sucks, too.
        
               | queuebert wrote:
               | It's also a CPU hog. Why can I watch a 4K video in VLC
               | for hours, but throwing up a couple of 200x100 windows in
               | Teams kills the CPU? I guess Teams is written by interns.
        
               | klhutchins wrote:
               | I just wonder why they would force video
               | conferencing into SharePoint?
        
             | rsstack wrote:
             | "Skype for Business" sucks more than Teams. But it's true:
             | it's hard to imagine something like "Skype for Business". I
             | had to use it for a few months when working with a business
             | unit in a different country that were all forced to use it.
             | We were all happy when their business unit's IT department
             | agreed to move them from Skype to Teams.
        
             | anaccountexists wrote:
             | Teams is built on top of Skype for Business, SfB is one of
             | the main backends for them.
        
           | Cthulhu_ wrote:
           | Anecdotal: It's still the primary communications thing in my
           | company and probably our customers (telecoms / networks,
           | probably quite archaic).
        
       | tastywheat wrote:
        
         | semiquaver wrote:
         | GitHub didn't impose any requirement to use 'main' as the
         | default branch. We merely changed the default for new repos
         | unless the user or org has configured otherwise. Nothing about
         | that (optional) change would have required any company to
         | change any of what you listed.
        
         | Cthulhu_ wrote:
         | Git (hub) doesn't care what you name your main branch, that was
         | all you / your org; see
         | https://docs.github.com/en/repositories/configuring-branches...
         | on how to change the default branch in your Github-hosted
         | repository. The new default for new repos is 'main' these days
         | but it's arbitrary (also consider how many repos have
         | 'development' as their default branch)
        
       | anoplus wrote:
       | Seems it happens at the same time of the day in each case.
        
       | yazboo wrote:
       | From their earlier posts it sounds like they're encountering some
       | kind of MySQL performance issue, which in my (horrible)
       | experience can be extremely difficult for your jack of all trades
       | software engineer or SRE to troubleshoot.
       | 
       | I would hope a company Github's size would have MySQL expertise
       | on staff, but if not I will say a prayer for the poor souls who
       | are feverishly reading the Percona blog and trying to decide
       | whether to tune the doublewrite buffer or redo log, or both, or
       | neither.
        
         | kragen wrote:
         | This is what happens when your company (notoriously) moves away
         | from having meritocracy as a core value: you put people in
         | charge of things who don't have the expertise to run them very
         | well.
         | 
         | https://www.businessinsider.com/githubs-ceo-ditches-meritocr...
        
           | infamouscow wrote:
           | There are lots of exceptional individual contributors with no
           | desire to ever go into management.
        
           | lucasmullens wrote:
           | Pretty sure that rug wouldn't have prevented downtime.
        
           | nwh5jg56df wrote:
           | What did they move away to?
        
         | llamataboot wrote:
         | I agree that getting deep into the weeds on some of that stuff
         | can be taxing on a smaller development team with a few senior
         | generalists (of which I tend to be one) but I'm quite sure that
         | companies at github scale have deep levels of performance
         | expertise - still not always easy of course, because lots of
         | these types of things only come up at some certain scale
        
           | outworlder wrote:
           | Do not immediately assume that they do. Only those who have
           | management that recognizes the need to have those experts
           | even have a shot at getting them.
           | 
           | If they do have the required staff it still might not be
           | readily available due to org chart boundaries.
           | 
           | Unclear if GitHub has the staff or if they are able to draw
           | from the larger Microsoft pool.
           | 
           | If they keep having issues I expect Microsoft to push them to
           | move everything to MSSQL
        
       | JCMais wrote:
       | https://github.blog/2022-03-23-an-update-on-recent-service-d...
        
         | darkwater wrote:
         | Thanks for the link, they probably need to add another entry to
         | the list. #hugops
        
       | niffydroid wrote:
       | I moved away from bitbucket to github to get away from these sort
       | of issues.
        
       | _fat_santa wrote:
       | After this latest incident I brought it up to my company that we
       | should start considering alternatives if this situation doesn't
       | improve. From an outsider's perspective, it looks like all the new
       | features they recently introduced seem to be crippling their
       | databases.
       | 
       | Microsoft needs to get a grip on this situation as the parent
       | company. Their golden goose acquisition is about to become
       | persona non grata if these outages continue.
        
         | anaccountexists wrote:
         | My company uses the same DB sharding tech. It took us about a
         | year of daily / weekly outages until we finally were able to
         | fix our performance issues. 256 database splits, lots of cross-
         | shard queries removed, etc before we finally reached a happy-
         | ish state for a year.
         | 
         | Now it's scheduled to fall over in about 6 months and everyone
         | is freaking out again. It's not new features that are hurting
         | us, it's the existing core product line. All the new stuff is
         | built on horizontally sharded DBs from the get-go.
        
         | gtirloni wrote:
         | How are you doing cost analysis between moving off of GitHub
         | and working around the issues?
        
         | nijave wrote:
         | Used to work at BigBank$ where they had an internal 8 node
         | Bitbucket cluster. Down multiple times a week for 6+ months before
         | they could finally get it under control (Jira + CI was
         | apparently putting a lot of load on and apparently Bitbucket
         | doesn't scale past 8 nodes)
         | 
         | Needless to say, self hosting can be more reliable but you'll
         | probably end up dedicating a lot of resources to building that
         | out and supporting it
        
           | MrJohz wrote:
           | I've never tried to self-host something like that, but on the
           | other hand, I've never worked at a place where the internal
           | GitLab wasn't a shaky mess, particularly when CI or Pages or
           | certificates are involved.
           | 
           | I believe people when they tell me they've managed it without
           | major problems, but clearly they're the 10x developers of the
           | self-hosted world, because the people I work with seem to
           | find it hard - and that is not a slight on them, because
           | clearly even Github can't do so much better!
        
           | cyberpunk wrote:
           | Single VM (backed up) with gitlab, k8s for workers.
           | 
           | Not difficult, not hard to maintain. Worked fine for our ~500
           | projects (each with pipelines/ci|cd) and ~100 devs.
        
       | AndrewOMartin wrote:
       | What we need is some kind of distributed version control system,
       | then this would never happen.
       | 
       | Ideally it'd be a free and open source distributed version
       | control system designed to handle everything from small to very
       | large projects with speed and efficiency, but I think that's just
       | unrealistic.
        
         | peeters wrote:
         | I hear this sort of remark a lot, but does anyone actually
         | _practice_ this? Like git is great in that there are a myriad of
         | options for syncing remotes, but none that I've ever seen come
         | close to having a central clone that also acts as the highest
         | authority.
         | 
         | Having used git prior to Github et al, where I had remotes set
         | up to literally each collaborator that they individually
         | hosted, I would never want that user experience over what I
         | have now. The centralized model is far too compelling. So I'm
         | curious what the bleeding edge distributed model looks like
         | today.
        
           | radicality wrote:
           | I'm with you, I don't even want the decentralised model, but
           | it feels like with GitHub I get... the worst of both worlds?
           | No decentralisation and crappy centralisation?
           | 
           | For example, in a work environment, what's even the point of
           | 'forking' in GitHub. I don't want my own version of any repo,
           | I don't want to be playing weird rebase/merge upstream/push
           | origin nonsense. I don't even want to be creating branches or
           | naming branches. I just want a centralised repo: I make a
           | commit and... that's it, get a commit hash and be able to
           | have that commit anywhere else I check out the repo, and open
           | a PR/diff to merge into master.
        
             | rezonant wrote:
             | I agree though that it's hard to find use cases for forks in
             | a typical business; however, Unreal Engine would be a great
             | example of it where the private repository is shared by
             | multiple development teams. If you have access to it
             | (because you've agreed to the Source License), you can make
             | a private fork and send in a PR to the main repository.
             | 
             | Epic probably won't merge it though unless you represent a
             | big studio.
        
         | danogentili wrote:
         | My irony meter is currently down for maintenance, but that's
         | precisely what git is, a free and open source distributed
         | version control system designed to handle everything from small
         | to very large projects with speed and efficiency :)
        
           | floodle wrote:
           | I think that was the point OP was making, through a layer of
           | thick sarcasm.
        
           | hwbehrens wrote:
           | My meter was working perfectly until I read the parent
           | comment, at which point it measured off the scale and
           | promptly broke.
        
           | groby_b wrote:
           | Can't be git they're talking about, they mentioned speed and
           | efficiency for very large projects.
        
             | dharmab wrote:
             | Ironically, Microsoft has been a major contributor to
             | improvements in git for handling large repos after Windows
             | was migrated to git.
             | 
             | https://github.com/microsoft/git
        
         | macinjosh wrote:
         | Sounds like mercurial :)
        
         | [deleted]
        
       | ram_rar wrote:
       | I wonder how much of this could be attributed to attrition and
       | the gap in the knowledge base created when key engineers on the
       | related teams left.
        
       | spencera wrote:
       | Everyone do your part! I took the rest of the week off to give
       | GitHub a chance to recover. Flatten the peaks, let GitHub migrate
        
         | mh- wrote:
         | 2 days to flatten the curve
        
         | [deleted]
        
       | rvz wrote:
       | Not one, but three outages in a row [0], if you went _'all in'_
       | on GitHub (including GitHub Actions). Even under _'scheduled
       | maintenance'_ something else unexpectedly goes down.
       | 
       | From [0]:
       | 
       | >> Until the next time GitHub goes down again (hopefully that
       | won't be in another month's time).
       | 
       | >> That says it all really. Lets reset the counter and try this
       | again.
       | 
       | Going to reset the counter once again. Hopefully GitHub won't
       | have another outage in a week's (or even a month's) time.
       | 
       | It turns out that my whole comment chain in [0] was right in the
       | _'long term'_ about not _'centralizing everything'_ [1] on GitHub
       | since 2020.
       | 
       | [0] https://news.ycombinator.com/item?id=30784663
       | 
       | [1] https://news.ycombinator.com/item?id=22867803
        
       | HL33tibCe7 wrote:
       | GitHub SREs are going to end up with thousand-yard stares by the
       | time this is all over...
        
       | candiddevmike wrote:
       | Anyone know of an open source/self hosted alternative to GitHub's
       | new project boards? Gitea has the old ones (Kanban style) but I
       | really, really like the table layout.
        
       | bob1029 wrote:
       | After the incident yesterday I sent an email to GH sales to talk
       | about moving to on-prem enterprise so we don't have to go down
       | with the rest of the boat.
       | 
       | Still waiting on that callback/reply. Starting to wonder if
       | Microsoft even wants our money anymore.
        
         | gtirloni wrote:
         | Not even a full day since your email and you're making wild
         | assumptions?
        
           | [deleted]
        
           | bob1029 wrote:
           | How long should I expect to wait for someone else to take
           | more of my money, or at least answer 3 simple pre-sales
           | questions?
           | 
           | If I don't hear back from GitHub enterprise sales before the
           | end of the week, I am going to take this as a strong hint
           | that we are too small for Microsoft to care about, suggesting
           | that we would be safer & happier on a different vendor's
           | product.
        
             | gtirloni wrote:
             | A few days to a week is normal for enterprise sales.
        
         | rpadovani wrote:
         | Last year, we were investigating if we wanted to go with GitHub
         | or GitLab on-prem. EU company, they don't sell directly and
         | forwarded us to some EU vendor (strange, but whatever). We were
         | never able to get a proper quote. While I, personally, love
         | GitLab, it wasn't hard to convince other people in the team: we
         | installed the free version, played around, bought a license.
         | 
         | I still have no idea how much the final quote for GitHub could
         | have been.
        
       | birracerveza wrote:
       | This is starting to get ridiculous.
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-03-24 23:02 UTC)