[HN Gopher] GitHub incident: 2022/03/24
___________________________________________________________________
GitHub incident: 2022/03/24
Author : rsstack
Score : 141 points
Date : 2022-03-24 14:46 UTC (8 hours ago)
(HTM) web link (www.githubstatus.com)
(TXT) w3m dump (www.githubstatus.com)
| djschnei wrote:
| Another day, another Github incident...
| Cthulhu_ wrote:
| Name one service with as many users, as much traffic, and as many
| features that doesn't have any issues.
| grumple wrote:
| Many of us run applications with more users, traffic, data,
| and/or features without having such a high frequency of
| outages.
| HL33tibCe7 wrote:
| Conversely, can you name a service that has this many users
| that has this many issues?
| floodle wrote:
| GitHub before Microsoft?
| meibo wrote:
| GitHub before Microsoft notoriously shied away from
| innovating/iterating on their core business for years. We
| had half as many features before then.
|
| IMO feature wise, GitHub really has improved tenfold. I
| remember having to fight third party CIs and project
| trackers, now we're running them all on GitHub and it's a
| well-integrated pleasure. If it does work, which it usually
| does and hopefully goes back to doing.
|
| I also really enjoy their live vscode IDE, which lets you
| e.g. make small changes to PRs in a web vscode by pressing
| . on the PR page. This is different from their hosted VM
| offering and free, confusingly.
| raybb wrote:
| How does GitLab compare in terms of its CI/CD reliability?
| jasonm89 wrote:
| GitLab seems to have been having their own set of issues this
| month.
| https://status.gitlab.com/pages/history/5b36dc6502d06804c083...
| fernandotakai wrote:
| i honestly prefer gitlab to github wrt CI/CD -- but i'm biased,
| i have ~4y of gitlab ci/cd usage, while i've only been using
| github for 6mo.
|
| gitlab feels easier to use, easier to write and i prefer the
| interface.
| Beltiras wrote:
| I was a heavy GitLab CI/CD user till a couple of weeks ago when
| I switched jobs. Loved it for the 3 1/2 years I was a user.
| Sometimes a bit tricky to set up and understand (especially
| after we switched the executors to docker-in-docker, was a pain
| to figure out), but once it's running it goes off without a
| hitch and runs very reliably. My new position entails Github, I
| can give a comparison in a couple of months when I'm
| acclimated.
|
| EDIT: I see there are people down-thread that are discussing
| the SaaS version, ours was an on-site deployment so we could
| pretty much just spin up more iron if the executors were
| straining.
| colinmhayes wrote:
| My company recently switched to github actions from Gitlab CI
| because gitlab had so much downtime.
| pzduniak wrote:
| Were you using self-hosted runners or the SaaS per-minute
| ones? My experience with Gitlab.com is probably irrelevant at
| this point, but years ago I didn't have any issues whatsoever
| with the combination of a Gitlab.com org with self-hosted
| runners... on AWS, I think?
| skwirl wrote:
| GitLab has its share of downtime. I don't use GitHub regularly
| so I can't really compare the two, but you will occasionally
| have a few hours where GitLab is down. Maybe 2 or 3 times per
| year?
| globular-toast wrote:
| Our on-prem instance hasn't had any downtime I'm aware of in
| the past year at least.
| chockchocschoir wrote:
| Same uptime as my development environment of GitHub then!
| (And similarly, my statement is as relevant as yours when it
| comes to the services CI uptime)
| phphphphp wrote:
| I'm a big fan of GitHub Actions so take this with that in mind.
| GitLab (self-hosted) is very reliable however GitLab CI/CD is
| very different in design to GitHub Actions: it's much more a
| bunch of shell scripts stuck together with YAML. Personally,
| I'd never go back (unless they redesign it).
| danogentili wrote:
| Personally, I hate Github Actions from the bottom of my
| heart.
|
| Ever since Travis CI sold its soul and I've been forced to
| use GitHub Actions, I've come to hate every single moment
| I have to work with some broken or buggy action, with the
| inability to restart only failed jobs (they added this vital
| feature only a few weeks ago, 2+ years after launching
| actions), with slow job spin-up times, which are only
| slightly reduced when using the self-hosted, RAM-hogging
| action runners which are only capable of running one job per
| instance, forcing me to waste RAM setting up multiple
| instances of the runners, each bundling its own copy of node
| and the .NET runtime.
|
| I've been using gitlab CI @ work, and I absolutely love its
| reliability and ease of use and configuration.
| rezonant wrote:
| I've been very happy with CircleCI
| stnmtn wrote:
| CircleCI has had multiple outages for us this past week,
| causing a lot of frustration.
| deathanatos wrote:
| ... yeah, we just moved some stuff to Actions, and it's
| been rather disappointing as onboarding goes. While the
| YAML represents jobs as a DAG, it seems like Actions can't
| actually _evaluate_ them as such, and it inserts
| dependencies where none exist, i.e., it inserts stalls in
| the pipeline. Action builds aren't cached, so they're
| slow. The default token can't be extended to allow cross-
| repository access, forcing you to write your own app that
| can. The jobs, not the workflow, are the CI status/result,
| so you can't branch protect on a workflow. (You have to
| just have a dummy job ... but you can't have an empty job,
| no, it has to run `true`...)
| shrikant wrote:
| I came up with what I thought was an appropriate riff on XKCD
| 303, but I fear we might be running out of excuses now :(
|
| https://twitter.com/shr1k/status/1504464470462017542
| iBotPeaches wrote:
| Well this time looks different than the last.
|
| * We get a non-machine message that migrations to GitHub are
| temporarily disabled.
|
| * We get an exact message that the delivery of webhooks and
| action kickoff is delayed.
|
| So while there are no 500s - I just have to wait a few more
| minutes for builds to kick off. So at least I can still review
| code in the meantime.
| nimbius wrote:
| https://www.githubstatus.com/
|
| at the time of posting, webhooks and actions are still struggling
| :(
|
| there have now been 23 incident outages in 90 days. if anyone
| from Microsoft is reading this, you need to start considering
| this degradation as the company's highest priority. At this
| point, Gitea, Gitlab and others (self hosted or otherwise) are
| beginning to gain traction as a more reliable and performant
| alternative.
|
| You're facing the very real possibility of spending nearly eight
| billion dollars to capture the developer mindshare of the
| internet, only to lose it in just four years.
| rvz wrote:
| > At this point, Gitea, Gitlab and others (self hosted or
| otherwise) are beginning to gain traction as a more reliable
| and performant alternative.
|
| Well there you go. The lesson is to not go _'all in'_ or
| _'centralize everything'_ on GitHub, which is what everyone was
| doing and why it has now become increasingly unreliable, as I
| predicted years ago it would in the long run. [0]
|
| Perhaps it is time to self-host then?
|
| [0] https://news.ycombinator.com/item?id=22867803
| oneepic wrote:
| Did you see the following post? I can't prove the underlying
| issue caused _this_ issue but I'd wager it's likely. And I'd
| wager they ARE prioritizing this.
| https://github.blog/2022-03-23-an-update-on-recent-service-d...
| awill wrote:
| Isn't that basically what happened to Skype? Microsoft paid
| $8.5B for Skype when it was on top of the world, and now I
| don't know a single person who uses it.
| tonyedgecombe wrote:
| EUR5.4bn for Nokia and that didn't work out for them either.
| iratewizard wrote:
| They got patent trolled and flinched. They removed the things
| that made Skype best in class and turned it into just another
| low quality corporate messenger.
| pzduniak wrote:
| Was it really due to that lawsuit? IIRC the main issue with
| P2P was that you could discover anyone's IP address by
| knowing their Skype username. I recall that people used
| this to DoS people in multiplayer video games - they looked
| up accounts with roughly the same name as the in-game one
| and proceeded to drop their connection. It was especially
| common in League of Legends.
| queuebert wrote:
| When my organization uses a Microsoft video conferencing
| product, we use Teams, not "Skype for Business". I have no
| idea why, as I don't schedule the meetings, but I can say
| that I can't imagine something sucking worse than Teams.
| shafyy wrote:
| Teams is the worst. I constantly miss messages, the video
| calling feature randomly freezes videos of participants,
| and sometimes my audio doesn't work for no apparent reason.
| Their Wiki feature sucks, too.
| queuebert wrote:
| It's also a CPU hog. Why can I watch a 4K video in VLC
| for hours, but throwing up a couple of 200x100 windows in
| Teams kills the CPU? I guess Teams is written by interns.
| klhutchins wrote:
| I just wonder why they would force video
| conferencing into SharePoint.
| rsstack wrote:
| "Skype for Business" sucks more than Teams. But it's true:
| it's hard to imagine something like "Skype for Business". I
| had to use it for a few months when working with a business
| unit in a different country that were all forced to use it.
| We were all happy when their business unit's IT department
| agreed to move them from Skype to Teams.
| anaccountexists wrote:
| Teams is built on top of Skype for Business, SfB is one of
| the main backends for them.
| Cthulhu_ wrote:
| Anecdotal: It's still the primary communications thing in my
| company and probably our customers (telecoms / networks,
| probably quite archaic).
| tastywheat wrote:
| tuwtuwtuwtuw wrote:
| tastywheat wrote:
| semiquaver wrote:
| GitHub didn't impose any requirement to use 'main' as the
| default branch. We merely changed the default for new repos
| unless the user or org has configured otherwise. Nothing about
| that (optional) change would have required any company to
| change any of what you listed.
| tastywheat wrote:
| Cthulhu_ wrote:
| Git(Hub) doesn't care what you name your main branch, that was
| all you / your org; see
| https://docs.github.com/en/repositories/configuring-branches...
| on how to change the default branch in your Github-hosted
| repository. The new default for new repos is 'main' these days
| but it's arbitrary (also consider how many repos have
| 'development' as their default branch)
| anoplus wrote:
| Seems it happens at the same time of the day in each case.
| yazboo wrote:
| From their earlier posts it sounds like they're encountering some
| kind of MySQL performance issue, which in my (horrible)
| experience can be extremely difficult for your jack of all trades
| software engineer or SRE to troubleshoot.
|
| I would hope a company Github's size would have MySQL expertise
| on staff, but if not I will say a prayer for the poor souls who
| are feverishly reading the Percona blog and trying to decide
| whether to tune the doublewrite buffer or redo log, or both, or
| neither.
| kragen wrote:
| This is what happens when your company (notoriously) moves away
| from having meritocracy as a core value: you put people in
| charge of things who don't have the expertise to run them very
| well.
|
| https://www.businessinsider.com/githubs-ceo-ditches-meritocr...
| infamouscow wrote:
| There are lots of exceptional individual contributors with no
| desire to ever go into management.
| lucasmullens wrote:
| Pretty sure that rug wouldn't have prevented downtime.
| nwh5jg56df wrote:
| What did they move away to?
| llamataboot wrote:
| I agree that getting deep into the weeds on some of that stuff
| can be taxing on a smaller development team with a few senior
| generalists (of which I tend to be one) but I'm quite sure that
| companies at github scale have deep levels of performance
| expertise - still not always easy of course, because lots of
| these types of things only come up at some certain scale
| outworlder wrote:
| Do not immediately assume that they do. Only those who have
| management that recognizes the need to have those experts
| even have a shot at getting them.
|
| If they do have the required staff it still might not be
| readily available due to org chart boundaries.
|
| Unclear if GitHub has the staff or if they are able to draw
| from the larger Microsoft pool.
|
| If they keep having issues I expect Microsoft to push them to
| move everything to MSSQL
| timcavel wrote:
| JCMais wrote:
| https://github.blog/2022-03-23-an-update-on-recent-service-d...
| darkwater wrote:
| Thanks for the link, they probably need to add another entry to
| the list. #hugops
| niffydroid wrote:
| I moved away from bitbucket to github to get away from these sort
| of issues.
| _fat_santa wrote:
| After this latest incident I brought it up to my company that we
| should start considering alternatives if this situation doesn't
| improve. From an outsider's perspective, it looks like all the new
| features they recently introduced seem to be crippling their
| databases.
|
| Microsoft needs to get a grip on this situation as the parent
| company. Their golden goose acquisition is about to become
| persona non grata if these outages continue.
| anaccountexists wrote:
| My company uses the same DB sharding tech. It took us about a
| year of daily / weekly outages until we finally were able to
| fix our performance issues. 256 database splits, lots of cross-
| shard queries removed, etc before we finally reached a happy-
| ish state for a year.
|
| Now it's scheduled to fall over in about 6 months and everyone
| is freaking out again. It's not new features that are hurting
| us, it's the existing core product line. All the new stuff is
| built on horizontally sharded DBs from the get-go.
| gtirloni wrote:
| How are you doing cost analysis between moving off of GitHub
| and working around the issues?
| nijave wrote:
| Used to work at BigBank$ where they had an internal 8 node
| Bitbucket cluster. Down multiple times a week for 6+ months before
| they could finally get it under control (Jira + CI were
| apparently putting a lot of load on it, and apparently Bitbucket
| doesn't scale past 8 nodes)
|
| Needless to say, self hosting can be more reliable but you'll
| probably end up dedicating a lot of resources to building that
| out and supporting it
| MrJohz wrote:
| I've never tried to self-host something like that, but on the
| other hand, I've never worked at a place where the internal
| GitLab wasn't a shaky mess, particularly when CI or Pages or
| certificates are involved.
|
| I believe people when they tell me they've managed it without
| major problems, but clearly they're the 10x developers of the
| self-hosted world, because the people I work with seem to
| find it hard - and that is not a slight on them, because
| clearly even Github can't do so much better!
| cyberpunk wrote:
| Single VM (backed up) with gitlab, k8s for workers.
|
| Not difficult, not hard to maintain. Worked fine for our ~500
| projects (each with pipelines/ci|cd) and ~100 devs.
| tastywheat wrote:
| AndrewOMartin wrote:
| What we need is some kind of distributed version control system,
| then this would never happen.
|
| Ideally it'd be a free and open source distributed version
| control system designed to handle everything from small to very
| large projects with speed and efficiency, but I think that's just
| unrealistic.
| peeters wrote:
| I hear this sort of remark a lot, but does anyone actually
| _practice_ this? Like git is great in that there are a myriad of
| options for syncing remotes, but none that I've ever seen come
| close to having a central clone that also acts as the highest
| authority.
|
| Having used git prior to Github et al, where I had remotes set
| up to literally each collaborator that they individually
| hosted, I would never want that user experience over what I
| have now. The centralized model is far too compelling. So I'm
| curious what the bleeding edge distributed model looks like
| today.
| radicality wrote:
| I'm with you, I don't even want the decentralised model, but
| it feels like with GitHub I get... the worst of both worlds?
| No decentralisation and crappy centralisation?
|
| For example, in a work environment, what's even the point of
| 'forking' in GitHub. I don't want my own version of any repo,
| I don't want to be playing weird rebase/merge upstream/push
| origin nonsense. I don't even want to be creating branches or
| naming branches. I just want a centralised repo: I make a
| commit and... that's it, get a commit hash and be able to
| have that commit anywhere else I check out the repo, and open
| a PR/diff to merge into master.
| rezonant wrote:
| I agree though that it's hard to find use cases for forks in
| a typical business; however, Unreal Engine would be a great
| example of it where the private repository is shared by
| multiple development teams. If you have access to it
| (because you've agreed to the Source License), you can make
| a private fork and send in a PR to the main repository.
|
| Epic probably won't merge it though unless you represent a
| big studio.
| danogentili wrote:
| My irony meter is currently down for maintenance, but that's
| precisely what git is, a free and open source distributed
| version control system designed to handle everything from small
| to very large projects with speed and efficiency :)
| floodle wrote:
| I think that was the point OP was making, through a layer of
| thick sarcasm.
| hwbehrens wrote:
| My meter was working perfectly until I read the parent
| comment, at which point it measured off the scale and
| promptly broke.
| groby_b wrote:
| Can't be git they're talking about, they mentioned speed and
| efficiency for very large projects.
| dharmab wrote:
| Ironically, Microsoft has been a major contributor to
| improvements in git for handling large repos after Windows
| was migrated to git.
|
| https://github.com/microsoft/git
| macinjosh wrote:
| Sounds like mercurial :)
| ram_rar wrote:
| I wonder how much of this could be attributed to attrition and
| the gap in knowledge created when key engineers of the related
| team left.
| spencera wrote:
| Everyone do your part! I took the rest of the week off to give
| GitHub a chance to recover. Flatten the peaks, let GitHub migrate
| mh- wrote:
| 2 days to flatten the curve
| rvz wrote:
| Not one, but three outages in a row [0], if you went _'all in'_
| on GitHub (Including GitHub Actions). Even under _'scheduled
| maintenance'_ something else unexpectedly goes down.
|
| From [0]:
|
| >> Until the next time GitHub goes down again (hopefully that
| won't be in another month's time).
|
| >> That says it all really. Let's reset the counter and try this
| again.
|
| Going to reset the counter once again. Hopefully GitHub won't
| have another outage in a week's (or even a month's) time.
|
| It turns out that my whole comment chain in [0] was right in the
| _'long term'_ of not _'centralizing everything'_ [1] on GitHub
| since 2020.
|
| [0] https://news.ycombinator.com/item?id=30784663
|
| [1] https://news.ycombinator.com/item?id=22867803
| HL33tibCe7 wrote:
| GitHub SREs are going to end up with thousand-yard stares by the
| time this is all over...
| candiddevmike wrote:
| Anyone know of an open source/self hosted alternative to GitHub's
| new project boards? Gitea has the old ones (Kanban style) but I
| really, really like the table layout.
| bob1029 wrote:
| After the incident yesterday I sent an email to GH sales to talk
| about moving to on-prem enterprise so we don't have to go down
| with the rest of the boat.
|
| Still waiting on that callback/reply. Starting to wonder if
| Microsoft even wants our money anymore.
| gtirloni wrote:
| Not even a full day since your email and you're making wild
| assumptions?
| bob1029 wrote:
| How long should I expect to wait for someone else to take
| more of my money, or at least answer 3 simple pre-sales
| questions?
|
| If I don't hear back from GitHub enterprise sales before the
| end of the week, I am going to take this as a strong hint
| that we are too small for Microsoft to care about, suggesting
| that we would be safer & happier on a different vendor's
| product.
| gtirloni wrote:
| A few days to a week is normal for enterprise sales.
| rpadovani wrote:
| Last year, we were investigating if we wanted to go with GitHub
| or GitLab on-prem. EU company, they don't sell directly and
| forwarded us to some EU vendor (strange, but whatever). We were
| never able to get a proper quote. While I, personally, love
| GitLab, it wasn't hard to convince other people in the team: we
| installed the free version, played around, bought a license.
|
| I still have no idea how much the final quote for GitHub could
| have been.
| WinterMount223 wrote:
| birracerveza wrote:
| This is starting to get ridiculous.
___________________________________________________________________
(page generated 2022-03-24 23:02 UTC)