[HN Gopher] GitHub incident 2022-03-23
       ___________________________________________________________________
        
       GitHub incident 2022-03-23
        
       Author : tpaksoy
       Score  : 258 points
       Date   : 2022-03-23 14:51 UTC (8 hours ago)
        
 (HTM) web link (www.githubstatus.com)
 (TXT) w3m dump (www.githubstatus.com)
        
       | max23_ wrote:
       | Looks like the same services that were affected in yesterday's
       | incident.
        
       | mfashby wrote:
       | I'm inclined to look at tools like fossil again, for its
       | distributed issue tracking and wiki capability
       | 
       | https://fossil-scm.org/home/doc/trunk/www/index.wiki
        
         | edgyquant wrote:
         | I had forgotten about that, thanks!
        
       | frjalex wrote:
       | Looking at the "GitHub" prefix in the title, I was half-expecting
       | this to point to a report explaining the outage a week ago... But
       | rest assured, it is a new outage!
        
         | annexrichmond wrote:
         | I thought it was going to be a Postmortem. I couldn't have been
         | more wrong!
        
         | teekert wrote:
         | Oh I thought it was about the one from yesterday :)
        
           | aaaaaaaaata wrote:
           | Are their CI/CD toys that shiny that people still willingly
           | choose them even with all the issues?
           | 
           | I find myself regularly asking this -- about every major SaaS
           | used for critical ops stuff like this.
        
             | teekert wrote:
             | Work chose GitHub (we are a Microsoft shop), I have to
             | say, I like GitHub a lot. The disruptions have been
             | annoying sometimes, that's true. But due to the nature of
             | Git I could always just keep working.
        
         | [deleted]
        
       | einpoklum wrote:
       | The page at the link is not much more informative than the link
       | itself :-(
        
       | Xarodon wrote:
       | This has been a pretty rough week for GitHub
        
         | stuff4ben wrote:
         | Github Enterprise hasn't been faring too well at my work either
         | this week. When you work on both open and closed source
         | products and GH and GHE are both down, it leads to a very
         | unproductive week.
        
           | jrowley wrote:
           | Does GitHub enterprise result in dedicated instance or any
           | better availability?
        
             | jon-wood wrote:
             | It Depends.
             | 
             | GitHub Enterprise is confusingly both a "call us for
             | pricing" tier of GitHub the website, and also an on-premise
             | version of GitHub that you can run as an appliance in your
             | own data centre. The first of those is ultimately just
             | GitHub and so has the same outages, the second is running
             | on your own hardware so (shouldn't be) tied to the
             | website's availability.
        
             | bewuethr wrote:
             | There are multiple products: self-hosted (Enterprise
             | Server) and hosted by GitHub (Enterprise Cloud). I don't
             | know about uptime guarantees, but you can buy Premium or
             | Premium Plus support with 30-minute SLA or a dedicated
             | account manager.
        
       | okareaman wrote:
       | What's the difference between GitHub and GrubHub?
       | 
       | GrubHub delivers
        
         | jadbox wrote:
         | First HN comment that ever made me laugh, well done.
        
         | darknavi wrote:
         | Watching Lion King as a youth I always thought grubs looked
         | delicious.
         | 
         | Little did I know...
        
       | mirekrusin wrote:
       | What's the best crowdsourced status monitor?
        
         | eckza wrote:
         | https://outage.bingo/
        
       | nimbius wrote:
       | https://www.githubstatus.com/history
       | 
       | 21 incident outages in just 3 months. At this rate the benefits
       | of running your own gitea or gitlab are starting to become
       | competitive.
        
         | TheRealPomax wrote:
         | But how many of those actually affected you? For example, no
         | amount of issues around codespaces or github packages would
         | impact my professional use of github, so whether there are 21
         | or 5000 or those parts get permanently taken offline makes no
         | difference in what I need out of the platform.
         | 
         | How many _core_ incidents? The part that affects whether you
         | can even push to and pull from a repo, and access issues and
         | PRs? Because everything else is nice to have, but you can do
         | work perfectly fine without them if they go down for a few
         | hours.
        
           | sitzkrieg wrote:
           | i could not even sso login so it was a bit more impactful
           | than it sounds on the paper
        
           | CanSpice wrote:
           | Yesterday's affected me, I couldn't pull or push and when I
           | tried to look at the repo to do PRs I got 500 errors. That
           | only lasted maybe 30 minutes though.
        
           | jeltz wrote:
           | I was affected by the one last week, the one yesterday and
           | the one today. The one today was harmless but the other two
           | disrupted our work. All three were "core incidents", but the
           | one today felt shorter.
        
         | ironmagma wrote:
         | We run Gitea at my company. In fact, we forked it. It could
         | reeeaaaalllly use a rewrite. If anyone is even mildly ambitious
         | about creating a new alternative to Github/Gitea, it's a great
         | time to do that.
        
           | mynameismon wrote:
           | You might be interested in sourcehut: https://sr.ht
        
           | KronisLV wrote:
           | Another self-hosted project in the space that i've seen was
           | GitBucket, although it runs on the JVM (not necessarily a bad
           | thing, just different from Go): https://gitbucket.github.io/
        
         | chockchocschoir wrote:
         | > At this rate the benefits of running your own gitea or gitlab
         | are starting to become competitive
         | 
         | No need, just use Codeberg.org instead. They run Gitea and are a
         | free collaboration platform (+ git hosting) for free projects.
         | FOSS/OSS should really consider alternatives to GitHub and
         | GitLab, especially when there are much more FOSS/OSS friendly
         | platforms around.
        
         | sonicggg wrote:
         | Come on, don't be so dramatic. This is not a 911 call center,
         | people will survive these minor outages.
        
           | mdellavo wrote:
           | sure but three days in a row?
        
         | Chris2048 wrote:
         | > running your own
         | 
         | assuming that would be flawless, which it wouldn't
        
         | ransom1538 wrote:
         | "21 incident outages in just 3 months. At this rate the
         | benefits of running your own gitea or gitlab are starting to
         | become competitive."
         | 
         | Oh stop the drama. Fine. Set up your gitlab.
        
         | belter wrote:
         | Excluding ones reported as [Errors], [Scheduled] or
         | [Notifications]
         | 
         | 2019 -> 39 Incidents
         | 
         | 2020 -> 67 Incidents
         | 
         | 2021 -> 86 Incidents
         | 
         | 2022 -> 20 Incidents so far
         | 
         | Edit: Using Linear Regression...Prediction for total end 2022:
         | 111 Incidents.
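The comment above does not say which tool or fitting method produced the figure. As a minimal sketch, assuming an ordinary least-squares line through the three full-year counts (2019 to 2021), the prediction can be reproduced as follows. The function name `fit_line` is hypothetical and only used for illustration.

```python
# Incident counts per full year, copied from the comment above.
incidents = {2019: 39, 2020: 67, 2021: 86}


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = list(points.keys())
    ys = list(points.values())
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(incidents)
prediction_2022 = slope * 2022 + intercept

print(f"slope: {slope:.1f} incidents per year")       # slope: 23.5 incidents per year
print(f"predicted 2022 total: {prediction_2022:.0f}")  # predicted 2022 total: 111
```

Three data points are very few for a trend line, so the result is best read as a rough extrapolation rather than a forecast.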
        
           | mrkramer wrote:
           | One would have thought when they got acquired by Microsoft that
           | the number of incidents would go down considering all
           | resources Microsoft would provide but no.
        
             | speedgoose wrote:
             | GitHub has a lot more features now though. A few years ago
             | you didn't have GitHub actions or workspaces, mostly a DDoS
             | from Asia once in a while.
        
           | antiquark wrote:
           | Based on the same extrapolation, github will reach one
           | incident per day by 2032.
        
           | mbesto wrote:
           | The number of incidents isn't so much of a problem as the
           | amount of downtime is. That would be more interesting to see.
        
             | belter wrote:
             | GitHub Availability Report [1]
             | 
             | Service Downtime Core Services Only - Cumulative per Month
             | 
             | ( Some months with more than one outage)
             | 
             | Jan 2021: 3 hours 53 min
             | 
             | Feb 2021: 1 hour 42 min
             | 
             | Mar 2021: 4 hours 10 min
             | 
             | Apr 2021: 2 hours 20 min
             | 
             | May 2021: 10 hours 34 min
             | 
             | Jun 2021: 0 min
             | 
             | Jul 2021: 0 min
             | 
             | Aug 2021: 4 hours 23 min
             | 
             | Sep 2021: 0 min
             | 
             | Oct 2021: 1 hour 36 min
             | 
             | Nov 2021: 2 hours 50 min
             | 
             | Dec 2021: 0 min
             | 
             | Jan 2022: 26 min
             | 
             | Feb 2022: 13 min
             | 
             | [1] https://github.blog/tag/github-availability-report/
        
               | mbesto wrote:
               | So, if my math is right (for 2021 only): 1888 min /
               | 525,600 min = 99.64% uptime.
               | 
               | If it was more like 99.80+ I think I would be like "meh",
               | but honestly for the price you pay that's not terrible.
               | Still, for a company at the Microsoft level, it should be
               | 99.80 at least.
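The uptime figure above can be checked directly from the monthly list. This is a small sketch that sums the 2021 entries as quoted in the thread and, as an added illustration, computes the downtime budget that a 99.80% target would allow over the same year. The helper name `uptime_percent` is hypothetical.

```python
# Monthly core-service downtime for 2021 as (hours, minutes), from the list above.
downtime_2021 = {
    "Jan": (3, 53), "Feb": (1, 42), "Mar": (4, 10), "Apr": (2, 20),
    "May": (10, 34), "Jun": (0, 0), "Jul": (0, 0), "Aug": (4, 23),
    "Sep": (0, 0), "Oct": (1, 36), "Nov": (2, 50), "Dec": (0, 0),
}

MINUTES_IN_2021 = 365 * 24 * 60  # 525,600; 2021 was not a leap year


def uptime_percent(downtime_minutes, period_minutes):
    """Share of the period during which the service was up, in percent."""
    return 100.0 * (1 - downtime_minutes / period_minutes)


total_down = sum(hours * 60 + minutes for hours, minutes in downtime_2021.values())
uptime = uptime_percent(total_down, MINUTES_IN_2021)

# Minutes of downtime a 99.80% target would permit over the same year.
budget_9980 = MINUTES_IN_2021 * (1 - 0.9980)

print(total_down)                # 1888
print(f"{uptime:.2f}%")          # 99.64%
print(f"{budget_9980:.0f} min")  # 1051 min
```

On these numbers, 2021 downtime was roughly 1.8 times the budget a 99.80% target would allow.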
        
           | omoikane wrote:
           | I wondered if those error rates were proportional to Github's
           | growth over time, so I looked it up. It seems that they have
           | 40M users in 2019[1] and 73M users in 2021[2], which
           | translates to 0.975 incidents per million users per year in
           | 2019 compared to 1.178 in 2021.
           | 
           | So perhaps they are not exactly improving, but maybe there is
           | some other way to normalize the data.
           | 
           | [1] https://github.blog/2019-11-06-the-state-of-the-octoverse-20...
           | 
           | [2] https://octoverse.github.com/
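A short sketch of the normalization described above, using only the incident and user counts quoted in this thread. It also compares raw incident growth with per-user growth, which is an added illustration rather than something the comment computes.

```python
# Incident counts and user counts (in millions) as quoted in the thread.
incidents = {2019: 39, 2021: 86}
users_millions = {2019: 40, 2021: 73}


def incidents_per_million_users(year):
    """Incidents in a year divided by users (in millions) in that year."""
    return incidents[year] / users_millions[year]


for year in sorted(incidents):
    print(year, round(incidents_per_million_users(year), 3))
# 2019 0.975
# 2021 1.178

raw_growth = incidents[2021] / incidents[2019]
normalized_growth = incidents_per_million_users(2021) / incidents_per_million_users(2019)
print(f"raw: {raw_growth:.2f}x, per user: {normalized_growth:.2f}x")
# raw: 2.21x, per user: 1.21x
```

Normalizing by users shrinks the increase from about 2.2x to about 1.2x, which supports the comment's point that the choice of denominator matters.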
        
           | ejb999 wrote:
           | that's not the kind of progression you like to see - that is,
           | error rates increasing over time instead of decreasing.
        
             | TheRealPomax wrote:
             | Only if you believe those numbers mean anything. What are
             | the errors _for_? Github has been adding lots of features
             | and subproducts over the years, becoming a bigger and
             | bigger platform as a result. What you want is the error-
             | per-component, which may very well have actually gone down,
             | with error spikes coming from  "when github adds a
             | completely new feature and it goes through a slew of
             | incidents in its first year". The bigger the feature, the
             | more incidents.
             | 
             | Without more detailed numbers, there's literally no
             | conclusion to draw here.
        
               | ejb999 wrote:
               | Every place I have ever worked reported incidents going
               | down would be good, not up.
        
               | TheRealPomax wrote:
               | Every place I ever worked at understood that if you x3
               | the codebase/infra/interaction surface/etc, you can
               | expect x3 errors. If the total number of errors don't go
               | up as you grow you're doing amazing, and if they go down
               | even though you're landing more and more code for more
               | and more features and subproducts, you have a genuine
               | miracle.
        
               | AlexandrB wrote:
               | These features can't be rolled out incrementally to
               | users? In this day and age it seems weird for a web app
               | to do a global go-live with something before testing it
               | with a smaller group first.
        
             | quercetumMons wrote:
             | Reasonable if growth/load is growing, too.
        
         | adamsmith143 wrote:
         | Is it really though? Are engineers committing so frequently
         | that they can't make it through a few hours without Github?
        
           | queuebert wrote:
           | Maybe they measure performance in git commits.
        
           | lallysingh wrote:
           | It depends on how many engineers you have! But also, there
           | are plenty of other functions in GH besides raw git, like
           | Wiki/PR/Issues/test/deploy pipelines, etc. It can become
           | pretty critical.
        
           | duped wrote:
           | An outage of a few hours can tank a release deadline for me,
           | so yes.
        
           | [deleted]
        
           | imiric wrote:
           | GitHub doesn't just host Git repositories. It's the central
           | location for discussions, issues, code reviews, milestone
           | planning, and any CI process like testing or releases. If
           | it's unavailable whole teams can be interrupted.
           | 
           | Git is distributed. GitHub is very much not.
        
           | jeffwask wrote:
           | Yes
        
         | Melatonic wrote:
         | The marketing for building your own new "private cloud" will
         | begin soon I am sure :-D
        
           | pid-1 wrote:
           | You just made a few Openstack consultants rise from their
           | graves.
        
         | dbrgn wrote:
         | Does Gitea support some kind of federation / cross-instance
         | PRs? That's the main thing I'd miss from a self-hosted
         | instance, the ease of getting contributions.
         | 
         | After all, you don't even need Gitea for pure Git hosting. If
         | you have a server with SSH access, just init a bare repo in a
         | directory, push to that, and you're ready to go. No web UI
         | needed.
         | 
         | The reason I'm still using GitHub is not code hosting. It's
         | collaboration.
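The bare-repo workflow described above can be sketched as follows. This is a local illustration only: a directory on the same machine stands in for the SSH server, so the remote is a filesystem path rather than something like `user@host:project.git`. The paths, the branch name `main`, and the helper names `git` and `demo_bare_remote` are all assumptions made for the example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd=None):
    """Run a git command and return its stdout, raising if it fails."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def demo_bare_remote(workdir):
    """Create a bare repo, push one commit to it, return (local, remote) hashes."""
    remote = Path(workdir) / "project.git"  # stands in for the directory on the server
    local = Path(workdir) / "project"
    local.mkdir()

    git("init", "--bare", str(remote))      # the "server" side: no working tree
    git("init", cwd=local)                  # the developer's clone
    (local / "README").write_text("hello\n")
    git("add", "README", cwd=local)
    # Identity and signing are set inline so the sketch ignores global config.
    git("-c", "user.name=Example", "-c", "user.email=example@example.invalid",
        "-c", "commit.gpgsign=false", "commit", "-m", "Initial commit", cwd=local)
    git("remote", "add", "origin", str(remote), cwd=local)
    git("push", "origin", "HEAD:refs/heads/main", cwd=local)

    local_head = git("rev-parse", "HEAD", cwd=local)
    remote_head = git(f"--git-dir={remote}", "rev-parse", "refs/heads/main")
    return local_head, remote_head


if shutil.which("git"):  # skip quietly when git is not installed
    with tempfile.TemporaryDirectory() as tmp:
        local_head, remote_head = demo_bare_remote(tmp)
        print("remote matches local:", local_head == remote_head)
```

With a real server, the only change would be the remote URL; issues, pull requests and a web UI are exactly the parts this setup does not provide.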
        
           | brimble wrote:
           | Gitea gets you: a nice GitHub-like web GUI, including for
           | stuff like managing users; 2FA; some integrations; web hooks
           | without having to add git-hooks to all your repos; and
           | extremely-useful-to-some-projects features like git-lfs
           | support.
           | 
           | If you don't want or need those things, bare git repos are
           | fine and certainly easier to support (not that Gitea's that
           | hard, though a few issues/PRs I've noticed have caused me
           | more than a little concern about the overall quality of the
           | project).
        
           | encryptluks2 wrote:
           | But by using GitHub for "collaboration" you are sacrificing
           | decentralization.
        
           | dbrgn wrote:
           | It seems there's a tracking issue here, but it seems stalled:
           | https://github.com/go-gitea/gitea/issues/1612
        
           | tokumei wrote:
           | > If you have a server with SSH access, just init a bare repo
           | in a directory, push to that, and you're ready to go. No web
           | UI needed.
           | 
           | Used to do that years ago for my personal projects. Honestly
           | does the trick.
        
         | devwastaken wrote:
         | And who pays for fixing it? Downtimes of self hosted systems
         | using external software can be far longer. GitHub, unlike
         | Amazon and friends, doesn't lie about their downtime. Every
         | saas has hundreds of downtime instances across the board every
         | month. Some are small enough you don't see them. Yet the
         | services still work exceptionally well - and when they don't
         | they get fixed in a quick manner. What takes them an hour would
         | take most private orgs a day.
        
           | AlexandrB wrote:
           | > GitHub, unlike Amazon and friends, doesn't lie about their
           | downtime.
           | 
           | Are you kidding? The last 2 incidents were called "degraded
           | performance". Where "degraded" meant I would get nothing but
           | 500 errors accessing GitHub.com either via browser or git
           | itself for the duration of the outage. How is this not lying?
        
           | everfrustrated wrote:
           | GitHub is notorious for only noticing outages once the USA
           | morning starts.
           | 
           | If you're using GitHub in Europe or Asia it's not uncommon
           | for GitHub to be offline for many hours before they
           | acknowledge anything.
        
         | mhh__ wrote:
         | The company I work for has a bunch of non-programmers using and
         | working in gitlab (or "the git"), I can't really see it
         | happening with GitHub regardless of where it was hosted.
         | 
         | Gitlab just seems better for actually running a software
         | project.
        
         | edgyquant wrote:
         | I'm not sure at what organization that is true. My company
         | lives out of GitHub and Jira and I've hardly noticed the three
         | month surge. GitHub would have to do a lot worse to get many
         | companies to want to host their own services. This is the
         | argument people have said about the cloud from day one.
         | 
         | People want to know it isn't their problem, that makes cloud
         | computing (and things like GitHub) worth their weight in gold.
         | I have real problems to solve I don't want to deal with a git
         | repo manager on top of that.
        
           | ManWith2Plans wrote:
           | I will say that for us this is a huge deal. We're a devops
           | services company, and our customers expect their deployment
           | pipelines to work. This is becoming a huge pain-point for a
           | few of our customers and we recommended Github Actions to
           | them. A couple of our customers want us to move away from
           | GitHub actions because of how disruptive outages have been.
        
           | jeltz wrote:
           | Maybe you are in a different time zone because our
           | organization certainly noticed and was disrupted by this.
        
             | edgyquant wrote:
             | I'm on PST time, some of our other devs are on the east
             | coast and one is in India. I think we're spread out enough
             | it should be an issue but maybe we prioritize different
             | things.
        
               | jeltz wrote:
               | We are in CET and maybe we use Github differently than
               | you.
        
           | jhugo wrote:
           | I think the impact was for some reason not consistent between
           | users (maybe due to geographical factors or maybe sharding of
           | accounts?). We're in Asia and I think we've had three
           | different days recently where we couldn't actually get much
           | work done due to GitHub being flakey or down for the entire
           | day and our CI/CD and development processes being built
           | around it. We ended up moving off GitHub onto a self-hosted
           | system, which took about a day of work for one engineer
           | (CI/CD itself was already self-hosted, so just Git, issues
           | and PRs), and there have already been two more GitHub outages
           | since then.
        
           | gjulianm wrote:
           | At my organization it's always been true. Setting up GitLab
           | is fairly easy, in my company we do it and it's cheap (on-
           | prem hosting is basically zero, and we had the IPs/domains
           | already) and it hasn't given us too many headaches. I think
           | last time I had to do something was maybe a few months ago
           | when I restarted it so that it picked up the updated SSL
           | certificate.
        
             | nightpool wrote:
             | Self-hosted GitLab got a good callout yesterday from
             | Microsoft, it appears to be a favorite of LAPSUS$:
             | https://www.microsoft.com/security/blog/2022/03/22/dev-0537-...
             | 
             | Self-hosting _always_ increases the operational burden of
             | making sure your systems are secure. Maybe you have the
             | engineering resources to spend on patching everything
             | immediately and conducting in-house pen tests, but for most
             | companies it's much, much more secure to let the
             | software's developers host it as well.
        
               | temp8964 wrote:
               | Not necessarily. Self-hosted services are protected by
               | company firewall / VPN. They can set up very restrictive
               | network access. They don't have the same level of risks
               | as public services like GitHub or GitLab.
        
               | hhh wrote:
               | Establishing an entry point via VPN is Lapsus$ primary
               | first step.
        
               | Melatonic wrote:
               | Except that the software developers hosting is also a
               | much, much bigger target and you generally do not have
               | any real control over how often they are patching either.
        
             | xondono wrote:
             | I'd say it depends, I run my own on prem server and gitlab
             | was a PITA. Too many moving parts, updating took too much
             | of my time, and I never felt "safe".
             | 
             | Moving to gitea solved all of those issues for me (thus
             | far), now I'm looking into adding other stuff like CI
             | through Drone.
        
               | Inversechi wrote:
               | Did you consider woodpecker instead of drone? It's
               | basically an evolved fork of the OSS version.
               | 
               | https://woodpecker-ci.org/
        
               | xondono wrote:
               | Didn't even know about it. I'll check it out.
               | 
               | Thanks!
        
               | KronisLV wrote:
               | Curiously, this was also my own experience!
               | 
               | I actually wrote a bit about the migration process, as
               | well as the reasons for migrating over to Gitea, Nexus
               | and Drone CI as opposed to using GitLab, GitLab Registry
               | and GitLab CI:
               | https://blog.kronis.dev/articles/goodbye-gitlab-hello-gitea-...
               | 
               | With containers, it's actually a pretty good experience
               | that's not too hard to set up or manage.
        
               | edgyquant wrote:
               | It definitely depends. We're pretty early stage and I'm
               | the senior engineer+infrastructure guy so running our own
               | gitea instance or whatever is just more time that I'm
               | almost out of.
        
             | skeeter2020 wrote:
             | >> Setting up GitLab is fairly easy, in my company we do it
             | and it's cheap (on-prem hosting is basically zero, and we
             | had the IPs/domains already)
             | 
             | In what tech company is hosting or domains the main cost
             | centre? Many companies spend more on a single hour of a
             | dev's time than their entire GH monthly bill.
        
               | capitol_ wrote:
               | I think we pay about $10 per developer per month for
               | github, and with about 1000 developers I would love that
               | hourly rate.
        
               | kleebeesh wrote:
               | ...What? $10 x 1000 = $10k / month. $10k x 12 = $120k.
               | That is a new grad software engineer salary in any US
               | city. You'd pay more than that for a single dev with the
               | devops and security experience to keep GHE running and
               | patched for 1000 devs.
        
               | zeku wrote:
               | Just a bone to pick... new grad engineers in my US city
               | started around 60-70k in 2018 when my college cohort
               | graduated. Southern US...
        
               | zrail wrote:
               | Things have changed considerably over the last four
               | years.
        
               | sitzkrieg wrote:
               | yea and starting salary for zero experience developers is
               | not $120k in most places
        
               | oceanplexian wrote:
               | There are a lot of problems with this from the business
               | angle:
               | 
               | (1) An engineer getting paid 120k doesn't "cost" 120k,
               | probably >150k with federal taxes, health insurance,
               | benefits, and so on. Not including the cost to recruit,
               | interview, and train said person.
               | 
               | (2) I don't know of many 1,000 person companies that
               | would trust a new grad software engineer with no
               | experience to manage critical infrastructure.
               | 
               | (3) You need N engineers to manage said service, because
               | what happens when your one engineer gets sick, takes PTO,
               | or quits for some reason? You also need a manager for
               | said engineer(s).
               | 
               | (4) You now need to secure an internal service you never
               | did before, so expect to have to hire external security
               | consultants or re-allocate security engineers, since it's
               | high risk.
               | 
               | (5) Github is FedRAMP compliant, SOC1 and SOC2 compliant
               | and GDPR compliant. If you or your customers need any of
               | those things, expect to hire external auditors on a
               | recurring basis to validate your home-grown solution
               | meets those requirements.
               | 
               | I hate to make these points because I'm a big believer in
               | the scrappy startup mentality, but if you want to do
               | things right, in the context of a large enterprise that
               | is accountable to a lot of people, expect a project like
               | this to cost $1MM per year minimum, and it probably won't
               | reach parity with a cloud offering in terms of
               | reliability, multi-region performance, proper backups,
               | and so on. This is why Github can charge ~$200 per user
               | (Or $200k per year for 1,000 seats) and still come away
               | looking like a bargain.
        
               | tjoff wrote:
               | Well, considering you'd likely spend an average of 5
               | minutes per day doing it I wouldn't mind it.
        
               | cortesoft wrote:
               | The person was replying to a comment saying they spend
               | more on a SINGLE HOUR of a dev's time than the monthly GH
               | bill, which is not true for an org of more than 20 people
               | or so (depending on hourly rate).
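The arithmetic in this sub-thread can be sketched as below, assuming the $10 per developer per month figure quoted above. The $200 hourly rate is a hypothetical value chosen only to show where the "20 people or so" break-even comes from, and the function names are illustrative.

```python
PRICE_PER_SEAT_PER_MONTH = 10  # USD, the figure quoted in the thread


def monthly_bill(seats, price=PRICE_PER_SEAT_PER_MONTH):
    """Total monthly cost for a given number of seats."""
    return seats * price


def break_even_seats(hourly_rate, price=PRICE_PER_SEAT_PER_MONTH):
    """Largest team whose monthly bill does not exceed one developer-hour."""
    return int(hourly_rate // price)


print(monthly_bill(1000))       # 10000  (per month for 1000 developers)
print(monthly_bill(1000) * 12)  # 120000 (per year)

# Hypothetical fully loaded cost of one developer-hour, for illustration only.
print(break_even_seats(200))    # 20
```

At that assumed rate, the "one hour of a dev's time" comparison only holds for teams of up to 20 seats; a 1000-seat organization pays the equivalent of 50 such hours per month.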
        
               | kleebeesh wrote:
               | Ah, totally misread it. Thanks.
        
           | thaeli wrote:
           | Also, looking at this it seems like GitHub isn't doing the
           | common SaaS thing of just lying on their status page. Many
           | providers, both internal and external, would look a lot worse
           | if they had honest status pages.
        
             | jhugo wrote:
             | Several of the recent outages were much longer (at least
             | for us, here in Asia) than they admitted on their status
             | page. In one case I started work, noticed I couldn't push
             | to or pull from GitHub, that situation persisted all day,
             | and around 5pm local time (so morning-ish in the US)
             | suddenly their status page acknowledged the problem and a
             | discussion started on HN.
        
             | mirekrusin wrote:
             | They are green for good 15 minutes from first moment i see
             | problems, not the first time, it happens actually quite
             | often. Maybe that's the time they need to confirm/cross
             | check/write status update, don't know.
        
               | judge2020 wrote:
               | They probably allow regular SREs to trigger an incident
               | on the status page on their own, when the likes of AWS
               | and other bigger cloud providers are rumored to need
               | approval from a VP[0] to update the status page.
               | 
               | 0: https://news.ycombinator.com/item?id=29475756
        
               | georgemcbay wrote:
               | While quicker reporting would be better, 15 minutes is
               | anecdotally a lot better than I see from most other
               | services where their status pages will report all-clear
               | hours into full outages.
        
               | thaeli wrote:
               | Yeah, I'm legit impressed with a 15 minute time here.
        
             | ishanjain28 wrote:
             | They do intentionally or not lie about this on their status
             | page. From December 25th to December 31st 2021, Github
             | actions had network problems almost every single day for
             | hours and the status page was green throughout that
             | period.
             | 
             | Same thing also happened a few months back.
             | 
             | It feels like they do this manually and it's only done when
             | enough people are affected.
        
           | SkyPuncher wrote:
           | > I'm not sure at what organization that is true. My company
           | lives out of GitHub and Jira and I've hardly noticed the
           | three month surge
           | 
           | These have been minor inconveniences for us - at worst. Most
           | of the time it simply means people jump to something else
           | then come back later in the day.
           | 
           | Failing tests and PR feedback cycles are more of a blocker to
           | our team than these outages.
        
           | jmartens wrote:
           | My company monitors the functionality, performance and
           | availability of apps like Github, and we have certainly
           | noticed the increase in issues lately.
        
             | edgyquant wrote:
             | We were actually talking about implementing this last week.
             | Not for GitHub but for slack as it seems to have issues
             | once a month or so.
        
           | wnevets wrote:
           | > I've hardly noticed the three month surge.
           | 
           | This has been my experience as well. I don't know if that
           | means GitHub is being overly transparent about issues or I've
           | just been lucky but I would hate if people punished services
           | for being transparent and informative on their status pages.
        
             | zenexer wrote:
             | GitHub's outages have hit me hard over the past week or so.
             | I don't think it's a matter of them being transparent--if
             | anything, I was hitting errors well before their status
             | page updated. Yesterday it was completely unusable for much
             | of my workday, and today tasks that normally take me a few
             | minutes have been taking hours.
        
           | jacobr wrote:
           | 20 PRs waiting in line for half a day to be merged is pretty
           | annoying. We've had that on multiple occasions the last few
           | weeks due to GitHub incidents.
        
         | dijonman2 wrote:
         | GitHub enterprise is amazing, but I agree that a centrally
         | hosted Git instance of any variety is a liability.
         | 
         | With the advent of the Okta breach I think we will see a
         | reverse in the centralization trend.
        
         | gwbas1c wrote:
         | > At this rate the benefits of running your own gitea or gitlab
         | are starting to become competitive
         | 
         | When you host things yourself, you still have downtime. And,
         | having worked with Github for over a decade, the actual
         | disruption to my work from downtime is much less than if I
         | had to host my own.
         | 
         | That being said: I briefly worked for a company that hosted its
         | own source code control system. For us, as a small team, it
         | wasn't worth it. The system was outdated and hosted in an
         | insecure manner. No one ever did any "admin" work except the
         | founder. He ran it because he had irrational fears of
         | switching, not because of any tangible advantages over Github
         | (and competitors.)
         | 
         | Keep in mind that Github (and competitors) are often cheaper
         | than the time needed to invest in hosting your own. (Estimate
         | 10-20 hours a year of invested time. Calculate your hourly
         | rate. Github and competitors are cheaper.) In order to come
         | ahead, you need tangible benefits other than "I think I can
         | have less downtime."
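
A rough way to run the comparison described above, as a minimal sketch. The hourly rate, seat count, and per-seat price are placeholder assumptions, not actual GitHub (or competitor) pricing, and server costs for self-hosting are left out entirely:

```python
def self_hosting_cost(hours_per_year: float, hourly_rate: float) -> float:
    """Yearly cost of the admin time spent on a self-hosted instance."""
    return hours_per_year * hourly_rate


def hosted_cost(seats: int, price_per_seat_month: float) -> float:
    """Yearly subscription cost of a hosted service."""
    return seats * price_per_seat_month * 12


# Hypothetical inputs: replace with your own numbers.
HOURLY_RATE = 100.0         # assumed cost of one engineer-hour
SEATS = 5                   # assumed team size
PRICE_PER_SEAT_MONTH = 4.0  # assumed price, not a quoted plan

low = self_hosting_cost(10, HOURLY_RATE)   # lower bound: 10 hours/year
high = self_hosting_cost(20, HOURLY_RATE)  # upper bound: 20 hours/year
hosted = hosted_cost(SEATS, PRICE_PER_SEAT_MONTH)

print(f"self-hosting admin time: {low:.0f} to {high:.0f} per year")
print(f"hosted subscription:     {hosted:.0f} per year")
```

With these made-up inputs the admin time alone costs several times the subscription; a larger team or a lower hourly rate shifts the balance, which is why the inputs matter more than the formula.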
        
           | megous wrote:
           | Dunno, I got blocked from my work SaaS hosted gitlab for
           | about a month by cloudflare. Nobody at gitlab or cf helped. I
           | only figured it out myself after about 4 hours of research:
           | it was caused by some web tracking APIs (disabled by me years
           | ago) that no one should have a hard dependence on.
           | 
           | I certainly would not have this problem on self hosted
           | instance, because it would not be behind CF. I'm sure I'd
           | have other problems though. :)
           | 
           | All software is crap. You can either spend time fixing it
           | yourself, or spend time begging online for fixes/help from
           | some SaaS company/community, sometimes with resolution times
           | in months, all while you may not be able to use it fully.
           | 
           | Also with SaaS it will be constantly shifting under you.
           | Things will be moved around, restyled, iconized, popupized,
           | etc. This doesn't help productivity either. With self-
           | hosting, you can at least avoid upgrading, if you dislike
           | this kind of thing. Or choose FOSS software that values UX
           | permanency/stability, which seems to be a really hard ask
           | from a SaaS business.
        
         | mrkurt wrote:
         | If you want companies to be honest on their status pages (I
         | do!), you can't just count incidents like that. Status pages
         | can be an amazing place to communicate all kinds of problems.
         | 
         | Most issues have a relatively narrow impact, but the impacted
         | people _still_ benefit from seeing them listed.
        
           | jmartens wrote:
           | How can we solve this as customers, or push the vendor to do
           | better?
        
             | mrkurt wrote:
             | Use vendors who do a good job communicating status,
             | basically. I don't think you can change AWS behavior. But
             | if you find a hosting company who does an amazing job with
             | their status updates, put some apps there (_my_ company
             | does an ok job with status page updates, we're getting
             | better, it's not amazing yet).
        
             | encryptluks2 wrote:
             | Stop being a customer of crappy vendors
        
               | copperx wrote:
               | What cloud provider does better status pages than AWS?
        
               | drusepth wrote:
               | The snarky answer is "literally all of them", but one
               | real answer is that I've been pretty happy with GCP's
               | status reporting for the past year-ish I've used them.
               | I've only noticed a few incidents, but every time I've
               | checked the status it was already updated. They also
               | occasionally provide workarounds on the live incident
               | pages if you need to be back up before the issue is fixed
               | on their end.
        
         | rvz wrote:
         | Well, I think I have said that since 2020 [0] and it is self-
         | evident that you are better off self-hosting your own Git repo.
         | If you can host a website you can do it. If GNOME, ReactOS,
         | Wireguard, Linux Kernel Project, Mozilla, etc can do it, so can
         | you. Or even use it as a backup / failsafe just in case.
         | 
         | But going _'all in'_ on GitHub just doesn't make any sense
         | anymore.
         | 
         | [0]
         | https://hn.algolia.com/?dateRange=all&page=1&prefix=true&que...
        
           | Someone wrote:
           | But who can host a website? I would be wary of hosting
           | something that isn't a 100% static site, out of fear of the
           | amount of attention maintenance would take.
           | 
           | Also, quite a few of the non-profits behind the projects you
           | mentioned have multi-million dollar budgets that they can use
           | to administer their git instance, if needed. I don't think
           | "if they can do it, you can" is a strong argument for those.
        
             | rvz wrote:
             | I don't recall ReactOS, or the creators of wireguard having
             | _'multi million dollar budgets'_. How is it that even
             | projects like RedoxOS [0] are able to self-host on a GitLab
             | instance using a subdomain, without giant budgets in the
             | millions?
             | 
             | You don't need a _'multi-million dollar budget'_ to self-
             | host a git repo, and many of these open-source projects were
             | doing so for years before GitHub even existed. Even if
             | they did have such a budget, there isn't an excuse left not
             | to self-host and avoid going _'all in'_ on GitHub.
             | 
             | At the very least I would expect something like what
             | ReactOS is doing by having a self-hosted backup just in
             | case GitHub goes down or vice-versa. [1]
             | 
             | Looks like that is proving to be useful.
             | 
             | [0] https://gitlab.redox-os.org/redox-os
             | 
             | [1] https://github.com/reactos/reactos#code-mirrors
        
               | Someone wrote:
               | > You don't need a 'multi-million dollar budget' to self-
               | host a git repo
               | 
               | I never made that claim. The argument was "if X can do
               | it, so can you".
               | 
               | I pointed out that _some_of_these_ (Mozilla, likely the
               | most extreme of them, had over $400 million in revenues
               | in 2020), are quite different from the typical 'you',
               | invalidating that argument.
               | 
               | As always, invalidating an argument doesn't mean its
               | conclusion is wrong.
        
             | rglullis wrote:
             | My last bill from Hetzner was ~35EUR. I host gitea, drone
             | CI, hashicorp vault and my own docker registry/pypi
             | repository. I can add as many users as I want, and I had
             | exactly zero incidents in the past ~6 years since I set
             | this up.
             | 
             | I don't even worry about a strong backup strategy (besides
             | just making occasional snapshots of the data volumes)
             | because this was all set up with IaC tools (Terraform,
             | Ansible) and I have copies of all the code in local
             | repositories.
        
           | rglullis wrote:
           | It's almost like people forget that git is a _Distributed_
           | Version Control System, after all...
        
             | djbusby wrote:
             | GitHub/Lab are for more than just code repo
        
         | Jiejeing wrote:
         | If you are a closed org, that is. Running your own gitea or
         | gitlab with registration enabled and having to deal with spam
         | is a real hurdle.
        
           | julianlam wrote:
           | Is it not possible to restrict access to the git server from
           | a VPN server only?
           | 
           | Just off the top of my head, that's one thing you can do.
        
             | mlyle wrote:
             | Yah, that's a "closed org". When you need to deal with the
             | public at large, you need to deal with user registration
             | issues and spam.
        
             | Dobbs wrote:
             | So now every person who wants to contribute to your open
             | source project has to setup a VPN client?
             | 
             | The parent comment was explicitly about non-closed (e.g.
             | private) orgs.
        
       | eatonphil wrote:
       | Github Actions are back for me now.
        
       | toastal wrote:
       | And to think Git can easily be decentralized. I wonder if the
       | community could fork GitHub to fix it. Oh, it's not open source.
       | Devs must be too busy working on more 'social' features like "For
       | You (Beta)" to milk the attention economy.
        
       | intunderflow wrote:
       | With how often these happen we might as well sticky this thread
       | for the next one
        
       | grumple wrote:
       | Again?! Jeez. I wish I had customers this tolerant.
        
       | rvz wrote:
       | Again? Last time that happened was 24 hours ago? [0] It is really
       | getting unreliably bad. Like I said before, having a self-hosted
       | backup seems to make more sense.
       | 
       | [0] https://news.ycombinator.com/item?id=30767821
        
       | mirekrusin wrote:
       | Status page says only degraded performance.
       | 
       | It's a nice way of putting it.
       | 
       | I've been trying to run GitHub Actions for a couple of hours
       | now. They don't work at all. But apparently this means they run,
       | just in infinite time, hence == degraded performance, nice.
        
         | raffraffraff wrote:
         | It's just a way to avoid SLA breaches. "Of course it wasn't
         | _down_! It was just infinitely slow!"
        
       | koolba wrote:
       | I really wish they would add the word "outage" to these titles.
       | 
       | "Incident" alone makes me think something got hacked or leaked.
        
         | arez wrote:
         | That's SRE lingo --> https://sre.google/sre-book/managing-
         | incidents/
        
           | zufallsheld wrote:
           | It's also itil lingo, which predates sre.
        
             | mtnops wrote:
             | It's NIMS - FEMA lingo, which predates ITIL. Which was
             | developed in USFS wildland firefighting, which predates
             | FEMA. It's incident management all the way down.
        
       | blueplanet200 wrote:
       | I hope they figure out what's going on every morning. Heard from
       | inside they don't know why the db dies every day but restarting it
       | fixes it.
        
         | cube00 wrote:
         | Break out the early morning restart cron job.
        
           | Kostic wrote:
           | Early morning in which timezone?
        
             | glenneroo wrote:
             | When the least amount of users are online?
        
             | afterburner wrote:
             | GaryOldman.gif
        
           | gaoshan wrote:
           | Here you go, Github:
           | 
           | 0 4 * * * /etc/init.d/postgresql restart
           | 
           | I'll take an architect position as compensation, but only if
           | there is equity.
        
             | rish wrote:
             | GitHub uses MySQL primarily though.
        
               | grumple wrote:
               | MySQL also has a restart command! I'll take my rsus now
               | ty.
        
         | shepardrtc wrote:
         | IIS Server had/has a memory leak in worker threads that many
         | years ago always forced us to restart the server every few
         | days. Starting in 6.0, they added worker thread recycling and
         | made it mandatory to choose a time period for every thread to
         | be recycled. Why fix the error when you can just restart the
         | service?
        
           | whimsicalism wrote:
           | I doubt they use IIS
        
             | throwra620 wrote:
             | MSer here, yes we do... for some things
        
               | prepend wrote:
               | For GitHub? It seems unbelievable that they would use IIS
               | pre-purchase and why in the world would you mix in a
               | second web server for post-purchase enhancements.
        
               | whimsicalism wrote:
               | If GH is around the same level of integration with
               | Microsoft as my employer, which is another Microsoft
               | acquisition, I don't really believe you have a ton of
               | insight into GH processes.
        
               | edgyquant wrote:
               | I dated a girl at GitHub for awhile last year who said
               | they weren't even completely off of AWS yet and she liked
               | how it didn't feel like working for Microsoft. Maybe
               | this has changed though.
        
           | djbusby wrote:
           | Apache prefork had that since forever. Seems just a garbage
           | collect type pattern.
        
             | shepardrtc wrote:
             | It's not a bug, it's a pattern.
             | 
             | Seriously though, IIS 5.0 had no worker recycling. There
             | was no method to fix the issue. Threads would eat up GB's
             | of memory until you killed them.
        
         | raffraffraff wrote:
         | Yuck. Honestly, restarting a database to fix a major outage
         | sounds like "we have no idea what we're doing"
        
           | bpicolo wrote:
           | Sporadic database performance issues can certainly make you
           | feel that way. They are definitely not trivially debugged at
           | scale
        
           | vimda wrote:
           | Would you rather it stay down while they spend a day
           | debugging it?
        
             | paulryanrogers wrote:
             | If that means it won't be down every morning in my time
             | zone then yes.
        
         | exikyut wrote:
         | What's "the db"? It sounds like something of small to medium
         | scale if you can just restart it like that.
         | 
         | In any case, why not just relocate some vendor engineers on
         | site for a bit? Or, better, why does the vendor not have a
         | small presence in the corner?
         | 
         | Sounds like whatever "the db" is it's probably some
         | (objectively) small but very scary thing that's currently on
         | fire and people are trying to figure out how to put it out
         | without crashing the plane _and_ also making too many waves
         | internally, which is probably even harder. So asking about
         | making vendor noises is (as useful as it may be) probably going
         | down the wrong path - in much the same way this is probably not
         | related to the outages (it may well be, but from the outside
         | it's all coincidence anyway).
        
       | jonnybarnes wrote:
       | 2nd day in a row isn't it?
        
         | stepri wrote:
         | And 6 days ago: https://news.ycombinator.com/item?id=30711269
        
         | rvz wrote:
         | It is. 24 hours later [0] and I only expected it to happen once
         | every month. Looks like it is getting worse.
         | 
         | Oh dear. Not a good idea to go 'all in' on GitHub.
         | 
         | [0] https://news.ycombinator.com/item?id=30767821
        
         | momothereal wrote:
         | Yes: https://news.ycombinator.com/item?id=30767635
        
         | fishnchips wrote:
         | Yesterday they had two.
        
       | higeorge13 wrote:
       | The usual services (actions) again down around the same time.
       | This is embarrassing.
        
       | etimberg wrote:
       | The quality of GH seems to be slipping
        
         | amelius wrote:
         | I hope it doesn't affect security ...
        
         | xtracto wrote:
         | Funny that it happened since they were acquired by Microsoft...
         | reminds me of Hotmail, Skype, LinkedIn, Rare, among several
         | others.
        
           | [deleted]
        
           | [deleted]
        
         | Trasmatta wrote:
         | I've actually been pretty impressed with the quality of the
         | product and new features over the past couple of years, but it
         | seems to be having a lot of stability issues recently.
        
           | etimberg wrote:
           | I've liked the new features too, especially after so many
           | years of not many features. Maybe they've moved too fast now
        
       | paskozdilar wrote:
        
       | bob1029 wrote:
       | We are scheduling a call with an enterprise sales person next
       | week.
       | 
       | If I can get all the Github features I had as of ~2020, but on an
       | instance that won't get hit by the public cloud/update bus, I
       | would be exceptionally happy.
       | 
       | The only complaints we have are regarding availability. If we can
       | fix that one problem, this is a perfect product in our view.
        
         | andruby wrote:
         | How do you evaluate running your own gitlab instance?
        
       | [deleted]
        
       | iBotPeaches wrote:
       | It seems like we haven't had a non-robot status update on the
       | status page in days, even though this has become what seems like
       | a daily occurrence. I figured at this point we'd get some
       | explanation of why this is happening.
       | 
       | I also don't appreciate our builds freezing, unable to be
       | cancelled and then eating up hundreds of minutes.
        
         | MattIPv4 wrote:
         | > I figure at this point we'd get something of why this is
         | happening.
         | 
         | I've created a new discussion in their feedback repo asking for
         | this, three major outages in a week could really do with a
         | post-mortem:
         | https://github.com/github/feedback/discussions/13344
        
         | lucasyvas wrote:
         | Billing should always be built on a "ping" IMO and not
         | start/stop hooks. The latter is shockingly bad for customers
         | during times of unreliability. The former sounds stupid and
         | requires more infrastructure from the one offering the service,
         | but I think it's more fair.
         | 
         | I haven't used GA in a way where it actually cost me
         | anything, but having minutes just tick away while you can't do
         | anything is really stupid if that's the case.
         | 
         | Edit: Another sane solution would probably be to record outage
         | periods and have Billing automatically reconcile for every
         | customer when invoicing. This would require them to admit the
         | outage durations however, so it may be flawed from a human
         | perspective.
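
To make the difference between the billing models above concrete, here is a minimal sketch. The 60-second ping interval, the function names, and the example timestamps are all assumptions for illustration; nothing here reflects how GitHub actually meters Actions minutes:

```python
from typing import List, Optional, Tuple

PING_INTERVAL_S = 60  # assumed heartbeat interval; the thread does not give one


def billed_start_stop(start: float, stop: Optional[float], now: float) -> float:
    """Start/stop hooks: the meter runs until a stop event arrives."""
    end = stop if stop is not None else now
    return max(0.0, end - start)


def billed_ping(pings: List[float], interval: float = PING_INTERVAL_S) -> float:
    """Ping billing: each distinct heartbeat received pays for one interval."""
    return len(set(pings)) * interval


def merge_windows(windows: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Merge overlapping outage windows so no second is credited twice."""
    merged: List[Tuple[float, float]] = []
    for s, e in sorted(windows):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def billed_minus_outages(start: float, end: float,
                         outages: List[Tuple[float, float]]) -> float:
    """Reconciliation: subtract recorded outage time from a billed period."""
    billable = max(0.0, end - start)
    for o_start, o_end in merge_windows(outages):
        overlap = min(end, o_end) - max(start, o_start)
        if overlap > 0:
            billable -= overlap
    return billable


# Hypothetical job: starts at t=0, sends three pings, then an outage freezes
# it at t=180 and no stop event ever arrives. An hour later (t=3600):
stuck_start_stop = billed_start_stop(0, None, 3600)
stuck_ping = billed_ping([0, 60, 120])
stuck_reconciled = billed_minus_outages(0, 3600, [(180, 3600)])
print(stuck_start_stop, stuck_ping, stuck_reconciled)
```

In this made-up scenario the start/stop model bills the whole hour, while ping billing and outage reconciliation both bill only the three minutes the job demonstrably ran. The trade-off is the one noted above: pings need more infrastructure on the provider's side, and reconciliation only works if outage windows are recorded honestly.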
        
           | drusepth wrote:
           | The "ping" solution is an interesting one that I haven't seen
           | proposed before.
           | 
           | At what rate would you do these pings? I don't know how
           | upgrading/downgrading works at GitHub but if they do any sort
           | of refund/credit when you downgrade, it seems like there's
           | some interesting implications for abusing the system (e.g.
           | upgrading/downgrading between pings for "free" service if the
           | time between them is too long) versus performance (e.g. how
           | do you update all users per ping in a timely manner if the
           | time between them is too short?).
           | 
           | Would love to read up more on this approach; seems
           | interesting!
        
         | easton wrote:
         | Do they give you the minutes back if there's an incident during
         | the period where a job is running?
        
           | no_wizard wrote:
           | You will have to contact them for them to credit you, that's
           | what we did
        
             | lucasyvas wrote:
             | This is totally unsurprising and also totally unacceptable
             | IMO. They should automatically wipe out all build minute
             | usage during outages for every account if they insist on
             | architecting their system in this way.
        
         | mhitza wrote:
         | I suggest you add the timeout-minutes property on the job/step,
         | so even if the web interface isn't responsive the job times out
         | eventually. Saves you from spending time emailing support about
         | consumed minutes.
         | 
         | Of course, assuming that a future bug won't affect timeout-
         | minutes itself.
        
       | cube2222 wrote:
       | Looks like they really want to get a PR deployed, but there's
       | still not enough duct tape on it.
        
       ___________________________________________________________________
       (page generated 2022-03-23 23:01 UTC)