[HN Gopher] [Resolved]Incident with GitHub Actions, Issues, Pull...
       ___________________________________________________________________
        
       [Resolved]Incident with GitHub Actions, Issues, Pull Requests, and
       Webhooks
        
       Author : mot2ba
       Score  : 84 points
       Date   : 2021-10-21 15:24 UTC (7 hours ago)
        
 (HTM) web link (www.githubstatus.com)
 (TXT) w3m dump (www.githubstatus.com)
        
       | maherbeg wrote:
       | I'm surprised they don't have a couple of separate clusters that
       | they roll things out to and monitor. Seems like you could have a
       | very stable "high paying customers" cluster that is at the very
       | end of your deployment cycle, after a ton of canary checks along
       | the way have passed.
        
         | intunderflow wrote:
         | With Actions they do this, if you're on GitHub Enterprise and
         | run an action it picks machines out of a special pool set aside
         | just for enterprise customers.
        
       | intunderflow wrote:
       | When GitHub is down we legally don't have to do any work, it's in
       | the constitution I swear
        
         | m_a_g wrote:
         | Best comment in this thread by far
        
       | bob1029 wrote:
       | This one is pretty nasty. Getting tired of the disengagement this
       | causes for the team. It's basically a lost day of productivity
       | even if GH goes down for only 30 minutes. Yes, we can continue
       | coding locally, but issues & PRs are a huge part of our daily
       | process.
       | 
       | When I get back from vacation we are moving our shit to the
       | enterprise plan. $21/user-month is really not that big of a deal
       | when you are running basically your entire business through the
       | product.
       | 
       | I do agree that it's ridiculous to assume that we can manage
       | Github's software better than their own engineers, but at the
       | same time our infrastructure has proven itself to be extremely
       | reliable over the last 4-5 years. Even hosting GH enterprise on
       | public AWS/Azure is more ideal in my eyes now, because I can
       | control the physical region and tenancy. There is an Azure
       | datacenter within 100 miles of many of our home offices and I can
       | ensure that our Github stack spins up there. Minimizing the
       | amount of internet you have to transit to get to your
       | applications can sidestep a lot of this stormy public
       | cloud/internet weather bullshit.
        
         | cyberpunk wrote:
         | I would strongly recommend (having used both extensively) going
         | all in on GitLab instead if you have to do a migration anyway.
        
         | teh_klev wrote:
         | > There is an Azure datacenter within 100 miles of many of our
         | home offices and I can ensure that our Github stack spins up
         | there. Minimizing the amount of internet you have to transit to
         | get to your applications can sidestep a lot of this stormy
         | public cloud/internet weather bullshit.
         | 
         | You've no guarantees that your local'ish data centre is going
         | to be hop-wise, route-wise and peering-wise any better than a
         | DC 1500 miles away, in relation to your home or office ISP.
        
           | bob1029 wrote:
           | > You've no guarantees that your local'ish data centre is
           | going to be hop-wise, route-wise and peering-wise any better
           | than a DC 1500 miles away, in relation to your home or office
           | ISP.
           | 
           | You're correct. In fact, as I type this reply my cloudflare
           | diagnostics are indicating I am talking to a datacenter 200
           | miles further away than would otherwise be ideal. That said,
           | it's still within an extremely reasonable distance. This is a
           | "risk" I am willing to take. It's certainly a better starting
           | point than _guaranteed_ 70ms minimums.
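
As a rough sanity check on the distance argument in this sub-thread, here is a minimal sketch of the physical lower bound on round-trip time. It assumes signals travel through fiber at roughly two thirds of the speed of light (about 200 km per millisecond) along a straight path; the function name and constants are illustrative, not from the thread. Real routes add hops, peering detours, and queuing on top of this floor, which is the point teh_klev raises.

```python
# Lower bound on round-trip time imposed by distance alone.
# Assumption: light in fiber covers roughly 200 km per millisecond.
MILES_TO_KM = 1.609344
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(distance_miles: float) -> float:
    """Best-case round-trip time in milliseconds for a straight fiber path."""
    one_way_km = distance_miles * MILES_TO_KM
    return 2 * one_way_km / FIBER_KM_PER_MS


# Distances mentioned in the thread: a nearby DC versus one 1500 miles away.
for miles in (100, 1500):
    print(f"{miles} miles: at least {min_rtt_ms(miles):.1f} ms round trip")
```

By this estimate the floor is under 2 ms for a datacenter 100 miles away and about 24 ms for one 1500 miles away, so proximity lowers the best case but says nothing about the route an ISP actually takes.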
        
         | scaryclam wrote:
         | I don't buy this at all. Even if GH is down for the entire day,
         | how the heck does NO-ONE on your team know what they're doing?
         | Was no-one working on something already? Did nobody pay any
         | attention during planning? Do you not have anything you can do
         | from memory from your backlog?!
         | 
         | If you can't work for a day just because Github is down, then
         | there are bigger problems in your process than GitHub being down.
         | I'm sorry if that sounds harsh, but you're either being
         | hyperbolic or you have some real issues to fix in your team or
         | organisation.
        
         | blitzar wrote:
         | On the plus side, you have a process and in theory it works, or
         | you would be looking elsewhere.
         | 
         | You might have lost a day today, but how many days have you
         | gained thanks to these tools the last month?
        
         | rightbyte wrote:
         | > I do agree that it's ridiculous to assume that we can manage
         | Github's software better than their own engineers
         | 
         | Not really. You can mess with stuff when it suits you with a
         | risk for downtime. Hosting yourself has the same advantage as
         | disabling auto-updates - you are in control of when to break
         | stuff.
        
         | lima wrote:
         | > _I do agree that it's ridiculous to assume that we can
         | manage Github's software better than their own engineers_
         | 
         | Why? It's totally reasonable that your own GHE instance will
         | have better uptime.
         | 
         | Running GitHub.com is much, much harder than a private instance
         | (DB scale-out, load, ...).
         | 
         | Our in-house Gerrit infra and CI has had a significantly better
         | uptime than GitHub over the past year, but we have hundreds,
         | not 60 million users and exabytes of storage :-)
        
         | andrewstuart2 wrote:
         | I think you will quickly find if you're just deploying GH
         | Enterprise on premises that it is not at all what you get from
         | the GH Cloud offering. GHE has its own product roadmap that is
         | quite a bit behind the cloud product, and in many cases (IMO)
         | unacceptably so. It still doesn't support cache for runners,
         | last I checked, though I've since moved on from the org that
         | required me to work with GHE. I'm back to my happy place with
         | self-hosted GitLab, and a little bit of GitHub cloud.
        
           | bob1029 wrote:
           | > GHE has its own product roadmap that is quite a bit behind
           | the cloud product
           | 
           | This is exactly what we want though. We don't need the new
           | fancy shit on a regular cadence. Issues, Code, PRs and 1 line
           | checkbuild scripts are all we care about. Everything else is
           | built into our software.
        
             | andrewstuart2 wrote:
             | What I mean is that it's quite clearly not the same product
             | as cloud. It _does_ have roughly the same road map as
             | cloud, just pretty far behind and/or cherry-picking some
             | features in a different order. My experience with
             | enterprise software is that it doesn't matter what the
             | roadmap is as long as you get to choose if/when the
             | benefits outweigh the risks for an update. And you usually
             | want certain releases to get backported security updates
             | for that same reason. You don't have to take new features
             | and their associated bugs but you do want, and get,
             | security fixes. That's a separate thing, because this would
             | be the case if GHE was just "run your own copy of .com."
             | 
             | What is really rough about GHE is that you _can't_ choose
             | a lot of the features or IMO baseline requirements like
             | caching that you've probably come to expect from
             | github.com, and may have been around for years. At least
             | not until they can get GHE to parity with .com.
        
             | eyegor wrote:
             | I'd recommend checking out gitlab for on-prem hosting in an
             | enterprise environment. Works great, integrates with
             | AD/ldap and most of the features are available on the free
             | tier if you want to test it. Practically a drop in
             | replacement for github.
        
               | mdaniel wrote:
               | I love GitLab, and am now a shareholder, but advertising
               | it as "drop in replacement" is setting up an evaluator
               | for disappointment
               | 
               | At the very least, they use just absolutely incompatible
               | yaml files for their CI pipelines (in, of course, an
               | incompatible location in the repo)
               | 
               | But probably the bigger obstacle would be their
               | incompatible API (and incompatible auth to it); that
               | means one cannot grab a cool "github bot/tool for doing
               | $X" and expect it to do anything reasonable in GitLab
        
         | nonbirithm wrote:
         | > It's basically a lost day of productivity even if GH goes
         | down for only 30 minutes.
         | 
         | Are we entering an era where if we don't have hundreds of
         | thousands of servers running 24/7 to host our services, with
         | all the resource consumption and environmental implications
         | that result, that we will no longer be able to remain
         | productive as a society? Is this gradually becoming a new
         | baseline for humanity from which we cannot reasonably downsize?
        
           | jrochkind1 wrote:
           | Yes. You're just noticing?
        
           | bob1029 wrote:
           | > Are we entering an era
           | 
           | We've been in this era for about 4 decades now. There are
           | mainframes which do payment processing that, if they were to
           | fail, would cause substantial harm to the global economy
           | almost instantly.
        
           | sudhirj wrote:
           | Utility technology becomes foundational very quickly. Despite
           | it being very new, humanity is already fundamentally reliant
           | on global supply chains, oil, electricity, networks,
           | satellites and many other technologies that we cannot
           | downsize. We could collectively plan and execute decades long
           | exit plans, like we do for oil, but outages will bring daily
           | life to a halt.
        
           | ygjb wrote:
           | Yes.
           | 
           | It's worth noting that you can take almost any software from
           | before the late 90s to early 2000s, depending on the vendor,
           | that is still available, and with a layer of emulation get it
           | running in minutes.
           | 
           | The vast majority of software that is being built today for
           | end users simply will not function in a short time frame
           | because of aggressively built in dependencies on cloud based
           | services, often with those dependencies designed to encourage
           | customer lock-in and prevent piracy by forcing users to have
           | active accounts and shift core logic from endpoints to cloud
           | services.
           | 
           | Even moving past licensing servers and account capabilities,
           | tools like Grammarly ship much of their analysis to the
           | cloud, same for most translation services. Many modern text
           | to speech services are cloud based as well (just look at how
           | useless a modern cell phone becomes when you are without a
           | data connection, for example).
           | 
           | I don't know what the statistics would look like, but I
           | shudder to think how much of the world economy would grind to
           | a halt if Amazon or another significant cloud provider had a
           | sustained, multi-region outage (say 24-48 hours).
           | 
           | It's a god-damn mess, and we did it to ourselves.
        
             | bob1029 wrote:
             | > I shudder to think how much of the world economy would
             | grind to a halt if Amazon or another significant cloud
             | provider had a sustained, multi-region outage (say 24-48
             | hours).
             | 
             | You can rest at ease. Nothing that is mission-critical for
             | the world's financial infrastructure is hosted within one
             | of these sorts of facilities. Facebook and Netflix might go
             | down for days, but your Amex will still work at any
             | merchant with a functioning internet connection.
             | 
             | I have been inside of [financial services organization]'s
             | datacenter in [some midwestern state] which was purpose-
             | built for the IT load. The strategy is "failure is not an
             | option". It's essentially 1 gigantic, redundant life-
             | support system for one of the more sensitive computers
             | on the planet. Amazon and Microsoft cannot afford to go to
             | these lengths for the market they serve.
        
               | ygjb wrote:
               | I wish I had your confidence in the financial system, but
               | it's not the financial services I am concerned about.
               | 
               | Financial services are important for moving money around,
               | and processing electronic payments, but it doesn't matter
               | how effectively you can process a wire if there are
               | significant supply chain disruptions, and systems
               | failures that take down the platforms that major
               | retailers and distributors use for logistics.
               | 
               | Even if ATMs and bank networks remain up, what about the
               | encashment and physical security services that those
               | institutions rely on to move around actual physical
               | money?
               | 
               | The economy is more than just financial services, and all
               | of those financial services are just proxies for the real
               | world goods that people need to survive for more than a
               | few days in most urban centres.
        
         | dabeeeenster wrote:
         | How does moving to the Enterprise plan help?
        
           | onionisafruit wrote:
           | Presumably GP means they will deploy their own GitHub
           | Enterprise server instead of using github.com.
        
           | sudhirj wrote:
           | GH Enterprise is run on your own servers, so you'd
           | theoretically run it right in the office. It may not move the
           | needle on actual downtime, but there's some control in the
           | downtime - if the LAN is out no one can work anyway, and
           | upgrades will only happen on your business's lean days, be
           | tested out with the teams that have an appetite for
           | experimentation, etc.
        
         | onionisafruit wrote:
         | > I do agree that it's ridiculous to assume that we can manage
         | Github's software better than their own engineers
         | 
         | That by itself would be ridiculous, but there's more to it than
         | that. Your GHE server won't have new code deployed to it
         | hundreds of times per day the way github.com does. You probably
         | won't be the target of ddos attacks either.
         | 
         | Very few of github.com's outages are the result of maintenance
         | errors.
        
         | cmckn wrote:
         | > It's basically a lost day of productivity even if GH goes
         | down for only 30 minutes. Yes, we can continue coding locally,
         | but issues & PRs are a huge part of our daily process.
         | 
         | I know outages are frustrating, but how does 30 minutes before
         | 10am ruin an entire day? Maybe you're just being hyperbolic,
         | but people take coffee breaks longer than that.
        
           | bob1029 wrote:
           | > how does 30 minutes before 10am ruin an entire day?
           | 
           | Not everyone is on pacific time. This impacted us right in
           | the middle of a standup call and disrupted our planning for
           | the day.
           | 
           | Also, the problem is more that you don't know how long the
           | outage is going to last at first, so you start finding other
           | ways to occupy your time. Through the lens of hindsight, yes
           | we are certainly being hyperbolic in those cases where it
           | _was_ only 30 minutes.
        
             | jimmaswell wrote:
             | > so you start finding other ways to occupy your time
             | 
             | Ok, pick a ticket and do some work on it locally, when
             | that's done do the same with another. I can go a full day
             | without interacting with github because I'm working on a
             | local branch. Make a note of what branches you need to push
             | later. I can't possibly imagine throwing my arms up and
             | saying the day is wasted because I have to work locally.
             | It's completely unbelievable.
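
One way to "make a note of what branches you need to push later" is to ask git itself. The following is a minimal sketch, assuming `git` is on the PATH and the script runs inside a repository. The helper names are hypothetical; the only git feature relied on is `git for-each-ref` with its `%(upstream:trackshort)` field, which reports `>` (ahead), `<` (behind), `<>` (diverged), or `=` (in sync).

```python
import subprocess


def branches_to_push(for_each_ref_output: str) -> list[str]:
    """Return branches that have local commits the remote has not seen.

    Expects lines of the form "<branch>\\t<trackshort>". An empty track
    field is treated as "no usable upstream", so the branch still needs
    a push.
    """
    pending = []
    for line in for_each_ref_output.splitlines():
        if not line:
            continue
        name, _, track = line.partition("\t")
        if track == "" or ">" in track:
            pending.append(name)
    return pending


def list_local_branches() -> str:
    """Run git and return one tab-separated line per local branch."""
    result = subprocess.run(
        [
            "git",
            "for-each-ref",
            "--format=%(refname:short)\t%(upstream:trackshort)",
            "refs/heads",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `branches_to_push(list_local_branches())` once the outage is over gives the list of branches to push. The tracking state reflects the last fetch, so it stays usable while the remote is unreachable.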
        
               | jorams wrote:
               | I understand your point and agree, but "pick a ticket"
               | can be hard if you use GitHub issues for those.
        
       | encryptluks2 wrote:
       | Just noticed GitHub is down for me. Can't access repos :(
        
       | jacobrussell wrote:
       | Good time for a lunch break I guess
        
       | funOtter wrote:
       | I was just doing some development on my GitHub Actions ... can
       | only assume it was my fault.
        
         | roland35 wrote:
         | Should have probably left out that sudo rm -rf from your Action
        
       | mabbo wrote:
       | In the middle of onboarding with a new company. We're at a
       | critical point of training that requires GitHub.
       | 
       | Oh boy, this is going to be a fun day.
        
         | awestroke wrote:
         | The incident lasted less than an hour. How does this affect
         | your whole day?
        
           | mabbo wrote:
           | It didn't. But at the time, the trainer was very, very
           | worried.
        
         | tyingq wrote:
         | Maybe you could sign up for a GitHub enterprise trial? At
         | least the first few screens seem to be working.
        
       | judge2020 wrote:
       | GitHub Universe is only a week away, could be related to a bad
       | deploy for some feature updates to-be-revealed then.
        
       | contingencies wrote:
       | They were down at least a solid hour last week and didn't even
       | post to their status page. I put in a query and got no response.
       | They then unilaterally closed the ticket and asked me how my support
       | experience was. Time to move to Gitlab.
       | https://news.ycombinator.com/item?id=28874751
        
       | mot2ba wrote:
       | This incident has been marked as resolved. You can check how
       | often notable GitHub incidents have captured the audience's
       | attention on HN [0] [1]:
       | 
       | [0]https://news.ycombinator.com/from?site=githubstatus.com
       | 
       | [1]https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=fal...
        
         | jrochkind1 wrote:
         | Looks like about once every two months?
         | 
         | Which is actually more than I expected, and seems like kind of
         | too much.
        
       | jrochkind1 wrote:
       | Hmm, how do I get the GitHub Actions CI to run on all the already
       | existing PRs for which it never ran? Anyone know?
        
         | mdaniel wrote:
         | Would this do it?
         | https://docs.github.com/en/rest/reference/actions#create-a-w...
         | or, depending on the error, perhaps this?
         | https://docs.github.com/en/rest/reference/actions#re-run-a-w...
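
The two endpoints linked above can be sketched as follows. This is a hedged illustration rather than a tested recovery script: it only builds the HTTP requests using the standard library, the helper names are hypothetical, and the owner, repo, run ID, and token values are placeholders. It assumes a token with permission to manage Actions. The dispatch endpoint only works for workflows that declare the `workflow_dispatch` trigger, and it runs against a branch or tag rather than a pull request as such.

```python
import json
import urllib.request

API = "https://api.github.com"


def _headers(token: str) -> dict:
    """Headers for authenticated GitHub REST API calls."""
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }


def rerun_request(owner, repo, run_id, token):
    """Build a request that re-runs an existing workflow run."""
    url = f"{API}/repos/{owner}/{repo}/actions/runs/{run_id}/rerun"
    return urllib.request.Request(
        url, data=b"", method="POST", headers=_headers(token)
    )


def dispatch_request(owner, repo, workflow, ref, token):
    """Build a request that triggers a workflow_dispatch event.

    `workflow` is the workflow file name or numeric ID; `ref` is the
    branch or tag to run against (for a PR, its head branch).
    """
    url = f"{API}/repos/{owner}/{repo}/actions/workflows/{workflow}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers=_headers(token)
    )
```

Each returned object can be passed to `urllib.request.urlopen` to actually send it. For PRs where no run was ever created, there is nothing to re-run, so the dispatch route (or pushing a new commit to the PR branch) is the more likely fit.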
        
       ___________________________________________________________________
       (page generated 2021-10-21 23:01 UTC)