[HN Gopher] [Resolved]Incident with GitHub Actions, Issues, Pull...
___________________________________________________________________
[Resolved]Incident with GitHub Actions, Issues, Pull Requests, and
Webhooks
Author : mot2ba
Score : 84 points
Date : 2021-10-21 15:24 UTC (7 hours ago)
(HTM) web link (www.githubstatus.com)
(TXT) w3m dump (www.githubstatus.com)
| maherbeg wrote:
| I'm surprised they don't have a couple of separate clusters that
| they roll things out to and monitor. Seems like you could have a
| very stable "high paying customers" cluster that is at the very
| end of your deployment cycle, after a ton of canary checks along
| the way have passed.
| intunderflow wrote:
| With Actions they do this, if you're on GitHub Enterprise and
| run an action it picks machines out of a special pool set aside
| just for enterprise customers.
| intunderflow wrote:
| When GitHub is down we legally don't have to do any work, it's in
| the constitution I swear
| m_a_g wrote:
| Best comment in this thread by far
| bob1029 wrote:
| This one is pretty nasty. Getting tired of the disengagement this
| causes for the team. It's basically a lost day of productivity
| even if GH goes down for only 30 minutes. Yes, we can continue
| coding locally, but issues & PRs are a huge part of our daily
| process.
|
| When I get back from vacation we are moving our shit to the
| enterprise plan. $21/user-month is really not that big of a deal
| when you are running basically your entire business through the
| product.
|
| I do agree that it's ridiculous to assume that we can manage
| Github's software better than their own engineers, but at the
| same time our infrastructure has proven itself to be extremely
| reliable over the last 4-5 years. Even hosting GH enterprise on
| public AWS/Azure is more ideal in my eyes now, because I can
| control the physical region and tenancy. There is an Azure
| datacenter within 100 miles of many of our home offices and I can
| ensure that our Github stack spins up there. Minimizing the
| amount of internet you have to transit to get to your
| applications can sidestep a lot of this stormy public
| cloud/internet weather bullshit.
| cyberpunk wrote:
| I would strongly recommend (having used both extensively) going
| all in on gitlab instead if you have to do a migration anyway.
| teh_klev wrote:
| > There is an Azure datacenter within 100 miles of many of our
| home offices and I can ensure that our Github stack spins up
| there. Minimizing the amount of internet you have to transit to
| get to your applications can sidestep a lot of this stormy
| public cloud/internet weather bullshit.
|
| You've no guarantees that your local'ish data centre is going
| to be hop-wise, route-wise and peering-wise any better than a
| DC 1500 miles away, in relation to your home or office ISP.
| bob1029 wrote:
| > You've no guarantees that your local'ish data centre is
| going to be hop-wise, route-wise and peering-wise any better
| than a DC 1500 miles away, in relation to your home or office
| ISP.
|
| You're correct. In fact, as I type this reply my cloudflare
| diagnostics are indicating I am talking to a datacenter 200
| miles further away than would otherwise be ideal. That said,
| it's still within an extremely reasonable distance. This is a
| "risk" I am willing to take. It's certainly a better starting
| point than _guaranteed_ 70ms minimums.
| scaryclam wrote:
| I don't buy this at all. Even if GH is down for the entire day,
| how the heck does NO-ONE on your team know what they're doing?
| Was no-one working on something already? Did nobody pay any
| attention during planning? Do you not have anything you can do
| from memory from your backlog?!
|
| If you can't work for a day just because Github is down, then
| there's bigger problems in your process than github being down.
| I'm sorry if that sounds harsh, but you're either being
| hyperbolic or you have some real issues to fix in your team or
| organisation.
| blitzar wrote:
| On the plus side, you have a process and in theory it works, or
| you would be looking elsewhere.
|
| You might have lost a day today, but how many days have you
| gained thanks to these tools the last month?
| rightbyte wrote:
| > I do agree that it's ridiculous to assume that we can manage
| Github's software better than their own engineers
|
| Not really. You can mess with stuff when it suits you with a
| risk for downtime. Hosting yourself has the same advantage as
| disabling auto-updates - you are in control of when to break
| stuff.
| lima wrote:
| > _I do agree that it's ridiculous to assume that we can
| manage Github's software better than their own engineers_
|
| Why? It's totally reasonable that your own GHE instance will
| have better uptime.
|
| Running GitHub.com is much, much harder than a private instance
| (DB scale-out, load, ...).
|
| Our in-house Gerrit infra and CI has had a significantly better
| uptime than GitHub over the past year, but we have hundreds,
| not 60 million users and exabytes of storage :-)
| andrewstuart2 wrote:
| I think you will quickly find if you're just deploying GH
| Enterprise on premises that it is not at all what you get from
| the GH Cloud offering. GHE has its own product roadmap that is
| quite a bit behind the cloud product, and in many cases (IMO)
| unacceptably so. It still doesn't support cache for runners,
| last I checked, though I've since moved on from the org that
| required me to work with GHE. I'm back to my happy place with
| self-hosted GitLab, and a little bit of GitHub cloud.
| bob1029 wrote:
| > GHE has its own product roadmap that is quite a bit behind
| the cloud product
|
| This is exactly what we want though. We don't need the new
| fancy shit on a regular cadence. Issues, Code, PRs and 1 line
| checkbuild scripts are all we care about. Everything else is
| built into our software.
| andrewstuart2 wrote:
| What I mean is that it's quite clearly not the same product
| as cloud. It _does_ have roughly the same road map as
| cloud, just pretty far behind and/or cherry-picking some
| features in a different order. My experience with
| enterprise software is that it doesn't matter what the
| roadmap is as long as you get to choose if/when the
| benefits outweigh the risks for an update. And you usually
| want certain releases to get backported security updates
| for that same reason. You don't have to take new features
| and their associated bugs but you do want, and get,
| security fixes. That's a separate thing, because this would
| be the case if GHE was just "run your own copy of .com."
|
| What is really rough about GHE is that you _can't_ choose
| a lot of the features or IMO baseline requirements like
| caching that you've probably come to expect from
| github.com, and may have been around for years. At least
| not until they can get GHE to parity with .com.
| eyegor wrote:
| I'd recommend checking out gitlab for on-prem hosting in an
| enterprise environment. Works great, integrates with
| AD/ldap and most of the features are available on the free
| tier if you want to test it. Practically a drop in
| replacement for github.
| mdaniel wrote:
| I love GitLab, and am now a shareholder, but advertising
| it as "drop in replacement" is setting up an evaluator
| for disappointment
|
| At the very least, they use just absolutely incompatible
| yaml files for their CI pipelines (in, of course, an
| incompatible location in the repo)
|
| But probably the bigger obstacle would be their
| incompatible API (and incompatible auth to it); that
| means one cannot grab a cool "github bot/tool for doing
| $X" and expect it to do anything reasonable in GitLab
| nonbirithm wrote:
| > It's basically a lost day of productivity even if GH goes
| down for only 30 minutes.
|
| Are we entering an era where if we don't have hundreds of
| thousands of servers running 24/7 to host our services, with
| all the resource consumption and environmental implications
| that result, that we will no longer be able to remain
| productive as a society? Is this gradually becoming a new
| baseline for humanity from which we cannot reasonably downsize?
| jrochkind1 wrote:
| Yes. You're just noticing?
| bob1029 wrote:
| > Are we entering an era
|
| We've been in this era for about 4 decades now. There are
| mainframes which do payment processing that, if they were to
| fail, would cause substantial harm to the global economy
| almost instantly.
| sudhirj wrote:
| Utility technology becomes foundational very quickly. Despite
| it being very new, humanity is already fundamentally reliant
| on global supply chains, oil, electricity, networks,
| satellites and many other technologies that we cannot
| downsize. We could collectively plan and execute decades long
| exit plans, like we do for oil, but outages will bring daily
| life to a halt.
| ygjb wrote:
| Yes.
|
| It's worth noting that you can take almost any software from
| before the late 90s to early 2000s, depending on the vendor,
| that is still available, and with a layer of emulation get it
| running in minutes.
|
| The vast majority of software that is being built today for
| end users simply will not function in a short time frame
| because of aggressively built in dependencies on cloud based
| services, often with those dependencies designed to encourage
| customer lock-in and prevent piracy by forcing users to have
| active accounts and shift core logic from endpoints to cloud
| services.
|
| Even moving past licensing servers and account capabilities,
| tools like Grammarly ship much of their analysis to the
| cloud, same for most translation services. Many modern text
| to speech services are cloud based as well (just look at how
| useless a modern cell phone becomes when you are without a
| data connection, for example).
|
| I don't know what the statistics would look like, but I
| shudder to think how much of the world economy would grind to
| a halt if Amazon or another significant cloud provider had a
| sustained, multi-region outage (say 24-48 hours).
|
| It's a god-damn mess, and we did it to ourselves.
| bob1029 wrote:
| > I shudder to think how much of the world economy would
| grind to a halt if Amazon or another significant cloud
| provider had a sustained, multi-region outage (say 24-48
| hours).
|
| You can rest at ease. Nothing that is mission-critical for
| the world's financial infrastructure is hosted within one
| of these sorts of facilities. Facebook and Netflix might go
| down for days, but your Amex will still work at any
| merchant with a functioning internet connection.
|
| I have been inside of [financial services organization]'s
| datacenter in [some midwestern state] which was purpose-
| built for the IT load. The strategy is "failure is not an
| option". It's essentially 1 gigantic, redundant life-
| support system for one of the more sensitive computers
| on the planet. Amazon and Microsoft cannot afford to go to
| these lengths for the market they serve.
| ygjb wrote:
| I wish I had your confidence in the financial system, but
| it's not the financial services I am concerned about.
|
| Financial services are important for moving money around,
| and processing electronic payments, but it doesn't matter
| how effectively you can process a wire if there are
| significant supply chain disruptions, and systems
| failures that take down the platforms that major
| retailers and distributors use for logistics.
|
| Even if ATMs and bank networks remain up, what about the
| encashment and physical security services that those
| institutions rely on to move around actual physical
| money?
|
| The economy is more than just financial services, and all
| of those financial services are just proxies for the real
| world goods that people need to survive for more than a
| few days in most urban centres.
| dabeeeenster wrote:
| How does moving to the Enterprise plan help?
| onionisafruit wrote:
| Presumably GP means they will deploy their own GitHub
| Enterprise server instead of using github.com.
| sudhirj wrote:
| GH Enterprise is run on your own servers, so you'd
| theoretically run it right in the office. It may not move the
| needle on actual downtime, but there's some control in the
| downtime - if the LAN is out no one can work anyway, and
| upgrades will only happen on your business's lean days, be
| tested out with the teams that have an appetite for
| experimentation, etc.
| onionisafruit wrote:
| > I do agree that it's ridiculous to assume that we can manage
| Github's software better than their own engineers
|
| That by itself would be ridiculous, but there's more to it than
| that. Your GHE server won't have new code deployed to it
| hundreds of times per day the way github.com does. You probably
| won't be the target of ddos attacks either.
|
| Very few of github.com outages are the result of maintenance
| errors.
| cmckn wrote:
| > It's basically a lost day of productivity even if GH goes
| down for only 30 minutes. Yes, we can continue coding locally,
| but issues & PRs are a huge part of our daily process.
|
| I know outages are frustrating, but how does 30 minutes before
| 10am ruin an entire day? Maybe you're just being hyperbolic,
| but people take coffee breaks longer than that.
| bob1029 wrote:
| > how does 30 minutes before 10am ruin an entire day?
|
| Not everyone is on pacific time. This impacted us right in
| the middle of a standup call and disrupted our planning for
| the day.
|
| Also, the problem is more that you don't know how long the
| outage is going to last at first, so you start finding other
| ways to occupy your time. Through the lens of hindsight, yes
| we are certainly being hyperbolic in those cases where it
| _was_ only 30 minutes.
| jimmaswell wrote:
| > so you start finding other ways to occupy your time
|
| Ok, pick a ticket and do some work on it locally, when
| that's done do the same with another. I can go a full day
| without interacting with github because I'm working on a
| local branch. Make a note of what branches you need to push
| later. I can't possibly imagine throwing my arms up and
| saying the day is wasted because I have to work locally.
| It's completely unbelievable.
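
The "make a note of what branches you need to push later" step can be
scripted. A minimal sketch, assuming a local clone with `git` on the PATH;
the function names are hypothetical. It only compares against the
remote-tracking refs from the last fetch, so it needs no network access.

```python
import subprocess

# One line per local branch: "<name>\t<tracking state>".
# git reports ">" for ahead, "<" for behind, "<>" for diverged, "=" for in sync.
FORMAT = "%(refname:short)\t%(upstream:trackshort)"


def branches_to_push(for_each_ref_output: str) -> list[str]:
    """Return branches that have unpushed commits or no upstream configured."""
    pending = []
    for line in for_each_ref_output.splitlines():
        if not line.strip():
            continue
        name, _, track = line.partition("\t")
        # An empty tracking field is treated here as "never pushed".
        if ">" in track or track == "":
            pending.append(name)
    return pending


def local_branches_to_push(repo_dir: str = ".") -> list[str]:
    """Run git in repo_dir and list the branches still waiting to be pushed."""
    result = subprocess.run(
        ["git", "for-each-ref", f"--format={FORMAT}", "refs/heads"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return branches_to_push(result.stdout)
```

Calling `local_branches_to_push()` once the service is back gives the list
of branches to push.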
| jorams wrote:
| I understand your point and agree, but "pick a ticket"
| can be hard if you use GitHub issues for those.
| encryptluks2 wrote:
| Just noticed GitHub is down for me. Can't access repos :(
| jacobrussell wrote:
| Good time for a lunch break I guess
| funOtter wrote:
| I was just doing some development on my GitHub Actions ... can
| only assume it was my fault.
| roland35 wrote:
| Should have probably left out that sudo rm -rf from your Action
| mabbo wrote:
| In the middle of onboarding with a new company. We're at a
| critical point of training that requires GitHub.
|
| Oh boy, this is going to be a fun day.
| awestroke wrote:
| The incident lasted less than an hour. How does this affect
| your whole day?
| mabbo wrote:
| It didn't. But at the time, the trainer was very, very
| worried.
| tyingq wrote:
| Maybe you could sign up for a GitHub enterprise trial? At least
| the first few screens seem to be working.
| judge2020 wrote:
| GitHub Universe is only a week away, could be related to a bad
| deploy for some feature updates to-be-revealed then.
| contingencies wrote:
| They were down at least a solid hour last week and didn't even
| post to their status page. I put in a query and got no response.
| They then unilaterally closed the ticket and asked me how my
| support experience was. Time to move to Gitlab.
| https://news.ycombinator.com/item?id=28874751
| mot2ba wrote:
| This incident has been marked as resolved. You can check how
| often notable Github incidents really capture the audience's
| attention on HN [0] [1]:
|
| [0]https://news.ycombinator.com/from?site=githubstatus.com
|
| [1]https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=fal...
| jrochkind1 wrote:
| Looks like about once every two months?
|
| Which is actually more than I expected, and seems like kind of
| too much.
| jrochkind1 wrote:
| Hmm, how do I get the Github Actions CI to run on all the already
| existing PRs for which it never ran? Anyone know?
| mdaniel wrote:
| Would this do it?
| https://docs.github.com/en/rest/reference/actions#create-a-w...
| or, depending on the error, perhaps this?
| https://docs.github.com/en/rest/reference/actions#re-run-a-w...
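
A minimal sketch of scripting this against the GitHub REST API. It is not
taken from the thread: it assumes the two endpoints
`POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches` and
`POST /repos/{owner}/{repo}/actions/runs/{run_id}/rerun`, and the owner,
repo, workflow file, and token values are placeholders. The dispatch call
only works if the workflow declares a `workflow_dispatch` trigger.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def _headers(token: str) -> dict:
    # token is a placeholder for a personal access token with repo access.
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }


def build_dispatch_request(owner, repo, workflow_file, ref, token):
    """Build (not send) a request that starts a workflow on a branch or tag."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST", headers=_headers(token))


def build_rerun_request(owner, repo, run_id, token):
    """Build (not send) a request that re-runs an existing workflow run."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/runs/{run_id}/rerun"
    return urllib.request.Request(url, method="POST", headers=_headers(token))


def send(request):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

For PRs where a run never started, dispatching the workflow once per PR head
branch is one option; where a run exists but failed during the outage,
re-running it by run id is the other.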
___________________________________________________________________
(page generated 2021-10-21 23:01 UTC)