[HN Gopher] GitHub Actions Incident 29.3
       ___________________________________________________________________
        
       GitHub Actions Incident 29.3
        
       Author : rethab
       Score  : 89 points
       Date   : 2023-03-29 14:32 UTC (8 hours ago)
        
 (HTM) web link (www.githubstatus.com)
 (TXT) w3m dump (www.githubstatus.com)
        
       | usrme wrote:
       | Azure is also having somewhat widespread issues at the moment, so
       | I'd venture to guess that these two are also linked.
        
       | jacobsenscott wrote:
       | Nobody's stock goes down when actions go down. Nobody's stock
       | goes up when actions are working. But everyone's stock goes up
       | when you have mass layoffs. Working as designed.
        
         | riffic wrote:
         | not sure what the SLA is on Actions, but outages are a regular
         | occurrence with these kinds of systems, and moving to the next 9
         | of availability is incredibly expensive.
         | 
         | it's certainly a risk you'll need to evaluate when planning
         | your desired build process.
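
The "next 9" point can be made concrete with simple arithmetic: each extra nine of availability cuts the allowed downtime by a factor of ten. A minimal Python sketch, assuming a 30-day month by default; the function name is hypothetical, and nothing here reflects the actual SLA for Actions.

```python
def downtime_budget_minutes(availability: float, period_days: float = 30.0) -> float:
    """Allowed downtime (minutes) over `period_days` at a given availability.

    `availability` is a fraction strictly between 0 and 1, e.g. 0.999 for "three nines".
    """
    if not 0.0 < availability < 1.0:
        raise ValueError("availability must be strictly between 0 and 1")
    minutes_in_period = period_days * 24 * 60
    return (1.0 - availability) * minutes_in_period


if __name__ == "__main__":
    # Budget per 30-day month for two through five nines.
    for a in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{a:.3%} -> {downtime_budget_minutes(a):.2f} min per 30 days")
```

Under that assumption, three nines leaves about 43 minutes of downtime a month, while five nines leaves well under one minute.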
        
       | chatmasta wrote:
       | At least we don't need to scroll too far back in our comment
       | history to copy/paste our arguments from the thread two days ago.
        
       | riffic wrote:
       | what is the "29.3" in the title supposed to represent? is that
       | supposed to indicate a date of March 29th? I do not see a
       | reference to this on the incident page itself.
        
       | gxt wrote:
       | At what point are organisations going to ask themselves whether
       | this is intentional or not? Your velocity is disrupted by a
       | likely competitor. I'd move out.
        
         | jacooper wrote:
         | Or simply just self host actions?
        
         | aaomidi wrote:
         | I don't have an opinion on whether it is or not, but layoffs do
         | have a significant impact on that, and it's generally something
         | that's pretty impossible to study.
        
       | nimbius wrote:
       | how is it in just _five years_ microsoft has managed to pedal
       | this once vibrant and bustling community of developers and
       | creatives into a roaring dumpster fire of sketchy GPL-breaking
       | copilot AI and endless seemingly random outages?
       | 
       | https://www.githubstatus.com/history
       | 
       | github has had 55 outages in 3 months. that's nearly an outage
       | _every two days._
       | 
       | the last six months of 2022 had 74 outages. In many shops that's
       | tangibly _worse_ than what their local greybeard Linux admin
       | maintains.
       | 
       | arguments against spinning up my own
       | gitlab/gitea/jenkins/whatever in podman under systemd are
       | starting to ring pretty hollow lately.
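
A quick way to check the "every two days" figure is to divide the length of the window by the outage count. A minimal Python sketch; the function name is hypothetical, and the windows (1 January to 31 March 2023 for "3 months", 1 July to 31 December 2022 for "the last six months of 2022") are assumptions, since the comment does not give exact dates.

```python
from datetime import date


def days_between_outages(outages: int, start: date, end: date) -> float:
    """Average number of days per outage over the inclusive window [start, end]."""
    if outages <= 0:
        raise ValueError("outages must be positive")
    window_days = (end - start).days + 1  # +1 so both endpoints count
    return window_days / outages


if __name__ == "__main__":
    q1_2023 = days_between_outages(55, date(2023, 1, 1), date(2023, 3, 31))
    h2_2022 = days_between_outages(74, date(2022, 7, 1), date(2022, 12, 31))
    print(f"Q1 2023: one outage every {q1_2023:.2f} days")
    print(f"H2 2022: one outage every {h2_2022:.2f} days")
```

Under those assumptions, 55 outages in 90 days is roughly one every 1.6 days (somewhat more often than every two days), and 74 outages in 184 days is roughly one every 2.5 days.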
        
         | xxpor wrote:
         | comments like these are what end up incentivizing companies to
         | hide outages and report all green all the time
        
         | _gabe_ wrote:
         | > how is it in just five years microsoft has managed to pedal
         | this once vibrant and bustling community of developers and
         | creatives into a roaring dumpster fire of sketchy GPL-breaking
         | copilot AI and endless seemingly random outages?
         | 
         | I mean Github Actions was released 5 years ago[0]. I imagine
         | the infrastructure for actions is more susceptible to outages
         | than the fairly simple features Github offered previously. It
         | makes sense that the number of outages would increase with the
         | additional complexity in the infrastructure.
         | 
         | [0]:
         | https://resources.github.com/devops/tools/automation/actions...
        
       | web3-is-a-scam wrote:
       | Github is so crappy now, it feels like something is always wrong
       | with it. Thanks Microsoft.
        
       | temp_account_32 wrote:
       | Funnily enough, GitLab is also melting down at the moment, with
       | pipelines not running and pull requests not functioning:
       | 
       | https://status.gitlab.com/
        
         | bencevans wrote:
         | It's also causing issues: a change has altered the release
         | source tarballs [1] (changing their hashes), so build systems
         | are rejecting them [2].
         | 
         | 1: https://gitlab.com/gitlab-org/gitlab/-/issues/402616
         | 
         | 2: https://github.com/microsoft/vcpkg/issues/30481
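
A regenerated tarball breaks builds because the pinned hash covers the archive bytes, not the files inside it. A minimal Python sketch (function names hypothetical, not vcpkg's or GitLab's code, with SHA-256 chosen as an assumption) that compresses the same payload twice with different gzip settings: the decompressed content is identical, but the archive hashes differ, so a pinned-checksum check rejects the second archive.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw archive bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(data: bytes, expected_sha256: str) -> None:
    """Raise ValueError if `data` does not match the pinned checksum."""
    actual = sha256_hex(data)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch: expected {expected_sha256}, got {actual}")


if __name__ == "__main__":
    payload = b"int main(void) { return 0; }\n" * 100
    # Same content, two compression levels; mtime=0 keeps the gzip header deterministic.
    first = gzip.compress(payload, compresslevel=9, mtime=0)
    regenerated = gzip.compress(payload, compresslevel=1, mtime=0)
    assert gzip.decompress(first) == gzip.decompress(regenerated) == payload

    pin = sha256_hex(first)  # what a build recipe would record
    verify_pinned(first, pin)  # passes
    try:
        verify_pinned(regenerated, pin)
    except ValueError as err:
        print("rejected:", err)
```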
        
           | mardifoufs wrote:
           | Isn't that the same "bug" that happened to GitHub a few weeks
           | ago when they updated their Git version too? It wasn't a bug
           | per se, but they still had to revert because the new hashes
           | were causing massive build problems. Maybe it's a different
           | root cause, though.
        
           | Rapzid wrote:
           | Holy cow. The cardinal DevOps sin of changing the contents of
           | a versioned file.
        
         | kingds wrote:
         | what's a pull request?
        
           | smcleod wrote:
           | I agree the naming is misleading - it's not actually a
           | request to pull anything - it's a request to merge someone's
           | branch into another. This is known as a merge request on
           | several other platforms.
        
           | momentoftop wrote:
           | It's a request to the owner of some reference in a Git
           | repository to pull in some changes from some reference in
           | some (possibly other) Git repository. You can do this via
           | email, but centralised Git hosts like Github have their own
           | interface to this basic workflow.
        
           | tmpz22 wrote:
           | A pull request is a process which can merge new code into
           | existing code.
           | 
           | "Software Engineer John was tasked with adding a new logo to
           | the website. When he was done, he submitted a pull request of
           | his feature branch into his organization's github repository
           | for the website so that his team members could approve the
           | changes before automation (like _Github Actions_) deployed them
           | live as a new version of the website."
        
             | nightfly wrote:
             | GitLab has "merge requests"; the comment you're replying to
             | was trying to point out an error in an annoying way.
        
               | OJFord wrote:
               | Not even an error frankly, MR, PR, change request, patch
               | set, personally don't care what you call it, it's the
               | same thing.
        
               | kingds wrote:
               | i tried to add a lil laughing emoji to indicate that my
               | comment was in jest but apparently those get stripped out
               | of comments? my bad.
        
               | c-hendricks wrote:
               | Yeah HN doesn't want people to be _too_ expressive.
        
       | deltaci wrote:
       | this is already the third time this week that GitHub Actions has
       | been down, and it's only Wednesday morning
        
         | mdaniel wrote:
         | maybe they have a "no deploy on Friday" rule :-D
        
           | capableweb wrote:
           | I sure hope not. GitHub is supposed to be mature
           | infrastructure at this point, where most if not all changes
           | going into production should be very well tested, and nothing
           | that multiple people haven't verified as correct should end
           | up being deployed and released.
           | 
           | Besides, Microsoft surely has 24/7 watch of their
           | infrastructure, even on weekends, it's a huge company.
        
             | andrewxdiamond wrote:
             | Bugs are a function of change, not a function of maturity.
             | 
             | Just because they can have people come in on weekends to
             | fix things doesn't mean they like doing that.
             | 
             | I know many "mature" software platforms that do not deploy
             | on Fridays or off hours at all.
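
A "no deploys on Fridays or off hours" rule is often enforced as a simple gate at the start of a deploy pipeline. A minimal Python sketch of such a gate, assuming a hypothetical Monday-to-Thursday, 09:00-16:00 window in the team's local time (naive datetimes, no holiday calendar); real policies vary, and the function name is illustrative only.

```python
from datetime import datetime, time


def deploy_allowed(now: datetime, start: time = time(9, 0), end: time = time(16, 0)) -> bool:
    """Hypothetical freeze policy: deploy only Monday-Thursday within [start, end)."""
    # datetime.weekday(): Monday == 0 ... Sunday == 6, so >= 4 covers Friday through Sunday.
    if now.weekday() >= 4:
        return False
    return start <= now.time() < end


if __name__ == "__main__":
    # 2023-03-29 was a Wednesday; 2023-03-31 was a Friday.
    samples = (
        datetime(2023, 3, 29, 10, 0),
        datetime(2023, 3, 29, 20, 0),
        datetime(2023, 3, 31, 10, 0),
    )
    for ts in samples:
        print(ts.isoformat(), "->", "deploy" if deploy_allowed(ts) else "hold")
```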
        
               | bastardoperator wrote:
               | A true global company doesn't have off hours.
        
               | zamnos wrote:
               | Less "on" hours then? Even Google has diurnal patterns
               | when there's a lower amount of traffic simply due to the
               | fact that humans are unevenly distributed across the
               | Earth's surface. And Google does code freezes for the
               | holidays where they don't deploy at all.
        
               | andrewxdiamond wrote:
               | As someone who runs one of the most global APIs in the
               | world, I promise you, I do in fact sleep
        
               | zamnos wrote:
               | What's your pager rotation like? I want to say you have a
               | follow-the-sun rotation, so your on-call shifts are 12 hours
               | long and you swap with a team on the other side of the
               | world so you can get said sleep, but I don't want to just
               | assume that.
        
               | andrewxdiamond wrote:
               | Dayshifts are 9-5, night shifts are 5-9. Same team
               | rotates through both. I have done plenty of overnight
               | oncalls.
               | 
               | Some teams do follow the sun type rotations, but my team
               | is all in Seattle.
        
               | zamnos wrote:
               | Fascinating, glad I asked!
        
               | bastardoperator wrote:
               | That's why a global team is important, when you sleep
               | they work, when they sleep you work. Work is constant
               | when you're servicing the globe.
        
               | andrewxdiamond wrote:
               | And work is constantly slow as well, since it's
               | impossible to get people in the room at the same time
        
               | bastardoperator wrote:
               | Why does everyone need to be in the room? I have a
               | groomed backlog and can talk to people async as needed.
               | We also record meetings if you missed them and depending
               | on the context of the meeting or importance, we'll hold
               | timezone friendly meetings for everyone as required.
        
               | andrewxdiamond wrote:
               | ¯\_(ツ)_/¯
               | 
               | We have a different work philosophy. I do work with teams
               | in India and England, and it's painful to accomplish
               | anything cross-team.
        
               | capableweb wrote:
               | Everyone employed by your company who works in the same
               | industry stops and starts working at the same time, all
               | around the world?
        
               | andrewxdiamond wrote:
               | No, but my team owns our own APIs. We are all located in
               | the same area and we go oncall for our service.
        
             | outworlder wrote:
             | > Besides, Microsoft surely has 24/7 watch of their
             | infrastructure, even on weekends
             | 
             | "watching" with a dedicated team vs "waking up everyone in
             | engg because things are on fire" are two very different
             | things.
             | 
             | Besides, size doesn't work that way. The larger the
             | organization and the more complex the product is, the
             | higher the chance some unexpected interaction will occur.
             | There are processes and automation that can mitigate this,
             | but one can never be completely certain.
             | 
             | Not even the aviation industry has mastered that.
        
       | sithlord wrote:
       | Wonder if these were managed by a defunct team from India?
        
         | racl101 wrote:
         | Oh I get it.
         | 
         | [touches nose]
        
       | rvz wrote:
       | Once again, just two days ago [0], the whole of GitHub went down,
       | after the RSA key leak and the certificate key expiry on its
       | user-facing site.
       | 
       | It is also apparent that GitHub Actions has chronically struggled
       | to operate normally, at least once a month, for years.
       | 
       | There is no question that GitHub has been more unreliable than a
       | self-hosted GitLab or Gitea instance would be, as I said before
       | [1].
       | 
       | [0] https://news.ycombinator.com/item?id=35325850
       | 
       | [1] https://news.ycombinator.com/item?id=22867803
        
         | carlmr wrote:
         | Gitea is great, but it doesn't replace GHA, only the rest of
         | GitHub.
        
           | dunno7456 wrote:
           | Gitea + Woodpecker CI works pretty nicely.
        
       ___________________________________________________________________
       (page generated 2023-03-29 23:02 UTC)