[HN Gopher] GitHub Actions Incident 29.3
___________________________________________________________________
Author : rethab
Score : 89 points
Date : 2023-03-29 14:32 UTC (8 hours ago)
(HTM) web link (www.githubstatus.com)
| usrme wrote:
| Azure is also having somewhat widespread issues at the moment, so
| I'd venture to guess that these two are also linked.
| jacobsenscott wrote:
| Nobody's stock goes down when actions go down. Nobody's stock
| goes up when actions are working. But everyone's stock goes up
| when you have mass layoffs. Working as designed.
| riffic wrote:
| not sure what the SLA is on Actions, but outages are a regular
| occurrence with these kinds of systems, and moving them to the next
| 9 of availability is incredibly expensive.
|
| it's certainly a risk you'll need to evaluate when planning
| your desired build process.
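To put "the next 9" in numbers: each extra nine of availability cuts the allowed downtime by a factor of ten. A minimal Python sketch, assuming a 365-day year and ignoring how any particular SLA actually measures availability (`downtime_minutes_per_year` is a hypothetical helper):

```python
# Yearly downtime budget for N "nines" of availability.
# Assumes a 365-day year; real SLAs define their own measurement windows.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime in minutes per year at availability 1 - 10**-nines.

    For example, 3 nines is 99.9% availability.
    """
    return MINUTES_PER_YEAR * 10 ** -nines


if __name__ == "__main__":
    for n in range(2, 6):
        availability = 100 * (1 - 10 ** -n)
        print(f"{availability:.3f}% -> {downtime_minutes_per_year(n):9.2f} min/year")
```

Going from 99.9% to 99.99% shrinks the budget from roughly 8.8 hours to under an hour per year, which is where the cost of "the next 9" comes from.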
| chatmasta wrote:
| At least we don't need to scroll too far back in our comment
| history to copy/paste our arguments from the thread two days ago.
| riffic wrote:
| what is the "29.3" in the title supposed to represent? is that
| supposed to indicate a date of March 29th? I do not see a
| reference to this on the incident page itself.
| gxt wrote:
| At what point are organisations going to ask themselves whether
| this is intentional or not. Your velocity is disrupted by a
| likely competitor. I'd move out.
| jacooper wrote:
| Or simply just self host actions?
| aaomidi wrote:
| I don't have an opinion on whether it is or not, but layoffs do have a
| significant impact on that and it's generally something that's
| pretty impossible to study.
| nimbius wrote:
| how is it in just _five years_ microsoft has managed to pedal
| this once vibrant and bustling community of developers and
| creatives into a roaring dumpster fire of sketchy GPL-breaking
| copilot AI and endless seemingly random outages?
|
| https://www.githubstatus.com/history
|
| github has had 55 outages in 3 months. that's more than one outage
| _every two days._
|
| the last six months of 2022 had 74 outages. In many shops that's
| tangibly _worse_ than what their local greybeard Linux admin
| maintains.
|
| arguments against spinning up my own
| gitlab/gitea/jenkins/whatever in podman under systemd are
| starting to ring pretty hollow lately.
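Taking the outage counts in the comment above at face value, the average gap between incidents is simple arithmetic. The period lengths are assumptions (about 90 days for "3 months", and 184 days for July through December 2022), and status-page entries vary a lot in severity, so this is only a rough rate:

```python
# Average gap between outages, using the counts quoted in the comment.
# Period lengths are assumptions: ~90 days for "3 months", and
# 184 days for July-December 2022 (31+31+30+31+30+31).


def mean_days_between(outages: int, days: int) -> float:
    """Average number of days between outages over a period of `days` days."""
    if outages <= 0:
        raise ValueError("need at least one outage")
    return days / outages


recent = mean_days_between(55, 90)
late_2022 = mean_days_between(74, 184)
print(f"last ~3 months: one outage every {recent:.1f} days")
print(f"second half of 2022: one outage every {late_2022:.1f} days")
```

Under these assumptions the recent rate is about one entry every 1.6 days, versus roughly one every 2.5 days in the second half of 2022.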
| xxpor wrote:
| comments like these are what end up incentivizing companies to
| hide outages and report all green all the time
| _gabe_ wrote:
| > how is it in just five years microsoft has managed to pedal
| this once vibrant and bustling community of developers and
| creatives into a roaring dumpster fire of sketchy GPL breaking
| copilot AI and endless seemingly random outages.
|
| I mean Github Actions was released 5 years ago[0]. I imagine
| the infrastructure for actions is more susceptible to outages
| than the fairly simple features Github offered previously. It
| makes sense that the number of outages would increase with the
| additional complexity in the infrastructure.
|
| [0]:
| https://resources.github.com/devops/tools/automation/actions...
| web3-is-a-scam wrote:
| Github is so crappy now, it feels like something is always wrong
| with it. Thanks Microsoft.
| temp_account_32 wrote:
| Funnily enough, GitLab is also melting down at the moment, with
| pipelines not running and pull requests not functioning:
|
| https://status.gitlab.com/
| bencevans wrote:
| Also causing issues: a change has made release source tarballs
| change [1] (changing their hashes), so build systems are rejecting
| them [2].
|
| 1: https://gitlab.com/gitlab-org/gitlab/-/issues/402616
| 2: https://github.com/microsoft/vcpkg/issues/30481
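Why a regenerated tarball breaks builds: package managers such as vcpkg pin a SHA-512 digest for each source archive and reject anything that does not match byte-for-byte. The sketch below uses hypothetical helper names (`sha512_of`, `verify_archive`) and shows that check, using gzip's header timestamp to demonstrate how two archives with identical contents can still hash differently. It illustrates the mechanism only, not the specific change GitLab made:

```python
import gzip
import hashlib
import io


def sha512_of(stream, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-512 digest of a binary stream, read in chunks."""
    h = hashlib.sha512()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def verify_archive(data: bytes, pinned_sha512: str) -> None:
    """Raise ValueError unless `data` matches the pinned SHA-512 exactly."""
    actual = sha512_of(io.BytesIO(data))
    if actual != pinned_sha512.lower():
        raise ValueError(
            f"hash mismatch: expected {pinned_sha512[:16]}..., got {actual[:16]}..."
        )


if __name__ == "__main__":
    payload = b"identical source files\n"
    # Same payload, different gzip header timestamp (mtime needs Python 3.8+).
    old_archive = gzip.compress(payload, mtime=0)
    new_archive = gzip.compress(payload, mtime=1)
    pinned = hashlib.sha512(old_archive).hexdigest()

    verify_archive(old_archive, pinned)  # matches the pin, no error
    try:
        verify_archive(new_archive, pinned)
    except ValueError as err:
        print("rejected:", err)
```

Any change in how a forge generates archives (compressor implementation, compression level, header fields) produces the same kind of byte-level difference, even when the files inside are unchanged.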
| mardifoufs wrote:
| Isn't that the same "bug" that happened to github a few weeks
| ago when they updated their Git version too? It wasn't a bug
| per se, but they still had to revert because the new hashes
| were causing massive build problems. Maybe it's a different
| root cause though.
| Rapzid wrote:
| Holy cow. The cardinal DevOps sin of changing the contents of
| a versioned file.
| kingds wrote:
| what's a pull request?
| smcleod wrote:
| I agree the naming is misleading - it's not actually a
| request to pull anything - it's a request to merge someone's
| branch into another. This is known as a merge request on
| several other platforms.
| momentoftop wrote:
| It's a request to the owner of some reference in a Git
| repository to pull in some changes from some reference in
| some (possibly other) Git repository. You can do this via
| email, but centralised Git hosts like Github have their own
| interface to this basic workflow.
| tmpz22 wrote:
| A pull request is a process which can merge new code into
| existing code.
|
| "Software Engineer John was tasked to add a new logo to the
| website, when he was done he submitted a pull request of his
| feature branch into his organization's github repository for
| the website so that his team members could approve the
| changes before automation (like _Github Actions_) deployed them
| live as a new version of the website."
| nightfly wrote:
| Gitlab has "merge requests"; the comment you're replying to was
| trying to point out an error in an annoying way
| OJFord wrote:
| Not even an error frankly, MR, PR, change request, patch
| set, personally don't care what you call it, it's the
| same thing.
| kingds wrote:
| i tried to add a lil laughing emoji to indicate that my
| comment was in jest but apparently those get stripped out
| of comments? my bad.
| c-hendricks wrote:
| Yeah HN doesn't want people to be _too_ expressive.
| deltaci wrote:
| this is already the third time github actions has been down this
| week, and it's only wednesday morning
| mdaniel wrote:
| maybe they have a "no deploy on Friday" rule :-D
| capableweb wrote:
| I sure hope not, GitHub is supposed to be mature
| infrastructure at this point, where most if not all changes
| going into production should be very well tested and nothing
| that multiple people haven't verified as being correct should
| end up being deployed and released.
|
| Besides, Microsoft surely has 24/7 watch of their
| infrastructure, even on weekends; it's a huge company.
| andrewxdiamond wrote:
| Bugs are a function of change, not a function of maturity.
|
| Just because they can have people come in on weekends to
| fix things doesn't mean they like doing that.
|
| I know many "mature" software platforms that do not deploy
| on Fridays or off hours at all
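A "no deploys on Fridays or off hours" rule is often enforced as a gate in the deploy pipeline rather than by memory. A minimal sketch, assuming a hypothetical policy of deploying only Monday through Thursday between 09:00 and 17:00 local time (`deploy_allowed` and the constants are illustrative, not any company's actual tooling):

```python
from datetime import datetime

# Hypothetical freeze policy: deploy only Monday-Thursday, 09:00-17:00 local time.
BUSINESS_START_HOUR = 9
BUSINESS_END_HOUR = 17
LAST_DEPLOY_WEEKDAY = 3  # datetime.weekday(): Monday=0 ... Thursday=3, Friday=4


def deploy_allowed(now: datetime) -> bool:
    """Return True if a production deploy is permitted at local time `now`."""
    if now.weekday() > LAST_DEPLOY_WEEKDAY:  # Friday, Saturday, Sunday
        return False
    return BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR


if __name__ == "__main__":
    status = "open" if deploy_allowed(datetime.now()) else "closed (freeze in effect)"
    print(f"deploy window: {status}")
```

In a CI job, exiting non-zero when `deploy_allowed` returns False is a common way to make such a gate actually block the pipeline.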
| bastardoperator wrote:
| A true global company doesn't have off hours.
| zamnos wrote:
| Less "on" hours then? Even Google has diurnal patterns
| when there's a lower amount of traffic simply due to the
| fact that humans are unevenly distributed across the
| Earth's surface. And Google does code freezes for the
| holidays where they don't deploy at all.
| andrewxdiamond wrote:
| As someone who runs one of the most global APIs in the
| world, I promise you, I do in fact sleep
| zamnos wrote:
| What's your pager rotation like? I want to say you have
| follow-the-sun, and so your on-call shifts are 12 hours
| long and you swap with a team on the other side of the
| world from you so you can get said sleep, but I don't
| want to just assume that.
| andrewxdiamond wrote:
| Day shifts are 9-5, night shifts are 5-9. Same team
| rotates through both. I have done plenty of overnight
| oncalls.
|
| Some teams do follow the sun type rotations, but my team
| is all in Seattle.
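The two rotation styles discussed here differ mainly in how coverage is split. A sketch, assuming the shift boundaries described above (day 09:00-17:00, night 17:00-09:00, one team covering both) and a hypothetical two-site follow-the-sun split on UTC hours; `shift_for`, `follow_the_sun` and the site names are illustrative:

```python
from datetime import time

# Single-team rotation as described in the thread: the day shift runs
# 9-5 and the night shift covers 5-9 (17:00 until 09:00 the next morning).
DAY_START = time(9, 0)
DAY_END = time(17, 0)


def shift_for(t: time) -> str:
    """Return which shift of a single-site rotation covers local time t."""
    if DAY_START <= t < DAY_END:
        return "day"
    # Everything else wraps past midnight: 17:00-23:59 and 00:00-08:59.
    return "night"


def follow_the_sun(utc_hour: int, sites: tuple = ("site_a", "site_b")) -> str:
    """Hypothetical follow-the-sun split between two sites roughly 12 hours
    apart: sites[0] is on call 00:00-11:59 UTC, sites[1] 12:00-23:59 UTC.
    If the sites are placed well, each one covers its own daytime."""
    if not 0 <= utc_hour < 24:
        raise ValueError("utc_hour must be in 0..23")
    return sites[0] if utc_hour < 12 else sites[1]


if __name__ == "__main__":
    print(shift_for(time(3, 30)))  # night
    print(follow_the_sun(15))      # site_b
```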
| zamnos wrote:
| Fascinating, glad I asked!
| bastardoperator wrote:
| That's why a global team is important, when you sleep
| they work, when they sleep you work. Work is constant
| when you're servicing the globe.
| andrewxdiamond wrote:
| And work is constantly slow as well, since it's
| impossible to get people in the room at the same time
| bastardoperator wrote:
| Why does everyone need to be in the room? I have a
| groomed backlog and can talk to people async as needed.
| We also record meetings if you missed them and depending
| on the context of the meeting or importance, we'll hold
| timezone friendly meetings for everyone as required.
| andrewxdiamond wrote:
| ¯\_(ツ)_/¯
|
| We have a different work philosophy. I do work with teams
| in India and England, and it's painful to accomplish
| anything cross team
| capableweb wrote:
| Everyone employed by your company who works in the same
| industry stops and starts working at the same time, all
| around the world?
| andrewxdiamond wrote:
| No, but my team owns our own APIs. We are all located in
| the same area and we go oncall for our service.
| outworlder wrote:
| > Besides, Microsoft surely has 24/7 watch of their
| infrastructure, even on weekends
|
| "watching" with a dedicated team vs "waking up everyone in
| engg because things are on fire" are two very different
| things.
|
| Besides, size doesn't work that way. The larger the
| organization and the more complex the product is, the
| higher the chance some unexpected interaction will occur.
| There are processes and automation that can mitigate this,
| but one can never be completely certain.
|
| Not even the aviation industry has mastered that.
| sithlord wrote:
| Wonder if these were managed by a defunct team from India?
| racl101 wrote:
| Oh I get it.
|
| [touches nose]
| rvz wrote:
| Once again, just two days ago [0], the whole of GitHub went down,
| after the RSA key leakage and the certificate key expiry on its
| user-facing site.
|
| It is also apparent that GitHub Actions has chronically struggled
| to operate normally, breaking at least once a month for
| years.
|
| There is no question that GitHub has been less reliable than a
| self-hosted GitLab or Gitea instance you run yourself would be,
| as I said before [1].
|
| [0] https://news.ycombinator.com/item?id=35325850
|
| [1] https://news.ycombinator.com/item?id=22867803
| carlmr wrote:
| Gitea is great but it doesn't replace GHA, only the rest of
| GitHub.
| dunno7456 wrote:
| Gitea + Woodpecker CI works pretty nicely.
___________________________________________________________________
(page generated 2023-03-29 23:02 UTC)