[HN Gopher] GitHub incident 2022-03-23
___________________________________________________________________
GitHub incident 2022-03-23
Author : tpaksoy
Score : 258 points
Date : 2022-03-23 14:51 UTC (8 hours ago)
(HTM) web link (www.githubstatus.com)
(TXT) w3m dump (www.githubstatus.com)
| max23_ wrote:
| Looks like the same services that were affected in yesterday's
| incident.
| mfashby wrote:
| I'm inclined to look at tools like Fossil again, for its
| distributed issue tracking and wiki capabilities.
|
| https://fossil-scm.org/home/doc/trunk/www/index.wiki
| edgyquant wrote:
| I had forgotten about that, thanks!
| frjalex wrote:
| Looking at the "GitHub" prefix in the title, I was half-expecting
| this to point to a report explaining the outage a week ago... But
| rest assured, it is a new outage!
| annexrichmond wrote:
| I thought it was going to be a Postmortem. I couldn't have been
| more wrong!
| teekert wrote:
| Oh I thought it was about the one from yesterday :)
| aaaaaaaaata wrote:
| Are their CI/CD toys that shiny that people still willingly
| choose them even with all the issues?
|
| I find myself regularly asking this -- about every major SaaS
| used for critical ops stuff like this.
| teekert wrote:
| Work chose GitHub (we are a Microsoft shop), and I have to
| say, I like GitHub a lot. The disruptions have been
| annoying sometimes, that's true. But due to the nature of
| Git I could always just keep working.
| [deleted]
| einpoklum wrote:
| The page at the link is not much more informative than the link
| itself :-(
| Xarodon wrote:
| This has been a pretty rough week for GitHub
| stuff4ben wrote:
| Github Enterprise hasn't been faring too well at my work either
| this week. When you work on both open and closed source
| products and GH and GHE are both down, it leads to a very
| unproductive week.
| jrowley wrote:
| Does GitHub enterprise result in dedicated instance or any
| better availability?
| jon-wood wrote:
| It Depends.
|
| GitHub Enterprise is confusingly both a "call us for
| pricing" tier of GitHub the website, and also an on-premise
| version of GitHub that you can run as an appliance in your
| own data centre. The first of those is ultimately just
| GitHub and so has the same outages, the second is running
| on your own hardware so (shouldn't be) tied to the
| website's availability.
| bewuethr wrote:
| There are multiple products: self-hosted (Enterprise
| Server) and hosted by GitHub (Enterprise Cloud). I don't
| know about uptime guarantees, but you can buy Premium or
| Premium Plus support with 30-minute SLA or a dedicated
| account manager.
| okareaman wrote:
| What's the difference between GitHub and GrubHub?
|
| GrubHub delivers
| jadbox wrote:
| First HN comment that ever made me laugh, well done.
| darknavi wrote:
| Watching The Lion King as a youth I always thought grubs looked
| delicious.
|
| Little did I know...
| mirekrusin wrote:
| What's the best crowdsourced status monitor?
| eckza wrote:
| https://outage.bingo/
| nimbius wrote:
| https://www.githubstatus.com/history
|
| 21 incident outages in just 3 months. At this rate the benefits
| of running your own gitea or gitlab are starting to become
| competitive.
| TheRealPomax wrote:
| But how many of those actually affected you? For example, no
| amount of issues around codespaces or github packages would
| impact my professional use of github, so whether there are 21
| or 5000 or those parts get permanently taken offline makes no
| difference in what I need out of the platform.
|
| How many _core_ incidents? The part that affects whether you
| can even push to and pull from a repo, and access issues and
| PRs? Because everything else is nice to have, but you can do
| work perfectly fine without them if they go down for a few
| hours.
| sitzkrieg wrote:
| I could not even log in via SSO, so it was a bit more impactful
| than it sounds on paper
| CanSpice wrote:
| Yesterday's affected me, I couldn't pull or push and when I
| tried to look at the repo to do PRs I got 500 errors. That
| only lasted maybe 30 minutes though.
| jeltz wrote:
| I was affected by the one last week, the one yesterday and
| the one today. The one today was harmless but the other two
| disrupted our work. All three were "core incidents", but the
| one today felt shorter.
| ironmagma wrote:
| We run Gitea at my company. In fact, we forked it. It could
| reeeaaaalllly use a rewrite. If anyone is even mildly ambitious
| about creating a new alternative to Github/Gitea, it's a great
| time to do that.
| mynameismon wrote:
| You might be interested in sourcehut: https://sr.ht
| KronisLV wrote:
| Another self-hosted project in the space that I've seen is
| GitBucket, although it runs on the JVM (not necessarily a bad
| thing, just different from Go): https://gitbucket.github.io/
| chockchocschoir wrote:
| > At this rate the benefits of running your own gitea or gitlab
| are starting to become competitive
|
| No need, just use Codeberg.org instead. They run Gitea and it is
| a free collaboration platform (+ git hosting) for free projects.
| FOSS/OSS should really consider alternatives to GitHub and
| GitLab, especially when there are much more FOSS/OSS friendly
| platforms around.
| sonicggg wrote:
| Come on, don't be so dramatic. This is not a 911 call center,
| people will survive these minor outages.
| mdellavo wrote:
| sure but three days in a row?
| Chris2048 wrote:
| > running your own
|
| assuming that would be flawless, which it wouldn't
| ransom1538 wrote:
| "21 incident outages in just 3 months. At this rate the
| benefits of running your own gitea or gitlab are starting to
| become competitive."
|
| Oh stop the drama. Fine. Setup your gitlab.
| belter wrote:
| Excluding ones reported as [Errors], [Scheduled] or
| [Notifications]
|
| 2019 -> 39 Incidents
|
| 2020 -> 67 Incidents
|
| 2021 -> 86 Incidents
|
| 2022 -> 20 Incidents so far
|
| Edit: Using Linear Regression...Prediction for total end 2022:
| 111 Incidents.
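|
| For the curious, a minimal Python sketch of that fit (ordinary
| least squares over the yearly totals above; it reproduces the 111
| figure):
|
| years = [2019, 2020, 2021]
| incidents = [39, 67, 86]
| mean_x = sum(years) / len(years)
| mean_y = sum(incidents) / len(incidents)
| num = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, incidents))
| den = sum((x - mean_x) ** 2 for x in years)
| slope = num / den                        # ~23.5 more incidents per year
| intercept = mean_y - slope * mean_x
| print(round(slope * 2022 + intercept))   # -> 111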
| mrkramer wrote:
| One would have thought that when they got acquired by Microsoft
| the number of incidents would go down, considering all the
| resources Microsoft would provide, but no.
| speedgoose wrote:
| GitHub has a lot more features now though. A few years ago
| you didn't have GitHub Actions or workspaces; incidents were
| mostly a DDoS from Asia once in a while.
| antiquark wrote:
| Based on the same extrapolation, GitHub will reach one
| incident per day by 2032.
| mbesto wrote:
| The number of incidents isn't so much of a problem as the
| amount of downtime is. That would be more interesting to see.
| belter wrote:
| GitHub Availability Report [1]
|
| Service Downtime Core Services Only - Cumulative per Month
|
| (Some months with more than one outage)
|
| Jan 2021: 3 hours 53 min
|
| Feb 2021: 1 hour 42 min
|
| Mar 2021: 4 hours 10 min
|
| Apr 2021: 2 hours 20 min
|
| May 2021: 10 hours 34 min
|
| Jun 2021: 0 min
|
| Jul 2021: 0 min
|
| Aug 2021: 4 hours 23 min
|
| Sep 2021: 0 min
|
| Oct 2021: 1 hour 36 min
|
| Nov 2021: 2 hours 50 min
|
| Dec 2021: 0 min
|
| Jan 2022: 26 min
|
| Feb 2022: 13 min
|
| [1] https://github.blog/tag/github-availability-report/
| mbesto wrote:
| So, if my math is right (for 2021 only): 1888 min of downtime
| out of 525,600 min = 99.64% uptime.
|
| If it was more like 99.80+ I think I would be like "meh",
| but honestly for the price you pay that's not terrible.
| Still, for a company at the Microsoft level, it should be
| 99.80 at least.
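|
| As a sanity check, in Python (downtime minutes taken from the
| 2021 figures quoted above):
|
| downtime_2021 = [233, 102, 250, 140, 634, 0, 0, 263, 0, 96, 170, 0]
| total_down = sum(downtime_2021)          # 1888 minutes
| year_minutes = 365 * 24 * 60             # 525,600 minutes
| print(f"{1 - total_down / year_minutes:.2%}")   # -> 99.64%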
| omoikane wrote:
| I wondered if those error rates were proportional to Github's
| growth over time, so I looked it up. It seems that they had
| 40M users in 2019[1] and 73M users in 2021[2], which
| translates to 0.975 incidents per million users per year in
| 2019 compared to 1.178 in 2021.
|
| So perhaps they are not exactly improving, but maybe there is
| some other way to normalize the data.
|
| [1] https://github.blog/2019-11-06-the-state-of-the-
| octoverse-20...
|
| [2] https://octoverse.github.com/
| ejb999 wrote:
| that's not the kind of progression you like to see - that is,
| error rates increasing over time instead of decreasing.
| TheRealPomax wrote:
| Only if you believe those numbers mean anything. What are
| the errors _for_? Github has been adding lots of features
| and subproducts over the years, becoming a bigger and
| bigger platform as a result. What you want is the error-
| per-component, which may very well have actually gone down,
| with error spikes coming from "when github adds a
| completely new feature and it goes through a slew of
| incidents in its first year". The bigger the feature, the
| more incidents.
|
| Without more detailed numbers, there's literally no
| conclusion to draw here.
| ejb999 wrote:
| Every place I have ever worked considered incidents going
| down to be good, not up.
| TheRealPomax wrote:
| Every place I ever worked at understood that if you x3
| the codebase/infra/interaction surface/etc, you can
| expect x3 errors. If the total number of errors doesn't go
| up as you grow you're doing amazing, and if they go down
| even though you're landing more and more code for more
| and more features and subproducts, you have a genuine
| miracle.
| AlexandrB wrote:
| These features can't be rolled out incrementally to
| users? In this day and age it seems weird for a web app
| to do a global go-live with something before testing it
| with a smaller group first.
| quercetumMons wrote:
| Reasonable if growth/load is growing, too.
| adamsmith143 wrote:
| Is it really though? Are engineers committing so frequently
| that they can't make it through a few hours without Github?
| queuebert wrote:
| Maybe they measure performance in git commits.
| lallysingh wrote:
| It depends on how many engineers you have! But also, there
| are plenty of other functions in GH besides raw git, like
| Wiki/PR/Issues/test/deploy pipelines, etc. It can become
| pretty critical.
| duped wrote:
| An outage of a few hours can tank a release deadline for me,
| so yes.
| [deleted]
| imiric wrote:
| GitHub doesn't just host Git repositories. It's the central
| location for discussions, issues, code reviews, milestone
| planning, and any CI process like testing or releases. If
| it's unavailable whole teams can be interrupted.
|
| Git is distributed. GitHub is very much not.
| jeffwask wrote:
| Yes
| Melatonic wrote:
| The marketing for building your own new "private cloud" will
| begin soon I am sure :-D
| pid-1 wrote:
| You just made a few OpenStack consultants rise from their
| graves.
| dbrgn wrote:
| Does Gitea support some kind of federation / cross-instance
| PRs? That's the main thing I'd miss from a self-hosted
| instance, the ease of getting contributions.
|
| After all, you don't even need Gitea for pure Git hosting. If
| you have a server with SSH access, just init a bare repo in a
| directory, push to that, and you're ready to go. No web UI
| needed.
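|
| Something along these lines is all it takes (host and paths are
| placeholders, of course):
|
| # on the server
| ssh you@yourserver 'git init --bare ~/repos/project.git'
|
| # on your machine
| git remote add origin you@yourserver:repos/project.git
| git push -u origin main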
|
| The reason I'm still using GitHub is not code hosting. It's
| collaboration.
| brimble wrote:
| Gitea gets you: a nice GitHub-like web GUI, including for
| stuff like managing users; 2FA; some integrations; web hooks
| without having to add git-hooks to all your repos; and
| extremely-useful-to-some-projects features like git-lfs
| support.
|
| If you don't want or need those things, bare git repos are
| fine and certainly easier to support (not that Gitea's that
| hard, though a few issues/PRs I've noticed have caused me
| more than a little concern about the overall quality of the
| project).
| encryptluks2 wrote:
| But by using GitHub for "collaboration" you are sacrificing
| decentralization.
| dbrgn wrote:
| It seems there's a tracking issue here, but it seems stalled:
| https://github.com/go-gitea/gitea/issues/1612
| tokumei wrote:
| > If you have a server with SSH access, just init a bare repo
| in a directory, push to that, and you're ready to go. No web
| UI needed.
|
| Used to do that years ago for my personal projects. Honestly
| does the trick.
| devwastaken wrote:
| And who pays for fixing it? Downtimes of self-hosted systems
| using external software can be far longer. GitHub, unlike
| Amazon and friends, doesn't lie about their downtime. Every
| saas has hundreds of downtime instances across the board every
| month. Some are small enough you don't see them. Yet the
| services still work exceptionally well - and when they don't
| they get fixed in a quick manner. What takes them an hour would
| take most private orgs a day.
| AlexandrB wrote:
| > GitHub, unlike Amazon and friends, doesn't lie about their
| downtime.
|
| Are you kidding? The last 2 incidents were called "degraded
| performance". Where "degraded" meant I would get nothing but
| 500 errors accessing GitHub.com either via browser or git
| itself for the duration of the outage. How is this not lying?
| everfrustrated wrote:
| GitHub is notorious for only noticing outages once the USA
| morning starts.
|
| If you're using GitHub in Europe or Asia it's not uncommon
| for GitHub to be offline for many hours before they
| acknowledge anything.
| mhh__ wrote:
| The company I work for has a bunch of non-programmers using and
| working in GitLab (or "the git"); I can't really see that
| happening with GitHub regardless of where it was hosted.
|
| Gitlab just seems better for actually running a software
| project.
| edgyquant wrote:
| I'm not sure at what organization that is true. My company
| lives out of GitHub and Jira and I've hardly noticed the three
| month surge. GitHub would have to do a lot worse to get many
| companies to want to host their own services. This is the
| argument people have said about the cloud from day one.
|
| People want to know it isn't their problem, that makes cloud
| computing (and things like GitHub) worth their weight in gold.
| I have real problems to solve; I don't want to deal with a git
| repo manager on top of that.
| ManWith2Plans wrote:
| I will say that for us this is a huge deal. We're a devops
| services company, and our customers expect their deployment
| pipelines to work. This is becoming a huge pain-point for a
| few of our customers and we recommended Github Actions to
| them. A couple of our customers want us to move away from
| GitHub actions because of how disruptive outages have been.
| jeltz wrote:
| Maybe you are in a different time zone because our
| organization certainly noticed and was disrupted by this.
| edgyquant wrote:
| I'm on PST time, some of our other devs are on the east
| coast and one is in India. I think we're spread out enough that
| it should have been an issue, but maybe we prioritize different
| things.
| jeltz wrote:
| We are in CET and maybe we use Github differently than
| you.
| jhugo wrote:
| I think the impact was for some reason not consistent between
| users (maybe due to geographical factors or maybe sharding of
| accounts?). We're in Asia and I think we've had three
| different days recently where we couldn't actually get much
| work done due to GitHub being flakey or down for the entire
| day and our CI/CD and development processes being built
| around it. We ended up moving off GitHub onto a self-hosted
| system, which took about a day of work for one engineer
| (CI/CD itself was already self-hosted, so just Git, issues
| and PRs), and there have already been two more GitHub outages
| since then.
| gjulianm wrote:
| At my organization it's always been true. Setting up GitLab
| is fairly easy, in my company we do it and it's cheap (on-
| prem hosting is basically zero, and we had the IPs/domains
| already) and it hasn't given us too many headaches. I think
| last time I had to do something was maybe a few months ago
| when I restarted it so that it picked up the updated SSL
| certificate.
| nightpool wrote:
| Self-hosted GitLab got a good callout yesterday from
| Microsoft, it appears to be a favorite of LAPSUS$: https://
| www.microsoft.com/security/blog/2022/03/22/dev-0537-...
|
| Self-hosting _always_ increases the operational burden of
| making sure your systems are secure. Maybe you have the
| engineering resources to spend on patching everything
| immediately and conducting in-house pen tests, but for most
| companies it 's much, much more secure to let the
| software's developers host it as well.
| temp8964 wrote:
| Not necessarily. Self-hosted services are protected by
| company firewall / VPN. They can setup very restrictive
| network access. They don't have the same level of risks
| as public services like GitHub or GitLab.
| hhh wrote:
| Establishing an entry point via VPN is Lapsus$'s primary
| first step.
| Melatonic wrote:
| Except that the software developers hosting is also a
| much, much bigger target and you generally do not have
| any real control over how often they are patching either.
| xondono wrote:
| I'd say it depends, I run my own on prem server and gitlab
| was a PITA. Too many moving parts, updating took too much
| of my time, and I never felt "safe".
|
| Moving to gitea solved all of those issues for me (thus
| far), now I'm looking into adding other stuff like CI
| through Drone.
| Inversechi wrote:
| Did you consider woodpecker instead of drone? It's
| basically an evolved fork of the OSS version.
|
| https://woodpecker-ci.org/
| xondono wrote:
| Didn't even know about it. I'll check it out.
|
| Thanks!
| KronisLV wrote:
| Curiously, this was also my own experience!
|
| I actually wrote a bit about the migration process, as
| well as the reasons for migrating over to Gitea, Nexus
| and Drone CI as opposed to using GitLab, GitLab Registry
| and GitLab CI: https://blog.kronis.dev/articles/goodbye-
| gitlab-hello-gitea-...
|
| With containers, it's actually a pretty good experience
| that's not too hard to setup or manage.
| edgyquant wrote:
| It definitely depends. We're pretty early stage and I'm
| the senior engineer+infrastructure guy so running our own
| gitea instance or whatever is just more time that I'm
| almost out of.
| skeeter2020 wrote:
| >> Setting up GitLab is fairly easy, in my company we do it
| and it's cheap (on-prem hosting is basically zero, and we
| had the IPs/domains already)
|
| In what tech company is hosting or domains the main cost
| centre? Many companies spend more on a single hour of a
| dev's time than their entire GH monthly bill.
| capitol_ wrote:
| I think we pay about $10 per developer per month for
| github, and with about 1000 developers I would love that
| hourly rate.
| kleebeesh wrote:
| ...What? $10 x 1000 = $10k / month. $10k x 12 = $120k.
| That is a new grad software engineer salary in any US
| city. You'd pay more than that for a single dev with the
| devops and security experience to keep GHE running and
| patched for 1000 devs.
| zeku wrote:
| Just a bone to pick... new grad engineers in my US city
| started around 60-70k in 2018 when my college cohort
| graduated. Southern US...
| zrail wrote:
| Things have changed considerably over the last four
| years.
| sitzkrieg wrote:
| yea and starting salary for zero experience developers is
| not $120k in most places
| oceanplexian wrote:
| There are a lot of problems with this from the business
| angle:
|
| (1) An engineer getting paid 120k doesn't "cost" 120k,
| probably >150k with federal taxes, health insurance,
| benefits, and so on. Not including the cost to recruit,
| interview, and train said person.
|
| (2) I don't know of many 1,000 person companies that
| would trust a new grad software engineer with no
| experience to manage critical infrastructure.
|
| (3) You need N engineers to manage said service, because
| what happens when your one engineer gets sick, takes PTO,
| or quits for some reason? You also need a manager for
| said engineer(s).
|
| (4) You now need to secure an internal service you never
| did before, so expect to have to hire external security
| consultants or re-allocate security engineers, since it's
| high risk.
|
| (5) Github is FedRAMP compliant, SOC1 and SOC2 compliant
| and GDPR compliant. If you or your customers need any of
| those things, expect to hire external auditors on a
| recurring basis to validate your home-grown solution
| meets those requirements.
|
| I hate to make these points because I'm a big believer in
| the scrappy startup mentality, but if you want to do
| things right, in the context of a large enterprise that
| is accountable to a lot of people, expect a project like
| this to cost $1MM per year minimum, and it probably won't
| reach parity with a cloud offering in terms of
| reliability, multi-region performance, proper backups,
| and so on. This is why Github can charge ~$200 per user
| (Or $200k per year for 1,000 seats) and still come away
| looking like a bargain.
| tjoff wrote:
| Well, considering you'd likely spend an average of 5
| minutes per day doing it I wouldn't mind it.
| cortesoft wrote:
| The person was replying to a comment saying they spend
| more on a SINGLE HOUR of a dev's time than the monthly GH
| bill, which is not true for an org of more than 20 people
| or so (depending on hourly rate).
| kleebeesh wrote:
| Ah, totally misread it. Thanks.
| thaeli wrote:
| Also, looking at this it seems like GitHub isn't doing the
| common SaaS thing of just lying on their status page. Many
| providers, both internal and external, would look a lot worse
| if they had honest status pages.
| jhugo wrote:
| Several of the recent outages were much longer (at least
| for us, here in Asia) than they admitted on their status
| page. In one case I started work, noticed I couldn't push
| to or pull from GitHub, that situation persisted all day,
| and around 5pm local time (so morning-ish in the US)
| suddenly their status page acknowledged the problem and a
| discussion started on HN.
| mirekrusin wrote:
| They stay green for a good 15 minutes from the first moment I
| see problems; it's not the first time, it actually happens quite
| often. Maybe that's the time they need to confirm/cross-check/
| write a status update, I don't know.
| judge2020 wrote:
| They probably allow regular SREs to trigger an incident
| on the status page on their own, when the likes of AWS
| and other bigger cloud providers are rumored to need
| approval from a VP[0] to update the status page.
|
| 0: https://news.ycombinator.com/item?id=29475756
| georgemcbay wrote:
| While quicker reporting would be better, 15 minutes is
| anecdotally a lot better than I see from most other
| services where their status pages will report all-clear
| hours into full outages.
| thaeli wrote:
| Yeah, I'm legit impressed with a 15 minute time here.
| ishanjain28 wrote:
| They do intentionally or not lie about this on their status
| page. From December 25th to December 31st 2021, Github
| actions had network problems almost every single day for
| hours and the status page stayed green throughout that
| period.
|
| The same thing also happened a few months back.
|
| It feels like they do this manually and it's only done when
| enough people are affected.
| SkyPuncher wrote:
| > I'm not sure at what organization that is true. My company
| lives out of GitHub and Jira and I've hardly noticed the
| three month surge
|
| These have been minor inconveniences for us - at worst. Most
| of the time it simply means people jump to something else
| then come back later in the day.
|
| Failing tests and PR feedback cycles are more of a blocker to
| our team than these outages.
| jmartens wrote:
| My company monitors the functionality, performance and
| availability of apps like Github, and we have certainly
| noticed the increase in issues lately.
| edgyquant wrote:
| We were actually talking about implementing this last week.
| Not for GitHub but for slack as it seems to have issues
| once a month or so.
| wnevets wrote:
| > I've hardly noticed the three month surge.
|
| This has been my experience as well. I don't know if that
| means GitHub is being overly transparent about issues or I've
| just been lucky but I would hate if people punished services
| for being transparent and informative on their status pages.
| zenexer wrote:
| GitHub's outages have hit me hard over the past week or so.
| I don't think it's a matter of them being transparent--if
| anything, I was hitting errors well before their status
| page updated. Yesterday it was completely unusable for much
| of my workday, and today tasks that normally take me a few
| minutes have been taking hours.
| jacobr wrote:
| 20 PRs waiting in line for half a day to be merged is pretty
| annoying. We've had that on multiple occasions the last few
| weeks due to GitHub incidents.
| dijonman2 wrote:
| GitHub enterprise is amazing, but I agree that a centrally
| hosted Git instance of any variety is a liability.
|
| With the recent Okta breach, I think we will see a
| reversal of the centralization trend.
| gwbas1c wrote:
| > At this rate the benefits of running your own gitea or gitlab
| are starting to become competitive
|
| When you host things yourself, you still have downtime. And,
| having worked with Github for over a decade, the actual
| disruption to my work is from downtime is much less than if I
| had to host my own.
|
| That being said: I briefly worked for a company that hosted its
| own source code control system. For us, as a small team, it
| wasn't worth it. The system was outdated and hosted in an
| insecure manner. No one ever did any "admin" work except the
| founder. He ran it because he had irrational fears of
| switching, not because of any tangible advantages over Github
| (and competitors.)
|
| Keep in mind that Github (and competitors) are often cheaper
| than the time needed to invest in hosting your own. (Estimate
| 10-20 hours a year of invested time. Calculate your hourly
| rate. Github and competitors are cheaper.) In order to come
| ahead, you need tangible benefits other than "I think I can
| have less downtime."
| megous wrote:
| Dunno, I got blocked from my work's SaaS-hosted GitLab for
| about a month by Cloudflare. Nobody at GitLab or CF helped. I
| only figured it out myself after about 4 hours of research: it
| was caused by some web tracking APIs (disabled by me years ago)
| that nothing should have a hard dependency on.
|
| I certainly would not have this problem on self hosted
| instance, because it would not be behind CF. I'm sure I'd
| have other problems though. :)
|
| All software is crap. You can either spend time fixing it
| yourself, or spend time begging online for fixes/help from some
| SaaS company/community with resolution times measured in months
| sometimes, all while you may not be able to use it fully.
|
| Also with SaaS it will be constantly shifting under you.
| Things will be moved around, restyled, iconized, popupized,
| etc. This doesn't help productivity either. With self-
| hosting, you can at least avoid upgrading, if you dislike
| this kind of thing. Or choose FOSS software that values UX
| permanency/stability, which seems to be a really hard ask from
| a SaaS business.
| mrkurt wrote:
| If you want companies to be honest on their status pages (I
| do!), you can't just count incidents like that. Status pages
| can be an amazing place to communicate all kinds of problems.
|
| Most issues have a relatively narrow impact, but the impacted
| people _still_ benefit from seeing them listed.
| jmartens wrote:
| How can we solve this as customers, or push the vendor to do
| better?
| mrkurt wrote:
| Use vendors who do a good job communicating status,
| basically. I don't think you can change AWS behavior. But
| if you find a hosting company who does an amazing job with
| their status updates, put some apps there (_my_ company
| does an ok job with status page updates, we're getting
| better, it's not amazing yet).
| encryptluks2 wrote:
| Stop being a customer of crappy vendors
| copperx wrote:
| What cloud provider does better status pages than AWS?
| drusepth wrote:
| The snarky answer is "literally all of them", but one
| real answer is that I've been pretty happy with GCP's
| status reporting for the past year-ish I've used them.
| I've only noticed a few incidents, but every time I've
| checked the status it was already updated. They also
| occasionally provide workarounds on the live incident
| pages if you need to be back up before the issue is fixed
| on their end.
| rvz wrote:
| Well, I think I have said that since 2020 [0] and it is self-
| evident that you are better off self-hosting your own Git repo.
| If you can host a website you can do it. If GNOME, ReactOS,
| Wireguard, Linux Kernel Project, Mozilla, etc can do it, so can
| you. Or even use it as a backup / failsafe just in case.
|
| But going _' all in'_ on GitHub just doesn't make any sense
| anymore.
|
| [0]
| https://hn.algolia.com/?dateRange=all&page=1&prefix=true&que...
| Someone wrote:
| But who can host a website? I would be wary of hosting
| something that isn't a 100% static site, out of fear of the
| amount of attention maintenance would take.
|
| Also, quite a few of the non-profits behind the projects you
| mentioned have multi-million dollar budgets that they can use
| to administer their git instance, if needed. I don't think
| "if they can do it, you can" is a strong argument for those.
| rvz wrote:
| I don't recall ReactOS, or the creators of wireguard having
| _' multi million dollar budgets'_. How is it that even
| projects like RedoxOS [0] are able to self-host on a GitLab
| instance using a subdomain, without giant budgets in the
| millions?
|
| You don't need a _'multi-million dollar budget'_ to self-host
| a git repo, and many of these open-source projects had been
| doing so for years before GitHub even existed. Even if they did
| have such a budget, there isn't an excuse left not to self-host
| and avoid going _'all in'_ on GitHub.
|
| At the very least I would expect something like what
| ReactOS is doing by having a self-hosted backup just in
| case GitHub goes down or vice-versa. [1]
|
| Looks like that is proving to be useful.
|
| [0] https://gitlab.redox-os.org/redox-os
|
| [1] https://github.com/reactos/reactos#code-mirrors
| Someone wrote:
| > You don't need a 'multi-million dollar budget' to self-
| host a git repo
|
| I never made that claim. The argument was "if X can do
| it, so can you".
|
| I pointed out that _some_of_these_ (Mozilla, likely the
| most extreme of them, had over $400 million in revenues
| in 2020), are quite different from the typical 'you',
| invalidating that argument.
|
| As always, invalidating an argument doesn't mean its
| conclusion is wrong.
| rglullis wrote:
| My last bill from Hetzner was ~35EUR. I host gitea, drone
| CI, hashicorp vault and my own docker registry/pypi
| repository. I can add as many users as I want, and I had
| exactly zero incidents in the past ~6 years since I set
| this up.
|
| I don't even worry about a strong backup strategy (besides
| just making occasional snapshots of the data volumes)
| because this was all set up with IaC tools (Terraform,
| Ansible) and I have copies of all the code in local
| repositories.
| rglullis wrote:
| It's almost like people forget that git is a _Distributed_
| Version Control System, after all...
| djbusby wrote:
| GitHub/Lab are for more than just code repo
| Jiejeing wrote:
| If you are a closed org, that is. Running your own gitea or
| gitlab with registration enabled and having to deal with spam
| is a real hurdle.
| julianlam wrote:
| Is it not possible to restrict access to the git server from
| a VPN server only?
|
| Just off the top of my head, that's one thing you can do.
| mlyle wrote:
| Yah, that's a "closed org". When you need to deal with the
| public at large, you need to deal with user registration
| issues and spam.
| Dobbs wrote:
| So now every person who wants to contribute to your open
| source project has to setup a VPN client?
|
| The parent comment was explicitly about non-closed (i.e. not
| private) orgs.
| eatonphil wrote:
| Github Actions are back for me now.
| toastal wrote:
| And to think Git can easily be decentralized. I wonder if the
| community could fork GitHub to fix it. Oh, it's not open source.
| Devs must be too busy working on more 'social' features like "For
| You (Beta)" to milk the attention economy.
| intunderflow wrote:
| With how often these happen we might as well sticky this thread
| for the next one
| grumple wrote:
| Again?! Jeez. I wish I had customers this tolerant.
| rvz wrote:
| Again? Last time that happened was 24 hours ago? [0] It is really
| getting unreliable. Like I said before, having a self-hosted
| backup seems to make more sense.
|
| [0] https://news.ycombinator.com/item?id=30767821
| mirekrusin wrote:
| Status page says only degraded performance.
|
| It's a nice way of putting it.
|
| I've been trying to run GitHub Actions for a couple of hours now.
| They don't work at all. But apparently this means they run, just
| in infinite time, hence == degraded performance, nice.
| raffraffraff wrote:
| It's just a way to avoid SLA breaches. "Of course it wasn't
| _down_! It was just infinitely slow!"
| koolba wrote:
| I really wish they would add the word "outage" to these titles.
|
| "Incident" alone makes me think something got hacked or leaked.
| arez wrote:
| That's SRE lingo --> https://sre.google/sre-book/managing-
| incidents/
| zufallsheld wrote:
| It's also itil lingo, which predates sre.
| mtnops wrote:
| It's NIMS - FEMA lingo, which predates ITIL. Which was
| developed in USFS wildland firefighting, which predates
| FEMA. It's incident management all the way down.
| blueplanet200 wrote:
| I hope they figure out what's going on every morning. Heard from
| inside they don't know why the db dies every day but restarting it
| fixes it.
| cube00 wrote:
| Break out the early morning restart cron job.
| Kostic wrote:
| Early morning in which timezone?
| glenneroo wrote:
| When the least amount of users are online?
| afterburner wrote:
| GaryOldman.gif
| gaoshan wrote:
| Here you go, Github:
|
| 0 4 * * * /etc/init.d/postgresql restart
|
| I'll take an architect position as compensation, but only if
| there is equity.
| rish wrote:
| GitHub uses MySQL primarily though.
| grumple wrote:
| MySQL also has a restart command! I'll take my rsus now
| ty.
| shepardrtc wrote:
| IIS Server had/has a memory leak in worker threads that many
| years ago always forced us to restart the server every few
| days. Starting in 6.0, they added worker thread recycling and
| made it mandatory to choose a time period for every thread to
| be recycled. Why fix the error when you can just restart the
| service?
| whimsicalism wrote:
| I doubt they use IIS
| throwra620 wrote:
| MSer here, yes we do... for some things
| prepend wrote:
| For GitHub? It seems unbelievable that they would use IIS
| pre-purchase and why in the world would you mix in a
| second web server for post-purchase enhancements.
| whimsicalism wrote:
| If GH is around the same level of integration with
| Microsoft as my employer, which is another Microsoft
| acquisition, I don't really believe you have a ton of
| insight into GH processes.
| edgyquant wrote:
| I dated a girl at GitHub for a while last year who said
| they weren't even completely off of AWS yet and she liked
| how it didn't feel like working for Microsoft. Maybe
| this has changed though.
| djbusby wrote:
| Apache prefork has had that since forever. Seems like just a
| garbage-collection type pattern.
| shepardrtc wrote:
| It's not a bug, it's a pattern.
|
| Seriously though, IIS 5.0 had no worker recycling. There
| was no method to fix the issue. Threads would eat up GB's
| of memory until you killed them.
| raffraffraff wrote:
| Yuck. Honestly, restarting a database to fix a major outage
| sounds like "we have no idea what we're doing"
| bpicolo wrote:
| Sporadic database performance issues can certainly make you
| feel that way. They are definitely not trivially debugged at
| scale
| vimda wrote:
| Would you rather it stay down while they spend a day
| debugging it?
| paulryanrogers wrote:
| If that means it won't be down every morning in my time
| zone then yes.
| exikyut wrote:
| What's "the db"? It sounds like something of small to medium
| scale if you can just restart it like that.
|
| In any case, why not just relocate some vendor engineers on
| site for a bit? Or, better, why does the vendor not have a
| small presence in the corner?
|
| Sounds like whatever "the db" is it's probably some
| (objectively) small but very scary thing that's currently on
| fire and people are trying to figure out how to put it out
| without crashing the plane _and_ also making too many waves
| internally, which is probably even harder. So asking about
| making vendor noises is (as useful as it may be) probably going
| down the wrong path - in much the same way this is probably not
| related to the outages (it may well be, but from the outside
| it's all coincidence anyway).
| jonnybarnes wrote:
| 2nd day in a row isn't it?
| stepri wrote:
| And 6 days ago: https://news.ycombinator.com/item?id=30711269
| rvz wrote:
| It is. 24 hours later [0] and I only expected it to happen once
| every month. Looks like it is getting worse.
|
| Oh dear. Not a good idea to go 'all in' on GitHub.
|
| [0] https://news.ycombinator.com/item?id=30767821
| momothereal wrote:
| Yes: https://news.ycombinator.com/item?id=30767635
| fishnchips wrote:
| Yesterday they had two.
| higeorge13 wrote:
| The usual services (actions) again down around the same time.
| This is embarrassing.
| etimberg wrote:
| The quality of GH seems to be slipping
| amelius wrote:
| I hope it doesn't affect security ...
| xtracto wrote:
| Funny that it happened since they were acquired by Microsoft...
| reminds me of Hotmail, Skype, LinkedIn, Rare, among several
| others.
| [deleted]
| [deleted]
| Trasmatta wrote:
| I've actually been pretty impressed with the quality of the
| product and new features over the past couple of years, but it
| seems to be having a lot of stability issues recently.
| etimberg wrote:
| I've liked the new features too, especially after so many
| years of not many features. Maybe they've moved too fast now
| paskozdilar wrote:
| bob1029 wrote:
| We are scheduling a call with an enterprise sales person next
| week.
|
| If I can get all the Github features I had as of ~2020, but on an
| instance that won't get hit by the public cloud/update bus, I
| would be exceptionally happy.
|
| The only complaints we have are regarding availability. If we can
| fix that one problem, this is a perfect product in our view.
| andruby wrote:
| How do you evaluate running your own GitLab instance?
| [deleted]
| iBotPeaches wrote:
| It seems like we haven't had a non-robot status update on the
| status page in days, despite this becoming what seems like a daily
| occurrence. I figure at this point we'd get some explanation of
| why this is happening.
|
| I also don't appreciate our builds freezing, unable to be
| cancelled and then eating up hundreds of minutes.
| MattIPv4 wrote:
| > I figure at this point we'd get some explanation of why this
| is happening.
|
| I've created a new discussion in their feedback repo asking for
| this, three major outages in a week could really do with a
| post-mortem:
| https://github.com/github/feedback/discussions/13344
| lucasyvas wrote:
| Billing should always be built on a "ping" IMO and not
| start/stop hooks. The latter is shockingly bad for customers
| during times of unreliability. The former sounds stupid and
| requires more infrastructure from the one offering the service,
| but I think it's more fair.
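|
| To make that concrete, a toy sketch (obviously nothing like
| GitHub's real billing code): each heartbeat bills one increment,
| so a lost "stop" event during an outage can't keep the meter
| running the way start/stop bookkeeping can.
|
| from collections import defaultdict
|
| class HeartbeatMeter:
|     def __init__(self, increment_minutes=1):
|         self.increment = increment_minutes
|         self.billed = defaultdict(int)   # job_id -> billed minutes
|
|     def ping(self, job_id):
|         # Called by the runner while the job is actually making
|         # progress; no ping, no charge.
|         self.billed[job_id] += self.increment
|
| meter = HeartbeatMeter()
| meter.ping("build-123")
| meter.ping("build-123")
| print(meter.billed["build-123"])   # -> 2 minutes billed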
|
| I haven't used GA in a way where it actually cost me
| anything, but having minutes just tick away while you can't do
| anything is really stupid if that's the case.
|
| Edit: Another sane solution would probably be to record outage
| periods and have Billing automatically reconcile for every
| customer when invoicing. This would require them to admit the
| outage durations however, so it may be flawed from a human
| perspective.
| drusepth wrote:
| The "ping" solution is an interesting one that I haven't seen
| proposed before.
|
| At what rate would you do these pings? I don't know how
| upgrading/downgrading works at GitHub but if they do any sort
| of refund/credit when you downgrade, it seems like there's
| some interesting implications for abusing the system (e.g.
| upgrading/downgrading between pings for "free" service if the
| time between them is too long) versus performance (e.g. how
| do you update all users per ping in a timely manner if the
| time between them is too short?).
|
| Would love to read up more on this approach; seems
| interesting!
| easton wrote:
| Do they give you the minutes back if there's an incident during
| the period where a job is running?
| no_wizard wrote:
| You will have to contact them for them to credit you, that's
| what we did
| lucasyvas wrote:
| This is totally unsurprising and also totally unacceptable
| IMO. They should automatically wipe out all build minute
| usage during outages for every account if they insist on
| architecting their system in this way.
| mhitza wrote:
| I suggest you add the timeout-minutes property on the job/step,
| so even if the web interface isn't responsive the job times out
| eventually. Saves you from spending time emailing support about
| consumed minutes.
|
| Of course, assuming that a future bug won't affect timeout-
| minutes itself.
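|
| For example, in the workflow file (job/step names here are just
| placeholders; without a cap, a hung job can sit there for hours):
|
| jobs:
|   build:
|     runs-on: ubuntu-latest
|     timeout-minutes: 30        # cap the whole job
|     steps:
|       - uses: actions/checkout@v2
|       - name: Run tests
|         run: make test
|         timeout-minutes: 10    # or cap an individual step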
| cube2222 wrote:
| Looks like they really want to get a PR deployed, but there's
| still not enough duct tape on it.
___________________________________________________________________
(page generated 2022-03-23 23:01 UTC)