[HN Gopher] Anyone can access deleted and private repository dat...
___________________________________________________________________
Anyone can access deleted and private repository data on GitHub
Author : __0x1__
Score : 733 points
Date : 2024-07-24 18:24 UTC (4 hours ago)
(HTM) web link (trufflesecurity.com)
(TXT) w3m dump (trufflesecurity.com)
| TazeTSchnitzel wrote:
| This is not new. Many people have noticed this before, e.g.
| https://hikari.noyu.me/blog/2020-05-05-github-private-repos-...
| andersa wrote:
| I reported this on their HackerOne many years ago (2018 it seems)
| and they said it was working as intended. Conclusion: don't use
| private forks. Copy the repository instead.
|
| Here is their full response from back then:
|
| > Thanks for the submission! We have reviewed your report and
| validated your findings. After internally assessing the finding
| we have determined it is a known low risk issue. We may make this
| functionality more strict in the future, but don't have anything
| to announce now. As a result, this is not eligible for reward
| under the Bug Bounty program.
|
| > GitHub stores the parent repository along with forks in a
| "repository network". It is a known behavior that objects from
| one network member are readable via other network members. Blobs
| and commits are stored together, while refs are stored separately
| for each fork. This shared storage model is what allows for pull
| requests between members of the same network. When a repository's
| visibility changes (Eg. public->private) we remove it from the
| network to prevent private commits/blobs from being readable via
| another network member.
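The storage model described in that response can be sketched in a few lines. This is only a toy illustration of "shared objects, per-fork refs" as the quoted reply describes it; the `Network` and `Repo` classes are hypothetical and say nothing about how GitHub actually implements it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Network:
    """Hypothetical stand-in for a fork network: one shared object store."""
    objects: dict = field(default_factory=dict)  # sha -> bytes


@dataclass
class Repo:
    """A network member: shares objects with the network, owns its refs."""
    name: str
    network: Network
    refs: dict = field(default_factory=dict)  # ref name -> sha

    def push(self, ref: str, sha: str, data: bytes) -> None:
        self.network.objects[sha] = data  # stored once, network-wide
        self.refs[ref] = sha              # recorded only in this repo's refs

    def read(self, sha: str) -> Optional[bytes]:
        # The lookup goes to the shared store, not to this repo's refs.
        return self.network.objects.get(sha)


network = Network()
parent = Repo("parent", network)
fork = Repo("fork", network)

fork.push("refs/heads/main", "abc123", b"commit made only in the fork")

fork_only_ref = "refs/heads/main" not in parent.refs
readable_via_parent = parent.read("abc123")
print(fork_only_ref, readable_via_parent)
```

In this toy model the parent has no ref pointing at the fork's commit, yet anyone who knows the hash can read the object through the parent.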
| tedivm wrote:
| I reported a different security issue to github, and they
| responded the same (although they ultimately ended up fixing it
| when I told them I was going to blog about the "intended
| behavior").
| myfonj wrote:
| What "intended behaviour" was that, specifically?
| liendolucas wrote:
| Honest question. Submitting these types of bugs only to get a:
| "we have determined it is known low risk issue..." seems like
| they really don't want to pay for someone else's time and
| dedication in making their product safer. If they knew about
| this, was this disclosed somewhere? If not I don't see them
| playing a fair game. What's the motivation to do this if in the
| end they can have the final decision to award you or not? To me
| it looks like similar to what happens with Google Play/Apple
| store to decide whether or not an app can be
| uploaded/distributed through them.
|
| Edit: I popped this up because to me it is absolutely miserable
| for a big company to just say: "Thanks, but we were aware of
| this".
| kayodelycaon wrote:
| As the article pointed out, GitHub already publicly
| documented this vulnerability.
|
| My employer doesn't pay out for known security issues,
| especially if we have mitigating controls.
|
| A lot of people spam us with vulnerability reports from
| security tools we already use. At least half of them turn out
| to be false positives we are already aware of. In my opinion,
| running a bug bounty program at all is a net negative for us.
| We aren't large enough to get the attention of anyone
| competent.
| ipaddr wrote:
| For both sides it turns into a net negative. Better to keep
| your bugs and use them when needed or sell them to others
| to use if possible.
|
| Let's get back to what we had before when multiple people
| can find the same bug and exploit if needed. Now we have
| the one person who finds the bug it gets patched and they
| don't get paid.
| giobox wrote:
| > As the article pointed out, GitHub already publicly
| documented this vulnerability.
|
| I'm honestly not yet convinced that is enough here - I've
| fallen victim to this without realizing it - the behaviour
| here is so far removed from how I suspect most users'
| mental model of github.com works. For me none of the
| exposed data is sensitive, but the point remains I was
| totally unaware it would be retrievable like this.
|
| If the behaviour flies so against the grain, just
| publishing it in a help doc is not enough I'd argue. The
| linked article makes the exact same argument:
|
| > "The average user views the separation of private and
| public repositories as a security boundary, and
| understandably believes that any data located in a private
| repository cannot be accessed by public users.
| Unfortunately, as we documented above, that is not always
| true. Whatsmore, the act of deletion implies the
| destruction of data. As we saw above, deleting a repository
| or fork does not mean your commit data is actually
| deleted."
| tptacek wrote:
| The problem with this line of argument is that the
| fundamental workings of git are also surprising to
| people, such that they routinely attempt to address
| mistaken hazmat commits by simple reverts. If at bottom
| this whole story is just that git is treacherous, well,
| yeah, but not news.
|
| There's a deeper problem here, which is that making the
| UX on hosting sites less surprising doesn't fix the
| underlying problem. There is a best-practices response to
| committing hazmat to a repository: revoke the hazmat, so
| that its disclosure no longer matters. _You have to do this
| anyways._ If you can't, you should be in contact with
| Github directly to remove it.
| Cpoll wrote:
| Is "git" relevant here? Forking isn't a git concept, and
| none of this behaviour has much to do with git; it's all
| GitHub.
|
| Also, you can revoke an API key, but you can't revoke a
| company-proprietary algorithm that you implemented into a
| fork of a public project.
| tptacek wrote:
| Like I said: if you can't revoke the thing you committed,
| you need to get in touch with Github and have them remove
| it. That's a thing they do.
| jonahx wrote:
| Not defending GH here (their position is indefensible imo)
| but, as the article notes, they document these behaviors
| clearly and publicly:
|
| https://docs.github.com/en/pull-requests/collaborating-
| with-...
|
| I don't think they're being underhanded exactly... they're
| just making a terrible decision. Quoting from the article:
|
| > The average user views the separation of private and public
| repositories as a security boundary, and understandably
| believes that any data located in a private repository cannot
| be accessed by public users. Unfortunately, as we documented
| above, that is not always true. Whatsmore, the act of
| deletion implies the destruction of data. As we saw above,
| deleting a repository or fork does not mean your commit data
| is actually deleted.
| andersa wrote:
| Based on some (admittedly not very thorough) search, this
| documentation was posted in 2021, three years after my
| report.
| YetAnotherNick wrote:
| But that would still mean they didn't intend to fix it,
| hence not giving bounty is fair.
| malfist wrote:
| It's a bug bounty, not a "only if we have time to fix it"
| bounty.
|
| He found a security problem, they decided not to act on
| it, but it was still an acknowledged security problem
| madeofpalk wrote:
| The point of a bug bounty is for companies to find new
| security problems.
|
| If the (class of) problem is already known, it's not
| worth rewarding.
| berdario wrote:
| I can see this argument making a bit of sense, but if
| they documented this 3 years after the issue was
| reported, they don't have a way to demonstrate that they
| truly already knew.
|
| At the end it boils down to: is Github being honest and
| fair in answering the bug bounty reports?
|
| If you think it is, cool.
|
| If you don't, maybe it's not worth playing ball with
| Github's bug bounty process
| tptacek wrote:
| It doesn't matter if they knew. If they don't deem it a
| security vulnerability --- and they have put their money
| where their mouth is, by _documenting it as part of the
| platform behavior_ --- it's not eligible for a payout.
| It can be a bug, but if it's not the kind of bug the
| bounty program is designed to address, it's not getting
| paid out. The incentives you create by paying for every
| random non-vulnerability are really bad.
|
| The subtext of this thread is that companies should
| reward any research that turns up surprising or user-
| hostile behavior in products. It's good to want things.
| But that is not the point of a security bug bounty.
| cycomanic wrote:
| I would argue that even if the behaviour was as intended,
| at least the fact that it was not documented was a bug
| (and a pretty serious one at that).
| andrewinardeer wrote:
| If a renowned company won't pay a bug bounty, a foreign
| government often will.
| madeofpalk wrote:
| Why would a foreign government pay for a commonly known
| security limitation of a product?
| prepend wrote:
| Good luck selling this to a foreign (or domestic)
| government. It doesn't seem valuable to me, but who
| knows, maybe someone finds it worth payout.
| coldtea wrote:
| > _It's a bug bounty, not a "only if we have time to fix
| it" bounty_
|
| It's only a bug if it's not intended
| jowea wrote:
| Shouldn't that be on the config page for the repo below the
| "private" button with a note saying private is not actually
| private if it's a fork? And ditto for delete?
| 93po wrote:
| companies vary wildly in their honesty and cooperation with
| bug bounties and develop reputations as a result. if they
| have a shit reputation, people stop doing free work for them
| and instead focus on more honest companies
| andersa wrote:
| I didn't find anything mentioning it online at the time. But
| there wasn't much time and dedication involved either, to be
| fair. I discovered it completely on accident when I combined
| a commit hash from my local client with the wrong repository
| url and it ended up working.
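What "a commit hash combined with the wrong repository URL" looks like can be shown with a small sketch. The owner, repository and hash values below are hypothetical placeholders; the only thing taken as given is GitHub's usual `/<owner>/<repo>/commit/<sha>` URL layout. The snippet only builds strings and makes no network requests.

```python
def commit_url(owner: str, repo: str, sha: str) -> str:
    """Build the web URL for viewing a commit in a given repository."""
    return f"https://github.com/{owner}/{repo}/commit/{sha}"


# Hypothetical names and hash, for illustration only.
sha = "0123abc"
fork_url = commit_url("alice", "project-fork", sha)
parent_url = commit_url("upstream-org", "project", sha)

# Same hash, different network member: per the thread, the second URL
# may resolve even though the commit was only ever pushed to the fork.
print(fork_url)
print(parent_url)
```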
| cyrnel wrote:
| Security disclosures are like giving someone an unsolicited
| gift. The receiver is obligated to return the favor.
|
| But if you buy someone non-refundable tickets to a concert
| they already have tickets for, you aren't owed compensation.
| nyrikki wrote:
| For moral reasons, historically I never wrote POCs or
| threatened disclosure.
|
| For companies like Microsoft, whose security culture a CSRB
| audit showed to be 'inadequate', the risk of disclosure
| with a POC is about the only tool we have to enforce their
| side of the Shared Responsibility Model.
|
| Even the largest IT spender in the world, the US government
| has moved more from the carrot to the stick model. If they
| have to do it so do we.
|
| Unfortunately as publishing a 'bad practices' list by us
| doesn't invoke the risk of EULA busting gross negligence
| claims, responsible disclosure is one of the few tools we
| have.
| tptacek wrote:
| No large company running a bug bounty cares one iota about
| stiffing you on a bounty payment. The teams running these
| programs are internally incentivized to _maximize_ payouts;
| the payouts are evidence that the system is working. If
| you're denied a payment --- for a large company, at least ---
| there's something else going on.
|
| The thing to keep in mind is that large-scale bug bounty
| programs make their own incentive weather. People game the
| hell out of them. If you ack and fix sev:info bugs, people
| submit _lots_ more sev:info bugs, and now your security
| program has been reoriented around the dumbest bugs --- the
| opposite of what you want a bounty program to do.
| hluska wrote:
| The issue had been reported at least twice and was clearly
| documented. GitHub knew about this and had known for years.
| Their replies to the two notifications were even very
| similar.
|
| GitHub clearly knew. Would you prefer that a vendor lie?
| kayodelycaon wrote:
| What does "private fork" mean in this context? I created a fork
| of a project by cloning it to my own machine and set origin to
| an empty private repository on GitHub. I manually merge
| upstream changes on my machine.
|
| Is my repository accessible?
| andersa wrote:
| No, that would be the "copy the repository" approach. Private
| fork is when you do it through their UI.
|
| As far as I know, it is not accessible.
| swozey wrote:
| Because you never git pushed to the fork it's not aware of
| your repo, you're ok.
|
| What I don't know is if in 3 months you DO set your remote
| origin to that fork to for instance, pull upstream patches
| into your private repo, you're still not pushing, only
| pulling, so I would THINK they'd still never get your
| changes, but I don't know if git does some sort of log sync
| when you do a pull as well.
|
| Maybe that would wind up having the commit hash available.
| masklinn wrote:
| It's not. The feature here works because a network of forks
| known by GitHub has a unified storage, that's what makes
| things like PRs work transparently and keep working if you
| delete the fork (kinda, it closes the PR but the contents
| don't change).
| dathinab wrote:
| then it's fine
|
| the issue is the `fork` mechanism of github is not
| semantically like a `git clone`
|
| it's more like creating a larger git repo in which all forks
| whether private or not are contained and which doesn't
| properly implement access management (at least point 2&3
| wouldn't be an issue if they did)
|
| there are also some implications from point 1 that forks do
| in some way interfere with gc-ing orphan commits (e.g. the non
| synced commits in the deleted repo in point 1) at least that
| should be a bug IMHO one which also costs them storage
|
| (also to be clear for me 2&3 are security vulnerabilities no
| matter if they are classified as intended behavior)
| jeremyjh wrote:
| It would not even be that hard to fix it; private forks should
| always just be automatically copied on first write. You might
| lose your little link to the original repo, but that's not as
| bad as unintentionally exposing all your future content.
| sundalia wrote:
| Yup, we can close the thread and ack that GitHub does not
| care.
| fullstackchris wrote:
| To be fair, in the true git sense, if a "fork" is really just a
| branch, deleting the original completely would also mean
| deleting every branch (fork) completely
|
| obviously not a fan of this policy though
| SnowflakeOnIce wrote:
| There seems to be no such thing as a "private fork" on GitHub
| in 2024 [1]:
|
| > A fork is a new repository that shares code and visibility
| settings with the upstream repository. All forks of public
| repositories are public. You cannot change the visibility of a
| fork.
|
| [1] https://docs.github.com/en/pull-requests/collaborating-
| with-...
| Manuel_D wrote:
| Not through the GitHub interface, no. But you can copy all
| files in a repository and create a new repository. IIRC
| there's a way to retain the history via this process as well.
| make3 wrote:
| That's not the GitHub concept / almost trademark of "fork"
| anymore though, which is what your parent was talking about
| a1o wrote:
| I mean it's git, just git init, git remote add for origin
| and upstream, origin pointing to your private, git fetch
| upstream, git push to origin.
| mckn1ght wrote:
| You can create a private repository on GitHub, clone it
| locally, add the repo being "forked" from as a separate git
| remote (I usually call this one "upstream" and my "fork",
| well, "fork"), fetch and pull from upstream, then push to
| fork.
| shkkmo wrote:
| All you should have to do is just clone the repo locally
| and then create a blank GitHub repository, set it as the/a
| remote and push to it.
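The "copy instead of fork" workflow described in the comments above can be written out as a command sequence. This is a sketch, not a vetted procedure: the URLs are hypothetical, the branch name `main` is an assumption, and the helper only builds and prints the commands rather than running them.

```python
import shlex


def copy_instead_of_fork(upstream_url: str, private_url: str,
                         branch: str = "main") -> list:
    """Return git commands for a standalone copy (not a GitHub fork)."""
    return [
        ["git", "init"],
        ["git", "remote", "add", "upstream", upstream_url],
        ["git", "remote", "add", "origin", private_url],
        ["git", "fetch", "upstream"],
        # Push the fetched upstream branch to the private remote.
        ["git", "push", "origin",
         f"refs/remotes/upstream/{branch}:refs/heads/{branch}"],
    ]


# Hypothetical URLs, for illustration only.
commands = copy_instead_of_fork(
    "https://github.com/example-org/project.git",
    "git@github.com:example-user/project-private.git",
)
for argv in commands:
    print(shlex.join(argv))
```

Because the private repository is created empty and populated by a plain push, it is never registered as a fork through the GitHub UI, which is the distinction the thread draws.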
| JyB wrote:
| That's beside the point. The article is specifically about
| << GitHub forks >> and their shortcomings. It's unrelated
| to pushing to distinct repositories not magically 'linked'
| by the GH << fork feature >>.
| einpoklum wrote:
| Data that you place with an entity that is a large organization
| with many commercial and government ties - must be assumed to be
| accessible to some of those parties.
|
| And if that entity has a complex system of storage and retrieval
| of data by and for many users, that changes frequently, without
| public scrutiny - it should be assumed that data breaches are
| likely to occur.
|
| So I don't see it as very problematic that GitHub's private
| repositories, or deleted repositories, are only kind-sorta-
| sometimes private and deleted.
|
| And it's silly that the article refers to one creating an
| "internal version" of a repository - on GitHub....
|
| Still, interesting to know about the network-of-repositories
| concept.
| cxr wrote:
| > The implication here is that any code committed to a public
| repository may be accessible forever
|
| That's exactly how you should treat anything made available to
| the public (and there's no need for the subsequent qualifier that
| appears in the article--"_as long as there is at least one fork
| of that repository_").
| ilikehurdles wrote:
| Sometimes I wonder if all the security features GitHub slathers
| on top of `git` lull people into a false sense of security when
| fundamentally they're working in a fully distributed version
| control system with no centralized authority. If your key is
| leaked the solution is to invalidate the key not just
| synthetically alter your version of history to pretend it never
| happened.
| b800h wrote:
| This is more of a problem if you leak private information
| with a commit by accident. You can't really revoke that.
| kemitche wrote:
| You can't reach out to any machines that have pulled down
| that commit and forcibly delete it, either.
| miguelaeh wrote:
| Wow. This is wild!
| haneul wrote:
| Does any variant of this apply to DMCA'd repos in the repo
| network?
|
| For example if the root repo is DMCA'd, or, if repo B forks repo
| A, then B adds some stuff that causes B to get DMCA'd. Can A
| still access B?
| richbell wrote:
| I believe the entire network is suspended.
| haneul wrote:
| A downstream dmca suspends the upstream? That astonishes me.
| Anyone down to shut down react?
| lilyball wrote:
| Really the only semi-interesting part of this is "if you make a
| private repo public, data from other private forks might be
| discoverable", but even that seems pretty minor, and the best
| practice for taking private repos public is to copy the data into
| a new repo anyway.
| zelphirkalt wrote:
| Is that a best practice in hindsight, or because it was known
| to some, that this issue exists, or for what other reason do
| you consider it a best practice? Git history?
| lilyball wrote:
| When making a private repo public, there's a high chance that
| there was stuff in the private repo that isn't necessarily ok
| to make public. It's a lot easier to just create a new public
| repo containing all the data you want to make public than it
| is to reliably scrub a private repo of any data that
| shouldn't be there.
|
| More generally, you probably want to construct a new history
| for the public repo anyway, so you'll want a brand new repo
| to ensure none of the scrubbed history is accessible.
| xmodem wrote:
| Even after a private repo is made public, it's common practice
| for new functionality to be worked on in private until it's
| ready.
| HL33tibCe7 wrote:
| You've completely missed the most dangerous thing mentioned,
| namely that private forks are not private.
| hmottestad wrote:
| The biggest gotcha here is probably that if you start off with a
| private repo and a private fork, making the repo public also
| makes the fork "public".
|
| GitHub may very well say that this is working as intended, but if
| it truly is then you should be forced to make both the repo and
| fork public at the same time.
|
| Essentially "Making repo R public will make the following forks
| public as well: 'My Fork', 'Super secret fork', 'Fork that I
| deleted because it contained the password to my neighbours wifi
| :P'".
|
| OK. I'm not sure if the last one would actually be public, but I
| wouldn't be surprised if that was "Working as intended(TM)" -
| GitHub SecOps
| pants2 wrote:
| Any time you make a private repo public it's best to just copy
| that code into a new public repo and leave the private repo
| private. Otherwise you have to audit every previous commit and
| every commit on every fork of your private code.
| umpalumpaaa wrote:
| If I understand the issue correctly if you make the original
| repo public any private forks from other users are also
| effectively public. Right?
| kemitche wrote:
| I agree. The other cases may be mildly surprising, but
| ultimately fall firmly into the category of "once public on the
| internet, always public." Deleting a repo or fork or commit
| doesn't revoke an access key that was accidentally committed,
| and an access key being public for even a microsecond should be
| assumed to have been scraped and usable by a malicious actor.
| rvz wrote:
| Come on, this is not surprising.
|
| "Private repositories" were never private as I said before. [0]
|
| [0] https://news.ycombinator.com/item?id=23057769
| qual wrote:
| > _Come on, this is not surprising._
|
| Very cool that it is not surprising to you.
|
| But to others (some are even in this thread!) it is both new
| and surprising. They unfortunately missed your 4 year old
| comment, but at least they get to learn it now.
| londons_explore wrote:
| This isn't a bug IMO.
|
| If you know the hash of some data, then you either already have
| the data yourself, or you learned the hash from someone who had
| the data.
|
| If you already have the data, there is no vulnerability - since
| you cannot learn anything you don't already have.
|
| If you got the hash from someone, you could likewise have gotten
| the data from them.
|
| People do need to be aware that 'some random hex string' in fact
| is the irrevocable key to all the data behind that hash - but
| that's kinda inherent to git's design. Just like I don't tell
| everyone here on HN my login password - the password itself isn't
| sensitive, but both of us know it accesses other things that are.
|
| If github itself was leaking the hash of deleted data, or my
| plaintext password, then _that_ would be a vulnerability.
| jkaptur wrote:
| That's counterintuitive, though - often, the whole point of a
| hash is that it's one-way.
| haneul wrote:
| > If you know the hash of some data, then you either already
| have the data yourself, or you learned the hash from someone
| who had the data.
|
| Don't think so - the article mentions you can use the short
| prefix on GitHub, so you have a search space of 65536.
| qual wrote:
| > _If you know the hash of some data, then you either already
| have the data yourself, or you learned the hash from someone
| who had the data._
|
| From the article, you do not need to have the data nor learn
| the hash from someone who had the data.
|
| > _Commit hashes can be brute forced through GitHub's UI,
| particularly because the git protocol permits the use of short
| SHA-1 values when referencing a commit. A short SHA-1 value is
| the minimum number of characters required to avoid a collision
| with another commit hash, with an absolute minimum of 4. The
| keyspace of all 4 character SHA-1 values is 65,536_
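The 65,536 figure in that quote is simply 16^4, since each character of a hash prefix is one hexadecimal digit. A short sketch of the arithmetic, with the 5- to 7-character sizes added for comparison (the helper name is made up for illustration; nothing here contacts any server):

```python
from itertools import product

HEX_DIGITS = "0123456789abcdef"


def short_sha_candidates(length: int = 4):
    """Yield every possible hex prefix of the given length, in order."""
    for digits in product(HEX_DIGITS, repeat=length):
        yield "".join(digits)


# Keyspace size for each prefix length: 16 ** length.
keyspace = {n: len(HEX_DIGITS) ** n for n in range(4, 8)}
candidates = list(short_sha_candidates(4))

print(keyspace)
print(candidates[0], candidates[-1], len(candidates))
```

Each extra character multiplies the search space by 16, which is why the minimum prefix length matters so much in this discussion.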
| londons_explore wrote:
| In which case, yeah, that's a vulnerability. They shouldn't
| allow a short hash to match up against anything but public
| data.
| gus_massa wrote:
| It's common to use short hash in pull request, and then
| modify or rebase the commits.
|
| The solutions are:
|
| * Force people to use the full hash.
|
| * Get used to a lot of dead links.
|
| * Claim that it's a feature, not a bug.
| guipsp wrote:
| * Force people to use the full hash for commits pushed
| now on?
| Aurornis wrote:
| > If you know the hash of some data, then you either already
| have the data yourself, or you learned the hash from someone
| who had the data.
|
| You need to read to the end of the article where they show the
| brute-force way of getting the hashes.
| refulgentis wrote:
| Read TFA.
| jonahx wrote:
| Surprised at the comments minimizing this.
|
| I've used github for a long time, would not have expected these
| results, and was unnerved by them.
|
| I'd recommend reading the article yourself. It does a good job
| explaining the vulnerabilities.
| hyperpape wrote:
| For the first two, git is based on content addressable storage,
| so it makes sense that anything that is ever public will never
| disappear.
|
| I can sympathize with someone who gets bit by it, as it might
| not have occurred to them, but it's part of the model.
|
| The third strikes me as counter-intuitive and hard to reason
| about.
|
| P.S. If you publish your keys or access tokens for well known
| services to GitHub and you are prominent enough, they will be
| found and exploited in minutes. The idea that deleting the
| repository is a security measure is not really worth taking
| seriously.
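For readers unfamiliar with "content addressable storage": in git, an object's ID is derived from its content. A minimal sketch for blobs, assuming the classic SHA-1 object format (the helper name `git_blob_id` is made up here; commits and trees are hashed the same way with a different header):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


empty_id = git_blob_id(b"")
same_twice = git_blob_id(b"secret") == git_blob_id(b"secret")
differs = git_blob_id(b"secret") != git_blob_id(b"secret2")

print(empty_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The ID is a pure function of the bytes, so the same content always gets the same address. That property says nothing about who may fetch by that address, which is the point argued further down the thread.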
| jonahx wrote:
| I agree the 3rd is by far the worst of the offenders. But
| even the first two should have more visibility. For example,
| by notifying users during deletion of forked repos that data
| will still be available.
|
| The exact UX here is debatable, but I don't think security
| warnings buried in the docs is enough. They should be
| accounting for likely misunderstandings of the model.
| hyperpape wrote:
| Even if it wasn't forked, it could be cloned. Should that
| be part of the warning?
|
| I wouldn't mind a disclaimer when you delete a repository
| that any information that repository ever contained is
| likely to have already been downloaded and stored. Per the
| comment I added, I'm not sure it would really help that
| much, but it would not be harmful.
| jonahx wrote:
| > Should that be part of the warning?
|
| It couldn't hurt, but that isn't the misunderstanding I'm
| worried about.
|
| As described in the first example of the article, you can
| make a fork, commit to it, delete _your entire fork_,
| and yet the data will still be accessible via the parent
| repo, even though no one ever forked or cloned or saw
| your fork. _That_ is not intuitive at all.
|
| You can say "Well just consider any data that has ever
| been public compromised forever", and indeed you should,
| but this behavior is still surprising and could bite devs
| even if they know they should follow the advice in that
| quote.
|
| Consider a situation like this...
|
| Dev forks, accidentally pushes a secret or some
| proprietary code in a commit, and immediately deletes the
| fork. They figure it was only up for a very short time,
| now it's gone, risk someone saw it is low. They don't
| bother rotating, because that would be a major
| operational pain (and yes, it _shouldn't_ be, but for
| many orgs it is).
|
| Is this dev making a mistake? Of course. That's not good
| security thinking. But their assessment of the risk being
| low might actually be correct _if their very reasonable
| mental model of deletion were correct_. But the
| unintuitive way GH works means that the actual risk is
| much higher than their reasoning led them to believe.
| hyperpape wrote:
| > As described in the first example of the article, you
| can make a fork, commit to it, delete your entire fork,
| and yet the data will still be accessible via the parent
| repo, even though no one ever forked or cloned or saw
| your fork. That is not intuitive at all.
|
| But isn't that only the third vulnerability, that private
| forks are implicitly made public?
|
| As I said, I won't defend that decision.
| dogleash wrote:
| > git is based on content addressable storage, so it makes
| sense that anything that is ever public will never
| disappear.
|
| No. That doesn't make sense. It only sounds vaguely plausible
| at first because content addressable storage often means a
| distributed system where hosting nodes are controlled by
| multiple parties. That's not the case here, we're only
| talking about one host.
|
| Imagine we were talking about a (hypothetical) NetFlix CDN
| where it's content addressed rather than by UUID. Would
| anyone say "they forgot to check auth tokens for Frozen for
| one day, therefore it makes sense that everyone can watch it
| for free forever"?
| hyperpape wrote:
| Since Netflix neither allows anonymous users to fully
| download Frozen without DRM, nor allows authorized users to
| upload derivative works that are then redistributed to the
| public, I think there may be some relevant differences
| here.
| debugnik wrote:
| They do remove content when their licence expires,
| though. So imagine instead Netflix allowing users to find
| and watch expired series by hash, then telling the
| copyright owners they can't fully delete the series
| because _something something content-addressing._
| dathinab wrote:
| > For the first two, git is based on content addressable
| storage, so it makes sense that anything that is ever public
| will never disappear.
|
| this isn't quite right
|
| content addressable storage is just a means of access; it does
|
| - not imply content cannot be deleted
|
| - not imply content cannot be access managed
|
| you could apply this to a git repo itself (like making some
| branches private and some not) but more important forks are
| not git ops, they are more high level github ops and could
| very well have appropriate measures to make sure this
| cannot happen
|
| e.g. if github had implemented forks like a `git clone` _none
| of these vulnerabilities would have been a thing_
|
| similarly, implementing different access rights for different
| subsets of fork networks (or even the same git repo)
| technically isn't a problem either (not trivial but quite
| doable)
|
| and I mean commits made to private repositories being public
| is always a security vulnerability no matter how much github
| claims it's intended
| hyperpape wrote:
| You're right that I shouldn't have given the impression
| that content addressed storage means as a technical matter
| that public content must never disappear. The phrasing was
| a bit sloppy. GitHub could, as a technical matter, choose
| to hide content that had previously been made public.
|
| Nonetheless, given that GitHub exists to facilitate
| anonymously pulling the entire history of the repository,
| and given that any forks would contain the full contents of
| that repository, it is very natural that GitHub would take
| the "once public always public" line.
|
| > and I mean commits made to private repositories being
| public is always a security vulnerability no matter how
| much github claims it's intended
|
| I specifically said the third use case was different,
| because it is the one that doesn't involve you explicitly
| choosing to publish the commits that contain your private
| information. I did not and would not defend GitHub on that
| point.
| keybored wrote:
| > For the first two, git is based on content addressable
| storage, so it makes sense that anything that is ever public
| will never disappear.
|
| No one can, with a straight face, say that they don't
| restrict access because "this is just how the technology
| works". Doesn't matter if it is content addressable or an
| append-only FS or whatever else.
|
| Even for some technology where the data lives forever
| somewhere (it doesn't according to Git; GitHub has a system
| which keeps non-transitively referenced commits from being
| garbage collected), the non-crazy thing is to put access
| policy logic behind the raw storage fetch.
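"Access policy logic behind the raw storage fetch" might look roughly like the following. This is a hedged sketch of the idea only: `GuardedStore`, the `PUBLIC` marker and the principal names are all hypothetical, and a real system would tie permissions to repositories and refs rather than to individual objects.

```python
PUBLIC = "*"  # hypothetical marker meaning "anyone may read"


class GuardedStore:
    """Content-addressed store that checks a read policy before fetching."""

    def __init__(self) -> None:
        self._objects = {}  # sha -> bytes
        self._readers = {}  # sha -> set of principals allowed to read

    def put(self, sha: str, data: bytes, readers: set) -> None:
        self._objects[sha] = data
        self._readers.setdefault(sha, set()).update(readers)

    def get(self, sha: str, principal: str) -> bytes:
        allowed = self._readers.get(sha, set())
        if PUBLIC not in allowed and principal not in allowed:
            # Same error for "missing" and "forbidden", so a caller cannot
            # probe for the existence of objects they may not read.
            raise LookupError(sha)
        return self._objects[sha]


store = GuardedStore()
store.put("aaa111", b"public commit", {PUBLIC})
store.put("bbb222", b"private fork commit", {"alice"})

public_read = store.get("aaa111", "mallory")
owner_read = store.get("bbb222", "alice")
try:
    store.get("bbb222", "mallory")
    denied = False
except LookupError:
    denied = True
print(public_read, owner_read, denied)
```

Addressing by hash and authorizing by principal are independent concerns here; knowing `bbb222` is not enough to read it.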
| bladegash wrote:
| Unrelated, but another interesting one is any non-admin
| contributors being able to add (and I believe update) secrets in
| a private repo for use in GH actions. It can't be done via the
| UI, but can be done via the API or VSCode extension.
|
| When I looked into it a while back, apparently it is intended
| behavior, which just seems odd.
| mmsc wrote:
| >This is such an enormous attack vector for all organizations
| that use GitHub that we're introducing a new term: Cross Fork
| Object Reference (CFOR)
|
| Have we stopped naming vulnerabilities cute and fuzzy names and
| started inventing class names instead? Does this have a logo? Has
| this issue been identified anywhere else?
| booi wrote:
| Introducing a new vulnerability... Git Forked(tm)!
|
| chatgpt: Create a logo image of a fork impaling a small gnome
| named "code"
| riiii wrote:
| Much better name.
|
| It's very formally called Cross Fork Object Reference (CFOR).
| But commonly known as Git Forked! (Including the exclamation
| mark).
| agentdrek wrote:
| Clearly a POLA violation (principle of least astonishment)
| hackerbirds wrote:
| Users should never be expected to know these gotchas for a
| feature called "private", documented or not. It's disappointing
| to see GitHub calling it a feature instead of a bug, to me it
| just shows a complete lack of care about security. Privacy
| features should _always_ have a strict, safe default.
|
| In the meantime I'll be calling "private" repos "unlisted", seems
| more appropriate
| chrisandchris wrote:
| Yep, I see GitHub as "public only" hosting, and if I want to
| host something private, I will choose another vendor.
| stvltvs wrote:
| Which vendors work best for private projects?
| tracker1 wrote:
| You could consider GitLab.. though this only seems to
| affect private forks of public repos.
| the8thbit wrote:
| I've used both Bitbucket and Azure in the corporate world.
| OutOfHere wrote:
| The noted issue looks to be applicable to forks only, not to
| all private repos.
| eslaught wrote:
| It also applies to this situation:
|
| 1. Create a private repo R
| 2. Create a private fork F of R
| 3. Push commits to the fork F
| 4. Make R public
|
| The commits pushed to F prior to R being made public will
| become de facto public, even though F has always been a
| private fork. The post makes clear that commits pushed to F
| _after_ R is made public are placed into a separate,
| private fork network.
|
| So basically, if you ever intend to open source anything,
| never do it to an existing private repo. Always start a
| from-scratch repo to be the root of your new public
| project.
| dheera wrote:
| Or commit an ecryptfs.
|
| Clone and mount, unmount and commit
| layer8 wrote:
| > I'll be calling "private" repos "unlisted"
|
| The same for "deleted" repos.
| NullPrefix wrote:
| "deleted" is just a fancy word for "inaccessible to the user"
| renewiltord wrote:
| To fork private, I always just make a new repo and push to it.
| Looks like that behaves correctly here.
| kemitche wrote:
| Agreed. If anything, github should remove the option to change
| a repo from private to public or vice versa. Force creation of
| a new repo with the correct settings.
| LeifCarrotson wrote:
| IMO, the real vulnerability here is the way the Github Events
| archive exposes the SHA1 hashes of the vulnerable repositories.
| It would be easy to trawl the entire network to access these
| deleted/private repositories, but only because they have a list
| of them.
|
| Similar (but less concerning) is the ability to use short SHA1
| hashes. You'd have to either be targeting a particular repository
| (for example, one for which a malicious actor can expect users to
| follow the tutorial and commit API keys or other private data) or
| be targeting a particular individual with a public repository who
| you suspect might have linked private repositories. It's not free
| to guess something like "07f01e", but not hard either.
|
| If these links still worked exactly the same, but (1) you had to
| guess 07f01e8337c1073d2c45bb12d688170fcd44c637 and (2) there was
| no events API with which to look up that value, this would be
| much, much less impactful.
| SnowflakeOnIce wrote:
| 'git clone --mirror' seems to pull down lots of additional
| content also.
| fortran77 wrote:
| This is why for private and business projects, we don't use
| GitHub, we use Amazon CodeCommit.
| makach wrote:
| The article states that this "vulnerability" might exist in
| other scm systems as well
| swozey wrote:
| Because of literally this issue? I'm not sure if you're doing a
| generic "I don't like github" or know for a fact that
| CodeCommit doesn't have issues like this.
|
| This seems like a terrible security vector but I'm not sure
| migrating thousands of repos out of github vs. training
| engineers to keep public and private repos completely separated
| makes sense and you haven't explained why you use CodeCommit.
|
| Unless it is this reason, which like I said, seems a bit heavy
| handed, but I rarely move private repos to public.
|
| I kind of assumed this was a distributed Git problem, not
| Github, but I don't know.
| ajross wrote:
| Most of this report is just noise. GitHub repos are public.
| Public stuff can be shared. Public stuff shared previously and
| then deleted is "still available", but it was _shared previously_
| and not really subject to security analysis.
|
| The one thing they seem to be able to show is that commits in
| _private_ branches show up in the parent repository if you know
| the SHAs. And that seems like a real vulnerability. But AFAICT it
| also requires that you know the commit IDs, which is not
| something you can get via brute forcing the API. You'd have to
| combine this with a secondary hole (like the ability to generate
| a git log, or exploiting a tool that lists its commit via ID in
| its own metadata, etc...).
|
| Not nothing, but not "anyone can access private data on GitHub"
| as advertised.
| LoganDark wrote:
| > it also requires that you know the commit IDs, which is not
| something you can get via brute forcing the API
|
| Well, GitHub accepts abbreviations down to as short as four hex
| digits... as long as there's no collision with another commit,
| that's certainly feasible. Even if there is collision, once you
| have the first four characters you can just do a breadth-first
| search
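|
| A minimal sketch of the breadth-first prefix search described
| above, run against a local set of hashes rather than any real
| API. The `resolve` function is a hypothetical stand-in for
| whatever lookup accepts abbreviated hashes; apart from the
| 07f01e... value quoted earlier in the thread, the hashes are made
| up for illustration.

```python
from itertools import product

HEX = "0123456789abcdef"


def prefix_space(length: int) -> int:
    """Number of distinct hex prefixes of the given length."""
    return 16 ** length


def resolve(prefix: str, known_hashes: set[str]) -> list[str]:
    """Hypothetical lookup: every known hash that starts with `prefix`."""
    return sorted(h for h in known_hashes if h.startswith(prefix))


def breadth_first_search(known_hashes: set[str], start_len: int = 4) -> dict[str, str]:
    """Map each shortest unambiguous prefix to the full hash it identifies."""
    found: dict[str, str] = {}
    queue = ["".join(p) for p in product(HEX, repeat=start_len)]
    while queue:
        next_queue = []
        for prefix in queue:
            matches = resolve(prefix, known_hashes)
            if len(matches) == 1:
                found[prefix] = matches[0]
            elif len(matches) > 1:
                # Ambiguous prefix: try every one-character extension next round.
                next_queue.extend(prefix + c for c in HEX)
        queue = next_queue
    return found


# Illustrative data only: h1 is the hash quoted in the thread, h2 and h3 are invented.
h1 = "07f01e8337c1073d2c45bb12d688170fcd44c637"
h2 = "07f0a" + "0" * 35
h3 = "beef" + "1" * 36
found = breadth_first_search({h1, h2, h3})
```

| Four hex digits give 16^4 = 65,536 starting prefixes, which is
| why the search is cheap; an ambiguous prefix only costs 16 more
| lookups per extra character.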
| beezlewax wrote:
| There's a whole section here about how to brute force the
| hashes. You don't even need the full hash... just a shortened
| version using the first few chars.
| poikroequ wrote:
| Microsoft: It's the EU's fault!
|
| Also Microsoft: It's a feature!
| theragra wrote:
| It was known before Microsoft
| makach wrote:
| A "delete" means it should be gone forever from the service it
| was removed from.
|
| "Private" means it should be available to specific involved
| parties only.
|
| If you implement any other behavior to these concepts you are
| implementing anti patterns.
|
| We need to be precise and consistent in the wording of the
| functions we are providing in order to ensure we easily can
| understand what is going on, without having to interpret
| documentation to be able to fully understand what is going on.
| yread wrote:
| On the positive side this takes care of all those companies
| forking open source software and not contributing back
| kassah wrote:
| In response to the end of the article "it's important to note
| that some of these issues exist on other version control system
| products." I actually have experience helping someone with an
| issue on BitBucket with PII data that you can't rotate.
|
| Once we eliminated the references in the tree and all forks (they
| were all private thankfully), we reached out to BitBucket
| support, and they were able to garbage collect those commits, and
| purge them to the point where even knowing the git hashes they
| were not locatable directly.
| Szpadel wrote:
| even better, you can actually commit to other forks if they
| create a pull request to you.
|
| (there is a checkbox allowing that when you are opening a PR that
| I bet almost no one noticed)
|
| I reported that years ago and all they changed is that they
| extended the documentation about this "feature"
|
| my main issue was that you cannot easily revoke this access
| because the target repo can always reopen the PR and regain write
| access.
|
| but they basically stated "works as intended"
| tamimio wrote:
| I don't use GitHub for anything serious, rather my own Gitea.
| However:
|
| > Any commits made to your private fork after you make the
| "upstream" repository public are not viewable.
|
| Does that mean a private repo that has never been or will be
| public isn't accessible? That scenario wasn't mentioned.
| fedorareis wrote:
| My understanding is that you are correct. If the repo and all
| of its forks stay private then the only people that would be
| able to view them are people who have permissions to access
| those repos.
| josephscott wrote:
| How much help is turning off the "Allow forking" option
| https://docs.github.com/en/repositories/managing-your-reposi... ?
| WhereIsTheTruth wrote:
| Whoever at Github/Microsoft who doesn't want to solve this
| problem should be jailed
| dathinab wrote:
| commits done to private repos being public (point 2&3) is always
| a non minor security vulnerability IMHO
|
| it doesn't matter if it's behaving as intended or how there are
| forks
|
| also point 1 implies that github likely doesn't properly GC
| their git, which could have all kinds of problematic implications
| beyond point 1 wrt. purging accidentally leaked secrets or
| PI....
|
| all in all it just shows github might not take privacy and
| security seriously ... which is kinda hilarious given that private
| repo using customers tend to be the paying customers
| keybored wrote:
| You're right that they don't let commits get GC. They jump
| through hoops in order to keep commits that are not
| transitively referenced from being garbage collected. Just
| assume that every commit is kept around for "auditing".
|
| One GitHub employee even contributed a configuration to Git
| which allows you to do the same thing: run a program or feed a
| file which tells the GC what nodes to not traverse.
| keybored wrote:
| People are so preoccupied with putting the code on GitHub. It's
| like it doesn't exist before it's on GitHub.
|
| If you're not gonna share it then it hardly matters. Use a backup
| drive.
|
| Git is distributed. You don't have to put your dotfiles on
| GitHub. Local is enough.
| JohnMakin wrote:
| Your laptop breaks in a way that your disk cannot be recovered.
| Now what? How often are you backing up your disk? Probably much
| easier to type "git commit" and "git push"
| keybored wrote:
| Am I really gonna get interrogated on HN for talking about
| automatic and redundant backup? Give me a break.
| JohnMakin wrote:
| I wouldn't call the parent comment you're responding to an
| "interrogation" and I'm sorry you perceived it that way.
| You make a pretty extraordinary claim that local disk is
| better than a remote repository for storing/updating code
| for personal work - with no evidence to support this claim
| - so a followup question seems reasonable.
|
| as far as "git is distributed" I don't know if that's the
| case if you keep it purely local, but hey, you seem to have
| it all figured out so good job.
| keybored wrote:
| I thought a person of your background (who no doubt has
| it all figured out) would surmise that I was talking
| about backing up to an external disk and not to another
| disk on the same laptop. And would grant another person
| some good faith and be able to generalize without
| spelling it all out for them: if the point is to back
| things up then maybe I can infer that other means of
| backup are also in the cards, like sneakernet or your own
| server or multiple locations. _Huh_
|
| You can also back up to a remote. That is not GitHub. You
| know because the topic is GitHub and how promiscuous they
| are. Which is why I say: if you don't need your code to
| be "social" you don't need to put it on GitHub.
|
| But even a remote repository is overkill. An automated
| backup plan with git bundle is automatic, after all. Set
| it and forget. And backups are supposed to be automated,
| right? I ask because you have the relevant background
| here.
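|
| A minimal sketch of the kind of automated `git bundle` backup
| mentioned above. The repository path, the destination directory
| and the dated file naming are assumptions made for illustration;
| `git bundle create <file> --all` and `git bundle verify <file>`
| are the only Git commands relied on.

```python
import subprocess
from datetime import date
from pathlib import Path


def bundle_command(repo: Path, dest_dir: Path, today: date) -> list[str]:
    """Build the git command that writes every ref of `repo` into one bundle file.

    Use absolute paths: `git -C` changes directory before the bundle path is resolved.
    """
    bundle = dest_dir / f"{repo.name}-{today.isoformat()}.bundle"
    return ["git", "-C", str(repo), "bundle", "create", str(bundle), "--all"]


def run_backup(repo: Path, dest_dir: Path) -> Path:
    """Create today's bundle, then ask git to verify it."""
    cmd = bundle_command(repo, dest_dir, date.today())
    bundle = Path(cmd[5])
    subprocess.run(cmd, check=True)
    subprocess.run(["git", "-C", str(repo), "bundle", "verify", str(bundle)], check=True)
    return bundle


# Hypothetical paths, shown without running git.
example = bundle_command(Path("/home/me/dotfiles"), Path("/mnt/backup"), date(2024, 7, 24))
```

| Calling `run_backup` from a scheduled job (cron, a systemd timer,
| or similar) would give the "set it and forget" behaviour; the
| resulting file can be cloned from like any other remote.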
| JohnMakin wrote:
| > I thought a person of your background (who no doubt has
| it all figured out) would surmise that I was talking
| about backing up to an external disk and not to another
| disk on the same laptop. And would grant another person
| some good faith and be able to generalize without
| spelling it all out for them: if the point is to back
| things up then maybe I can infer that other means of
| backup are also in the cards, like sneakernet or your own
| server or multiple locations. Huh
|
| Your snark notwithstanding, I actually did understand
| that an external disk resides outside of the laptop, and
| find your claim still fantastic and lacking evidence.
|
| As for the rest of your post, you'll forgive my
| misunderstanding of whatever _deeply_ nuanced point you're
| making here regarding backing up to a remote because
| of this at the end of your original post:
|
| > local is enough.
|
| Anyway, seems like you need to take a break. Someone of
| my background has better things to do than engage in a
| flame war with someone clearly looking for a fight over a
| throwaway post.
| imadj wrote:
| They single out GitHub in the title and throughout the entire
| article, only in the very last line they clarify that these
| issues are actually a common design flaw in version control
| systems and not limited to GitHub:
|
| > Finally, while our research focused on GitHub, it's important
| to note that some of these issues exist on other version control
| system products
|
| For example, Gitlab only recently solved the issue:
| https://gitlab.com/gitlab-org/gitlab/-/issues/408137
| eezing wrote:
| I'm glad I don't use forks
| thih9 wrote:
| Can this be used to host illegal content? I.e.: fork a popular
| repo, commit a pirated book to the fork, delete the fork, use the
| original repo to access the pirated book?
|
| What would github do after receiving a DMCA request in that case?
| er4hn wrote:
| One can safely assume they will find a way to follow the law
| rather than mumble about technically this is working as
| intended.
| lnrd wrote:
| That looks like the kind of loophole that could get GH to do
| something about this.
| arccy wrote:
| they have the ability to do essentially git gc and drop
| unreachable commits
| devinsewell wrote:
| and people have been yelling at me for refusing to ever use
| github since 2013 lolo
| j-pb wrote:
| Commit hashes are essentially capabilities, you should be able to
| access any data that you have a capability for. But allowing
| access via a 16bit prefix is just idiotic, and equivalent to
| accepting just the first two bytes of a 256bit cryptographic
| signature...
| nostrademons wrote:
| Cool, another way to access youtube-dl next time it gets deleted
| from GitHub.
| madewulf wrote:
| In fact, there is a process to request complete removal of data,
| but it involves sending an email that will be reviewed by github
| staff: https://docs.github.com/en/site-policy/content-removal-
| polic...
|
| On the other hand, once an API key or password has been published
| somewhere, you should rotate it anyway.
| riedel wrote:
| I was wondering, how they can otherwise comply with
| legislation. Makes sense there is a way to do this e.g. in case
| of valid GDPR, DMCA, etc. cases.
| midtake wrote:
| Just rebase/squash everything.
| josephcsible wrote:
| How is this more of a vulnerability than the existence of sites
| like archive.org is? Isn't it just a fact of the Internet that
| once you make something public, you can't fully take it back
| later?
| bogwog wrote:
| Because private forks are not meant to be public
| debugnik wrote:
| The third case in the article shows private forks being leaked
| publicly when the upstream goes public.
|
| The other two cases are indeed not worse than third-party
| archival, but they're still socially concerning. When you ask
| your own host to delete something you uploaded, you don't
| expect them to ignore you just because someone could have
| already archived it maybe. Making it harder to find can still
| be valuable; not all archives stay available forever, if any.
| galkk wrote:
| I won't be surprised if "right to be forgotten"/GDPR abusers will
| spam github and force them to act on it, eventually.
|
| ----
|
| This is clearly documented and can be explained even to non-
| technical managers.
|
| From my POV, calling that a vulnerability is trying to build hype.
|
| I think that having a quote from here on the visibility-changing
| settings page would be even clearer:
| https://docs.github.com/en/pull-requests/collaborating-with-...
| crvdgc wrote:
| I think the first two points are a result of private data
| (commit/fork/issue) being able to refer to public data without
| making the reference public.
|
| Say a private commit depends on a public commit C. Suppose in the
| public repo, the branch containing C gets deleted and C is no
| longer reachable from the root. From the public repo's point-of-
| view, C can be garbage-collected, but GitHub must keep it alive,
| otherwise the deletion will break the private commit.
|
| It would be "a spooky action at a distance" from the private
| repo's POV. Since the data was at a time public, the private repo
| could have just backed up everything. In fact, if that's the
| case, everyone _should always_ back up everything. GitHub
| retaining the commit achieves the same effect.
|
| The public repo's owner can't prevent this breakage even if they
| want to, because there's no way to know the existence of this
| dependency.
|
| The security issue discussed in the post is a different scenario,
| where the public repo's owner wants to break the dependency
| (making the commit no longer accessible). That would put too much
| of a risk for anyone to depend on any public code.
|
| My mental model is that all commits ever submitted to GitHub will
| live forever and if it's public at one time, then it will always
| be publicly accessible via its commit hash.
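|
| A toy model of the mental model above, assuming (as GitHub's
| response quoted earlier in the thread describes) that commits
| live in one object store shared by the fork network while refs
| are kept per repository. The commit names and the two ref tables
| are invented for illustration; this is not GitHub's
| implementation.

```python
def reachable(parents: dict[str, list[str]], tips: list[str]) -> set[str]:
    """Every commit reachable from `tips` by following parent links."""
    seen: set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


# Shared object store: A <- B <- C <- P, where P is the private commit built on C.
parents = {"A": [], "B": ["A"], "C": ["B"], "P": ["C"]}

public_refs = {"main": "B"}   # the public branch that contained C was deleted
private_refs = {"main": "P"}  # the private repo still points at P

# Seen from the public repo alone, C looks like garbage.
public_view = reachable(parents, list(public_refs.values()))
collectable_public_only = set(parents) - public_view

# Seen across the whole network, C is still needed by P.
network_view = reachable(parents, list(public_refs.values()) + list(private_refs.values()))
collectable_network = set(parents) - network_view
```

| In this model nothing is collectable while any ref in the network
| still reaches it, which matches the observation that deleting the
| public branch does not make C go away.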
| ahpook wrote:
| Hubber here (same username on github.com). We in GitHub's OSPO
| have been working on an open source GitHub App to address the use
| case where organizations want to keep a private mirror of an
| upstream public fork so they can review code and remove
| IP/secrets/keys that get committed and squash history before any
| of those changes are made public. Getting a beta release this
| week, in fact - check it out, I'm curious what yall think about
| the approach
|
| https://github.com/github-community-projects/private-mirrors
___________________________________________________________________
(page generated 2024-07-24 23:00 UTC)