[HN Gopher] Anyone can access deleted and private repository dat...
       ___________________________________________________________________
        
       Anyone can access deleted and private repository data on GitHub
        
       Author : __0x1__
       Score  : 733 points
       Date   : 2024-07-24 18:24 UTC (4 hours ago)
        
 (HTM) web link (trufflesecurity.com)
 (TXT) w3m dump (trufflesecurity.com)
        
       | TazeTSchnitzel wrote:
       | This is not new. Many people have noticed this before, e.g.
       | https://hikari.noyu.me/blog/2020-05-05-github-private-repos-...
        
       | andersa wrote:
       | I reported this on their HackerOne many years ago (2018 it seems)
       | and they said it was working as intended. Conclusion: don't use
       | private forks. Copy the repository instead.
       | 
       | Here is their full response from back then:
       | 
       | > Thanks for the submission! We have reviewed your report and
       | validated your findings. After internally assessing the finding
       | we have determined it is a known low risk issue. We may make this
       | functionality more strict in the future, but don't have anything
       | to announce now. As a result, this is not eligible for reward
       | under the Bug Bounty program.
       | 
       | > GitHub stores the parent repository along with forks in a
       | "repository network". It is a known behavior that objects from
       | one network member are readable via other network members. Blobs
       | and commits are stored together, while refs are stored separately
       | for each fork. This shared storage model is what allows for pull
       | requests between members of the same network. When a repository's
       | visibility changes (Eg. public->private) we remove it from the
       | network to prevent private commits/blobs from being readable via
       | another network member.
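       | 
       | The storage model described in that response can be sketched as a
       | toy Python model. This is an illustration only, not GitHub's
       | implementation: the class `RepoNetwork` and all of its method names
       | are hypothetical. It shows one shared object store with refs kept
       | per member, and why deleting a fork's refs leaves its objects
       | readable through another member.

```python
import hashlib


class RepoNetwork:
    """Toy model (hypothetical): one shared object store, refs per member."""

    def __init__(self):
        self.objects = {}  # object id -> content, shared by every member
        self.refs = {}     # member name -> {branch name -> object id}

    def add_member(self, name):
        self.refs[name] = {}

    def commit(self, member, branch, content):
        oid = hashlib.sha1(content).hexdigest()
        self.objects[oid] = content      # lands in the shared store
        self.refs[member][branch] = oid  # only this member's ref moves
        return oid

    def delete_member(self, name):
        # Only the refs go away; the shared objects are untouched.
        del self.refs[name]

    def read(self, via_member, oid):
        # Lookup is by object id alone; it never checks whose ref points at it.
        if via_member not in self.refs:
            raise KeyError(f"no such member: {via_member}")
        return self.objects[oid]


network = RepoNetwork()
network.add_member("upstream")
network.add_member("fork")
oid = network.commit("fork", "main", b"secret change\n")
network.delete_member("fork")

# The fork is gone, but its object is still served via the parent.
leaked = network.read("upstream", oid)
```

       | In this sketch, anyone who knows (or guesses) `oid` can read the
       | object via "upstream" even though no ref in "upstream" points to it.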
        
         | tedivm wrote:
         | I reported a different security issue to github, and they
         | responded the same (although they ultimately ended up fixing it
         | when I told them I was going to blog about the "intended
         | behavior").
        
           | myfonj wrote:
           | What "intended behaviour" was that, specifically?
        
         | liendolucas wrote:
         | Honest question. Submitting these types of bugs only to get a:
         | "we have determined it is known low risk issue..." seems like
         | they really don't want to pay for someone else's time and
         | dedication in making their product safer. If they knew about
         | this, was this disclosed somewhere? If not I don't see them
         | playing a fair game. What's the motivation to do this if in the
         | end they can have the final decision to award you or not? To me
         | it looks like similar to what happens with Google Play/Apple
         | store to decide whether or not an app can be
         | uploaded/distributed through them.
         | 
         | Edit: I brought this up because to me it is absolutely miserable
         | for a big company to just say: "Thanks, but we were aware of
         | this".
        
           | kayodelycaon wrote:
           | As the article pointed out, GitHub already publicly
           | documented this vulnerability.
           | 
           | My employer doesn't pay out for known security issues,
           | especially if we have mitigating controls.
           | 
           | A lot of people spam us with vulnerability reports from
           | security tools we already use. At least half of them turn out
           | to be false positives we are already aware of. In my opinion,
           | running a bug bounty program at all is a net negative for us.
           | We aren't large enough to get the attention of anyone
           | competent.
        
             | ipaddr wrote:
             | For both sides it turns into a net negative. Better to keep
             | your bugs and use them when needed or sell them to others
             | to use if possible.
             | 
             | Let's get back to what we had before, when multiple people
             | could find the same bug and exploit it if needed. Now one
             | person finds the bug, it gets patched, and they don't get
             | paid.
        
             | giobox wrote:
             | > As the article pointed out, GitHub already publicly
             | documented this vulnerability.
             | 
             | I'm honestly not yet convinced that is enough here - I've
             | fallen victim to this without realizing it - the behaviour
             | here is so far removed from how I suspect most users'
             | mental model of github.com works. For me none of the
             | exposed data is sensitive, but the point remains I was
             | totally unaware it would be retrievable like this.
             | 
             | If the behaviour flies so against the grain, just
             | publishing it in a help doc is not enough I'd argue. The
             | linked article makes the exact same argument:
             | 
             | > "The average user views the separation of private and
             | public repositories as a security boundary, and
             | understandably believes that any data located in a private
             | repository cannot be accessed by public users.
             | Unfortunately, as we documented above, that is not always
             | true. Whatsmore, the act of deletion implies the
             | destruction of data. As we saw above, deleting a repository
             | or fork does not mean your commit data is actually
             | deleted."
        
               | tptacek wrote:
               | The problem with this line of argument is that the
               | fundamental workings of git are also surprising to
               | people, such that they routinely attempt to address
               | mistaken hazmat commits by simple reverts. If at bottom
               | this whole story is just that git is treacherous, well,
               | yeah, but not news.
               | 
               | There's a deeper problem here, which is that making the
               | UX on hosting sites less surprising doesn't fix the
               | underlying problem. There is a best-practices response to
               | committing hazmat to a repository: revoke the hazmat, so
               | that its disclosure no longer matters. _You have to do this
               | anyways._ If you can't, you should be in contact with
               | Github directly to remove it.
        
               | Cpoll wrote:
               | Is "git" relevant here? Forking isn't a git concept, and
               | none of this behaviour has much to do with git; it's all
               | GitHub.
               | 
               | Also, you can revoke an API key, but you can't revoke a
               | company-proprietary algorithm that you implemented into a
               | fork of a public project.
        
               | tptacek wrote:
               | Like I said: if you can't revoke the thing you committed,
               | you need to get in touch with Github and have them remove
               | it. That's a thing they do.
        
           | jonahx wrote:
           | Not defending GH here (their position is indefensible imo)
           | but, as the article notes, they document these behaviors
           | clearly and publicly:
           | 
           | https://docs.github.com/en/pull-requests/collaborating-
           | with-...
           | 
           | I don't think they're being underhanded exactly... they're
           | just making a terrible decision. Quoting from the article:
           | 
           | > The average user views the separation of private and public
           | repositories as a security boundary, and understandably
           | believes that any data located in a private repository cannot
           | be accessed by public users. Unfortunately, as we documented
           | above, that is not always true. Whatsmore, the act of
           | deletion implies the destruction of data. As we saw above,
           | deleting a repository or fork does not mean your commit data
           | is actually deleted.
        
             | andersa wrote:
             | Based on some (admittedly not very thorough) search, this
             | documentation was posted in 2021, three years after my
             | report.
        
               | YetAnotherNick wrote:
               | But that would still mean they didn't intend to fix it,
               | hence not giving a bounty is fair.
        
               | malfist wrote:
               | It's a bug bounty, not a "only if we have time to fix it"
               | bounty.
               | 
               | He found a security problem, they decided not to act on
               | it, but it was still an acknowledged security problem
        
               | madeofpalk wrote:
               | The point of a bug bounty is for companies to find new
               | security problems.
               | 
               | If the (class of) problem is already known, it's not
               | worth rewarding.
        
               | berdario wrote:
               | I can see this argument making a bit of sense, but if
               | they documented this 3 years after the issue was
               | reported, they don't have a way to demonstrate that they
               | truly already knew.
               | 
               | At the end it boils down to: is Github being honest and
               | fair in answering the bug bounty reports?
               | 
               | If you think it is, cool.
               | 
               | If you don't, maybe it's not worth playing ball with
               | Github's bug bounty process
        
               | tptacek wrote:
               | It doesn't matter if they knew. If they don't deem it a
               | security vulnerability --- and they have put their money
               | where their mouth is, by _documenting it as part of the
               | platform behavior_ --- it's not eligible for a payout.
               | It can be a bug, but if it's not the kind of bug the
               | bounty program is designed to address, it's not getting
               | paid out. The incentives you create by paying for every
               | random non-vulnerability are really bad.
               | 
               | The subtext of this thread is that companies should
               | reward any research that turns up surprising or user-
               | hostile behavior in products. It's good to want things.
               | But that is not the point of a security bug bounty.
        
               | cycomanic wrote:
               | I would argue that even if the behaviour was as intended,
               | at least the fact that it was not documented was a bug
               | (and a pretty serious one at that).
        
               | andrewinardeer wrote:
               | If a renowned company won't pay a bug bounty, a foreign
               | government often will.
        
               | madeofpalk wrote:
               | Why would a foreign government pay for a commonly known
               | security limitation of a product?
        
               | prepend wrote:
               | Good luck selling this to a foreign (or domestic)
               | government. It doesn't seem valuable to me, but who
               | knows, maybe someone finds it worth payout.
        
               | coldtea wrote:
               | > _It's a bug bounty, not a "only if we have time to fix
               | it" bounty_
               | 
               | It's only a bug if it's not intended
        
             | jowea wrote:
             | Shouldn't that be on the config page for the repo below the
             | "private" button with a note saying private is not actually
             | private if it's a fork? And ditto for delete?
        
           | 93po wrote:
           | companies vary wildly in their honesty and cooperation with
           | bug bounties and develop reputations as a result. if they
           | have a shit reputation, people stop doing free work for them
           | and instead focus on more honest companies
        
           | andersa wrote:
           | I didn't find anything mentioning it online at the time. But
           | there wasn't much time and dedication involved either, to be
           | fair. I discovered it completely on accident when I combined
           | a commit hash from my local client with the wrong repository
           | url and it ended up working.
        
           | cyrnel wrote:
           | Security disclosures are like giving someone an unsolicited
           | gift. The receiver is obligated to return the favor.
           | 
           | But if you buy someone non-refundable tickets to a concert
           | they already have tickets for, you aren't owed compensation.
        
           | nyrikki wrote:
           | For moral reasons, historically I never wrote POCs or
           | threatened disclosure.
           | 
           | For companies like Microsoft, whose security culture a CSRB
           | audit found 'inadequate', the risk of disclosure with a POC
           | is about the only tool we have to enforce their side of the
           | Shared Responsibility Model.
           | 
           | Even the largest IT spender in the world, the US government
           | has moved more from the carrot to the stick model. If they
           | have to do it so do we.
           | 
           | Unfortunately as publishing a 'bad practices' list by us
           | doesn't invoke the risk of EULA busting gross negligence
           | claims, responsible disclosure is one of the few tools we
           | have.
        
           | tptacek wrote:
           | No large company running a bug bounty cares one iota about
           | stiffing you on a bounty payment. The teams running these
           | programs are internally incentivized to _maximize_ payouts;
           | the payouts are evidence that the system is working. If
           | you're denied a payment --- for a large company, at least ---
           | there's something else going on.
           | 
           | The thing to keep in mind is that large-scale bug bounty
           | programs make their own incentive weather. People game the
           | hell out of them. If you ack and fix sev:info bugs, people
           | submit _lots_ more sev:info bugs, and now your security
           | program has been reoriented around the dumbest bugs --- the
           | opposite of what you want a bounty program to do.
        
           | hluska wrote:
           | The issue had been reported at least twice and was clearly
           | documented. GitHub knew about this and had known for years.
           | Their replies to the two notifications were even very
           | similar.
           | 
           | GitHub clearly knew. Would you prefer that a vendor lie?
        
         | kayodelycaon wrote:
         | What does "private fork" mean in this context? I created a fork
         | of a project by cloning it to my own machine and set origin to
         | an empty private repository on GitHub. I manually merge
         | upstream changes on my machine.
         | 
         | Is my repository accessible?
        
           | andersa wrote:
           | No, that would be the "copy the repository" approach. Private
           | fork is when you do it through their UI.
           | 
           | As far as I know, it is not accessible.
        
           | swozey wrote:
           | Because you never git pushed to the fork it's not aware of
           | your repo, you're ok.
           | 
           | What I don't know is if in 3 months you DO set your remote
           | origin to that fork to for instance, pull upstream patches
           | into your private repo, you're still not pushing, only
           | pulling, so I would THINK they'd still never get your
           | changes, but I don't know if git does some sort of log sync
           | when you do a pull as well.
           | 
           | Maybe that would wind up having the commit hash available.
        
           | masklinn wrote:
           | It's not. The feature here works because a network of forks
           | known by GitHub has a unified storage, that's what makes
           | things like PRs work transparently and keep working if you
           | delete the fork (kinda, it closes the PR but the contents
           | don't change).
        
           | dathinab wrote:
           | then it's fine
           | 
           | the issue is the `fork` mechanism of github is not
           | semantically like a `git clone`
           | 
           | it's more like creating a larger git repo in which all forks,
           | whether private or not, are contained and which doesn't
           | properly implement access management (at least points 2&3
           | wouldn't be an issue if they did)
           | 
           | there are also some implications from point 1 that forks do
           | in some way interfere with gc-ing orphan commits (e.g. the
           | non-synced commits in the deleted repo in point 1); at least
           | that should be a bug IMHO, one which also costs them storage
           | 
           | (also to be clear for me 2&3 are security vulnerabilities no
           | matter if they are classified as intended behavior)
        
         | jeremyjh wrote:
         | It would not even be that hard to fix it; private forks should
         | always just be automatically copied on first write. You might
         | lose your little link to the original repo, but that's not as
         | bad as unintentionally exposing all your future content.
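         | 
         | The copy-on-first-write idea above can be sketched in a few lines
         | of Python. This is a hypothetical illustration, not how GitHub
         | stores forks: `Store` and `PrivateFork` are made-up names, and a
         | real system would copy packfiles rather than a dictionary.

```python
class Store:
    """Hypothetical object store: object id -> content."""

    def __init__(self, objects=None):
        self.objects = dict(objects or {})


class PrivateFork:
    """Shares the parent's store until the first write, then detaches."""

    def __init__(self, parent_store):
        self.store = parent_store
        self.detached = False

    def write(self, oid, content):
        if not self.detached:
            # First write: take a private copy so new objects never
            # land in the store that the parent network can read.
            self.store = Store(self.store.objects)
            self.detached = True
        self.store.objects[oid] = content


parent = Store({"a1": b"public base commit"})
fork = PrivateFork(parent)
fork.write("b2", b"private work")
```

         | After the write, the fork still sees the base object, but the
         | private object exists only in the fork's own copy. The cost is
         | the lost link to the original network, as the comment notes.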
        
           | sundalia wrote:
           | Yup, we can close the thread and ack that GitHub does not
           | care.
        
         | fullstackchris wrote:
         | To be fair, in the true git sense, if a "fork" is really just a
         | branch, deleting the original completely would also mean
         | deleting every branch (fork) completely
         | 
         | obviously not a fan of this policy though
        
         | SnowflakeOnIce wrote:
         | There seems to be no such thing as a "private fork" on GitHub
         | in 2024 [1]:
         | 
         | > A fork is a new repository that shares code and visibility
         | settings with the upstream repository. All forks of public
         | repositories are public. You cannot change the visibility of a
         | fork.
         | 
         | [1] https://docs.github.com/en/pull-requests/collaborating-
         | with-...
        
           | Manuel_D wrote:
           | Not through the GitHub interface, no. But you can copy all
           | files in a repository and create a new repository. IIRC
           | there's a way to retain the history via this process as well.
        
             | make3 wrote:
             | That's not the GitHub concept / almost trademark of "fork"
             | anymore though, which is what your parent was talking about
        
             | a1o wrote:
             | I mean it's git, just git init, git remote add for origin
             | and upstream, origin pointing to your private, git fetch
             | upstream, git push to origin.
        
             | mckn1ght wrote:
             | You can create a private repository on GitHub, clone it
             | locally, add the repo being "forked" from as a separate git
             | remote (I usually call this one "upstream" and my "fork",
             | well, "fork"), fetch and pull from upstream, then push to
             | fork.
        
             | shkkmo wrote:
             | All you should have to do is just clone the repo locally
             | and then create a blank GitHub repository, set it as the/a
             | remote and push to it.
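             | 
             | The clone-then-push approach described in these replies can
             | be written out as a small Python helper that only builds the
             | git command lines (it does not run them). The function name,
             | the URLs, and the `project` directory are placeholders, and
             | it assumes the empty private repository already exists.

```python
import shlex


def copy_repo_commands(upstream_url, private_url, workdir="project"):
    """Build the git commands for a manual, unlinked "fork".

    Returns a list of argv lists; nothing is executed here.
    """
    git = ["git", "-C", workdir]
    return [
        # 1. Plain clone: no GitHub fork relationship is created.
        ["git", "clone", upstream_url, workdir],
        # 2. Keep the original around as "upstream" for later fetches.
        git + ["remote", "rename", "origin", "upstream"],
        # 3. Point "origin" at the empty private repository.
        git + ["remote", "add", "origin", private_url],
        # 4. Push the current branch and track it on the private remote.
        git + ["push", "-u", "origin", "HEAD"],
    ]


# Placeholder URLs, for illustration only.
commands = copy_repo_commands(
    "https://example.com/upstream/project.git",
    "git@example.com:me/private-project.git",
)
for argv in commands:
    print(shlex.join(argv))
```

             | Later upstream changes would be pulled in with a fetch from
             | "upstream" and a local merge, then pushed to "origin".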
        
             | JyB wrote:
             | That's beside the point. The article is specifically about
             | << GitHub forks >> and their shortcomings. It's unrelated
             | to pushing to distinct repositories not magically 'linked'
             | by the GH << fork feature >>.
        
       | einpoklum wrote:
       | Data that you place with an entity that is a large organization
       | with many commercial and government ties - must be assumed to be
       | accessible to some of those parties.
       | 
       | And if that entity has a complex system of storage and retrieval
       | of data by and for many users, that changes frequently, without
       | public scrutiny - it should be assumed that data breaches are
       | likely to occur.
       | 
       | So I don't see it as very problematic that GitHub's private
       | repositories, or deleted repositories, are only kind-sorta-
       | sometimes private and deleted.
       | 
       | And it's silly that the article refers to one creating an
       | "internal version" of a repository - on GitHub....
       | 
       | Still, interesting to know about the network-of-repositories
       | concept.
        
       | cxr wrote:
       | > The implication here is that any code committed to a public
       | repository may be accessible forever
       | 
       | That's exactly how you should treat anything made available to
       | the public (and there's no need for the subsequent qualifier that
       | appears in the article--"_as long as there is at least one fork
       | of that repository_").
        
         | ilikehurdles wrote:
         | Sometimes I wonder if all the security features GitHub slathers
         | on top of `git` lull people into a false sense of security when
         | fundamentally they're working in a fully distributed version
         | control system with no centralized authority. If your key is
         | leaked the solution is to invalidate the key not just
         | synthetically alter your version of history to pretend it never
         | happened.
        
           | b800h wrote:
           | This is more of a problem if you leak private information
           | with a commit by accident. You can't really revoke that.
        
             | kemitche wrote:
             | You can't reach out to any machines that have pulled down
             | that commit and forcibly delete it, either.
        
       | miguelaeh wrote:
       | Wow. This is wild!
        
       | haneul wrote:
       | Does any variant of this apply to DMCA'd repos in the repo
       | network?
       | 
       | For example if the root repo is DMCA'd, or, if repo B forks repo
       | A, then B adds some stuff that causes B to get DMCA'd. Can A
       | still access B?
        
         | richbell wrote:
         | I believe the entire network is suspended.
        
           | haneul wrote:
           | A downstream dmca suspends the upstream? That astonishes me.
           | Anyone down to shut down react?
        
       | lilyball wrote:
       | Really the only semi-interesting part of this is "if you make a
       | private repo public, data from other private forks might be
       | discoverable", but even that seems pretty minor, and the best
       | practice for taking private repos public is to copy the data into
       | a new repo anyway.
        
         | zelphirkalt wrote:
         | Is that a best practice in hindsight, or because it was known
         | to some, that this issue exists, or for what other reason do
         | you consider it a best practice? Git history?
        
           | lilyball wrote:
           | When making a private repo public, there's a high chance that
           | there was stuff in the private repo that isn't necessarily ok
           | to make public. It's a lot easier to just create a new public
           | repo containing all the data you want to make public than it
           | is to reliably scrub a private repo of any data that
           | shouldn't be there.
           | 
           | More generally, you probably want to construct a new history
           | for the public repo anyway, so you'll want a brand new repo
           | to ensure none of the scrubbed history is accessible.
        
         | xmodem wrote:
         | Even after a private repo is made public, it's common practice
         | for new functionality to be worked on in private until it's
         | ready.
        
         | HL33tibCe7 wrote:
         | You've completely missed the most dangerous thing mentioned,
         | namely that private forks are not private.
        
       | hmottestad wrote:
       | The biggest gotcha here is probably that if you start off with a
       | private repo and a private fork, making the repo public also
       | makes the fork "public".
       | 
       | GitHub may very well say that this is working as intended, but if
       | it truly is then you should be forced to make both the repo and
       | fork public at the same time.
       | 
       | Essentially: "Making repo R public will make the following forks
       | public as well: 'My Fork', 'Super secret fork', 'Fork that I
       | deleted because it contained the password to my neighbour's wifi
       | :P'."
       | 
       | OK. I'm not sure if the last one would actually be public, but I
       | wouldn't be surprised if that was "Working as intended(TM)" -
       | GitHub SecOps
        
         | pants2 wrote:
         | Any time you make a private repo public it's best to just copy
         | that code into a new public repo and leave the private repo
         | private. Otherwise you have to audit every previous commit and
         | every commit on every fork of your private code.
        
           | umpalumpaaa wrote:
           | If I understand the issue correctly if you make the original
           | repo public any private forks from other users are also
           | effectively public. Right?
        
         | kemitche wrote:
         | I agree. The other cases may be mildly surprising, but
         | ultimately fall firmly into the category of "once public on the
         | internet, always public." Deleting a repo or fork or commit
         | doesn't revoke an access key that was accidentally committed,
         | and an access key being public for even a microsecond should be
         | assumed to have been scraped and usable by a malicious actor.
        
       | rvz wrote:
       | Come on, this is not surprising.
       | 
       | "Private repositories" were never private as I said before. [0]
       | 
       | [0] https://news.ycombinator.com/item?id=23057769
        
         | qual wrote:
         | > _Come on, this is not surprising._
         | 
         | Very cool that it is not surprising to you.
         | 
         | But to others (some are even in this thread!) it is both new
         | and surprising. They unfortunately missed your 4 year old
         | comment, but at least they get to learn it now.
        
       | londons_explore wrote:
       | This isn't a bug IMO.
       | 
       | If you know the hash of some data, then you either already have
       | the data yourself, or you learned the hash from someone who had
       | the data.
       | 
       | If you already have the data, there is no vulnerability - since
       | you cannot learn anything you don't already have.
       | 
       | If you got the hash from someone, you could likewise have gotten
       | the data from them.
       | 
       | People do need to be aware that 'some random hex string' in fact
       | is the irrevocable key to all the data behind that hash - but
       | that's kinda inherent to git's design. Just like I don't tell
       | everyone here on HN my login password - the password itself isn't
       | sensitive, but both of us know it accesses other things that are.
       | 
       | If github itself was leaking the hash of deleted data, or my
       | plaintext password, then _that_ would be a vulnerability.
        
         | jkaptur wrote:
         | That's counterintuitive, though - often, the whole point of a
         | hash is that it's one-way.
        
         | haneul wrote:
         | > If you know the hash of some data, then you either already
         | have the data yourself, or you learned the hash from someone
         | who had the data.
         | 
         | Don't think so - the article mentions you can use the short
         | prefix on GitHub, so you have a search space of 65536.
        
         | qual wrote:
         | > _If you know the hash of some data, then you either already
         | have the data yourself, or you learned the hash from someone
         | who had the data._
         | 
         | From the article, you do not need to have the data nor learn
         | the hash from someone who had the data.
         | 
         | > _Commit hashes can be brute forced through GitHub's UI,
         | particularly because the git protocol permits the use of short
         | SHA-1 values when referencing a commit. A short SHA-1 value is
         | the minimum number of characters required to avoid a collision
         | with another commit hash, with an absolute minimum of 4. The
         | keyspace of all 4 character SHA-1 values is 65,536_
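         | 
         | The keyspace figure quoted from the article is plain arithmetic:
         | 16 possible hex digits in each of 4 positions. A short Python
         | check, where the helper name `short_prefixes` is made up for
         | illustration, enumerates every 4-character prefix and compares
         | the count with the full 40-character SHA-1 space.

```python
from itertools import product

HEX_DIGITS = "0123456789abcdef"


def short_prefixes(length=4):
    """Return every possible hex prefix of the given length."""
    return ["".join(p) for p in product(HEX_DIGITS, repeat=length)]


prefixes = short_prefixes(4)
short_keyspace = len(prefixes)  # 16 ** 4
full_keyspace = 16 ** 40        # every full SHA-1 value, equal to 2 ** 160

print(short_keyspace)
print(full_keyspace // short_keyspace)  # how much larger the full space is
```

         | The gap between the two numbers is why accepting short prefixes
         | matters so much more than accepting full hashes.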
        
           | londons_explore wrote:
           | In which case, yeah, that's a vulnerability. They shouldn't
           | allow a short hash to match up against anything but public
           | data.
        
             | gus_massa wrote:
             | It's common to use short hashes in pull requests, and then
             | modify or rebase the commits.
             | 
             | The solutions are:
             | 
             | * Force people to use the full hash.
             | 
             | * Get used to a lot of dead links.
             | 
             | * Claim that it's a feature, not a bug.
        
               | guipsp wrote:
               | * Force people to use the full hash for commits pushed
               | from now on?
        
         | Aurornis wrote:
         | > If you know the hash of some data, then you either already
         | have the data yourself, or you learned the hash from someone
         | who had the data.
         | 
         | You need to read to the end of the article where they show the
         | brute-force way of getting the hashes.
        
         | refulgentis wrote:
         | Read TFA.
        
       | jonahx wrote:
       | Surprised at the comments minimizing this.
       | 
       | I've used github for a long time, would not have expected these
       | results, and was unnerved by them.
       | 
       | I'd recommend reading the article yourself. It does a good job
       | explaining the vulnerabilities.
        
         | hyperpape wrote:
         | For the first two, git is based on content addressable storage,
         | so it makes sense that anything that is ever public will never
         | disappear.
         | 
         | I can sympathize with someone who gets bit by it, as it might
         | not have occurred to them, but it's part of the model.
         | 
         | The third strikes me as counter-intuitive and hard to reason
         | about.
         | 
         | P.S. If you publish your keys or access tokens for well known
         | services to GitHub and you are prominent enough, they will be
         | found and exploited in minutes. The idea that deleting the
         | repository is a security measure is not really worth taking
         | seriously.
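         | 
         | On the content-addressable storage point above: git names a blob
         | by the SHA-1 of a short header plus the file content, so the same
         | bytes always get the same id no matter which repository holds
         | them. A minimal Python sketch of that calculation (the function
         | name `git_blob_id` is made up here; `git hash-object` is the real
         | command it mirrors):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id git assigns to a blob with this content."""
    # Git hashes "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


empty_id = git_blob_id(b"")
hello_id = git_blob_id(b"hello world\n")

print(empty_id)
print(hello_id)
```

         | Because the id depends only on the content, knowing an id is
         | enough to ask any store holding that object to return it.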
        
           | jonahx wrote:
           | I agree the 3rd is by far the worst of the offenders. But
           | even the first two should have more visibility. For example,
           | by notifying users during deletion of forked repos that data
           | will still be available.
           | 
           | The exact UX here is debatable, but I don't think security
           | warnings buried in the docs are enough. They should be
           | accounting for likely misunderstandings of the model.
        
             | hyperpape wrote:
             | Even if it wasn't forked, it could be cloned. Should that
             | be part of the warning?
             | 
             | I wouldn't mind a disclaimer when you delete a repository
             | that any information that repository ever contained is
             | likely to have already been downloaded and stored. Per the
             | comment I added, I'm not sure it would really help that
             | much, but it would not be harmful.
        
               | jonahx wrote:
               | > Should that be part of the warning?
               | 
               | It couldn't hurt, but that isn't the misunderstanding I'm
               | worried about.
               | 
               | As described in the first example of the article, you can
               | make a fork, commit to it, delete _your entire fork_,
               | and yet the data will still be accessible via the parent
               | repo, even though no one ever forked or cloned or saw
               | your fork. _That_ is not intuitive at all.
               | 
               | You can say "Well just consider any data that has ever
               | been public compromised forever", and indeed you should,
               | but this behavior is still surprising and could bite devs
               | even if they know they should follow the advice in that
               | quote.
               | 
               | Consider a situation like this...
               | 
               | Dev forks, accidentally pushes a secret or some
               | proprietary code in a commit, and immediately deletes the
               | fork. They figure it was only up for a very short time,
               | now it's gone, and the risk that someone saw it is low.
               | They don't bother rotating, because that would be a major
               | operational pain (and yes, it _shouldn't_ be, but for
               | many orgs it is).
               | 
               | Is this dev making a mistake? Of course. That's not good
               | security thinking. But their assessment of the risk being
               | low might actually be correct _if their very reasonable
               | mental model of deletion were correct_. But the
               | unintuitive way GH works means that the actual risk is
               | much higher than their reasoning led them to believe.
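The fork-then-delete scenario described in this comment can be modelled with a toy example. This is a minimal sketch, not GitHub's implementation: it assumes one object store shared by every member of a fork network, with refs kept per repository, and the `ForkNetwork` class and its method names are hypothetical.

```python
import hashlib


class ForkNetwork:
    """Toy model: one shared object store, separate refs per repository."""

    def __init__(self):
        self.objects = {}  # content hash -> content, shared by all members
        self.refs = {}     # repository name -> {branch name: content hash}

    def add_repo(self, name):
        self.refs[name] = {}

    def push(self, repo, branch, content):
        # Content-addressed: the key is derived from the bytes themselves.
        digest = hashlib.sha1(content).hexdigest()
        self.objects[digest] = content
        self.refs[repo][branch] = digest
        return digest

    def delete_repo(self, name):
        # Only the refs disappear; nothing collects the shared objects.
        del self.refs[name]

    def read(self, via_repo, digest):
        # Any surviving member of the network can serve any stored object.
        if via_repo not in self.refs:
            raise KeyError(f"no such repository: {via_repo}")
        return self.objects.get(digest)
```

In this model, pushing to a repository named `fork`, deleting it, and then calling `read("upstream", digest)` still returns the pushed bytes, because deletion never touches `objects`.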
        
               | hyperpape wrote:
               | > As described in the first example of the article, you
               | can make a fork, commit to it, delete your entire fork,
               | and yet the data will still be accessible via the parent
               | repo, even though no one ever forked or cloned or saw
               | your fork. That is not intuitive at all.
               | 
               | But isn't that only the third vulnerability, that private
               | forks are implicitly made public?
               | 
               | As I said, I won't defend that decision.
        
           | dogleash wrote:
           | > git is based on content addressable storage, so it makes
           | sense that anything that is ever public will never
           | disappear.
           | 
           | No. That doesn't make sense. It only sounds vaguely plausible
           | at first because content addressable storage often means a
           | distributed system where hosting nodes are controlled by
           | multiple parties. That's not the case here, we're only
           | talking about one host.
           | 
           | Imagine we were talking about a (hypothetical) NetFlix CDN
           | where it's content addressed rather than by UUID. Would
           | anyone say "they forgot to check auth tokens for Frozen for
           | one day, therefore it makes sense that everyone can watch it
           | for free forever"?
        
             | hyperpape wrote:
             | Since Netflix neither allows anonymous users to fully
             | download Frozen without DRM, nor allows authorized users to
             | upload derivative works that are then redistributed to the
             | public, I think there may be some relevant differences
             | here.
        
               | debugnik wrote:
               | They do remove content when their licence expires,
               | though. So imagine instead Netflix allowing users to find
               | and watch expired series by hash, then telling the
               | copyright owners they can't fully delete the series
               | because _something something content-addressing._
        
           | dathinab wrote:
           | > For the first two, git is based on content addressable
           | storage, so it makes sense that anything that is ever public
           | will never disappear.
           | 
           | this isn't quite right
           | 
           | content addressable storage is just a means of access; it does
           | 
           | - not imply content cannot be deleted
           | 
           | - not imply content cannot be access managed
           | 
           | you could apply this to a git repo itself (like making some
           | branches private and some not), but more importantly, forks
           | are not git ops; they are higher-level GitHub ops and could
           | very well have appropriate measures to make sure this
           | cannot happen
           | 
           | e.g. if github had implemented forks like a `git clone`, _none
           | of these vulnerabilities would have been a thing_
           | 
           | similarly, implementing different access rights for different
           | subsets of fork networks (or even the same git repo)
           | technically isn't a problem either (not trivial but quite
           | doable)
           | 
           | and I mean commits made to private repositories being public
           | is always a security vulnerability no matter how much github
           | claims it's intended
        
             | hyperpape wrote:
             | You're right that I shouldn't have given the impression
             | that content addressed storage means as a technical matter
             | that public content must never disappear. The phrasing was
             | a bit sloppy. GitHub could, as a technical matter, choose
             | to hide content that had previously been made public.
             | 
             | Nonetheless, given that GitHub exists to facilitate both
             | anonymously pulling the entire history of the repository,
             | and given that any forks would contain the full contents of
             | that repository, it is very natural that GitHub would take
             | the "once public always public" line.
             | 
             | > and I mean commits made to private repositories being
             | public is always a security vulnerability no matter how
             | much github claims it's intended
             | 
             | I specifically said the third use case was different,
             | because it is the one that doesn't involve you explicitly
             | choosing to publish the commits that contain your private
             | information. I did not and would not defend GitHub on that
             | point.
        
           | keybored wrote:
           | > For the first two, git is based on content addressable
           | storage, so it makes sense that anything that is ever public
           | will never disappear.
           | 
           | No one can, with a straight face, say that they don't
           | restrict access because "this is just how the technology
           | works". Doesn't matter if it is content addressable or an
           | append-only FS or whatever else.
           | 
           | Even for some technology where the data lives forever
           | somewhere (it doesn't according to Git; GitHub has a system
           | which keeps non-transitively referenced commits from being
           | garbage collected), the non-crazy thing is to put access
           | policy logic behind the raw storage fetch.
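One way to read "access policy logic behind the raw storage fetch" is a reachability check in front of the object store. The sketch below is hypothetical (the `Store` class, its fields, and its method names are invented for illustration) and only shows the idea: an object is served through a repository only if it is reachable from one of that repository's own refs.

```python
class Store:
    """Toy commit store with a per-repository reachability check on fetch."""

    def __init__(self):
        self.parents = {}  # commit id -> list of parent commit ids
        self.refs = {}     # repository name -> set of branch-tip commit ids

    def add_commit(self, commit_id, parents=()):
        self.parents[commit_id] = list(parents)

    def reachable(self, repo):
        # Walk the history graph starting from the repository's own tips.
        seen = set()
        stack = list(self.refs.get(repo, ()))
        while stack:
            commit = stack.pop()
            if commit in seen:
                continue
            seen.add(commit)
            stack.extend(self.parents.get(commit, ()))
        return seen

    def fetch(self, repo, commit_id):
        # Policy check first, raw storage lookup second.
        if commit_id not in self.reachable(repo):
            raise PermissionError(f"{commit_id} is not reachable from {repo}")
        return commit_id  # stand-in for returning the object's content
```

With a shared commit `a`, a public tip `b` and a private tip `s`, `fetch("upstream", "s")` raises `PermissionError` while `fetch("fork", "a")` succeeds, even though all three commits sit in the same store.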
        
       | bladegash wrote:
       | Unrelated, but another interesting one is any non-admin
       | contributors being able to add (and I believe update) secrets in
       | a private repo for use in GH actions. It can't be done via the
       | UI, but can be done via the API or VSCode extension.
       | 
       | When I looked into it a while back, apparently it is intended
       | behavior, which just seems odd.
        
       | mmsc wrote:
       | >This is such an enormous attack vector for all organizations
       | that use GitHub that we're introducing a new term: Cross Fork
       | Object Reference (CFOR)
       | 
       | Have we stopped giving vulnerabilities cute and fuzzy names and
       | started inventing class names instead? Does this have a logo? Has
       | this issue been identified anywhere else?
        
         | booi wrote:
         | Introducing a new vulnerability... Git Forked(tm)!
         | 
         | chatgpt: Create a logo image of a fork impaling a small gnome
         | named "code"
        
           | riiii wrote:
           | Much better name.
           | 
           | It's very formally called Cross Fork Object Reference (CFOR).
           | But commonly known as Git Forked! (Including the exclamation
           | mark).
        
       | agentdrek wrote:
       | Clearly a POLA violation (principle of least astonishment)
        
       | hackerbirds wrote:
       | Users should never be expected to know these gotchas for a
       | feature called "private", documented or not. It's disappointing
       | to see GitHub calling it a feature instead of a bug, to me it
       | just shows a complete lack of care about security. Privacy
       | features should _always_ have a strict, safe default.
       | 
       | In the meantime I'll be calling "private" repos "unlisted", seems
       | more appropriate
        
         | chrisandchris wrote:
         | Yep, I see GitHub as "public only" hosting, and if I want to
         | host something private, I will choose another vendor.
        
           | stvltvs wrote:
           | Which vendors work best for private projects?
        
             | tracker1 wrote:
             | You could consider GitLab.. though this only seems to
             | affect private forks of public repos.
        
             | the8thbit wrote:
             | I've used both Bitbucket and Azure in the corporate world.
        
           | OutOfHere wrote:
           | The noted issue looks to be applicable to forks only, not to
           | all private repos.
        
             | eslaught wrote:
             | It also applies to this situation:
             | 
             |     1. Create a private repo R
             |     2. Create a private fork F of R
             |     3. Push commits to the fork F
             |     4. Make R public
             | 
             | The commits pushed to F prior to R being made public will
             | become de facto public, even though F has always been a
             | private fork. The post makes clear that commits pushed to F
             | _after_ R is made public are placed into a separate,
             | private fork network.
             | 
             | So basically, if you ever intend to open source anything,
             | never do it to an existing private repo. Always start a
             | from-scratch repo to be the root of your new public
             | project.
        
           | dheera wrote:
           | Or commit an ecryptfs.
           | 
           | Clone and mount, unmount and commit
        
         | layer8 wrote:
         | > I'll be calling "private" repos "unlisted"
         | 
         | The same for "deleted" repos.
        
           | NullPrefix wrote:
           | "deleted" is just a fancy word for "inaccessible to the user"
        
       | renewiltord wrote:
       | To fork private, I always just make a new repo and push to it.
       | Looks like that behaves correctly here.
        
         | kemitche wrote:
         | Agreed. If anything, github should remove the option to change
         | a repo from private to public or vice versa. Force creation of
         | a new repo with the correct settings.
        
       | LeifCarrotson wrote:
       | IMO, the real vulnerability here is the way the Github Events
       | archive exposes the SHA1 hashes of the vulnerable repositories.
       | It would be easy to trawl the entire network to access these
       | deleted/private repositories, but only because they have a list
       | of them.
       | 
       | Similar (but less concerning) is the ability to use short SHA1
       | hashes. You'd have to either be targeting a particular repository
       | (for example, one for which a malicious actor can expect users to
       | follow the tutorial and commit API keys or other private data) or
       | be targeting a particular individual with a public repository who
       | you suspect might have linked private repositories. It's not free
       | to guess something like "07f01e", but not hard either.
       | 
       | If these links still worked exactly the same, but (1) you had to
       | guess 07f01e8337c1073d2c45bb12d688170fcd44c637 and (2) there was
       | no events API with which to look up that value, this would be
       | much, much less impactful.
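The difference between guessing a short prefix and guessing a full hash can be made concrete with a little arithmetic. This is a rough sketch that only counts candidates; it assumes guesses are tried in random order without repeats and says nothing about rate limits or how many objects actually exist.

```python
def candidates(hex_digits):
    """Number of distinct hex strings of the given length."""
    return 16 ** hex_digits


def expected_probes(hex_digits):
    """Average number of guesses needed to hit one specific prefix,
    trying every candidate once in random order."""
    return (candidates(hex_digits) + 1) / 2


# 4 and 6 hex digits are the abbreviated forms; 40 is a full SHA-1.
for n in (4, 6, 40):
    print(n, candidates(n), expected_probes(n))
```

Four hex digits give 65,536 candidates and six give 16,777,216, while a full 40-digit SHA-1 gives 2^160, which is why requiring the full hash would change the picture so much.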
        
         | SnowflakeOnIce wrote:
         | 'git clone --mirror' seems to pull down lots of additional
         | content also.
        
       | fortran77 wrote:
       | This is why for private and business projects, we don't use
       | GitHub, we use Amazon CodeCommit.
        
         | makach wrote:
         | The article states that this "vulnerability" might exist in
         | other scm systems as well
        
         | swozey wrote:
         | Because of literally this issue? I'm not sure if you're doing a
         | generic "I don't like github" or know for a fact that
         | CodeCommit doesn't have issues like this.
         | 
         | This seems like a terrible security vector but I'm not sure
         | migrating thousands of repos out of github vs. training
         | engineers to keep public and private repos completely separated
         | makes sense and you haven't explained why you use CodeCommit.
         | 
         | Unless it is this reason, which like I said, seems a bit heavy
         | handed, but I rarely move private repos to public.
         | 
         | I kind of assumed this was a distributed Git problem, not
         | Github, but I don't know.
        
       | ajross wrote:
       | Most of this report is just noise. GitHub repos are public.
       | Public stuff can be shared. Public stuff shared previously and
       | then deleted is "still available", but it was _shared previously_
       | and not really subject to security analysis.
       | 
       | The one thing they seem to be able to show is that commits in
       | _private_ branches show up in the parent repository if you know
       | the SHAs. And that seems like a real vulnerability. But AFAICT it
       | also requires that you know the commit IDs, which is not
       | something you can get via brute forcing the API. You'd have to
       | combine this with a secondary hole (like the ability to generate
       | a git log, or exploiting a tool that lists its commit via ID in
       | its own metadata, etc...).
       | 
       | Not nothing, but not "anyone can access private data on GitHub"
       | as advertised.
        
         | LoganDark wrote:
         | > it also requires that you know the commit IDs, which is not
         | something you can get via brute forcing the API
         | 
         | Well, GitHub accepts abbreviations down to as short as four hex
         | digits... as long as there's no collision with another commit,
         | that's certainly feasible. Even if there is collision, once you
         | have the first four characters you can just do a breadth-first
         | search
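The breadth-first search mentioned in this comment can be sketched locally, without touching any real service. `make_lookup` is a hypothetical stand-in for anything that resolves abbreviated hashes and reports whether a prefix is unique, ambiguous, or unknown; it is not GitHub's API.

```python
from collections import deque
from itertools import product

HEX = "0123456789abcdef"


def make_lookup(hidden_hashes):
    """Build a local stand-in for a server that resolves abbreviated hashes."""
    def lookup(prefix):
        matches = [h for h in hidden_hashes if h.startswith(prefix)]
        if not matches:
            return "none", None
        if len(matches) == 1:
            return "unique", matches[0]
        return "ambiguous", None
    return lookup


def enumerate_hashes(lookup, start_len=4):
    """Try every prefix of start_len hex digits, then extend only the
    ambiguous ones by one digit at a time (breadth-first)."""
    queue = deque("".join(p) for p in product(HEX, repeat=start_len))
    found = []
    while queue:
        prefix = queue.popleft()
        status, full = lookup(prefix)
        if status == "unique":
            found.append(full)
        elif status == "ambiguous":
            queue.extend(prefix + digit for digit in HEX)
    return found
```

Calling `enumerate_hashes(make_lookup(hashes))` on a set of distinct hashes recovers all of them: prefixes that match nothing are dropped, and colliding prefixes are lengthened until each resolves to a single hash.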
        
         | beezlewax wrote:
         | There's a whole section here about how to brute force the
         | hashes. You don't even need the full hash... just a shortened
         | version using the first few chars.
        
       | poikroequ wrote:
       | Microsoft: It's the EU's fault!
       | 
       | Also Microsoft: It's a feature!
        
         | theragra wrote:
         | It was known before Microsoft
        
       | makach wrote:
       | A "delete" means it should be gone forever from the service it
       | was removed from.
       | 
       | "Private" means it should be available only to the specific
       | parties involved.
       | 
       | If you implement any other behavior for these concepts you are
       | implementing anti-patterns.
       | 
       | We need to be precise and consistent in the wording of the
       | functions we provide, so that we can easily understand what is
       | going on without having to interpret documentation.
        
       | yread wrote:
       | On the positive side this takes care of all those companies
       | forking open source software and not contributing back
        
       | kassah wrote:
       | In response to the end of the article "it's important to note
       | that some of these issues exist on other version control system
       | products." I actually have experience helping someone with an
       | issue on BitBucket with PII data that you can't rotate.
       | 
       | Once we eliminated the references in the tree and all forks (they
       | were all private thankfully), we reached out to BitBucket
       | support, and they were able to garbage collect those commits, and
       | purge them to the point where even knowing the git hashes they
       | were not locatable directly.
        
       | Szpadel wrote:
       | even better you can actually commit to other forks if they
       | create a pull request to you.
       | 
       | (there is a checkbox allowing that when you are opening a PR that
       | I bet almost no one noticed)
       | 
       | I reported that years ago and all they changed is that they
       | extended the documentation about this "feature"
       | 
       | my main issue was that you cannot easily revoke this access
       | because the target repo can always reopen the PR and regain write
       | access.
       | 
       | but they basically stated "works as intended"
        
       | tamimio wrote:
       | I don't use GitHub for anything serious, rather my own Gitea.
       | However:
       | 
       | > Any commits made to your private fork after you make the
       | "upstream" repository public are not viewable.
       | 
       | Does that mean a private repo that has never been and never will
       | be public isn't accessible? That scenario wasn't mentioned.
        
         | fedorareis wrote:
         | My understanding is that you are correct. If the repo and all
         | of its forks stay private then the only people that would be
         | able to view them are people who have permissions to access
         | those repos.
        
       | josephscott wrote:
       | How much help is turning off the "Allow forking" option
       | https://docs.github.com/en/repositories/managing-your-reposi... ?
        
       | WhereIsTheTruth wrote:
       | Whoever at Github/Microsoft who doesn't want to solve this
       | problem should be jailed
        
       | dathinab wrote:
       | commits done to private repos being public (point 2&3) is always
       | a non minor security vulnerability IMHO
       | 
       | it doesn't matter if it's behaving as intended or how there are
       | forks
       | 
       | also point 1 implies that github likely doesn't properly GC
       | their git data, which could have all kinds of problematic
       | implications beyond point 1 wrt. purging accidentally leaked
       | secrets or PI....
       | 
       | all in all it just shows github might not take privacy security
       | seriously ... which is kinda hilarious given that private repo
       | using customers tend to be the paying customers
        
         | keybored wrote:
         | You're right that they don't let commits get GC'd. They jump
         | through hoops in order to keep commits that are not
         | transitively referenced from being garbage collected. Just
         | assume that every commit is kept around for "auditing".
         | 
         | One GitHub employee even contributed a configuration to Git
         | which allows you to do the same thing: run a program or feed a
         | file which tells the GC what nodes to not traverse.
        
       | keybored wrote:
       | People are so preoccupied with putting the code on GitHub. It's
       | like it doesn't exist before it's on GitHub.
       | 
       | If you're not gonna share it then it hardly matters. Use a backup
       | drive.
       | 
       | Git is distributed. You don't have to put your dotfiles on
       | GitHub. Local is enough.
        
         | JohnMakin wrote:
         | Your laptop breaks in a way that your disk cannot be recovered.
         | Now what? How often are you backing up your disk? Probably much
         | easier to type "git commit" and "git push"
        
           | keybored wrote:
           | Am I really gonna get interrogated on HN for talking about
           | automatic and redundant backup? Give me a break.
        
             | JohnMakin wrote:
             | I wouldn't call the parent comment you're responding to an
             | "interrogation" and I'm sorry you perceived it that way.
             | You make a pretty extraordinary claim that local disk is
             | better than a remote repository for storing/updating code
             | for personal work - with no evidence to support this claim
             | - so a followup question seems reasonable.
             | 
             | as far as "git is distributed" I don't know if that's the
             | case if you keep it purely local, but hey, you seem to have
             | it all figured out so good job.
        
               | keybored wrote:
               | I thought a person of your background (who no doubt has
               | it all figured out) would surmise that I was talking
               | about backing up to an external disk and not to another
               | disk on the same laptop. And would grant another person
               | some good faith and be able to generalize without
               | spelling it all out for them: if the point is to back
               | things up then maybe I can infer that other means of
               | backup are also in the cards, like sneakernet or your own
               | server or multiple locations. _Huh_
               | 
               | You can also back up to a remote. That is not GitHub. You
               | know because the topic is GitHub and how promiscuous they
               | are. Which is why I say: if you don't need your code to
               | be "social" you don't need to put it on GitHub.
               | 
               | But even a remote repository is overkill. An automated
               | backup plan with git bundle is automatic, after all. Set
               | it and forget. And backups are supposed to be automated,
               | right? I ask because you have the relevant background
               | here.
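A backup along the lines described here could look roughly like the following. It is a minimal sketch that shells out to `git bundle create ... --all`; the function names are hypothetical, and scheduling (cron, a systemd timer, or similar) is left to the reader.

```python
import subprocess
from pathlib import Path


def bundle_command(repo_dir, bundle_path):
    """Build the git command that writes every ref of repo_dir into one
    bundle file. Pass absolute paths: `git -C` changes directory first, so
    a relative bundle path would be resolved inside the repository."""
    return ["git", "-C", str(repo_dir), "bundle", "create",
            str(bundle_path), "--all"]


def backup(repo_dir, bundle_path):
    """Create the destination directory if needed and run the bundle
    command; raises CalledProcessError if git exits with an error."""
    repo = Path(repo_dir).resolve()
    target = Path(bundle_path).resolve()
    target.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(bundle_command(repo, target), check=True)
```

The resulting file can later be cloned from like any other remote, for example with `git clone /path/to/file.bundle`.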
        
               | JohnMakin wrote:
               | > I thought a person of your background (who no doubt has
               | it all figured out) would surmise that I was talking
               | about backing up to an external disk and not to another
               | disk on the same laptop. And would grant another person
               | some good faith and be able to generalize without
               | spelling it all out for them: if the point is to back
               | things up then maybe I can infer that other means of
               | backup are also in the cards, like sneakernet or your own
               | server or multiple locations. Huh
               | 
               | Your snark notwithstanding, I actually did understand
               | that an external disk resides outside of the laptop, and
               | find your claim still fantastic and lacking evidence.
               | 
               | As for the rest of your post, you'll forgive my
               | misunderstanding of whatever _deeply_ nuanced point you're
               | making here regarding backing up to a remote because
               | of this at the end of your original post:
               | 
               | > local is enough.
               | 
               | Anyway, seems like you need to take a break. Someone of
               | my background has better things to do than engage in a
               | flame war with someone clearly looking for a fight over a
               | throwaway post.
        
       | imadj wrote:
       | They single out GitHub in the title and throughout the entire
       | article, only in the very last line they clarify that these
       | issues are actually a common design flaw in version control
       | systems and not limited to GitHub:
       | 
       | > Finally, while our research focused on GitHub, it's important
       | to note that some of these issues exist on other version control
       | system products
       | 
       | For example, Gitlab only recently solved the issue:
       | https://gitlab.com/gitlab-org/gitlab/-/issues/408137
        
       | eezing wrote:
       | I'm glad I don't use forks
        
       | thih9 wrote:
       | Can this be used to host illegal content? I.e.: fork a popular
       | repo, commit a pirated book to the fork, delete the fork, use the
       | original repo to access the pirated book?
       | 
       | What would github do after receiving a DMCA request in that case?
        
         | er4hn wrote:
         | One can safely assume they will find a way to follow the law
         | rather than mumble about technically this is working as
         | intended.
        
         | lnrd wrote:
         | That looks like the kind of loophole that could get GH to do
         | something about this.
        
           | arccy wrote:
           | they have the ability to do essentially git gc and drop
           | unreachable commits
        
       | devinsewell wrote:
       | and people have been yelling at me for refusing to ever use
       | github since 2013 lolo
        
       | j-pb wrote:
       | Commit hashes are essentially capabilities, you should be able to
       | access any data that you have a capability for. But allowing
       | access via a 16bit prefix is just idiotic, and equivalent to
       | accepting just the first two bytes of a 256bit cryptographic
       | signature...
        
       | nostrademons wrote:
       | Cool, another way to access youtube-dl next time it gets deleted
       | from GitHub.
        
       | madewulf wrote:
       | In fact, there is a process to request complete removal of data,
       | but it involves sending an email that will be reviewed by github
       | staff:
       | https://docs.github.com/en/site-policy/content-removal-polic...
       | 
       | On the other hand, once an API key or password has been published
       | somewhere, you should rotate it anyway.
        
         | riedel wrote:
         | I was wondering, how they can otherwise comply with
         | legislation. Makes sense there is a way to do this e.g. in case
         | of valid GDPR, DMCA, etc. cases.
        
       | midtake wrote:
       | Just rebase/squash everything.
        
       | josephcsible wrote:
       | How is this more of a vulnerability than the existence of sites
       | like archive.org is? Isn't it just a fact of the Internet that
       | once you make something public, you can't fully take it back
       | later?
        
         | bogwog wrote:
         | Because private forks are not meant to be public
        
         | debugnik wrote:
         | The third case in the article shows private forks being leaked
         | publicly when the upstream goes public.
         | 
         | The other two cases are indeed not worse than third-party
         | archival, but they're still socially concerning. When you ask
         | your own host to delete something you uploaded, you don't
         | expect them to ignore you just because someone could have
         | already archived it maybe. Making it harder to find can still
         | be valuable; not all archives stay available forever, if any.
        
       | galkk wrote:
       | I won't be surprised if "right to be forgotten"/GDPR abusers will
       | spam github and force them to act on it, eventually.
       | 
       | ----
       | 
       | This is clearly documented and can be explained even to non-
       | technical managers.
       | 
       | From my POV calling that a vulnerability is trying to build hype.
       | 
       | I think that having a quote from here on the visibility-changing
       | settings page would make it even clearer:
       | https://docs.github.com/en/pull-requests/collaborating-with-...
        
       | crvdgc wrote:
       | I think the first two points are a result of private data
       | (commit/fork/issue) being able to refer to public data without
       | making the reference public.
       | 
       | Say a private commit depends on a public commit C. Suppose in the
       | public repo, the branch containing C gets deleted and C is no
       | longer reachable from the root. From the public repo's point-of-
       | view, C can be garbage-collected, but GitHub must keep it alive,
       | otherwise the deletion will break the private commit.
       | 
       | It would be "a spooky action at a distance" from the private
       | repo's POV. Since the data was at a time public, the private repo
       | could have just backed up everything. In fact, if that's the
       | case, everyone _should always_ back up everything. GitHub
       | retaining the commit achieves the same effect.
       | 
       | The public repo's owner can't prevent this breakage even if they
       | want to, because there's no way to know the existence of this
       | dependency.
       | 
       | The security issue discussed in the post is a different scenario,
       | where the public repo's owner wants to break the dependency
       | (making the commit no longer accessible). That would put too much
       | of a risk for anyone to depend on any public code.
       | 
       | My mental model is that all commits ever submitted to GitHub will
       | live forever and if it's public at one time, then it will always
       | be publicly accessible via its commit hash.
        
       | ahpook wrote:
       | Hubber here (same username on github.com). We in GitHub's OSPO
       | have been working on an open source GitHub App to address the use
       | case where organizations want to keep a private mirror of an
       | upstream public fork so they can review code and remove
       | IP/secrets/keys that get committed and squash history before any
       | of those changes are made public. Getting a beta release this
       | week, in fact - check it out, I'm curious what yall think about
       | the approach
       | 
       | https://github.com/github-community-projects/private-mirrors
        
       ___________________________________________________________________
       (page generated 2024-07-24 23:00 UTC)