[HN Gopher] Forging signed commits on GitHub
       ___________________________________________________________________
        
       Forging signed commits on GitHub
        
       Author : rokkitmensch
       Score  : 277 points
       Date   : 2024-01-22 04:49 UTC (1 days ago)
        
 (HTM) web link (iter.ca)
 (TXT) w3m dump (iter.ca)
        
       | YetAnotherNick wrote:
       | Really nice and educational exploit. Regexes should almost never
       | be used in parsing structured stuff in production. While it is
       | developer friendly, it's very hard to think all the edge cases.
        
         | _nalply wrote:
         | What I tought, they should have used some more official library
         | to manage git instead of doing the work themselves, but perhaps
         | there was none?
        
           | j16sdiz wrote:
           | We have libgit2, but github have their own ruby
           | implementation of everything.
           | 
           | (java and rust have their own "native" implementation too.
        
             | guappa wrote:
             | libgit2 was written after github. So that might explain.
        
               | bspammer wrote:
               | libgit2's frontpage claims that GitHub does use them:
               | https://libgit2.org/
        
               | guappa wrote:
               | Depends which part of github uses them. Might even be
               | their GUI client...
        
         | Zababa wrote:
         | Are there alternatives that make it easier to think about all
         | the edge cases?
        
           | michaelt wrote:
           | Keep using regular expressions, just stop using . and
           | instead, define what is allowed.
           | 
           | You're expecting an input to be a UUID? Check it with
           | ^[a-fA-F0-9\\-]{30,40}$ and you know you're not getting any
           | apostrophes or script tags or enormous inputs or empty inputs
           | or newlines or lookalike characters or emojis.
        
           | ylk wrote:
           | I think re-implementing the functionality is the mistake
           | here. A big counter-argument to "rewrite in rust" is usually
           | that by rewriting you introduce new bugs.
           | 
           | Especially for security critical things one should re-use the
           | implementation to avoid issues like the above. Proving both
           | the original and re-implemented parser to be equivalent would
           | probably also work, but not sure how practical.
           | 
           | And in any case have competent people audit what you did.
        
             | avgcorrection wrote:
             | How did we get from criticizing regexes to mocking "rewrite
             | in rust" in the span of three comments?
        
               | ylk wrote:
               | I'm not mocking it. I'm just applying the argument I see
               | many people on HN make in discussions about rust to this
               | case, where I suspect many will lean much more on the
               | side of using regexes to reimplement parsers. Don't know
               | if that'll be the same people. Either way, I didn't want
               | to have a discussion about rust.
        
               | avgcorrection wrote:
               | > I'm not mocking it. I'm just applying the argument I
               | see many people on HN make in discussions about rust to
               | this case, where I suspect many will lean much more on
               | the side of using regexes to reimplement parsers.
               | 
               | Hmm. Okay.
               | 
               | The part that bewilders me is that Rust showed up. Which
               | would have made sense if someone was complaining about
               | C++ or something, or maybe unsafe memory management. But
               | regexes?
               | 
               | > Either way, I didn't want to have a discussion about
               | rust.
               | 
               | Which is why you brought it up unprompted... I give up.
        
             | mistercheph wrote:
             | Hey, watch what you're saying buddy, 'rewrite in rust' is
             | what pays the bills in this family!
        
         | akdor1154 wrote:
         | Disagree here, the problem is their implementation was slightly
         | different to Git's. There is more chance of stuffing up like
         | this with a handrolled parser than with a regex. The only
         | viable alternative i'd accept here to actually reduce risk is
         | to call into libgit itself. Absent that, a regex is appropriate
         | for this problem.
         | 
         | There are many problems where regexes are succinct and
         | appropriate, (of course there are many more where they are
         | neither).
        
           | YetAnotherNick wrote:
           | Unless git uses regex, it is almost impossible to verify that
           | the regex is same as their validation. What if git's
           | validation changes. Even looking at the corrected regex I am
           | not sure what would happen for strings like '<a@a.com>>' or
           | '<' or 'x <a>'. Not saying the github's corrected regex is
           | wrong, but it is a mental challenge to verify it.
        
         | thrdbndndn wrote:
         | When the data itself is just plain text, are there any other
         | (better) ways?
        
           | brokensegue wrote:
           | Parsers?
        
             | davemp wrote:
             | I don't see how writing a parser from scratch would
             | mitigate bugs vs using a regex parser. Parsers are just a
             | hot spot for security bugs that should get extra scrutiny.
        
             | semanticc wrote:
             | Regex is the parser for regular languages.
        
         | cedilla wrote:
         | This isn't really a regex problem though. The regex probably
         | did exactly what the dev intended, it's just that the dev
         | failed to take into account that <author> could also be a 0
         | length string.
         | 
         | The same logic bug would have occured with a parser.
        
           | account42 wrote:
           | This is all speculation of course but a parser is more likely
           | to recognize the author: prefix and then error out on
           | anything unexpected in the rest of the line. Meanwhile
           | regular expressions don't have any way to report errors.
        
             | glenjamin wrote:
             | Personally I'd have separated finding the Author line from
             | pulling out the contents of that line - but still used
             | Regex
        
       | Vogtinator wrote:
       | I treat "signed by GitHub's verified signature" equivalent to
       | unsigned. It's not the author's signature, just some third
       | party's.
        
         | sureglymop wrote:
         | One should treat git itself as insecure. When you sign a
         | commit, the signature is based on the commit message and commit
         | metadata including tree object_id and parent object_id. But if
         | sha1 is used to generate commit hashes, one can theoretically
         | forge a commit. This means that in an elaborate supply chain
         | attack, one could spoof a commit with a valid signature. That
         | signature would then still appear valid for the spoofed commit
         | and probably make it seem more legitimate than it is.
        
           | Anunayj wrote:
           | For people who might be wondering why git hasn't moved to
           | sha256 yet, here's a lwn article on it:
           | https://lwn.net/Articles/898522/
        
             | jacoblambda wrote:
             | For a more recent bit of info on it (I remembered this
             | thread from the latter half of last year on the mailing
             | list):
             | 
             | https://lore.kernel.org/git/2f5de416-04ba-c23d-1e0b-83bb655
             | 8...
             | 
             | The important snippet from this thread is:
             | 
             | > On 6/29/23 07:59, Junio C Hamano wrote:
             | 
             | > > Adam Majer <adamm@zombino.com> writes:
             | 
             | > > > Is sha256 still considered experimental or can it be
             | assumed to be stable?
             | 
             | > > I do not think we would officially label SHA-256
             | support as "stable" until we have good interoperability
             | with SHA-1 repositories, but the expectation is that we
             | will make reasonable effort to keep migration path for the
             | current SHA-256 repositories, even if it turns out that its
             | on-disk format need to be updated, to keep the end-user
             | data safe.
             | 
             | > That could be a different definition of stable. But I'm
             | satisfied that current sha256 repositories will not end up
             | incompatible with some future version of git without
             | migration path (talking about on-disk format).
             | 
             | > So maybe my question should be reworded to "is sha256
             | still considered early stage, for testing purposes only
             | with possible data-loss or can it be relied on for actual
             | long lived repositories?"
             | 
             | > > So while "no-longer-experimental" patch is probably a
             | bit premature, the warning in flashing red letters to
             | caution against any use other than testing may want to be
             | toned down.
             | 
             | > Agreed. I think it should be clear that SHA256 and SHA1
             | repositories cannot share data at this point. The scary
             | wording should be removed though, as currently it sounds
             | like "data loss incoming and it's your fault" if one
             | chooses sha256
             | 
             | > - Adam
             | 
             | TLDR: SHA256 works well and can be considered "beta". It's
             | viable but you can't really share data between a SHA1 and
             | SHA256 repo. So if your repo doesn't need to merge to/from
             | existing sha1 repos and you aren't dealing with subtrees or
             | anything fancy like that, you should be able to use SHA256.
             | It is not expected to break your repo, it's reasonably well
             | supported, and in the event a migration needs to occur,
             | there will be support for that as well.
             | 
             | So if you want to jump on the SHA256 train you absolutely
             | can provided you are following a pretty normal/boring use
             | case.
        
           | theamk wrote:
           | Note that while SHA1 itself is broken, git specifically has a
           | code to detect attempts at collisions and prevent them, see
           | all the sha1dc stuff [0]
           | 
           | [0] https://news.ycombinator.com/item?id=17825441
        
         | kevincox wrote:
         | I think it adds value. It means that the commit was created
         | using the author's credentials (to the extent that you trust
         | GitHub and its lack of bugs). So the author can be trusted to
         | some extent on these commits.
         | 
         | Although extending this signing ability to arbitrary commits
         | via the codespaces API seems weird as the commit is no longer
         | generated by GitHub. It seems now that you could generate some
         | fake merge that changed data in a surprising way. Previously
         | you knew that merges generated by GitHub would be "clean". Not
         | that this changes much in practice.
        
           | cryptonector wrote:
           | In particular it means "this was committed via the GitHub Web
           | UI" _and_ "the author was authenticated to GitHub". But the
           | latter part is not really any different from who pushed the
           | commit. And clearly there is no value in this as long as
           | GitHub doesn't make this feature more secure. Using regexp to
           | parse the author line then ignoring author lines that don't
           | match... yikes.
        
             | kevincox wrote:
             | But how do you know who pushed a given commit? I don't
             | think it is recorded in the Git repository.
             | 
             | I agree that "to the extent that you trust GitHub and its
             | lack of bugs" is a big caveat. But it sill seems better to
             | have this information than not to have it.
        
             | semiquaver wrote:
             | The badge is also displayed if the commit was signed
             | locally by a GPG key whose public component is uploaded to
             | GitHub by the the claimed author and committer.
        
             | orblivion wrote:
             | I realized there is one benefit that this offers: Github
             | attests to the time that it happened.
             | 
             | Supposing you're looking at a popular repository, one where
             | a malicious commit would likely be noticed eventually. The
             | last commit was "one month ago". What's to say someone
             | didn't compromise the developer's computer, sign a
             | malicious commit backdated by a month, and push it to
             | Github? If the last commit was made via the Github UI, you
             | have pretty good assurance (i.e. as much as you trust
             | Github not to get hacked) that this didn't happen.
             | 
             | Even better if the previous commit was done by the author,
             | and the Github UI commit is trivially confirmed as safe.
             | That way, you can confirm the author's commit locally, in
             | case Github is the one that got hacked.
             | 
             | If both the author and Github got hacked, :shrug: I guess
             | that's a pretty skilled adversary.
             | 
             | Caveat: All of the above is my own analysis. I'm curious if
             | there are flaws in my thinking here.
        
           | kyrofa wrote:
           | > It means that the commit was created using the author's
           | credentials (to the extent that you trust GitHub and its lack
           | of bugs).
           | 
           | This is incorrect. If I create a merge request, and then the
           | maintainer clicks the "squash and merge" button, that commit
           | is associated with me, even though I'm not actually the
           | creator of that commit at that point: the maintainer is. I
           | believe this happens even if someone else (e.g. the
           | maintainer) pushed commits to that merge request before
           | squashing them together.
        
         | kyrofa wrote:
         | Yeah I've always found that to be frustrating. Someone squashes
         | and merges MY commits, and github still shows them as validly
         | signed, even though they aren't the commits I created. Sadly
         | it's hard to actually filter those out: the UI makes them all
         | look the same.
        
       | photoGrant wrote:
       | > April 24 2023: GitHub closes the issue, saying that being able
       | to impersonate your own account is not an issue
       | 
       | I mean, fear & curiosity would have me testing it before closing.
       | Wha?
        
         | michaelt wrote:
         | As I understand things, everyone who runs a bug bounty gets an
         | endless stream of useless reports telling them things like
         | "your website sends the server header which makes you
         | vulnerable to hacking, bounty please"
         | 
         | Therefore, they have _very_ junior people assigned to triage
         | bug reports. And those very junior people are in the habit of
         | closing 99.9% of reports as not-really-a-security-problem.
        
           | dns_snek wrote:
           | Anecdote time, I discovered a bug where a _bank_ app would
           | continue receiving 2FA notification prompts even after the
           | user terminated the session from another device (e.g. after
           | realizing their device was stolen). The app was apparently
           | still holding onto some kind of a valid token that could be
           | used to approve 2FA requests despite being  "logged out".
           | 
           | I submitted detailed reproduction steps, with a video clearly
           | showing the bug. Triage claimed that this was "intended
           | behavior" and as far as I'm aware, the bank didn't even see
           | my report.
        
             | zero_k wrote:
             | Haha yes, that's the norm. They don't care. Report to the
             | regulator. Then they care. Find out who the regulator is in
             | your country, and report to them. Kinda annoying, but when
             | the regulator tells them they need to do it, it magically
             | gets done.
             | 
             | Note that the regulator doesn't threaten with childish
             | things like a fine of 1M USD, that's kids' playground
             | stuff. They threaten to revoke their banking license, which
             | will cost them 100x more per day :)
        
             | tedivm wrote:
             | I've found that responding "Well, if that's intended
             | behavior you don't mind if I blog about it then? The post
             | will go up in a week." tends to help them take it more
             | seriously.
        
           | hannob wrote:
           | Sorry, if that's your security process, it's broken. If
           | you're microsoft and that's your security process, you have
           | no excuse and should be embarrassed. However, I heard such
           | stories about Microsoft's security process plenty of times
           | before, so I guess that's what it is.
        
           | Sohcahtoa82 wrote:
           | > "your website sends the server header which makes you
           | vulnerable to hacking, bounty please"
           | 
           | Getting nightmare flashbacks at my last job.
           | 
           | My CISO insisted _everything_ on our penetration test report
           | get remediated. Even  "Information" level items, like the
           | server header. And port 443 being open. _facepalm_
        
       | asmor wrote:
       | GitHub closing any security issue by default sure is a thing.
       | 
       | I reported an issue that was not fun to reproduce in their SAML
       | implementation, where they would include organization membership
       | in OAuth tokens even if you didn't have a SAML session, so if you
       | did device posture in your SAML provider to protect from access
       | to code, installing apps that allow you to get code (e.g.
       | SonarCloud) would allow bypass.
       | 
       | The weirdest thing is that they had an endpoint that did check if
       | there was a SAML session (membership API would 403 without), but
       | I still got told this wasn't a bug and within expected behavior
       | (not for our CISO it wasn't, and it's still not documented).
       | 
       | Shoutout to Tailscale for sending me some stuff despite it not
       | being an issue in their implementation.
        
       | datadeft wrote:
       | > GitHub fixed the issue by changing the problematic regex
       | 
       | Why am I not surprised.
        
       | asimpletune wrote:
       | One crucial thing that enabled this work is being able to
       | decompile the enterprise version of the GitHub binary. It makes
       | you think that distributing a binary of your app is either a good
       | idea or bad idea, depending on how you view security. If your app
       | is important though then it's definitely a good idea.
        
         | greatgib wrote:
         | At the opposite, thanks to that they now have a safer platform.
         | 
         | At the opposite, there could still be nefarious insiders that
         | discover that and abuse of it or resell the exploit for years
         | without anyone noticing.
         | 
         | Especially for a proprietary service, as a customer, for me it
         | is an added bonus point to know that the thing can and is
         | battle tested by outsiders...
        
         | epcoa wrote:
         | > [was] being able to decompile the enterprise version of the
         | GitHub binary.
         | 
         | > It makes you think that distributing a binary of your app is
         | either a good idea or bad idea,
         | 
         | > If your app is important though then it's definitely a good
         | idea.
         | 
         | I don't follow. What are you saying would have been the good
         | idea here?
         | 
         | It also isn't clear to me that they required the code except to
         | answer why it works, certainly this parsing defect could be
         | discovered simply by testing the endpoint.
        
           | asimpletune wrote:
           | That distributing a binary app was a good idea. The attacker
           | found and fixed the vulnerability.
           | 
           | Of course it wasn't required to find the vulnerability,
           | however if you make it then it's more likely that good guys
           | will find and fix stuff. The contrary would be it's difficult
           | to find this stuff, so then only highly motivated bad guys do
           | it, because at that point 10 grand might not be worth it.
           | 
           | Again, this was all prefaced with it depends. Some might see
           | it as making it easier to find problems, and others might say
           | 'yeah, exactly.". It's just an opinion. A fact is however
           | that a binary makes it easier. Source code would be even
           | easier.
        
             | xboxnolifes wrote:
             | But what alternative are you suggesting? Not having apps?
        
               | rsanek wrote:
               | if you don't offer a self hosted version you don't have
               | to distribute anything. just force people to use your
               | cloud product
        
       | avgcorrection wrote:
       | I don't like how GitHub changes my committer to them in the name
       | of being Verified. I don't care about it being Verified by GitHub
       | (yes, even if it had always worked and not allowed spoofing).
       | 
       | It's weird that GitHub has such a long history of cooperation
       | with the Git project and yet just rewriting the committer is
       | considered okay to them.
        
         | sp1rit wrote:
         | It's common practice to have the committer not match the author
         | (especially in environments where only patches/diffs are sent).
         | I can't imagine the git project having any issue with that,
         | given that the author field stays retained (ignoring this bug
         | ofc.).
        
           | avgcorrection wrote:
           | I tried to do some more research on this. Unfortunately the
           | search combination
           | 
           | > rewrite commit github verified
           | 
           | is so insanely SEO-poisoned by (1) StackOverflow questions
           | about how to rewrite commits and (2) GitHub's docs about
           | _regular_ (built-in Git commit verification) commit
           | verification.
           | 
           | But I thought about it some more. And I guess them hijacking
           | the committer on pull request merges is not that insanely
           | bothersome. It's their platform after all.
           | 
           | Just yet another reason to not interact with forges in a way
           | that affects Git itself.
           | 
           | But another thing I've noticed is that sometimes my committer
           | becomes the stupid initial (legacy) email I used to sign up
           | there. Why in the hell? My current email is already
           | registered there. But I tested this now and at least I got
           | the option to merge as one of my addresses. So I guess it was
           | fixed?
           | 
           | > It's common practice to have the committer not match the
           | author
           | 
           | When the committer and author are different people. And
           | apparently when GitHub tries to insert itself as a pseudo-
           | person.
           | 
           | I expected that my own actions would be tied to my own
           | identity on GitHub. Not for them to shoehorn GitHub
           | Verified(tm) into commit objects that I create through
           | clickity-clacking around their UI. But it is after all their
           | UI and they can poison commits however they like since _they_
           | create it (i.e. I didn't create it and send it to them; then
           | I would have noticed if they tried to rewrite it). So shame
           | on me.
           | 
           | > especially in environments where only patches/diffs are
           | sent
           | 
           | Clearly doesn't apply here.
           | 
           | > I can't imagine the git project having any issue with that,
           | given that the author field stays retained (ignoring this bug
           | ofc.).
           | 
           | Given that they don't use GitHub for anything directly
           | related to Git (PRs and such) and that they have their own
           | way of indicating code provenance: No, I guess they have no
           | reason to care. (Other than in spirit.)
           | 
           | But it matters in general (in spirit) that if I pick up some
           | patch by Jack Cooper and apply it then the committer is _me_.
           | Not outlook.com. Or whatever program did it for me.
           | 
           | The commit object has a few metadata fields. I've never seen
           | anyone say, "Oh yeah sure, just use that field for whatever
           | it's not like it matters anyway".
        
         | brasic wrote:
         | The web-flow signing system is for users' convenience in places
         | where it's not feasible to sign the commit with their own
         | private key: commits made in the web interface or on an
         | ephemeral GH-provisioned VM (codespace). For the latter, you
         | are free to send your own private key to your codespace so you
         | can sign your own commits but GitHub cannot because they don't
         | have your private key and don't want to have it. Defaults
         | matter and signed commits are important.
         | 
         | As a sibling notes, this use case and similar ones is the
         | reason the committer field exists as distinct from the author
         | field. I think a $10K bounty for this bug speaks to how
         | seriously they stand behind the fact that they will only sign
         | and mark as verified commits whose author field matches an
         | authenticated user.
         | 
         | (Disclaimer: former GH employee)
        
           | avgcorrection wrote:
           | > The web-flow signing system is for users' convenience in
           | places where it's not feasible to sign the commit with their
           | own private key:
           | 
           | Who signs _all_ their commits? Joey Hess maybe? There are
           | certainly others. But I've never seen anyone make a case for
           | this. In fact only negative cases since it just encourages
           | you to automate your signing process, which many are not
           | comfortable with.[1]
           | 
           | I'm not important enough to sign anything.
           | 
           | On Bitbucket we push the big merge button and out comes a
           | commit with the _correct_ person attributed to it.[2] Even
           | Atlassian manages to do this the correct way.
           | 
           | > For the latter, you are free to send your own private key
           | to your codespace so you can sign your own commits but
           | 
           | Yeah GPG/SSH sign commits... who cares. Most people don't.
           | 
           | > Defaults matter and signed commits are important.
           | 
           | I don't care about your opinion.
           | 
           | I wouldn't mind if this was an option that I could opt out
           | of. (I'm wondering out loud, not asking you or anyone else.)
           | I just haven't heard of it yet.
           | 
           | I'm a Git user after all so I'm used to changing bad
           | defaults.
           | 
           | > As a sibling notes, this use case and similar ones is the
           | reason the committer field exists as distinct from the author
           | field.
           | 
           | Quite a leap to go from attributing emailed-around patches to
           | the correct author while also maintaining the committer (like
           | the maintainer) to what looks equivalent to Norton Antivirus
           | junk output stuffed 40 lines into someone's email signature.
           | 
           | > I think a $10K bounty for this bug speaks to how seriously
           | they stand behind the fact that they will only sign and mark
           | as verified commits whose author field matches an
           | authenticated user.
           | 
           | "I think the price they put on this SPOOFING vulnerability
           | speaks to how serious they are about verified commits", they
           | said without irony.
           | 
           | "Sent from my GitHub", ah they all felt at-ease
           | immediately... wait the same platform that had a spoofing
           | vulnerability?
           | 
           | [1] Well, allegedly. I have never signed _anything_ so I
           | don't know.
           | 
           | [2] They committed it too. Or wait. Was that the merge
           | button?
        
         | oefrha wrote:
         | If you contribute to the Git project itself, or other major
         | git-using projects like the Linux kernel, you'll get completely
         | accustomed to having your commits committed by someone else.
         | 
         | Having a long history of cooperation with the Git project
         | should only increase your confidence in doing that.
         | 
         | Also, don't use the web UI to commit if you want to sign with
         | your key, simple.
        
           | avgcorrection wrote:
           | I know that Junio C Hamano applies all patches himself to
           | git.git and thus is the committer on all commits.[1] That is
           | _how Git works_ when picking out patches from a mailbox.
           | 
           | The are all committed by _people_ , not some middle man
           | program.
           | 
           | Just like I can click the big merge button _myself_ and then
           | it is committed by... oh by mr noreply?
           | 
           | Try to find some committer or author in the git.git project
           | that has a name like "Verified By Yahoo! <jibber jabber hash
           | nonense>"? Good luck. (Almost like that kind of forge
           | silliness was never accepted into any of their workflows...
           | yet they are somehow very comparable, in your mind.)
           | 
           | This is like trying to explain to some Outlook representative
           | that no one cares if they "verify" emails that go through
           | them with some proprietary DKIM headers that only they know
           | and care about.[2] "Well actually, if you understood how
           | email headers work then you would see that it is not at all
           | unlike......"
           | 
           | > Also, don't use the web UI to commit if you want to sign
           | with your key, simple.
           | 
           | Thank you for the tip.
           | 
           | [1] With the few pull request by way of email exceptions
           | 
           | [2] Made up but not implausible example
        
             | mook wrote:
             | You may have pressed the button, but the actual commit
             | action was done by GitHub's infrastructure, hence they are
             | listed as committer. Think of it as an acknowledgment that
             | maybe they were hacked and made to do the commit even if
             | you didn't want it.
             | 
             | If you want to do the commit yourself, you can run `git
             | merge` locally and push the result. They won't touch the
             | commit (including the committer) in that case, because the
             | commit hash must never change. I'm not sure if they track
             | who did the push (or even what the sensible value would be
             | if two people pushed the same commit to two different
             | forks).
        
       | jaimex2 wrote:
       | Is there anything more pointless than signed commits? If your
       | security team ever enforced it you're probably already pwned but
       | may as well replace them just in case.
        
         | samcat116 wrote:
         | Why do you say that? It seems like they would be more important
         | in the context of supply chain attacks.
        
       | tumetab1 wrote:
       | (2023)
        
       | richbell wrote:
       | > Since the first author name is zero characters long, the regex
       | skips that line, and the fake second author line is used instead.
       | Git ignores extra author lines after the first, so Codespaces
       | looks at the second author line but Git looks at the first.
       | 
       | How common is it to have multiple author lines? I haven't seen
       | that before, only co-author footer[1].
       | 
       | [1]: https://docs.github.com/en/pull-requests/committing-
       | changes-...
        
         | avgcorrection wrote:
         | Apparently multiple authors is invalid.[1]
         | 
         | [1] https://stackoverflow.com/q/65445147/1725151
        
       | hannob wrote:
       | It's wild that we're in 2024, and there's a security issue that
       | is essentially "we parse data in an ad-hoc format with an ad-hoc
       | parser based on regexps, and we got that wrong". And the solution
       | is "we tweaked the regexp", and not "we properly
       | specified/documented the format (or replaced it with something
       | properly specified), and wrote a parser that correctly implements
       | the specification".
        
         | meandmycode wrote:
         | Especially wild for GitHub, I would have thought they had the
         | entirety of 'git foo' codified up with actual parsers by now,
         | but I guess a regex will always do..
        
           | formerly_proven wrote:
           | Well the user-git-facing parts of GitHub _are_ built on
           | libgit2 and libssh.
        
           | hedora wrote:
           | This doesn't surprise me at all coming from github. I've
           | tried using their public api's, and they're all shockingly
           | bad. In some cases I thought it was my mistake, but then I
           | checked their website, and found it was bug for bug
           | compatible with the thing I'd built.
           | 
           | My usual rule of thumb for choosing products is to go with
           | whatever rising alternative there is to the entrenched
           | monopoly. The little one needs to be much better to overcome
           | monopoly effects.
           | 
           | I don't sign my own paycheck, but would be moving to gitlab
           | or something if I did.
        
         | sanjayjc wrote:
         | I must be missing something because the corrected regex
         | (/\Aauthor ([^<]*)[ ]{0,1}<(.+)>/) seems to behave the same as
         | the problematic one:                 https://regexr.com/7qrum
         | 
         | I wonder whether github may not be using PCRE.
        
           | jwilk wrote:
           | The problem wasn't that the regexp matched the second
           | "author" line, but that it didn't match the first one.
        
         | TravisCooper wrote:
         | This is the correct take.
        
         | theamk wrote:
         | Even if I had to use regex, I'd at least do it in 2 stages:
         | 
         | - Extract all "author" lines, fail-fast if there is more than
         | one
         | 
         | - Parse the line, again fail-fast if there is anything strange
         | 
         | this would just stop this, and similar, attacks completely.
         | Postel's law has no place in modern development if we want
         | things to be secure.
        
           | HideousKojima wrote:
           | >Postel's law
           | 
           | Also known as "make other people's incompetence and inability
           | to conform to a spec _your_ problem. "
        
             | avgcorrection wrote:
             | Yeah that "law" makes absolutely zero sense to me. It's
             | like "be nice" but in a self-sacrificial way which then
             | indirectly sacrifices other people in the long run.
        
               | cryptonector wrote:
               | It's from an era when interoperability in a then-very-
               | small universe was very important to moving the Internet
               | forward. Back then there were no spammers, and very few
               | bad actors. It's possible that the "Postel law" helped
               | for a while -- hard to say for sure though.
        
               | cortesoft wrote:
               | It makes sense for a lot of situations purely out of
               | practicality. If a good portion of your customers use
               | tools that break spec, your choices are to accept the
               | brokenness or lose the customer.
               | 
               | Take browsers, for example. A lot of people complain that
               | browsers try to work around broken certificate setups on
               | servers, but if they didn't, users would just switch to
               | browsers that did.
        
               | avgcorrection wrote:
               | > It makes sense for a lot of situations purely out of
               | practicality. If a good portion of your customers use
               | tools that break spec, your choices are to accept the
               | brokenness or lose the customer.
               | 
               | Practical? So the adage is just "be liberal because you
               | have to out of practical necessity"? Are you sure that's
               | the history behind it?
        
               | nradov wrote:
               | You're not wrong, but sometimes you have to be pragmatic.
               | I've done a lot of work building interfaces to acquire
               | clinical data from healthcare provider organizations for
               | care quality improvement. There are clear industry
               | standards from HL7 (V2 Messaging, CDA, FHIR) but many
               | provider organizations don't implement them correctly, so
               | we had to accept a lot of junk and just find a way to
               | deal with it. Many of those provider organizations lack
               | IT resources and are at the mercy of their vendors or
               | consultants, so trying to hold them to strict standards
               | results in an impasse.
        
               | treflop wrote:
               | Idk it makes sense for UX sometimes. Really it's a case
               | by case basis and I don't think you should be adhering to
               | any law for this.
        
           | a1o wrote:
           | > In computing, the robustness principle is a design
           | guideline for software that states: "be conservative in what
           | you do, be liberal in what you accept from others". It is
           | often reworded as: "be conservative in what you send, be
           | liberal in what you accept". The principle is also known as
           | Postel's law, after Jon Postel, who used the wording in an
           | early specification of TCP.
           | 
           | I didn't know, so if anyone wants to know, here's the summary
           | from Wikipedia.
           | 
           | https://en.wikipedia.org/wiki/Robustness_principle
        
             | jrockway wrote:
             | It's a tough balance. Postel's law is from a world before
             | security mattered. It is incompatible with correctness.
             | 
             | The early Internet definitely wouldn't have worked without
             | a thick layer of "you set this flag wrong in the header but
             | I know what you meant". But these liberal formats have been
             | the death of many a "0 remote root holes in the default
             | install" OS slogan ;)
             | 
             | Git, meanwhile, would be something like:
             | $ git clone example.com/cool-code         FATAL: commit
             | abc123: 2 errors:              header.author: too many
             | authors in header             header.author[0]: empty
             | friendly name         checkout aborted, contact the
             | upstream git repo and tell 'em it's broke
             | 
             | People would be mad and would stop using Git, so "git
             | clone" has to accept whatever.
             | 
             | But the signature generating-code doesn't need to be this
             | liberal; it can run this validation and say "I'm not going
             | to sign that", and nobody would be mad. Github's
             | implementation is just a shortcut; some engineer tried
             | their flow against a test commit they made, it worked, they
             | checked it in, they shipped the feature. Then someone
             | thought "what if there are two author lines", and broke the
             | signing code. (Maybe write some fuzz tests that emit
             | headers and try signing them? You might not find this bug,
             | but it can help.)
             | 
             | The problem is the missing spec, but specs are useless if
             | not enforced, because people won't know that they
             | accidentally violated the spec. That's what's tough.
             | Postel's Law guarantees working software. But sometimes
             | working means failing open. (Failing closed can definitely
             | suck. Ever been late to work because of a "signal problem"?
             | The signalling system failed closed and prevented your
             | train from moving even though there wasn't any actual
             | reason to prevent movement. Postel's Law would say "it's
             | just a burned out light bulb; power ahead at line speed,
             | not expecting a train randomly misrouted and barreling into
             | you head on". 99.9% of the time, it's right! But there are
             | a lot of dead 0.1%-ers, which is after the early days, we
             | decided to make the system fail closed.)
        
           | surfingdino wrote:
           | It makes sense when you want rapid adoption, as was the case
           | with the Internet.
        
           | Arnavion wrote:
           | This still has the same problem. Your solution requires them
           | to program the first "Extract all author lines" step to
           | accept both lines, namely they would have had to make sure
           | their extractor considered author lines with a zero-length
           | author name to be "an author line". But if they had the
           | foresight to do that they could just as well have done it
           | with their original regex to begin with.
        
             | theamk wrote:
             | I don't see how?
             | 
             | As I said, actually parsing the line is the 2nd step. The
             | first step is to extract all lines which start from "author
             | ", without checking what the rest of the line looks like.
             | That could be something like starts_with?("author "); or
             | split by first space and check if first word is "author";
             | or verify against "^author " regex; or take first 7 chars
             | and compare against "author ". All of those methods would
             | easily catch the duplicate line in the attack and reject
             | it.
             | 
             | (you might be curious if simple "author", without any
             | argument, would be bad. First, even if it was, the impact
             | would be pretty low as there is no second name to inject.
             | Second, there is no need to worry: git actually checks for
             | space after "author" [0], simple "author\n" by itself would
             | be rejected)
             | 
             | [0] https://github.com/git/git/blob/e02ecfcc534e2021aae2907
             | 7a958...
        
               | Arnavion wrote:
               | >The first step is to extract all lines which start from
               | "author ", without checking what the rest of the line
               | looks like.
               | 
               | I'm saying that this already requires knowing that
               | "without checking what the rest of the line looks like"
               | is an important consideration. It would be just as valid
               | to define "extract all author lines" as "extract all
               | lines that have `author <value>`, as they did.
               | 
               | Only by knowing that the latter definition causes the bug
               | that the TFA found (either by thinking about it or by
               | testing against the canonical git implementation) would
               | it become known that that definition is wrong.
               | 
               | The problem is not the structure of the parser. The
               | problem is that they didn't check that their
               | implementation matches the behavior of the system it
               | speaks to (git).
               | 
               | Edit: Also covered in
               | https://news.ycombinator.com/item?id=39101419 /
               | https://news.ycombinator.com/item?id=39101736 /
               | https://news.ycombinator.com/item?id=39107945
        
         | alexjplant wrote:
         | I wrote a chintzy Preact app that pulls the source for a
         | Pokemon Red hack from Github and makes Pokemon stats, moves,
         | type matchups, etc. searchable [1] because enough has changed
         | from the original that I wanted a quick reference for my
         | current playthrough. I literally wrote in the TODOs in the
         | README                 Generate proper AST from asm source and
         | ditch janky Regex parsing?
         | 
         | because I felt guilty about doing things like
         | const match = l.match(/\s*move\s*(\w+),\s*(\w+),\s*(\d+),\s*(\w
         | +),\s*(\d+),\s*(\d+)\s*(.*)/)
         | 
         | Maybe I shouldn't be so hard on myself.
         | 
         | [1] https://github.com/alexjplant/rgbdex
        
           | willdr wrote:
           | A proper AST would "read" the source code programmatically,
           | as opposed to brittle regex? Is there any benefit to building
           | that when your regex works and what you're parsing is static?
           | (Genuine questions, my compsci background is not strong).
        
         | js2 wrote:
         | There's nothing inherently wrong with using a regular
         | expression here.
         | 
         | The security issue is that there are two different parsers and
         | they interpret invalid input differently. This is a parsing
         | differential vulnerability.
         | 
         | This security issue can (and does) occur whenever there is more
         | than one parser implementation.
        
           | theamk wrote:
           | git treats objects as a generic structure ("list of space-
           | separated name-value pairs, maybe with space continuations,
           | followed by double newline, followed my message") followed by
           | additional rules for individual values. Something very common
           | in older protocols -- see HTTP, deb control files, email,
           | etc...
           | 
           | And github code decided to short-circuit the process and
           | combine two steps into one, which introduced weird inter-
           | layer interactions where lines with good name but invalid
           | content would be silently rejected.
        
       | dboreham wrote:
       | If you use a regex in your application you need to write lots of
       | unit tests for it that include many negative test cases. Perhaps
       | even look for a test case generation tool/library that helps
       | automate this.
        
         | layer8 wrote:
         | The better solution is to have a formal specification of the
         | syntax, that the regular expression can straightforwardly be
         | derived from and be compared with.
        
       | captn3m0 wrote:
       | I've always avoided blind signing endpoints for this reason.
       | You're reducing the authenticity guarantee by a thousandfold, and
       | increasing so much complexity.
       | 
       | I don't think there's an easier way though short of not
       | supporting signatures in codespaces, and this definitely feels
       | like a feature where the Codespaces product team's roadmap was
       | prioritised over the Security team.
        
         | semiquaver wrote:
         | Now that SSH key signing is a thing, you might be able to use
         | SSH agent forwarding to sign commits without having to hand
         | over your private key.
        
           | namibj wrote:
           | ...GPG had such a feature for ages: it's AFAIK largely a
           | result from very modular/flexible smartcard/HSM support.
        
         | nightpool wrote:
         | It would be better to generate a user-and-codespaces-specific
         | private key that would be safe to give to the user by checking
         | out in the codespace so that you can use normal GPG signing
         | flows.
        
       | Taylor_OD wrote:
       | Signing commits is such a pain to set up. I'm convinced its such
       | a boring area with so little profit that no one has spent enough
       | time to make it more straight forward.
        
         | sitzkrieg wrote:
         | you can sign w ssh key w 2 git configs set. im not sure this
         | qualifies as pain
        
           | cryptonector wrote:
           | It's meaningless unless your keys are known to others to
           | speak for you. I.e., you have to participate in the PGP trust
           | mesh, and _that_ is what is a pain.
        
             | woodrowbarlow wrote:
             | isn't this what keybase is for?
        
             | arccy wrote:
             | or github.com/<username>.keys
        
             | Arnavion wrote:
             | https://datatracker.ietf.org/doc/draft-koch-openpgp-
             | webkey-s...
             | 
             | Serve the key at a certain URL from a webserver on your
             | email domain. Clients that trust the domain can trust the
             | key that it serves.
        
         | influx wrote:
         | 1Password makes it easy to do with ssh keys and biometrics.
         | It's actually pretty slick.
        
         | g4zj wrote:
         | I tend to agree with Linus Torvalds' statement about signing
         | tags rather than commits.
         | 
         | https://web.archive.org/web/20201111174438/http://git.661346...
        
         | Arnavion wrote:
         | What's hard about it? You just define `user.signingkey =
         | $key_id` and `commit.gpgsign = true` in `~/.config/git/config`
         | once and be done with it. Or only the first one if you want to
         | choose what you sign instead of signing everything.
         | 
         | Or are you talking about setting up GPG in general?
        
       | nightpool wrote:
       | Why does Github use the same GPG key for all users? Feels like it
       | makes these kind of confused deputy attacks endemic to the entire
       | platform rather than being contained to a single user account.
        
       | gwern wrote:
       | > Since the first author name is zero characters long, the regex
       | skips that line, and the fake second author line is used instead.
       | Git ignores extra author lines after the first, so Codespaces
       | looks at the second author line but Git looks at the first. This
       | means we can create GitHub-signed commits with any author
       | name+email.
       | 
       | Classic weird machine exploit of looking at two independent ad
       | hoc parsers and finding inputs where they desynchronize and enter
       | different states.
        
       | n4jm4 wrote:
       | Relying on regexes instead of thoroughly tested parser AST's
       | encourages vulnerabilities.
       | 
       | Are they really scraping logs for credentials instead of using an
       | API?
        
       | layer8 wrote:
       | This looks like a classic case of a parser mismatch vulnerability
       | [0].
       | 
       | [0] https://www.brainonfire.net/blog/2022/04/11/what-is-
       | parser-m...
        
         | jwilk wrote:
         | Discussed on HN:
         | 
         | https://news.ycombinator.com/item?id=38743029 (22 comments)
        
       | 1oooqooq wrote:
       | the fact they haven't published an analysis with every single
       | case this was abused, even if zero, in the past is proof enough
       | this was a feature to someone.
        
       ___________________________________________________________________
       (page generated 2024-01-23 23:01 UTC)