[HN Gopher] Forging signed commits on GitHub
___________________________________________________________________
Forging signed commits on GitHub
Author : rokkitmensch
Score : 277 points
Date : 2024-01-22 04:49 UTC (1 days ago)
(HTM) web link (iter.ca)
(TXT) w3m dump (iter.ca)
| YetAnotherNick wrote:
| Really nice and educational exploit. Regexes should almost never
| be used in parsing structured stuff in production. While it is
| developer friendly, it's very hard to think all the edge cases.
| _nalply wrote:
| What I tought, they should have used some more official library
| to manage git instead of doing the work themselves, but perhaps
| there was none?
| j16sdiz wrote:
| We have libgit2, but github have their own ruby
| implementation of everything.
|
| (java and rust have their own "native" implementation too.
| guappa wrote:
| libgit2 was written after github. So that might explain.
| bspammer wrote:
| libgit2's frontpage claims that GitHub does use them:
| https://libgit2.org/
| guappa wrote:
| Depends which part of github uses them. Might even be
| their GUI client...
| Zababa wrote:
| Are there alternatives that make it easier to think about all
| the edge cases?
| michaelt wrote:
| Keep using regular expressions, just stop using . and
| instead, define what is allowed.
|
| You're expecting an input to be a UUID? Check it with
| ^[a-fA-F0-9\\-]{30,40}$ and you know you're not getting any
| apostrophes or script tags or enormous inputs or empty inputs
| or newlines or lookalike characters or emojis.
| ylk wrote:
| I think re-implementing the functionality is the mistake
| here. A big counter-argument to "rewrite in rust" is usually
| that by rewriting you introduce new bugs.
|
| Especially for security critical things one should re-use the
| implementation to avoid issues like the above. Proving both
| the original and re-implemented parser to be equivalent would
| probably also work, but not sure how practical.
|
| And in any case have competent people audit what you did.
| avgcorrection wrote:
| How did we get from criticizing regexes to mocking "rewrite
| in rust" in the span of three comments?
| ylk wrote:
| I'm not mocking it. I'm just applying the argument I see
| many people on HN make in discussions about rust to this
| case, where I suspect many will lean much more on the
| side of using regexes to reimplement parsers. Don't know
| if that'll be the same people. Either way, I didn't want
| to have a discussion about rust.
| avgcorrection wrote:
| > I'm not mocking it. I'm just applying the argument I
| see many people on HN make in discussions about rust to
| this case, where I suspect many will lean much more on
| the side of using regexes to reimplement parsers.
|
| Hmm. Okay.
|
| The part that bewilders me is that Rust showed up. Which
| would have made sense if someone was complaining about
| C++ or something, or maybe unsafe memory management. But
| regexes?
|
| > Either way, I didn't want to have a discussion about
| rust.
|
| Which is why you brought it up unprompted... I give up.
| mistercheph wrote:
| Hey, watch what you're saying buddy, 'rewrite in rust' is
| what pays the bills in this family!
| akdor1154 wrote:
| Disagree here, the problem is their implementation was slightly
| different to Git's. There is more chance of stuffing up like
| this with a handrolled parser than with a regex. The only
| viable alternative i'd accept here to actually reduce risk is
| to call into libgit itself. Absent that, a regex is appropriate
| for this problem.
|
| There are many problems where regexes are succinct and
| appropriate, (of course there are many more where they are
| neither).
| YetAnotherNick wrote:
| Unless git uses regex, it is almost impossible to verify that
| the regex is same as their validation. What if git's
| validation changes. Even looking at the corrected regex I am
| not sure what would happen for strings like '<a@a.com>>' or
| '<' or 'x <a>'. Not saying the github's corrected regex is
| wrong, but it is a mental challenge to verify it.
| thrdbndndn wrote:
| When the data itself is just plain text, are there any other
| (better) ways?
| brokensegue wrote:
| Parsers?
| davemp wrote:
| I don't see how writing a parser from scratch would
| mitigate bugs vs using a regex parser. Parsers are just a
| hot spot for security bugs that should get extra scrutiny.
| semanticc wrote:
| Regex is the parser for regular languages.
| cedilla wrote:
| This isn't really a regex problem though. The regex probably
| did exactly what the dev intended, it's just that the dev
| failed to take into account that <author> could also be a 0
| length string.
|
| The same logic bug would have occured with a parser.
| account42 wrote:
| This is all speculation of course but a parser is more likely
| to recognize the author: prefix and then error out on
| anything unexpected in the rest of the line. Meanwhile
| regular expressions don't have any way to report errors.
| glenjamin wrote:
| Personally I'd have separated finding the Author line from
| pulling out the contents of that line - but still used
| Regex
| Vogtinator wrote:
| I treat "signed by GitHub's verified signature" equivalent to
| unsigned. It's not the author's signature, just some third
| party's.
| sureglymop wrote:
| One should treat git itself as insecure. When you sign a
| commit, the signature is based on the commit message and commit
| metadata including tree object_id and parent object_id. But if
| sha1 is used to generate commit hashes, one can theoretically
| forge a commit. This means that in an elaborate supply chain
| attack, one could spoof a commit with a valid signature. That
| signature would then still appear valid for the spoofed commit
| and probably make it seem more legitimate than it is.
| Anunayj wrote:
| For people who might be wondering why git hasn't moved to
| sha256 yet, here's a lwn article on it:
| https://lwn.net/Articles/898522/
| jacoblambda wrote:
| For a more recent bit of info on it (I remembered this
| thread from the latter half of last year on the mailing
| list):
|
| https://lore.kernel.org/git/2f5de416-04ba-c23d-1e0b-83bb655
| 8...
|
| The important snippet from this thread is:
|
| > On 6/29/23 07:59, Junio C Hamano wrote:
|
| > > Adam Majer <adamm@zombino.com> writes:
|
| > > > Is sha256 still considered experimental or can it be
| assumed to be stable?
|
| > > I do not think we would officially label SHA-256
| support as "stable" until we have good interoperability
| with SHA-1 repositories, but the expectation is that we
| will make reasonable effort to keep migration path for the
| current SHA-256 repositories, even if it turns out that its
| on-disk format need to be updated, to keep the end-user
| data safe.
|
| > That could be a different definition of stable. But I'm
| satisfied that current sha256 repositories will not end up
| incompatible with some future version of git without
| migration path (talking about on-disk format).
|
| > So maybe my question should be reworded to "is sha256
| still considered early stage, for testing purposes only
| with possible data-loss or can it be relied on for actual
| long lived repositories?"
|
| > > So while "no-longer-experimental" patch is probably a
| bit premature, the warning in flashing red letters to
| caution against any use other than testing may want to be
| toned down.
|
| > Agreed. I think it should be clear that SHA256 and SHA1
| repositories cannot share data at this point. The scary
| wording should be removed though, as currently it sounds
| like "data loss incoming and it's your fault" if one
| chooses sha256
|
| > - Adam
|
| TLDR: SHA256 works well and can be considered "beta". It's
| viable but you can't really share data between a SHA1 and
| SHA256 repo. So if your repo doesn't need to merge to/from
| existing sha1 repos and you aren't dealing with subtrees or
| anything fancy like that, you should be able to use SHA256.
| It is not expected to break your repo, it's reasonably well
| supported, and in the event a migration needs to occur,
| there will be support for that as well.
|
| So if you want to jump on the SHA256 train you absolutely
| can provided you are following a pretty normal/boring use
| case.
| theamk wrote:
| Note that while SHA1 itself is broken, git specifically has a
| code to detect attempts at collisions and prevent them, see
| all the sha1dc stuff [0]
|
| [0] https://news.ycombinator.com/item?id=17825441
| kevincox wrote:
| I think it adds value. It means that the commit was created
| using the author's credentials (to the extent that you trust
| GitHub and its lack of bugs). So the author can be trusted to
| some extent on these commits.
|
| Although extending this signing ability to arbitrary commits
| via the codespaces API seems weird as the commit is no longer
| generated by GitHub. It seems now that you could generate some
| fake merge that changed data in a surprising way. Previously
| you knew that merges generated by GitHub would be "clean". Not
| that this changes much in practice.
| cryptonector wrote:
| In particular it means "this was committed via the GitHub Web
| UI" _and_ "the author was authenticated to GitHub". But the
| latter part is not really any different from who pushed the
| commit. And clearly there is no value in this as long as
| GitHub doesn't make this feature more secure. Using regexp to
| parse the author line then ignoring author lines that don't
| match... yikes.
| kevincox wrote:
| But how do you know who pushed a given commit? I don't
| think it is recorded in the Git repository.
|
| I agree that "to the extent that you trust GitHub and its
| lack of bugs" is a big caveat. But it sill seems better to
| have this information than not to have it.
| semiquaver wrote:
| The badge is also displayed if the commit was signed
| locally by a GPG key whose public component is uploaded to
| GitHub by the the claimed author and committer.
| orblivion wrote:
| I realized there is one benefit that this offers: Github
| attests to the time that it happened.
|
| Supposing you're looking at a popular repository, one where
| a malicious commit would likely be noticed eventually. The
| last commit was "one month ago". What's to say someone
| didn't compromise the developer's computer, sign a
| malicious commit backdated by a month, and push it to
| Github? If the last commit was made via the Github UI, you
| have pretty good assurance (i.e. as much as you trust
| Github not to get hacked) that this didn't happen.
|
| Even better if the previous commit was done by the author,
| and the Github UI commit is trivially confirmed as safe.
| That way, you can confirm the author's commit locally, in
| case Github is the one that got hacked.
|
| If both the author and Github got hacked, :shrug: I guess
| that's a pretty skilled adversary.
|
| Caveat: All of the above is my own analysis. I'm curious if
| there are flaws in my thinking here.
| kyrofa wrote:
| > It means that the commit was created using the author's
| credentials (to the extent that you trust GitHub and its lack
| of bugs).
|
| This is incorrect. If I create a merge request, and then the
| maintainer clicks the "squash and merge" button, that commit
| is associated with me, even though I'm not actually the
| creator of that commit at that point: the maintainer is. I
| believe this happens even if someone else (e.g. the
| maintainer) pushed commits to that merge request before
| squashing them together.
| kyrofa wrote:
| Yeah I've always found that to be frustrating. Someone squashes
| and merges MY commits, and github still shows them as validly
| signed, even though they aren't the commits I created. Sadly
| it's hard to actually filter those out: the UI makes them all
| look the same.
| photoGrant wrote:
| > April 24 2023: GitHub closes the issue, saying that being able
| to impersonate your own account is not an issue
|
| I mean, fear & curiosity would have me testing it before closing.
| Wha?
| michaelt wrote:
| As I understand things, everyone who runs a bug bounty gets an
| endless stream of useless reports telling them things like
| "your website sends the server header which makes you
| vulnerable to hacking, bounty please"
|
| Therefore, they have _very_ junior people assigned to triage
| bug reports. And those very junior people are in the habit of
| closing 99.9% of reports as not-really-a-security-problem.
| dns_snek wrote:
| Anecdote time, I discovered a bug where a _bank_ app would
| continue receiving 2FA notification prompts even after the
| user terminated the session from another device (e.g. after
| realizing their device was stolen). The app was apparently
| still holding onto some kind of a valid token that could be
| used to approve 2FA requests despite being "logged out".
|
| I submitted detailed reproduction steps, with a video clearly
| showing the bug. Triage claimed that this was "intended
| behavior" and as far as I'm aware, the bank didn't even see
| my report.
| zero_k wrote:
| Haha yes, that's the norm. They don't care. Report to the
| regulator. Then they care. Find out who the regulator is in
| your country, and report to them. Kinda annoying, but when
| the regulator tells them they need to do it, it magically
| gets done.
|
| Note that the regulator doesn't threaten with childish
| things like a fine of 1M USD, that's kids' playground
| stuff. They threaten to revoke their banking license, which
| will cost them 100x more per day :)
| tedivm wrote:
| I've found that responding "Well, if that's intended
| behavior you don't mind if I blog about it then? The post
| will go up in a week." tends to help them take it more
| seriously.
| hannob wrote:
| Sorry, if that's your security process, it's broken. If
| you're microsoft and that's your security process, you have
| no excuse and should be embarrassed. However, I heard such
| stories about Microsoft's security process plenty of times
| before, so I guess that's what it is.
| Sohcahtoa82 wrote:
| > "your website sends the server header which makes you
| vulnerable to hacking, bounty please"
|
| Getting nightmare flashbacks at my last job.
|
| My CISO insisted _everything_ on our penetration test report
| get remediated. Even "Information" level items, like the
| server header. And port 443 being open. _facepalm_
| asmor wrote:
| GitHub closing any security issue by default sure is a thing.
|
| I reported an issue that was not fun to reproduce in their SAML
| implementation, where they would include organization membership
| in OAuth tokens even if you didn't have a SAML session, so if you
| did device posture in your SAML provider to protect from access
| to code, installing apps that allow you to get code (e.g.
| SonarCloud) would allow bypass.
|
| The weirdest thing is that they had an endpoint that did check if
| there was a SAML session (membership API would 403 without), but
| I still got told this wasn't a bug and within expected behavior
| (not for our CISO it wasn't, and it's still not documented).
|
| Shoutout to Tailscale for sending me some stuff despite it not
| being an issue in their implementation.
| datadeft wrote:
| > GitHub fixed the issue by changing the problematic regex
|
| Why am I not surprised.
| asimpletune wrote:
| One crucial thing that enabled this work is being able to
| decompile the enterprise version of the GitHub binary. It makes
| you think that distributing a binary of your app is either a good
| idea or bad idea, depending on how you view security. If your app
| is important though then it's definitely a good idea.
| greatgib wrote:
| At the opposite, thanks to that they now have a safer platform.
|
| At the opposite, there could still be nefarious insiders that
| discover that and abuse of it or resell the exploit for years
| without anyone noticing.
|
| Especially for a proprietary service, as a customer, for me it
| is an added bonus point to know that the thing can and is
| battle tested by outsiders...
| epcoa wrote:
| > [was] being able to decompile the enterprise version of the
| GitHub binary.
|
| > It makes you think that distributing a binary of your app is
| either a good idea or bad idea,
|
| > If your app is important though then it's definitely a good
| idea.
|
| I don't follow. What are you saying would have been the good
| idea here?
|
| It also isn't clear to me that they required the code except to
| answer why it works, certainly this parsing defect could be
| discovered simply by testing the endpoint.
| asimpletune wrote:
| That distributing a binary app was a good idea. The attacker
| found and fixed the vulnerability.
|
| Of course it wasn't required to find the vulnerability,
| however if you make it then it's more likely that good guys
| will find and fix stuff. The contrary would be it's difficult
| to find this stuff, so then only highly motivated bad guys do
| it, because at that point 10 grand might not be worth it.
|
| Again, this was all prefaced with it depends. Some might see
| it as making it easier to find problems, and others might say
| 'yeah, exactly.". It's just an opinion. A fact is however
| that a binary makes it easier. Source code would be even
| easier.
| xboxnolifes wrote:
| But what alternative are you suggesting? Not having apps?
| rsanek wrote:
| if you don't offer a self hosted version you don't have
| to distribute anything. just force people to use your
| cloud product
| avgcorrection wrote:
| I don't like how GitHub changes my committer to them in the name
| of being Verified. I don't care about it being Verified by GitHub
| (yes, even if it had always worked and not allowed spoofing).
|
| It's weird that GitHub has such a long history of cooperation
| with the Git project and yet just rewriting the committer is
| considered okay to them.
| sp1rit wrote:
| It's common practice to have the committer not match the author
| (especially in environments where only patches/diffs are sent).
| I can't imagine the git project having any issue with that,
| given that the author field stays retained (ignoring this bug
| ofc.).
| avgcorrection wrote:
| I tried to do some more research on this. Unfortunately the
| search combination
|
| > rewrite commit github verified
|
| is so insanely SEO-poisoned by (1) StackOverflow questions
| about how to rewrite commits and (2) GitHub's docs about
| _regular_ (built-in Git commit verification) commit
| verification.
|
| But I thought about it some more. And I guess them hijacking
| the committer on pull request merges is not that insanely
| bothersome. It's their platform after all.
|
| Just yet another reason to not interact with forges in a way
| that affects Git itself.
|
| But another thing I've noticed is that sometimes my committer
| becomes the stupid initial (legacy) email I used to sign up
| there. Why in the hell? My current email is already
| registered there. But I tested this now and at least I got
| the option to merge as one of my addresses. So I guess it was
| fixed?
|
| > It's common practice to have the committer not match the
| author
|
| When the committer and author are different people. And
| apparently when GitHub tries to insert itself as a pseudo-
| person.
|
| I expected that my own actions would be tied to my own
| identity on GitHub. Not for them to shoehorn GitHub
| Verified(tm) into commit objects that I create through
| clickity-clacking around their UI. But it is after all their
| UI and they can poison commits however they like since _they_
| create it (i.e. I didn't create it and send it to them; then
| I would have noticed if they tried to rewrite it). So shame
| on me.
|
| > especially in environments where only patches/diffs are
| sent
|
| Clearly doesn't apply here.
|
| > I can't imagine the git project having any issue with that,
| given that the author field stays retained (ignoring this bug
| ofc.).
|
| Given that they don't use GitHub for anything directly
| related to Git (PRs and such) and that they have their own
| way of indicating code provenance: No, I guess they have no
| reason to care. (Other than in spirit.)
|
| But it matters in general (in spirit) that if I pick up some
| patch by Jack Cooper and apply it then the committer is _me_.
| Not outlook.com. Or whatever program did it for me.
|
| The commit object has a few metadata fields. I've never seen
| anyone say, "Oh yeah sure, just use that field for whatever
| it's not like it matters anyway".
| brasic wrote:
| The web-flow signing system is for users' convenience in places
| where it's not feasible to sign the commit with their own
| private key: commits made in the web interface or on an
| ephemeral GH-provisioned VM (codespace). For the latter, you
| are free to send your own private key to your codespace so you
| can sign your own commits but GitHub cannot because they don't
| have your private key and don't want to have it. Defaults
| matter and signed commits are important.
|
| As a sibling notes, this use case and similar ones is the
| reason the committer field exists as distinct from the author
| field. I think a $10K bounty for this bug speaks to how
| seriously they stand behind the fact that they will only sign
| and mark as verified commits whose author field matches an
| authenticated user.
|
| (Disclaimer: former GH employee)
| avgcorrection wrote:
| > The web-flow signing system is for users' convenience in
| places where it's not feasible to sign the commit with their
| own private key:
|
| Who signs _all_ their commits? Joey Hess maybe? There are
| certainly others. But I've never seen anyone make a case for
| this. In fact only negative cases since it just encourages
| you to automate your signing process, which many are not
| comfortable with.[1]
|
| I'm not important enough to sign anything.
|
| On Bitbucket we push the big merge button and out comes a
| commit with the _correct_ person attributed to it.[2] Even
| Atlassian manages to do this the correct way.
|
| > For the latter, you are free to send your own private key
| to your codespace so you can sign your own commits but
|
| Yeah GPG/SSH sign commits... who cares. Most people don't.
|
| > Defaults matter and signed commits are important.
|
| I don't care about your opinion.
|
| I wouldn't mind if this was an option that I could opt out
| of. (I'm wondering out loud, not asking you or anyone else.)
| I just haven't heard of it yet.
|
| I'm a Git user after all so I'm used to changing bad
| defaults.
|
| > As a sibling notes, this use case and similar ones is the
| reason the committer field exists as distinct from the author
| field.
|
| Quite a leap to go from attributing emailed-around patches to
| the correct author while also maintaining the committer (like
| the maintainer) to what looks equivalent to Norton Antivirus
| junk output stuffed 40 lines into someone's email signature.
|
| > I think a $10K bounty for this bug speaks to how seriously
| they stand behind the fact that they will only sign and mark
| as verified commits whose author field matches an
| authenticated user.
|
| "I think the price they put on this SPOOFING vulnerability
| speaks to how serious they are about verified commits", they
| said without irony.
|
| "Sent from my GitHub", ah they all felt at-ease
| immediately... wait the same platform that had a spoofing
| vulnerability?
|
| [1] Well, allegedly. I have never signed _anything_ so I
| don't know.
|
| [2] They committed it too. Or wait. Was that the merge
| button?
| oefrha wrote:
| If you contribute to the Git project itself, or other major
| git-using projects like the Linux kernel, you'll get completely
| accustomed to having your commits committed by someone else.
|
| Having a long history of cooperation with the Git project
| should only increase your confidence in doing that.
|
| Also, don't use the web UI to commit if you want to sign with
| your key, simple.
| avgcorrection wrote:
| I know that Junio C Hamano applies all patches himself to
| git.git and thus is the committer on all commits.[1] That is
| _how Git works_ when picking out patches from a mailbox.
|
| The are all committed by _people_ , not some middle man
| program.
|
| Just like I can click the big merge button _myself_ and then
| it is committed by... oh by mr noreply?
|
| Try to find some committer or author in the git.git project
| that has a name like "Verified By Yahoo! <jibber jabber hash
| nonense>"? Good luck. (Almost like that kind of forge
| silliness was never accepted into any of their workflows...
| yet they are somehow very comparable, in your mind.)
|
| This is like trying to explain to some Outlook representative
| that no one cares if they "verify" emails that go through
| them with some proprietary DKIM headers that only they know
| and care about.[2] "Well actually, if you understood how
| email headers work then you would see that it is not at all
| unlike......"
|
| > Also, don't use the web UI to commit if you want to sign
| with your key, simple.
|
| Thank you for the tip.
|
| [1] With the few pull request by way of email exceptions
|
| [2] Made up but not implausible example
| mook wrote:
| You may have pressed the button, but the actual commit
| action was done by GitHub's infrastructure, hence they are
| listed as committer. Think of it as an acknowledgment that
| maybe they were hacked and made to do the commit even if
| you didn't want it.
|
| If you want to do the commit yourself, you can run `git
| merge` locally and push the result. They won't touch the
| commit (including the committer) in that case, because the
| commit hash must never change. I'm not sure if they track
| who did the push (or even what the sensible value would be
| if two people pushed the same commit to two different
| forks).
| jaimex2 wrote:
| Is there anything more pointless than signed commits? If your
| security team ever enforced it you're probably already pwned but
| may as well replace them just in case.
| samcat116 wrote:
| Why do you say that? It seems like they would be more important
| in the context of supply chain attacks.
| tumetab1 wrote:
| (2023)
| richbell wrote:
| > Since the first author name is zero characters long, the regex
| skips that line, and the fake second author line is used instead.
| Git ignores extra author lines after the first, so Codespaces
| looks at the second author line but Git looks at the first.
|
| How common is it to have multiple author lines? I haven't seen
| that before, only co-author footer[1].
|
| [1]: https://docs.github.com/en/pull-requests/committing-
| changes-...
| avgcorrection wrote:
| Apparently multiple authors is invalid.[1]
|
| [1] https://stackoverflow.com/q/65445147/1725151
| hannob wrote:
| It's wild that we're in 2024, and there's a security issue that
| is essentially "we parse data in an ad-hoc format with an ad-hoc
| parser based on regexps, and we got that wrong". And the solution
| is "we tweaked the regexp", and not "we properly
| specified/documented the format (or replaced it with something
| properly specified), and wrote a parser that correctly implements
| the specification".
| meandmycode wrote:
| Especially wild for GitHub, I would have thought they had the
| entirety of 'git foo' codified up with actual parsers by now,
| but I guess a regex will always do..
| formerly_proven wrote:
| Well the user-git-facing parts of GitHub _are_ built on
| libgit2 and libssh.
| hedora wrote:
| This doesn't surprise me at all coming from github. I've
| tried using their public api's, and they're all shockingly
| bad. In some cases I thought it was my mistake, but then I
| checked their website, and found it was bug for bug
| compatible with the thing I'd built.
|
| My usual rule of thumb for choosing products is to go with
| whatever rising alternative there is to the entrenched
| monopoly. The little one needs to be much better to overcome
| monopoly effects.
|
| I don't sign my own paycheck, but would be moving to gitlab
| or something if I did.
| sanjayjc wrote:
| I must be missing something because the corrected regex
| (/\Aauthor ([^<]*)[ ]{0,1}<(.+)>/) seems to behave the same as
| the problematic one: https://regexr.com/7qrum
|
| I wonder whether github may not be using PCRE.
| jwilk wrote:
| The problem wasn't that the regexp matched the second
| "author" line, but that it didn't match the first one.
| TravisCooper wrote:
| This is the correct take.
| theamk wrote:
| Even if I had to use regex, I'd at least do it in 2 stages:
|
| - Extract all "author" lines, fail-fast if there is more than
| one
|
| - Parse the line, again fail-fast if there is anything strange
|
| this would just stop this, and similar, attacks completely.
| Postel's law has no place in modern development if we want
| things to be secure.
| HideousKojima wrote:
| >Postel's law
|
| Also known as "make other people's incompetence and inability
| to conform to a spec _your_ problem. "
| avgcorrection wrote:
| Yeah that "law" makes absolutely zero sense to me. It's
| like "be nice" but in a self-sacrificial way which then
| indirectly sacrifices other people in the long run.
| cryptonector wrote:
| It's from an era when interoperability in a then-very-
| small universe was very important to moving the Internet
| forward. Back then there were no spammers, and very few
| bad actors. It's possible that the "Postel law" helped
| for a while -- hard to say for sure though.
| cortesoft wrote:
| It makes sense for a lot of situations purely out of
| practicality. If a good portion of your customers use
| tools that break spec, your choices are to accept the
| brokenness or lose the customer.
|
| Take browsers, for example. A lot of people complain that
| browsers try to work around broken certificate setups on
| servers, but if they didn't, users would just switch to
| browsers that did.
| avgcorrection wrote:
| > It makes sense for a lot of situations purely out of
| practicality. If a good portion of your customers use
| tools that break spec, your choices are to accept the
| brokenness or lose the customer.
|
| Practical? So the adage is just "be liberal because you
| have to out of practical necessity"? Are you sure that's
| the history behind it?
| nradov wrote:
| You're not wrong, but sometimes you have to be pragmatic.
| I've done a lot of work building interfaces to acquire
| clinical data from healthcare provider organizations for
| care quality improvement. There are clear industry
| standards from HL7 (V2 Messaging, CDA, FHIR) but many
| provider organizations don't implement them correctly, so
| we had to accept a lot of junk and just find a way to
| deal with it. Many of those provider organizations lack
| IT resources and are at the mercy of their vendors or
| consultants, so trying to hold them to strict standards
| results in an impasse.
| treflop wrote:
| Idk it makes sense for UX sometimes. Really it's a case
| by case basis and I don't think you should be adhering to
| any law for this.
| a1o wrote:
| > In computing, the robustness principle is a design
| guideline for software that states: "be conservative in what
| you do, be liberal in what you accept from others". It is
| often reworded as: "be conservative in what you send, be
| liberal in what you accept". The principle is also known as
| Postel's law, after Jon Postel, who used the wording in an
| early specification of TCP.
|
| I didn't know, so if anyone wants to know, here's the summary
| from Wikipedia.
|
| https://en.wikipedia.org/wiki/Robustness_principle
| jrockway wrote:
| It's a tough balance. Postel's law is from a world before
| security mattered. It is incompatible with correctness.
|
| The early Internet definitely wouldn't have worked without
| a thick layer of "you set this flag wrong in the header but
| I know what you meant". But these liberal formats have been
| the death of many a "0 remote root holes in the default
| install" OS slogan ;)
|
| Git, meanwhile, would be something like:
| $ git clone example.com/cool-code FATAL: commit
| abc123: 2 errors: header.author: too many
| authors in header header.author[0]: empty
| friendly name checkout aborted, contact the
| upstream git repo and tell 'em it's broke
|
| People would be mad and would stop using Git, so "git
| clone" has to accept whatever.
|
| But the signature generating-code doesn't need to be this
| liberal; it can run this validation and say "I'm not going
| to sign that", and nobody would be mad. Github's
| implementation is just a shortcut; some engineer tried
| their flow against a test commit they made, it worked, they
| checked it in, they shipped the feature. Then someone
| thought "what if there are two author lines", and broke the
| signing code. (Maybe write some fuzz tests that emit
| headers and try signing them? You might not find this bug,
| but it can help.)
|
| The problem is the missing spec, but specs are useless if
| not enforced, because people won't know that they
| accidentally violated the spec. That's what's tough.
| Postel's Law guarantees working software. But sometimes
| working means failing open. (Failing closed can definitely
| suck. Ever been late to work because of a "signal problem"?
| The signalling system failed closed and prevented your
| train from moving even though there wasn't any actual
| reason to prevent movement. Postel's Law would say "it's
| just a burned out light bulb; power ahead at line speed,
| not expecting a train randomly misrouted and barreling into
| you head on". 99.9% of the time, it's right! But there are
| a lot of dead 0.1%-ers, which is after the early days, we
| decided to make the system fail closed.)
| surfingdino wrote:
| It makes sense when you want rapid adoption, as was the case
| with the Internet.
| Arnavion wrote:
| This still has the same problem. Your solution requires them
| to program the first "Extract all author lines" step to
| accept both lines, namely they would have had to make sure
| their extractor considered author lines with a zero-length
| author name to be "an author line". But if they had the
| foresight to do that they could just as well have done it
| with their original regex to begin with.
| theamk wrote:
| I don't see how?
|
| As I said, actually parsing the line is the 2nd step. The
| first step is to extract all lines which start from "author
| ", without checking what the rest of the line looks like.
| That could be something like starts_with?("author "); or
| split by first space and check if first word is "author";
| or verify against "^author " regex; or take first 7 chars
| and compare against "author ". All of those methods would
| easily catch the duplicate line in the attack and reject
| it.
|
| (you might be curious if simple "author", without any
| argument, would be bad. First, even if it was, the impact
| would be pretty low as there is no second name to inject.
| Second, there is no need to worry: git actually checks for
| space after "author" [0], simple "author\n" by itself would
| be rejected)
|
| [0] https://github.com/git/git/blob/e02ecfcc534e2021aae2907
| 7a958...
| Arnavion wrote:
| >The first step is to extract all lines which start from
| "author ", without checking what the rest of the line
| looks like.
|
| I'm saying that this already requires knowing that
| "without checking what the rest of the line looks like"
| is an important consideration. It would be just as valid
| to define "extract all author lines" as "extract all
| lines that have `author <value>`, as they did.
|
| Only by knowing that the latter definition causes the bug
| that the TFA found (either by thinking about it or by
| testing against the canonical git implementation) would
| it become known that that definition is wrong.
|
| The problem is not the structure of the parser. The
| problem is that they didn't check that their
| implementation matches the behavior of the system it
| speaks to (git).
|
| Edit: Also covered in
| https://news.ycombinator.com/item?id=39101419 /
| https://news.ycombinator.com/item?id=39101736 /
| https://news.ycombinator.com/item?id=39107945
| alexjplant wrote:
| I wrote a chintzy Preact app that pulls the source for a
| Pokemon Red hack from Github and makes Pokemon stats, moves,
| type matchups, etc. searchable [1] because enough has changed
| from the original that I wanted a quick reference for my
| current playthrough. I literally wrote in the TODOs in the
| README Generate proper AST from asm source and
| ditch janky Regex parsing?
|
| because I felt guilty about doing things like
| const match = l.match(/\s*move\s*(\w+),\s*(\w+),\s*(\d+),\s*(\w
| +),\s*(\d+),\s*(\d+)\s*(.*)/)
|
| Maybe I shouldn't be so hard on myself.
|
| [1] https://github.com/alexjplant/rgbdex
| willdr wrote:
| A proper AST would "read" the source code programmatically,
| as opposed to brittle regex? Is there any benefit to building
| that when your regex works and what you're parsing is static?
| (Genuine questions, my compsci background is not strong).
| js2 wrote:
| There's nothing inherently wrong with using a regular
| expression here.
|
| The security issue is that there are two different parsers and
| they interpret invalid input differently. This is a parsing
| differential vulnerability.
|
| This security issue can (and does) occur whenever there is more
| than one parser implementation.
| theamk wrote:
| git treats objects as a generic structure ("list of space-
| separated name-value pairs, maybe with space continuations,
| followed by double newline, followed my message") followed by
| additional rules for individual values. Something very common
| in older protocols -- see HTTP, deb control files, email,
| etc...
|
| And github code decided to short-circuit the process and
| combine two steps into one, which introduced weird inter-
| layer interactions where lines with good name but invalid
| content would be silently rejected.
| dboreham wrote:
| If you use a regex in your application you need to write lots of
| unit tests for it that include many negative test cases. Perhaps
| even look for a test case generation tool/library that helps
| automate this.
| layer8 wrote:
| The better solution is to have a formal specification of the
| syntax, that the regular expression can straightforwardly be
| derived from and be compared with.
| captn3m0 wrote:
| I've always avoided blind signing endpoints for this reason.
| You're reducing the authenticity guarantee by a thousandfold, and
| increasing so much complexity.
|
| I don't think there's an easier way though short of not
| supporting signatures in codespaces, and this definitely feels
| like a feature where the Codespaces product team's roadmap was
| prioritised over the Security team.
| semiquaver wrote:
| Now that SSH key signing is a thing, you might be able to use
| SSH agent forwarding to sign commits without having to hand
| over your private key.
| namibj wrote:
| ...GPG had such a feature for ages: it's AFAIK largely a
| result from very modular/flexible smartcard/HSM support.
| nightpool wrote:
| It would be better to generate a user-and-codespaces-specific
| private key that would be safe to give to the user by checking
| out in the codespace so that you can use normal GPG signing
| flows.
| Taylor_OD wrote:
| Signing commits is such a pain to set up. I'm convinced its such
| a boring area with so little profit that no one has spent enough
| time to make it more straight forward.
| sitzkrieg wrote:
| you can sign w ssh key w 2 git configs set. im not sure this
| qualifies as pain
| cryptonector wrote:
| It's meaningless unless your keys are known to others to
| speak for you. I.e., you have to participate in the PGP trust
| mesh, and _that_ is what is a pain.
| woodrowbarlow wrote:
| isn't this what keybase is for?
| arccy wrote:
| or github.com/<username>.keys
| Arnavion wrote:
| https://datatracker.ietf.org/doc/draft-koch-openpgp-
| webkey-s...
|
| Serve the key at a certain URL from a webserver on your
| email domain. Clients that trust the domain can trust the
| key that it serves.
| influx wrote:
| 1Password makes it easy to do with ssh keys and biometrics.
| It's actually pretty slick.
| g4zj wrote:
| I tend to agree with Linus Torvalds' statement about signing
| tags rather than commits.
|
| https://web.archive.org/web/20201111174438/http://git.661346...
| Arnavion wrote:
| What's hard about it? You just define `user.signingkey =
| $key_id` and `commit.gpgsign = true` in `~/.config/git/config`
| once and be done with it. Or only the first one if you want to
| choose what you sign instead of signing everything.
|
| Or are you talking about setting up GPG in general?
| nightpool wrote:
| Why does Github use the same GPG key for all users? Feels like it
| makes these kind of confused deputy attacks endemic to the entire
| platform rather than being contained to a single user account.
| gwern wrote:
| > Since the first author name is zero characters long, the regex
| skips that line, and the fake second author line is used instead.
| Git ignores extra author lines after the first, so Codespaces
| looks at the second author line but Git looks at the first. This
| means we can create GitHub-signed commits with any author
| name+email.
|
| Classic weird machine exploit of looking at two independent ad
| hoc parsers and finding inputs where they desynchronize and enter
| different states.
| n4jm4 wrote:
| Relying on regexes instead of thoroughly tested parser AST's
| encourages vulnerabilities.
|
| Are they really scraping logs for credentials instead of using an
| API?
| layer8 wrote:
| This looks like a classic case of a parser mismatch vulnerability
| [0].
|
| [0] https://www.brainonfire.net/blog/2022/04/11/what-is-
| parser-m...
| jwilk wrote:
| Discussed on HN:
|
| https://news.ycombinator.com/item?id=38743029 (22 comments)
| 1oooqooq wrote:
| the fact they haven't published an analysis with every single
| case this was abused, even if zero, in the past is proof enough
| this was a feature to someone.
___________________________________________________________________
(page generated 2024-01-23 23:01 UTC)