[HN Gopher] Behind GitHub's new authentication token formats
       ___________________________________________________________________
        
       Behind GitHub's new authentication token formats
        
       Author : todsacerdoti
       Score  : 146 points
       Date   : 2021-04-05 16:32 UTC (6 hours ago)
        
 (HTM) web link (github.blog)
 (TXT) w3m dump (github.blog)
        
       | RyJones wrote:
       | ok, great, when will I be able to scope tokens to acting on only
       | one repo, or only one org, or in any meaningful way?
        
         | nathanaldensr wrote:
         | I completely agree. This change is interesting and all, but
         | GitHub (Enterprise, in my case) tokens aren't granular enough.
         | There's a lot more benefit to be had in fixing that issue.
        
         | tptacek wrote:
         | I'm sure this is a valid complaint about Github, but it has
         | nothing to do with the article, which is a bit annoying since
         | there's some cleverness in that article (checksumming tokens,
         | for instance) that we could be talking about stealing, rather
         | than turning this thread into a generic referendum on whether
         | Github is good.
        
           | chatmasta wrote:
           | The article is about GitHub's authentication tokens. It seems
           | relevant to bring up a complaint about their scopes.
        
             | tptacek wrote:
             | And yet it's not, because this is an article about token
             | formats, and not about entire authorization schemes.
        
           | threeseed wrote:
           | a) I don't see what is particularly clever about their token
           | algorithm.
           | 
           | b) Talking about fundamental flaws in their token
           | implementation seems relevant in a discussion about their
           | token implementation.
           | 
           | c) Public discussions about flaws are often the best way to
           | educate others and make the company aware of them.
        
         | paxys wrote:
         | You can already do that today. Create a Github App and add it
         | only to a single repo.
        
         | chatmasta wrote:
         | I definitely agree this is needed, but I have to imagine it's
         | quite a complex change on GitHub's side. It probably entails
         | changing a lot of their authentication architecture, in terms
         | of what's stateless and stateful and what requires a trip to
         | the DB to check (i.e., it's hard to encode a list of which
         | resources you should have access to in a single token). I'm
         | sure they're aware this is a problem though. Maybe this recent
         | change sets the groundwork for fixing it.
        
           | RyJones wrote:
           | That the only method I have to scope tokens is to break the
           | TOS for GitHub - by creating single-use accounts - is super
           | lame.
        
             | threeseed wrote:
             | Don't forget that creating new accounts costs money.
             | 
             | If you want to take advantage of Environments for example
             | then you need to pay for the Enterprise license which means
             | every account is another $21/month. That adds up for
             | individual and startup use.
             | 
             | I am actually confused why we can't just have tokens
             | assigned to organisations and not users.
        
             | JamesSwift wrote:
             | What TOS does it violate? They explicitly recommend
             | creating machine users in certain areas of their help docs.
        
               | jpalomaki wrote:
               | This one? "you may not have more than one free Account"
               | 
               | https://docs.github.com/en/github/site-policy/github-terms-o...
               | 
               | But there's also later more detailed guidance:
               | 
               | "One person or legal entity may maintain no more than one
               | free Account (if you choose to control a machine account
               | as well, that's fine, but it can only be used for running
               | a machine)."
        
               | kroltan wrote:
               | Presumably then it would be non-violating to have many
               | paid accounts? Sure is pricey for smaller operations but
               | depending on the value it brings to have this more
               | granular scoping it surely might be worth it?
               | 
               | Doesn't remove their need to implement a more reasonable
               | way of scoping tokens, though.
        
               | tptacek wrote:
               | Whatever the letter of this restriction is, that's not
               | its spirit. Our practice at my last company was per-
               | client segregated accounts, and I have a mailbox full of
               | discussions with Github support staff telling us that was
               | OK.
        
               | marcinzm wrote:
               | This seems like a classic case of rules with high
               | potential for selective enforcement which generally leads
               | to unfair enforcement. It's fine as long as you don't
               | somehow get on Github's bad side and if you do it's an
               | instant reason to close your accounts.
        
               | ArchOversight wrote:
               | This means that if you create a work account that is
               | separate from your personal account (because you don't
               | use personal credentials on work machines and vice-versa)
               | then you are technically in violation of the TOS...
               | 
               | Yet this is something I, and many others, do because we
               | don't want to mix business with pleasure. In fact I
               | absolutely refuse to do so because of security reasons.
        
               | Arnavion wrote:
               | Many Microsoft employees also have separate personal and
               | work accounts. I would be surprised if that was a
               | violation.
        
               | masklinn wrote:
               | It is explicitly a technical violation of the TOS.
               | 
               | In practice this is mostly so github has a reason to act
               | against misuse, such as bots, that might not be caught by
               | the anti-spam measures.
               | 
               | They do mind a bit but not too much, and you can get help
               | from support having literally stated that you have
               | multiple accounts. For instance if you're testing an
               | extension or integration with github, and there are
               | specific interactions between different users... you
               | kinda need different users to test it. And mocking github
               | may not be sufficient.
        
               | Arnavion wrote:
               | Yes. I'm talking specifically about the "separate
               | personal and work accounts" case. It may very well be a
               | TOS violation as the TOS is written. I'm saying I'd be
               | surprised if they treated it as one.
        
               | chatmasta wrote:
               | I'm actually dealing with something like this right now
               | and am curious what solutions people use for e2e testing
               | of OAuth flows. I'm leaning toward creating a test
               | account at each Identity Provider, but then I have to
               | deal with things like 2FA. I guess it's not so bad if I
               | just use a TOTP generator on the client, but if they want
               | to send an email to verify my account, that's just
               | annoying.
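
For the "TOTP generator on the client" idea above, a minimal sketch using only the standard library might look like this. It assumes the test account's secret is available as a padded base32 string and that the provider uses the common defaults (HMAC-SHA1, 30-second step, 6 digits); the function name `totp` is hypothetical.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for a base32 secret."""
    key = base64.b32decode(secret_b32.upper())
    # Number of whole time steps since the Unix epoch.
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation as described in RFC 4226.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

In an e2e test, `totp(secret)` would be called right before submitting the 2FA form; passing `at` explicitly keeps the function deterministic for unit tests.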
        
               | derefr wrote:
               | Aren't the work accounts paid accounts? The TOS only
               | restricts having multiple free accounts.
        
               | ArchOversight wrote:
               | No, the work accounts are not paid accounts. My work
               | requires me to interact with Open Source projects and the
               | like. We have our own hosted Gitlab instance for our
               | internal projects.
               | 
               | The work account is strictly to communicate with/provide
               | patches back to upstream projects.
               | 
               | I have multiple personal accounts on Github. One for each
               | employer I have worked at in the past couple of years,
               | and my personal account that is tied to my own identity
               | and is used for my personal time projects/open source
               | work that is not tied to $work.
        
               | TheSoftwareGuy wrote:
               | Well the solution is simple, just make a new LLC for each
               | GitHub account you need!
               | 
               | I'm definitely kidding, but unless there is more in their
               | TOS (which I don't intend to read) I don't see why this
               | wouldn't be a workable loophole.
        
           | threeseed wrote:
           | GitHub for me is running out of excuses for why the
           | fundamentals of their platform are so poor. And why compared
           | to Gitlab they deliver improvements at such an anaemic pace.
           | They act like a company with 15 employees rather than 1,500+.
           | 
           | Everything from Security, Actions, Containers, Packages,
           | Terraform, JIRA Integration etc is either completely broken
           | or has major outstanding issues that haven't been fixed for
           | years.
        
             | chatmasta wrote:
             | Personally, we use GitLab for all the internal repos at our
             | company. We originally migrated because GitLab CI was free
             | and GitHub didn't even have a CI solution at the time. We
             | still use GitHub for our public repos since that's "where
             | the community is." GitHub actions is great, although IMO a
             | bit too prescriptive (I'd rather write a script that can
             | run anywhere rather than spend time building up the mental
             | model of the abstractions that are unique to GHA). But
             | nothing beats GitLab CI + container registry; we put a lot
             | of work into our CI pipeline and now we've got incremental
             | builds with a Docker image for every service tagged per
             | commit. And since GitLab Container Registry supports
             | manifest v2, we can take advantage of BuildKit layer
             | caching (I think GitHub registry supports this now too, but
             | haven't played with it).
             | 
             | That said, GitLab has its fair share of problems too.
             | GitHub UI is way better, community/discussion features are
             | better, and forking/public collaboration workflow is
             | better.
             | 
             | I'm glad there are two big players in the space, though;
             | GitLab really lit a fire under GitHub to finally get them
             | to start pushing new features.
        
       | systemvoltage wrote:
       | Side rant: Github devs, if you're listening, please give us REST
       | API. Not everyone knows GraphQL or has the motivation to do so.
       | The industry standard is REST for public facing APIs, including
       | companies such as Stripe (widely considered to be the gold
       | standard for public API design and documentation). You can use
       | GraphQL internally.
        
         | minitech wrote:
         | https://docs.github.com/en/rest
        
           | systemvoltage wrote:
           | I remember having to use GraphQL to delete a Docker image
           | that was stuck in my private repo and there was no GUI to
           | clear it. Wasted a couple of hours trying to send a GraphQL
           | query which would have been a 2 minute jobbie using cURL.
           | Github's public REST API didn't have this feature.
        
             | bastardoperator wrote:
             | They have a curl example in the docs
             | 
             | https://docs.github.com/en/packages/learn-github-packages/de...
        
         | azimuth11 wrote:
         | A lot of REST APIs are just as hard to grok as GraphQL is as a
         | whole. Companies often lack schemas and documentation, which
         | GraphQL helps with out of the box.
        
       | glsdfgkjsklfj wrote:
       | You know a company does not take tech seriously when they use fancy
       | quotation marks in code blocks.
        
       | seoaeu wrote:
       | It seems weird that in a blog post about a new format for tokens,
       | there isn't a single example of what a GitHub token now looks
       | like.
        
       | Mandatum wrote:
       | I wonder why they went with 2 rather than 3 or 4 for company
       | identifier. Stock ticker for instance would make sense. Not
       | really practical.
        
         | paxys wrote:
         | This isn't meant to be a standard, just something they picked
         | for themselves. And it doesn't even need to be a company
         | identifier. Slack tokens are prefixed with "xox<token type>-",
         | for example.
        
       | ramses0 wrote:
       | https://tools.ietf.org/html/rfc8959 -
       | "secret-token:E92FB7EB-D882-47A4-A265-A0B6135DC842%20foo"
        
       | Wowfunhappy wrote:
       | Does anyone know if these new tokens are backwards compatible
       | with software that used the old tokens? By which I mean, I'm
       | using a version of Git Tower from before they switched to a
       | subscription model, and I'm wondering whether regenerating my
       | tokens will make me unable to log in.
        
       | mperham wrote:
       | The key insight here is that random tokens should be self-
       | describing, so you know their intended use and therefore can make
       | decisions and take action when one is detected.
       | 
       | If a script sees "ABC123" in a code commit, that's meaningless.
       | If you see "secret-token:ABC123", now you can fail the commit
       | with an error message: "Secret token detected in public commit,
       | aborting."
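
The commit check described above could be sketched roughly as follows. This assumes tokens are tagged with the RFC 8959 `secret-token:` scheme; the character class is a simplified approximation of the RFC's grammar, and the function names and the idea of passing a diff as a string are hypothetical.

```python
import re

# "secret-token:" followed by one or more URI path characters.
# This is a simplified approximation of RFC 8959, not a full URI parser.
SECRET_TOKEN_RE = re.compile(r"secret-token:[A-Za-z0-9._~%!$&'()*+,;=:@-]+")


def find_secret_tokens(text: str) -> list:
    """Return every substring that looks like a secret-token URI."""
    return SECRET_TOKEN_RE.findall(text)


def check_commit(diff_text: str) -> None:
    """Raise if the (hypothetical) commit diff contains a tagged secret."""
    if find_secret_tokens(diff_text):
        raise ValueError("Secret token detected in public commit, aborting.")
```

A bare `ABC123` passes through unnoticed, while `secret-token:ABC123` is caught without any knowledge of who issued it.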
        
         | staticassertion wrote:
         | FWIW it is still very much worth reading the article, since
         | they talk about how they implement that approach. They bring up
         | why they use an underscore, checksum'ing, entropy, etc.
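
The combination of prefix, underscore separator, entropy, and checksum mentioned above can be illustrated with a small sketch. It assumes a CRC32 checksum encoded as six base62 characters and appended to a random body; the prefix, body length, and alphabet order are illustrative assumptions, not GitHub's actual parameters.

```python
import secrets
import string
import zlib

BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(n: int, width: int) -> str:
    """Encode a non-negative integer in base62, left-padded to `width`."""
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits)).rjust(width, BASE62[0])


def make_token(prefix: str, body_len: int = 30) -> str:
    """Build `<prefix>_<random body><6-char checksum>` (illustrative layout)."""
    body = "".join(secrets.choice(BASE62) for _ in range(body_len))
    # A 32-bit CRC always fits in six base62 characters (62**6 > 2**32).
    checksum = base62_encode(zlib.crc32(body.encode()), 6)
    return f"{prefix}_{body}{checksum}"


def looks_valid(token: str) -> bool:
    """Offline check: recompute the checksum without a database lookup."""
    prefix, sep, rest = token.partition("_")
    if not sep or len(rest) <= 6:
        return False
    body, checksum = rest[:-6], rest[-6:]
    return base62_encode(zlib.crc32(body.encode()), 6) == checksum
```

The checksum is not a security feature; it only lets a scanner discard random strings that happen to match the prefix before spending a database round trip on them.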
        
         | echelon wrote:
         | One of the biggest learnings of our org was to prefix tokens
         | with the entity type. It's helped immensely.
         | 
         | entity-type:RANDOM_TOKEN
         | 
         | * Helps in migrations, especially complex ones where you split
         | up entity types.
         | 
         | * Identifies what tokens are so people can look them up if they
         | see them in logs.
         | 
         | * Polymorphic relationships can delegate to the appropriate
         | owning service easily without additional bookkeeping.
         | 
         | You can also encode other stuff in the token entropy, too, such
         | as the author DC/region for active-active setups where you need
         | to forward the request to the source of truth in the brief
         | window where the other regions don't know about it yet.
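
The "delegate to the owning service" point above can be sketched as a prefix-based dispatch table. The entity types, handler functions, and return values here are all hypothetical placeholders.

```python
from typing import Callable, Dict, Tuple


def split_id(prefixed: str) -> Tuple[str, str]:
    """Split `entity-type:RANDOM_TOKEN` into (entity_type, token)."""
    entity_type, sep, token = prefixed.partition(":")
    if not sep or not entity_type or not token:
        raise ValueError(f"not a prefixed ID: {prefixed!r}")
    return entity_type, token


# Hypothetical owning services, one per entity type.
def lookup_user(token: str) -> str:
    return f"user-service handles {token}"


def lookup_invoice(token: str) -> str:
    return f"billing-service handles {token}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "user": lookup_user,
    "invoice": lookup_invoice,
}


def resolve(prefixed: str) -> str:
    """Route an ID to its owning service based only on its prefix."""
    entity_type, token = split_id(prefixed)
    try:
        handler = HANDLERS[entity_type]
    except KeyError:
        raise ValueError(f"unknown entity type: {entity_type!r}") from None
    return handler(token)
```

Because the type travels with the ID, no extra lookup table is needed to find out which service owns a given token.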
        
           | fiddlerwoaroof wrote:
           | I've always thought that a Java-style reverse domain name
           | format (or, perhaps, URLs) is a great way to encode IDs:
           | com.foo.bar.Person:0000-11111-22222-33333 or whatever. That
           | way, any code that logs IDs or transfers IDs across the
           | network gets tracing "for free" and, when you see an ID in a
           | bug report, you can use it to help focus the investigation.
        
             | ljm wrote:
             | Ruby/Rails has this in the form of GlobalID[1]. To be
             | honest I haven't seen it used outside of whatever Rails
             | itself automatically does, but the concept is there.
             | 
             | [1] https://github.com/rails/globalid
        
           | 11235813213455 wrote:
           | that's how Stripe prefixes their IDs too, depending on what
           | type of entity it is. Makes debugging, docs, .. easier
        
         | xPaw wrote:
         | For those that haven't seen it, "secret-token:" is an RFC. I've
         | started using it at work.
         | 
         | https://tools.ietf.org/html/rfc8959
        
           | nine_k wrote:
           | An unencrypted _version_ marker would be pretty useful, too.
           | If anything is long-term, you can safely bet that it'll
           | need to evolve.
        
           | cornstalks wrote:
           | Note that the RFC's category is "informational", which
           | doesn't give it as much weight as something that is
           | "standards track". Usually the important RFCs are "standards
           | track" though there are some "informational" RFCs that are
           | also important.
           | 
           | From Wikipedia[1]:
           | 
           | > _An informational RFC can be nearly anything from April 1
           | jokes to widely recognized essential RFCs like Domain Name
           | System Structure and Delegation (RFC 1591). Some
           | informational RFCs formed the FYI sub-series._
           | 
           | [1]: https://en.wikipedia.org/wiki/Request_for_Comments#Informati...
        
         | revicon wrote:
         | The ID prefixing is cool from an identification point of view,
         | but we've been using UUIDs for tokens and if we implemented
         | this we wouldn't be able to use the UUID optimized datatype
         | field in Postgres.
        
           | marksomnian wrote:
           | Surely you can just strip off the prefix in the application
           | layer before sending it to Postgres? You still get the
           | benefits, while being able to use the native query.
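
Stripping the prefix in the application layer, as suggested above, might look like the following sketch. The two-character prefix `tk` and the function names are hypothetical; the returned `uuid.UUID` is what would be bound as the query parameter for a Postgres `uuid` column.

```python
import uuid

TOKEN_PREFIX = "tk"  # hypothetical two-character prefix


def to_external(token_id: uuid.UUID) -> str:
    """Render the stored UUID as a prefixed, self-describing token."""
    return f"{TOKEN_PREFIX}_{token_id.hex}"


def to_storage(token: str) -> uuid.UUID:
    """Strip the prefix and return the UUID to use in the database query."""
    prefix, sep, raw = token.partition("_")
    if sep != "_" or prefix != TOKEN_PREFIX:
        raise ValueError("unrecognised token prefix")
    # uuid.UUID raises ValueError itself if `raw` is not 32 hex digits.
    return uuid.UUID(hex=raw)
```

The database schema stays unchanged; only the code that issues and accepts tokens needs to know about the prefix.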
        
             | edoceo wrote:
             | I do it like that. We're only using a two char prefix (I
             | copied Twilio)
        
           | orf wrote:
           | Why not? You don't have to store the prefix and the UUID in
           | the same column?
        
           | asimpletune wrote:
           | Can't you just add a column to your schema with the prefix?
        
           | [deleted]
        
       | gkop wrote:
       | Would someone comment on this idea in context of JWTs? Not
       | trolling, just curious as I use JWTs and embed this kind of
       | metadata as a custom claim, which accomplishes some but far from
       | all of what GitHub accomplishes here, but then I have no need for
       | the easy scanning. So seeking wisdom from anyone who has thought
       | carefully about whether or not to prefix their JWTs in this way.
        
         | benatkin wrote:
         | JWTs purposely contain information in plain text (unencrypted
         | and not stored in a database), however it is in base64 so you
         | don't need to worry about url encoding issues and so it looks
         | like a token.
         | 
         | You could add a prefix to a jwt. That would make it a token
         | that contains a jwt.
         | 
         | I don't think the tiny prefix is what they want to obscure. So
         | it wouldn't go against the design of JWT to add one.
         | 
         | I would do it. I don't see any issues with it.
         | 
         | It would be something like:
         | 
         | BA_<base64>.<base64>.<base64>
         | 
         | If you wanted to be able to double click to copy and paste,
         | which I don't think is a huge usability improvement, you could
         | replace the . with _, and I think a lot of devs would be able
         | to figure out that it's a representation of a JWT.
        
         | paxys wrote:
         | A big motivation for such token formats is to quickly and
         | easily identify when they are shared somewhere they shouldn't
         | be. JWTs aren't helpful in that regard, since they always
         | present themselves as a base64 encoded blob.
        
           | gkop wrote:
           | Totally, but JWT-like blobs can be detected (see sibling
           | comment), and parsing attempted, so for the automated
           | scanning use case, if I understand correctly, it can be done
           | with perfect accuracy, just at a larger computation expense
           | and worse security exposure due to the complexity of the
           | scanning and the need to parse.
           | 
           | The more interesting side to me is the benefit to humans,
           | from the prefix technique.
        
           | threeseed wrote:
           | The question is why you couldn't just have "${prefix}-${JWT}"
           | as your format.
           | 
           | Then you can just strip the prefix before parsing. Which
           | means you don't need to worry about checksumming or entropy and
           | you have the ability to embed large amounts of data as well
           | as plenty of client support and libraries.
           | 
           | Would be curious if this implementation is somehow more
           | performant.
        
           | parhamn wrote:
           | It's pretty greppable because {" (JSON opener) always encodes
           | to "ey". So a base64 that starts with "ey" and has 3 dot
           | separated sections is a good start for a regex. I'm sure you
           | can go further by looking at the spec.
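
The regex idea above can be sketched as a two-step scan: a cheap pattern match for three dot-separated base64url sections starting with "ey", followed by an attempt to decode the first section as a JSON object. The function names are hypothetical, and this only finds JWT-shaped strings; it does not verify signatures.

```python
import base64
import json
import re

# Three dot-separated base64url sections; the first must start with "ey",
# which is what a JSON object opening with `{"` encodes to.
JWT_RE = re.compile(r"\bey[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def _decode_segment(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip off."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)


def find_jwt_candidates(text: str) -> list:
    """Return JWT-shaped substrings whose header decodes to a JSON object."""
    hits = []
    for match in JWT_RE.finditer(text):
        header_b64 = match.group(0).split(".")[0]
        try:
            header = json.loads(_decode_segment(header_b64))
        except ValueError:  # covers base64, UTF-8, and JSON decoding errors
            continue
        if isinstance(header, dict):
            hits.append(match.group(0))
    return hits
```

The decode step is what separates this from scanning for a fixed prefix: it filters out false positives such as ordinary dotted words beginning with "ey", at the cost of parsing untrusted input.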
        
       | vsareto wrote:
       | >One other neat thing about _ is it will reliably select the
       | whole token when you double click on it
       | 
       | Shout out and kudos to whoever brought that up
        
         | chatmasta wrote:
         | Frankly the fact that this doesn't happen with `-` in a
         | `<code>` block should be considered a browser bug.
        
           | williamdclt wrote:
           | Well it doesn't happen in IDEs either (by default, at least)
        
             | chatmasta wrote:
             | I can see the argument for multi-line pre-formatted code
             | blocks, but for inline `<code>` it would be nice if double
             | clicking anywhere selected the whole thing.
        
               | mbauman wrote:
               | Is it `a-long-identifier` or is it `x-y`?
        
               | williamdclt wrote:
               | I'd rather have consistent behaviour TBH, I'm not too happy
               | that `-` is a non-word character but I'd rather it always
               | behaves the same everywhere without having to think about
               | context
        
       ___________________________________________________________________
       (page generated 2021-04-05 23:00 UTC)