[HN Gopher] Let's Encrypt Acme API Outage
       ___________________________________________________________________
        
       Let's Encrypt Acme API Outage
        
       Author : fastest963
       Score  : 141 points
       Date   : 2023-06-15 16:20 UTC (6 hours ago)
        
 (HTM) web link (letsencrypt.status.io)
 (TXT) w3m dump (letsencrypt.status.io)
        
       | StayTrue wrote:
       | Looks like they're back online with a fix.
        
       | 8organicbits wrote:
       | What's the impact of an outage like this? ACME renewals should
       | happen daily starting 30 days before expiry so no one should have
       | had a cert expire due to this. New certificates wouldn't have
       | been issued, so that's impact, although I suspect most new certs
       | aren't taking traffic immediately (i.e. setting up a new server).
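To make the renewal reasoning above concrete, here is a minimal sketch of the timing check an ACME client might run on each daily pass. The 30-day window comes from the comment; the function names and the example dates are assumptions for illustration, not any particular client's behavior.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy from the comment above: start trying 30 days before expiry.
RENEWAL_WINDOW = timedelta(days=30)


def should_attempt_renewal(not_after: datetime, now: datetime) -> bool:
    """Return True once the certificate is inside its renewal window."""
    return now >= not_after - RENEWAL_WINDOW


def retry_days_left(not_after: datetime, now: datetime) -> int:
    """Whole days of daily retries remaining before the certificate expires."""
    return max((not_after - now).days, 0)


# Hypothetical certificate that expires 20 days after the outage.
outage = datetime(2023, 6, 15, 16, 0, tzinfo=timezone.utc)
expiry = outage + timedelta(days=20)

print(should_attempt_renewal(expiry, outage))  # True
print(retry_days_left(expiry, outage))         # 20
```

Under these assumptions, a certificate already in its window still has many daily attempts left, so an outage of a few hours only delays the renewal.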
        
       | agwa wrote:
       | This is because I discovered that Let's Encrypt was issuing non-
       | compliant certificates:
       | https://bugzilla.mozilla.org/show_bug.cgi?id=1838667
        
         | VWWHFSfQ wrote:
         | Sounds like somebody didn't properly seed their random number
         | generator
        
           | agwa wrote:
           | The problem is actually WAY more subtle, and pretty hard to
           | understand unless you really get in the weeds of Certificate
           | Transparency and certificate policy, but I'll give a shot at
           | providing a concise explanation.
           | 
           | Let's Encrypt has produced two signed artifacts with the same
           | serial number:
           | 
           | 1. A precertificate: https://api.certspotter.com/v1/certs/227
           | 00bd0d70ac5790e6ae5b...
           | 
           | 2. A certificate: https://api.certspotter.com/v1/certs/c0916d
           | 24ac8844522b36950...
           | 
           | A precertificate is not a certificate, but it implies the
           | existence of a corresponding certificate which can be
           | constructed by applying an algorithm to the precertificate.
           | 
           | Let's Encrypt _intended_ to create a precertificate which
           | would result in (2) when applying the algorithm to (1).
           | Unfortunately, applying the algorithm to (1) results in a
           | different certificate, (3), presumably because of some bug in
           | Let's Encrypt. Since (2) and (3) have the same serial
           | number, it's a violation of the prohibition against duplicate
           | serial numbers.
           | 
           | An easier-to-understand description of the problem is that
           | Let's Encrypt was producing precertificates that didn't match
           | the final certificate, but the compliance violation is
           | duplicate serial numbers, which is why I worded my compliance
           | bug the way I did.
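The precertificate/certificate correspondence described above can be sketched roughly as follows. This is a simplified model, not a real X.509 implementation: certificates are plain dicts with an ordered extension list, whereas an actual check compares DER-encoded TBSCertificate bytes. The two OIDs are the ones defined in RFC 6962; the record layout, helper names, and sample values are assumptions made for illustration.

```python
# Extension OIDs defined by RFC 6962.
POISON_OID = "1.3.6.1.4.1.11129.2.4.3"    # marks a precertificate
SCT_LIST_OID = "1.3.6.1.4.1.11129.2.4.2"  # embedded SCTs in the final cert


def strip_extension(tbs, oid):
    """Return a copy of a simplified TBS record without one extension."""
    kept = [(o, v) for (o, v) in tbs["extensions"] if o != oid]
    return {**tbs, "extensions": kept}


def precert_matches_cert(precert_tbs, cert_tbs):
    """True if the certificate is the one the precertificate implies."""
    implied = strip_extension(precert_tbs, POISON_OID)
    actual = strip_extension(cert_tbs, SCT_LIST_OID)
    return implied == actual


# Hypothetical records: same serial, but bad_cert differs in one extension.
precert = {"serial": "0a", "extensions": [("2.5.29.17", "DNS:example.com"),
                                          (POISON_OID, None)]}
good_cert = {"serial": "0a", "extensions": [("2.5.29.17", "DNS:example.com"),
                                            (SCT_LIST_OID, "scts")]}
bad_cert = {"serial": "0a", "extensions": [("2.5.29.17", "DNS:www.example.com"),
                                           (SCT_LIST_OID, "scts")]}

print(precert_matches_cert(precert, good_cert))  # True
print(precert_matches_cert(precert, bad_cert))   # False
```

In this model, bad_cert plays the role of (2): it carries the same serial number as the certificate implied by the precertificate, yet it is a different certificate.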
        
             | VWWHFSfQ wrote:
             | Fascinating! What kind of effects can this have in real
             | life? Are they going to have to revoke the affected
             | certificates?
        
               | agwa wrote:
               | They will indeed need to revoke the affected certificates
               | within 5 days, per the Baseline Requirements.
               | 
               | Also, the affected certificates won't be accepted by
               | Certificate Transparency-enforcing browsers (Chrome and
               | Safari) because of the precertificate mismatch.
        
               | flangola7 wrote:
               | Why does Firefox not enforce it?
        
               | amiga386 wrote:
               | Firefox, the browser, only cares if the certificate is
               | valid (not expired, not revoked, ultimately signed by a
               | root CA it trusts). It does not keep tabs on every
               | certificate ever issued. You wouldn't like it if Firefox
               | did an online check with a central authority for every
               | website you visited, nor would you like it to bundle
               | every single certificate ever issued (or even just serial
               | numbers).
               | 
               | Mozilla, the authors of the browser, are part of the
               | CA/Browser Forum, which holds the threat of complete
               | distrust in all web browsers against CAs, which compels
               | CAs to be open and provide logs of all the certificates
               | they've issued and prove they're not mis-issuing
               | certificates. All those extra checks happen here.
        
               | agwa wrote:
               | > _You wouldn't like it if Firefox did an online check
               | with a central authority for every website you visited_
               | 
               | Enforcing Certificate Transparency does not require doing
               | an online check for every website you visit.
               | 
               | > _Mozilla, the authors of the browser, are part of the
               | CA/Browser Forum, which holds the threat of complete
               | distrust in all web browsers against CAs, which compels
               | CAs to be open and provide logs of all the certificates
               | they've issued and prove they're not mis-issuing
               | certificates._
               | 
               | The CA/Browser Forum does not require CAs to log the
               | certificates that they issue. CT is enforced entirely
               | within the certificate validator code, and it is a major
               | shortcoming that Firefox does not do it.
        
               | amiga386 wrote:
               | You are correct: https://github.com/google/certificate-
               | transparency/blob/mast...
               | 
               | You can embed CT attestations (SCTs) in the certificate
               | itself, so yes, provided the CA is in cooperation with CT
               | log operators, and deliberately does the pre-certificate
               | -> SCTs -> real certificate dance, it is possible for a
               | browser to validate embedded SCTs without an online
               | check.
               | 
               | However, that assumes that the CA actively does that,
               | they don't have to. Neither does the server. What's
               | compelling them to is _policy_, set by Google and Apple,
               | that their respective browsers won't accept certificates
               | _without_ CT attestations. Google's policy specifically
               | requires that one of the SCTs on a certificate must be a
               | CT log run by Google. Google also controls the list of CT
               | logs that Chrome will consider as valid CT logs, as part
               | of deciding if an SCT is valid. Antitrust, anyone?
               | 
               | I was trying to make a similar point about Firefox -
               | policy vs code. And rather than saying that it's
               | specifically the CA/Browser Forum setting policy (which
               | it does, but only baseline policy, which does not include
               | CT), each org in the CA/Browser Forum has their own root
               | cert inclusion program with their own policies, that all
               | draw from baseline policy then add to it. You are right,
               | _baseline_ policy does not require CT....
               | 
               | ... and neither does _Mozilla's_ policy, now I've scanned
               | through it. It actively acknowledges that CT exists (in
               | that it mandates that if you issue a precertificate for
               | CT, you _must_ issue the completed certificate), but it
               | does _not_ require CAs to use CT. In stark contrast to
               | Google and Apple.
               | 
               | Perhaps this is why they also don't implement CT checking
               | in Firefox?
        
               | agwa wrote:
               | There is a major distinction between root store policy
               | and CT policy which you are missing.
               | 
               | Root store policy contains requirements which are
               | enforced by audits, and if a CA violates the root store
               | policy it is considered misissuance requiring them to
               | revoke the offending certificates and file an incident
               | report. Neither Chrome nor Apple root store policies
               | require CT.
               | 
               | CT policy describes what CAs must do for their
               | certificates to be accepted by the certificate validation
               | code. CT policy is enforced entirely by code. It is not
               | an incident if a CA doesn't comply with CT policy; it
               | just means their certificates won't be accepted.
        
               | toast0 wrote:
               | > It actively acknowledges that CT exists (in that it
               | mandates that if you issue a precertificate for CT, you
               | _must_ issue the completed certificate)
               | 
               | I don't think that's what the document says. I don't see
               | a requirement to issue the final certificate. This
               | portion is putting pre-certificates into scope of the
               | agreement in that a mis-issued pre-certificate is
               | evidence of intent to mis-issue a final certificate. So,
               | before issuing a pre-certificate, a CA has to be prepared
               | to revoke the final certificate, even if they never
               | actually issue the final certificate; as well as prepared
               | to defend the issuing of the final certificate.
               | 
               | Presumably, this is to guard against CAs claiming a pre-
               | certificate was issued for testing only, and wasn't going
               | to be issued as a final certificate. Also, I'd presume
               | that a CA issuing pre-certificates so they could embed
               | SCTs would abort issuance if they were unable to get a
               | response from the certificate log, but there's always the
               | chance that the submission went fine and the pre-
               | certificate is logged, but the response didn't make it,
               | so the CA would abort.
        
               | agwa wrote:
               | I was involved in the drafting of that language and you
               | are 100% correct.
        
               | johncolanduoni wrote:
               | Chrome also decides what CAs they will accept in Chrome
               | in the first place, so CT doesn't give them any extra
               | monopoly levers.
        
               | marginalia_nu wrote:
               | > You wouldn't like it if Firefox did an online check
               | with a central authority for every website you visited
               | 
               | And yet OCSP stapling is still far from ubiquitous.
        
               | jraph wrote:
               | I wonder how much this has to do with OCSP stapling being
               | so badly implemented in Apache 2 and nginx (don't know
               | about the other servers). This article from 2019 [1]
               | still seems current, at least I can attest that I still
               | have issues with nginx. Also this Super User Q&A [2],
               | which suggests priming OCSP with a cron job because nginx
               | does not do its job by itself.
               | 
               | [1] https://blog.apnic.net/2019/01/15/is-the-web-ready-
               | for-ocsp-...
               | 
               | [2] https://superuser.com/questions/1635407/ocsp-not-
               | working-con...
        
               | johncolanduoni wrote:
               | Certificate Transparency verification only requires the
               | server provide stapled proof (either to the certificate
               | itself or an OCSP response) that the certificate was
               | submitted to the public logs. It does not involve any
               | extra requests from the browser to a third party; at most
               | it involves a periodic request from the server to the CA
               | with no client-specific data.
        
               | agwa wrote:
               | Here's the 7-year-old bug:
               | https://bugzilla.mozilla.org/show_bug.cgi?id=1281469
               | 
               | I don't know why it is taking them so long, but it makes
               | me sad.
        
             | f0rgot wrote:
             | Thank you for sharing your knowledge here.
             | 
             | A few questions:
             | 
             | If applying the algorithm to (1) produced (3), what
             | produced (2)?
             | 
             | How can "no duplicate serial numbers" be enforced by any
             | browser without having a store of all certificates? Is it
             | simply a best-effort? Will the browser have a mapping from
             | <serial number> to <certificate>, and whenever it sees a
             | certificate, it will check this map to see if it has seen
             | that serial number on a separate certificate?
        
               | johncolanduoni wrote:
               | Requiring Certificate Transparency in the browser doesn't
               | directly prevent this, but (as in this instance) it
               | ensures there is public data anyone can check to see if
               | this situation has occurred.
        
               | agwa wrote:
               | > _If applying the algorithm to (1) produced (3), what
               | produced (2)?_
               | 
               | I believe the root of the problem is that Let's Encrypt
               | is creating certificates and precertificates
               | independently, instead of creating a precertificate and
               | then applying the algorithm to create the corresponding
               | certificate. Since their processes for certificates and
               | precertificates got out-of-sync, they ended up producing
               | (2) instead of (3).
               | 
               | > _How can "no duplicate serial numbers" be enforced by
               | any browser without having a store of all certificates?_
               | 
               | Browser software doesn't enforce this. It can only be
               | enforced by scanning Certificate Transparency logs
               | looking for violations.
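A rough sketch of the kind of log scan described above, assuming each log entry has already been reduced to an (issuer, serial, fingerprint) tuple. The fingerprint is assumed to be computed over the implied final certificate (with the poison or SCT list extension removed), so a precertificate and its matching certificate collapse to the same value. The tuple layout and sample data are hypothetical, and this is not how any particular monitor is implemented.

```python
from collections import defaultdict


def find_duplicate_serials(entries):
    """Return {(issuer, serial): [fingerprints]} for every issuer/serial
    pair that appears on more than one distinct certificate."""
    seen = defaultdict(set)
    for issuer, serial, fingerprint in entries:
        seen[(issuer, serial)].add(fingerprint)
    return {key: sorted(fps) for key, fps in seen.items() if len(fps) > 1}


# Hypothetical log entries. Serial 0a is a precertificate plus its matching
# certificate; serial 0b is a precertificate plus a mismatched certificate.
entries = [
    ("Example CA", "0a", "fp-1"),
    ("Example CA", "0a", "fp-1"),
    ("Example CA", "0b", "fp-2"),
    ("Example CA", "0b", "fp-3"),
]

print(find_duplicate_serials(entries))
# {('Example CA', '0b'): ['fp-2', 'fp-3']}
```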
        
               | tialaramex wrote:
               | Indeed, although the de jure requirement in the policy
               | isn't actually important per se, the only practical way
               | to obey the policy is to do the thing we want you to do,
               | so that's what you'll actually do, but the way the policy
               | is phrased makes enforcement practical.
               | 
               | This is different from a "Brown M&M" policy where the
               | purpose of the policy is to easily check that you are
               | actually reading and obeying the policy document. Here
               | the policy is worded in a way that doesn't directly
               | achieve what we want, but is measurable, whereas what we
               | want isn't, but the only practical way to achieve policy
               | compliance is to do what we wanted anyway.
        
             | 0xbadcafebee wrote:
             | That's absolutely wild that they had no test to detect
             | that. I wonder what other obvious bugs are floating around
             | in there.
        
         | rob-olmos wrote:
         | Do you have a blog post or writeup on how you discovered that?
         | Thanks!
        
           | agwa wrote:
           | This all happened less than 2 hours ago, but a quick summary
           | is that my Certificate Transparency monitor, Cert Spotter
           | (https://sslmate.com/certspotter) performs various sanity
           | checks on every certificate that it observes. At 15:41 UTC
           | today, I started getting alerts that certificates from Let's
           | Encrypt were failing one particular check. I quickly emailed
           | Let's Encrypt's problem reporting address, and Let's Encrypt
           | promptly suspended issuance so they could investigate. I've
           | lost count of how many CAs I've detected having this
           | particular problem, so perhaps it is time to blog about it
           | (https://www.agwa.name/blog if you're interested).
        
             | danShumway wrote:
             | I will also throw out a quick vote that I'd be interested
             | in reading a blog post about it.
        
             | conroydave wrote:
             | this is why i will always love hacker news. thank you
        
             | [deleted]
        
             | toomuchtodo wrote:
             | Curious people contributing to the ongoing functioning of
             | critical systems at scale. Thank you for your effort!
             | 
             | https://xkcd.com/2347/
        
             | mholt wrote:
             | I would love to read a blog of yours with more information.
        
             | mardifoufs wrote:
             | That's awesome!! I wonder if let's encrypt runs sanity
             | checks before/after issuing certs too?
        
               | agwa wrote:
               | They "lint" certificates before issuance, as do most CAs.
               | However, I don't think any linters check for this
               | problem, as it requires access to more than just the
               | certificate (the linter would need access to either the
               | precertificate or a database of Certificate Transparency
               | log keys).
        
             | dopamean wrote:
             | This is so awesome. Thank you for sharing. I hope you do
             | write about that problem. I'd love to learn something new.
        
         | iso1631 wrote:
         | So you can legitimately put "broke the internet" on your resume
         | :D
        
           | dylan604 wrote:
           | That would be impressive except for all of the AWS us-east-1
           | engineers that can claim the same thing
        
             | natebc wrote:
             | They didn't do it all by themselves though!
        
               | dylan604 wrote:
               | Like you don't have things on your resume that your team
               | did and not just you
        
               | gumby wrote:
               | One thing I like about the computing world is that people
               | put "wrote the package that frobbed jpegs that in
               | production frobs over 1 million jpegs per hour".
               | Anything less specific means "I was somewhere in the
               | building when this was written and deployed".
               | 
               | When I worked in pharma people would say something like
               | "I joined a program targeting neurology early in the
               | preclinical phase and developed assays, until first-in-
               | human four years later. Started as senior lab technician
               | and departed as a junior assistant director for
               | preclinical QC."
               | 
               | Took me years to understand how the sociology, regulatory
               | dynamics, and science of the two fields legitimately
               | resulted in these utterly different approaches.
        
       | mholt wrote:
       | Regular reminder that the best ACME clients will fall back to
       | other CAs if one is down. For example caddy does this.
       | (Disclosure yada yada)
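As an illustration of the fallback idea above, here is a minimal sketch of trying several CAs in order. It is not Caddy's implementation: the issuer callables, the IssuanceError type, and the CA names are all hypothetical stand-ins for real ACME calls.

```python
from typing import Callable, Sequence, Tuple


class IssuanceError(Exception):
    """Raised by a (hypothetical) issuer that cannot issue right now."""


def obtain_certificate(
    domain: str,
    issuers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each configured CA in order; return (ca_name, certificate)."""
    errors = []
    for name, issue in issuers:
        try:
            return name, issue(domain)
        except IssuanceError as exc:
            errors.append(f"{name}: {exc}")
    raise IssuanceError("all issuers failed: " + "; ".join(errors))


def primary_down(domain: str) -> str:
    # Stand-in for a CA whose ACME API is unavailable.
    raise IssuanceError("API unavailable")


def backup_ok(domain: str) -> str:
    # Stand-in for a CA that issues successfully.
    return f"certificate for {domain}"


result = obtain_certificate(
    "example.com", [("primary", primary_down), ("backup", backup_ok)]
)
print(result)  # ('backup', 'certificate for example.com')
```

Collecting the per-CA errors means a total failure still reports why each issuer was skipped.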
        
         | Arnavion wrote:
         | That wouldn't work for caddy if you also follow the best
         | practice to have a CAA record pointing to the issuer and
         | account URL, unless caddy is also managing DNS records in
         | addition to being an HTTP server. (I don't know if it is, but I
         | would think it's a layering violation for an HTTP server to
         | also be a DNS server.)
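The CAA interaction above can be sketched as a filter over the client's configured CAs. This is a simplified reading of CAA "issue" values (issuer domain, then optional parameters after a semicolon); it ignores wildcard handling and climbing the DNS tree. The account URI is a made-up placeholder, and the issuer domains are used purely as examples.

```python
def issuer_domains(caa_issue_values):
    """Extract issuer domains from CAA "issue" values, ignoring parameters."""
    domains = set()
    for value in caa_issue_values:
        domain = value.split(";", 1)[0].strip().lower()
        if domain:  # an empty domain (";") authorizes no CA
            domains.add(domain)
    return domains


def usable_cas(configured_cas, caa_issue_values):
    """Return the configured CAs that the CAA records still permit."""
    if not caa_issue_values:
        return list(configured_cas)  # no "issue" records: no restriction
    permitted = issuer_domains(caa_issue_values)
    return [ca for ca in configured_cas if ca in permitted]


configured = ["letsencrypt.org", "pki.goog"]
pinned = ["letsencrypt.org; accounturi=https://acme.example/acct/1"]
both = pinned + ["pki.goog"]

print(usable_cas(configured, pinned))  # ['letsencrypt.org']
print(usable_cas(configured, both))    # ['letsencrypt.org', 'pki.goog']
```

With a single pinned issuer the fallback list shrinks to one entry, which is the loss of redundancy being pointed out.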
        
           | mholt wrote:
           | This is true, if you manually configure a CAA limited to just
           | one CA, then you lose that benefit of redundancy.
           | 
           | I recommend trusting multiple CAs (but not too many):
           | https://matt.life/writing/the-acme-protocol-in-practice-
           | and-...
           | 
           | > (I don't know if it is, but I would think it's a layering
           | violation for an HTTP server to also be a DNS server.)
           | 
           | Caddy 2 is, at its core, a server of servers. The HTTP server
           | is just an "app module" for Caddy. There are other servers; I
           | don't know of a DNS server app yet. (CoreDNS is a fork of
           | Caddy v1, though.)
        
         | neurostimulant wrote:
         | What other CAs do you recommend aside from letsencrypt? I'm a bit
         | wary of trying some random CAs that offer free certificates
         | aside from letsencrypt.
        
           | mholt wrote:
           | It's good to be wary.
           | 
           | You can trust the ACME CAs listed on this site:
           | https://www.acmeisuptime.com/ (Although, I think that list
           | could use some updating. I'll ping the author.)
           | 
           | Personally I would use Let's Encrypt, ZeroSSL (Sectigo) and
           | Google Trust Services. There are, of course, others. But
           | which ones you choose depends on your requirements and such.
           | (Some offer business support, for example.) SSL.com and
           | Sectigo also offer ACME but I am not sure how performant
           | their CA software is.
        
         | remram wrote:
         | The missing disclosure is "I'm the author of the Caddy web
         | server". I'm not sure why you would do it halfway.
        
         | yjftsjthsd-h wrote:
         | > the best ACME clients will fall back
         | 
         | Are there others that do it, or are you just saying that yours
         | is the best?
        
           | mholt wrote:
           | There are others that do it.
           | 
           | While we're on the topic, I will say that Caddy has
           | independently and repeatedly been cited as the gold standard
           | of ACME clients, "the best client experience," and "we hope
           | to see other servers follow Caddy's lead." [0] [1] [and
           | others I don't have links to currently].
           | 
           | [0]: https://www.youtube.com/watch?v=OE5UhQGg_Fo
           | 
           | [1]: https://jhalderm.com/pub/papers/letsencrypt-ccs19.pdf
        
           | xen2xen1 wrote:
           | I used Caddy and was just wondering if my stuff would be
           | broke, so the GP comment was useful to me, and snarky as your
           | reply was.
        
             | yjftsjthsd-h wrote:
             | Sure, there are a handful of useful points there (off the
             | top of my head: it's possible to work around this failure
             | by using multiple CAs, multiple free ACME CAs exist, caddy
             | implements this solution). I'm just 1. slightly frustrated
             | that caddy's author never seems to miss a chance for self-
             | promotion (at least he's started alluding to the fact that
             | it's his project), and 2. actually curious whether any
             | other ACME clients are implementing that fallback.
        
               | toast0 wrote:
               | Apache mod_md has fallback too,
               | https://github.com/icing/mod_md#acme-failover I'm just a
               | user, not the author, and I didn't try the fallback. I'm
               | more worried about stuff breaking if I switch issuers
               | than certs expiring without me noticing. I've got some
               | embedded junk that hits my website and has weak cert
               | validation, so better to stick with something that works.
        
               | xen2xen1 wrote:
               | For self promotion that was pretty light. You ever seen
               | the Sourcehut guy on here?
        
               | schoolornot wrote:
               | Promoting went from HTTP headers to HN comments.
        
               | mholt wrote:
               | > I'm just 1. slightly frustrated that caddy's author
               | never seems to miss a chance for self-promotion (at least
               | he's started alluding to the fact that it's his project),
               | 
               | Sorry that I happen to be the author. It's really not
               | about that though -- it just matters that an ACME-native
               | HTTPS server exists. We need more integrated fully-native
               | ACME clients.
               | 
               | > actually curious whether any other ACME clients are
               | implementing that fallback.
               | 
               | There are at least one or two others. I don't recall
               | which ones at the moment but I think Certify the Web may
               | be one. Edit: mod_md is another apparently!
        
               | tialaramex wrote:
               | Right, it's astounding to me that outfits like
               | _Microsoft_ didn't just immediately ship decent ACME
               | implementations. People seem to have settled for third
               | party bolt-on solutions. It's like you wake up in an
               | alternate world where yeah, no cars come with seat belts,
               | but of course everybody buys seatbelts for the car,
               | there's usually a store next to the car dealer which
               | sells them. Um. What?
               | 
               | Most of the "popular" software in this space is garbage.
               | I spent the entire day today (aside from meetings and
               | helping other people debug problems) wrestling with the
               | fact Apache seems to be designed so heavily with a C
               | programmer mindset that even the idea of reporting
               | problems has never occurred to them. Just blunder on,
               | it'll be fine, don't think about it. You can sprinkle
               | complete nonsense into Apache configuration files and,
               | until you trip an actual syntactical error and blow up
               | their parser, Apache just presses on anyway with the
               | nonsense values you provided, and if that doesn't work,
               | no reason to report it just do whatever was the default
               | and hope that's OK.
               | 
               | As far as I can tell, in the wild the result is a lot of
               | Apache configuration is complete nonsense, but hey no
               | errors are reported, so, copy, paste, move on.
        
             | agwa wrote:
             | If your certificate was issued after the start of the
             | incident but before Let's Encrypt suspended issuance, then
             | the certificate is currently not working in Chrome or
             | Safari.
             | 
             | This incident wasn't just about downtime, it was also about
             | issuing non-functional/non-compliant certificates.
        
               | mholt wrote:
               | Is this because the certs were revoked? (Revocation is
               | broken ;P)
               | 
               | Caddy staples Valid OCSP responses to all certificates
               | that have an OCSP responder, so if browsers aren't
               | accepting that, then arguably the clients are broken,
               | because that response is valid until a few days from now.
               | But before the 100% valid and trusted OCSP staple
               | expires, Caddy will get a new staple that presumably says
               | Revoked, and replace them right away before browsers
               | would ever see a Revoked status.
               | 
               | (Revocation is broken ;P)
        
               | agwa wrote:
               | No, it's because the SCTs in the certificate have invalid
               | signatures.
        
               | mholt wrote:
               | Ah, that makes sense!
               | 
               | I wonder if we should be doing some basic sanity checks
               | on newly obtained certificates in Caddy, and treat this
               | as a failure, and try the next configured CA instead.
               | 
               | (Obviously SCT signatures will require some external
               | resource so we would have to weigh that a bit more, maybe
               | make it configurable...)
               | 
               | Issue opened here to discuss, though it does sound
               | troublesome/tedious:
               | https://github.com/caddyserver/certmagic/issues/240
        
               | agwa wrote:
               | Yeah, the question is how far you want to go. To be safe
               | against every possible CA screwup, you basically have to
               | re-implement every browser's entire certificate
               | validation engine and run certificates through each one.
               | That would obviously be very hard, and could do more harm
               | than good if it falls out-of-sync with browsers.
        
               | toast0 wrote:
               | It would make sense to try to check this, if there's a
               | reasonable way to access a database of trusted
               | certificate logs. If not, it's going to be tricky; I
               | wouldn't fail a certificate that had a SCT from an
               | unknown log, because it might be valid and you don't
               | know. Etc.
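One way to read the policy suggested above, as a hedged sketch: fail only when an SCT from a log you know has a bad signature, and stay neutral on logs you do not know. The Sct type, the log IDs and keys, and especially the toy verify function are assumptions; real verification means checking the log's signature over the reconstructed precertificate data, which is not shown here.

```python
from typing import Callable, Dict, List, NamedTuple


class Sct(NamedTuple):
    log_id: str
    signature: str


def check_scts(
    scts: List[Sct],
    known_log_keys: Dict[str, str],
    verify: Callable[[Sct, str], bool],
) -> str:
    """Return "reject" if an SCT from a known log fails verification,
    "accept" if at least one verifies and none fail, else "undetermined"."""
    verified = 0
    for sct in scts:
        key = known_log_keys.get(sct.log_id)
        if key is None:
            continue  # unknown log: cannot judge it, so do not fail on it
        if not verify(sct, key):
            return "reject"
        verified += 1
    return "accept" if verified else "undetermined"


def toy_verify(sct: Sct, key: str) -> bool:
    # Placeholder only: NOT real cryptography.
    return sct.signature == f"signed-by-{key}"


known = {"log-a": "key-a"}
print(check_scts([Sct("log-a", "signed-by-key-a"), Sct("log-x", "?")], known, toy_verify))  # accept
print(check_scts([Sct("log-a", "garbage")], known, toy_verify))  # reject
print(check_scts([Sct("log-x", "?")], known, toy_verify))  # undetermined
```

A client using something like this could treat "reject" as an issuance failure and move to the next configured CA, while treating "undetermined" as a pass.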
        
       | yjftsjthsd-h wrote:
       | On the bright side, that's actually one of the lower-impact
       | things to have an outage on, IMO; if you're using it the
       | recommended way, an outage would only really affect new certs,
       | with older certs just getting renewed slightly later.
        
         | johncolanduoni wrote:
         | If you're onboarding new users into something like a SaaS or
         | PaaS it could be a bigger deal.
        
       ___________________________________________________________________
       (page generated 2023-06-15 23:02 UTC)