[HN Gopher] Let's Encrypt Acme API Outage
___________________________________________________________________
Let's Encrypt Acme API Outage
Author : fastest963
Score : 141 points
Date : 2023-06-15 16:20 UTC (6 hours ago)
(HTM) web link (letsencrypt.status.io)
(TXT) w3m dump (letsencrypt.status.io)
| StayTrue wrote:
| Looks like they're back online with a fix.
| 8organicbits wrote:
| What's the impact of an outage like this? ACME renewals should
| happen daily starting 30 days before expiry so no one should have
| had a cert expire due to this. New certificates wouldn't have
| been issued, so that's impact, although I suspect most new certs
| aren't taking traffic immediately (i.e. setting up a new server).
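To put numbers on the renewal timing, here is a minimal sketch. It assumes a client that starts renewing once 30 days remain and retries daily. Real clients choose their own thresholds. The sketch counts how many attempts remain before expiry.

```python
from datetime import datetime, timedelta, timezone

# Assumption: renewal starts once 30 days remain (the figure from the
# comment above); real ACME clients choose their own thresholds.
RENEW_BEFORE = timedelta(days=30)


def should_renew(not_after: datetime, now: datetime,
                 renew_before: timedelta = RENEW_BEFORE) -> bool:
    """True once the certificate has entered its renewal window."""
    return now >= not_after - renew_before


def renewal_attempts_left(not_after: datetime, now: datetime,
                          interval: timedelta = timedelta(days=1)) -> int:
    """Number of whole retry intervals left before the certificate expires."""
    if now >= not_after:
        return 0
    return (not_after - now) // interval  # timedelta // timedelta -> int


# Example: a 90-day certificate expiring 2023-07-30 enters its window on
# 2023-06-30 and then has 30 daily attempts before it expires.
expiry = datetime(2023, 7, 30, tzinfo=timezone.utc)
print(should_renew(expiry, datetime(2023, 6, 15, tzinfo=timezone.utc)))  # False
print(renewal_attempts_left(expiry, datetime(2023, 6, 30, tzinfo=timezone.utc)))  # 30
```

Under these assumptions, an outage lasting a few hours only delays some of those 30 attempts. It cannot expire a certificate by itself.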
| agwa wrote:
| This is because I discovered that Let's Encrypt was issuing non-
| compliant certificates:
| https://bugzilla.mozilla.org/show_bug.cgi?id=1838667
| VWWHFSfQ wrote:
| Sounds like somebody didn't properly seed their random number
| generator
| agwa wrote:
| The problem is actually WAY more subtle, and pretty hard to
| understand unless you really get in the weeds of Certificate
| Transparency and certificate policy, but I'll give a shot at
| providing a concise explanation.
|
| Let's Encrypt has produced two signed artifacts with the same
| serial number:
|
| 1. A precertificate: https://api.certspotter.com/v1/certs/22700bd0d70ac5790e6ae5b...
|
| 2. A certificate: https://api.certspotter.com/v1/certs/c0916d24ac8844522b36950...
|
| A precertificate is not a certificate, but it implies the
| existence of a corresponding certificate which can be
| constructed by applying an algorithm to the precertificate.
|
| Let's Encrypt _intended_ to create a precertificate which
| would result in (2) when applying the algorithm to (1).
| Unfortunately, applying the algorithm to (1) results in a
| different certificate, (3), presumably because of some bug in
| Let's Encrypt. Since (2) and (3) have the same serial
| number, it's a violation of the prohibition against duplicate
| serial numbers.
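The relationship between (1), (2), and (3) can be modelled in a few lines. This is a simplified sketch, not how Let's Encrypt or any CT monitor actually works.

- It represents a TBSCertificate as a Python dict instead of parsing DER.
- It ignores the case where a dedicated precertificate signing certificate changes the issuer.

The RFC 6962 relationship boils down to this: remove the poison extension from the precertificate, and the result should equal the final certificate with its embedded SCT list removed.

```python
# RFC 6962 extension OIDs.
POISON_OID = "1.3.6.1.4.1.11129.2.4.3"    # critical "poison" marks a precertificate
SCT_LIST_OID = "1.3.6.1.4.1.11129.2.4.2"  # embedded SCT list in the final certificate


def strip_extension(tbs: dict, oid: str) -> dict:
    """Copy of a modelled TBSCertificate with one extension removed.

    Extensions are a list of (oid, value) pairs so their order is preserved,
    as it would be in the DER encoding.
    """
    return {**tbs, "extensions": [e for e in tbs["extensions"] if e[0] != oid]}


def corresponds(precert_tbs: dict, cert_tbs: dict) -> bool:
    """True if the final certificate is the one the precertificate implies."""
    return strip_extension(precert_tbs, POISON_OID) == strip_extension(cert_tbs, SCT_LIST_OID)


def duplicate_serial_violation(precert_tbs: dict, cert_tbs: dict) -> bool:
    """Same serial but different content: certificate (2) versus (3)."""
    return (precert_tbs["serial"] == cert_tbs["serial"]
            and not corresponds(precert_tbs, cert_tbs))
```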
|
| An easier-to-understand description of the problem is that
| Let's Encrypt was producing precertificates that didn't match
| the final certificate, but the compliance violation is
| duplicate serial numbers, which is why I worded my compliance
| bug the way I did.
| VWWHFSfQ wrote:
| Fascinating! What kind of effects can this have in real
| life? Are they going to have to revoke the affected
| certificates?
| agwa wrote:
| They will indeed need to revoke the affected certificates
| within 5 days, per the Baseline Requirements.
|
| Also, the affected certificates won't be accepted by
| Certificate Transparency-enforcing browsers (Chrome and
| Safari) because of the precertificate mismatch.
| flangola7 wrote:
| Why does Firefox not enforce it?
| amiga386 wrote:
| Firefox, the browser, only cares if the certificate is
| valid (not expired, not revoked, ultimately signed by a
| root CA it trusts). It does not keep tabs on every
| certificate ever issued. You wouldn't like it if Firefox
| did an online check with a central authority for every
| website you visited, nor would you like it to bundle
| every single certificate ever issued (or even just serial
| numbers).
|
| Mozilla, the authors of the browser, are part of the
| CA/Browser Forum, which holds the threat of complete
| distrust in all web browsers against CAs, which compels
| CAs to be open and provide logs of all the certificates
| they've issued and prove they're not mis-issuing
| certificates. All those extra checks happen here.
| agwa wrote:
| > _You wouldn't like it if Firefox did an online check
| with a central authority for every website you visited_
|
| Enforcing Certificate Transparency does not require doing
| an online check for every website you visit.
|
| > _Mozilla, the authors of the browser, are part of the
| CA/Browser Forum, which holds the threat of complete
| distrust in all web browsers against CAs, which compels
| CAs to be open and provide logs of all the certificates
| they've issued and prove they're not mis-issuing
| certificates._
|
| The CA/Browser Forum does not require CAs to log the
| certificates that they issue. CT is enforced entirely
| within the certificate validator code, and it is a major
| shortcoming that Firefox does not do it.
| amiga386 wrote:
| You are correct: https://github.com/google/certificate-transparency/blob/mast...
|
| You can embed CT attestations (SCTs) in the certificate
| itself, so yes, provided the CA is in cooperation with CT
| log operators, and deliberately does the pre-certificate
| -> SCTs -> real certificate dance, it is possible for a
| browser to validate embedded SCTs without an online
| check.
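Here is a sketch of why enforcing CT needs no online lookup. The browser ships a list of trusted logs, much like a root store, and checks the SCTs it receives against that list. The same check applies whether the SCTs are embedded in the certificate or delivered another way, such as in a stapled OCSP response.

A few assumptions apply:

- The thresholds below are placeholders, not Chrome's or Apple's actual policy.
- `signature_ok` stands in for verifying each SCT's signature with the log's public key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SCT:
    log_id: str         # identifies the log (in RFC 6962, a hash of its public key)
    signature_ok: bool  # stand-in for a local signature check with the log's key


def satisfies_ct_policy(scts, known_logs: dict, min_scts: int = 2,
                        min_operators: int = 2) -> bool:
    """Illustrative CT check that uses only data shipped with the browser.

    known_logs maps log_id -> operator name. The thresholds are placeholders.
    """
    valid_logs = {s.log_id for s in scts if s.log_id in known_logs and s.signature_ok}
    operators = {known_logs[log_id] for log_id in valid_logs}
    return len(valid_logs) >= min_scts and len(operators) >= min_operators
```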
|
| However, that assumes that the CA actively does that, and
| they don't have to. Neither does the server. What
| compels them to do so is _policy_, set by Google and Apple,
| that their respective browsers won't accept certificates
| _without_ CT attestations. Google's policy specifically
| requires that one of the SCTs on a certificate come from a
| CT log run by Google. Google also controls the list of CT
| logs that Chrome will consider as valid CT logs, as part
| of deciding if an SCT is valid. Antitrust, anyone?
|
| I was trying to make a similar point about Firefox -
| policy vs code. And rather than saying that it's
| specifically the CA/Browser Forum setting policy (which
| it does, but only baseline policy, which does not include
| CT), each org in the CA/Browser Forum has their own root
| cert inclusion program with their own policies, that all
| draw from baseline policy then add to it. You are right,
| _baseline_ policy does not require CT....
|
| ... and neither does _Mozilla's_ policy, now I've scanned
| through it. It actively acknowledges that CT exists (in
| that it mandates that if you issue a precertificate for
| CT, you _must_ issue the completed certificate), but it
| does _not_ require CAs to use CT. In stark contrast to
| Google and Apple.
|
| Perhaps this is why they also don't implement CT checking
| in Firefox?
| agwa wrote:
| There is a major distinction between root store policy
| and CT policy which you are missing.
|
| Root store policy contains requirements which are
| enforced by audits, and if a CA violates the root store
| policy it is considered misissuance requiring them to
| revoke the offending certificates and file an incident
| report. Neither Chrome nor Apple root store policies
| require CT.
|
| CT policy describes what CAs must do for their
| certificates to be accepted by the certificate validation
| code. CT policy is enforced entirely by code. It is not
| an incident if a CA doesn't comply with CT policy; it
| just means their certificates won't be accepted.
| toast0 wrote:
| > It actively acknowledges that CT exists (in that it
| mandates that if you issue a precertificate for CT, you
| _must_ issue the completed certificate)
|
| I don't think that's what the document says. I don't see
| a requirement to issue the final certificate. This
| portion is putting pre-certificates into scope of the
| agreement in that a mis-issued pre-certificate is
| evidence of intent to mis-issue a final certificate. So,
| before issuing a pre-certificate, a CA has to be prepared
| to revoke the final certificate, even if they never
| actually issue the final certificate; as well as prepared
| to defend the issuing of the final certificate.
|
| Presumably, this is to guard against CAs claiming a
| pre-certificate was issued for testing only, and wasn't going
| to be issued as a final certificate. Also, I'd presume
| that a CA issuing pre-certificates so they could embed
| SCTs would abort issuance if they were unable to get a
| response from the certificate log, but there's always the
| chance that the submission went fine and the pre-
| certificate is logged, but the response didn't make it,
| so the CA would abort.
| agwa wrote:
| I was involved in the drafting of that language and you
| are 100% correct.
| johncolanduoni wrote:
| Chrome also decides what CAs they will accept in Chrome
| in the first place, so CT doesn't give them any extra
| monopoly levers.
| marginalia_nu wrote:
| > You wouldn't like it if Firefox did an online check
| with a central authority for every website you visited
|
| And yet OCSP stapling is still far from ubiquitous.
| jraph wrote:
| I wonder how much this has to do with OCSP stapling being
| so badly implemented in Apache 2 and nginx (don't know
| about the other servers). This article from 2019 [1]
| still seems current; at least, I can attest that I still
| have issues with nginx. See also this Super User Q&A [2],
| which suggests priming OCSP with a cron job because nginx
| does not do it by itself.
|
| [1] https://blog.apnic.net/2019/01/15/is-the-web-ready-for-ocsp-...
|
| [2] https://superuser.com/questions/1635407/ocsp-not-working-con...
| johncolanduoni wrote:
| Certificate Transparency verification only requires that the
| server provide proof (either embedded in the certificate
| itself or in a stapled OCSP response) that the certificate was
| submitted to the public logs. It does not involve any
| extra requests from the browser to a third party; at most
| it involves a periodic request from the server to the CA
| with no client-specific data.
| agwa wrote:
| Here's the 7-year-old bug:
| https://bugzilla.mozilla.org/show_bug.cgi?id=1281469
|
| I don't know why it is taking them so long, but it makes
| me sad.
| f0rgot wrote:
| Thank you for sharing your knowledge here.
|
| A few questions:
|
| If applying the algorithm to (1) produced (3), what
| produced (2)?
|
| How can "no duplicate serial numbers" be enforced by any
| browser without having a store of all certificates? Is it
| simply a best-effort? Will the browser have a mapping from
| <serial number> to <certificate>, and whenever it sees a
| certificate, it will check this map to see if it has seen
| that serial number on a separate certificate?
| johncolanduoni wrote:
| Requiring Certificate Transparency in the browser doesn't
| directly prevent this, but (as in this instance) it
| ensures there is public data anyone can check to see if
| this situation has occurred.
| agwa wrote:
| > _If applying the algorithm to (1) produced (3), what
| produced (2)?_
|
| I believe the root of the problem is that Let's Encrypt
| is creating certificates and precertificates
| independently, instead of creating a precertificate and
| then applying the algorithm to create the corresponding
| certificate. Since their processes for certificates and
| precertificates got out-of-sync, they ended up producing
| (2) instead of (3).
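agwa's hypothesis can be illustrated with a toy model. It uses dicts instead of DER and is not Let's Encrypt's code.

- Deriving the final certificate from the precertificate makes a mismatch impossible by construction.
- Building the two independently lets any drift between the code paths leak into the issued certificate, while the serial number stays the same.

`build_tbs` is a hypothetical callable standing in for whatever assembles the certificate fields.

```python
POISON_OID = "1.3.6.1.4.1.11129.2.4.3"    # RFC 6962 precertificate poison
SCT_LIST_OID = "1.3.6.1.4.1.11129.2.4.2"  # RFC 6962 embedded SCT list


def make_precert(tbs: dict) -> dict:
    """Precertificate = the TBSCertificate plus the critical poison extension."""
    return {**tbs, "extensions": tbs["extensions"] + [(POISON_OID, None)]}


def finalize(precert: dict, sct_list: bytes) -> dict:
    """Derive the final certificate from the precertificate itself: drop the
    poison and append the SCTs. Nothing else is rebuilt, so the two cannot drift."""
    exts = [e for e in precert["extensions"] if e[0] != POISON_OID]
    return {**precert, "extensions": exts + [(SCT_LIST_OID, sct_list)]}


def issue_independently(build_tbs, sct_list: bytes):
    """Anti-pattern: precertificate and final certificate come from two builds.
    If build_tbs() returns something different the second time, the outputs
    disagree while sharing a serial number."""
    precert = make_precert(build_tbs())
    final = build_tbs()
    final = {**final, "extensions": final["extensions"] + [(SCT_LIST_OID, sct_list)]}
    return precert, final
```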
|
| > _How can "no duplicate serial numbers" be enforced by
| any browser without having a store of all certificates?_
|
| Browser software doesn't enforce this. It can only be
| enforced by scanning Certificate Transparency logs
| looking for violations.
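This answers f0rgot's second question: the bookkeeping lives in a monitor that reads CT logs, not in the browser. The toy sketch below assumes log entries have already been parsed into dicts; real entries are DER and must be parsed first. It works as follows:

- Each entry is normalised by removing the two RFC 6962 extensions, so a matching precertificate/certificate pair counts as one.
- Only a genuine conflict under the same issuer and serial is reported.

```python
import hashlib
from collections import defaultdict

POISON_OID = "1.3.6.1.4.1.11129.2.4.3"    # RFC 6962 precertificate poison
SCT_LIST_OID = "1.3.6.1.4.1.11129.2.4.2"  # RFC 6962 embedded SCT list


def canonical_digest(entry: dict) -> str:
    """Digest of a modelled TBSCertificate without the CT-specific extensions,
    so a precertificate and its matching certificate digest identically."""
    exts = [e for e in entry["extensions"] if e[0] not in (POISON_OID, SCT_LIST_OID)]
    body = sorted({**entry, "extensions": exts}.items())
    return hashlib.sha256(repr(body).encode()).hexdigest()


def find_duplicate_serials(entries) -> list:
    """(issuer, serial) pairs that appear with more than one distinct body."""
    seen = defaultdict(set)
    for entry in entries:
        seen[(entry["issuer"], entry["serial"])].add(canonical_digest(entry))
    return sorted(key for key, digests in seen.items() if len(digests) > 1)
```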
| tialaramex wrote:
| Indeed. The de jure requirement in the policy isn't
| actually important per se: the only practical way to obey
| the policy is to do the thing we want you to do, so that's
| what you'll actually do, and the way the policy is phrased
| makes enforcement practical.
|
| This is different from a "Brown M&M" policy where the
| purpose of the policy is to easily check that you are
| actually reading and obeying the policy document. Here
| the policy is worded in a way that doesn't directly
| achieve what we want, but is measurable, whereas what we
| want isn't, but the only practical way to achieve policy
| compliance is to do what we wanted anyway.
| 0xbadcafebee wrote:
| It's absolutely wild that they had no test to detect
| that. I wonder what other obvious bugs are floating around
| in there.
| rob-olmos wrote:
| Do you have a blog post or writeup on how you discovered that?
| Thanks!
| agwa wrote:
| This all happened less than 2 hours ago, but a quick summary
| is that my Certificate Transparency monitor, Cert Spotter
| (https://sslmate.com/certspotter) performs various sanity
| checks on every certificate that it observes. At 15:41 UTC
| today, I started getting alerts that certificates from Let's
| Encrypt were failing one particular check. I quickly emailed
| Let's Encrypt's problem reporting address, and Let's Encrypt
| promptly suspended issuance so they could investigate. I've
| lost count of how many CAs I've detected having this
| particular problem, so perhaps it is time to blog about it
| (https://www.agwa.name/blog if you're interested).
| danShumway wrote:
| I will also throw out a quick vote that I'd be interested
| in reading a blog post about it.
| conroydave wrote:
| this is why i will always love hacker news. thank you
| toomuchtodo wrote:
| Curious people contributing to the ongoing functioning of
| critical systems at scale. Thank you for your effort!
|
| https://xkcd.com/2347/
| mholt wrote:
| I would love to read a blog of yours with more information.
| mardifoufs wrote:
| That's awesome!! I wonder if Let's Encrypt runs sanity
| checks before/after issuing certs too?
| agwa wrote:
| They "lint" certificates before issuance, as do most CAs.
| However, I don't think any linters check for this
| problem, as it requires access to more than just the
| certificate (the linter would need access to either the
| precertificate or a database of Certificate Transparency
| log keys).
| dopamean wrote:
| This is so awesome. Thank you for sharing. I hope you do
| write about that problem. I'd love to learn something new.
| iso1631 wrote:
| So you can legitimately put "broke the internet" on your resume
| :D
| dylan604 wrote:
| That would be impressive except for all of the AWS us-east-1
| engineers that can claim the same thing
| natebc wrote:
| They didn't do it all by themselves though!
| dylan604 wrote:
| Like you don't have things on your resume that your team
| did and not just you?
| gumby wrote:
| One thing I like about the computing world is that people
| put "wrote the package that frobbed jpegs that in
| production frobs over 1 million jpegs per hour".
| Anything less specific means "I was somewhere in the
| building when this was written and deployed".
|
| When I worked in pharma people would say something like
| "I joined a program targeting neurology early in the
| preclinical phase and developed assays, until first-in-
| human four years later. Started as senior lab technician
| and departed as a junior assistant director for
| preclinical QC."
|
| Took me years to understand how the sociology, regulatory
| dynamics, and science of the two fields legitimately
| resulted in these utterly different approaches.
| mholt wrote:
| Regular reminder that the best ACME clients will fall back to
| other CAs if one is down. For example, Caddy does this.
| (Disclosure yada yada)
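Here is a minimal sketch of CA fallback. It is not how Caddy implements it. The sketch assumes a hypothetical `obtain(directory, domain)` function that wraps a real ACME client and raises on failure.

The directory URLs are the ones published by the three CAs named later in this thread, as far as I know. Check each CA's documentation before relying on them. Some CAs also require External Account Binding credentials, which the sketch ignores.

```python
from typing import Callable, Dict, Sequence, Tuple

# Publicly documented ACME directory URLs at the time of writing; verify them
# against each CA's documentation. Some CAs also require External Account
# Binding credentials, which this sketch ignores.
DIRECTORIES = [
    "https://acme-v02.api.letsencrypt.org/directory",  # Let's Encrypt
    "https://acme.zerossl.com/v2/DV90",                # ZeroSSL
    "https://dv.acme-v02.api.pki.goog/directory",      # Google Trust Services
]


class AllIssuersFailed(Exception):
    """Raised when every configured CA failed; args[0] maps directory -> error."""


def obtain_with_fallback(domain: str, directories: Sequence[str],
                         obtain: Callable[[str, str], bytes]) -> Tuple[str, bytes]:
    """Try each ACME directory in order and return (directory, certificate).

    `obtain(directory, domain)` is a hypothetical wrapper around a real ACME
    client that raises on any failure (outage, rate limit, bad response).
    """
    errors: Dict[str, Exception] = {}
    for directory in directories:
        try:
            return directory, obtain(directory, domain)
        except Exception as exc:  # remember the error, move on to the next CA
            errors[directory] = exc
    raise AllIssuersFailed(errors)
```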
| Arnavion wrote:
| That wouldn't work for caddy if you also follow the best
| practice to have a CAA record pointing to the issuer and
| account URL, unless caddy is also managing DNS records in
| addition to being an HTTP server. (I don't know if it is, but I
| would think it's a layering violation for an HTTP server to
| also be a DNS server.)
| mholt wrote:
| This is true, if you manually configure a CAA limited to just
| one CA, then you lose that benefit of redundancy.
|
| I recommend trusting multiple CAs (but not too many):
| https://matt.life/writing/the-acme-protocol-in-practice-and-...
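Arnavion's objection can be checked mechanically. Below is a simplified sketch of the CAA `issue` check from RFC 8659, including the RFC 8657 `accounturi` parameter. It ignores `issuewild`, tree climbing, and critical-flag handling.

The issuer domains and the account number are illustrative. Confirm each CA's CAA identifier in its documentation.

```python
from typing import Iterable, Optional


def caa_permits(issue_values: Iterable[str], ca_domain: str,
                account_uri: Optional[str] = None) -> bool:
    """Simplified RFC 8659 `issue` check with the RFC 8657 `accounturi` parameter.

    issue_values are the values of a domain's CAA `issue` records, e.g.
    'letsencrypt.org; accounturi=https://...'. issuewild, tree climbing and
    the critical flag are ignored.
    """
    values = list(issue_values)
    if not values:
        return True  # no `issue` records: any CA may issue
    for value in values:
        domain, _, params = value.partition(";")
        if domain.strip().lower() != ca_domain.lower():
            continue  # also covers ';' (empty issuer), which forbids issuance
        constraints = dict(p.strip().split("=", 1) for p in params.split(";") if "=" in p)
        required = constraints.get("accounturi")
        if required is None or required == account_uri:
            return True
    return False


# Hypothetical account number: a CAA set naming only one CA leaves no fallback.
single = ["letsencrypt.org; accounturi=https://acme-v02.api.letsencrypt.org/acme/acct/123"]
print([ca for ca in ("letsencrypt.org", "sectigo.com", "pki.goog")
       if caa_permits(single, ca, "https://acme-v02.api.letsencrypt.org/acme/acct/123")])
# ['letsencrypt.org']
```

Listing several CAs, each with its own `issue` record, keeps the redundancy while still restricting issuance.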
|
| > (I don't know if it is, but I would think it's a layering
| violation for an HTTP server to also be a DNS server.)
|
| Caddy 2 is, at its core, a server of servers. The HTTP server
| is just an "app module" for Caddy. There are other servers; I
| don't know of a DNS server app yet. (CoreDNS is a fork of
| Caddy v1, though.)
| neurostimulant wrote:
| What other CAs do you recommend aside from Let's Encrypt?
| I'm a bit wary of trying random CAs that offer free
| certificates.
| mholt wrote:
| It's good to be wary.
|
| You can trust the ACME CAs listed on this site:
| https://www.acmeisuptime.com/ (Although, I think that list
| could use some updating. I'll ping the author.)
|
| Personally I would use Let's Encrypt, ZeroSSL (Sectigo) and
| Google Trust Services. There are, of course, others. But
| which ones you choose depends on your requirements and such.
| (Some offer business support, for example.) SSL.com and
| Sectigo also offer ACME but I am not sure how performant
| their CA software is.
| remram wrote:
| The missing disclosure is "I'm the author of the Caddy web
| server". I'm not sure why you would do it halfway.
| yjftsjthsd-h wrote:
| > the best ACME clients will fall back
|
| Are there others that do it, or are you just saying that yours
| is the best?
| mholt wrote:
| There are others that do it.
|
| While we're on the topic, I will say that Caddy has
| independently and repeatedly been cited as the gold standard
| of ACME clients, "the best client experience," and "we hope
| to see other servers follow Caddy's lead." [0] [1] [and
| others I don't have links to currently].
|
| [0]: https://www.youtube.com/watch?v=OE5UhQGg_Fo
|
| [1]: https://jhalderm.com/pub/papers/letsencrypt-ccs19.pdf
| xen2xen1 wrote:
| I used Caddy and was just wondering if my stuff would be
| broken, so the GP comment was useful to me, snarky as your
| reply was.
| yjftsjthsd-h wrote:
| Sure, there are a handful of useful points there (off the
| top of my head: it's possible to work around this failure
| by using multiple CAs, multiple free ACME CAs exist, caddy
| implements this solution). I'm just 1. slightly frustrated
| that caddy's author never seems to miss a chance for self-
| promotion (at least he's started alluding to the fact that
| it's his project), and 2. actually curious whether any
| other ACME clients are implementing that fallback.
| toast0 wrote:
| Apache mod_md has fallback too,
| https://github.com/icing/mod_md#acme-failover I'm just a
| user, not the author, and I didn't try the fallback. I'm
| more worried about stuff breaking if I switch issuers
| than certs expiring without me noticing. I've got some
| embedded junk that hits my website and has weak cert
| validation, so better to stick with something that works.
| xen2xen1 wrote:
| For self-promotion, that was pretty light. You ever seen
| the Sourcehut guy on here?
| schoolornot wrote:
| Promoting went from HTTP headers to HN comments.
| mholt wrote:
| > I'm just 1. slightly frustrated that caddy's author
| never seems to miss a chance for self-promotion (at least
| he's started alluding to the fact that it's his project),
|
| Sorry that I happen to be the author. It's really not
| about that though -- it just matters that an ACME-native
| HTTPS server exists. We need more integrated fully-native
| ACME clients.
|
| > actually curious whether any other ACME clients are
| implementing that fallback.
|
| There are at least one or two others. I don't recall
| which ones at the moment but I think Certify the Web may
| be one. Edit: mod_md is another apparently!
| tialaramex wrote:
| Right, it's astounding to me that outfits like
| _Microsoft_ didn't just immediately ship decent ACME
| implementations. People seem to have settled for third
| party bolt-on solutions. It's like you wake up in an
| alternate world where yeah, no cars come with seat belts,
| but of course everybody buys seatbelts for the car,
| there's usually a store next to the car dealer which
| sells them. Um. What?
|
| Most of the "popular" software in this space is garbage.
| I spent the entire day today (aside from meetings and
| helping other people debug problems) wrestling with the
| fact that Apache seems to be designed so heavily with a C
| programmer mindset that even the idea of reporting
| problems has never occurred to them. Just blunder on,
| it'll be fine, don't think about it. You can sprinkle
| complete nonsense into Apache configuration files and,
| until you trip an actual syntactical error and blow up
| their parser, Apache just presses on anyway with the
| nonsense values you provided, and if that doesn't work,
| no reason to report it just do whatever was the default
| and hope that's OK.
|
| As far as I can tell, in the wild the result is a lot of
| Apache configuration is complete nonsense, but hey no
| errors are reported, so, copy, paste, move on.
| agwa wrote:
| If your certificate was issued after the start of the
| incident but before Let's Encrypt suspended issuance, then
| the certificate is currently not working in Chrome or
| Safari.
|
| This incident wasn't just about downtime, it was also about
| issuing non-functional/non-compliant certificates.
| mholt wrote:
| Is this because the certs were revoked? (Revocation is
| broken ;P)
|
| Caddy staples valid OCSP responses to all certificates
| that have an OCSP responder, so if browsers aren't
| accepting that, then arguably the clients are broken,
| because that response is valid until a few days from now.
| But before the 100% valid and trusted OCSP staple
| expires, Caddy will get a new staple that presumably says
| Revoked, and replace them right away before browsers
| would ever see a Revoked status.
|
| (Revocation is broken ;P)
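The staple-refresh behaviour described above can be sketched as a timing rule. This assumes a hypothetical policy of refreshing once half of the response's validity window (thisUpdate to nextUpdate) has passed. It is not taken from Caddy's source.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(this_update: datetime, next_update: datetime, now: datetime,
                  fraction: float = 0.5) -> bool:
    """Hypothetical rule: fetch a new staple once `fraction` of the response's
    validity window (thisUpdate..nextUpdate) has elapsed."""
    return now >= this_update + (next_update - this_update) * fraction


def servable(next_update: datetime, now: datetime) -> bool:
    """Never staple a response that is past its nextUpdate."""
    return now < next_update


# Example: a response valid for 7 days is refreshed after 3.5 days.
issued = datetime(2023, 6, 15, tzinfo=timezone.utc)
expires = issued + timedelta(days=7)
print(needs_refresh(issued, expires, issued + timedelta(days=2)))  # False
print(needs_refresh(issued, expires, issued + timedelta(days=4)))  # True
```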
| agwa wrote:
| No, it's because the SCTs in the certificate have invalid
| signatures.
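Here is why a precertificate/certificate mismatch shows up as invalid SCT signatures. Per RFC 6962, section 3.2, a log signs a structure that contains the precertificate's TBSCertificate. A browser rebuilds that structure from the final certificate, with the SCT list extension removed, before verifying. If the two TBSCertificates differ, the signed bytes differ and the signature fails.

The function below builds those bytes using only the standard library. The ECDSA/RSA verification against the log's public key is deliberately omitted, and the inputs in the test are placeholders rather than real DER.

```python
import hashlib
import struct


def precert_signed_data(timestamp_ms: int, issuer_spki_der: bytes,
                        tbs_der: bytes, extensions: bytes = b"") -> bytes:
    """Bytes a CT log signs for a precert_entry SCT (RFC 6962, section 3.2).

    tbs_der is the TBSCertificate the verifier reconstructs from the final
    certificate (SCT list extension removed). If it differs from what the CA
    submitted as a precertificate, the log's signature will not verify.
    """
    issuer_key_hash = hashlib.sha256(issuer_spki_der).digest()  # 32 bytes
    return (
        # sct_version v1 (0), signature_type certificate_timestamp (0),
        # uint64 timestamp, uint16 entry_type precert_entry (1)
        struct.pack(">BBQH", 0, 0, timestamp_ms, 1)
        + issuer_key_hash
        + len(tbs_der).to_bytes(3, "big") + tbs_der         # opaque<1..2^24-1>
        + struct.pack(">H", len(extensions)) + extensions   # CtExtensions<0..2^16-1>
    )
```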
| mholt wrote:
| Ah, that makes sense!
|
| I wonder if we should be doing some basic sanity checks
| on newly obtained certificates in Caddy, and treat this
| as a failure, and try the next configured CA instead.
|
| (Obviously SCT signatures will require some external
| resource so we would have to weigh that a bit more, maybe
| make it configurable...)
|
| Issue opened here to discuss, though it does sound
| troublesome/tedious:
| https://github.com/caddyserver/certmagic/issues/240
| agwa wrote:
| Yeah, the question is how far you want to go. To be safe
| against every possible CA screwup, you basically have to
| re-implement every browser's entire certificate
| validation engine and run certificates through each one.
| That would obviously be very hard, and could do more harm
| than good if it falls out-of-sync with browsers.
| toast0 wrote:
| It would make sense to try to check this, if there's a
| reasonable way to access a database of trusted
| certificate logs. If not, it's going to be tricky; I
| wouldn't fail a certificate that had an SCT from an
| unknown log, because it might be valid and you don't
| know. Etc.
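toast0's suggestion amounts to a three-way outcome rather than pass/fail. The sketch below is not a Caddy or CertMagic API. It assumes two hypothetical inputs:

- a `known_log_keys` map, for instance loaded from a published log list;
- a `verify(key)` callable standing in for the real signature check.

```python
from enum import Enum
from typing import Callable, Dict, Iterable


class Verdict(Enum):
    OK = "ok"
    UNKNOWN_LOG = "unknown log"      # cannot judge: warn, but do not fail
    BAD_SIGNATURE = "bad signature"  # known log, signature fails: issuance failed


def triage_sct(log_id: str, known_log_keys: Dict[str, object],
               verify: Callable[[object], bool]) -> Verdict:
    """Classify one embedded SCT. verify(key) is a stand-in for checking the
    SCT's signature over the RFC 6962 signed data with that log's key."""
    key = known_log_keys.get(log_id)
    if key is None:
        return Verdict.UNKNOWN_LOG
    return Verdict.OK if verify(key) else Verdict.BAD_SIGNATURE


def should_try_next_ca(verdicts: Iterable[Verdict]) -> bool:
    """Only a definite failure triggers fallback to the next configured CA."""
    return any(v is Verdict.BAD_SIGNATURE for v in verdicts)
```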
| yjftsjthsd-h wrote:
| On the bright side, that's actually one of the lower-impact
| things to have an outage on, IMO; if you're using it the
| recommended way, an outage would only really affect new certs,
| with older certs just getting renewed slightly later.
| johncolanduoni wrote:
| If you're onboarding new users into something like a SaaS or
| PaaS it could be a bigger deal.
___________________________________________________________________
(page generated 2023-06-15 23:02 UTC)