[HN Gopher] 35M Hot Dogs: Benchmarking Caddy vs. Nginx
       ___________________________________________________________________
        
       35M Hot Dogs: Benchmarking Caddy vs. Nginx
        
       Author : EntICOnc
       Score  : 309 points
       Date   : 2022-09-16 12:58 UTC (10 hours ago)
        
 (HTM) web link (blog.tjll.net)
 (TXT) w3m dump (blog.tjll.net)
        
       | petecooper wrote:
       | I'm an Nginx guy, and I have been for some years, but I do love a
       | little bit of Caddy jingoism[1] as the weekend approaches.
       | 
       | This is a good write up. I was expecting Caddy to trounce Nginx,
       | but that wasn't the case. I'll be back to re-read this with fresh
       | eyes tomorrow.
       | 
       | [1] For the avoidance of doubt, this is not meant as a snarky
       | observation.
        
         | mholt wrote:
         | You were expecting Caddy to "trounce" nginx? Most people expect
         | the opposite.
         | 
         | But Caddy certainly does in some cases, especially with the
         | upcoming 2.6 release.
        
           | petecooper wrote:
           | > You were expecting Caddy to "trounce" nginx? Most people
           | expect the opposite.
           | 
           | I absolutely was, yes. As an observer I see a lot of people
           | saying positive things about Caddy around here, and how it's
           | superior performance-wise to a variety of 'classic' httpd
           | software. Lots of people love Caddy, and they're quite vocal,
           | so it's not a stretch to assume there are reasons _why_ they
           | love it. Nginx development has slowed since the events in
           | Ukraine, unsurprisingly, so again it's not a leap to surmise
           | Caddy is making good things happen in the meantime.
        
             | mholt wrote:
              | Ahh, right -- so there's a _lot_ more to performance than
              | just req/sec and HTTP errors. And that's probably the
              | love/hype you're hearing about. (Though Caddy's req/sec
              | performance is quite good too, as you can see!)
             | 
              | Caddy _scales_ better than NGINX, especially with regards to
              | TLS/HTTPS. Our certificate automation code is the best in
             | the industry, and works nicely in clusters to coordinate
             | and share, automatically.
             | 
             | Caddy performs better in terms of security overall. Go has
             | stronger memory safety guarantees than C, so your server is
             | basically impervious to a whole class of vulnerabilities.
             | 
             | And if you consider failure modes, there are pros and cons
             | to each, but it can definitely be argued that Caddy
             | dropping fewer requests than nginx (if any at all!) is
             | "superior performance".
             | 
             | I'm actually quite pleased that Caddy can now, in general,
             | perform competitively with nginx, and hopefully most people
             | can stop worrying about that.
             | 
             | And if you operate at Cloudflare-"nginx-is-now-too-slow-
             | for-us"-scale, let's talk. (I have some ideas.)
        
               | CoolCold wrote:
                | Can you add details on _scales better_ - what do you
                | mean? I've read a recent post from Cloudflare on their
                | thread pool and it makes sense; do you mean things of
                | that sort?
                | 
                | I had a case where, after push notifications, mobile
                | clients wake up and all of them do TLS handshakes to the
                | load balancers (Nginx), hitting the CPU limit for a
                | minute or so, but otherwise I had no problem with 5-15k
                | rps and scaling.
        
               | mholt wrote:
               | Caddy does connection pooling (perhaps differently than
               | what Cloudflare's proxy does, we'll have to see once they
                | open source it) just as Go does. But what Caddy does
                | especially well is scale with the number of
                | certificates/sites.
               | 
               | So we find lots of people using Caddy to serve tens to
               | hundreds of thousands of sites with different domain
               | names because Caddy can automate those certificates
               | without falling over. (Huge deployments like this will
               | require a little more config and planning, but nothing a
               | wiki article [0] can't help with. You might also want
               | sufficient hardware to keep a lot of certs in memory,
               | etc.)
               | 
               | Also note that rps is not a useful metric when TLS enters
               | the picture, as it says nothing useful about the actual
               | TLS impact (TLS connection does not necessarily correlate
               | to HTTP request - and there are many modes for TLS
               | connections that vary).
               | 
               | [0]: https://caddy.community/t/serving-tens-of-thousands-
               | of-domai...
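                | 
                | For the curious, the per-site config for that pattern
                | is tiny. A minimal sketch (the domains and the ask
                | endpoint are placeholders, not a real deployment):
                | 
                |     {
                |         on_demand_tls {
                |             # hypothetical endpoint that approves
                |             # which domains may get certificates
                |             ask http://localhost:9123/allowed
                |         }
                |     }
                | 
                |     https:// {
                |         tls {
                |             on_demand
                |         }
                |         reverse_proxy 127.0.0.1:8080
                |     }
                | 
                | Caddy then obtains a certificate during the first TLS
                | handshake for any domain the ask endpoint approves.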
        
       | Kiro wrote:
       | Why would any of those fail at a measly 10k clients? 10 billion
       | clients maybe.
        
         | EugeneOZ wrote:
         | We don't have enough humans for that test.
        
           | teknopaul wrote:
            | A test limited to 1024 clients is (despite aversion to the
            | term webscale) not a lot, even on an intranet.
            | 
            | I would say if you are not testing 10kcc you are not pushing
            | the difference between nginx and apache 1.3.
            | 
            | As soon as you do push 10kcc, kernel TCP buffers and the
            | amount of junk in your browser's HTTP headers start to be
            | more important than server perf, just in the amount of data
            | coming into the NIC.
        
       | CoolCold wrote:
        | Would be nice to see the Nginx config optimized with
        | SO_REUSEPORT - if I read the configuration right, it was not
        | used.
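        | 
        | (For reference, reuseport is a one-word addition to the listen
        | directive - a sketch, not the article's actual config:
        | 
        |     server {
        |         # one listening socket per worker; the kernel
        |         # load-balances incoming connections across them
        |         listen 443 ssl reuseport;
        |     }
        | 
        | It mainly helps accept() throughput under high connection
        | churn.)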
        
       | la_fayette wrote:
        | These are interesting tests. Considering the energy cost of
        | large software systems, it would also be interesting to know
        | which of these two has a lower CO2 footprint.
        
       | boberoni wrote:
       | The killer feature of Caddy for me is that it handles TLS/HTTPS
       | certificates automatically for me.
       | 
       | I only ever use Caddy as a reverse proxy for web apps (think
       | Flask, Ruby on Rails, Phoenix Framework). My projects have never
       | needed high performance, but if my projects ever take off, it's
       | nice to see that Caddy is already competitive with Nginx on
       | resilience, latency, and throughput.
        
       | skyde wrote:
       | TLDR: "Nginx will fail by refusing or dropping connections, Caddy
       | will fail by slowing everything down"
       | 
        | To me it seems that Caddy suffers from bufferbloat. Under heavy
        | congestion the goodput (useful throughput) will drop to 0
        | because clients will start timing out before the server gets a
        | chance to respond.
        | 
        | Caddy should use an algorithm similar to:
        | https://github.com/Netflix/concurrency-limits
        | 
        | Basically, track the best observed request latency and decrease
        | the concurrency limit until latency stops improving.
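        | 
        | The core idea fits in a few lines. A minimal Go sketch of such
        | an adaptive (AIMD-style) limiter - illustrative only, neither
        | Caddy code nor the Netflix library; assumes the "sync" and
        | "time" imports:
        | 
        |     // Limiter sheds load once measured latency degrades
        |     // relative to the best latency seen so far.
        |     type Limiter struct {
        |         mu       sync.Mutex
        |         limit    int           // current limit; start it > 0
        |         inflight int           // requests being handled now
        |         best     time.Duration // lowest latency observed
        |     }
        | 
        |     // Acquire reports whether a request may proceed; false
        |     // means shed it rather than queue it.
        |     func (l *Limiter) Acquire() bool {
        |         l.mu.Lock()
        |         defer l.mu.Unlock()
        |         if l.inflight >= l.limit {
        |             return false
        |         }
        |         l.inflight++
        |         return true
        |     }
        | 
        |     // Release records the request latency and adapts the
        |     // limit: additive increase while latency stays near the
        |     // best seen, multiplicative decrease once it degrades.
        |     func (l *Limiter) Release(observed time.Duration) {
        |         l.mu.Lock()
        |         defer l.mu.Unlock()
        |         l.inflight--
        |         if l.best == 0 || observed < l.best {
        |             l.best = observed
        |         }
        |         if observed < 2*l.best {
        |             l.limit++ // still healthy: probe upward
        |         } else if l.limit > 1 {
        |             l.limit /= 2 // congested: back off hard
        |         }
        |     }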
        
         | mholt wrote:
         | Thanks, we'll consider that, maybe as an option. Want to open
         | an issue so we don't forget?
         | 
          | I'd probably lean toward CUBIC:
         | https://en.wikipedia.org/wiki/CUBIC_TCP
         | 
         | (I implemented Reno in college, but times have changed)
         | 
          | The nice thing about Caddy's failure mode is that the server
          | won't give up; the server has no control over if or when a
          | client will time out, so I never felt it made much sense to
          | optimize for that.
        
       | gordian_NOT wrote:
       | I feel like we never see HAProxy in these reverse proxy
       | comparisons. Lots of nginx, Apache, Caddy, Traefik, Envoy, etc.
       | 
       | The HAProxy configuration is just as simple as Caddy for a
       | reverse proxy setup. It's also written in C which is a comparison
       | the author makes between nginx and Caddy. And it seems to be
       | available on every *nix OS.
        
         | snowwrestler wrote:
         | I'm surprised Varnish is not mentioned much either. For a while
         | there it had a reputation as the fastest reverse proxy. I think
         | its popularity was harmed by complex config and refusal to
         | handle TLS.
        
           | pbowyer wrote:
           | It's always been blisteringly fast when we've used it, and I
           | like the power of the configuration (it has its quirks but so
           | do most powerful systems). But the overhead of setting it up
           | and maintaining it due to having to handle TLS termination
           | separately puts me off using it when other software is 'good
           | enough'. If Varnish Enterprise was cheaper I would have
           | bought it, but at their enterprise prices no way.
           | 
           | I'm keeping a watching brief on
           | https://github.com/darkweak/souin and its Caddy integration
           | to see if that can step up and replace Varnish for short-
           | lived dynamic caching of web applications. Though I've lost
           | track of its current status.
        
             | darkweak wrote:
              | Amazing that you're talking about Souin and its possible
              | use as a Varnish replacement. Let me know if you have
              | questions about the configuration or implementation. ATM
              | I'm working on the stabilization branch to get a more
              | stable version and merge the improvements into Caddy's
              | cache-handler module.
        
         | tempest_ wrote:
         | I am not sure I would agree with the assertion that config for
         | HAProxy is just as easy.
         | 
          | In fact I use HAProxy in production pretty regularly because
          | it is solid, but its config is one of the main reasons I would
          | choose something else.
         | 
         | A basic HAProxy config is fine but it feels like after a little
         | bit each line is just a collection of tokens in a random order
         | that I have to sit and think about to parse.
        
           | gunapologist99 wrote:
           | For simple things, Caddy is nice and easy, but I've struggled
           | with Caddy quite a bit, too, especially for more complex
           | setups. I usually break out haproxy or nginx for really
           | challenging setups, because caddy's documentation and
           | examples are quite sparse (esp v2)
        
             | mholt wrote:
             | What do you struggle with about the documentation or "more
             | complex setups"? I was just on the phone recently with
             | Stripe who has a fairly complex, large-scale deployment,
             | and they seemed to have figured it out with relative ease.
             | 
             | I'm currently on a push to improve our docs, especially for
             | beginners, so feel free to review the changes and leave
             | your feedback:
             | https://github.com/caddyserver/website/pull/263
        
           | bmurphy1976 wrote:
           | I feel the same way. I'm not a fan of haproxy's configuration
           | system. It's really difficult for me to understand it,
           | whereas I feel I can read most nginx/apache configs and
           | immediately know what is supposed to be happening. I still
           | maintain servers under load in production that use all three
           | to this day and I always go back to nginx because of the
           | configuration alone.
        
             | kilburn wrote:
             | I can't comment on haproxy because I haven't used it
             | enough, but I think that the "nginx's config is easy to
             | grasp" posture has a bit of Stockholm syndrome in it.
             | 
             | - Do you want to add a header in this "location" block?
             | Great, you better remember to re-apply all the security
             | headers you've defined at a higher level (server block for
             | instance) because of course adding a new header will reset
             | those.
             | 
              | - Oh, you mixed prefix locations with exact locations and
              | regex locations. Great, let's see if you can figure out
              | which location block a request will end up being processed
              | by.
             | The docs "clearly" explain what the priority rules for
             | those are and they're easy to grasp [1].
             | 
             | - I see you used a hostname in a proxy_pass directive
             | (e.g.: http://internal.thing.com). Great, I will resolve it
             | at startup and never check again, because this is the most
             | sensible thing to do of course.
             | 
             | - Oh... now you used a variable (e.g.:
             | http://$internal_host). That fundamentally changes things
             | (how?) so I'll respect the DNS's TTL now. Except you'll
             | have to set up a DNS resolver in my config because I refuse
             | to use the system's normal resolver because reasons.
             | 
             | - Here's an `if` directive for the configuration. It sounds
             | extremely useful, doesn't it? Well.. "if is evil" [2] and
             | you should NOT use it. There be dragons, you've been
             | warned.
             | 
             | I could go on... but I think I've proved my point already.
             | Note that these are not complaints, it's just me pointing
             | out that nginx's configuration has its _very_ significant
             | warts too.
             | 
             | [1] https://nginx.org/en/docs/http/ngx_http_core_module.htm
             | l#loc...
             | 
             | [2] https://www.nginx.com/resources/wiki/start/topics/depth
             | /ifis...
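              | 
              | To make the first wart concrete, a minimal sketch (the
              | header values are placeholders):
              | 
              |     server {
              |         add_header X-Frame-Options DENY;
              | 
              |         location /api/ {
              |             # any add_header here replaces ALL headers
              |             # inherited from the server block, so
              |             # X-Frame-Options silently disappears
              |             add_header Cache-Control "no-store";
              |         }
              |     }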
        
               | bmurphy1976 wrote:
                | To be clear, I never said it was easy. I have a LOT of
                | issues with Nginx's configuration; I just find it to be
                | significantly less bad than the other options.
               | 
                | Other than Caddy, that is - Caddy has been great so far,
                | but I have only used it for personal projects.
        
               | hinkley wrote:
               | All that may be true, but for a lot of us old timers we
               | were coming from apache to nginx and apache's configs can
               | eat a bag of dicks.
               | 
               | Unfortunately it's likely I worked in the same building
               | as one of the people responsible for either creating or
               | at least maintaining that mess, but I didn't know at the
               | time that he needed an intervention.
        
               | TimWolla wrote:
               | Exactly all of this. I've mentioned the first point about
               | add_header redefining instead of appending in a previous
               | HN comment of mine:
               | https://news.ycombinator.com/item?id=27253579. As
               | mentioned in that comment, HAProxy's configuration is
               | much more obvious, because it's procedural. You can read
               | it from top to bottom and know what's happening in which
               | order.
               | 
               | Disclosure: Community contributor to HAProxy, I help
               | maintain HAProxy's issue tracker.
        
               | slivanes wrote:
               | Yes, I've experienced most of these with nginx and it can
               | be a minefield. The best experience I've had configuring
               | a webserver was lighttpd.
        
         | fullstop wrote:
         | I would also like to see benchmarks for reverse proxies with
         | TLS termination.
        
           | porker wrote:
           | h2o [1] was excellent when I tried it for TLS termination,
           | beating hitch in my unscientific tests. And it got http/2
           | priorities right. It's a shame they don't make regular
           | releases.
           | 
           | 1. https://github.com/h2o/h2o/
        
           | mholt wrote:
            | I think one reason a lot of benchmarks don't include TLS
            | termination is because it's often not representative of the
            | real world, where most clients reuse the connection and the
            | TLS session for many requests, making handshakes negligible
            | in the long run. And given hardware optimizations for
            | cryptographic functions combined with network round trips,
            | you end up benchmarking the network and the protocol more
            | than the actual implementation, which is often upstream from
            | the server itself anyway.
           | 
           | Go's TLS stack is set to be more efficient and safer in
           | coming versions thanks to continued work by Filippo and team.
        
             | nerdponx wrote:
             | Maybe it would be a useful benchmark to simulate a scenario
             | like "my site got posted on HN and now I'm getting a huge
             | number of unique page views."
        
               | CoolCold wrote:
                | Any idea how much traffic HN could send? I doubt it's
                | more than 100 rps or any otherwise noticeable load.
        
               | viraptor wrote:
               | Around 100k/day with lots of requests concentrated around
               | the start. Still mostly rpm rather than rps.
        
               | mholt wrote:
               | Sure, we've already done this very real test in
               | production a number of times and Caddy doesn't even skip
               | a beat. (IMO that's the best kind of benchmark right
               | there. No need to simulate with pretend traffic!)
        
           | capableweb wrote:
           | Yeah, this tends to be (in my cases) where response times
           | suffer the most, unless your bottleneck is I/O to/from the
           | backend or further away
        
         | gog wrote:
          | HAProxy does not serve static files (AFAIK), so for some
          | stacks you need to add nginx or caddy behind haproxy as well,
          | to serve static files and forward to a fastcgi backend.
        
           | tomohawk wrote:
           | nginx started out as a web server and over time gained
           | reverse proxy abilities.
           | 
           | haproxy started out as a proxy and has gained some web server
           | abilities, but is all about proxying.
           | 
            | haproxy has fewer surprises as a reverse proxy than nginx
            | does. Some of the defaults for nginx are appropriate for web
            | serving, but not proxying.
        
         | RcouF1uZ4gsC wrote:
         | > The HAProxy configuration is just as simple as Caddy for a
         | reverse proxy setup.
         | 
         | Does HAProxy have built in support for Let's Encrypt?
         | 
         | That is one of my favorite features. Caddy just automatically
         | manages the certificates for https.
        
           | gordian_NOT wrote:
           | It's not as turn-key as Caddy, that's for sure, but it's
           | there: https://www.haproxy.com/blog/lets-encrypt-acme2-for-
           | haproxy/
        
             | ei8ths wrote:
              | this is great, i'll implement this soon as my current cert
              | is about to expire and i've been wanting to get haproxy on
              | lets encrypt.
        
           | TimWolla wrote:
            | It does not, because HAProxy does not perform any disk access
            | at runtime and thus would be unable to persist the
            | certificates anywhere. Disk accesses can be unpredictably
            | slow and would block the entire thread, which is not
            | something you want when handling hundreds of thousands of
            | requests per second.
           | 
           | See this issue and especially the comment from Lukas Tribus:
           | https://github.com/haproxy/haproxy/issues/1864
           | 
           | Disclosure: Community contributor to HAProxy, I help maintain
           | HAProxy's issue tracker.
        
             | mholt wrote:
             | That issue has some good explanation, thanks. I wonder if a
             | disk-writing process could be spun out before dropping
             | privileges?
             | 
                | > Disk accesses can be unpredictably slow and would
                | block the entire thread, which is not something you want
                | when handling hundreds of thousands of requests per
                | second.
             | 
             | This is not something I see mentioned in the issue, but I
             | don't see why disk accesses need to block requests, or why
             | they have to occur in the same thread as requests?
        
               | TimWolla wrote:
               | When reading along: Keep in mind that I'm not a core
               | developer and thus are not directly involved in
               | development, design decisions, or roadmap. I have some
               | understanding of the internals and the associated
               | challenges based on my contributions and discussions on
               | the mailing list, but the following might not be entirely
               | correct.
               | 
               | > I wonder if a disk-writing process could be spun out
               | before dropping privileges?
               | 
                | I mean ... it sure can, and that appears to be the plan
                | based on the last comment in that issue. However, the
                | "no disk access" policy is also useful for security.
                | HAProxy can chroot itself to an empty directory to
                | reduce the blast radius, and that is done in the default
                | configuration on at least Debian.
               | 
               | > but I don't see why disk accesses need to block
               | requests
               | 
                | My understanding is that historically Linux disk IO was
                | inherently blocking. A non-blocking interface (io_uring)
                | only became available fairly recently:
                | https://stackoverflow.com/a/57451551/782822. And even
                | then it's an operating-system-specific interface. For
                | the BSDs you need a different solution.
               | 
                | If your process is blocked for even one millisecond when
                | handling two million requests per second
                | (https://www.haproxy.com/de/blog/haproxy-forwards-
                | over-2-mill...) then you drop 2k requests or increase
                | latency.
               | 
               | > or why they have to occur in the same thread as
               | requests?
               | 
               | "have" is a strong word, of course nothing "has" to be.
               | One thing to keep in mind is that HAProxy is 20 years old
               | and apart from possibly doing Let's Encrypt there was no
               | real need for it to have disk access. HAProxy is a
               | reverse proxy / load balancer, not a web server.
               | 
               | Inter-thread communication comes with its own set of
               | challenges and building something reliable for a narrow
               | use case is not necessarily worth it, because you likely
               | need to sacrifice something else.
               | 
               | As an example at scale you can't even let your operating
               | system schedule out one of the worker threads to schedule
               | in the "disk writer" thread, because that will
               | effectively result in a reduced processing capacity for
               | some fractions of a second which will result in dropped
               | requests or increased latency. This becomes even worse if
               | the worker holds an important lock.
        
           | fullstop wrote:
           | Built-in? Not exactly, but there is an acmev2 implemention
           | from haproxytech: https://github.com/haproxytech/haproxy-lua-
           | acme
        
           | abdusco wrote:
            | I use caddy mostly as a reverse proxy in front of an app.
            | It's just one line in the Caddyfile:
            | 
            |     sub.domain.com {
            |         # transparent proxy + websocket support
            |         # + letsencrypt TLS
            |         reverse_proxy 127.0.0.1:2345
            |     }
            | 
            | It's a breath of fresh air to have a server with sensible
            | defaults after dealing with apache and nginx (haproxy isn't
            | much better in that regard).
        
             | mholt wrote:
              | If that's your whole Caddyfile, might as well not even use
              | a config file:
              | 
              |     caddy reverse-proxy --from sub.domain.com --to :2345
             | 
             | Glad you like using Caddy!
        
               | bmurphy1976 wrote:
               | Personally I still recommend the config file. Even when
               | they are simple, it gives you one single source of truth
               | that you can refer to, it will grow as you need it, and
               | it can be stored in source control.
               | 
               | Where and how parameters are configured is a bit more of
               | a wild card and dependent on the environment you are
               | running in.
        
               | francislavoie wrote:
               | That's something Matt and I tend to disagree on - I agree
               | that a config file is better almost always because it
               | gives you a better starting point to experiment with
               | other features.
        
               | mholt wrote:
               | Hey, I mean, I do agree that a config file is "better"
               | most of the time -- but having the CLI is just so
               | awesome! :D
        
             | CoolCold wrote:
              | I still cannot make myself try Caddy... things like this
              | look sweet, but that's maybe 5% of the functionality [I
              | care about]. Not saying it's not possible, but with Nginx
              | I already know how to do CORS lists, OPTIONS handling, and
              | per-location & cookie-name caching. Issuing certs is
              | probably the simplest and last thing in a reverse proxy
              | config setup.
        
         | tylerjl wrote:
         | FWIW I'm a big fan of HAProxy as well, but I was just
         | constrained by the sheer volume of testing and how rigorous I
         | intended to be. Maybe once my testing is a little more
         | generalized I can fan out to additional proxies like HAProxy
         | without too much hassle, as I'd love to know as well.
        
           | tomohawk wrote:
           | Would love to see this
        
       | stefantalpalaru wrote:
       | > I'll build hosts with Terraform (because That's What We Use
       | These Days) in EC2
       | 
       | > [...]
       | 
       | > Create two EC2 instances - their default size is c5.xlarge
       | 
       | When you're benchmarking, you want a stable platform between
       | runs. Virtual private servers don't offer that, because the
       | host's resources are shared between multiple guests, in
       | unpredictable ways.
        
         | zmxz wrote:
         | Which platform would you suggest to use for this benchmark?
        
           | bdcravens wrote:
           | Ideally your own hardware with nothing else running on it.
            | For convenience you could use a VM, assuming they were set
            | up identically.
        
           | 0x457 wrote:
           | Well, AWS offers "metal" servers.
        
         | stevewatson301 wrote:
         | The c5 instances get dedicated cores, and thus should be exempt
         | from resource contention due to shared cores.
        
           | speedgoose wrote:
           | Do you get dedicated IOs on these too? AWS tends to throttle
           | heavily most instances after some time.
        
             | stevewatson301 wrote:
             | For dedicated disk IOPS you should take a look at the EBS
             | provisioned IO volumes, or perhaps use the ephemeral stores
             | that come with some of their more expensive instances.
        
         | tylerjl wrote:
         | This is hard because while, yes, some platform with less
         | potential for jitter and/or noisy neighbors would help
         | eliminate outside influence on the metrics, I think it's also
         | valuable to benchmark these in a scenario that I would assume
         | _most_ operators would run them in, which is a VPS situation
         | and not bare-metal. FWIW, I did try really hard to eliminate
         | some of the contamination in the results that would arise from
         | running in a VPS by doing things like using the _same_ host
         | reconfigured to avoid potential shifts in the underlying
         | hypervisor, etc.
         | 
          | But I would certainly agree that, for the most accurate
          | results, a bare-metal setup would probably be better than
          | what I have written.
        
       | tylerjl wrote:
       | Hey y'all, author here. Traffic/feedback/questions are coming in
       | hot, as HN publicity tends to engender, but I'm happy to answer
       | questions or discuss the findings generally here if you'd like
       | (I'm looking through the comments, too, but I'm more likely to
       | see it here).
        
         | jacooper wrote:
          | The black color for caddy in the charts is very hard to read
          | in dark mode. It would be great if you could change it to
          | another color.
        
       | Havoc wrote:
       | Close enough to not matter in most use cases. ie pick whatever is
       | convenient
        
         | 5d8767c68926 wrote:
          | When would it matter? I write in Python, so performance was
          | never a concern for me, but I am curious about the scenarios
          | in which this is likely to be the weakest link in real
          | workloads.
         | 
         | Given available options, I will take the network software
         | written in a memory safe language every time.
        
         | zivkovicp wrote:
         | This is almost always the case, no matter the service we're
         | talking about.
        
           | no_time wrote:
           | Is this how we ended up with electron for desktop
           | applications and Java for backend?
        
             | philipwhiuk wrote:
              | Yes, because developers are expensive and so developer
              | productivity dominates almost everything else.
        
           | eddieroger wrote:
           | Also, "pick what you know" applies here, too. If you know
           | NGINX, then all you get from switching to Caddy is
           | experience, and likewise, vice versa.
        
             | mholt wrote:
             | *and memory safety*
             | 
              | This cannot be overstated. Caddy is not written in C! And
             | it can even run your NGINX configs. :)
             | https://github.com/caddyserver/nginx-adapter
        
               | excitom wrote:
               | A solution in search of a problem.
        
               | 5d8767c68926 wrote:
               | Nginx security page [0] lists a non-zero amount of
               | exploitable issues rooted in manual memory management.
               | 
               | [0] https://nginx.org/en/security_advisories.html
        
         | shabbatt wrote:
         | This is the answer I was looking for but sadly, this type of
         | insignificance becomes ammunition for managers/founders who are
         | obsessed with novelty
        
       | anonymouse008 wrote:
       | > Wew lad! Now we're cooking with gas.
       | 
       | This is the new gold standard for benchmarks!
       | 
       | OP / Author, stupendously tremendous job. The methodology is
       | defensible and sensible. Thank you for doing this on behalf of
       | the community.
        
         | tylerjl wrote:
         | That's very kind of you to say, thank you!
        
         | mholt wrote:
         | Yeah, Tyler did an amazing job.
        
         | lelandfe wrote:
         | Seconding!
         | 
         | I am also in love with the friendliness and tone of the
         | article. I'm a complete dummy when it comes to stuff like this
         | and still understood most of it. Feynman would be proud.
        
       | cies wrote:
       | I like Caddy, but on prod I do not need SSL (off-loaded by the
       | LB), so I stick to nginx after reading this.
       | 
       | Guess I'm waiting for Cloudflare to FLOSS-release their proxy
       | https://news.ycombinator.com/item?id=32864119 :)
        
       | mholt wrote:
       | This is a great writeup overall. I was happy to see Tyler's
       | initial outreach before conducting his tests [0]. However, please
       | note that these tests are also being revised shortly after some
       | brief feedback [1]:
       | 
       | - The sendfile tests at the end actually didn't use sendfile, so
       | expect much greater performance there.
       | 
       | - All the Caddy tests had metrics enabled, which are known[2] to
       | be quite slow currently. Nginx does not emit metrics in its
       | configuration, so in that sense the tests are a bit uneven. From
       | my own tests, when I remove metrics code, Caddy is 10-20% faster.
       | (We're working on addressing that [3].)
       | 
        | - The tests in this article did not tune reverse proxy buffers,
        | which are 4KB by default. I was able to see moderate performance
        | improvements (depending on the size of payload) by reducing the
        | buffer size to 1KB and 2KB (see the sketch below).
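        | 
        | If I remember the Caddyfile options right, that tuning looks
        | roughly like this (sizes illustrative):
        | 
        |     reverse_proxy 127.0.0.1:8080 {
        |         transport http {
        |             read_buffer 2KB
        |             write_buffer 2KB
        |         }
        |     }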
       | 
       | I want to thank Tyler for his considerate and careful approach,
       | and for all the effort put into this!
       | 
       | [0]: https://caddy.community/t/seeking-performance-suggestions-
       | fo...
       | 
       | [1]: https://twitter.com/mholt6/status/1570442275339239424
       | (thread)
       | 
       | [2]: https://github.com/caddyserver/caddy/issues/4644
       | 
       | [3]: https://github.com/caddyserver/caddy/pull/5042
        
         | tylerjl wrote:
         | Thanks, Matt! I've pushed the revised section measuring
         | sendfile and metrics changes, so those should be accurate now.
         | 
         | Phew. Caches are purged, my errors are fixed. I can rest
         | finally. If folks have questions about anything, I'm happy to
         | answer.
        
           | tomcam wrote:
           | Just want to say your writing is the best quirky balance of
           | fun and substance, reminiscent of Corey Quinn [1]. Thanks for
            | doing so damn much work and for the instantly relatable
           | phrase "Nobody is allowed to make mistakes on the Internet".
           | 
           | [1] https://www.lastweekinaws.com
        
             | tylerjl wrote:
             | Thank you, that's very kind! There's a reason I included
             | Corey's name in my hastily cobbled-together skeleton meme
             | [1]. Hopefully my writing achieves that level of technical
             | approachability.
             | 
             | [1]: https://blog.tjll.net/reverse-proxy-hot-dog-eating-
             | contest-c...
        
               | tomcam wrote:
               | How did I miss that. Anyway, you succeeded.
        
             | QuinnyPig wrote:
             | That's very kind of you to say. Do I get to put this on my
             | resume?
        
               | tomcam wrote:
               | THE MAN HIMSELF
               | 
               | I can die happy
        
         | fariszr wrote:
         | > - All the tests had metrics enabled, which are known[1] to be
         | quite slow. From my own tests, when I remove metrics code,
         | Caddy is 10-20% faster.
         | 
          | But disabling metrics is not supported in standard Caddy; you
          | need to remove specific code and recompile to disable it.
          | 
          | So maybe benchmarking with it removed isn't fair to Nginx?
        
           | tialaramex wrote:
            | Yeah, I think fair comparisons are:
           | 
           | * How do these things perform by default. This is how they're
           | going to perform for many users, because if it's adequate
           | nobody will tune them, why bother.
           | 
            | * How do these things perform with performance configuration
            | as often recommended online. This is how they'll perform for
            | people who think they need performance but don't tune or
            | don't know how to tune. This _might be worse than default_,
            | but that's actually useful information.
           | 
           | * How do these things perform when their authors get to tune
           | them for our test workload. This is how they'll perform for
           | users who squeeze every drop and can afford to get somebody
           | to do real work to facilitate, possibly even hiring the same
           | authors to do it.
           | 
           | In some cases I would also really want to see:
           | 
            | * How do these things perform with _recommended security_. A
            | benchmark mode with great scores but lousy security can
            | promote a race to the bottom where everybody ships insecure
            | garbage by default, then has a mode which is never measured
            | and has lousy performance yet is mandatory if you don't
            | think Hunter2 is a great password.
        
             | mholt wrote:
             | > How do these things perform by default.
             | 
             | Agreed on this one -- today I'm looking at how to disable
             | metrics by default and make them opt-in. At least until the
             | performance regression can be addressed.
             | 
             | Update: PR opened:
             | https://github.com/caddyserver/caddy/pull/5042 - hoping to
             | land that before 2.6.
        
               | dQw4w9WgXcQ wrote:
               | Good stuff dude. Listens to users, sees a problem,
               | doesn't take it personally, makes a fix. Caddy's going
               | places.
        
               | mholt wrote:
               | I'm also grateful that Dave, the original contributor of
               | the metrics feature, isn't taking it personally. We love
               | the functionality! Just gotta refine it...
        
           | mholt wrote:
            | > But disabling metrics is not supported in standard Caddy;
            | you need to remove specific code and recompile to disable it.
           | 
           | We're addressing that quite soon. Unfortunately the original
           | contributor of the feature has been too busy lately to work
           | on it, so we might just have to go the simple route and make
           | it opt-in instead. Expect to see a way to toggle metrics
           | soon!
           | 
           | Update: PR opened:
           | https://github.com/caddyserver/caddy/pull/5042
        
             | Bilal_io wrote:
             | That was fast. I love it!
        
             | philipwhiuk wrote:
              | There's always going to be some cost to metrics; going
             | forward you probably just want to document it and then
             | update the figure as you tune it. Higher performance opt-in
             | metrics are the sort of thing a company using it at scale
             | ought to be able to help with/sponsor work on.
        
               | mholt wrote:
               | Absolutely. The plan is to make it opt-in for now, and
               | then having a company sponsor the performance tuning
               | would be very welcomed. Otherwise it'll probably sit
               | until someone with the right skills/know-how and time
               | comes along.
        
         | hinkley wrote:
         | > All the Caddy tests had metrics enabled
         | 
         | One of the great mysteries in (my) life is why people think
         | that measuring things is free. It always slows things down a
         | bit and the more precisely you try to measure speed, the slower
         | things go.
         | 
         | I just finished reducing the telemetry overhead for our app by
         | a bit more than half, by cleaning up data handling. Now it's
         | ~5% of response time instead of 10%. I could probably halve
         | that again if I could sort out some stupidity in the
         | configuration logic, but that still leaves around 2-3% for
         | intrinsic complexity instead of accidental.
        
       | asb wrote:
       | I wrote up a few notes on my Caddy setup here
       | https://muxup.com/2022q3/muxup-implementation-notes#serving-...
        | which may be a useful reference if you have a static site and
        | want to tick off a few items likely on your list (brotli,
        | http3, cache-control, more fine-grained control on redirects).
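        | 
        | For anyone skimming, the gist of it in a Caddyfile is only a
        | few lines (a sketch with a placeholder domain and paths, not my
        | exact config; brotli needs a plugin build):
        | 
        |     example.com {
        |         root * /srv/site
        |         file_server
        |         encode zstd gzip
        |         @static path *.css *.js *.woff2
        |         header @static Cache-Control "max-age=31536000, immutable"
        |     }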
       | 
       | I don't think performance is ever going to matter for my use
       | case, but one thing I think is worth highlighting is the quality
       | of the community and maintainership. In a thread I started asking
       | for feedback on my Caddyfile
       | (https://caddy.community/t/suggestions-for-simplifying-my-
       | cad...), mholt determined I'd found a bug and rapidly fixed it. I
       | followed up with a PR
       | (https://github.com/caddyserver/website/pull/264) for the docs to
       | clarify something related to this bug which was reviewed and
       | merged within 30 minutes.
        
         | mholt wrote:
         | Thanks for the comments and participation!
         | 
         | I'm still thinking about that `./`-pattern-matching problem.
         | Will probably have to be addressed after 2.6...
        
         | fariszr wrote:
         | A very helpful post, thanks for sharing it!
        
       | jiripospisil wrote:
       | I'm impressed with Caddy's performance. I was expecting it to
       | fall behind mainly due to the fact it's written in Go but
       | apparently not. It's a bit disappointing that it's slower in
       | reverse proxying, as that's one of the most important use cases,
       | but now that it's identified maybe they can make some
       | improvements. Finally, there really should be a max memory / max
       | connections setting (maybe there is?).
        
         | pjmlp wrote:
          | I am not a big fan of Go's design, however that is exactly one
          | reason I tend to argue for it.
          | 
          | There is enough juice in compiled managed languages that
          | expose value types and low-level features; it is a matter of
          | learning how to use the tools in the toolbox instead of always
          | reaching for the hammer.
        
         | zekica wrote:
         | Goroutines are efficient enough, and Go compiles to native
         | code. I'm sure that Rust/Tokio or handcrafted C can be faster,
         | but I think Go is fast enough for 99% of use cases.
         | 
         | I'm building a service manager a la systemd in Go as a side
         | project, and I really like it - it's not as low level as Rust
         | and has a huge runtime but it is impressively fast.
        
         | shabbatt wrote:
         | The only reason for me to consider Caddy was reverse proxy. Now
         | that reason is gone and I'm happy with nginx
        
       | teknopaul wrote:
        | worker_connections 1024;
       | 
       | Hello?
       | 
       | http://xtomp.tp23.org/book/100kcc.html
       | 
       | Try worker_connections 1000000;
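        | 
        | (A sketch of the relevant directives - you also have to raise
        | the file-descriptor limit, or nginx will cap you long before a
        | million connections:
        | 
        |     # main context
        |     worker_rlimit_nofile 1048576;
        | 
        |     events {
        |         worker_connections 1000000;
        |     }
        | 
        | The numbers here are illustrative, not a recommendation.)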
        
       | mordornginx wrote:
        | The people still liking nginx are the ones making money on it.
        | 
        | nginx is awful to use and only makes it easy to accidentally
        | shoot yourself in the foot.
        
       | samcrawford wrote:
       | Great write-up! One question I had was around the use of
       | keepalives. There's no mention in the article of whether
       | keepalives were used between the client and reverse proxy, and no
       | mention of whether it was used between the reverse proxy and
       | backend.
       | 
        | I know Nginx doesn't use keepalives to backends by default (and
        | I see it wasn't set up in the optimised Nginx proxy config), but
        | it looks like Caddy does have keepalives enabled by default.
       | 
       | Perhaps that could explain the delta in failure rates, at least
       | for one case?
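        | 
        | (For reference, enabling backend keepalives in nginx takes
        | roughly this shape - a sketch, not the article's config:
        | 
        |     upstream backend {
        |         server 127.0.0.1:8080;
        |         keepalive 32;  # idle connections kept per worker
        |     }
        | 
        |     server {
        |         location / {
        |             proxy_http_version 1.1;
        |             # clear Connection so "close" isn't forwarded
        |             proxy_set_header Connection "";
        |             proxy_pass http://backend;
        |         }
        |     }
        | 
        | Without the last two directives the keepalive pool is never
        | used.)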
        
         | mholt wrote:
         | Are you talking about HTTP keepalive or TCP keepalive?
         | 
         | Keepalives can actually reduce the performance of a server with
         | many concurrent clients (i.e. a benchmark test), and have other
         | weird effects on benchmarks: https://www.nginx.com/blog/http-
         | keepalives-and-web-performan...
        
           | teknopaul wrote:
            | Same thing. HTTP has no keep-alive feature; you don't send
            | HTTP keep-alive requests. If HTTP/1.1 asks for keepalives,
            | it's a TCP thing.
        
             | mholt wrote:
             | They are distinct in Go. The standard library uses "HTTP
             | keep-alive" to mark connections as idle based on most
             | recent HTTP request, whereas TCP keep-alive checks only
             | ACKs.
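              | 
              | A short Go sketch of the two knobs (standard library;
              | imports and error handling elided, values arbitrary):
              | 
              |     // HTTP keep-alive: how long an idle HTTP connection
              |     // may wait for its next request before being closed.
              |     srv := &http.Server{
              |         Addr:        ":8080",
              |         IdleTimeout: 2 * time.Minute,
              |     }
              |     go srv.ListenAndServe()
              | 
              |     // TCP keep-alive: kernel-level probes on the socket,
              |     // independent of any HTTP requests.
              |     conn, _ := net.Dial("tcp", "example.com:80")
              |     tcp := conn.(*net.TCPConn)
              |     tcp.SetKeepAlive(true)
              |     tcp.SetKeepAlivePeriod(30 * time.Second)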
        
       | davidjfelix wrote:
        | FYI to author (who is in the comments): you may want to prevent
        | the graphs from allowing scroll-to-zoom. I was scrolling on the
        | page and the graphs were zooming in and out.
        
       | bagels wrote:
       | I think I don't get the joke. What does the X-Hotdogs header do?
        
         | Arnavion wrote:
         | The header does nothing. As the article says, the author sent
         | the header in every request, made a total of 35M requests, and
         | thus gained a reason to use 35M hot dogs in the article title.
        
           | tylerjl wrote:
           | Correct. Maybe it blows my credibility out of the water and
           | I'll be shamed for life, who knows
        
       | fullstop wrote:
       | The author's writing style reminds me of Andy Weir's Project Hail
       | Mary or The Martian.
        
       | pdhborges wrote:
       | The author linked to wrk2 but I think he ended up using a k6
       | executor that exhibits the problem wrk2 was designed to solve.
        
         | tylerjl wrote:
         | Damn. This is probably worth swapping out k6 for if I manage to
         | pull off a second set of benchmarks. Thanks for the heads-up.
        
         | hassy wrote:
         | Yep, k6 suffers from coordinated omission [1] with its default
         | settings.
         | 
         | A tool that can send a request at a constant rate i.e. wrk2 or
         | Vegeta [2] is a much better fit for this type of a performance
         | test.
         | 
         | 1. https://www.scylladb.com/2021/04/22/on-coordinated-omission/
         | 
         | 2. https://github.com/tsenart/vegeta
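          | 
          | e.g. a constant-rate Vegeta run looks like this (the target
          | URL and numbers are placeholders):
          | 
          |     echo "GET http://localhost:8080/" | \
          |         vegeta attack -rate=1000 -duration=60s | \
          |         vegeta report
          | 
          | The tool keeps sending at 1000 req/s no matter how slowly the
          | server responds, so slow responses show up as latency instead
          | of silently reducing the request rate.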
        
           | imiric wrote:
           | With its default settings, yes, but k6 can be configured to
           | use an executor that implements the open model[1].
           | 
           | See more discussion here[2].
           | 
           | [1]: https://k6.io/docs/using-k6/scenarios/arrival-rate/
           | 
           | [2]: https://community.k6.io/t/is-k6-safe-from-the-
           | coordinated-om...
        
       | nickjj wrote:
       | I'm surprised no benchmarks were done with logging turned on.
       | 
       | I get wanting to isolate things but this is the problem with
       | micro benchmarks, it doesn't test "real world" usage patterns.
       | Chances are your real production server will be logging to at
       | least syslog so logging performance is worth looking into.
       | 
        | If one of them can write logs with 500 microseconds added to
        | each request but the other adds 5 milliseconds, that could be a
        | huge difference in the end.
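        | 
        | (For nginx, buffered logging would be one obvious knob to test
        | - a sketch, with illustrative values:
        | 
        |     access_log /var/log/nginx/access.log combined
        |                buffer=64k flush=5s;
        | 
        | versus the default unbuffered write per request.)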
        
         | tylerjl wrote:
         | This is - along with some reverse proxy settings tweaks - one
         | of the variables I'd be keen to test in the future, since it's
          | probably _the_ most common delta between my tests and real-
          | world applications.
        
         | mholt wrote:
          | Caddy's logger (uber/zap) is zero-allocation. We've found that
          | the _writing_ of the logs is often much slower, e.g. printing
          | to the terminal or writing to a file. And that's a system
          | problem more than a Caddy one. But the actual emission of logs
          | is quite fast, last time I checked!
        
           | nickjj wrote:
           | I think your statement is exactly why logging should have
           | been turned on, at least for 1 of the benchmarks. If it's a
           | system problem then it's a problem that both tools need to
           | deal with.
           | 
            | If one of them can do 100,000 requests per second and the
            | other can do 80,000, but both are capped at 30,000 requests
            | per second because of system-level limitations, then you
            | could make a strong case that both products perform equally
            | in the end.
        
       | metaltyphoon wrote:
       | I wonder how this compares to YARP.
        
       | kijin wrote:
       | If you tell nginx to limit itself to 4 workers x 1024 connections
       | per worker = 4096 connections, and hurl 10k connections at it, of
       | course it's going to throw errors. It's doing exactly what you
       | told it to do.
       | 
        | That's just one example of how OP's "optimized" nginx config is
        | barely even optimized. There are lots of other variables you can
        | tweak to get even better performance and blow Caddy out of the
        | water, but those tweaks are going to depend on the specific
        | workload you expect to handle. There isn't a single, perfectly
        | optimized set of values that's going to work for everyone.
       | 
       | The beauty of Caddy is that you get most of that performance
       | without having to tweak anything.
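        | 
        | (To spell out the arithmetic from the first paragraph - the
        | connection ceiling is the product of these two directives, so
        | this sketch allows 4 x 10240 = 40960 connections:
        | 
        |     worker_processes 4;
        | 
        |     events {
        |         worker_connections 10240;
        |     }
        | 
        | Values are illustrative only.)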
        
         | teknopaul wrote:
          | Nginx scales to 1,000,000 connections per VM in my tests, but
          | bandwidth is silly.
          | 
          | I got those results by seriously limiting the junk in HTTP
          | headers. Not with real browsers.
         | 
         | If you have that demand for any commercial service, you have
         | money to distribute your load globally across more than one
         | nginx instance.
        
       ___________________________________________________________________
       (page generated 2022-09-16 23:00 UTC)