[HN Gopher] Load Testing: An Unorthodox Guide
       ___________________________________________________________________
        
       Load Testing: An Unorthodox Guide
        
       Author : fagnerbrack
       Score  : 82 points
       Date   : 2022-09-24 07:54 UTC (2 days ago)
        
 (HTM) web link (www.marcobehler.com)
 (TXT) w3m dump (www.marcobehler.com)
        
       | kqr wrote:
       | Very good article, I think! Reveals many of the subtleties that
       | are easy to get wrong.
       | 
       | ----
       | 
       | > A common shortcut is to generate the load on the same machine
       | (i.e. the developer's laptop), that the server is running on.
       | What's problematic about that? Generating load needs
       | CPU/Memory/Network Traffic/IO and that will naturally skew your
       | test results, as to what capacity your server can handle
       | requests.
       | 
        | This fear is overplayed, I think. A lot of production software
        | is slow enough that load generation will not be a significant
        | CPU/memory/network load. By worrying about this, people miss
        | other things they should worry about much more.
       | 
       | ----
       | 
       | > Run the actual load test for 1-5 minutes. (great numbers, huh?)
       | 
       | This is another common problem: for many garbage collected
       | platforms (and platforms with other types of occasional delayed
       | work), 1-5 minutes is not enough to trigger the pathological
       | behaviour you want to test for. Some systems need to be loaded
       | for 4-6 hours or more to see the full spectrum of behaviour.
        
         | drewcoo wrote:
         | > This fear is overplayed, I think.
         | 
          | It can be difficult to separate the loader's effects on the
          | system from the server's. I think if the warning is
          | overplayed, it's because it's trying to convince people not
          | to make things harder for themselves.
          | 
          | Even doing them both on the same box is valuable, though.
          | Functional tests aren't going to catch concurrency issues,
          | but load testing, however you do it, just might.
        
       | sparsely wrote:
       | It's very easy to misinterpret the results of load tests. A
       | common mistake is to run the wrong sort of load test (explained
       | here[1]) and have your loaders automatically back off when
       | latency starts increasing. As the author of the OP touches on in
       | a different context, you're now running a pseudo-limit test
       | instead!
       | 
       | I like the author's emphasis on only proceeding if the server's
       | metrics are clearly within normal ranges.
       | 
       | [1]
       | https://gatling.io/docs/gatling/reference/current/core/injec...
        
       | fizwhiz wrote:
       | > It's also important that the loader generates those requests at
       | a constant rate, best done asynchronously, so that response
       | processing doesn't get in the way of sending out new requests.
       | 
       | Coordinated Omission is subtle but deserves more color:
       | https://groups.google.com/g/mechanical-sympathy/c/icNZJejUHf...
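        | 
        | A rough open-model sketch of that idea in Python (asyncio +
        | aiohttp; the URL, rate, and duration are placeholders):
        | requests go out on a fixed timetable, and latency is measured
        | from each request's _intended_ send time, which is the usual
        | correction for coordinated omission.
        | 
        |   import asyncio
        |   import time
        | 
        |   import aiohttp  # assumed to be installed
        | 
        |   TARGET_RPS = 50
        |   DURATION_S = 60
        |   URL = "http://localhost:8080/ping"  # hypothetical
        | 
        |   async def fire(session, scheduled, latencies):
        |       async with session.get(URL) as resp:
        |           await resp.read()
        |       # latency from the intended send time, so loader
        |       # queueing delay isn't silently dropped
        |       latencies.append(time.perf_counter() - scheduled)
        | 
        |   async def main():
        |       latencies = []
        |       async with aiohttp.ClientSession() as session:
        |           start = time.perf_counter()
        |           tasks = []
        |           for i in range(TARGET_RPS * DURATION_S):
        |               scheduled = start + i / TARGET_RPS
        |               delay = scheduled - time.perf_counter()
        |               await asyncio.sleep(max(0.0, delay))
        |               tasks.append(asyncio.create_task(
        |                   fire(session, scheduled, latencies)))
        |           await asyncio.gather(*tasks)
        |       latencies.sort()
        |       p99 = latencies[int(0.99 * len(latencies))]
        |       print(f"p99: {p99 * 1000:.1f} ms")
        | 
        |   asyncio.run(main())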
        
       | [deleted]
        
       | mikessoft_gmail wrote:
        
       | aymar_99 wrote:
        | https://gatling.io/ Gatling is one of the load testing tools
        | people should give a shot. It uses Akka to generate
        | asynchronous load. And it's good from a programmer's
        | standpoint, since load test scripts can be programmed and
        | maintained like any other code.
        
       | AzzieElbab wrote:
       | Good article. One thing I would add is the necessity to include
       | failures/unhappy path scenarios in your load scripts. See if all
       | that logging/tracing really is free :)
       | 
        | Slightly off topic: do people still run load/performance tests
        | against cloud-hosted sites? It seems like spiking your AWS load
        | is becoming prohibitively expensive. Also, in the real world,
        | replacing third-party services with some intelligent mocks
        | (simulating failures and latency) can be super complex.
        
       | arein3 wrote:
        | Has anyone used headless Chrome with Puppeteer for load tests?
        
         | dev_throw wrote:
          | None of the utility libraries helped with this; instead I
          | wrote a shell script to orchestrate concurrent Puppeteer
          | load tests with Jest and to collect and parse the reports.
          | It is very resource intensive and spiky. I'm looking at how
          | we can make it more efficient.
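          | 
          | For what it's worth, a rough sketch of that kind of
          | orchestration in Python rather than shell (the worker
          | script name and its JSON output are hypothetical, not the
          | actual setup described above):
          | 
          |   import asyncio
          |   import json
          | 
          |   CONCURRENCY = 20  # parallel browser workers
          | 
          |   async def run_worker(i):
          |       # each worker is a Node/Puppeteer script that is
          |       # assumed to print a JSON report on stdout
          |       proc = await asyncio.create_subprocess_exec(
          |           "node", "worker.js", f"--run-id={i}",
          |           stdout=asyncio.subprocess.PIPE)
          |       out, _ = await proc.communicate()
          |       return json.loads(out)
          | 
          |   async def main():
          |       reports = await asyncio.gather(
          |           *(run_worker(i) for i in range(CONCURRENCY)))
          |       durations = [r["durationMs"] for r in reports]
          |       print("avg ms:", sum(durations) / len(durations))
          | 
          |   asyncio.run(main())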
        
       | vinay_ys wrote:
       | Building systems that can scale _efficiently_ requires careful
       | design and implementation. I have seen teams relax the efficiency
       | requirement and manage to handle high load by pre-scaling-up the
       | provisioned capacity of their systems many hours in advance of an
       | anticipated traffic load. This is obviously very wasteful of
        | resources, but it also makes your system brittle and
        | non-resilient. An unanticipated spike in load that stresses
        | the system beyond its provisioned capacity will cause it to
        | overload and crash or jam up.
       | 
        | So, rather than testing whether your system can handle high
        | load once it has been pre-scaled, it is better to test whether
        | it can continue to provide good throughput even when under
        | heavy load. Then, if your system is designed to auto-scale in
        | an elastic cloud, an even better strategy is to test whether
        | fast auto-scaling dynamically increases the good throughput.
       | 
       | Load testing should be about ascertaining the resiliency,
       | scalability and efficiency characteristics of your system
       | architecture and deployment setup.
       | 
       | Some definitions:
       | 
        | 1. A system is said to be resilient if it can sustain its good
        | throughput even under load in excess of the available system
        | resources.
        | 
        | 2. A system is said to be scalable if it can respond quickly
        | to increasing load by scaling up its provisioned resources,
        | and with them its good throughput.
        | 
        | 3. Good throughput is the throughput (requests/sec, concurrent
        | user sessions, etc.) of useful work done with acceptable
        | performance characteristics - i.e., the latency curve at
        | different percentiles is acceptable.
       | 
        | 4. Efficiency of scalability is the ratio of the increase in
        | good throughput to the increase in provisioned resources (see
        | the sketch below).
       | 
       | 5. Efficiency in general is the concurrent user sessions or
       | requests/sec per node of the provisioned system.
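        | 
        | To make definition 4 concrete with invented numbers: if
        | doubling the node count only takes good throughput from 10k
        | to 16k req/s, the scaling efficiency is 0.6.
        | 
        |   def scaling_efficiency(rps_before, rps_after,
        |                          nodes_before, nodes_after):
        |       # ratio of the relative gain in good throughput
        |       # to the relative gain in provisioned resources
        |       rps_gain = (rps_after - rps_before) / rps_before
        |       node_gain = (nodes_after - nodes_before) / nodes_before
        |       return rps_gain / node_gain
        | 
        |   print(scaling_efficiency(10_000, 16_000, 10, 20))  # 0.6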
        
       | MikeYasnev007 wrote:
       | LA is marketing cinema and drive) Load testing is more for
       | Arizona server farms
        
       | i_like_apis wrote:
       | I used to do this a lot.
       | 
       | It may look a little clunky, but there is a piece of software
       | called JMeter (https://jmeter.apache.org/) that is capable of
       | doing just about anything you would want to do in a load test -
       | modeling any request behavior, distributed traffic, awesome
       | reports, etc.
       | 
       | 10/10 recommend.
        
         | nerdponx wrote:
         | My problem with JMeter (and load testing in general) is that
         | all the recommended tools are extremely powerful if you know
         | what you're doing, but provide little guidance if you aren't
         | already an expert. I would love an old school "wizard" app that
         | generates a sensible JMeter config for me.
         | 
         | I've just been using the Apache Benchmark "ab" tool with
         | mostly-default settings for now.
        
           | yabones wrote:
            | JMeter does have a proxy mode that records your traffic and
           | saves it to a working config. From there you can tweak
           | timings, add variables, loops, etc. It's a decent starting
           | place if you have no experience with building load testing
           | plans.
           | 
           | https://jmeter.apache.org/usermanual/jmeter_proxy_step_by_st.
           | ..
        
           | i_like_apis wrote:
           | A wizard is an interesting idea.
           | 
            | As for learning load testing, it's medium-easy in terms of
            | the effort and understanding required; it takes a bit of
            | work to know what you're doing. A bit of scientific common
            | sense and maybe some knowledge of statistics also helps.
           | 
           | I think any of these are probably good.
           | https://www.amazon.com/s?k=jmeter
           | 
           | I read a Packt or Apress one that had more than enough on how
           | to use it. I don't remember which one though.
        
         | drewcoo wrote:
         | JMeter is old and crusty and not at all friendly to work with.
         | But I used it for years because it was really about the best we
         | had. Today I don't wish it on anyone.
         | 
         | Ruby JMeter finally made JMeter easier to manage, but I haven't
         | worked in a Ruby shop for years, and I'm not going to force
         | everyone to learn Ruby just to do some load testing.
         | 
         | https://github.com/flood-io/ruby-jmeter
         | 
         | Then along came k6. It's developer-friendly and I've seen
         | people actually enjoy using it. I recommend anyone considering
         | JMeter also take a look at k6. They do a better job of selling
         | it than I do:
         | 
         | https://k6.io
         | 
         | I am also Gatling-curious. Seems like an option for anyone in
         | the JVM ecosystem.
         | 
         | https://gatling.io
        
           | i_like_apis wrote:
           | Disagree. Old? I guess, but that doesn't mean bad ... it has
           | recent feature development and is currently maintained.
           | 
            | "Crusty" is not an adjective I would use; that seems like
            | more ageism. The GUI isn't amazing, but it's very powerful.
            | It will do whatever you want.
           | 
           | It's quite flexible and easy to define what I need it to do,
           | to store and load configuration, and it has very meaningful
           | reports. All of these things are pretty easy.
           | 
            | A one-size-fits-all paid service could give you 3-click
           | interactions and glossy charts, but at the cost of control
           | and understanding what's really going on, and you're more
           | likely to have wrong assumptions about what's happening.
           | 
           | Summary: load testing is just complicated enough that it
           | requires effort, thought, and a good toolset.
           | 
            | Also, k6 may be good or whatever, but it is a paid
            | service. JMeter is free.
        
           | brabel wrote:
           | Looks like K6 is basically Gatling but with a JS-based DSL
           | while Gatling uses Scala (or anything JVM-based like
           | Java/Groovy/Kotlin)... there's also Locust[1] which is
           | Python-based.
           | 
           | What I really wanted was an interactive tool, like old LoadUI
           | used to do (Wikipedia, nicely, still has the old screenshots
           | which show how cool it used to look:
           | https://en.wikipedia.org/wiki/LoadUI)... because until you
           | run tests, you just don't know what kind of load you can
           | throw at the server (and whether you're hitting
            | CPU/Memory/Bandwidth limits instead of server limits).
            | Visual components were written in a Groovy-based scripting
            | DSL and dynamically loaded, so you could change the code
            | even while the test ran... really awesome stuff, a bit
            | like coding in Lisp with a visual facade on top.
           | 
           | Based on existing tools, it should be relatively easy to
           | build something like that and I am surprised there seems to
           | be nothing of the kind. I've always wanted to do it myself,
           | maybe one day if no one else finally tackles the problem.
           | 
           | [1] https://docs.locust.io/en/stable/
        
         | tbrownaw wrote:
         | A while back, I had to get scaling numbers for something that
         | did multipage forms where individual pages _might_ refresh
         | themselves from the server depending on what dropdowns you
         | picked, _might_ pull option lists if certain questions were
         | visible, etc.
         | 
         | I ended up using Selenium on a giant pile of cloud VMs, because
         | the $$$$/run looked cheaper than my wild guess about the dev
         | time for a more efficient tool.
         | 
         | Do you know of any guides or general overviews for using JMeter
         | (Or Gatling or such) for this, or maybe for determining the
         | appropriate degree to categorize and randomize sessions (some
         | of those option listings were pretty expensive, and it wouldn't
         | have been good to just skip them)? From my brief look at the
         | time, it seemed that it'd be a major pain and I'd have to
         | figure out any script logic on my own without a whole lot in
         | the way of pre-written guides.
        
           | i_like_apis wrote:
            | Ah yes, Selenium is definitely too much overhead for load
            | testing. You will get much better cost, performance, and
            | control with JMeter. There's also much less to develop
            | once you know how to use it. You can also play back
            | recorded sessions and parameterize variables if you want
            | to design traffic in the browser.
           | 
           | There are a handful of great technical manuals in the way of
           | books about JMeter. Search Amazon for jmeter and you will
           | find some current ones. I read one and it was really helpful,
           | just can't remember which. They all look pretty good though.
        
       | cvwright wrote:
       | Agreed with a lot of the points here, like starting small with a
       | single piece of your API, then slowly expanding your tests once
       | you're comfortable that you know what you're doing.
       | 
       | Note that if you use the Locust framework to write your load
       | tests in Python, it takes care of measuring and reporting the
       | latency and throughput for you. It's really nice.
       | 
       | https://locust.io/
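        | 
        | A minimal sketch of what that looks like (endpoint path and
        | numbers are placeholders):
        | 
        |   from locust import HttpUser, task, between
        | 
        |   class ApiUser(HttpUser):
        |       wait_time = between(1, 3)  # think time per user
        | 
        |       @task
        |       def get_item(self):
        |           # "name" groups URLs so stats aren't split per id
        |           self.client.get("/api/items/42",
        |                           name="/api/items/[id]")
        | 
        |   # run headless, e.g.:
        |   #   locust -f locustfile.py --headless -u 200 -r 20 \
        |   #          -t 5m --host http://localhost:8080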
        
         | sparsely wrote:
         | This seems to encourage you to run it with a target number of
         | concurrent users, rather than a target RPS from the loaders,
         | which IME can result in difficulty interpreting the results (it
         | also doesn't reflect most use cases, except in some heavily
         | controlled scenarios). When latencies increase, the fixed
         | number of virtual users will necessarily be sending fewer new
          | requests, leading to constant or decreasing RPS actually
          | being served.
         | 
         | Other load testing tools (Gatling, newer versions of k6) will
         | let you set a target RPS instead.
        
           | ericb wrote:
           | Targeting RPS without also targeting concurrent users is a
           | huge mistake!
           | 
           | Here's a metaphor I like.
           | 
           | Imagine I am trying to open a small high-volume restaurant. I
           | call the chef in for a trial run. I sit at a table, and while
           | I watch, she is able to make 200 dishes an hour. I'm excited,
           | at 200 dishes an hour, we'll be rich! I tell the investors.
           | 
           | Opening day comes!
           | 
           | I have 10 tables and people eat for an hour on average. We
           | serve 10 dishes an hour.
           | 
           | Our restaurant fails.
           | 
           | Throughput, without targeting concurrent users, is a bad
           | test.
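              | 
              | The arithmetic behind this is Little's Law: maximum
              | throughput is capped at concurrency divided by the
              | time each request takes.
              | 
              |   tables = 10           # concurrent users
              |   hours_per_meal = 1.0  # latency per request
              |   # max throughput, however fast the chef is:
              |   print(tables / hours_per_meal)  # 10.0 dishes/hour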
        
             | bombcar wrote:
             | Part of it is identifying the bottleneck. You verified the
             | chef wouldn't be it, but the tables ended up being the
             | limiting factor.
             | 
             | And say you installed 200 tables, but only get ten
             | customers an hour - now the bottleneck is the input and
             | that likely has to be solved by marketing or something
              | external.
             | 
             | And maybe you get that fixed and now have 200 tables an
              | hour, and discover your dishwasher can only handle 50
             | tables an hour. So you need more dishes (buffer) but that
             | will only help until the poor dishwasher is washing dishes
             | 24/7 and at that point the buffer will eventually run out.
             | 
             | Successfully modeling the system and identifying the point
             | at which it will fail is useful, because then you can keep
             | an eye on that point AND know if something unexpected is
             | causing it to fail earlier.
        
             | sparsely wrote:
             | I don't really understand how this analogy works in terms
             | of the actual practice of running load tests. You should of
             | course test the full flow, the important part is not
             | implicitly limiting the input volume based on the system
             | performance. I think in your analogy it would be something
              | about people queuing outside without you noticing, but
              | I'm unsure.
             | 
             | Also, to be clear, the bad thing is normally doing closed
             | workload tests, not having a multistep workflow or whatever
             | (https://gatling.io/docs/gatling/reference/current/core/inj
             | ec...)
        
               | ericb wrote:
               | To complete the analogy, you need to have one concurrent
               | session per "virtual user." This is the "concurrency" of
               | the test. In my bad test, I had _one_ concurrent user
               | driving all the throughput--me.
               | 
                | If I had targeted 200 _concurrent users_, I would have
               | seen a failure in my test and found what the real
               | throughput would be. Each _concurrent_ user uses many
               | resources that could be limited.
               | 
               | In the restaurant, it is tables, plates, silverware. In a
               | web app, it is sessions, connections, memory, database
               | connections, and many other resources that can be
               | associated with each session, and each may be a potential
               | bottleneck that limits your actual throughput.
               | 
               | If we target throughput and ignore concurrency, we set
               | ourselves up for failure.
               | 
                | In practice, the better load testing tools let you
                | create "concurrent users." In JMeter, these are the
                | "threads." In other tools, they will be called
                | "virtual users", "VUs", or "sessions." In Locust, the
                | setting is -u NUM_USERS.
               | 
               | The caution is not to fool yourself with high RPS if
               | these settings aren't right for your test, and to
               | normally target _both_.
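                | 
                | A sketch of pinning both knobs in Locust (assumes
                | a recent version that ships constant_throughput;
                | the endpoint is a placeholder): -u fixes the
                | concurrent users and the pacing fixes the
                | per-user rate, so total RPS is roughly users x
                | rate, provided the server keeps up.
                | 
                |   from locust import (HttpUser, task,
                |                       constant_throughput)
                | 
                |   class CheckoutUser(HttpUser):
                |       # each user aims for 2 req/s, so
                |       # "-u 200" targets ~400 RPS overall
                |       wait_time = constant_throughput(2)
                | 
                |       @task
                |       def browse(self):
                |           self.client.get("/api/cart")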
        
       ___________________________________________________________________
       (page generated 2022-09-26 23:02 UTC)