[HN Gopher] Load Testing: An Unorthodox Guide
___________________________________________________________________
Load Testing: An Unorthodox Guide
Author : fagnerbrack
Score : 82 points
Date : 2022-09-24 07:54 UTC (2 days ago)
(HTM) web link (www.marcobehler.com)
(TXT) w3m dump (www.marcobehler.com)
| kqr wrote:
| Very good article, I think! Reveals many of the subtleties that
| are easy to get wrong.
|
| ----
|
| > A common shortcut is to generate the load on the same machine
| (i.e. the developer's laptop), that the server is running on.
| What's problematic about that? Generating load needs
| CPU/Memory/Network Traffic/IO and that will naturally skew your
| test results, as to what capacity your server can handle
| requests.
|
| This fear is overplayed, I think. A lot of production software is
| slow enough that load generation will not be a significant
| CPU/memory/network load. By worrying about this people miss other
| things that they should worry about much more.
|
| ----
|
| > Run the actual load test for 1-5 minutes. (great numbers, huh?)
|
| This is another common problem: for many garbage collected
| platforms (and platforms with other types of occasional delayed
| work), 1-5 minutes is not enough to trigger the pathological
| behaviour you want to test for. Some systems need to be loaded
| for 4-6 hours or more to see the full spectrum of behaviour.
| drewcoo wrote:
| > This fear is overplayed, I think.
|
| It can be difficult to separate the loader's effects on the
| system from the server's. If the fear is overplayed, I think
| it's because people are trying to convince others not to make
| it harder than it needs to be.
|
| Even running both on the same box is valuable, though.
| Functional tests aren't going to catch concurrency issues, but
| load testing, however you do it, just might.
| sparsely wrote:
| It's very easy to misinterpret the results of load tests. A
| common mistake is to run the wrong sort of load test (explained
| here[1]) and have your loaders automatically back off when
| latency starts increasing. As the author of the OP touches on in
| a different context, you're now running a pseudo-limit test
| instead!
|
| I like the author's emphasis on only proceeding if the server's
| metrics are clearly within normal ranges.
|
| [1]
| https://gatling.io/docs/gatling/reference/current/core/injec...
| fizwhiz wrote:
| > It's also important that the loader generates those requests at
| a constant rate, best done asynchronously, so that response
| processing doesn't get in the way of sending out new requests.
|
| Coordinated Omission is subtle but deserves more color:
| https://groups.google.com/g/mechanical-sympathy/c/icNZJejUHf...
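To make the coordinated-omission point concrete: an open-loop generator schedules each request at its intended time and measures latency against that schedule, instead of waiting for the previous response. A minimal sketch in Python asyncio, with a simulated handler standing in for a real HTTP call:

```python
import asyncio
import random
import time

async def fake_request() -> float:
    """Stand-in for a real HTTP call: sleep for a random service time."""
    delay = random.uniform(0.01, 0.05)
    await asyncio.sleep(delay)
    return delay

async def open_loop(rate_hz: float, duration_s: float) -> list:
    """Fire requests on a fixed schedule instead of waiting for responses."""
    latencies = []
    tasks = []
    start = time.monotonic()

    async def timed(intended):
        # Measure from the *intended* send time, so a slow response
        # inflates the recorded latency rather than delaying the next send.
        await fake_request()
        latencies.append(time.monotonic() - intended)

    for i in range(int(rate_hz * duration_s)):
        intended = start + i / rate_hz
        await asyncio.sleep(max(0.0, intended - time.monotonic()))
        tasks.append(asyncio.create_task(timed(intended)))
    await asyncio.gather(*tasks)
    return latencies

latencies = asyncio.run(open_loop(rate_hz=50, duration_s=1.0))
print(f"{len(latencies)} requests, worst latency {max(latencies) * 1000:.1f} ms")
```

Because latency is measured from the intended send time, a stalled server shows up as a growing latency tail instead of silently reducing the offered rate.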
| [deleted]
| mikessoft_gmail wrote:
| aymar_99 wrote:
| https://gatling.io/ Gatling is one of the load-testing tools
| worth giving a shot. It uses Akka to generate load
| asynchronously, and it's good from a programmer's standpoint
| since load-test scripts can be written and maintained like any
| other code.
| AzzieElbab wrote:
| Good article. One thing I would add is the necessity to include
| failures/unhappy path scenarios in your load scripts. See if all
| that logging/tracing really is free :)
|
| Slightly off topic: do people still run load/performance tests
| on cloud-hosted sites? It seems like spiking your AWS load is
| becoming prohibitively expensive. Also, in the real world,
| replacing third-party services with intelligent mocks
| (simulating failures and latency) can be super complex.
| arein3 wrote:
| Has anyone used headless Chrome with Puppeteer for load tests?
| dev_throw wrote:
| None of the utility libraries helped with this; instead I
| wrote a shell script to orchestrate concurrent Puppeteer load
| tests with Jest and to collect and parse the reports. It is
| very resource-intensive and spiky. I'm looking to see how we
| can make it more efficient.
| vinay_ys wrote:
| Building systems that can scale _efficiently_ requires careful
| design and implementation. I have seen teams relax the efficiency
| requirement and manage to handle high load by pre-scaling-up the
| provisioned capacity of their systems many hours in advance of an
| anticipated traffic load. This is obviously very wasteful of
| resources, and it also makes your system brittle and
| non-resilient. An unanticipated spike in load that stresses
| the system beyond its provisioned capacity will cause it to
| overload and crash or jam up.
|
| So rather than testing whether your system can handle high
| load when it has been pre-scaled, a better test is whether it
| can continue to provide good throughput while under heavy
| load. And if your system is designed to auto-scale in an
| elastic cloud, an even better strategy is to test how quickly
| auto-scaling increases good throughput dynamically.
|
| Load testing should be about ascertaining the resiliency,
| scalability and efficiency characteristics of your system
| architecture and deployment setup.
|
| Some definitions:
|
| 1. A system is said to be resilient if it can sustain its good
| throughput even under load in excess of its available system
| resources.
|
| 2. A system is said to be scalable if it can quickly respond
| to increasing load by scaling up its provisioned resources,
| and with them its good throughput.
|
| 3. Good throughput is the throughput (request/sec, concurrent
| user sessions etc) of useful work done with acceptable
| performance characteristics - ie., latency curve at different
| percentiles is acceptable.
|
| 4. Efficiency of scalability is the ratio of increase in good
| throughput to increase in provisioned resources.
|
| 5. Efficiency in general is the concurrent user sessions or
| requests/sec per node of the provisioned system.
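As a toy calculation with invented numbers, definitions 4 and 5 work out like this:

```python
# Hypothetical before/after measurements from a scale-up load test.
nodes_before, nodes_after = 10, 20            # provisioned nodes
good_rps_before, good_rps_after = 4000, 7000  # req/s within the latency SLO

# (5) Efficiency: good throughput per provisioned node.
eff_before = good_rps_before / nodes_before   # 400 req/s per node
eff_after = good_rps_after / nodes_after      # 350 req/s per node

# (4) Efficiency of scalability: increase in good throughput relative
# to increase in provisioned resources (1.0 would be linear scaling).
scaling_eff = (good_rps_after / good_rps_before) / (nodes_after / nodes_before)

print(f"{eff_before:.0f} -> {eff_after:.0f} req/s per node, "
      f"scaling efficiency {scaling_eff:.2f}")
```

The numbers are made up; the point is that doubling the nodes bought only a 1.75x increase in good throughput, so per-node efficiency fell.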
| MikeYasnev007 wrote:
| LA is marketing cinema and drive) Load testing is more for
| Arizona server farms
| i_like_apis wrote:
| I used to do this a lot.
|
| It may look a little clunky, but there is a piece of software
| called JMeter (https://jmeter.apache.org/) that is capable of
| doing just about anything you would want to do in a load test -
| modeling any request behavior, distributed traffic, awesome
| reports, etc.
|
| 10/10 recommend.
| nerdponx wrote:
| My problem with JMeter (and load testing in general) is that
| all the recommended tools are extremely powerful if you know
| what you're doing, but provide little guidance if you aren't
| already an expert. I would love an old school "wizard" app that
| generates a sensible JMeter config for me.
|
| I've just been using the Apache Benchmark "ab" tool with
| mostly-default settings for now.
| yabones wrote:
| Jmeter does have a proxy mode that records your traffic and
| saves it to a working config. From there you can tweak
| timings, add variables, loops, etc. It's a decent starting
| place if you have no experience with building load testing
| plans.
|
| https://jmeter.apache.org/usermanual/jmeter_proxy_step_by_st.
| ..
| i_like_apis wrote:
| A wizard is an interesting idea.
|
| As for learning load testing, it's medium-easy: it takes a
| little effort to know what you're doing, and a bit of
| scientific common sense and maybe some knowledge of statistics
| helps.
|
| I think any of these are probably good.
| https://www.amazon.com/s?k=jmeter
|
| I read a Packt or Apress one that had more than enough on how
| to use it. I don't remember which one though.
| drewcoo wrote:
| JMeter is old and crusty and not at all friendly to work with.
| But I used it for years because it was really about the best we
| had. Today I don't wish it on anyone.
|
| Ruby JMeter finally made JMeter easier to manage, but I haven't
| worked in a Ruby shop for years, and I'm not going to force
| everyone to learn Ruby just to do some load testing.
|
| https://github.com/flood-io/ruby-jmeter
|
| Then along came k6. It's developer-friendly and I've seen
| people actually enjoy using it. I recommend anyone considering
| JMeter also take a look at k6. They do a better job of selling
| it than I do:
|
| https://k6.io
|
| I am also Gatling-curious. Seems like an option for anyone in
| the JVM ecosystem.
|
| https://gatling.io
| i_like_apis wrote:
| Disagree. Old? I guess, but that doesn't mean bad ... it has
| recent feature development and is currently maintained.
|
| Crusty is not an adjective I would use; that seems like more
| ageism. It's not an amazing GUI, but it's very powerful. It
| will do whatever you want.
|
| It's quite flexible and easy to define what I need it to do,
| to store and load configuration, and it has very meaningful
| reports. All of these things are pretty easy.
|
| A one-size-fits all paid service could give you 3-click
| interactions and glossy charts, but at the cost of control
| and understanding what's really going on, and you're more
| likely to have wrong assumptions about what's happening.
|
| Summary: load testing is just complicated enough that it
| requires effort, thought, and a good toolset.
|
| Also K6 may be good or whatever, but it is a paid service.
| JMeter is free.
| brabel wrote:
| Looks like K6 is basically Gatling but with a JS-based DSL
| while Gatling uses Scala (or anything JVM-based like
| Java/Groovy/Kotlin)... there's also Locust[1] which is
| Python-based.
|
| What I really wanted was an interactive tool, like old LoadUI
| used to do (Wikipedia, nicely, still has the old screenshots
| which show how cool it used to look:
| https://en.wikipedia.org/wiki/LoadUI)... because until you
| run tests, you just don't know what kind of load you can
| throw at the server (and whether you're hitting
| CPU/Memory/Bandwidth limits instead of server limits).
| Visual Components were written in a scripting DSL (Groovy
| based) and dynamically loaded so you could change the code
| even while the test ran... really awesome stuff, a bit like
| coding in Lisp with a visual facade on top.
|
| Based on existing tools, it should be relatively easy to
| build something like that and I am surprised there seems to
| be nothing of the kind. I've always wanted to do it myself,
| maybe one day if no one else finally tackles the problem.
|
| [1] https://docs.locust.io/en/stable/
| tbrownaw wrote:
| A while back, I had to get scaling numbers for something that
| did multipage forms where individual pages _might_ refresh
| themselves from the server depending on what dropdowns you
| picked, _might_ pull option lists if certain questions were
| visible, etc.
|
| I ended up using Selenium on a giant pile of cloud VMs, because
| the $$$$/run looked cheaper than my wild guess about the dev
| time for a more efficient tool.
|
| Do you know of any guides or general overviews for using JMeter
| (Or Gatling or such) for this, or maybe for determining the
| appropriate degree to categorize and randomize sessions (some
| of those option listings were pretty expensive, and it wouldn't
| have been good to just skip them)? From my brief look at the
| time, it seemed that it'd be a major pain and I'd have to
| figure out any script logic on my own without a whole lot in
| the way of pre-written guides.
| i_like_apis wrote:
| Ah yes, selenium is definitely too much overhead for load
| testing. You will get much better cost, performance and
| control with jmeter. There's also much less to develop once
| you know how to use it. You can also play back recorded
| sessions and parameterize variables if you want to design
| traffic in the browser.
|
| There are a handful of good technical books about JMeter.
| Search Amazon for JMeter and you will find some current ones.
| I read one and it was really helpful; I just can't remember
| which. They all look pretty good, though.
| cvwright wrote:
| Agreed with a lot of the points here, like starting small with a
| single piece of your API, then slowly expanding your tests once
| you're comfortable that you know what you're doing.
|
| Note that if you use the Locust framework to write your load
| tests in Python, it takes care of measuring and reporting the
| latency and throughput for you. It's really nice.
|
| https://locust.io/
| sparsely wrote:
| This seems to encourage you to run it with a target number of
| concurrent users, rather than a target RPS from the loaders,
| which IME can result in difficulty interpreting the results (it
| also doesn't reflect most use cases, except in some heavily
| controlled scenarios). When latencies increase, the fixed
| number of virtual users will necessarily send fewer new
| requests, leading to a constant or decreasing RPS actually
| being served.
|
| Other load testing tools (Gatling, newer versions of k6) will
| let you set a target RPS instead.
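The self-throttling effect is easy to see with a back-of-the-envelope model (numbers invented): in a closed workload, each virtual user waits for a response before sending again, so the offered rate is bounded by user count and latency:

```python
# Closed workload: each virtual user issues one request at a time,
# so the offered rate = users / (latency + think time).
def closed_loop_rps(virtual_users, latency_s, think_time_s=0.0):
    return virtual_users / (latency_s + think_time_s)

# 100 users against a healthy 50 ms server:
print(closed_loop_rps(100, 0.05))  # 2000.0 req/s offered

# Same 100 users after latency degrades to 500 ms -- the test
# quietly backs off instead of pushing the server harder:
print(closed_loop_rps(100, 0.5))   # 200.0 req/s offered
```

An open workload holds the arrival rate fixed regardless of latency, which is exactly what exposes the overload behaviour a closed test hides.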
| ericb wrote:
| Targeting RPS without also targeting concurrent users is a
| huge mistake!
|
| Here's a metaphor I like.
|
| Imagine I am trying to open a small high-volume restaurant. I
| call the chef in for a trial run. I sit at a table, and while
| I watch, she is able to make 200 dishes an hour. I'm excited,
| at 200 dishes an hour, we'll be rich! I tell the investors.
|
| Opening day comes!
|
| I have 10 tables and people eat for an hour on average. We
| serve 10 dishes an hour.
|
| Our restaurant fails.
|
| Throughput, without targeting concurrent users, is a bad
| test.
| bombcar wrote:
| Part of it is identifying the bottleneck. You verified the
| chef wouldn't be it, but the tables ended up being the
| limiting factor.
|
| And say you installed 200 tables, but only get ten
| customers an hour - now the bottleneck is the input and
| that likely has to be solved by marketing or something
| exterior.
|
| And maybe you get that fixed and now turn over 200 tables an
| hour, and discover your dishwasher can only handle 50 tables
| an hour. So you need more dishes (a buffer), but that will
| only help until the poor dishwasher is washing dishes 24/7,
| at which point the buffer will eventually run out.
|
| Successfully modeling the system and identifying the point
| at which it will fail is useful, because then you can keep
| an eye on that point AND know if something unexpected is
| causing it to fail earlier.
| sparsely wrote:
| I don't really understand how this analogy works in terms
| of the actual practice of running load tests. You should of
| course test the full flow, the important part is not
| implicitly limiting the input volume based on the system
| performance. I think in your analogy it would be something
| about people queuing outside without you noticing, but I'm
| unsure.
|
| Also, to be clear, the bad thing is normally doing closed
| workload tests, not having a multistep workflow or whatever
| (https://gatling.io/docs/gatling/reference/current/core/inj
| ec...)
| ericb wrote:
| To complete the analogy, you need to have one concurrent
| session per "virtual user." This is the "concurrency" of
| the test. In my bad test, I had _one_ concurrent user
| driving all the throughput--me.
|
| If I had targeted 200 _concurrent users_ , I would have
| seen a failure in my test and found what the real
| throughput would be. Each _concurrent_ user uses many
| resources that could be limited.
|
| In the restaurant, it is tables, plates, silverware. In a
| web app, it is sessions, connections, memory, database
| connections, and many other resources that can be
| associated with each session, and each may be a potential
| bottleneck that limits your actual throughput.
|
| If we target throughput and ignore concurrency, we set
| ourselves up for failure.
|
| In practicality, the better load testing tools let you
| create "concurrent users." In JMeter, this is the
| "threads." In other tools, it will be called "virtual
| users", "vus" or "sessions." In locust, the setting is -u
| NUM_USERS.
|
| The caution is not to fool yourself with high RPS if
| these settings aren't right for your test, and to
| normally target _both_.
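The restaurant arithmetic above is an instance of Little's law, L = λW: concurrency equals throughput times time-in-system. A quick sketch (the web-app numbers are invented):

```python
# Little's law: concurrency L = throughput (lambda) * time in system (W).
# Rearranged: max throughput = concurrency / time in system.
def max_throughput(concurrency, time_in_system_s):
    return concurrency / time_in_system_s

# The restaurant: 10 tables, each occupied for 1 hour.
print(max_throughput(10, 1.0))    # 10.0 dishes/hour, however fast the chef is

# A web app: 200 concurrent sessions, 250 ms average time per request.
print(max_throughput(200, 0.25))  # 800.0 req/s ceiling
```

Whichever session-bound resource (tables, connections, threads) runs out first sets the concurrency term, and with it the real throughput ceiling.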
___________________________________________________________________
(page generated 2022-09-26 23:02 UTC)