[HN Gopher] In praise of property-based testing (2019)
___________________________________________________________________
In praise of property-based testing (2019)
Author : BerislavLopac
Score : 52 points
Date : 2021-04-03 11:15 UTC (11 hours ago)
(HTM) web link (increment.com)
(TXT) w3m dump (increment.com)
| jrockway wrote:
| Neither the bad example nor good example of the "add contributors
| up to the limit" test check the state change where users stop
| being accepted. You really want: add user 1 -> ok, add user 2 ->
| ok, add user 3 -> ok, add user 4 -> fail.
|
| Doesn't matter what methodology you use if your test doesn't test
| what you want it to.
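A minimal sketch of the test described above, in Python. It assumes a hypothetical `Project` class with a contributor limit of 3; the article's actual Django model is not reproduced here. The first test is the explicit example sequence (1, 2, 3 succeed, 4 fails). The second states the same boundary as a property and checks it exhaustively over small limits rather than with random data:

```python
class Project:
    """Hypothetical stand-in for the article's model: at most `limit` contributors."""

    def __init__(self, limit=3):
        self.limit = limit
        self.contributors = []

    def add_contributor(self, user):
        # Reject once the limit is reached; otherwise record the user.
        if len(self.contributors) >= self.limit:
            return False
        self.contributors.append(user)
        return True


def test_limit_example():
    # The explicit state change: users 1-3 are accepted, user 4 is rejected.
    p = Project(limit=3)
    results = [p.add_contributor(f"user{i}") for i in range(1, 5)]
    assert results == [True, True, True, False]


def test_limit_property():
    # For every small limit and number of attempts, exactly the first
    # `limit` additions succeed and every later one fails.
    for limit in range(0, 8):
        for attempts in range(0, 12):
            p = Project(limit=limit)
            results = [p.add_contributor(i) for i in range(attempts)]
            assert results == [i < limit for i in range(attempts)]
            assert len(p.contributors) == min(limit, attempts)


test_limit_example()
test_limit_property()
```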
|
| I also fear that in general, people try to make their tests too
| clever. I don't want the test to generate random data that's
| different every time, unless it's a fuzz test. I want to see the
| inputs and the expected outputs as clearly as possible, so that
| when the test fails I'm not guessing whether or not there is some
| bug 30 helpers deep, or if I simply added a bug with my change.
| The real world is complicated and people are going to enter a lot
| of invalid data into your application, so it is crucial that you
| test that. But you also need the basics to work before you worry
| about the advanced cases.
| dfee wrote:
| There are a lot of words without a strong definition, but it
| seems to summarize to:
|
| "Property testing is where you vary the parameters, perhaps
| within certain well defined constraints, and through
| randomization ensure certain coverage."
|
| I think there are other ways of handling this, beyond the way
| described, but I'm not sure if that's because I just read a field
| guide to Hypothesis (again, without a firm definition), or if
| other approaches, such as test parameterization at key
| boundaries, are more effective.
|
| My hunch is that property based testing provides less value on
| unit tests - where it could be used to reduce coverage - and more
| value in integrating those units together - where the number of
| permutations can grow very quickly.
| BerislavLopac wrote:
| > Property testing is where you vary the parameters, perhaps
| within certain well defined constraints, and through
| randomization ensure certain coverage.
|
| No - this would be akin to saying "unit testing is when you use
| assert statements".
|
| Varying the parameters -- and using far more than simple
| randomisation -- is just a technique; the _purpose_ of property
| testing is to check that the behaviour of the unit (or system)
| under test fulfills certain properties, in the mathematical sense
| of the word. For a simple example: if you're writing a function
| that is adding two integer numbers, you want to test that it
| satisfies all the properties of addition [0]. So your tests
| need to validate against a number of combinations of input
| values, many of which are not random (e.g. to test for the
| "successor" property one of them needs to be 1).
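A minimal sketch in Python of checking those properties, with a hypothetical `add` standing in for the function under test. As the comment says, the inputs are not purely random: a few fixed edge values (0, 1, -1, and the 64-bit signed limits) are always mixed into the pool. Python integers don't overflow, so those limits only matter if you swap in a fixed-width implementation:

```python
import random


def add(a: int, b: int) -> int:
    """Hypothetical function under test; swap in your own implementation."""
    return a + b


def check_addition_properties(trials=1000, seed=0):
    rng = random.Random(seed)
    # Deliberately non-random edge values, plus a pool of random ones.
    edge = [0, 1, -1, 2**63 - 1, -(2**63)]
    values = edge + [rng.randint(-10**6, 10**6) for _ in range(trials)]
    for _ in range(trials):
        a, b, c = rng.choice(values), rng.choice(values), rng.choice(values)
        assert add(a, b) == add(b, a)                  # commutativity
        assert add(add(a, b), c) == add(a, add(b, c))  # associativity
        assert add(a, 0) == a                          # identity element
        assert add(a, 1) - a == 1                      # successor: a + 1 is the next integer
    return True
```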
|
| [0] https://en.wikipedia.org/wiki/Addition#Properties
| Jtsummers wrote:
| PBT is nice in place of (some) unit tests in that you can
| describe immediately the properties you expect without needing
| to produce a collection of examples (or write a custom
| generator that produces a limited set of examples, at which
| point you're halfway to PBT anyways).
|
| It's also helpful to use it in a piecewise fashion if you're
| doing TDD. An illustrative example (though perhaps not stellar
| as it is a synthetic, non-real-world example) uses the diamond
| kata, TDD, and PBT together [0]. None of the tests on their own
| fully specify the system, but in total they do.
|
| If you're doing TDD (or attempting to), I think this is an
| interesting case. Many TDD methods have you start off with an
| example case like this (to stick with this kata, and using
| Python because I'm still not awake this Saturday morning):
|
|     def test_diamond_a():
|         assert diamond('A') == 'A'
|
| Great, so now someone writes the absolute simplest solution:
|
|     def diamond(c):
|         return 'A'
|
| Now repeat with a second test case:
|
|     def test_diamond_b():
|         assert diamond('B') == ' A \nB B\n A '
|
| And the function is duly complicated:
|
|     def diamond(c):
|         if c == 'A':
|             return 'A'
|         if c == 'B':
|             return ' A \nB B\n A '
|         return 'blah'  # or raise; it doesn't matter, it's not tested
|
| But not actually generalized to reflect the intent of the
| system. By focusing on properties, I've found, the progression
| of the UUT is a bit better/more natural.
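For contrast, here is a sketch of a general solution checked against properties instead of examples. The properties chosen here (square shape, vertical and horizontal symmetry, a single 'A' on top, the input letter on the widest row) are my own picks and not necessarily the ones used in the linked post. Since there are only 26 valid inputs, the check simply enumerates all of them:

```python
import string


def diamond(c):
    """One possible general solution to the diamond kata."""
    n = ord(c) - ord("A")
    rows = []
    for i in range(n + 1):
        letter = chr(ord("A") + i)
        outer = " " * (n - i)
        inner = letter if i == 0 else letter + " " * (2 * i - 1) + letter
        rows.append(outer + inner + outer)
    # Mirror the top half below the middle row (without repeating it).
    return "\n".join(rows + rows[-2::-1])


def check_diamond_properties():
    for c in string.ascii_uppercase:  # only 26 inputs, so check them all
        n = ord(c) - ord("A")
        lines = diamond(c).split("\n")
        assert len(lines) == 2 * n + 1                          # height
        assert all(len(line) == 2 * n + 1 for line in lines)    # square
        assert lines == lines[::-1]                             # vertical symmetry
        assert all(line == line[::-1] for line in lines)        # horizontal symmetry
        assert lines[0].strip() == "A"                          # a single 'A' on top
        assert lines[n].startswith(c) and lines[n].endswith(c)  # widest row uses c
    return True
```

The two example tests from the comment still pass against this implementation, so the examples and the properties can coexist.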
|
| Another interesting thing to do with PBT is model-based testing
| [1]. The useful thing here is that sometimes the errors are
| triggered by a peculiar, though plausible, sequence of commands
| to your system. We've all worked with that one guy who somehow
| manages to find exactly the right sequence that triggers weird
| edge cases and errors, but unless we're him, it helps to have a
| system that will generate arbitrary execution traces for you. (I
| actually used FsCheck for this last year in trying to sell PBT
| to my colleagues and was able to identify where a known issue
| originated as well as several other problems that hadn't been
| found by users or testers yet.)
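A hand-rolled sketch of model-based testing in Python, without any library: random command sequences run against both the system under test and a trivially correct model, and the first trace on which they disagree is returned. `RingBuffer` is a hypothetical system under test, the model is a plain list, and commands are only issued when the model says they are valid (similar in spirit to preconditions in stateful testing libraries such as FsCheck's). Real libraries add shrinking of the failing trace, which this sketch omits:

```python
import random


class RingBuffer:
    """Hypothetical system under test: a fixed-capacity FIFO on a circular array."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0  # index of the oldest element
        self.size = 0

    def push(self, x):
        if self.size == len(self.buf):
            raise OverflowError("full")
        self.buf[(self.head + self.size) % len(self.buf)] = x
        self.size += 1

    def pop(self):
        if self.size == 0:
            raise IndexError("empty")
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return x


def model_based_test(runs=300, steps=50, seed=0):
    """Return (capacity, trace) for the first disagreement, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        capacity = rng.randint(1, 5)
        real, model, trace = RingBuffer(capacity), [], []
        for _ in range(steps):
            # Pop when the model is full, or at random when it is non-empty;
            # otherwise push (the model guarantees there is room).
            if model and (len(model) == capacity or rng.random() < 0.5):
                trace.append("pop")
                if real.pop() != model.pop(0):
                    return capacity, trace
            else:
                x = rng.randint(0, 99)
                trace.append(("push", x))
                real.push(x)
                model.append(x)
            if real.size != len(model):
                return capacity, trace
    return None
```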
|
| In the end, when these failures are found you can always turn
| them into distinct unit tests in order to preserve them and
| prevent regressions. The two modes of testing fit well
| together.
|
| [0]
| http://christophethibaut.com/programming/2020/03/18/Diamond-...
|
| [1] https://fscheck.github.io/FsCheck/StatefulTesting.html
| diurnalist wrote:
| Thank you for sharing this! I've been intrigued by the idea of
| property tests for a while but in my mind it's relegated to the
| "mad science" corner of tools I would use, partly because most
| of the cases made for it that I've seen used examples that
| didn't translate easily to the day-to-day systems (HTML web
| servers, mostly) I work on. I like that this
| post uses Django as the motivating example.
|
| The "shrinking" capability of the highlighted test library is
| brilliant.
|
| I'm inspired to think of how to start to leverage something like
| this on some upcoming work.
| pfdietz wrote:
| Hypothesis does shrinking in an interesting way.
|
| The first idea you think of for shrinking is to take the
| randomly generated values and try to make them smaller. But the
| generator may be imposing constraints on the values, and if you
| lose those constraints, the input becomes invalid.
|
| An example of this problem is generating valid C programs to
| test C compilers (by compiling with different compilers or
| different optimization settings and seeing if the behavior
| differs). The constraint there is that the C program not show
| undefined or implementation-specific behavior. Naively
| shrinking a C program will not in general preserve this
| property.
|
| Hypothesis takes a different approach to shrinking: it records
| the sequence of random values used by the generator, and
| replays the generator on mutations of that sequence that do not
| increase its length. The only way this can fail is if the
| generator runs out of values on the mutated sequence.
| Otherwise, the new output will always satisfy the constraints
| imposed by the generator. Hypothesis does various clever things
| to speed this up.
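A heavily simplified sketch of that idea in Python (not Hypothesis's actual implementation). The generator draws integers from a `Choices` object that records them; the shrinker mutates the recorded sequence by deleting, zeroing, or halving single entries, never making it longer, and replays the generator on each candidate. Because every candidate passes back through the same generator, the "strictly increasing, positive" constraint built into `gen_increasing_list` holds for every shrunk input. `Choices`, `gen_increasing_list`, and the deliberately false property are hypothetical. A candidate is kept only if it still fails and is smaller in shortlex order (shorter, or same length and lexicographically smaller), which guarantees termination:

```python
import random


class Overrun(Exception):
    """Raised when a replayed choice sequence runs out of values."""


class Choices:
    """Integer choices: fresh random draws, or a replay of a recorded list."""

    def __init__(self, rng=None, replay=None):
        self.rng, self.replay, self.record = rng, replay, []

    def draw(self, n):
        """Return an int in range(n) and record it."""
        if self.replay is None:
            v = self.rng.randrange(n)
        elif len(self.record) < len(self.replay):
            v = self.replay[len(self.record)] % n
        else:
            raise Overrun()
        self.record.append(v)
        return v


def gen_increasing_list(c):
    """Generator with a built-in constraint: values are positive and strictly increasing."""
    out, x = [], 0
    for _ in range(c.draw(10)):
        x += 1 + c.draw(100)
        out.append(x)
    return out


def prop(xs):
    return sum(xs) < 100  # deliberately false, so there is something to shrink


def run(choices):
    """Replay choices; return (recorded, value, failed), or None on overrun."""
    c = Choices(replay=choices)
    try:
        value = gen_increasing_list(c)
    except Overrun:
        return None
    return c.record, value, not prop(value)


def shrink(choices):
    key = lambda s: (len(s), s)  # shortlex order
    improved = True
    while improved:
        improved = False
        candidates = []
        for i in range(len(choices)):
            candidates.append(choices[:i] + choices[i + 1:])                      # delete
            candidates.append(choices[:i] + [0] + choices[i + 1:])                # zero
            candidates.append(choices[:i] + [choices[i] // 2] + choices[i + 1:])  # halve
        for cand in candidates:
            result = run(cand)
            if result and result[2] and key(result[0]) < key(choices):
                choices, improved = result[0], True
                break
    return choices


rng = random.Random(0)
failing = None
for _ in range(1000):
    c = Choices(rng=rng)
    if not prop(gen_increasing_list(c)):
        failing = c.record
        break

if failing is not None:
    print("original:", run(failing)[1])
    print("shrunk:  ", run(shrink(failing))[1])
```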
| adkadskhj wrote:
| I'm quite new to property testing; I was introduced to it
| recently via proptest [0], a Rust property testing framework. So
| far I've had the feeling that property testing frameworks need
| to include a way to define what "simpler" means for a given
| input, as their assumptions can easily fall short, as you
| illustrated.
|
| E.g., the simplest example might be an application where you
| input an integer, but a smaller int actually drives up the
| complexity. This gets more complex when we consider a list of
| ints, where a larger list and larger numbers are simpler. Etc.
|
| It would be neat if a proptest framework supported (and maybe
| they do; again, I'm a novice here) a way to rank complexity,
| e.g. a simple function that is given two failing inputs and
| chooses the simpler one.
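A sketch of that ranking idea in Python. This is not the API of proptest or any other framework: it is a hand-rolled greedy shrinker that takes a user-supplied `simpler(a, b)` instead of assuming that smaller is simpler. The domain is hypothetical: integers up to 1000 where, as in the example above, larger values count as simpler, and the bug triggers on multiples of 7:

```python
def shrink_with_ranking(value, candidates, fails, simpler):
    """Greedy shrinker: keep moving to a candidate that still fails and that
    the user-supplied simpler(a, b) ranks as simpler than the current value."""
    current = value
    while True:
        for cand in candidates(current):
            if fails(cand) and simpler(cand, current):
                current = cand
                break
        else:  # no simpler failing candidate: a local optimum
            return current


def neighbours(x):
    """Hypothetical candidate moves, capped at 1000."""
    return [y for y in (x * 2, x + 7, x + 1) if y <= 1000]


result = shrink_with_ranking(
    14,
    candidates=neighbours,
    fails=lambda x: x % 7 == 0,  # hypothetical bug: fails on multiples of 7
    simpler=lambda a, b: a > b,  # the user's ranking: bigger is simpler
)
print(result)  # 994
```

Because each accepted step must be strictly "simpler" and the domain is bounded, the loop always terminates; with a different ranking function the same shrinker walks in a different direction.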
|
| Another really neat way to do that might be to actually
| compute the path complexity as the program runs, based on the
| count of CPU instructions or something. This wouldn't always
| line up with what a human considers simpler, but I imagine it
| would often be the right default.
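One way to approximate that "path complexity" in Python, as a rough stand-in for counting CPU instructions: count the line events reported by `sys.settrace` while the code under test runs. `work` is a hypothetical function whose cost grows as its input shrinks, and `simpler_by_cost` ranks two inputs by that count. The sketch temporarily replaces any active tracer (debugger, coverage tool) and restores it afterwards:

```python
import sys


def count_lines_executed(fn, *args):
    """Count Python line events while fn(*args) runs: a rough cost proxy."""
    count = 0

    def tracer(frame, event, arg):
        nonlocal count
        if event == "line":
            count += 1
        return tracer  # keep tracing inside nested calls

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        fn(*args)
    except Exception:
        pass  # a failing input still has a measurable cost
    finally:
        sys.settrace(previous)
    return count


def work(x):
    """Hypothetical code under test whose cost grows as x shrinks."""
    total = 0
    for _ in range(1000 // x):
        total += 1
    return total


def simpler_by_cost(a, b):
    """Rank input a as simpler than b if it executes fewer lines."""
    return count_lines_executed(work, a) < count_lines_executed(work, b)
```

Line counts are only a proxy: they ignore time spent inside C extensions, so two inputs with the same count can still differ a lot in real cost.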
|
| Either way, in my limited proptest experience, I've found it a
| bit difficult to design the tests to test what you actually
| want, but very helpful once you establish that.
|
| [0]: https://crates.io/crates/proptest
| garethrowlands wrote:
| The best thing, in my view, about property testing is that it
| allows you to state the properties you're testing for, as opposed
| to some examples of them.
|
| For example if I'm making a sqrt function, then I want sqrt(x) *
| sqrt(x) == x, for any x>=0.
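Taken literally with floating point, that equality often fails even for a correct sqrt: in IEEE double precision, `math.sqrt(2) * math.sqrt(2)` is `2.0000000000000004`. A sketch of the property therefore compares with a relative tolerance. Here the standard library's `math.sqrt` stands in for the function under test, and the tolerance of `1e-12` is an arbitrary choice:

```python
import math
import random


def check_sqrt(sqrt=math.sqrt, trials=10_000, seed=0):
    """Return every x >= 0 where sqrt(x) * sqrt(x) is not close to x."""
    rng = random.Random(seed)
    edge = [0.0, 1.0, 2.0, 1e-300, 1e300]
    xs = edge + [rng.uniform(0, 1e6) for _ in range(trials)]
    return [x for x in xs
            if not math.isclose(sqrt(x) * sqrt(x), x, rel_tol=1e-12)]
```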
|
| Human beings are good at inferring the general rules from
| examples, but sometimes it's easier to understand if you just say
| what the general rule is.
|
| Also, unit tests that include example data can sometimes be
| dominated by that data. Removing the particular examples can
| sometimes remove a lot of distraction and verbosity.
|
| It's not just that property tests find more bugs.
| Vinnl wrote:
| I've been aware of property-based testing for a number of years
| now, but never had a good opportunity to give it a try. Then, in
| the past year, I had a piece of serialisation/de-serialisation
| code, which was the perfect opportunity for a rather simple
| property-based test. That gave me the hang of it, and it found
| two (minor, but still) bugs.
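A sketch of such a round-trip property in Python, with no library: for any list of strings, `decode(encode(items))` must give back `items`. The comma-joined, backslash-escaped format is hypothetical, and it deliberately keeps one ambiguity of the kind round-trip tests tend to surface: `[]` and `[""]` both encode to the empty string. Inputs are biased towards the characters that need escaping, and a few edge cases are always tried first:

```python
import random


def encode(items):
    """Hypothetical serializer: join with ',' after escaping '\\' and ','."""
    return ",".join(s.replace("\\", "\\\\").replace(",", "\\,") for s in items)


def decode(text):
    """Inverse of encode: split on unescaped ',' and undo the escapes."""
    items, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":  # escaped character: take the next one literally
            current.append(text[i + 1])
            i += 2
        elif ch == ",":  # unescaped separator ends the current item
            items.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    items.append("".join(current))
    return items


def find_roundtrip_failure(trials=1000, seed=0):
    """Return the first input that does not survive a round trip, or None."""
    rng = random.Random(seed)
    alphabet = "ab,\\"  # bias towards the characters that need escaping
    edge = [[], [""], ["\\"], [","], ["", ""]]
    randoms = [["".join(rng.choice(alphabet) for _ in range(rng.randrange(5)))
                for _ in range(rng.randrange(5))]
               for _ in range(trials)]
    for items in edge + randoms:
        if decode(encode(items)) != items:
            return items
    return None
```

Here the check reports `[]` as the failing input; fixing it means choosing a format in which the empty list is distinguishable, for example by prefixing the item count.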
|
| Then recently I had a larger, more error-prone piece of work
| that lent itself very well to property-based testing, and it's
| been a godsend. It helped me discover a number of bugs, this
| time ones that risked privilege escalation. And since the
| proptests started succeeding reliably, I've been very confident
| that a rather complex piece of code now actually does what it's
| supposed to.
|
| If you're working in JavaScript, I can recommend fast-check [1].
|
| Another interesting approach, that I haven't yet tried, is
| Quickstrom [2], basically Puppeteer for proptests. It opens a
| webpage in a browser, performs some random interactions (pressing
| buttons, entering data, etc.), and then verifies that properties
| you specified still hold.
|
| [1] https://dubzzz.github.io/fast-check.github.com/
|
| [2] https://quickstrom.io/
| Rendello wrote:
| I had a similar experience after watching a Computerphile video
| with John Hughes, one of the authors of the original
| property-based testing tool, QuickCheck [1]. Running hundreds of
| thousands of unique tests and finding the simplified cases was
| mind-blowing to watch.
|
| I loved the video, but my first experience with this method was
| with Python's Hypothesis [2]. I haven't used it a ton, but it's
| great for finding parsing errors. In the words of Python core
| developer Raymond Hettinger [3]:
|
| > It's not quite fuzzing, but it hits it with the kind of test
| cases that a good QA engineer would typically come up with, and
| it does it in automated fashion.
|
| [1] https://www.youtube.com/watch?v=AfaNEebCDos
|
| [2] https://hypothesis.readthedocs.io/en/latest/
|
| [3] https://youtu.be/ARKbfWk4Xyw?t=319
| yakshaving_jgt wrote:
| I checked out Quickstrom, and I thought "Wow, this looks
| amazing!"
|
| Then I noticed who wrote it and thought "Ah. Well that makes
| sense."
| marcosdumay wrote:
| I was hoping for some insight, but as happens every time I
| decide to try property testing, the examples in your links are
| trivial: the property could either be enforced by the type or at
| construction, or be separated from the rest of the code so that
| its implementation becomes at least as obvious and error-prone
| as the tests.
|
| Property-based testing looks like a really good idea, but I
| never gained anything from applying it in practice. There are
| probably some application domains that it is good for, but I
| still haven't found them.
| Jtsummers wrote:
| If you're open to Erlang/Elixir I liked [0]. Obviously a
| bigger commitment than some blog posts and tutorials, but
| worth it in my opinion. It more clearly (to me) presents the
| case for property-based testing and goes through more complex
| examples than most blog posts which help to illustrate the
| utility more effectively.
|
| [0] https://pragprog.com/titles/fhproper/property-based-testing-...
| pfdietz wrote:
| It's great for compiler testing.
|
| The first approach is to generate random correct programs and
| see if they do the same thing with different compilers and/or
| different optimization flags.
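That first approach is usually called differential testing. A toy version in Python: random arithmetic expressions over one variable `x` stand in for random programs, and three evaluations stand in for different compilers and optimization levels: a direct tree-walking evaluator, the same evaluator after a small hypothetical "optimizer" (`simplify`), and Python's own `eval` on the printed expression. Any disagreement is reported together with the offending expression and input:

```python
import random

# Expression trees: ("num", n) | ("var",) | (op, left, right) with op in "+-*".


def gen_expr(rng, depth=3):
    if depth == 0 or rng.random() < 0.3:
        return ("num", rng.randint(-5, 5)) if rng.random() < 0.7 else ("var",)
    op = rng.choice("+-*")
    return (op, gen_expr(rng, depth - 1), gen_expr(rng, depth - 1))


def evaluate(e, x):
    tag = e[0]
    if tag == "num":
        return e[1]
    if tag == "var":
        return x
    a, b = evaluate(e[1], x), evaluate(e[2], x)
    return a + b if tag == "+" else a - b if tag == "-" else a * b


def simplify(e):
    """Hypothetical 'optimizer': a few algebraic rewrites applied bottom-up."""
    if e[0] in ("num", "var"):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if op == "+" and b == ("num", 0):
        return a
    if op == "*" and (a == ("num", 0) or b == ("num", 0)):
        return ("num", 0)
    if op == "*" and b == ("num", 1):
        return a
    return (op, a, b)


def render(e):
    """Print the tree as a fully parenthesized Python expression."""
    if e[0] == "num":
        return f"({e[1]})"
    if e[0] == "var":
        return "x"
    return f"({render(e[1])} {e[0]} {render(e[2])})"


def differential_test(trials=2000, seed=0):
    """Return (expression, x) for the first disagreement, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        e, x = gen_expr(rng), rng.randint(-10, 10)
        results = {evaluate(e, x),
                   evaluate(simplify(e), x),
                   eval(render(e), {"x": x})}
        if len(results) != 1:
            return e, x
    return None
```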
|
| The second approach is to take a correct program, note that
| some parts of it are not executed on some particular input,
| then mutate that part and run on the same input. The output
| should be the same.
|
| Also, any program (not just compilers) should satisfy the
| properties that no assertions fail and that no sanitizer
| failures occur.
| vbrandl wrote:
| Here's an example I had a few months ago: password
| requirements, merging of requirements, and generating
| passwords that fulfill the requirements:
|
| - Merging/combining requirements is a monoid operation, with
|   an "empty requirement" (i.e. one that accepts every password)
|   as the neutral element. Having found this neutral element, I
|   could write properties for the monoid laws (associativity,
|   a + (b + c) = (a + b) + c, and identity, a + 0 = 0 + a = a),
|   as well as commutativity (a + b = b + a).
| - When generating a password from a requirement, the same
|   requirement must accept the generated password
|   (`requirement.accept(requirement.generate())`).
|
| In general: if you find mathematical rules in your code
| (commutativity, associativity, ...), these make a great
| starting point for property tests.
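A sketch of those properties in Python, under loud assumptions: the `Requirement` type below (a minimum length plus a set of required character classes, merged by taking the larger length and the union of classes) is hypothetical and not the commenter's code. With that definition the empty requirement is the neutral element, and merging also happens to be commutative:

```python
import random
import string
from dataclasses import dataclass

# Hypothetical character classes a requirement can demand.
CLASSES = {
    "lower": string.ascii_lowercase,
    "upper": string.ascii_uppercase,
    "digit": string.digits,
    "symbol": "!@#$%^&*",
}


@dataclass(frozen=True)
class Requirement:
    min_length: int = 0
    classes: frozenset = frozenset()

    def merge(self, other):
        """Combine two requirements: a password must satisfy both."""
        return Requirement(max(self.min_length, other.min_length),
                           self.classes | other.classes)

    def accept(self, password):
        return len(password) >= self.min_length and all(
            any(ch in CLASSES[c] for ch in password) for c in self.classes)

    def generate(self, rng):
        # One character from each required class, padded to min_length.
        chars = [rng.choice(CLASSES[c]) for c in sorted(self.classes)]
        while len(chars) < self.min_length:
            chars.append(rng.choice(string.ascii_letters + string.digits))
        rng.shuffle(chars)
        return "".join(chars)


EMPTY = Requirement()  # the neutral element: accepts every password


def random_requirement(rng):
    names = rng.sample(sorted(CLASSES), rng.randint(0, len(CLASSES)))
    return Requirement(rng.randint(0, 16), frozenset(names))


def check_requirement_properties(trials=500, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (random_requirement(rng) for _ in range(3))
        assert a.merge(b.merge(c)) == a.merge(b).merge(c)  # associativity
        assert a.merge(EMPTY) == a == EMPTY.merge(a)        # identity
        assert a.merge(b) == b.merge(a)                     # commutativity
        assert a.accept(a.generate(rng))                    # generate/accept agree
    return True
```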
| ghayes wrote:
| Obvious code should have obvious properties, in which case
| you should be able to gain value from property tests. If you
| make every little function high-quality and well tested, you
| should end up with better code overall. The only functions,
| IMO, that don't deserve properties as much are functions that
| are exclusively compositions of other functions without any
| other logic. In that case, the function is often, by
| definition, its result.
| felixhuttmann wrote:
| Property-based testing is good when there is a property to
| assert that emerges in a non-trivial manner from the code
| under test. If you do not have such a problem, property-based
| testing provides no value over a simple, example-based test.
|
| In the case of the parsing code from the article, the
| emergent property is that serializing and then deserializing
| should always yield the original value.
|
| In the 'binary or not' example from the article, the
| non-trivial, emergent property is that the function never
| fails with an exception.
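A sketch of a "never raises" property in Python. The `is_binary` heuristic below (a NUL byte in the first 1024 bytes, or invalid UTF-8) is hypothetical and not the article's function. The checker feeds it a few edge cases plus random byte strings and reports the first input that raises or returns something other than a bool; the same harness shape covers the "no assertion fails" property for any function:

```python
import random


def is_binary(data: bytes) -> bool:
    """Hypothetical heuristic: binary if there is a NUL byte in the first
    1024 bytes, or if the data is not valid UTF-8."""
    if b"\x00" in data[:1024]:
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def check_never_raises(fn, trials=1000, seed=0):
    """Return (input, exception) for the first misbehaving input, or None."""
    rng = random.Random(seed)
    # Edge cases include a truncated multi-byte UTF-8 sequence.
    edge_cases = [b"", b"\x00", b"\xff", "é".encode("utf-8")[:1]]
    randoms = [bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
               for _ in range(trials)]
    for data in edge_cases + randoms:
        try:
            result = fn(data)
        except Exception as exc:
            return data, exc
        if not isinstance(result, bool):
            return data, TypeError(f"non-bool result {result!r}")
    return None
```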
|
| Most modern software development is plumbing, stitching
| together platforms, libraries and frameworks, and there are
| rarely non-trivial, emergent properties where property-based
| testing is useful.
| artemonster wrote:
| I wonder when the software industry will finally "reinvent" and
| start using constrained-random stimulus generation from hardware
| verification methodologies :)
___________________________________________________________________
(page generated 2021-04-03 23:01 UTC)