[HN Gopher] In praise of property-based testing (2019)
       ___________________________________________________________________
        
       In praise of property-based testing (2019)
        
       Author : BerislavLopac
       Score  : 52 points
       Date   : 2021-04-03 11:15 UTC (11 hours ago)
        
 (HTM) web link (increment.com)
 (TXT) w3m dump (increment.com)
        
       | jrockway wrote:
        | Neither the bad example nor the good example of the "add
        | contributors up to the limit" test checks the state change
        | where users stop being accepted. You really want: add user 1
        | -> ok, add user 2 -> ok, add user 3 -> ok, add user 4 -> fail.
       | 
       | Doesn't matter what methodology you use if your test doesn't test
       | what you want it to.
       | 
       | I also fear that in general, people try to make their tests too
       | clever. I don't want the test to generate random data that's
       | different every time, unless it's a fuzz test. I want to see the
       | inputs and the expected outputs as clearly as possible, so that
       | when the test fails I'm not guessing whether or not there is some
       | bug 30 helpers deep, or if I simply added a bug with my change.
       | The real world is complicated and people are going to enter a lot
       | of invalid data into your application, so it is crucial that you
       | test that. But you also need the basics to work before you worry
       | about the advanced cases.
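
The boundary sequence described above can be written directly as a plain example-based test. A minimal sketch, assuming a hypothetical `Project` class with a three-contributor limit (a stand-in for the article's example, not its actual code):

```python
class Project:
    """Hypothetical stand-in for the article's project-with-contributor-limit."""

    def __init__(self, limit=3):
        self.limit = limit
        self.contributors = []

    def add_contributor(self, user):
        # Reject the add once the limit has been reached.
        if len(self.contributors) >= self.limit:
            return False
        self.contributors.append(user)
        return True

def test_contributor_limit_boundary():
    # Walk through the full state change: accept, accept, accept, reject.
    p = Project(limit=3)
    assert p.add_contributor('user1')       # ok
    assert p.add_contributor('user2')       # ok
    assert p.add_contributor('user3')       # ok: exactly at the limit
    assert not p.add_contributor('user4')   # fail: over the limit
    return True
```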
        
       | dfee wrote:
       | There are a lot of words without a strong definition, but it
       | seems to summarize to:
       | 
       | "Property testing is where you vary the parameters, perhaps
       | within certain well defined constraints, and through
       | randomization ensure certain coverage."
       | 
        | I think there are other ways of handling this, beyond the way
        | described, but I'm not sure if that's because I just read a
        | field guide to Hypothesis (again, without a firm definition),
        | or if other approaches -- such as test parameterization at key
        | boundaries -- are more effective.
       | 
       | My hunch is that property based testing provides less value on
       | unit tests - where it could be used to reduce coverage - and more
       | value in integrating those units together - where the number of
       | permutations can grow very quickly.
        
         | BerislavLopac wrote:
         | > Property testing is where you vary the parameters, perhaps
         | within certain well defined constraints, and through
         | randomization ensure certain coverage.
         | 
         | No - this would be akin to saying "unit testing is when you use
         | assert statements".
         | 
          | Varying the parameters -- and using far more than simple
          | randomisation -- is just a technique; the _purpose_ of
          | property testing is to verify that the behaviour of the unit
          | (or system) under test fulfills certain properties, in the
          | mathematical sense of the word. For a simple example: if
          | you're writing a function that adds two integers, you want
          | to test that it satisfies all the properties of addition
          | [0]. So your tests need to validate against a number of
          | combinations of input values, many of which are not random
          | (e.g. to test the "successor" property, one of them needs to
          | be 1).
         | 
         | [0] https://en.wikipedia.org/wiki/Addition#Properties
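
The addition example can be sketched as a hand-rolled property check. This is illustrative only (real frameworks such as Hypothesis add input strategies and shrinking); note that, as the comment says, two of the checked inputs are deliberately fixed rather than random:

```python
import random

def add(a, b):
    # Unit under test; stands in for whatever implementation is being checked.
    return a + b

def check_addition_properties(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.randint(-10**6, 10**6)
        b = rng.randint(-10**6, 10**6)
        c = rng.randint(-10**6, 10**6)
        assert add(a, b) == add(b, a)                  # commutativity
        assert add(add(a, b), c) == add(a, add(b, c))  # associativity
        assert add(a, 0) == a                          # identity: input fixed to 0
        assert add(a, 1) == a + 1                      # successor: input fixed to 1
    return True
```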
        
         | Jtsummers wrote:
         | PBT is nice in place of (some) unit tests in that you can
         | describe immediately the properties you expect without needing
         | to produce a collection of examples (or write a custom
         | generator that produces a limited set of examples, at which
         | point you're halfway to PBT anyways).
         | 
          | It's also helpful to use it in a piecewise fashion if you're
         | doing TDD. An illustrative example (though perhaps not stellar
         | as it is a synthetic, non-real-world example) uses the diamond
         | kata, TDD, and PBT together [0]. None of the tests on their own
         | fully specify the system, but in total they do.
         | 
         | If you're doing TDD (or attempting to) I think this is an
         | interesting case. Many TDD methods have you start off with an
         | example case like (to stick with this kata, and using
         | Pythonesque pseudocode because I'm still not awake this
          | Saturday morning):
          | 
          |     diamond-kata-test-a:
          |         assert(diamond('A') == 'A')
          | 
          | Great, so now someone makes that absolute simplest solution:
          | 
          |     diamond(c):
          |         return 'A'
          | 
          | Now repeat with a second test case:
          | 
          |     diamond-kata-test-b:
          |         assert(diamond('B') == ' A \nB B\n A ')
          | 
          | And the function is duly complicated:
          | 
          |     diamond(c):
          |         switch c:
          |             case 'A': return 'A'
          |             case 'B': return ' A \nB B\n A '
          |             default: return 'blah' // or error, doesn't
          |                                    // matter, it's not tested
         | 
         | But not actually generalized to reflect the intent of the
         | system. By focusing on properties, I've found, the progression
         | of the UUT is a bit better/more natural.
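
To make the property-focused progression concrete, here is a hedged Python sketch: a complete `diamond` implementation together with a randomized check of the properties that, taken together, specify the kata (square shape, vertical and horizontal symmetry, 'A' on the top row, the target letter on the middle row). Names and structure are illustrative, not taken from the linked post:

```python
import random
import string

def diamond(c):
    """Build the letter diamond for an uppercase character c."""
    n = ord(c) - ord('A')
    rows = []
    for i in range(n + 1):
        ch = chr(ord('A') + i)
        if i == 0:
            rows.append(' ' * n + ch + ' ' * n)
        else:
            rows.append(' ' * (n - i) + ch + ' ' * (2 * i - 1) + ch + ' ' * (n - i))
    rows += rows[-2::-1]          # mirror the top half to form the bottom
    return '\n'.join(rows)

def check_diamond_properties(trials=200, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        c = rng.choice(string.ascii_uppercase)
        lines = diamond(c).split('\n')
        size = 2 * (ord(c) - ord('A')) + 1
        assert len(lines) == size                        # square: height
        assert all(len(l) == size for l in lines)        # square: width
        assert lines == lines[::-1]                      # vertical symmetry
        assert all(l == l[::-1] for l in lines)          # horizontal symmetry
        assert lines[0].strip() == 'A'                   # 'A' on the top row
        assert lines[size // 2].lstrip().startswith(c)   # target letter on middle row
    return True
```

None of these properties alone pins down the function, but together they rule out the hard-coded `switch` above.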
         | 
         | Another interesting thing to do with PBT is model-based testing
         | [1]. The useful thing here is that sometimes the errors are
         | triggered by a peculiar, though plausible, sequence of commands
          | to your system. We've all worked with that one guy who
          | somehow manages to find exactly the right sequence that
          | triggers weird edge cases and errors; unless you have him on
          | the team, it's invaluable to have a system that generates
          | arbitrary execution traces for you. (I actually used FsCheck
          | for this last year when trying to sell PBT to my colleagues,
          | and was able to identify where a known issue originated, as
          | well as several other problems that hadn't been found by
          | users or testers yet.)
         | 
         | In the end, when these failures are found you can always turn
         | them into distinct unit tests in order to preserve them and
         | prevent regressions. The two modes of testing fit well
         | together.
         | 
         | [0]
         | http://christophethibaut.com/programming/2020/03/18/Diamond-...
         | 
         | [1] https://fscheck.github.io/FsCheck/StatefulTesting.html
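
The model-based idea can be sketched without any framework: run random command sequences against the system under test and a trivially-correct reference model, comparing after every command. A minimal hand-rolled sketch (real tools such as FsCheck's stateful testing or Hypothesis's `RuleBasedStateMachine` also generate and shrink the command sequences for you); `BoundedSet` is a hypothetical system under test:

```python
import random

class BoundedSet:
    """Hypothetical system under test: a set accepting at most `limit` members."""

    def __init__(self, limit):
        self.limit = limit
        self._items = set()

    def add(self, x):
        if len(self._items) >= self.limit and x not in self._items:
            return False
        self._items.add(x)
        return True

    def __contains__(self, x):
        return x in self._items

def check_against_model(trials=300, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        sut = BoundedSet(limit=3)
        model = set()                            # trivially-correct reference model
        for _ in range(rng.randint(0, 20)):      # a random command sequence
            x = rng.randint(0, 5)
            expected_ok = (x in model) or (len(model) < 3)
            ok = sut.add(x)
            assert ok == expected_ok             # SUT agrees with the model's prediction
            if ok:
                model.add(x)
            assert all(v in sut for v in model)  # everything the model holds, the SUT holds
    return True
```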
        
       | diurnalist wrote:
        | Thank you for sharing this! I've been intrigued by the idea of
        | property tests for a while, but in my mind they were relegated
        | to the "mad science" corner of tools, partly because most of
        | the cases I've seen made for them used examples that didn't
        | translate easily to the day-to-day systems (mostly HTML web
        | servers) I work on. I like that this post uses Django as the
        | motivating example.
       | 
       | The "shrinking" capability of the test library highlighted is
       | brilliant.
       | 
       | I'm inspired to think of how to start to leverage something like
       | this on some upcoming work.
        
         | pfdietz wrote:
         | Hypothesis does shrinking in an interesting way.
         | 
          | The first idea you think of for shrinking is to take the
         | randomly generated values and try to make them smaller. But the
         | generator may be imposing constraints on the values, and if you
         | lose those constraints, the input becomes invalid.
         | 
         | An example of this problem is generating valid C programs to
         | test C compilers (by compiling with different compilers or
         | different optimization settings and seeing if the behavior
         | differs). The constraint there is that the C program not show
         | undefined or implementation-specific behavior. Naively
         | shrinking a C program will not in general preserve this
         | property.
         | 
         | Hypothesis takes a different approach to shrinking: it records
         | the sequence of random values used by the generator, and
         | replays the generator on mutations of that sequence that do not
         | increase its length. The only way this can fail is if the
         | generator runs out of values on the mutated sequence.
         | Otherwise, the new output will always satisfy the constraints
         | imposed by the generator. Hypothesis does various clever things
         | to speed this up.
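
That replay idea can be sketched in miniature. The generator below consumes a recorded sequence of integer choices and, by construction, always emits a sorted list; shrinking mutates the choice sequence (dropping or zeroing entries) and replays the generator, so every shrunk input still satisfies the sortedness constraint. This illustrates the principle only; it is not Hypothesis's actual internals:

```python
def gen_sorted_list(draws):
    """Interpret a sequence of non-negative integer choices as a sorted list."""
    it = iter(draws)
    n = next(it) % 8              # list length comes from the first choice
    out, acc = [], 0
    for _ in range(n):
        acc += next(it) % 100     # non-negative deltas keep the list sorted
        out.append(acc)
    return out

def shrink(draws, failing):
    """Greedily drop or zero choices while the replayed input still fails."""
    draws = list(draws)
    changed = True
    while changed:
        changed = False
        for i in range(len(draws)):
            candidates = [draws[:i] + draws[i + 1:]]                # drop choice i
            if draws[i] != 0:
                candidates.append(draws[:i] + [0] + draws[i + 1:])  # zero choice i
            for candidate in candidates:
                try:
                    if failing(gen_sorted_list(candidate)):
                        draws, changed = candidate, True
                        break
                except StopIteration:
                    pass          # generator ran out of choices: mutation rejected
            if changed:
                break
    return gen_sorted_list(draws)
```

For instance, if the "bug" fires on any list of three or more elements, `shrink([11, 5, 7, 9, 2], lambda xs: len(xs) >= 3)` replays its way down to `[0, 0, 0]`: minimal, yet still sorted.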
        
           | adkadskhj wrote:
           | I'm quite new to property testing, first introduced recently
           | via a Rust property testing framework proptest[0]. So far
           | i've had the feeling that property testing frameworks need to
           | include a way to rationalize complexity, as their assumptions
           | can easily fall short as you illustrated.
           | 
            | E.g. the simplest example might be an application where
            | you input an integer, but a smaller int actually drives up
            | the complexity. This idea gets more involved when we
            | consider a list of ints, where a larger list and larger
            | numbers are simpler. Etc.
           | 
           | It would be neat if a proptest framework supported (and maybe
           | they do, again, i'm a novice here) a way to rank complexity.
           | Eg a simple function which gives two input failures and you
           | can choose which is the simpler.
           | 
            | Another really neat way to do that might be to actually
            | compute the path complexity as the program runs, based on
            | the count of CPU instructions or something. This wouldn't
            | always map cleanly, but it would often be the right
            | default, i imagine.
           | 
           | Either way in my limited proptest experience, i've found it a
           | bit difficult to design the tests to test what you actually
           | want, but very helpful once you establish that.
           | 
           | [0]: https://crates.io/crates/proptest
        
       | garethrowlands wrote:
       | The best thing, in my view, about property testing is that it
       | allows you to state the properties you're testing for, as opposed
       | to some examples of them.
       | 
       | For example if I'm making a sqrt function, then I want sqrt(x) *
       | sqrt(x) == x, for any x>=0.
       | 
        | Human beings are good at inferring general rules from
        | examples, but sometimes it's easier to understand if you just
        | state the general rule.
       | 
       | Also, unit tests that include example data can sometimes be
       | dominated by that data. Removing the particular examples can
       | sometimes remove a lot of distraction and verbosity.
       | 
       | It's not just that property tests find more bugs.
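
With floating point, the sqrt property needs a tolerance rather than exact equality. A minimal sketch using only the standard library:

```python
import math
import random

def check_sqrt_roundtrip(trials=1000, seed=3):
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(0.0, 1e12)
        r = math.sqrt(x)
        assert r >= 0.0                              # principal root is non-negative
        # Exact equality is too strict for floats; compare within a relative tolerance.
        assert math.isclose(r * r, x, rel_tol=1e-9)
    return True
```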
        
       | Vinnl wrote:
       | I've been aware of property-based testing for a number of years
       | now, but never had a good opportunity to give it a try. Then the
       | past year I had a piece of serialisation/de-serialisation code,
       | which was the perfect opportunity for a rather simple property-
       | based test. That gave me the hang of it, and found two (minor,
       | but still) bugs.
       | 
        | Then recently I had a fairly large, more error-prone piece of
        | work that lent itself very well to property-based testing, and
        | it's been a godsend. It helped me discover a number of bugs,
        | this time with the risk of causing privilege escalation. And
        | since the proptests started succeeding reliably, I've been
        | very confident that a rather complex piece of code now
        | actually does what it's supposed to.
       | 
       | If you're working in JavaScript, I can recommend fast-check [1].
       | 
       | Another interesting approach, that I haven't yet tried, is
       | Quickstrom [2], basically Puppeteer for proptests. It opens a
       | webpage in a browser, performs some random interactions (pressing
       | buttons, entering data, etc.), and then verifies that properties
       | you specified still hold.
       | 
       | [1] https://dubzzz.github.io/fast-check.github.com/
       | 
       | [2] https://quickstrom.io/
        
         | Rendello wrote:
          | I had a similar experience after watching a Computerphile
          | video with John Hughes, one of the authors of the original
          | property-based testing tool, QuickCheck [1]. Running
          | hundreds of thousands of unique tests and finding the
          | simplified cases was mind-blowing to watch.
         | 
          | I loved the video, but my first experience with this method
          | was with Python's Hypothesis [2]. I haven't used it a ton,
          | but it's great for finding parsing errors. In the words of
          | Python core developer Raymond Hettinger [3]:
         | 
         | > It's not quite fuzzing, but it hits it with the kind of test
         | cases that a good QA engineer would typically come up with, and
         | it does it in automated fashion.
         | 
         | 1. https://www.youtube.com/watch?v=AfaNEebCDos
         | 
         | 2. https://hypothesis.readthedocs.io/en/latest/
         | 
         | 3. https://youtu.be/ARKbfWk4Xyw?t=319
        
         | yakshaving_jgt wrote:
         | I checked out Quickstrom, and I thought "Wow, this looks
         | amazing!"
         | 
         | Then I noticed who wrote it and thought "Ah. Well that makes
         | sense."
        
         | marcosdumay wrote:
          | I was hoping for some insight, but as happens every time I
          | decide to try property testing, the examples in your links
          | are trivial to handle by either enforcing the property at
          | the type level or at construction, or by separating it from
          | the rest of the code so that its implementation becomes at
          | least as obvious and no more error-prone than the tests.
          | 
          | Property-based testing looks like a really good idea, but
          | I've never gained anything from applying it in practice.
          | There are probably some application domains it is good for,
          | but I haven't found them yet.
        
           | Jtsummers wrote:
           | If you're open to Erlang/Elixir I liked [0]. Obviously a
           | bigger commitment than some blog posts and tutorials, but
           | worth it in my opinion. It more clearly (to me) presents the
           | case for property-based testing and goes through more complex
           | examples than most blog posts which help to illustrate the
           | utility more effectively.
           | 
           | [0] https://pragprog.com/titles/fhproper/property-based-
           | testing-...
        
           | pfdietz wrote:
           | It's great for compiler testing.
           | 
           | The first approach is to generate random correct programs and
           | see if they do the same thing with different compilers and/or
           | different optimization flags.
           | 
           | The second approach is to take a correct program, note that
           | some parts of it are not executed on some particular input,
           | then mutate that part and run on the same input. The output
           | should be the same.
           | 
           | Also, any program (not just compilers) should satisfy the
           | properties that no assertions fail and that no sanitizer
           | failures occur.
        
            | vbrandl wrote:
            | Here's an example I had a few months ago: password
            | requirements, merging of requirements, and generating
            | passwords that fulfill the requirements:
            | 
            | - Merging/combining requirements is a monoid operation,
            | with an "empty requirement" (e.g. one that accepts every
            | password) as the neutral element. Having found this, I
            | could write tests for the monoid properties (a + b = b +
            | a, a + 0 = a, and so on).
            | 
            | - When generating a password from a requirement, the same
            | requirement must accept the generated password
            | (`requirement.accept(requirement.generate())`).
            | 
            | In general: if you find mathematical rules in your code
            | (commutativity, associativity, ...), these make a great
            | starting point for property tests.
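
A hedged sketch of this idea, with a hypothetical `Requirement` class (minimum length plus required character classes) whose merge keeps the stricter of both sides; the randomized check covers the monoid laws and the generate/accept round-trip. All names are illustrative, not the commenter's actual code:

```python
import random
import string

class Requirement:
    """Hypothetical password requirement: min length plus required char classes."""

    def __init__(self, min_length=0, classes=frozenset()):
        self.min_length = min_length
        self.classes = frozenset(classes)   # subset of {'digit', 'upper'}

    def merge(self, other):
        # Merging keeps the stricter of both requirements.
        return Requirement(max(self.min_length, other.min_length),
                           self.classes | other.classes)

    def accept(self, pw):
        checks = {'digit': str.isdigit, 'upper': str.isupper}
        return (len(pw) >= self.min_length and
                all(any(checks[c](ch) for ch in pw) for c in self.classes))

    def generate(self, rng):
        pool = {'digit': string.digits, 'upper': string.ascii_uppercase}
        pw = [rng.choice(pool[c]) for c in self.classes]   # one char per class
        while len(pw) < self.min_length:
            pw.append(rng.choice(string.ascii_lowercase))  # pad to min length
        return ''.join(pw)

    def __eq__(self, other):
        return (self.min_length, self.classes) == (other.min_length, other.classes)

EMPTY = Requirement()   # neutral element: accepts every password

def check_requirement_properties(trials=300, seed=4):
    rng = random.Random(seed)
    def arbitrary():
        return Requirement(rng.randint(0, 12),
                           frozenset(rng.sample(['digit', 'upper'], rng.randint(0, 2))))
    for _ in range(trials):
        a, b, c = arbitrary(), arbitrary(), arbitrary()
        assert a.merge(b) == b.merge(a)                    # commutativity: a + b = b + a
        assert a.merge(b).merge(c) == a.merge(b.merge(c))  # associativity
        assert a.merge(EMPTY) == a                         # neutral element: a + 0 = a
        assert a.accept(a.generate(rng))                   # generate/accept round-trip
    return True
```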
        
           | ghayes wrote:
            | Obvious code should have obvious properties, in which case
            | you should be able to gain value from property tests. If
            | you make every little function high-quality and well
            | tested, you should end up with better code overall. The
            | only functions, IMO, that don't deserve properties are
            | those which are exclusively the composition of other
            | functions, without any other logic. In that case, the
            | function is often, by definition, its own specification.
        
           | felixhuttmann wrote:
           | Property-based testing is good when there is a property to
           | assert that emerges in a non-trivial manner from the code
           | under test. If you do not have such a problem, property-based
           | testing provides no value over a simple, example-based test.
           | 
            | In the case of the parsing code from the article, the
            | emergent property is that a serialization followed by a
            | deserialization always yields the original value.
           | 
           | In the case of the 'binary or not' case from the article, the
           | non-trivial, emergent property is that the function never
           | fails with an exception.
           | 
           | Most modern software development is plumbing, stitching
           | together platforms, libraries and frameworks, and there are
           | rarely non-trivial, emergent properties where property-based
           | testing is useful.
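
The serialization round-trip property is easy to state once you can generate arbitrary inputs. A minimal sketch using `json` as a stand-in serializer (the article's actual code differs):

```python
import json
import random
import string

def serialize(record):
    return json.dumps(record, sort_keys=True)

def deserialize(text):
    return json.loads(text)

def check_roundtrip(trials=500, seed=5):
    rng = random.Random(seed)
    def arbitrary_record():
        # Small dicts with random string keys and mixed-type values.
        return {''.join(rng.choices(string.ascii_lowercase, k=5)):
                rng.choice([rng.randint(-100, 100),
                            rng.random(),
                            ''.join(rng.choices(string.printable, k=8)),
                            None, True])
                for _ in range(rng.randint(0, 5))}
    for _ in range(trials):
        record = arbitrary_record()
        # Round-trip: deserializing a serialized record restores it exactly.
        assert deserialize(serialize(record)) == record
    return True
```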
        
       | artemonster wrote:
        | I wonder when the software industry will finally "reinvent"
        | and start using constrained-random stimulus generation from
        | hardware verification methodologies :)
        
       ___________________________________________________________________
       (page generated 2021-04-03 23:01 UTC)