[HN Gopher] Property-Based Testing for the People
       ___________________________________________________________________
        
       Property-Based Testing for the People
        
       Author : matt_d
       Score  : 55 points
       Date   : 2025-01-06 16:47 UTC (6 hours ago)
        
 (HTM) web link (repository.upenn.edu)
 (TXT) w3m dump (repository.upenn.edu)
        
       | cosmic_quanta wrote:
       | This work was discussed by the author in the Haskell Interlude
       | podcast as well [0]. Highly recommended and probably easier to
       | digest than a whole dissertation.
       | 
       | [0]: https://haskell.foundation/podcast/59/
        
       | sunesimonsen wrote:
       | I think property based testing becomes a lot easier when you can
       | just use normal asserts like this:
       | https://github.com/unexpectedjs/unchecked
        
       | hitchstory wrote:
       | Property testing is a lot like formal methods - really cool, but
       | almost entirely useless in ~95% of commercial contexts.
       | 
       | They're both extremely useful when, say, building a parser, but
       | when the kind of code you write involves displaying custom
       | widgets, taking data and pushing it onto a queue, looking up data
       | in a database, etc. integration tests have a lot more bang for
       | the buck.
        
         | diggan wrote:
         | I've found it effective for anything that handles arbitrary
         | input, especially from end-users. But if that data is coming
         | from within your systems where you have full control over
         | everything, less valuable.
        
           | hitchstory wrote:
           | If the arbitrary input is, say, a text box which takes a name
           | and puts it into a database, it probably won't uncover any
           | bugs.
           | 
           | It has some use if you build something like a complex pricing
           | engine, numerical code or a parser for a mini DSL. I find
           | that problems of this type don't crop up a lot though.
        
             | IanCal wrote:
             | I disagree.
             | 
             | I've used it for things like "regardless of where you are
             | on the page, tab n times and shift tab n times leaves you
             | on the original item".
             | 
             | I found a bug in our tv ui library which was actually a bug
             | in the spec. Regardless of how you built the ui, if you
             | press a direction and focus moves, pressing the opposite
             | direction takes you back - but we had another rule that
             | broke this. We had tests for both, and it was only when I
             | made the general test (for all ui, items in it and
             | directions) it found the inconsistency.
             | 
             | It was also pretty easy to write.
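A minimal sketch of the round-trip property described above. The `FocusRing` class and its `tab`/`shift_tab` methods are hypothetical stand-ins invented for illustration, not the commenter's TV UI library, and the random case generation is hand-rolled with the standard library rather than taken from a PBT framework:

```python
import random


class FocusRing:
    """Hypothetical focus model: items in a ring, focus wraps at both ends."""

    def __init__(self, n_items, start=0):
        if n_items < 1:
            raise ValueError("need at least one focusable item")
        self.n_items = n_items
        self.index = start % n_items

    def tab(self):
        self.index = (self.index + 1) % self.n_items

    def shift_tab(self):
        self.index = (self.index - 1) % self.n_items


def check_tab_round_trip(trials=500, seed=0):
    """Property: from any start item, n tabs then n shift-tabs is a no-op.

    Returns None if no counterexample was found, else the failing case
    as a (n_items, start, presses) tuple.
    """
    rng = random.Random(seed)
    for _trial in range(trials):
        n_items = rng.randint(1, 30)
        start = rng.randrange(n_items)
        presses = rng.randint(0, 100)
        ring = FocusRing(n_items, start)
        for _ in range(presses):
            ring.tab()
        for _ in range(presses):
            ring.shift_tab()
        if ring.index != start:
            return (n_items, start, presses)
    return None
```

The property is stated over every layout, start position and press count rather than over one hand-picked scenario, which is the kind of generality that exposed the conflicting rule in the comment's example.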
             | 
             | I've also found issues around text processing due to
             | lowercasing not always resulting in the same length string
             | and more. I found a bug demoing PBT for a contract gig I was
             | going for that was around some versioning.
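The lowercasing issue can be reproduced with Python's built-in `str.lower`. This is a sketch that checks the property "lowercasing preserves length" exhaustively over a small code point range instead of sampling at random; the helper name and the chosen range are assumptions for illustration:

```python
def lowercase_length_counterexamples(code_points):
    """Return the characters whose lowercase form has a different length."""
    return [
        chr(cp)
        for cp in code_points
        if len(chr(cp).lower()) != len(chr(cp))
    ]


# Scan U+0080 through U+017F (Latin-1 Supplement and Latin Extended-A).
found = lowercase_length_counterexamples(range(0x80, 0x180))

# U+0130 (capital I with dot above) lowercases to "i" followed by
# U+0307 (combining dot above), so one code point becomes two.
dotted_capital_i = "\u0130"
```

Any code that sizes a buffer or computes offsets from the original string and then applies them to the lowercased one is exposed to this.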
             | 
             | To be honest I've never implemented it, even for a demo,
             | and not found a bug.
        
               | josephg wrote:
               | > To be honest I've never implemented it, even for a
               | demo, and not found a bug.
               | 
               | Me too. I tend to roll my own property testers / fuzzers
               | per project instead of using a library. But my experience
               | is similar to yours. Out of maybe 25 testers, I think the
               | only times I didn't find any bugs was when I messed up
               | the tests themselves.
               | 
               | It's incredibly humbling work.
        
           | jgalt212 wrote:
           | Very true. For me, fuzzers and property-based tests are two
           | sides of the same coin. I'd just use whichever feels more
           | natural.
        
         | boscillator wrote:
         | It's very useful when you're working on numerical software.
         | Often, it's hard to figure out exactly what output your code
         | should return (because if you knew the answer you wouldn't have
         | to write the code), but you can easily list properties you
         | expect.
        
           | matt_d wrote:
           | Right, metamorphic testing in particular (which would be a
           | special case of PBT, with metamorphic relations being
           | properties),
           | https://en.wikipedia.org/wiki/Metamorphic_testing,
           | https://github.com/MattPD/cpplinks/blob/master/testing.md#pr...
           | 
           | One simple example (from the above) is "sin (π - x) = sin x"
           | for the implementation of the sine function not requiring the
           | knowledge of its specific output values. Naturally, instead
           | of the literal equality "=" one can use a more appropriate
           | accuracy specification as in, say, relative ulp
           | (https://en.wikipedia.org/wiki/Unit_in_the_last_place) error
           | bound, cf.
           | https://members.loria.fr/PZimmermann/papers/accuracy.pdf
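A minimal sketch of that metamorphic relation as an executable check. It assumes a plain absolute tolerance rather than the ulp-based bound mentioned above, because `math.pi - x` is itself rounded and relative error is not meaningful near the zeros of sine. The function names and the deliberately broken `buggy_sin` are invented for illustration:

```python
import math
import random


def check_sine_reflection(sin_impl, trials=1000, seed=0, abs_tol=1e-12):
    """Metamorphic relation: sin(pi - x) should agree with sin(x).

    No expected output values are needed, only the relation itself.
    Returns the first x that violates the relation, or None.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        if not math.isclose(sin_impl(math.pi - x), sin_impl(x),
                            rel_tol=0.0, abs_tol=abs_tol):
            return x
    return None


def buggy_sin(x):
    """Deliberately wrong: a truncated Taylor series, only good near 0."""
    return x - x**3 / 6
```

Calling `check_sine_reflection(math.sin)` should find no violation, while `check_sine_reflection(buggy_sin)` returns an offending `x` without the test ever knowing what the true sine of that value is.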
        
         | thehappyfellow wrote:
         | How come e.g. Jane Street uses it so much? It's the second most
         | common type of test I write.
        
           | hansvm wrote:
           | The same reason Google burns $50M+ in electricity each year
           | using protobufs instead of a more efficient format. An
           | individual company having specific needs isn't at odds with a
           | general statement being broadly true.
        
             | thehappyfellow wrote:
             | How's that comparable at all? There are no network effects
             | from writing property based tests, people use them if they
             | are helpful - are they testing enough of the code with
             | reasonable amount of effort. Nobody's forcing people to
             | write tests, unlike Google forces usage of protobuf on all
             | projects there.
        
               | hansvm wrote:
               | It's comparable in the way described in sentence #2:
               | 
               | > An individual company having specific needs isn't at
               | odds with a general statement being broadly true.
               | 
               | Google needs certain things more than reduced carbon
               | emissions, and Jane Street needs certain things more than
               | whatever else they could spend that dev time on.
        
             | cyberpunk wrote:
             | Not to derail but what's more efficient in your view? We
             | compared messagepack, standard http/json and protobufs for an
             | internal service and protobufs came out tops on every
             | measure we had.
        
           | TypingOutBugs wrote:
           | Jane Street uses OCaml and property based tests are easiest
           | when dealing with pure functions, and are taught in FP
           | classes usually, so I assume it's that. Easier to set up and
           | target audience.
           | 
           | Edit: also a numerical domain, which is the easiest type to
           | use them for in my experience!
        
         | choeger wrote:
         | Hah! Try to separate your domain logic from your interfaces
         | (e.g., using something like hexagonal architecture) and then
         | say this again.
         | 
         | Yes, it's a lot of work coming up with good properties, but it
         | _massively_ helps to find gaps in the domain logic. In my
         | experience, these gaps are what's typically expensive, not the
         | weird problem a junior had with properly using Redis or S3.
        
           | hitchstory wrote:
           | >Hah! Try to separate your domain logic from your interfaces
           | 
           | I'm not an amateur.
           | 
           | The only time I don't do this is when there literally is _no_
           | domain logic yet (e.g. a CRUD app).
           | 
           | >In my experience, these gaps are what's typically expensive,
           | not the weird problem a junior had with properly using Redis
           | or S3.
           | 
           | What can I say? Your experience might not be as broad as
           | mine.
           | 
           | Redis is a source of almost no bugs because it is very well
           | designed, but most interfaces I couple to have design
           | qualities that are the exact opposite of redis's.
           | 
           | Those interfaces (e.g. wonky payment gateway APIs, weird
           | microservice APIs) are probably the source of most bugs in
           | enterprise systems I work on.
           | 
           | #2 is probably simple misspecifications (customer said code
           | should do X, it should actually do Y which is almost the same
           | but very slightly different).
           | 
           | #3 would be domain logic errors. And even most of those are
           | uncovered and avoided with saner architecture or a couple of
           | unit tests.
           | 
           | For the parsers I write at home, sure, property testing kicks
           | ass. For your college degree algorithm coursework, sure, it
           | helps a lot. For 95% of business logic? Pointless, the
           | complexity isn't buried deep in the business logic.
        
         | thom wrote:
         | I agree, but this is a good thing! My default approach these
         | days is functional tests for everything possible, and property
         | based tests for anything particularly algorithmic or containing
         | lots of edge cases, and no unit tests outside that. This is a
         | great combo, covers all the business value without leaving
         | obscure bugs, and also isn't a pain every time you refactor.
        
       | 082349872349872 wrote:
       | For structure generation I prefer Doug McIlroy's approach: pick a
       | tree size (from some arbitrary distribution), and then, of the
       | _n_ possible valid structures of that size, produce the _k_ th
       | one uniformly.
       | 
       | https://www.cs.dartmouth.edu/~doug/nfa.pdf gives an nfa variant;
       | extending to a pda is an (interesting, I found) exercise.
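A sketch of the size-then-rank idea, using binary tree shapes as the structure. This is an assumption chosen for illustration and is not the NFA construction from the linked paper. Counting gives _n_ for a given size, and unranking maps an index _k_ to exactly one shape, so a uniform _k_ yields a uniform structure of that size:

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(size):
    """Number of binary tree shapes with `size` internal nodes (Catalan)."""
    if size == 0:
        return 1
    return sum(count_trees(left) * count_trees(size - 1 - left)
               for left in range(size))


def unrank_tree(size, k):
    """Return the k-th tree shape of the given size, 0 <= k < count_trees(size).

    A tree is None (a leaf) or a (left, right) tuple (an internal node).
    """
    if size == 0:
        return None
    for left in range(size):
        right = size - 1 - left
        block = count_trees(left) * count_trees(right)
        if k < block:
            # Split k into independent ranks for the two subtrees.
            k_left, k_right = divmod(k, count_trees(right))
            return (unrank_tree(left, k_left), unrank_tree(right, k_right))
        k -= block
    raise ValueError("k out of range")


def random_tree(rng, max_size=8):
    """Pick a size from an arbitrary distribution, then a uniform shape."""
    size = rng.randint(0, max_size)
    k = rng.randrange(count_trees(size))
    return unrank_tree(size, k)
```

Compared with recursively flipping coins at each node, this keeps the size distribution under the tester's control and avoids skew toward tiny or runaway structures.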
        
       | dpc_01234 wrote:
       | Would be very valuable if someone could write a summary of novel
       | ideas for practitioners (if there are any).
        
       | tomnicholas1 wrote:
       | The python package Hypothesis[0] already does a great job
       | bringing property-based testing to the people! I've used it and
       | it's extremely powerful.
       | 
       | [0]: https://github.com/HypothesisWorks/hypothesis
        
         | epgui wrote:
         | I have used Python's `hypothesis` as well, and I wish it were
         | better. We had to rip it out at work as we were running into
         | too many issues.
         | 
         | I have also used Haskell's `QuickCheck` and Clojure's `spec` /
         | `test.check` and have had a great experience with these. In my
         | experience they "just work".
         | 
         | Conversely, if you're trying to generate non-trivial datasets,
         | you will likely run into situations where your specification is
         | correct but Hypothesis' implementation fails to generate data,
         | or takes an unreasonable amount of time to generate data.
         | 
         | Example: Generate a 100x25 array of numeric values, where the
         | only condition is that they must not all be zero
         | simultaneously. [1]
         | 
         | [1] https://github.com/HypothesisWorks/hypothesis/issues/3493
        
           | mrcsd wrote:
           | Care to expand upon the issues you were running into with
           | hypothesis? I'm genuinely curious as I may soon be evaluating
           | whether to use it in a professional context.
        
           | rtpg wrote:
           | I understand your pain in one sense, but in another I feel
           | like people with a decent amount of hypothesis experience
           | "know" how the generator works and would understand that you
           | basically _never_ want to use `filter` if you can avoid it,
           | instead relying on unfalsifiable generation.
           | 
           | Silly idea for your generator would be to generate an array, and
           | if it's zero... draw a random index and a random non-zero
           | number and add it into the array. Leads to some weird
           | non-convexity properties but is a workable hack.
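The repair-instead-of-reject idea can be sketched without any library. This uses plain nested lists and the standard `random` module rather than Hypothesis strategies or pandas, and the function name, value range and integer element type are assumptions for illustration:

```python
import random


def not_all_zero_grid(rng, rows=100, cols=25, low=-5, high=5):
    """Build a rows x cols grid of ints that is never entirely zero.

    Instead of generating and rejecting (the equivalent of a `filter`),
    the single bad case is repaired in place. Assumes the range
    [low, high] contains at least one non-zero value.
    """
    grid = [[rng.randint(low, high) for _ in range(cols)]
            for _ in range(rows)]
    if all(value == 0 for row in grid for value in row):
        r, c = rng.randrange(rows), rng.randrange(cols)
        # Overwrite one random cell with a non-zero value from the range.
        grid[r][c] = rng.choice(
            [v for v in range(low, high + 1) if v != 0]
        )
    return grid
```

Every draw produces a valid grid, so no generation effort is wasted on rejected candidates. The cost is a slightly skewed distribution, since grids with exactly one non-zero cell are a little more likely than under pure rejection.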
           | 
           | In your own example you turned off the "data too slow" issue,
           | probably because building up a dataframe (all to just do a
           | column sum!) is actually kind of costly at large numbers!
           | Your complaint is probably actually meant for the pandas
           | extras (or pandas itself) rather than the concept of
           | hypothesis.
        
       | choeger wrote:
       | Nice work. I didn't yet read it fully, but I love the idea. Looks
       | to be a valuable thesis.
        
       ___________________________________________________________________
       (page generated 2025-01-06 23:00 UTC)