[HN Gopher] Property-Based Testing for the People
___________________________________________________________________
Property-Based Testing for the People
Author : matt_d
Score : 55 points
Date : 2025-01-06 16:47 UTC (6 hours ago)
(HTM) web link (repository.upenn.edu)
(TXT) w3m dump (repository.upenn.edu)
| cosmic_quanta wrote:
| This work was discussed by the author in the Haskell Interlude
| podcast as well [0]. Highly recommended and probably easier to
| digest than a whole dissertation.
|
| [0]: https://haskell.foundation/podcast/59/
| sunesimonsen wrote:
| I think property based testing becomes a lot easier when you can
| just use normal asserts like this:
| https://github.com/unexpectedjs/unchecked
| hitchstory wrote:
| Property testing is a lot like formal methods - really cool, but
| almost entirely useless in ~95% of commercial contexts.
|
| They're both extremely useful when, say, building a parser, but
| when the kind of code you write involves displaying custom
| widgets, taking data and pushing it onto a queue, looking up data
| in a database, etc. integration tests have a lot more bang for
| the buck.
| diggan wrote:
| I've found it effective for anything that handles arbitrary
| input, especially from end-users. But if that data is coming
| from within your systems where you have full control over
| everything, less valuable.
| hitchstory wrote:
| If the arbitrary input is, say, a text box which takes a name
| and puts it into a database, it probably won't uncover any
| bugs.
|
| It has some use if you build something like a complex pricing
| engine, numerical code or a parser for a mini DSL. I find
| that problems of this type don't crop up a lot though.
| IanCal wrote:
| I disagree.
|
| I've used it for things like "regardless of where you are
| on the page, tab n times and shift tab n times leaves you
| on the original item".
|
| I found a bug in our tv ui library which was actually a bug
| in the spec. Regardless of how you built the ui, if you
| press a direction and focus moves, pressing the opposite
| direction takes you back - but we had another rule that
| broke this. We had tests for both, and it was only when I
| made the general test (for all ui, items in it and
| directions) it found the inconsistency.
|
| It was also pretty easy to write.
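The tab / shift-tab property described above can be sketched roughly as follows. This is a minimal illustration, not the commenter's code: `FocusRing` is a hypothetical stand-in for a real page, and it assumes focus moves through items in a fixed cyclic order. The random-input loop is hand-rolled with the standard library rather than taken from a PBT library.

```python
import random


class FocusRing:
    """Hypothetical page model: focusable items visited in a fixed cyclic order."""

    def __init__(self, item_count, start):
        self.item_count = item_count
        self.index = start

    def tab(self):
        # Move focus forward, wrapping from the last item to the first.
        self.index = (self.index + 1) % self.item_count

    def shift_tab(self):
        # Move focus backward, wrapping from the first item to the last.
        self.index = (self.index - 1) % self.item_count


def check_tab_round_trip(trials=1000, seed=0):
    """Check the property on random cases; return how many cases were checked."""
    rng = random.Random(seed)
    for _trial in range(trials):
        item_count = rng.randint(1, 50)
        start = rng.randrange(item_count)
        n = rng.randint(0, 200)
        ring = FocusRing(item_count, start)
        for _step in range(n):
            ring.tab()
        for _step in range(n):
            ring.shift_tab()
        # Property: n tabs followed by n shift-tabs lands on the original item.
        assert ring.index == start, (item_count, start, n)
    return trials
```

Against a real UI, the model class would be replaced by a driver for the page under test, and the failing tuple in the assertion message is the counterexample to investigate.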
|
| I've also found issues around text processing due to
| lowercasing not always resulting in the same length string
| and more. I found a bug demoing pbt for a contract gig I was
| going for that was around some versioning.
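One concrete instance of the lowercasing issue, as a small sketch in Python: the character U+0130 (capital I with a dot above) lowercases to two code points. The helper name `lowercase_preserves_length` is made up for this illustration.

```python
# U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) lowercases to "i" followed by
# U+0307 (COMBINING DOT ABOVE), so the string grows from one code point to two.
dotted_capital_i = "\u0130"
lowered = dotted_capital_i.lower()


def lowercase_preserves_length(text):
    """The tempting but false property: lowercasing keeps the code point count."""
    return len(text.lower()) == len(text)
```

A property test that feeds arbitrary Unicode text into `lowercase_preserves_length` would be expected to find an input like this, whereas hand-picked ASCII examples never would.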
|
| To be honest I've never implemented it, even for a demo,
| and not found a bug.
| josephg wrote:
| > To be honest I've never implemented it, even for a
| demo, and not found a bug.
|
| Me too. I tend to roll my own property testers / fuzzers
| per project instead of using a library. But my experience
| is similar to yours. Out of maybe 25 testers, I think the
| only times I didn't find any bugs was when I messed up
| the tests themselves.
|
| It's incredibly humbling work.
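A hand-rolled property tester of the kind mentioned above can be very small. The sketch below is an assumption about what such a tester might look like, not the commenter's code; `for_all`, `random_int_list`, and `buggy_dedupe` are hypothetical names, and the bug in `buggy_dedupe` is deliberate.

```python
import random


def for_all(generate, prop, trials=500, seed=0):
    """Run prop on random inputs; return the first failing input, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def random_int_list(rng):
    """Generator: a list of 0 to 20 integers in [-100, 100]."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]


def buggy_dedupe(items):
    """Deliberately buggy: only drops duplicates that are adjacent."""
    result = []
    for item in items:
        if not result or result[-1] != item:
            result.append(item)
    return result


def dedupe_has_no_repeats(items):
    """Property: the deduplicated list contains each value at most once."""
    output = buggy_dedupe(items)
    return len(set(output)) == len(output)


counterexample = for_all(random_int_list, dedupe_has_no_repeats)
```

Here `counterexample` ends up holding a list with a non-adjacent repeat. A library adds shrinking and better input distributions on top of this loop, but the core idea fits in a dozen lines.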
| jgalt212 wrote:
| Very true. For me, fuzzers and property-based tests are two
| sides of the same coin. I'd just use whichever feels more
| natural.
| boscillator wrote:
| It's very useful when you're working on numerical software.
| Often, it's hard to figure out exactly what output your code
| should return (because if you knew the answer you wouldn't have
| to write the code), but you can easily list properties you
| expect.
| matt_d wrote:
| Right, metamorphic testing in particular (which would be a
| special case of PBT, with metamorphic relations being
| properties),
| https://en.wikipedia.org/wiki/Metamorphic_testing,
| https://github.com/MattPD/cpplinks/blob/master/testing.md#pr...
|
| One simple example (from the above) is "sin (π - x) = sin x"
| for the implementation of the sine function not requiring the
| knowledge of its specific output values. Naturally, instead
| of the literal equality "=" one can use a more appropriate
| accuracy specification as in, say, relative ulp
| (https://en.wikipedia.org/wiki/Unit_in_the_last_place) error
| bound, cf.
| https://members.loria.fr/PZimmermann/papers/accuracy.pdf
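The sine relation can be checked without knowing any expected output values. The sketch below uses an absolute tolerance of a few ulps of 1.0 instead of a relative ulp bound. That choice is an assumption made for this example: `math.pi` is only an approximation of π, so near the zeros of sine the relative error of the relation becomes large even for a correct implementation. The constant 8 and the input range [0, π] are likewise illustrative.

```python
import math
import random
import sys

# sys.float_info.epsilon is the ulp of 1.0 for Python floats (about 2.2e-16).
# Assumed tolerance: a few ulps of 1.0, applied as an absolute bound.
TOLERANCE = 8 * sys.float_info.epsilon


def check_sine_reflection(trials=10_000, seed=0):
    """Check sin(pi - x) ~= sin(x) on random x in [0, pi]; return the worst error."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = rng.uniform(0.0, math.pi)
        error = abs(math.sin(math.pi - x) - math.sin(x))
        # Metamorphic relation: reflecting the input about pi/2 keeps the output.
        assert error <= TOLERANCE, (x, error)
        worst = max(worst, error)
    return worst
```

The same shape works for other relations, such as comparing `sin(-x)` with `-sin(x)`, and the tolerance is where an accuracy specification for the implementation under test would go.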
| thehappyfellow wrote:
| How come e.g. Jane Street uses it so much? It's the second most
| common type of test I write.
| hansvm wrote:
| The same reason Google burns $50M+ in electricity each year
| using protobufs instead of a more efficient format. An
| individual company having specific needs isn't at odds with a
| general statement being broadly true.
| thehappyfellow wrote:
| How's that comparable at all? There are no network effects
| from writing property based tests, people use them if they
| are helpful, i.e. if they test enough of the code for a
| reasonable amount of effort. Nobody's forcing people to
| write tests, unlike Google, which forces usage of protobuf
| on all projects there.
| hansvm wrote:
| It's comparable in the way described in sentence #2:
|
| > An individual company having specific needs isn't at
| odds with a general statement being broadly true.
|
| Google needs certain things more than reduced carbon
| emissions, and Jane Street needs certain things more than
| whatever else they could spend that dev time on.
| cyberpunk wrote:
| Not to derail but what's more efficient in your view? We
| compared messagepack, standard http/json and protobufs for an
| internal service and protobufs came out tops on every
| measure we had.
| TypingOutBugs wrote:
| Jane Street uses OCaml and property based tests are easiest
| when dealing with pure functions, and are taught in FP
| classes usually, so I assume it's that. Easier to set up and
| target audience.
|
| Edit: also a numerical domain, which is the easiest type to
| use them for in my experience!
| choeger wrote:
| Hah! Try to separate your domain logic from your interfaces
| (e.g., using something like hexagonal architecture) and then
| say this again.
|
| Yes, it's a lot of work coming up with good properties, but it
| _massively_ helps to find gaps in the domain logic. In my
| experience, these gaps are what's typically expensive, not the
| weird problem a junior had with properly using Redis or S3.
| hitchstory wrote:
| >Hah! Try to separate your domain logic from your interfaces
|
| I'm not an amateur.
|
| The only time I don't do this is when there literally is _no_
| domain logic yet (e.g. a CRUD app).
|
| >In my experience, these gaps are what's typically expensive,
| not the weird problem a junior had with properly using Redis
| or S3.
|
| What can I say? Your experience might not be as broad as
| mine.
|
| Redis is a source of almost no bugs because it is very well
| designed, but most interfaces I couple to have design
| qualities that are the exact opposite of redis's.
|
| Those interfaces (e.g. wonky payment gateway APIs, weird
| microservice APIs) are probably the source of most bugs in
| enterprise systems I work on.
|
| #2 is probably simple misspecifications (customer said code
| should do X, it should actually do Y which is almost the same
| but very slightly different).
|
| #3 would be domain logic errors. And even most of those are
| uncovered and avoided with saner architecture or a couple of
| unit tests.
|
| For the parsers I write at home, sure, property testing kicks
| ass. For your college degree algorithm coursework, sure, it
| helps a lot. For 95% of business logic? Pointless, the
| complexity isn't buried deep in the business logic.
| thom wrote:
| I agree, but this is a good thing! My default approach these
| days is functional tests for everything possible, and property
| based tests for anything particularly algorithmic or containing
| lots of edge cases, and no unit tests outside that. This is a
| great combo, covers all the business value without leaving
| obscure bugs, and also isn't a pain every time you refactor.
| 082349872349872 wrote:
| For structure generation I prefer Doug McIlroy's approach: pick a
| tree size (from some arbitrary distribution), and then, of the
| _n_ possible valid structures of that size, produce the _k_ th
| one uniformly.
|
| https://www.cs.dartmouth.edu/~doug/nfa.pdf gives an nfa variant;
| extending to a pda is an (interesting, I found) exercise.
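The "pick a size, then produce the k-th structure of that size" idea can be sketched for plain binary trees. This is an illustration of the general counting-and-unranking technique, not McIlroy's code and not the NFA variant from the linked paper. The tree encoding (a leaf is `None`, a node is a `(left, right)` tuple) and the uniform size distribution are assumptions made for the example.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(size):
    """Number of binary trees with `size` internal nodes (the Catalan numbers)."""
    if size == 0:
        return 1
    return sum(
        count_trees(left) * count_trees(size - 1 - left) for left in range(size)
    )


def unrank_tree(size, k):
    """Return the k-th binary tree with `size` nodes, for 0 <= k < count_trees(size)."""
    if size == 0:
        return None
    # Trees are ordered by left subtree size; each size owns a block of ranks.
    for left_size in range(size):
        right_size = size - 1 - left_size
        right_count = count_trees(right_size)
        block = count_trees(left_size) * right_count
        if k < block:
            return (
                unrank_tree(left_size, k // right_count),
                unrank_tree(right_size, k % right_count),
            )
        k -= block
    raise ValueError("k out of range")


def random_tree(rng, max_size=8):
    """Pick a size from an arbitrary distribution, then a uniform tree of that size."""
    size = rng.randint(0, max_size)
    k = rng.randrange(count_trees(size))
    return unrank_tree(size, k)
```

Because `k` is drawn uniformly from the exact count, every tree of the chosen size is equally likely, which is the property that naive recursive generators tend to lack.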
| dpc_01234 wrote:
| Would be very valuable if someone could write a summary of novel
| ideas for practitioners (if there are any).
| tomnicholas1 wrote:
| The python package Hypothesis[0] already does a great job
| bringing property-based testing to the people! I've used it and
| it's extremely powerful.
|
| [0]: https://github.com/HypothesisWorks/hypothesis
| epgui wrote:
| I have used Python's `hypothesis` as well, and I wish it were
| better. We had to rip it out at work as we were running into
| too many issues.
|
| I have also used Haskell's `QuickCheck` and Clojure's `spec` /
| `test.check` and have had a great experience with these. In my
| experience they "just work".
|
| Conversely, if you're trying to generate non-trivial datasets,
| you will likely run into situations where your specification is
| correct but Hypothesis' implementation fails to generate data,
| or takes an unreasonable amount of time to generate data.
|
| Example: Generate a 100x25 array of numeric values, where the
| only condition is that they must not all be zero
| simultaneously. [1]
|
| [1] https://github.com/HypothesisWorks/hypothesis/issues/3493
| mrcsd wrote:
| Care to expand upon the issues you were running into with
| hypothesis? I'm genuinely curious as I may soon be evaluating
| whether to use it in a professional context.
| rtpg wrote:
| I understand your pain in one sense, but in another I feel
| like people with a decent amount of hypothesis experience
| "know" how the generator works and would understand that you
| basically _never_ want to use `filter` if you can avoid it,
| instead relying on unfalsifiable generation.
|
| A silly idea for your generator would be to generate an array, and
| if it's zero... draw a random index and a random non-zero
| number and add it into the array. Leads to some weird non-
| convexity properties but is a workable hack.
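The repair-instead-of-filter idea can be sketched without any library. The code below is a plain standard-library illustration, not Hypothesis API: `nonzero_matrix` and its parameters are hypothetical, and integers stand in for the "numeric values" of the original example.

```python
import random


def nonzero_matrix(rng, rows=100, cols=25, low=-5, high=5):
    """Build a rows x cols integer matrix that is never all zeros, with no rejection.

    Assumes the range [low, high] contains at least one non-zero integer.
    """
    matrix = [[rng.randint(low, high) for _ in range(cols)] for _ in range(rows)]
    if all(value == 0 for row in matrix for value in row):
        # Repair step: overwrite one random cell with a random non-zero value
        # instead of throwing the whole draw away and retrying.
        r = rng.randrange(rows)
        c = rng.randrange(cols)
        matrix[r][c] = rng.choice([v for v in range(low, high + 1) if v != 0])
    return matrix
```

Every draw produces a valid matrix, so generation time does not depend on how often the condition fails. The cost is a slight skew: all-zero draws are mapped onto matrices with exactly one non-zero entry, which makes those a little more likely than under a pure filter.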
|
| In your own example you turned off the "data too slow" issue,
| probably because building up a dataframe (all to just do a
| column sum!) is actually kind of costly at large numbers!
| Your complaint is probably actually meant for the pandas
| extras (or pandas itself) rather than the concept of
| hypothesis.
| choeger wrote:
| Nice work. I didn't yet read it fully, but I love the idea. Looks
| to be a valuable thesis.
___________________________________________________________________
(page generated 2025-01-06 23:00 UTC)