[HN Gopher] Automated Test-Case Reduction
___________________________________________________________________
Automated Test-Case Reduction
Author : ingve
Score : 71 points
Date : 2024-07-15 20:03 UTC (1 days ago)
(HTM) web link (www.cs.cornell.edu)
(TXT) w3m dump (www.cs.cornell.edu)
| reruwe wrote:
| I find it odd that test-case reduction is billed as a "research
| skill". To me, this is a basic debugging skill for anyone who
| works with code.
|
| Maybe it's more critical when you're dealing with "research-
| quality" code. This post seems to ignore the underlying problem:
| it's rare to see research code with _any_ tests.
| derdi wrote:
| As the article says, right at the top: "The plan is to
| demonstrate techniques that "everyone knows" because everyone,
| in fact, does not already know them."
| mistercow wrote:
| This is actually something I find hard to resist in myself as
| an aging software engineer on HN. When I was young, articles
| would describe "obvious" things in excruciating detail, and
| I'd be grateful, because they were often things I didn't
| know.
|
| Now when I see articles about things I find obvious on the
| front page -- particularly if they're topics I've seen hit
| the front page before (which, mind you, means "any time in
| the last 15 years or so") -- I have to resist an annoyed gut
| reaction. The fact that knowledge and skills seem obvious and
| boring once internalized is a kind of terrible quirk of the
| human condition.
| tialaramex wrote:
| For whatever it's worth, I enjoy explaining things to
| others so each time a topic recurs I can do that where it
| seems appropriate. And hey I can pass on things I didn't
| know about and then learned here, for example that's why I
| tell people about WUFFS, can you do all the bounds checking
| at compile time? Sure, for a price. And that price
| (generality) might be easily affordable in which case using
| Rust, never mind C++ is very silly because WUFFS was
| offering you better performance and total safety if you can
| do without generality.
| HideousKojima wrote:
| My issue isn't with articles describing "obvious" things,
| it's with articles acting like this obvious thing is an
| amazing new discovery. This is especially funny when you
| see it in articles about relational database stuff, since
| most of the fundamentals have been discovered since the
| 70's.
| Certhas wrote:
| I don't understand your point. Debugging is research almost by
| definition, no?
| wyldfire wrote:
| In addition to c-reduce and shrinkray there's also cvise [1] for
| reducing tests. It's great.
|
| Also, if you happen to be reducing an LLVM case you can use llvm-
| reduce [2].
|
| [1] https://github.com/marxin/cvise
|
| [2] https://llvm.org/docs/CommandGuide/llvm-reduce.html
| crvdgc wrote:
| Very interesting. Previously I've seen two similar ideas:
|
| 1. Reduce the test data by property-based tests' auto shrinking a
| la QuickCheck [1]
|
| 2. Reduce the change set to a codebase, a la delta debugging [2]
|
| The reducer is able to do both at the same time in a pretty
| generic way, with some restrictions on the expressivity. Seems to
| be a good fit for end-to-end tests.
|
| [1]: https://www.cse.chalmers.se/~rjmh/QuickCheck/
|
| [2]: https://dl.acm.org/doi/10.1145/318774.318946
| clord wrote:
| I used to do something like this all the time with C/C++ compiler
| tests. I tried lots of fancy tools and stuff, but I kept going
| back to: expand all macros and retokenize to one token per line
| (I made a custom build of the preprocessor that had this built
| in). Then, have a shell script randomly remove lines, and use
| another script to check that the resulting test case behaves
| consistently with the failure. It would run for a few hours (or
| days, for boost problems), then usually you'd get a really
| minimal testcase that shows the problem. Often I would use this
| to find regressions. Just have the shell script check one is
| good, the other has the problem. The resulting output would then
| usually point exactly at the regressed feature, and make an
| amazing unit test.
|
| Before leaving compiler team, I wanted to find a way to do this
| one the AST level (so parens always go in pairs, etc), but that
| could also complect with the bug.
|
| I wonder if LLMs could accelerate this by more intelligently
| removing stuff first, iteratively?
| derdi wrote:
| Sophisticated reducers like C-Reduce do know things like that
| parens go in pairs. C-Reduce has many transformation
| operations, and while some are syntax agnostic (delete a few
| characters or tokens), others use Clang to try to parse the
| input as C++ and transform the AST.
|
| Perses is a reducer that works on an AST and claims to only try
| syntactically valid candidate inputs. It also claims to be
| language agnostic, and I don't know how these two things go
| together. But it does work nicely for Java for me.
| https://github.com/uw-pluverse/perses /
| https://faculty.cc.gatech.edu/~qzhang414/papers/icse18_cheng...
| DRMacIver wrote:
| Perses isn't language agnostic, it just knows the syntax of a
| lot of languages because there are antlr grammars for most
| commonly used languages.
|
| Really there's no such thing as a language-agnostic test-case
| reducer. shrink ray is much closer than most, but all this
| means is that it's got some heuristics that work well for a
| wide variety of common languages (e.g. the bracket balancing
| thing). It's also got a bunch of language-specific passes.
|
| This is sortof inherent to the problem, because in order to
| get good results and good performance, a test-case reducer
| has to have a strong idea of what sort of transformations are
| likely to work, which in turn means it has to have a strong
| idea of what sort of languages it's likely to be run on.
| imp0cat wrote:
| Interesting. The techniques are quite similar to what one can do
| with Hypothesis in Python (
| https://hypothesis.readthedocs.io/en/latest/quickstart.html ).
| Basically, it generates a bunch of test data and then tries to
| shrink it to the smallest example that fails the test
| automatically.
| spockz wrote:
| This is more generally called property based testing. My first
| encounters with it were quickcheck for erlang and Haskell. It
| has been around for ages.
| DRMacIver wrote:
| This isn't a coincidence. I wrote both.
| mspdx22 wrote:
| There was a talk about some novel ways to approach test suite
| generation and reduction at a PLDI workshop a few weeks ago. That
| work came from the compilers and programming languages tool
| domain:
|
| https://pldi24.sigplan.org/home/egraphs-2024#program
|
| Specifically, the EGSTRA paper.
___________________________________________________________________
(page generated 2024-07-16 23:00 UTC)