[HN Gopher] Automated Test-Case Reduction
       ___________________________________________________________________
        
       Automated Test-Case Reduction
        
       Author : ingve
       Score  : 71 points
       Date   : 2024-07-15 20:03 UTC (1 days ago)
        
 (HTM) web link (www.cs.cornell.edu)
 (TXT) w3m dump (www.cs.cornell.edu)
        
       | reruwe wrote:
       | I find it odd that test-case reduction is billed as a "research
       | skill". To me, this is a basic debugging skill for anyone who
       | works with code.
       | 
       | Maybe it's more critical when you're dealing with "research-
       | quality" code. This post seems to ignore the underlying problem:
       | it's rare to see research code with _any_ tests.
        
         | derdi wrote:
         | As the article says, right at the top: "The plan is to
         | demonstrate techniques that "everyone knows" because everyone,
         | in fact, does not already know them."
        
           | mistercow wrote:
           | This is actually something I find hard to resist in myself as
           | an aging software engineer on HN. When I was young, articles
           | would describe "obvious" things in excruciating detail, and
           | I'd be grateful, because they were often things I didn't
           | know.
           | 
           | Now when I see articles about things I find obvious on the
           | front page -- particularly if they're topics I've seen hit
           | the front page before (which, mind you, means "any time in
           | the last 15 years or so") -- I have to resist an annoyed gut
           | reaction. The fact that knowledge and skills seem obvious and
           | boring once internalized is a kind of terrible quirk of the
           | human condition.
        
             | tialaramex wrote:
             | For whatever it's worth, I enjoy explaining things to
             | others so each time a topic recurs I can do that where it
             | seems appropriate. And hey I can pass on things I didn't
             | know about and then learned here, for example that's why I
             | tell people about WUFFS, can you do all the bounds checking
             | at compile time? Sure, for a price. And that price
             | (generality) might be easily affordable in which case using
             | Rust, never mind C++ is very silly because WUFFS was
             | offering you better performance and total safety if you can
             | do without generality.
        
             | HideousKojima wrote:
             | My issue isn't with articles describing "obvious" things,
             | it's with articles acting like this obvious thing is an
             | amazing new discovery. This is especially funny when you
             | see it in articles about relational database stuff, since
             | most of the fundamentals have been discovered since the
             | 70's.
        
         | Certhas wrote:
         | I don't understand your point. Debugging is research almost by
         | definition, no?
        
       | wyldfire wrote:
       | In addition to c-reduce and shrinkray there's also cvise [1] for
       | reducing tests. It's great.
       | 
       | Also, if you happen to be reducing an LLVM case you can use llvm-
       | reduce [2].
       | 
       | [1] https://github.com/marxin/cvise
       | 
       | [2] https://llvm.org/docs/CommandGuide/llvm-reduce.html
        
       | crvdgc wrote:
       | Very interesting. Previously I've seen two similar ideas:
       | 
       | 1. Reduce the test data by property-based tests' auto shrinking a
       | la QuickCheck [1]
       | 
       | 2. Reduce the change set to a codebase, a la delta debugging [2]
       | 
       | The reducer is able to do both at the same time in a pretty
       | generic way, with some restrictions on the expressivity. Seems to
       | be a good fit for end-to-end tests.
       | 
       | [1]: https://www.cse.chalmers.se/~rjmh/QuickCheck/
       | 
       | [2]: https://dl.acm.org/doi/10.1145/318774.318946
        
       | clord wrote:
       | I used to do something like this all the time with C/C++ compiler
       | tests. I tried lots of fancy tools and stuff, but I kept going
       | back to: expand all macros and retokenize to one token per line
       | (I made a custom build of the preprocessor that had this built
       | in). Then, have a shell script randomly remove lines, and use
       | another script to check that the resulting test case behaves
       | consistently with the failure. It would run for a few hours (or
       | days, for boost problems), then usually you'd get a really
       | minimal testcase that shows the problem. Often I would use this
       | to find regressions. Just have the shell script check one is
       | good, the other has the problem. The resulting output would then
       | usually point exactly at the regressed feature, and make an
       | amazing unit test.
       | 
       | Before leaving compiler team, I wanted to find a way to do this
       | one the AST level (so parens always go in pairs, etc), but that
       | could also complect with the bug.
       | 
       | I wonder if LLMs could accelerate this by more intelligently
       | removing stuff first, iteratively?
        
         | derdi wrote:
         | Sophisticated reducers like C-Reduce do know things like that
         | parens go in pairs. C-Reduce has many transformation
         | operations, and while some are syntax agnostic (delete a few
         | characters or tokens), others use Clang to try to parse the
         | input as C++ and transform the AST.
         | 
         | Perses is a reducer that works on an AST and claims to only try
         | syntactically valid candidate inputs. It also claims to be
         | language agnostic, and I don't know how these two things go
         | together. But it does work nicely for Java for me.
         | https://github.com/uw-pluverse/perses /
         | https://faculty.cc.gatech.edu/~qzhang414/papers/icse18_cheng...
        
           | DRMacIver wrote:
           | Perses isn't language agnostic, it just knows the syntax of a
           | lot of languages because there are antlr grammars for most
           | commonly used languages.
           | 
           | Really there's no such thing as a language-agnostic test-case
           | reducer. shrink ray is much closer than most, but all this
           | means is that it's got some heuristics that work well for a
           | wide variety of common languages (e.g. the bracket balancing
           | thing). It's also got a bunch of language-specific passes.
           | 
           | This is sortof inherent to the problem, because in order to
           | get good results and good performance, a test-case reducer
           | has to have a strong idea of what sort of transformations are
           | likely to work, which in turn means it has to have a strong
           | idea of what sort of languages it's likely to be run on.
        
       | imp0cat wrote:
       | Interesting. The techniques are quite similar to what one can do
       | with Hypothesis in Python (
       | https://hypothesis.readthedocs.io/en/latest/quickstart.html ).
       | Basically, it generates a bunch of test data and then tries to
       | shrink it to the smallest example that fails the test
       | automatically.
        
         | spockz wrote:
         | This is more generally called property based testing. My first
         | encounters with it were quickcheck for erlang and Haskell. It
         | has been around for ages.
        
         | DRMacIver wrote:
         | This isn't a coincidence. I wrote both.
        
       | mspdx22 wrote:
       | There was a talk about some novel ways to approach test suite
       | generation and reduction at a PLDI workshop a few weeks ago. That
       | work came from the compilers and programming languages tool
       | domain:
       | 
       | https://pldi24.sigplan.org/home/egraphs-2024#program
       | 
       | Specifically, the EGSTRA paper.
        
       ___________________________________________________________________
       (page generated 2024-07-16 23:00 UTC)