[HN Gopher] Chasing a Bug in a SAT Solver
       ___________________________________________________________________
        
       Chasing a Bug in a SAT Solver
        
       Author : wofo
       Score  : 69 points
       Date   : 2024-06-19 06:54 UTC (2 days ago)
        
 (HTM) web link (ochagavia.nl)
 (TXT) w3m dump (ochagavia.nl)
        
       | rgovostes wrote:
       | If you have a large input that triggers a bug and you want to
       | reduce it, it might be worth looking into the Delta Debugging[0]
       | algorithm used by C-Reduce[1] to make minimal test cases for
       | C-like languages. C-Reduce "happens to do a pretty good job
       | reducing the size of programs in languages other than C/C++, such
       | as JavaScript and Rust" so it might work out of the box for you.
       | (Aside: there's also ddSMT for SMT solvers that take SMT-LIB2
       | input.)
       | 
       | 0: https://www.st.cs.uni-saarland.de/dd/
       | 
       | 1: https://github.com/csmith-project/creduce
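        | 
        | The core ddmin loop is small, by the way. A rough Python
        | sketch of the complement-only variant (is_interesting is a
        | placeholder for whatever check reproduces the bug):
        | 
        |   def ddmin(items, is_interesting):
        |       # items: list of test "atoms" (e.g. input lines).
        |       # is_interesting(candidate) must return True whenever
        |       # the candidate still triggers the bug.
        |       assert is_interesting(items)
        |       n = 2
        |       while len(items) >= 2:
        |           # Split into n roughly equal chunks.
        |           cuts = [len(items) * i // n for i in range(n + 1)]
        |           parts = [items[cuts[i]:cuts[i + 1]]
        |                    for i in range(n)]
        |           for i in range(n):
        |               # Candidate = everything except chunk i.
        |               cand = [x for j, p in enumerate(parts)
        |                       if j != i for x in p]
        |               if is_interesting(cand):
        |                   items, n = cand, max(n - 1, 2)
        |                   break
        |           else:
        |               # Nothing could be removed: refine granularity.
        |               if n >= len(items):
        |                   break
        |               n = min(2 * n, len(items))
        |       return items
        | 
        | C-Reduce layers many smarter transformation passes on top of
        | that idea, which is a big part of why it converges so much
        | faster on real code.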
        
         | kragen wrote:
         | thanks, this is fantastic!
        
         | wyldfire wrote:
         | I've used creduce (and cvise) to reduce test cases for all
         | kinds of inputs, even ones that aren't C-like. If it has
         | newline-delimited text, it'll work well (though you should
         | adjust the knobs accordingly in order to converge faster).
        
         | andrewchambers wrote:
          | I wrote a single-file Python 3 implementation with no other
          | dependencies here:
          | 
          | https://github.com/andrewchambers/ddmin-python
          | 
          | It can be used as a CLI program or as a Python library.
        
         | nextaccountic wrote:
          | Not sure if this makes sense, but is that similar to
          | fuzzing? Or can it be used alongside fuzzing?
          | 
          | What I am familiar with is property-testing libraries that
          | automatically shrink failing cases, like QuickCheck for
          | Haskell, quickcheck for Rust, and Hypothesis for Python. I
          | suppose that if we modelled C code as an AST one could use
          | QuickCheck to shrink a test case. Is creduce like this, but
          | operating on strings rather than trees?
         | 
         | https://stackoverflow.com/questions/16968549/what-is-a-shrin...
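          | 
          | (To be concrete, by "shrinking" I mean the behaviour in
          | this toy Hypothesis example: the property is deliberately
          | false, and Hypothesis first finds some failing list, then
          | reports a near-minimal one, typically something like
          | [101].)
          | 
          |   from hypothesis import given, strategies as st
          | 
          |   @given(st.lists(st.integers()))
          |   def test_sum_is_small(xs):
          |       # Deliberately false property; the reported
          |       # counterexample gets shrunk before it is shown.
          |       assert sum(xs) <= 100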
        
           | gergo_barany wrote:
           | Yes, the kind of test case reduction that C-Reduce does is
           | analogous to QuickCheck's shrinking. Some of C-Reduce's
           | transformations are purely "on strings", e.g., "delete a
           | physical line from the file", others are a bit more syntax-
           | aware, e.g., "delete N tokens", and some actually build an
           | AST using Clang and do some semantic transformations like
           | "replace function parameter by global variable".
           | 
           | And yes, this is used alongside fuzzing, once a fuzzer (like
           | Csmith from the same group as C-Reduce) has found a file that
           | provokes a bug.
           | 
           | A reducer is not a fuzzer itself, although with some
           | ingenuity it can be made to behave like a fuzzer in certain
           | contexts: https://blog.regehr.org/archives/1284
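            | 
            | In practice the glue between the reducer and the tool
            | under test is just an "interestingness test". A minimal
            | sketch (the solver path and the error string are
            | placeholders for whatever reproduces the bug):
            | 
            |   import subprocess, sys
            | 
            |   def is_interesting(path):
            |       # Re-run the tool on the candidate input and
            |       # decide whether it still shows the bug.
            |       try:
            |           proc = subprocess.run(
            |               ["./my_solver", path],
            |               capture_output=True, text=True, timeout=60)
            |       except subprocess.TimeoutExpired:
            |           return False  # hangs are a different bug
            |       return "assertion failed" in proc.stderr
            | 
            |   if __name__ == "__main__":
            |       # Most reducers treat exit code 0 as "interesting".
            |       sys.exit(0 if is_interesting(sys.argv[1]) else 1)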
        
         | ajb wrote:
          | The Delta Debugging algorithm is an underused tool, which is
          | not only applicable to debugging but to any task where you
          | need to find a minimal satisfying subset. For example, it
          | can also be used to automatically find an (approximately)
          | minimal set of cloud permissions needed for some purpose:
          | https://github.com/KanoComputing/aws-tools/blob/master/bin/a...
          | This is otherwise quite a tedious exercise.
          | 
          | (It's approximate because in theory you can end up at a
          | local minimum - but for permissions, there's probably only
          | one minimum.)
        
         | pfdietz wrote:
         | A problem with test case reduction occurs when your test
         | generator also ensures the test input satisfies some
         | properties. This means naive reduction can produce invalid
         | inputs for which "failure" doesn't represent a bug. An example
         | is in testing C compilers where the C code must not exhibit any
         | of the many undefined behaviors described in the C standard.
         | Creduce goes to some lengths to work around this problem.
         | 
         | A solution to this is doing the reduction on a tree of random
         | numbers provided to the test generator, so the output is always
         | a legitimate test input. The Hypothesis testing system takes
         | this approach.
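          | 
          | A toy version of that idea: the generator draws every
          | decision from a flat list of numbers, and any list
          | (including a shortened one) maps to a well-formed formula,
          | so reduction operates on the numbers instead of on the
          | generated input. (triggers_bug is a placeholder for the
          | actual failure check.)
          | 
          |   def gen_cnf(choices, num_vars=5):
          |       # Every draw comes from `choices`; running out just
          |       # ends the formula, so any choice list maps to a
          |       # valid (possibly empty) CNF.
          |       it = iter(choices)
          |       def draw(modulus):
          |           try:
          |               return next(it) % modulus
          |           except StopIteration:
          |               return None
          |       cnf = []
          |       while True:
          |           width = draw(3)
          |           if width is None:
          |               return cnf
          |           clause = []
          |           for _ in range(width + 1):
          |               lit = draw(2 * num_vars)
          |               if lit is None:
          |                   return cnf
          |               var = lit // 2 + 1
          |               clause.append(var if lit % 2 == 0 else -var)
          |           cnf.append(clause)
          | 
          |   # Reduce the choice list with a ddmin-style reducer;
          |   # every candidate it tries is a legitimate formula:
          |   #   small = ddmin(choices,
          |   #                 lambda c: triggers_bug(gen_cnf(c)))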
        
         | PhilipRoman wrote:
          | AFL provides a nice standalone minimizer, "afl-tmin"; I've
          | used it on several occasions and it works great for certain
          | types of inputs.
        
       | JonChesterfield wrote:
        | I wrote a CSP solver a while ago. The bugs were alarmingly
        | prone to being incomplete enumeration of solutions. The solver
        | would say "Yep, here's all 2412 solutions to that problem",
        | but there were more solutions out there that it missed. I
        | usually didn't know how many solutions there should be, so it
        | was hard to distinguish between "all of them" and "95% of
        | them". That also threw a lot of doubt on the cases where the
        | solver returned unsat - was the problem really unsat, or was
        | the solver failing to look in part of the search space?
        | 
        | I never found a plausible testing strategy for convincing
        | myself that the last of that class of error was gone, and
        | ultimately stopped working on it as a result.
        
         | cjfd wrote:
         | Maybe one could attempt to use a proof assistant to prove that
         | the algorithm is correct?
        
           | nextaccountic wrote:
            | Or just prove to yourself, with pen and paper, that the
            | algorithm works, using structural induction and other
            | techniques. Then it's just a matter of whether the code
            | follows the spec.
        
           | CJefferson wrote:
            | At present we are a long way from proving general
            | constraint solvers correct, as they are built from so many
            | little algorithms that interact in weird ways.
           | 
           | It's an interesting area, and I hope we'll get there in the
           | future.
        
           | JonChesterfield wrote:
            | For what it's worth, I'm now working on a proof assistant,
            | so yeah, that's the game plan. I think the straight-line
            | case for the solver is probably OK, but there's a huge
            | range of optimisations out there and it's far too easy to
            | introduce errors there.
        
         | CJefferson wrote:
          | The best practical technique is to generate random problems,
          | compare your solver with another (or two), and hope you
          | don't both have the same bugs :)
         | 
         | You can also often make progress by "shuffling" problems
         | (reordering constraints / variables), or solving sub-problems +
         | super-problems.
        
           | hansvm wrote:
           | This is my go-to technique for tricky code.
           | 
           | Most of the time it's pretty easy to write a dumb, slow,
           | almost certainly correct version of the thing in question and
           | just fuzz the two implementations on small inputs. E.g., if
           | you're speeding up floating-point division by 2 with bit-
           | hacks in the exponent field, compare it to just dividing by 2
           | (and for up to 32-bit floats, just check all of them). If
           | you're implementing a sort, compare it to bubble-sort. If
            | you're implementing a SAT solver, compare it to just
            | checking every possible assignment (and even with a
            | comparison solver that slow, you can check every possible
            | problem up to ~4-6 terms, depending a bit on how you
            | define a "term", and spot-check problems with many terms
            | and 30 independent variables).
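            | 
            | A sketch of that kind of oracle, with the formula as a
            | list of integer-literal clauses (my_solver is a
            | placeholder for the solver under test):
            | 
            |   from itertools import product
            | 
            |   def brute_force_sat(clauses, num_vars):
            |       # Try every assignment; hopeless for real inputs,
            |       # fine as an oracle for tiny ones.
            |       for bits in product([False, True], repeat=num_vars):
            |           assign = {i + 1: b for i, b in enumerate(bits)}
            |           if all(any(assign[abs(l)] == (l > 0)
            |                      for l in clause)
            |                  for clause in clauses):
            |               return True
            |       return False
            | 
            |   def check(clauses, num_vars):
            |       # my_solver is the (placeholder) solver under test.
            |       assert my_solver(clauses) == \
            |           brute_force_sat(clauses, num_vars)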
        
             | CJefferson wrote:
             | Yes, I've done exactly this with my constraint solver (
             | https://github.com/minion/minion ). Famous last words,
             | almost all bugs that have been reported have been
             | repeatable with 6 variables of domain size at most 6 (maybe
             | even less).
             | 
              | The only exceptions are cases where integer overflow
              | occurs, and those really need separate tests. I'm mostly
              | handling that by putting fairly conservative bounds on
              | maximum integer values and asserting if those are ever
              | violated, because (particularly for academic software)
              | an assertion failure where your solver might have worked
              | but is undertested is (in my mind) infinitely better
              | than a wrong answer.
        
               | hansvm wrote:
                | What sorts of code normally cause integer overflows in
                | minion? I have to be super cautious about float
                | overflows in most of my code, but with integers I'm
                | usually implementing a bitmask or a 64-bit counter or
                | something similarly not prone to such problems.
        
       | kragen wrote:
       | i've used drmaciver's property testing library hypothesis to find
       | bugs in c libraries before; it has shrinking. 'doesn't crash' is
       | the easiest property to define, but a bit fiddly to connect to
       | property testing libraries, because you have to spawn a
       | subprocess. in one case i just used fork, _exit, and wait, but in
       | some cases things get ugly then (posix threads guarantee you
       | almost nothing, but fortunately i wasn't using threads)
       | 
       | his 'minithesis' is the best way i've found to learn about
       | implementing property-based testing
       | 
       | sat and smt solvers have gotten astoundingly good, and most of
       | the leading ones are free software. smt-comp shows off new gains
       | every year and has a lineup of the leaders
       | 
       | z3 is a particularly easy smt solver to get started with, in part
       | because of its convenient python binding
       | 
       | a thing sat/smt and property-based testing have in common is that
       | they both solve inverse problems, allowing you to program at a
       | more declarative level
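        | 
        | for example, a three-clause toy through the python bindings
        | looks roughly like this (the pypi package is z3-solver):
        | 
        |   from z3 import Bools, Or, Not, Solver, sat
        | 
        |   a, b, c = Bools("a b c")
        |   s = Solver()
        |   s.add(Or(a, b), Or(Not(a), c), Or(Not(b), Not(c)))
        |   if s.check() == sat:
        |       print(s.model())  # e.g. [a = True, c = True, b = False]
        |   else:
        |       print("unsat")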
        
       | rurban wrote:
        | Ah, counting! One of the hardest problems; that's why I suck
        | at probability and stats. Knuth's Volume 4.
        
       | alvincodes wrote:
        | I thought this was about the SAT (separating axis theorem)
        | used in game physics, etc.
        
       | stevemk14ebr wrote:
       | this article has almost zero content on the bug, the debugging,
       | or the fix. I expect more details
        
         | sdwr wrote:
         | Yeah, he says "it's too short to be an interesting story", and
         | then doesn't get into any specifics
        
       | droelf wrote:
       | Resolvo (the SAT solver here) has been really good for us. It
       | helped make some conda-forge bots up to 10x faster than the
       | previous C-based solver (libsolv) while being memory safe.
       | 
        | The specific bot tests went from taking 60 minutes to ~6
        | minutes, which is quite remarkable.
        
       | babel_ wrote:
        | Having ended up with a critical bug in the SAT solver I wrote
        | for my undergrad thesis, I know it really can be a challenge
        | to fix without clear logs. So it's always nice to see a little
        | love for contributing through issues and finding minimal ways
        | to reproduce edge cases.
        | 
        | While we do say that good issue contributions are significant
        | and meaningful, we often forget that there's more to it than
        | the initial filing, and may overlook the contributions from
        | those who join lengthier issue threads later.
        | 
        | (Oh, and yes, that critical bug did impact the undergrad
        | thesis; it could be worked around, but that meant I couldn't
        | show the full benefits of the solver.)
        
       | pfdietz wrote:
        | I don't see it mentioned here, but SAT solvers are also good
        | targets for property-based testing, providing failure cases
        | for subsequent minimization. There are many properties that
        | can be checked; for example, satisfaction-preserving
        | transformations on an input should not alter the answer the
        | solver returns.
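        | 
        | A sketch of one such check: permute the clauses and
        | consistently rename the variables; the verdict must not
        | change (my_solver is a placeholder for the solver under
        | test):
        | 
        |   import random
        | 
        |   def renamed(clauses, num_vars, rng):
        |       # Random variable permutation plus clause shuffle;
        |       # both preserve satisfiability.
        |       perm = list(range(1, num_vars + 1))
        |       rng.shuffle(perm)
        |       new = [[(1 if l > 0 else -1) * perm[abs(l) - 1]
        |               for l in c] for c in clauses]
        |       rng.shuffle(new)
        |       return new
        | 
        |   def check_invariance(clauses, num_vars, trials=100):
        |       rng = random.Random(0)
        |       expected = my_solver(clauses)  # solver under test
        |       for _ in range(trials):
        |           shuffled = renamed(clauses, num_vars, rng)
        |           assert my_solver(shuffled) == expected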
        
       ___________________________________________________________________
       (page generated 2024-06-21 23:02 UTC)