[HN Gopher] Chasing a Bug in a SAT Solver
___________________________________________________________________
Chasing a Bug in a SAT Solver
Author : wofo
Score : 69 points
Date : 2024-06-19 06:54 UTC (2 days ago)
(HTM) web link (ochagavia.nl)
(TXT) w3m dump (ochagavia.nl)
| rgovostes wrote:
| If you have a large input that triggers a bug and you want to
| reduce it, it might be worth looking into the Delta Debugging[0]
| algorithm used by C-Reduce[1] to make minimal test cases for
| C-like languages. C-Reduce "happens to do a pretty good job
| reducing the size of programs in languages other than C/C++, such
| as JavaScript and Rust" so it might work out of the box for you.
| (Aside: there's also ddSMT for SMT solvers that take SMT-LIB2
| input.)
|
| 0: https://www.st.cs.uni-saarland.de/dd/
|
| 1: https://github.com/csmith-project/creduce
| kragen wrote:
| thanks, this is fantastic!
| wyldfire wrote:
| I've used creduce (and cvise) to reduce test cases for all
| kinds of inputs, even ones that aren't C-like. If it has
| newline-delimited text, it'll work well (though you should
| adjust the knobs accordingly in order to converge faster).
| andrewchambers wrote:
| I wrote a single file python3 implementation with no other
| dependencies here:
|
| https://github.com/andrewchambers/ddmin-python
|
| It can be used as a cli program or a python library.
| nextaccountic wrote:
| Not sure if this makes sense but is that similar to fuzzing? Or
| can it be used alongside fuzzing
|
| What I am familiar is with property testing libraries that
| automatically shrink failing cases, like quickcheck for
| Haskell, quickcheck for Rust and Hypothesis for Python. I
| suppose that if we modelled C code as an AST one could use
| quickcheck to shrink a test case. Is creduce like this, but
| operating on strings rather than trees?
|
| https://stackoverflow.com/questions/16968549/what-is-a-shrin...
| gergo_barany wrote:
| Yes, the kind of test case reduction that C-Reduce does is
| analogous to QuickCheck's shrinking. Some of C-Reduce's
| transformations are purely "on strings", e.g., "delete a
| physical line from the file", others are a bit more syntax-
| aware, e.g., "delete N tokens", and some actually build an
| AST using Clang and do some semantic transformations like
| "replace function parameter by global variable".
|
| And yes, this is used alongside fuzzing, once a fuzzer (like
| Csmith from the same group as C-Reduce) has found a file that
| provokes a bug.
|
| A reducer is not a fuzzer itself, although with some
| ingenuity it can be made to behave like a fuzzer in certain
| contexts: https://blog.regehr.org/archives/1284
| ajb wrote:
| The Delta Debugging algorithm is an underused tool, which is
| not only applicable to debugging, but any task where you need
| to find a minimal satisfying subset. For example, it can also
| be used to automatically find an (approximately) minimal set of
| cloud permissions needed for some purpose:
| https://github.com/KanoComputing/aws-tools/blob/master/bin/a...
| This is otherwise quite a tedious exercise.
|
| (It's approximate because theoretically you can find a local
| minimum - but for permissions, there's probably only one
| minimum)
| pfdietz wrote:
| A problem with test case reduction occurs when your test
| generator also ensures the test input satisfies some
| properties. This means naive reduction can produce invalid
| inputs for which "failure" doesn't represent a bug. An example
| is in testing C compilers where the C code must not exhibit any
| of the many undefined behaviors described in the C standard.
| Creduce goes to some lengths to work around this problem.
|
| A solution to this is doing the reduction on a tree of random
| numbers provided to the test generator, so the output is always
| a legitimate test input. The Hypothesis testing system takes
| this approach.
| PhilipRoman wrote:
| AFL provides a nice standalone tool "afl-minimize", I've used
| it on several occasions and it works great for certain types of
| inputs.
| JonChesterfield wrote:
| I wrote a CSP solver a while ago. The bugs were alarmingly prone
| to being incomplete enumeration of solutions. The solver would
| say "Yep, here's all 2412 solutions to that problem", but there
| were more solutions out there it missed. I didn't usually know
| how many solutions there would be so it's hard to distinguish
| between "all of them" and "95% of them". However that also threw
| a lot of doubt on when the solver returned unsat - was the
| problem unsat, or was the solver failing to look in part of the
| search space?
|
| I didn't find a plausible testing strategy for convincing myself
| that the last of that class of error was gone and ultimately
| stopped working on it as a result.
| cjfd wrote:
| Maybe one could attempt to use a proof assistant to prove that
| the algorithm is correct?
| nextaccountic wrote:
| Or just prove yourself with pen and paper if the algorithm
| works, using structural induction and other techniques. Then
| it's just a matter of whether the code follows the spec
| CJefferson wrote:
| At present we are a long way from proving general constraint
| solvers are correct, as they are built from some many little
| algorithms that interact in weird ways.
|
| It's an interesting area, and I hope we'll get there in the
| future.
| JonChesterfield wrote:
| For what it's worth I'm now working on a proof assistant so
| yeah, that's the game plan. I think the straight line case
| for the solver is probably OK but there's a huge range of
| optimisations out there and it's far too easy to introduce
| errors there.
| CJefferson wrote:
| The best practical technique is generate random problems, and
| compare your solver with another (or two), and hope you didn't
| both have the same bugs :)
|
| You can also often make progress by "shuffling" problems
| (reordering constraints / variables), or solving sub-problems +
| super-problems.
| hansvm wrote:
| This is my go-to technique for tricky code.
|
| Most of the time it's pretty easy to write a dumb, slow,
| almost certainly correct version of the thing in question and
| just fuzz the two implementations on small inputs. E.g., if
| you're speeding up floating-point division by 2 with bit-
| hacks in the exponent field, compare it to just dividing by 2
| (and for up to 32-bit floats, just check all of them). If
| you're implementing a sort, compare it to bubble-sort. If
| you're implementing a SAT solver, compare that to just
| checking every possible solution (and even for a comparison
| solver that slow you can check every possible problem up to
| ~4-6 terms (depending a bit on how you define a "term") and
| spot-check problems with many terms and 30 independent
| variables).
| CJefferson wrote:
| Yes, I've done exactly this with my constraint solver (
| https://github.com/minion/minion ). Famous last words,
| almost all bugs that have been reported have been
| repeatable with 6 variables of domain size at most 6 (maybe
| even less).
|
| The only exceptions are cases where integer overflow
| occurs, and those really need seperate tests. I'm mostly
| handling that by putting fairly conservative bounds on
| maximum integer values, and asserting if those are ever
| violated, because (particularly as academic software), an
| assertion where your solve might work but is undertested is
| (in my mind) infinitely better than a wrong answer.
| hansvm wrote:
| What sorts of code normally causes integer overflows in
| minion? I have to be super cautious of float overflows in
| most of my code, but usually with integers I'm
| implementing a bitmask or a 64-bit counter or something
| similarly not prone to such problems.
| kragen wrote:
| i've used drmaciver's property testing library hypothesis to find
| bugs in c libraries before; it has shrinking. 'doesn't crash' is
| the easiest property to define, but a bit fiddly to connect to
| property testing libraries, because you have to spawn a
| subprocess. in one case i just used fork, _exit, and wait, but in
| some cases things get ugly then (posix threads guarantee you
| almost nothing, but fortunately i wasn't using threads)
|
| his 'minithesis' is the best way i've found to learn about
| implementing property-based testing
|
| sat and smt solvers have gotten astoundingly good, and most of
| the leading ones are free software. smt-comp shows off new gains
| every year and has a lineup of the leaders
|
| z3 is a particularly easy smt solver to get started with, in part
| because of its convenient python binding
|
| a thing sat/smt and property-based testing have in common is that
| they both solve inverse problems, allowing you to program at a
| more declarative level
| rurban wrote:
| Ah, counting! One of the hardest problems, that's why I suck at
| probability and stats. Knuths Volume 4
| alvincodes wrote:
| I thought this was about SAT physics (used in games, etc)
| stevemk14ebr wrote:
| this article has almost zero content on the bug, the debugging,
| or the fix. I expect more details
| sdwr wrote:
| Yeah, he says "it's too short to be an interesting story", and
| then doesn't get into any specifics
| droelf wrote:
| Resolvo (the SAT solver here) has been really good for us. It
| helped make some conda-forge bots up to 10x faster than the
| previous C-based solver (libsolv) while being memory safe.
|
| The specific bot tests went from taking 60 minutes to ~6 minutes
| which is quite remarkable.
| babel_ wrote:
| Having ended up with a critical bug in the SAT solver I wrote for
| my undergrad thesis, it really can be a challenge to fix without
| clear logs. So, always nice to see a little love for contribution
| through issues and finding minimal ways to reproduce edge cases.
|
| While we do mention how good issue contributions are significant
| and meaningful, we often forget how there's often more to it than
| an initial filing, and may overlook the contributions from those
| that join lengthier issue threads later.
|
| (Oh, and yes, that critical bug did impact the undergrad thesis,
| but it could be worked around, however meant I couldn't show the
| full benefits of the solver.)
| pfdietz wrote:
| I don't see it mentioned here, but SAT solvers are also good
| targets for property based testing, providing failure cases for
| subsequent minimization. There are many properties that can be
| checked, for example that satisfaction-preserving transformations
| on an input do not alter the answer the solver returns.
___________________________________________________________________
(page generated 2024-06-21 23:02 UTC)