[HN Gopher] You can use C-Reduce for any language
___________________________________________________________________
You can use C-Reduce for any language
Author : Tomte
Score : 446 points
Date : 2024-11-27 17:56 UTC (1 day ago)
(HTM) web link (bernsteinbear.com)
(TXT) w3m dump (bernsteinbear.com)
| jchw wrote:
| I did not know about C-Reduce until just now, and I'm already
| hooked. This is basically like discovering git bisect for the
| first time.
|
| I'll have to stuff that into the back of my mind until I run into
| something where it might fit.
| __MatrixMan__ wrote:
| Same story, I can't wait to try this.
| geon wrote:
| I ran into some issues with what might be compiler bugs in
| cc65, a C compiler for the 6502 processor (c64, NES, Apple 1).
|
| I might see if I can get this set up. Apparently VICE supports
| "printing" to a file on the host OS, so I could run my test on
| the emulator.
| brightball wrote:
| ...so I just discovered git bisect for the first time and this
| is pretty cool.
| crdrost wrote:
| Yes! The actual commands that you have to search up to get it
| to run automatically without user input, take almost as long
| to put together as finding the bug that you're finding (in
| the 80% case; of course there are 20% cases that take
| forever), but there is something so satisfying about seeing
| the thing just humming along once you have it set up.
| dataflow wrote:
| > Yes! The actual commands that you have to search up to
| get it to run automatically without user input, take almost
| as long to put together as finding the bug that you're
| finding
|
| Funny, I just wrote the exact same thing only to scroll
| down and see you had the same opinion:
|
| https://news.ycombinator.com/item?id=42262579
| com2kid wrote:
| Doing this manually was part of my first job out of college on
| a C/C++ compiler team. Kind of amazing that there is automation
| that can accomplish the same thing!
| pfdietz wrote:
| It's great when combined with random test input generators.
| eru wrote:
| Yes, most property based testing libraries do this.
| pfdietz wrote:
| Hypothesis in particular has a very clever way to do test
| reduction. The key problem is that if your test generator
| is enforcing some constraint on the input, it's not
| necessarily the case that naive pruning will preserve the
| constraint. Hypothesis has a fairly general way of
| enforcing that the generated test input continues to
| satisfy constraints.
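A toy sketch of that constraint problem (illustrative only: the strategy and names below are invented, and Hypothesis really shrinks the underlying random choice sequence, which keeps the generator's constraints satisfied by construction):

```python
# Toy sketch of constraint-preserving shrinking (illustrative only;
# not Hypothesis's actual mechanism).

def shrink(xs, constraint, interesting):
    """Greedily shrink a list of ints while keeping BOTH the
    generator's invariant and the bug-triggering property."""
    size = lambda ys: (len(ys), sum(ys))
    improved = True
    while improved:
        improved = False
        # Candidate reductions: drop one element, or halve one element.
        candidates = [xs[:i] + xs[i + 1:] for i in range(len(xs))]
        candidates += [xs[:i] + [xs[i] // 2] + xs[i + 1:]
                       for i in range(len(xs))]
        for cand in candidates:
            if size(cand) < size(xs) and constraint(cand) and interesting(cand):
                xs, improved = cand, True
                break
    return xs

# Invariant enforced by the (imaginary) generator: all values are even.
constraint = lambda ys: all(y % 2 == 0 for y in ys)
# The "bug" fires whenever some value is >= 10.
interesting = lambda ys: any(y >= 10 for y in ys)

print(shrink([4, 40, 100], constraint, interesting))  # -> [50]
```

Without the `constraint(cand)` check, the halving pass would hand `[25]` to the test harness, an input the generator could never have produced; that is exactly the naive-pruning failure described above.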
| eru wrote:
| Yes, Hypothesis is great. And 'shrinkray' mentioned
| elsewhere in the comments here seems to be by the same
| author.
| dataflow wrote:
| > I did not know about C-Reduce until just now, and I'm already
| hooked. This is basically like discovering git bisect for the
| first time.
|
| I've known about git bisect for maybe a decade, and used it
| maybe... twice so far? And I think at least one of those times
| it took me longer to understand the command line options than
| it took to just do it manually.
|
| YMMV, maybe it's more useful in a larger collaborative
| codebase, but to me it sounds life-changing in theory but I
| haven't found it that in practice.
| jchw wrote:
| It's pretty incredible if I'm trying to find a breakage in a
| repo I don't understand. Build's failing? Easy enough:
    git bisect start master v1.1         # bisect from v1.1 to current master
    git bisect run cmake --build .build  # find the first commit where
                                         # `cmake --build` returns non-zero
|
| No need to bother trying to understand the error message, I
| can go make coffee instead and then just look at the commit
| that broke it. Also, this is really useful for and in
| conjunction with Nixpkgs.
|
| It's not as useful for personal projects because chances are
| it won't tell you anything you don't already know.
| dataflow wrote:
| > It's pretty incredible if I'm trying to find a breakage
| in a repo I don't understand.
|
| > Build's failing? Easy enough:
|
| > It's not as useful for personal projects
|
| How often do you run into this situation, though? For your
| scenario to arise, you need:
|
| (1) A repo you're unfamiliar with (which is already unusual
| for most people... most people spend most of their time
| working on repositories they've become familiar with)
|
| (2) The unfamiliar repo to use standard build commands you
| already know, which is basically never the case in my
| experience (even cmake-based projects almost always need
| variables defined with custom names I don't know
| beforehand, and the configuration process is so onerous I
| find the GUI easier)
|
| (3) An obscure enough breakage where the output is complete
| garbage (otherwise the compile error would just tell you
| the file and line, and git blame would tell you the commit
| directly)
|
| (4) Knowledge that this unfamiliar repo _actually would
| have built on your machine_ at some point (which
| is often _not_ the case because cmake isn't hermetic and
| breakages are often due to having the wrong toolchain
| installed)
|
| (5) For the problem to be a mere build error, as opposed to
| some running of the program that requires investing time to
| automate
|
| (6) For you to not have any sort of prior CI or other form
| of validation on your dependencies, because you apparently
| didn't even catch a build error when it was
| introduced (never mind automated testing)
|
| I could go on, but this set of circumstances is more or
| less indistinguishable from ~never to me.
| _flux wrote:
| Well, the conditions were satisfied with pipewire, which
| at some point introduced a dependency that was not
| satisfiable in Debian Stable.
|
| So, for now, I'm stuck with the version one commit before
| that dependency :). (I haven't tried undoing the change
| later; I suspect it might be depended upon later on..)
| jchw wrote:
| > (1) A repo you're unfamiliar with (which is already
| unusual for most people... most people spend most of
| their time working on repositories they've become
| familiar with)
|
| I don't understand a _ton_ of things. I've bisected
| Linux, Chromium, Wine, Nixpkgs, and many more. These are
| projects whose codebases I was never going to understand;
| they are too big for one person to ever fully
| grasp. Then there are smaller codebases where I could if I
| wanted to, but if I don't have to, why bother?
|
| > (2) The unfamiliar repo to use standard build commands
| you already know, which is basically never the case in my
| experience (even cmake-based projects almost always need
| variables defined with custom names I don't know
| beforehand, and the configuration process is so onerous I
| find the GUI easier)
|
| In real life I make great use of Nixpkgs' understanding
| of how to build things. Nix development shells take care
| of the dependencies and the build commands. However, I do
| have confidence that I can figure out the actual desired
| build commands of _most_ random repositories in under
| five minutes. This applies to private repos at work and
| random open source projects. Even for autotools projects.
| Usually, not that much hoopla is needed; most CMake
| projects don't need custom cache variables to make a
| basic functional build.
|
| > (3) An obscure enough breakage where the output is
| complete garbage (otherwise the compile error would just
| tell you the file and line, and git blame would tell you
| the commit directly)
|
| Nah, not necessarily; really just any breakage where
| you're not sure what happened. Compiler errors are not a
| great use case for git bisect, but _test failures_ are a
| really good use case. Likewise, git bisect is great for
| crashes. Even if you can't automate the full process
| easily, for e.g. a GUI program, it's still a lot less
| work to periodically repeat some steps to crash or
| cleanly exit a GUI program to help the bisect along.
| Still, some compiler errors are great for Git bisect.
| Basically any weird linker error is a good candidate.
|
| > (4) Knowledge that this unfamiliar repo actually would
| have built on your machine at some point (which
| is often not the case because cmake isn't hermetic and
| breakages are often due to having the wrong toolchain
| installed)
|
| Or just test it.
|
| > cmake isn't hermetic
|
| Nix is, so no problem!
|
| > (5) For the problem to be a mere build error, as
| opposed to some running of the program that requires
| investing time to automate
|
| Not true. It's harder to automate GUI failures, but that
| does not mean bisecting isn't incredibly valuable for GUI
| apps. Remember, in a bisect where you never have to skip
| a commit (i.e. intermediate builds _rarely_ fail for
| unrelated reasons, which is surprisingly common), the
| bisect only ever takes log2(n) steps, so even if you are
| in a repo that has a very huge commit volume, the bisect
| will be at most a handful of steps. Repeating the same
| GUI action like 10 times is hardly a big chore,
| especially for a machine to magically crank out the exact
| origin of where the thing broke. For CLI apps, it's even
| easier since you can still use bisect run.
|
| > (6) For you to not have any sort of prior CI or other
| form of validation on your dependencies, because you
| apparently didn't even catch a build error when it
| was introduced (never mind automated testing)
|
| Not true. Everyone uses CI, but CI will _never_ catch
| 100% of regressions. There's a combinatorial explosion
| of potential build environments and there's _no way_ a CI
| environment can test every possibility. You can't test
| with every library version, every release of every
| compiler, every combination of using a vendored library
| vs. a system library, and so on.
|
| > I could go on, but this set of circumstances is more
| or less indistinguishable from ~never to me.
|
| For me it happens more often touching open source things
| than at work, but really the probability it will be
| useful increases as the size and inscrutability of a
| project increases. For trying to debug Chromium or
| Electron issues, bisect reigns supreme. (My current
| workplace doesn't ship software built on top of Chromium,
| but replace Chromium with any other big and inscrutable
| dependency and you get the idea.)
|
| I suspect it may be the case that I bother to root cause
| and debug things that other people would simply
| completely write off as impossible or not worth the
| debugging, thanks to the power of Git bisect.
| dataflow wrote:
| Just to clarify: I wasn't disagreeing with the use cases
| you described in _this_ comment (which are certainly
| valid, just not common for most people I think), but
| rather the use case in your _previous_ comment. The
| scenarios are basically the opposite of each other, hence
| my replies above. For example:
|
| >> Build's failing? git bisect run cmake --build .build
|
| > Compiler errors are not a great use case for git
| bisect, but _test failures_ are a really good use case
|
| These aren't build failures.
|
| > For trying to debug Chromium or Electron issues, bisect
| reigns supreme.
|
| Neither of these is a simple CMake build! Even their
| build process changes over time.
|
| (Side note, I'm surprised that for a massive third party
| dependency that you don't understand, the specific commit
| is what you really want. I would've thought you'd just go
| to the previous version, which is presumably more stable
| and supported too.)
|
| >> No need to bother trying to understand the error
| message, I can go make coffee instead and then just look
| at the commit that broke it.
|
| > Not true. It's harder to automate GUI failures, but
| that does not mean bisecting isn't incredibly valuable
| for GUI apps.
|
| I'm not sure how we got into discussing GUIs, but if
| you're debugging a dependency that's a GUI, then having
| to bisect it (as useful as that is) is very much the
| opposite of "no need to understand it, just go grab a
| coffee"!
|
| All of which is to say: I'm not claiming git bisect is
| useless by any means. It's nice that it's there and you
| can use it. I'm just saying that it's often (often !=
| always) a small difference in cost vs. bisecting
| manually.
| jchw wrote:
| > These aren't build failures.
|
| While they are not _compilation_ failures, I do sort of
| consider unit test failures to be _build_ failures if the
| unit tests are typically run as part of the build. But
| either way, I do still use bisect to find the origin of
| weird compilation errors, FWIW. As an example, I used it
| on Envoy proxy to figure out a really strange linker
| error on Darwin. It's just not great for e.g. a syntax
| error, because compiler diagnostics will give a more
| obvious answer most of the time. I do agree with that.
|
| > Neither of these is a simple CMake build! Even their
| build process changes over time.
|
| Chromium and Electron are both fairly easy builds. Not
| because they are simple to build, but rather because the
| build process is extremely automated and documented. (The
| real hard part is balancing concurrency with OOM:
| Electron defaults to -j200 and on my 7950X3D it will
| happily eat _all_ of the 128 GiB of RAM. Keeping modern
| CPU cores fed is hard work...)
|
| > (Side note, I'm surprised that for a massive third
| party dependency that you don't understand, the specific
| commit is what you really want. I would've thought you'd
| just go to the previous version, which is presumably more
| stable and supported too.)
|
| Well, I want the bug to be fixed. Even if I am not going
| to submit the actual patch that fixes it, providing
| bisect details in my bug report will vastly increase the
| odds that it gets fixed. However, with the magic of
| bisect I often _can_ make working patches for complex
| bugs and apply them. If it's software I use on my
| system, I can then add a temporary patch in my Nix
| configuration and continue to run the latest version. I
| don't always do this, but it is very useful. I don't like
| to rely on hoping someone else will figure out my bugs if
| I can help it.
|
| I am also a package maintainer in some cases. If I run
| into a bug on something I maintain a package for, I will
| often send a PR fixing it and then pull that patch into
| my package temporarily.
|
| > I'm not sure how we got into discussing GUIs, but if
| you're debugging a dependency that's a GUI, then having
| to bisect it (as useful as that is) is very much the
| opposite of "no need to understand it, just go grab a
| coffee"!
|
| I'm not sure what you mean, I'm just saying that git
| bisect works even for complicated runtime issues. There's
| no need to debug the GUI: all you have to do is tell git
| bisect if the bug is still there in whatever commit it
| dumps you in. You do it a few times and you have the
| commit that broke things. It's magic every time.
|
| As an example of a GUI failure I've "debugged" with this,
| I found a regression in Wine using this technique.
|
| > All of which is to say: I'm not claiming git bisect is
| useless by any means. It's nice that it's there and you
| can use it. I'm just saying that it's often (often !=
| always) a small difference in cost vs. bisecting
| manually.
|
| Bisecting manually? I don't understand. Do you mean still
| using git bisect without using the run command? I do that
| when debugging more complex failures so I can skip
| commits broken for unrelated reasons. Or, do you mean
| literally manually picking commits to test and bisecting
| like that? If so, I'm confused. Starting a git bisect is
| a single command where you pass it the bad and good
| commit. The rest of the commands don't even have to be
| memorized because it tells you what to run (not that it's
| hard, since it's just git bisect good, git bisect bad, git
| bisect skip...). I cannot fathom it being less work to do
| this manually, especially since git bisect handles
| complex situations where there are diverging merges to
| explore.
|
| Even at Google, where the CL numbers were pretty much
| linear, we _still_ had a bunch of tooling around
| bisecting google3 to find regressions. I'm a little
| surprised it's any question that it's worth it, you can
| bisect 100,000 commits in ~17 steps without sitting there
| trying to figure out which commits you need to pick.
| That's incredible value and it has helped me root cause
| bugs in all kinds of stuff, like pure magic.
|
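The step counts above are easy to sanity-check: a bisect over n commits needs about ceil(log2(n)) build-and-test probes.

```python
# Sanity check of the step counts above: a bisect over n commits
# needs about ceil(log2(n)) build-and-test probes.
import math

for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>10} commits -> {math.ceil(math.log2(n))} steps")
# 100,000 commits comes out to 17 steps, matching the claim above.
```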
| Anyway, I'm not trying to be condescending, but as you
| may have gathered I actually bisect fairly frequently and
| I also am very sure it is saving me time every time I do
| it. Especially since I reckon more than half the time I
| don't really have to do anything, so even when the build
| can take a long time it's still not a problem.
| dataflow wrote:
| I should perhaps clarify: this whole discussion I'm
| referring to the usefulness of the _git bisect_ command
| specifically, not _bisection_ as a technique. So these
| weren't in any way intended to be a comment on e.g.
| tooling you may have had at Google. I bisect things _all
| the time_ -- across files, across lines, and across time.
| It's just ~never via git-bisect.
|
| Please also note I have no doubt that _your_ workflow
| gets immense value from git-bisect! Almost every tool is
| incredibly valuable to a certain subset of users and
| workflows. What I've been debating is just how useful I
| see it being to the most typical users. I know I haven't
| actually come across _anyone_ using git-bisect for
| several years now, and I myself haven't either.
| Bisection as a technique is incredibly useful for
| everybody -- it's just not something I see done via git-
| bisect with any frequency, is all.
|
| > Bisecting manually? Do you mean literally manually
| picking commits to test and bisecting like that? If so,
| I'm confused
|
| That's what I mean, yeah.
|
| > The rest of the commands don't even have to be
| memorized because it tells you what to run (not that it's
| hard since its just git bisect good, git bisect bad, git
| bisect skip...) I cannot fathom it being less work to do
| this manually
|
| While I didn't actually intend to say it's _faster_ to
| avoid git-bisect -- just that it's a _similar_ amount of
| effort -- now that you mention it, I actually _do_ think
| it is often faster for me. There are multiple reasons
| why:
|
| - Despite it taking a few more iterations in the worst
| case in theory, manually picking my own commits lets me
| split based on e.g. version numbers, rather than random
| commits. I find it much, much more useful to know that a
| problem came up between v4.3.1 and v4.4, rather than
| between commit 1133728 and commit 1133262, say. And I
| know -- and this is key -- that those commits will be
| well-tested, stable, and released to the public. So I
| don't have to worry about encountering random bugs in the
| middle, even unrelated ones. By contrast, "git bisect
| skip" might seem easy to type in 3 seconds, but if you
| encounter a _single_ failed build, then you lose all that
| time you "saved", and then an _enormous_ amount on top
| of that, merely by virtue of having to build & test
| those failed iterations. I really, really wouldn't want
| to do a full build of Chromium -- even an "incremental"
| one -- just to encounter a bug in it and have to skip to
| another commit!
|
| - I actually almost always have _some_ idea of where the
| bug is, merely based on the tags/versions, or commit
| messages, or just the timestamps. Sometimes, literally
| searching for a keyword pops up a relevant commit for me
| to test, and then doing an incremental build with the
| commit immediately after is _much_ faster (almost free)
| compared to doing what ends up being a ~full rebuild for
| a distant commit. So it actually _does_ end up being far
| fewer iterations and less time in some cases.
|
| - Git bisect is stateful. It means your repo is no longer
| in a normal state, it's in the middle of a bisect! That's
| annoying to say the least. I've gotten burned way too
| many times by running the wrong command or forgetting
| that I'm in the middle of an interactive merge or rebase.
| I don't want to have to think about an _additional_ state
| I could be in when I leave my work and come back to it.
| It's a huge cognitive burden, and I would really rather
| avoid it unless it's _really_ clearly going to pay for
| itself.
|
| Again, I'm not saying these trade-offs apply the same way
| to you. You encounter scenarios that I clearly don't :-)
| I'm just explaining where I'm coming from, and (what I
| believe are) the reasons I don't really see other people
| using git-bisect frequently.
|
| > especially since git bisect handles complex situations
| where there are diverging merges to explore.
|
| Diverging merges are a super interesting case; I've never
| needed to deal with that with bisection before (whether
| manual or automatic). If your history is nonlinear then I
| definitely see a much larger value in git-bisect. Though
| at the same time the result isn't even necessarily a
| single commit, so I'm not even sure what exactly I would
| expect it to do in that case!
| SnowflakeOnIce wrote:
| At a previous job, with a huge C++ codebase and ~100
| developers over 20 years, many platforms supported, a build
| could take hours, and the test suite took so long to run that
| it would take several days or weeks of real time to get
| through it all.
|
| This cycle time combined with occasional unexpected
| interactions between components meant that in every release
| cycle, there were dozens of complicated failing tests where
| it was not obvious which code change was responsible.
|
| `bisect` here was extremely helpful: instead of having to
| pore over commit history and think very hard, I could bisect
| with a small wrapper script that would build and run the
| failing test in question. Builds still took hours, but I
| could usually automatically pinpoint the responsible code
| change for one of the tests overnight.
|
| (This was not using Git, but Perforce, for which I had to
| write `p4-bisect`. Alas, it's not open-source...)
| dataflow wrote:
| > in every release cycle, there were dozens of complicated
| failing tests
|
| Sorry if this is a dumb question, perhaps I'm not
| understanding what you mean by every release cycle, but
| does this mean nobody in the team/company ran tests until
| it was time for release? That sounds like a really big
| everlasting wound to put a tiny git-bisect bandage on!
| botanical76 wrote:
| At least in my "Agile" experience, it is always time for
| release.
| Someone wrote:
| > does this mean nobody in the team/company ran tests
| until it was time for release?
|
| The OP wrote "many platforms supported, a build could
| take hours, and the test suite took so long to run that
| it would take several days or weeks of real time to get
| through it all"
|
| So I think they did run tests, but typically not all of
| them on all platforms.
| SnowflakeOnIce wrote:
| About a dozen OSes / configurations were supported, and
| the entire test suite would take a few days to run on
| each such configuration.
|
| This was native desktop+server software, not a hosted
| SaaS thing.
|
| Major releases were put out every few months.
|
| Developers _did_ run tests regularly with every change
| they would make, but it was infeasible to run all the
| tests in every configuration for each change. So they
| would try to choose tests to run that seemed most
| relevant to the code they had changed.
|
| The entire test suite would run more or less constantly
| on shared machines, and every few days some new tricky
| failure would be detected. The tricky failures were
| almost always the result of some unanticipated
| interaction of features, frequently on the more obscure
| configurations (like IBM z/OS).
|
| The problem was not that developers were not testing, but
| that the tests were infeasible to run every time. So
| instead, testing became an optimization problem.
| dataflow wrote:
| Gotcha, thank you for the explanation! Indeed it sounds
| like something git bisect is perfect for, when you're in
| these sorts of situations!
| deskr wrote:
| > but I haven't found it that in practice
|
| That's perfectly fine, and I'd say lucky you.
|
| Bisect is a type of tool that you hope you won't need, but
| when you need it it's a life saver. You're right about the
| larger collaborative codebase. Imagine trying to find a
| strange bug that got introduced sometime in the last 12
| months in a huge and active repo.
| furyofantares wrote:
| Without an explanation of _why_ it works on languages other than
| C, that is a hard claim to believe! I mean, I don't think
| they're lying, but (given it doesn't use an LLM) I am bewildered.
| johnisgood wrote:
| Yeah, I would have liked to see more examples and an
| explanation.
| evmar wrote:
| I glanced at the code. It seems to have multiple possible
| "passes" which reduce the code in various ways, and the
| passes here not tagged with "C"=>1 are used in the not-c mode
| recommended in the post.
|
| https://github.com/csmith-project/creduce/blob/31e855e290970...
|
| You can see the pass implementation in files on the left.
| wat10000 wrote:
| The key thing is that the transforms it makes aren't
| required to always produce a program that can actually run,
| or even build.
|
| The user provides a script which determines if a program is
| "interesting." A program with build-time errors should be
| considered "not interesting" in most cases. (And if you're
| hunting an incorrectly reported error, you'll want to make
| sure you only consider that error to be interesting.)
|
| Then it yoinks out parts of your program and checks if the
| result is still interesting. If it's not, that attempt is
| discarded and it tries something else. If the reduced
| program is still interesting, it will try yoinking out more
| stuff. Repeat until you like the result.
|
| There doesn't need to be any understanding of the program
| in order for this to work. You just need something where
| removing some bit of the program has a chance of producing
| a new program that's still interesting. That works in most
| programming languages.
|
| This process can take a long time, and it's slower when
| there's a lower probability of a removal producing an
| interesting program. Thus heuristics are added: don't
| remove random character ranges, but work at a token level.
| When removing a brace, find the matching brace and remove
| the insides. That sort of thing. Most modern languages are
| similar enough to C that many of these rules are helpful
| for them too. And even if they don't help, this just slows
| the process down, but it still works.
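A minimal sketch of the loop just described (not C-Reduce's actual code; the "interesting" predicate below is a stand-in for the user-supplied script, and the reducer itself knows nothing about the language):

```python
# Minimal sketch of a language-agnostic reduction loop: repeatedly
# yoink chunks out of the input and keep any variant the
# interestingness test still accepts, at ever finer granularity.

def reduce_text(lines, interesting):
    assert interesting(lines), "the starting input must be interesting"
    chunk = len(lines)
    while chunk > 0:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]   # yoink a chunk out
            if len(candidate) < len(lines) and interesting(candidate):
                lines = candidate   # smaller and still interesting: keep it
            else:
                i += 1              # variant lost the bug: try elsewhere
        chunk //= 2                 # retry at finer granularity
    return lines

# Stand-in interestingness test: the "bug" reproduces whenever the
# program still mentions both foo( and bar(.
still_buggy = lambda ls: (any("foo(" in l for l in ls)
                          and any("bar(" in l for l in ls))

src = ["int x;", "foo(1);", "int y;", "bar(2);", "int z;"]
print(reduce_text(src, still_buggy))  # -> ['foo(1);', 'bar(2);']
```

Real reducers layer token- and brace-aware passes on top of this skeleton, but the structure is the same: propose, test, keep or discard.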
| bqmjjx0kac wrote:
| Yes, and without understanding how it works, I'm left wondering
| whether it's even safe to use this way. Will creduce execute
| mutated versions of the input script, potentially deleting my
| files or eating my lunch?
| None4U wrote:
| It will execute whatever script you pass it (leaving the
| script itself unmodified) and mutate whatever other files
| you pass it.
| defen wrote:
| I think the question is, how do you know C-reduce isn't
| going to mutate your test case into calling
| `remove("/path/to/my/important/file");`
| frabert wrote:
| C-reduce is meant to... reduce your files; it would not
| add stuff that was not there in the first place. Also, I
| think it's meant to only be run against the "frontend" of
| most languages, not full execution.
| chubot wrote:
| Yeah but you could have a shell script that does
|
          rm -rf /foo/bar
|
| and in theory it could reduce it to
|
          rm -rf /
|
| The passes are very heuristic, so that doesn't seem like
| something you can rule out.
|
| (and I actually want to run it on shell scripts!)
| Cieric wrote:
| While I've used c-reduce before, I've never done it in a
| way where it could be destructive. However, speculating
| based on what I do know, there are two things I would do.
| First, in the interesting-ness test, grep for already
| known harmful instructions and force them to be
| uninteresting (returning 1). Second, if I was still
| unsure of the potential harm of the program and I had to
| run it to determine if it's interesting or not (segfault
| or something similar), I think I would hoist the
| binary/script into a docker container and run it there.
| That way the log and result can still be checked, but
| its access to the actual file system is minimized as much
| as possible.
|
| TLDR: C-Reduce just gives you text to run/compile; if
| you're worried about things like that, sandbox as much as
| possible.
| btilly wrote:
| So just run it inside of a Docker instance.
| Kuinox wrote:
| Without looking at the paper, I would guess it would be like a
| fuzzer that applies mutations that reduce the size of the input.
| gnulinux wrote:
| Isn't that potentially unsafe though? Random mutations can
| reduce arbitrary programs to destructive programs; in
| particular, any program can be mutated to `rm -rf /` given
| enough time. Also, even the problem of finding syntactically
| valid programs is combinatoric; it's pretty surprising that
| it can go toward a local optimum in such a short time without
| having any understanding of the function or its derivatives.
| jewel wrote:
| For compiled languages it should be fine, as you're only
| going to compile the permuted source code, not execute it.
|
| Given a sufficiently bad compiler bug anything is possible,
| but given that it's trying to minimize the size of
| an input that gives a particular output, I don't think it's
| likely to explore distant branches.
| Quekid5 wrote:
| > Given a sufficiently bad compiler bug anything is
| possible, ...
|
| I can't help but wonder about the consteval/constexpr
| machinery in the various C++ compilers... It'd be
| interesting to see how much adversarial input has been
| tested for those.
|
| (I guess Zig's comptime might also be relevant, but I'm
| not sure what that's allowed to do. Maybe execute
| arbitrary code?)
|
| ... anyway just a stray thought.
| cztomsik wrote:
| IIRC there's `zig reduce` built in to Zig :)
| https://github.com/ziglang/zig/blob/master/lib/compiler/redu...
| adastra22 wrote:
| It runs the generated code in order to determine if the
| bug exists, does it not?
| Cieric wrote:
| That depends completely on the interesting-ness test
| that's provided. If you're looking for a compiler bug
| (like I do often for my language), then the interesting-
| ness test checks the compile logs for information like
| the "Segmentation Fault" text, no need to run the actual
| executable. You could also hoist everything into docker
| if you really want to, or ship it off to another device
| to run.
| 0x457 wrote:
| > For compiled languages it should be fine, as you're
| only going to compile the permuted source code, not
| execute it.
|
| Only if you're going for a compiler bug. If you're working
| on a minimal reproducible example, you need to make sure
| your code isn't reduced to:
|
| int main() { return 0; }
|
| in that case.
| _flux wrote:
| I don't think that should trigger a reasonable
| interesting-ness condition one has specified, so that should
| not happen. So I suppose for non-compiler-bugs you need
| to check (from the output) that the correct thing is
| being done, in addition to detecting the actual issue.
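That two-part check can be sketched like this (every function name and string below is hypothetical, not from C-Reduce): the reducer only keeps variants where the bug still reproduces AND the code path under suspicion survives, so it cannot collapse the file to a trivial main().

```python
# Sketch of a two-part interestingness check for a runtime bug (all
# names and strings here are hypothetical). Checking only "it fails"
# would let the reducer shrink the file to `int main() { return 0; }`,
# so we also require that the suspect code path is still present.

def interesting(source: str, exit_code: int, output: str) -> bool:
    reproduces_bug = exit_code != 0 and "assertion failed" in output
    still_relevant = "parse_config(" in source   # the call being minimized around
    return reproduces_bug and still_relevant

print(interesting("int main() { return 0; }", 0, ""))  # -> False
```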
| bawolff wrote:
| It seems pretty unlikely to mutate in a malicious
| direction; random chance probably wouldn't get there, and
| there doesn't seem to be any reason it would be guided in
| that direction.
|
| "Enough time" does a lot of work here and warrants further
| analysis. With enough time it might produce the works of
| Shakespeare (if you ignore that it's designed to minimize), but we
| should probably view anything that would take more than a
| year as too much time.
| eru wrote:
| Use a sandboxed VM?
| ndjdjddjsjj wrote:
| Docker saves the day again!
| wiml wrote:
| I recommend skimming the PLDI paper linked by asmeurer, it has
| a good summary. Some of its transforms are pretty C-specific,
| using the Clang frontend; some are pretty generic and probably
| work on any Algol-descended language. It's meant to be a
| modular tool, so you could add transforms that understand other
| languages if you like.
| keybored wrote:
| > I mean, I don't think they're lying but (given it doesn't use
| an LLM) I am bewildered.
|
| I'm pretty sure (based on the last time I saw this) that this
| is just good old fashioned computer science.[1]
|
| I really hope that HN hasn't forgotten about good old fashioned
| computer science stuff already.
|
| [1] All the algorithm and whatnot stuff, not the spooky machine
| learning stuff. Even things like Prolog-for-AI, although that
| has the slight downside of not working (for the purposes of
| making AI).
| furyofantares wrote:
| To be clear my comment was meant to be an awareness that it
| is good old fashioned computer science. Without LLMs, which
| this predates, it is surprising to me that you'd have a lot
| of success reducing a program in an arbitrary language and
| still having something that's valid syntax!
|
| I do get that it will reject a lot of stuff as not working
| (and has to even in the target language) and I also get that
| algol-like languages are all very similar, but I am still
| surprised that it works well enough to function on ~arbitrary
| languages.
|
| These are very LLM-era properties for a program to have. The
| question is not "does it work for language x" but "how well
| does it work for language x", and the answer is not "is it
| one of the languages it was designed for" but instead "idunno
| try it out and see".
| ndjdjddjsjj wrote:
| An AST for your favourite language is readily available,
| so get one and go nuts.
|
| I am sure in 2006 or so I was doing "extract method" in
| Resharper. And Lispers probably doing it for centuries ;)
| Philpax wrote:
| The fascinating part isn't that it can modify code
| accordingly to some rules - it's that it can modify code
| from _arbitrary_ languages without having a grammar or
| semantic analysis for them.
| ndjdjddjsjj wrote:
| I think the tool has some grammar info: C, C++, C#, Java,
| Go, JS are similar.
|
| And that is enough when:
|
| 1. Most code in a code file is irrelevant to whatever
| specific bug is being looked at.
|
| 2. It is removal of code at play not writing code.
| jcranmer wrote:
| So the short answer is that some of the reduction passes are
| pretty generalizable to C-family languages, and these can be
| some of the most effective passes.
|
| Some of those passes are:
|
| * Tokenize the input according to C, and randomly chuck away
| chunks of length 1-(I think) 13. Most languages vaguely similar
| to C have very similar tokenization rules to C, so these kinds
| of passes are going to have similar effects, and this will tend
| to be effective in doing things like omitting needless
| qualifiers or attributes.
|
| * Balanced parentheses, and chuck away units of balanced
| parentheses. This includes (), {}, and []--and for almost any
| language, this can be a useful level of stuff to throw away or
| reduce.
|
| * Stripping comments and whitespace. Again, many languages use
| the /* and // styles of comments that C does, so this is pretty
| effective.
|
| There's actually relatively few passes that are incredibly
| C/C++-specific, and in my experience, one of the biggest issues
| with creduce is that it is absolutely terrible at trying to
| detemplatize the code as a reduction step, which should be a
| relatively easily automatable step.
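The language-agnostic passes described above can be illustrated with a short sketch (this is an illustration of the idea, not C-Reduce's actual Perl):

```python
import re

def strip_comments(src):
    # Remove /* ... */ block comments and //-style line comments --
    # good enough for the many languages that borrow C's comment
    # syntax.
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", src)

def balanced_spans(src, open_ch="{", close_ch="}"):
    # Yield (start, end) index pairs of balanced { ... } regions; a
    # reducer can then try deleting each span and re-run the
    # interestingness test on the result.
    stack = []
    for i, ch in enumerate(src):
        if ch == open_ch:
            stack.append(i)
        elif ch == close_ch and stack:
            yield (stack.pop(), i + 1)
```

Neither function knows anything about the language beyond C-ish surface syntax, which is why these passes transfer so well to other Algol-family languages.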
| ndjdjddjsjj wrote:
| The compile and test condition do the heavy lifting.
| DRMacIver wrote:
| I think the easiest way to think about it is that it's gradient
| descent, it's just a slightly weird form of it over a discrete
| space.
|
| If you imagine you've got some neighbourhood function N(x) that
| takes a program and returns a large sample of valid programs
| that are "similar" to x (its neighbourhood), then your basic
| test-case reduction algorithm is just:
|
| 1. Start with some initial test case.
|
| 2. Look at the neighbourhood of your current test case. If
| any elements of it are both interesting and smaller than
| your current test case, move to one of those. If not, stop.
|
| 3. Return the smallest interesting test case you've seen at
| the end.
|
| And most of what varies from test-case reducer to test-case
| reducer is that neighbourhood function (how you explore the
| neighbourhood also varies, but you can think of that as an
| implementation detail that mostly only affects performance).
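A minimal sketch of that loop, with `interesting` and `neighbourhood` supplied by the caller (this is a generic illustration, not any particular reducer's code):

```python
def reduce_case(test_case, interesting, neighbourhood):
    # Greedy descent over program space: keep moving to any smaller
    # interesting neighbour until none exists, then return the
    # smallest interesting test case seen.
    best = test_case
    improved = True
    while improved:
        improved = False
        for candidate in neighbourhood(best):
            if len(candidate) < len(best) and interesting(candidate):
                best = candidate
                improved = True
                break
    return best
```

With a neighbourhood as crude as "delete one contiguous chunk", this is already a workable (if slow) reducer.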
|
| This approach will generally work pretty well because the sort
| of bugs you tend to find are "dense in program space". This
| will e.g. do nothing if you start from the property that the
| file has a particular sha1 hash, but if you start from a
| property like "this uses this particular pair of language
| features", most nearby programs will share it.
|
| The nice thing about doing test-case reduction is that it
| doesn't actually matter if your neighbourhood contains only
| valid programs, because you can rely on the interestingness
| test to filter out any invalid program. So all you care about
| really is:
|
| 1. It contains a fairly large set of valid programs among
| the test cases you consider.
|
| 2. It's not too large so as to be prohibitive to explore.
|
| And this frees you up to just try a bunch of heuristics that
| are likely to work pretty well, and it turns out that there are
| a lot of easily accessible ones. In particular, deleting
| contiguous chunks of a program pretty often produces a valid
| program, which will always be shorter.
|
| For a trivial example, imagine you've got something like:
|
|         if True:
|             blah()
|
| in a Python program. It might look like you need to know about
| Python syntax to reduce this to just `blah()`, but in fact you
| can reduce it by deleting the 13-byte string `"if True:\n    "`.
| So if your reducer is happy to brute force try all short
| strings, it will be able to make this transformation.
|
| This isn't a real example exactly. C-reduce I think won't find
| this particular one (but I might be misremembering its exact
| heuristics here - it stops at 10-byte sequences, which won't
| work here). Shrinkray will, but it will do so by understanding
| Python. But it demonstrates the sort of thing that can work
| surprisingly well without understanding syntax.
|
| Another thing you can do is you can understand just enough
| syntax in a general purpose way. For example, shrinkray (and I
| think C-Reduce) will look for integer literals in the program
| and try replacing them with integer literals representing a
| smaller number. In order to do this, it doesn't have to know
| anything about the program's lexical syntax, it just has to
| know what a number looks like.
|
| Heuristics like this will often fail to work, but that's OK,
| you just need them to succeed enough, and often some pretty
| crude stuff will get you down to something that is small enough
| that a human can get it the rest of the way.
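The integer-literal heuristic mentioned above can be sketched with nothing but a regex (an illustration of the idea, not shrinkray's actual code):

```python
import re

def shrink_integers(src):
    # Replace every integer literal with a halved value; a reducer
    # would keep the rewritten program only if the interestingness
    # test still passes on it.
    return re.sub(r"\b\d+\b", lambda m: str(int(m.group()) // 2), src)
```

Note that this needs no knowledge of the language's grammar at all, only of what a number looks like.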
| graypegg wrote:
| Found this article with examples of before and after:
| https://pramodkumbhar.com/2024/01/c-reduce-systematically-ta...
|
| But still having trouble understanding how it knows WHAT to
| remove when trying each iteration. There must be some
| tokenization going on, but then I don't know how that would work
| across programming languages.
| wood_spirit wrote:
| Creduce is awesome.
|
| I used a test script that spent hours using CSmith to generate
| random test programs when developing an esoteric llvm target.
| When they crashed it automatically Creduced and left them for
| inspection. Invaluable!
| asmeurer wrote:
| A paper by the authors John Regehr et al. from 2012 explaining
| how it works https://fsl.cs.illinois.edu/publications/regehr-
| chen-cuoq-ei....
| gnulinux wrote:
| I read this paper and I still feel lost as to _how can this
| even be possible_. It seems to understand how to tokenize,
| merge lines, remove tokens etc for arbitrary programming
| languages. Is there another paper that explains this algorithm
| alone?
| IshKebab wrote:
| It basically runs transformation passes over the code until
| they stop changing the code or break its behaviour.
|
| It seems like a small number of the passes are not specific
| to C:
|
| https://github.com/csmith-
| project/creduce/blob/master/creduc...
|
| `"C" => 1,` means it is only for C.
|
| I would guess `pass_lines` is the most important for non-C
| code; I'm guessing (it's written in unreadable Perl) that it
| removes lines.
|
| So while it _can_ work for languages other than C, most of
| its features are C-specific so it's not going to work nearly
| as well. Still, I'd never heard of C-Reduce; pretty cool tool!
|
| Someone make one based on Tree Sitter, quick!
| the8472 wrote:
| douzo https://langston-barrett.github.io/treereduce/
| codesnik wrote:
| unreadable perl?! I clicked on the link expecting something
| super-terse and unstructured, but it's anything but. What
| the hell is wrong with people casually dropping such
| remarks.
| dazed_confused wrote:
| Because if you aren't writing in python or JS/TS then it
| is gibberish. Not like Perl is used all throughout Linux
| system...
| dataflow wrote:
| I was wondering the same thing but I guess the key is
| probably that you don't actually have to do it correctly
| every time. Just tokenizing based on a few common
| characteristics (brace pairing, quotes, indentation,
| newlines, etc.) should let you trim a ton without knowing
| anything about the language, I imagine?
|
| My real worry is what if this ends up running dangerous code.
| Like what if you have a disabled line that writes instead of
| reading, that randomly gets reactivated?
| steventhedev wrote:
| It's intended to run for producing compiler test cases, so
| there shouldn't be any code that's actually running.
|
| CPython includes a flag to only run parsing/compiling to
| bytecode. While you can use it like they did here and run
| the code, it really depends on how much you trust every
| possible subset of your code.
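One way to do that check from Python itself is the `py_compile` module, which parses and byte-compiles a file without running any of it:

```python
import py_compile

def compiles_ok(path):
    # Parse and byte-compile without executing the candidate program;
    # this makes a safe interestingness check for parser/compiler
    # bugs, since no reduced code ever actually runs.
    try:
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError:
        return False
```

For bugs that only manifest at runtime, though, you are back to needing a sandbox of some kind.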
| summarity wrote:
| > Just tokenizing based on a few common characteristics
| (brace pairing, quotes, indentation, newlines, etc.) should
| let you trim a ton without knowing anything about the
| language, I imagine?
|
| Yep, here's an explanation from a related tool, which was
| spawned by the parser-parser-combinator[0] approach to
| syntactical analysis and transformation:
| https://comby.dev/blog/2021/03/26/comby-reducer - which is
| based on what you've said.
|
| [0] - https://www.youtube.com/watch?v=JMZLBB_BFNg
| nyrikki wrote:
| This paper is not about C-Reduce as a whole but 3 new "domain-
| specific test-case reducers" that the authors added to the
| project.
|
| Most of the non-domain specific reductions in C-Reduce is
| simply brute force IIRC.
| lambda wrote:
| Note that as recommended by John Regehr, author of C-Reduce, for
| this use case you might also want to try Shrinkray, a tool that
| was written to be format independent and works well for cases
| that C-Reduce does not:
| https://mastodon.social/@regehr/113489759789563570
| canu7 wrote:
| Looking forward to trying it! A shout out to cvise
| (https://github.com/marxin/cvise) as a python alternative that
| works well too for non c languages.
| droideqa wrote:
| Is this[0] the official repo of Shrinkray?
|
| [0]: https://github.com/DRMacIver/shrinkray
| eru wrote:
| Interesting, isn't that also the author of Python's
| hypothesis?
| lambda wrote:
| Yes, hypothesis also does minimization of failed test
| cases, so it's kind of a similar problem, just a question
| of what format the data is in and how you invoke your test.
| DRMacIver wrote:
| Yes, that's me.
|
| I accidentally got obsessed with test-case reduction as a
| result of writing Hypothesis, and wrote shrinkray because I
| thought it was ridiculous that I hadn't put all of the
| things I'd learned about test-case reduction into some
| general purpose tool that other people could benefit from.
|
| Shrinkray is still a bit under-advertised and rough around
| the edges in places, but I think that basically there's no
| good reason to use C-reduce over shrinkray for things that
| aren't C or C++. Shrinkray has very good generic heuristics
| that should work in most languages (it even works pretty
| well on binary files apparently, although I've not tested
| this myself), a couple of language specific passes for
| other languages, and is much more fun to watch.
|
| It might even be the case that you should use shrinkray for
| C and C++ too (because I use C-Reduce's transformations for
| those languages), but I'm less confident that that's true.
| The best I'll confidently claim is "probably more
| competitive than you'd expect".
| lambda wrote:
| Yep!
| andrewchambers wrote:
| I wrote a general purpose delta debugger in python -
| https://github.com/andrewchambers/ddmin-python. Can be used to do
| the same thing.
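For readers unfamiliar with it, the core of Zeller and Hildebrandt's ddmin algorithm is short; a simplified sketch (not the linked project's exact implementation):

```python
def ddmin(items, interesting):
    # Classic delta-debugging minimization: find a smaller
    # subsequence of `items` that still satisfies `interesting`,
    # by repeatedly trying to remove chunks at finer granularity.
    n = 2
    while len(items) >= 2:
        chunk = max(len(items) // n, 1)
        reduced = False
        # Try keeping the complement of each chunk.
        for i in range(0, len(items), chunk):
            complement = items[:i] + items[i + chunk:]
            if complement and interesting(complement):
                items = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if chunk == 1:
                break
            n = min(n * 2, len(items))
    return items
```

Here `items` can be lines, tokens, or bytes; the granularity you pick is most of what distinguishes one ddmin-based reducer from another.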
| judofyr wrote:
| Well, aren't you going to share the reduced file?
|
| Okay fine, let's see for ourselves:
|
|       # Setup
|       git clone git@github.com:RustPython/RustPython.git && cd RustPython && cargo build --release
|       git clone git@github.com:tekknolagi/scrapscript.git && cp scrapscript/scrapscript.py .
|       # Download interesting.sh, replace the path to RustPython
|       chmod +x interesting.sh
|       # Command in article:
|       nix run nixpkgs#creduce -- --not-c interesting.sh scrapscript.py
|
| Niiice:
|
|       (93.2 %, 13717 bytes)
|       (93.2 %, 13683 bytes)
|       (93.3 %, 13614 bytes)
|       (93.3 %, 13592 bytes)
|       (93.3 %, 13571 bytes)
|       (93.3 %, 13517 bytes)
|       (93.3 %, 13449 bytes)
|       (93.4 %, 13412 bytes)
|       (93.4 %, 13365 bytes)
|       (93.4 %, 13333 bytes)
|       (93.4 %, 13313 bytes)
|
| It seems to have stopped at "(96.4 %, 7347 bytes)" with the
| following output:
| https://gist.github.com/judofyr/47cba8a20cb2cd5798943ef975d0...
| tomberek wrote:
| Random thought. Another commenter worried about the runtime of
| the program becoming mangled and performing destructive
| operations on your machine. What if you run the reducer as a
| source-to-source Nix derivation? Protects against dangerous
| things, and can be easily distributed to remote builders.
| capitol_ wrote:
| How does this protect against dangerous things?
|
| My understanding is that this would just cause the dangerous
| things to be repeatable.
| dietr1ch wrote:
| It must be that by running any program within the nix build
| sandbox you don't expose your files unless you discover a
| privilege escalation attack by chance during the reduction
| process.
| judofyr wrote:
| Oh, that's a neat idea. To reply to the sibling comment: Nix
| runs commands under a sandbox. It's not bullet proof, but
| useful enough that it will prevent obvious commands like
| deleting local files and accessing the network.
|
| It took some time to get RustPython to run, but it seems to
| work fine: https://gist.github.com/judofyr/82d2255c92ebb0c6b3
| a6882ac9fd...
|
| Also, after running it for 5 minutes it was now able to
| reduce it to a much smaller case:
|
|       class a:
|           def b() : c
|           def h(d) :
|               while d: break
|               else:
|                   return d.b()
|       def j(e) :
|           f = a()
|           if f.h () : g
|       i = ""
|       j(i)
| hinkley wrote:
| You generally shouldn't use absolute paths in your programs.
| That makes it dangerous for local development, and makes
| running two copies of the program to compare behaviors
| difficult.
|
| There's a Venn diagram of people who don't care enough to do
| that (works for me!) and people who would never think to use
| c-reduce. It's not a perfect circle, but it'll be fairly
| close.
| somat wrote:
| > You generally shouldn't use absolute paths in your
| programs.
|
| This is true, but I was enjoying the irony that there is an
| old sys-admin adage that you should only use absolute paths
| in your programs (usually shell scripts, in that
| environment); this is to make sure it is running exactly
| what you expect it to run.
|
| So always put "/usr/bin/awk" instead of just "awk"
|
| I had a co-worker once who took this as gospel. His scripts
| were always... interesting... to port to a new environment.
| DRMacIver wrote:
| FWIW shrinkray gets to 162 bytes if I leave it to run for about
| 10 minutes.
| https://gist.github.com/DRMacIver/ee025c90b4867125b382a13aaa...
|
| I think it might do a little but not a lot better if I left it
| to run for longer but it looked to have mostly stalled then so
| I got bored and killed it.
| def- wrote:
| Works well for SQL too, I've been using it at my day job, found
| out via https://github.com/sqlancer/sqlancer?tab=readme-ov-
| file#redu...
| gmueckl wrote:
| How does this compare to dustmite
| (https://dlang.org/blog/2020/04/13/dustmite-the-general-
| purpo...)?
| dswilkerson wrote:
| Delta debugging is not new:
| https://en.wikipedia.org/wiki/Delta_debugging
|
| My implementation of delta debugging, "delta" is 19+ years old:
| https://github.com/dsw/delta
|
| I released it as Open Source because Microsoft Research sent
| someone to my office to ask me to, back when Microsoft was
| calling Open Source a "cancer".
|
| The LLVM introduction by Lattner refers to the "standard delta
| debugging tool", so it is rather well-known:
| https://aosabook.org/en/v1/llvm.html 'unlike the standard "delta"
| command line tool.'
| arxanas wrote:
| For other readers' benefit: C-Reduce is a little more
| sophisticated than plain delta-debugging. From the abstract of
| Test-Case Reduction for C Compiler Bugs (2012):
|
| > [...] [C-Reduce] produces outputs that are, on average, more
| than 25 times smaller than those produced by our other reducers
| or by the existing reducer that is most commonly used by
| compiler developers. We conclude that effective program
| reduction requires more than straightforward delta debugging.
|
| (Of course, this means that C-Reduce is 12 years old now.)
|
| At the same time, C-Reduce seems to be more general than the
| LLVM tool you linked ("BugPoint", dating to 2002), since that
| one works with LLVM IR specifically.
|
| I think most developers are generally unfamiliar with automatic
| test case minimization tools and techniques, so the post may be
| helpful even if the ideas have been known in their respective
| circles for quite some time.
| jheriko wrote:
| this seems somewhat interesting but the example given is
| incredibly stupid.
| tekknolagi wrote:
| Thank you for your feedback. I'll take that into consideration
| for my next blog post and compiler.
| wyldfire wrote:
| A good companion/successor is cvise [1].
|
| I've used c-reduce/cvise to reduce assembly programs to the
| minimal set that evokes an assembler defect quite a few times.
| It's a real handy program.
|
| [1] https://github.com/marxin/cvise
| ndesaulniers wrote:
| True, I've used it for asm, though these days I prefer c-vise.
| af3d wrote:
| The link to the documentation seems to be dead. Archived version:
| https://web.archive.org/web/20221203043431/https://embed.cs....
| rurban wrote:
| Of course, because it's Perl, the Swiss army knife. Reducing
| token sets, blocks, and statements is pretty language agnostic.
| It would have been an over-architected multi-file mess in
| Python.
| edwcross wrote:
| A "super-parallel Python port" of C-Reduce, cvise
| (https://github.com/marxin/cvise), is actually written in
| Python.
| rurban wrote:
| See? What I said
| ajb wrote:
| This kind of algorithm needs to be better known. For example, the
| original DD algorithm can be used to automatically minimise the
| set of cloud permissions for a service (from a given set) (A PoC
| is aws-policy-minimize here:
| https://github.com/kanocomputing/aws-tools/)
| pyrolistical wrote:
| Makes me think about an alternative one that is language specific
| and uses genetic algorithm.
|
| You start with your large repo case
|
| 1. Use language specific mutations that both add/delete
|
| 2. Crossover
|
| 3. Test fitness to see if they retain the compiler bug and sort
| by code size
|
| 4. Keep top N
|
| 5. Repeat
|
| This allows the code to temporarily get larger to escape
| local minima.
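The five steps above could be sketched like this; everything here is hypothetical, with `mutate`, `crossover`, and `has_bug` standing in for the language-specific pieces:

```python
import random

def genetic_reduce(seed, mutate, crossover, has_bug,
                   population_size=20, generations=100):
    # Genetic-algorithm reducer: keep the smallest N variants that
    # still reproduce the bug. Because mutation may also grow
    # programs, the search can escape local minima that a purely
    # greedy reducer would get stuck in.
    population = [seed]
    for _ in range(generations):
        offspring = [mutate(p) for p in population]
        offspring += [crossover(random.choice(population),
                                random.choice(population))
                      for _ in range(len(population))]
        # Fitness test: only variants that retain the bug survive,
        # ranked by code size (smaller is better).
        survivors = sorted((p for p in population + offspring
                            if has_bug(p)), key=len)
        population = survivors[:population_size] or population
    return population[0]
```

Whether this beats plain delta debugging in practice would depend heavily on how good the language-specific mutation operators are.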
___________________________________________________________________
(page generated 2024-11-28 23:00 UTC)