[HN Gopher] You can use C-Reduce for any language
___________________________________________________________________
You can use C-Reduce for any language
Author : Tomte
Score : 446 points
Date : 2024-11-27 17:56 UTC (1 day ago)
(HTM) web link (bernsteinbear.com)
(TXT) w3m dump (bernsteinbear.com)
| jchw wrote:
| I did not know about C-Reduce until just now, and I'm already
| hooked. This is basically like discovering git bisect for the
| first time.
|
| I'll have to stuff that into the back of my mind until I run into
| something where it might fit.
| __MatrixMan__ wrote:
| Same story, I can't wait to try this.
| geon wrote:
| I ran into some issues with what might be compiler bugs in
| cc65, a C compiler for the 6502 processor (c64, NES, Apple 1).
|
| I might see if I can get this set up. Apparently VICE supports
| "printing" to a file on the host OS, so I could run my test on
| the emulator.
| brightball wrote:
| ...so I just discovered git bisect for the first time and this
| is pretty cool.
| crdrost wrote:
| Yes! The actual commands that you have to search up to get it
| to run automatically without user input, take almost as long
| to put together as finding the bug that you're finding (in
| the 80% case; of course there are 20% cases that take
| forever), but there is something so satisfying about seeing
| the thing just humming along once you have it set up.
| dataflow wrote:
| > Yes! The actual commands that you have to search up to
| get it to run automatically without user input, take almost
| as long to put together as finding the bug that you're
| finding
|
| Funny, I just wrote the exact same thing only to scroll
| down and see you had the same opinion:
|
| https://news.ycombinator.com/item?id=42262579
| com2kid wrote:
| Doing this manually was part of my first job out of college on
| a C/C++ compiler team. Kind of amazing that there is automation
| that can accomplish the same thing!
| pfdietz wrote:
| It's great when combined with random test input generators.
| eru wrote:
| Yes, most property based testing libraries do this.
| pfdietz wrote:
| Hypothesis in particular has a very clever way to do test
| reduction. The key problem is that if your test generator
| is enforcing some constraint on the input, it's not
| necessarily the case that naive pruning will preserve the
| constraint. Hypothesis has a fairly general way of
| enforcing that the generated test input continues to
| satisfy constraints.
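A toy sketch of that constraint problem (illustrative only: the strategy and names below are invented, and Hypothesis really shrinks the underlying random choice sequence, which keeps the generator's constraints satisfied by construction):

```python
# Toy sketch of constraint-preserving shrinking (illustrative only;
# not Hypothesis's actual mechanism).

def shrink(xs, constraint, interesting):
    """Greedily shrink a list of ints while keeping BOTH the
    generator's invariant and the bug-triggering property."""
    size = lambda ys: (len(ys), sum(ys))
    improved = True
    while improved:
        improved = False
        # Candidate reductions: drop one element, or halve one element.
        candidates = [xs[:i] + xs[i + 1:] for i in range(len(xs))]
        candidates += [xs[:i] + [xs[i] // 2] + xs[i + 1:]
                       for i in range(len(xs))]
        for cand in candidates:
            if size(cand) < size(xs) and constraint(cand) and interesting(cand):
                xs, improved = cand, True
                break
    return xs

# Invariant enforced by the (imaginary) generator: all values are even.
constraint = lambda ys: all(y % 2 == 0 for y in ys)
# The "bug" fires whenever some value is >= 10.
interesting = lambda ys: any(y >= 10 for y in ys)

print(shrink([4, 40, 100], constraint, interesting))  # -> [50]
```

Without the `constraint(cand)` check, the halving pass would hand `[25]` to the test harness, an input the generator could never have produced; that is exactly the naive-pruning failure described above.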
| eru wrote:
| Yes, Hypothesis is great. And 'shrinkray' mentioned
| elsewhere in the comments here seems to be by the same
| author.
| dataflow wrote:
| > I did not know about C-Reduce until just now, and I'm already
| hooked. This is basically like discovering git bisect for the
| first time.
|
| I've known about git bisect for maybe a decade, and used it
| maybe... twice so far? And I think at least one of those times
| it took me longer to understand the command line options than
| it took to just do it manually.
|
| YMMV, maybe it's more useful in a larger collaborative
| codebase, but to me it sounds life-changing in theory but I
| haven't found it that in practice.
| jchw wrote:
| It's pretty incredible if I'm trying to find a breakage in a
| repo I don't understand. Build's failing? Easy enough:
    git bisect start master v1.1         # bisect from v1.1 to current master
    git bisect run cmake --build .build  # find the first commit where
                                         # `cmake --build` returns non-zero
|
| No need to bother trying to understand the error message, I
| can go make coffee instead and then just look at the commit
| that broke it. Also, this is really useful for and in
| conjunction with Nixpkgs.
|
| It's not as useful for personal projects because chances are
| it won't tell you anything you don't already know.
| dataflow wrote:
| > It's pretty incredible if I'm trying to find a breakage
| in a repo I don't understand.
|
| > Build's failing? Easy enough:
|
| > It's not as useful for personal projects
|
| How often do you run into this situation, though? For your
| scenario to arise, you need:
|
| (1) A repo you're unfamiliar with (which is already unusual
| for most people... most people spend most of their time
| working on repositories they've become familiar with)
|
| (2) The unfamiliar repo to use standard build commands you
| already know, which is basically never the case in my
| experience (even cmake-based projects almost always need
| variables defined with custom names I don't know
| beforehand, and the configuration process is so onerous I
| find the GUI easier)
|
| (3) An obscure enough breakage where the output is complete
| garbage (otherwise the compile error would just tell you
| the file and line, and git blame would tell you the commit
| directly)
|
| (4) Knowledge that this unfamiliar repo _actually would
| have built on your machine_ at some point (which
| is often _not_ the case because cmake isn't hermetic and
| breakages are often due to having the wrong toolchain
| installed)
|
| (5) For the problem to be a mere build error, as opposed to
| some running of the program that requires investing time to
| automate
|
| (6) For you to not have any sort of prior CI or other form
| of validation on your dependencies, because you apparently
| didn't even catch a build error when it was
| introduced (never mind automated testing)
|
| I could go on, but this set of circumstances is more or
| less indistinguishable from ~never to me.
| _flux wrote:
| Well, the conditions were satisfied with pipewire, which
| at some point introduced a dependency that was not
| satisfiable in Debian Stable.
|
| So, for now, I'm stuck with the version one commit before
| that dependency :). (I haven't tried undoing the change
| later; I suspect it might be depended upon later on..)
| jchw wrote:
| > (1) A repo you're unfamiliar with (which is already
| unusual for most people... most people spend most of
| their time working on repositories they've become
| familiar with)
|
| I don't understand a _ton_ of things. I've bisected
| Linux, Chromium, Wine, Nixpkgs, and many more. These are
| projects whose codebases I was never going to understand;
| they are too big for one person to ever fully
| grasp. Then there are smaller codebases where I could if I
| wanted to, but if I don't have to, why bother?
|
| > (2) The unfamiliar repo to use standard build commands
| you already know, which is basically never the case in my
| experience (even cmake-based projects almost always need
| variables defined with custom names I don't know
| beforehand, and the configuration process is so onerous I
| find the GUI easier)
|
| In real life I make great use of Nixpkgs' understanding
| of how to build things. Nix development shells take care
| of the dependencies and the build commands. However, I do
| have confidence that I can figure out the actual desired
| build commands of _most_ random repositories in under
| five minutes. This applies to private repos at work and
| random open source projects. Even for autotools projects.
| Usually, not that much hoopla is needed; most CMake
| projects don't need custom cache variables to make a
| basic functional build.
|
| > (3) An obscure enough breakage where the output is
| complete garbage (otherwise the compile error would just
| tell you the file and line, and git blame would tell you
| the commit directly)
|
| Nah, not necessarily; really just any breakage where
| you're not sure what happened. Compiler errors are not a
| great use case for git bisect, but _test failures_ are a
| really good use case. Likewise, git bisect is great for
| crashes. Even if you can't automate the full process
| easily, for e.g. a GUI program, it's still a lot less
| work to periodically repeat some steps to crash or
| cleanly exit a GUI program to help the bisect along.
| Still, some compiler errors are great for Git bisect.
| Basically any weird linker error is a good candidate.
|
| > (4) Knowledge that this unfamiliar repo actually would
| have built on your machine at some point (which
| is often not the case because cmake isn't hermetic and
| breakages are often due to having the wrong toolchain
| installed)
|
| Or just test it.
|
| > cmake isn't hermetic
|
| Nix is, so no problem!
|
| > (5) For the problem to be a mere build error, as
| opposed to some running of the program that requires
| investing time to automate
|
| Not true. It's harder to automate GUI failures, but that
| does not mean bisecting isn't incredibly valuable for GUI
| apps. Remember, in a bisect where you never have to skip
| a commit (i.e. intermediate builds _rarely_ fail for
| unrelated reasons, which is surprisingly common), the
| bisect only ever takes log2(n) steps, so even if you are
| in a repo that has a very huge commit volume, the bisect
| will be at most a handful of steps. Repeating the same
| GUI action like 10 times is hardly a big chore,
| especially for a machine to magically crank out the exact
| origin of where the thing broke. For CLI apps, it's even
| easier since you can still use bisect run.
|
| > (6) For you to not have any sort of prior CI or other
| form of validation on your dependencies, because you
| apparently didn't even catch a build error when it
| was introduced (never mind automated testing)
|
| Not true. Everyone uses CI, but CI will _never_ catch
| 100% of regressions. There's a combinatorial explosion
| of potential build environments and there's _no way_ a CI
| environment can test every possibility. You can't test
| with every library version, every release of every
| compiler, every combination of using a vendored library
| vs. a system library, and so on.
|
| > I could go on, but this set of circumstances is more
| or less indistinguishable from ~never to me.
|
| For me it happens more often touching open source things
| than at work, but really the probability it will be
| useful increases as the size and inscrutability of a
| project increases. For trying to debug Chromium or
| Electron issues, bisect reigns supreme. (My current
| workplace doesn't ship software built on top of Chromium,
| but replace Chromium with any other big and inscrutable
| dependency and you get the idea.)
|
| I suspect it may be the case that I bother to root cause
| and debug things that other people would simply
| completely write off as impossible or not worth the
| debugging, thanks to the power of Git bisect.
| dataflow wrote:
| Just to clarify: I wasn't disagreeing with the use cases
| you described in _this_ comment (which are certainly
| valid, just not common for most people I think), but
| rather the use case in your _previous_ comment. The
| scenarios are basically the opposite of each other, hence
| my replies above. For example:
|
| >> Build's failing? git bisect run cmake --build .build
|
| > Compiler errors are not a great use case for git
| bisect, but _test failures_ are a really good use case
|
| These aren't build failures.
|
| > For trying to debug Chromium or Electron issues, bisect
| reigns supreme.
|
| Neither of these is a simple CMake build! Even their
| build process changes over time.
|
| (Side note, I'm surprised that for a massive third party
| dependency that you don't understand, the specific commit
| is what you really want. I would've thought you'd just go
| to the previous version, which is presumably more stable
| and supported too.)
|
| >> No need to bother trying to understand the error
| message, I can go make coffee instead and then just look
| at the commit that broke it.
|
| > Not true. It's harder to automate GUI failures, but
| that does not mean bisecting isn't incredibly valuable
| for GUI apps.
|
| I'm not sure how we got into discussing GUIs, but if
| you're debugging a dependency that's a GUI, then having
| to bisect it (as useful as that is) is very much the
| opposite of "no need to understand it, just go grab a
| coffee"!
|
| All of which is to say: I'm not claiming git bisect is
| useless by any means. It's nice that it's there and you
| can use it. I'm just saying that it's often (often !=
| always) a small difference in cost vs. bisecting
| manually.
| jchw wrote:
| > These aren't build failures.
|
| While they are not _compilation_ failures, I do sort of
| consider unit test failures to be _build_ failures if the
| unit tests are typically run as part of the build. But
| either way, I do still use bisect to find the origin of
| weird compilation errors, FWIW. As an example, I used it
| on Envoy proxy to figure out a really strange linker
| error on Darwin. It's just not great for e.g. a syntax
| error, because compiler diagnostics will give a more
| obvious answer most of the time. I do agree with that.
|
| > Neither of these is a simple CMake build! Even their
| build process changes over time.
|
| Chromium and Electron are both fairly easy builds. Not
| because they are simple to build, but rather because the
| build process is extremely automated and documented. (The
| real hard part is balancing concurrency with OOM:
| Electron defaults to -j200 and on my 7950X3D it will
| happily eat _all_ of the 128 GiB of RAM. Keeping modern
| CPU cores fed is hard work...)
|
| > (Side note, I'm surprised that for a massive third
| party dependency that you don't understand, the specific
| commit is what you really want. I would've thought you'd
| just go to the previous version, which is presumably more
| stable and supported too.)
|
| Well, I want the bug to be fixed. Even if I am not going
| to submit the actual patch that fixes it, providing
| bisect details in my bug report will vastly increase the
| odds that it gets fixed. However, with the magic of
| bisect I often _can_ make working patches for complex
| bugs and apply them. If it's software I use on my
| system, I can then add a temporary patch in my Nix
| configuration and continue to run the latest version. I
| don't always do this, but it is very useful. I don't like
| to rely on hoping someone else will figure out my bugs if
| I can help it.
|
| I am also a package maintainer in some cases. If I run
| into a bug on something I maintain a package for, I will
| often send a PR fixing it and then pull that patch into
| my package temporarily.
|
| > I'm not sure how we got into discussing GUIs, but if
| you're debugging a dependency that's a GUI, then having
| to bisect it (as useful as that is) is very much the
| opposite of "no need to understand it, just go grab a
| coffee"!
|
| I'm not sure what you mean, I'm just saying that git
| bisect works even for complicated runtime issues. There's
| no need to debug the GUI: all you have to do is tell git
| bisect if the bug is still there in whatever commit it
| dumps you in. You do it a few times and you have the
| commit that broke things. It's magic every time.
|
| As an example of a GUI failure I've "debugged" with this,
| I found a regression in Wine using this technique.
|
| > All of which is to say: I'm not claiming git bisect is
| useless by any means. It's nice that it's there and you
| can use it. I'm just saying that it's often (often !=
| always) a small difference in cost vs. bisecting
| manually.
|
| Bisecting manually? I don't understand. Do you mean still
| using git bisect without using the run command? I do that
| when debugging more complex failures so I can skip
| commits broken for unrelated reasons. Or, do you mean
| literally manually picking commits to test and bisecting
| like that? If so, I'm confused. Starting a git bisect is
| a single command where you pass it the bad and good
| commit. The rest of the commands don't even have to be
| memorized because it tells you what to run (not that it's
| hard, since it's just git bisect good, git bisect bad, git
| bisect skip...). I cannot fathom it being less work to do
| this manually, especially since git bisect handles
| complex situations where there are diverging merges to
| explore.
|
| Even at Google, where the CL numbers were pretty much
| linear, we _still_ had a bunch of tooling around
| bisecting google3 to find regressions. I'm a little
| surprised it's any question that it's worth it, you can
| bisect 100,000 commits in ~17 steps without sitting there
| trying to figure out which commits you need to pick.
| That's incredible value and it has helped me root cause
| bugs in all kinds of stuff, like pure magic.
|
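The step counts above are easy to sanity-check: a bisect over n commits needs about ceil(log2(n)) build-and-test probes.

```python
# Sanity check of the step counts above: a bisect over n commits
# needs about ceil(log2(n)) build-and-test probes.
import math

for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>10} commits -> {math.ceil(math.log2(n))} steps")
# 100,000 commits comes out to 17 steps, matching the claim above.
```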
| Anyway, I'm not trying to be condescending, but as you
| may have gathered I actually bisect fairly frequently and
| I also am very sure it is saving me time every time I do
| it. Especially since I reckon more than half the time I
| don't really have to do anything, so even when the build
| can take a long time it's still not a problem.
| dataflow wrote:
| I should perhaps clarify: this whole discussion I'm
| referring to the usefulness of the _git bisect_ command
| specifically, not _bisection_ as a technique. So these
| weren't in any way intended to be a comment on e.g.
| tooling you may have had at Google. I bisect things _all
| the time_ -- across files, across lines, and across time.
| It's just ~never via git-bisect.
|
| Please also note I have no doubt that _your_ workflow
| gets immense value from git-bisect! Almost every tool is
| incredibly valuable to a certain subset of users and
| workflows. What I've been debating is just how useful I
| see it being to the most typical users. I know I haven't
| actually come across _anyone_ using git-bisect for
| several years now, and I myself haven't either.
| Bisection as a technique is incredibly useful for
| everybody -- it's just not something I see done via git-
| bisect with any frequency, is all.
|
| > Bisecting manually? Do you mean literally manually
| picking commits to test and bisecting like that? If so,
| I'm confused
|
| That's what I mean, yeah.
|
| > The rest of the commands don't even have to be
| memorized because it tells you what to run (not that it's
| hard since its just git bisect good, git bisect bad, git
| bisect skip...) I cannot fathom it being less work to do
| this manually
|
| While I didn't actually intend to say it's _faster_ to
| avoid git-bisect -- just that it's a _similar_ amount of
| effort -- now that you mention it, I actually _do_ think
| it is often faster for me. There are multiple reasons
| why:
|
| - Despite it taking a few more iterations in the worst
| case in theory, manually picking my own commits lets me
| split based on e.g. version numbers, rather than random
| commits. I find it much, much more useful to know that a
| problem came up between v4.3.1 and v4.4, rather than
| between commit 1133728 and commit 1133262, say. And I
| know -- and this is key -- that those commits will be
| well-tested, stable, and released to the public. So I
| don't have to worry about encountering random bugs in the
| middle, even unrelated ones. By contrast, "git bisect
| skip" might seem easy to type in 3 seconds, but if you
| encounter a _single_ failed build, then you lose all that
| time you "saved", and then an _enormous_ amount on top
| of that, merely by virtue of having to build & test
| those failed iterations. I really, really wouldn't want
| to do a full build of Chromium -- even an "incremental"
| one -- just to encounter a bug in it and have to skip to
| another commit!
|
| - I actually almost always have _some_ idea of where the
| bug is, merely based on the tags/versions, or commit
| messages, or just the timestamps. Sometimes, literally
| searching for a keyword pops up a relevant commit for me
| to test, and then doing an incremental build with the
| commit immediately after is _much_ faster (almost free)
| compared to doing what ends up being a ~full rebuild for
| a distant commit. So it actually _does_ end up being far
| fewer iterations and less time in some cases.
|
| - Git bisect is stateful. It means your repo is no longer
| in a normal state, it's in the middle of a bisect! That's
| annoying to say the least. I've gotten burned way too
| many times by running the wrong command or forgetting
| that I'm in the middle of an interactive merge or rebase.
| I don't want to have to think about an _additional_ state
| I could be in when I leave my work and come back to it.
| It's a huge cognitive burden, and I would really rather
| avoid it unless it's _really_ clearly going to pay for
| itself.
|
| Again, I'm not saying these trade-offs apply the same way
| to you. You encounter scenarios that I clearly don't :-)
| I'm just explaining where I'm coming from, and (what I
| believe are) the reasons I don't really see other people
| using git-bisect frequently.
|
| > especially since git bisect handles complex situations
| where there are diverging merges to explore.
|
| Diverging merges are a super interesting case; I've never
| needed to deal with that with bisection before (whether
| manual or automatic). If your history is nonlinear then I
| definitely see a much larger value in git-bisect. Though
| at the same time the result isn't even necessarily a
| single commit, so I'm not even sure what exactly I would
| expect it to do in that case!
| SnowflakeOnIce wrote:
| At a previous job, with a huge C++ codebase and ~100
| developers over 20 years, many platforms supported, a build
| could take hours, and the test suite took so long to run that
| it would take several days or weeks of real time to get
| through it all.
|
| This cycle time combined with occasional unexpected
| interactions between components meant that in every release
| cycle, there were dozens of complicated failing tests where
| it was not obvious which code change was responsible.
|
| `bisect` here was extremely helpful: instead of having to
| pore over commit history and think very hard, I could bisect
| with a small wrapper script that would build and run the
| failing test in question. Builds still took hours, but I
| could usually automatically pinpoint the responsible code
| change for one of the tests overnight.
|
| (This was not using Git, but Perforce, for which I had to
| write `p4-bisect`. Alas, it's not open-source...)
| dataflow wrote:
| > in every release cycle, there were dozens of complicated
| failing tests
|
| Sorry if this is a dumb question, perhaps I'm not
| understanding what you mean by every release cycle, but
| does this mean nobody in the team/company ran tests until
| it was time for release? That sounds like a really big
| everlasting wound to put a tiny git-bisect bandage on!
| botanical76 wrote:
| At least in my "Agile" experience, it is always time for
| release.
| Someone wrote:
| > does this mean nobody in the team/company ran tests
| until it was time for release?
|
| The OP wrote "many platforms supported, a build could
| take hours, and the test suite took so long to run that
| it would take several days or weeks of real time to get
| through it all"
|
| So I think they did run tests, but typically not all of
| them on all platforms.
| SnowflakeOnIce wrote:
| About a dozen OSes / configurations were supported, and
| the entire test suite would take a few days to run on
| each such configuration.
|
| This was native desktop+server software, not a hosted
| SaaS thing.
|
| Major releases were put out every few months.
|
| Developers _did_ run tests regularly with every change
| they would make, but it was infeasible to run all the
| tests in every configuration for each change. So they
| would try to choose tests to run that seemed most
| relevant to the code they had changed.
|
| The entire test suite would run more or less constantly
| on shared machines, and every few days some new tricky
| failure would be detected. The tricky failures were
| almost always the result of some unanticipated
| interaction of features, frequently on the more obscure
| configurations (like IBM z/OS).
|
| The problem was not that developers were not testing, but
| that the tests were infeasible to run every time. So
| instead, testing became an optimization problem.
| dataflow wrote:
| Gotcha, thank you for the explanation! Indeed it sounds
| like something git bisect is perfect for, when you're in
| these sorts of situations!
| deskr wrote:
| > but I haven't found it that in practice
|
| That's perfectly fine, and I'd say lucky you.
|
| Bisect is a type of tool that you hope you won't need, but
| when you need it it's a life saver. You're right about the
| larger collaborative codebase. Imagine trying to find a
| strange bug that got introduced sometime in the last 12
| months in a huge and active repo.
| furyofantares wrote:
| Without an explanation of _why_ it works on languages other than
| C, that is a hard claim to believe! I mean, I don't think
| they're lying, but (given it doesn't use an LLM) I am bewildered.
| johnisgood wrote:
| Yeah, I would have liked to see more examples and an
| explanation.
| evmar wrote:
| I glanced at the code. It seems to have multiple possible
| "passes" which reduce the code in various ways, and the
| passes here not tagged with "C"=>1 are used in the not-c mode
| recommended in the post.
|
| https://github.com/csmith-project/creduce/blob/31e855e290970...
|
| You can see the pass implementation in files on the left.
| wat10000 wrote:
| The key thing is that the transforms it makes aren't
| required to always produce a program that can actually run,
| or even build.
|
| The user provides a script which determines if a program is
| "interesting." A program with build-time errors should be
| considered "not interesting" in most cases. (And if you're
| hunting an incorrectly reported error, you'll want to make
| sure you only consider that error to be interesting.)
|
| Then it yoinks out parts of your program and checks if the
| result is still interesting. If it's not, that attempt is
| discarded and it tries something else. If the reduced
| program is still interesting, it will try yoinking out more
| stuff. Repeat until you like the result.
|
| There doesn't need to be any understanding of the program
| in order for this to work. You just need something where
| removing some bit of the program has a chance of producing
| a new program that's still interesting. That works in most
| programming languages.
|
| This process can take a long time, and it's slower when
| there's a lower probability of a removal producing an
| interesting program. Thus heuristics are added: don't
| remove random character ranges, but work at a token level.
| When removing a brace, find the matching brace and remove
| the insides. That sort of thing. Most modern languages are
| similar enough to C that many of these rules are helpful
| for them too. And even if they don't help, this just slows
| the process down, but it still works.
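A minimal sketch of the loop just described (not C-Reduce's actual code; the "interesting" predicate below is a stand-in for the user-supplied script, and the reducer itself knows nothing about the language):

```python
# Minimal sketch of a language-agnostic reduction loop: repeatedly
# yoink chunks out of the input and keep any variant the
# interestingness test still accepts, at ever finer granularity.

def reduce_text(lines, interesting):
    assert interesting(lines), "the starting input must be interesting"
    chunk = len(lines)
    while chunk > 0:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]   # yoink a chunk out
            if len(candidate) < len(lines) and interesting(candidate):
                lines = candidate   # smaller and still interesting: keep it
            else:
                i += 1              # variant lost the bug: try elsewhere
        chunk //= 2                 # retry at finer granularity
    return lines

# Stand-in interestingness test: the "bug" reproduces whenever the
# program still mentions both foo( and bar(.
still_buggy = lambda ls: (any("foo(" in l for l in ls)
                          and any("bar(" in l for l in ls))

src = ["int x;", "foo(1);", "int y;", "bar(2);", "int z;"]
print(reduce_text(src, still_buggy))  # -> ['foo(1);', 'bar(2);']
```

Real reducers layer token- and brace-aware passes on top of this skeleton, but the structure is the same: propose, test, keep or discard.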
| bqmjjx0kac wrote:
| Yes, and without understanding how it works, I'm left wondering
| whether it's even safe to use this way. Will creduce execute
| mutated versions of the input script, potentially deleting my
| files or eating my lunch?
| None4U wrote:
| It will execute whatever script you pass it (leaving the
| script itself unmodified) and mutate whatever other files
| you pass it.
| defen wrote:
| I think the question is, how do you know C-reduce isn't
| going to mutate your test case into calling
| `remove("/path/to/my/important/file");`
| frabert wrote:
| C-reduce is meant to... reduce your files; it would not
| add stuff that was not there in the first place. Also, I
| think it's meant to only be run against the "frontend" of
| most languages, not full execution.
| chubot wrote:
| Yeah but you could have a shell script that does
|
          rm -rf /foo/bar
|
| and in theory it could reduce it to
|
          rm -rf /
|
| The passes are very heuristic, so that doesn't seem like
| something you can rule out.
|
| (and I actually want to run it on shell scripts!)
| Cieric wrote:
| While I've used c-reduce before, I've never done it in a
| way where it could be destructive. However, speculating
| based on what I do know, there are two things I would do.
| First, in the interesting-ness test, grep for already
| known harmful instructions and force them to be
| uninteresting (returning 1). Second, if I was still
| unsure of the potential harm of the program and I had to
| run it to determine if it's interesting or not (segfault
| or something similar), I think I would hoist the
| binary/script into a docker container and run it there.
| That way the log and result can still be checked, but
| its access to the actual file system is minimized as much
| as possible.
|
| TLDR: C-Reduce just gives you text to run/compile; if
| you're worried about things like that, sandbox as much as
| possible.
| btilly wrote:
| So just run it inside of a Docker instance.
| Kuinox wrote:
| Without looking at the paper, I would guess it would be like a
| fuzzer that applies mutations that reduce the size of the input.
| gnulinux wrote:
| Isn't that potentially unsafe though? Random mutations can
| reduce arbitrary programs to destructive programs; in
| particular, any program can be mutated to `rm -rf /` given
| enough time. Also, even the problem of finding syntactically
| valid programs is combinatoric; it's pretty surprising that
| it can go toward a local optimum in such a short time without
| having any understanding of the function or its derivatives.
| jewel wrote:
| For compiled languages it should be fine, as you're only
| going to compile the permuted source code, not execute it.
|
| Given a sufficiently bad compiler bug anything is possible,
| but given that it's trying to minimize the size of
| an input that gives a particular output, I don't think it's
| likely to explore distant branches.
| Quekid5 wrote:
| > Given a sufficiently bad compiler bug anything is
| possible, ...
|
| I can't help but wonder about the consteval/constexpr
| machinery in the various C++ compilers... It'd be
| interesting to see how much adversarial input has been
| tested for those.
|
| (I guess Zig's comptime might also be relevant, but I'm
| not sure what that's allowed to do. Maybe execute
| arbitrary code?)
|
| ... anyway just a stray thought.
| cztomsik wrote:
| IIRC there's `zig reduce` built in to Zig :)
| https://github.com/ziglang/zig/blob/master/lib/compiler/redu...
| adastra22 wrote:
| It runs the generated code in order to determine if the
| bug exists, does it not?
| Cieric wrote:
| That depends completely on the interesting-ness test
| that's provided. If you're looking for a compiler bug
| (like I do often for my language), then the interesting-
| ness test checks the compile logs for information like
| the "Segmentation Fault" text, no need to run the actual
| executable. You could also hoist everything into docker
| if you really want to, or ship it off to another device
| to run.
| 0x457 wrote:
| > For compiled languages it should be fine, as you're
| only going to compile the permuted source code, not
| execute it.
|
| Only if you're going for a compiler bug. If you're working
| on a minimal reproducible example, you need to make sure
| your code isn't reduced to:
|
| int main() { return 0; }
|
| in that case.
| _flux wrote:
| I don't think that should trigger a reasonable
| interesting-ness condition one has specified, so that should
| not happen. So I suppose for non-compiler-bugs you need
| to check (from the output) that the correct thing is
| being done, in addition to detecting the actual issue.
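That two-part check can be sketched like this (every function name and string below is hypothetical, not from C-Reduce): the reducer only keeps variants where the bug still reproduces AND the code path under suspicion survives, so it cannot collapse the file to a trivial main().

```python
# Sketch of a two-part interestingness check for a runtime bug (all
# names and strings here are hypothetical). Checking only "it fails"
# would let the reducer shrink the file to `int main() { return 0; }`,
# so we also require that the suspect code path is still present.

def interesting(source: str, exit_code: int, output: str) -> bool:
    reproduces_bug = exit_code != 0 and "assertion failed" in output
    still_relevant = "parse_config(" in source   # the call being minimized around
    return reproduces_bug and still_relevant

print(interesting("int main() { return 0; }", 0, ""))  # -> False
```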
| bawolff wrote:
| It seems pretty unlikely to mutate in a malicious
| direction; random chance probably wouldn't get there, and
| there doesn't seem to be any reason it would be guided in
| that direction.
|
| "Enough time" does a lot of work here and warrants further
| analysis. With enough time it might produce the works of
| Shakespeare (if you ignore that it's designed to minimize), but we
| should probably view anything that would take more than a
| year as too much time.
| eru wrote:
| Use a sandboxed VM?
| ndjdjddjsjj wrote:
| Docker saves the day again!
| wiml wrote:
| I recommend skimming the PLDI paper linked by asmeurer, it has
| a good summary. Some of its transforms are pretty C-specific,
| using the Clang frontend; some are pretty generic and probably
| work on any Algol-descended language. It's meant to be a
| modular tool, so you could add transforms that understand other
| languages if you like.
| keybored wrote:
| > I mean, I don't think they're lying but (given it doesn't use
| an LLM) I am bewildered.
|
| I'm pretty sure (based on the last time I saw this) that this
| is just good old fashioned computer science.[1]
|
| I really hope that HN hasn't forgotten about good old fashioned
| computer science stuff already.
|
| [1] All the algorithm and whatnot stuff, not the spooky machine
| learning stuff. Even things like Prolog-for-AI, although that
| has the slight downside of not working (for the purposes of
| making AI).
| furyofantares wrote:
| To be clear my comment was meant to be an awareness that it
| is good old fashioned computer science. Without LLMs, which
| this predates, it is surprising to me that you'd have a lot
| of success reducing a program in an arbitrary language and
| still having something that's valid syntax!
|
| I do get that it will reject a lot of stuff as not working
| (and has to even in the target language) and I also get that
| algol-like languages are all very similar, but I am still
| surprised that it works well enough to function on ~arbitrary
| languages.
|
| These are very LLM-era properties for a program to have. The
| question is not "does it work for language x" but "how well
| does it work for language x", and the answer is not "is it
| one of the languages it was designed for" but instead "idunno
| try it out and see".
| ndjdjddjsjj wrote:
| An AST for your favourite language is readily available,
| so get one and go nuts.
|
| I am sure in 2006 or so I was doing "extract method" in
| Resharper. And Lispers probably doing it for centuries ;)
| Philpax wrote:
| The fascinating part isn't that it can modify code
| accordingly to some rules - it's that it can modify code
| from _arbitrary_ languages without having a grammar or
| semantic analysis for them.
| ndjdjddjsjj wrote:
| I think the tool has some grammar info: C, C++, C#, Java,
| Go, JS are similar.
|
| And that is enough when:
|
| 1. Most code in a code file is irrelevant to whatever
| specific bug is being looked at.
|
| 2. It is removal of code at play not writing code.
| jcranmer wrote:
| So the short answer is that some of the reduction passes are
| pretty generalizable to C-family languages, and these can be
| some of the most effective passes.
|
| Some of those passes are:
|
| * Tokenize the input according to C, and randomly chuck away
| chunks of length 1-(I think) 13. Most languages vaguely similar
| to C have very similar tokenization rules to C, so these kinds
| of passes are going to have similar effects, and this will tend
| to be effective in doing things like omitting needless
| qualifiers or attributes.
|
| * Balanced parentheses, and chuck away units of balanced
| parentheses. This includes (), {}, and []--and for almost any
| language, this can be a useful level of stuff to throw away or
| reduce.
|
| * Stripping comments and whitespace. Again, many languages use
| the /* and // styles of comments that C does, so this is pretty
| effective.
|
| There's actually relatively few passes that are incredibly
| C/C++-specific, and in my experience, one of the biggest issues
| with creduce is that it is absolutely terrible at trying to
| detemplatize the code as a reduction step, which should be a
| relatively easily automatable step.
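The language-agnostic passes described above can be illustrated with a short sketch (this is an illustration of the idea, not C-Reduce's actual Perl):

```python
import re

def strip_comments(src):
    # Remove /* ... */ block comments and //-style line comments --
    # good enough for the many languages that borrow C's comment
    # syntax.
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", src)

def balanced_spans(src, open_ch="{", close_ch="}"):
    # Yield (start, end) index pairs of balanced { ... } regions; a
    # reducer can then try deleting each span and re-run the
    # interestingness test on the result.
    stack = []
    for i, ch in enumerate(src):
        if ch == open_ch:
            stack.append(i)
        elif ch == close_ch and stack:
            yield (stack.pop(), i + 1)
```

Neither function knows anything about the language beyond C-ish surface syntax, which is why these passes transfer so well to other Algol-family languages.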
| ndjdjddjsjj wrote:
| The compile and test condition do the heavy lifting.
| DRMacIver wrote:
| I think the easiest way to think about it is that it's gradient
| descent, it's just a slightly weird form of it over a discrete
| space.
|
| If you imagine you've got some neighbourhood function N(x) that
| takes a program and returns a large sample of valid programs
| that are "similar" to x (its neighbourhood), then your basic
| test-case reduction algorithm is just:
|
| 1. Start with some initial test case.
|
| 2. Look at the neighbourhood of your current test case. If
| any elements of it are both interesting and smaller than
| your current test case, move to one of those. If not, stop.
|
| 3. Return the smallest interesting test case you've seen at
| the end.
|
| And most of what varies from test-case reducer to test-case
| reducer is that neighbourhood function (how you explore the
| neighbourhood also varies, but you can think of that as an
| implementation detail that mostly only affects performance).
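A minimal sketch of that loop, with `interesting` and `neighbourhood` supplied by the caller (this is a generic illustration, not any particular reducer's code):

```python
def reduce_case(test_case, interesting, neighbourhood):
    # Greedy descent over program space: keep moving to any smaller
    # interesting neighbour until none exists, then return the
    # smallest interesting test case seen.
    best = test_case
    improved = True
    while improved:
        improved = False
        for candidate in neighbourhood(best):
            if len(candidate) < len(best) and interesting(candidate):
                best = candidate
                improved = True
                break
    return best
```

With a neighbourhood as crude as "delete one contiguous chunk", this is already a workable (if slow) reducer.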
|
| This approach will generally work pretty well because the sort
| of bugs you tend to find are "dense in program space". This
| will e.g. do nothing if you start from the property that the
| file has a particular sha1 hash, but if you start from a
| property like "this uses this particular pair of language
| features", most nearby programs will share it.
|
| The nice thing about doing test-case reduction is that it
| doesn't actually matter if your neighbourhood contains only
| valid programs, because you can rely on the interestingness
| test to filter out any invalid program. So all you care about
| really is:
|
| 1. It contains a fairly large set of valid programs among
| the test cases you consider.
|
| 2. It's not too large so as to be prohibitive to explore.
|
| And this frees you up to just try a bunch of heuristics that
| are likely to work pretty well, and it turns out that there are
| a lot of easily accessible ones. In particular, deleting
| contiguous chunks of a program pretty often produces a valid
| program, which will always be shorter.
|
| For a trivial example, imagine you've got something like:
|
|         if True:
|             blah()
|
| in a Python program. It might look like you need to know about
| Python syntax to reduce this to just `blah()`, but in fact you
| can reduce it by deleting the 13-byte string `"if True:\n    "`.
| So if your reducer is happy to brute force try all short
| strings, it will be able to make this transformation.
|
| This isn't a real example exactly. C-reduce I think won't find
| this particular one (but I might be misremembering its exact
| heuristics here - it stops at 10-byte sequences, which won't
| work here). Shrinkray will, but it will do so by understanding
| Python. But it demonstrates the sort of thing that can work
| surprisingly well without understanding syntax.
|
| Another thing you can do is you can understand just enough
| syntax in a general purpose way. For example, shrinkray (and I
| think C-Reduce) will look for integer literals in the program
| and try replacing them with integer literals representing a
| smaller number. In order to do this, it doesn't have to know
| anything about the program's lexical syntax, it just has to
| know what a number looks like.
|
| Heuristics like this will often fail to work, but that's OK,
| you just need them to succeed enough, and often some pretty
| crude stuff will get you down to something that is small enough
| that a human can get it the rest of the way.
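The integer-literal heuristic mentioned above can be sketched with nothing but a regex (an illustration of the idea, not shrinkray's actual code):

```python
import re

def shrink_integers(src):
    # Replace every integer literal with a halved value; a reducer
    # would keep the rewritten program only if the interestingness
    # test still passes on it.
    return re.sub(r"\b\d+\b", lambda m: str(int(m.group()) // 2), src)
```

Note that this needs no knowledge of the language's grammar at all, only of what a number looks like.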
| graypegg wrote:
| Found this article with examples of before and after:
| https://pramodkumbhar.com/2024/01/c-reduce-systematically-ta...
|
| But still having trouble understanding how it knows WHAT to
| remove when trying each iteration. There must be some
| tokenization going on, but then I don't know how that would work
| across programming languages.
| wood_spirit wrote:
| Creduce is awesome.
|
| I used a test script that spent hours using CSmith to generate
| random test programs when developing an esoteric llvm target.
| When they crashed it automatically Creduced and left them for
| inspection. Invaluable!
| asmeurer wrote:
| A paper by the authors John Regehr et al. from 2012 explaining
| how it works https://fsl.cs.illinois.edu/publications/regehr-
| chen-cuoq-ei....
| gnulinux wrote:
| I read this paper and I still feel lost as to _how can this
| even be possible_. It seems to understand how to tokenize,
| merge lines, remove tokens etc for arbitrary programming
| languages. Is there another paper that explains this algorithm
| alone?
| IshKebab wrote:
| It basically runs transformation passes over the code until
| they stop changing the code or break its behaviour.
|
| It seems like a small number of the passes are not specific
| to C:
|
| https://github.com/csmith-
| project/creduce/blob/master/creduc...
|
| `"C" => 1,` means it is only for C.
|
| I would guess `pass_lines` is the most important for non-C
| code; I'm guessing (it's written in unreadable Perl) that it
| removes lines.
|
| So while it _can_ work for languages other than C, most of
| its features are C-specific so it's not going to work nearly
| as well. Still, I'd never heard of C-Reduce; pretty cool tool!
|
| Someone make one based on Tree Sitter, quick!
| the8472 wrote:
| douzo https://langston-barrett.github.io/treereduce/
| codesnik wrote:
| unreadable perl?! I clicked on the link expecting something
| super-terse and unstructured, but it's anything but. What
| the hell is wrong with people casually dropping such
| remarks.
| dazed_confused wrote:
| Because if you aren't writing in python or JS/TS then it
| is gibberish. Not like Perl is used all throughout Linux
| system...
| dataflow wrote:
| I was wondering the same thing but I guess the key is
| probably that you don't actually have to do it correctly
| every time. Just tokenizing based on a few common
| characteristics (brace pairing, quotes, indentation,
| newlines, etc.) should let you trim a ton without knowing
| anything about the language, I imagine?
|
| My real worry is what if this ends up running dangerous code.
| Like what if you have a disabled line that writes instead of
| reading, that randomly gets reactivated?
| steventhedev wrote:
| It's intended to run for producing compiler test cases, so
| there shouldn't be any code that's actually running.
|
| CPython includes a flag to only run parsing/compiling to
| bytecode. While you can use it like they did here and run
| the code, it really depends on how much you trust every
| possible subset of your code.
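One way to do that check from Python itself is the `py_compile` module, which parses and byte-compiles a file without running any of it:

```python
import py_compile

def compiles_ok(path):
    # Parse and byte-compile without executing the candidate program;
    # this makes a safe interestingness check for parser/compiler
    # bugs, since no reduced code ever actually runs.
    try:
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError:
        return False
```

For bugs that only manifest at runtime, though, you are back to needing a sandbox of some kind.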
| summarity wrote:
| > Just tokenizing based on a few common characteristics
| (brace pairing, quotes, indentation, newlines, etc.) should
| let you trim a ton without knowing anything about the
| language, I imagine?
|
| Yep, here's an explanation from a related tool, which was
| spawned by the parser-parser-combinator[0] approach to
| syntactical analysis and transformation:
| https://comby.dev/blog/2021/03/26/comby-reducer - which is
| based on what you've said.
|
| [0] - https://www.youtube.com/watch?v=JMZLBB_BFNg
| nyrikki wrote:
| This paper is not about C-Reduce as a whole but 3 new "domain-
| specific test-case reducers" that the authors added to the
| project.
|
| Most of the non-domain specific reductions in C-Reduce is
| simply brute force IIRC.
| lambda wrote:
| Note that as recommended by John Regehr, author of C-Reduce, for
| this use case you might also want to try Shrinkray, a tool that
| was written to be format independent and works well for cases
| that C-Reduce does not:
| https://mastodon.social/@regehr/113489759789563570
| canu7 wrote:
| Looking forward to trying it! A shout out to cvise
| (https://github.com/marxin/cvise) as a python alternative that
| works well too for non c languages.
| droideqa wrote:
| Is this[0] the official repo of Shrinkray?
|
| [0]: https://github.com/DRMacIver/shrinkray
| eru wrote:
| Interesting, isn't that also the author of Python's
| hypothesis?
| lambda wrote:
| Yes, hypothesis also does minimization of failed test
| cases, so it's kind of a similar problem, just a question
| of what format the data is in and how you invoke your test.
| DRMacIver wrote:
| Yes, that's me.
|
| I accidentally got obsessed with test-case reduction as a
| result of writing Hypothesis, and wrote shrinkray because I
| thought it was ridiculous that I hadn't put all of the
| things I'd learned about test-case reduction into some
| general purpose tool that other people could benefit from.
|
| Shrinkray is still a bit under-advertised and rough around
| the edges in places, but I think that basically there's no
| good reason to use C-reduce over shrinkray for things that
| aren't C or C++. Shrinkray has very good generic heuristics
| that should work in most languages (it even works pretty
| well on binary files apparently, although I've not tested
| this myself), a couple of language specific passes for
| other languages, and is much more fun to watch.
|
| It might even be the case that you should use shrinkray for
| C and C++ too (because I use C-Reduce's transformations for
| those languages), but I'm less confident that that's true.
| The best I'll confidently claim is "probably more
| competitive than you'd expect".
| lambda wrote:
| Yep!
| andrewchambers wrote:
| I wrote a general purpose delta debugger in python -
| https://github.com/andrewchambers/ddmin-python. Can be used to do
| the same thing.
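For readers unfamiliar with it, the core of Zeller and Hildebrandt's ddmin algorithm is short; a simplified sketch (not the linked project's exact implementation):

```python
def ddmin(items, interesting):
    # Classic delta-debugging minimization: find a smaller
    # subsequence of `items` that still satisfies `interesting`,
    # by repeatedly trying to remove chunks at finer granularity.
    n = 2
    while len(items) >= 2:
        chunk = max(len(items) // n, 1)
        reduced = False
        # Try keeping the complement of each chunk.
        for i in range(0, len(items), chunk):
            complement = items[:i] + items[i + chunk:]
            if complement and interesting(complement):
                items = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if chunk == 1:
                break
            n = min(n * 2, len(items))
    return items
```

Here `items` can be lines, tokens, or bytes; the granularity you pick is most of what distinguishes one ddmin-based reducer from another.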
| judofyr wrote:
| Well, aren't you going to share the reduced file?
|
| Okay fine, let's see for ourselves:
|
|       # Setup
|       git clone git@github.com:RustPython/RustPython.git && cd RustPython && cargo build --release
|       git clone git@github.com:tekknolagi/scrapscript.git && cp scrapscript/scrapscript.py .
|       # Download interesting.sh, replace the path to RustPython
|       chmod +x interesting.sh
|       # Command in article:
|       nix run nixpkgs#creduce -- --not-c interesting.sh scrapscript.py
|
| Niiice:
|
|       (93.2 %, 13717 bytes)
|       (93.2 %, 13683 bytes)
|       (93.3 %, 13614 bytes)
|       (93.3 %, 13592 bytes)
|       (93.3 %, 13571 bytes)
|       (93.3 %, 13517 bytes)
|       (93.3 %, 13449 bytes)
|       (93.4 %, 13412 bytes)
|       (93.4 %, 13365 bytes)
|       (93.4 %, 13333 bytes)
|       (93.4 %, 13313 bytes)
|
| It seems to have stopped at "(96.4 %, 7347 bytes)" with the
| following output:
| https://gist.github.com/judofyr/47cba8a20cb2cd5798943ef975d0...
| tomberek wrote:
| Random thought. Another commenter worried about the runtime of
| the program becoming mangled and performing destructive
| operations on your machine. What if you run the reducer as a
| source-to-source Nix derivation? Protects against dangerous
| things, and can be easily distributed to remote builders.
| capitol_ wrote:
| How does this protect against dangerous things?
|
| My understanding is that this would just cause the dangerous
| things to be repeatable.
| dietr1ch wrote:
| It must be that by running any program within the nix build
| sandbox you don't expose your files unless you discover a
| privilege escalation attack by chance during the reduction
| process.
| judofyr wrote:
| Oh, that's a neat idea. To reply to the sibling comment: Nix
| runs commands under a sandbox. It's not bullet proof, but
| useful enough that it will prevent obvious commands like
| deleting local files and accessing the network.
|
| It took some time to get RustPython to run, but it seems to
| work fine: https://gist.github.com/judofyr/82d2255c92ebb0c6b3
| a6882ac9fd...
|
| Also, after running it for 5 minutes it was now able to
| reduce it to a much smaller case:
|
|       class a:
|           def b() : c
|           def h(d) :
|               while d: break
|               else:
|                   return d.b()
|       def j(e) :
|           f = a()
|           if f.h () : g
|       i = ""
|       j(i)
| hinkley wrote:
| You generally shouldn't use absolute paths in your programs.
| That makes it dangerous for local development, and makes
| running two copies of the program to compare behaviors
| difficult.
|
| There's a Venn diagram of people who don't care enough to do
| that (works for me!) and people who would never think to use
| c-reduce. It's not a perfect circle, but it'll be fairly
| close.
| somat wrote:
| > You generally shouldn't use absolute paths in your
| programs.
|
| This is true, but I was enjoying the irony that there is an
| old sys-admin adage that you should only use absolute paths
| in your programs (usually shell scripts, in that
| environment); this is to make sure it is running exactly
| what you expect it to run.
|
| So always put "/usr/bin/awk" instead of just "awk"
|
| I had a co-worker once who took this as gospel. His scripts
| were always... interesting... to port to a new environment.
| DRMacIver wrote:
| FWIW shrinkray gets to 162 bytes if I leave it to run for about
| 10 minutes.
| https://gist.github.com/DRMacIver/ee025c90b4867125b382a13aaa...
|
| I think it might do a little but not a lot better if I left it
| to run for longer but it looked to have mostly stalled then so
| I got bored and killed it.
| def- wrote:
| Works well for SQL too, I've been using it at my day job, found
| out via https://github.com/sqlancer/sqlancer?tab=readme-ov-
| file#redu...
| gmueckl wrote:
| How does this compare to dustmite
| (https://dlang.org/blog/2020/04/13/dustmite-the-general-
| purpo...)?
| dswilkerson wrote:
| Delta debugging is not new:
| https://en.wikipedia.org/wiki/Delta_debugging
|
| My implementation of delta debugging, "delta" is 19+ years old:
| https://github.com/dsw/delta
|
| I released it as Open Source because Microsoft Research sent
| someone to my office to ask me to, back when Microsoft was
| calling Open Source a "cancer".
|
| The LLVM introduction by Lattner refers to the "standard delta
| debugging tool", so it is rather well-known:
| https://aosabook.org/en/v1/llvm.html 'unlike the standard "delta"
| command line tool.'
| arxanas wrote:
| For other readers' benefit: C-Reduce is a little more
| sophisticated than plain delta-debugging. From the abstract of
| Test-Case Reduction for C Compiler Bugs (2012):
|
| > [...] [C-Reduce] produces outputs that are, on average, more
| than 25 times smaller than those produced by our other reducers
| or by the existing reducer that is most commonly used by
| compiler developers. We conclude that effective program
| reduction requires more than straightforward delta debugging.
|
| (Of course, this means that C-Reduce is 12 years old now.)
|
| At the same time, C-Reduce seems to be more general than the
| LLVM tool you linked ("BugPoint", dating to 2002), since that
| one works with LLVM IR specifically.
|
| I think most developers are generally unfamiliar with automatic
| test case minimization tools and techniques, so the post may be
| helpful even if the ideas have been known in their respective
| circles for quite some time.
| jheriko wrote:
| this seems somewhat interesting but the example given is
| incredibly stupid.
| tekknolagi wrote:
| Thank you for your feedback. I'll take that into consideration
| for my next blog post and compiler.
| wyldfire wrote:
| A good companion/successor is cvise [1].
|
| I've used c-reduce/cvise to reduce assembly programs to the
| minimal set that evokes an assembler defect quite a few times.
| It's a real handy program.
|
| [1] https://github.com/marxin/cvise
| ndesaulniers wrote:
| True, I've used it for asm, though these days I prefer c-vise.
| af3d wrote:
| The link to the documentation seems to be dead. Archived version:
| https://web.archive.org/web/20221203043431/https://embed.cs....
| rurban wrote:
| Of course, because it's Perl, the Swiss army knife. Reducing
| token sets, blocks, and statements is pretty language agnostic.
| It would have been an over-architected multi-file mess in
| Python.
| edwcross wrote:
| A "super-parallel Python port" of C-Reduce, cvise
| (https://github.com/marxin/cvise), is actually written in
| Python.
| rurban wrote:
| See? What I said
| ajb wrote:
| This kind of algorithm needs to be better known. For example, the
| original DD algorithm can be used to automatically minimise the
| set of cloud permissions for a service (from a given set) (A PoC
| is aws-policy-minimize here:
| https://github.com/kanocomputing/aws-tools/)
| pyrolistical wrote:
| Makes me think about an alternative one that is language specific
| and uses genetic algorithm.
|
| You start with your large repo case
|
| 1. Use language specific mutations that both add/delete
|
| 2. Crossover
|
| 3. Test fitness to see if they retain the compiler bug and sort
| by code size
|
| 4. Keep top N
|
| 5. Repeat
|
| This allows the code to temporarily get larger to escape
| local minima.
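The five steps above could be sketched like this; everything here is hypothetical, with `mutate`, `crossover`, and `has_bug` standing in for the language-specific pieces:

```python
import random

def genetic_reduce(seed, mutate, crossover, has_bug,
                   population_size=20, generations=100):
    # Genetic-algorithm reducer: keep the smallest N variants that
    # still reproduce the bug. Because mutation may also grow
    # programs, the search can escape local minima that a purely
    # greedy reducer would get stuck in.
    population = [seed]
    for _ in range(generations):
        offspring = [mutate(p) for p in population]
        offspring += [crossover(random.choice(population),
                                random.choice(population))
                      for _ in range(len(population))]
        # Fitness test: only variants that retain the bug survive,
        # ranked by code size (smaller is better).
        survivors = sorted((p for p in population + offspring
                            if has_bug(p)), key=len)
        population = survivors[:population_size] or population
    return population[0]
```

Whether this beats plain delta debugging in practice would depend heavily on how good the language-specific mutation operators are.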
___________________________________________________________________
(page generated 2024-11-28 23:00 UTC)