[HN Gopher] Semantic fuzzing of the Rust compiler and interprete...
___________________________________________________________________
Semantic fuzzing of the Rust compiler and interpreter [pdf]
Author : todsacerdoti
Score : 143 points
Date : 2023-11-28 13:52 UTC (9 hours ago)
(HTM) web link (ethz.ch)
(TXT) w3m dump (ethz.ch)
| sebstefan wrote:
| > A long-standing miscompilation in the Rust compiler was the
| incorrect removal of loops even if the compiler cannot tell
| whether it terminates (#28728).
|
| >This resulted in the Rust compiler apparently "proving" the
| unresolved Collatz conjecture by optimising this function to
| return true [1]. pub fn collatz(n: usize) ->
| bool { match n { 1 => true, n
| if n % 2 == 0 => collatz(n/2), _ => collatz(3\*n +
| 1), } }
|
| That's funny. Hey maybe it was just that good!
| j16sdiz wrote:
| Collatz conjecture is tested to be true for n < 2^100000 - 1.
| That is, true for all value represent-able in usize.
| Squeeze2664 wrote:
| Sounds like it was working as intended, then! Doing a great
| job at it, too.
| dllthomas wrote:
| Sounds, rather, like it was doing the right thing in that
| case but for the wrong reasons!
| Sharlin wrote:
| Oh yes, that's just LLVM's ConstantFoldCollatzConjecture
| pass, a basic optimization.
|
| > That is, true for all value represent-able in usize.
|
| A slight understatement!
| not2b wrote:
| But if we pass in an odd value that is more than half of the
| maximum value represented in a usize, we get an overflow and
| a panic when we make a recursive call with n * 3 + 1, right?
| We shouldn't be optimizing those away.
| random_ wrote:
| > The source of this bug is LLVM [2] which Rust relies on for
| optimisations and machine code generation.
|
| So basically rustc_codegen_gcc and cranelift can compile code
| that don't with LLVM and then introduce (or solve) new bugs?
| kibwen wrote:
| Note that it's entirely feasible that GCC would do the same
| in this specific case. This wasn't necessarily a "bug" in
| LLVM per se, because this is the specified behavior in C and
| C++, and LLVM was simply implementing those semantics. The
| fix here was just to allow languages to opt-out of this
| behavior. Since GCC is also primarily a C/C++ backend, you'd
| have the same problem unless they offer a similar opt-out.
|
| But in general, yes, if this _was_ unintended behavior of
| LLVM, then we would expect alternative backends to exhibit
| differing behavior.
| Arnavion wrote:
| What you're talking about was a bug in LLVM. The paper
| says:
|
| >Infinite loops and recursions are Undefined Behaviour in C
| and C++, therefore the compiler can assume that a loop
| always terminates.
|
| ... which is wrong - only C++ requires loops to terminate.
| C allows non-terminating loops (with some specific
| requirements), and LLVM was already known to miscompile C
| in this same way until it was fixed in LLVM 12 with the
| `mustprogress` attribute.
|
| https://bugs.llvm.org/show_bug.cgi?id=965
|
| https://stackoverflow.com/questions/59925618/how-do-i-
| make-a...
|
| Then this:
|
| >While LLVM does not require loops to terminate and should
| compile Rust loops correctly, its code contained
| assumptions that only hold for C/C++, thus miscompiling
| Rust code.
|
| ... is implied to be a _different_ LLVM bug after the first
| one was already fixed, ie despite using `mustprogress` LLVM
| had additional miscompilations for Rust code because of
| "assumptions that only hold for C/C++". I can't tell which
| one in table 4.2 it is, though; none of them see to match.
| gpm wrote:
| Eh, 0 is an infinite loop in that function.
|
| It also doesn't implement the collatz conjecture. It implements
| the collatz-conjecture mod (usize::MAX + 1). I don't know about
| that modulus specifically, but there are m where the collatz
| conjecture mod m is false.
|
| https://www.researchgate.net/profile/Cganesa-Moorthy/publica...
|
| Edit: (2^64 - 1) / 3 is an integer and also trivially
| infinitely loops (by going to 0).
| Georgelemental wrote:
| > It implements the collatz-conjecture mod (usize::MAX + 1).
|
| Only if you have overflow checking disabled.
| aidenn0 wrote:
| Note that in C++, all side-effect-free loops can be assumed to
| terminate, and thus eliminated.
| vacuity wrote:
| Assume is a strong word. It's defined by the standard by way
| of undefined behavior.
| tester756 wrote:
| Which is crazy, because there's always side effect like
| temperature, fans speed/noise, etc.
| not2b wrote:
| That isn't what is meant by "side effect" here. A function
| with no side effects only modifies its local variables and
| returns a result. So, if we have
|
| int foo(int a, int b) { int c = bar(a); return b; }
|
| and we can see bar's body, and therefore we can determine
| that it has no side effects, we can optimize the call to
| bar away, because we can assume that it terminates and it
| doesn't affect the result.
|
| But if bar writes to global variables, that's a side
| effect. If we can't see the body of bar we have to assume
| that it might.
| muglug wrote:
| Nikita Popov is thanked in the forward for work fixing LLVM bugs.
|
| Before he pivoted to LLVM work, Nikita was one of the key
| individual contributors to the official PHP project in the last
| decade. He's an incredible force for good.
| KMag wrote:
| Or, throwing a bit of (mostly) pretend shade on PHP, he's
| atoning for his PHP contributions by fixing LLVM bugs.
| kibwen wrote:
| Note that Nikita has had a hand in many of the improvements
| that have made PHP much more pleasant since the fractal-of-
| bad-design days:
| https://www.npopov.com/aboutMe.html#accepted-php-proposals
| hu3 wrote:
| There's nothing wrong with working on PHP and he should (and
| is) proud of his contributions. HN is no place for this level
| of snark and PL tribalism.
|
| In fact, if you check his Twitter, he retweeted the latest
| release of PHP version, 5 days ago:
| https://twitter.com/nikita_ppv
| IshKebab wrote:
| I don't think you can dismiss criticism of programming
| languages as "PL tribalism". The idea that all programming
| languages are equally great and "best tool for the job"
| excuses all flaws is clearly ridiculous.
|
| PHP is not a very good language. Yes, even modern PHP.
| mcronce wrote:
| It was also explicitly stated to be a joke. There's
| really nothing wrong with that.
| hu3 wrote:
| Not all jokes are harmless. Specially the flamebait prone
| ones. Take a look at the site guidelines about it
| (comment above in this same thread).
|
| Moderation is active and monitors this kind of behavior.
| For example:
| https://news.ycombinator.com/item?id=38410031
| hu3 wrote:
| I can and do dismiss non-constructive criticism, based on
| HN guidelines.
|
| Literally the first two lines about comment etiquete:
|
| > Be kind. Don't be snarky. Converse curiously; don't
| cross-examine. Edit out swipes.
|
| > Comments should get more thoughtful and substantive,
| not less, as a topic gets more divisive.
|
| https://news.ycombinator.com/newsguidelines.html
| pcwalton wrote:
| Seconding this. Nikita has been a fantastic help in reviewing
| my LLVM contributions over the past couple of years.
| eminence32 wrote:
| Neat that this thesis was written by a student of ralfj [1][2]
| (who has done a lot of work to drive forward formal definitions
| for rust).
|
| [1] https://research.ralfj.de/ [2]
| https://news.ycombinator.com/user?id=ralfj
| PoignardAzur wrote:
| > _Producing observable and deterministic output_
|
| The approach described in this section (just insert a bunch of
| calls to a dump_var() function which dumps its results into the
| graph) is really interesting to me. A problem I've had with
| fuzzing is getting some high-bandwidth error data other than "the
| program crashed". You can eg instrument pointer reads and writes,
| but sometimes reordering or deleting them is a legitimate
| optimization.
|
| It's really cool to see Rust developers pioneer these
| revolutionary new methods nobody thought of before.
| </CunninghamsLaw> (But seriously, if someone has a name for this
| technique, I'm interested.)
| UltimateEdge wrote:
| As I understand it, the value of this thesis is the design of the
| fuzzer, which produces sufficiently 'complex' code to exercise
| uncommon code paths in the compiler, while retaining desirable
| properties such as reducibility.
| IshKebab wrote:
| I've only got a few pages through this but I just wanted to say
| how well this is written. It's so clear! If only all academic
| texts were like this.
___________________________________________________________________
(page generated 2023-11-28 23:01 UTC)