[HN Gopher] Semantic fuzzing of the Rust compiler and interprete...
       ___________________________________________________________________
        
       Semantic fuzzing of the Rust compiler and interpreter [pdf]
        
       Author : todsacerdoti
       Score  : 143 points
       Date   : 2023-11-28 13:52 UTC (9 hours ago)
        
 (HTM) web link (ethz.ch)
 (TXT) w3m dump (ethz.ch)
        
       | sebstefan wrote:
       | > A long-standing miscompilation in the Rust compiler was the
       | incorrect removal of loops even if the compiler cannot tell
       | whether it terminates (#28728).
       | 
       | >This resulted in the Rust compiler apparently "proving" the
       | unresolved Collatz conjecture by optimising this function to
       | return true [1].                   pub fn collatz(n: usize) ->
       | bool {           match n {             1 => true,             n
       | if n % 2 == 0 => collatz(n/2),             _ => collatz(3\*n +
       | 1),           }         }
       | 
       | That's funny. Hey maybe it was just that good!
        
         | j16sdiz wrote:
         | Collatz conjecture is tested to be true for n < 2^100000 - 1.
         | That is, true for all value represent-able in usize.
        
           | Squeeze2664 wrote:
           | Sounds like it was working as intended, then! Doing a great
           | job at it, too.
        
             | dllthomas wrote:
             | Sounds, rather, like it was doing the right thing in that
             | case but for the wrong reasons!
        
           | Sharlin wrote:
           | Oh yes, that's just LLVM's ConstantFoldCollatzConjecture
           | pass, a basic optimization.
           | 
           | > That is, true for all value represent-able in usize.
           | 
           | A slight understatement!
        
           | not2b wrote:
           | But if we pass in an odd value that is more than half of the
           | maximum value represented in a usize, we get an overflow and
           | a panic when we make a recursive call with n * 3 + 1, right?
           | We shouldn't be optimizing those away.
        
         | random_ wrote:
         | > The source of this bug is LLVM [2] which Rust relies on for
         | optimisations and machine code generation.
         | 
         | So basically rustc_codegen_gcc and cranelift can compile code
         | that don't with LLVM and then introduce (or solve) new bugs?
        
           | kibwen wrote:
           | Note that it's entirely feasible that GCC would do the same
           | in this specific case. This wasn't necessarily a "bug" in
           | LLVM per se, because this is the specified behavior in C and
           | C++, and LLVM was simply implementing those semantics. The
           | fix here was just to allow languages to opt-out of this
           | behavior. Since GCC is also primarily a C/C++ backend, you'd
           | have the same problem unless they offer a similar opt-out.
           | 
           | But in general, yes, if this _was_ unintended behavior of
           | LLVM, then we would expect alternative backends to exhibit
           | differing behavior.
        
             | Arnavion wrote:
             | What you're talking about was a bug in LLVM. The paper
             | says:
             | 
             | >Infinite loops and recursions are Undefined Behaviour in C
             | and C++, therefore the compiler can assume that a loop
             | always terminates.
             | 
             | ... which is wrong - only C++ requires loops to terminate.
             | C allows non-terminating loops (with some specific
             | requirements), and LLVM was already known to miscompile C
             | in this same way until it was fixed in LLVM 12 with the
             | `mustprogress` attribute.
             | 
             | https://bugs.llvm.org/show_bug.cgi?id=965
             | 
             | https://stackoverflow.com/questions/59925618/how-do-i-
             | make-a...
             | 
             | Then this:
             | 
             | >While LLVM does not require loops to terminate and should
             | compile Rust loops correctly, its code contained
             | assumptions that only hold for C/C++, thus miscompiling
             | Rust code.
             | 
             | ... is implied to be a _different_ LLVM bug after the first
             | one was already fixed, ie despite using `mustprogress` LLVM
             | had additional miscompilations for Rust code because of
             | "assumptions that only hold for C/C++". I can't tell which
             | one in table 4.2 it is, though; none of them see to match.
        
         | gpm wrote:
         | Eh, 0 is an infinite loop in that function.
         | 
         | It also doesn't implement the collatz conjecture. It implements
         | the collatz-conjecture mod (usize::MAX + 1). I don't know about
         | that modulus specifically, but there are m where the collatz
         | conjecture mod m is false.
         | 
         | https://www.researchgate.net/profile/Cganesa-Moorthy/publica...
         | 
         | Edit: (2^64 - 1) / 3 is an integer and also trivially
         | infinitely loops (by going to 0).
        
           | Georgelemental wrote:
           | > It implements the collatz-conjecture mod (usize::MAX + 1).
           | 
           | Only if you have overflow checking disabled.
        
         | aidenn0 wrote:
         | Note that in C++, all side-effect-free loops can be assumed to
         | terminate, and thus eliminated.
        
           | vacuity wrote:
           | Assume is a strong word. It's defined by the standard by way
           | of undefined behavior.
        
           | tester756 wrote:
           | Which is crazy, because there's always side effect like
           | temperature, fans speed/noise, etc.
        
             | not2b wrote:
             | That isn't what is meant by "side effect" here. A function
             | with no side effects only modifies its local variables and
             | returns a result. So, if we have
             | 
             | int foo(int a, int b) { int c = bar(a); return b; }
             | 
             | and we can see bar's body, and therefore we can determine
             | that it has no side effects, we can optimize the call to
             | bar away, because we can assume that it terminates and it
             | doesn't affect the result.
             | 
             | But if bar writes to global variables, that's a side
             | effect. If we can't see the body of bar we have to assume
             | that it might.
        
       | muglug wrote:
       | Nikita Popov is thanked in the forward for work fixing LLVM bugs.
       | 
       | Before he pivoted to LLVM work, Nikita was one of the key
       | individual contributors to the official PHP project in the last
       | decade. He's an incredible force for good.
        
         | KMag wrote:
         | Or, throwing a bit of (mostly) pretend shade on PHP, he's
         | atoning for his PHP contributions by fixing LLVM bugs.
        
           | kibwen wrote:
           | Note that Nikita has had a hand in many of the improvements
           | that have made PHP much more pleasant since the fractal-of-
           | bad-design days:
           | https://www.npopov.com/aboutMe.html#accepted-php-proposals
        
           | hu3 wrote:
           | There's nothing wrong with working on PHP and he should (and
           | is) proud of his contributions. HN is no place for this level
           | of snark and PL tribalism.
           | 
           | In fact, if you check his Twitter, he retweeted the latest
           | release of PHP version, 5 days ago:
           | https://twitter.com/nikita_ppv
        
             | IshKebab wrote:
             | I don't think you can dismiss criticism of programming
             | languages as "PL tribalism". The idea that all programming
             | languages are equally great and "best tool for the job"
             | excuses all flaws is clearly ridiculous.
             | 
             | PHP is not a very good language. Yes, even modern PHP.
        
               | mcronce wrote:
               | It was also explicitly stated to be a joke. There's
               | really nothing wrong with that.
        
               | hu3 wrote:
               | Not all jokes are harmless. Specially the flamebait prone
               | ones. Take a look at the site guidelines about it
               | (comment above in this same thread).
               | 
               | Moderation is active and monitors this kind of behavior.
               | For example:
               | https://news.ycombinator.com/item?id=38410031
        
               | hu3 wrote:
               | I can and do dismiss non-constructive criticism, based on
               | HN guidelines.
               | 
               | Literally the first two lines about comment etiquete:
               | 
               | > Be kind. Don't be snarky. Converse curiously; don't
               | cross-examine. Edit out swipes.
               | 
               | > Comments should get more thoughtful and substantive,
               | not less, as a topic gets more divisive.
               | 
               | https://news.ycombinator.com/newsguidelines.html
        
         | pcwalton wrote:
         | Seconding this. Nikita has been a fantastic help in reviewing
         | my LLVM contributions over the past couple of years.
        
       | eminence32 wrote:
       | Neat that this thesis was written by a student of ralfj [1][2]
       | (who has done a lot of work to drive forward formal definitions
       | for rust).
       | 
       | [1] https://research.ralfj.de/ [2]
       | https://news.ycombinator.com/user?id=ralfj
        
       | PoignardAzur wrote:
       | > _Producing observable and deterministic output_
       | 
       | The approach described in this section (just insert a bunch of
       | calls to a dump_var() function which dumps its results into the
       | graph) is really interesting to me. A problem I've had with
       | fuzzing is getting some high-bandwidth error data other than "the
       | program crashed". You can eg instrument pointer reads and writes,
       | but sometimes reordering or deleting them is a legitimate
       | optimization.
       | 
       | It's really cool to see Rust developers pioneer these
       | revolutionary new methods nobody thought of before.
       | </CunninghamsLaw> (But seriously, if someone has a name for this
       | technique, I'm interested.)
        
       | UltimateEdge wrote:
       | As I understand it, the value of this thesis is the design of the
       | fuzzer, which produces sufficiently 'complex' code to exercise
       | uncommon code paths in the compiler, while retaining desirable
       | properties such as reducibility.
        
       | IshKebab wrote:
       | I've only got a few pages through this but I just wanted to say
       | how well this is written. It's so clear! If only all academic
       | texts were like this.
        
       ___________________________________________________________________
       (page generated 2023-11-28 23:01 UTC)