[HN Gopher] && operator chains (and -|, possibly) generates unop...
       ___________________________________________________________________
        
       && operator chains (and -|, possibly) generates unoptimizable LLVM
       IR
        
       Author : est31
       Score  : 176 points
       Date   : 2021-03-29 11:20 UTC (11 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | kazinator wrote:
        | This stuff is fun. In the TXR Lisp compiler, I was running into
        | pattern-matching code refusing to optimize nicely. My test case
        | was this Ackermann:
        | 
        |     (defun ack (:match)
        |       ((0 @n) (+ n 1))
        |       ((@m 0) (ack (- m 1) 1))
        |       ((@m @n) (ack (- m 1) (ack m (- n 1)))))
       | 
       | I then added an "early peephole" pass which works on the input
       | code before it has been divided into basic blocks:
       | 
       | http://www.kylheku.com/cgit/txr/tree/share/txr/stdlib/optimi...
       | (function at line 588)
       | 
       | There, I added recognition for a certain four-instruction kernel
       | occurring out of pattern matching code, and rewrote it to three
       | instructions.
       | 
        | Poof; I was left rubbing my eyes as out popped an instruction
        | sequence _identical_ to the one for this non-pattern-matching
        | Ackermann, modulo some register names:
        | 
        |     (defun ack (m n)
        |       (cond
        |         ((eq m 0) (+ n 1))
        |         ((eq n 0) (ack (- m 1) 1))
        |         (t (ack (- m 1) (ack m (- n 1))))))
       | 
       | That early reduction kind of removed a logjam, allowing other
       | optimizations to proceed.
       | 
       | The pattern matching code generates a flag which is set true upon
       | matching, so that subsequent cases are skipped. This flag
       | completely disappears.
       | 
       | The recognized instruction pattern spans (what would be) two
       | basic blocks because it includes a jmp instruction. The rewritten
       | instruction sequence eliminates one conditional branch, which
       | gets rid of a basic block division, which is part of why it
       | works.
       | 
        | Anyway, a big-picture lesson here is that a big reason for the
        | existence of pattern matching is so that you have a tool that
        | can be applied to optimize pattern matching.
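The shape of such an "early peephole" pass can be sketched generically. The IR below is invented for illustration (TXR's real instructions appear in the disassembly further down); the pass scans a small instruction window and rewrites a recognized kernel in place. The kernel recognized here is deliberately the simplest possible one, a jump to the label that immediately follows it, but it shows the same flavor of rewrite: eliminating a branch merges what would otherwise be two basic blocks.

```rust
// Invented toy IR, not TXR's: just enough to show a window-based rewrite.
#[derive(Clone, Debug, PartialEq)]
enum Insn {
    Label(u32),
    Jmp(u32),
    Add, // stand-in for "any other instruction"
}

// Scan a two-instruction window; `jmp L` immediately followed by `L:`
// is a fall-through in disguise, so the jump is deleted.
fn peephole(code: &[Insn]) -> Vec<Insn> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < code.len() {
        match (code.get(i), code.get(i + 1)) {
            (Some(Insn::Jmp(a)), Some(Insn::Label(b))) if a == b => {
                out.push(Insn::Label(*b)); // keep the label, drop the jump
                i += 2;
            }
            (Some(insn), _) => {
                out.push(insn.clone());
                i += 1;
            }
            _ => break,
        }
    }
    out
}
```

The real pass works on a four-instruction flag-setting kernel rather than this trivial one, but the mechanism (match a fixed window, emit a shorter replacement) is the same.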
        
         | kazinator wrote:
         | This is more related to the story than meets the eye. _early-
         | peephole_ matches a pattern that will span two basic blocks if
         | it is not replaced.
         | 
          | The rewrite removes a conditional which tests a temporary flag;
          | that test ends up in a basic block by itself:
          | 
          |     (defun early-peephole (code)
          |       (rewrite-case insns code
          |         (((mov (t @t1) (d @d1))
          |           (jmp @lab2)
          |           @(symbolp @lab1)
          |           (mov (t @t1) (t 0))
         | 
          | A label is matched (backreferencing the earlier (jmp @lab2), so
          | we know this item must be a label), followed by an _ifq_ :
          | 
          |           @lab2
          |           (ifq (t @t1) (t 0) @lab3)
         | 
          | and so the above label and ifq form a basic block which does
          | nothing more than test a register to go somewhere else: the
          | following code or some lab3 elsewhere.
          | 
          | This is very reminiscent of the numerous temporary-testing
          | basic blocks shown in the messy "unoptimizable" flow graph in
          | the submission.
          | 
          |           . @rest)
          |      ^((mov (t ,t1) (d ,d1))
          |        (jmp ,lab3)
          |        ,lab1
          |        (mov (t ,t1) (t 0))
          |        ,lab2
          |        ,*rest))
         | 
         | In the rewritten pattern, that _ifq_ is gone.
         | 
          | If we look at the diagram in the submission, some things are
          | striking. For instance, we have a basic block 14:
          | 
          |     bb14: fill temp5 with false
          | 
          | And another one:
          | 
          |     bb13: fill temp5 with true
          | 
          | Both of these jump unconditionally to
          | 
          |     bb16: check temp5
         | 
         | The situation tested by my pattern above is exactly this sort
         | of thing.
         | 
          | E.g. if we look at the pattern-matching _ack_ without
          | optimization, we can find them:
          | 
          |     2> (disassemble (let ((*opt-level* 0)) (compile 'ack)))
          |     data:
          |         0: ack
          |         1: 0
          |         2: 1
          |         3: t
         | 
          | Note that "3: t" means that in this VM description, register d3
          | holds t, the canonical Boolean true symbol:
          | 
          |     [ snip ]
          |     31: 2C060403 movsr t6 d3
          |     32: 34000022 jmp 34
          |     33: 2C060000 movsr t6 nil
          |     34: 10000006 end t6
          |     35: 3C000036 ifq t6 nil 54
         | 
          | Here is an instance of the pattern, except that the "end t6"
          | has to do with closing a "frame ..." instruction earlier. We
          | don't see this "end t6" in the pattern, and so it would cause a
          | mismatch; but this instruction is removed by frame elimination,
          | which happens earlier in the compiler, at a lower optimization
          | level.
         | 
          | So then, what do we have here?
          | 
          |     basic block A: "fill temp6 with true, jump to C"
          |     31: 2C060403 movsr t6 d3
          |     32: 34000022 jmp 34
          | 
          |     basic block B: "fill temp6 with false, fall to C"
          |     33: 2C060000 movsr t6 nil
          | 
          |     basic block C: "check temp6"
          |     34: 10000006 end t6
          |     35: 3C000036 ifq t6 nil 54
         | 
          | That is very similar to the aforementioned situation in that
          | basic block diagram!
         | 
         | In my case, what breaks the logjam is that we get rid of the
         | check, because the pattern knows that the check is only being
         | reached from those two sources.
         | 
          | And then, register t6 succumbs to dead register removal: a
          | subsequent data flow analysis, done after flow control
          | optimizations (jump threading), discovers that these
          | definitions of the t6 value have no next use.
         | 
         | I think that thanks to work done since then, that early-
         | peephole thing could be moved into jump threading. There could
         | be jump threading patterns which infer that the target of a
         | jump is a conditional instruction which depends on the value of
         | a register which is set to a true/false value prior to the
         | jump, and then adjust the original jump. It's more general, but
         | more annoying to code because of pattern matching across
         | multiple discontinuous basic blocks.
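The jump-threading generalization described above can be sketched too. Again the IR is invented for this sketch (not TXR's): a jump whose target block merely tests a flag that was set to a known constant right before the jump can be retargeted past the test. Only the flag-false case is threaded here, since the flag-true case needs a label after the test.

```rust
// Invented toy IR for the sketch.
#[derive(Clone, Debug, PartialEq)]
enum Op {
    SetFlag(bool),    // like "movsr t6 <true/nil>"
    Jmp(u32),
    Label(u32),
    IfFlagFalse(u32), // like "ifq t6 nil <target>"
    Work(&'static str),
}

fn thread_jumps(code: &[Op]) -> Vec<Op> {
    // What instruction immediately follows a given label?
    let follows = |lab: u32| {
        code.iter()
            .position(|op| *op == Op::Label(lab))
            .and_then(|p| code.get(p + 1))
    };
    let mut out = code.to_vec();
    for i in 1..out.len() {
        if let (Op::SetFlag(v), Op::Jmp(l)) = (out[i - 1].clone(), out[i].clone()) {
            if let Some(Op::IfFlagFalse(t)) = follows(l) {
                if !v {
                    // The flag is known false, so the conditional at the
                    // jump target will certainly branch to t: go directly.
                    out[i] = Op::Jmp(*t);
                }
                // Threading the flag-true case would need a label placed
                // after the test instruction; omitted to keep this short.
            }
        }
    }
    out
}
```

As the comment says, this is more general than a fixed peephole window, but more annoying: the pattern spans multiple, possibly discontinuous basic blocks.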
        
           | sillysaurusx wrote:
           | Is the code up anywhere? I'd be interested in studying a lisp
           | compiler.
        
             | tekknolagi wrote:
             | I have my own Lisp compiler as well if you are interested:
             | https://bernsteinbear.com/blog/lisp/
        
               | sillysaurusx wrote:
               | Bonus points for (a) being (essentially) a literate
               | program, and (b) being in C. Thanks!
        
             | kazinator wrote:
             | Yes:
             | 
             | http://www.kylheku.com/cgit/txr/tree/share/txr/stdlib/
             | 
              | Files compiler.tl, optimize.tl, asm.tl.
             | 
             | Main entry points into it are the functions compile and
             | compile-file, as well as compile-toplevel which is public
             | and used by those two.
        
               | sillysaurusx wrote:
               | There's some interesting syntax in those files that I
               | haven't seen before. What's ^((,me.(get-sidx 'exception-
               | subtype-p) mean? Specifically, dot followed by open
               | paren. Is it a method call?
               | 
               | In my own lisp, I settled on a convention like (foo (.bar
               | 1) (.baz 2)) but that might be an interesting
               | alternative.
               | 
               | ^ seems like quasiquote. Is it?
        
               | kazinator wrote:
               | Yes, ^ is quasiquote. The only other weird thing is that
               | ,* is splicing, and not ,@. With that mapping, your
               | quasiquote skills transfer.
               | 
               | ` cannot be quasiquote because it's used for
               | quasistrings. @ is also a notational prefix with a number
               | of uses.
               | 
               | When an earmuffed special variable is spliced, it is
               | written , *variable*.
        
       | ashton314 wrote:
       | Wow! What a write-up! I appreciate the liberal use of the
       | summary/details tags. Really helps things stay readable.
       | 
       | Curious: is there a Markdown exporter that supports some syntax
       | for this?
        
         | est31 wrote:
         | The details tag is just html that you can fall back to in
         | markdown:
         | https://developer.mozilla.org/de/docs/Web/HTML/Element/detai...
         | 
         | The main contribution by Github's markdown renderer is that it
         | doesn't remove that tag.
        
       | unwind wrote:
       | Meta: odd typo in the title, "-|" should just be "||".
        
         | est31 wrote:
         | I copy pasted the title from Github. I think the mismatch is
         | caused by hn's headline mangling? I can't edit it any more, but
         | an admin like dang could maybe correct it.
        
       | adwn wrote:
       | Whenever someone repeats the tired meme of _" the compiler
       | generates better assembly code than a human"_, I think of cases
       | like this one (or the many occasions where I inspected compiler-
       | generated assembly and it was atrocious). The rule of thumb
       | should be _" the compiler generates better assembly code than a
       | human could within the time it takes to write high-level code and
       | press 'Build project'"_, but that's not as catchy.
        
         | [deleted]
        
         | MaxBarraclough wrote:
         | > the tired meme of "the compiler generates better assembly
         | code than a human"
         | 
         | This seems incomplete. Which human? The average software
         | developer? That doesn't say much. Few developers have the
         | necessary expertise. Expert assembly programmers are often able
         | to outperform compilers.
        
           | jlcummings wrote:
            | "Better" wouldn't necessarily be the right qualifier, but
            | faster, typically more repeatable, and greatly more
            | economical with scale/workload would certainly count as
            | better from different vantage points.
           | 
           | With high novelty? Probably not until machine learning and
           | compilers are deeply entangled.
        
             | slaymaker1907 wrote:
             | That very much depends on the code. Program synthesis is an
             | active area of research and the programs found would often
             | be very difficult for a human to figure out. Of course a
             | sufficiently determined human can always do whatever these
             | programs do, but I do think it is unfair to give an
             | unlimited amount of time to human optimizers.
        
         | theamk wrote:
          | Your rule is much weaker, to the point of being useless -- you
          | are claiming that if it took me 1 hour to write high-level
          | code, the compiler will generate better assembly code than what
          | I could write in 1 hour.
         | 
         | This is obviously true in most cases -- if it takes me an hour
         | to write task X in high-level language, it will probably take
         | me a day or more to write it in assembly. After spending only
         | one hour on assembly version the program will be nowhere close
         | to completion, and likely completely non-functional. So
         | whatever compiler produces will be much better -- it will at
         | least work.
         | 
         | I can believe that at some moment hand-rolled assembly will
         | "beat" generated code (I am pretty sure that if someone spends
         | a month optimizing assembly listing it will beat the best Rust
         | program written in a hour), but I am not sure what the ratio
         | is. I am sure it is not 1:1, and I suspect the ratio gets worse
         | as program grows and there are more parts that one needs to
         | keep track of.
        
           | adwn wrote:
           | > _Your rule is much weaker to the point of being useless_
           | 
           | Yes, it's much weaker, but it's not useless. The vast
           | majority of code in a program does not need to be as fast as
           | it could possibly be [1], so there's no point in manually
           | implementing everything in assembly. That doesn't mean that
           | it _never_ makes sense to manually implement something in
           | assembly for performance reasons.
           | 
           | [1] And for many programs, _all_ the code does not need to be
           | as fast as it could possibly be.
        
             | theamk wrote:
              | Right, but I called the rule "useless" because it does not
              | tell you any actionable info -- and that is normally
              | expected of rules.
             | 
             | Here is a fixed version (note I added "times 20" at the
             | end):
             | 
             | "the compiler generates better assembly code than a human
             | could within the time it takes to write high-level code
             | _times 20_ "
             | 
             | See -- now I can make predictions. Let's say you have a
             | function that you want to make fast, and you want to
             | optimize it in assembly. The function took 30 minutes to
             | write and debug. Do you have 10 hours to spare? Try
             | assembly. No time? Leave as-is.
             | 
             | Much more useful this way, no?
        
               | adwn wrote:
               | Now it's more accurate, but even less catchy ;-)
               | 
               | How about: "don't blindly assume that the compiler is
               | smarter than you"
        
         | Blikkentrekker wrote:
         | There may be some specific parts of assembly generation where
         | compilers are still outperformed by men, just as there are
         | still specific parts of chess where men still outperform
         | machines, but overall machines beat human programmers.
         | 
          | The argument of allowing a human programmer to spend vastly
          | more time is hardly fair; one could also probably write a
          | hyper-optimizing compiler that takes far longer to compile to
          | level the playing field, and that compiler would still triumph
          | over men in how optimized the end result is.
         | 
          | The only real advantage overall to human programmers here is
          | the flexibility. A man can easily be told "spend four months on
          | optimizing this"; a machine would have to be substantially
          | rewritten to fully take advantage of being given extra time.
        
         | kroltan wrote:
         | Maybe "the compiler generates better assembly code per second
         | than a human could"?
        
       | api wrote:
       | Rust is already very close to C and C++ performance, and getting
       | to equivalence or possibly better will mean a long slog through
       | these kinds of edge cases. This was already done in C and C++
       | compiler toolchains, which is why these languages are so fast,
       | and it's a bit of a feedback loop because their popularity and
       | centrality to systems development is what led to the work being
       | done in the first place.
       | 
       | Rust + LLVM will already handily beat a naive bare bones C or C++
       | compiler that isn't loaded with sophisticated optimization
       | tricks.
       | 
       | If the work is done I expect Rust to be _faster_ than C and C++
       | for the same reason that C++ can sometimes be faster than C: a
       | more advanced type system can allow better optimization in many
       | cases.
       | 
       | In C++ for example you can do algorithms with templates, while in
       | C you generally end up with function pointers all over the place.
       | See qsort() vs. std::sort<> for one common case.
       | 
       | With Rust you can do all this generic stuff plus leverage better
       | safety to do more strict aliasing optimizations, less unnecessary
       | memory fencing, and so on.
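The qsort() vs std::sort<> point can be sketched in Rust, too. The function names below are invented for illustration: a comparator passed as a function pointer is an opaque indirect call on every comparison, while a generic comparator parameter monomorphizes per call site, so the comparison can be inlined into the sort loop.

```rust
use std::cmp::Ordering;

// C qsort()-style: an indirect call through a function pointer
// on every single comparison; the optimizer can't see through it.
fn sort_indirect(v: &mut [i32], cmp: fn(&i32, &i32) -> Ordering) {
    v.sort_by(cmp);
}

// C++ std::sort<>-style: a fresh copy of the sort is generated for
// each comparator type, so the comparison inlines into the hot loop.
fn sort_generic<F: FnMut(&i32, &i32) -> Ordering>(v: &mut [i32], cmp: F) {
    v.sort_by(cmp);
}
```

Both produce the same results; the difference is purely in what the optimizer is allowed to see.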
        
         | mhh__ wrote:
         | I'm a little suspect of the "Language x is faster than language
         | y" talk - it seems too prone to programmer culture to test
         | scientifically, beyond being able to prove that you can or
         | can't end up with the same machine code from multiple
         | languages.
        
           | Sohcahtoa82 wrote:
           | And you can almost always write a program that executes
           | faster in a "slower" language than a "faster" one.
        
         | MaxBarraclough wrote:
         | I think Ada/GNAT performs roughly on par with C and C++, so we
         | have reason to be optimistic. It's not just the dominant
         | languages that end up with mature optimizing compilers.
        
           | pjmlp wrote:
            | Having been there when C compilers used to generate lousy
            | code and we had legends like Mike Abrash teaching a whole
            | industry how to write high-performance assembly, it always
            | feels ironic that there is this myth about C and C++ having
            | been blazing fast since they were created.
        
             | anthk wrote:
             | C++ was always taken as big, bloated and unusable compared
             | to C.
        
               | pjmlp wrote:
               | Only by C developers that still believe C generates
               | faster code than C++, while their beloved C compiler has
               | been rewritten in C++.
        
               | anthk wrote:
               | gcc is not my daily compiler.
        
               | pjmlp wrote:
               | I found another tinyc user.
        
             | mids_hn wrote:
             | C _was_ fast.
             | 
             | On a PDP-11.
        
         | gnulinux wrote:
          | I write compilers for fun (as a hobby) and I just transpile my
          | IR data structure to C++ (I'm also experimenting with other
          | targets like Haskell (similar to the Agda compiler) and Rust
          | (work in progress)). My friends usually treat this approach as
          | "too amateur" and say a "real language" should compile all the
          | way to assembly. The reason I don't do that is that if you
          | transpile to a well-known language and then let that compiler
          | go all the way to machine code, you get an insane amount of
          | optimizations for free. C++ compilers are so good that you
          | rarely need to think about low-level optimizations; as long as
          | your program is asymptotically sound (i.e. you don't keep
          | randomly inserting into an array), it gives pretty good
          | performance regardless.
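The transpile-to-C++ approach can be shown in miniature. The `Expr` IR and `to_cpp` renderer below are invented for this sketch, not the commenter's actual code: the compiler front end lowers to a small IR, renders it as C++ source text, and leaves every low-level optimization to the C++ compiler.

```rust
// A toy expression IR, rendered to C++ source text.
enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Emit a C++ expression; parenthesize everything so we never have
// to think about C++ operator precedence.
fn to_cpp(e: &Expr) -> String {
    match e {
        Expr::Int(n) => n.to_string(),
        Expr::Add(a, b) => format!("({} + {})", to_cpp(a), to_cpp(b)),
        Expr::Mul(a, b) => format!("({} * {})", to_cpp(a), to_cpp(b)),
    }
}
```

A real transpiler also emits declarations, control flow, and a runtime, but the division of labor is the same: the front end handles semantics, the C++ compiler handles codegen.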
        
           | Flow wrote:
           | But then you miss all the fun with doing a nanopass compiler.
           | 
           | Maybe that's not entirely true though, you can still generate
           | C/C++ in the end I guess.
        
           | PoignardAzur wrote:
           | Not to be like your judgmental friends, but if you're going
           | to use the approach that gives optimizations for free, why
           | not just target WebAssembly / LLVM IR / Cranelift IR?
        
             | tom_mellior wrote:
             | On the other hand, why yes? The OP found something that
             | works for them and seems to give them the level of
             | optimization they want. Presumably they also know C++ well,
             | but possibly none of the technologies you mention well
             | enough to "just" target them.
        
           | zokier wrote:
            | The Nim compiler also generates C, so I'd say you are in
            | pretty good company there.
        
           | ericbarrett wrote:
           | > My friends usually treat this approach "too amateur" and a
           | "real language" should compile all the way to assembly.
           | 
           | Get new friends--no, seriously. I used to think like them,
           | and hung out with similar minds. It is gatekeeping[0] in a
           | way that, if internalized, will blind you to opportunity.
           | 
           | A few things I missed because I scoffed at them for not being
           | "pure enough":
           | 
           | 1) The entire Javascript ecosystem
           | 
           | 2) Bitcoin, circa 2009
           | 
           | 3) Countless SV startup opportunities from 2001-2015
           | 
           | 4) ML and data science (e.g. the terrible tooling)
           | 
           | The success I _have_ found always came with an open mind to
           | people 's approaches, even if they were not the ones I
           | settled on. It took me way too long to learn this, but I hope
           | I can change a few younger people's paths.
           | 
           | [0]
           | https://www.urbandictionary.com/define.php?term=Gatekeeping
        
         | nwallin wrote:
         | > Rust + LLVM will already handily beat a naive bare bones C or
         | C++ compiler that isn't loaded with sophisticated optimization
         | tricks.
         | 
         | Well - sure. LLVM is loaded with sophisticated optimization
         | tricks. If one compiler does sophisticated optimization tricks
         | and the other compiler doesn't do sophisticated optimization
         | tricks, the compiler with the sophisticated optimization tricks
         | will be faster.
         | 
          | Tangent: Ten years ago or so, LLVM/Clang was touted as being
          | head and shoulders better than GCC, the big new thing, the
          | bee's knees. Some of how it was better was obvious to everyone:
          | readable error messages are pretty awesome (GCC stole that
          | idea pretty quickly). But the big thing that the LLVM/Clang
         | people talked about was how blazingly fast compilation was
         | compared to GCC. It was normal to take 1/10th as long to
         | compile something. The code didn't run as fast as the code
         | generated by GCC, maybe 50% of the performance, but that was a
         | problem that they would eventually fix with more sophisticated
         | optimizers. But compilation was fast!
         | 
         | Here we are ten years later, and LLVM/Clang generates code that
         | runs almost as quickly as GCC. What did it cost, besides ten
         | years of incrementally making the optimizer more and more
         | sophisticated, a little bit better one little bit at a time?
         | Well, it takes just as long to compile something with LLVM's
         | sophisticated optimizer as it does to compile something with
         | GCC's sophisticated optimizer.
         | 
         | The point is, there's a lot of yak shaving that will have to go
         | into making Rust+LLVM work together better. And LLVM is already
         | well acquainted with the barber. The low hanging fruit you're
         | looking for has already been plucked.
        
         | killercup wrote:
         | > If the work is done I expect Rust to be faster than C and C++
         | for the same reason that C++ can sometimes be faster than C: a
         | more advanced type system can allow better optimization in many
         | cases.
         | 
         | Every now and then I check in on whether LLVM can deal with
         | rustc spamming "noalias" on all references. You can find the
         | latest change in [1]. While in theory this unlocks _a ton_ of
         | optimizations, noalias is used very rarely in C/C++ code so
         | these compiler passes are not exercised a lot by existing LLVM
         | tests and/or not realized in full.
         | 
         | [1] https://github.com/rust-lang/rust/pull/82834
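The kind of optimization that "noalias on all references" unlocks can be seen in a two-line function. The name `add_twice` is invented for illustration: because Rust guarantees the two `&mut` below never alias, the compiler may keep `*a` in a register across the write to `*b`; the equivalent C function on plain pointers must reload `*a`, since `a` and `b` might point at the same int.

```rust
// &mut references never alias, so writing through `b` provably
// cannot change `*a`; no reload of *a is needed for the return.
fn add_twice(a: &mut i32, b: &mut i32) -> i32 {
    *a += 1;
    *b += 1;
    *a + *b
}
```

In C, only a `restrict`-qualified signature communicates the same fact, and as the thread notes, `restrict` is rare enough that these LLVM passes historically got little exercise.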
        
           | amelius wrote:
           | A more advanced type system may also drive you into a corner
           | where you make bad decisions wrt performance.
        
             | titzer wrote:
             | Can you give an example?
        
               | Too wrote:
                | In C/C++: marking every argument as const throughout a
                | deep call chain, then later finding some edge case where
                | you need to mutate one member far down the call stack,
                | where this would not have broken the top-level contract
                | of the function. This forces you to do expensive copies
                | instead.
        
               | Arnavion wrote:
               | Not amelius, but one case that happened to me is that
                | Rust requires wrapping a `T` in a `RefCell` if two
                | closures use it as `&mut T`. This happens even if you,
                | the caller, know that the closures are invoked from a
                | single thread and do not invoke each other, and thus only
                | one `&mut T` will be in effect at any time. This is because
               | closures are effectively structs with the captures as
               | fields, so both struct values (and thus both `&mut`
               | borrows) _exist_ at the same time even though their
               | respective fields are not _used_ at the same time.
               | 
                | Not only do you have to use `RefCell`, but you now also
                | have panicking code when the `RefCell` borrow "fails",
                | even though you know it can't. rustc is also not smart
                | enough to notice the exclusivity at compile time and
                | elide away the RefCell borrow flag test and the panic
                | branch.
                | 
                |     fn foo(mut cb1: impl FnMut(), mut cb2: impl FnMut()) {
                |         for _ in 0..10 {
                |             cb1();
                |             cb2();
                |         }
                |     }
                | 
                |     let mut x = String::new();
                |     foo(|| x.push_str("cb1,"), || x.push_str("cb2,"));
               | 
               | https://play.rust-
               | lang.org/?version=stable&mode=debug&editio...
               | 
               | Fixed using `RefCell`: https://play.rust-
               | lang.org/?version=stable&mode=release&edit... . Inspect
               | the ASM and trace the use of the string constant "already
               | borrowed"; you'll see it being used for the borrow flag
               | test because it wasn't elided.
               | 
               | The equivalent non-Rust program could use String pointers
               | for the two closures. I'm not sure whether they could be
               | noalias or not, but at the very least they wouldn't need
               | to generate any panicking code.
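For reference, a minimal reconstruction of the `RefCell` fix (the linked playground is authoritative; this sketch just shows the shape, with `run_refcell` as an invented wrapper so the result can be inspected):

```rust
use std::cell::RefCell;

fn foo(mut cb1: impl FnMut(), mut cb2: impl FnMut()) {
    for _ in 0..10 {
        cb1();
        cb2();
    }
}

fn run_refcell() -> String {
    let x = RefCell::new(String::new());
    // Both closures capture &x (a shared borrow); each borrow_mut() is
    // a runtime-checked exclusive borrow that ends when the call returns.
    foo(
        || x.borrow_mut().push_str("cb1,"),
        || x.borrow_mut().push_str("cb2,"),
    );
    x.into_inner()
}
```

The runtime check is exactly the borrow-flag test and "already borrowed" panic path discussed above: correct, but not free.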
        
               | adwn wrote:
               | Just use `UnsafeCell` instead of `RefCell` [1]: It has
               | zero overhead, but you have to be sure that there's
               | really no simultaneous write/write or read/write access -
               | just like using raw pointers in C or C++.
               | 
               | [1] https://doc.rust-
               | lang.org/beta/std/cell/struct.UnsafeCell.ht...
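The `UnsafeCell` variant looks like this (a sketch; `foo` and `run_unsafecell` are illustrative names, and the SAFETY argument is exactly the exclusivity claim the caller is asserting):

```rust
use std::cell::UnsafeCell;

fn foo(mut cb1: impl FnMut(), mut cb2: impl FnMut()) {
    for _ in 0..10 {
        cb1();
        cb2();
    }
}

fn run_unsafecell() -> String {
    let x = UnsafeCell::new(String::new());
    foo(
        // SAFETY: foo runs the closures one at a time on this thread and
        // they never call each other, so the mutable accesses never overlap.
        || unsafe { (*x.get()).push_str("cb1,") },
        || unsafe { (*x.get()).push_str("cb2,") },
    );
    x.into_inner()
}
```

No borrow flag, no check, no panic path; but the burden of proving exclusivity moves entirely onto the programmer, just as with raw pointers in C or C++.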
        
               | Arnavion wrote:
               | Yes, I'm not averse to using `unsafe`, but one has to
               | justify it on a case-by-case basis. Eg if you're doing
               | this in a library, then keep in mind that some users are
               | very adamant about using unsafe-free crates, so you may
               | prefer to take the hit.
        
               | colejohnson66 wrote:
               | > then keep in mind that some users are very adamant
               | about using unsafe-free crates
               | 
                | Couldn't you just make unsafe the default and add a
                | feature flag to force the safe (but slower) behavior?
                | Then you get the best of both worlds: those who don't
                | care get performance for "free", while those who care
                | can force safety when they want.
        
               | est31 wrote:
               | If anything you'd have to go the opposite way: use safe
               | by default and add the option to turn off runtime checks
               | like bounds checks on slice access. Because when you
               | write safe code, you tell the compiler about the
               | invariants of your code, while with unsafe code, you keep
               | them in your mind yourself. They might not even translate
               | to any safe Rust constructs at all. E.g. if you pass a
               | pointer in C, what is the recipient of the pointer
               | supposed to do with it? Is the memory content
               | initialized? Who is responsible for deallocation? On the
               | other hand, if the compiler is told invariants in terms
               | of safe code, it's easy to avoid any runtime checks for
               | them.
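The bounds-check case in miniature (function names invented for illustration): when the iteration is written as safe code whose invariant the compiler can see, no per-element range check is needed; when indexing with values the compiler can't reason about, each access keeps its check and panic path.

```rust
// The invariant "every element visited is in range" lives in the
// iterator's type, so no bounds checks are emitted.
fn sum_iter(v: &[i32]) -> i32 {
    v.iter().sum()
}

// Nothing proves i < v.len(), so each v[i] compiles to a checked
// access with a panic branch for the out-of-range case.
fn sum_index(v: &[i32], idxs: &[usize]) -> i32 {
    idxs.iter().map(|&i| v[i]).sum()
}
```

This is the point about invariants: the safe construct carries the proof, while unsafe code keeps the proof only in the programmer's head.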
        
               | Arnavion wrote:
               | The users I was thinking of were more along the lines of
               | people that run cargo-geiger etc, which just looks for
               | "unsafe" in the source rather than anything dynamic based
               | on selected features.
        
               | comex wrote:
               | FWIW, another option is to use `std::cell::Cell`. That
               | only allows replacing the value rather than borrowing it
               | in place, so you would have to take the value out and put
               | it back after you're done with it, which also results in
               | unnecessary code generation. But there'd be no branch and
               | no panic, so the impact should be less than RefCell.
               | There's also no borrow flag to take up space (not that it
               | really matters when this is a single value on the stack).
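The `Cell` take-and-put pattern comex describes would look roughly like this (a sketch; `append` and `run_cell` are invented names):

```rust
use std::cell::Cell;

fn foo(mut cb1: impl FnMut(), mut cb2: impl FnMut()) {
    for _ in 0..10 {
        cb1();
        cb2();
    }
}

fn run_cell() -> String {
    let x = Cell::new(String::new());
    let append = |s: &str| {
        let mut v = x.take(); // move the String out (leaves an empty one)
        v.push_str(s);
        x.set(v); // put it back: no borrow flag, no branch, no panic
    };
    foo(|| append("cb1,"), || append("cb2,"));
    x.into_inner()
}
```

The cost is the extra move out and back on every call rather than a checked borrow, which is why the impact should be less than `RefCell`'s.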
        
               | Arnavion wrote:
               | This is a good point. I always forget Cell can be used
               | with non-Copy types too.
        
               | gpm wrote:
               | Apart from rust not being smart enough to see what you're
               | doing in this sort of situation, the access pattern
               | you're trying to use would actually result in undefined
               | behavior with &mut pointers (or I assume C restrict
               | pointers) because of the aliasing guarantees. For example
                | one optimization you could imagine the compiler actually
                | doing would result in the following:
                | 
                |     let mut x = String::new();
                | 
                |     // store x in a local pointer; no one else is touching
                |     // it because we have aliasing guarantees
                |     let str1 = x;
                |     // likewise
                |     let str2 = x;
                | 
                |     for _ in 0..10 {
                |         str1.push_str("cb1,");
                |         str2.push_str("cb2,");
                |     }
                | 
                |     // restore x to the original variable before our
                |     // aliasing guarantee goes away
                |     x = str1;
                |     x = str2;
               | 
               | And you just:
               | 
               | - Leaked str1
               | 
               | - Created an x that just says cb2 repeatedly instead of
               | alternating between cb1 and cb2.
               | 
               | Obviously it's possible to fix this problem by having
               | different guarantees on pointers (C's pointers, rust raw
               | pointers), but it's not clear that the occasional
               | overhead of some metadata tracking (refcell) isn't
               | actually going to be more performant than the constant
               | overhead of not having aliasing guarantees everywhere
                | else. The most performant option would obviously be to
                | have both, but as we've seen with C, asking programmers
                | to go around marking pointers as restrict is too much
                | work for too little benefit.
        
               | Arnavion wrote:
               | >the access pattern you're trying to use would actually
               | result in undefined behavior with &mut pointers (or I
               | assume C restrict pointers) [...] Obviously it's possible
               | to fix this problem by having different guarantees on
               | pointers (C's pointers, rust raw pointers)
               | 
               | Yes, I said as much in the last paragraph.
               | 
               | >but it's not clear that the occasional overhead of some
               | metadata tracking (refcell) isn't actually going to be
               | more performant than the constant overhead of not having
               | aliasing guarantees everywhere else.
               | 
               | Generating panicking code where it's not needed is bad in
               | general. It adds unwinding (unless disabled), collects
               | backtraces (unavoidable), and often pulls in the std::fmt
               | machinery.
               | 
               | Yes, in general it's almost certainly true that non-
               | aliasing pointers produce more benefits regardless. My
               | comment was in the context of the very specific example
               | it gave.
        
               | Diggsey wrote:
               | The "used from a single thread" aspect is a red herring:
               | RefCell can only be used from a single thread anyway, and
               | the compiler enforces this statically.
               | 
                | The "state" value in a RefCell _is_ overhead, although
                | it's fairly minor given that it doesn't need any
                | synchronization to access. The extra panic branches are
                | probably the largest overhead.
               | 
               | That said, these overheads stem from Rust's safety
               | guarantees rather than its strong type system: you can
               | have a language with a strong type system that does not
               | do these checks.
               | 
                | Furthermore, there _are_ of course ways to avoid this
                | overhead within safe Rust: if you can use the type system
                | to prove that the cell cannot be borrowed at the same
                | time, then you don't need to do the checks, and in that
                | sense a strong type system can actually help avoid
                | overheads that were introduced by being a safe language.
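One concrete instance of using the type system to make the checks unnecessary, sketched with a hypothetical `Logs` struct: when the borrow checker can prove two `&mut` borrows are disjoint (e.g. different struct fields), both are allowed simultaneously with zero RefCell-style bookkeeping.

```rust
// Two independently mutated logs, held as separate fields.
struct Logs {
    a: String,
    b: String,
}

// Field borrows are disjoint, so both &mut may coexist --
// no runtime flag, no panic branch.
fn fill(logs: &mut Logs, rounds: usize) {
    let (a, b) = (&mut logs.a, &mut logs.b);
    for _ in 0..rounds {
        a.push_str("cb1,");
        b.push_str("cb2,");
    }
}

fn main() {
    let mut logs = Logs { a: String::new(), b: String::new() };
    fill(&mut logs, 2);
    println!("{} {}", logs.a, logs.b); // cb1,cb1, cb2,cb2,
}
```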
        
               | Arnavion wrote:
               | >That said, these overheads stem from Rust's safety
               | guarantees rather than its strong type system: you can
               | have a language with a strong type system that does not
               | do these checks.
               | 
               | The difference in semantics between a `&mut T` and a
               | `*mut T` is a type system one. `&mut T` requires that two
               | do not exist at the same time, regardless of whether they
               | are used at the same time or not; this is the contract of
               | the type.
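A small sketch of that contract difference: raw pointers are `Copy` and carry no exclusivity guarantee, so two `*mut` to the same value may coexist (creating them is safe; only dereferencing is `unsafe`), whereas two simultaneous `&mut` are rejected at compile time regardless of use.

```rust
// Increment a value twice through two aliasing raw pointers.
fn bump_twice_raw(x: &mut u32) -> u32 {
    let p: *mut u32 = x;       // &mut u32 coerces to *mut u32
    let (p1, p2) = (p, p);     // raw pointers are Copy and may alias
    unsafe {
        *p1 += 1;
        *p2 += 1;
    }
    *x
}

fn main() {
    let mut x = 0u32;
    // The equivalent with references does not compile:
    // let r1 = &mut x;
    // let r2 = &mut x; // error[E0499]: cannot borrow `x` as
    //                  // mutable more than once at a time
    println!("{}", bump_twice_raw(&mut x)); // 2
}
```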
               | 
               | >Furthermore, there are of course ways to avoid this
               | overhead within safe Rust: if you can use the type system
               | to prove that the cell cannot be borrowed at the same
               | time, then you don't need to do the checks, and in that
               | sense a strong type system can actually help avoid
               | overheads that were introduced by being a safe language.
               | 
               | Correct, which is why I made the effort of pointing out
               | that rustc is not smart enough to do it, not that it's
               | impossible to do it.
        
               | codeflo wrote:
               | This may be splitting hairs a bit, because we all agree
               | that this is a good example where using Rust in this
               | straightforward manner leads to suboptimal performance.
                | But I agree with the grandparent that this is mainly an
               | issue with safety, not with the type system itself.
               | 
               | To show why, consider two alternative languages.
               | 
               | "Weak Rust": an equally safe Rust with a weaker type
               | system. It might not distinguish & and &mut, but it would
               | still need those checks, because you might use those
               | shared references to break a data structure invariant. It
               | would have to detect such unsafe usage at runtime and
               | raise the equivalent of Java's
               | ConcurrentModificationException.
               | 
               | "Unsafe Rust": a less safe Rust with an equally strong
               | type system. It _wouldn't_ need to do those checks. In
               | fact, that's basically C++.
        
               | nicoburns wrote:
               | You could rewrite this as an iterator, which would avoid
               | this problem and make the code much nicer.
        
               | Arnavion wrote:
               | The hypothetical `foo` is a third-party library function.
               | 
               | (The real code which I reduced to this example is https:/
               | /github.com/Arnavion/k8s-openapi/blob/1fcfe4b34a1f4f1...
               | , and the callee does happen to be another crate in my
               | control. While this can't be reduced to something like an
                | Iterator, it _can_ be resolved by making the callee take a
               | trait with two ` &mut self` methods instead of taking two
               | closures. That still requires changing the callee, of
               | course.)
        
             | Blikkentrekker wrote:
             | Only so long as it be mandatory to obey.
             | 
              |  _Rust_ was specifically designed so that one can ignore
              | all the restrictions if one be so inclined and tell the
              | compiler " _I know better, and I know it is safe._ ".
             | 
              | Of course, if one be wrong in such confidence, UB lurks
              | around the corner.
             | 
              | I suppose one big problem with _Rust_ is that it's less
              | specified than in _C_ what is and isn't safe, so it's
              | harder to be so confident.
        
           | api wrote:
           | This is why I think Rust could eventually be faster than C
           | and C++ for a lot of things. The work has to be done though.
            | You're right that noalias-enabled optimizations are
            | neglected, because you can rarely use them in C code.
           | 
           | On the Rust side I think the language needs some way to
           | annotate if's as likely/unlikely. This doesn't matter in most
           | cases but can occasionally matter a lot in tight high
           | performance code. It can allow the compiler to emit code that
           | is structured so as to cause the branch predictor to usually
           | be right, which can have a large impact.
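While the `likely`/`unlikely` intrinsics remain nightly-only, a common stable workaround is the one sketched below (function names are illustrative): marking the cold path's handler `#[cold]` tells LLVM that calls to it are unlikely, which shifts branch layout in the hot function accordingly.

```rust
// Rarely-taken path, kept out of the hot instruction stream.
// #[cold] hints to LLVM that calls to this fn are unlikely.
#[cold]
#[inline(never)]
fn handle_rare_error(code: u32) -> u32 {
    eprintln!("rare error: {}", code);
    0
}

fn process(x: u32) -> u32 {
    if x == u32::MAX {
        // A branch ending in a call to a #[cold] fn is treated
        // as the unlikely side.
        handle_rare_error(x)
    } else {
        x + 1
    }
}

fn main() {
    println!("{}", process(41)); // 42
}
```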
        
             | steveklabnik wrote:
             | That's tracked by https://github.com/rust-
             | lang/rust/issues/26179 by the way.
        
             | drran wrote:
             | Why not just use PGO (Profile Guided Optimizations)?
             | 
              | Sadly, PGO does not work with cross-language LTO because
              | of a conflict with the LLVM module name ('Profile').
        
           | [deleted]
        
       | quiescant_dodo wrote:
       | This is a phenomenal bug report.
        
         | mhh__ wrote:
         | I reserve a different term for bugs where the backend is
         | actually _wrong_ (e.g. emitting the wrong SIMD instruction was
         | a fun one)!
        
       | wyldfire wrote:
        | Interesting investigation. But it sounds like the discussion
        | revealed that this may be "just" a regression.
        
         | losvedir wrote:
         | A regression when the LLVM backend updated from 8 to 9. So
         | arguably a regression in LLVM!
         | 
         | Reminds me of a Zig stream I watched recently where Andy was
         | updating the compiler backend to use an LLVM 12 release
          | candidate. He ended up filing a couple of bug reports with
          | LLVM, using Zig compiler issues to generate reduced-case LLVM
          | IR examples that compiled differently than with LLVM 11. He
          | mentioned that he had to try this upgrade with the release
         | that he had to try this upgrade with the release candidate
         | because right now they're LLVM regressions, but once 12 ships
         | they become Zig regressions!
        
           | jedisct1 wrote:
           | Zig does a really good job at finding bugs in LLVM every time
           | the LLVM version is updated.
           | 
           | And Zig developers also do a very good job at reporting and
           | fixing them. Which is why contributing to Zig also means
           | contributing to LLVM.
        
             | rob74 wrote:
             | Language developers finding _bugs_ in LLVM is ok, but if
             | there are already several instances of _regressions_ in
             | LLVM found by developers of (maybe  "exotic") languages,
             | then the reaction should be less "hey, well done, you found
             | a bug" and more "how can we avoid such regressions in the
             | future?"...
        
               | MaxBarraclough wrote:
               | I imagine Clang and GCC have growing test-suites to
               | defend against regressions. I wonder if they share them.
        
               | PoignardAzur wrote:
               | Separate the backend into functional subcomponents and
               | fuzz them as much as possible?
               | 
                | That's what Cranelift does.
        
         | PoignardAzur wrote:
         | I think this should count as a bug in Rustc.
         | 
          | If your compiler generates IR that's _just_ awful enough that
          | it's a coin toss whether the backend can optimize it, whereas
          | it can optimize the "naive" IR you'd write by hand, it's a
          | compiler problem, not a backend problem.
        
       ___________________________________________________________________
       (page generated 2021-03-29 23:01 UTC)