[HN Gopher] && operator chains (and -|, possibly) generates unop...
___________________________________________________________________
&& operator chains (and -|, possibly) generates unoptimizable LLVM
IR
Author : est31
Score : 176 points
Date : 2021-03-29 11:20 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| kazinator wrote:
| This stuff is fun. In the TXR Lisp compiler I was running into
| pattern matching code refusing to optimize nicely. My test case
| was this Ackermann:
|
|     (defun ack (:match)
|       ((0 @n) (+ n 1))
|       ((@m 0) (ack (- m 1) 1))
|       ((@m @n) (ack (- m 1) (ack m (- n 1)))))
|
| I then added an "early peephole" pass which works on the input
| code before it has been divided into basic blocks:
|
| http://www.kylheku.com/cgit/txr/tree/share/txr/stdlib/optimi...
| (function at line 588)
|
| There, I added recognition for a certain four-instruction kernel
| occurring out of pattern matching code, and rewrote it to three
| instructions.
|
| Poof; I was left rubbing my eyes as out popped an instruction
| sequence _identical_ to the one for this non-pattern-matching
| Ackermann, modulo some register names:
|
|     (defun ack (m n)
|       (cond ((eq m 0) (+ n 1))
|             ((eq n 0) (ack (- m 1) 1))
|             (t (ack (- m 1) (ack m (- n 1))))))
|
| That early reduction kind of removed a logjam, allowing other
| optimizations to proceed.
|
| The pattern matching code generates a flag which is set true upon
| matching, so that subsequent cases are skipped. This flag
| completely disappears.
|
| The recognized instruction pattern spans (what would be) two
| basic blocks because it includes a jmp instruction. The rewritten
| instruction sequence eliminates one conditional branch, which
| gets rid of a basic block division, which is part of why it
| works.
|
| Anyway, a big-picture lesson here is that a big reason for the
| existence of pattern matching is so that you have a tool that
| can be applied to optimize pattern matching.
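The flag-eliminating rewrite described in this comment can be sketched in a few lines. The following is a toy IR with hypothetical instruction names and jump-if-true semantics, not TXR's actual VM opcodes:

```rust
// Toy peephole pass: rewrite
//     set t, true; jmp L2; L1: set t, false; L2: if t goto L3; <rest>
// into
//     set t, true; jmp L3; L1: set t, false; L2: <rest>
// The conditional test disappears, so the basic-block split it caused is
// gone, and the flag register becomes a candidate for dead-register removal.
#[derive(Clone, Debug, PartialEq)]
enum Insn {
    SetFlag(&'static str, bool),           // flag register <- true/false
    Jmp(&'static str),                     // unconditional jump to a label
    Label(&'static str),
    JmpIfTrue(&'static str, &'static str), // if flag is true, jump to label
}

fn early_peephole(code: &[Insn]) -> Vec<Insn> {
    use Insn::*;
    let mut out = Vec::new();
    let mut i = 0;
    while i < code.len() {
        if let [SetFlag(t1, true), Jmp(l2a), Label(l1), SetFlag(t2, false),
                Label(l2b), JmpIfTrue(t3, l3), ..] = &code[i..]
        {
            if t1 == t2 && t1 == t3 && l2a == l2b {
                // True path now jumps straight to l3; false path falls
                // through l2b into the rest of the code. The test is gone.
                out.push(SetFlag(*t1, true));
                out.push(Jmp(*l3));
                out.push(Label(*l1));
                out.push(SetFlag(*t1, false));
                out.push(Label(*l2b));
                i += 6;
                continue;
            }
        }
        out.push(code[i].clone());
        i += 1;
    }
    out
}

fn main() {
    use Insn::*;
    let code = vec![
        SetFlag("t6", true),
        Jmp("L2"),
        Label("L1"),
        SetFlag("t6", false),
        Label("L2"),
        JmpIfTrue("t6", "L3"),
        Label("L3"),
    ];
    let opt = early_peephole(&code);
    assert_eq!(opt[1], Jmp("L3"));
    assert!(!opt.iter().any(|i| matches!(i, JmpIfTrue(..))));
    println!("{opt:?}");
}
```

This mirrors only the shape of the rewrite; TXR's real pass also has to handle the `end` instruction and label backreferencing described below.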
| kazinator wrote:
| This is more related to the story than meets the eye. _early-
| peephole_ matches a pattern that will span two basic blocks if
| it is not replaced.
|
| The rewrite removes a conditional which tests a temporary flag;
| that test ends up in a basic block by itself:
|
|     (defun early-peephole (code)
|       (rewrite-case insns code
|         (((mov (t @t1) (d @d1))
|           (jmp @lab2)
|           @(symbolp @lab1)
|           (mov (t @t1) (t 0))
|
| A label is matched (backreferencing the earlier (jmp @lab2), so
| we know this item must be a label), followed by an _ifq_:
|
|           @lab2
|           (ifq (t @t1) (t 0) @lab3)
|
| and so the above label and ifq form a basic block which does
| nothing more than test a register to go somewhere else: the
| following code or some lab3 elsewhere.
|
| This is very reminiscent of the numerous temporary-testing
| basic blocks shown in the messy "unoptimizable" flow graph in
| the submission.
|
|           . @rest)
|          ^((mov (t ,t1) (d ,d1))
|            (jmp ,lab3)
|            ,lab1
|            (mov (t ,t1) (t 0))
|            ,lab2
|            ,*rest))
|
| In the rewritten pattern, that _ifq_ is gone.
|
| If we look at the diagram in the submission, some things are
| striking. For instance, we have a basic block 14:
|
|     bb14: fill temp5 with false
|
| And another one:
|
|     bb13: fill temp5 with true
|
| Both of these jump unconditionally to bb16:
|
|     bb16: check temp5
|
| The situation tested by my pattern above is exactly this sort
| of thing.
|
| E.g. if we look at the pattern matching _ack_ without
| optimization, we can find them:
|
|     2> (disassemble (let ((*opt-level* 0)) (compile 'ack)))
|     data:
|         0: ack
|         1: 0
|         2: 1
|         3: t
|
| Note 3: t means that in this VM description, register d3 holds
| t, the canonical Boolean true symbol:
|
|     [ snip ]
|     31: 2C060403 movsr t6 d3
|     32: 34000022 jmp 34
|     33: 2C060000 movsr t6 nil
|     34: 10000006 end t6
|     35: 3C000036 ifq t6 nil 54
|
| Here is an instance of the pattern, except for the "end t6",
| which has to do with closing a "frame ..." instruction earlier.
| We don't see this "end t6" in the pattern, and so it would
| cause a mismatch; but this instruction is removed by frame
| elimination, which comes earlier in the compiler and happens
| at a lower optimization level.
|
| So then, what do we have here?
|
|     basic block A: "fill temp6 with true, jump to C"
|         31: 2C060403 movsr t6 d3
|         32: 34000022 jmp 34
|
|     basic block B: "fill temp6 with false, fall to C"
|         33: 2C060000 movsr t6 nil
|
|     basic block C: "check temp6"
|         34: 10000006 end t6
|         35: 3C000036 ifq t6 nil 54
|
| That's very similar to the aforementioned situation in that
| basic block diagram!
|
| In my case, what breaks the logjam is that we get rid of the
| check, because the pattern knows that the check is only being
| reached from those two sources.
|
| And then, register t6 succumbs to dead register removal: a
| subsequent data flow analysis, done after flow control
| optimizations (jump threading), discovers that these
| definitions of the t6 value have no next use.
|
| I think that thanks to work done since then, that early-
| peephole thing could be moved into jump threading. There could
| be jump threading patterns which infer that the target of a
| jump is a conditional instruction which depends on the value of
| a register which is set to a true/false value prior to the
| jump, and then adjust the original jump. It's more general, but
| more annoying to code because of pattern matching across
| multiple discontinuous basic blocks.
| sillysaurusx wrote:
| Is the code up anywhere? I'd be interested in studying a lisp
| compiler.
| tekknolagi wrote:
| I have my own Lisp compiler as well if you are interested:
| https://bernsteinbear.com/blog/lisp/
| sillysaurusx wrote:
| Bonus points for (a) being (essentially) a literate
| program, and (b) being in C. Thanks!
| kazinator wrote:
| Yes:
|
| http://www.kylheku.com/cgit/txr/tree/share/txr/stdlib/
|
| Files compiler.tl, optimize.tl, asm.tl.
|
| Main entry points into it are the functions compile and
| compile-file, as well as compile-toplevel which is public
| and used by those two.
| sillysaurusx wrote:
| There's some interesting syntax in those files that I
| haven't seen before. What's ^((,me.(get-sidx 'exception-
| subtype-p) mean? Specifically, dot followed by open
| paren. Is it a method call?
|
| In my own lisp, I settled on a convention like (foo (.bar
| 1) (.baz 2)) but that might be an interesting
| alternative.
|
| ^ seems like quasiquote. Is it?
| kazinator wrote:
| Yes, ^ is quasiquote. The only other weird thing is that
| ,* is splicing, and not ,@. With that mapping, your
| quasiquote skills transfer.
|
| ` cannot be quasiquote because it's used for
| quasistrings. @ is also a notational prefix with a number
| of uses.
|
| When an earmuffed special variable is spliced, it is
| written , *variable*.
| ashton314 wrote:
| Wow! What a write-up! I appreciate the liberal use of the
| summary/details tags. Really helps things stay readable.
|
| Curious: is there a Markdown exporter that supports some syntax
| for this?
| est31 wrote:
| The details tag is just html that you can fall back to in
| markdown:
| https://developer.mozilla.org/de/docs/Web/HTML/Element/detai...
|
| The main contribution by Github's markdown renderer is that it
| doesn't remove that tag.
| unwind wrote:
| Meta: odd typo in the title, "-|" should just be "||".
| est31 wrote:
| I copy-pasted the title from Github. I think the mismatch is
| caused by HN's headline mangling? I can't edit it any more, but
| an admin like dang could maybe correct it.
| adwn wrote:
| Whenever someone repeats the tired meme of _"the compiler
| generates better assembly code than a human"_, I think of cases
| like this one (or the many occasions where I inspected compiler-
| generated assembly and it was atrocious). The rule of thumb
| should be _"the compiler generates better assembly code than a
| human could within the time it takes to write high-level code and
| press 'Build project'"_, but that's not as catchy.
| [deleted]
| MaxBarraclough wrote:
| > the tired meme of "the compiler generates better assembly
| code than a human"
|
| This seems incomplete. Which human? The average software
| developer? That doesn't say much. Few developers have the
| necessary expertise. Expert assembly programmers are often able
| to outperform compilers.
| jlcummings wrote:
| Better wouldn't necessarily be the right qualifier, but
| faster, typically more repeatable, and greatly more
| economical with scale/workload would certainly fit as better
| from different vantages.
|
| With high novelty? Probably not until machine learning and
| compilers are deeply entangled.
| slaymaker1907 wrote:
| That very much depends on the code. Program synthesis is an
| active area of research and the programs found would often
| be very difficult for a human to figure out. Of course a
| sufficiently determined human can always do whatever these
| programs do, but I do think it is unfair to give an
| unlimited amount of time to human optimizers.
| theamk wrote:
| Your rule is much weaker to the point of being useless -- you
| are claiming that if it took me 1 hour to write high-level
| code, the compiler will generate better assembly code than what
| I can write in 1 hour.
|
| This is obviously true in most cases -- if it takes me an hour
| to write task X in high-level language, it will probably take
| me a day or more to write it in assembly. After spending only
| one hour on assembly version the program will be nowhere close
| to completion, and likely completely non-functional. So
| whatever compiler produces will be much better -- it will at
| least work.
|
| I can believe that at some moment hand-rolled assembly will
| "beat" generated code (I am pretty sure that if someone spends
| a month optimizing assembly listing it will beat the best Rust
| program written in a hour), but I am not sure what the ratio
| is. I am sure it is not 1:1, and I suspect the ratio gets worse
| as program grows and there are more parts that one needs to
| keep track of.
| adwn wrote:
| > _Your rule is much weaker to the point of being useless_
|
| Yes, it's much weaker, but it's not useless. The vast
| majority of code in a program does not need to be as fast as
| it could possibly be [1], so there's no point in manually
| implementing everything in assembly. That doesn't mean that
| it _never_ makes sense to manually implement something in
| assembly for performance reasons.
|
| [1] And for many programs, _all_ the code does not need to be
| as fast as it could possibly be.
| theamk wrote:
| Right, but I called the rule "useless" because it does not
| give you any actionable info -- and that is normally
| expected of rules.
|
| Here is a fixed version (note I added "times 20" at the
| end):
|
| "the compiler generates better assembly code than a human
| could within the time it takes to write high-level code
| _times 20_"
|
| See -- now I can make predictions. Let's say you have a
| function that you want to make fast, and you want to
| optimize it in assembly. The function took 30 minutes to
| write and debug. Do you have 10 hours to spare? Try
| assembly. No time? Leave as-is.
|
| Much more useful this way, no?
| adwn wrote:
| Now it's more accurate, but even less catchy ;-)
|
| How about: "don't blindly assume that the compiler is
| smarter than you"
| Blikkentrekker wrote:
| There may be some specific parts of assembly generation where
| compilers are still outperformed by men, just as there are
| still specific parts of chess where men still outperform
| machines, but overall machines beat human programmers.
|
| The argument of allowing a human programmer to spend vastly
| more time is hardly fair, one could also probably write a
| hyper-optimizing compiler that takes far longer to compile to
| level the playing field, and that compiler will still triumph
| over men in how optimized the end result is.
|
| The only real advantage overall to human programmers here is
| the flexibility. A man can easily be told "spend four months on
| optimizing this", a machine would have to be substantially
| rewritten to fully take advantage of being given extra time.
| kroltan wrote:
| Maybe "the compiler generates better assembly code per second
| than a human could"?
| api wrote:
| Rust is already very close to C and C++ performance, and getting
| to equivalence or possibly better will mean a long slog through
| these kinds of edge cases. This was already done in C and C++
| compiler toolchains, which is why these languages are so fast,
| and it's a bit of a feedback loop because their popularity and
| centrality to systems development is what led to the work being
| done in the first place.
|
| Rust + LLVM will already handily beat a naive bare bones C or C++
| compiler that isn't loaded with sophisticated optimization
| tricks.
|
| If the work is done I expect Rust to be _faster_ than C and C++
| for the same reason that C++ can sometimes be faster than C: a
| more advanced type system can allow better optimization in many
| cases.
|
| In C++ for example you can do algorithms with templates, while in
| C you generally end up with function pointers all over the place.
| See qsort() vs. std::sort<> for one common case.
|
| With Rust you can do all this generic stuff plus leverage better
| safety to do more strict aliasing optimizations, less unnecessary
| memory fencing, and so on.
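The qsort() vs std::sort<> contrast carries over to Rust directly: a comparator passed as a plain function pointer stays an opaque indirect call, while a generic comparator is monomorphized per closure type and can be inlined into the sort loop. A minimal sketch with illustrative function names:

```rust
use std::cmp::Ordering;

// Like qsort(): the comparator is a function pointer, so each comparison
// is an indirect call the optimizer generally cannot inline through.
fn sort_fn_ptr(v: &mut [i32], cmp: fn(&i32, &i32) -> Ordering) {
    v.sort_by(cmp);
}

// Like std::sort<>: the comparator is a generic parameter, monomorphized
// per closure type, so the comparison can be inlined into the sort.
fn sort_generic<F: FnMut(&i32, &i32) -> Ordering>(v: &mut [i32], cmp: F) {
    v.sort_by(cmp);
}

fn main() {
    let mut a = [3, 1, 2];
    let mut b = [3, 1, 2];
    sort_fn_ptr(&mut a, |x, y| x.cmp(y)); // capture-free closure coerces to fn
    sort_generic(&mut b, |x, y| x.cmp(y));
    assert_eq!(a, [1, 2, 3]);
    assert_eq!(b, [1, 2, 3]);
}
```

Both produce the same result; the difference is only in the generated code's call structure.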
| mhh__ wrote:
| I'm a little suspect of the "Language x is faster than language
| y" talk - it seems too prone to programmer culture to test
| scientifically, beyond being able to prove that you can or
| can't end up with the same machine code from multiple
| languages.
| Sohcahtoa82 wrote:
| And you can almost always write a program that executes
| faster in a "slower" language than a "faster" one.
| MaxBarraclough wrote:
| I think Ada/GNAT performs roughly on par with C and C++, so we
| have reason to be optimistic. It's not just the dominant
| languages that end up with mature optimizing compilers.
| pjmlp wrote:
| Having been there when C compilers used to generate lousy
| code and we had legends like Mike Abrash teaching a whole
| industry how to write high performance Assembly, it always
| feels ironic that there is this myth about C and C++ being
| blazing fast since they were created.
| anthk wrote:
| C++ was always taken as big, bloated and unusable compared
| to C.
| pjmlp wrote:
| Only by C developers that still believe C generates
| faster code than C++, while their beloved C compiler has
| been rewritten in C++.
| anthk wrote:
| gcc is not my daily compiler.
| pjmlp wrote:
| I found another tinyc user.
| mids_hn wrote:
| C _was_ fast.
|
| On a PDP-11.
| gnulinux wrote:
| I write compilers for fun (hobby) and I just transpile my IR
| data structure to C++ (I'm also experimenting with other
| targets like Haskell (similar to the agda compiler) and Rust
| (work in progress)). My friends usually treat this approach as
| "too amateur" and say a "real language" should compile all the
| way to assembly. The reason I don't do that is that if you
| transpile to a well-known language and then let that compiler
| go all the way to machine code, you get an insane amount of
| optimizations for free. C++ compilers are so good, you rarely
| need to think about low-level optimizations; as long as your
| program is asymptotically sound (i.e. you don't keep randomly
| inserting into an array) it gives pretty good performance
| regardless.
| Flow wrote:
| But then you miss all the fun with doing a nanopass compiler.
|
| Maybe that's not entirely true though, you can still generate
| C/C++ in the end I guess.
| PoignardAzur wrote:
| Not to be like your judgmental friends, but if you're going
| to use the approach that gives optimizations for free, why
| not just target WebAssembly / LLVM IR / Cranelift IR?
| tom_mellior wrote:
| On the other hand, why yes? The OP found something that
| works for them and seems to give them the level of
| optimization they want. Presumably they also know C++ well,
| but possibly none of the technologies you mention well
| enough to "just" target them.
| zokier wrote:
| Nim compiler also generates C, so I'd say you are in pretty
| good company there.
| ericbarrett wrote:
| > My friends usually treat this approach "too amateur" and a
| "real language" should compile all the way to assembly.
|
| Get new friends--no, seriously. I used to think like them,
| and hung out with similar minds. It is gatekeeping[0] in a
| way that, if internalized, will blind you to opportunity.
|
| A few things I missed because I scoffed at them for not being
| "pure enough":
|
| 1) The entire Javascript ecosystem
|
| 2) Bitcoin, circa 2009
|
| 3) Countless SV startup opportunities from 2001-2015
|
| 4) ML and data science (e.g. the terrible tooling)
|
| The success I _have_ found always came with an open mind to
| people's approaches, even if they were not the ones I
| settled on. It took me way too long to learn this, but I hope
| I can change a few younger people's paths.
|
| [0]
| https://www.urbandictionary.com/define.php?term=Gatekeeping
| nwallin wrote:
| > Rust + LLVM will already handily beat a naive bare bones C or
| C++ compiler that isn't loaded with sophisticated optimization
| tricks.
|
| Well - sure. LLVM is loaded with sophisticated optimization
| tricks. If one compiler does sophisticated optimization tricks
| and the other compiler doesn't do sophisticated optimization
| tricks, the compiler with the sophisticated optimization tricks
| will be faster.
|
| Tangent: Ten years ago or so, LLVM/Clang was touted as being
| head and shoulders better than GCC, the big new thing, the
| bee's knees. Some of how it was better was obvious to everyone
| -- readable error messages are pretty awesome. (GCC stole that
| idea pretty quickly.) But the big thing that the LLVM/Clang
| people talked about was how blazingly fast compilation was
| compared to GCC. It was normal to take 1/10th as long to
| compile something. The code didn't run as fast as the code
| generated by GCC, maybe 50% of the performance, but that was a
| problem that they would eventually fix with more sophisticated
| optimizers. But compilation was fast!
|
| Here we are ten years later, and LLVM/Clang generates code that
| runs almost as quickly as GCC. What did it cost, besides ten
| years of incrementally making the optimizer more and more
| sophisticated, a little bit better one little bit at a time?
| Well, it takes just as long to compile something with LLVM's
| sophisticated optimizer as it does to compile something with
| GCC's sophisticated optimizer.
|
| The point is, there's a lot of yak shaving that will have to go
| into making Rust+LLVM work together better. And LLVM is already
| well acquainted with the barber. The low hanging fruit you're
| looking for has already been plucked.
| killercup wrote:
| > If the work is done I expect Rust to be faster than C and C++
| for the same reason that C++ can sometimes be faster than C: a
| more advanced type system can allow better optimization in many
| cases.
|
| Every now and then I check in on whether LLVM can deal with
| rustc spamming "noalias" on all references. You can find the
| latest change in [1]. While in theory this unlocks _a ton_ of
| optimizations, noalias is used very rarely in C/C++ code so
| these compiler passes are not exercised a lot by existing LLVM
| tests and/or not realized in full.
|
| [1] https://github.com/rust-lang/rust/pull/82834
| amelius wrote:
| A more advanced type system may also drive you into a corner
| where you make bad decisions wrt performance.
| titzer wrote:
| Can you give an example?
| Too wrote:
| In C/C++: marking every argument as const throughout a
| deep call chain, then later finding some edge case where
| you need to mutate one member far down the call stack,
| where mutation would not have broken the top-level
| contract of the function -- forcing you to do expensive
| copies instead.
| Arnavion wrote:
| Not amelius, but one case that happened to me is that
| Rust requires wrapping a `T` in a `RefCell` if two
| closures use it as `&mut T`. This happens even if you, the
| caller, know that the closures are invoked from a single
| thread and do not invoke each other, and thus only one
| `&mut T` will be in effect at any time. This is because
| closures are effectively structs with the captures as
| fields, so both struct values (and thus both `&mut`
| borrows) _exist_ at the same time even though their
| respective fields are not _used_ at the same time.
|
| Not only do you have to use `RefCell`, but you now also
| have panicking code when the `RefCell` borrow "fails",
| even though you know it can't. rustc is also not smart
| enough to notice the exclusivity at compile-time and elide
| the RefCell borrow flag test and the panic branch.
|
|     fn foo(mut cb1: impl FnMut(), mut cb2: impl FnMut()) {
|         for _ in 0..10 {
|             cb1();
|             cb2();
|         }
|     }
|
|     let mut x = String::new();
|     foo(|| x.push_str("cb1,"), || x.push_str("cb2,"));
|
| https://play.rust-
| lang.org/?version=stable&mode=debug&editio...
|
| Fixed using `RefCell`: https://play.rust-
| lang.org/?version=stable&mode=release&edit... . Inspect
| the ASM and trace the use of the string constant "already
| borrowed"; you'll see it being used for the borrow flag
| test because it wasn't elided.
|
| The equivalent non-Rust program could use String pointers
| for the two closures. I'm not sure whether they could be
| noalias or not, but at the very least they wouldn't need
| to generate any panicking code.
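For concreteness, a minimal sketch of the RefCell fix mentioned above, with the same hypothetical `foo`. Each `borrow_mut` call performs the borrow-flag test and carries the panic branch that rustc does not elide:

```rust
use std::cell::RefCell;

// The RefCell workaround: both closures capture &x, and each call
// dynamically asserts exclusive access. That dynamic assertion is the
// runtime check (and panic path) discussed in the comment above.
fn foo(mut cb1: impl FnMut(), mut cb2: impl FnMut()) {
    for _ in 0..10 {
        cb1();
        cb2();
    }
}

fn main() {
    let x = RefCell::new(String::new());
    foo(
        || x.borrow_mut().push_str("cb1,"), // borrow flag checked here...
        || x.borrow_mut().push_str("cb2,"), // ...and here, on every call
    );
    assert_eq!(x.into_inner(), "cb1,cb2,".repeat(10));
}
```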
| adwn wrote:
| Just use `UnsafeCell` instead of `RefCell` [1]: It has
| zero overhead, but you have to be sure that there's
| really no simultaneous write/write or read/write access -
| just like using raw pointers in C or C++.
|
| [1] https://doc.rust-
| lang.org/beta/std/cell/struct.UnsafeCell.ht...
| Arnavion wrote:
| Yes, I'm not averse to using `unsafe`, but one has to
| justify it on a case-by-case basis. Eg if you're doing
| this in a library, then keep in mind that some users are
| very adamant about using unsafe-free crates, so you may
| prefer to take the hit.
| colejohnson66 wrote:
| > then keep in mind that some users are very adamant
| about using unsafe-free crates
|
| Couldn't you just make the use of unsafe the default and
| add a feature flag to force the safe (but slower)
| behavior? Then you get the best of both worlds: those who
| don't care get performance for "free", while those who
| care can force it when they want.
| est31 wrote:
| If anything you'd have to go the opposite way: use safe
| by default and add the option to turn off runtime checks
| like bounds checks on slice access. Because when you
| write safe code, you tell the compiler about the
| invariants of your code, while with unsafe code, you keep
| them in your mind yourself. They might not even translate
| to any safe Rust constructs at all. E.g. if you pass a
| pointer in C, what is the recipient of the pointer
| supposed to do with it? Is the memory content
| initialized? Who is responsible for deallocation? On the
| other hand, if the compiler is told invariants in terms
| of safe code, it's easy to avoid any runtime checks for
| them.
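A small sketch of that last point (illustrative function name): a single assertion states the invariant in safe code, which typically lets the optimizer drop the per-index checks that follow:

```rust
// Hypothetical illustration: the assert! tells the compiler, in safe code,
// that v.len() >= 3. The optimizer can then usually compile the three
// indexing operations below without individual bounds-check panic branches.
fn sum_first_three(v: &[u64]) -> u64 {
    assert!(v.len() >= 3); // invariant stated once, up front
    v[0] + v[1] + v[2]     // per-index checks become provably redundant
}

fn main() {
    assert_eq!(sum_first_three(&[1, 2, 3, 4]), 6);
}
```

Whether every check is actually elided depends on the optimizer, but the invariant is at least expressible without any unsafe code.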
| Arnavion wrote:
| The users I was thinking of were more along the lines of
| people that run cargo-geiger etc, which just looks for
| "unsafe" in the source rather than anything dynamic based
| on selected features.
| comex wrote:
| FWIW, another option is to use `std::cell::Cell`. That
| only allows replacing the value rather than borrowing it
| in place, so you would have to take the value out and put
| it back after you're done with it, which also results in
| unnecessary code generation. But there'd be no branch and
| no panic, so the impact should be less than RefCell.
| There's also no borrow flag to take up space (not that it
| really matters when this is a single value on the stack).
| Arnavion wrote:
| This is a good point. I always forget Cell can be used
| with non-Copy types too.
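Spelled out, the `Cell` variant of the earlier example might look like this (a sketch, assuming the same hypothetical `foo`): take the value out, mutate it, and put it back.

```rust
use std::cell::Cell;

// Cell only allows replacing the value, so each closure takes the String
// out (leaving the default empty one behind), mutates it, and puts it
// back. No borrow flag, no panic branch -- just the extra moves.
fn foo(mut cb1: impl FnMut(), mut cb2: impl FnMut()) {
    for _ in 0..10 {
        cb1();
        cb2();
    }
}

fn main() {
    let x = Cell::new(String::new());
    {
        let bump = |suffix: &str| {
            let mut s = x.take(); // move the String out (x now holds "")
            s.push_str(suffix);
            x.set(s); // move it back
        };
        foo(|| bump("cb1,"), || bump("cb2,"));
    }
    assert_eq!(x.into_inner(), "cb1,cb2,".repeat(10));
}
```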
| gpm wrote:
| Apart from rust not being smart enough to see what you're
| doing in this sort of situation, the access pattern
| you're trying to use would actually result in undefined
| behavior with &mut pointers (or I assume C restrict
| pointers) because of the aliasing guarantees. For example,
| one optimization you could imagine the compiler actually
| doing would result in the following:
|
|     let mut x = String::new();
|     // store x in a local pointer, no one else is touching it
|     // because we have aliasing guarantees
|     let str1 = x;
|     // store x in a local pointer, no one else is touching it
|     // because we have aliasing guarantees
|     let str2 = x;
|     for _ in 0..10 {
|         str1.push_str("cb1,");
|         str2.push_str("cb2,");
|     }
|     // restore x to the original variable before our aliasing
|     // guarantee goes away
|     x = str1;
|     // restore x to the original variable before our aliasing
|     // guarantee goes away
|     x = str2;
|
| And you just:
|
| - Leaked str1
|
| - Created an x that just says cb2 repeatedly instead of
| alternating between cb1 and cb2.
|
| Obviously it's possible to fix this problem by having
| different guarantees on pointers (C's pointers, rust raw
| pointers), but it's not clear that the occasional
| overhead of some metadata tracking (refcell) isn't
| actually going to be more performant than the constant
| overhead of not having aliasing guarantees everywhere
| else. The most performant would be obviously having both,
| but as we've seen with C asking programmers to go around
| marking pointers as restrict is too much work for too
| little benefit.
| Arnavion wrote:
| >the access pattern you're trying to use would actually
| result in undefined behavior with &mut pointers (or I
| assume C restrict pointers) [...] Obviously it's possible
| to fix this problem by having different guarantees on
| pointers (C's pointers, rust raw pointers)
|
| Yes, I said as much in the last paragraph.
|
| >but it's not clear that the occasional overhead of some
| metadata tracking (refcell) isn't actually going to be
| more performant than the constant overhead of not having
| aliasing guarantees everywhere else.
|
| Generating panicking code where it's not needed is bad in
| general. It adds unwinding (unless disabled), collects
| backtraces (unavoidable), and often pulls in the std::fmt
| machinery.
|
| Yes, in general it's almost certainly true that non-
| aliasing pointers produce more benefits regardless. My
| comment was in the context of the very specific example
| it gave.
| Diggsey wrote:
| The "used from a single thread" aspect is a red herring:
| RefCell can only be used from a single thread anyway, and
| the compiler enforces this statically.
|
| The "state" value in a RefCell _is_ overhead, although it's
| fairly minor given that it doesn't need any synchronization
| to access. The extra panic branches are probably the largest
| overhead.
|
| That said, these overheads stem from Rust's safety
| guarantees rather than its strong type system: you can
| have a language with a strong type system that does not
| do these checks.
|
| Furthermore, there _are_ of course ways to avoid this
| overhead within safe Rust: if you can use the type system
| to prove that the cell cannot be borrowed at the same
| time, then you don't need to do the checks, and in that
| sense a strong type system can actually help avoid
| overheads that were introduced by being a safe language.
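One way the type system can prove the borrows are disjoint is borrow splitting across struct fields. A sketch with illustrative types (not the thread's original example):

```rust
// When the two mutable borrows target distinct struct fields, the compiler
// proves statically that they don't alias: no RefCell, no runtime check,
// no panic branch.
struct Bufs {
    a: String,
    b: String,
}

fn fill(bufs: &mut Bufs) {
    // Borrow splitting: &mut bufs.a and &mut bufs.b are provably disjoint.
    let (a, b) = (&mut bufs.a, &mut bufs.b);
    for _ in 0..10 {
        a.push_str("cb1,");
        b.push_str("cb2,");
    }
}

fn main() {
    let mut bufs = Bufs { a: String::new(), b: String::new() };
    fill(&mut bufs);
    assert_eq!(bufs.a, "cb1,".repeat(10));
    assert_eq!(bufs.b, "cb2,".repeat(10));
}
```

This only works when the disjointness is structural; the original two-closures-one-String example has no such structure to exploit.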
| Arnavion wrote:
| >That said, these overheads stem from Rust's safety
| guarantees rather than its strong type system: you can
| have a language with a strong type system that does not
| do these checks.
|
| The difference in semantics between a `&mut T` and a
| `*mut T` is a type system one. `&mut T` requires that two
| do not exist at the same time, regardless of whether they
| are used at the same time or not; this is the contract of
| the type.
|
| >Furthermore, there are of course ways to avoid this
| overhead within safe Rust: if you can use the type system
| to prove that the cell cannot be borrowed at the same
| time, then you don't need to do the checks, and in that
| sense a strong type system can actually help avoid
| overheads that were introduced by being a safe language.
|
| Correct, which is why I made the effort of pointing out
| that rustc is not smart enough to do it, not that it's
| impossible to do it.
| codeflo wrote:
| This may be splitting hairs a bit, because we all agree
| that this is a good example where using Rust in this
| straightforward manner leads to suboptimal performance.
| But I agree with the grand parent that this is mainly an
| issue with safety, not with the type system itself.
|
| To show why, consider two alternative languages.
|
| "Weak Rust": an equally safe Rust with a weaker type
| system. It might not distinguish & and &mut, but it would
| still need those checks, because you might use those
| shared references to break a data structure invariant. It
| would have to detect such unsafe usage at runtime and
| raise the equivalent of Java's
| ConcurrentModificationException.
|
| "Unsafe Rust": a less safe Rust with an equally strong
| type system. It _wouldn't_ need to do those checks. In
| fact, that's basically C++.
| nicoburns wrote:
| You could rewrite this as an iterator, which would avoid
| this problem and make the code much nicer.
| Arnavion wrote:
| The hypothetical `foo` is a third-party library function.
|
| (The real code which I reduced to this example is https:/
| /github.com/Arnavion/k8s-openapi/blob/1fcfe4b34a1f4f1...
| , and the callee does happen to be another crate in my
| control. While this can't be reduced to something like an
| Iterator, it _can_ be resolved by making the calle take a
| trait with two ` &mut self` methods instead of taking two
| closures. That still requires changing the callee, of
| course.)
| Blikkentrekker wrote:
| Only so long as it be mandatory to obey.
|
| _Rust_ was specifically designed so that one can ignore
| all the restrictions if one be confined and tell the
| compiler " _I know better, and I know it is safe._ ".
|
| Of course, if one be wrong in such confidence, u.b. lurks
| around the corner.
|
| I suppose one big problem with _Rust_ is that it 's less
| specified than in _C_ what is and isn 't safe, so it's
| harder to be so confident.
| api wrote:
| This is why I think Rust could eventually be faster than C
| and C++ for a lot of things. The work has to be done though.
| You're right that noalias enabled optimizations are neglected
| because you can rarely use them in C code.
|
| On the Rust side I think the language needs some way to
| annotate if's as likely/unlikely. This doesn't matter in most
| cases but can occasionally matter a lot in tight high
| performance code. It can allow the compiler to emit code that
| is structured so as to cause the branch predictor to usually
| be right, which can have a large impact.
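Until such annotations land, one common stand-in on stable Rust is routing the rare branch through a `#[cold]` function, which gives LLVM a similar layout hint. A sketch with illustrative names:

```rust
// Calling a #[cold] function inside a branch marks that branch as unlikely,
// so the compiler tends to lay it out off the hot path. (Names here are
// illustrative; this is a general stable-Rust idiom, not from the thread.)
#[cold]
#[inline(never)]
fn slow_path(x: u64) -> u64 {
    // e.g. logging, error construction, fallback computation
    x.wrapping_mul(31).wrapping_add(7)
}

fn process(x: u64) -> u64 {
    if x == u64::MAX {
        return slow_path(x); // rare case, moved out of straight-line code
    }
    x + 1 // hot path, laid out fall-through
}

fn main() {
    assert_eq!(process(41), 42);
    assert_eq!(process(u64::MAX), u64::MAX.wrapping_mul(31).wrapping_add(7));
}
```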
| steveklabnik wrote:
| That's tracked by https://github.com/rust-
| lang/rust/issues/26179 by the way.
| drran wrote:
| Why not just use PGO (Profile Guided Optimizations)?
|
| Sadly, PGO does not work with cross-language LTO, because
| of a conflict in the LLVM module name ('Profile').
| [deleted]
| quiescant_dodo wrote:
| This is a phenomenal bug report.
| mhh__ wrote:
| I reserve a different term for bugs where the backend is
| actually _wrong_ (e.g. emitting the wrong SIMD instruction was
| a fun one)!
| wyldfire wrote:
| Interesting investigation. But it sounds like the discussion
| revealed that this may be "just" a regression.
| losvedir wrote:
| A regression when the LLVM backend updated from 8 to 9. So
| arguably a regression in LLVM!
|
| Reminds me of a Zig stream I watched recently where Andy was
| updating the compiler backend to use an LLVM 12 release
| candidate. He ended up filing a couple of bug reports with
| LLVM, using Zig compiler issues to generate reduced-case LLVM
| IR examples that compiled differently with LLVM 11. He
| mentioned that he had to try this upgrade with the release
| candidate because right now they're LLVM regressions, but once
| 12 ships they become Zig regressions!
| jedisct1 wrote:
| Zig does a really good job at finding bugs in LLVM every time
| the LLVM version is updated.
|
| And Zig developers also do a very good job at reporting and
| fixing them. Which is why contributing to Zig also means
| contributing to LLVM.
| rob74 wrote:
| Language developers finding _bugs_ in LLVM is ok, but if
| there are already several instances of _regressions_ in
| LLVM found by developers of (maybe "exotic") languages,
| then the reaction should be less "hey, well done, you found
| a bug" and more "how can we avoid such regressions in the
| future?"...
| MaxBarraclough wrote:
| I imagine Clang and GCC have growing test-suites to
| defend against regressions. I wonder if they share them.
| PoignardAzur wrote:
| Separate the backend into functional subcomponents and
| fuzz them as much as possible?
|
| It's what cranelift does.
| PoignardAzur wrote:
| I think this should count as a bug in Rustc.
|
| If your compiler generates IR that's _just_ awful enough that
| it's a coin toss whether the backend can optimize it, whereas
| it can optimize the "naive" IR you'd write by hand, it's a
| compiler problem, not a backend problem.
___________________________________________________________________
(page generated 2021-03-29 23:01 UTC)