[HN Gopher] When static makes your C code 10 times faster
___________________________________________________________________
When static makes your C code 10 times faster
Author : rostayob
Score : 142 points
Date : 2021-07-04 13:12 UTC (9 hours ago)
(HTM) web link (mazzo.li)
(TXT) w3m dump (mazzo.li)
| kahlonel wrote:
| I think a simple "const" would have also done the trick.
| Sometimes -O3 is clever enough to figure out that the value is
| never written, so it makes it an asm constant.
| painchoc wrote:
| It is mostly about the fact that if the variable is not static,
| then it's non-local to the translation unit and can be modified
| from everywhere else. So its value needs to be loaded and a
| plain and slow division is applied.
|
| Having it local or const makes the compiler able to inline it
| and do a simple bitwise and with a constant.
|
| So yes, make your variables static const by default (if you
| really need global).
| jheriko wrote:
| const_cast and linkage make const weaker than static on a
| variable that doesn't change.
| jokoon wrote:
| So the compiler cannot always optimize...
| peter_d_sherman wrote:
| >"When modulus is static, gcc / clang know that it is private to
| the current compilation unit, and therefore they can inline the
| value itself. Then, they turn the expensive _div_ into a much
| cheaper _and_ - since
|
| _mod'ing by a power of two -- is equal to bitwise and of that
| number minus one!_
|
| All you need to do is keep the bits lower than that power of two,
| which is what the and will do."
| EdSchouten wrote:
| Back in 2012 I observed exactly the same thing. I actually
| managed to get a warning for this added to Clang, called
| -Wmissing-variable-declarations. When set, warnings are generated
| in case a non-static global variable is defined without an
| external declaration that precedes it.
|
| I worked on this as part of FreeBSD, which is why their base
| system is nowadays built with that flag enabled.
| rostayob wrote:
| Thanks, that's good to know!
| Hello71 wrote:
| static can also make your code 10 times smaller: note that in the
| linked godbolt, there are actually two copies of both loop
| functions: one regular, and one inlined. this is because the
| compiler wants to inline the function, but is required to
| generate an additional one in case someone else will be calling
| it. what's more, at least on Linux, this copy cannot be removed
| from the final executable even if nobody calls it, unless a) the
| compilation is done with -ffunction-sections and the linking is
| done with --gc-sections, or b) LTO is enabled. adding static to
| the function declaration resolves this issue.
|
| the situation is even worse with ELF dynamic libraries due to the
| interaction of two rules: a) by default, all functions are
| exported, and b) by default, all functions can be interposed,
| e.g. by LD_PRELOAD. here, if you specify -fPIC in the compilation
| arguments (as is required to produce a modern dynamic library),
| _inlining is totally disabled_. for small functions, the call
| overhead can be substantial.
| alok-g wrote:
| I think this is beyond simply making the variable
| static/constant. It is the specific value of the constant that is
| allowing the division to be substituted with bitwise AND, which
| then makes it so much faster. I wonder how much the speedup would
| be if some other near-random value is there for the constant
| (which is likely beyond the purpose at hand).
| ddulaney wrote:
| This can be accomplished more simply and reliably by marking
| modulus as const. The compiler currently has to reason about the
| whole compilation unit to determine that modulus is not modified,
| which works. However, if future code modifies modulus (either on
| purpose or accidentally) or something changes that prevents the
| compiler from performing global reasoning, the optimization will
| be lost. By marking the actual intention, any modifications of
| modulus turn into compiler errors. Plus, if it becomes important
| to expose modulus to another compilation unit, now that's
| possible.
|
| This is a common issue with C code (including lots of code I've
| written). It's really easy to forget to const something, which
| forces the compiler to do global reasoning or to generate worse
| code. I've gotten into the habit of making things const unless I
| know I plan on mutating them, but I wish there was tooling that
| encouraged it. (BTW, this is something Rust does well by making
| things constant by default and requiring "mut" if it's mutable.)
| forrestthewoods wrote:
| > It's really easy to forget to const something, which forces
| the compiler to do global reasoning or to generate worse code.
|
| Global mutable state is pure evil. Don't write globals.
|
| It shouldn't be easy to forget const on a global because a
| mutable global should produce immediate revulsion and nausea.
|
| (I don't really consider a const global to be "a global". So
| ordinarily I'd just say globals are evil don't write globals.
| But I'm trying to be explicit here.)
| krapht wrote:
| This is one of those sayings that I don't think is helpful.
| What it should be is: limit variable scope to the smallest
| thing it can be.
|
| Nobody is being fooled about global state when you have a
| singleton database connection, event bus router, or network
| stack. I don't think your program is better when you pass in
| i/o functionality to every single class context in the
| constructor.
|
| Similarly a mega-class that encapsulates everything your
| program does is also a code smell. There's no point to a
| private variable when everything can access it.
| forrestthewoods wrote:
| I respectfully disagree on both points.
|
| > I don't think your program is better when you pass in i/o
| functionality to every single class context in the
| constructor.
|
| Abstracting over I/O transport is an excellent thing to do.
| This allows you to do things like easily record and replay
| a network stream. Which is useful for both debugging and
| automated tests.
|
| I/O comes in a kazillion flavors. Networked, interprocess,
| serial port, file, synthetic, etc etc. It's definitely
| something that should be abstracted around and not doing so
| is something I've deeply regretted in the past.
|
| > a mega-class that encapsulates everything your program
| does is also a code smell
|
| Ok I agree it can have a foul odor. But even this can be
| advantageous.
|
| Once upon a time Blizzard gave a GDC presentation about
| Overwatch. Kill-cam replays are notoriously difficult in
| video games.
|
| Blizzard's solution to this was delightfully elegant. They
| made two copies of their world. One perpetually runs on
| latest. One takes snapshots of the world every N frames.
| When a player dies their viewport switches to the old
| snapshot which then simulates and renders for ~6-10
| seconds. When the replay finishes or skips the viewport
| switches back to the main game, which never stopped
| receiving updates. This was a relatively trivial
| implementation given the complete lack of globals and
| singletons.
|
| A mega-class lets you run parallel instances of your
| "world". It's also a nice pattern when you want to build-up
| and tear-down your world in-process and guarantee no stale
| state. For example when running tests you likely want
| certain tests to "start clean". It's nice to be able to do
| this without restarting the entire process.
|
| I'll double-down that globals are evil. They are a
| sometimes necessary evil. Or the least bad choice. But my
| experience is that not using globals is almost always
| simpler, more elegant, more flexible, and ultimately
| preferable.
| wuxb wrote:
| I started to use const whenever possible after being familiar
| with some compiler optimizations and the Haskell pl. The point
| is knowing how to give the compiler an easier job.
| rostayob wrote:
| Author here -- I agree that const is better. The perf
| difference I encountered was due to the static keyword though,
| which is why the blog post talks about that specific issue.
| 95014_refugee wrote:
| The perf difference was due to the compiler being able to
| infer 'const' by way of 'static'.
|
| Advocating 'static' when your actual intent is 'const' does
| less experienced readers a disservice; they will assume that
| 'static' is meant to make things faster, and be disappointed
| when it doesn't work for non-constant values.
| rostayob wrote:
| I am not advocating static. I'm advocating for looking at
| what the compiler outputs when surprising behavior is
| encountered. The example is extracted from a larger piece
| of code, and I reported the minimal case as-is.
|
| If anything, the title is meant to be read as "isn't it
| amusing that something apparently unrelated such as
| `static` causes a performance improvement".
|
| That said, I have added a note clarifying this at top of
| the post now.
| wyldfire wrote:
| If there were any expressions that took the address of the
| variable, then even both `static const` qualifiers wouldn't
| work for a sufficiently paranoid compiler.
| giomasce wrote:
| Are you sure? Isn't it UB to modify a const object? It will
| probably end up in a non-writable memory page.
|
| EDIT: gcc seems to agree with me: you can see the optimized
| version here[1] and the unoptimzed version if you remove
| "const".
|
| [1] https://godbolt.org/z/KWrW45rK8
| Snoonan999 wrote:
| Const is to state that this module is not allowed to
| modify. Think on what'const volatile int foo' means.
|
| Mostly seen in embedded space.
| not2b wrote:
| This is why language standards specify what the compiler
| can assume and call out some behavior as undefined, exactly
| so compilers don't have to be paranoid and produce code
| that sucks. If an underlying object is const, the compiler
| is allowed to assume that it does not change (it is valid
| to cast away const on a pointer or reference, but not if
| the object itself was declared const).
| toast0 wrote:
| > (it is valid to cast away const on a pointer or
| reference, but not if the object itself was declared
| const).
|
| Isn't it valid to cast to non-const for a const, but only
| invalid to modify the const through the casted pointer?
| [deleted]
| why_only_15 wrote:
| Interestingly GCC will still optimize the loop function
| even if you have code that modifies the modulus.
| https://godbolt.org/z/EE9PnrY7s
| [deleted]
| jheriko wrote:
| i think this is most compilers.
| kevin_b_er wrote:
| I agree. Static alone was the incorrect choice. If the global
| were being modified elsewhere in the file then this
| optimization would also not be possible.
|
| The core optimization is modulus % constant (and a power-of-2
| as well). The static just enabled the optimizer to do better
| heavy lifting to get there. A const would've made the intent
| clear to the human reader and the compiler.
|
| static const would've been best.
| daneel_w wrote:
| Use a const instead. It's the right tool for the job.
| fallingfrog wrote:
| Wouldn't making that const or using #define be a bit cleaner?
|
| Honestly if I as the programmer knew that I was really trying to
| select bits from a number I'd just use a binary and directly. In
| that specific situation I think the intent is more clear that
| way. Like:
|
| //select the bottom 8 bits
|
| unsigned bottom8 = val & 0xff;
| sdfdf4434r34r wrote:
| As an embedded programmer working on small micros I make every
| single function static (including third party code, which I
| modify and bring into the source tree and curate myself), gives
| you global/link time optimizations and dead code elimination for
| free, leads to better code even at -O0, -Og and -O1, static const
| configuration structs/values gets optimized away, and so on.
|
| Really wish it was the default.
| [deleted]
| kevin_thibedeau wrote:
| It also gives you a nice private namespace for the translation
| unit so you can confidently modify any of the static objects
| without concern for impacting external code.
|
| I once had a gray beard chew me out over changing a function
| signature when revising things. I had to point out politely
| that it was static and that anyone who managed to link to the
| function had to be breaking a lot of rules to do so.
| GlitchMr wrote:
| Without `static`, compiler exports a symbol. $
| cat value.c int value = 42; int get_value()
| { return value; } $ make value.o
| gcc -c -o value.o value.c $ nm value.o
| 0000000000000000 T get_value U
| _GLOBAL_OFFSET_TABLE_ 0000000000000000 D value
|
| This symbol, not being `const` can be modified by any other
| compilation unit. $ cat main.c #include
| <stdio.h> int value; int get_value();
| int main() { value = 123456789;
| printf("%d\n", get_value()); } $ make main.o
| gcc -c -o main.o main.c $ cc value.o main.o
| $ ./a.out 123456789
|
| Compiler when generating an object file has to assume the value
| of exported non-const symbol can change. It's necessary to tell
| the compiler that the value cannot change, either by not
| exporting the symbol by using `static` or making the value of it
| `const`. In example provided in your article `static` makes sense
| (or even `static const`) as I don't think there is a reason to
| export this global.
| rostayob wrote:
| Yes, see last few paragraphs of the post, in which I also
| speculate why GCC doesn't infer that there is only one
| compilation unit.
| makapuf wrote:
| I quite like C syntax, and I sometimes long for a language that
| would change a few elements (default to const and static,
| different/safer standard api for strings). Some might call it
| Go but a simple slightly updated C would be nice.
| mhh__ wrote:
| That is pretty much D (assuming you only want to use the
| C-like bits) - it's not const by default but const is much
| easier to use and much more violent
|
| And arrays are bounds checked by default, no more issues with
| them (and strings are arrays)
| nicoburns wrote:
| Have you seen zig? It's a modern language that tries to keep
| the simplicity and explicitness of C (it does change the
| syntax a bit though).
| api wrote:
| Division is slow, which is something most programmers don't know.
| If you can binary AND instead of MOD this can be a huge win.
|
| Multiplication is also very fast, usually one or two cycles on
| larger chips.
| jeffbee wrote:
| If your compiler doesn't do this for divisors known at compile
| time, get your money back.
| IvanK_net wrote:
| I think every programmer should know, that logical AND is many
| times faster than Modulus (which is at least as hard as a
| division), and use & instead of % right in his code for powers of
| two (and not expect it to be done by a compiler).
| toast0 wrote:
| That's bitwise AND.
| Rompect wrote:
| This is like the easiest thing for a compiler to detect and
| optimize.
| codeflo wrote:
| True, but something to be aware of: If the compiler uses
| bitwise operations to implement %, it emits special handling
| of negative values (it's really more of a remainder operator
| than a modulus). You can avoid that either by using unsigned
| numbers, or & as suggested:
|
| https://godbolt.org/z/vdz59q994
| simias wrote:
| If I could travel back in time I'd tell Dennis to make "static"
| the implicit default, and have a special keyword like "public" or
| "export" for items that are meant to be accessible from outside
| the compilation unit.
|
| I'd also ask him to make "switch" break by default.
|
| Then I'd go kill Hitler or something.
| glouwbug wrote:
| Because, admittedly, non-default switch breaks are on par with
| Hitler
| phaemon wrote:
| No, even if you could kill Hitler, then WW2 doesn't happen,
| computers aren't invented yet, so time travel isn't invented
| yet so you can't head back to kill him. Read your briefing
| notes.
|
| Also
| https://tvtropes.org/pmwiki/pmwiki.php/Main/HitlersTimeTrave...
| wyldfire wrote:
| Apparently everyone kills Hitler on their first trip [1].
|
| [1] https://www.tor.com/2011/08/31/wikihistory/
| andi999 wrote:
| But then you would need 'unbreak' or just 'goto' to the next
| case. No duffs device no cigar.
|
| Probably a module/namespace system would be the biggest
| improvement.
| jcranmer wrote:
| The only real case I use switch fallthrough for is when you
| have two cases with the same code, e.g.:
| switch (foo) { case 1: case 2: /* common
| body */ break; case 3: /* body 3 */
| break; }
|
| It's not cognitively hard to make an empty case body
| fallthrough to the next run, but include an implicit break at
| the end of every nontrivial body.
|
| Supporting Duff's Device is not a compelling feature to
| support--irreducible loops are basically going to destroy any
| hope of optimization you might accrue.
| jackewiehose wrote:
| > The only real case
|
| which is probably the case in 90% of all my switch'es.
| Worst use of a time machine ever.
| pjmlp wrote:
| > But then you would need 'unbreak' or just 'goto' to the
| next case. No duffs device no cigar.
|
| From Algol's linage point of view, a very good outcome.
| jimsmart wrote:
| FWIW: Go's switch statements are break by default, with an
| explicit 'fallthrough' keyword if one wishes to override that
| behaviour.
| cpeterso wrote:
| clang and gcc have a -Wimplicit-fallthrough warning flag
| that will warn about switch cases that fall through without
| either C++17's [[fallthrough]] attribute or a /*
| fallthrough */ comment.
|
| I made Firefox's code base able to compile with -Wimplicit-
| fallthrough. About one hundred fall through cases needed to
| be annotated and about 2-3 were actual bugs (though minor).
| Someone wrote:
| Yes, but you would need way fewer of such phrases than you
| need now. Making the exceptional case take more code is the
| way to go, IMO.
|
| Many languages also support something like
| case 1,2,4:
|
| or even case 1-10, 20-22, 34:
|
| That further decreases the need for falling through.
| wyldfire wrote:
| We have learned so much over the decades of using C. Lowest-
| visibility-by-default is just one of the many good choices of
| Rust and other C successors. The example here is for codegen
| benefits but reducing visibility benefits encapsulation too.
| nyc_pizzadev wrote:
| My first C intuition would be defining this value as a macro. If
| I was in C++, then a const would make sense.
| lr4444lr wrote:
| Great write up: a precise problem that digs into the internals
| pointedly to teach a simple concept. This is exactly how I tell
| the junior devs where I work to do lunch talks that give people
| some concrete, memorable piece of learning that makes them better
| in their practical work.
| rostayob wrote:
| Author here -- just to be clear, I agree that marking the
| variable as const is the "right" thing to do here. I reported the
| investigation as-is because removing a static declaration made
| the code slower, which I then narrowed down to the isolated bit
| of code.
| codeflo wrote:
| I agree with many of the sibling comments, static vs. non-static
| is almost a complete red-herring.
|
| Static only means that the variable is local to the translation
| unit (the C file). The relevant difference in the example is
| actually the const-ness of the variable, which you may put
| explicitly, but which a powerful compiler can also infer here in
| the static case.
|
| Other than this optimization possibility, const and static are
| orthogonal concepts. I'm not sure to what extent the article
| author is aware of this.
|
| So the lesson should be: use const (or #define) if you mean to
| have a constant. It's still a good idea to also make things
| static, but the real reason for that is to avoid name collisions
| with variables in other C files.
| jheriko wrote:
| this is totally wrong in my experience, although some of the
| understanding is right. by using static to limit to the
| compilation unit the compiler can workout the value is
| constant.
|
| const doesn't do this thanks to const_cast etc. maybe things
| have changed, but const on a file level variable doesn't do
| this reliably, or at least hasn't for considerable lengths of
| time.
|
| of course 'static const' is the better answer. :)
| codeflo wrote:
| I hope your comment wasn't intended to be as hostile as it
| reads to me, but I do wonder what your experience was
| exactly, because there might be a misinterpretation.
|
| From a technical standpoint, using const_cast to modify a
| constant is Undefined Behavior. This was explicitly specified
| that way to make constants inlineable by the compiler.
| _Every_ compiler that I 've ever used does this, it's a
| trivial but very effective optimization.
|
| Proof by Godbolt: https://godbolt.org/z/djEdvee4s
|
| Observe how GCC completely eliminates the contents of
| undefined_mutate_modulus (which it's allowed to -- UB means
| it can do anything with that function, and it chooses the
| simplest possible thing) rather than de-optimizing mod4_const
| like you suggested. Compilers are smart.
| nicetryguy wrote:
| Const / static allows for "immediate" ASM instruction generation:
| which means the value is known at compile time so it can compare
| it directly inline as opposed to the overhead of comparing it to
| a labeled memory address. It's generally good practice whenever
| possible.
| omegalulw wrote:
| Isn't that constexpr?
| jheriko wrote:
| what?!?!
|
| static doesn't imply the value can be known, only in some cases
| where its not modified in that compilation unit (which is
| trivial to detect with SSA)
|
| const is something else.
| jheriko wrote:
| really? people mentioned const?
|
| you can tell how much low-level optimisation they have done if
| they think its gonna change codegen reliably, or at ll.
| einpoklum wrote:
| See my other comment here regarding const.
| arthur2e5 wrote:
| Link time optimization would enable a similar change even without
| a code edit. Using static is good, but it's a good idea to figure
| out how to just let other people's code run fast too.
| synergy20 wrote:
| use const is a better choice, I changed static to const, the
| result is the same here.
| malkia wrote:
| In ideal world const, constexpr, explicit (for constructors), and
| no default implicit conversions (and others that I'v missed)
| should've been the default...
| jheriko wrote:
| the biggest win here is informing the compiler sufficiently to
| swap out div for and.
|
| the use of static is just a tool to inform the compiler that the
| value is a constant (which const /might/)
| [deleted]
| einpoklum wrote:
| Title rephrase:
|
| When the compiler can assume your values don't change magically,
| it can optimize their use.
|
| This is true for restricted pointers, for global-scope variables
| which can only be accessed in the same translation unit, for
| stuff in inlined functions (often), etc.
|
| --------------------------------------------
|
| const is a bit shifty. const makes the compiler restrict what it
| allows you to write, but it can still not really assume other
| functions don't break constness via casting: void
| i_can_change_x_yeah_i_can_just_watch_me(const int* x) {
| *(int*) x = x + 1; }
|
| now, if the compiler sees the code, then fine (maybe), but when
| all you see is: void sly(const int* x);
|
| You can't assume the value pointed to by x can change. See this
| on GodBolt: https://godbolt.org/z/fGEMj9Meo
|
| and it could well be the same for constants too. But somehow it
| isn't:
|
| https://godbolt.org/z/fqGzh7o8z
| missblit wrote:
| Specifically it's well-defined behavior to mutate an object
| after const_cast-ing away constness if the object wasn't const
| to begin with (const references or const pointers can refer to
| non-const objects).
|
| In your first example you have `int x = 1;` which isn't const,
| so the compiler has to assume that `f` may mutate it after
| const casting.
|
| In your second example you have `const int x = 1;` which is
| const so the compiler can assume the value will never change.
| einpoklum wrote:
| You must be right. But - it's so easy to forget that! Or -
| never to be told that in the first place.
___________________________________________________________________
(page generated 2021-07-04 23:00 UTC)