[HN Gopher] How to make smaller C and C++ binaries
       ___________________________________________________________________
        
       How to make smaller C and C++ binaries
        
       Author : costco
       Score  : 92 points
       Date   : 2023-05-07 17:28 UTC (5 hours ago)
        
 (HTM) web link (ptspts.blogspot.com)
 (TXT) w3m dump (ptspts.blogspot.com)
        
       | frozenport wrote:
       | Hard to make sense without a relative scale. Are they trying to
       | make a 20kb program 15kb?
        
       | wyldfire wrote:
       | > In C++, use -fno-exceptions if your code doesn't use
       | exceptions.
       | 
       | > In C++, use -fno-rtti if your code doesn't use RTTI (run-time
       | type identification) or dynamic_cast.
       | 
       | Note that "your code" here generally must encompass "your code"
       | and all of your C++ archives/shared object dependencies (and
       | their dependencies).
        
         | lldb wrote:
         | You can usually compile each library independently with/without
         | exceptions and rtti and it works as expected.
        
           | pjmlp wrote:
           | It works by chance more than anything.
        
           | mhh__ wrote:
           | Works as expected or silently fails to destruct things
           | properly during unwinding?
        
             | mort96 wrote:
             | Haven't tried, but I'd guess it almost certainly fails in
             | fun and unexpected ways. Code which happens to be in your
             | dependencies' source files probably works as expected; code
             | which happens to be in your dependencies' header files will
             | be compiled without RTTI/exceptions and therefore possibly
             | break in fun and interesting ways.
             | 
             | And I wonder what happens if your code creates a class
             | which inherits from a base class defined in a library and
             | that library expects anything which inherits from that base
             | class to have runtime type info.
        
               | Kranar wrote:
               | Nothing breaks in fun or interesting ways. All that
               | happens is std::terminate is called.
        
               | mort96 wrote:
               | Even when a template or inline function depends on RTTI
               | but is called in a source file compiled with -fno-rtti?
               | 
               | And if a template or inline function depends on catching
               | an exception, you're right that your app unexpectedly
               | crashing due to std::terminate isn't "fun or interesting"
               | but it's not really great either.
        
               | Kranar wrote:
               | Templates and RTTI do not have anything to do with each
               | other.
               | 
               | An inline function that depends on RTTI will fail to
               | compile when -fno-rtti is enabled.
               | 
               | There is also nothing unexpected about an application
               | immediately terminating due to an exception. That's the
               | most expected outcome in an application that has made an
               | explicit choice to not handle exceptions and is what
               | happens in any application that does not have a try/catch
               | handler.
        
               | mort96 wrote:
               | Templates and inline functions are relevant because
               | they're in headers and therefore compiled as part of your
               | project's translation units. So if a function from a
               | library depends on RTTI, and it's compiled as part of
               | your project's translation units with -fno-rtti,
               | something will break (or maybe you'll have an ODR
               | violation).
               | 
               | > There is also nothing unexpected about an application
               | immediately terminating due to an exception. That's the
               | most expected outcome in an application that has made an
               | explicit choice to not handle exceptions and is what
               | happens in any application that does not have a try/catch
               | handler.
               | 
               | If I call a function from a library, and that library
               | function depends on catching and handling exceptions
               | (maybe it uses a stdlib function which throws, such as
               | vector::at), it's unexpected that my application crashes
               | due to that exception. Yet, if that library function
               | happens to be in a header, calling it will crash my
               | program (or, again, you have an ODR violation and
               | anything could happen).
        
               | Kranar wrote:
               | >If I call a function from a library, and that library
               | function depends on catching and handling exceptions
               | 
               | This doesn't make any sense. If a function from a library
               | depends on catching and handling an exception then it can
               | continue to do so. Using -fno-rtti doesn't change the
               | semantics of functions that were compiled without that
               | flag, they can continue working as usual.
               | 
               | All -fno-rtti does is treat your program as if any
               | exception thrown goes unhandled, which in C++ means that
               | std::terminate is called.
               | 
               | If you're saying you have a function that throws an
               | exception and that function expects someone else to catch
               | it, then the problem isn't with -fno-rtti, the problem is
               | with your function's expectations. That means your
               | function is using exceptions as a form of control flow,
               | which is not a recommended practice. The primary goal and
               | intention of exceptions is that the function that throws
               | it is agnostic about how the exception is handled, or
               | whether the exception is handled at all.
               | 
               | A function that fails to meet its post-condition can throw
               | an exception. If that exception gets caught then the
               | receiver of that exception makes the decision on how to
               | proceed (not the thrower), if that exception does not get
               | caught then std::terminate is called. All -fno-rtti does
               | is guarantee that any exception thrown goes uncaught.
               | 
               | This is the least surprising behavior one can expect.
               | What possible alternative behavior would you want from a
               | function that throws an exception that goes uncaught?
        
               | mort96 wrote:
               | You seem to be completely misunderstanding what I'm
               | saying. Of course a function compiled without -fno-rtti
               | or -fno-exceptions can keep using those features. There's
               | a reason I keep talking about functions defined _in
               | headers_. Those functions are compiled in _your
               | program's_ translation units, with _your program's_
               | compiler options.
        
               | Kranar wrote:
               | And you seem to completely misunderstand how -fno-rtti
               | works but you speak as if you're knowledgeable about it
               | despite the fact that it's absolutely obvious you've
               | never even bothered to try it yourself or read the
               | documentation for it.
               | 
               | It's very hard to understand someone who is repeatedly
               | making false claims but doesn't realize it.
               | 
               | If you don't know how something works, please avoid
               | making claims about it as if you did know such as:
               | 
               | >Yet, if that library function happens to be in a header,
               | calling it will crash my program (or, again, you have an
               | ODR violation and anything could happen).
               | 
               | That's a false claim and saying it does nothing but
               | create confusion and spread misinformation. Especially
               | since you're referencing ODR violations, which might fool
               | someone who doesn't know better into thinking your claim
               | is more credible than it really is (there is no ODR
               | violation).
        
               | mort96 wrote:
               | I didn't know that certain compilers (though not all)
               | will error in that situation. I'm sorry. You will only
               | have fun unexpected behaviors on some compilers.
               | 
               | Maybe we could've gotten to this point faster though if
               | you didn't keep responding as if I wasn't talking about
               | library functions defined in headers.
        
               | Kranar wrote:
               | My second reply to you was:
               | 
               | >An inline function that depends on RTTI will fail to
               | compile when -fno-rtti is enabled.
               | 
               | Note that you completely ignored that point.
        
               | Kranar wrote:
               | You get a compiler error.
        
               | mort96 wrote:
               | Seems like it's an error on GCC and Clang, though just a
               | warning on MSVC. That makes it much less of a foot gun at
               | least.
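
A header-only helper can sidestep the -fno-rtti question by checking whether RTTI is available before using dynamic_cast. The following is a minimal sketch, not anything from the article: Shape, Circle, as_circle and the DEMO_HAS_RTTI macro are hypothetical names, and the fallback assumes every class that reports is_circle() really derives from Circle.

```cpp
#include <cassert>

// Detect RTTI support. __cpp_rtti is the standard feature-test macro,
// __GXX_RTTI is defined by GCC and Clang, _CPPRTTI by MSVC.
#if defined(__cpp_rtti) || defined(__GXX_RTTI) || defined(_CPPRTTI)
#define DEMO_HAS_RTTI 1
#else
#define DEMO_HAS_RTTI 0
#endif

struct Shape {
    virtual ~Shape() = default;
    // Hand-written type tag, used only when RTTI is unavailable.
    virtual bool is_circle() const { return false; }
};

struct Circle : Shape {
    double radius = 1.0;
    bool is_circle() const override { return true; }
};

// Header-style helper: compiles with or without -fno-rtti.
inline const Circle* as_circle(const Shape* shape) {
#if DEMO_HAS_RTTI
    return dynamic_cast<const Circle*>(shape);
#else
    // Assumption: only Circle (or its subclasses) returns true here.
    return (shape && shape->is_circle())
               ? static_cast<const Circle*>(shape)
               : nullptr;
#endif
}

// Usage example: call from main() or a test runner.
inline void demo_rtti() {
    Circle circle;
    Shape plain;
    assert(as_circle(&circle) == &circle);
    assert(as_circle(&plain) == nullptr);
}
```

Both branches give the same answers for this small hierarchy, so the translation unit behaves the same whichever way the project is compiled.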
        
             | Calavar wrote:
             | It's similar to slapping noexcept on every function in a
             | compilation unit. Unwinding will still happen as usual in
             | compilation units where exceptions were enabled. When a
             | function in an -fno-exceptions compilation unit calls a
             | function that might throw exceptions, the compiler inserts
             | an exception handler that calls std::terminate.
             | 
             | std::terminate won't call any destructors further up the
             | chain, but if all you're doing in destructors is
             | deallocating memory and releasing handles to system
             | resources, this shouldn't matter.
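
The noexcept analogy above can be sketched in standard C++. This illustrates noexcept itself rather than what GCC or Clang actually emit under -fno-exceptions, and parse_positive and parse_or_die are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>

// May throw: stands in for a library function built with exceptions on.
int parse_positive(int value) {
    if (value < 0) {
        throw std::invalid_argument("negative value");
    }
    return value;
}

// noexcept promises that no exception leaves this function. If
// parse_positive() throws here, std::terminate() is called instead of
// the exception propagating to the caller.
int parse_or_die(int value) noexcept {
    return parse_positive(value);
}

// The noexcept operator reports the promise at compile time.
static_assert(noexcept(parse_or_die(0)), "wrapper is declared noexcept");
static_assert(!noexcept(parse_positive(0)), "callee may throw");

// Usage example: the non-throwing path behaves like a normal call.
inline void demo_noexcept() {
    assert(parse_or_die(7) == 7);
    // parse_or_die(-1) would end the program via std::terminate().
}
```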
        
               | inetknght wrote:
               | > _if all you're doing in destructors is deallocating
               | memory and releasing handles to system resources, this
               | shouldn't matter._
               | 
               | Note that closing system resources is often abortive. On
               | the other hand, object destructors might wait for a
               | flush.
        
               | wyldfire wrote:
               | > if all you're doing in destructors is deallocating
               | memory and releasing handles to system resources, this
               | shouldn't matter.
               | 
               | That "if" is doing some heavy lifting. C++ places a lot
               | of stock in RAII, so failing to call destructors could
               | cause lots of interesting failure modes. I don't think
               | it's fair to dismiss this fundamental part of the
               | language so easily.
        
               | Calavar wrote:
               | If the user kills your program with Ctrl-C, you don't
               | have any destructor guarantees. If your program is killed
               | by the OS because the user logged off or powered down
               | their machine, you don't have any destructor guarantees.
               | If your program fails an assertion, you don't have any
               | destructor guarantees. If your program segfaults, you
               | don't have any destructor guarantees. This really isn't
               | that exotic of a scenario.
               | 
               | My personal opinion is that using RAII to do anything
               | that you wouldn't trust the OS do for you in the event of
               | an early termination is setting yourself up for disaster.
        
             | jcelerier wrote:
             | -fno-exceptions does not disable unwinding support afaik,
             | for this you need -fno-unwind-tables / -fno-asynchronous-
             | unwind-tables ; if I'm not mistaken unwind table generation
             | is enabled even in C with GCC and clang
        
         | lelanthran wrote:
         | Agreed, but how certain can you be that the program generates
         | no exceptions?
         | 
         | Simply instantiating a pointer to class using new uses
         | exceptions.
         | 
         | Without exceptions all constructors that fail turn into
         | undefined behaviour. The caller will never know that the
         | instance they are using is filled with random bytes.
        
           | pjmlp wrote:
           | Unless there is std::nothrow in every new invocation.
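
As a rough sketch of the std::nothrow form: the allocation reports failure through a null pointer instead of std::bad_alloc, so the caller can check it explicitly. make_buffer is a hypothetical helper, and this only covers allocation failure, not exceptions thrown by a constructor body.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Allocate a zero-initialized int buffer without relying on
// std::bad_alloc: new (std::nothrow) returns nullptr on failure.
inline std::unique_ptr<int[]> make_buffer(std::size_t count) {
    return std::unique_ptr<int[]>(new (std::nothrow) int[count]());
}

// Usage example: the caller checks for nullptr instead of catching.
inline void demo_nothrow() {
    std::unique_ptr<int[]> buffer = make_buffer(8);
    if (!buffer) {
        return;  // allocation failed; handle it without exceptions
    }
    assert(buffer[0] == 0);
}
```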
        
           | ridiculous_fish wrote:
           | I'm not sure that a failing constructor is UB under
           | -fno-exceptions. The libstdc++ manual suggests that these
           | will instead abort:
           | 
           | https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_except.
           | ..
        
             | zabzonk wrote:
             | > I'm not sure that a failing constructor is UB under
             | -fno-exceptions
             | 
             | the c++ standard does not allow for disabling exceptions,
             | so at best this would be implementation-dependent
             | behaviour, at worst UB.
             | 
             | in other words, using -fno-exceptions, and similar on other
             | compilers, raises compatibility problems.
        
               | mort96 wrote:
               | You're right, the C++ standard does not allow for
               | disabling exceptions. So C++ with exceptions disabled is
               | not a language described by the standard. We instead have
               | to look at the documentation of this "C++ with exceptions
               | disabled" language.
               | 
               | And this "C++ with exceptions disabled" language defines
               | that exceptions terminate the program.
               | 
               | There is no UB anywhere here, everything is well-defined.
               | 
               | You're right again though that there are compatibility
               | problems, you could hypothetically use a compiler whose
               | '-fno-exceptions' option actually defines throwing
               | exceptions to be UB. No compiler would do that since it's
               | obviously insane, but it is a possible alternate "C++
               | with exceptions disabled" language. Make sure to not use
               | such compilers.
               | 
               | And it's probably a good idea to keep your code valid C++
               | anyways, meaning a standard C++ compiler will terminate
               | the program due to an uncaught exception while GCC and
               | Clang with -fno-exceptions will terminate the program by
               | calling abort(). This way, GCC and Clang's -fno-
               | exceptions behavior is just an optimization for when
               | compiling on those compilers.
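
One way to keep code valid C++ while still building under -fno-exceptions is to guard the throw with a feature-test macro. This is a hedged sketch: checked_get and DEMO_THROW_OUT_OF_RANGE are hypothetical names, and treating a compiler that never defines __cpp_exceptions as "exceptions disabled" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <vector>

// __cpp_exceptions is the standard feature-test macro; GCC and Clang
// leave it undefined when compiling with -fno-exceptions.
#if defined(__cpp_exceptions)
#define DEMO_THROW_OUT_OF_RANGE(msg) throw std::out_of_range(msg)
#else
#define DEMO_THROW_OUT_OF_RANGE(msg) std::abort()
#endif

// Bounds-checked access: throws in a normal build, aborts otherwise.
inline int checked_get(const std::vector<int>& values, std::size_t index) {
    if (index >= values.size()) {
        DEMO_THROW_OUT_OF_RANGE("index out of range");
    }
    return values[index];
}

// Usage example: the in-range path is identical in both builds.
inline void demo_checked_get() {
    const std::vector<int> values{10, 20, 30};
    assert(checked_get(values, 1) == 20);
}
```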
        
               | kelnos wrote:
               | > _And this "C++ with exceptions disabled" language
               | defines that exceptions terminate the program._
               | 
               | Except that's not a language that is standardized, so
               | there's nothing that defines anything here. GCC and Clang
               | may call std::terminate() when exceptions are disabled
               | and some library/dependency throws, but unless your build
               | process aborts when a compiler other than GCC or Clang is
               | used, you can't rely on that behavior in general.
        
               | mort96 wrote:
               | As I wrote in response to your sibling comment which
               | pointed out exactly the same thing:
               | 
               | Correct! It's a language defined ad-hoc by GCC and Clang.
               | GCC's man page describes its behavior with '-fno-
               | exceptions' to be essentially: act like a normal C++
               | compiler, except produce code which calls abort() where
               | it would otherwise have engaged the exception throwing
               | machinery. That defines a kind of "C++ with exceptions
               | disabled" language, but it's not a standard.
        
               | zabzonk wrote:
               | > "C++ with exceptions disabled" language
               | 
               | not a language standard i was previously aware of.
        
               | mort96 wrote:
               | Correct! It's a language defined ad-hoc by GCC and Clang.
               | GCC's man page describes its behavior with '-fno-
               | exceptions' to be essentially: act like a normal C++
               | compiler, except produce code which calls abort() where
               | it would otherwise have engaged the exception throwing
               | machinery. That defines a kind of "C++ with exceptions
               | disabled" language, but it's not a standard.
        
       | charcircuit wrote:
       | I wish this article included a benchmark of a large project and
       | showed the difference each option made. I have a feeling most of
       | these don't have much of an effect.
        
         | zabzonk wrote:
         | i don't think any of these will greatly, if at all, increase
         | performance. they are of interest for people that are a)
         | obsessive about program size or b) want to fit the compiled
         | code into some very small storage device, such as a rom.
        
           | charcircuit wrote:
           | I never said "performance benchmark." By benchmark I meant
           | you take a few programs like Chromium, clang, etc and measure
           | how much savings each option gives.
        
           | nrclark wrote:
           | Program size has a lot of interaction with performance, by
           | way of your CPU's L2/L3 cache.
           | 
           | If your program is too large, you get more cache misses when
           | branching around. Frequent cache misses have a _huge_ effect
           | on your code's overall performance, and can completely negate
           | any of the wins you get from loop unrolling or other
           | optimizations.
           | 
           | This effect is very pronounced on small armv7l/aarch64 CPUs,
           | which usually have much smaller caches than a desktop CPU.
           | You can even see it on laptop CPUs too.
           | 
           | It's impossible to know where exactly your size cutoffs are,
           | because they depend a lot on what else is running in your
           | system (and CPU core allocation, etc). But they 100% exist,
           | for any program on any system. If your whole program can fit
           | into your CPU cache, you'll see a big performance difference
           | vs having to fetch it from RAM all the time.
        
         | alpaca128 wrote:
         | Considering one of the listed options talks about "several
         | kilobytes" of reduced size it's definitely not all a huge
         | improvement. A standard stripping of debug information is
         | probably one of the biggest and easiest steps.
         | 
         | Using upx can also make a significant difference, but
         | unfortunately not every OS likes the result. Mac OS just
         | immediately killed my program when it was compressed like that.
        
       | stabbles wrote:
       | Is there any difference between "-ffunction-sections -fdata-
       | sections -Wl,--gc-sections" and gcc's -fwhole-program?
        
         | mgaunard wrote:
         | Yes, they do completely different things.
         | 
         | The first affects the layout of object files. The second
         | affects the optimizations done when linking.
        
           | leni536 wrote:
           | -fwhole-program is a compile-time optimization, not link-
           | time. You shouldn't link anything to a translation unit
           | compiled with -fwhole-program, as the whole point of it is to
           | compile the whole application in a single translation unit
           | and tell the compiler to optimize accordingly.
           | 
           | It can also eliminate unused data and functions, as it
           | basically treats most entities as internal linkage, and that's
           | one of the optimisations applied there. The mechanism is
           | rather different to GCing sections at link time though. I
           | assume LTO has something similar.
           | 
           | It's useful to combine it with unity builds. It's poor man's
           | LTO in a way.
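
The linkage distinction can be sketched in a single translation unit. Whether a given function actually disappears depends on the compiler and flags, so the comments describe what is permitted, not guaranteed output. All names are hypothetical.

```cpp
#include <cassert>

namespace {
// Internal linkage: invisible to other translation units, so the
// compiler is free to inline it and emit no standalone copy.
int square(int x) { return x * x; }
}  // namespace

// External linkage: another translation unit might call this, so the
// compiler has to emit it even though nothing in this file uses it.
// Removing it is left to the linker (--gc-sections) or to a compiler
// told it sees the whole program (-fwhole-program).
int exported_but_unused(int x) { return x + 1; }

// External entry point that uses the internal helper.
int public_entry(int x) { return square(x); }

// Usage example.
inline void demo_linkage() {
    assert(public_entry(3) == 9);
}
```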
        
       | mgaunard wrote:
       | There are two paths towards building small binaries:
       | 
       | - build all of your dependencies yourself (including standard
       | libs) and ensure they're built with LTO, and link everything
       | statically. Everything you don't actually need will be removed.
       | 
       | - link everything dynamically. The total image size is huge, but
       | technically it's not part of your binary, and can be shared with
       | other binaries as well.
        
         | vvanders wrote:
         | That's actually one thing that I really like about Rust, since
         | everything is static linkage by default it's easy to make
         | pretty sweeping changes, on nightly you can even recompile the
         | stdlib with different configurations(I.E. Oz/Os) if you want.
        
         | hackcasual wrote:
         | LTO without Os/z will increase your binary size, since more
         | things get inlined
        
           | einpoklum wrote:
           | Why would most things get inlined when you're optimizing for
           | size rather than speed?
        
       | markus_zhang wrote:
       | I have heard that embedded developers sometimes build their own
       | libc to minimize the size of the binary file. I'm wondering which
       | is a good example (readable source, good coding techniques) that
       | I can learn from?
        
         | ghotli wrote:
         | You're probably looking for something like this. Newlib is the
         | libc in play here. Not exactly what you're asking for, but
         | adjacent at least.
         | 
         | https://interrupt.memfault.com/blog/memcpy-newlib-nano
        
           | markus_zhang wrote:
           | Thanks, looks like the thing I'm looking for.
        
       | rbrown46 wrote:
       | I've gotten good insight into what takes up space in binaries by
       | profiling with Bloaty McBloatface. My last profiling session
       | showed that clang's ThinLTO was inlining too aggressively in some
       | cases, causing functions that should be tiny to be 75 kB+.
       | 
       | https://github.com/google/bloaty
        
         | chc4 wrote:
         | If you can run PGO, it will take the profiling information into
         | account when doing inlining heuristics, which can help a lot in
         | some cases. Technically that is general optimization for speed
         | and not size, though, so if you really care specifically for
         | binary size you'd probably still have to muck about with
         | noinline attributes and such.
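
A noinline annotation of the kind mentioned above might look like the following sketch. DEMO_NOINLINE and checksum are hypothetical names; the attribute spellings are the GCC/Clang and MSVC ones, with an empty fallback for other compilers.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) || defined(__clang__)
#define DEMO_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define DEMO_NOINLINE __declspec(noinline)
#else
#define DEMO_NOINLINE
#endif

// Ask the compiler to keep one out-of-line copy of this helper
// instead of duplicating its body at every call site.
DEMO_NOINLINE int checksum(const unsigned char* data, std::size_t size) {
    int sum = 0;
    for (std::size_t i = 0; i < size; ++i) {
        sum += data[i];
    }
    return sum;
}

// Usage example.
inline void demo_noinline() {
    const unsigned char bytes[] = {1, 2, 3};
    assert(checksum(bytes, 3) == 6);
}
```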
        
           | mananaysiempre wrote:
           | Unfortunately, PGO done the default way is antithetical to
           | reproducible builds. You can avoid that by putting the
           | profiling data in your VCS, but then you suffer all the
           | consequences of a version-controlled binary blob, one heavily
           | dependent on other files at that.
           | 
           | Perhaps it should be possible to use profiling data to keep
           | human-managed {un,}likely or {hot,cold} annotations up to
           | date? How valuable are PGO's frequencies compared to these
           | discrete-valued labels? (I know GCC allows you to specify
           | frequencies in the source, but that sounds less than
           | convenient.)
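
For reference, hand-maintained annotations of this kind could look like the sketch below, using the GCC/Clang spellings with no-op fallbacks elsewhere. DEMO_UNLIKELY, DEMO_COLD, handle_negative and process are hypothetical names, and how much the hints change code layout is up to the compiler.

```cpp
#include <cassert>

#if defined(__GNUC__) || defined(__clang__)
#define DEMO_UNLIKELY(x) __builtin_expect(!!(x), 0)
#define DEMO_COLD __attribute__((cold))
#else
#define DEMO_UNLIKELY(x) (x)
#define DEMO_COLD
#endif

// Marked cold: a hint that this path is rarely executed.
DEMO_COLD int handle_negative(int) { return -1; }

int process(int value) {
    // Hint that the error branch is the unlikely one.
    if (DEMO_UNLIKELY(value < 0)) {
        return handle_negative(value);
    }
    return value * 2;
}

// Usage example: the hints never change results, only code layout.
inline void demo_hints() {
    assert(process(4) == 8);
    assert(process(-4) == -1);
}
```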
        
         | ghotli wrote:
         | I spent a lot of time with bloaty for our embedded application
         | and found I had more actionable output from something like
         | this...
         | 
         | nm -B -l -r --size-sort --print-size -t d
         | ./path/to/compiler/output{.so} | c++filt > /tmp/by_size
         | 
         | Just a lot of flags that show you size by symbol in decimal
         | with unmangled symbols. Run it before you run `strip` in your
         | CI pipeline or whatever preps a build for proper release.
        
       | boomanaiden154 wrote:
       | If you're using Clang/LLVM you can also enable ML inlining[1]
       | (assuming you build from source) which can save up to around 7%
       | if all goes well.
       | 
       | There are also talks of work on just brute forcing the inlining
       | for size problem for embedded releases for smallish applications.
       | It's definitely feasible if the problem is important enough to
       | you to throw some compute at it [2].
       | 
       | 1. https://github.com/google/ml-compiler-opt
       | 
       | 2. https://doi.org/10.1145/3503222.3507744
        
       | Night_Thastus wrote:
       | There's also some really terrible advice in here. Not using the
       | STL is really, really bad advice for most. Same with fast-math.
       | Same with "just write C instead lol".
       | 
       | If you're in a situation where binary size is absolutely
       | critical, then sure. But most people should avoid most of these
       | suggestions.
        
         | mkoubaa wrote:
         | The title of the article is literally "How to make smaller C
         | and C++ binaries"
        
         | kevin_thibedeau wrote:
         | He also recommends disabling exceptions which is a valid
         | approach for size constrained applications. STL throws, so it
         | can never be used safely with exceptions disabled. When you do
         | this you're committing to writing C-with-classes rather than
         | modern C++.
        
           | vodou wrote:
           | This is pretty uncontroversial in an embedded system context.
           | As others have said in this thread, nothing spectacular
           | happens if STL throws, it just boils down to a
           | std::terminate. You can mitigate this by being careful with
           | what you do with STL.
           | 
           | Also, in a real-time system context, exceptions can be
           | undesirable since they might cause non-deterministic
           | behaviour.
           | 
           | Catching an exception can be surprisingly costly. Did some
           | benchmarking a while ago on the embedded, real-time system I
           | work on and saw that throwing and catching a
           | std::runtime_error had about the same execution time as a
           | rather slow CRC32 calculation (no pre-calculated tables, no
           | special instructions) of a 256 bytes input array. (Of course,
           | this depends a lot on the CPU architecture, compiler, etc.)
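
For a sense of scale, a table-free CRC32 of the kind used as the yardstick above could look like this. It is a generic bitwise CRC-32 (reflected polynomial 0xEDB88320), not the commenter's actual code, and crc32_bitwise is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit-by-bit CRC-32: no lookup table, no special CPU instructions.
// Eight shift/xor steps per input byte.
inline std::uint32_t crc32_bitwise(const unsigned char* data,
                                   std::size_t size) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < size; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // mask is all ones when the low bit is set, else zero.
            const std::uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}

// Usage example: the standard CRC-32 check value for "123456789".
inline void demo_crc32() {
    const unsigned char text[] = "123456789";
    assert(crc32_bitwise(text, 9) == 0xCBF43926u);
}
```

For a 256-byte input that comes to 2048 inner-loop iterations, which gives a rough idea of the cost being compared against one throw and catch.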
        
           | galangalalgol wrote:
           | There have been implementations of most of the stl without
           | exceptions here and there for embedded stuff. You can do
           | modernish c++ without rtti or the stl, you will just have to
           | do some extra work.
        
           | flafla2 wrote:
           | I agree with the sentiment but in practice I've found that
           | most C++ STL exceptions throw in a "fatal error" type of
           | scenario like a bad allocation and generally not an "expected
           | error". For example, basic_ifstream::open() sets a fail bit
           | on error, and doesn't throw an exception.
           | 
           | This is in contrast to python or Swift for example, their
           | standard libraries are more "throw-prone". Building off the
           | previous example, Swift's String.init(contentsOf:encoding:)
           | throws on failure.
           | 
           | So in practice, IMO it is usually safe to disable exceptions
           | in C++. Though, I have run into tricky ABI breaks when you
           | link multiple libraries in a chain of
           | exceptions->noexcept->exceptions and so on! You're of course
           | at the mercy of nonstandard behavior so buyer-beware. I
           | definitely wouldn't advocate for turning them off -just- for
           | a binary size reduction.
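
The ifstream point can be sketched as follows: with the default exception mask, a failed open() only sets failbit, so the error is handled by testing the stream. read_first_line and the path in the usage example are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Returns false if the file cannot be opened or has no first line.
// No try/catch needed: open() reports failure through the stream
// state unless the caller opted in via std::ios::exceptions().
inline bool read_first_line(const std::string& path, std::string& line) {
    std::ifstream file;
    file.open(path);
    if (!file) {
        return false;  // failbit set by open()
    }
    return static_cast<bool>(std::getline(file, line));
}

// Usage example with a path that is assumed not to exist.
inline void demo_ifstream() {
    std::string line;
    assert(!read_first_line("/nonexistent/demo/missing.txt", line));
}
```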
        
         | IChooseY0u wrote:
         | Why is avoiding STL bad advice? Lack of problem solving skills?
         | You could survive with LIST_ENTRY and UNICODE_STRING-type
         | structures.
        
         | stephc_int13 wrote:
         | Are you aware that STL, exceptions and RTTI are _verboten_ by
         | most game studios?
         | 
         | Templates and operator overloading are also quite often
         | frowned upon.
         | 
         | https://gist.github.com/raizr/c08922b11b33477a1157156e424342...
        
         | coliveira wrote:
         | The stated goal of the article is to reduce binary size. If you
         | want to use C++ and STL you already have decided that binary
         | size is secondary. By the way, the article even mentions
         | rewriting some routines in assembly.
        
           | lozenge wrote:
           | This is like saying "how to reduce the load time of your web
           | app - remove all CSS". Technically true, but not useful for
           | most web apps. People can want faster loading without
           | removing all styling.
        
             | Kranar wrote:
             | I don't see how your claim follows or would be true.
             | Replacing CSS with the equivalent JavaScript would most
             | certainly not reduce load times. In fact you should use CSS
             | as a means of reducing load times.
        
               | wtetzner wrote:
               | They didn't suggest replacing it with JavaScript.
        
               | Kranar wrote:
               | They didn't really suggest anything. Feel free to propose
               | a way of replacing CSS functionality that results in
               | faster load times.
        
             | colonwqbang wrote:
             | I think that's quite unfair. The article contains many
             | different tips on how to reduce size. Rewriting in a
             | different language was just one of them, pretty far down on
             | the list.
             | 
             | "Write in C instead of C++" is also clearly not the same as
             | "remove all your code"...
        
             | ipaddr wrote:
             | It's more like saying remove react and use pure javascript.
        
         | Kranar wrote:
         | Not using the STL in and of itself is not terrible advice, it's
         | a trade-off. For example, I tend to avoid the STL because of
         | how inconsistent it is across the various compilers in terms of
         | performance, bugs, and behavior. The standard leaves a great
         | deal of ambiguity in how the STL behaves and those
         | inconsistencies can be frustrating to deal with when you're
         | writing cross platform software. Furthermore plenty of embedded
         | systems also cut out the STL entirely due to disabling
         | exceptions. Using fast-math is also a perfectly acceptable
         | option for numerous domains that do not need strict IEEE
         | conformance, such as graphics programming or training neural
         | networks.
         | 
         | If people were to listen to your advice, no one would ever post
         | any kind of article on how to achieve certain specific goals,
         | all we'd ever discuss are very generic topics and constantly
         | rehash the same content over and over again. It's good to be
         | able to learn how certain developers manage to accomplish
         | various goals without having someone come along and call that
         | advice terrible.
        
           | maccard wrote:
           | > For example, I tend to avoid the STL because of how
           | inconsistent it is across the various compilers in terms of
           | performance and behavior.
           | 
           | What are your targets? This was my experience 10 years ago,
           | but today with modern (last 5-6 years) MSVC, GCC and Clang
           | (and even on modern Xbox and Playstation toolchains) my
           | experience has been that it's close enough.
        
             | kwant_kiddo wrote:
             | Even just between x86 and ARM there can be a big enough
             | difference in performance that your product straight up does
             | not work on one platform, and that is with the current gcc
             | toolchain.
             | 
             | I have been working with small embedded devices so this may
             | be an outlier here compared to the rest of the C/C++ world.
        
         | bee_rider wrote:
         | The advice for fast math is kinda funny.
         | 
         | > If you don't need IEEE-conformant floating point calculations,
         | use -ffast-math .
         | 
         | I mean... if you know enough to know whether you need standard
         | floating-point calculations, you probably know about the
         | -ffast-math flag.
         | 
         | Someone who is more into system specific stuff might correct
         | me, but I believe fast math produces a binary which can flip
         | bits in the MXCSR if it wants. So, you'll have to make sure
         | that none of your libraries make any assumptions about floating
         | point behavior, not just your code...
        
           | mananaysiempre wrote:
           | > if you know enough to know whether you need standard
           | floating-point calculations, you probably know about the
           | -ffast-math flag.
           | 
           | If you just want physics engine go brr and a small binary,
           | though, maybe you don't.
           | 
           | I find the FP advice funny for a different reason: I've seen
           | some no good, very bad, terrible, really horrifyingly awful
           | codegen from Clang for x86-64 things involving `long double`,
           | and without checking first I wouldn't assume that it doesn't
           | extend to all x87 stuff in general.
           | 
           | (Clang's preferred optimized way to copy a 16-byte union with
           | a `long double` being the longest member is... to copy the
           | first ten bytes with FLD/FSTP, then the next two by a word-
           | sized MOV, then the final four by a doubleword-sized MOV.
           | When Clang spills onto the stack a `long double` temporary
           | _it itself invented_ to implement an atomic operation's inner
           | loop, it first _clears the padding_ using _a bloody
           | XORPS/MOVAPS pair_ before FSTPing it there--twice, to two
           | identical temporaries, because apparently it belongs to the
           | offshore oil rig school of register spilling. And so on. I
           | lack the words to describe how bad Clang's codegen is for
           | anything that might have ever briefly brushed against a `long
           | double`.)
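
An illustrative sketch for the -ffast-math point in this subthread, not taken from the article: Kahan (compensated) summation only works because IEEE floating-point addition is not associative. Under -ffast-math the compiler is permitted to reassociate, so it may legally simplify the compensation term to zero. Whether a given compiler version actually does so is an assumption to check, not a guarantee. The function names below are made up for this example.

```cpp
#include <vector>

// Plain left-to-right summation.
double naive_sum(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum;
}

// Kahan (compensated) summation.
double kahan_sum(const std::vector<double>& v) {
    double sum = 0.0;
    double c = 0.0;  // running compensation for lost low-order bits
    for (double x : v) {
        double y = x - c;
        double t = sum + y;
        // Algebraically this is zero; in IEEE arithmetic it captures the
        // rounding error of the addition above. A compiler that is allowed
        // to reassociate (as with -ffast-math) may fold it away.
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}

// Hypothetical test input: 1.0 followed by ten values that are each too
// small to change 1.0 when added one at a time in double precision.
std::vector<double> sample_input() {
    std::vector<double> v{1.0};
    v.insert(v.end(), 10, 1e-16);
    return v;
}
```

On a typical x86-64 build without -ffast-math, naive_sum(sample_input()) stays at exactly 1.0 while kahan_sum(sample_input()) ends up slightly above it. Comparing the two with and without the flag is a quick way to see whether a code base depends on strict floating-point semantics.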
        
       | waynecochran wrote:
       | I imagine this is a huge win in large binaries linked from
       | multiple files:
       | 
       | > In C++, use as few template types as possible
       | 
       | But if I am not using the STL and many other C++ features, I'll
       | just write the code in C.
        
         | pjmlp wrote:
         | Even when writing C like code, C++ has the edge with stronger
         | type system, RAII, ability to use/implement types with bounds
         | checking and fat pointers.
        
           | Gibbon1 wrote:
           | We could disband WG14 and add fat pointers to C.
        
             | pjmlp wrote:
             | Not even Dennis managed to convince them otherwise,
             | 
             | https://www.bell-labs.com/usr/dmr/www/vararray.html
        
       | josefx wrote:
       | One thing I often do on Linux for .so files: default symbol
       | visibility to hidden, matching the Windows model of explicit
       | exports. A downside is that your dependencies have to set the
       | visibility attribute correctly, but that can still be worked
       | around by wrapping includes between push/pop pragmas.
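
A minimal sketch of the hidden-by-default approach described above, assuming GCC or Clang. The macro MYLIB_API, the function names, the file names in the build command and the commented-out third-party header are all hypothetical and only serve the illustration.

```cpp
// Assumed build command for a shared library:
//   g++ -shared -fPIC -Os -fvisibility=hidden mylib.cpp -o libmylib.so

// Export macro: explicit exports on both Windows and ELF platforms.
#if defined(_WIN32)
#  define MYLIB_API __declspec(dllexport)
#else
#  define MYLIB_API __attribute__((visibility("default")))
#endif

// Workaround for a dependency whose headers carry no visibility
// attributes: wrap the include so its declarations get default visibility.
#if defined(__GNUC__)
#  pragma GCC visibility push(default)
#endif
// #include "third_party/dep.h"   // hypothetical unannotated header
#if defined(__GNUC__)
#  pragma GCC visibility pop
#endif

// Not annotated: hidden when built with -fvisibility=hidden, so it does
// not appear in the dynamic symbol table.
int internal_helper(int x) { return x * 2; }

// Annotated: remains visible to users of the shared object.
extern "C" MYLIB_API int mylib_double(int x) { return internal_helper(x); }
```

Listing the dynamic symbols (for example with `nm -D libmylib.so`) before and after adding the flag should show whether the export list actually shrank.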
        
       | mierle wrote:
       | I work on Pigweed, a suite of embedded C++ tools and libraries. A
       | major focus for us is size optimization, and we've written up
       | some of our learnings:
       | 
       | https://pigweed.dev/docs/size_optimizations.html
        
       | synergy20 wrote:
       | I tested a static build the other day. It seems Zig produces a
       | much smaller static executable than C. I did not dig in, but I
       | was quite surprised.
        
       | laurentlb wrote:
       | In my free time, I make demos in less than 64kB (sometimes 8kB).
       | I use much of the advice described in this article, except that
       | I work on Windows with Visual Studio (I'd like to compare with
       | other compilers in the future). In particular, I disable
       | exceptions, I compile without any standard library, and I avoid
       | virtual methods. These are the most important things, in my
       | experience.
       | 
       | Regarding compression, demoscene tools like kkrunchy, Squishy
       | (https://logicoma.io/squishy/), and Crinkler can give much better
       | results than UPX. They come with their own downsides, e.g. made
       | for Windows, decompression can be slow, outputs might trigger
       | antivirus systems, etc.
       | 
       | Another important piece of advice is to check the binary size
       | regularly. It can be difficult to predict how a code change will
       | affect the binary (especially after compression!), so you need to
       | test it often.
       | 
       | > If it's feasible, rewrite your C++ code as C.
       | 
       | Many C++ features are just free syntactic sugar, so I've decided
       | to use them (e.g. namespaces, classes without virtuals).
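
A small sketch of the "classes without virtuals are free" point, assuming a mainstream ABI (the C++ standard itself does not fix object sizes). The type names are invented for this example.

```cpp
#include <type_traits>

namespace demo {

// Non-virtual member functions add nothing to the object: the layout is
// the same as a plain C struct with two floats.
struct Vec2 {
    float x, y;
    float dot(const Vec2& o) const { return x * o.x + y * o.y; }
};

// Adding virtual functions typically adds a hidden vtable pointer to each
// object, plus a vtable (and RTTI data unless built with -fno-rtti).
struct Vec2Virtual {
    float x, y;
    virtual ~Vec2Virtual() = default;
    virtual float dot(const Vec2Virtual& o) const { return x * o.x + y * o.y; }
};

static_assert(sizeof(Vec2) == 2 * sizeof(float),
              "non-virtual methods add no per-object data");
static_assert(std::is_standard_layout<Vec2>::value,
              "Vec2 is laid out like the equivalent C struct");
static_assert(sizeof(Vec2Virtual) > sizeof(Vec2),
              "expected on common ABIs: the vtable pointer adds size");

}  // namespace demo
```

The static_asserts are checked at compile time, so the comparison itself costs nothing in the binary.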
        
       | z303 wrote:
       | Lots of resources from the demoscene, especially on 1K, 4K and
       | 64K intros
       | 
       | - How a 64k intro is made [1]
       | 
       | - in4k, a site about creating demoscene 1kb, 4kb and 8kb intros [2]
       | 
       | - SizeCoding.org is a wiki dedicated to the art of creating very
       | tiny programs [3]
       | 
       | [1] http://www.lofibucket.com/articles/64k_intro.html
       | 
       | [2] https://in4k.github.io/wiki/about
       | 
       | [3] http://www.sizecoding.org/wiki/Main_Page
        
         | GuB-42 wrote:
         | Came to say this. The main thing is to use a good packing tool.
         | There are also templates to set up your project correctly,
         | usually under Visual Studio.
         | 
         | For 64k: kkrunchy or Squishy
         | 
         | For 1-8k: crinkler, which is both a packer and a linker
         | 
         | For 256b or less: not enough space for a packer, use 16-bit x86
         | assembly and DOS .COM files
         | 
         | And of course, find a way to get interesting music and visuals
         | with very little data, usually, it means maths.
        
           | charcircuit wrote:
           | The main thing is arguably to use libraries or the OS as much
           | as possible since code there doesn't typically count for
           | demos.
        
       ___________________________________________________________________
       (page generated 2023-05-07 23:00 UTC)