_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (unofficial)
(HTM) Visit Hacker News on the Web
COMMENT PAGE FOR:
(HTM) The Cost of a Closure in C
kazinator wrote 40 min ago:
Defining a callback interface in C without a user context parameter is
a capital crime.
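For concreteness, a minimal sketch of the non-criminal version, with a
caller-supplied context pointer threaded through every call (names here
are illustrative, not from the article):
    #include <stdio.h>

    /* Callback type that carries a user context pointer. */
    typedef void (*visit_fn)(int value, void *ctx);

    /* Library side: forwards the context verbatim on every call. */
    static void for_each(const int *xs, int n, visit_fn fn, void *ctx) {
        for (int i = 0; i < n; i++)
            fn(xs[i], ctx);
    }

    /* User side: the "closure state" lives in an ordinary struct. */
    struct sum { long total; };

    static void add(int value, void *ctx) {
        struct sum *s = ctx;
        s->total += value;
    }

    int main(void) {
        int xs[] = { 1, 2, 3, 4 };
        struct sum s = { 0 };
        for_each(xs, 4, add, &s);   /* no globals, fully re-entrant */
        printf("%ld\n", s.total);   /* prints 10 */
        return 0;
    }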
kazinator wrote 1 hour 14 min ago:
> It's no wonder GCC is trying to add -ftrampoline-impl=heap to the
story of GNU Nested Functions; they might be able to tighten up that
performance and make it more competitive with Apple Blocks.
[disclaimer] Without brushing up on the details of this, I strongly
suspect that this is more about removing the need for executable stacks
than about performance. Allocating a trampoline on the stack rather than
the heap is good for efficiency.
These days, many GNU/Linux distros are disabling executable stacks by
default in their toolchain configuration, both for building the distro
and for the toolchain offered by the system to the user.
When you use GCC local functions, it overrides the linker behavior so
that the executable is marked for executable stacks.
Of course, that is a security concession: when your stack is
executable, it enables remote code execution attacks that rely on
injecting code onto the stack via a buffer overflow and tricking the
process into jumping to it.
If trampolines can be allocated in a heap, then you don't need an
executable stack. You do need an executable heap, or an executable
dedicated heap for these allocations. (Trampolines are all the same
size, so they could be packed into an array.)
Programs which indirect upon GCC local functions are not aware of the
trampolines. The trampolines are deallocated naturally when the stack
rolls back on function return or longjmp, or when a C++ exception passes
through.
Heap-allocated trampolines have an obvious deallocation problem; it
would be interesting to see what strategy is used for that.
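For reference, a minimal GNU C sketch of where a trampoline comes from:
the nested comparator below captures a local, and passing its address to
qsort forces GCC to materialize a small executable thunk, historically in
the caller's stack frame. This is purely illustrative and does not show
what -ftrampoline-impl=heap actually does.
    #include <stdlib.h>

    /* GNU C only (e.g. gcc -std=gnu11); not valid ISO C. */
    static void sort_by_distance(int *xs, size_t n, int pivot) {
        int cmp(const void *a, const void *b) {      /* nested function */
            int x = *(const int *)a, y = *(const int *)b;
            return abs(x - pivot) - abs(y - pivot);  /* uses captured local */
        }
        qsort(xs, n, sizeof *xs, cmp);  /* &cmp escapes -> trampoline */
    }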
hyperbolablabla wrote 2 hours 53 min ago:
Stewart Lynch in his 10x VODs mentions his custom Function abstraction
in C++. It's super clean and explicit, avoiding the `auto` requirement of
C++ lambdas. Its use looks something akin to:
    // imagine my_function takes 3 ints; the first 2 args are captured and curried
    Function my_closure(&my_function, 1, 2);
    my_closure(3);
I've never implemented it myself, as I don't use C++ features all too
much, but as a pet project I'd like to someday. I wonder how something
like that compares!
spacechild1 wrote 1 hour 10 min ago:
Isn't this basically the same as passing the function to
std::bind_front and storing it in a std::function or
std::function_ref?
groundzeros2015 wrote 3 hours 3 min ago:
Thread locals do solve the problem. You create a wrapper around the
original function: you set a global thread-local to the user data, then
you pass in a wrapper function which calls the user-data-accepting
function pointer with that global.
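A minimal sketch of that workaround for qsort's context-free comparator
(illustrative, C11 _Thread_local). It is safe across threads but, as
noted below, not reentrant within one.
    #include <stdio.h>
    #include <stdlib.h>

    static _Thread_local int tl_reverse;    /* the "user data", per thread */

    static int cmp_ints(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        int r = (x > y) - (x < y);
        return tl_reverse ? -r : r;         /* reads the thread-local context */
    }

    static void sort_ints(int *xs, size_t n, int reverse) {
        tl_reverse = reverse;               /* stash context before the call */
        qsort(xs, n, sizeof *xs, cmp_ints);
    }

    int main(void) {
        int xs[] = { 3, 1, 2 };
        sort_ints(xs, 3, 1);
        printf("%d %d %d\n", xs[0], xs[1], xs[2]);   /* 3 2 1 */
        return 0;
    }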
gpderetta wrote 32 min ago:
reentrancy.
srcreigh wrote 1 hour 14 min ago:
Yep. Thread locals are probably faster than the other solutions shown
too.
It's confusing to me that thread locals are "not the best idea
outside small snippets" meanwhile the top solution is templating on
recursion depth with a constexpr limit of 11.
trgn wrote 3 hours 30 min ago:
I wish JS gurus understood this before jumping all-in on hooks and
bloating the runtime footprint of every web app out there
keymasta wrote 5 hours 7 min ago:
It's a post about Man or Boy... and the only typo is...
the word _son_. Pretty sure it's supposed to be "on"
unwind wrote 5 hours 40 min ago:
This was very interesting, and it's obvious from the majority of the
text that the author knows a lot about these languages, their
implementation, benchmarking corners, and so on. Really!
Therefore it's very jarring to read this text after the first C code
example:
This uses a static variable to have it persist between both the compare
function calls that qsort makes and the main call which (potentially)
changes its value to be 1 instead of 0
This feels completely made up, and/or like some confusion about things
that I would expect the author of a piece like this to really know.
In reality, in this usage (at the global outermost scope level)
`static` has nothing to do with persistence. All it does is make the
variable "private" to the translation unit (C parlance, read as "C
source code file"). The value will "persist" since the global outermost
scope can't go out of scope while the program is running.
It's different when used inside a function; there it makes the value
persist between invocations, in practice typically by moving the
variable from the stack to the "global data" which is generally
heap-allocated as the program loads. Note that C does not mention the
existence of a stack for local variables, but of course that is the
typical implementation on modern systems.
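A short illustration of the two cases being contrasted (illustrative
snippet, not taken from the article):
    static int in_reverse;      /* file scope: static storage duration either
                                   way; `static` only adds internal linkage  */

    int next_id(void) {
        static int counter;     /* block scope: here `static` is what makes
                                   the value persist across calls            */
        return ++counter;
    }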
debugnik wrote 38 min ago:
It took me a second read to realise that the mention of static is a
red herring. I think the author knows that the linkage is irrelevant
for the rest of the explanation; it just happens to be static so they
called it static. But by drawing attention to it, it does first read
like they're confused about the role of static there.
sfpotter wrote 1 hour 25 min ago:
I had a completely different response reading the sentence. I've been
programming in C for 20+ years and am very familiar with exactly the
problem the author is discussing. When they referred to a "static
variable", I understood immediately that they meant a file static
variable private to the translation unit. Didn't feel contrived or
made up to me at all; just a reflection of the author's expertise.
Precision of language.
gldrk wrote 1 hour 33 min ago:
>This uses a static variable to have it persist between both the
compare function calls that qsort makes and the main call which
(potentially) changes its value to be 1 instead of 0
The only misleading thing here is that "static" is monospaced in
the article (this can't be seen on HN). Other than that, "static
variable" can plausibly refer to an object with static storage
duration, which is what the C standard would call it.
>moving the variable from the stack to the "global data" which is
generally heap-allocated as the program loads
It is not heap-allocated because you can't free() it. Non-zero
static data is not even anonymously mapped, it is file-backed with
copy-on-write.
pjmlp wrote 2 hours 21 min ago:
The author contributes to ISO C and ISO C++ working groups, and his
latest contribution was #embed.
steveklabnik wrote 1 hour 37 min ago:
Not just that, the author is the Project Editor for WG14.
This doesn't mean that it's impossible to make mistakes, but
still.
uecker wrote 28 min ago:
It means he can edit LaTeX. Of course, JeanHeyd is very
qualified, but being project editor for an ISO standard does not
require this.
steveklabnik wrote 12 min ago:
I mean, you're closer to the committee than I am, but while
that is true in a literal sense, I'd assume that you all would
not let someone who knew how to edit LaTeX but knew nothing
about C hold that position.
kreco wrote 5 hours 4 min ago:
That's a very weird comment; you're spreading your knowledge but not
really addressing what could have been changed in the article.
If I follow your comment, you mean that he could have used a
non-static global variable instead and avoided mentioning the "static"
keyword afterward?
unwind wrote 4 hours 50 min ago:
Oh! Thanks, I was not being as concrete as I imagined. Sorry.
Yes, the `static` can simply be dropped, it does no additional work
for a single-file snippet like this.
I tried diving into Compiler Explorer to examine this, and it
actually produces slightly different code for the with/without
`static` cases, but it was too confusing to understand deeply enough,
quickly enough, to use the output here. Sorry.
mananaysiempre wrote 4 hours 10 min ago:
I see exactly the same assembly from x86-64 GCC 15.2 with -O2 for the
first example in the article, both as is and without `static`,
which makes sense. The two do differ if you add -fPIC, as though
you're compiling a dynamic library, and do not add
-fvisibility=hidden at the same time, but that's because Linux
dynamic linking is badly designed.
Chabsff wrote 4 hours 2 min ago:
TU-level concepts (mostly) dissolve during the linking stage.
You need to compile with -c to generate an object file in order
to see the distinction.
Also, the difference manifests in the symbols table, not the
assembly.
mananaysiempre wrote 3 hours 35 min ago:
To clarify, I was talking about Compiler Explorer-cleaned
disassembly, same as the comment I was replying to.
psyclobe wrote 7 hours 9 min ago:
c++ for the win!! finally!!
Rochus wrote 7 hours 14 min ago:
The benchmark demonstrates that the modern C++ "Lambda" approach
(creating a unique struct with fields for captured variables) is
effectively a compile-time calculated static link. Because the compiler
sees the entire definition, it can flatten the "link" into direct
member access, which is why it wins. The performance penalty the author
sees in GCC is partly due to the OS/CPU overhead of managing executable
stacks, not just code inefficiency. The author correctly identifies
that C is missing a primitive that low-level languages perfected
decades ago: the bound method (wide) pointer.
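Roughly, in C terms, the "unique struct with fields for captured
variables" shape is something like the following hand-rolled equivalent
(illustrative only, not the article's code):
    /* What a capturing lambda effectively lowers to: a one-off struct of
       captures plus a call function the compiler can see through. */
    struct cmp_env { int reverse; };                 /* the "capture" */

    static inline int cmp(const struct cmp_env *env, int x, int y) {
        int r = (x > y) - (x < y);
        return env->reverse ? -r : r;                /* direct member access */
    }

    int demo(void) {
        struct cmp_env env = { 1 };
        return cmp(&env, 2, 3);                      /* trivially inlinable */
    }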
The most striking surprise is the magnitude of the gap between
std::function and std::function_ref. It turns out std::function (the
owning container) forces a "copy-by-value" semantics deeply into the
recursion. In the "Man-or-Boy" test, this apparently causes an
exponential explosion of copying the closure state at every recursive
step. std::function_ref (the non-owning view) avoids this entirely.
gpderetta wrote 6 hours 54 min ago:
Even if you never copy the std::function the overhead is very large.
GCC (14 at least) does not seem to be able to elide the allocation,
nor inline the function itself, even when it is used immediately after
creation and the object never escapes the function. Given the
opportunity, GCC seems to be able to completely remove one layer of
function_ref, but fails at two layers.
Rochus wrote 6 hours 10 min ago:
This is exactly right, and the "Man-or-Boy" benchmark hits the
worst-case scenario for libstdc++ specifically. The optimization
fails here. My "copy-by-value" comment refers to the ownership
semantics. Since std::function owns its storage, and the Man-or-Boy
recursion passes the closure into the next layer (often by value or
by capturing it into a new closure), we trigger the copy
constructor. If the SBO limit is exceeded, that copy constructor
performs a new heap allocation and a deep copy of the state.
boris wrote 6 hours 45 min ago:
GCC (libstdc++), like all other major C++ runtimes (libc++, MSVC),
implements the small object optimization for std::function, where a
small enough callable is stored directly in std::function's state
instead of on the heap. Across these implementations, you can rely
on being able to capture two pointers without a dynamic allocation.
gpderetta wrote 6 hours 31 min ago:
You would think so, but it actually doesn't. Last time I checked,
libstdc++ could only optimize std::bind closures. A trivial test
with a stateless lambda shows this is still the case in GCC 14 and
15. In fact I can't even seem to trigger the library optimization
with bind.
Unlike GCC 14, GCC 15 itself does seem to be able to optimize away
the allocation (and the whole std::function) in trivial cases though
(independently of what the library does).
sirwhinesalot wrote 7 hours 32 min ago:
I think local functions (like the GNU extension) that behave like C++
by-ref (&) capturing lambdas make the most sense for C.
You can call the local functions directly and get the benefits of the
specialized code.
There's no way to spell out this function's type, and no way to store
it anywhere. This is true of regular functions too!
To pass it around you need to use the type-erased "fat pointer"
version.
I don't see how anything else makes sense for C.
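A sketch of what that type-erased "fat pointer" shape looks like in C:
a plain function pointer plus an environment pointer carried together,
non-owning in the spirit of std::function_ref (names are illustrative):
    #include <stdio.h>

    struct closure {
        int (*fn)(void *env, int x);    /* code */
        void *env;                      /* captured environment */
    };

    static int call(struct closure c, int x) { return c.fn(c.env, x); }

    /* One concrete "capture": add a stored offset. */
    struct add_env { int offset; };

    static int add_fn(void *env, int x) {
        return x + ((struct add_env *)env)->offset;
    }

    int main(void) {
        struct add_env env = { 10 };
        struct closure add10 = { add_fn, &env };   /* non-owning view */
        printf("%d\n", call(add10, 5));            /* prints 15 */
        return 0;
    }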
gpderetta wrote 6 hours 18 min ago:
> There's no way to spell out this function's type, and no way to
store it anywhere. This is true of regular functions too!
Well, regular functions decay to function pointers. You could of course
have the moral equivalent of std::function_ref (or, similarly, Borland
__closure) in C and have closures decay to it.
nutjob2 wrote 6 hours 49 min ago:
The price you pay for GCC nested (local) functions is an executable
stack with 'trampolines'.
I'm a fan of nested functions but don't think the executable stack
hack is worth it, and using a 'display' is a better solution.
See the Dragon Book or Compiler Construction: Principles and Practice
(1997) by Louden
sirwhinesalot wrote 6 hours 41 min ago:
You misunderstood my comment. GNU local function syntax, C++ [&]
lambda behavior (i.e., a hidden struct).
nutjob2 wrote 5 hours 6 min ago:
I really did, my comment is specific to C.
nesarkvechnep wrote 7 hours 54 min ago:
I'm thinking of using C++ for a personal project specifically for the
lambdas and RAII.
I have a case where I need to create a static templated lambda to be
passed to C as a pointer. Such a thing is impossible in Rust, which I
considered at first.
queuebert wrote 1 hour 28 min ago:
In Rust, could you instead use a templated struct wrapping a function
pointer along with #[repr(C)]?
pornel wrote 5 hours 16 min ago:
Yeah, Rust closures that capture data are fat pointers { fn*, data*
}, so you need an awkward dance to make them thin pointers for C.
    let mut state = 1;
    let mut fat_closure = || state += 1;
    let (fnptr, userdata) = make_trampoline(&mut &mut fat_closure);
    unsafe {
        fnptr(userdata);
    }
    assert_eq!(state, 2);

    use std::ffi::c_void;

    fn make_trampoline<C: FnMut()>(closure: &mut &mut C)
        -> (unsafe fn(*mut c_void), *mut c_void)
    {
        let fnptr = |userdata: *mut c_void| {
            let closure: *mut &mut C = userdata.cast();
            (unsafe { &mut *closure })()
        };
        (fnptr, closure as *mut _ as *mut c_void)
    }
It requires a userdata arg for the C function, since there's no
allocation or executable-stack magic to give a unique function
pointer to each data instance. OTOH it's zero-cost. The generic
make_trampoline inlines code of the closure, so there's no extra
indirection.
skavi wrote 1 hour 21 min ago:
> Rust closures that capture data are fat pointers { fn*, data* }
This isn't fully accurate. In your example, `&mut C` actually has
the same layout as usize. It's not a fat pointer. `C` is a
concrete type and essentially just an anonymous struct with FnMut
implemented for it.
You're probably thinking of `&mut dyn FnMut` which is a fat
pointer that pairs a pointer to the data with a pointer to a
VTable.
So in your specific example, the double indirection is unnecessary.
The following passes miri: [1] (did this on mobile, so please
excuse any messiness).
(HTM) [1]: https://play.rust-lang.org/?version=nightly&mode=debug&edi...
nesarkvechnep wrote 4 hours 30 min ago:
I know about this technique but it uses too much unsafe for my
taste. Not that it's bad or anything, just a personal preference.
Progge wrote 8 hours 34 min ago:
I wrote C a long time ago. Could anyone fill me in on why the first code
snippet parses arguments the way it does?
    int main(int argc, char* argv[]) {
        if (argc > 1) {
            char* r_loc = strchr(argv[1], 'r');
            if (r_loc != NULL) {
                ptrdiff_t r_from_start = (r_loc - argv[1]);
                if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                    in_reverse = 1;
                }
            }
        }
        ...
    }
Why not
    if (argc > 1 && strcmp(argv[1], "-r") == 0) {
        in_reverse = 1;
    }
for example?
Joker_vD wrote 5 hours 37 min ago:
I suspect it was adopted from a bigger snippet that had support for
parsing things like "-abc" as "-a -b -c", etc.
tapete2 wrote 8 hours 12 min ago:
It doesn't even make sense to use strchr for determining the position
of 'r', when the code checks that the position of '-' is at index 0.
Your solution is perfectly fine. Even if you don't have access to
strchr for some reason, the original snippet is really convoluted.
You could just write (strlen(argv[1]) > 1 && argv[1][0] == '-' &&
argv[1][1] == 'r') if you really want to.
unwind wrote 5 hours 45 min ago:
Of course, `&&` in C is short-circuiting so it's safe without the
`strlen()` too, as long as the argument is there i.e. not NULL.
Also, the use of a convoluted `if` to conditionally assign a
literal boolean is a code smell (to me), I would drop the `if` and
just use:
    in_reverse = argc > 1 && argv[1][0] == '-' && argv[1][1] == 'r';
if a more forward-thinking/strict check is not needed.
microtherion wrote 7 hours 57 min ago:
It could make some sense to use strchr, because in idiomatic UNIX
tools single-character command line options can be clustered. But
that also means that the subsequent code should not test for a
specific position.
And if you ever find yourself actually doing command line parsing,
use getopt(). It handles all the corner cases reliably, and
consistently with other tools.
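For the -r case discussed above, a minimal getopt() version would look
something like this (POSIX, purely illustrative):
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        int in_reverse = 0;
        int opt;
        while ((opt = getopt(argc, argv, "r")) != -1) {
            switch (opt) {
            case 'r': in_reverse = 1; break;   /* handles "-r" and clustered flags */
            default:  fprintf(stderr, "usage: %s [-r]\n", argv[0]); return 2;
            }
        }
        printf("in_reverse = %d\n", in_reverse);
        return 0;
    }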
capestart wrote 8 hours 37 min ago:
The breakdown of lambdas, blocks, and nested functions demonstrates how
important implementation and ABI details are, in addition to syntax. I
think the C standard should include a straightforward, first-class
wide function pointer along with a closure story, to stop people from
adding these half-portable, half-spooky extensions.
uecker wrote 25 min ago:
This.
mgaunard wrote 8 hours 45 min ago:
I feel the results say more about the testing methodology and inlining
settings than anything else.
Practically speaking all lambda options except for the one involving
allocation (why would you even do that) are equivalent modulo inlining.
In particular, the caveat with the type erasure/helper variants is
precisely that it prevents inlining, but given everything is in the
same translation unit and isn't runtime-driven, it's still possible for
the compiler to devirtualize.
I think it would be more interesting to make measurements when
controlling explicitly whether inlining happens or the function type
can be deduced statically.
gpderetta wrote 7 hours 43 min ago:
Given a Sufficiently Good™ compiler, yes, after devirtualization
and heap elision all variants should generate exactly the same code.
In practice it is more complicated. Devirtualization needs to run after
(potentially interprocedural) constant propagation, which might be
too late to take advantage of other optimization opportunities,
unless the compiler keeps rerunning the optimization pipeline.
In a simple test I see that GCC14 has no problems completely removing
the overhead of std::function_ref, but plain std::function is a huge
mess.
Eventually we will get there [1], but in the meantime I prefer not to
rely on devirtualization, and heap elision is more of a party trick.
edit: to compare early vs late inlining: while GCC 14 can remove one
layer of function_ref, it seems that it cannot remove two layers, as
it apparently doesn't rerun the required passes to take advantage of
the new opportunity. It has no problem, of course, removing an
arbitrarily large (but finite) number of layers of plain lambdas.
edit2: GCC15 can remove trivial uses of std::function, but this is
very fragile. It still can't remove two function_ref.
[1] for example 25 years ago compilers were terrible at removing
abstraction overhead of the STL, today there is very little cost.
mgaunard wrote 18 min ago:
You can just write the benchmark in such a way that the
optimizations are not possible.
RossBencina wrote 9 hours 16 min ago:
Good to see Borland's __closure extension got a mention.
Something I've been thinking about lately is having a "state" keyword
for declaring variables in a "stateful" function. This works just like
"static" except that instead of having a single global instance of each
variable, the variables are added to an automatically defined struct,
whose type is available using "statetype(foo)" or some other mechanism;
then you can invoke foo with an instance of the state (in C this
would be an explicit first parameter, also marked with the "state"
keyword). Stateful functions are colored in the sense that if you
invoke a nested stateful function, its state gets added to the caller's
state. This probably won't fly with separate compilation though.
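A hand-written sketch of roughly what that transform might amount to
today, with the would-be "state" variables hoisted into a caller-owned
struct passed as an explicit first parameter (everything here is
hypothetical; "statetype" does not exist anywhere):
    struct counter_state { int count; };           /* ~ statetype(counter) */

    static int counter(struct counter_state *st) { /* ~ "stateful" function */
        return ++st->count;                        /* would-be `state int count;` */
    }

    int demo(void) {
        struct counter_state a = { 0 }, b = { 0 }; /* separate instances,  */
        counter(&a); counter(&a);                  /* unlike a single      */
        return counter(&b);                        /* static; returns 1    */
    }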
1f60c wrote 6 hours 30 min ago:
> a "state" keyword for declaring variables in a "stateful" function
Raku (née Perl 6) has this!
(HTM) [1]: https://docs.raku.org/language/variables#The_state_declarato...
fuhsnn wrote 7 hours 35 min ago:
I dreamed up a similar idea[1] upon reading the author's closure
proposal; it's also really close to async coroutines.
(HTM) [1]: https://github.com/ThePhD/future_cxx/issues/55#issuecomment-...
vintagedave wrote 7 hours 38 min ago:
Yes, though it was a remarkably brief mention. I believe Borland
tried to standardise it back in 2002 or so,* along with properties.
(I was the C++Builder PM, but a decade and a half after that
attempt.)
C++Builder's entire UI system is built around __closure and it is
remarkably efficient: effectively, a very neat fat pointer of object
instance and method.
[*] Edit: two dates on the paper, but "bound pointer to member"
and they note the connection to events too:
(HTM) [1]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2002/n13...
juvoly wrote 9 hours 1 min ago:
That sounds cool, but this quickly gets complicated. Some aspects
that need to be addressed:
- where does the automatically defined struct live? The data segment
might work for static use, but doesn't allow dynamic use. The stack will
be garbage if the closure outlives the function context (i.e. callback,
future). The heap might work, but how do you prevent leaks without
C++/Rust RAII?
- while a function pointer may be copied or moved, the state area
probably cannot. It may contain pointers to stack objects or point
into itself (think Rust's pinning)
- you already mention recursion, compilation
- ...
fuhsnn wrote 7 hours 43 min ago:
IMO the C way is to allow users to explicitly manage the context area,
along the lines of POSIX ucontext.h or how the author's closure
proposal handles closure allocation[1]
(HTM) [1]: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Funct...
ddtaylor wrote 9 hours 27 min ago:
I actually enjoy trampoline functions in C a bit and it's one of the
GNU extensions I use sometimes.
(DIR) <- back to front page