[HN Gopher] The pervasive effects of C's malloc() and free() on ...
___________________________________________________________________
The pervasive effects of C's malloc() and free() on C APIs
Author : todsacerdoti
Score : 108 points
Date : 2022-08-07 14:03 UTC (8 hours ago)
(HTM) web link (utcc.utoronto.ca)
(TXT) w3m dump (utcc.utoronto.ca)
| greatgib wrote:
| lpapez wrote:
| _Sure, it is not for script kiddies, and for the usual new
| 'web' developers that are now living in easy dev sandboxes.
| But, now, most applications are really mediocre. Everything
| uses hundreds of MB or GB of memory just for simple
| programs..._
|
| Do you realize how snobbish this sounds?
| jeffbee wrote:
| Well the example of the Linux kernel is an extremely bad one
| because it is absolutely stuffed full of memory management bugs
| on error paths. It will leak from `goto` beyond the
| deallocation, or it will free memory that was never allocated
| if it branched over the allocation. When pressed to its limits,
| for example by running a container out of memory, the linux
| kernel memory management falls to pieces.
| eska wrote:
| In practice this is mostly avoidable. There are many libraries
| that do not allocate at all by forcing the caller to provide the
| memory. The library may tell in advance how much memory is
| necessary, or report in hindsight whether it was enough for the
| operation. This leads to a style of allocating ahead of time and
| also considering worst case memory size.
|
| Otherwise I prefer libraries that allow setting the allocation
| function at least.
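A minimal sketch of the "let the caller set the allocation function" style eska prefers. All names here (`lib_set_allocator`, `lib_copy_string`, `lib_free_string`, the counting allocator) are hypothetical, not from any real library. The `ctx` pointer is an assumption that lets the caller attach state such as an arena or a counter.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hooks; ctx is passed through untouched. */
typedef void *(*lib_alloc_fn)(size_t size, void *ctx);
typedef void (*lib_free_fn)(void *ptr, void *ctx);

/* Defaults fall back to the C runtime's malloc()/free(). */
static void *default_alloc(size_t size, void *ctx) { (void)ctx; return malloc(size); }
static void default_free(void *ptr, void *ctx) { (void)ctx; free(ptr); }

static lib_alloc_fn g_alloc = default_alloc;
static lib_free_fn g_free = default_free;
static void *g_ctx = NULL;

/* Install a matching alloc/free pair; NULL for either restores the defaults. */
void lib_set_allocator(lib_alloc_fn a, lib_free_fn f, void *ctx) {
    if (a != NULL && f != NULL) {
        g_alloc = a;
        g_free = f;
        g_ctx = ctx;
    } else {
        g_alloc = default_alloc;
        g_free = default_free;
        g_ctx = NULL;
    }
}

/* Every allocation and release inside the library goes through the hooks. */
char *lib_copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = g_alloc(n, g_ctx);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

void lib_free_string(char *s) {
    if (s != NULL)
        g_free(s, g_ctx);
}

/* Example caller-side allocator: ctx points at a counter of live blocks. */
static void *counting_alloc(size_t size, void *ctx) {
    void *p = malloc(size);
    if (p != NULL)
        ++*(int *)ctx;
    return p;
}

static void counting_free(void *ptr, void *ctx) {
    --*(int *)ctx;
    free(ptr);
}
```

One design consequence: the allocator should only be swapped while no library-allocated blocks are outstanding, otherwise a block could be released by a different free function than the one that allocated it.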
| Sharlin wrote:
| Of course this very paradigm, combined with the fact that C
| doesn't have a proper pointer-and-length type, led to `gets`
| and around n million other security disasters when bad APIs
| would just take a pointer without a length and assume there's
| "enough" space...
|
| Anyway, the whole "caller allocates" concept doesn't work too
| well in the particular case of `gethostbyname`, which as the
| author mentions, is a complex struct containing several
| pointers and double pointers, with a potentially unlimited
| number of different-length allocations one would have to make!
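To illustrate the missing pointer-and-length type: a small struct that carries the capacity alongside the pointer, so a writer cannot silently overrun the buffer the way `gets` could. The `span` type and `span_copy` are hypothetical; the return convention (the required length, so a value >= capacity means truncation) is borrowed from `snprintf`.

```c
#include <stddef.h>
#include <string.h>

/* A hypothetical pointer-and-length pair: the buffer carries its own capacity. */
typedef struct {
    char *ptr;
    size_t cap;
} span;

/* Copy a NUL-terminated string into dst, never writing more than dst.cap bytes.
   Returns the length the full string needs (excluding the NUL), so a result
   >= dst.cap signals truncation, as with snprintf. */
size_t span_copy(span dst, const char *src) {
    size_t need = strlen(src);
    if (dst.cap > 0) {
        size_t n = need < dst.cap - 1 ? need : dst.cap - 1;
        memcpy(dst.ptr, src, n);
        dst.ptr[n] = '\0';
    }
    return need;
}
```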
| dathinab wrote:
| There is a funny version of this:
|
| Caller provides memory and the function fails if there is too
| little memory, but writes the required (variable) amount of
| memory to an `out length` pointer.
|
| Effectively leading to a common pattern of:
|
| 1. call function with empty buffer (or buffer of arbitrary
| size)
|
| 2. allocate buffer
|
| 3. call function again with properly sized buffer
|
| 4. add a loop if between 1 & 3 the required buffer size can
| change
|
| It's fascinating how well it works for some of its common use
| cases and how subtly but badly broken it can be for other use
| cases :=)
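A sketch of the "out length" convention and the caller's retry loop from steps 1-4 above. `lib_get_name` is a hypothetical library function that returns fixed data; a real one would fetch something whose size can change between calls, which is why the caller loops instead of calling exactly twice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library call: *len holds the buffer capacity on entry and
   the required size (including the NUL) on return.
   Returns 0 on success, -1 if buf was missing or too small. */
int lib_get_name(char *buf, size_t *len) {
    static const char name[] = "example-host.local"; /* stand-in data */
    size_t need = sizeof name;
    size_t cap = *len;
    *len = need;
    if (buf == NULL || cap < need)
        return -1;
    memcpy(buf, name, need);
    return 0;
}

/* Caller side: query, allocate, call again, and loop if the required
   size grew in between. Returns a malloc'd string the caller must free(),
   or NULL on allocation failure. */
char *get_name_alloc(void) {
    size_t len = 0;
    char *buf = NULL;
    while (lib_get_name(buf, &len) != 0) {
        char *bigger = realloc(buf, len);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
    }
    return buf;
}
```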
| userbinator wrote:
| That's classic "Microsoft style" and I've always hated APIs
| like that because they add unnecessary complexity to the code
| of all their callers.
| severino wrote:
| Please excuse my limited experience with this topic, but when
| the article says this:
|
| > If this structure is dynamically allocated by gethostbyname()
| and returned to the caller, either you need an additional API
| function to free it or you have to commit to what fields in the
| structure have to be freed separately, and how
|
| With your approach, wouldn't you need to free all the fields in
| the structure separately as well? Because, what's the
| difference between the library allocating the memory (with the
| problems the article points out) and allocating it yourself?
| liuliu wrote:
| Yes, that is how it's taught. However, for cases that require
| some scratch space, dynamic allocations can be easier to
| manage, less error-prone and lower in overall memory usage
| (you don't need to preallocate the worst-case amount).
|
| That's why, even though people know this, many C APIs still
| return dynamically allocated objects, or simply let you inject
| malloc / free if you want more control.
|
| This is a roundabout way to say: if you aspire to provide APIs
| with zero dynamic allocation, go ahead. But if you find
| yourself struggling with more complicated code as a result,
| consider that allowing a little dynamic allocation may
| help.
| layer8 wrote:
| This is the correct approach. Let the caller allocate the
| memory, and if necessary provide a function to calculate the
| required memory beforehand, and for the caller to specify the
| size of the memory passed to the primary function. For
| dynamically-sized results, another option is to provide
| iteration functions (like when walking a directory tree) in
| order to simplify memory sizing for each individual function
| call.
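A sketch of the iteration-function approach, assuming a hypothetical `name_iter` API modeled loosely on `opendir()`/`readdir()`. The caller owns the iterator state and a single result buffer, and only has to size that buffer for one item at a time. When the buffer is too small, the iterator reports the needed size and does not advance, so the caller can retry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator state; items stands in for data a library would fetch. */
typedef struct {
    const char *const *items;
    size_t count;
    size_t pos;
} name_iter;

void name_iter_init(name_iter *it, const char *const *items, size_t count) {
    it->items = items;
    it->count = count;
    it->pos = 0;
}

/* Copies the next name into buf (capacity cap). Returns 1 on success,
   0 when exhausted, -1 if buf is too small; *need receives the size
   (including the NUL) of the current item. */
int name_iter_next(name_iter *it, char *buf, size_t cap, size_t *need) {
    if (it->pos >= it->count)
        return 0;
    size_t n = strlen(it->items[it->pos]) + 1;
    *need = n;
    if (cap < n)
        return -1; /* do not advance: caller may retry with a bigger buffer */
    memcpy(buf, it->items[it->pos], n);
    it->pos++;
    return 1;
}
```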
| stormbrew wrote:
| The especially great thing about this approach is it lets you
| put things on the stack when appropriate too. It's really
| annoying when some library forces you to spam the heap with
| short lived, easily scoped objects.
| eska wrote:
| True. Also great for thread safety, and for avoiding
| performance pitfalls from shared access.
| bArray wrote:
| I've always wondered why there isn't a `free_all()` function
| (that I'm aware of) to handle exactly this. Or why you can't
| probe memory with something like `is_alloc()` (is allocated).
|
| I usually define my own version of `free()` to check whether the
| pointer is NULL, free the memory if not, and then set the pointer
| to NULL. That way if your pointer isn't NULL, it should be
| pointing somewhere. I believe there are some caveats though,
| specifically around OOM, as memory isn't truly allocated until
| you access it.
|
| C memory is generally quite cool to work with, but those tripping
| points really will trip you up. It's exceptionally easy to have
| stuff lingering around indefinitely, even worse when it happens
| in a loop.
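A sketch of the "free and reset to NULL" helper. Two details are worth noting: the C standard already defines `free(NULL)` as a no-op, so the NULL check is unnecessary, and a macro is used because a plain function taking the pointer by value cannot reset the caller's variable. `FREE_AND_NULL` is a hypothetical name.

```c
#include <stdlib.h>

/* free(NULL) is a no-op, so no NULL check is needed before calling free().
   The do/while(0) wrapper makes the macro behave like a single statement. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)
```

This only clears the one variable passed in; any other copy of the same pointer still dangles.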
| c-smile wrote:
| Such problems are pretty widespread and not limited to
| structures; C-style strings are affected too.
|
| For example:
|
|     const char* getenv(const char* name);
|
| Do we need to free the string? And if "yes" then how? It is not
| realistic to provide a separate free function for each such API
| function...
|
| In the Sciter API (https://sciter.com) I am solving this with
| callback functions:
|
|     typedef void string_receiver(const char* s, size_t slen, void* tag);
|     const char* getenv(const char* name, string_receiver* r, void* tag);
|
| So getenv calls string_receiver and frees (if needed) stuff after
| the call.
|
| This is a bit ugly on the pure C side, but it plays quite well
| with C++, where you can define a receiver for std::string, for
| example, and define a pure C++ version:
|
|     std::string getenv(const char* name);
|
| It would be nice for C to have code blocks a la Objective-C (
| https://www.tutorialspoint.com/objective_c/objective_c_block...
| ); with them, returning data would be trivial.
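A minimal sketch of the receiver-callback pattern in plain C. `lib_get_value` and `receive_into_malloc` are hypothetical (deliberately not named `getenv`, to avoid clashing with the standard function). The library keeps its temporary buffer on its own stack, and the caller's receiver decides where the data ends up.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Receiver signature modeled on the one in the comment above. */
typedef void string_receiver(const char *s, size_t slen, void *tag);

/* Hypothetical library function: formats its result into a stack buffer,
   hands it to the receiver, then lets the buffer go out of scope.
   Returns 0 on success, -1 if the name is unknown or the value does not fit. */
int lib_get_value(const char *name, string_receiver *r, void *tag) {
    char tmp[64];
    int n;
    if (strcmp(name, "greeting") == 0)
        n = snprintf(tmp, sizeof tmp, "hello, %s", "world");
    else
        return -1;
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    r(tmp, (size_t)n, tag);
    return 0;
}

/* Caller-side receiver: tag points at a char* that receives a malloc'd,
   NUL-terminated copy (or NULL if malloc fails). */
void receive_into_malloc(const char *s, size_t slen, void *tag) {
    char **out = tag;
    *out = malloc(slen + 1);
    if (*out != NULL) {
        memcpy(*out, s, slen);
        (*out)[slen] = '\0';
    }
}
```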
| juped wrote:
| You don't free the pointer returned by getenv(), because the
| environment variables are already in memory and getenv() is
| just giving you a pointer to one of their values.
|
| The most comfortable way to do a C API is to make the caller
| allocate space for return values (that aren't something simply
| copiable like int), and take a pointer to it as a parameter.
| The few standard library functions that malloc things are
| annoying, because you might not want to do that.
| c-smile wrote:
| getenv() is just a sample. But even with it... it puts some
| limitation on potential API implementation and overall system
| performance. E.g. const char* user_password() cannot store
| the data anywhere, right? All that...
|
| > The most comfortable way to do a C API is to make the
| caller allocate space for return values
|
| That's even worse. How will the caller know the size of the
| buffer upfront?
|
| With the callback approach that is trivial - you get the size
| on call - no need to call the API function twice - for size
| of the buffer and then for real copy.
| coliveira wrote:
| In C the way to solve this is to look at the man page for the
| function and see what they say about memory allocation. There
| is no magic involved.
| c-smile wrote:
| Documentation solves just one problem: to free or not to
| free.
|
| But there are performance, security and other issues.
|
| What if it is significantly more performant for getenv() (or
| whatever) to fetch needed data using alloca (on stack, with
| fallback to heap/malloc)?
|
| Returning a naked pointer is really far from flexible.
| smarks wrote:
| I'm convinced there was another style of C API where the callee
| would malloc a struct, populate it, and free it immediately
| before returning a pointer to it. Of course, the only way the
| caller can use the result is after it has been freed.
|
| Naturally there was a dependency on the exact behavior of the
| allocator, specifically, that it had to leave a freed block of
| memory untouched sufficiently long for the caller to be able to
| use the results. I seem to recall the stipulation was that freed
| memory was left untouched until the next memory allocation
| operation. The caller also had to be careful about using or
| copying the results immediately, before doing too much work.
|
| I have dim memories of people talking about this sort of thing at
| university in the early 1980s; we were a 4.2 BSD shop. I also
| recall debugging some old C source code (srogue, which also has
| BSD heritage) decades later, and encountering use-after-free
| crashes. There were several instances of this. There were too
| many to be accidental; it seemed deliberate to me.
|
| I suspect the reason for this "technique" was to relieve the
| caller of the burden of freeing the memory. It allowed the callee
| to return variable-length data easily, which couldn't be done if
| the pointer was to a static data area. And finally it relieved
| the callee of defining an explicit "free" API.
|
| Frankly I think this is a terrible API style. However, code that
| used it properly and that was sufficiently careful would actually
| function properly. But it seems like an incredibly fragile and
| sloppy way to design a system.
| mtlmtlmtlmtl wrote:
| Man this sounds like something some students came up with after
| partaking in the ganja. And it coming out of Berkeley in the
| 80s certainly tracks with that...
|
| Not a good way of doing things. I mean, have fun using
| Valgrind. Or switching out libc, etc. And what about key
| material? There you would still have to do a second step of
| zeroing or junking the memory when done with it anyway.
| smarks wrote:
| Yes, grad students at Berkeley in the early 1980s. For some
| reason I associate this technique with Bill Joy (who
| obviously was a major influence on a lot of what went into
| the 4bsd releases). However I have no evidence of this, nor
| whether any or what kinds of substances might have been
| involved.
| bitwize wrote:
| > I'm convinced there was another style of C API where the
| callee would malloc a struct, populate it, and free it
| immediately before returning a pointer to it. Of course, the
| only way the caller can use the result is after it has been
| freed.
|
| That's more than a bad API design, that's undefined behavior --
| squarely in nasal-demons territory. Depending on the compiler,
| the callee, the caller, or the entire observable universe can
| be optimized away into a no-op.
| smarks wrote:
| Oh yes, totally undefined.
|
| But consider the time frame, early 1980s K&R C on 4bsd Unix
| on a VAX. This predates ANSI/ISO C and Posix. It even
| predates "nasal demons." There was no specification; or
| perhaps the implementation _was_ the specification. The fact
| was that at some point the bsd allocator did leave freed
| memory untouched until the next memory allocation operation,
| and so people wrote programs that relied on this.
|
| Again, I'm not defending this, but this seemed to be the way
| that some people thought about things. I even remember
| questioning some code that used memory after having freed it.
| It was explained to me that this was "safe" because the
| memory wouldn't be modified until the next malloc!
|
| Also, remember that BSD was the system where if you did
| printf("%s", NULL);
|
| it would print "(null)" instead of getting SIGSEGV. And in
| general, dereferencing a null pointer would return zero. The
| rationale for this was that it "made programs more robust."
| (Again, I disagree, don't argue with me about this!)
|
| One more common technique from the BSD era (srogue again, but
| other programs did this too). To save the state of a program,
| write to a file everything from the base of the data
| segment to the "break" at the top of the data segment. To
| restore, just sbrk() to the right size and read it all back
| in, overwriting everything starting at the base of the data
| segment. I always found it surprising that this worked, but
| it worked often enough that people did sh!t like this.
| Athas wrote:
| Well, this sounds like it was before ANSI C, so there was no
| defined notion of undefined behaviour - I think that term of
| art came with the later standardisations. And if it was
| written to run on a specific OS or compiler (4BSD), one can
| argue that it was a really bad design, but it worked reliably
| on what was essentially the implementation-defined platform
| it was targeting.
| giomasce wrote:
| It also fails on multithdreaded programs, unless you add even
| more assumptions on the allocator.
| smarks wrote:
| "Multidreaded programs" :-)
|
| This was at least a decade before multithreading in C and
| Unix. But yes this "technique" would have failed miserably in
| a multithreaded environment.
| dragontamer wrote:
| Sounds like just a common mistake where people used the "stack"
| accidentally.
|
| Ex:
|
|     struct someStruct* badfunc() {
|         struct someStruct toReturn;
|         toReturn.a = foo();
|         toReturn.b = bar();
|         return &toReturn;
|     }
|
| In most people's code, this would probably work...
|
|     struct someStruct a = *badfunc();
|     func2(); // This will overwrite the "toReturn"
|              // from the last call, but as long as the
|              // struct was copied before any other function
|              // call, you're probably fine, though in
|              // undefined-behavior land
|
| -----------------
|
| Either that, or you're talking about strtok (and other non-
| reentrant functions).
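The usual fix for returning a pointer to a local is to return the struct by value, or to let the caller supply the storage. A minimal sketch; `someStruct`, `foo` and `bar` are the placeholder names from the comment, given trivial made-up definitions so the code compiles.

```c
/* Placeholder struct and helpers from the example above, with made-up bodies. */
struct someStruct { int a; int b; };

static int foo(void) { return 1; }
static int bar(void) { return 2; }

/* Returning by value copies the struct into the caller's storage,
   so nothing refers to the callee's stack frame after it returns. */
struct someStruct goodfunc(void) {
    struct someStruct toReturn;
    toReturn.a = foo();
    toReturn.b = bar();
    return toReturn;
}

/* Alternative: the caller owns the storage and passes a pointer to it. */
void goodfunc_into(struct someStruct *out) {
    out->a = foo();
    out->b = bar();
}
```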
| smarks wrote:
| It was definitely malloc'd memory, as I remember removing
| free() calls from the callee and adding them to the caller.
| pitched wrote:
| I've used this before in embedded code to save rom space by
| not having to include malloc. The stack _is_ the heap! Just
| have to be very careful about when you can call another
| function.
|
| The best version of this is where you allocate a block on
| your stack then pass that as a pointer up to the next
| function to use. The one who owns the memory is the one who
| allocates it (Rust style?). Or, having the linker allocate
| global blocks works too.
| tpoacher wrote:
| Question, why would this "probably work"? I would have agreed
| with your assessment if `a` was used for its one and only
| purpose before `func2` was called ... but as it stands as
| soon as `func2` is called, the content of `a` will most likely
| be replaced by garbage (and therefore definitely will not
| "work" in any reasonable sense of the word), no?
|
| Or do you mean something else by "probably work"? (like, in
| the sense that it will output "something").
| 13of40 wrote:
| Since func2 doesn't have any parameters, the only thing
| that calling it will put on the stack is the return
| address. In fact, if it's the last function call in the
| code block, even that might get optimized out. The struct
| should be intact and available via a local variable in
| func2.
| Jach wrote:
| In GC languages a common approach is to have "finalizers" to
| make something like this a possible and sometimes convenient
| way of dealing with a foreign API, I wonder if what you saw was
| something similar? The idea is to allocate the foreign memory,
| then make a finalizer which is just a hook that will
| (eventually) call the foreign free only when some object (like
| a wrapper for the foreign memory) is collected. Something
| similar could be hidden behind some preprocessor macros, with
| guarantees only until the next OUR_MALLOC...
|
| The problems for the GC languages tend to be fewer but if the
| wrapper is out of scope but someone grabbed and maintains a
| hold on the foreign memory directly, they're playing with fire
| as for when the GC will execute the finalizer hook and make
| that memory invalid. It's also a frustrating technique when
| foreign APIs -- particularly in certain graphics contexts --
| require allocation threads to be the same as freeing threads,
| and of course depending on the implementation of free and the
| GC it might be an expensive operation to have a bunch of them
| suddenly happen at once when all you were expecting was a new
| native object and not a bunch of GC work behind the scenes.
| smarks wrote:
| Definitely not GC. This was K&R C on BSD Unix around 1984.
| pjmlp wrote:
| That is why finalizers are yesterday's solution; most modern GC
| based languages have eventually caught up with Common Lisp
| and offer region based resource management (try-with-
| resources, use, using, defer, with,...), and in some cases
| trailing lambdas, which completely hide the resource
| management from the consumer.
|
| For scenarios like you're describing, .NET has SafeHandles
| for example.
| derefr wrote:
| Sounds like it was just using the heap to badly imitate
| returning variable-length data on the stack under a callee-
| preserved calling convention. (Callee writes the variable-
| length data inside their own stack frame, pops the stack frame
| in the function epilogue, and "leaks" the pointer and size of
| the data in caller-expected return registers. Caller uses the
| dangling data -- carefully not pushing to the stack until it
| has finished. Everything works out.)
| giomasce wrote:
| One solution I've sometimes seen in the wild is to mandate that
| the library allocates just one big malloc chunk and arranges
| pointers inside that chunk. So the caller has to free just one
| thing. It's more inconvenient for the library, though.
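A sketch of the single-allocation approach. `struct host_info` and `host_info_new` are hypothetical, loosely modeled on `struct hostent`. The layout is the struct, then the NULL-terminated alias pointer array, then the string bytes, so a single `free()` releases everything.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type with embedded pointers. */
struct host_info {
    char *name;
    char **aliases; /* NULL-terminated */
};

/* Builds the whole result in ONE malloc'd block; every internal pointer
   points into that block. Returns NULL on allocation failure. */
struct host_info *host_info_new(const char *name, const char *const *aliases, size_t n) {
    size_t total = sizeof(struct host_info) + (n + 1) * sizeof(char *) + strlen(name) + 1;
    for (size_t i = 0; i < n; i++)
        total += strlen(aliases[i]) + 1;

    unsigned char *block = malloc(total);
    if (block == NULL)
        return NULL;

    struct host_info *h = (struct host_info *)block;
    h->aliases = (char **)(block + sizeof(struct host_info));
    char *strings = (char *)(h->aliases + n + 1);

    size_t len = strlen(name) + 1;
    memcpy(strings, name, len);
    h->name = strings;
    strings += len;

    for (size_t i = 0; i < n; i++) {
        len = strlen(aliases[i]) + 1;
        memcpy(strings, aliases[i], len);
        h->aliases[i] = strings;
        strings += len;
    }
    h->aliases[n] = NULL;
    return h;
}
```

Placing the pointer array directly after the struct is assumed safe here because the struct consists only of pointers; a layout mixing other types would need explicit alignment padding.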
| Animats wrote:
| Well, yes. It's 1970s C technology.
|
| There are several options. They all suck.
|
| - Pass in a buffer to be filled by the API. The API can't check
| the buffer size you gave it. Be mentioned in a CERT security
| advisory for creating a buffer overflow vulnerability.
|
| - Have the API give you a buffer. Reboot your system regularly to
| recover the memory leaks.
|
| - Free the buffer before returning it, so the caller is using the
| buffer after free. Debug memory corruption bugs when someone uses
| an allocator which overwrites freed buffers.
|
| This is what move semantics are for. You call something, it gives
| you a thing, and now it's yours to use and release. Needs
| language support to work well, but is the right answer.
| HarHarVeryFunny wrote:
| Just one minor quibble: I'd say it's C++'s classes (not move
| semantics) that made doing things like this much cleaner -
| mostly by having a destructor that lets data structures
| automatically clean up after themselves (release memory) when
| they go out of scope. Now the developer doesn't need to know or
| care about whether the structure they were given has internal
| dynamically allocated components or not.
|
| Move semantics is "just" an optimization that makes passing
| data-owning classes around more efficient. Pre C++11 you'd just
| return the class by reference to avoid the inefficiency of
| return by value, but with move semantics you can treat complex
| types the same as simple ones, and not worry about the
| efficiency of how you pass them around.
| Sharlin wrote:
| There's another way: pass a pointer to the cleanup function as
| an out parameter - if it's a struct you're returning, just
| return the pointer as one of the struct fields. This is, of
| course, how "OO" as in methods and polymorphism is sometimes
| simulated in C. This way you don't need to pollute your API
| with a zillion different `free_foo` functions.
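A sketch of carrying the cleanup function inside the returned struct. `struct result`, `make_result` and the `release` member are hypothetical names. The point is that the module which allocates also decides how to free, and the caller only ever calls `r->release(r)`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type that carries its own destructor. */
struct result {
    char *text;
    void (*release)(struct result *self);
};

static void result_release_heap(struct result *self) {
    free(self->text);
    free(self);
}

/* Allocates the result and installs the matching cleanup function.
   Returns NULL on allocation failure. */
struct result *make_result(const char *text) {
    struct result *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->text = malloc(strlen(text) + 1);
    if (r->text == NULL) {
        free(r);
        return NULL;
    }
    strcpy(r->text, text);
    r->release = result_release_heap;
    return r;
}
```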
| chjj wrote:
| Just an interesting aside: returning structs directly (e.g.
| `struct foo bar();`) was added in V7 UNIX (1979). That said, the
| convention for it was pretty archaic: behind the scenes PCC used
| a static return buffer and the caller knew to copy the struct
| from the returned pointer afterwards. So what looked like thread-
| safe code was actually totally broken by today's standards.
|
| GCC still supports this with -fpcc-struct-return[1] (though, the
| modern man page doesn't seem to mention the static return
| buffer).
|
| Also just because there were no threads back in the day doesn't
| mean static return buffers were okay. In some cases, invoked
| signal handlers could still call something and corrupt your
| statically allocated return buffer. So making any system call
| after receiving your static return pointer was a footgun to watch
| out for:
|
|     struct foo *bar = some_lib_func();
|     time(0); /* potential breakage */
|
| [1]
| https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Incompatibilities...
| wwalexander wrote:
| I'm not sure how common it is for libraries to return heap-
| allocated memory like this, vs. taking a pointer to an
| uninitialized value.
| userbinator wrote:
| _This became a serious issue when Unix added threads (this static
| area isn't thread safe)_
|
| I'm not convinced it's "serious" --- thread-local-storage easily
| solves that.
|
| _Since this structure contains embedded pointers (including two
| that point to arrays of pointers), there could be quite a lot of
| things for the caller to call free() on (and in the right
| order)._
|
| Again the solution is simple: Allocate everything at once, so
| that free() need be called only once on the returned block.
|
| In some ways I think the relative difficulty of using dynamic
| allocation in C compared to other languages is a good thing ---
| it forces you to think whether it's really necessary before doing
| so, and in many cases, it turns out not to be. That
| encourages simpler, more efficient code. In contrast, other
| languages which make it _very_ easy to dynamically allocate (or
| even do it by default) tend to cause the default efficiency of
| code written in them to be lower, because it's full of
| unnecessary dynamic allocations.
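A sketch of the thread-local-storage fix for a function that returns a pointer to a static area, using C11's `_Thread_local`. `describe_port` is a hypothetical stand-in for something like `gethostbyname()`. Only cross-thread sharing is fixed: a second call on the same thread, or a call from a signal handler as the reply below points out, still overwrites the earlier result.

```c
#include <stdio.h>

/* Each thread gets its own copy of buf, so threads no longer clobber each
   other's results. A second call on the SAME thread still overwrites it. */
const char *describe_port(int port) {
    static _Thread_local char buf[32];
    snprintf(buf, sizeof buf, "port %d", port);
    return buf;
}
```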
| Athas wrote:
| > I'm not convinced it's "serious" --- thread-local-storage
| easily solves that.
|
| What about other cases of exceptional control flow? What
| happens if a signal arrives while that static area is being
| used, and the signal handler also needs to use the static area?
| userbinator wrote:
| The set of functions that can be called from a signal handler
| is very small (most of them corresponding to system calls or
| otherwise non-stateful functions like strchr());
| gethostbyname() is not one of them, and neither are malloc()
| nor free().
| thxg wrote:
| One slight variation on the getaddrinfo()/freeaddrinfo() approach
| is what (among many others) GMP [1] and its derivatives do: For
| every struct or custom type, you systematically get
|
|     void type_init(type *t, [...]);
|     void type_clear(type *t, [...]);
|
| This is essentially explicit constructors and destructors in C,
| and one can legitimately argue that it is clunky, verbose and
| error-prone.
|
| However, if we are constrained to a C API, it does have one
| important practical quality in my experience: Because it is
| always the same, it eases the mental load on both the API's user
| and the API's implementer, especially if there are many such
| types involved.
|
| [1] See e.g. https://gmplib.org/manual/Initializing-Integers
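A sketch of the init/clear convention in the style of GMP's `mpz_init()`/`mpz_clear()`, applied to a hypothetical growable string type `strbuf`. The struct itself can live on the stack; only its contents are heap-allocated, and `strbuf_clear` leaves it in a valid empty state.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string. */
typedef struct {
    char *data;
    size_t len;
    size_t cap;
} strbuf;

void strbuf_init(strbuf *s) {
    s->data = NULL;
    s->len = 0;
    s->cap = 0;
}

void strbuf_clear(strbuf *s) {
    free(s->data);
    strbuf_init(s); /* leave it valid and empty */
}

/* Appends text; returns 0 on success, -1 on allocation failure. */
int strbuf_append(strbuf *s, const char *text) {
    size_t add = strlen(text);
    if (s->len + add + 1 > s->cap) {
        size_t newcap = s->cap ? s->cap * 2 : 16;
        while (newcap < s->len + add + 1)
            newcap *= 2;
        char *p = realloc(s->data, newcap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = newcap;
    }
    memcpy(s->data + s->len, text, add + 1);
    s->len += add;
    return 0;
}
```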
| vlovich123 wrote:
| CoreFoundation at Apple had a similar convention. Any object
| obtained from an API whose name contains Create or Copy is
| owned by the caller and always has to be released (CFRelease).
| deathanatos wrote:
| > _and one can legitimately argue that it is clunky, verbose
| and error-prone._
|
| Clunky and verbose, yes, error-prone no. The APIs that do that
| are generally much more clear about ownership, and thus much
| easier, IMO, to write correct code for. Much worse is the API
| that returns you a pointer with no obvious mechanism to free
| it. Is it tied to the lifetime of an input to the function that
| returned it? Is it a global and this API is completely thread-
| unsafe? Am I leaking memory?
| throwawaymaths wrote:
| Especially for certain generics (e.g. hashmaps) you might want
| to have different allocators without creating a whole separate
| type: for example a global one, a threadlocal arena, etc.
| username223 wrote:
| > explicit constructors and destructors in C
|
| Exactly, and that's the right way to do it. In a language
| without implicit destructors/finalizers, you need a way for
| callers to say "okay, I'm done with this thing." And even with
| GC, you need finalizers to take care of non-memory resources.
| This may be clunky in C, but that's what you get in a language
| that makes you be explicit.
| Someone wrote:
| This is unavoidable in any language that (supports dynamic memory
| allocation and) moves dynamic memory allocation into a library.
|
| And it goes even further than the article claims: even functions
| that allocate a flat structure on behalf of the caller and return
| it should provide a companion function to free it. Reason is that
| the caller and the called function might have a different idea
| about what _the_ memory allocator is. That's rare on unixes, but
| was reasonably common on Windows with cross-DLL calls
| (https://codereview.stackexchange.com/questions/153559/safely...)
|
| Also, say a DLL function returns a char pointer containing a
| string. How would you know whether to call _free_ or _delete_ on
| it? Or, maybe, the equivalent of _free_ in Frob, the language
| that DLL happens to be written in?
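A sketch of the companion-free-function rule for strings. `lib_upper_copy` and `lib_string_free` are hypothetical. In a single source file this looks redundant, but when the library is a separate DLL with its own C runtime, routing the release through the library guarantees that the allocator which allocated the string is also the one that frees it.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns an upper-cased copy that must be released with lib_string_free(),
   never with the caller's own free() or delete. NULL on allocation failure. */
char *lib_upper_copy(const char *s) {
    size_t n = strlen(s) + 1;
    char *out = malloc(n);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (char)toupper((unsigned char)s[i]);
    return out;
}

/* Lives in the same module as the allocation, so it uses the same allocator. */
void lib_string_free(char *s) {
    free(s);
}
```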
| masklinn wrote:
| > How would you know whether to call free or delete on it?
|
| By making ownership part of the API (and ABI).
|
| Sadly C is unable to express this, and thus so are FFI layers.
| legalcorrection wrote:
| olliej wrote:
| I'm unsure why you're saying there's rust spam here?
|
| "lifetime" is not a rust only concept, syntactic lifetimes
| might be, but the idea of C APIs specifying the lifetime of
| a returned value is not new, novel, or rust specific.
|
| Many C APIs have SomeLibraryCreateObject(...),
| SomeLibraryRetainObject(...) and
| SomeLibraryReleaseObject(...) - or a more basic but less
| flexible SomeLibraryCreateFoo() SomeLibraryFreeFoo(...).
|
| The important thing is that the API specifies the lifetime
| of the returned value. Idiomatic APIs encode it in the name:
| "SomeLibraryGet..." does not transfer ownership, while
| "SomeLibraryCreate...", "SomeLibraryCopy..." etc. do.
| Generally this works more robustly with some variation of
| refcounting, but you can be copy centric and say "if you
| want to keep this data call SomeLibraryCopy(..)".
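A sketch of the Create/Retain/Release convention, using the placeholder names from the comment. The counter is a plain `int`, an assumption that only holds for single-threaded use; a real library would use atomic operations.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object. */
typedef struct {
    int refcount;
    int value;
} SomeLibraryObject;

/* Create: the caller owns the returned reference (count starts at 1). */
SomeLibraryObject *SomeLibraryCreateObject(int value) {
    SomeLibraryObject *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcount = 1;
        o->value = value;
    }
    return o;
}

/* Retain: take an additional reference; must be balanced by a Release. */
SomeLibraryObject *SomeLibraryRetainObject(SomeLibraryObject *o) {
    if (o != NULL)
        o->refcount++;
    return o;
}

/* Release: drop a reference; returns 1 if this call freed the object. */
int SomeLibraryReleaseObject(SomeLibraryObject *o) {
    if (o == NULL)
        return 0;
    if (--o->refcount == 0) {
        free(o);
        return 1;
    }
    return 0;
}
```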
| KerrAvon wrote:
| C is unable to express this as a machine-readable attribute,
| but you can certainly document it as part of the API contract
| and teach the FFI layer about it. This doesn't scale, but an
| FFI layer rarely translates directly into the language's
| idiom without some manual effort.
| pjmlp wrote:
| On Windows it is clear, memory allocated by DLLs belongs to
| them, and should be deallocated by APIs exposed by them, and to
| play safe you should use Win32 APIs for memory management and
| not rely on the free()/malloc() provided by the compiler.
| josephcsible wrote:
| > the free()/malloc() provided by the compiler
|
| Nitpick: they're provided by the C runtime library.
| pjmlp wrote:
| Which on non-UNIX platforms means the C compiler that one
| bought, not necessarily from the OS vendor, as libc isn't
| traditionally part of the OS APIs.
| josephcsible wrote:
| I don't think there's a perfect 1:1 correspondence
| between compilers and libc versions on Windows today, but
| even if there were, it's still a distinction worth
| making. For example, if two libraries both statically
| link the C runtime, that counts as separate ones (and so
| a malloc in one paired with a free in the other will
| wreak havoc), even if they're the exact same version.
| cesarb wrote:
| > and to play safe you should use Win32 APIs for memory
| management
|
| Which ones? HeapAlloc/HeapFree? LocalAlloc/LocalFree?
| GlobalAlloc/GlobalFree? CoTaskMemAlloc/CoTaskMemFree?
| VirtualAlloc/VirtualFree? Something else? If the answer is
| HeapAlloc/HeapFree, which heap? Should you enable the low-
| fragmentation heap or not?
| pjmlp wrote:
| Doesn't matter to the caller, because they aren't supposed
| to be clever and call any of them instead of the APIs
| exposed by the respective DLL for resource management.
|
| The DLL authors had better know what APIs to call
| internally on their own code.
| dathinab wrote:
| While it's not common to run into bugs because of it on Unix,
| semantically it's a problem all the time.
|
| While multiple statically linked C libs normally use the same
| allocator, the moment you link in any other language in any way
| (static,.so) the guarantee is gone.
|
| So `dart:ffi.allocate`, `C-malloc` and Rust's
| `std::alloc::alloc` might in the end all use different
| allocators, or might happen to use the same allocator, but
| unless you carefully control all the parts involved, all
| bets are off.
|
| And it can make a lot of sense to use different allocators in
| FFI-libraries in some use cases (mainly as a form of
| optimization).
| messe wrote:
| > This is unavoidable in any language that (supports dynamic
| memory allocation and) moves dynamic memory allocation into a
| library.
|
| Not necessarily. The Zig[1] standard library forces callers to
| provide at runtime an allocator to each data structure or
| function that allocates. Freeing is then handled either by
| calling .deinit()--a member function of the datastructure
| returned (a standard convention)--or, if the function returns a
| pointer, using the same allocator you passed in to free the
| returned buffer. C's problem here is it doesn't have namespaces
| or member functions, so there's a mix of conventions for what
| the freeing function should be called.
|
| C++ allows this as well for standard library containers,
| although I've rarely seen it used.
|
| > Also, say a DLL function returns a char pointer containing a
| string. How would you know whether to call free or delete on
| it? Or, maybe, the equivalent of free in Frob, the language
| that DLL happens to be written in?
|
| I have to concede this one. I can't see a way out of this other
| than documentation.
|
| [1]: https://ziglang.org/
| thechao wrote:
| If `push_back()` & friends took an (optional) extra allocator
| parameter, it'd be pretty ideal. It'd be nicer if the
| implementations were forced to be single-word containers like
| Stepanov wanted...
| messe wrote:
| std::vector can take an allocator as a template parameter
| though? For a list, sure I can see that having a separate
| allocator per element could work, but for a vector, surely
| you'd want the same allocator for the entire range?
|
| EDIT: assuming we're talking about C++, if not please don't
| hesitate to correct me.
| thechao wrote:
| Stateful allocators require a word be dedicated to the
| allocator in the header. In my use cases, I _always_ have
| access to the allocator, but I need to create a _lot_ of
| containers -- most staying empty, to boot! Paying the
| extra overhead is morally objectionable.
| nine_k wrote:
| You provide a memory allocation interface either way: it may
| be a special function per type, it may be a generic
| allocator.
| ErikCorry wrote:
| Languages with GC generally have much cleaner and simpler APIs.
| chrsig wrote:
| Usually. Even in languages with a GC, you get into a Bring Your
| Own Buffer (BYOB) situation when trying to eke out performance.
|
| As in real life, one of the best ways to cut down on waste (gc
| load) is to recycle.
| Lammy wrote:
| Ruby 3.1 provides a built-in middle ground for this use case:
| https://docs.ruby-lang.org/en/master/IO/Buffer.html
| ErikCorry wrote:
| I'm super-reluctant to go this way because it's often a bug
| factory. Once you are expecting the programmer to do manual
| GC and detect when a buffer can be reused you are losing some
| of the benefit. A lot of the time the answer should be a better
| GC and a smarter compiler. I realize that in practice that's
| not always available.
| chrsig wrote:
| I want to say that it should only be used after profiling
| and determining that it's a major cause of GC pressure,
| however I think there are other times when it's obvious
| that a private scratch buffer would be appropriate.
|
| For example, when marshaling an object before writing it to
| a file, it makes sense to write it to a scratch buffer before
| writing it to the file. It's generally on the order of
| trivial to keep said buffer encapsulated to prevent any
| caller from dealing with potential pitfalls.
|
| Having clear ownership of the buffer is a big benefit to
| help reduce any potential issues. You're correct that as a
| GC approaches perfect, there ceases to be a need for it.
___________________________________________________________________
(page generated 2022-08-07 23:00 UTC)