[HN Gopher] Memory Allocation
___________________________________________________________________
Memory Allocation
Author : thecoppinger
Score : 927 points
Date : 2023-05-22 09:26 UTC (13 hours ago)
(HTM) web link (samwho.dev)
(TXT) w3m dump (samwho.dev)
| kayo_20211030 wrote:
| Thanks. Well written and presented.
| junon wrote:
| Seems to be a bug on the first interactive graph, at least for
| me. Unless I'm misunderstanding the point of the graph,
| `malloc(7)` only allocates 2 bytes.
| ozfive wrote:
| I came here to see if anyone else noticed this and am
| confirming that there is a bug in the first slider on
| malloc(7). Indeed it only allocates two bytes instead of seven.
| samwho wrote:
| Good spot! Thank you. Fix on its way out now. :)
| ozfive wrote:
| Thank you for building this! It's helped me a lot to wrap my
| head around the concept even more deeply than before.
| carlmr wrote:
| True, it might be cut off by the screen?
| samwho wrote:
| Nah, I just failed at basic arithmetic :D
|
| I wrote this:
|     <div class="memory" bytes="32">
|       <malloc size="4" addr="0x0"></malloc>
|       <malloc size="5" addr="0x4"></malloc>
|       <malloc size="6" addr="0x9"></malloc>
|       <malloc size="7" addr="0xa"></malloc>
|       <free addr="0x0"></free>
|       <free addr="0x4"></free>
|       <free addr="0x9"></free>
|       <free addr="0xa"></free>
|     </div>
|
| Instead of this:
|     <div class="memory" bytes="32">
|       <malloc size="4" addr="0x0"></malloc>
|       <malloc size="5" addr="0x4"></malloc>
|       <malloc size="6" addr="0x9"></malloc>
|       <malloc size="7" addr="0xf"></malloc>
|       <free addr="0x0"></free>
|       <free addr="0x4"></free>
|       <free addr="0x9"></free>
|       <free addr="0xf"></free>
|     </div>
| Mizoguchi wrote:
| Love this. It is so important to know these fundamental
| concepts. I would like to see a similar version for SQL database
| indexes as 99% of the engineers I work with have no idea how to
| use them.
| Dudester230518 wrote:
| It's kind of not important though, except for a tiny group of
| developers.
| asnyder wrote:
| If more took the time to EXPLAIN ANALYZE they'd see indexes in
| action and hopefully learn from a few variations, especially
| pre/post variations of indexes. Or so I'd hope :).
| shepherdjerred wrote:
| Oh, if only I had this in college! This is a fantastic
| explanation.
| xpuente wrote:
| [1] is not very far from that. IMHO, [1] is better.
|
| [1] https://pages.cs.wisc.edu/~remzi/OSTEP/vm-freespace.pdf
| cromulent wrote:
| Nice article, but highly editorialized title, not sure what is
| intuitive about it.
| kaba0 wrote:
| Somewhat relevant: how GCs work, also with very good
| visuals: https://spin.atomicobject.com/2014/09/03/visualizing-
| garbage...
| photochemsyn wrote:
| Nice article on general-purpose memory allocation approaches.
| While the article doesn't explicitly discuss it, these all seem
| to be list-allocation mechanisms?
|
| > "List allocators used by malloc() and free() are, by necessity,
| general purpose allocators that aren't optimized for any
| particular memory allocation pattern... To understand, let's
| examine commonly used list allocation algorithms: first-fit,
| next-fit, best-fit and quick-fit."
|
| That's from an article on custom memory management (aimed at
| embedded programming issues) I've found pretty useful, and it
| picks up where this leaves off:
|
| https://www.embedded-software-engineering.de/dynamic-memory-...
|
| You can practice writing custom memory managers in an application
| that runs on an operating system by only using the stack (just
| create a big array of int etc. and call that your memory space):
|
| > "For the safety-critical tasks, the developer can replace the
| standard allocator with a custom allocator that sets aside a
| buffer for the exclusive use of that task, and satisfies all
| memory allocation requests out of that buffer... The remainder of
| this paper presents four examples of custom allocator algorithms:
| the block allocator, stack allocator, bitmap allocator and
| thread-local allocator. Custom memory managers can use any one of
| these algorithms or a combination of different algorithms."
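|
| If anyone wants to try that, a first-fit version over a static
| buffer looks roughly like this. It's a toy sketch with invented
| names, and it skips the splitting, coalescing and alignment work
| a real allocator needs:
|
|     #include <stddef.h>
|     #include <stdint.h>
|
|     /* First-fit allocation out of a fixed buffer. Each free block
|      * starts with a header holding its size and a pointer to the
|      * next free block. */
|     typedef struct block { size_t size; struct block *next; } block_t;
|
|     static uint8_t pool[4096];
|     static block_t *free_list;
|
|     static void pool_init(void) {
|         free_list = (block_t *)pool;
|         free_list->size = sizeof(pool) - sizeof(block_t);
|         free_list->next = NULL;
|     }
|
|     static void *pool_alloc(size_t n) {
|         block_t **prev = &free_list;
|         for (block_t *b = free_list; b; prev = &b->next, b = b->next) {
|             if (b->size >= n) {      /* first block that fits wins */
|                 *prev = b->next;     /* unlink it from the free list */
|                 return (uint8_t *)b + sizeof(block_t);
|             }
|         }
|         return NULL;                 /* pool exhausted */
|     }
|
|     static void pool_free(void *p) {
|         block_t *b = (block_t *)((uint8_t *)p - sizeof(block_t));
|         b->next = free_list;         /* push back; no coalescing here */
|         free_list = b;
|     }
|
| Call pool_init() once, then pool_alloc/pool_free stand in for
| malloc/free within that 4 KiB region.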
| ModernMech wrote:
| This is wonderful! I'm definitely going to be sharing this with
| my students (college sophomores studying CS).
|
| If I were to make some suggestions, based on how I know they
| would receive this:
|
| - I would make explicit reference to heap and stack. Students who
| are learning this material are learning about the heap/stack
| dichotomy, and I think it would really improve the exposition to
| make clear that not all memory is allocated this way in a
| program.
|
| - I would remove this line: "At the end of this post, you should
| know everything you need to know to write your own allocator." I
| can confidently say that my students will not be able to write an
| allocator after reading this. It's nothing to do with your piece,
| it's just the intersection of people who don't understand hex and
| who could build an allocator after a short blog post is very very
| small. Students who read this post and at the end still feel like
| they can't build an allocator will end up discouraged, with a
| feeling that maybe they are missing something.
|
| - Consider rethinking how you handle hex numbers throughout. You
| introduce them and say they are distinguished by a preceding
| "0x", but then you immediately omit the 0x to save space in the
| figure. In my experience, students have _a lot_ of trouble with
| understanding that hex and dec have the same underlying
| representation. They will not be able to distinguish between hex
| and dec bases implicitly, so from a pedagogical standpoint, it's
| better to be consistent throughout and include the prefix.
|
| - Finally at the end I would make mention that there are other
| strategies for memory management to encourage further
| exploration. Other languages do it differently, and for students
| it's important to know which other avenues are out there.
| Otherwise they might think heap-based malloc/free are the only
| way to do things, the same way they often think imperative
| programming is the only way to program.
|
| Anyway, thank you for creating this, and I'm sure it will really
| help my students. In a just world, "seeing" the memory like this
| would ideally be first-class tooling for languages like C.
| samwho wrote:
| Really appreciate you taking the time to write this, thank you.
|
| I tried a couple of different ways to introduce the stack and
| the heap but it always felt like it made the post too long and
| complicated. In the end I decided to take a pure, idealistic
| view of memory in order to focus on the algorithms used to pack
| allocations effectively. You can see some of my abandoned
| efforts as HTML comments in the post :D
|
| Introducing the 0x prefix and immediately dropping it hurt me
| as well, but I didn't have a better way to make the
| visualisation work on mobile. I completely agree with you that
| it's not ideal.
|
| I'd like to do a post of this style about garbage collection at
| some point.
| wrycoder wrote:
| Please do, this excellent post is already a good start on the
| issues involved in creating compacting garbage collectors.
| ModernMech wrote:
| Maybe make this a series? I think another post just like this
| on the stack is in order. You could show allocating stack
| frames with the slider! Then you can put them both together
| in a third post and show the entire memory layout of a C
| program.
|
| Would definitely like to see more thoughts from those cute
| corgis.
| samwho wrote:
| A few people suggested a series, but there's a human
| problem: my attention jumps between topics quite a lot! At
| the end of this post, and my load balancing post, my main
| feeling was relief in that I could start looking into a new
| topic.
|
| Maybe after more time has gone by I can revisit earlier
| posts and do follow-ups :)
| spatter wrote:
| > _Others will return what's called a "null pointer", a special
| pointer that will crash your program if you try to read or write
| the memory it points to._
|
| This is not strictly true, it depends on the environment you're
| using. Some older operating systems and some modern embedded
| systems have memory mapped at the zero address.
| devit wrote:
| The article claims that an allocator that splits memory based on
| allocation size is called a "buddy allocator". That's misleading:
| an allocator that allocates an area for each size class is
| usually called a "slab allocator", while a "buddy allocator" is
| one that when needed subdivides a memory area with a power of two
| size into two half-sized areas that are "buddies", does so
| recursively to satisfy allocations, and coalesces them again when
| they are free.
|
| E.g. the Linux kernel used (not sure if it's still like this) a
| buddy allocator to allocate pages and power-of-two blocks of
| pages and slab allocators to subdivide those pages and allocate
| data structures.
|
| Another thing that the article doesn't mention that is important
| is that most production allocators make use of thread-local
| storage and either have per-thread caches of free blocks or
| sometimes whole per-thread memory regions. This is to reduce lock
| contention and provide memory that is more likely to be in the
| current core's cache.
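|
| To make the buddy part concrete, here's a toy sketch of the
| split/coalesce mechanics (names and sizes invented; a real
| allocator records each block's order in a header instead of
| making the caller pass it, and the kernel's version is far more
| involved):
|
|     #include <stddef.h>
|     #include <stdint.h>
|
|     /* Toy buddy allocator over a 1 KiB arena, minimum block 32
|      * bytes. The trick: a block at offset `off` of size 2^order
|      * has its buddy at offset `off ^ (1 << order)`, so splitting
|      * and coalescing are just XORs. One free list per order. */
|     #define MIN_ORDER 5
|     #define MAX_ORDER 10
|
|     typedef struct node { struct node *next; } node_t;
|
|     static uint8_t arena[1 << MAX_ORDER];
|     static node_t *free_lists[MAX_ORDER + 1];
|
|     static void buddy_init(void) {
|         free_lists[MAX_ORDER] = (node_t *)arena;
|         free_lists[MAX_ORDER]->next = NULL;
|     }
|
|     /* Allocate a block of 2^order bytes (MIN_ORDER <= order). */
|     static void *buddy_alloc(unsigned order) {
|         unsigned o = order;
|         while (o <= MAX_ORDER && !free_lists[o]) o++;
|         if (o > MAX_ORDER) return NULL;
|         node_t *blk = free_lists[o];
|         free_lists[o] = blk->next;
|         while (o > order) {          /* split down to the size we need */
|             o--;
|             node_t *buddy = (node_t *)((uint8_t *)blk + (1u << o));
|             buddy->next = free_lists[o];
|             free_lists[o] = buddy;
|         }
|         return blk;
|     }
|
|     static void buddy_free(void *p, unsigned order) {
|         uintptr_t off = (uint8_t *)p - arena;
|         while (order < MAX_ORDER) {
|             uintptr_t buddy_off = off ^ (1u << order);
|             node_t **cur = &free_lists[order];
|             while (*cur && (uint8_t *)*cur != arena + buddy_off)
|                 cur = &(*cur)->next;
|             if (!*cur) break;        /* buddy not free: stop merging */
|             *cur = (*cur)->next;     /* unlink buddy and merge */
|             off &= ~(uintptr_t)(1u << order);
|             order++;
|         }
|         node_t *blk = (node_t *)(arena + off);
|         blk->next = free_lists[order];
|         free_lists[order] = blk;
|     }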
| sylware wrote:
| linux had a bitmap-based "buddy allocator" (power of two); now
| it is not bitmap based anymore (the complexity wasn't worth it
| anymore, performance-wise, so simplicity was restored).
|
| Then linux has various slabs(slub/slob/slab), built on top of
| the "buddy allocator".
|
| Userlevel code should use non-virtual-address-stable mmap-ed
| regions (slabs + offsets). Legacy "libc" services were built as
| virtual address stable services... which are kind of expensive
| to manage on a modern paginated system. Virtual address stable
| regions should be kept to a minimum (that horrible ELF static
| TLS). There is a workaround though (but linux overcommit
| default policy could kill your process): a user process would
| query the amount of ram on the system and mmap a region of
| roughly (care of the overcommit policy) the same size, only
| once per process life-time. Then you could have a virtual
| address stable region which could use most if not all the
| available ram (excluding hot-memory addition...). Should be
| very easy to manage with lists.
| samwho wrote:
| You're absolutely right, I've corrected this. Thank you!
|
| I had originally written about threading and locality but it
| made the post too long and complicated, so I cut it out for the
| final draft. You can see remnants of it if you check the HTML
| comments in the post :D
| matheusmoreira wrote:
| So good! I wish I had an article like this to learn from when I
| implemented my language's memory allocator a few months ago!
| Wonderful, thank you!
| jamesgill wrote:
| This is incredibly well done. I've never seen malloc/memory
| allocation explained so clearly. I'd buy a book written like
| this.
| samwho wrote:
| I have some bad news for you about books.
| imtany wrote:
| Could you elaborate?
| ftxbro wrote:
| > "There's no shortage of information about memory allocators on
| the Internet, and if you've read this far you should be well-
| placed to dive in to it. Join the discussion on Hacker News!
| https://news.ycombinator.com/item?id=36029087 "
|
| Interesting to use hacker news as the blog's own comment section
| in this way.
| samwho wrote:
| I've seen a few people doing it, seems to work well.
| SV_BubbleTime wrote:
| I am slightly embarrassed I didn't know overallocate was the term
| for that. TIL!
| cinntaile wrote:
| Can you highlight the specific malloc() calls in the interactive
| part? It confused me when it said malloc(5) because it looks
| almost exactly like malloc(4). Specifically highlighting the
| malloc(5) allocation would make that a bit clearer I think.
| samwho wrote:
| I completely agree with you on this, though I couldn't find a
| great way to do it. It was suggested to me to put an outline
| around the allocation or free that just happened, but I
| struggled to get the box to look good when the allocation was
| split across 2 lines.
|
| I've started writing my next post and have learnt a bit more
| about PixiJS/GSAP3, and think I know a way to do it that would
| work nicely but would require changing some of the underlying
| code. I can't promise I'll revisit this soon, but I would like
| to in future.
| cinntaile wrote:
| I understand. It's surprisingly difficult to do anything non-
| trivial on the web.
| sailorganymede wrote:
| This is a great article. I also recommend K&R Chapter 8 if people
| wanna know more. Do the exercises and it'll just click.
| shreyshnaccount wrote:
| Someone needs to make an awesome list for visual guides
| ocay wrote:
| Absolutely, this is so helpful
| tpoacher wrote:
| Great webpage, but it somehow left me more confused.
|
| It seems to imply that malloc/free works by boundary tag? Which I
| don't think is the case? (and if not, it leaves the reader
| confused as to how it then actually works).
|
| I know "some" languages use the tag technique (e.g. julia), but
| the article seems to imply that this also applies to the c code
| above? Or am I wrong and c also makes use of such a scheme when
| you use pointer arithmetic?? (I don't see how that would work
| with arrays if that's the case though)
| samwho wrote:
| I'm sorry you're more confused than you were when you started!
|
| The post is intending to talk about the topic of memory
| allocation in a general sense. The way that malloc works on any
| given system is implementation dependent, it's not possible to
| say "malloc works in this way" because Debian can differ from
| Arch can differ from BSD, etc.
|
| It's not my goal to tell you exactly how modern malloc/free
| implementations work. I could probably be more clear about this
| at the start of the post.
| davidgrenier wrote:
| The only thing that confused me is how it said we can know the
| location of the block after and before by calculating:
|     address + <value at address>
|     address - <value at address-1>
|
| Shouldn't this be?
|     address + <value at address> + 3
|     address - <value at address-1> - 3
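|
| In C terms, the arithmetic I have in mind is below. It's a sketch
| with invented names, assuming a 3-byte header in front of each
| block that stores only the length of the data region (that's
| where the "+ 3" comes from; if the stored size already counts the
| header, the extra term goes away, which may be the source of the
| confusion):
|
|     #include <stddef.h>
|
|     enum { HDR = 3 };  /* header size in bytes */
|
|     /* Little-endian read of the size field at `p` (an assumption). */
|     static size_t read_size(const unsigned char *p) {
|         return (size_t)p[0] | ((size_t)p[1] << 8) | ((size_t)p[2] << 16);
|     }
|
|     /* The next header sits after this block's header *and* the data
|      * it describes. */
|     static unsigned char *next_block(unsigned char *p) {
|         return p + HDR + read_size(p);
|     }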
| samwho wrote:
| Well shit. I think you're right.
| davidgrenier wrote:
| Well otherwise, I learned a lot and the basics are much
| simpler than I expected, thank you for the article.
| davidgrenier wrote:
| Oh another thing, I'm not a fan of the premise:
|
| "As a general-purpose memory allocator, though, we can't get
| away with having no free implementation."
|
| I have a belief that the future of software is short-lived
| programs that never free memory. Programs allocate and
| terminate. Short-lived programs communicate with each other
| via blocking CSP-style channels (see Reppy's Concurrent
| Programming in ML).
|
| If you could also educate me on why this is a bad idea, I
| would appreciate it.
| colanderman wrote:
| Even if the programs don't free memory, something has to
| allocate and free memory for the programs and channels.
| e_y_ wrote:
| This would probably be something closer to actors
| (https://en.wikipedia.org/wiki/Actor_model) rather than
| programs since programs are traditionally implemented as OS
| processes which are relatively expensive to spin up and
| terminate. At some level, though, _somebody_ has to deal
| with freeing the memory, and they may do it less
| efficiently than you can.
| junon wrote:
| I agree with your point but disagree with your reasoning. I
| think programs should always free memory at _some point_
| because then it's easier to reason about debugging memory
| leaks.
|
| Practically speaking though, there _are_ arena allocators
| that do exactly this - you allocate a bunch of memory at
| once, assign like-typed instances to "slots" in that
| memory region, and then deallocate everything all at once.
| Thus, the individual instance `free()` is a no-op.
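|
| A minimal version of that shape, just to illustrate (everything
| here is invented, not any particular library's API):
|
|     #include <stdlib.h>
|     #include <stddef.h>
|
|     /* One big malloc up front, a bump pointer for individual
|      * allocations, and a single free for the whole thing. */
|     typedef struct { unsigned char *base; size_t used, cap; } arena_t;
|
|     arena_t arena_new(size_t cap) {
|         arena_t a = { malloc(cap), 0, cap };
|         return a;
|     }
|
|     void *arena_alloc(arena_t *a, size_t n) {
|         n = (n + 7) & ~(size_t)7;       /* keep things 8-byte aligned */
|         if (!a->base || a->used + n > a->cap) return NULL;
|         void *p = a->base + a->used;
|         a->used += n;
|         return p;
|     }
|
|     /* The per-object "free" is a no-op; everything goes at once. */
|     void arena_destroy(arena_t *a) { free(a->base); a->base = NULL; }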
| saagarjha wrote:
| It's not general purpose, and lots of programs that were
| designed to be short-lived often end up not being so in the
| future. People used to point at compilers as a typical
| example of this kind of thing, well, now we have compilers
| as libraries sitting resident in every popular developer
| tool.
| vidarh wrote:
| My first large scale web application was a webmail service
| built in C++ (!) where I early on decided we'd ditch
| _nearly_ all freeing of memory, as it was running as a CGI,
| and it was much faster to just let the OS free memory on
| termination. The exception was for any particularly large
| buffers. Coupled with statically linking it, it reduced the
| overhead sufficiently that running it as a CGI performed
| well enough to save us the massive pain of guaranteeing
| sufficient isolation and ensuring we were free of memory
| leaks.
|
| _Especially_ in a request/reply style environment, long-
| running application servers are largely a workaround for
| high startup costs, and it's only a "bad idea" in the
| instances where removing that high startup cost is too
| difficult to be practical. Overall I love avoiding long
| running programs.
| planede wrote:
| My take on this is that code should always match up malloc
| and free, but your application may use an allocator where
| free is a no-op, if that's appropriate for the application you
| write. This way your code is more generic and can be reused
| in another application with different constraints.
|
| And as soon as you are replacing free, you can replace
| malloc as well to be optimized for your use case. No need
| to build difficult bookkeeping hierarchies when they will
| never get used.
| jandrese wrote:
| So basically garbage collection via just terminating and
| letting the OS handle it?
| samwho wrote:
| It's funny, I saw and retweeted this while writing this
| post:
| https://twitter.com/samwhoo/status/1650572915770036225?s=20
|
| Not sure the future you describe is where we'll end up,
| haven't given it a huge amount of thought. Would be
| interesting to see, though.
|
| Things like web servers could probably get away with doing
| some sort of arena allocation per request (I'd be surprised
| if some don't already do this).
| williamcotton wrote:
| Apache does this! And I do this in my own C web
| framework:
|
| https://github.com/williamcotton/express-c/blob/master/de
| ps/...
| dzaima wrote:
| With a simple enough allocator, freeing things could maybe
| be beneficial even for short-lived programs, purely
| from the memory being in cache already, instead of needing
| to ask the OS for more (incl. page faulting & zeroing,
| besides being limited by RAM throughput). For a buddy
| allocator without coalescing, a free() that just adds the
| argument to the corresponding freelist can be as simple as
| 5 or so x86-64 instructions (the fast path of the allocator
| being ~8 or so instructions; certainly more than a bump
| allocator, but not by much, and the reuse benefits can
| easily be pretty big).
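|
| In C terms it's roughly this (a sketch; how the size class is
| recovered from the pointer is hand-waved, real allocators usually
| derive it from the page or chunk the pointer lives in):
|
|     /* Free becomes "push onto the freelist for this size class". */
|     typedef struct node { struct node *next; } node_t;
|     static node_t *freelists[32];
|
|     static void tiny_free(void *p, unsigned size_class) {
|         node_t *n = p;            /* block must be >= sizeof(node_t) */
|         n->next = freelists[size_class];
|         freelists[size_class] = n;
|     }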
| ho_schi wrote:
| This also trapped me.
| robjwells wrote:
| Love the dog. A bit like the "Cool Bear" asides in Amos's
| articles (fasterthanli.me), a chance to pause & consolidate.
| samwho wrote:
| Don't tell Amos, but Cool Bear is the inspiration for Haskie,
| my little husky companion.
| bambax wrote:
| Excellent, excellent article! I have a question though.
|
| > _Couldn't we rearrange the memory to get a block of 6
| contiguous bytes? Some sort of defragmentation process?_
|
| > _Sadly not. Remember earlier we talked about how the return
| value of malloc is the address of a byte in memory? Moving
| allocations won't change the pointers we have already returned
| from malloc. We would change the value those pointers are pointed
| at, effectively breaking them. This is one of the downsides of
| the malloc/free API._
|
| But why not? Couldn't we store information about old pointers
| somewhere and match them with new addresses when defragmenting?
| Some kind of virtual memory driver that would map old pointers to
| new addresses transparently for the programs? Or would it be too
| much overhead for too little benefit?
| aidenn0 wrote:
| > But why not? Couldn't we store information about old pointers
| somewhere and match them with new addresses when defragmenting?
| Some kind of virtual memory driver that would map old pointers
| to new addresses transparently for the programs? Or would it be
| too much overhead for too little benefit?
|
| In languages where memory is managed for you, you can
| absolutely do this, since the runtime can find every single
| pointer to an object and rewrite it.
|
| Virtual memory _can_ let you do this, but would require a
| separate page for each allocation (since virtual memory
| operates on a page-level). Given that the smallest page on
| modern architectures is 4k, this would mean using 4k of ram for
| each allocation (and rounding up each allocation to a 4k page
| boundary).
|
| On top of that, it's an OS system call to map and unmap pages,
| which means you incur a system-call on every allocation and
| deallocation, which is much slower than using a user-space
| allocator.
| alex7734 wrote:
| To do that you either need a structure that you update every
| time a pointer is created, copied, moved or deleted (too much
| overhead), or you need a way to scan the entire memory and get
| all the pointers. And at the point where you have a piece of
| code that knows where every pointer is, you already know which
| pointers aren't used anywhere anymore so it's a waste to not
| have it also call free() for you.
|
| Once you have it call free() for you, your piece of code is now
| a compacting GC, like Java's for example.
| AshamedCaptain wrote:
| Most OSes today do that "transparently" with virtual memory,
| but usually with a coarse granularity (e.g. 4k). A page table
| is just a translation of "pointers" to "memory addresses"; the
| OS can rearrange physical memory as it sees fit without the
| program having to update its pointers.
|
| In OSes without virtual memory, one option is to do the same
| non-transparently: instead of returning pointers, malloc and
| friends work with "pointers to pointers" (called handles), so
| there is one extra level of indirection, and now the OS is free
| to rearrange this 2nd level as it sees fit. Whenever a program
| wants to read/write the data behind a handle, it must
| dereference the handle to get to the real pointer, but it also
| must let the OS know that it is currently using the real
| pointer -- this is to avoid the OS moving it around. This is
| usually called "locking/unlocking" the handle.
|
| Some classic examples are Windows 3.x, Mac OS (toolbox),
| PalmOS, etc.
|
| https://en.wikipedia.org/wiki/Classic_Mac_OS_memory_manageme...
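|
| The shape of it as a toy sketch (invented names, not the Toolbox
| or Win16 API):
|
|     #include <stdlib.h>
|     #include <stddef.h>
|
|     /* The allocator hands out a pointer-to-pointer ("handle") and
|      * may move the underlying storage whenever it isn't locked. */
|     typedef void **handle_t;
|     typedef struct { void *ptr; int locked; } slot_t;
|
|     static slot_t slots[256];        /* master pointer table */
|
|     handle_t toy_alloc(size_t n) {
|         for (size_t i = 0; i < 256; i++) {
|             if (!slots[i].ptr) {
|                 slots[i].ptr = malloc(n);
|                 if (!slots[i].ptr) return NULL;
|                 slots[i].locked = 0;
|                 return &slots[i].ptr;  /* caller keeps this, not *this */
|             }
|         }
|         return NULL;
|     }
|
|     /* Relies on `ptr` being the first member of slot_t. */
|     void *toy_lock(handle_t h)   { ((slot_t *)h)->locked = 1; return *h; }
|     void  toy_unlock(handle_t h) { ((slot_t *)h)->locked = 0; }
|
| A compactor is then free to set slots[i].ptr to a new location for
| any slot with locked == 0, and no caller notices.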
| cconstantine wrote:
| In a language like C that isn't really possible because the
| language can't keep track of all of the places that memory
| address is stored.
|
| If malloc were to return something like an address that holds
| the address of memory allocated there is nothing preventing the
| program from reading that address, doing math on it, and
| storing it somewhere else.
| kaba0 wrote:
| Well, that's what some GCs do, and they do indeed defragment
| the heap.
| umanwizard wrote:
| > Some kind of virtual memory driver that would map old
| pointers to new adresses transparently for the programs
|
| You would need hardware support for this, since the hardware is
| what decides what gets returned when a program attempts to read
| from a memory location.
|
| Hardware already does support virtual memory but the
| granularity is the page (which are a minimum of 4KiB in most
| OSs).
| samwho wrote:
| You'd need cooperation between your malloc implementation and
| the application code. It's possible, but tricky. If your malloc
| returned a pointer to a pointer, and promised to keep the first
| level pointer up to date, and was able to lock your application
| whenever it moved things around, it could work.
|
| Someone else already mentioned, but garbage collected languages
| do actually do this. Because they're fully in control of memory
| (the language exposes no raw pointers), they're able to create
| the layer of indirection you've suggested and move things
| around as they please. I know at minimum the JVM does this. The
| term to search for is "heap compaction."
|
| There's also the weird and wonderful work of Emery Berger et al
| with their "Mesh" malloc implementation, which blows my mind:
| https://youtu.be/XRAP3lBivYM.
| eschneider wrote:
| Early MacOS did this with handles (basically pointers to
| pointers.) You'd lock them to read/write and when they were
| unlocked, the OS was free to move the underlying memory and
| change the pointer in the handle.
| markus_zhang wrote:
| Interesting. Is there a book that focuses on the evolution of
| allocators so we can follow along and code allocators of
| different difficulties?
| samwho wrote:
| Good question! I'm not aware of one, but I also haven't looked.
|
| Most of the research for this post was done by reading papers
| for various malloc implementations (phkmalloc, dlmalloc,
| tcmalloc, mimalloc) and reading their source code.
| dexter89_kp3 wrote:
| this is awesome
| fmiras wrote:
| This is an example of why interactive blog posts are awesome,
| loved the illustrative sliders
| winrid wrote:
| I was having some fun recently just seeing how fast we can
| allocate large chunks of memory.
|
| There's something refreshing about firing up a C (or maybe now
| Zig, in my case) compiler and allocating a gigabyte of memory,
| and seeing that your process is using exactly 1GB.
| thecoppinger wrote:
| All credit goes to my wonderful friend Sam Who:
| https://twitter.com/samwhoo
|
| He recently published a similar guide on load balancing that is
| equally intuitive and insightful: https://samwho.dev/load-
| balancing/
|
| I can't wait to see what he puts out next!
| terrycody wrote:
| https://news.ycombinator.com/item?id=35588797
|
| We never miss good things lol!
| fzeindl wrote:
| Ahh yes. Back at university we had a class called efficient
| programming where we had to implement a Unix utility and optimize
| it for speed, meaning cpu cycles.
|
| While aggressively optimizing we replaced malloc in the end with
| a function we called "cooloc", that traded portability and
| security for speed. The debug tool here would have been useful.
| thangalin wrote:
| When writing C, I tend to avoid calling malloc and free directly.
|
| * https://github.com/DaveJarvis/mandelbrot/blob/master/memory....
|
| I then apply this same principle of "opening" and "closing"
| structures throughout the application. Generally, I can quickly
| verify that the calls are balanced:
|
| * https://github.com/DaveJarvis/mandelbrot/blob/master/threads...
|
| What's nice about this pattern is that the underlying
| implementation of exactly how the memory is maintained for a
| particular data structure (e.g., whether malloc or
| gdImageCreateTrueColor is called) becomes an implementation
| detail:
|
| * https://github.com/DaveJarvis/mandelbrot/blob/master/image.c
|
| The main application opens then closes structures in the reverse
| order:
|
| * https://github.com/DaveJarvis/mandelbrot/blob/master/main.c
|
| This means there's only one call to malloc and one call to free
| throughout the entire application (third-party libraries
| notwithstanding), allowing them to be swapped out with ease.
|
| Aside, logging can follow this same idea by restricting where any
| text is written to the console to a single location in the code
| base:
|
| * https://github.com/DaveJarvis/mandelbrot/blob/master/logging...
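|
| In outline the pattern looks like this (a sketch with made-up
| names, not the exact code in those links):
|
|     #include <stdlib.h>
|
|     /* memory.c -- the only place that touches the allocator. */
|     void *memory_open(size_t bytes) { return malloc(bytes); }
|     void  memory_close(void *p)     { free(p); }
|
|     /* image.c -- one of many subsystems built on top of it. */
|     typedef struct { int width, height; unsigned char *pixels; } image_t;
|
|     image_t *image_open(int w, int h) {
|         image_t *img = memory_open(sizeof *img);
|         if (img) {
|             img->width  = w;
|             img->height = h;
|             img->pixels = memory_open((size_t)w * h);
|         }
|         return img;
|     }
|
|     void image_close(image_t *img) {
|         if (img) {
|             memory_close(img->pixels);
|             memory_close(img);
|         }
|     }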
| FrankyHollywood wrote:
| This is a nice read too (the whole blog actually)
|
| http://igoro.com/archive/volatile-keyword-in-c-memory-model-...
|
| It's a bit old by now (2010), but I always remembered the mental
| model Igor created.
| bjt2n3904 wrote:
| The asides with the dog are great. I've seen a few blogs use this
| technique -- I'm not big on Greek, but I think Plato used this
| technique too.
|
| It instills the idea that you should be asking a question at this
| point. Some information has been provided that should generate
| more questions in your head, if you're keeping pace.
| samsquire wrote:
| Thank you for this, this is helpful.
|
| I wrote a JIT compiler and I didn't bother calling free much, I
| just let the operating system free up all allocated memory.
|
| I got into this situation often:
|     return_struct = do_something(mystruct);
|     return_struct->inner_struct = malloc(sizeof(struct my_inner_struct));
|
| Now, who owns inner_struct? Who is responsible for freeing it? Do
| I free it when I assign to it?
|
| I feel this ownership complicates cross-language FFI API calls,
| because who is responsible for freeing structures depends on the
| application and the platform you're running under. For example,
| Rust code being called from Erlang. You have to be extra careful
| when memory is managed by a different language runtime.
|
| I feel I am at the beginning of intuitively understanding what
| memory really is: memory is just a huge contiguous region of
| numbered locations. Bump allocators allocate in one direction and
| free all at once. Arena allocators allocate to a preallocated
| region, I think.
|
| Memory is a logistical problem of how you arrange and allocate
| finite resources.
|
| I am thinking of alternative visualizations of understanding
| memory, for example, I started writing an animation of binary
| search:
|
| https://replit.com/@Chronological/ProgrammingRTS
|
| The idea is that you see values and memory locations move around
| with the final goal of being able to command memory to move around
| and be computed, like in a real-time strategy game.
|
| I think if we could visualise memory as cars on a road, we would
| see obvious traffic jams.
| duped wrote:
| > You have to be extra careful when memory is managed by a
| different language runtime.
|
| While it would be _nice_ to have next to no overhead for FFI,
| it's not always tractable. That's why you have to serialize
| across boundaries, the same as if you're serializing across
| processes or the network. At least in a single virtual memory
| space you can have a caller allocate a buffer and the callee
| fill it, with the caller being responsible for deserializing
| and freeing later. That gets you pretty far, and is relatively
| safe.
|
| The alternative is to be callee managed, and for the callee to
| return things by handle and not necessarily by pointer, but
| that is also fraught.
| dahart wrote:
| > I feel I am at the beginning of intuitively understanding
| what memory really is: memory is just a huge contiguous region
| of numbered locations.
|
| There might be an analogy here that could help you reason about
| your nested structure allocations...
|
| Memory is an array of bytes owned by the OS. While there are
| all kinds of implementation details about addressing and
| storage and performance and paging and virtual memory, it's
| really just an array. The OS gives you a way to reserve pieces
| of the array for your own use, and you're responsible for
| giving them back if you want to play nice and/or run for a long
| time, otherwise (as a safety net) the OS will take them back as
| soon as you exit.
|
| This is, in a sense, very similar to the question you posed. An
| outer routine owns the outer structure, and an inner routine
| allocates some inner structure. The simplest, most intuitive,
| and generally best advice is that whoever allocates is also
| responsible for freeing memory. In other words, one way to
| _define_ ownership of memory is by who allocates it. Implicitly
| and automatically the responsibility to free that memory
| belongs to owner that allocated it. It's okay to explicitly
| transfer ownership, but can easily get complicated and
| unintuitive. You can also consider letting the parent free your
| struct to be similar to not calling free() in your JIT compiler
| - it's a 'lazy' optimization to have the parent clean up - and
| I don't mean that in a judgemental sense, I mean it's valid to
| let the parent handle it, if you know that it will, and this
| can be done without getting confused about who owns the memory
| and who was actually responsible for freeing it. Note that when
| you leave the parent to clean it up, you are foregoing the
| ability to re-use the memory - this is true in your JIT
| compiler and it's true for malloc() and free() as well. If you
| let the OS handle it, you're in effect declaring that you
| believe you do not need to recycle the memory allocated in your
| program during its execution. (This might be true, and it
| might stay that way, but it's always worth asking if it will
| remain true, since lots of people have been eventually bitten
| and had to retroactively refactor for memory management when
| their requirements change.)
| samwho wrote:
| Yeah, I hear you. I've not done a lot of FFI stuff directly, it
| scares me.
|
| Arena allocators are cool, the idea is you allocate a large-ish
| region of memory and sub-allocate into it (often with a fast,
| simple allocator like a bump allocator) and then free the
| large-ish block when you're done. It's a way to take knowing
| how much memory you need as a whole and optimise that to a
| single call to malloc/free.
|
| You may enjoy looking through https://www.cs.usfca.edu/~galles/
| visualization/Algorithms.ht....
| samsquire wrote:
| Thanks for the link to the animations.
|
| I want an extremely performant deep copy solution, I've been
| thinking of using an allocator to implement it.
|
| If we have a tree data structure or a nested hashmap, then we
| want to copy it cheaply, there is copy on write. But most
| copies of hashmaps are slow because they instantiate every
| child object in a recursive loop.
|
| So I want to be able to memcpy a complicated data structure
| for cheap copies.
| secondcoming wrote:
| > Now, who owns inner_struct?
|
| return_struct does since it is the only thing that knows the
| address.
|
| > Who is responsible for freeing it?
|
| return_struct is, unless you hand that responsibility over to
| something else.
|
| > Do I free it when I assign to it?
|
| Yes, unless you want leaks.
|
| > I think if we could visualise memory as cars on a road, we
| would see obvious traffic jams.
|
| That visualisation is helpful for threads, where the program is
| the road/map and the cars are the threads. I don't see how it's
| useful for memory.
| terrycody wrote:
| Glad there are always talented people and great guides like
| him/this! Great people and guides will help countless learners
| along the way. Timeless, priceless.
| jxf wrote:
| A great example of why I'll always read a post over watching a
| video.
| a257 wrote:
| I think it would be helpful if it discussed internal
| fragmentation, where the payload is smaller than the allocated
| block. I observed that this was important in understanding the
| purpose of various memory allocation algorithms when undertaking
| the malloc lab in college.
| patleeman wrote:
| I love this so much, thank you for putting this together!
|
| My only piece of feedback would be for the "Inline Bookkeeping"
| section (https://samwho.dev/memory-allocation/#inline-
| bookkeeping), it took a while for me to grok the numbered list to
| figure out which block corresponded to address + X. I wonder if
| there is a better way to visualize the 4 numbered bullet points?
| Maybe just arrows and text pointing to the visualization?
|
| Thanks again for this wonderful article!
| samwho wrote:
| Yes, this is one of the things I look back on as a weak point
| in the writing. I wanted to make this a lot more clear but by
| the time I'd gotten to this point in the article, I'd sort of
| coded myself into a corner with the visualisation code.
|
| In another universe I'd hook in to the page scroll and
| highlight each block being referred to as I talk about it in
| the text. I'm probably not explaining that well here, but
| imagine something like the way this article works:
| https://pudding.cool/2017/05/song-repetition/.
| wizzwizz4 wrote:
| Couldn't you highlight each block being referred to as you
| mouse over / click on relevant text? (The latter might not be
| _terribly_ hard to do with <a> and CSS :target, if you can
| give the blocks ids.)
| samwho wrote:
| The blocks aren't HTML elements, they're drawn on to a
| canvas. It _could_ be possible regardless, though. I may
| revisit this. Thanks for the idea! :)
| jesselawson wrote:
| Sam, this is such a wonderful resource that you've put out into
| the world. Thank you for the time and care you've put into each
| paragraph and interactive component. You've not only given me a
| lot to think about in terms of my basic assumptions about memory,
| but also about how to write and teach better online. I'm really
| excited to start following you!
| alberth wrote:
| While I do like the article, I wish it simply used multiple
| images and/or animated gifs (instead of javascript) for the
| pictorials.
|
| It would make the site much more accessible and clear in the
| event you didn't realize you had to click forward.
| samwho wrote:
| The accessibility of this technique is something that greatly
| worries me. I try and be quite thoughtful about things like
| colour palette. I'm not sure how to achieve the level of
| interactivity I want while still being accessible to screen
| readers, without writing two completely separate articles with
| and without JavaScript.
|
| Definitely open to ideas.
| celeritascelery wrote:
| This is absolute gold. When I use things like this, I am reminded
| how powerful digital learning can be. So much more capable than
| just text or video. But very little content is this well put
| together. Probably because it is so time intensive.
| dmd wrote:
| If you're not yet familiar with it -
| https://ciechanow.ski/archives/ is for you (and everyone!)
| samwho wrote:
| The master of this artform.
| throwaway689236 wrote:
| Agreed, I wish I had more things like this growing up.
| samwho wrote:
| This feedback has made my day. Thank you.
|
| I'm inspired by Bartosz Ciechanowski and Julia Evans. The web
| is such a powerful toolbox. So many concepts are more complex
| than text alone can hope to explain. Those two are so creative
| and full of energy.
|
| And you're right, it's incredibly time intensive to put these
| together. Thousands of lines of code, plus the text content,
| plus reaching out to domain experts for reviews (shout out
| Chris Down, kernel wizard extraordinaire).
| samwho wrote:
| Also shout out Anton Verinov (https://anton.codes/): the only
| reason this web page doesn't drain your battery before you
| get to the end of it.
| rozularen wrote:
| why is that, if you don't mind me asking?
| couchand wrote:
| One more name I'd add to this list is Mike Bostock. The care
| and attention he gives to data visualization examples comes
| through in the finished product.
|
| Communicating complex subjects through interactive visual
| displays is very effective -- it's worth the effort. Thank
| you!
| naillo wrote:
| Great job Sam
| ww520 wrote:
| The writing is very clear and the concepts are explained
| well. Not too much information and not too little. Excellent
| post.
| samwho wrote:
| Deja vu. :D
| spenczar5 wrote:
| Such great work. I learned things and have a clearer
| understanding that I think I will come back to in the future.
| And I say that as someone who has programmed for 15 years! I
| think your effort paid off, and the inspiration shows
| through.
| alex7734 wrote:
| > When we free memory, we should make sure that if the block we
| return to the free list is next to any other free blocks, we
| combine them together. This is called "coalescing."
|
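| (For anyone wanting the quoted bit in code form, the usual shape
| with an address-ordered free list is roughly the following; a
| sketch assuming all blocks live in one contiguous pool, and not
| what Delphi actually did.)
|
|     #include <stddef.h>
|
|     typedef struct blk { size_t size; struct blk *next; } blk_t;
|     static blk_t *free_list;   /* kept sorted by address */
|
|     static void coalescing_free(blk_t *b) {
|         blk_t *prev = NULL, *cur = free_list;
|         while (cur && cur < b) { prev = cur; cur = cur->next; }
|         b->next = cur;
|         if (prev) prev->next = b; else free_list = b;
|         /* merge forward if b touches cur */
|         if (cur && (char *)b + sizeof(blk_t) + b->size == (char *)cur) {
|             b->size += sizeof(blk_t) + cur->size;
|             b->next = cur->next;
|         }
|         /* merge backward if prev touches b */
|         if (prev && (char *)prev + sizeof(blk_t) + prev->size == (char *)b) {
|             prev->size += sizeof(blk_t) + b->size;
|             prev->next = b->next;
|         }
|     }
|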
| A little offtopic but the default Delphi 7 memory allocator did
| this, except that it also merged blocks that it obtained from
| different OS allocation calls.
|
| This worked fine for regular usage, but if that memory was ever
| used for Bitmaps for UI stuff, it wouldn't work: Since Windows
| does some of its UI stuff in kernel mode, before doing anything
| Windows would attempt to lock the entire allocation's VAD entry
| to prevent you from messing with it in another thread while it
| was using it. If the Bitmap you were working on happened to
| belong to two different OS-level allocations, this lock function
| would fail since it wasn't meant to handle that case.
|
| This would lead to random, extremely cryptic errors such as
| ERROR_NO_SCROLLBARS "The window does not have scroll bars." since
| the lock call was deep in the stack and the callers replaced the
| error with another one as it bubbled up.
|
| In the end we had to replace the allocator to avoid that issue.
| That was a fun day I spent debugging that.
| ilyt wrote:
| I think the far weirder part of this was the kernel-side
| handling of scrollbars
| alex7734 wrote:
| If I recall correctly the kernel part of things would return
| an out of memory error which the user mode DLL translated to
| that weird error (sometimes, other times it would just say
| "out of system resources", it depended on what widget the
| bitmap that overlapped the two memory regions belonged to).
|
| Here's a 2003 forum post from someone else having the same
| problem: http://www.delphigroups.info/2/1/749064.html
| derefr wrote:
| Until Windows 95, Windows was essentially just a DOS
| application that grabbed the framebuffer and ran an event
| loop where it drew "controls" (which includes windows,
| buttons, text views, and yes, scrollbars.) That was the whole
| point of it. It wasn't an "OS" per se; DOS was the OS.
| Windows was what a Linux-head would think of as a combination
| of an X server and window manager. And Windows loaded your
| "application" as essentially a DLL, with the Windows global
| event loop calling into your application's event-loop
| delegate handler (WndProc) whenever it has an interesting
| event that your application might like to react to.
|
| (Your application wasn't even a "process" per se. Until
| Windows 95, everything was just happening in one shared
| address space, in real mode. In fact, it was only in Windows
| 3.1 where user applications stopped running in ring 0!)
|
| If you think about it, this "the kernel is a game engine and
| your application is the game" approach isn't necessarily a
| _bad_ design... for a single-tasking OS's library exokernel,
| like the Wii's https://wiibrew.org/wiki/IOS.
|
| But, of course, Windows claimed to be a multitasking OS. But
| it actually wasn't! And I don't mean the obvious thing about
| it not having pre-emption. Lots of multitasking OSes didn't
| have pre-emption.
|
| No, what I mean is that the concurrency primitive for the
| cooperative scheduling wasn't the "task" (i.e. the process or
| thread. Which, again, there weren't any of.) Instead, the
| concurrency primitive was the _window_. Until Windows 95,
| Windows was a multi- _windowing_ OS.
|
| Each control was owned by a window. Each window had a
| WndProc. If your Windows executable (i.e. application
| delegate module) set up two windows, then each window
| participated separately in the global Windows event loop, up-
| to-and-including things like having its own set of loaded
| fonts, its own clipboard state, and its own _interned strings
| table_. In OOP terms+, your application was just a dead
| "class object", running no logic of its own save for one-time
| load and unload hooks. It was the windows themselves that
| were the "instances" of your class.
|
| This might make you realize why MDI (or Multiple Document
| Interface, where there are multiple small per-document
| "windows" inside one big window) was so popular back then.
| The MDI "windows" weren't actually windows -- they didn't
| have their own WndProc. They were just controls, like a tab
| view is a control. Only the big container window was a real
| window, and so all the _resources_ within that big window
| were shared between all the virtual windows. MDI was a
| memory-saving trick!
|
| ---
|
| + The actual more interesting analogy is that Windows was
| essentially a (single-threaded, cooperatively-scheduled)
| _actor system_ , where windows were the actors. There is a
| very close parallel between (Windows 3.1 executables, Windows
| 3.1 windows) and (Erlang modules, Erlang processes).
| Stratoscope wrote:
| > _This might make you realize why MDI (or Multiple
| Document Interface, where there are multiple small per-
| document "windows" inside one big window) was so popular
| back then. The MDI "windows" weren't actually windows --
| they didn't have their own WndProc. They were just
| controls, like a tab view is a control. Only the big
| container window was a real window, and so all the
| resources within that big window were shared between all
| the virtual windows. MDI was a memory-saving trick!_
|
| MDI may have saved some memory - I can't say one way or the
| other on that - but the mechanism you describe is
| incorrect.
|
| Every MDI child window was a window of its own with its own
| WndProc. Every _control_ inside those windows was also a
| window with its own WndProc. Every dialog box was also -
| yes - a window with its own WndProc.
|
| You wouldn't always be aware of the WndProc in your code,
| but it was there.
|
| If you ran WinSight or Spy++, you could see the entire
| window hierarchy including all the child windows, child
| control windows, and so on.
|
| Later on, a few applications implemented "windowless
| controls" to save memory, but this was uncommon, especially
| in the early days. For example, there was an optional
| windowless version of the Rich Edit control:
|
| https://learn.microsoft.com/en-
| us/windows/win32/controls/win...
|
| Fun fact: an early informal name for MDI was "Mac in a
| box", because if you maximized the top level window, you
| had a somewhat Mac-like environment, with one menu bar at
| the top that was shared by the child windows.
|
| Source: I was the author of WinSight and the VBX (Visual
| Basic eXtension) API.
| AshamedCaptain wrote:
| > Until Windows 95, everything was just happening in one
| shared address space, in real mode. In fact, it was only in
| Windows 3.1 where user applications stopped running in ring
| 0!
|
| Windows 3.0 and its predecessors ran on processors which had no
| concept of "ring 0", so that should not be surprising at
| all...
|
| > Your application wasn't even a "process" per se
|
| I think this is a bit of a "modernistic", "win32y" view of
| the definition of a process. Surely there are
| processes/tasks in 3.x -- you can launch multiple instances
| of a module, each of them can allocate memory separately,
| each of them have different resources/handles, and the OS
| will cleanup for you once each instance terminates
| (cleanly). Obviously, without any type of virtual address
| space or memory protection, any such process can write to
| and destroy the memory of any other process, but they are
| still processes. The existence of Yield/DirectedYield,
| which do not take/need a window, is also a hint of that.
| (Note there are no threads in 3.x).
|
| Many platforms (that either predate VM or decide not to use
| VM for e.g. power usage concerns) worked like this. MacOS,
| Windows CE, PalmOS, etc.
| meekaaku wrote:
| You mean a bug this deep took one day to debug?
| alex7734 wrote:
| Yes and no. It took me from 8am to 3am once we decided it
| needed to get fixed, but really it sat in the app for years.
| It only happened in a background process that sent print jobs
| on a timer; since it used Windows GDI to compose the image we
| sent to the printer, it was affected (our "frontend" should've
| been affected too but never was, I guess because it had a
| different memory usage pattern).
|
| We just had it restart itself and try again whenever it got
| one of those errors when printing but eventually we wanted to
| add a feature that required the process to not die, and by
| that time I was already 99% sure that it wasn't something in
| our code and I had already ruled out threading issues.
|
| I ended up putting it in a VM with a kernel debugger attached
| and having a script make a snapshot and make it print over
| and over until it errored, then following along in IDA until
| I saw what was going on.
|
| Having a way to trigger it (by restoring the snapshot) on
| demand helped a lot, otherwise it would have taken forever to
| make sense of it as it could sit without crashing for nearly
| an hour.
| zeusk wrote:
| How do you attach kd to VM?
|
| While I was at MS, it was such a big PITA - we just had a
| bunch of IT managed machines with KVM console access and
| KDNET for debugging.
| EvanAnderson wrote:
| In Hyper-V it's fairly easy. You make a virtual serial
| port ("COMPort"), set the bootloader to enable kernel
| debugging over serial, then connect to the virtual serial
| port from the host via a named pipe.
|
| https://learn.microsoft.com/en-us/windows-
| hardware/drivers/d...
|
| I haven't tried it with vSphere but I suspect it'd be
| similar.
| bee_rider wrote:
| Ouch, I'm guessing that wasn't <5 hours of debugging.
| duped wrote:
| It's somewhat incomplete without a discussion of how one actually
| gets the memory allocated to a program.
| samwho wrote:
| Could you elaborate on what you mean?
| [deleted]
| duped wrote:
| It talks about how a simple malloc implementation would dole
| out entries in a buffer of memory and manage free lists, but
| not how the implementation actually gets that buffer from the
| operating system, or return it to the system when done (mmap,
| munmap, for example).
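|
| For the curious, the POSIX-ish route looks roughly like this
| (a sketch; MAP_ANONYMOUS is the Linux/BSD spelling, and brk/sbrk
| is the older alternative):
|
|     #include <stddef.h>
|     #include <sys/mman.h>
|
|     /* Ask the kernel for anonymous, zeroed pages to carve up... */
|     static void *os_acquire(size_t bytes) {
|         void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
|                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
|         return p == MAP_FAILED ? NULL : p;
|     }
|
|     /* ...and hand them back (or keep them around for reuse). */
|     static void os_release(void *p, size_t bytes) {
|         munmap(p, bytes);
|     }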
| samwho wrote:
| Ahh yes, I'm with you now.
|
| I made a deliberate choice to not include that, mostly due
| to post size/complexity reasons. I decided that the focus
| of the piece was going to be on the task of effectively
| managing the memory you have in the face of calls to
| malloc/free. I decided against talking about any
| environment-specific information (brk, mmap, cache
| locality, multithreading, etc.)
|
| May not have been the right call, but it kept the length
| fairly reasonable and I think it gives people enough of a
| nudge to dip their toes into the playground if they have
| time.
|
| If you check the HTML comments on the page you'll find
| quite a lot of cut content. I wanted to do deep dives on
| real-world malloc implementations, but it ended up being
| very difficult to cover without making the post take over
| an hour to read.
| OliverJones wrote:
| Oh man, this brings back the days when I wrote special debug-
| version malloc and free code to try to track down heap corruption
| due to malloc / free misuse (in code I had contributed to). Stuff
| like kbyte-long boundary buffers with bit patterns in them, and
| all sorts of lookaside lists run in parallel with libc's default
| code. Those bug-detectors worked OK. Hard-nosed code inspection
| worked far better.
|
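| The flavor of those debug wrappers, for the curious, was roughly
| this (a sketch, not the actual code from back then):
|
|     #include <stdlib.h>
|     #include <string.h>
|     #include <assert.h>
|
|     /* Pad each allocation with a known bit pattern and check it on
|      * free to catch overruns. */
|     #define CANARY 0xAB
|     #define PAD    16
|
|     void *dbg_malloc(size_t n) {
|         unsigned char *p = malloc(n + 2 * PAD + sizeof(size_t));
|         if (!p) return NULL;
|         memcpy(p, &n, sizeof n);                     /* remember size  */
|         memset(p + sizeof n, CANARY, PAD);           /* front guard    */
|         memset(p + sizeof n + PAD + n, CANARY, PAD); /* rear guard     */
|         return p + sizeof n + PAD;
|     }
|
|     void dbg_free(void *q) {
|         unsigned char *user = q;
|         unsigned char *p = user - PAD - sizeof(size_t);
|         size_t n;
|         memcpy(&n, p, sizeof n);
|         for (size_t i = 0; i < PAD; i++) {           /* check guards   */
|             assert(p[sizeof n + i] == CANARY);
|             assert(user[n + i] == CANARY);
|         }
|         free(p);
|     }
|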
| As an old-timer in writing code, I think my generation's most-
| challenging legacies (=== the things we f**ked up) are the non-
| robust malloc/free concept and null-terminated text strings. Bugs
| in code using those conventions have been exploitable to a really
| damaging extent. I learned to program in C from K&R. And getting
| data-structure code right, and safe to deploy, in that language
| and its derivatives is HARD.
|
| The C inventors are (were in Dennis Ritchie's case) brilliant and
| Bell Labs was great. Their ideas shaped the stuff the global
| internet runs on. But these parts of what they invented .....
| ouch. (All OSs had the same problems.)
|
| I wish the wonderful article presented here carried a more
| prominent warning about this. Many will read it as they learn to
| code. The history of our profession can teach about what NOT to
| do as well as what to do.
| junon wrote:
| This is really, really well done. Also, the allocator
| playground[0] is really cool. Will be my go-to when teaching this
| topic moving forward :)
|
| [0] https://samwho.dev/allocator-playground/
| samwho wrote:
| Thanks so much, I really appreciate it.
|
| I'm glad you like the playground. If you don't mind me asking,
| what/where/how do you teach? I was actually hoping to get the
| attention of educators with this tool to see if it would make
| sense in, e.g., undergrad CS courses.
| junon wrote:
| Just friends and colleagues, nothing formal. I wish you the
| best of luck with it, though!
| digitalianz wrote:
| [dead]
| CliffStoll wrote:
| A delightful page, written in a wonderfully interactive way.
|
| My high congratulations for creating a most friendly, readable
| and useful lesson!
| RamiAwar wrote:
| I love this! I wish it existed back when I was writing my first
| memory allocators in university or when building a custom
| EntityComponentSystem implementation.
|
| I'd love to also see applications of custom memory allocations. I
| know about use cases in building game engines and the importance
| of hitting cache there, but I'm not sure where else in the world
| this would be as useful.
| erwincoumans wrote:
| Robotics, embedded systems, CUDA/gpgpu, AI/Deep
| learning/Reinforcement Learning/LLM (PyTorch) for example.
| delta_p_delta_x wrote:
| I really like the website design, above all. I'd love to pen my
| thoughts online, but haven't had the energy to learn static site
| generation...
| N-Krause wrote:
| It even has a playground! https://samwho.dev/allocator-
| playground/
|
| How I wish I had something like that when I first learned C.
| samwho wrote:
| The playground was the inspiration for the post. I always
| wanted to be able to fiddle with malloc implementations and see
| how they work in this way.
|
| Admittedly the playground is hard to understand at first, and a
| touch janky in places. But the workloads are taken from for-
| real malloc/free traces on programs!
| N-Krause wrote:
| When you consider that I (and probably the vast majority of
| developers) used pen and paper for the first few years
| every time I tried to visualize more complex memory,
| that's a big upgrade.
|
| Especially because you can scroll through all the steps.
___________________________________________________________________
(page generated 2023-05-22 23:00 UTC)