[HN Gopher] Zig's new LinkedList API (it's time to learn fieldPa...
       ___________________________________________________________________
        
       Zig's new LinkedList API (it's time to learn fieldParentPtr)
        
       Author : todsacerdoti
       Score  : 218 points
       Date   : 2025-04-14 10:06 UTC (2 days ago)
        
 (HTM) web link (www.openmymind.net)
 (TXT) w3m dump (www.openmymind.net)
        
       | roetlich wrote:
       | This looks somewhat horrifying. How do I safely write a function
       | that takes a linked list as a parameter?
        
         | the_mitsuhiko wrote:
         | > How do I safely write a function that takes a linked list as
         | a parameter?
         | 
         | Zig does not have a lot of generic code. You would pass the
         | user directly and then walk the list or you use comptime. The
         | real answer is that "you don't write code like that in Zig".
        
         | reissbaker wrote:
         | Assuming you mean "how would I safely write a function that
         | takes a generic linked list and does something with the data,"
         | I'm pretty sure you would use comptime: your function would
         | take the concrete type (say, User) as a comptime parameter, and
         | then you would do your list stuff via accessing .node and use
         | the fieldParentPtr to access the underlying data.
         | 
         | Syntactically I don't think it's _that_ weird, TBH. And it 's
         | typesafe: if you write invalid code the compiler will give you
         | an error.
        
           | fpoling wrote:
           | But how does the generic code know the name by which the
           | field is known in the struct that contains the field? Is it
           | supposed to be passed as an extra parameter to the generic
           | function?
        
             | reissbaker wrote:
             | Yup, you could pass it in at comptime as well! And access
             | via @field.
        
             | spiffyk wrote:
             | I think the more important question here is: What
             | operations would you possibly want to do with a linked list
             | that are generic _and_ at the same time do something with
             | the concrete data in it?
             | 
             | To me these complaints sound like hypotheticals with no
             | sound grounding in the real world. If there indeed _is_
             | data that is common to the various types you would want to
             | operate upon in such linked lists, you would nest. E.g. you
             | would have some common Entity struct containing the common
             | data and the LinkedList.Node; this Entity would then be
             | inside your more concrete structs. The generic function
             | would then take a pointer to Entity.
        
               | reissbaker wrote:
               | I think a pretty good example would be: how do you write
               | a map function? Or filter, find, etc.
               | 
               | (You would do it with more comptime, but the question is
               | legitimate!)
        
               | spiffyk wrote:
               | > how do you write a map function? Or filter, find, etc.
               | 
               | You just don't. Zig does not have lambdas anyway, there's
               | no readability incentive to having such functions there.
               | You do these things with plain old loops built into the
               | language.
        
               | reissbaker wrote:
               | Zig has first-class functions (you can pass functions as
               | parameters); it just doesn't have closures. Map pretty
               | rarely uses closures anyway IME; e.g. converting a list
               | to JSON doesn't need to close over outside variables. And
               | anyway, anything that's:
               | 
               | 1. Generic over lists, and
               | 
               | 2. Takes a function as a parameter
               | 
               | Will want to know what the node field name is. Luckily,
               | comptime provides a solution there.
               | 
               | TBH I think "you just don't" is a pretty unsatisfying
               | answer to "how do I use these features that are built
               | into Zig" -- especially when you can use them, very
               | easily.
        
               | spiffyk wrote:
               | I mean yes, you can pass functions as parameters in Zig,
               | you can even define them inline if you wrap them in
               | anonymous structs, but my point is that -- in a language
               | where this is supported _somewhat properly_ -- you would
               | do this to improve readability with some sort of a fluent
               | API. If you attempt to do it in status-quo Zig,
               | readability will only suffer and you are still much
               | better off doing it with for and /or while loops
               | (depending on the data structure), especially when Zig
               | has nice syntax sugar with payloads in both to support
               | these.
               | 
               | edit: formatting
        
               | Zambyte wrote:
               | Zig definitely encourages an imperative style by design,
               | but it does support function pointers, which can be used
               | to implement a map function. I do think passing the field
               | name of the node as a comptime []const u8 as an extra
               | argument to map, filter, etc. would be the only way to
               | implement a generic version of these functions.
               | 
               | The fact that these functions are already designed to be
               | cumbersome to implement, I think that's fine? I have also
               | yet to use a linked list in Zig anyways. It's probably
               | better to use an array, slice, std.BoundedArray,
               | std.ArrayList, std.SegmentedList, or std.MultiArrayList
               | unless there is a specific reason that a linked list is
               | the best option.
        
           | roetlich wrote:
           | Okay, thanks for explaining more.
           | 
           | > And it's typesafe: if you write invalid code the compiler
           | will give you an error.
           | 
           | One more question, if @fieldParentPtr("node", n) is typesafe
           | and can get you a User, the compiler needs to know that the
           | struct has the fields of a User, and that this pointer is
           | indeed pointing at a node field, right? Then why do you need
           | to specify "node" at all?
           | 
           | I think I don't understand Zig at all :)
        
             | messe wrote:
             | What if the User is in multiple linked lists?
             | const User = struct {             username: []const u8,
             | id: u128,             age: u7,             sorted_by_age:
             | std.SinglyLinkedList.Node,             sorted_by_id:
             | std.SinglyLinkedList.Node,         };
        
           | boomlinde wrote:
           | @fieldParentPtr is not type safe. It assumes that the given
           | pointer is to a field of the given name in the inferred type.
           | Zig can detect at compile time whether the inferred type has
           | a field of that name of the same type as the pointer child.
           | So far, so good: type safe.
           | 
           | The lack of safety stems from the fact it doesn't know that
           | the assumption that the pointer has that parent is true.
           | Here's a very simple illustration. It'll compile.
           | const T = struct {             field: i32 = 0,         };
           | pub fn main() void {             var nonfield: i32 = 0;
           | _ = @as(*T, @fieldParentPtr("field", &nonfield));         }
           | 
           | `nonfield` doesn't have a parent T. The use of
           | @fieldParentPtr here causes what the official reference calls
           | "unchecked illegal behavior"; it isn't checked even in the
           | safe build modes.
        
             | grayhatter wrote:
             | > The use of @fieldParentPtr here [...]
             | 
             | the *hypothetical* use here...
             | 
             | This simple example as written is actually correct, and not
             | actually unchecked illegal behavior. But this is just a bit
             | of pedantry, because it would be unchecked illegal if you
             | use this in a more complicated example.
             | 
             | I mention it because it's important to note, it can be done
             | correctly, as you obviously just did it correctly, by
             | accident. That said, I still agree, there're no guardrails
             | on this pattern.
        
               | boomlinde wrote:
               | > the _hypothetical_ use here...
               | 
               | No, it's not hypothetical. I just did it, which
               | demonstrates that it's not type safe and that the
               | compiler won't give you an error if you write invalid
               | code, which is what the post I responded to claimed.
               | 
               | > This simple example as written is actually correct
               | 
               | No. The official reference is clear about this: "If
               | field_ptr does not point to the field_name field of an
               | instance of the result type, and the result type has ill-
               | defined layout, invokes unchecked Illegal Behavior."
               | 
               | Because there is no well-defined layout for plain structs
               | in Zig, they satisfy the "ill-defined layout"
               | requirement. Merely invoking the @fieldParentPtr on a
               | pointer that doesn't satisfy the other requirement in
               | that case invokes unchecked illegal behavior according to
               | the language reference.
               | 
               | > I mention it because it's important to note, it can be
               | done correctly,
               | 
               | Ignoring that you can't seem to get the facts straight,
               | the claim I am responding to is that it's "type safe" and
               | that "if you write invalid code the compiler will give
               | you an error". This is not the case, regardless of
               | whether my example invokes unchecked illegal behavior
               | (which it does) or not.
               | 
               | I'm very clearly _not_ saying that it _can 't_ be done
               | correctly and I don't think you can argue for that in
               | good faith.
        
               | grayhatter wrote:
               | > No, it's not hypothetical. I just did it, which
               | demonstrates that it's not type safe and that the
               | compiler won't give you an error if you write invalid
               | code, which is what the post I responded to claimed.
               | 
               | Describe explicitly how your example will trigger illegal
               | behavior? Happy to discuss the behavior of the compiled
               | asm if you'd like to be that pendantic?
               | 
               | Not type safe, and contains an appreciable risk or defect
               | are two distinct concerns. One matters less than the
               | other.
               | 
               | > Because there is no well-defined layout for plain
               | structs in Zig, they satisfy the "ill-defined layout"
               | requirement.
               | 
               | No? Struts have a defined layout, they're not guaranteed
               | between compilations, but they don't change within a
               | compilation unit.
               | 
               | > Merely invoking the @fieldParentPtr on a pointer that
               | doesn't satisfy the other requirement in that case
               | invokes unchecked illegal behavior according to the
               | language reference.
               | 
               | And the failure mode is the compiler will decide to
               | delete your OS?
               | 
               | Unchecked illegal behavior isn't magic, doing something
               | similar to illegal behavior doesn't mean the code will or
               | won't do what you intended. uint rollover is unchecked
               | illegal behavior (when not explict, in release fast) but
               | just because the uint might roll over doesn't make that
               | code invalid.
               | 
               | > I'm very clearly not saying that it can't be done
               | correctly and I don't think you can argue for that in
               | good faith.
               | 
               | I thought you were saying that it's guaranteed to be
               | incorrect. In fact, I think that's what you said in your
               | reply to me. Mostly because you used the word 'mearly'.
               | 
               | The way that I understood what you meant was, the example
               | you provided was guaranteed wrong. In fact it's actually
               | correct. It will have the same semantics and behavior as
               | what I assume was desired.
               | 
               | > Ignoring that you can't seem to get the facts straight
               | [...] and I don't think you can argue for that in good
               | faith.
               | 
               | I'm sorry if I was unclear, I was trying to discuss it in
               | good faith. :(
        
               | boomlinde wrote:
               | > Describe explicitly how your example will trigger
               | illegal behavior? Happy to discuss the behavior of the
               | compiled asm if you'd like to be that pendantic?
               | 
               | Through calling @fieldParentPtr with a field_ptr not
               | pointing at a field of the given field name in the result
               | type, when the result type has an ill-defined memory
               | layout. I'm essentially just paraphrasing the
               | documentation, which I've already quoted.
               | 
               | Generated code is irrelevant to this end. Illegal
               | behavior can coincidentally result in code that works as
               | intended in practice as a side effect of the
               | implementation. Similarly you may expect a signed integer
               | to wrap around as it overflows in C because of the
               | implementation of the compiler and the code it generates,
               | but it's still undefined behavior.
               | 
               | > No? Struts have a defined layout, they're not
               | guaranteed between compilations, but they don't change
               | within a compilation unit.
               | 
               | This is not the sense in which the Zig language reference
               | uses the term well-defined memory layout. Bare structs,
               | error unions, slices and optionals don't have a well-
               | defined memory layout in the sense the Zig language
               | reference uses it.
               | 
               | > And the failure mode is the compiler will decide to
               | delete your OS?
               | 
               | The failure mode is irrelevant. Unchecked illegal
               | behavior is unchecked illegal behavior regardless of the
               | failure mode. You are moving goalposts now.
               | 
               | > I thought you were saying that it's guaranteed to be
               | incorrect. In fact, I think that's what you said in your
               | reply to me. Mostly because you used the word 'mearly'.
               | 
               | The example I posted is guaranteed to be incorrect, which
               | demonstrates that the use of @fieldParentPtr can result
               | in unchecked illegal behavior, hence not type safe nor
               | giving you an error when you write invalid code.
               | Regarding the use of "merely", you should adopt the habit
               | of reading complete sentences before you draw conclusions
               | about what they imply.
               | 
               | > The way that I understood what you meant was, the
               | example you provided was guaranteed wrong.
               | 
               | The example is guaranteed to cause unchecked illegal
               | behavior.
               | 
               | > I'm sorry if I was unclear, I was trying to discuss it
               | in good faith. :(
               | 
               | If you want to argue in good faith, you can start with a
               | good faith reading of the language reference.
        
         | GolDDranks wrote:
         | If you write a function that takes a _generic_ linked list as a
         | parameter, you'd have a function that refers to just the linked
         | list subrecord, and does only linked list operations to that
         | and the linked nodes.
         | 
         | If you want to write a function that takes a linked list of a
         | specific type as a parameter, you just take in a value of that
         | type. The linked list is baked in, so you can get to the other
         | nodes, and because the type is known, you can get back to the
         | parent type from the linked nodes with fieldParentPtr. How to
         | do that _safely_? I don't think that Zig embraces any Rust-like
         | safe/unsafe dichotomy, so you don't.
        
         | flohofwoe wrote:
         | Since you don't need to care about the 'outer type' in that
         | case you just pass a pointer to the linked list header or a
         | linked list node and that's it.
         | 
         | If you need to access the outer type, just pass a pointer to
         | that type (since your functions need to know the outer type
         | anyway I don't think there's a need to reach for generics).
        
           | skywal_l wrote:
           | You have to pass the field name too ("node").
        
             | flohofwoe wrote:
             | Not really. If you only want to manipulate the list, you
             | only need a direct pointer to the nested node but don't
             | need to know the 'outer type'.
             | 
             | Only if the function takes a pointer to the outer type it
             | needs to know how to get the pointer to the embedded node
             | struct from the item pointer.
             | 
             | ...I guess it makes a lot more sense if you ever wrote code
             | for AmigaOS ;)
        
         | kllrnohj wrote:
         | You don't & generally shouldn't be in the first place, in any
         | language. Linked lists are a _very_ niche data structure, so
         | generic code should ~never be using them. So it 's a moot
         | question. It's kinda like the complaints about how hard a
         | doubly linked list is in Rust - it's just not important because
         | it's not something you should be using 99.999% of the time
         | anyway.
        
           | roetlich wrote:
           | Well I personally almost never use linked lists, don't like
           | them. But Zig seems to think otherwise, they have two
           | implementations in the standard library.
           | 
           | I don't know much Zig, I just wanted to ask how to use the
           | type that the article talks about?
           | 
           | Let's use the very simple example from the article. Let's say
           | I want to extract this code into a function:
           | while (node) |n| {           const user: *User =
           | @fieldParentPtr("node", n);
           | std.debug.print("{any}\n", .{user});           node = n.next;
           | }
           | 
           | 1. How does that look, what's the type signature of this
           | function? 2. What happens if I put in a list doesn't contain
           | users? Do I get a simple to understand compile time error, or
           | can I segfault because I'm accessing bad memory?
           | 
           | And I don't think that would need any generics, since the
           | list type isn't generic, right?
        
             | boomlinde wrote:
             | 1. If you mean @fieldParentPtr, the first argument is a
             | compile time-known name of a field, the second argument is
             | a pointer to a field of that name in the parent type, which
             | is inferred from the left hand side of the assignment
             | 
             | 2. Then you're screwed. In Zig, using @fieldParentPtr on a
             | field that doesn't have the given parent is unchecked
             | illegal behavior, meaning that there are no checks that can
             | catch it at compile time. This change basically guarantees
             | that there will be a pretty serious foot gun every time you
             | iterate over the items of a standard library list.
        
               | roetlich wrote:
               | Thank you, that answers all my questions.
        
               | throwawaymaths wrote:
               | Is 2 really an issue? Just how often are you going to
               | have multiple Node types going on in your codebase?
        
           | grayhatter wrote:
           | A linked list is just a simpler version of both a graph and a
           | tree. What data structure would you suggest to represent
           | both/either of those?
           | 
           | Or are you asserting, because you've never used them they're
           | not common? Because while maybe I agree, and I don't often
           | reach for a linked list, I've built plenty of trees and
           | graphs.
        
             | kllrnohj wrote:
             | Trees and graphs serve useful roles, of course, but calling
             | a linked list just a simpler version is a stretch. It'd be
             | like calling an array a simplified B-tree. The
             | simplification changes not just the implementation but also
             | radically changes the applicability of the result.
        
               | grayhatter wrote:
               | except a tree or graph with multiple pointers is similar
               | to linked list with one pointer.
               | 
               | where as a b-tree stored in an array without pointers
               | probably shouldn't be called a b-tree.
               | 
               | or am I missing something?
        
         | AndyKelley wrote:
         | Two points here:
         | 
         | Linked lists are useful in unsafe code. Most recent use case I
         | had for them was in an event loop with coroutines. It's not
         | possible to implement such thing in memory safe languages. For
         | example if you use Rust, you have to use unsafe [1].
         | 
         | @fieldParentPtr does not yet have safety but it is a planned
         | upcoming change to the language, with a fairly straightforward
         | implementation [2].
         | 
         | [1]: https://github.com/search?q=repo%3Atokio-
         | rs%2Ftokio%20unsafe...
         | 
         | [2]: https://github.com/ziglang/zig/issues/2414
        
           | roetlich wrote:
           | Oh, that makes more sense. Thank you!
           | 
           | Is this linked list type then mostly meant to be used as an
           | internal implementation detail, and not be exposed in a
           | public API? Because if I see a function that takes a non-
           | generic list I won't really know how to call it. :)
        
             | AndyKelley wrote:
             | Linked lists are sometimes the right tool for the job. They
             | are available in the standard library for such cases. If
             | you are struggling to understand how you would use this API
             | then it probably means you have not yet been exposed to
             | such a use case, and you are being quite intentionally
             | steered away from using them.
        
               | roetlich wrote:
               | Fair, thanks.
        
       | spiffyk wrote:
       | For better or worse, the Zig community has long been embracing
       | the `@fieldParentPtr` builtin as the primary way to do things
       | like OOP-ish inheritance, so this change feels pretty in-
       | character for the language to me.
        
         | Zambyte wrote:
         | Really? I did a quick grep in ghostty (160k lines of Zig) and
         | found four total references to @fieldParentPtr. I'm still a Zig
         | baby, but I have not seen very many references to
         | @fieldParentPtr at all.
        
           | spiffyk wrote:
           | I would expect the inheritance pattern to be quite rare in
           | performance-aware software design overall, since such
           | constructs tend to be rather cache-unfriendly. So I would
           | attribute the low number of references to inheritance being
           | rare, rather than @fieldParentPtr not being used for
           | inheritance.
        
       | steventhedev wrote:
       | This feels like a net negative result. It removes some of the
       | complexity of using generics, but it couples between the data
       | type and the collections it can be indexed by.
       | 
       | What are the benefits of this approach? Is it limited to data
       | alignment, or is it out of a greater desire to remove generics?
        
         | reissbaker wrote:
         | Zig has no desire to remove comptime AFAIK (comptime being the
         | larger Zig feature by which people implement generics in the
         | language) -- comptime is pretty much _the_ reason to use Zig.
         | 
         | The benefits of intrusive linked lists are higher performance;
         | you can use comptime to still have decoupled code.
        
           | kevin_thibedeau wrote:
           | C++20 has consteval to do the same thing.
        
           | surajrmal wrote:
           | The generic approach should be the same performance. This
           | approach just lets you place your data in multiple lists
           | without needing multiple allocations.
        
             | reissbaker wrote:
             | No, the generic approach requires your data to be spaced
             | out further in memory (or to be heap allocated), which
             | causes CPU cache misses and is slower. The entire reason
             | for intrusive linked lists is performance. Standard linked
             | lists are notoriously slow compared to almost any other
             | similar data structure, which is why they are hardly ever
             | used in real code (ArrayLists and vectors are much more
             | common).
        
               | foldr wrote:
               | Maybe it requires this in Zig (I don't know), but in
               | general there's no reason why you couldn't allocate the
               | nodes of an extrusive linked list from a pool residing on
               | the heap or on the stack. For example, you could do this
               | with the STL (for all that STL allocators are a pain to
               | use in practice). Or you could have a slightly different
               | API where you add nodes to the list by passing an entire
               | Node<T> to the relevant function rather than just T, at
               | which point you can trivially allocate the nodes as you
               | please.
        
         | flohofwoe wrote:
         | It's not at all unusual for intrusive linked lists though?
         | 
         | On AmigaOS (which is entirely built out of intrusive doubly
         | linked list) the list node is placed at the start of a struct,
         | so the pointer to the node is also the pointer to the linked
         | item. There's also no 'coupling' because the list manipulation
         | functions take pointers to the list node structs, but they
         | don't need to know the 'outer' item struct.
         | 
         | Zig's @fieldParentPtr is more flexible since the node struct
         | can be anywhere in the parent struct and you still can get from
         | the node pointer to the item base pointer (and more
         | importantly, it makes it trivial to link the same item into
         | multiple linked lists).
        
           | Joker_vD wrote:
           | It's not unusual at all, it's just... should it be exposed? I
           | personally prefer having "Node struct with next pointer +
           | inlined generic data" design when it can be encapsulated away
           | since, well, it can be encapsulated away, and the result data
           | layout is the same. And when you expose this design, well,
           | you end with all sorts of disasters, including OOP
           | inheritance (no, really: [0]).
           | 
           | [0] https://catern.com/inheritance.html
        
             | flohofwoe wrote:
             | If the node embeds the data item you can't link the item
             | into multiple linked lists, but if the data item embeds the
             | node(s) suddenly that's trivial.
             | 
             | I don't know if this was the motiviation for the design
             | change though, but IMHO intrusive linked list are the
             | 'proper' linked list design, and C++-style "nodes with
             | payload" are a weird outlier.
             | 
             | Another reason might be that generic code leads to a lot of
             | code duplication in the binary (which may or may not be
             | optimized away by the compiler though - but at the cost of
             | increased build times though).
        
               | lelanthran wrote:
               | > If the node embeds the data item you can't link the
               | item into multiple linked lists, but if the data item
               | embeds the node(s) suddenly that's trivial.
               | 
               | Perhaps I'm missing something, but isn't it the other way
               | around? In C, this is what I'd expect for the intrusive
               | and extrusive:                   // Intrusive
               | struct user_data_t {            struct user_data_t *next;
               | struct user_data_t *prev;            // The object's
               | fields         };              // Extrusive
               | struct node_t {            struct node_t *next;
               | struct node_t *prev;            struct user_data_t
               | *payload;         };
               | 
               | The intrusive one obviously can't be shared between
               | lists.
        
               | flohofwoe wrote:
               | No, intrusive would be (note: no pointers in
               | user_data_t):                   typedef struct { node_t*
               | next } node_t;              typedef struct { ... }
               | payload_t;              typedef struct {
               | node_t node;             payload_t payload;         }
               | user_data_t;
               | 
               | ...and if you want user_data_t to be included into
               | multiple lists:                   typedef struct {
               | node_t list1_node;             node_t list2_node;
               | node_t list3_node;             payload_t payload;
               | } user_data_t;
               | 
               | ...of course in C now it gets tricky to get to the start
               | of user_data_t given a pointer to `list3_node`, but
               | that's where @fieldParentPtr comes in.
               | 
               | The advantage versus your extrusive example is that the
               | payload doesn't need to be referenced through a pointer,
               | which drastically simplifies lifetime tracking / memory
               | management.
        
               | lelanthran wrote:
               | > No, intrusive would be (note: no pointers in
               | user_data_t):                   typedef struct { node_t*
               | next } node_t;              typedef struct { ... }
               | payload_t;              typedef struct {
               | node_t node;             payload_t payload;         }
               | user_data_t;
               | 
               | That's exactly the same as my intrusive structure, no?
               | 
               | > typedef struct { node_t list1_node; node_t list2_node;
               | node_t list3_node; payload_t payload; } user_data_t;
               | 
               | In C, at any rate, that doesn't give you "inclusion into
               | multiple lists". It gives you "inclusion into at most 3
               | lists. The extrusive example I posted gives "inclusion
               | into multiple lists".
               | 
               | So, yeah, I'm still not seeing your point.
        
               | Zambyte wrote:
               | > That's exactly the same as my intrusive structure, no?
               | 
               | It's not, for the reason they put in parentheses:
               | 
               | > note: no pointers in user_data_t
               | 
               | An allocation of user_data_t also allocates space for
               | payload_t, rather than just allocating space for a
               | pointer to payload_t. Your structure requires an
               | additional allocation to point the payload_t* at
               | something. The fact that they hid the next node_t* in a
               | struct doesn't matter though.
        
               | lelanthran wrote:
               | > It's not, for the reason they put in parentheses:
               | 
               | >> note: no pointers in user_data_t
               | 
               | I feel like I am taking crazy pills: where, in my
               | intrusive example, are pointers in the `user_data_t`?
               | 
               | My intrusive code sample had no pointers for the
               | user_data_t. It's exactly the same as GP's intrusive
               | example.
        
               | Zambyte wrote:
               | It is not you that is crazy, I have been exhausted. My
               | bad.
        
             | sph wrote:
             | Your link about inheritance is worth its own post.
             | 
             | EDIT: Previously:
             | https://news.ycombinator.com/item?id=26988839
        
           | codr7 wrote:
           | This approach is also common in C:
           | 
           | https://github.com/codr7/hacktical-c/tree/main/list
           | 
           | One pretty big advantage is you get some help from the
           | compiler making sure you mean what you say. As opposed to
           | just blindly casting items and assuming no one forgot to put
           | the list in their struct. Also the possibility to be part of
           | multiple lists.
        
         | GolDDranks wrote:
         | I think it matches really well for the Zig ethos of "artisanal,
         | minimal code". More than often, a type is strongly tied to how
         | it is being used in the code base. In that sense, having it be
         | tightly coupled to a data structure isn't much of a problem.
         | The opposite isn't true, and the data structure is independent
         | of the embedding parent data. The implementation of that data
         | structure itself can still be presented as a library, and the
         | "generic" parts carefully tested.
        
         | yccs27 wrote:
         | You could create a comptime function that adds the Node field
         | to any type. I guess then you've arrived back at the previous
         | generic version.
        
         | Someone wrote:
         | The only logical explanation I can see is that these are two
         | decisions:
         | 
         | - linked lists aren't useful on modern systems because
         | traversing them causes to many cache misses. Therefore, we
         | shouldn't provide such a simple implementation.
         | 
         | - but we need one in low level OS code, and there, intrusive
         | lists are preferred. Their limitation that you cannot store an
         | object in multiple lists isn't a problem, and the extra speed
         | and the fact that you can move items between lists without
         | allocations is desired.
         | 
         | I don't see why the intrusive implementation has to be so
         | basic, though. Can't you, in Zig, express that a generic type T
         | used in a generic list class has to have a nullable _next_
         | field that points to a T?
        
           | throwawaymaths wrote:
           | > linked lists aren't useful on modern systems because
           | traversing them causes to many cache misses
           | 
           | Only if you "default to using" (and only use) malloc. Zig
           | encourages you to use different allocators within the same
           | program for different use cases, including, potentially
           | allocators which will thrash the cache far less (for example
           | thread local arenas).
        
         | whizzter wrote:
         | Intrusive linked lists as this is called removes unnecessary
         | allocations and traversal (main reason why linked lists have
         | such a horrible reputation), say a hypothetical example where
         | you have a temporary listener object installed that listens to
         | broadcasts on channel X and times out after 5 minutes.
         | 
         | Upon connecting the object is connected to the channel X linked
         | list as well as some other list of objects that are killed at
         | the timestamp 5 minutes in the future.
         | 
         | With an intrusive linked list the link-node resides within the
         | object, the only thing needed when installing is linking it
         | into the 2 lists (this is a few move-operations), an external
         | linked list would _require 2 extra allocations_ for the linked-
         | list nodes.
         | 
         | Broadcasting from X is almost the same since it's a simple
         | traversal, albeit with double the cache pressure since the
         | object and the linked-list node probably lives separately.
         | 
         | The real HORROR comes when disconnecting, if channel X has
         | millions of listeners it could become a real bottleneck to
         | traverse the linked list to find the link-node that connects
         | the actual object since there could be tons of jumping around
         | memory. An intrusive linked list would just disconnect itself
         | if it's doubly-linked.
         | 
         | This is why hashsets/tables,vectors/arraylists,etc are often
         | used in practice (because many "modern" OO languages never
         | added good support for the machinery needed here) making linked
         | lists look quite bad (there is other badness but using non-
         | intrusive linked lists is almost always worse than using
         | something else than a linked list altogether).
        
           | steventhedev wrote:
           | The generic version in TFA puts the data type allocated
           | alongside the next pointer - no additional allocation needed.
           | The only functional difference is if the zig compiler is not
           | sufficiently advanced to understand it can reorder the fields
           | (hence the alignment question).
           | 
           | The removal scenario is merely specifying that you are
           | passing in ConnectionListNode instead of a Connection.
           | Although maybe it's a good idea to think about how they
           | compose comparatively.
        
       | DeathArrow wrote:
       | I wonder why should a language implement in its libraries a
       | SingleLinkedList and a DoubleLinkedList.
       | 
       | I do get why you need an array-like list, a dictionary/hashmap, a
       | stack and a queue. I got the impression that linked lists aren't
       | used very often. Or maybe it's a different case with Zig?
        
         | messe wrote:
         | Maybe because it's used elsewhere in the standard library?
         | 
         | SinglyLinkedList is used in the standard library in
         | std.Thread.Pool, and std.heap.ArenaAllocator. DoublyLinkedList
         | is used in the http client's connection pool, as well as the
         | ObjectCache for the package manager's git client.
        
         | boomlinde wrote:
         | > I got the impression that linked lists aren't used very
         | often.
         | 
         | I did some cursory research on the software I've been using
         | today and presented the results here:
         | https://news.ycombinator.com/item?id=43684121
         | 
         | I think there is a misconception around linked lists mostly
         | stemming from the fact that they're not great for cache
         | locality. Someone presents the idea that linked lists under
         | common circumstances are unintuitively slow and therefore
         | should be considered carefully for performance sensitive
         | applications. A game of Chinese whispers immediately forms
         | around that idea and what comes out the other end is a new
         | idea: that linked lists are bad, shouldn't be used and aren't
         | used. Meanwhile, they continue to be used extensively in system
         | software.
        
           | bsder wrote:
           | Linked lists get a bad reputation because they wound up being
           | a default data structure--and that's almost always a mistake
           | on modern computers.
           | 
           | Back in days of yore, small memory and CPU meant that linked
           | lists and arrays were really the only effective data
           | structures. Allocating and computation were super expensive,
           | so following a link or having an index is really the only
           | thing that works.
           | 
           | On today's computers, CPU is so much faster than memory that
           | it is almost always better to recompute a value than try to
           | look it up. Locality is so important that linear actions
           | trump asymptotic complexity until you get quite a lot more
           | elements than you might expect. Memory is so cheap that it is
           | worth exponentially overallocating in order to amortize
           | complexity. etc.
           | 
           | At the end of the day with modern programming, you should
           | almost always be reaching for something simpler than a linked
           | list (array/vector) or you should be reaching for an
           | abstraction that is more complex than a linked list
           | (hash/dict, queue/deque, etc.).
           | 
           | With modern CPUs, if you are using a linked list for anything
           | other than building a more complex data structure, you're
           | probably making an incorrect choice on almost all fronts.
        
             | Validark wrote:
             | > Locality is so important that linear actions trump
             | asymptotic complexity until you get quite a lot more
             | elements than you might expect.
             | 
             | The reverse of this can be true. In the following paper,
             | they found that William's Heap Construction (one by one,
             | O(n log n) time) beat Floyd's Heap Construction (O(n) time)
             | when the input is larger than the size of off-chip cache.
             | 
             | https://doi.org/10.1145/351827.384257
        
             | xxs wrote:
             | >actions trump asymptotic complexity
             | 
             | Which operation of a linked list has any better complexity.
             | The only one I can think of is a frequent removal of random
             | elements in the middle -- and opt not to use tombstones.
        
           | xxs wrote:
           | Linked lists have higher constant const memory wise, much
           | slower to iterate, effectively removals of a random in the
           | middle are the same as vector. When used as queues the ring
           | buffers are better for most applications.
           | 
           | Their prime use is that adding new elements lack a terrible
           | worst case, aside the fact it puts more pressure on the
           | memory allocator - need to be concurrent friendly.
           | 
           | If there is a full control of the memory allocation (e.g.
           | kernels) linked lists make sense, for virtually all other
           | cases there are better data structures.
        
             | boomlinde wrote:
             | I'd first like to point out that yours is an argument from
             | the perspective that performance is crucial and dependent
             | on the use or non-use of linked lists. This is not
             | necessarily the case. Performance may not be crucial, and
             | even when it is, linked lists can find use in areas where
             | they're not even nearly the bottleneck, for being the most
             | obvious data structure that satisfies the required
             | operations on the collection. That pointer chasing is slow
             | is not an argument against the use of linked lists in
             | general.
             | 
             | What if I only ever iterate over a list of some hundred
             | entries in the rare case of an error? Should I be reaching
             | for the most performant data structure or the one that most
             | obviously and clearly maps to the operations the program
             | needs to perform on it at some infinitesimal cost?
             | 
             | > Linked lists have higher constant const memory wise
             | 
             | Than a fixed vector with a known maximum number of
             | elements, yes. With something like a dynamic array you are
             | however usually better off using a linked list memory-wise,
             | unless you want to reallocate every few insertions or your
             | node data is smaller than the link.
             | 
             | > effectively removals of a random in the middle are the
             | same as vector.
             | 
             | You're conflating concepts. Seeking is O(n). Removal is
             | O(1). There are cases where seeking for the purpose of
             | removal is either amortized (e.g. you're already iterating
             | over all the nodes, executing an associated handler for
             | each which may or may not determine that the node should be
             | unlinked in-place) or completely unnecessary (some other
             | data structure already tracks a reference to the node and
             | some code associated with that determines if/when it is
             | removed).
             | 
             | One might similarly argue that I am conflating compacting
             | (O(n)) with removal (O(1)) in the vector case, but in the
             | case of linked lists, the difference is just a natural
             | consequence of design, while for a typical implementation
             | of vector operations they are the same operation and
             | require an additional data structure to be anything but
             | that in order to track the holes.
             | 
             | Moving elements around in memory also comes with a host of
             | other issues, for example if there are many references to
             | the elements, requiring maintaining and indirectly
             | accessing memory via an index array.
             | 
             | > aside the fact it puts more pressure on the memory
             | allocator - need to be concurrent friendly.
             | 
             | This is again a conflation of concepts: that of using
             | linked lists and that of making space for the nodes or
             | preparing an allocator for concurrent application. The
             | nodes could be on the stack for some applications, in a
             | static pool for others.
        
         | bsder wrote:
         | What do you think is the data structure underneath a hash,
         | stack or queue? It's likely a linked list.
         | 
         | So, you are completely correct. The issue is simply that you
         | are at a different level of abstraction.
        
       | ralferoo wrote:
       | I don't use Zig, but one advantage of having a genericised
       | intrusive linked list where the next pointer doesn't have to be
       | the first thing in the structure is when you want to use larger
       | types, such as 128-bit fields. Sticking a pointer at the
       | beginning would mean the compiler would have to insert alignment
       | padding after the pointer or break the default alignment.
        
         | jwmerrill wrote:
         | The next pointer doesn't have to go first in the structure
         | here. It can go anywhere, and you can use @fieldParentPtr to go
         | back from a reference to the embedded node to a reference to
         | the structure.
        
           | n42 wrote:
           | I think that's the point the parent was making. I read it as
           | them describing a benefit of Zig's approach.
        
             | ralferoo wrote:
             | Yeah, that's what I was trying to say, but obviously not
             | clearly enough.
        
               | jwmerrill wrote:
               | My mistake! It seems clear in hindsight...
        
       | mastax wrote:
       | > The new version isn't generic. Rather, you embed the linked
       | list node with your data. This is known as an intrusive linked
       | list and tends to perform better and require fewer allocations.
       | 
       | I don't get this. In the generic version the data is embedded
       | with the linked list node, so there's only one allocation per
       | node. 'T' isn't a pointer (or doesn't have to be) there's no
       | indirection between data and node.
        
         | CBLT wrote:
         | > only one allocation per node
         | 
         | I believe the implication is there's fewer than one allocation
         | per node with the new API. You allocate contiguous memory once,
         | and use it to store n elements.
        
           | flowerthoughts wrote:
           | No, that's not stated.
           | 
           | > The new version isn't generic. Rather, you embed the linked
           | list node with your data. This is known as an intrusive
           | linked list and tends to perform better and require fewer
           | allocations. Except in trivial examples, the data that we
           | store in a linked list is typically stored on the heap.
           | Because an intrusive linked list has the linked list node
           | embedded in the data, it doesn't need its own allocation.
           | 
           | Is simply a misunderstanding. The storage layout will be the
           | same for the generic and the intrusive one.
           | 
           | The benefit of intrusive linked lists is that each node can
           | be a member of several linked lists with a single allocation
           | per node. This is used heavily e.g. in the Linux kernel.
           | 
           | The cost is that you need to keep track of the offset from
           | object address to which linked list pointer you're currently
           | dealing with. That's often a compile time constant, but makes
           | the API more awkward. In this case it seems to be the string
           | "node", which seems nice enough. C libraries often use the
           | preprocessor to do something similar.
        
           | mastax wrote:
           | new Node<TStruct>[16];              new
           | TStructContainingNode[16];
           | 
           | What's the difference?
        
             | ulbu wrote:
             | there's 16 contiguously stored pointers to 16 non-
             | contiguously stored tstructs (they may be contiguously
             | stored, but you can't make this assumption from this type).
             | there's 16 contiguously stored tstructcontainingnode's.
        
         | torginus wrote:
         | I remember reading an article about this technique - it was
         | used in the original Starcraft. The idea here, is that the
         | object needs to be allocated anyway by someone else, so you get
         | the linked list for free.
         | 
         | An object can also be part of multiple lists, each used by a
         | different subsystem with its own concerns.
         | 
         | Deleting an item involves unlinking it from all lists (though
         | that requires a doubly-linked list)
         | 
         | Edit: found the article
         | 
         | https://www.codeofhonor.com/blog/tough-times-on-the-road-to-...
        
           | jamiejquinn wrote:
           | AFAIK it's also a structure heavily used by Nethack 3 to
           | manage its astounding complexity. Modern, complex roguelikes
           | seem to be using ECS as an alternative but I think they still
           | have their uses.
        
         | ajross wrote:
         | > I don't get this. In the generic version the data is embedded
         | with the linked list node, so there's only one allocation per
         | node.
         | 
         | The reason is that the object being placed "in" the list can
         | have a longer lifetime than the list node, in fact that's
         | generally the case. Like, I work on Zephyr and in Zephyr we
         | have "threads", whose tracking structs are managed by the
         | application code. But they can be in kernel-maintained lists
         | like a scheduler queue, which is an ephemeral state that the
         | essentially-perpetual thread will long outlive. There's no
         | place to allocate a "generic" node at list insert time.
         | 
         | (Also, more importantly, Zephyr is an RTOS and disallows heap
         | use in the core parts of the system to permit use in
         | environments that need to disallow dynamic allocation. But
         | that's not really relevant to a generics discussion.)
         | 
         | But you can trivially put a scheduler queue node right into the
         | thread object, knowing that it's a behavior of the type. The
         | semantics become different though: you only get the one node,
         | it's not possible to have multiple "lists of threads" using
         | this technique unless you know exactly what each list is for
         | ahead of time.
        
         | mppm wrote:
         | It would probably have been more correct to say "requires fewer
         | allocations _in some cases_ ". As you point out, in terms of
         | layout, the old generic version is just as intrusive as the new
         | version, and requires just as many allocations (one). However,
         | the new version gives extra flexibility for you to move a long-
         | lived object in and out of lists without copying or allocating,
         | at the cost of having the pointer field baked into it
         | permanently, rather than on demand.
         | 
         | I think the reasoning behind this change is that (from Zig's
         | perspective), if you are using linked lists you are probably
         | doing something wrong, _unless_ your application requires the
         | above-mentioned kind of juggling, which favors explicitly
         | intrusive LLs. In addition, this change de-generifys the lists
         | 's methods, like `prepend`, which reduces code size a little.
         | 
         | At least that's my understanding.
        
           | DarkWiiPlayer wrote:
           | > However, the new version gives extra flexibility for you to
           | move a long-lived object in and out of lists without copying
           | or allocating
           | 
           | You could also do this with the entire node in the case of a
           | generic implementation though, the API just needs to expose
           | the node struct to you and allow you to detach it; but the
           | same is true for this new implementation as well.
           | 
           | In terms of memory, a detached node that embeds the payload
           | struct isn't different from an object with an embedded
           | detached node.
           | 
           | What changes is that now, if you _have_ an object class that
           | you don 't want to (or can't) extend to include a list node,
           | you have to wrap it in a container struct that, again, looks
           | the same in memory but now has a node and your object as its
           | members. I'm not sure if this is really much of an
           | improvement at the end of the day.
           | 
           | Also, correct me if I'm wrong (I don't do zig), but shouldn't
           | it be possible to create a list of void elements from the
           | generic implementation and embed its node type inside your
           | object, then proceed from there as if it were the new
           | implementation?
        
             | mppm wrote:
             | Yeah... with bitcasts and some creativity (and some
             | boilerplate) both versions are ultimately equivalent, or
             | nearly so. But the new one pushes you towards intrusive-
             | data-structure thinking and away from container-of
             | thinking.
             | 
             | This, by the way, is a typical design consideration in Zig
             | -- using "friction" to steer programmers away from doing
             | the Wrong Thing (according to Andrew). In addition, Zig is
             | really more of a fancy macro-assembler for outputting
             | optimal code, and less a pragmatic general-purpose
             | language, and makes design decisions accordingly. Taking
             | both together, the linked-list change sort of makes sense,
             | even though personally, I would have just added a separate
             | intrusive list data structure.
        
       | lightingthedark wrote:
       | Can someone explain how the claim of higher performance works
       | here? In C, which lacks generics, an intrusive list is preferred
       | because otherwise you end up with each node having a void pointer
       | to the data it holds. The previous Zig version was a generic, so
       | the data type could easily be a value type. Given that the
       | compiler is allowed to rearrange the layout of both structs
       | unless data is packed or extern (in which case it almost
       | certainly won't want to have a linked list node in the middle
       | anyway) isn't the resulting type exactly the same in memory
       | unless you intentionally made T a pointer type?
        
         | ashdnazg wrote:
         | I don't understand the higher performance either. What I know
         | as the significant advantage is that you can have one object in
         | multiple lists.
        
           | lightingthedark wrote:
           | Technically that should be possible the other way by using a
           | Node<T> so the type of the second list ends up being a
           | Node<Node<T>> but I can see why an intrusive list would be
           | preferred to that, and also the linked list API might prevent
           | that pattern.
           | 
           | Usually if I have multiple lists holding something I have one
           | that's the 'owner' and then the secondary data structures
           | would have a non-owning pointer to it. Is that the case where
           | the performance would be better with an intrusive list? My
           | intuition would be that having multiple Node members would
           | pollute the cache and not actually be a performance win but
           | maybe it is still better off because it's all colocated?
           | Seems like the kind of thing I'd have to benchmark to know
           | since it would depend on the number of lists and the size of
           | the actual data.
        
             | lightingthedark wrote:
             | Okay so in the multiple list case performance would
             | actually be worse because you'd have a double pointer
             | dereference. I was thinking you'd have the list nodes
             | contiguous in memory so the first dereference would always
             | hit cache but that's a bad assumption for a linked list.
             | 
             | Since you shouldn't reach for a linked list as a default
             | data structure modern hardware anyway, I actually do see
             | how this change makes sense for Zig. Neat!
        
               | codr7 wrote:
               | Allocating list nodes in one block of memory is very
               | common in the intrusive case.
        
           | messe wrote:
           | > What I know as the significant advantage is that you can
           | have one object in multiple lists.
           | 
           | Another advantage is smaller code size, as the compiler
           | doesn't need to generate code for SinglyLinkedList(i32),
           | SinglyLinkedList(User), and SinglyLinkedList(PointCloud).
           | This could have a performance impact by making it more likely
           | that code remains in the cache.
        
         | anarazel wrote:
         | Intrusive lists are often used to enqueued pre-existing
         | structures onto lists. And often the same object can be in
         | different lists at different times.
         | 
         | That's not realistically dealt with by the compiler re-
         | organizing the struct layout.
        
         | grayhatter wrote:
         | > Can someone explain how the claim of higher performance works
         | here? In C, which lacks generics, an intrus
         | 
         | I can only give a handwavey answer because I've yet to see any
         | data, and if an engineer tells you something is better but
         | doesn't provide any data, they're not a very good engineer. So
         | grain of salt and all that. But the answer I got was because
         | cache performance. Writing code this way your CPU will spend
         | less time waiting for main memory, and the branch predictor
         | will have better luck. The argument makes sense, but like I
         | said,I've yet to see real world data.
         | 
         | > isn't the resulting type exactly the same in memory unless
         | you intentionally made T a pointer type?
         | 
         | Yes and no. If I understand what you mean, the bit layout will
         | be the 'same'. But I think your confusion is more about how
         | what a compiler means by pointer type, and what a human means.
         | If you pull away enough layers of abstraction, the compiler
         | doesn't see *Type it'll only see *anyopaque, phrased completely
         | differently; according to the compiler, all pointers are the
         | same and are exactly memory_address_size() big. *Type doesn't
         | really exist.
         | 
         | Writing it this way, imagine using just the LinkedList type,
         | without a container of any kind. node to node to node, without
         | any data. While it would be pointless, that would (might) be
         | faster to walk that list, right? There's no extra data loads
         | for the whole struct? That's what this is. Using it this way it
         | gets complicated, but translating theory to asm is always
         | messy. Even more messy when you try to account for speculative
         | execution.
        
           | Zambyte wrote:
           | > But I think your confusion is more about how what a
           | compiler means by pointer type, and what a human means. If
           | you pull away enough layers of abstraction, the compiler
           | doesn't see *Type it'll only see *anyopaque, phrased
           | completely differently; according to the compiler, all
           | pointers are the same and are exactly memory_address_size()
           | big. *Type doesn't really exist.
           | 
           | Check out the implementation of SinglyLinkedList in the
           | latest release (before the change in the post)[0]. You'll
           | notice the argument for SinglyLinkedList is (comptime T:
           | type), which is used in the internal Node struct for the data
           | field, which is of type T. Notably, the data field is _not_ a
           | *T.
           | 
           | In Zig, when you call the function SinglyLinkedList with the
           | argument i32 (like SinglyLinkedList(i32)) to return a type
           | for a list of integers, the i32 is used in the place of T,
           | and a Node struct that is unique for i32 is defined and used
           | internally in the returned type. Similarly, if you had a
           | struct like Person with fields like name and age, and you
           | created a list of Persons like SinglyLinkedList(Person), a
           | new Node type would be internally defined for the new struct
           | type returned for Person lists. This internal Node struct
           | would instead use Person in place of T. The memory for the
           | Node type used internally in SinglyLinkedList(Person)
           | actually embeds the memory for the Person type, rather than
           | just containing a pointer to a Person.
           | 
           | These types are very much known to the compiler, as the
           | layout of a Node for a SinglyLinkedList(i32) is not the same
           | as the layout of a Node for a SinglyLinkedList(Person),
           | because the argument T is not used as a pointer. Unless, as
           | the gp mentioned, T is explicitly made to be a pointer (like
           | SinglyLinkedList(*Person)).
           | 
           | [0] https://ziglang.org/documentation/0.14.0/std/#src/std/lin
           | ked...
        
       | hoseja wrote:
       | Linked lists should be obscure niche data structures for when you
       | absolutely need their unique characteristics, not some front-and-
       | center default.
        
         | netbsdusers wrote:
         | They're not obscure or niche, they are everywhere in operating
         | systems. The linked list is probably the single most essential
         | and the most used data structure in any kernel.
        
           | simonask wrote:
           | Implementing an operating system is incredibly niche and
           | obscure.
           | 
           | The practices that apply to kernel development do not
           | generally apply to very much else.
        
           | fcantournet wrote:
           | Ah yes, operating system kernels, famously super mainstream
           | programs that almost everyone works on.
        
         | grayhatter wrote:
         | This is true for the kind of programmin most people do. Linked
         | Lists make no sense in JS, Python, Rust, etc. But they are
         | without a doubt still an incredibly useful pattern with
         | unmatched performance characteristics. Especially if you're
         | trying to be memory efficient. It's a systems programming
         | pattern, rarely useful if you're just a web developer.
         | 
         | But then again... I never thought I'd need to know music
         | waveform theory while I was watching the magic school bus as a
         | kid... Turns out that was incredibly useful when I needed to
         | implement a music note generator for a VoIP client... You never
         | know when something might actually be really useful.
        
           | simonask wrote:
           | This comment makes no sense.
           | 
           | 1. You have linked lists in Rust. They're in the standard
           | library.
           | 
           | 2. Linked lists do not have "unmatched performance" - they
           | are slower than almost anything, outside of highly
           | specialized use cases. Even moreso for intrusive linked
           | lists. Worst of all, they are horrible for cache locality.
           | 
           | There are valid uses of linked lists, that's why they exist,
           | but most people don't have those use cases.
        
             | grayhatter wrote:
             | Speed isn't the only performance metric that matters. I
             | probably should have said explicitly that I'm referring to
             | uses where it makes sense to use a linked list. But
             | unfortunately I didn't specify because that wasn't the
             | point I really wanted to make.
             | 
             | There's a lot of things to learn that appear to to not
             | makes sense... until suddenly they do.
             | 
             | > There are valid uses of linked lists, that's why they
             | exist, but most people don't have those use cases.
             | 
             | Isn't that pretty much what I said?
             | 
             | I'm pretty sure we agree that they're useful, but their
             | usefulness isn't as universal nor as common as the pattern
             | is.
        
               | simonask wrote:
               | Linked lists are worse than flat arrays in every single
               | performance metric other than ordered removal. They use
               | more memory (the links), have inferior cache locality,
               | cause more allocations (unless you're allocating anyway,
               | in the case of intrusive lists).
               | 
               | Their one benefit - O(1) ordered removal - is easily
               | achieved with a flat array, in at least two common ways:
               | swap in the last element if you don't care about the
               | order, or mark the index as vacant if you do care about
               | the order, or if you care about the stability of
               | references.
               | 
               | I mean, I agree that they're kind of neat from a certain
               | perspective, but they're almost never the right choice.
        
               | xxs wrote:
               | About the ordered removal, if it's not very common, using
               | tombstones (a constant to denote no element) is fine. Too
               | many tombstones would need a compaction phase and still
               | it'd work way better than a linked list.
               | 
               | Personally I have not met a case where a random removal
               | in the middle of a list is common -- b/c finding the
               | needle is slow to begin with O(n). In such case linked
               | hash sets are O(1) but way more memory intense (which is
               | a very advanced linked list)
               | 
               | That being said linked queues are great for concurrent
               | lock free cases.
        
               | delamon wrote:
               | Dynamic arrays often use doubling as a resize strategy.
               | This means that on average about 25% of capacity is
               | "wasted". Depending on element size and element count it
               | may lead to larger memory usage compared to linked list.
        
           | xxs wrote:
           | >Especially if you're trying to be memory efficient
           | 
           | Linked list have higher const cost memory wise compared to
           | array backed structures. Even when an array back structure is
           | half empty it's still takes less memory.
        
         | grandempire wrote:
         | I'm all about data-oriented design, but I don't think this is
         | true - you need their unique characteristics in almost every
         | project.
        
           | simonask wrote:
           | In all of my years, I have seen maybe 2 projects that had one
           | valid use case each. They exist, sure. It's not that common.
        
             | boomlinde wrote:
             | Out of curiosity I looked up some of the software I've
             | meaningfully interacted with today. Of all I looked up--the
             | operating system kernel, the init system, the shell, the
             | terminal emulator, the browser, the compilers, the text
             | editor, the windowing system, the window manager, the
             | notification daemon, the audio server, the audio session
             | manager, the GPG agent, the NTP daemon, the SSH daemon, the
             | DHCP client, the wireless network manager, the power
             | manager, the policy manager, D-Bus, the video player, the
             | video editor--each uses linked lists. There's some more
             | system software showing up in ps (which by the way uses
             | linked lists) that I haven't considered but I am rather
             | confident that most of it uses linked lists.
             | 
             | Maybe you only see these projects as a user, but linked
             | lists are not uncommon. Your experience reflects your,
             | well, experience. We all sit in different niches.
        
               | simonask wrote:
               | I'm wondering what makes you feel confident about the use
               | of linked lists in all of those components.
               | 
               | Mind you, most of those will be written in C on a typical
               | Linux installation, and linked lists happen to be one of
               | the two collection types that are relatively easy to use
               | in C (the other being a flat array), so I will concede
               | that some software is using linked lists out of
               | desperation, rather than it being the correct choice. :-)
        
               | boomlinde wrote:
               | > I'm wondering what makes you feel confident about the
               | use of linked lists in all of those components.
               | 
               | Of all of those mentioned I literally looked up their
               | source repositories up and searched for obvious
               | indicators like "linked", "list", ".next" or "->next" and
               | then verified that I was indeed looking at one or more
               | linked lists. Where does your confidence come from? Oh
               | right, you already mentioned it: it's based on your
               | experience of the projects you've worked on.
               | 
               | The rest of your reply is just moving goalposts, mind
               | reading and a glaring category error. Get back if and
               | when you have something useful to add.
        
               | simonask wrote:
               | That's an incredible amount of effort to throw at an
               | argument on Hacker News.
        
               | tialaramex wrote:
               | Not really desperation, it's just easier. Sometimes this
               | is where perf doesn't matter, any choice would be fine, a
               | linked list of the up to 4 Doodads won't be meaningfully
               | worse or better than a growable array of the 4 Doodads,
               | or I dunno, a HashMap from index to each of the four
               | Doodads. Stop worrying about it and focus on the real
               | problem.
               | 
               | In larger C software sometimes a use of linked lists is
               | costing meaningful performance and a growable array type
               | would be a huge win - but the other side of the coin is
               | that sometimes in these perf critical environments a
               | linked list actually _is_ the right choice, and a C
               | programmer might luck into that since it was the first
               | tool to hand while say a C++ or Rust programmer might
               | take longer to realise that this data structure is the
               | best option.
        
       | esjeon wrote:
       | Hmph, is there any rationale against a generic wrapper over it?
       | Exposing the internal details can be irky in higher level
       | application development. It would not hurt to have some inline
       | functions.
        
       | MawKKe wrote:
       | Isn't the new one also horribly type-unsafe? Since you can put
       | (parent) objects of any kind into the list without any mechanism
       | to detect which is which?
        
       | mgaunard wrote:
       | There are several advantages to intrusive node-based data
       | structures that I haven't seen stated in the article or the
       | comments:
       | 
       | - the same object can move between containers with no allocation
       | and no need for a dedicated complex API
       | 
       | - the same object can be part of multiple containers at once;
       | particularly useful for intrusive binary trees, for indexing data
       | with different criteria.
       | 
       | - the container can be fully polymorphic, no need for all
       | elements to be the same dynamic type.
       | 
       | - no need for complex allocators, you can just store the objects
       | as you see fit.
        
         | rowanG077 wrote:
         | All of these sound like grievous sources of bugs to me. Sure
         | it's nice that it's possible but please don't do those things
         | unless there isn't a safer way to do it.
        
           | Diggsey wrote:
           | They are definitely more error-prone usage patterns, and
           | coming from Rust the amount of hidden unsafety is quite
           | unsettling (in addition to the obvious potential for user
           | errors, depending on the aliasing model of your
           | language/compiler, which I suspect Zig hasn't figured out
           | yet, this parent pointer trick could even lead to
           | miscompilations...)
           | 
           | Having said that, intrusive linked lists are quite an
           | important data structure in code which is required to be
           | allocation-free. Kernel code, signal handlers and code which
           | requires consistent performance (eg. audio processing) need
           | to be able to move objects between containers or have objects
           | in multiple containers without allocating.
        
             | bananabiscuit wrote:
             | Do you mean you will likely make an error when implementing
             | a standard textbook linked list? Or that a standard
             | textbook linked list has non-obvious errors in its design?
             | If so, would be curious to learn more.
        
               | im3w1l wrote:
               | _Implementing_ anything is more error prone than _using_
               | something battle-tested.
        
               | simonask wrote:
               | Most programmers can implement a linked list.
               | 
               | Extremely few can use one safely.
               | 
               | They're an elegant weapon for a more civilized age -
               | namely one without multithreading, aliasing-based
               | optimization, and a host of other nice things.
        
               | tauoverpi wrote:
               | Multithreading is exactly where linked lists are useful
               | as in the implementation of MPSC, MPMC, SPMC, and so on
               | it allows one to safely construct objects out of vire of
               | other threads then use a CAS to append the new item to
               | the list making it visible for other threads to consume
               | even without needing to allocate additional memory.
               | 
               | They're also pretty useful for free-list structures which
               | overwrite entries to be removed (they're scheduled for
               | deletion thus assumptiong here is that the object is now
               | dead) with linked list nodes that would fit in that space
               | thus never needing to allocate any additional as you can
               | reuse existing space.
        
               | simonask wrote:
               | Yeah, wait-free data structures often contain linked
               | lists. That doesn't cancel out all of the other problems.
               | 
               | Anecdotally, I've seen wait-free structures used
               | incorrectly many more times than I've seen them used
               | correctly. Some people think they are magically faster
               | because "acquiring a mutex lock is expensive", but that's
               | far from always the case.
        
               | codr7 wrote:
               | I don't see a problem with their safety, even in C.
               | 
               | More the opposite, their simplicity makes them safer to
               | deal with than the alternatives. And sometimes their
               | unique features (mostly the intrusive variant) are
               | exactly what you need.
               | 
               | Multi-threaded accesses to a vector would be just as bad?
        
             | rowanG077 wrote:
             | Use a separate data structure and then index into it with a
             | linked list like API. No need to do something so dangerous
             | if you simply need an allocation less linked list
             | interface.
             | 
             | But yes there is a place for these data structures. It just
             | shouldn't be common.
        
               | Diggsey wrote:
               | If you just mean using array indexes instead of pointers,
               | that can work in some cases, but can run into concurrency
               | issues when the container needs to be resized. With
               | pointers to nodes, new nodes can be allocated and old
               | nodes freed on a separate thread without interfering with
               | the performance sensitive thread, whereas resizing an
               | array requires all access to stop while the backing
               | memory is re-allocated.
               | 
               | I don't think you mean this, but elaborating on why a
               | non-intrusive linked list is not very useful, that's
               | because it takes linear time to find an object in a non-
               | intrusive linked list, but constant time in an intrusive
               | linked list.
               | 
               | For example:                   struct Node {
               | Node* prev_by_last_update;             Node*
               | next_by_last_update;             Node* prev_with_owner;
               | Node* next_with_owner;             Payload data;
               | }
               | 
               | This intrusive structure allows reassigning and updating
               | nodes, whilst being able to efficiently find eg. the next
               | most recently updated node, or to iterate over nodes with
               | the same owner.
        
             | throwawaymaths wrote:
             | Maybe advocate for wrappers in the stdlib to implement
             | typesafe. LL, typesafe extrusive LL, etc. Using this as a
             | primitive?
        
           | immibis wrote:
           | bugs* ironic
           | 
           | It depends on whether you approach programming from a
           | perspective of writing piles of obviously correct abstraction
           | layers (the total effect of which is to leak so much the
           | solution ends up being wrong anyway), or from the direct
           | unabstracted perspective of actually making the computer do
           | what you want it to do (which does not preclude the solution
           | from using abstractions).
           | 
           | I call them the Uncle Bob vs Casey Muratori school, after
           | Casey's lectures such as "Clean Code, Horrible Performance":
           | https://www.youtube.com/watch?v=tD5NrevFtbU
        
             | rowanG077 wrote:
             | Well uncle bob gives good advice. If you do exactly the
             | opposite of what he tells you to do. so I don't think this
             | comparison is apt at all.
        
               | immibis wrote:
               | Can you elaborate on why he's a hack and perhaps how this
               | relates to the idea of forcing yourself to use highly
               | constrained inefficient "safe" data structures?
        
               | rowanG077 wrote:
               | Because the excessive OOP and TDD he recommends is just
               | bad. He also hates static typing and seems to think
               | testing can somehow replace it.
               | 
               | I'm not for forcing inefficient safe data structures. You
               | are pretending that this is a dichotomy when the vast
               | majority of the time it isn't. You are completely
               | misinterpreting what I said. I'm for safe data
               | structures. And the data structures described are huge
               | footguns. I made no point about efficiency.
        
               | Ygg2 wrote:
               | In programming and life, both extremes can be equally
               | wrong. I'm not saying Uncle Bob is the shining beacon to
               | steer your boat towards at all cost, but neither is Casey
               | Muratori.
               | 
               | > He also hates static typing
               | 
               | I mean, in OOP there are many reasons to dislike it. It's
               | harder to extend and test. Calls to such functions within
               | other methods are a nuisance to remove. E.g.
               | `Instant.now()`.
        
               | rowanG077 wrote:
               | I'm sorry I'm not familiar with Casey Muratori. So I have
               | no clue what he says.
        
               | Ygg2 wrote:
               | He's very data and performance oriented. He gives advice
               | on how to write games, high-performance systems. His
               | advice is solid and great even, but not when your source
               | of latency is a cloud DB.
        
               | immibis wrote:
               | I beg to differ even on that. When your source of latency
               | is a cloud DB, learning about CPU caches won't help you,
               | but learning to think about end-to-end performance
               | absolutely will. "What sequence of queries achieves my
               | goal with the shortest critical path?" is very similar to
               | "What sequence of vector instructions achieves my goal
               | with the shortest loop-carried dependency?"
        
               | Ygg2 wrote:
               | > but learning to think about end-to-end performance
               | absolutely will.
               | 
               | In a cloud DB webservice what will save you is having a
               | good tracing/debugging story. From it, it will be easy to
               | see E2E performance. Sure, some things will translate,
               | but going SIMD optimized layout on a class, that will
               | have to talk to DB across the ocean is a fool's errands.
               | 
               | To optimize you must have a good target and make sure you
               | are measuring correctly.
        
               | immibis wrote:
               | Why do you think that making a minimum number of queries
               | to a cloud database is anything other than conceptually
               | similar to making a SIMD optimized layout on a class?
        
             | Ygg2 wrote:
             | > do what you want it to do
             | 
             | Until a colleague or future you, or someone else calling
             | the code, forgets.
             | 
             | Which leads to "I WANT TO KILL WHOEVER WROTE THIS"
             | debugging sessions.
             | 
             | Encapsulation exists for a reason.
        
           | tauoverpi wrote:
           | You don't need everything to be generic to have safety as the
           | API provided by the object using the linked list can maintain
           | safety by making it difficult to do the wrong thing and being
           | correct by construction. Of course that doesn't stop a user
           | of a library ignoring the safe way to handle types which use
           | intrusive linked list nodes but that same user is likely
           | going to shoot themselves in the foot regardless even when
           | given a "safer" interface.
           | 
           | Trying to write a safer version of the above without losing
           | the performance / advantages takes you into the realm of ATS2
           | https://www.cs.bu.edu/~hwxi/atslangweb/ which quickly becomes
           | more difficult to work with than practical (and no, rust
           | can't solve this with it's implementation of resource types
           | as far as I'm aware).
           | 
           | So yes, doing the above makes sense when the problem demands
           | it as allocating memory you don't need when there's a clear
           | way to represent a graph without it just wastes cycles on
           | memory management you didn't even need to do.
        
         | suspended_state wrote:
         | I think the main advantage of the intrusive definition is that
         | you can use it to implement the non intrusive one easily,
         | whereas the reverse isn't possible.
        
           | NobodyNada wrote:
           | With a non-intrusive linked list type, can't you define the
           | intrusive version as `Node<void>`?
        
         | MrCroxx wrote:
         | Agreed. Intrusive data structures are good for implementing
         | multi-container data structures with lower allocation overhead.
         | And they are widely used than expected. E.g. LRU is a classic
         | multi-container problem.
        
         | grandempire wrote:
         | Is there any generic implementation which is not intrusive? I
         | expect C++ forward_list to look like
         | 
         | struct Node<T> { Node<T> *next; T x; }
        
           | _huayra_ wrote:
           | At least in C++ land, that is not quite what is referred to
           | as intrusive lists. It's basically the opposite / inside-out
           | of what you wrote:
           | 
           | ```C++ struct MyType : Node<MyType> { Node<MyType> _next,_
           | prev; // rest of data }; ```
           | 
           | I usually reach for Boost.Intrusive when I need this [0].
           | 
           | [0] https://www.boost.org/doc/libs/1_88_0/doc/html/intrusive/
           | usa...
        
             | grandempire wrote:
             | I see. I am noticing the main difference is that
             | forward_list manages the lifetime and allocation of nodes
             | and that having a pointer to an object is equivalent to a
             | pointer to the node (could be implemented as a helper).
             | 
             | The layouts look the same?
        
               | _huayra_ wrote:
               | At the byte level, it's quite possible the layouts are
               | the same. However, an "intrusive data structure" means
               | that the nodes themselves are the data.
               | 
               | In other words, intrusive is like this:
               | 
               | struct T : NodeType<T> { // T data NodeType<T> _next,_
               | prev; };
               | 
               | whereas non-intrusive data structures the T's are _not_
               | the nodes themselves
               | 
               | struct NodeType<T> { NodeType<T> _next,_ prev; T t; };
               | 
               | Doing non-intrusive means you need to own the lifecycle
               | of the nodes (and make them copyable, move-constructible,
               | or some combo thereof). Intrusive means that the caller
               | can manage the nodes' lifecycles and just say "here's a
               | node, I promise it'll live while the list is still alive"
               | and have that node be on the stack, heap, static, etc.
        
         | cheezebubba wrote:
         | How can you have elements of different types? I don't
         | understand how you would know which type a given node was in?
        
           | throwawaymaths wrote:
           | it was maybe easier to understand when @fieldParentPtr took
           | the type as an argument, but if you look at this:
           | var x: *T = @fieldParentPtr("node", node_ptr);
           | 
           | @fieldParentPtr knows that its result must have type pointer
           | T; and then the builtin function "looks" inside of T and
           | notices there is a "node" field. Then it can backtrack the
           | pointer of T given the pointer to the node field, since the
           | layout of T is well-defined at compile time.
           | 
           | So you could have a mixed-type linked list, and at the
           | callsite zig can at _compile-time_ back out the parent type.
           | There obviously needs to be outside logic (and an outside
           | mechanism to track) this with branches for each type that you
           | could be putting into the linked list. And yes, _this is more
           | than a little bit type-unsafe because of the potential for
           | type erasure._
           | 
           | You could probably pretty trivially write a typesafe wrapper
           | for this, as suggested in one of the comments here.
        
             | cheezebubba wrote:
             | Thanks. I still don't understand where the bits live that
             | tell you which type T is for a given node.
             | 
             | Is this something like "list starts with type A, and that
             | will tell you if the next node is type A or B"?
             | 
             | Is there a way to stuff those bits in with the node somehow
             | so you always know the type you expect back?
        
               | throwawaymaths wrote:
               | > Thanks. I still don't understand where the bits live
               | that tell you which type T is for a given node.
               | 
               | Yup. You'd have to set that up yourself. It could be
               | anywhere. It could be in a separate array, it could be in
               | the previous node-data, etc.
               | 
               | you wouldn't put it in the Node (capital N) datastructure
               | though, that is privileged.
               | 
               | Personally, I would say doing something like this is "bad
               | idea bears" unless you are absolutely certain you need it
               | and you should explore using, say, data payload type with
               | a union field instead, if possible (though know that that
               | comes with a potential memory penalty)
        
               | cheezebubba wrote:
               | Aha Thanks!
        
         | paraboul wrote:
         | > the same object can be part of multiple containers at once
         | 
         | I'm not sure I understand this one. Since the object contains
         | the reference to where it belongs inside a container (e.g.
         | object.node.next) how can it be re-used in multiple containers.
         | Conversely, in a non-intrusive data structure, multiple
         | containers can hold a ref to the same object through an
         | intermediate node object
        
           | gnubison wrote:
           | You add multiple next variables. buffer.next,
           | buffer.next_displayed, etc
        
             | paraboul wrote:
             | That's not an advantage of intrusive data structures then.
             | That's precisely an advantage of _non-intrusive_ data
             | structure : object can be inserted in an arbitrary number
             | of containers
        
               | mikepurvis wrote:
               | But for the intrusive one you have the visibility into
               | its membership in the other structures, since you're
               | holding a reference to the object _and_ its intrusive
               | fields.
               | 
               | In the non-intrusive case, all you know is how you got
               | there, you don't have return info to the other
               | structures-- particularly annoying if you're trying to
               | destroy an object and have to speculatively look for
               | references to it in the other lists in order to get rid
               | of them.
        
               | paraboul wrote:
               | Of course. But I wouldn't qualify "multi-container use"
               | to be an advantage of intrusive data structures _per se_.
        
           | Zambyte wrote:
           | The object can contain multiple intrusive node fields.
        
         | boomlinde wrote:
         | > the same object can move between containers with no
         | allocation and no need for a dedicated complex API
         | 
         | This is true also of the previous design.
         | 
         | > no need for complex allocators, you can just store the
         | objects as you see fit.
         | 
         | Same here; the previous design leaves it up to the user.
        
       | kajika91 wrote:
       | Couldn't we have a function to return the data like this?
       | pub const SinglyLinkedList(comptime T: type) type {
       | return struct {           first: ?*Node = null,
       | pub const Node = struct {             next: ?*Node = null,
       | };                const Self = @This();            pub fn
       | data(self: Self) *T {             return @fieldParentPtr("node",
       | self);           }       };
        
         | nh1996 wrote:
         | This would require the SinglyLinkedList to be generic and would
         | require that the data struct use "node" as the field name.
         | Also, as some comments have pointed out, this type of linked
         | list can be useful when a data struct needs to be in multiple
         | lists, in which case there is no single "node" field.
         | const some_data = struct {         // Some data fields
         | // ...         bar_node: SinglyLinkedList.Node,
         | baz_node: SinglyLinkedList.Node,       };
        
           | throwawaymaths wrote:
           | sure, just make T == void fall back to the existing
           | implementation.
        
       | RcouF1uZ4gsC wrote:
       | Awesome! In my opinion, an intrusive linked-list is almost always
       | what you really want when you reach for a linked list. An
       | external linked list is often worse than a vector.
        
       | alexvitkov wrote:
       | Please just use a pointer. Use metaprogramming when it makes
       | sense, not to obfuscate something that was figured out 60 years
       | ago.
        
       | bmandale wrote:
       | A few notes I would make as an outsider:
       | 
       | - conventionally we put the `node' struct at the start of the
       | `user' struct. The advantage is that we can convert from node* to
       | user* with a simple cast instead of messing around with offsets.
       | Knowing the offset stuff is still potentially useful in case you
       | want multiple `node's in a single struct, but that is seldom the
       | case so you can almost always get by with just a cast.
       | 
       | - this strategy is, especially with the node at the start of the
       | user, equivalent to inheritance. I take it from the fine article
       | that zig doesn't have inheritance? If they are planning on adding
       | more interfaces like this linked list one, then inheritance would
       | be a good idea. We often treat inheritance using analogies to the
       | animal kingdom or to geometric shapes, but data structures like
       | linked lists are (to my understanding) the problem that
       | inheritance is actually designed to solve, and it is really the
       | best approach to solving problems like in the OP.
       | 
       | - just lol on zig speed running being a crappier C++.
        
       | RKFADU_UOFCCLEL wrote:
       | This is in fact a com.on technique in the Linux kernel:
       | https://en.wikipedia.org/wiki/Offsetof#Usage
        
       | zoogeny wrote:
       | > We give @fieldParentPtr the name of the field, a pointer to
       | that field as well as a return type (which is inferred above by
       | the assignment to a *User variable), and it gives us back the
       | instance that contains that field.
       | 
       | That seems like a refactoring nightmare. If I say:
       | const user: *NotAUser = @fieldParentPtr("node", n);
       | 
       | Will it calculate this incorrectly? I mean, if `NotAUser` has a
       | "node" field and `n` is actually from the `User` type (and it has
       | a different offset in each type)?
        
       | tpayet wrote:
       | I read Zig's new LinkedIn API, and I was intrigued
        
       | codr7 wrote:
       | Not super uncommon in C land, I learned the technique from the
       | Linux kernel.
       | 
       | https://github.com/codr7/hacktical-c/tree/main/list
       | 
       | Haven't come across another language (besides Zig maybe) low
       | level enough for it to make sense. I've tried in C++ and Go, but
       | it gets so ugly that it loses its appeal.
        
       ___________________________________________________________________
       (page generated 2025-04-16 17:02 UTC)