[HN Gopher] Zig's new LinkedList API (it's time to learn fieldPa...
___________________________________________________________________
Zig's new LinkedList API (it's time to learn fieldParentPtr)
Author : todsacerdoti
Score : 218 points
Date : 2025-04-14 10:06 UTC (2 days ago)
(HTM) web link (www.openmymind.net)
(TXT) w3m dump (www.openmymind.net)
| roetlich wrote:
| This looks somewhat horrifying. How do I safely write a function
| that takes a linked list as a parameter?
| the_mitsuhiko wrote:
| > How do I safely write a function that takes a linked list as
| a parameter?
|
| Zig does not have a lot of generic code. You would pass the
| user directly and then walk the list or you use comptime. The
| real answer is that "you don't write code like that in Zig".
| reissbaker wrote:
| Assuming you mean "how would I safely write a function that
| takes a generic linked list and does something with the data,"
| I'm pretty sure you would use comptime: your function would
| take the concrete type (say, User) as a comptime parameter, and
| then you would do your list stuff via accessing .node and use
| the fieldParentPtr to access the underlying data.
|
| Syntactically I don't think it's _that_ weird, TBH. And it 's
| typesafe: if you write invalid code the compiler will give you
| an error.
| fpoling wrote:
| But how does the generic code know the name by which the
| field is known in the struct that contains the field? Is it
| supposed to be passed as an extra parameter to the generic
| function?
| reissbaker wrote:
| Yup, you could pass it in at comptime as well! And access
| via @field.
| spiffyk wrote:
| I think the more important question here is: What
| operations would you possibly want to do with a linked list
| that are generic _and_ at the same time do something with
| the concrete data in it?
|
| To me these complaints sound like hypotheticals with no
| sound grounding in the real world. If there indeed _is_
| data that is common to the various types you would want to
| operate upon in such linked lists, you would nest. E.g. you
| would have some common Entity struct containing the common
| data and the LinkedList.Node; this Entity would then be
| inside your more concrete structs. The generic function
| would then take a pointer to Entity.
| reissbaker wrote:
| I think a pretty good example would be: how do you write
| a map function? Or filter, find, etc.
|
| (You would do it with more comptime, but the question is
| legitimate!)
| spiffyk wrote:
| > how do you write a map function? Or filter, find, etc.
|
| You just don't. Zig does not have lambdas anyway, there's
| no readability incentive to having such functions there.
| You do these things with plain old loops built into the
| language.
| reissbaker wrote:
| Zig has first-class functions (you can pass functions as
| parameters); it just doesn't have closures. Map pretty
| rarely uses closures anyway IME; e.g. converting a list
| to JSON doesn't need to close over outside variables. And
| anyway, anything that's:
|
| 1. Generic over lists, and
|
| 2. Takes a function as a parameter
|
| Will want to know what the node field name is. Luckily,
| comptime provides a solution there.
|
| TBH I think "you just don't" is a pretty unsatisfying
| answer to "how do I use these features that are built
| into Zig" -- especially when you can use them, very
| easily.
| spiffyk wrote:
| I mean yes, you can pass functions as parameters in Zig,
| you can even define them inline if you wrap them in
| anonymous structs, but my point is that -- in a language
| where this is supported _somewhat properly_ -- you would
| do this to improve readability with some sort of a fluent
| API. If you attempt to do it in status-quo Zig,
| readability will only suffer and you are still much
| better off doing it with for and /or while loops
| (depending on the data structure), especially when Zig
| has nice syntax sugar with payloads in both to support
| these.
|
| edit: formatting
| Zambyte wrote:
| Zig definitely encourages an imperative style by design,
| but it does support function pointers, which can be used
| to implement a map function. I do think passing the field
| name of the node as a comptime []const u8 as an extra
| argument to map, filter, etc. would be the only way to
| implement a generic version of these functions.
|
| The fact that these functions are already designed to be
| cumbersome to implement, I think that's fine? I have also
| yet to use a linked list in Zig anyways. It's probably
| better to use an array, slice, std.BoundedArray,
| std.ArrayList, std.SegmentedList, or std.MultiArrayList
| unless there is a specific reason that a linked list is
| the best option.
| roetlich wrote:
| Okay, thanks for explaining more.
|
| > And it's typesafe: if you write invalid code the compiler
| will give you an error.
|
| One more question, if @fieldParentPtr("node", n) is typesafe
| and can get you a User, the compiler needs to know that the
| struct has the fields of a User, and that this pointer is
| indeed pointing at a node field, right? Then why do you need
| to specify "node" at all?
|
| I think I don't understand Zig at all :)
| messe wrote:
| What if the User is in multiple linked lists?
| const User = struct { username: []const u8,
| id: u128, age: u7, sorted_by_age:
| std.SinglyLinkedList.Node, sorted_by_id:
| std.SinglyLinkedList.Node, };
| boomlinde wrote:
| @fieldParentPtr is not type safe. It assumes that the given
| pointer is to a field of the given name in the inferred type.
| Zig can detect at compile time whether the inferred type has
| a field of that name of the same type as the pointer child.
| So far, so good: type safe.
|
| The lack of safety stems from the fact it doesn't know that
| the assumption that the pointer has that parent is true.
| Here's a very simple illustration. It'll compile.
| const T = struct { field: i32 = 0, };
| pub fn main() void { var nonfield: i32 = 0;
| _ = @as(*T, @fieldParentPtr("field", &nonfield)); }
|
| `nonfield` doesn't have a parent T. The use of
| @fieldParentPtr here causes what the official reference calls
| "unchecked illegal behavior"; it isn't checked even in the
| safe build modes.
| grayhatter wrote:
| > The use of @fieldParentPtr here [...]
|
| the *hypothetical* use here...
|
| This simple example as written is actually correct, and not
| actually unchecked illegal behavior. But this is just a bit
| of pedantry, because it would be unchecked illegal if you
| use this in a more complicated example.
|
| I mention it because it's important to note, it can be done
| correctly, as you obviously just did it correctly, by
| accident. That said, I still agree, there're no guardrails
| on this pattern.
| boomlinde wrote:
| > the _hypothetical_ use here...
|
| No, it's not hypothetical. I just did it, which
| demonstrates that it's not type safe and that the
| compiler won't give you an error if you write invalid
| code, which is what the post I responded to claimed.
|
| > This simple example as written is actually correct
|
| No. The official reference is clear about this: "If
| field_ptr does not point to the field_name field of an
| instance of the result type, and the result type has ill-
| defined layout, invokes unchecked Illegal Behavior."
|
| Because there is no well-defined layout for plain structs
| in Zig, they satisfy the "ill-defined layout"
| requirement. Merely invoking the @fieldParentPtr on a
| pointer that doesn't satisfy the other requirement in
| that case invokes unchecked illegal behavior according to
| the language reference.
|
| > I mention it because it's important to note, it can be
| done correctly,
|
| Ignoring that you can't seem to get the facts straight,
| the claim I am responding to is that it's "type safe" and
| that "if you write invalid code the compiler will give
| you an error". This is not the case, regardless of
| whether my example invokes unchecked illegal behavior
| (which it does) or not.
|
| I'm very clearly _not_ saying that it _can 't_ be done
| correctly and I don't think you can argue for that in
| good faith.
| grayhatter wrote:
| > No, it's not hypothetical. I just did it, which
| demonstrates that it's not type safe and that the
| compiler won't give you an error if you write invalid
| code, which is what the post I responded to claimed.
|
| Describe explicitly how your example will trigger illegal
| behavior? Happy to discuss the behavior of the compiled
| asm if you'd like to be that pendantic?
|
| Not type safe, and contains an appreciable risk or defect
| are two distinct concerns. One matters less than the
| other.
|
| > Because there is no well-defined layout for plain
| structs in Zig, they satisfy the "ill-defined layout"
| requirement.
|
| No? Struts have a defined layout, they're not guaranteed
| between compilations, but they don't change within a
| compilation unit.
|
| > Merely invoking the @fieldParentPtr on a pointer that
| doesn't satisfy the other requirement in that case
| invokes unchecked illegal behavior according to the
| language reference.
|
| And the failure mode is the compiler will decide to
| delete your OS?
|
| Unchecked illegal behavior isn't magic, doing something
| similar to illegal behavior doesn't mean the code will or
| won't do what you intended. uint rollover is unchecked
| illegal behavior (when not explict, in release fast) but
| just because the uint might roll over doesn't make that
| code invalid.
|
| > I'm very clearly not saying that it can't be done
| correctly and I don't think you can argue for that in
| good faith.
|
| I thought you were saying that it's guaranteed to be
| incorrect. In fact, I think that's what you said in your
| reply to me. Mostly because you used the word 'mearly'.
|
| The way that I understood what you meant was, the example
| you provided was guaranteed wrong. In fact it's actually
| correct. It will have the same semantics and behavior as
| what I assume was desired.
|
| > Ignoring that you can't seem to get the facts straight
| [...] and I don't think you can argue for that in good
| faith.
|
| I'm sorry if I was unclear, I was trying to discuss it in
| good faith. :(
| boomlinde wrote:
| > Describe explicitly how your example will trigger
| illegal behavior? Happy to discuss the behavior of the
| compiled asm if you'd like to be that pendantic?
|
| Through calling @fieldParentPtr with a field_ptr not
| pointing at a field of the given field name in the result
| type, when the result type has an ill-defined memory
| layout. I'm essentially just paraphrasing the
| documentation, which I've already quoted.
|
| Generated code is irrelevant to this end. Illegal
| behavior can coincidentally result in code that works as
| intended in practice as a side effect of the
| implementation. Similarly you may expect a signed integer
| to wrap around as it overflows in C because of the
| implementation of the compiler and the code it generates,
| but it's still undefined behavior.
|
| > No? Struts have a defined layout, they're not
| guaranteed between compilations, but they don't change
| within a compilation unit.
|
| This is not the sense in which the Zig language reference
| uses the term well-defined memory layout. Bare structs,
| error unions, slices and optionals don't have a well-
| defined memory layout in the sense the Zig language
| reference uses it.
|
| > And the failure mode is the compiler will decide to
| delete your OS?
|
| The failure mode is irrelevant. Unchecked illegal
| behavior is unchecked illegal behavior regardless of the
| failure mode. You are moving goalposts now.
|
| > I thought you were saying that it's guaranteed to be
| incorrect. In fact, I think that's what you said in your
| reply to me. Mostly because you used the word 'mearly'.
|
| The example I posted is guaranteed to be incorrect, which
| demonstrates that the use of @fieldParentPtr can result
| in unchecked illegal behavior, hence not type safe nor
| giving you an error when you write invalid code.
| Regarding the use of "merely", you should adopt the habit
| of reading complete sentences before you draw conclusions
| about what they imply.
|
| > The way that I understood what you meant was, the
| example you provided was guaranteed wrong.
|
| The example is guaranteed to cause unchecked illegal
| behavior.
|
| > I'm sorry if I was unclear, I was trying to discuss it
| in good faith. :(
|
| If you want to argue in good faith, you can start with a
| good faith reading of the language reference.
| GolDDranks wrote:
| If you write a function that takes a _generic_ linked list as a
| parameter, you'd have a function that refers to just the linked
| list subrecord, and does only linked list operations to that
| and the linked nodes.
|
| If you want to write a function that takes a linked list of a
| specific type as a parameter, you just take in a value of that
| type. The linked list is baked in, so you can get to the other
| nodes, and because the type is known, you can get back to the
| parent type from the linked nodes with fieldParentPtr. How to
| do that _safely_? I don't think that Zig embraces any Rust-like
| safe/unsafe dichotomy, so you don't.
| flohofwoe wrote:
| Since you don't need to care about the 'outer type' in that
| case you just pass a pointer to the linked list header or a
| linked list node and that's it.
|
| If you need to access the outer type, just pass a pointer to
| that type (since your functions need to know the outer type
| anyway I don't think there's a need to reach for generics).
| skywal_l wrote:
| You have to pass the field name too ("node").
| flohofwoe wrote:
| Not really. If you only want to manipulate the list, you
| only need a direct pointer to the nested node but don't
| need to know the 'outer type'.
|
| Only if the function takes a pointer to the outer type it
| needs to know how to get the pointer to the embedded node
| struct from the item pointer.
|
| ...I guess it makes a lot more sense if you ever wrote code
| for AmigaOS ;)
| kllrnohj wrote:
| You don't & generally shouldn't be in the first place, in any
| language. Linked lists are a _very_ niche data structure, so
| generic code should ~never be using them. So it 's a moot
| question. It's kinda like the complaints about how hard a
| doubly linked list is in Rust - it's just not important because
| it's not something you should be using 99.999% of the time
| anyway.
| roetlich wrote:
| Well I personally almost never use linked lists, don't like
| them. But Zig seems to think otherwise, they have two
| implementations in the standard library.
|
| I don't know much Zig, I just wanted to ask how to use the
| type that the article talks about?
|
| Let's use the very simple example from the article. Let's say
| I want to extract this code into a function:
| while (node) |n| { const user: *User =
| @fieldParentPtr("node", n);
| std.debug.print("{any}\n", .{user}); node = n.next;
| }
|
| 1. How does that look, what's the type signature of this
| function? 2. What happens if I put in a list doesn't contain
| users? Do I get a simple to understand compile time error, or
| can I segfault because I'm accessing bad memory?
|
| And I don't think that would need any generics, since the
| list type isn't generic, right?
| boomlinde wrote:
| 1. If you mean @fieldParentPtr, the first argument is a
| compile time-known name of a field, the second argument is
| a pointer to a field of that name in the parent type, which
| is inferred from the left hand side of the assignment
|
| 2. Then you're screwed. In Zig, using @fieldParentPtr on a
| field that doesn't have the given parent is unchecked
| illegal behavior, meaning that there are no checks that can
| catch it at compile time. This change basically guarantees
| that there will be a pretty serious foot gun every time you
| iterate over the items of a standard library list.
| roetlich wrote:
| Thank you, that answers all my questions.
| throwawaymaths wrote:
| Is 2 really an issue? Just how often are you going to
| have multiple Node types going on in your codebase?
| grayhatter wrote:
| A linked list is just a simpler version of both a graph and a
| tree. What data structure would you suggest to represent
| both/either of those?
|
| Or are you asserting, because you've never used them they're
| not common? Because while maybe I agree, and I don't often
| reach for a linked list, I've built plenty of trees and
| graphs.
| kllrnohj wrote:
| Trees and graphs serve useful roles, of course, but calling
| a linked list just a simpler version is a stretch. It'd be
| like calling an array a simplified B-tree. The
| simplification changes not just the implementation but also
| radically changes the applicability of the result.
| grayhatter wrote:
| except a tree or graph with multiple pointers is similar
| to linked list with one pointer.
|
| where as a b-tree stored in an array without pointers
| probably shouldn't be called a b-tree.
|
| or am I missing something?
| AndyKelley wrote:
| Two points here:
|
| Linked lists are useful in unsafe code. Most recent use case I
| had for them was in an event loop with coroutines. It's not
| possible to implement such thing in memory safe languages. For
| example if you use Rust, you have to use unsafe [1].
|
| @fieldParentPtr does not yet have safety but it is a planned
| upcoming change to the language, with a fairly straightforward
| implementation [2].
|
| [1]: https://github.com/search?q=repo%3Atokio-
| rs%2Ftokio%20unsafe...
|
| [2]: https://github.com/ziglang/zig/issues/2414
| roetlich wrote:
| Oh, that makes more sense. Thank you!
|
| Is this linked list type then mostly meant to be used as an
| internal implementation detail, and not be exposed in a
| public API? Because if I see a function that takes a non-
| generic list I won't really know how to call it. :)
| AndyKelley wrote:
| Linked lists are sometimes the right tool for the job. They
| are available in the standard library for such cases. If
| you are struggling to understand how you would use this API
| then it probably means you have not yet been exposed to
| such a use case, and you are being quite intentionally
| steered away from using them.
| roetlich wrote:
| Fair, thanks.
| spiffyk wrote:
| For better or worse, the Zig community has long been embracing
| the `@fieldParentPtr` builtin as the primary way to do things
| like OOP-ish inheritance, so this change feels pretty in-
| character for the language to me.
| Zambyte wrote:
| Really? I did a quick grep in ghostty (160k lines of Zig) and
| found four total references to @fieldParentPtr. I'm still a Zig
| baby, but I have not seen very many references to
| @fieldParentPtr at all.
| spiffyk wrote:
| I would expect the inheritance pattern to be quite rare in
| performance-aware software design overall, since such
| constructs tend to be rather cache-unfriendly. So I would
| attribute the low number of references to inheritance being
| rare, rather than @fieldParentPtr not being used for
| inheritance.
| steventhedev wrote:
| This feels like a net negative result. It removes some of the
| complexity of using generics, but it couples between the data
| type and the collections it can be indexed by.
|
| What are the benefits of this approach? Is it limited to data
| alignment, or is it out of a greater desire to remove generics?
| reissbaker wrote:
| Zig has no desire to remove comptime AFAIK (comptime being the
| larger Zig feature by which people implement generics in the
| language) -- comptime is pretty much _the_ reason to use Zig.
|
| The benefits of intrusive linked lists are higher performance;
| you can use comptime to still have decoupled code.
| kevin_thibedeau wrote:
| C++20 has consteval to do the same thing.
| surajrmal wrote:
| The generic approach should be the same performance. This
| approach just lets you place your data in multiple lists
| without needing multiple allocations.
| reissbaker wrote:
| No, the generic approach requires your data to be spaced
| out further in memory (or to be heap allocated), which
| causes CPU cache misses and is slower. The entire reason
| for intrusive linked lists is performance. Standard linked
| lists are notoriously slow compared to almost any other
| similar data structure, which is why they are hardly ever
| used in real code (ArrayLists and vectors are much more
| common).
| foldr wrote:
| Maybe it requires this in Zig (I don't know), but in
| general there's no reason why you couldn't allocate the
| nodes of an extrusive linked list from a pool residing on
| the heap or on the stack. For example, you could do this
| with the STL (for all that STL allocators are a pain to
| use in practice). Or you could have a slightly different
| API where you add nodes to the list by passing an entire
| Node<T> to the relevant function rather than just T, at
| which point you can trivially allocate the nodes as you
| please.
| flohofwoe wrote:
| It's not at all unusual for intrusive linked lists though?
|
| On AmigaOS (which is entirely built out of intrusive doubly
| linked list) the list node is placed at the start of a struct,
| so the pointer to the node is also the pointer to the linked
| item. There's also no 'coupling' because the list manipulation
| functions take pointers to the list node structs, but they
| don't need to know the 'outer' item struct.
|
| Zig's @fieldParentPtr is more flexible since the node struct
| can be anywhere in the parent struct and you still can get from
| the node pointer to the item base pointer (and more
| importantly, it makes it trivial to link the same item into
| multiple linked lists).
| Joker_vD wrote:
| It's not unusual at all, it's just... should it be exposed? I
| personally prefer having "Node struct with next pointer +
| inlined generic data" design when it can be encapsulated away
| since, well, it can be encapsulated away, and the result data
| layout is the same. And when you expose this design, well,
| you end with all sorts of disasters, including OOP
| inheritance (no, really: [0]).
|
| [0] https://catern.com/inheritance.html
| flohofwoe wrote:
| If the node embeds the data item you can't link the item
| into multiple linked lists, but if the data item embeds the
| node(s) suddenly that's trivial.
|
| I don't know if this was the motiviation for the design
| change though, but IMHO intrusive linked list are the
| 'proper' linked list design, and C++-style "nodes with
| payload" are a weird outlier.
|
| Another reason might be that generic code leads to a lot of
| code duplication in the binary (which may or may not be
| optimized away by the compiler though - but at the cost of
| increased build times though).
| lelanthran wrote:
| > If the node embeds the data item you can't link the
| item into multiple linked lists, but if the data item
| embeds the node(s) suddenly that's trivial.
|
| Perhaps I'm missing something, but isn't it the other way
| around? In C, this is what I'd expect for the intrusive
| and extrusive: // Intrusive
| struct user_data_t { struct user_data_t *next;
| struct user_data_t *prev; // The object's
| fields }; // Extrusive
| struct node_t { struct node_t *next;
| struct node_t *prev; struct user_data_t
| *payload; };
|
| The intrusive one obviously can't be shared between
| lists.
| flohofwoe wrote:
| No, intrusive would be (note: no pointers in
| user_data_t): typedef struct { node_t*
| next } node_t; typedef struct { ... }
| payload_t; typedef struct {
| node_t node; payload_t payload; }
| user_data_t;
|
| ...and if you want user_data_t to be included into
| multiple lists: typedef struct {
| node_t list1_node; node_t list2_node;
| node_t list3_node; payload_t payload;
| } user_data_t;
|
| ...of course in C now it gets tricky to get to the start
| of user_data_t given a pointer to `list3_node`, but
| that's where @fieldParentPtr comes in.
|
| The advantage versus your extrusive example is that the
| payload doesn't need to be referenced through a pointer,
| which drastically simplifies lifetime tracking / memory
| management.
| lelanthran wrote:
| > No, intrusive would be (note: no pointers in
| user_data_t): typedef struct { node_t*
| next } node_t; typedef struct { ... }
| payload_t; typedef struct {
| node_t node; payload_t payload; }
| user_data_t;
|
| That's exactly the same as my intrusive structure, no?
|
| > typedef struct { node_t list1_node; node_t list2_node;
| node_t list3_node; payload_t payload; } user_data_t;
|
| In C, at any rate, that doesn't give you "inclusion into
| multiple lists". It gives you "inclusion into at most 3
| lists. The extrusive example I posted gives "inclusion
| into multiple lists".
|
| So, yeah, I'm still not seeing your point.
| Zambyte wrote:
| > That's exactly the same as my intrusive structure, no?
|
| It's not, for the reason they put in parentheses:
|
| > note: no pointers in user_data_t
|
| An allocation of user_data_t also allocates space for
| payload_t, rather than just allocating space for a
| pointer to payload_t. Your structure requires an
| additional allocation to point the payload_t* at
| something. The fact that they hid the next node_t* in a
| struct doesn't matter though.
| lelanthran wrote:
| > It's not, for the reason they put in parentheses:
|
| >> note: no pointers in user_data_t
|
| I feel like I am taking crazy pills: where, in my
| intrusive example, are pointers in the `user_data_t`?
|
| My intrusive code sample had no pointers for the
| user_data_t. It's exactly the same as GP's intrusive
| example.
| Zambyte wrote:
| It is not you that is crazy, I have been exhausted. My
| bad.
| sph wrote:
| Your link about inheritance is worth its own post.
|
| EDIT: Previously:
| https://news.ycombinator.com/item?id=26988839
| codr7 wrote:
| This approach is also common in C:
|
| https://github.com/codr7/hacktical-c/tree/main/list
|
| One pretty big advantage is you get some help from the
| compiler making sure you mean what you say. As opposed to
| just blindly casting items and assuming no one forgot to put
| the list in their struct. Also the possibility to be part of
| multiple lists.
| GolDDranks wrote:
| I think it matches really well for the Zig ethos of "artisanal,
| minimal code". More than often, a type is strongly tied to how
| it is being used in the code base. In that sense, having it be
| tightly coupled to a data structure isn't much of a problem.
| The opposite isn't true, and the data structure is independent
| of the embedding parent data. The implementation of that data
| structure itself can still be presented as a library, and the
| "generic" parts carefully tested.
| yccs27 wrote:
| You could create a comptime function that adds the Node field
| to any type. I guess then you've arrived back at the previous
| generic version.
| Someone wrote:
| The only logical explanation I can see is that these are two
| decisions:
|
| - linked lists aren't useful on modern systems because
| traversing them causes to many cache misses. Therefore, we
| shouldn't provide such a simple implementation.
|
| - but we need one in low level OS code, and there, intrusive
| lists are preferred. Their limitation that you cannot store an
| object in multiple lists isn't a problem, and the extra speed
| and the fact that you can move items between lists without
| allocations is desired.
|
| I don't see why the intrusive implementation has to be so
| basic, though. Can't you, in Zig, express that a generic type T
| used in a generic list class has to have a nullable _next_
| field that points to a T?
| throwawaymaths wrote:
| > linked lists aren't useful on modern systems because
| traversing them causes to many cache misses
|
| Only if you "default to using" (and only use) malloc. Zig
| encourages you to use different allocators within the same
| program for different use cases, including, potentially
| allocators which will thrash the cache far less (for example
| thread local arenas).
| whizzter wrote:
| Intrusive linked lists as this is called removes unnecessary
| allocations and traversal (main reason why linked lists have
| such a horrible reputation), say a hypothetical example where
| you have a temporary listener object installed that listens to
| broadcasts on channel X and times out after 5 minutes.
|
| Upon connecting the object is connected to the channel X linked
| list as well as some other list of objects that are killed at
| the timestamp 5 minutes in the future.
|
| With an intrusive linked list the link-node resides within the
| object, the only thing needed when installing is linking it
| into the 2 lists (this is a few move-operations), an external
| linked list would _require 2 extra allocations_ for the linked-
| list nodes.
|
| Broadcasting from X is almost the same since it's a simple
| traversal, albeit with double the cache pressure since the
| object and the linked-list node probably lives separately.
|
| The real HORROR comes when disconnecting, if channel X has
| millions of listeners it could become a real bottleneck to
| traverse the linked list to find the link-node that connects
| the actual object since there could be tons of jumping around
| memory. An intrusive linked list would just disconnect itself
| if it's doubly-linked.
|
| This is why hashsets/tables,vectors/arraylists,etc are often
| used in practice (because many "modern" OO languages never
| added good support for the machinery needed here) making linked
| lists look quite bad (there is other badness but using non-
| intrusive linked lists is almost always worse than using
| something else than a linked list altogether).
| steventhedev wrote:
| The generic version in TFA puts the data type allocated
| alongside the next pointer - no additional allocation needed.
| The only functional difference is if the zig compiler is not
| sufficiently advanced to understand it can reorder the fields
| (hence the alignment question).
|
| The removal scenario is merely specifying that you are
| passing in ConnectionListNode instead of a Connection.
| Although maybe it's a good idea to think about how they
| compose comparatively.
| DeathArrow wrote:
| I wonder why should a language implement in its libraries a
| SingleLinkedList and a DoubleLinkedList.
|
| I do get why you need an array-like list, a dictionary/hashmap, a
| stack and a queue. I got the impression that linked lists aren't
| used very often. Or maybe it's a different case with Zig?
| messe wrote:
| Maybe because it's used elsewhere in the standard library?
|
| SinglyLinkedList is used in the standard library in
| std.Thread.Pool, and std.heap.ArenaAllocator. DoublyLinkedList
| is used in the http client's connection pool, as well as the
| ObjectCache for the package manager's git client.
| boomlinde wrote:
| > I got the impression that linked lists aren't used very
| often.
|
| I did some cursory research on the software I've been using
| today and presented the results here:
| https://news.ycombinator.com/item?id=43684121
|
| I think there is a misconception around linked lists mostly
| stemming from the fact that they're not great for cache
| locality. Someone presents the idea that linked lists under
| common circumstances are unintuitively slow and therefore
| should be considered carefully for performance sensitive
| applications. A game of Chinese whispers immediately forms
| around that idea and what comes out the other end is a new
| idea: that linked lists are bad, shouldn't be used and aren't
| used. Meanwhile, they continue to be used extensively in system
| software.
| bsder wrote:
| Linked lists get a bad reputation because they wound up being
| a default data structure--and that's almost always a mistake
| on modern computers.
|
| Back in days of yore, small memory and CPU meant that linked
| lists and arrays were really the only effective data
| structures. Allocating and computation were super expensive,
| so following a link or having an index is really the only
| thing that works.
|
| On today's computers, CPU is so much faster than memory that
| it is almost always better to recompute a value than try to
| look it up. Locality is so important that linear actions
| trump asymptotic complexity until you get quite a lot more
| elements than you might expect. Memory is so cheap that it is
| worth exponentially overallocating in order to amortize
| complexity. etc.
|
| At the end of the day with modern programming, you should
| almost always be reaching for something simpler than a linked
| list (array/vector) or you should be reaching for an
| abstraction that is more complex than a linked list
| (hash/dict, queue/deque, etc.).
|
| With modern CPUs, if you are using a linked list for anything
| other than building a more complex data structure, you're
| probably making an incorrect choice on almost all fronts.
| Validark wrote:
| > Locality is so important that linear actions trump
| asymptotic complexity until you get quite a lot more
| elements than you might expect.
|
| The reverse of this can be true. In the following paper,
| they found that William's Heap Construction (one by one,
| O(n log n) time) beat Floyd's Heap Construction (O(n) time)
| when the input is larger than the size of off-chip cache.
|
| https://doi.org/10.1145/351827.384257
| xxs wrote:
| >actions trump asymptotic complexity
|
| Which operation of a linked list has any better complexity.
| The only one I can think of is a frequent removal of random
| elements in the middle -- and opt not to use tombstones.
| xxs wrote:
| Linked lists have higher constant const memory wise, much
| slower to iterate, effectively removals of a random in the
| middle are the same as vector. When used as queues the ring
| buffers are better for most applications.
|
| Their prime use is that adding new elements lack a terrible
| worst case, aside the fact it puts more pressure on the
| memory allocator - need to be concurrent friendly.
|
| If there is a full control of the memory allocation (e.g.
| kernels) linked lists make sense, for virtually all other
| cases there are better data structures.
| boomlinde wrote:
| I'd first like to point out that yours is an argument from
| the perspective that performance is crucial and dependent
| on the use or non-use of linked lists. This is not
| necessarily the case. Performance may not be crucial, and
| even when it is, linked lists can find use in areas where
| they're not even nearly the bottleneck, for being the most
| obvious data structure that satisfies the required
| operations on the collection. That pointer chasing is slow
| is not an argument against the use of linked lists in
| general.
|
| What if I only ever iterate over a list of some hundred
| entries in the rare case of an error? Should I be reaching
| for the most performant data structure or the one that most
| obviously and clearly maps to the operations the program
| needs to perform on it at some infinitesimal cost?
|
| > Linked lists have higher constant const memory wise
|
| Than a fixed vector with a known maximum number of
| elements, yes. With something like a dynamic array you are
| however usually better off using a linked list memory-wise,
| unless you want to reallocate every few insertions or your
| node data is smaller than the link.
|
| > effectively removals of a random in the middle are the
| same as vector.
|
| You're conflating concepts. Seeking is O(n). Removal is
| O(1). There are cases where seeking for the purpose of
| removal is either amortized (e.g. you're already iterating
| over all the nodes, executing an associated handler for
| each which may or may not determine that the node should be
| unlinked in-place) or completely unnecessary (some other
| data structure already tracks a reference to the node and
| some code associated with that determines if/when it is
| removed).
|
| One might similarly argue that I am conflating compacting
| (O(n)) with removal (O(1)) in the vector case, but in the
| case of linked lists, the difference is just a natural
| consequence of design, while for a typical implementation
| of vector operations they are the same operation and
| require an additional data structure to be anything but
| that in order to track the holes.
|
| Moving elements around in memory also comes with a host of
| other issues, for example if there are many references to
| the elements, requiring maintaining and indirectly
| accessing memory via an index array.
|
| > aside the fact it puts more pressure on the memory
| allocator - need to be concurrent friendly.
|
| This is again a conflation of concepts: that of using
| linked lists and that of making space for the nodes or
| preparing an allocator for concurrent application. The
| nodes could be on the stack for some applications, in a
| static pool for others.
| bsder wrote:
| What do you think is the data structure underneath a hash,
| stack or queue? It's likely a linked list.
|
| So, you are completely correct. The issue is simply that you
| are at a different level of abstraction.
| ralferoo wrote:
| I don't use Zig, but one advantage of having a genericised
| intrusive linked list where the next pointer doesn't have to be
| the first thing in the structure is when you want to use larger
| types, such as 128-bit fields. Sticking a pointer at the
| beginning would mean the compiler would have to insert alignment
| padding after the pointer or break the default alignment.
| jwmerrill wrote:
| The next pointer doesn't have to go first in the structure
| here. It can go anywhere, and you can use @fieldParentPtr to go
| back from a reference to the embedded node to a reference to
| the structure.
| n42 wrote:
| I think that's the point the parent was making. I read it as
| them describing a benefit of Zig's approach.
| ralferoo wrote:
| Yeah, that's what I was trying to say, but obviously not
| clearly enough.
| jwmerrill wrote:
| My mistake! It seems clear in hindsight...
| mastax wrote:
| > The new version isn't generic. Rather, you embed the linked
| list node with your data. This is known as an intrusive linked
| list and tends to perform better and require fewer allocations.
|
| I don't get this. In the generic version the data is embedded
| with the linked list node, so there's only one allocation per
| node. 'T' isn't a pointer (or doesn't have to be) there's no
| indirection between data and node.
| CBLT wrote:
| > only one allocation per node
|
| I believe the implication is there's fewer than one allocation
| per node with the new API. You allocate contiguous memory once,
| and use it to store n elements.
| flowerthoughts wrote:
| No, that's not stated.
|
| > The new version isn't generic. Rather, you embed the linked
| list node with your data. This is known as an intrusive
| linked list and tends to perform better and require fewer
| allocations. Except in trivial examples, the data that we
| store in a linked list is typically stored on the heap.
| Because an intrusive linked list has the linked list node
| embedded in the data, it doesn't need its own allocation.
|
| Is simply a misunderstanding. The storage layout will be the
| same for the generic and the intrusive one.
|
| The benefit of intrusive linked lists is that each node can
| be a member of several linked lists with a single allocation
| per node. This is used heavily e.g. in the Linux kernel.
|
| The cost is that you need to keep track of the offset from
| object address to which linked list pointer you're currently
| dealing with. That's often a compile time constant, but makes
| the API more awkward. In this case it seems to be the string
| "node", which seems nice enough. C libraries often use the
| preprocessor to do something similar.
| mastax wrote:
| new Node<TStruct>[16]; new
| TStructContainingNode[16];
|
| What's the difference?
| ulbu wrote:
| there's 16 contiguously stored pointers to 16 non-
| contiguously stored tstructs (they may be contiguously
| stored, but you can't make this assumption from this type).
| there's 16 contiguously stored tstructcontainingnode's.
| torginus wrote:
| I remember reading an article about this technique - it was
| used in the original Starcraft. The idea here, is that the
| object needs to be allocated anyway by someone else, so you get
| the linked list for free.
|
| An object can also be part of multiple lists, each used by a
| different subsystem with its own concerns.
|
| Deleting an item involves unlinking it from all lists (though
| that requires a doubly-linked list)
|
| Edit: found the article
|
| https://www.codeofhonor.com/blog/tough-times-on-the-road-to-...
| jamiejquinn wrote:
| AFAIK it's also a structure heavily used by Nethack 3 to
| manage its astounding complexity. Modern, complex roguelikes
| seem to be using ECS as an alternative but I think they still
| have their uses.
| ajross wrote:
| > I don't get this. In the generic version the data is embedded
| with the linked list node, so there's only one allocation per
| node.
|
| The reason is that the object being placed "in" the list can
| have a longer lifetime than the list node, in fact that's
| generally the case. Like, I work on Zephyr and in Zephyr we
| have "threads", whose tracking structs are managed by the
| application code. But they can be in kernel-maintained lists
| like a scheduler queue, which is an ephemeral state that the
| essentially-perpetual thread will long outlive. There's no
| place to allocate a "generic" node at list insert time.
|
| (Also, more importantly, Zephyr is an RTOS and disallows heap
| use in the core parts of the system to permit use in
| environments that need to disallow dynamic allocation. But
| that's not really relevant to a generics discussion.)
|
| But you can trivially put a scheduler queue node right into the
| thread object, knowing that it's a behavior of the type. The
| semantics become different though: you only get the one node,
| it's not possible to have multiple "lists of threads" using
| this technique unless you know exactly what each list is for
| ahead of time.
| mppm wrote:
| It would probably have been more correct to say "requires fewer
| allocations _in some cases_ ". As you point out, in terms of
| layout, the old generic version is just as intrusive as the new
| version, and requires just as many allocations (one). However,
| the new version gives extra flexibility for you to move a long-
| lived object in and out of lists without copying or allocating,
| at the cost of having the pointer field baked into it
| permanently, rather than on demand.
|
| I think the reasoning behind this change is that (from Zig's
| perspective), if you are using linked lists you are probably
| doing something wrong, _unless_ your application requires the
| above-mentioned kind of juggling, which favors explicitly
| intrusive LLs. In addition, this change de-generifys the lists
| 's methods, like `prepend`, which reduces code size a little.
|
| At least that's my understanding.
| DarkWiiPlayer wrote:
| > However, the new version gives extra flexibility for you to
| move a long-lived object in and out of lists without copying
| or allocating
|
| You could also do this with the entire node in the case of a
| generic implementation though, the API just needs to expose
| the node struct to you and allow you to detach it; but the
| same is true for this new implementation as well.
|
| In terms of memory, a detached node that embeds the payload
| struct isn't different from an object with an embedded
| detached node.
|
| What changes is that now, if you _have_ an object class that
| you don 't want to (or can't) extend to include a list node,
| you have to wrap it in a container struct that, again, looks
| the same in memory but now has a node and your object as its
| members. I'm not sure if this is really much of an
| improvement at the end of the day.
|
| Also, correct me if I'm wrong (I don't do zig), but shouldn't
| it be possible to create a list of void elements from the
| generic implementation and embed its node type inside your
| object, then proceed from there as if it were the new
| implementation?
| mppm wrote:
| Yeah... with bitcasts and some creativity (and some
| boilerplate) both versions are ultimately equivalent, or
| nearly so. But the new one pushes you towards intrusive-
| data-structure thinking and away from container-of
| thinking.
|
| This, by the way, is a typical design consideration in Zig
| -- using "friction" to steer programmers away from doing
| the Wrong Thing (according to Andrew). In addition, Zig is
| really more of a fancy macro-assembler for outputting
| optimal code, and less a pragmatic general-purpose
| language, and makes design decisions accordingly. Taking
| both together, the linked-list change sort of makes sense,
| even though personally, I would have just added a separate
| intrusive list data structure.
| lightingthedark wrote:
| Can someone explain how the claim of higher performance works
| here? In C, which lacks generics, an intrusive list is preferred
| because otherwise you end up with each node having a void pointer
| to the data it holds. The previous Zig version was a generic, so
| the data type could easily be a value type. Given that the
| compiler is allowed to rearrange the layout of both structs
| unless data is packed or extern (in which case it almost
| certainly won't want to have a linked list node in the middle
| anyway) isn't the resulting type exactly the same in memory
| unless you intentionally made T a pointer type?
| ashdnazg wrote:
| I don't understand the higher performance either. What I know
| as the significant advantage is that you can have one object in
| multiple lists.
| lightingthedark wrote:
| Technically that should be possible the other way by using a
| Node<T> so the type of the second list ends up being a
| Node<Node<T>> but I can see why an intrusive list would be
| preferred to that, and also the linked list API might prevent
| that pattern.
|
| Usually if I have multiple lists holding something I have one
| that's the 'owner' and then the secondary data structures
| would have a non-owning pointer to it. Is that the case where
| the performance would be better with an intrusive list? My
| intuition would be that having multiple Node members would
| pollute the cache and not actually be a performance win but
| maybe it is still better off because it's all colocated?
| Seems like the kind of thing I'd have to benchmark to know
| since it would depend on the number of lists and the size of
| the actual data.
| lightingthedark wrote:
| Okay so in the multiple list case performance would
| actually be worse because you'd have a double pointer
| dereference. I was thinking you'd have the list nodes
| contiguous in memory so the first dereference would always
| hit cache but that's a bad assumption for a linked list.
|
| Since you shouldn't reach for a linked list as a default
| data structure modern hardware anyway, I actually do see
| how this change makes sense for Zig. Neat!
| codr7 wrote:
| Allocating list nodes in one block of memory is very
| common in the intrusive case.
| messe wrote:
| > What I know as the significant advantage is that you can
| have one object in multiple lists.
|
| Another advantage is smaller code size, as the compiler
| doesn't need to generate code for SinglyLinkedList(i32),
| SinglyLinkedList(User), and SinglyLinkedList(PointCloud).
| This could have a performance impact by making it more likely
| that code remains in the cache.
| anarazel wrote:
| Intrusive lists are often used to enqueued pre-existing
| structures onto lists. And often the same object can be in
| different lists at different times.
|
| That's not realistically dealt with by the compiler re-
| organizing the struct layout.
| grayhatter wrote:
| > Can someone explain how the claim of higher performance works
| here? In C, which lacks generics, an intrus
|
| I can only give a handwavey answer because I've yet to see any
| data, and if an engineer tells you something is better but
| doesn't provide any data, they're not a very good engineer. So
| grain of salt and all that. But the answer I got was because
| cache performance. Writing code this way your CPU will spend
| less time waiting for main memory, and the branch predictor
| will have better luck. The argument makes sense, but like I
| said,I've yet to see real world data.
|
| > isn't the resulting type exactly the same in memory unless
| you intentionally made T a pointer type?
|
| Yes and no. If I understand what you mean, the bit layout will
| be the 'same'. But I think your confusion is more about how
| what a compiler means by pointer type, and what a human means.
| If you pull away enough layers of abstraction, the compiler
| doesn't see *Type it'll only see *anyopaque, phrased completely
| differently; according to the compiler, all pointers are the
| same and are exactly memory_address_size() big. *Type doesn't
| really exist.
|
| Writing it this way, imagine using just the LinkedList type,
| without a container of any kind. node to node to node, without
| any data. While it would be pointless, that would (might) be
| faster to walk that list, right? There's no extra data loads
| for the whole struct? That's what this is. Using it this way it
| gets complicated, but translating theory to asm is always
| messy. Even more messy when you try to account for speculative
| execution.
| Zambyte wrote:
| > But I think your confusion is more about how what a
| compiler means by pointer type, and what a human means. If
| you pull away enough layers of abstraction, the compiler
| doesn't see *Type it'll only see *anyopaque, phrased
| completely differently; according to the compiler, all
| pointers are the same and are exactly memory_address_size()
| big. *Type doesn't really exist.
|
| Check out the implementation of SinglyLinkedList in the
| latest release (before the change in the post)[0]. You'll
| notice the argument for SinglyLinkedList is (comptime T:
| type), which is used in the internal Node struct for the data
| field, which is of type T. Notably, the data field is _not_ a
| *T.
|
| In Zig, when you call the function SinglyLinkedList with the
| argument i32 (like SinglyLinkedList(i32)) to return a type
| for a list of integers, the i32 is used in the place of T,
| and a Node struct that is unique for i32 is defined and used
| internally in the returned type. Similarly, if you had a
| struct like Person with fields like name and age, and you
| created a list of Persons like SinglyLinkedList(Person), a
| new Node type would be internally defined for the new struct
| type returned for Person lists. This internal Node struct
| would instead use Person in place of T. The memory for the
| Node type used internally in SinglyLinkedList(Person)
| actually embeds the memory for the Person type, rather than
| just containing a pointer to a Person.
|
| These types are very much known to the compiler, as the
| layout of a Node for a SinglyLinkedList(i32) is not the same
| as the layout of a Node for a SinglyLinkedList(Person),
| because the argument T is not used as a pointer. Unless, as
| the gp mentioned, T is explicitly made to be a pointer (like
| SinglyLinkedList(*Person)).
|
| [0] https://ziglang.org/documentation/0.14.0/std/#src/std/lin
| ked...
| hoseja wrote:
| Linked lists should be obscure niche data structures for when you
| absolutely need their unique characteristics, not some front-and-
| center default.
| netbsdusers wrote:
| They're not obscure or niche, they are everywhere in operating
| systems. The linked list is probably the single most essential
| and the most used data structure in any kernel.
| simonask wrote:
| Implementing an operating system is incredibly niche and
| obscure.
|
| The practices that apply to kernel development do not
| generally apply to very much else.
| fcantournet wrote:
| Ah yes, operating system kernels, famously super mainstream
| programs that almost everyone works on.
| grayhatter wrote:
| This is true for the kind of programmin most people do. Linked
| Lists make no sense in JS, Python, Rust, etc. But they are
| without a doubt still an incredibly useful pattern with
| unmatched performance characteristics. Especially if you're
| trying to be memory efficient. It's a systems programming
| pattern, rarely useful if you're just a web developer.
|
| But then again... I never thought I'd need to know music
| waveform theory while I was watching the magic school bus as a
| kid... Turns out that was incredibly useful when I needed to
| implement a music note generator for a VoIP client... You never
| know when something might actually be really useful.
| simonask wrote:
| This comment makes no sense.
|
| 1. You have linked lists in Rust. They're in the standard
| library.
|
| 2. Linked lists do not have "unmatched performance" - they
| are slower than almost anything, outside of highly
| specialized use cases. Even moreso for intrusive linked
| lists. Worst of all, they are horrible for cache locality.
|
| There are valid uses of linked lists, that's why they exist,
| but most people don't have those use cases.
| grayhatter wrote:
| Speed isn't the only performance metric that matters. I
| probably should have said explicitly that I'm referring to
| uses where it makes sense to use a linked list. But
| unfortunately I didn't specify because that wasn't the
| point I really wanted to make.
|
| There's a lot of things to learn that appear to to not
| makes sense... until suddenly they do.
|
| > There are valid uses of linked lists, that's why they
| exist, but most people don't have those use cases.
|
| Isn't that pretty much what I said?
|
| I'm pretty sure we agree that they're useful, but their
| usefulness isn't as universal nor as common as the pattern
| is.
| simonask wrote:
| Linked lists are worse than flat arrays in every single
| performance metric other than ordered removal. They use
| more memory (the links), have inferior cache locality,
| cause more allocations (unless you're allocating anyway,
| in the case of intrusive lists).
|
| Their one benefit - O(1) ordered removal - is easily
| achieved with a flat array, in at least two common ways:
| swap in the last element if you don't care about the
| order, or mark the index as vacant if you do care about
| the order, or if you care about the stability of
| references.
|
| I mean, I agree that they're kind of neat from a certain
| perspective, but they're almost never the right choice.
| xxs wrote:
| About the ordered removal, if it's not very common, using
| tombstones (a constant to denote no element) is fine. Too
| many tombstones would need a compaction phase and still
| it'd work way better than a linked list.
|
| Personally I have not met a case where a random removal
| in the middle of a list is common -- b/c finding the
| needle is slow to begin with O(n). In such case linked
| hash sets are O(1) but way more memory intense (which is
| a very advanced linked list)
|
| That being said linked queues are great for concurrent
| lock free cases.
| delamon wrote:
| Dynamic arrays often use doubling as a resize strategy.
| This means that on average about 25% of capacity is
| "wasted". Depending on element size and element count it
| may lead to larger memory usage compared to linked list.
| xxs wrote:
| >Especially if you're trying to be memory efficient
|
| Linked list have higher const cost memory wise compared to
| array backed structures. Even when an array back structure is
| half empty it's still takes less memory.
| grandempire wrote:
| I'm all about data-oriented design, but I don't think this is
| true - you need their unique characteristics in almost every
| project.
| simonask wrote:
| In all of my years, I have seen maybe 2 projects that had one
| valid use case each. They exist, sure. It's not that common.
| boomlinde wrote:
| Out of curiosity I looked up some of the software I've
| meaningfully interacted with today. Of all I looked up--the
| operating system kernel, the init system, the shell, the
| terminal emulator, the browser, the compilers, the text
| editor, the windowing system, the window manager, the
| notification daemon, the audio server, the audio session
| manager, the GPG agent, the NTP daemon, the SSH daemon, the
| DHCP client, the wireless network manager, the power
| manager, the policy manager, D-Bus, the video player, the
| video editor--each uses linked lists. There's some more
| system software showing up in ps (which by the way uses
| linked lists) that I haven't considered but I am rather
| confident that most of it uses linked lists.
|
| Maybe you only see these projects as a user, but linked
| lists are not uncommon. Your experience reflects your,
| well, experience. We all sit in different niches.
| simonask wrote:
| I'm wondering what makes you feel confident about the use
| of linked lists in all of those components.
|
| Mind you, most of those will be written in C on a typical
| Linux installation, and linked lists happen to be one of
| the two collection types that are relatively easy to use
| in C (the other being a flat array), so I will concede
| that some software is using linked lists out of
| desperation, rather than it being the correct choice. :-)
| boomlinde wrote:
| > I'm wondering what makes you feel confident about the
| use of linked lists in all of those components.
|
| Of all of those mentioned I literally looked up their
| source repositories up and searched for obvious
| indicators like "linked", "list", ".next" or "->next" and
| then verified that I was indeed looking at one or more
| linked lists. Where does your confidence come from? Oh
| right, you already mentioned it: it's based on your
| experience of the projects you've worked on.
|
| The rest of your reply is just moving goalposts, mind
| reading and a glaring category error. Get back if and
| when you have something useful to add.
| simonask wrote:
| That's an incredible amount of effort to throw at an
| argument on Hacker News.
| tialaramex wrote:
| Not really desperation, it's just easier. Sometimes this
| is where perf doesn't matter, any choice would be fine, a
| linked list of the up to 4 Doodads won't be meaningfully
| worse or better than a growable array of the 4 Doodads,
| or I dunno, a HashMap from index to each of the four
| Doodads. Stop worrying about it and focus on the real
| problem.
|
| In larger C software sometimes a use of linked lists is
| costing meaningful performance and a growable array type
| would be a huge win - but the other side of the coin is
| that sometimes in these perf critical environments a
| linked list actually _is_ the right choice, and a C
| programmer might luck into that since it was the first
| tool to hand while say a C++ or Rust programmer might
| take longer to realise that this data structure is the
| best option.
| esjeon wrote:
| Hmph, is there any rationale against a generic wrapper over it?
| Exposing the internal details can be irky in higher level
| application development. It would not hurt to have some inline
| functions.
| MawKKe wrote:
| Isn't the new one also horribly type-unsafe? Since you can put
| (parent) objects of any kind into the list without any mechanism
| to detect which is which?
| mgaunard wrote:
| There are several advantages to intrusive node-based data
| structures that I haven't seen stated in the article or the
| comments:
|
| - the same object can move between containers with no allocation
| and no need for a dedicated complex API
|
| - the same object can be part of multiple containers at once;
| particularly useful for intrusive binary trees, for indexing data
| with different criteria.
|
| - the container can be fully polymorphic, no need for all
| elements to be the same dynamic type.
|
| - no need for complex allocators, you can just store the objects
| as you see fit.
| rowanG077 wrote:
| All of these sound like grievous sources of bugs to me. Sure
| it's nice that it's possible but please don't do those things
| unless there isn't a safer way to do it.
| Diggsey wrote:
| They are definitely more error-prone usage patterns, and
| coming from Rust the amount of hidden unsafety is quite
| unsettling (in addition to the obvious potential for user
| errors, depending on the aliasing model of your
| language/compiler, which I suspect Zig hasn't figured out
| yet, this parent pointer trick could even lead to
| miscompilations...)
|
| Having said that, intrusive linked lists are quite an
| important data structure in code which is required to be
| allocation-free. Kernel code, signal handlers and code which
| requires consistent performance (eg. audio processing) need
| to be able to move objects between containers or have objects
| in multiple containers without allocating.
| bananabiscuit wrote:
| Do you mean you will likely make an error when implementing
| a standard textbook linked list? Or that a standard
| textbook linked list has non-obvious errors in its design?
| If so, would be curious to learn more.
| im3w1l wrote:
| _Implementing_ anything is more error prone than _using_
| something battle-tested.
| simonask wrote:
| Most programmers can implement a linked list.
|
| Extremely few can use one safely.
|
| They're an elegant weapon for a more civilized age -
| namely one without multithreading, aliasing-based
| optimization, and a host of other nice things.
| tauoverpi wrote:
| Multithreading is exactly where linked lists are useful
| as in the implementation of MPSC, MPMC, SPMC, and so on
| it allows one to safely construct objects out of vire of
| other threads then use a CAS to append the new item to
| the list making it visible for other threads to consume
| even without needing to allocate additional memory.
|
| They're also pretty useful for free-list structures which
| overwrite entries to be removed (they're scheduled for
| deletion thus assumptiong here is that the object is now
| dead) with linked list nodes that would fit in that space
| thus never needing to allocate any additional as you can
| reuse existing space.
| simonask wrote:
| Yeah, wait-free data structures often contain linked
| lists. That doesn't cancel out all of the other problems.
|
| Anecdotally, I've seen wait-free structures used
| incorrectly many more times than I've seen them used
| correctly. Some people think they are magically faster
| because "acquiring a mutex lock is expensive", but that's
| far from always the case.
| codr7 wrote:
| I don't see a problem with their safety, even in C.
|
| More the opposite, their simplicity makes them safer to
| deal with than the alternatives. And sometimes their
| unique features (mostly the intrusive variant) are
| exactly what you need.
|
| Multi-threaded accesses to a vector would be just as bad?
| rowanG077 wrote:
| Use a separate data structure and then index into it with a
| linked list like API. No need to do something so dangerous
| if you simply need an allocation less linked list
| interface.
|
| But yes there is a place for these data structures. It just
| shouldn't be common.
| Diggsey wrote:
| If you just mean using array indexes instead of pointers,
| that can work in some cases, but can run into concurrency
| issues when the container needs to be resized. With
| pointers to nodes, new nodes can be allocated and old
| nodes freed on a separate thread without interfering with
| the performance sensitive thread, whereas resizing an
| array requires all access to stop while the backing
| memory is re-allocated.
|
| I don't think you mean this, but elaborating on why a
| non-intrusive linked list is not very useful, that's
| because it takes linear time to find an object in a non-
| intrusive linked list, but constant time in an intrusive
| linked list.
|
| For example: struct Node {
| Node* prev_by_last_update; Node*
| next_by_last_update; Node* prev_with_owner;
| Node* next_with_owner; Payload data;
| }
|
| This intrusive structure allows reassigning and updating
| nodes, whilst being able to efficiently find eg. the next
| most recently updated node, or to iterate over nodes with
| the same owner.
| throwawaymaths wrote:
| Maybe advocate for wrappers in the stdlib to implement
| typesafe. LL, typesafe extrusive LL, etc. Using this as a
| primitive?
| immibis wrote:
| bugs* ironic
|
| It depends on whether you approach programming from a
| perspective of writing piles of obviously correct abstraction
| layers (the total effect of which is to leak so much the
| solution ends up being wrong anyway), or from the direct
| unabstracted perspective of actually making the computer do
| what you want it to do (which does not preclude the solution
| from using abstractions).
|
| I call them the Uncle Bob vs Casey Muratori school, after
| Casey's lectures such as "Clean Code, Horrible Performance":
| https://www.youtube.com/watch?v=tD5NrevFtbU
| rowanG077 wrote:
| Well uncle bob gives good advice. If you do exactly the
| opposite of what he tells you to do. so I don't think this
| comparison is apt at all.
| immibis wrote:
| Can you elaborate on why he's a hack and perhaps how this
| relates to the idea of forcing yourself to use highly
| constrained inefficient "safe" data structures?
| rowanG077 wrote:
| Because the excessive OOP and TDD he recommends is just
| bad. He also hates static typing and seems to think
| testing can somehow replace it.
|
| I'm not for forcing inefficient safe data structures. You
| are pretending that this is a dichotomy when the vast
| majority of the time it isn't. You are completely
| misinterpreting what I said. I'm for safe data
| structures. And the data structures described are huge
| footguns. I made no point about efficiency.
| Ygg2 wrote:
| In programming and life, both extremes can be equally
| wrong. I'm not saying Uncle Bob is the shining beacon to
| steer your boat towards at all cost, but neither is Casey
| Muratori.
|
| > He also hates static typing
|
| I mean, in OOP there are many reasons to dislike it. It's
| harder to extend and test. Calls to such functions within
| other methods are a nuisance to remove. E.g.
| `Instant.now()`.
| rowanG077 wrote:
| I'm sorry I'm not familiar with Casey Muratori. So I have
| no clue what he says.
| Ygg2 wrote:
| He's very data and performance oriented. He gives advice
| on how to write games, high-performance systems. His
| advice is solid and great even, but not when your source
| of latency is a cloud DB.
| immibis wrote:
| I beg to differ even on that. When your source of latency
| is a cloud DB, learning about CPU caches won't help you,
| but learning to think about end-to-end performance
| absolutely will. "What sequence of queries achieves my
| goal with the shortest critical path?" is very similar to
| "What sequence of vector instructions achieves my goal
| with the shortest loop-carried dependency?"
| Ygg2 wrote:
| > but learning to think about end-to-end performance
| absolutely will.
|
| In a cloud DB webservice what will save you is having a
| good tracing/debugging story. From it, it will be easy to
| see E2E performance. Sure, some things will translate,
| but going SIMD optimized layout on a class, that will
| have to talk to DB across the ocean is a fool's errands.
|
| To optimize you must have a good target and make sure you
| are measuring correctly.
| immibis wrote:
| Why do you think that making a minimum number of queries
| to a cloud database is anything other than conceptually
| similar to making a SIMD optimized layout on a class?
| Ygg2 wrote:
| > do what you want it to do
|
| Until a colleague or future you, or someone else calling
| the code, forgets.
|
| Which leads to "I WANT TO KILL WHOEVER WROTE THIS"
| debugging sessions.
|
| Encapsulation exists for a reason.
| tauoverpi wrote:
| You don't need everything to be generic to have safety as the
| API provided by the object using the linked list can maintain
| safety by making it difficult to do the wrong thing and being
| correct by construction. Of course that doesn't stop a user
| of a library ignoring the safe way to handle types which use
| intrusive linked list nodes but that same user is likely
| going to shoot themselves in the foot regardless even when
| given a "safer" interface.
|
| Trying to write a safer version of the above without losing
| the performance / advantages takes you into the realm of ATS2
| https://www.cs.bu.edu/~hwxi/atslangweb/ which quickly becomes
| more difficult to work with than practical (and no, rust
| can't solve this with it's implementation of resource types
| as far as I'm aware).
|
| So yes, doing the above makes sense when the problem demands
| it as allocating memory you don't need when there's a clear
| way to represent a graph without it just wastes cycles on
| memory management you didn't even need to do.
| suspended_state wrote:
| I think the main advantage of the intrusive definition is that
| you can use it to implement the non intrusive one easily,
| whereas the reverse isn't possible.
| NobodyNada wrote:
| With a non-intrusive linked list type, can't you define the
| intrusive version as `Node<void>`?
| MrCroxx wrote:
| Agreed. Intrusive data structures are good for implementing
| multi-container data structures with lower allocation overhead.
| And they are widely used than expected. E.g. LRU is a classic
| multi-container problem.
| grandempire wrote:
| Is there any generic implementation which is not intrusive? I
| expect C++ forward_list to look like
|
| struct Node<T> { Node<T> *next; T x; }
| _huayra_ wrote:
| At least in C++ land, that is not quite what is referred to
| as intrusive lists. It's basically the opposite / inside-out
| of what you wrote:
|
| ```C++ struct MyType : Node<MyType> { Node<MyType> _next,_
| prev; // rest of data }; ```
|
| I usually reach for Boost.Intrusive when I need this [0].
|
| [0] https://www.boost.org/doc/libs/1_88_0/doc/html/intrusive/
| usa...
| grandempire wrote:
| I see. I am noticing the main difference is that
| forward_list manages the lifetime and allocation of nodes
| and that having a pointer to an object is equivalent to a
| pointer to the node (could be implemented as a helper).
|
| The layouts look the same?
| _huayra_ wrote:
| At the byte level, it's quite possible the layouts are
| the same. However, an "intrusive data structure" means
| that the nodes themselves are the data.
|
| In other words, intrusive is like this:
|
| struct T : NodeType<T> { // T data NodeType<T> _next,_
| prev; };
|
| whereas non-intrusive data structures the T's are _not_
| the nodes themselves
|
| struct NodeType<T> { NodeType<T> _next,_ prev; T t; };
|
| Doing non-intrusive means you need to own the lifecycle
| of the nodes (and make them copyable, move-constructible,
| or some combo thereof). Intrusive means that the caller
| can manage the nodes' lifecycles and just say "here's a
| node, I promise it'll live while the list is still alive"
| and have that node be on the stack, heap, static, etc.
| cheezebubba wrote:
| How can you have elements of different types? I don't
| understand how you would know which type a given node was in?
| throwawaymaths wrote:
| it was maybe easier to understand when @fieldParentPtr took
| the type as an argument, but if you look at this:
| var x: *T = @fieldParentPtr("node", node_ptr);
|
| @fieldParentPtr knows that its result must have type pointer
| T; and then the builtin function "looks" inside of T and
| notices there is a "node" field. Then it can backtrack the
| pointer of T given the pointer to the node field, since the
| layout of T is well-defined at compile time.
|
| So you could have a mixed-type linked list, and at the
| callsite zig can at _compile-time_ back out the parent type.
| There obviously needs to be outside logic (and an outside
| mechanism to track) this with branches for each type that you
| could be putting into the linked list. And yes, _this is more
| than a little bit type-unsafe because of the potential for
| type erasure._
|
| You could probably pretty trivially write a typesafe wrapper
| for this, as suggested in one of the comments here.
| cheezebubba wrote:
| Thanks. I still don't understand where the bits live that
| tell you which type T is for a given node.
|
| Is this something like "list starts with type A, and that
| will tell you if the next node is type A or B"?
|
| Is there a way to stuff those bits in with the node somehow
| so you always know the type you expect back?
| throwawaymaths wrote:
| > Thanks. I still don't understand where the bits live
| that tell you which type T is for a given node.
|
| Yup. You'd have to set that up yourself. It could be
| anywhere. It could be in a separate array, it could be in
| the previous node-data, etc.
|
| you wouldn't put it in the Node (capital N) datastructure
| though, that is privileged.
|
| Personally, I would say doing something like this is "bad
| idea bears" unless you are absolutely certain you need it
| and you should explore using, say, data payload type with
| a union field instead, if possible (though know that that
| comes with a potential memory penalty)
| cheezebubba wrote:
| Aha Thanks!
| paraboul wrote:
| > the same object can be part of multiple containers at once
|
| I'm not sure I understand this one. Since the object contains
| the reference to where it belongs inside a container (e.g.
| object.node.next) how can it be re-used in multiple containers.
| Conversely, in a non-intrusive data structure, multiple
| containers can hold a ref to the same object through an
| intermediate node object
| gnubison wrote:
| You add multiple next variables. buffer.next,
| buffer.next_displayed, etc
| paraboul wrote:
| That's not an advantage of intrusive data structures then.
| That's precisely an advantage of _non-intrusive_ data
| structure : object can be inserted in an arbitrary number
| of containers
| mikepurvis wrote:
| But for the intrusive one you have the visibility into
| its membership in the other structures, since you're
| holding a reference to the object _and_ its intrusive
| fields.
|
| In the non-intrusive case, all you know is how you got
| there, you don't have return info to the other
| structures-- particularly annoying if you're trying to
| destroy an object and have to speculatively look for
| references to it in the other lists in order to get rid
| of them.
| paraboul wrote:
| Of course. But I wouldn't qualify "multi-container use"
| to be an advantage of intrusive data structures _per se_.
| Zambyte wrote:
| The object can contain multiple intrusive node fields.
| boomlinde wrote:
| > the same object can move between containers with no
| allocation and no need for a dedicated complex API
|
| This is true also of the previous design.
|
| > no need for complex allocators, you can just store the
| objects as you see fit.
|
| Same here; the previous design leaves it up to the user.
| kajika91 wrote:
| Couldn't we have a function to return the data like this?
| pub const SinglyLinkedList(comptime T: type) type {
| return struct { first: ?*Node = null,
| pub const Node = struct { next: ?*Node = null,
| }; const Self = @This(); pub fn
| data(self: Self) *T { return @fieldParentPtr("node",
| self); } };
| nh1996 wrote:
| This would require the SinglyLinkedList to be generic and would
| require that the data struct use "node" as the field name.
| Also, as some comments have pointed out, this type of linked
| list can be useful when a data struct needs to be in multiple
| lists, in which case there is no single "node" field.
| const some_data = struct { // Some data fields
| // ... bar_node: SinglyLinkedList.Node,
| baz_node: SinglyLinkedList.Node, };
| throwawaymaths wrote:
| sure, just make T == void fall back to the existing
| implementation.
| RcouF1uZ4gsC wrote:
| Awesome! In my opinion, an intrusive linked-list is almost always
| what you really want when you reach for a linked list. An
| external linked list is often worse than a vector.
| alexvitkov wrote:
| Please just use a pointer. Use metaprogramming when it makes
| sense, not to obfuscate something that was figured out 60 years
| ago.
| bmandale wrote:
| A few notes I would make as an outsider:
|
| - conventionally we put the `node' struct at the start of the
| `user' struct. The advantage is that we can convert from node* to
| user* with a simple cast instead of messing around with offsets.
| Knowing the offset stuff is still potentially useful in case you
| want multiple `node's in a single struct, but that is seldom the
| case so you can almost always get by with just a cast.
|
| - this strategy is, especially with the node at the start of the
| user, equivalent to inheritance. I take it from the fine article
| that zig doesn't have inheritance? If they are planning on adding
| more interfaces like this linked list one, then inheritance would
| be a good idea. We often treat inheritance using analogies to the
| animal kingdom or to geometric shapes, but data structures like
| linked lists are (to my understanding) the problem that
| inheritance is actually designed to solve, and it is really the
| best approach to solving problems like in the OP.
|
| - just lol on zig speed running being a crappier C++.
| RKFADU_UOFCCLEL wrote:
| This is in fact a com.on technique in the Linux kernel:
| https://en.wikipedia.org/wiki/Offsetof#Usage
| zoogeny wrote:
| > We give @fieldParentPtr the name of the field, a pointer to
| that field as well as a return type (which is inferred above by
| the assignment to a *User variable), and it gives us back the
| instance that contains that field.
|
| That seems like a refactoring nightmare. If I say:
| const user: *NotAUser = @fieldParentPtr("node", n);
|
| Will it calculate this incorrectly? I mean, if `NotAUser` has a
| "node" field and `n` is actually from the `User` type (and it has
| a different offset in each type)?
| tpayet wrote:
| I read Zig's new LinkedIn API, and I was intrigued
| codr7 wrote:
| Not super uncommon in C land, I learned the technique from the
| Linux kernel.
|
| https://github.com/codr7/hacktical-c/tree/main/list
|
| Haven't come across another language (besides Zig maybe) low
| level enough for it to make sense. I've tried in C++ and Go, but
| it gets so ugly that it loses its appeal.
___________________________________________________________________
(page generated 2025-04-16 17:02 UTC)