[HN Gopher] Fast linked lists
       ___________________________________________________________________
        
       Fast linked lists
        
       Author : dmitry_dygalo
       Score  : 167 points
       Date   : 2024-05-14 13:51 UTC (9 hours ago)
        
 (HTM) web link (dygalo.dev)
 (TXT) w3m dump (dygalo.dev)
        
       | ertucetin wrote:
        | 7-8 years ago I created GlueList
        | (https://github.com/ertugrulcetin/GlueList) in order to build
        | a faster version of LinkedList + ArrayList. It was a fun effort.
        
         | boxed wrote:
         | I believe that's called a "rope"
        
           | vbezhenar wrote:
            | A rope is a binary tree of arrays, giving an O(log N) index
            | operation.
        
         | amelius wrote:
         | It was fun because you didn't write it in Rust :)
         | 
         | https://rust-unofficial.github.io/too-many-lists/
        
         | duped wrote:
          | If I have this right, what you've built is this, storing M
          | items in M / N nodes where N is the radix of the array?
          | 
          |     0            1                          (M / N)th node
          |     [ N elem ] [.] -> [ N elem ] [.] -> ... -> [ M - (M / N) elem ] [null]
          | 
          | And so if you want to index the `i`th element you chase the
          | pointers:
          | 
          |     index(node, i):
          |         acc = 0
          |         while acc + N < i:
          |             acc += N
          |             node = node.next
          |         return node.array[i - acc]
         | 
          | Or something like that? You can do better using a B-tree,
          | exploiting the fact that you don't need keys: just store
          | pointers in the branch nodes. This reduces the number of
          | pointer dereferences for large arrays.
         | 
          | For example, say you have N = 8 and there are 1024 elements in
          | the list: you would need 127 pointer dereferences to reach the
          | end of the array. In a B-tree you only need about 3. (Double
          | check my math, I might be off by one.)
         | 
          | This is the typical implementation for immutable/persistent
          | vector types. If you keep the links between leaf nodes at the
          | bottom of the tree, you've got a B+ tree and have fast
          | iteration through it as well.
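          | 
          | A quick sanity check of those counts (an illustrative Rust
          | snippet, using the N = 8 / 1024-element example above):
          | 
          |     fn main() {
          |         let (n, elems) = (8u32, 1024u32);
          |         let nodes = elems / n;                    // 128 nodes of 8 elements each
          |         println!("list hops: {}", nodes - 1);     // 127 pointer dereferences
          |         let depth = (elems as f64).log(n as f64); // ~3.3 levels in a radix-8 tree
          |         println!("tree depth: {:.1}", depth);
          |     }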
        
           | ertucetin wrote:
            | You're probably right. I was in college back then and wanted
            | to work on something cool; this was my idea.
        
         | twic wrote:
         | Similar to an unrolled linked list:
         | https://en.wikipedia.org/wiki/Unrolled_linked_list
         | 
         | But you size the nodes dynamically, rather than using a fixed
         | size.
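          | 
          | A minimal unrolled-list node might look like this in Rust (a
          | sketch with a fixed capacity, unlike GlueList's dynamic
          | sizing):
          | 
          |     const N: usize = 8; // elements per node
          | 
          |     struct Node<T> {
          |         len: usize,                 // slots of `elems` in use
          |         elems: [Option<T>; N],      // inline, contiguous element storage
          |         next: Option<Box<Node<T>>>, // one pointer per N elements
          |     }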
        
       | kevingadd wrote:
       | I was hoping to see optimization of the actual linked list
        | manipulation and traversal (pipelining? I'm not sure what you'd
       | do), but this is still a neat post. It's cool to see thought put
       | into various parts of the problem, like
       | reallocation/preallocation, stack allocation, etc.
        
         | cogman10 wrote:
         | reallocation/preallocation actually does increase manipulation
         | and traversal performance.
         | 
          | The major slowness of linked lists is poor cache locality. New
         | nodes can be put all over the memory space. However, if you can
         | make sure all or part of the list exists in contiguous blocks
         | of memory, then there's a good chance that when the CPU loads
         | up the next node it will also grab the next 3 nodes in a cache
         | line.
         | 
         | The closer these node addresses are in memory, the faster
         | things will be.
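          | 
          | One common way to get that effect (a sketch, not from the
          | article) is to preallocate every node in one Vec and link by
          | index:
          | 
          |     // Nodes sit side by side in a single contiguous allocation,
          |     // so following `next` tends to stay in already-loaded cache lines.
          |     struct Node<T> {
          |         value: T,
          |         next: Option<u32>, // index into `nodes`, not a heap pointer
          |     }
          | 
          |     struct List<T> {
          |         nodes: Vec<Node<T>>,
          |         head: Option<u32>,
          |     }
          | 
          |     impl<T> List<T> {
          |         fn with_capacity(cap: usize) -> Self {
          |             List { nodes: Vec::with_capacity(cap), head: None }
          |         }
          | 
          |         fn push_front(&mut self, value: T) {
          |             self.nodes.push(Node { value, next: self.head });
          |             self.head = Some(self.nodes.len() as u32 - 1);
          |         }
          |     }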
        
       | vlovich123 wrote:
       | Intrusive linked lists might bring down the allocations further,
       | reduce the memory footprint, & more importantly improve locality
       | when doing pointer chasing (single predictable indirection vs
       | double hard-to-predict indirection).
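        | 
        | Roughly the difference, as a sketch (the intrusive version keeps
        | the link inside the element itself):
        | 
        |     struct Payload {
        |         data: [u8; 32],
        |     }
        | 
        |     // Non-intrusive: node and payload are separate heap objects,
        |     // so each hop costs two dependent pointer loads.
        |     struct ExternNode {
        |         payload: Box<Payload>,
        |         next: Option<Box<ExternNode>>,
        |     }
        | 
        |     // Intrusive: one allocation per element, and each hop is a
        |     // single predictable indirection.
        |     struct IntrusivePayload {
        |         data: [u8; 32],
        |         next: Option<Box<IntrusivePayload>>,
        |     }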
        
       | kolbe wrote:
        | Linked list benchmarks are amazing... if you don't thrash your
        | cache on inserts, so all the elements stay contiguous. You get
        | all the benefits of a vector and a linked list, without the
        | reality that linked lists mostly don't get populated 100%
        | consecutively, and thus their nodes can be anywhere in memory.
        
         | bjoli wrote:
          | Most languages in which linked lists play an important part
          | (mostly Lisps) have well-optimized linked lists that end up
          | with much better memory locality.
        
           | wavemode wrote:
           | How does that work? I don't follow how any runtime or
           | compile-time optimization can solve the problem of locality
           | for a linked list. If the data wasn't allocated sequentially,
           | then it's simply not going to be sequential (unless you move
           | it).
        
             | hayley-patton wrote:
             | SBCL tries its hardest to allocate sequentially, then moves
             | lists to be sequential in GC.
        
               | s17n wrote:
               | The entire theoretical advantage of linked lists over
               | vectors (constant time insertion and deletion) comes from
               | not sequentially allocating them.
        
               | screcth wrote:
               | If you have a compacting GC you are going to move the
               | nodes anyway, so why not reallocate them sequentially?
        
               | stonemetal12 wrote:
               | That implies you took the time to sort them, so your O(1)
               | insertion time just became O(N log N). Sure, that cost is
                | spread over however many inserts you did between GCs, but
               | it isn't O(1) anymore.
        
               | hayley-patton wrote:
               | SBCL cheats and copies starting from the first cons cell
               | that is referenced from outside the list; this tends to
               | be the start of the list, and so traversing just works.
               | The contiguous copying also ends when a cons cell which
               | was already copied is found, so there can't be more
               | discontinuities than there are cons cells referenced from
               | outside the list, regardless of which order we discover
               | the references.
        
               | thefaux wrote:
               | There are other advantages of linked lists over vectors
               | besides those that you've listed.
        
               | twic wrote:
               | I don't think this is what hayley-patton meant (although
               | it's not crystal clear). I think SBCL does memory
               | management with a bump allocator and a moving collector.
               | So if you build a linked list sequentially, the cons
               | cells will be allocated sequentially, and will be
               | sequential in memory. If you build it in random order,
                | they won't be. But when the collector runs, it will move
                | the whole list, and it will move the cells in sequential
                | order, and then they will be sequential in memory.
        
             | Someone wrote:
             | A sufficiently smart compiler can detect that a new node
             | gets allocated and then appended to a list, and allocate
             | the node near the other nodes of that list.
             | 
             | To have a good chance that there is space near the other
             | nodes, you'd have to reserve memory up front for each list
             | you'd want to do that with, though, and that will kill
             | cache locality before that optimization sets in.
             | Corrections welcome, but I don't see that (easily) being a
             | net win. But hey, with a sufficiently smart compiler at
              | hand, you can do wonders. (Edit: when looking at pure
              | functions, it often may not be that hard to discriminate
              | between "this allocation will be returned" and "this
              | allocation is temporary". For example, if you use _map_ to
              | apply a function to all elements in a list, and that
              | function is pure, detecting that only the return values of
              | that function will be visible after the function returns
              | isn't that hard.)
             | 
             | An easier win is that lisp garbage collectors tend to move
             | memory, and thus can try to move nodes that reference each
             | other close together. You get that for free with a
             | semispace collector, but that has quite a few
             | disadvantages, so you don't want to use that. A
             | generational collector also will do it a bit, though (say
             | you create a few million nodes of garbage to build a list
             | of a few hundred results. Then, after generational
             | collection, those few hundred cells will be closer together
              | than before.)
        
             | jerf wrote:
             | Instead of thinking of a linked list structurally, think of
             | it functionally. You can have a token that represents a
             | location in the linked list. From that token you have a
             | Fetch() operation that will fetch what is at that location,
             | a Next() operation that will fetch the next token or some
             | indication that you are done, and depending on your
             | language, some sort of insert or append operation
             | (mutability factors in here).
             | 
             | While there is a natural encoding of this process into a
             | pointer to the target value, and a pointer to the next
             | node, it is not the only encoding possible by any means.
             | 
             | A good exercise if you are having trouble with this is to
             | implement a little linked list that performs those
             | operations on something that simply backs to an array,
             | including the blind O(n) copy when appending another list.
             | It should be quite short. But that is not the only
             | alternate implementation, either, and you can easily build
             | others where the interface is maintained but you pay only
             | log n additional factors maximum for operations, easily
             | recovered from the constant factors which on modern
             | hardware are often staggering.
             | 
             | Once you break the entanglement between memory
             | representation and the API a data structure offers in your
             | mind, many ways become obvious as to how to possibly
             | improve this structure while maintaining the same
             | operations, many of which of course already exist and have
             | names.
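              | 
              | A concrete version of that exercise might look like this in
              | Rust (an illustrative sketch, not anyone's real library):
              | 
              |     // The token is opaque: callers can't tell this "linked
              |     // list" is actually backed by a growable array.
              |     #[derive(Clone, Copy)]
              |     struct Token(usize);
              | 
              |     struct ArrayList<T> {
              |         items: Vec<T>,
              |     }
              | 
              |     impl<T: Clone> ArrayList<T> {
              |         fn first(&self) -> Option<Token> {
              |             if self.items.is_empty() { None } else { Some(Token(0)) }
              |         }
              | 
              |         fn fetch(&self, t: Token) -> &T {
              |             &self.items[t.0]
              |         }
              | 
              |         fn next(&self, t: Token) -> Option<Token> {
              |             if t.0 + 1 < self.items.len() { Some(Token(t.0 + 1)) } else { None }
              |         }
              | 
              |         // The blind O(n) copy when appending another list.
              |         fn append(&mut self, other: &ArrayList<T>) {
              |             self.items.extend_from_slice(&other.items);
              |         }
              |     }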
        
               | Arch485 wrote:
                | A linked list is, by definition, a value with a pointer
                | to the next one. That's what makes it "linked". The API
                | you're describing is the iterator pattern, which indeed
                | can be backed by almost anything (be it a linked list,
                | array, tree, etc.).
        
               | jerf wrote:
               | Both replies as I write this are ignoring the question I
               | was answering. The question was, how can a compiler
               | implement a linked list as anything other than a linked
               | list? The answer is what I gave. Being a compiler, it may
               | also do things like an analysis of the use cases to prove
               | that there are no relevant changes to performance (or
               | that performance only improves) and fall back to a "true"
               | linked list if it can't prove how it is being used, or,
               | being a compiler, it may just tell you to get stuffed and
               | deal with the performance differences. Depends on the
               | choices the compiler author(s) made.
               | 
               | But just because Lisp has cons and cdr does not mean the
               | compiler is obligated to only and exactly use things that
               | structurally look like linked lists in the actual
               | implementation. You need to break the relationship
               | between memory layout and API to understand what is going
               | on here. You may choose later to put them back together;
               | the naive memory layout is often a good one. (Linked
               | lists nowadays are arguably one of the larger exceptions
               | to that, but there are still circumstances even so where
               | that may not be an issue; see Erlang's memory management
               | for instance and ponder how that impacts linked list
               | locality.) But you can't understand what compilers may be
               | doing if you do not realize that there _is_ a
               | distinction.
        
               | wavemode wrote:
               | > Instead of thinking of a linked list structurally,
               | think of it functionally
               | 
               | If your argument is that a linked-list-like data
               | structure can be implemented using something other than a
               | linked list, then I agree with you. Vector tries (for
               | example) are great for that use case.
               | 
               | But a vector trie isn't a linked list, it's a vector
               | trie. As such, it will be faster for some usage patterns,
               | equal for some, and completely degenerate for some
               | others. Just like any other data structure that isn't a
               | linked list.
               | 
               | It wouldn't really be an "optimization" to implement my
               | linked list code with something other than a linked list
               | - it would be rewriting my code.
        
               | kolbe wrote:
               | Every linked list discussion I've ever gotten into ends
               | with someone finding a way to modify the std::list data
               | structure in such a way that totally breaks the
               | definition of the LL data structure. We may as well call
               | it "the law of linked list arguments"
        
             | bjoli wrote:
             | Well, I once wrote a small scheme that did loop unrolling
             | and allocated lists in as many steps as the unroll went,
             | which was capped at a maximum of 5.
             | 
             | That worked surprisingly well, but there were lots of edge
             | cases since I am a professional bassoonist and not a
             | compiler guy.
             | 
             | I generalized it and made all lists allocated in chunks of
             | 16 and then let the GC truncate it. I spent a stupid amount
             | of work doing the first thing and the second thing was done
             | in an afternoon.
             | 
             | Then there are all kinds of
        
             | zelphirkalt wrote:
              | If needed, a linked list can be backed by a vector that is
              | resized when necessary.
        
           | twic wrote:
           | If anyone knows of a detailed writeup of a language
           | implementation like this, please link to it. The thread under
           | this comment is disappointingly full of "a sufficiently smart
           | compiler could ..." type stuff.
        
         | Xeamek wrote:
          | Why can't we 'simply' get a processor extension to mark data
          | as a pointer, so that the prefetcher would actually know what
          | to fetch?
          | 
          | From my understanding this is what led to the recent
          | 'unpatchable' exploit in Apple's M1, but rather than trying to
          | guess it by some heuristic, why not just give compilers the
          | option to make that optimization?
        
           | infamouscow wrote:
           | This was the idea of Itanium. It failed mostly because of
           | economics.
           | 
           | It turns out programmers, or rather their employers, don't
            | really care about using hardware efficiently. They care about
           | shipping things yesterday, because that's how business deals
           | get closed, and making software efficient is secondary to
           | money changing hands. Performance never really matters to
           | business people after the checks are cashed.
           | 
           | Multicore computers have been ubiquitous for more than a
           | decade, yet the overwhelming majority of software built today
            | is single-threaded microservices, wherein they spend most of
           | their time serializing and deserializing message payloads.
           | 
           | This is all really to say that most performance is already
           | being left on the table for the majority of what computers
           | are used for today.
        
             | Xeamek wrote:
             | I mean sure, I don't doubt 99% of end-user programmers
              | wouldn't look twice at something like this, but compiler
              | designers probably would care.
              | 
              | And it's not like the companies aren't trying this idea
              | (again, the M1 exploit). But for whatever reason they want
              | to keep CPUs as black-box, perfect abstract machines, even
              | though we know they aren't.
        
             | kolbe wrote:
              | Given how much of today's computing depends on a
              | database query, this is no surprise. Who cares about the
             | micros you gain with added efficiency while there's a 100ms
             | db query return in the path?
        
               | Xeamek wrote:
               | Apparently Apple and Intel do, since they introduced
               | those changes into their silicon
        
               | kolbe wrote:
               | Not every pipeline involves a db query.
        
               | gpderetta wrote:
               | Where do you think DBs run?
        
             | ComputerGuru wrote:
             | I do want to say that I think the Itanic would have fared
             | way, way better in a post-LLVM world where the importance
             | of smart, optimizing compilers is much more valued and
             | understood and language designers actively work hand-in-
             | hand with compiler devs far more often (with much more
             | significant back-and-forth from hardware manufacturers).
        
               | gpderetta wrote:
               | I don't think LLVM is particularly good at optimizing
               | VLIW code.
               | 
               | Very good optimizing compilers existed before LLVM. Intel
               | had one specifically for Itanium. It wasn't enough.
        
           | Joker_vD wrote:
            | I don't think making the CPU issue (likely bogus) pre-fetches
            | for every field in the cache line that's marked as a pointer
            | is really that good an idea. At best, you save a couple of
            | cycles because the fetches start one or two instructions
            | before the actual load instruction for "loading the linked
            | pointer" is issued. At worst, you keep thrashing your cache
            | loading data you're _not_ going to read, delaying fetching
            | the data you _will_ read.
        
             | Xeamek wrote:
             | For everything? Obviously no.
             | 
              | But for all the crazy optimizations modern compilers do, I
              | don't see how marking pointers for more than a couple of
              | them in a row is _that_ crazy.
        
           | favorited wrote:
           | Apple Silicon does this. If it prefetches something that
           | looks like a pointer, it will also fetch the pointed-to
           | memory. It's a cool feature, and is especially useful for
           | Apple, since their Objective-C collections only store
           | pointers - but it also can re-open the door for certain
           | timing attacks by violating preconditions of constant-time
           | cryptography algorithms.
           | 
           | https://en.wikipedia.org/wiki/GoFetch
        
         | thefaux wrote:
         | An exception to this is precisely parsing and validation in
         | which linked lists can and ideally should be populated
         | consecutively (using an arena allocator). I agree that general
         | purpose linked lists are rarely optimal but they are hard to
          | beat for parsers, where they hit a Pareto sweet spot of
          | implementation simplicity and good performance.
        
         | dpc_01234 wrote:
         | This is such an important point.
         | 
         | Benchmarking is hard, and very often is done in ideal
         | conditions, which never materialize in real use cases.
        
       | ILoveQaWolf wrote:
       | this is interesting!
        
       | jkaptur wrote:
        | Another cool aspect of this, and (if I understand correctly)
        | where Rust really helps you, is that you can explore multiple
        | branches in parallel, since the linked list is immutable.
        
         | eimrine wrote:
          | What if the linked list is cyclic or doubly-linked?
        
           | amelius wrote:
           | Then Rust isn't the right tool for the job. Rust is great for
           | tree-like structures which is 99% of what you encounter
           | anyway. Unless you're writing a kernel or something.
        
           | jkaptur wrote:
           | I was talking about this case in particular, where we know
           | the list is basically isomorphic to the call stack.
        
         | ComputerGuru wrote:
         | To be fair, lack of concurrency safety never stopped C and C++
         | devs from boldly doing just that, anyway!
        
       | evmar wrote:
       | The "maybe you don't need a linked list" proposal at the bottom
       | seems significantly better than the options presented in the
       | post:
       | 
       | - almost no cost in the non-erroring path
       | 
       | - no extra data structures to manage
       | 
       | - a lot less code
       | 
       | I think the post would benefit from a better analysis of why this
       | doesn't work for them.
        
         | vouwfietsman wrote:
         | Indeed, also building a linked list over the stack like that is
         | a crafty but very weird design. Keep it simple.
        
         | dmitry_dygalo wrote:
         | Indeed, I agree with your points.
         | 
         | This idea was added after I wrote the post and wasn't taken
         | from my own optimization efforts in `jsonschema`. Originally,
         | in `jsonschema` the output type is actually a badly composed
         | iterator and I intended to simplify it to just a `Result<(),
         | ValidationError>` for the article, but with this output, there
         | are actually way better optimizations than I originally
         | implemented.
         | 
          | If I'd discovered this idea earlier, I'd probably have spent
          | more time investigating it.
        
       | duped wrote:
       | It strikes me the bottleneck for this problem isn't Vec or List,
       | it's the serde_json Value type that needs to be used. This is
       | useful for serializing/deserializing values into Rust types but
       | if you're trying to validate JSON against a schema you don't
       | actually need the JSON value data, just the types of the nodes
       | (or more specifically, you only need _some_ of the value data,
        | and probably not much of it, so don't pay for it when you don't
        | have to).
       | 
       | If you implemented your own parser for the schema and the JSON
       | and only used an AST to validate + span information (which can
       | just be a pair of u16s for start/end of a token) then you can
       | collect your error messages very, very quickly and generate the
       | error message once validation is finished.
       | 
       | Heavily optimized compilers will do this for semantic analysis
       | and type checking, where the IR they use can be constructed
       | quickly and then used with the input text to get helpful messages
       | out, while the guts of the algorithm is only working with
       | semantic information in a structure that's compact and easy to
       | access/manipulate.
       | 
        | All that said, serde_json is incredibly convenient, and giving
        | it up to write your own parser is a big hammer for a problem
        | that probably doesn't need it.
        
         | EGreg wrote:
          | Just use Cap'n Proto. No deserialization needed!
        
           | aabhay wrote:
            | What's your experience like using it? Is it ergonomic or does
           | it require you to do lots of type gymnastics?
        
             | cabronerp wrote:
             | This repo has a nice pub/sub implementation based on capnp:
             | https://github.com/commaai/cereal/blob/master/log.capnp
        
         | ComputerGuru wrote:
          | > All that said, serde_json is incredibly convenient, and
          | giving it up to write your own parser is a big hammer for a
          | problem that probably doesn't need it.
         | 
         | I had a thought in my reply [0] on this that actually might let
         | him eat his cake and have it too in this regard. I think you
         | can heavily abuse serde::de::Visitor to schema validate without
         | actually parsing (or with less parsing, at any rate). I went
         | into more detail in my comment but I wanted to ping you
         | (@duped).
         | 
         | [0]: https://news.ycombinator.com/item?id=40357159
        
         | dmitry_dygalo wrote:
          | That is a really cool idea! Thank you!
        
         | ww520 wrote:
          | I've seen a lexer/parser scheme that encodes the lexer token
          | type along with the token's file location information into a
          | u64-sized integer, something like
          | 
          |     struct Token {
          |         token_type: u8,
          |         type_info: u8,
          |         token_start: u32, // offset into the source file
          |         token_len: u16,
          |     }
          | 
          | It's blazing fast. The lexer/parser can process millions of
          | lines per second. Both the textual information (via the offset
          | and length) and the location information are preserved.
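          | 
          | As a quick sanity check (illustrative), the fields really do
          | pack into eight bytes, so a token stream is just a flat
          | Vec<Token> with no pointers at all:
          | 
          |     assert_eq!(std::mem::size_of::<Token>(), 8); // u8 + u8 + u32 + u16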
        
           | acidx wrote:
            | This is roughly what my JSON parser does. It does type-
            | checking, albeit without using JSON Schema: instead there is
            | an object descriptor that you have to define to parse and
            | generate JSON.
           | 
           | It's been developed for embedded systems (it was written
           | originally for a NATS implementation in the Zephyr RTOS), so
           | it's a bit limited and there's no easy way to know where some
           | parsing/type validation error happened, but the information
            | is there if one wants to obtain it:
            | https://github.com/lpereira/lwan/blob/master/src/samples/tec...
        
       | thewakalix wrote:
       | Wouldn't using push_back prevent the need to reverse the Vec at
       | the end?
        
       | ComputerGuru wrote:
       | Nice post, Dmitry!
       | 
       | Two suggestions: it's not immediately obvious whether subsequent
       | benchmark result tables/rows show deltas from the original
       | approach or from the preceding one (it's the latter, which is
       | more impressive). Maybe call that out the first couple of times?
       | 
       | Second, the "using the single Vec but mutating it" option would
       | presumably benefit from a reserve() or with_capacity() call.
       | Since in that approach you push to the vector in both erroring
       | and non-erroring branches, it doesn't have to be exact (though
       | you could do a bfs search to find maximum depth, that doesn't
       | strike me as a great idea) and could be up to some constant value
       | since a single block memory allocation is cheap in this context.
       | (Additionally, the schema you are validating against defines a
       | minimum depth whereas the default vec has a capacity of zero, so
       | you're _guaranteed_ that it's a bad choice unless you're
       | validating an empty, invalid object.)
       | 
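        | Something like this sketch (the initial capacity here is an
        | assumed constant, not a measured value):
        | 
        |     // Reserve a plausible upper bound up front so pushes in the
        |     // common case never reallocate.
        |     let mut path: Vec<&str> = Vec::with_capacity(16);
        | 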
       | But I agree with the sibling comment from @duped that actually
       | parsing to JSON is the biggest bottleneck and simply parsing to
       | the minimum requirements for validation would be far cheaper,
       | although it depends on if you'll be parsing immediately after in
       | case it _isn't_ invalid (make the common case fast, assuming the
       | common case here is absence of errors rather than presence of
       | them) or if you really do just want to validate the schema (which
       | isn't that rare of a requirement, in and of itself).
       | 
       | (Edit: I do wonder if you can still use serde and serde_json but
       | use the deserialize module's `Visitor` trait/impl to
       | "deserialize" to an enum { Success, ValidationError(..) }` so you
       | don't have to write your own parser, get to use the already
       | crazy-optimized serde code, and still avoid actually fully
       | parsing the JSON in order to merely validate it.)
       | 
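        | A rough shape of that Visitor idea (hypothetical; a real
        | validator would check far more than "is it an object"):
        | 
        |     use std::fmt;
        |     use serde::de::{IgnoredAny, MapAccess, Visitor};
        | 
        |     struct SchemaCheck;
        | 
        |     impl<'de> Visitor<'de> for SchemaCheck {
        |         // No value is produced: the side effect *is* the validation.
        |         type Value = ();
        | 
        |         fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
        |             f.write_str("an object matching the schema")
        |         }
        | 
        |         fn visit_map<A: MapAccess<'de>>(self, mut map: A) -> Result<(), A::Error> {
        |             // Walk the keys, apply whatever checks the schema needs,
        |             // and discard values instead of building a serde_json::Value.
        |             while let Some((_key, _value)) = map.next_entry::<String, IgnoredAny>()? {}
        |             Ok(())
        |         }
        |     }
        | 
        |     fn validate(input: &str) -> Result<(), serde_json::Error> {
        |         let mut de = serde_json::Deserializer::from_str(input);
        |         serde::de::Deserializer::deserialize_map(&mut de, SchemaCheck)
        |     }
        | 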
       | If this were in the real world, I would use a custom slab
       | allocator, possibly from storage on the stack rather than the
       | heap, to back the Vec (and go with the last design with no linked
       | lists whatsoever). But a compromise would be to give something
       | like the mimalloc crate a try!
        
         | dmitry_dygalo wrote:
         | Thanks!
         | 
         | > Two suggestions: it's not immediately obvious whether
         | subsequent benchmark result tables/rows show deltas from the
         | original approach or from the preceding one (it's the latter,
         | which is more impressive). Maybe call that out the first couple
         | of times?
         | 
         | Agree!
         | 
         | > Second, the "using the single Vec but mutating it" option
         | would presumably benefit from a reserve() or with_capacity()
         | call. Since in that approach you push to the vector in both
         | erroring and non-erroring branches, it doesn't have to be exact
         | (though you could do a bfs search to find maximum depth, that
         | doesn't strike me as a great idea) and could be up to some
         | constant value since a single block memory allocation is cheap
         | in this context. (Additionally, the schema you are validating
         | against defines a minimum depth whereas the default vec has a
         | capacity of zero, so you're guaranteed that it's a bad choice
         | unless you're validating an empty, invalid object.)
         | 
         | Oh, this is a cool observation! Indeed it feels like
         | `with_capacity` would help here
         | 
         | > But I agree with the sibling comment from @duped that
         | actually parsing to JSON is the biggest bottleneck and simply
         | parsing to the minimum requirements for validation would be far
         | cheaper, although it depends on if you'll be parsing
         | immediately after in case it isn't invalid (make the common
         | case fast, assuming the common case here is absence of errors
         | rather than presence of them) or if you really do just want to
         | validate the schema (which isn't that rare of a requirement, in
         | and of itself).
         | 
         | My initial assumption was that usually the input is already
         | parsed. E.g. validating incoming data inside an API endpoint
         | which is then passed somewhere else in the same representation.
         | But I think that is a fair use case too and I was actually
         | thinking of implementing it at some point via a generic `Json`
          | trait which does not imply a certain representation.
         | 
         | > (Edit: I do wonder if you can still use serde and serde_json
         | but use the deserialize module's `Visitor` trait/impl to
         | "deserialize" to an enum { Success, ValidationError(..) }` so
         | you don't have to write your own parser, get to use the already
         | crazy-optimized serde code, and still avoid actually fully
         | parsing the JSON in order to merely validate it.)
         | 
          | Now that I've read the details, it feels like a really, really
          | cool idea!
         | 
         | > If this were in the real world, I would use a custom slab
         | allocator, possibly from storage on the stack rather than the
         | heap, to back the Vec (and go with the last design with no
         | linked lists whatsoever). But a compromise would be to give
         | something like the mimalloc crate a try!
         | 
         | Nice! In the original `jsonschema` implementation the
         | `validate` function returns `Result<(), ErrorIter>` which makes
         | it more complex to apply that approach, but I think it still
         | should be possible.
        
       | tomck wrote:
       | This article is disingenuous with its Vec benchmark. Each call to
       | `validate` creates a new Vec, but that means you allocate + free
       | the vec for _each_ validation. Why not store the vec on the
       | validator to reuse the allocation? Why not mention this in the
        | article? I had to dig into the git history to find out whether
        | the vec was getting reallocated. This feels like you had a cool
        | conclusion for your article, 'linked lists faster than vec', but
       | you had to engineer the vec example to be worse. Maybe I'm being
       | cynical.
       | 
       | It would be interesting to see the performance of a `Vec<&str>`
       | where you reuse the vector, but also a `Vec<u8>` where you copy
       | the path bytes directly into the vector and don't bother doing
       | any pointer traversals. The example path sections are all very
       | small - 'inner', 'another', 5 bytes, 7 bytes - less than the
       | length of a pointer! storing a whole `&str` is 16 bytes per
       | element and then you have to rebuild it again anyway in the
       | invalid case.
       | 
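        | A rough sketch of that byte-buffer variant (my illustration):
        | 
        |     // Path segments live inline in one Vec<u8>; a parallel stack
        |     // of lengths makes popping a segment a single truncate.
        |     struct SegStack {
        |         bytes: Vec<u8>,
        |         lens: Vec<usize>,
        |     }
        | 
        |     impl SegStack {
        |         fn push(&mut self, seg: &str) {
        |             self.bytes.extend_from_slice(seg.as_bytes());
        |             self.lens.push(seg.len());
        |         }
        | 
        |         fn pop(&mut self) {
        |             if let Some(len) = self.lens.pop() {
        |                 let keep = self.bytes.len() - len;
        |                 self.bytes.truncate(keep);
        |             }
        |         }
        |     }
        | 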
       | ---
       | 
       | This whole article is kinda bad, it's titled 'blazingly fast
       | linked lists' which gives it some authority but the approach is
       | all wrong. Man, be responsible if you're choosing titles like
       | this. Someone's going to read this and assume it's a reasonable
       | approach, but the entire section with Vec is bonkers.
       | 
       | Why are we designing 'blazingly fast' algorithms with rust
       | primitives rather than thinking about where the data needs to go
       | first? Why are we even considering vector clones or other crazy
       | stuff? The thought process behind the naive approach and step 1
       | is insane to me:
       | 
       | 1. i need to track some data that will grow and shrink like a
       | stack, so my solution is to copy around an immutable Vec (???)
       | 
       | 2. this is really slow for obvious reasons, how about we: pull in
       | a whole new dependency ('imbl') that attempts to optimize for the
       | general case using complex trees (???????????????)
       | 
       | You also mention:
       | 
       | > In some scenarios, where modifications occur way less often
       | than clones, you can consider using Arc as explained in this
       | video
       | 
       | I understand you're trying to be complete, but 'some scenarios'
       | is doing a lot of work here. An Arc<[T]> approach is _literally_
       | just the same as the naive approach _but_ with extra atomic
       | refcounts! Why mention it in this context?
       | 
       | You finally get around to mutating the vector + using it like a
       | stack, but then comment:
       | 
       | > However, this approach requires more bookkeeping and somewhat
       | more lifetime annotations, which can increase code complexity.
       | 
       | I have no idea why you mention 'code complexity' here (complexity
       | introduced _by rust_ and its lifetimes), but fail to mention how
       | adding a dependency on  'imbl' is a negative.
        
         | thefaux wrote:
         | > how about we: pull in a whole new dependency ('imbl') that
         | attempts to optimize for the general case using complex trees
         | (???????????????)
         | 
          | To me this is a self-answering question.
        
         | dmitry_dygalo wrote:
         | > This article is disingenuous with its Vec benchmark. Each
         | call to `validate` creates a new Vec, but that means you
         | allocate + free the vec for each validation. Why not store the
         | vec on the validator to reuse the allocation? Why not mention
          | this in the article? I had to dig into the git history to find
          | out whether the vec was getting reallocated.
         | 
          | The idea goes back to [0], which is similar to one of the steps
          | in the article; before adding `push` & `pop` I just cloned it
          | to make things work. That's what Rust beginners do.
         | 
         | > This feels like you had a cool conclusion for your article,
         | 'linked lists faster than vec', but you had to engineer the vec
         | example to be worse. Maybe I'm being cynical.
         | 
         | Maybe from today's point in time, I'd think the same.
         | 
         | > It would be interesting to see the performance of a
         | `Vec<&str>` where you reuse the vector, but also a `Vec<u8>`
         | where you copy the path bytes directly into the vector and
         | don't bother doing any pointer traversals. The example path
         | sections are all very small - 'inner', 'another', 5 bytes, 7
         | bytes - less than the length of a pointer! storing a whole
         | `&str` is 16 bytes per element and then you have to rebuild it
         | again anyway in the invalid case.
         | 
         | Yeah, that makes sense to try!
         | 
         | > This whole article is kinda bad, it's titled 'blazingly fast
         | linked lists' which gives it some authority but the approach is
         | all wrong. Man, be responsible if you're choosing titles like
         | this. Someone's going to read this and assume it's a reasonable
         | approach, but the entire section with Vec is bonkers.
         | 
         | > Why are we designing 'blazingly fast' algorithms with rust
         | primitives rather than thinking about where the data needs to
         | go first? Why are we even considering vector clones or other
         | crazy stuff? The thought process behind the naive approach and
         | step 1 is insane to me:
         | 
         | > 1. i need to track some data that will grow and shrink like a
         | stack, so my solution is to copy around an immutable Vec (???)
         | 
         | > 2. this is really slow for obvious reasons, how about we:
         | pull in a whole new dependency ('imbl') that attempts to
         | optimize for the general case using complex trees
         | (???????????????)
         | 
         | That's clickbait-y, though none of the article's ideas aim to
         | be a silver bullet. I mean, there are admittedly dumb ideas in
         | the article, though I won't believe that somebody would come up
         | with a reasonable solution without trying something stupid
         | first. However, I might have used better wording to highlight
          | that and mention that I came up with some of these ideas when
          | I was working on `jsonschema` in the past.
         | 
         | > I understand you're trying to be complete, but 'some
         | scenarios' is doing a lot of work here. An Arc<[T]> approach is
         | literally just the same as the naive approach but with extra
         | atomic refcounts! Why mention it in this context?
         | 
         | If you don't need to mutate the data and need to store it in
         | some other struct, it might be useful, i.e. just to have cheap
         | clones. But dang, that indeed is a whole different story.
         | 
         | > I have no idea why you mention 'code complexity' here
         | (complexity introduced by rust and its lifetimes), but fail to
         | mention how adding a dependency on 'imbl' is a negative.
         | 
         | Fair. Adding `imbl` wasn't a really good idea for this context
         | at all.
         | 
         | Overall I think what you say is kind of fair, but I think that
         | our perspectives on the goals of the article are quite
         | different (which does not disregard the criticism).
         | 
          | Thank you for taking the time to answer!
         | 
          | - [0] - https://github.com/Stranger6667/jsonschema-rs/commit/1a1c6c3...
        
         | ww520 wrote:
          | Yes, when I saw that the immutable path was cloned and a key
          | appended to it, I knew the benchmark was a strawman.
        
       | torusle wrote:
       | > Linked lists are taught as fundamental data structures in
       | programming courses, but they are more commonly encountered in
       | tech interviews than in real-world projects.
       | 
       | I beg to disagree.
       | 
       | In kernels, drivers, and embedded systems they are _very_ common.
        
         | ComputerGuru wrote:
         | Really only because they're so goddamn easy. I find myself
          | using linked lists a lot less since adopting Rust for embedded
         | code (even with no_std and no allocator, but especially when
         | alloc-only std data structures are within reach).
        
         | saghm wrote:
         | Most people who take data structures courses or perform tech
         | interviews don't end up working on kernels, drivers, or
         | embedded systems though. To me, it sounds like the point being
         | made is that there are a large number of programmers who have
         | learned about linked lists but haven't run into many cases
          | where they needed them in the real world, and I think it's
         | accurate.
        
           | dmitry_dygalo wrote:
           | This was my intention
        
             | SoftTalker wrote:
             | Agree, I can't recall using anything more complicated than
             | lists/arrays or hash tables (key/value stores) in practice,
             | in many years of (mostly web application) programming. And
             | even those I'm not coding from scratch, I'm using classes
             | or functions that my programming language gives me. For
             | anything more complicated than that, I'm using a database,
             | which of course is using many data structures under the
             | covers but I don't directly touch those.
        
               | sumtechguy wrote:
               | I used to use them all the time. However, now? I would be
               | hard pressed to not use one of the many built in
               | vector/list/dict/hash items in many languages now. I
               | would have to be truly doing something very low level or
               | for speed to use one.
        
               | josephg wrote:
               | As a counterpoint, I've been working on collaborative
               | text editing. I ended up implementing a custom b-tree
               | because we needed a few features that I couldn't find in
               | any off the shelf library:
               | 
               | - My values represent runs of characters in the document.
               | 
               | - Inserts in the tree may split a run.
               | 
               | - Runs have a size - 0 if the run is marked as deleted or
               | the number of characters otherwise. The size changes as
               | we process edits
               | 
               | - Every inserted character has an ID. I need to be able
               | to look up any run by its ID, and then edit it.
               | (Including querying the run's current position in the
               | tree and editing the run's size).
               | 
                | It's an interesting data structure problem, and it took a
                | few weeks to reach a good solution (and a few more weeks
                | later rewriting it in safe Rust and improving performance
                | in the process).
               | 
               | I love this stuff. I think it's pretty rare to find a
               | reason to code your own collection types these days, but
               | it certainly comes up from time to time!
        
           | samatman wrote:
           | You need a linked list to write hello world in any Lisp,
           | though.
           | 
           | Seems like the glaring exception to the rule!
        
         | ziddoap wrote:
         | > _In kernels, drivers, and embedded systems they are very
         | common._
         | 
         | Out of all the programmers in the world, what percentage of
         | them do you think work in the kernel/driver/embedded spaces?
        
           | 9659 wrote:
           | 1%
        
             | coldtea wrote:
             | More like 0.01% -- if we consider enterprise programmers,
             | web programmers, and application/game programmers which I'd
             | expect to be the largest groups...
        
           | sfink wrote:
           | First, my only guess is that everyone's guesses are going to
           | be wildly wrong. People who work in such spaces will greatly
           | overestimate. People who don't will greatly underestimate.
           | (This is mostly due to how many comments I've read on HN that
           | implicitly assume that most people's problems and
           | perspectives are the same as the commenter's.)
           | 
           | Second, linked lists are useful in a lot more places than
           | that. Probably a better proxy would be low-level coders. You
            | almost always want a linked list _somewhere_ when you're
           | dealing with memory addresses and pointers. Maybe not for the
           | primary collections, but there are always secondary ones that
           | need to maintain a collection with a different membership or
           | ordering, and vectors of pointers don't have many clear
           | advantages over intrusive linked lists for those secondary
           | collections.
        
             | josephg wrote:
             | Yeah intrusive collections in C is the biggest use I've
             | seen. I played with a physics engine a few years ago
             | (chipmunk2d) which made heavy use of intrusive linked lists
             | to store all the objects in the world model. I suspect
             | there's some clever data structures out there that might
             | have better performance, but the intrusive linked list
             | approach was simple and fast!
        
         | igammarays wrote:
         | Why? Why would someone reach for a linked list in a kernel,
         | driver, or embedded system?
        
           | dmitry_dygalo wrote:
           | Easier to avoid allocation errors, e.g. in the Linux kernel.
           | I think Alice Ryhl mentioned it here -
           | https://www.youtube.com/watch?v=CEznkXjYFb4
        
             | SJC_Hacker wrote:
              | How do linked lists prevent allocation errors? If anything
              | it would seem to make them worse.
             | 
             | My experience in embedded, everything is hardcoded as a
             | compile time constant, including fixed size arrays (or
             | vectors of a fixed capacity)
        
               | sratner wrote:
               | In more complex embedded software you are likely to see
               | free lists used to manage pools of preallocated resources
               | (like event structs etc) or pools of fixed sized memory
               | buffers.
        
               | sfink wrote:
               | Intrusive linked lists eliminate the allocation entirely.
               | With a vector<Obj>, you have the Obj allocation and then
               | potential vector-related reallocations. With an intrusive
               | linked list, you only have the Obj allocation. So your
               | code that adds/removes list entries does no additional
               | allocation at all, it reuses a pointer or two that was
               | allocated as part of the original Obj allocation. Often
               | the list manipulation happens at a time when allocation
               | failures are inconvenient or impossible to handle.
        
               | torusle wrote:
               | In embedded, you often need message queues.
               | 
               | A common way to implement these is to have an array of
               | messages, sized for the worst case scenario and use this
               | as the message pool.
               | 
                | You keep the unused messages in a singly linked "free
                | list", and keep the used messages in a doubly linked
                | queue or FIFO structure.
               | 
               | That way you get O(1) allocation, de-allocation, enqueue
               | and dequeue operations for your message queue.
               | 
                | Another example of this paradigm is job queues. You
                | might have several actuators or sensors connected to a
                | single interface and want to talk to them. The high-level
                | "business" logic enqueues such jobs, and interrupt-driven
                | logic works on these jobs in the background.
               | 
               | And because you only move some pointers around for each
               | of these operations it is perfectly fine to do so in
               | interrupt handlers.
               | 
               | What you really want to avoid is to move kilobytes of
               | data around. That quickly leads to missing other
               | interrupts in time.
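                | 
                | An index-based sketch of that pattern in Rust (simplified
                | to a singly linked queue; real firmware would also guard
                | this against interrupts):
                | 
                |     const POOL_SIZE: usize = 32; // worst-case message count
                | 
                |     #[derive(Clone, Copy)]
                |     struct Message {
                |         payload: [u8; 16],
                |         next: Option<u8>, // free-list or queue link, as a pool index
                |     }
                | 
                |     struct Queue {
                |         pool: [Message; POOL_SIZE], // all storage reserved up front
                |         free: Option<u8>,           // head of the free-list
                |         head: Option<u8>,           // oldest queued message
                |         tail: Option<u8>,           // newest queued message
                |     }
                | 
                |     impl Queue {
                |         // O(1): pop a slot off the free-list, link it at the tail.
                |         fn enqueue(&mut self, payload: [u8; 16]) -> Result<(), ()> {
                |             let i = self.free.ok_or(())?; // pool exhausted
                |             self.free = self.pool[i as usize].next;
                |             self.pool[i as usize] = Message { payload, next: None };
                |             match self.tail {
                |                 Some(t) => self.pool[t as usize].next = Some(i),
                |                 None => self.head = Some(i),
                |             }
                |             self.tail = Some(i);
                |             Ok(())
                |         }
                |     }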
        
           | sratner wrote:
           | No memory allocation/reallocation, preallocated resources
           | managed in e.g. a free list. Also for things like packetized
           | networks, lists are handy for filling as you progress down
           | the stack while using fixed sized packet buffers, or
           | reassembling fragments.
           | 
            | In the embedded world, memory often needs to be exactly
            | controlled, and allocation failures are fatal without a more
            | complex MMU. In the kernel world, I believe the main reason
            | is that allocations can block.
        
           | kevingadd wrote:
           | Intrusive lists are really powerful for those kinds of
           | scenarios, and technically are linked lists. They're widely
           | used in the kernel, IIRC.
        
           | akira2501 wrote:
           | O(n) iteration but pretty much guaranteed O(1) for every
           | other operation. If that's the semantic you need, then linked
           | lists are your friend.
        
           | cyberax wrote:
           | In kernels, it's usually hard to get general-purpose
           | allocation working reliably in all contexts. And you need
           | that for resizable vectors. With lists, you just need to be
           | able to grab an element-sized block. Quite often, it's even
            | done at memory-page granularity.
           | 
           | In addition, a lot of data structures might be shared across
           | multiple cores. Linked lists can be traversed and mutated
           | concurrently (although with a bit of care).
        
             | dist1ll wrote:
             | I wonder how much of that is due to the kernel history, and
             | the influence of C idioms, and not because of some inherent
             | design superiority.
             | 
             | I'd be convinced once I see pure Rust kernels geared
             | towards modern machines suddenly using linked lists
             | everywhere. Otherwise I'm leaning towards it being a side-
             | effect of the language choice and culture.
             | 
             | Also because I've seen the same kind of reasoning applied
             | to compilers (e.g. "of course you need linked lists in
             | compilers, they are extremely graph traversal heavy"). But
             | one look at modern compilers implemented in Rust paint a
             | very different picture, with index-based vectors, data-
             | oriented design and flattened ASTs everywhere.
        
           | vineyardlabs wrote:
           | Any time you have a computer interacting with the outside
           | world in an asynchronous fashion you basically have to have
           | some form of buffering which takes the form of a queue/fifo.
           | A linked list is the most performant/natural way of modeling
           | a queue in our ubiquitous computing infrastructure.
           | 
           | I/e in a DMA-based ethernet driver, the ethernet MAC receives
           | packets asynchronously from the processor, perhaps faster
           | than the processor can ingest them. So the mac interrupts the
           | processor to give it new packets, and the processor can't sit
           | processing the packets in the interrupt context, so it needs
           | to put them into some ordered list for processing later when
           | it has downtime. In a true embedded system, the memory for
           | this list is going to be fixed or statically allocated, but
           | you still don't really want to have an array-style list with
           | fixed indexing, as you'll have to manage what happens when
           | the index wraps around back to 0 etc, so instead you just
           | construct a linked list in that pre-allocated memory.
           | 
            | I wouldn't say linked lists aren't really used in high-level
            | applications; as I said, they're used all over the place
            | whenever you have external asynchronous communication. It's
            | just that modern high-level frameworks/libs totally abstract
            | this away from most people writing high-level code.
        
         | dmitry_dygalo wrote:
          | I'd say most developers don't write kernels, drivers, or
          | embedded systems, at least from what I've seen. I am not saying
          | that there aren't many devs like this, but rather that there
          | are fewer kernel devs than web devs.
        
         | wesnerm2 wrote:
         | Linked lists were heavily used in application software before
         | the appearance of standard libraries and Java, which is when
          | dynamically sizable array-based lists became common. There also
          | wasn't a gap between the performance of linked lists and arrays
          | before CPUs became significantly faster than RAM.
        
         | TheCondor wrote:
          | There are plenty of good uses for linked lists and their
          | variants. LRU lists come to mind; I wouldn't bet that they're
          | the most efficient implementation, but they're pretty darn
          | good. Then obviously things like breadth-first search need a
          | type of queue data structure. It often can come down to memory
          | pressure: if you've got gigs to spare, then allocating a
          | contiguous block of memory for a list of something isn't a big
          | deal; if memory is tight and maybe fragmented, linked lists can
          | and will get it done. They have their places.
         | 
          | I did start to encounter some fresh grads with degrees that
          | said "computer science" on them who couldn't answer some basic
          | linked list questions. I was beginning to think it was a bad
          | set of questions until I hit those kids. If you claim to know
          | "computer science" and don't know what a linked list is,
          | especially beyond some textbook stuff, I'm probably not
          | interested.
        
         | davexunit wrote:
         | I don't do any of those things and I still use lists
         | constantly. Kinda strange to learn that many others, it
         | seems, don't use them at all.
        
           | nequo wrote:
           | What kinds of things are you using them for usually? Is it
           | mostly in C/C++?
        
         | waynesonfire wrote:
         | Linked lists shine when you can perform an O(1) remove
         | operation given a reference to an object on the list. This
         | is very common when using C structs and not possible in
         | Java, for example.
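         | 
         | The pattern is something like this (hypothetical struct):
         | because the links live inside the object itself, a pointer
         | to the object _is_ its position in the list:
         | 
         |     struct task {
         |         struct task *prev, *next; /* intrusive links */
         |         int id;
         |     };
         | 
         |     /* Remove `t` given only a pointer to it: O(1), no
         |        search. `head` is patched if `t` is first. */
         |     static void task_remove(struct task **head,
         |                             struct task *t) {
         |         if (t->prev) t->prev->next = t->next;
         |         else         *head = t->next;
         |         if (t->next) t->next->prev = t->prev;
         |         t->prev = t->next = NULL;
         |     }
         | 
         | In Java, a reference to the element tells you nothing about
         | where its list node lives, so you pay O(N) to find it
         | before you can unlink it.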
        
           | adgjlsfhk1 wrote:
           | These are usually cases where you want a (hash) set. And
           | if you're OK with changing everyone's indexing, the
           | indexing didn't matter in the first place.
        
       | sfink wrote:
       | Linked lists get a bum rap.
       | 
       | Yes, if you have a simple choice between a vector and a linked
       | list, then the vector is vastly superior due to locality and (for
       | non-intrusive linked lists) allocations. So much so that vectors
       | often win even when you're doing lots of O(n) deletions that
       | would be O(1) with a linked list.
       | 
       | But that doesn't mean that linked lists are useless! A vector
       | gives you a single ordering. What if you need multiple? What if
       | you need a free list, which you're never going to iterate
       | over but just grab items off of one at a time? I find it quite
       | common to have one "natural" order, for which I will use a vector
       | (or equivalently, a bump allocator of fixed-size items), and then
       | one or several auxiliary orders like the entries belonging to a
       | particular owner or the ones that will need to be visited for
       | some sort of cleanup or an undo list or some sort of stack or
       | queue of pending items to process. Your common iteration will be
       | in memory order, but that doesn't mean you won't ever want to do
       | different iterations.
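       | 
       | Concretely, the free-list case looks something like this (a
       | sketch, made-up names): the vector owns the memory and gives
       | the natural iteration order, while an intrusive index threads
       | a second order through it that you never iterate, only pop:
       | 
       |     #define POOL_N 1024
       |     struct item {
       |         int value;
       |         int next_free; /* next free slot, or -1 */
       |     };
       |     struct pool {
       |         struct item items[POOL_N];
       |         int free_head; /* -1 when exhausted */
       |     };
       | 
       |     /* Thread every slot onto the free list. */
       |     static void pool_init(struct pool *p) {
       |         for (int i = 0; i < POOL_N - 1; i++)
       |             p->items[i].next_free = i + 1;
       |         p->items[POOL_N - 1].next_free = -1;
       |         p->free_head = 0;
       |     }
       |     /* Grab a slot off the free list: O(1), no allocation. */
       |     static int pool_alloc(struct pool *p) {
       |         int i = p->free_head;
       |         if (i >= 0) p->free_head = p->items[i].next_free;
       |         return i; /* -1 if exhausted */
       |     }
       |     static void pool_free(struct pool *p, int i) {
       |         p->items[i].next_free = p->free_head;
       |         p->free_head = i;
       |     }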
       | 
       | It annoys me that this is always omitted, with the attitude that
       | linked lists are obsolete and useless because vectors be moar
       | better faster gooder, to the point that it's a waste of time to
       | learn how to manipulate linked lists anymore.
       | 
       | I guess a lot of this is probably due to the popularity of high
       | level languages, where you're just dealing with references to
       | everything in the first place. But in those, the arguments in
       | favor of vectors are often not valid because that blessed golden
       | vector of Objects is giving you no more locality than giving your
       | Object a `next` field: the vector of Objects is represented
       | internally as a vector of pointers to the actual Object data, so
       | your oh so nicely cached lookups are doing memory reads that are
       | 100% pure _overhead_ compared to following a `next` link from an
       | Object that fits into a cache line. In both cases, your
       | performance is going to be limited by the locality of your Object
       | data, which is the same whether you have a vector of pointers or
       | Object data with an intrusive pointer.
       | 
       | Also, if you have an existing setup and need to introduce a new
       | list, it is sometimes far easier to drop in a `next` (and maybe
       | `prev`) field than to refactor everything to accommodate a new
       | vector. Especially since the vector will move all of your
       | data when it resizes, which invalidates any pointers you
       | might be using for other purposes. If you'll be iterating that
       | list frequently, then the vector may very well be a good idea. If
       | it's just for error cases or slow paths, then linked lists really
       | aren't bad at all.
       | 
       | I'm not trying to argue _for_ linked lists here so much as
       | arguing against the blanket arguments against them.
       | 
       | </rant>
        
         | PartiallyTyped wrote:
         | > What if you need multiple?
         | 
         | You can very much have a single vector owning the memory, and
         | do all other ordering over auxiliary vectors of indices. Should
         | be cheaper and faster than holding more linked lists.
         | 
         | If you want to remove elements, you can use tombstones to
         | flag deleted entries and then clean up after some
         | threshold.
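         | 
         | A sketch of what I mean (made-up names): the auxiliary
         | order is just a vector of indices into the owning vector,
         | removal writes a tombstone, and you compact once the
         | tombstones pass some threshold:
         | 
         |     #include <stddef.h>
         |     #define DEAD -1 /* tombstone */
         | 
         |     struct order {
         |         int *idx;    /* indices into the owning vector */
         |         size_t len;
         |         size_t dead; /* tombstones written so far */
         |     };
         | 
         |     /* "Remove": O(1), no shifting, no pointer chasing. */
         |     static void order_remove_at(struct order *o, size_t p) {
         |         o->idx[p] = DEAD;
         |         o->dead++;
         |     }
         |     /* Compact in one linear pass, e.g. once half dead. */
         |     static void order_compact(struct order *o) {
         |         size_t w = 0;
         |         for (size_t r = 0; r < o->len; r++)
         |             if (o->idx[r] != DEAD) o->idx[w++] = o->idx[r];
         |         o->len = w;
         |         o->dead = 0;
         |     }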
        
           | sfink wrote:
           | > You can very much have a single vector owning the memory,
           | and do all other ordering over auxiliary vectors of indices.
           | Should be cheaper and faster than holding more linked lists.
           | 
           | Cheaper in space depends on the percentage of elements in the
           | auxiliary list. An intrusive list has space for every element
           | to be in the list, which is wasteful if few are. A vector
           | that grows by doubling could waste nearly half of its
           | capacity. Indexes can often be smaller than pointers,
           | though, which favors the vector approach.
           | 
           | Faster is debatable. Iterating the vector of indexes is quite
           | fast, but indirecting into the data in a random order is
           | still slow. An intrusive linked list doesn't need to do the
           | former, only the latter. (Then again, it also bloats up the
           | data slightly, which matters for small items since fewer fit
           | in cache.)
           | 
           | The main reason why linked lists could still be at an
           | advantage is if you don't want allocations in your auxiliary
           | list manipulation path. Maybe you don't want the
           | unpredictable timing, or maybe you can't handle failure.
           | 
           | I agree on tombstoning, but note that you're giving up some
           | of the vector advantages by doing so. Your processing loop
           | now has a much less predictable branch in it. (Though really,
           | the vector probably still wins here, since the linked list is
           | _built_ out of data-dependent accesses!)
           | 
           | Sometimes these auxiliary lists need to contain things that
           | are no longer in the main list, too, as in the case of a free
           | list. (And if you swap such things to the end, then you've
           | just invalidated all of your indexes.) And non-intrusive
           | linked lists can point to variable-size elements (though they
           | lose most of the other benefits of an intrusive linked list).
           | 
           | Anyway, it's the usual "use the right tool for the right
           | job", I just claim that linked lists are sometimes the right
           | tool.
           | 
           | (Random side note: I use "indexes" instead of "indices" these
           | days, for a silly reason--"indices" is a fine word, I have no
           | issue with it, but it seems to encourage some people to use
           | the non-word "indice" which is an abomination.)
        
       | osigurdson wrote:
       | Linked lists can move items in O(1), but their O(N) search
       | can be bad because of all the cache line misses.
        
       ___________________________________________________________________
       (page generated 2024-05-14 23:01 UTC)