[HN Gopher] Fast linked lists
___________________________________________________________________
Fast linked lists
Author : dmitry_dygalo
Score : 167 points
Date : 2024-05-14 13:51 UTC (9 hours ago)
(HTM) web link (dygalo.dev)
(TXT) w3m dump (dygalo.dev)
| ertucetin wrote:
| 7-8 years ago I created GlueList
| (https://github.com/ertugrulcetin/GlueList) in order to build
| a faster version of LinkedList + ArrayList. It was a fun effort.
| boxed wrote:
| I believe that's called a "rope"
| vbezhenar wrote:
| A rope is a binary tree of arrays, for O(log N) index operations.
| amelius wrote:
| It was fun because you didn't write it in Rust :)
|
| https://rust-unofficial.github.io/too-many-lists/
| duped wrote:
| If I have this right, what you've built is this, storing M
| items in M / N nodes where N is the radix of the array?
|           0                1                       (M/N)th node
|       [ N elem ][.] -> [ N elem ][.] -> ... -> [ M - (M/N) elem ][null]
|
| And so if you want to index the `i`th element you chase the
| pointers:
|
|       index(node, i):
|           acc = 0
|           while acc + N <= i:
|               acc += N
|               node = node.next
|           return node.array[i - acc]
|
| Or something like that? You can do better using a B-tree and
| exploit the fact that you don't need keys to just store
| pointers in the branch nodes. This reduces the number of
| pointer dereferences in large arrays.
|
| For example, say you have N = 8 and there are 1024 elements in
| the list, you would need 127 pointer dereferences to reach the
| end of the array. In a B-tree you only have 3. (double check my
| math, I might be off by one).
|
| This is the typical implementation for immutable/persistent
| vector types. If you keep the links between leaf nodes at the
| bottom of the tree, you've got a B+ tree and have fast
| iteration through it as well.
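|
| FWIW, the index walk above as a rough Rust sketch (untested,
| names made up):
|
|         // One node holds up to N elements plus a link onward.
|         struct Node<T, const N: usize> {
|             items: Vec<T>, // at most N elements
|             next: Option<Box<Node<T, N>>>,
|         }
|
|         fn index<T, const N: usize>(
|             mut node: &Node<T, N>,
|             i: usize,
|         ) -> &T {
|             let mut acc = 0;
|             // Skip whole nodes until `i` falls inside one.
|             while acc + node.items.len() <= i {
|                 acc += node.items.len();
|                 node = node.next.as_deref().expect("out of bounds");
|             }
|             &node.items[i - acc]
|         }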
| ertucetin wrote:
| You're probably right, I was in college back then and wanted
| to work on something cool, this was my idea.
| twic wrote:
| Similar to an unrolled linked list:
| https://en.wikipedia.org/wiki/Unrolled_linked_list
|
| But you size the nodes dynamically, rather than using a fixed
| size.
| kevingadd wrote:
| I was hoping to see optimization of the actual linked list
| manipulation and traversal (pipelining? I'm not sure what you'd
| do), but this is still a neat post. It's cool to see thought put
| into various parts of the problem, like
| reallocation/preallocation, stack allocation, etc.
| cogman10 wrote:
| reallocation/preallocation actually does increase manipulation
| and traversal performance.
|
| The major slowness of linked lists is poor cache locality. New
| nodes can be put all over the memory space. However, if you can
| make sure all or part of the list exists in contiguous blocks
| of memory, then there's a good chance that when the CPU loads
| up the next node it will also grab the next 3 nodes in a cache
| line.
|
| The closer these node addresses are in memory, the faster
| things will be.
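|
| A rough sketch of one way to get that in Rust (untested, names
| made up): allocate every node out of one Vec and link by index
| instead of by heap pointer.
|
|         struct Node<T> {
|             value: T,
|             next: Option<u32>, // index into `nodes`, not a pointer
|         }
|
|         struct List<T> {
|             nodes: Vec<Node<T>>, // one contiguous allocation
|             head: Option<u32>,
|         }
|
|         impl<T> List<T> {
|             fn push_front(&mut self, value: T) {
|                 let idx = self.nodes.len() as u32;
|                 self.nodes.push(Node { value, next: self.head });
|                 self.head = Some(idx);
|             }
|         }
|
| Nodes created close together in time then sit close together in
| memory, so traversal touches far fewer cache lines.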
| vlovich123 wrote:
| Intrusive linked lists might bring down the allocations further,
| reduce the memory footprint, & more importantly improve locality
| when doing pointer chasing (single predictable indirection vs
| double hard-to-predict indirection).
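|
| A minimal illustration of the difference (a sketch in unsafe
| Rust just to keep it short, names made up):
|
|         // Intrusive: the link lives inside the object, so
|         // following `next` is one hop straight to the next
|         // object, instead of node -> object.
|         struct Job {
|             id: u64,
|             next: *mut Job, // null-terminated embedded link
|         }
|
|         unsafe fn sum_ids(mut head: *const Job) -> u64 {
|             let mut total = 0;
|             while !head.is_null() {
|                 total += (*head).id;
|                 head = (*head).next;
|             }
|             total
|         }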
| kolbe wrote:
| Linked list benchmarks are amazing.... if you don't thrash your
| cache on inserts so all its elements are contiguous. You get all
| the benefits of a vector and a linked list, without the reality
| that linked lists mostly don't get populated 100% consecutively,
| and thus can be anywhere in memory.
| bjoli wrote:
| Most languages where linked lists are an important part (Lisps,
| mostly) have well-optimized linked lists that end up with much
| better memory locality.
| wavemode wrote:
| How does that work? I don't follow how any runtime or
| compile-time optimization can solve the problem of locality
| for a linked list. If the data wasn't allocated sequentially,
| then it's simply not going to be sequential (unless you move
| it).
| hayley-patton wrote:
| SBCL tries its hardest to allocate sequentially, then moves
| lists to be sequential in GC.
| s17n wrote:
| The entire theoretical advantage of linked lists over
| vectors (constant time insertion and deletion) comes from
| not sequentially allocating them.
| screcth wrote:
| If you have a compacting GC you are going to move the
| nodes anyway, so why not reallocate them sequentially?
| stonemetal12 wrote:
| That implies you took the time to sort them, so your O(1)
| insertion time just became O(N log N). Sure, that cost is
| spread over however many inserts you did between GCs, but
| it isn't O(1) anymore.
| hayley-patton wrote:
| SBCL cheats and copies starting from the first cons cell
| that is referenced from outside the list; this tends to
| be the start of the list, and so traversing just works.
| The contiguous copying also ends when a cons cell which
| was already copied is found, so there can't be more
| discontinuities than there are cons cells referenced from
| outside the list, regardless of which order we discover
| the references.
| thefaux wrote:
| There are other advantages of linked lists over vectors
| besides those that you've listed.
| twic wrote:
| I don't think this is what hayley-patton meant (although
| it's not crystal clear). I think SBCL does memory
| management with a bump allocator and a moving collector.
| So if you build a linked list sequentially, the cons
| cells will be allocated sequentially, and will be
| sequential in memory. If you build it in random order,
| they won't be. But when the collector runs, it will move
| the whole list, and it will move the cells in sequential
| order, and then the list will be sequential in memory.
| Someone wrote:
| A sufficiently smart compiler can detect that a new node
| gets allocated and then appended to a list, and allocate
| the node near the other nodes of that list.
|
| To have a good chance that there is space near the other
| nodes, you'd have to reserve memory up front for each list
| you'd want to do that with, though, and that will kill
| cache locality before that optimization sets in.
| Corrections welcome, but I don't see that (easily) being a
| net win. But hey, with a sufficiently smart compiler at
| hand, you can do wonders. (Edit: when looking at pure
| functions, it often may not be that hard to
| discriminate between "this allocation will be returned" and
| "this allocation is temporary". For example, if you use
| _map_ to apply a function to all elements in a list, and
| that function is pure, detecting that only the return
| values of that function will be visible after the function
| returns isn't that hard)
|
| An easier win is that lisp garbage collectors tend to move
| memory, and thus can try to move nodes that reference each
| other close together. You get that for free with a
| semispace collector, but that has quite a few
| disadvantages, so you don't want to use that. A
| generational collector also will do it a bit, though (say
| you create a few million nodes of garbage to build a list
| of a few hundred results. Then, after generational
| collection, those few hundred cells will be closer together
| than before)
| jerf wrote:
| Instead of thinking of a linked list structurally, think of
| it functionally. You can have a token that represents a
| location in the linked list. From that token you have a
| Fetch() operation that will fetch what is at that location,
| a Next() operation that will fetch the next token or some
| indication that you are done, and depending on your
| language, some sort of insert or append operation
| (mutability factors in here).
|
| While there is a natural encoding of this process into a
| pointer to the target value, and a pointer to the next
| node, it is not the only encoding possible by any means.
|
| A good exercise if you are having trouble with this is to
| implement a little linked list that performs those
| operations on something that simply backs to an array,
| including the blind O(n) copy when appending another list.
| It should be quite short. But that is not the only
| alternate implementation, either, and you can easily build
| others where the interface is maintained but you pay only
| log n additional factors maximum for operations, easily
| recovered from the constant factors which on modern
| hardware are often staggering.
|
| Once you break the entanglement between memory
| representation and the API a data structure offers in your
| mind, many ways become obvious as to how to possibly
| improve this structure while maintaining the same
| operations, many of which of course already exist and have
| names.
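|
| As a concrete sketch of that exercise (Rust, untested, names
| made up):
|
|         struct List<T> {
|             items: Vec<T>, // the backing is just an array
|         }
|
|         // A "token" is an opaque position; callers never see
|         // the memory layout behind it.
|         #[derive(Clone, Copy)]
|         struct Token(usize);
|
|         impl<T: Clone> List<T> {
|             fn first(&self) -> Option<Token> {
|                 (!self.items.is_empty()).then_some(Token(0))
|             }
|             fn fetch(&self, t: Token) -> &T {
|                 &self.items[t.0]
|             }
|             fn next(&self, t: Token) -> Option<Token> {
|                 (t.0 + 1 < self.items.len()).then_some(Token(t.0 + 1))
|             }
|             fn append(&mut self, other: &List<T>) {
|                 // the blind O(n) copy mentioned above
|                 self.items.extend_from_slice(&other.items);
|             }
|         }
|
| Same Fetch()/Next() interface, entirely different memory
| representation.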
| Arch485 wrote:
| Linked lists are, by definition, a value with a pointer
| to the next. That's what makes it "linked". The API
| you're describing is the iterator pattern, which indeed
| can be backed by almost anything (be it a linked list,
| array, tree, etc.).
| jerf wrote:
| Both replies as I write this are ignoring the question I
| was answering. The question was, how can a compiler
| implement a linked list as anything other than a linked
| list? The answer is what I gave. Being a compiler, it may
| also do things like an analysis of the use cases to prove
| that there are no relevant changes to performance (or
| that performance only improves) and fall back to a "true"
| linked list if it can't prove how it is being used, or,
| being a compiler, it may just tell you to get stuffed and
| deal with the performance differences. Depends on the
| choices the compiler author(s) made.
|
| But just because Lisp has cons and cdr does not mean the
| compiler is obligated to only and exactly use things that
| structurally look like linked lists in the actual
| implementation. You need to break the relationship
| between memory layout and API to understand what is going
| on here. You may choose later to put them back together;
| the naive memory layout is often a good one. (Linked
| lists nowadays are arguably one of the larger exceptions
| to that, but there are still circumstances even so where
| that may not be an issue; see Erlang's memory management
| for instance and ponder how that impacts linked list
| locality.) But you can't understand what compilers may be
| doing if you do not realize that there _is_ a
| distinction.
| wavemode wrote:
| > Instead of thinking of a linked list structurally,
| think of it functionally
|
| If your argument is that a linked-list-like data
| structure can be implemented using something other than a
| linked list, then I agree with you. Vector tries (for
| example) are great for that use case.
|
| But a vector trie isn't a linked list, it's a vector
| trie. As such, it will be faster for some usage patterns,
| equal for some, and completely degenerate for some
| others. Just like any other data structure that isn't a
| linked list.
|
| It wouldn't really be an "optimization" to implement my
| linked list code with something other than a linked list
| - it would be rewriting my code.
| kolbe wrote:
| Every linked list discussion I've ever gotten into ends
| with someone finding a way to modify the std::list data
| structure in such a way that totally breaks the
| definition of the LL data structure. We may as well call
| it "the law of linked list arguments"
| bjoli wrote:
| Well, I once wrote a small Scheme that did loop unrolling
| and allocated lists in as many steps as the unroll went,
| which was capped at a maximum of 5.
|
| That worked surprisingly well, but there were lots of edge
| cases since I am a professional bassoonist and not a
| compiler guy.
|
| I generalized it and made all lists allocated in chunks of
| 16 and then let the GC truncate it. I spent a stupid amount
| of work doing the first thing and the second thing was done
| in an afternoon.
|
| Then there are all kinds of
| zelphirkalt wrote:
| If needed, a linked list can be backed by a vector that is
| resized when necessary.
| twic wrote:
| If anyone knows of a detailed writeup of a language
| implementation like this, please link to it. The thread under
| this comment is disappointingly full of "a sufficiently smart
| compiler could ..." type stuff.
| Xeamek wrote:
| Why can't we 'simply' get a processor extension to mark data
| as a pointer, so that the prefetcher would actually know what
| to fetch?
|
| From my understanding this is what led to the recent
| 'unpatchable' exploit in Apple's M1, but rather than trying to
| guess it by some heuristic, why not just give compilers the
| option to make that optimization?
| infamouscow wrote:
| This was the idea of Itanium. It failed mostly because of
| economics.
|
| It turns out programmers, or rather their employers, don't
| really care about using hardware efficiently. They care about
| shipping things yesterday, because that's how business deals
| get closed, and making software efficient is secondary to
| money changing hands. Performance never really matters to
| business people after the checks are cashed.
|
| Multicore computers have been ubiquitous for more than a
| decade, yet the overwhelming majority of software built today
| is single-threaded microservices, wherein they spend most of
| their time serializing and deserializing message payloads.
|
| This is all really to say that most performance is already
| being left on the table for the majority of what computers
| are used for today.
| Xeamek wrote:
| I mean sure, I don't doubt 99% of end-user programmers
| wouldn't look twice at something like this, but compiler
| designers probably would care.
|
| And it's not like the companies aren't trying this idea
| (again, the M1 exploit). But for whatever reason they want to
| keep CPUs as black boxes, perfect abstract machines, even
| though we know they aren't.
| kolbe wrote:
| Given how much of today's computing depends on a
| database query, this is no surprise. Who cares about the
| micros you gain with added efficiency while there's a 100ms
| db query return in the path?
| Xeamek wrote:
| Apparently Apple and Intel do, since they introduced
| those changes into their silicon
| kolbe wrote:
| Not every pipeline involves a db query.
| gpderetta wrote:
| Where do you think DBs run?
| ComputerGuru wrote:
| I do want to say that I think the Itanic would have fared
| way, way better in a post-LLVM world where the importance
| of smart, optimizing compilers is much more valued and
| understood and language designers actively work hand-in-
| hand with compiler devs far more often (with much more
| significant back-and-forth from hardware manufacturers).
| gpderetta wrote:
| I don't think LLVM is particularly good at optimizing
| VLIW code.
|
| Very good optimizing compilers existed before LLVM. Intel
| had one specifically for Itanium. It wasn't enough.
| Joker_vD wrote:
| I don't think making the CPU issue (likely bogus) pre-fetches
| for every field in the cache line that's marked as a pointer
| is really that good an idea. At best, you save a couple of
| cycles because the fetches are started one or two instructions
| earlier, before the actual load instruction for "loading the
| linked pointer" is issued. At worst, you keep thrashing your
| cache loading data you're _not_ going to read, delaying
| fetching the data you _will_ read.
| Xeamek wrote:
| For everything? Obviously no.
|
| But for all the crazy optimizations modern compilers do, I
| don't see how marking pointers for more than a couple of them
| in a row is _that_ crazy.
| favorited wrote:
| Apple Silicon does this. If it prefetches something that
| looks like a pointer, it will also fetch the pointed-to
| memory. It's a cool feature, and is especially useful for
| Apple, since their Objective-C collections only store
| pointers - but it also can re-open the door for certain
| timing attacks by violating preconditions of constant-time
| cryptography algorithms.
|
| https://en.wikipedia.org/wiki/GoFetch
| thefaux wrote:
| An exception to this is precisely parsing and validation in
| which linked lists can and ideally should be populated
| consecutively (using an arena allocator). I agree that general
| purpose linked lists are rarely optimal but they are hard to
| beat for parsers, where they hit a Pareto sweet spot of
| implementation simplicity and good performance.
| dpc_01234 wrote:
| This is such an important point.
|
| Benchmarking is hard, and very often is done in ideal
| conditions, which never materialize in real use cases.
| ILoveQaWolf wrote:
| This is interesting!
| jkaptur wrote:
| Another cool aspect of this, and (if I understand correctly)
| where Rust really helps you, is that you can explore multiple
| branches in parallel, since the linked list is immutable.
| eimrine wrote:
| What if the linked list is cyclic or doubly-linked?
| amelius wrote:
| Then Rust isn't the right tool for the job. Rust is great for
| tree-like structures which is 99% of what you encounter
| anyway. Unless you're writing a kernel or something.
| jkaptur wrote:
| I was talking about this case in particular, where we know
| the list is basically isomorphic to the call stack.
| ComputerGuru wrote:
| To be fair, lack of concurrency safety never stopped C and C++
| devs from boldly doing just that, anyway!
| evmar wrote:
| The "maybe you don't need a linked list" proposal at the bottom
| seems significantly better than the options presented in the
| post:
|
| - almost no cost in the non-erroring path
|
| - no extra data structures to manage
|
| - a lot less code
|
| I think the post would benefit from a better analysis of why this
| doesn't work for them.
| vouwfietsman wrote:
| Indeed, also building a linked list over the stack like that is
| a crafty but very weird design. Keep it simple.
| dmitry_dygalo wrote:
| Indeed, I agree with your points.
|
| This idea was added after I wrote the post and wasn't taken
| from my own optimization efforts in `jsonschema`. Originally,
| in `jsonschema` the output type is actually a badly composed
| iterator and I intended to simplify it to just a `Result<(),
| ValidationError>` for the article, but with this output, there
| are actually way better optimizations than I originally
| implemented.
|
| If I'd discovered this idea earlier, I'd probably have spent
| more time investigating it.
| duped wrote:
| It strikes me that the bottleneck for this problem isn't Vec or List,
| it's the serde_json Value type that needs to be used. This is
| useful for serializing/deserializing values into Rust types but
| if you're trying to validate JSON against a schema you don't
| actually need the JSON value data, just the types of the nodes
| (or more specifically, you only need _some_ of the value data,
| and probably not much of it, so don't pay for it when you don't
| have to).
|
| If you implemented your own parser for the schema and the JSON
| and only used an AST to validate + span information (which can
| just be a pair of u16s for start/end of a token) then you can
| collect your error messages very, very quickly and generate the
| error message once validation is finished.
|
| Heavily optimized compilers will do this for semantic analysis
| and type checking, where the IR they use can be constructed
| quickly and then used with the input text to get helpful messages
| out, while the guts of the algorithm is only working with
| semantic information in a structure that's compact and easy to
| access/manipulate.
|
| All that said, serde_json is incredibly convenient, and giving
| it up to write your own parser is a big hammer for a problem
| that probably doesn't need it.
| EGreg wrote:
| Just use Cap'n Proto. No deserialization needed!
| aabhay wrote:
| What's your experience like using it? Is it ergonomic or does
| it require you to do lots of type gymnastics?
| cabronerp wrote:
| This repo has a nice pub/sub implementation based on capnp:
| https://github.com/commaai/cereal/blob/master/log.capnp
| ComputerGuru wrote:
| > All that said, serde_json is incredibly convenient, and
| giving it up to write your own parser is a big hammer for a
| problem that probably doesn't need it.
|
| I had a thought in my reply [0] on this that actually might let
| him eat his cake and have it too in this regard. I think you
| can heavily abuse serde::de::Visitor to schema validate without
| actually parsing (or with less parsing, at any rate). I went
| into more detail in my comment but I wanted to ping you
| (@duped).
|
| [0]: https://news.ycombinator.com/item?id=40357159
| dmitry_dygalo wrote:
| That is a really cool idea! Thank you
| ww520 wrote:
| I've seen a lexer/parser scheme that encodes the lexer token
| type along with the token's file location information into a
| u64 integer, something like:
|         struct Token {
|             token_type: u8,
|             type_info: u8,
|             token_start: u32, // offset into the source file
|             token_len: u16,
|         }
|
| It's blazing fast. The lexer/parser can process millions of
| lines per second, and both the textual information (via the
| offset and length) and the location information are available.
| acidx wrote:
| This is roughly what my JSON parser does. It does type-
| checking, albeit without using JSON Schema, but with an object
| descriptor that you have to define to parse and generate
| JSON.
|
| It's been developed for embedded systems (it was written
| originally for a NATS implementation in the Zephyr RTOS), so
| it's a bit limited and there's no easy way to know where some
| parsing/type validation error happened, but the information
| is there if one wants to obtain it:
| https://github.com/lpereira/lwan/blob/master/src/samples/tec...
| thewakalix wrote:
| Wouldn't using push_back prevent the need to reverse the Vec at
| the end?
| ComputerGuru wrote:
| Nice post, Dmitry!
|
| Two suggestions: it's not immediately obvious whether subsequent
| benchmark result tables/rows show deltas from the original
| approach or from the preceding one (it's the latter, which is
| more impressive). Maybe call that out the first couple of times?
|
| Second, the "using the single Vec but mutating it" option would
| presumably benefit from a reserve() or with_capacity() call.
| Since in that approach you push to the vector in both erroring
| and non-erroring branches, it doesn't have to be exact (though
| you could do a BFS to find the maximum depth, that doesn't
| strike me as a great idea) and could be up to some constant value
| since a single block memory allocation is cheap in this context.
| (Additionally, the schema you are validating against defines a
| minimum depth whereas the default vec has a capacity of zero, so
| you're _guaranteed_ that it's a bad choice unless you're
| validating an empty, invalid object.)
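|
| E.g. (the 16 is an arbitrary guess, nothing scientific):
|
|         // One up-front allocation instead of repeated
|         // reallocations as an empty Vec grows.
|         let mut path: Vec<&str> = Vec::with_capacity(16);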
|
| But I agree with the sibling comment from @duped that actually
| parsing to JSON is the biggest bottleneck and simply parsing to
| the minimum requirements for validation would be far cheaper,
| although it depends on if you'll be parsing immediately after in
| case it _isn't_ invalid (make the common case fast, assuming the
| common case here is absence of errors rather than presence of
| them) or if you really do just want to validate the schema (which
| isn't that rare of a requirement, in and of itself).
|
| (Edit: I do wonder if you can still use serde and serde_json but
| use the deserialize module's `Visitor` trait/impl to
| "deserialize" to an enum { Success, ValidationError(..) }` so you
| don't have to write your own parser, get to use the already
| crazy-optimized serde code, and still avoid actually fully
| parsing the JSON in order to merely validate it.)
|
| If this were in the real world, I would use a custom slab
| allocator, possibly from storage on the stack rather than the
| heap, to back the Vec (and go with the last design with no linked
| lists whatsoever). But a compromise would be to give something
| like the mimalloc crate a try!
| dmitry_dygalo wrote:
| Thanks!
|
| > Two suggestions: it's not immediately obvious whether
| subsequent benchmark result tables/rows show deltas from the
| original approach or from the preceding one (it's the latter,
| which is more impressive). Maybe call that out the first couple
| of times?
|
| Agree!
|
| > Second, the "using the single Vec but mutating it" option
| would presumably benefit from a reserve() or with_capacity()
| call. Since in that approach you push to the vector in both
| erroring and non-erroring branches, it doesn't have to be exact
| (though you could do a BFS to find the maximum depth, that
| doesn't strike me as a great idea) and could be up to some
| constant value since a single block memory allocation is cheap
| in this context. (Additionally, the schema you are validating
| against defines a minimum depth whereas the default vec has a
| capacity of zero, so you're guaranteed that it's a bad choice
| unless you're validating an empty, invalid object.)
|
| Oh, this is a cool observation! Indeed it feels like
| `with_capacity` would help here
|
| > But I agree with the sibling comment from @duped that
| actually parsing to JSON is the biggest bottleneck and simply
| parsing to the minimum requirements for validation would be far
| cheaper, although it depends on if you'll be parsing
| immediately after in case it isn't invalid (make the common
| case fast, assuming the common case here is absence of errors
| rather than presence of them) or if you really do just want to
| validate the schema (which isn't that rare of a requirement, in
| and of itself).
|
| My initial assumption was that usually the input is already
| parsed. E.g. validating incoming data inside an API endpoint
| which is then passed somewhere else in the same representation.
| But I think that is a fair use case too and I was actually
| thinking of implementing it at some point via a generic `Json`
| trait which does not imply a certain representation.
|
| > (Edit: I do wonder if you can still use serde and serde_json
| but use the deserialize module's `Visitor` trait/impl to
| "deserialize" to an enum { Success, ValidationError(..) }` so
| you don't have to write your own parser, get to use the already
| crazy-optimized serde code, and still avoid actually fully
| parsing the JSON in order to merely validate it.)
|
| Now that I've read the details, it feels like a really, really
| cool idea!
|
| > If this were in the real world, I would use a custom slab
| allocator, possibly from storage on the stack rather than the
| heap, to back the Vec (and go with the last design with no
| linked lists whatsoever). But a compromise would be to give
| something like the mimalloc crate a try!
|
| Nice! In the original `jsonschema` implementation the
| `validate` function returns `Result<(), ErrorIter>` which makes
| it more complex to apply that approach, but I think it still
| should be possible.
| tomck wrote:
| This article is disingenuous with its Vec benchmark. Each call to
| `validate` creates a new Vec, but that means you allocate + free
| the vec for _each_ validation. Why not store the vec on the
| validator to reuse the allocation? Why not mention this in the
| article? I had to dig in the git history to find out whether the
| vec was getting reallocated. This feels like you had a cool
| conclusion for your article, 'linked lists faster than vec', but
| you had to engineer the vec example to be worse. Maybe I'm being
| cynical.
|
| It would be interesting to see the performance of a `Vec<&str>`
| where you reuse the vector, but also a `Vec<u8>` where you copy
| the path bytes directly into the vector and don't bother doing
| any pointer traversals. The example path sections are all very
| small - 'inner', 'another', 5 bytes, 7 bytes - less than the
| length of a pointer! Storing a whole `&str` is 16 bytes per
| element and then you have to rebuild it again anyway in the
| invalid case.
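|
| Something like this rough sketch (untested, names made up):
|
|         struct Validator {
|             // Reused across validate() calls; truncate() and
|             // clear() keep the capacity, so the steady state
|             // allocates nothing.
|             path: Vec<u8>,
|         }
|
|         impl Validator {
|             fn enter(&mut self, segment: &str) -> usize {
|                 let mark = self.path.len();
|                 // Copy the few segment bytes in directly; no
|                 // &str pointers to chase later.
|                 self.path.extend_from_slice(segment.as_bytes());
|                 mark
|             }
|             fn leave(&mut self, mark: usize) {
|                 self.path.truncate(mark); // pop, keep capacity
|             }
|         }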
|
| ---
|
| This whole article is kinda bad, it's titled 'blazingly fast
| linked lists' which gives it some authority but the approach is
| all wrong. Man, be responsible if you're choosing titles like
| this. Someone's going to read this and assume it's a reasonable
| approach, but the entire section with Vec is bonkers.
|
| Why are we designing 'blazingly fast' algorithms with rust
| primitives rather than thinking about where the data needs to go
| first? Why are we even considering vector clones or other crazy
| stuff? The thought process behind the naive approach and step 1
| is insane to me:
|
| 1. I need to track some data that will grow and shrink like a
| stack, so my solution is to copy around an immutable Vec (???)
|
| 2. This is really slow for obvious reasons, how about we: pull in
| a whole new dependency ('imbl') that attempts to optimize for the
| general case using complex trees (???????????????)
|
| You also mention:
|
| > In some scenarios, where modifications occur way less often
| than clones, you can consider using Arc as explained in this
| video
|
| I understand you're trying to be complete, but 'some scenarios'
| is doing a lot of work here. An Arc<[T]> approach is _literally_
| just the same as the naive approach _but_ with extra atomic
| refcounts! Why mention it in this context?
|
| You finally get around to mutating the vector + using it like a
| stack, but then comment:
|
| > However, this approach requires more bookkeeping and somewhat
| more lifetime annotations, which can increase code complexity.
|
| I have no idea why you mention 'code complexity' here (complexity
| introduced _by rust_ and its lifetimes), but fail to mention how
| adding a dependency on 'imbl' is a negative.
| thefaux wrote:
| > how about we: pull in a whole new dependency ('imbl') that
| attempts to optimize for the general case using complex trees
| (???????????????)
|
| To me this is a self answering question.
| dmitry_dygalo wrote:
| > This article is disingenuous with its Vec benchmark. Each
| call to `validate` creates a new Vec, but that means you
| allocate + free the vec for each validation. Why not store the
| vec on the validator to reuse the allocation? Why not mention
| this in the article, i had to dig in the git history to find
| out whether the vec was getting reallocated.
|
| The idea comes back to [0] which is similar to one of the steps
| in the article, and before adding `push` & `pop` I just cloned
| it to make things work. That's what Rust beginners do.
|
| > This feels like you had a cool conclusion for your article,
| 'linked lists faster than vec', but you had to engineer the vec
| example to be worse. Maybe I'm being cynical.
|
| Maybe from today's point in time, I'd think the same.
|
| > It would be interesting to see the performance of a
| `Vec<&str>` where you reuse the vector, but also a `Vec<u8>`
| where you copy the path bytes directly into the vector and
| don't bother doing any pointer traversals. The example path
| sections are all very small - 'inner', 'another', 5 bytes, 7
| bytes - less than the length of a pointer! Storing a whole
| `&str` is 16 bytes per element and then you have to rebuild it
| again anyway in the invalid case.
|
| Yeah, that makes sense to try!
|
| > This whole article is kinda bad, it's titled 'blazingly fast
| linked lists' which gives it some authority but the approach is
| all wrong. Man, be responsible if you're choosing titles like
| this. Someone's going to read this and assume it's a reasonable
| approach, but the entire section with Vec is bonkers.
|
| > Why are we designing 'blazingly fast' algorithms with rust
| primitives rather than thinking about where the data needs to
| go first? Why are we even considering vector clones or other
| crazy stuff? The thought process behind the naive approach and
| step 1 is insane to me:
|
| > 1. I need to track some data that will grow and shrink like a
| stack, so my solution is to copy around an immutable Vec (???)
|
| > 2. This is really slow for obvious reasons, how about we:
| pull in a whole new dependency ('imbl') that attempts to
| optimize for the general case using complex trees
| (???????????????)
|
| That's clickbait-y, though none of the article's ideas aim to
| be a silver bullet. I mean, there are admittedly dumb ideas in
| the article, though I won't believe that somebody would come up
| with a reasonable solution without trying something stupid
| first. However, I might have used better wording to highlight
| that and mention that I've come up with some of these ideas
| when I was working on `jsonschema` in the past.
|
| > I understand you're trying to be complete, but 'some
| scenarios' is doing a lot of work here. An Arc<[T]> approach is
| literally just the same as the naive approach but with extra
| atomic refcounts! Why mention it in this context?
|
| If you don't need to mutate the data and need to store it in
| some other struct, it might be useful, i.e. just to have cheap
| clones. But dang, that indeed is a whole different story.
|
| > I have no idea why you mention 'code complexity' here
| (complexity introduced by rust and its lifetimes), but fail to
| mention how adding a dependency on 'imbl' is a negative.
|
| Fair. Adding `imbl` wasn't a really good idea for this context
| at all.
|
| Overall I think what you say is kind of fair, but I think that
| our perspectives on the goals of the article are quite
| different (which does not invalidate the criticism).
|
| Thank you for taking the time to answer!
|
| - [0] - https://github.com/Stranger6667/jsonschema-rs/commit/1a1c6c3...
| ww520 wrote:
| Yes, when I saw that the immutable path was cloned with a key
| appended, I knew the benchmark was a strawman.
| torusle wrote:
| > Linked lists are taught as fundamental data structures in
| programming courses, but they are more commonly encountered in
| tech interviews than in real-world projects.
|
| I beg to disagree.
|
| In kernels, drivers, and embedded systems they are _very_ common.
| ComputerGuru wrote:
| Really only because they're so goddamn easy. I find myself
| using linked lists a lot less since adopting rust for embedded
| code (even with no_std and no allocator, but especially when
| alloc-only std data structures are within reach).
| saghm wrote:
| Most people who take data structures courses or perform tech
| interviews don't end up working on kernels, drivers, or
| embedded systems though. To me, it sounds like the point being
| made is that there are a large number of programmers who have
| learned about linked lists but haven't run into many cases
| where they needed them in the real world, and I think it's
| accurate.
| dmitry_dygalo wrote:
| This was my intention
| SoftTalker wrote:
| Agree, I can't recall using anything more complicated than
| lists/arrays or hash tables (key/value stores) in practice,
| in many years of (mostly web application) programming. And
| even those I'm not coding from scratch, I'm using classes
| or functions that my programming language gives me. For
| anything more complicated than that, I'm using a database,
| which of course is using many data structures under the
| covers but I don't directly touch those.
| sumtechguy wrote:
| I used to use them all the time. However, now? I would be
| hard pressed not to use one of the many built-in
| vector/list/dict/hash types most languages provide. I
| would have to be doing something truly low-level, or chasing
| speed, to use one.
| josephg wrote:
| As a counterpoint, I've been working on collaborative
| text editing. I ended up implementing a custom b-tree
| because we needed a few features that I couldn't find in
| any off the shelf library:
|
| - My values represent runs of characters in the document.
|
| - Inserts in the tree may split a run.
|
| - Runs have a size - 0 if the run is marked as deleted or
| the number of characters otherwise. The size changes as
| we process edits
|
| - Every inserted character has an ID. I need to be able
| to look up any run by its ID, and then edit it.
| (Including querying the run's current position in the
| tree and editing the run's size).
|
| It's an interesting data structure problem, and it took a
| few weeks to have a good solution (and a few more weeks
| later rewriting it in safe Rust & improving performance
| in the process).
|
| I love this stuff. I think it's pretty rare to find a
| reason to code your own collection types these days, but
| it certainly comes up from time to time!
| samatman wrote:
| You need a linked list to write hello world in any Lisp,
| though.
|
| Seems like the glaring exception to the rule!
| ziddoap wrote:
| > _In kernels, drivers, and embedded systems they are very
| common._
|
| Out of all the programmers in the world, what percentage of
| them do you think work in the kernel/driver/embedded spaces?
| 9659 wrote:
| 1%
| coldtea wrote:
| More like 0.01% -- if we consider enterprise programmers,
| web programmers, and application/game programmers which I'd
| expect to be the largest groups...
| sfink wrote:
| First, my only guess is that everyone's guesses are going to
| be wildly wrong. People who work in such spaces will greatly
| overestimate. People who don't will greatly underestimate.
| (This is mostly due to how many comments I've read on HN that
| implicitly assume that most people's problems and
| perspectives are the same as the commenter's.)
|
| Second, linked lists are useful in a lot more places than
| that. Probably a better proxy would be low-level coders. You
| almost always want a linked list _somewhere_ when you're
| dealing with memory addresses and pointers. Maybe not for the
| primary collections, but there are always secondary ones that
| need to maintain a collection with a different membership or
| ordering, and vectors of pointers don't have many clear
| advantages over intrusive linked lists for those secondary
| collections.
| josephg wrote:
| Yeah intrusive collections in C is the biggest use I've
| seen. I played with a physics engine a few years ago
| (chipmunk2d) which made heavy use of intrusive linked lists
| to store all the objects in the world model. I suspect
| there's some clever data structures out there that might
| have better performance, but the intrusive linked list
| approach was simple and fast!
| igammarays wrote:
| Why? Why would someone reach for a linked list in a kernel,
| driver, or embedded system?
| dmitry_dygalo wrote:
| Easier to avoid allocation errors, e.g. in the Linux kernel.
| I think Alice Ryhl mentioned it here -
| https://www.youtube.com/watch?v=CEznkXjYFb4
| SJC_Hacker wrote:
| How do linked lists prevent allocation errors? If anything
| it would seem to make them worse.
|
| My experience in embedded, everything is hardcoded as a
| compile time constant, including fixed size arrays (or
| vectors of a fixed capacity)
| sratner wrote:
| In more complex embedded software you are likely to see
| free lists used to manage pools of preallocated resources
| (like event structs etc) or pools of fixed sized memory
| buffers.
| sfink wrote:
| Intrusive linked lists eliminate the allocation entirely.
| With a vector<Obj>, you have the Obj allocation and then
| potential vector-related reallocations. With an intrusive
| linked list, you only have the Obj allocation. So your
| code that adds/removes list entries does no additional
| allocation at all, it reuses a pointer or two that was
| allocated as part of the original Obj allocation. Often
| the list manipulation happens at a time when allocation
| failures are inconvenient or impossible to handle.
| torusle wrote:
| In embedded, you often need message queues.
|
| A common way to implement these is to have an array of
| messages, sized for the worst case scenario and use this
| as the message pool.
|
| You keep the unused messages in a singly linked "free
| list", and keep the used messages in a doubly linked
| queue or FIFO structure.
|
| That way you get O(1) allocation, de-allocation, enqueue
| and dequeue operations for your message queue.
|
| Another example for this paradigm are job queues. You
| might have several actuators or sensors connected to a
| single interface and want to talk to them. The high level
| "business" logic enqueues such jobs and an interrupt
| driven logic works on these jobs in the background, aka
| interrupts.
|
| And because you only move some pointers around for each
| of these operations it is perfectly fine to do so in
| interrupt handlers.
|
| What you really want to avoid is to move kilobytes of
| data around. That quickly leads to missing other
| interrupts in time.
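|
| A rough sketch of the shape of it in Rust (index-linked so it
| works in a static pool with no allocator; singly linked with a
| tail pointer for brevity, where a real queue might be doubly
| linked; untested):
|
|         const POOL_SIZE: usize = 16;
|
|         struct Msg {
|             payload: [u8; 64],
|             next: Option<u8>, // pool index of the next message
|         }
|
|         struct Queue {
|             pool: [Msg; POOL_SIZE],
|             free: Option<u8>, // free-list of unused messages
|             head: Option<u8>, // oldest queued message
|             tail: Option<u8>, // newest queued message
|         }
|
|         impl Queue {
|             // O(1): pop a message off the free-list.
|             fn alloc(&mut self) -> Option<u8> {
|                 let idx = self.free?;
|                 self.free = self.pool[idx as usize].next;
|                 Some(idx)
|             }
|             // O(1): link a filled message onto the tail.
|             fn enqueue(&mut self, idx: u8) {
|                 self.pool[idx as usize].next = None;
|                 match self.tail {
|                     Some(t) => self.pool[t as usize].next = Some(idx),
|                     None => self.head = Some(idx),
|                 }
|                 self.tail = Some(idx);
|             }
|         }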
| sratner wrote:
| No memory allocation/reallocation, preallocated resources
| managed in e.g. a free list. Also for things like packetized
| networks, lists are handy for filling as you progress down
| the stack while using fixed sized packet buffers, or
| reassembling fragments.
|
| In embedded world, memory often needs to be exactly
| controlled, and allocation failures are fatal without a more
| complex MMU. In kernel world, I believe the main reason is
| that allocations can block.
| kevingadd wrote:
| Intrusive lists are really powerful for those kinds of
| scenarios, and technically are linked lists. They're widely
| used in the kernel, IIRC.
| akira2501 wrote:
| O(n) iteration but pretty much guaranteed O(1) for every
| other operation. If that's the semantic you need, then linked
| lists are your friend.
| cyberax wrote:
| In kernels, it's usually hard to get general-purpose
| allocation working reliably in all contexts. And you need
| that for resizable vectors. With lists, you just need to be
| able to grab an element-sized block. Quite often, it's even
| done with the memory page granularity.
|
| In addition, a lot of data structures might be shared across
| multiple cores. Linked lists can be traversed and mutated
| concurrently (although with a bit of care).
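|
| E.g. pushing onto a shared list head only has to swing one
| pointer. A Treiber-style sketch (popping safely needs a memory
| reclamation scheme, elided here; untested):
|
|         use std::ptr;
|         use std::sync::atomic::{AtomicPtr, Ordering};
|
|         struct Node<T> {
|             value: T,
|             next: *mut Node<T>,
|         }
|
|         struct Stack<T> {
|             head: AtomicPtr<Node<T>>,
|         }
|
|         impl<T> Stack<T> {
|             fn push(&self, value: T) {
|                 let node = Box::into_raw(Box::new(Node {
|                     value,
|                     next: ptr::null_mut(),
|                 }));
|                 loop {
|                     let head = self.head.load(Ordering::Relaxed);
|                     unsafe { (*node).next = head };
|                     // Publish only if no one else moved the head.
|                     if self
|                         .head
|                         .compare_exchange(
|                             head,
|                             node,
|                             Ordering::Release,
|                             Ordering::Relaxed,
|                         )
|                         .is_ok()
|                     {
|                         break;
|                     }
|                 }
|             }
|         }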
| dist1ll wrote:
| I wonder how much of that is due to the kernel history, and
| the influence of C idioms, and not because of some inherent
| design superiority.
|
| I'd be convinced once I see pure Rust kernels geared
| towards modern machines suddenly using linked lists
| everywhere. Otherwise I'm leaning towards it being a side-
| effect of the language choice and culture.
|
| Also because I've seen the same kind of reasoning applied
| to compilers (e.g. "of course you need linked lists in
| compilers, they are extremely graph traversal heavy"). But
| one look at modern compilers implemented in Rust paint a
| very different picture, with index-based vectors, data-
| oriented design and flattened ASTs everywhere.
| vineyardlabs wrote:
| Any time you have a computer interacting with the outside
| world in an asynchronous fashion you basically have to have
| some form of buffering, which takes the form of a queue/FIFO.
| A linked list is the most performant/natural way of modeling
| a queue in our ubiquitous computing infrastructure.
|
| E.g. in a DMA-based ethernet driver, the ethernet MAC receives
| packets asynchronously from the processor, perhaps faster
| than the processor can ingest them. So the mac interrupts the
| processor to give it new packets, and the processor can't sit
| processing the packets in the interrupt context, so it needs
| to put them into some ordered list for processing later when
| it has downtime. In a true embedded system, the memory for
| this list is going to be fixed or statically allocated, but
| you still don't really want to have an array-style list with
| fixed indexing, as you'll have to manage what happens when
| the index wraps around back to 0 etc, so instead you just
| construct a linked list in that pre-allocated memory.
|
| I wouldn't say linked lists aren't really used in high-level
| applications, as I said they're used all over the place
| whenever you have external asynchronous communication, it's
| just that modern high-level frameworks/libs totally abstract
| this away from most people writing high level code.
| dmitry_dygalo wrote:
| I'd say most developers don't write kernels/drivers or
| embedded software, at least from what I've seen. I am not
| saying that there are not many devs like this, but rather
| that there are fewer kernel devs than web devs.
| wesnerm2 wrote:
| Linked lists were heavily used in application software before
| the appearance of standard libraries and Java, which is when
| dynamically sizable array-based lists became common. There also
| wasn't a gap between the performance of linked lists and arrays
| before CPUs became significantly faster than RAM.
| TheCondor wrote:
| There are plenty of good uses for linked lists and their
| variants. LRU lists come to mind; I couldn't swear that it's
| the most efficient way to implement them, but they're pretty
| darn good. Then obviously things like breadth first search need
| a type of queue data structure. It often can come down to
| memory pressure, if you've got Gigs to spare, then allocating a
| contiguous block of memory for a list of something isn't a big
| deal, if memory is tight and maybe fragmented, linked lists can
| and will get it done. They have their places.
|
| I did start to encounter some fresh grads with degrees that
| said "computer science" on them that couldn't answer some basic
| linked list questions. I was beginning to think it was a bad
| set of questions until I hit those kids. If you claim to know
| "computer science" and don't know what a linked list is,
| especially beyond some textbook stuff, I'm probably not
| interested.
| davexunit wrote:
| I don't do any of those things and I still use lists
| constantly. Kinda strange to learn that, it seems, many others
| don't use them at all.
| nequo wrote:
| What kinds of things are you using them for usually? Is it
| mostly in C/C++?
| waynesonfire wrote:
| Linked lists shine when you can perform an O(1) remove operation
| if you have a reference to an object on the list. This is very
| common when using C structs and not possible in Java for
| example.
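|
| I.e. the classic unlink-in-place. Roughly, with unsafe Rust
| standing in for the C structs (untested):
|
|         struct Node {
|             prev: *mut Node,
|             next: *mut Node,
|             // ... payload ...
|         }
|
|         // O(1): no traversal, no shifting of neighbors.
|         unsafe fn unlink(node: *mut Node) {
|             let (p, n) = ((*node).prev, (*node).next);
|             if !p.is_null() { (*p).next = n; }
|             if !n.is_null() { (*n).prev = p; }
|             (*node).prev = std::ptr::null_mut();
|             (*node).next = std::ptr::null_mut();
|         }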
| adgjlsfhk1 wrote:
| These cases are usually ones where you want to use a (hash)
| set. If you're OK changing everyone's indexing, the indexing
| didn't matter.
| sfink wrote:
| Linked lists get a bum rap.
|
| Yes, if you have a simple choice between a vector and a linked
| list, then the vector is vastly superior due to locality and (for
| non-intrusive linked lists) allocations. So much so that vectors
| often win even when you're doing lots of O(n) deletions that
| would be O(1) with a linked list.
|
| But that doesn't mean that linked lists are useless! A vector
| gives you a single ordering. What if you need multiple? What if
| you need a free list, which you're never going to iterate
| over but will just grab items off one at a time? I find it quite
| common to have one "natural" order, for which I will use a vector
| (or equivalently, a bump allocator of fixed-size items), and then
| one or several auxiliary orders like the entries belonging to a
| particular owner or the ones that will need to be visited for
| some sort of cleanup or an undo list or some sort of stack or
| queue of pending items to process. Your common iteration will be
| in memory order, but that doesn't mean you won't ever want to do
| different iterations.
|
| It annoys me that this is always omitted, with the attitude that
| linked lists are obsolete and useless because vectors be moar
| better faster gooder, to the point that it's a waste of time to
| learn how to manipulate linked lists anymore.
|
| I guess a lot of this is probably due to the popularity of high
| level languages, where you're just dealing with references to
| everything in the first place. But in those, the arguments in
| favor of vectors are often not valid because that blessed golden
| vector of Objects is giving you no more locality than giving your
| Object a `next` field: the vector of Objects is represented
| internally as a vector of pointers to the actual Object data, so
| your oh so nicely cached lookups are doing memory reads that are
| 100% pure _overhead_ compared to following a `next` link from an
| Object that fits into a cache line. In both cases, your
| performance is going to be limited by the locality of your Object
| data, which is the same whether you have a vector of pointers or
| Object data with an intrusive pointer.
|
| Also, if you have an existing setup and need to introduce a new
| list, it is sometimes far easier to drop in a `next` (and maybe
| `prev`) field than to refactor everything to accommodate a new
| vector. Especially since the vector will move all of your data
| when resizing the vector, which invalidates any pointers you
| might be using for other purposes. If you'll be iterating that
| list frequently, then the vector may very well be a good idea. If
| it's just for error cases or slow paths, then linked lists really
| aren't bad at all.
|
| I'm not trying to argue _for_ linked lists here so much as
| arguing against the blanket arguments against them.
|
| </rant>
| PartiallyTyped wrote:
| > What if you need multiple?
|
| You can very much have a single vector owning the memory, and
| do all other ordering over auxiliary vectors of indices. Should
| be cheaper and faster than holding more linked lists.
|
| If you want to remove elements, you can very much use
| tombstones to flag deleted elements and then clean up after
| some threshold.
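|
| Roughly (untested, names made up):
|
|         struct Store<T> {
|             items: Vec<Option<T>>,  // None = tombstone
|             by_owner: Vec<usize>,   // one auxiliary ordering
|             by_cleanup: Vec<usize>, // another, over the same data
|         }
|
|         impl<T> Store<T> {
|             // O(1) delete: flag it, compact past a threshold.
|             fn remove(&mut self, idx: usize) {
|                 self.items[idx] = None;
|             }
|             fn should_compact(&self) -> bool {
|                 let dead = self
|                     .items
|                     .iter()
|                     .filter(|i| i.is_none())
|                     .count();
|                 dead * 2 > self.items.len() // e.g. half dead
|             }
|         }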
| sfink wrote:
| > You can very much have a single vector owning the memory,
| and do all other ordering over auxiliary vectors of indices.
| Should be cheaper and faster than holding more linked lists.
|
| Cheaper in space depends on the percentage of elements in the
| auxiliary list. An intrusive list has space for every element
| to be in the list, which is wasteful if few are. A vector
| that grows by doubling could waste nearly half of its
| elements. Indexes can often be smaller than pointers, though,
| which favors the vector approach.
|
| Faster is debatable. Iterating the vector of indexes is quite
| fast, but indirecting into the data in a random order is
| still slow. An intrusive linked list doesn't need to do the
| former, only the latter. (Then again, it also bloats up the
| data slightly, which matters for small items since fewer fit
| in cache.)
|
| The main reason why linked lists could still be at an
| advantage is if you don't want allocations in your auxiliary
| list manipulation path. Maybe you don't want the
| unpredictable timing, or maybe you can't handle failure.
|
| I agree on tombstoning, but note that you're giving up some
| of the vector advantages by doing so. Your processing loop
| now has a much less predictable branch in it. (Though really,
| the vector probably still wins here, since the linked list is
| _built_ out of data dependent accesses!)
|
| Sometimes these auxiliary lists need to contain things that
| are no longer in the main list, too, as in the case of a free
| list. (And if you swap such things to the end, then you've
| just invalidated all of your indexes.) And non-intrusive
| linked lists can point to variable-size elements (though they
| lose most of the other benefits of an intrusive linked list.)
|
| Anyway, it's the usual "use the right tool for the right
| job", I just claim that linked lists are sometimes the right
| tool.
|
| (Random side note: I use "indexes" instead of "indices" these
| days, for a silly reason--"indices" is a fine word, I have no
| issue with it, but it seems to encourage some people to use
| the non-word "indice" which is an abomination.)
| osigurdson wrote:
| Linked lists can move items in O(1), but their O(N) search can
| be bad because of all of the cache line misses.
___________________________________________________________________
(page generated 2024-05-14 23:01 UTC)