[HN Gopher] In defense of linked lists
___________________________________________________________________
In defense of linked lists
Author : grep_it
Score : 561 points
Date : 2022-11-04 20:46 UTC (1 day ago)
(HTM) web link (antirez.com)
(TXT) w3m dump (antirez.com)
| ordiel wrote:
| For an "article" starting with a ramble about how the author's
| life is being tragically affected by "the fate of Twitter", the
| sentence
|
| > get ready to read a sentimental post about a data structure
|
| Makes a looot of sense
| c-smile wrote:
| Why do we need to defend a hammer against a sickle?
|
| Each tool has its own value it was designed for ...
|
| Try to imagine LISP without lists ...
| bsder wrote:
| > Try to imagine LISP without lists ...
|
| It's called Clojure.
|
| Getting rid of CONS (effectively removing singly linked lists)
| is a huge benefit.
| klysm wrote:
| A list can have different memory representations with identical
| semantics.
| js2 wrote:
| > Linked lists are conceptual. A node pointing to itself is the
| most self centered thing I can imagine in computing: an ideal
| representation of the more vulgar infinite loop. A node pointing
| to NULL is a metaphor of loneliness. A linked list with tail and
| head connected, a powerful symbol of a closed cycle.
|
| Tongue-in-cheek, but this really made me smile. :-)
| sytelus wrote:
| Linked lists don't need defense. It is unfortunate that the job
| of the vast majority of developers is rather benign: they don't
| work on problems that require complex algorithms and data
| structures beyond arrays and dictionaries. But interviewers keep
| asking those questions and give these subjects a bad rep.
| However, if you are working on building any real infrastructure
| such as search, deep learning, cloud, crypto, game engines, OSes,
| compilers etc, then you need linked lists, graphs, trees and a
| myriad of algorithms surrounding them all the time. There is
| clearly a different tier of developers: those who work on such
| infrastructure and those who simply utilize infrastructure
| others have built.
| nialv7 wrote:
| It's odd that the author states they wrote the article in
| response to arguments on Twitter, yet they didn't show us the
| arguments this was a response to.
| pharmakom wrote:
| Immutable linked lists are a cornerstone of functional
| programming.
|
| However, Clojure has shown that the underlying implementation is
| often better done using a tree of contiguous memory blocks.
| sprawld wrote:
| Donald Knuth's Dancing Links paper is a beautiful exploration of
| the power of linked lists. He uses a table with doubly linked
| lists (for rows and cols) to solve polyomino tiling. The linked
| lists
| allow for an elegant unpicking of rows (and re-attaching when
| backtracking)
|
| The paper's a really enjoyable read, Knuth's tone throughout is
| "look at this, this is fun"
|
| https://arxiv.org/abs/cs/0011047
| [deleted]
| tunesmith wrote:
| How _do_ you detect a cycle with linked lists? :) I actually have
| a side project where I've run into this - it's a distributed
| directed graph structure where the nodes can send messages to
| each other. It's intended to be acyclic, but "walking the graph"
| is really only used to send truth values, which are combined with
| others to run Boolean logic. So in that case, the occasional
| cycle doesn't matter, because if the calculated boolean value
| matches that of the node's internal state, it just can cease
| propagating. So a structural loop won't turn into an infinite
| processing loop.
|
| The problem then becomes that I can't introduce NOT gates
| anywhere in a cycle, because then the bit will continue flipping
| and I'll get an infinite processing loop.
|
| So it seems my only hope is external processes that continually
| walk the graph and keep track of where it visited to try and
| detect loops, and I don't like how that scales...
| rightbyte wrote:
| I would say the easiest way is to walk the graph and tag the
| nodes as visited. If you don't visit every node each iteration,
| use an increasing key you store in each node.
| Smaug123 wrote:
| One of the standard texts on the matter:
| https://aphyr.com/posts/341-hexing-the-technical-interview
| dragontamer wrote:
| With "just" a linked list, you need a turtle (advance one-node
| at a time), and a hare (advance two-nodes at a time).
|
| If the turtle and hare ever meet, you have a cycle. Otherwise,
| if the hare reaches the end of the list, you don't have a
| cycle.
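|
| A minimal sketch of that in C (illustrative only, assuming a
| bare singly linked `struct node`; not from the article):
|
|     #include <stdbool.h>
|
|     struct node { struct node *next; };
|
|     /* Floyd's tortoise-and-hare: true iff the list
|        starting at head contains a cycle. */
|     bool has_cycle(const struct node *head) {
|         const struct node *slow = head, *fast = head;
|         while (fast && fast->next) {
|             slow = slow->next;        /* one step  */
|             fast = fast->next->next;  /* two steps */
|             if (slow == fast)
|                 return true;          /* met inside a cycle */
|         }
|         return false;                 /* fast fell off the end */
|     }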
| timjver wrote:
| Just in case performance matters, there is a more efficient
| way: have the tortoise stay in place and advance the hare
| only one node at a time, and assign the hare to the tortoise
| every time a power of 2 number of steps have been made. This
| is known as Brent's algorithm, and it requires fewer
| advancements than the original tortoise and hare algorithm by
| Floyd.
|
| Another notable advantage of Brent's algorithm is that it
| automatically finds the cycle length, rather than (in Floyd's
| case) any multiple of the cycle length.
|
| https://en.wikipedia.org/wiki/Cycle_detection#Brent's_algori...
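|
| A sketch of Brent's idea in C (hand-rolled, assuming the same
| bare `struct node` as above; treat it as illustrative, not
| authoritative):
|
|     #include <stddef.h>
|
|     struct node { struct node *next; };
|
|     /* Returns the cycle length, or 0 if the list ends. The
|        tortoise stays put while the hare walks one step at a
|        time; at each power of two, teleport the tortoise to
|        the hare and start counting afresh. */
|     size_t brent_cycle_length(const struct node *head) {
|         const struct node *tortoise = head, *hare = head;
|         size_t power = 1, steps = 0;
|         while (hare) {
|             if (steps == power) {  /* new power-of-two window */
|                 tortoise = hare;
|                 power *= 2;
|                 steps = 0;
|             }
|             hare = hare->next;
|             steps++;
|             if (hare == tortoise)
|                 return steps;      /* steps == cycle length */
|         }
|         return 0;                  /* hit NULL: no cycle */
|     }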
| Jtsummers wrote:
| https://rosettacode.org/wiki/Cycle_detection
|
| Cycle detection in (currently) 44 languages. Most (all?)
| use Brent's algorithm. They're operating on an iterated
| function but converting most of these to detect cycles in a
| linked list would be straightforward.
| klysm wrote:
| Does the graph have static structure? I'm not sure if you're
| describing something like an actor system.
| tunesmith wrote:
| Akka cluster sharding :) Shape of the graph can change as
| processes are being run.
| mdavidn wrote:
| It's not difficult. Keep a pointer to a recently-visited node.
| After visiting N nodes, update the pointer to the current node,
| but also double N. If the pointer ever matches the next node,
| the list has a cycle. The doubling of N ensures that even large
| cycles will eventually be detected.
| bewaretheirs wrote:
| Set two pointers to the head of the list.
|
| Step through the list by one element per iteration with one of
| them, and with the other, step every other iteration.
|
| If the two pointers are ever again equal you've found a cycle.
|
| If you hit end of list, no cycle.
| mst wrote:
| Recycler.
|
| To be clear, http://trout.me.uk/gc/recycler-overview.pdf
|
| There's also a paper with the full algorithm in that directory.
|
| It's how Nim's ORC collector works.
|
| (note that yes, I know, this is for collection, not cycle
| detection in general, but it's also brilliant and works)
| quickthrower2 wrote:
| Can you just store the first reference as a local variable and
| then compare each time you move to a new node?
|
| Or to save the comparisons make the thing the first node points
| to have a stop flag of some sort.
| mdavidn wrote:
| That approach works only when the first node is included in
| the cycle. Other cycles are possible.
| Jtsummers wrote:
| Consider this kind of cycle:
|         Head -> Node1 -> Node2 -> Node3
|                   ^                 |
|                   +-----------------+
|
| If it's a circular list then your option would work, but not
| all cycles will produce circular lists.
| quickthrower2 wrote:
| Thanks. This seems so obvious now! Always draw it out I
| guess.
| ufo wrote:
| There are a bunch of algorithms for detecting cycles. One of
| the classics is the tortoise and hare algorithm, which uses
| only O(1) space.
| https://en.wikipedia.org/wiki/Cycle_detection#Floyd's_tortoi...
| [deleted]
| subarctic wrote:
| Interesting, I have to disable HTTPS Everywhere on this site,
| otherwise it redirects me to https://redis.io/news/138 and I get
| a 404 page.
| grog454 wrote:
| There is no contiguous-memory, arbitrarily resizable alternative
| to a linked list in high-performance concurrency, is there?
| https://docs.oracle.com/javase/7/docs/api/java/util/concurre...
| tomcam wrote:
| > Linked lists are simple. It is one of those rare data
| structures, together with binary trees and hash tables and a few
| more, that you can implement just from memory without likely
| stepping into big errors.
|
| One of the best engineers in Microsoft's devdiv told me that he
| often gave a linked list implementation in C as a whiteboard
| assignment for interviewees and that no one ever managed a bug-
| free version. (I failed mine even after creating a full general
| purpose implementation on my own just a couple years before.)
| antirez wrote:
| One thing is: you have 30 minutes of time and can compile and
| test and end with a working implementation. Another thing is
| writing the code on a whiteboard, a process nobody ever follows
| in practice: it's very easy to make mistakes this way. Anyway,
| doubly linked lists are easy to get wrong without compiling /
| testing.
| scottlamb wrote:
| > I thought of writing this blog post full of all the things I
| love about linked lists.
|
| The blog post is short, and I think the author covered about
| everything.
|
| Yeah, linked lists are sometimes appropriate, especially
| intrusive linked lists (the author called them "embedded" linked
| lists). The hate they get is a backlash against CS programs that
| introduce people to them early and don't emphasize enough the
| reasons you should usually pick something else. If we have to err
| on one side or the other, I'd rather it be telling people not to
| use them.
|
| btw:
|
| > Redis can be wrong, but both Redis and the Linux kernel can't.
|
| Yes, they can, and I'd rather avoid argument from authority
| altogether.
| pornel wrote:
| Both Redis and Linux are C projects, and in C it's customary to
| roll your own containers. Without generics, and with a culture
| of avoiding small dependencies, it's preferable to roll your own
| simple list rather than to reach for more advanced reusable data
| structures.
|
| So I don't think it's a particularly enlightened decision by
| these projects to use so many lists on their merit, but rather
| a programming pattern especially common in the C language they
| use.
| colonwqbang wrote:
| If you have a pointer to an element of a doubly linked list,
| you can always remove it in a few processor cycles. You can
| also always add or remove at the front or the back of the list
| in a few cycles.
|
| This means that you can do those things under a spin lock or
| with interrupts masked. You cannot really resize a dynamic
| array under a tight spinlock, or modify a tree that should be
| kept balanced.
|
| Another use case is free lists, a list of preallocated
| structures where you can grab one or put one back quickly
| under spinlock.
|
| These are some examples of how lists are used in kernel land.
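|
| The unlink itself, sketched in C (assuming a circular list
| with a sentinel head, so prev/next are never NULL):
|
|     struct list_node { struct list_node *prev, *next; };
|
|     /* O(1) removal given only a pointer to the node: no
|        traversal, no allocation, so it is safe under a
|        spinlock or with interrupts masked. */
|     static void list_unlink(struct list_node *n) {
|         n->prev->next = n->next;
|         n->next->prev = n->prev;
|         n->next = n->prev = n;  /* leave it self-linked */
|     }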
| pornel wrote:
| I'm not saying there are no uses. Concurrent lock-free
| algorithms are an especially useful case, and e.g. a linked
| list of on-stack mutex waiters is a very clever approach
| that I don't think has a better replacement. But I doubt
| that lists are as common as they are because every place
| really needs them.
|
| VecDeque can cheaply pop/push front and back too. Arrays
| have swap-remove. You may not need to grow if you
| preallocated a capacity. Freelists can be a stack, or you
| can choose a data structure that doesn't allocate objects
| individually so you don't need the freelist in the first
| place.
|
| Another thing that is C-specific is that most objects are
| treated as unmovable and their address is their permanent
| identity. Without language features for prevention of
| dangling pointers, moves are asking for trouble. This
| favors linked lists and pointery data structures over
| contiguous containers and inlined data.
| scottlamb wrote:
| Meh, that's true of some C projects, but AFAIK Linux/Redis
| have good reasons here, and if I were to criticize Linux for
| something, it wouldn't be unwillingness to write more code.
|
| I still prefer to evaluate each decision on its merits. Linux
| has made some great decisions. Also some terrible ones--e.g.
| fsync semantics [1], attitude to fuzzing. Redis probably has
| too. They can agree on something and be wrong.
|
| [1] https://wiki.postgresql.org/wiki/Fsync_Errors
| romellem wrote:
| > > Redis can be wrong, but both Redis and the Linux kernel
| can't.
|
| > Yes, they can, and I'd rather avoid argument from authority
| altogether.
|
| Pretty sure that was a joke, considering the overall light-
| heartedness of the whole post.
| scottlamb wrote:
| Ahh, you're probably right. I missed it.
| xxs wrote:
| >Yes, they can, and I'd rather avoid argument from authority
| altogether.
|
| Proof by authority is the worst. Yet, the Linux kernel can have
| valid reasons for them, but those are intrusive lists, with the
| data itself holding the pointer to the next node.
| zarzavat wrote:
| The problem with linked lists is that they are rarely the ideal
| solution to any problem. They are convenient _in C_, because
| of C's inadequacies, but in C++ and Rust they are marginalized
| because the tools exist to use superior data structures.
|
| You almost always want to unroll your linked lists, so
| that items are fetched in blocks. But this rarely happens in C
| because it's premature optimization _in C_. In C++ or Rust you
| can write the code once in a generic data structure and enjoy
| the benefits to the end of time.
|
| That all said, I think we should still teach them in CS because
| understanding linked lists is the first step to understanding
| trees.
| mac01021 wrote:
| OS kernels tend to use them simply because they don't have
| malloc at their disposal and so it's most practical to stick to
| fixed size memory chunks, right?
| scaramanga wrote:
| Linux kernel's malloc is probably more resilient to the long-
| term (as in, long running) problems you would get from
| continually resizing an array with realloc in userspace (ie.
| massive heap fragmentation and eventually dying in flames)
| waynecochran wrote:
| Another cool property about linked lists is that you can add an
| element to the head of a list without mutating the previous list.
| There are not many data structures that you can add an element
| to and leave the previous version unchanged. This form of
| immutability is a very nice feature that Lisp / functional
| programmers enjoy. Because of this, you can concurrently add an
| element to the head of a list to create new lists without having
| a critical section in your code.
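|
| In C terms the trick is just this (a sketch; error handling
| omitted):
|
|     #include <stdlib.h>
|
|     struct cell { int head; const struct cell *tail; };
|
|     /* Prepend without mutating the old list: old and new
|        lists share the same tail, so readers of the old
|        list are never disturbed. */
|     const struct cell *cons(int head, const struct cell *tail) {
|         struct cell *c = malloc(sizeof *c);
|         c->head = head;
|         c->tail = tail;
|         return c;
|     }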
| robocat wrote:
| > you can concurrently add an element to the head of a list to
| create new lists without having a critical section
|
| The race is on: now you have two lists when before you had one?
| rileyphone wrote:
| Every element of the list, from its own point of view, is
| another list.
| jbandela1 wrote:
| In my experience, the linked lists that are useful are intrusive
| linked lists where the data you care about has a next/previous
| embedded pointer as opposed to an just storing an arbitrary
| object inside a linked list.
|
| One example would be if your thread structure has these embedded
| pointers, you can easily add/remove a thread to a linked list of
| threads waiting for some resource without any allocation just by
| pointer manipulation.
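|
| A sketch of that in C (names invented for illustration;
| assumes a circular wait list with a sentinel head):
|
|     struct wait_node { struct wait_node *prev, *next; };
|
|     struct thread {
|         int tid;
|         /* ... other state ... */
|         struct wait_node wait;  /* links live in the thread */
|     };
|
|     /* Enqueue a thread on a resource's wait list: pure
|        pointer surgery, no allocation, so it cannot fail
|        and can run with a lock held. */
|     static void wait_list_push(struct wait_node *head,
|                                struct thread *t) {
|         t->wait.next = head->next;
|         t->wait.prev = head;
|         head->next->prev = &t->wait;
|         head->next = &t->wait;
|     }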
| pclmulqdq wrote:
| This is a very useful pattern when you have small collections
| of large objects, where the vector/array implementation starts
| to fall flat on its face. Kernels and high-performance servers
| use these a lot for state tracking.
|
| If you combine intrusive linking with a pool allocator, you can
| have decent locality too.
| scaramanga wrote:
| Or how about an object that can't easily be relocated like,
| you know, a stack for a kernel thread which also contains the
| task struct and needs to be moved around between different
| scheduler queues.
|
| But yeah, I understand they took the task struct out of the
| kernel stack now.
|
| Edit: difficult being an understatement, you'd need to solve
| the halting problem to be able to rewrite any self-
| referential pointers to stack objects wherever they may be.
| robotresearcher wrote:
| That's absolutely right, but it's not just large objects. In any
| situation where you follow links rarely compared to your
| other memory accesses, the cache cost of the links
| becomes negligible and the benefit of simplicity and O(1)
| operations can shine. Vectors for scrubbing through things
| without changing their structure; lists and trees for the
| opposite.
|
| Scrubbing through things quickly is empirically more common,
| hence the standard advice to use vectors by default.
| dclowd9901 wrote:
| I know it's splitting hairs but doesn't that mean the structure
| is, indeed, a linked list? A structure can be two things at
| once.
| antirez wrote:
| This is a very popular use case inside kernel implementations.
| colinfinck wrote:
| FWIW, I recently implemented a safe Rust crate around the
| Windows variant of intrusive linked lists [1] and spoke about
| it at EuroRust [2].
|
| I did it only for compatibility and would still prefer
| vectors whenever I can use them. Nevertheless, I put some
| efforts into coming up with a useful and high-performing
| implementation. To give an example: My `append` function is
| O(1) and not a poor O(N) version that you rightfully
| criticized in your tweet.
|
| [1] https://colinfinck.de/posts/nt-list-windows-linked-lists-in-...
| [2] https://www.youtube.com/watch?v=IxhZIyXOIw8
| klysm wrote:
| The other benefit of this kind of data structure is the ability
| for one object to participate in multiple collections.
| Otherwise you would have to do some kind of hash map structure
| to point to them.
| kelnos wrote:
| Isn't that the opposite, though? If you store the next/prev
| pointers in the data structure itself, you _can't_ use the
| same object in multiple collections.
|
| If you have a standalone linked list data structure, then you
| can, as long as you have a good means of tracking the
| lifetime of the data itself.
| nirs wrote:
| Typically you keep a list entry in the struct:
| https://man7.org/linux/man-pages/man3/stailq.3.html#EXAMPLES
|
| The same list entry can be moved between several lists
| using the same type.
|
| If the struct needs to be in multiple lists at the same time
| you can keep multiple list entries in the same struct:
| https://github.com/freebsd/freebsd-src/blob/69413598d2660054...
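|
| A sketch with the BSD <sys/queue.h> macros (struct and field
| names invented for illustration; list heads assumed already
| initialized with LIST_INIT):
|
|     #include <sys/queue.h>
|
|     struct item {
|         int value;
|         LIST_ENTRY(item) by_owner;  /* links for list #1 */
|         LIST_ENTRY(item) by_state;  /* links for list #2 */
|     };
|
|     LIST_HEAD(item_list, item);
|
|     /* One object on two lists at once: each LIST_ENTRY
|        field carries its own prev/next links. */
|     static void track(struct item_list *owned,
|                       struct item_list *dirty,
|                       struct item *it) {
|         LIST_INSERT_HEAD(owned, it, by_owner);
|         LIST_INSERT_HEAD(dirty, it, by_state);
|     }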
| DeathArrow wrote:
| While there's a certain use for linked lists - in operating
| system kernels, for example - and as a programmer you absolutely
| have to know how linked lists work, there are other linear data
| structures which are better suited for general programming.
|
| To give a C# example, arrays, lists and dictionaries (hash
| tables) implement iterators so you can always know what the next
| element in the collection is. Elements can be accessed by key or
| index in O(1), and elements can be added in O(1). The case in
| which you absolutely have to insert an element at a particular
| position is rare, so linked lists are seldom used.
|
| The same case is for C++, there is a vector class in STL but no
| linked list class. Same for Java.
| selimco wrote:
| > The same case is for C++, there is a vector class in STL but
| no linked list class. Same for Java.
|
| Java does have a linked list in the standard library:
| https://docs.oracle.com/en/java/javase/17/docs/api/java.base...
| cyber_kinetist wrote:
| > The same case is for C++, there is a vector class in STL but
| no linked list class.
|
| No, there is std::list.
|
| https://en.cppreference.com/w/cpp/container/list
|
| > Same for Java.
|
| Also no, LinkedList exists.
|
| https://docs.oracle.com/javase/7/docs/api/java/util/LinkedLi...
| summerlight wrote:
| https://twitter.com/cpuGoogle/status/1415061974820421635
|
| A nice explanation of the rationale for using linked lists in an
| OS kernel, from a Fuchsia developer. In short, most typical
| operations are almost guaranteed not to fail as long as the
| invariant holds, which makes lists suitable for cases where
| there's no fallback option left, like kernel code.
| cbreynoldson wrote:
| > You get what data structures made of links truly are: the
| triviality of a single node that becomes a lot more powerful and
| complex once it references another one.
|
| I don't think beginners actually make this connection for a
| while. Linked Lists are introduced analogously to arrays, sets,
| etc. Beginners think about Linked Lists in terms of what they
| already know.
|
| As a beginner, I thought of Linked Lists purely as non-contiguous
| arrays, even though there are deeper concepts behind them.
|
| Unless the beginners already have the perspective of "the part
| having the same power as the whole", I don't think this
| connection gets made for a while. Linked Lists don't expose so
| much possibility on
| their own.
| armchairhacker wrote:
| Linked lists are great to know and understand. Linked lists are
| not good for performance. In most cases, an array or other data
| structure is much more efficient than a linked list. But there
| are cases where linked lists' immutability/segmentation or
| simplicity to implement make them the better choice.
| taeric wrote:
| There are, interestingly, also times when fun facets of the
| linked lists mutability can lead to benefits.
| https://taeric.github.io/DancingLinks.html is my attempt at
| exploring Knuth's DLX algorithm. I can't recommend enough that
| you look into how he did it. Very fun and exciting use of
| mutable doubly linked lists.
| taeric wrote:
| I suspect the problem is that many people think of stuff like
| Java's LinkedList when they think of linked lists. As a standard
| list implementation, I think it is easy to see that a doubly
| linked list just isn't that strong. Especially with how
| cumbersome the List interface makes it to take advantage of the
| links.
|
| That said, conceptually, linked items are everywhere, and you
| can easily enumerate things by following the links. You
| probably just don't bother keeping that in a "List" structure in
| your code.
| xxs wrote:
| LinkedList is useless in Java. Aside from the lock-free versions
| of them, linked lists are better avoided in any language (with
| few exceptions).
| taeric wrote:
| I'll forever think of this tweet when I see Java's LinkedList.
| https://twitter.com/joshbloch/status/583813919019573248
|
| Funny, as I constantly have to tell folks to not bother using
| it at the office. Everyone always assumes it has to be faster
| than arraylist for whatever they happen to be doing this
| time. I think the vast majority of the time it flat doesn't
| matter, but I also fully expect LinkedList will lose most
| speed tests.
| xxs wrote:
| LinkedList is terrible as the memory overhead is way too
| high - if you need something better use an ArrayDeque.
| smarks wrote:
| Right. The main problem is that Java's List interface
| affords access by int index, making it seem array-like.
| Unfortunately get(i) on a LinkedList is O(n). This turns
| apparently innocuous things quadratic, like looping over a
| list by index.
| xxs wrote:
| Nothing to do with the interface - only
| java.util.RandomAccess guarantees O(1). Iterating with
| get(int) is a mistake regardless; stick to
| java.util.Iterator or ListIterator if you need the index.
|
| The main issue w/ the LinkedList is its memory footprint
| (aside from being a linked list w/ an indirection on each
| access), along with the increased GC costs (the GC has to
| iterate the nodes too; cache misses - GC pauses). Even a
| half-empty ArrayList is more memory friendly than any
| LinkedList. ArrayDeque offers 'adds' at both ends, if
| you wish to use LinkedList as a queue.
| jsight wrote:
| I run into people that use LinkedList everywhere in their code
| "for efficiency". It always confuses me. Are CS professors
| teaching something that leads to this?
|
| In practice, it seems unusual for it to have any advantages
| over just using an ArrayList in typical JVM applications.
| taeric wrote:
| I blame an over-reliance on Big O as an arbiter of what is a
| fast algorithm. Linked lists are commonly taught as having
| much better insertion speed than arrays.
|
| This ignores constant factors and cache coherency, of course.
| Moreover, it also ignores that most lists folks make
| have a VERY predictable access pattern: typically just a one-
| way scan, if any traversal at all, with very few inserts and
| mostly just appending.
| Narishma wrote:
| Cache locality, not coherency.
| forrestthewoods wrote:
| The first problem with Linked Lists is you should almost never
| use them. Do they have uses? Of course! However on modern
| computers memory access is, roughly, more expensive than CPU
| cycles. The Linked List has been demoted to a "special case" data
| structure.
|
| The second problem is Linked Lists are taught too early. They're
| historically taught first in Data Structures 101. This results in
| new programmers using LinkedLists first when they should be used
| last.
| readams wrote:
| Linked lists are great. But they have the problem that, almost
| always, whatever problem you're trying to solve would be better
| done with a regular resizable vector.
|
| This includes problems that they should be great for, like insert
| into the middle or the front.
|
| The reason is that in practice the way computers actually work is
| that there is an enormous time penalty for jumping around
| randomly in memory, and it's large enough that it's often worth
| paying a O(lg n) cost to switch to something that will be
| contiguous in memory and allow the CPU to prefetch data.
|
| There are exceptions when you really should use an actual linked
| list, but your default should be a vector and not a linked list.
| jstimpfle wrote:
| That works if the data that you want to put in the sequence is
| copyable (doesn't have "identity"), or if you can arrange so
| that it gets created in the right spot from the start and you
| can always choose the iteration order to be sequential in
| memory. For many more complex structures, that is not the case.
| ay wrote:
| Could you give an example of data that is uncopyable? A
| struct with self-referential pointers?
| Const-me wrote:
| For example, an std::mutex is uncopyable, and even
| immovable.
| AstralStorm wrote:
| If the data is relatively big buffers, you really do not
| want to copy them. Your memory bandwidth will thank you,
| and performance will vastly increase. Cache locality will
| be good anyway since you're referring to long blocks.
| ay wrote:
| This kind of data structure often has a buffer +
| associated metadata. If the metadata+pointer fits into
| 1-2 cache lines, storing it in a vector can be a win
| vs. storing the next/prev pointers next to data +
| metadata.
|
| In any case, there is always only one authoritative
| answer: "perf top". :)
| hither_shores wrote:
| It's about the meaning, not the raw bits. I might be able
| to duplicate your signature, but that doesn't mean I can
| sign documents for you.
| jlokier wrote:
| Example: Any object shared among multiple lists or other
| data structures at the same time, or directly pointed to by
| other objects, such that any of those views must reach the
| same cheaply mutable state. This includes objects in the
| object-oriented programing (OOP) model, actors in the actor
| model, and file objects in a kernel which are in multiple
| lists.
|
| For those you have to store object pointers in the vector,
| instead of copies of the objects. That works and is often
| done, but it defeats or reduces the cache-locality
| performance improvement which motivates using a vector in
| the first place.
|
| On an in-order CPU an _intrusive_ list (pointers inside the
| objects) can be traversed faster than a vector or B-tree of
| object pointers, though a _non-intrusive_ list (list of
| nodes containing object pointers) won't be faster. On an
| out-of-order CPU with heavy speculative execution it is less
| clear cut because the vector allows successive objects to
| be dereferenced speculatively in parallel to a limited
| extent, while this is not possible traversing an intrusive
| list.
|
| If the list is going to be traversed looking for objects
| based on some filter criterion such as flags or a number
| comparison, based on non-mutable state (or where it's ok
| for mutation to be expensive), traversal can be sped up by
| storing criterion data in the vector alongside the
| pointers, or in a second vector, to avoid dereferencing
| every touched object pointer during traversal.
| gpderetta wrote:
| Linked lists are indeed great for secondary traversal
| orders (or for subsets).
|
| But I don't understand your comments re in-order CPUs:
| vectors will be faster there too as they avoid the load-
| load dependency. Most modern inorder CPUs are still
| superscalar and fully pipelined.
| lispm wrote:
| what makes you think that list operations make the CPU jump
| randomly in memory and that implementors of, say, Lisp systems
| haven't optimized memory management for linked lists? A modern
| GC provides features like generations, areas, copying and
| compacting.
|
| On a Lisp Machine from the 80s/90s the system saves a type-
| sorted and locality-optimized memory image, from which it later
| boots the Lisp system. In addition to the typical GC
| optimizations (incl. integration with the paging system) it
| also provided CDR coding, which allocates lists as contiguous
| memory. The GC also creates those CDR-coded lists. CDR coding
| fell out of fashion in modern Lisp implementations, because the
| space and time savings aren't great enough to justify the more
| complex implementation.
| eru wrote:
| > Linked lists are great. But they have the problem that,
| almost always, whatever problem you're trying to solve would be
| better done with a regular resizable vector.
|
| One important exception is when you want your data structure to
| be persistent.
|
| See https://en.wikipedia.org/wiki/Persistent_data_structure
|
| But for many applications there are also better persistent data
| structures than linked lists around.
| Smaug123 wrote:
| Nobody has mentioned immutability yet - linked lists are easily
| used as one of the simplest immutable persistent data
| structures. Your definition of "better" appears to be solely
| about keeping the CPU burning, but if CPU _isn't_ the
| bottleneck, I prefer "immutable linked list" over "clone a
| vector many times" merely on aesthetic and reason-ability
| grounds.
| mst wrote:
| cons cells are a local maximum.
|
| often they're sufficiently maximal.
| yongjik wrote:
| Well, of course if a solution is "cloning a vector many
| times" then arguably you're using the vectors wrong - in
| which case it's totally appropriate to find a different data
| structure (or a better way to use vectors, if possible).
| gleenn wrote:
| Or you could go the Clojure route and do the tree-style
| immutable vectors where the cost is log32(N) for most
| operations, small enough in all the relevant operations, and
| you get the amazing usability of immutable data structures
| that are performant enough in most cases.
| kazinator wrote:
| How does that scale down to small sequences of under ten
| items?
| Jim_Heckler wrote:
| an array is used for the last 1-32 elements of the vector
| (the "tail") so there would be no trie at all, just the
| tail
| dan-robertson wrote:
| In C where using the simplest data structures is valuable,
| you'd likely do much better by upgrading your programming
| language to something that offers a better data structure. If
| you're using a higher level language where you can abstract
| details about your sequence data structure, I think there's
| very little benefit to using the simplest data structure and
| you should instead have widely used libraries that offer a
| good immutable sequence type, e.g. RRB trees.
|
| One exception is ML style languages where singly linked lists
| get a big syntax advantage (Haskell only half counts because
| most of its lists are really just iterators). I think this
| was, in hindsight, a mistake for a widely used language. It's
| also a source of pointless pain for learners (eg I don't
| think there's much benefit to people having to write lots of
| silly tail-recursive functions or learning that you have to
| build your list backwards and then reverse it). Another
| exception would be some lisps where lists are everywhere and
| the cons-cell nature of them makes it hard to change.
| osigurdson wrote:
| >> This includes problems that they should be great for, like
| insert into the middle or the front.
|
| How is a vector good for inserting elements in the middle? That
| is O(N). Where is the O(lg n) cost coming from?
|
| >> your default be a vector and not a linked list
|
| The default should be to understand the problem.
| tsimionescu wrote:
| > How is a vector good for inserting elements in the middle?
| That is O(N). Where is the O(lg n) cost coming from?
|
| Both an array and a linked list are O(n) for adding in the
| middle. In an array, you finding the middle element is O(1),
| then moving the rest of the array is O(n) [you have to move
| n/2 elements]. In a linked list, finding the middle element
| is O(n) [you have to traverse n/2 elements], but adding it is
| O(1).
|
| So, asymptotically there is no difference. In practice
| though, the linked list traversal will cause O(n) cache
| misses to reach the middle element (since we are traversing
| random memory), while the array move will only incur O(1)
| cache misses (since we are accessing memory in order) - so,
| the array will actually win in practice.
|
| Edit: Note that I also don't know where the GP got the O(lg
| n).
| osigurdson wrote:
| It seems that the case where it is necessary to find the
| exact middle element and insert something wouldn't be very
| common but OK, I'll humor this use case. Let's assume that
| the list is huge and inserting at the middle is the only
| thing we do. Just maintain a pointer to the middle element
| and insert there: O(1) for find and insert. If it is
| necessary to add/remove at other places in the list, it is
| possible to maintain a doubly linked list and move the
| middle pointer back and forth accordingly (still O(1)).
|
| I suppose if the LL like structure is mostly read only with
| rare inserts, not that big and holds simple data types or
| structs and often needs to be read sequentially then an
| array/vector/list would be better than a regular LL but
| then it is obvious that an array is better anyway.
|
| It's poor guidance to tell people that the vector is the
| "go to" when you need a LL. Sad actually that this poor
| guidance has been upvoted by so many people such that it is
| the top comment, all in the name of some FUD about cache
| misses.
| tsimionescu wrote:
| Let's take the case where we need to add to any element
| of the structure, with uniform probability.
|
| For an array, this means we'll have to move N elements if
| we need to add at the beginning, N/2 if exactly in the
| middle, or 0 if at the end. Either way, we'll need to re-
| size the array, which has some chance of requiring N
| copies anyway - say, half the time. So, on average, we'll
| perform 3N/4 moves per element added, requiring 1 cache
| miss every time.
|
| For a (doubly-)linked list, this means we'll have to
| traverse 0 elements if we want to add right before Head,
| 0 if we want to add right after Tail, and in the worst case
| we'll need to traverse N/2 nodes if right in the middle -
| so, we'll have to do on average N/4 traversals per
| element added. Each of the nodes traversed will generally
| be a cache miss, so N/4 cache misses. So, the Linked List
| will indeed require a third the number of operations
| on average, but many times more cache misses, so it should
| be expected to lose. If you store the middle pointer as
| well, you'll again halve the number of operations for the
| list, but that still won't cover the amount of time
| wasted on cache misses.
|
| Of course, if the distribution is not uniform and is in
| fact skewed near a predictable element of the linked
| list, at some level of skewed-ness vs cache speed
| advantage, the Linked List will overtake the array.
|
| Note that linked lists also have other advantages - for
| example, they can be used as in the Linux kernel to store
| the same object in multiple different lists without any
| copying and without extra pointer indirections (at a
| small cost of one extra pointer in the object per list it
| could be a member of). They are also easier to pack into
| a small address space that has to be dynamically
| allocated (since you don't require a large contiguous
| block).
| llbeansandrice wrote:
| I think by "middle" they mean "at the current position" not
| at an arbitrary position in the middle when starting from
| HEAD
| huhtenberg wrote:
| > almost always
|
| Like any generalization, this one too is, of course, incorrect.
|
| Outside of the realm of academia, the need to keep data pieces
| in several _containers_ at the same time, with no heap
| operations on insertion or removal and O(1) removal is very
| common, especially in the kernel space and embedded contexts.
| The only option that fits the bill - you guessed it - is the
| linked list.
| nostrademons wrote:
| Note that this (and the article) describes an _intrusive_
| linked list. _Extrusive_ linked lists (like you might see in
| a class library or CS 101 project), where the node structure
| is heap-allocated separately from the data that it points to,
| have very few practical advantages over vectors, which is why
| standard libraries are increasingly defaulting to vectors
| even when the class is named "list".
| majjgepolja wrote:
| Is there a difference between intrusive and extrusive list
| when the member type is a struct (value type)?
| nostrademons wrote:
| Yes. An intrusive list puts the pointers to next & prev
| in the struct itself. IntrusiveList<Member>::iterator is
| a Member*, then member->next points to another Member*.
| An extrusive list puts the pointers in a separate node
| structure, which then has a pointer to the actual member
| type. ExtrusiveList<Member>::iterator is a
| ExtrusiveListNode<Member>*, node->next points to another
| ExtrusiveListNode*, and you access the actual Member*
| with node->data. Basically it's a double indirection.
|
| The advantage of extrusive linked lists is that you don't
| need to modify the data layout of Member at all. All the
| linked list manipulation happens on ListNodes, which have
| a pointer to the actual member, which can be accessed
| with just a cast. That's why they're so well-suited to
| class libraries: you can easily write a generic
| ExtrusiveLinkedList that works with any data that can be
| represented with a pointer, hide all the traversal (and
| the existence of individual ExtrusiveListNodes) in
| accessor functions, and present a simple API with no
| codegen or modifications to Member.
|
| The advantages of intrusive linked lists are 1)
| performance 2) no heap allocations 3) easy traversal when
| given a Member*. It's only a single indirection rather
| than a double, and usually that indirection is necessary
| anyway to avoid copies. Insertion and deletion are just
| pointer manipulation; you don't need to allocate new
| space for the nodes. Oftentimes your functions will take
| Member* anyway, and it's nice not to take or traverse the
| container if you just need to peek at the next element
| coming up.
|
| The point of this subthread is that intrusive linked
| lists have very clear use cases in the kernel & embedded
| spaces, where these advantages are often critical.
| Extrusive linked lists, however, have very few advantages
| over vectors (which also require heap allocation but are
| more cache & locality friendly). In a common business app
| responding to user input, the difference between O(1)
| amortized and O(1) is negligible; you can eat the
| occasional pause for a vector resizing, because the user
| isn't going to care about a few milliseconds. They both
| require heap allocation. The vector will give you _much_
| faster traversal, because subsequent reads all hit cache.
| The vector takes less memory (1 pointer per element, with
| a max of 50% overhead, while an extrusive doubly-linked
| list is 3 pointers per element). There's just little
| reason to use an extrusive linked list because the use-
| cases where intrusive linked lists are not better are
| precisely the same ones where vectors are better.
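|
| In plain C terms the two layouts look roughly like this (a
| sketch, names invented):
|
|     /* Intrusive: the links live inside the object, so
|        node == object and reaching data is one hop. */
|     struct member {
|         int data;
|         struct member *prev, *next;
|     };
|
|     struct payload { int data; };
|
|     /* Extrusive: a separate heap-allocated node points at
|        the object, so reaching data via node->elem->... is
|        a double indirection, and every insert allocates a
|        node. */
|     struct ext_node {
|         struct ext_node *prev, *next;
|         struct payload  *elem;
|     };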
| forrestthewoods wrote:
| > Like any generalization, this one too is, of course,
| incorrect.
|
| In the real world of application development when choosing
| between Vector and LinkedList the answer is almost always
| Vector. There are, of course, certain problems where
| LinkedList is always the correct answer.
|
| However, for all problems that have chosen either Vector or
| LinkedList as their container of choice, a super majority
| should choose Vector.
| Rioghasarig wrote:
| > As any generalization this one too is, of course,
| incorrect.
|
| His statement wasn't a generalization, so repeating this
| back does not make sense.
| AstralStorm wrote:
| There are essentially no vectors in most of the embedded
| world, including RTOS. Why would you need that many extra
| copies of data?
|
| Copies are not free. Not in terms of stack, nor in
| terms of memory or its bandwidth, both of which are in
| demand there.
|
| For an example, look into how most of Zephyr RTOS is
| implemented. It's mostly intrusive lists (called queues
| there for some reason, but they support iteration) or ring
| buffers (msgq or mpsc_pbuf or pipe). For bigger systems you
| can use red-black trees.
|
| There is no vector data structure, you can allocate a block
| from the heap and manage its size manually, but it's rarely
| done as most sizes are static and many data structures are
| preallocated by the bootloader or loaded directly from
| flash memory. Cache locality of these is of course possible
| to manipulate manually by use of memory sections.
|
| A list does not copy by default, which is an extremely
| important performance characteristic. And it does not
| require overallocation like a vector of pointers, plus has
| a fast remove, especially at head or tail.
| forrestthewoods wrote:
| Extra copies, huh?
|
| Feel free to replace the term "Vector" with "Contiguous
| Memory". The relevant distinction in this conversation is
| "contiguous vs node pointers". Embedded loves contiguous
| blocks.
|
| A dynamic array copies memory when it regrows, inserts, or
| removes. The cost of that operation depends on numerous
| factors. But it doesn't result in "extra copies" lying
| around.
|
| Oh wait maybe you mean array operations perform extra
| copies, not that they persist. In which case it comes
| down to which is more expensive: memcpy some bytes or
| cache inefficient pointer traversal.
|
| It depends! But most problems for most programs are MUCH
| faster if they avoid LinkedList. Many people, especially
| new programmers, radically underestimate the penalty of
| pointer chasing.
| ernst_klim wrote:
| > Embedded loves contiguous blocks.
|
| Quite the opposite. When you have no dynamic allocations,
| vectors are out of the question, while a pool of nodes is
| fine and sound.
|
| And quite often you can't copy elements from one place to
| another in embedded or operating systems, since many
| consumers can store a pointer to this element. Of course
| this could be solved by usage of vector of pointers, but
| this completely ruins the "cache locality" argument.
| tylerhou wrote:
| If you're iterating, sure, use a vector. If you're pulling from
| a queue and doing a lot of processing, maybe an RPC -- does the
| ~70ns memory latency hit really matter that much? Probably not.
| kibwen wrote:
| Why impose that latency if you don't need to? It costs me
| nothing to reach for a vector instead.
| tylerhou wrote:
| Without extra work you can't use a vector like a queue, and
| a circular buffer doesn't necessarily support all types
| and/or preserve iterators on insert. Could use a deque, a
| blocked linked list. But IMO list is fine if it's not
| clearly a bottleneck.
| AstralStorm wrote:
| Costs you a dynamic memory allocation, code overhead and
| higher memory use. There are good reasons why std::array
| and plain arrays exist. Vectors are for when your data is
| unbounded, which is actually a risk most of the time. For
| truly big data, you want a special structure anyway.
| pornel wrote:
| If you're pulling from a queue use a ring buffer.
| AstralStorm wrote:
| Sometimes. Depends on whether you are filling in a data
| buffer or trying to send an already calculated data buffer.
|
| A standard ring buffer does copies/fill. A ring buffer of
| pointers has similar performance characteristics to a
| linked list.
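|
| For reference, a minimal ring of pointers in C (a sketch;
| fixed power-of-two capacity, single-threaded, not a lock-free
| queue):
|
|     #include <stdbool.h>
|     #include <stddef.h>
|
|     #define RING_CAP 64  /* power of two: cheap wrap-around */
|
|     struct ring {
|         void  *slot[RING_CAP];
|         size_t head, tail;  /* head: next pop, tail: next push */
|     };
|
|     static bool ring_push(struct ring *r, void *p) {
|         if (r->tail - r->head == RING_CAP)
|             return false;                        /* full */
|         r->slot[r->tail++ & (RING_CAP - 1)] = p;
|         return true;
|     }
|
|     static void *ring_pop(struct ring *r) {
|         if (r->head == r->tail)
|             return NULL;                         /* empty */
|         return r->slot[r->head++ & (RING_CAP - 1)];
|     }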
| Phelinofist wrote:
| OT but does anyone know a general purpose ring buffer
| implementation for Java?
| grogers wrote:
| LMAX disruptor:
| https://lmax-exchange.github.io/disruptor/user-guide/index.h...
|
| It's used by log4j2 and several other java libraries and
| is quite solid. You can use it in blocking or polling
| modes, so it's quite generic depending on your latency
| requirements.
| speed_spread wrote:
| Unfortunately log4j2 is no longer a shining example of
| good engineering, if you're trying to sell something...
|
| I'd prefer saying LMAX is / was used as the core of high
| frequency trading applications.
| manv1 wrote:
| It's surprising people are arguing about this. That fact alone
| shows that lots of developers don't understand the relationship
| between L1/L2 cache, data access patterns, and how prefetching
| works.
|
| That makes sense, given how abstract things have gotten.
| Decades ago there was an article that showed the penalty for
| non-linear data access was massive. That was before speculative
| access, branch prediction and cache prefetching were standard
| features. The performance hit today would be even more (except
| presumably for machines that have implemented the speculative
| execution security mitigations).
| eru wrote:
| Well, ideally your programming language (and its standard
| library) should provide you with basic data structures that
| are cache-oblivious (or optimized for your architecture's
| cache). So programmers wouldn't need to worry.
| teo_zero wrote:
| I think the argument of locality and linearity and cache hits
| is valid, but specific for one access pattern: reading items
| in order and adding to the tail. If that's what your code
| ends up doing most of the time, then an array wins over a
| linked list hands down.
|
| As soon as you start adding or deleting items in the middle,
| the game changes because the data you need to move is not
| contained in the cache anymore (except for small arrays).
|
| I can even imagine a workflow that uses both data structures:
| an app could build up a linked list with the data it reads
| from an external file, for example, where they can be
| scattered and not in order; when the file is finally closed,
| the data are considered read-only and rearranged into an
| array for the remaining part of the execution.
|
| So don't dismiss linked lists as inefficient "a priori",
| which is exactly what antirez claims in the article.
| tsimionescu wrote:
| That analysis is lacking. Linked lists turn out to be
| inferior to arrays for essentially any access pattern, at
| least when the linked list can be assumed to be randomly
| stored and not accidentally sorted.
|
| Say you want to add an element in the middle of an
| N-element ll/array.
|
| For the linked list, this means you need to do 2 * N/2
| memory reads (read value, read next pointer), which will
| mean on average N/2 cache misses, to reach the element at
| index N/2. Then you'll do O(1) operations to add your new
| element. So, overall you've done O(N) operations and
| incurred O(N) cache misses.
|
| For an array, there are two cases:
|
| 1. In the happy case, you'll be able to re-alloc the array
| in-place, and then you'll have to move N/2 elements one to
| the left to make room for your new element. You'll incur
| one cache miss to read the start of the array, and one
| cache miss to read the middle of the array, and one cache
| miss for realloc() to check if there is enough space. The
| move itself will not incur any more cache misses, since
| your memory access pattern is very predictable. So, overall
| you've done O(N) operations, but incurred only O(1) cache
| misses - a clear win over the linked list case.
|
| 2. Unfortunately, sometimes there is no more room to expand
| the array in place, and you actually have to move the
| entire array to a new place. Here you'll have to move N
| elements instead of N/2, but you'll still incur only 1
| cache miss for the same reason as above. Given the massive
| difference between a cache miss and a memory access, doing
| N operations with 1 cache miss should easily be faster than
| doing N/2 operations with N/2 cache misses.
|
| The only access patterns where the linked list should be
| expected to have an advantage are cases where you are
| adding elements very close to an element which you already
| have "in hand" (e.g. at the head, or at the tail if you
| also keep a pointer to the tail, or to the current element
| if you're traversing the list while modifying it). In those
| cases, the linked list would need O(1) operations and cache
| misses, while the array would need O(N) operations and O(1)
| cache misses, so it would start to lose out.
| pencilguin wrote:
| The fact is that in 95+% of cases, the performance doesn't
| matter at all, and a linked list is fine.
|
| The only problem is that use cases evolve, and what was fine
| suddenly isn't, anymore.
| couchand wrote:
| As with any such subject, it's worth discussing the details
| but probably not the generalities really (except perhaps in
| the whimsical style of antirez).
|
| You're right to point out that given the performance
| characteristics of most modern CPUs, accessing memory
| linearly is optimal. But it doesn't necessarily follow that
| all collection data structures should be stored as a vector.
| Consider a collection traversed very slowly relative to the
| overall computation. There may not be any cache win if the
| line's always evicted by the time you come back to it.
| fiddlerwoaroof wrote:
| I've always wondered about whether you could have your cake
| and eat it too with linked lists where nodes are allocated
| in contiguous blocks of N nodes. So, the normal case of
| traversing the next pointer is not a discontinuous memory
| access.
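|
| Something like a slab of nodes, sketched in C (illustrative
| only; bounds and malloc-failure checks omitted):
|
|     #include <stdlib.h>
|
|     #define BLOCK 64  /* nodes per contiguous block */
|
|     struct node { int value; struct node *next; };
|
|     struct pool {
|         struct node *blocks[1024];
|         size_t nblocks, used;  /* used: nodes taken from
|                                   the newest block */
|     };
|
|     /* Nodes allocated together come from the same block, so
|        list neighbors tend to share cache lines even though
|        they are still linked by pointers. */
|     static struct node *pool_alloc(struct pool *p) {
|         if (p->nblocks == 0 || p->used == BLOCK) {
|             p->blocks[p->nblocks++] =
|                 malloc(BLOCK * sizeof(struct node));
|             p->used = 0;
|         }
|         return &p->blocks[p->nblocks - 1][p->used++];
|     }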
| tsimionescu wrote:
| That's quite likely to happen to a linked list when using
| a compacting GC (such as Java's or C#'s generational
| garbage collectors) (assuming the elements of the list
| are only reachable from the preceding element, not from
| many other places as well).
| simplotek wrote:
| > It's surprising people are arguing about this. That fact
| alone shows that lots of developers don't understand the
| relationship between L1/L2 cache, data access patterns, and
| how prefetching works.
|
| Your misconception lies in the assumption that the only
| conceivable use case is iterating over a preallocated data
| structure to do read-only operations.
|
| Once you add real world usages, with CRUD operations
| involving memory allocations/reallocations then your
| misconception about cache crumbles, and you start to
| understand why "people are arguing over this".
| Dylan16807 wrote:
| Which misconceptions? Messy memory allocations hurt lists
| even more.
| pizza wrote:
| It's hard for me to blame developers for their misaligned
| mental models since, as you yourself point out, the
| relationship between your mental model of the hardware's
| behavior and what it actually does is complex and not really
| obvious. Mitigations etc means that the hardware you bought
| might totally change its behavior in the future. If all you
| can rely on is time measurements, it doesn't explain that the
| software you wrote last year is now eg 40% slower due to bug
| mitigations. I think it's easier to expect developers to put
| in the effort once they're empowered with powerful tools to
| explain how the software gets mapped onto the hardware.
|
| I think tools that result in greater transparency at lower
| levels will probably emerge, something like a compiler-
| generated dynamic analysis + pre-flight checklist hybrid. But
| since it's so low-level, it might actually affect results,
| which may be a problem. I think intel vtune seems interesting
| in that regard.
| djha-skin wrote:
| This is only a problem with the implementation of linked lists.
| The article talks about how linked lists are conceptual. They
| make a nice data structure to work with in code. And one could
| easily conceive of several ways to optimize them so that
| elements are adjacent to each other.
| jalino23 wrote:
| by vector you mean regular js arrays?
| josephg wrote:
| Broadly yes, C++'s vector is the equivalent of JS arrays.
|
| But the javascript array type is much more complicated. It
| intelligently changes its internal structure based on how you
| use it - which is pretty wild.
| andrekandre wrote:
| nsarray (objective-c) also does something similar
|
| https://news.ycombinator.com/item?id=2413656 (sadly the
| article link is dead)
| haimez wrote:
| I recommend James Micken's JavaScript talk on this matter:
| https://m.youtube.com/watch?t=3m10s&v=D5xh0ZIEUOE
|
| (Time stamped to the beginning of the arrays section)
| LecroJS wrote:
| Would you mind elaborating on js arrays intelligently
| changing their internal structure based on use? My
| background is only JS/Python and have never touched C++ so
| I don't have the context of how these differ.
| josephg wrote:
| A c++ vector internally uses an array - which is a chunk
| of memory where each item is just in the next spot in
| memory in sequential order. Eg, the array [1,2,3] might
| have 1 at memory address 100, 2 at memory address 101 and
| 3 at memory address 102. Given an array index, you can
| figure out where a value is stored in memory pretty
| trivially.
|
| When the array fills up, vector will allocate a fresh
| array (that's bigger - often 2x bigger) and copy
| everything over. This is slow - because it has to copy
| everything. But much faster than you'd think.
|
| So that's how C++'s _vector_ (and Rust's _Vec_) work.
|
| Javascript implementations (like v8) can swap out the
| implementation behind the scenes based on how you're
| using your array. The API is always the same - but the
| way the array works in memory will be different based on
| whatever is fastest given the way you're using your
| array. (V8 doesn't know ahead of time what sort of data
| will be in your array or what you're using it for, so it
| guesses then changes things up if it guesses wrong).
|
| Here's a blog post talking about some of the internal
| details in V8:
|
| https://itnext.io/v8-deep-dives-understanding-array-internal...
|
| The actual implementation is much more complex than this
| blog post suggests. (I think they also optimize based on
| push/pop vs shift/unshift vs other stuff).
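|
| The core of the growth trick, sketched in C (a hypothetical
| minimal vector, error handling kept crude):
|
|     #include <stdlib.h>
|
|     struct vec { int *data; size_t len, cap; };
|
|     /* Append one element; when full, reallocate at 2x the
|        capacity and copy everything over. The copy is O(n)
|        but rare, so appends are O(1) amortized. */
|     static int vec_push(struct vec *v, int x) {
|         if (v->len == v->cap) {
|             size_t cap = v->cap ? v->cap * 2 : 4;
|             int *p = realloc(v->data, cap * sizeof *p);
|             if (!p) return -1;
|             v->data = p;
|             v->cap  = cap;
|         }
|         v->data[v->len++] = x;
|         return 0;
|     }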
| LecroJS wrote:
| Thanks for taking the time to write that up - I appreciate
| the explanation.
|
| My only adjacent knowledge here was when I checked the V8
| implementation of the `sort` method on the array
| prototype and saw it changes the sorting algorithm used
| based on array length (IIRC insertion sort for length <
| 10 and merge sort otherwise).
|
| Thanks for the V8 resource as well.
| bjoli wrote:
| While I generally agree with you, I have had multiple occasions
| where amortized O(1) was unacceptable, whereas the overhead of
| linked lists was ok.
| eru wrote:
| You can make dynamic arrays have worst case O(1) at the cost
| of some extra overhead.
| bjoli wrote:
| Copy the array on every append? :)
| halayli wrote:
| if you can replace a linked list with a vector then you don't
| need a linked list, and the two shouldn't be compared, as it is
| almost always obvious when you need one over the other. Linked
| lists are used in areas where there is a relationship between
| nodes, like a list of running processes. One of the frequent
| operations is to remove a node from whatever position it's in
| and add it on either side.
| Sirened wrote:
| Right. Like there are so many algorithms that I shudder to
| think about how you'd implement them without linked lists.
| Like...how about a buddy allocator? Sure you _could_ use a
| vector for each, but you'd be copying and resizing huge swathes
| of memory constantly for very little gain. Use the right tool
| for the job!
| tsimionescu wrote:
| The thing is, the cost of copying memory is essentially the
| same as the cost of reading it, and linked lists force you
| to keep re-reading the same memory over and over (assuming
| random access).
|
| Adding an element in the middle of an array vs the middle
| of a linked list actually has the same asymptotic
| complexity (O(n)), but far better cache impact for the
| array (since moving a contiguous block of n/2 elements
| should only incur 1 cache miss, while reading n/2 randomly
| distributed nodes of the list will incur on average
| something like n/4 cache misses). Adding closer to the
| beginning of the linked list is faster, but adding close to
| the end of the vector is faster still, so overall with
| random access, the vector should win.
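|
| To make the comparison concrete, a sketch of both
| operations (std::advance is the pointer-chasing step):
|
|       #include <iterator>
|       #include <list>
|       #include <vector>
|
|       void insert_middle(std::vector<int>& v,
|                          std::list<int>& l, int x) {
|           // O(n), but one contiguous shift that the
|           // prefetcher handles at memory bandwidth
|           v.insert(v.begin() + v.size() / 2, x);
|
|           auto it = l.begin();
|           std::advance(it, l.size() / 2);  // O(n) cache misses
|           l.insert(it, x);                 // O(1) once located
|       }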
| DeathArrow wrote:
| Also, when in doubt, do a quick benchmark with different
| implementation.
| imbnwa wrote:
| I've always wondered why GSAP and CreateJS both rely on linked
| lists for sequencing animations
| xaedes wrote:
| I learned to love linked lists as soon as I discovered that I
| can just store them in vectors to get the performance of
| guaranteed contiguous memory:
|
|       // LinkedListItem[k]: item[k], prev[k], next[k]
|       std::vector<T> item;
|       std::vector<uint> prev;
|       std::vector<uint> next;
|
| Similar is used in transparency rendering with per-pixel
| linked-lists.
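|
| Fleshed out a bit (a sketch; the names and the NIL
| sentinel are made up, with indices standing in for
| pointers):
|
|       #include <cstddef>
|       #include <vector>
|
|       template <typename T>
|       struct VecList {
|           static constexpr std::size_t NIL = std::size_t(-1);
|           std::vector<T> item;                  // node payloads
|           std::vector<std::size_t> prev, next;  // links as indices
|           std::size_t head = NIL;
|
|           std::size_t push_front(const T& value) {
|               item.push_back(value);    // node k lives at the end
|               prev.push_back(NIL);
|               next.push_back(head);
|               std::size_t k = item.size() - 1;
|               if (head != NIL) prev[head] = k;
|               head = k;
|               return k;
|           }
|
|           void detach(std::size_t k) {  // O(1) unlink, no free()
|               if (prev[k] != NIL) next[prev[k]] = next[k];
|               else head = next[k];
|               if (next[k] != NIL) prev[next[k]] = prev[k];
|           }
|       };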
| anthomtb wrote:
| Interesting idea but how does C++ guarantee contiguous memory
| for a vector? I just don't see how a data structure with an
| arbitrary, dynamic size can also reside in a contiguous range
| of address space.
| zabzonk wrote:
| the size of a c++ object is fixed at compile-time
| fnbr wrote:
| When the array is resized, it's moved to a new contiguous
| block of memory: everything is copied or moved over. See:
| https://stackoverflow.com/questions/8261037/what-happen-
| to-p...
| frankchn wrote:
| Simple, you just allocate a bigger contiguous chunk of
| memory and copy the entire vector over when the current
| chunk maxes out.
| [deleted]
| nraynaud wrote:
| I think a lot of graph edits are a nightmare to perform with an
| adjacency list, and a breeze when you just edit linked
| pointers.
|
| Imagine the Euler Operators on a manifold (ie a 3D mesh) or on
| a planar graph, but on a vector instead of on a DCEL.
| chrisseaton wrote:
| > almost always, whatever problem you're trying to solve would
| be better done with a regular resizable vector
|
| Resizing a vector brings the whole of it into cache,
| potentially dragging all of it all the way from actual RAM.
| Prepending an entry brings nothing into cache.
|
| Cache is king.
| tsimionescu wrote:
| Sure, prepending or adding at the end(assuming you're keeping
| a tail pointer, which is easy and common) are basically the
| only cases where a linked list can beat an array.
| chrisseaton wrote:
| Yeah, and we deliberately prepend because that doesn't
| involve writing to a field in a node which may not be in
| cache. It's the consumer's job to reverse the list if they
| ever actually need it.
| jameshart wrote:
| > Resizing a vector brings the whole of it into cache
|
| I don't do low level code on modern systems; is there not a
| way to just blit chunks of memory around without a CPU
| looping its way through word by word carrying out MOVQ
| instructions which do a full RAM read/write?
|
| If you're streaming data out of memory to a PCIe bus or
| something, that wouldn't go via cpu cache, right? There's
| some kind of DMA model, so there's something in the system
| architecture that allows memory access to be done off-CPU (I
| vaguely think of this as being what the northbridge was for
| in the old school x86 architecture but I'm fuzzy on the
| details?).
|
| Mechanical copying of byte arrays feels like something a CPU
| should be able to outsource (precisely because you don't want
| to pollute your cache with that data)
| chrisseaton wrote:
| Not in architectures I've seen or maybe I'm missing the
| instructions to do it? Even if there were and you can
| bypass the CPU, hauling data out of RAM is still going to
| take a long time.
| NobodyNada wrote:
| > maybe I'm missing the instructions to do it?
|
| Yes, they're called "non-temporal" load and store
| instructions. AFAIK most memcpy implementations will
| automatically use non-temporal access when copying blocks
| larger than a certain threshold, precisely to avoid
| destroying the cache during bulk copies.
|
| Additionally, a well-optimized memcpy implementation
| (e.g. making good use of prefetch hints) should not
| suffer _too_ much from RAM latency and be able to pretty
| much max out the bandwidth available to the CPU core.
| chrisseaton wrote:
| You're thinking of like MOVNTDQA? Those still have to
| reach into RAM - that's still double-digit nanoseconds
| away and still creates a stall in the processor at _n_
| complexity!! No thanks? I can allocate a new linked list
| node with memory guaranteed already entirely in cache
| (except for TLB failure.)
|
| > be able to pretty much max out the bandwidth available
| to the CPU core
|
| Why max out a slow link at _n_ complexity when you can
| just avoid doing that entirely and use a linked list?
| [deleted]
| Dylan16807 wrote:
| Surely you know the answer, if you're analyzing at this
| level of detail?
|
| It's because spending 1 millisecond per X operations
| doing vector moves is preferable to spending 20
| milliseconds per X operations stalling on list links.
|
| If you set a good vector size from the start, the moves
| will be rare to nonexistent. Or you could reserve some
| address space and _never_ have to move the data.
| tsimionescu wrote:
| In the ideal case (and assuming a vector large enough that
| it spans many pages), you would actually be able to work
| with the OS's virtual memory manager to re-allocate and
| move only the page that contains the address you want to
| write to, while all other pages can stay untouched. But
| there is quite a bit of careful work to do to implement
| something like this and make sure you are not invalidating
| pointers or even internal malloc() data structures.
| KerrAvon wrote:
| So use an array deque if you need O(1) prepending.
| chrisseaton wrote:
| Resizing the array in the array deque brings the entire
| thing into cache. That's a disaster.
| kragen wrote:
| It's a disaster but it's an amortized O(1) disaster.
| CyberDildonics wrote:
| It's far from a disaster because the best way to deal
| with a list is still to use a contiguous array.
|
| Allocation isn't free. You have to do it sometime and
| doing it all in bulk is going to have the same
| consequences as just one small allocation anyway. If you
| are mapping new pages into main memory, making system
| calls or looping through a heap structure, you're not
| only affecting the data cache but also the TLB while also
| pointer chasing.
|
| Making a list out of an allocation for every element is
| performance suicide on modern computers. Even the
| amortized resizing of an array is done at the speed of
| memory bandwidth because of the prefetcher. Thinking that
| is some sort of performance hindrance is absurd because
| it outweighs the alternative by orders of magnitude. If
| there really are hard limits on the time every insertion
| can take, then the solution needs to be links of large
| chunks of memory to still minimize allocations on every
| element.
| kragen wrote:
| That's amortized O(1), not WCET O(1). For transformational
| systems they're equivalent, for reactive systems they
| aren't.
| im3w1l wrote:
| One data structure I recently found myself wanting is a tree-
| based "array", with O(log n) access by index / insertion
| anywhere / deletion anywhere.
|
| Kinda weird how rare it seems to be.
| vanviegen wrote:
| That does sound useful, although I'm having some trouble
| thinking of what for exactly. :-)
|
| And indeed, I don't recall seeing something like that. At a
| first glance O(log n) seems easy to do in the average case,
| but perhaps not in the worst case.
| socksy wrote:
| I suppose it's quite similar to the idea of Bagwell's
| persistent data structures[1] -- those are particularly
| useful for lock free coding and with a high enough
| branching factor could be not as much overhead as a linked
| list or just plain copying.
|
| [1] https://hypirion.com/musings/understanding-persistent-
| vector...
| Munksgaard wrote:
| That sounds like a pretty run-of-the-mill balanced binary
| tree? Rust has a BTreeMap: https://doc.rust-
| lang.org/std/collections/struct.BTreeMap.ht...
| vanviegen wrote:
| Inserting into an array increments the indexes of all
| subsequent values. How would a regular BTree emulate that
| behavior?
| AstralStorm wrote:
| It would rebalance on insert with O(log N) performance
| but chunky constant factor, keeping the ordering so
| access by index is also O(log N) with low constant
| factor...
|
| Which is why you would usually use a red-black tree
| rather than a BTree, as it has a much lower constant for
| insertion and access by index, though a higher one for
| in-order traversal.
| Koshkin wrote:
| Well, doesn't the "index" need to be stored in the tree
| itself for this to work? If so, the tree just represents an
| ordered map. (Changing indexes after deletion or insertion
| would be a problem, though.)
| im3w1l wrote:
| https://stackoverflow.com/a/837911 disregard the O(1)
| discussion.
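|
| The usual trick behind that answer is an order-statistic
| tree: each node caches its subtree size, so positions are
| implicit and shift automatically on insert/delete. A
| lookup sketch (insertion and rebalancing omitted):
|
|       #include <cstddef>
|
|       struct Node {
|           int value;
|           std::size_t size = 1;   // nodes in this subtree
|           Node *left = nullptr, *right = nullptr;
|       };
|
|       std::size_t size_of(const Node* n) {
|           return n ? n->size : 0;
|       }
|
|       // "element at index i" in O(height), no stored indexes
|       const Node* nth(const Node* root, std::size_t i) {
|           while (root) {
|               std::size_t left = size_of(root->left);
|               if (i < left)       root = root->left;
|               else if (i == left) return root;
|               else { i -= left + 1; root = root->right; }
|           }
|           return nullptr;          // out of range
|       }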
| WhitneyLand wrote:
| I wish the word vector didn't have so many different uses in
| science and engineering.
| layer8 wrote:
| That depends on the number of elements and the size of the
| payload data. Both approaches can be combined by using a linked
| list of arrays.
| marcosdumay wrote:
| Yes.
|
| There is a reason why it's hard to create either a bare
| linked list or a bare array in a modern language. It's because
| both are bad extremes, and the optimum algorithm is almost
| always some combination of them.
| andirk wrote:
| Understanding, and even sympathizing with, the machine's toil
| re: vectors vs manual linked list can separate a good engineer
| from a great one. We learn linked lists, assembly, and even
| machine code in Computer Science majors so that we know what's
| happening under the hood and can more easily surmise the
| runtime effects.
| kibwen wrote:
| All of these are indeed important, and will eventually (one
| hopes) result in the student realizing that linked lists are
| to be avoided because of their poor mechanical sympathy
| resulting from their awful cache locality.
|
| Valid uses for a linked list are extremely niche. Yes, you
| can name some, and I can name several valid uses for bloom
| filters. Just because a simple linked list is easy to
| implement does not mean it should be used, any more than
| bubble sort should be used for its simplicity.
| throwawaymaths wrote:
| Wait why do linked lists have bad cache locality? It
| depends on how your heap is set up. For example, you could
| have an allocator that gives relatively good locality by
| having high granularity and only bailing out if say your LL
| gets super long (so if your lists are usually short, they
| could have awesome locality)
| AstralStorm wrote:
| If you are in that place, you're probably using a linked
| list over small statically allocated memory, but you
| still need random ordering and fast remove or reorder.
| duskwuff wrote:
| That only works if you allocate lists all at once and
| never modify them.
|
| If there are any other allocations taking place at the
| same time -- say, allocations from other threads, or for
| other data you had to create while building the list,
| that locality is shot. Same goes if you insert more
| elements to the list later, or if you perform an
| operation which changes its order (like sorting it).
| wtetzner wrote:
| > That only works if you allocate lists all at once and
| never modify them.
|
| You can reorder and modify the contents of the list
| without losing locality.
| duskwuff wrote:
| I mean, if you modify the contents of the list nodes in
| place, sure. But at that point, you're just doing an
| array with extra steps.
|
| Changing any of the pointers, however, will make memory
| prefetch much less efficient. CPUs like linear memory
| access patterns; they're easy to predict. Chasing
| pointers is much less predictable, even if the pointers
| are all to the same region of memory.
| wtetzner wrote:
| It might still be cheaper than copying the values if
| they're large.
| cerved wrote:
| That doesn't seem to make much sense to me. LL are useful
| in cases where you deal with changing lists, where you'd
| be forced to allocate and deallocate memory often. If you
| know beforehand the exact or approximate size of the
| list and its contents you can store that contiguously in
| memory with greater cache locality because you're not
| storing pointers in addition to data. Seems to me the use
| case of LL is perhaps as ill-suited for cache locality as
| the structure itself.
| wtetzner wrote:
| They're also useful when you want to cheaply reorder
| nodes without copying the elements.
| cerved wrote:
| True. You could implement that using an array of structs
| where each struct contains indexes into the array to link
| to other structures, instead of memory pointers (which
| are 8 bytes in 64-bit applications). This way you get the
| nice properties of both linked lists and contiguously
| allocated arrays.
| CalChris wrote:
| If your list is short then you won't have a problem. But
| if your list is long then traversing it can be a major
| problem.
|
| For example, if your list item is page aligned like the
| _task_struct_ in the Linux kernel, rescheduling the next
| process could traverse the process list. The process list
| links will be at the same page offset and associate with
| a restricted set of the L1/L2/L3 cache lines and thrash
| from memory. Worse, the TLB will thrash as well. This was
| actually the case for Linux until about 2000.
|
| https://www.usenix.org/legacy/publications/library/procee
| din...
| mst wrote:
| http://trout.me.uk/lisp/vlist.pdf
| atemerev wrote:
| An order book in electronic trading.
|
| It is not too large (a few thousand entries usually),
| but it needs to be accessed and modified extremely fast.
| There can be insertions and deletions anywhere, but it is
| strongly biased towards front. It is ordered by price,
| either ascending or descending.
|
| Various "ordered queue" implementations are usually a poor
| fit, as there are many insertions and deletions in the
| middle, and they need to avoid any additional latency. The
| same goes for vectors. Red-black trees etc. are too complex
| (and slow) for such a small size.
|
| So we start with a humble linked list, and optimize it
| (unrolled, skip search etc). But we respect linked lists,
| they work.
| BeetleB wrote:
| > Understanding, and even sympathizing, with the machine's
| toil re: vectors vs manual linked list can separate a good
| engineer from a great one.
|
| I think your parent's point is that if you're a C++
| programmer, recognizing that vectors outperform other data
| structures for most use cases is what separates a good
| engineer from a great one. Often even for things that require
| lookup (i.e. vector outperforming a set).
|
| During code reviews, a senior C++ programmer in my company
| (formerly sat in the C++ standards committee) would always
| demand benchmarks from the developer if they used anything
| else. In most cases, using representative real world data,
| they discovered that vectors outperformed even when the
| theoretical complexity of the other data structure was
| better.
|
| A great engineer knows the limitation of theory.
|
| Of course, my comment is only regarding C++. No guarantees
| that vectors are great in other languages.
| asveikau wrote:
| > [V]ectors outperform other data structures ... even for
| things that require lookup (i.e. vector outperforming a
| set)
|
| This is something I've heard before (not often benchmarked)
| and it makes intuitive sense, and sometimes I've also told
| it to myself in order to not feel bad for writing O(n)
| where n is always small.
|
| But somehow, reading this snippet, I'm reminded that the
| vector does not maintain the semantic reminder that it's
| used in a key/value use case. So it has me thinking there
| should be a map/hashtable/dictionary interface that's
| really just a vector with linear lookup.
|
| Then I think, hmm, maybe the standard containers should
| just do that. Maybe the authors of those should have some
| heuristic of "when to get fancy".
| dabitude wrote:
| These are flat_map, flat_set, flat_hash_map, etc ...
|
| See: https://en.cppreference.com/w/cpp/header/flat_map
| usefulcat wrote:
| I believe that flat_map and flat_set at least are sorted,
| so that binary_search/lower_bound can be used for
| lookups.
|
| Linear search generally does outperform for small
| collections though (it's better for caches and the branch
| predictor).
| asveikau wrote:
| C++23! The future is here.
| bluGill wrote:
| Std::find_if works great if you want to treat a vector
| like a map. Plus you can have different keys if that is
| useful for your problem (look up employee by name, id, or
| address?)
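|
| Something like this (a sketch; the Employee type is
| hypothetical):
|
|       #include <algorithm>
|       #include <string>
|       #include <vector>
|
|       struct Employee {
|           int id;
|           std::string name;
|       };
|
|       // Linear scan as "map lookup": cache-friendly, and you
|       // can search by any field without a second index.
|       const Employee* by_name(const std::vector<Employee>& staff,
|                               const std::string& name) {
|           auto it = std::find_if(staff.begin(), staff.end(),
|               [&](const Employee& e) { return e.name == name; });
|           return it == staff.end() ? nullptr : &*it;
|       }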
| KerrAvon wrote:
| I think the cases where lookup is faster for a linearly
| scanned vector are where the hash computation overhead is
| greater than a linear scan. Assuming you have a good hash
| implementation, that should be true only for a small
| number of entries, where you need to determine "small" by
| measuring.
| asveikau wrote:
| I think it's about locality and quickness of compare.
|
| So if you have a small data structure and/or large cache,
| and the key is not complicated.
| bluGill wrote:
| Even if the key is cheap to compare, if it isn't in
| cache the vector will be faster just because we avoid
| the cache miss of loading the value from memory.
| GuB-42 wrote:
| > Often even for things that require lookup (i.e. vector
| outperforming a set)
|
| I am always uncomfortable using higher algorithmic
| complexity. Sure, for small data sets, the vector will
| outperform the set, but it may not stay small forever. I
| may waste a millisecond now, but at least, I won't waste an
| hour if the user decides to use 100x the amount of data I
| initially considered. And using much more data than
| expected is a very common thing. It is a robustness thing
| more than optimization.
|
| If you want both, there are staged algorithms, a typical
| example is a sort algorithm that goes insertion ->
| quicksort -> mergesort. Insertion sort is O(N^2), but it is
| great for small arrays, quicksort is O(N log N) on average
| but may become O(N^2) in some cases, and mergesort is
| guaranteed O(N log N) and parallelizable, but it is also
| the slowest for small arrays. For lists, that would be a
| list of vectors.
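|
| A minimal version of that dispatch (illustrative only;
| the cutoff is an assumption that would need measuring,
| and std::sort itself typically stages internally via
| introsort):
|
|       #include <algorithm>
|       #include <cstddef>
|       #include <utility>
|       #include <vector>
|
|       void staged_sort(std::vector<int>& a) {
|           const std::size_t kSmall = 16;  // assumed cutoff
|           if (a.size() <= kSmall) {
|               // insertion sort: O(N^2) but great for tiny N
|               for (std::size_t i = 1; i < a.size(); ++i)
|                   for (std::size_t j = i;
|                        j > 0 && a[j] < a[j - 1]; --j)
|                       std::swap(a[j], a[j - 1]);
|           } else {
|               std::sort(a.begin(), a.end());
|           }
|       }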
| jll29 wrote:
| I like the staging idea, but curiously I haven't seen any
| papers investigating this.
|
| Obviously, there is an overhead cost of the staging
| itself, which hits the use case for small data, and there
| is the question "From what data size onwards to use what
| algorithm?".
|
| For example, I just had a look at sort in Clang's STL
| implementation, which seems to set a boundary at N=1000.
| Since the number is a multiple of 10, it looks like it
| might be arbitrarily picked; in any case there are no
| paper references in the comments to back it up.
|
| Are people aware of formal studies that explore whether
| it is worth the trouble to inspect data structure and
| then to dispatch to different algorithms based on the
| findings?
| the_sleaze_ wrote:
| You're saying you think there's still optimization to be
| done re. the N=1000 constant?
|
| Other than that, I'd suggest that big O complexity is
| already telling you that the inspection is worth it.
| Assuming inspection of an array is constant.
| BeetleB wrote:
| > Sure, for small data sets, the vector will outperform
| the set, but it may not stay small forever.
|
| In over 90% of SW applications written in C++, there is
| little to no uncertainty on the size of the data set.
| Most applications are written for a very specific
| purpose, with fairly well known needs.
|
| If my data set is size N, and I can benchmark and show
| the vector outperforms the theoretical option up to, say,
| 20N, it's a safe choice to go with vector. Doing these
| analyses _is_ engineering.
|
| In any case, it's almost trivial to swap out the vector
| with other options when needed, if you plan for it.
| zelphirkalt wrote:
| I hope that senior engineer is also the one person who
| will refactor things if ever there are more elements in
| any of those vectors, making linear search slower than an
| actual set or other data structure. Oh and of course that
| senior engineer will also hopefully be the one monitoring
| the lengths of all those vectors in production.
| josefx wrote:
| > vectors, making linear search slower
|
| The standard library gives you plenty of tools to
| maintain and search in a sorted vector. Often enough
| still faster than maps.
|
| > Oh and of course that senior engineer will also
| hopefully be the one monitoring the lengths of all those
| vectors in production.
|
| Doesn't a std::map have more space overhead than a
| vector?
| BeetleB wrote:
| I believe he was referring to the number of elements in a
| vector, to know when to swap with a different data
| structure - not the size in bytes.
|
| But to your point - I once needed to do a key/value
| lookup for a large data set, and so _of course_ I used a
| map. Then I ran out of RAM (had 80GB of it too!). So I
| switched to a vector and pre-sorted it and used
| binary_search for lookups (data set was static - populate
| once and the rest of the codebase didn't modify it). And
| of course, the RAM consumption was less than half that of
| the map.
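|
| The shape of that replacement, roughly (a sketch):
|
|       #include <algorithm>
|       #include <cstdint>
|       #include <utility>
|       #include <vector>
|
|       using KV = std::pair<std::uint64_t, std::uint64_t>;
|
|       // Populate once, sort once; then every lookup is a
|       // binary search over plain contiguous pairs - no
|       // per-node overhead like a std::map.
|       const std::uint64_t* lookup(const std::vector<KV>& data,
|                                   std::uint64_t key) {
|           auto it = std::lower_bound(
|               data.begin(), data.end(), key,
|               [](const KV& kv, std::uint64_t k) {
|                   return kv.first < k;
|               });
|           return (it != data.end() && it->first == key)
|                      ? &it->second : nullptr;
|       }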
| sfink wrote:
| Just describe your sorted list as "a balanced binary tree
| with implicit child links" and then you get credit for
| being fast, space-efficient, _and_ using a fancy data
| structure.
| sqrt_1 wrote:
| Here is a performance graph on removing items from a vector vs
| list - slightly old (2017)
| https://github.com/dtrebilco/Taren/blob/master/Articles/iter...
|
| From the article
| https://github.com/dtrebilco/Taren/blob/master/Articles/Eras...
| devmunchies wrote:
| That is removing an item at index N, right? (that's why it's
| talking about iterators?) Linked lists are only fast at
| append/pop with the first node.
| hintymad wrote:
| Isn't the linked list the fundamental data structure for LISP?
| The sweet memory of cons() this and cdr() that, all backed by
| linked lists?
| tialaramex wrote:
| So, two things. One: 1980s machines don't have the memory
| access performance you see today. If N list chase operations
| take the same time as N sequential reads then the linked
| list has the nice properties you were probably taught in
| school, and an actual in-memory representation of a linked
| list is a good choice. On today's machines that list chase might
| incur a sizeable cache stall, making it thousands or even
| tens of thousands of times slower.
|
| Two: Lisp doesn't actually care whether there are literally
| linked lists behind everything. Today you would use a
| different data structure reflecting the hardware you have.
| pjmlp wrote:
| Not only today, since the mid-70's, yet somehow many keep
| thinking Lisp only has lists.
| lispm wrote:
| > 1980s machines don't have the memory access performance
| you see today
|
| They had similar problems. CPU with tiny caches <-> caches
| <-> expensive RAM in the range from 500kbytes to a few MB
| <-> virtual memory paging to slow disks.
|
| For example a typical Lisp Machine might have had 20 MB
| RAM. But the Lisp image it ran was probably already much
| larger. Thus paging spaces upwards of 60 megabytes were
| not uncommon. I had a Lisp Machine with 40 MB RAM and have
| used > 200 MB paging space. Disks were very slow. We are
| talking about ESDI (2.5 Mbyte/sec or less) interfaces or
| later 5-10 Mbyte/sec SCSI 1 and 2.
|
| I had a 600MB ESDI disk inside a 1 MIPS Lisp Machine with 8
| Megawords RAM of 36bit memory + 8 bit ECC.
|
| Thus locality played an extremely large role for usable
| performance. In the early days machines had to be rebooted
| when they ran out of memory, since a garbage collection
| could take a long time. Rebooting a machine was just a few
| minutes. Doing a full GC over 200 MB virtual memory could
| take half an hour.
|
| When I was making a new Lisp image (called a world), the
| size was upwards of 50MB. 100 MB was common. A special command
| ran for roughly 30 minutes and reordered the objects in
| main memory to improve locality. Then another command saved
| the image - which also took tens of minutes.
|
| A big breakthrough in usability came with the introduction
| of the Ephemeral Garbage Collector, which only touched RAM
| and took care of the short lived objects, with some
| hardware support to identify and track RAM pages with
| changed content.
|
| Features back then were:
|
| * cdr coded lists which were allocated like vectors
|
| * lots of other data structures like vectors, n-dimensional
| arrays, hashtables, records, objects, ...
|
| * an ephemeral garbage collector with hardware support
| tracking changes in RAM pages
|
| * incremental garbage collection
|
| * a copying/compacting generational garbage collector with
| type sorted memory regions
|
| * cooperation between the garbage collector and the virtual
| memory pager
|
| * various manual or semi-manual memory management
| facilities
|
| * incremental memory image saves
|
| The main reason to develop Lisp Machines in the end 70s was
| to get Lisp development off of time-shared computers (with
| limited shared RAM and virtual memory) onto single user
| workstations, where RAM and virtual memory is not shared
| between different users.
|
| The same problem appeared then on UNIX machines, where Lisp
| systems often were among the most memory hungry programs ->
| thus they needed lots of RAM, which was expensive. Thus a
| lot of virtual memory was used. But access to Lisp objects
| in virtual memory was much slower than Lisp objects in RAM.
| It took many years to have competitive GCs on those
| machines.
|
| When RAM got more affordable and larger, things improved.
| pjmlp wrote:
| No, only if you are speaking about the first implementations
| of LISP until LISP 1.5 and similar timeframe.
|
| By the time Xerox, MIT, TI started delving into Lisp
| machines, the language had already gotten support for stack
| allocation, value types, arrays, structures (records).
|
| Same applies to its big brother Scheme.
| ww520 wrote:
| Cons can build a tree, though it certainly can also build a list.
|
| (a1 (b1 (c1 c2 c3) b2 (c4 c5)) a2 a3) is a tree.
| sgtnoodle wrote:
| I often write code destined to run on a microcontroller, and
| linked lists are great for avoiding dynamic memory. It's nice
| to be able to statically or transiently allocate storage
| without having to bound the size of anything.
|
| Of course, the same unintuitive tradeoffs tend to apply even if
| there isn't much caching to worry about. A trivial linear
| iteration is often faster than a more elaborate algorithm when
| the input is small.
| bitexploder wrote:
| Carmack is a fan of this approach. Allocate some reasonably
| large number as your design constraint, and if you start
| approaching it, that is the time to ask why and whether this
| data structure is still reasonable here.
| KerrAvon wrote:
| A memory-constrained microcontroller is actually one of the
| valid exceptions to the rule.
| Sirened wrote:
| Well, yes, and a lot of people treat kernel development as
| if they were developing for a constrained microcontroller.
| I don't think this is even a bad thing, we want the kernel
| to be as lean as possible because every page and cycle the
| kernel burns is one that users don't get to use.
| jstimpfle wrote:
| I suppose a more important reason is the "not have to
| bound anything" part. OS kernels almost definition don't
| know how many objects will be created. Implementing
| queues as vectors of pointers will require dynamic
| allocation and/or introduce many more failure points when
| going out of resources. With linked lists, once you have
| an object you can always append it to a linked-list
| queue.
| Dylan16807 wrote:
| This is a memory vs. time tradeoff. And the time cost
| gets worse as the machine gets bigger.
|
| Designing for embedded is not the same thing as being
| lean.
| rcxdude wrote:
| It's also an area where the costs are a lot less
| significant either: a cortex m4 and below often doesn't
| have a cache (at least for RAM: flash often has a cache and
| predictive lookup, but the performance difference is small
| enough that the cache can basically render executing code
| from flash as identical to RAM), and the RAM is about as
| fast as the processor, so random access and sequential
| access are pretty similar performance wise. (And with the
| speed of modern microcontroller cores the size of the data
| is rarely even an issue: most STM32s can iterate through
| their entire RAM in under a millisecond).
|
| That said, in my embedded work I still rarely find a linked
| list to be a useful data structure (but I know FreeRTOS,
| which I use a lot, uses them extremely heavily internally).
| G3rn0ti wrote:
| That's why the Amiga OS (Exec) used linked lists as its
| primary data structure. It was one of the first micro
| computer operating systems using dynamic memory mapping (i.e.
| no fixed addresses for OS calls and data structures) and
| supported multi tasking. Any object you received from Exec
| basically was a pointer to a list node. And, yes, the Amiga's
| biggest problem was memory fragmentation and leakage because
| unreachable linked list nodes would accumulate
| whenever working with a buggy program.
| dmitrygr wrote:
| Modern CPUs detect patterns of pointer chasing common in linked
| list use and prefetch to cache just fine. Your comment would
| have been valid a decade or more ago.
|
| And _plenty_ of use cases are much better with linked lists
| than resizable vectors. Eg: queues.
| vbezhenar wrote:
| RAM latency is huge. No amount of prefetching can fix that.
| If anything, prefetching works better with arrays as RAM
| reads multiple bytes at once.
| xxs wrote:
| Queues are best implemented with a pow2 cyclic array, and you
| have a free deque, too.
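|
| A sketch of that layout (the capacity must be a power of
| two so the mask replaces the modulo; unsigned wraparound
| keeps tail - head correct):
|
|       #include <cstddef>
|       #include <vector>
|
|       template <typename T>
|       class RingQueue {
|           std::vector<T> buf;
|           std::size_t mask, head = 0, tail = 0;
|       public:
|           explicit RingQueue(std::size_t pow2_capacity)
|               : buf(pow2_capacity), mask(pow2_capacity - 1) {}
|
|           bool push(const T& v) {          // enqueue at tail
|               if (tail - head == buf.size()) return false;
|               buf[tail++ & mask] = v;
|               return true;
|           }
|           bool pop(T& out) {               // dequeue at head
|               if (tail == head) return false;
|               out = buf[head++ & mask];
|               return true;
|           }
|       };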
| attractivechaos wrote:
| A ring-buffer based queue is mostly better than a linked
| list but it is not necessarily the best. The typical STL
| deque implementation [1], with all its complexity, is
| usually faster than a ring buffer.
|
| [1] https://stackoverflow.com/questions/6292332/what-
| really-is-a...
| tylerhou wrote:
| I have some experience using/modifying linked-list benchmarks
| (https://github.com/google/multichase) specifically to test
| memory latency.
|
| It is extremely difficult, maybe impossible, to design a
| prefetcher that can predict the next cacheline(s) to prefetch
| while traversing in a linked-list. I am not aware of a single
| CPU that can do this consistently. For instance, if you run
| multichase (a linked-list chaser) on GCP servers, you
| generally get the expected memory latency (~70-100ns,
| depending on the platform).
| AstralStorm wrote:
| Why let the CPU guess when you can tell it what you want?
| (Prefetch a small arena of objects linked to by the list.)
| gpderetta wrote:
| Even with a perfect prefetcher a linked list would still be
| slower to iterate than a vector as it is inherently serial
| (unless your CPU does address prediction and I don't think
| any CPU does).
| Borealid wrote:
| I think there's more to cache locality than prefetching. In
| an array, consecutive elements will usually be adjacent in
| physical memory (excepting page boundaries), so each cache
| line load where a cache line is larger than the array element
| will "overfetch" into the next element. This should mean
| fewer total cache loads and thus more efficient use of memory
| bandwidth for situations where it is constrained (such as
| walking a list).
| AstralStorm wrote:
| Then again, if you have control over allocations you can
| prefetch whole chunks of data to which the list refers
| making this point moot.
|
| Often lists refer to arrays or sections of memory. The
| performance loss if any appears in bigger structures where
| you do want to have explicit control anyway.
| howinteresting wrote:
| For most use cases a ring buffer either with or without a
| maximum length will do much better as a queue than a linked
| list will.
|
| There are some niche use cases where linked lists are good.
| Lock-free queues are one, and another big set of use cases is
| where you need hard O(1) insert even with a high constant
| factor, not amortized O(1).
| anonymoushn wrote:
| High-performance SPSC queues are always implemented with an
| array.
|
| If you need hard O(1) insert you can usually just allocate
| an array much bigger than you will need. The combination of
| hard performance requirements and willingness to wait for
| an allocator all the time is unusual.
| gamegoblin wrote:
| The place I see linked lists pop up the most is when
| implementing LRU caches. You need a linked hashmap of sorts
| where the hashmap gives O(1) lookup to the linked list node
| in question, and then you can extract that node and move it
| to the head of the queue in O(1) as well.
|
| This is a special case of where they also appear: linked
| hashmap implementations, where you want a hashmap that has
| a consistent iteration order.
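|
| The standard shape of that structure (a sketch, assuming
| capacity >= 1; std::list::splice is the O(1) node move):
|
|       #include <cstddef>
|       #include <list>
|       #include <unordered_map>
|       #include <utility>
|
|       template <typename K, typename V>
|       class LruCache {
|           using It =
|               typename std::list<std::pair<K, V>>::iterator;
|           std::size_t cap;                  // assumed >= 1
|           std::list<std::pair<K, V>> order; // front = most recent
|           std::unordered_map<K, It> index;  // key -> list node
|       public:
|           explicit LruCache(std::size_t capacity)
|               : cap(capacity) {}
|
|           V* get(const K& key) {
|               auto it = index.find(key);
|               if (it == index.end()) return nullptr;
|               // O(1) move-to-front: relinks, copies nothing
|               order.splice(order.begin(), order, it->second);
|               return &it->second->second;
|           }
|
|           void put(const K& key, V value) {
|               if (V* v = get(key)) {
|                   *v = std::move(value);
|                   return;
|               }
|               if (order.size() == cap) {    // evict least recent
|                   index.erase(order.back().first);
|                   order.pop_back();
|               }
|               order.emplace_front(key, std::move(value));
|               index[key] = order.begin();
|           }
|       };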
| xxs wrote:
| Having linked nodes of a hashmap is just that - I'd not
| call it a linked list. You can do moving median with a
| binary tree and linked nodes, too.
| josephg wrote:
| I've seen them show up in lots of situations through
| intrusive lists, which I think is the general form of the
| example you're giving.
|
| Intrusive lists are used in the Linux kernel, in physics
| engines (eg Chipmunk2d) and game engines. The way Redis
| uses them for TTL expiry follows the same pattern.
| xxs wrote:
| >where you need hard O(1) insert even with a high constant
| factor
|
| You will need a truly concurrent memory allocator for that,
| too... or if you are just the kernel (one of the reasons
| the linux kernel can benefit from linked lists)
|
| Overall, allocating new objects/memory is not guaranteed
| O(1).
| spijdar wrote:
| There are a lot of factors involved, but given a limited
| number of _cache lines_ that can be stored in low level
| cache, I would think there'd be at least some performance
| penalty for prefetching from non-linear blocks of memory vs
| the inherent spatial locality in an array/vector, if only
| because it'd cause more old cache to be evicted.
| AstralStorm wrote:
| Or the opposite, prefetching a whole vector when you just
| need one item, evicting more than needed unnecessarily.
|
| There is a reason why non-embedded CPUs have sizable L2
| caches now.
| colinmhayes wrote:
| Queues are generally implemented with a vector of
| pointers to vectors. Nicely diagrammed here
| https://stackoverflow.com/questions/6292332/what-really-
| is-a... Ring buffers also work better than linked lists
| waynesonfire wrote:
| Are Erlang and the JDK general enough for you? Oh, both use
| linked lists:
|
| https://github.com/erlang/otp/blob/master/lib/stdlib/src/qu
| e... https://github.com/openjdk-
| mirror/jdk7u-jdk/blob/master/src/...
| xxs wrote:
| why would you refer to LinkedBlockingQueue? You have
| ArrayBlockingQueue, too. ArrayDeque is just better than
| LinkedList.
| [deleted]
| anonymoushn wrote:
| Even if your linked list is backed by an array and all the
| elements are right next to each other, computing some cheap
| function of the elements of a linked list is incredibly slow
| compared to the alternative because of the unnecessary and
| very long dependency chain.
|
| Vectors are basically perfect for queues.
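|
| The dependency chain in a nutshell (a sketch):
|
|       #include <cstddef>
|
|       struct Node { int value; Node* next; };
|
|       long sum_list(const Node* n) {
|           long s = 0;
|           // each next address is only known after the
|           // previous load completes: a serial chain
|           for (; n; n = n->next) s += n->value;
|           return s;
|       }
|
|       long sum_array(const int* a, std::size_t n) {
|           long s = 0;
|           // addresses known up front: loads overlap,
|           // and the loop can be vectorized
|           for (std::size_t i = 0; i < n; ++i) s += a[i];
|           return s;
|       }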
| optymizer wrote:
| I'm going to call you out on this one, because it's a bold
| claim, and I'd love to see an explanation and some perf
| numbers.
|
| For example, I'm wondering how the CPU knows what the next
| item in the list to prefetch it. Unlike the next item in an
| array, list items could be pointing anywhere in process
| memory.
|
| Also, what counts as a modern CPU in this context? Are we
| talking latest generation desktop CPUs from Intel, or
| anything after Core 2 Duo? How about mobile devices with ARM
| CPUs? Would those just be ignored?
|
| Is there a benchmark?
| dmitrygr wrote:
| Feast your eyes on the "Intel(r) 64 and IA-32 Architectures
| Optimization Reference Manual": https://cdrdv2-public.intel
| .com/671488/248966_software_optim... (newer link, as
| requested by one of the child comments)
|
| It details the prefetchers, how they work, what patterns
| they recognize (section 3.7)
| NavinF wrote:
| Prefetching just one element ahead does fuck-all when ram
| is 100x slower than cache especially if you need to
| precharge and activate a new row which is always the case
| when you're spraying tiny objects all over the place.
| AstralStorm wrote:
| So don't spray tiny objects around. :) Keep them nicely
| tucked away in an arena.
|
| Vectors actually tend to create a spray of tiny
| objects... As opposed to a true dynamic array which has
| limitations in resize. If you're lucky, they will keep a
| single dynamic array per vector. You're usually not this
| lucky with bigger data.
| josephg wrote:
| If you never free your objects, you may as well use an
| array anyway - it'll be much more simple. And if you use
| an arena with an object pool, you're back in cache miss
| territory.
|
| Vectors only "create a spray of tiny objects" if you have
| a vector-of-references or something like that. And even
| then, reading data requires 2 reads from memory (1 to
| read the pointer in the vector, and another to read the
| object). Finding item N in a linked list will require N
| reads (since you need to read each item before the
| target). Yes, resizing a vector is O(n). But every time I
| measure memcpy, it always goes faster than I expect.
|
| If you disagree, try to prove it with code. At least one
| of us will learn something that way.
| insanitybit wrote:
| This is a great document that will absolutely confirm, in
| a number of ways, that linked lists are going to be
| terrible for prefetching.
| AstralStorm wrote:
| That depends more on what they link to. Not prefetching a
| worthless list of pointers when you could prefetch data
| instead is typically faster. Prefetchers have limited
| capabilities and capacity for loads.
| insanitybit wrote:
| The prefetcher doesn't even have to get involved for
| linear sequences, it would only get involved after a
| cache miss threshold is met. The fastest and simplest
| prefetcher, L1, is stream based and will work great for
| sequential data.
| [deleted]
| josephg wrote:
| It doesn't claim linked lists are fast. And it doesn't
| have any benchmarking data.
|
| Actually the only reference to linked lists I see is in
| the prefetcher section (3.7.3), where it explicitly
| recommends that consecutive items are stored in a single
| 4-KByte page or in consecutive cache lines. They give a
| positive code example using an array, like other
| commenters are suggesting.
| fuckstick wrote:
| It's a link to a document on architectures well over a
| decade old - the most recent mentioned is the original
| Core architecture (from 2008). Honestly, based on your
| first comment I thought you were implying something about
| content dependent prefetch or other techniques that I am
| familiar with from the academic literature but unaware of ever
| being used in mainstream hardware.
|
| > * Organize the data so consecutive accesses can usually
| > be found in the same 4-KByte page.
| > * Access the data in constant strides forward or
| > backward (IP Prefetcher).
|
| > Method 2:
| > * Organize the data in consecutive lines.
| > * Access the data in increasing addresses, in sequential
| > cache lines.
|
| Nothing new here that contradicts the GPs skepticism.
| Certainly not enough evidence for you to be a dick.
| dmitrygr wrote:
| yup, was a link to an old document with the same title. updated.
| thanks.
| fuckstick wrote:
| Ok, but the hardware data prefetch functionality seems
| basically unchanged:
|
| "Characteristics of the hardware prefetcher are: * It
| requires some regularity in the data access patterns. --
| If a data access pattern has constant stride, hardware
| prefetching is effective if the access stride is less
| than half of the trigger distance of hardware prefetcher.
| -- If the access stride is not constant, the automatic
| hardware prefetcher can mask memory latency if the
| strides of two successive cache misses are less than the
| trigger threshold distance (small- stride memory
| traffic). -- The automatic hardware prefetcher is most
| effective if the strides of two successive cache misses
| remain less than the trigger threshold distance and close
| to 64 bytes."
| forrestthewoods wrote:
| My benchmarks say you're wrong.
|
| https://www.forrestthewoods.com/blog/memory-bandwidth-
| napkin...
| dmitrygr wrote:
| You're joking right?
|
| "I made an extra test that wraps matrix4x4 in
| std::unique_ptr."
|
| And that is your test? One piece of code? Not even
| presenting disassembly? Who the hell knows what your
| compiler wrought in response to your "std::unique_ptr"
| and "matrix4x4" ?
| forrestthewoods wrote:
| Well I've provided 1 benchmark to your 0. I'd say I'm
| ahead.
|
| My code benchmark is actually far more pre-fetch friendly
| than a LinkedList because it can prefetch more than one
| "node" at a time. In a LinkedList you can't prefetch N+1
| and N+2 at the same time because N+2 is dependent on the
| result of N+1.
|
| I'm always open to learning new things. Modern compiler
| magic is deep and dark. If sometimes a compiler magically
| makes it fast and sometimes slow that'd be quite
| interesting! If you have any concrete examples I'd love
| to read them.
| attractivechaos wrote:
| Sorry for my ignorance - linked list matching the
| performance of vector is something new to me. I would
| like to learn more. The best way to prove your point is
| to show us a benchmark. However, I couldn't find one with
| a quick google search.
| porcoda wrote:
| Google for Intel hardware prefetcher and you'll turn up
| some info.
|
| Re: benchmarks. Not likely exactly what you're looking for,
| but the GigaUpdates Per Second benchmark
| (https://en.wikipedia.org/wiki/Giga-updates_per_second) is
| sort of the most-pathological-case of pointer jumping where
| the jumps are random. Prefetchers don't do well with that
| (for obvious reasons) - they tend to work best when there
| is some pattern to the accesses. Linked data structures
| often live somewhere in between - they may start with good,
| regular stride patterns, but then over time as elements are
| added and moved around, it turns into a tangle of
| spaghetti.
|
| Edit: I should add more. While prefetchers do exist in
| hardware, they're a bit of a murky subject. Sometimes they
| speed things up, sometimes they do so bad they slow
| everything down. They can be very workload dependent. And
| to top it off, hardware vendors can sometimes be a bit
| opaque when explaining what their prefetching hardware
| actually does well on. It's probably safest to not assume
| your hardware will help you when it comes to tangled
| pointer jumping code, and if you can avoid it via a
| different data structure it's probably good to do so. That
| said, other data structures may entail other tradeoffs -
| better cache usage for lookups, but potentially more
| expensive insertions. As usual with data structures, its a
| game of balancing tradeoffs.
| insanitybit wrote:
| I don't buy this.
|
| https://www.youtube.com/watch?v=WDIkqP4JbkE
|
| Prefetching in CPUs predates this talk.
|
| https://www.youtube.com/watch?v=Nz9SiF0QVKY
|
| That talk is 2018.
|
| Cache prefetching works best for linear accesses like
| with a vector, not for random pointer lookups.
| Prefetchers are also going to have an extra harder time
| with double indirection since they're unable to see which
| value is going to be fed to the lea.
|
| Prefetching is also expensive in and of itself, the CPU
| doesn't do it unless you're already incurring cache misses.
| This makes cache misses _even worse_. There are also
| multiple prefetchers in the CPU and _the L1 prefetcher_
| is very basic and is only going to help with contiguous
| accesses. Your allocator might be doing some nice things
| for you that let one of the L2 prefetchers help; I
| would expect that you'll see better L2 performance due
| to prefetching.
|
| If you have an example of a linked list being as fast as
| a vector for iteration, please do show it.
| porcoda wrote:
| ... I didn't say it was for random pointer lookups.
| That's the pathological case that was the primary reason
| the Cray X-MT (formerly Tera MTA) was invented. As I
| said, when you have pointers with some regularity (hence
| my reference to regular or patterned striding), then you
| _may_ get some benefit of prefetching when you 're not
| just dealing with linear contiguous access. The "may" is
| important there - whether you do or don't get it is
| workload / data dependent. As I also said, more likely
| than not over time the data structure will lose
| regularity as pointers get re-wired, and the benefits of
| prefetching go out the window.
|
| I'm not sure what you "don't buy" - I didn't say
| definitively that prefetching does or does not always
| help. Sometimes it does, and when it does, it's highly
| dependent on the regularity of the way the pointers cause
| you to stride through memory. I even basically agreed
| with you -- to quote me, "It's probably safest to not
| assume your hardware will help you when it comes to
| tangled pointer jumping code". You appear to want to
| argue for some reason.
| insanitybit wrote:
| > As I said, when you have pointers with some regularity
| (hence my reference to regular or patterned striding),
| then you may get some benefit of prefetching when you're
| not just dealing with linear contiguous access.
|
| I would need a source to believe this. The only way I can
| imagine this working is that the L2 prefetcher might
| incidentally work well with your allocator - your
| allocator might try to hand out more or less colocated or
| evenly strided data.
|
| > I'm not sure what you "don't buy"
|
| This statement, primarily:
|
| > Modern CPUs detect patterns of pointer chasing common
| in linked list use and prefetch to cache just fine
|
| To me, this statement reads as:
|
| 1. Prefetchers in CPUs can prefetch data by examining
| loads that have yet to be executed
|
| 2. That they do this well, or at least that's how I read
| "just fine"
|
| _To my knowledge_ neither of those is true.
|
| > You appear to want to argue for some reason.
|
| I'm not trying to argue at all, I'm trying to learn. What
| you're saying does not sound right to me but it would be
| extremely interesting to me if I learned that CPUs are
| doing something that I'm not aware of.
|
| If you're actually saying that an allocator may end up
| striding allocations such that the prefetcher can
| reliably predict where to load next, sure I'll buy that,
| at least for L2. That's a fine thing to say, it just
| isn't what I took away from your initial statement, which
| is what I wanted to drill into because, again, it would
| be very interesting to me.
|
| Sorry if I'm coming off as argumentative, I was rushing
| that previous comment, which isn't really fair to put on
| you. I _genuinely_ just am curious and if this is a case
| of miscommunication, no worries at all. I'd just be
| remiss if I didn't try to suss out if I'm wrong.
| nice_byte wrote:
| how do linked lists fit the regularity requirement? the
| only regularity in this pattern of access is the offset
| at which the next address to fetch is stored. this seems
| like a nontrivial amount of logic and state for the
| prefetcher to keep track of, unlike something like base *
| scale + offset when e.g. traversing a 2d array.
| monocasa wrote:
| It is more complex for hardware to keep track of for
| sure. It's been a feature of high performance cores for
| more than a decade though. Kernels tend to be based off
| of linked lists, so it's a massive win typically.
| optymizer wrote:
| Look, people who are excited about this and just want to
| learn are asking for proof. You can't just make a claim
| like "it's been a feature of high performance for more a
| decade" and provide 0 evidence for the claim, because
| it's not common knowledge and explicitly contradicts what
| is commonly taught.
|
| I can time how long it takes to iterate through an array
| and show that it is faster than iterating through a
| linked list with the same number of elements.
|
| Now, what optimization would I have to implement to end
| up with a program that iterates through this linked list
| faster than the equivalent array?
| fuckstick wrote:
| A bunch of people keep saying this with no substantiation
| - the best we saw was a link to the latest Intel
| optimization manual that quite specifically refutes this
| claim. I even quoted it upthread.
|
| Do you have anything substantial to add to that?
| josephg wrote:
| I'm sure CPUs will try their best to optimize these usage
| patterns. But I suspect vector / array based approaches
| will still always outcompete linked lists in actual
| benchmarks.
|
| Thats always been my experience of them.
|
| Linked lists still show up in the kernel, in redis, in
| game engines and so on because you can't build fast
| intrusive collections using arrays.
|
| Making linked lists "as fast as we can make them" doesn't
| mean they're competitive with a vector.
| Sirened wrote:
| Apple's M1 has a data dependent prefetcher (superset form
| of pointer chase prefetcher), as discovered by the
| researchers behind the Augury paper [1]. They discuss how
| it works, how and when it activates, as well as a couple
| other cool details which should more than answer your
| curiosity. They don't have benchmarks for performance, but
| these sorts of prefetchers do exist (and possible in
| hardware you're using right now!).
|
| [1] https://www.prefetchers.info/augury.pdf
| tylerhou wrote:
| The prefetcher described in the paper implemented in the
| M1 is nowhere near a prefetcher that would be able to
| prefetched linked-list nodes.
|
| > We tested for the existence of four DMPs: both single-
| and two-level versions of pointer-chasing and
| indirection-based DMPs (Section IV). Our findings show
| the existence of a single-level pointer-chasing DMP....
|
| When activated, the prefetcher described will also issue
| loads for pointers in a _contiguous_ array of pointers.
| In particular, the pointer/addresses are already known
| by the core (because they themselves were presumably
| prefetched far ahead of time by the stream/stride
| prefetcher), so the core knows which addresses to
| prefetch. But in a linked-list, the address of the next
| node is not known by the core until _after_ the node is
| retrieved from memory. I.e. there is an additional data
| dependency, and it takes a round trip to memory to
| resolve that data dependency.
|
| It's the difference between prefetching B's in
|
|       A0 A1 A2 A3 A4
|        v  v  v  v  v
|       B0 B1 B2 B3 B4
|
| and prefetching B, C, D, E in
|
|       A -> B -> C -> D -> E
|
| The former is much easier to do than the latter as 1) the
| A's are easy to prefetch, and 2) the B's can be fetched
| in parallel. In the second, the core has to wait until B
| resolves before it can fetch C, and it has to wait until
| C resolves before it can fetch D, etc.
| ShredKazoo wrote:
| Has there been any work on nudging the allocator towards
| putting all the nodes of the linked list on the same page? Or
| close together in memory more generally (to leverage the L1 &
| L2 caches)?
| tsimionescu wrote:
| If you're allocating the entire linked list at once, that may
| well happen in practice. Otherwise, there is no good way it
| can happen automatically; but, compacting GCs will usually do
| this for you (assuming that the list elements are only
| reachable from the previous element).
| ShredKazoo wrote:
| Interesting thanks!
| theCodeStig wrote:
| Assuming Rust or another systems language.
|
| Linked lists are optimal for head access; not random access.
| For random access, yes a vector would be better.
| waynesonfire wrote:
| andrewmcwatters wrote:
| Would you like to explain to the class?
| [deleted]
| klysm wrote:
| Care to demonstrate why? I have the same experience
| waynesonfire wrote:
| the linked list has a wonderful capability that's
| frequently seen in C where you can do O(1) removal from a
| list. E.g. you can have an "object" detach itself. This is
| not possible in Java's standard linked list implementation,
| among other languages.
|
| in general, inserting and removing elements from a vector
| requires memory copying.
|
| There are programming languages that have linked lists as
| their fundamental data structure, like Lisp and Erlang.
|
| For situations where I don't need random access, linked
| lists are hard to beat.
|
| Linked lists also make wonderful immutable data structures.
|
| linked lists have wonderful filter and flatmap performance.
| how much memory copying would a vector require?
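|
| That detach pattern, sketched as an intrusive node
| (hypothetical names; the links live inside the object
| itself, so removal needs no traversal and no allocator):
|
|       struct Link {
|           Link* prev = this;
|           Link* next = this;   // self-linked == not on any list
|
|           void detach() {      // O(1), safe to call repeatedly
|               prev->next = next;
|               next->prev = prev;
|               prev = next = this;
|           }
|
|           void insert_after(Link& where) {  // re-attach elsewhere
|               detach();
|               prev = &where;
|               next = where.next;
|               where.next->prev = this;
|               where.next = this;
|           }
|       };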
| klysm wrote:
| Yes, but in practice the coefficients on the O() almost
| always end up working in the dense array's favor because
| of memory locality.
| suremarc wrote:
| This is assuming you already have a pointer to the
| element you're removing, or a neighbor thereof.
| Otherwise, you'd have to traverse a portion of the linked
| list, which has a significant amount of per-element
| overhead.
| sakras wrote:
| The problem with your analysis is that you're using big-O
| notation. Big-O notation is based on an abstract machine
| that is pretty unrepresentative of real computers. In
| practice, the memcpy is always going to be faster (until
| you get to really big sizes). Put another way, linked
| list is O(1) with a massive constant, and memcpy is O(n)
| with a tiny constant.
|
| And actually this logic extends to other algorithms too.
| My rule of thumb is that a linear search (despite being
| O(n)) will be faster up to around n=100 than binary
| search (which is O(log n)) just due to the way the CPU
| operates.
| zabzonk wrote:
| yes, he does: linked lists are, except in a few specific
| cases, the wrong solution
| [deleted]
| koheripbal wrote:
| This is not a substantive comment.
| dragontamer wrote:
| > regular resizable vector
|
| At gross complexity / fragmentation cost to your memory allocator.
|
| Linked Lists are definitely easier to use if you're ever in a
| position where you're writing your own malloc(). The
| infrastructure needed to cleanly resize arrays (and also: the
| pointers all going bad as you do so) has a lot of faults IMO.
|
| EDIT: In particular, linked list nodes have a fixed size, so
| a hand-written malloc() implementation for them is extremely simple.
| taneq wrote:
| More complexity than allocating/deallocating a struct for
| each link in the list?
|
| And the solution to "pointers to elements of a vector going
| bad" is don't save pointers to elements of a vector.
| ay wrote:
| Doubling the vector size each time it needs to be enlarged
| takes care of fragmentation/memory overhead, in practice. Or
| preallocating if you know the rough size.
|
| It does take some discipline to avoid the storage of
| pointers, but once you get used to that it's quite fine.
|
| Source: I work on VPP, which uses vectors quite extensively.
| saagarjha wrote:
| Usually you want a smaller growth factor than that to allow
| for more reuse. (Also: what's VPP?)
| ay wrote:
| I use doubling because it's simple to reason about and
| hard to screw up the math.
|
| What kind of growth factors heuristics worked for you the
| best ?
|
| VPP: rather fast user mode dataplane. https://fd.io/ is
| the "marketing" site. https://wiki.fd.io/view/VPP is the
| "less flashy" dev wiki.
| sfink wrote:
| I usually use doubling, but I just recently landed a
| patch to expand by a factor of 8 instead, which sped up a
| microbenchmark (of constructing a JSON string, fwiw) by
| 50%.
|
| Someone saw in a profile that we were spending a lot of
| time in realloc in a task that involved building a string
| in contiguous memory. But the final string was realloc'd
| down to its actual size, so it was safe to get a lot more
| aggressive with the growth to avoid copying. The overhead
| was only temporary. It turns out that powers of 8 get big
| fast, so there are now many fewer copies and the few that
| happen are copying a small portion of the overall data,
| before a final octupling that provided mostly unused
| space that didn't cost much.
|
| Know your problem, I guess?
| ay wrote:
| Very interesting. And indeed, especially sitting next to
| the very interesting "golden ratio" comment, this
| shows that there is more than one definition of "best"
| depending on the task, and that one always needs to
| verify their hunch by actual profiling.
| Kranar wrote:
| The optimal growth factor is the golden ratio (≈1.618); in
| practice many vectors use a growth factor of 1.5.
|
| The reason for not going above the golden ratio is that
| doing so prevents any previously allocated memory from
| ever being reused. If you are always doubling the size of
| your vector, then it is never possible to reclaim/reuse
| any previously allocated memory (for that vector), which
| means every time your vector grows you are causing more
| and more memory fragmentation, as opposed to using a
| growth factor of 1.5, which results in memory compaction.
| llbeansandrice wrote:
| Why does that happen when doubling size? I don't
| understand
| Kranar wrote:
| Sure we can go over both cases:
|
| Case 1: Growth factor of 2 and an initial size of 10
| bytes.
|
| Start with an initial allocation of 10 bytes of memory.
|
| On growth allocate 20 bytes and release the 10 bytes,
| leaving a hole of 10 bytes.
|
| On growth allocate 40 bytes and release the 20 bytes, the
| hole in memory is now 30 bytes large (the initial 10 byte
| hole + the new 20 byte hole).
|
| On growth allocate 80 bytes and release the 40 bytes, the
| hole is now 60 bytes.
|
| On growth allocate 160 bytes and release the 80 bytes,
| the hole is now 140 bytes.
|
| So on so forth... using this strategy it is never
| possible for the dynamic array to reclaim the hole it
| left behind in memory.
|
| Case 2: Growth factor of 1.5 and an initial size of 10
| bytes.
|
| Start with an initial allocation of 10 bytes of memory.
|
| On growth allocate 15 bytes and release the 10 bytes,
| leaving a hole of 10 bytes.
|
| On growth allocate 22 bytes and release the 15 bytes, the
| hole in memory is now 25 bytes large (the initial 10 byte
| hole + the new 15 byte hole).
|
| On growth allocate 33 bytes and release the 22 bytes, the
| hole is now 47 bytes.
|
| On growth allocate 50 bytes and release the 33 bytes, the
| hole is now 80 bytes.
|
| On growth reuse 75 bytes from the hole in memory left
| over from previous growths, the hole is now 5 bytes.
|
| With a growth factor of 1.5 (or anything less than the
| golden ratio), the hole grows up to a point and then
| shrinks, grows and shrinks, allowing the dynamic array to
| reuse memory from past allocations.
|
| With a growth factor of 2, the hole in memory continues
| to grow and grow and grow.
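|
| If you want to play with the arithmetic, here is a toy C
| model of the same scenario. It assumes, as above, that all
| of the vector's freed blocks coalesce into one hole and
| that nothing else allocates in between, which real heaps
| rarely guarantee:
|
|     #include <stdio.h>
|
|     static void grow(double factor, int steps) {
|         size_t cap = 10, hole = 0;
|         for (int i = 0; i < steps; i++) {
|             size_t next = (size_t)(cap * factor);
|             int reused = hole >= next;
|             if (reused)
|                 hole -= next;  /* new block fits the hole */
|             hole += cap;       /* old block is freed */
|             cap = next;
|             printf("factor %.1f: cap=%zu hole=%zu%s\n",
|                    factor, cap, hole,
|                    reused ? "  <- reused the hole" : "");
|         }
|     }
|
|     int main(void) {
|         grow(2.0, 8);  /* the hole never catches up */
|         grow(1.5, 8);  /* the hole eventually gets reused */
|         return 0;
|     }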
| Dylan16807 wrote:
| 99.9% of the time, each allocation leaves a separate
| hole. Your goal is to enable other allocations to use
| those holes without further splitting them, not to get
| your ever-growing vector to reuse memory.
| sfink wrote:
| This appears to assume that there will be no intervening
| allocations that are allowed to use the same region of
| memory, since otherwise your "hole" will be very
| discontiguous.
|
| But if that is the case, then why are you releasing
| memory? That's just costing you time moving the data.
| With a doubling mechanism:
|
| Start with an initial allocation of 10 bytes of memory.
|
| On growth allocate another 10, and don't copy anything,
| leaving no hole at all.
|
| On growth allocate another 20, and don't copy anything,
| leaving no hole at all.
|
| etc.
|
| In what scenario would the growth factor of 1.5 actually
| help? If you're restricting the allocation API to only
| malloc then you can't grow the size of your allocation
| and what you said might make sense as something the
| allocator could take advantage of internally. But realloc
| exists, and will hopefully try to extend an existing
| allocation if possible. (If your earlier allocation was
| fairly small, then the allocator might have put it in an
| arena for fixed-size allocations so it won't be possible
| to merge two of them, but once things get big they
| generally get their own VMA area. Obviously totally
| dependent on the allocator implementation, and there are
| many.)
| shaklee3 wrote:
| good to know vpp is still alive. always interesting work
| coming out of it, and it never seemed to take off.
| ay wrote:
| There are quite a few sizable businesses (hundreds of
| MUSD) built on it / depending on it, that I know of :)
| shaklee3 wrote:
| out of curiosity, is it mostly Cisco still funding it?
| what are the advantages over dpdk? mostly TCP?
| ay wrote:
| The funding - I believe organizationally it has now grown
| into https://lfnetworking.org/about/members/ and fd.io is
| just one of the projects.
|
| Contributions-wise to VPP: varies from time to time. On
| my very recent memory there were sizable IPSec
| acceleration patches from folks at Intel. There is a fair
| few one off bugfixes coming from smaller users who
| scratch their own itch. Pim van Pelt [0] has
| contributed/improved a few very cool and sizable features
| (Linux control plane integration, VPP yaml configurator
| [1])
|
| As for the difference with DPDK - does it do L3 routing?
| NAT? MAP? MPLS? DHCP? VPP can do all of this, and of
| course it's just off the top of my memory...
|
| https://docs.fd.io/vpp/22.10/aboutvpp/featurelist.html is
| the autogenerated feature list based on the "claimed
| official features" that are tracked via YAML files in the
| source code.
|
| [0] https://ipng.ch/s/articles/2021/08/12/vpp-1.html [1]
| https://lists.fd.io/g/vpp-
| dev/topic/feedback_on_a_tool_vppcf...
| asddubs wrote:
| Or if you can't guarantee allocations will succeed and need
| to handle that case. Then you can just have the object
| that's part of the list contain its list-link data within
| itself (aka an intrusive list).
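|
| Roughly like this, in C (a minimal sketch; the struct
| names are made up):
|
|     #include <stddef.h>
|
|     struct list_node { struct list_node *next; };
|
|     /* the link lives inside the object itself, so
|        putting it on a list never has to allocate */
|     struct request {
|         int id;
|         struct list_node link;
|     };
|
|     /* recover the object from its embedded link */
|     #define container_of(p, type, member) \
|         ((type *)((char *)(p) - offsetof(type, member)))
|
|     static void push(struct list_node **head,
|                      struct list_node *n) {
|         n->next = *head;
|         *head = n;
|     }
|
| A request allocated on the caller's stack (or statically)
| can then sit on a queue without the list code ever calling
| malloc(), so there is no allocation failure to handle.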
| pfdietz wrote:
| Compacting garbage collectors put linked lists into nearby
| memory.
| josephg wrote:
| Even if memory was packed by a GC, linked lists still have to
| pay the cost of allocating and garbage collecting every cell
| of memory individually. I doubt a GC would solve linked
| lists' woes.
|
| Do you have some benchmarks demonstrating your claim?
| throwawaymaths wrote:
| Erlang GC is pretty good, plus there are some nice highly
| local allocators
| josephg wrote:
| Be that as it may, it'll still take a benchmark to
| convince me.
| Jach wrote:
| There are also prefetch instructions. I listened to
| https://www.youtube.com/watch?v=SetLtBH43_U recently
| (transcript: https://signalsandthreads.com/memory-
| management/), part of it talked about some work in OCaml's
| GC.
|
| > ...each individual memory request that's not in cache
| still has to wait the full 300 cycles. But if we can get 10
| of them going at the same time, then we can be hitting a
| new cache line every 30 cycles. I mean, that's not as good
| as getting one every 16 cycles, but it's close. You're
| actually able to get to a reasonable fraction of the raw
| memory bandwidth of the machine while just traversing
| randomly over this huge one gigabyte heap
|
| https://github.com/ocaml/ocaml/pull/10195 shows the change
| adding prefetching to the marking phase
| (https://github.com/ocaml/ocaml/pull/9934 was done earlier
| for sweeping). There are some benchmarks in the
| thread/linked from the thread.
| [deleted]
| tsimionescu wrote:
| I don't have a benchmark, but it's important to remember
| that GCs only scan live objects, and they always have to
| scan all live objects; while allocation is normally an
| extremely cheap operation (bump the allocation pointer).
| It's true though that scanning a live linked list would be
| expected to be slower than scanning a live array (though
| some optimized scanning strategies may be possible).
|
| Overall I couldn't tell what the impact on overall
| performance would be, but I can tell you for sure that de-
| allocating an N-element linked list or array are both O(1)
| operations for a copying garbage collector (which are the
| most common kinds of compacting GCs), since they simply
| will never reach nor copy any of the elements.
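|
| "Bump the pointer" really is about this simple (a sketch;
| a real runtime adds alignment rules, object headers, and
| the slow path that triggers a collection):
|
|     static unsigned char heap[1 << 20];
|     static size_t top;
|
|     static void *bump_alloc(size_t n) {
|         n = (n + 7) & ~(size_t)7;  /* 8-byte alignment */
|         if (top + n > sizeof heap)
|             return 0;              /* time to collect */
|         void *p = &heap[top];
|         top += n;
|         return p;
|     }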
| josefx wrote:
| Will they? I don't think they reorder live objects relative
| to each other, so any object allocated between two nodes
| would still end up between those nodes after the GC ran.
| pfdietz wrote:
| It is very possible for a compacting garbage collector to
| reorder live objects.
| pencilguin wrote:
| It might or might not. Commonly, not.
| sfink wrote:
| Disagreeing with other answers here, but usually they will,
| and not because the authors tried to make it happen. It's
| just sort of the default behavior.
|
| A copying collector has to trace through the whole graph to
| find everything reachable. The tracing order is usually
| going to be depth first. (It's the easiest to code if
| you're using recursion and therefore maintaining state with
| the runtime's stack, but even with an explicit stack you'll
| want to push and pop the end for locality.) Which means
| that objects arranged into a linked list will be visited in
| list order, and you generally copy as you visit an object
| for the first time. So... reordering live objects relative
| to each other "just happens". Even if it's a doubly-linked
| list, since the back pointers have already been visited and
| so don't matter.
|
| This is less true if your objects have a lot of other
| pointers in them. But that's just saying that if your
| object graph isn't a list, then it won't end up laid out in
| list order. It'll still end up in DFS order according to
| the true graph.
| [deleted]
| kabdib wrote:
| In my first week at my prior job, I had to fix an in-production
| bug involving quadratic vector growth that was tipping over
| servers. The vector implementation guaranteed too much, namely
| consecutive memory layout (which we didn't need, and which it
| would have been a mistake to rely on).
|
| Decomposing that vector into recursive subvectors solved the
| problem. Going from a few tens of millions of elements in a
| single contiguous vector (with consequent -- and bad -- heap
| fragmentation!) to nested vectors with a few thousand elements
| each brought us back online again.
|
| Which is to say: Vectors are nice. Lists are nice. Use
| appropriate data structures for your scale, watch your semantic
| dependencies, and don't get hung up on dogma.
| AtlasBarfed wrote:
| This sounds like you need a log structured merge tree.
| mst wrote:
| The VList paper - which I archived at
| http://trout.me.uk/lisp/vlist.pdf to avoid having to google
| the bloody thing every time - is an interesting hybrid.
| simplotek wrote:
| Thanks for referring to the article. Great read!
| mst wrote:
| http://trout.me.uk/lisp/ and http://trout.me.uk/gc/ are
| both "selected stuff I knew I'd have trouble finding
| again later" and I wish you great joy of them.
|
| Still sad that John Shutt (creator of 'kernel lisp')
| passed away while I was still procrastinating dropping
| him an email - I think the only other 'thank you' I've
| most regretted not sending in time was to Terry
| Pratchett.
| GeorgeWBasic wrote:
| Thank you for that. That's a concept I remembered reading
| about, but I couldn't remember the name and had no way to
| find it again.
| mst wrote:
| trout.me.uk/lisp/ and trout.me.uk/gc/ are both basically
| archives of "stuff I knew I'd suffer that problem with"
| and it's a pleasure every time the contents come in handy
| for people who aren't me.
|
| Also back when I was still pretending to be a
| mathematician I used both GWBASIC and UBASIC for assorted
| purposes so thank -you- for the nostalgia kick.
| bluGill wrote:
| If your elements number in the millions, then a tree might be
| better, depending on what you do. If you traverse the whole
| thing often then a vector is best, but at a million elements a
| tree will do a search for an element faster despite the cache
| misses.
|
| I don't know where the line is, but in general you are
| probably not close to it, so default to a vector. However, as
| your point shows, sometimes you are over that line.
| sally_glance wrote:
| And if you really need it you can even do both tree and
| vector simultaneously - just make sure both consist of
| pointers only so you avoid update/sync issues.
| bluGill wrote:
| You can, but probably shouldn't. The pointers mean a
| cache miss each time you follow one. Putting everything in
| the vector is the point; otherwise the map is potentially
| faster.
| jcelerier wrote:
| If you have to do a binary search in a million elements
| you'll have cache misses _anyways_
| bluGill wrote:
| Sure, log(n) cache misses. With a linear search through
| pointers you have n cache misses.
| diabeetusman wrote:
| Correct me if I'm wrong, but it sounds like you're describing
| an Unrolled Linked List [1]? If so, neat!
|
| [1]: https://en.wikipedia.org/wiki/Unrolled_linked_list
| ndriscoll wrote:
| Ironically, if you weren't using huge pages, the operating
| system was already doing this for you anyway, and your
| contiguous segments weren't contiguous.
|
| This does suggest functionality that the OS should provide:
| let the program remap memory somewhere else in its address
| space to obtain a larger contiguous chunk of addresses
| without copying data. This wouldn't help much if you have
| direct pointers, but I think e.g. Java has a layer of
| indirection that would let it benefit from remapping to grow
| ArrayLists. I don't know if things already work this way. It
| should make speculative fetches easier for the processor
| since now it understands what you're doing via the memory
| map.
|
| Edit: taking this idea a little further is actually
| fascinating: use the type system to separate objects into
| pools based on type, and instead of pointers, use indexes
| into your pool. You can move the pool for free to grow/shrink
| it. I think you could even do this with pointers and full
| hardware acceleration today if you used the high bits to tag
| pointers with their types and used remapping at the
| hypervisor level to move the pools (so the address in the
| application level doesn't change). You could do some crazy
| stuff if the application could coordinate multiple levels of
| virtual memory.
| pencilguin wrote:
| OSes do already provide this ability: mmap.
| ndriscoll wrote:
| Maybe I'm just not seeing it, but I don't see how mmap
| lets you remap an existing address to a new one? Looks
| like there is a mremap on Linux in any case.
| comex wrote:
| If you want to stick to mmap, you can do it by creating a
| file which doesn't have any actual disk backing (one way
| to do this is with POSIX shm_open), and handling all
| memory allocations by mmapping parts of that file rather
| than mapping true anonymous memory. Then, 'remapping' an
| address to a new one is a matter of calling mmap again
| with the same file offset. Unlike mremap, this allows
| mapping the same memory at different addresses
| simultaneously, which can be useful for some purposes.
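|
| Something like this, with error handling omitted (a
| sketch; the name "/remap-demo" is arbitrary, and on older
| glibc you may need to link with -lrt):
|
|     #include <fcntl.h>
|     #include <sys/mman.h>
|     #include <unistd.h>
|
|     #define SIZE (1 << 20)
|
|     int main(void) {
|         /* shared object with no real disk backing */
|         int fd = shm_open("/remap-demo",
|                           O_RDWR | O_CREAT | O_EXCL, 0600);
|         shm_unlink("/remap-demo");  /* keep only the fd */
|         ftruncate(fd, SIZE);
|
|         char *a = mmap(0, SIZE, PROT_READ | PROT_WRITE,
|                        MAP_SHARED, fd, 0);
|         /* a second view of the same bytes, elsewhere */
|         char *b = mmap(0, SIZE, PROT_READ | PROT_WRITE,
|                        MAP_SHARED, fd, 0);
|
|         a[0] = 'x';       /* visible through both views */
|         return b[0] == 'x' ? 0 : 1;
|     }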
| pencilguin wrote:
| /dev/hugepages/ is a good place to put this if you have
| enough control over the host.
| pencilguin wrote:
| Or, you can map new space onto the end of an existing
| mapping, at a slightly greater cost of keeping track.
| Mremap is usually better.
| kabdib wrote:
| That sounds like a lot of work unless you absolutely have
| to have a contiguous allocation of every element.
|
| If you don't need that, you can get "O(1.1)" access and
| nice heap behavior with an indirection or two, and the
| implementation is likely to be lots more portable and
| standards-friendly than if you also dragged in a bunch of
| OS mmap() semantics (which tend to have exciting,
| platform-dependent misfeatures).
|
| Also, I would have to reach for the mmap() man page. I
| have malloc() right in front of me. :-)
| colanderman wrote:
| glibc realloc(3) does this on Linux, by way of mremap(2)
| with MREMAP_MAYMOVE [1]. (Just confirmed on my local
| machine using strace(1).)
|
| Unfortunately, libstdc++ does not (to my knowledge) use
| realloc(3) or mremap(2) for vector resize, nor could it in
| many situations due to the semantics of C++.
|
| [1] https://github.com/bminor/glibc/blob/15a94e6668a6d7c569
| 7e805...
| bee_rider wrote:
| I somehow misread your comment the first time through and
| thought you were suggesting the ability to ask the OS:
| "Hey, I'm a pointer, is there anything in front of my block
| of memory, if so can you move it elsewhere so that I can
| expand?" This would of course be a pretty silly idea but it
| would be funny to give the OS folks a whole new class of
| use-after bugs to worry about.
|
| Edit: Let's call it "memoveouttamyway"
| FeepingCreature wrote:
| Isn't that `realloc`? At the allocator level rather than
| OS level, but still.
| kabdib wrote:
| Right, the ability to relocate a chunk of memory to a new
| virtual address (with more capacity at the end) without
| doing a physical copy would be an interesting way to solve
| the problem.
|
| I don't know if anyone has done research on integrating
| data structures and OS functionality in this way.
| Interesting idea.
|
| We needed to get things running again rather quickly,
| though :-)
| cyber_kinetist wrote:
| Some game engine developers have certainly used this
| idea, to prevent any reallocations when the dynamic array
| needs to be resized. Such as:
|
| https://ruby0x1.github.io/machinery_blog_archive/post/vir
| tua...
|
| Some ECS implementations use this to reduce the overhead
| of dynamic arrays, as well as ensuring pointer stability
| of the items stored. For example entt:
|
| https://skypjack.github.io/2021-06-12-ecs-baf-part-11/
|
| And here's a library implementing a resizable, pointer-
| stable vector using virtual memory functionality from the
| OS:
|
| https://github.com/mknejp/vmcontainer
| amluto wrote:
| You can certainly allocate memory at controlled addresses
| using mmap or VirtualAlloc.
|
| But you should _not_ play virtual memory games at runtime
| in an application that you expect to be scalable. Unmapping
| memory or changing its permissions requires propagating the
| change to all threads in the process. On x86, the
| performance impact is dramatic. On ARM64, it's not as bad
| but it's still there.
|
| Also, even 64 bits can go fast when you start using high
| bits, separate pools for different purposes, etc.
| comex wrote:
| Linux has mremap() for this. macOS has vm_remap(). Not sure
| about other operating systems.
| pencilguin wrote:
| Another name for that is "B* tree". Or anyway a thing with
| similar performance characteristics.
| cobalt wrote:
| also what a deque (C++) does
| pencilguin wrote:
| True... except in MSVC. std::deque in MSVC is cursed.
|
| Targeting MSVC, use boost::deque or something, instead.
| Boost containers are generally fairly good, although they
| are more or less unmaintained nowadays. (E.g.,
| boost::unordered_map::reserve doesn't work.) Or, like
| many of us, ignore performance on MS targets because
| anybody who cares about performance is running something
| else.
| q-big wrote:
| Why?
| pencilguin wrote:
| https://github.com/AcademySoftwareFoundation/openvdb/issu
| es/...
|
| Longer answer is that the size of the "blocks" is limited
| to 512 bytes or one element, whichever is larger. So
| unless your elements are really tiny, it is strictly a
| pessimization.
| cyber_kinetist wrote:
| Many of the STL containers in MSVC are _incredibly_
| cursed. Even std::vector is implemented in such a weird
| way that you can't debug it easily with any debugger
| other than Visual Studio. And it's _incredibly_ slow in
| Debug mode, to the point that it's unusable... (Because
| of these reasons I just switched to using my own
| Vector<T> class recently.)
|
| I think there's a very good reason why old gamedevs who
| were stuck with developing on Windows largely ignored the
| STL and made their own containers / utility functions:
| MSVC's STL was just _awful_.
| simplotek wrote:
| > Many of the STL containers in MSVC are incredibly
| cursed. Even std::vector is implemented in such a weird
| way, that you can't debug it easily with any debugger
| other than Visual Studio. And it's incredibly slow on
| Debug mode, up to the point that it's unusable...
| (Because of these reasons I just switched to using my own
| Vector<T> class recently)
|
| Here's a link to an old HN discussion that offers some
| insight on the performance of MSVC's STL in debug.
|
| https://news.ycombinator.com/item?id=18939260
|
| The "cursed" keyword hand-waves over the problem and
| suggests it originates in the acritical propagation of
| ignorant beliefs. Sometimes clarifying these is just a
| Google search away.
| cyber_kinetist wrote:
| Yes, I know that already. And good luck with dealing all
| the linker/ABI issues from changing that setting.
| brantonb wrote:
| I always wondered why we weren't allowed to use the STL
| in the Windows division in the 00's. This probably
| explains it.
| pencilguin wrote:
| There was a lot of BS, as well. It is impossible to tease
| out which was what.
| ahtihn wrote:
| > ignore performance on MS targets because anybody who
| cares about performances is running something else.
|
| That's a strange take. Surely games still represent a
| significant chunk of the demand for C++ developers that
| need to care about performance?
| pencilguin wrote:
| Thus, "many". Game coders would not be among those.
| kabdib wrote:
| Right, though we didn't need full tree semantics (just fast
| random access and an append that didn't spool up the fans
| on the servers).
|
| I would hate to have to do a full balanced b-tree on short
| notice. (We also had rather terrible extra requirements
| that kept off-the-shelf things from, say, the STL from
| working in our environment.)
| samsquire wrote:
| For a simple and readable implementation of a Python
| btree see this:
|
| https://github.com/samsquire/btree
|
| I tried to keep the implementation as simple as I could.
| dilap wrote:
| simple vector implementations will do a full realloc and
| recopy when you run out of space. so O(1) amortized but
| O(N) every so often.
|
| another implementation strategy, at the cost of using
| some extra memory, is to allocate the new space, but only
| copy over a bit at a time, each time a new element is
| added -- so every add is still O(1). no long pauses.
|
| (well, assuming the memory alloc is still fast. which i
| think it should be in practice? tho i'm not super
| familiar w/ that stuff.)
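|
| for concreteness, a minimal sketch in C of what i mean
| (made-up names; draining two old slots per push is enough
| to finish before the next growth starts):
|
|     #include <stdlib.h>
|
|     struct ivec {
|         int *cur, *old;  /* old != NULL while draining */
|         size_t len, cap, moved, old_len;
|     };
|
|     void ivec_push(struct ivec *v, int x) {
|         if (v->len == v->cap) {      /* start a move */
|             v->old = v->cur;
|             v->old_len = v->len;
|             v->moved = 0;
|             v->cap = v->cap ? v->cap * 2 : 8;
|             v->cur = malloc(v->cap * sizeof *v->cur);
|         }
|         if (v->old) {                /* drain two slots */
|             for (int k = 0; k < 2 &&
|                      v->moved < v->old_len; k++) {
|                 v->cur[v->moved] = v->old[v->moved];
|                 v->moved++;
|             }
|             if (v->moved == v->old_len) {
|                 free(v->old);
|                 v->old = NULL;
|             }
|         }
|         v->cur[v->len++] = x;
|     }
|
|     int ivec_get(const struct ivec *v, size_t i) {
|         if (v->old && i >= v->moved && i < v->old_len)
|             return v->old[i];        /* not drained yet */
|         return v->cur[i];
|     }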
|
| just curious, do you think that solution would have
| worked for your situation too?
| kabdib wrote:
| That's exactly the strategy that was tipping over. Once
| you start resizing vectors of tens of megabytes, it's
| kind of over.
|
| You can start playing games by accelerating capacity
| allocation when a vector rapidly starts growing extremely
| large. This doesn't solve the problem that vectors
| sometimes guarantee too much for what you need.
| dilap wrote:
| Ah thanks, I didn't read closely enough the first time.
| taf2 wrote:
| My foreman always would tell me KISS (keep it simple,
| stupid); maybe there is another single word for "just solve
| the problem at hand, silly" - j,s,p,a,h,s - doesn't really
| roll off the tongue the same way...
| Razengan wrote:
| Seems to me that all these arguments are rooted in the way
| computer memory is fundamentally implemented: could it not be
| possible to re-architect it so that it's more agreeable to
| arrays and lists, since that's how most data is used?
|
| I often wonder about the separation of CPU and RAM, of ways it
| could be done better. How about making it like neurons, where
| each "memory cell" is also a teeny tiny computer itself? (How DO
| neurons do internal computation anyway?)
| fortran77 wrote:
| I think the Rust people are de-emphasizing the importance of
| linked lists due to the glaring hole in the Rust language that
| makes it difficult to implement linked lists, and horribly
| awkward to do doubly-linked lists in safe Rust
|
| (See https://rcoh.me/posts/rust-linked-list-basically-impossible/
| )
|
| To save face, the Rust astroturfers are now skipping around
| saying "Linked lists aren't important"
| c-cube wrote:
| The author is a known C programmer (he wrote redis). You're
| imagining things.
| klntsky wrote:
| Not to mention path sharing (also called structural sharing) for
| free.
| chrisseaton wrote:
| I've used linked lists where I wanted allocation of an almost-
| write-only log data structure to be very fast - so not resizing a
| buffer, and nobody cares about cache locality except for
| allocation. In a system with a TLA buffer and bump-allocation,
| this is absolutely ideal because allocation is local and very
| fast, and does not cause any of the existing log to be brought
| into cache.
|
| "Nobody uses this data structure stuff in real programming at
| work! You can forget it after college!"
| [deleted]
| eatonphil wrote:
| Here's another example for the crowd: an "unbounded" fifo queue
| in a fixed memory environment.
|
| https://github.com/tigerbeetledb/tigerbeetle/blob/main/src/f...
| gslepak wrote:
| Anyone else getting a serious cert expired warning when visiting
| this site?
|
| The one I'm getting is also for the wrong name - registered for
| redis.io and expired on `Fri, 07 Aug 2020 11:30:03 GMT`. Issued
| by Let's Encrypt. Fingerprint: `07:BF:EA:59:DB:83:33:77:50:27:A8:
| C5:2F:80:F6:CA:E6:EC:E2:7D:01:DB:72:7A:AF:EE:69:DD:EC:2D:DA:F3`
|
| Seems like a MITM attack is going on.
| gslepak wrote:
| Oh, nevermind, link was submitted in HTTP, but I have mandatory
| HTTPS enabled and the server is misconfigured.
| scaramanga wrote:
| My favorite typo/brainfart so far this week has to be "cache
| obviousness" :)
| m3kw9 wrote:
| If you work on iOS, it's really hard to do a better job than
| what the SDK has implemented in its array functions.
| bitwize wrote:
| Linked lists are one of those things, like textual protocols,
| that people reach for because they're easy and fun but, from an
| engineering standpoint, you should never, ever use unless you can
| provide a STRONG justification for doing so. The locality of
| reference characteristics of vectors mean that, except for
| extreme cases, they beat linked lists almost every time on modern
| cached CPUs.
| ignoramous wrote:
| > _Linked lists are simple. It is one of those rare data
| structures, together with binary trees and hash tables and a few
| more, that you can implement just from memory without likely
| stepping into big errors._
|
| Well, if one likes linked lists enough, they can replace binary
| search trees / hash tables with a skip list. Super powerful and
| easier to implement than splay trees, red black trees, avl trees,
| augmented trees, what-have-you.
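|
| For flavour, the core of a skip list in C fits in a
| screenful (a from-memory sketch, not production code: no
| duplicate handling, no delete, no error checking; the head
| node is allocated with MAX_LEVEL slots and level starts at
| 1):
|
|     #include <stdlib.h>
|
|     #define MAX_LEVEL 16
|
|     struct node {
|         int key;
|         struct node *next[];  /* one slot per level */
|     };
|
|     struct skiplist { int level; struct node *head; };
|
|     static struct node *node_new(int key, int lvls) {
|         struct node *n = calloc(1,
|             sizeof *n + lvls * sizeof n->next[0]);
|         n->key = key;
|         return n;
|     }
|
|     static int random_level(void) {
|         int l = 1;
|         while (l < MAX_LEVEL && (rand() & 1)) l++;
|         return l;
|     }
|
|     void sl_insert(struct skiplist *s, int key) {
|         struct node *update[MAX_LEVEL], *x = s->head;
|         for (int i = s->level - 1; i >= 0; i--) {
|             while (x->next[i] && x->next[i]->key < key)
|                 x = x->next[i];
|             update[i] = x;  /* rightmost node per level */
|         }
|         int lvl = random_level();
|         for (int i = s->level; i < lvl; i++)
|             update[i] = s->head;
|         if (lvl > s->level) s->level = lvl;
|         x = node_new(key, lvl);
|         for (int i = 0; i < lvl; i++) {
|             x->next[i] = update[i]->next[i];
|             update[i]->next[i] = x;  /* splice in */
|         }
|     }
|
| Search is the same two nested loops minus the splicing,
| which is why it is hard to get badly wrong compared to
| rebalancing a tree.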
| ComputerGuru wrote:
| One of the only times linked lists are "undoubtedly the right
| option" is if you're writing some sort of intrusive data
| structure to avoid an allocation by reusing caller stack from
| another location. Not sure how many run into that often.
| docandrew wrote:
| When I need to avoid an allocation it's because I'm writing an
| allocator :) Overlaying the list nodes on top of a block of
| free memory avoids the need for a secondary data structure to
| keep track of free areas.
|
| The other time allocations are undesirable, and therefore
| linked lists are widely-used, is bare metal work when other
| allocators just aren't available. You can do tricks like
| representing pools of free and in-use objects with two lists
| where the nodes occupy the same memory block. Allocations are a
| fast O(1) unlink from the free list and O(1) re-link to the
| used list, and frees are the opposite.
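|
| The free-list half of that, as a minimal C sketch (the
| used list is the same dance in the other direction):
|
|     union block {
|         union block *next;         /* valid while free */
|         unsigned char payload[64]; /* valid while used */
|     };
|
|     static union block pool[128];
|     static union block *free_list;
|
|     void pool_init(void) {
|         for (int i = 0; i < 127; i++)
|             pool[i].next = &pool[i + 1];
|         pool[127].next = 0;
|         free_list = &pool[0];
|     }
|
|     void *pool_alloc(void) {       /* O(1) unlink */
|         union block *b = free_list;
|         if (b) free_list = b->next;
|         return b;
|     }
|
|     void pool_free(void *p) {      /* O(1) relink */
|         union block *b = p;
|         b->next = free_list;
|         free_list = b;
|     }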
| photochemsyn wrote:
| For educational purposes (i.e. hooking easily into graphics
| packages for visualization) you can make a linked list in Python,
| using elements like:
|
|     class Node:
|         def __init__(self, dataval=None):
|             self.dataval = dataval
|             self.nodepointer = None
|
| The only reason to do this is for education/visualization of data
| structures, although I'm wondering if this can be engineered to
| cause a Python memory leak, via circularization of the linked
| list, or if Python's GC would catch it. Also educational perhaps.
|
| Where linked lists really seem to come into play is with custom
| manual memory allocators in embedded programming, something of a
| niche subject.
| [deleted]
| lanstin wrote:
| I just used linked lists to make an LRU hard limit on the
| number of items in an in-memory store. On each usage, the
| pointers are updated to move the item to the head. Each time
| the list is at the limit, the tail is removed and freed. I may
| take advantage of the machinery to put freed nodes onto a free
| list. This was in Go, and the DLL is my first use of generics.
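|
| The list half of that is tiny; in C rather than Go (a
| sketch with a circular sentinel so there are no NULL
| special cases; payload and the hash-map lookup omitted):
|
|     struct node { struct node *prev, *next; };
|
|     struct lru {
|         struct node head;  /* head.next = MRU,
|                               head.prev = LRU */
|         size_t len, limit;
|     };
|
|     void lru_init(struct lru *l, size_t limit) {
|         l->head.prev = l->head.next = &l->head;
|         l->len = 0;
|         l->limit = limit;
|     }
|
|     static void unlink_node(struct node *n) {
|         n->prev->next = n->next;
|         n->next->prev = n->prev;
|     }
|
|     static void push_front(struct lru *l, struct node *n) {
|         n->next = l->head.next;
|         n->prev = &l->head;
|         n->next->prev = n;
|         l->head.next = n;
|     }
|
|     void lru_touch(struct lru *l, struct node *n) {
|         unlink_node(n);    /* move to head on each use */
|         push_front(l, n);
|     }
|
|     struct node *lru_evict_if_full(struct lru *l) {
|         if (l->len < l->limit) return 0;
|         struct node *t = l->head.prev;
|         unlink_node(t);    /* caller frees or reuses it */
|         l->len--;
|         return t;
|     }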
| layer8 wrote:
| I think I first learned about linked lists in the context of
| programming for AmigaOS, which has a standardized form of
| (intrusive) linked lists that is used throughout the system. I
| remember being fascinated by the header node which serves as both
| the head and the tail of a list due to a "clever" field layout:
| https://wiki.amigaos.net/wiki/Exec_Lists_and_Queues
| Narishma wrote:
| Linked lists made more sense back then when memory access
| wasn't as slow compared to CPU speed.
| OOPMan wrote:
| Oh lord, can anyone make a post without whining on about Twitter?
| thedracle wrote:
| There was a time when memory was comparatively expensive,
| memory allocation of large blocks was slow, and allocation
| of small blocks was much faster (malloc would find a small
| block faster than a large one on a highly fragmented heap).
|
| Before having gigantic caches that would engulf nearly any
| sized contiguous list, linked lists were somewhat in vogue,
| frugal, and thought of as being fairly performant.
| AstralStorm wrote:
| That time is still there in embedded hardware. Even potent ARM
| chips (except the cellphone ones maybe) have some KB of cache
| at best, and similar amounts of SRAM. While code size is still
| at a premium since XIP (execute-in-place from in-built flash)
| is slower.
| stephc_int13 wrote:
| The good thing about linked lists is how easy they are to
| understand and use, almost trivial.
|
| The bad thing is that their cost is not obvious and not
| consistent (difficult to predict).
|
| I tend to prefer simple arrays.
|
| This talk by Mike Acton is a good introduction to why going
| for the linked list as a go-to data structure can lead to
| performance issues.
|
| https://www.youtube.com/watch?v=rX0ItVEVjHc
| belter wrote:
| "Does anyone use LinkedList? I wrote it, and I never use it." -
| https://news.ycombinator.com/item?id=33418705
| int_19h wrote:
| "Don't get me wrong; I love linked lists. They're as useful in
| C as they are in Scheme. But LinkedList is another matter
| entirely."
| [deleted]
| dreamcompiler wrote:
| Lisp of course was originally invented for manipulating linked
| lists and it came about in an era where cache locality wasn't an
| issue because computers in 1959 were speed limited by their CPUs
| rather than their memories.
|
| Cdr coding [0] solves several problems with singly-linked (cons-
| style) lists including cache locality and the 2x storage overhead
| of these lists. But it's not typically used except in dedicated
| hardware e.g. Lisp machines. It could in principle be revived
| fairly easily on modern 64-bit Lisp systems on stock hardware --
| especially if such systems provided immutable cons cells.
|
| But it's probably not worthwhile because modern Lisp programmers
| (in Common Lisp at least) don't use lists all that much. CL has
| very nice adjustable arrays and hash tables that are easier and
| faster than lists for most purposes.
|
| [0] https://en.m.wikipedia.org/wiki/CDR_coding
| cyber_kinetist wrote:
| I think the whole "dynamic arrays" vs "linked lists" debate is
| just a waste of time, since the two actually complement each
| other.
|
| - You can certainly implement linked lists using dynamic arrays,
| so you make a minimal number of `malloc()` calls and also keep
| your items stored contiguously! You need to use indices instead
| of pointers, since you will lose pointer stability unless you use
| virtual memory (though the upside is that typically a uint32_t is
| enough for most cases, which only takes up half as much memory as
| a pointer). A minimal sketch follows this list.
|
| - A lot of data structures use linked lists under the hood, even
| the ones that seem to use dynamic arrays on the surface. Ever
| written a simple arena allocator? For the allocator
| to find a free space in the arena in O(1), you need to maintain a
| free-list data structure under the hood. And guess what: you need
| to write a linked list. Even your operating system's `malloc()`
| itself probably uses linked lists under the hood.
|
| - Linked lists just come up inside so many other data structures
| that you can essentially call them the 'backbone'. For example:
| in computer graphics there is something called a half-edge data
| structure [0], which stores the geometry data of a mesh that's
| both compact, easy to iterate, and easy to do manipulations with
| (for instance, you can do things like insert/delete an edge/face
| easily in O(1) time, and more complex mesh operations can be
| implemented as well). And guess what? The vertices and halfedges
| are essentially stored in a complex web of linked lists.
|
| [0] https://jerryyin.info/geometry-processing-algorithms/half-
| ed...
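|
| The first point, as a minimal C sketch (hypothetical
| names; a uint32_t sentinel stands in for the null
| pointer, and indices survive the realloc that would
| invalidate raw pointers):
|
|     #include <stdint.h>
|     #include <stdlib.h>
|
|     #define NIL UINT32_MAX
|
|     struct node { int value; uint32_t next; };
|
|     struct list {
|         struct node *nodes;  /* one contiguous block */
|         uint32_t head, len, cap;
|     };
|
|     void list_init(struct list *l) {
|         l->nodes = 0;
|         l->head = NIL;
|         l->len = l->cap = 0;
|     }
|
|     void push_front(struct list *l, int value) {
|         if (l->len == l->cap) {
|             l->cap = l->cap ? l->cap * 2 : 16;
|             l->nodes = realloc(l->nodes,
|                 l->cap * sizeof *l->nodes);
|         }
|         l->nodes[l->len].value = value;
|         l->nodes[l->len].next = l->head;
|         l->head = l->len++;
|     }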
| swellguy wrote:
| I think this article could be more intelligently titled: "Don't
| post, read, or respond to things on TWTR". The linked lists
| aspect could be replaced with anything.
| btilly wrote:
| In scripting languages it is a truism that anything you would
| want to do with a linked list is probably better done with an
| array. And given the overhead of working in the language versus
| things implemented in C, this is generally true.
|
| However I have a fun use for linked lists where this is NOT true.
|
| Anyone who has played around with algorithms has encountered
| dynamic programming. So you can, for example, find the count of
| subsets that sum to a particular number without actually
| enumerating them all. But what if instead of counting them, you
| wanted to actually find some?
|
| The answer turns out to be that instead of using dynamic
| programming to get a count, you use dynamic programming to build
| up a data structure from which you can get the answer. And the
| right data structure to use turns out to be...a type of linked
| list! There is no faster array equivalent for what you get.
|
| In other words, with a few extra fields for clarity, here is the
| basic structure for subset sum of positive integers:
|     {
|         current_sum: ...,
|         count_of_solutions: ...,
|         current_value: ...,
|         solutions_using_current_value:
|             (link in one dimension of the linked list),
|         solutions_using_previous_values:
|             (link in the other dimension of the linked list),
|     }
|
| I leave figuring out the dynamic programming code to generate
| this as a fun exercise. Likewise how to extract, say, the 500'th
| subset with this sum. Both are easy if you understand linked
| lists. If not...well...consider it education!
| Legion wrote:
| > (oh, dear Twitter: whatever happens I'll be there as long as
| possible - if you care about people that put a lot of energy in
| creating it, think twice before leaving the platform)
|
| You mean the people that all just got fired?
| pugworthy wrote:
| Linked lists are great. But they have the problem that, almost
| always, whatever technical interview you have, someone asks you
| to whiteboard how to reverse one.
|
| And I answer, "I'd google it"
| fnordpiglet wrote:
| Linked lists also work great in a sharded distributed data
| structure. You can make changes to the list structure by simply
| changing reference at nodes and don't need to move anything.
| sudarshnachakra wrote:
| I guess linked lists as such are very useful for implementing
| queues (particularly those that feed thread pools), where the
| costs of a growable array aren't needed and cache locality does
| not matter (continuing with the thread pool example: it's almost
| a guarantee that having the next element in the cache of a CPU
| which is not going to execute the next Runnable is a waste).
|
| In Java particularly, both the array and the linked
| implementations of blocking queues should perform equally well.
| FWIW most queue implementations are linked lists.
| NavinF wrote:
| The best implementations typically have a queue per thread and
| work stealing. The first threads to finish their assigned work
| will grab items from other queues, but until then you get
| perfect cache locality.
|
| Java's queues and global threadpool queues in general are
| pretty old hat.
| Jtsummers wrote:
| Language wars, editor wars, and now data structure wars. What
| random technical thing will people go to virtual blows over next?
| keepquestioning wrote:
| https://www.reddit.com/r/slatestarcodex/comments/9rvroo/most...
| B1FF_PSUVM wrote:
| I'd note that the number of people creating content now is
| probably two to six orders of magnitude larger than
| before the internet, when it was just print/radio/TV/etc.
|
| Most that Joe Random could expect to get in print would be a
| Letter to the Editor, or a random passerby interview. The in-
| crowd was even more "unusual".
| TillE wrote:
| The Windows kernel also has linked lists as one of its few
| general-purpose data structures available to drivers.
|
| Like any tool, it has its place. As long as you're not traversing
| large linked lists, they're probably fine.
| dilyevsky wrote:
| Traversing a short list but frequently still going to have
| terrible cache performance compared to something like vector.
| Just use vectors, folks
| slaymaker1907 wrote:
| Allocating a huge chunk of memory all at once when the array
| grows can also cause a bunch of problems. Linked lists are
| also much simpler in a multi threaded context.
| dilyevsky wrote:
| I'm not an expert in memory allocators but I suspect large
| allocation is better than many small allocs due to less
| fragmentation no? I'll grant you the multithreaded case but
| if you're doing that your cache performance is probably
| crap anyway =)
| [deleted]
| lacrosse_tannin wrote:
| Linked lists don't care about you. They aren't even alive.
| TheDudeMan wrote:
| Is that a "pro" or a "con"?
| adeptima wrote:
| When people ask my opinion of Rust, I love to share with them
| the Linkedin List implementation link: https://doc.rust-
| lang.org/src/alloc/collections/linked_list....
|
| And the first feedback is: why so many unsafe blocks? What is
| Option<NonNull<Node<T>>> ? '_ ?
|
| Another reason to share: if you can understand Linkedin List,
| you are free to code in Rust ;)
| dekhn wrote:
| I don't know Rust but I would guess that an
| Option<NonNull<Node<T>>> would be a Node of type T, that cannot
| be set to null, but can be optional. This is a type I would
| want to return from a function that could either return
| nothing, or a pointer to a real thing.
|
| As for the unsafety, I would assume the authors of collections
| know what they are doing and do that for performance.
| proto_lambda wrote:
| `NonNull` is actually a wrapper around the pointer type, with
| the added invariant that the pointer cannot be null. The
| compiler is made aware of this, which allows
| `Option<NonNull<T>>` to be just a plain pointer, where the
| `None` variant of the option corresponds to a null pointer
| and the `Some(ptr)` case corresponds to a non-null pointer.
| josephg wrote:
| I really wish rust had better syntax for this. Raw C
| pointers should probably all be Option<NonNull<T>> rather
| than *mut T.
|
| The latter is easier to type but worse in almost every way.
| Ergonomics should guide you to the former, not the latter.
| kibwen wrote:
| It's an interesting suggestion to impose Option semantics
| on raw pointers by default (presumably with some
| alternative way to receive a nullable pointer, for
| completeness). But in the meantime, in your own codebase,
| it's easy enough to do `type Ptr<T> =
| Option<NonNull<T>>;`.
| josephg wrote:
| The problem there is that NonNull is exclusively a
| wrapper around *mut T. Using it discards the distinction
| with *const _ pointers. I think that has implications for
| correctness with Miri.
| andrewflnr wrote:
| I think if you tried to do that, you'd basically be
| mixing type-driven optimizations with the C FFI, which
| sounds sketchy, to me at least. The null-optimization for
| Option is just that, an optimization, and I don't like
| taking it for granted where safety/security is at stake.
| nickitolas wrote:
| The layout is guaranteed and documented as thus:
| https://doc.rust-lang.org/std/option/#representation
|
| > Rust guarantees to optimize the following types T such
| that Option<T> has the same size as T:
|
| > - Box<U>
|
| > - &U
|
| > - &mut U
|
| > - fn, extern "C" fn
|
| > - num::NonZero*
|
| > - ptr::NonNull<U>
|
| > - #[repr(transparent)] struct around one of the types
| in this list.
|
| > This is called the "null pointer optimization" or NPO.
|
| > It is further guaranteed that, for the cases above, one
| can mem::transmute from all valid values of T to
| Option<T> and from Some::<T>(_) to T (but transmuting
| None::<T> to T is undefined behaviour).
|
| Although I'm not sure if the ABI is guaranteed to match
| (Which could differ even if the layout matches AFAIK).
| The ABI for Box is guaranteed to match since
| https://blog.rust-lang.org/2020/01/30/Rust-1.41.0.html ,
| and I would imagine NonNull is the same. Maybe you could
| open an issue in the UCG asking if you're needing
| confirmation.
| josephg wrote:
| Thats fair.
|
| I suppose one problem with this approach for C FFI is
| that there's a lot of different values which could all be
| "null pointers". Converting them all for Option would be
| awkward and slow, and you wouldn't want to ever risk
| getting this stuff wrong.
|
| But pointers are also useful even if you aren't doing
| FFI. Eg for implementing custom data structures. In that
| case, Option<NonNull<T>> (Or even NonNull<T>) is usually
| better than *mut T. But it's harder to type, and it
| doesn't clearly tell you if the pointer should be *mut T
| or *const T. NonNull<T> should be preferred _because_
| security/safety is at stake for this sort of code.
| nemothekid wrote:
| > _When people asking my opinion for Rust, I loved to share
| them the Linkedin List implementation link:_
|
| This LinkedList obsession is a bit bizarre to me, and tends to
| come from older programmers who come from a time when coding
| interviews involved writing linked lists and balancing b-trees.
| To me though it also represents the stubbornness of C
| programmers who refuse to consider things like growable vectors
| a solved problem. My reaction to the LinkedList coders is not
| "well Rust needs to maintain ownership", its _why does your
| benchmark for how easy a language is involve how easy it is to
| fuck around with raw pointers_?.
|
| LinkedLists are a tool, but to C programmers they are an
| invaluable fundamental building block that shows up early in
| any C programmer's education due to how simple they are to
| implement and the wide range of use cases they can serve.
| But they are technically an unsafe data structure, and if you're
| willing to let some of that stubbornness go and finally accept
| some guard rails, you have to be able to see that a data
| structure like a linked list will be harder to implement.
|
| It has nothing to do with the language; implementing with
| LinkedLists with any sort of guardrails adds a ton of
| complexity, either up front (e.g. borrowchecker) or behind the
| scenes (e.g. a garbage collector). When you accept this fact,
| it becomes ludicrous to imply that a LinkedList implementation
| is a good benchmark for the ergonomics of a language like Rust.
| bjourne wrote:
| Wtf? That code looks like shit.
| riwsky wrote:
| Also the implementation is all wrong. The proper way to add an
| element to a linkedin list is to send the element an email
| titled "I just requested to connect", along with a "view
| profile" and "accept" link.
| chlorion wrote:
| Considering that unsafe rust basically just allows raw pointer
| access (see link below) which is already a thing you can do
| normally in C, I do not see how that's a very good argument
| against it honestly.
|
| As for the Option type, it is exactly what it says. It's an
| optional, non-nullable node generic over type T. I suppose
| generics, optionals and the idea of non-nullable types might be
| exotic at first, but this is not a problem with Rust as much as
| it's a problem with C not being able to express these concepts
| in the first place and instead expecting you to spray null
| checks all over the place!
|
| https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html
| scaramanga wrote:
| Not to be overly contrary here but, you "need to spray null
| checks" probably just as much in rust as in C (in this
| specific example, not in general). Since those prev/next
| pointers are being modified in unsafe code where either
| you're using NonNull::new() which contains the null check, or
| you're using NonNull::new_unchecked() which means you need to
| convince yourself all the invariants hold true. The situation
| is roughly equal in C (ie. once you prove in a module that
| fields cannot be NULL, no need to add lots of redundant extra
| NULL checks in all users of the module).
| andrewflnr wrote:
| In the Rust ecosystem philosophy, linked lists are the kind of
| thing you want to write once, very carefully, then re-use a
| lot. That's exactly the kind of code where unsafe can be
| reasonable. It's not like it would actually be safer in any
| other language that didn't have an explicit unsafe keyword.
| More aesthetically pleasing, perhaps, but not safer.
|
| You're acting like it's some epic burn on Rust that there's
| some ugly code in its stdlib. It's not. Stdlibs are like that.
| Furthermore, as others have pointed out, linked lists in
| particular are a pathological case for Rust, meaning you're
| pointing and snickering at a special case of a special case.
| Most people never have to write or look at that kind of code.
| kragen wrote:
| Functional programming languages like Haskell, ML, and Lisps
| use linked lists very extensively and rarely have safety
| problems as a result. On the contrary, because linked lists
| allow you to build a list up efficiently without mutation,
| they give you a good way to avoid all the safety problems
| that come from mutation. In general just about any kind of
| data structure in these languages is some sort of weird
| linked list.
|
| This isn't the best solution for all cases! Sometimes
| performance is more important than safety, and there are
| things you can't do as efficiently without mutable data. And
| sometimes (especially for systems that are reactive rather
| than transformational, in Harel and Pnueli's terms, or for
| libraries) a garbage collector is an unacceptable cost.
|
| And Rust is actually reasonably good at singly-linked lists;
| it's just doubly-linked lists (which inherently feature
| mutation) that are troublesome. Singly-linked lists with
| sharing through Arc are a little more expensive in Rust than
| in something like GHC, Ikarus Scheme, or OCaml, but that's
| hardly a damning criticism of Rust.
|
| But you seem to be claiming that linked lists are unsafe in
| any language, and to anyone familiar with functional
| programming, that claim is obvious nonsense. Did you intend
| it?
| andrewflnr wrote:
| I should have been more specific about doubly linked lists.
| And I think the implicit context for comparisons with Rust
| is usually pointer-heavy non-GC languages, especially given
| OP, but maybe I should have specified that, too. It's
| certainly true that DLL's are tricky to write, and tricky
| to mechanically prove correct, which is mainly what
| 'unsafe' actually means.
| kragen wrote:
| Thank you for clarifying!
|
| I would think that the implicit context for _safety_
| comparisons with Rust would be other languages designed
| with safety as a goal. Those languages are almost all
| GCed! ML, Haskell, Scheme, Oberon, Coq, Erlang -- all
| GCed. That 's because GC helps enormously with type- and
| memory-safety. In the example in question, it ensures
| that your doubly-linked list manipulation cannot cause
| undetected type errors, use-after-free errors, double-
| free errors, or memory leaks; this last guarantee goes
| beyond what Rust can offer in this case. In ML, Haskell,
| and Oberon, it even ensures that all type errors are
| detected at compile time instead of run time.
|
| There are a few languages designed with safety as a goal
| that do not normally use GC: Pascal, Ada, COBOL, and
| MISRA C. But they suffer enormously in expressivity, are
| not pointer-heavy, and are generally only viable
| alternatives in restricted domains.
|
| The usual pointer-heavy non-GC languages are C and C++,
| and these have no concern whatsoever for safety, so it
| seems like a weak-man argument to target linked-list
| manipulation in them as unsafe relative to Rust.
|
| The correctness criteria for doubly-linked lists are
| indeed not within the capability of Rust's (or Haskell's,
| or ML's, or Oberon's) static checking. That doesn't imply
| that it's impossible to statically verify doubly-linked-
| list manipulation code, and in fact I think Ada SPARK is
| capable of doing it. Certainly it's within the scope of
| what you can prove with Coq or Isabelle/HOL.
| svnpenn wrote:
| this comes off as extremely defensive. I don't feel like the
| person you're responding to, intended any kind of "sick
| burn". Calm down dude.
| andrewflnr wrote:
| Whatever. They're definitely pointing and snickering. I
| don't think there's any way to read their post but as a
| criticism of Rust the language based on this one piece of
| stdlib code, and that's nonsense. That irritates me. So sue
| me.
|
| There are reasonable arguments to be had about Rust. I wish
| people would do those instead of all the nonsense you
| usually see.
| svnpenn wrote:
| on the contrary, I think more and more people are
| unwilling to hear any criticism, even constructive
| criticism, of "their thing".
|
| Rust is not perfect. I wont go into its faults, but you
| know what they are. I think overall, Rust is a great
| language, but you have to be realistic and understand
| that some stuff you ignore in favor of the holistic view,
| others might not be able to. So sometimes maybe just take
| the criticism and deal with it, or if you're in the
| position, do something to fix the problem.
| andrewflnr wrote:
| Contrary to _what_ exactly that I said?
| svnpenn wrote:
| to this:
|
| > I wish people would do those instead of all the
| nonsense you usually see
|
| you seem to want to place the blame squarely on the
| critics, when the reality is that many people in your
| position are seemingly intolerant of any criticism of
| "their thing", even constructive criticism.
| andrewflnr wrote:
| Ah, I see, since I'm defending Rust in this sub-thread,
| it's impossible for me to ask for constructive criticism
| instead of bullshit without it being "seemingly
| intolerant", while you're the reasonable one for doing
| the exact same thing. Basically just assuming I'm an
| idiot and/or arguing in bad faith. Lovely.
| svnpenn wrote:
| they linked to the source code. What bar do they have to
| reach, before you honor it with being constructive? the
| point is, there is no bar. unless its praise to Rust,
| then its trolling in your eyes.
| Dylan16807 wrote:
| The source code of the standard library is usually an
| awful way to show off any language. Pointing and gawking
| at such a thing as criticism is very bad criticism.
| andrewflnr wrote:
| Yeah, I already made that point in my root comment. It
| was pretty much my whole thesis, in fact. Doesn't matter,
| apparently! svnpenn is determined to confuse me for
| someone dumber. :)
| SighMagi wrote:
| There's so much code marked "unsafe" in there...
| estebank wrote:
| That's because in order to interact with pointers you need
| unsafe, and not using unsafe in Rust requires a single
| ownership tree _or_ reference counting. If you're not keen
| on either of those solutions, you shouldn't be writing a
| Doubly Linked List in other languages either (because they
| will have either pointers or reference counting/GC).
| andrewflnr wrote:
| "Those solutions" start off being hierarchical ownership or
| reference counting, but turn into "pointers or reference
| counting". Requiring pointers is implicit to the definition
| of linked lists, and calling it out here is silly. And you
| don't really need either ref counting (again, pretty
| obviously, witness every real world implementation) or
| hierarchical ownership outside Rust.
|
| Conceptually, the list as a vague concept owns all the
| nodes. The ownership just stops being expressible directly
| through references the way Rust wants it to be. But you
| don't have to re-invent hierarchical ownership to implement
| DLLs in other languages, because it's not really there in
| the problem. You just have to accept that they're always
| going to suck under Rust's ownership model.
| estebank wrote:
| > Requiring pointers is implicit to the definition of
| linked lists, and calling it out here is silly.
|
| The usual complaints are that "Safe Rust doesn't let you
| do this", when unsafe Rust is right there to let you
| operate on any soup of pointers datastructure that you'd
| want to implement.
|
| > you don't really need either ref counting (again,
| pretty obviously, witness every real world
| implementation)
|
| If you implement a DLL in a memory-managed language, you
| effectively have the same behavior as you would in Rust
| with Arc or Rc. The ownership of the nodes then belongs
| to the GC or the RC value. You don't have to think about
| it because you're abstracted from the lifetime of each
| node by the language and runtime.
|
| > or hierarchical ownership outside Rust
|
| The ownership/borrow checker requires hierarchical ownership
| to operate, but it doesn't stop you from writing unsafe
| Rust.
|
| > Conceptually, the list as a vague concept owns all the
| nodes. The ownership just stops being expressible
| directly through references the way Rust wants it to be.
| But you don't have to re-invent hierarchical ownership to
| implement DLLs in other languages, because it's not
| really there in the problem. You just have to accept that
| they're always going to suck under Rust's ownership
| model.
|
| Yeah, and that's what unsafe is for. Most code doesn't
| need vague ownership, though. I'd go as far as saying
| that the nudge towards clear ownership helps both
| maintainability and performance.
|
| My concern is with the prevalence of this idea that
| unsafe Rust is not "real Rust" or that the existence and
| use of unsafe Rust somehow precludes any benefits of
| Rust.
| db48x wrote:
| Technically the pointers don't have to be actual
| pointers. And if they're not actually pointers, then
| Rust's ownership model and borrow checker will be
| perfectly content; they'll barely trouble you at all. And
| you won't even need any unsafe blocks.
|
| What you do instead is use integers instead of pointers.
| The integers are indexes into a Vec of list nodes, owned
| by the linked list. Since the nodes are now owned only by
| the Vec, which is in turn owned only by the linked list,
| the borrow checker will not complain.
|
| Some people object that this is cheating somehow, but
| what is memory but a giant untyped global vector, shared
| between all parts of your application? Pointers into that
| giant shared array are just indexes with extra risk,
| since you have to trust that they point to real nodes
| that have been initialized. Plus, you often see users of
| a linked list put the nodes into an arena allocator
| anyway, especially in the kernel. The Vec in your Rust
| implementation serves the same purpose as the arena
| allocator.
| tracker1 wrote:
| There be dragons...
| xdavidliu wrote:
| > I love to share with them the Linkedin List implementation
| ... if you can understand Linkedin List ...
|
| is the "In" in "LinkedIn" deliberate here or just a typo that
| was made twice?
| mastax wrote:
| C++'s STL linked list for comparison (libcxx).
|
| https://github.com/llvm-mirror/libcxx/blob/master/include/li...
| einpoklum wrote:
| Rust's list only supports a default allocation mechanism. For
| an actual comparison, you'll want to show us a Rust
| implementation which is parametrized over allocators.
|
| Having said that - yes, it's not pretty. About the same
| length though.
| mastax wrote:
| Adding allocator support does add a bit more noise:
|
| https://github.com/rust-
| lang/rust/pull/103093/files#diff-8bd...
| kortex wrote:
| Oof. Yeah, standard lib code tends to be harder to read on
| average than application code. But at least I can mentally
| parse the Rust code. Heck it even looks like a linked list
| impl. The C++ code looks barely better than line noise. I'm
| sure I could wrap my head around it, but ho boy, it's far
| worse than Rust by a long shot.
|
| And this is with several years of C++ experience, and maybe a
| few months of Rust on and off.
| hnov wrote:
| Isn't a doubly linked list basically incompatible with Rust's
| idea of memory ownership?
| proto_lambda wrote:
| It's definitely not a data structure that makes the borrow
| checker happy. Luckily it's also not a data structure that's
| required or desirable in 99% of cases, but the standard
| library still offers an implementation for the other 1% so
| people
| don't try to write their own (turns out getting a doubly
| linked list correct isn't quite as trivial as it may seem).
| Waterluvian wrote:
| I'm actually a bit surprised it's in the standard library
| if it's so uncommonly needed. Isn't Rust's stdlib tiny with
| a philosophy of letting the community fill in most things?
| estebank wrote:
| There were arguments made for its non-inclusion[1].
|
| [1]: https://rust-unofficial.github.io/too-many-
| lists/sixth.html
| Waterluvian wrote:
| I enjoyed reading this. A fun style.
| zozbot234 wrote:
| AIUI, GhostCell and related patterns can be used to implement
| doubly linked lists in safe Rust. These patterns are quite
| obscure and hard to use in practice, though.
| tmtvl wrote:
| From what I remember Rust has some kind of RefCell with weak
| references. Those could be used to make DLLs, if anyone can
| find a reason to use those. That said, you could also use
| zippers...
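|
| Something like this, if I remember right: Rc<RefCell<...>>
| for the forward links and Weak for the back links, so the
| cycle doesn't leak (a sketch):
|
|     use std::cell::RefCell;
|     use std::rc::{Rc, Weak};
|
|     struct Node<T> {
|         value: T,
|         next: Option<Rc<RefCell<Node<T>>>>,
|         prev: Option<Weak<RefCell<Node<T>>>>,
|     }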
| slaymaker1907 wrote:
| Now imagine writing the drop implementation for a DLL that
| doesn't cause a stack overflow.
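|
| The usual fix is an iterative Drop that detaches each node
| before its destructor can recurse; a sketch for a Box-linked
| singly linked list (the same idea applies to Rc-based DLLs):
|
|     struct Node<T> {
|         value: T,
|         next: Option<Box<Node<T>>>,
|     }
|
|     struct List<T> {
|         head: Option<Box<Node<T>>>,
|     }
|
|     impl<T> Drop for List<T> {
|         fn drop(&mut self) {
|             // Pop one node per loop iteration so dropping a
|             // million-node list doesn't recurse a million deep.
|             let mut cur = self.head.take();
|             while let Some(mut node) = cur {
|                 cur = node.next.take();
|             }
|         }
|     }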
| steveklabnik wrote:
| RefCell<T> and weak references are two different things,
| but Rust has both, yes.
| brundolf wrote:
| It's at odds with it, but so are many other data structures
| found in the standard library. The solution is usually to use
| unsafe { } inside the data structure to allow for those
| criss-crossing pointers (while exposing safe, checker-
| friendly interfaces to the outside world)
|
| This is a great resource that outlines several different
| strategies for implementing linked lists in Rust:
| https://rust-unofficial.github.io/too-many-lists/
| kibwen wrote:
| When explaining why the difficulty of modeling linked lists in
| Rust doesn't matter, I like to share the following book, "Learn
| Rust With Entirely Too Many Linked Lists", written by the woman
| who wrote the implementation you've linked above: https://rust-
| unofficial.github.io/too-many-lists/
| adeptima wrote:
| My first thought was that it was a joke ... ;)
|
| Learn Rust with entirely too many linked lists (2019) -
| https://news.ycombinator.com/item?id=22390662 - https://rust-
| unofficial.github.io/too-many-lists/index.html
|
| In this series I will teach you basic and advanced Rust
| programming entirely by having you implement 6 linked lists.
| In doing so, you should learn:
|
| - The following pointer types: &, &mut, Box, Rc, Arc,
| *const, *mut, NonNull(?)
|
| - Ownership, borrowing, inherited mutability, interior
| mutability, Copy
|
| - All The Keywords: struct, enum, fn, pub, impl, use, ...
|
| - Pattern matching, generics, destructors
|
| - Testing, installing new toolchains, using miri
|
| - Unsafe Rust: raw pointers, aliasing, stacked borrows,
| UnsafeCell, variance
| keepquestioning wrote:
| I read this and noped out of learning Rust.
| hinkley wrote:
| Are you bragging about self-imposed ignorance of advanced
| programming topics?
|
| I guess there's plenty of space in the world for PHP
| programmers and their predecessors, the VB programmers.
| So... congrats?
| kelnos wrote:
| So I glanced at it, expecting something much much worse,
| but I was surprised to see it wasn't that bad. Even if you
| look at the final code for the final implementation, it
| doesn't seem much more complex than what you'd have to
| write in C. And a good chunk of the code is implementing
| traits from the stdlib, stuff that isn't strictly
| necessary, but makes it easier to make the API of the deque
| idiomatic and work well with other stdlib (and non-stdlib)
| APIs.
|
| And regardless, this book is teaching from the perspective
| of intentional super-hard-mode. Most Rust developers will
| never have to write that much unsafe code.
| AtNightWeCode wrote:
| People must stop giving attention to old garbage solutions
| that should not be used. Never use a linked list if you
| don't specifically need a linked list. You probably don't
| know why you would need one, so don't go there.
| int_19h wrote:
| If you stop "giving attention" to old solutions, people will
| simply reinvent them. Linked lists in particular are very easy
| to reinvent because they're so simple, and because their
| advantages are all evident even in a high-level language,
| while their disadvantages require a more low-level
| understanding of modern hardware performance.
| tester756 wrote:
| It could be way better advice if you tried to explain why.
| deafpolygon wrote:
| Honestly, a lot of programming data structure concepts were
| difficult to grasp /until/ I truly understood how a linked
| list works. It's a very simple concept in hindsight, but as
| a newbie programmer this was an alien concept. Linked list?
| Just give me my array or built-in List! Then I understood
| how to build a linked list and how it can be extended in
| various ways. They can be used to solve interesting
| problems. Many high-level languages provide pre-baked
| implementations of LLs, but most programmers don't
| understand how they work under the surface. When I
| understood LLs, I was able to apply that understanding to
| trees.
| GnarfGnarf wrote:
| I have built my business on linked lists (genealogy trees). Paid
| for my house :o)
| visarga wrote:
| That was my high school programming fun - implement linked lists
| and other data structures in TurboPascal. Pascal does not
| come with dynamic lists, only fixed-size arrays and manual
| memory allocation, so I had to build linked lists just to
| have a decent list primitive.
|
| A few years later I discovered Perl, which comes with
| dynamic-size lists, dictionaries and strings. Whoa, that
| was a treat, so very
| nice to use and efficient. To this day I prefer constructs made
| of lists and dicts 10x more than defining classes.
| dragontamer wrote:
| The most interesting thing about linked lists to me is how
| trivially it can be made lock-free multithreaded with atomics.
|
| An array/stack can be made multithreaded but it is non-trivial to
| handle the edge-cases. In particular, the ABA problem is rather
| difficult. I've seen many solutions to it (ex: a synchronization
| variable that puts the stack into "push-only" mode and then "pop-
| only" mode. There's no ABA-problem if all threads are pushing!)
|
| However, pushing/popping from a linked-list stack requires
| no such synchronization at all. Simply compare-and-swap the
| head (and on failure, try again). It's about as simple as
| you can get when it comes to atomic / lock-free patterns.
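|
| A sketch of the push side in Rust (pop is where the ABA
| trouble discussed below comes in):
|
|     use std::ptr;
|     use std::sync::atomic::{AtomicPtr, Ordering};
|
|     struct Node<T> {
|         value: T,
|         next: *mut Node<T>,
|     }
|
|     pub struct Stack<T> {
|         head: AtomicPtr<Node<T>>,
|     }
|
|     impl<T> Stack<T> {
|         pub fn new() -> Self {
|             Stack { head: AtomicPtr::new(ptr::null_mut()) }
|         }
|
|         pub fn push(&self, value: T) {
|             let node = Box::into_raw(Box::new(Node {
|                 value,
|                 next: ptr::null_mut(),
|             }));
|             loop {
|                 let head = self.head.load(Ordering::Relaxed);
|                 // Point the new node at the current head...
|                 unsafe { (*node).next = head };
|                 // ...and retry if another thread won the race.
|                 if self
|                     .head
|                     .compare_exchange(
|                         head,
|                         node,
|                         Ordering::Release,
|                         Ordering::Relaxed,
|                     )
|                     .is_ok()
|                 {
|                     return;
|                 }
|             }
|         }
|     }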
| charleslmunger wrote:
| I did this for an object pool in re2j and saw even single-
| threaded performance improve.
|
| https://github.com/google/re2j/blob/dc7d6e5d41225dc0825ea6fe...
|
| Java doesn't suffer from pointer address ABA but I did have to
| handle reinsertion (except when the stack had only one
| element).
| [deleted]
| yeputons wrote:
| > Simply compare-and-swap the head (and on failure, try again).
|
| It doesn't help with ABA at all, does it? Unless you assume
| that nodes are immutable and the same pointer is never reused,
| of course.
| dragontamer wrote:
| I forgot about that other ABA issue.
|
| Still, that's easily solved: 64-bit version number + 64-bit
| pointer (or 32-bit version number + 32-bit pointer) for a
| 128-bit (or 64-bit) compare-and-swap.
|
| All modern CPUs support 128-bit CAS.
|
| EDIT: The version number is incremented by +1 each time. It
| is unlikely that you'd overflow a 64-bit version number and
| risk an ABA problem, though 32 bits can be overflowed
| surprisingly quickly on today's computers.
|
| --------
|
| EDIT2: Note: I'm pretty sure (but not 100% sure) that the
| Head of a linked-list stack can be "simply" compared-and-
| swapped to remain lock-free and 100% valid (ie: 64-bit
| compare and swap over the 64-bit pointer). I did not mean to
| imply that you can "CAS any arbitrary node of a linked-list".
| A fully generic linked list is possible and I've seen it with
| the 128-bit CAS operator, but that wasn't what I was going
| for originally.
| murderfs wrote:
| > EDIT2: Note: I'm pretty sure (but not 100% sure) that the
| Head of a linked-list stack can be "simply" compared-and-
| swapped to remain lock-free and 100% valid (ie: 64-bit
| compare and swap over the 64-bit pointer).
|
| No, you cannot. The problem is what you're comparing and
| swapping into the head during a pop. You want to do the
| moral equivalent of `current = list;
| list.compare_exchange(current, current->next)`, but
| current->next might have changed if someone else popped the
| original head, pushed something else, and then pushed the
| original head again.
|
| You need double CAS or LL/SC or a more complicated scheme
| to make this work.
| AstralStorm wrote:
| You can get away with single CAS if you don't need full
| size pointers. Of course then you're actually using your
| list to refer to an array.
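|
| E.g. a 32-bit index into a node array plus a 32-bit version
| counter packed into one word, so a plain 64-bit CAS covers
| both (a sketch; names are made up):
|
|     use std::sync::atomic::{AtomicU64, Ordering};
|
|     fn pack(index: u32, version: u32) -> u64 {
|         ((version as u64) << 32) | index as u64
|     }
|
|     fn unpack(word: u64) -> (u32, u32) {
|         (word as u32, (word >> 32) as u32)
|     }
|
|     // Bump the version on every successful swap so a recycled
|     // index can't be mistaken for the one read earlier (ABA).
|     fn swap_head(head: &AtomicU64, new_index: u32) {
|         loop {
|             let old = head.load(Ordering::Acquire);
|             let (_, version) = unpack(old);
|             let new = pack(new_index, version.wrapping_add(1));
|             if head
|                 .compare_exchange(
|                     old,
|                     new,
|                     Ordering::AcqRel,
|                     Ordering::Acquire,
|                 )
|                 .is_ok()
|             {
|                 return;
|             }
|         }
|     }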
| dragontamer wrote:
| It should be noted that most x86-64 machines only use
| 48-bit virtual addresses, leaving 16 free bits you can use
| for tag data.
|
| Some newer Intel chips support 57-bit addresses via 5-level
| paging, but no mainstream CPU interprets all 64 bits of a
| pointer.
|
| So yeah, you can shove count bits in there, either 16 of
| them or 7 of them or so, as long as you restore a canonical
| address before dereferencing.
| dragontamer wrote:
| Fair catch.
|
| That's obscure but it looks like you're correct in this
| case.
| murderfs wrote:
| It might not be immediately obvious that this can happen,
| but in practice, this scenario happens very frequently,
| because if you free a node and then malloc a node, you're
| very likely to get the same address back with most memory
| allocators.
| [deleted]
| ww520 wrote:
| Linked lists are useful when you are hurting for memory and
| need precise memory allocation, like in the kernel. You trade
| memory compactness and some per-operation speed for that
| allocation precision.
| AstralStorm wrote:
| And code size, let's not forget about that one. Plus you do not
| have a super smart memory allocator to prevent the hundreds of
| vectors from spraying the heap.
| ayhanfuat wrote:
| For completeness, here's what Bjarne Stroustrup says on why you
| should avoid using linked lists: https://youtu.be/YQs6IC-vgmo
| db48x wrote:
| This is a video everyone should watch.
| raible wrote:
| FWIW I make good use of a "fake" linked list by adding a
| .next field to the struct in a compiler-allocated array of
| structs. On initialization I set the list head to &item[0],
| and in a trivial loop set the .next field of each entry
| (except the last) to entry+1.
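|
| A sketch of that initialization (in Rust, with indexes
| standing in for the raw pointers, which would need unsafe;
| names are made up):
|
|     struct Item {
|         data: u32,
|         next: Option<usize>, // index of the next item
|     }
|
|     fn link_in_order(items: &mut [Item]) {
|         // The head is simply index 0 (&item[0] above); each
|         // entry except the last points at the one after it.
|         let last = items.len().saturating_sub(1);
|         for i in 0..last {
|             items[i].next = Some(i + 1);
|         }
|     }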
|
| Why bother? Because I can then easily add a few extra structs to
| the beginning of the (contiguously-allocated) linked-list without
| having to reallocate the whole thing.
|
| Sure, pointer chasing with separately allocated structs is
| "slow", but I haven't yet measured to see if it's any different
| when (almost all) items are contiguous.
|
| If you would...
|
| - what sort of cache behavior should one expect of this on a
| modern laptop CPU?
|
| - I haven't seen this approach before, have you?
| Sirened wrote:
| It depends on what prefetchers your CPU has and the actual
| underlying access pattern your accesses cause. If chasing next
| leads to a constant stride run through the array, you'll get
| identical performance to that of an iterative walk since
| essentially every high performance CPU since like 2010 supports
| stride prediction based on raw addresses. If .next sends you
| bouncing all over the area, you'll get complicated behavior
| since whether or not the data can be prefetched depends on the
| performance/existence of a pointer prefetcher, which is less
| common/more error prone. We know Apple's M1 have it due to some
| researchers using it as a side channel [1] but you'll have to
| do some digging on whether or not your laptop has one. Would
| make a nice post here if you do make the benchmarks :)
|
| [1] https://www.prefetchers.info/augury.pdf
| zebb wrote:
| > In defense of linked lists
|
| In defense of what? A brigade of null pointer exceptions?
|
| > Linked lists are conceptual. A node pointing to itself is the
| most self centered thing I can imagine in computing: an ideal
| representation of the more vulgar infinite loop. A node pointing
| to NULL is a metaphor of loneliness. A linked list with tail and
| head connected, a powerful symbol of a closed cycle.
|
| Oh, this article is complete satire. Bravo, you had me
| elteto wrote:
| I much prefer deques to linked lists when I need O(1) push/pop
| but still want to keep some semblance of cache locality. Although
| you can't splice two deques together as easily as two linked
| lists.
|
| Sadly, you can't really control the block size in C++'s
| std::deque so you can't communicate useful information to the
| library about your use case. I think MSVC allocates such a small
| block that it is effectively a linked list.
| woojoo666 wrote:
| I feel like to compare linked lists and deques, you would
| have to give a specific implementation of a deque
| Jtsummers wrote:
| Correct, deques are abstract data types and you can implement
| them multiple ways, including using linked lists or growable
| arrays/vectors. So they aren't directly comparable to linked
| lists since under the hood they can be implemented with
| linked lists (or doubly linked lists at least). You could
| compare the performance of different deque implementations,
| though.
| intrepidhero wrote:
| How does a deque improve the situation with cache locality
| versus a linked list?
| klyrs wrote:
| It depends on the implementation. Python's deque, for
| example, is implemented as a two-level data structure: a
| doubly linked list of blocks. If you've got page-sized
| blocks, then your deque can be sufficiently local for most
| cases. It introduces a little extra logic, and you'll want to
| keep around a spare block to prevent thrashing memory if you
| repeatedly push/pop at the block boundary, but it's often
| worth the pain if the data structure is anywhere near a hot
| loop.
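|
| The shape is roughly this (a sketch, not CPython's actual
| layout):
|
|     // A deque as a linked list of fixed-size blocks: two
|     // neighboring elements usually share a block, so walks
|     // stay cache-friendly, and push/pop only allocates when
|     // a block boundary is crossed.
|     const BLOCK: usize = 64;
|
|     struct Block<T> {
|         items: Vec<T>, // holds at most BLOCK elements
|         next: Option<Box<Block<T>>>,
|     }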
| zwkrt wrote:
| Most deques are implemented under the hood with arrays. A
| linked list usually requires a heap allocation for each
| element.
|
| This is speaking in generalities though, since you could have
| an alternatively implemented deque, or a linked list entirely
| on the stack.
| blibble wrote:
| it's typically implemented as an array (e.g. circular buffer)
| Shorel wrote:
| It is usually implemented as linked arrays: for example,
| arrays of 16 or 64 elements; if it grows over that size, new
| arrays or vectors are allocated underneath.
|
| Two consecutive elements are probably in the same array, and
| that helps with cache locality.
| layer8 wrote:
| Note that a deque in general is an ADT that isn't necessarily
| implemented in the way std::deque commonly is. A deque can be
| implemented as a linked list.
| javajosh wrote:
| Linked lists are cool, but he missed one reason why: how
| easily they become n-ary trees. A tree expressed like this
| is quite elegant and useful in a variety of domains, my
| favorite being a "trie": an n-ary tree that computes the
| label of each node as the label of its parent plus the
| string stored in the node. It's a wild and hairy data
| structure, but very cool once tamed.
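|
| The classic encoding is first-child / next-sibling: the same
| two links a list node carries, read as a tree (a sketch):
|
|     // Each node heads a linked list of its children, chained
|     // through next_sibling; the node's label is derived from
|     // its parent's label plus its own string.
|     struct TreeNode {
|         label: String,
|         first_child: Option<Box<TreeNode>>,
|         next_sibling: Option<Box<TreeNode>>,
|     }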
| koinedad wrote:
| They are really cool. Deleting an element by assigning next to
| next.next still blows my mind.
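|
| In Rust terms, with an Option<Box<Node>> next field, that
| one-liner looks like this (a sketch):
|
|     struct Node {
|         value: i32,
|         next: Option<Box<Node>>,
|     }
|
|     // node.next = node.next.next, spelled out with the moves
|     // Rust wants; the removed node is freed when it drops.
|     fn remove_after(node: &mut Node) {
|         if let Some(mut removed) = node.next.take() {
|             node.next = removed.next.take();
|         }
|     }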
| Jtsummers wrote:
| If you think that's neat, check out Dancing Links used for
| backtracking algorithms:
| https://en.wikipedia.org/wiki/Dancing_Links
___________________________________________________________________
(page generated 2022-11-05 23:01 UTC)