https://blog.toit.io/hash-maps-that-dont-hate-you-1a96150b492a The Toit Take The Toit Take Hash maps that don't hate you Erik Corry Erik Corry Follow Jul 1 * 7 min read The map data-structure is incredibly useful when programming. The first time I used Perl the thing that most impressed me was how easy it was to use hashes, as they call them. I used them for everything! Map is a central feature in Go too -- it's one of the built-in object types with syntax support and generics. I think most people who program in Python have a similar experience. They call them dictionaries, and they are everywhere. In JavaScript, too, every object is a sort of a hash map. When I worked on V8 we had to go to huge lengths to hide the performance consequences of that decision, but I'm still a fan of easy-to-use hash maps in languages. But all these languages initially had hash maps that were unordered. You can put all your key-value pairs in the hash map, but when you want to go through the content of your map you get them back in some strange order. It's not the order you put them in, and it's almost certainly not the order you wanted. Go and Perl decided to fix this by making hash ordering more random. JavaScript and Python went the other way and fixed the ordering to be the same order as insertion. (There's one exception in JS objects -- numeric property names are ordered numerically, not by insertion order. We got a lot of hate on V8's bug tracker for that one!) The best way to make hash maps efficiently insertion ordered is some variation of what Python calls the "compact dictionary", popularized/ rediscovered by Raymond Hettinger. (According to Hettinger there is some prior art, implemented in Java, and it's actually similar to the internal object implementation of V8, invented by Lars Bak, also one of Toit's co-founders.) In compact dictionaries, the keys and values are inserted in a linear array, and there is a more conventional open addressing hash table that indexes into the linear array. It looks something like this. [1] This is what a hash map looks like in Toit, too. In fact, the hash numbers in the index table are not needed, but they make lookup faster. Storing the last few bits of the hash code lets us detect most of the collisions in the index without having to look at the keys in the backing array. The other advantage of having hash codes in the index is that it speeds up rebuilding the index when it becomes too full. If the user removes an entry from a hash map of this type we replace the key with a marker that indicates this entry has been removed. Perhaps a little morbidly, we call these markers "tombstones". When there are enough of these tombstones, the hash map gets rebuilt, but that's OK. Due to the magic of amortized constant time, this doesn't slow you down on average. This is all very cool because now there's an efficient way to make hash maps with insertion ordering. These are hash maps that don't hate you, and that's why we used them in the Dart project too, now probably best known as the engine behind Flutter. (Some of the Dart founders went on to found Toit.) Other ways to know your hash maps don't hate you The other feature you quickly find yourself needing is the ability to use arbitrary objects as keys in a hash map. In languages that don't support that (Perl, Python, Go, JavaScript until recently) you can use strings as keys, generating a unique string for each key object. This isn't very elegant, and your program creates a lot of strings that didn't really need to exist. Somewhat more advanced is being able to specify the equality function for the keys in a map. In Java and Toit, for example, you can specify a custom equals method for the class of the keys in a collection. In contrast, the map implementation in JavaScript only uses object identity. You can't create a new object and then discover whether an equivalent key is already in the map. Again the inelegant workaround is for each to generate strings and use those as keys. Going even further, it is sometimes necessary to specify the equality function per map instead of per-key type. This is available in C++ in the form of the Hash and Pred template parameters on unordered_map. (But as the name unordered_map indicates, the standard C++ collections hate you, so this advanced feature is little comfort.) Currently, we don't support per-map equality functions in Toit, but it's coming soon! In Java, you would have to wrap the keys in your map or use Trove. The fly in the ointment and how to remove it Even if your language has compact dictionaries, there's still a tiny fly in the ointment. Under certain circumstances, your hash maps will still act as if they hate you, but in a new way! Some people like to use hash maps (and sets, which have the same implementation) as a sort of deduplicating work queue. You can add "tasks" to the set, and it will automatically ignore duplicates because it's a set. A task queue (also called FIFO) needs an append-to-the-end operation and a remove-from-the-start operation. The append-to-the-end operation we already have, and it's easy to make a remove-first operation by looking at the start of the backing array. If the Toit language didn't already have this function, you could add it as follows: /// Function that takes a set, removes the and returns the first element /// Function that takes a set, removes the and returns the first element [1] [1] The problem arises after a while of using a set or map like this. There will be a lot of tombstones near the start of the backing, and suddenly the remove-from-the-start operation isn't so fast. It needs to skip over more and more tombstones, and the run time to insert-then-remove n items becomes quadratic. This issue can't be solved by rebuilding the hash map more frequently, because if you do that, you lose the property of amortized constant time operations. An ad-hoc solution would be to record for each hash map where the first entry is in the backing array. This feels like a band-aid though. What if users want to repeatedly remove the last entry using a set as a sort of deduplicating stack (or LIFO)? The hateful quadratic hash map problem would be back. What if they want to repeatedly remove the second entry in the set? One thing I learned on V8 is that it's hard to predict what your users will do with a programming language implementation you have built. And we built Toit to work on tiny devices. Do we really want to spend an extra word per hash map to record the number of leading tombstones? The solution we came up with was tombstones with distance markers. Each tombstone additionally records the distance to the next real entry, either on the left or the right. We update this lazily when scanning through the backing, so it doesn't cost any time on other operations, but it lets us skip long stretches of tombstones wherever they appear. [1] [1] For the common case where there are no deletions, or the deletions are randomly distributed, nothing changes, but for the problematic cases we retain the desirable property of amortized constant time insertion and deletion. Testing languages for this bug I wrote a tiny program to test whether language implementations have quadratic slowdowns when using a set as a FIFO. I'm only testing languages that have an insertion-ordered set. For JavaScript I'm using the new(ish) Set, rather than abusing ordinary objects as hash maps. In Java I used LinkedHashMap, which has a completely different internal representation. I suspect LinkedHashMap of being fairly profligate in memory use (I'll be blogging about this issue soon), but it's the fastest insertion ordered set I tested, especially when the adaptive JIT kicks in. So here, for your enjoyment, is the graph for various languages. Currently, all the compact-dictionary-based implementations (except Toit) have this issue. Hopefully, this blog post will inspire the implementers to fix it. [1] [1] us / operation for various set sizes [1] [1] The Toit Take Hear the latest news from the Toit team on continuous firmware delivery for your MCUs. 114 * Hashmap * JavaScript * Dart * Python3 * Perl 114 claps 114 The Toit Take The Toit Take Continuously update the code on your microcontrollers even over cellular connections. Monitor and securely service your devices in production; all through the Toit API. Erik Corry Written by Erik Corry Follow The Toit Take The Toit Take Continuously update the code on your microcontrollers even over cellular connections. Monitor and securely service your devices in production; all through the Toit API. More From Medium Consibio chooses Toit to optimize bioprocesses Nils Westerlund in The Toit Take [1] [1] Add mentions in your Flutter app (don't @ me) Annsh Singh in Flutter Clan [0] [0] 5 Easy Techniques That Will Improve Your Skills in Programming Josef Cruz in JavaScript in Plain English [0] [0] How to Pass Arguments to setTimeout and setInterval Callbacks Cristian Salcescu in Frontend Essentials [1] [1] Understanding Apache MultiKeyMap Ganesh Kumar Marimuthu in Nerd For Tech [1] [1] Conditional Statements in Python Banana Chip Tech [1] [1] 17 Python Tips and Tricks to Boost your Coding Skills Haider Imtiaz in Python in Plain English [1] [1] Getting To Know Flutter: Google Maps Integration Enrico Ori in TheOtherDev/s [1] [1] Learn more. Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more Make Medium yours. Follow the writers, publications, and topics that matter to you, and you'll see them on your homepage and in your inbox. Explore Write a story on Medium. If you have a story to tell, knowledge to share, or a perspective to offer -- welcome home. It's easy and free to post your thinking on any topic. Start a blog About Write Help Legal Get the Medium app A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store