https://www.quantamagazine.org/new-book-sorting-algorithm-almost-reaches-perfection-20250124/

New Book-Sorting Algorithm Almost Reaches Perfection

By Steve Nadis, Contributing Writer
January 24, 2025

The library sorting problem is used across computer science for organizing far more than just books. A new solution is less than a page-width away from the theoretical ideal.

Computer scientists often deal with abstract problems that are hard to comprehend, but an exciting new algorithm matters to anyone who owns books and at least one shelf. The algorithm addresses something called the library sorting problem (more formally, the "list labeling" problem).
The challenge is to devise a strategy for organizing books in some kind of sorted order -- alphabetically, for instance -- that minimizes how long it takes to place a new book on the shelf. Imagine, for example, that you keep your books clumped together, leaving empty space on the far right of the shelf. Then, if you add a book by Isabel Allende to your collection, you might have to move every book on the shelf to make room for it. That would be a time-consuming operation. And if you then get a book by Douglas Adams, you'll have to do it all over again. A better arrangement would leave unoccupied spaces distributed throughout the shelf -- but how, exactly, should they be distributed?

This problem was introduced in a 1981 paper, and it goes beyond simply providing librarians with organizational guidance. That's because the problem also applies to the arrangement of files on hard drives and in databases, where the items to be arranged could number in the billions. An inefficient system means significant wait times and major computational expense. Researchers have invented some efficient methods for storing items, but they've long wanted to determine the best possible way.

Last year, in a study that was presented at the Foundations of Computer Science conference in Chicago, a team of seven researchers described a way to organize items that comes tantalizingly close to the theoretical ideal. The new approach combines a little knowledge of the bookshelf's past contents with the surprising power of randomness.

"It's a very important problem," said Seth Pettie, a computer scientist at the University of Michigan, because many of the data structures we rely upon today store information sequentially. He called the new work "extremely inspired [and] easily one of my top three favorite papers of the year."

Narrowing Bounds

So how does one measure a well-sorted bookshelf? A common way is to see how long it takes to insert an individual item.
Naturally, that depends on how many items there are in the first place, a value typically denoted by n. In the Isabel Allende example, when all the books have to move to accommodate a new one, the time it takes is proportional to n. The bigger the n, the longer it takes. That makes this an "upper bound" to the problem: It will never take longer than a time proportional to n to add one book to the shelf.

The authors of the 1981 paper that ushered in this problem wanted to know if it was possible to design an algorithm with an average insertion time much less than n. And indeed, they proved that one could do better. They created an algorithm that was guaranteed to achieve an average insertion time proportional to (log n)^2. This algorithm had two properties: It was "deterministic," meaning that its decisions did not depend on any randomness, and it was also "smooth," meaning that the books must be spread evenly within subsections of the shelf where insertions (or deletions) are made. The authors left open the question of whether the upper bound could be improved even further. For over four decades, no one managed to do so.

However, the intervening years did see improvements to the lower bound. While the upper bound specifies the maximum possible time needed to insert a book, the lower bound gives the fastest possible insertion time. To find a definitive solution to a problem, researchers strive to narrow the gap between the upper and lower bounds, ideally until they coincide. When that happens, the algorithm is deemed optimal -- inexorably bounded from above and below, leaving no room for further refinement. In 2004, a team of researchers found that the best any algorithm could do for the library sorting problem -- in other words, the ultimate lower bound -- was log n.
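The gap-leaving strategy behind the 1981 result can be sketched in code. The toy Python model below is not that paper's algorithm -- the window-growing rule and even-spreading step are simplified illustrations -- but it captures the "smooth" idea: books sit in an array with empty slots, and when a neighborhood fills up, its books get spread out evenly again.

```python
class Shelf:
    """Toy 'smooth' bookshelf: books in a fixed array with gaps (None).

    Illustrative only -- not the 1981 algorithm. When an insertion lands
    on an occupied slot, we grow a window around it until it contains a
    gap, then spread the window's books out evenly ("smoothly")."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # assume capacity > number of books

    def insert(self, book):
        # Slot just after the last book that sorts before `book`.
        pos = 0
        for i, b in enumerate(self.slots):
            if b is not None and b < book:
                pos = i + 1
        if self.slots[pos] is None:
            self.slots[pos] = book      # a gap was waiting: one write
            return
        lo, hi = pos, pos + 1           # grow a window until it has a gap
        while self.slots[lo:hi].count(None) == 0:
            lo, hi = max(0, lo - 1), min(len(self.slots), hi + 1)
        window = sorted([b for b in self.slots[lo:hi] if b is not None] + [book])
        self.slots[lo:hi] = [None] * (hi - lo)
        step = (hi - lo) / len(window)  # >= 1, so target slots stay distinct
        for k, b in enumerate(window):  # spread the books evenly
            self.slots[lo + int(k * step)] = b

shelf = Shelf(16)
for author in ["Morrison", "Adams", "Allende", "Nabokov", "Borges", "Neruda"]:
    shelf.insert(author)
print([b for b in shelf.slots if b is not None])
# ['Adams', 'Allende', 'Borges', 'Morrison', 'Nabokov', 'Neruda']
```

Because each rebuild touches only a window around the crowded spot, a typical insertion moves far fewer than n books; the 1981 analysis shows how the right rebalancing thresholds keep the average down to (log n)^2.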
This result pertained to the most general version of the problem, applying to any algorithm of any type. Two of the same authors had already secured a result for a more specific version of the problem in 1990, showing that for any smooth algorithm, the lower bound is significantly higher: (log n)^2. And in 2012, another team proved the same lower bound, (log n)^2, for any deterministic algorithm that does not use randomness at all. These results showed that for any smooth or deterministic algorithm, you could not achieve an average insertion time better than (log n)^2, which was the same as the upper bound established in the 1981 paper. In other words, to improve that upper bound, researchers would need to devise a different kind of algorithm. "If you're going to do better, you have to be randomized and non-smooth," said Michael Bender, a computer scientist at Stony Brook University.

Michael Bender went after the library sorting problem using an approach that didn't necessarily make intuitive sense. Courtesy of Michael Bender

But getting rid of smoothness, which requires items to be spread apart more or less evenly, seemed like a mistake. (Remember the problems that arose from our initial example -- the non-smooth configuration where all the books were clumped together on the left-hand side of the shelf.) And it was not obvious how leaving things to random chance -- essentially a coin toss -- would help matters. "Intuitively, it wasn't clear that was a direction that made sense," Bender said. Nevertheless, in 2022, Bender and five colleagues decided to try out a randomized, non-smooth algorithm anyway, just to see whether it might offer any advantages.

A Secret History

Ironically, progress came from another restriction. There are sound privacy or security reasons why you may want to use an algorithm that's blind to the history of the bookshelf.
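One way to make that blindness concrete: if the shelf's layout is a pure function of which books are currently present, the arrangement cannot leak anything about past insertions or deletions. Here is a minimal Python sketch of that idea (an illustrative construction, not the researchers' algorithm, which achieves the same guarantee without rebuilding the whole shelf):

```python
def layout(books, capacity=16):
    """History-independent toy layout: the shelf depends only on WHICH
    books are present, never on the order they arrived or departed."""
    shelf = [None] * capacity
    step = capacity / max(len(books), 1)
    for k, book in enumerate(sorted(books)):
        shelf[int(k * step)] = book  # spread the current set evenly
    return shelf

# Two different histories...
history_a = {"Adams", "Allende", "Nabokov"}          # added directly
history_b = {"Adams", "Allende", "Nabokov", "Grey"}  # "Grey" added...
history_b.discard("Grey")                            # ...then removed

# ...produce identical shelves: no trace of the removed book remains.
assert layout(history_a) == layout(history_b)
```

The catch is cost: recomputing the whole layout on every change takes time proportional to n, so the challenge the 2022 paper tackled was getting history independence while keeping insertions fast.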
"If I had 50 Shades of Grey on my bookshelf and took it off," said William Kuszmaul of Carnegie Mellon University, nobody would be able to tell. William Kuszmaul in a black checkered shirt in front of a whiteboard. William Kuszmaul, Bender and others lowered the upper bound on the library sorting problem practically down to the ideal. Rose Silver In a 2022 paper, Bender, Kuszmaul and four co-authors created just such an algorithm -- one that was "history independent," non-smooth and randomized -- which finally reduced the 1981 upper bound, bringing the average insertion time down to (log n)^1.5. Kuszmaul remembers being surprised that a tool normally used to ensure privacy could confer other benefits. "It's as if you used cryptography to make your algorithm faster," he said. "Which just seems kind of strange." Helen Xu of the Georgia Institute of Technology, who was not part of this research team, was also impressed. She said that the idea of using history independence for reasons other than security may have implications for many other types of problems. Closing the Gap Bender, Kuszmaul and others made an even bigger improvement with last year's paper. They again broke the record, lowering the upper bound to (log n) times (log log n)^3 -- equivalent to (log n)^1.000...1. In other words, they came exceedingly close to the theoretical limit, the ultimate lower bound of log n. Once again, their approach was non-smooth and randomized, but this time their algorithm relied on a limited degree of history dependence. It looked at past trends to plan for future events, but only up to a point. Suppose, for instance, you've been getting a lot of books by authors whose last name starts with N -- Nabokov, Neruda, Ng. The algorithm extrapolates from that and assumes more are probably coming, so it'll leave a little extra space in the N section. But reserving too much space could lead to trouble if a bunch of A-name authors start pouring in. 
"The way we made it a good thing was by being strategically random about how much history to look at when we make our decisions," Bender said. The result built on and transformed their previous work. It "uses randomness in a completely different way than the 2022 paper," Pettie said. Related: --------------------------------------------------------------------- 1. Computer Scientists Invent an Efficient New Way to Count 2. Researchers Approach New Speed Limit for Seminal Problem 3. New Breakthrough Brings Matrix Multiplication Closer to Ideal These papers collectively represent "a significant improvement" on the theory side, said Brian Wheatman, a computer scientist at the University of Chicago. "And on the applied side, I think they have the potential for a big improvement as well." Xu agrees. "In the past few years, there's been interest in using data structures based on list labeling for storing and processing dynamic graphs," she said. These advances would almost certainly make things faster. Meanwhile, there's more for theorists to contemplate. "We know that we can almost do log n," Bender said, "[but] there's still this tiny gap" -- the diminutive log log n term that stands in the way of a complete solution. "We don't know if the right thing to do is to lower the upper bound or raise the lower bound." Pettie, for one, doesn't expect the lower bound to change. "Usually in these situations, when you see a gap this close, and one of the bounds looks quite natural and the other looks unnatural, then the natural one is the right answer," he said. It's much more likely that any future improvements will affect the upper bound, bringing it all the way down to log n. "But the world's full of weird surprises." Share this article Copied! 