https://www.quantamagazine.org/new-book-sorting-algorithm-almost-reaches-perfection-20250124/

New Book-Sorting Algorithm Almost Reaches Perfection

By Steve Nadis, Contributing Writer
January 24, 2025

The library sorting problem is used across computer science for organizing far more than just books. A new solution is less than a page-width away from the theoretical ideal.

Computer scientists often deal with abstract problems that are hard to comprehend, but an exciting new algorithm matters to anyone who owns books and at least one shelf. The algorithm addresses something called the library sorting problem (more formally, the "list labeling" problem).
The challenge is to devise a strategy for organizing books in some kind of sorted order -- alphabetically, for instance -- that minimizes how long it takes to place a new book on the shelf. Imagine, for example, that you keep your books clumped together, leaving empty space on the far right of the shelf. Then, if you add a book by Isabel Allende to your collection, you might have to move every book on the shelf to make room for it. That would be a time-consuming operation. And if you then get a book by Douglas Adams, you'll have to do it all over again. A better arrangement would leave unoccupied spaces distributed throughout the shelf -- but how, exactly, should they be distributed?

This problem was introduced in a 1981 paper, and it goes beyond simply providing librarians with organizational guidance. That's because the problem also applies to the arrangement of files on hard drives and in databases, where the items to be arranged could number in the billions. An inefficient system means significant wait times and major computational expense. Researchers have invented some efficient methods for storing items, but they've long wanted to determine the best possible way.

Last year, in a study that was presented at the Foundations of Computer Science conference in Chicago, a team of seven researchers described a way to organize items that comes tantalizingly close to the theoretical ideal. The new approach combines a little knowledge of the bookshelf's past contents with the surprising power of randomness.

"It's a very important problem," said Seth Pettie, a computer scientist at the University of Michigan, because many of the data structures we rely upon today store information sequentially. He called the new work "extremely inspired [and] easily one of my top three favorite papers of the year."

Narrowing Bounds

So how does one measure a well-sorted bookshelf? A common way is to see how long it takes to insert an individual item.
Naturally, that depends on how many items there are in the first place, a value typically denoted by n. In the Isabel Allende example, when all the books have to move to accommodate a new one, the time it takes is proportional to n. The bigger the n, the longer it takes. That makes this an "upper bound" to the problem: It will never take longer than a time proportional to n to add one book to the shelf.

The authors of the 1981 paper that ushered in this problem wanted to know if it was possible to design an algorithm with an average insertion time much less than n. And indeed, they proved that one could do better. They created an algorithm that was guaranteed to achieve an average insertion time proportional to (log n)^2. This algorithm had two properties: It was "deterministic," meaning that its decisions did not depend on any randomness, and it was also "smooth," meaning that the books must be spread evenly within subsections of the shelf where insertions (or deletions) are made. The authors left open the question of whether the upper bound could be improved even further. For over four decades, no one managed to do so.

However, the intervening years did see improvements to the lower bound. While the upper bound specifies the maximum possible time needed to insert a book, the lower bound gives the fastest possible insertion time. To find a definitive solution to a problem, researchers strive to narrow the gap between the upper and lower bounds, ideally until they coincide. When that happens, the algorithm is deemed optimal -- inexorably bounded from above and below, leaving no room for further refinement. In 2004, a team of researchers found that the best any algorithm could do for the library sorting problem -- in other words, the ultimate lower bound -- was log n.
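The gap-leaving strategy behind the 1981 result can be sketched in code. The toy Python model below is not that paper's algorithm -- the window-growing rule and even-spreading step are simplified illustrations -- but it captures the "smooth" idea: books sit in an array with empty slots, and when a neighborhood fills up, its books get spread out evenly again.

```python
class Shelf:
    """Toy 'smooth' bookshelf: books in a fixed array with gaps (None).

    Illustrative only -- not the 1981 algorithm. When an insertion lands
    on an occupied slot, we grow a window around it until it contains a
    gap, then spread the window's books out evenly ("smoothly")."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # assume capacity > number of books

    def insert(self, book):
        # Slot just after the last book that sorts before `book`.
        pos = 0
        for i, b in enumerate(self.slots):
            if b is not None and b < book:
                pos = i + 1
        if self.slots[pos] is None:
            self.slots[pos] = book      # a gap was waiting: one write
            return
        lo, hi = pos, pos + 1           # grow a window until it has a gap
        while self.slots[lo:hi].count(None) == 0:
            lo, hi = max(0, lo - 1), min(len(self.slots), hi + 1)
        window = sorted([b for b in self.slots[lo:hi] if b is not None] + [book])
        self.slots[lo:hi] = [None] * (hi - lo)
        step = (hi - lo) / len(window)  # >= 1, so target slots stay distinct
        for k, b in enumerate(window):  # spread the books evenly
            self.slots[lo + int(k * step)] = b

shelf = Shelf(16)
for author in ["Morrison", "Adams", "Allende", "Nabokov", "Borges", "Neruda"]:
    shelf.insert(author)
print([b for b in shelf.slots if b is not None])
# ['Adams', 'Allende', 'Borges', 'Morrison', 'Nabokov', 'Neruda']
```

Because each rebuild touches only a window around the crowded spot, a typical insertion moves far fewer than n books; the 1981 analysis shows how the right rebalancing thresholds keep the average down to (log n)^2.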
This result pertained to the most general version of the problem, applying to any algorithm of any type. Two of the same authors had already secured a result for a more specific version of the problem in 1990, showing that for any smooth algorithm, the lower bound is significantly higher: (log n)^2. And in 2012, another team proved the same lower bound, (log n)^2, for any deterministic algorithm that does not use randomness at all. These results showed that for any smooth or deterministic algorithm, you could not achieve an average insertion time better than (log n)^2, which was the same as the upper bound established in the 1981 paper. In other words, to improve that upper bound, researchers would need to devise a different kind of algorithm. "If you're going to do better, you have to be randomized and non-smooth," said Michael Bender, a computer scientist at Stony Brook University.

Michael Bender went after the library sorting problem using an approach that didn't necessarily make intuitive sense. Courtesy of Michael Bender

But getting rid of smoothness, which requires items to be spread apart more or less evenly, seemed like a mistake. (Remember the problems that arose from our initial example -- the non-smooth configuration where all the books were clumped together on the left-hand side of the shelf.) And it was not obvious how leaving things to random chance -- essentially a coin toss -- would help matters. "Intuitively, it wasn't clear that was a direction that made sense," Bender said. Nevertheless, in 2022, Bender and five colleagues decided to try out a randomized, non-smooth algorithm anyway, just to see whether it might offer any advantages.

A Secret History

Ironically, progress came from another restriction. There are sound privacy or security reasons why you may want to use an algorithm that's blind to the history of the bookshelf.
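One way to make that blindness concrete: if the shelf's layout is a pure function of which books are currently present, the arrangement cannot leak anything about past insertions or deletions. Here is a minimal Python sketch of that idea (an illustrative construction, not the researchers' algorithm, which achieves the same guarantee without rebuilding the whole shelf):

```python
def layout(books, capacity=16):
    """History-independent toy layout: the shelf depends only on WHICH
    books are present, never on the order they arrived or departed."""
    shelf = [None] * capacity
    step = capacity / max(len(books), 1)
    for k, book in enumerate(sorted(books)):
        shelf[int(k * step)] = book  # spread the current set evenly
    return shelf

# Two different histories...
history_a = {"Adams", "Allende", "Nabokov"}          # added directly
history_b = {"Adams", "Allende", "Nabokov", "Grey"}  # "Grey" added...
history_b.discard("Grey")                            # ...then removed

# ...produce identical shelves: no trace of the removed book remains.
assert layout(history_a) == layout(history_b)
```

The catch is cost: recomputing the whole layout on every change takes time proportional to n, so the challenge the 2022 paper tackled was getting history independence while keeping insertions fast.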
"If I had 50 Shades of Grey on my bookshelf and took it off," said William Kuszmaul of Carnegie Mellon University, nobody would be able to tell. William Kuszmaul in a black checkered shirt in front of a whiteboard. William Kuszmaul, Bender and others lowered the upper bound on the library sorting problem practically down to the ideal. Rose Silver In a 2022 paper, Bender, Kuszmaul and four co-authors created just such an algorithm -- one that was "history independent," non-smooth and randomized -- which finally reduced the 1981 upper bound, bringing the average insertion time down to (log n)^1.5. Kuszmaul remembers being surprised that a tool normally used to ensure privacy could confer other benefits. "It's as if you used cryptography to make your algorithm faster," he said. "Which just seems kind of strange." Helen Xu of the Georgia Institute of Technology, who was not part of this research team, was also impressed. She said that the idea of using history independence for reasons other than security may have implications for many other types of problems. Closing the Gap Bender, Kuszmaul and others made an even bigger improvement with last year's paper. They again broke the record, lowering the upper bound to (log n) times (log log n)^3 -- equivalent to (log n)^1.000...1. In other words, they came exceedingly close to the theoretical limit, the ultimate lower bound of log n. Once again, their approach was non-smooth and randomized, but this time their algorithm relied on a limited degree of history dependence. It looked at past trends to plan for future events, but only up to a point. Suppose, for instance, you've been getting a lot of books by authors whose last name starts with N -- Nabokov, Neruda, Ng. The algorithm extrapolates from that and assumes more are probably coming, so it'll leave a little extra space in the N section. But reserving too much space could lead to trouble if a bunch of A-name authors start pouring in. 
"The way we made it a good thing was by being strategically random about how much history to look at when we make our decisions," Bender said. The result built on and transformed their previous work. It "uses randomness in a completely different way than the 2022 paper," Pettie said. Related: --------------------------------------------------------------------- 1. Computer Scientists Invent an Efficient New Way to Count 2. Researchers Approach New Speed Limit for Seminal Problem 3. New Breakthrough Brings Matrix Multiplication Closer to Ideal These papers collectively represent "a significant improvement" on the theory side, said Brian Wheatman, a computer scientist at the University of Chicago. "And on the applied side, I think they have the potential for a big improvement as well." Xu agrees. "In the past few years, there's been interest in using data structures based on list labeling for storing and processing dynamic graphs," she said. These advances would almost certainly make things faster. Meanwhile, there's more for theorists to contemplate. "We know that we can almost do log n," Bender said, "[but] there's still this tiny gap" -- the diminutive log log n term that stands in the way of a complete solution. "We don't know if the right thing to do is to lower the upper bound or raise the lower bound." Pettie, for one, doesn't expect the lower bound to change. "Usually in these situations, when you see a gap this close, and one of the bounds looks quite natural and the other looks unnatural, then the natural one is the right answer," he said. It's much more likely that any future improvements will affect the upper bound, bringing it all the way down to log n. "But the world's full of weird surprises." Share this article Copied! 