https://cacm.acm.org/news/an-algorithm-for-a-better-bookshelf/

Skip to content
 
Explore Topics

  * Architecture and Hardware
  * Artificial Intelligence and Machine Learning
  * Computer History
  * Computing Applications
  * Computing Profession
  * Data and Information
  * Education
  * HCI
  * Philosophy of Computing
  * Security and Privacy
  * Society
  * Software Engineering and Programming Languages
  * Systems and Networking
  * Theory

Latest Issue  
July 2025 cover

Latest Issue

July 2025, Vol. 68 No. 7

Previous Issue

June 2025, Vol. 68 No. 6
Explore the archive
Search Open Membership Navigation

  * Settings
  * Sign Out

Sign In  
Join ACM
[                    ]
Topics

  * Architecture and Hardware
  * Artificial Intelligence and Machine Learning
  * Computer History
  * Computing Applications
  * Computing Profession
  * Data and Information
  * Education
  * HCI
  * Philosophy of Computing
  * Security and Privacy
  * Society
  * Software Engineering and Programming Languages
  * Systems and Networking
  * Theory

Sections

  * Research and Advances
  * Opinion
  * Practice
  * News
  * Research Highlights
  * Careers

Magazine

  * Latest Issue
  * Magazine Archive
  * Editorial Staff and Board
  * Submit an Article
  * Alerts & Feeds
  * Author Guidelines

CACM Web Account

Membership in ACM includes a subscription to Communications of the
ACM (CACM), the computing industry's most trusted source for staying
connected to the world of advanced computing.

Sign In Sign Up

Communications of the ACM

  * About Us
  * Frequently Asked Questions
  * Contact Us

Follow Us

  * CACM on Twitter
  * CACM on Reddit
  * CACM on LinkedIn

News
Architecture and Hardware

An Algorithm for a Better Bookshelf

Managing the strategic positioning of empty spaces.

By Erica Klarreich
Posted Jun 30 2025
book shelf

Credit: Robuart / Shutterstock

  * Share
      + Twitter
      + Reddit
      + Hacker News
  * Print
  * Join the Discussion
  * View in the ACM Digital Library

      + Further Reading

Drop in at a library, and you'll likely notice that most shelves
aren't full--librarians leave some empty space on each shelf. That
way, when they get new books, they can slot them into place without
having to move too many other books.

It's a simple-enough idea, but one that arises in a host of settings
in computer science that involve sorted data, such as an
alphabetically ordered census repository, or a list of connections
between members of a social network. In such situations, where the
entries can number in the hundreds of billions, the strategic
positioning of empty spaces takes on great significance.

"Problems are getting bigger and bigger as we get more and more
data," said Helen Xu, an assistant professor in the School of
Computational Science and Engineering at the Georgia Institute of
Technology (Georgia Tech) in Atlanta. "At these scales, it becomes
important to efficiently manage how you add new entries."

The bookshelf problem (which computer scientists call the "list
labeling" problem) is one of the most basic topics in the field of
data structures. "It's the kind of problem you'd teach to freshman or
sophomore undergraduates," said Guy Blelloch, a professor of computer
science at Carnegie Mellon University in Pittsburgh.

Yet until recently, there was a wide gap between what computer
scientists could achieve algorithmically and what they knew about the
theoretical lower limit on how many books you should expect to have
to move when a new book arrives. On the algorithmic side, "There was
pretty much no progress for 30 years or more," Blelloch said.

Now, researchers have come up with a new algorithm that comes close
to that theoretical lower limit. Blelloch described it as "a very
elegant result."

The new approach will "hopefully open the door to new applications of
list labeling in settings where it wasn't useful before because the
cost was infeasible," said William Kuszmaul, an assistant professor
of computer science at Carnegie Mellon University and one of the
researchers who came up with the new algorithm.

The list labeling problem starts off with a bookshelf of size n,
along with some upper limit--say, 75% or 90%--for how much of the
bookshelf you're allowed to fill. There's some chosen ordering for
books (say, alphabetical by author), and as books arrive, one by one,
you choose spots for them that respect the ordering, moving other
books as needed. To find book-placing algorithms that are robust
under many different conditions, computer scientists imagine that the
books are being sent by an adversary who knows the algorithm and is
trying to force you to move as many books as possible. (In some
versions of the problem, the adversary may also remove books.)

If you were to use the most naive possible algorithm--the one that
places each book as close as possible to the start of the
bookshelf--then your adversary could force you to move every book
you've placed so far, simply by sending you books that precede every
book they've already sent. As the bookshelf fills up, the cost of
accommodating new books becomes proportional to n.

In 1981, computer scientists came up with a much better algorithm,
whose average cost for adding a new book is only about log^2n. This
algorithm starts by dividing the bookshelf into two equal chunks;
then it divides each of those chunks in half, and so on. For each
different size scale, the algorithm sets a threshold for how full the
chunks of that size are allowed to be, with small chunks allowed to
be fuller than large chunks. Once the books start arriving, if a new
book pushes some chunk over its density threshold, the algorithm
spreads out the books in the next larger chunk so they are evenly
spaced, easing the pressure on the smaller chunk.

In the years that followed, researchers came up with variations on
the theme of spreading books out evenly--"smooth" algorithms, as
they're called. However, no one could beat the log^2n cost of the
1981 algorithm. In 1990, computer scientists proved that no smooth
algorithm can do better than log^2n. Still, it didn't seem to make
sense to switch to a non-smooth algorithm--after all, if your
algorithm intentionally creates spots that are denser than average,
your adversary can target those spots and force you to move a lot of
books.

"The running belief seemed to become that log^2n should be optimal,
and that what we should be doing as a community is trying to prove
it," said Kuszmaul.

The focus shifted from looking for new algorithms to proving lower
bounds for wider and wider classes of algorithms. In 2012,
researchers proved that no deterministic algorithm can improve on log
^2n. "It's amazing how strong and robust that bound is," said Michael
Bender, of Stony Brook University, one of the authors of the new
paper.

Combined with the earlier results, "This meant that to beat log^2n,
you had to be both randomized and non-smooth, and you had to do both
things in a meaningful way," Kuszmaul said.

The first glimmer of how to achieve this came from an unexpected
direction: privacy research. In 2016, Bender and other researchers
astonished computer scientists by showing that it is possible to
create a bookshelf algorithm with a property called "history
independence" without worsening the log^2n cost. An algorithm is
history-independent if the current state of the bookshelf reveals
nothing about the prior history of insertions and deletions, apart
from showing which books are currently on the shelf.

Bender and his colleagues had created their history-independent
algorithm for the sake of its security properties, but they gradually
realized that history independence could also offer algorithmic
advantages. With a history-independent algorithm, an adversary can't
use some clever ordering of insertions to influence where dense
hotspots will appear--no matter the ordering, the bookshelf will end
up looking the same. This feature simplifies the collection of
potential dangers against which the algorithm must guard.

In 2022, Bender, Kuszmaul, and colleagues figured out how to make a
history-independent algorithm that incurs a cost per book of only log
^1.5n, breaking the four-decades-long dry spell. Their algorithm
modified the 2016 algorithm to make it more "lazy," as Kuszmaul put
it--the algorithm doesn't rush to smooth out vulnerable dense spots.
Next, the researchers added a layer of randomness to the shelving
process to hide the location of these dense spots, so the adversary
can't target them.

The researchers also proved their result was sharp: no
history-independent algorithm can do better than log^1.5n. Many
computer scientists guessed, therefore, that this new algorithm was
the final answer to the bookshelf problem. "We thought it was
probably the end of the road," Kuszmaul said.

Yet laziness is only one of two desirable properties for a bookshelf
algorithm. There's a second property that is fundamentally
incompatible with history independence: the ability to proactively
respond to an adversary's strategy.

If the adversary is targeting a particular region, you'd like to
quickly move free spots to that area to accommodate incoming books.
Yet "If you just do that naively, you are dead in the water,"
Kuszmaul said. "The adversary will figure out what you're doing, and
change their behavior to screw you up."

Now, in a paper posted online in May 2024, Bender, Kuszmaul, and five
colleagues have managed to combine the laziness benefits of history
independence with a more proactive response to an adversary's
strategy. Their new algorithm adapts to an adversary's strategy, but
on time scales that it picks randomly. "It's unpredictable enough
that the adversary can't exploit it," Kuszmaul said. The algorithm
"gets the best of both worlds--it behaves morally like a
history-independent algorithm, except in the one way it's being
strategically adaptive."

The new algorithm has an expected cost of logn x (log(logn))^2 per
insertion, a huge improvement on log^1.5n. For large values of n, the
logn factor dwarfs the (log(logn))^2 factor, meaning that the cost is
only slightly greater than logn, which computer scientists have long
known is a theoretical lower limit for any bookshelf algorithm.

The improved cost might make the new algorithm--or a simplified
version of it--valuable in new data applications, Xu said. Even though
many real-world data settings are not adversarial, situations without
an adversary can still sometimes involve sudden floods of data to
targeted spots, she noted. For instance, in a social network, a
famous person may swiftly gain followers after some newsworthy event.

"It's my belief that the algorithm will lead to performance
advantages in real-world situations," Bender said. However, the
researchers cautioned, as with any theoretical advance, making the
new algorithm into something that performs well in practice will
require serious work.

The recent improvements, Blelloch predicted, are likely to draw more
researchers to the bookshelf problem, with a view not just to
practical implementations, but also to further theoretical advances.
The burning question now is, can researchers get all the way to logn?

If they can--in a way that can be implemented in the real world--then
bookshelf algorithms might offer serious competition to binary search
trees, which are currently the most widely used data structure for
sorted data. Then, Kuszmaul said, "You would actually have a data
structure that would change the world."

There are, he warned, "a bunch of seemingly impossible hurdles
between here and there."

Then again, the new work "opens things up," Blelloch said. "Who knows
what's the best we can do? Maybe we can get to logn."

Further Reading

  * Bender, M. et al.
    Anti-persistence on persistent storage: History-independent
    sparse tables and dictionaries. In Proc. 35th ACM
    SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
    (PODS), pages 289-302, June 2016.
  * Bender, M. et al.
    Online list labeling: breaking the log2n barrier. IEEE 63rd
    Annual Symposium on Foundations of Computer Science (FOCS), pages
    980-990, 2022.
  * Bender, M. et al.
    Nearly optimal list labeling. May 1, 2024 https://arxiv.org/abs/
    2405.00807
  * Bulanek, J. et al.
    Tight lower bounds for the online labeling problem. In Proc. 44th
    annual ACM Symposium on Theory of Computing (STOC), pages
    1185-1198, 2012.
  * Bulanek, J. et al.
    On randomized online labeling with polynomially many labels. In 
    Proc. International Colloquium on Autamata, Languages, and
    Programming (ICALP), volume 7965 of Lecture Notes in Computer
    Science, pages 291-203, Springer, 2013.
  * Dietz, P. and Zhang, J.
    Lower bounds for monotonic list labeling. In Scandinavian
    Workshop on Algorithm Theory, volume 447 of Lecture Notes in
    Computer Science, pages 173-180, Springer, 1990.
  * Dietz, P. et al.
    A tight lower bound for online monotonic list labeling. SIAM
    Journal on Discrete Mathematics, 18 (3):626-637, 2004.
  * Itai, A. et al.
    "A sparse table implementation of priority queues." In Proc. 8th
    International Colloquium on Automata, Languages and Programming
    (ICALP), volume 115 of Lecture Notes in Computer Science, pages
    417-431, Springer, 1981.

About the Authors

Erica Klarreich is a mathematics and science journalist based in
Berkeley, CA, USA.

  * Share
      + Twitter
      + Reddit
      + Hacker News
  * Print
  * Join the Discussion

Submit an Article to CACM

CACM welcomes unsolicited submissions on topics of relevance and
value to the computing community.

You Just Read

An Algorithm for a Better Bookshelf

View in the ACM Digital Library

(c) 2025 ACM 0001-0782/25/06

DOI

10.1145/3743656

Related Reading

  * Opinion

    A* Search: What's in a Name?

    Computing Applications

  * Research and Advances

    Sampling, Synthesis, and Input Devices

    Artificial Intelligence and Machine Learning

Advertisement

[vc]

Advertisement

[vc]

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

Sign In Sign Up

The Latest from CACM

Explore More
News Jul 3 2025

Can AI Replace Copilots on Passenger Jets?

Paul Marks
Artificial Intelligence and Machine Learning
 
inside a cockpit
BLOG@CACM Jul 3 2025

Beyond Downtime: Architectural Resilience on Hyperscalers

Azim Shaik
Architecture and Hardware
 
Superman in the clouds
News Jul 1 2025

Can EU's EUR200B 'InvestAI' Initiative Close Europe's AI Gap with U.S.
and China?

Mark Halper
Artificial Intelligence and Machine Learning
 
hologram Earth

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the
future of the association. There are more ways than ever to get
involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access
publication.

By opening CACM to the world, we hope to increase engagement among
the broader computer science community and encourage non-members to
discover the rich resources ACM has to offer.

Learn More
 

  * CACM on Twitter
  * CACM on Reddit
  * CACM on LinkedIn

Topics

  * Architecture and Hardware
  * Artificial Intelligence and Machine Learning
  * Computer History
  * Computing Applications
  * Computing Profession
  * Data and Information
  * Education
  * HCI
  * Philosophy of Computing
  * Security and Privacy
  * Society
  * Software Engineering and Programming Languages
  * Systems and Networking
  * Theory

Magazine

  * Latest Issue
  * Magazine Archive
  * Editorial Staff and Board
  * Submit an Article
  * Alerts & Feeds
  * Author Guidelines

Communications of the ACM

  * About Us
  * Frequently Asked Questions
  * Contact Us
  * For Advertisers
  * Join ACM

(c) 2025 Communications of the ACM. All Rights Reserved.

  * Cookie Notice
  * Privacy Policy