[HN Gopher] Winners of the $10k ISBN visualization bounty
___________________________________________________________________
Winners of the $10k ISBN visualization bounty
Author : yamrzou
Score : 553 points
Date : 2025-02-25 06:26 UTC (2 days ago)
(HTM) web link (annas-archive.org)
(TXT) w3m dump (annas-archive.org)
| ofou wrote:
| My most sincere love to all shadow libraries out there, you're
| doing god's work.
| xtracto wrote:
| They do half of the work (which is a helluva lot)... the other
| half is done by the volunteers that digitize books.
|
| I was looking at my country's "shelf" and it's so sad to see
| so many missing titles. I almost wanted to go to my local
| library and digitize some of them. The old ones that are out of
| print and impossible to acquire right now...
|
| So much knowledge lost.
| FabHK wrote:
| To be fair, the authors of the books also contribute quite a
| bit.
| dylan604 wrote:
| I had a Pavlovian response to reach for the defrag program at
| first sight of the top image.
| 2Gkashmiri wrote:
| win 98 had the best animation. pity everything beyond that was
| dogshit
| rahimnathwani wrote:
| This is amazing.
|
| One thing I found odd.
|
| I searched for 'Stubborn Attachments' which worked.
|
| On the same bookshelf there are several other Stripe Press books.
|
| One of them is called Zero to One Hundred, by Stephanie Friedman.
|
| When you search that book on Amazon, it has a different title,
| which I guess is reasonable as the book hasn't been published yet
| and they may not have finalized the decision:
| https://a.co/d/bQX5CNf
|
| Here's where it gets weird:
|
| - if you search for the book 'Zero to one hundred' (the title
| shown on the 'shelf') it doesn't come up
|
| - if you search for the book by its ISBN, it does come up, but
| the name displayed in the search results is yet another alternate
| title. And the bookshelf displays that title. So the same part of
| the bookshelf looks different depending on what you searched for.
|
| I haven't yet read the blog post about how this impressive
| visualization works, so I don't have an idea of why this is the
| case.
| spondyl wrote:
| I don't think it's the tool that's the issue, I think it's the
| book itself?
|
| If you search the ISBN on the web, you'll get "Zero to One
| Hundred" with the cover of "Built to Grow" and vice versa.
|
| There's also "Experiment, Build, Scale" which is the book that
| the visualisation shows, also with the same ISBN attributed to
| the previous two.
|
| Experiment, Build, Scale seems to be the only book of
| Stephanie's that is in Google Books while Worldcat has "Zero to
| One Hundred" with the cover art for "Built to Grow".
|
| Most of the online bookstore pages have this mess so I wouldn't
| blame the tool for what seems like an upstream data quality
| issue.
| rahimnathwani wrote:
| Sorry I didn't mean to make it seem like I think the tool is
| at fault.
|
| I just think it's interesting that the book title shows
| differently on the shelf depending on whether you reach it
| via an ISBN search, vs. if you discover it by panning from a
| nearby book.
| closewith wrote:
| > Most of the online bookstore pages have this mess so I
| wouldn't blame the tool for what seems like an upstream data
| quality issue.
|
| I think that's an uncharitable read of the GP's comment. I
| read it as curiosity about how the upstream data issues
| present in the tool, which also interests the part of my
| brain that likes to solve minor mysteries.
| vessenes wrote:
| Public request: anybody here who hates Anna's and wants to make a
| principled complaint about it? I love it and the idea of it so
| much, but I imagine some feel differently and I'd like to hear
| your best takedown shot.
| oguz-ismail wrote:
| it's not libgen
| kaladin-jasnah wrote:
| Their download wait time is upsetting to me because I'm
| impatient and cheap (else you have to pay). At least they have
| the libgen links now.
|
| I don't hate Anna's Archive though.
| ksynwa wrote:
| The "external links" section is the only thing making the
| website usable for non-subscribers.
| WillAdams wrote:
| Well, I made a comment at:
|
| https://news.ycombinator.com/item?id=43193432
|
| Does that count?
|
| The thing is, if we're going to have GPL software, then we need
| copyright.
|
| Yes, the terms/lengths need to be adjusted, but one can't do
| that by fiat/unilaterally.
| ivolimmen wrote:
| I have no idea what's on the site, as my provider blocks it because
| of European sanctions against Russia, as this is one of the
| RussiaToday sites.
| notpushkin wrote:
| Do you have any evidence?
| chungus wrote:
| Judging by your profile location being in the Netherlands, I
| think you are being misled by the generic Ziggo ISP blocked page[1],
| which lists Russia Today and Sputnik News and then, in
| another post, ThePirateBay.
|
| In this case, the ISP blocked it because the website is Anna's
| Archive [2], which was blocked around a year ago, but they have
| not made a post about that.
|
| If you put "pcm." in front of the link it will work (for now)
|
| You should probably edit your post, so as not to misinform. But
| I have to admit this confusion stems from bad decisions at the
| ISP.
|
| [1] https://www.ziggo.nl/website-geblokkeerd
|
| [2] https://en.wikipedia.org/wiki/Anna%27s_Archive#Netherlands
| ivolimmen wrote:
| Seems that editing is not possible due to the negative points
| I gathered, which is weird as I just reported that I cannot
| watch it. People seem to view everything through a political
| lens now. But thank you for your information; I saw the post.
| tokai wrote:
| No, you were downvoted because you claimed something false.
| svdr wrote:
| It is false but I would not blame the parent; the ISP
| blocked page is unclear and suggests the block is linked
| to Russia.
| bawolff wrote:
| I feel like visualizations of large datasets which are viewer-
| directed (i.e. they want you to "explore" the data instead of
| trying to tell you something specific about it or communicate a
| narrative) are often "pretty" but never particularly
| enlightening. I feel like that holds true for these in
| particular.
| WillAdams wrote:
| The thing is, ISBNs map to:
|
| - publisher
|
| - assigned title
|
| - (roughly) order of publication
|
| That's all that they communicate --- there is no hierarchy here
| to aid in discovery or to organize the content (and further
| complicating things, the same text may appear multiple times in
| a different binding --- a differentiation which is immaterial
| to an e-book).
|
| The elephant in the room of course is the matter that "Anna's
| Archive" is not a legitimate book repository, but a piracy
| site, so what they are showcasing is how compleat (and brazen)
| their theft (and attendant lack of compensation) is.
|
| This would be far more interesting if it were based on an
| hierarchical system such as LoC, and instead afforded an
| interface for accessing legitimately available books as are
| available from https://www.gutenberg.org/ or listed at:
| http://onlinebooks.library.upenn.edu/ or worked on at:
| https://www.wikibooks.org/
| bawolff wrote:
| > The thing is, ISBNs map to:
| > - publisher
| > - assigned title
| > - (roughly) order of publication
|
| I assume the task isn't just to visualize isbns literally.
| Presumably you are allowed to cross reference with other
| data.
|
| > The elephant in the room of course is the matter that
| "Anna's Archive" is not a legitimate book repository, but a
| piracy site,
|
| I think it's pretty clear that the target audience doesn't
| care. I don't think the target audience holding differing
| political views is really a valid criticism of the project. It
| should be evaluated in the context and audience it was
| created for.
| WillAdams wrote:
| This is not a political stance, but one of basic questions
| of authorship and what compensation authors should receive
| and what control they should have over their work.
|
| See arguments by Alexander Pope in Pope _V._ Curll.
| mistrial9 wrote:
| when China decided to wholesale ignore Western copyright
| in the digital age, completely.. the equation changed
| IMHO.
| WillAdams wrote:
| Yes, but dealing with that politically would be made
| easier by having the moral high ground.
| mannyv wrote:
| Not really, because it depends on the basis of morality.
| In fact, this 'morality' problem is shown in the
| existence of libraries in the US.
|
| Is a book a collective good? Or property? In the US the
| answer is 'both' in an awkward way. But the US does know
| that having books behind a paywall is not in society's
| best interest.
|
| And in reality 99% of the books will never be read, which
| makes their 'value' as property suspect.
| WillAdams wrote:
| If so few books are to be read, then why is it so
| difficult to pay for those which are?
| bawolff wrote:
| > This is not a political stance, but one of basic
| questions of authorship and what compensation authors
| should receive and what control they should have over
| their work.
|
| Questions of compensation and ownership are one of the
| most political questions of all.
|
| What exactly do you think communist revolutions were
| revolting over?
| zozbot234 wrote:
| > This would be far more interesting if it were based on an
| hierarchical system such as LoC, and instead afforded an
| interface for accessing legitimately available books
|
| Isn't this exactly what Open Library does?
| WillAdams wrote:
| Given that "Textbooks" are separated out and "Animals" and
| "Childrens' Books" and "Health & Wellness" are top-level
| categories? and that it mixes in books which are not
| available for download, not really.
|
| The UI is not all that great either.
|
| I would like to see:
|
| - an hierarchical list with a hierarchy which actually
| makes sense and truly organizes knowledge
|
| - of legitimately available downloadable books
|
| - which has a nice UI
|
| but it's far more important that LLMs have training data
| without consideration of recompense than any other
| consideration.
| pphysch wrote:
| That's my issue with attempts to 3D-ify viz. Unless you are
| actually modeling a 3D volume, like medical imaging or CAD, the
| added "forced exploration" of 3D simply hides insights.
| boznz wrote:
| no wonder nobody can find my book :-)
| robingchan wrote:
| This was great fun to enter nevertheless, congrats all involved.
|
| My entry is still live for now for anyone curious:
|
| https://d199hl4t3ts6d9.cloudfront.net/
| TomK32 wrote:
| Fascinating. It allows for some interesting observations when you
| zoom in, as I did, on this one (sadly no direct links to coords/zoom
| level): https://archive.anarchy.cool/maps/isbn.html You can find
| publishers like Hueber Verlag[1] in the eastern part of the
| German language section. They spread their ISBN numbers in a
| pattern with something like 1360000 between them (I know, ISBN
| having a checksum leads to gaps in the numbering), which
| generates a repetitive pattern with plenty of empty space. It is
| so wasteful on this huge chunk they have.
|
| Are there no rules on how publishers have to assign their
| numbers? Just so they could hand back an unused block if they
| don't need it any longer.
|
| [1] I can see how publishing learning material in 30 languages
| can give people "ideas" when assigning ISBN numbers
| https://de.wikipedia.org/wiki/Hueber_Verlag
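|
| A minimal sketch of the ISBN-13 check digit mentioned above, using
| the standard alternating 1/3 weighting over the first 12 digits.
| The function names are made up for illustration; the two sample
| values are the ISBNs quoted elsewhere in this thread.

```python
def isbn13_check_digit(first12: str) -> int:
    """Return the check digit for the first 12 digits of an ISBN-13."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Digits in even positions (0-based) get weight 1, odd positions weight 3.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check a 13-digit ISBN, ignoring hyphens and spaces."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


if __name__ == "__main__":
    for sample in ("9786500718836", "9786501276830"):
        print(sample, is_valid_isbn13(sample))
```

| Because the last digit is derived from the first 12, consecutive
| 12-digit bodies do not produce consecutive 13-digit integers.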
| matthberg wrote:
| The winning submission [0] was discussed on HN recently [1]. It's
| highly impressive from both a technical and a graphic design
| viewpoint: it somehow elegantly visualizes _2 billion_
| books (in a way that resembles a bookcase no less).
|
| [0]: https://phiresky.github.io/blog/2025/visualizing-all-
| books-i...
|
| [1]: https://news.ycombinator.com/item?id=42897120
| rishikeshs wrote:
| Noob here, but can someone explain like I'm five why this is
| important? It looks beautiful nevertheless
| _mitterpach wrote:
| I'll start off by quoting the winning submission.
|
| _Libraries have been trying to collect humanity's knowledge
| almost since the invention of writing. In the digital age, it
| might actually be possible to create a comprehensive collection
| of all human writing that meets certain criteria. That's what
| shadow libraries do - collect and share as many books as
| possible._
|
| _One shadow library, Anna's Archive (which I will not link
| here directly due to copyright concerns), recently posed a
| question: How could we effectively visualize 100,000,000 books
| or more at once? There's lots of data to view: Titles, authors,
| which countries the books come from, which publishers, how old
| they are, how many libraries hold them, whether they are
| available digitally, etc._ -
| https://phiresky.github.io/blog/2025/visualizing-all-books-i...
|
| Basically, legally gray online book repositories such as Anna's
| Archive, who was the creator of this bounty, are trying to
| collect a lot of books. The question quickly arises - how many
| books are there?
|
| The best way to track books is by using the ISBN (International
| Standard Book Number), basically the personal ID of any given
| book, assigned by an international agency. Now that you
| know which books exist, you can check which books your
| repository already has and which ones are missing.
|
| But ISBN covers the space of over 2 billion possible existing
| books. That's a lot. So, Anna's Archive has created a contest
| to display this space in the cleanest way possible. The winning
| submission is very nicely done, and in my view very well
| deserving of the 6,000$ bounty.
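|
| A rough sketch of how the "2 billion" space and the "which ones are
| missing" check could be expressed. It assumes the space is the 978
| and 979 prefixes followed by nine free digits, with the check digit
| dropped; the function names and this indexing are illustrative
| assumptions, not necessarily how any of the submissions do it.

```python
ISBN_SPACE = 2 * 10**9  # two prefixes (978, 979) times 10^9 bodies each


def isbn13_to_index(isbn: str) -> int:
    """Map an ISBN-13 to a position in 0 .. ISBN_SPACE - 1."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("expected 13 digits")
    body = int(digits[:12])           # drop the check digit
    index = body - 978_000_000_000    # 978... starts at 0, 979... at 10^9
    if not 0 <= index < ISBN_SPACE:
        raise ValueError("prefix is not 978 or 979")
    return index


def missing_isbns(known: set[str], held: set[str]) -> list[str]:
    """ISBNs known from metadata but not held as files."""
    return sorted(known - held)


if __name__ == "__main__":
    print(isbn13_to_index("9786500718836"))
    print(missing_isbns({"9786500718836", "9786501276830"}, {"9786500718836"}))
```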
| rishikeshs wrote:
| Ok so from what I understood, this visualisation displays all
| the ISBNs that are assigned into countries, then across
| publishers. Books that are not highlighted are the ones that
| are not present on Anna's Archive? Is that so?
|
| Also what do you mean by unassigned?
| c-fe wrote:
| Anna's Archive has books in their archive, but they
| also have other datasets that connect a book ISBN to the
| metadata (title, author, publisher, ...).
|
| In my visualisation https://isbnviz.pages.dev you can see
| which books they actually have the files of (blue) and
| which ones they know exist because they have the metadata
| from some other source (like google books, ...) (red).
| Finally, there are also ISBNs not contained in any of the
| sets that Anna's Archive has, and these are either assigned
| or not assigned. A lot of the 979-prefixed ISBNs are not
| assigned, meaning no country/publisher has the right to
| assign them to a book. Other ISBNs are assigned to a
| publisher, but they just haven't published a book with that
| ISBN yet. Or they may have published a book, but Anna's
| Archive doesn't know about the book because it's not in their
| datasets (or the ones they scraped).
| tokai wrote:
| I like Anna's Archive but it's definitely not legally gray.
| spudlyo wrote:
| There are places that have a minimal or no formal
| recognition of IP rights. Not counting stateless or
| breakaway regions like Transnistria and Sealand, countries
| like Somalia and South Sudan either do not have a
| government-run IP system, or in the case of South Sudan are
| not part of the Berne Convention. I doubt that Anna's
| Archive operates in one of these places, but there are
| still safe harbors for their mission.
| c-fe wrote:
| I'm slightly surprised mine won 3rd place, I believe they liked my
| simplicity and visualisation. Hosted at https://isbnviz.pages.dev
|
| But honestly, I find both of these better:
|
| - https://bwv-1011.github.io/isbn-viewer/
|
| - https://anna.candyland.page/map-sample.html
|
| in particular the one from bwv is technically similar but just
| all around better than mine, it is what I would want mine to be
| highcountess wrote:
| I'm glad you said that, because I was also surprised by the
| fact that the bwv-1011 only made it to honorable mention even
| though its technical focus was on visualizing the rarity of
| books, which ostensibly was the primary objective of the whole
| effort.
| gknoy wrote:
| I really like that your page talks about _why_ a Hilbert
| curve is good. I don't remember ever learning about those
| before, and now hopefully if I'm ever trying to visualize 1D
| data, I might remember that :)
| matsemann wrote:
| What is it that makes yours and bwv's have a floating island with
| spain/italy/++ in addition to them being represented in the
| main blob?
| c-fe wrote:
| It's due to how those ISBN ranges were handed out - I think
| they probably gave a block like 978-53 (for example) to those
| countries, meaning the right to distribute ISBNs
| 978-530-000-000 to 978-539-999-999, and then later they ran
| out or had all subblocks distributed to publishers, and then
| they got a new block further away (so not 978-54 in my
| example). Therefore those blocks are not numerically close
| to each other, and thus they are also separate "islands" in
| the Hilbert space.
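|
| The block arithmetic in the example above can be sketched like this.
| It works on the 12-digit body (check digit dropped), as in the
| 978-530-000-000 notation; `prefix_range` is a made-up helper name,
| and 978-53 is the commenter's hypothetical block, not a real
| allocation.

```python
def prefix_range(prefix: str, body_len: int = 12) -> tuple[int, int]:
    """Return the lowest and highest 12-digit bodies covered by a prefix."""
    digits = prefix.replace("-", "")
    if not digits.isdigit() or not 0 < len(digits) <= body_len:
        raise ValueError("prefix must be 1..12 digits")
    free = body_len - len(digits)      # digits the block holder may choose
    low = int(digits) * 10**free
    high = low + 10**free - 1
    return low, high


if __name__ == "__main__":
    low, high = prefix_range("978-53")
    print(low, high, high - low + 1)
```

| A neighbouring block such as 978-54 starts right after 978-53 ends,
| while a block handed out later under a distant prefix starts far
| away in the numbering, which is what produces separate islands.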
| matsemann wrote:
| I see, thanks for explaining. Cool that your visualization
| then shows these idiosyncrasies!
| c-fe wrote:
| Thanks! That is indeed all thanks to using the Hilbert
| curve fractal, which has the property that it maps numbers
| which are close together onto 2D (or higher dimensional)
| coordinates which are close together. It's a very cool
| property! It's used in lots of contexts for that reason.
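|
| A minimal sketch of that locality property, using the well-known
| iterative distance-to-coordinate conversion for the Hilbert curve.
| This is a generic textbook version under the assumption of a square
| grid whose side is a power of two; it is not taken from any of the
| submissions, and `hilbert_d2xy` is just an illustrative name.

```python
def hilbert_d2xy(side: int, d: int) -> tuple[int, int]:
    """Convert distance d along the curve to (x, y) on a side x side grid.

    side must be a power of two and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


if __name__ == "__main__":
    points = [hilbert_d2xy(4, d) for d in range(16)]
    print(points)
```

| Every step along the 1D index moves exactly one cell in 2D, so a
| contiguous run of numbers always lands in one connected patch.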
| abetusk wrote:
| I'm also surprised that I got 3rd place.
|
| But in terms of comparison of yours to bwv, I don't agree that
| bwv's is technically superior in every way. It lacks
| comparison, ISBN selection and link creation. bwv's main focus
| looks to be that one feature to highlight the rare books
| without trying to get the other requirements that AA wanted.
| c-fe wrote:
| Congrats to you too! Indeed, I think they could have improved
| the visual and comparison part, it's a bit dark and not too
| interesting to look at. But I am envious of how smooth their
| tiling is. My tiles are 4096x4096 which allows me to satisfy
| both the 20,000 file limit and the max 20 MB file limit
| imposed by Cloudflare. I had some issues with smaller tiles,
| and wanting to host it on Cloudflare restricted me from doing
| 512x512 tiles iirc. Also I really like that they extracted
| the publisher information and put that as a pmtile vector,
| that's something I attempted but ultimately ran out of time
| with.
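|
| Rough arithmetic for the file-limit point above. The 65,536-pixel
| image side and the full quadtree pyramid are assumptions made purely
| for illustration; the thread does not state the actual image size
| or how many layers were tiled. Only the 20,000 file limit and the
| two tile sizes come from the comment.

```python
FILE_LIMIT = 20_000  # file limit quoted in the comment above


def pyramid_tile_count(side: int, tile: int) -> int:
    """Count tiles in a zoom pyramid, halving the image until one tile fits."""
    total = 0
    while side >= tile:
        per_axis = -(-side // tile)   # ceiling division
        total += per_axis * per_axis
        side //= 2
    return total


if __name__ == "__main__":
    side = 65_536  # hypothetical: 2**16, smallest power-of-two side with >= 2e9 cells
    for tile in (4096, 512):
        count = pyramid_tile_count(side, tile)
        print(tile, count, "fits" if count <= FILE_LIMIT else "exceeds limit")
```

| Under these assumptions a single layer of 512x512 tiles already
| overshoots the file limit, while 4096x4096 tiles stay far below it.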
| soneca wrote:
| Where is the database from? How and how often is it updated?
|
| I have two self-published books with ISBNs. Neither of them has
| the details in the 1st place submission (I assume it won't be in
| any other as well?).
|
| One was published on Feb 23 and the other on Dec 24. I had hoped
| at least the older one would be there. Does anyone know why they
| are not?
|
| The ISBNs:
|
| - 9786500718836
|
| - 9786501276830
| ziddoap wrote:
| From https://annas-archive.org/blog/all-isbns.html :
|
| > _We started mapping ISBNs two years ago with our scrape of
| ISBNdb. Since then, we have scraped many more metadata sources,
| such as Worldcat, Google Books, Goodreads, Libby, and more. A
| full list can be found on the "Datasets" and "Torrents" pages
| on Anna's Archive. We now have by far the largest fully open,
| easily downloadable collection of book metadata (and thus
| ISBNs) in the world._
|
| So, your books would need to be present in one of the
| databases that Anna's Archive scraped, at the time they scraped
| it.
| jonplackett wrote:
| Is there anywhere that lists/publicises/collates competitions
| like this?
|
| I would like to have had a go at this but you often only find out
| about these things when winners are announced.
| bondant wrote:
| The winning submission kind of reminds me of the Eagle mode file
| manager where you can zoom into a directory to see files in it
| and keep zooming to access subdirectories.
|
| https://eaglemode.sourceforge.net/emvideo.html
| ChrisMarshallNY wrote:
| Love the Trantor reference!
| franciscop wrote:
| I'm curious why there's no clear "Spanish" in these ISBN
| visualizations; there's 2 slots for English, one for France,
| Germany, Japan, Soviet Union, China, etc. but no big one for
| Spain. Do we really have so few books in Spanish? Or is this a
| predominantly English distribution?
|
| I say this as someone who grew up in Spanish libraries and book
| shops, surrounded and immersed in Spanish books, so it feels a
| bit strange to see the tiny bit we occupy in the world map here.
| rsecora wrote:
| The dataset consists of books from Anna's Archive, each
| identified by an ISBN. The ISBNs and titles are extracted from
| datasets [1], which include magazines and books primarily in
| Chinese, English, and French.
|
| Example: Germany publishes five times more books than the
| Netherlands [2], and Spain publishes twice as many books as the
| Netherlands. However, in visualizations, Germany appears
| similar to the Netherlands, while Spain and Mexico do not
| align with the high-level labels [3].
|
| [1] https://annas-archive.li/datasets
|
| [2] https://internationalpublishers.org/wp-
| content/uploads/2023/...
|
| [3] https://software.annas-archive.li/AnnaArchivist/annas-
| archiv...
| glenstein wrote:
| >I'm curious why there's no clear "Spanish" in these ISBN
| visualizations
|
| I had the exact same question, and I do have a completely
| unsupported theory. There's one large block that appears to be
| Argentina, or possibly Peru, although their titles are on the
| fringes of the large block. The block is otherwise unlabeled, no
| name sitting at the center of the block like you see with the
| other major ones. I would be slightly surprised if it were
| entirely Argentina, but it would make a lot of sense if that
| block were Spanish.
| layer8 wrote:
| Does Anna's Archive track and account for duplicate ISBNs?
|
| https://scis.edublogs.org/2017/09/28/the-dreaded-case-of-dup...
| no-reply wrote:
| I don't see any Arabic literature. Curious whether that is due to
| a lack of actual digital/OCR text or a lack of availability of the
| pdf/epub formats of the books.
| divbzero wrote:
| These ISBN visualizations remind me of the maps of IPv4 address
| space.
|
| https://xkcd.com/195/
|
| https://ant.isi.edu/address/
|
| https://www.caida.org/archive/id-consumption/census-map/
___________________________________________________________________
(page generated 2025-02-27 23:00 UTC)