hngopher.com

       [HN Gopher] Winners of the $10k ISBN visualization bounty
       ___________________________________________________________________
        
       Author : yamrzou
       Score  : 553 points
       Date   : 2025-02-25 06:26 UTC (2 days ago)
        
 (HTM) web link (annas-archive.org)
 (TXT) w3m dump (annas-archive.org)
        
       | ofou wrote:
       | My most sincere love to all shadow libraries out there, you're
       | doing god's work.
        
         | xtracto wrote:
         | They do half of the work (which is a helluva lot)... the other
         | half is done by the volunteers that digitize books.
         | 
         | I was looking at my country's "shelf" and it's so sad to see
         | so many missing titles. I almost wanted to go to my local
         | library and digitize some of them. The old ones that are out of
         | print and impossible to acquire right now...
         | 
         | So much knowledge lost.
        
           | FabHK wrote:
           | To be fair, the authors of the books also contribute quite a
           | bit.
        
       | dylan604 wrote:
       | I had a Pavlovian response to reach for the defrag program at
       | first sight of the top image.
        
         | 2Gkashmiri wrote:
         | win 98 had the best animation. pity everything beyond that was
         | dogshit
        
       | rahimnathwani wrote:
       | This is amazing.
       | 
       | One thing I found odd.
       | 
       | I searched for 'Stubborn Attachments' which worked.
       | 
       | On the same bookshelf there are several other Stripe Press books.
       | 
       | One of them is called Zero to One Hundred, by Stephanie Friedman.
       | 
       | When you search that book on Amazon, it has a different title,
       | which I guess is reasonable as the book hasn't been published yet
       | and they may not have finalized the decision:
       | https://a.co/d/bQX5CNf
       | 
       | Here's where it gets weird:
       | 
       | - if you search for the book 'Zero to one hundred' (the title
       | shown on the 'shelf') it doesn't come up
       | 
       | - if you search for the book by its ISBN, it does come up, but
       | the name displayed in the search results is yet another alternate
       | title. And the bookshelf displays that title. So the same part of
       | the bookshelf looks different depending on what you searched for.
       | 
       | I haven't yet read the blog post about how this impressive
       | visualization works, so I don't have an idea of why this is the
       | case.
        
         | spondyl wrote:
         | I don't think it's the tool that's the issue, I think it's the
         | book itself?
         | 
         | If you search the ISBN on the web, you'll get "Zero to One
         | Hundred" with the cover of "Built to Grow" and vice versa.
         | 
         | There's also "Experiment, Build, Scale" which is the book that
         | the visualisation shows, also with the same ISBN attributed to
         | the previous two.
         | 
         | Experiment, Build, Scale seems to be the only book of
         | Stephanie's that is in Google Books while Worldcat has "Zero to
         | One Hundred" with the cover art for "Built to Grow".
         | 
         | Most of the online bookstore pages have this mess so I wouldn't
         | blame the tool for what seems like an upstream data quality
         | issue.
        
           | rahimnathwani wrote:
           | Sorry I didn't mean to make it seem like I think the tool is
           | at fault.
           | 
           | I just think it's interesting that the book title shows
           | differently on the shelf depending on whether you reach it
           | via an ISBN search, vs. if you discover it by panning from a
           | nearby book.
        
           | closewith wrote:
           | > Most of the online bookstore pages have this mess so I
           | wouldn't blame the tool for what seems like an upstream data
           | quality issue.
           | 
           | I think that's an uncharitable read of the GP's comment. I
           | read it as curiosity about how the upstream data issues
           | present in the tool, which also interests the part of my
           | brain that likes to solve minor mysteries.
        
       | vessenes wrote:
       | Public request: anybody here who hates Anna's and wants to make a
       | principled complaint about it? I love it and the idea of it so
       | much, but I imagine some feel differently and I'd like to hear
       | your best takedown shot.
        
         | oguz-ismail wrote:
         | it's not libgen
        
         | kaladin-jasnah wrote:
         | Their download wait time is upsetting to me because I'm
         | impatient and cheap (else you have to pay). At least they have
         | the libgen links now.
         | 
         | I don't hate Anna's Archive though.
        
           | ksynwa wrote:
           | The "external links" section is the only thing making the
           | website usable for non-subscribers.
        
         | WillAdams wrote:
         | Well, I made a comment at:
         | 
         | https://news.ycombinator.com/item?id=43193432
         | 
         | Does that count?
         | 
         | The thing is, if we're going to have GPL software, then we need
         | copyright.
         | 
         | Yes, the terms/lengths need to be adjusted, but one can't do
         | that by fiat/unilaterally.
        
       | ivolimmen wrote:
       | I have no idea what's on the site, as my provider blocks it
       | because of European sanctions against Russia, as this is one of
       | the RussiaToday sites.
        
         | notpushkin wrote:
         | Do you have any evidence?
        
         | chungus wrote:
         | Judging by your profile location being in the Netherlands, I
         | think you are confused by the generic Ziggo ISP blocked page
         | [1], which lists Russia Today and Sputnik News and then, in
         | another post, ThePirateBay.
         | 
         | In this case, the ISP blocked it because the website is Anna's
         | Archive [2], which was blocked around a year ago, but they have
         | not made a post about that.
         | 
         | If you put "pcm." in front of the link it will work (for now)
         | 
         | You should probably edit your post, so as not to misinform. But
         | I have to admit this confusion stems from bad decisions at the
         | ISP.
         | 
         | [1] https://www.ziggo.nl/website-geblokkeerd
         | 
         | [2] https://en.wikipedia.org/wiki/Anna%27s_Archive#Netherlands
        
           | ivolimmen wrote:
           | Seems that editing is not possible due to the negative points
           | I gathered, which is weird as I just reported that I cannot
           | view it. People seem to view everything through a political
           | lens now. But thank you for your information; I saw the post.
        
             | tokai wrote:
             | No, you were downvoted because you claimed something false.
        
               | svdr wrote:
               | It is false, but I would not blame the parent; the ISP
               | blocked page is unclear and suggests the block is linked
               | to Russia.
        
       | bawolff wrote:
       | I feel like visualizations of large datasets which are viewer-
       | directed (i.e. they want you to "explore" the data instead of
       | trying to tell you something specific about it or communicate a
       | narrative) are often "pretty" but never particularly
       | enlightening. I feel like that holds true for these in
       | particular.
        
         | WillAdams wrote:
         | The thing is, ISBNs map to:
         | 
         | - publisher
         | - assigned title
         | - (roughly) order of publication
         | 
         | That's all that they communicate --- there is no hierarchy here
         | to aid in discovery or to organize the content (and further
         | complicating things, the same text may appear multiple times in
         | a different binding --- a differentiation which is immaterial
         | to an e-book).
         | 
         | The elephant in the room of course is the matter that "Anna's
         | Archive" is not a legitimate book repository, but a piracy
         | site, so what they are showcasing is how compleat (and brazen)
         | their theft (and attendant lack of compensation) is.
         | 
         | This would be far more interesting if it were based on an
         | hierarchical system such as LoC, and instead afforded an
         | interface for accessing legitimately available books as are
         | available from https://www.gutenberg.org/ or listed at:
         | http://onlinebooks.library.upenn.edu/ or worked on at:
         | https://www.wikibooks.org/
        
           | bawolff wrote:
           | > The thing is, ISBNs map to:
           | > - publisher
           | > - assigned title
           | > - (roughly) order of publication
           | 
           | I assume the task isn't just to visualize isbns literally.
           | Presumably you are allowed to cross reference with other
           | data.
           | 
           | > The elephant in the room of course is the matter that
           | "Anna's Archive" is not a legitimate book repository, but a
           | piracy site,
           | 
           | I think it's pretty clear that the target audience doesn't
           | care. I don't think the target audience holding differing
           | political views is really a valid criticism of the project. It
           | should be evaluated in the context and audience it was
           | created for.
        
             | WillAdams wrote:
             | This is not a political stance, but one of basic questions
             | of authorship and what compensation authors should receive
             | and what control they should have over their work.
             | 
             | See arguments by Alexander Pope in Pope _V._ Curll.
        
               | mistrial9 wrote:
               | when China decided to wholesale ignore Western copyright
               | in the digital age, completely.. the equation changed
               | IMHO.
        
               | WillAdams wrote:
               | Yes, but dealing with that politically would be made
               | easier by having the moral high ground.
        
               | mannyv wrote:
               | Not really, because it depends on the basis of morality.
               | In fact, this 'morality' problem is shown in the
               | existence of libraries in the US.
               | 
               | Is a book a collective good? Or property? In the US the
               | answer is 'both' in an awkward way. But the US does know
               | that having books behind a paywall is not in society's
               | best interest.
               | 
               | And in reality 99% of the books will never be read, which
               | makes their 'value' as property suspect.
        
               | WillAdams wrote:
               | If so few books are to be read, then why is it so
               | difficult to pay for those which are?
        
               | bawolff wrote:
               | > This is not a political stance, but one of basic
               | questions of authorship and what compensation authors
               | should receive and what control they should have over
               | their work.
               | 
               | Questions of compensation and ownership are one of the
               | most political questions of all.
               | 
               | What exactly do you think communist revolutions were
               | revolting over?
        
           | zozbot234 wrote:
           | > This would be far more interesting if it were based on an
           | hierarchical system such as LoC, and instead afforded an
           | interface for accessing legitimately available books
           | 
           | Isn't this exactly what Open Library does?
        
             | WillAdams wrote:
             | Given that "Textbooks" are separated out and "Animals" and
             | "Childrens' Books" and "Health & Wellness" are top-level
             | categories? and that it mixes in books which are not
             | available for download, not really.
             | 
             | The UI is not all that great either.
             | 
             | I would like to see:
             | 
             | - an hierarchical list with a hierarchy which actually
             | makes sense and truly organizes knowledge
             | 
             | - of legitimately available downloadable books
             | 
             | - which has a nice UI
             | 
             | but it's far more important that LLMs have training data
             | without consideration of recompense than any other
             | consideration.
        
         | pphysch wrote:
         | That's my issue with attempts to 3D-ify viz. Unless you are
         | actually modeling a 3D volume, like medical imaging or CAD, the
         | added "forced exploration" of 3D simply hides insights.
        
       | boznz wrote:
       | no wonder nobody can find my book :-)
        
       | robingchan wrote:
       | This was great fun to enter nevertheless, congrats all involved.
       | 
       | My entry is still live for now for anyone curious:
       | 
       | https://d199hl4t3ts6d9.cloudfront.net/
        
       | TomK32 wrote:
       | Fascinating. It allows for some interesting observations when you
       | zoom in, as I did on this one (sadly no direct links to
       | coords/zoom level): https://archive.anarchy.cool/maps/isbn.html
       | You can find
       | publishers like Hueber Verlag[1] in the eastern part of the
       | German language section. They spread their ISBN numbers in a
       | pattern with something like 1360000 between them (I know, ISBN
       | having a checksum leads to gaps in the numbering), which
       | generates a repetitive pattern with plenty of empty space. It is
       | so wasteful on this huge chunk they have.
       | 
       | Are there no rules on how publishers have to assign their
       | numbers? Just so they could hand back an unused block if they
       | don't need it any longer.
       | 
       | [1] I can see how publishing learning material in 30 languages
       | can give people "ideas" when assigning ISBN numbers
       | https://de.wikipedia.org/wiki/Hueber_Verlag
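
The checksum aside above can be made concrete with a minimal sketch of the
standard ISBN-13 check digit (weights alternating 1 and 3 over the first 12
digits). The function names are illustrative only and are not taken from any
submission; the sample body is the first 12 digits of an ISBN quoted elsewhere
in this thread.

```python
def isbn13_check_digit(body12: str) -> int:
    """Return the check digit for the first 12 digits of an ISBN-13."""
    if len(body12) != 12 or not (body12.isascii() and body12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(body12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Validate a 13-digit ISBN, ignoring hyphens."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


# Consecutive 12-digit bodies generally do not yield consecutive
# 13-digit numbers, because the trailing check digit jumps around.
body = 978650071883
for b in (body, body + 1):
    print(f"{b}{isbn13_check_digit(str(b))}")
```

Read as plain 13-digit integers, neighbouring ISBNs therefore differ by
irregular amounts, which is one reason a visualization would typically plot
the 12-digit body rather than the full number.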
        
       | matthberg wrote:
       | The winning submission [0] was discussed on HN recently [1]. It's
       | highly impressive in both its technical decisions and its
       | graphic design; it somehow elegantly visualizes _2 billion_
       | books (in a way that resembles a bookcase, no less).
       | 
       | [0]: https://phiresky.github.io/blog/2025/visualizing-all-books-i...
       | 
       | [1]: https://news.ycombinator.com/item?id=42897120
        
       | rishikeshs wrote:
       | Noob here, but can someone explain like I'm five why this is
       | important? It looks beautiful nevertheless.
        
         | _mitterpach wrote:
         | I'll start off by quoting the winning submission.
         | 
         |  _Libraries have been trying to collect humanity's knowledge
         | almost since the invention of writing. In the digital age, it
         | might actually be possible to create a comprehensive collection
         | of all human writing that meets certain criteria. That's what
         | shadow libraries do - collect and share as many books as
         | possible._
         | 
         |  _One shadow library, Anna's Archive (which I will not link
         | here directly due to copyright concerns), recently posed a
         | question: How could we effectively visualize 100,000,000 books
         | or more at once? There's lots of data to view: Titles, authors,
         | which countries the books come from, which publishers, how old
         | they are, how many libraries hold them, whether they are
         | available digitally, etc._ -
         | https://phiresky.github.io/blog/2025/visualizing-all-books-i...
         | 
         | Basically, legally gray online book repositories such as Anna's
         | Archive, which created this bounty, are trying to collect a
         | lot of books. The question quickly arises - how many books are
         | there?
         | 
         | The best way to track books is by ISBN (International
         | Standard Book Number), basically the personal ID of any given
         | book, given to books by an international agency. Now that you
         | know which books exist, you can check which books your
         | repository already has and which ones are missing.
         | 
         | But the ISBN space covers about 2 billion possible books.
         | That's a lot. So, Anna's Archive has created a contest
         | to display this space in the cleanest way possible. The winning
         | submission is very nicely done, and in my view very well
         | deserving of the $6,000 bounty.
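
To make the size of that space concrete, here is a small sketch under the
assumption that only the two ISBN-13 prefixes 978 and 979 are in use, each
followed by nine freely assignable digits and one check digit. The names
`TOTAL_SLOTS` and `isbn13_to_index` are hypothetical helpers for illustration,
not part of any submission.

```python
ISBN13_PREFIXES = ("978", "979")  # assumption: the two prefixes currently in use
FREE_DIGITS = 9                   # digits between the prefix and the check digit

# Two prefixes times 10^9 bodies each gives the "2 billion" figure.
TOTAL_SLOTS = len(ISBN13_PREFIXES) * 10 ** FREE_DIGITS


def isbn13_to_index(isbn: str) -> int:
    """Map an ISBN-13 to a position in [0, TOTAL_SLOTS), ignoring the check digit."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected 13 digits")
    body = int(digits[:12])                    # drop the check digit
    index = body - 978 * 10 ** FREE_DIGITS     # 978-000000000 becomes slot 0
    if not 0 <= index < TOTAL_SLOTS:
        raise ValueError("prefix is not 978 or 979")
    return index


print(TOTAL_SLOTS)
print(isbn13_to_index("9786500718836"))
```

Each slot is one pixel-sized candidate for a visualization: present in the
archive, known only from metadata, or empty.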
        
           | rishikeshs wrote:
           | Ok so from what I understood, this visualisation displays all
           | the ISBNs that are assigned into countries, then across
           | publishers. Books that are not highlighted are the ones that
           | are not present on Anna's Archive? Is that so?
           | 
           | Also what do you mean by unassigned?
        
             | c-fe wrote:
             | Anna's Archive has books in their archive, but they also
             | have other datasets that connect a book's ISBN to the
             | metadata (title, author, publisher, ...).
             | 
             | In my visualisation https://isbnviz.pages.dev you can see
             | which books they actually have the files of (blue) and
             | which ones they know exist because they have the metadata
             | from some other source (like Google Books, ...) (red).
             | Finally, there are also ISBNs not contained in any of the
             | sets that Anna's Archive has, and these are either assigned
             | or not assigned. A lot of the 979-prefixed ISBNs are not
             | assigned, which means no country/publisher has the right to
             | assign them to a book. Other ISBNs are assigned to a
             | publisher, but they just haven't published a book with that
             | ISBN yet. Or they may have published a book, but Anna's
             | Archive doesn't know about it because it's not in their
             | dataset (or the ones they scraped).
        
           | tokai wrote:
           | I like Anna's Archive but it's definitely not legally gray.
        
             | spudlyo wrote:
             | There are places that have a minimal or no formal
             | recognition of IP rights. Not counting stateless or
             | breakaway regions like Transnistria and Sealand, countries
             | like Somalia and South Sudan either do not have a
             | government-run IP system, or in the case of South Sudan are
             | not part of the Berne Convention. I doubt that Anna's
             | Archive operates in one of these places, but there are
             | still safe harbors for their mission.
        
       | c-fe wrote:
       | I'm slightly surprised mine won 3rd place; I believe they liked my
       | simplicity and visualisation. Hosted at https://isbnviz.pages.dev
       | 
       | But honestly, I find both of these better:
       | 
       | - https://bwv-1011.github.io/isbn-viewer/
       | - https://anna.candyland.page/map-sample.html
       | 
       | In particular, the one from bwv is technically similar but just
       | all-around better than mine; it is what I would want mine to be.
        
         | highcountess wrote:
         | I'm glad you said that, because I was also surprised by the
         | fact that the bwv-1011 entry only made it to honorable mention
         | even though its technical focus was on visualizing the rarity of
         | books, which ostensibly was the primary objective of the whole
         | effort.
        
           | gknoy wrote:
           | I really like that your page talks about _why_ a Hilbert
           | curve is good. I don't remember ever learning about those
           | before, and now hopefully if I'm ever trying to visualize 1D
           | data, I might remember that :)
        
         | matsemann wrote:
         | What is it that makes yours and bwv's have a floating island
         | with Spain/Italy/++ in addition to them being represented in
         | the main blob?
        
           | c-fe wrote:
           | It's due to how those ISBN ranges were handed out. I think
           | they probably gave a block like 978-53 (for example) to those
           | countries, meaning the right to distribute ISBNs
           | 978-530-000-000 to 978-539-999-999. Later they ran out, or
           | had all subblocks distributed to publishers, and got a new
           | block further away (so not 978-54 in my example). Those
           | blocks are therefore not numerically close to each other,
           | and so they are separate "islands" in the Hilbert space.
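
The block arithmetic in the comment above can be sketched as follows. The
helper `prefix_range` is hypothetical and works on the 12-digit body (the
ISBN-13 without its check digit); 978-53 and 978-54 are the commenter's own
illustrative prefixes, not real allocations.

```python
def prefix_range(prefix: str, body_len: int = 12):
    """Return (first, last, count) of the 12-digit bodies under a prefix."""
    digits = prefix.replace("-", "")
    if not digits.isdigit() or len(digits) > body_len:
        raise ValueError("prefix must be at most 12 digits")
    free = body_len - len(digits)            # digits left to assign freely
    first = int(digits + "0" * free)
    last = int(digits + "9" * free)
    return first, last, 10 ** free


first, last, count = prefix_range("978-53")
print(first, last, count)

# A directly following prefix starts right after the previous block ends,
# so only a block granted "further away" shows up as a separate island.
print(prefix_range("978-54")[0] - last)
```

A block that is numerically distant from the first one ends up far away on
any locality-preserving layout, which matches the "islands" described above.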
        
             | matsemann wrote:
             | I see, thanks for explaining. Cool that your visualization
             | then shows these idiosyncrasies!
        
               | c-fe wrote:
               | Thanks! That is indeed all thanks to using the Hilbert
               | curve fractal, which has the property that it maps numbers
               | which are close together onto 2D (or higher-dimensional)
               | coordinates which are close together. It's a very cool
               | property! It's used in lots of contexts for that reason.
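
The locality property described above can be checked with a minimal sketch of
the usual iterative index-to-coordinate conversion for a Hilbert curve. This
is a generic textbook formulation, not the code of any submission; the name
`hilbert_d2xy` and the choice of `order` are assumptions for illustration.

```python
def hilbert_d2xy(order: int, d: int) -> tuple:
    """Map position d along a Hilbert curve to (x, y) on a 2^order grid."""
    side = 1 << order
    if not 0 <= d < side * side:
        raise ValueError("d out of range for this order")
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Walk a 16x16 grid: every step along the curve moves to a neighbouring cell.
points = [hilbert_d2xy(4, d) for d in range(256)]
steps = {abs(ax - bx) + abs(ay - by)
         for (ax, ay), (bx, by) in zip(points, points[1:])}
print(steps)
```

Consecutive indices always land on adjacent cells, so a contiguous ISBN block
becomes a compact patch. The converse does not hold in general: two adjacent
cells can be far apart along the curve.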
        
         | abetusk wrote:
         | I'm also surprised that I got 3rd place.
         | 
         | But in terms of comparison of yours to bwv, I don't agree that
         | bwv's is technically superior in every way. It lacks
         | comparison, ISBN selection and link creation. bwv's main focus
         | looks to be that one feature to highlight the rare books
         | without trying to get the other requirements that AA wanted.
        
           | c-fe wrote:
           | Congrats to you too! Indeed, I think they could have improved
           | the visual and comparison part; it's a bit dark and not too
           | interesting to look at. But I am envious of how smooth their
           | tiling is. My tiles are 4096x4096, which allows me to satisfy
           | both the 20,000 file limit and the max 20 MB file limit
           | imposed by Cloudflare. I had some issues with smaller tiles,
           | and wanting to host it on Cloudflare restricted me from doing
           | 512x512 tiles iirc. Also I really like that they extracted
           | the publisher information and put that as a pmtile vector;
           | that's something I attempted but ultimately ran out of time
           | with.
        
       | soneca wrote:
       | Where is the database from? How and how often is it updated?
       | 
       | I have two self-published books with ISBNs. Neither of them has
       | the details in the 1st place submission (I assume it won't be in
       | any other as well?).
       | 
       | One was published on Feb 23 and the other on Dec 24. I had hoped
       | at least the older one would be there. Does anyone know why they
       | are not?
       | 
       | The ISBNs:
       | 
       | - 9786500718836
       | 
       | - 9786501276830
        
         | ziddoap wrote:
         | From https://annas-archive.org/blog/all-isbns.html :
         | 
         | > _We started mapping ISBNs two years ago with our scrape of
         | ISBNdb. Since then, we have scraped many more metadata sources,
         | such as Worldcat, Google Books, Goodreads, Libby, and more. A
         | full list can be found on the "Datasets" and "Torrents" pages
         | on Anna's Archive. We now have by far the largest fully open,
         | easily downloadable collection of book metadata (and thus
         | ISBNs) in the world._
         | 
         | So, your books would need to be present in one of the
         | databases that Anna's Archive scraped, at the time they scraped
         | it.
        
       | jonplackett wrote:
       | Is there anywhere that lists/publicises/collates competitions
       | like this?
       | 
       | I would like to have had a go at this but you often only find out
       | about these things when winners are announced.
        
       | bondant wrote:
       | The winning submission kind of reminds me of the Eagle Mode file
       | manager where you can zoom into a directory to see files in it
       | and keep zooming to access subdirectories.
       | 
       | https://eaglemode.sourceforge.net/emvideo.html
        
       | ChrisMarshallNY wrote:
       | Love the Trantor reference!
        
       | franciscop wrote:
       | I'm curious why there's no clear "Spanish" in these ISBN
       | visualizations; there's 2 slots for English, one for France,
       | Germany, Japan, Soviet Union, China, etc. but no big one for
       | Spain. Do we really have so few books in Spanish? Or is this a
       | predominantly English distribution?
       | 
       | I say this as someone who grew up in Spanish libraries and book
       | shops, surrounded and immersed in Spanish books, so it feels a
       | bit strange to see the tiny bit we occupy in the world map here.
        
         | rsecora wrote:
         | The dataset consists of books from Anna's Archive, each
         | identified by an ISBN. The ISBNs and titles are extracted from
         | datasets [1], which include magazines and books primarily in
         | Chinese, English, and French.
         | 
         | Example: Germany publishes five times more books than the
         | Netherlands [2], and Spain publishes twice as many books as the
         | Netherlands. However, in visualizations, Germany appears
         | similar to the Netherlands, while Spain and Mexico do not
         | align with the high-level labels [3].
         | 
         | [1] https://annas-archive.li/datasets
         | 
         | [2] https://internationalpublishers.org/wp-content/uploads/2023/...
         | 
         | [3] https://software.annas-archive.li/AnnaArchivist/annas-archiv...
        
         | glenstein wrote:
         | >I'm curious why there's no clear "Spanish" in these ISBN
         | visualizations
         | 
         | I had the exact same question, and I do have a completely
         | unsupported theory. There's one large block that appears to be
         | Argentina, or possibly Peru, although their titles are on the
         | fringes of the large block. The block is otherwise unlabeled, no
         | name sitting at the center of the block like you see with the
         | other major ones. I would be slightly surprised if it were
         | entirely Argentina, but it would make a lot of sense if that
         | block were Spanish.
        
       | layer8 wrote:
       | Does Anna's Archive track and account for duplicate ISBNs?
       | 
       | https://scis.edublogs.org/2017/09/28/the-dreaded-case-of-dup...
        
       | no-reply wrote:
       | I don't see any Arabic literature. Curious whether that's due to
       | a lack of actual digital/OCR text or a lack of availability of
       | the pdf/epub formats of the books.
        
       | divbzero wrote:
       | These ISBN visualizations remind me of the maps of IPv4 address
       | space.
       | 
       | https://xkcd.com/195/
       | 
       | https://ant.isi.edu/address/
       | 
       | https://www.caida.org/archive/id-consumption/census-map/
        
       ___________________________________________________________________
       (page generated 2025-02-27 23:00 UTC)