[HN Gopher] Show HN: I mapped HN's favorite books with GPT-4o
       ___________________________________________________________________
        
       Show HN: I mapped HN's favorite books with GPT-4o
        
       Hey HN! I love finding new books to read on here. I wanted to
       gather the most mentioned books and recreate the serendipity of
       physical browsing. I scraped 20k comments from HN threads related
       to reading, extracted the references and opinions using GPT-4o
       mini, and visualised their embeddings as a map.
        
       - OpenAI's embeddings were processed using UMAP and HDBSCAN. A
         direct 2D projection from the text embeddings didn't yield
         visually interesting results. Instead, HDBSCAN is first applied
         on a high-dimensional projection. Those clusters tend to
         correspond to different genres. The genre memberships are then
         embedded using a second round of UMAP (using Hellinger
         distance), which results in pleasingly dense structures.
        
       - The books' descriptions are based on extractions from the
         comments and GPT's general knowledge. Quality levels vary, and
         this leads to some oddly specific points, but I haven't found
         any yet that are straight up wrong.
        
       - There are multiple books with the same title. Currently, only
         the most popular one of those makes it onto the map.
        
       - It's surprisingly hard to get high-quality book cover images. I
         tried Google Books and a bunch of open APIs, but they all had
         their issues. In the end, I got the covers from Goodreads
         through a hacked-together process that combines their
         autocomplete search with GPT for data linkage. Does anyone know
         of a reliable source?
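A minimal Python sketch of the two-stage pipeline described above, assuming the `umap-learn` and `hdbscan` packages and an `(n_books, dim)` NumPy array of precomputed text embeddings. The parameter values (`n_components=10`, `min_cluster_size=15`, a cosine metric in the first stage) and the helper names are illustrative assumptions, not the author's actual settings.

```python
import numpy as np


def normalize_memberships(memberships: np.ndarray) -> np.ndarray:
    """Turn per-cluster membership scores into probability distributions.

    Negative scores are clipped to zero and each row is rescaled to sum to 1.
    Rows with no mass at all become uniform so they stay valid distributions.
    """
    m = np.clip(np.asarray(memberships, dtype=float), 0.0, None)
    if m.ndim == 1:
        # A single cluster yields one column; keep the 2D (n_books, n_clusters) shape.
        m = m.reshape(-1, 1)
    totals = m.sum(axis=1, keepdims=True)
    uniform = np.full_like(m, 1.0 / m.shape[1])
    safe_totals = np.where(totals > 0, totals, 1.0)  # avoid dividing by zero
    return np.where(totals > 0, m / safe_totals, uniform)


def build_map(embeddings: np.ndarray, seed: int = 42) -> np.ndarray:
    """Sketch: UMAP -> soft HDBSCAN -> second UMAP with Hellinger distance."""
    # Imported here so normalize_memberships works without these packages.
    import hdbscan  # pip install hdbscan
    import umap  # pip install umap-learn

    # Stage 1: project the text embeddings to a moderate dimension (assumed 10).
    reduced = umap.UMAP(
        n_components=10, metric="cosine", random_state=seed
    ).fit_transform(embeddings)

    # Stage 2: cluster in that space; prediction_data=True enables soft memberships.
    clusterer = hdbscan.HDBSCAN(min_cluster_size=15, prediction_data=True).fit(reduced)
    memberships = normalize_memberships(hdbscan.all_points_membership_vectors(clusterer))

    # Stage 3: lay out the membership distributions in 2D with Hellinger distance.
    return umap.UMAP(
        n_components=2, metric="hellinger", random_state=seed
    ).fit_transform(memberships)
```

Soft memberships come from `hdbscan.all_points_membership_vectors`, which requires fitting with `prediction_data=True`; `umap-learn` accepts `"hellinger"` as a built-in metric, so the second stage needs no custom distance function.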
        
       Author : pmaze
       Score  : 167 points
       Date   : 2024-09-07 12:23 UTC (2 days ago)
        
 (HTM) web link (hnbooks.pieterma.es)
 (TXT) w3m dump (hnbooks.pieterma.es)
        
       | paulwarren wrote:
       | Check out the OpenLibrary Covers API:
       | https://openlibrary.org/dev/docs/api/covers
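The Covers API builds image URLs from a key type (ISBN, OCLC, LCCN, OLID, or cover ID), a value, and a size (S, M, or L). A minimal sketch of a URL builder, assuming ISBNs or Open Library IDs are available for each book; the function name is hypothetical.

```python
from urllib.parse import quote

VALID_KEYS = {"isbn", "oclc", "lccn", "olid", "id"}
VALID_SIZES = {"S", "M", "L"}


def openlibrary_cover_url(key: str, value: str, size: str = "L", default: bool = False) -> str:
    """Build an Open Library Covers API URL for one book.

    With default=False the API is asked to return 404 instead of a blank
    placeholder when no cover exists, which makes missing covers easy to detect.
    """
    key = key.lower()
    if key not in VALID_KEYS:
        raise ValueError(f"unsupported key: {key}")
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported size: {size}")
    url = f"https://covers.openlibrary.org/b/{key}/{quote(str(value))}-{size}.jpg"
    return url if default else url + "?default=false"


print(openlibrary_cover_url("isbn", "0385472579"))
```

Asking for a 404 rather than the blank default image lets a scraper fall through to another source when Open Library has no cover.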
        
       | namanyayg wrote:
       | nice project, pieterma.
       | 
       | i'm curious about the decision to use hellinger distance for the
       | second round of UMAP - was that purely empirical or did you have
       | some intuition about why it'd work well for this specific
       | dataset?
       | 
       | also, out of curiosity, what's the most popular book on the map
       | that doesn't have a clear genre cluster?
        
         | pmaze wrote:
         | Thanks!
         | 
         | The cluster memberships that come out of the first round are
         | distributions over the different clusters, e.g. a given book is
         | weighted 0.8 for cluster A and 0.2 for cluster B. The Hellinger
         | distance is well-suited to quantify the difference between two
         | distributions like that. Cosine similarity and Euclidean
         | distance worked as well, but Hellinger gave subjectively nicer
         | results.
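For reference, the Hellinger distance between discrete distributions P and Q is H(P, Q) = (1/sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2), which ranges from 0 (identical) to 1 (no overlap). A minimal Python sketch applied to the 0.8/0.2 example from the reply:

```python
import math


def hellinger(p: list[float], q: list[float]) -> float:
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


a = [0.8, 0.2]  # weighted mostly toward cluster A
b = [0.2, 0.8]  # weighted mostly toward cluster B
print(round(hellinger(a, b), 4))  # 0.4472
```

Because it compares square roots of the probabilities, the distance is bounded and treats membership vectors as distributions rather than arbitrary points; that is one plausible reason it gave nicer layouts here, not something the author stated.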
         | 
         | Very interesting question, I'm not sure! While developing, I
         | noticed that the systems thinking books were spread over
         | different genres, which I found quite pleasing. However, I'm
         | not sure if other books were even more diffuse. I'll have to
         | dig back in and find out :)
        
       | jppope wrote:
       | FYI, the site goes into runaway recursive processing when I load
       | it... the tab goes down in ~30 seconds or so.
        
       | Brajeshwar wrote:
       | This is awesome. Can I please request a way to have a list or a
       | tabular format?
        
         | theturtletalks wrote:
         | https://hnbooks.pieterma.es/features.geojson
        
         | Strongbad536 wrote:
         | I've put a title/author list on a github page here, sorted by
         | title alphabetically
         | 
         | https://github.com/BrianVia/hacker-news-favorite-books
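The GeoJSON file can be turned into a plain table. A minimal sketch that reads a GeoJSON FeatureCollection and prints one row per book; the property names `title` and `author` are assumptions (inspect the real `features.geojson` to find the actual keys), and the sample data is made up for illustration.

```python
import json


def geojson_to_rows(text: str, fields: tuple[str, ...] = ("title", "author")) -> list[tuple[str, ...]]:
    """Extract selected properties from each Feature in a GeoJSON FeatureCollection.

    The property names are assumptions; missing properties become empty strings.
    """
    collection = json.loads(text)
    rows = []
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        rows.append(tuple(str(props.get(f, "")) for f in fields))
    return sorted(rows)  # alphabetical by the first field


sample = '''{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [1, 2]},
   "properties": {"title": "Cryptonomicon", "author": "Neal Stephenson"}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [3, 4]},
   "properties": {"title": "A Fire Upon the Deep", "author": "Vernor Vinge"}}]}'''

for title, author in geojson_to_rows(sample):
    print(f"{title:<30} {author}")
```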
        
       | sleazebreeze wrote:
       | The aesthetics are nice, but what I really want is a toggleable
       | overlay that shows the rough keyword mapping for all the books.
       | The single book view is fine for understanding a single book, but
       | not useful for trying to process the whole page to find one book
       | I might want to read.
       | 
       | Nice project though, I love it.
        
       | kthartic wrote:
       | I'm not sure I understand the "map" part of this. What does the
       | geography represent exactly?
        
         | vegabook wrote:
         | looks like t-SNE projection
        
           | WillAdams wrote:
           | What does each axis represent?
           | 
           | What is the significance of the placement of each cluster?
        
             | ok123456 wrote:
             | In t-SNE, neighborhoods in the feature vector space (which
             | points are close to which) are approximately preserved in
             | the projected space, but the distances themselves are not.
             | IIRC, the layout is found by minimizing the mismatch between
             | neighbor probabilities in the two spaces. The actual
             | positions and the orientation are allowed to be free
             | variables.
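The point about free positions and orientation can be checked directly. The low-dimensional side of the t-SNE and UMAP objectives depends only on distances between the projected points, so a rotated (or reflected, or shifted) copy of a layout keeps every pairwise distance and is an equally good solution. A small Python illustration with made-up coordinates:

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every pair of 2D points."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


layout = [(0.0, 0.0), (1.0, 0.5), (-2.0, 3.0)]
turned = rotate(layout, math.pi / 3)

# Every pairwise distance is unchanged, so the x and y axes carry no meaning.
print(all(math.isclose(d1, d2) for d1, d2 in zip(pairwise_distances(layout), pairwise_distances(turned))))  # True
```

This is why such maps have no meaningful axes or units: only relative proximity carries information. (Per the original post, this map was built with UMAP rather than t-SNE, but the same reasoning applies.)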
        
       | dangus wrote:
       | I find the graphical nature of this to be disorganized and
       | distracting. If you didn't explain to me what the meaning of the
       | map was it would be essentially a meaningless cluster of book
       | covers.
        
       | goshx wrote:
       | Very nice. Do you have a text format of this available?
        
       | padolsey wrote:
       | Niiice! I really like it. The spatial approach is cool, though
       | labelling/annotations/axes would help.
       | 
       | I share the frustration with getting book covers for my project
       | ablf.io. Amazon used to make this much easier, but they've locked
       | it down recently, so you have to jump through affiliate hoops. I
       | ended up implementing my own thing and storing thousands of
       | images myself on S3. If you have the Goodreads IDs, feel free to
       | use:
       | 
       |     assets.abooklike.foo/covers/{goodreads id}.jpg
       | 
       | N.B. The actual Goodreads website itself makes it hard as well
       | since they have an additional UUID in their img URIs, so it's not
       | deterministic; that's why I created this.
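A minimal sketch that assembles candidate cover URLs for a book, trying the Goodreads-ID pattern shared above first and falling back to Open Library's Covers API by ISBN. The `https://` scheme for the first pattern and the assumption that Goodreads book IDs are purely numeric are illustrative assumptions; the function name is hypothetical. A caller would request each URL in order and keep the first that returns an image.

```python
def cover_candidates(goodreads_id=None, isbn=None):
    """Return cover URLs to try, most preferred first (hypothetical helper)."""
    candidates = []
    if goodreads_id is not None:
        gid = str(goodreads_id).strip()
        if gid.isdigit():  # assumption: Goodreads book IDs are numeric
            # Assumed https:// scheme; the comment gives only host and path.
            candidates.append(f"https://assets.abooklike.foo/covers/{gid}.jpg")
    if isbn:
        # Drop hyphens and spaces; keep digits and an ISBN-10 check character 'X'.
        clean = "".join(ch for ch in str(isbn) if ch.isdigit() or ch in "Xx").upper()
        if clean:
            # default=false makes Open Library answer 404 instead of a blank image.
            candidates.append(f"https://covers.openlibrary.org/b/isbn/{clean}-L.jpg?default=false")
    return candidates


print(cover_candidates(goodreads_id=12345, isbn="978-0-385-53385-9"))
```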
        
         | DantesKite wrote:
         | That's a great website. I've been looking for alternative book
         | recommendation websites for a while and it really has nailed it
         | down.
         | 
         | It even recommended me a somewhat eclectic book I've recently
         | been meaning to read.
         | 
         | Is there a reason you limit it to only 6 favorite books? Is it
         | due to computational constraints?
        
         | renjimen wrote:
         | Nice site! I like that I can filter results by fiction or non-
         | fiction. Interesting to enter my favourite novels and see the
         | non-fiction that's recommended. Some surprisingly good picks!
        
       | ilikehurdles wrote:
       | One small issue on mobile Safari: when I tap to drag the map
       | around, if I put my finger down on a book cover to start dragging
       | the map, the book description is immediately expanded. Put
       | differently, my intention is to drag, not open, but both actions
       | take place when I drag.
       | 
       | I really like the project otherwise. We have a book club that's
       | deciding on what to read next and this could be very helpful.
        
       | SoftTalker wrote:
       | Interesting but the visualization is useless. How about a
       | standard tabular format maybe grouped by genre?
        
         | ok123456 wrote:
         | The 'genres' emerge out of the clusters. The fact that you can
         | pick out 'genres' from this plot is an example of unsupervised
         | learning.
        
           | SoftTalker wrote:
           | OK but there's no clue what they are until you start poking
           | around. It's not something that's useful without some
           | substantial investment of time and exploration. If that's the
           | goal, fine but it's not how I would present a "favorite
           | books" list.
        
             | ok123456 wrote:
             | Making 'genres' and classifying things is imprecise and
             | requires experts in subject matter and library science to
             | get it right. The "genre" labels here emerge out of the
             | data itself.
        
         | allenu wrote:
         | I think it's useful as a tool for browsing and getting a general
         | gist of what people are into and seeing if your favorites are
         | there, too. As for a tool to maximize one's reading list,
         | certainly not as useful, but I appreciate that it didn't make
         | me feel like I had to create action items on things to read.
        
       | alabhyajindal wrote:
       | Congrats! The interface is beautiful and fast!
       | 
       | Adding direct links to the comments that mention the books could
       | be a good feature to add. Hacker News Books [1] does this and
       | it's useful to have all the comments for a book on a single page.
       | 
       | 1. https://hackernewsbooks.com
        
       | mooreed wrote:
       | Nice project.
       | 
       | I also would love to hear more about the cluster shapes and
       | cardinality of the coordinate system. I consider myself pretty
       | well versed in data analysis, but I have less expertise in NLP
       | topics (e.g. t-SNE).
       | 
       | So a quick blurb like: the units on the axes in the graph are "a
       | reduced embedding space" designed to keep structure and to reduce
       | the dimensionality such that the clusters could be plotted on
       | screen...
       | 
       | (I'm not even sure that's correct, but I would have loved a
       | one-sentence explanation of the visualization choice and then a
       | pointer to t-SNE.)
       | 
       | Overall nice project - and it reminds me of a painful
       | professional analysis lesson I have had to re-learn more than
       | once.
       | 
       | > After working for NN hours on an analysis, and finally breaking
       | through and completing it, overlooking the title and labels is
       | the biggest footgun I have ever dealt with.
        
       | Nathanael_M wrote:
       | I'd like to explore this more, but I'm getting THOUSANDS of
       | errors:
       | 
       | Failed to load module script: Expected a JavaScript module script
       | but the server responded with a MIME type of "text/html". Strict
       | MIME type checking is enforced for module scripts per HTML spec.
       | 
       | This crashes my browser in less than a minute.
        
       | LudwigNagasena wrote:
       | Is there a way to see cluster names?
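One rough way to name clusters is to take the most frequent keywords among the books in each cluster. A minimal sketch, assuming each book has a cluster ID (with HDBSCAN's convention of -1 for noise) and a list of extracted keywords; the function name and sample data are hypothetical.

```python
from collections import Counter


def name_clusters(books, top_n=3):
    """Label each cluster with its most frequent keywords.

    `books` is a list of (cluster_id, keywords) pairs; cluster_id -1 is
    treated as noise and skipped. Ties keep first-encountered order.
    """
    counts: dict[int, Counter] = {}
    for cluster_id, keywords in books:
        if cluster_id == -1:
            continue
        counts.setdefault(cluster_id, Counter()).update(k.lower() for k in keywords)
    return {cid: [word for word, _ in c.most_common(top_n)] for cid, c in sorted(counts.items())}


books = [
    (0, ["Space", "aliens"]),
    (0, ["space", "empire"]),
    (1, ["startups", "management"]),
    (1, ["startups"]),
    (-1, ["misc"]),
]
print(name_clusters(books, top_n=2))  # {0: ['space', 'aliens'], 1: ['startups', 'management']}
```

Raw counts favour words that are common in every cluster; weighting each keyword by how specific it is to one cluster (a TF-IDF-style score across clusters) usually gives more distinctive labels.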
        
       | pstorm wrote:
       | Fyi regarding cover images: I have built and run a handful of
       | book related websites and Amazon is the easiest place I found to
       | get book covers. You just need the Amazon ID, and every image is
       | at a standard URL.
        
       | peteforde wrote:
       | Really cool to see my favs show up, but I honestly don't
       | understand what we're actually looking at; the groupings seem
       | very opaque beyond very general themes like sci-fi, startups,
       | biographies, math, physics.
       | 
       | In other words, what are the clustering shapes telling us? Can we
       | dig in based on geography, publishing date, key terms or themes?
       | 
       | Either way, I can't keep the site open for more than 30-40
       | seconds before it crashes. I suspect that's not the goal!
       | 
       | Is Cryptonomicon the best fiction book, or is the data wrong?
        
         | jdthedisciple wrote:
         | > Either way, I can't keep the site open for more than 30-40
         | seconds before it crashes.
         | 
         | Yup, probably was about to happen to me too, had I not closed
         | it.
         | 
         | CPU fan almost launched off the troposphere about 30 seconds
         | in.
         | 
         | Probably a cluttered bunch of heavily unoptimized ReactJS
         | modules in there (no offense to OP, I know it probably sped up
         | development by 10x at least)
        
       | 23B1 wrote:
       | This is cool.
       | 
       | Idea: Amazon has killed 'random' browsing of books. Would love to
       | see this applied to topic area searches etc. so I can have the
       | same serendipity that I used to get in all the bookstores Amazon
       | unalived.
        
         | WillAdams wrote:
         | Well, for fiction there is:
         | 
         | https://www.literature-map.com/
        
       | ijidak wrote:
       | Love this. I like that the clustering allows me to start from a
       | book I've read and liked, and then move on to similar books in
       | the cluster.
       | 
       | For example, I just finished The Phoenix Project.
       | 
       | I'm already seeing some related books I should take a look at.
       | 
       | Very useful!
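Moving from a book you liked to its neighbours amounts to a nearest-neighbour lookup. A minimal sketch using cosine similarity over embedding vectors, assuming one vector per title; the toy three-dimensional vectors and placeholder titles are invented for illustration (real text embeddings have far more dimensions).

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_books(title, embeddings, k=3):
    """Return the k titles whose embeddings are most similar to `title`'s."""
    query = embeddings[title]
    scored = [(cosine(query, vec), other) for other, vec in embeddings.items() if other != title]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by title
    return [other for _, other in scored[:k]]


embeddings = {  # made-up vectors
    "The Phoenix Project": [0.9, 0.1, 0.0],
    "Book B": [0.8, 0.2, 0.1],
    "Book C": [0.7, 0.3, 0.0],
    "Book D": [0.0, 0.1, 0.9],
}
print(similar_books("The Phoenix Project", embeddings, k=2))  # ['Book B', 'Book C']
```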
        
       | noitpmeder wrote:
       | This is awesome. Glad to see both A Fire Upon the Deep and
       | Deepness made it on the list!
        
       | reducesuffering wrote:
       | Any way to determine the quantity of recommendations or degree of
       | positive sentiment? Maybe a larger book cover image?
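One way to answer this is to scale each cover by popularity and sentiment. A minimal sketch, assuming each book has a mention count and an average sentiment score in [-1, 1] (for example from the same GPT-4o mini extraction step); the function name, weights, and cap are arbitrary illustrative choices.

```python
import math


def cover_scale(mentions, mean_sentiment, base=1.0, max_scale=2.5):
    """Map mention count and average sentiment to a display scale factor.

    mentions: number of comments referencing the book (>= 1).
    mean_sentiment: average opinion, clamped to [-1, 1].
    Log scaling keeps a handful of very popular books from dwarfing the rest.
    """
    if mentions < 1:
        raise ValueError("mentions must be at least 1")
    sentiment = max(-1.0, min(1.0, mean_sentiment))
    popularity = 1.0 + math.log10(mentions)   # 1 mention -> 1.0, 10 -> 2.0, 100 -> 3.0
    sentiment_boost = 1.0 + 0.25 * sentiment  # between 0.75 and 1.25
    return min(max_scale, base * popularity * sentiment_boost)
```

For instance, a book mentioned 10 times with neutral sentiment renders at twice the base size, while the cap stops the most-discussed books from covering their neighbours.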
        
       | answerheck wrote:
       | Lovely work, thank you for sharing
       | 
       | Probably a comment on my subconscious desire for
       | familiarity/patterns, but the left side of the map instantly made
       | me think of NW Europe: long skinny Norway dangling between the UK
       | and Denmark (not correctly spaced, but sizes are reasonably
       | correct!). A few other candidates at a stretch - maybe some
       | Baltic states off to the east, for example - but after that it
       | breaks down unfortunately.
       | 
       | Cool project sir
        
       | the__alchemist wrote:
       | Well, now I have recommendations; ty! I just bought 9 books from
       | this list, filling in around ones I know I like. This is
       | outstanding.
        
       | Eduard wrote:
       | On my system, the page becomes unresponsive within the first
       | second.
       | 
       | * Google Chrome from Flathub, Version 128.0.6613.119 (Official
       |   Build) (64-bit)
       | * Debian 12 bookworm under KDE Wayland
        
       | dnlserrano wrote:
       | this is great, and cool looking, thanks!
        
       | wtf242 wrote:
       | This is awesome! Do you mind if I add this list to my books site?
       | (https://thegreatestbooks.org) I'll give you full credit and link
       | back to your site.
        
       | motohagiography wrote:
       | amazing and hilariously accurate. together they represent a
       | culture and shared ontology. having worked in places with others
       | who have read some of these books, the shorthand is super fast.
        
       ___________________________________________________________________
       (page generated 2024-09-09 23:00 UTC)