[HN Gopher] Show HN: I mapped HN's favorite books with GPT-4o
___________________________________________________________________
Show HN: I mapped HN's favorite books with GPT-4o
Hey HN! I love finding new books to read on here. I wanted to
gather the most mentioned books and recreate the serendipity of
physical browsing. I scraped 20k comments from HN threads related
to reading, extracted the references and opinions using GPT-4o
mini, and visualised their embeddings as a map.

- OpenAI's embeddings were processed using UMAP and HDBSCAN. A
  direct 2D projection from the text embeddings didn't yield
  visually interesting results. Instead, HDBSCAN is first applied
  on a high-dimensional projection. Those clusters tend to
  correspond to different genres. The genre memberships are then
  embedded using a second round of UMAP (using Hellinger distance)
  which results in pleasingly dense structures.
- The books' descriptions are based on extractions from the
  comments and GPT's general knowledge. Quality levels vary, and
  it leads to some oddly specific points, but I haven't found any
  yet that are straight up wrong.
- There are multiple books with the same title. Currently, only
  the most popular one of those makes it onto the map.
- It's surprisingly hard to get high quality book cover images. I
  tried Google Books and a bunch of open APIs, but they all had
  their issues. In the end, I got the covers from GoodReads
  through a hacked together process that combines their
  autocomplete search with GPT for data linkage. Does anyone know
  of a reliable source?
Author : pmaze
Score : 167 points
Date : 2024-09-07 12:23 UTC (2 days ago)
(HTM) web link (hnbooks.pieterma.es)
(TXT) w3m dump (hnbooks.pieterma.es)
| paulwarren wrote:
| Check out the OpenLibrary Covers API:
| https://openlibrary.org/dev/docs/api/covers
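A minimal sketch of how cover URLs for that API can be built, assuming the documented `covers.openlibrary.org/b/{key}/{value}-{size}.jpg` pattern. The helper name `cover_url` and the sample ISBN are illustrative only, and nothing here checks whether a cover actually exists for a given book.

```python
from urllib.parse import quote

# Base path for book covers, per the Open Library Covers API docs.
COVERS_BASE = "https://covers.openlibrary.org/b"
VALID_KEYS = {"isbn", "oclc", "lccn", "olid", "id"}
VALID_SIZES = {"S", "M", "L"}


def cover_url(key: str, value: str, size: str = "M") -> str:
    """Build a cover image URL (hypothetical helper, not part of any library)."""
    key = key.lower()
    size = size.upper()
    if key not in VALID_KEYS:
        raise ValueError(f"unsupported key type: {key!r}")
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    # Percent-encode the identifier so stray characters cannot break the path.
    return f"{COVERS_BASE}/{key}/{quote(value, safe='')}-{size}.jpg"


if __name__ == "__main__":
    # Arbitrary sample identifier; swap in the ISBNs of the books on the map.
    print(cover_url("isbn", "0385472579", "L"))
```

This only constructs the URL. Fetching, caching, and handling books without a cover would still have to be added on top.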
| namanyayg wrote:
| nice project, pieterma.
|
| i'm curious about the decision to use hellinger distance for the
| second round of UMAP - was that purely empirical or did you have
| some intuition about why it'd work well for this specific
| dataset?
|
| also, out of curiosity, what's the most popular book on the map
| that doesn't have a clear genre cluster?
| pmaze wrote:
| Thanks!
|
| The cluster memberships that come out of the first round are
| distributions over the different clusters, e.g. a given book is
| weighted 0.8 for cluster A and 0.2 for cluster B. The Hellinger
| distance is well-suited to quantify the difference between two
| distributions like that. Cosine similarity and Euclidean
| distance worked as well, but Hellinger gave subjectively nicer
| results.
|
| Very interesting question, I'm not sure! While developing, I
| noticed that the systems thinking books were spread over
| different genres, which I found quite pleasing. However, I'm
| not sure if other books were even more diffuse. I'll have to
| dig back in and find out :)
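For readers unfamiliar with the Hellinger distance mentioned above, here is a rough standard-library sketch of how it compares two cluster-membership distributions. The function name and the example weights are made up for illustration; this is not the code behind the map, which would pass the metric to UMAP rather than compute it by hand.

```python
import math
from typing import Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    if any(x < 0 for x in p) or any(x < 0 for x in q):
        raise ValueError("weights must be non-negative")
    total_p, total_q = sum(p), sum(q)
    if total_p == 0 or total_q == 0:
        raise ValueError("weights must not all be zero")
    # Bhattacharyya coefficient of the normalised distributions.
    bc = sum(math.sqrt((a / total_p) * (b / total_q)) for a, b in zip(p, q))
    # Clamp at zero to absorb floating-point rounding when bc is ~1.
    return math.sqrt(max(0.0, 1.0 - bc))


if __name__ == "__main__":
    # Hypothetical memberships over three clusters A, B, C.
    book_1 = [0.8, 0.2, 0.0]
    book_2 = [0.2, 0.8, 0.0]
    book_3 = [0.0, 0.0, 1.0]
    print(hellinger(book_1, book_2))  # partially overlapping memberships
    print(hellinger(book_1, book_3))  # no shared cluster
```

The distance is 0 for identical distributions and 1 when two books share no cluster at all, which is why it suits membership weights better than treating them as plain coordinates.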
| jppope wrote:
| FYI I get runaway recursive processing when I load the site...
| it goes down in ~30 seconds or so.
| Brajeshwar wrote:
| This is awesome. Can I please request a way to have a list or a
| tabular format?
| theturtletalks wrote:
| https://hnbooks.pieterma.es/features.geojson
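One way to turn a GeoJSON file like that into the requested table is sketched below, assuming it is a standard FeatureCollection with Point geometries. The property names in the sample (`title`, `author`) are guesses for illustration; the real file may use different keys, which is why the code simply collects whatever properties it finds.

```python
import csv
import io


def features_to_rows(geojson: dict) -> list:
    """Flatten each feature's properties (plus point coordinates) into a dict."""
    rows = []
    for feature in geojson.get("features", []):
        row = dict(feature.get("properties") or {})
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            # Map position of the book; the axes carry no unit.
            row["x"], row["y"] = geometry["coordinates"][:2]
        rows.append(row)
    return rows


def rows_to_csv(rows: list) -> str:
    """Render rows as CSV text, using the union of all keys as columns."""
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample in place of the downloaded file.
    sample = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"title": "Example Book", "author": "A. Author"},
                "geometry": {"type": "Point", "coordinates": [1.5, -2.0]},
            }
        ],
    }
    print(rows_to_csv(features_to_rows(sample)))
```

To use it on the real data, load the downloaded file with `json.load` and pass the resulting dict to `features_to_rows`.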
| Strongbad536 wrote:
| I've put a title/author list on a github page here, sorted by
| title alphabetically
|
| https://github.com/BrianVia/hacker-news-favorite-books
| sleazebreeze wrote:
| The aesthetics are nice, but what I really want is a toggleable
| overlay that shows the rough keyword mapping for all the books.
| The single book view is fine for understanding a single book, but
| not useful for trying to process the whole page to find one book
| I might want to read.
|
| Nice project though, I love it.
| kthartic wrote:
| I'm not sure I understand the "map" part of this. What does the
| geography represent exactly?
| vegabook wrote:
| looks like t-SNE projection
| WillAdams wrote:
| What does each axis represent?
|
| What is the significance of the placement of each cluster?
| ok123456 wrote:
| In t-SNE, the distances in the feature vector space are
| preserved in the projected space. IIRC, these distances
| serve as boundary conditions to a stochastic diffusion
| problem. The actual positions and the orientation are
| allowed to be free variables.
| dangus wrote:
| I find the graphical nature of this to be disorganized and
| distracting. If you didn't explain to me what the meaning of the
| map was it would be essentially a meaningless cluster of book
| covers.
| goshx wrote:
| Very nice. Do you have a text format of this available?
| padolsey wrote:
| Niiice! I really like it. The spatial approach is cool, though
| labelling/annotations/axes would help.
|
| I share the frustration with getting book covers for my project
| ablf.io. Amazon used to make this much easier, but they've locked
| it down recently, so you have to jump through affiliate hoops. I
| ended up implementing my own thing and storing thousands of
| images myself on S3. If you have the goodreads IDs, feel free to
| use: assets.abooklike.foo/covers/{goodreads id}.jpg
|
| N.B. The actual goodreads website itself makes it hard as well
| since they have an additional UUID in their img URIs, so it's not
| deterministic; that's why I created this.
| DantesKite wrote:
| That's a great website. I've been looking for alternative book
| recommendation websites for a while and it really has nailed it
| down.
|
| It even recommended me a somewhat eclectic book I've recently
| been meaning to read.
|
| Is there a reason you limit to only 6 favorite books? Is it due
| to computational restraints?
| renjimen wrote:
| Nice site! I like that I can filter results by fiction or non-
| fiction. Interesting to enter my favourite novels and see the
| non-fiction that's recommended. Some surprisingly good picks!
| ilikehurdles wrote:
| One small issue on mobile safari. when i tap to drag the map
| around, if i put my finger down on a book cover to start dragging
| the map, the book description is immediately expanded. put
| differently, my intention is to drag not open, but both actions
| take place when I drag.
|
| I really like the project otherwise. We have a book club that's
| deciding on what to read next and this could be very helpful.
| SoftTalker wrote:
| Interesting but the visualization is useless. How about a
| standard tabular format maybe grouped by genre?
| ok123456 wrote:
| The 'genres' emerge out of the clusters. The fact that you can
| pick out 'genres' from this plot is an example of semi-
| supervised learning.
| SoftTalker wrote:
| OK but there's no clue what they are until you start poking
| around. It's not something that's useful without some
| substantial investment of time and exploration. If that's the
| goal, fine but it's not how I would present a "favorite
| books" list.
| ok123456 wrote:
| Making 'genres' and classifying things is imprecise and
| requires experts in subject matter and library science to
| get it right. The "genre" labels here emerge out of the
| data itself.
| allenu wrote:
| I think it's useful as a tool for browsing and getting a general
| gist of what people are into and seeing if your favorites are
| there, too. As for a tool to maximize one's reading list,
| certainly not as useful, but I appreciate that it didn't make
| me feel like I had to create action items on things to read.
| alabhyajindal wrote:
| Congrats! The interface is beautiful and fast!
|
| Adding direct links to the comments that mention the books could
| be a good feature to add. Hacker News Books [1] does this and
| it's useful to have all the comments for a book in a single page.
|
| 1. https://hackernewsbooks.com
| mooreed wrote:
| Nice project.
|
| I also would love to hear more about the cluster shapes and
| cardinality of the coordinate system. I consider myself pretty
| versed in data analysis, however with less expertise on NLP
| topics (eg t-SNE).
|
| So a quick blurb like: the units on the axes in the graph are "a
| reduced embedding space" designed to keep structure and to reduce
| the dimensionality such that the clusters could be plotted on
| screen...
|
| (I'm not even sure that's correct, but I would have loved for you
| to have informed me on the one sentence visualization choice and
| then point me to t-SNE.)
|
| Overall nice project - and it reminds me of a painful
| professional analysis lesson I have had to re-learn more than
| once.
|
| > After working for NN hours on an analysis, and finally breaking
| through and completing it, overlooking the title and labels is
| the biggest footgun I have ever dealt with.
| Nathanael_M wrote:
| I'd like to explore this more, but I'm getting THOUSANDS of
| errors:
|
| Failed to load module script: Expected a JavaScript module script
| but the server responded with a MIME type of "text/html". Strict
| MIME type checking is enforced for module scripts per HTML spec.
|
| This crashes my browser in less than a minute.
| LudwigNagasena wrote:
| Is there a way to see cluster names?
| pstorm wrote:
| Fyi regarding cover images: I have built and run a handful of
| book related websites and Amazon is the easiest place I found to
| get book covers. You just need the Amazon id and every image is a
| standard url.
| peteforde wrote:
| Really cool to see my favs show up, but I honestly don't
| understand what we're actually looking at; the groupings seem
| very opaque beyond very general themes like sci-fi, startups,
| biographies, math, physics.
|
| In other words, what are the clustering shapes telling us? Can we
| dig in based on geography, publishing date, key terms or themes?
|
| Either way, I can't keep the site open for more than 30-40
| seconds before it crashes. I suspect that's not the goal!
|
| Is Cryptonomicon the best fiction book, or is the data wrong?
| jdthedisciple wrote:
| > Either way, I can't keep the site open for more than 30-40
| seconds before it crashes.
|
| Yup, probably was about to happen to me too, had I not closed
| it.
|
| CPU fan almost launched off the troposphere about 30 seconds
| in.
|
| Probably a cluttered bunch of heavily unoptimized ReactJS
| modules in there (no offense to OP, I know it probably sped up
| development by 10x at least)
| 23B1 wrote:
| This is cool.
|
| Idea: Amazon has killed 'random' browsing of books. Would love to
| see this applied to topic area searches etc. so I can have the
| same serendipity that I used to get in all the bookstores Amazon
| unalived.
| WillAdams wrote:
| Well, for fiction there is:
|
| https://www.literature-map.com/
| ijidak wrote:
| Love this. I like that the clustering allows me to start from a
| book I've read and liked, and then move on to the similar books in
| the cluster.
|
| For example, I just finished The Phoenix Project.
|
| I'm already seeing some related books I should take a look at.
|
| Very useful!
| noitpmeder wrote:
| This is awesome. Glad to see both A Fire Upon the Deep and
| Deepness made it on the list!
| reducesuffering wrote:
| Any way to determine the quantity of recommendations or degree of
| positive sentiment? Maybe a larger book cover image?
| answerheck wrote:
| Lovely work, thank you for sharing
|
| Probably a comment on my subconscious desire for
| familiarity/patterns, but the left side of the map instantly made
| me think of NW Europe: long skinny Norway dangling between the UK
| and Denmark (not correctly spaced, but sizes are reasonably
| correct!). A few other candidates at a stretch - maybe some
| Baltic states off to the east, for example - but after that it
| breaks down unfortunately.
|
| Cool project sir
| the__alchemist wrote:
| Well, now I have recommendations; ty! I just bought 9 books from
| this list, filling in around ones I know I like. This is
| outstanding.
| Eduard wrote:
| For my system: becomes unresponsive within the first second.
|
| * Google Chrome from flathub. Version 128.0.6613.119 (Official
|   Build) (64-bit)
| * Debian 12 bookworm under KDE Wayland
| dnlserrano wrote:
| this is great, and cool looking, thanks!
| wtf242 wrote:
| This is awesome! Do you mind if I add this list to my books site?
| (https://thegreatestbooks.org) I'll give you full credit and link
| back to your site.
| motohagiography wrote:
| amazing and hilariously accurate. together they represent a
| culture and shared ontology. having worked in places with others
| who have read some of these books, the shorthand is super fast.
___________________________________________________________________
(page generated 2024-09-09 23:00 UTC)