[HN Gopher] Show HN: 40k books on HN extracted using deep learning
___________________________________________________________________
Author : tracyhenry
Score : 534 points
Date : 2021-09-20 16:58 UTC (6 hours ago)
(HTM) web link (hacker-recommended-books.vercel.app)
(TXT) w3m dump (hacker-recommended-books.vercel.app)
| sushisource wrote:
| Heh, for a minute there I thought you meant Warhammer 40k books
| specifically, and I thought that was a pretty funny thing to be
| scraping from HN :)
| bnbond wrote:
| Same. I'm a little disappointed it's not.
| russellbeattie wrote:
| I'm pretty sure there's more Warhammer 40k books than there are
| days in the year... It's like someone heard the term "space
| opera" and thought that meant "soap opera in space".
|
| Recommendations would include comments like, "This novel is
| really the one that ties the previous 37 books together." or
| "You might want to skip the next dozen books if you're
| squeamish about things that ooze."
| LanceH wrote:
| While I don't consider the 40k books on par with the better
| science fiction out there, I do enjoy that they bring a bit
| of scale and what it means to space. It's a different take
| from the rosy, post-scarcity, future of space. Bad things are
| _really_ bad. Unattended good things turn bad on their own
| just from drift.
|
| Then there is the unashamed embrace of over-the-top
| in so many different ways.
| unmole wrote:
| Interesting idea but not completely accurate. My own comment
| about how I hated _Thinking, Fast and Slow_ seems to be counted
| as a recommendation.
| tracyhenry wrote:
| Right, the model is not perfect with the limited training dataset
| I have (we hand labeled 4,000, which is already tons of work for
| a side project). But the intention was to filter out negative
| ones.
| jimmySixDOF wrote:
| You did a stellar job here, thanks so much for this addition
| to the community!
|
| On labeling, if you have a method statement or some go-by
| reference, I am sure you would get some support here. I know
| I would help! Maybe package a few blocks of 100 unlabeled
| comments with a readme & see what happens?
| sampo wrote:
| > My own comment about how I hated Thinking, Fast and Slow
| seems to be counted as a recommendation.
|
| What is the state of sentiment analysis in natural language
| processing? Would it be easy to add a feature to recognize
| whether the book was mentioned in a positive or negative light?
| munk-a wrote:
| If you want to see some amusing "recommendations" I'd check out
| The Communist Manifesto by Karl Marx and what comments it's
| drawn. I think the network trying to find recommendations needs
| to incorporate more sentiment analysis.
|
| e.g. "Guards Guards by Sir Terry Pratchett is a great book" vs.
| "I've never read anything as slow and uninteresting as The Two
| Towers by J.R.R. Tolkien" or "I thought Seveneves by Neal
| Stephenson was good - but it probably should've been two
| separate books with the second half actually having some meat
| to it."
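A minimal sketch of the filtering munk-a describes, using a toy
word-list (lexicon) approach rather than a trained model. The
POSITIVE and NEGATIVE sets and both function names are
hypothetical; this is not how the site works, and it ignores
negation ("not good") and mixed reviews, which is part of why
sentiment analysis is hard in practice.

```python
import re

# Hypothetical word lists chosen only for illustration; real systems
# use trained classifiers rather than hand-picked lexicons.
POSITIVE = {"great", "good", "love", "loved", "recommend", "excellent"}
NEGATIVE = {"slow", "uninteresting", "hated", "terrible", "boring", "nuts"}


def polarity(comment: str) -> int:
    """Return (#positive words - #negative words) in the comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def is_recommendation(comment: str) -> bool:
    # Count a mention as a recommendation only if net polarity is positive.
    return polarity(comment) > 0


if __name__ == "__main__":
    for text in [
        "Guards Guards by Sir Terry Pratchett is a great book",
        "I've never read anything as slow and uninteresting as The Two Towers",
    ]:
        print(is_recommendation(text), "|", text)
```

With these word lists, the mixed Seveneves comment above would
still count as a recommendation, since "good" is the only scored
word in it.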
| FranklinMaillot wrote:
| Lessons: My Path to a Meaningful Life by Gisele Bundchen, the
| top model, is probably the most out of place recommendation :)
| None of the comments is about the book obviously, they just
| mention the word "Lessons".
|
| https://hacker-recommended-books.vercel.app/category/15/all-...
| dang wrote:
| Yes. I've removed the word "recommendations" from the title
| because there are too many cases of negative mentions being
| treated as recommendations.
|
| Not a criticism! Sentiment analysis seems to remain an unsolved
| problem.
|
| See also
|
| https://news.ycombinator.com/item?id=28598341
|
| https://news.ycombinator.com/item?id=28596882
| therealdrag0 wrote:
| This was an amusing "extraction":
|
| > I have not yet read the good book Atlas Shrugged but be sure
| > to check it out based on your recommendation.
|
| You're delusional. Where did I ever recommend reading Atlas
| Shrugged? Ayn Rand is nuts.
| jgwil2 wrote:
| Yeah, I'm seeing some issues with _Code_ by Petzold citing
| comments that are talking about e.g. _Code Complete_ or just
| code in general, but with such a generic name (and given the
| forum) it's actually pretty impressive to me that most
| comments are identified correctly.
|
| Edit: another one that is tough is _Open_ by Agassi - seems
| most of these comments do not actually have anything to do with
| the book. I would guess most one-word titles will have similar
| issues.
| tracyhenry wrote:
| That's a correct observation. I'm guessing it has to do with
| whether the words after _Open_ are indicative enough to the
| model that they should be brought in together with _Open_. As
| I said in other comments, with more training data this issue
| will likely go away. And these tough comments are the best
| candidates for labeling.
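To illustrate the "brought in together" point, here is a sketch
that assumes a BIO tagging scheme (B-TITLE starts a title, I-TITLE
continues it, O is outside). The thread only says a pretrained BERT
from Hugging Face was used, so the tag names and decode_bio are
hypothetical, and word-level tokens are used for simplicity (BERT
actually works on subword pieces). If the model tags "Zero" as
B-TITLE but the following words as O, the extracted span is the
shorter, wrong title.

```python
from typing import List


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Join runs of B-TITLE / I-TITLE tokens into title strings."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-TITLE":
            # A new title starts; close any span that was still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-TITLE" and current:
            # Continuation token: extend the open span.
            current.append(token)
        else:
            # "O" (or an I-TITLE with no open span) ends the current span.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


tokens = ["Read", "Zero", "to", "One", "by", "Thiel"]
print(decode_bio(tokens, ["O", "B-TITLE", "I-TITLE", "I-TITLE", "O", "O"]))  # ['Zero to One']
print(decode_bio(tokens, ["O", "B-TITLE", "O", "O", "O", "O"]))              # ['Zero']
```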
| jedwhite wrote:
| Hey this is really awesome! Well done.
|
| You mentioned transformers and BERT for large NLP models. I've
| been playing around with this too and it's a really powerful
| approach. Have you used spacy-transformers? [0]
|
| The approach is pretty cool and can be used with BERT,
| GPT-2/Hugging Face etc.
|
| I'm just starting to experiment with GPT-J and thinking of trying
| this approach also [1].
|
| Anyway, totally awesome project and the results are really good.
| This stuff really is almost unreasonably effective!
|
| [0] https://explosion.ai/blog/spacy-transformers
|
| [1] https://6b.eleuther.ai/
| tracyhenry wrote:
| Thanks! I used Huggingface's pretrained BERT.
| malshe wrote:
| This is really impressive! Can you please elaborate more on
| the way you labeled the data? I think usually there is a lot
| to learn from labeling methods.
| jedwhite wrote:
| This is a really good application of it. Getting NER right
| for something like book titles with so much name collision
| with other domains and entity types is really hard, and this
| works great on something that most people would never realize
| would be so hard!
| sillysaurusx wrote:
| Please write up how you did this! It may seem easy or
| straightforward, but I assure you it's black magic to a lot
| of people.
| maiensch wrote:
| Love it, will you do a write-up on how to replicate this with
| other sources? I'm currently analyzing both Indie Hackers and
| StartupsForTheRestOfUs Interview Transcriptions and this could be
| a fun analysis!
| alanbernstein wrote:
| This is great. I just read Permutation City, which I
| coincidentally see recommended on HN all the time, so I was
| surprised not to see it in the search results or the top of the
| fiction or scifi lists. Any idea why that is?
| tracyhenry wrote:
| That might be because the book database I used is quite limited,
| sadly.
| Tycho wrote:
| Sounds good. Blocked by my work firewall though.
|
| A few years ago I found an article that was something like '100
| short books everyone should read before they're 40'. It was a mix
| of fiction and non-fiction. I've never been able to find it
| again! But I really liked the list because these are books you
| can consume in a few hours and may be life changing.
|
| I remember a few of the titles: Games People Play, Meditations,
| The Prince, The Art of War. (I suppose it may have been non-
| fiction only, although I think _The Awakening_ may have been on
| there.)
|
| Wish I could find the link again.
| ZeroGravitas wrote:
| If it was Oliver Sacks' Awakenings then it is non-fiction,
| though it did get turned into a movie.
| Tycho wrote:
| Different book - Kate Chopin
| Rd6n6 wrote:
| I don't understand how software engineers get away with
| browsing the internet for fun at work when nobody else can.
| themodelplumber wrote:
| Sounds more like personal development than fun?
| sillysaurusx wrote:
| That's a bit like saying watching porn is more for personal
| development than fun. Perhaps you'll learn something, but
| it's incidental.
|
| I've learned a lot from HN. But it wouldn't be good to fool
| myself into thinking that an employer wants to fund my
| personal development in this regard. Otherwise, they'd pay
| me to HN all day.
|
| The crux of the issue is that it's impossible to work 8
| hours every day. We all invent lies to fill the downtime.
| themodelplumber wrote:
| Is all that hyperbole really necessary? Each new sentence
| seems primed to leak edge and corner cases. Without
| giving more attention to such a rhetorical blind spot, I
| wonder how one could imagine they know the crux from the
| passenger side door.
| sillysaurusx wrote:
| Which sentence is mistaken?
| themodelplumber wrote:
| The one with all the generalizations
| sillysaurusx wrote:
| If it's mistaken, it should be easy to explain why.
| Otherwise I'm inclined to believe it's merely an
| uncomfortable truth.
|
| Would your employer pay you to HN all day? If not,
| precisely how much of your day are they comfortable with
| you HN'ing? Are you sure it's officially approved?
| chadcmulligan wrote:
| https://xkcd.com/303/
|
| Waiting for Compiles is the usual, there's a lot of waiting
| in software - waiting for compiles, scripts to run, someone
| else to do something.
| cyberge99 wrote:
| I'm curious as to why you didn't choose to monetize with
| affiliate links.
|
| It seems a simple and easily justifiable reward. I didn't click the
| links, but hopefully you used smile.amazon for charity.
|
| This is novel and useful. Thank you.
| kvathupo wrote:
| In anticipation of getting flagged into oblivion, am I the only
| one who's disappointed in this selection of books?
|
| Of course, taste is subjective, and it should perhaps be expected
| that much of the list is in line with what is read by the general
| public, but many of the books are either presenting fact or
| attempting to convince the reader of the veracity of a certain
| viewpoint. I'd like to read more open-ended works that ask for
| interpretation on the part of the reader or, at the least, don't
| explicitly spell out what they want the reader to walk away with.
| (certainly some books here fit the bill, e.g. Infinite Jest,
| Pride & Prejudice, etc.). Again, interests are subjective.
|
| In light of this, book recommendations?
| themodelplumber wrote:
| Personally I wouldn't recommend others' books to someone who is
| left unfulfilled by such a huge list. I would rather recommend
| writing or other subjectively-pinned activities, to hold the
| subject accountable and help them stay out of the critic zone
| long enough to find their way into more fulfilling growth.
| awillen wrote:
| I think that's just the nature of pulling books from HN
| comments - a lot of those comments are trying to convince
| people of a viewpoint, so it seems unsurprising that this is
| the kind of list you'd end up with.
|
| Not good or bad, just a function of where they're coming from.
|
| And as for book recommendations, Children of Time by Adrian
| Tchaikovsky.
| figassis wrote:
| This is amazing, thank you.
| rustmachine wrote:
| Cool project, and cool results. As an anthropologist who reads
| HN as a way to keep abreast of the tech community and tech
| insights, it's interesting to see Atlas Shrugged as one of the
| most often recommended books. Interesting and maybe slightly
| disturbing. HN would make for quite interesting source material
| for someone who wanted to study tech culture.
| dang wrote:
| I'd be careful about that generalization. This software seems
| to be going more by mentions than by recommendations - e.g. the
| top reply to https://news.ycombinator.com/item?id=16323808
| ("Ask HN: Which are the most damaging books you've read?") is
| being counted as a recommendation.
|
| Sentiment analysis is hard. In fact I've never seen it work
| yet.
| concernedctzn wrote:
| Found it interesting that I couldn't find results for Knuth (The
| Art of Computer Programming) or SICP on here. Maybe the casual
| way we refer to these texts is hard to detect as a reference to a
| book, or their importance is just implied community knowledge?
| tracyhenry wrote:
| If there is no search result for the book name then it just
| means it's not in my current book database (which is limited).
| supperburg wrote:
| Surprised to see "A pattern language" on there. I've read most of
| it in preparation for building my house. It's more of a
| dictionary than a book but it's unbelievably useful. It's just a
| huge list of little things that an architect would notice over
| the span of his career. Little things that are important but not
| obvious. If you're building a house, another really good book is
| "what not to build."
|
| I also recommend "Islamic imperialism" from Yale, "the bomb in my
| garden" by mahdi obeidi and "nothing to envy."
| gjm11 wrote:
| Most likely the main reason "A Pattern Language" is popular
| here on HN is that it spawned a movement in software
| engineering:
| https://en.wikipedia.org/wiki/Software_design_pattern
|
| (Plus the fact that it's a good book on its own terms. At
| least, it is so far as I can tell; I am not an architect and
| maybe some of the advice in it is actually terrible. But it
| _seems_ almost always reasonable and frequently insightful, and
| it's well written, and the "pattern language" idea that
| software engineering borrowed from it is a nice one, though
| the software-engineering borrowings don't generally amount to
| actual pattern languages as opposed to miscellaneous grab-bags
| of alleged patterns.)
| amelius wrote:
| Perhaps you can do the same for research papers. Would the code
| need to be changed in any way?
| tracyhenry wrote:
| Not much - but it needs a new set of training data for research
| papers. Btw - there seems to be an existing website for this
| already: https://www.hackernewspapers.com/ (although it only
| looks at posts).
|
| I'd assume that arXiv links are often present, so it's a problem
| that can be addressed with an easier solution (just looking for
| arXiv links).
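The "just look for arXiv links" approach needs no model at all. A
minimal regex sketch, assuming comment text that contains
arxiv.org/abs or arxiv.org/pdf URLs; the pattern and arxiv_ids
are illustrative and not taken from hackernewspapers.com.

```python
import re
from typing import List

# Matches arxiv.org/abs/<id> and arxiv.org/pdf/<id> links that use the
# newer YYMM.NNNN / YYMM.NNNNN identifier form, with an optional version
# suffix such as v2. Older IDs like hep-th/9901001 are not handled here.
ARXIV_LINK = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)


def arxiv_ids(text: str) -> List[str]:
    """Return unique arXiv IDs found in the text, in order of first appearance."""
    ids: List[str] = []
    for match in ARXIV_LINK.finditer(text):
        if match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


print(arxiv_ids("See https://arxiv.org/abs/1706.03762v5 and https://arxiv.org/pdf/1810.04805.pdf"))
```

Version suffixes are stripped so that v1 and v5 links to the same
paper are counted once.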
| personjerry wrote:
| The problem with reading book lists like this is that nobody has
| time to read all the books. There's a ton of crap out there, and I
| want HN to help me filter through them.
|
| Thus the problem with existing solutions is NOT "limited recall"
| or "insufficient rules" or "no Amazon link".
|
| And the problem with this "solution" is that there is no
| justification for why a book is great and applicable to my
| circumstances, and people have to trust your black box. Otherwise
| I'm likely to waste my time, just like reading books from any
| other crappy recommendation engine.
|
| With a deep learning model reducing all the reviews to "book
| names" you've successfully removed the value of the book
| discussions themselves. Therefore, for me this engine and all
| similar engines are strictly worse than simply going through the
| actual big threads themselves, i.e.
| https://news.ycombinator.com/item?id=21900498
|
| Edit: I've just seen the embedded comments by switching to a
| desktop browser. It's a nice addition. However, for me to make
| sure I'm not wasting my time going through arbitrary books and
| comments, I would need to know why a book is ranked highly
| compared to other books. And I want to be sure that ranking is
| tailored to me, at a very, very high accuracy.
| adewinter wrote:
| > With a deep learning model reducing all the reviews to "book
| names" you've successfully removed the value of the book
| discussions themselves.
|
| It literally shows each comment in full that it extracted the
| book name from. It also includes a link to the comment in the
| original thread. What more could you possibly want?
| personjerry wrote:
| Oh, I was on mobile and could not see the comments section.
| It's interesting for sure. But what I want in particular is
| to learn why a book is ranked highly compared to other books.
| And I want to be sure that ranking is tailored to me, at a
| very, very high accuracy.
| lbriner wrote:
| They are ranked highly because of the number of times they
| are recommended in a comment.
| FredPret wrote:
| There's no way around the black box element of a book review,
| but Nassim Taleb suggests waiting a few decades and, if the
| book is still well known, then reading it.
| bachmeier wrote:
| Wow. What a helpful piece of advice (I guess he's smarter
| than the rest of us so it's hard to understand the genius of
| his strategy). Any mention of the cost of missing out on the
| content in the book for a few decades?
| phgn wrote:
| The idea behind reading older books is that they're already
| proven to be useful - it's a filter for you to spend less
| time on useless information. It's generally called the
| "Lindy effect"
| lifekaizen wrote:
| Love this question. I could imagine him suggesting reading
| academic papers for cutting edge things; like his 'barbell'
| exercise strategy of mostly walking with occasional HIITs.
| dwighttk wrote:
| Just read older books until there aren't any and then move
| onto newer ones. There are too many that are a couple
| decades old for you to ever run out.
|
| Also occasionally break the rule for a book you want to
| read. It isn't like that would kill you.
| dhosek wrote:
| I guess it's ok to read _The C Programming Language_, then.
| FredPret wrote:
| Technical manuals are more like a journal in the sense that
| you have to read all the new ones if you want to keep up.
|
| Novels, philosophies, and histories are works that can
| stand the test of time if they're good enough.
| tracyhenry wrote:
| > I want HN to help me filter through them.
|
| The comments panel shows the actual recommendations. And the
| books are ranked by number of recommendations. Is this not
| enough?
| deaddabe wrote:
| Impressive work.
|
| What data source are you using for the books, authors and covers?
| I looked at OpenLibrary [1] but the covers are not the same, so I
| suppose it is something else? Maybe Amazon directly somehow?
|
| [1] https://openlibrary.org/search?q=zero+to+one&mode=everything
| tracyhenry wrote:
| I crawled about 20k books from Amazon. Thanks for pointing me to
| OpenLibrary!
| tinmandespot wrote:
| This makes me happy
| justinzollars wrote:
| Wow! this is amazing. Nice work! :) made my day
| leobg wrote:
| Very cool. This one's wrong though: "Zero: The Biography of a
| Dangerous Idea". Comments are talking about other books with
| "zero" in the title, such as Thiel's "Zero to One". Perhaps parse
| longer titles first, and eliminate them, before matching for
| shorter titles? Great MVP. Had in fact been thinking about how
| great it would be to gather book data from HN myself just
| yesterday. So am really happy to see that someone actually made
| it. Plus, it looks great and is fun to use.
| tracyhenry wrote:
| Thanks. In theory this is the model's fault for not learning that
| "Zero to One" should be considered as a whole book. It's one
| limitation I mentioned in my root comment. Should be fixable
| with more training data!
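leobg's longest-first idea can be sketched as plain dictionary
matching over a list of known titles. This is a swapped-in,
rule-based technique, not the site's BERT model, and the catalog
entries used below are hypothetical. Matching longer titles first
and blanking them out stops "Zero" from also matching inside
"Zero to One".

```python
import re
from typing import List


def match_titles(comment: str, titles: List[str]) -> List[str]:
    """Greedy longest-first matching: each matched title is blanked out
    so that shorter titles contained in it cannot match the same text."""
    remaining = comment
    found: List[str] = []
    for title in sorted(titles, key=len, reverse=True):
        # Word boundaries keep "Zero" from matching inside "Zeroes".
        pattern = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
        if pattern.search(remaining):
            found.append(title)
            # Replace the matched text so shorter titles can't reuse it.
            remaining = pattern.sub(" ", remaining)
    return found


catalog = ["Zero", "Zero to One"]  # hypothetical catalog keys
print(match_titles("Thiel's Zero to One is worth reading.", catalog))  # ['Zero to One']
```

Titles that begin or end with punctuation (for example a trailing
"?") would need a different boundary than `\b`.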
| awinter-py wrote:
| it confused rationalist harry potter fanfic with 'harry potter
| hogwarts hardcover journal and elder wand pen set', amazing
| wizzwizz4 wrote:
| To my knowledge, J.K. Rowling doesn't allow people to sell Harry
| Potter fanfic, even though she's fine with its existence.
| begueradj wrote:
| some are interesting
| bonniejawker wrote:
| are you planning to open source the app? could you do one for
| lobste.rs too?
| GuB-42 wrote:
| Interesting lack of 1984, even though it is mentioned way too
| often. The lesser known "Animal Farm" and other dystopias like
| "Brave New World" and "Fahrenheit 451" are here.
|
| Is it because it is a number?
| tracyhenry wrote:
| It's because my book database doesn't have it. In fact my model
| identifies 132 mentions of 1984, some examples:
|
| https://news.ycombinator.com/item?id=20285306
|
| https://news.ycombinator.com/item?id=12518804
|
| https://news.ycombinator.com/item?id=22724495
| SquishyPanda23 wrote:
| This is great thank you.
|
| On the topic of Brave New World, the site categorizes it as a
| Reference.
| SquishyPanda23 wrote:
| The book title is "Nineteen Eighty-Four", but nobody spells it
| that way.
|
| The app may need to special case it.
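One way to special-case titles like this is a small alias table
consulted after a title span has been extracted. The ALIASES
entries (including the SICP and TAOCP abbreviations mentioned
elsewhere in the thread) and canonical_title are hypothetical. A
bare "1984" is also a year, so this lookup only makes sense once
something else has decided the mention refers to a book.

```python
# Hypothetical alias table: surface forms seen in comments -> catalog title.
ALIASES = {
    "1984": "Nineteen Eighty-Four",
    "sicp": "Structure and Interpretation of Computer Programs",
    "taocp": "The Art of Computer Programming",
}


def canonical_title(mention: str) -> str:
    """Return the catalog title for a known alias, else the mention as-is."""
    cleaned = mention.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


print(canonical_title("1984"))  # Nineteen Eighty-Four
```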
| guidovranken wrote:
| Nice. The Hacker News archive contains a wealth of great
| information. I've previously performed extractions similar to
| OP's, but with grep and SQL. I've also looked for people who have
| accurately predicted the stock market (I did identify one pro
| investor. He's now into NFTs). I've found so much cool stuff,
| spending whole nights looking for interesting users and reading
| their entire post histories and being blown away by many
| insightful posts. I've been considering making a blog consisting
| entirely of insightful HN posts that I come across.
| moneywoes wrote:
| Do you mind sharing which investor?
| air7 wrote:
| Please do. That sounds super interesting.
| Andrew_nenakhov wrote:
| Ok #3 is Dune. It'll surely be super helpful in building my
| interstellar empire.
|
| Step 1: make elites addicted to drugs
|
| Step 2: monopolize drug trade
|
| Step 3: install a religious fundamentalist regime with yourself
| at its head
|
| (All very logical until this point, but the next step might be a
| problem; can anyone offer advice?)
|
| Step 4: transform into a worm
|
| ??!
| robotresearcher wrote:
| Don't forget the step of achieving prescience, which allows you
| to figure out what the '??!' is.
| Andrew_nenakhov wrote:
| That's what drugs are for, no?!
| defect0 wrote:
| Noticed an issue. Some, but not all, comments referencing Strunk
| and White's Elements of Style are showing up instead as Erin
| Gates' Elements of Style: Designing a Home & a Life
| tracyhenry wrote:
| Good catch! This is the limitation mentioned in my root comment
| - the algorithm will fail when two books have similar names.
| The partial solution is to look at authors too when available.
| Something to be included in the future.
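A sketch of the author-based disambiguation described as a future
partial fix, assuming each candidate book in the database carries
its authors' surnames. The Book class and disambiguate function are
hypothetical. The function returns None when zero or several
candidates match, leaving the mention unresolved rather than
guessing.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Book:
    title: str
    surnames: List[str]  # author surnames, assumed present in the book database


def disambiguate(comment: str, candidates: List[Book]) -> Optional[Book]:
    """Return the single candidate whose author surname appears in the
    comment, or None if zero or several candidates match."""

    def mentions(surname: str) -> bool:
        # Whole-word, case-insensitive match so "White" doesn't hit "whitespace".
        pattern = r"\b" + re.escape(surname) + r"\b"
        return re.search(pattern, comment, re.IGNORECASE) is not None

    hits = [book for book in candidates if any(mentions(s) for s in book.surnames)]
    return hits[0] if len(hits) == 1 else None


CANDIDATES = [
    Book("The Elements of Style", ["Strunk", "White"]),
    Book("Elements of Style: Designing a Home & a Life", ["Gates"]),
]
```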
| leobg wrote:
| BTW, going through that list, I see why I love the HN crowd. 70%
| of those books I've read myself, and did so before coming to HN.
| There must be some strong personality type filtering going on.
| reducesuffering wrote:
| I think it's been quite obvious there's some personality type
| filtering going on, as with most online communities. I'm quite
| curious how it'd be quantified. Surely software engineers,
| startup founders, ADHD, INTJ, and Myers-Briggs-is-bogus types
| are overrepresented. Might tell us a bit more about
| ourselves...
| cinntaile wrote:
| There's a strange error in there. The Art of War by Sun Tzu is
| listed twice. Why is that, given that it finds the right book and
| author?
| qwert12345887 wrote:
| Can this be done to get a list of blogs posted here, with topic
| analysis?
| baby wrote:
| Interestingly, it cannot differentiate between the different Harry
| Potter recommendations (the original books, fanfics, and that book
| on philosophy that mentions Harry Potter).
| spookyuser wrote:
| This is really incredible!
|
| A while ago I created something adjacent to this that looks for
| Hacker News reviews of books on Goodreads
| (https://github.com/spookyuser/hacker-reads)
|
| So I'm very curious how you managed to find book titles. I ran
| into a lot of issues trying to figure out, for example, with
| "Clean Code" whether to search for "Clean Code" or "Clean Code: A
| Handbook of Agile Software Craftsmanship" since people mentioning
| the book used both instances. And of course someone mentioning
| just "Clean Code" might be referring to the concept not the book.
| I ended up settling on `${titleMinusColon} - ${author}` but I'd
| love to know what your approach was given that you used deep
| learning to search.
|
| EDIT: Just read your comment below on your approach, very
| interesting!
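The `${titleMinusColon} - ${author}` format described above could
look like this in Python. This is a reconstruction from the
description, not code from the hacker-reads repository, and
goodreads_query is a hypothetical name.

```python
def goodreads_query(full_title: str, author: str) -> str:
    """Build a '<title without subtitle> - <author>' search string."""
    # Keep only the part before the first colon, e.g. "Clean Code".
    title_minus_colon = full_title.split(":", 1)[0].strip()
    return f"{title_minus_colon} - {author}"


print(goodreads_query("Clean Code: A Handbook of Agile Software Craftsmanship", "Robert C. Martin"))
# Clean Code - Robert C. Martin
```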
| jp42 wrote:
| Slightly off-topic. The best book recommendations I've gotten
| came from one of the following; dedicated recommendation services
| or apps never worked for me:
|
| - told by a friend
|
| - someone I admire read a book and commented on it
|
| - I was working on some problem and came across books during
| that exploration.
|
| - random people mentioning book on platform like HN on a
| topic/post of my interest.
| rahimnathwani wrote:
| The most life-changing book recommendation I got was from HN:
| 'Teach your child to read in 100 easy lessons'.
| jp42 wrote:
| Thanks for the comment! My son is 3y8m. He knows his letters and
| many words. Looks like this book could be what he needs to get
| to the next level.
| TakerofVita wrote:
| Yeah, in my experience a lot of 'general' book reviews are
| super critical and don't really try to hook you. Going through
| several reviews, you just come away with the collected gripes
| and nitpicks of what is otherwise a good book.
|
| I find that I get sold on a lot more when it is just a random
| single comment on some thread somewhere that focuses on a
| single aspect of a book.
|
| If you can find a hyper specific subreddit/forum/etc. for a
| sub-genre you like, then you will spend more time reading books
| than reviews...
| spookyuser wrote:
| > random people mentioning book on platform like HN on a
| topic/post of my interest.
|
| Same! Some of my favorite book recommendations have especially
| come from this one, I don't know why but a one line comment on
| a HN thread of "what book changed your life" has become my
| favorite way for discovering books.
| ramraj07 wrote:
| Great work but do note that the list basically looks slightly
| better than an amazon list (atlas shrugged lol). I think some
| effort into more useful ranking (looking for metrics of
| controversiality or maybe PageRank) might make it more useful!
| vavooom wrote:
| I am also curious to know if the number of votes is integrated
| into the ranking at all, possibly weighted. Could also attempt
| NLP text sentiment analysis to influence the model as well.
|
| Regardless, fantastic work already!
| tracyhenry wrote:
| Right now the ranking is a simple combination of sentiment
| and length. Including the number of votes definitely sounds useful!
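A sketch of how a per-mention score might combine sentiment,
length, and votes, summed per book. The weights, the Mention
fields, and the log damping are all assumptions, not the site's
actual formula. Note also that HN does not show other users'
comment scores, so the vote input may need a proxy.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Mention:
    book: str
    sentiment: float  # assumed classifier score in [0, 1]
    words: int        # comment length in words
    votes: int        # assumed per-comment vote count (or a proxy)


# Hypothetical weights chosen for illustration only.
W_SENTIMENT, W_LENGTH, W_VOTES = 1.0, 0.3, 0.5


def mention_score(m: Mention) -> float:
    # log1p damps very long or very highly voted comments so that no
    # single mention dominates a book's total.
    return (W_SENTIMENT * m.sentiment
            + W_LENGTH * math.log1p(m.words)
            + W_VOTES * math.log1p(max(m.votes, 0)))


def rank_books(mentions: List[Mention]) -> List[str]:
    """Sum mention scores per book and return titles, best first."""
    totals: Dict[str, float] = {}
    for m in mentions:
        totals[m.book] = totals.get(m.book, 0.0) + mention_score(m)
    return sorted(totals, key=lambda title: totals[title], reverse=True)
```

Summing rather than averaging keeps the existing "more mentions
ranks higher" behavior while letting longer, better-received
comments count for more.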
| FinanceAnon wrote:
| Awesome idea and nice looking UI! I will definitely visit when I
| am looking for new books to read.
|
| One thing that I've noticed is that when I select another book,
| the scrollbar in the comment section doesn't automatically scroll
| up.
| tracyhenry wrote:
| I know this but I wasn't able to fix it. Would love suggestions
| on how to keep the scroll position in one div (for the books)
| but not the other (for the comments) when doing client-side
| navigation using Next.js...
| zeristor wrote:
| Can't find my recommendations for J. Scott Turner's "The Extended
| Organism"
|
| To summarise: organisms evolving to change the environment around
| them to their benefit. I went to Foyles one day with buying a
| book on termite mounds in mind; that is one chapter in the book.
|
| I found out a year too late that UCL had hosted a talk by Dr
| Turner.
| soheil wrote:
| "You're delusional. Where did I ever recommend reading Atlas
| Shrugged? Ayn Rand is nuts."
|
| Interesting that that's one of the recommendations.
| the_arun wrote:
| Thinking out loud here: what is the difference between Google's
| search algorithm and AI-based deep learning? I guess they're both
| trying to do the same thing, i.e. structure unstructured data?
| mdp2021 wrote:
| Suggestion: is there any way to notify you (your system) of book
| recommendations your processing has missed?
|
| You could have a form to notify you of a post which seems to be
| not processed, e.g.
| "https://news.ycombinator.com/item?id=28549134", or "id=28591398"
| etc.
|
| (BTW: very great work, and thank you for your invaluable service)
| tracyhenry wrote:
| Although viewable on mobile, this app is best viewed on larger
| screens! :)
| wombatmobile wrote:
| I like this a lot.
|
| The longer extracts are more useful than the shorter extracts.
|
| For Brave New World, I noticed the first 100 - 200 comments are
| short, and not useful as reviews so much as indicators of
| preference. Then after that, the comments are longer, and hence,
| more useful because they explain something.
|
| It would be useful to be able to filter by comment length, so as
| to distinguish between Opinion Mode and Review Mode.
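The Opinion Mode / Review Mode split could be approximated with a
simple word-count threshold. The 50-word cutoff and split_by_length
are arbitrary, hypothetical choices for illustration.

```python
from typing import List, Tuple


def split_by_length(comments: List[str], min_words: int = 50) -> Tuple[List[str], List[str]]:
    """Split comments into short 'opinion' and longer 'review' buckets
    by whitespace-separated word count."""
    opinions: List[str] = []
    reviews: List[str] = []
    for text in comments:
        if len(text.split()) >= min_words:
            reviews.append(text)
        else:
            opinions.append(text)
    return opinions, reviews
```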
| srcreigh wrote:
| You helped me spend $150 on books! Two comments
|
| 1. I regret you earned $0 for helping me spend so much on
| books. Have you considered setting up affiliate links or a
| donation button? Maybe affiliate links as a service will be your
| next project.
|
| 2. The Amazon links are for Amazon.com, but I'm in Canada. Maybe
| easy internationalized Amazon affiliate links will be your next
| project.
| xpe wrote:
| I regret that so many people regret that other people are not
| monetizing.
| lostgame wrote:
| You know what, if commenter OP finds value in the services
| offered, and wishes to compensate the author of the software
| - just gonna say - I have no problem with that.
| mihaic wrote:
| You might not have a problem with that, but some of us
| dislike knowing that monetization has to become
| omnipresent, as it changes everything.
| srcreigh wrote:
| Affiliate programs are the most anti-big corp
| monetization strategy ever.
|
| Considering I already buy books on Amazon, if there's
| any way I can just find an affiliate (any affiliate),
| Amazon gets 5.5% less revenue.
|
| For tracyhenry, they would get ~$8.25 CAD straight out of
| Amazon's pocket for my $150 purchase.
|
| https://associates.amazon.ca/help/node/topic/GRXPHT8U84RA
| YDX...
| soperj wrote:
| except it becomes much harder to find genuine
| recommendations for things on the web.
| scns wrote:
| You can use Pi-hole's monetization link.
|
| (edit) just scroll down that page:
| https://pi-hole.net/donate/#sponsorship
| dublinben wrote:
| Amazon would get even less revenue if you bought your
| books somewhere else, like https://bookshop.org/ or
| directly from an independent book store.
| thaufeki wrote:
| A Patreon/crypto address to make a donation to is the
| compromise here, surely.
| ijidak wrote:
| How do you pay your bills?
|
| People should get paid for work.
|
| Whether that work is having a job. Or making a website.
|
| I don't see the difference...
|
| Someone who does useful work deserves wages.
| MathCodeLove wrote:
| I regret that some people seem to think that any sort of
| compensation for services rendered or monetization in any way
| is automatically bad or wrong somehow.
| darwinwhy wrote:
| I regret having read this entire comment chain.
| amelius wrote:
| I regret that you did not get compensated for your lost
| time.
| ijidak wrote:
| Agree.
|
| People should get paid for work.
|
| Whether that work is having a job. Or making a website.
|
| Someone who does useful work deserves wages.
|
| Even 2,000 years ago the Bible said: "the worker deserves
| his wages."
|
| Most of the people who are against monetization are perfectly
| happy to get paid by their employer.
|
| Is direct employment the only morally upright way to
| receive payment for hard work?
| gricardo99 wrote:
| > You helped me spend $150 on books!
|
| Check your local library. Depending on where you are, it could
| be a fantastic resource for books.
| malshe wrote:
| After reading such comments here on HN, last month I got
| myself a local library card and it has turned out to be a
| great decision! I am using the Libby app to get digital books and
| even audiobooks! Absolutely fantastic
| rahimnathwani wrote:
| For #2, there are services OP could use, that will
| automatically switch links to the right country store, e.g.
| https://geniuslink.com/how-it-works/for-affiliates
| cweill wrote:
| Great execution, and very neat app!
|
| But, what's wrong with using Amazon affiliate links? If
| anything, monetizing would be great since it would give you more
| incentive to maintain this wonderful application. And it doesn't
| cost us users anything.
| tracyhenry wrote:
| Great point. I'm on a student visa which forbids any non-work
| income. That's one reason why :)
| nautilius wrote:
| Amazing and super useful: If I start reading today, and I read a
| book a day, it'll only take 112 years to finish, assuming that no
| additional books will be recommended in the next century.
| inanutshellus wrote:
| I'm reminded of Goodhart's Law... So long as your project remains
| secret it'll be valuable. Once someone sees money being made from
| it, it'll kick off disingenuous recommendations... anyway... high
| quality problem to have I guess!
| bachmeier wrote:
| Interesting idea, but this is _mentions_ of books, not
| recommendations. It includes comments by someone who's reading
| the book, has it on their reading list, or read it and thought it
| was terrible.
| tracyhenry wrote:
| The intention was to only show recommendations. But because of
| limited training data (we hand labeled ~4000 comments), the
| model wasn't able to filter out bad ones effectively. More
| training data should be able to solve it.
| zsmi wrote:
| It's a really interesting project. And I am sure it's really
| hard.
|
| I was curious how many times some common textbooks were mentioned
| but didn't find them via the search, which could be user error.
| But to give a specific example. None of the books in this comment
| thread were found:
|
| https://news.ycombinator.com/item?id=19893447
|
| Comment text like this: "CMOS VLSI Design: A Circuits and Systems
| Perspective (4th Edition)" by Weste and Harris
|
| should've been caught, right?
| tracyhenry wrote:
| It could be that I don't have this book in my book database.
| nickthemagicman wrote:
| Came here for the Warhammer stayed for the book recc's.
| rahimnathwani wrote:
| This is awesome. The best thing is that it's so fast to navigate.
| I like how the HN comments are styled just like on HN.
|
| A couple of thoughts:
|
| * It would be great if each book were to have its own URL (for
| sharing).
|
| * Consider letting the search accept author input, e.g. if I
| want to find the book 'Who' by Geoff Smart, the single-word title
| isn't specific enough to show that book at the top of the search
| results.
| soco wrote:
| If I look for one single word and that single word _is_ the
| answer, shouldn't that be the very first result? I mean that's
| a 100% match right there...
| rahimnathwani wrote:
| If the dataset were perfect, maybe. But, if a book with a
| single-word title has only few comments, it's plausible that
| most/all of those comments are false matches.
|
| In the case of the book I searched 'Who', showing it in 4th
| position seemed about right.
| tracyhenry wrote:
| Yes, the search can definitely be improved (e.g. to include the
| author). Right now it's SELECT * FROM books WHERE name LIKE
| '%{search_string}%'
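| A minimal sketch of the two suggestions above (author-aware search
| and exact-title matches first), assuming a hypothetical SQLite table
| books(name, author, mentions); the app's real schema and database
| are not described in the thread. It also binds the user's text as a
| query parameter instead of formatting it into the SQL string.

```python
import sqlite3

def escape_like(term: str) -> str:
    """Escape LIKE wildcards so '%' and '_' in user input match literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def search_books(conn: sqlite3.Connection, query: str, limit: int = 20):
    """Every whitespace-separated term must appear in the title or author.

    Exact (case-insensitive) title matches sort first, then by mention count.
    Assumes a hypothetical table books(name TEXT, author TEXT, mentions INTEGER).
    """
    terms = query.split()
    if not terms:
        return []
    # One clause per term; only placeholders go into the SQL, never user text.
    clause = "(name || ' ' || coalesce(author, '')) LIKE ? ESCAPE '\\'"
    where = " AND ".join([clause] * len(terms))
    sql = (
        "SELECT name, author, mentions FROM books "
        f"WHERE {where} "
        "ORDER BY (lower(name) = lower(?)) DESC, mentions DESC "
        "LIMIT ?"
    )
    params = [f"%{escape_like(t)}%" for t in terms] + [query, limit]
    return conn.execute(sql, params).fetchall()
```

| With this, a query like "who smart" can narrow to Geoff Smart's
| _Who_, and a bare "Who" puts the exact title ahead of longer titles
| that merely contain the word. A LIKE pattern with a leading % cannot
| use an ordinary index, which is fine at ~40k rows; SQLite's FTS5
| extension would be the next step if it ever got slow.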
| artursapek wrote:
| this is awesome, thanks for making it
| MarcScott wrote:
| HN really likes Neal Stephenson. I've never read a book of his
| that I didn't love, so I will definitely be looking through more of
| the recommended fiction from the community here.
| samuel wrote:
| REAMDE was crap, IMO, and I'm a Stephenson fan.
|
| And the problem with Stephenson is that he's rarely succinct, so a
| bad book from him turns into a huge loss of time.
| macintux wrote:
| I addressed that problem with _Seveneves_ by skimming about
| 1/3rd of it.
| samuel wrote:
| This is amazing. Thank you!
|
| Does it take into account negative reviews/comments? I have seen
| that _Why We Sleep_ is being recommended in the 6 months tab, but,
| while it was received with a lot of praise, it was soon
| criticized by other researchers in the field and I would expect
| that the HN crowd would have followed that trend.
| tracyhenry wrote:
| When I labeled the comments, I didn't label books that were
| criticized. So in theory the model should filter out negative
| reviews. But currently the training dataset is pretty limited
| in size so you still can see some negative ones. I suspect that
| with more training data this problem will go away.
| jeron wrote:
| 40k good books out there and I can only read like 24 a year if I
| really push myself
| ehutch79 wrote:
| It has Atlas Shrugged in the top 10?
| gjm11 wrote:
| "Atlas Shrugged" is a polarizing book: people tend to either
| love it or hate it. And the people who love it love to tell
| other people how great it is, whereas many of the people who
| hate it just don't talk about it (because there's generally
| little need to talk about the badness of bad books).
|
| I think a book list is more useful if it has some books in it
| that some love and some hate, rather than only books that no
| one minds very much. Maybe some of them will turn out to be
| ones I love.
|
| (I happen not to be a Rand fan myself.)
| Borlands wrote:
| Brilliant
| tracyhenry wrote:
| Hi HN!
|
| I built this small app in my spare time to aggregate books
| recommended on Hacker News. I personally find books recommended
| on HN to be super helpful, so I think this is the way that I can
| contribute back.
|
| This book aggregation idea is not new. A bunch of sites have done
| similar things [1, 2, 3].
|
| Yet one common limitation of those sites is that they have
| limited recall (i.e. not able to get a comprehensive set of book
| mentions), and thus don't paint an accurate picture of what the
| top books are. They're all based on insufficient rules, e.g.,
| looking for Amazon Links. As you can see from my app, people
| often do not include Amazon links when recommending a book.
|
| I wondered, why can't we just match book names? Well, not so
| easy. Some books have pretty short names, e.g. Meditations [4],
| or Steve Jobs [5]. Some book titles are also the titles of
| movies, e.g. Ready Player One [6]. Simply matching the names of
| the books would produce a whole lot of irrelevant results.
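| To make the false-positive problem concrete, here is a naive
| whole-phrase matcher of the kind described above. It is a sketch,
| and the comments and title list are made-up examples, not data from
| the app:

```python
import re

def naive_mentions(comment: str, titles: list[str]) -> list[str]:
    """Return every known title that appears as a whole phrase, ignoring case."""
    found = []
    for title in titles:
        # \b keeps "Dune" from matching inside "Dunes", but nothing here
        # can tell a book from a person, a movie, or an ordinary word.
        pattern = r"\b" + re.escape(title) + r"\b"
        if re.search(pattern, comment, flags=re.IGNORECASE):
            found.append(title)
    return found

titles = ["Meditations", "Steve Jobs", "Ready Player One"]
print(naive_mentions("Steve Jobs never wore a tie on stage.", titles))  # ['Steve Jobs']
print(naive_mentions("My morning meditations keep me sane.", titles))   # ['Meditations']
print(naive_mentions("The Ready Player One movie was fun.", titles))    # ['Ready Player One']
```

| All three are hits, and none of them is a book recommendation.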
|
| This is where Deep Learning comes into play. Recent advances in
| large NLP models (transformers and BERT in particular) have made
| machine language understanding unprecedentedly accurate. It
| enables me to fine-tune a BERT model on a couple thousand labeled
| HN comments and predict accurately whether each word in a comment
| is part of a book title or not, a task commonly termed Named
| Entity Recognition (NER).
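| The thread doesn't say which tagging scheme the model uses. Assuming
| the common BIO scheme (B-BOOK starts a title, I-BOOK continues it,
| O is outside), turning per-word predictions back into book titles
| is a short post-processing step. This is a sketch; the labels in
| the example are hand-written, not real model output.

```python
def decode_book_spans(words: list[str], labels: list[str]) -> list[str]:
    """Group consecutive per-word BIO tags into book-title strings."""
    spans: list[str] = []
    current: list[str] = []

    def flush() -> None:
        if current:
            spans.append(" ".join(current))
            current.clear()

    for word, label in zip(words, labels):
        if label == "B-BOOK" or (label == "I-BOOK" and not current):
            # A new title starts; a stray I-BOOK is leniently treated as B-BOOK.
            flush()
            current.append(word)
        elif label == "I-BOOK":
            current.append(word)  # continue the open title
        else:
            flush()  # "O" closes any open title
    flush()
    return spans

words = "I loved Ready Player One , the movie less so".split()
labels = ["O", "O", "B-BOOK", "I-BOOK", "I-BOOK", "O", "O", "O", "O", "O"]
print(decode_book_spans(words, labels))  # ['Ready Player One']
```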
|
| As a result, my app is able to present a whole lot more results
| while maintaining desirable accuracy. For example, NER works
| pretty well on the tough examples I mentioned ([4, 5, 6]).
| Compared to prior sites, my app captures 9-50X more mentions and
| thus presents a much more complete picture of what books are
| recommended on HN.
|
| Furthermore, I've made sure that the comments are presented well
| in the UI because the recommendations are just as useful as the
| books. I highlighted the mentioned book name, and used a custom
| NLP-based ranking function to sort the comments. These are non-
| trivial improvements over prior sites, which I hope you can find
| useful.
|
| Nevertheless, this app is not without limitations: 1) matching
| book names would fail when two books have the same or similar
| names; 2) although not often, this approach would wrongly
| classify some short stop-word names [7] and 3) sometimes NER
| fails to see that the commenter actually hates the book. These
| problems can be alleviated with more Deep Learning. For 1), one
| can use BERT to learn the authors mentioned which can be used as
| a filtering criterion. 2) and 3) should be fixable with more
| training data (currently there are only ~4,000 hand-labeled HN
| comments).
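| For limitation 1), a much simpler stand-in for the BERT-based
| author extraction suggested above is plain string matching on
| author surnames. The sketch below assumes a hypothetical catalog
| mapping each title to the authors of every book sharing it (the
| entries are illustrative), and returns None when the comment
| doesn't settle which book is meant.

```python
import re
from typing import Optional

def disambiguate(title: str, comment: str, catalog: dict[str, list[str]]) -> Optional[str]:
    """Return the author of the `title` book the comment most likely means.

    Uses surname string matching, not a learned model.
    """
    authors = catalog.get(title, [])
    if len(authors) == 1:
        return authors[0]  # only one book has this title
    hits = [
        a for a in authors
        # Match the surname as a whole word, e.g. "Descartes" in "Descartes'".
        if re.search(r"\b" + re.escape(a.split()[-1]) + r"\b", comment, re.IGNORECASE)
    ]
    return hits[0] if len(hits) == 1 else None  # ambiguous or unknown -> None

catalog = {"Meditations": ["Marcus Aurelius", "René Descartes"]}
print(disambiguate("Meditations", "Descartes' Meditations is a slog.", catalog))  # René Descartes
```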
|
| Lastly, I'd like to especially thank my gf who helped me label
| ~1,000 comments, which boosted the model accuracy by 5 percent! I
| also want to thank the people who create and maintain the
| Hacker News BigQuery dataset [8]. And of course, I thank everyone
| on HN who recommends books to others.
|
| Hope you enjoy this app! Feedback and suggestions are welcome :)
|
| [1] https://news.ycombinator.com/item?id=15169611
|
| [2] https://news.ycombinator.com/item?id=10924741
|
| [3] https://news.ycombinator.com/item?id=12365693
|
| [4] https://hacker-recommended-books.vercel.app/category/0/all-t...
|
| [5] https://hacker-recommended-books.vercel.app/category/1/all-t...
|
| [6] https://hacker-recommended-books.vercel.app/category/0/all-t...
|
| [7] https://hacker-recommended-books.vercel.app/category/12/past...
|
| [8] https://news.ycombinator.com/item?id=19304326
|
| P.S. The Amazon links are NOT sponsored. This app is free of
| monetization.
| oakfr wrote:
| This is really cool stuff. Would be really nice to do the same
| for movies :)
| metalliqaz wrote:
| A book that I and others have recommended doesn't show up in the
| database.
|
| Animal, Vegetable, Junk: A History of Food, from Sustainable to
| Suicidal by Mark Bittman
| endofreach wrote:
| Amazing. I appreciate that there are no affiliate links. But I
| honestly think: you should put affiliate links.
|
| Also, if it makes sense, have a monthly list.
| godmode2019 wrote:
| This is very impressive, well done on deploying this.
|
| 95% of the books I have ever read or owned are in the first 20
| pages.
|
| It's almost just as fun to read the comment chain about each book.
|
| You must be independently wealthy because I know no one cares if
| there is an affiliate link. I believe affiliates are always paid
| to the last cookie you have.
| oakfr wrote:
| @tracyhenry: how does the system work exactly? I cannot find any
| documentation on your website.
| tracyhenry wrote:
| hey, you can scroll down to find a long comment of mine
| documenting the approach.
| dang wrote:
| https://news.ycombinator.com/item?id=28596207
___________________________________________________________________
(page generated 2021-09-20 23:00 UTC)