[HN Gopher] Show HN: The Sample - newsletters curated for you wi...
___________________________________________________________________
Show HN: The Sample - newsletters curated for you with machine
learning
Author : jacobobryant
Score : 44 points
Date : 2021-06-28 16:32 UTC (6 hours ago)
(HTM) web link (sample.findka.com)
(TXT) w3m dump (sample.findka.com)
| visarga wrote:
| I tried this kind of news aggregation but unfortunately the
| source material is mostly garbage and people prefer a few hand
| selected news to an automatic feed.
|
| The classification problem is interesting though. I ended up with
| a long list of hundreds of topics. Most articles fall in two or
| more. There's also a sub-problem of clustering news by subject.
| jacobobryant wrote:
| (response to edit)
|
| > The classification problem is interesting though. I ended up
| with a long list of hundreds of topics. Most articles fall in
| two or more. There's also a sub-problem of clustering news by
| subject.
|
| Yeah, certainly difficult. I'm doing it partially manually
| right now but also with fastText[1]. I'd like to switch
| completely to fastText soon though since more often than not
| the newsletters I add don't fit well with the checkbox
| categories on the landing page. Clustering the newsletters with
| fastText is pretty easy, but it might be hard to generate good
| tags for the landing page that match up with the clusters.
|
| [1] https://fasttext.cc/
| jacobobryant wrote:
| > people prefer a few hand selected news to an automatic feed.
|
| That's actually one of The Sample's strengths--it leaves most
| of the curation to humans, and then it helps people find those
| curators.
| visarga wrote:
| I like that, so it's helping humans find the news but
| deferring the final selection.
| text_exch wrote:
| For all the uninitiated, Jacob also runs Findka Essays [0] which
| is fantastic. My daily reading is Findka Essays and Thinking
| About Things [1], which is more than enough interesting reading
| material that I can get through.
|
| [0] https://essays.findka.com/
|
| [1] https://www.thinking-about-things.com/
| jacobobryant wrote:
| Glad you like it!
| qwerty456127 wrote:
| The next title below this one on HN says "Biohackers Take Aim at
| Big Pharma's Stranglehold on Insulin". Which category does it
| fall in on Findka? It offers tech, culture, marketing etc but I
| am primarily interested in biohacking, medical stuff, cognitive
| science etc but can't find a category for that.
| jacobobryant wrote:
| The categories are pretty fuzzy, and I need to update the ones
| on the landing page. After you sign up, there's a text field
| where you can add arbitrary tags. And as time goes on, the
| algorithm will experiment with sending you different things and
| expand beyond whatever topics you gave it anyway. In truth, the
| only reason I even put those check boxes on the landing page is
| to help people understand what the thing even does :) (a
| surprisingly difficult problem for people not familiar with
| recommender systems).
| qwerty456127 wrote:
| Thanks. I'll try that.
|
| > a surprisingly difficult problem for people not familiar
| with recommender systems
|
| Do you feel like open-sourcing and/or writing a post on how
| it works? I feel interested in recommender systems too :-)
| jacobobryant wrote:
| I'm planning to build a business on this, so probably won't
| open-source it--but I'm always looking for interesting
| things to write about! I write a weekly newsletter called
| Future of Discovery[1]; I might write up some more
| implementation details there in a week or two. In the
| meantime, most of the heavy lifting is done by the Surprise
| python lib[2]. It's pretty easy to play around with, just
| give it a csv of <user id>, <item id>, <rating> and then
| you can start making rating predictions. Also fastText[3]
| is easy to mess around with too. Most of the code I've
| written just layers things on top of that, e.g. to handle
| exploration-vs-exploitation as discussed in another thread
| here.
|
| Recently I've been factoring out the ML code into a
| separate recommendation service so it can serve different kinds
| of apps (I just barely made this essay recommender
| system[4] start using it for example).
|
| I'm happy to chat about recommender systems also if you
| like, email's in my profile.
|
| [1] https://findka.com
|
| [2] http://surpriselib.com/
|
| [3] https://fasttext.cc/
|
| [4] https://essays.findka.com
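
To give a feel for the `<user id>, <item id>, <rating>` input described above, here is a minimal sketch of rating prediction by matrix factorization. It is written in plain Python rather than with Surprise so that it runs without dependencies; the sample data, function names, and hyperparameters are all invented for illustration and are not taken from The Sample.

```python
import csv
import io
import random

# Hypothetical ratings in the "<user id>,<item id>,<rating>" CSV layout.
RATINGS_CSV = """\
alice,newsletter_a,5
alice,newsletter_b,4
bob,newsletter_a,5
bob,newsletter_c,1
carol,newsletter_b,4
carol,newsletter_c,2
"""


def load_ratings(text):
    """Parse CSV text into a list of (user, item, rating) triples."""
    return [(row[0], row[1], float(row[2]))
            for row in csv.reader(io.StringIO(text)) if row]


def train(ratings, n_factors=2, n_epochs=200, lr=0.02, reg=0.05, seed=0):
    """Fit biases and latent factors with stochastic gradient descent."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    bu = {u: 0.0 for u in users}  # per-user bias
    bi = {i: 0.0 for i in items}  # per-item bias
    pu = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    qi = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            dot = sum(p * q for p, q in zip(pu[u], qi[i]))
            err = r - (mean + bu[u] + bi[i] + dot)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = pu[u][f], qi[i][f]
                pu[u][f] += lr * (err * qif - reg * puf)
                qi[i][f] += lr * (err * puf - reg * qif)
    return mean, bu, bi, pu, qi


def predict(model, user, item, lo=1.0, hi=5.0):
    """Estimate a rating; unknown users or items fall back to the biases."""
    mean, bu, bi, pu, qi = model
    est = mean + bu.get(user, 0.0) + bi.get(item, 0.0)
    if user in pu and item in qi:
        est += sum(p * q for p, q in zip(pu[user], qi[item]))
    return min(hi, max(lo, est))  # clip to the rating scale


model = train(load_ratings(RATINGS_CSV))
# carol has not rated newsletter_a, so this is a genuine prediction.
print(predict(model, "carol", "newsletter_a"))
```

With a library such as Surprise the training loop is handled for you; the point here is only that a list of rating triples is enough to start making predictions.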
| giansegato wrote:
| Very interesting use of reco engines! I always thought that the
| current state of newsletter discovery is very poor. At the same
| time, I would not be ok with a product owning the full newsletter
| stack (like eg. Spotify with podcasts). Good luck with the
| project!
| jacobobryant wrote:
| Thanks, and totally agree about owning the stack! This is
| intended to provide a network for newsletters without trying to
| suck them into a platform--we introduce people to newsletters,
| but we don't forward anyone the same newsletter more than one
| time; so if they want to keep getting it, they can subscribe
| via the newsletter's subscribe page.
| PeterWhittaker wrote:
| I'm definitely interested! But....
|
| NOTE: These are not criticisms so much as release 1.1 (or even
| 2.0) use cases....
|
| The categories are too broad (and, for some use cases, the 21
| days is rather long, but I get it...).
|
| Re categories: While I am interested in science, quote unquote, I
| am orders of magnitude more interested in physics than biology,
| with the exception of biochemistry as it relates to evolutionary
| theory, and within physics I am at least an order of magnitude
| more interested in QM (two orders, if you can nail it down to
| relational QM) than I am in astrophysics, and my interest in
| astronomy is even lower, and my interest in exoplanets is mostly
| nil.
|
| And that's my personal interest stuff. Professionally, I am
| interested in JS specifically, so that would be programming, but
| I really have zero interest in Fortran or COBOL or Haskell,
| unless somehow those relate directly to things I can or might
| want to do in JS (full stack React FE, node BE).
|
| The timeframe side relates to that professional side as well:
| That JS focus will change at some point (startup life) and by the
| time 21 days have passed, well, I'll be about 2.5 weeks behind,
| eh?
| jacobobryant wrote:
| I wouldn't worry too much about the topics:
| https://news.ycombinator.com/item?id=27666763
|
| To add to that, topic modeling/content-based filtering is only
| used as a starting point anyway; as you rate the newsletters
| you receive, your recommendations will start to be dominated
| more by collaborative filtering ("people who liked X also liked
| Y", without thought for what X and Y are about).
|
| Also, the "21 days" thing is pure marketing/explanation :).
| It's just an attempt to help people (especially non-technical
| people) understand that it'll adapt to your preferences over
| time. It'll start adapting immediately, and it'll continue
| adapting after 21 days.
| jacobobryant wrote:
| This is the latest of several recommender systems[1] that I've
| attempted to grow over the past couple years. It launched in
| February and there are about 800 subscribers. The algorithm uses
| a collaborative filtering model ("people who liked X also liked
| Y"), and to help with cold-start I augment the training data with
| content-based filtering: I use keyword extraction (tf-idf) and a
| pre-trained language model (fastText) to cluster the newsletters,
| then for each cluster I generate k "fake users" who like each
| newsletter in the cluster. This way, the model will gradually
| switch from content-based filtering to collaborative filtering as
| it collects user ratings.
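
The "fake users" idea above can be sketched roughly as follows. This assumes the clusters have already been computed elsewhere (for example from tf-idf keywords or fastText vectors); the cluster names, the `fake-...` ID format, and the choice of 5.0 as the "like" rating are all hypothetical.

```python
def fake_user_ratings(clusters, k=3, like_rating=5.0):
    """Build (user_id, item_id, rating) triples for k synthetic users per cluster.

    Each synthetic user "likes" every newsletter in its cluster, which gives a
    collaborative filtering model some content-based signal before real
    ratings exist.
    """
    triples = []
    for cluster_id, newsletters in sorted(clusters.items()):
        for n in range(k):
            user_id = f"fake-{cluster_id}-{n}"  # hypothetical ID scheme
            for item in newsletters:
                triples.append((user_id, item, like_rating))
    return triples


# Hypothetical clusters and a single real rating.
clusters = {
    "tech": ["newsletter_a", "newsletter_b"],
    "culture": ["newsletter_c"],
}
real_ratings = [("alice", "newsletter_a", 4.0)]

# Train on real and synthetic ratings together.
training_data = real_ratings + fake_user_ratings(clusters, k=2)
```

Because the number of synthetic ratings is fixed while real ratings keep arriving, the synthetic ones become a shrinking share of the training data, which is one way to read the gradual switch described above.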
|
| Some of the newsletters I found on my own, but most are submitted
| by users (there's a "what other newsletters do you subscribe to"
| question after you sign up). I set up an inbound-only mail
| server, and I generate a unique address for each newsletter,
| which I use to sign up manually. I approve each issue that comes
| in so that we don't forward welcome emails, promotions etc. (It
| only takes 5 - 10 minutes a day). Before forwarding I also scrub
| out any links with certain keywords like "unsubscribe", "manage
| your preferences" and so on. It's not a perfect process but it's
| good enough for now.
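
As an illustration of the link scrubbing step, here is a minimal sketch that drops any `<a>` element whose URL or link text contains a blocked keyword. The keyword list comes from the comment above; the function name and the regex-based approach are assumptions, and a regex is a simplification that will miss some real-world email HTML.

```python
import re

# Keywords mentioned in the comment above; a real list would likely be longer.
BLOCKED_KEYWORDS = ("unsubscribe", "manage your preferences")

# Matches a whole anchor element, from <a ...> to the nearest </a>.
ANCHOR_RE = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)


def scrub_links(html, keywords=BLOCKED_KEYWORDS):
    """Remove anchor elements whose href or text contains a blocked keyword."""
    def replace(match):
        # Lowercase and collapse whitespace so multi-word keywords still match.
        anchor = re.sub(r"\s+", " ", match.group(0).lower())
        if any(keyword in anchor for keyword in keywords):
            return ""
        return match.group(0)

    return ANCHOR_RE.sub(replace, html)


email_html = (
    '<p>Read <a href="https://example.com/post">this post</a>. '
    '<a href="https://example.com/unsubscribe?u=1">Click here</a> | '
    '<a href="https://example.com/prefs">Manage your preferences</a></p>'
)
scrubbed = scrub_links(email_html)
```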
|
| Long-term I want to build a general-purpose recommender
| system[2]. I'm starting with newsletters because I think it'll be
| the easiest way to grow fast initially (which seems to have been
| validated so far). The short explanation is that I've designed
| The Sample to be extremely effective at cross-promotion[3]. (If
| you have a newsletter, submit it at
| https://sample.findka.com/submit/ and I'll send you a referral
| link).
|
| [1] https://news.ycombinator.com/item?id=24921127
|
| [2] https://jacobobryant.com/p/why-newsletters/
|
| [3] https://jacobobryant.com/p/an-algorithm-for-driving-
| newslett...
| desine wrote:
| Is there any mechanism to prevent you from forming your own
| super-echo-chamber? ML filtering only what you and your similar
| profiles like seems to possibly have problematic outcomes, no?
|
| Interesting, and inevitable, project, but I have my concerns.
| jacobobryant wrote:
| Yes, the algorithm accounts for the exploration-vs-
| exploitation problem.[1] Right now it uses a simple epsilon-
| greedy strategy: 20% of the newsletters you receive are
| picked completely at random. I also use a technique I call
| "popularity smoothing," which limits the number of times that
| popular newsletters can be forwarded.
|
| I think the concerns about ML causing echo chambers/other
| problems, while not completely unfounded, are overblown
| (perhaps due to the overall anti-big-tech sentiment). I think
| human behavior plus ease of sharing online is a much larger
| factor. I'm optimistic that ML filtering can actually help
| people get a much _larger_ variety of information, which is
| one of my goals for this.
|
| [1] https://en.wikipedia.org/wiki/Multi-armed_bandit
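
An epsilon-greedy pick with a 20% exploration rate can be sketched like this. The `max_forwards` cap is only a guess at what "popularity smoothing" might look like (the comment above says only that it limits how often popular newsletters are forwarded); the function name and data shapes are hypothetical.

```python
import random


def pick_newsletter(predicted, forwards, epsilon=0.2, max_forwards=None, rng=random):
    """Choose one newsletter to forward.

    predicted: dict mapping newsletter ID to predicted rating.
    forwards: dict mapping newsletter ID to how often it has been forwarded.
    epsilon: probability of picking uniformly at random (exploration).
    max_forwards: optional cap; newsletters at or over it are skipped
        (an assumed, simplified form of popularity smoothing).
    """
    candidates = sorted(
        item for item in predicted
        if max_forwards is None or forwards.get(item, 0) < max_forwards
    )
    if not candidates:
        raise ValueError("no eligible newsletters")
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore: ignore the predictions
    return max(candidates, key=predicted.get)  # exploit: best predicted rating


# Hypothetical predictions and forward counts.
predicted = {"newsletter_a": 4.5, "newsletter_b": 3.0, "newsletter_c": 2.0}
forwards = {"newsletter_a": 10}
choice = pick_newsletter(predicted, forwards, max_forwards=10)
```

Here `newsletter_a` has hit the cap, so it is never chosen, and the remaining two are picked greedily about 80% of the time and at random otherwise.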
| thegginthesky wrote:
| Thank you for sharing your approach, I find it very
| interesting. I've worked in some systems that also optimize
| for "serendipity"[1], which is just optimizing not only for
| ranking accuracy but also for adding new relevant content
| into the mix. (I believe you need a lot of data for this to
| be viable though)
|
| > I think the concerns about ML causing echo chambers/other
| problems, while not completely unfounded, are overblown
|
| I disagree that it's overblown. This is a widely discussed
| topic in ML research, especially around recommender systems
| [0]. While I do agree that ML systems have enormous
| potential in augmenting human capability, we should be
| addressing possible flaws such as the one mentioned.
|
| I'm personally interested in how to solve biases in machine
| learning systems, as it impacts so much of my professional
| work. But I also think bias, or echo-chamber, isn't unique
| to ML since we see it so much in the world and institutions
| that surround us, but we are in a unique position to
| address these problems directly on the systems we create.
|
| [0] https://arxiv.org/abs/2010.03240
|
| [1] https://link.springer.com/article/10.1007/s11390-020-0135-9
| jacobobryant wrote:
| > I disagree that it's overblown.
|
| Fair enough. To be clear, I'm not saying ML bias isn't a
| significant problem; and certainly I'll continue to
| address it as the project grows. Based on this, it sounds
| like we might be in agreement mostly:
|
| > But I also think bias, or echo-chamber, isn't unique to
| ML since we see it so much in the world and institutions
| that surround us, but we are in a unique position to
| address these problems directly on the systems we create.
|
| I often run into people (usually not ML practitioners)
| who appear to think that ML _inherently_ results in
| filter bubbles/bias (as opposed to specific ML
| implementations) and thus think the entire approach
| should be abandoned; whereas I think ML, with a proper
| focus on reducing bias, is one of our most promising
| options.
| beders wrote:
| The sample newsletter looked interesting, so I signed up. Thanks
| for providing a sneak-peek.
| dmje wrote:
| Interesting, thanks! Have signed up.
___________________________________________________________________
(page generated 2021-06-28 23:00 UTC)