[HN Gopher] Show HN: The Sample - newsletters curated for you wi...
       ___________________________________________________________________
        
       Show HN: The Sample - newsletters curated for you with machine
       learning
        
       Author : jacobobryant
       Score  : 44 points
       Date   : 2021-06-28 16:32 UTC (6 hours ago)
        
 (HTM) web link (sample.findka.com)
        
       | visarga wrote:
       | I tried this kind of news aggregation but unfortunately the
       | source material is mostly garbage and people prefer a few hand
       | selected news to an automatic feed.
       | 
       | The classification problem is interesting though. I ended up with
       | a long list of hundreds of topics. Most articles fall in two or
       | more. There's also a sub-problem of clustering news by subject.
        
         | jacobobryant wrote:
         | (response to edit)
         | 
         | > The classification problem is interesting though. I ended up
         | with a long list of hundreds of topics. Most articles fall in
         | two or more. There's also a sub-problem of clustering news by
         | subject.
         | 
         | Yeah, certainly difficult. I'm doing it partially manually
         | right now but also with fastText[1]. I'd like to switch
         | completely to fastText soon though since more often than not
         | the newsletters I add don't fit well with the checkbox
         | categories on the landing page. Clustering the newsletters with
         | fastText is pretty easy, but it might be hard to generate good
         | tags for the landing page that match up with the clusters.
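The comment doesn't say how clusters would be turned into landing-page tags. One rough way to see the problem is to cluster the newsletters and read each cluster's heaviest terms as candidate tags. The sketch below is illustrative only. It uses tf-idf vectors as a self-contained stand-in for fastText embeddings, and a small spherical k-means with deterministic seeding. The function names and the one-line newsletter blurbs are hypothetical; none of this is The Sample's code.

```python
import math
import re
from collections import defaultdict

# Toy newsletter blurbs (hypothetical stand-ins for real newsletter text).
DOCS = [
    "python compilers and rust type systems",
    "rust borrow checker and python packaging",
    "startup marketing growth funnels",
    "marketing newsletters and growth experiments",
]


def tokenize(text):
    # Lowercase word tokens; a real pipeline would also drop stop words.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """One L2-normalized sparse tf-idf vector (term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = defaultdict(int)
    for tokens in token_lists:
        for term in set(tokens):
            df[term] += 1
    vectors = []
    for tokens in token_lists:
        vec = defaultdict(float)
        for term in tokens:
            vec[term] += math.log(n / df[term])  # tf * idf, one token at a time
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    # Equals cosine similarity, since all vectors here are unit length.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def normalized_sum(vectors):
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def spherical_kmeans(vectors, k, iterations=10):
    """Cluster unit vectors by cosine similarity; returns (labels, centroids)."""
    # Deterministic farthest-first seeding: start with the first vector, then
    # repeatedly add the vector least similar to every seed chosen so far.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(range(len(vectors)),
                         key=lambda i: max(dot(vectors[i], vectors[s]) for s in seeds)))
    centroids = [vectors[s] for s in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = normalized_sum(members)
    return labels, centroids


def cluster_tags(centroids, top_n=3):
    """Candidate tags: the heaviest terms in each centroid."""
    return [[t for t, _ in sorted(c.items(), key=lambda kv: -kv[1])[:top_n]]
            for c in centroids]


if __name__ == "__main__":
    labels, centroids = spherical_kmeans(tfidf_vectors(DOCS), k=2)
    print(labels)
    print(cluster_tags(centroids))
```

On real newsletters, the top centroid terms are often too specific or too noisy to serve as checkbox labels. That is the mismatch described above.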
         | 
         | [1] https://fasttext.cc/
        
         | jacobobryant wrote:
         | > people prefer a few hand selected news to an automatic feed.
         | 
         | That's actually one of The Sample's strengths--it leaves most
         | of the curation to humans, and then it helps people find those
         | curators.
        
           | visarga wrote:
           | I like that, so it's helping humans find the news but
           | deferring the final selection.
        
       | text_exch wrote:
       | For the uninitiated, Jacob also runs Findka Essays [0], which
       | is fantastic. My daily reading is Findka Essays and Thinking
       | About Things [1], which together are more interesting reading
       | material than I can get through.
       | 
       | [0] https://essays.findka.com/
       | 
       | [1] https://www.thinking-about-things.com/
        
         | jacobobryant wrote:
         | Glad you like it!
        
       | qwerty456127 wrote:
       | The next title below this one on HN says "Biohackers Take Aim at
       | Big Pharma's Stranglehold on Insulin". Which category would that
       | fall into on Findka? It offers tech, culture, marketing, etc., but
       | I am primarily interested in biohacking, medical stuff, cognitive
       | science, etc., and can't find a category for that.
        
         | jacobobryant wrote:
         | The categories are pretty fuzzy, and I need to update the ones
         | on the landing page. After you sign up, there's a text field
         | where you can add arbitrary tags. And as time goes on, the
         | algorithm will experiment with sending you different things and
         | expand beyond whatever topics you gave it anyway. In truth, the
         | only reason I even put those check boxes on the landing page is
         | to help people understand what the thing even does :) (a
         | surprisingly difficult problem for people not familiar with
         | recommender systems).
        
           | qwerty456127 wrote:
           | Thanks. I'll try that.
           | 
           | > a surprisingly difficult problem for people not familiar
           | with recommender systems
           | 
           | Do you feel like open-sourcing and/or writing a post on how
           | it works? I'm interested in recommender systems too :-)
        
             | jacobobryant wrote:
             | I'm planning to build a business on this, so probably won't
             | open-source it--but I'm always looking for interesting
             | things to write about! I write a weekly newsletter called
             | Future of Discovery[1]; I might write up some more
             | implementation details there in a week or two. In the
             | meantime, most of the heavy lifting is done by the Surprise
             | Python lib[2]. It's pretty easy to play around with: just
             | give it a CSV of <user id>, <item id>, <rating> and then
             | you can start making rating predictions. fastText[3] is
             | also easy to mess around with. Most of the code I've
             | written just layers things on top of that, e.g. to handle
             | exploration-vs-exploitation as discussed in another thread
             | here.
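Here is the "give it a csv ... and start making rating predictions" workflow made concrete without installing anything. It is a minimal pure-Python sketch of biased matrix factorization trained with SGD, the model family behind Surprise's `SVD` algorithm. It is a stand-in written for illustration, not Surprise's implementation or The Sample's code. The class name, hyperparameters, and toy ratings are all assumptions.

```python
import csv
import io
import random

# Toy ratings in the "<user id>,<item id>,<rating>" shape described above
# (hypothetical data).
RATINGS_CSV = """alice,news_a,5
alice,news_b,1
bob,news_a,5
bob,news_b,1
bob,news_c,5
carol,news_a,4
carol,news_c,5
"""


def load_ratings(csv_text):
    """Parse CSV rows into (user, item, rating) tuples, skipping blank lines."""
    rows = csv.reader(io.StringIO(csv_text))
    return [(u.strip(), i.strip(), float(r)) for u, i, r in (row for row in rows if row)]


class BiasedMF:
    """Biased matrix factorization fitted with SGD (a from-scratch stand-in,
    similar in spirit to Surprise's SVD algorithm but not its code)."""

    def __init__(self, n_factors=4, lr=0.02, reg=0.02, epochs=200, seed=0):
        self.n_factors, self.lr, self.reg = n_factors, lr, reg
        self.epochs, self.seed = epochs, seed

    def fit(self, ratings):
        rng = random.Random(self.seed)
        self.mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean
        self.bu, self.bi, self.pu, self.qi = {}, {}, {}, {}
        for u, i, _ in ratings:
            if u not in self.pu:
                self.bu[u] = 0.0
                self.pu[u] = [rng.gauss(0, 0.1) for _ in range(self.n_factors)]
            if i not in self.qi:
                self.bi[i] = 0.0
                self.qi[i] = [rng.gauss(0, 0.1) for _ in range(self.n_factors)]
        order = list(ratings)
        for _ in range(self.epochs):
            rng.shuffle(order)
            for u, i, r in order:
                err = r - self.predict(u, i)
                self.bu[u] += self.lr * (err - self.reg * self.bu[u])
                self.bi[i] += self.lr * (err - self.reg * self.bi[i])
                pu, qi = self.pu[u], self.qi[i]
                for f in range(self.n_factors):
                    puf, qif = pu[f], qi[f]
                    pu[f] += self.lr * (err * qif - self.reg * puf)
                    qi[f] += self.lr * (err * puf - self.reg * qif)
        return self

    def predict(self, user, item):
        """Estimated rating; unknown users/items fall back to mean + biases."""
        est = self.mu + self.bu.get(user, 0.0) + self.bi.get(item, 0.0)
        if user in self.pu and item in self.qi:
            est += sum(a * b for a, b in zip(self.pu[user], self.qi[item]))
        return est


if __name__ == "__main__":
    model = BiasedMF().fit(load_ratings(RATINGS_CSV))
    # alice hasn't seen news_c, but bob (who agrees with her so far) liked it.
    print(round(model.predict("alice", "news_c"), 2))
```

The Surprise equivalent is roughly this: build a `Reader(line_format="user item rating", sep=",")`, load with `Dataset.load_from_file(path, reader=reader)`, and call `build_full_trainset()`. Then `SVD().fit(trainset)` and read `predict(uid, iid).est`. Surprise also clips estimates to the rating scale by default; this sketch does not.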
             | 
             | Recently I've been factoring out the ML code into a
             | separate recommendation service so it can serve different
             | kinds of apps (for example, I just barely switched this
             | essay recommender system[4] over to it).
             | 
             | I'm happy to chat about recommender systems also if you
             | like, email's in my profile.
             | 
             | [1] https://findka.com
             | 
             | [2] http://surpriselib.com/
             | 
             | [3] https://fasttext.cc/
             | 
             | [4] https://essays.findka.com
        
       | giansegato wrote:
       | Very interesting use of reco engines! I've always thought that
       | the current state of newsletter discovery is very poor. At the
       | same time, I would not be OK with a product owning the full
       | newsletter stack (like Spotify with podcasts). Good luck with
       | the project!
        
         | jacobobryant wrote:
         | Thanks, and totally agree about owning the stack! This is
         | intended to provide a network for newsletters without trying to
         | suck them into a platform--we introduce people to newsletters,
         | but we don't forward anyone the same newsletter more than one
         | time; so if they want to keep getting it, they can subscribe
         | via the newsletter's subscribe page.
        
       | PeterWhittaker wrote:
       | I'm definitely interested! But....
       | 
       | NOTE: These are not criticisms so much as release 1.1 (or even
       | 2.0) use cases....
       | 
       | The categories are too broad (and, for some use cases, the 21
       | days is rather long, but I get it...).
       | 
       | Re categories: While I am interested in science, quote unquote, I
       | am orders of magnitude more interested in physics than biology,
       | with the exception of biochemistry as it relates to evolutionary
       | theory, and within physics I am at least an order of magnitude
       | more interested in QM (two orders, if you can nail it down to
       | relational QM) than I am in astrophysics, and my interest in
       | astronomy is even lower, and my interest in exoplanets is mostly
       | nil.
       | 
       | And that's my personal interest stuff. Professionally, I am
       | interested in JS specifically, so that would be programming, but
       | I really have zero interest in Fortran or COBOL or Haskell,
       | unless somehow those relate directly to things I can or might
       | want to do in JS (full stack React FE, node BE).
       | 
       | The timeframe side relates to that professional side as well:
       | That JS focus will change at some point (startup life) and by the
       | time 21 days have passed, well, I'll be about 2.5 weeks behind,
       | eh?
        
         | jacobobryant wrote:
         | I wouldn't worry too much about the topics:
         | https://news.ycombinator.com/item?id=27666763
         | 
         | To add to that, topic modeling/content-based filtering is only
         | used as a starting point anyway; as you rate the newsletters
         | you receive, your recommendations will start to be dominated
         | more by collaborative filtering ("people who liked X also liked
         | Y", without thought for what X and Y are about).
         | 
         | Also, the "21 days" thing is pure marketing/explanation :).
         | It's just an attempt to help people (especially non-technical
         | people) understand that it'll adapt to your preferences over
         | time. It'll start adapting immediately, and it'll continue
         | adapting after 21 days.
        
       | jacobobryant wrote:
       | This is the latest of several recommender systems[1] that I've
       | attempted to grow over the past couple years. It launched in
       | February and there are about 800 subscribers. The algorithm uses
       | a collaborative filtering model ("people who liked X also liked
       | Y"), and to help with cold-start I augment the training data with
       | content-based filtering: I use keyword extraction (tf-idf) and a
       | pre-trained language model (fastText) to cluster the newsletters,
       | then for each cluster I generate k "fake users" who like each
       | newsletter in the cluster. This way, the model will gradually
       | switch from content-based filtering to collaborative filtering as
       | it collects user ratings.
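This is a minimal sketch of the fake-user augmentation described above. It assumes the clusters have already been computed and that a "like" is stored as rating 1.0. The rating scale, the default k, and the `fake:` user-id scheme are assumptions, not details from the comment. The toy item-item cosine similarity shows the cold-start effect: a newsletter with no real ratings still gets neighbors through the synthetic users.

```python
import math


def augment_with_fake_users(ratings, clusters, k=3, like=1.0):
    """Append k synthetic users per cluster, each liking every newsletter in it.

    ratings:  list of (user, newsletter, rating) tuples from real users.
    clusters: dict of cluster id -> list of newsletter ids, produced by
              content-based clustering (assumed to be done elsewhere).
    """
    augmented = list(ratings)
    for cluster_id, newsletters in clusters.items():
        for n in range(k):
            fake_user = f"fake:{cluster_id}:{n}"  # hypothetical naming scheme
            augmented.extend((fake_user, item, like) for item in newsletters)
    return augmented


def item_similarity(ratings, a, b):
    """Cosine similarity between the sets of users who liked items a and b."""
    liked_a = {u for u, i, r in ratings if i == a and r > 0}
    liked_b = {u for u, i, r in ratings if i == b and r > 0}
    if not liked_a or not liked_b:
        return 0.0
    return len(liked_a & liked_b) / math.sqrt(len(liked_a) * len(liked_b))


if __name__ == "__main__":
    real = [("u1", "ml_weekly", 1.0), ("u2", "ml_weekly", 1.0),
            ("u2", "growth_notes", 1.0)]
    clusters = {"ml": ["ml_weekly", "new_ml_digest"], "marketing": ["growth_notes"]}
    # With real ratings alone, the brand-new newsletter has no neighbors...
    print(item_similarity(real, "ml_weekly", "new_ml_digest"))
    # ...but the synthetic users link it to the rest of its cluster.
    augmented = augment_with_fake_users(real, clusters)
    print(round(item_similarity(augmented, "ml_weekly", "new_ml_digest"), 3))
```

The number of synthetic users stays fixed at k while real ratings accumulate. As a result, the synthetic users make up a shrinking share of each newsletter's likers, which is one way to read the gradual switch from content-based to collaborative filtering.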
       | 
       | Some of the newsletters I found on my own, but most are submitted
       | by users (there's a "what other newsletters do you subscribe to"
       | question after you sign up). I set up an inbound-only mail
       | server, and I generate a unique address for each newsletter,
       | which I use to sign up manually. I approve each issue that comes
       | in so that we don't forward welcome emails, promotions, etc. (It
       | only takes 5-10 minutes a day.) Before forwarding, I also scrub
       | out any links with certain keywords like "unsubscribe", "manage
       | your preferences" and so on. It's not a perfect process but it's
       | good enough for now.
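This is a rough sketch of the link-scrubbing step. It assumes issues arrive as HTML and that dropping the whole `<a>` element is acceptable. The keyword tuple holds only the two phrases quoted above; a real list would be longer. The regex approach is a simplification, in keeping with "not a perfect process".

```python
import re

# Hypothetical keyword list; only "unsubscribe" and "manage your preferences"
# are named in the comment above.
SCRUB_KEYWORDS = ("unsubscribe", "manage your preferences")

# A whole <a ...>...</a> element, matched lazily across newlines. Regexes are
# a rough tool for HTML; an HTML parser would be more robust.
ANCHOR_RE = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)


def scrub_links(html, keywords=SCRUB_KEYWORDS):
    """Remove every link whose href or text mentions one of the keywords."""
    def replace(match):
        # Lowercase and collapse whitespace so "Manage  your\npreferences" matches.
        anchor = " ".join(match.group(0).lower().split())
        if any(keyword in anchor for keyword in keywords):
            return ""
        return match.group(0)

    return ANCHOR_RE.sub(replace, html)


if __name__ == "__main__":
    issue = ('<p>This week in newsletters...</p>'
             '<a href="https://example.com/post">Read more</a> '
             '<a href="https://example.com/u?id=42">Unsubscribe</a>')
    print(scrub_links(issue))
```

Only the link element is removed. Surrounding text such as "Don't want these emails?" stays behind.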
       | 
       | Long-term I want to build a general-purpose recommender
       | system[2]. I'm starting with newsletters because I think it'll be
       | the easiest way to grow fast initially (which seems to have been
       | validated so far). The short explanation is that I've designed
       | The Sample to be extremely effective at cross-promotion[3]. (If
       | you have a newsletter, submit it at
       | https://sample.findka.com/submit/ and I'll send you a referral
       | link).
       | 
       | [1] https://news.ycombinator.com/item?id=24921127
       | 
       | [2] https://jacobobryant.com/p/why-newsletters/
       | 
       | [3] https://jacobobryant.com/p/an-algorithm-for-driving-newslett...
        
         | desine wrote:
         | Is there any mechanism to prevent you from forming your own
         | super-echo-chamber? Filtering down to only what you and users
         | with similar profiles like seems like it could have problematic
         | outcomes, no?
         | 
         | Interesting, and inevitable, project, but I have my concerns.
        
           | jacobobryant wrote:
           | Yes, the algorithm accounts for the exploration-vs-
           | exploitation problem.[1] Right now it uses a simple epsilon-
           | greedy strategy: 20% of the newsletters you receive are
           | picked completely at random. I also use a technique I call
           | "popularity smoothing," which limits the number of times that
           | popular newsletters can be forwarded.
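This is a minimal sketch of the epsilon-greedy step, with epsilon = 0.2 to match the 20% figure. The comment describes popularity smoothing only as something that "limits the number of times that popular newsletters can be forwarded". The hard per-newsletter cap (`max_forwards`) is therefore an assumption. So is excluding already-sent newsletters from the random picks, although the never-resend rule itself comes from earlier in this thread.

```python
import random


def pick_newsletter(scores, forward_counts, epsilon=0.2, max_forwards=None,
                    already_sent=(), rng=random):
    """Epsilon-greedy choice of the next newsletter to send one user.

    scores:         newsletter -> predicted rating for this user.
    forward_counts: newsletter -> how many times it has been forwarded so far.
    max_forwards:   hypothetical cap standing in for "popularity smoothing".
    already_sent:   newsletters this user already got (never resent).
    """
    eligible = [
        n for n in scores
        if n not in already_sent
        and (max_forwards is None or forward_counts.get(n, 0) < max_forwards)
    ]
    if not eligible:
        return None
    if rng.random() < epsilon:
        return rng.choice(eligible)  # explore: uniform random pick
    return max(eligible, key=lambda n: scores[n])  # exploit: best prediction


if __name__ == "__main__":
    scores = {"a": 4.8, "b": 4.5, "c": 3.0}
    rng = random.Random(42)
    print([pick_newsletter(scores, {}, rng=rng) for _ in range(10)])
```

An exploration pick can land on the top-scored newsletter by chance, so a user sees a non-top pick somewhat less than 20% of the time.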
           | 
           | I think the concerns about ML causing echo chambers/other
           | problems, while not completely unfounded, are overblown
           | (perhaps due to the overall anti-big-tech sentiment). I think
           | human behavior plus ease of sharing online is a much larger
           | factor. I'm optimistic that ML filtering can actually help
           | people get a much _larger_ variety of information, which is
           | one of my goals for this.
           | 
           | [1] https://en.wikipedia.org/wiki/Multi-armed_bandit
        
             | thegginthesky wrote:
             | Thank you for sharing your approach, I find it very
             | interesting. I've worked in some systems that also optimize
             | for "serendipity"[1], which is just optimizing not only for
             | ranking accuracy but also for adding new relevant content
             | into the mix. (I believe you need a lot of data for this to
             | be viable though)
             | 
             | > I think the concerns about ML causing echo chambers/other
             | problems, while not completely unfounded, are overblown
             | 
             | I disagree that it's overblown. This is a widely discussed
             | topic in ML research, especially around recommender systems
             | [0]. While I do agree that ML systems have enormous
             | potential in augmenting human capability, we should be
             | addressing possible flaws such as the one mentioned.
             | 
             | I'm personally interested in how to solve biases in machine
             | learning systems, as it impacts so much of my professional
             | work. But I also think bias, or echo-chamber, isn't unique
             | to ML since we see it so much in the world and institutions
             | that surround us, but we are in a unique position to
             | address these problems directly on the systems we create.
             | 
             | [0] https://arxiv.org/abs/2010.03240
             |
             | [1] https://link.springer.com/article/10.1007/s11390-020-0135-9
        
               | jacobobryant wrote:
               | > I disagree that it's overblown.
               | 
               | Fair enough. To be clear, I'm not saying ML bias isn't a
               | significant problem; and certainly I'll continue to
               | address it as the project grows. Based on this, it sounds
               | like we might be in agreement mostly:
               | 
               | > But I also think bias, or echo-chamber, isn't unique to
               | ML since we see it so much in the world and institutions
               | that surround us, but we are in a unique position to
               | address these problems directly on the systems we create.
               | 
               | I often run into people (usually not ML practitioners)
               | who appear to think that ML _inherently_ results in
               | filter bubbles/bias (as opposed to specific ML
               | implementations) and thus think the entire approach
               | should be abandoned; whereas I think ML, with a proper
               | focus on reducing bias, is one of our most promising
               | options.
        
       | beders wrote:
       | The sample newsletter looked interesting, so I signed up. Thanks
       | for providing a sneak-peek.
        
       | dmje wrote:
       | Interesting, thanks! Have signed up.
        
       ___________________________________________________________________
       (page generated 2021-06-28 23:00 UTC)