[HN Gopher] Monolith: The Recommendation System Behind TikTok
       ___________________________________________________________________
        
       Monolith: The Recommendation System Behind TikTok
        
       Author : tim_sw
       Score  : 84 points
       Date   : 2023-04-14 19:27 UTC (3 hours ago)
        
 (HTM) web link (gantry.io)
 (TXT) w3m dump (gantry.io)
        
       | maxk42 wrote:
       | Following the link to the cuckoo hashing algorithm on Wikipedia, I
       | don't quite understand what it's doing. I looked up another
       | couple articles but still find myself confused. Does anyone have
       | a link to a resource with an easy-to-follow writeup of how cuckoo
       | hashing works?
        
         | bhawks wrote:
         | I found this pretty clear and it includes an interactive
         | visualization:
         | 
         | https://www.lkozma.net/cuckoo_hashing_visualization/
        
         | lelandfe wrote:
         | Recommendation: once you _do_ grok it, see if you can update
         | the intro on the Wikipedia article to help others out :)
        
         | Groxx wrote:
         | Hash collisions happen. What do you do with them?
         | 
         | Chained (aka open hashing) hashtables store pointers rather
         | than values in the "main" array, and just put collisions in a
         | linked list on that hash cell. Easy, but indirections have a
         | cost.
         | 
         | Closed hashing (aka open addressing, yes it's confusing)
         | hashtables take hash(input) and just do it again to get a
         | second location, and put it there. Repeat N times for N
         | hash(hash(hash(...))) collisions. Dense, but needs more complex
         | logic to figure out when to stop looking / what to do when
         | deleting because anything could be at location X due to a
         | collision with something else.
         | 
         | Cuckoo hashtables use two (or more) hash algorithms rather than
         | one, and dedicate a portion of the memory to each algorithm. If
         | something's already in the first algorithm's location, put the
         | thing it's colliding with in that thing's second location. On
         | read, check both locations. Dense, relatively simple for both
         | insert and deletion, and tolerant of a few collisions with low
         | cost.
         | 
         | (cuckoo hashtables are a form of closed hashing / open
         | addressing, because they keep all the data within the data-
         | sized arrays, not storing extra info. and all of these are
         | over-generalizing / there are fairly different-looking
         | strategies available, e.g. it's not necessarily pointers or
         | strictly repeated hashing)
         | 
         | And you could just reject the existence of collisions entirely
         | and move to a new, larger array immediately. That tends to
         | perform so poorly in both cpu and memory that nothing really
         | does it in practice, but it is technically an option.
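         | 
         | A minimal Python sketch of the cuckoo scheme described above.
         | The class name, the `max_kicks` eviction limit, and the
         | grow-by-doubling fallback are illustrative assumptions, not
         | anything taken from the Monolith paper:

```python
class CuckooTable:
    """Toy cuckoo hash table: two sub-tables, one per hash function."""

    def __init__(self, capacity=8, max_kicks=16):
        self.capacity = capacity
        self.max_kicks = max_kicks  # evictions allowed before growing
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        # Salt the built-in hash with the table index to get two functions.
        return hash((which, key)) % self.capacity

    def get(self, key, default=None):
        # A key can only live in one of its two candidate slots.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def delete(self, key):
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = None
                return True
        return False

    def put(self, key, value):
        # Overwrite in place if the key is already stored.
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(which, entry[0])
            # Place the entry and pick up whatever was sitting there.
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return
            which = 1 - which  # evicted entry goes to its other table
        self._grow(entry)  # too many evictions: rebuild with more room

    def _grow(self, pending):
        old = [e for t in self.tables for e in t if e is not None]
        old.append(pending)
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k, v in old:
            self.put(k, v)
```

         | Reads and deletes touch at most two slots, which is where the
         | "relatively simple" part comes from; all the work is pushed
         | into the eviction loop on insert.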
        
         | CBLT wrote:
         | I understood it from this HN comment:
         | https://news.ycombinator.com/item?id=8491456
        
       | chadrs wrote:
       | I must be rare in that the longer I use tiktok, the less relevant
       | the recommendations feel. Maybe because I compulsively watch
       | videos until the end even if I don't like them.
        
         | jabbany wrote:
         | Seems like a common problem of recommenders tbh.
         | 
         | Like binary search, they're really good at finding local optima
         | quickly, and then are rather bad at getting out of them once
         | they get there.
        
         | nickthegreek wrote:
         | This can happen to me if it gets stuck down an avenue that it
         | thought I was interested in. But the next day or even a few
         | hours later, it seems to correct itself.
        
         | thomasahle wrote:
         | They are probably trying to save GPU power on already hooked
         | users. This is a common trick in Recommendation Systems. You
         | want to spend the most resources / run your most expensive
         | model on users that are just checking out your platform.
         | 
         | A bit like how Poker sites give you better cards in the
         | beginning.
        
         | joshu wrote:
         | long-press and select "not interested" and it will figure it
         | out pretty quickly.
        
         | dmix wrote:
         | > Maybe because I compulsively watch videos until the end even
         | if I don't like them.
         | 
         | That would definitely do it, basically destroying their most
         | important signal.
         | 
         | TikTok is best in class for recommending content and I
         | personally haven't seen a dip in quality. Aka I never get trashy
         | videos or anything cringe, just a consistent stream of
         | science/tech, local Toronto restaurant reviews, cat videos, etc
        
         | realfeel78 wrote:
         | Sounds like a you problem? At any rate, they added an option to
         | reset it anytime a few weeks ago:
         | 
         | https://techcrunch.com/2023/03/16/tiktoks-new-feature-lets-y...
        
         | MuffinFlavored wrote:
         | I want to know what portion of the algorithm is responsible
         | for, when you are given a new blank slate user, "tries" certain
         | categories
         | 
         | like, let's present this user travel or cooking material,
         | that's usually safe
         | 
         | then, let's try things like certain genres of music, we'll see
         | what they like/don't like
         | 
         | what i don't get is... how does that _first_ recommendation on
         | the #foryoupage or discover or whatever it's called, start
         | recommending you the sex workers who try to post as close to
         | NSFW material as possible, get you to land on their profile, in
         | their bio is a link to their Instagram or Linktree, and then
         | from there it's an OnlyFans link
         | 
         | does the system try to recommend a soft entry into this content
         | and then just pivot away if the user doesn't like it?
        
           | bluefirebrand wrote:
           | It probably just recommends a bunch of stuff that is popular
           | at the moment for new users.
           | 
           | Or it tries to match you to an existing profile it has from
           | some ad network data or something.
        
             | MuffinFlavored wrote:
             | > It probably just recommends a bunch of stuff that is
             | popular at the moment for new users.
             | 
             | I get that, but I feel like it starts with "known
             | safe/neutral" material like
             | cooking/traveling/photography/whatever
             | 
             | How can it detect "hey, this person might like if we
             | introduce softcore porn into their timeline"? Like, do they
             | have softcore porn identified on a scale and they introduce
             | the really "safe" stuff and then gradually crank it up? Why
             | are they presenting softcore porn at all? The Apple App
             | Store is cool with that ToS-wise?
        
               | bluefirebrand wrote:
               | I think you're overcomplicating it.
               | 
               | It's not trying to start with "safe" stuff, it's not
               | trying to "gently introduce" softcore porn.
               | 
               | It's going "This video got a billion views in the last 30
               | minutes, people must love it, let's keep amplifying it to
               | any account that hasn't explicitly rejected this category
               | of content"
               | 
               | Presumably blank slate accounts are treated as open to
               | anything, until people start curating.
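               | 
               | A rough sketch of that "amplify whatever is hot unless
               | the account rejected the category" rule. The field names
               | (`id`, `category`, `recent_views`) are hypothetical, and
               | nothing here is claimed to be how TikTok works:

```python
def trending_feed(videos, rejected_categories, top_n=3):
    """Rank videos by recent views, skipping explicitly rejected categories."""
    eligible = [v for v in videos if v["category"] not in rejected_categories]
    # Most-viewed first; break ties on id so the order is deterministic.
    eligible.sort(key=lambda v: (-v["recent_views"], v["id"]))
    return [v["id"] for v in eligible[:top_n]]


videos = [
    {"id": "v1", "category": "cooking", "recent_views": 500},
    {"id": "v2", "category": "dance", "recent_views": 9000},
    {"id": "v3", "category": "travel", "recent_views": 1200},
]
blank_slate = trending_feed(videos, set())
curated = trending_feed(videos, {"dance"}, top_n=2)
```

               | A blank-slate account has an empty rejection set, so it
               | sees everything that is trending.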
        
               | MuffinFlavored wrote:
               | during the curation process, how does it start to slowly
               | introduce sex workers? because when I was on TikTok, it
               | was a non-zero amount of the content
        
               | libraryatnight wrote:
               | At the risk of going down a rabbit hole for no real
               | reason, I don't use tiktok but when I speak to those that
               | do I've not yet heard this softcore porn/sex worker
               | thing.
               | 
               | For example, in my mind, not all ASMR content might lead
               | to sexualized recommendations, but a girl in a bikini top
               | with cat ears doing ASMR might generate both
               | recommendations for ASMR and other more cam-girl like
               | content. So I guess my question is, when you're starting
               | off in tiktok seeing cooking videos, do you trend towards
               | ones that feature 'sexier' hosts? They might not be sex
               | workers to you, but they might be making tiktok think
               | you're interested.
               | 
               | Also, what does tiktok know about you to start? What info
               | do you have to give it to start an account?
        
               | MuffinFlavored wrote:
               | so you agree that tiktok is able to classify "cooking
               | videos" and "cooking videos with slightly sexualized
               | hosts"? and that they "willingly" "try to push in
               | recommendations" posts with higher "sexuality" attached
               | content?
        
               | libraryatnight wrote:
               | No, again, my assumption is that the user would trend
               | towards that content. You don't need to push people
               | towards it if you have a nuanced enough profile of each
               | video.
               | 
               | (all things made up for this example)
               | 
               | cookinglady39 does a beach bbq recipe tiktok, in a
               | bathing suit. You watch it. They give you another
               | cookinglady39 video where she's back in the kitchen, you
               | skip it, they give you a new cooking host also female,
               | also dressed in summer attire cooking outside. You watch
               | til the end. It gives you a man cooking outside, you
               | skip. Nothing you've seen so far has been sexual, but
               | tiktok is probably picking up on some trends that might
               | lead them to give you more and more things done by women,
               | then women in a certain setting, dressed a certain way
               | and so on.
        
               | spullara wrote:
               | TikTok gives you the content you enjoy. When someone
               | complains about TikTok content I basically assume they
               | don't understand how good the algorithm is and that you
               | just like that kind of stuff. I don't care whether you do
               | or not but TikTok thinks you do _because of the feedback
               | you are giving the app_. I mean, you clicked their
               | profile and followed their links all the way to onlyfans.
               | They have to assume you like it.
        
               | bluefirebrand wrote:
               | My guess is that there is no "slowly introducing"
               | anything.
               | 
               | It just sees that content made by sex workers is popular
               | and puts it in your feed.
        
               | Bjartr wrote:
               | Assuming we're starting with a blank slate, and a
               | heteronormative male user that would happen to enjoy
               | consuming that content on TikTok:
               | 
               | In the initial set of recommendations based only on
               | overall popularity, there might be a video that's popular
               | that incidentally contains a pretty woman. If the user
               | skips most videos after barely a few seconds, but watches
               | that one fully 3 times through, then the recommendation
               | engine probably looks at users it does know more about
               | that exhibit similar behavior and have higher engagement.
               | It will then recommend videos that those users would
               | probably watch a lot. Now the recommendations are shifted
               | in the direction from "generally popular" to "contains
               | pretty women". You repeat this enough times and the user
               | ends up navigating the space of recommendations until
               | they're maximally engaged (in theory). That means they
               | might end up at softcore porn. Goodness knows that porn
               | is popular if nothing else.
               | 
               | The recommendation engine doesn't even have to know
               | anything about the content of the video. Just know what
               | already high-engagement users that watched that video a
               | lot also watched a lot.
               | 
               | That's at its most basic really, I'm sure there's
               | additional cleverness on top in practice.
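               | 
               | A toy version of that idea, assuming the only input is
               | per-user watch counts. The function name, the `heavy`
               | threshold, and the overlap-based scoring are made up for
               | illustration; note that it never looks at video content:

```python
from collections import Counter


def recommend(watch_counts, user, top_n=3, heavy=2):
    """watch_counts maps user -> {video_id: times watched}."""
    seen = set(watch_counts[user])
    # Videos this user watched repeatedly count as strong signals.
    mine = {v for v, n in watch_counts[user].items() if n >= heavy}
    scores = Counter()
    for other, history in watch_counts.items():
        if other == user:
            continue
        theirs = {v for v, n in history.items() if n >= heavy}
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared heavy watches, so no evidence of similarity
        for v in theirs - seen:
            scores[v] += overlap  # weight by how similar the other user is
    # Highest score first; break ties on id for a deterministic order.
    return sorted(scores, key=lambda v: (-scores[v], v))[:top_n]


watch = {
    "new": {"a": 3, "b": 1},
    "u1": {"a": 4, "c": 5, "d": 1},
    "u2": {"a": 2, "c": 3, "e": 2},
    "u3": {"b": 5, "f": 9},
}
picks = recommend(watch, "new")
```

               | Here "new" rewatched only video "a", so the picks come
               | from what other heavy watchers of "a" also rewatched.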
        
           | Brystephor wrote:
           | TikTok likely has enough information about others that it can
           | begin to build a profile about you from the moment you login.
           | 
           | Let's use a hypothetical scenario: Someone states that they
           | identify as a man, they're in the 20-25 year old age range,
           | and based on phone location you can gather that they live in
           | Texas. Now you're labeled as a 20-25yo Texas Man. Then you
           | can look at others who fall in the "20-25yo Texas Man"
           | category and show things you'd expect that group to like
           | because chances are, you're more similar to others in the
           | group than being a true outlier. If other people in the
           | "20-25yo Texas Man" group have expressed interest in Apples,
           | NSFW material, and lawn mowing videos, then since you're in
           | that group, it's going to start off with that same material.
           | 
           | disclaimer: i've never signed up for tiktok and have no clue
           | if this is how they do it.
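           | 
           | That hypothetical cohort approach could be sketched like
           | this. The profile fields (`gender`, `age_band`, `region`,
           | `liked`) and function names are assumptions for the
           | example, not a known TikTok schema:

```python
from collections import Counter, defaultdict


def cohort_key(profile):
    """Bucket a user by the coarse attributes known at signup."""
    return (profile["gender"], profile["age_band"], profile["region"])


def build_cohort_interests(users):
    """Count how often each category is liked within each cohort."""
    interests = defaultdict(Counter)
    for u in users:
        interests[cohort_key(u)].update(u["liked"])
    return interests


def cold_start(interests, profile, top_n=3):
    """Suggest the cohort's most-liked categories to a brand-new user."""
    counts = interests.get(cohort_key(profile), Counter())
    return sorted(counts, key=lambda c: (-counts[c], c))[:top_n]


users = [
    {"gender": "m", "age_band": "20-25", "region": "TX",
     "liked": ["lawn mowing", "apples"]},
    {"gender": "m", "age_band": "20-25", "region": "TX",
     "liked": ["lawn mowing"]},
    {"gender": "f", "age_band": "30-35", "region": "NY",
     "liked": ["cooking"]},
]
interests = build_cohort_interests(users)
newcomer = {"gender": "m", "age_band": "20-25", "region": "TX"}
suggestions = cold_start(interests, newcomer)
```

           | An unseen cohort simply gets an empty list, at which point
           | a real system would need some other fallback.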
        
           | pimlottc wrote:
           | The classic terminology for this in AI/ML is "explore vs
           | exploit", i.e. striking a balance between trying new things
           | (in hopes of finding a new favorite) vs going back to the
           | tried-and-true.
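           | 
           | One of the simplest explore/exploit strategies is
           | epsilon-greedy. A minimal sketch, with made-up category
           | names and a reward that stands in for something like watch
           | time (both are assumptions for the example):

```python
import random


def epsilon_greedy(avg_reward, epsilon, rng):
    """Pick a category: random with probability epsilon, else the best so far."""
    if rng.random() < epsilon:
        return rng.choice(sorted(avg_reward))    # explore: try anything
    return max(avg_reward, key=avg_reward.get)   # exploit: tried-and-true


def update(avg_reward, counts, category, reward):
    """Fold one observed reward into the running mean for that category."""
    counts[category] += 1
    avg_reward[category] += (reward - avg_reward[category]) / counts[category]


avg = {"cooking": 0.0, "travel": 0.0}
counts = {"cooking": 0, "travel": 0}
update(avg, counts, "travel", 1.0)   # user watched a travel video to the end
update(avg, counts, "travel", 0.0)   # user skipped the next one
rng = random.Random(0)
always_exploit = epsilon_greedy(avg, 0.0, rng)
always_explore = epsilon_greedy(avg, 1.0, rng)
```

           | With epsilon at 0 the picker never leaves its current
           | favorite; raising it trades short-term engagement for the
           | chance of finding something better.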
        
       | cwillu wrote:
       | Referenced paper at https://arxiv.org/pdf/2209.07663.pdf
        
       | SpaceManNabs wrote:
       | Similar article posted here:
       | https://news.ycombinator.com/item?id=34836877
        
         | dang wrote:
         | Thanks! Macroexpanded:
         | 
         |  _The secret sauce of TikTok's recommendations_ -
         | https://news.ycombinator.com/item?id=34836877 - Feb 2023 (138
         | comments)
        
       ___________________________________________________________________
       (page generated 2023-04-14 23:00 UTC)