[HN Gopher] Monolith: The Recommendation System Behind TikTok
___________________________________________________________________
Monolith: The Recommendation System Behind TikTok
Author : tim_sw
Score : 84 points
Date : 2023-04-14 19:27 UTC (3 hours ago)
(HTM) web link (gantry.io)
(TXT) w3m dump (gantry.io)
| maxk42 wrote:
| Following the link to the cuckoo hashing algorithm on Wikipedia, I
| don't quite understand what it's doing. I looked up another
| couple articles but still find myself confused. Does anyone have
| a link to a resource with an easy-to-follow writeup of how cuckoo
| hashing works?
| bhawks wrote:
| I found this pretty clear and it includes an interactive
| visualization:
|
| https://www.lkozma.net/cuckoo_hashing_visualization/
| lelandfe wrote:
| Recommendation: once you _do_ grok it, see if you can update
| the intro on the Wikipedia article to help others out :)
| Groxx wrote:
| Hash collisions happen. What do you do with them?
|
| Chained (aka open hashing) hashtables store pointers rather
| than values in the "main" array, and just put collisions in a
| linked list on that hash cell. Easy, but indirections have a
| cost.
|
| Closed hashing (aka open addressing, yes it's confusing)
| hashtables take hash(input) and just do it again to get a
| second location, and put it there. Repeat N times for N
| hash(hash(hash(...))) collisions. Dense, but needs more complex
| logic to figure out when to stop looking / what to do when
| deleting because anything could be at location X due to a
| collision with something else.
|
| Cuckoo hashtables use two (or more) hash algorithms rather than
| one, and dedicate a portion of the memory to each algorithm. If
| something's already in the first algorithm's location, put the
| thing it's colliding with in that thing's second location. On
| read, check both locations. Dense, relatively simple for both
| insert and deletion, and tolerant of a few collisions with low
| cost.
|
| (cuckoo hashtables are a form of closed hashing / open
| addressing, because they keep all the data within the data-
| sized arrays, not storing extra info. and all of these are
| over-generalizing / there are fairly different-looking
| strategies available, e.g. it's not necessarily pointers or
| strictly repeated hashing)
|
| And you could just reject the existence of collisions entirely
| and move to a new, larger array immediately. That tends to
| perform so poorly in both cpu and memory that nothing really
| does it in practice, but it is technically an option.
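|
| If it helps to see it concretely, here's a rough Python sketch of
| the cuckoo insert/lookup idea (toy hash functions and sizes, not
| how any real implementation does it):
|
|   # Toy cuckoo hash table: two tables, two hash functions. On a
|   # collision, evict the resident entry and re-insert it into its
|   # other table; grow and rehash if evictions chain too long.
|   class CuckooHash:
|       def __init__(self, size=8):
|           self.size = size
|           self.tables = [[None] * size, [None] * size]
|
|       def _hash(self, which, key):
|           # Two cheap, distinct hash functions for the demo.
|           h = hash(key) if which == 0 else hash((key, 0x9E3779B9))
|           return h % self.size
|
|       def get(self, key):
|           # A key can only live in one of its two candidate slots.
|           for which in (0, 1):
|               slot = self.tables[which][self._hash(which, key)]
|               if slot is not None and slot[0] == key:
|                   return slot[1]
|           return None
|
|       def put(self, key, value, max_kicks=32):
|           # Overwrite in place if the key is already stored.
|           for which in (0, 1):
|               idx = self._hash(which, key)
|               slot = self.tables[which][idx]
|               if slot is not None and slot[0] == key:
|                   self.tables[which][idx] = (key, value)
|                   return
|           entry, which = (key, value), 0
|           for _ in range(max_kicks):
|               idx = self._hash(which, entry[0])
|               if self.tables[which][idx] is None:
|                   self.tables[which][idx] = entry
|                   return
|               # Kick the resident out and try its other table.
|               victim = self.tables[which][idx]
|               self.tables[which][idx] = entry
|               entry, which = victim, 1 - which
|           # Too many evictions: grow and rehash everything.
|           self._rebuild(entry)
|
|       def _rebuild(self, pending):
|           old = [e for t in self.tables
|                  for e in t if e is not None]
|           self.size *= 2
|           self.tables = [[None] * self.size, [None] * self.size]
|           for k, v in old + [pending]:
|               self.put(k, v)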
| CBLT wrote:
| I understood it from this HN comment:
| https://news.ycombinator.com/item?id=8491456
| chadrs wrote:
| I must be rare in that the longer I use tiktok, the less relevant
| the recommendations feel. Maybe because I compulsively watch
| videos until the end even if I don't like them.
| jabbany wrote:
| Seems like a common problem of recommenders tbh.
|
| Like hill climbing, they're really good at finding local optima
| quickly, and then are rather bad at getting out of them once
| they get there.
| nickthegreek wrote:
| This can happen to me if it gets stuck down an avenue that it
| thought I was interested in. But the next day or even a few
| hours later, it seems to correct itself.
| thomasahle wrote:
| They are probably trying to save GPU power on already hooked
| users. This is a common trick in Recommendation Systems. You
| want to spend the most resources / run your most expensive
| model on users that are just checking out your platform.
|
| A bit like how Poker sites give you better cards in the
| beginning.
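|
| In code, the routing side of that is trivial (model names and the
| threshold below are made up, just to illustrate the idea):
|
|   def pick_ranker(user, cheap_model, expensive_model,
|                   hooked_after_sessions=20):
|       # "sessions" is an assumed engagement counter; any
|       # retention signal (watch time, days active) works too.
|       if user.get("sessions", 0) < hooked_after_sessions:
|           return expensive_model  # still deciding whether to stay
|       return cheap_model          # already hooked, save the GPUs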
| joshu wrote:
| long-press and select "not interested" and it will figure it
| out pretty quickly.
| dmix wrote:
| > Maybe because I compulsively watch videos until the end even
| if I don't like them.
|
| That would definitely do it, basically destroying their most
| important signal.
|
| TikTok is best in class for recommending content, and I
| personally haven't seen a dip in quality. Aka I never get trashy
| videos or anything cringe, just a consistent stream of
| science/tech, local Toronto restaurant reviews, cat videos, etc
| realfeel78 wrote:
| Sounds like a you problem? At any rate, they added an option to
| reset it anytime a few weeks ago:
|
| https://techcrunch.com/2023/03/16/tiktoks-new-feature-lets-y...
| MuffinFlavored wrote:
| I want to know what portion of the algorithm is responsible for
| "trying" certain categories when it's handed a brand-new,
| blank-slate user
|
| like, let's present this user travel or cooking material,
| that's usually safe
|
| then, let's try things like certain genres of music, we'll see
| what they like/don't like
|
| what I don't get is... how does that _first_ recommendation on
| the #foryoupage or discover or whatever it's called start
| recommending you the sex workers who try to post as close to
| NSFW material as possible and get you to land on their profile,
| where their bio links to their Instagram or Linktree, and from
| there it's an OnlyFans link?
|
| does the system try to recommend a soft entry into this content
| and then just pivot away if the user doesn't like it?
| bluefirebrand wrote:
| It probably just recommends a bunch of stuff that is popular
| at the moment for new users.
|
| Or it tries to match you to an existing profile it has from
| some ad network data or something.
| MuffinFlavored wrote:
| > It probably just recommends a bunch of stuff that is
| popular at the moment for new users.
|
| I get that, but I feel like it starts with "known
| safe/neutral" material like
| cooking/traveling/photography/whatever
|
| How can it detect "hey, this person might like it if we
| introduce softcore porn into their timeline"? Like, do they
| have softcore porn identified on a scale and they introduce
| the really "safe" stuff and then gradually crank it up? Why
| are they presenting softcore porn at all? The Apple App
| Store is cool with that ToC wise?
| bluefirebrand wrote:
| I think you're overcomplicating it.
|
| It's not trying to start with "safe" stuff, it's not
| trying to "gently introduce" softcore porn.
|
| It's going "This video got a billion views in the last 30
| minutes, people must love it, let's keep amplifying it to
| any account that hasn't explicitly rejected this category
| of content"
|
| Presumably blank slate accounts are treated as open to
| anything, until people start curating.
| MuffinFlavored wrote:
| During the curation process, how does it start to slowly
| introduce sex workers? Because when I was on TikTok, that was
| a non-zero amount of the content.
| libraryatnight wrote:
| At the risk of going down a rabbit hole for no real reason: I
| don't use tiktok, but when I speak to those who do, I've not yet
| heard about this softcore porn / sex worker thing.
|
| For example, in my mind, ASMR content on its own might not lead
| to sexualized recommendations, but a girl in a bikini top
| with cat ears doing ASMR might generate recommendations both
| for ASMR and for more cam-girl-like
| content. So I guess my question is, when you're starting
| off in tiktok seeing cooking videos, do you trend towards
| ones that feature 'sexier' hosts? They might not be sex
| workers to you, but they might be making tiktok think
| you're interested.
|
| Also, what does tiktok know about you to start? What info
| do you have to give it to start an account?
| MuffinFlavored wrote:
| so you agree that tiktok is able to classify "cooking
| videos" and "cooking videos with slightly sexualized
| hosts"? and that they "willingly" "try to push in
| recommendations" posts with higher "sexuality" attached
| content?
| libraryatnight wrote:
| No, again, my assumption is that the user would trend
| towards that content. You don't need to push people
| towards it if you have a nuanced enough profile of each
| video.
|
| (all things made up for this example)
|
| cookinglady39 does a beach bbq recipe tiktok, in a
| bathing suit. You watch it. They give you another
| cookinglady39 video where she's back in the kitchen, you
| skip it, they give you a new cooking host also female,
| also dressed in summer attire cooking outside. You watch
| til the end. It gives you a man cooking outside, you
| skip. Nothing you've seen so far has been sexual, but
| tiktok is probably picking up on some trends that might
| lead them to give you more and more things done by women,
| then women in a certain setting, dressed a certain way
| and so on.
| spullara wrote:
| TikTok gives you the content you enjoy. When someone
| complains about TikTok content I basically assume they
| don't understand how good the algorithm is and that you
| just like that kind of stuff. I don't care whether you do
| or not but TikTok thinks you do _because of the feedback
| you are giving the app_. I mean, you clicked their
| profile and followed their links all the way to onlyfans.
| They have to assume you like it.
| bluefirebrand wrote:
| My guess is that there is no "slowly introducing"
| anything.
|
| It just sees that content made by sex workers is popular
| and puts it in your feed.
| Bjartr wrote:
| Assuming we're starting with a blank slate, and a
| heteronormative male user that would happen to enjoy
| consuming that content on TikTok:
|
| In the initial set of recommendations based only on
| overall popularity, there might be a video that's popular
| that incidentally contains a pretty woman. If the user
| skips most videos after barely a few seconds, but watches
| that one fully 3 times through, then the recommendation
| engine probably looks at users it does know more about
| that exhibit similar behavior and have higher engagement.
| It will then recommend videos that those users would
| probably watch a lot. Now the recommendations are shifted
| in the direction from "generally popular" to "contains
| pretty women". You repeat this enough times and the user
| ends up navigating the space of recommendations until
| they're maximally engaged (in theory). That means they
| might end up at softcore porn. Goodness knows that porn
| is popular if nothing else.
|
| The recommendation engine doesn't even have to know
| anything about the content of the video. Just know what
| already high-engagement users that watched that video a
| lot also watched a lot.
|
| That's it at its most basic, really; I'm sure there's
| additional cleverness on top in practice.
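|
| A toy version of that "people who finished this also finished
| that" logic (all users, video names, and watch data invented
| here), just to show that no understanding of the content itself
| is needed:
|
|   from collections import Counter, defaultdict
|
|   # user -> videos they watched to completion (made-up log)
|   watch_log = {
|       "u1": ["bbq_recipe", "beach_vlog", "fitness_girl"],
|       "u2": ["bbq_recipe", "fitness_girl", "cam_adjacent"],
|       "u3": ["bbq_recipe", "woodworking", "cat_video"],
|   }
|
|   # Co-watch counts: which videos get finished by the same people.
|   co_watch = defaultdict(Counter)
|   for videos in watch_log.values():
|       for v in videos:
|           for other in videos:
|               if other != v:
|                   co_watch[v][other] += 1
|
|   def recommend(finished, k=3):
|       # Score candidates by co-occurrence with what this user
|       # already finished; never look at the videos themselves.
|       scores = Counter()
|       for v in finished:
|           scores.update(co_watch[v])
|       for v in finished:
|           scores.pop(v, None)
|       return [video for video, _ in scores.most_common(k)]
|
|   # A new user who only finished the bbq video drifts toward
|   # whatever other bbq-finishers engaged with.
|   print(recommend(["bbq_recipe"]))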
| Brystephor wrote:
| TikTok likely has enough information about other users that it
| can begin to build a profile of you from the moment you log in.
|
| Let's use a hypothetical scenario: Someone states that they
| identify as a man, they're in the 20-25 year old age range,
| and based on phone location you can gather that they live in
| Texas. Now you're labeled as a 20-25yo Texas Man. Then you
| can look at others who fall in the "20-25yo Texas Man"
| category and show things you'd expect that group to like
| because chances are, you're more similar to others in the
| group than being a true outlier. If other people in the
| "20-25yo Texas Man" group have expressed interest in Apples,
| NSFW material, and lawn mowing videos, then since you're in
| that group, it's going to start off with that same material.
|
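| As a lookup table, that idea is tiny (all buckets and content
| labels below are invented for the example):
|
|   # Toy cold start: bucket the new user by coarse attributes and
|   # seed their feed with what that bucket already engages with.
|   cohort_top_content = {
|       ("man", "20-25", "TX"):
|           ["apples", "nsfw_adjacent", "lawn_mowing"],
|       ("woman", "30-35", "NY"):
|           ["pottery", "city_walks", "dog_videos"],
|   }
|
|   def cold_start_feed(gender, age_range, state):
|       return cohort_top_content.get((gender, age_range, state),
|                                     ["whatever_is_trending"])
|
|   print(cold_start_feed("man", "20-25", "TX"))
|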
| Disclaimer: I've never signed up for tiktok and have no clue
| if this is how they do it.
| pimlottc wrote:
| The classic terminology for this in AI/ML is "explore vs
| exploit", i.e. striking a balance between trying new things
| (in hopes of finding a new favorite) vs going back to the
| tried-and-true.
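|
| A textbook toy version of that trade-off is an epsilon-greedy
| bandit (the categories and the "watched to the end" reward
| signal are invented here):
|
|   import random
|
|   # Mostly show the best-performing category so far (exploit),
|   # but some fraction of the time try a random one (explore).
|   categories = ["cooking", "travel", "music", "sports"]
|   plays = {c: 0 for c in categories}
|   wins = {c: 0 for c in categories}   # e.g. "watched to the end"
|
|   def pick_category(epsilon=0.1):
|       if random.random() < epsilon:
|           return random.choice(categories)   # explore
|       # Untried categories score 1.0 so they get a chance early.
|       return max(
|           categories,
|           key=lambda c: wins[c] / plays[c] if plays[c] else 1.0,
|       )
|
|   def record_feedback(category, watched_to_end):
|       plays[category] += 1
|       wins[category] += int(watched_to_end)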
| cwillu wrote:
| Referenced paper at https://arxiv.org/pdf/2209.07663.pdf
| SpaceManNabs wrote:
| Similar article posted here:
| https://news.ycombinator.com/item?id=34836877
| dang wrote:
| Thanks! Macroexpanded:
|
| _The secret sauce of TikTok's recommendations_ -
| https://news.ycombinator.com/item?id=34836877 - Feb 2023 (138
| comments)
___________________________________________________________________
(page generated 2023-04-14 23:00 UTC)