[HN Gopher] Scrambling eggs for Spotify with Knuth's Fibonacci h...
___________________________________________________________________
Scrambling eggs for Spotify with Knuth's Fibonacci hashing
Author : pncnmnp
Score : 99 points
Date : 2023-12-09 14:00 UTC (9 hours ago)
(HTM) web link (pncnmnp.github.io)
(TXT) w3m dump (pncnmnp.github.io)
| justsayinginnt wrote:
| Shame we can't choose the type of shuffle, based on your
| mood/what you're listening too (not to add even more complexity).
|
| e.g. For classical music I'd prefer stringing together pieces
| from the same orchestra/composer. But for some contemporary music
| would like mix the artist/album up more.
| morkalork wrote:
| I'd like one that on random and uses each song as an experiment
| to determine your mood. If you listen through that's positive,
| instantly skip? That's a negative signal. Just adapt on the fly
| at the beginning of each listing session.
| passion__desire wrote:
| Based on your comment, I would love to see a feature just as
| we have a prompt travel in image generations, how about genre
| travel? A playlist of 10 songs taking me from rock to french
| house.
| esafak wrote:
| The song radio does that. Really well in my experience.
| passion__desire wrote:
| I am not sure if the situation I describe is currently in
| production in Instagram.
|
| But here's what I noticed when I went to the explore section of
| Instagram.
|
| At display, there would be distinct choices of images and
| reels, varied and related to my interests. But if I select a
| particular reel/image type (e.g. animation or comics or 3d
| render), it would take that as a signal and would expand the
| feed based on that selection. I love that feature.
|
| I guess Spotify Radio do that to some extent, not sure.
| j4yav wrote:
| I have a playlist with 150 hours of music on it and it seems to
| play the same five or six songs every day, no matter what. I wish
| I could choose "actually random", I wouldn't mind unexpected
| clustering.
| Solvency wrote:
| It's absolutely wild. I have over 50 playlists I've made, some
| over 10 hours long with no repetitions anywhere. I have
| extremely thorough and diverse music interests and habits d
|
| And yet, when Spotify's shit tier algorithm takes over, it
| kicks me into a similar 5 to 6 meme set of songs every single
| time. It's an absolute joke.
| mrkeen wrote:
| Out of curiosity, does it happen on some clients and not
| others?
|
| I remember hearing of a bug where if you played on a remote
| device, it would transfer the first part of your playlist (10?
| 100 tracks?), and then shuffle would only choose from among
| them.
|
| But it's been 5+ years so things may have changed and/or I
| could be remembering completely wrong.
| hifreq wrote:
| That might be true, but my impression is that the algorithm
| is weighted, so some artists (more popular in general,
| recommended, played more often by that individual user, would
| get picked more frequently by the "random" algorithm).
| varispeed wrote:
| I wonder if this is market manipulation and not a honest
| mistake. To make certain labels and artists more money.
| crazygringo wrote:
| Same. Out of 5,000 songs, am I really supposed to be hearing
| certain songs 3+ days in a row, when I'm only listening to 20
| songs a day...?
|
| I can't tell if it's:
|
| - Certain artists are paying Spotify to favor them in
| randomization?
|
| - There's some kind of shared random seed across devices that
| results in picking a tiny subset of songs to randomize from in
| the first place?
|
| - Other?
|
| I do notice that the effect seems to persist for maybe a week,
| then I'll never hear those songs again, but now it'll be
| different songs that keep popping up repeatedly.
|
| There's a related effect when you launch a radio station based
| on an artist or track. If you launch it multiple times in the
| same day or week, you get the exact same list of tracks. But
| maybe a week later the tracks have changed, like the radio has
| been recalculated based on a different random seed.
| candiddevmike wrote:
| This article is pretty insightful on the ways artists can
| improve their presence on Spotify, including images of the
| backend tools available to artists:
| https://blog.groover.co/en/tips/spotify-streams/
| hifreq wrote:
| I was excited to get an insight into some hidden backend
| tools but it seems like this post is just about playlists
| (creating your own playlists, paying for placements in
| other people's playlists), and ads?
| cvoss wrote:
| There's maybe some counterintuitive math going on here.
|
| If you have 100 songs and listen to 1 song per day (which is
| 1% of the library), on any given day your odds of hearing the
| same song as yesterday are 1 in a 100.
|
| If you have 1000 songs and listen to 10 per day (still 1%),
| the odds of hearing a song that was also played yesterday are
| a little less than 1 in 10, right?
|
| So what matters is not only what fraction of your library's
| play time you sample daily, but also how finely subdivided
| the time is into individual tracks for sampling.
| brasetvik wrote:
| A bit of birthday paradox too? :)
| crazygringo wrote:
| > _If you have 1000 songs and listen to 10 per day (still
| 1%), the odds of hearing a song that was also played
| yesterday are a little less than 1 in 10, right?_
|
| No. It's 10*(1/1000)=1%.
|
| It'll happen a few times a year only.
| esafak wrote:
| It might think you really like those songs. I listen to my
| favorites every day, among others.
| hifreq wrote:
| As far as I can tell all Spotify's playlists (in all forms,
| e.g. including "random" and "track radio") use weighted
| algorithms based on several factors like user's play history,
| Spotify's recommendations (probably includes paid promotions),
| general artist's popularity, etc.
|
| When I still used Spotify, I would get a dozen of my favorite
| artists mixed into basically any "playlist" I pick. Was one of
| the reasons I quit Spotify - they are too opinionated on what I
| should listen to.
| at_a_remove wrote:
| Eh, unexpected clustering is sort of okay but again, it's what
| people respond to versus what they think they will. I've
| written some scripts so I can dredge playlists off of some
| radio stations that are hooked up to the Internet as part of a
| quixotic little project of mine and I've gone through, looked
| for dupes, etc. Let's say we're doing new wave (sure to start
| an argument there). What people seem to dislike is ABC, The
| Buggles, the Cure, Duran Duran, Ebn Ozn, Fra Lippo Lippi, Duran
| Duran, etc. Just having a gap between there feels
| insufficiently random.
|
| Clustering apparently ought to feel deliberate. Now think back
| to when you had actual DJs selecting tracks on the radio. One
| of the techniques was "Two from a particular band." Not two
| from a band with some tracks between them.
|
| Similarly, one can do a "Four tracks from 1994" to provide a
| cluster in time, another technique I've heard.
|
| If anything, the more metadata you have, the more you can
| provide short runs of _something_. Microgenres, for example.
| fetzu wrote:
| Spotify's shuffling is so god-tier bad that I would be
| flabbergasted if it wasn't biased towards songs with less
| royalties to pay out..
| crazygringo wrote:
| My _biggest_ desire for Spotify would be the ability to randomize
| a playlist but _give massive priority to your least-listened
| songs_.
|
| So if you have a playlist with 10 songs A-J that you've listened
| to 10 times each. And then you add 10 new songs K-T that you've
| listened to 1 time each... Then _every time_ I shuffle the
| playlist, I want songs K-T to be the _first 10 songs in random
| order_ until I 've played _them_ 10 times each.
|
| I mean, things can be mixed up a little more than that... but
| generally speaking, I want to listen to my least-listened songs
| much more than the ones I've been listening to forever. But I
| don't want to have to separate them out into special playlists
| "newest", "newer", "kinda new", "old", "oldest" which is
| annoying.
| bazzargh wrote:
| The fibonacci hashing mentioned here looks to be just Kronecker
| low-discrepancy sequences https://en.wikipedia.org/wiki/Low-
| discrepancy_sequence#Addit... which is used for dithering too
| (see eg https://extremelearning.com.au/unreasonable-
| effectiveness-of...). It's quite likely already what Spotify were
| obliquely referring to. But I think the author missed a trick:
| tracks by an artist occur close together if your collection is
| already arranged by artist/album. Picking tracks far apart from
| the globally sorted list is what this sequence will do, and do
| you don't need to do anything at all to split by
| artist/album/category etc:
|
| 1. for the Kth track in a sorted collection of N with seed S in
| [0,1), pick track number floor(N((K(phi-1)+S)%1)).
|
| 2. There is no step 2.
|
| Since Spotify suggest their algorithm is only a couple of lines,
| I'd guess this is what they did.
|
| edited to add: the above would get pretty boring because songs
| would always follow one another in sequence, no matter what the
| seed was. But since the chance of picking an irrational number at
| random in a real interval is a near certainty (because rationals
| are countable and reals are uncountable) you can just pick any
| new random number as the stepsize in the sequence when you start
| to shuffle play and it should be good enough; picking in [.25,
| .75) avoids steps that take you back too close to the same
| artist.
| pncnmnp wrote:
| This is fantastic! I still need to carefully study it all, but
| I implemented the idea you described in the original comment.
| It seems the error lies somewhere between the algorithm I
| mentioned and the Fisher-Yates shuffle, based on one million
| playlists: Mean: 0.13, Median: 0.11, Mode: 0, p25: 0.06, p75:
| 0.18, p90: 0.26, p95: 0.31.
|
| It's possible I'm missing something here. Regarding your edit
| about 'picking any new random number as the stepsize,' wouldn't
| it be affected by a 'bad break,' as mentioned by Knuth? I still
| need to work out the 'bad break' proof.
|
| Edit: If it helps, here is the code I used to test it out -
| https://gist.github.com/pncnmnp/8afb7903f61ec69a157287435a63...
| bazzargh wrote:
| I don't have Knuth to hand...but yeah it is possible you can
| accidentally pick a number that is too close to a rational,
| so you get short repeats of tracks instead of long ones. What
| you're after is a number with a high approximation
| coefficient https://en.wikipedia.org/wiki/Liouville_number#Ir
| rationality... but in practice you just need to know that it
| is irrational "enough", eg you might want to avoid repeats in
| the first 1000 plays.
|
| You can get that by rejection sampling on the random number
| and using the Farey sequence to find nearby low-denominator
| fractions
| https://en.wikipedia.org/wiki/Farey_sequence#Applications -
| if I pick a number between 0 and 1 I can use 0/1, 1/1 as the
| starting point for the usual iteration. (and then scale to
| .25,.75 at the end). You pick your approximation bound mu and
| reject if log(abs(p/q - x)) > - mu log q, repeat the farey
| iteration until q is large enough. (just making this up as I
| go on an ipad, I may have a sign wrong in there or whatever)
|
| It is actually much simpler than this ^ to just explicitly
| check the first 1000 numbers for a loop. It's simpler than
| tortoise and hare: you know there is no run-in, so the first
| number is in the loop, and you want that number to not
| reappear in the first min(N,1000) items.
| HelloNurse wrote:
| The problem is separating tracks in the same group. If the
| globally sorted reference list of N songs has M consecutive
| entries that we want to spread evenly, they should be
| "shuffled" at increments of floor(N/M) or 1+floor(N/M): easy to
| guarantee with tortured recursive constructions like the one
| that opens the article, less obvious with a hash function that
| is affected by floating point approximations.
|
| An integer formula that is more obvious to work: starting from
| any number between the maximum group size and (if larger)
| N/phi, pick an increment D as the next larger or smaller
| integer that has no common factors with N (to ensure full
| period) and map index K to index (S+K*D)%N.
| galdosdi wrote:
| This entire HN thread reads like some fiction made up by a free
| software activist 15 years ago, as a dystopian cautionary tale.
| It's all complaints and speculations about stuff that just worked
| back then.
|
| Back then we had our music locally and we chose our own players,
| of which there were many and easy to make another one. Actually,
| hacking on music playback was easy and not uncommon. We had full
| control of our musical lives.
| Tao3300 wrote:
| _Scrambx0red 3ggs_ by Cory Doctorow (2002)
| pncnmnp wrote:
| Tangentially, I encourage everyone to check out Ken Thompson's
| talk on his jukebox:
| https://www.youtube.com/watch?v=kaandEt_pKw
| jorjordandan wrote:
| I was a tile installer in a previous life, and occasionally a
| customer would make the dreaded request to have an accent tile
| placed 'randomly' throughout a backsplash. They were never happy
| with the placement, and clearly had no understanding of
| randomness. Generally what they actually wanted was 'evenly
| distributed'. I remember one particular customer kept changing
| the accent tiles and moving them until they converged upon a very
| specific pattern throughout the back splash except for one tile
| that was out of the pattern, that they kept 'wrong' out of some
| ego driven desire to justify their request for randomness.
| plagiarist wrote:
| The perfect pattern with one wrong tile is so awful. It's a
| fair punishment that they have to live with that.
|
| I want those non-repeating pattern tiles, how awful would those
| be to tile?
| world2vec wrote:
| Slightly unrelated but just learned about Hacker News pool. Very
| interesting.
| jdontillman wrote:
| There was a delightful Usenet post way back around 1990, where
| someone described how they had just purchased a new multi-CD
| player. They very excitedly filled it up with their collection of
| Prince CDs, and set it to random shuffle play mode.
|
| Great for a while, but then they complained that all the slow
| songs were bunched together. And perhaps the random shuffle play
| mode was sampling the songs, deriving the tempo of each, and
| adjusting the shuffle accordingly.
|
| Very funny.
|
| ---
|
| Heh-heh, I independently came up with Fibonacci hashing for color
| many years ago.
|
| My web app was drawing a diagram of N rectangular items, color-
| coded to tell them apart, with a table listing the details of
| each below.
|
| (Normally I would use EIA standard colors, with a nod to my EE
| brethren.)
|
| But I didn't want the colors to bias anything. So you'd normally
| try random colors. But random colors can come out weird and some
| can be close together.
|
| So I used a Golden Angle around the hue circle, with a constant
| brightness and saturation. And sure enough, the generated colors
| were nicely differentiated.
|
| BUT... not as nice as I'd like. Something was wrong.
|
| It turns out that our perception of color is more complex. And
| when we're differentiating between colors, it really, really
| helps if the colors are familiar, and describable.
|
| So simple colors like blue and purple are much easier to
| differentiate than a new weird blueish color and a new weird
| purplish color.
|
| So my Golden Angle colors were technically superior, but not as
| good a user experience.
| brcmthrowaway wrote:
| What did the ipod shuffle use?
| pncnmnp wrote:
| Hi, author here! Interesting question!
|
| Martin Fiedler briefly addresses this topic in a comment on his
| blog post about shuffling algorithms
| (https://keyj.emphy.de/balanced-shuffle/)
|
| > Apple has a so-called "smart shuffle" algorithm, but this
| merely puts higher-rated tracks in front of lower-rated ones.
| So basically, it's just random shuffle, followed by a sort-by-
| rating operation. I don't know of any product (software or
| hardware) that uses some kind of smart, balanced or whatever-
| you-like-to-call-it shuffle based on the principles I described
| in the text.
|
| I'm not sure what they are using today.
___________________________________________________________________
(page generated 2023-12-09 23:00 UTC)