[HN Gopher] Show HN: I made TV Sort, a web-based game for rankin...
       ___________________________________________________________________
        
       Show HN: I made TV Sort, a web-based game for ranking TV show
       episodes
        
       Over this Christmas break, while discussing the best episodes of
       Frasier with my mother (as we tend to do when I get to see her), I
       thought about coming up with something that's less arbitrary than
       1-10 ratings.  The result is TV Sort. It just uses a sorting
       algorithm, but... it's human powered. When the algorithm needs to
       compare two items, it asks you to compare them, and with that you
       end up with a full, thoroughly sorted episode list.  It uses TMDB,
       IMDB, and Wikipedia to extract episode information for any show, to
       help jog your memory when making episode comparisons.  It was a fun
       little experiment. And finally, I know -exactly- what I think the
       best and worst episodes are.[0]  Would love to hear your feedback,
       this is my first Show HN. ;)  Edit: I wrote a whole blog post about
       what went into making it, if anyone wants to read more of the
       technical detail behind it.[1]  [0]:
       https://tvsort.com/show/3452/matrix_01hjtxz2e1ewkrh44ja3mz0s...
       [1]: https://pocketarc.com/posts/tv-sort-engineering-the-
       ultimate...
        
       Author : pocketarc
       Score  : 56 points
       Date   : 2024-01-01 13:14 UTC (9 hours ago)
        
 (HTM) web link (tvsort.com)
 (TXT) w3m dump (tvsort.com)
        
       | boomboomsubban wrote:
       | I've tried with two different shows, both times the first
       | selection starts at S01E01, the second selection was I'd guess a
       | high rated episode from a later series.
       | 
       | If I say selection one is better, selection one becomes the next
       | episode, S01E02, while selection two stays the same episode. If I
       | say either or selection two is better, selection one is picked at
       | random and selection two stays the same episode. And then if I go
       | back to selection one being better, selection one moves back
       | to the next episode, ie S01E03 now.
       | 
       | Is that the intended behavior? As I quickly got bored of saying
       | series one was better than this one episode in series three.
        
         | aquova wrote:
         | This is my one complaint as well. I really like the idea of
         | this website, and it seems to be made well! I tried out Star
         | Trek TNG, and regardless of which I picked, I was always
         | comparing against the same episode on the right side (Season 4
         | Episode 15, which I think is the exact middle episode of its
         | run?). I know all the comparisons must be made eventually, but
         | it would be nice to swap different episodes in so that I'm not
         | comparing every single episode against one at a time.
        
           | boomboomsubban wrote:
           | You're right, it is the exact middle episode. I picked shows
           | with variable series lengths, so it was hard to tell.
        
         | pocketarc wrote:
         | > Is that the intended behavior?
         | 
         | Yeah, I'm afraid so. At the start, selection two would always
         | be the same as the algorithm doesn't know anything about the
         | standing of any episode; it's trying to decide where in the
         | array to place it (above or below selection two).
         | 
         | It might be worth seeing if I can randomise the episodes
         | displayed, if only so it doesn't feel so repetitive.
        
           | bravura wrote:
           | Make it slightly fancier and assign a "similarity" between
           | users.
           | 
           | Start with uniform similarity, and as preferences are made,
           | adjust them.
           | 
           | Then you have personalization.
        
             | pocketarc wrote:
             | I'm not sure what you mean by this - are you saying that
             | the episodes that get displayed would be based on what
             | other users would likely have picked for the same show?
        
               | bravura wrote:
               | I'm saying that instead of one objective ordering, you
               | have a subjective per user ordering.
               | 
               | The decisions made by other users get a weight assigned
               | to them, which is individual to each logged in user. So
               | every user's viewpoint is personalized.
               | 
               | (Apologies for the short explanation, I'm on mobile. If
               | you want to ask more you can email me.)
        
           | boomboomsubban wrote:
           | Why does selection one get randomized if you say selection
           | two is better, but then jump back to the original sequential
           | if you say selection one is better again? That behavior felt
           | bizarre.
        
             | pocketarc wrote:
             | Selection one jumps between the start and the end of the
             | show repeatedly until you make it to the middle (selection
             | two), after which point it'll move on to getting you to
             | rank the best episodes, and then the worst episodes.
             | 
             | Definitely looking into seeing if I can come up with
             | something better though! The problem is making sure that
             | whatever algorithm is picked remains as close to O(n log n)
               | as possible. Randomising options in a way that makes it
               | require a lot more comparisons would be far worse.
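
For a rough sense of the gap described above, the sketch below compares the information-theoretic minimum number of questions for any comparison sort, ceil(log2(n!)), with asking about every distinct pair once. The function name is made up for this example, and 180 is only an illustrative episode count.

```python
import math


def comparison_budget(n: int) -> dict[str, int]:
    """Rough question counts for ranking n items by pairwise comparison."""
    # Lower bound for any comparison sort: ceil(log2(n!)).
    # lgamma(n + 1) is ln(n!), so dividing by ln(2) converts to log base 2.
    lower_bound = math.ceil(math.lgamma(n + 1) / math.log(2))
    # Asking about every unordered pair exactly once: n choose 2.
    all_pairs = n * (n - 1) // 2
    return {"lower_bound": lower_bound, "all_pairs": all_pairs}


budget = comparison_budget(180)  # 180 is an illustrative episode count
print(budget)
```

The exhaustive count grows quadratically, which is why a randomised scheme that drifts toward "every pair" gets expensive quickly on long-running shows.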
        
               | ysavir wrote:
               | > Randomising options in a way that makes it require a lot
               | more comparisons would be far worse.
               | 
               | For the algorithm, but not for the people taking time to
               | do the rankings. Which do you want to prioritize?
        
               | pocketarc wrote:
               | I was prioritising for taking less time total, but you're
               | right, that doesn't matter if the person gets bored and
               | leaves. I'm tinkering with it now and I think I have a
               | good solution to the problem. I'll be deploying it soon!
        
               | gamerDude wrote:
               | Maybe you can pull in episode rankings from another site
               | to seed a first ranking and then make an algorithm to
               | find where you disagree with the norm.
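
One way to read the seeding idea is as an insertion sort that starts from the external order, so the user is mostly asked to confirm neighbours and only does extra work where they disagree. This is a sketch under that assumption; `refine_seeded_order` and `prefers` are hypothetical names, and nothing here comes from TV Sort itself.

```python
def refine_seeded_order(seed_order, prefers):
    """Insertion sort over a best-first seed order.

    prefers(a, b) returns True when the user thinks a is better than b.
    The number of questions is at most (n - 1) plus the number of
    inversions between the seed order and the user's own order.
    """
    order = list(seed_order)
    for i in range(1, len(order)):
        j = i
        # Bubble the item upward while the user prefers it to its neighbour.
        while j > 0 and prefers(order[j], order[j - 1]):
            order[j], order[j - 1] = order[j - 1], order[j]
            j -= 1
    return order


# Simulated user whose taste differs from the seed in one place.
true_rank = {"E2": 0, "E1": 1, "E3": 2, "E4": 3}
questions = []


def simulated_user(a, b):
    questions.append((a, b))
    return true_rank[a] < true_rank[b]


personal = refine_seeded_order(["E1", "E2", "E3", "E4"], simulated_user)
```

If the seed is far from the user's taste this degrades toward a quadratic number of questions, so it only pays off when the seed is a decent guess.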
        
           | bena wrote:
           | I've done something similar.
           | 
           | You're essentially doing A/B comparisons across the entire
           | set.
           | 
           | It looks like you have it so you're basically setting where
           | "B" is before moving on to the next item.
           | 
           | This isn't strictly necessary. You could just generate a
           | novel pair every time and ask the user to choose between
           | them. The thing is that you'd need a way to track a user. So
           | you can make sure that user hasn't seen a certain pair
           | already.
           | 
           | Once you've exhausted all the pairs, you'll know exactly how
           | to sort the array. You'll have an idea before.
           | 
           | You might have an issue with circular lists though. People
           | are fickle. You could have someone who says that A > B > C >
           | A.
           | 
           | In this case, I'd allow for repeat pairings after a certain
           | amount of time. To allow the person to reevaluate
           | essentially.
           | 
           | You could also take the comparisons across all users and
           | compile a general sort of "best of" ranking.
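
The circular case (A > B > C > A) can be detected by treating answers as a directed graph and looking for a cycle. The sketch below is one minimal way to do that with a depth-first search; the `wins` mapping and the function name are assumptions for illustration.

```python
def find_cycle(wins):
    """Return one preference cycle as a list of items, or None.

    wins maps each item to the set of items the user ranked below it.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for loser in wins.get(node, ()):
            state = colour.get(loser, WHITE)
            if state == GREY:
                # Reached an item already on the current path: a cycle.
                return path[path.index(loser):] + [loser]
            if state == WHITE:
                found = visit(loser)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(wins):
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


fickle = {"A": {"B"}, "B": {"C"}, "C": {"A"}}
consistent = {"A": {"B", "C"}, "B": {"C"}}
cycle = find_cycle(fickle)
```

Any pair on a detected cycle is a natural candidate for the repeat pairing suggested above.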
        
             | pocketarc wrote:
             | > The thing is that you'd need a way to track a user.
             | 
             | The page you're on already knows what comparisons you've
             | made (otherwise how would it move forward), so this is
             | entirely possible!
             | 
             | I've come up with a way to randomise it (by just picking a
             | random element from an array of comparisons that haven't
             | yet been made), that's the next step. I deployed it
             | earlier, but there was a bug with it so I've had to
             | rollback until I can look into it.
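
A minimal sketch of "pick a random comparison that hasn't been made yet", assuming comparisons are tracked as unordered pairs. This is not the site's code, and the helper names are hypothetical. Note that it enumerates every pair, so finishing the list takes n(n-1)/2 questions rather than roughly n log n.

```python
import itertools
import random


def unseen_pairs(items, seen):
    """Return every unordered pair of items that has not been shown yet."""
    return [
        pair
        for pair in itertools.combinations(items, 2)
        if frozenset(pair) not in seen
    ]


def next_pair(items, seen, rng=random):
    """Pick a random unseen pair and mark it as shown; None when exhausted."""
    remaining = unseen_pairs(items, seen)
    if not remaining:
        return None
    pair = rng.choice(remaining)
    seen.add(frozenset(pair))
    return pair


episodes = ["S01E01", "S01E02", "S01E03", "S01E04"]
seen = set()
asked = []
while (pair := next_pair(episodes, seen)) is not None:
    asked.append(pair)
```

The pair is marked as seen when it is offered, not when it is answered, so a skipped question would need separate handling.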
        
         | bfdm wrote:
         | Had this exact same experience. Picked the Simpsons just to try
         | it out and in the first pair up was S1 stuff vs some episode
         | from S18 I've never seen.
         | 
         | So I picked The S1 episode. Then it was another S1 episode.
         | Repeat. Then an S2 episode.
         | 
         | My second option never changed from that first episode from S18
         | so I never picked it. Perhaps I could have gone through 12-17
         | seasons of episodes always picking option one until two
         | episodes I've never seen went head to head.
         | 
         | IMO, both options need to randomize each time until first
         | pairings are exhausted.
        
       | kyriakos wrote:
       | It keeps giving me the same episode vs a different episode every
       | time. Feels like I'm manually doing bubble sort :)
       | 
       | Maybe randomising the selection would keep me going longer.
        
         | FazJaxton wrote:
         | I agree. I think using the transitive property to place
         | episodes relative to others would help a lot. Also something
         | like "pick your favorite of these 3-5" might go faster and make
         | it feel more fun
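
Using transitivity to skip questions could look roughly like this: before showing a pair, check whether either ordering already follows from earlier answers. The `wins` mapping and both function names are assumptions made for this sketch.

```python
def beats(wins, a, b):
    """True if 'a is better than b' follows from recorded answers.

    wins maps each item to the set of items the user ranked directly below it.
    """
    frontier, visited = [a], {a}
    while frontier:
        current = frontier.pop()
        for loser in wins.get(current, ()):
            if loser == b:
                return True
            if loser not in visited:
                visited.add(loser)
                frontier.append(loser)
    return False


def needs_question(wins, a, b):
    """Only ask the user when neither ordering can already be inferred."""
    return not beats(wins, a, b) and not beats(wins, b, a)


answers = {"A": {"B"}, "B": {"C"}}  # user said A > B and B > C
```

With these answers, A versus C no longer needs to be asked, while A versus an unseen episode D still does.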
        
           | pocketarc wrote:
           | I have been thinking about showing more than 2 options to
           | help it go faster. On mobile I guess that would be quite
           | difficult, but for people on bigger screens, yes, let them
           | run through episodes as fast as they can.
        
       | qurashee wrote:
       | This reminded me of adaptive comparative judgement. I'd be
       | interested in your algorithm on how you decide how to pair up
       | items.
        
         | pocketarc wrote:
         | Thank you for that! Adaptive comparative judgment gives a name
         | to something I've always believed, but never really quite put
         | my finger on; that comparing things one to another is more
         | reliable than random 1-10 ratings.
         | 
         | As for the algorithm, it's a basic Quicksort, building on the
         | work of Leonid Shevtsov[0].
         | 
         | [0]: https://leonid.shevtsov.me/post/a-human-driven-sort-
         | algorith...
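
As a generic illustration (not the code from the linked post or from TV Sort), a human-driven quicksort just replaces the comparison with a question to the user. The middle-element pivot below is an assumption, chosen because commenters observed the fixed episode was the middle of the run.

```python
def human_quicksort(items, prefers):
    """Sort best-first; prefers(a, b) is True when the user likes a more than b."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    pivot = items[middle]
    rest = items[:middle] + items[middle + 1:]
    better, worse = [], []
    for item in rest:
        # Every item in this pass is compared against the same pivot.
        (better if prefers(item, pivot) else worse).append(item)
    return human_quicksort(better, prefers) + [pivot] + human_quicksort(worse, prefers)


# Simulated viewer with hidden scores, standing in for real clicks.
hidden_scores = {"E1": 3, "E2": 5, "E3": 1, "E4": 4, "E5": 2}
asked = []


def simulated_viewer(a, b):
    asked.append((a, b))
    return hidden_scores[a] > hidden_scores[b]


ranking = human_quicksort(list(hidden_scores), simulated_viewer)
```

The first pass asks n - 1 questions that all share one side, which matches the "same episode every time" experience reported in this thread.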
        
           | munch117 wrote:
           | I think merge sort would provide a better experience.
           | 
           | Quicksort can be great for human-comparison sorting if you
           | let the user pick the pivot, and if you have a direct-
           | manipulation interface for dividing a big pile into two
             | smaller ones. Humans are great at scanning large numbers of
           | objects, and can split piles much faster than operating one
           | by one.
        
             | pocketarc wrote:
             | You are quite right. I had already been thinking about
             | merge sort because it's guaranteed to lead to fewer
             | comparisons, but what you said about piles would work great
             | when combined with showing more episodes at once, asking
             | the user "which of these 5 episodes is better" and getting
             | those comparisons out of the way all at once.
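
For comparison, here is a sketch of a merge sort driven by the same kind of `prefers(a, b)` callback (a hypothetical name). Its worst case stays near n log2 n questions, and the pairs shown change from question to question instead of repeating one pivot.

```python
def human_merge_sort(items, prefers):
    """Sort best-first; prefers(a, b) is True when the user likes a more than b."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = human_merge_sort(items[:mid], prefers)
    right = human_merge_sort(items[mid:], prefers)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the right run only when it is strictly preferred.
        if prefers(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


hidden = {f"E{n}": score for n, score in enumerate([4, 8, 1, 7, 3, 6, 2, 5], start=1)}
questions = []


def simulated_viewer(a, b):
    questions.append((a, b))
    return hidden[a] > hidden[b]


ranking = human_merge_sort(list(hidden), simulated_viewer)
```

For 8 items the worst case is 17 questions; showing several episodes at once, as discussed above, would be a separate change on top of this.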
        
       | npinsker wrote:
       | The website isn't really my thing (maybe I'm weird, but I already
       | know my favorite 3-4 episodes of every show I've watched) -- but
       | the writeup is stellar and has just the right level of detail.
       | Using LLMs to generate episode summaries _and_ having a fallback
       | plan is really going above and beyond to get a great UX. Great
       | stuff.
        
         | brucethemoose2 wrote:
         | > I already know my favorite 3-4 episodes of every show I've
         | watched
         | 
         | I _did_ try the site, but the 2nd "static episode" of the algo
         | was a fantastic finale (Venom of the Red Lotus), and it made me
         | realize I'm kinda like this too. I already knew the best few
         | episodes.
         | 
         | I don't remember _every_ show, and not always by name, but I
         | think I remember the best episodes of shows I would bother
         | discussing with friends.
        
         | pocketarc wrote:
         | Thank you for the kind words, and for helping me get an idea of
         | how my writing is coming across, I appreciate it a lot!
        
       | m3kw9 wrote:
       | It's always comparing to the same episode, not sure if that's by
       | design, but it made it feel stale, or not fun. You also need to
       | show a progress bar if you are gonna compare shows across
       | seasons, but I'd stick to comparing shows in a single season
       | instead.
        
         | pocketarc wrote:
         | You're right about doing it by season, that would definitely
         | make it a lot easier to just jump in and start, without making
         | a big commitment to ranking a whole show.
        
       | timhh wrote:
       | You can do much better than just sorting. The simplest is to use
       | Bradley-Terry. It's a very simple algorithm and will let you
       | combine results from multiple users and gives an actual rating
       | rather than just a ranking.
       | 
       | It also handles the probabilistic nature of sorting better.
       | Traditional sorting algorithms rely on comparisons being sensible
       | (a>b and b>c implies a>c) but you probably won't get that if you
       | use people.
       | 
       | I explained it here:
       | 
       | https://stats.stackexchange.com/a/131270/60526
       | 
       | Quite closely related to matchmaking in computer games.
       | 
       | I remember there was a website a while ago that used pairwise
       | comparison to rank programming languages and I think whiskey.
       | Does anyone remember this? I could never find it again.
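
A minimal Bradley-Terry sketch, using the standard iterative (minorise-maximise) update for the strengths. The `prior` parameter adds a small virtual win in both directions for every pair so that unbeaten items stay finite; that is an assumption of this sketch and not necessarily the regularisation described in the linked answer.

```python
from collections import defaultdict


def bradley_terry(results, items, iterations=200, prior=0.1):
    """Estimate strengths from (winner, loser) results; needs at least 2 items."""
    wins = defaultdict(float)   # total (real + virtual) wins per item
    games = defaultdict(float)  # total comparisons per unordered pair
    for a in items:
        for b in items:
            if a != b:
                wins[a] += prior                   # virtual win of a over b
                games[frozenset((a, b))] += prior  # visited twice per pair
    for winner, loser in results:
        wins[winner] += 1.0
        games[frozenset((winner, loser))] += 1.0

    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denominator = sum(
                games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            updated[i] = wins[i] / denominator
        # Rescale so the strengths average to 1.
        scale = len(items) / sum(updated.values())
        strength = {item: value * scale for item, value in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Model probability that a is preferred over b."""
    return strength[a] / (strength[a] + strength[b])


votes = [("A", "B")] * 3 + [("B", "C")] * 3 + [("A", "C")] * 2 + [("C", "A")]
ratings = bradley_terry(votes, ["A", "B", "C"])
```

Votes from several users can simply be concatenated into `results`, and an inconsistent vote such as C over A only nudges the ratings instead of breaking the sort.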
        
         | pocketarc wrote:
         | Wow, this is extremely helpful, I had no idea this existed and
         | will have to read up on it properly.
         | 
         | I think my main concern would be: What would it be like for the
         | first user to try to rank a show (as was the case for everyone
         | today)? All probabilities would be 50-50, no? But if it's a
         | show that's already been ranked at least once, then this could
         | help immensely, if I understand correctly.
        
           | timhh wrote:
           | Due to the regularisation yeah they all start at the same
           | rating. But you don't need many votes to start getting good
           | ratings.
           | 
           | I introduced this method to Dyson for objectively calculating
           | very subjective measurements (e.g. "how frizzy does this hair
           | look?"). We basically crowd sourced it to other engineers.
           | 
           | I did a load of studies on different methods by ranking
           | something that's sort of hard to rank but you know the answer
           | to - I used 10 grey squares that only differed by 2/255 and
           | you had to pick the brighter one.
           | 
           | Some other things:
           | 
           | 1. I don't remember the exact details but there's a slight
           | extension of the method where you give each user a "how good
           | are you" coefficient that you simultaneously solve for. This
           | helps eliminate people that vote randomly, and also inverts
           | the votes of people that deliberately pick the wrong answer
           | (as long as they're consistently wrong).
           | 
           | 2. You can put confidence limits on the values very easily
           | too since it's a MAP estimate. Actually I showed curves for
           | each item - basically how does the model probability vary as
           | you sweep one rating up and down a bit. People didn't
           | understand it at all though.
           | 
           | 3. You can calculate the rankings incrementally very quickly
           | (details in the answer) which means you can show users
           | comparisons that give the most information. This usually
           | means you end up showing users endless difficult choices
           | which can frustrate them, especially if it's a forced choice.
           | 
           | 4. I never found a principled way to incorporate a "they look
           | the same" option. I tried some ad-hoc methods and IIRC a
           | "much better, slightly better, can't tell, slightly worse,
           | much worse" scale gave the fastest convergence but it was
           | pretty unsatisfying that I just used some ad hoc method to
           | add the results.
           | 
           | It was all closed source and I haven't worked there for years
           | so the code is lost to the wind unfortunately.
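
A crude stand-in for "show the comparison that gives the most information" is to pick the pair whose predicted outcome is closest to 50-50 under the current ratings. This is only a heuristic sketch with hypothetical names; a fuller version would also account for how many votes each pair already has.

```python
import itertools


def most_uncertain_pair(strengths):
    """Return the pair whose modelled win probability is closest to 0.5.

    strengths maps each item to a positive Bradley-Terry style rating.
    """

    def distance_from_coin_flip(pair):
        a, b = pair
        p_a_wins = strengths[a] / (strengths[a] + strengths[b])
        return abs(p_a_wins - 0.5)

    return min(itertools.combinations(sorted(strengths), 2), key=distance_from_coin_flip)


example_strengths = {"A": 4.0, "B": 1.0, "C": 0.9}
pair = most_uncertain_pair(example_strengths)
```

As noted in point 3 above, always serving the closest calls tends to frustrate people, so mixing in some easier pairs may be worthwhile.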
        
             | pocketarc wrote:
             | This is honestly very interesting, thank you so much for
             | elaborating! To be fair, after today, there are now nearly
             | 400 TV shows with votes, so I can start seriously looking
             | into this very soon!
        
         | michaelrpeskin wrote:
         | I did something similar, in fact, the math may be the same
         | thing and just expressed differently. But when I've had to rank
         | non-transitive things, I use Elo
         | (https://en.wikipedia.org/wiki/Elo_rating_system)
         | 
         | Many years ago when I was a mid-level developer at a
         | dysfunctional company, I was senior enough to be invited to
         | some "strategy" meetings, but junior enough that no one ever
         | listened to me. We (engineering, sales, marketing, etc.) spent
         | nearly an entire summer bickering over what "important"
         | features we were going to schedule next. I finally got fed up,
         | took everything out of the ticketing system and made random
         | pairings and had people vote on it. Then just like a chess
         | match, updated their Elo score based on the outcome. Then I had
         | anyone who cared play match-ups for as long as they wanted. We
         | ended up getting a decent ordering of features and finally
         | ended the summer of hell meetings.
         | 
         | I don't know if the order was the correct order, I didn't stay
         | around long enough to see. I was just happy that sales and
         | marketing folks thought that I had some magic math that solved
         | their problem, and I was happy to be back developing and not
         | sitting in useless meetings.
         | 
         | What I like about this is that you don't have to be self
         | consistent, as long as on average you pick the best, it will
         | bubble to the top. And you can mix the results of other voters
         | and see what the "true" winner is. (Of course, to be fair, you
         | have to give each person the same number of match ups, in my
         | case, I just served match-ups to anyone who wanted to sit at
         | the terminal and vote, so someone could have wasted an entire
         | day and overwhelm the system - I didn't care at the time).
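
The Elo update for one match-up is short enough to sketch here. The K-factor of 32 and the starting rating of 1500 are conventional choices, assumed for this example and not taken from the comment.

```python
def expected_score(rating_a, rating_b):
    """Probability that A wins under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(ratings, winner, loser, k=32):
    """Move the winner up and the loser down by the same amount, in place."""
    expected_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - expected_win)  # upsets move ratings more
    ratings[winner] += delta
    ratings[loser] -= delta


ratings = {"feature_a": 1500.0, "feature_b": 1500.0}
elo_update(ratings, "feature_a", "feature_b")
```

Because each vote is applied as it arrives, the result depends on vote order and on how many match-ups each voter plays, which is the fairness caveat mentioned above.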
        
       | throwaway143829 wrote:
       | I'm both a casual TV viewer and someone with a short attention
       | span. My experience was: I tried comparing a few Seinfeld
       | episodes, found that I am unable to recall the episodes from the
       | descriptions, then gave up.
        
         | pocketarc wrote:
         | Yeah, this thing requires a decent time commitment as it
         | stands.
         | 
         | One of the ideas floating around is to make it so you do this
         | by season instead, to make it a bit more "quick casual fun",
         | rather than having to rank the entire show all at once. Scoring
         | 180 Seinfeld episodes as a casual TV viewer isn't going to be a
         | great experience.
         | 
         | Honestly, HN feedback has been immensely helpful, I couldn't be
         | more thankful.
        
       | tinyspacewizard wrote:
       | There should be a way to exclude seasons (e.g. Simpsons)
        
       | elektor wrote:
       | This is pretty neat! For quite some time I've yearned for a tool
       | like this to be able to rank my favorite songs of an artist
       | (embedded Spotify 30 second clips?) and other custom media like
       | comic strips.
        
         | pocketarc wrote:
         | Time for me to snatch musicsort.com or something. ;-)
         | 
         | That would honestly be fun, great call!
        
       | tnecniv wrote:
       | I'm curious what your picks for best Frasier episodes are?
       | 
       | Two of mine are when they become illegal caviar dealers and when
       | Niles wants to try weed. The episodes with Lilith tend to also be
       | very good. People love the Valentine's Day one, but I'm never a
       | fan of that style of episode. I can appreciate the genius in it
       | though
        
         | pocketarc wrote:
         | "Roe to perdition" is the caviar one, and it's fantastic as
         | well (#20 for me). Niles trying weed was #1 for me, "high
         | holidays". For me, #2 and #3 were "the doctor is out" (Frasier
         | getting involved with Patrick Stewart) and "out with dad"
         | (where Martin and Frasier go to the opera). The
         | misunderstandings are what does it for me!
         | 
         | Edit: But I'd honestly say that my whole Top 50 or so is great
         | episodes, there's not a big gap between the top and any of
         | them. It was hard ranking them all.
         | 
         | Also, my list is at:
         | https://tvsort.com/show/3452/matrix_01hjtxz2e1ewkrh44ja3mz0s...
        
           | tnecniv wrote:
           | All very great choices! "Dog army? What do you think that
           | means?" is an inside joke with me and my best friend from
           | "High Holidays".
           | 
           | I also agree that the ranking is hard. A remarkable thing
           | about the show is how it's consistent throughout its run and
           | doesn't really fall off in quality despite going for over a
           | decade.
        
             | pocketarc wrote:
             | Frasier's "dear god" when he first sees goth Freddy as well
             | is just terrific.
             | 
             | "Well, thank you Lilith, for mentioning this little
             | development!"
        
       | xlbuttplug2 wrote:
       | I'd assume this doesn't work too well with shows that aren't
       | episodic in nature. Especially if you binge watch, the line
       | between when one episode ends and the next starts is usually
       | blurred.
        
         | brucethemoose2 wrote:
         | Highly serial shows definitely have standout episodes,
         | strong/weak seasons and such.
         | 
         | But I can see how this could be hard to remember.
        
       | samstave wrote:
       | This is super interesting, as over the break, I was thinking of
       | something exactly like this, but for video games.
       | 
       | Feel free to steal the following, if anyone likes:
       | 
       | Take a scrape of video game data from multiple sources, such as
       | steam, amazon, game sales, forum sizes on reddit, as an example,
       | then rank the games based on these metrics - but then have people
       | vote for "hall of fame"
       | 
       | Include as much historic data on game sales for all time, if
       | possible - as so many games were introduced during our formative
       | years, and thus have a deeper, more memorable impact.
       | 
       | --
       | 
       | I see the problem others state. Perhaps have a random button to
       | just give you a new selection.
       | 
       | Great work though.
        
         | brucethemoose2 wrote:
         | The problem with multi user title voting is that it becomes a
         | popularity contest, not an ostensible quality ranking. Steam
         | itself has spent years trying to address this.
         | 
         | TV episodes don't have that problem because each user has
         | viewed all (or at least a sequence of) the episodes.
        
           | samstave wrote:
           | Good point, but I guess the real issue with video games vs
           | shows - shows are passive, games are active - so your
           | experience with a game is going to be way different than a
           | show - you don't have to have hand-eye coordination for
           | Seinfeld.
           | 
           | :-)
        
       | dcreater wrote:
       | I think it's much better to do this for show vs show imo
        
         | freetonik wrote:
         | Wanted to say the same. I watch a lot of TV, but can barely
         | recall particular episodes by name or even some screengrabs.
         | But I'd be curious to see how my taste in shows themselves
         | compares to other people's
        
           | pocketarc wrote:
           | Definitely an idea for the future, to expand this beyond just
           | TV episodes!
        
       | pocketarc wrote:
       | I've just deployed a change that randomises selections -
       | hopefully it helps address the biggest concern raised here so
       | far. I'm curious to find out how people feel about it compared to
       | the previous way it was doing things.
        
         | bimblesticks wrote:
         | Trying to rank the IT Crowd right now and the episodes keep
         | refreshing before I click anything -
         | https://tvsort.com/show/2490/matrix_01hk2y2csmeg1tdczs12wbz6...
         | 
         | Am using Firefox with uBlock Origin if that affects anything
        
           | pocketarc wrote:
           | Thanks for the heads up! I shouldn't have tried to rush this
           | while this is under heavy use - I've reverted the change, and
           | it should be OK now!
        
       | pattle wrote:
       | This is neat. See also https://brickelo.com for the same thing
       | but with LEGO minifigures
        
       | maxlin wrote:
       | This seems to be somehow broken for me. It auto-skips everything
       | in about one second. If there's some point about a time limit
       | being required it should be 5 seconds at the very minimum.
        
         | pocketarc wrote:
         | That's what I get for trying to rush a deployment during this
         | HN period. ;) I've reverted the change, and it should be OK
         | now!
        
       | t_mann wrote:
       | Fyi, on my first try, one of the two episodes was always the
       | same, on my second try, it kept reloading two new episodes before
       | I could make a selection in an infinite loop.
        
         | pocketarc wrote:
         | Sorry about that! Had to do with the latest deployment. It's
         | okay now!
        
       | bequanna wrote:
       | Very cool!
       | 
       | How are you getting/using IMDB data? I thought the API was locked
       | down (?)
        
         | pocketarc wrote:
         | I wrote a blog post about it[0], but basically, I use the TMDB
         | API to get all shows and episodes, which is free and has a
         | generous rate limit.
         | 
         | For the episode descriptions, I grab the plot summaries from
         | IMDB and Wikipedia, with just HTML scraping, no APIs, and feed
         | them to an LLM to get the 3 main plot points, so you don't have
         | to read a bunch of rambling text when trying to quickly assess
         | the episode you're looking at.
         | 
         | [0]: https://pocketarc.com/posts/tv-sort-engineering-the-
         | ultimate...
        
       | kriro wrote:
       | I like pubmeeple for ranking board games. You can import your BGG
       | collection and it simply does pairwise comparison and presto, you
       | get a top X list sorted. Simple but works very well. I think they
       | also support TV shows and the like but maybe you have to input
       | your own list.
        
       | thih9 wrote:
       | Congrats on the launch!
       | 
       | Curious: how are you getting the data from e.g. TMDB? Was it a
       | one time download or are you refreshing it?
       | 
       | Feedback: Sometimes I didn't watch the latest season of a tv
       | show; at the moment I'm being asked to rate episodes from all
       | seasons. I'd like to rate episodes of a single season or up to a
       | certain season. Alternatively: an option to skip an episode, or
       | flag that I haven't seen it.
        
         | pocketarc wrote:
         | Thank you! The data from TMDB is cached when someone starts
         | ranking a show for the first time. At the moment there is no
         | system for refreshing it, but that's on the to-do list, so that
         | if there are improvements to the data, they'll be fetched.
         | 
         | Thanks for the feedback - I love the idea of an "I didn't watch
         | it" option, that's super important. Maybe that could drop it out of
         | the list entirely (since it can't count for anything).
         | 
         | The "rate a single season" idea is one of the main things to
         | come out of today, and it's where I'm going to take this next.
         | When you land on a show, instead of a single "start ranking"
         | button you'll have a list of all the seasons in the show, and
         | be able to rank them individually. And since all these
         | comparisons are stored in your browser, I can make it count
         | toward your "full show" ranking automatically, so that if you
         | ever get to that, you'll already have it in progress.
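
Reusing per-season answers in a full-show ranking could be as simple as wrapping the question in a cache of stored results. The storage shape below (a set of `(winner, loser)` tuples) and the names are assumptions; the thread only says the comparisons live in the browser.

```python
def cached_prefers(stored, ask):
    """Wrap ask(a, b) so that earlier answers are reused instead of re-asked.

    stored is a set of (winner, loser) tuples and is updated in place.
    """

    def prefers(a, b):
        if (a, b) in stored:
            return True
        if (b, a) in stored:
            return False
        a_wins = ask(a, b)
        stored.add((a, b) if a_wins else (b, a))
        return a_wins

    return prefers


season_one_answers = {("S01E02", "S01E01")}  # already answered while ranking season 1
prompts = []


def ask_user(a, b):
    prompts.append((a, b))
    return a < b  # placeholder answer standing in for a real click


prefers = cached_prefers(season_one_answers, ask_user)
reused = prefers("S01E01", "S01E02")  # answered from the cache
fresh = prefers("S01E01", "S02E01")   # has to be asked
```

Any sort that takes a comparison callback can then be handed `prefers`, so a later full-show ranking starts with the season-level answers already in place.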
        
       ___________________________________________________________________
       (page generated 2024-01-01 23:01 UTC)