[HN Gopher] Show HN: I made TV Sort, a web-based game for rankin...
___________________________________________________________________
Show HN: I made TV Sort, a web-based game for ranking TV show
episodes
Over this Christmas break, while discussing the best episodes of
Frasier with my mother (as we tend to do when I get to see her), I
thought about coming up with something that's less arbitrary than
1-10 ratings. The result is TV Sort. It just uses a sorting
algorithm, but... it's human powered. When the algorithm needs to
compare two items, it asks you to compare them, and with that you
end up with a full, thoroughly sorted episode list. It uses TMDB,
IMDB, and Wikipedia to extract episode information for any show, to
help jog your memory when making episode comparisons. It was a fun
little experiment. And finally, I know -exactly- what I think the
best and worst episodes are.[0] Would love to hear your feedback,
this is my first Show HN. ;) Edit: I wrote a whole blog post about
what went into making it, if anyone wants to read more of the
technical detail behind it.[1]

[0]: https://tvsort.com/show/3452/matrix_01hjtxz2e1ewkrh44ja3mz0s...

[1]: https://pocketarc.com/posts/tv-sort-engineering-the-ultimate...
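
A minimal sketch of the "human-powered sort" idea, in Python: the sort's comparison function is handed to a person instead of being computed. The names human_sort and ask, and the episode scores in the usage example, are hypothetical. Python's built-in sorted (Timsort) stands in here for whatever algorithm TV Sort actually runs; the author says later in the thread that it is a Quicksort.

```python
from functools import cmp_to_key
from typing import Callable, List


def human_sort(episodes: List[str], ask: Callable[[str, str], str]) -> List[str]:
    """Return episodes ordered best-first, using `ask` as the comparator.

    `ask(a, b)` is a hypothetical stand-in for the UI prompt: it must
    return whichever of the two episodes the person prefers.
    """
    def compare(a: str, b: str) -> int:
        # A negative result puts `a` first, i.e. `a` is the better episode.
        return -1 if ask(a, b) == a else 1

    return sorted(episodes, key=cmp_to_key(compare))


if __name__ == "__main__":
    # Simulated viewer with a hidden preference score per episode.
    taste = {"Pilot": 2, "High Holidays": 10, "Roe to Perdition": 8}

    def viewer(a: str, b: str) -> str:
        print(f"Which is better: {a!r} or {b!r}?")
        return a if taste[a] >= taste[b] else b

    print(human_sort(list(taste), viewer))
```

Every prompt the viewer sees is one comparison the sorting algorithm needed; the algorithm never learns the hidden scores, only the answers.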
Author : pocketarc
Score : 56 points
Date : 2024-01-01 13:14 UTC (9 hours ago)
(HTM) web link (tvsort.com)
(TXT) w3m dump (tvsort.com)
| boomboomsubban wrote:
| I've tried two different shows. Both times the first selection
| started at S01E01, and the second selection was, I'd guess, a
| highly rated episode from a later series.
|
| If I say selection one is better, selection one becomes the next
| episode, S01E02, while selection two stays the same episode. If I
| pick "either" or say selection two is better, selection one is
| picked at random and selection two stays the same episode. And if
| I then go back to saying selection one is better, selection one
| moves back to the next episode in sequence, i.e. S01E03.
|
| Is that the intended behavior? I ask because I quickly got bored
| of saying series one was better than this one episode in series
| three.
| aquova wrote:
| This is my one complaint as well. I really like the idea of
| this website, and it seems to be made well! I tried out Star
| Trek TNG, and regardless of which I picked, I was always
| comparing against the same episode on the right side (Season 4
| Episode 15, which I think is the exact middle episode of its
| run?). I know all the comparisons must be made eventually, but
| it would be nice to swap different episodes in so that I'm not
| comparing every single episode against one at a time.
| boomboomsubban wrote:
| You're right, it is the exact middle episode. I picked shows
| with variable series lengths, so it was hard to tell.
| pocketarc wrote:
| > Is that the intended behavior?
|
| Yeah, I'm afraid so. At the start, selection two will always
| be the same, because the algorithm doesn't yet know anything
| about the standing of any episode; it's trying to decide where
| in the array to place selection one (above or below selection
| two).
|
| It might be worth seeing if I can randomise the episodes
| displayed, if only so it doesn't feel so repetitive.
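
To illustrate why selection two stays fixed at first, here is a simplified quicksort in Python that assumes the middle episode is used as the pivot, which matches the observation above that it is the show's exact middle episode. Every question in the first pass pairs some episode with that same pivot. The names quicksort_best_first, better and log are hypothetical, not TV Sort's code.

```python
from typing import Callable, List, Tuple


def quicksort_best_first(items: List[str], better: Callable[[str, str], bool],
                         log: List[Tuple[str, str]]) -> List[str]:
    """Simplified (not in-place) quicksort that orders episodes best-first.

    The middle element is the pivot (an assumption). `better(a, b)` stands in
    for asking "is a better than b?"; each question is appended to `log` as
    (selection one, selection two).
    """
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    pivot = items[mid]
    above: List[str] = []
    below: List[str] = []
    for index, episode in enumerate(items):
        if index == mid:
            continue
        log.append((episode, pivot))  # selection two is the pivot for this whole pass
        (above if better(episode, pivot) else below).append(episode)
    return (quicksort_best_first(above, better, log) + [pivot]
            + quicksort_best_first(below, better, log))
```

With seven episodes E1 to E7, the first six questions all have E4 as selection two; only once the pivot's position is settled do the sub-lists bring in new reference episodes.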
| bravura wrote:
| Make it slightly fancier and assign a "similarity" between
| users.
|
| Start with uniform similarity, and as preferences are made,
| adjust them.
|
| Then you have personalization.
| pocketarc wrote:
| I'm not sure what you mean by this. Are you saying that
| the episodes that get displayed would be based on what other
| users would likely have picked for the same show?
| bravura wrote:
| I'm saying that instead of one objective ordering, you
| have a subjective per user ordering.
|
| The decisions made by other users get a weight assigned
| to them, which is individual to each logged-in user. So
| every user's viewpoint is personalized.
|
| (Apologies for the short explanation, I'm on mobile. If
| you want to ask more you can email me.)
| boomboomsubban wrote:
| Why does selection one get randomized if you say selection
| two is better, but then jump back to the original sequential
| if you say selection one is better again? That behavior felt
| bizarre.
| pocketarc wrote:
| Selection one jumps between the start and the end of the
| show repeatedly until you make it to the middle (selection
| two), after which point it'll move on to getting you to
| rank the best episodes, and then the worst episodes.
|
| Definitely looking into seeing if I can come up with
| something better though! The problem is making sure that
| whatever algorithm is picked remains as close to O(n log n)
| as possible. Randomising options in a way that makes
| require a lot more comparisons would be far worse.
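
The back-and-forth pattern described above is what an in-place, two-pointer partition (in the style of Hoare's scheme) produces: one pointer walks forward from the first episode while answers keep putting episodes above the pivot, and the other walks back from the end while answers keep putting them below it. The Python sketch below is an assumption about how such a partition could drive the questions, not the site's actual code; human_quicksort and better are hypothetical names. It asks one question per non-pivot episode per pass, so no pair is asked twice, and it recurses into the better half first.

```python
from typing import Callable, List, Tuple

Question = Tuple[str, str]  # (selection one, selection two)


def _partition(a: List[str], lo: int, hi: int,
               better: Callable[[str, str], bool], log: List[Question]) -> int:
    """Partition a[lo..hi] best-first around its middle element; return pivot index."""
    mid = (lo + hi) // 2
    a[mid], a[hi] = a[hi], a[mid]  # park the pivot at the end
    pivot = a[hi]

    def beats_pivot(x: str) -> bool:
        log.append((x, pivot))  # selection two is the pivot for this whole pass
        return better(x, pivot)

    i, j = lo, hi - 1
    while True:
        # Walk forward from the start while episodes belong above the pivot.
        while i <= j and beats_pivot(a[i]):
            i += 1
        # Walk backward from the end while episodes belong below the pivot.
        while i < j and not beats_pivot(a[j]):
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]  # both were on the wrong side: swap them
        i += 1
        j -= 1
    a[i], a[hi] = a[hi], a[i]  # drop the pivot into its final slot
    return i


def human_quicksort(items: List[str],
                    better: Callable[[str, str], bool]) -> Tuple[List[str], List[Question]]:
    """Sort best-first; return the ordering and the questions asked, in order."""
    a: List[str] = list(items)
    log: List[Question] = []

    def sort(lo: int, hi: int) -> None:
        if lo >= hi:
            return
        p = _partition(a, lo, hi, better, log)
        sort(lo, p - 1)  # rank the better half first ("best episodes")
        sort(p + 1, hi)  # then the worse half ("worst episodes")

    sort(0, len(a) - 1)
    return a, log
```

With seven episodes and E4 in the middle, the first question is E1 vs. E4; if E1 loses, the next question jumps to an episode near the end of the list.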
| ysavir wrote:
| > Randomising options in a way that requires a lot
| more comparisons would be far worse.
|
| For the algorithm, but not for the people taking time to
| do the rankings. Which do you want to prioritize?
| pocketarc wrote:
| I was prioritising less total time, but you're
| right, that doesn't matter if the person gets bored and
| leaves. I'm tinkering with it now and I think I have a
| good solution to the problem. I'll be deploying it soon!
| gamerDude wrote:
| Maybe you can pull in episode rankings from another site
| to seed a first ranking and then make an algorithm to
| find where you disagree with the norm.
| bena wrote:
| I've done something similar.
|
| You're essentially doing A/B comparisons across the entire
| set.
|
| It looks like you have it so you're basically setting where
| "B" is before moving on to the next item.
|
| This isn't strictly necessary. You could just generate a
| novel pair every time and ask the user to choose between
| them. The catch is that you'd need a way to track each user,
| so you can make sure they haven't already seen a given pair.
|
| Once you've exhausted all the pairs, you'll know exactly how
| to sort the array, and you'll have a rough idea well before
| then.
|
| You might have an issue with circular preferences, though.
| People are fickle. You could have someone who says that
| A > B > C > A.
|
| In that case, I'd allow repeat pairings after a certain
| amount of time, essentially letting the person reevaluate.
|
| You could also take the comparisons across all users and
| compile a general sort of "best of" ranking.
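
A rough Python sketch of bena's suggestion, under stated assumptions: each user gets a session (here just an object; on a real site it would be stored per user) holding every unordered pair in shuffled order, answers are tallied as wins, and a helper looks for an A > B > C > A triangle. PairwiseSession and its methods are hypothetical names. Ranking by win count is one simple way to turn a complete set of answers into an order; the sketch does not implement the repeat pairings suggested for re-evaluation.

```python
import itertools
import random
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple

Pair = Tuple[str, str]


class PairwiseSession:
    """Hypothetical per-user session that serves every pair once, in random order."""

    def __init__(self, items: Iterable[str], seed: Optional[int] = None) -> None:
        self.items: List[str] = list(items)
        self.unseen: List[Pair] = list(itertools.combinations(self.items, 2))
        random.Random(seed).shuffle(self.unseen)
        self.wins: Counter = Counter()
        self.results: Set[Pair] = set()  # (winner, loser)

    def next_pair(self) -> Optional[Pair]:
        """Return a pair this user has not been shown yet, or None when exhausted."""
        return self.unseen.pop() if self.unseen else None

    def record(self, winner: str, loser: str) -> None:
        self.wins[winner] += 1
        self.results.add((winner, loser))

    def ranking(self) -> List[str]:
        """Order by number of wins (ties keep the original order)."""
        return sorted(self.items, key=lambda x: -self.wins[x])

    def find_cycle(self) -> Optional[Tuple[str, str, str]]:
        """Return some (a, b, c) with a > b > c > a, if the answers contain one.

        Once every pair has been answered, any preference cycle implies a
        cycle of length three, so checking triangles is enough.
        """
        for a, b in self.results:
            for c in self.items:
                if (b, c) in self.results and (c, a) in self.results:
                    return (a, b, c)
        return None
```

Asking every pair takes n(n-1)/2 questions, which is 16,110 for 180 episodes, versus roughly n log2 n (about 1,350) for a good comparison sort.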
| pocketarc wrote:
| > The thing is that you'd need a way to track a user.
|
| The page you're on already knows what comparisons you've
| made (otherwise how would it move forward?), so this is
| entirely possible!
|
| I've come up with a way to randomise it (by just picking a
| random element from an array of comparisons that haven't
| yet been made); that's the next step. I deployed it
| earlier, but there was a bug with it, so I've had to
| roll back until I can look into it.
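
A sketch of that randomisation for a single partition pass, in Python, assuming "comparisons that haven't yet been made" means the remaining episode-vs-pivot questions of the current pass. Those questions are independent of each other, so asking them in random order changes neither the result nor the number of comparisons; note that selection two (the pivot) still stays the same within a pass. partition_in_random_order and better are hypothetical names.

```python
import random
from typing import Callable, List, Tuple


def partition_in_random_order(items: List[str], pivot_index: int,
                              better: Callable[[str, str], bool],
                              rng: random.Random) -> Tuple[List[str], List[str], List[str]]:
    """One quicksort pass where the pending questions are asked in random order.

    Returns (above_pivot, below_pivot, order_asked).
    """
    pivot = items[pivot_index]
    pending = items[:pivot_index] + items[pivot_index + 1:]
    above: List[str] = []
    below: List[str] = []
    asked: List[str] = []
    while pending:
        # Pick a random comparison that hasn't been made yet.
        candidate = pending.pop(rng.randrange(len(pending)))
        asked.append(candidate)
        (above if better(candidate, pivot) else below).append(candidate)
    return above, below, asked
```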
| bfdm wrote:
| Had this exact same experience. Picked the Simpsons just to try
| it out, and the first pairing was an S1 episode vs. some episode
| from S18 I've never seen.
|
| So I picked the S1 episode. Then it was another S1 episode.
| Repeat. Then an S2 episode.
|
| My second option never changed from that first episode from S18,
| so I never picked it. Perhaps I could have gone through 12-17
| seasons of episodes, always picking option one, until two
| episodes I've never seen went head to head.
|
| IMO, both options need to randomize each time until first
| pairings are exhausted.
| kyriakos wrote:
| It keeps giving me the same episode vs a different episode every
| time. Feels like I'm manually doing bubble sort :)
|
| Maybe randomising the selection would keep me going longer.
| FazJaxton wrote:
| I agree. I think using the transitive property to place
| episodes relative to others would help a lot. Also something
| like "pick your favorite of these 3-5" might go faster and make
| it feel more fun.
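
A Python sketch of using transitivity to skip questions: every answer is stored, the "worse than" sets are kept transitively closed, and a question is only put to the person when the answer isn't already implied. A "pick your favorite of these k" screen simply records k - 1 answers at once. PreferenceGraph and its methods are hypothetical names; contradictory answers (A > B > C > A) are not detected here.

```python
from typing import Callable, Dict, Iterable, Optional, Set


class PreferenceGraph:
    """Hypothetical store of answers, kept closed under transitivity."""

    def __init__(self) -> None:
        # worse[x] = every episode known (directly or transitively) to be worse than x
        self.worse: Dict[str, Set[str]] = {}

    def known(self, a: str, b: str) -> Optional[bool]:
        """True if a > b is implied, False if b > a is implied, None if unknown."""
        if b in self.worse.get(a, set()):
            return True
        if a in self.worse.get(b, set()):
            return False
        return None

    def record(self, winner: str, loser: str) -> None:
        # Anything at least as good as `winner` beats anything at most as good as `loser`.
        above = {winner} | {x for x, w in self.worse.items() if winner in w}
        below = {loser} | self.worse.get(loser, set())
        for x in above:
            self.worse.setdefault(x, set()).update(below)

    def record_favorite(self, favorite: str, shown: Iterable[str]) -> None:
        """A "pick your favorite of these k" answer yields k - 1 comparisons."""
        for other in shown:
            if other != favorite:
                self.record(favorite, other)

    def compare(self, a: str, b: str, ask: Callable[[str, str], bool]) -> bool:
        """Return whether a beats b, asking the person only if it isn't implied."""
        implied = self.known(a, b)
        if implied is not None:
            return implied
        a_wins = ask(a, b)
        if a_wins:
            self.record(a, b)
        else:
            self.record(b, a)
        return a_wins
```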
| pocketarc wrote:
| I have been thinking about showing more than 2 options to
| help it go faster. On mobile I guess that would be quite
| difficult, but for people on bigger screens, yes, let them
| run through episodes as fast as they can.
| qurashee wrote:
| This reminded me of adaptive comparative judgement. I'd be
| interested to know how your algorithm decides which items to
| pair up.
| pocketarc wrote:
| Thank you for that! Adaptive comparative judgment gives a name
| to something I've always believed but never quite put my
| finger on: that comparing things one to another is more
| reliable than random 1-10 ratings.
|
| As for the algorithm, it's a basic Quicksort, building on the
| work of Leonid Shevtsov[0].
|
| [0]: https://leonid.shevtsov.me/post/a-human-driven-sort-algorith...
| munch117 wrote:
| I think merge sort would provide a better experience.
|
| Quicksort can be great for human-comparison sorting if you
| let the user pick the pivot, and if you have a direct-
| manipulation interface for dividing a big pile into two
| smaller ones. Humans are great at scanning large numbers of
| objects, and can split piles much faster than by operating one
| by one.
| pocketarc wrote:
| You are quite right. I had already been thinking about
| merge sort because its worst case needs far fewer
| comparisons, but what you said about piles would work great
| when combined with showing more episodes at once, asking
| the user "which of these 5 episodes is better" and getting
| those comparisons out of the way all at once.
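
For comparison, a top-down merge sort in Python that counts the questions it asks. Its worst case is n⌈log2 n⌉ - 2^⌈log2 n⌉ + 1 comparisons (11 for six episodes), while a quicksort with consistently unlucky pivots can need on the order of n^2. merge_sort_best_first and better are hypothetical names; this is an illustration, not the site's code.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def merge_sort_best_first(items: List[T],
                          better: Callable[[T, T], bool]) -> Tuple[List[T], int]:
    """Top-down merge sort; returns (best-first ordering, questions asked)."""
    questions = 0

    def sort(xs: List[T]) -> List[T]:
        nonlocal questions
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        left, right = sort(xs[:mid]), sort(xs[mid:])
        merged: List[T] = []
        i = j = 0
        while i < len(left) and j < len(right):
            questions += 1  # one "which is better?" prompt
            if better(right[j], left[i]):
                merged.append(right[j])
                j += 1
            else:
                merged.append(left[i])
                i += 1
        # One side is exhausted: the rest needs no further questions.
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    return sort(list(items)), questions
```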
| npinsker wrote:
| The website isn't really my thing (maybe I'm weird, but I already
| know my favorite 3-4 episodes of every show I've watched) -- but
| the writeup is stellar and has just the right level of detail.
| Using LLMs to generate episode summaries _and_ having a fallback
| plan is really going above and beyond to get a great UX. Great
| stuff.
| brucethemoose2 wrote:
| > I already know my favorite 3-4 episodes of every show I've
| watched
|
| I _did_ try the site, but the 2nd "static episode" of the algo
| was a fantastic finale (Venom of the Red Lotus), and it made me
| realize I'm kinda like this too. I already knew the best few
| episodes.
|
| I don't remember _every_ show, and not always by name, but I
| think I remember the best episodes of shows I would bother
| discussing with friends.
| pocketarc wrote:
| Thank you for the kind words, and for helping me get an idea of
| how my writing is coming across, I appreciate it a lot!
| m3kw9 wrote:
| It's always comparing to the same episode. Not sure if that's by
| design, but it made it feel stale, or not fun. You also need to
| show a progress bar if you're going to compare episodes across
| seasons, but I'd stick to comparing episodes within a single
| season instead.
| pocketarc wrote:
| You're right about doing it by season, that would definitely
| make it a lot easier to just jump in and start, without making
| a big commitment to ranking a whole show.
| timhh wrote:
| You can do much better than just sorting. The simplest option is
| Bradley-Terry. It's a very simple algorithm that lets you
| combine results from multiple users and gives an actual rating
| rather than just a ranking.
|
| It also handles the probabilistic nature of sorting better.
| Traditional sorting algorithms rely on comparisons being
| transitive (a>b and b>c implies a>c), but you probably won't
| get that if you use people.
|
| I explained it here:
|
| https://stats.stackexchange.com/a/131270/60526
|
| Quite closely related to matchmaking in computer games.
|
| I remember there was a website a while ago that used pairwise
| comparison to rank programming languages and I think whiskey.
| Does anyone remember this? I could never find it again.
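
A minimal Python sketch of fitting Bradley-Terry strengths from (winner, loser) votes, using the classic iterative MM (Zermelo) update. The model assumes P(i beats j) = p_i / (p_i + p_j), and ratings are reported as log(p_i). As a simple stand-in for the regularisation timhh mentions (an assumption, not his exact method), every item also gets one virtual win and one virtual loss against a reference item of strength 1, which keeps ratings finite for items that never lost or never won. bradley_terry and prior_games are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bradley_terry(items: List[str], results: Iterable[Tuple[str, str]],
                  iterations: int = 200, prior_games: float = 1.0) -> Dict[str, float]:
    """Fit Bradley-Terry strengths from (winner, loser) pairs; return log-strengths."""
    wins: Dict[str, float] = defaultdict(float)
    games: Dict[Tuple[str, str], float] = defaultdict(float)
    for winner, loser in results:
        wins[winner] += 1
        games[(winner, loser)] += 1
        games[(loser, winner)] += 1

    p = {i: 1.0 for i in items}
    for _ in range(iterations):
        new_p = {}
        for i in items:
            # Real wins plus the virtual wins against the reference item.
            numerator = wins[i] + prior_games
            # Virtual games (prior_games wins + prior_games losses) vs strength 1.0.
            denominator = 2 * prior_games / (p[i] + 1.0)
            for j in items:
                n_ij = games.get((i, j), 0.0)
                if j != i and n_ij:
                    denominator += n_ij / (p[i] + p[j])
            new_p[i] = numerator / denominator
        p = new_p
    return {i: math.log(p[i]) for i in items}
```

With no votes at all, every rating is 0 (all probabilities 50-50), which is the starting point pocketarc asks about.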
| pocketarc wrote:
| Wow, this is extremely helpful, I had no idea this existed and
| will have to read up on it properly.
|
| I think my main concern would be: What would it be like for the
| first user to try to rank a show (as was the case for everyone
| today)? All probabilities would be 50-50, no? But if it's a
| show that's already been ranked at least once, then this could
| help immensely, if I understand correctly.
| timhh wrote:
| Yeah, due to the regularisation they all start at the same
| rating. But you don't need many votes to start getting good
| ratings.
|
| I introduced this method to Dyson for objectively calculating
| very subjective measurements (e.g. "how frizzy does this hair
| look?"). We basically crowd sourced it to other engineers.
|
| I did a load of studies on different methods by ranking
| something that's sort of hard to rank but you know the answer
| to - I used 10 grey squares that only differed by 2/255 and
| you had to pick the brighter one.
|
| Some other things:
|
| 1. I don't remember the exact details but there's a slight
| extension of the method where you give each user a "how good
| are you" coefficient that you simultaneously solve for. This
| helps eliminate people that vote randomly, and also inverts
| the votes of people that deliberately pick the wrong answer
| (as long as they're consistently wrong).
|
| 2. You can put confidence limits on the values very easily
| too since it's a MAP estimate. Actually I showed curves for
| each item - basically how does the model probability vary as
| you sweep one rating up and down a bit. People didn't
| understand it at all though.
|
| 3. You can calculate the rankings incrementally very quickly
| (details in the answer) which means you can show users
| comparisons that give the most information. This usually
| means you end up showing users endless difficult choices
| which can frustrate them, especially if it's a forced choice.
|
| 4. I never found a principled way to incorporate a "they look
| the same" option. I tried some ad hoc methods and IIRC a
| "much better, slightly better, can't tell, slightly worse,
| much worse" scale gave the fastest convergence, but it was
| pretty unsatisfying that I just used an ad hoc method to
| combine the results.
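
Point 3 can be illustrated with a simple heuristic in Python: given current log-strength ratings, pick the unasked pair whose predicted outcome is closest to a coin flip, since that is where an answer is least predictable. This is an assumption for illustration, not necessarily the criterion timhh used, and as he notes, it tends to serve up the hardest choices. most_uncertain_pair is a hypothetical name.

```python
import itertools
import math
from typing import AbstractSet, Dict, Optional, Tuple


def most_uncertain_pair(ratings: Dict[str, float],
                        already_asked: AbstractSet[Tuple[str, str]] = frozenset()
                        ) -> Optional[Tuple[str, str]]:
    """Return the unasked pair whose predicted win probability is closest to 0.5.

    `ratings` are Bradley-Terry log-strengths, so
    P(a beats b) = 1 / (1 + exp(r_b - r_a)).
    """
    best_pair: Optional[Tuple[str, str]] = None
    best_gap = math.inf
    for a, b in itertools.combinations(sorted(ratings), 2):
        if (a, b) in already_asked or (b, a) in already_asked:
            continue  # this user has already answered the pair
        p_a_wins = 1.0 / (1.0 + math.exp(ratings[b] - ratings[a]))
        gap = abs(p_a_wins - 0.5)
        if gap < best_gap:
            best_pair, best_gap = (a, b), gap
    return best_pair
```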
|
| It was all closed source and I haven't worked there for years
| so the code is lost to the wind unfortunately.
| pocketarc wrote:
| This is honestly very interesting, thank you so much for
| elaborating! To be fair, after today, there are now nearly
| 400 TV shows with votes, so I can start seriously looking
| into this very soon!
| michaelrpeskin wrote:
| I did something similar; in fact, the math may be the same
| thing, just expressed differently. When I've had to rank
| non-transitive things, I use Elo
| (https://en.wikipedia.org/wiki/Elo_rating_system)
|
| Many years ago when I was a mid-level developer at a
| dysfunctional company, I was senior enough to be invited to
| some "strategy" meetings, but junior enough that no one ever
| listened to me. We (engineering, sales, marketing, etc.) spent
| nearly an entire summer bickering over what "important"
| features we were going to schedule next. I finally got fed up,
| took everything out of the ticketing system, made random
| pairings, and had people vote on them. Then, just like a chess
| match, I updated each feature's Elo score based on the outcome.
| I had anyone who cared play match-ups for as long as they
| wanted. We ended up getting a decent ordering of features and
| finally ended the summer of hell meetings.
|
| I don't know if the order was the correct order, I didn't stay
| around long enough to see. I was just happy that sales and
| marketing folks thought that I had some magic math that solved
| their problem, and I was happy to be back developing and not
| sitting in useless meetings.
|
| What I like about this is that you don't have to be
| self-consistent: as long as on average you pick the best, it
| will bubble to the top. And you can mix the results of other
| voters and see what the "true" winner is. (Of course, to be
| fair, you have to give each person the same number of
| match-ups. In my case, I just served match-ups to anyone who
| wanted to sit at the terminal and vote, so someone could have
| wasted an entire day and overwhelmed the system. I didn't care
| at the time.)
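
A minimal Python sketch of the Elo update described above: the expected score of A against B is 1 / (1 + 10^((R_b - R_a) / 400)), and the winner gains K * (1 - expected) points while the loser loses the same amount. K = 32 and the feature names in the test are assumptions; expected_score and elo_update are hypothetical helpers.

```python
from typing import Dict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(ratings: Dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one head-to-head vote in place; K = 32 is a common choice, assumed here."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain  # zero-sum: total rating points are conserved
```

Because an upset against a higher-rated item moves more points than an expected win, occasional inconsistent votes get averaged out over many match-ups.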
| throwaway143829 wrote:
| I'm both a casual TV viewer and someone with a short attention
| span. My experience was: I tried comparing a few Seinfeld
| episodes, found that I am unable to recall the episodes from the
| descriptions, then gave up.
| pocketarc wrote:
| Yeah, this thing requires a decent time commitment as it
| stands.
|
| One of the ideas floating around is to make it so you do this
| by season instead, to make it a bit more "quick casual fun",
| rather than having to rank the entire show all at once. Scoring
| 180 Seinfeld episodes as a casual TV viewer isn't going to be a
| great experience.
|
| Honestly, HN feedback has been immensely helpful, I couldn't be
| more thankful.
| tinyspacewizard wrote:
| There should be a way to exclude seasons (e.g. Simpsons)
| elektor wrote:
| This is pretty neat! For quite some time I've yearned for a tool
| like this to be able to rank my favorite songs of an artist
| (embedded Spotify 30 second clips?) and other custom media like
| comic strips.
| pocketarc wrote:
| Time for me to snatch musicsort.com or something. ;-)
|
| That would honestly be fun, great call!
| tnecniv wrote:
| I'm curious: what are your picks for the best Frasier episodes?
|
| Two of mine are when they become illegal caviar dealers and when
| Niles wants to try weed. The episodes with Lilith tend to also be
| very good. People love the Valentine's Day one, but I've never
| been a fan of that style of episode. I can appreciate the genius
| in it, though.
| pocketarc wrote:
| "Roe to Perdition" is the caviar one, and it's fantastic as
| well (#20 for me). Niles trying weed was #1 for me, "High
| Holidays". For me, #2 and #3 were "The Doctor Is Out" (Frasier
| getting involved with Patrick Stewart) and "Out with Dad"
| (where Martin and Frasier go to the opera). The
| misunderstandings are what does it for me!
|
| Edit: But I'd honestly say that my whole top 50 or so are great
| episodes; there's not a big gap between the top and any of
| them. It was hard ranking them all.
|
| Also, my list is at:
| https://tvsort.com/show/3452/matrix_01hjtxz2e1ewkrh44ja3mz0s...
| tnecniv wrote:
| All great choices! "Dog army? What do you think that
| means?" from "High Holidays" is an inside joke between me
| and my best friend.
|
| I also agree that the ranking is hard. A remarkable thing
| about the show is how it's consistent throughout its run and
| doesn't really fall off in quality despite going for over a
| decade.
| pocketarc wrote:
| Frasier's "dear god" when he first sees goth Freddy as well
| is just terrific.
|
| "Well, thank you Lilith, for mentioning this little
| development!"
| xlbuttplug2 wrote:
| I'd assume this doesn't work too well with shows that aren't
| episodic in nature. Especially if you binge watch, the line
| between when one episode ends and the next starts is usually
| blurred.
| brucethemoose2 wrote:
| Highly serial shows definitely have standout episodes,
| strong/weak seasons and such.
|
| But I can see how this could be hard to remember.
| samstave wrote:
| This is super interesting, as over the break, I was thinking of
| something exactly like this, but for video games.
|
| Feel free to steal the following, if anyone likes:
|
| Take a scrape of video game data from multiple sources, such as
| Steam, Amazon, game sales, and forum sizes on Reddit, then rank
| the games based on these metrics, but then have people vote for
| a "hall of fame".
|
| Include as much historic data on game sales as possible, for all
| time, since so many games were introduced during our formative
| years and thus have a deeper, more memorable impact.
|
| --
|
| I see the problem others state. Perhaps have a random button to
| just give you a new selection.
|
| Great work though.
| brucethemoose2 wrote:
| The problem with multi user title voting is that it becomes a
| popularity contest, not an ostensible quality ranking. Steam
| itself has spent years trying to address this.
|
| TV episodes don't have that problem because each user has
| viewed all (or at least a sequence of) the episodes.
| samstave wrote:
| Good point, but I guess the real issue with video games vs.
| shows is that shows are passive and games are active, so your
| experience with a game is going to be way different than with a
| show. You don't need hand-eye coordination for Seinfeld.
|
| :-)
| dcreater wrote:
| I think it'd be much better to do this show vs. show.
| freetonik wrote:
| Wanted to say the same. I watch a lot of TV, but can barely
| recall particular episodes by name or even some screengrabs.
| But I'd be curious to see how my taste in shows themselves
| compares to other people's
| pocketarc wrote:
| Definitely an idea for the future, to expand this beyond just
| TV episodes!
| pocketarc wrote:
| I've just deployed a change that randomises selections -
| hopefully it helps address the biggest concern raised here so
| far. I'm curious to find out how people feel about it compared to
| the previous way it was doing things.
| bimblesticks wrote:
| Trying to rank the IT Crowd right now and the episodes keep
| refreshing before I click anything -
| https://tvsort.com/show/2490/matrix_01hk2y2csmeg1tdczs12wbz6...
|
| Am using Firefox with uBlock Origin if that affects anything
| pocketarc wrote:
| Thanks for the heads up! I shouldn't have tried to rush this
| while this is under heavy use - I've reverted the change, and
| it should be OK now!
| pattle wrote:
| This is neat. See also https://brickelo.com for the same thing
| but with LEGO minifigures
| maxlin wrote:
| This seems to be somehow broken for me. It auto-skips everything
| in about one second. If a time limit is intended, it should be
| 5 seconds at the very minimum.
| pocketarc wrote:
| That's what I get for trying to rush a deployment during this
| HN period. ;) I've reverted the change, and it should be OK
| now!
| t_mann wrote:
| FYI, on my first try, one of the two episodes was always the
| same; on my second try, it kept reloading two new episodes
| before I could make a selection, in an infinite loop.
| pocketarc wrote:
| Sorry about that! Had to do with the latest deployment. It's
| okay now!
| bequanna wrote:
| Very cool!
|
| How are you getting/using IMDb data? I thought the API was locked
| down (?)
| pocketarc wrote:
| I wrote a blog post about it[0], but basically, I use the TMDB
| API to get all shows and episodes, which is free and has a
| generous rate limit.
|
| For the episode descriptions, I grab the plot summaries from
| IMDB and Wikipedia, with just HTML scraping, no APIs, and feed
| them to an LLM to get the 3 main plot points, so you don't have
| to read a bunch of rambling text when trying to quickly assess
| the episode you're looking at.
|
| [0]: https://pocketarc.com/posts/tv-sort-engineering-the-ultimate...
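
A hedged Python sketch of the TMDB side: fetching one season and keeping only what an episode card needs. The endpoint path (/3/tv/{series_id}/season/{season_number}), the bearer-token header and the field names (episodes, season_number, episode_number, name, overview) are assumed from TMDB's v3 API documentation; verify them there before relying on them. The function names are hypothetical, and only the URL building and parsing can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Dict, List

TMDB_API = "https://api.themoviedb.org/3"


def season_url(series_id: int, season_number: int) -> str:
    """TMDB v3 "TV season details" endpoint (path assumed from the docs)."""
    return f"{TMDB_API}/tv/{series_id}/season/{season_number}"


def fetch_season(series_id: int, season_number: int, read_token: str) -> Dict[str, Any]:
    """Download one season's JSON (needs network access and a TMDB API read token)."""
    request = urllib.request.Request(
        season_url(series_id, season_number),
        headers={"Authorization": f"Bearer {read_token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def episodes_from_season(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep just the fields needed to show an episode card."""
    return [
        {
            "season": ep.get("season_number"),
            "episode": ep.get("episode_number"),
            "title": ep.get("name", ""),
            "overview": ep.get("overview", ""),
        }
        for ep in payload.get("episodes", [])
    ]
```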
| kriro wrote:
| I like pubmeeple for ranking board games. You can import your BGG
| collection and it simply does pairwise comparison and presto, you
| get a top X list sorted. Simple but works very well. I think they
| also support TV shows and the like but maybe you have to input
| your own list.
| thih9 wrote:
| Congrats on the launch!
|
| Curious: how are you getting the data from e.g. TMDB? Was it a
| one time download or are you refreshing it?
|
| Feedback: Sometimes I didn't watch the latest season of a tv
| show; at the moment I'm being asked to rate episodes from all
| seasons. I'd like to rate episodes of a single season or up to a
| certain season. Alternatively: an option to skip an episode, or
| flag that I haven't seen it.
| pocketarc wrote:
| Thank you! The data from TMDB is cached when someone starts
| ranking a show for the first time. At the moment there is no
| system for refreshing it, but that's on the to-do list, so that
| if there are improvements to the data, they'll be fetched.
|
| Thanks for the feedback! I love the idea of an "I didn't watch
| it" option; that's super important. Maybe that could drop the
| episode out of the list entirely (since it can't count for
| anything).
|
| The "rate a single season" idea is one of the main things to
| come out of today, and it's where I'm going to take this next.
| When you land on a show, instead of a single "start ranking"
| button you'll have a list of all the seasons in the show, and
| be able to rank them individually. And since all these
| comparisons are stored in your browser, I can make it count
| toward your "full show" ranking automatically, so that if you
| ever get to that, you'll already have it in progress.
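
One way the "count toward your full show ranking" idea could work, sketched in Python under stated assumptions: every answer is cached under an unordered pair key, so a later full-show sort reuses season-level answers and only asks about pairs it has never seen. ComparisonStore and rank are hypothetical names, and the JSON dump/load methods merely stand in for persisting answers in the browser (for example in localStorage).

```python
import json
from functools import cmp_to_key
from typing import Callable, Dict, List, Optional


class ComparisonStore:
    """Hypothetical answer cache keyed by unordered pair."""

    def __init__(self, answers: Optional[Dict[str, str]] = None) -> None:
        self.answers: Dict[str, str] = dict(answers or {})

    @staticmethod
    def _key(a: str, b: str) -> str:
        first, second = sorted((a, b))
        return f"{first}|{second}"

    def better(self, a: str, b: str, ask: Callable[[str, str], bool]) -> bool:
        """Return whether a beats b, asking only if this pair was never answered."""
        key = self._key(a, b)
        if key not in self.answers:
            self.answers[key] = a if ask(a, b) else b  # remember the winner
        return self.answers[key] == a

    def dump(self) -> str:
        return json.dumps(self.answers, sort_keys=True)

    @classmethod
    def load(cls, text: str) -> "ComparisonStore":
        return cls(json.loads(text))


def rank(episodes: List[str], store: ComparisonStore,
         ask: Callable[[str, str], bool]) -> List[str]:
    """Best-first ordering that reuses every answer already in `store`."""
    return sorted(episodes, key=cmp_to_key(lambda a, b: -1 if store.better(a, b, ask) else 1))
```

Ranking a season first and then the whole show with the restored store never re-asks a pair from that season.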
___________________________________________________________________
(page generated 2024-01-01 23:01 UTC)