[HN Gopher] Implementing the Elo Rating System (2020)
___________________________________________________________________
Implementing the Elo Rating System (2020)
Author : pmontra
Score : 88 points
Date : 2021-02-12 16:55 UTC (6 hours ago)
(HTM) web link (mattmazzola.medium.com)
(TXT) w3m dump (mattmazzola.medium.com)
| vmception wrote:
| Nice use for gaming
|
| Crowd-behavior ranking systems should be banned for human
| relationships, where the person doing the ranking didn't intend
| to send a message to the crowd
| noir_lord wrote:
| Glicko2 is a better system.
|
| https://en.wikipedia.org/wiki/Glicko_rating_system
| andrewzah wrote:
| There's also the TrueSkill rating system from Microsoft
| (developed for Halo), which claims to be better than Glicko:
|
| "Glicko was developed as an extension of ELO and was thus
| naturally limited to two player matches which end in either win
| or loss. Glicko cannot update skill levels of players if they
| compete in multi-player events or even in teams. The logistic
| model would make it computationally expensive to deal with team
| and multi-player games. Moreover, chess is usually played in
| pre-set tournaments and thus matching the right opponents was
| not considered a relevant problem in Glicko. In contrast, the
| TrueSkill ranking system offers a way to measure the quality of
| a match between any set of players."
|
| https://www.microsoft.com/en-us/research/project/trueskill-r...
| HeyImAlex wrote:
| I noticed trueskill is pretty effective at matching people
| together, but the stickiness of rankings makes it frustrating
| if it's user facing. People want to see the number next to
| their name change after 4 consecutive wins.
| postalrat wrote:
| I think most people would want to see a significant boost
| if you won 40 out of 50 games. Winning 4 out of 4 doesn't
| mean much.
| monkeybutton wrote:
| First, a good article about TrueSkill I found when looking
| for systems to rank pairs of foosball players:
| http://www.moserware.com/2010/03/computing-your-skill.html
|
| Second, apparently Tinder uses an ELO ranking system for
| their algorithm. I would add a link but everything I'm
| finding online right now is very clickbait-y.
| bmm6o wrote:
| I'd be interested to read more about Tinder's application.
| Off the top of my head, I don't see how it would apply to a
| dating app.
| monkeybutton wrote:
| >What was it, though? It was a part of our algorithm that
| considered how others engaged with your profile. ... this
| part of our algorithm compared Likes and Nopes, and was
| utilized to show you potential matches who may be a fit
| for you, based on similarities in the way others would
| engage with profiles. Based on those profile ratings you
| received, there was a "score" -- in the sense that it was
| represented with a numeric value in our systems so that
| it could factor into the other facets in our algorithm.
|
| From: https://blog.gotinder.com/powering-tinder-r-the-
| method-behin...
|
| It's not a deep description at all, but it sounds like
| each view of your profile is treated as a game: which way
| they swipe is a win or loss for you, and your change in
| score depends on their score as well? Basically if a
| desirable person swipes right on your average profile, it
| indicates you are desirable and moves you up more than,
| say, the inverse scenario where an average profile swipes
| right on a desirable profile (I guess this would be akin
| to a low-ranked player losing to a high-ranked player).
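|
| A minimal sketch of that reading (assuming a standard
| logistic Elo update; the K value and function names are
| made up for illustration, not Tinder's actual code):
|
|     def expected_score(r_a, r_b):
|         # standard logistic Elo: chance that A "beats" B
|         return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
|
|     def rate_swipe(profile, swiper, liked, k=32):
|         # one swipe is one game for the profile being viewed:
|         # a right-swipe (Like) is a win, a Nope is a loss
|         outcome = 1.0 if liked else 0.0
|         e = expected_score(profile, swiper)
|         return profile + k * (outcome - e)
|
|     # a 1600-rated user liking a 1400-rated profile moves it
|     # up more than a 1200-rated user liking it would
|     print(rate_swipe(1400, swiper=1600, liked=True))  # ~1424
|     print(rate_swipe(1400, swiper=1200, liked=True))  # ~1408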
|
| Edit: I guess the outcome would be people of equal (as
| perceived by others) attractiveness being matched.
| uxp100 wrote:
| Is the issue multiple players or is it teams? I didn't have
| any problem implementing ELO for multiple players in games
| where there can be multiple losers but one winner. I modeled
| it as simultaneous wins for the winner, with a K-factor
| divided by the number of players.
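|
| Roughly like this (a sketch using the standard logistic
| expected-score formula; the K value is just illustrative):
|
|     def expected(r_a, r_b):
|         return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
|
|     def update_ffa(ratings, winner, k=32):
|         # treat the result as the winner simultaneously
|         # beating every other player, with K divided by the
|         # number of players
|         k_eff = k / len(ratings)
|         new = dict(ratings)
|         for p, r in ratings.items():
|             if p == winner:
|                 continue
|             e = expected(ratings[winner], r)
|             new[winner] += k_eff * (1.0 - e)
|             new[p] -= k_eff * (1.0 - e)   # loss vs the winner
|         return new
|
|     pod = {"a": 1000, "b": 1000, "c": 1100, "d": 900}
|     print(update_ffa(pod, winner="a"))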
|
| But I wasn't concerned with being mathematically rigorous;
| maybe there is some glitch with this approach? It was for an
| EDH group. (BTW, don't do this. We no longer play EDH. It
| makes the spiky players even spikier, and then they get
| saltier when they're targeted as they have a far higher ELO)
| reificator wrote:
| > _It was for an EDH group. (BTW, don't do this. We no
| longer play EDH. It makes the spiky players even spikier,
| and then they get saltier when they're targeted as they
| have a far higher ELO)_
|
| Yes please keep tournaments and rankings away from EDH for
| the love of god. CEDH is a different story because it's a
| completely different mindset, but if I sat down for normal
| EDH and was told we were tracking Elo I'd run far, far
| away.
| uxp100 wrote:
| It was mainly to track the strength of the different decks we
| all played, since you could check the ELO of either. It was the
| strongest player who was upset by it, since once people
| realized he won 50% of 3 and 4 player games he was always
| targeted. But yeah, I wouldn't do it again.
| leetcrew wrote:
| I think the issue is that, outside of formal tournaments,
| there are rarely stable teams competing against each other
| in typical online multiplayer. Each team is a random
| assortment of players, where anywhere from zero to a full
| team are queuing together. People might queue with a friend
| of a very different skill level. You need to track the
| skill level of each individual player and then estimate the
| win probability for a team of players who may never have
| played with each other before. I play a lot of CSGO, and
| while far from perfect, I'm surprised how good the
| matchmaking is. I win pretty close to 50% of my matches.
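|
| A naive way to turn per-player ratings into a team forecast
| (a sketch only; this is just one option, certainly not what
| CSGO's matchmaker actually does):
|
|     def expected(r_a, r_b):
|         return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
|
|     def team_win_prob(team_a, team_b):
|         # compare average individual ratings and feed the
|         # difference through the usual logistic curve
|         avg_a = sum(team_a) / len(team_a)
|         avg_b = sum(team_b) / len(team_b)
|         return expected(avg_a, avg_b)
|
|     print(team_win_prob([1800, 1500, 1400, 1300, 1200],
|                         [1600, 1500, 1500, 1400, 1300]))  # ~0.47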
| porphyra wrote:
| For certain types of multiplayer games the recent Elo-MMR is
| even better (faster, provable guarantees) [1][2].
|
| Quote from the paper regarding TrueSkill:
|
| > The main disadvantage of TrueSkill is its complexity:
| originally developed by Microsoft for the popular Halo video
| game, TrueSkill performs approximate belief propagation on a
| factor graph, which is iterated until convergence. Aside from
| being less human-interpretable, this complexity means that,
| to our knowledge, there are no proofs of key properties such
| as runtime and incentive alignment. Even when these
| properties are discussed, no rigorous justification is
| provided. In addition, we are not aware of any work that
| extends TrueSkill to non-Gaussian performance models, which
| might be desirable to limit the influence of outlier
| performances.
|
| [1] https://github.com/EbTech/Elo-MMR
|
| [2] https://arxiv.org/abs/2101.00400
| andrewzah wrote:
| It is fascinating to see improvement in this space.
|
| People -really- care about good ranking systems. I
| personally saw this when I administrated rankings for a
| local fighting game scene. Back then we used Glicko since
| matches were 1-on-1.
| meheleventyone wrote:
| TrueSkill is also under patent, which is considerably more
| off-putting than its rigor.
| bluecalm wrote:
| Depends what you mean by better. I agree Glicko approximates
| your true skill level better and faster. It's also not as fun
| as ELO in my view. With ELO when I win a few games in a row I
| get an exciting chance to play against much better players than
| usual. This is a fun challenge. It's necessary to learn new
| things. It gives you highs and lows. It's just fun.
|
| With Glicko, once it pinpoints your true skill level, it feels
| like 2 meters of mud around you. You will never get out of it
| and you will be forever matched vs the same group of players
| stuck near you.
|
| So yeah, I don't like Glicko. I have quit services over it
| (even something like a chess problem solving website). It's just
| not fun imo.
| bitshiftfaced wrote:
| It sounds like it's not that you don't like Glicko but rather
| you'd prefer more variety in matchmaking. You can accomplish
| that easily by changing the way matchmaking picks your
| opponent.
| porphyra wrote:
| Glicko2 has a fatal flaw for massively multiplayer games such
| as Pokemon Go. Basically, by intentionally losing certain
| games, you can increase the volatility in your rating, and then
| you can win a few games in a row against low ranked players and
| leapfrog people who are actually consistently trying to win.
|
| See:
|
| [1] Farming Volatility: How a major flaw in a well-known rating
| system takes over the GBL leaderboard.
| https://www.reddit.com/r/TheSilphRoad/comments/hwff2d/farmin...
| bitshiftfaced wrote:
| This further strengthens my belief that Glicko 1 is far
| preferable to Glicko 2. It's simpler (no volatility
| parameter), nearly as predictive (if not just as predictive),
| and it wasn't specifically designed for a monthly chess
| schedule, so it fits online games much more easily.
| syspec wrote:
| Does this flaw only apply to massively multiplayer games?
| carbocation wrote:
| Not sure that I've seen this addressed, but I view Elo as an
| approximation to a PageRank-like stationary distribution. Have
| others thought about this more formally?
| hogFeast wrote:
| As someone else has mentioned, the time-dependent nature is
| actually pretty important because PageRank (and there are a few
| others similar to this, Massey ranking is one...I can't
| remember them all) depends on every team playing each other
| team at some point and/or playing each other a certain number
| of times.
|
| So, in practice, stuff like PageRank is fairly brittle, and ELO
| tends to work better.
|
| I also don't think PageRank is an approximation to ELO.
| PageRank scores are just eigenvector components, so (I
| believe) rankings are proportional to each other. This
| happens with ELO but to a lesser degree, because you aren't
| necessarily looking at all the results for all other teams
| at all times. A lot of information is embedded into an ELO
| rating, but you are updating match-by-match (I believe; I
| have done a lot of work with ELO and learned how to modify
| it, so I understand it well...I have far less experience
| with the matrix-based methods). So the practical advantage
| of ELO is actually a theoretical one, imo.
|
| The best way to think about ELO is Bayesian updating: you start
| with a prior about team skills, you use this to create a
| forecast, and then update your ratings based on the actual
| result. Comparisons are Kalman filters, Markov Chains or MCMC,
| etc.
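|
| In code, that prior/forecast/update loop is tiny (a sketch
| of vanilla ELO with an illustrative K-factor):
|
|     def forecast(r_home, r_away):
|         # prior ratings -> forecast of a home win
|         return 1.0 / (1.0 + 10 ** ((r_away - r_home) / 400.0))
|
|     def update(ratings, home, away, home_won, k=20):
|         # move each rating toward the observed result, in
|         # proportion to how surprising the result was
|         p = forecast(ratings[home], ratings[away])
|         err = (1.0 if home_won else 0.0) - p
|         ratings[home] += k * err
|         ratings[away] -= k * err
|
|     ratings = {"A": 1500, "B": 1500}
|     update(ratings, "A", "B", home_won=True)  # A +10, B -10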
|
| I will add that ELO is very powerful. Yes, Glicko is better
| (ELO is a special case of Glicko), but ELO is also far
| simpler. Simple ELO models beat complex regression models
| most of the time; it is remarkable.
|
| Imo, it is the Bayesian-esque updating that works so well. And,
| if you understand what ELO is at this level, you can split this
| part of the model off and use it with regression or whatever
| you want. It is truly amazing though.
| martin_balsam wrote:
| Super nit-picky comment, but it's Elo rating, not ELO. It's
| not an acronym; it's named after the physicist Arpad Elo.
| morelandjs wrote:
| Elo is essentially a time-dependent graph algorithm. In a
| static scenario, Elo probably converges to something like
| PageRank, but I'm sure there are differences. It is worth
| mentioning that PageRank has been applied to sports in a
| similar manner to Elo.
| nextos wrote:
| Microsoft Research developed TrueSkill [1], where they frame
| the problem as a Bayesian inference one.
|
| Incidentally, TrueSkill is implemented in Infer.NET, which IMHO
| doesn't get the attention it deserves. An open-source state-of-
| the-art factor graph engine with amazing performance. When
| dealing with many kinds of problems, including those involving
| discrete variables, I haven't found anything that compares in
| terms of speed and convergence.
|
| [1] https://www.microsoft.com/en-
| us/research/uploads/prod/2018/0...
| 1980phipsi wrote:
| Probably could use something like this for looking at sports
| teams.
| oscargrouch wrote:
| > TrueSkill is patented,[3] and the name is trademarked,[4]
| so it is limited to Microsoft projects and commercial
| projects that obtain a license to use the algorithm.
|
| Capitalism sometimes leads to stupid outcomes like this big
| issue with "what is property".
|
| Imagine if physicists patented their equations, forcing
| universities, books, movies, wikipedia, etc.. all to pay for
| the privilege of using their equations. The world would still
| be in the dark ages with just a privileged social class with
| access to advanced knowledge.
|
| Not to mention this bars other human beings from a totally
| valid shot at arriving at the same result while trying to
| solve this problem, given they are using a math toolbox that
| is generally available to anyone.
|
| Corporations should have the right to exploit something they
| worked on, of course, but look at this example, where
| Microsoft has already implemented and still profits from its
| invention/discovery.
|
| Using the legal system, and not creativity, to stop everyone
| else from progressing with not just their knowledge but
| knowledge that can come from anywhere is pure stupidity, as
| it does not serve any collective good.
|
| We are in the middle of the COVID outbreak right now;
| imagine if this sort of adversarial view were prominent in
| the world. How many more people would die if basic
| techniques for making vaccines were patented?
|
| This is just utterly stupid and it's another proof that the
| system is broken.
| rland wrote:
| We don't need to imagine:
|
| https://www.doctorswithoutborders.org/what-we-do/news-
| storie...
|
| > So far, pharmaceutical corporations and other
| manufacturers of products needed to combat COVID-19 have
| not shown any willingness to take a different approach
| during the pandemic to ensure the necessary broad access to
| needed products
| oscargrouch wrote:
| > Furthermore, there has been an astonishing number of
| patents filed for COVID-19 vaccines in development,
| including more than 100 for the mRNA platform technology
|
| I hope there's someone with enough intelligence out there
| to understand that if there's a time to rethink the
| rules, the time is now.
|
| Even for people who are only thinking in terms of
| profits and the economy: if too many people die, there
| will be no economy at all.
|
| People at the top of social structures (government,
| corporations, etc.) are working like Pavlovian dogs,
| unaware that the rules of the old normal don't apply to
| the new normality. Or it's just a matter of a corrupt
| system, organically formed around power and money, that
| is completely oblivious to the values it should stand
| for.
| [deleted]
| morelandjs wrote:
| There are a lot of similarities between classic Elo and
| Bayesian statistics. You essentially construct a prior for
| each comparison and update your prior by its deviation from
| the observation. Does TrueSkill just do this in a more
| rigorous fashion according to Bayes' theorem?
| chairmanwow1 wrote:
| One of my favorite Elo bugs I came across was a bug in a game
| called Age of Empires 2: DE.
|
| Whenever my friend and I played online together, we would
| sporadically get absolutely creamed. We started looking up the
| stats of the others players we were matched with and noticed that
| on those bad games the average Elo of our opponents would jump to
| >2k Elo vs our barely 1k.
|
| Both of us being software engineers, eventually the conversation
| turned to "What the hell do you think is going on here? How does
| a matchmaking bug like this crop up?"
|
| Eventually we came to the conclusion that they must be summing
| our Elo for the Elo of our matchmaking party rather than
| averaging it.
|
| Eventually the issue was patched and our worst suspicions were
| confirmed that indeed multiplayer party Elos were summed instead
| of averaged.
| jsight wrote:
| How would that make a difference? Or were the party sizes
| different between you and your opponents?
| chairmanwow1 wrote:
| Our opponents were queueing as individuals and were highly
| skilled, so their "party Elo" was genuinely 2k. Parties would
| be selected to take place in a match (2v2 / 3v3 / 4v4) of
| similar party Elos.
|
| We would either get paired with another couple 2k players
| against a full team of 2k (4v4 match) or in the worst case a
| 2v2 against 2k players.
|
| In the 2v2 case, because of the Elo difference the opposing
| side was expected to win >99.99% of the time.
| bitshiftfaced wrote:
| Since there are an equal number of players on both teams,
| why would it matter if the system used the average or the
| sum of the players' Elo? Wouldn't both methods have the
| same outcome?
| ivarv wrote:
| Two players of similar skill would probably get creamed by a
| team with a large skill disparity.
| jsight wrote:
| I guess, but (making up numbers here) two players with 750
| skill would be 1500 or an average of 750.
|
| A sum might match them up against two players with 1300
| skill & 200 skill due to the 1500 sum. OTOH, the average
| would match them up against... an average of 750, so the
| same people?
|
| I'm sure I'm just missing something really fundamental
| about how this all works.
| postalrat wrote:
| They shouldn't be summing or averaging. They should be using
| the max elo of the players in your party for every member of
| your party.
| bitshiftfaced wrote:
| > Eventually we came to the conclusion that they must be
| summing our Elo for the Elo of our matchmaking party rather
| than summing it.
|
| I think I'm missing something. Could you rephrase this?
| SamBam wrote:
| *rather than averaging it, I assume.
| chairmanwow1 wrote:
| Yes. Thank you. *Averaging. Thought I caught all the
| mistakes.
| hwjwill wrote:
| I also noticed it when I play AoE 2: DE, as I only play
| multiplayer games. I initially thought it was just because
| there aren't enough players, so they match us with someone
| else much higher. However, I only notice it in Quick Play,
| where ELO is not affected, rather than in Ranked games.
| TameAntelope wrote:
| The sentiment you're expressing is the idea that one 2k player
| is so much better than two 1k rated players that it doesn't
| make sense to simply sum up each team's rating and keep it
| even, which is the false assumption that the AoE 2 devs made.
| Elo is _not_ linear.
|
| However, I don't think they'd have fixed it just by averaging
| the Elo ratings; they must have implemented something more
| complex, like a proximity-based algorithm. Otherwise, if the
| teams were the same size, you'd hit a similar problem with an
| average as with a sum. A 2k player + a 0-rated player would sum
| to 2k and average to 1k. You and your friend would also sum to
| 2k and average to 1k, again assuming even team sizes.
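|
| To make that concrete (a toy sketch; the numbers are
| illustrative, not the actual AoE 2 matchmaker):
|
|     def rating_sum(party): return sum(party)
|     def rating_avg(party): return sum(party) / len(party)
|
|     duo_1k = [1000, 1000]   # two ~1k friends queuing together
|     solo_2k = [2000]        # one strong solo player
|     lopsided = [2000, 0]    # a 2k player with a fresh account
|
|     # summing: the 1k duo looks as strong as the 2k solo player
|     print(rating_sum(duo_1k), rating_sum(solo_2k))   # 2000 2000
|     # averaging (even team sizes): the 1k duo and the lopsided
|     # duo look identical, though they play very differently
|     print(rating_avg(duo_1k), rating_avg(lopsided))  # 1000.0 1000.0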
|
| That said, still very funny!
|
| Also, fun fact: Elo isn't an acronym/initialism, it's someone's
| name, Arpad Elo (the inventor).
|
| Also also, I don't _think_ Elo is really a good idea for
| multiplayer (more than 1v1 that is) games generally. Don't
| know enough to say that authoritatively, but using it for
| something like AoE 2 is likely prone to all kinds of strange
| unexpected outcomes.
| jerf wrote:
| "The sentiment you're expressing is the idea that one 2k
| player is so much better than two 1k rated players that it
| doesn't make sense to simply sum up each team's rating and
| keep it even, which is the false assumption that the AoE 2
| devs made. Elo is not linear."
|
| It's even better than that. Elo is purely relative. The
| difference in score between two opponents is what is supposed
| to predict the likelihood of victory for the two. This means
| that you can add any constant you like to the entire Elo pool
| without affecting the Elo rankings at all.
|
| As a matter of aesthetics, when initializing an Elo pool it
| is nice to try to start with values that will result in
| nobody having a negative value, because that gives humans the
| feelbadz, but that's not mathematically necessary; you can
| run the whole thing deep in the negative billions if that
| floats your boat.
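|
| A two-line illustration (standard logistic formula, numbers
| arbitrary):
|
|     def expected(r_a, r_b):
|         # only the difference r_a - r_b matters
|         return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
|
|     print(expected(1600, 1400))                  # ~0.76
|     print(expected(1600 - 10**9, 1400 - 10**9))  # identical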
|
| The Elo math is very much based on two players playing head-
| to-head. It would be completely broken for almost any other
| scenario, as multiplayer introduces new degrees of freedom
| that Elo has nowhere to put. By that I mean that there may be
| one game for which the team's skill is essentially the max of
| the members, another for which it is essentially the minimum,
| and a wide variety of things in between. Elo has nowhere to
| put that "wide variety", and it was never designed or
| intended to do so, so that's not a criticism of it. It just
| isn't what it was for.
| wongarsu wrote:
| Overwatch implements an Elo system for 6v6 competitive
| multiplayer. Over the years they've had to make a lot of
| tweaks to make it kind of work. For example groups are rated
| by average Elo, but the system doesn't allow you to cause
| large Elo differences within a team. Also it differentiates
| between "premade" groups and groups assembled by matchmaking,
| since the former are assumed to be more coordinated. It also
| tries to be really smart about how Elo points get distributed
| within the team on win/loss (which causes a heap of other
| problems).
|
| Overall you can make it somewhat work, but Elo really isn't
| well suited for anything larger than 1v1.
| lordnacho wrote:
| Seems to me that you need to use the max of the team rather
| than the average. The better player can tell the less good
| player what to do, up to a point.
| maybeOneDay wrote:
| This is what Rocket League uses. While it can be unpleasant
| as a high-ranked player to jump into a lobby with your much
| less skilled friend, overall the matchmaking system and
| equality of matches are in my opinion utterly fantastic.
| jerf wrote:
| I'd expect "geometric average" to be one of the closest
| simple statistics for most games, assuming that you have
| some ranking that can be treated linearly.
|
| It is generally unclear that rankings can be treated
| linearly. A subtle aspect of Elo is that as the disparity
| between two players increases, the Elo metric essentially
| becomes less and less certain that there is any
| relationship between the skill of the players at all, which
| is probably a deeper way of understanding "why".
| Eventually you'll reach a point where if you round to
| integers, the winner winning will result in an increase of
| 0 points. (In fact one thing to watch out for is how you
| round; it's easy to end up with a wandering Elo basis if
| you truncate both sides rather than rounding properly, for
| instance.)
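|
| A quick way to see the wandering basis (a sketch; the
| rounding choice and K value are illustrative):
|
|     import math, random
|
|     def expected(r_a, r_b):
|         return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
|
|     def play(ratings, a, b, a_won, k=32, rounder=math.floor):
|         # update both sides independently and push each new
|         # rating through `rounder`; with floor/truncation the
|         # two deltas no longer cancel exactly
|         e = expected(ratings[a], ratings[b])
|         s = 1.0 if a_won else 0.0
|         ratings[a] = rounder(ratings[a] + k * (s - e))
|         ratings[b] = rounder(ratings[b] + k * ((1 - s) - (1 - e)))
|
|     random.seed(0)
|     ratings = {i: 1500 for i in range(10)}
|     for _ in range(10000):
|         a, b = random.sample(list(ratings), 2)
|         play(ratings, a, b, a_won=random.random() < 0.5)
|     # with math.floor the total drifts well below 10 * 1500;
|     # with round() it stays close to the original sum
|     print(sum(ratings.values()))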
|
| It's fairly trivial to show that if you take a pool of the
| worst players and the best players, Elo will eventually
| converge to those two pools having about that amount of
| distance between them, but by inserting a fresh pool of
| middling players into the pool, the distance between the
| best & the worst will increase despite their skill not
| changing. If skill is linearly distributed... assuming you
| can even define _that_... Elo ratings might be somewhat
| reasonably treated linearly, but skill is very unlikely to
| be linearly distributed.
|
| Basically, I like Elo when used properly and I have used it
| to good effect a few times myself, but it is a bit
| dangerous in that it offers a _number_... but that number
| lacks many properties we associate with integers. (For
| instance, in another posting I point out you can shift the
| entire Elo pool by any constant you like without changing
| the system. Integers do not have this property.) They have
| a definite meaning, but it isn't what your intuition might
| suggest. You really shouldn't do any arithmetic on them.
| They only make sense in the context of the Elo computation
| itself.
| jayd16 wrote:
| Average just causes other issues. Teams should probably have
| their own rating not derived from the individual elo.
|
| But honestly, elo is a shit system for fun matchmaking anyhow.
| buzzerbetrayed wrote:
| Why is it shit and what system is better?
| elcomet wrote:
| Why do you say it's shit ?
| smogcutter wrote:
| One issue is that once you find your level, Elo should keep
| you around a 50/50 win rate. So it doesn't feel like you're
| making progress or improving even if in a real sense you
| are.
|
| It's also a bad fit for team games. But ranking individual
| contribution to a team is _hard_. Even for something like
| the NBA where there's obviously a ton of incentive to do
| this, statistical methods rely on a large sample size and
| often need to be taken with a grain of salt.
| Kranar wrote:
| Most multiplayer games I know of, such as StarCraft, use
| a different score for single player and multiplayer
| games. You might have an amazing single player Elo score
| but a crappy multiplayer score and vice-versa.
| jayd16 wrote:
| Once you reach the top there's much less reason to keep
| playing. It doesn't have a great way to handle multiplayer
| situations. It doesn't handle confidence as well as some
| other systems, i.e. it takes a while to skill test a player.
|
| https://en.wikipedia.org/wiki/Elo_rating_system#Practical_i
| s...
|
| Trueskill and many others are often used instead.
|
| https://en.wikipedia.org/wiki/TrueSkill
|
| The biggest issue is that fun is not the same as fair, and
| it's a lot more complicated to tune for fun.
|
| You could, for example, play 100 matches where you were
| given the closest opponent but at a disadvantage for every
| single match. There are a lot of other scenarios around how
| you want to fold in ping-based matchmaking and how that
| smaller pool affects the fairness of the matching.
| xmprt wrote:
| What's the open source alternative to TrueSkill? It seems
| really interesting but it's patented by Microsoft.
| wizzwizz4 wrote:
| They can't patent the _general idea_, so just base
| something off the information in the Wikipedia article.
| At time of writing, I don't _think_ doing that would
| infringe their patent.
| ummonk wrote:
| Glicko
| maybeOneDay wrote:
| I accept your premise and the fact that matchmaking is
| hard but:
|
| > You could, for example, play 100 matches where you were
| given the closest opponent but at a disadvantage for
| every single match.
|
| Not really; you'd derank aggressively and end up playing
| people at a lower level and presumably winning
| comfortably after some time.
| [deleted]
| jayd16 wrote:
| It all depends on who happens to be playing the game to
| match with but you're right. Ideally you would quickly
| hit a good match and win.
| tehjoker wrote:
| That's what they do in Starcraft 2 with MMR.
| lindig wrote:
| How dependent is the result on the order in which players play?
| Assuming who beats whom is fixed, how much do results vary over
| the order of the tournament?
| narag wrote:
| Aren't the calculations always done with the initial points?
| I'm very curious to know if that's not the case.
| hogFeast wrote:
| I believe order is important. The way to verify this is by
| running a simulation with players whose skills are all equal
| but whose results vary with some normal distribution; you
| will find that rating levels do move about quite a bit even
| though the underlying skill level is fixed (I am thinking
| about order as just random shuffling here, so if we take a
| dataset where there is just randomness and it converges
| quickly then order doesn't matter...this doesn't really
| happen with ELO).
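|
| Something like this (a sketch; pool size, K, and game count
| are arbitrary):
|
|     import random
|
|     def expected(r_a, r_b):
|         return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
|
|     def simulate(n_players=20, n_games=2000, k=24, seed=1):
|         # everyone has identical true skill, so each game is
|         # a coin flip; the ratings still spread out and wander
|         random.seed(seed)
|         ratings = [1500.0] * n_players
|         for _ in range(n_games):
|             a, b = random.sample(range(n_players), 2)
|             score = 1.0 if random.random() < 0.5 else 0.0
|             delta = k * (score - expected(ratings[a], ratings[b]))
|             ratings[a] += delta
|             ratings[b] -= delta
|         return min(ratings), max(ratings)
|
|     # spreads of 100+ points are typical despite equal skill
|     print(simulate())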
|
| This isn't to say that rankings don't converge. They do. But
| things like order definitely can get in the way. Glicko
| controls for this by modelling uncertainty in a skill estimate
| (iirc, proportional to the number of matches seen). In
| practice, you would tend to burn the first few matches using
| ELO to try and get rankings to converge to a limit.
|
| But, again, even then I have found issues. ELO will
| quickly identify the worst and best but there is a lot of
| volatility around the middle grouping of teams. And this does
| vary with the underlying activity you are modelling too:
| activities with a lot of randomness take longer to converge.
| You can also control convergence with the K-factor.
|
| I think this is one of the trade-offs that comes with the time
| dependent nature of ELO. You get more variance but less bias.
| Imo, it usually isn't a huge issue in practice because the
| upside of this is your method isn't static, and your ranking
| can respond to things like lineup changes (which tend to occur
| in many applications)...it works well in most applications.
___________________________________________________________________
(page generated 2021-02-12 23:01 UTC)