[HN Gopher] Our recent server issues
___________________________________________________________________
Author : timetraveller26
Score : 94 points
Date : 2021-12-22 19:06 UTC (3 hours ago)
Source : lichess.org
| than3 wrote:
| I expect that what they've pointed to as the cause is only part
| of the problem. We'll never know the full picture unless they
| share it.
|
| To be upfront, my experience with the people in charge of the
| organization there has left me without much goodwill. Nothing
| against Thibault personally; I think he's done some great things,
| but he seems busy with whatever interests him, and management
| isn't it. He also has some unprofessional people with access
| working for the project/charity.
|
| Years ago I volunteered my services as a system administrator
| (at no charge), along with some suggestions, but communication
| channels were limited, and multiple issues went stale and were
| auto-closed without a response. They have issues that don't get
| addressed, and their process doesn't appear to be aimed at
| cultivating qualified volunteers or improving the project's bus
| factor.
|
| To make things worse, when I disagreed with one of their process
| decisions and offered constructive feedback, one of the other devs
| with access apparently took offense, and the next day I found my
| lichess account had been edited by an
| admin without notice or notification. I could not log in (wrong
| password), the password and email for password recovery were
| changed, and trying to access the profile URI directly showed the
| account as banned.
|
| It seemed this was done out of spite, and definitely without any
| kind of due process. Appeals by email went unanswered within the
| 90-day cutoff I gave them. As a result, I submitted a complaint to
| the French charities regulatory body and moved on, since the group
| wasn't worth wasting any more of my time. I haven't heard back so
| who knows if anything came of what I reported.
|
| In my opinion, they've got more internal problems than they let
| on, and to me this is just spillover.
|
| It's unfortunate, because any failure like this impacts so many
| people, but I don't find it surprising given my limited
| experience of the people there.
| schaefer wrote:
| Would you consider reading the excellent book "Working in
| Public: The Making and Maintenance of Open Source Software" by
| Nadia Eghbal?
|
| The data behind what "Open Source" projects look like differs
| from the popular culture narratives and assumptions about what
| they _should_ look like.
|
| I doubt there was justification for disabling your player
| account, and I'm sorry that happened to you.
|
| But it sounds like your expectations about code contributions,
| and onboarding new volunteers may have been far from that
| project's reality.
| Shadonototra wrote:
| lichess is open source, if you want to contribute send your PR
| here: https://github.com/ornicar/lila
|
| I don't know what your motive is, but it doesn't seem to
| involve lichess's code ;)
| iliekcomputers wrote:
| I'm not sure I completely understand. They say that the only
| thing that was affected was the tournament because its events
| needed to be processed synchronously, but I remember the entire
| site being unavailable for people. Was that unrelated?
|
| On a side note, huge props to Lichess, the fact that they can
| compete with chess.com which has so many resources behind it is
| very impressive. Everyone who plays chess should consider
| becoming a patron.
| Santosh83 wrote:
| During the first crash immediately after the initial start of
| the tournament, the entire site did indeed go offline for a few
| minutes. Even address resolution failed. Then things went
| smoothly for about an hour after which the 2nd crash came. This
| one just seemed to affect the particular tournament
| (participants couldn't get fresh pairings) while the rest of
| the site was still working, as the article mentions.
| iliekcomputers wrote:
| Ah I see. That makes sense.
|
| Would be interesting to know what caused the entire site to
| go down in the beginning. Wonder if it was just too much
| traffic.
| nijave wrote:
| Has anyone looked into the code to see why it can't be
| parallelized? The bottom of the FAQ mentions that, but I'd think
| at least certain aspects should be parallelizable, or at least
| prioritizable (e.g., forgoing leaderboard updates to focus on
| more important events?)
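| A minimal sketch of the prioritization idea, assuming a
| hypothetical event model with just two kinds (pairing requests
| and leaderboard updates); it is not taken from lila's code.
| Python's heapq pops the lowest priority number first, so pairing
| work always drains before cosmetic updates:

```python
import heapq
import itertools

# Hypothetical priorities: lower number = processed first.
PAIRING = 0
LEADERBOARD = 1


class PriorityEventQueue:
    """Event queue that serves pairing events before leaderboard updates."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties, keeping FIFO order within a priority.
        self._counter = itertools.count()

    def push(self, priority, payload):
        heapq.heappush(self._heap, (priority, next(self._counter), payload))

    def pop(self):
        priority, _, payload = heapq.heappop(self._heap)
        return priority, payload

    def __len__(self):
        return len(self._heap)


q = PriorityEventQueue()
q.push(LEADERBOARD, "refresh standings")
q.push(PAIRING, "pair player A")
q.push(PAIRING, "pair player B")

order = [q.pop()[1] for _ in range(len(q))]
print(order)  # ['pair player A', 'pair player B', 'refresh standings']
```

| Note this only reorders work; if pairing events alone arrive
| faster than they can be processed, the queue still grows.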
| jb_s wrote:
| Good idea. Though in the code itself (from a cursory glance), in
| L141 _Sequencing(...)_ I see a bunch of nested maps. I don't
| know Scala, but I think this may be a performance issue? Rather
| than hyperfocusing on parallelism or event systems, since that
| stuff is comparatively hard to solve, maybe refactoring this
| function/algo at the core of the pairing would give more bang
| for the buck.
|
| https://github.com/ornicar/lila/blob/98691c8901cc0e7d0f338f4...
| jeremyjh wrote:
| If you think about it, you can't really generate pairings in
| parallel because each thread would need write access to the
| entire pool to ensure no one is paired twice and that you have
| a consistent view of all the results to that point before
| creating a new pair. You could maybe create a lock for each
| participant but that might actually be slower, and would
| definitely be more difficult to reason about and can lead
| towards bugs just as serious under load.
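| To illustrate the point about a consistent view: a minimal
| sketch in which one consumer owns the waiting pool and pairs
| players as they finish games. The pool structure and the
| "pair the next two arrivals" rule are hypothetical
| simplifications (real arena pairing also weighs ratings and
| recent opponents). Because only this one loop mutates the pool,
| no player can be handed to two games, and no locks are needed:

```python
from collections import deque


def pair_sequentially(finished_events):
    """Single writer: consume 'player finished a game' events in order
    and pair players from one shared waiting pool (hypothetical rule:
    pair the next two waiting players)."""
    waiting = deque()  # players currently waiting for an opponent
    in_pool = set()    # guards against the same player queuing twice
    pairings = []

    for player in finished_events:
        if player in in_pool:
            continue  # duplicate event; player is already waiting
        waiting.append(player)
        in_pool.add(player)
        if len(waiting) >= 2:
            a = waiting.popleft()
            b = waiting.popleft()
            in_pool.discard(a)
            in_pool.discard(b)
            pairings.append((a, b))

    return pairings, list(waiting)


pairs, leftover = pair_sequentially(["alice", "bob", "carol", "carol", "dave", "erin"])
print(pairs)     # [('alice', 'bob'), ('carol', 'dave')]
print(leftover)  # ['erin']
```

| Spreading this loop across threads would mean every thread
| locking the shared `waiting` pool and `in_pool` set on each
| event, which is the contention and bug surface described above.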
| bo1024 wrote:
| It would be very interesting to hear about the technical details!
| EarthIsHome wrote:
| Further down in the article, there's a more technical
| explanation under the heading "Can you elaborate on the
| technical issue?"
| jph wrote:
| > Eventually there was no way of keeping up with the queue
|
| A chess congestion pile up... sounds like an event stream rook-ie
| mistake. :-)
|
| Seriously, congrats to Lichess for growing. It's an amazing site.
| Donate if you can.
| bryan0 wrote:
| Hikaru was live streaming the event so you can see the series of
| failures and how they affected the tournament here:
| https://youtu.be/YKfvNl8UoxA
| powera wrote:
| The "easy" solution is to put a cap on tournament size.
|
| Apart from when Agadmator wants his fans to be in the same
| tournament as Magnus Carlsen etc., there is basically no need to
| hold chess tournaments with over 1000 players.
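| A minimal sketch of that cap, assuming a hypothetical
| `Tournament` class with a fixed `max_players` limit chosen
| before the event starts; joins beyond the limit are refused up
| front rather than surfacing later as a queue backlog:

```python
class TournamentFullError(Exception):
    pass


class Tournament:
    def __init__(self, max_players):
        self.max_players = max_players  # hypothetical cap, set ahead of time
        self.players = set()

    def join(self, player):
        if player in self.players:
            return  # joining twice is a no-op
        if len(self.players) >= self.max_players:
            raise TournamentFullError(
                f"tournament is capped at {self.max_players} players"
            )
        self.players.add(player)


t = Tournament(max_players=2)
t.join("alice")
t.join("bob")
try:
    t.join("carol")
    rejected = False
except TournamentFullError:
    rejected = True
print(sorted(t.players), rejected)  # ['alice', 'bob'] True
```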
| assbuttbuttass wrote:
| Sounds like they need backpressure. Isn't that the usual solution
| to a queue growing without bound?
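| For reference, backpressure in its simplest form is a bounded
| queue: when the consumer falls behind, producers are told to
| wait (or are rejected) instead of the queue growing without
| limit. A minimal sketch using Python's standard `queue.Queue`
| with a `maxsize`; the event names are hypothetical:

```python
import queue

# Bounded queue: at most 3 pending events.
events = queue.Queue(maxsize=3)

accepted, refused = [], []
for i in range(5):
    try:
        # block=False: raise queue.Full immediately instead of waiting.
        events.put(f"event-{i}", block=False)
        accepted.append(i)
    except queue.Full:
        # Backpressure signal: the producer must slow down or retry later.
        refused.append(i)

print(accepted, refused)  # [0, 1, 2] [3, 4]
```

| In an arena tournament the producers are players finishing
| games, so making them wait effectively stalls the tournament,
| which is the objection raised in the reply.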
| jeremyjh wrote:
| You mean you want to pause the games that are in progress? The
| events are created by people finishing their games and queuing
| for the next pair. You could limit total participants but only
| if you know the limit ahead of time. Nothing else makes sense.
| c0balt wrote:
| Load shedding might also be a good solution
| progbits wrote:
| How would that work here?
|
| "Load" is players finishing games and requiring new match
| pairings / rating updates for the tournament to continue. You
| can tell them to wait, sure. But as long as the rate of games
| finishing exceeds the rate at which they can process them I
| don't see how that would improve the situation, the
| tournament would be stuck anyway.
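| The arithmetic behind that: if games finish at lambda events per
| second and the server processes mu per second, then with
| lambda > mu the backlog grows by (lambda - mu) every second and
| never drains while the tournament runs. A toy simulation with
| made-up rates (not Lichess's real numbers):

```python
def backlog_over_time(arrival_rate, service_rate, seconds):
    """Simulate a FIFO queue with constant arrival and service rates
    (events per second) and return the backlog after each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += arrival_rate                # new events arrive
        backlog -= min(backlog, service_rate)  # server drains what it can
        history.append(backlog)
    return history


# Hypothetical rates: 120 events/s arrive, 100 events/s can be processed.
print(backlog_over_time(120, 100, 5))  # [20, 40, 60, 80, 100]
# When capacity exceeds demand, the backlog stays at zero.
print(backlog_over_time(80, 100, 5))   # [0, 0, 0, 0, 0]
```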
|
| One option is, e.g., to limit the total number of players in the
| tournament up front but they explicitly said they didn't want
| to do that.
| bcrosby95 wrote:
| Depends on whether every event is strictly necessary for the
| proper functioning of the system, or if some are just
| nice-to-have.
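| A minimal sketch of that distinction, assuming a hypothetical
| split of events into essential ones (pairings) and optional ones
| (e.g. leaderboard refreshes): once the backlog passes a
| threshold, optional events are dropped on arrival while
| essential ones are always accepted. Whether any arena event is
| truly optional is exactly what the replies dispute.

```python
from collections import deque

ESSENTIAL = {"pairing"}  # hypothetical: event kinds that must never be dropped


class SheddingQueue:
    def __init__(self, shed_threshold):
        self.shed_threshold = shed_threshold
        self.items = deque()
        self.dropped = 0

    def offer(self, kind, payload):
        """Enqueue an event, shedding optional kinds when the backlog
        is at or above the threshold. Returns True if accepted."""
        overloaded = len(self.items) >= self.shed_threshold
        if overloaded and kind not in ESSENTIAL:
            self.dropped += 1
            return False
        self.items.append((kind, payload))
        return True


q = SheddingQueue(shed_threshold=2)
q.offer("pairing", "A")
q.offer("leaderboard", "refresh")
q.offer("leaderboard", "refresh")  # backlog is 2: shed
q.offer("pairing", "B")            # essential: always accepted
print([kind for kind, _ in q.items], q.dropped)  # ['pairing', 'leaderboard', 'pairing'] 1
```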
| zaptheimpaler wrote:
| lol are you all just repeating scaling buzzwords? Backpressure,
| load shedding... what's next, horizontal scaling or maybe
| blockchain?
| jeremyjh wrote:
| This is where it helps to have domain knowledge before
| pontificating on someone else's architecture. It is an
| arena tournament. If you want people to join the
| tournament and get paired with games after finishing
| their current game you'll need those events. There are no
| "nice to haves".
___________________________________________________________________
(page generated 2021-12-22 23:00 UTC)