[HN Gopher] Optimizing the Lichess Tablebase Server
       ___________________________________________________________________
        
       Optimizing the Lichess Tablebase Server
        
       Author : cristoperb
       Score  : 210 points
       Date   : 2024-07-12 22:14 UTC (1 days ago)
        
 (HTM) web link (lichess.org)
 (TXT) w3m dump (lichess.org)
        
       | hocuspocus wrote:
       | I know it's not a fair comparison but I'm truly impressed by the
       | quality of engineering shown by the Lichess team, when their main
       | competitor was for example boasting about a migration to GCP and
       | yet suffering from repeated outages due to fairly organic growth
       | in popularity. While I believe they employ 100x more people.
       | 
       | Lichess' mobile app was a weak spot, however the v2 rewrite in
       | Flutter is already pretty good while still in beta.
       | 
       | And keep in mind Thibault pays himself less than 60k/year.
        
         | peter_retief wrote:
         | Lichess is a great service to casual chess players like myself
         | to get a quick game against another human. Never much of a
         | wait.
         | 
         | What I do want to know is how does one pronounce Lichess? Lie
         | chess, Le chess?, League chess?
        
           | hocuspocus wrote:
           | /li:/ as in libre.
        
           | jffry wrote:
           | According to https://lichess.org/faq#name: "Lichess is a
           | combination of live/light/libre and chess. It is pronounced
           | lee-chess"
           | 
           | They also link this video:
           | https://www.youtube.com/watch?v=KRpPqcrdE-o
        
             | tecleandor wrote:
             | I guess it's because of the lychee fruit?
        
           | ycombinete wrote:
           | I'm team lie-chess.
        
         | sgt wrote:
         | I don't think he needs to feel bad about increasing his salary.
         | Make it 200k/yr and make his life easier, which can only be
         | good for the project long term.
        
           | hocuspocus wrote:
           | I don't know him personally but from the talks he's given, he
           | seems to be ideological about Lichess and his own lifestyle,
           | in a way that would be considered fairly anti-capitalistic by
           | most of the HN crowd :)
        
             | treyd wrote:
             | Do you have links to any of these talks you could
             | recommend?
        
               | heap_perms wrote:
               | Not OP but I can recommend this talk by Thibault (the
               | founder): https://www.youtube.com/watch?v=LZgyVadkgmI
        
           | epidemian wrote:
           | IDK about France (where Thibault is from, and IDK if he lives
           | there), but where i'm from, you would have a _very_
           | comfortable life earning 5k every month, so his self-imposed
           | 60k /yr salary doesn't seem unreasonable at all. At some
           | point, more money yields diminishing returns.
        
             | diggan wrote:
             | > but where i'm from, you would have a very comfortable
             | life earning 5k every month, so his self-imposed 60k/yr
             | salary doesn't seem unreasonable at all.
             | 
             | (Some) HN commentators seem weirdly out of touch when it
             | comes to salary outside of IT-heavy cities in the US. The
             | other day someone claimed $125k/year for an employee wasn't
             | "big money"
             | (https://news.ycombinator.com/item?id=40927175), so I'd
             | take any comments saying some salary is high/low with a box
             | filled with sand.
        
               | AQuantized wrote:
               | To be fair that really isn't 'big money' in most of those
               | cities, assuming big money has some connotation of
               | significantly above average after tax and expenses
               | disposable income in those areas, especially relative to
               | your peers. I don't think it would be unfair to say that
               | would be big money compared to many European workers in
               | the same jobs though.
        
             | hyperman1 wrote:
             | I don't know if that 5K is before or after taxes. You
             | easily lose half of what your employer actually pays.
        
               | maccard wrote:
               | EUR60k pre-tax is roughly in the top 10% of incomes in
               | the country based on a quick google. Not opulent, but
               | definitely comfortable.
        
               | hocuspocus wrote:
               | His salary is more like EUR55k though.
               | 
               | It's comfortable outside of Paris and other expensive
               | cities. But he could easily double that given his
               | background. Before quitting his job he already worked
               | with Play and the Typesafe (now Lightbend) stack before
               | the peak of its hype, when companies were paying top
               | dollar for consultants.
        
         | epolanski wrote:
         | I think you're highly overestimating how many devs Chess.com
         | has
        
           | hocuspocus wrote:
           | I am not, that's why I said employees not devs.
        
         | Sesse__ wrote:
         | Lichess is a great example of how efficient Wikipedia should
         | have been (both on the code and organization level). :-)
        
       | aeyes wrote:
       | Did they have to reduce cost or is there any other reason to not
       | stick 20TB of SSDs in a box and call it a day? 4TB SSDs only cost
       | ~$300, even HP or Dell SFF drives aren't much more expensive.
       | 
       | I guess they were interested in doing the testing and
       | optimization for fun. From a product standpoint I probably would
       | have invested my limited time in other projects.
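
The drive arithmetic in the comment above can be sketched as follows. This is a back-of-the-envelope illustration only: the 4TB size and ~$300 price are the commenter's figures, while the `copies` parameter (mirroring every drive) is an assumption added here as one possible reading of the different totals quoted elsewhere in the thread. It covers one-off purchases, not rented hardware.

```python
import math

def drive_cost(capacity_tb, drive_tb=4, drive_price=300, copies=1):
    """Return (number of drives, total price) for a one-off purchase.

    drive_tb and drive_price default to the figures quoted in the comment;
    copies is a hypothetical knob for keeping mirrored duplicates.
    """
    drives = math.ceil(capacity_tb / drive_tb) * copies
    return drives, drives * drive_price

plain = drive_cost(20)               # 20TB with no redundancy
mirrored = drive_cost(20, copies=2)  # assumed: every drive mirrored once

print(plain)
print(mirrored)
```

Under these assumptions, five drives cover 20TB unmirrored and ten cover it mirrored, which may be where the $1500 and $3000 figures in the replies come from.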
        
         | broodbucket wrote:
         | Lichess is a non-profit with a lot of volunteers, they probably
         | don't have the same time vs hardware cost balance as most for-
         | profit companies do
        
           | traceroute66 wrote:
           | It is important not to automatically assume that all
           | non-profits are impoverished and run by volunteers.
           | 
           | One of the most famous examples is Wikipedia.
           | 
           | Technically yes, they are a non-profit. Impoverished ?
           | Certainly not !
           | 
           | Look at the financials, as others have already pointed out.
           | Especially if you are in the habit of donating to non-
           | profits, the financials can make for interesting reading.
        
             | r0ks0n wrote:
             | french detected
        
         | BSDobelix wrote:
         | >testing and optimization for fun
         | 
         | In no other industry would an engineer think like that...except
         | in IT.
         | 
         | We definitely have too powerful and cheap Hardware, combined
         | with lazy Wetware who just wants to "call it a day"....be proud
         | of your work....or so they say.
        
           | chronogram wrote:
           | Not calling it a day anywhere is why Lichess is such a good
           | website.
        
           | aeyes wrote:
           | Most things in life are a compromise and it's easy to get
           | tempted to find the perfect solution instead of spending your
           | time on actually moving forward.
           | 
           | In all industries there is always something you can do better
           | if only you spend more time. But at most places time is worth
           | money and I'd say $3000 for a few SSDs is little enough to
           | not make this worth my time.
        
             | BSDobelix wrote:
             | >$3000 for a few SSDs is little enough to not make this
             | worth my time.
             | 
             | Yeah ok i got it, you are so superior that it's not worth
             | your time in finding the performance problems but throw
             | hardware on it. So happy you don't work on any Operating-
             | system related project or anything that has a massive
             | infrastructure.
             | 
             | PS: I know $3000 dollars are your monthly Uber-Eat costs,
             | but not everyone is that loose with money.
        
           | WJW wrote:
           | You think engineers in other industries won't sometimes
           | choose the more exciting option when a boring but well-
           | understood one would do the trick? That's definitely not true
           | in (at least) mechanical and electrical engineering from what
           | I've seen. From people spending millions trying to have the
           | entire factory operated by robots so they could save 100k on
           | humans to engineers specifying friction stir welders for the
           | most basic of welding jobs, overengineering of parts that
           | would make the people at Juicero blush, etc etc etc.
           | 
           | I have no idea why software people think their industry is
           | the only one where people cut corners. Some form of meta-
           | imposter syndrome perhaps.
        
             | BSDobelix wrote:
             | >From people spending millions trying to have the entire
             | factory operated by robots so they could save 100k on
             | humans to engineers specifying friction stir welders for
             | the most basic of welding jobs
             | 
             | Look, I come from that industry (metalworking), if you do
             | friction stir where it's not needed you should be kicked
             | out of your job, but wonder, I've never heard of such a
             | thing in reality, don't tell me you're buying another
             | friction stir cnc to save 100k "on people", friction stir
             | is slow, expensive and any robot can weld (normal welding)
             | faster.
             | 
             | Yes people are expensive, but un-optimised work is even
             | more expensive (on factory level), NO ONE in the metal
             | industry would do something like this if it was not
             | necessary (well except the defence sector, because those
             | guys are crazy and have unlimited money).
             | 
             | I call your made up story complete BS.
        
         | KolmogorovComp wrote:
         | Why scale up when you can optimise? I'm probably going to be
         | downvoted for this, but imo this is really the mindset that
         | leads to bloated software.
        
           | tra3 wrote:
           | Agreed.
           | 
           | This is the implicit assertion that developer time is more
           | expensive than hardware costs.
           | 
           | Seems true in the short term, until the whole system
           | crumbles.
        
         | bastawhiz wrote:
         | They managed to reduce max response times by an order of
         | magnitude. If this project took a week (even two) and some
         | users went from 15s response times to 1.5s response times, only
         | projects where the user experience is _even worse_ or where you
         | work for a for-profit organization where there's money to be
         | made elsewhere (and you admit you don't really care about
         | customer pain) would be a better justification of time.
        
         | ViktorRay wrote:
         | Lichess is a non-profit. It is run entirely on donations and
         | volunteering. It has only 1 employee, the dude who founded the
         | non-profit, and it seems he takes far less money than he could
         | make from any other job based on how talented he is.
         | 
         | Also the organization is based in France. I don't know what
         | impact that has on costs but it's worth mentioning.
        
           | jayemar wrote:
           | I had no idea that was the case, that's incredibly
           | impressive!
        
           | lukhas wrote:
           | We're up to 2 employees now! The founder and a mobile dev.
           | 
           | The impact on costs is "not small", because as a rough
           | estimate, the charity pays overall about twice what the dev
           | gets in take-home money, because French employer taxes are
           | high (keyword for the Frenchies reading us: URSSAF).
           | 
           | Source: am President of the Lichess charity and have the
           | honour and pleasure of dealing with most of the French
           | administrative paperwork.
        
         | diggan wrote:
         | > From a product standpoint
         | 
         | Makes sense from that perspective, but Lichess is not run as a
         | for-profit company with a product, it's run as a non-profit
         | organization (which it is), so a perspective shift is needed to
         | understand their decisions :)
        
           | silvestrov wrote:
           | Take a look at their financials and $1500 for SSDs would not
           | be out of place.
           | 
           | They have yearly expenses for more than $500.000
           | 
           | https://docs.google.com/spreadsheets/d/1Si3PMUJGR9KrpE5lngSk.
           | ..
           | 
           | Seems really weird to be using harddrives when they already
           | have expenses like that.
        
             | Timshel wrote:
             | Looks like rented stuff to me, you can't just add drives ...
             | 
             | And while 500k is a lot maybe they can do so much with it
             | because they do not just throw $1500 in drives at every
             | problem.
        
             | Out_of_Characte wrote:
             | The reason is buried in another article
             | 
             | "WDL tables (*.rtbw) store the outcome of positions, e.g.
             | if a position is winning. An engine will use this very
             | frequently to decide which endgames to aim for. WDL tables
             | should be stored on the fastest disk (preferably SSD) you
             | have."
             | 
             | "DTZ tables (*.rtbz) tell the engine how to finish the
             | endgame once it is on the board. They are optional, but
             | required to reliably convert complicated endings."
             | 
             | Seems reasonable to put the WDL table on the SSD for better
             | engine performance. I do understand not choosing SSDs. The
             | number of lookups for positions always remains the same per
             | user per game. Yet the tablebase is growing more than
             | exponentially.
             | 
             | https://lichess.org/@/lichess/blog/7-piece-syzygy-
             | tablebases...
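
The recommendation quoted above (WDL tables on the fastest disk, DTZ tables wherever there is room) can be sketched as a simple routing rule on the file suffix. The mount points below are hypothetical placeholders, and nothing here is claimed to match how Lichess actually lays out its disks.

```python
from pathlib import PurePosixPath

# Hypothetical mount points, chosen only for illustration.
FAST_ROOT = PurePosixPath("/mnt/ssd/syzygy")
SLOW_ROOT = PurePosixPath("/mnt/hdd/syzygy")

def storage_path(filename):
    """Pick a storage location for a Syzygy table file by its suffix."""
    suffix = PurePosixPath(filename).suffix
    if suffix == ".rtbw":   # WDL: probed very frequently during search
        return FAST_ROOT / filename
    if suffix == ".rtbz":   # DTZ: only needed once the endgame is on the board
        return SLOW_ROOT / filename
    raise ValueError(f"not a Syzygy table file: {filename}")

for name in ("KQvK.rtbw", "KQvK.rtbz"):
    print(name, "->", storage_path(name))
```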
        
             | lukhas wrote:
             | As mentioned elsewhere, we're renting most of our infra
             | from OVH, and paying, _monthly_, for 40TB of SSDs or NVMes
             | would simply explode our yearly budget.
             | 
             | Source: am president of the lichess charity (and also one
             | of the sysadmins)
        
       | robbles wrote:
       | > here are the empirical distribution functions (ECDFs) with 30ms
       | added to each response time
       | 
       | > The added constant seems artificial, but it's just viewing the
       | results from the point of view of a client with 30ms ping time.
       | Otherwise the log scaled x-axis would overemphasize the
       | importance of a few milliseconds at the low end.
       | 
       | I thought this was interesting - maybe it's a standard practice I
       | was just unaware of but it seems like a smart trick.
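
A minimal sketch of why the added constant matters on a log-scaled axis. The sample response times are made up for illustration; only the 30ms offset comes from the quoted passage. On a log axis, the width an interval occupies depends on the ratio of its endpoints, so an improvement from 8ms to 1ms looks as large as one from 400ms to 50ms until the client's ping is added.

```python
import math

def ecdf(samples):
    """Return sorted (value, cumulative fraction) pairs for the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]

def log_span(lo, hi):
    """Width, in decades, that the interval [lo, hi] takes on a log10 axis."""
    return math.log10(hi) - math.log10(lo)

# Hypothetical server-side response times in milliseconds.
server_ms = [1, 2, 4, 8, 50, 400, 1500]
PING_MS = 30  # client round-trip time assumed in the quoted passage

client_ms = [t + PING_MS for t in server_ms]

# Without the offset both intervals have the same 8x ratio, so the same width.
raw_low = log_span(1, 8)
raw_high = log_span(50, 400)

# With the offset the low-end interval shrinks far more than the high-end one.
shifted_low = log_span(1 + PING_MS, 8 + PING_MS)
shifted_high = log_span(50 + PING_MS, 400 + PING_MS)

print(f"raw:     low={raw_low:.3f} high={raw_high:.3f} decades")
print(f"shifted: low={shifted_low:.3f} high={shifted_high:.3f} decades")
for value, frac in ecdf(client_ms):
    print(f"{value:>5} ms  {frac:.2f}")
```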
        
       | everyone wrote:
       | A lichess is a female lich I'm assuming? (It's like baron /
       | baroness)
        
         | o11c wrote:
         | Noble titles are a poor comparison since they're the rare
         | example where there actually is an exclusively-male root form.
         | For most words the root form is neuter, and both male-only (if
         | it exists) and female-only forms require an affix.
         | 
         | Properly, a male lich is "werlich" and a female lich is
         | "wiflich" (unlike other words the /f/ sound is not likely to
         | disappear); the plurals add "-en". But generally sex is
         | irrelevant for undead{cn} so the neuter form by far
         | predominates.
         | 
         | "lichess" is an abominable mixture of German and French roots
         | ... so naturally it is indistinguishable from the rest of
         | English.
        
           | claytonwramsey wrote:
           | note - "chess" is not a Germanic word (deriving from the
           | Arabic shah, meaning king). Ironically enough, it
           | comes to English via the Old French esches, meaning that
           | "lichess" is arguably made from entirely French roots.
        
             | o11c wrote:
             | Hm, I guess the "libre" is French, but "live", "light", and
             | most importantly "lich" are all German.
             | 
             | If we look for relatives of "libre", they include
             | "leed"(song) and the first half of Leopold (adding "bold")
             | and Luther (adding "army"). The common meaning is "people".
        
         | OsrsNeedsf2P wrote:
         | It's "Libre" chess, as in "Free (and open source)" chess
        
       | 29athrowaway wrote:
       | There is also lishogi but it is small enough to not require
       | such optimizations yet.
       | 
       | Shogi is the most entertaining of the chess variants. Xiangqi not
       | as much.
        
       | imperialdrive wrote:
       | Lichess is one of those things you just have to sit and
       | appreciate like a fine wine. It's absolutely wonderful for people
       | in the chess community. I use it every day and am inspired by the
       | functionality and performance, especially knowing it's a 1-2
       | person shop with limited budget.
        
         | lepetitchef wrote:
         | Me too. Recently the new beta mobile app is even cleaner and
         | has haptic feedback which is so cool.
        
         | TheRoque wrote:
         | You forgot to mention that it's free, open source, and doesn't
         | ask for your money and never will, and a lot of people donate.
         | Their expenses are public. It's also available as an app !
        
         | wavemode wrote:
         | I wish more open source end-user software learned from Lichess,
         | in terms of how user friendly, well designed and well
         | maintained it is.
        
       | treebeard901 wrote:
       | Some questionable choices are made in this optimization.
       | 
       | The reason for the optimization is that there is so much IO
       | activity the RAID checks can't complete.
       | 
       | It is unclear from the article if the RAID checks were ever
       | completed on 17TiB of data. Instead, they choose to disable the
       | periodic RAID checks and instead switch to doing the error
       | checking as a page of data is read in. The two are not
       | equivalent, and both should be used for important data.
       | 
       | Finding corrupt data only as you try to read it can lead to long
       | running data corruptions, maybe to the point your backups do not
       | go back far enough to restore the uncorrupted data. Underpinning
       | this also is a change to RAID 0... While the fastest option, they
       | are putting a lot of faith in that NVMe config handling that kind
       | of workload.
       | 
       | Hope they have good backups...
       | 
       | EDIT: A good way to solve this is to spin up a temporary server,
       | restore your backups to it, do the full data checks and when
       | successful, you have also checked your backup and restore process
       | along with the integrity of the file. You still want to have
       | enough overhead available to complete the RAID checks on the
       | primary server and don't use RAID 0 for performance.
        
         | lukhas wrote:
         | They are indeed not equivalent, but for our use case this is
         | sufficient: if we detect data corruption we can just throw away
         | the files and download/regenerate them (this is a freely
         | available dataset, if a bit large;
         | https://en.wikipedia.org/wiki/Endgame_tablebase will explain it
         | better than me). For this reason, it is also not backed up.
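
The "detect corruption, then throw away and re-download" approach described above can be sketched as a whole-file checksum pass against a list of known-good digests. This is an illustration under stated assumptions, not the mechanism from the article: the thread describes checks performed as pages are read, whereas this sketch hashes entire files, and the manifest format and demo contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tables need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_corrupt(directory, manifest):
    """Return the names that are missing or do not match the manifest."""
    bad = []
    for name, expected in sorted(manifest.items()):
        path = Path(directory) / name
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(name)
    return bad

# Demo on throwaway files; contents and manifest are made up.
with tempfile.TemporaryDirectory() as tmp:
    intact = Path(tmp) / "KQvK.rtbw"
    damaged = Path(tmp) / "KRvK.rtbw"
    intact.write_bytes(b"table data A")
    damaged.write_bytes(b"table data B")
    manifest = {p.name: sha256_of(p) for p in (intact, damaged)}
    manifest["KPvK.rtbw"] = "0" * 64       # listed but never downloaded
    damaged.write_bytes(b"flipped bits")   # simulate on-disk corruption
    to_refetch = find_corrupt(tmp, manifest)

print(to_refetch)
```

Anything in `to_refetch` would simply be deleted and fetched again, which only works because the dataset is freely available and reproducible.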
        
       ___________________________________________________________________
       (page generated 2024-07-13 23:00 UTC)