[HN Gopher] Optimizing the Lichess Tablebase Server
___________________________________________________________________
Optimizing the Lichess Tablebase Server
Author : cristoperb
Score : 210 points
Date : 2024-07-12 22:14 UTC (1 days ago)
(HTM) web link (lichess.org)
(TXT) w3m dump (lichess.org)
| hocuspocus wrote:
| I know it's not a fair comparison but I'm truly impressed by the
| quality of engineering shown by the Lichess team, when their main
| competitor was for example boasting about a migration to GCP and
| yet suffering from repeated outages due to fairly organic growth
| in popularity. While I believe they employ 100x more people.
|
| Lichess' mobile app was a weak spot, however the v2 rewrite in
| Flutter is already pretty good while still in beta.
|
| And keep in mind Thibault pays himself less than 60k/year.
| peter_retief wrote:
| Lichess is a great service to casual chess players like myself
| to get a quick game against another human. Never much of a
| wait.
|
| What I do want to know is how does one pronounce Lichess? Lie
| chess, Le chess?, League chess?
| hocuspocus wrote:
| /li:/ as in libre.
| jffry wrote:
| According to https://lichess.org/faq#name: "Lichess is a
| combination of live/light/libre and chess. It is pronounced
| lee-chess"
|
| They also link this video:
| https://www.youtube.com/watch?v=KRpPqcrdE-o
| tecleandor wrote:
| I guess it's because of the lychee fruit?
| ycombinete wrote:
| I'm team lie-chess.
| sgt wrote:
| I don't think he needs to feel bad about increasing his salary.
| Make it 200k/yr and make his life easier, which can only be
| good for the project long term.
| hocuspocus wrote:
| I don't know him personally but from the talks he's given, he
| seems to be ideological about Lichess and his own lifestyle,
| in a way that would be considered fairly anti-capitalistic by
| most of the HN crowd :)
| treyd wrote:
| Do you have links to any of these talks you could
| recommend?
| heap_perms wrote:
| Not OP but I can recommend this talk by Thibault (the
| founder): https://www.youtube.com/watch?v=LZgyVadkgmI
| epidemian wrote:
| IDK about France (where Thibault is from, and IDK if he lives
| there), but where i'm from, you would have a _very_
| comfortable life earning 5k every month, so his self-imposed
| 60k /yr salary doesn't seem unreasonable at all. At some
| point, more money yields diminishing returns.
| diggan wrote:
| > but where i'm from, you would have a very comfortable
| life earning 5k every month, so his self-imposed 60k/yr
| salary doesn't seem unreasonable at all.
|
| (Some) HN commentators seem weirdly out of touch when it
| comes to salary outside of IT-heavy cities in the US. The
| other day someone claimed $125k/year for an employee wasn't
| "big money"
| (https://news.ycombinator.com/item?id=40927175), so I'd
| take any comments saying some salary is high/low with a box
| filled with sand.
| AQuantized wrote:
| To be fair that really isn't 'big money' in most of those
| cities, assuming big money has some connotation of
| significantly above average after tax and expenses
| disposable income in those areas, especially relative to
| your peers. I don't think it would be unfair to say that
| would be big money compared to many European workers in
| the same jobs though.
| hyperman1 wrote:
| I don't know if that 5K is before or after taxes. You
| easily lose half of what your employer actually pays.
| maccard wrote:
| EUR60k pre-tax is roughly in the top 10% of incomes in
| the country based on a quick google. Not opulent, but
| definitely comfortable.
| hocuspocus wrote:
| His salary is more like EUR55k though.
|
| It's comfortable outside of Paris and other expensive
| cities. But he could easily double that given his
| background. Before quitting his job he already worked
| with Play and the Typesafe (now Lightbend) stack before
| the peak of its hype, when companies were paying top
| dollar for consultants.
| epolanski wrote:
| I think you're highly overestimating how many devs Chess.com
| has
| hocuspocus wrote:
| I am not, that's why I said employees not devs.
| Sesse__ wrote:
| Lichess is a great example of how efficient Wikipedia should
| have been (both on the code and organization level). :-)
| aeyes wrote:
| Did they have to reduce cost or is there any other reason to not
| stick 20TB of SSDs in a box and call it a day? 4TB SSDs only cost
| ~$300, even HP or Dell SFF drives aren't much more expensive.
|
| I guess they were interested in doing the testing and
| optimization for fun. From a product standpoint I probably would
| have invested my limited time in other projects.
| broodbucket wrote:
| Lichess is a non-profit with a lot of volunteers, they probably
| don't have the same time vs hardware cost balance as most for-
| profit companies do
| traceroute66 wrote:
| It is important not to automatically assume that all
| non-profits are impoverished and run by volunteers.
|
| One of the most famous examples is Wikipedia.
|
| Technically yes, they are a non-profit. Impoverished ?
| Certainly not !
|
| Look at the financials, as others have already pointed out.
| Especially if you are in the habit of donating to non-
| profits, the financials can make for interesting reading.
| r0ks0n wrote:
| french detected
| BSDobelix wrote:
| >testing and optimization for fun
|
| In no other industry would an engineer think like that...except
| in IT.
|
| We definitely have too powerful and cheap Hardware, combined
| with lazy Wetware who just wants to "call it a day"....be proud
| of your work....or so they say.
| chronogram wrote:
| Not calling it a day anywhere is why Lichess is such a good
| website.
| aeyes wrote:
| Most things in life are a compromise and it's easy to get
| tempted to find the perfect solution instead of spending your
| time on actually moving forward.
|
| In all industries there is always something you can do better
| if only you spend more time. But at most places time is worth
| money and I'd say $3000 for a few SSDs is little enough to
| not make this worth my time.
| BSDobelix wrote:
| >$3000 for a few SSDs is little enough to not make this
| worth my time.
|
| Yeah ok i got it, you are so superior that it's not worth
| your time in finding the performance problems but throw
| hardware on it. So happy you don't work on any Operating-
| system related project or anything that has a massive
| infrastructure.
|
| PS: I know $3000 dollars are your monthly Uber-Eat costs,
| but not everyone is that loose with money.
| WJW wrote:
| You think engineers in other industries won't sometimes
| choose the more exciting option when a boring but well-
| understood one would do the trick? That's definitely not true
| in (at least) mechanical and electrical engineering from what
| I've seen. From people spending millions trying to have the
| entire factory operated by robots so they could save 100k on
| humans to engineers specifying friction stir welders for the
| most basic of welding jobs, overengineering of parts that
| would make the people at Juicero blush, etc etc etc.
|
| I have no idea why software people think their industry is
| the only one where people cut corners. Some form of meta-
| imposter syndrome perhaps.
| BSDobelix wrote:
| >From people spending millions trying to have the entire
| factory operated by robots so they could save 100k on
| humans to engineers specifying friction stir welders for
| the most basic of welding jobs
|
| Look, I come from that industry (metalworking), if you do
| friction stir where it's not needed you should be kicked
| out of your job, but wonder, I've never heard of such a
| thing in reality, don't tell me you're buying another
| friction stir cnc to save 100k "on people", friction stir
| is slow, expensive and any robot can weld (normal welding)
| faster.
|
| Yes people are expensive, but un-optimised work is even
| more expensive (on factory level), NO ONE in the metal
| industry would do something like this if it was not
| necessary (well except the defence sector, because those
| guys are crazy and have unlimited money).
|
| I call your made up story complete BS.
| KolmogorovComp wrote:
| Why scale up when you can optimise? I'm probably going to be
| downvoted for this, but imo this is really the mindset that
| leads to bloated software.
| tra3 wrote:
| Agreed.
|
| This is the implicit assertion that developer time is more
| expensive than hardware costs.
|
| Seems true in the short term, until the whole system
| crumbles.
| bastawhiz wrote:
| They managed to reduce max response times by an order of
| magnitude. If this project took a week (even two) and some
| users went from 15s response times to 1.5s response times, only
| projects where the user experience is _even worse_ or where you
| work for a for-profit organization where there's money to be
| made elsewhere (and you admit you don't really care about
| customer pain) would be a better justification of time.
| ViktorRay wrote:
| Lichess is a non-profit. It is run entirely on donations and
| volunteering. It has only 1 employee, the dude who founded the
| non-profit, and it seems he takes far less money than he could
| make from any other job based on how talented he is.
|
| Also the organization is based in France. I don't know what impact
| that has on costs but it's worth mentioning.
| jayemar wrote:
| I had no idea that was the case, that's incredibly
| impressive!
| lukhas wrote:
| We're up to 2 employees now! The founder and a mobile dev.
|
| The impact on costs is "not small", because as a rough
| estimate, the charity pays overall about twice what the dev
| gets in take-home money, because French employer taxes are
| high (keyword for the Frenchies reading us: URSSAF).
|
| Source: am President of the Lichess charity and have the
| honour and pleasure of dealing with most of the French
| administrative paperwork.
| diggan wrote:
| > From a product standpoint
|
| Makes sense from that perspective, but Lichess is not run as a
| for-profit company with a product, it's run as a non-profit
| organization (which it is), so a perspective shift is needed to
| understand their decisions :)
| silvestrov wrote:
| Take a look at their financials and $1500 for SSDs would not
| be out of place.
|
| They have yearly expenses of more than $500,000
|
| https://docs.google.com/spreadsheets/d/1Si3PMUJGR9KrpE5lngSk.
| ..
|
| Seems really weird to be using harddrives when they already
| have expenses like that.
| Timshel wrote:
| Looks like rented stuff to me; you can't just add drives ...
|
| And while 500k is a lot maybe they can do so much with it
| because they do not just throw $1500 in drives at every
| problem.
| Out_of_Characte wrote:
| The reason is buried in another article
|
| "WDL tables (*.rtbw) store the outcome of positions, e.g.
| if a position is winning. An engine will use this very
| frequently to decide which endgames to aim for. WDL tables
| should be stored on the fastest disk (preferably SSD) you
| have." "DTZ tables (*.rtbz) tell the engine how to finish
| the endgame once it is on the board. They are optional, but
| required to reliably convert complicated endings."
|
| Seems reasonable to put the WDL table on the SSD for better
| engine performance. I do understand not choosing SSDs. The
| number of lookups for positions always remains the same per
| user per game. Yet the tablebase is growing more than
| exponentially.
|
| https://lichess.org/@/lichess/blog/7-piece-syzygy-
| tablebases...
| lukhas wrote:
| As mentioned elsewhere, we're renting most of our infra
| from OVH, and paying, _monthly_, for 40TB of SSDs or NVMes
| would simply explode our yearly budget.
|
| Source: am president of the lichess charity (and also one
| of the sysadmins)
| robbles wrote:
| > here are the empirical distribution functions (ECDFs) with 30ms
| added to each response time
|
| > The added constant seems artificial, but it's just viewing the
| results from the point of view of a client with 30ms ping time.
| Otherwise the log scaled x-axis would overemphasize the
| importance of a few milliseconds at the low end.
|
| I thought this was interesting - maybe it's a standard practice I
| was just unaware of but it seems like a smart trick.
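|
| A minimal sketch of the trick quoted above, assuming response times
| are available as a plain list of milliseconds. The sample values and
| the helper names (ecdf, log_width) are made up for illustration and
| are not taken from the article.

```python
import math

def ecdf(samples_ms, offset_ms=30.0):
    """Return (x, y) points of the empirical CDF after adding a constant offset."""
    shifted = sorted(s + offset_ms for s in samples_ms)
    n = len(shifted)
    # y is the fraction of samples less than or equal to x
    return [(x, (i + 1) / n) for i, x in enumerate(shifted)]

def log_width(lo_ms, hi_ms):
    """Width of the interval [lo, hi] on a log10 axis, in decades."""
    return math.log10(hi_ms) - math.log10(lo_ms)

# Hypothetical response times in milliseconds (not measurements from the article).
samples = [1.0, 2.0, 5.0, 40.0, 300.0, 1500.0]
points = ecdf(samples)

# How much horizontal space the step from 1 ms to 2 ms takes on a log axis,
# without the offset and with the assumed 30 ms ping added.
raw_span = log_width(1.0, 2.0)
shifted_span = log_width(31.0, 32.0)
```

| Without the offset, going from 1 ms to 2 ms takes as much room on the
| log axis as going from 1 s to 2 s. With 30 ms added, that step shrinks
| to a small fraction of a decade, which is closer to what a remote
| client would actually notice.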
| everyone wrote:
| A lichess is a female lich I'm assuming? (It's like baron /
| baroness)
| o11c wrote:
| Noble titles are a poor comparison since they're the rare
| example where there actually is an exclusively-male root form.
| For most words the root form is neuter, and both male-only (if
| it exists) and female-only forms require an affix.
|
| Properly, a male lich is "werlich" and a female lich is
| "wiflich" (unlike other words the /f/ sound is not likely to
| disappear); the plurals add "-en". But generally sex is
| irrelevant for undead{cn} so the neuter form by far
| predominates.
|
| "lichess" is an abominable mixture of German and French roots
| ... so naturally it is indistinguishable from the rest of
| English.
| claytonwramsey wrote:
| note - "chess" is not a Germanic word (deriving from the
| Arabic shah, meaning king). Ironically enough, it
| comes to English via the Old French esches, meaning that
| "lichess" is arguably made from entirely French roots.
| o11c wrote:
| Hm, I guess the "libre" is French, but "live", "light", and
| most importantly "lich" are all German.
|
| If we look for relatives of "libre", they include
| "leed"(song) and the first half of Leopold (adding "bold")
| and Luther (adding "army"). The common meaning is "people".
| OsrsNeedsf2P wrote:
| It's "Libre" chess, as in "Free (and open source)" chess
| 29athrowaway wrote:
| There is also lishogi but it is small enough to not require
| such optimizations yet.
|
| Shogi is the most entertaining of the chess variants. Xiangqi not as
| much.
| imperialdrive wrote:
| Lichess is one of those things you just have to sit and
| appreciate like a fine wine. It's absolutely wonderful for people
| in the chess community. I use it every day and am inspired by the
| functionality and performance, especially knowing it's a 1-2
| person shop with limited budget.
| lepetitchef wrote:
| Me too. Recently the new beta mobile app is even cleaner and
| has haptic feedback which is so cool.
| TheRoque wrote:
| You forgot to mention that it's free, open source, and doesn't
| ask for your money and never will, and a lot of people donate.
| Their expenses are public. It's also available as an app !
| wavemode wrote:
| I wish more open source end-user software learned from Lichess,
| in terms of how user friendly, well designed and well
| maintained it is.
| treebeard901 wrote:
| Some questionable choices are made in this optimization.
|
| The reason for the optimization is that there is so much IO
| activity the RAID checks can't complete.
|
| It is unclear from the article if the RAID checks were ever
| completed on 17TiB of data. Instead, they choose to disable the
| periodic RAID checks and instead switch to doing the error
| checking as a page of data is read in. The two are not
| equivalent, and both should be used for important data.
|
| Finding corrupt data only as you try to read it can lead to long
| running data corruptions, maybe to the point your backups do not
| go back far enough to restore the uncorrupted data. Underpinning
| this also is a change to RAID 0... While the fastest option, they
| are putting a lot of faith in that NVMe config handling that kind
| of workload.
|
| Hope they have good backups...
|
| EDIT: A good way to solve this is to spin up a temporary server,
| restore your backups to it, do the full data checks and when
| successful, you have also checked your backup and restore process
| along with the integrity of the file. You still want to have
| enough overhead available to complete the RAID checks on the
| primary server and don't use RAID 0 for performance.
| lukhas wrote:
| They are indeed not equivalent, but for our use case this is
| sufficient: if we detect data corruption we can just throw away
| the files and download/regenerate them (this is a freely
| available dataset, if a bit large;
| https://en.wikipedia.org/wiki/Endgame_tablebase will explain it
| better than me). For this reason, it is also not backed up.
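|
| A rough sketch of the "detect, throw away, re-download" idea, under
| loud assumptions: it uses a whole-file SHA-256 rather than the
| per-page check on read discussed above, and ensure_intact, refetch
| and the example file name are hypothetical, not part of the Lichess
| code.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tables need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def ensure_intact(path, expected_hex, refetch):
    """Return True if the file was already intact, False if it was replaced."""
    if os.path.exists(path) and sha256_of(path) == expected_hex:
        return True
    # Corrupt or missing: no backup needed, just fetch the public data again.
    refetch(path)
    if sha256_of(path) != expected_hex:
        raise OSError("re-fetched file still fails verification: " + path)
    return False

# Demo with stand-in data; a real refetch would download or regenerate the table.
good = b"hypothetical table contents"
expected = hashlib.sha256(good).hexdigest()

def refetch(path):
    with open(path, "wb") as f:
        f.write(good)

with tempfile.TemporaryDirectory() as d:
    table = os.path.join(d, "example.rtbw")
    with open(table, "wb") as f:
        f.write(b"corrupted bytes")
    was_intact_first = ensure_intact(table, expected, refetch)   # replaced
    was_intact_second = ensure_intact(table, expected, refetch)  # now intact
```

| This only works because the dataset is reproducible from a public
| source; for data that cannot be regenerated, the periodic checks and
| backups discussed in the parent comment still matter.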
___________________________________________________________________
(page generated 2024-07-13 23:00 UTC)