[HN Gopher] A distributed systems reading list
___________________________________________________________________
A distributed systems reading list
Author : davidw
Score : 256 points
Date : 2024-02-08 15:38 UTC (7 hours ago)
(HTM) web link (ferd.ca)
(TXT) w3m dump (ferd.ca)
| nonrandomstring wrote:
| A good list of concepts and resources. I'll just mention Andrew
| Tanenbaum's "Distributed Operating Systems" (Prentice-Hall 1995)
| which was my entry point.
| eachro wrote:
| When people learn about distributed systems outside of work, how
| do they actually get hands on experience with it (assuming they
| don't go spinning up a bunch of machines on aws/gcp/azure/etc)? I
| find it easiest to learn by doing, writing simple proof of
| concepts but that seems a bit harder to do in this area than
| others? What is the hello world/mnist of messing around with
| distributed systems?
| TrustPatches wrote:
| Gossip Glomers might be fun if you're looking for some hands-on
| exercises :)
|
| https://fly.io/dist-sys/
| Jtsummers wrote:
| 1. A bunch of communicating local processes.
|
| 2. A bunch of communicating local VMs (easier with a beefier
| machine like my current desktop).
|
| 3. Mininet (there are other options) to simulate a network
| environment, can fully control the topology very easily.
| Lighter weight than (2), more control for simulating different
| network effects than (1) alone.
| John23832 wrote:
| Frankly, just build something.
|
| Use a small k8's distro (kind, minikube, k3s) and build
| something that talks amongst itself and is resilient.
| ActionHank wrote:
| Sites like leetcode are great for coding and improving,
| because you get to compare your solutions to those of others.
| Sadly just building something on your own helps you learn the
| moving parts, but not optimal, neat, or best practice
| solutions.
| John23832 wrote:
| Sites like leetcode overindexes on the rote abilities that
| you can stamp out. Actually building something is exploring
| in a creative way which builds a deeper understanding.
|
| If you want to be "optimal, neat, or best practiced" read a
| book, and get stuck in tutorial hell. If you actually want
| to learn how to do something, literally go do it. Nobody
| has ever built anything of value (whether that is
| financial, intellectual, or emotional) by leetcoding.
| ActionHank wrote:
| Any suggestions for books to read?
| mannyv wrote:
| What you should read or start with are the designs for
| Cassandra, Kafka, Foundation DB, etc.
|
| The problems they're trying to solve are related to
| really large distributed systems that fail a lot, and
| their design decisions are basically a "this is how we
| worked around that problem."
|
| You can also look for the LISA archives
| (https://www.usenix.org/publications/loginonline/thirty-
| five-...). System administrators were the first people
| that had to deal with large distributed systems at scale,
| and university system administrators led the charge.
|
| You might want to hunt down the comp.sys.admin archives
| (I can't remember the newsgroup anymore).
|
| Most of the ideas and issues behind distributed computing
| are obvious if you think about it. Many of the actual
| implementation and mitigation of those are not obvious,
| though.
|
| And there's also the client side of distributed
| computing, which I don't think is discussed as much.
|
| As an example, exponential backoff is one of the go-to
| techniques for clients when the servers are under load.
| Unfortunately that doesn't really work IRL, because
| instead of spreading the load you get waves of load
| coming back over and over. Likewise on the server side
| you have problems with peak load.
| John23832 wrote:
| The only one I would really suggest is Designing Data
| Intensive Applications. But it is very DB centric.
|
| https://www.amazon.com/Designing-Data-Intensive-
| Applications...
| davidw wrote:
| "Just build something" is good advice but it's easier to find
| some kind of fun thing to build with, say, a web framework
| that's educational and maybe not a _complete_ throwaway
| either.
|
| Maybe some Internet of Things applications would provide a
| good avenue for some distributed systems exploration?
| Xamayon wrote:
| It's not as easy as playing with more 'normal' stuff, but I
| usually use VMs on a local hypervisor like ESXi, or a bunch of
| old desktop/server hardware if I have enough
| space/power/cooling at the time. Winter helps, big stuff often
| runs loud and hot. To get specialized hardware when needed,
| ebay or 'trash' from work and such can help a lot.
| mannyv wrote:
| The easiest way is to fire up a bunch of VMs.
|
| The cheapest way is to pick up an old ThinkStation (or other
| tower), load it up with 128GB (or more) of ram and install ESXI
| on it. That's a perfectly good baseline, and you can run about
| 30 4gb linux VMs on it.
|
| Ideally you'd have a bit less than 1 core per VM, just so it's
| a bit slow. Lots of people assume your nodes are quick, but in
| real life they may not be. And really, most of the time your
| machines won't be doing squat.
|
| You might want to have SSDs in there too, because ESXI doesn't
| have RAID capability (or at least mine didn't). I don't think
| you can get a cloud device that uses spinning disks anymore,
| and you wouldn't use it in real life anyway.
|
| A 2tb drive is cheap these days, or just slap all those old
| small SSDs in there. Everyone has a bunch of those small SSDs
| left over, and they're perfect.
| Xamayon wrote:
| ESXi needs the RAID to be handled by another device, the
| simplest case is a hardware RAID card with disks locally
| attached to it. You can also attach remote disks/volumes from
| other systems, with or without RAID, over the network/SAN/etc
| using an HBA, special network card, or the software iscsi
| initiator stuff in ESXi. You can even have something like a
| windows server act as the iscsi volume host, and attach to it
| over the normal network if you don't really care about
| reliability. The ESXi OS will not appreciate it if you ever
| turn the remote volume host system off, or if the network
| drops out. It's really too bad the free and cheap ESXi
| licenses are going away, it was always so nice to work
| with...
| mannyv wrote:
| That's right, Broadcom bought them and the party's open.
| Download your ESXi while you can!
| lijok wrote:
| Most systems are "distributed" actually, even your CRUD apps
| and CLI tools that write to disk. The better question is "how
| do you learn to deal with distributed systems intricacies in
| places where it matters (such as finance)?", the answer for
| which is super simple; 1. Write any stateful
| program 2. Now look at every single LOC and imagine what
| happens to the system if the service crashes before executing
| the next LOC. Then modify the system to deal with those
| scenarios.
| awesome_dude wrote:
| > Most systems are "distributed" actually, even your CRUD
| apps and CLI tools that write to disk.
|
| Yeah, people miss this. If your app interacts with another
| app - bam distributed.
| patmorgan23 wrote:
| Its almost as if the world isn't single threaded
| hiAndrewQuinn wrote:
| You take traditional non-distributed systems and push them to
| their limits in some regard.
|
| "Singularity" systems are an abstraction afforded to us by the
| grace of the hardware we run them on. If you start pushing
| their performance hard enough, however, you inevitably get
| distributed behavior.
|
| This is also a good potential career reason to try to make
| software which is as performnt as possible - you'll get all the
| tasty edge cases and complexity war stories to talk about.
| apwell23 wrote:
| > how do they actually get hands on experience with it
|
| By designing twitter in a 45 min interview.
| m_0x wrote:
| I don't think it's a trivial thing to do outside of work. At
| most you can play with kubernetes and cloud but in an interview
| the lack of experience will come out because I think some stuff
| can only be learned at work. Especially scalability.
| awesome_dude wrote:
| > but in an interview the lack of experience will come out
|
| Some people are just up front about it - I've read a lot, and
| practiced the best I can, but am looking for some real world
| experience to marry that too.
| otoolep wrote:
| FWIW, I built hraftd[1] many years ago to make it easy to play
| with a simple distributed system, but one that uses a
| production-grade implementation of Raft[2]. You can spin up a
| cluster in seconds on a single machine, kill nodes, watch a new
| Leader get elected, and so on.
|
| It's written in Go, so it'll help if you are familiar with Go.
| But the code is not difficult to understand even if you don't.
|
| [1] https://github.com/otoolep/hraftd
|
| [2] https://github.com/hashicorp/raft
| otoolep wrote:
| Oh, and more background here:
| https://www.philipotoole.com/building-a-distributed-key-
| valu...
| davidw wrote:
| Among other things, Fred is the author of "Learn you some Erlang"
| which is one of the best programming books I've read. It's so
| obviously a labor of love.
|
| https://learnyousomeerlang.com/
| keschi wrote:
| I recommended "Understanding Distributed Systems: What every
| developer should know about large distributed applications" by
| Roberto Vitillo to all my colleagues back when I worked on SaaS
| systems.
|
| "Designing Data-Intensive Applications: The Big Ideas Behind
| Reliable, Scalable, and Maintainable Systems" by Martin Kleppmann
| as the more advanced deep dive.
|
| Both books provide timeless conceptual advice. Kleppmann's
| description of developing a database by starting from an append-
| only text file really stuck with me.
| wooly_bully wrote:
| The book that Dominik Tornow is writing "Thinking in
| Distributed Systems" has been an excellent next read after DDIA
| for me (it's not yet finished I believe).
|
| Really shows the experience of someone who understands this
| stuff inside and out (was one of the main people behind
| Temporal).
| killthebuddha wrote:
| FWIW I don't see mention of incompleteness on the book's site
| http://book.dtornow.com/
| hiAndrewQuinn wrote:
| I often like to think that, at a basic level, all a [edit:
| indexed] db "does" is move our O(n) search of an unordered text
| file to the O(log n) search of a tree
| maerF0x0 wrote:
| if the facet is indexed.
| hiAndrewQuinn wrote:
| ah yes, thanks
| teraflop wrote:
| Yup.
|
| From a high-altitude view, that's why splitting a huge
| database table into smaller partitions is not an _automatic_
| performance win. If you have M partitions with N rows each,
| then a lookup might require O(log M) time to find a partition
| and O(log N) time to find a row within the partition. But
| O(log M + log N) = O(log MN) which is what you would get from
| a single big table with appropriate indexing.
|
| Of course, in the real world constant factors and
| implementation details matter, so this is just a heuristic.
| But it seems to run contrary to a lot of novice programmers'
| intuition that a large DB table must automatically be a slow
| one.
| ryandv wrote:
| To add to this list, there is also "Principles of Eventual
| Consistency" [0] for getting down to the mathematical
| formalisms.
|
| In addition, Lamport's paper "Time, Clocks, and the Ordering of
| Events in a Distributed System" [1].
|
| [0] https://www.microsoft.com/en-us/research/wp-
| content/uploads/...
|
| [1] https://lamport.azurewebsites.net/pubs/time-clocks.pdf
| yodsanklai wrote:
| > Lamport's paper "Time, Clocks, and the Ordering of Events
| in a Distributed System"
|
| I know this article is a classic. I studied it at school but
| I've always found it very hard to understand. Maybe I'm wrong
| but I have the feeling that relatively few engineers use
| these formalisms as their mental models when designing
| distributed systems.
| bostik wrote:
| It was surprising that Kleppman's book was mentioned only at
| the _very end_ of the article, but at least it came with an
| understandable caveat. That book is incredible - although in
| all honesty it does require solid foundation of distributed
| systems to make proper sense.
|
| Until you have personally battled with replication lag, real-
| life impacts of eventual consistency and distributed writes,
| Data-Intensive Applications feels like a dry theoretical read.
| If you do come across the book with the scars and lessons, it
| does open the world up.
| throw0101b wrote:
| The words "reading list" implied to me a list of books, article,
| _etc_ , that one would go over to learn about the topic.
|
| Can anyone familiar on the topic suggest a list? Perhaps starting
| with a "101" item for those that want a general understanding /
| scratch a curiosity itch and perhaps proceeding to more technical
| items for those that want to dig deep.
| ahansen wrote:
| I feel like the article itself does a pretty good job of
| introducing a lot of the core topics with a short paragraph for
| each
| 0xbadcafebee wrote:
| Is there a wiki for computer science? I feel like I have a couple
| books worth of knowledge on building and maintaining distributed
| systems that is just gonna die with me. Could try to start
| flushing out articles but would be helpful if others contributed
| plumeria wrote:
| Why not contribute to Wikipedia?
| patmorgan23 wrote:
| There's tons of CS stuff on regular Wikipedia.
| allendoerfer wrote:
| If you want to tackle it from a more practical point of view, I
| can also recommend "Site Reliability Engineering (How Google runs
| production systems)", which is not only about the method itself,
| but naturally goes over distributed systems and explains some
| fundamentals.
| lysecret wrote:
| Do not like these types of books try all of them. However,
| Designing Data-Intensive Applications is just fantastic.
| revskill wrote:
| No book mentioned how to do distributed transaction though.
| convolvatron wrote:
| atomic transactions by Nancy Lynch et al.
| esafak wrote:
| DDIA does under "Distributed Transactions and Consensus" in
| Chapter 9: Consistency and Consensus.
| HextenAndy wrote:
| I haven't followed the link - but why not put it all in one
| place?
| rochak wrote:
| I'd recommend checking out MIT's Distributed Systems course. All
| its videos and assignments are available online and teach you
| everything you would need to get into these systems and go in
| depth in them.
| nc0 wrote:
| I also recommend anyone interested to have a look at the
| Erlang/OTP ecosystem, especially for their design decisions.
| While the language and the platform isn't popular, the OTP team
| does present rich architectural patterns and ideas that can
| improve your design
| sharas- wrote:
| "End-to-End Argument in System Design" - Classic. Basically
| means: nerds, stop playing with yourselves and think about
| users/clients of your system.
| LAC-Tech wrote:
| I've gotten a lot of value by going to a topic I don't understand
| on wikipedia, finding the oldest paper they cite, and printing it
| out.
|
| Sometimes I only finish half the paper, but damned if I haven't
| learned a lot.
|
| Disclaimer: I can could never go through and systematically work
| through a giant list like this. If you know yourself and you can,
| this may be more effective.
| max_ wrote:
| I think "Specifying Systems" by Leslie Lamport should be on the
| list.
|
| Along with "Mastering Bitcoin: Programming the Open Blockchain
| Book" by Andreas Antonopoulos
| nvdnadj92 wrote:
| The best introductory resource I have found in my career was:
| "Distributed Systems for Fun and Profit" by Mixu. It's about 50
| pages long, and is broken down quite well.
|
| https://book.mixu.net/distsys/single-page.html
| macintux wrote:
| Warning: I haven't checked these links in forever, but here's a
| list of distributed systems reading lists.
|
| https://gist.github.com/macintux/6227368
___________________________________________________________________
(page generated 2024-02-08 23:01 UTC)