[HN Gopher] Your computer is a distributed system
___________________________________________________________________
Your computer is a distributed system
Author : carlesfe
Score : 173 points
Date : 2022-03-30 14:07 UTC (8 hours ago)
(HTM) web link (catern.com)
(TXT) w3m dump (catern.com)
| jmull wrote:
| > This is something unique: an abstraction that hides the
| distributed nature of a system and actually succeeds.
|
| That's not even remotely unique.
|
| OP is grappling with "the map is not the territory" vs. maps have
| many valid uses.
|
| Abstractions can be both not accurate in every context and 100%
| useful in many, many common contexts.
|
| Also (before you get too excited), abstractions have quality:
| there are good abstractions -- which are useful in many common
| contexts -- and bad abstractions -- which overpromise and turn
| out to be misleading in some or many common contexts.
|
| I'll put it this way: the idea that _The Truth_ exists is a rough
| (and not particularly useful) abstraction. If you have a problem
| with that, it just means you have something to learn to engage
| reality more fruitfully.
| sesuximo wrote:
| I think there's a big difference which is that your computer is
| allowed to crash when one component breaks whereas a distributed
| system is typically more fault tolerant.
| uvdn7 wrote:
| This is actually what makes handling the distributed system in
| a single computer easier - everything crashing together makes
| it an easier problem.
|
| E.g. multiple CPU caches can hold different values for the same
| main memory location, and cache coherence protocols keep them
| consistent. But those protocols never need to worry about the
| failure mode where one cache is temporarily unavailable while
| the others are not.
|
| So yes, there's a distributed system in each multi-core
| computer, but it's a distributed system with an easier failure
| mode.
|
| If you like more analogies between CPU caches and distributed
| systems, https://blog.the-pans.com/cpp-memory-model-as-a-
| distributed-... :p
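| A minimal Go sketch of the analogy (my own illustration, not
| taken from the linked post): the atomic store/load pair plays
| the role of the coherence protocol, and there is no partial-
| failure case where one observer can see the update while
| another's copy is unreachable.
|
|     package main
|
|     import (
|         "fmt"
|         "sync/atomic"
|     )
|
|     func main() {
|         var shared atomic.Int64 // one "main memory" location
|         published := make(chan struct{})
|
|         // "Core 1" writes a new value; the coherence protocol
|         // (here, the atomic store) makes it visible everywhere.
|         go func() {
|             shared.Store(42)
|             close(published)
|         }()
|
|         // "Core 2" reads it back. It sees either the old value
|         // or the new one -- never a half-updated or missing copy.
|         <-published
|         fmt.Println(shared.Load()) // prints 42
|     }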
| harperlee wrote:
| Ideally a peripheral crashing should not crash the whole
| system.
| catern wrote:
| And indeed it does not: Modern operating systems like Linux
| can perfectly well deal with all kinds of devices crashing or
| disappearing at runtime. Just like in larger distributed
| systems.
| amelius wrote:
| Yes, and your computer is a ball of interconnected microservices
| too.
| JL-Akrasia wrote:
| You are also a distributed system.
| __turbobrew__ wrote:
| Some more distributed than others
| ilaksh wrote:
| This proves that conventional wisdom (such as the idea that
| abstracting distributed computation is unworkable) is often
| wrong.
|
| What happens is that enough people try to do something and can't
| quite get it to work right that it eventually becomes assumed
| that anyone trying that approach is naive. Then people actively
| avoid trying because they don't want others to think they don't
| know "best practices".
|
| Remember the post from the other day about magnetic amplifiers?
| Engineers in the US gave up on them. But for the Russians, mag
| amps never became "unworkable" and uncool to try, and they
| eventually solved the hard problems and made them extremely
| useful.
|
| Technology is much more about trends and psychology than people
| realize. In some ways, so is the whole world. It seems to me that
| at some level, most humans never _really_ progress beyond middle-
| school level.
|
| The starting point for analyzing most things should probably be
| from the context of teenage primates.
| andrey_utkin wrote:
| Your body is a distributed system. Your brain is a distributed
| system. A live cell is a distributed system. A molecule is a
| distributed system. In other news, water is wet.
| rbanffy wrote:
| This has been true of several home computers since the late 70's.
| Atari 8-bit computers had all peripherals connecting via a serial
| bus, each one with its own little processor, ROM, RAM and IO (the
| only exception, IIRC, was the cassette drive). Commodores also had
| a similar design for their disk drives. A couple months back a
| 1541 drive was demoed running standalone with custom software and
| generating a valid NTSC signal.
| Frenchgeek wrote:
| ( https://youtu.be/zprSxCMlECA )
| dkersten wrote:
| Wow, that is cool!
| catern wrote:
| Wow! Reminds me of https://www.rifters.com/crawl/?p=6116
|
| A hydrocephalic demo!
| rbanffy wrote:
| I think that plan hits a wall for heat dissipation and
| nutrient/oxygen consumption - not sure we have lungs large
| enough to keep a brain doing 10x more computation
| oxygenated, nor perspiration glands to keep it cool.
|
| But I'd be totally in for a 10% increase in IQ in exchange for
| being able to eat 10% more sugar.
| YZF wrote:
| Well, it's been true ever since a wire first connected any two
| bits. The processor, ROM, and RAM are all "distributed" systems
| internally.
| rbanffy wrote:
| That's not what "distributed system" means.
| YZF wrote:
| What's your definition of "distributed system" then?
|
| Two flipflops interconnected on one wafer. Two flipflops
| interconnected on one PCB. Two flipflops interconnected with
| a cable between two PCBs. These are all "distributed".
| They're all subject e.g. to the CAP theorem. Sure, the
| probability of one flipflop failing on the same wafer is
| quite small. The probability of one flipflop failing on one
| PCB is slightly larger. But fundamentally all these systems
| are the same. If you have two computers on a network you
| can make the probability of failure (e.g. of the network)
| pretty small.
| rbanffy wrote:
| I start counting them as independent computers when they
| have their own firmware.
| TickleSteve wrote:
| It absolutely is.
|
| Signal distribution within smaller systems (microcontrollers,
| ASICs, FPGAs, etc.) makes them distributed systems too. Ask
| anyone doing any kind of circuit design about distributing
| clocks, clock skew, etc.
| rbanffy wrote:
| If you read the article, you'll understand it's about our
| computers being networks of smaller computers. The SSD,
| GPU, NIC, and BMC each have their own CPU, memory, and operating
| system.
| alexisread wrote:
| There are lots of good resources in this area: The programming
| language of the transputer
| https://en.m.wikipedia.org/wiki/Occam_(programming_language)
|
| Bluebottle active objects https://www.research-
| collection.ethz.ch/bitstream/handle/20.... with some discussion
| of DMA
|
| Composita components
| http://concurrency.ch/Content/publications/Blaeser_Component...
|
| Mobile Maude (only a spec)
| http://maude.sip.ucm.es/mobilemaude/mobile-maude.maude
|
| Kali scheme (atop Scheme48 secure capability OS)
| https://dl.acm.org/doi/pdf/10.1145/213978.213986
|
| Kali is probably the closest to a distributed OS, supporting
| secure thread and process migration across local and remote
| systems (and makes that explicit), distributed profiling and
| monitoring tools, etc. It is basically an OS based on the actor
| model. It doesn't scale massively, as routing between nodes was
| out of scope (it connects all nodes on a bus), but that can
| easily be added.
|
| Extremely small (running in 2 MB of RAM), it covers all of R5RS,
| and the VM has been adapted to bare metal.
|
| I feel that there is more to do, but a combination of those is
| probably the right direction.
| throwaway787544 wrote:
| The thing we are missing still is the distributed OS. Kubernetes
| only exists because of the missing abstractions in Linux to be
| able to do computation, discovery, message passing/IO,
| instrumentation over multiple nodes. If you could do _ps -A_ and
| see all processes on all nodes, or run a program and have it
| automatically execute on a random node, or if ( _grumble grumble_
| ) Systemd unit files would schedule a minimum of X processes on N
| nodes, most of the K8s ecosystem would become redundant. A lot of
| other components like unified AuthZ for linux already exist, as
| well as networking (WireGuard anyone?).
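| To make the first of those concrete, here's a naive cluster-wide
| "ps -A" sketched in Go (node names are placeholders, and there's
| no failure handling beyond skipping unreachable nodes) -- roughly
| the primitive a distributed OS would give you for free:
|
|     package main
|
|     import (
|         "fmt"
|         "os/exec"
|     )
|
|     func main() {
|         // Hypothetical node names; in a real distributed OS this
|         // list would come from membership/discovery, not a slice.
|         nodes := []string{"node1", "node2", "node3"}
|
|         for _, n := range nodes {
|             out, err := exec.Command("ssh", n, "ps", "-A").Output()
|             if err != nil {
|                 fmt.Printf("=== %s: unreachable (%v)\n", n, err)
|                 continue
|             }
|             fmt.Printf("=== %s ===\n%s", n, out)
|         }
|     }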
| ff317 wrote:
| There were older attempts at this stuff, in the 90s with
| "Beowulf" clusters that had cross-machine process management
| and whatnot. It's a lot harder than it seems to make this
| approach make sense in the real world, as the abstraction hides
| important operational details. The explicit container +
| orchestration abstraction is probably closer to the ideal than
| trying to stretch linux/systemd/cgroups across the network
| "seamlessly". It's clearer what's going on and what the
| operational trade-offs are.
| gnufx wrote:
| > in the 90s with "Beowulf" clusters
|
| In case of any confusion, that sort of thing wasn't a generic
| Beowulf feature, but it sounds like Bproc. I don't know if
| it's still used. (The Sourceforge version is ancient.)
|
| https://updates.penguincomputing.com/clusterware/6/docs/clus.
| .. https://sourceforge.net/projects/bproc/
|
| Containers actually only make it harder to "orchestrate" your
| distributed processes in an HPC system.
| mnd999 wrote:
| Imagine a Beowulf cluster of hot grits in soviet Russia with
| CowboyNeal.
| uvdn7 wrote:
| Abstracting a fleet of machines as a single supercomputer sounds
| nice. But what about partial failures? It's something that a
| real stateful distributed system would have to deal with all
| the time but a single host machine almost never deals with (do
| you worry about a single cacheline failure when writing a
| program?).
| marcosdumay wrote:
| There is a huge amount of research about distributed OSes
| (really, they were very fashionable in the 90's and early
| 00's). Plenty of people worked on this problem, and it's
| basically solved (as in, we don't have any optimal solution,
| but it won't be a problem on a real system).
| NavinF wrote:
| It's "basically solved" in the sense that everyone gave up
| on distributed OSes and used k8s instead.
| zozbot234 wrote:
| K8s is doing distributed OS's on easy mode, supporting
| basically ephemeral 'webscale' workloads for pure
| horizontal scaling. Even then it introduces legendary
| amounts of non-essential complexity in pursuit of this
| goal. It gets used because "Worse is better" is a thing,
| not because anyone thinks it's an unusually effective way
| to address these problems.
| ohYi55 wrote:
| als0 wrote:
| I remember the Barrelfish OS was trying to tackle this problem
| head on https://barrelfish.org/
| evandrofisico wrote:
| Actually, at some point in the 2.4 kernel it was possible to
| do that, with single system image setups such as openMosix,
| which handled process discovery, computation and much more.
| But underneath the simple user interface it was complex and
| kinda insecure, and so it was eventually abandoned and never
| ported to newer kernels.
| oceanplexian wrote:
| Am I the only one who doesn't want this?
|
| The entire point of UNIX philosophy (Which seems to be
| something they aren't teaching in software development these
| days) is to do one thing and do it well. We don't need Linux
| operating as a big declarative distributed system with a
| distributed scheduling system and a million half-baked APIs to
| interact with it, the way K8s works. If you want that
| you should build something to your specific requirements, not
| shove more things into the kernel.
| random314 wrote:
| The Unix philosophy was a reasonably good model decades ago.
| But I think it is over romanticized.
|
| Its binary-blob design is no good for security, as opposed
| to a byte-code design like Forth. Its user security model was
| poor and doesn't help with modern devices like phones. Its
| multiprocess model was ham-fisted into a multithreading model
| to compete with Windows NT. Its asynchronous I/O model has
| always been a train wreck even compared to NT. Its design
| creates performance issues, especially in multiproc
| networking code with a needless amount of memcpys. Now folks
| are rewriting the networking stack in user space. Its
| software abstraction layer was some simple scheme from the
| 70s which has fragmented into a crazy number of
| implementations now. Open source developers still complain
| about how much easier it is to build a package for Windows,
| as opposed to Linux. It was never meant to be a distributed
| system either. Modern enterprise compute cannot scale by
| treating and managing each individual VM as its own thing,
| with clusters held together by some sysadmin's batch scripts.
| anthk wrote:
| And yet Linux manages heavy I/O over filesystems better than
| Windows NT.
| pjmlp wrote:
| Because it doesn't provide the abstraction capabilities
| that NTFS allows for third parties, so naturally it is
| faster doing less.
| aseipp wrote:
| A good paper giving a concrete example of all this is "A
| fork() in the road", where you can see how an API just like
| fork(2) has an absolutely massive amount of ramifications
| on the overall design of the system, to the point "POSIX
| compliance" resulted in some substantial perversions of the
| authors' non-traditional OS design, all of which did
| nothing but add complexity and failure modes ("oh, but I
| thought UNIX magically gave you simplicity and made
| everything easy?") It also has significantly diverged from
| its "simple" original incarnation in the PDP-11 to a
| massive complex beast. So you can add "CreateProcess(), not
| fork()" on the list of things NT did better, IMO.
|
| And that's just a single system call, albeit a very
| important one. People simply vastly overestimate how rose-
| tinted their glasses are and all the devils in the details,
| until they actually get into the nitty gritty of it all.
|
| https://www.microsoft.com/en-
| us/research/uploads/prod/2019/0...
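| For contrast, here is what spawn-style process creation looks
| like from Go, which deliberately exposes no fork(): the child's
| state (argv, stdio) is described up front instead of being
| cloned from the parent and patched afterwards. Just a sketch of
| the API shape, not a claim about what the paper recommends:
|
|     package main
|
|     import (
|         "fmt"
|         "os/exec"
|     )
|
|     func main() {
|         // Describe the child explicitly, in the
|         // CreateProcess()/posix_spawn() style.
|         cmd := exec.Command("echo", "hello from the child")
|         out, err := cmd.Output()
|         if err != nil {
|             panic(err)
|         }
|         fmt.Print(string(out))
|     }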
| goodpoint wrote:
| Linux/UNIX does not have to turn into a mess like k8s to be
| natively distributed. Plan9 was doing it with a tiny codebase
| in comparison.
| pjmlp wrote:
| The philosophy that is cargo culted and was never taken
| seriously by any commercial UNIX.
| aseipp wrote:
| All of this was possible with QNX literally decades ago, and
| it didn't need whatever strawman argument you're making up in
| your head in order to accomplish it. QNX was small, fast,
| lean, real-time, distributed, and very powerful for the time.
| Don't worry, it even had POSIX support. A modern QNX would be
| very well received, I think, precisely because taking a
| distributed-first approach would dramatically simplify the
| whole system design versus tacking on a distributed layer on
| top of one designed for single computers.
|
| > Which seems to be something they aren't teaching in
| software development these days
|
| This is funny. Perhaps the thing you should have been taught
| instead is history, my friend.
| jeffreygoesto wrote:
| You mean QNet [0]? That is still alive... It is for LAN use
| ("Qnet is intended for a network of trusted machines that
| are all running QNX Neutrino and that all use the same
| endianness."), so extra care is needed to secure this group
| of machines when exposed to the internet.
|
| [0] https://www.qnx.com/developers/docs/7.0.0///index.html#
| com.q...
|
| [1] https://recon.cx/2018/brussels/resources/slides/RECON-
| BRX-20...
| aseipp wrote:
| Correct. Though QNet itself is only one possible
| implementation, in a sense (but obviously the one shipped
| with QNX.) And the more important part of the whole thing
| is the message-passing API design built into the system,
| which enables said networking transparency, because it
| means your programs are abstracted over the underlying
| transport mechanism.
|
| "LAN use" I think would qualify roughly 95% of the need
| for a "distributed OS," including a lot of usage of K8s,
| frankly. Systems with WAN latency impose a different set
| of challenges for efficient comms at the OS layer. But
| even then you also have to design your apps themselves to
| handle WAN-scale latencies, failover, etc too. So it
| isn't like QNX is going to make your single-executable
| app magic or whatever bullshit. But it exposes a set of
| primitives that are much more tightly woven into the core
| system design and much more flexible for IPC. Which is
| what a distributed system is; a large chattery IPC
| system.
|
| The RECON PDF is a very good illustration of where such a
| design needs to go, though. It doesn't surprise me QNX is
| simply behind modern OS's exploit mitigations. But on top
| of that, a modern take on this would have to blend in a
| better security model. You'd really just need to throw
| out the whole UNIX permission model frankly, it's simply
| terrible as far as modern security design is concerned.
| QNet would obviously have to change as well. You'd at
| minimum want something like a capability-based RPC layer
| I'd think. Every "application server" is like an
| addressable object you can refer to, invoke methods on,
| etc. (Cap'n Proto is a good way to get a "feel" for this
| kind of object-based server design without abandoning
| Linux, if you use its RPC layer.)
|
| I desperately wish someone would reinvent QNX but with
| all the nice trappings and avoiding the missteps we've
| accumulated over the past 10 to 15 years. Alas, it's much
| more profitable to simply re-invent its features poorly
| every couple of years and sell that instead.
|
| This overview of the QNX architecture (from 1992!) is one
| of my favorite papers for its simplicity and
| straightforward prose. Worth a read for anyone who likes
| OS design.
|
| https://cseweb.ucsd.edu/~voelker/cse221/papers/qnx-
| paper92.p...
| Karrot_Kream wrote:
| The UNIX philosophy made more sense as an abstraction for a
| computer when computers were simpler. Computers nowadays
| (well at least since 2006-ish) have multiple cores executing
| simultaneously with complicated amounts of background logic,
| interrupt-driven logic, shared caches, etc. The UNIX
| philosophy doesn't map to this reality at all. Right now
| there's no set of abstractions except machine code that
| exposes the machine's distributed nature in a coherent
| abstraction. Nothing is stopping someone else from writing a
| UNIX abstraction atop this though.
| generalizations wrote:
| The idea of doing one thing, and doing it well, isn't
| dependent on the simplicity of the underlying system (I
| imagine that PDP-11 systems seemed impressively complicated
| in their time, too). The UNIX philosophy is a paradigm for
| managing complexity. To me, that seems more relevant with
| modern computers, not less.
|
| > "A program is generally exponentially complicated by the
| number of notions that it invents for itself. To reduce
| this complication to a minimum, you have to make the number
| of notions zero or one, which are two numbers that can be
| raised to any power without disturbing this concept. Since
| you cannot achieve much with zero notions, it is my belief
| that you should base systems on a single notion." - Ken
| Thompson
| icedchai wrote:
| I think OpenVMS did this... in the 80's.
| gnufx wrote:
| Distributed computation with message passing (and RDMA) is the
| essence of HPC systems. SGI systems supported multi-node Linux
| single system images up to ~1024 cores a fair few years ago,
| but they depend on a coherent interconnect (NUMAlink,
| originally from the MIPS-based systems under Irix).
|
| However, you don't ignore the distributed nature of even single
| HPC nodes unless you want to risk perhaps an order of magnitude
| performance loss. SMP these days doesn't stand for Symmetric
| Multi-Processing.
| zozbot234 wrote:
| Distributed shared memory _is_ feasible in theory, even when
| provided in software by the OS. You're right that this
| would not change the physical reality of message passing, but
| it would allow a single multi-processor application to
| operate seamlessly using either shared memory on a single
| node, or distributed memory on a large cluster.
| gnufx wrote:
| I'm talking about practice in HPC, not theory, and this
| stuff is literally standard (remote memory of various types,
| and the same code running the same, modulo performance and
| resources, on a 32-core node as on one core each of 32
| nodes). However, you still need to consider network non-
| uniformity at every level from NUMA nodes up, at least
| if you want performance in general.
| f0e4c2f7 wrote:
| I very much agree with this and while Kubernetes is better than
| a poke in the eye, I look forward to the day when there is a
| true distributed OS available in the way you describe. It's
| possible Kubernetes could even grow into that somehow.
| Karrot_Kream wrote:
| I think you're looking at the wrong abstraction level. You're
| thinking on a node (computer) basis. Even on a single computer,
| many of the things that happen are distributed. DMA
| controllers, input interrupts, kernel-forced context switches,
| there's a lot going on there but we still pretend that our
| computers are just executing sequential code. I agree with the
| OP and think it's high time we treat the computer as the
| distributed system it is. Fuchsia and GenodeOS are both making
| developments in this direction.
| zozbot234 wrote:
| The abstractions are there in Linux, largely imported from plan
| 9. And work is ongoing to support further abstractions, such as
| easy checkpoint/restore of whole containers. Kubernetes is a
| very new framework intended to support large-scale
| orchestration and deployment in a mostly automated way, driven
| by 'declarative' configuration; at some point, these features
| will be rewritten in a way that's easier to understand and
| perhaps extend further.
| MisterTea wrote:
| > The abstractions are there in Linux, largely imported from
| plan 9.
|
| Which abstractions are those?
| zozbot234 wrote:
| > to be able to do computation, discovery, message
| passing/IO, instrumentation over multiple nodes.
|
| Kernel namespaces are the building blocks for this, because
| an app that accesses all kernel-managed resources via
| separate namespaces is insulated from the specifics of any
| single node, and can thus be transparently migrated
| elsewhere. It enables the kind of location independence
| that OP is arguing for here.
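| For the flavor of it, a Linux-only Go sketch that starts a
| shell in fresh UTS and PID namespaces (needs privileges or an
| additional user namespace); inside, the process sees its own
| hostname and its own PID 1, detached from the host's:
|
|     package main
|
|     import (
|         "os"
|         "os/exec"
|         "syscall"
|     )
|
|     func main() {
|         cmd := exec.Command("/bin/sh")
|         cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
|         // New UTS + PID namespaces: the child gets its own
|         // hostname and its own process-ID space.
|         cmd.SysProcAttr = &syscall.SysProcAttr{
|             Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
|         }
|         if err := cmd.Run(); err != nil {
|             panic(err)
|         }
|     }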
| stormbrew wrote:
| Linux namespaces don't actually do any of those things
| though? Like, not even a single one of them is made
| possible because of namespaces. They're all possible or
| not possible precisely as much with or without
| namespaces.
|
| The thing is when comparing plan9 and linux here, you
| have to recognize that linux has it backwards. On plan9
| namespaces are emergent from the distributed structure of
| the system. On linux they form useful tools to _build_ a
| distributed system.
|
| But what's possible on plan9 is possible because it
| really does do "everything is a file," so your namespace
| is made up of io devices (files) and you can construct or
| reconstruct that namespace as you need.
|
| Like, this[1] is a description of how to configure
| plan9's cpu service so you run programs on another node.
|
| [1]
| https://9p.io/wiki/plan9/Expanding_your_Grid/index.html
|
| Nothing in there makes any sense from a linux containers
| perspective. You can't namespace the cpu. You can't
| namespace the gui terminal. All you can namespace is
| relatively superficial things, and even then opening up
| that namespacing to unprivileged users has resulted in
| several linux CVEs over the last year because it's just
| not built with the right assumptions.
| zozbot234 wrote:
| Doesn't Linux create device files in userspace these
| days, anyway? I thought that's what that udev stuff was
| all about. So I'm not sure that the Plan9 workflow is
| _inherently_ unfeasible, there's just no idiomatic
| support for it just yet.
| stormbrew wrote:
| device nodes are managed in userspace nowadays yes, but
| they're just special files that identify a particular
| device id pair and then the OS acts on them in a special
| way. udev is just the userspace part of things that
| manages adding and removing them in response to hotplug
| events. Everything that matters about them is still
| controlled by the kernel.
| glorfindel66 wrote:
| That's not at all what Linux namespaces permit. It's a
| side effect of using them that could be leveraged using
| something like CRIU, sure, but it's not what they're for
| and they're not a building block for anything mentioned
| in the portion of their comment you quoted.
|
| Namespaces simply make the kernel lie when asked about
| sockets and users and such. It's intended for isolation
| on a single server. They're next to useless in
| distributed work, particularly the kind being discussed
| here (Plan 9ish). You actually want the opposite: to
| accomplish that, you want the kernel to lie even harder
| and make things up in the context of those interfaces,
| rather than hide things. Namespaces don't really get you
| there in their current form.
| zozbot234 wrote:
| > That's not at all what Linux namespaces permit.
|
| Isolating processes from the specifics of the system
| they're running on is a key feature of the namespace-
| based model; it seems weird to call it a "side effect
| only". We should keep in mind that CRIU itself is still a
| fairly new feature that's only entered mainline recently,
| and the kernel already has plenty of ways to "make up"
| more virtual resources that are effectively controlled by
| userspace. While it may be true that these things are
| largely ad hoc for now, it's not clear that this will be
| an obstacle in the future.
| gnufx wrote:
| I can talk about namespaces in HPC distributed systems,
| and they don't look anything like Plan 9 to me. They make
| life harder in various respects, and even dangerous with
| Linux features that don't take them into account (like at
| least one of the "zero-copy" add-on modules used by MPI
| shared memory implementations).
| NavinF wrote:
| Eh I can't see Linux getting a built-in distributed kv store
| (etcd) any time soon. Same goes for distributed filesystems.
| All you have out of the box is nfs which gives you the worst of
| both worlds: Every nfs server is a SPOF yet these servers don't
| take advantage of their position to guarantee even basic
| consistency (atomic appends) that you get for free everywhere
| else.
|
| And besides, how would you even implement all those features you
| listed without recreating k8s? A distributed "ps -A" that just
| runs "for s in $servers; do ssh user@$s ps -A; done" and sorts the
| output would be trivial, but anything more complex (e.g.
| keeping at least 5 instances of an app running as machines die)
| requires distributed and consistent state.
| zozbot234 wrote:
| > requires distributed and consistent state
|
| Distributed yes, but not necessarily consistent. You can use
| CRDTs to manage "partial, flexible" consistency requirements.
| This might mean, e.g. sometimes having more than 5 instances
| running, but should come with increased flexibility overall.
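| For example, a grow-only counter CRDT (a toy Go sketch of my
| own): each replica only increments its own slot, merge takes
| per-node maxima, and replicas converge without coordination:
|
|     package main
|
|     import "fmt"
|
|     // GCounter is a grow-only counter: one slot per node.
|     type GCounter map[string]int
|
|     func (c GCounter) Inc(node string) { c[node]++ }
|
|     func (c GCounter) Value() int {
|         sum := 0
|         for _, v := range c {
|             sum += v
|         }
|         return sum
|     }
|
|     // Merge is commutative, associative and idempotent, so
|     // replicas can exchange state in any order and converge.
|     func (c GCounter) Merge(other GCounter) {
|         for node, v := range other {
|             if v > c[node] {
|                 c[node] = v
|             }
|         }
|     }
|
|     func main() {
|         a, b := GCounter{}, GCounter{}
|         a.Inc("a")
|         a.Inc("a")
|         b.Inc("b")
|         a.Merge(b)
|         b.Merge(a)
|         fmt.Println(a.Value(), b.Value()) // 3 3
|     }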
| throwaway787544 wrote:
| Fwiw those features existed in Mosix (a Linux SSI patch) 2
| decades ago... I feel like we could probably do it again
|
| In terms of CAP, yeah it might not have been technically as
| reliable. But there's different levels of reliability for
| different applications; we could implement a lot of it in
| userland and tailor as needed
| wwalexander wrote:
| Plan 9 was designed in this way, but never took off.
|
| Rob Pike:
|
| > This is 2012 and we're still stitching together little
| microcomputers with HTTPS and ssh and calling it revolutionary.
| I sorely miss the unified system view of the world we had at
| Bell Labs, and the way things are going that seems unlikely to
| come back any time soon.
| monocasa wrote:
| Are there any good walkthroughs of what a good, distributed
| plan 9 setup looks like from either a development or an
| administration perspective? Particularly with an emphasis on many
| distributed compute nodes (or cpu servers in plan 9
| parlance).
| jasonwatkinspdx wrote:
| I think Rob is right to call out the problem, but is being a
| bit rose colored about Plan 9.
|
| Plan 9 was definitely ahead of its time, but it's also a far
| cry from the sort of distributed OS we need today.
| "Everything is a remote posix file" ends up being a really
| bad abstraction for distributed computing. What people are
| doing today with warehouse scale clusters indeed has a ton of
| layers of crap in there, and I think it's obvious to yearn for
| sweeping that away. But there's no chance you could do that
| with P9 as it was designed.
| wahern wrote:
| "Everything is a file" originally referred to read and
| write as universal object interfaces. It's similar to
| Smalltalk's send/receive as an idealized model for object-
| based programming. Hierarchical filesystem namespaces for
| object enumeration and acquisition is tangential, though it
| often works well because most namespaces (DNS, etc) tend to
| be hierarchical. (POSIX filesystem semantics doesn't really
| figure into Plan 9 except, perhaps, incidentally.)
| Filesystem namespacing isn't quite as abstract, though
| (open, readdir, etc, are much more concrete interfaces),
| making impedance mismatch more likely.
|
| The abstraction is sound. We ended up with TCP and HTTP
| instead of IL and 9P (and at scale, URLs instead of file
| descriptors), because of trust issues, but that's not
| surprising. Ultimately the interface of read/write sits
| squarely in the middle of all of them, and most others. To
| build a distributed system with different primitives at the
| core, for example, send/receive, requires creating
| significantly stronger constraints on usage and
| implementation environments. People do that all the time,
| but in _practice_ they do so by _building_ atop the file
| interface model. That 's what makes the "everything is a
| file" model so powerful--it's an interoperability sweet
| spot; an axis around which you can expect most large-scale
| architectures to revolve around at their core, even if the
| read/write abstraction isn't visible at the point users
| (e.g. application developers) interact with the
| architecture.
| jasonwatkinspdx wrote:
| A hierarchical namespace is fine, but the
| open/read/write/sync/close protocol on byte based files
| is definitely inadequate. The constraints on usage you
| decry are in fact fundamental constraints of distributed
| computing that are at odds with the filesystem
| abstraction. And this is exactly what I was getting at in
| talking about rose colored glasses with P9. It in no way
| is a replacement for something like Colossus or Spanner.
| zozbot234 wrote:
| > P9 ... in no way is a replacement for something like
| Colossus or Spanner.
|
| Colossus and Spanner are both proprietary so there's very
| limited info on them, but both seem to be built for very
| specialized goals and constraints. So, not really on the
| same level as a general system interface like 9P, which
| is most readily comparable to, e.g. HTTP. In Plan 9, 9P
| servers are routinely used to locally _wrap_ connections
| to such exotic systems. You can even require the file
| system interface locally exposed by 9P to be endowed with
| extra semantics, e.g. via special messages written to a
| 'control' file. So any level of compatibility or lack
| thereof with simple *nix bytestreams can be supported.
| NavinF wrote:
| Meh. Every time a 9p server dies, every client dies. Plan9 is
| not comparable to k8s.
| nautilus12 wrote:
| Glad to see Plan 9 getting some love in the comments even if
| it didn't make it into the article.
| gnufx wrote:
| If you yearn for Plan 9 -- I'm not sure I do -- Minnich's
| current incarnation of the inspiration seems to be
| https://github.com/u-root/cpu
| jlpom wrote:
| This describes more of a Single System Image [0] to me
| (Wikipedia includes Plan 9 as one, but considering it does not
| support process migration I find that moot). LinuxPMI [1]
| seems to be a good idea, but it seems to be based on Linux
| 2.6, so you would have to heavily patch newer kernels. The
| only things that seem to support process migration with
| current software / are still active are CRIU [2] (which
| doesn't support graphical/Wayland programs) and DragonflyBSD
| [3] (in their own words, very basic).
|
| [0]: https://en.wikipedia.org/wiki/Single_system_image
| [1]: http://linuxpmi.org
| [2]: criu.org
| [3]: https://man.dragonflybsd.org/?command=sys_checkpoint&section...
| zozbot234 wrote:
| Graphical programs could be checkpointed and restored as
| long as they don't directly connect to the hardware.
| (Because the checkpoint/restore system has no idea how to
| grab the hardware's relevant state or replicate it on
| restore.) This means running those apps in a hardware-
| independent way (e.g. using a separate Wayland instance
| that connects to the system one), but aside from that it
| ought to be usable.
| jlpom wrote:
| For CRIU it is not supported:
| https://criu.org/Integration#Wayland.2FWeston, also in my
| experience it doesn't work. Are you talking about some
| other software?
| zozbot234 wrote:
| It has been done "virtually" by going through e.g. VNC
| https://criu.org/VNC . Alternately, CRIU apps could be
| required to use virt-* devices, which CRIU might
| checkpoint and restore similar to VM's.
| stormbrew wrote:
| I don't really see any reason to consider process migration
| a required feature of either a distributed os or a single
| system image. Even on a single computer this isn't always
| practical or desireable (ie. you can't 'migrate' a program
| running on your gpu to your cpu, and you can't trivially
| migrate a thread from one process to another either).
|
| Not all units of computation are interchangeable, and a
| system that recognizes this and doesn't try to shoehorn
| everything down to the lowest common denominator actually
| _gains_ some expressive power over a uniform system (else
| we would not have threads).
| gnufx wrote:
| For what it's worth, the HPC-standard way of
| checkpointing/migrating distributed execution (in
| userspace, unlike CRIU) is https://dmtcp.sourceforge.io/ It
| supports X via VNC -- I've never tried -- but I guess you
| could use xpra.
| MisterTea wrote:
| > This describes more of a Single System Image [0] to me
|
| No, Plan 9 is not a SSI OS. The idea is all resources are
| exposed via a single unified file oriented protocol: 9p.
| All devices are files which means all communication happens
| over fd's meaning you look at your computer like a patch
| bay of resources, all communicated with via read() and
| write(). e.g.:
|
|   [physical disk]<-->[kernel: sd(3)]-----< /dev/sdE0/
|   [audio card] <---->[kernel: audio(3)]--< /dev/audio
|   [keyboard]-------->[kernel: kbd(3)]----< /dev/kbd
|
| Looking above it looks like Unix but with MAJOR
| differences. First off the disk is a directory containing
| partitions which are just files whose size is the
| partition's size. You can read or write those files as you
| please. Since the kernel only cares about exposing hardware
| as files, the file system on a partition needs to be
| translated to 9p. We do this with a program that is a file
| server which interprets e.g. a fat32 fs and serves it via
| 9p (dossrv(4)). Your disk based file system is just a user-
| space program.
|
| And since files are the interface you can bind over them to
| replace them with a different service like mixfs(4).
| /dev/audio is like the old linux oss where only one program
| could open a sound card at a time. To remedy this on plan 9
| you run mixfs which opens /dev/audio and then binds itself
| over /dev replacing /dev/audio in that namespace with a
| multiplexed /dev/audio from mixfs. Now you start your
| window manager and the children programs will see mixfs's
| /dev/audio instead of the kernel /dev/audio. Your programs
| can now play audio simultaneously without changing
| ANYTHING. Now compare that simplicity to the trash fire
| linux audio has been and continues to be with yet another
| audio subsystem.
|
| Keyboard keymaps are a filter program sitting between
| /dev/kbd and your program. All it does is read in key codes
| and maps key presses according to a key map which is just a
| file with key->mapping lines. Again, keyboards are files so
| a user space file server can be a keyboard such as a GUI
| keyboard that binds itself over /dev/kbd.
|
| Now all those files can be exported or imported to other
| machines, regardless of CPU architecture.
|
| Unix is an OS built on top of a single machine. Plan 9 is a
| Unix built on top of a network. It's the closest I can get
| to computing nirvana where all my resources are available
| from any machine with simple commands that are part of the
| base OS which is tiny compared to the rest.
| emteycz wrote:
| Best explanation of Plan 9 I've ever seen
| [deleted]
| [deleted]
| benreesman wrote:
| Eric Brewer thinks this is a good point of view on such things:
|
| https://codahale.com/you-cant-sacrifice-partition-tolerance/
|
| L1-blockchain entrepreneurs and people who got locked into
| MongoDB aside, I think most agree.
| pkilgore wrote:
| What is the kernel and the bus for the cloud?
| simne wrote:
| These are all now virtual state machines, which store some
| state and turn all kernel/bus behavior into interaction with
| network-connected devices.
|
| At the moment there are lots of such devices - plenty of full-
| featured ones, like the Raspberry Pi, but also network-attached
| ATA drives, network-connected sensors, RAM, and ROM (flash).
| BTW, IEEE 1394 FireWire is a serial interface that could be
| used as a networking bus; Ethernet-USB adapters exist (and many
| commodity devices work well over such a connection), so
| virtually anything could be considered as connected via a
| network bus. There are even USB 3.0 to PCIe adapters for using
| a PCIe device over a USB connection.
|
| And there is a real problem here: FireWire is so "distributed"
| that on Macs with a FireWire interface it was possible to read
| system memory over that interface.
|
| So the hardware and software exist, but some steps are needed
| to make their usage safe.
| WestCoastJustin wrote:
| Great post called "Achieving 11M IOPS & 66 GB/s IO on a Single
| ThreadRipper Workstation" [1, 2] that basically walks through
| step-by-step that your computer is just a bunch of interconnected
| networks.
|
| Highly recommend the post if you're into this and also sort of
| amazing how far single systems have come. You can basically do
| "big data" type things on this single box.
|
| [1] https://tanelpoder.com/posts/11m-iops-with-10-ssds-on-amd-
| th...
|
| [2] https://news.ycombinator.com/item?id=25956670
| syngrog66 wrote:
| Once you learn to bias toward thinking in terms of message
| passing between actors, and toward having immutable shared
| state, a lot of problems become easier to decompose and solve
| elegantly, especially at scale.
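| A tiny Go sketch of that style (my own illustration): the
| "actor" goroutine owns its state outright, and everyone else
| talks to it by sending messages instead of sharing mutable
| memory:
|
|     package main
|
|     import "fmt"
|
|     type request struct {
|         delta int
|         reply chan int
|     }
|
|     // counterActor owns the counter; nobody else touches it.
|     func counterActor(reqs <-chan request) {
|         total := 0
|         for r := range reqs {
|             total += r.delta
|             r.reply <- total
|         }
|     }
|
|     func main() {
|         reqs := make(chan request)
|         go counterActor(reqs)
|
|         for i := 1; i <= 3; i++ {
|             reply := make(chan int)
|             reqs <- request{delta: i, reply: reply}
|             fmt.Println(<-reply) // 1, 3, 6
|         }
|     }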
| hsn915 wrote:
| Yes but your computer will not gracefully handle CPUs randomly
| failing or RAM randomly failing. Sure, storage devices can come
| and go, but that's been the case since forever, and most programs
| are not written to handle this edge case gracefully. Except for
| the OS kernel.
|
| The links between the components of your computer are solid and
| cannot fail like actual computer network connections.
|
| In terms of "CAP" theorom, the system has no Partition tolerance.
| If one of the the links connecting CPUs/GPUs/RAM breaks, all hell
| breaks loose. If a single instruction is not processed correctly,
| all hell might break loose.
|
| So I find the analogy misleading.
| StillBored wrote:
| There have been machines tolerant to CPU and Mem failures, and
| to a certain extent this sorta works on some of the higher end
| machines that support ram/cpu hotplug. (historically see
| hp/tandem/nonstop, sunos/imp, etc).
|
| The problem is that Linux's monolithic model doesn't work well
| for kernel checkpoint/restore; despite it actually supporting
| CPU/RAM hotplug, they have to be gracefully removed.
|
| So, this is less about the machine being distributed, and more
| about the fact that Linux is the opposite of a microkernel/etc.
| that can isolate and restart its subsystems in the face of
| failure. It's also sorta funny that while these types of
| operations tend to need to be designed into the system, the
| last major OS's designed this way were done in the 1980's.
| dwohnitmok wrote:
| I know of no OSes that are resilient to CPU cores producing
| wrong results (or incorrect mem results: I consider ECC a
| lower level concern that is not part of the OS), whereas a
| lot of distributed consensus algorithms have this built into
| their requirements. EDIT: I have heard through the grapevine
| that something like this might be done for aerospace, but I
| have no personal experience with that.
|
| I agree with parent. The major reason why programming on a
| single computer is easier than a distributed system is that
| we assume total resilience of various components, which we
| cannot assume for a distributed system.
|
| From the article:
|
| > This offers hope that it is possible to some day abstract
| away the distributed nature of larger-scale systems.
|
| To do this is not a question of software abstractions, but
| hardware resilience. If we have a network which we can
| reasonably assume to have 100% uptime and absolutely no
| corruption between all its components then we can program
| distributed systems as single computers.
| catern wrote:
| Most distributed consensus algorithms, or distributed
| systems in general, are not resilient to nodes producing
| arbitrary wrong results. That's the realm of systems like
| Bitcoin, which achieve such resilience by paying big
| performance costs.
|
| So it shouldn't be surprising that computers have the same
| lack of resilience.
| anonymousDan wrote:
| Sorry what? That is exactly the purpose of Byzantine
| fault tolerant consensus algorithms, which have been
| around for many years.
| StillBored wrote:
| The Tandems I listed above originally used lock-stepped
| processors, as did Stratus etc.
|
| edit: Googling yields few results that aren't actual books.
| Try this:
|
| https://books.google.com/books?id=wBuy0oLXEuQC&pg=PA218&lpg
| =...
| dwohnitmok wrote:
| Ah well there you go. Had no idea they used lock
| stepping!
| gnufx wrote:
| It doesn't count as resilient in the mainframe sense, but in
| an effort to encourage system management, I ran the Node
| Health Check system on our "commodity" HPC cluster and found
| multiple failed DIMMs and a failed socket no-one had noticed.
| (I'd had enough alerts from that on a cluster I managed.)
| imtringued wrote:
| The article also ignores that e.g. the CUDA API looks nothing
| like a local function call. People are explicitly aware when
| they are launching GPU kernels.
| bee_rider wrote:
| You can 'disable' a core in Linux pretty easily, although I'm
| not sure to what extent you'd consider this graceful (in the
| sense that you write to a system file and then some magic,
| which may be arbitrarily complicated I guess, happens in the
| background. So it doesn't seem equivalent to just yanking a
| core from the package, if that were possible).
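| Concretely, the "write to a system file" path is just sysfs; a
| Go sketch (Linux only, needs root, and cpu0 usually can't be
| offlined):
|
|     package main
|
|     import (
|         "fmt"
|         "os"
|     )
|
|     func main() {
|         // Writing "0" asks the kernel to offline CPU 1; writing
|         // "1" brings it back.
|         path := "/sys/devices/system/cpu/cpu1/online"
|         if err := os.WriteFile(path, []byte("0"), 0644); err != nil {
|             fmt.Println("offline failed:", err)
|             return
|         }
|         fmt.Println("cpu1 offlined")
|     }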
| aidenn0 wrote:
| I think that TFA gets it exactly backwards. It's not that we
| will be able to treat multi-node systems as non-distributed;
| it's that single nodes will have to start being treated like
| distributed systems.
|
| > The links between the components of your computer are solid
| and cannot fail like actual computer network connections.
|
| I've personally had this disproven to me on multiple occasions.
| catern wrote:
| >I've personally had this disproven to me on multiple
| occasions.
|
| That sounds like interesting stories! Can you elaborate?
| aidenn0 wrote:
| Accidents on desktop hardware:
|
| Multiple bad disk cables (more common in IDE era, but
| happened once with SATA). Interestingly enough, Windows
| would reduce the drive speed on certain errors, so I had a
| drive that booted up in UDMA/133 and the longer it was
| running the slower it got, eventually settling in at PIO
| mode 2. Switching the drive cable fixed it.
|
| A sound card that wasn't screwed in to the case, so if you
| pushed the phone connector in too hard it would unseat. I
| still don't know how that happened; it must have been me
| (unless someone pranked me) but the sound-card hadn't been
| changed in like 2 years at that point.
|
| A DIMM wasn't fully clipped in, but the system worked fine
| for weeks until someone bumped into the case.
|
| Things that were actually intentional:
|
| We expect anything plugged in externally (e.g. USB,
| ethernet, HDMI) to be plugged and unplugged without needing
| to restart the system. This sounds banal, but wasn't always
| the case. I had a network card with 3 interfaces (10BASE5
| AUI, 10BASE2 BNC, 10BASE-T modular plug) and you needed to
| power off the system and toggle a DIP switch to change
| which was in use.
|
| I've seen server and minicomputer hardware with
| hotpluggable CPUs and RAM
|
| Eurocard type systems (e.g. VME, cPCI) could connect all
| sorts of things, and could run without restarting. This
| sort of blurs the line as to what a "node" is. If you have
| multiple CPUs on the same PCI bus, is that one node or
| many?
|
| eGPUs have made hotplugging a GPU something that anyone
| might do today. If you run this setup, then the majority of
| the computational power in your system can appear and
| disappear at will, along with multiple GB of RAM.
| catern wrote:
| >Yes but your computer will not gracefully handle CPUs randomly
| failing or RAM randomly failing
|
| That's incorrect.
|
| There are plenty of machines/OSs which are (or can be)
| resilient to a CPU failing; Linux, for example. From the OS
| point of view, you just kill the process that was running on
| the CPU at the time and move on.
|
| Resilience to spontaneous RAM failures is rarer but possible.
| bee_rider wrote:
| Killing the processes running on the compute element seems
| not very graceful, right? I'd expect a gracefully handled
| failure to have some state staved from which the computation
| can be continued.
|
| Which would be overkill on a single node, given that CPUs
| don't really fail all that often.
| catern wrote:
| It's up to userspace to do more than that. There are other
| issues which can cause processes to be spontaneously killed
| (OOMkiller for example) so it's something you should be
| tolerant of.
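| A bare-bones version of "tolerant of being killed" at the
| userspace level is just a supervisor loop; a Go sketch with a
| placeholder ./worker binary:
|
|     package main
|
|     import (
|         "log"
|         "os/exec"
|         "time"
|     )
|
|     func main() {
|         for {
|             // Restart the worker whenever it exits, whatever the
|             // reason (crash, OOM kill, CPU yanked out from under it).
|             cmd := exec.Command("./worker")
|             if err := cmd.Run(); err != nil {
|                 log.Printf("worker exited: %v; restarting", err)
|             }
|             time.Sleep(time.Second) // simple backoff
|         }
|     }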
| nickelpro wrote:
| Disagree. An environment that's being reaped by OOMK is
| not stable enough to make assumptions about. You're in
| "go down the hall and turn it off and on again"
| territory.
|
| Attempting to account for such environments in user
| programs massively inflates their complexity, does little
| to enhance reliability, and the resulting behavior is
| typically brittle or outright broken from the get go.
|
| This is why, for example, the C++ committee flirts with
| making allocation failure a UB condition.
| AdamH12113 wrote:
| >[The fact that computers are made of many components separated
| by communication buses] suggests that it may be possible to
| abstract away the distributed nature of larger-scale systems.
|
| This is a neat line of thought, but I don't think it can go very
| far. There is a huge difference in reliability and predictability
| between small-scale and large-scale systems. One way to see this
| is to look at power supplies. Two ICs on the same board can be
| running off of the same 3.3V supply, and will almost certainly
| have a single upstream AC connection to the mains. When thinking
| about communications between the ICs, you don't have to consider
| power failure because a power failure will take down both ICs.
| Compare this to a WiFi network where two devices could be on
| separate parts of the power grid!
|
| Other kinds of failures are rare enough to be ignored completely
| for most applications. An Ethernet cable can be unplugged. A PCB
| trace can't.
|
| I used to work with a low-level digital communication protocol
| called I2C. It's designed for communication between two chips on
| the same board. There is no defined timeout for communication. A
| single malfunctioning slave device can hang the entire bus.
| According to the official protocol spec, the recommended way of
| dealing with this is to reset every device on the bus (which may
| mean resetting the entire board). If a hardware reset is not
| available, the recommendation is to power-cycle the system! [1]
|
| Now I2C is a particularly sloppy protocol, and higher-level
| versions (SMBus and PMBus) do fix these problems, so this is a
| bit of an extreme example. But the fact that I2C is still
| commonly used today shows how reliable a small-scale electronic
| system can be. Even at the PC level, low-level hardware faults
| are rare enough that they're often indicated only by weird
| behavior ("My system hangs when the GPU gets hot"), and the
| solution is often for the user to guess which component is broken
| and replace it.
|
| [1] Section 3.1.16 of https://www.nxp.com/docs/en/user-
| guide/UM10204.pdf
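| To make the "no defined timeout" point concrete, here's a Go
| sketch of the kind of timeout you end up bolting on in software
| around a hypothetical driver call (i2cRead is a stand-in, not a
| real API) -- and when it fires, all you can really do is reset
| or power-cycle the bus:
|
|     package main
|
|     import (
|         "errors"
|         "fmt"
|         "time"
|     )
|
|     // i2cRead stands in for a real driver call; on a hung bus it
|     // may block forever, since I2C itself defines no timeout.
|     func i2cRead(addr, reg byte) (byte, error) {
|         select {} // simulate a wedged bus: blocks forever
|     }
|
|     // readWithTimeout bolts the timeout on in software. The
|     // blocked goroutine leaks -- that's the price of a hung bus.
|     func readWithTimeout(addr, reg byte, d time.Duration) (byte, error) {
|         type result struct {
|             val byte
|             err error
|         }
|         ch := make(chan result, 1)
|         go func() {
|             v, err := i2cRead(addr, reg)
|             ch <- result{v, err}
|         }()
|         select {
|         case r := <-ch:
|             return r.val, r.err
|         case <-time.After(d):
|             return 0, errors.New("i2c read timed out: bus may be hung")
|         }
|     }
|
|     func main() {
|         _, err := readWithTimeout(0x48, 0x00, 100*time.Millisecond)
|         fmt.Println(err)
|     }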
| taeric wrote:
| So much of programming language design is about hiding the
| distributed nature of what the computer is doing on a regular
| basis. This is somewhat obvious for thread abstractions, where
| you can get two things happening at once. It is blatant for
| CUDA-style programming.
|
| As this link points out, it gets a bit more difficult with some
| of the larger machines we have to keep the abstractions useful.
| That said, it does mostly work. Despite being able to find and
| harp on the areas that it fails, it is amazing how well so many
| of the abstractions have held up.
|
| Would be neat to see an explicit accounting of which features
| are basically completely hiding the distributed nature of the
| computer.
| jayd16 wrote:
| The abstractions aren't just for simplicity. In many cases,
| ensuring that the distributed nature is unknown or unobserved
| means the system can make different decisions without affecting
| the program. This leaves room for flexibility in the system
| design.
| taeric wrote:
| Distributed problems that are largely about timing are easy to
| see in this light. By and large, the whole synchronize-on-a-
| clock idea is invisible to programmers.
|
| That said, there are times when it isn't hidden, but only
| taken out of your control. I guess the question is mainly in
| how to move them to first class objects to reason about?
| Karrot_Kream wrote:
| Maybe? Alternatively by bringing the distributed nature up
| front-and-center you can have more flexible designs. If I
| could timeout my drawing routine when the screen has already
| refreshed (or context has been stolen from the OS) then I
| have a lot more flexibility in how to recover instead of
| pretending to do my best and ending up with a lot of screen
| tearing when I miss my frame budget.
| jayd16 wrote:
| I'm trying to wrap my head around where this would happen
| in a way that made sense. Derailing the GPU pipeline from
| the OS probably doesn't make much sense. If we're talking
| about the OS halting the CPU side of the render I guess
| that would maybe be useful? Even on a single core machine
| it would be equally useful so I don't know if its a case of
| distribution per se...
|
| But in the abstract, sure. It's a give and take. It's
| useful to know things and use that knowledge. It's also
| useful to know a detail is hidden and changeable without
| consequence.
| Karrot_Kream wrote:
| Yeah I'm thinking the OS halts the CPU side of the render
| and, say, stuffs an errno into a register after the
| routine so the CPU can see what happened and recover. If
| I were writing a program that required a minimum frame
| rate and I missed multiple frames, it would probably be
| nicer for the user if I displayed a message that I was
| just unable to write a frame at the required speed and
| quit rather than screen tear and frustrate the user.
|
| A similar situation happens if my NIC/kernel buffers are
| too overloaded to send the packets I need out. Instead I
| can try in vain to push packets out and have almost no
| understanding how many packets the OS is dropping just to
| keep up. Media standards like RTCP were designed around
| scenarios like these, but that itself is complexity we
| wouldn't need if the OS could notify the application when
| their packet writes failed.
|
| This kind of flexibility right now is really difficult
| because most OSs try to pretend as hard as possible that
| everything happens sequentially. This is just about
| opening up more complete abstractions to the programmer.
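| Roughly the shape I'm imagining, sketched in Go with a made-up
| renderFrame (the OS/driver hook doesn't exist today; the point
| is just getting an explicit deadline error instead of silent
| tearing):
|
|     package main
|
|     import (
|         "context"
|         "errors"
|         "fmt"
|         "time"
|     )
|
|     // renderFrame is hypothetical; here it pretends to take 30ms.
|     func renderFrame(ctx context.Context) error {
|         select {
|         case <-time.After(30 * time.Millisecond):
|             return nil
|         case <-ctx.Done():
|             return ctx.Err()
|         }
|     }
|
|     func main() {
|         budget := 16 * time.Millisecond // ~60 Hz frame budget
|         ctx, cancel := context.WithTimeout(context.Background(), budget)
|         defer cancel()
|         err := renderFrame(ctx)
|         if err != nil && errors.Is(err, context.DeadlineExceeded) {
|             fmt.Println("missed the frame budget; degrade or bail out")
|         }
|     }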
| zozbot234 wrote:
| The distributed nature can never be _unobserved_ , by
| definition. What a well-designed distributed system can do is
| offer facilities to enable useful constraints on its
| operation, that might then be used as necessary via a
| programming language.
| [deleted]
| Koshkin wrote:
| Yes, and concurrency is, in fact, an implementation detail. Which
| is why I think that in most _applied_ scenarios it should be
| hidden, and taken care of, by the compiler.
| it wrote:
| The Erlang VM (BEAM) can be viewed as a distributed operating
| system, or at least the beginnings of one.
| simne wrote:
| Agreed, and I could add that ALL Erlang flavors (at least 4
| independent implementations exist, for different environments
| and different targets) are distributed.
|
| And Erlang's syntax is based on Prolog's, which also has cool
| ideas.
| tonymet wrote:
| I recommend people model their apps this way. Spin up more
| threads than needed, one each for API, DB, LB, async, pipelines,
| etc. You can model an entire stack in one memory space. It's a
| great way to prototype your complete data model before scaling to
| the proper solutions. Lots of design constraints are found this
| way: everything looks great on paper but then falls apart when
| integrating layers.
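| A minimal Go sketch of that shape (component names are made
| up): an "API" feeding an async pipeline feeding a fake "DB",
| all as goroutines and channels in one process:
|
|     package main
|
|     import (
|         "fmt"
|         "sync"
|     )
|
|     func main() {
|         jobs := make(chan string)   // api -> pipeline
|         writes := make(chan string) // pipeline -> db
|         var wg sync.WaitGroup
|
|         // "DB" layer
|         wg.Add(1)
|         go func() {
|             defer wg.Done()
|             for w := range writes {
|                 fmt.Println("db: stored", w)
|             }
|         }()
|
|         // "pipeline" layer
|         wg.Add(1)
|         go func() {
|             defer wg.Done()
|             for j := range jobs {
|                 writes <- "processed:" + j
|             }
|             close(writes)
|         }()
|
|         // "API" layer
|         for _, req := range []string{"a", "b", "c"} {
|             jobs <- req
|         }
|         close(jobs)
|         wg.Wait()
|     }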
| bumblebritches5 wrote:
| simne wrote:
| Unfortunately, this idea fights against the idea of least
| responsibility.
|
| That's because user-level programs all live at one level of
| abstraction, while this distribution is spread over many levels
| of abstraction.
|
| So in desktop systems, meaning mostly the successors of business
| micro machines, access to other levels of abstraction is
| intentionally hardened for security and reliability. The same
| thing applies to cloud computing - there, too, VPSs are isolated
| from the hardware and from other VPSs.
|
| These measures are usually avoided in game systems and embedded
| systems, but those are not allowed to run multiple programs from
| independent developers (for security and reliability), and
| programming them is orders of magnitude more expensive than
| desktop or even server-side development (yes, you may be
| surprised, but game console software is in many cases more
| reliable than military software, and usually far surpasses
| business software).
|
| To resolve this contradiction we need some totally new paradigms
| and technologies, maybe something revolutionary, like using
| general AI to write code.
___________________________________________________________________
(page generated 2022-03-30 23:00 UTC)