[HN Gopher] "Doors" in Solaris: Lightweight RPC Using File Descr...
___________________________________________________________________
"Doors" in Solaris: Lightweight RPC Using File Descriptors (1996)
Author : thunderbong
Score : 106 points
Date : 2024-07-24 05:02 UTC (18 hours ago)
(HTM) web link (www.kohala.com)
(TXT) w3m dump (www.kohala.com)
| heinrichhartman wrote:
| > Conceptually, during a door invocation the client thread that
| issues the door procedure call migrates to the server process
| associated with the door, and starts executing the procedure
| while in the address space of the server. When the service
| procedure is finished, a door return operation is performed and
| the thread migrates back to the client's address space with the
| results, if any, from the procedure call.
|
| Note that Server/Client refer to threads on the same machine.
|
 | While I can see the performance benefits of this approach over
 | traditional IPC (sockets, shared memory), it "opens the door" to
 | potentially worse concurrency headaches than you have with
 | threads you spawn and control yourself.
|
| Has anyone here hands-on experience with these and can comment on
| how well this worked in practice?
| actionfromafar wrote:
| Reminds me vaguely of how Linux processes are (or were? I
| haven't looked at this in ages) elevated to kernel mode during
| a syscall.
|
| With a "door", a client is elevated to the "server" mode.
| creshal wrote:
| Sounds like Android's binder was heavily inspired by this.
| Works "well" in practice in that I can't recall ever having
| concurrency problems, but I would not bother trying to
| benchmark the efficiency of Android's mess of abstraction
| layers piled over `/dev/binder`. It's hard to tell how much of
| the overhead is required to use this IPC style safely, and how
| much of the overhead is just Android being Android.
| p_l wrote:
 | Not sure which one came first, but Binder is a direct
 | descendant (down to sometimes still matching symbol names and
 | calls) of the BeOS IPC system. All the low-level components
 | (Binder, Looper, even the serialization model) come from there.
| kragen wrote:
| what's the best introduction to how beos ipc worked?
| p_l wrote:
| Be Book, Haiku source code, and yes Android low level
| internals docs.
|
 | A quick look through BeOS and Android Binder-related APIs
 | will quickly show how the Android side is derived from it
 | (through OpenBinder, which was for a time going to be
 | used in the next Palm system based on Linux, at least one
 | of them)
| kragen wrote:
| thank you very much!
| stuaxo wrote:
| That's really interesting, I wonder if a Haiku on Linux
| could use it.
|
| Will binder ever make it into mainline Linux?
| creshal wrote:
| As far as I understand, it is already mainlined, it's
| just not built by "desktop" distributions since nobody
| really cares - all the cool kids want
| dbusFactorySingletonFactoryPatternSingletons to undo 20
| years of hardware performance increases instead.
| creshal wrote:
| From what I understand, Sun made their Doors concept public
| in 1993 and shipped a SpringOS beta with it in 1994, before
| BeOS released, but it's hard to tell if Sun inspired BeOS,
| or of this was a natural solution to a common problem that
| both teams ran into at the same time.
| pjmlp wrote:
| Binder was inspired by IPC mechanisms in Palm and BeOS, whose
| engineers joined the original Android team.
| rerdavies wrote:
| Conceptually. What are they actually doing, and why is it
| faster than other RPC techniques?
| fch42 wrote:
| Think of it in terms of REST. A door is an endpoint/path
| provided by a service. The client can make a request to it
| (call it). The server can/will respond.
|
| The "endpoint" is set up via door_create(); the client
| connects by opening it (or receiving the open fd in other
| ways), and make the request by door_call(). The service sends
| its response by door_return().
|
| Except that the "handover" between client and service is
| inline and synchronous, "nothing ever sleeps" in the process.
| The service needn't listen for and accept connections. The
| operating system "transfers" execution directly - context
| switches to the service, runs the door function, context
| switches to the client on return. The "normal" scheduling
| (where the server/client sleeps, becomes runnable from
| pending I/O and is eventually selected by the scheduler) is
| bypassed here and latency is lower.
|
| Purely functionality-wise, there's nothing you can do with
| doors that you couldn't do with a (private) protocol across
| pipes, sockets, HTTP connections. You "simply" use a
| faster/lower-latency mechanism.
|
| (I actually like the "task gate" comparison another poster
| made, though doors do not require a hardware-assisted context
| switch)
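 |
 | A minimal sketch of that flow in C, based on the documented
 | libdoor API (error handling omitted; the door is published
 | on a filesystem path via fattach(), and DOOR_PATH and serv
 | are made-up names for illustration). The server:
 |
 |     #include <door.h>
 |     #include <fcntl.h>
 |     #include <stdio.h>
 |     #include <stropts.h>   /* fattach() */
 |     #include <unistd.h>
 |
 |     #define DOOR_PATH "/tmp/greet_door"
 |
 |     /* Runs in the server's address space on each door_call(). */
 |     void serv(void *cookie, char *argp, size_t arg_size,
 |               door_desc_t *dp, uint_t n_desc) {
 |         char reply[64];
 |         int n = snprintf(reply, sizeof reply, "hello, %.*s",
 |                          (int)arg_size, argp);
 |         door_return(reply, n + 1, NULL, 0); /* back to caller */
 |     }
 |
 |     int main(void) {
 |         int fd = door_create(serv, NULL, 0);
 |         close(open(DOOR_PATH, O_CREAT | O_RDWR, 0644));
 |         fattach(fd, DOOR_PATH);  /* publish the endpoint */
 |         pause();                 /* calls run in door threads */
 |     }
 |
 | And the client:
 |
 |     #include <door.h>
 |     #include <fcntl.h>
 |     #include <stdio.h>
 |     #include <unistd.h>
 |
 |     int main(void) {
 |         int fd = open("/tmp/greet_door", O_RDONLY);
 |         char req[] = "world", buf[64];
 |         door_arg_t arg = {
 |             .data_ptr = req,  .data_size = sizeof req,
 |             .desc_ptr = NULL, .desc_num = 0,
 |             .rbuf = buf,      .rsize = sizeof buf
 |         };
 |         door_call(fd, &arg);       /* control transfers to serv() */
 |         printf("%s\n", arg.rbuf);  /* "hello, world" */
 |         close(fd);
 |     }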
| p_l wrote:
 | Well, Doors' speed was derived from hardware-assisted
 | context switching, at least on SPARC. The combination of
 | ASIDs (which allowed task switching with reduced TLB
 | flushing) and the WIM register (which marked which register
 | windows are valid for access by userspace) meant that IPC
 | speed could be greatly increased - in fact that was the
 | basis for the "fast path" IPC in Spring OS, from which
 | Doors were ported into Solaris.
| fch42 wrote:
| I was (more) of a Solaris/x86 kernel guy on that
| particular level and know the x86 kernel did not use task
| gates for doors (or any context switching other than the
| double fault handler). Linux did taskswitch via task
| gates on x86 till 2.0, IIRC. But then, hw assist or no,
| x86 task gates "aren't that fast".
|
 | The SPARC context switch code, to me, always was very
 | complex. The hardware had so much "sharing" (the register
 | window set could split to multiple owners, so would the
 | TSB/TLB, and the "MMU" was elaborate software in sparcv9
 | anyway). SPARC's Achilles heel always was the "spills" -
 | needless register window (and other CPU state) traffic
 | to/from memory. I'm kinda still curious from a "historical"
 | point of view - thanks!
| p_l wrote:
 | The historical point was that for Spring OS "fast path"
 | calls, if you kept the register stack small enough, you
 | could avoid spilling at all.
 |
 | Switching from task A to task B to service a "fast path"
 | call _AFAIK_ (have no access to code) involved using the
 | WIM register to mark the windows used by task A as invalid
 | (so their use would trigger a trap), and changing the ASID
 | value - so if task B was already in the TLB you'd avoid
 | flushes, or reduce them to only flushing when running out
 | of TLB slots.
 |
 | The "golden" target for fast-path calls was calls that
 | required as little stack as possible, and for common
 | services they might even be kept hot so they would already
 | be in the TLB.
| akira2501 wrote:
| > this "opens the door" for potentially worse concurrency
| headaches you have with threads you spawn and control yourself.
|
| What makes doors "potentially worse" than regular threads?
| wahern wrote:
| IIUC, what they mean by "migrate" is the client thread is
| paused and the server thread given the remainder of the time
| slice, similar to how pipe(2) originally worked in Unix and
| even, I think, early Linux. It's the flow of control that
| "conceptually" shifts synchronously. This can provide
| surprising performance benefits in alot of RPC scenarios,
| though less now as TLB, etc, flushing as part of a context
| switch has become more costly. There are no VM shenanigans
| except for some page mapping optimizations for passing large
| chunks of data, which apparently wasn't even implemented in the
| original Solaris implementation.
|
| The kernel can spin up a thread on the server side, but this
| works just like common thread pool libraries, and I'm not sure
| the kernel has any special role here except to optimize context
| switching when there's no spare thread to service an incoming
| request and a new thread needs to be created. With a purely
| userspace implementation there may be some context switch
| bouncing unless an optimized primitive (e.g. some special futex
| mode, perhaps?) is available.
|
| Other than maybe the file namespace attaching API (not sure of
| the exact semantics), and presuming I understand properly, I
| believe Doors, both functionally and the literal API, could be
| implemented entirely in userspace using Unix domain sockets,
| SCM_RIGHTS, and mmap. It just wouldn't have the context
| switching optimization without new kernel work. (See the
| switchto proposal for Linux from Google, though that was for
| threads in the same process.)
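 |
 | To illustrate the SCM_RIGHTS piece of such a userspace
 | emulation (just a sketch of descriptor passing, not the
 | Doors mechanism itself; send_fd is a made-up helper name):
 |
 |     /* Pass an open descriptor (e.g. the "door" fd) across
 |      * an AF_UNIX socket; the receiver gets its own copy. */
 |     #include <string.h>
 |     #include <sys/socket.h>
 |     #include <sys/uio.h>
 |
 |     int send_fd(int sock, int fd) {
 |         char byte = 0;
 |         struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
 |         union {
 |             struct cmsghdr hdr;
 |             char buf[CMSG_SPACE(sizeof(int))];
 |         } u;
 |         struct msghdr msg = {
 |             .msg_iov = &iov, .msg_iovlen = 1,
 |             .msg_control = u.buf, .msg_controllen = sizeof u.buf
 |         };
 |         struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
 |         cm->cmsg_level = SOL_SOCKET;
 |         cm->cmsg_type = SCM_RIGHTS;   /* kernel dups the fd */
 |         cm->cmsg_len = CMSG_LEN(sizeof(int));
 |         memcpy(CMSG_DATA(cm), &fd, sizeof(int));
 |         return sendmsg(sock, &msg, 0);
 |     }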
|
| I'm basing all of this on the description of Doors at
| https://web.archive.org/web/20121022135943/https://blogs.ora...
| and http://www.rampant.org/doors/linux-doors.pdf
| gpderetta wrote:
| I think what doors do is rendezvous synchronization: the
| caller is atomically blocked as the callee is unblocked (and
| vice versa on return). I don't think there is an efficient
| way to do that with just plain POSIX primitives or even with
| Linux specific syscalls (Binder and io_uring possibly might).
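 |
 | (For contrast, a naive userspace emulation of that
 | rendezvous - a hedged sketch with made-up names, not a real
 | equivalent - costs two scheduler wakeups per round trip
 | where a door hands the CPU over directly:
 |
 |     #include <semaphore.h>
 |
 |     sem_t request_ready, reply_ready; /* shared; set up with sem_init */
 |
 |     void caller(void) {
 |         sem_post(&request_ready); /* wake callee: scheduler trip #1 */
 |         sem_wait(&reply_ready);   /* sleep for reply: trip #2 */
 |     }
 |
 |     void callee(void) {
 |         for (;;) {
 |             sem_wait(&request_ready); /* sleep until called */
 |             /* ... service the request ... */
 |             sem_post(&reply_ready);   /* wake the caller */
 |         }
 |     }
 |
 | Nothing here atomically swaps the blocked/running states
 | the way a door call does.)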
| xoranth wrote:
| Sounds a bit like Google's proposal for a `switchto_switch`
| syscall [1] that would allow for cooperative multithreading
| bypassing the scheduler.
|
 | (the descendant of that proposal is `sched_ext`, so maybe
 | it is possible to implement doors in eBPF + sched_ext?)
|
| [1]: https://youtu.be/KXuZi9aeGTw?t=900
| monocasa wrote:
| Not quite.
|
 | There isn't a door_recv(2) system call or equivalent.
|
| Doors truly don't transfer messages, they transfer the thread
| itself. As in the thread that made a door call is now just
| directly executing in the address space of the callee.
|
| They're more like i432/286/mill cpu task gates.
| wahern wrote:
| > Doors truly don't transfer messages, they transfer the
| thread itself. As in the thread that made a door call is
| now just directly executing in the address space of the
| callee.
|
 | In somewhat anachronistic verbiage (at least in a modern
 | software context) this may be true, but today this
 | statement makes it sound like _code_ from the caller
 | process is executing in the address space of the callee
 | process, such that miraculously the caller code can now
 | directly reference data in the callee. AFAICT that just
 | isn't the case, and wouldn't even make sense--i.e. how
 | would it know the addresses without a ton of complex
 | reflection that's completely absent from example code?
 | (Caller and callee don't need to have been forked from each
 | other.) And
| according to the Linux implementation, the "argument" (a
| flat, contiguous block of data) passed from caller to
| callee is literally copied, either directly or by mapping
| in the pages. The caller even needs to provide a return
| buffer for the callee's returned data to be copied into
| (unless it's too large, then it's mapped in and the return
| argument vector updated to point to the newly mmap'd
| pages). File descriptors can also be passed, and of course
| that requires kernel involvement.
|
 | AFAICT, the trick here pertains to _scheduling_ alone, both
 | wrt the hardware and software systems. I.e. a lighter-
 | weight interface to the hardware task gating mechanism,
 | like you say, reliant on the synchronous semantics of this
 | design to skip involving the system scheduler. But all the
 | other process attributes, including address space, are
 | switched out, perhaps in an optimized manner as mentioned
 | elsethread, but still preserving typical process isolation
 | semantics.
 |
 | If I'm wrong, please correct me with pointers to more
 | detailed technical documentation (or code--is this still in
 | illumos?) because I'd love to dig more into it.
 |
 | FWIW, here's the Solaris man page for libdoor:
 | https://docs.oracle.com/cd/E36784_01/html/E36873/libdoor-3li...
 | Did you mean door_call or door_return instead of door_recv?
| monocasa wrote:
| I didn't imply that the code remains and it's only data
| that is swapped out. The thread jumps to another complete
| address space.
|
| It's like a system call instruction that instead of
| jumping into the kernel, jumps into another user process.
| There's a complete swap out of code and data in most
| cases.
|
 | Just as the kernel doesn't need a thread pool to respond to
 | system calls, the same applies here. The calling thread is
 | just directly executing in the callee address space after
 | the door_call(2).
|
| > Did you mean door_call or door_return instead of
| door_recv?
|
 | I did not. I said there is no door_recv(2) system call.
| The 'server' doesn't wait for messages at all.
| kragen wrote:
| thanks for finding the man page!
| blacklion wrote:
 | "Conceptually" is the key word here.
 |
 | Later in the article the authors say that the server
 | manages its own pool (optionally bounded) of threads to
 | serve requests.
| p_l wrote:
 | The thread in this context refers to a kernel scheduler
 | thread[1], essentially the entity used to schedule user
 | processes. By migrating the thread, the calling process is
 | "suspended": its associated kernel thread (and thus
 | scheduled time quanta, run queue position, etc.) saves the
 | client state into the Door "shuttle", picks up the server
 | process, continues execution of the server procedure, and
 | when the server process returns from the handler, the
 | kernel thread picks up the Door "shuttle", restores the
 | right client process state from it, and lets it continue -
 | with the result of the IPC call.
 |
 | This means that when you do a Door IPC call, the service
 | routine is called immediately, not at some indefinite point
 | in the future when the server process gets picked by the
 | scheduler to run and finds an event waiting for it on a
 | select/poll kind of call. If the service handler returns
 | fast enough, it might return even before the client
 | process's scheduler timeslice ends.
|
 | The rapid changing of the TLB etc. is mitigated by hardware
 | features in the CPU that permit faster switches, something
 | that Sun already had experience with at the time from the
 | Spring Operating System project - from which the Doors IPC
 | in fact came to be. Spring IPC calls were often faster than
 | normal x86 syscalls at the time (timings just on the round
 | trip: 20us for a typical syscall on a 486DX2, 11us for a
 | SPARCstation Spring IPC, >100us for a Mach syscall/IPC)
|
| EDIT:
|
| [1] Some might remember references to 1:1 and M:N threading in
| the past, especially in discussions about threading support in
| various unices, etc.
|
| The "1:1" originally referred to relationship between "kernel"
| thread and userspace thread, where kernel thread didn't mean
| "posix like thread in kernel" and more "the scheduler
| entity/concept", whether it was called process, thread, or
| "lightweight process"
| grishka wrote:
| Why would you care who spawned the thread? If your code is
| thread-safe, it shouldn't make a difference.
|
| One potential problem with regular IPC I see is that it's
| nondeterministic in terms of performance/throughput because you
| can't be sure when the scheduler will decide to run the other
| side of whatever IPC mechanism you're using. With these
| "doors", you bypass scheduling altogether, you call straight
| "into" the server process thread. This may make a big
| difference for systems under load.
| robinhouston wrote:
| Back in 1998 or so, a colleague and I were tasked with building a
| system to target adverts to particular users on a website. It
| obviously needed to run very efficiently, because it would be
| invoked on every page load and the server hardware of the time
| was very underpowered by today's standards.
|
| The Linux server revolution was still a few years away (at least
| it was for us - perhaps we were behind the curve), so our
| webservers were all Sun Enterprise machines running Solaris.
|
| We decided to use a server process that had an in-memory
| representation of the active users, which we would query using
| doors from a custom Apache module. (I had read about doors and
| thought they sounded cool, but neither of us had used them
| before.)
|
| It worked brilliantly and was extremely fast, though in the end
| it was never used in production because of changing business
| priorities.
| codetrotter wrote:
 | Solaris was ahead of its time. They had Zones, a container
 | technology, years ago. Likewise, FreeBSD was way ahead with
 | Jails. Only later did Docker on Linux come into existence
 | and then containerization exploded in popularity.
| layer8 wrote:
| In the early 2000s I rented a managed VPS based on FreeBSD
| jails. This was nice because you were root in your jail while
| the distribution was managed for you (security updates in
| particular), and you could still override any system files you
| wanted via the union file system. It was like the best of both
| worlds between a managed Unix account and a VPS where you can
| do anything you want as root.
| bionsystem wrote:
| AND you are running bare metal, without the virtualization
| layer overhead, which is quite nice especially in the early
| 2000s :)
|
| My first project as a junior was to deploy Solaris Zones to
| replace a bunch of old SPARC machines. It's such a great tech
| and was a fun project.
| hnlmorg wrote:
| Linux had a few different containerisation technologies before
| Docker blew up. But even they felt like a step backwards from
| Jails and Zones.
|
 | Personally I think it's more fair to say that Linux was
 | behind the times, because containerisation and
 | virtualisation were common concepts in UNIXes long before
 | they became Linux staples.
| stormking wrote:
 | Well, Docker did a few things differently from these older
 | approaches. First, the one-process-per-container mantra;
 | second, the layered file system and build process; and
 | third, the "wiring together" of multiple containers via
 | exposed ports. Zones and Jails were more like LXC and were
 | mostly used as "lightweight virtual machines".
| panick21_ wrote:
 | I highly recommend this talk about Jails and Zones. Jails
 | was first, but Zones took a lot of lessons from it and went
 | further.
|
| https://www.youtube.com/watch?v=hgN8pCMLI2U
| pjmlp wrote:
| Historically, HP-UX 10 Vaults might have been there first.
| kragen wrote:
| probably lynn wheeler will tell you cp/67 was there first:
| https://www.garlic.com/~lynn/2003d.html#72
| pjmlp wrote:
| Certainly, I was only focusing on the UNIX side of the
| history.
| kragen wrote:
| understandable!
| p_l wrote:
| CP-67 was a hypervisor, a different model altogether to
| the chroot/jails/zones/linux namespace evolution, the
| sidequest of HP Vaults and various workload partitioning
| schemes on systems like AIX, or the grand-daddy of chroot
| line, Plan 9 namespace system.
| kragen wrote:
| yes, i agree, except that doesn't chroot predate plan9 by
| almost a decade?
| p_l wrote:
 | Thank you, somehow I missed that part of the history! Yes
 | indeed, chroot() starts in 1979 patches on top of the V7
 | kernel. Plan 9 namespaces probably evolved from there, as
 | Research Unix V8 was BSD-based.
| kragen wrote:
| thanks! i didn't have any idea 8th edition was bsd-based
| panick21_ wrote:
| From what I have read, it was nowhere near as comprehensive
| and integrated. I only read about them once, so I don't
| really know.
| pjmlp wrote:
| I would say it was easy enough for traditional UNIX
| admins.
|
| We used it for our HP-UX customers in telecoms, using our
| CRM software stack, based on Apache/Tcl/C.
| dprice1 wrote:
| Rereading the zones paper now makes me cringe, but I was in
| my 20s, what can I say. I think the argument we made that
| holds up is that this was designed to be a technology for
| server consolidation, and the opening section sets some
| context about how primitive things were in Sun's enterprise
| customer base at the time.
|
| I have a lot of admiration for what Docker dared to do-- to
| really think differently about the problem in a way which
| changed application deployment for everyone.
|
| Also I can tell you at the time that we were not especially
| concerned about HP or IBM's solutions in this space; nor did
| we face those container solutions competitively in any sales
| situation that I can recall. This tech was fielded in the
| wake of the dot-com blowout-- and customers had huge estates
| of servers often at comically low utilization. So this was a
| good opportunity for Sun to say "We are aligned with your
| desire to get maximum value out of the hardware you already
| have."
|
| It's a blast to see this come up from time to time on HN,
| thanks.
| 4ad wrote:
| And ZFS, and DTrace.
| rbanffy wrote:
| Crossbow as well.
| wang_li wrote:
| And STMF.
| pjmlp wrote:
| While Solaris was/is one of my favourite UNIXes, back in 1999,
| HP-UX already had HP Vaults, which keeps being forgotten when
| Solaris Zones are pointed out.
| dmd wrote:
| We all try to do our best to forget HP-UX. I worked on it at
| Bristol-Myers Squibb in 1999 and hated every minute of it.
| pjmlp wrote:
| How much was HP-UX, and how much was the job itself?
| dmd wrote:
| 40 60
| ilikejam wrote:
| I've said it before, and I'll never stop saying it.
|
| Zones were awful.
|
| Solaris had all of these great ideas, but the OS was an
| absolute pain in the ass to admin. And I say that from both
| sides of the SWAN firewall.
| EvanAnderson wrote:
| My only Solaris experience is using SmartOS. I find Zones
| there to be fairly easy to deal with. Is this just because
| SmartOS is papering-over the awfulness?
| ahoka wrote:
| cgroups came only three years after zones, not a huge
| difference.
| jiveturkey wrote:
| Solaris is still ahead of its time on many things. I wish it
| were worth porting eBPF to Solaris.
| danwills wrote:
| Somehow I'm only just noticing the name-conflict with SideFX
| Houdini's new(ish) USD context, which is also called 'Solaris'!
| .. Guess I don't search for the old SunSoft kind of Solaris much
| these days eh!
| rbanffy wrote:
| You probably noticed that before Oracle's legal dept.
| lukeh wrote:
| 1998 paper on Linux implementation:
| http://www.rampant.org/doors/linux-doors.pdf
| oecumena wrote:
 | I did another one back in 2001: https://ldoor.sourceforge.net/
| monocasa wrote:
| Writing a top level comment here to hopefully address some of the
| misconceptions present across this comment section.
|
 | Doors at the end of the day aren't message-passing-based
 | RPC. There is no door_recv(2) syscall or equivalent, nor
 | any way for a thread pool in the callee to wait for
 | requests.
 |
 | Doors at the end of the day are a control transfer
 | primitive. In a very real sense the calling thread is
 | simply transferred to the callee's address space and
 | continues execution there until a door_return(2) syscall
 | transfers it back into the caller's address space.
|
| It truly is a 'door' into another address space.
|
| This is most similar to some of the CPU control transfer
| primitives. It's most like task gate style constructs like seen
| on 286/i432/mill CPUs. Arguably it's kind of like the system call
| instruction itself too, transferring execution directly to
| another address space/context.
| kragen wrote:
| what does the stack look like in the callee's address space? is
| it a stack that previously existed (you seem to be saying it
| isn't) or a new one created on demand?
|
 | maybe https://docs.oracle.com/cd/E88353_01/html/E37843/door-create...
 | is where i should start reading...
| starspangled wrote:
 | Maybe you know more about it; I don't want to say you're
 | wrong, because the only thing I know of it is reading the
 | page linked to. But that page seems to disagree with you
 | (EDIT: sorry, I don't know how to best format quoting the
 | article vs quoting your reply):
 |
 | > Doors Implementation
 | > [...] the default server threads that are created in
 | > response to incoming door requests on the server. A new
 | > synchronization object, called a shuttle [...] allowing a
 | > direct scheduler hand-off operation between two threads.
 | > [...] a shuttle marks the current thread as sleeping,
 | > marks the server thread as running, and passes control
 | > directly to the server thread.
 | >
 | > Server Threads
 | > Server threads are normally created on demand by the
 | > doors library [...] The first server thread is created
 | > automatically when the server issues a door create call.
 | > [...] Once created, a server thread will place itself in
 | > a pool of available threads and wait for a door
 | > invocation.
|
| There would be no need for these server threads if the client
| thread transferred directly to the server process address
| space.
|
| > There is no door_recv(2) syscall or equivalent nor any way
| for a thread pool in the callee to wait for requests
|
 | It says the thread pool on the server is created by the
 | doors library when the server creates a door. So the
 | process of receiving and processing requests would be
 | carried out internally within the doors library, with no
 | need for the server application to have an API to accept
 | requests; it's handled by the library.
 |
 | At least that's what is described in the link, AFAICS. It's
 | only a conceptual door; underneath, it is implemented in
 | some message-passing style, maybe with some extra sugar for
 | performance and process-control nicety with this "shuttle"
 | thing for making the requests.
| kragen wrote:
| this is very confusing and now i want to see truss output
|
 | man pages like
 | https://docs.oracle.com/cd/E88353_01/html/E37843/door-server...
 | reference literally zero man pages in section 2, so i
 | wonder if there is in fact a door_recv system call and it
 | just isn't documented?
|
| but yeah it sure seems like there's a thread pool of server
| threads (full-fledged posix threads with their own signal
| masks and everything) that sit around waiting for door calls
|
| > _The door_server_create() function allows control over the
| creation of server threads needed for door invocations. The
| procedure create_proc is called every time the available
| server thread pool is depleted. In the case of private server
| pools associated with a door (see the DOOR_PRIVATE attribute
| in door_create()), information on which pool is depleted is
| passed to the create function in the form of a door_info_t
| structure. The di_proc and di_data members of the door_info_t
| structure can be used as a door identifier associated with
| the depleted pool. The create_proc procedure may limit the
| number of server threads created and may also create server
| threads with appropriate attributes (stack size, thread-
| specific data, POSIX thread cancellation, signal mask,
| scheduling attributes, and so forth) for use with door
| invocations._
|
 | <https://docs.oracle.com/cd/E88353_01/html/E37843/door-server...>
|
| apparently things like door_create() survived into
| opensolaris and so they are presumably open source now? even
| if under the cddl
|
| <https://www.unix.com/man-page/opensolaris/3c/door_create/>
|
| /me git clone https://github.com/kofemann/opensolaris
|
| jesus fuck, 1.4 gigabytes? fuck you very much fw_lpe11002.h
|
| okay so usr/src/lib/libc/port/threads/door_calls.c says the
| 'raw system call interfaces' are __door_create,
| __door_return, __door_ucred, __door_unref, and __door_unbind,
| which, yes, do seem to be undocumented. they seem to have
 | been renamed in current illumos:
 | https://github.com/illumos/illumos-gate/blob/master/usr/src/...
|
| unfortunately it's not obvious to me how to find the kernel
| implementation of the system call here, which would seem to
| be almost the only resort when it isn't documented? i guess i
| can look at how it's used
|
| __door_create in particular is called with the function
| pointer and the cookie, and that's all door_create_cmn does
| with the function pointer; it doesn't, for example, stash it
| in a struct so that a function internal to door_calls.c can
| call it in a loop after blocking on some sort of
| __door_recv() syscall (which, as i said, doesn't exist)
|
 | it _does_ have a struct privdoor_data under some
 | circumstances; it just doesn't contain the callback
|
| i don't know, i've skimmed all of door_calls.c and am still
| not clear on how these threads wait to be invoked
|
| aha, the kernel implementation is in
| usr/src/uts/common/fs/doorfs/door_sys.c. door_create_common
| stashes the function pointer in the door_pc of a door_node_t.
| then door_server_dispatch builds a new thread stack and
| stashes the door_pc on it as the di_proc of a door_info_t
 | starting at line 1284:
 | https://github.com/kofemann/opensolaris/blob/master/usr/src/...
|
| this seems to be a structure with layout shared between the
| kernel and userspace (mentioned in the man page i quoted
 | above):
 | https://github.com/illumos/illumos-gate/blob/master/usr/src/...
|
| ...but then door_calls.c never uses di_proc! so i'm still
| mystified as to how the callback function you pass to
| door_create ever gets called
|
| probably one hour diving into solaris source code is enough
| for me for this morning, though it's very pleasantly
| formatted and cleanly structured and greppable. does anybody
| else know how this works and how they got those _awesome_
| numbers? is door_call web scale?
| wahern wrote:
 | The callback address is loaded and invoked here:
 | https://github.com/illumos/illumos-gate/blob/5d9d909/usr/src...
 | It's why door_server_dispatch copies out door_results to
 | the server thread stack.
|
 | The aha moment was when I realized that the door_return(2)
 | syscall is how threads yield and wait to service the next
 | request. In retrospect it makes sense, but I didn't see it
 | until I tried to figure out how a user space thread polled
 | for requests--a thread first calls door_bind, which
 | associates it with the private pool, and then calls
 | door_return with empty arguments to wait for the initial
 | call. (See example door_bind at
 | https://github.com/illumos/illumos-gate/blob/4a38094/usr/src...
 | and door_return at
 | https://github.com/illumos/illumos-gate/blob/4a38094/usr/src...)
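 |
 | In code, the server side of that pattern would look
 | something like this (a sketch of the pattern described
 | above; server_thread is a made-up name, and thread creation
 | and error handling are omitted):
 |
 |     #include <door.h>
 |
 |     /* Runs in each thread of a DOOR_PRIVATE pool. The
 |      * door_return() never returns here: the thread sleeps
 |      * in the kernel until a door_call() arrives, then
 |      * wakes up inside the door procedure. */
 |     void *server_thread(void *arg) {
 |         int did = *(int *)arg;         /* fd from door_create() */
 |         door_bind(did);                /* join this door's pool */
 |         door_return(NULL, 0, NULL, 0); /* park, await a call */
 |         return NULL;                   /* not reached */
 |     }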
|
| One of the confusing aspects for me was that there's both a
| global pool of threads and "private" pool of threads. By
| default threads are pulled from the global pool to service
| requests, but if you specify DOOR_PRIVATE to door_create,
| it uses the private pool bound to the door.
|
 | AFAICT, this is conceptually a typical worker thread
 | pooling implementation, with condition variables, etc. for
 | waking and signaling. (See door_get_server at
 | https://github.com/illumos/illumos-gate/blob/915894e/usr/src...)
 | Context switching does seem to be optimized so door_call
 | doesn't need to bounce through the system scheduler. (See
 | shuttle_resume at
 | https://github.com/illumos/illumos-gate/blob/2d6eb4a/usr/src...)
 | And door_call/door_return data is copied across caller and
 | callee address spaces like you'd expect, except the
 | door_return magic permits the kernel to copy the data to
 | the stack, adjusting the stack pointer before resuming the
 | thread and resetting it when it returns. (See door.S above
 | and door_args at
 | https://github.com/illumos/illumos-gate/blob/915894e/usr/src...)
|
| This works just as one might expect: no real magic except
| for the direct thread-thread context switching. But that's
| a similar capability provided by user space scheduler
| activations in NetBSD, or the switchto proposal for Linux.
| The nomenclature is just different.
|
| It's a slightly different story for in-kernel doors, but
| that's not surprising, either, and there's nothing
| surprsing there, AFAICT (but I didn't trace it as much).
|
| Thanks for finding those source code links. I cloned the
| repo and started from there, grep'ing the whole tree to
| find the user space wrappers, etc.
| kragen wrote:
| aha! i should have looked in the assembly. in opensolaris
| it's opensolaris/usr/src/lib/libc/amd64/sys/door.s line
| 121; as far as i can see the file hasn't changed at all
| except for being renamed to door.S. and of course the
| assembly doesn't call the struct field di_proc, it
| defines a struct field address macro
|
| so door_return (or, in opensolaris, __door_return) is the
| elusive door_recv syscall, eh?
|
| (i'll follow the rest of your links later to read the
| code)
|
| i wonder if spring (which was never released afaik) used
| a different implementation, maybe a more efficient one
|
| this is strongly reminiscent of ipc in liedtke's l4
| (which doesn't implicitly create threads) or in keykos
| (which only has one thread in a domain)
|
| thank you so much for figuring this out!
| wahern wrote:
| The one thing I wasn't sure about was how the per-process
| global thread pool for non-private doors was populated. I
| missed it earlier (wasn't looking closely enough, and the
| hacks to handle forkall headaches make it look more
| complicated than it is), but the user space door_create
| wrapper code--specifically door_create_cmn--invokes
| door_create_server (through the global door_server_func
| pointer) on the first non-private door_create call.
| door_create_server creates a single thread which then
| calls door_return. On wakeup, before calling the
| application's handler function, the door_return assembly
| wrapper conditionally invokes the global
| door_depletion_cb (defined back in door_calls.c) which
| can spin up additional threads.
|
| The more I read, the more Doors seems pretty clever. Yes,
| the secret sauce is very much like scheduler activations
| or Google's switchto, but you can pass data at the same
| time, making it useful across processes, not just threads
| sharing an address space. And you can't get the
| performance benefit merely by using AF_UNIX sockets
| because the door_call and door_return have an association
| for the pendency of the call. The tricky part from an API
| perspective isn't switching the time slice to the server
| thread, it's switching it _back_ to the client thread on
| the return.
|
| Microkernels can do this, of course, but figuring out a
| set of primitives and an API that fit into the Unix model
| without introducing anything too foreign, like Mach Ports
| in macOS, does take some consideration. Now that Linux
| has process file descriptors, maybe they could be used as
| an ancillary control message token to get the round-trip
| time slice and data delivery over AF_UNIX sockets. And
| memfd objects could be used for messages larger than the
| receiver's buffer, which seems to be a niche case,
| anyhow. Or maybe that's over complicating things.
| fanf2 wrote:
| > There would be no need for these server threads if the
| client thread transferred directly to the server process
| address space.
|
| The client's stack isn't present in the server's address
| space: the server needs a pool of threads so that a door call
| does not need to allocate memory for the server stack.
| starspangled wrote:
| It wouldn't need a pool of threads for a stack, it just
| needs some memory in the server's address space for the
| stack.
|
 | Together with the other points about marking the current
 | thread sleeping and passing control to the server thread,
 | and about server threads waiting for an invocation, I think
 | what is described pretty clearly shows that server threads
 | are being used to execute code on the server on behalf of
 | the client request.
| kragen wrote:
| can you figure out how this mechanism works? my notes in
| https://news.ycombinator.com/item?id=41056051 reflect me
| rooting around and giving up; it's nonobvious
| dan353hehe wrote:
 | I did some debugging about 8 years ago in SmartOS. There
 | were some processes that were running for way too long
 | without using the CPU. I ended up using kdbg to trace where
 | the processes were hung. I had to jump between different
 | processes to trace the door calls. There was no unified
 | stack in one process that spanned multiple address spaces,
 | and the calling threads were paused.
 |
 | So, yes. The documentation is right: the calling thread is
 | paused while a different one is used on the other side of
 | the door.
| netbsdusers wrote:
| Kernel and user threads are distinct on Solaris. The
| implementation of doors is not by message passing. There
| needs to be a userland thread context in the server process
| with a thread control block and all the other things that the
| userland threading system depends on to provide things like
| thread-local variables (which include such essentials as
| errno). I do not know whether there is a full kernel thread
| (or LWP as they call it in Solaris), but a thread is
| primarily a schedulable entity, and if the transfer of
| control is such that the activated server thread is treated
| in the same way in terms of scheduling as the requesting
| client thread, then effectively it is still what it says it
| is.
| wang_li wrote:
| >Doors at the end of the day are a control transfer primitive.
| In a very real sense the calling thread is simply transferred
| to the callee's address space and continues execution there
| until a door_return(2) syscall transfers it back into the
| caller address_space.
|
| Your phrasing is misleading. A door is a scheduler operation.
| What would it even mean for a thread to go from a caller to a
| callee? They are different processes with completely different
| contexts on the system. Different owners, different address
| spaces, etc. What the door is doing is passing time on a CPU
| core from the calling thread to the receiving thread and back.
| ajross wrote:
| > Doors at the end of the day are a control transfer primitive.
|
| This is true, but isomorphic to RPC, so I really don't think I
| understand the distinction you're trying to draw. A "procedure
| call" halts the "caller", passes "arguments" into the call, and
| "returns a value" when it completes. And that's what a door
| call does.
|
| Nothing about "Remote Procedure Calls" requires thread pools or
| a recv() system call, or an idea of "waiting for a request".
| Those are all just implementation details of existing RPC
| mechanisms implemented _without_ a send+recv call like doors.
|
| Nor FWIW are doors alone in this space. Android Binder is
| effectively the same metaphor, just with a more opinionated
| idea of how the kernel should manage the discovery and dispatch
| layer (Solaris just gives you a descriptor you can call and
| lets userspace figure it out).
| klodolph wrote:
| I thought the parent comment was drawing a very clear
| distinction--that doors do not involve message passing, but a
| different mechanism.
|
| If it were a message passing system, you would be able to
| save the message to disk and inspect it later, or transmit
| the message over the network, etc.
| ajross wrote:
| Is this a pedantic argument about a specific definition for
| "message passing"? If not, then it's wrong: door_call()
| allows passing an arbitrary/opaque buffer of data into the
| call, and returning arbitrary data as part of the return.
| (Maybe the confusion stems from the fact that the sample
| code in the linked article skipped the second argument to
| door_call(), but you can see it documented in the man
| page).
|
| If so, then I retreat to the argument above: there is
| nothing about "Remote Procedure Call" as a metaphor that
| requires "message passing". That, again, is just an
 | implementation detail of _other_ RPC mechanisms that don't
 | implement the call as directly as doors (or binder) do.
| klodolph wrote:
| > Is this a pedantic argument about a specific definition
| for "message passing"?
|
| No. This is just colloquial usage of the term "message
| passing".
|
| Yes, you can use doors to pass messages. That is not the
| only thing you can do.
|
| > If so, then I retreat to the argument above: there is
| nothing about "Remote Procedure Call" as a metaphor that
| requires "message passing".
|
| Yeah. Everyone else here agrees with that. The original
| comment you replied to said, "Doors at the end of the day
| aren't message passing based RPC." This absolutely
| indicates that the poster agrees with you on that point--
| that not all forms of RPC are based on a message passing
| system. There are other forms of RPC, and this is one of
| them.
|
| Ultimately, I think you can conceive of "message passing"
| broadly enough that, well, everything is message passing.
| What is a CPU register but something that receives a
| message and later re-transmits it? I would rather think
| of the colloquial use of "message passing".
|
| Likewise, you can probably dig into doors and find some
| message system somewhere in the implementation. Or at
| least, you'll find something you can call a message.
| bux93 wrote:
| I wonder if the name is related to BBS doors?
| https://en.wikipedia.org/wiki/Door_(bulletin_board_system)
| kragen wrote:
| i don't think so; i think people independently chose the name
| 'door' for a way to get out of one program and into another in
| both cases. bbs people and sun labs kernel researchers didn't
| talk to each other that much unfortunately, and the mechanisms
| aren't really that similar
| rbanffy wrote:
| I wonder if anyone ended up saving a copy of the Spring operating
| system.
| sillywalk wrote:
| bcantrill had a copy in his basement as of 2015.
|
| https://news.ycombinator.com/item?id=10325362
| quotemstr wrote:
| Nobody has ever done IPC better than Microsoft did with
| COM/RPC/AIPC. Nobody else even came close. I will die on this
| hill. The open source world has done itself a tremendous
| disservice eschewing object capability systems with in-process
| bypasses.
___________________________________________________________________
(page generated 2024-07-24 23:07 UTC)