[HN Gopher] "Doors" in Solaris: Lightweight RPC Using File Descr...
___________________________________________________________________
"Doors" in Solaris: Lightweight RPC Using File Descriptors (1996)
Author : thunderbong
Score : 106 points
Date : 2024-07-24 05:02 UTC (18 hours ago)
(HTM) web link (www.kohala.com)
(TXT) w3m dump (www.kohala.com)
| heinrichhartman wrote:
| > Conceptually, during a door invocation the client thread that
| issues the door procedure call migrates to the server process
| associated with the door, and starts executing the procedure
| while in the address space of the server. When the service
| procedure is finished, a door return operation is performed and
| the thread migrates back to the client's address space with the
| results, if any, from the procedure call.
|
| Note that Server/Client refer to threads on the same machine.
|
 | While I can see the performance benefits of this approach over
 | traditional IPC (sockets, shared memory), it "opens the door" to
 | potentially worse concurrency headaches than you have with
 | threads you spawn and control yourself.
|
| Has anyone here hands-on experience with these and can comment on
| how well this worked in practice?
| actionfromafar wrote:
| Reminds me vaguely of how Linux processes are (or were? I
| haven't looked at this in ages) elevated to kernel mode during
| a syscall.
|
| With a "door", a client is elevated to the "server" mode.
| creshal wrote:
| Sounds like Android's binder was heavily inspired by this.
| Works "well" in practice in that I can't recall ever having
| concurrency problems, but I would not bother trying to
| benchmark the efficiency of Android's mess of abstraction
| layers piled over `/dev/binder`. It's hard to tell how much of
| the overhead is required to use this IPC style safely, and how
| much of the overhead is just Android being Android.
| p_l wrote:
 | Not sure which one came first, but Binder is a direct
 | descendant (down to sometimes still matching symbol names and
 | calls) of the BeOS IPC system. All the low-level components
 | (Binder, Looper, even the serialization model) come from there.
| kragen wrote:
| what's the best introduction to how beos ipc worked?
| p_l wrote:
| Be Book, Haiku source code, and yes Android low level
| internals docs.
|
 | A quick look through BeOS and Android Binder-related APIs
 | will quickly show how the Android side is derived from it
 | (through OpenBinder, which was for a time going to be
 | used in the next Palm system based on Linux, at least one
 | of them)
| kragen wrote:
| thank you very much!
| stuaxo wrote:
| That's really interesting, I wonder if a Haiku on Linux
| could use it.
|
| Will binder ever make it into mainline Linux?
| creshal wrote:
| As far as I understand, it is already mainlined, it's
| just not built by "desktop" distributions since nobody
| really cares - all the cool kids want
| dbusFactorySingletonFactoryPatternSingletons to undo 20
| years of hardware performance increases instead.
| creshal wrote:
| From what I understand, Sun made their Doors concept public
| in 1993 and shipped a SpringOS beta with it in 1994, before
| BeOS released, but it's hard to tell if Sun inspired BeOS,
| or of this was a natural solution to a common problem that
| both teams ran into at the same time.
| pjmlp wrote:
| Binder was inspired by IPC mechanisms in Palm and BeOS, whose
| engineers joined the original Android team.
| rerdavies wrote:
| Conceptually. What are they actually doing, and why is it
| faster than other RPC techniques?
| fch42 wrote:
| Think of it in terms of REST. A door is an endpoint/path
| provided by a service. The client can make a request to it
| (call it). The server can/will respond.
|
| The "endpoint" is set up via door_create(); the client
| connects by opening it (or receiving the open fd in other
| ways), and make the request by door_call(). The service sends
| its response by door_return().
|
| Except that the "handover" between client and service is
| inline and synchronous, "nothing ever sleeps" in the process.
| The service needn't listen for and accept connections. The
| operating system "transfers" execution directly - context
| switches to the service, runs the door function, context
| switches to the client on return. The "normal" scheduling
| (where the server/client sleeps, becomes runnable from
| pending I/O and is eventually selected by the scheduler) is
| bypassed here and latency is lower.
|
| Purely functionality-wise, there's nothing you can do with
| doors that you couldn't do with a (private) protocol across
| pipes, sockets, HTTP connections. You "simply" use a
| faster/lower-latency mechanism.
|
| (I actually like the "task gate" comparison another poster
| made, though doors do not require a hardware-assisted context
| switch)
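 |
 | A minimal sketch of that flow in C, based on the documented
 | libdoor API (error handling omitted; the door is published
 | on a filesystem path via fattach(), and DOOR_PATH and serv
 | are made-up names for illustration). The server:
 |
 |     #include <door.h>
 |     #include <fcntl.h>
 |     #include <stdio.h>
 |     #include <stropts.h>   /* fattach() */
 |     #include <unistd.h>
 |
 |     #define DOOR_PATH "/tmp/greet_door"
 |
 |     /* Runs in the server's address space on each door_call(). */
 |     void serv(void *cookie, char *argp, size_t arg_size,
 |               door_desc_t *dp, uint_t n_desc) {
 |         char reply[64];
 |         int n = snprintf(reply, sizeof reply, "hello, %.*s",
 |                          (int)arg_size, argp);
 |         door_return(reply, n + 1, NULL, 0); /* back to caller */
 |     }
 |
 |     int main(void) {
 |         int fd = door_create(serv, NULL, 0);
 |         close(open(DOOR_PATH, O_CREAT | O_RDWR, 0644));
 |         fattach(fd, DOOR_PATH);  /* publish the endpoint */
 |         pause();                 /* calls run in door threads */
 |     }
 |
 | And the client:
 |
 |     #include <door.h>
 |     #include <fcntl.h>
 |     #include <stdio.h>
 |     #include <unistd.h>
 |
 |     int main(void) {
 |         int fd = open("/tmp/greet_door", O_RDONLY);
 |         char req[] = "world", buf[64];
 |         door_arg_t arg = {
 |             .data_ptr = req,  .data_size = sizeof req,
 |             .desc_ptr = NULL, .desc_num = 0,
 |             .rbuf = buf,      .rsize = sizeof buf
 |         };
 |         door_call(fd, &arg);       /* control transfers to serv() */
 |         printf("%s\n", arg.rbuf);  /* "hello, world" */
 |         close(fd);
 |     }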
| p_l wrote:
 | Well, Doors' speed was derived from hardware-assisted
 | context switching, at least on SPARC. The combination of
 | ASIDs (which allowed task switching with reduced TLB
 | flushing) and the WIM register (which marked which register
 | windows are valid for access by userspace) meant that IPC
 | speed could be greatly increased - in fact that was the
 | basis for the "fast path" IPC in Spring OS, from which
 | Doors were ported into Solaris.
| fch42 wrote:
| I was (more) of a Solaris/x86 kernel guy on that
| particular level and know the x86 kernel did not use task
| gates for doors (or any context switching other than the
| double fault handler). Linux did taskswitch via task
| gates on x86 till 2.0, IIRC. But then, hw assist or no,
| x86 task gates "aren't that fast".
|
 | The SPARC context switch code, to me, always was very
 | complex. The hardware had so much "sharing" (the register
 | window set could split to multiple owners, so would the
 | TSB/TLB, and the "MMU" was elaborate software in sparcv9
 | anyway). SPARC's Achilles heel always was the "spills" -
 | needless register window (and other CPU state) traffic
 | to/from memory. I'm kinda still curious from a "historical"
 | point of view - thanks!
| p_l wrote:
 | The historical point was that for Spring OS "fast path"
 | calls, if you kept the register stack small enough, you
 | could avoid spilling at all.
 |
 | Switching from task A to task B to service a "fast path"
 | call _AFAIK_ (have no access to code) involved using the
 | WIM register to mark the windows used by task A as invalid
 | (so their use would trigger a trap), and changing the ASID
 | value - so if task B was already in the TLB you'd avoid
 | flushes, or reduce them to only flushing when running out
 | of TLB slots.
 |
 | The "golden" target for fast-path calls was calls that
 | required as little stack as possible, and for common
 | services they might even be kept hot so they would already
 | be in the TLB.
| akira2501 wrote:
| > this "opens the door" for potentially worse concurrency
| headaches you have with threads you spawn and control yourself.
|
| What makes doors "potentially worse" than regular threads?
| wahern wrote:
| IIUC, what they mean by "migrate" is the client thread is
| paused and the server thread given the remainder of the time
| slice, similar to how pipe(2) originally worked in Unix and
| even, I think, early Linux. It's the flow of control that
| "conceptually" shifts synchronously. This can provide
| surprising performance benefits in alot of RPC scenarios,
| though less now as TLB, etc, flushing as part of a context
| switch has become more costly. There are no VM shenanigans
| except for some page mapping optimizations for passing large
| chunks of data, which apparently wasn't even implemented in the
| original Solaris implementation.
|
| The kernel can spin up a thread on the server side, but this
| works just like common thread pool libraries, and I'm not sure
| the kernel has any special role here except to optimize context
| switching when there's no spare thread to service an incoming
| request and a new thread needs to be created. With a purely
| userspace implementation there may be some context switch
| bouncing unless an optimized primitive (e.g. some special futex
| mode, perhaps?) is available.
|
| Other than maybe the file namespace attaching API (not sure of
| the exact semantics), and presuming I understand properly, I
| believe Doors, both functionally and the literal API, could be
| implemented entirely in userspace using Unix domain sockets,
| SCM_RIGHTS, and mmap. It just wouldn't have the context
| switching optimization without new kernel work. (See the
| switchto proposal for Linux from Google, though that was for
| threads in the same process.)
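 |
 | To illustrate the SCM_RIGHTS piece of such a userspace
 | emulation (just a sketch of descriptor passing, not the
 | Doors mechanism itself; send_fd is a made-up helper name):
 |
 |     /* Pass an open descriptor (e.g. the "door" fd) across
 |      * an AF_UNIX socket; the receiver gets its own copy. */
 |     #include <string.h>
 |     #include <sys/socket.h>
 |     #include <sys/uio.h>
 |
 |     int send_fd(int sock, int fd) {
 |         char byte = 0;
 |         struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
 |         union {
 |             struct cmsghdr hdr;
 |             char buf[CMSG_SPACE(sizeof(int))];
 |         } u;
 |         struct msghdr msg = {
 |             .msg_iov = &iov, .msg_iovlen = 1,
 |             .msg_control = u.buf, .msg_controllen = sizeof u.buf
 |         };
 |         struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
 |         cm->cmsg_level = SOL_SOCKET;
 |         cm->cmsg_type = SCM_RIGHTS;   /* kernel dups the fd */
 |         cm->cmsg_len = CMSG_LEN(sizeof(int));
 |         memcpy(CMSG_DATA(cm), &fd, sizeof(int));
 |         return sendmsg(sock, &msg, 0);
 |     }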
|
| I'm basing all of this on the description of Doors at
| https://web.archive.org/web/20121022135943/https://blogs.ora...
| and http://www.rampant.org/doors/linux-doors.pdf
| gpderetta wrote:
| I think what doors do is rendezvous synchronization: the
| caller is atomically blocked as the callee is unblocked (and
| vice versa on return). I don't think there is an efficient
| way to do that with just plain POSIX primitives or even with
| Linux specific syscalls (Binder and io_uring possibly might).
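 |
 | (For contrast, a naive userspace emulation of that
 | rendezvous - a hedged sketch with made-up names, not a real
 | equivalent - costs two scheduler wakeups per round trip
 | where a door hands the CPU over directly:
 |
 |     #include <semaphore.h>
 |
 |     sem_t request_ready, reply_ready; /* shared; set up with sem_init */
 |
 |     void caller(void) {
 |         sem_post(&request_ready); /* wake callee: scheduler trip #1 */
 |         sem_wait(&reply_ready);   /* sleep for reply: trip #2 */
 |     }
 |
 |     void callee(void) {
 |         for (;;) {
 |             sem_wait(&request_ready); /* sleep until called */
 |             /* ... service the request ... */
 |             sem_post(&reply_ready);   /* wake the caller */
 |         }
 |     }
 |
 | Nothing here atomically swaps the blocked/running states
 | the way a door call does.)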
| xoranth wrote:
| Sounds a bit like Google's proposal for a `switchto_switch`
| syscall [1] that would allow for cooperative multithreading
| bypassing the scheduler.
|
 | (the descendant of that proposal is `sched_ext`, so maybe
 | it is possible to implement doors in eBPF + sched_ext?)
|
| [1]: https://youtu.be/KXuZi9aeGTw?t=900
| monocasa wrote:
| Not quite.
|
 | There isn't a door_recv(2) system call or equivalent.
|
| Doors truly don't transfer messages, they transfer the thread
| itself. As in the thread that made a door call is now just
| directly executing in the address space of the callee.
|
| They're more like i432/286/mill cpu task gates.
| wahern wrote:
| > Doors truly don't transfer messages, they transfer the
| thread itself. As in the thread that made a door call is
| now just directly executing in the address space of the
| callee.
|
 | In somewhat anachronistic verbiage (at least in a modern
 | software context) this may be true, but today this
 | statement makes it sound like _code_ from the caller
 | process is executing in the address space of the callee
 | process, such that miraculously the caller code can now
 | directly reference data in the callee. AFAICT that just
 | isn't the case, and wouldn't even make sense--i.e. how
 | would it know the addresses without a ton of complex
 | reflection that's completely absent from example code?
 | (Caller and callee don't need to have been forked from each
 | other.) And
| according to the Linux implementation, the "argument" (a
| flat, contiguous block of data) passed from caller to
| callee is literally copied, either directly or by mapping
| in the pages. The caller even needs to provide a return
| buffer for the callee's returned data to be copied into
| (unless it's too large, then it's mapped in and the return
| argument vector updated to point to the newly mmap'd
| pages). File descriptors can also be passed, and of course
| that requires kernel involvement.
|
 | AFAICT, the trick here pertains to _scheduling_ alone, both
 | wrt the hardware and software systems. I.e. a lighter-
 | weight interface to the hardware task gating mechanism,
 | like you say, reliant on the synchronous semantics of this
 | design to skip involving the system scheduler. But all the
 | other process attributes, including address space, are
 | switched out, perhaps in an optimized manner as mentioned
 | elsethread, but still preserving typical process isolation
 | semantics.
 |
 | If I'm wrong, please correct me with pointers to more
 | detailed technical documentation (or code--is this still in
 | illumos?) because I'd love to dig more into it.
 |
 | FWIW, here's the Solaris man page for libdoor:
 | https://docs.oracle.com/cd/E36784_01/html/E36873/libdoor-3li...
 | Did you mean door_call or door_return instead of door_recv?
| monocasa wrote:
| I didn't imply that the code remains and it's only data
| that is swapped out. The thread jumps to another complete
| address space.
|
| It's like a system call instruction that instead of
| jumping into the kernel, jumps into another user process.
| There's a complete swap out of code and data in most
| cases.
|
 | Just as the kernel doesn't need a thread pool to respond to
 | system calls, the same applies here. The calling thread is
 | just directly executing in the callee address space after
 | the door_call(2).
|
| > Did you mean door_call or door_return instead of
| door_recv?
|
 | I did not. I said there is no door_recv(2) system call.
| The 'server' doesn't wait for messages at all.
| kragen wrote:
| thanks for finding the man page!
| blacklion wrote:
 | "Conceptually" is the key word here.
 |
 | Later in the article the authors say that the server
 | manages its own pool (optionally bounded) of threads to
 | serve requests.
| p_l wrote:
 | The thread in this context refers to a kernel scheduler
 | thread[1], essentially the entity used to schedule user
 | processes. By migrating the thread, the calling process is
 | "suspended": its associated kernel thread (and thus
 | scheduled time quanta, run queue position, etc.) saves the
 | client state into the Door "shuttle", picks up the server
 | process, continues execution of the server procedure, and
 | when the server process returns from the handler, the
 | kernel thread picks up the Door "shuttle", restores the
 | right client process state from it, and lets it continue -
 | with the result of the IPC call.
 |
 | This means that when you do a Door IPC call, the service
 | routine is called immediately, not at some indefinite point
 | in the future when the server process gets picked by the
 | scheduler to run and finds an event waiting for it on a
 | select/poll kind of call. If the service handler returns
 | fast enough, it might return even before the client
 | process's scheduler timeslice ends.
|
 | The rapid changing of the TLB etc. is mitigated by hardware
 | features in the CPU that permit faster switches, something
 | that Sun already had experience with at the time from the
 | Spring Operating System project - from which the Doors IPC
 | in fact came to be. Spring IPC calls were often faster than
 | normal x86 syscalls at the time (timings just on the round
 | trip: 20us for a typical syscall on a 486DX2, 11us for a
 | SPARCstation Spring IPC, >100us for a Mach syscall/IPC)
|
| EDIT:
|
| [1] Some might remember references to 1:1 and M:N threading in
| the past, especially in discussions about threading support in
| various unices, etc.
|
| The "1:1" originally referred to relationship between "kernel"
| thread and userspace thread, where kernel thread didn't mean
| "posix like thread in kernel" and more "the scheduler
| entity/concept", whether it was called process, thread, or
| "lightweight process"
| grishka wrote:
| Why would you care who spawned the thread? If your code is
| thread-safe, it shouldn't make a difference.
|
| One potential problem with regular IPC I see is that it's
| nondeterministic in terms of performance/throughput because you
| can't be sure when the scheduler will decide to run the other
| side of whatever IPC mechanism you're using. With these
| "doors", you bypass scheduling altogether, you call straight
| "into" the server process thread. This may make a big
| difference for systems under load.
| robinhouston wrote:
| Back in 1998 or so, a colleague and I were tasked with building a
| system to target adverts to particular users on a website. It
| obviously needed to run very efficiently, because it would be
| invoked on every page load and the server hardware of the time
| was very underpowered by today's standards.
|
| The Linux server revolution was still a few years away (at least
| it was for us - perhaps we were behind the curve), so our
| webservers were all Sun Enterprise machines running Solaris.
|
| We decided to use a server process that had an in-memory
| representation of the active users, which we would query using
| doors from a custom Apache module. (I had read about doors and
| thought they sounded cool, but neither of us had used them
| before.)
|
| It worked brilliantly and was extremely fast, though in the end
| it was never used in production because of changing business
| priorities.
| codetrotter wrote:
 | Solaris was ahead of its time. They had Zones, a container
 | technology, years ago. Likewise, FreeBSD was way ahead with
 | Jails. Only later did Docker on Linux come into existence
 | and then containerization exploded in popularity.
| layer8 wrote:
| In the early 2000s I rented a managed VPS based on FreeBSD
| jails. This was nice because you were root in your jail while
| the distribution was managed for you (security updates in
| particular), and you could still override any system files you
| wanted via the union file system. It was like the best of both
| worlds between a managed Unix account and a VPS where you can
| do anything you want as root.
| bionsystem wrote:
| AND you are running bare metal, without the virtualization
| layer overhead, which is quite nice especially in the early
| 2000s :)
|
| My first project as a junior was to deploy Solaris Zones to
| replace a bunch of old SPARC machines. It's such a great tech
| and was a fun project.
| hnlmorg wrote:
| Linux had a few different containerisation technologies before
| Docker blew up. But even they felt like a step backwards from
| Jails and Zones.
|
 | Personally I think it's more fair to say that Linux was
 | behind the times, because containerisation and
 | virtualisation were common concepts in UNIXes long before
 | they became Linux staples.
| stormking wrote:
 | Well, Docker did a few things differently from these older
 | approaches. First, the one-process-per-container mantra;
 | second, the layered file system and build process; and
 | third, the "wiring together" of multiple containers via
 | exposed ports. Zones and Jails were more like LXC and were
 | mostly used as "lightweight virtual machines".
| panick21_ wrote:
 | I highly recommend this talk about Jails and Zones. Jails
 | was first, but Zones took a lot of lessons from it and went
 | further.
|
| https://www.youtube.com/watch?v=hgN8pCMLI2U
| pjmlp wrote:
| Historically, HP-UX 10 Vaults might have been there first.
| kragen wrote:
| probably lynn wheeler will tell you cp/67 was there first:
| https://www.garlic.com/~lynn/2003d.html#72
| pjmlp wrote:
| Certainly, I was only focusing on the UNIX side of the
| history.
| kragen wrote:
| understandable!
| p_l wrote:
| CP-67 was a hypervisor, a different model altogether to
| the chroot/jails/zones/linux namespace evolution, the
| sidequest of HP Vaults and various workload partitioning
| schemes on systems like AIX, or the grand-daddy of chroot
| line, Plan 9 namespace system.
| kragen wrote:
| yes, i agree, except that doesn't chroot predate plan9 by
| almost a decade?
| p_l wrote:
 | Thank you, somehow I missed that part of the history! Yes
 | indeed, chroot() starts in 1979 patches on top of the V7
 | kernel. Plan 9 namespaces probably evolved from there, as
 | Research Unix V8 was BSD-based.
| kragen wrote:
| thanks! i didn't have any idea 8th edition was bsd-based
| panick21_ wrote:
| From what I have read, it was nowhere near as comprehensive
| and integrated. I only read about them once, so I don't
| really know.
| pjmlp wrote:
| I would say it was easy enough for traditional UNIX
| admins.
|
| We used it for our HP-UX customers in telecoms, using our
| CRM software stack, based on Apache/Tcl/C.
| dprice1 wrote:
| Rereading the zones paper now makes me cringe, but I was in
| my 20s, what can I say. I think the argument we made that
| holds up is that this was designed to be a technology for
| server consolidation, and the opening section sets some
| context about how primitive things were in Sun's enterprise
| customer base at the time.
|
| I have a lot of admiration for what Docker dared to do-- to
| really think differently about the problem in a way which
| changed application deployment for everyone.
|
| Also I can tell you at the time that we were not especially
| concerned about HP or IBM's solutions in this space; nor did
| we face those container solutions competitively in any sales
| situation that I can recall. This tech was fielded in the
| wake of the dot-com blowout-- and customers had huge estates
| of servers often at comically low utilization. So this was a
| good opportunity for Sun to say "We are aligned with your
| desire to get maximum value out of the hardware you already
| have."
|
| It's a blast to see this come up from time to time on HN,
| thanks.
| 4ad wrote:
| And ZFS, and DTrace.
| rbanffy wrote:
| Crossbow as well.
| wang_li wrote:
| And STMF.
| pjmlp wrote:
| While Solaris was/is one of my favourite UNIXes, back in 1999,
| HP-UX already had HP Vaults, which keeps being forgotten when
| Solaris Zones are pointed out.
| dmd wrote:
| We all try to do our best to forget HP-UX. I worked on it at
| Bristol-Myers Squibb in 1999 and hated every minute of it.
| pjmlp wrote:
| How much was HP-UX, and how much was the job itself?
| dmd wrote:
| 40 60
| ilikejam wrote:
| I've said it before, and I'll never stop saying it.
|
| Zones were awful.
|
| Solaris had all of these great ideas, but the OS was an
| absolute pain in the ass to admin. And I say that from both
| sides of the SWAN firewall.
| EvanAnderson wrote:
| My only Solaris experience is using SmartOS. I find Zones
| there to be fairly easy to deal with. Is this just because
| SmartOS is papering-over the awfulness?
| ahoka wrote:
| cgroups came only three years after zones, not a huge
| difference.
| jiveturkey wrote:
| Solaris is still ahead of its time on many things. I wish it
| were worth porting eBPF to Solaris.
| danwills wrote:
| Somehow I'm only just noticing the name-conflict with SideFX
| Houdini's new(ish) USD context, which is also called 'Solaris'!
| .. Guess I don't search for the old SunSoft kind of Solaris much
| these days eh!
| rbanffy wrote:
| You probably noticed that before Oracle's legal dept.
| lukeh wrote:
| 1998 paper on Linux implementation:
| http://www.rampant.org/doors/linux-doors.pdf
| oecumena wrote:
 | I did another one back in 2001: https://ldoor.sourceforge.net/
| monocasa wrote:
| Writing a top level comment here to hopefully address some of the
| misconceptions present across this comment section.
|
 | Doors at the end of the day aren't message-passing-based
 | RPC. There is no door_recv(2) syscall or equivalent, nor
 | any way for a thread pool in the callee to wait for
 | requests.
 |
 | Doors at the end of the day are a control transfer
 | primitive. In a very real sense the calling thread is
 | simply transferred to the callee's address space and
 | continues execution there until a door_return(2) syscall
 | transfers it back into the caller's address space.
|
| It truly is a 'door' into another address space.
|
| This is most similar to some of the CPU control transfer
| primitives. It's most like task gate style constructs like seen
| on 286/i432/mill CPUs. Arguably it's kind of like the system call
| instruction itself too, transferring execution directly to
| another address space/context.
| kragen wrote:
| what does the stack look like in the callee's address space? is
| it a stack that previously existed (you seem to be saying it
| isn't) or a new one created on demand?
|
 | maybe https://docs.oracle.com/cd/E88353_01/html/E37843/door-create...
 | is where i should start reading...
| starspangled wrote:
 | Maybe you know more about it; I don't want to say you're
 | wrong, because the only thing I know of it is reading the
 | page linked to. But that page seems to disagree with you
 | (EDIT: sorry, I don't know how to best format quoting the
 | article vs quoting your reply):
 |
 | > Doors Implementation
 | > [...] the default server threads that are created in
 | > response to incoming door requests on the server. A new
 | > synchronization object, called a shuttle [...] allowing a
 | > direct scheduler hand-off operation between two threads.
 | > [...] a shuttle marks the current thread as sleeping,
 | > marks the server thread as running, and passes control
 | > directly to the server thread.
 | >
 | > Server Threads
 | > Server threads are normally created on demand by the
 | > doors library [...] The first server thread is created
 | > automatically when the server issues a door create call.
 | > [...] Once created, a server thread will place itself in
 | > a pool of available threads and wait for a door
 | > invocation.
|
| There would be no need for these server threads if the client
| thread transferred directly to the server process address
| space.
|
| > There is no door_recv(2) syscall or equivalent nor any way
| for a thread pool in the callee to wait for requests
|
 | It says the thread pool on the server is created by the
 | doors library when the server creates a door. So the
 | process of receiving and processing requests would be
 | carried out internally within the doors library, with no
 | need for the server application to have an API to accept
 | requests; it's handled by the library.
 |
 | At least that's what is described in the link, AFAICS. It's
 | only a conceptual door; underneath, it is implemented in
 | some message-passing style, maybe with some extra sugar for
 | performance and process-control nicety with this "shuttle"
 | thing for making the requests.
| kragen wrote:
| this is very confusing and now i want to see truss output
|
 | man pages like
 | https://docs.oracle.com/cd/E88353_01/html/E37843/door-server...
 | reference literally zero man pages in section 2, so i
 | wonder if there is in fact a door_recv system call and it
 | just isn't documented?
|
| but yeah it sure seems like there's a thread pool of server
| threads (full-fledged posix threads with their own signal
| masks and everything) that sit around waiting for door calls
|
| > _The door_server_create() function allows control over the
| creation of server threads needed for door invocations. The
| procedure create_proc is called every time the available
| server thread pool is depleted. In the case of private server
| pools associated with a door (see the DOOR_PRIVATE attribute
| in door_create()), information on which pool is depleted is
| passed to the create function in the form of a door_info_t
| structure. The di_proc and di_data members of the door_info_t
| structure can be used as a door identifier associated with
| the depleted pool. The create_proc procedure may limit the
| number of server threads created and may also create server
| threads with appropriate attributes (stack size, thread-
| specific data, POSIX thread cancellation, signal mask,
| scheduling attributes, and so forth) for use with door
| invocations._
|
 | <https://docs.oracle.com/cd/E88353_01/html/E37843/door-server...>
|
| apparently things like door_create() survived into
| opensolaris and so they are presumably open source now? even
| if under the cddl
|
| <https://www.unix.com/man-page/opensolaris/3c/door_create/>
|
| /me git clone https://github.com/kofemann/opensolaris
|
| jesus fuck, 1.4 gigabytes? fuck you very much fw_lpe11002.h
|
| okay so usr/src/lib/libc/port/threads/door_calls.c says the
| 'raw system call interfaces' are __door_create,
| __door_return, __door_ucred, __door_unref, and __door_unbind,
| which, yes, do seem to be undocumented. they seem to have
 | been renamed in current illumos:
 | https://github.com/illumos/illumos-gate/blob/master/usr/src/...
|
| unfortunately it's not obvious to me how to find the kernel
| implementation of the system call here, which would seem to
| be almost the only resort when it isn't documented? i guess i
| can look at how it's used
|
| __door_create in particular is called with the function
| pointer and the cookie, and that's all door_create_cmn does
| with the function pointer; it doesn't, for example, stash it
| in a struct so that a function internal to door_calls.c can
| call it in a loop after blocking on some sort of
| __door_recv() syscall (which, as i said, doesn't exist)
|
 | it _does_ have a struct privdoor_data under some
 | circumstances; it just doesn't contain the callback
|
| i don't know, i've skimmed all of door_calls.c and am still
| not clear on how these threads wait to be invoked
|
| aha, the kernel implementation is in
| usr/src/uts/common/fs/doorfs/door_sys.c. door_create_common
| stashes the function pointer in the door_pc of a door_node_t.
| then door_server_dispatch builds a new thread stack and
| stashes the door_pc on it as the di_proc of a door_info_t
 | starting at line 1284:
 | https://github.com/kofemann/opensolaris/blob/master/usr/src/...
|
| this seems to be a structure with layout shared between the
| kernel and userspace (mentioned in the man page i quoted
 | above):
 | https://github.com/illumos/illumos-gate/blob/master/usr/src/...
|
| ...but then door_calls.c never uses di_proc! so i'm still
| mystified as to how the callback function you pass to
| door_create ever gets called
|
| probably one hour diving into solaris source code is enough
| for me for this morning, though it's very pleasantly
| formatted and cleanly structured and greppable. does anybody
| else know how this works and how they got those _awesome_
| numbers? is door_call web scale?
| wahern wrote:
 | The callback address is loaded and invoked here:
 | https://github.com/illumos/illumos-gate/blob/5d9d909/usr/src...
 | It's why door_server_dispatch copies out door_results to
 | the server thread stack.
|
 | The aha moment was when I realized that the door_return(2)
 | syscall is how threads yield and wait to service the next
 | request. In retrospect it makes sense, but I didn't see it
 | until I tried to figure out how a user space thread polled
 | for requests--a thread first calls door_bind, which
 | associates it with the private pool, and then calls
 | door_return with empty arguments to wait for the initial
 | call. (See example door_bind at
 | https://github.com/illumos/illumos-gate/blob/4a38094/usr/src...
 | and door_return at
 | https://github.com/illumos/illumos-gate/blob/4a38094/usr/src...)
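 |
 | In code, the server side of that pattern would look
 | something like this (a sketch of the pattern described
 | above; server_thread is a made-up name, and thread creation
 | and error handling are omitted):
 |
 |     #include <door.h>
 |
 |     /* Runs in each thread of a DOOR_PRIVATE pool. The
 |      * door_return() never returns here: the thread sleeps
 |      * in the kernel until a door_call() arrives, then
 |      * wakes up inside the door procedure. */
 |     void *server_thread(void *arg) {
 |         int did = *(int *)arg;         /* fd from door_create() */
 |         door_bind(did);                /* join this door's pool */
 |         door_return(NULL, 0, NULL, 0); /* park, await a call */
 |         return NULL;                   /* not reached */
 |     }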
|
| One of the confusing aspects for me was that there's both a
| global pool of threads and "private" pool of threads. By
| default threads are pulled from the global pool to service
| requests, but if you specify DOOR_PRIVATE to door_create,
| it uses the private pool bound to the door.
|
 | AFAICT, this is conceptually a typical worker thread
 | pooling implementation, with condition variables, etc. for
 | waking and signaling. (See door_get_server at
 | https://github.com/illumos/illumos-gate/blob/915894e/usr/src...)
 | Context switching does seem to be optimized so door_call
 | doesn't need to bounce through the system scheduler. (See
 | shuttle_resume at
 | https://github.com/illumos/illumos-gate/blob/2d6eb4a/usr/src...)
 | And door_call/door_return data is copied across caller and
 | callee address spaces like you'd expect, except the
 | door_return magic permits the kernel to copy the data to
 | the stack, adjusting the stack pointer before resuming the
 | thread and resetting it when it returns. (See door.S above
 | and door_args at
 | https://github.com/illumos/illumos-gate/blob/915894e/usr/src...)
|
| This works just as one might expect: no real magic except
| for the direct thread-thread context switching. But that's
| a similar capability provided by user space scheduler
| activations in NetBSD, or the switchto proposal for Linux.
| The nomenclature is just different.
|
| It's a slightly different story for in-kernel doors, but
| that's not surprising, either, and there's nothing
| surprsing there, AFAICT (but I didn't trace it as much).
|
| Thanks for finding those source code links. I cloned the
| repo and started from there, grep'ing the whole tree to
| find the user space wrappers, etc.
| kragen wrote:
| aha! i should have looked in the assembly. in opensolaris
| it's opensolaris/usr/src/lib/libc/amd64/sys/door.s line
| 121; as far as i can see the file hasn't changed at all
| except for being renamed to door.S. and of course the
| assembly doesn't call the struct field di_proc, it
| defines a struct field address macro
|
| so door_return (or, in opensolaris, __door_return) is the
| elusive door_recv syscall, eh?
|
| (i'll follow the rest of your links later to read the
| code)
|
| i wonder if spring (which was never released afaik) used
| a different implementation, maybe a more efficient one
|
| this is strongly reminiscent of ipc in liedtke's l4
| (which doesn't implicitly create threads) or in keykos
| (which only has one thread in a domain)
|
| thank you so much for figuring this out!
| wahern wrote:
| The one thing I wasn't sure about was how the per-process
| global thread pool for non-private doors was populated. I
| missed it earlier (wasn't looking closely enough, and the
| hacks to handle forkall headaches make it look more
| complicated than it is), but the user space door_create
| wrapper code--specifically door_create_cmn--invokes
| door_create_server (through the global door_server_func
| pointer) on the first non-private door_create call.
| door_create_server creates a single thread which then
| calls door_return. On wakeup, before calling the
| application's handler function, the door_return assembly
| wrapper conditionally invokes the global
| door_depletion_cb (defined back in door_calls.c) which
| can spin up additional threads.
|
| The more I read, the more Doors seems pretty clever. Yes,
| the secret sauce is very much like scheduler activations
| or Google's switchto, but you can pass data at the same
| time, making it useful across processes, not just threads
| sharing an address space. And you can't get the
| performance benefit merely by using AF_UNIX sockets
| because the door_call and door_return have an association
| for the pendency of the call. The tricky part from an API
| perspective isn't switching the time slice to the server
| thread, it's switching it _back_ to the client thread on
| the return.
|
| Microkernels can do this, of course, but figuring out a
| set of primitives and an API that fit into the Unix model
| without introducing anything too foreign, like Mach Ports
| in macOS, does take some consideration. Now that Linux
| has process file descriptors, maybe they could be used as
| an ancillary control message token to get the round-trip
| time slice and data delivery over AF_UNIX sockets. And
| memfd objects could be used for messages larger than the
| receiver's buffer, which seems to be a niche case,
| anyhow. Or maybe that's over complicating things.
| fanf2 wrote:
| > There would be no need for these server threads if the
| client thread transferred directly to the server process
| address space.
|
| The client's stack isn't present in the server's address
| space: the server needs a pool of threads so that a door call
| does not need to allocate memory for the server stack.
| starspangled wrote:
| It wouldn't need a pool of threads for a stack, it just
| needs some memory in the server's address space for the
| stack.
|
 | Together with the other points about marking the current
 | thread sleeping and passing control to the server thread,
 | and about server threads waiting for an invocation, I think
 | what is described pretty clearly shows that server threads
 | are being used to execute code on the server on behalf of
 | the client request.
| kragen wrote:
| can you figure out how this mechanism works? my notes in
| https://news.ycombinator.com/item?id=41056051 reflect me
| rooting around and giving up; it's nonobvious
| dan353hehe wrote:
 | I did some debugging about 8 years ago in SmartOS. There
 | were some processes that were running for way too long
 | without using the CPU. I ended up using kdbg to trace where
 | the processes were hung. I had to jump between different
 | processes to trace the door calls. There was no unified
 | stack in one process that spanned multiple address spaces,
 | and the calling threads were paused.
 |
 | So, yes. The documentation is right: the calling thread is
 | paused while a different one is used on the other side of
 | the door.
| netbsdusers wrote:
| Kernel and user threads are distinct on Solaris. The
| implementation of doors is not by message passing. There
| needs to be a userland thread context in the server process
| with a thread control block and all the other things that the
| userland threading system depends on to provide things like
| thread-local variables (which include such essentials as
| errno). I do not know whether there is a full kernel thread
| (or LWP as they call it in Solaris), but a thread is
| primarily a schedulable entity, and if the transfer of
| control is such that the activated server thread is treated
| in the same way in terms of scheduling as the requesting
| client thread, then effectively it is still what it says it
| is.
| wang_li wrote:
| >Doors at the end of the day are a control transfer primitive.
| In a very real sense the calling thread is simply transferred
| to the callee's address space and continues execution there
| until a door_return(2) syscall transfers it back into the
| caller address_space.
|
| Your phrasing is misleading. A door is a scheduler operation.
| What would it even mean for a thread to go from a caller to a
| callee? They are different processes with completely different
| contexts on the system. Different owners, different address
| spaces, etc. What the door is doing is passing time on a CPU
| core from the calling thread to the receiving thread and back.
| ajross wrote:
| > Doors at the end of the day are a control transfer primitive.
|
| This is true, but isomorphic to RPC, so I really don't think I
| understand the distinction you're trying to draw. A "procedure
| call" halts the "caller", passes "arguments" into the call, and
| "returns a value" when it completes. And that's what a door
| call does.
|
| Nothing about "Remote Procedure Calls" requires thread pools or
| a recv() system call, or an idea of "waiting for a request".
| Those are all just implementation details of existing RPC
| mechanisms implemented _without_ a send+recv call like doors.
|
| Nor FWIW are doors alone in this space. Android Binder is
| effectively the same metaphor, just with a more opinionated
| idea of how the kernel should manage the discovery and dispatch
| layer (Solaris just gives you a descriptor you can call and
| lets userspace figure it out).
| klodolph wrote:
| I thought the parent comment was drawing a very clear
| distinction--that doors do not involve message passing, but a
| different mechanism.
|
| If it were a message passing system, you would be able to
| save the message to disk and inspect it later, or transmit
| the message over the network, etc.
| ajross wrote:
| Is this a pedantic argument about a specific definition for
| "message passing"? If not, then it's wrong: door_call()
| allows passing an arbitrary/opaque buffer of data into the
| call, and returning arbitrary data as part of the return.
| (Maybe the confusion stems from the fact that the sample
| code in the linked article skipped the second argument to
| door_call(), but you can see it documented in the man
| page).
|
| If so, then I retreat to the argument above: there is
| nothing about "Remote Procedure Call" as a metaphor that
| requires "message passing". That, again, is just an
 | implementation detail of _other_ RPC mechanisms that don't
 | implement the call as directly as doors (or binder) do.
| klodolph wrote:
| > Is this a pedantic argument about a specific definition
| for "message passing"?
|
| No. This is just colloquial usage of the term "message
| passing".
|
| Yes, you can use doors to pass messages. That is not the
| only thing you can do.
|
| > If so, then I retreat to the argument above: there is
| nothing about "Remote Procedure Call" as a metaphor that
| requires "message passing".
|
| Yeah. Everyone else here agrees with that. The original
| comment you replied to said, "Doors at the end of the day
| aren't message passing based RPC." This absolutely
| indicates that the poster agrees with you on that point--
| that not all forms of RPC are based on a message passing
| system. There are other forms of RPC, and this is one of
| them.
|
| Ultimately, I think you can conceive of "message passing"
| broadly enough that, well, everything is message passing.
| What is a CPU register but something that receives a
| message and later re-transmits it? I would rather think
| of the colloquial use of "message passing".
|
| Likewise, you can probably dig into doors and find some
| message system somewhere in the implementation. Or at
| least, you'll find something you can call a message.
| bux93 wrote:
| I wonder if the name is related to BBS doors?
| https://en.wikipedia.org/wiki/Door_(bulletin_board_system)
| kragen wrote:
| i don't think so; i think people independently chose the name
| 'door' for a way to get out of one program and into another in
| both cases. bbs people and sun labs kernel researchers didn't
| talk to each other that much unfortunately, and the mechanisms
| aren't really that similar
| rbanffy wrote:
| I wonder if anyone ended up saving a copy of the Spring operating
| system.
| sillywalk wrote:
| bcantrill had a copy in his basement as of 2015.
|
| https://news.ycombinator.com/item?id=10325362
| quotemstr wrote:
| Nobody has ever done IPC better than Microsoft did with
| COM/RPC/AIPC. Nobody else even came close. I will die on this
| hill. The open source world has done itself a tremendous
| disservice eschewing object capability systems with in-process
| bypasses.
___________________________________________________________________
(page generated 2024-07-24 23:07 UTC)