[HN Gopher] Domain/OS Design Principles (1989) [pdf]
___________________________________________________________________
Domain/OS Design Principles (1989) [pdf]
Author : todsacerdoti
Score : 45 points
Date : 2021-03-26 15:13 UTC (7 hours ago)
(HTM) web link (bitsavers.org)
| gelstudios wrote:
| These branches of computing history are really interesting.
|
| > Domain/OS uses a single-level storage mechanism, whereby a
| program gains access to an object by mapping object pages
| directly into the process's address space.
|
| It sounds similar in that respect to IBM i, and seems like an
| evolutionary branch that died off. What ever happened to this
| paradigm?
| wmf wrote:
| There was also KeyKOS and EROS. One problem with such systems
| is that data corruption can live forever if you're not careful
| (of course the same can happen with mmap which is our watered-
| down version of single-level storage).
| kragen wrote:
| Why did single-level stores die off? It's an interesting
| question, and I'm not sure I know the answer. That's also how
| Multics worked, but I think what happened was it turned out
| that Unix was better.
|
| It's not wholly coincidental, or intentional, that Unix didn't
| have mmap. The PDP-11 and the PDP-7 didn't have paging
| hardware, so early Unix couldn't implement mmap at all. And it
| was common to access files bigger than the virtual address
| space, and doing that with mmap requires you to sequentially
| map, then unmap, different parts of the file--basically what
| you have to do with read() and write(). So, early Unix couldn't
| implement mmap because it was designed to run on cheap
| hardware.
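|
| A minimal sketch of the map-then-unmap pattern described above,
| using Python's mmap module on a modern system. The function name
| and the window size are illustrative assumptions, not anything
| from early Unix or Domain/OS.
|
```python
import mmap
import os

def checksum_in_windows(path, window=16 * mmap.ALLOCATIONGRANULARITY):
    """Sum every byte of a file while mapping only `window` bytes at a time."""
    total = 0
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            length = min(window, size - offset)
            # Map one window, use it, then unmap it before moving on.
            # `offset` stays a multiple of ALLOCATIONGRANULARITY as long
            # as `window` is one, which mmap requires.
            with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                           offset=offset) as m:
                total += sum(m[:])
            offset += length
    return total
```
|
| The loop has the same shape as a read() loop, which is the point
| being made: once the file outgrows the address space, mapping no
| longer hides the sequential bookkeeping.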
|
| Also, though, if a program is reading from a file by memory-
| mapping it, you can't replace the file with a pipe unless you
| change the program. (If you lseek() on a pipe, it croaks with
| an ESPIPE, now called "Illegal seek".) Unix got enormously
| better composability and scriptability than other contemporary
| OSes by virtue of pipes, to the point where the Unix group
| ported their pipe-based toolkit of "software tools" to other
| operating systems in the mid-1970s in order to have a more
| comfortable working environment. Then, of course, the world
| started to revolve around TCP, which gives you a byte-stream
| between two machines, like a magtape, not a random-access
| collection of pages like a disk. (There are lots of networked
| applications that really prefer a remote-disk model; Acrobat
| Reader and Microsoft Access come to mind. But that wasn't where
| TCP/IP was in the 01980s and early 01990s, Sun's WebNFS aside.)
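|
| A small sketch of the pipe limitation, assuming a POSIX system.
| The helper names try_seek and try_mmap are hypothetical; the
| calls they wrap (os.lseek, mmap.mmap, os.pipe) are standard.
|
```python
import errno
import mmap
import os

def try_seek(fd):
    """Return None if lseek() works on fd, else the errno name."""
    try:
        os.lseek(fd, 0, os.SEEK_SET)
        return None
    except OSError as e:
        return errno.errorcode[e.errno]

def try_mmap(fd):
    """Return None if fd can be memory-mapped, else the exception name."""
    try:
        mmap.mmap(fd, 1, access=mmap.ACCESS_READ).close()
        return None
    except (OSError, ValueError) as e:
        return type(e).__name__

r, w = os.pipe()
os.write(w, b"hello")
print("lseek on pipe:", try_seek(r))      # seeking is refused
print("mmap on pipe:", try_mmap(r))       # mapping is refused too
print("read on pipe:", os.read(r, 5))     # plain read() still works
os.close(r)
os.close(w)
```
|
| Only the byte-stream interface works on both files and pipes,
| which is why a program written against read() can be dropped
| into a pipeline unchanged.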
|
| Another problem is that, when a program is mutating a shared
| mutable resource like a disk sector, there are times when the
| resource is in an inconsistent state. Usually, we think of this
| as a problem for concurrent access, and the solution is to keep
| any other thread from observing the inconsistent state, for
| example with a mutex. But it's also a problem for failure
| recovery: if your program crashes before restoring consistency,
| now you have data corruption to recover from.
|
| In Unix, the mutable shared resource was usually the
| filesystem, so this was mostly only a problem if the _kernel_
| crashed, perhaps due to a power failure; ordinary user programs
| mostly created new files, so if they crashed during execution,
| the worst that could happen was that their output file would be
| incomplete. Then the user could delete it and try again. So,
| even though Unix wasn't a fault-_tolerant_ OS like Tandem's
| Guardian, it did tend to limit the impact of faults. (The
| occasional exceptions to this rule, such as Berkeley mbox
| files, were a continuous source of new bugs.)
|
| The easiest way to handle this kind of problem is with atomic
| transactions, so that if a program crashes halfway through an
| update, the old state remains the current state, and there is
| no data corruption problem to worry about. As I understand the
| situation, this is how IMS and DB2 have handled this problem
| since the 01960s and 01970s, respectively, and of course today
| we build lots of applications on top of transaction systems
| like Postgres, Kafka, ZODB, Git, MariaDB, and especially
| SQLite.
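|
| A minimal sketch of that property using SQLite through Python's
| sqlite3 module. The table, the account names, and the transfer
| function are made up for illustration, and a raised exception
| only stands in for a crash; surviving a real crash relies on
| SQLite's on-disk journal rather than on this rollback path.
|
```python
import sqlite3

def transfer(conn, amount, crash=False):
    """Move `amount` from account 'a' to account 'b' as one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE acct SET bal = bal - ? WHERE name = 'a'",
                     (amount,))
        if crash:
            raise RuntimeError("simulated crash halfway through the update")
        conn.execute("UPDATE acct SET bal = bal + ? WHERE name = 'b'",
                     (amount,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct (name TEXT PRIMARY KEY, bal INTEGER)")
conn.executemany("INSERT INTO acct VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    transfer(conn, 30, crash=True)
except RuntimeError:
    pass

# The half-finished debit was rolled back: the old state is still current.
balances = dict(conn.execute("SELECT name, bal FROM acct"))
print(balances)
```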
|
| But none of those systems existed in the 01980s, except for
| IMS, DB2, and Postgres, and none of those ran on Domain/OS. I
| don't have any experience with Domain/OS but I imagine that
| this was a source of bugs for Domain/OS applications as well.
|
| There's another, arguably distinct, fault-related problem that
| pops up in current use with mmap(). If you try to read() from a
| file, copying data into your address space, this may succeed or
| fail, or it may succeed partially, for example if you hit the
| end of the file. All of these conditions arise at the readily
| identifiable point in your program where it invokes read(), and
| so you can look at the code to see if you forgot to handle one
| of them at that point. Moreover, you can be sure that neither
| of those two problems will arise _later_ while you're using
| the data you've read, possibly while you have some other shared
| mutable resource in a temporarily inconsistent state.
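|
| A sketch of what handling those outcomes at the call site looks
| like. The helper name read_exactly is hypothetical; os.read is
| the standard wrapper around read().
|
```python
import os

def read_exactly(fd, n):
    """Read up to n bytes, looping over short reads; stop early only at EOF."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = os.read(fd, remaining)  # failure surfaces here as OSError
        if not chunk:                   # b"" means end of file
            break
        chunks.append(chunk)            # may be shorter than requested
        remaining -= len(chunk)
    return b"".join(chunks)
```
|
| Every outcome (error, partial read, end of file) is visible in
| this one loop, and once it returns, the bytes are ordinary
| private memory that cannot fail later.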
|
| By contrast, with mmap(), such a failure can arise _any time
| you access the mapped memory_, in most cases. For example,
| someone else may have truncated the file since you mapped it,
| as in http://canonical.org/~kragen/sw/dev3/mmapcrash.c, where
| as soon as the array index strays onto the now-nonexistent
| page, the program dies with a bus error. This makes it more
| difficult to write programs that handle failures correctly.
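|
| A rough Python analogue of that failure, assuming Linux or a
| similar POSIX system. It is not a port of the linked C program;
| the child script and the function name are illustrative. The
| access runs in a child process because the signal would
| otherwise kill the interpreter running the demo.
|
```python
import mmap
import os
import signal
import subprocess
import sys
import tempfile

CHILD = r"""
import mmap, os, sys
path = sys.argv[1]
with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 2 * mmap.PAGESIZE)
os.truncate(path, mmap.PAGESIZE)   # stand-in for "someone else" truncating
print(m[0])                        # first page still exists
print(m[mmap.PAGESIZE])            # second page is gone: fatal signal expected
"""

def run_truncation_demo():
    """Return the child's exit status; negative means killed by a signal."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * (2 * mmap.PAGESIZE))
        os.close(fd)
        proc = subprocess.run([sys.executable, "-c", CHILD, path],
                              capture_output=True)
        return proc.returncode
    finally:
        os.remove(path)

rc = run_truncation_demo()
if rc < 0:
    print("child killed by", signal.Signals(-rc).name)
```
|
| Nothing in the child's source marks the second index expression
| as an I/O operation, which is what makes this class of failure
| hard to audit for.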
|
| Relatedly, there's a performance issue: although memory-mapping
| a page and then reading it means the kernel doesn't have to
| _copy_ its contents into your address space, which often
| increases performance, it does still have to _read_ the page
| from disk. But it has much less information about your access
| patterns than when you're using read() and lseek(). This
| sometimes _reduces_ performance, because prefetching pages
| _before_ userland requests them makes a big performance
| difference--in the 01980s, we're talking about 30000
| microseconds to wait for the disk, versus 1 microsecond to
| handle a page fault or 2 microseconds to handle a small read(),
| if the data is prefetched. It doesn't take a whole lot of extra
| prefetch failures to make mmap() slower, potentially by orders
| of magnitude.
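|
| The arithmetic behind that claim, as a sketch. The three timings
| are the ones quoted above; the cost model (an access either hits
| memory or waits the full disk time) and the function names are
| simplifying assumptions.
|
```python
DISK_WAIT_US = 30_000   # waiting for the disk
PAGE_FAULT_US = 1       # page fault when the page was prefetched
SMALL_READ_US = 2       # small read() when the data was prefetched

def avg_cost_us(hit_cost_us, miss_rate):
    """Expected cost per access if `miss_rate` of accesses wait for the disk."""
    return (1 - miss_rate) * hit_cost_us + miss_rate * DISK_WAIT_US

def breakeven_extra_miss_rate(read_miss_rate=0.0):
    """Extra mmap miss rate at which mmap's average cost equals read()'s."""
    read_cost = avg_cost_us(SMALL_READ_US, read_miss_rate)
    # Solve (1 - m) * PAGE_FAULT_US + m * DISK_WAIT_US == read_cost for m.
    mmap_miss = (read_cost - PAGE_FAULT_US) / (DISK_WAIT_US - PAGE_FAULT_US)
    return mmap_miss - read_miss_rate

print(breakeven_extra_miss_rate())        # about 1 extra miss in 30000
print(avg_cost_us(PAGE_FAULT_US, 0.01))   # 1% misses: about 301 us per access
```
|
| Under this model, roughly one additional prefetch miss per
| 30000 accesses cancels the 1-microsecond saving, and a 1% miss
| rate makes the mapped path about 150 times slower than a read()
| path that always hits.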
|
| With modern NVDIMMs and NVMe Flash, and especially new memory
| architectures like 3D XPoint, the performance advantages of
| memory-mapping might become much more important again. If it
| takes 300 ns to call and return from read() or write(), plus
| 700 ns to copy 4096 bytes into or out of userspace, then
| spending 100 ns to read a random cache line from 3D XPoint
| memory (is that about how long it takes?) might be greatly
| preferable to spending 1000 ns to read a page of data from it
| through the syscall interface. But this was not a possibility
| in the Apollo years.
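|
| The same comparison worked through as a sketch. The constants
| are the figures quoted above (the 100 ns one is flagged there
| as a guess); the model assumes every mapped access is an
| independent cache-line miss and ignores caching and hardware
| prefetch.
|
```python
SYSCALL_NS = 300      # call and return from read() or write()
COPY_PAGE_NS = 700    # copy 4096 bytes into or out of userspace
CACHE_LINE_NS = 100   # one random cache-line read from the device (a guess)

def read_path_ns(pages):
    """Cost of fetching `pages` whole pages through the syscall interface."""
    return pages * (SYSCALL_NS + COPY_PAGE_NS)

def mapped_path_ns(cache_lines):
    """Cost of touching `cache_lines` separate lines of mapped memory."""
    return cache_lines * CACHE_LINE_NS

def breakeven_lines_per_page():
    """Cache lines touched per page at which both paths cost the same."""
    return (SYSCALL_NS + COPY_PAGE_NS) / CACHE_LINE_NS

print(read_path_ns(1), mapped_path_ns(1), breakeven_lines_per_page())
```
|
| With these numbers the mapped path wins for sparse access, up
| to about 10 cache lines touched per page; denser access
| narrows or reverses the gap.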
|
| One final minor issue with the Multics segment-mapping
| approach, at least when realized with paging hardware instead
| of segmentation hardware, is slack space at the ends of files.
| If the fundamental fixed-size units a file consists of are
| not a multiple of the page size, such as a byte of text, then
| there will be times when the file's natural size is not a whole
| number of pages. So, for example, in CP/M files consist of
| 128-byte "sectors", thus saving 7 precious bits per directory
| entry. Your application program needs some kind of application-
| specific logic to tell whether the last page of the file has
| unused space in it, and, if so, how much.
|
| So in CP/M, for example, some applications would place a ^Z
| after the last legitimate byte of a text file, and others would
| fill the rest of the sector with up to 127 ^Z characters. As
| you can imagine, this kind of thing is fertile ground not only
| for application bugs (you can't reliably store ^Z in a text
| file, and never as the last byte) but also subtle application
| incompatibilities. If you want to write a Unix "cat" program
| for CP/M, it needs to have an opinion about which of these
| conventions to use, and also what to do if it finds a ^Z that
| _isn't_ in the last sector.
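|
| A sketch of one possible pair of conventions, to show where the
| ambiguity comes from. The function names are hypothetical, and
| the reader's policy (treat the first ^Z anywhere as end of text)
| is only one of the "opinions" a tool could hold.
|
```python
SECTOR = 128
CTRL_Z = b"\x1a"

def pad_to_sector(text):
    """Writer: fill the rest of the last 128-byte sector with ^Z."""
    slack = -len(text) % SECTOR
    return text + CTRL_Z * slack

def strip_eof_marker(raw):
    """Reader: treat the first ^Z anywhere in the file as end of text."""
    end = raw.find(CTRL_Z)
    return raw if end == -1 else raw[:end]

stored = pad_to_sector(b"hello\n")
print(len(stored), strip_eof_marker(stored))

# A ^Z inside the text is indistinguishable from padding, so data is lost.
print(strip_eof_marker(pad_to_sector(b"a\x1ab")))
```
|
| The round trip works for ordinary text and silently truncates
| anything containing ^Z, which is the bug class described above.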
|
| Again, I never used Domain/OS, so I don't know how it handled
| text files or other files that commonly had a non-page-aligned
| EOF. The Apollo engineers were brilliant and produced a
| stunning system that was much better than Unix in many ways. So
| maybe they had a good solution to this problem, like a
| universally-used text-file-handling library that didn't use a
| brain-dead encoding like the CP/M one. I'm just saying it's a
| problem that crops up in userspace with the single-level store
| approach (on paging hardware), while the Unix approach
| relegates it to the filesystem driver.
| skissane wrote:
| > It sounds similar in that respect to IBM i, and seems like an
| evolutionary branch that died off
|
| Even on IBM i it is in decline. Originally everything ran in
| the single-level store address space, but then they introduced
| additional non-single-level address spaces (teraspaces). And
| one of the major things that teraspaces are used for, is to run
| PASE, which is IBM i's AIX binary compatibility subsystem. And
| IBM appears to have a preference to ship new stuff in PASE. The
| single-level store environment is still used by "classic" apps
| (such as RPG and COBOL), but newer stuff - especially anything
| written in newer languages such as Java, Python, etc - runs
| outside of the single-level store in a PASE teraspace.
| anyfoo wrote:
| But is that just to accommodate the newer, more mainstream
| stuff, or because it's actually technically better?
| skissane wrote:
| I think it is mainly about making it easier to port code
| from more mainstream platforms (AIX/Unix/Linux), which
| reduces engineering costs. Porting open source code is a
| low cost way to get new functionality and features, and
| makes the environment seem more familiar and modern to
| newcomers who are familiar with Linux - and commercial
| Unixes such as AIX are pretty close to Linux. When
| detractors call it a "legacy" platform, their sales team
| can now respond "it's not legacy, it runs node.js!"
|
| But one thing I think it demonstrates is a problem with
| non-mainstream operating system architectures. Even if a
| non-mainstream operating system architecture is technically
| superior, sooner or later you want to port software to it
| from a mainstream operating system, which means you need a
| compatibility layer implementing a more mainstream
| operating system architecture. And before you know it most
| of the code is running in the compatibility layer, because
| that's where all the new applications are coming from and
| there is no way you can keep up with that pace yourself.
| And then you have to ask what is the point of the
| innovative non-mainstream architecture if so much of the
| software you run doesn't actually use it. So eventually it
| leads you to moving off the non-mainstream architecture and
| on to a more mainstream one.
|
| Is IBM i technically superior? It is a weird mixture of (a)
| advanced concepts like single-level store, an object-
| oriented operating system and bytecode virtual machine (b)
| legacy crud like EBCDIC, RPG, block mode terminals, 10
| character limit on object names and a single-level
| filesystem (c) a severe lack of extensibility and openness
| in which a lot of OS concepts (e.g object types) are closed
| up so only IBM engineering can extend them (or possibly
| ISVs who pay big $$$$ for NDA manuals) (d) the completely
| different worlds of POSIX/AIX/Java grafted on the side, and
| increasingly taking over the rest. I grant that (a) could
| be said to be technically superior, but (b) and (c) clearly
| are not.
| anyfoo wrote:
| But that's entirely my point, yeah. I don't know if a
| single-level store address space is better, but if the
| reason for its decline on IBM i is merely that mainstream
| software doesn't mesh well with it, I feel like it
| doesn't tell me much about the paradigm itself.
|
| By the way, I'd argue about whether all of b) is
| technically inferior or not. Object name limits certainly
| are, but I got to _really_ know data entry with block
| mode terminals long, long after its hey day (I've
| certainly come across it back then, but I was rarely a
| user). I feel that it can be enormously efficient for
| data entry and maintenance tasks. Many a person who had
| to move from intensive use of a block mode data entry
| terminal to performing the same tasks with a web app got
| quite annoyed at the clumsiness of it all.
|
| The web was not created for "business apps" but for
| hypertext document retrieval, the other uses got bolted
| on and it still very much shows. It's sad, because proper
| terminal emulation used to be a ubiquitous feature
| of the Internet, before browsers took over almost
| entirely.
| skissane wrote:
| > but I got to really know data entry with block mode
| terminals long, long after its hey day (I've certainly
| come across it back then, but I was rarely a user)
|
| I don't think block mode terminals are _necessarily_
| inferior. I see some big problems with 5250 though. The
| biggest is EBCDIC.
|
| Another big problem is character-at-a-time interfaces let
| you build things like text editors (vim and emacs),
| spreadsheets (like Lotus 123), etc. Sure you can build a
| text editor for a block mode terminal (SEU on IBM i,
| XEDIT on z/VM, ISPF EDIT on z/OS) but there are just
| certain features and interaction styles that vim and
| emacs support that block mode terminals can't do as
| nicely (example: interactive search). Lotus 123 was
| actually ported to 3270 (to run under MVS and VM/CMS),
| I've never used it (I would love it if someone could find
| a copy so I could!) but from what I've heard it was
| pretty clunky compared to the MS-DOS / PC version.
|
| Sometimes I think that block mode terminals could have
| exposed some kind of byte code to enable running some
| interactivity in the client. Actually real 3270s and
| 5250s generally had some kind of CPU in them (like an
| 8080) so I can't see why they couldn't have done that.
| And of course terminal emulators could do that. Then you
| could have these more flexible interaction styles that
| character mode terminals support even in a block mode
| terminal.
| anyfoo wrote:
| > I don't think block mode terminals are necessarily
| inferior. I see some big problems with 5250 though. The
| biggest is EBCDIC.
|
| Oh yeah I agree, the actual implementation details in
| this case are icky.
|
| > Another big problem is character-at-a-time interfaces
| let you build things like text editors (vim and emacs),
| spreadsheets (like Lotus 123)
|
| That's true, but at the same time block mode allows for
| highly standardized and always latency free data entry
| and manipulation. I wonder if this is just a case for
| different technologies for different use cases.
|
| > Sometimes I think that block mode terminals could have
| exposed some kind of byte code to enable running some
| interactivity in the client.
|
| Hmm, it helps to preserve the zero latency aspect (if
| done correctly), but at the same time opens up the door
| for shoddy implementation and non-standard UX.
|
| And then I'm sure people would come up with all sorts of
| "UI libraries" for terminals that they think are very
| clever, but just make everything fragmented and clumsy
| again, just like I often wish that a web site was just a
| plain old HTML page with maybe a standard web form,
| instead of whatever crazy js-backed UI the web framework
| du jour came up with...
| skissane wrote:
| > That's true, but at the same time block mode allows for
| highly standardized and always latency free data entry
| and manipulation. I wonder if this is just a case for
| different technologies for different use cases.
|
| Other vendors - such as DEC and HP - had dual-mode ASCII
| terminals that normally operated in character-at-a-time
| mode, but had an escape sequence you could use to switch
| them into block mode. Maybe that's the best of both
| worlds. However, in practice, few apps used the block
| mode, even "data entry" style apps which could use it
| often didn't. Part of that was that using block mode
| basically tied you to a single brand of terminal, whereas
| manually generating forms using character mode was more
| portable. A lot of clone terminals and emulators emulate
| DEC VT terminals but few of those clones and emulators
| included the block mode functions.
| anyfoo wrote:
| Ah, I can totally imagine that being the case, yeah.
| Sigh, looks like there's no way out, we'll keep inventing
| ourselves into half-baked solutions on top of existing
| things.
| neilv wrote:
| Apollo Domain/OS was great, innovative, and opinionated. It
| wasn't just another BSD or System V, and they rethought a lot of
| things.
|
| Apollo's heyday was a bit before my time, but my first real job
| was working on technical software at a company that was a big-
| ticket technical software developer on Apollos (and I even once
| saw a marketing brochure with our badge on an Apollo workstation).
| As an intern, I set up a pair of them for porting (though we
| preferred SPARCstations), and later, when I moved to
| headquarters, HQ was still doing the master SCM and (what's now
| called) CI, for all platforms, on Apollo DN10k pedestals. We got
| the DSEE descendant Atria ClearCase on non-Apollo workstations,
| in a new-tech R&D group I was in. I bought a couple retired
| Apollo workstations just to play with them at home.
|
| Apollo did a lot of innovative stuff in Domain, and it's one of
| the few platforms I'm sometimes tempted to buy again, just to
| play with it and understand more of how they approached things.
|
| When it had been years since I'd seen or heard of Apollo
| anywhere, I bumped into someone from there, who mentioned that
| Boeing had done some documentation using Apollos, and part of
| their very serious configuration management process involved them
| physically archiving an entire Domain network. (I'm guessing they
| used the very nice Interleaf software, which seemed to be popular
| on Apollo, and, by that time, had long also been available on
| other platforms.) It was appealing to think of an Apollo Domain
| network preserved in stasis, should humanity ever need to call up
| Apollo for duty again.
| dn3500 wrote:
| There is an emulator built in to MESS in case you ever want to
| play with it again. You do need to find a disk image somewhere.
| neilv wrote:
| Thanks! In case anyone else is interested, three of the links
| I just found:
|
| https://wiki.mamedev.org/index.php/Driver:Apollo
|
| http://mess.redump.net/howto/apollo
|
| https://jim.rees.org/apollo-archive/
| mhd wrote:
| Back when I started uni, they had a whole room full of old Apollo
| Domain/OS machines. I think they were previously used for CAD or
| some EE stuff, but even then (late 90s) pretty much only used for
| people to do their email (pine, mostly).
|
| Anyone got some actual war stories about the soft- and hardware?
| dboreham wrote:
| We bought a bunch of Apollo Domain machines for board-level
| CAD/CAE in the early 90s. At that time there was a Unix
| compatibility mode (like Cygwin) so I didn't need to mess with
| Domain/OS much, except for curiosity purposes and to run magic
| commands (like PowerShell). It reminded me of VMS.
| tyingq wrote:
| I had an HP Apollo 425e, 68040 @ 25 MHz.
|
| I believe it was one of the last models that could run
| Domain/OS, but it could also run HPUX, which is what I had
| installed...so I never experienced Domain/OS. I did get
| NetBSD running on it, but it didn't support the framebuffer,
| so no X11.
| shalabhc wrote:
| A paper describing DSEE - an interesting distributed programming
| environment built on Domain/OS:
|
| http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.575....
|
| Excerpts:
|
| "DSEE is implemented as one program, with instances running at
| various nodes in the network."
|
| On the history manager:
|
| "DSEE can create a shell in which all programs executed in that
| shell window transparently read the exact version of an element
| requested in the user's configuration thread. The History
| Manager, Configuration Manager, and extensible streams mechanism
| (described above) work together in this way to provide a "time
| machine" that can place a user back in a environment that
| corresponds to a previous release. In this environment, users can
| print the version of a file used for a prior release, and can
| display a readonly copy of it. In addition, the compilers can use
| the "include" files as they were, and the source line debugger
| can use old binaries and old sources during debug sessions. All
| of this is done without making copies of any of the elements."
|
| On the configuration manager (sounds like system-wide build
| artifact caching):
|
| "The CM maintains a derived object pool which holds several
| version of each object that was produced as the result of
| building a component named in the system model (e.g., binaries).
| Each derived object in the pool is associated with the BCT used
| to build it. When asked to build, the CM determines a "desired"
| BCT by binding the system model to the versions requested by the
| user's current CT. The CM then looks in the derived object pool
| to see if there is a BCT that exactly matches the one desired. If
| a match is found, the derived object associated with that BCT is
| used. Otherwise, the component is rebuilt in accordance with the
| desired BCT, and the new derived object and BCT are written to
| the pool. In all cases, the user is given exactly what he asked
| for."
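|
| A toy model of the lookup the excerpt describes, to make the
| caching behavior concrete. Everything here (the class, the
| bind function, representing a BCT as a sorted tuple of
| element/version pairs) is an illustrative assumption and not
| DSEE's actual data model.
|
```python
def bind(system_model, config_thread):
    """Bind each element named in the model to the version the thread requests."""
    return tuple(sorted((elem, config_thread[elem]) for elem in system_model))

class DerivedObjectPool:
    """Cache of build outputs keyed by the exact bound configuration."""

    def __init__(self, builder):
        self._builder = builder   # hypothetical callable: BCT -> derived object
        self._pool = {}
        self.rebuilds = 0

    def build(self, system_model, config_thread):
        bct = bind(system_model, config_thread)
        if bct not in self._pool:          # no exact match: rebuild and store
            self._pool[bct] = self._builder(bct)
            self.rebuilds += 1
        return self._pool[bct]             # exact match: reuse as-is
```
|
| Because the key is the full set of bound versions, switching
| back to an earlier configuration finds the old derived object
| instead of rebuilding it.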
| kragen wrote:
| If you like DSEE, you should try Vesta, a version-tracking
| system which provides most of the things that were good about
| DSEE, but is free software on Linux: http://www.vestasys.org/
| Arathorn wrote:
| I really miss my DN3500 (68030 box with EISA bus and ESDI disks,
| from memory) - Domain/OS (and Aegis before it) was a surprisingly
| fun OS, switching between Aegis, BSD & SysV userlands, and it
| came with enough compiler tools for me to write an AutoCAD driver
| for a Summagraphics tablet I picked up around the same time. The
| windowing UI was surprisingly pretty too, and Apollo Token Ring
| with its whacko RAM-over-Network architecture worked surprisingly
| well. It's a real shame HP swallowed them, but perhaps the moral
| of the story is not to try to ship 3 different OSes at the same
| time on the same hardware (and not to write a huge evolutionary
| fork like Aegis was).
|
| Edit 1: The DN3500 also came with a hardcopy of this PDF :)
|
| Edit 2: From memory, it also came with a surprisingly good super-
| early guide to TCP/IP and the Internet, complete with a hardcopy
| of the HOSTS file in case you couldn't get it via FTP from
| Internic ;)
| protomyth wrote:
| This is one of the old OSes that I wish had been open sourced. I
| would really like to see the Pascal source code, and it would be
| neat to see something that wasn't a UNIX.
| skissane wrote:
| > This is one of the old OSes that I wish had been open sourced
|
| Maybe one of these days, someone will convince the powers-that-
| be at HPE to do it. It has basically zero remaining commercial
| value. Open sourcing it would have PR benefits for HPE.
___________________________________________________________________
(page generated 2021-03-26 23:01 UTC)