[HN Gopher] I summarized my understanding of Linux systems
       ___________________________________________________________________
        
       I summarized my understanding of Linux systems
        
       Author : lsc4719
       Score  : 276 points
       Date   : 2024-03-14 07:33 UTC (15 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | smitty1e wrote:
       | I think it needs three areas, not two:
       | 
       | 1. User space
       | 
       | 2. Kernel
       | 
       | 3. Hardware/network
       | 
       | The kernel protects users from hardware, and hardware from users.
        
         | topspin wrote:
         | This is reasonable and correct. I would also have found places
         | in that map for: dcache, block devices, character devices,
         | scheduler, page cache and console/tty/pty. The first two
         | replace "filesystem hierarchy". The second and third are
         | ancient and fundamental classes of UNIX devices.
        
         | t1tos wrote:
         | this is analogous to the fs hierarchy: root protects from the
         | user
        
       | cookiengineer wrote:
       | I can recommend taking a look at /proc and /sys, because that
       | will clear up a lot of how things are intertwined and connected.
       | 
       | procfs is what's used by pretty much all tools that do something
       | with processes, like ps, top etc.
       | 
       | The everything is a file philosophy becomes much more clear then.
       | Even low level syscalls and their structs are offered by the
       | kernel as file paths so you can interact with them by parsing
       | those files as a struct, without actually needing to use kernel
       | headers for compilation.
       | 
       | eBPF and its bytecode VM are a little over the top, but are
       | essential to know about in the upcoming years cause a lot of
       | tools are moving towards using their own bpf modules.
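
As a minimal sketch of the procfs point above: the snippet below parses `/proc/self/status` as plain text, much as tools like ps and top read per-process data. It assumes a Linux system with /proc mounted; `read_proc_status` is a hypothetical helper name, not part of any library.

```python
import os
from pathlib import Path


def read_proc_status(pid="self"):
    """Parse /proc/<pid>/status into a dict of field name -> value string."""
    fields = {}
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip any line that is not "Key: value"
            fields[key] = value.strip()
    return fields


status = read_proc_status()
# Name and Pid are always present; VmRSS is missing for kernel threads.
print(status["Name"], status["Pid"], status.get("VmRSS", "n/a"))
```

No kernel headers are involved here: the kernel renders the data as text and the program only splits lines.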
        
         | Cloudef wrote:
         | > The everything is a file philosophy becomes much more clear
         | then.
         | 
         | To be honest, everything is a file is kind of a lie in unix.
         | /proc and /sys are pretty much plan9 inspiration.
        
           | wolletd wrote:
           | Also, a lot of devices require very specific ioctl() commands
           | to work with and don't provide everything as a file.
           | 
           | For example, you can't set the baudrate of a serial port by
           | writing it to some /proc node.
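
A hedged sketch of the baud rate example: Python's `termios` module wraps `tcsetattr()`, which on Linux is carried out through an ioctl on the open descriptor rather than a write to a file node. A pseudo-terminal stands in for a real serial port here (an assumption made so the example runs without hardware); with a real device you would open something like a /dev/ttyS* path instead.

```python
import os
import pty
import termios

# A pty pair stands in for a serial port so the sketch runs anywhere on Linux.
master_fd, slave_fd = pty.openpty()
try:
    attrs = termios.tcgetattr(slave_fd)
    # attrs layout: [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[4] = termios.B9600  # input speed
    attrs[5] = termios.B9600  # output speed
    termios.tcsetattr(slave_fd, termios.TCSANOW, attrs)

    # Read the settings back through the same descriptor.
    speed_ok = termios.tcgetattr(slave_fd)[5] == termios.B9600
    print("output speed set to B9600:", speed_ok)
finally:
    os.close(master_fd)
    os.close(slave_fd)
```

The descriptor is still a file in the sense arghwhat describes below, but the operation on it is a structured call, not read/write.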
        
             | arghwhat wrote:
             | nit: The ioctl syscall targets files just the same as the
             | write syscall.
             | 
             | Everything being a file, and everything being read/write
             | calls are different things. There's pros and cons.
        
           | arghwhat wrote:
           | A more accurate term is that everything is a _file
           | descriptor_.
           | 
           | The main difference is that plan9 uses read and write for
           | everything, whereas Linux and BSD use ioctls on file
           | descriptors for everything.
        
             | vbezhenar wrote:
             | Everything is a descriptor. When I'm opening a TCP
             | connection, there's no file, so calling it a file
             | descriptor feels wrong.
             | 
             | And at that point, the whole "everything is" turns into
             | nonsense, because yes, everything is a pointer to
             | something, so what.
        
               | arghwhat wrote:
               | There are _named_ files (which have a file path) and
               | _anonymous_ files (which do not). You can see these in
               | /proc/$PID/fd/$FD if you're curious - when the link
               | doesn't start with '/', it's anonymous. Even process
               | memory is just an anonymous file on Linux, and arguably a
               | cleaner one as it operates on proper fds, instead of
               | plan9 where a string "class name" (not a path) is used to
               | access the magical '#g' filesystem.
               | 
               | The difference to plan9 is not the files, but the way
               | plan9 uses text protocols with read/write to ctl files.
               | To open a TCP connection - if memory serves me right -
               | you first have to write to a ctl file, which creates a
               | new file for the connection. Then, you write the dial
               | command to the ctl file of that connection, and after
               | which you can open the connection file. On Linux, a
               | syscall creates an anonymous file, and then everything
               | after is operations on this anonymous file.
               | 
               | There's some ideological benefits, but plan9 creates a
               | mess of implicit text protocols, ugly string handling,
               | syscall storms and memory inefficiencies. Their design is
               | pretty much solely a limitation caused by the idea that
               | all filesystems should exist through the 9p protocol,
               | which as a networked protocol cannot share data (fds,
               | structs), only copy (payloads to/from read, write). With
               | the idea that all functionality had to be replaceable and
               | mountable from remote machines, the only possible API
               | became read/write for everything.
               | 
               | I'd argue that fd-using syscalls and ioctls - basically
               | per-file syscalls - are a superior approach to implementing
               | everything-as-a-file.
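
To make the named versus anonymous distinction concrete, here is a small sketch that opens one of each and prints what the kernel reports under `/proc/self/fd`. It assumes Linux with /proc mounted; `fd_target` is a hypothetical helper name.

```python
import os
import socket


def fd_target(fd):
    """Return the symlink target the kernel reports for an open descriptor."""
    return os.readlink(f"/proc/self/fd/{fd}")


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # anonymous
read_end, write_end = os.pipe()                           # anonymous
named = os.open("/dev/null", os.O_RDONLY)                 # has a path

targets = {
    "socket": fd_target(sock.fileno()),
    "pipe": fd_target(read_end),
    "named": fd_target(named),
}
for kind, target in targets.items():
    # Named files show a path starting with '/'; anonymous ones do not.
    print(f"{kind:7} -> {target}")

sock.close()
for fd in (read_end, write_end, named):
    os.close(fd)
```

The socket and pipe entries show a type tag and an inode number instead of a path, which is the "link doesn't start with '/'" case described above.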
        
               | Cloudef wrote:
               | Which is superior depends on your use case and needs.
               | Plan9's approach is very powerful whenever you need
               | anything distributed, and makes lots of boilerplate to
               | achieve that basically unnecessary. Linux nowadays is
               | flexible for both approaches (in theory, the ecosystem
               | might not be there), and I'm glad user namespaces are a
               | thing.
        
               | arghwhat wrote:
               | Now, proper plan9-style namespaces, _that_ I miss. :)
               | 
               | User namespaces are still a hell of a lot clunkier than
               | "each process inherits its parents' namespace".
        
               | Cloudef wrote:
               | If you set up a user namespace, the child processes will
               | inherit that namespace. The difference is that plan9 is
               | fully built on this idea and isn't multi-user, on linux
               | you have to opt-in to this. It's very useful and
               | underused (mostly used by containers). I wanted to ship
               | my AWS lambdas this way, but sadly AWS lambdas don't
               | allow user namespaces.
               | 
               | https://github.com/aws/aws-lambda-base-images/issues/143
        
               | moody__ wrote:
               | Plan 9 is very multi-user, namespaces are actually one
               | component in how it is multi user specifically. A given
               | terminal may be designed to only have one user on it but
               | CPU servers are still multi-user multiplexers, that part
               | has not been given up.
        
               | Cloudef wrote:
               | Yes, by multi-user I meant that plan9 doesn't have the
               | unix/linux user model, but rather "multi-user" is done
               | with the namespaces. Container people would be more
               | familiar with the plan9 way.
        
               | candiodari wrote:
               | > There's some ideological benefits, but plan9 creates a
               | mess of implicit text protocols, ugly string handling,
               | syscall storms and memory inefficiencies.
               | 
               | On the other hand, linux ioctl and syscalls have infinite
               | binary structs you need to know (and cannot let the
               | compiler reorder fields in), which then doesn't make
               | cross-platform development any easier.
        
               | Cloudef wrote:
               | I was quite disappointed by the ABI differences between
               | different architectures when I was doing network
               | transparent uinput.
               | 
               | https://git.cloudef.pw/uinputd.git/tree/common/packet.h#n34
               | https://git.cloudef.pw/uinputd.git/tree/server/uinputd.c#n16
               | 
               | (Excuse to write code to use my PS Vita as gamepad :D)
        
               | arghwhat wrote:
               | Having to know structs is not really an issue - you also
               | need to know text formats, JSON schemas, what not.
               | 
               | Re-ordering of structs is always forbidden with the
               | binary format being strictly specified, so there's
               | nothing to worry about there. Can't exactly shuffle bytes
               | in a text format either, and plan9 control strings tend
               | to have positional arguments.
               | 
               | The current structs do leave something to be desired
               | though.
        
               | candiodari wrote:
               | Really? The issue is that those structs don't cross-
               | compile correctly if you aren't very careful in their
               | use. I've personally managed to write an IRC bot that had
               | a select loop that worked fine on i386 but didn't on
               | amd64 until I figured out what was happening to the size
               | of the structure behind it.
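
A short sketch of why binary layouts bite across architectures, using Python's `struct` module. The `@` prefix asks for the native C sizes and alignment of the machine running the code, while `<` asks for fixed standard sizes with no padding. The field choices are illustrative assumptions, not the structure from the IRC bot above.

```python
import struct

# "@" = native size and alignment (what a C compiler on this machine uses).
# "<" = standard sizes, little-endian, no padding.
native_long = struct.calcsize("@l")   # 4 on i386 Linux, 8 on amd64 Linux
fixed_long = struct.calcsize("<l")    # always 4

native_pair = struct.calcsize("@bl")  # char then long, padded for alignment
packed_pair = struct.calcsize("<bl")  # same fields, no padding: 1 + 4

print("native long:", native_long, "fixed long:", fixed_long)
print("native char+long:", native_pair, "packed char+long:", packed_pair)
```

Code that hard-codes one of the native numbers works on the machine it was written on and silently misreads data elsewhere, which matches the i386 versus amd64 experience described above.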
        
               | akdev1l wrote:
               | Structured json data that you mentioned does allow you to
               | "shuffle bytes"
               | 
               | It doesn't matter which order the client sends {"a": 1,
               | "b": 2 }, it's one object with "a" and "b" regardless
        
               | arghwhat wrote:
               | Do the same to a JSON array.
               | 
               | Position having meaning is a common part of most data
               | representations, not something specific to structs. If
               | you _need_ to, you can also engineer order independence
               | in structs with arrays of type and value unions, but
               | there's no need to as the order has never been a
               | problem.
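
Both sides of this exchange can be checked in a few lines. The sketch below parses two JSON objects with swapped keys and two JSON arrays with swapped elements, then compares them.

```python
import json

# Objects: key order does not change the parsed value.
obj_a = json.loads('{"a": 1, "b": 2}')
obj_b = json.loads('{"b": 2, "a": 1}')
objects_equal = obj_a == obj_b

# Arrays: position carries meaning, so reordering changes the value.
arr_a = json.loads('[1, 2]')
arr_b = json.loads('[2, 1]')
arrays_equal = arr_a == arr_b

print("objects equal:", objects_equal)
print("arrays equal:", arrays_equal)
```

Order independence is a property of keyed containers, not of text formats as such.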
        
               | moody__ wrote:
               | I am not exactly sure where you got the "class name" part
               | from, I've typically referred to those as kernel
               | filesystems or sharp devices. For the record these kernel
               | filesystems are not technically 9p, they present an
               | interface much similar but reads and writes to them are
               | not marshaled and unmarshaled from 9p. It is however
               | possible to export their files over 9p if one desires, I
               | can import a remote machine's /net stack and use it to
               | announce or dial out. Plan 9 gives us proto-VPNs just by
               | virtue of its design.
               | 
               | There was perhaps a time where the differences between
               | having everything in binary ioctls and bound to
               | specifically one device was a necessary component in
               | order to reach reasonable performance, but I don't
               | believe that is the case anymore. Anecdotally these days
               | everything on Plan 9 feels snappier. We have some
               | benchmarks that show that 9front outperforms linux with
               | naive pipe io and context switches. What Plan 9 misses in
               | micro optimizations it makes up for by having an
               | incredibly consistent and versatile base.
               | 
               | I want to reiterate the benefits of the network
               | transparency by talking about how drawterm works.
               | Drawterm can be thought of as the plan 9 equivalent of
               | windows RDP. How it works is that internally drawterm
               | creates routines to expose a /dev/draw, /dev/mouse and
               | /dev/keyboard through whichever native way there is on
               | the target system (macos, windows, linux, etc). It then
               | attaches to the remote system and overlays these files
               | over a namespace. Programs like our window manager rio
               | can then be run completely transparently, forwarding not
               | compressed images, but individual draw RPC messages.
               | There is no need for any special code on the plan 9 host
               | side in order to accommodate drawterm, again it is
               | something that just falls out of the core design of the
               | system.
        
               | Cloudef wrote:
               | Even on linux people avoid syscalls because syscalls are
               | slow and bad. So I don't really see the problem with
               | plan9's approach either. Make the common scenario useful,
               | optimize for special cases (sendfile, io_uring). In fact,
               | read/write lets you batch a bigger amount of data than a
               | single ioctl, which can actually be more performant.
        
               | MisterTea wrote:
               | > magical '#g' filesystem.
               | 
               | What's magical about the segment(3)[1] device? The '#'
               | devices are kernel file servers. There's no magic.
               | 
               | [1] http://man.9front.org/3/segment
        
             | sph wrote:
             | Relevant talk by Benno Rice, "What UNIX cost us":
             | https://youtu.be/9-IWMbJXoLM?si=OblWX3OMXWrSFinb
        
       | dfc wrote:
       | My mental model of Linux does not have the CPU/Memory in user
       | space. What am I missing?
        
         | vbezhenar wrote:
         | Userspace program directly uses CPU and memory (unless you're
         | using VM). In contrast to that, your userspace program does not
         | directly access your network device or SSD, but uses kernel
         | routines to access those indirectly.
        
           | persnickety wrote:
           | If you're using a hypervisor, then the userspace program
           | inside the VM is also using the CPU and memory directly.
           | You'd have to do full emulation to avoid that.
           | 
           | Even with full emulation, I'd say memory is being accessed
           | directly, unless you really go out of your way to make it
           | weird.
        
             | icedchai wrote:
             | With full emulation, I'd argue memory access is _not_
             | direct. Memory access from the emulated system will go
             | through user space code in the emulator. That code may
             | translate it to actual memory access, or perhaps an
             | emulated, memory mapped I/O device like a frame buffer.
             | Either way, there is something in the middle.
             | 
             | You could argue that nothing is direct unless you're
             | running on a bare metal system, no MMU, no page tables. How
             | do you define "direct"?
        
           | sophacles wrote:
           | It doesn't directly access memory. The addresses in your
           | userspace program are not the actual addresses of memory in
           | the ram sticks - there is a table of mappings that the kernel
           | sets up. When your process asks the kernel for memory, it
           | says "i need 5KB, please put that at address XYZ". The
           | kernel goes and finds 5KB unused, probably at some other
           | address ABC, and creates a mapping in the table that says XYZ
           | translates to ABC. Then the kernel sets the MMU of the CPU to
           | use the table for your process, and switches back to
           | unprivileged mode, letting your process run again. Your
           | process in unprivileged mode sends an instruction to write to
           | the memory at XYZ, but the cpu will translate that
           | instruction to ABC and write there instead.
           | 
           | VMs (well not emulated vms, but if you're doing an x86 vm on
           | x86 or an arm vm on arm) do something similar - an inaccurate
           | (but useful for the concept) way to think of it is that the
           | cpu does 2 layers of MMU for user processes in a vm.
           | 
           | The kernel code isn't running directly when your program
           | accesses memory, but it sets up the cpu so that the kernel
           | still has control over your memory, and you only have access
           | to what the kernel allows - it's mediated by the kernel.
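
The XYZ to ABC translation can be sketched as a toy model. This is a deliberately simplified, single-level page table in plain Python, assuming 4 KiB pages; real MMUs use multi-level tables and permission bits, and all names and addresses below are made up for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class ToyAddressSpace:
    """One process's view of memory: virtual pages mapped to physical frames."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> physical frame number

    def map_page(self, virtual_addr, physical_addr):
        """What the kernel does on allocation: record XYZ -> ABC."""
        self.page_table[virtual_addr // PAGE_SIZE] = physical_addr // PAGE_SIZE

    def translate(self, virtual_addr):
        """What the MMU does on every access, without running kernel code."""
        page, offset = divmod(virtual_addr, PAGE_SIZE)
        if page not in self.page_table:
            # On real hardware this traps into the kernel as a page fault.
            raise LookupError(f"page fault at {virtual_addr:#x}")
        return self.page_table[page] * PAGE_SIZE + offset


space = ToyAddressSpace()
space.map_page(0x7000_0000, 0x0012_3000)   # "XYZ" -> "ABC"
physical = space.translate(0x7000_0010)    # same offset, different frame
print(hex(physical))
```

The split between `map_page` and `translate` mirrors the point above: the kernel is involved when the mapping is set up, and the hardware applies it on each access afterward.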
        
             | wrs wrote:
             | I think the point of the diagram is that the abstractions
             | or "API" that user space gets to use includes memory it can
             | read and write directly, and a CPU to execute instructions.
             | Of course in reality there only "appears to be" memory and
             | a CPU, but that's why it's an abstraction. Just like there
             | "appears to be" a filesystem for user space to use, when in
             | reality there's a block interface to a disk, or wherever
             | you want to draw the line.
        
               | sophacles wrote:
               | I think what I was getting at was that memory sort of
               | sits in-between.
               | 
               | My instructions are executed directly on the cpu. My file
               | reads and writes are directly translated from a stream of
               | bytes to block instructions by code in the kernel.
               | 
               | Memory is a weird in-between place, or maybe a 3rd
               | option, since the kernel has to run a bunch of code on my
               | behalf for me to use memory, sort of like the filesystem
               | thing, but I'm using the direct hardware units afterward,
               | sort of like instructions.
        
           | akira2501 wrote:
           | Ring 3 is not "directly using the CPU." And mmap is not
           | "directly using the memory."
        
             | suprjami wrote:
             | Hardware ring has nothing to do with "directly using the
             | CPU", it controls what access level the program has.
             | 
             | Forget virtualisation. Compile a userspace program which
             | just adds numbers into a stack variable. That program is
             | running directly on the CPU in the unprivileged ring.
             | 
             | A userspace program in a VT-x virtual machine is exactly
             | the same.
             | 
             | If those programs attempt privileged access then that
             | access will fail and a trap is raised. That's what the CPU
             | ring controls.
        
               | akira2501 wrote:
               | > Hardware ring has nothing to do with "directly using
               | the CPU"
               | 
               | Why wouldn't it? Several features are simply not
               | available in ring 3. Several features are configured for
               | you in a way you cannot change. Several instructions will
               | simply fault your program.
               | 
               | > which just adds numbers into a stack variable
               | 
               | Yes.. and when you eventually overflow that stack, what
               | happens? How did the stack segment selector get created?
               | Can you change that selector or its attributes? Can you
               | set the stack pointer to any valid memory address you
               | like?
               | 
               | > A userspace program in a VT-x virtual machine is
               | exactly the same.
               | 
               | What does an IOMMU do?
               | 
               | > If those programs attempt privileged access then that
               | access will fail and a trap is raised.
               | 
               | Right.. so you are not directly using the CPU. You're not
               | even in control of what timeslices are afforded to you by
               | the OS. You are in an exceptionally limited environment
               | most of which you cannot control or alter and much of
               | which you cannot even observe.
               | 
               | The fact that instructions get dispatched according to
               | the system ABI when you run a program is not material to
               | this problem, and in particular, is not at all correctly
               | represented by this diagram.
        
         | suprjami wrote:
         | Nor does mine.
         | 
         | Userspace assembly runs directly on the CPU* executing in the
         | unprivileged ring. When the userspace program makes a system
         | call by calling a kernel entry function which is mapped into
         | the process's address space by the dynamic loader, then part of
         | that entry into kernelspace is to put the CPU into the
         | privileged ring and kernel assembly then runs on the CPU.
         | 
         | The process scheduler can stop execution to kick a task off the
         | CPU and switch to another one, depending on OS and kernel some
         | things can be kicked off the CPU and some cannot.
         | 
         | Userspace memory allocations are serviced by virtual memory
         | where the page tables track the translation of virtual memory
         | pages into physical memory pages using the MMU.
         | 
         | The kernel is involved during allocation and page fault, but
         | iiuc a regular successful virtual memory access is a hardware
         | operation only.
         | 
         | I don't have a diagram of how this works. Neither processes nor
         | memory are my usual area of kernel.
         | 
         | You'd be better off reading the x86 version of the XV6 book to
         | learn how this stuff really works. It's really well written and
         | implements enough to be tangibly useful. Reading the code is
         | optional when just learning concepts. Reading the XV6 code will
         | hopefully help you understand the O'Reilly Linux books better,
         | which will hopefully help you understand the actual Linux
         | kernel better.
         | 
         | (*yes I'm aware CPUs don't directly execute assembly anymore,
         | but the microcode guarantees the observable CPU state at any
         | Instruction Pointer matches the expectation as if you were
         | running assembly on a PDP or C64, or close enough for 99.999%
         | purposes and definitely enough for debugging your program in
         | gdb)
        
           | Koshkin wrote:
           | > _execute assembly_
           | 
           | I always thought that assembly was a type of programming
           | language.
        
             | pertique wrote:
             | It is. From my understanding, CPUs execute machine code.
             | Assembly has to be passed through an assembler to get
             | machine code, and that assembler can make other changes as
             | well, so they are not always one to one. Written assembly
             | will usually translate very closely to machine code, though.
        
             | marcosdumay wrote:
             | The GP has a slightly weird language, but mixing assembly
             | and machine code in informal speech isn't rare at all.
        
             | ooterness wrote:
             | Ben Eater's excellent educational video series includes one
             | explaining the difference:
             | 
             | https://www.youtube.com/watch?v=oO8_2JJV0B4
             | 
             | In short, assembly for a given CPU is very nearly one-to-
             | one with the machine language for that CPU. It's not
             | correct to conflate the two, but close enough when speaking
             | informally.
        
             | suprjami wrote:
             | Here I'm referring to assembly as mnemonic for machine
             | code, but yes it would have been more correct to say
             | machine code instead.
        
       | timeforcomputer wrote:
       | Nice! I want to do something similar and map my understanding of
       | Linux. I find some diagrams on Wikipedia fascinating (example:
       | https://en.m.wikipedia.org/wiki/File:Linux_Graphics_Stack_20...,
       | but more to do with the user library ecosystem rather than kernel
       | and program runtime). These diagrams make me want to learn about
       | each part and be able to comprehend in principle what is
       | happening on my machine. Eventually...
        
         | Jasper_ wrote:
         | Any diagram by ScotXW on Wikipedia is somewhere between
         | misleading and completely wrong, and they're a constant pain on
         | the Linux graphics community.
         | 
         | If you're curious about the details in this case, ScotXW
         | confuses EGL and OpenGL, the arrows aren't quite right, and the
         | labels aren't quite right either (DRM is labeled "hardware-
         | specific" but KMS isn't? The label for "hardware specific
         | Userspace interface to hardware specific direct rendering
         | manager" is quite weird), and some important boxes are flat out
         | missing. It's nitpicking for sure, but when the diagram goes
         | out of its way to add extremely weird details, it demands
         | nitpicking.
         | 
         | Nobody in the Linux graphics community would draw the diagram
         | like this.
        
       | richardwhiuk wrote:
       | I don't understand what the boxes on this diagram are meant to
       | represent.
       | 
       | It feels like an elaborate mechanism to draw something wrong in
       | the hopes people will correct it.
        
         | projektfu wrote:
         | FWIW, I also don't really understand what the boxes are
         | supposed to represent, given that the arrows represent
         | dependencies like PID <-- process. I thought a PID was an
         | attribute of a process?
         | 
         | To me, a block diagram might show [CPU Scheduler], [Virtual
         | Device Manager], [VFS Manager], [Memory Manager], [Interrupt
         | Handlers], etc...
         | 
         | Of course, my knowledge of Linux internals is limited and
         | perhaps it has a separation of the concept of PID and process
         | where there is a literal dependency.
        
         | sevagh wrote:
         | Interview prep.
        
       | tmalsburg2 wrote:
       | I learned a lot about this from the book "The design of the Unix
       | operating system" by Maurice J. Bach.[1] It's an old book and many
       | details deviate from actual present-day Linux, but it nonetheless
       | gives a great overview of the key components and ideas.
       | 
       | [1] https://books.google.de/books/about/The_Design_of_the_UNIX_O...
        
         | guerrilla wrote:
         | This is one of my favorite books. A true classic. There are
         | follow-ups in that style for Linux and FreeBSD as well. I think
         | Robert Love wrote the former.
        
           | temeya wrote:
           | And Marshall Kirk McKusick wrote the latter, "The Design and
           | Implementation of the FreeBSD Operating System"
        
             | guerrilla wrote:
             | Thanks, pretty sure that's the one I meant.
        
           | bigfatfrock wrote:
           | Thank you! I was going to ask for the latest linux variant -
           | are you speaking of Love's "Linux Kernel Development", or
           | "Linux in a Nutshell"?
           | 
           | I've been a primary linux user for a couple decades now but
           | I'm not too keen on digging into kernel hacking but love
           | details like the OPs post.
        
             | guerrilla wrote:
             | The Kernel Development book. It gives a tour. Read the UNIX
             | one first though.
        
         | cpach wrote:
         | Seems to be available on the Internet Archive:
         | https://archive.org/details/DesignUNIXOperatingSystem
        
       | thesuperbigfrog wrote:
       | "The Linux Programming Interface" by Michael Kerrisk is one of
       | the best technical resources I have found and used to understand
       | Linux:
       | 
       | https://man7.org/tlpi/
       | 
       | Description from the book's website:
       | 
       | "The Linux Programming Interface (TLPI) describes system
       | programming on Linux and UNIX.
       | 
       | TLPI is both a guide and reference book for system programming:
       | 
       | If you are new to system programming, you can read TLPI linearly
       | as an introductory guide: each chapter builds on concepts
       | presented in earlier chapters, with forward references kept to a
       | minimum. Most chapters conclude with a set of exercises intended
       | to consolidate the reader's understanding of the topics covered
       | in the chapter.
       | 
       | If you are an experienced system programmer, TLPI provides a
       | comprehensive reference that you can consult for details of
       | nearly the entire Linux and UNIX (i.e., POSIX) system programming
       | interface. To support this use, the book is thoroughly cross
       | referenced and has an extensive index."
        
       | begueradj wrote:
       | Why is a UML book listed as a reference?
        
       | knorker wrote:
       | Whenever I've made notes like this, it's never been useful to my
       | future self nor to anyone else.
       | 
       | The only use I've had of this kind of documentation is that the
       | process of writing it made me understand it better. Basically
       | write-only documentation.
       | 
       | I would call myself a Linux expert, and while I can kinda see
       | what you mean with this diagram, it would not have been useful to
       | me back before I was an expert.
        
         | persolb wrote:
         | It almost resembles mind mapping. It is a useful 'process' to
         | figure out what you think/know. And it might be a pretty
         | picture. But it isn't very useful as documentation.
        
         | falserum wrote:
         | I found it useful. Allowed me to compare if I have similar idea
         | to the author.
        
         | codelobe wrote:
         | Usually I would agree. I typically make a "Crash-Course in
         | $PLATFORM" document while keeping notes. These I very commonly
         | reference in order to externalize my memory since it seems to
         | be approaching capacity. I don't care about Ruby on Rails, but
         | once I did, and I can reference my notes if I ever need to
         | touch that platform again.
        
       | pjmlp wrote:
       | UNIX IPC is kind of missing, streams, pipes, message boxes,
       | shared memory.
       | 
       | SUN RPC for NFS, yellow pages,...
        
       | peter_d_sherman wrote:
       | A future simple linux-like (or unix-like) OS -- could
       | theoretically be created with _only 4 syscalls_ :
       | 
       | open() read() write() close()
       | 
       | Such a theoretical linux-like or unix-like OS would assume quite
       | literally that _"everything is a file"_ -- including the ability
       | to _perform all other syscall/API calls/functions via special
       | system files_, probably located in /proc and/or /sys and/or other
       | special directories, as other posters have previously alluded
       | to...
       | 
       | Also, these 4 syscalls could theoretically be combined into a
       | _single syscall_ -- something like (I'll use a Pascal-like
       | syntax here because it will read easier):
       | 
       | FileHandleOrResult = _OpenOrReadOrWriteOrClose(FileHandle:
       | Integer; Mode: Integer; pData: Pointer; pDataLen: Integer);_
       | 
       | if Mode = 1 then open();
       | 
       | if Mode = 2 then read();
       | 
       | if Mode = 3 then write();
       | 
       | if Mode = 4 then close();
       | 
       | FileHandle is the handle for the file IF we have one; that's for
       | read() write() and close() -- for open() it could be -1, or
       | whatever...
       | 
       | Mode is the mode, as previously enumerated.
       | 
       | pData is a pointer to a pre-allocated data buffer, the data to
       | read or write, or the full filename when opening...
       | 
       | (And of course, the OS could overwrite it with text strings of
       | any error messages that occur... if errors should occur...)
       | 
       | pDataLen is the size of that buffer in bytes.
       | 
       | When the Mode is open(), pData contains a string of the path and
       | file to open.
       | 
       | When Mode is read(), pData is read _to_, that is, overwritten.
       | 
       | When Mode is write(), pData is used to write _from_.
       | 
       | All in all, pretty simple, pretty straightforward...
       | 
       | A _"one syscall Linux or Unix (or Linux-like or Unix-like)
       | operating system"_, if you will... for simplicity and
       | understanding!
       | 
       | (Andrew Tanenbaum would be pleased!)
       | 
       | Related: "One-instruction set computer" (OISC):
       | https://en.wikipedia.org/wiki/One-instruction_set_computer
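
The single-entry-point idea above can be sketched in a few lines. This is only an illustration under stated assumptions: one_syscall, the MODE_* constants and demo are hypothetical names, a bytearray stands in for pData, and the open flags are hard-coded because the four-argument signature leaves no room to pass them.

```python
import os
import tempfile

# Hypothetical mode numbers, following the enumeration in the comment.
MODE_OPEN, MODE_READ, MODE_WRITE, MODE_CLOSE = 1, 2, 3, 4


def one_syscall(handle, mode, data, length):
    """Multiplex open/read/write/close behind one entry point.

    data is a bytearray standing in for the pData buffer; length is the
    number of meaningful bytes in it (pDataLen). Returns a new file
    descriptor for open, a byte count for read and write, 0 for close.
    """
    if mode == MODE_OPEN:
        path = bytes(data[:length]).decode()
        # Assumption: always open read/write and create if missing.
        return os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    if mode == MODE_READ:
        chunk = os.read(handle, length)
        data[:len(chunk)] = chunk  # overwrite the caller's buffer in place
        return len(chunk)
    if mode == MODE_WRITE:
        return os.write(handle, bytes(data[:length]))
    if mode == MODE_CLOSE:
        os.close(handle)
        return 0
    raise ValueError(f"unknown mode {mode}")


def demo():
    """Write a message to a scratch file, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = bytearray(os.path.join(tmp, "scratch.txt").encode())
        message = bytearray(b"everything is a file")

        fd = one_syscall(-1, MODE_OPEN, path, len(path))
        one_syscall(fd, MODE_WRITE, message, len(message))
        one_syscall(fd, MODE_CLOSE, bytearray(), 0)

        fd = one_syscall(-1, MODE_OPEN, path, len(path))
        buf = bytearray(64)
        n = one_syscall(fd, MODE_READ, buf, len(buf))
        one_syscall(fd, MODE_CLOSE, bytearray(), 0)
        return bytes(buf[:n])
```

Note that the sketch already has to bake in the open flags and file permissions, which hints at where a real design would need either more arguments or a richer pData encoding.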
        
         | richardwhiuk wrote:
         | That's already kind of how syscalls work - you shove the
         | syscall number in a register, and then call an interrupt.
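
On Linux, libc's syscall() wrapper exposes this number-based dispatch directly. A minimal sketch, assuming a 64-bit Linux on x86_64 or aarch64: the two numbers below are the getpid numbers for those architectures, raw_getpid is a hypothetical helper name, and it returns None on any other platform rather than guessing.

```python
import ctypes
import os
import platform
import sys

# getpid syscall numbers on 64-bit Linux for two common architectures.
# Other platforms are deliberately left out of this sketch.
GETPID_NUMBER = {"x86_64": 39, "aarch64": 172}


def raw_getpid():
    """Invoke getpid by number through libc's syscall(); None if unsupported."""
    number = GETPID_NUMBER.get(platform.machine())
    is_64bit = sys.maxsize > 2**32
    if not sys.platform.startswith("linux") or number is None or not is_64bit:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    # The kernel selects the handler purely from this number.
    return libc.syscall(ctypes.c_long(number))
```

Where it is supported, raw_getpid() should agree with os.getpid(), which reaches the same kernel entry point through the usual wrapper.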
        
         | samatman wrote:
         | You've effectively reinvented 9p here. Which is good!
         | 
         | There are some differences which may interest you:
         | https://9fans.github.io/plan9port/man/man9/intro.html
         | 
         | I think you may find that some of the additional complexity of
         | 9p is necessary, but perhaps not all of it.
        
         | zzo38computer wrote:
         | I had considered that too, but I also considered a different
         | single syscall, which I think is better: one that is more like
         | an actor model or a capability-based system. (One problem with
         | "everything is a file", as in Plan 9, is that the operating
         | system has to parse file paths every time you want to do any
         | I/O; what I describe below sidesteps that problem, since you
         | can link directly to objects instead.)
         | 
         | A process has access to a set of capabilities (if it does not
         | have any capabilities, then it is automatically terminated
         | (unless a debugger is attached), since there is nothing for the
         | program to do).
         | 
         | A "message" consists of a sequence of bytes and/or
         | capabilities. (The message format will be system-independent
         | (e.g. the endianness is always the same) so that it works with
         | emulation and network transparency, described below.)
         | 
         | A process can send messages to capabilities it has access to,
         | receive messages from capabilities it has access to, create new
         | capabilities (called "proxy capabilities"), discard
         | capabilities, and wait for capabilities.
         | 
         | Terminating the process is equivalent to a mandatory blocking
         | wait for an empty set of capabilities; discarding all
         | capabilities also terminates the process. A non-blocking wait
         | for an empty set of capabilities means that you wish to yield,
         | so that other processes get a chance to run, before this
         | process continues.
         | 
         | Some further options may be needed to handle multiple locking
         | and transactions, and to avoid some kinds of race conditions,
         | but mostly that is just it.
         | 
         | This is useful for many things, including sandboxing,
         | emulation, network transparency (this can be done by one
         | program keeping track of which capabilities need to be sent
         | across the network link and assign an index number to each one,
         | and then the other end will create a proxy capability for each
         | index number and use that number when it wants to send back),
         | security with user accounts, etc; the kernel does not need to
         | know about all of these things since they can be implemented in
         | user code.
         | 
         | Other things (outside of the kernel) can also be implemented in
         | terms of proxy capabilities, and I had ideas about those other
         | parts of the operating system too, for example it has a
         | hypertext file system (with no file names, but files can
         | contain multiple numbered streams, which can include both bytes
         | and links to other files (which can be either to the current
         | version or to a fixed version; if to a fixed version then copy-
         | on-write will be used if the file is modified)), and the
         | "foreign links table", and a common (binary) data format, and a
         | command shell with some similarities to Nushell (but also
         | many differences), and the system uses the "Extended TRON Code"
         | character set, and details about the working of the package
         | manager and IME and window manager, etc.
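
The capability model described above can be mocked up as a toy. This is a single-threaded sketch, not the commenter's design: Capability, Process and demo are hypothetical names, a capability is simplified to a shared mailbox, and a blocking wait never really blocks.

```python
from collections import deque


class Capability:
    """An endpoint holding a queue of messages (a simplification)."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()


class Process:
    def __init__(self, name, capabilities=()):
        self.name = name
        self.caps = set(capabilities)
        # A process holding no capabilities has nothing it can do.
        self.alive = bool(self.caps)

    def _require(self, cap):
        if not self.alive:
            raise RuntimeError(f"{self.name} has terminated")
        if cap not in self.caps:
            raise PermissionError(f"{self.name} does not hold {cap.name}")

    def create_proxy(self, name):
        """Create a new capability owned by this process."""
        if not self.alive:
            raise RuntimeError(f"{self.name} has terminated")
        cap = Capability(name)
        self.caps.add(cap)
        return cap

    def send(self, cap, *items):
        """Send a message made of bytes and/or capabilities."""
        self._require(cap)
        for item in items:
            if isinstance(item, Capability):
                self._require(item)  # can only pass on what you hold
            elif not isinstance(item, bytes):
                raise TypeError("messages carry bytes or capabilities")
        cap.inbox.append(tuple(items))

    def receive(self, cap):
        """Pop one message; capabilities inside it become accessible."""
        self._require(cap)
        if not cap.inbox:
            return None
        message = cap.inbox.popleft()
        self.caps.update(i for i in message if isinstance(i, Capability))
        return message

    def discard(self, cap):
        self._require(cap)
        self.caps.remove(cap)
        if not self.caps:
            self.alive = False  # discarding everything terminates

    def wait(self, caps, blocking=False):
        """Return the subset of caps with pending messages.

        A blocking wait on an empty set terminates the process. A
        non-blocking wait on an empty set is a yield, a no-op here.
        """
        caps = set(caps)
        if blocking and not caps:
            self.caps.clear()
            self.alive = False
            return set()
        for cap in caps:
            self._require(cap)
        return {cap for cap in caps if cap.inbox}


def demo():
    server = Process("server", [Capability("boot")])
    request = server.create_proxy("request")
    client = Process("client", [request])
    reply = client.create_proxy("reply")
    client.send(request, b"ping", reply)  # bytes plus a capability
    message = server.receive(request)     # server now holds reply
    server.send(reply, b"pong")
    answer = client.receive(reply)
    client.discard(request)
    client.discard(reply)
    return message[0], answer[0], client.alive
```

In demo() the client hands the server a reply capability inside its request, which is how the server gains the right to answer; once the client discards both capabilities it counts as terminated.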
        
       ___________________________________________________________________
       (page generated 2024-03-14 23:01 UTC)