[HN Gopher] DBOS: A Database-Oriented Operating System
       ___________________________________________________________________
        
       DBOS: A Database-Oriented Operating System
        
       Author : KraftyOne
       Score  : 54 points
       Date   : 2022-09-03 19:08 UTC (3 hours ago)
        
 (HTM) web link (dbos-project.github.io)
 (TXT) w3m dump (dbos-project.github.io)
        
       | crazygringo wrote:
       | I understand conceptually how you could architect a
       | desktop/server OS based on a local database instead of a local
       | filesystem, and it's deeply intriguing to me in terms of how
       | everything could share a common data language that is far more
       | flexible than files but far more structured than text. Presumably
       | things like terminal output would be formatted as query results
       | instead of text. The database wouldn't reside in files on disk,
       | it would reside directly in blocks on disk and in-memory as
       | required.
       | 
       | But this seems to be proposing a _distributed_ database that runs
       | _across_ computers, which confuses me. Anything that runs
       | _across_ computers seems to me to be application-level, not OS-
       | level, by definition. What does it even mean to be running a
       | distributed database on "microkernel services" instead of on
       | fully-fledged OS's? And then... where is the OS-level CPU coming
       | from? If the database is distributed among computers then fine,
       | but which computer is "running" the OS?
       | 
       | Both the nomenclature and distributed aspect are really throwing
       | me off here.
        
         | tremon wrote:
         | _Anything that runs across computers seems to me to be
         | application-level, not OS-level, by definition_
         | 
         | Novell Netware was also marketed as a Network Operating System,
         | in the sense that none of its services were confined to the
         | local machine: its entire purpose was to combine a network of
         | computers into a single management unit.
        
         | pjmlp wrote:
         | Have a look at IBM i (née AS/400); it uses a database instead of
         | files, aptly named catalogs.
        
           | skissane wrote:
           | Disagree, IBM i uses files: how do you create a database
           | table in it? CRTPF command ("Create Physical File"). You
           | create a file.
           | 
           | And I don't know what you mean by "catalogs", in the context
           | of IBM i. Are you talking about DB2 catalog views? (Which
           | exist in DB2 on every other platform, and most other RDBMS
           | have something equivalent, such as the ANSI standard
           | INFORMATION_SCHEMA)
           | 
           | Or are you confusing IBM i with MVS (in which the OS contains
           | databases called "catalogs", in which you lookup a file name
           | or file name prefix to find out which disk volume a file is
           | stored on?)
        
         | marginalia_nu wrote:
         | A file system _is_ a database though. Not a relational one,
         | granted, but it's still basically nosql before it was cool.
        
           | layer8 wrote:
           | Today's filesystems are more like the NoSQL equivalent of
           | hierarchical databases, which were the very first database
           | design, created in the 1960s, preceding RDBMSs.
        
         | jasonwatkinspdx wrote:
         | So I would strongly disagree with your notion that an OS cannot
         | be a distributed system. Several OS's, such as Plan 9, were
         | explicitly designed as distributed systems from the beginning.
         | For Plan 9, running it on a solo computer is more the edge case
         | than the common case they designed for, which was a small team
         | with workstations sharing a central server.
         | 
         | All of the computers are "running" the OS. The OS is more than
         | one service on more than one machine.
        
       | amelius wrote:
       | Sounds like something you could implement on top of an existing
       | OS like Linux (even in userspace) and get mostly the same
       | advantages.
        
         | laweijfmvo wrote:
         | This was my first thought, and probably should be the first
         | pass IMO. If they have to implement an entire novel OS to
         | support this it will never be anything beyond a lab experiment.
        
       | didgetmaster wrote:
       | I have been working for years on a system designed to do much of
       | what is described in their 'Level 2' layer. It is a single system
       | that can effectively manage unstructured (i.e. file) data, semi-
       | structured data (NoSql), and highly structured data (RDBMS).
       | 
       | I don't have the resources available to me like this team, so
       | there are still a lot of features on my TODO list that would
       | enable some of the things they are looking for, but it can do
       | many things now. It can manage 200M+ files and find subsets of
       | them at sub-second speed. It can build relational tables with
       | hundreds of millions of rows and thousands of columns and perform
       | queries faster than many conventional DBs.
       | 
       | www.Didgets.com is where you can download the beta software. Demo
       | videos at
       | https://www.youtube.com/channel/UC-L1oTcH0ocMXShifCt4JQQ
        
       | AlbertCory wrote:
       | "A database-oriented operating system" -- where have I heard this
       | before?
       | 
       | Oh, right: https://en.wikipedia.org/wiki/Pick_operating_system
       | 
       | (No, I never used this.)
        
       | melony wrote:
       | Windows Registry doesn't count?
        
       | fernly wrote:
       | I expect they will acknowledge the Pick OS? "a demand-paged,
       | multiuser, virtual memory, time-sharing computer operating system
       | based around a MultiValue database."
       | 
       | [0] https://en.wikipedia.org/wiki/Pick_operating_system
        
         | skissane wrote:
         | Pick's "multi-value database" is essentially just a flat-file
         | database. I don't see how - if we put aside the marketing - it
         | was really any more "database-oriented" than your average
         | mainframe/minicomputer operating system with a record-oriented
         | filesystem, such as MVS (especially VSAM), VM/CMS, OpenVMS (in
         | particular its Files-11 RMS component), etc.
        
       | pjmlp wrote:
       | Well, Oracle APEX applications written in PL/SQL are a way to
       | approach this.
        
       | PaulDavisThe1st wrote:
       | OK, so I've got 50GB of audio samples. Does anyone actually
       | believe that these 50GB (400 giga-bits) are more efficiently
       | stored in a database designed for "stuff" than they are in a
       | database designed for files ("a filesystem") ?
        
         | wswope wrote:
         | Can be; it's all contextual.
         | 
         | https://www.sqlite.org/fasterthanfs.html
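The SQLite page linked above argues that reading and writing small blobs through an SQLite database can beat storing each blob as a separate file. As a rough, hypothetical illustration of that "blobs in a table" pattern (not a benchmark, and not code from the linked page), here is a Python sketch using only the standard-library sqlite3 module; the table name `blobs` and the helpers `open_store`, `put_blob` and `get_blob` are made up for this example.

```python
import sqlite3


def open_store(path=":memory:"):
    # One table keyed by name; each value is stored as a BLOB.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        " name TEXT PRIMARY KEY,"
        " data BLOB NOT NULL)"
    )
    return conn


def put_blob(conn, name, data):
    # "with conn" commits on success and rolls back on an exception.
    # INSERT OR REPLACE gives overwrite semantics, like rewriting a file.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO blobs (name, data) VALUES (?, ?)",
            (name, data),
        )


def get_blob(conn, name):
    # Returns the stored bytes, or None if no blob has that name.
    row = conn.execute(
        "SELECT data FROM blobs WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]
```

Whether this actually outperforms a filesystem depends on blob size, access pattern and configuration, which is the "it's all contextual" point above.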
        
           | PaulDavisThe1st wrote:
           | Fair enough. The problem is that at some point, the data has
           | to hit some sort of storage hardware. Presumably between the
           | DB and the hardware, there's some layer that somewhat
           | abstracts the storage hardware. Isn't that ... a filesystem?
        
             | tremon wrote:
             | That's a storage volume (partition, raid volume, zfs pool,
             | etc), not a filesystem. A filesystem is the abstraction
             | layer on top of the storage volume translating the user-
             | assigned data identifiers (aka file names) to byte ranges.
             | 
             | Talking specifically about databases: they often implement
             | their own data organization. Oracle and Sybase famously
             | performed better when working on raw partitions than with
             | files.
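The comment above describes a filesystem as the layer that maps user-assigned names to byte ranges on a storage volume. A deliberately tiny Python sketch of just that mapping, with a bytearray standing in for the volume; `ToyVolume` and `ToyFilesystem` are hypothetical names, and the sketch omits everything a real filesystem adds (directories, free-space reuse, permissions, crash safety).

```python
class ToyVolume:
    """A flat byte array standing in for a storage volume (partition, RAID volume, ...)."""

    def __init__(self, size):
        self.data = bytearray(size)


class ToyFilesystem:
    """Maps user-assigned names to (offset, length) byte ranges on a volume."""

    def __init__(self, volume):
        self.volume = volume
        self.table = {}      # name -> (offset, length)
        self.next_free = 0   # bump allocator: space is never reclaimed

    def write(self, name, payload):
        end = self.next_free + len(payload)
        if end > len(self.volume.data):
            raise OSError("volume full")
        # Copy the bytes into the volume, then record where they live.
        self.volume.data[self.next_free:end] = payload
        self.table[name] = (self.next_free, len(payload))
        self.next_free = end

    def read(self, name):
        # Look up the byte range for the name and slice it out of the volume.
        offset, length = self.table[name]
        return bytes(self.volume.data[offset:offset + length])
```

Rewriting an existing name simply allocates a new range and leaks the old one; reclaiming that space is one of the many jobs a real filesystem takes on.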
        
             | samus wrote:
             | A filesystem offers a hierarchical interface. Meanwhile, a
             | DBMS needs nothing more from the OS than access to blocks
             | and preferably information about HDD layout. That's a level
             | below.
        
             | hcta wrote:
             | Can you clarify what point you're making? If you're trying
             | to argue that adding an extra layer can only reduce
             | performance, any cache is an obvious exception to that. Are
             | you saying it's extraneous to use a database as a storage
             | abstraction because they have to sit on top of filesystems,
             | and filesystems already exist?
        
               | PaulDavisThe1st wrote:
               | The point I'm making (and I'm not certain that it is
               | true) is that ultimately if you want to store raw data, a
               | filesystem _seems_ more likely to be what you want to
               | use. Put differently, BLOBs in the DB end up
               | (necessarily) as blobs on the disk, and managing blobs on
               | a disk is precisely what filesystems are intended for.
               | 
               | But yes, on top of that, there's the question that in the
               | end even the DB will need something very, very much like
               | a filesystem between it and the storage hardware ...
               | which opens up the question whether this should remain
               | hidden to every other application, or whether it makes
               | sense that for certain kinds of applications, they too
               | would use it (i.e. just like today)
        
               | layer8 wrote:
               | > managing blobs on a disk is precisely what filesystems
               | are intended for.
               | 
               | A filesystem is doing much more, e.g. providing naming
               | and management (directories, symlinks, access control,
               | extended attributes, cache management, ...) for files for
               | manipulation by humans and applications, whereas RDBMSs
               | only need fixed-size blocks of storage.
               | 
               | Some databases actually support using raw disks without a
               | normal filesystem, which can have advantages by removing
               | the extra layer of abstraction, e.g.:
               | 
               | https://dev.mysql.com/doc/refman/8.0/en/innodb-system-
               | tables...
               | 
               | https://docs.oracle.com/en/database/oracle/oracle-
               | database/2...
               | 
               | https://www.ibm.com/docs/de/db2/9.7?topic=creation-
               | attaching...
        
             | wswope wrote:
             | If I'm reading you right, you're correct that the database
             | is still technically passing its data to the filesystem at
             | the end of the day.
             | 
             | However, databases generally subsume most responsibility
             | for the on-disk representation of data as well as I/O
             | patterns. What's really being compared here is the
             | performance of the database as a storage engine vs. the
             | file system itself as a storage engine - not the raw I/O
             | potential of the filesystem itself.
             | 
             | https://en.wikipedia.org/wiki/Database_engine
        
             | jandrewrogers wrote:
             | Many databases implement their own internal filesystems
             | that are heavily optimized for database-y use cases and
             | access patterns while missing standard POSIX and other
             | features a "real" filesystem would have. When this
             | filesystem is installed on top of the OS filesystem, there
             | is a cost due to duplication of effort, design impedance
             | mismatch, limitations of the OS filesystem, etc. This is
             | partially mitigated by turning the files in the OS
             | filesystem into a giant block store to minimize interaction
             | with the OS filesystem.
             | 
             | Some database filesystems can be installed directly on raw
             | block devices if you desire with no OS filesystem in the
             | middle. This usually offers significant performance and
             | efficiency gains since everything above the raw hardware is
             | purpose-built for the requirements of optimal database
             | performance.
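The "giant block store" approach described above treats a single OS file as an array of fixed-size pages that the database engine addresses by page number, leaving the OS filesystem with very little to do. A minimal Python sketch, assuming a POSIX system (`os.pread`/`os.pwrite` are Unix-only) and an arbitrary 4 KiB page size; `PageFile` is a hypothetical name, and real engines add buffer pools, checksums and write-ahead logging on top of something like this.

```python
import os

PAGE_SIZE = 4096  # assumed page size; real engines choose their own


class PageFile:
    """Treats one OS file as an array of fixed-size pages addressed by page number."""

    def __init__(self, path):
        # Open or create the file; the engine, not the filesystem, decides
        # what lives where inside it.
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)

    def write_page(self, page_no, payload):
        if len(payload) > PAGE_SIZE:
            raise ValueError("payload larger than a page")
        # Pad to a full page so page N always starts at N * PAGE_SIZE.
        buf = payload.ljust(PAGE_SIZE, b"\x00")
        os.pwrite(self.fd, buf, page_no * PAGE_SIZE)

    def read_page(self, page_no):
        # Positioned read: no seek, no filesystem-level naming involved.
        return os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)

    def sync(self):
        # Force written pages to stable storage.
        os.fsync(self.fd)

    def close(self):
        os.close(self.fd)
```

Installing the same page layout directly on a raw block device, as the comment describes, would replace the file path with a device path and drop the OS filesystem from the I/O path entirely.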
        
       | drewcoo wrote:
       | Neat idea, but so many questions . . .
       | 
       | how are process boundaries on data preserved?
       | 
       | If data can be shared between processes via the db, how do they
       | enforce clean, clear, testable interfaces (like monoliths lack)?
       | 
       | And given all that, how do they manage data schema changes that
       | we'd handle now with API versioning?
        
         | marginalia_nu wrote:
         | > clean, clear, testable interfaces (like monoliths lack)
         | 
         | What prevents a monolith from having these things?
        
       | airocker wrote:
       | """ At Level 1, a kernel provides low-level OS services such as
       | device drivers and memory management. At Level 2, a distributed
       | DBMS runs on those services. At Level 3, we build high-level OS
       | services such as a distributed file system, cluster scheduler,
       | and distributed inter-process communication (IPC) subsystem on
       | top of the DBMS. At Level 4, users write applications. """ How
       | would it be different from installing today's kernel and then
       | installing postgres, Gluster, docker and etcd on top of it?
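To make the quoted Level 3 idea concrete, here is a hypothetical sketch of a cluster scheduler whose entire state lives in database tables, so that assigning a task is a single transaction. SQLite (via Python's sqlite3) stands in for the distributed DBMS; the schema, the `make_cluster`/`submit` helpers and the "most free slots" policy are assumptions for illustration, not DBOS's actual design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE workers (id INTEGER PRIMARY KEY, free_slots INTEGER NOT NULL);
CREATE TABLE tasks (id INTEGER PRIMARY KEY,
                    worker_id INTEGER REFERENCES workers(id));
"""


def make_cluster(slots_per_worker):
    # Scheduler state is just two tables: workers and tasks.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO workers (free_slots) VALUES (?)",
        [(s,) for s in slots_per_worker],
    )
    conn.commit()
    return conn


def submit(conn):
    # Record a task and place it on the worker with the most free slots.
    # Everything inside "with conn" commits together or rolls back together.
    with conn:
        task_id = conn.execute(
            "INSERT INTO tasks (worker_id) VALUES (NULL)"
        ).lastrowid
        row = conn.execute(
            "SELECT id FROM workers WHERE free_slots > 0 "
            "ORDER BY free_slots DESC, id LIMIT 1"
        ).fetchone()
        if row is None:
            return task_id, None  # no capacity: task stays queued
        worker_id = row[0]
        conn.execute(
            "UPDATE workers SET free_slots = free_slots - 1 WHERE id = ?",
            (worker_id,),
        )
        conn.execute(
            "UPDATE tasks SET worker_id = ? WHERE id = ?", (worker_id, task_id)
        )
        return task_id, worker_id
```

Because the slot decrement and the task assignment happen in one transaction, a failure partway through cannot leave the workers table and the tasks table disagreeing.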
        
         | airocker wrote:
         | would you force level 4 applications to only access level 3
         | services and model everything that OS does (device management,
         | process management, memory management etc) as a layer on top of
         | it? So essentially all devices are a table? I think just
         | modeling memory access as a table would be a big win. Not sure
         | exactly how atomicity and consistency would help applications
         | with every memory access, though. Would love to know.
        
         | Jtsummers wrote:
         | Probably shortens the path between the applications (built on
         | their distributed database manager) and the hardware/network
         | compared to running applications on top of Postgres or other
         | DBs today (which still end up calling out to the kernel and
         | other subsystems). If _all_ your applications (as seems to be
         | their intent) don't need the Linux kernel, but only the DB,
         | then push the DB service into the operating system.
         | 
         | Building on top of too many layers increases the overall
         | complexity and reduces overall performance. Periodically
         | chopping out part of the system and creating what you actually
         | need, fresh, is sometimes necessary. Even if it doesn't produce
         | a new product or final system on its own, it gives you a direction
         | for moving other systems if the theory pans out in practice.
        
       | balentio wrote:
       | The cloud is already a security nightmare. I don't want to put
       | more stuff in it.
        
       | rwmj wrote:
       | Is there any work or implementation? Edit: yes, this would be a
       | better link: https://dbos-project.github.io/
        
       | wyan wrote:
       | AS/400 follows a similar idea if I recall correctly, on top of
       | DB2
        
         | skissane wrote:
         | Dubious. IBM's marketing wants to convince you it does, but as
         | far as I can work out, the integration of DB2 into the system
         | is nowhere near as deep as the marketing makes it sound.
        
       ___________________________________________________________________
       (page generated 2022-09-03 23:00 UTC)