[HN Gopher] DBOS: A Database-Oriented Operating System
___________________________________________________________________
DBOS: A Database-Oriented Operating System
Author : KraftyOne
Score : 54 points
Date : 2022-09-03 19:08 UTC (3 hours ago)
(HTM) web link (dbos-project.github.io)
| crazygringo wrote:
| I understand conceptually how you could architect a
| desktop/server OS based on a local database instead of a local
| filesystem, and it's deeply intriguing to me in terms of how
| everything could share a common data language that is far more
| flexible than files but far more structured than text. Presumably
| things like terminal output would be formatted as query results
| instead of text. The database wouldn't reside in files on disk,
| it would reside directly in blocks on disk and in-memory as
| required.
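Here is a minimal sketch of that idea. It assumes Python's built-in `sqlite3` module stands in for an OS-level database. Stored objects become rows, and a directory listing becomes a query whose result is structured data rather than text. The `objects` table, its columns, and the `ls` helper are hypothetical illustrations, not anything from the DBOS project.

```python
import sqlite3

# Hypothetical schema: each stored object is a row instead of a file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (name TEXT PRIMARY KEY, owner TEXT, size INTEGER, data BLOB)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [
        ("notes.txt", "alice", 5, b"hello"),
        ("song.wav", "bob", 4, b"\x00\x01\x02\x03"),
        ("todo.txt", "alice", 3, b"abc"),
    ],
)

def ls(owner):
    """An 'ls' that returns structured rows (name, size) instead of formatted text."""
    return conn.execute(
        "SELECT name, size FROM objects WHERE owner = ? ORDER BY name", (owner,)
    ).fetchall()

print(ls("alice"))  # [('notes.txt', 5), ('todo.txt', 3)]
```

Because the result is a list of tuples, a downstream tool could filter or join it without parsing text.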
|
| But this seems to be proposing a _distributed_ database that runs
| _across_ computers, which confuses me. Anything that runs
| _across_ computers seems to me to be application-level, not OS-
| level, by definition. What does it even mean to be running a
| distributed database on "microkernel services" instead of on
| fully-fledged OS's? And then... where is the OS-level CPU coming
| from? If the database is distributed among computers then fine,
| but which computer is "running" the OS?
|
| Both the nomenclature and distributed aspect are really throwing
| me off here.
| tremon wrote:
| _Anything that runs across computers seems to me to be
| application-level, not OS-level, by definition_
|
| Novell Netware was also marketed as a Network Operating System,
| in the sense that none of its services were confined to the
| local machine: its entire purpose was to combine a network of
| computers into a single management unit.
| pjmlp wrote:
| Have a look at IBM i (née AS/400); it uses a database instead of
| files, aptly named catalogs.
| skissane wrote:
| Disagree, IBM i uses files: how do you create a database
| table in it? CRTPF command ("Create Physical File"). You
| create a file.
|
| And I don't know what you mean by "catalogs", in the context
| of IBM i. Are you talking about DB2 catalog views? (Which
| exist in DB2 on every other platform, and most other RDBMS
| have something equivalent, such as the ANSI standard
| INFORMATION_SCHEMA)
|
| Or are you confusing IBM i with MVS (in which the OS contains
| databases called "catalogs", in which you lookup a file name
| or file name prefix to find out which disk volume a file is
| stored on?)
| marginalia_nu wrote:
| A file system _is_ a database though. Not a relational one,
| granted, but it's still basically NoSQL before it was cool.
| layer8 wrote:
| Today's filesystems are more like the NoSQL equivalent of
| hierarchical databases, which were the very first database
| design, created in the 1960s, preceding RDBMSs.
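This toy sketch makes the "hierarchical database" comparison concrete. All names in it are hypothetical. Each directory is a node holding its children, and a path is resolved by following parent-to-child links one level at a time. That is the navigational access style of hierarchical databases, as opposed to the declarative queries of an RDBMS.

```python
# Hypothetical in-memory model: each directory is a dict mapping child names
# to either another dict (a subdirectory) or bytes (a file's contents).
root = {
    "home": {
        "alice": {"notes.txt": b"hello"},
    },
    "etc": {"hosts": b"127.0.0.1 localhost\n"},
}

def lookup(tree, path):
    """Resolve an absolute path by walking parent-to-child links, one level at a time.
    (The root path "/" itself is not handled in this sketch.)"""
    node = tree
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(path)
        node = node[part]
    return node

print(lookup(root, "/home/alice/notes.txt"))  # b'hello'
```

Seen this way, the full path acts as the lookup key, which is why a filesystem can also be described as a key-value store.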
| jasonwatkinspdx wrote:
| So I would strongly disagree with your notion that an OS cannot
| be a distributed system. Several OS's, such as Plan 9, were
| explicitly designed as distributed systems from the beginning.
| For Plan 9, running it on a solo computer is more the edge case
| than the common case they designed for, which was a small team
| with workstations sharing a central server.
|
| All of the computers are "running" the OS. The OS is more than
| one service on more than one machine.
| amelius wrote:
| Sounds like something you could implement on top of an existing
| OS like Linux (even in userspace) and get mostly the same
| advantages.
| laweijfmvo wrote:
| This was my first thought, and probably should be the first
| pass IMO. If they have to implement an entire novel OS to
| support this it will never be anything beyond a lab experiment.
| didgetmaster wrote:
| I have been working for years on a system designed to do much of
| what is described in their 'Level 2' layer. It is a single system
| that can effectively manage unstructured (i.e. file) data, semi-
| structured data (NoSql), and highly structured data (RDBMS).
|
| I don't have the resources available to me like this team, so
| there are still a lot of features on my TODO list that would
| enable some of the things they are looking for, but it can do
| many things now. It can manage 200M+ files and find subsets of
| them in under a second. It can build relational tables with
| hundreds of millions of rows and thousands of columns and perform
| queries faster than many conventional DBs.
|
| www.Didgets.com is where you can download the beta software. Demo
| videos at
| https://www.youtube.com/channel/UC-L1oTcH0ocMXShifCt4JQQ
| AlbertCory wrote:
| "A database-oriented operating system" -- where have I heard this
| before?
|
| Oh, right: https://en.wikipedia.org/wiki/Pick_operating_system
|
| (No, I never used this.)
| melony wrote:
| Windows Registry doesn't count?
| fernly wrote:
| I expect they will acknowledge the Pick OS? "a demand-paged,
| multiuser, virtual memory, time-sharing computer operating system
| based around a MultiValue database."
|
| [0] https://en.wikipedia.org/wiki/Pick_operating_system
| skissane wrote:
| Pick's "multi-value database" is essentially just a flat-file
| database. I don't see how - if we put aside the marketing - it
| was really any more "database-oriented" than your average
| mainframe/minicomputer operating system with a record-oriented
| filesystem, such as MVS (especially VSAM), VM/CMS, OpenVMS (in
| particular its Files-11 RMS component), etc.
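For readers unfamiliar with Pick, here is a rough sketch of how a MultiValue record (a "dynamic array") can be taken apart. It assumes the conventional delimiters: attribute mark CHAR(254), value mark CHAR(253), and subvalue mark CHAR(252). The record contents are made up. The sketch illustrates the point above that a record is essentially a delimited string.

```python
# Conventional MultiValue delimiters (decimal character codes 254, 253, 252).
AM, VM, SVM = chr(254), chr(253), chr(252)

def parse_record(raw):
    """Split a MultiValue record into attributes -> values -> subvalues."""
    return [
        [value.split(SVM) for value in attribute.split(VM)]
        for attribute in raw.split(AM)
    ]

# Hypothetical customer record: a name, then two phone numbers in one attribute.
raw = AM.join(["ACME Corp", VM.join(["555-0100", "555-0199"])])
record = parse_record(raw)
print(record)  # [[['ACME Corp']], [['555-0100'], ['555-0199']]]
```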
| pjmlp wrote:
| Well, Oracle APEX applications written in PL/SQL are a way to
| approach this.
| PaulDavisThe1st wrote:
| OK, so I've got 50GB of audio samples. Does anyone actually
| believe that these 50GB (400 gigabits) are more efficiently
| stored in a database designed for "stuff" than they are in a
| database designed for files ("a filesystem") ?
| wswope wrote:
| Can be; it's all contextual.
|
| https://www.sqlite.org/fasterthanfs.html
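As I understand the linked SQLite page, its measurements concern many small blobs. There, reading from one database file avoids a separate open()/close() per blob. The page does not claim the same for a few very large files such as 50GB of audio.

Below is a minimal sketch of the mechanism using Python's `sqlite3`. The `samples` table and the `put`/`get` helpers are hypothetical, and this is not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real comparison would use an on-disk database file
conn.execute("CREATE TABLE samples (name TEXT PRIMARY KEY, data BLOB)")

def put(name, data):
    # 'with conn' commits the write on success and rolls it back on error
    with conn:
        conn.execute("INSERT OR REPLACE INTO samples VALUES (?, ?)", (name, data))

def get(name):
    row = conn.execute("SELECT data FROM samples WHERE name = ?", (name,)).fetchone()
    return None if row is None else row[0]

put("kick.wav", bytes(range(16)))
print(len(get("kick.wav")))  # 16
```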
| PaulDavisThe1st wrote:
| Fair enough. The problem is that at some point, the data has
| to hit some sort of storage hardware. Presumably between the
| DB and the hardware, there's some layer that somewhat
| abstracts the storage hardware. Isn't that ... a filesystem?
| tremon wrote:
| That's a storage volume (partition, RAID volume, ZFS pool,
| etc), not a filesystem. A filesystem is the abstraction
| layer on top of the storage volume translating the user-
| assigned data identifiers (aka file names) to byte ranges.
|
| Talking specifically about databases: they often implement
| their own data organization. Oracle and Sybase famously
| performed better when working on raw partitions than with
| files.
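This toy sketch illustrates the distinction drawn above, and every name in it is hypothetical. A bytearray plays the storage volume. The "filesystem" is nothing more than a table translating file names to (offset, length) byte ranges on that volume. Real filesystems add block allocation, free-space reuse, directories, and crash consistency, none of which is modeled here.

```python
class ToyFS:
    """Hypothetical minimal filesystem: a name table mapping file names to
    (offset, length) byte ranges inside one flat 'volume' (a bytearray)."""

    def __init__(self, size):
        self.volume = bytearray(size)  # stands in for a partition or RAID volume
        self.names = {}                # file name -> (offset, length)
        self.next_free = 0             # bump allocator: rewriting a name leaks its old bytes

    def write(self, name, data):
        if self.next_free + len(data) > len(self.volume):
            raise OSError("volume full")
        offset = self.next_free
        self.volume[offset:offset + len(data)] = data
        self.names[name] = (offset, len(data))
        self.next_free += len(data)

    def read(self, name):
        offset, length = self.names[name]
        return bytes(self.volume[offset:offset + length])

fs = ToyFS(64)
fs.write("a.txt", b"hello")
fs.write("b.txt", b"world!")
print(fs.names)          # {'a.txt': (0, 5), 'b.txt': (5, 6)}
print(fs.read("b.txt"))  # b'world!'
```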
| samus wrote:
| A filesystem offers a hierarchical interface. Meanwhile, a
| DBMS needs nothing more from the OS than access to blocks
| and preferably information about HDD layout. That's a level
| below.
| hcta wrote:
| Can you clarify what point you're making? If you're trying
| to argue that adding an extra layer can only reduce
| performance, any cache is an obvious exception to that. Are
| you saying it's extraneous to use a database as a storage
| abstraction because they have to sit on top of filesystems,
| and filesystems already exist?
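Here is a small sketch of the cache point. It assumes a least-recently-used policy, one common choice picked here only for illustration. The extra layer answers repeated reads itself and falls through to the slower lower layer only on a miss. `LRUCache` and the lambda standing in for the backing store are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """An extra layer that can make reads faster: an LRU cache in front of
    a slower backing store (here just a function)."""

    def __init__(self, backing_read, capacity):
        self.backing_read = backing_read
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def read(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.backing_read(key)        # slow path: go to the lower layer
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value

cache = LRUCache(lambda block: f"block-{block}", capacity=2)
for block in [1, 2, 1, 1, 3, 1]:
    cache.read(block)
print(cache.misses)  # 3
```

Six reads reach the backing store only three times. That saving is the sense in which adding a layer can improve performance.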
| PaulDavisThe1st wrote:
| The point I'm making (and I'm not certain that it is
| true) is that ultimately if you want to store raw data, a
| filesystem _seems_ more likely to be what you want to
| use. Put differently, BLOBs in the DB end up
| (necessarily) as blobs on the disk, and managing blobs on
| a disk is precisely what filesystems are intended for.
|
| But yes, on top of that, there's the question that in the
| end even the DB will need something very, very much like
| a filesystem between it and the storage hardware ...
| which opens up the question whether this should remain
| hidden to every other application, or whether it makes
| sense that for certain kinds of applications, they too
| would use it (i.e. just like today)
| layer8 wrote:
| > managing blobs on a disk is precisely what filesystems
| are intended for.
|
| A filesystem is doing much more, e.g. providing naming
| and management (directories, symlinks, access control,
| extended attributes, cache management, ...) for files for
| manipulation by humans and applications, whereas an RDBMS
| only needs fixed-size blocks of storage.
|
| Some databases actually support using raw disks without a
| normal filesystem, which can have advantages by removing
| the extra layer of abstraction, e.g.:
|
| https://dev.mysql.com/doc/refman/8.0/en/innodb-system-
| tables...
|
| https://docs.oracle.com/en/database/oracle/oracle-
| database/2...
|
| https://www.ibm.com/docs/de/db2/9.7?topic=creation-
| attaching...
| wswope wrote:
| If I'm reading you right, you're correct that the database
| is still technically passing its data to the filesystem at
| the end of the day.
|
| However, databases generally subsume most responsibility
| for the on-disk representation of data as well as I/O
| patterns. What's really being compared here is the
| performance of the database as a storage engine vs. the
| file system itself as a storage engine - not the raw I/O
| potential of the filesystem itself.
|
| https://en.wikipedia.org/wiki/Database_engine
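Here is a minimal append-only log engine in the style of Bitcask. It illustrates a database "subsuming" the on-disk representation and the I/O pattern. Bitcask is named plainly here as the technique; it is not something from the linked article.

The filesystem only sees one file being appended to. The engine owns the record format and keeps an in-memory index from each key to a byte offset. The class and file names are hypothetical. The sketch omits index rebuild on restart, compaction, and checksums.

```python
import os
import tempfile

class LogStore:
    """Minimal append-only storage engine: the engine, not the filesystem,
    decides the on-disk layout (one log file) and the I/O pattern
    (sequential appends, one positioned read per lookup)."""

    def __init__(self, path):
        # append mode: every write goes to the end of the file; reads can seek anywhere
        self.f = open(path, "a+b")
        self.index = {}  # key -> (offset, length) of the newest value in the log

    def put(self, key, value):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        key_bytes = key.encode()
        # toy record format "key\tvalue\n": no escaping, so keys must not contain tabs
        self.f.write(key_bytes + b"\t" + value + b"\n")
        self.f.flush()
        self.index[key] = (offset + len(key_bytes) + 1, len(value))

    def get(self, key):
        offset, length = self.index[key]
        self.f.seek(offset)
        return self.f.read(length)

path = os.path.join(tempfile.mkdtemp(), "data.log")
store = LogStore(path)
store.put("a", b"1")
store.put("b", b"22")
store.put("a", b"333")  # an overwrite appends a new record; the index points at the newest
print(store.get("a"), store.get("b"))  # b'333' b'22'
```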
| jandrewrogers wrote:
| Many databases implement their own filesystems internally,
| heavily optimized for database-y use cases and
| access patterns while missing standard POSIX and other
| features a "real" filesystem would have. When this
| filesystem is installed on top of the OS filesystem, there
| is a cost due to duplication of effort, design impedance
| mismatch, limitations of the OS filesystem, etc. This is
| partially mitigated by turning the files in the OS
| filesystem into a giant block store to minimize interaction
| with the OS filesystem.
|
| Some database filesystems can be installed directly on raw
| block devices if you desire with no OS filesystem in the
| middle. This usually offers significant performance and
| efficiency gains since everything above the raw hardware is
| purpose-built for the requirements of optimal database
| performance.
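Here is a sketch of "turning the files in the OS filesystem into a giant block store". It assumes a POSIX system, since `os.pread`/`os.pwrite` are Unix-only in Python, and an arbitrary 4 KiB block size. The file is sized once. After that it is only read or written as numbered fixed-size blocks, so the engine never grows the file or manages many small files. `BlockFile` is a hypothetical name.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed page size; real engines choose their own

class BlockFile:
    """One OS file used as a fixed-size array of numbered blocks."""

    def __init__(self, path, num_blocks):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        # set the size up front (the file may be sparse; space is not necessarily reserved)
        os.ftruncate(self.fd, num_blocks * BLOCK_SIZE)
        self.num_blocks = num_blocks

    def write_block(self, n, data):
        if not 0 <= n < self.num_blocks or len(data) > BLOCK_SIZE:
            raise ValueError("bad block number or oversized data")
        # pad to a full block and write at the block's fixed offset
        os.pwrite(self.fd, data.ljust(BLOCK_SIZE, b"\0"), n * BLOCK_SIZE)

    def read_block(self, n):
        if not 0 <= n < self.num_blocks:
            raise ValueError("bad block number")
        return os.pread(self.fd, BLOCK_SIZE, n * BLOCK_SIZE)

    def close(self):
        os.close(self.fd)

path = os.path.join(tempfile.mkdtemp(), "blocks.dat")
bf = BlockFile(path, num_blocks=8)
bf.write_block(3, b"page three")
print(bf.read_block(3)[:10])  # b'page three'
print(os.path.getsize(path))  # 32768
bf.close()
```

Installing directly on a raw block device, as described above, would replace the OS file with the device itself. This sketch does not attempt that.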
| drewcoo wrote:
| Neat idea, but so many questions . . .
|
| how are process boundaries on data preserved?
|
| If data can be shared between processes via the db, how do they
| enforce clean, clear, testable interfaces (like monoliths lack)?
|
| And given all that, how do they manage data schema changes that
| we'd handle now with API versioning?
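The linked page doesn't say how DBOS handles schema evolution. What follows is only one conventional approach, sketched with SQLite. A schema version is stored in the database, and numbered migrations are applied, each in its own transaction. `PRAGMA user_version` is a real SQLite feature. The `users` table and the `MIGRATIONS` list are hypothetical.

```python
import sqlite3

# Hypothetical migrations: entry i upgrades the schema to version i + 1.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",  # -> version 1
    "ALTER TABLE users ADD COLUMN email TEXT",                 # -> version 2
]

def migrate(conn):
    """Apply any migrations newer than the version stored in the database."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for target, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute("BEGIN")
        try:
            conn.execute(statement)
            conn.execute(f"PRAGMA user_version = {target}")  # target is always an int
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]

conn = sqlite3.connect(":memory:", isolation_level=None)  # transactions are explicit above
print(migrate(conn))  # 2
print(migrate(conn))  # 2 (already up to date, nothing re-applied)
```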
| marginalia_nu wrote:
| > clean, clear, testable interfaces (like monoliths lack)
|
| What prevents a monolith from having these things?
| airocker wrote:
| > At Level 1, a kernel provides low-level OS services such as
| > device drivers and memory management. At Level 2, a distributed
| > DBMS runs on those services. At Level 3, we build high-level OS
| > services such as a distributed file system, cluster scheduler,
| > and distributed inter-process communication (IPC) subsystem on
| > top of the DBMS. At Level 4, users write applications.
|
| How would it be different from installing today's kernel and then
| installing Postgres, Gluster, Docker and etcd on top of it?
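Here is one way to picture a Level 3 service "on top of the DBMS". It is a hedged sketch of a cluster scheduler whose entire state lives in tables, with each scheduling decision made in a single transaction. SQLite stands in for the distributed DBMS. The `workers`/`tasks` schema and the `schedule_one` function are hypothetical, not DBOS's actual design, and the "most free slots" policy is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # transactions are explicit below
conn.executescript("""
    CREATE TABLE workers (id INTEGER PRIMARY KEY, free_slots INTEGER NOT NULL);
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, worker_id INTEGER REFERENCES workers(id));
""")
conn.executemany("INSERT INTO workers VALUES (?, ?)", [(1, 1), (2, 2)])
conn.executemany("INSERT INTO tasks (id) VALUES (?)", [(10,), (11,), (12,), (13,)])

def schedule_one(conn):
    """Assign the oldest pending task to the worker with the most free slots, atomically."""
    conn.execute("BEGIN IMMEDIATE")  # take SQLite's write lock so concurrent schedulers can't race
    task = conn.execute(
        "SELECT id FROM tasks WHERE worker_id IS NULL ORDER BY id LIMIT 1"
    ).fetchone()
    worker = conn.execute(
        "SELECT id FROM workers WHERE free_slots > 0 ORDER BY free_slots DESC, id LIMIT 1"
    ).fetchone()
    if task is None or worker is None:
        conn.execute("ROLLBACK")
        return None
    conn.execute("UPDATE tasks SET worker_id = ? WHERE id = ?", (worker[0], task[0]))
    conn.execute("UPDATE workers SET free_slots = free_slots - 1 WHERE id = ?", (worker[0],))
    conn.execute("COMMIT")
    return task[0], worker[0]

while (assignment := schedule_one(conn)) is not None:
    print(assignment)
# (10, 2)
# (11, 1)
# (12, 2)
```

Task 13 stays unassigned because all slots are taken. A real distributed DBMS would supply its own concurrency control in place of SQLite's single-writer lock.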
| airocker wrote:
| Would you force level 4 applications to only access level 3
| services and model everything that the OS does (device management,
| process management, memory management, etc.) as a layer on top of
| it? So essentially all devices are tables? I think just
| modeling memory access as a table would be a big win. Not sure
| exactly how atomicity and consistency on every memory access
| would help applications, though. Would love to know.
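One place transactions plausibly help applications is in multi-step operations, rather than in individual memory accesses. Here is a hedged sketch using SQLite; the table and function names are hypothetical. Forwarding an IPC-style message deletes it from one queue and inserts it into another within a single transaction. A failure between the two steps therefore rolls back instead of losing the message.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # transactions are explicit below
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, queue TEXT, body TEXT)")
conn.execute("INSERT INTO messages (queue, body) VALUES ('inbox_a', 'job-1')")

def forward(conn, msg_id, dest, fail_midway=False):
    """Move a message between queues: the delete and the insert happen together or not at all."""
    conn.execute("BEGIN")
    try:
        body = conn.execute("SELECT body FROM messages WHERE id = ?", (msg_id,)).fetchone()[0]
        conn.execute("DELETE FROM messages WHERE id = ?", (msg_id,))
        if fail_midway:
            raise RuntimeError("simulated crash between the two steps")
        conn.execute("INSERT INTO messages (queue, body) VALUES (?, ?)", (dest, body))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

try:
    forward(conn, 1, "inbox_b", fail_midway=True)
except RuntimeError:
    pass
print(conn.execute("SELECT queue, body FROM messages").fetchall())  # [('inbox_a', 'job-1')]
forward(conn, 1, "inbox_b")
print(conn.execute("SELECT queue, body FROM messages").fetchall())  # [('inbox_b', 'job-1')]
```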
| Jtsummers wrote:
| Probably shortens the path between the applications (built on
| their distributed database manager) and the hardware/network
| compared to running applications on top of Postgres or other
| DBs today (which still end up calling out to the kernel and
| other subsystems). If _all_ your applications (as seems to be
| their intent) don't need the Linux kernel, but only the DB,
| then push the DB service into the operating system.
|
| Building on top of too many layers increases the overall
| complexity and reduces overall performance. Periodically
| chopping out part of the system and creating what you actually
| need, fresh, is sometimes necessary. Even if it doesn't produce
| a new product or final system on its own, it gives you a direction
| for moving other systems if the theory pans out in practice.
| balentio wrote:
| The cloud is already a security nightmare. I don't want to put
| more stuff in it.
| rwmj wrote:
| Is there any work or implementation? Edit: yes, this would be a
| better link: https://dbos-project.github.io/
| wyan wrote:
| AS/400 follows a similar idea, if I recall correctly, on top of
| DB2.
| skissane wrote:
| Dubious. IBM's marketing wants to convince you it does, but as
| far as I can work out, the integration of DB2 into the system
| is nowhere near as deep as the marketing makes it sound.
___________________________________________________________________
(page generated 2022-09-03 23:00 UTC)