[HN Gopher] AnyDB: An Architecture-Less DBMS for Any Workload
___________________________________________________________________
AnyDB: An Architecture-Less DBMS for Any Workload
Author : aratno
Score : 63 points
Date : 2021-01-11 19:16 UTC (1 day ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| jandrewrogers wrote:
| The paper is premised on a dichotomy that doesn't hold in real
| database systems. Specifically, the assertion that "static"
| shared-nothing architecture performance substantially degrades
| under skew, which provides a foil for shared-disk architectures
| that do not. While this is true for many shared-nothing
| architectures, particularly in open source, robustly skew-
| tolerant shared-nothing architectures have existed for at least a
| decade -- highly _dynamic_ shared-nothing architectures are valid
| (and quite good) designs.
|
| A skew-tolerant shared-nothing database has internals that look a
| bit like "AnyDB" to the extent execution of a type of workload is
| largely disconnected from the physical architecture -- the
| storage engines are often identical, for example. This allows you
| to schedule any mixture of fast-twitch operations concurrent with
| slow analytic queries. The original motivation for these types of
| architectures was complex mixed workloads. What is missing from
| the AnyDB shared-nothing architecture to make it skew-tolerant is
| a mechanism for continuously and smoothly shedding both data and
| load across cores/machines while maintaining consistency.
| Multiple options here, just need to pick one that makes sense and
| plays nicely with the concurrency control model.
|
| Similarly, the synchronization-free streaming concurrency control
| model described in the paper is a standard design idiom. Most
| "thread-per-core" style database architectures do something like
| this -- it is one of the major advantages of being thread-per-
| core.
|
| The database engineering industry has a long history of not
| publishing design advances and this is illustrative of that. If
| someone asked me to point to a paper that describes all this, I'd
| have a difficult time thinking of one. That isn't where this
| knowledge tends to be stored.
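
The comment above says a skew-tolerant shared-nothing design needs a mechanism for continuously shedding data and load across cores, without naming one. The following is a minimal sketch of one possible greedy policy, not the commenter's method or anything from the paper. The names `shed_once`, `owner`, and `load` are hypothetical, and keeping data consistent during a move is deliberately left out.

```python
from typing import Dict


def shed_once(owner: Dict[str, int], load: Dict[str, float], workers: int) -> bool:
    """Move one partition from the busiest worker to the idlest one.

    Returns True if a partition moved. Consistency during the move is out
    of scope; a real system must coordinate this with concurrency control.
    """
    # Sum the observed load of the partitions each worker currently owns.
    totals = [0.0] * workers
    for part, w in owner.items():
        totals[w] += load[part]

    hot = max(range(workers), key=totals.__getitem__)
    cold = min(range(workers), key=totals.__getitem__)
    gap = totals[hot] - totals[cold]

    # Only a partition lighter than the gap reduces the imbalance when moved.
    candidates = [p for p, w in owner.items() if w == hot and 0 < load[p] < gap]
    if not candidates:
        return False

    # Prefer the partition whose load is closest to half the gap.
    best = min(candidates, key=lambda p: abs(load[p] - gap / 2))
    owner[best] = cold
    return True


# Hypothetical skewed placement: worker 0 owns three hot partitions.
owner = {"a": 0, "b": 0, "c": 0, "d": 1}
load = {"a": 50.0, "b": 30.0, "c": 20.0, "d": 10.0}
first = shed_once(owner, load, workers=2)
second = shed_once(owner, load, workers=2)
```

Calling `shed_once` in a loop until it returns False gives a crude form of continuous rebalancing. Each move strictly shrinks the gap between the two workers involved, so the loop terminates.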
| the_duke wrote:
| The juicy part of the paper:
|
| > The main idea of an architecture-less database system is that
| it is composed of a single generic type of component where
| multiple instances of those components "act together" in an
| optimal manner on a per-query basis. To instrument generic
| components at run-time and coordinate the overall DBMS execution,
| each component consumes two streams: an event and a data stream.
| While the event stream encodes the operations to be executed, the
| data stream shuffles the state required by these events to the
| executing component. Through this instrumentation of generic
| components by event and data streams, a component can act as a
| query optimizer at one moment for one query but for the next as a
| worker executing a filter or join operator.
|
| Doesn't sound too different from current distributed DBMS, which
| already specialize query execution and distribute workload
| between cores/nodes in a similar manner, but taken to the next
| level to more easily enable things like different data
| persistence models or FPGA integration.
|
| Seems challenging to implement without losing significant
| performance to the abstraction layer.
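
The quoted passage describes a generic component that consumes an event stream (what to execute) and a data stream (the state those events need). The toy sketch below illustrates that idea under assumed names; `Event`, `Component`, and the three operations are hypothetical and are not taken from the paper's implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Dict, List


@dataclass
class Event:
    """One entry on the event stream: which operation to run on which state."""
    op: str          # hypothetical operation name, e.g. "filter"
    data_key: str    # key of the state delivered on the data stream
    arg: Any = None


class Component:
    """A single generic component; its role is chosen per event."""

    def __init__(self) -> None:
        self.events: Deque[Event] = deque()
        self.data: Dict[str, List[int]] = {}
        # The same component can act as a planner or as an operator.
        self.ops: Dict[str, Callable[[List[int], Any], Any]] = {
            "plan": lambda rows, _: "scan" if len(rows) < 4 else "index",
            "filter": lambda rows, limit: [r for r in rows if r > limit],
            "sum": lambda rows, _: sum(rows),
        }

    def push_data(self, key: str, rows: List[int]) -> None:
        """Data stream: deliver the state that later events will need."""
        self.data[key] = rows

    def push_event(self, event: Event) -> None:
        """Event stream: enqueue an operation to execute."""
        self.events.append(event)

    def run(self) -> List[Any]:
        """Drain the event stream in order, applying each op to its state."""
        results = []
        while self.events:
            ev = self.events.popleft()
            results.append(self.ops[ev.op](self.data[ev.data_key], ev.arg))
        return results


c = Component()
c.push_data("t", [1, 5, 9])
c.push_event(Event("plan", "t"))       # acts as a (trivial) optimizer
c.push_event(Event("filter", "t", 4))  # then as a filter operator
results = c.run()
```

Here the same instance first picks a plan and then runs a filter, depending only on what arrives on its event stream. The dictionary dispatch on every event is a small-scale version of the abstraction overhead the comment worries about.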
| fipar wrote:
| > Seems challenging to implement without losing significant
| performance to the abstraction layer.
|
| Agreed, it seems the tradeoff would only make this worthwhile
| when you need to optimize for throughput, and a lot of the
| workload is ad hoc. A lot of distributed DBMS just serve point
| queries or range scans where I don't think something like this
| would be useful.
|
| Troubleshooting production issues here seems challenging too
| (though that's always the case in distributed systems, so this
| would be less of a problem).
|
| Still interesting, but it would be more interesting to see the
| idea get some trial by fire.
| gregw2 wrote:
| So I get that they call it "architecture-less" because it doesn't
| choose between "shared nothing" and "shared disk" architectures
| and thus can pivot from OLTP to OLAP.
|
| But I have a different OLTP vs OLAP "architecture" question:
| is it row-based or columnar? Is it "architectureless" in that
| regard also? Are they going to store data persisted both ways so
| you get the worst of both worlds performance-wise or is there
| still an OLAP vs OLTP architecture choice there?
|
| I suspect there are still some architectural choices here!
| bobthebuilders wrote:
| From what it looks like there's no architecture.
___________________________________________________________________
(page generated 2021-01-12 23:01 UTC)