[HN Gopher] AnyDB: An Architecture-Less DBMS for Any Workload
       ___________________________________________________________________
        
       AnyDB: An Architecture-Less DBMS for Any Workload
        
       Author : aratno
       Score  : 63 points
       Date   : 2021-01-11 19:16 UTC (1 day ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | jandrewrogers wrote:
       | The paper is premised on a dichotomy that doesn't hold in real
       | database systems. Specifically, the assertion that "static"
       | shared-nothing architecture performance substantially degrades
       | under skew, which provides a foil for shared-disk architectures
       | that do not. While this is true for many shared-nothing
       | architectures, particularly in open source, robustly skew-
       | tolerant shared-nothing architectures have existed for at least a
       | decade -- highly _dynamic_ shared-nothing architectures are valid
       | (and quite good) designs.
       | 
       | A skew-tolerant shared-nothing database has internals that look a
       | bit like "AnyDB" to the extent execution of a type of workload is
       | largely disconnected from the physical architecture -- the
       | storage engines are often identical, for example. This allows you
       | to schedule any mixture of fast-twitch operations concurrent with
       | slow analytic queries. The original motivation for these types of
       | architectures was complex mixed workloads. What is missing from
       | the AnyDB shared-nothing architecture to make it skew-tolerant is
       | a mechanism for continuously and smoothly shedding both data and
       | load across cores/machines while maintaining consistency.
       | There are multiple options here; you just need to pick one that
       | makes sense and plays nicely with the concurrency control model.
       | 
       | Similarly, the synchronization-free streaming concurrency control
       | model described in the paper is a standard design idiom. Most
       | "thread-per-core" style database architectures do something like
       | this -- it is one of the major advantages of being thread-per-
       | core.
       | 
       | The database engineering industry has a long history of not
       | publishing design advances and this is illustrative of that. If
       | someone asked me to point to a paper that describes all this, I'd
       | have a difficult time thinking of one. That isn't where this
       | knowledge tends to be stored.
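
A minimal Python sketch of the thread-per-core idiom described in the comment above, under stated assumptions: each worker thread owns one shard outright, and operations reach it only through its inbox, so the shard's dictionary itself needs no lock. The names ShardWorker, route, and increment are hypothetical and chosen for illustration. Python's queue.Queue does lock internally, so this shows only the ownership pattern, not a truly synchronization-free message path.

```python
import queue
import threading
import zlib

STOP = object()  # sentinel that tells a worker to exit its loop


class ShardWorker(threading.Thread):
    """Owns one shard; only this thread ever reads or writes self.store."""

    def __init__(self, shard_id):
        super().__init__(daemon=True)
        self.shard_id = shard_id
        self.inbox = queue.Queue()  # the only way to reach this shard
        self.store = {}             # private state, accessed without locks

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is STOP:
                break
            key, delta, reply = msg
            # Operations are applied one at a time, in arrival order.
            self.store[key] = self.store.get(key, 0) + delta
            reply.put(self.store[key])


def route(workers, key):
    """Static hash routing: every key has exactly one owning worker."""
    return workers[zlib.crc32(key.encode()) % len(workers)]


def increment(workers, key, delta):
    """Send an increment to the key's owner and wait for the new value."""
    reply = queue.Queue(maxsize=1)
    route(workers, key).inbox.put((key, delta, reply))
    return reply.get()
```

Because every operation on a key is applied by a single thread in arrival order, concurrent increments on a hot key are never lost. That key's owner does become the bottleneck, though, which is the skew problem the comment says a dynamic design has to address by moving data and load between workers.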
        
       | the_duke wrote:
       | The juicy part of the paper:
       | 
       | > The main idea of an architecture-less database system is that
       | it is composed of a single generic type of component where
       | multiple instances of those components "act together" in an
       | optimal manner on a per-query basis. To instrument generic
       | components at run-time and coordinate the overall DBMS execution,
       | each component consumes two streams: an event and a data stream.
       | While the event stream encodes the operations to be executed, the
       | data stream shuffles the state required by these events to the
       | executing component. Through this instrumentation of generic
       | components by event and data streams, a component can act as a
       | query optimizer at one moment for one query but for the next as a
       | worker executing a filter or join operator.
       | 
       | Doesn't sound too different from current distributed DBMS, which
       | already specialize query execution and distribute workload
       | between cores/nodes in a similar manner, but taken to the next
       | level to more easily enable things like different data
       | persistence models or FPGA integration.
       | 
       | Seems challenging to implement without losing significant
       | performance to the abstraction layer.
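
The quoted passage can be illustrated with a small Python sketch, under stated assumptions: a single generic component type holds an event queue and a data store, and the operation named in each event decides which role the component plays for that step. The names Event, Component, OPERATORS, optimize, and filter_rows are hypothetical and are not taken from the paper; the "optimizer" here is a toy that only emits one filter event.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any


@dataclass
class Event:
    op: str          # name of the operation to execute
    query_id: int    # which query's data the operation applies to
    arg: Any = None  # operation-specific argument


def optimize(event, data):
    """Toy optimizer role: turn a query into the events that execute it."""
    return [Event("filter", event.query_id, event.arg)]


def filter_rows(event, data):
    """Worker role: consume the query's rows and keep those >= the threshold."""
    rows = data.pop(event.query_id, [])
    return [r for r in rows if r >= event.arg]


# The component has no fixed role; this table maps event names to behavior.
OPERATORS = {"optimize": optimize, "filter": filter_rows}


class Component:
    """Generic component fed by an event stream and a data stream."""

    def __init__(self):
        self.events = deque()  # event stream: what to do
        self.data = {}         # data stream: query_id -> rows to do it on

    def push_event(self, event):
        self.events.append(event)

    def push_data(self, query_id, rows):
        self.data.setdefault(query_id, []).extend(rows)

    def step(self):
        """Execute the next event, acting in whatever role it requires."""
        event = self.events.popleft()
        return OPERATORS[event.op](event, self.data)
```

Pushing an "optimize" event and then feeding the returned events back into the same instance makes one component act first as an optimizer and then as a filter worker. The dictionary lookup on every step is a small-scale version of the abstraction overhead raised in the comment above.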
        
         | fipar wrote:
         | > Seems challenging to implement without losing significant
         | performance to the abstraction layer.
         | 
         | Agreed, it seems the tradeoff would only make this worthwhile
         | when you need to optimize for throughput, and a lot of the
         | workload is ad hoc. A lot of distributed DBMS just serve point
         | queries or range scans where I don't think something like this
         | would be useful.
         | 
         | Troubleshooting production issues here seems challenging too
          | (though that's always the case in distributed systems, so this
          | would be less of a problem).
         | 
         | Still interesting, but it would be more interesting to see the
         | idea get some trial by fire.
        
       | gregw2 wrote:
       | So I get that they call it "architecture-less" because it doesn't
       | choose between "shared nothing" and "shared disk" architectures
       | and thus can pivot from OLTP to OLAP.
       | 
       | But I have a different OLTP vs OLAP "architecture" question....
       | is it row-based or columnar? Is it "architectureless" in that
       | regard also? Are they going to store data persisted both ways so
       | you get the worst of both worlds performance-wise or is there
       | still an OLAP vs OLTP architecture choice there?
       | 
       | I suspect there are still some architectural choices here!
        
         | bobthebuilders wrote:
         | From what it looks like there's no architecture.
        
       ___________________________________________________________________
       (page generated 2021-01-12 23:01 UTC)