[HN Gopher] Introducing ArcticDB: A Database for Observability
___________________________________________________________________
Introducing ArcticDB: A Database for Observability
Author : brancz
Score : 106 points
Date : 2022-05-04 14:05 UTC (8 hours ago)
(HTM) web link (www.polarsignals.com)
(TXT) w3m dump (www.polarsignals.com)
| psanford wrote:
| I'm a bit confused by this being an in-memory-only database
| that also uses Apache Parquet. Isn't Apache Parquet a file
| format specification? What's the point of using it if you
| aren't serializing data to disk? Is the goal to eventually
| support durable storage?
| brancz wrote:
| You're absolutely right that that's confusing.
|
| Short answer: We just wanted to release this as soon as
| possible and haven't gotten to it yet.
|
| Slightly longer answer: We currently build parquet buffers in
| memory, which we are soon going to persist. We still want to
| finish up some details about how we partition and compact data
| over time before we do that, though. We've learned from
| previous projects that once we write to disk, people start to
| depend on that :)
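|
| For a rough idea, here's a minimal sketch of building a
| parquet buffer purely in memory with segmentio/parquet-go (the
| library we use); the row type is illustrative, not Parca's
| actual schema:
|
|     package main
|
|     import (
|         "bytes"
|         "fmt"
|
|         "github.com/segmentio/parquet-go"
|     )
|
|     // Sample is an illustrative row type, not the real schema.
|     type Sample struct {
|         Timestamp int64 `parquet:"timestamp"`
|         Value     int64 `parquet:"value"`
|     }
|
|     func main() {
|         // Write parquet into a plain byte buffer: no disk,
|         // but the buffer can later be flushed or uploaded
|         // as-is once persistence lands.
|         var buf bytes.Buffer
|         w := parquet.NewWriter(&buf)
|         err := w.Write(&Sample{Timestamp: 1651672800, Value: 42})
|         if err != nil {
|             panic(err)
|         }
|         if err := w.Close(); err != nil {
|             panic(err)
|         }
|         fmt.Printf("in-memory parquet buffer: %d bytes\n", buf.Len())
|     }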
| hintymad wrote:
| It's great to see more solutions in this space, and
| congratulations on launching and open sourcing it. Is it
| possible to explain why it matters to end users? The Why We
| Built It section seems to focus solely on implementation and
| what matters to the team.
|
| > First, we needed something embeddable for Go
|
| I'm not sure why it matters if I just need to store and analyze
| large quantities of profiling data, but let me assume this
| matters to the creator of the db.
|
| > The second and more pressing argument was: in order for us to
| be able to translate the label-based data-model to a table-based
| layout, we needed the ability to create columns whenever we see a
| label-name for the first time.
|
| Why does this matter to me, an end user? Does it make ingestion
| faster? Does it make query faster? Does it support higher
| throughput compared with M3DB or Netflix Atlas or FB Gorilla?
| Does it make distributed query more scalable? Does it enable the
| support of higher cardinality and more dimensions? Does it enable
| more expressive aggregations or query semantics in general? Does
| it enable the db to model data beyond multi-dimensional time
| series?
| brancz wrote:
| Great point! Laying out each piece of data that we want to
| search and aggregate by in its own sorted column allows fast
| processing of that data (query latency), but also achieves
| better compression, since repetitive values can be efficiently
| encoded (saving both memory and, once we persist data, disk).
|
| We can keep the cost of ingestion low because we trade off
| mutability: if data is immutable, its sort order never
| changes. We maintain global sorting by requiring writes to
| arrive sorted; that way, when inserting we look at each in-
| memory block in the worst case, but in practice at far fewer.
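|
| As a hand-wavy illustration (invented types, not ArcticDB's
| actual code): immutable sorted blocks that track min/max
| bounds mean a scan only touches the blocks whose range
| overlaps the query, and can binary-search within them.
|
|     package main
|
|     import (
|         "fmt"
|         "sort"
|     )
|
|     // Block is an immutable, sorted run of rows; its bounds
|     // never change, so they can be checked cheaply.
|     type Block struct {
|         Min, Max string
|         Rows     []string // sorted, never mutated
|     }
|
|     func scan(blocks []Block, from, to string) []string {
|         var out []string
|         for _, b := range blocks {
|             if b.Max < from || b.Min > to {
|                 continue // bounds prove no matches here
|             }
|             // binary-search within the candidate block
|             i := sort.SearchStrings(b.Rows, from)
|             for ; i < len(b.Rows) && b.Rows[i] <= to; i++ {
|                 out = append(out, b.Rows[i])
|             }
|         }
|         return out
|     }
|
|     func main() {
|         blocks := []Block{
|             {Min: "a", Max: "m", Rows: []string{"a", "f", "m"}},
|             {Min: "n", Max: "z", Rows: []string{"n", "q", "z"}},
|         }
|         fmt.Println(scan(blocks, "e", "p")) // [f m n]
|     }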
|
| One thing to clarify: for now ArcticDB is just an embeddable
| database, similar to Badger, but it's possible we might make it
| a distributed database in the future (for now this suffices for
| what we need it to do).
|
| Let me know if that clarifies it, happy to elaborate further!
| brancz wrote:
| Hey all, one of the creators of ArcticDB here. We're going to be
| around for a while and answer any questions you might have about
| it!
|
| It's open source so if you just want to check out the repo:
| https://github.com/polarsignals/arcticdb
| jtwebman wrote:
| You really should change the front page to why I should use it
| vs. why you built it. I don't care why you built it or that it
| is in Go. I do care why I should use it, which must be hidden
| somewhere in the why-you-built-it section, but I don't have
| that kind of time.
| brancz wrote:
| Thank you so much for the feedback! By front page I assume
| you mean the project readme? I agree, I think we leaned a bit
| too far into design documentation in the readme rather than
| explaining why and how to use it. We'll get that fixed!
| caust1c wrote:
| How does it compare to BadgerDB/RocksDB/LevelDB? I see that
| it's using Arrow and Parquet of course, but the sparse index
| sounds very similar to LSM-tree-like storage engines, except
| using something like a k-way merge algorithm and a heap
| structure to manage it somehow?
|
| I'm more of an operator and user of these systems, so as an
| operator I care more about the usability than what's
| underneath, but I'm also reasonably skeptical of new
| databases, since there are literally hundreds being written
| every year.
|
| So what benefits does this structure and data format provide
| over classical LSM-like databases which are currently
| dominating the high-write-throughput embedded DB space?
| brancz wrote:
| It's closer to DuckDB than to Badger/RocksDB/LevelDB, but
| similar in the sense that it is an embeddable database, not
| one that is operated standalone. It's not unlike an LSM tree,
| but the difference is that the leaves in the tree are not
| individual keys; rather, they describe a range of values that
| are all read at once when accessed. This allows high write
| throughput _and_ high read throughput, trading off mutability
| (which is OK for the Observability use cases we aim for it to
| fulfil).
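|
| A toy version of that leaf shape (invented code, just to show
| the idea): the index holds one entry per granule of rows
| instead of one per key, so a lookup binary-searches the small
| index and then reads a whole granule at once.
|
|     package main
|
|     import (
|         "fmt"
|         "sort"
|     )
|
|     // One index entry per granule, not per key: the index
|     // stays small, and reads always fetch a full granule.
|     type entry struct {
|         firstKey string // smallest key in the granule
|         offset   int    // granule's position in the store
|     }
|
|     func granuleFor(index []entry, key string) int {
|         // first granule whose firstKey is beyond the key...
|         i := sort.Search(len(index), func(i int) bool {
|             return index[i].firstKey > key
|         })
|         if i == 0 {
|             return index[0].offset
|         }
|         // ...so the granule before it must hold the key
|         return index[i-1].offset
|     }
|
|     func main() {
|         index := []entry{{"a", 0}, {"h", 4096}, {"q", 8192}}
|         fmt.Println(granuleFor(index, "k")) // 4096
|     }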
|
| I think it's reasonable to be skeptical about new databases.
| If it helps, we worked on the Prometheus and Thanos projects'
| storage layers before we started the work on this, and those
| now power hundreds of thousands of monitoring stacks out
| there.
| caust1c wrote:
| It was my understanding that those databases do have sparse
| indexes, but admittedly that assumption may come from my
| experience being mostly with ClickHouse, which uses LSM-tree
| engines and also has a sparse index.
|
| Would it be correct to say this is like an embeddable
| ClickHouse engine, minus the SQL interface and using Arrow
| and Parquet as the storage format?
| brancz wrote:
| Yes, that's correct! Plus the dynamic column feature,
| which we think is crucial for Observability-type
| workloads (from what we know only InfluxDB IOx supports a
| similar feature).
| caust1c wrote:
| Nice! Thanks for the reply! Looking forward to see where
| this goes! :-)
| bigcat12345678 wrote:
| Does ArcticDB support storing data in memory and flushing
| overflowing & old data onto disk (or another persistent
| storage sink)?
| brancz wrote:
| Not yet, but we will soon, though we already rotate the in-
| memory state when a certain (configurable) size is reached.
| The idea is that we'll put data in parquet format into object
| storage, from where it can be consumed by any parquet-
| compatible tool. That, plus additional metadata, is what we'll
| use for long-term storage of profiling data ourselves.
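|
| Conceptually the rotation is just a size check on the active
| block. A sketch with made-up types, where the persist hook
| stands in for the future object-storage upload:
|
|     package main
|
|     import (
|         "bytes"
|         "sync"
|     )
|
|     type Table struct {
|         mu      sync.Mutex
|         active  bytes.Buffer
|         limit   int          // configurable rotation size
|         persist func([]byte) // e.g. upload to a bucket
|     }
|
|     func (t *Table) Append(row []byte) {
|         t.mu.Lock()
|         defer t.mu.Unlock()
|         t.active.Write(row)
|         if t.active.Len() >= t.limit {
|             // copy the full block out, then rotate
|             full := make([]byte, t.active.Len())
|             copy(full, t.active.Bytes())
|             t.active.Reset()
|             go t.persist(full) // flush off the hot path
|         }
|     }
|
|     func main() {
|         t := &Table{limit: 8, persist: func([]byte) {}}
|         t.Append([]byte("12345678")) // triggers a rotation
|     }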
| bigcat12345678 wrote:
| Just to clarify more on the in-memory part: does it offer the
| same Arrow APIs for querying, or only a specialized query API
| tailored to Parca's use?
|
| We'd like it to be Arrow APIs, such that we can use it for
| other purposes, though still in the Observability space
| actually.
|
| I read the other comment:
| https://news.ycombinator.com/item?id=31263825 It already
| explained the in-memory part, but not the serialization
| part.
| brancz wrote:
| ArcticDB has a general-purpose dataframe-like query builder
| that Parca uses to build its queries. The query engine scans
| parquet row groups, and those that may contain interesting
| data are converted to arrow and passed through the query plan
| the query planner creates. All plan stages other than the
| table scan expect arrow frames.
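|
| In sketch form (RowGroup is an invented interface here, not
| ArcticDB's actual type), the scan stage does roughly this:
| skip row groups whose statistics rule them out, convert the
| survivors to arrow, and hand them to the next plan stage.
|
|     package scan
|
|     import "github.com/apache/arrow/go/v8/arrow"
|
|     // RowGroup abstracts a parquet row group with min/max
|     // statistics and a conversion to an arrow record.
|     type RowGroup interface {
|         MinTimestamp() int64
|         MaxTimestamp() int64
|         ToArrow() arrow.Record // the expensive step
|     }
|
|     func scan(groups []RowGroup, from, to int64,
|         next func(arrow.Record)) {
|         for _, rg := range groups {
|             if rg.MaxTimestamp() < from ||
|                 rg.MinTimestamp() > to {
|                 continue // stats prove nothing matches
|             }
|             next(rg.ToArrow()) // feed the rest of the plan
|         }
|     }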
|
| Hope that explains it, but happy to elaborate more!
| bigcat12345678 wrote:
| IIUC: the query written by users of ArcticDB is in a
| dataframe-like, Python-ish language.
|
| The query engine first goes through the in-memory parquet
| rows to get a subset that contains the relevant data. This
| relevant data is returned as arrow frames.
|
| Then the query planner produces a query plan and the engine
| executes the query plan on the previously returned arrow
| frames.
|
| Does the query engine produce the query plan before selecting
| the arrow frames, and does it use the plan for the selection
| process as well?
| eatonphil wrote:
| It looks like there's no textual query interface to it at
| the moment. Do I have that wrong, or are you interested in
| adding a high-level query interface in the future?
| brancz wrote:
| Very good question! While we may add a textual query
| interface (maybe even SQL) in the future, we very
| intentionally started with this abstraction: most
| Observability projects out there (e.g. Prometheus, Grafana
| Loki, Parca) have specialised query languages, so we wanted a
| lower-level abstraction that those languages could be
| implemented on top of, instead of having to transpile to SQL
| (or whatever proprietary language we might have come up
| with).
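|
| To give a flavor of what such a lower-level abstraction can
| look like (a toy builder with invented names, not ArcticDB's
| actual API): a PromQL-style frontend would lower its queries
| into calls like these instead of transpiling to SQL.
|
|     package main
|
|     import "fmt"
|
|     // Query records plan steps as a frontend chains calls.
|     type Query struct{ steps []string }
|
|     func (q *Query) Filter(e string) *Query {
|         q.steps = append(q.steps, "filter("+e+")")
|         return q
|     }
|
|     func (q *Query) GroupBy(c string) *Query {
|         q.steps = append(q.steps, "group_by("+c+")")
|         return q
|     }
|
|     func (q *Query) Sum(c string) *Query {
|         q.steps = append(q.steps, "sum("+c+")")
|         return q
|     }
|
|     func main() {
|         // what a frontend might compile a query down to
|         q := new(Query).
|             Filter(`labels.job == "api-server"`).
|             GroupBy("labels.instance").
|             Sum("value")
|         fmt.Println(q.steps)
|     }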
| bfm wrote:
| What was the motivation for building your own instead of using
| something like DuckDB, which supports parquet out of the box?
| What are the differences?
| eatonphil wrote:
| In addition to other reasons they may have, an embedded
| database for Go being built in Go means they don't need to
| require CGO, which DuckDB in Go would require.
| brancz wrote:
| This was definitely part of the motivation, but even more
| importantly (and it's possible that we missed it in the
| DuckDB documentation when we explored doing exactly this),
| we needed the ability to add columns dynamically when we
| see a new label-name. This is roughly analogous to wide
| columns in Cassandra, but forced into the columnar layout so
| the data can be searched and aggregated efficiently. From
| our research, all the open source column databases at best
| support a map type, through which we lose the columnar
| layout, since the values of the map are all stored together,
| giving us row-based database characteristics (all databases
| except InfluxDB IOx, whose developers we talked to
| extensively and who highly inspired this design).
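|
| A tiny model of the dynamic-column idea (illustrative code,
| not the real implementation): every label name gets its own
| column, created and null-backfilled the first time it shows
| up, instead of all labels sharing one map-typed column.
|
|     package main
|
|     import "fmt"
|
|     func columnize(rows []map[string]string) map[string][]string {
|         cols := map[string][]string{}
|         for i, labels := range rows {
|             for name, value := range labels {
|                 col, ok := cols["labels."+name]
|                 if !ok {
|                     // first sighting: backfill nulls ("")
|                     // for all earlier rows
|                     col = make([]string, i)
|                 }
|                 cols["labels."+name] = append(col, value)
|             }
|             // pad columns this row had no value for
|             for name, col := range cols {
|                 if len(col) < i+1 {
|                     cols[name] = append(col, "")
|                 }
|             }
|         }
|         return cols
|     }
|
|     func main() {
|         cols := columnize([]map[string]string{
|             {"job": "api", "region": "us"},
|             {"job": "api", "pod": "api-0"},
|         })
|         fmt.Println(cols["labels.region"]) // [us ]
|     }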
| bfm wrote:
| That makes sense. DuckDB's documentation has significantly
| improved, but it is still lacking when it comes to the
| limitations of using parquet. We have also hit some
| roadblocks when updating schemas backed by parquet files,
| so we now only use DuckDB for querying parquet via SQL.
| ctovena wrote:
| Nice article!
|
| Super curious about your plans for persistence and
| compression.
|
| With TSDB interning labels, do you expect any increase in
| size for the label part?
|
| And finally, any specific reason for not having this under
| the Parca repo? IMHO working across multiple repos in Go can
| be a PITA.
| brancz wrote:
| Great questions! We're planning on persisting data in parquet
| format in object storage, potentially with additional
| metadata to help the query planner optimize queries. As for
| compression, parquet supports various modern compression
| mechanisms (zstd, lz4), which we already support if the
| schema specifies them. We have thought about potentially
| allowing different compression schemes to be used in
| different situations, but for now it's static and part of the
| schema.
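|
| For what it's worth, in segmentio/parquet-go the per-column
| codecs can be declared right in the schema, e.g. via struct
| tags (a small sketch; the field names are illustrative):
|
|     package main
|
|     import (
|         "fmt"
|
|         "github.com/segmentio/parquet-go"
|     )
|
|     // delta-encode timestamps, zstd-compress values; the
|     // codecs are part of the schema, as described above.
|     type Sample struct {
|         Timestamp int64 `parquet:"timestamp,delta"`
|         Value     int64 `parquet:"value,zstd"`
|     }
|
|     func main() {
|         fmt.Println(parquet.SchemaOf(Sample{}))
|     }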
|
| While the raw storage for label strings might increase, we
| got rid of the inverted index entirely, and the savings from
| that are greater than what we're spending on potentially
| duplicate strings.
|
| We intentionally put it on the Polar Signals GitHub org to
| distance it from the Parca project. While we initially
| developed it for Parca, we think the applications can be much
| wider so we wanted to emphasize that. I do agree it can be a
| pain to have it separate, but for now we think having it
| separate is worth it.
| witcher wrote:
| Finally seeing Apache Parquet and Apache Arrow used with Go
| efficiently and effectively!
|
| Great job. Looking forward to exploring this more in the
| Prometheus and CNCF Ecosystem.
|
| The underlying library used
| (https://github.com/segmentio/parquet-go) looks amazing too!
| brancz wrote:
| I agree, both Parquet and Arrow are super powerful
| technologies. This Parquet library is absolute bliss to work
| with (and contribute to); the APIs are so well designed!
___________________________________________________________________
(page generated 2022-05-04 23:01 UTC)