[HN Gopher] Show HN: BemiDB - Postgres read replica optimized fo...
___________________________________________________________________
Show HN: BemiDB - Postgres read replica optimized for analytics
Hi HN! We're Evgeny and Arjun, and we're building a better way to
do analytics with Postgres. We love Postgres for its simplicity,
power, and rich ecosystem. But engineers still get bogged down with
heavyweight and expensive OLAP systems when connecting an analytics
data stack. Postgres is amazing at OLTP queries, but not at OLAP
queries (large data scans and aggregations). Even so, we've heard
from countless scaling startups that they try to use only a read
replica to run analytics workloads because they don't want to deal
with the data engineering complexity of the alternative. This
actually works surprisingly well initially, but starts to break as
they scale or when they integrate multiple data sources. Adding
lots of indexes to support analytics also slows down their
transactional write performance. When growing out of "just use
Postgres", companies have to understand and wrangle complex ETL
pipelines, CDC processes, and data warehouses -- adding layers of
complexity that undermine the simplicity that made them choose
Postgres as their data store in the first place.

We thought there had to be a better way, so we're building BemiDB.
It's designed to handle complex analytical queries at scale without
the usual overhead. It's a single binary that automatically syncs
Postgres data and is Postgres-compatible, so querying it feels like
querying standard Postgres, and it works with all existing tools.
Under the hood, we use Apache Iceberg (with Parquet data files)
stored in S3. This allows for bottomless, inexpensive storage,
compressed data in columnar files, and an open format that
guarantees compatibility with other data tools. We embed DuckDB as
the query engine for in-memory analytics that handle complex
queries. With efficient columnar storage and vectorized execution,
we're aiming for faster results without heavy infra. BemiDB
communicates over the Postgres wire protocol, so all querying is
Postgres-compatible.

We want to simplify data stacks for companies that use Postgres by
reducing complexity (single binary and S3), using non-proprietary
data formats (Iceberg open tables), and removing vendor lock-in
(open source). We'd love to hear your feedback! What do you think?
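
To make that concrete, here is a rough sketch of what querying
BemiDB can look like from existing Postgres tooling. The connection
string and the orders table below are illustrative placeholders,
not documented defaults:

    -- connect with any Postgres client, e.g.:
    --   psql postgres://localhost:54321/bemidb   (host/port/db made up)
    -- then run plain SQL; BemiDB executes it with DuckDB over
    -- Iceberg/Parquet files in S3:
    SELECT date_trunc('month', created_at) AS month,
           count(*)                        AS orders,
           sum(total_cents) / 100.0        AS revenue
    FROM orders
    GROUP BY 1
    ORDER BY 1;
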
Author : exAspArk
Score : 117 points
Date : 2024-11-07 16:25 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| oulipo wrote:
| Really cool! I have an IoT use-case where I ingest data, I want
| to keep like the last 3 months or so in Postgresql, and then
| store the old data as parquet files on S3
|
| I initially planned to store chunks on S3 and run the analytical
| queries using duckdb; I'm wondering if your tool would be a good
| replacement?
|
| For now I don't have that many analytical queries, I'm mostly
| doing visualization of the data points by querying a range (eg
| last 2 weeks of data for a device)
|
| Does it then make sense to use columnar storage or am I better
| off with "regular Postgres"?
|
| Or in my case does your approach provide "best of both worlds" in
| the sense that I could do some occasional analytical queries on
| past data stored on S3, and regularly access "last 3 months" data
| for visualization using the data stored in the regular Postgres?
| exAspArk wrote:
| Thank you!
|
| Yes, absolutely!
|
| 1) You could use BemiDB to sync your Postgres data (e.g.,
| partition time-series tables) to S3 in Iceberg format. Iceberg
| is essentially a "table" abstraction on top of columnar Parquet
| data files with a schema, history, etc.
|
| 2) If you don't need strong consistency and are fine with delayed
| data (the main trade-off), you can use just BemiDB to query and
| visualize all data directly from S3. From a query perspective,
| it's like DuckDB that talks Postgres (wire protocol).
|
| Feel free to give it a try! And although it's a new project, we
| plan to keep building and improving it based on user feedback.
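|
| To illustrate (2), here's a rough sketch of the kind of range
| query you describe, sent to BemiDB over the Postgres wire protocol
| (the table and column names are made-up examples):
|
|     -- last 2 weeks of readings for one device, bucketed by hour
|     SELECT date_trunc('hour', recorded_at) AS hour,
|            avg(temperature)                AS avg_temperature
|     FROM iot_events
|     WHERE device_id = 'device-123'
|       AND recorded_at >= now() - INTERVAL '14 days'
|     GROUP BY 1
|     ORDER BY 1;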
| oulipo wrote:
| Thanks!
|
| - Can you give me more info about the strong consistency and
| delayed data, so I can better picture it with a few examples?
|
| - Also, is it possible to do the sync with the columnar data
| in "more-or-less real-time" (eg do a NOTIFY on a new write in
| my IoT events table, and push in the storage?)
|
| - Would your system also be suited for a kind of "audit-log"
| data? Eg. if I want to have some kind of audit-table of all
| the changes in my database, but only want to keep a few weeks
| worth at hand, and then push the rest on S3, or it doesn't
| make much sense with that kind of data?
| exAspArk wrote:
| For now, BemiDB supports only full Postgres table data re-
| sync. We plan to enable real-time data syncing from
| Postgres into S3 by using logical replication (CDC), which
| is much more reliable than PG NOTIFY.
|
| We use logical replication and this exact approach with our
| other project related to auditing and storing Postgres data
| changes https://github.com/BemiHQ/bemi. We're thinking
| about combining these approaches to leverage a scalable and
| affordable separate storage layer on S3.
|
| Lmk if that makes sense or if you had any more questions!
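|
| For reference, the Postgres side of logical replication is plain
| SQL. A minimal sketch (the publication and slot names are
| illustrative, and BemiDB doesn't wire this up yet -- it's on the
| roadmap):
|
|     -- requires wal_level = logical in postgresql.conf
|     CREATE PUBLICATION bemidb_pub FOR TABLE iot_events;
|     SELECT pg_create_logical_replication_slot('bemidb_slot', 'pgoutput');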
| oulipo wrote:
| Really interesting thanks! I guess my use-case would
| rather require incremental updates
|
| Ideally it would just sync in real-time and buffer new
| data in the Bemi binary (with some WAL-like storage to
| make sure data is preserved on binary crash/reload), and
| when it has enough, push them on S3, etc
|
| Is this the kind of approach you're going to take?
| exAspArk wrote:
| Yes, we want to use the approach you described!
| We'll likely wait until enough changes are accumulated by
| using 2 configurable thresholds: time (like 30s) and size
| (like 100MB)
| Onavo wrote:
| Use a clickhouse FDW or something similar; clickhouse has
| excellent integration with postgres. They also have a great
| embedded Python version. Their marketing isn't as good as
| Duckdb but in terms of stability and performance they are so
| much better. Duckdb is very very buggy and full of sharp edges
| but because of their VC funded developer marketing, you don't
| really hear people talking about it.
| exAspArk wrote:
| I agree that DuckDB may sometimes be buggy because it's being
| actively developed with a large surface area (a universal
| embeddable query engine that works with different storage
| layers if I were to simplify).
|
| However, DuckDB (non-profit foundation) != MotherDuck (VC
| funded). These are two separate organizations with different
| goals. I see DuckDB as a tool, not as a SaaS or a VC-funded
| company. My hope is that it'll be adopted by other projects
| and not associated with just a single for-profit company.
| hoerzu wrote:
| Can you give an example? If I have 5 GB (2 million rows), how
| will it be stored differently for columnar access?
| exAspArk wrote:
| We ran some benchmarks (TPC-H, designed for OLAP) with ~10M
| records https://github.com/BemiHQ/BemiDB#benchmark
|
| The BemiDB storage layer produced ~300MB columnar Parquet files
| (with ZSTD compression) vs 1.6GB of data in Postgres.
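|
| If you want a rough feel for the compression on your own data
| without BemiDB, one way (not our internal code path) is to export
| a table from DuckDB as ZSTD-compressed Parquet and compare file
| sizes:
|
|     -- assumes the table is already loaded or attached in DuckDB;
|     -- the table name is a placeholder
|     COPY orders TO 'orders.parquet' (FORMAT parquet, COMPRESSION zstd);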
| Sesse__ wrote:
| Does TPC-H SF1 really take _one and a half hours_ for you on
| regular Postgres? Last time I tried (in the form of DBT-3),
| it was 22 queries and most of them ran in a couple seconds.
| exAspArk wrote:
| Interesting. I haven't used the DBT-3 kit, does it add any
| indexes? I manually added these Postgres indexes
| https://github.com/BemiHQ/BemiDB/blob/main/benchmark/data/cr...
| to
| reduce the main bottlenecks on SF0.1 and reduce the total
| time from 1h23m13s to 1.5s. But SF1 still took more than 1h
| Sesse__ wrote:
| It adds a bunch of indexes, yes. I don't think anyone
| really runs TPC-H unindexed unless they are using a
| database that plain doesn't support it; it wouldn't
| really give much meaningful information.
|
| Edit: I seemingly don't have these benchmarks anymore,
| and I'm not going to re-run them now, but I found a very
| (_very_) roughly similar SF10 run clocking in around
| seven minutes total. So that's the order of magnitude I
| would be expecting, given ten times as much data.
| exAspArk wrote:
| Got it, thanks for sharing it! We'll try to look into
| DBT-3 and the indexes it creates to test with SF10
| gigatexal wrote:
| Query Engine: embeds the DuckDB query engine to run analytical
| queries. Storage Layer: uses the Iceberg table format to store
| data in columnar compressed Parquet files.
|
| Smart. Imma test this out for sure.
| arjunlol wrote:
| Thanks! Give it a try and let us know any feedback :)
| paurora wrote:
| very cool!! We have the same vision with pg_mooncake:
| https://github.com/Mooncake-Labs/pg_mooncake/tree/main
|
| From what I understand, the BemiDB experience is akin to PeerDB +
| Clickhouse. It's not really a Postgres extension?
|
| Glad open table formats are becoming mainstream, for everyone.
| exAspArk wrote:
| Thanks!
|
| We love the pg_mooncake extension (and pg_duckdb used under
| the hood), although our approaches are slightly different.
| Long-term, we want to allow anyone to use BemiDB by using
| native Postgres logical replication without installing any
| extensions (many Postgres hosting providers impose their
| restrictions, upgrading versions might be challenging, OLAP
| queries may affect OLTP performance if within the same
| database, etc.)
| shayonj wrote:
| Absolutely stoked for pg_mooncake. I really want to see some of
| these things happening inside PG and taking advantage of the PG
| internals + native storage. Only bummer is adoption by places
| where users are currently, say Aurora. But that's probably a
| problem for another day :)
|
| P.S The integration with something like Neon is really cool to
| see.
| neeleshs wrote:
| Congratulations! I was looking at pg_analytics from ParadeDB
| hoping this use case would be solved (the dump from pg to parquet
| part), but it doesn't do that yet.
|
| How does it handle updates?
| exAspArk wrote:
| Thank you!
|
| The pg_analytics Postgres extension partially supports
| different file formats. We bet big on the Iceberg open table
| format, which uses Parquet data files under the hood.
|
| Our initial approach is to do periodic full table resyncing.
| The next step is to support incremental Iceberg operations like
| updates. This will involve creating a new "diff" Parquet file
| and using the Iceberg metadata to point to the new file version
| that changes some rows. Later this will enable time travel
| queries, schema evolution, etc.
| neeleshs wrote:
| Fantastic!
| winddude wrote:
| difference to something like duckdb?
| polskibus wrote:
| How can you set up automatic replication from Postgresql to a
| duckdb instance?
| exAspArk wrote:
| The most common approach is to read Postgres data _in_ DuckDB
| https://duckdb.org/docs/extensions/postgres.html
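|
| Roughly like this (the connection string and table names are
| placeholders):
|
|     INSTALL postgres;
|     LOAD postgres;
|     -- attach a running Postgres database inside DuckDB
|     ATTACH 'dbname=mydb user=postgres host=127.0.0.1' AS pg (TYPE postgres);
|     SELECT count(*) FROM pg.public.events;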
| simlevesque wrote:
| > > Alternatives
|
| > DuckDB:
|
| > - Designed for OLAP use cases. Easy to run with a single
| binary.
|
| > - Limited support in the data ecosystem (notebooks, BI tools,
| etc.). Requires manual data syncing and schema mapping for best
| performance.
| exAspArk wrote:
| ^ This!
|
| Here is the link that briefly describes pros and cons of
| different alternatives for analytics
| https://github.com/BemiHQ/BemiDB#alternatives
| canadiantim wrote:
| How does this compare to ParadeDB? Seems to occupy the same space
| exAspArk wrote:
| We love ParadeDB and their team. Their primary focus is search
| (Elasticsearch on Postgres), but they also have the
| pg_analytics Postgres extension (foreign data wrappers and
| embedded DuckDB).
|
| The biggest difference is a Postgres extension vs. a separate
| OLAP process. We want anyone with just Postgres to be able to
| run analytics queries without affecting resources in the
| transactional database, without building and installing
| extensions (which might not be possible with some hosting
| providers), without wrestling with dependencies and their
| versions when upgrading Postgres, without manually syncing data
| from Postgres to S3, etc.
| dangoodmanUT wrote:
| This is probably the most streamlined/all-inclusive solution out
| of all that I've seen, but this has definitely been an extremely
| saturated space in 2024
| dangoodmanUT wrote:
| mostly everyone riding on duckdb's coattails
| levkk wrote:
| Moving data between systems is problematic. Where this product is
| actually needed (multi-TB databases under load) is where logical
| replication won't be able to sync your tables in time.
| Conversely, small databases where this will work don't really
| need columnar storage optimizations.
| exAspArk wrote:
| Fair point. We think that BemiDB can currently be useful with
| small and medium Postgres databases. Running complex
| analytics queries on Postgres can work, but it usually requires
| tuning it and adding indexes tailored to these queries, which
| may negatively impact the write performance on the OLTP side or
| may not be possible if these are ad-hoc queries.
|
| > (multi-TB databases under load) is where logical replication
| won't be able to sync your tables in time
|
| I think the ceiling for logical replication (and optimization
| techniques around it) is quite high. But I wonder what people
| do when it stops working as they scale?
| woodhull wrote:
| For my use case of something similar on Clickhouse:
|
| We load data from postgres tables that are used to build
| Clickhouse Dictionaries (a hash table for JOIN-ish operations).
|
| The big tables do not arrive via real-time-ish sync from
| postgres but are bulk-appended using a separate infrastructure.
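|
| For anyone unfamiliar, a ClickHouse dictionary backed by a
| Postgres table looks roughly like this (names and credentials are
| illustrative, not our actual setup):
|
|     CREATE DICTIONARY device_names
|     (
|         id UInt64,
|         name String
|     )
|     PRIMARY KEY id
|     SOURCE(POSTGRESQL(
|         host 'pg-host' port 5432 user 'reader' password 'secret'
|         db 'app' table 'devices'
|     ))
|     LAYOUT(HASHED())
|     LIFETIME(MIN 300 MAX 600);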
| exAspArk wrote:
| Would you be able to share how you implemented "bulk-appended
| using a separate infrastructure" at a high level?
| woodhull wrote:
| As much as DuckDB is cute, I've mostly come to believe that
| Clickhouse is the perfect thing to pair Postgres with. This is
| especially true now that they've acquired PeerDB and are
| integrating it into the Clickpipes cloud product.
|
| DuckDB is neat, and I understand why a company like BemiDB would
| build their product on top of it, but as a prospective customer,
| embedded databases are a weird choice for serious workloads when
| there are other good open-source solutions like Clickhouse
| available.
| exAspArk wrote:
| ClickHouse is definitely a popular choice nowadays. I'm curious
| whether you self-host ClickHouse or use their Cloud? We wanted
| to make BemiDB as simple to run as possible with a single
| binary and object storage (vs large machines, big disks,
| clustering, running Temporal for CDC, etc.)
| Onavo wrote:
| Clickhouse has an embedded version
| (https://github.com/chdb-io/chdb); the issue with DuckDB is that
| it's too buggy for production loads. You can see a nice list of
| the issues here:
|
| https://news.ycombinator.com/item?id=41490707
| maxmcd wrote:
| Using duckdb and apache iceberg means that you can run read
| replicas without any operational burden. Clickhouse is amazing,
| but they do not allow you to mount dumb read replicas to object
| storage (yet).
|
| I can imagine this product is a very elegant solution for many
| types of companies/teams/workloads.
| zX41ZdbW wrote:
| You can mount read replicas on object storage in ClickHouse.
|
| Example:
|
|     CREATE DATABASE test;
|     USE test;
|
|     CREATE TABLE hackernews_history
|     UUID '66491946-56e3-4790-a112-d2dc3963e68a'
|     (
|         `update_time` DateTime DEFAULT now(),
|         `id` UInt32,
|         `deleted` UInt8,
|         `type` Enum8('story' = 1, 'comment' = 2, 'poll' = 3,
|                      'pollopt' = 4, 'job' = 5),
|         `by` LowCardinality(String),
|         `time` DateTime,
|         `text` String,
|         `dead` UInt8,
|         `parent` UInt32,
|         `poll` UInt32,
|         `kids` Array(UInt32),
|         `url` String,
|         `score` Int32,
|         `title` String,
|         `parts` Array(UInt32),
|         `descendants` Int32
|     )
|     ENGINE = ReplacingMergeTree(update_time)
|     ORDER BY id
|     SETTINGS disk = disk(
|         readonly = true,
|         type = 's3_plain_rewritable',
|         endpoint = 'https://clicklake-test-2.s3.eu-central-1.amazonaws.com/',
|         use_environment_credentials = false);
|
| And you can try it right now.
|
| Install ClickHouse:
|
|     curl https://clickhouse.com/ | sh
|     ./clickhouse local
|
| Run the query above to attach the table.
|
| The table is updated in real time. For example, here is your
| comment:
|
|     :) SELECT * FROM hackernews_history
|        WHERE text LIKE '%Clickhouse is amazing%'
|        ORDER BY update_time
|        \G
|
|     Row 1:
|     ------
|     update_time: 2024-04-06 16:35:28
|     id:          39785472
|     deleted:     0
|     type:        comment
|     by:          mightybyte
|     time:        2024-03-21 22:59:20
|     text:        I'll second this. Clickhouse is amazing. I was
|                  actually using it today to query some CSV files.
|                  I had to refresh my memory on the syntax so if
|                  anyone is interested:<p><pre><code>  clickhouse
|                  local -q "SELECT foo, sum(bar) FROM
|                  file('foobar.csv', CSV) GROUP BY foo FORMAT
|                  Pretty" </code></pre> Way easier than opening in
|                  Excel and creating a pivot table which was my
|                  previous workflow.<p>Here's a list of the
|                  different input and output formats that it
|                  supports.<p><a href="https://clickhouse.com/docs
|                  /en/interfaces/formats" rel="nofollow">
|                  https://clickhouse.com/docs/en/interfaces/formats
|                  </a>
|     dead:        0
|     parent:      39784942
|     poll:        0
|     kids:        [39788575]
|     url:
|     score:       0
|     title:
|     parts:       []
|     descendants: 0
|
|     Row 2:
|     ------
|     update_time: 2024-04-06 18:07:34
|     id:          31334599
|     deleted:     0
|     type:        comment
|     by:          richieartoul
|     time:        2022-05-11 00:54:31
|     text:        Not really. Clickhouse is amazing, but if you
|                  want to run it at massive scale you'll have to
|                  invest a lot into sharding and clustering and all
|                  that. Druid is more distributed by default, but
|                  doesn't support as sophisticated of queries as
|                  Clickhouse does.<p>Neither Clickhouse nor Druid
|                  can hold a candle to what Snowflake can do in
|                  terms of query capabilities, as well as the
|                  flexibility and richness of their product.<p>
|                  That's just scratching the surface. They're
|                  completely different product categories IMO,
|                  although they have a lot of technical /
|                  architectural overlap depending on how much you
|                  squint.<p>Devil is in the details basically.
|     dead:        0
|     parent:      31334527
|     poll:        0
|     kids:        [31334736]
|     url:
|     score:       0
|     title:
|     parts:       []
|     descendants: 0
|
|     Row 3:
|     ------
|     update_time: 2024-11-07 22:29:09
|     id:          42081672
|     deleted:     0
|     type:        comment
|     by:          maxmcd
|     time:        2024-11-07 22:13:12
|     text:        Using duckdb and apache iceberg means that you
|                  can run read replicas without any operational
|                  burden. Clickhouse is amazing, but they do not
|                  allow you to mount dumb read replicas to object
|                  storage (yet).<p>I can imagine this product is a
|                  very elegant solution for many types of
|                  companies/teams/workloads.
|     dead:        0
|     parent:      42080385
|     poll:        0
|     kids:        []
|     url:
|     score:       0
|     title:
|     parts:       []
|     descendants: 0
|
|     3 rows in set. Elapsed: 3.981 sec. Processed 42.27 million
|     rows, 14.45 GB (10.62 million rows/s., 3.63 GB/s.)
|     Peak memory usage: 579.26 MiB.
| partdavid wrote:
| How does this replicate Postgres data? I glanced at the code and
| saw that it exports to a CSV file then writes out an Iceberg
| table for an initial snapshot--does it use Postgres logical
| replication?
| exAspArk wrote:
| Full table re-syncing is our initial solution. Using Postgres
| logical replication is next on our roadmap!
| leighleighleigh wrote:
| Definitely checking this out today! I use postgres for ~30 GB of
| machine learning data (object detection) and have a couple
| workflows which go through the Postgres->Parquet->DuckDB
| processing route.
|
| A couple questions, if you have time:
|
| 1. How do you guys handle multi-dimensional arrays? I've had
| issues with a few postgres-facing interfaces (libraries or
| middleware) where they believe everything is a 1D array!
|
| 2. I saw you are using pg_duckdb/duckdb under the hood. I've had
| issues calling plain-SQL functions defined on the postgres
| server, when duckdb is involved. Does BemiDB support them?
|
| Thanks for sharing, and good luck with it!
| exAspArk wrote:
| Thank you, please give it a try!
|
| Great questions:
|
| 1. We currently don't support multi-dimensional arrays, but we
| plan to add support for such complex data structures.
|
| 2. Would you be able to share what type of user-defined
| functions these are? Do they modify the data or only read it?
| leighleighleigh wrote:
| 1. Good to hear!
|
| 2. The bulk of them are convenience wrappers which resolve UUIDs
| into other values, so most are read-only with only a single table
| lookup.
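|
| For illustration, they're mostly things like this (names made
| up):
|
|     -- read-only convenience wrapper that resolves a UUID to a label
|     CREATE FUNCTION device_label(device_uuid uuid)
|     RETURNS text
|     LANGUAGE sql STABLE
|     AS $$
|         SELECT name FROM devices WHERE id = device_uuid
|     $$;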
| VoxPelli wrote:
| The AGPL license is a no-go for me.
|
| While it's technically true that it's an OSI license, it's mostly
| used to scare away competing cloud vendors from hosting the
| software, which isn't in the spirit of OSS.
|
| Have you looked into the more modern choices?
|
| Like the Business Source License that MariaDB created and uses or
| the Functional Source License that Sentry created as an
| improvement over the Business Source License?
| https://fsl.software/
|
| Both those licenses have a fair source phase that automatically
| resolves into an open source phase over time.
|
| Thus one gets the best of both worlds: an honest descriptive
| license for protecting one's business model + a normal permissive
| OSS license that ensures longevity and prevents lock-in.
| exAspArk wrote:
| Our philosophy in general is to go to a more open license over
| time (vs the other direction). So we might consider other more
| permissive OSI-approved licenses.
|
| Would you be able to share why the AGPL license is a no-go for
| you? I'm genuinely curious about your use case. In simple words,
| it'd require a company to open source their BemiDB code only if
| they made modifications and were distributing it to other users
| (modifying it and using it internally are allowed without any
| restrictions).
| senorrib wrote:
| Please, don't. AGPL is great and you're fine using it.
| senorrib wrote:
| You're seriously calling out a perfectly valid OSS license for not
| being "in the spirit of OSS", and pitching for licenses that
| are explicitly NOT OSS?!
|
| AGPL couldn't be more in the spirit of OSS. The entire free
| software movement started to defend the _users_ freedom, not
| individual companies'.
| riiii wrote:
| Because you want to take their hard work, modify it and not
| share it back to the community?
|
| I'm not crying that "it's not for you".
| gregw2 wrote:
| So it looks like you don't use postgres extensions, so you can run
| this on an EC2 instance against an Aurora Postgres instance and
| dump files to S3 as Iceberg, right?
|
| And can you then have Glue Catalog auto-crawl them and expose
| them in Athena? Or are they DuckDB-managed Iceberg tables
| essentially?
| exAspArk wrote:
| Exactly! You can run it on any server connecting to any
| Postgres, without installing custom extensions (AWS Aurora
| supports only a limited number of extensions
| https://docs.aws.amazon.com/AmazonRDS/latest/AuroraPostgreSQ...).
|
| The Iceberg tables are created separately from the DuckDB query
| engine. So you should be able to read these Iceberg tables by
| using any other Iceberg-compatible tools and services like AWS
| Athena.
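|
| For example, reading the same tables from DuckDB's iceberg
| extension looks roughly like this (the S3 path is a placeholder,
| and S3 credentials need to be configured separately):
|
|     INSTALL iceberg;
|     LOAD iceberg;
|     SELECT count(*)
|     FROM iceberg_scan('s3://my-bucket/iceberg/public.orders');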
| jakozaur wrote:
| Cool. Every database or data source (e.g. CRM) should produce
| Iceberg format for you.
|
| Though I'm a little sceptical of embedding DuckDB. It is easy and
| better to isolate read/write paths, and that has a lot of other
| benefits.
| exAspArk wrote:
| Iceberg for the win!
|
| We actually separate read/write paths. BemiDB reads by leveraging
| DuckDB as a query engine. And it writes to Iceberg completely
| separately from DuckDB. I'm curious if that's what you
| imagined.
| jakozaur wrote:
| Ideally, I would love many places writing to the Iceberg
| catalogue and then using it as a data lake.
|
| In the data lake, I would have a gateway provisioning DuckDB
| on demand for each user.
| mrbluecoat wrote:
| I hadn't heard of Devbox before, so thanks for sharing
| exAspArk wrote:
| Haha, it's awesome for isolating project environments
| (languages, databases, etc.) without using docker
| globular-toast wrote:
| I don't get how this would do away with the need for some kind of
| ETL. Most apps use highly normalised schemas that are completely
| unsuitable for analytical users. Not to mention you wouldn't want
| to couple your app schema to your warehouse schema. Am I missing
| something? How would this replace traditional data warehousing?
| exAspArk wrote:
| Good point. For more complex scenarios, people would still be
| able to implement, for example, a Medallion Architecture to
| progressively improve data quality and structure. Because it is
| Postgres- and Iceberg-compatible (db and data), it's possible
| to bring in other more advanced data tools when needed to
| perform data transformation and movement. Currently, we see it
| as a Postgres read replica for analytics. But it's easy to
| imagine that in the future it could be used as a standalone OSS
| database on top of a data lakehouse with an open format in S3.
| lucasfcosta wrote:
| Amazing work, guys! Looking forward to seeing where BemiDB is
| going.
___________________________________________________________________
(page generated 2024-11-07 23:00 UTC)