[HN Gopher] Pg_lake: Postgres with Iceberg and data lake access
___________________________________________________________________
Pg_lake: Postgres with Iceberg and data lake access
Author : plaur782
Score : 241 points
Date : 2025-11-04 16:12 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| ozgune wrote:
| This is huge!
|
| When people ask me what's missing in the Postgres market, I used
| to tell them "open source Snowflake."
|
| Crunchy's Postgres extension is by far the most advanced
| solution on the market.
|
| Huge congrats to Snowflake and the Crunchy team on open sourcing
| this.
| gigatexal wrote:
| Honestly, just pay Snowflake for the amazing DB and ecosystem
| it is, and then go build cool stuff. Unless your value-add to
| customers is infra, let them handle all that.
| ayhanfuat wrote:
| With S3 Table Buckets, Cloudflare R2 Data Catalog and now this,
| Iceberg seems to be winning.
| dkdcio wrote:
| I was going to ask if you could then put DuckDB over Postgres for
| the OLAP query engine -- looks like that's already what it does!
| very interesting development in the data lake space alongside
| DuckLake and things
| pgguru wrote:
| You create foreign tables in postgres using either the
| pg_lake_table wrapper or pg_lake_iceberg.
|
| Once those tables exist, queries against them either push down
| entirely to the remote tables, using a Custom Scan to execute
| and pull results back into postgres, or we transform/extract
| the pieces that can be executed remotely using a FDW and then
| treat it as a tuple source.
|
| In both cases, the user does not need to know any of the
| details and just runs queries inside postgres as they always
| have.
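|
| As a rough sketch of what that looks like from the user's side
| (the server name and option below are assumptions for
| illustration, not the exact pg_lake spelling; see the docs for
| the real syntax):
|
|     -- Hypothetical: expose Parquet data in S3 as a foreign table
|     -- and query it like any other Postgres table.
|     CREATE FOREIGN TABLE events_raw (
|         event_type text,
|         created_at timestamptz
|     )
|     SERVER pg_lake
|     OPTIONS (path 's3://my-bucket/events/*.parquet');
|
|     -- Eligible pieces of this query get pushed down to DuckDB.
|     SELECT count(*) FROM events_raw WHERE event_type = 'signup';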
| spenczar5 wrote:
| I think I don't understand postgres enough, so forgive this
| naive question, but what does pushing down to the remote
| tables mean? Does it allow parallelism? If I query a very
| large iceberg table, will this system fan the work out to
| multiple duckdb executors and gather the results back in?
| pgguru wrote:
| In any query engine you can execute the same query in
| different ways. The more restrictions that you can apply on
| the DuckDB side the less data you need to return to
| Postgres.
|
| For instance, you could compute a `SELECT COUNT(*) FROM
| mytable WHERE first_name = 'David'` by querying all the
| rows from `mytable` on the DuckDB side, returning all the
| rows, and letting Postgres itself count the number of
| results, but this is extremely inefficient, since that same
| value can be computed remotely.
|
| In a simple query like this with well-defined semantics
| that match between Postgres and DuckDB, you can run the
| query entirely on the remote side, just using Postgres as a
| go-between.
|
| Not all functions and operators work in the same way
| between the two systems, so you cannot just push things
| down unconditionally; `pg_lake` does some analysis to see
| what can run on the DuckDB side and what needs to stick
| around on the Postgres side.
|
| There is only a single "executor" from the perspective of
| pg_lake, but the pgduck_server embeds a multi-threaded
| duckdb instance.
|
| How DuckDB executes the portion of the query it gets is up
| to it; it often will involve parallelism, and it can use
| metadata about the files it is querying to speed up its own
| processing without even needing to visit every file. For
| instance, it can look at the `first_name` in the incoming
| query and just skip any files which do not have a
| min_value/max_value that would contain that.
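|
| One way to see which case you hit is EXPLAIN (the plan node
| names here are an assumption, since they depend on pg_lake's
| implementation details):
|
|     -- The example from above: with matching semantics the whole
|     -- aggregate can run on the DuckDB side, and Postgres just
|     -- relays the result.
|     EXPLAIN (VERBOSE)
|     SELECT count(*) FROM mytable WHERE first_name = 'David';
|     -- Expect a Custom Scan node when the query is fully pushed
|     -- down, rather than a plain scan plus a Postgres aggregate.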
| spenczar5 wrote:
| Thanks for the detailed answer!
|
| I use DuckDB today to query Iceberg tables. In some
| particularly gnarly queries (huge DISTINCTs, big sorts,
| even just selects that touch extremely heavy columns) I
| have sometimes run out of memory in that DuckDB instance.
|
| I run on hosts without much memory because they are
| cheap and easy to launch, giving me isolated query
| parallelism, which is hard to achieve on a single giant
| host.
|
| To the extent that it's possible, I dream of being able to
| spread those gnarly OOMing queries across multiple hosts;
| perhaps the DISTINCTs can be merged for example. But this
| seems like a pretty complicated system that needs to be
| deeply aware of Iceberg partitioning ("hidden" in
| pg_lake's language), right?
|
| Is there some component in the postgres world that can
| help here? I am happy to continue over email, if you
| prefer, by the way.
| pgguru wrote:
| Well, dealing with large analytics queries will always
| perform better with larger amounts of memory... :D You
| can perhaps tune things based on the amount of system
| memory (IME 80% is what DuckDB targets if not otherwise
| configured). Your proposed system does sound like it
| introduces quite a bit of complexity that would be better
| served just by using hosts with more memory.
|
| As far as Iceberg is concerned, DuckDB has its own
| implementation, but we do not use that; pg_lake has its
| own iceberg implementation. The partitioning is "hidden"
| because it is separated out from the schema definition
| itself and can be changed gradually without the query
| engine needing to care about the details of how things
| are partitioned at read time. (For writes, we respect
| the latest partitioning spec and always write according
| to that.)
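|
| For what it's worth, DuckDB itself exposes a memory_limit
| setting; the value below is just an example, and whether
| pgduck_server passes it through unchanged is an assumption:
|
|     -- Cap DuckDB's memory usage instead of the ~80%-of-RAM
|     -- default.
|     SET memory_limit = '8GB';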
| beoberha wrote:
| Curious why pgduck_server is a totally separate process?
| dkdcio wrote:
| from the README:
|
| > This separation also avoids the threading and memory-safety
| limitations that would arise from embedding DuckDB directly
| inside the Postgres process, which is designed around process
| isolation rather than multi-threaded execution. Moreover, it
| lets us interact with the query engine directly by connecting
| to it using standard Postgres clients.
| beoberha wrote:
| Thanks! Didn't scroll down far enough
| rmnclmnt wrote:
| The README explains it:
|
| > This separation also avoids the threading and memory-safety
| limitations that would arise from embedding DuckDB directly
| inside the Postgres process, which is designed around process
| isolation rather than multi-threaded execution. Moreover, it
| lets us interact with the query engine directly by connecting
| to it using standard Postgres clients.
| pgguru wrote:
| Beyond what has been pointed out from the README, also:
|
| - Separation of concerns, since with a single external process
|   we can share object store caches without complicated locking
|   dances between multiple processes.
| - Memory limits are easier to reason about with a single
|   external process.
| - Postgres backends end up being more robust, as you can
|   restart the pgduck_server process separately.
| dharbin wrote:
| Why would Snowflake develop and release this? Doesn't this
| cannibalize their main product?
| kentm wrote:
| It's not going to scale as well as Snowflake, but it gets you
| into an Iceberg ecosystem which Snowflake can ingest and
| process at scale. Analytical data systems are typically
| trending toward heterogeneous compute with a shared storage backend
| -- you have large, autoscaling systems to process the raw data
| down to something that is usable by a smaller, cheaper query
| engine supporting UIs/services.
| hobs wrote:
| But if you are used to this type of compute per dollar what
| on earth would make you want to move to Snowflake?
| kentm wrote:
| Different parts of the analytical stack have different
| performance requirements and characteristics. Maybe none of
| your stack needs it and so you never need Snowflake at all.
|
| More likely, you don't need Snowflake to process queries
| from your BI tools (Mode, Tableau, Superset, etc), but you
| _do_ need it to prepare data for those BI tools. It's
| entirely possible that you have hundreds of terabytes, if
| not petabytes, of input data that you want to pare down to
| < 1 TB datasets for querying, and Snowflake can chew
| through those datasets. There's also third party
| integrations and things like ML tooling that you need to
| consider.
|
| You shouldn't really consider analytical systems the same
| as a database backing a service. Analytical systems are
| designed to funnel large datasets that cover the entire
| business (cross cutting services and any sharding you've
| done) into subsequently smaller datasets that are cheaper
| and faster to query. And you may be using different compute
| engines for different parts of these pipelines; there's a
| good chance you're not using _only_ Snowflake but Snowflake
| and a bunch of different tools.
| barrrrald wrote:
| One thing I admire about Snowflake is a real commitment to
| self-cannibalization. They were super out front with Iceberg
| even though it could disrupt them, because that's what
| customers were asking for and they're willing to bet they'll
| figure out how to make money in that new world
|
| Video of their SVP of Product talking about it here:
| https://youtu.be/PERZMGLhnF8?si=DjS_OgbNeDpvLA04&t=1195
| qaq wrote:
| Have you interacted with Snowflake teams much? We are using
| external iceberg tables with snowflake. Every interaction
| pretty much boils down to "you really should not be using
| iceberg, you should be using snowflake for storage". It's also
| pretty obvious some things are strategically not implemented
| to push you very strongly in that direction.
| ozkatz wrote:
| Out of curiosity - can you share a few examples of
| functionality currently not supported with Iceberg but that
| works well with their internal format?
| qaq wrote:
| Even partition elimination is pretty primitive. For the query
| optimizer, Iceberg is really not a primary target. The overall
| interaction, even with technical people, gives off a strong
| "this is a sales org that happens to own an OLAP db product"
| vibe.
| barrrrald wrote:
| Not surprised - this stuff isn't fully mature yet. But I
| interact with their team a lot and know they have a
| commitment to it (I'm the other guy in that video)
| 999900000999 wrote:
| It'll probably be really difficult to set up.
|
| If it's anything like Supabase, you'll question the existence
| of God when trying to get it to work properly.
|
| You pay them to make it work right.
| pgguru wrote:
| For testing, we at least have a Dockerfile to automate the
| setup of the pgduck_server and a minio instance so it Just
| Works(tm) with the extensions installed in your local
| Postgres cluster (after installing the extensions).
|
| The configuration mainly involves just defining the default
| iceberg location for new tables, pointing it to the
| pgduck_server, and providing the appropriate auth/secrets for
| your bucket access.
| mslot wrote:
| When we first developed pg_lake at Crunchy Data and defined
| GTM, we considered whether it could be a Snowflake competitor,
| but we quickly realised that did not make sense.
|
| Data platforms like Snowflake are built as a central place to
| collect your organisation's data, do governance, large scale
| analytics, AI model training and inference, share data within
| and across orgs, build and deploy data products, etc. These are
| not jobs for a Postgres server.
|
| Pg_lake foremost targets Postgres users who currently need
| complex ETL pipelines to get data in and out of Postgres, and
| accidental Postgres data warehouses where you ended up
| overloading your server with slow analytical queries, but you
| still want to keep using Postgres.
| mberning wrote:
| Does anyone know how access control works to the underlying s3
| objects? I didn't see anything regarding grants in the docs.
| pgguru wrote:
| Hi, one of the developers here. You define credentials that can
| access the S3 buckets and use those as DuckDB secrets, usually
| in an init script for pgduck_server. (You can see some examples
| of this in the testing framework.)
|
| I'll see if we can improve the docs or highlight that part
| better, if it is already documented--we did move some things
| around prior to release.
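|
| For reference, an S3 secret in a DuckDB init script looks
| roughly like this (placeholder values; whether pgduck_server
| accepts exactly this form is an assumption):
|
|     CREATE SECRET lake_s3 (
|         TYPE S3,
|         KEY_ID 'AKIA...',
|         SECRET '...',
|         REGION 'us-east-1'
|     );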
| onderkalaci wrote:
| Maybe this could help: https://github.com/Snowflake-
| Labs/pg_lake?tab=readme-ov-file...
| mberning wrote:
| Interesting. I am working on a project to integrate access
| management to iceberg/parquet files for sagemaker.
| Controlling what users logged into sagemaker studio have
| access to in s3. It's fine using static policies for mvp, but
| eventually it needs to be dynamic and integrated into
| enterprise iam tools. Those tools generally have great
| support for managing sql grants. Not so much for s3 bucket
| policies.
| pgguru wrote:
| DuckDB secrets management supports custom IAM roles and the
| like; at this point we are basically treating the
| pgduck_server external system as a black box.
|
| For the postgres grants themselves, we provide privs to
| allow read/write to the remote tables, which is done via
| granting the `pg_lake_read`, `pg_lake_write` or
| `pg_lake_read_write` grants. This is a blanket all-or-
| nothing grant, however, so would need some design
| work/patching to support per-relation grants, say.
|
| (You could probably get away with making roles in postgres
| that have the appropriate read/write grant, then only
| granting those specific roles to a given relation, so it's
| probably doable though a little clunky at the moment.)
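|
| In other words, something along these lines (the role names
| 'analyst' and 'etl_runner' are hypothetical):
|
|     -- Blanket lake access via the built-in grants:
|     GRANT pg_lake_read TO analyst;
|     GRANT pg_lake_read_write TO etl_runner;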
| mslot wrote:
| There are Postgres roles for read/write access to the S3 object
| that DuckDB has access to. Those roles can create tables from
| specific files or at specific locations, and can then assign
| more fine-grained privileges to other Postgres roles (e.g. read
| access on a specific view or table).
| chaps wrote:
| I love postgres and have created my own "data lake" sorta systems
| -- what would this add to my workflows?
| gajus wrote:
| Man, we are living in the golden era of PostgreSQL.
| anentropic wrote:
| When Snowflake bought Crunchy Data I was hoping they were going
| to offer a managed version of this
|
| It's great that I can run this locally in a Docker container, I'd
| love to be able to run a managed instance on AWS billed through
| our existing Snowflake account
| oulipo2 wrote:
| Interesting! How does it compare with DuckLake?
| mslot wrote:
| You could say:
|
| With DuckLake, the query frontend and query engine are DuckDB,
| and Postgres is used as a catalog in the background.
|
| With pg_lake, the query frontend and catalog are Postgres, and
| DuckDB is used as a query engine in the background.
|
| Of course, they also use different table formats (though
| similar in data layer) with different pros and cons, and the
| query frontends differ in significant ways.
|
| An interesting thing about pg_lake is that it is effectively
| standalone, no external catalog required. You can point Spark
| et al. directly to Postgres with pg_lake by using the Iceberg
| JDBC driver.
| inglor wrote:
| This is really nice, though looking at the code, a lot of the
| postgres types are missing, as well as a lot of the newer
| parquet logical types - but this is a great start and a nice
| use of FDW.
| inglor wrote:
| Also, any planned support for more catalogs?
| pgguru wrote:
| I think we have recently merged (or are getting ready to
| merge) REST catalog support, so that will open some things up
| in this department.
| pgguru wrote:
| Hi, what types are you expecting to see that aren't supported?
| I believe we had support for most/all builtin postgres types.
| iamcreasy wrote:
| Very cool! Was there any inherent limitation with postgresql or
| its extension system that forced pg_lake to use duckdb as the
| query engine?
| mslot wrote:
| I gave a talk on that at Data Council, then still discussing
| the pg_lake extensions as part of Crunchy Data Warehouse.
|
| https://youtu.be/HZArjlMB6W4?si=BWEfGjMaeVytW8M1
|
| Also, nicer recording from POSETTE:
| https://youtu.be/tpq4nfEoioE?si=Qkmj8o990vkeRkUa
|
| It comes down to the trade-offs made by operational and
| analytical query engines being fundamentally different at every
| level.
| pgguru wrote:
| DuckDB provided a lot of infrastructure for reading/writing
| parquet files and other common formats here. It is also
| inherently multi-threaded and supports being embedded in a
| larger program (similar to SQLite), which made it a good basis
| for something that could work outside of the traditional
| process model of Postgres.
|
| Additionally, the postgres extension system supports most of
| the current project, so I wouldn't say it was forced in this
| case; it was a design decision. :)
| spenczar5 wrote:
| Very cool. One question that comes up for me is whether pg_lake
| expects to control the Iceberg metadata, or whether it can be
| used purely as a read layer. If I make schema updates and
| partition changes to iceberg directly, without going through
| pg_lake, will pg_lake's catalog correctly reflect things right
| away?
| pgguru wrote:
| We have some level of external iceberg table read-only support,
| but it is limited at the moment. See this example/caveat:
| https://github.com/Snowflake-Labs/pg_lake/blob/main/docs/fil...
| mslot wrote:
| You can use it as a read layer for a specific metadata JSON
| URL or a table in a REST catalog. The latter got merged quite
| recently and is not yet in the docs.
| boshomi wrote:
| Why not just use Ducklake?[1] That reduces complexity[2] since
| only DuckDB and PostgreSQL with pg_duckdb are required.
|
| [1] https://ducklake.select/
|
| [2] DuckLake - The SQL-Powered Lakehouse Format for the Rest of
| Us by Prof. Hannes Muhleisen:
| https://www.youtube.com/watch?v=YQEUkFWa69o
| pgguru wrote:
| Boils down to design decisions; see:
| https://news.ycombinator.com/item?id=45813631
| mslot wrote:
| DuckLake is pretty cool, and we obviously love everything the
| DuckDB team is doing. It's what made pg_lake possible, and what
| motivated part of our team to step away from Microsoft/Citus.
|
| DuckLake can do things that pg_lake cannot do with Iceberg, and
| DuckDB can do things Postgres absolutely can't (e.g. query data
| frames). On the other hand, Postgres can do a lot of things
| that DuckDB cannot do. For instance, it can handle >100k single
| row inserts/sec.
|
| Transactions don't come for free. Embedding the engine in the
| catalog rather than the catalog in the engine enables
| transactions across analytical and operational tables. That way
| you can do a very high rate of writes in a heap table, and
| transactionally move data into an Iceberg table.
|
| Postgres also has a more natural persistence & continuous
| processing story, so you can set up pg_cron jobs and use
| PL/pgSQL (with heap tables for bookkeeping) to do
| orchestration.
|
| There's also the interoperability aspect of Iceberg being
| supported by other query engines.
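|
| A minimal sketch of that transactional hand-off (table names
| are hypothetical; assumes 'events' is an Iceberg table managed
| by pg_lake and 'events_staging' is a plain heap table):
|
|     BEGIN;
|     WITH moved AS (
|         DELETE FROM events_staging RETURNING *
|     )
|     INSERT INTO events SELECT * FROM moved;
|     COMMIT;
|
| The pg_cron example further down the thread is the same pattern
| run on a schedule.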
| jabr wrote:
| How does this compare to https://www.mooncake.dev/pgmooncake?
| It seems there are several projects like this now, with each
| taking a slightly different approach optimized for different
| use cases?
| j_kao wrote:
| FYI the mooncake team was acquired by Databricks so it's
| basically vendors trying to compete on features now :)
| mslot wrote:
| Definitely similar goals, from the Mooncake author:
| https://news.ycombinator.com/item?id=43298145
|
| I think pg_mooncake is still relatively early stage.
|
| There's a degree of maturity to pg_lake resulting from our
| team's experience working on extensions like Citus,
| pg_documentdb, pg_cron, and many others in the past.
|
| For instance, in pg_lake all SQL features and transactions
| just work, the hybrid query engine can delegate different
| fragments of the query into DuckDB if the whole query
| cannot be handled, and having a robust DuckDB integration
| with a _single DuckDB instance_ (rather than 1 per session)
| in a separate server process helps make it production-
| ready. It is used in heavy production workloads already.
|
| Making no compromises on Postgres features is especially hard
| to achieve, but after a decade of trying to get there with
| Citus, we knew we had to get that right from day 1.
|
| Basically, we could speed run this thing into a
| comprehensive, production-ready solution. I think others
| will catch up, but we're not sitting still either. :)
| anktor wrote:
| What does data frames mean in this context? I'm used to them
| in spark or pandas but does this relate to something in how
| duckDB operates or is it something else?
| dunefox wrote:
| It's a Python data frame (e.g. pandas), which DuckDB can
| query directly.
| mritchie712 wrote:
| > For instance, it can handle >100k single row inserts/sec.
|
| DuckLake already has data inlining for the DuckDB catalog; it
| seems this will be possible once it's supported in the pg
| catalog.
|
| > Postgres also has a more natural persistence & continuous
| processing story, so you can set up pg_cron jobs and use
| PL/pgSQL (with heap tables for bookkeeping) to do
| orchestration.
|
| This is true, but it's not clear where I'd use this in
| practice. e.g. if I need to run a complex ETL job, I probably
| wouldn't do it in pg_cron.
| derefr wrote:
| > This is true, but it's not clear where I'd use this in
| practice. e.g. if I need to run a complex ETL job, I
| probably wouldn't do it in pg_cron.
|
| Think "tiered storage."
|
| See the example under https://github.com/Snowflake-
| Labs/pg_lake/blob/main/docs/ice...:
|
|     select cron.schedule('flush-queue', '* * * * *', $$
|       with new_rows as (
|         delete from measurements_staging returning *
|       )
|       insert into measurements
|       select * from new_rows;
|     $$);
|
| The "continuous ETL" process the GP is talking about would
| be exactly this kind of thing, and just as trivial. (In
| fact it would _be_ this exact same code, just with your
| mental model flipped around from "promoting data from a
| staging table into a canonical iceberg table" to "evicting
| data from a canonical table into a historical-archive
| table".)
| darth_avocado wrote:
| This is so cool! We have files in Iceberg that we move data
| to/from a PG db using a custom utility. It always felt more
| like a workaround that didn't fully use the capabilities of
| both technologies. Can't wait to try this out.
| fridder wrote:
| I love this. There are definitely shops where the data is a bit
| too much for postgres but something like Snowflake would be
| overkill. Wish this was around a couple years ago lol
| drchaim wrote:
| More integrations are great. Anyway, the "this is awesome" moment
| (for me) will be when you can mix row- and column-oriented
| tables in Postgres, a bit like Timescale but native Postgres and
| well done. Hopefully one day.
| pgguru wrote:
| Hypertables definitely had array columns auto-expanding
| with the custom node type. Not sure what else it would look
| like for what you describe.
|
| That said, don't sleep on the "this is awesome" parts in this
| project... my personal favorite is the automatic schema
| detection:
|
| ```
| CREATE TABLE my_iceberg_table () USING iceberg
|   WITH (definition_from = 's3://bucket/source_data.parquet');
| ```
| harisund1990 wrote:
| This is cool to see! Looks like it competes with pg_mooncake,
| which Databricks acquired. But how is this different from
| pg_duckdb?
| apexalpha wrote:
| I'm not super into the Data sphere but my company relies heavily
| on Snowflake which is becoming an issue.
|
| This announcement seems huge to me, no?!
|
| Is this really an open source Snowflake covering most use cases?
| taude wrote:
| there are also plenty of other options for warehouse/compute
| processing of iceberg data storage.
|
| I think this is a pretty big deal, though.
|
| Snowflake does a lot more, though, especially around sharing
| data across company boundaries.
| lysecret wrote:
| Nice, does this also allow me to write to parquet from my Postgres
| table?
| lysecret wrote:
| Omg yes, it works. This would have made my past job so much
| easier.
| mslot wrote:
| Yes, just COPY table TO 's3://mybucket/data.parquet'
|
| Or COPY table TO STDOUT WITH (format 'parquet') if you need it
| on the client side.
| claudeomusic wrote:
| Can someone dumb this down a bit for a non data-engineer? Hard to
| fully wrap my head around who this is/isn't best suited for.
| lysecret wrote:
| One use case we have (we built it ourselves) is to periodically
| offload data from Postgres to partitioned lakehouse data on
| GCS. The way I see it, this can now be done with a single
| query. Another one is the other way around: using Postgres as
| a query engine, or merging offloaded data with your live data.
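|
| E.g. something in the spirit of the COPY example earlier in
| the thread (the bucket path is a placeholder, and whether
| gs:// URLs behave the same as the s3:// example is an
| assumption):
|
|     COPY (
|       SELECT * FROM orders
|       WHERE created_at < now() - interval '90 days'
|     ) TO 'gs://my-bucket/orders_archive.parquet';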
| scirob wrote:
| Crunchy Data did it first :) but nice to get more options
| mslot wrote:
| It's the same team and same project :). Crunchy Data was
| acquired by Snowflake.
| scirob wrote:
| Holy shit thats amazing!! Congrats to the team
| iamcreasy wrote:
| If anyone from Supabase is reading, it would be awesome to have
| this extension!
| pjd7 wrote:
| This is awesome, I will be trying this out in the coming months.
| It's just made it to the top of my R&D shortlist for things that
| could massively simplify our data stack for a b2b saas.
___________________________________________________________________
(page generated 2025-11-04 23:00 UTC)