[HN Gopher] Show HN: Pontoon - Open-source customer data syncs
___________________________________________________________________
Show HN: Pontoon - Open-source customer data syncs
Hi HN! We're Alex and Kalan, the creators of Pontoon
(https://github.com/pontoon-data/Pontoon). Pontoon is an open-
source data export platform that makes it really easy to create
data syncs and send data to your enterprise customers. Check out
our demo here: https://app.storylane.io/share/onova7c23ai6 or try
it out with docker: https://pontoon-data.github.io/Pontoon/getting-
started/quick... In our prior roles as data engineers, we both
felt the pain of data APIs. We either had to spend weeks building
out data pipelines in-house or spend a lot on ETL tools like
Fivetran (https://www.fivetran.com/). However, a few companies
offered data syncs directly into our data warehouse (e.g.
Redshift, Snowflake), and when that was an option, we always
chose it. That led us to wonder: why don't more companies offer
data syncs? It turns out that building reliable cross-cloud data
syncs is difficult. That's why we built Pontoon.
We designed Pontoon to be:

- Easily deployed: we provide a single, self-contained Docker
  image for easy deployment, and Docker Compose for larger
  workloads (https://pontoon-data.github.io/Pontoon/getting-started/quick...)
- Warehouse-native: we support syncing to/from Snowflake,
  BigQuery, Redshift, and Postgres
- Cross-cloud: sync from BigQuery to Redshift, Snowflake to
  BigQuery, Postgres to Redshift, etc.
- Developer friendly: data syncs can also be built via the API
- Open source: Pontoon is free for anyone to use

Under the hood, we use Apache Arrow (https://arrow.apache.org/)
to move data between sources and destinations. Arrow is very
performant - we wanted a library that could handle the scale of
moving millions of records per minute.

In the shorter term, there are several improvements we want to
make:

- Support for dbt models, to make adding data models easier
- UX improvements like better error messaging and monitoring of
  data syncs
- More sources and destinations (S3, GCS, Databricks, etc.)
- A more developer-friendly API (it's currently tied pretty
  closely to the front end)

In the longer term, we want to make data sharing as easy as
possible. As data engineers, we sometimes felt like second-class
citizens with how we were told to get the data we needed - "just
loop through this API 1000 times", "you probably won't get rate
limited" (we did), "we can schedule an email to send you a CSV
every day". We want to change how modern data sharing is done and
make it simple for everyone. Give it a try:
https://github.com/pontoon-data/Pontoon. Cheers!
Author : alexdriedger
Score : 33 points
Date : 2025-08-01 15:28 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| melson wrote:
| Is it like an offline sync?
| kalanm wrote:
| Kalan here, syncs are batch based and scheduled, similar to
| conventional ETL / data pipelines
| conormccarter wrote:
| Congrats on the launch! I'm one of the cofounders of Prequel (I
| saw our name in the feature grid - small nit: we do support self-
| hosting). This is definitely a problem worth solving - the market
| is still early and I'd bet the rising tide will help all of us
| convince more teams to support this capability. I'm not a lawyer,
| but the latest EU Data Act might even make it an obligation for
| some software vendors?
|
| Maybe I can save you a headache: Snowflake is actively
| deprecating single-factor username/password auth in favor of key
| pair auth, so the faster you support that, the fewer mandatory
| migrations you'll be emailing users about.
| kalanm wrote:
| Thanks! Kalan here, I appreciate the nit! PR is already merged.
| Definitely agreed on the market, it seems like there's a ton of
| opportunity. And thanks for the heads up re Snowflake auth!
| we're actively working that one, and a few other auth modes for
| Redshift and BQ as well.
| hiatus wrote:
| What does the row "First-class Data Products" in the comparison
| table entail?
| alexdriedger wrote:
| Great question. We think of data products as multi-tenant
| tables that are created with the intention of sending that data
| to a customer.
|
| To compare with an ETL tool like Airbyte, it's really easy to
 | sync a full table somewhere with Airbyte, but it gets more
| complicated if you have a multi-tenant table, where you want to
| sync only a subset of data to a customer.
|
| When you're setting up a data model with Pontoon, you just
| define which column has the customer id (we call it a tenant
| id) and it handles sending the right data to the right
| customer.
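To make the tenant-id mechanism concrete, here's a minimal sketch
using sqlite3 as a stand-in warehouse (the table, column, and
tenant names are invented for the example): the sync filters the
multi-tenant table on the designated tenant-id column, so each
customer only ever receives their own rows.

```python
import sqlite3

# In-memory stand-in for a multi-tenant source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage_events (tenant_id TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO usage_events VALUES (?, ?)",
    [("acme", "login"), ("acme", "export"), ("globex", "login")],
)

def sync_for_tenant(tenant_id):
    # A per-customer sync only ever selects that tenant's rows.
    rows = conn.execute(
        "SELECT event FROM usage_events WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchall()
    return [event for (event,) in rows]

print(sync_for_tenant("acme"))  # → ['login', 'export']
```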
| a2128 wrote:
| Not to be confused with Pontoon, a self-hostable translation
| platform made by Mozilla: https://github.com/mozilla/pontoon
| alexdriedger wrote:
| Another great self-hostable platform. I'm not sure where they
| got their name from though, translations don't have a
| connection to lakes like data does...
___________________________________________________________________
(page generated 2025-08-01 23:01 UTC)