[HN Gopher] Moving a billion Postgres rows on a $100 budget
___________________________________________________________________
Moving a billion Postgres rows on a $100 budget
Author : samaysharma
Score : 47 points
Date : 2024-02-21 19:54 UTC (3 hours ago)
(HTM) web link (blog.peerdb.io)
(TXT) w3m dump (blog.peerdb.io)
| _boffin_ wrote:
| How about 11b and a horrible python script over to parquet to
| your wee little NAS for your homelab?
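|
| (Roughly what that script might look like, assuming psycopg2
| and pyarrow are at hand; the table, columns, and NAS path are
| all made up:)
|
|     import psycopg2
|     import pyarrow as pa
|     import pyarrow.parquet as pq
|
|     BATCH = 1_000_000  # rows per fetch / Parquet write
|
|     conn = psycopg2.connect("host=localhost dbname=app user=app")
|     # A named (server-side) cursor streams rows in batches
|     # instead of pulling everything into client memory.
|     cur = conn.cursor(name="export")
|     cur.execute("SELECT id, created_at, payload FROM events")
|
|     writer = None
|     while True:
|         rows = cur.fetchmany(BATCH)
|         if not rows:
|             break
|         tbl = pa.table({
|             "id":         [r[0] for r in rows],
|             "created_at": [r[1] for r in rows],
|             "payload":    [r[2] for r in rows],
|         })
|         if writer is None:
|             writer = pq.ParquetWriter("/mnt/nas/events.parquet",
|                                       tbl.schema)
|         writer.write_table(tbl)
|
|     if writer is not None:
|         writer.close()
|     cur.close()
|     conn.close()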
| sss111 wrote:
| now we're talking about innovation
| twelfthnight wrote:
| If someone is price conscious, why move from postgres to
| snowflake?
| happytiger wrote:
| That was my first thought as well -- who exactly is the
| audience for this?
|
| It struck me that you could stay on Postgres _and save that
| $100_ by using the secret power of Open Source.
|
| Well said.
| __s wrote:
| Seems like the title would be more accurate if it specified
| Postgres to Snowflake.
|
| In the end, there are companies paying to use Snowflake, &
| despite what one may believe they aren't price oblivious.
| Keeping their application in Postgres is a cost optimization,
| while still relying on Snowflake for data warehouse
| integrations.
| saisrirampur wrote:
| Based on my experience working with Postgres users for a
| decade (ex-Citus/Microsoft), I don't think Postgres is there
| yet for every possible workload - e.g. medium to larger scale
| (1TB+) workloads in real-time analytics, data warehousing,
| search, etc. Sure, at smaller scales it is versatile enough to
| support almost any workload. That is why it is super common
| for companies to complement Postgres with other data stores. I
| don't mean to say that Postgres will never get there - I think
| it will! But I see that as more of a long-term prospect.
| xenator wrote:
| We operate with about 80 TB of data at the moment. It lives
| across several nodes and meta nodes (this is our own
| terminology). All Postgres.
|
| Recently we needed to move data from one DB to another, about
| 600M records. It is not the biggest chunk of the data, but we
| needed it on a different server because we use FTS a lot and
| didn't want to interrupt other operations on the previous
| server. It took 3 days and cost $0.
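|
| (For the curious: one zero-dollar way to do a move like this
| is to stream COPY output from the source server straight into
| the target server. A rough Python sketch - the connection
| strings and table name here are made up:)
|
|     import os
|     import threading
|
|     import psycopg2
|
|     SRC_DSN = "host=old-server dbname=app user=app"  # made up
|     DST_DSN = "host=fts-server dbname=app user=app"  # made up
|     TABLE = "documents"                              # made up
|
|     # Pipe: the source COPY writes into it, the target COPY
|     # reads from it, so rows never touch local disk.
|     r_fd, w_fd = os.pipe()
|
|     def pump():
|         # COPY the table out of the source into the pipe.
|         src = psycopg2.connect(SRC_DSN)
|         with os.fdopen(w_fd, "wb") as w:
|             src.cursor().copy_expert(
|                 f"COPY {TABLE} TO STDOUT", w)
|         src.close()
|
|     t = threading.Thread(target=pump)
|     t.start()
|
|     # COPY the same byte stream into the target server.
|     dst = psycopg2.connect(DST_DSN)
|     with os.fdopen(r_fd, "rb") as r:
|         dst.cursor().copy_expert(
|             f"COPY {TABLE} FROM STDIN", r)
|     dst.commit()
|     dst.close()
|     t.join()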
| saisrirampur wrote:
| Thanks for the context! I totally understand where you are
| coming from. Postgres can be moulded to work for many use
| cases, but it can take a good amount of effort to make that
| happen. For example, in your case, building and managing a
| sharded Postgres environment isn't straightforward - it
| requires quite a lot of time and expertise. Citus automated
| exactly this (sharding), but even it wasn't a fit for every
| workload.
| https://docs.citusdata.com/en/v12.1/get_started/what_is_citu...
| quadrature wrote:
| It's not about being price conscious, it's about finding an
| efficient way to replicate data into your data warehouse.
| saisrirampur wrote:
| OP here, thanks for chiming in. Yep, the blog talks about how
| efficient you can be while replicating data from Postgres to
| data warehouses. Also, just for some context, the inspiration
| behind writing this blog was a common concern from customers
| that data movement is expensive, plus the insanely high
| margins in this space.
| saisrirampur wrote:
| Thanks for chiming in. The ideas in this blog can be extended
| to other target data stores too, say ClickHouse.
| rplnt wrote:
| (edit: mostly offtopic observation follows)
|
| I only knew Snowflake as the ID generation algorithm, so I was
| a bit confused, but googling "snowflake db" showed me this
| blurb and now I'm even more confused.
|
| > Snowflake enables organizations to learn, build, and connect
| with their data-driven peers. Collaborate, build data apps &
| power diverse workloads in the ...
| quickslowdown wrote:
| Snowflake & Azure Synapse are competing products, if that
| helps.
| pryelluw wrote:
| Does the Elastic License fall under open source as defined by
| the OSI?
| ralusek wrote:
| If someone asked me how much it would cost to move a billion
| Postgres rows, I would say "probably under $100."
|
| I just had to move 500 million "rows" into S3, and it came in at
| about $100. I would expect S3 to be more expensive.
| hipadev23 wrote:
| > Moving 1 billion rows is no easy task
|
| This isn't an accurate premise. Modern OLAP databases make
| dealing with billions to trillions of rows manageable, including
| on a single server. Exporting "select * from table" from an OLTP
| such as Postgres or MySQL into an OLAP is trivial and quite fast,
| and if 100M rows/sec on commodity servers isn't fast enough,
| there's always performance tuning [1].
|
| [1] https://altinity.com/blog/loading-100b-rows-in-minutes-in-
| al...
| saisrirampur wrote:
| That sentence is meant in the context of the blog, i.e. moving
| a billion rows across data stores. We will edit it to make
| that clearer. Thanks for the feedback.
| hipadev23 wrote:
| I can't speak for Snowflake, but on ClickHouse it's quite
| literally [1]:
|
|     insert into new_table
|     select * from postgresql('postgres:5432', 'db', 'table',
|                              'user', 'pass');
|
| I assume it's similarly easy on Snowflake, Databricks,
| SingleStore, and the rest.
|
| [1] https://clickhouse.com/docs/en/sql-reference/table-functions...
| tharakam wrote:
| I'm confused. Everything sounds very expensive to me.
|
| The last table, which compares it with the other vendors, is
| surprising. Even Stitch Data (the cheapest) costs $1 to move
| 240K records (1B rows / $4,166.67 total = ~240K rows per
| dollar). Is this real?
|
| So their solution costs $1 to process 13.6M records. Sounds
| like that is not very share-worthy.
|
| What am I missing here?
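|
| (Spelling the per-dollar arithmetic out - assuming the totals
| behind those rates are roughly $4,166.67 for Stitch Data and
| roughly $73.50 for the blog's setup, the latter inferred from
| the 13.6M-records-per-dollar figure:)
|
|     ROWS = 1_000_000_000
|     # assumed total cost in dollars for moving 1B rows
|     totals = {"Stitch Data": 4166.67, "blog setup": 73.50}
|     for name, dollars in totals.items():
|         rate = ROWS / dollars
|         print(f"{name}: ~{rate / 1e6:.2f}M rows per dollar")
|
|     # Stitch Data: ~0.24M rows per dollar (i.e. ~240K)
|     # blog setup: ~13.61M rows per dollar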
| jmholla wrote:
| This was a disappointing article. I was expecting it to
| explore validating the integrity and consistency of the data,
| but that just seems to be handwaved away by a short section
| saying PeerDB handles it. This is especially disappointing
| since the article calls that out as one of the cruxes that
| make this so difficult.
___________________________________________________________________
(page generated 2024-02-21 23:00 UTC)