[HN Gopher] Building an Open, Multi-Engine Data Lakehouse with S...
___________________________________________________________________
Building an Open, Multi-Engine Data Lakehouse with S3 and Python
Author : bradhe
Score : 33 points
Date : 2025-02-18 17:33 UTC (5 hours ago)
(HTM) web link (tower.dev)
| dogman123 wrote:
| i'm working on a project to do this with iceberg and sqlmesh
| executed via airflow at my job. sqlmesh seems really promising. i
| investigated multi-engine executions in dbt and it seems like you
| need to pay a lot of $$$ for it (multi-engine execution requires
| multiple dbt projects), and it is not included in dbt core.
| rockostrich wrote:
| Toby and the team at Tobiko really are a pleasure to work with.
| They have strong opinions but have shown a good amount of
| willingness to implement features as long as there's a strong
| general use case. We've been working with them for almost a
| year now and it's really interesting seeing an early-ish open
| source library from a start-up develop (and how much influence
| you can have over the direction if you work closely with the
| dev team).
| dogman123 wrote:
| that's great to hear. it mirrors my observational experience
| from being in their slack channel. i'm aware of the technical
| risks of being an early adopter of a product like this, but i
| must say part of me is excited to be on board early to help
| shape it from a user perspective. i'm still not totally
| bought in yet (still in mvp phase) but our use case as we
| scale almost requires multi-engine execution (athena, spark
| on EMR, duckdb) and it doesn't seem like anyone is doing it
| better.
| bradhe wrote:
| I'm one of the cofounders at Tower, posted this because I
| thought some people would be interested in the topic. Would be
| interested to know what Airflow is really...doing...for you
| here? Is it just an execution engine for your sqlmesh? Anyway,
| as we're trying to build out Tower would love to know more.
| dogman123 wrote:
| sqlmesh execution engine + cloud resource provisioning
|
| provision spark on emr or duckdb on beefy ec2 -> run sqlmesh
| -> wipe resources.
|
| i'm still in MVP phase of revamping my company's current data
| platform, so maybe there are better alternatives -- which i'd
| love to hear about.
| bradhe wrote:
| Cool, would be happy to talk a bit more about it. Want to
| shoot me an email (in bio)? Or if you sign up for our
| waitlist on https://tower.dev, I can reach out in the AM
| (we're in Germany)!
| datancoffee wrote:
| Python support for Iceberg seems to be the biggest unrealized
| opportunity right now. SQL support seems to be in good shape,
| with DuckDB and such, but Python support is still quite nascent.
___________________________________________________________________