hngopher.com

       [HN Gopher] Building an Open, Multi-Engine Data Lakehouse with S...
       ___________________________________________________________________
        
       Building an Open, Multi-Engine Data Lakehouse with S3 and Python
        
       Author : bradhe
       Score  : 33 points
       Date   : 2025-02-18 17:33 UTC (5 hours ago)
        
 (HTM) web link (tower.dev)
        
       | dogman123 wrote:
       | i'm working on a project to do this with iceberg and sqlmesh
       | executed via airflow at my job. sqlmesh seems really promising. i
       | investigated multi-engine execution in dbt and it seems like you
       | need to pay a lot of $$$ for it (multi-engine execution requires
       | multiple dbt projects) and it is not included in dbt core.
        
         | rockostrich wrote:
         | Toby and the team at Tobiko really are a pleasure to work with.
         | They have strong opinions but have shown a good amount of
         | willingness to implement features as long as there's a strong
         | general use case. We've been working with them for almost a
         | year now and it's really interesting seeing how an early-ish
         | open source library built by a start-up develops (and how
         | much influence you can have over the direction if you work
         | closely with the dev team).
        
           | dogman123 wrote:
           | that's great to hear. it mirrors my observational experience
           | from being in their slack channel. i'm aware of the technical
           | risks of being an early adopter of a product like this, but i
           | must say part of me is excited to be on board early to help
           | to shape it from a user perspective. i'm still not totally
           | bought in yet (still in mvp phase) but our use case as we
           | scale almost requires multi-engine execution (athena, spark
           | on EMR, duckdb) and it doesn't seem like anyone is doing it
           | better.
        
         | bradhe wrote:
         | I'm one of the cofounders at Tower, posted this because I
         | thought some people would be interested in the topic. Would be
         | interested to know what Airflow is really...doing...for you
         | here? Is it just an execution engine for your sqlmesh? Anyway,
         | as we're trying to build out Tower would love to know more.
        
           | dogman123 wrote:
           | sqlmesh execution engine + cloud resource provisioning
           | 
           | provision spark on emr or duckdb on beefy ec2 -> run sqlmesh
           | -> wipe resources.
           | 
           | i'm still in MVP phase of revamping my company's current data
           | platform, so maybe there are better alternatives -- which i'd
           | love to hear about.
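
A minimal sketch of the "provision -> run -> wipe" pattern described in the comment above, not the commenter's actual setup. The names `ephemeral_engine`, `run_pipeline`, and the `fake_*` stand-ins are hypothetical; in a real deployment the provision and teardown steps would call a cloud API, and the run step might shell out to the `sqlmesh run` CLI. The one point it illustrates is that teardown sits in a `finally` block, so resources are wiped even when the run fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def ephemeral_engine(
    provision: Callable[[], str],
    teardown: Callable[[str], None],
) -> Iterator[str]:
    """Provision a compute resource, yield its handle, always tear it down."""
    handle = provision()
    try:
        yield handle
    finally:
        # Runs on success and on failure, so nothing is left running.
        teardown(handle)


def run_pipeline(
    provision: Callable[[], str],
    run: Callable[[str], str],
    teardown: Callable[[str], None],
) -> str:
    """Provision -> run -> wipe, returning whatever the run step returns."""
    with ephemeral_engine(provision, teardown) as handle:
        return run(handle)


# Hypothetical stand-ins that only record what happened.
events: List[str] = []


def fake_provision() -> str:
    events.append("provision")
    return "cluster-1"


def fake_run(handle: str) -> str:
    events.append(f"run on {handle}")
    return "ok"


def fake_teardown(handle: str) -> None:
    events.append(f"wipe {handle}")


result = run_pipeline(fake_provision, fake_run, fake_teardown)
```

An orchestrator task (Airflow or otherwise) could wrap `run_pipeline` as a single unit, or split the three steps into separate tasks with the teardown configured to run regardless of upstream failure.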
        
             | bradhe wrote:
             | Cool, would be happy to talk a bit more about it. Want to
             | shoot me an email (in bio) or if you sign up for our
             | waitlist on https://tower.dev I can reach out in the AM
             | (we're in Germany)!
        
       | datancoffee wrote:
       | Python support for Iceberg seems to be the biggest unrealized
       | opportunity right now. SQL support seems to be in good shape,
       | with DuckDB and such, but Python support is still quite nascent.
        
       ___________________________________________________________________
       (page generated 2025-02-18 23:01 UTC)