[HN Gopher] Supabase Wrappers: A framework for building Postgres...
       ___________________________________________________________________
        
       Supabase Wrappers: A framework for building Postgres foreign data
       wrappers
        
       Author : yurisagalov
       Score  : 103 points
       Date   : 2022-12-15 15:59 UTC (7 hours ago)
        
 (HTM) web link (supabase.com)
 (TXT) w3m dump (supabase.com)
        
       | nixalaister wrote:
       | I'm excited about Wrappers because of an idea born in the GraphQL
       | community: using GraphQL as the point of integration for many
       | different services in your app. A use case GraphQL seemed made
       | for. It's not without problems, though, such as when you need
       | data from one of those services inside your database. So this
       | takes that idea and moves it one level down into the database.
       | You still get the benefits of a unified API (through REST or
       | GraphQL) without the same limitations. I think there couldn't be
       | a more ideal point of integration!
       | 
       | disclaimer: I am a Supabase employee, but these views are my own.
        
         | jensneuse wrote:
         | We (https://wundergraph.com) do exactly this, API integration
         | with GraphQL as the integration layer. Integrating APIs is
         | messy. It could require custom middleware, etc... Glueing
         | something together is the easy part. How do you mock it? How do
         | you test it? Add custom logic? Doing it at the API layer allows
         | you to seamlessly scale your serverless workloads. Doing all
         | this work in the database means that we're eating up resources
         | for tasks that the database is not designed to do. Just because
         | FDW exists doesn't mean we should use it for this use case.
        
           | oliverrice wrote:
           | > How do you mock it? How do you test it?
           | 
           | Stripe and Firebase both offer test endpoints for their
           | services. The foreign data wrappers for each allow subbing in
           | a user defined endpoint in their `options` so thats how I'd
           | recommend testing the two that have released. Some of the
           | pre-release integrations can be spun up locally in docker
           | e.g. clickhouse. The FDWs for those similarly take an
           | arbitrary connection string making it pretty straightforward
           | to write tests. Here's an example from the repo https://githu
           | b.com/supabase/wrappers/blob/5fac8afb62e6e8362b...
           | 
           | > Add custom logic?
           | 
           | Views!
           | 
           | > tasks that the database is not designed to do
           | 
           | The C API for foreign data wrappers is baked right into
           | Postgres proper. They've been around since 2013 and are
           | pretty battle hardened at this point. Supabase Wrappers
           | exposes that API in rust so users can write FDWs without
           | worrying about accidentally tanking the Postgres process. Its
           | more about making a great Postgres feature more accessible
           | than tricking the database into doing anything new.
        
             | kiwicopple wrote:
             | another possibility for application developers - mocking
             | can be done at the "application" level. For example, using
             | something like Jest, or Supabase can build mocking into
             | client libs.
             | 
             | I'm not clear how mocking in the middleware is materially
             | different from this approach (except perhaps you are
             | testing network connectivity too).
        
       | atonse wrote:
       | This is wonderful!
       | 
       | I love the pragmatism behind tools like osquery and steampipe
       | that expose a lot of APIs as database tables. It makes these
       | datasets available to non-programmers that are more comfortable
       | with a database/tabular format.
       | 
       | Is it fair to say though, that FDWs have to run as compiled
       | shared libs? I've always wondered if there can be (like with VS
       | code and language servers) a protocol where we can run a generic
       | FDW that speaks a particular API over the network, and then you
       | can build out-of-process connectors to it and just have to know
       | the usual tools (HTTP, JSON, etc).
       | 
       | Thoughts? Maybe this already exists.
       | 
       | Then anyone could potentially build a bespoke API/dataset and
       | just point an FDW to it.
        
         | kiwicopple wrote:
         | > _speaks a particular API over the network_
         | 
         | it's a interesting idea, and one of the things that we were
         | toying with in our pg_net extension
         | (https://github.com/supabase/pg_net). This is a "generic" async
         | network extension, so you can fetch/put/post. It works well for
         | APIs.
         | 
         | I think the generic approach works for some things where the
         | data is less "fixed" - for example, an OpenAI API endpoint.
         | 
         | But for "fixed" data (data warehouses), the wrapper usually
         | needs some custom work for security, protocols, and "push
         | down". I'll be interested to get HN's take on this - they might
         | have some suggestions for us for this framework
        
         | tpetry wrote:
         | Like pgsql-http?
         | 
         | https://github.com/pramsey/pgsql-http
        
       | claytongulick wrote:
       | I love Supabase and what they're doing! I evaluated them heavily
       | when designing architecture for a healthcare product.
       | 
       | I'm not sure about this one though - rust is a great systems
       | language, but it wouldn't be my first choice for bridging the db
       | <-> api gap.
       | 
       | I wonder why this wasn't built on top of, or an enhancement to,
       | the existing (excellent) multicorn[1] project. Python seems like
       | a better choice of language for dealing with io bound problems
       | like talking to remote APIs.
       | 
       | Multicorn is mature, stable, well tested and has a ton of FDW
       | implementations already.
       | 
       | The dynamic nature of python simplifies the development/debug
       | cycle, and serialization to/from JSON is easier than in any
       | mainstream language except for javascript.
       | 
       | I'd love to understand more about the technical rationale that
       | drove this.
       | 
       | [1] https://multicorn.org/
        
         | oliverrice wrote:
         | > I wonder why this wasn't built on top of, or an enhancement
         | to, the existing (excellent) multicorn[1] project
         | 
         | Have to agree with you there, multicorn is extremely cool. I'm
         | a big sqlalchemy fan so their default SQLA wrapper was a killer
         | feature to give up (although maybe we could do something
         | similar with launchbadge/sqlx[1]). We investigated using
         | multicorn early this year and had a few hiccups. Activity on
         | the original repo[2] quieted way down ~3 years ago. For
         | example, pg14 support hasn't landed and the newest supported
         | python version is EOL in 2022. There is new fork[3] with pg14
         | support (15 in the pipe) that might pick up in adoption but
         | thats still TBD.
         | 
         | Supabase aims to support new major Postgres versions within 2-3
         | months so we have to be very careful taking on dependencies
         | that might slow that process.
         | 
         | > I'd love to understand more about the technical rationale
         | that drove this.
         | 
         | Architecturally, multicorn has postgres communicate with a
         | separate python process on the host that does all the hard
         | work. That's convenient, but it can bloat over time and/or be
         | memory hungry for larger result sets. The rust implementation
         | runs in-process and is generally a lot lighter.
         | 
         | Currently I'd say supabase/wrappers is a safer/easier version
         | of the C API vs a direct analog to multicorn. Over time I think
         | we'll see that comparison become more appropriate. There's a
         | lot of excitement around the concept internally and we've
         | already been floating some ideas wrt `auto-mapping` tables for
         | common SQL dialects, a generic JSON HTTP API wrapper, etc. Stay
         | tuned!
         | 
         | [1]https://github.com/launchbadge/sqlx
         | [2]https://github.com/Segfault-Inc/Multicorn
         | [3]https://github.com/pgsql-io/multicorn2
        
       | thedangler wrote:
       | So you're telling me I can use this to create my own wrapper to
       | get data directly from another API?
       | 
       | I can now use this instead of creating my own workflow to get the
       | data via an api which will be stored in the DB anyways?
        
         | kiwicopple wrote:
         | yes, that's correct. we've built a read-only version for Stripe
         | (which is API-based), and we aim to have the read/write
         | implementation done soon.
         | 
         | You will be able to do something like this:
         | insert into stripe_products (name)         values ('Pizza'),
         | ('Pasta');
         | 
         | This will insert a value into Stripe via the API. Then you can
         | query your stripe products like this:                   select
         | *         from stripe_products         limit 10;
        
           | thedangler wrote:
           | Is there documentation on how to interact with a API for the
           | wrapper? The API I'm probably going to try this with needs
           | some crazy logic to parse the responses and does everything
           | through query parameters.
           | 
           | Can't wait until this is ready.
        
             | kiwicopple wrote:
             | You'd need to build a specific wrapper for that API. For
             | example, here[0] is the wrapper for Stripe (docs[1])
             | 
             | I think you're looking for something more generic, like
             | pg_net[2]. This would allow you to do your crazy logic by
             | parsing your API response:                   select
             | net.http_get('https://news.ycombinator.com') as request_id;
             | select body, status_code from net.http_collect_response(1,
             | async:=false);
             | 
             | [0] Stripe src: https://github.com/supabase/wrappers/tree/m
             | ain/wrappers/src/...
             | 
             | [1] Stripe docs:
             | https://supabase.github.io/wrappers/stripe/
             | 
             | [2] pg_net: https://supabase.github.io/pg_net/api/
        
           | omnibrain wrote:
           | So I could implement one myself for arbitrary REST APIs?
           | 
           | I can't wait for somebody offering a generator, where you
           | plug in your API, it pulls and parses the JSON, then you can
           | select the fields you want and it generates the wrapper.
           | (Alternatively, for put/post you could supply your own JSON).
           | 
           | Practically like some low/no code tools like
           | Appsmith/Budibase and the likes already do today.
           | 
           | Sadly I lack the necessary skills and more important the time
           | to dive into that.
        
             | kiwicopple wrote:
             | Yes, I guess it would be possible to works with generic
             | REST APIs - ones that all conform to a similar model.
             | 
             | > _generator, where you plug in your API, it pulls and
             | parses the JSON, then you can select the fields you want
             | and it generates the wrapper_
             | 
             | Probably on this one the wrapper could use an OpenAPI spec
        
       | kiwicopple wrote:
       | hey hn, supabase ceo here
       | 
       | this one might be more in of a "Show HN" because it's a pre-
       | release - something that you might want to try yourself or
       | contribute to. The GitHub repo is here:
       | https://github.com/supabase/wrappers
       | 
       | For context, Postgres has Foreign Data Wrappers (FDW) [0]. These
       | allow you to connect one Postgres database to another, and then
       | query the secondary database directly in the first.
       | CREATE FOREIGN TABLE other_database_table (            id
       | integer,            title text          )         SERVER
       | other_database;
       | 
       | The data does not need to be "moved" to the primary database - it
       | stays in the secondary and is fetched when you run a select
       | query:                   select * from other_database_table;
       | 
       | Our release today is a framework which extends this functionality
       | to other databases/systems. If you're familiar with Multicorn[1]
       | or Steampipe[2], then it's very similar. The framework is written
       | in Rust, using the excellent pgx[3].
       | 
       | We have developed FDWs for Stripe, Firebase, BigQuery,
       | Clickhouse, and Airtable - all in various pre-release states.
       | We'll focus on the systems we're using internally while we
       | stabalize the framework.
       | 
       | There's a lot in the blog post into our goals for this release.
       | It's early, but one of the things I'm most excited about.
       | 
       | [0] Postgres FDW: https://www.postgresql.org/docs/current/sql-
       | createforeigndat...
       | 
       | [1] Multicorn: https://multicorn.org/
       | 
       | [2] Steampipe: https://steampipe.io/
       | 
       | [2] pgx: https://github.com/tcdi/pgx
        
         | kiwicopple wrote:
         | Full documentation is here:
         | https://supabase.github.io/wrappers/
        
         | chrisjc wrote:
         | The blog post, repo and your comment still leave me a little
         | confused about what Supabase's wrappers project is, the intent
         | and its current state.
         | 
         | So this "wrappers" project is just a framework that simplifies
         | the development and manages the curation and distribution of
         | Postgres Foreign Data Wrappers? At the end of the day, are
         | these just FDWs that run on top of any Postgres instance? Do I
         | have to run Supabase to take advantage of the FDWs created
         | through this framework?
         | 
         | Additionally one of your examples is Snowflake, but I don't see
         | a matching SF FDW in your repo? Is there a generic JDBC/SQL/etc
         | FDW for any SQL based database like Snowflake, Oracle, etc? Or
         | is this just to create a spark in someone's mind (guilty!)
         | about what is possible and yet to be implemented?
         | 
         | Despite my confusion, this sounds like a very exciting project
         | to follow.
        
       | dennisy wrote:
       | This seems cool to nerd out to! So thanks for that!
       | 
       | But what is the point? Why is this better than making the API
       | call?
       | 
       | SQL is for queries, why would I want to do networking with it?
        
         | ako wrote:
         | Because SQL allows you to better specify what data you're
         | interested in? GraphQL and OData take a similar approach.
        
         | kiernanmcgowan wrote:
         | This ends up being a way to hide an ETL pipeline behind a SQL
         | query. Using this and some materialized views it makes it easy
         | to just pull data in from an external source and just have it
         | in your system.
         | 
         | I'm not sure on the value of this for customer facing
         | production systems, but for internal reporting / product
         | analytics this should make it really easy to pull in disparate
         | datasets w/o having to spend eng time to keep each one running.
        
           | rudasn wrote:
           | Pg newbie here, first time I'm reading about this. Super
           | useful for internal tools indeed!
           | 
           | Could you point me to any tools or resources on achieving
           | that (pulling data from several external databases into a
           | single one for the purpose of analytics/aggregation)?
        
       | sisve wrote:
       | So much great postgres stuff comming out from supabase. I'm
       | really impressed with how much focus and direction the team have.
       | So many companies would try to broaden their product/feature
       | suite, but fail at understanding what made the core sucsessful
       | and stretch the company in many directions. It really seems like
       | they are just making them self better and better experts at
       | postgres. And how to use it well. Really looking forward to see
       | where this is going.
       | 
       | On a sidenote, is not the wrappers for Airtable, BigQuery and
       | ClickHouse opensourced? Or why did they skip that column in the
       | second table?
       | 
       | Is all of supabase opensourced now? I'm meaning I heard something
       | about function not being opensourced but I can be remembering
       | wrong. Most stuff is on GitHub I see
        
         | oliverrice wrote:
         | > On a sidenote, is not the wrappers for Airtable, BigQuery and
         | ClickHouse opensourced? Or why did they skip that column in the
         | second table?
         | 
         | All of the wrappers are open source. You can see the source for
         | the Airtable, BigQuery, and ClickHouse wrappers here
         | https://github.com/supabase/wrappers/tree/5fac8afb62e6e8362b...
         | 
         | The `self-hosted` column is only missing from the "under
         | development" wrappers in the blog post's table because those
         | are not production ready and shouldn't be self hosted (yet).
         | 
         | > Is all of supabase opensourced now?
         | 
         | Yep! The whole stack is open source and can be self hosted
        
       | abuehrle wrote:
       | Sorry I couldn't figure this out from the docs, but Stripe data
       | is queried "live" from Stripe, right? The abstraction is great,
       | but won't this lead to unexpected N API calls when joining across
       | my domain + Stripe?
        
         | kiwicopple wrote:
         | Yes, the data is queried live.
         | 
         | > _unexpected N API calls when joining across my domain_
         | 
         | I'm not sure why they would be unexpected (because it should
         | displace some other API calls). I'll hazard a guess that you're
         | worried about fetching the same data multiple times? If that's
         | the case, then yes, you should materialize the data into your
         | database.
         | 
         | The Wrapper itself handles pagination, but you'd also want to
         | make sure you're using filters & limits in your SQL statements.
        
           | abuehrle wrote:
           | Thanks for the reply. I'll flesh out my thought process in
           | case it's helpful. My immediate reaction was excitement about
           | the abstraction. An example use case is joining my users to
           | their corresponding Stripe Customers in SQL. The kinds of
           | queries I can reasonably write depend on implementation
           | details of the connector. For example, if Stripe has a bulk
           | customer lookup (list of customer IDs -> Customers), and the
           | connector uses it, I can estimate I'd be able to query on the
           | order of 500 Users at a time in a performant way. But if the
           | API only supports looking up one customer at a time, that 500
           | User query kicks off 500 API requests, which isn't going to
           | work.
           | 
           | You're right -- it's not unexpected -- maybe more like a
           | leaky abstraction.
        
             | kiwicopple wrote:
             | I understand now, and this is a similar problem to how some
             | GraphQL engines work
             | 
             | I imagine you want to do something like:
             | select              *         from
             | public.users join stripe.customers         on
             | public.users.stripe_id = stripe.customers.id         limit
             | 100;
             | 
             | Then yes, it might make 100 consecutive calls to your
             | stripe account. There are 3 options here:
             | 
             | 1. Materialize your customers into your database (like I
             | mention in the previous comment)
             | 
             | 2. We build a "smart" FDW, so that it parses your query and
             | fetches the Stripe data first, then performs the join.
             | 
             | 3. Use a CTE:                   with         customers as (
             | select * from stripe.customers         ),         users as
             | (             select * from public.users         )
             | select              *         from              users join
             | customers         on             users.stripe_id =
             | customers.id
        
               | infogulch wrote:
               | 4. Use a local http cache in front of stripe's api. This
               | is basically "external materialization".
        
         | ako wrote:
         | Materialized views enable you to "cache" the response, and only
         | refresh it periodically.
        
           | chrisjc wrote:
           | Wait, you're telling me that you can create materialized
           | views using foreign tables in Postgres?
           | 
           | Is there a way to propagate changes from the foreign data
           | source through the FDW to Postgres?
           | 
           | Or would it just be some kind of task polls the foreign data
           | source pulling a delta?
        
       | spankalee wrote:
       | How well do foreign data wrappers work with Postgres's query
       | planner? Does't the planner need to estimate the cardinality of
       | results from each table? Are the Supabase wrappers providing this
       | info from the wrappers even though it might be hard to get from
       | the data source?
       | 
       | Also, is the Firebase connector for Realtime Database or
       | Firestore?
        
       | jzelinskie wrote:
       | I really want to love foreign data wrappers for Postgres and this
       | seems like a big improvement over existing Python library, but
       | the lack of support for them in managed databases services makes
       | them a non-starter for so many use-cases.
       | 
       | Because RDS, for example, will only support the foreign data
       | wrapper for reading from another "Postgres", what we really need
       | is a server that supports the Postgres wire protocol (easier said
       | than done) and you implement your drivers as a handler to that
       | server.
        
         | radiowave wrote:
         | Never tried this with RDS, but it's entirely possible to use
         | postgres_fdw to interact with a foreign table on _another_
         | postgres server, where you might have more choice about what
         | extensions you run.
         | 
         | I'm doing this right now because I have a postgres installation
         | that it's not yet convenient to upgrade beyond v12, but where
         | I'd really like the benefit of the recently-improved JDBC FDW,
         | which requires at least v13.
        
       ___________________________________________________________________
       (page generated 2022-12-15 23:01 UTC)