[HN Gopher] Supabase Wrappers: A framework for building Postgres...
___________________________________________________________________
Supabase Wrappers: A framework for building Postgres foreign data
wrappers
Author : yurisagalov
Score : 103 points
Date : 2022-12-15 15:59 UTC (7 hours ago)
(HTM) web link (supabase.com)
(TXT) w3m dump (supabase.com)
| nixalaister wrote:
| I'm excited about Wrappers because of an idea born in the GraphQL
| community: using GraphQL as the point of integration for many
| different services in your app. A use case GraphQL seemed made
| for. It's not without problems, though, such as when you need
| data from one of those services inside your database. So this
| takes that idea and moves it one level down into the database.
| You still get the benefits of a unified API (through REST or
| GraphQL) without the same limitations. I think there couldn't be
| a more ideal point of integration!
|
| disclaimer: I am a Supabase employee, but these views are my own.
| jensneuse wrote:
| We (https://wundergraph.com) do exactly this, API integration
| with GraphQL as the integration layer. Integrating APIs is
| messy. It could require custom middleware, etc... Glueing
| something together is the easy part. How do you mock it? How do
| you test it? Add custom logic? Doing it at the API layer allows
| you to seamlessly scale your serverless workloads. Doing all
| this work in the database means that we're eating up resources
| for tasks that the database is not designed to do. Just because
| FDW exists doesn't mean we should use it for this use case.
| oliverrice wrote:
| > How do you mock it? How do you test it?
|
| Stripe and Firebase both offer test endpoints for their
| services. The foreign data wrappers for each allow subbing in
| a user defined endpoint in their `options` so thats how I'd
| recommend testing the two that have released. Some of the
| pre-release integrations can be spun up locally in docker
| e.g. clickhouse. The FDWs for those similarly take an
| arbitrary connection string making it pretty straightforward
| to write tests. Here's an example from the repo https://githu
| b.com/supabase/wrappers/blob/5fac8afb62e6e8362b...
|
| > Add custom logic?
|
| Views!
|
| > tasks that the database is not designed to do
|
| The C API for foreign data wrappers is baked right into
| Postgres proper. They've been around since 2013 and are
| pretty battle hardened at this point. Supabase Wrappers
| exposes that API in rust so users can write FDWs without
| worrying about accidentally tanking the Postgres process. Its
| more about making a great Postgres feature more accessible
| than tricking the database into doing anything new.
| kiwicopple wrote:
| another possibility for application developers - mocking
| can be done at the "application" level. For example, using
| something like Jest, or Supabase can build mocking into
| client libs.
|
| I'm not clear how mocking in the middleware is materially
| different from this approach (except perhaps you are
| testing network connectivity too).
| atonse wrote:
| This is wonderful!
|
| I love the pragmatism behind tools like osquery and steampipe
| that expose a lot of APIs as database tables. It makes these
| datasets available to non-programmers that are more comfortable
| with a database/tabular format.
|
| Is it fair to say though, that FDWs have to run as compiled
| shared libs? I've always wondered if there can be (like with VS
| code and language servers) a protocol where we can run a generic
| FDW that speaks a particular API over the network, and then you
| can build out-of-process connectors to it and just have to know
| the usual tools (HTTP, JSON, etc).
|
| Thoughts? Maybe this already exists.
|
| Then anyone could potentially build a bespoke API/dataset and
| just point an FDW to it.
| kiwicopple wrote:
| > _speaks a particular API over the network_
|
| it's a interesting idea, and one of the things that we were
| toying with in our pg_net extension
| (https://github.com/supabase/pg_net). This is a "generic" async
| network extension, so you can fetch/put/post. It works well for
| APIs.
|
| I think the generic approach works for some things where the
| data is less "fixed" - for example, an OpenAI API endpoint.
|
| But for "fixed" data (data warehouses), the wrapper usually
| needs some custom work for security, protocols, and "push
| down". I'll be interested to get HN's take on this - they might
| have some suggestions for us for this framework
| tpetry wrote:
| Like pgsql-http?
|
| https://github.com/pramsey/pgsql-http
| claytongulick wrote:
| I love Supabase and what they're doing! I evaluated them heavily
| when designing architecture for a healthcare product.
|
| I'm not sure about this one though - rust is a great systems
| language, but it wouldn't be my first choice for bridging the db
| <-> api gap.
|
| I wonder why this wasn't built on top of, or an enhancement to,
| the existing (excellent) multicorn[1] project. Python seems like
| a better choice of language for dealing with io bound problems
| like talking to remote APIs.
|
| Multicorn is mature, stable, well tested and has a ton of FDW
| implementations already.
|
| The dynamic nature of python simplifies the development/debug
| cycle, and serialization to/from JSON is easier than in any
| mainstream language except for javascript.
|
| I'd love to understand more about the technical rationale that
| drove this.
|
| [1] https://multicorn.org/
| oliverrice wrote:
| > I wonder why this wasn't built on top of, or an enhancement
| to, the existing (excellent) multicorn[1] project
|
| Have to agree with you there, multicorn is extremely cool. I'm
| a big sqlalchemy fan so their default SQLA wrapper was a killer
| feature to give up (although maybe we could do something
| similar with launchbadge/sqlx[1]). We investigated using
| multicorn early this year and had a few hiccups. Activity on
| the original repo[2] quieted way down ~3 years ago. For
| example, pg14 support hasn't landed and the newest supported
| python version is EOL in 2022. There is new fork[3] with pg14
| support (15 in the pipe) that might pick up in adoption but
| thats still TBD.
|
| Supabase aims to support new major Postgres versions within 2-3
| months so we have to be very careful taking on dependencies
| that might slow that process.
|
| > I'd love to understand more about the technical rationale
| that drove this.
|
| Architecturally, multicorn has postgres communicate with a
| separate python process on the host that does all the hard
| work. That's convenient, but it can bloat over time and/or be
| memory hungry for larger result sets. The rust implementation
| runs in-process and is generally a lot lighter.
|
| Currently I'd say supabase/wrappers is a safer/easier version
| of the C API vs a direct analog to multicorn. Over time I think
| we'll see that comparison become more appropriate. There's a
| lot of excitement around the concept internally and we've
| already been floating some ideas wrt `auto-mapping` tables for
| common SQL dialects, a generic JSON HTTP API wrapper, etc. Stay
| tuned!
|
| [1]https://github.com/launchbadge/sqlx
| [2]https://github.com/Segfault-Inc/Multicorn
| [3]https://github.com/pgsql-io/multicorn2
| thedangler wrote:
| So you're telling me I can use this to create my own wrapper to
| get data directly from another API?
|
| I can now use this instead of creating my own workflow to get the
| data via an api which will be stored in the DB anyways?
| kiwicopple wrote:
| yes, that's correct. we've built a read-only version for Stripe
| (which is API-based), and we aim to have the read/write
| implementation done soon.
|
| You will be able to do something like this:
| insert into stripe_products (name) values ('Pizza'),
| ('Pasta');
|
| This will insert a value into Stripe via the API. Then you can
| query your stripe products like this: select
| * from stripe_products limit 10;
| thedangler wrote:
| Is there documentation on how to interact with a API for the
| wrapper? The API I'm probably going to try this with needs
| some crazy logic to parse the responses and does everything
| through query parameters.
|
| Can't wait until this is ready.
| kiwicopple wrote:
| You'd need to build a specific wrapper for that API. For
| example, here[0] is the wrapper for Stripe (docs[1])
|
| I think you're looking for something more generic, like
| pg_net[2]. This would allow you to do your crazy logic by
| parsing your API response: select
| net.http_get('https://news.ycombinator.com') as request_id;
| select body, status_code from net.http_collect_response(1,
| async:=false);
|
| [0] Stripe src: https://github.com/supabase/wrappers/tree/m
| ain/wrappers/src/...
|
| [1] Stripe docs:
| https://supabase.github.io/wrappers/stripe/
|
| [2] pg_net: https://supabase.github.io/pg_net/api/
| omnibrain wrote:
| So I could implement one myself for arbitrary REST APIs?
|
| I can't wait for somebody offering a generator, where you
| plug in your API, it pulls and parses the JSON, then you can
| select the fields you want and it generates the wrapper.
| (Alternatively, for put/post you could supply your own JSON).
|
| Practically like some low/no code tools like
| Appsmith/Budibase and the likes already do today.
|
| Sadly I lack the necessary skills and more important the time
| to dive into that.
| kiwicopple wrote:
| Yes, I guess it would be possible to works with generic
| REST APIs - ones that all conform to a similar model.
|
| > _generator, where you plug in your API, it pulls and
| parses the JSON, then you can select the fields you want
| and it generates the wrapper_
|
| Probably on this one the wrapper could use an OpenAPI spec
| kiwicopple wrote:
| hey hn, supabase ceo here
|
| this one might be more in of a "Show HN" because it's a pre-
| release - something that you might want to try yourself or
| contribute to. The GitHub repo is here:
| https://github.com/supabase/wrappers
|
| For context, Postgres has Foreign Data Wrappers (FDW) [0]. These
| allow you to connect one Postgres database to another, and then
| query the secondary database directly in the first.
| CREATE FOREIGN TABLE other_database_table ( id
| integer, title text ) SERVER
| other_database;
|
| The data does not need to be "moved" to the primary database - it
| stays in the secondary and is fetched when you run a select
| query: select * from other_database_table;
|
| Our release today is a framework which extends this functionality
| to other databases/systems. If you're familiar with Multicorn[1]
| or Steampipe[2], then it's very similar. The framework is written
| in Rust, using the excellent pgx[3].
|
| We have developed FDWs for Stripe, Firebase, BigQuery,
| Clickhouse, and Airtable - all in various pre-release states.
| We'll focus on the systems we're using internally while we
| stabalize the framework.
|
| There's a lot in the blog post into our goals for this release.
| It's early, but one of the things I'm most excited about.
|
| [0] Postgres FDW: https://www.postgresql.org/docs/current/sql-
| createforeigndat...
|
| [1] Multicorn: https://multicorn.org/
|
| [2] Steampipe: https://steampipe.io/
|
| [2] pgx: https://github.com/tcdi/pgx
| kiwicopple wrote:
| Full documentation is here:
| https://supabase.github.io/wrappers/
| chrisjc wrote:
| The blog post, repo and your comment still leave me a little
| confused about what Supabase's wrappers project is, the intent
| and its current state.
|
| So this "wrappers" project is just a framework that simplifies
| the development and manages the curation and distribution of
| Postgres Foreign Data Wrappers? At the end of the day, are
| these just FDWs that run on top of any Postgres instance? Do I
| have to run Supabase to take advantage of the FDWs created
| through this framework?
|
| Additionally one of your examples is Snowflake, but I don't see
| a matching SF FDW in your repo? Is there a generic JDBC/SQL/etc
| FDW for any SQL based database like Snowflake, Oracle, etc? Or
| is this just to create a spark in someone's mind (guilty!)
| about what is possible and yet to be implemented?
|
| Despite my confusion, this sounds like a very exciting project
| to follow.
| dennisy wrote:
| This seems cool to nerd out to! So thanks for that!
|
| But what is the point? Why is this better than making the API
| call?
|
| SQL is for queries, why would I want to do networking with it?
| ako wrote:
| Because SQL allows you to better specify what data you're
| interested in? GraphQL and OData take a similar approach.
| kiernanmcgowan wrote:
| This ends up being a way to hide an ETL pipeline behind a SQL
| query. Using this and some materialized views it makes it easy
| to just pull data in from an external source and just have it
| in your system.
|
| I'm not sure on the value of this for customer facing
| production systems, but for internal reporting / product
| analytics this should make it really easy to pull in disparate
| datasets w/o having to spend eng time to keep each one running.
| rudasn wrote:
| Pg newbie here, first time I'm reading about this. Super
| useful for internal tools indeed!
|
| Could you point me to any tools or resources on achieving
| that (pulling data from several external databases into a
| single one for the purpose of analytics/aggregation)?
| sisve wrote:
| So much great postgres stuff comming out from supabase. I'm
| really impressed with how much focus and direction the team have.
| So many companies would try to broaden their product/feature
| suite, but fail at understanding what made the core sucsessful
| and stretch the company in many directions. It really seems like
| they are just making them self better and better experts at
| postgres. And how to use it well. Really looking forward to see
| where this is going.
|
| On a sidenote, is not the wrappers for Airtable, BigQuery and
| ClickHouse opensourced? Or why did they skip that column in the
| second table?
|
| Is all of supabase opensourced now? I'm meaning I heard something
| about function not being opensourced but I can be remembering
| wrong. Most stuff is on GitHub I see
| oliverrice wrote:
| > On a sidenote, is not the wrappers for Airtable, BigQuery and
| ClickHouse opensourced? Or why did they skip that column in the
| second table?
|
| All of the wrappers are open source. You can see the source for
| the Airtable, BigQuery, and ClickHouse wrappers here
| https://github.com/supabase/wrappers/tree/5fac8afb62e6e8362b...
|
| The `self-hosted` column is only missing from the "under
| development" wrappers in the blog post's table because those
| are not production ready and shouldn't be self hosted (yet).
|
| > Is all of supabase opensourced now?
|
| Yep! The whole stack is open source and can be self hosted
| abuehrle wrote:
| Sorry I couldn't figure this out from the docs, but Stripe data
| is queried "live" from Stripe, right? The abstraction is great,
| but won't this lead to unexpected N API calls when joining across
| my domain + Stripe?
| kiwicopple wrote:
| Yes, the data is queried live.
|
| > _unexpected N API calls when joining across my domain_
|
| I'm not sure why they would be unexpected (because it should
| displace some other API calls). I'll hazard a guess that you're
| worried about fetching the same data multiple times? If that's
| the case, then yes, you should materialize the data into your
| database.
|
| The Wrapper itself handles pagination, but you'd also want to
| make sure you're using filters & limits in your SQL statements.
| abuehrle wrote:
| Thanks for the reply. I'll flesh out my thought process in
| case it's helpful. My immediate reaction was excitement about
| the abstraction. An example use case is joining my users to
| their corresponding Stripe Customers in SQL. The kinds of
| queries I can reasonably write depend on implementation
| details of the connector. For example, if Stripe has a bulk
| customer lookup (list of customer IDs -> Customers), and the
| connector uses it, I can estimate I'd be able to query on the
| order of 500 Users at a time in a performant way. But if the
| API only supports looking up one customer at a time, that 500
| User query kicks off 500 API requests, which isn't going to
| work.
|
| You're right -- it's not unexpected -- maybe more like a
| leaky abstraction.
| kiwicopple wrote:
| I understand now, and this is a similar problem to how some
| GraphQL engines work
|
| I imagine you want to do something like:
| select * from
| public.users join stripe.customers on
| public.users.stripe_id = stripe.customers.id limit
| 100;
|
| Then yes, it might make 100 consecutive calls to your
| stripe account. There are 3 options here:
|
| 1. Materialize your customers into your database (like I
| mention in the previous comment)
|
| 2. We build a "smart" FDW, so that it parses your query and
| fetches the Stripe data first, then performs the join.
|
| 3. Use a CTE: with customers as (
| select * from stripe.customers ), users as
| ( select * from public.users )
| select * from users join
| customers on users.stripe_id =
| customers.id
| infogulch wrote:
| 4. Use a local http cache in front of stripe's api. This
| is basically "external materialization".
| ako wrote:
| Materialized views enable you to "cache" the response, and only
| refresh it periodically.
| chrisjc wrote:
| Wait, you're telling me that you can create materialized
| views using foreign tables in Postgres?
|
| Is there a way to propagate changes from the foreign data
| source through the FDW to Postgres?
|
| Or would it just be some kind of task polls the foreign data
| source pulling a delta?
| spankalee wrote:
| How well do foreign data wrappers work with Postgres's query
| planner? Does't the planner need to estimate the cardinality of
| results from each table? Are the Supabase wrappers providing this
| info from the wrappers even though it might be hard to get from
| the data source?
|
| Also, is the Firebase connector for Realtime Database or
| Firestore?
| jzelinskie wrote:
| I really want to love foreign data wrappers for Postgres and this
| seems like a big improvement over existing Python library, but
| the lack of support for them in managed databases services makes
| them a non-starter for so many use-cases.
|
| Because RDS, for example, will only support the foreign data
| wrapper for reading from another "Postgres", what we really need
| is a server that supports the Postgres wire protocol (easier said
| than done) and you implement your drivers as a handler to that
| server.
| radiowave wrote:
| Never tried this with RDS, but it's entirely possible to use
| postgres_fdw to interact with a foreign table on _another_
| postgres server, where you might have more choice about what
| extensions you run.
|
| I'm doing this right now because I have a postgres installation
| that it's not yet convenient to upgrade beyond v12, but where
| I'd really like the benefit of the recently-improved JDBC FDW,
| which requires at least v13.
___________________________________________________________________
(page generated 2022-12-15 23:01 UTC)