https://blog.crunchydata.com/blog/understanding-postgres_fdw


Understanding Foreign Data Wrappers in Postgres and postgres_fdw

August 18, 2021 Kat Batuigas
Application Development Data Warehouse Postgres Extensions

The idea of writing a database query that reaches out to an external
source may not occur to someone who isn't a DBA, at least not early
on. That is: instead of figuring out how to grab and then load
multiple data sets into the same store, or configuring your
application backend to connect to a bunch of disparate sources, why
not use JOINs across those sources just as you would across tables
within one database?

In case you're not familiar, the dblink module in PostgreSQL, along
with the concept of database links or linked servers in other DBMSs,
has been around for a while. Foreign data wrappers are newer, having
been introduced in PostgreSQL 9.1 (postgres_fdw itself followed in
9.3). Postgres now has a lot of foreign data wrappers available, and
they work with plenty of different source types: NoSQL databases,
platforms like Twitter and Facebook, geospatial data formats, etc. My
colleague Craig Kerstiens has shared his thoughts on Postgres being a
"batteries included" database, and it's easy to see why.

That said, why might you still need foreign data wrappers between
Postgres servers? Postgres on its own doesn't support cross-database
queries, even between databases on the same server. So you still need
the wrapper to handle the connection and fetch the foreign data.
postgres_fdw is more or less the dblink equivalent for access between
Postgres servers, with the main difference being that postgres_fdw
follows the SQL standard for managing external data. The two provide
a lot of the same functionality, but postgres_fdw is generally the
recommended choice and is more widely used at this point.
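
To make the contrast concrete, here is a rough sketch of what a one-off remote query looks like with dblink. The host name, credentials, and the accounts table with its column types are all hypothetical and only there for illustration:

```sql
-- dblink ships in contrib, just like postgres_fdw.
CREATE EXTENSION IF NOT EXISTS dblink;

-- Hypothetical connection string and remote table, for illustration only.
-- The remote query is passed as a string, and the caller has to spell out
-- the result column names and types in the AS clause on every call.
SELECT t.id, t.balance
FROM dblink('host=remote.example.com dbname=sales user=fdw_user password=secret',
            'SELECT id, balance FROM accounts')
     AS t(id integer, balance numeric);
```

With postgres_fdw, the connection details and column definitions are declared once, and the remote table can then be queried like any other table.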

postgres_fdw basics

If you haven't tried postgres_fdw, setting it up is pretty simple.
Say I have a contacts table on a local Postgres install, and I want
to be able to query a remote Crunchy Bridge database that stores 
sales information. I can do the following:

 1. Load the extension in my local database (`postgres_fdw` is
    included in Postgres `contrib`, and you do need CREATE privileges
    on the local database):

    CREATE EXTENSION postgres_fdw;

 2. Create a foreign server:

    CREATE SERVER salesinfo_bridge
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'p.2gdmzr2pcbadzcstrkuolxvtpq.db.postgresbridge.com', dbname 'sales');

 3. Set up a user mapping to authenticate:

    CREATE USER MAPPING FOR postgres
        SERVER salesinfo_bridge
        OPTIONS (user 'fdw_user', password 'password');

 4. Then I can set up foreign tables that correspond to the tables I
    want to query on the foreign server. This can be done in one of
    two ways:
      + Run CREATE FOREIGN TABLE, which is pretty similar to CREATE
        TABLE in that you have to define column names, data types,
        constraints etc.
      + Run IMPORT FOREIGN SCHEMA, which imports tables and views
        from a schema, and creates foreign tables that match the
        definitions for the external tables. You even have the option
        to include/exclude specific tables only, which makes it even
        more convenient:

        test=# IMPORT FOREIGN SCHEMA public LIMIT TO (payment_methods, accounts)
        FROM SERVER salesinfo_bridge INTO public;
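
For comparison, the CREATE FOREIGN TABLE route for one of these tables might look roughly like the sketch below. The column names and types are assumptions inferred from the example query in this post, so they would need to match the real remote table. It is an alternative to the import, not an extra step: if accounts has already been imported, this statement fails because the table exists.

```sql
-- Column names and types are assumptions; adjust them to the remote table.
-- Constraints declared on a foreign table (such as NOT NULL) are not
-- enforced by the local server; they describe what the remote data
-- is expected to look like.
CREATE FOREIGN TABLE accounts (
    id          integer NOT NULL,
    contact_id  integer,
    balance     numeric(12,2)
)
SERVER salesinfo_bridge
OPTIONS (schema_name 'public', table_name 'accounts');
```

The schema_name and table_name options tell postgres_fdw which remote table to read; if they are omitted, the foreign table's own schema and name are used.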

And I can carry on querying as if these tables were all on the same
database!

test=# SELECT c.id, pm.type, acct.balance
FROM contacts c
LEFT JOIN accounts acct ON c.id = acct.contact_id
LEFT JOIN payment_methods pm ON acct.id = pm.acct_id
WHERE acct.balance > 0;
id  |    type    | balance
----+------------+----------
  1 | mastercard | 2742.62
  2 | mastercard |  464.76
  3 | mastercard |  116.67

Even though I've recreated the foreign schema on the local server,
these tables don't actually store data locally. This means that the
local server doesn't know anything about statistics from the external
server (we'll look at the implications a bit later).
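
One way to see that the data stays remote is to look at the query plan. This is a minimal sketch that assumes the accounts foreign table from above, with the column names used in the earlier query:

```sql
-- VERBOSE makes postgres_fdw include the statement it sends to the
-- foreign server in the plan output.
EXPLAIN (VERBOSE)
SELECT acct.contact_id, acct.balance
FROM accounts acct
WHERE acct.balance > 0;
```

For a postgres_fdw table, the plan should contain a Foreign Scan node with a "Remote SQL" line, which shows how much of the query (for example the WHERE clause) is being pushed down to the foreign server.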

Don't forget about the psql commands that provide information related
to foreign data wrappers:

  * \des - list foreign servers
  * \deu - list user mappings
  * \det - list foreign tables
  * \dtE - list both local and foreign tables
  * \d <name of foreign table> - show columns, data types, and other
    table metadata
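
If you're not in psql, much the same information can be pulled from the system catalogs. A short sketch, run on the local database:

```sql
-- Foreign servers and their options (roughly what \des+ shows)
SELECT srvname, srvoptions
FROM pg_foreign_server;

-- User mappings per server (roughly \deu)
SELECT srvname, usename
FROM pg_user_mappings;

-- Foreign tables and the server each one points to (roughly \det)
SELECT foreign_table_schema, foreign_table_name, foreign_server_name
FROM information_schema.foreign_tables;
```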

Some things to consider for query optimization

postgres_fdw will try to optimize queries on its own. That said, as
mentioned above, the local server doesn't automatically gather
statistics from the foreign server or its tables. That makes sense,
but it means query planning and optimization can still take some
work. As a user, you might consider these approaches (the first two
are described in the official postgres_fdw docs):

 1. Tell the foreign data wrapper that you want the foreign server to
    perform the cost estimate, by setting the use_remote_estimate
    option to true at the server or table level. Every time you
    query the foreign server, you're then asking it to run additional
    EXPLAIN commands. That's extra traffic across the network for
    each query, which may lengthen the total time it takes to return
    results, depending on how complex the query is. If you leave the
    option at its default value of false, the local server performs
    the cost estimation itself.
 2. Run ANALYZE on the foreign tables, which updates those table
    statistics on the local server. But if the foreign tables are
    updated pretty frequently, the local statistics can quickly
    become stale as well. So you'd also need to consider how often
    you may have to schedule ANALYZE.
 3. Use a materialized view on the foreign table, which you can also
    refresh on a desired basis. You'd be working with a local
    snapshot of the external data, which should help with speed. This
    could work well when you're not required to always use live,
    up-to-the-minute data.
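
The three approaches above could be sketched as follows. The server and table names come from the earlier setup; the materialized view name accounts_snapshot and the column list are made up for illustration:

```sql
-- 1. Ask the foreign server for cost estimates. This can be set for the
--    whole server, or for a single foreign table (the table-level
--    setting overrides the server-level one for that table).
--    Use SET instead of ADD if the option has already been defined.
ALTER SERVER salesinfo_bridge OPTIONS (ADD use_remote_estimate 'true');
ALTER FOREIGN TABLE accounts OPTIONS (ADD use_remote_estimate 'true');

-- 2. Collect statistics for the foreign table and store them locally.
--    This samples the remote table, so it needs re-running as data changes.
ANALYZE accounts;

-- 3. Keep a local snapshot of the remote data (hypothetical view name
--    and assumed column list).
CREATE MATERIALIZED VIEW accounts_snapshot AS
SELECT id, contact_id, balance
FROM accounts;

-- Re-pull the remote data whenever the snapshot should be brought up to date.
REFRESH MATERIALIZED VIEW accounts_snapshot;
```

In practice you would likely pick one of these per table rather than all three. Queries against accounts_snapshot never touch the network, at the cost of only being as fresh as the last refresh.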

Foreign data wrappers as an alternative to ETL?

From my non-DBA perspective, the main takeaway is that foreign data
wrappers can simplify data querying and analysis when you need data
from disparate sources. The biggest drawback may be that in many
cases you can't "set it and forget it" if you don't want to risk
poor query performance. And if you're dealing with massive amounts of
data anyway, there might be more suitable approaches. But either
way, it's nice that there are other options aside from the standard
ETL/ELT pattern.

For those of you out there using foreign data wrappers, what have
been your most important considerations? I'm also curious to hear of
other use cases for FDWs. We're all ears at @crunchydata. 
