[HN Gopher] Dbdev - A database package manager for PostgreSQL tr...
___________________________________________________________________
Dbdev - A database package manager for PostgreSQL trusted language
extensions
Author : samwillis
Score : 101 points
Date : 2023-04-14 15:38 UTC (7 hours ago)
(HTM) web link (database.dev)
(TXT) w3m dump (database.dev)
| koolba wrote:
| Is this like pgxn but specific to trusted languages so presumably
| less installation issues?
| oliverrice wrote:
| yeah, exactly. pgxn also houses native extensions, some of
| which may be TLE compatible. database.dev is trusted language
| extension specific
|
| We've reached out to David, who runs pgxn to see how we might
| collaborate and allow pgxn to be used as another registry that
| users can install TLEs from
| mmmeff wrote:
| What's stopping me from adding an extension to this repository
| that adds something valuable while simultaneously exfiltrating
| entire databases back to me?
|
| It's a cool idea, but I won't be the first to try it. You're
| placing an awful lot of trust in these modules.
| thewataccount wrote:
| Nothing? It's a classic supply chain attack.
|
| The same goes for pypi, npm, crates.io, docker.com, github,
| chrome extensions, etc.
|
| Generally it comes down to trusted authors/package names. Many
| people trust `requests` from pypi. That being said even this
| doesn't prevent the package to be sold, or hacked, or from the
| owner later adding things, or a 3rd party sneaking something in
| via a PR. Some like docker (and technically most git based ones
| because of md5) can have specific tags changed at any time, so
| you could later add a malicious package to version 3.0.4 or
| whatever.
|
| Many package managers (mostly linux ones) include key-signing,
| some package managers also have built-in scanners. Preferably
| you proxy/cache versions you need, and have a way to audit
| version upgrades - but in reality it's a lot of work to do
| properly.
|
| Recent example of `pytorch` nightly on pypi accidentally having
| a dependancy named `torchtriton` that I think was name
| squatting? Anyway installing the pytorch nightly between
| December 25th to 30th would result in your home directory being
| uploaded, etc.
| https://www.bleepingcomputer.com/news/security/pytorch-discl...
|
| EDIT: That said if anyone from supabase reads this - I'd love
| to see GPG signing, version hashing/integrity verification, 2FA
| if it's not their already, and I know this one is a lot of
| extra work - but any type of automated even cursory glance for
| malicious code.
| michelpp wrote:
| > That said if anyone from supabase reads this - I'd love to
| see GPG signing, version hashing/integrity verification, 2FA
| if it's not their already
|
| Yep, we've got all these on the road map, thanks!
|
| > and I know this one is a lot of extra work - but any type
| of automated even cursory glance for malicious code.
|
| Unfortunately a "cursory glance" is trivially sidestepped,
| and anything complete enough to be useful is probably
| impossible. Maybe you want to write an extension that
| automates logically replicating your data off the server, or
| truncates a table after processing it, to you that's useful,
| to someone else that's malicious, there's just no way to
| know.
|
| If you don't trust the authors then the only way to trust a
| TLE is to inspect the code yourself in the context of what
| "safe" means to you. The pytorch example you mentioned was
| very likely caught by a human, if an automated system were
| possible for python (but likely impossible) it would have
| caught it sooner.
| michelpp wrote:
| Like any package or plugin for any language or framework, you
| are responsible for understanding what you are installing.
| Since TLEs do not come with any compiled code, this is
| relatively easy for you to verify yourself with code
| inspection.
|
| We expect a lot of DBAs to say "no way" to something like TLEs,
| that's fine, for similar reasons people don't use pip, npm, or
| cpan. You make your own risk/benefit analysis.
| bashinator wrote:
| You talking with the PostGIS folks?
| oliverrice wrote:
| PostGIS is a native extension mostly written in C. Since there
| is no C trusted language it would be hard to fit PostGIS into
| the TLE paradigm
|
| Its less of a problem for well known and trusted extensions
| like PostGIS though because it comes pre-installed on most
| hosted providers (Supabase, RDS, etc)
| mike_hearn wrote:
| At KotlinConf today I gave a talk on designing apps with two-tier
| architecture, where you implement your entire app without the web
| stack appearing anywhere at all. Instead you publish desktop and
| mobile apps that connect directly to an RDBMS like PostgreSQL via
| its native protocol, and use server extensions for any logic that
| is inconvenient to do with SQL.
|
| This approach might seem horrifyingly outside-the-box but has a
| lot of advantages, and some of the reasons we didn't do things
| this way historically have been solved in recent years.
|
| Because it was KotlinConf the demo uses PL/Java, which is pretty
| nice because there's such a healthy ecosystem of stuff based
| around JDBC and because deploying JVM stuff doesn't require any
| sort of cross-compilation. PL/Java also supports (for now)
| trusted extensions using sandboxing, although of course the
| sandbox can just get in the way and normally you trust your own
| server anyway so this is a double edged sword.
|
| The demo code can be found here (it's a prototype and nobody
| reviewed it yet so be gentle)
|
| https://github.com/hydraulic-software/bugzino
|
| I'll write up a blog post version of the talk, but for now I had
| to mention that DBaaS providers don't actually enable this sort
| of design because they like to wall off the full power of the
| RDBMS behind custom APIs. But in two-tier design you really lean
| into the database and use all of its features. So, it'd be nice
| if:
|
| a. database.dev were to support PL/Java extensions.
|
| b. Supabase were to allow direct connections, as the native DB
| protocol supports a _lot_ of features that otherwise have to be
| sort of hacked on top of HTTP. Ultimately, HTTP is designed to
| fetch hypertext whereas the PG native protocol is designed to
| work with data, and that difference shines through in a bunch of
| ways.
| saghm wrote:
| > This approach might seem horrifyingly outside-the-box but has
| a lot of advantages, and some of the reasons we didn't do
| things this way historically have been solved in recent years.
|
| From a development standpoint, I definitely think this seems
| easier! My concerns about this would be almost entirely about
| security. How does this approach avoid leaking credentials that
| people could use to access the database outside of the
| application? More generally, how does this approach deal with
| the increased attack surface for the database from exposing it
| to the open internet?
| mike_hearn wrote:
| See my other post for a discussion of non-user security
| topics.
|
| Every app user has a database user, mapped 1:1, so there are
| no credentials to leak. The DB credentials are the user
| credentials. In the demo app repository I implement open
| signup with email confirmation using stored procedures/a
| server extension. It works by creating a guest user with a
| well known password, but it's locked down so that the only
| things it can do are run a couple of stored procedures. Those
| then send an email to confirm ownership of the address before
| creating a real DB user.
|
| Open work: don't use passwords, mint SSL client certificates
| during signup with the private key stored in the OS keystore.
| Now there's no cookies or passwords to steal! Also, OAuth
| integration.
|
| The guest user / open signup is certainly a weakness for the
| demo, which is public on the internet (a dedicated cluster
| though). It's locked down as much as possible but there are
| probably lots of ways to screw with the server even without
| being granted access to anything. And there are no rate
| limits on signup. It's really a very basic exploration.
| conceptme wrote:
| Connections are pretty expensive in PG how can you do
| pooling if every user connects directly?
| giovannibonetti wrote:
| Connection poolers like PgBouncer [1] (traditional) and
| Supabase's Supavisor [2] (new) come to mind.
|
| [1] https://www.pgbouncer.org/ [2]
| https://github.com/supabase/supavisor
| mike_hearn wrote:
| Supavisor looks great. Unclear if it can be configured to
| use https://github.com/pgaudit/set_user though.
|
| That said, there are quite a lot of CRUD apps in the
| world that don't need to support lots of simultaneous
| users, or where you can just add read replicas quite
| cheaply. Think internal apps, dashboards, etc.
| GordonS wrote:
| > This approach might seem horrifyingly outside-the-box
|
| I mean, this is how we routinely built client-server apps 20-30
| years back. Everything old is new again!
|
| One of the big reasons so many moved away from this approach
| was security. Giving end users direct access to the database
| (even if "obscured"), gives me the shivers!
| mike_hearn wrote:
| One of the things that's changed in recent years is that free
| databases like PostgreSQL have far more security features.
| You can grant fine grained privileges on individual objects
| like queries (via views), individual table rows, stored
| procedures/functions and so on.
|
| In the demo app, I use a simple approach to security: users
| don't have permissions to directly access any tables. Read
| access is via views and could use row-level security (RLS),
| and write access is always via stored procedures. Stored
| procs can also act as queries, and they can be written in any
| supported extension language, so any access control and
| privacy policy is implementable.
|
| Still, security in two-tier apps is a nuanced topic. Some
| things get better and other things are worse.
|
| Some of the remaining things to solve are:
|
| 1. RDBMS engines tend to be written in C. They're good code
| and fairly trustworthy, but still, web servers tend to be
| written in memory safe languages and databases aren't.
|
| 2. DDoS defense companies have more options for protecting
| web apps (sending js challenges and captchas). If you have a
| mobile app then you're exposing raw APIs anyway, so this is
| maybe not a big change.
|
| 3. You have to set resource quotas to stop malicious users
| submitting expensive queries. Of course this is also a
| concern with web apps, just translating HTTP to PG doesn't
| make it impossible to spam servers with expensive queries.
|
| 4. Data can be protected with ACLs pretty well, but schemas
| not so much. So you could leak an upcoming feature via the
| appearance of new object names. Web apps sometimes have the
| same issue with leaking info about new features in their JS
| though, and the sky doesn't fall.
|
| For many app types the above concerns don't matter much. Any
| app that's internal or business-to-business for example. If
| you build auth around client certs then unauthorized users
| never get to connect at all.
|
| Also! Consider that this design also eliminates many kinds of
| security issue. The talk goes into this. For example, XSS and
| SQL injection are eliminated by design! Those have been two
| of the most common and destructive bug classes over time.
| Also, web servers do not exactly have a track record of being
| unhackable, especially as wrapping SQL with HTTP handlers
| introduces many new places where access checks might be
| forgotten or incorrectly implemented. A SQL GRANT statement
| is pretty transparent and the implementation is well tested
| compared to app specific logic that might have been written
| by one person and never properly tested.
|
| There's also other security-related benefits, for example,
| certificate expiry routinely causes massive and eye-
| bleedingly disruptive outages, but is eliminated as a problem
| entirely by this design. It works because code signatures are
| timestamped so don't break when the cert expires, and the db
| connection can be then be encrypted using a self-signed
| certificate that never expires. To rotate the key you roll
| out a new client.
| majkinetor wrote:
| This is basically how its done today with tools like
| Postgrest.
| benatkin wrote:
| > This approach might seem horrifyingly outside-the-box but has
| a lot of advantages
|
| You don't need to worry, there are a lot of people who are into
| stored procedures. If there weren't, there wouldn't be people
| disagreeing with them, like the author of Rails:
| http://web.archive.org/web/20060418215514/http://www.loudthi...
|
| The slippery slope of liking stored procedures a whole lot
| means them taking over the rest of the middle of the stack, and
| that isn't a new idea.
|
| I like the idea of having instant access to the database, and
| microservices are my favored approach.
| nijave wrote:
| Are there any performance issues with connection handling? Not
| sure how/where a connection pooler fits in.
| mwcampbell wrote:
| Mike has called this out as a weakness of PostgreSQL, and an
| advantage of Microsoft SQL Server and Oracle, before.
| mike_hearn wrote:
| This is also discussed in the talk, which I hope will be on
| YouTube soon enough (it was live streamed).
|
| DB connections are heavier than web server connections, in
| particular in terms of memory (when using postgres, not sure
| about mysql). Multiplexers like pgbouncer can address this,
| but then you lose a bit of the deployment simplicity.
| Fortunately RAM is pretty cheap nowadays, you can configure
| the DB to disconnect idle connections, you can use
| multiplexers and you can make the client transparently
| reconnect when a connection is needed. So it turns into a
| RAM/user latency tradeoff.
|
| There are some benefits though. Native DB protocols will tend
| to stream results in binary, avoiding a JSON encoding that
| can expand the bandwidth requirements. Also you can get rid
| of the server side load balancing, because good db drivers
| know how to load balance amongst R/O replicas already. A lot
| of deployment complexity in web apps is driven by the fact
| that browsers handle errors by just throwing up an error page
| and stopping, so you need to avoid even tiny amounts of
| downtime. When you control the client you can do failover and
| recovery more gracefully, which in turn relaxes the
| availability constraints on the servers a bit.
| nijave wrote:
| It seems like this uses native Postgres security/roles
| which I think would prevent a connection pooler from
| working (unless you held open 1 connection per user)
|
| Ultimately it'd be nice if Postgres could keep a hot worker
| pool for connections instead of forking processes but afaik
| there's been plenty of debate over the years and no
| consensus
|
| I think high connection thrashing also eats up CPU and
| there's the issue of increasing load eats more RAM which
| reduces Postgres memory cache. We had a connection leak
| problem and clearing ~2600 PG connections saved something
| like 80-100GB of RAM
| mike_hearn wrote:
| I think you can use https://github.com/pgaudit/set_user
| combined with some pgbouncer configuration to solve that,
| but agreed that postgres doesn't have perfect support for
| this scenario. It'd be better to have a proxy that is
| designed specifically for this use case that knows how to
| drive something like the set_user extension (and better
| if it were to be merged upstream).
| noob4life wrote:
| How does `index_advisor` work?
| oliverrice wrote:
| index_advisor is a project that almost made the cut for LW7. We
| decided to launch it as a pgtle instead to get some feedback
| before we (spoiler) probably release it in LW8
|
| It uses the hypopg package to very quickly generate an index
| for each column referenced in the query and then re-run's an
| explain plan to measure the estimated "total cost".
|
| If the expected "total_cost" reduces, we add the indexes to the
| list of recommendations and continue testing the remaining
| columns.
|
| Its currently limited to single column indexes only but in a
| later update we'll enhance that to do more comprehensive
| searches with multiple columns in a single index
| claytongulick wrote:
| Cool idea for an extension!
|
| I had a whole set of comments about concerns about doing this
| to a large table in prod, and statistics and such - then I
| read about hypopg.
|
| What a neat package, learned something new today.
| claytongulick wrote:
| I'd _love_ to see plv8 added to this.
|
| Jerry S. has done an amazing job of wrangling the insane v8 build
| process into something that mostly works across platforms, but
| installation is still not particularly simple.
|
| Unfortunately, due to the problems with that build process,
| Ubuntu and other flavors dropped the prebuilt apt packages.
|
| Making plv8 a simple install would be a huge boost to the
| community.
| michelpp wrote:
| Good news for you then! plv8 is a trusted language and works
| with pg_tle just fine.
|
| EDIT: I think I misunderstood your comment, plv8 itself is a
| compiled extension and thus cannot be a TLE, however TLEs _can_
| be written in plv8 once you have it installed. Sorry for the
| confusion!
| jerrysievert wrote:
| wow, thanks for the compliment! it hasn't always been easy,
| believe me.
| LunaSea wrote:
| Thank you for the amazing work indeed!
|
| I know that at one point the idea of using a non-v8
| JavaScript engine was floating around, due in part to the V8
| build complexity and the API breaking changes.
|
| Is this still something that is up in the air?
|
| I know that Bun decided to use JavaScriptCore as a recent
| example.
| jerrysievert wrote:
| I have been slowly but surely working on pljs in my spare
| time. it is nowhere near feature parity with plv8, but I
| hope to get it closer with more time. currently missing:
| * query execution/cursors * window functions
| * sub transactions
|
| it is based on quickjs, is very fast/easy to compile, and
| while in long-running functions it is slower than v8, it
| has a very fast startup time that negates that for shorter
| functions.
|
| but, it's not ready for prime time yet.
| mike_hearn wrote:
| That's interesting - could you describe what makes the build
| and install so hard? I can't think of any obvious reason why
| it should be harder than PL/Java. V8 is pretty widely used!
| jerrysievert wrote:
| v8 has its own build chain, which involves ~5gb of
| downloads. that build chain doesn't work on many platforms,
| and requires very specific versions of prerequisites
| including specific glibc versions.
|
| the build chain consists of specific compilers, specific
| versions of chromium parts and pieces, its own sys root,
| everything.
|
| since the API for v8 changes on a regular basis (not as
| much as it has in the past, at least), you cannot simply
| change out v8 versions, as they likely will not work.
|
| v8 is also compiled differently for each embedding
| application, and thus a shared object won't work correctly
| across multiple applications, and besides is strongly
| advised against by the v8 team.
|
| nodejs has its own build process that they've been
| maintaining since the v8 3.1.4 days, but it is specific to
| node.
|
| so, it just adds up to a difficult maintenance.
| mike_hearn wrote:
| Wow. That's kind of surprising. HotSpot is just
| autotools: ./configure && make and you're done. Plus the
| embedding APIs are stable.
|
| Do you have any insight into what led them to have such a
| complex build? Is it just because it's a part of the
| Chromium tree? It doesn't seem inherent to the problem
| space.
| jerrysievert wrote:
| they use their own build systems for everything, even
| skia. though skia has a bit less strict build
| requirements.
| foreigner wrote:
| Can I use this on AWS RDS?
| oliverrice wrote:
| RDS does have the underlying native extension that is required
| to install user defined packages (pg_tle) from database.dev. It
| doesn't currently have support for the extension that is
| necessary to make the in-database client work to install
| packages directly from the registry.
|
| That's great feedback though: We'll add a new page to the
| website for each package that provides the snippet you can run
| on an RDS instance to load the extension and then run `create
| extension`.
|
| We're also in talks with RDS team and experimenting with a few
| options that would allow users to query the registry directly.
| The most promising one we have found so far is using AWS Lambda
| with RDS's aws_lambda extension to do that http part, and pass
| the contents back to the database. Still a WIP though!
| paulddraper wrote:
| No.
|
| RDS has a strict list of approved extensions. (Since AWS is on
| the hook for the system stability, security, replication, etc.)
| supamichelp wrote:
| RDS for Postgres 14+ _does_ come with pg_tle, but it does not
| come with pgsql-http, so you cannot use the dbdev installer.
| But, you _can_ still install TLEs, you just have to do so
| "by hand" by calling `pgtle.install_extension()` yourself,
| probably from a migration script depending on your tooling.
|
| See the RDS docs here: https://docs.aws.amazon.com/AmazonRDS/
| latest/UserGuide/Postg...
| samwillis wrote:
| Partially correct, this doesn't work with RDS as you can't
| install pgsql-http (a requirement for automatic install).
| However all the extensions listed are "Trusted Language
| Extensions for PostgreSQL" which was developed by AWS to
| enable installing extensions written in "Safe" languages. So
| I believe that all the extensions listed could be installed
| manually.
|
| https://github.com/aws/pg_tle
|
| https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Postg.
| ..
| paulddraper wrote:
| Oh cool!
| [deleted]
| nextaccountic wrote:
| From https://supabase.com/blog/dbdev
|
| > dbdev fills the same role for PostgreSQL as npm for JavaScript,
| pip for Python and cargo for Rust in that it enables publishing
| libraries and applications for repeatable deployment. We'll be
| releasing the tooling necessary for third-parties to publish
| pglets to the registry once we've collected some community
| feedback and incorporate any great new ideas. Our goal is to
| create an open ecosystem for packaging and discovering SQL.
|
| How does https://pgxn.org/ fit in this ecosystem?
| supamichelp wrote:
| pgxn does not (yet) support the notion of TLEs, but it
| certainly could with minimal changes. We're currently in
| discussion with the pgxn maintainers about how that would look.
| ccleve wrote:
| TLEs = trusted language extensions?
| [deleted]
| supamichelp wrote:
| Yes.
___________________________________________________________________
(page generated 2023-04-14 23:00 UTC)