[HN Gopher] Realtime Postgres Row Level Security
       ___________________________________________________________________
        
       Realtime Postgres Row Level Security
        
       Author : kiwicopple
       Score  : 119 points
       Date   : 2021-12-01 15:00 UTC (8 hours ago)
        
 (HTM) web link (supabase.com)
 (TXT) w3m dump (supabase.com)
        
       | polycaster wrote:
        | What are approaches to prevent DoS attacks with such an
        | architecture? It seems trivial for an attacker to generate
        | super expensive queries en masse.
       | 
       | Strict row limits could bring some relief but don't solve the
       | problem.
       | 
        | I bet there are solutions for this, but they aren't very
        | obvious. Would someone briefly explain?
        
       | tehnorm wrote:
        | We currently make use of Supabase and it's been fantastic. It's
        | enabled us to completely get away from having a traditional
        | backend/API by utilizing RLS and its realtime nature to get
        | data directly into the UI. Rough numbers show we cut
        | development time by a third vs a traditional approach.
        
         | iwebdevfromhome wrote:
          | When I think of PostgREST, Supabase, and other tools that
          | allow you to skip the backend completely and go straight to
          | the DB, my question is: how do you handle business logic that
          | doesn't make sense to have in either the frontend or the DB?
        
           | thelastbender12 wrote:
            | I have the same question. Since Supabase still exposes the
            | connection parameters to the database, we could spin up a
            | separate Flask server to deal with routes that are not
            | expressible as simple CRUD operations. The JWT secret shared
            | by the auth service (GoTrue) with PostgREST is available
            | too, so I could also use it to authenticate the users
            | myself.
           | 
           | @supabase developers, is this a reasonable way to do things
           | or am I missing something?
        
           | polskibus wrote:
           | You can do business logic in stored procedures, functions and
           | views.
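            | 
            | For instance, an order-placement rule can live in a single
            | function that PostgREST then exposes as a POST /rpc/
            | endpoint. A sketch (table and column names are
            | hypothetical):
            | 
            |     -- Encapsulate "you can only order in-stock items"
            |     -- behind one function instead of client-side checks.
            |     create function place_order(p_item bigint, p_qty int)
            |     returns bigint language plpgsql as $$
            |     declare
            |       v_order bigint;
            |     begin
            |       update inventory set stock = stock - p_qty
            |        where id = p_item and stock >= p_qty;
            |       if not found then
            |         raise exception 'item % out of stock', p_item;
            |       end if;
            |       insert into orders (item_id, qty)
            |       values (p_item, p_qty)
            |       returning id into v_order;
            |       return v_order;
            |     end;
            |     $$;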
        
           | michelpp wrote:
           | My go-to approach is to insert jobs into a queue table, and
           | then have backend workers that consume items from the queue.
           | This has a number of advantages:
           | 
            | 1. Faster user experience: the user isn't waiting for the
            | business logic to complete, since inserting into the queue
            | should only take a couple of milliseconds.
           | 
            | 2. It's more secure: the web worker can have minimal
            | privileges (INSERT only on the queue table) while the
            | backend workers can have much more privilege because they
            | are not user facing.
           | 
            | 3. You can scale the web workers orthogonally to the queue
            | workers, as they'll likely have very different scaling
            | properties.
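            | 
            | A minimal sketch of the pattern (table and role names are
            | hypothetical):
            | 
            |     -- The web-facing role can only enqueue.
            |     create table job_queue (
            |       id bigserial primary key,
            |       payload jsonb not null,
            |       created_at timestamptz not null default now()
            |     );
            |     grant insert on job_queue to web_worker;
            |     grant usage on sequence job_queue_id_seq to web_worker;
            | 
            |     -- Backend workers claim jobs without blocking each
            |     -- other, thanks to SKIP LOCKED.
            |     delete from job_queue
            |     where id in (
            |       select id from job_queue
            |       order by id
            |       for update skip locked
            |       limit 10
            |     )
            |     returning payload;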
           | 
            | Supabase also has a WAL decoding tool called WALRUS that I
            | have not tried yet, and that could be the most efficient
            | approach going forward. The tradeoff is that queue tables
            | are persistent and thus a bit more resilient to
            | failure/retry.
        
             | cpursley wrote:
             | Do you have more info on the WALRUS stuff? Is that part of
             | the Elixir lib?
        
           | cpursley wrote:
            | I actually extracted the WAL-listening bit of code from
            | Supabase and use it to listen for changes in Postgres,
            | then do callback-style business logic in Elixir. If you
            | like Elixir, this could be an option.
           | 
           | I haven't had time to work on it myself until recently, but I
           | brought up the idea of making that bit a separate library
           | (and I understand that the Supabase folks are super busy).
           | 
           | https://github.com/supabase/realtime/issues/146#issue-874855.
           | ..
        
           | pgroves wrote:
            | There are lots of simple things that are normally done in
            | the web framework that are suddenly easier to do in the
            | database (with the side effect that DB optimizations
            | become much easier).
           | 
            | But the other consideration is that you likely need to do a
            | lot with a reverse proxy like Traefik to have much control
            | over what you are really exposing to the outside world.
            | PostgREST is not Spring; it doesn't have explicit control
            | over every little thing, so you're likely to need something
            | in front of it. Anyway, the point is that having a simple
            | Flask server with a few endpoints running wouldn't
            | complicate the architecture very much, because you are
            | better off with something in front of it doing routing
            | already (and SSL termination, etc.).
        
         | pgroves wrote:
         | I'm on a POC project that's using PostgREST and it's been
         | extremely fast to get a big complicated data model working with
          | an API in front of it. But I don't quite get how to use this
          | thing in production. What does devops look like? Do you
          | have sophisticated DB migrations with every deploy? Is all
          | the SQL in version control?
         | 
          | I also don't really get where the users that have all the
          | row-level permissions get created in Postgres. The docs are
          | all about auth for users that are already in there.
        
           | michelpp wrote:
           | This is my personal experience with using PostgREST (I
            | haven't had the full Supabase experience yet):
           | 
           | > What does devops look like?
           | 
           | I usually spin PostgREST workers up in some kind of managed
           | container service, like Google Compute Engine. PostgREST is
           | stateless, so other than upgrades, you never really need to
            | cycle the services. As for resources, PostgREST is
            | extremely lean; I usually try to run 4 to 8 workers per
            | gigabyte of RAM.
           | 
           | > Do you have sophisticated db migrations with every deploy?
           | 
            | You can use whatever migration tool you want. Sqitch is
            | quite popular. I've even worked on projects that were
            | migrated by Django while PostgREST did the API service.
           | 
           | > Is all the SQL in version control?
           | 
            | Yes, this is a good approach, but it means needing a
            | migration tool to apply the migrations in the right order;
            | this is what Sqitch does, and many ORM-ish libraries have
            | migrations sort of half-baked in.
           | 
           | It's worth noting that because many of the objects that
           | PostgREST deals with are views, which have no persistent
           | state, the migration of the views can be _decoupled_ from the
           | migration of the persistent objects like tables. Replacing a
           | view (with CREATE OR REPLACE VIEW) can be done very quickly
            | without locking tables as long as you don't change the
            | view's schema.
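            | 
            | e.g., assuming a hypothetical api.orders view over a
            | private table:
            | 
            |     -- Swaps the definition in place; no table locks, and
            |     -- the output columns must stay compatible.
            |     create or replace view api.orders as
            |       select id, customer_id, total
            |       from private.orders
            |       where deleted_at is null;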
        
           | kiwicopple wrote:
           | In Supabase we use a separate Auth server [0]. This stores
           | the user in an `auth` schema, and these users can login to
           | receive a JWT. Inside the JWT is a "role", which is, in fact,
           | a PostgreSQL role ("authenticated") that has certain grants
           | associated to it, and the user ID (a UUID).
           | 
           | Inside your RLS Policies you can use anything stored inside
           | the JWT. My cofounder made a video [1] on this which is quite
           | concise. Our way of handling this is just an extension of the
           | PostgREST Auth recommendations:
           | https://postgrest.org/en/v9.0/auth.html
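            | 
            | A typical policy then looks something like this (a sketch,
            | assuming a `todos` table with a `user_id` column):
            | 
            |     alter table todos enable row level security;
            | 
            |     -- auth.uid() reads the user's UUID out of the JWT
            |     -- claims.
            |     create policy "Users can view their own todos"
            |       on todos for select
            |       using (auth.uid() = user_id);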
           | 
           | [0] Auth server: https://github.com/supabase/gotrue
           | 
           | [1] RLS Video: https://supabase.com/docs/learn/auth-deep-
           | dive/auth-row-leve...
        
       | kiwicopple wrote:
       | Hey HN, Supabase is an open source Firebase alternative. We're
       | building the features of Firebase using enterprise-grade open
       | source tools. We're particularly focused on scalability. We take
       | proven tools like Postgres, and we make them as easy to use as
       | Firebase.
       | 
       | Today, Supabase is adding Row Level Security (RLS) to our
       | Realtime engine.
       | 
        | The linked blog post goes into depth on the technical
        | implementation, so I'll just give a recap for the people who
        | jump straight to the comments (like me).
       | 
       | Supabase was launched here on HN when we open sourced our
       | Realtime engine[0] - an Elixir server which clients (i.e. website
       | visitors/users) can connect to via websockets and receive a
       | stream of PostgreSQL changes.
       | 
       | The server receives those changes via a logical replication slot
       | - the same system that PostgreSQL uses for replicating to other
       | databases.
       | 
        | To achieve RLS we added a few SQL functions; the main one is
        | apply_rls[1], which the stream is filtered through. For every
        | user connected to the Elixir server, the Postgres function
        | checks whether they have access to the database change and
        | appends an array of allowed user IDs. The Realtime server then
        | delivers the change only to the users matched in this array.
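        | 
        | Conceptually (this is a rough sketch of the idea, not the
        | actual apply_rls implementation; `todos` stands in for the
        | changed table), the per-user check boils down to impersonating
        | each subscriber and testing row visibility:
        | 
        |     begin;
        |     set local role authenticated;
        |     set local request.jwt.claim.sub = '<subscriber uuid>';
        |     -- If the row is visible under RLS, add this user's ID to
        |     -- the allowed list for this change.
        |     select exists (select 1 from todos where id = '<row id>');
        |     rollback;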
       | 
       | This one has been a long time coming, and it's one of the reasons
       | why we have maintained our "beta" badge for so long. A few of the
       | team will be here to answer any questions - my cofounder @awalias
        | and @steve-chavez from PostgREST, @inian, @wenbo, and
        | @1_over_n.
       | 
       | [0] Realtime Show HN:
        | https://news.ycombinator.com/item?id=22114560
       | 
       | [1] SQL function:
       | https://github.com/supabase/realtime/blob/master/server/priv...
        
         | BrandiATMuhkuh wrote:
          | I just want to say thank you. RLS is the feature I was
          | waiting for. It's impressive work you have done!
        
         | me_me_mu_mu wrote:
          | Looks really interesting; I'm looking at your docs now. I
          | have an application using Firebase + GeoFire where users can
          | see things on a map, and those results are themselves
          | updated (e.g. realtime reactions). However, GeoFire limits
          | me heavily because I can't add additional filtering (e.g. by
          | timestamp) and it lacks bbox queries.
         | 
          | Can I use Supabase's Postgres + PostGIS setup to support the
          | same kind of realtime functionality? I'd get a bbox result
          | set to display on a map, and then, when one of those items
          | gets a reaction or something from some user, that change
          | should propagate to anyone else who has that particular item
          | in their current map bbox result set. Just wondering if you
          | know whether that's possible. Thanks!
        
         | golemiprague wrote:
         | Looking at Hasura they provide also column level security and
         | option to add custom logic via actions. Is this something
         | planned in the future and in general how would you compare the
         | two systems and what are the advantages or disadvantages of
         | Supabase?
        
       | Ericson2314 wrote:
       | The row level security people should talk to the capability
       | security people.
       | 
        | Probably _the_ main problem with RDBMSs is also their biggest
        | benefit: you write code that naively looks like some bad
        | polynomial time, and then the query planner does the magic and
        | uses indices. The problem is, of course, when the index
        | doesn't exist or the query planner doesn't do a good job.
       | 
       | What would be nice is a way to _mandate_ that queries must go
       | through certain indices. But this is in fact good for row level
       | security too!
       | 
        | Functional programming is deeply related to capability-based
        | security, in that both try to get rid of all ambient authority
        | and mediate all actions through some sort of reference. Rust,
        | with its many types of reference, makes the relationship even
        | clearer. From these vantage points, it is very natural to use
        | data structures both to make things performant _and_ to
        | enforce certain invariants.
       | 
        | Indices are the closest data structures in RDBMSs, because
        | they promote only certain access patterns. With the rule that
        | queries must go through the indices, they enforce those access
        | patterns, and thus we get the same proven tools. Combine them
        | with views for filtering out data (which, in conjunction with
        | more indices, we can think of as a sort of curried index
        | Map<UserID, Map<Stuff, OtherStuff>>, even if they aren't
        | implemented that way), and now we have very powerful tools for
        | guaranteeing performance and security.
       | 
        | As icing on the cake, I think enforcing limited access
        | patterns could get us closer to that long-sought goal of an
        | RDBMS that automatically knows how to distribute itself. The
        | limited access patterns open up possible options, because the
        | pathological joins that every sharding possibility runs into
        | are no longer allowed. Reusing the same query planner tricks,
        | "cardinality analysis" type stuff can be used to choose
        | between those possibilities.
       | 
       | Here's a worked out concrete example:
       | 
        | Tables:
        | 
        |     User: { k: UserID PK, t: TenantId, n: Name, isAdmin: Bool }
        |     Todo: { k: TodoID PK, u: UserID, d: Date, m: Msg }
        | 
        | Indices and Views:
        | 
        |     # exists_classical means proof of existence, but we don't
        |     # care which witness.
        |     Map<(t: TenantId, exists_classical u: User. u.t = t,
        |         u.isAdmin = true), {u2 <- User, u2.t = t}>
        |     Map<(t: TenantId, exists_classical u: User. u.t = t,
        |         u.isAdmin = true), {td <- Todo, u2 <- User,
        |         td.u = u2.k, u2.t = t}>
        | 
        | ^ that's a joined view with comprehension syntax
        | 
        |     # exists_constructive means we need to show existence and
        |     # we do see which item witnesses it.
        |     Map<t: TenantId, Map<uk: UserID, exists_constructive
        |         u: User. u.t = t, u.k = uk>>
        |     Map<t: TenantId, Map<uk: UserID, exists_classical
        |         u: User. u.t = t, u.k = uk, {td <- Todo, td.u = u.k}>>
        | 
        | The more worked out system would probably allow tupling stuff
        | so as not to repeat the "Map<t: TenantId," at the beginning.
       | 
       | A few things to note:
       | 
        | 1. When we use comprehension syntax, we are "returning the
        | whole table", i.e. an unconstrained view that can be queried
        | in an arbitrary way. A joined table has to be joined in the
        | given way, but is otherwise unconstrained.
       | 
        | 2. exists_constructive is the same as a { ... } comprehension,
        | but the cardinality must be 1, as proven by the unique
        | constraints. exists_classical is a comprehension for an
        | arbitrary number of items that is then _truncated_ in the type
        | theory sense.
        | https://xenaproject.wordpress.com/2021/05/19/the-trace-of-an...
        | is a good discussion of an example of this truncation.
       | 
        | 3. Normally one puts the TenantId on every table, even though
        | it is denormalizing. Now the access patterns enforce this.
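        | 
        | In today's Postgres you can at least approximate that
        | enforcement with a composite index plus an RLS policy keyed on
        | the tenant (a sketch; names are hypothetical):
        | 
        |     create index todo_by_tenant on todo (tenant_id, k);
        | 
        |     create policy tenant_isolation on todo using
        |       (tenant_id = current_setting('app.tenant_id')::uuid);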
       | 
       | Clearly this is a few PhD theses away from actually existing :),
       | but all the embryonic theory comes from good well-established
       | things, which makes me confident.
        
         | steve-chavez wrote:
         | > What would be nice is a way to mandate that queries must go
         | through certain indices.
         | 
          | Yes, this is an interesting idea. We're kinda gravitating
          | towards this approach in PostgREST lately. For example, in a
          | recent discussion about integrating with PGroonga[1], we were
          | thinking of only exposing its operators once an index is
          | enabled on a column:
          | 
          |     CREATE INDEX index_name ON table USING pgroonga (column);
          | 
          | Enables doing:
          | 
          |     GET /table?column=groonga_query.PostgreSQL
          | 
          | Which roughly translates to:
          | 
          |     SELECT * FROM table WHERE column &@~ 'PostgreSQL';
         | 
          | Of course this is PostgREST-specific; the restriction would
          | only be enforced when going through it.
         | 
         | [1]: https://github.com/PostgREST/postgrest/issues/2028
        
       | SahAssar wrote:
        | I've been developing an app on PostgREST (which Supabase uses)
        | for a while, and my solution to this was to have a NOTIFY
        | trigger and a small program that broadcast the changed IDs
        | over SSE to clients. The clients would then fetch the changes.
        | 
        | As for the RLS aspect, I considered it "good enough" to allow
        | all clients to know the changed ID (a UUID) as long as they
        | got no other info, and planned to have the trigger also send
        | the allowed groups in the NOTIFY so that the program could
        | filter without needing a separate PG connection per
        | subscriber. In my testing it scales very well.
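        | 
        | Roughly like this (a sketch; table, column, and channel names
        | are hypothetical):
        | 
        |     create function notify_change() returns trigger
        |     language plpgsql as $$
        |     begin
        |       -- Broadcast only the UUID plus the allowed groups, so
        |       -- subscribers learn nothing about the row contents.
        |       perform pg_notify('changes', json_build_object(
        |         'id', new.id,
        |         'groups', new.allowed_groups
        |       )::text);
        |       return new;
        |     end;
        |     $$;
        | 
        |     create trigger items_notify
        |       after insert or update on items
        |       for each row execute function notify_change();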
       | 
        | Can anyone from Supabase comment on why LISTEN/NOTIFY triggers
        | were not used, and why WS was used over SSE when the
        | communication is not bi-directional? It'd be interesting to
        | hear your thoughts.
        
         | kiwicopple wrote:
         | > LISTEN/NOTIFY triggers were not used
         | 
          | We started with this implementation; however, there are 2
          | issues with that approach: 1) NOTIFY has a payload limit of
          | 8000 bytes, so large rows are silently dropped; 2) you need
          | to attach the notify trigger to every table (vs a
          | Publication, which can cover an entire database in a single
          | line of code).
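          | 
          | For comparison, the Publication really is one statement
          | (name hypothetical):
          | 
          |     CREATE PUBLICATION realtime_pub FOR ALL TABLES;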
         | 
         | > why WS was used
         | 
          | The server is Elixir (Erlang), so it's very good at WS. Over
          | time we will add other useful multiplayer features (like
          | presence).
        
           | SahAssar wrote:
           | Thanks for the response! I have a few more questions if you
           | have the time:
           | 
           | If I'm reading the article correctly this feature is only
           | available for authenticated users? Is that restriction based
           | on scaling or some other limitation?
           | 
            | When I looked at the example query in the article, it only
            | queries the primary key. Does this also work for tables
            | where a GRANT hides certain columns from different users?
        
       ___________________________________________________________________
       (page generated 2021-12-01 23:01 UTC)