[HN Gopher] How We Turn Authorization Logic into SQL
       ___________________________________________________________________
        
       How We Turn Authorization Logic into SQL
        
       Author : gneray
       Score  : 87 points
       Date   : 2021-11-23 18:43 UTC (4 hours ago)
        
 (HTM) web link (www.osohq.com)
 (TXT) w3m dump (www.osohq.com)
        
       | winrid wrote:
       | ISO any articles/documents related to scaling access control, for
       | example if you have 100_000 users and 90k of them have access to
       | some resource, but 10k do not, and you can't use groups that your
       | customer knows about. Obvious solutions are "where
       | allowed_user_ids = ... big list" or "where disallowed_user_ids NE
       | ... small list"; the latter not a solution as you can't optimize
       | this query with a normal tree-like index.
       | 
       | I suppose you could use some sort of bloom filter, or
       | create/maintain groups behind the scenes somehow, but haven't
       | seen many articles cover this.
        
         | tmoertel wrote:
         | There's the paper on Google's Zanzibar:
         | 
         | https://research.google/pubs/pub48190/
         | 
         | "This paper presents the design, implementation, and deployment
         | of Zanzibar, a global system for storing and evaluating access
         | control lists. Zanzibar provides a uniform data model and
         | configuration language for expressing a wide range of access
         | control policies from hundreds of client services at Google,
         | including Calendar, Cloud, Drive, Maps, Photos, and YouTube.
         | Its authorization decisions respect causal ordering of user
         | actions and thus provide external consistency amid changes to
         | access control lists and object contents. Zanzibar scales to
         | trillions of access control lists and millions of authorization
         | requests per second to support services used by billions of
         | people. It has maintained 95th-percentile latency of less than
         | 10 milliseconds and availability of greater than 99.999% over 3
         | years of production use."
        
         | [deleted]
        
         | pphysch wrote:
         | "RBAC Like It Was Meant To Be" https://tailscale.com/blog/rbac-
         | like-it-was-meant-to-be/
         | 
         | The key insight is to define policies as purely virtual; a
         | policy never refers to real users or real resources. Instead,
         | it refers to virtual users (via "roles") or virtual resources
         | (via "tags"), which can then be associated with real
         | users/resources without modifying the policy.
        
         | jzelinskie wrote:
         | Disclosure: I'm a cofounder of Authzed, a YC company building a
         | permissions database[0] inspired by Google's Zanzibar paper[1].
         | 
         | This is actually a really hard problem and depends on the
         | systems with which you are integrating. We call this problem
         | "ACL filtering"[2] and there are two general strategies: pre
         | and post filtering.
         | 
         | We have a blog post[3] describing our API for pre-filtering
         | which can stream results that you can then use build a SQL
         | query or data-structures like bloom filters/bitmaps. We
         | currently have a proposal on GitHub[4] for an extension to that
         | strategy adding a denormalization/caching layer (which is
         | applying a similar strategy to the Leopard Indexing system
         | internally at Google).
         | 
         | You might also be surprised at the performance you can achieve
         | with post-filtering by building an iterator in your programming
         | language of choice that will batch together permission checks
         | and amortize the cost of filtering those results from the set
         | of all results that you pull out of your database.
         | 
         | Additionally, if you're interested deeper database specific
         | integrations, there are hooks into various components such as
         | Postgres's Row Level Security, but that typically means
         | eschewing a cloud service to operate your database for you
         | (e.g. RDS) so that you can install your own plugins.
         | 
         | [0]: https://github.com/authzed/spicedb
         | 
         | [1]: https://authzed.com/blog/what-is-zanzibar/
         | 
         | [2]: https://docs.authzed.com/reference/glossary#acl-filtering
         | 
         | [3]: https://authzed.com/blog/acl-filtering-in-authzed/
         | 
         | [4]: https://github.com/authzed/spicedb/issues/207
        
         | rzzzt wrote:
         | Seconded! I can get to a million authentication-related
         | articles and services, but "authorization theory" seems to be
         | very hard to find. (It doesn't help that the big text box in
         | the sky treats the two words as somewhat interchangeable.)
        
         | deepsun wrote:
         | Wasn't LDAP developed exactly for that?
        
       | eatonphil wrote:
       | Are there any policy-language-libraries-backed-by-sql like Polar
       | but that aren't based on logic programming languages? I don't
       | really want to learn logic programming for this purpose nor do I
       | want to require it on my coworkers.
       | 
       | I guess I'm just looking for a library + SQL shorthand that can
       | easily interpolate request variables and session variables that
       | gets declared in code where a route is declared. Just spitballing
       | but something like `(blogs.id = $req.blogid).userid =
       | $session.userid OR (users.id = $session.userid).isAdmin`.
       | 
       | This [0] is close but it doesn't have enough momentum to be well
       | documented let alone usable as a library in every language you'd
       | want (Go, C#, Python, Node.js, etc.).
       | 
       | Edit: Maybe OPA/Rego can in fact do this [1].
       | 
       | [0] https://github.com/mrumkovskis/tresql
       | 
       | [1] https://blog.openpolicyagent.org/write-policy-in-opa-
       | enforce...
        
         | nicoburns wrote:
         | What advantage would this have over a middleware that
         | implements this logic as a lambda?
        
         | evancordell wrote:
         | Rego is a nice language to use (IMO), but also has roots in
         | logic programming and is closer to datalog than SQL.
        
       | craz wrote:
       | If you found this post interesting, here's another great post
       | about doing a similar thing with OPA Policies:
       | https://blog.openpolicyagent.org/write-policy-in-opa-enforce...
        
       | agentultra wrote:
       | fwiw, PostgreSQL has a built-in mechanism for filtering rows
       | based on authorization rules: _row-level security_ [0].
       | 
       | This can simplify your data-access layers quite a lot and pushes
       | you towards better security practices like limiting the scope of
       | permissions granted to your applications' role.
       | 
       | If you like Polar but can't use it for whatever reason it does a
       | lot of what Polar does.
       | 
       | [0] https://www.postgresql.org/docs/9.5/ddl-rowsecurity.html
        
       | revskill wrote:
       | Good pattern ! At least a library that makes sense. Thanks.
       | 
       | One point about production usage, you should adapt to a NoSQL
       | backend as we want to query authorization logic against a caching
       | layer for performance reason.
        
         | lpapez wrote:
         | Why not simply add caching to the existing SQL solution?
        
       | epberry wrote:
       | > Oso's Django and SQLAlchemy integrations turn partials from
       | Polar into database queries... The SQL they produce relies
       | heavily on nested subqueries
       | 
       | Live by ORM, die by ORM. This strikes me as particularly bad
       | because these authorization queries may be running on every
       | request. It's great to see Oso went direct to SQL to address
       | this. And the asides about logical programming were fun as well.
        
         | aidos wrote:
         | But you can build any query with Sqlalchemy - even when using
         | the ORM rather than the Core, you still have that control,
         | right?
         | 
         | Don't get me wrong, I've read stuff by these guys and I know
         | they know what they're talking about. Certainly, if you have to
         | make it work for Django as well you'd be considering going
         | straight to sql.
         | 
         | As and aside, I walked through their stuff a little while back
         | and I'm very interested in what they're building.
        
         | gfody wrote:
         | agree this seems like the worst of both worlds between doing
         | things application-side and doing things database-side. imo
         | you're much better off embracing your db's security features
        
         | airstrike wrote:
         | Building bad queries with an ORM isn't necessarily the ORM
         | engine's fault. It's really just a tool and, as such, its value
         | really relies on one's proficiency with it.
         | 
         | One simple but illustrative example:
         | https://docs.djangoproject.com/en/3.2/ref/models/querysets/#...
        
         | throw_m239339 wrote:
         | Well (most) ORM often cannot take advantage of CTE, group
         | concat, array types, json types and what not when building
         | queries from code, for obvious reason. So they often can't
         | return multi-dimensional data in a single query.
         | 
         | But ORM are still useful to get an application off the ground
         | fast.
        
           | gfody wrote:
           | do ORMs still target lowest common denominator ansi sql? I
           | only know that EF doesn't - it has "providers" for each rdbms
           | it supports and will liberally emit CTEs and db specific
           | language features like "outer apply" or "join lateral" but
           | then stupidly ignore powerful features like "for json" or
           | "json_agg/json_build_object" which could eliminate the
           | outrageously naive way it encodes 1:m and m:m results - it
           | boggles how smart and stupid orms are at the same time.
        
             | jiggawatts wrote:
             | JSON is too "lossy" to use automatically.
        
           | aidos wrote:
           | To be clear, sqlalchemy does support those features.
        
       | pphysch wrote:
       | This seems like a fairly brute-force policy management approach
       | that has the shortcomings outlined in this article from
       | TailScale: https://tailscale.com/blog/rbac-like-it-was-meant-to-
       | be/
       | 
       | What happens when you have sweeping changes to existing policies?
       | It seems like you have to chase down every other line of DSL and
       | fix policies individually.
        
       | ZikPhil wrote:
       | Man, OSO write such good blog posts
        
         | gneray wrote:
         | Thanks. What would you want to us to write about next?
        
           | jauco wrote:
           | The question by winrid on scaling authz would be interesting
           | 
           | https://news.ycombinator.com/item?id=29323287
        
             | gneray wrote:
             | Yeah absolutely
        
       ___________________________________________________________________
       (page generated 2021-11-23 23:00 UTC)