[HN Gopher] How We Turn Authorization Logic into SQL
___________________________________________________________________
How We Turn Authorization Logic into SQL
Author : gneray
Score : 87 points
Date : 2021-11-23 18:43 UTC (4 hours ago)
(HTM) web link (www.osohq.com)
(TXT) w3m dump (www.osohq.com)
| winrid wrote:
| ISO any articles/documents related to scaling access control, for
| example if you have 100_000 users and 90k of them have access to
| some resource, but 10k do not, and you can't use groups that your
| customer knows about. Obvious solutions are "where
| allowed_user_ids = ... big list" or "where disallowed_user_ids NE
| ... small list"; the latter not a solution as you can't optimize
| this query with a normal tree-like index.
|
| I suppose you could use some sort of bloom filter, or
| create/maintain groups behind the scenes somehow, but haven't
| seen many articles cover this.
| tmoertel wrote:
| There's the paper on Google's Zanzibar:
|
| https://research.google/pubs/pub48190/
|
| "This paper presents the design, implementation, and deployment
| of Zanzibar, a global system for storing and evaluating access
| control lists. Zanzibar provides a uniform data model and
| configuration language for expressing a wide range of access
| control policies from hundreds of client services at Google,
| including Calendar, Cloud, Drive, Maps, Photos, and YouTube.
| Its authorization decisions respect causal ordering of user
| actions and thus provide external consistency amid changes to
| access control lists and object contents. Zanzibar scales to
| trillions of access control lists and millions of authorization
| requests per second to support services used by billions of
| people. It has maintained 95th-percentile latency of less than
| 10 milliseconds and availability of greater than 99.999% over 3
| years of production use."
| [deleted]
| pphysch wrote:
| "RBAC Like It Was Meant To Be" https://tailscale.com/blog/rbac-
| like-it-was-meant-to-be/
|
| The key insight is to define policies as purely virtual; a
| policy never refers to real users or real resources. Instead,
| it refers to virtual users (via "roles") or virtual resources
| (via "tags"), which can then be associated with real
| users/resources without modifying the policy.
| jzelinskie wrote:
| Disclosure: I'm a cofounder of Authzed, a YC company building a
| permissions database[0] inspired by Google's Zanzibar paper[1].
|
| This is actually a really hard problem and depends on the
| systems with which you are integrating. We call this problem
| "ACL filtering"[2] and there are two general strategies: pre
| and post filtering.
|
| We have a blog post[3] describing our API for pre-filtering
| which can stream results that you can then use build a SQL
| query or data-structures like bloom filters/bitmaps. We
| currently have a proposal on GitHub[4] for an extension to that
| strategy adding a denormalization/caching layer (which is
| applying a similar strategy to the Leopard Indexing system
| internally at Google).
|
| You might also be surprised at the performance you can achieve
| with post-filtering by building an iterator in your programming
| language of choice that will batch together permission checks
| and amortize the cost of filtering those results from the set
| of all results that you pull out of your database.
|
| Additionally, if you're interested deeper database specific
| integrations, there are hooks into various components such as
| Postgres's Row Level Security, but that typically means
| eschewing a cloud service to operate your database for you
| (e.g. RDS) so that you can install your own plugins.
|
| [0]: https://github.com/authzed/spicedb
|
| [1]: https://authzed.com/blog/what-is-zanzibar/
|
| [2]: https://docs.authzed.com/reference/glossary#acl-filtering
|
| [3]: https://authzed.com/blog/acl-filtering-in-authzed/
|
| [4]: https://github.com/authzed/spicedb/issues/207
| rzzzt wrote:
| Seconded! I can get to a million authentication-related
| articles and services, but "authorization theory" seems to be
| very hard to find. (It doesn't help that the big text box in
| the sky treats the two words as somewhat interchangeable.)
| deepsun wrote:
| Wasn't LDAP developed exactly for that?
| eatonphil wrote:
| Are there any policy-language-libraries-backed-by-sql like Polar
| but that aren't based on logic programming languages? I don't
| really want to learn logic programming for this purpose nor do I
| want to require it on my coworkers.
|
| I guess I'm just looking for a library + SQL shorthand that can
| easily interpolate request variables and session variables that
| gets declared in code where a route is declared. Just spitballing
| but something like `(blogs.id = $req.blogid).userid =
| $session.userid OR (users.id = $session.userid).isAdmin`.
|
| This [0] is close but it doesn't have enough momentum to be well
| documented let alone usable as a library in every language you'd
| want (Go, C#, Python, Node.js, etc.).
|
| Edit: Maybe OPA/Rego can in fact do this [1].
|
| [0] https://github.com/mrumkovskis/tresql
|
| [1] https://blog.openpolicyagent.org/write-policy-in-opa-
| enforce...
| nicoburns wrote:
| What advantage would this have over a middleware that
| implements this logic as a lambda?
| evancordell wrote:
| Rego is a nice language to use (IMO), but also has roots in
| logic programming and is closer to datalog than SQL.
| craz wrote:
| If you found this post interesting, here's another great post
| about doing a similar thing with OPA Policies:
| https://blog.openpolicyagent.org/write-policy-in-opa-enforce...
| agentultra wrote:
| fwiw, PostgreSQL has a built-in mechanism for filtering rows
| based on authorization rules: _row-level security_ [0].
|
| This can simplify your data-access layers quite a lot and pushes
| you towards better security practices like limiting the scope of
| permissions granted to your applications' role.
|
| If you like Polar but can't use it for whatever reason it does a
| lot of what Polar does.
|
| [0] https://www.postgresql.org/docs/9.5/ddl-rowsecurity.html
| revskill wrote:
| Good pattern ! At least a library that makes sense. Thanks.
|
| One point about production usage, you should adapt to a NoSQL
| backend as we want to query authorization logic against a caching
| layer for performance reason.
| lpapez wrote:
| Why not simply add caching to the existing SQL solution?
| epberry wrote:
| > Oso's Django and SQLAlchemy integrations turn partials from
| Polar into database queries... The SQL they produce relies
| heavily on nested subqueries
|
| Live by ORM, die by ORM. This strikes me as particularly bad
| because these authorization queries may be running on every
| request. It's great to see Oso went direct to SQL to address
| this. And the asides about logical programming were fun as well.
| aidos wrote:
| But you can build any query with Sqlalchemy - even when using
| the ORM rather than the Core, you still have that control,
| right?
|
| Don't get me wrong, I've read stuff by these guys and I know
| they know what they're talking about. Certainly, if you have to
| make it work for Django as well you'd be considering going
| straight to sql.
|
| As and aside, I walked through their stuff a little while back
| and I'm very interested in what they're building.
| gfody wrote:
| agree this seems like the worst of both worlds between doing
| things application-side and doing things database-side. imo
| you're much better off embracing your db's security features
| airstrike wrote:
| Building bad queries with an ORM isn't necessarily the ORM
| engine's fault. It's really just a tool and, as such, its value
| really relies on one's proficiency with it.
|
| One simple but illustrative example:
| https://docs.djangoproject.com/en/3.2/ref/models/querysets/#...
| throw_m239339 wrote:
| Well (most) ORM often cannot take advantage of CTE, group
| concat, array types, json types and what not when building
| queries from code, for obvious reason. So they often can't
| return multi-dimensional data in a single query.
|
| But ORM are still useful to get an application off the ground
| fast.
| gfody wrote:
| do ORMs still target lowest common denominator ansi sql? I
| only know that EF doesn't - it has "providers" for each rdbms
| it supports and will liberally emit CTEs and db specific
| language features like "outer apply" or "join lateral" but
| then stupidly ignore powerful features like "for json" or
| "json_agg/json_build_object" which could eliminate the
| outrageously naive way it encodes 1:m and m:m results - it
| boggles how smart and stupid orms are at the same time.
| jiggawatts wrote:
| JSON is too "lossy" to use automatically.
| aidos wrote:
| To be clear, sqlalchemy does support those features.
| pphysch wrote:
| This seems like a fairly brute-force policy management approach
| that has the shortcomings outlined in this article from
| TailScale: https://tailscale.com/blog/rbac-like-it-was-meant-to-
| be/
|
| What happens when you have sweeping changes to existing policies?
| It seems like you have to chase down every other line of DSL and
| fix policies individually.
| ZikPhil wrote:
| Man, OSO write such good blog posts
| gneray wrote:
| Thanks. What would you want to us to write about next?
| jauco wrote:
| The question by winrid on scaling authz would be interesting
|
| https://news.ycombinator.com/item?id=29323287
| gneray wrote:
| Yeah absolutely
___________________________________________________________________
(page generated 2021-11-23 23:00 UTC)