How We Turn Authorization Logic Into SQL


Authorization questions can take many different forms in your application. The most obvious one, given a kitten and a resource, is to ask something like "Can this kitten access this resource?" A slightly more sophisticated but still common case is to ask "What are all the resources this kitten can access?"

Letting users answer the second type of question is the goal of a project we call data filtering. Briefly, data filtering turns authorization logic from our declarative policy language, Polar, into SQL to enable efficient and secure data access. In this post I'll describe how we built data filtering for the first, second, and third times, and talk about the functionality we hope to make available to all of our users over the coming releases.

Oso and Polar

To understand exactly how we're approaching the problem, it helps to know a little bit about how Oso works under the hood. Oso is a cross-platform authorization framework based on a declarative policy language that we built, called Polar. More specifically, Polar is an interpreted logic programming language, and it runs inside a library loaded by your application. To make a specific authorization decision, the Polar interpreter evaluates your authorization policy (a Polar program) for a given “subject verb object” triple (we usually give them the names actor, action, and resource).

Polar is a logic programming language based on Prolog. Logic programming is an underutilized programming style where programs are expressed as logical formulas and executed by trying to verify them for a given set of values for their variables. A feature of logic programming that sets it apart from other styles is that in many cases, not every variable needs to have an associated value in order for a program that refers to it to continue executing.

This means that an expression like kitten(Max) has two possible interpretations: either we know who Max is, so we should be able to say whether or not they're a kitten; or we don't, in which case we treat this as an assertion: Max is a kitten until proven otherwise.

🐻‍❄️ vs 🐍

The same idea extends to rules with more arguments: as an example, let’s compare the Polar rule

add(a, b, c) if a + b = c;

with the Python function

def add(a, b):
    return a + b

The most obvious difference between the two versions is their arity: the Python function takes two arguments while the Polar rule takes three. The second most obvious difference is the absence of anything like a return statement in the Polar rule. This would be unremarkable in an expression-oriented language like Ruby or Rust, where the return would be implicit, but Polar rules don’t return values at all; they only make assertions about their arguments. Nevertheless, both definitions can be used to find the sum of two numbers. The Python function returns the sum as a value, while the Polar rule assigns the sum to its third argument:

# Python
sum = add(2, 2)
# Polar
add(2, 2, sum)

But the power of logic programming makes the Polar rule more versatile. We can use it to obtain a representation of the difference between two numbers instead, by calling it like this:

add(10, difference, 20)

In some languages, this would assign 10 to difference. In Polar, it instead assigns a placeholder value called a "partial" that we'll talk about in more detail below. But both representations, in theory, carry the same information.

To summarize, we might try to describe the difference between Polar and Python by saying that the Polar rule defines a relationship between values while the Python function computes a value from values.
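To make the "relationship between values" idea concrete, here's a loose sketch in Python of the relational reading of add. This is only an illustration of the idea, not how Polar evaluates rules internally; the use of None to mark an unbound variable is our own convention here.

```python
# A toy "relational" add: given any two of (a, b, c), solve for the third.
# None loosely stands in for an unbound logic variable.
def add(a=None, b=None, c=None):
    """Return a dict binding every variable so that a + b = c."""
    if c is None:
        c = a + b          # add(2, 2, sum): bind the sum
    elif b is None:
        b = c - a          # add(10, difference, 20): bind the difference
    elif a is None:
        a = c - b
    assert a + b == c, "no consistent binding exists"
    return {"a": a, "b": b, "c": c}

print(add(a=2, b=2))    # binds c to 4
print(add(a=10, c=20))  # binds b to 10: the difference
```

One definition answers several different questions, depending on which arguments are known; that's the versatility the Polar rule gets for free from logic programming.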

What does this have to do with data filtering?

The add example above isn’t super interesting, because we already know how to do subtraction. But the same principle works outside of just arithmetic. To see how it could apply in the context of authorization, consider this simple allow rule:

allow(actor, "edit", resource) if actor = resource.owner;

The most basic application for this rule is to check whether a given actor can edit a specific resource. But if, as above, we were to call the rule without a known resource as its third argument, we might expect to receive back the set of possible values for resource that satisfy the rule. In other words, fully supporting the logic programming abstraction here would let one rule answer both of the questions we referred to in the intro: Can this actor edit this resource? And, what resources can this actor edit? Letting developers use their existing Oso policies to answer the second type of question is what we call "data filtering."
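To see the two questions side by side, here's a hedged Python sketch where the rule body has been reduced to a plain predicate. The data and names are illustrative, and the list comprehension is the naive version of data filtering: real data filtering pushes the condition into the database rather than fetching and testing every row.

```python
# One predicate, two questions. The rule body "actor = resource.owner"
# becomes an ordinary Python function.
def allow(actor, action, resource):
    return action == "edit" and actor == resource["owner"]

resources = [
    {"id": 1, "owner": "gwen"},
    {"id": 2, "owner": "leina"},
    {"id": 3, "owner": "gwen"},
]

# Question 1: can this actor edit this specific resource?
print(allow("gwen", "edit", resources[0]))  # True

# Question 2: which resources can this actor edit?
editable = [r for r in resources if allow("gwen", "edit", r)]
print([r["id"] for r in editable])          # [1, 3]
```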

Partial success

Before going into detail about how we’ve been trying to make this possible, let’s take a look at what happens right now, if you try to call the allow rule given above in a Polar REPL using the Python library. (Anyone following along at home is encouraged to use the Python REPL, as there's some disparity between host libraries when it comes to handling unbound variables.)

When you enter something like

query> actor = { name: "gwen" } and
       resource = { created_by: actor } and
       allow(actor, "edit", resource)

the Polar shell simply displays the variable/value associations

actor = { name: 'gwen' }
resource = { created_by: { name: 'gwen' } }

These are just the values we passed in. Neither one has been "computed" by the rule here: the rule has simply verified that all the required conditions hold over them. However if we leave resource undefined, the output becomes more complicated:

query> actor = { name: "gwen" } and
       allow(actor, "edit", resource)

actor = {'name': 'gwen'}
resource = Expression(And, [Expression(Unify, [{'name': 'gwen'}, Expression(Dot, [Variable('_this'), 'created_by'])])])

The binding for resource above is no longer a concrete or “ground” value. Instead, the Expression type represents a predicate that must hold on resource for the rule to be satisfied. We call these provisional assignments “partials”, which as far as I know is a term we made up. But, it makes sense if you consider the variables they refer to as being “partially” bound to a value: whatever it is, whoever it was created_by, their name is "gwen".

Partials are computed as a side effect of regular policy evaluation, so we already have something to start with. The next step is to transform the set of partials into the set of values they describe. SQL is a widely used tool for describing data sets filtered by conditions, so targeting SQL is useful for both practical and explanatory purposes: in the case above, our goal would be roughly to turn the condition expressed by the partial _this.created_by = { name: "gwen" } into a query such as SELECT posts.* FROM posts WHERE posts.created_by_id = <gwen's id>.
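As a rough illustration of that translation, here's a self-contained sketch using sqlite3. The condition dict is a hypothetical, heavily simplified stand-in for a partial that the host library has already resolved down to a column, an operator, and gwen's primary key; it is not Oso's actual intermediate representation.

```python
import sqlite3

# Hypothetical simplified partial: _this.created_by = { name: "gwen" },
# already resolved to a column comparison against gwen's id (1).
condition = {"column": "created_by_id", "op": "=", "value": 1}

# Render the condition as a parameterized WHERE clause.
sql = f"SELECT posts.* FROM posts WHERE posts.{condition['column']} {condition['op']} ?"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT, created_by_id INTEGER)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(1, "a", 1), (2, "b", 2), (3, "c", 1)])
rows = conn.execute(sql, (condition["value"],)).fetchall()
print(rows)  # only the posts created by gwen
```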

v0: authorized_session

Our first attempt at this resulted in two Python integration libraries that we still maintain today. Oso’s Django and SQLAlchemy integrations turn partials from Polar into database queries, and use this capability to automatically apply authorization to every model we pull out of the database. This makes for a “hands-free” experience where you can often avoid writing any explicit authorization logic at all in your application, not even calls to Oso API functions; everything is automatically handled by middleware.

Unfortunately, these libraries have some limitations. First and most obviously, they’re Python-specific, and framework-specific on top of that, which is not our vibe here at Oso. They’re also limited in what kinds of policies they can translate into queries, and tightly coupled to the details of how we represent partials, which makes them difficult to extend. The SQL they produce relies heavily on nested subqueries: for example, here's a selection from our tests:

SELECT posts.id AS posts_id, posts.contents AS posts_contents, posts.title AS posts_title, posts.access_level AS
       posts_access_level, posts.created_by_id AS posts_created_by_id, posts.needs_moderation AS
       posts_needs_moderation
       FROM posts
       WHERE (EXISTS (SELECT 1
       FROM users
       WHERE users.id = posts.created_by_id AND (EXISTS (SELECT 1
       FROM posts
       WHERE users.id = posts.created_by_id AND posts.id > ? AND (EXISTS (SELECT 1
       FROM users
       WHERE users.id = posts.created_by_id AND (EXISTS (SELECT 1
       FROM posts
       WHERE users.id = posts.created_by_id AND posts.id > ? AND (EXISTS (SELECT 1
       FROM users
       WHERE users.id = posts.created_by_id AND (EXISTS (SELECT 1
       FROM posts
       WHERE users.id = posts.created_by_id AND posts.id > ?)))))))))))) AND (EXISTS (SELECT 1
       FROM users
       WHERE users.id = posts.created_by_id AND (EXISTS (SELECT 1
       FROM posts
       WHERE users.id = posts.created_by_id)))) AND posts.id > ? AND posts.id =

Finally, while the authorized_session API they offer is very cool and has some compelling use cases, it’s not as general as we would like. Popular ORMs already offer abstractions around database queries that we’d like to plug into. Our ideal, general-purpose API would return a query with authorization already applied, that the user can then refine in whatever way they need to, for example by applying additional conditions, or paginating the results.

v1: authorized_query

Our next attempt would be more ambitious. Oso v0.20.1 saw the initial release of our framework-agnostic data filtering API for the Python, Ruby and Node libraries. This time we'd offer a more flexible, slightly lower-level API that returns an ORM query or a collection of objects to the user. We added a component to the Rust library that translates partial results into what we called a "filter plan", a structure that captures information about dependencies between different data sources and describes a sequence of filters that, when composed, yield the desired query or set of objects. To use data filtering, the user would supply Oso with callbacks to operate on "queries", in their application-specific form. In a Rails app, for example, this would be through ActiveRecord::Relation objects.

While this design works in many cases, it also has some drawbacks. For example, it makes no attempt to consolidate partial results into a single database query, instead making separate queries over each table. It also assembles some intermediate results in memory in order to generate subsequent queries. That means that although you can now call authorized_query and get a query object back, there's actually no guarantee Oso is accessing your data store in an efficient way. And not only would Oso issue intermediate queries in order to construct the final query, but the SQL it ended up generating wasn't always ideal either. Take a look at the sequence of database queries issued to handle an API call in our GitClub example application:

[screenshot: sequence of database queries for a single GitClub API call]

Finally, v1 was originally based on a prototype design that didn't quite capture some fairly common usage patterns. To support these cases, we had to cram a lot more information into our filter descriptions, which made data filtering significantly more complicated for users to configure.
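The shape of the v1 problem can be sketched in a few lines; this is an illustration of the multi-query pattern, not Oso's actual filter-plan code, and the schema and names are invented for the example. Note that a single JOIN could have answered this in one round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE posts (id INTEGER, created_by_id INTEGER);
    INSERT INTO users VALUES (1, 'gwen'), (2, 'leina');
    INSERT INTO posts VALUES (1, 1), (2, 2), (3, 1);
""")

# Step 1: a separate query against the users table.
user_ids = [r[0] for r in conn.execute(
    "SELECT id FROM users WHERE name = ?", ("gwen",))]

# Step 2: the in-memory intermediate result is spliced into a second
# query via an IN clause -- one round trip per table.
placeholders = ",".join("?" * len(user_ids))
posts = conn.execute(
    f"SELECT id FROM posts WHERE created_by_id IN ({placeholders})",
    user_ids).fetchall()
print([p[0] for p in posts])  # [1, 3]
```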

v2: don't know much about algebra

Our next iteration tried to solve these problems by choosing a new representation for filters, one that we believed would be more able to express the full range of queries we'd like to support. Fortunately, SQL is already based on a convenient formalism called relational algebra, that can express complicated selections of data by composing a few general primitives. Although we aim to support more than just relational databases, we felt that this way of describing data sets was general enough to work for most of our users, and that we'd be able to extend it fairly easily to work with non-relational data if the need arose.

So we went ahead with a proof-of-concept for the new filter design. The v1 approach required users to analyze a complicated data structure with non-obvious semantics in order to get a useful query out of data filtering. The new approach would essentially send back an abstract syntax tree describing an idealized SQL query, which would turn itself into an ORM query object by recursively calling a user-defined method on each node. User configuration would now consist of providing definitions for these to_query methods for each relational primitive: select, join, etc.
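A toy version of that design might look like the following. The node classes and the to_query name mirror the description above, but this is a hypothetical sketch, not the actual v2 implementation, and a real integration would build ORM query objects rather than SQL strings.

```python
# Each AST node renders itself by recursively calling to_query on its
# children -- the user supplies one method per relational primitive.
class Comparison:
    def __init__(self, column, value):
        self.column, self.value = column, value
    def to_query(self):
        return f"{self.column} = {self.value!r}"

class And:
    def __init__(self, *children):
        self.children = children
    def to_query(self):
        return " AND ".join(c.to_query() for c in self.children)

class Select:
    def __init__(self, table, where):
        self.table, self.where = table, where
    def to_query(self):
        return f"SELECT {self.table}.* FROM {self.table} WHERE {self.where.to_query()}"

ast = Select("posts", And(Comparison("posts.created_by_id", 1)))
print(ast.to_query())
# SELECT posts.* FROM posts WHERE posts.created_by_id = 1
```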

This approach quickly proved its ability to render relatively complicated policies as a single database query:

[screenshot: a relatively complicated policy rendered as a single database query]

Nevertheless, we still had some misgivings. Expressing queries as an AST made them easy to work with programmatically, but hard to inspect and verify. It also became clear that we didn't really need the full power of the "algebraic" representation, and that a simpler, more regular format would have the same degree of expressiveness while also being easier for us to understand. Finally, the plan for how users would configure data filtering worked fine in Ruby, which we used to develop the new version, but would translate awkwardly into languages where concepts like "subclass" and "method" have no clear interpretation.

v3: the present moment

Rather than expressing a query as an arbitrarily nested algebraic expression, we decided to flatten the structure by using a disjunctive normal form representation, and to separate logical conditions (users.id = 99) from relational information (Which tables? How are they joined?) at the top level. The result is a new format that's comprehensible to humans while still being easy for a machine to digest:

[screenshot: the new flattened filter format]

We've also pushed the details of how we translate relations into joins out into the host library, which keeps the core simpler and more platform-independent. The user configuration question is still somewhat open, but it's no longer tied to language-specific concepts like subclasses, and the data it has to handle are much simpler and more regular. We expect the picture to become clearer as we begin porting the new version to more languages.
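To make the flattened shape concrete, here's a hedged sketch of a DNF-style filter and one way a host library might render it to SQL. The dict layout and the to_sql helper are illustrative assumptions, not Oso's actual wire format: the point is that conditions (an OR of AND-groups) sit apart from relational information (root table, joins) at the top level.

```python
# A DNF-shaped filter: conditions are an OR of AND-groups, while the
# root table and joins live separately at the top level.
dnf_filter = {
    "root": "posts",
    "joins": [("users", "users.id", "posts.created_by_id")],
    "conditions": [  # OR of ANDs
        [("users.name", "=", "gwen")],
        [("posts.access_level", "=", "public"),
         ("posts.needs_moderation", "=", False)],
    ],
}

def to_sql(f):
    """Render the filter as a parameterized query string plus params."""
    joins = " ".join(f"JOIN {t} ON {l} = {r}" for t, l, r in f["joins"])
    where = " OR ".join(
        "(" + " AND ".join(f"{col} {op} ?" for col, op, _ in group) + ")"
        for group in f["conditions"])
    params = [v for group in f["conditions"] for _, _, v in group]
    return f"SELECT {f['root']}.* FROM {f['root']} {joins} WHERE {where}", params

sql, params = to_sql(dnf_filter)
print(sql)
print(params)
```

Because the structure is flat and regular, it's also easy to eyeball: each AND-group is one way to satisfy the policy, which is hard to read off an arbitrarily nested algebraic tree.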

v4???

While we're excited about our improved data filtering code and expect it eventually to supplant the original version, we plan to support both APIs initially. The new system has the potential to be much more performant than v1 and to support a wider range of policies, but there are still some limitations on the queries it's able to generate that mean it's not quite a drop-in replacement -- although based on the policies we've seen in the wild, we don't expect most users to encounter any problems.

Assuming the new design proves to be a good foundation for future work, one of our next steps will be to port it to the remaining host libraries. Big nostra culpa here to any Rust/Java/Go users in need of data filtering: sorry for the wait! We want to deliver something solid, and dynamic languages are good for prototyping, so we hope you understand. Thank you ... for bearing with us 🐻

We use GitHub issues to gauge community interest in different platforms and prioritize work accordingly, so in the event you'd really like to use data filtering with Rust, Go, Java, .NET, Haskell, Elixir, Lua, Lisp, Julia, OCaml, APL, or whatever else your favorite platform happens to be — we'd appreciate hearing from you there!

thx for reading 🤓

If you'd like to learn more about Oso, please come and join our community Slack channel and let's chat. And since you made it all the way down to the bottom, we might as well mention that we're hiring.

Good luck out there bear cubs, and happy authorizing!
