Map Relational Data to Facts
This guide assumes that you're familiar with:
- Facts syntax
- Polar syntax
- Relational database basics
Oso Cloud represents data as facts, which are optimized for authorization operations. Facts are flexible enough to represent any application data, but it's not always obvious how to translate from one to the other. Since most software developers are familiar with relational databases, this guide will discuss how to map common relational data structures to facts.
Facts define some trait of an entity: something it is (public, protected) or something it has (color, amount). Good fact names reflect this. They look like is_TRAIT
or has_TRAIT
. For instance, if the fact expresses that an entity is public, you should name the fact is_public
. If it expresses that an entity has a role, you should name it has_role
.
Fact arguments provide the details that uniquely define the fact. The first argument is the entity that the fact describes. We'll call this entity the subject. The fact is_public(Repository:cool_app)
states that the cool_app
repository (the subject) is public
(the trait).
These conventions will guide how facts are constructed in this guide.
We used the name of the repository here for illustration, but in practice you'll usually refer to objects by their ID.
Simple-valued fields
Simple-valued fields (i.e. fields that aren't foreign keys) typically describe an attribute of a subject. They can be represented by facts that take the form is_FIELD_NAME
or has_FIELD_NAME
.
Boolean fields
Boolean fields express whether or not something about the subject is true (public, protected, banned, disabled). They are the simplest to convert to facts, because the only information you need is the name of the field and the identity of the subject that the field describes. They look like this:
trait_name(subject)
We saw this above in the is_public
example. Suppose a repository
table with the following structure.
---------- repository -----------| | | || id | name | is_public |---------------------------------| 1 | cool_app | TRUE || 2 | okay_app | FALSE |---------------------------------
The trait we're interested in is whether or not a repository is public, which is expressed by the is_public
field. The row with ID 1 describes the cool_app
repository, for which the value of is_public
is TRUE
.
If you wanted to determine whether the cool_app
repository is public in SQL, you'd write:
SELECT is_public FROM repositoryWHERE id=1
To express this same data as a fact using the convention above, you'd write
is_public(Repository:1)
The name of the fact is the name of the trait (is_public
), and the first argument is the subject the fact describes (repository ID 1, or cool_app).
Boolean fields are often named is_SOMETHING. If the field name already starts
with is
or has
, don't prepend it again to the fact name.
Strings and numbers
Strings and numbers express traits that can be quantified (amount, weight) or labeled (role, status). Unlike boolean facts, it is necessary to specify the value of a string or number to fully define it. For instance, your application may have users who have different roles (admin
, writer
, read-only
). You could express this as a string-valued role
field on a User table (you probably wouldn't, but let's keep things simple for now).
------------ user --------------| || id | name | role |--------------------------------| 1 | alice | admin || 2 | bob | read-only || 3 | charlie | writer |--------------------------------
The trait we're interested in now is the role a user has, which is expressed by the role
field. You can see that alice
has the admin
role, bob
has the read-only
role, and charlie
has the writer
role.
If you want to fetch alice
's role in SQL, you'd use this query:
SELECT role FROM user WHERE id=1
If you want to express alice
's role as a fact, you need more than the name of the trait and the identity of the subject. That approach works for boolean traits because they only have two possible values: TRUE
and FALSE
. If the fact exists, then the value is TRUE
. If it doesn't, then the value is FALSE
.
Since string and numeric traits have many possible values, you also need to record the value of the trait in the fact. The fact has_role(User:1)
just says that alice
has a role. It doesn't say which role alice
has. You use the second argument of the fact for that information: has_role(User:alice, "admin")
.
Roles have special
meaning
to Oso. If you define roles in a field called something other than role
, you
should still name the associated fact has_role
.
You can use this general form for all traits that have a value:
trait_name(subject, trait_value)
Numbers are expressed almost identically to strings, except that the trait_value
isn't quoted. You might have a login_count
field to store the number of times a user has logged in.
------------------- user -------------------| || id | name | role | login_count |--------------------------------------------| 1 | alice | admin | 100 || 2 | bob | read-only | 24 || 3 | charlie | writer | 15 |--------------------------------------------
The user bob
(ID 2) has a login count of 24
, so you express the number of times bob
has logged in as has_login_count(User:2, 24)
.
Foreign keys
A foreign key is a reference to another table in a relational database. These references serve two major purposes:
Lookups: Lookup tables define values that are used repeatedly in other tables. For example, you may want to have a role
table where each role name is defined a single time rather than repeating the names of the roles directly in your user
table.
Relationships: The majority of foreign keys define relationships between two entities.
Lookup tables are closely related to string-valued fields, so we'll start with those and then discuss relationships.
Lookup tables
Rather than repeating the same values over and over for strings that have system meaning like roles and status, many teams will capture all the possible values of those strings in a separate table. That would look like this for our example of users with roles:
The role table simply defines the list of roles that are available.
------ role ------| || id | role |------------------| 1 | admin || 2 | writer || 3 | read-only |------------------
Now the user table looks a little different.
---------- user ------------| || id | name | role_id |----------------------------| 1 | alice | 1 || 2 | bob | 3 |----------------------------
The SQL to get alice
's role is also a bit different:
SELECT role FROM roleJOIN user on user.role_id = role.idWHERE user.id = 1
You could express alice
's role in a fact by its ID like this: has_role(User:1, 1)
, but you probably don't want to. This is because your authorization logic will have rules that depend on a specific role value. You may have a rule that states that users with the admin
role get the add_user
permission. If you express roles using their names, then you can write that rule like this:
has_permission(user: User, "add_user") if has_role(user, "admin");
If instead you use the role IDs to express the roles, then the rule would look like this:
has_permission(user: User, "add_user") if has_role(user, 1);
The first version is much easier for a person to understand. Someone reading the second version has to know which role has which ID (and it's not uncommon for that mapping to drift between pre-prod and production environments). So, if you're using lookup tables to store strings that you use in your policy, store the strings as facts rather than their IDs. Your policy will be much easier to maintain.
Relationships
Most foreign keys express relationships between two entities. The rest of this section covers the most common:
- One-to-Many
- Recursive
- Many-to-Many
One-to-many relationships
A one-to-many relationship is a relationship where one entity may have a relationship to multiple other entities of a given type. If you want to capture the owner of a repository, you can create a foreign key relationship between the repository
table and the user
table.
The owner of a repository is defined in the owner_id
field of the repository
table.
---------------- repository ----------------| | | || id | name | is_public | owner_id |--------------------------------------------| 1 | cool_app | TRUE | 1 || 2 | okay_app | FALSE | 3 |--------------------------------------------
This shows that alice
is the owner of cool_app
and charlie
is the owner of okay_app
.
This is an example of a one-to-many relationship. Each user will have exactly one record in the user
table. But any given user may be the owner of multiple repositories, so there could be many rows in the repository
table with the same owner_id
.
Because the owner_id
expresses a relation between the user
and repository
tables, we use the name has_relation
for the trait that this field defines.
Although we could have used the name has_owner
, relations occur so often
in authorization logic that they have special meaning to Oso. Whenever a fact defines
a relation between two entities, name it has_relation
.
This also makes it obvious from the policy that the fact represents a foreign key in the source database.
We use the same general form to define the owner fact: trait_name(subject, trait_value)
.
has_relation(Repository:1, "owner")
The trait_value
for a relation is the name of the relation. But once again, you need more information to fully define the fact. Now you need to know which User is the owner of the repository. This SQL query returns the owner of the cool_app
repository (user ID 1).
SELECT owner_id FROM repository WHERE id = 1
We'll call this user the object of the relation. You probably see where this is going. Enter the third argument:
has_relation(Repository:1, "owner", User:1)
You may be wondering why the repository is the subject instead of the user.
This is because the foreign key (owner_id
) exists on the repository
table,
so the primary key of the row that contains the foreign key identifies a
repository. For one-to-many relationships, it's natural to think of the
primary key as the subject, and the foreign key as the object.
All facts that involve a subject and an object have this general form:
trait_name(subject, trait_value, object)
Because relations are so common in authorization, it's worth reinforcing that form:
has_relation(subject, relation_name, object)
Recursive (self-referential) relationships
Recursive relationships are a special form of one-to-many relationship where the foreign key refers to a field in its own table (usually the primary key). Folder hierarchies are a common example. You may have folders that contain folders to arbitrary depths.
folder-1 |_folder-2 |_folder-3 |_folder-4 ...
You can represent this with a folder
table that has a parent_id
field that refers back to the id
of another row in the table.
The hierarchy above would be represented like this:
---------- folder ----------- | | | id | name | parent_id | ----------------------------- | 1 | folder-1 | NULL | | 2 | folder-2 | 1 | | 3 | folder-3 | 2 | | 4 | folder-4 | 1 | -----------------------------
You express the parent relation using has_relation
facts the same way you did before. Since the parent_id
field is the foreign key, the primary key of the row that contains the parent_id
(that is, the child folder) is the subject of the fact. The row that the parent_id
refers to is the object of the fact. So we say that folder-2
has the parent folder-1
, or:
has_relation(Folder:2, "parent", Folder:1)
In Polar terms, a recursive relationship is expressed by a fact where the subject and object have the same type.
Many-to-many relationships (JOIN tables)
Applications frequently model many-to-many relationships. In a multitenant application, users belong to organizations. Since a given user will likely have different roles in different organizations, it doesn't make sense to associate a single role with a user. Instead, you need a way to associate a role with the combination of a user and an organization. You do this by creating a JOIN table that defines a many-to-many relationship between users and organizations (we'll switch back to the string-valued role here for simplicity).
This structure allows you to assign users to multiple organizations and to give them different roles on each. Let's say you have two organizations: acme
and banjo
. alice
is an admin
in acme
and a member
in banjo
. You represent this as follows (assume the same user data we've used throughout, where alice
is user ID 1).
-- organization --| || id | name |------------------| 1 | acme || 2 | banjo |--------------------------- organization_user ----------| || organization_id | user_id | role |--------------------------------------| 1 | 1 | admin || 2 | 1 | member |--------------------------------------
Now, if you want to get a user's role, you also need to know the organization you need it for.
If you want to know which role alice
has on the acme
organization, you'd write this query:
SELECT role FROM organization_userWHERE organization_id = 1AND user_id = 1
How would you express this as a fact? This is just the general form of a fact with a subject and an object:
trait_name(subject, trait_value, object)
The trait we're interested in is the role. The subject (the entity that has the role) is alice
. The value is admin
. The object (the entity on which alice
has the role) is the acme
organization. Putting that all together, we get:
has_role(User:1, "admin", Organization:1)
Summary
You can visualize the mapping from relational data to facts like this:
Much of the power of facts derives from their flexibility, but this flexibility can cause confusion when you map relational data to facts. In this guide, we've proposed some conventions that we've found helpful. Though by no means comprehensive, these guidelines cover the majority of relational data structures in most applications:
- The general form of a fact should be
trait_name(subject, [trait_value], [object])
- The
trait_name
should take the formis_TRAIT
orhas_TRAIT
- The
- The
subject
is the entity that has the trait (e.g. the user that has the role)- You can also think of this as the row that has the field.
- The
trait_name
should mirror the name of the field that it corresponds to in most cases - Two types of traits have special meaning
- Roles should always be called
has_role
- Relations should always be named
has_relation
- Roles should always be called
- If you use lookup tables to store the values of strings that have meaning to your policy, use the values in your
fact definitions, not the IDs.
has_role(User:1, "admin")
instead ofhas_role(User:1, 1)
- The object should be the the record that is referenced by the subject
- The parent folder
- The organization on which the user has a role