Sync Data
Add facts in development
Whether you're starting fresh or changing existing rules, the quickest way to iterate on the facts stored in Oso Cloud is the Fact Schema in the UI. The Fact Schema lists the types of facts referenced in your policy; these are the types of facts Oso Cloud expects you to send.
To add a new fact, click + Add next to the type of fact you want to add. To remove an existing fact, click ▼ Show matching facts and then click the Delete button next to the fact you want to delete.
Sync facts in production
Oso Sync is only available for Startup and Growth plan customers.
Initial sync
Once you've decided how to represent your authorization data in Oso Cloud, you'll need to do a one-time sync to bring Oso Cloud in line with your data. Oso Sync updates the facts in Oso Cloud to match those in your application database.
You can use Oso Sync from the CLI with the oso-cloud reconcile command.
oso-cloud reconcile --perform-updates reconcile.yaml
Configuration
To tell Oso Sync where to find your facts, create a YAML configuration file that maps your data to facts in Oso Cloud. We currently support the following data sources:
PostgreSQL
version: 1
source: postgres
facts:
  has_relation(Repository:_, String:parent, Organization:_):
    db: app_db
    query: |-
      select repository.public_id, organization.public_id
      from repository
      join organization on organization.id = repository.organization_id
dbs:
  app_db:
    connection_string: postgresql://oso:oso@somerds.instance.aws.com:5432/foo
Besides version and source, the config file has two top-level fields: facts and dbs.
- dbs contains a list of databases from which Oso Sync should pull the fact data. Each entry is keyed by a unique name and contains a connection_string value, which needs to conform to a PostgreSQL connection URI. Alternatively, you can provide an environment variable (prefixed with a $) containing the connection string: connection_string: $ENV_VAR_NAME.
- facts maps fact types to the database query that fetches all facts of that type. Each fact type is defined with positional variable slots (specified by an underscore _), which are filled by the query in order to generate the facts. For instance, the fact type has_relation(Repository:_, String:parent, Organization:_) has two variables: one in the first argument for the Repository and one in the third argument for the Organization.
  - db is the database that contains fact data for this fact type. Its value should match an identifier from the dbs section.
  - query is the query to fetch all facts of that fact type. Match the columns you're fetching data from positionally with the variables in the fact type. In the example above, repository.public_id is set as the Repository value in the first argument of the fact type, and organization.public_id is set as the Organization value in the third argument.
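The positional matching described above can be pictured with a small Python sketch. This is only an illustration of the rule, not how Oso Sync is implemented; the helper name fill_fact_type and the sample IDs are hypothetical.

```python
import re


def fill_fact_type(fact_type: str, row: tuple) -> tuple:
    """Fill the `_` slots of a fact type with the columns of one query row, in order.

    Hypothetical helper for illustration only; Oso Sync does this internally.
    """
    match = re.fullmatch(r"(\w+)\((.*)\)", fact_type.strip())
    if match is None:
        raise ValueError(f"not a fact type: {fact_type!r}")
    predicate, arg_list = match.groups()

    columns = iter(row)
    args = []
    for arg in arg_list.split(","):
        type_name, value = arg.strip().split(":", 1)
        if value == "_":
            # Variable slot: consume the next column of the row.
            try:
                value = next(columns)
            except StopIteration:
                raise ValueError("row has fewer columns than the fact type has variables") from None
        args.append((type_name, value))

    if list(columns):
        raise ValueError("row has more columns than the fact type has variables")
    return (predicate, *args)


# One row returned by the example query: (repository.public_id, organization.public_id).
fact = fill_fact_type(
    "has_relation(Repository:_, String:parent, Organization:_)",
    ("repo-1", "org-9"),
)
```

The first column lands in the Repository slot and the second in the Organization slot, while the concrete String:parent argument is left untouched.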
MongoDB
version: 1
source: mongodb
facts:
  has_relation(Repository:_, String:parent, Organization:_):
    db: app_db
    collection: has_relation
    fields:
      - name: repository
      - name: organization
        is_array: true
    query:
      find: {}
      # `find` and `aggregate` are mutually exclusive
      # aggregate: []
dbs:
  app_db:
    connection_string: mongodb://oso:oso@somemongo.instance.aws.com:27017/foo
The config file has four top-level fields: version, source, dbs, and facts.
- version should have a value of 1.
- source should have a value of mongodb.
- dbs contains a list of databases from which Oso Sync should pull the fact data. Each entry is keyed by a unique name and contains a connection_string value, which needs to conform to a MongoDB connection URI. Alternatively, you can provide an environment variable (prefixed with a $) containing the connection string: connection_string: $ENV_VAR_NAME.
- facts maps fact types to the database query that fetches all facts of that type. Each fact type is defined with positional variable slots (specified by an underscore _), which are filled by the query in order to generate the facts. For instance, the fact type has_relation(Repository:_, String:parent, Organization:_) has two variables: one in the first argument for the Repository and one in the third argument for the Organization.
  - db is the database that has the collection with the data for this fact type.
  - collection is the collection that contains data for this fact type.
  - fields is an array containing the names of the fields to extract from the documents returned by the query. Each array item maps to the positional variable in the fact type, and all variables must be included. An item may have an optional is_array field; if is_array is true, the field on the document must be an array type and is automatically unwound. At most one field may be configured with is_array: true.
  - query is the query to fetch all documents that contain data for the fact type. Either find or aggregate field may be used for the query, and these are passed directly to the MongoDB find and aggregate, respectively. The example above illustrates a query using find. For aggregate queries, use of the $out stage results in an error.
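To make the is_array behavior concrete, here is a minimal Python sketch of unwinding one array field into several rows of values, one per fact. It models the description above under the assumption that each array element yields one fact; the function name facts_from_document and the sample document are hypothetical, and this is not Oso Sync's actual code.

```python
def facts_from_document(document: dict, fields: list) -> list:
    """Return one tuple of slot values per fact, unwinding at most one array field.

    `fields` mirrors the `fields` list of the YAML config: dicts with a `name`
    and an optional `is_array` flag. Hypothetical helper for illustration.
    """
    array_fields = [f["name"] for f in fields if f.get("is_array")]
    if len(array_fields) > 1:
        raise ValueError("at most one field may set is_array: true")

    rows = [[]]
    for field in fields:
        value = document[field["name"]]
        if field.get("is_array"):
            if not isinstance(value, list):
                raise TypeError(f"field {field['name']!r} must be an array")
            # Unwind: every existing row is repeated once per array element.
            rows = [row + [item] for row in rows for item in value]
        else:
            rows = [row + [value] for row in rows]
    return [tuple(row) for row in rows]


fields = [{"name": "repository"}, {"name": "organization", "is_array": True}]
document = {"repository": "repo-1", "organization": ["org-1", "org-2"]}
rows = facts_from_document(document, fields)
```

With the sample document, the single repository value is paired with each organization in turn, so one document produces two sets of slot values.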
Comma-separated Values (CSV)
version: 1
source: csv
facts:
  has_relation(Repository:_, String:parent, Organization:_):
    fields:
      - name: repository
      - name: organization
    path: /path/to/has_relation.csv
The config file has three top-level fields: version, source, and facts.
- version should have a value of 1.
- source should have a value of csv.
- facts maps fact types to the CSV file with the data of that type. Each fact type is defined with positional variable slots (specified by an underscore _), which are filled with data from the corresponding values in the CSV file. For instance, the fact type has_relation(Repository:_, String:parent, Organization:_) has two variables: one in the first argument for the Repository and one in the third argument for the Organization.
  - fields is an array containing the names of the values to extract from the CSV file. The first row in the CSV file must be a header row and must include all of the items in the fields array. Each array item maps to the positional variable in the fact type, and all variables must be included.
  - path is the path to the CSV file with the data for the fact type.
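If you generate the CSV file yourself, a short Python sketch like the following produces a file whose header row matches the fields array. The function name write_fact_csv and the sample rows are hypothetical; the only requirement taken from the text above is that the first row is a header containing every configured field name.

```python
import csv
import io


def write_fact_csv(out, field_names, rows):
    """Write a header row followed by one CSV row per fact."""
    writer = csv.writer(out)
    # The header must include every name listed under `fields` in the config.
    writer.writerow(field_names)
    for row in rows:
        if len(row) != len(field_names):
            raise ValueError("each row needs one value per field")
        writer.writerow(row)


# Written to an in-memory buffer here; for a real file, use
# open(path, "w", newline="") as recommended by the csv module.
buffer = io.StringIO()
write_fact_csv(
    buffer,
    ["repository", "organization"],
    [("repo-1", "org-1"), ("repo-2", "org-1")],
)
```

The column names repository and organization line up with the two variable slots of the example fact type.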
Add and remove facts
Whenever you insert, update, or delete authorization-relevant data in your application, you should use Oso Cloud's Bulk API to mirror that update in Oso Cloud.
This "dual writes" approach is similar to updating an Elasticsearch index to provide up-to-date search results. Oso Cloud is a fast and flexible index for your authorization data that's optimized for producing sub-millisecond authorization decisions.
For example, in our GitCloud example app, when a user creates a new repository, we send a pair of facts to Oso Cloud:
def create_repository(org_id):
    org = Organization(org_id)
    repo = Repository(payload["name"], org)

    # Open a transaction to persist the repository to our datastore.
    session.add(repo)

    # Send facts to Oso Cloud.
    with oso.batch() as tx:
        # The parent organization of `repo` is `org`.
        tx.insert(("has_relation", repo, "organization", org))
        # The creating user gets the "admin" role on the new repository.
        tx.insert(("has_role", current_user, "admin", repo))

    # Once the bulk update to Oso Cloud succeeds, commit the transaction.
    session.commit()
    return repo.as_json(), 201
When deleting a repository, the process is identical, but the facts in the Bulk API call go in the removal array. Additionally, you can use wildcards to remove all facts matching a pattern:
with oso.batch() as tx:
    # Remove all `has_relation` facts for the repository.
    tx.delete(("has_relation", repo, None, None))
    # Remove all `has_role` facts for the repository.
    tx.delete(("has_role", None, None, repo))
Wildcards are represented as None in Python, null in JavaScript, nil in Ruby, and so on.
When creating new resources, send corresponding facts to Oso Cloud before closing the local transaction. This way, we tell the user we’ve created the new resource once they’re able to access it.
When deleting existing resources, remove corresponding facts from Oso Cloud after closing the local transaction. We wait to remove access until we’re sure the resource no longer exists.
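The two orderings can be sketched side by side. This is a minimal sketch that assumes a SQLAlchemy-style session (add, delete, commit) and the oso.batch() context manager used in the examples in this guide; the function names create_resource and delete_resource are hypothetical.

```python
def create_resource(session, oso, resource, facts):
    """Create: send facts to Oso Cloud before committing locally."""
    session.add(resource)
    with oso.batch() as tx:
        for fact in facts:
            tx.insert(fact)
    # Commit only after the batch has been sent, so the resource is
    # accessible by the time we report it as created.
    session.commit()


def delete_resource(session, oso, resource, fact_patterns):
    """Delete: commit locally first, then remove facts from Oso Cloud."""
    session.delete(resource)
    session.commit()
    # Remove access only once the resource is certainly gone.
    with oso.batch() as tx:
        for pattern in fact_patterns:
            tx.delete(pattern)
```

Passing session and oso in as arguments keeps the ordering easy to exercise with stand-in objects in a unit test.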
To add and remove facts in a single transaction (for example, when updating a user's role from member to admin), use the Bulk API:
with oso.batch() as tx:
    tx.delete(("has_role", user, None, repo))
    tx.insert(("has_role", user, "admin", repo))
The Bulk API processes fact removals before additions, so after the above call the user has exactly one role on the repository: admin.
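The effect of "removals before additions" together with wildcard patterns can be modeled locally with plain Python sets. This is only a mental model under the semantics described above, not Oso Cloud's implementation; the names matches and apply_batch and the sample facts are hypothetical.

```python
def matches(pattern: tuple, fact: tuple) -> bool:
    """A pattern matches a fact when every non-None position is equal."""
    return len(pattern) == len(fact) and all(
        p is None or p == f for p, f in zip(pattern, fact)
    )


def apply_batch(facts: set, deletes: list, inserts: list) -> set:
    """Apply all removals first, then all additions."""
    remaining = {f for f in facts if not any(matches(p, f) for p in deletes)}
    return remaining | set(inserts)


stored = {
    ("has_role", "alice", "member", "repo-1"),
    ("has_role", "bob", "member", "repo-1"),
}
updated = apply_batch(
    stored,
    deletes=[("has_role", "alice", None, "repo-1")],
    inserts=[("has_role", "alice", "admin", "repo-1")],
)
```

Because the wildcard removal runs first, alice's previous role is cleared whatever it was, and the newly inserted admin fact is not caught by the removal.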
Keep facts in sync
To ensure authorization data remains in sync with application data, it's good practice to periodically refresh the facts in Oso Cloud. You can use Oso Sync to identify any data drift as well as synchronize your application data to the facts in Oso Cloud.
Using the configuration file from the initial sync:
- To compute the diff only, run:
oso-cloud reconcile reconcile.yaml
- To compute and apply the diff, run:
oso-cloud reconcile --perform-updates reconcile.yaml
This returns the diff over stdout. If the --perform-updates flag is passed, the diff output represents the differences before applying the diff.
If 1000 or fewer facts have changed, Oso Sync returns the lists of facts to add or remove:
{
  "type": "facts",
  "fact_types": [
    {
      "fact_type": <Fact>,
      "add": [<Fact>, ...],
      "remove": [<Fact>, ...]
    }
  ]
}
If more than 1000 facts have changed, Oso Sync returns the counts instead:
{
  "type": "counts",
  "fact_types": [
    {
      "fact_type": <Fact>,
      "add_count": 501,
      "remove_count": 500
    }
  ]
}
Oso Sync formats facts in their fully-expanded JSON representation. Any variables in the fact type are represented by a null value:
{
  "predicate": "has_relation",
  "args": [
    { "type": "Repository", "id": null },
    { "type": "String", "id": "parent" },
    { "type": "Organization", "id": null }
  ]
}
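A script that consumes the diff has to handle both output shapes. The following Python sketch reduces either shape to per-fact-type counts; it assumes only the JSON fields shown above, and the function names render_fact_type and summarize_diff are hypothetical.

```python
import json


def render_fact_type(fact_type: dict) -> str:
    """Render an expanded fact type back to its compact form, with `_` for null ids."""
    args = ", ".join(
        f"{arg['type']}:{'_' if arg['id'] is None else arg['id']}"
        for arg in fact_type["args"]
    )
    return f"{fact_type['predicate']}({args})"


def summarize_diff(diff_json: str) -> dict:
    """Map each fact type to an (add_count, remove_count) pair."""
    diff = json.loads(diff_json)
    summary = {}
    for entry in diff["fact_types"]:
        if diff["type"] == "facts":
            counts = (len(entry["add"]), len(entry["remove"]))
        elif diff["type"] == "counts":
            counts = (entry["add_count"], entry["remove_count"])
        else:
            raise ValueError(f"unexpected diff type: {diff['type']!r}")
        summary[render_fact_type(entry["fact_type"])] = counts
    return summary
```

A scheduled job could, for example, feed the stdout of a diff-only run into summarize_diff and alert when any count is non-zero.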
Oso Sync Limitations
- At most one Oso Sync command should be run at a time for a given environment. If multiple Oso Sync commands are run in parallel for an environment, you may see HTTP 419 errors.
- The maximum size of the application data per fact type is 10GB. To synchronize larger data sets, you may consider "sharding" a single fact type across multiple fact type definitions in the YAML configuration by substituting a concrete value for one or more of the arguments.
  Before:
    has_relation(Repository:_, String:_, Organization:_): ...
  After:
    has_relation(Repository:_, String:parent, Organization:_): ...
    has_relation(Repository:_, String:child, Organization:_): ...
- The diff may include transient false positives because Oso Sync compares a point-in-time snapshot of your database to Oso Cloud, which continues to receive changes. Transient false positives should not appear on successive invocations of Oso Sync and do not indicate issues with how your application updates facts in Oso Cloud.
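Since transient false positives should not repeat across successive invocations, one way to filter them out is to keep only the changes reported by two consecutive diff-only runs. The sketch below assumes both runs were made without --perform-updates and returned the "facts" output shape; the function name persistent_changes is hypothetical.

```python
import json


def _changes(diff: dict) -> set:
    """Collect (operation, canonical fact JSON) pairs from a "facts" diff."""
    if diff["type"] != "facts":
        raise ValueError('only "facts" diffs list individual facts')
    found = set()
    for entry in diff["fact_types"]:
        for operation in ("add", "remove"):
            for fact in entry[operation]:
                # sort_keys gives a stable string, so equal facts compare equal.
                found.add((operation, json.dumps(fact, sort_keys=True)))
    return found


def persistent_changes(first_diff: dict, second_diff: dict) -> dict:
    """Keep only additions and removals that appear in both diffs."""
    result = {"add": [], "remove": []}
    for operation, encoded in sorted(_changes(first_diff) & _changes(second_diff)):
        result[operation].append(json.loads(encoded))
    return result
```

Changes that survive this intersection are better candidates for investigation than anything seen in a single run.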
Docker
We publish a packaged version of the CLI (x86_64) for Oso Sync at public.ecr.aws/osohq/reconcile:latest.
To use it, build your own image on top of it with a Dockerfile like this:
FROM public.ecr.aws/osohq/reconcile:latest
ARG CONFIG_PATH
RUN test -n "$CONFIG_PATH" || (echo "CONFIG_PATH argument must be set to path of your reconcile.yaml" && false)
WORKDIR /app
COPY $CONFIG_PATH /app/config.yaml
ENTRYPOINT ["/app/reconcile", "experimental", "reconcile", "/app/config.yaml"]
Build it with:
docker build -t reconcile-tool -f reconcile-tool.Dockerfile --build-arg="CONFIG_PATH=./reconcile.yaml" --platform linux/amd64 .
Talk to an Oso engineer
If you'd like to learn more about using Oso Cloud in your app or have any questions about this guide, connect with us on Slack. We're happy to help.