Sync Data

Add facts in development

Whether you're starting fresh or changing existing rules, the quickest way to iterate on the facts stored in Oso Cloud is the Fact Schema in the UI. The Fact Schema lists the types of facts referenced in your policy; these are the types of facts Oso Cloud expects you to send.

To add a new fact, click + Add next to the type of fact you want to add. To remove an existing fact, click ▼ Show matching facts and then click the Delete button next to the fact you want to delete.

Sync facts in production

Oso Sync is only available for Startup and Growth plan customers.

Initial sync

Once you've decided how to represent your authorization data in Oso Cloud, you'll need to do a one-time sync to bring Oso Cloud in line with your data. Oso Sync updates the facts in Oso Cloud to match those in your application database.

You can use Oso Sync from the CLI with the oso-cloud reconcile command.


oso-cloud reconcile --perform-updates reconcile.yaml

Configuration

For Oso Sync to know where to find your facts, create a YAML configuration file that maps your data to facts in Oso Cloud. Oso Sync currently supports the following data sources:

PostgreSQL


version: 1
source: postgres
facts:
  has_relation(Repository:_, String:parent, Organization:_):
    db: app_db
    query: |-
      select repository.public_id, organization.public_id
      from repository
      join organization
      on organization.id = repository.organization_id
dbs:
  app_db:
    connection_string: postgresql://oso:oso@somerds.instance.aws.com:5432/foo

The config file has four top-level fields: version, source, dbs, and facts. In the example above, version is 1 and source is postgres.

dbs contains a list of databases from which Oso Sync should pull the fact data. Each entry is keyed by a unique name and contains a connection_string value, which must conform to a PostgreSQL connection URI. Alternatively, you can provide an environment variable (prefixed with a $) containing the connection string: connection_string: $ENV_VAR_NAME.
facts maps fact types to the database query that fetches all facts of that type. Each fact type is defined with positional variable slots (specified by an underscore _), which are filled by the query in order to generate the facts. For instance, the fact type has_relation(Repository:_, String:parent, Organization:_) has two variables: one in the first argument for the Repository and one in the third argument for the Organization.
- db is the database that contains fact data for this fact type. Its value should match an identifier from the dbs section.
- query is the query to fetch all facts of that fact type. Match the columns you're fetching data from positionally with the variables in the fact type. In the example above, repository.public_id is set as the Repository value in the first argument of the fact type, and organization.public_id is set as the Organization value in the third argument.
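The positional mapping described above can be sketched in plain Python. This is an illustrative model only, not Oso Sync's implementation: the helper names parse_fact_type and rows_to_facts are hypothetical, and the rows stand in for the result of the configured query.

```python
import re


def parse_fact_type(fact_type):
    """Split 'pred(Type:value, ...)' into (predicate, [(type, value_or_None), ...])."""
    match = re.fullmatch(r"(\w+)\((.*)\)", fact_type.strip())
    if match is None:
        raise ValueError(f"unrecognized fact type: {fact_type!r}")
    predicate, raw_args = match.groups()
    args = []
    for raw in raw_args.split(","):
        type_name, _, value = raw.strip().partition(":")
        # An underscore marks a variable slot that a query column fills in.
        args.append((type_name, None if value == "_" else value))
    return predicate, args


def rows_to_facts(fact_type, rows):
    """Fill the variable slots of a fact type, in order, from each row's columns."""
    predicate, args = parse_fact_type(fact_type)
    slot_count = sum(1 for _, value in args if value is None)
    facts = []
    for row in rows:
        if len(row) != slot_count:
            raise ValueError(f"expected {slot_count} columns, got {len(row)}")
        columns = iter(row)
        filled = [(t, next(columns) if v is None else v) for t, v in args]
        facts.append((predicate, *filled))
    return facts
```

With the example fact type, a row of (repository.public_id, organization.public_id) fills the first and third arguments, while the fixed String:parent argument is left as written.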

MongoDB


version: 1
source: mongodb
facts:
  has_relation(Repository:_, String:parent, Organization:_):
    db: app_db
    collection: has_relation
    fields:
      - name: repository
      - name: organization
        is_array: true
    query:
      find: {}
      # `find` and `aggregate` are mutually exclusive
      # aggregate: []
dbs:
  app_db:
    connection_string: mongodb://oso:oso@somemongo.instance.aws.com:27017/foo

The config file has four top-level fields: version, source, dbs, and facts.

version should have a value of 1.
source should have a value of mongodb.
dbs contains a list of databases from which Oso Sync should pull the fact data. Each entry is keyed by a unique name and contains a connection_string value, which must conform to a MongoDB connection URI. Alternatively, you can provide an environment variable (prefixed with a $) containing the connection string: connection_string: $ENV_VAR_NAME.
facts maps fact types to the database query that fetches all facts of that type. Each fact type is defined with positional variable slots (specified by an underscore _), which are filled by the query in order to generate the facts. For instance, the fact type has_relation(Repository:_, String:parent, Organization:_) has two variables: one in the first argument for the Repository and one in the third argument for the Organization.
- db is the database that has the collection with the data for this fact type.
- collection is the collection that contains data for this fact type.
- fields is an array containing the names of the fields to extract from the documents returned by the query. Each array item maps to the positional variable in the fact type, and all variables must be included. An item may have an optional is_array field; if is_array is true, the field on the document must be an array type and is automatically unwound. At most one field may be configured with is_array: true.
- query is the query to fetch all documents that contain data for the fact type. Use either the find or the aggregate field, but not both; its value is passed directly to MongoDB's find or aggregate operation, respectively. The example above illustrates a query using find. For aggregate queries, using the $out stage results in an error.
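To make the is_array behavior concrete, here is a small Python model of unwinding, assuming it works like MongoDB's $unwind stage: one output row per array element. The function name unwind_documents is hypothetical, and the documents are plain dictionaries rather than results from a real collection.

```python
def unwind_documents(documents, fields):
    """Turn documents into rows, one per element of the single is_array field.

    `fields` mirrors the config, e.g.
    [{"name": "repository"}, {"name": "organization", "is_array": True}].
    """
    array_fields = [f["name"] for f in fields if f.get("is_array")]
    if len(array_fields) > 1:
        raise ValueError("at most one field may set is_array: true")

    rows = []
    for doc in documents:
        if not array_fields:
            rows.append(tuple(doc[f["name"]] for f in fields))
            continue
        array_name = array_fields[0]
        values = doc[array_name]
        if not isinstance(values, list):
            raise TypeError(f"field {array_name!r} must be an array")
        # Emit one row per array element, repeating the scalar fields.
        for item in values:
            rows.append(
                tuple(item if f["name"] == array_name else doc[f["name"]] for f in fields)
            )
    return rows
```

Under this reading, a document whose organization field holds two IDs yields two has_relation facts for the same repository.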

Comma-separated Values (CSV)


version: 1
source: csv
facts:
  has_relation(Repository:_, String:parent, Organization:_):
    fields:
      - name: repository
      - name: organization
    path: /path/to/has_relation.csv

The config file has three top-level fields: version, source, and facts.

version should have a value of 1.
source should have a value of csv.
facts maps fact types to the CSV file with the data of that type. Each fact type is defined with positional variable slots (specified by an underscore _), which are filled with data from the corresponding values in the CSV file. For instance, the fact type has_relation(Repository:_, String:parent, Organization:_) has two variables: one in the first argument for the Repository and one in the third argument for the Organization.
- fields is an array containing the names of the values to extract from the CSV file. The first row in the CSV file must be a header row and must include all of the items in the fields array. Each array item maps to the positional variable in the fact type, and all variables must be included.
- path is the path to the CSV file with the data for the fact type.
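As a sketch of producing a file that meets the header-row requirement, the following uses Python's standard csv module. The helper names write_fact_csv and missing_fields are hypothetical; the second one is just a local pre-check you might run before handing the file to Oso Sync.

```python
import csv
import io


def write_fact_csv(stream, fields, rows):
    """Write a header row naming each field, followed by one row per fact."""
    writer = csv.writer(stream)
    writer.writerow(fields)
    writer.writerows(rows)


def missing_fields(stream, fields):
    """Return the configured field names that the CSV header row lacks."""
    header = next(csv.reader(stream), [])
    return [name for name in fields if name not in header]


# Usage with an in-memory buffer; a real export would open the configured path.
buffer = io.StringIO()
write_fact_csv(buffer, ["repository", "organization"], [("repo-1", "org-1")])
buffer.seek(0)
print(missing_fields(buffer, ["repository", "organization"]))
```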

Add and remove facts

Whenever you insert, update, or delete authorization-relevant data in your application, you should use Oso Cloud's Bulk API to mirror that update in Oso Cloud.

💡

This "dual writes" approach is similar to updating an Elasticsearch index to provide up-to-date search results. Oso Cloud is a fast and flexible index for your authorization data that's optimized for producing sub-millisecond authorization decisions.

For example, in our GitCloud example app, when a user creates a new repository, we send a pair of facts to Oso Cloud:


def create_repository(org_id):
    org = Organization(org_id)
    repo = Repository(payload["name"], org)
    # Open a transaction to persist the repository to our datastore.
    session.add(repo)
    # Send facts to Oso Cloud.
    with oso.batch() as tx:
        # The parent organization of `repo` is `org`.
        tx.insert(("has_relation", repo, "organization", org))
        # The creating user gets the "admin" role on the new repository.
        tx.insert(("has_role", current_user, "admin", repo))
    # Once the bulk update to Oso Cloud succeeds, commit the transaction.
    session.commit()
    return repo.as_json(), 201

When deleting a repository, the process is identical, but the facts in the Bulk API call go in the removal array. Additionally, you can use wildcards to remove all facts matching a pattern:


with oso.batch() as tx:
    # Remove all `has_relation` facts for the repository.
    tx.delete(("has_relation", repo, None, None))
    # Remove all `has_role` facts for the repository.
    tx.delete(("has_role", None, None, repo))

Wildcards are represented as None in Python, null in JavaScript, nil in Ruby, and so on.

💡

When creating new resources, send corresponding facts to Oso Cloud before closing the local transaction. This way, we tell the user we’ve created the new resource once they’re able to access it.

When deleting existing resources, remove corresponding facts from Oso Cloud after closing the local transaction. We wait to remove access until we’re sure the resource no longer exists.
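The delete ordering can be sketched as follows. This is a minimal illustration, assuming a SQLAlchemy-style session with delete and commit methods and the oso.batch() interface used in the examples above; delete_repository is a hypothetical function name.

```python
def delete_repository(session, oso, repo):
    """Delete locally first, then remove the matching facts from Oso Cloud."""
    # Close the local transaction so the resource is definitely gone.
    session.delete(repo)
    session.commit()
    # Only now remove access: wildcards (None) match every remaining fact
    # that mentions the repository.
    with oso.batch() as tx:
        tx.delete(("has_relation", repo, None, None))
        tx.delete(("has_role", None, None, repo))
```

If the local commit fails, the function raises before touching Oso Cloud, so access is never removed for a resource that still exists.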

To add and remove facts in a single transaction, for example when updating a user's role from member to admin, use the Bulk API:


with oso.batch() as tx:
    tx.delete(("has_role", user, None, repo))
    tx.insert(("has_role", user, "admin", repo))

The Bulk API processes fact removals before additions, so after the above call the user has exactly one role on the repository: admin.
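The removals-before-additions ordering, combined with wildcard matching, can be modeled in memory. This is not the Oso client; apply_batch and matches are hypothetical helpers, and plain strings stand in for users and repositories.

```python
def matches(pattern, fact):
    """A pattern matches a fact when every non-wildcard (non-None) slot is equal."""
    return len(pattern) == len(fact) and all(
        p is None or p == f for p, f in zip(pattern, fact)
    )


def apply_batch(facts, deletes, inserts):
    """Apply all removals first, then all additions, and return the new fact set."""
    remaining = {f for f in facts if not any(matches(p, f) for p in deletes)}
    return remaining | set(inserts)


facts = {("has_role", "alice", "member", "repo-1")}
facts = apply_batch(
    facts,
    deletes=[("has_role", "alice", None, "repo-1")],
    inserts=[("has_role", "alice", "admin", "repo-1")],
)
print(facts)
```

Because the wildcard removal runs first, it cannot delete the admin fact inserted in the same batch.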

Keep facts in sync

To ensure authorization data remains in sync with application data, it's good practice to periodically refresh the facts in Oso Cloud. You can use Oso Sync to identify any data drift as well as synchronize your application data to the facts in Oso Cloud.

Both commands below use the configuration file from the Initial sync section.

To compute the diff only, run:


oso-cloud reconcile reconcile.yaml

To compute and apply the diff, run:


oso-cloud reconcile --perform-updates reconcile.yaml

Both commands write the diff to stdout. When the --perform-updates flag is passed, the output describes the differences that existed before the updates were applied.

If 1000 or fewer facts have changed, Oso Sync returns the lists of facts to add or remove:


{
  "type": "facts",
  "fact_types": [
    {
      "fact_type": <Fact>,
      "add": [<Fact>, ...],
      "remove": [<Fact>, ...]
    }
  ]
}

If more than 1000 facts have changed, Oso Sync returns the counts instead:


{
  "type": "counts",
  "fact_types": [
    {
      "fact_type": <Fact>,
      "add_count": 501,
      "remove_count": 500
    }
  ]
}

Oso Sync formats facts in their fully-expanded JSON representation. Any variables in the fact type are represented by a null value:


{
  "predicate": "has_relation",
  "args": [
    { "type": "Repository", "id": null },
    { "type": "String", "id": "parent" },
    { "type": "Organization", "id": null }
  ]
}
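A script that monitors drift might summarize either output shape like this. It is a sketch under two assumptions: that stdout contains only the JSON document, and that each fact_type value uses the expanded representation shown above, with a predicate key. The function name summarize_diff is hypothetical.

```python
import json


def summarize_diff(output):
    """Return {predicate: (facts_to_add, facts_to_remove)} for a reconcile diff."""
    diff = json.loads(output)
    summary = {}
    for entry in diff["fact_types"]:
        predicate = entry["fact_type"]["predicate"]
        if diff["type"] == "facts":
            # Small diffs list the facts themselves.
            added, removed = len(entry["add"]), len(entry["remove"])
        else:
            # Large diffs ("counts") report totals only.
            added, removed = entry["add_count"], entry["remove_count"]
        previous = summary.get(predicate, (0, 0))
        summary[predicate] = (previous[0] + added, previous[1] + removed)
    return summary
```

Keying by predicate folds together fact types that share a predicate, which may or may not be what you want if you have split one fact type across several definitions.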

Oso Sync limitations

At most one Oso Sync command should be run at a time for a given environment. If multiple Oso Sync commands are run in parallel for an environment, you may see HTTP 419 errors.
The maximum size of the application data per fact type is 10GB. To synchronize larger data sets, you may consider "sharding" a single fact type across multiple fact type definitions in the YAML configuration by substituting a concrete value for one or more of the arguments.

Before:

has_relation(Repository:_, String:_, Organization:_): ...

After:

has_relation(Repository:_, String:parent, Organization:_): ...
has_relation(Repository:_, String:child, Organization:_): ...

The diff may include transient false positives because Oso Sync compares a point-in-time snapshot of your database to Oso Cloud, which continues to receive changes. Transient false positives should not appear on successive invocations of Oso Sync and do not indicate issues with how your application updates facts in Oso Cloud.
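One way to separate transient false positives from real drift is to keep only the changes that appear in two successive diff-only runs. The sketch below assumes both diffs are already parsed and have type "facts"; persistent_changes and fact_key are hypothetical names, not part of Oso Sync.

```python
import json


def fact_key(fact):
    """Serialize a fact deterministically so it can be stored in a set."""
    return json.dumps(fact, sort_keys=True)


def persistent_changes(first, second):
    """Return the adds and removes reported by both of two successive diffs."""

    def index(diff):
        adds, removes = set(), set()
        for entry in diff["fact_types"]:
            adds.update(fact_key(f) for f in entry["add"])
            removes.update(fact_key(f) for f in entry["remove"])
        return adds, removes

    first_adds, first_removes = index(first)
    second_adds, second_removes = index(second)
    return {
        "add": [json.loads(k) for k in sorted(first_adds & second_adds)],
        "remove": [json.loads(k) for k in sorted(first_removes & second_removes)],
    }
```

Anything that survives the intersection showed up in both runs and is worth investigating; changes that appear in only one run are likely the in-flight updates described above.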

Docker

We publish a packaged version of the Oso Sync CLI at public.ecr.aws/osohq/reconcile:latest. To use it, build your own image on top of it with a Dockerfile like this:


FROM public.ecr.aws/osohq/reconcile:latest
ARG CONFIG_PATH
RUN test -n "$CONFIG_PATH" || (echo "CONFIG_PATH argument must be set to path of your reconcile.yaml" && false)
WORKDIR /app
COPY $CONFIG_PATH /app/config.yaml
ENTRYPOINT ["/app/reconcile", "reconcile", "/app/config.yaml"]

Build it with:

docker build -t reconcile-tool -f reconcile-tool.Dockerfile --build-arg="CONFIG_PATH=./reconcile.yaml" --platform linux/amd64 .

Talk to an Oso engineer

If you'd like to learn more about using Oso Cloud in your app or have any questions about this guide, connect with us on Slack. We're happy to help.
