1. How Microservices Make Authorization Complicated
2. Our Example: GitClub Jobs
When you move from a monolith to a service-oriented architecture, you need to design your authorization accordingly. You'll need to share authorization data between your services, and there are many ways to do that. Each design and architecture decision carries trade-offs that you'll need to understand. We'll show you each of those choices and their trade-offs, and we'll provide heuristics to help you make decisions about your service architecture.
In the previous chapters, we’ve talked about building authorization in an application where all of the code, logic, and data live in the same application.
Here we'll discuss the challenges of making authorization work in a distributed environment where code, logic, and data may live in different services. Two notes on scope before we begin:

- This chapter is about authorization in a microservice architecture. While the term "microservices" usually refers to deployments with many discrete services, the challenges of building in a distributed environment are the same whether you have two services or two hundred. Here we use "microservices" to describe any application composed of two or more services.
- There's a difference between building authorization that works internally between your own services and using federated authorization protocols such as OAuth to integrate securely with third-party services. This chapter is about the former; we won't cover authorization with third parties.
There are many reasons to adopt an architecture with multiple services. For example, you may want to build a new product that requires you to use a different technology stack. Or, you may need to split up a single monolithic service into smaller services to let your teams work on the components in isolation.
Whatever the reason, the result is that you’ll have an application made up of many different codebases, and you need them to present a consistent authorization experience.
These requirements are at odds! Developers use microservice architectures to decouple different services so teams can work on them separately, but authorization binds them together because it needs to work consistently in each service.
To see this problem in action, consider a role-based access control (RBAC) example. In an app with several services, Users belong to and have one of several roles within an Organization.
At a minimum, each service needs to agree on the possible roles that a member can have in an organization. In addition to knowing which roles there are, each service needs to know what role a specific User has.
In a monolith, you can query the local database for the user's roles. In our example, what happens when you’re writing code for a service that doesn’t have a copy of that database locally? Do you pass all of the data in a token? Do services call each other to fetch the data they need when a user makes a request, just in time? Once a service has the inputs that it needs to make an authorization decision, how does it enforce this decision and communicate the result to users?
These are all meaningful questions about architecture, and the best approach isn’t always evident.
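For contrast, here's roughly what the check looks like when everything lives in one place. This is a minimal sketch in Python, assuming a local organization_roles table and a database connection; the schema and function names are ours, for illustration only.

```python
# A monolith can answer the question with one local query.
# Assumes a local `organization_roles` table (hypothetical schema).
def get_org_role(db, user_id, org_id):
    row = db.execute(
        "SELECT role FROM organization_roles"
        " WHERE user_id = ? AND org_id = ?",
        (user_id, org_id),
    ).fetchone()
    return row[0] if row else None

def can_view_jobs(db, user_id, org_id):
    # In a separate jobs service, this table may not exist locally,
    # which is exactly the problem the rest of this chapter tackles.
    return get_org_role(db, user_id, org_id) in ("member", "admin")
```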
Let’s see these different trade-offs in action. We’re building a new "GitClub Jobs" service to complement our existing GitClub product. If you’ve just arrived, GitClub is a code hosting app similar to GitHub or GitLab. GitClub Jobs is a way for developers to define and automate deployment and compilation tasks in their repositories. Customers can use GitClub Jobs to run their test suite for every new commit or automate the building and deployment of their web service whenever they release a new version.
We’ve designed the GitClub Jobs service for an entirely different type of work from our original web service: running arbitrary compute tasks. Because of the different requirements, we’ve built GitClub Jobs as a new service.
The data objects of this new service are Jobs and Runs. A Job defines an automated task to perform, and a Run represents an individual execution of a given Job. For example, a user who wants to run an application's unit tests would define a "test" Job that includes the necessary instructions. Each time the user starts the test suite, we create a Run object that holds a success status and any other outputs, like logs or artifacts.
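As a sketch, the two objects might look like the following; the exact fields are assumptions, not GitClub's real schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Job:
    id: int
    repo_id: int       # the Repository this Job belongs to
    creator_id: int    # the User who defined the Job
    name: str          # e.g. "test"
    instructions: str  # the task to perform

@dataclass
class Run:
    id: int
    job_id: int                 # the Job this Run executes
    status: str                 # e.g. "pending", "success", "failed"
    logs: Optional[str] = None  # output captured from the run
```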
Users interact with GitClub Jobs via a new gitclub.dev/$ORGANIZATION/$REPOSITORY/jobs page. This page displays an overview of the current jobs for a repository and allows users to perform some basic management tasks, like “cancel job” or “restart job.”
GitClub Jobs covers the two authorization cases that you’ll see in microservices: shared authorization and service-specific authorization. The GitClub Jobs authorization model has objects in common with the GitClub web app, like users, organizations, and repositories. We need to share these concepts between our services. That model also defines its own service-specific objects—jobs and runs—that the web application doesn’t need to consider.
Jobs exist in the context of individual repositories—a job can’t exist without a repository. This relationship makes it possible to model authorization for Jobs objects by using the same roles that we use for Repositories. For example, a user with the “member” role in a Repository will have the same “member” role in the jobs service for all of the related Jobs objects.
The user has relationships to Jobs and Runs via Organizations and Repositories. We want to use these relationships to enforce the following policies:

- A user with the "read" role on a repository can view that repository's jobs and runs.
- A user with the "maintain" role on a repository can cancel or restart that repository's jobs.
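Translated into code, that inheritance might look like the sketch below; the role and permission names are assumptions.

```python
# Job permissions derive from repository roles (assumed names).
JOB_PERMISSIONS = {
    "read": {"view_job", "view_run"},
    "maintain": {"view_job", "view_run", "cancel_job", "restart_job"},
}

def job_allows(repo_role, action):
    # A Job inherits whatever role the user holds on its Repository.
    return action in JOB_PERMISSIONS.get(repo_role, set())
```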
To show the overview page for a job, we need access to many different data points. At a minimum, we need to know:

- the job itself and the status of its runs,
- the repository that the job belongs to, and
- the roles the user has on that repository and its organization.
The first of these is available in the jobs service, but what about the organization and repository data? In a monolith, all of this data would be available in one place, but now it’s split between our two services.
To enforce this policy, we’ll need to find a way for our two services to share the policy and data necessary for our authorization decisions.
When you're splitting up your monolith into multiple services, you'll need to decide whether to centralize your authorization model or distribute it across your services. This decision affects how you manage your authorization model, and it affects the process you use to roll out changes to that model across your services.
When making this decision, also consider the requirements of your teams. For example, do you need to let teams make changes to their own services’ authorization models without coordinating with others?
One way of decentralizing your authorization model is to duplicate your authorization logic in each service. If you have only a handful of services to manage, this can be an appropriate way to start. In this scenario, each service has a complete copy of the authorization model. That model includes both the common elements across all services (like the relationship between users and organizations) and the service-specific elements (like the rules defining who can cancel jobs). This approach allows each service to change the authorization model without needing input from other teams.
As your policy grows, you'll have to manage many copies of the same authorization logic across your services and keep each copy up-to-date: every change to the model has to be rolled out to every service.
If all of your service codebases use the same programming language, you can extract your authorization logic into a shared library that each service depends on. Sharing code this way removes the risk that any one service's copy of the policy contains its own errors or drifts from the others over time, but you'll still need to coordinate library updates across services.
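A shared library can be as simple as a single module that encodes the role-to-permission mapping, imported by every service. A sketch, with assumed role and permission names:

```python
# authz_shared.py -- the one copy of the policy every service imports.
GRANTS = {
    "repository": {
        "read": {"view_job", "view_run"},
        "maintain": {"view_job", "view_run", "cancel_job"},
    },
    "organization": {
        "member": {"view_org"},
        "admin": {"view_org", "manage_members"},
    },
}

def has_permission(resource_type, role, action):
    return action in GRANTS.get(resource_type, {}).get(role, set())
```

Each service calls has_permission instead of maintaining its own copy; a policy change becomes a library release that every service must pick up.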
Distributed systems are not always consistent or available, and the more services you deploy to, the more error-prone rolling out policy updates becomes. At that point, it makes sense to centralize the definition and management of your authorization policies.

Centralizing has trade-offs. In a fully decentralized architecture, each service is free to introduce and test policy changes in isolation. In a centralized world, you'll need to deploy changes to all of your services at once, and it's hard to predict edge cases in each service ahead of time. It's worthwhile to consider ways to test policy behavior before deploying, like with mock data and test fixtures.
One option is to create a specialized service to handle authorization decisions for your application. Your services then call this central API whenever they have an authorization question. Exposing your authorization policy over the network allows your application services to offload this work while also providing a single place to manage your authorization policy. In addition, the central service allows you to update policies for all clients at once, removing the need to coordinate changes to services.
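From a service's perspective, a central authorization service reduces authorization to one network call. A minimal sketch, assuming a hypothetical /check endpoint:

```python
import requests

AUTHZ_URL = "https://authz.internal/check"  # hypothetical endpoint

def is_allowed(user_id, action, resource):
    # Ask the central authorization service instead of deciding locally.
    resp = requests.post(
        AUTHZ_URL,
        json={"user": user_id, "action": action, "resource": resource},
        timeout=1.0,  # authorization is on the hot path; fail fast
    )
    resp.raise_for_status()
    return resp.json()["allowed"]
```

You'll also need to decide what happens when this call fails or times out; failing closed (denying the request) is the usual default.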
Policies are only half of the authorization equation. To enforce those policies, you’ll also need access to the relevant data.
The question of when to centralize your policies is more straightforward than the question of when to centralize your data. That’s because of the size of the dataset—policies are typically a few hundred lines of text, while databases are unbounded in size—and the frequency of data updates.
You’ll probably update your policy infrequently. On the other hand, users and services constantly mutate your application’s data. In addition, your policy might require access to many different data points to make an authorization decision.
To demonstrate, we'll revisit GitClub Jobs. We want to enforce the following rules:

- A user can cancel a job that they created themselves.
- A user can cancel any job in a repository if they have the "admin" role on that repository or its organization.
To enforce this policy, we’ll need access to data about the user, the job, and the organizations and repositories that they each belong to. That’s a lot of different inputs! Clients can pass the user data as a JSON Web Token, and the job data will be available locally in the service. But, what about the organizations and repositories inputs?
In a monolith, you can be sure that you have access to all of the data in one place. Unfortunately, there’s no such guarantee in the world of microservices. Scenarios like this are common when you’re breaking an existing application into different services, and highlight the fundamental challenge of solving authorization in a distributed application.
When you arrive at this situation, you have two options: use your existing technology to distribute data within your infrastructure, or introduce a centralized authorization service to gather the data in one location.
Our golden rule is: build authorization around your application, and not the other way around. This rule is true whether you’re building in a monolith or microservice architecture. To that end, consider what infrastructure or other options you might have available to share data between your microservices.
This problem isn't unique to authorization; it arises whenever you need to share data with all of your services. Because of this, you might already have existing infrastructure you can reuse to distribute your authorization data.
Here are some options:
One option is to have your services mimic clients and call each other’s APIs to retrieve the data that they need in a just-in-time manner. Doing this means that the data that you use in your authorization decisions is always fresh and reliable.
This naive approach can work if you have light performance requirements but quickly leads to poor performance under load. Any requests that require data from many services will be slow in these circumstances—a single client request will trigger several other requests, effectively multiplying your request load.
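As a sketch, the jobs service might fetch a user's repository role from the web service over plain HTTP; the endpoint here is hypothetical:

```python
import requests

WEB_API = "https://web.internal/api"  # hypothetical internal API

def fetch_repo_role(user_id, repo_id):
    # Fetch the role just in time for the authorization decision.
    # Each incoming request can fan out into several calls like this.
    resp = requests.get(
        f"{WEB_API}/repos/{repo_id}/roles/{user_id}", timeout=1.0
    )
    if resp.status_code == 404:
        return None  # no role granted
    resp.raise_for_status()
    return resp.json()["role"]
```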
If you happen to have a performant way of fetching data between services—like if you’re all-in on gRPC for service-to-service communication—then by all means, use that!
If—like many of us—your organization has yet to solve the problem of highly performant and available distributed databases, there are other options.
If you only have a small amount of data that needs to be available to each service for authorization, you can decentralize it by passing all the data in each request. One way of doing this is using authorization tokens, like JSON Web Tokens (JWT), to pass the data between each service securely.
Authorization tokens rely on cryptographic signatures to protect the integrity of their contents from manipulation. Cryptographic signing prevents attackers from taking a token for one user and manipulating it to represent another user. Many applications limit the scope of the authorization token to serialize only the identity of the user. However, that’s not always necessary—you can include other role-based access control data for your services to consume.
In our GitClub example, we could serialize all of the organization roles a particular user has within their JWT, alongside their user identity. This works well for organization roles because a user won't have very many of them. It isn't useful for encoding a user's repository roles, though, because you'll run out of space: tokens have practical size limits, and if you pass your JWT via an HTTP header, you'll run into HTTP's not-so-fun header size limits.
Another concern with placing lots of data in authorization tokens is that it can become stale quickly. For example, if users only receive new authorization tokens infrequently (say, on sign-in), then changes such as invitations to new repositories won't be visible until the user refreshes their token.
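Here's a minimal sketch of the pattern with the PyJWT library; the claim names and shared secret are assumptions (in production you'd likely prefer asymmetric keys).

```python
import jwt  # PyJWT

SECRET = "change-me"  # assumption; use asymmetric keys in production

def issue_token(user_id, org_roles):
    # e.g. org_roles = {"acme": "admin", "initech": "member"}
    payload = {"sub": str(user_id), "org_roles": org_roles}
    return jwt.encode(payload, SECRET, algorithm="HS256")

def roles_from_token(token):
    # decode() verifies the signature, rejecting tampered tokens.
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    return claims.get("org_roles", {})
```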
If you have more authorization data than you can pack into a single token, where should it live? One option is to use a shared special-purpose database for authorization data.
Companies typically design their microservices to have independent datastores to ensure reliability or security isolation. That’s the right thing to do when your services don’t have any hard dependencies on each other.
However, when implementing authorization, you’ll specifically want to share this data with all of your services. A shared database is a good solution in such scenarios. Databases handle this task of storing data and making it accessible for multiple readers in a highly-performant way! If you don’t need all services to manipulate authorization data, you can even use database read replicas to share the data between services without relying on a central host as a single point of failure.
In our GitClub Jobs example, we could constrain the management of all the relevant authorization data to the web service (like user identity, organization, and repository relation data). Doing so would let the jobs service consume a read replica of this data without creating load on the web service.
A downside of splitting your data into multiple databases (service-local plus authorization) is that you can’t query across all the databases at once. If you previously used JOINs to filter results based on authorization criteria, you’ll need to reimplement those JOINs in-memory in your app. That can be slow and memory-expensive.
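Here's what that in-memory join might look like: the jobs service reads role data from the authorization replica, then filters its own rows in application code. The connections and schemas are assumptions.

```python
def visible_jobs(jobs_db, authz_replica, user_id):
    # 1. Which repositories can this user read? (from the replica)
    rows = authz_replica.execute(
        "SELECT repo_id FROM repo_roles WHERE user_id = ?", (user_id,)
    ).fetchall()
    readable = {row[0] for row in rows}

    # 2. The JOIN we used to write in SQL now runs in app memory.
    jobs = jobs_db.execute("SELECT id, repo_id, name FROM jobs").fetchall()
    return [job for job in jobs if job[1] in readable]
```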
If you operate many services, it doesn’t always make sense to replicate a copy of your entire authorization dataset to each of them. Each service is likely to use only a limited portion of the dataset, making the replication wasteful.
In that case, you could implement an event-sourcing mechanism on a platform like Kafka. You can publish changes to resources to a distributed event stream and consume them in different services. Event streaming platforms like Kafka typically let consumers subscribe to a filtered portion of the entire stream. This lets services subscribe only to the changes that are relevant to their authorization models and to maintain their own independent copies.
Using an event stream is effectively the same as database replication, except that you’re using your application to implement filtering for performance or other reasons. The result is the same—a local data source which your service can query for quick authorization results.
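A sketch of the consuming side with the kafka-python client; the topic name and event shape are assumptions.

```python
import json
from kafka import KafkaConsumer  # kafka-python

# Subscribe only to the role changes this service cares about.
consumer = KafkaConsumer(
    "authz.repo-roles",  # assumed topic name
    bootstrap_servers=["kafka.internal:9092"],
    value_deserializer=json.loads,
)

local_roles = {}  # (user_id, repo_id) -> role; this service's own copy

for event in consumer:
    change = event.value  # e.g. {"user": 1, "repo": 2, "role": "read"}
    key = (change["user"], change["repo"])
    if change.get("role") is None:
        local_roles.pop(key, None)  # role revoked
    else:
        local_roles[key] = change["role"]
```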
If you don’t have a good data transfer mechanism at hand, don’t deploy one just for authorization.
If nothing we’ve described works for your team, your next best option is to build a centralized authorization service. Centralizing authorization can also make sense if you’re going to grow the number of services you operate in the future, as each new service has only a small marginal cost to support.
The key difference between this and the shared database approach is that you’ll centralize policy and decision-making in this new service in addition to centralizing your data.
The biggest challenges in developing a new central authorization service are about data: specifically, either reliably replicating data into the service or centralizing the data in it outright. Your architecture will depend on whether you define this new service as the authoritative source of truth for all of its contents or as a holder of copies of data that lives elsewhere.
In a distributed system, each service has responsibilities and dependencies. In our GitClub example, the web service is responsible for managing users, organizations, repositories, and their relationships with each other. The jobs service handles the creation, orchestration, and execution of the jobs from repositories. Your distributed system will also have service-specific relationships.
For centralized authorization to work, you'll have to relay a copy of that information from each of your application services to the authorization service. As we mentioned previously, microservices usually have their own datastores for reliability reasons, so you'll have to repeat this replication setup for every datastore that contains data relevant to your authorization policy.
There are many possible options when it comes to the actual mechanics of data replication. One option is to expose an API in your authorization service and then call out to this API from within your services’ model code when a user modifies a resource. The key to making this work is to ensure that you respond to all of the error cases correctly—for example, if the remote authorization service HTTP call fails, then the local update should too and vice versa.
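A sketch of that write-through pattern, assuming a sqlite3-style connection (where a with block commits on success and rolls back on an exception) and a hypothetical fact-ingestion endpoint:

```python
import requests

def add_repo_role(db, user_id, repo_id, role):
    # Local write and remote relay succeed or fail together.
    with db:  # transaction scope: rolls back if anything below raises
        db.execute(
            "INSERT INTO repo_roles (user_id, repo_id, role)"
            " VALUES (?, ?, ?)",
            (user_id, repo_id, role),
        )
        resp = requests.post(
            "https://authz.internal/facts",  # hypothetical endpoint
            json={"user": user_id, "repo": repo_id, "role": role},
            timeout=2.0,
        )
        resp.raise_for_status()  # aborts the local transaction on failure
```

Note the remaining gap: if the remote write lands but the local commit then fails, the two stores still disagree. That gap is one reason to prefer the log-based approach below.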
Many database technologies expose an operation log that serializes every record creation and update in sequence. You can consume this log to watch for changes to the resource types you care about, then relay those changes to the authorization service. Use this pattern in circumstances where real-time replication is important, like populating a search index based on the contents of a relational database.
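The relay itself can be a small handler that filters the log down to the tables your policy uses. The event shape below is a generic CDC-style assumption, not any specific tool's format.

```python
import requests

RELEVANT_TABLES = {"repo_roles", "org_roles"}  # assumed policy inputs

def relay(change):
    # `change` is one decoded operation-log entry, e.g.
    # {"table": "repo_roles", "op": "insert", "row": {...}}
    if change["table"] not in RELEVANT_TABLES:
        return  # skip tables the authorization policy doesn't use
    requests.post(
        "https://authz.internal/facts",  # hypothetical endpoint
        json={"op": change["op"], "row": change["row"]},
        timeout=2.0,
    ).raise_for_status()
```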
The worst case is a disagreement between the contents of the local and remote service data. If a resource exists locally but not in the central authorization service, users won’t have the correct permissions on that resource.
As you add more services, the number of connections between services also grows. Replicating data and making sure that your local and remote copies are in sync can become unreliable. You can sidestep this headache altogether by going all-in on your central authorization service and making it the authoritative source of truth for the data.
Unfortunately, that approach can require a major reorganization of your application services. Where previously those apps could rely on a local database to filter and retrieve resources, they must now call your new central authorization service. This approach is practical, but it breaks the strict isolation between microservices.
Choosing which parts of your model to relocate to a centralized datastore depends on the specifics of your policy. Nearly any attribute can be authorization data.
In our GitClub Jobs example, it makes sense to centralize the storage of the role and relation data for organizations and repositories. The web and jobs service will both use that data.
But what about other service-specific attributes? For instance, we might use the Job object's author property to implement rules, like our earlier example where users can cancel their own jobs. If the jobs service is the only service whose authorization rules consume this relationship, it can seem wasteful to relay it to a central service. But that's fine! That data is important, and it's rare for its size to be a bottleneck. If you want your central authorization service to return yes/no authorization results, it needs access to all of the data involved in producing those decisions. Be as generous as you can when replicating data to your central authorization service; that way, you'll be able to write more featureful policies.
Doing this also has the advantage of making model extensions more straightforward in the future. For example, imagine that later we wanted to create a new GitClub Artifacts service that distributes the files produced from a Job to users. We could use the Job’s relationship both to its output Artifacts and the User who created it to implement very tight access controls.
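Concretely, being generous might mean relaying every relationship when a job is created, not just the obviously shared ones. A sketch against the same hypothetical facts endpoint:

```python
import requests

def add_fact(kind, **ids):
    # Hypothetical fact-ingestion endpoint on the central service.
    requests.post(
        "https://authz.internal/facts",
        json={"kind": kind, **ids},
        timeout=2.0,
    ).raise_for_status()

def on_job_created(job):
    # Relay the shared relation and the jobs-only `author` relation
    # that powers "users can cancel their own jobs".
    add_fact("job_in_repo", job=job.id, repo=job.repo_id)
    add_fact("job_author", job=job.id, user=job.creator_id)
```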
Building a centralized authorization service is a serious effort. Companies that have built their own authorization service—like Google, Slack, and Airbnb—found that it took more than a year of effort by a dedicated team.
You can also use a centralized authorization service from a third-party. Oso Cloud, Ory Keto (a solution based on Google Zanzibar), and Aserto (a solution based on Open Policy Agent) are some of the options available. We’ve written more detail on these services and others in our guide to authorization as a service.
Whether you're just starting to split up your monolith or very advanced in your microservices journey, you’ll need a consistent authorization model to provide a seamless experience for your users.
The most important thing is that permissions are transparent to your users. Your users shouldn’t be able to tell the difference between interacting with a monolith or a collection of different microservices. For this to work, your services will ultimately need to agree on authorization.
Whatever your plan, build authorization around your application and not the other way around. What this means for you will be different depending on your specific requirements and circumstances.
For example, if you have only a small amount of role data for each user, you can probably get away with storing it in something like a JWT and passing it to each service. If you already have effective service-to-service communication in your distributed environment, you can use that communication to fetch the necessary authorization data directly from the source. If you need to enforce more granular authorization than you can achieve with a JWT, then consider centralizing authorization in a single service.
If you have lots of authorization data and no existing infrastructure for the task, then you'll need to build a central authorization service to house this data. Your biggest challenge in operating such a service will be keeping its data up-to-date and consistent with all of the changes happening in your different services: for every new resource that users create and every role you grant, you'll need to replicate the change to your central authorization datastore. Getting this wrong will mean false-negative authorization results and a frustrating experience for your users.
If you are on a timeline, want to focus your energy elsewhere, or just want to sleep easier, we have built Oso Cloud so you don't have to. You can learn about it here.
If you’d like help figuring out what solution will work best for you, talk to us on Slack or schedule a 1-on-1 with one of our engineers.