Abhishek Parmar, co-creator of Google Zanzibar & Airbnb Himeji, joins Oso as technical advisor.
We sat down with Abhishek to learn more about large-scale distributed systems, fine‑grained authorization, developing Google Zanzibar, and why he’s opted to spend time with the Oso team.
What major projects have you worked on?
I worked on large-scale distributed systems, mostly at Google. I was at Google for more than twelve years.
First, I worked on Google's web search serving infrastructure. Then I moved to Google's RPC system, "Stubby," which is now known as gRPC. I added streaming support to Google's internal version of gRPC. After that, I worked on a big storage system, Bigtable. I was tech lead for Bigtable for almost six years; that's the system that started the NoSQL revolution. My last project at Google was Zanzibar, which is a very large-scale and fine-grained authorization system.
After that, I switched gears and went to a startup where we built industrial-grade 3D printers. I worked on everything from a custom-designed board on an ARM chip all the way to a Ruby on Rails web app running on AWS. It was literally the definition of full-stack. I'm still involved as an advisor.
Now at Airbnb, I help teams build systems for transactional data storage. I'm working on how data storage bridges into the offline world — data warehouses — and also helping teams build the compute infrastructure on which that workload is run.
How did you start working on Zanzibar?
At the time, Google was going all in on building Google Plus. A few of us were invited to figure out what infrastructure pieces would be needed to build Google Plus. The product leadership wanted to differentiate Google Plus from its competitors. They decided that privacy was going to be a key differentiating factor. When we looked at how the entire system was going to be built, we decided that it would be better if all authorization decisions were enforced in one single place. Because the backend stack was made up of many, many pieces, we needed one single authorization engine.
Before Zanzibar, even at Google, authorization logic was sprinkled around all these backend pieces. It was hard to ensure that you got everything right. It was hard to audit and hard to debug when things were going wrong. We decided to go for a centralized authorization approach.
The biggest challenge in making a centralized service like that is that the service needs to scale. It becomes more critical than anything else. If it fails, then nothing else succeeds — nothing works.
Later you went to Airbnb — was it to work on the same problem with their Himeji authorization system?
Actually, I didn't go to Airbnb to work on that same problem. They had independently decided that they needed something similar. One of my managers who I used to work with at Google is now the CTO of Airbnb. He wanted me to help out at Airbnb.
I ended up consulting with them a little bit to help them figure out how to build that system.
Though I wasn't involved in a hands-on level with Himeji, I was guiding them from the outside. I advised on the design of that system.
What from Zanzibar did you leave out in Himeji?
The biggest thing that we left out was this clever but hard-to-use feature that ensured that you could get extremely consistent results: the "Zookies" feature, which defends against the "new enemy problem." Google Plus's leadership cared a lot about consistency at that time. Whereas in practice, you know, it doesn't seem to matter much. Also, we didn't have the Spanner database to help build that feature. We could have used other databases, but we just decided to leave it out because it wasn't very useful.
This insistence on consistency was a requirement imposed from the product side. We worried about a disaster or failure if we ever made a single authorization mistake. This made sense at the time, but in hindsight, we could give the wrong authorization answer for a second and it would be ok. Certainly, for most products using Zanzibar, it didn't matter. At the time, the consistency that Zookies gave us was important because we were building a social media platform. But, that's not the one feature that will make or break a social media platform. For instance, Facebook has plenty of inconsistencies when you interact with the product.
Other things we left out because they were artifacts of having to scale the system.
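For readers unfamiliar with the "new enemy problem," the consistency-token idea behind Zookies can be illustrated with a toy example. This is only a sketch under stated assumptions, not Zanzibar's or Himeji's implementation: the `VersionedAcl` class, its methods, and the integer version counter are all hypothetical stand-ins for what the interview describes as a Spanner-backed feature. The point it shows is that a check carrying a token must be answered from data at least as fresh as the token, so a lagging replica cannot resurrect a permission that was already removed.

```python
class VersionedAcl:
    """Toy ACL store (hypothetical) that keeps every change so it can answer at any version."""

    def __init__(self):
        self.version = 0
        self.log = []  # entries are (version, op, relation_tuple)

    def write(self, op, relation_tuple):
        """Record an "add" or "remove" and return the new version as a consistency token."""
        self.version += 1
        self.log.append((self.version, op, relation_tuple))
        return self.version

    def snapshot(self, at_version):
        """Rebuild the set of tuples as it looked at the given version."""
        state = set()
        for version, op, relation_tuple in self.log:
            if version > at_version:
                break
            if op == "add":
                state.add(relation_tuple)
            else:
                state.discard(relation_tuple)
        return state

    def check(self, relation_tuple, replica_version, token=0):
        """Answer from a snapshot no older than the caller's token.

        replica_version models how far a lagging replica has caught up.
        A real system would have to wait or read elsewhere; this toy
        simply replays the full log up to the required version.
        """
        at_version = max(replica_version, token)
        return relation_tuple in self.snapshot(at_version)


acl = VersionedAcl()
bob_views_doc = ("doc:plan", "viewer", "user:bob")
acl.write("add", bob_views_doc)             # version 1: Bob can view
token = acl.write("remove", bob_views_doc)  # version 2: Bob is removed

# A replica stuck at version 1 still believes Bob has access.
stale = acl.check(bob_views_doc, replica_version=1)
# Passing the token stored with the newer content forces a fresh enough answer.
fresh = acl.check(bob_views_doc, replica_version=1, token=token)
```

Here `stale` is the wrong answer that a product might tolerate for a second, and `fresh` is the answer the token guarantees. The trade-off discussed above is whether that guarantee is worth the extra complexity for callers, who have to store and pass the token.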
What were the features of Zanzibar that were most important for developers and product teams?
Giving structure to the keys of Zanzibar's relation-objects was a pretty good feature. It let developers express relationships more efficiently. It fit naturally with how a lot of products think about their keys [that is, the entries in their ACLs]. Then, we could build even more structure on those keys.
Himeji has done similar things, even beyond Zanzibar. Himeji provides features to extract parts of an object key and then do something with it. Maybe part of the key points to other objects, and those objects are related to the key.
One of the most important features of Zanzibar — other than centralizing authorization — was to be able to express relationships in authorization. The situation where you say, for instance, "I want Alice to have access to all the documents that Bob has read-access to." That's a "relationship" in authorization. Allowing devs to express relationships between keys — programmatically or declaratively — was a good feature.
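To make the idea of relationships concrete, here is a minimal sketch of relation tuples in which the subject of a tuple can itself be a set of users defined by another relation. The names (`RelationTuple`, `Userset`, `check`) and the sample data are assumptions made for illustration and are not Zanzibar's actual API. The example uses group membership rather than the Alice-and-Bob case from the interview, but it rests on the same idea: access is derived by following relationships instead of listing every user on every document.

```python
from typing import NamedTuple, Union


class Userset(NamedTuple):
    """Everyone who holds `relation` on `object` (hypothetical type)."""
    object: str
    relation: str


class RelationTuple(NamedTuple):
    """One stored fact: `subject` holds `relation` on `object`."""
    object: str
    relation: str
    subject: Union[str, Userset]


TUPLES = {
    RelationTuple("group:eng", "member", "user:alice"),
    RelationTuple("group:eng", "member", "user:bob"),
    # Every member of group:eng is a viewer of doc:roadmap.
    RelationTuple("doc:roadmap", "viewer", Userset("group:eng", "member")),
}


def check(obj, relation, user, tuples=TUPLES, _seen=None):
    """Return True if `user` holds `relation` on `obj`, following usersets."""
    seen = _seen if _seen is not None else set()
    if (obj, relation) in seen:
        return False  # already explored; avoids looping on cyclic data
    seen.add((obj, relation))
    for t in tuples:
        if t.object != obj or t.relation != relation:
            continue
        if t.subject == user:
            return True  # direct grant
        if isinstance(t.subject, Userset) and check(
            t.subject.object, t.subject.relation, user, tuples, seen
        ):
            return True  # granted through another relationship
    return False


alice_can_view = check("doc:roadmap", "viewer", "user:alice")
carol_can_view = check("doc:roadmap", "viewer", "user:carol")
```

Adding a user to `group:eng` grants access to every document that references the group, with no change to the documents themselves.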
Zanzibar's idea of having "overlay tuples" was very powerful, but could have been prone to abuse. For example, if you wanted to express "this document should not be accessible from this range of IP addresses," that can't be expressed directly in the system. You need to know at runtime what a person's IP address is. We came up with the idea of "overlay tuples," where certain trusted authentication systems outside of Zanzibar could provide runtime data.
That allowed authorization decisions like "no one from this IP address range has access." But it also made certain funky things possible. Like, "only allow access to this document on a night with a full moon." I think that kind of rule should be discouraged. You want your authorization logic to be much simpler than that, to ensure correctness and maintainability.
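The overlay idea can be sketched as follows. This is an illustration of the concept as described in the interview, not Zanzibar's actual mechanism: the function names, the tuple shapes, and the blocked address range (a reserved documentation range) are all assumptions. A trusted component outside the store contributes request-time tuples, and the check runs over the stored tuples and the overlay together.

```python
import ipaddress

# Stored relationships: Alice is a viewer of the payroll document.
STORED = {("doc:payroll", "viewer", "user:alice")}

# Hypothetical blocked range (203.0.113.0/24 is reserved for documentation).
BLOCKED_RANGE = ipaddress.ip_network("203.0.113.0/24")


def overlay_for(user, client_ip):
    """Hypothetical trusted provider that emits request-time tuples."""
    if ipaddress.ip_address(client_ip) in BLOCKED_RANGE:
        return {("network:blocked", "member", user)}
    return set()


def can_view(doc, user, client_ip):
    """Evaluate the rule over stored tuples plus the runtime overlay."""
    tuples = STORED | overlay_for(user, client_ip)
    is_viewer = (doc, "viewer", user) in tuples
    is_blocked = ("network:blocked", "member", user) in tuples
    return is_viewer and not is_blocked


allowed = can_view("doc:payroll", "user:alice", "198.51.100.7")
denied = can_view("doc:payroll", "user:alice", "203.0.113.9")
```

The same hook that lets `overlay_for` report an IP range could report anything else knowable at request time, which is how the "full moon" kind of rule becomes possible and why it is worth keeping the set of overlay providers small.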
What was the experience for developers that worked with Zanzibar?
This is Zanzibar's biggest weakness, and also the biggest opportunity for someone to innovate. As flexible and as powerful as Zanzibar is, it is not easy to use. It has all the components to metaphorically build a jet engine with, but that's not something that every person needs to do. People want a pre-built jet engine that is capable of going a thousand miles or is capable of a certain speed. Not everyone should have to assemble their own jet engine.
It was hard to ramp up to Zanzibar. But, any migration of this kind is hard. Authorization itself, in most products, is a feature that nobody wants to think about. It's a necessary evil. When you have your authorization logic sprinkled around your app, and you have to look hard to find all your edge cases, it's a hard task to migrate that. So, it's not easy to encourage people to undertake the migration to a centralized system. It's not obvious that your life will become easier.
In both Zanzibar and Himeji, it took leadership saying, "this is a company risk." It took that kind of mandate to make people migrate to a centralized system.
The second friction point was that the systems, Zanzibar and Himeji, were complicated. Even the configuration language itself can get a little bit out of hand. It's one of those things where you could spend a day or two to write a policy, but when you went back six months later to edit it, you'd have a hard time understanding what you wrote. That's a big failing of these systems.
That's also a major opportunity, because organizations care about solving these problems!
After Zanzibar and Himeji, why Oso?
I got introduced to Oso by Bill Coughran, who also used to be my manager at Google. It's mostly about people. I have some background in this domain, and I can help share my experience. But at the end of the day, it's about how I choose to give my time. I'm interested in working with people who are passionate and have strong convictions, and who understand how to build systems. I saw that in [Oso founders] Graham and Sam, and I thought I could help them in this space.
How do you convince developers that risk management is a critical problem?
It's not easy. You're talking about risk to your brand, losing opportunities, and risk to users. I don't think developers think in those terms, except for risk to users. I think the situation is getting better, though. Security has a much higher profile than ten years ago. People are aware of breaches, how bad they can get, and what an attacker can do.
It all has to start with someone in an organization saying, "wow, this could be pretty messed up." "Someone could publish this exploit in the Wall Street Journal." When someone is worrying about the Wall Street Journal problem, the organization will start to care.
Devs inherently understand that security is a good thing. But, they want to make amazing products without spending their whole time on security requirements. There's a high level of empathy right now for security-related products. If we can make them more palatable and easier to integrate, that's the route to success.
Educating people on the risks is important too, but that can be a hard sell. It's better to give devs an easy-to-adopt solution and say: at some point, your company will care about this risk, but you won't have to worry about it because you did it right from day one. For that dimension, it's about reducing the friction to use your product.
What are you excited to get really right in Oso Cloud?
Visualization and analysis of authorization data. We did that with Zanzibar, but that was an afterthought. That could be a pretty compelling value proposition for customers. I think that should be a first-class citizen in an authorization product.
Also, if we can make it easier to grok and migrate to an authorization system — if we [at Oso] can provide a higher level of abstraction — that will be important to a lot of people. Oso needs the parts like an evaluation system and a storage engine, but that doesn’t need to be exposed to clients at a low level.
Thank you so much!
If you want more background on the Google system that Abhishek refers to, you can read "What is Google Zanzibar." To learn how we’re building Oso Cloud, you can read "How Oso Cloud Works."