SE Radio 406: Torin Sandall on Distributed Policy Enforcement

Torin Sandall of Styra and Open Policy Agent discussed OPA and policy engines and how they can benefit software projects’ security and compliance. Host Justin Beyer spoke with Sandall about the benefits of removing authorization logic from your applications, specifically how OPA can create a single view of all policies across your software stack for both configuration verification and user authorization. They also discussed how a generic policy tool allows broader use cases, such as data masking and data localization, to be enforced through the same tooling. They had a brief discussion on how policy engines can be combined with authentication protocols, such as OAuth, to create an Authentication, Authorization, and Accounting (AAA) stack within applications. They also discussed how OPA, specifically, deals with distributed systems issues, such as the split-brain problem, across its different deployment options. Finally, they discussed how Gatekeeper fits under the OPA project umbrella to provide tooling specifically for Kubernetes.

Show Notes

Transcript

Transcript brought to you by IEEE Software

Justin Beyer 00:51 Hello, this is Justin Beyer for Software Engineering Radio. Today I’m speaking with Torin Sandall. Torin is the co-creator of the Open Policy Agent (OPA) project and the VP of Open Source at Styra, the company rethinking authorization for cloud-native systems. At Styra, Torin leads development of the OPA project and focuses on helping users and partners succeed. Torin is a frequent speaker on policy and authorization at conferences like KubeCon, OSCON, Velocity, and more. Torin, can you give the audience a quick introduction to what a policy engine is?

Torin Sandall 01:19 Sure. In layman’s terms, you can think of a policy engine as a concierge for your service. It exists to offload policy decisions from your software: from your applications, your API gateways, your orchestrators, from scripts and CI/CD pipelines, to SSH daemons, and so on. So a policy engine exists to answer questions for your software when your software needs to make some kind of decision and has a question to ask.

Justin Beyer 01:45 Okay. So essentially it’s trying to offload some of the decision making out of your application core and instead allow you to express a decision-making process in a separate instance.

Torin Sandall 01:56 Yeah. And it can be in a library, or in a separate process that runs on the same machine, or in a separate container in the same pod, or in a service that’s running across the network. But generally, the idea behind policy engines is that they decouple policy enforcement from policy decision making. The decision making gets handled by the policy engine, based on rules and logic that have been fed into it, and the enforcement resides inside of the software that queries it. So if you’re building an application that has to serve a REST API, then every time that REST API gets queried, it’s going to have to decide whether or not to allow that API request. The way that typically works with a policy engine is that whenever the API request comes in, the application queries the policy engine and asks, should this API request be allowed? The policy engine does a bunch of crunching and figures out, essentially, yes or no, whether that should happen. Then it hands that answer back to the service or application so that it can be enforced.

Justin Beyer 03:00 Okay. So essentially it’ll make a Boolean decision and return that to the application. Just to table it, we will definitely come back and talk a little more about implementing a policy engine in an application, but I want to change direction a little bit and talk specifically about the project you’re involved in, Open Policy Agent. What is it, and how does it fit into this domain of policy engines?

Torin Sandall 03:22 Sure. So the Open Policy Agent, or OPA, as we like to call it (we say “oh-pa”; some people say O-P-A, and people can call it whatever they want), is a project we started about four years ago, in early 2016. What OPA provides is basically a building block for enforcing policies consistently across a wide range of software. At its core, it provides a way to express policy and an engine to evaluate policies and produce decisions that can then be handed back to your software to be enforced.

Justin Beyer 03:57 Okay. That makes sense. So how, specifically, does OPA work?

Torin Sandall 04:01 The way OPA works is that you express your policies in a high-level declarative language that’s purpose-built for expressing policies and rules. Then you feed those rules into OPA via well-defined APIs, or off of the file system, or however you want. Whenever your software needs to obtain a decision, it can query OPA and ask, give me back the decision for these inputs. For example, say you were building a microservice that exposed an API to serve financial reports or salary information. Whenever a request comes into your service to look up the salary of an employee, or some financial report, the service would need to make a decision about that. It would query OPA, and inside of that query it would describe the request that was happening to it.

Torin Sandall 04:53 So it would provide things like the method, the URL, the HTTP headers, and maybe even the message body, and it would hand all of that data over to OPA. It would basically be asking for a decision: a yes or no, a Boolean answer like true or false. OPA would then evaluate the rules that had been fed to it previously, and the outcome of that evaluation would be the decision: true or false, allow or deny, or whatever you want to call it. That would be sent back to your service to be enforced.
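To make the service side of that exchange concrete, here is a minimal Python sketch. It assumes OPA is running as a local server on its default port, 8181, and that a policy has been loaded under a hypothetical package `httpapi.authz` with a rule named `allow`. OPA’s REST Data API accepts a POST to `/v1/data/<path>` with a JSON body of the form `{"input": ...}` and answers with `{"result": ...}`. The helper names are illustrative only, and the request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Hypothetical package/rule path; adjust to wherever your policy actually lives.
OPA_URL = "http://localhost:8181/v1/data/httpapi/authz/allow"

def build_input(method, path, headers, body=None):
    """Describe the incoming request to OPA as a plain JSON-serializable document."""
    return {
        "method": method,
        "path": path.strip("/").split("/"),
        "headers": dict(headers),
        "body": body,
    }

def build_opa_request(input_doc, url=OPA_URL):
    """Wrap the input in OPA's {"input": ...} envelope as an HTTP POST."""
    payload = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )

def enforce(opa_response_body):
    """Enforcement stays in the service: anything other than result == true is a deny.

    If the rule is undefined for this input, OPA's response has no "result" key.
    """
    decision = json.loads(opa_response_body).get("result", False)
    return decision is True

req = build_opa_request(
    build_input("GET", "/salary/bob", {"Authorization": "Bearer example-token"})
)
# Against a running OPA you would send it with: urllib.request.urlopen(req).read()
```

Note the split of responsibilities: OPA only computes the decision, while `enforce` in the service decides what a missing or non-true answer means, which here is a deny.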

Justin Beyer 05:20 Okay. So essentially I take policy, write it in this high-level language, and put it into OPA through whatever method. Then my application uses OPA to ask, based on this information I received, should I allow or deny this request?

Torin Sandall 05:38 Exactly. There are two sides to it. There’s you, as a policy author and somebody who’s responsible for managing policy in the system. You’re responsible for writing down the rules that govern who can do what, and maybe you’re also responsible for building out some of the systems that distribute those rules. Then there’s the software that actually needs decisions when things are happening: when API requests are coming in, when users are trying to SSH into machines, when scripts are running inside of CI/CD pipelines. That software is solely concerned with getting back a decision from OPA, so it’ll ask OPA for policy decisions when it needs them.

Justin Beyer 06:11 Okay. And how do you express this policy? What’s the language for it?

Torin Sandall 06:14 The language for expressing policy with OPA is called Rego. Rego is a Latin word that means to rule, so we thought that was a good name for a policy language. It turned out that we didn’t have any Australians on the core team, and “rego,” as it’s pronounced, means car registration in Australia. We learned that a little too late to rename it. Nonetheless, you write your rules in Rego, and what they basically exist to do is answer questions. Your software has to make these decisions, and those decisions are often framed as questions, like, is Alice allowed to see Bob’s salary?

Torin Sandall 06:57 Is this workload allowed to be deployed? Or, maybe more interesting, what policies would be violated if this workload were deployed? Or what records should someone be allowed to see? These are all questions that are policy in nature. The language that OPA gives you is a high-level declarative language that lets you write down rules that govern those kinds of decisions, and it is very good at expressing logical statements over arbitrary sets of data.

Justin Beyer 07:28 Okay. So it’s essentially implementing attribute-based authorization. It’s saying, here’s this attribute, and my policy is that Jane can see financial data but can’t see personnel records. So I would write that in Rego and say, users with the attribute of financial manager can see financial records but can’t see personnel records.

Torin Sandall 07:58 Exactly. There’s typically a phase before authorization, which is authentication, that happens before you actually decide whether to allow or deny some kind of request, and that authentication phase is super important. But once it finishes, once you’ve verified that Alice is who she says she is, you need to make this decision. That decision-making process comes down to logic, and to attributes that describe who Alice is, what action she’s trying to perform, and what resource she’s trying to access, as well as the environment that all of the software is running in. It might depend on what device Alice is connecting from. It might depend on how she authenticated: did she go through multi-factor or not? It might depend on the time of day, and so on. There are all kinds of environmental factors that affect the decision-making process. With OPA and Rego, what we try to do is give people a language that’s very good at expressing logic over all these kinds of attributes.
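As an illustration of that attribute-based logic, here is a minimal Python sketch of Justin’s financial-manager example, extended with two of the environmental attributes Torin mentions. It is not Rego and not OPA’s evaluator; it only mirrors the shape of a Rego policy, where several definitions of the same rule act as alternatives (logical OR), the expressions inside one rule body must all hold (logical AND), and `default allow = false` gives default deny. The role names, resource names, and business hours are all hypothetical.

```python
from dataclasses import dataclass
from datetime import time

@dataclass(frozen=True)
class AccessRequest:
    role: str          # subject attribute (who), e.g. taken from a verified token
    action: str        # what the subject is trying to do
    resource: str      # what they are trying to do it to
    mfa: bool          # environment: did they authenticate with multi-factor?
    local_time: time   # environment: time of day of the request

BUSINESS_HOURS = (time(9, 0), time(18, 0))  # hypothetical policy parameter

def finance_rule(r):
    # Financial managers may read financial records during business hours.
    start, end = BUSINESS_HOURS
    return (r.role == "financial_manager" and r.action == "read"
            and r.resource == "financial_records" and start <= r.local_time < end)

def personnel_rule(r):
    # Personnel records require the HR role and multi-factor authentication.
    return (r.role == "hr_manager" and r.action == "read"
            and r.resource == "personnel_records" and r.mfa)

RULES = (finance_rule, personnel_rule)

def allow(r):
    # Default deny: allowed only if every condition of at least one rule holds.
    return any(rule(r) for rule in RULES)

jane = AccessRequest("financial_manager", "read", "financial_records", False, time(10, 30))
```

With these rules, `allow(jane)` is true, while the same request for `personnel_records`, or for financial records at 22:00, is denied.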

Justin Beyer 08:56 Okay. So I would authenticate the user and, let’s say, give them back an OAuth token with attributes about how they authenticated, such as whether they used multi-factor. Then, when they come back to the application and provide that token to prove they are who they say they are, that’s where OPA fits in and starts to apply the policies, almost in that zero trust networking kind of way, which we will touch on a little later.

Torin Sandall 09:22 Yeah, exactly. The way we think about identity and tokens, like OAuth tokens or JSON Web Tokens, is that they are just another set of attributes that inform the decision-making process. There’s nothing inherently special about them. I mean, they’re obviously super fundamental to the policy, but there are lots of other things that are important too, so they’re just another source of data or context that informs the decision. That’s how it works at a high level. When you actually start writing Rego, what you’re essentially doing is writing down a bunch of if-then statements. You say, you are allowed to perform this operation under these conditions. You write down the conditions under which things are or are not allowed to happen, or potentially how the incoming request may need to be modified, or obligations that might fall on the client, and so on.

Justin Beyer 10:16 Okay. Changing direction a little bit here, let’s talk about how it’s actually implemented. Say I take OPA and I have an existing application. What kinds of applications benefit from putting OPA in? Are we talking small applications, huge enterprise applications, large distributed systems? Where does that benefit start to show with OPA?

Torin Sandall 10:39 So we made a very conscious decision early on in the project to keep it as flexible, domain agnostic, and general-purpose as possible, so I personally think applications big and small, in different kinds of domains, can benefit quite a bit from it. Where we see a lot of interest in the project today is from companies, enterprises, and large organizations that operate in highly regulated environments, like financial services and healthcare companies. We see two broad categories of use cases. One is mostly around configuration validation. This is what people are typically talking about when they talk about OPA in the context of Kubernetes. In that context, OPA exists to put safeguards or guardrails in place that protect the Kubernetes clusters, as well as the applications running on top of those clusters, from themselves and from each other.

Torin Sandall 11:40 So that’s one broad category of use cases: safeguarding platform configuration and the metadata that defines compute, network, and storage resources in this cloud-native stack. The second major category of use case for OPA is the API authorization problem. Every single time you build a microservice or some sort of application, you have to build out AAA: authentication, authorization, and accounting. There are industry-standard ways of doing authentication, but authorization, on the other hand, has typically been deeply embedded into the business logic. What we’re seeing now is that people are realizing it’s better to decouple it, split it out, and offload it to a component like OPA. So there are all kinds of API authorization use cases around microservices as well as applications, whether you’re implementing role-based access control, attribute-based access control, or an AWS IAM-style access control model. There are various companies using OPA for exactly that.
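For the configuration-validation use case, a policy typically answers “what would be violated if this were deployed?” rather than a bare yes or no. The Python sketch below models that pattern: it returns a set of violation messages for a Pod-like Kubernetes manifest, and an empty set means the object can be admitted. The trusted registry prefix and the two checks are hypothetical examples of guardrails, not rules that OPA or Gatekeeper ship with.

```python
TRUSTED_REGISTRY = "registry.example.com/"  # hypothetical: the only allowed image source

def violations(manifest):
    """Return the set of policy violations for a Pod-like manifest (empty set means admit)."""
    msgs = set()
    if manifest.get("kind") != "Pod":
        return msgs  # this sketch only inspects Pods
    for container in manifest.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        # Guardrail 1: images must come from the trusted registry.
        if not container.get("image", "").startswith(TRUSTED_REGISTRY):
            msgs.add(f"container {name!r} uses an image from an untrusted registry")
        # Guardrail 2: containers must not request privileged mode.
        if container.get("securityContext", {}).get("privileged", False):
            msgs.add(f"container {name!r} must not run privileged")
    return msgs

pod = {
    "kind": "Pod",
    "spec": {
        "containers": [
            {"name": "web", "image": "docker.io/nginx", "securityContext": {"privileged": True}},
            {"name": "api", "image": "registry.example.com/api:1.0"},
        ]
    },
}
```

Returning every message, instead of stopping at the first failure, lets an admission controller or CI/CD pipeline report all problems with a workload at once.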

Justin Beyer 12:45 Okay. So one of the huge benefits is that I can have consistent authorization across my entire stack. If I put up a Kubernetes cluster and launch my new application, and I’m already using OPA as the basis of authorization for my existing 400 applications, I can take those same roles and apply specific controls within that application, still mapped to those same roles.

Torin Sandall 13:11 Yeah. There’s nothing that would inhibit you from using contextual information that describes the running software, whether you’re writing policy for Kubernetes clusters or for your services. Having the ability to express policy in a uniform and consistent way across a wide range of software, whether you’re talking about microservices, platform services, or even host-level daemons like SSH, is tremendously valuable for these large organizations that otherwise have very poor control and visibility over the rules and the governance of the system.

Justin Beyer 13:45 Exactly. So there are definitely some compliance benefits you could get out of it, which I’ll circle back to a little later. But to dive a little deeper into the OPA client and how it would be implemented: how does it handle the issues we see with distributed systems, where we can’t get a hundred percent network reliability, or you might end up with a split-brain problem or something to that effect?

Torin Sandall 14:09 Sure, that’s a good question. When it comes to integrating OPA into your stack or your software, you have a couple of different options. We tend to think of OPA as essentially a host-local cache for policy decision making. You can either embed it as a library, if you’re building services in Go, or you can run it as a daemon, a standalone server. In both cases, we recommend that you run it as close to your software as possible. If it’s embedded as a library, it’s in the same process. If it’s running as a daemon, we recommend that you run it as a host-level daemon next to your software or, in the case of Kubernetes, as a sidecar, which is an architectural pattern within Kubernetes.

Torin Sandall 14:53 The reason we recommend that is twofold. Imagine an application designed with a service-oriented or microservice architecture. When an incoming request hits that application, it might have to traverse four, five, ten, a dozen, or twenty microservices in order to be fulfilled. If at every single hop along the way your software has to call out across the network to get a policy decision back, there are a few different things that can go wrong, and I think you alluded to them with things like split-brain. One thing that can happen is that the network can get slow or jittery; it’s certainly going to introduce latency into the path.

Torin Sandall 15:39 If you have to pay this network overhead at every hop, that’s going to impact your overall application latency, so your application is not going to perform as well and your users are going to be unhappy. The other impact is on your application’s uptime, its availability. If at every single hop you have to call out across the network, then if there’s a network partition, or the host that OPA is running on crashes, your application is not going to be able to get a decision back. The service is going to ask OPA for a decision and just sit there waiting until some timeout expires. Then it’s going to encounter an error, and at that point it has to decide what to do. Because authorization is a fundamental security problem that sits in the critical path of your application, you typically have to fail closed, and that’s going to impact your availability and your uptime. So, because of performance and availability, we recommend that you run OPA as close to your software as possible, ideally on the same machine.
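The fail-closed behavior Torin describes can be sketched in a few lines of Python. Here `query_policy_engine` is a hypothetical callable standing in for a remote call to a policy engine across the network; the wrapper waits up to a timeout and treats a timeout, an exception, or any answer other than `True` as a deny. This is an illustration of the trade-off, not how OPA or any particular client library implements it.

```python
import concurrent.futures
import time

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def decide_fail_closed(query_policy_engine, input_doc, timeout_s=0.2):
    """Ask a (possibly remote) policy engine for a decision; deny on error or timeout."""
    future = _pool.submit(query_policy_engine, input_doc)
    try:
        return future.result(timeout=timeout_s) is True
    except concurrent.futures.TimeoutError:
        return False  # fail closed: no answer in time means deny
    except Exception:
        return False  # partition, crashed engine, malformed answer: also deny

def slow_engine(input_doc):
    # Simulates a policy engine behind a slow or partitioned network link.
    time.sleep(0.5)
    return True
```

Every request that hits the timeout is both slower (it waited the full `timeout_s`) and denied, which is exactly the latency and availability cost that running OPA in-process or as a sidecar on the same host is meant to avoid.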

Justin Beyer 16:41 Okay. So it’s the classic security balancing act of confidentiality, integrity, and availability. By putting OPA as close as possible to the service, we still get the availability we probably need for our service, while ensuring the required security controls are in place for our application, especially in a highly regulated industry where the expectation is that every request is authorized before it’s allowed.

Torin Sandall 17:10 Yup. Now, of course, OPA is very flexible. You can embed it as a library or stand it up as a daemon. If you want to run it across the network as a service, we don’t stop you; you can do that. But it is on you to think about the availability and performance impact that’s going to have on your application. We don’t force you to run it on every machine or next to every application, but that’s how we thought about OPA from the beginning: that it would enable a distributed enforcement model for policy and authorization. A lot of legacy systems are overly centralized; they were designed for older environments. That’s one of the things OPA does a little bit differently.

Justin Beyer 17:53 Okay. It’s definitely helpful to know that it can handle these distributed systems issues. But a little more on that: if we’re putting OPA almost as a sidecar to every application, or as close as possible to the application, how do I manage all of these OPA agents and provide a consistent policy across all of them?

Torin Sandall 18:12 Right. So you have all of these agents, and by the way, OPA keeps all of the rules and data that it uses to make decisions in memory. It doesn’t have any decision-time dependencies on an external database or anything like that; all the evaluation happens locally inside of the agent. There are mechanisms to call out from inside of the policy if you need to, but by default it doesn’t do that, and if you architect your policies the way we encourage, it won’t. So if you have your architecture set up with these agents running throughout your infrastructure, all storing everything in memory, you need some way to manage them.

Torin Sandall 18:48 You need some way to control what policies and rules they have loaded, and you want to know what decisions they’re making at the end of the day; that’s important for audit and accounting. To enable that, OPA exposes a set of management APIs, which exist to provide control and visibility over the agents. OPA has an API called the bundle API, which you can use to distribute bundles of policy and data out to the agents. Essentially, you configure OPA to periodically phone home to a remote HTTP server, and OPA will periodically try to download the latest version of policy and activate it. That can be a simple HTTP server serving off of a file system.
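The pull-and-activate loop can be modeled with a small Python class. This is a toy model, not OPA’s code: in OPA itself the equivalent behavior is set up through its `services` and `bundles` configuration. Here `fetch_bundle` is a hypothetical callable returning a `(revision, policy)` pair, where `policy` is a plain function over an input document. The sketch keeps the active policy in memory, activates a bundle only when its revision changes, and keeps serving the last good bundle when a download fails. The `revision` and `last_error` attributes loosely correspond to what the status API reports.

```python
import random
import time

class BundlePoller:
    """Toy model of an agent that periodically pulls a policy bundle and activates it."""

    def __init__(self, fetch_bundle, min_delay_s=10.0, max_delay_s=20.0):
        self.fetch_bundle = fetch_bundle   # hypothetical: returns (revision, policy_callable)
        self.min_delay_s = min_delay_s
        self.max_delay_s = max_delay_s
        self.revision = None
        self.policy = lambda input_doc: False  # nothing loaded yet: deny everything
        self.last_error = None

    def poll_once(self):
        """Fetch once; return True only if a new revision was activated."""
        try:
            revision, policy = self.fetch_bundle()
        except Exception as exc:
            self.last_error = exc   # keep serving the last good bundle
            return False
        self.last_error = None
        if revision == self.revision:
            return False            # unchanged, nothing to activate
        self.revision, self.policy = revision, policy
        return True

    def run(self, iterations):
        for _ in range(iterations):
            self.poll_once()
            # Random jitter between polls keeps a fleet of agents from polling in lockstep.
            time.sleep(random.uniform(self.min_delay_s, self.max_delay_s))

    def decide(self, input_doc):
        return self.policy(input_doc)  # evaluated entirely in memory
```

Because `decide` never touches the network, a bundle server outage only delays policy updates; it does not block decisions.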

Torin Sandall 19:37 You can point OPA at S3, or you can write your own service to serve policy and data, and we see people doing all kinds of stuff like that. OPA also has other APIs for visibility. There’s what we call the status API, for receiving notifications about what version of policy OPA is currently running and whether it ran into issues activating the last version it got from your bundle API. Then there’s the decision log API, so that you can configure OPA to keep a little record in memory every time it makes a decision. Every time your software queries OPA, it keeps a record of that around, and that record includes all of the input attributes and the decision that was made, as well as a pointer to the version of the policy that it had loaded when it made the decision. It periodically flushes those out, uploading them to a remote API or sending them to a logger, so you can aggregate those decisions and do all kinds of analysis on them. So OPA has these primitive APIs that architects and developers can build on top of, and of course there are companies like Styra that provide a commercial control plane for managing Open Policy Agent deployments.
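The decision-log idea, one record per decision that is buffered in memory and flushed in batches, can be sketched in Python as follows. The record fields are loosely modeled on what Torin lists (input, decision, policy version) and are not OPA’s exact log schema. `sink` is a hypothetical callable, for example an HTTP upload or a log writer. For brevity, a batch is dropped if the sink raises; a real uploader would retry.

```python
import json
import time
import uuid

class DecisionLogger:
    """Buffer one record per decision and flush batches to a sink."""

    def __init__(self, sink, max_buffer=100):
        self.sink = sink              # hypothetical: e.g. an HTTP upload or a log writer
        self.max_buffer = max_buffer
        self.buffer = []

    def record(self, input_doc, result, bundle_revision):
        self.buffer.append({
            "decision_id": str(uuid.uuid4()),
            "timestamp": time.time(),
            "input": input_doc,           # all attributes the decision was based on
            "result": result,             # the decision that was returned
            "revision": bundle_revision,  # which policy version produced it
        })
        if len(self.buffer) >= self.max_buffer:
            self.flush()

    def flush(self):
        # Swap the buffer out first so new decisions can keep accumulating.
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.sink(json.dumps(batch))
```

In practice `flush` would also run on a timer, so that a quiet agent still uploads its records eventually.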

Justin Beyer 20:45 Okay. So the policy agent is almost a data-plane agent providing authorization for a service. Then you would still want a control plane to provide the policy, gather feedback from the agents, check statuses, and verify each one is running the latest policy. That way you can detect issues: say an agent is running one version of policy, and you push out a critical change because you noticed an issue with it. You can detect, oh, these agents over here in this application aren’t being updated. Why?

Torin Sandall 21:14 Exactly. Having visibility and understanding the performance and state of the OPAs is super important. Having a record of the decisions that the agents have been making is also super valuable from an audit perspective, because now you can build up a historical record of all the decisions that have been made over time, effectively across your whole stack. That’s powerful for audit, and it also feeds back into the rollout and distribution of policy, because you can use that historical record to do things like back-testing policy changes before you roll them out.
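Back-testing against a decision log can be as simple as replaying each logged input through the candidate policy and listing the decisions that would change. In the Python sketch below, the log is assumed to be a list of records with `input` and `result` keys, and the policies are plain functions. Both are illustrative stand-ins, not OPA’s formats or tooling.

```python
def backtest(decision_log, candidate_policy):
    """Replay logged inputs through a candidate policy; return decisions that would change."""
    changes = []
    for record in decision_log:
        new_result = candidate_policy(record["input"])
        if new_result != record["result"]:
            changes.append((record["input"], record["result"], new_result))
    return changes

history = [
    {"input": {"user": "alice", "mfa": True}, "result": True},
    {"input": {"user": "bob", "mfa": False}, "result": True},
    {"input": {"user": "eve", "mfa": False}, "result": False},
]

def stricter(input_doc):
    # Candidate change: additionally require multi-factor for everyone.
    return input_doc["user"] != "eve" and input_doc["mfa"]
```

Here `backtest(history, stricter)` reports a single change: Bob’s request, previously allowed, would now be denied. That is the kind of impact you would want to review before rolling the change out.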

Justin Beyer 22:19 So, Torin, let’s focus in a little more on the security aspects of OPA. Where would we put OPA in the application security stack? I know we’re using it for authorization, and I know it runs next to the application, but what’s the advantage of using OPA over the very traditional approaches, like an LDAP query based on a security group, or just taking OAuth claims and enforcing them in the application?

Torin Sandall 22:50 Sure. Whenever you’re trying to secure an application, the first thing you have to figure out is how you’re going to do identity and authentication: how you’re going to verify that I am who I say I am when I’m connecting to the system. That’s what things like OAuth help you do. The way I think about OAuth is that it’s sort of like power of attorney for software: it allows me to grant some piece of software the ability to do something on my behalf. But what it doesn’t solve for you is that once that grant has been made, it still has to be validated by the server that’s receiving the request from the application. In the past, what often happened is that the validation of those grants, those claims, was deeply embedded and hard-coded into the application business logic.

Torin Sandall 23:42 And that’s for good reason, because if you go read the OAuth RFC, it doesn’t tell you how the claims need to be validated. For some of them it might say what to do, but for others it says this is outside the scope; it’s just not part of OAuth. So the decision making around validation of claims is an authorization problem, and it’s a policy problem at the end of the day. That’s where OPA comes in. OPA allows you to offload that claim validation from the application, from the service, to a dedicated engine that lets you express it uniformly and consistently across a wide range of software. This is particularly valuable for a large organization that’s trying to roll out some sort of authentication practice, like multi-factor, across a wide range of applications. Instead of having to go into each and every application and reconfigure it, or change its implementation so that it supports multi-factor, they can just do that in one place.

Torin Sandall 24:39 They can go into the policy and say, okay, now require multi-factor for users connecting from certain geographic regions, or users connecting to the system outside of business hours, or something like that. It makes it much, much easier to roll out these kinds of enterprise- or organization-wide security policies to fleets of applications.
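A central rule like “require multi-factor for certain regions or outside business hours” might look like the following Python sketch. It assumes the token’s claims have already been signature-verified upstream, and it reads the standard OpenID Connect `amr` (authentication methods references) claim, for which RFC 8176 defines the value `"mfa"`. The region codes and business hours are hypothetical. Because every application asks the same `allow` function, tightening the rule changes behavior everywhere at once.

```python
from datetime import datetime, time

MFA_REQUIRED_REGIONS = {"region-a", "region-b"}  # hypothetical region codes
BUSINESS_HOURS = (time(8, 0), time(18, 0))       # hypothetical, local time

def used_mfa(claims):
    # "amr" lists how the user authenticated; RFC 8176 defines the "mfa" value.
    return "mfa" in claims.get("amr", [])

def mfa_required(region, when):
    start, end = BUSINESS_HOURS
    outside_hours = not (start <= when.time() < end)
    return region in MFA_REQUIRED_REGIONS or outside_hours

def allow(claims, region, when):
    # One organization-wide rule, evaluated the same way for every application that asks.
    return used_mfa(claims) or not mfa_required(region, when)

noon = datetime(2020, 3, 2, 12, 0)
late = datetime(2020, 3, 2, 23, 0)
```

A password-only login at noon from an unlisted region is allowed, while the same login from `region-a`, or at 23:00, is not.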

Justin Beyer 25:00 Okay. That makes sense. So it gives you a generic place where you can say, based on these OAuth claims, this is how I interpret them. Then, within the application, you can query it consistently and ask, I got these claims, what do I do? Or, is this action acceptable?

Torin Sandall 25:21 Exactly. And there are different ways you can do that integration. It can happen inside the application itself: the application can query OPA and ask, is this token valid? But it can also happen elsewhere. It can happen inside the app or web framework that the application is implemented in. Take Spring, for example: you can have a simple Spring Security plugin that knows how to query OPA whenever an incoming request comes in. You can also do it outside of the application with a service proxy like Envoy. Another way of inserting policy enforcement into your stack is to put a proxy like Envoy in front of your application and configure Envoy to ask OPA to validate, for example, OAuth tokens as requests come into the system.
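Framework-level enforcement, the Spring Security plugin idea, can be sketched in Python as WSGI middleware (the standard Python web-server interface from PEP 3333). The middleware builds an input document from the request, asks a policy function, and returns 403 before the application code runs if the answer is not `True`. The `decide` callable is hypothetical; in a real deployment it might query OPA as a sidecar. The selected input fields are illustrative.

```python
class PolicyMiddleware:
    """WSGI middleware that asks a policy function before letting a request reach the app."""

    def __init__(self, app, decide):
        self.app = app
        self.decide = decide  # hypothetical: e.g. a function that queries OPA

    def __call__(self, environ, start_response):
        input_doc = {
            "method": environ.get("REQUEST_METHOD", "GET"),
            "path": environ.get("PATH_INFO", "/"),
            "authorization": environ.get("HTTP_AUTHORIZATION", ""),
        }
        if self.decide(input_doc) is not True:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"forbidden\n"]
        return self.app(environ, start_response)  # allowed: hand off to the real app

def hello_app(environ, start_response):
    # The application itself contains no authorization logic.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

app = PolicyMiddleware(hello_app, lambda input_doc: input_doc["method"] == "GET")
```

The same shape applies one layer further out: a proxy such as Envoy plays the middleware’s role, so the application does not change at all.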

Justin Beyer 26:07 Okay. So it still pulls authorization out of the application and focuses that work within OPA. Now, changing gears a little bit: we did an episode on zero trust networking, Episode 385, and I’ll refer listeners back to it for more information on that field if they’re not familiar. But where does OPA fit into the overall architecture of zero trust networking?

Torin Sandall 26:34 So zero trust is about removing security assumptions from the system. If you look at legacy or older systems, what they rely on is centralized, perimeter-based network security. It’s like having a house and putting a lock on the front door: that’s perimeter-based security. Inside the house there are no locks; you can go into any room at any time, and it’s a free-for-all. With zero trust, you’re basically going into the house and putting locks on every single door. That could be a little bit creepy, I guess, but the idea is sound. The challenge there, a lot of the time, is usability. How do you maintain usability and make security something that doesn’t totally get in people’s way? Otherwise they’ll find ways to bypass it and get around it. OPA is an engine you can use to put locks on every single door in your house. It’s intended to be super lightweight and super easy to embed, so it fits very nicely into these zero trust architectures.

Justin Beyer 27:40 I know you mentioned enriching the data, but with OPA can we do a more complex authorization workflow? For example: I can log into the application without multifactor, but now that I want to do an admin action, I need to do multifactor.

Torin Sandall 28:00 Yeah, that's a good question. Earlier on we were focusing on policies that generate Boolean decisions, like yes or no. But one of the things that separates OPA from a lot of other projects in the past is that it actually allows you to generate non-Boolean decisions. The decisions that you generate from your OPA policies can be arbitrary JSON documents: objects or values. What that means is that if you want to generate a decision that says, no, you're not allowed, but if you go to this URL and authenticate and then come back, you'll be allowed access, that's something you can express today quite easily inside OPA.

Justin Beyer 28:39 Okay. So I can provide a more complex workflow and say, now that you want to do this, you have to go here. I can give the application more information than just a yes, allow, or a no, don't allow; I can say, they need to go here and then come back and ask me again.

Torin Sandall 28:55 Exactly. We see this a lot with our Envoy integration: people write policies that return the actual HTTP headers that force a redirect of clients coming into the system. So it just builds on top of standard HTTP redirection.
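The step-up multifactor flow described above can be sketched as a decision function that returns a structured document rather than a bare yes/no. This is Python standing in for what would really be a Rego policy; the field names (`allowed`, `http_status`, `headers`) are loosely modeled on the kind of object the OPA-Envoy integration can consume, but treat them as an assumption and check the plugin documentation. The admin actions and the MFA URL are hypothetical.

```python
from urllib.parse import quote

# Hypothetical values for illustration; not from the episode.
ADMIN_ACTIONS = {"delete_user", "rotate_keys"}
MFA_URL = "https://login.example.com/mfa"


def decide(request: dict) -> dict:
    """Return a structured decision instead of a bare True/False."""
    if not request.get("authenticated"):
        return {"allowed": False, "http_status": 401}
    if request["action"] in ADMIN_ACTIONS and not request.get("mfa"):
        # Step-up: deny for now, but tell the client how to become allowed.
        return_to = quote(request["path"], safe="")
        return {
            "allowed": False,
            "http_status": 302,
            "headers": {"Location": f"{MFA_URL}?return_to={return_to}"},
        }
    return {"allowed": True}
```

A proxy receiving the 302 variant can forward the `Location` header to the client, so the redirect rides on plain HTTP as described; once the client returns with `mfa` satisfied, the same policy allows the request.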

Justin Beyer 29:10 Can you give me a specific example of an application where you've implemented OPA and it improved the security of the application overall?

Torin Sandall 29:17 Sure. There are tons of companies that have talked publicly about how they're using OPA at scale. Recently we held our first-ever OPA Summit in San Diego at KubeCon in November, and at that event we had a bunch of companies from different verticals come and talk about how they're using OPA. One of the coolest was how Pinterest is leveraging OPA to enforce all kinds of policies across their systems. They're using OPA for standard stuff like config validation inside CI/CD pipelines, and they also talked about using it for API authorization in microservices. But they're going even further and using it to control access to Kafka. And Kafka is kind of the data plane, right?

Torin Sandall 30:06 For a lot of organizations, it has a huge amount of information flowing through it on a daily basis. They shared how they use OPA at scale to enforce access control over who can connect to certain Kafka topics and produce to or read from them. The numbers they shared were pretty impressive: at peak, across their Kafka fleet, it was serving something like 500,000 authorization queries per second globally across all of their clusters, and when they added caching on top of that, it went up to something like 8.5 million. It was neat to see that kind of public account of how people are using OPA at scale to solve real security issues. Another example that came out of KubeCon was from some folks at Goldman Sachs who run Kubernetes. They showed how they use OPA to define admission control policies that safeguard the cluster, saying things like you're not allowed to run privileged containers, or you're not allowed to run images off the public internet.
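The jump from roughly 500,000 to 8.5 million authorizations per second came from caching on top of OPA. The sketch below shows one common way to do that, a short-TTL decision cache keyed on (principal, operation, topic); it is an illustration of the general technique, not Pinterest's implementation, and the `decide` callback and example principals are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (principal, operation, topic)


class DecisionCache:
    """Memoize policy decisions for a short TTL so hot paths skip the engine."""

    def __init__(self, decide: Callable[[str, str, str], bool],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._decide = decide          # e.g. a call out to OPA (hypothetical)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[bool, float]] = {}
        self.misses = 0                # how often the policy engine was consulted

    def allowed(self, principal: str, operation: str, topic: str) -> bool:
        key = (principal, operation, topic)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]              # fresh cached decision
        self.misses += 1
        decision = self._decide(principal, operation, topic)
        self._entries[key] = (decision, now + self._ttl)
        return decision
```

The design trade-off is staleness: a policy change can take up to one TTL to reach every broker, and this simple version never evicts keys, so a production cache would also bound its size.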

Torin Sandall 31:03 But they also use OPA to define desired state, or configuration, for Kubernetes. Whenever a namespace gets created in Kubernetes, and this happens all the time for new teams, policies are in place so that security resources automatically get provisioned: things like RBAC roles and quotas, as well as other things like persistent volumes and other objects. So there are tons of different use cases, and I recommend that people go online and look around on YouTube and other places for examples of how people are running OPA in the wild.
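The two admission rules mentioned in the Goldman Sachs example (no privileged containers, no images off the public internet) can be sketched as a check over a Pod manifest. In a real cluster this logic would live in an OPA (Rego) policy behind a validating admission webhook; this Python version only mirrors the decision, and the approved registry name is a hypothetical stand-in for "not the public internet".

```python
from typing import List

# Hypothetical internal registry; anything else counts as "off the public internet".
APPROVED_REGISTRIES = ("registry.internal.example.com/",)


def admission_violations(pod: dict) -> List[str]:
    """Return human-readable reasons to reject a Pod; an empty list means admit."""
    violations = []
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for c in containers:
        name = c.get("name", "<unnamed>")
        if c.get("securityContext", {}).get("privileged", False):
            violations.append(f"container {name} must not run privileged")
        image = c.get("image", "")
        if not image.startswith(APPROVED_REGISTRIES):
            violations.append(f"container {name} uses unapproved image {image}")
    return violations
```

Returning a list of reasons rather than a single Boolean is the same idea as the non-Boolean decisions discussed earlier: the webhook can deny and tell the user exactly what to fix.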

Justin Beyer 31:37 Okay. So essentially OPA has a huge range of uses: we can use it for things like authorization, or we can go almost all the way to the other side and do config validation, essentially enforcing baseline templates whenever you do any specific type of action.

Torin Sandall 31:54 That's right. And that's been the goal since the very beginning: to provide a unified, consistent way to do that across a range of software. So it's been super satisfying to see people actually get to that point today.

Justin Beyer 32:05 Changing gears a little bit here, I want to move over to the compliance and audit side. How does using OPA help with that? Does it reduce how much work we have to do when we're auditing an environment to verify that appropriate access control is being implemented?

Torin Sandall 32:23 A while ago I talked to someone from a particular company, and they told me about how they tried to embark on an exercise to determine whether a public request coming in from the internet could access credit card data in their system. This was a large-scale application consisting of many microservices, with different layers of services. They basically determined that it was going to be impossible to answer, because the authorization decision-making was coded in different places. There was no way they were going to be able to go through, in an automated manner or even a manual one, and audit every application or service to figure out whether public traffic could reach credit card data.

Torin Sandall 33:09 So the value for them of using something like OPA is that you no longer have to go into each and every application and look at each and every implementation of policy and authorization. If you're a security engineer or a compliance officer in a large organization, there's no way you're going to ramp up on all of the different programming languages, frameworks, and implementation patterns that these applications and services are implemented with. Just having one unified way of expressing policy across a wide range of services allows you to decouple the decision-making from the application business logic and have it represented in one way. That's tremendously valuable. If you talk to people from large organizations in highly regulated industries who do audits, they'll tell you horror stories about spending weeks and weeks inside meeting rooms trying to do these audits. It's very difficult because there are so many different ways of specifying who can do what. That's what we're trying to solve.

Justin Beyer 34:11 Exactly. Because everybody's implementation is a little bit different, and every language is a little bit different, the way I might implement authorization in one service isn't necessarily the way I would implement it in another. I might have 500 different patterns in one service, 400 in another, and 200 in this one over here. There's no way you're ever going to be able to trace down all of those paths individually to say that a request that hits this service, then this service, then this service will never be able to get here unless it has this attribute. Whereas with OPA, I can say, here's this general policy; does it meet these requirements? And does the application implement OPA correctly? If it does, then I can assume that the policy will be enforced correctly. Exactly. Makes sense. So going a little further into the compliance side, is there any benefit on the data governance side of the stack? When we're talking about config validation, is there a way for me to say my US customer data won't ever get stored in the Kubernetes cluster we have in Europe?

Torin Sandall 35:16 We do have users who are using OPA today to put access control in place over self-service provisioning platforms. A lot of organizations are trying to move toward self-service platforms for provisioning things like message brokers and databases along with applications. That's great because it increases velocity for developers and applications in the organization. But at the same time, you need to be careful: you don't want to accidentally have someone's data from the EU getting shipped off somewhere else. So putting governance or guardrails in place over self-service provisioning platforms is something we see people using OPA for today. We also see people using OPA to implement access control in different data lake projects. Kafka is one example; we've also got integrations with things like Ceph and MinIO to implement similar types of fine-grained authorization policies that say you're only allowed to access this bucket or these objects if you're connecting from the right geographic region, at a certain point in time, and so on.

Torin Sandall 36:21 So the authorization problem there is quite similar. There are also some more speculative integrations with OPA that go even further into the data authorization problem and control access to data at a very fine-grained level, like at the row level or the column level inside a database. You can do that today, but it's definitely more of a cutting-edge use case.

Justin Beyer 36:44 Okay. So with that cutting-edge use case, I could put in OPA and say my column of credit card numbers is inaccessible unless you have the appropriate role to access a full credit card number. I'm doing that because I need it for PCI compliance, and now I can say, with OPA, here's my policy that says only these people can access the full primary account number; therefore I'm PCI compliant because I'm limiting access to it.

Torin Sandall 37:12 Yeah. And it can go even further than that. Suppose you're a large financial firm and you've got customer service agents: you might not ever want them to see a Social Security number, for example. But you'd obviously want to allow people to see the last four digits of their own Social Security numbers. And then there are certain people in the organization, like somebody from risk, who still have to see the full Social Security number. So the decision about how many digits to mask out on a Social Security number is another example of a policy decision, and that's an actual use case some folks have for OPA today.

Justin Beyer 37:48 Okay. So instead of trying to implement that in my application, and then having six applications that access the same database and each mask it a little bit differently, I have one system that says: if they're a customer service agent, they get no unmasked digits; if they're in fraud, they get the last four; if they're a customer, they get the last four; and if they're in finance, they get the whole thing. Exactly. Yup. Awesome. So that gives me the ability to make that attestation and very cleanly provide one single sheet of paper to my auditor and say, here's who has access to this, because these are the roles they have and this is the policy we enforce. Exactly. Yeah. Awesome. And is there anything that I missed that you think a software engineer should know?
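The masking decision summarized above boils down to "how many trailing digits may this role see?" Here is a minimal Python sketch of that decision plus its application to a formatted number. The role-to-digits table follows the example in the conversation, but the role identifiers are hypothetical, unknown roles are masked completely, and a real policy would also verify that a customer is viewing their own record.

```python
# Role -> number of trailing digits left visible. Mapping follows the example in
# the conversation; the role identifiers themselves are hypothetical.
VISIBLE_DIGITS = {"customer_service": 0, "fraud": 4, "customer": 4, "finance": 9}


def mask_ssn(ssn: str, role: str) -> str:
    """Mask all but the policy-allowed trailing digits, keeping separators."""
    digits = [ch for ch in ssn if ch.isdigit()]
    visible = VISIBLE_DIGITS.get(role, 0)  # unknown roles see nothing
    cutoff = len(digits) - visible         # digits before this index get masked
    out, seen = [], 0
    for ch in ssn:
        if ch.isdigit():
            out.append(ch if seen >= cutoff else "*")
            seen += 1
        else:
            out.append(ch)
    return "".join(out)
```

For example, `mask_ssn("123-45-6789", "fraud")` yields `"***-**-6789"`. Keeping the table in one place is what makes the "one sheet of paper for the auditor" possible: every application asks the same policy instead of carrying its own copy.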

Torin Sandall 38:29 I think this was super fun. The project is growing quite a bit. We've got the KubeCon Europe event, the CNCF and Linux Foundation event, happening at the end of March, and lots of people are going to be there talking about Open Policy Agent. So if you're in Europe at that time, definitely check it out. We love contributions back to Open Policy Agent, so if you're using it and you have an idea, or you see a bug and you want to see it fixed, please come and engage with us. We also love new integrations, so if you have ideas for integrating the project with new pieces of software, please come and share them with us. One of the things we launched recently was an ecosystem, or integration, index on the website. If you go to the website now, you can see a nice list of all the different integrations that we know about, and you can submit a PR and your integration will get added to that page. So do check that out.

Justin Beyer 39:16 That's awesome. I'll definitely make sure to include that link in the show notes. Just one other thing before we finish up: I noticed a project that seems similar to Open Policy Agent, the Gatekeeper project. What is that?

Torin Sandall 39:30 Gatekeeper is part of Open Policy Agent. It's sort of an evolution of something that people have been using OPA for for a long time inside Kubernetes, which is admission control. What we've done is take OPA and provide first-class integration with Kubernetes. That project is being developed jointly by Styra, Microsoft, Google, and others, and the mission there is to provide a first-class, Kubernetes-native way of managing admission control policy. It gives you some nice features on top of OPA, like audit: the ability to audit your cluster against your OPA policies. It also introduces a way of parameterizing, or templatizing, your OPA policies, so that you can have packs, or predefined libraries, of policies that you can easily install in your cluster, configure, and be off and running. So that's a super cool project that represents the work of a lot of people.
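The two Gatekeeper ideas mentioned here, parameterized policy templates and auditing existing resources, can be illustrated with a small Python analogy. In Gatekeeper itself, templates are `ConstraintTemplate` resources containing Rego and constraints are Kubernetes objects that supply parameters; this sketch only mirrors that split. The required-labels rule and the constraint names are hypothetical examples.

```python
from typing import Callable, Dict, List

Check = Callable[[dict], List[str]]


def required_labels_template(labels: List[str]) -> Check:
    """A 'template': the logic is fixed, the parameters are supplied per constraint."""
    def check(obj: dict) -> List[str]:
        present = obj.get("metadata", {}).get("labels", {})
        return [f"missing required label: {label}" for label in labels if label not in present]
    return check


# Two 'constraints' instantiated from the same template with different parameters.
CONSTRAINTS: Dict[str, Check] = {
    "ns-must-have-owner": required_labels_template(["owner"]),
    "ns-must-have-cost-center": required_labels_template(["cost-center"]),
}


def audit(objects: List[dict]) -> Dict[str, List[str]]:
    """Check existing objects against every constraint, like a periodic cluster scan."""
    report: Dict[str, List[str]] = {}
    for obj in objects:
        name = obj.get("metadata", {}).get("name", "<unnamed>")
        for cname, check in CONSTRAINTS.items():
            for msg in check(obj):
                report.setdefault(name, []).append(f"{cname}: {msg}")
    return report
```

The point of the split is reuse: a library of templates can be installed once, and each team only writes small constraints with its own parameters. Audit runs the same checks against objects already in the cluster, catching anything created before the policy existed.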

Justin Beyer 40:25 That's awesome. And what's the future direction you see with OPA? Any major changes on the horizon?

Torin Sandall 40:33 The project itself is pretty stable. We're probably going to declare a 1.0 release pretty soon. We take things like backwards compatibility very seriously, and I think OPA is doing a pretty good job with its core objective of enabling people to do config validation and API authorization. But there are some new use cases that are more or less speculative, like the data filtering I mentioned a minute ago and data masking, that we're working on improved support for. One of the things we announced late last year was basic GA support for taking your OPA policies and compiling them into WebAssembly using OPA. That ability to target WebAssembly runtimes, whether they're in a CDN, a service proxy, a database, or your browser, is going to be super powerful and more and more prevalent, I think. So that's sort of on the cutting edge of OPA development right now.

Justin Beyer 41:25 Well, that'll be a fantastic thing to see coming out of OPA. Torin, I just wanted to thank you for coming on the show and discussing how we can leverage policy engines like Open Policy Agent for software security. This is Justin Beyer for Software Engineering Radio. Thank you for listening.

[End of Audio]

This transcript was automatically generated. To suggest improvements in the text, please contact [email protected].

SE Radio theme: “Broken Reality” by Kevin MacLeod (incompetech.com — Licensed under Creative Commons: By Attribution 3.0)
