SE Radio 499: Uma Chingunde on Building a PaaS

Uma Chingunde of Render compares building a PaaS with her previous experience running the Stripe Compute team. Host Jeremy Jung spoke with Chingunde about the role of a PaaS, building on public cloud providers, build vs buy, choosing features, user experience, managing databases, Series A vs later stage startups, and why internal infrastructure teams should run themselves like product teams.

This episode sponsored by Kolide.

Show Notes

Transcript

Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Jeremy Jung 00:01:10 This is Jeremy Jung for Software Engineering Radio. Today I’m joined by Uma Chingunde. She’s the VP of Engineering at Render, and she previously managed the team responsible for Compute at Stripe. Before that she was an engineer and manager at VMware. Uma, welcome to Software Engineering Radio.

Uma Chingunde 00:01:28 Thank you so much for having me.

Jeremy Jung 00:01:30 So today I thought we could talk about the experience of building platform as a service. And so, where I thought would be a good place to start is maybe defining what that actually means. What is a platform as a service and what problem is it trying to solve?

Uma Chingunde 00:01:46 I think the term itself has not existed for as long as people realize, and it has also been used in different contexts. So, to kind of frame it a little bit, I think I would talk about the ecosystem. You have software as a service, and the way I think of software as a service is when you’re actually just running software online without having to download something to your local system. So that’s what software as a service is. And then at the other end, you have infrastructure as a service, and that’s most of the cloud computing providers. So, for software as a service to exist, you actually first need infrastructure as a service to exist, because that’s what all SaaS companies usually run on top of. And then in the middle is this kind of other layer that has been built on top of infrastructure as a service, which is the platform as a service.

Uma Chingunde 00:02:41 So imagine you are a SaaS company. You end up either internally building your own platform, which you’re then providing as a service to all the other engineers at your company, or you are relying on a third-party platform. And that’s kind of where companies like Render come in, which is, you are providing a platform where you’re providing a certain amount of abstraction, essentially software development abstractions for, you know, building your code, writing your code, usually using open source components, building on top of GitHub or GitLab or similar, and then having some sort of standard components, such as an ability to deploy your code and run your code, again as a service. And something that provides all of those wrapped up is what I like to think of as platform as a service. So the additional thing that it’s providing that differentiates it from pure infrastructure as a service, in my view, is that infrastructure provides the nuts and bolts. So it provides things like the layer of compute, where you’re getting memory and compute or a virtual machine, or at the next layer, and this is kind of where maybe the boundaries get a little blurred, like, are you getting a cluster or are you getting a container? But at some level that’s still all infrastructure, and then the things on top of that, the next layer, is platform.

Jeremy Jung 00:04:10 You mentioned infrastructure as a service being provided by companies like Amazon and Google providing you virtual machines, or maybe providing you a way to run containers and platform as a service would be a layer of abstraction on top of that. So not working directly with those things.

Uma Chingunde 00:04:30 Yes, exactly. The way I think of it is that platform as a service is the tools to develop your SaaS software, but provided at a higher level of abstraction than pure compute and memory.

Jeremy Jung 00:04:44 Companies that are running the big infrastructure as a service products like Amazon, like Google, why don’t you think that developers use what they already provide? Like, what is it that they’re missing that has to be served by companies like yours?

Uma Chingunde 00:05:00 To kind of answer the question, I’d like to go back a little into the history of cloud computing, and I’m informed a little bit by the fact that I used to work at VMware. So VMware, they were not the first, but they were one of the major players in popularizing the concept of virtual machines. Before that, you only had physical servers or laptops or desktops, and everything was physical. They introduced this ability to slice up parts of your physical server and create essentially virtual machines, with the ability to define independent, isolated systems within one physical device. And that became virtual machines, and that kind of led to cloud computing, because now Amazon and Google and Microsoft could provide these virtual machines online. And so slowly the entire data center, which used to be physical hardware, became virtual and essentially got moved to the cloud.

Uma Chingunde 00:05:58 But in that, what happened was all the complexity got lifted and shifted. So, you know, the complex networks got lifted and shifted. Everything was just moved together to the cloud. When you today go to Google or Amazon or any of the cloud providers, in many ways it’s not that different an experience from buying a physical server and racking and stacking. There is some level of ease that has been introduced, because you truly are not actually going to a physical store and running cables, so there is that level of abstraction, but the concepts themselves are still essentially physical concepts, virtualized, with some basic level of simplification added. And now if you take that metaphor a little further, what developers, engineers, builders of products need is more than that. They need the dev environment. They need a lot of other things on top of just pure servers. If you could compress all of that into one product, that’s the layer that we are building.

Jeremy Jung 00:07:00 This layer that you’re building on top, are you building it on top of an existing cloud or are you running your own servers and how did you come to that decision?

Uma Chingunde 00:07:11 So currently we are building on multiple clouds. That is what we’re doing. The way we came to this decision is that the underlying cloud provider is sort of a commodity at this point. And things like Kubernetes give us enough of an abstraction that we can actually build on top of an existing cloud provider, and then also introduce our own physical data centers under the hood. We’ve kind of experimented with it, but we haven’t gone to having full production-level systems running yet. So that is part of the plan, but it isn’t there yet. These abstractions allow us to run on a particular cloud provider and then create a similar cluster on a different cloud provider, and then also move that same workload to bare metal eventually. As for how we came to the decision, this was before my time at the startup. I have been there a little over a year, but I kind of know the history, which is, I think the original thinking was that the core competency that we’re providing is this developer experience, is this platform. So the goal was to solve for that and then work down the stack, rather than trying to build everything from scratch. Why reinvent what has already been done at the lower layers? Try to build a differentiation at the higher level and then work at that.

Jeremy Jung 00:08:32 So it sounds like from what you were describing is you’re starting out with software that can run on basically any virtual machine on any server. And you’re running on top of public clouds with this sort of testing in the back where you’re trying to see, like, if we needed to run our own servers, could we move those workloads over to them? And so maybe you get started running on these public cloud providers and as you grow, then maybe you could shift to bare metal, either for cost savings or for other reasons.

Uma Chingunde 00:09:05 Exactly. That’s kind of where we are. There are many different reasons, and cost saving would probably be the less interesting one. It would be about providing options for our service in places where the cloud providers may not exist. Something that has become more interesting in the last few years has also been regulatory reasons, where a lot of countries are introducing regulations where they want companies wanting to serve their citizens to, you know, have a physical presence there. So there are many different reasons. And so we think that those would always be good reasons to explore.

Jeremy Jung 00:09:40 Do you have any concerns about these other cloud providers building what you’re providing? Like AWS goes in and goes like, oh, let’s see what Render’s doing and we’ll make our own version of that?

Uma Chingunde 00:09:52 I think for better or worse, that’s something that most SaaS companies have to deal with. Between the three major cloud providers, you could actually always ask this question, right? Like if you’re building on them, can they in turn build the same product? And I think that always exists. And I think saying that that’s not a possibility would be kind of naive, but that being said, they haven’t done it yet. And I think that’s kind of why startups have to exist. And you could say the same thing for many other companies. In fact, it used to actually be a relatively common question asked at Stripe, which is like, what if Amazon gets into payments, you know, will they take over our business? And so far they haven’t. And I think that’s where you have to be really clear about the direction and the differentiation that you are providing, which is where it kind of goes back to the origin, which is, we’re not immediately trying to go to bare metal. Our focus is developer experience and the developer platform, and that doesn’t yet exist. And the plan is to get really, really good at that and be the preferred place for all developers to be.

Jeremy Jung 00:11:00 And I suppose, it’s like you said, it doesn’t currently exist. So if they were to come onto the market in a few years, you would have, you know, X number of years’ head start as well.

Uma Chingunde 00:11:12 I think this goes back to differentiation. More than that head start, you want the stickiness, where users have workloads on us, have their stack up and working on us, have really grown to trust us and have grown to love our workflow enough that they would seriously consider it a point of friction to be forced to switch.

Jeremy Jung 00:11:32 So we’ve talked a little bit about how Render is a platform as a service to allow developers to run their apps and not have to worry necessarily about specific virtual machines, specific containers. And I wonder if you could talk a little bit about how you’re running those applications. You mentioned Kubernetes briefly earlier, but I wonder if you could elaborate a little bit more on what’s happening.

Uma Chingunde 00:11:56 I can’t go into many details, just because that’s a bit of the secret sauce. So I’d say at a high level, I can try to answer the question in as much detail as is okay but without revealing too much. I think in this case, Kubernetes is more of a tool. It allows abstractions for us. It allows us to abstract this layer between virtual machines and user workloads in a clean way, which allows things like ease of migration and spinning up additional clusters. That’s the primary thing, and that’s kind of why we use it. So I don’t want to index too heavily on that being the underlying mechanism. It’s a tool that serves a purpose, much like the way the underlying cloud provider is serving a purpose. At the really, really high level, what the underlying product is doing is building this abstraction.

Uma Chingunde 00:12:47 So you, as a user, don’t have to think of your compute and don’t have to think about where you want to run your service. You’re not thinking from a provisioning workflow. What we’re doing is creating an abstraction where you’re removed from the provisioning workflow and instead get to be with the developer workflow. And that’s really the gist of the overall platform. So, you’re thinking at the level of writing code in GitHub, and then, you know, it’s linked to your Render account. And so you create a PR, and then you use preview environments or similar, and then you deploy your code and it goes live. And the entire layer of the product is actually just that, which is managing this workflow. I guess that’s the level that it’s possible to describe it at without drawing an architecture diagram, but we’re essentially shepherding the user’s code using their workflow, instead of instructing: okay, now click on create the virtual machine, and now copy your code from your desktop, or your Git repo, to this place, and now run it, run the binary. We’re essentially packaging all of that into the developer workflow.

Jeremy Jung 00:13:55 Like, I guess in our initial email conversation, we talked a little bit about being able to talk about the parts that you used open source or which you built yourself and where you partnered with other providers. And I’m wondering like out of those different pieces, if you could talk to as an example, like, oh, these are the things that we use that are open source, and these are the things we decided we needed to build ourselves. I wonder if you could talk about a few of those things. Yeah.

Uma Chingunde 00:14:21 I think one example that I could talk about, because it’s somewhat recent and because it’s also a differentiation that we’re providing, is partnerships. So one thing that we did very recently is, we realized that enough of our users were worried about, you know, security attacks, or mostly DDoS attacks. And so it became an interesting question for us, which is, do we continue solving these as incidents, where this happens and we mitigate it live, which is actually possible to do, and which is what we were doing? Or do we use what cloud providers provide as a service, or do we use someone independent, or do we actually just build the capability ourselves? And I think this was an interesting exercise in a sort of build versus buy model for us.

Uma Chingunde 00:15:18 What we decided was that this was enough of a problem, or if we were successful, this would become enough of a problem, that it would make sense for us to become really good at it early. But it was also not the thing where we would necessarily be differentiating ourselves, because our core is the developer workflow and providing the best developer experience and being the best platform to run on. And there are companies that do this full time as their core business. And that’s kind of where we evaluated a few different vendors, including the cloud providers themselves, and then decided to pick Cloudflare as a vendor. And so all our user workloads, everything is behind Cloudflare, and that gives us this protection. And then there were some interesting discussions around pricing, which is like, oh, you know, we are paying for it.

Uma Chingunde 00:16:06 Do we pass that cost on to our users or do we actually offer it as a benefit? And then we decided that at least for now we will offer it as a benefit, so that it goes with the concept that we are a platform. And so you shouldn’t have to think about individual components of the platform, and this level of security and DDoS protection is part of the platform. Basically, this makes a superior platform, but as a developer, it’s not something you want to be thinking about. And so it’s baked into it directly. And I thought that was an interesting exercise because as part of that, we actually rewrote the way traffic is routed in Render. And we have a couple of really good blog posts on both pieces of this, which is using a vendor for DDoS protection, and then also the way we structured our anycast network, essentially how traffic comes in and then gets distributed across. And those were interesting architectural decisions that we made over the last year.

Jeremy Jung 00:17:05 So it sounds like in this example, when people deploy an application, there’s a lot of, I guess, bots and things like that, just trying to hit your application that have no interest in using it, but are just wasting your resources. And you made the decision that it’s important to have it, but there are other companies that either have more people dedicated to it, or it’s a problem they’ve been working on for a while. And so rather than you having your team build a solution for that, you decided, okay, we’ll let Cloudflare handle it for us.

Uma Chingunde 00:17:39 Yeah. That’s kind of exactly the decision that we made. And we actually had to make this a few different times. Another example is around metrics. There are many different platforms and vendors. Again, I think here we use a mix of open source and also a bespoke vendor. In this case, we use Datadog, but then also for Kubernetes, because we use that so heavily, we actually use Prometheus, because that’s a really well understood framework and it provides a good level of abstraction. But then we’re also constantly evaluating other options. So I think the benefit of open source is there are always so many different things that are evolving that, you know, we can actually pick and choose. And as long as we’re willing to pay the cost of migrating from one solution to another, we can always be a little ahead in what’s being provided.

Uma Chingunde 00:18:30 And then because we’re a platform, sometimes some of these decisions will also get driven by what our users want. Are more of our users asking for a certain type of integration? This comes up with third-party integrations a lot. So we have this concept of a Deploy to Render. We use this for, say, you’re an open source project and you want to tie in your ability to deploy that project to Render seamlessly. And so we will build that integration. And that’s where often the decision making goes, which is, which ones are popular in particular communities and which ones are getting traction? And sometimes it will also be determined by whether we ourselves are users of that open source project, since we ourselves are developers. And, you know, if something’s appealing to us, or if we are seeing a gap in a particular offering, that’s likely something our users in turn will also need. So that goes into a lot of these conversations.

Jeremy Jung 00:19:29 So in terms of deciding what to let open source software handle or software as a service handle, you mentioned the security, like denial of service. You mentioned logging and metrics and things with Datadog and Prometheus, but I’m wondering what are some things that you looked at and you decided these things are our core competency, and we really do need to build these ourselves?

Uma Chingunde 00:19:53 That’s a good question. I think we decided on, actually, anything that deals with the look and feel of the website, so anything that is the dashboard itself. So when you sign in to the product, anything that flows from that experience we own and build, because that’s kind of where you are. You’re using the product, and you would feel any sort of interruption in that experience. For a relatively small startup, you know, we’re quite design-centric that way, so we work with designers, we work with UX engineers. That is, I think, the difference, because particularly in dev tools, or generally in tools as a space, there may not be the same polish and the same kind of engineering effort being spent as you see in consumer apps. It has been a very conscious decision to do that internally.

Uma Chingunde 00:20:46 So anything that touches the product’s look and feel or the developer experience itself, we’re really conscious of working on ourselves. And then even in the internals, for anything that’s part of the developer workflow, even if we are using open source components like Kubernetes, going back to that, we try our best so that that abstraction doesn’t leak. You might know that that’s what we’re using under the hood, because you’re listening to this conversation. But if you’re actually using the product, you’re not deploying while thinking about Kubernetes. You’re just thinking about deploying your code, and having that separation is important.

Jeremy Jung 00:21:24 The part that’s actually running the applications may be based on open source software. Like you mentioned Kubernetes, but all of the, I’m not sure how you would describe it, but you mentioned developer experience. So maybe the part that the user sees when, like you said, they go to the website or they push their code and then the part that’s maybe taking that code and running the workload, that’s all stuff that you wrote internally. And is, I guess you could say secret sauce of the company?

Uma Chingunde 00:21:53 Yeah. The part from the integration with Git to the developer workflow, setting up the integration. And then preview environments is another good one, where you can have a PR and have it reviewed separately. And that’s, I think, one of our actual differentiating features. So things like that, that are core to that experience, those are the ones that we invest in. And I think maybe another thing to think about is, we are kind of experimenting with, and also providing, features. Managed databases is a good example of where this boundary becomes harder. So we provide a managed Postgres as a product feature. And then we are also working on managed Redis. I think managed databases is a very interesting one that we are very careful about, because most stateful apps need a database and want a database, but don’t want to have to manage the database. But then are we now getting into managing DBs as a product? So that’s where we’re judiciously picking a couple of the most common ones that people need and want. And then that’s where the constant user conversations and the evolution of the roadmap come into play.

Jeremy Jung 00:23:02 So, you mentioned the managing of databases. And I wonder, from the perspective of a company that’s running a SaaS that is managing user databases, is that the sort of thing where you have to have a bunch of DBAs on staff, people who, you know, typically know how to monitor the database and tune it and things like that, and they’re just watching all of your customers? Or what does that actually look like from your end?

Uma Chingunde 00:23:30 I think we’re lucky, again, to be in a state where a lot of that has thankfully been automated, but it 100% is one of those things where you start going into more specialization. It does require people to have a deeper understanding of the underlying technology, versus just pulling components together. So yes, absolutely. What we have had to do there has been to build the tooling to monitor the databases, manage them, upgrade them. That’s a common thing. So it takes us away from not having to worry about user state. You’re always worrying about user state, but more at the metadata level. And this takes us to worrying about it absolutely at the data level, and that introduces complexity and a need for managing state at a different level.

Jeremy Jung 00:24:21 When you worked at Stripe, you were managing compute. So I imagine that it’s sort of similar to running a platform as a service, except that it’s internal to a company. And I wonder if you could speak to how that compares to running an actual public platform as a service.

Uma Chingunde 00:24:42 Yeah, I like this question because it’s also one of the ways that I actually describe Render often to people. If I’m talking to former colleagues from Stripe, or just people that are familiar with it because they have worked at other large SaaS companies, it is that we’re building Render for the broader public. So the set of constraints is very different, for one, and they both have pros and cons. With an internal platform, you have a captive market, right? You have a captive audience who, while captive, are also highly opinionated and are not afraid of making their opinions known. And then also depending on the size, and I was there from around 800 employees to a few thousand, what you’re running just becomes more and more critical. So the criticality of what you’re running just becomes so huge. You go from running production-level but moderately critical workloads,

Uma Chingunde 00:25:40 where an incident, while terrible, is being felt actively by maybe a hundred users, and then over time that scales. So with an internal platform, everything is much more homogeneous, but it feels higher stakes, especially as the company grows, because, you know, you are in charge of it. So those are both the pros and the cons of that captivity. You’re running this internally, you have a dedicated security team that you’re working with, you have all of these kinds of resources, but then the stakes and consequences are really higher. On the other side, when you’re building for the general public, it’s just really interesting because it’s so much more heterogeneous. People are doing really, really interesting things on your platform and are asking for really interesting use cases and are, you know, seeing interesting failure modes.

Uma Chingunde 00:26:29 So it’s a completely different thing. The joy of that is you have a lot more room to experiment and try, and you’re getting an entirely different feedback loop. But they’re also not captive. So, you know, they’re there but can also leave. And there isn’t this kind of clear, direct path, a roadmap for instance. No one is giving us this roadmap from above and saying, this is your roadmap. That’s what the structure otherwise is. When you are building an internal platform, it’s very, very clear: this is the company’s goal, these are the company’s products that are the most important, and this is what you’re going to do there. You will get them there and that’s it. And so what that allows you is more speed, but at the risk of building things that are less polished, because speed is the biggest thing, because the underlying infrastructure team cannot be the bottleneck for the product company.

Uma Chingunde 00:27:24 When you’re building for the public, your constraints are that you can’t just give something to people to try unless it’s actually completely ready. It needs to be a fully finished product and needs to be supported, otherwise you’ll start having incidents. But the use cases are so many more that you can actually do it in a much more incremental way. We can have the luxury of experimenting with things like a free tier, which is something that just doesn’t make sense for an internal platform, since it is kind of literally free. So there is this tighter loop with your users that you have as a public platform, whereas as an internal platform, you have a really different set of incentives and constraints. But I do think that there’s a lot that you can borrow and replicate in both directions.

Uma Chingunde 00:28:07 One thing I’ve leaned on and tried to become better at is this kind of listening to users and keeping that feedback loop much quicker, a skill which I can actually see would have been really good to have even at a larger company. And then I think there’s a certain level of rigor, an eye for detail, that internal platform teams have, because often the critical nature of what they’re running means that everything has to be way more detailed, and that’s something I’m trying to bring to our smaller team. My pitch is really, you’re getting that grade of platform. So if you are a developer starting out, but you don’t have access to that internal infra team, we are trying to be that internal infra team for you.

Jeremy Jung 00:28:52 Yeah. That’s interesting that you mentioned how, when you’re doing internal infrastructure, the stakes are very high and I can understand that in the case of Stripe, right? If people can’t make their payments, then they’re going to be upset. But I wonder, like you were mentioning how on the public side, wouldn’t it seem like the stakes would be just as high to your customers? So I’m kind of wondering how you reconcile that.

Uma Chingunde 00:29:15 I think the difference here is our stage, as a Series A company. The hope is that our stakes are as high really quickly as well. Right now though, for us, it’s kind of like the not-all-our-eggs-in-one-basket sort of thing, where, for instance, we already work with multiple cloud providers. So by nature of targeting somewhat different businesses, we are operating slightly differently, where the economics of that did not make sense, or will typically not make sense, for a larger company. You’ll find very few larger companies working with multiple cloud providers. They usually pick one and go deep on them. So there are things like that that end up getting built in for us that give us some built-in resilience. And then I think while the stakes are high across the board, we have so many different users that that gives us a different level of resilience. But the underlying point that you make is absolutely true, which is that the stakes are high. It’s just more a function of time and stage, rather.

Jeremy Jung 00:30:22 If I understand correctly, when you are working for a company like Stripe and as it gets larger and gets more funding, more employees, inevitably more people rely on it and your reliability needs to go up. And of course the end goal would be the same for something like Render, but it’s very early days and that’s always going to be a gradual process.

Uma Chingunde 00:30:45 Yes, a hundred percent. When you are the payments company, and you are currently serving users that are public companies, that’s just a different level of stakes than when you are a startup and your primary users are at a different stage.

Jeremy Jung 00:31:04 The other comment I thought was interesting was you mentioned how the constraints when doing internal compute might make it, I don’t know if you specifically said that you might have to build things slower. Was that right? And I was wondering if that’s, because you’re also responsible for more things because you have more internal knowledge of the different applications that are running?

Uma Chingunde 00:31:27 I think when I said that, to kind of clarify a little more, what can end up happening at a larger company is you can actually go quite fast, but you don’t often have the luxury of finishing things or productizing internal infrastructure. So there’s often this journey where internal infrastructure teams sort of run as service teams. They are providing services for the rest of the company, but they aren’t quite able to break through to that next layer and also act as fully functioning product teams. So I guess the difference is that you’re able to deliver 80% of what your users need faster, but then you never get that last 20%, ever. Then you’re kind of perpetually, you know, dealing with the leftover of that last 20%.

Uma Chingunde 00:32:19 So that can actually be a frustrating thing for internal infrastructure teams, versus you can’t do that as a product company because you always have to provide your users with a very polished product experience. Otherwise they just won’t use your services. At larger companies, they don’t have a choice, but then you’re often just working with constraints, such as, you know, team capacity and team priorities, that will be slightly different. So I don’t think it’s more like you go faster or slower. Maybe that’s the wrong characterization; it’s kind of like, what’s the level of finish that you need to provide in both. And I actually do honestly think that most internal infrastructure teams would better serve their users if they were run more as if they were external products, but that unfortunately doesn’t tend to happen, for many different reasons.

Jeremy Jung 00:33:08 Yeah. That makes a lot of sense because if I understand correctly, when you’re building for an internal organization, you could have, you know, an offering that works, providing real business value, and people are hosting their applications on it, but there’s little developer experience issues, or maybe there’s occasional reliability problems. And people have to go in and deal with that, either in your team or from the application team. But maybe it can be hard to get the people assigned or the resources assigned to go like, Hey, let’s solve this once and for all, because it’s annoying, but it’s not stopping the business.

Uma Chingunde 00:33:43 That’s a hundred percent exactly that thing. So an ongoing thing at large companies is migrations. So there’ll be the business-critical migrations that will happen, but then there will be the less critical ones, where any large team will just have a pending backlog of like, oh yeah, we want to migrate to this new framework, this new, you know, this metric tool, this better team. But they would just never have the time or bandwidth to do it.

Jeremy Jung 00:34:08 And with the case of something like Render that’s to the public, if you release a feature, an offering and it has like kind of shaky developer experience, or it works 90 something percent of the time, then customers are just going to go, like, I can’t use this. They’re not going to deal with it like an internal company might.

Uma Chingunde 00:34:27 Exactly. That’s exactly the kind of constraints and incentives.

Jeremy Jung 00:34:31 I wonder also from the perspective of monitoring your platform as a service or your internal teams at Stripe, is that different, monitoring internal applications versus monitoring workloads that are coming from, you know, who knows where, where you have no visibility into their source and things like that?

Uma Chingunde 00:34:51 I think for the most part, it looks similar, but then there’s similar vectors to what we talked about earlier already, right? We have to actively monitor for people violating our terms of service, or using our platform for fraud or abuse, or using our platform to be the source of phishing or DDoS attacks on other people. You don’t have that problem with an internal platform team because that’s just not going to be a problem. So I think there’s a much bigger vector of misuse of an external platform that you have to monitor for and put in safeguards against than you do with an internal platform. So there’s kind of a walled garden versus the general bazaar sort of problems that you have.

Jeremy Jung 00:35:34 How are some ways you deal with the unknown aspect of who’s coming to use your service, whether it’s for malicious purposes or someone’s trying to just tie up your resources and not be like a regular customer, that sort of thing?

Uma Chingunde 00:35:51 I think that’s where, basically, all of this is monitoring, and it’s solved with all the tools at our disposal. So we have the basic monitoring, like monitoring of all the critical components, monitoring of all the resources, monitoring user signups, and to the extent possible everything is automated. And then the other angle is there’s an ongoing effort, which is truly never ending, which is fraud and abuse monitoring. So that’s, again, automatable, and actually this is also a problem for companies like Stripe, just in a different space and depth. People are trying to use the platform for abuse and fraud. So it’s actually kind of interesting where the same sort of tools actually get used; like, Stripe isn’t manually verifying credit card abuse.

Uma Chingunde 00:36:40 It’s similar: you programmatically monitor for people signing up for fraudulent reasons, or with stolen cards, or using the platform for phishing attacks and stuff like that. So it’s always a mix of automating the monitoring, automating the action that you take from the monitoring, and then always having a fallback, because there is also sometimes a manual element for a lot of these things. So the CEO of Render was actually the head of Risk at Stripe. So he is very familiar with fraud and abuse and handling it. And so he’ll often take the front seat in these discussions because he has done this before, and so it’s kind of interesting how much of that translates, and also how many of the same tools we can use to detect fraud.

Jeremy Jung 00:37:27 Another thing I thought we could talk about is when you’re building a platform as a service or you’re building an internal compute team, what type of expertise are you looking for? And is that different than somebody who’s building a software as a service, for example?

Uma Chingunde 00:37:45 I think broadly, I don’t think they’re that different. I think in tech especially, the landscape changes so quickly that what you really want is people that are able to be flexible and learn new things quickly. As an example, most of the stuff that I’d learned isn’t a relevant skill anymore. So the kind of code that I originally learned programming with just isn’t as useful anymore. There are some places that use C++, but that isn’t mainstream. I mean, it’s still a very widely used language, but not at startups. So I think in general, you just want people that are really good developers, have a lot of curiosity, and have a lot of willingness and desire to learn, which usually goes with curiosity and humbleness. So, you know, not assuming that they have all the answers, or not coming in with the mindset that, Hey, I am an X developer with this much experience, and I know how to solve this problem, but rather coming in with, yes, I have these skills, and how do they translate here?

Uma Chingunde 00:38:48 I would just say that that’s kind of the unifying characteristic for good engineers. And then depending on the specific problems that the team or the business is trying to solve at a given point in time, that’s when you kind of want to delve into more specialized skill sets. So typically the skills that we tend to want to hire at Render are not that different from what I would have hired for on my old team at Stripe. I think the difference is a little bit more on the adjacent side. But I also actually think that we could have used some of those skills on my old team, and one example is design. So, having dedicated designers, which we did not have on my old team; we consulted with Stripe’s design team, but we didn’t have an embedded designer or UX engineer.

Uma Chingunde 00:39:35 So, people actually thinking deeply about the user experience and the workflow. We did not have that, but we actually had a few people who were very talented at that without the training, who were just full-stack engineers. And then another thing, if I were to go back in time, is a dedicated support team. We have that at Render because, you know, that’s kind of where the difference comes in of being an internal versus a public platform. So, at Stripe, it was actually the engineers on the team that would act as support on rotation, basically. And at Render, we also have that rotation where actually everyone participates in support, but there’s a steady team and then a rotation, both. I think the key difference is you can go deep on specialized skill sets, typically user-facing skill sets, on a public platform, which you don’t do on an internal platform. But actually, having seen both, I think that some of these deeper expertise areas could be taken back to internal platform teams, and they could actually benefit from those.

Jeremy Jung 00:40:34 I mean, when you think of internal teams at any company, they sound like they should be different. But you kind of are saying, you really should treat it more like a product, more like something you are shipping to customers, even if it’s internal.

Uma Chingunde 00:40:48 I think we’d have happier users if you did that.

Jeremy Jung 00:40:50 So I wonder too, when you first started at Stripe, how large was the Compute team?

Uma Chingunde 00:40:57 It was pretty small. Actually, if I remember correctly, it was just around 14 people. So, we were just starting to split the team. I kind of came in and inherited one half of the team, one half of Compute, which we called Cloud, which was the layer that worked with the cloud providers, and the other half was called Deploy and Orchestration, so it managed the deploy workflow and the orchestration layer. So, we kind of split it into six and eight people between those two teams, and I started with that. And then I think by the time I left, it was, you know, four teams and a little over 40 people.

Jeremy Jung 00:41:29 And looking at how things were managed when you first started versus when you finished, as well as how things look at Render, I wonder how you approach the process of running a Compute team or running an infrastructure team as it grows.

Uma Chingunde 00:41:44 I think a few things I’ve learned come from having gotten to see things at the larger scale. I have somewhat of a foreshadowing of where we are going to be hitting scale limits or reliability limits, or even on the people side, this kind of experience of when to start splitting the teams. What makes a good size team versus what kind of person? So there’s a bunch of things that I have leaned on from my previous experience, like incident management, thinking about reliability, thinking about incidents and learning from incidents, and actually being proactive about those. These are things that typically take larger companies a while; there’s almost a certain point in their life when they start learning about incidents. I like to think that maybe because of my experience of seeing it at a larger scale, I have learned to start sooner than I absolutely needed to. And I think what also benefits us is an element of, you know, just ecosystem experience, that kind of context on vendors and who our users care about, that comes with having done it at a slightly different scale.

Jeremy Jung 00:42:58 You mentioned how, when the company is large, you built out this formal process for incident management and things like that. I wonder if there’s anything else you can think of that is typically in place at a large organization that you think would really benefit a small one.

Uma Chingunde 00:43:16 I think observability is another one because it goes hand-in-hand with reliability and incidents. That’s where I think most SAAS companies typically wait longer and don’t build out robust observability. And I wouldn’t say that we are there yet either. I think we are still getting there. There is this kind of intangible of just being really, really good operationally that companies learn as they grow. A lot of it is stuff around incidents, reliability, becoming much better at observability, rigor about stuff like this. There’s an element of rigor around quality that typically comes in at larger companies, but there I actually was very pleasantly surprised that Render was already ahead of where I expected it to be. But just generalizing, I think that’s typically not something that smaller companies will invest in. Security is another one that companies typically wait a little longer to invest in, and I think smaller companies would benefit from getting that expertise in early, especially if you’re, you know, in a more platform or enterprise product space.

Jeremy Jung 00:44:24 When you talk about quality within the context of software, are you talking about code quality or defects or, you know, what are you referring to when you mentioned that?

Uma Chingunde 00:44:35 All of them. Like, starting with code quality, right? So when I say I was pleasantly surprised, I was pleasantly surprised to find that code at Render gets reviewed. There is good rigor around code reviews and feedback and thinking about code before pushing it. That’s not just for quality, but also for learning and collaboration, which I think is just so powerful. So that part was a good thing. And then there’s the defects that come from pushing it. And then at the other end of the defect spectrum is the incident side; basically, incidents are defects that are so critical that they cause an incident. So, it’s actually a spectrum between the writing of the code and how you’re dealing with incidents, and operationalizing that entire pipeline.

Jeremy Jung 00:45:17 When you talk about improving quality, a lot of times that’s related to making sure things work, whether they’re tested, things like that. In the case of a platform as a service like Render, your platform is running the software of other people, whose software you don’t control, right? And I wonder, as a part of your testing process, how do you account for that? Are you running random applications against Render, things like that?

Uma Chingunde 00:45:45 I think we don’t typically have to do that just because, you know, there is enough of an abstraction between what our users are doing and what we’re doing that we don’t have to worry about that. What does happen, though, is there will be an interesting series of support questions that will often come in where users are kind of struggling to deploy something. And it will not always be clear whether the problem is in their application, or a library that they’re using, or actually in Render. And that gets tricky. And interestingly, that’s not unique to public platforms. My old team at Stripe had this all the time as well, where, you know, people would come to the Compute team and ask for help debugging because they had literally gone through the entire stack. And often they would try to debug and then we were the last layer

Uma Chingunde 00:46:30 and we would often end up helping them debug their application problems, rather than it being an infrastructure problem. So, I would say it’s not actually something that we have to test as much, but it’s something that we definitely have to be prepared to answer questions about. And then there’s always this interesting kind of question: we might be able to help them, but also, what is our level of obligation? So we generally try to be good support and do try to help them. But at some point we also have to tell them, like, Hey, look, actually, this is a problem with your application, and you might be able to fix it.

Jeremy Jung 00:47:05 It’s a reminder that you aren’t a consulting service. You’re a platform to host your application, you know?

Uma Chingunde 00:47:11 Versus as an internal platform, you often wonder, “Can I actually say no?” Usually, people don’t feel comfortable saying no, because in the end, you know, you are one larger team, and that’s why sentiments are a little mixed.

Jeremy Jung 00:47:25 Let’s say you’re fielding a support ticket for your internal team, and someone’s saying, I deployed this app and it’s not working. Would your support team actually have to go in and look at the user’s code and things like that?

Uma Chingunde 00:47:38 You mean for the internal team, right? Yes. And that was very often the case. And this was a mix of, you know, one is because you’re part of the same larger team, you kind of have this obligation to help your coworkers. And then the second problem is that because you haven’t had the luxury of building those strong interfaces from the get-go, it’s actually hard for your users to know where the problem lies. With a public platform, you have built strong enough abstractions that you can quickly debug and tell your users, like, Hey, no, actually it’s there, and this is exactly why we think it is. With an internal team, often abstractions are leaky and it might not be easily obvious. And that’s kind of what I was alluding to, the fact that internal platform teams would likely be better off if they had those stronger abstractions and those stronger boundaries.

Jeremy Jung 00:48:29 Could you give an example of where those boundaries leak in an internal application?

Uma Chingunde 00:48:35 One example, which was quite painful for my old team: we were using this service mesh library called Envoy. My team had done the migration and rolled it out so that all internal service-to-service communication went through Envoy, because Envoy provided stronger security guarantees and more observability. But when it was first rolled out, well, migrations are always a bit tough, and it was still new. So there were problems with the migration itself, but then it also created this narrative where a service would fall over, and people would quickly look at the logs, see an Envoy log message very far down in the stack, and be like, Hey, we have an Envoy problem. And my team would then have to debug it. And this is that same thing where the abstraction leaked, because there wasn’t a strong enough abstraction. But then there was also this problem of guilt by association, where we ended up debugging things that hit this problem. And I think this is just a very common problem for internal infrastructure teams, where they end up debugging things across the stack.

Jeremy Jung 00:49:49 Yeah. That’s really interesting because it’s a little counterintuitive where you would think like, oh, we both know about this thing. So, you know, it allows us to work better together, but in the case of Render or any other platform as a service, the user will never see the Envoy error. They’ll never see, all these things that are happening in the background. So they can’t go to you and say like, well, clearly it’s your problem. Right?

Uma Chingunde 00:50:14 And you also aren’t sitting one desk over, where someone can just tap you on the shoulder, and where three levels up you have the same manager.

Jeremy Jung 00:50:23 Totally. Yeah. So it’s a culture thing there too.

Uma Chingunde 00:50:27 Yeah, absolutely.

Jeremy Jung 00:50:28 Well, I think that’s basically everything I had, but is there anything else you wanted to mention or that we should have talked about?

Uma Chingunde 00:50:35 One, kind of, hypothesis that I’d like to offer, because we talked about incidents and we talked about Compute: maybe there’s going to be this expansion of products that are essentially going to be replacements of things that internal platform teams have built over the years. So I’ve tweeted about this a bit in the past, but I think it’s my current pet theory about how the platform as a service space is going to expand in this current evolution, where all the developers that work at large SAAS companies have gotten used to a certain set of tools that they will now either build themselves or, you know, want to see built, and that’s where the ecosystem will head next. So that’s one hypothesis I would like to put out in the world.

Jeremy Jung 00:51:24 Are you picturing something where, you know, maybe five years from now or something somebody would go to Render and they say, I want to build an application and Render will have like, here’s the way that you log in your application, and here’s the dashboard; you plug in some maybe configuration and we’ll set it up for you. You’ve already picked these specific products, I guess, or ways of doing the things that nearly every application is already doing.

Uma Chingunde 00:51:52 Yes. I think for Render’s case, that would kind of be a bit of the next step. I think there’s also this element of, we kind of see this next layer of basically platform as a service, or almost services as a service. So an example would be, we’ll see more managed database companies come up. Like, we are already in the space, but that’s not our core competency, but we see more and more managed DBs. People will push more and more stuff down. Each large SAAS company has a whole plethora of internal tools that they use, and each of those is almost its own product, for instance. And we will see more of them coming up and, you know, existing, where there will be a way to stitch together different tools and provide them, like Zapier does or Retool is trying to provide, or, to a lesser degree, things like software compliance, which is now becoming a product. So compliance is becoming its own product, right? Or you’re seeing companies exist that are providing incident tooling specifically. So you have Jeli, they’re doing learning from incidents. Or you have incident.io, they’re providing incident management. So all of those will kind of become standalone products in themselves. So, you know, as a founder, you could pull out your credit card and sign up for Render plus these two other tools, and, you know, things that you would have done with engineering effort will all be done with your credit card.

Jeremy Jung 00:53:24 Well, I hope we get there because I think there is so much, I suppose you could say, brain energy being used. Every time somebody creates a new application, they have to decide, okay, what are all the services I’m going to use? And what am I going to do myself? And if somebody could just hand you, Hey, use these things, we’ve configured them for you, and you know, you’re all set, that could save so much time.

Uma Chingunde 00:53:48 Yeah. I think that is a hundred percent something like this, kind of like a startup kit for SAAS companies. I’ve seen a few of those actually floating around already, but I think it’ll become more kind of canonical.

Jeremy Jung 00:54:54 To wrap up. Where can people find you? Where can they find Render and anything like that? Go for it.

Uma Chingunde 00:55:01 Render.com, check us out there, or reach out to me on Twitter. I’m on Twitter. You can just follow me or reach out via DMs also on LinkedIn, if you’re more old school.

Jeremy Jung 00:55:12 Cool. Well Uma, thank you so much for joining me on Software Engineering Radio.

Uma Chingunde 00:55:16 Thank you so much for having me. This was a great conversation.

Jeremy Jung 00:55:19 This has been Jeremy Jung for Software Engineering Radio. Thanks for listening. [End of Audio]

SE Radio theme: “Broken Reality” by Kevin MacLeod (incompetech.com — Licensed under Creative Commons: By Attribution 3.0)
