SE Radio 586: Nikhil Shetty on Virtual Private Cloud

Nikhil Shetty, an expert in networking and distributed systems, speaks with SE Radio’s Kanchan Shringi about virtual private cloud (VPC) and related technologies. They explore how VPC relates to public cloud, private cloud, and virtual private networks (VPNs). The discussion delves into why VPC is fundamental to building on the cloud, as well as configuring a VPC, subnets, and the address space that can be assigned to the VPC. During this episode they look into route tables, network address translation, as well as security groups, network access control lists, and DNS. Finally, Nikhil helps compare VPC offerings from Amazon Web Services (AWS) and Oracle Cloud Infrastructure (OCI).

This episode is sponsored by ClickSend.

Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Kanchan Shringi 00:00:48 Hi all. Welcome to this episode of Software Engineering Radio. Our guest today is Nikhil Shetty. Nikhil is an expert in networking and distributed systems. He has worked at Juniper Networks, Cisco Systems, and Oracle Cloud Infrastructure. For Oracle Cloud Infrastructure, Nikhil has helped design and develop the monitoring and automation platforms that manage OCI’s global network. He’s currently helping develop services for OCI’s AI supercluster networks. His interests include network observability, data pipelines, and control planes. I’d like to point out that Nikhil and I both work for Oracle. Nikhil was introduced to me and he came highly recommended by someone in my network when I was looking for a guest to speak about this topic of VPC. Nikhil, welcome to the show. It’s great to have you here. Is there anything else you’d like to add to your bio before we get started?

Nikhil Shetty 00:01:43 Thanks for having me here, Kanchan. This is a great opportunity for me and I would really like to thank you for inviting me onto this podcast. Nothing else to add. You’ve given a great introduction. Thank you.

Kanchan Shringi 00:01:57 Great. So let’s just start with describing the big picture for some time. And the very first question would be, what is a virtual Private Cloud? And then we will go on to discuss why is it fundamental to building on the cloud? What is the underlying technology and some issues and monitoring aspects. But can you describe what is a virtual Private Cloud?

Nikhil Shetty 00:02:24 Yeah, so I think before we start here, one of the things in networking and other fields is that experts tend to use acronyms, right? So you’ll use terms like VPC, VPN and things like that. So you’ll hear a lot of acronyms. What you want to do, and this has been my experience over all the years, is dig deeper into that acronym and see what each of those terms stands for. So in this particular case, VPC stands for Virtual Private Cloud, as you clearly mentioned. If you dig into it, the first term would be virtual. So obviously it’s virtual rather than physical, right? So that itself kind of gives you some hint about what this is. The next term is private, right? So private is, it’s not public, right?

Nikhil Shetty 00:03:10 It’s the opposite of public. So it’s something that kind of gives you another hint about what this thing is. And finally, cloud, right? Cloud is something that’s running not on your laptop or desktop, but it’s running somewhere else and you’re connecting to it over the network, right? So now if you put all of these together, a Virtual Private Cloud would be a cloud that is not physically yours, right? So it’s virtually yours. It’s private, which means it’s not public, which means others cannot see your traffic. There might be other customers, but they cannot actually access the traffic that you’re sending on this particular cloud. And then of course it’s a cloud. So it’s not on your laptop or desktop, it’s sitting somewhere connected to the network basically, right? So you have a bunch of software and services, and you have a network, most likely the internet, over which you’re going to access these services.

Nikhil Shetty 00:04:04 So essentially that’s what a Virtual Private Cloud would be. Where it becomes interesting is what is the relationship with a public cloud, right? So what’s a public cloud? So the definition of a public cloud would be one of the big hyperscalers, like AWS, GCP, OCI, things like that. These are all public clouds. The reason they’re public is because they’re publicly accessible. All the services and software are publicly accessible. Some of the services may also be accessible over the internet, right? But when you have a Virtual Private Cloud within a public cloud, what it means is you get your own chunk of that public cloud in which you can run your own software and services. It’ll be virtual, obviously, because it’s not physically yours. It’ll be private, so others cannot view it. It’s isolated from other customers. And of course it’s running in the cloud. So that’s what I would call a VPC.

Kanchan Shringi 00:04:59 However, there’s another term called Private Cloud, which I believe stands for something quite different. Do you want to clarify?

Nikhil Shetty 00:05:08 Yeah. So Private Cloud in general, it kind of refers to your on-premise networks. So traditionally, all your software and services, they have been delivered through private data centers. Like so essentially this is like physically private data centers that you own. You own all the servers, you’d own the networking, you’d own all the storage, right? And that would be your Private Cloud, right? So that is when you say cloud, usually expect things like–hey, I get on demand compute, I get storage on demand, things like that. So those things you could replicate in your own on-premise data center, right? You could have, like for example, VMware that manages your servers. You get VMs on the fly. You could have some kind of maybe a NetApp storage grid that kind of gives you object storage kind of services. So you could do all the services that are running in the cloud in your own private data center.

Nikhil Shetty 00:06:06 So that becomes your Private Cloud. These things quickly become complicated, however, because sometimes you don’t want to run everything yourself. Maybe you have a Private Cloud, maybe you have a dedicated infrastructure running on premises, but you want to connect to some services that you’re actually running on the public cloud, right? So you have a VPC and then you have a dedicated data center. How do you connect them together? Right? At that point it becomes like a hybrid cloud. So that’s another term that you might hear in the industry. Hybrid cloud. So that means some of your applications are on premises, some of them are running in a VPC in a public cloud, and you’re kind of connecting them together. The other term that we might hear is multi-cloud, right? So what is multi-cloud? So when as a company you want to put some of your applications and services in the cloud, you may decide–hey, you know what, I don’t want to be attached to one public cloud.

Nikhil Shetty 00:06:58 I don’t want to be attached to one vendor. So it’s kind of like a multi-vendor strategy where I want to now put my services and applications across multiple clouds. So I could put something in AWS, put something in Azure, put something in GCP, and then I kind of now want these applications maybe to even talk to each other, right? So then it becomes a multi-cloud kind of infrastructure. So that’s the term you’ll keep hearing as well, multi-cloud. And looking further, I think these boundaries are getting even more blurred, right? There are things like AWS Outposts or Azure Stack, right? So you can get a server rack that’s running on your premises and running AWS services or Azure services right there, right? And it depends on what services you are interested in. If you’re interested in only storing the data locally, those things could be run in that particular server rack and then that server rack could reach back out to the public cloud for all the other services that you have, right?

Nikhil Shetty 00:07:58 Maybe you have a VPC there. So you have other applications installed there, and this outpost would kind of reach out for those services there. Recently, there is also a new product that was introduced by OCI, which is called dedicated cloud. So again, this is dedicated infrastructure, but it is managed by OCI, so OCI would actually install all their cloud services in your dedicated data center. So it’s all dedicated for you, but it has all the services that the cloud offers. And at the same time, you benefit from having the data locally for whatever reason. It could be latency, it could be security, right? So all of these terms are getting a little bit blurred and complicated, but at a high level let’s just make sure that we understand what a VPC is. A VPC is running in a public cloud. And then you have all of these other combinations where I have some on-premises stuff, which is a Private Cloud. Maybe I have a mix of both, or maybe I have multiple of these VPCs in different clouds. So all of those have different terms that you refer to them with.

Kanchan Shringi 00:09:03 You’ve talked in some detail about the VPC, on premises, and the different combinations there. How do you connect between the on-premises cloud and the public cloud? Is that the VPN or virtual private network? Is that right?

Nikhil Shetty 00:09:20 That is correct, yes. So if you search for VPN online, I’m sure you’ll find a bunch of free services and even paid services, which let you join a VPN, right? And from your laptop, you could join a VPN and all your data traffic, all your internet traffic, all your browsing goes over that encrypted connection. It’s basically like a tunnel between your laptop and the VPN server, right? And then it’s all tunneled, so all your data is hidden. So let’s say you’re on Wi-Fi in a public cafe, like Starbucks Wi-Fi or something. You don’t want any of your data to be visible, like what kind of activities you’re doing. So you’re putting it in a tunnel, right? You want to do the same thing for your enterprise, right?

Nikhil Shetty 00:10:10 So you have an on-premises data center and you have a VPC in the public cloud. How do you connect them together? You want to actually encrypt it end-to-end, put all your data traffic in a tunnel so no one can see it. You own all the keys, right? For all of this encrypted communication. And that’s what a VPN would be, right? So that’s what a virtual private network is. Usually this is built on top of a technology called IPsec, right? So you have an IPsec VPN that you set up, and as I said, you could do a VPN from your laptop, but for these high fidelity VPNs usually you would have some kind of customer premises equipment, which you’ll install in your on-premises data center. And then that would connect to the cloud service, which offers the endpoint of your VPN, and then all the traffic that lands at that endpoint in the cloud is dropped into your VPC to go to its appropriate location within the VPC.

Kanchan Shringi 00:11:08 So before we get into some more of the technology that this is built on, I’d like to point folks to Episode 571, Jeroen Mulder on Multi-Cloud Governance. That has covered some of the related topics here as well. So Nikhil, let’s spend some time now on the underlying technology to some extent, perhaps with an example. So if somebody has a three-tier web app, if you can keep that in mind and then talk about some of the related concepts, starting with the VPC. What if the VPC is not created at all? What happens?

Nikhil Shetty 00:11:47 Okay, so if a VPC is not created at all and maybe your first interaction with the cloud is–hey, I want an instance, right? I want a VM to run something. Most clouds would create a default VPC for you, right? That takes care of like all your basics and any of the communication between instances in that VPC and things of that sort. But in general, the default VPC may not work for you because it’ll have some simplistic settings. For instance, like you’ll have full access to the internet, right? And we just talked about the private network. Maybe you don’t want your instances to be on the internet because that’s, you’re exposing yourself to security hacks and things of that sort, right? Like, so if you look at like the Swiss cheese model of security, avoiding instances from being on the internet is kind of like one layer of your Swiss cheese model of security, right?

Nikhil Shetty 00:12:43 So it won’t be sufficient, but it’s kind of like maybe a necessary thing. You want to keep them away. So here is what you want to watch out for. You can always use services on the public cloud, create your own tenancy, start using maybe some instances, maybe you want to bring up some containers, right? Things like that. But they may all land up in the default VPC, which is not what you want, because some of the settings of the default VPC may not be to your liking, right? So going back to your question, what you want to do is you want to create one VPC. So in this case, let’s take a naive three-tier web app, right? Let’s say I create a VPC, and I will use a term called CIDR, which we’ll try to explain in a bit.

Nikhil Shetty 00:13:28 So let’s say you create a VPC, it has a CIDR of 10.0.0.0/16, and then I create three tiers, right? So the web tier, the app tier, and the DB tier. So the web tier, I could put it in a subnet. Let’s say I take a subnet of this CIDR, which is let’s say 10.0.1.0/24, okay? And then I could do another subnet for the app tier, which would be 10.0.2.0/24, and then for the DB tier, 10.0.3.0/24, right? So I can kind of break up my VPC this way into multiple subnets, put my different tiers in those subnets. And then for each of those subnets, I can have different kinds of settings, right? So for instance, the web tier, I want access to the public internet.

Nikhil Shetty 00:14:21 So I need to do something for that, right? And we’ll come to all of this about internet gateway and stuff like that later. The app tier only needs to talk to the web tier and the DB tier, right? So you will set up the subnet rules appropriately. Finally, the DB tier only needs to talk to the app tier, right? So how do you set up something like that? So you’ll see that as part of all the, as we discuss more of the concepts here, we’ll see how you kind of achieve that basically.
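
The three-tier layout described above can be sketched with Python's standard `ipaddress` module. This is only an illustration: the `check_layout` helper and the tier names are assumptions made for the example, not part of any cloud provider's API. It checks that each tier's subnet sits inside the VPC CIDR and that no two subnets overlap.

```python
import ipaddress

# VPC CIDR and tier subnets taken from the example in the conversation.
vpc = ipaddress.ip_network("10.0.0.0/16")
tiers = {
    "web": ipaddress.ip_network("10.0.1.0/24"),
    "app": ipaddress.ip_network("10.0.2.0/24"),
    "db": ipaddress.ip_network("10.0.3.0/24"),
}


def check_layout(vpc, tiers):
    """Hypothetical helper: True if every subnet is inside the VPC and none overlap."""
    subnets = list(tiers.values())
    inside = all(s.subnet_of(vpc) for s in subnets)
    disjoint = all(
        not a.overlaps(b)
        for i, a in enumerate(subnets)
        for b in subnets[i + 1:]
    )
    return inside and disjoint


for name, subnet in tiers.items():
    # Each /24 leaves 8 host bits, so 256 addresses per tier.
    print(name, subnet, subnet.num_addresses)
print(check_layout(vpc, tiers))
```

Cloud providers typically reserve a few addresses in each subnet, so the usable count is somewhat lower than the raw figure printed here.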

Kanchan Shringi 00:14:50 You talked about subnets, and then I think you mentioned splitting the IP address space across the subnets. What is the address space? Is that the private address space? And can you please describe that a little bit? And then maybe talk about CIDR blocks as well.

Nikhil Shetty 00:15:08 I introduced the term CIDR, and again, CIDR here is spelled C-I-D-R, right? So for folks on the audio who may not know this terminology, and again, back to what I said, let’s expand the acronym and let’s work on it. So it stands for Classless Inter-Domain Routing, which sounds pretty complicated, right? So what do you mean by classless, right? So when the internet started out, you had these IPv4 addresses, which are 32-bit addresses or four bytes basically, right? So when the internet started out, they initially had the sense that–hey, I would break these into multiple classes of addresses, right? Some addresses would have the first 24 bits fixed, and now you can play around with the last eight bits, right? So you get two to the power eight different addresses, which is 256 addresses, right?

Nikhil Shetty 00:16:05 So some customers on the internet would be happy with that Class C address space. Then there would be some who would be happy with the Class B address space, which is take the first two bytes, so slash 16, and that’s 16 bits fixed, right? So you can now modify the remaining 16 bits, so that’s two to the power 16, which is about 65,000 addresses. And then maybe there are some who are really big. For them, you keep the first eight bits fixed and vary the remaining 24 bits, which is two to the power 24, so 16 million addresses. Immediately people realized–hey, that is too rigid of a system, right? Because what if I needed only 10,000 addresses? I can’t use a class C space. I would have to use multiple class C spaces, or I have to then buy a class B space, which is 65K, which might be too expensive for me.
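
The arithmetic behind the three legacy classes, and the 10,000-address problem, can be written out in a few lines. The function names here are made up for the illustration.

```python
import math

# Host bits left to vary once the class prefix is fixed.
CLASS_HOST_BITS = {"A": 24, "B": 16, "C": 8}


def class_size(cls):
    """Number of addresses in one block of the given legacy class."""
    return 2 ** CLASS_HOST_BITS[cls]


def class_c_blocks_needed(addresses):
    """How many Class C blocks it takes to cover a given address count."""
    return math.ceil(addresses / class_size("C"))


print(class_size("C"))   # 256
print(class_size("B"))   # 65536
print(class_size("A"))   # 16777216
print(class_c_blocks_needed(10_000))  # 40
```

Needing 10,000 addresses means either stitching together 40 Class C blocks or taking a Class B block and leaving most of its 65,536 addresses unused, which is the rigidity being described.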

Nikhil Shetty 00:16:56 So you want now all different kinds of intermediate breakup of this address space, right? So they completely let go of this class-based addressing and they said–hey, let’s just do it classless, right? So I just say that–hey, this is my prefix, and these are the bits of the prefix that are significant. And everything under that is owned by this particular, let’s say, autonomous system. That’s the terminology that’s used. So you’ll see the way we are using CIDRs in VPC is just a way to notate an address, basically, right? So you’ll say I have 10.0.0.0 slash let’s say 18. That’s my CIDR, right? So what that means is 10.0.0.0 is the address, and if you take the first 18 bits of it as the prefix, that’s what would be fixed, right? And everything else, the remaining 14 bits, is free. So I get two to the power 14 addresses, which I can use in my VPC in any way I choose. I can break it into multiple subnets, whatever I want to do with it.
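
The slash-18 example can be checked with Python's standard `ipaddress` module. This is a small sketch; the choice to split into /24 subnets is an assumption made for the illustration.

```python
import ipaddress

net = ipaddress.ip_network("10.0.0.0/18")

# 32 total bits minus an 18-bit prefix leaves 14 free bits.
free_bits = net.max_prefixlen - net.prefixlen
print(free_bits)            # 14
print(net.num_addresses)    # 16384, i.e. 2 ** 14

# One possible way to carve the block up: equal /24 subnets.
subnets = list(net.subnets(new_prefix=24))
print(len(subnets))         # 64
print(subnets[0], subnets[-1])
```

Splitting a /18 into /24s uses 6 of the 14 free bits for the subnet number, which is why there are 2 to the power 6, or 64, subnets of 256 addresses each.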

Nikhil Shetty 00:18:04 So that’s where the CIDR terminology comes from. You could just probably call it a prefix and be done with it. You don’t have to call it CIDR, right? But that’s the terminology that’s been used. So we need to just understand it. The same thing applies to IPv6. The only difference is that’s 128 bits of address space, right? You can again break it down with any prefix that you want. Coming back to the question of what is a private address space? So the IETF basically set aside some addresses for folks to use within their organizations, right? So these are private spaces which are not publicly routable over the internet, right? So if you take an address in this space and you try to send some packet over the internet, it cannot go, right? Maybe you can only go within your private network, private corporate network or something like that.

Nikhil Shetty 00:18:59 So accordingly, what the IETF did was they set aside these address spaces. One of those was 10.0.0.0 slash eight. That means the first byte, 10, is fixed, and then you have three bytes or two to the power 24 addresses, which you can use any way you want, right? So that’s 10 slash eight. The other space is 172.16.0.0 slash 12, and another one is 192.168.0.0 slash 16. And as I told you, the longer the prefix, the lesser the number of addresses that you have, right? So I mean, if you’re setting up a VPC, you probably just want to take the biggest one because you don’t know how much you’re going to grow, how successful you’re going to be. So you just take 10, let’s say 10.0.0.0 slash 16, and that’s the CIDR block that you assign to your VPC. So a question you could ask me is why can’t I use a public address space, right?
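
The three private ranges mentioned here are the ones defined in RFC 1918. A minimal sketch of checking whether an address falls in one of them follows; `is_rfc1918` is a name chosen for this example.

```python
import ipaddress

# The three private IPv4 ranges set aside by the IETF (RFC 1918).
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Hypothetical helper: True if the address is in one of the private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918)


for net in RFC1918:
    # Longer prefix, fewer addresses: /8 > /12 > /16 in size.
    print(net, net.num_addresses)

print(is_rfc1918("10.0.1.25"))   # True
print(is_rfc1918("8.8.8.8"))     # False
```

The standard library also offers `ip_address(...).is_private`, but that property covers additional special-purpose ranges beyond these three, so the explicit list is used here.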

Nikhil Shetty 00:19:54 And the answer to that is absolutely, you could use a public address space in your VPC, but imagine what would happen, right? Let’s walk through that scenario. So as an example, I know that 8.8.8.8 is a very well-known address, which is on the internet. This is Google’s DNS service, basically. So now imagine you had a VPC, right? Which was 8.8.0.0 slash 16, and then in that VPC, you created a subnet called 8.8.8.0 slash 24, and now you attach an instance in that subnet. What if it got the address 8.8.8.8? What would you do then, right? If your instance really wanted to access Google’s DNS service for whatever reason, all the packets would just come to this instance rather than going over the internet to Google’s DNS service, right? So the answer to that is you can use a public address space, but you have to be confident that you’re never going to access any public service that exists in that address space. So usually the recommendation is don’t do that. Just use one of the private address spaces, and if you’re really running out of them, then maybe you think about pulling in some of the public spaces and adding them to your VPC, right? Definitely not recommended, but if you know what you’re doing, maybe you can go ahead and do it.
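
The shadowing problem in this scenario can be shown in a few lines. The sketch assumes, as a simplification, that any destination inside a network local to the VPC is delivered locally and never leaves for the internet; `stays_local` is a hypothetical helper.

```python
import ipaddress

google_dns = ipaddress.ip_address("8.8.8.8")


def stays_local(destination, local_networks):
    """Hypothetical helper: True if the destination falls inside a VPC-local network."""
    return any(destination in net for net in local_networks)


# A VPC built on public address space, as in the scenario above.
public_vpc = [ipaddress.ip_network("8.8.0.0/16"), ipaddress.ip_network("8.8.8.0/24")]
# A VPC built on private address space.
private_vpc = [ipaddress.ip_network("10.0.0.0/16")]

print(stays_local(google_dns, public_vpc))   # True: traffic never reaches Google
print(stays_local(google_dns, private_vpc))  # False: traffic can be routed out
```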

Kanchan Shringi 00:21:19 So you talked about using the prefix 16, why would I not just reserve the largest space?

Nikhil Shetty 00:21:26 Yeah, so those would be restrictions from your cloud. So if your cloud provider allows you to create CIDR blocks with slash 12, go ahead, you can do 10.0.0.0 slash 12, right? You don’t have to do 10.

Kanchan Shringi 00:21:39 What are the reasons for the provider to restrict it?

Nikhil Shetty 00:21:42 It would be some kind of internal restriction, like their stack. Maybe it just doesn’t support it, right? Maybe the way they’ve designed their internal software to handle all of this, it just doesn’t support those kinds of ranges. So you will see those kinds of differences between the clouds. Maybe some of them have designed it that way to provide that flexibility. Some have not, right? So you’ll see different kinds of ranges that are supported.

Kanchan Shringi 00:22:07 And I think you hinted at this, but can you just explicitly state the difference between what I understand are public subnets and private subnets.

Nikhil Shetty 00:22:17 Okay. So I started out by saying that we have a three-tier app, right? The web tier, the app tier, and the DB tier, and each of those is in its own subnet, right? And we said that the web tier would need access to the internet. So it’s probably a great time here to talk about what route tables are, right? So route tables are something that allow the routing to happen in a subnet. So what it tells you is, hey, for this particular destination, how do you want to get to it? Like which subnet should I go to, right? Or do I need to go to something else? Like some other device? Like a gateway, right? To reach a certain target. So what you do is you give a prefix or a CIDR and you say–hey, for this destination, this is the target I want to go to, right?

Nikhil Shetty 00:23:11 And the route tables usually work on longest prefix match, right? So if you have a longer prefix in the route table, that is what will get matched rather than the shorter one, right? So essentially each of these subnets has a route table. The default route table usually will have all the subnet-related prefixes in it, right? And it will tell you, okay, for this prefix, go to this subnet basically. So that kind of mapping would already be there. But sometimes you want to add additional entries like–hey, I want to go to 8.8.8.8, right? So take the internet gateway when I want to do that, right? So that’s the route that you’ll add. So now I’ve been talking about the internet gateway, right? So it’s a network function essentially, which allows you to reach out to the internet. So when a subnet has a route table entry that allows it to go to the internet over an internet gateway, that’s when you say that the subnet is public in nature, okay?
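
Longest prefix match, as described above, can be sketched as follows. The route table is modeled as a plain list of (prefix, target) pairs, and the target names such as `internet-gateway` are labels invented for the example, not identifiers from any provider.

```python
import ipaddress


def lookup(route_table, destination):
    """Return the target of the most specific matching route, or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, target) for net, target in route_table if dest in net]
    if not matches:
        return None
    # The longest prefix (largest prefixlen) is the most specific route.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


route_table = [
    (ipaddress.ip_network("10.0.0.0/16"), "vpc-local"),
    (ipaddress.ip_network("10.0.1.0/24"), "web-subnet"),
    (ipaddress.ip_network("10.0.2.0/24"), "app-subnet"),
    (ipaddress.ip_network("8.8.8.8/32"), "internet-gateway"),
]

print(lookup(route_table, "10.0.2.7"))  # app-subnet (the /24 beats the /16)
print(lookup(route_table, "10.0.9.9"))  # vpc-local (only the /16 matches)
print(lookup(route_table, "8.8.8.8"))   # internet-gateway
print(lookup(route_table, "1.1.1.1"))   # None (no route, so no way out)
```

Real routers use tries or hardware tables rather than a linear scan; the scan is used here only to keep the matching rule visible.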

Nikhil Shetty 00:24:14 So with that, what happens is any instance that is in that subnet can now start talking to internet addresses, and it may be assigned an address on the internet, right? Not a private address. Everything already has a private address. It’ll get a public address, maybe dynamically, or it could be an elastic address. That’s a separate discussion. It’ll get a public address for the internet, and all traffic will now go through the internet gateway and can reach the internet, basically. So that’s what a public subnet is, right? So as I said, you have the route table and you have the internet gateway. Once you do that, you get an address on the public internet, and you can talk to the internet from that instance. The interesting thing here is there is something called a NAT gateway as well, where your private instances, your private subnets can also access the internet, right?

Nikhil Shetty 00:25:11 So again, we are blurring the boundaries here, but just to be clear, the private subnet can also access the internet via the NAT gateway. What it cannot do is allow communication from outside in the internet into that subnet, right? So it’ll allow one-sided communication. So the instance can open up a connection out to 8.8.8.8, for instance, if you wanted access to Google’s DNS, right? But Google’s DNS cannot open a connection to that instance, right? So that would be completely impossible. So you have the internet gateway and the NAT gateway, and you use the internet gateway when you want connections from both sides to be allowed, right? From the instance or from the internet. And you use the NAT gateway if you want to keep this subnet private, but still want some kind of access to the internet.
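
The one-sided behavior of a NAT gateway can be modeled very roughly as a table of flows that were opened from the inside. This toy `NatGateway` class is an assumption made for illustration: a real NAT also rewrites source addresses and ports and expires idle flows, none of which is modeled here.

```python
class NatGateway:
    """Toy model: inbound traffic is accepted only as a reply to an outbound flow."""

    def __init__(self):
        # Each flow is (private_ip, remote_ip, remote_port).
        self._flows = set()

    def outbound(self, private_ip, remote_ip, remote_port):
        """An instance in the private subnet opens a connection; remember the flow."""
        self._flows.add((private_ip, remote_ip, remote_port))
        return True

    def inbound(self, remote_ip, remote_port, private_ip):
        """Traffic from the internet is let in only if it matches a known flow."""
        return (private_ip, remote_ip, remote_port) in self._flows


nat = NatGateway()

# Nothing has been opened yet, so the outside cannot reach the instance.
print(nat.inbound("8.8.8.8", 53, "10.0.2.5"))   # False

# The instance opens a connection out to Google's DNS.
nat.outbound("10.0.2.5", "8.8.8.8", 53)
print(nat.inbound("8.8.8.8", 53, "10.0.2.5"))   # True: reply to an existing flow
print(nat.inbound("1.2.3.4", 53, "10.0.2.5"))   # False: unsolicited
```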

Nikhil Shetty 00:26:01 And that’s where the route tables also come into play, because now you can specifically say–hey, you know what? I want this instance to only access 8.8.8.8. So maybe in the route table, I only add that entry and nothing else, right? So anything else I try to reach over the internet is going to be denied, basically. So you can think of it like another layer of your Swiss cheese model of security, basically, right? So this is another layer. Okay? So I think that kind of covers both of these. And back to what you asked me about subnets, public and private. So in our example, what I would do is I would create an internet gateway, attach it to the VPC, and I would add a route table entry for the web tier subnet to say, let’s say default for now, right? So anything for the internet, just go to the internet gateway. Maybe the other tiers need 8.8.8.8. So I will create a NAT gateway and attach it to the VPC. Now for my app tier and my DB tier, I say anything that wants to go to 8.8.8.8, add a route table entry where the target would be the NAT gateway, right? So now those instances have access to 8.8.8.8, and that would kind of set up all my communications for me basically.

Kanchan Shringi 00:27:22 So continuing on the security model, I’ve heard of security groups and network ACLs. Can you help us understand what they are and when should one be used versus the other?

Nikhil Shetty 00:27:35 So we already talked about security in one sense, which is in terms of like access, how do I restrict access to the internet, for instance, right? How do I keep my instances private, right? One other way to do security is to kind of enforce all kinds of communication that an instance can do. Okay? And here I think I’ve been using the term instance and a subnet, right? The more accurate relationship is actually a network interface on an instance and subnet, okay? So I just wanted to bring up that distinction here. An instance could have multiple network interfaces, right? And they could be in different subnets, okay? So really when I say instance until now in this whole talk, I was referring to a network interface of an instance, okay? So now a security group, what’s a security group? A security group is like an allowed communication for a certain instance.

Nikhil Shetty 00:28:37 And you can say–hey, what’s the protocol? What’s the port range? Whether it’s ICMP traffic or not, what are the sources and destinations that you allow the communication from and to, right? So you can provide these as a rule and attach it to an instance class, right? So that’s what a network security group is. And automatically, for any instance that you spawn in that instance class, on all the network interfaces of that instance, you would apply this network security group. Okay? So just to point out, a network security group is associated with an instance, and it applies to all interfaces on that instance. Then there is something called network ACLs, which can be set up in a subnet, right? So those are associated with the subnet rather than the instance. And the ACLs are very similar. They have the same kind of match criteria, right?

Nikhil Shetty 00:29:37 You have the protocol, port, source, destination, things like that. The difference would be that an ACL would tell you what is allowed or denied, right? So each entry can actually be specific to allow or deny. While with network security groups, they’re just saying, okay, these communications are allowed, and by default everything else is denied, right? So ACLs kind of support these more granular allow and deny rules basically. And back to what I said, the network security group goes with the instance, and the network ACL operates at the subnet level. So that’s the distinction. You can always have a security group for a certain instance class, and then the network ACL could also replicate the same thing. It just adds another layer, basically, right? So in case you messed up something on the security group, at least the network ACL kind of protects you, or vice versa, like you messed up something on the network ACL and the network security group kind of helps you.

Nikhil Shetty 00:30:41 So both of them have a place in your security architecture. Usually what would be recommended by most cloud operators is to use NSGs because they go with your instances. You don’t have to worry about which subnet you’re putting this instance in, right? So for example, let’s say your app tier. I started with an example with just one instance, but maybe in the future I have thousands of instances and I exhausted my subnet. I need to create one more subnet. Now, when I create the subnet, if the instance class remains the same, the network security group that goes with this instance class continues to apply, right? So even if I change the subnet, the NSG would still continue to apply, but the network ACL would have to be explicitly put on that new subnet as well, right? So that’s the benefit. So you’re kind of decoupling the security of an instance class from what your network structure is for those particular instances.

Kanchan Shringi 00:31:38 So for network ACLs, you mentioned there are both allow and deny rules. That’s a little confusing to me. So what happens to what is not either specified as allowed or denied?

Nikhil Shetty 00:31:49 So usually these rules have to specify whether a certain flow is allowed or not, right? And then whether it is denied or not, right? So what you could have is, for example, you could have a network security group that says–hey, give me all SSH connections, right? So that could be a very simple network security group: port 22, protocol is TCP, allow everything, right? And then the network ACL could say, you know what, actually do not allow traffic coming from a certain range because I know that’s not a good range, or it’s coming from my competitor, or something like that. Or that there are bad actors on that range of addresses. So you can specifically go and say–hey, deny anything that comes from that particular range, right? So that kind of helps you with the denying of access essentially.

Nikhil Shetty 00:32:39 One other thing that we should also keep in mind is that in the network security group, every rule is evaluated, but the way the ACLs work is that the first match you get to in an ACL is the entry that will be used. So if you have a deny at the top somewhere and that gets matched, then your traffic will be denied, right? It’ll not go and see, okay, is anything else allowing this traffic? So it goes in sequence and evaluates the rules one by one. So the network ACLs come more from the networking side, where network devices used to have these ACLs in order and things like that, but the security groups come more from the application side. So that’s a way to kind of think about it. And both have their place basically, and both have their utilities.
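
The two evaluation styles described above, allow-only security group rules where any match allows and ordered ACL entries where the first match decides, can be contrasted in a short sketch. The `Rule` structure, the function names, and the address ranges (taken from documentation-only blocks) are assumptions for the illustration, and denying traffic that matches no ACL entry is likewise an assumption, since the conversation does not say what a given cloud does in that case.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    source: str            # CIDR the traffic comes from
    port: int              # destination port (TCP assumed)
    action: str = "allow"  # security group rules only ever allow


def _matches(rule, src_ip, port):
    return port == rule.port and ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule.source)


def security_group_allows(rules, src_ip, port):
    """Every rule is considered; any match allows, otherwise implicit deny."""
    return any(_matches(r, src_ip, port) for r in rules)


def network_acl_allows(rules, src_ip, port):
    """Entries are checked in order; the first match decides."""
    for r in rules:
        if _matches(r, src_ip, port):
            return r.action == "allow"
    return False  # assumption: unmatched traffic is denied


# Security group on the instance: allow SSH from anywhere.
sg = [Rule("0.0.0.0/0", 22)]
# Network ACL on the subnet: deny a known-bad range first, then allow SSH.
nacl = [Rule("203.0.113.0/24", 22, "deny"), Rule("0.0.0.0/0", 22, "allow")]


def reachable(src_ip, port):
    # Traffic must pass both layers.
    return network_acl_allows(nacl, src_ip, port) and security_group_allows(sg, src_ip, port)


print(reachable("198.51.100.7", 22))  # True
print(reachable("203.0.113.9", 22))   # False: the deny entry matches first
print(reachable("198.51.100.7", 80))  # False: nothing allows port 80
```

Swapping the two ACL entries would let the bad range through, because the broad allow would match first. That order sensitivity is the practical difference from the security group.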

Kanchan Shringi 00:33:29 So we’ve talked about VPCs and subnets in some detail. I’m wondering, are there scenarios where you would have multiple VPCs?

Nikhil Shetty 00:33:38 Multiple VPCs? Yes, it’s possible. You can have multiple VPCs. One way you can think of adding multiple VPCs is, so remember what I said: I have a three-tier web app, right? Maybe I have another web app, which is another three tiers, and I want to manage it separately, right? So I put it in a separate VPC and manage all of its interconnectivity that way, right? I could have organizational boundaries, right? So I could say that maybe the sales team has a whole VPC and they have a bunch of services there, and then there is an engineering team, which has a completely different VPC with their own subnets, right? Or maybe even within that, there is a VPC for development stuff, right? So it kind of provides that isolation, right? So if you make any changes, you don’t impact some other application, right? So that’s one benefit. The other benefit is you can drive kind of management of these VPCs through separate orgs, basically, right? So I think that’s another benefit of having multiple VPCs. I mean, in general, if you’re a small application developer, probably you don’t need multiple VPCs. But as you grow and your organization becomes complex, and the services and applications that you provide become complex, you may want to kind of isolate your applications from each other.

Kanchan Shringi 00:35:06 I’m interested in scenarios that require connecting instances in two or more different VPCs. Let’s talk about that. How does that work? And I’ve heard of issues with that, with CIDR overlaps. Could you cover that?

Nikhil Shetty 00:35:22 Yeah. So what does CIDR overlap mean, right? So let’s take this example. I said 10.0.0.0/16 is a CIDR in my VPC. Now if I create another VPC and I use the exact same CIDR block, there is a potential problem when I try to connect them together, and the way to connect them together is something called VPC peering, right? So I could peer these VPCs together so that all the subnets in each of these VPCs now can see each other, right? But the minute you do that, if the same address is being used in a subnet in the other VPC, it’ll cause an overlap, right? So you cannot do that unless there is some kind of NAT that you can do. And NAT stands for Network Address Translation, right?

Nikhil Shetty 00:36:16 So as you go between the VPCs, if you can transform your IP address into something that only the other VPC understands, right? Then you could do that communication. But it’s pretty complex. And these are all new kinds of features that are there in the public clouds because of some of these things that they observed with overlaps and stuff like that. But the best way to avoid this kind of overlap is to just avoid them right in the first place, which is get different address spaces, right? Like if you’re one big organization, try to keep them separate, right? So you want to do some kind of IP address management, and there are a lot of tools that these public cloud vendors would provide you. Like how to manage your IP address space, how to monitor how much of it is used, things of that sort.

Nikhil Shetty 00:37:05 So definitely you should investigate and look into those features so as to avoid these kinds of overlaps. Because getting into the overlaps, I think it’s just going to be very complicated to resolve eventually. So better planning, I would say, is the way to go about it. The more complicated use case is when I have a dedicated private cloud, right, on premises. Now, if I’m connecting that to the VPC that I have in the public cloud, or I have multiple VPCs in multiple public clouds, and I want to now start talking between them, you really want an organization-wide view of all the address space that you’re using. So yeah, you definitely need to plan for it. I think planning is super important here.
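As a small illustration of the planning step described above, overlap between address plans can be checked before any peering is attempted. This is a sketch using Python’s standard `ipaddress` module; the VPC names and CIDR blocks are hypothetical, and real IP address management tooling from a cloud provider would do far more than this.

```python
from ipaddress import ip_network
from itertools import combinations


def find_overlaps(vpc_cidrs):
    """Return sorted pairs of VPC names whose CIDR blocks share any address."""
    nets = {name: ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical organization-wide address plan.
plan = {
    "vpc-app": "10.0.0.0/16",
    "vpc-sales": "10.0.128.0/20",  # sits inside vpc-app's range
    "vpc-eng": "10.1.0.0/16",
}

print(find_overlaps(plan))  # [('vpc-app', 'vpc-sales')]
```

Running a check like this over every VPC and on-premises range in the organization is one way to get the organization-wide view mentioned above.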

Kanchan Shringi 00:37:50 That’s interesting. Like I know from just scenarios I’ve seen recently, it’s very hard to envision what might happen. So certainly, understanding what the options are for doing the translation would help.

Nikhil Shetty 00:38:04 So as I said, there is this option of a NAT gateway, and you could put in a private NAT gateway, right? This is not the NAT gateway that goes to the internet, but between your VPCs, you could have a NAT gateway that translates your addresses into something that the other end understands and that does not overlap with something that the other end has, right? But that just reduces the number of options you have in terms of communication. As I said, with NAT you can only initiate a connection in one direction, right? So maybe that works for a lot of the applications, but if you wanted to now do the communication in both directions, what would you do? Do you then make two NAT gateways, one in each direction? I don’t know. So it starts getting complicated very quickly.

Kanchan Shringi 00:38:49 Yeah. So there was a term called PrivateLink that I came across when connecting VPCs. I don’t think you’ve talked about that explicitly already.

Nikhil Shetty 00:39:01 I didn’t speak about it. It’s kind of related to VPC peering in a way, where you have a service that’s maybe hosted by another AWS account. So instead of doing VPC peering with that other service, right? By the way, it may not be a completely different AWS account, it could be an AWS service, but you want that service to be kind of visible within your subnet in a private address space. You don’t want to go public, right? The way you would do that is via something called a private endpoint. So take the example of this database. Maybe the public cloud has a database service that allows you to have a private endpoint in your subnet, through which you can access it, right?

Nikhil Shetty 00:39:57 So now in your subnet you can say–hey, I want to add this private endpoint of this database service. So the database service will show up as if it’s a network interface or an instance in that private subnet, though it’s really backed by a proper cloud service on the other end, right? But it’ll look like an instance to you, right? So you can talk to it like an instance. It’s within your subnet, so maybe you don’t even need special protections and things like that. Your traffic does not exit the VPC and maybe go into some kind of public arena, right? So all of those benefits exist. So you could do that, you could attach the database service into your database subnet, and then maybe you have just a small thing there which allows you to access that endpoint, basically, right?

Nikhil Shetty 00:40:49 And you can call all the APIs of that service just as if it’s a service that you built and put in your subnet. So it looks like that. So that’s what a private endpoint is, and you can use this for attaching services from other AWS accounts as well, right? As long as they have support for PrivateLink, they can provide the service to you. You can, again, add those services as a private endpoint in your subnet and then access those services via that private endpoint. So yeah, I think that hopefully covers it and answers your question around it.

Kanchan Shringi 00:41:28 Yeah, thanks Nikhil. So in the last several minutes we’ve been talking about some of the key technologies that people would use. Now I’d like to drill into very specific topics. We can pick a couple and drill into those. The first one is bring your own IP. Where does that fit and why would one do that?

Nikhil Shetty 00:41:47 Yeah, so there are a lot of these organizations, big organizations, that are kind of migrating to the cloud, right? So previously they had maybe an on-premises infrastructure, maybe they bought a bunch of public addresses. So by the way, public IPv4 space is extremely scarce and expensive. So maybe you already bought it, right? And you’ve been using some of these public address spaces, maybe you’ve even given it to some of your customers and said–hey, can you please whitelist my public address space, right? For some service or something, right? When you go to the cloud and you are saying–hey, give me a public IP address, right? You get a public IP address from the cloud’s address space, but maybe that’s not what you want. Maybe you want to continue to offer your services from your address space, because all your customers have kind of set up a whitelist for your address space somehow.

Nikhil Shetty 00:42:43 So you want to continue to leverage that instead of telling them–hey, you know what? Now I’m going to get this dynamic address from my cloud provider. Maybe I’m on multiple cloud providers. That becomes even more complex. So instead of that, what you do is you take your IP space, bring it to the cloud, and give it to the cloud operator and say–hey, I have this public IP space; when my instance is trying to talk out to the internet, please assign an address from this space instead. Right? So what does a cloud operator have to do at that point? There is some work to be done because they have to verify that you actually own that address space. You cannot just take someone else’s address space and say–hey, cloud operator, take this as my IP address space, right? So they have to go and verify that. So they would have to verify it with the regional internet registries or something, and then they would accept that space, and once it’s accepted, they would start advertising that address space. The cloud operator would advertise that address space on the internet, right? And start attracting traffic for that address space. And then, at that point you can start using it, start assigning it to instances, application services, whatever you’re building on the cloud, and you’ll get the same address space essentially. So that’s what bring your own IP is.

Kanchan Shringi 00:44:02 Another topic that I’m curious about is monitoring, analytics, and logging of the IP traffic. Why is that important? Have you discovered any interesting patterns when people do this?

Nikhil Shetty 00:44:17 Yeah, good question. So there are a bunch of different tools that you can use when you’re debugging any connectivity issues in the cloud, right? So you have a VPC, and you could set up flow logging for a subnet, right? So what flow logs will do is tell you–hey, what’s the flow that’s coming into the subnet, right? And what was the action taken by the cloud for that particular flow? Was it a deny or an accept, whatever it is, right? That helps because, as I told you, the security groups and ACLs become quite complicated very quickly, right? And they may even override each other or something, right? So if you’ve by mistake had a bad setting on your ACLs or something, which is denying the traffic that’s trying to come into the subnet, you’ll not see anything on your instance basically, right?

Nikhil Shetty 00:45:10 So at that point, you want to enable flow logs and say–hey, let’s see what’s happening to this flow. Let’s see if it’s even hitting the subnet in the first place. Maybe it’s not even reaching the subnet, maybe there’s an issue somewhere else. But at least when it hits the subnet: was it accepted, was it denied? And why was it denied? Maybe there was a rule, or maybe there is a security group that kind of denied that access basically. So that level of debugging would be enabled if you do flow logs. The only thing to watch out for with flow logs is, if you have a lot of flows, like you have a really popular application, that can quickly overwhelm some of these things. Most of these services are horizontally scalable, but maybe there are limits, and you have to pay more money in terms of costs for the logs that you’re exporting.

Nikhil Shetty 00:46:00 So that’s the only thing to kind of watch out for. There are other tools. I know there are tools like traffic mirroring. So you can actually mirror the complete traffic that an instance is receiving. Maybe this is useful for some kind of compliance use cases that we see in some of the corporate networks and stuff like that. So maybe it’s a similar kind of use case that you want to apply in the virtual private cloud. There are also tools like reachability analyzers, where you could set up something like–hey, I want to always monitor my connectivity between a certain instance IP and, let’s say, some address on the internet. I want to make sure that that connection is always up somehow. So anytime you make any configuration changes in your VPC, the reachability analyzer will run again to verify that that connectivity is good.

Nikhil Shetty 00:46:57 And if it figures out, oh, you know what? You just added an ACL here that denied that traffic, then it’ll kind of throw an alert, right? So your operators can get an alert saying, oh, there was a configuration change, the reachability is now broken, so you may want to go and fix your configuration again. So you get immediate feedback rather than figuring it out later, when somebody actually does that communication and has a problem with it, and then debugging becomes super difficult, right? You have to go ping something. You have to enable flow logs or look in the flow logs, things like that. So having some kind of reachability analyzer that analyzes your route tables, your security groups, your ACL rules, right, end to end, I think that is kind of valuable.
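The kind of question flow logs answer (“was this flow accepted or denied when it hit the subnet?”) can be sketched as a small log summarizer. The field order below is modeled on the default AWS VPC flow log record layout, where the action field is `ACCEPT` or `REJECT`; treat that layout as an assumption and check your provider’s documentation, since other clouds and custom formats differ. The sample records, interface ID, and addresses are made up for illustration.

```python
from collections import Counter
from typing import NamedTuple

# Assumed field order (modeled on the AWS default flow log format).
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


class FlowRecord(NamedTuple):
    srcaddr: str
    dstaddr: str
    dstport: int
    action: str


def parse_record(line: str) -> FlowRecord:
    """Split one space-separated record and keep the fields we care about."""
    values = dict(zip(FIELDS, line.split()))
    return FlowRecord(
        values["srcaddr"], values["dstaddr"], int(values["dstport"]), values["action"]
    )


def rejected_by_destination(lines):
    """Count rejected flows per (destination address, destination port)."""
    records = (parse_record(line) for line in lines if line.strip())
    return Counter((r.dstaddr, r.dstport) for r in records if r.action == "REJECT")


# Hypothetical sample: one accepted SSH flow, two rejected ones.
SAMPLE = [
    "2 123456789012 eni-0abc 198.51.100.9 10.0.1.5 49152 22 6 10 840 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 203.0.113.7 10.0.1.5 49153 22 6 3 180 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0abc 203.0.113.8 10.0.1.5 49154 22 6 3 180 1700000000 1700000060 REJECT OK",
]

for (dst, port), count in rejected_by_destination(SAMPLE).items():
    print(f"{count} rejected flows to {dst}:{port}")
```

A summary like this shows that traffic is reaching the subnet and being rejected there, which points toward a security group or ACL rule rather than a routing problem.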

Kanchan Shringi 00:47:44 So the next topic I’d like to chat with you about, Nikhil, is DNS management on the VPC. Could you maybe just start by explaining what DNS management would do and how you would configure that on the VPC?

Nikhil Shetty 00:48:00 Yeah, so back to the theme on acronyms and stuff like that. So DNS, for folks who are not initiated into this, stands for Domain Name System. So this is the way you kind of resolve host names to IP addresses to perform your communications. So in the case of VPCs, usually they would have some kind of default DNS support for the instances that you create in the VPC. What would happen is, if you enable the DNS support, all the private addresses for the instances would be assigned some kind of host names. You can also actually go ahead and maybe create your own private domain and assign that to the VPC.

Nikhil Shetty 00:48:56 When you are creating instances, those instances will get host names, and then those host names will be mapped to those private addresses. So how does that help you, right? So let’s say, taking our example here, the three-tier web app, let’s say I have a domain called example.com, right? I could say my app is app-instance.example.com. My DB would be db-instance.example.com. And that way, when I put the thing into the configuration of my application, right? I just have to say db-instance.example.com, and it kind of decouples it from the actual IP address that’s assigned to that instance. So later, let’s say you actually copied your DB onto another instance, you could just move that DNS entry onto that IP address.

Nikhil Shetty 00:49:54 And nothing has to change on the application end, right? So that’s one way you could use it. The other way you could use it is when you just want to check connectivity between these instances. You don’t have to remember the IP address of the instance. If you have configured all these host names in your VPC, you can use them to actually perform a ping check or something of that sort. In addition, there’s also public DNS that you could use. Most providers would have some kind of public DNS support. So in our example, if you look at the web tier, it is supposed to be on the internet, and you probably don’t want to give your IP address to someone else.

Nikhil Shetty 00:50:42 Rather you want to give a host name, right? You want to give a proper domain name, right? So for example in this case, if it’s example.com, you’d call it web.example.com, and then you kind of associate it with the web instance that you have. Or if you had a load balancer there, you can associate it with the public IP address of the load balancer. Now in addition, on the public DNS side, there are a lot of other features that you could use from the cloud providers. For instance, you could do a DNS failover, right? So you have multiple of these endpoints. You could put one DNS entry pointing to all of these or pointing to one of them, and then perform health checks and monitoring and make sure that if it’s not healthy, you flip over to the backup, basically, right?

Nikhil Shetty 00:51:37 So a DNS failover feature could do that for you. Public DNS also provides other kinds of features, like routing your request to the closest cloud region. Or maybe you can configure it and say–hey, if there is a request that comes from this geography, maybe from North America, always send it to maybe the US East region. If it comes from Europe, always send it to a Frankfurt region or something of that sort. So there are a lot of these things that you can play around with, especially with public DNS. And with that, you can actually create a proper end-to-end application that’s globally available and has a lot of redundancy.
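The geography-based routing and failover behavior described above can be sketched as a single resolution function. This is only an illustration of the decision logic, not how any managed DNS service is implemented: the region names, the addresses (taken from documentation ranges), and the `resolve` function are all hypothetical, and a real service would drive the `healthy` set from its own health checks.

```python
# Hypothetical steering policy: client geography -> preferred region.
REGION_BY_GEO = {"north-america": "us-east", "europe": "frankfurt"}

# Hypothetical public endpoint (for example, a load balancer IP) per region.
ENDPOINTS = {"us-east": "192.0.2.10", "frankfurt": "192.0.2.20"}


def resolve(client_geo, healthy, default_region="us-east"):
    """Return the address to answer with, or None if nothing is healthy.

    Prefer the region mapped to the client's geography; if its health
    check is failing, fail over to any other healthy region.
    """
    preferred = REGION_BY_GEO.get(client_geo, default_region)
    candidates = [preferred] + [r for r in ENDPOINTS if r != preferred]
    for region in candidates:
        if region in healthy:
            return ENDPOINTS[region]
    return None


print(resolve("europe", healthy={"us-east", "frankfurt"}))  # 192.0.2.20
print(resolve("europe", healthy={"us-east"}))               # 192.0.2.10 (failover)
```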

Kanchan Shringi 00:52:25 A couple of follow-up questions, Nikhil. The first one was just in terms of certificate management for the host names, how is that integrated into the setup?

Nikhil Shetty 00:52:36 Yeah, so with certificate management, there could be multiple things that you want to worry about here, right? So for example, maybe what you want to validate is the certificate that the host is actually offering me. Is it a valid certificate? So for example, I want to SSH to db.example.com, right? Is it actually the DB host, right? So the way it would work is I would have a CA certificate. So let’s say the public cloud provider is the CA as well. Maybe you get those trust roots onto your host, so that when you do some kind of API call to your DB server and it returns that certificate, you can actually validate locally that the remote host is actually who it says it is, right?

Nikhil Shetty 00:53:29 And most of the cloud providers would have some kind of certificate management where you can actually download these certificates onto your host, maybe in an automatic fashion, and you can kind of rotate them periodically, right? So all of those settings can be tuned. The cloud providers could also provide a way to send these certificates to your load balancers. So for example, in the web case, maybe the load balancer automatically keeps getting its certificate renewed periodically, maybe every 30 days, whatever it is. And it could be signed by a well-known provider, which is something that is trusted by all the browsers throughout the world. So if you’re actually coming in from a browser to your website, then your browser should be able to trust the certificate that it sees. So it can be signed by one of those well-known providers as well. So those are some of the services that these cloud providers can provide you.

Kanchan Shringi 00:54:34 So what about the situation where the application is a SaaS, B2B SaaS? Let’s take that as an example. And the customers, let’s say they’re Pepsi and Coke, each want their end users to have a unique URL, so it’s a multi-tenant application. How would you then manage the DNS to provide multiple host names to the same IP?

Nikhil Shetty 00:54:59 Multi-tenancy can be implemented in multiple ways. You could have a single endpoint, and then somehow your requests kind of identify which tenancy you are making the request for. That’s one way you can go about tenancy, but I think what you’re asking is, can I have two different DNS domain names, right? For two different customers of mine, right? So that’s definitely something that’s possible. Again, within this, maybe there are multiple ways you could implement it, right? Maybe you have two different load balancers with different IPs, so all the DNS mapping, everything, is different. Or you could have a single load balancer, and then the request comes in, and based on the address that the request is for, right? You can forward it to the appropriate backend, which is actually serving that particular customer, right? So things like that. So there are many ways you can kind of skin the cat here, but those are some of the ways that you can do it. And as you said, how do you use DNS? You could assign your public host names to different IP addresses. That’s one way you can do it.
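The single-load-balancer option described above is commonly done by routing on the host name the client asked for. A minimal sketch, assuming HTTP and hypothetical tenant host names and backend addresses (none of these come from the conversation beyond the two example customers): both host names resolve in DNS to the same load balancer IP, and the balancer picks a backend from the `Host` header.

```python
# Hypothetical mapping from tenant host name to the backend serving that tenant.
BACKENDS = {
    "pepsi.example.com": "10.0.1.10:8080",
    "coke.example.com": "10.0.2.10:8080",
}


def pick_backend(headers):
    """Choose a backend from the HTTP Host header; None if the tenant is unknown."""
    # Header names are case-insensitive in HTTP, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    # Drop an optional ":port" suffix and compare host names case-insensitively.
    host = normalized.get("host", "").split(":")[0].strip().lower()
    return BACKENDS.get(host)


print(pick_backend({"Host": "pepsi.example.com"}))      # 10.0.1.10:8080
print(pick_backend({"Host": "coke.example.com:443"}))   # 10.0.2.10:8080
print(pick_backend({"Host": "unknown.example.com"}))    # None
```

For HTTPS traffic the same decision is usually made from the TLS server name or after TLS termination, which this sketch does not cover.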

Kanchan Shringi 00:56:21 Are there any impacts to network latency that one should be aware of? Or any specific configurations that you should make sure you are tracking?

Nikhil Shetty 00:56:31 Yeah, so most of these cloud services would be horizontally scalable, elastic services, right? So you can use as much as you want. That’s usually the hyperscaler mantra. What you want to watch out for is, there might be some things, for example, I’ve not discussed this before, but maybe this is a good point to actually talk about load balancers. In the three-tier web app that I talked about, I was always giving a single instance as an example, but you’ll probably never have just a single instance. You’ll have multiple instances. In fact, you’ll have multiple instances divided across availability zones. And what are availability zones? Availability zones are just zones which give you enough redundancy. So you have a data center in one availability zone, and the data center in the other availability zone might be kilometers away.

Nikhil Shetty 00:57:28 So let’s say there’s a wildfire and one of your data centers goes down, you still have your instances in other data centers in that area, maybe 10 kilometers away or something like that, which are still up and running, and your services continue to operate basically. Or you don’t have to go to a wildfire, you can even think of a power cut or something. A simple thing like that, right? There’s a power issue in one, and you have something in the other. So now once you have multiple instances in your subnet, how do you manage high availability? You cannot expect every consumer to know about all of these and try to check which of them is healthy and then send traffic to that instance. So what you want to do is put in kind of a load balancer, which provides the high availability.

Nikhil Shetty 00:58:17 So the load balancer itself can verify which of these instances is healthy and send traffic only to healthy instances. It can also do load balancing, right? So you have multiple instances, and you can balance the load across all of these instances. So now back to the question about network latency. There are two kinds of load balancers. There’s the application load balancer and the network load balancer. And the application load balancers are kind of like Layer 7 load balancers. So they kind of terminate the whole HTTP connection, then start a new connection to the backend basically. Now, if you had that versus a network load balancer where it only terminates, let’s say, your TCP session, right? So the TCP session is terminated on the load balancer, but not the TLS, right? And the TLS, everything, happens, let’s say, with the backend, then what happens is you have lower latencies in the handling, right?
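The two jobs described above, skipping unhealthy instances and spreading load across the rest, can be sketched as a round-robin picker with a health check. This is a toy model rather than a real load balancer: the `LoadBalancer` class, the backend addresses, and the `is_healthy` callback are hypothetical, and a real balancer would probe health in the background instead of on every request.

```python
from itertools import cycle


class LoadBalancer:
    """Round-robin over backends, skipping any that fail the health check."""

    def __init__(self, backends, is_healthy):
        self._backends = list(backends)
        self._is_healthy = is_healthy       # callback: backend -> bool
        self._ring = cycle(self._backends)  # endless rotation over backends

    def next_backend(self):
        # Try each backend at most once per request, in rotation order.
        for _ in range(len(self._backends)):
            candidate = next(self._ring)
            if self._is_healthy(candidate):
                return candidate
        return None  # every backend is unhealthy


# Hypothetical instances spread over two availability zones; one is down.
down = {"10.0.1.11"}
lb = LoadBalancer(
    ["10.0.1.10", "10.0.1.11", "10.0.2.10"],
    is_healthy=lambda backend: backend not in down,
)

print([lb.next_backend() for _ in range(4)])
# ['10.0.1.10', '10.0.2.10', '10.0.1.10', '10.0.2.10']
```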

Nikhil Shetty 00:59:16 So there might be some choices in terms of which product you use that can affect the latency of your end-to-end traffic, right? But in general, I would say that you wouldn’t have to worry about some of these latencies and stuff like that until you really hit some high limits, right? So for example, again, going back to the example of a load balancer, maybe there is a max bandwidth limit on the load balancer, which may be very big, right? So if you’re a very small application, you don’t care usually, but maybe if you’re a very popular application, then yes, you are starting to come closer to those limits, and then you want to add maybe more load balancers and things like that. So you want to do some kind of horizontal scaling and figure out how to balance across those load balancers, right?

Nikhil Shetty 01:00:03 So I think that’s kind of like the next level of things that you want to start thinking about. The bigger thing I kind of worry about is cost, right? So what’s the price of what I’m using? So for example, the load balancer pricing may depend on connections per second, how many active connections I have, how much data I’m sending per hour, and maybe there are two different products and one product is slightly cheaper. So you may want to use that, and that’s what you scale up, right? With the cheaper product basically. Maybe that’s what you want to do. There are other interesting things where there are cost implications. For instance, with internet gateways, if you’re sending a lot of traffic out of the public cloud, there are some cost implications to that. So when you’re designing your application, you want to just make sure that you’re kind of accounting for some of those aspects. So back to your original question about network latency and VPC configuration, it’s usually not the public cloud service that is responsible for your latency, but it is most likely your design or how you’re using those services that is causing you the trouble basically. And there will always be some option that you can choose or pick that can help you avoid some of those latency problems and things like that.

Kanchan Shringi 01:01:21 So I wanted to spend the last few minutes on just seeing if there are key aspects to comparing and contrasting the VPC offerings by the major cloud providers. Is there something specific that comes to your mind between Amazon VPC or Google Private Cloud or Azure or OCI network or even IBM? If you can comment.

Nikhil Shetty 01:01:46 Okay, so I haven’t looked through every public cloud to see all their feature sets. What I could compare is maybe AWS and OCI, because these are things I’ve actively used as part of my work. I think the key difference that you’ll find between these cloud providers is the terminology, right? That’s the key difference in my mind, because usually the same product exists in both the clouds, but they may have different names, right? So for example, in AWS it’s called VPC, right? And that’s exactly why we started this talk, because AWS is so big, everybody’s using VPC, but it’s not called VPC in OCI, right? It’s called VCN, Virtual Cloud Network. But it’s the same thing, right? Same thing with something called virtual private gateways, right?

Nikhil Shetty 01:02:36 So in the case of AWS it’s called a virtual private gateway, but in OCI, it’ll be called a dynamic routing gateway. So getting around that terminology could be a little challenging initially, but you just have to look for some of the keywords for what you’re looking for, basically, right, from your application point of view, and then you’ll be able to quickly find it. The other big difference is the default limits, right? So for example, if I start on AWS, my account would have a limit of five VPCs per region. If I start on OCI, it’s 50 VPCs, right? So those things could be different. I’m sure the numbers are quite different in Azure, GCP, things like that. Same thing for subnets, maybe AWS has 200 subnets per VPC, OCI has 300 subnets per VPC, right?

Nikhil Shetty 01:03:24 So there could be differences in limits. Finally, there could be differences in the features, right? And I think we talked about this at the start, like what are the CIDR blocks that you can use in your VPC? In the case of AWS, it ranges from /16 to /28; in OCI, the range could be from /16 to /30. Okay? One key difference, which I have found, which was very confusing for me initially, was that AWS does not have any regional subnets, right? So when you create a subnet, it’s actually an availability-zone-specific subnet only, right? But OCI, and in fact I think I recently looked at Azure as well, and I think they have this concept of regional subnets. So you create one subnet and it applies to the complete region, right? Some features differ too, like ACLs: there is support for ACLs which are both stateful and stateless, in the sense that you write an ACL in one direction, and the reverse traffic corresponding to that same flow is automatically allowed, right?

Nikhil Shetty 01:04:24 So that’d be a stateful ACL entry. On the other hand, if you look at AWS, AWS allows third parties for PrivateLink, right? OCI does not. In terms of the IPAM tool, AWS has a very well-developed IPAM tool, which allows you to monitor IP address usage and things like that. OCI does not have that well-developed a tool. And again, I want to point out that any differences that I’m talking about today may not be differences one month from now if there’s a new feature that’s released by the cloud operators, right? So that can always happen. So this is just a point-in-time analysis of some of these feature differences. But yeah, so terminology, limits, and features, I think those are the three key things that I would say are the big differences between these cloud vendors.
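To make the prefix lengths mentioned above concrete, the size of a block follows directly from the prefix: a /n IPv4 block holds 2^(32 − n) addresses. A short sketch with Python’s standard `ipaddress` module; the `10.0.0.0` base address and the `addresses_in` helper are only for illustration, and the counts are raw block sizes (providers typically reserve a few addresses in each subnet, so the usable number is somewhat lower).

```python
from ipaddress import ip_network


def addresses_in(prefix_len, base="10.0.0.0"):
    """Total number of IPv4 addresses in a block with the given prefix length."""
    return ip_network(f"{base}/{prefix_len}").num_addresses


for prefix_len in (16, 28, 30):
    print(f"/{prefix_len}: {addresses_in(prefix_len)} addresses")
# /16: 65536 addresses
# /28: 16 addresses
# /30: 4 addresses
```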

Kanchan Shringi 01:05:13 Thank you, Nikhil. So we’ve covered several topics and of course this is a pretty vast subject. But is there any key topic you think we missed that you would like to talk about?

Nikhil Shetty 01:05:22 No, I think we went through the whole gamut of things here. I’m sure we have missed something, right? But I think this is the best we could have done, and we’ve tried to give people kind of an overview. So now hopefully it gives a jumpstart for folks who are kind of trying to dig in deeper, right? At least they can say, oh, I understand at a high level what this looks like. Now let’s actually go into the documentation and figure out what each of these things actually does, right? So yeah, in terms of topics, I don’t think we have anything else left to discuss, but this is all a surface kind of analysis of all the different features that these cloud operators provide. So it requires another level of analysis.

Kanchan Shringi 01:06:01 What is the best way, if somebody wanted to contact you?

Nikhil Shetty 01:06:04 It’s NikhilVGS on all social media. I prefer LinkedIn or Twitter, actually not Twitter, X, that’s the new term. So yeah, LinkedIn or X, it’s NikhilVGS. And you can connect with me there.

Kanchan Shringi 01:06:20 Thank you so much, Nikhil. This was a great discussion. I hope our listeners learn and like you said, use it as a jumpstart. Thanks again.

Nikhil Shetty 01:06:27 Yeah. Thank you so much. Bye. [End of Audio]
