SE Radio 468: Iljitsch van Beijnum on Internet Routing and BGP

Iljitsch van Beijnum, author of the book BGP: Building Reliable Networks with the Border Gateway Protocol (https://www.oreilly.com/pub/au/970), discusses internet routing and BGP, the Border Gateway Protocol used by ISPs to update routing information. Host Robert Blumen spoke with Iljitsch about the topology of the internet; autonomous systems (ASes); the bodies that coordinate the AS number space; IP addresses and the assignment of IP addresses to ASes; tier-one ISPs, carriers, and home/business ISPs; internet routing and the path of a packet; routing tables, what they contain, and how they are constructed; routing algorithms; BGP and its role in updating routers with the knowledge of routes held by other routers; and BGP messages. They drill down into the update message, how updates progress from BGP into routing algorithms and then routing tables, what can go wrong, and attacks on BGP.

This episode is sponsored by Elastic.

Show Notes

Transcript

Transcript brought to you by IEEE Software
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected].

SE Radio 00:00:00 This is Software Engineering Radio, the podcast for professional developers, on the web at se-radio.net. SE Radio is brought to you by the IEEE Computer Society and IEEE Software magazine, online at computer.org/software.

Robert Blumen 00:00:16 For Software Engineering Radio, this is Robert Blumen. I have with me today Iljitsch van Beijnum. Iljitsch is a freelance network specialist and writer in the Netherlands and is active within the Internet Engineering Task Force. He is the author of the book BGP: Building Reliable Networks with the Border Gateway Protocol and of a forthcoming ebook on internet routing with BGP. Iljitsch, welcome to Software Engineering Radio.

Iljitsch van Beijnum Thanks for having me.

Robert Blumen Today we are going to be talking about internet routing and BGP. Before we can even have a conversation about BGP, we need to cover some basics on what the internet is and how internet routing works. I've come across this explanation of the internet as a network of networks. Can you explain what that means?

Iljitsch van Beijnum 00:01:17 Well, at home you probably have your own network. It could be a very small network, with just a home Wi-Fi router and then your phone and your laptop and so on connecting to it. Organizations also have their own networks, much larger networks. But the thing is, all these networks are connected together, and together they make up the internet.

Robert Blumen 00:01:37 What is the atomic unit?

Iljitsch van Beijnum 00:01:40 Well, I guess anything that has its own IP address. So that could be a very small device, probably not as small as a smart light bulb, and anything up from there. That is the most basic internet-connected thing you can get.

Robert Blumen 00:01:54 We have groupings of internet addresses into what’s called an autonomous system. Can you explain that?

Iljitsch van Beijnum 00:02:02 Well, the thing is, as we get to talk about BGP: some organizations have a network that runs BGP, and then you have to somehow demarcate that network. So that is what an AS, an autonomous system, is, and they each have their own number to keep them apart.

Robert Blumen 00:02:20 Where does an autonomous system get its number from?

Iljitsch van Beijnum 00:02:24 Well, there are five regional internet registries. They give out IPv4 and IPv6 addresses and AS numbers.

Robert Blumen 00:02:32 And these autonomous systems, what kind of real-world entities do they correspond to? Is that a corporation, an ISP, or what?

Iljitsch van Beijnum 00:02:41 Well, certainly all the ISPs, because you need an AS number to run BGP, and you need BGP if you connect to more than one other network. At home you just connect to one ISP, so you don't need your own routing policy where you say some packets go to the left and some to the right. They all just go to the ISP, so you don't need any complex protocols for that. But ISPs connect to multiple other ISPs and other networks, so they need to run BGP; they are ASes. So are content networks, and sometimes organizations or enterprises, such as banks, are ASes too. But lots of them, even fairly big networks, just connect to one ISP, so they don't need to be their own AS.

Robert Blumen 00:03:26 Is there any data on how many ASes, autonomous systems, there are on the entire internet?

Iljitsch van Beijnum 00:03:35 I could check, but I think the last time I did, it was 70,000, something in that order.

Robert Blumen 00:03:42 If you're a business and you need to get on the internet, you might start out by getting an ISP to connect you. Is there some point where you get big enough that you say, we're going to become an AS?

Iljitsch van Beijnum 00:03:57 Well, the big thing is connecting to more than one ISP. That's usually for redundancy, because you can't afford any long outages, but it could also be to save money. For instance, if you have a large network, connecting to other networks directly is cheaper than paying an ISP to do it for you. It used to be that even somewhat smaller networks would save money by connecting directly, but these days you have to be really huge, because ISP prices have gotten a lot lower.

Robert Blumen 00:04:27 Talking about ISPs, are there different types or tiers of ISP?

Iljitsch van Beijnum 00:04:34 Yeah. So the main thing is what we call the tier ones. Those are so big that they can't find anyone even bigger to buy service from, so they have to handle all their traffic on their own, and they have to connect to all the other tier ones. There are about 12 to 15 of those. All the other ones are lower tiers; it doesn't really matter whether it's two or three or whatever. Usually it's just the big ones and the smaller ones.

Robert Blumen 00:05:04 Okay. If I have home internet, then I'm going to be contracting with a smaller ISP, and they're going to, not upload, but route some of their traffic up to a tier one. Is that how it works?

Iljitsch van Beijnum 00:05:20 Yeah. Another distinction we usually make is between ISPs that provide connectivity to home users and small businesses, and the ones that carry traffic really long distances across the world. We usually call those carriers. For instance, Comcast is a huge ISP, but they are not a carrier. They don't connect to all the different regions in the world, so they connect to one or more carriers. They also connect to other networks directly, so they don't have to go through the carrier.

Robert Blumen 00:05:53 And are carriers all the same as tier ones, or...

Iljitsch van Beijnum 00:05:58 All the tier ones are carriers, but there are also carriers that are not tier one.

Robert Blumen 00:06:02 Okay. And you mentioned Comcast, which is certainly a popular ISP where I live. What are some of the names of the tier ones, and are those known to the public, or are they insider names that you'd only know if you're a network engineer?

Iljitsch van Beijnum 00:06:20 Yeah. The thing is, they keep merging, so the names keep changing, but I think AT&T is still one. Then we have Tata, a big Indian conglomerate that has all kinds of different businesses, including being a carrier. Verizon Business, although they changed the name of their network a bunch of times. Probably names like Telia, or Deutsche Telekom; they are very active in the US. Or are they a tier one? I think so, but I'm not sure.

Robert Blumen 00:06:53 The next one of our building-block topics will be IP addressing. Let's start with: how does an entity on the internet obtain an IP address?

Iljitsch van Beijnum 00:07:04 So the big difference between an IP address and an Ethernet address is that the Ethernet address is burned into the Ethernet chip or the Ethernet card at the factory. You just get it; it's already there, so you don't have to do anything for that. But with IP addresses that won't work, because there are so many addresses, and all the routers need to be able to find a path to each individual IP address. So there would have to be billions of entries in routing tables. To avoid that, what we do is hand out blocks of IP addresses. For instance, a university has maybe a few thousand IP addresses, or it used to, before addresses became scarce, but there's only one entry in the routing table, and inside the university network they know where all the IP addresses go. So you usually get those from your ISP, but if you're an ISP yourself, or if you want to run BGP, you get them from the regional internet registry.
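To illustrate why handing out blocks keeps routing tables small, here is a minimal Python sketch using the standard `ipaddress` module. The university prefix `10.20.0.0/20` and the host addresses are hypothetical; the point is only that thousands of addresses sit behind a single table entry.

```python
import ipaddress

# Hypothetical university block: one /20 prefix covering 4,096 addresses.
UNIVERSITY = ipaddress.ip_network("10.20.0.0/20")

def routed_by_single_entry(hosts):
    """True if every host address falls inside the one university prefix."""
    return all(ipaddress.ip_address(h) in UNIVERSITY for h in hosts)

hosts = ["10.20.0.7", "10.20.9.200", "10.20.15.255"]
print(UNIVERSITY.num_addresses)        # 4096
print(routed_by_single_entry(hosts))   # True: one table entry covers them all
```

Outside routers only need the `/20`; which of the 4,096 addresses belongs to which PC is the university's own business.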

Robert Blumen 00:08:04 Say it wants one or a few IP addresses. Who does it request them from, and how are these requests handled?

Iljitsch van Beijnum 00:08:13 Okay. Now, suppose the university doesn't run BGP themselves. The easiest thing to do is just ask your ISP; usually that's basically part of the setup process when you become a customer. And usually you need at least 256 addresses, because that's the smallest block that is handled in BGP. So if the university then says, well, I want to connect to other networks as well, or to a second ISP, then they have to adopt BGP, and they probably have to get their own block of IP addresses at that point. If it's an American university, North America is served by ARIN, the American Registry for Internet Numbers. So they would have to become a member of ARIN and request 256 IPv4 addresses from ARIN.
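A block of 256 IPv4 addresses is a /24. Below is a minimal sketch of the corresponding length check, assuming the widely followed operator practice of not accepting IPv4 prefixes longer than /24 in global BGP; the function name `ipv4_prefix_accepted` is hypothetical, and real route filters check much more than length.

```python
import ipaddress

# Assumption: the common operator practice that IPv4 prefixes longer than /24
# (blocks of fewer than 256 addresses) are not accepted in global BGP.
MAX_ACCEPTED_IPV4_PREFIXLEN = 24

def ipv4_prefix_accepted(prefix: str) -> bool:
    """Hypothetical filter: is this IPv4 block large enough to announce on its own?"""
    net = ipaddress.IPv4Network(prefix)
    return net.prefixlen <= MAX_ACCEPTED_IPV4_PREFIXLEN

print(ipaddress.IPv4Network("203.0.113.0/24").num_addresses)  # 256
print(ipv4_prefix_accepted("203.0.113.0/24"))   # True
print(ipv4_prefix_accepted("203.0.113.0/25"))   # False: only 128 addresses
```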

Robert Blumen 00:09:05 I'm a home user, and I get an IP address from my ISP. Did my ISP go through that same process to get a block of IP addresses, which it then hands out to its customers? Yes. And that might be a much larger block, because the ISP has tons of home users.

Iljitsch van Beijnum 00:09:25 Right. So those are tens of thousands, hundreds of thousands, or even millions of addresses.

Robert Blumen 00:09:30 Okay. Can you provide a history of the concept of classful and classless IP addresses?

Iljitsch van Beijnum 00:09:37 So the thing is, I already talked about how an ISP would know only the range of addresses that are used in a university network. So there are bits in the IP address that are the same for all the IP addresses inside the university, and we call that the network part. The remaining bits are used to number the individual systems in the university; that is the host part. It used to be that there were three different classes of IP addresses. Class A, where the network part is very short, so you only have a few networks, but the host part is very long, so you can have lots of hosts in every network. And then class C, where it's the other way around: very many class C networks, and only 256 hosts per class C network.

Iljitsch van Beijnum 00:10:30 And then class B splits the difference in the middle. But at some point, for instance, a university, again, say in the nineties, needed to hand out 4,000 IP addresses to 4,000 PCs. Well, class C is too small; 256 doesn't cut it. Class A is 16 million addresses, way too much. So you get a class B, 65,000 addresses, but there are only 16,000 class B blocks, and you waste about 61,000 addresses and use only 4,000. That didn't work. So then what they said is, we'll just switch to class C and give, for instance, 16 class C blocks to one university. But then the routing tables started growing really, really fast; basically, the routers exploded. So then they said, well, let's get rid of this artificial limitation of these three classes and just cut wherever we want. And that is classless inter-domain routing.
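The arithmetic in the 4,000-PC example can be checked with a short sketch. It assumes the classic classful rules (class A, B, or C determined by the leading bits of the first octet, with 24, 16, and 8 host bits respectively) and counts raw block sizes, including network and broadcast addresses. The helper names are illustrative only.

```python
import ipaddress

HOST_BITS = {"A": 24, "B": 16, "C": 8}   # host-part size of each unicast class

def classful_class(addr: str) -> str:
    """Pre-CIDR class of an IPv4 address, read from its first octet."""
    first = ipaddress.IPv4Address(addr).packed[0]
    if first < 128:
        return "A"    # leading bit 0
    if first < 192:
        return "B"    # leading bits 10
    if first < 224:
        return "C"    # leading bits 110
    return "D/E"      # multicast and reserved

def smallest_class(hosts_needed: int):
    """Smallest single classful block that fits, and how many addresses it wastes."""
    for cls in ("C", "B", "A"):
        size = 2 ** HOST_BITS[cls]
        if size >= hosts_needed:
            return cls, size - hosts_needed
    raise ValueError("too many hosts for any single class")

def class_c_blocks(hosts_needed: int) -> int:
    """How many class C blocks (each a separate routing-table entry) are needed."""
    return -(-hosts_needed // 2 ** HOST_BITS["C"])   # ceiling division

print(smallest_class(4000))    # ('B', 61536)
print(class_c_blocks(4000))    # 16
```

Both options are visible here: one class B wastes over 61,000 addresses, while the class C alternative costs 16 routing-table entries instead of one.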

Robert Blumen 00:11:32 If I understood that, there are 32 bits in the IP address, and there have been a lot of changes over time in how many of those bits are the organization part, which is consistent across one organization, and how many are left for individual nodes on the network?

Iljitsch van Beijnum 00:11:50 No, no, no. It used to be that there were just three sizes, but now the size is whatever you want. So if you need, for instance, 400 addresses, you get what we call a /23: 23 bits are for the organization, and nine bits are left, which gives 512 addresses. So you only waste about a hundred.

Robert Blumen 00:12:12 I should ask briefly about IPv4 versus IPv6. Although that won't be the main focus of our discussion, how did things change with IPv6?

Iljitsch van Beijnum 00:12:22 Well, in this regard they didn't really change, except that addresses are now 128 bits. So a lot more bits.
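Because only the address length changed, the prefix-size arithmetic from the 400-address example carries over to IPv6 as-is. A minimal sketch; `smallest_prefix` is an illustrative helper, not a standard library function.

```python
def smallest_prefix(hosts_needed: int, version: int = 4) -> int:
    """Longest prefix length whose block still holds hosts_needed addresses."""
    total_bits = 32 if version == 4 else 128      # IPv6 simply has more bits
    if hosts_needed < 1:
        raise ValueError("need at least one address")
    host_bits = (hosts_needed - 1).bit_length()   # bits needed to number the hosts
    if host_bits > total_bits:
        raise ValueError("block larger than the whole address space")
    return total_bits - host_bits

plen = smallest_prefix(400)
print(plen)                          # 23
print(2 ** (32 - plen) - 400)        # 112 addresses unused
print(smallest_prefix(2 ** 64, 6))   # 64: an IPv6 block with 64 host bits
```

Using `bit_length()` instead of a floating-point logarithm keeps the computation exact for powers of two.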

Robert Blumen 00:12:30 Okay. Now I'd like to move on to talk about routing. I'm using some device, and I need to talk to another server out there, whether I'm sending an email or using the internet. How does the packet get from one IP address to another IP address, and how many different kinds of things does it have to cross on the way from A to B?

Iljitsch van Beijnum 00:12:55 Well, what happens is that software inside your computer creates IP packets. For instance, you send an email, and the mail is a bit longer, so it has to be split into a bunch of IP packets. Those all get an IP header with some information in it. The most important part of that information is the destination address. There's also the source address, so the return packets can come back, but the destination address guides the packet along the way. Your computer probably doesn't have any big routing table inside it, so what it does is send the packet to the default router. That's what you get through DHCP: as soon as you connect to a network, DHCP tells you what the default router is. So the packet goes there, and that router, if it's for instance a small home router, also has a default router.

Iljitsch van Beijnum 00:13:47 That's the other side of the line to the ISP. And then it gets to, for instance, the first ISP router, and then there's actually a decision to make. Do I go to the north or to the south? Which exit do I take out of the network? These routers get bigger and bigger, and they have more and more choices of where to send stuff. And then eventually it gets to the right ISP. Maybe there's a carrier in the middle, and eventually it gets to the destination ISP. Then it goes to the right router, the one the other side connects to, and over the line to the home router. That one finds the Ethernet address that goes with the IP address and delivers the packet over the Ethernet or Wi-Fi network.
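The host's part of this journey reduces to a single decision. The minimal sketch below assumes a hypothetical home network `192.168.1.0/24` whose default router `192.168.1.1` was learned via DHCP; a real host consults its full routing table, of which this is the simplest two-entry case.

```python
import ipaddress

# Hypothetical home-network settings, as DHCP might hand them out.
LOCAL_SUBNET = ipaddress.ip_network("192.168.1.0/24")
DEFAULT_ROUTER = ipaddress.ip_address("192.168.1.1")

def next_hop(destination: str):
    """Deliver directly if on the local subnet, else hand off to the default router."""
    dst = ipaddress.ip_address(destination)
    if dst in LOCAL_SUBNET:
        return dst            # on-link: the host looks up its Ethernet address itself
    return DEFAULT_ROUTER     # everything else follows the default route

print(next_hop("192.168.1.42"))  # 192.168.1.42
print(next_hop("198.51.100.7"))  # 192.168.1.1
```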

Robert Blumen 00:14:34 Each router is looking at its routing table, deciding where to send the packet next? Yes. And a routing table, it's some kind of a data structure. What is it?

Iljitsch van Beijnum 00:14:48 Do you want some details?

Robert Blumen 00:14:50 We're all about details on this podcast.

Iljitsch van Beijnum 00:14:52 Okay. So the thing is, there are actually three tables. There's a BGP table that stores all the BGP information. Then there's the main routing table, which collects all the information from all the protocols that run; there's usually an internal routing protocol within the AS, so there are two routing protocols. And then it goes to the forwarding information base, and that's the table that's actually used to forward the packets. That one usually gets millions of packets per second, or at least it's built to handle millions of packets per second, so you need to be able to go through a data structure really fast. There are basically two ways to do that. You use an ASIC that can search through a data structure in RAM really fast, or you use content-addressable memory, TCAM, ternary content-addressable memory, so it can have wildcard bits in your search query. That's basically memory with a tiny bit of processing power in it. Every memory cell can do a compare and see: is this the prefix, the address block, that this address falls within? And it says, yeah, that's me. So you don't need to go sequentially through a bunch of memory locations; the memory can do it itself. If it's in software, or if it's in RAM, then usually we use a tree, not a binary tree, but a tree with, let's say, 256 different branches at each level.

Robert Blumen 00:16:34 Okay, now, it wouldn't be feasible to have an entry for every single IP address. What I understood from your discussion is that it relies on the destination address falling within a range of IP addresses, by some of the higher-order bits matching, and that is considered a route match. Is that correct?

Iljitsch van Beijnum 00:16:56 Yeah, that's a prefix match. So basically, like I mentioned before, if you have a block of 512 addresses, then the organization part, the network part, is 23 bits. We write that down with a slash and the 23 at the end, so it's a /23. That means that in the data structure the remaining nine bits are left zero, but then you can have a mask, so you mask out the bits you don't want to match, or you can use some other mechanism. And the thing is, prefixes can overlap. I can have the /23, but within the /23 there are also two /24s. So if these are also in the routing table, an address matches the /23 but also matches one of the /24s. And then the rule is longest match first. So the entry with the lowest number after the slash, the shortest prefix, that one wins.
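A TCAM stores each prefix as a value plus a mask of care and don't-care bits and returns the first matching entry, so ordering the entries longest-prefix-first makes the longest (most specific) match win. The sketch below emulates that sequentially in software, using a /23 and one /24 inside it; the prefixes and the next-hop labels `A` and `B` are hypothetical.

```python
import ipaddress

def tcam_entry(prefix: str, next_hop: str):
    """Encode a prefix as (length, value, mask): mask bits of 1 must match, 0 = don't care."""
    net = ipaddress.IPv4Network(prefix)
    return net.prefixlen, int(net.network_address), int(net.netmask), next_hop

# Emulate TCAM priority by ordering longer prefixes first; the first hit wins.
# (Real TCAM compares all entries in parallel; this loop is sequential.)
TABLE = sorted(
    [tcam_entry("10.1.0.0/23", "A"), tcam_entry("10.1.1.0/24", "B")],
    key=lambda entry: entry[0],
    reverse=True,
)

def lookup(address: str):
    addr = int(ipaddress.IPv4Address(address))
    for _plen, value, mask, next_hop in TABLE:
        if (addr & mask) == value:
            return next_hop
    return None

print(lookup("10.1.1.5"))  # B: matches both the /23 and the /24; the /24 is longer
print(lookup("10.1.0.5"))  # A: only the /23 matches
print(lookup("10.2.0.1"))  # None: no entry covers it
```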

Robert Blumen 00:17:57 Okay. I'm glad you said that, because I was going to ask if there could be more than one match. That sounds to me like saying, if I know you live in a certain neighborhood, that's more specific than if I knew you lived in a certain city or region. So if we routed to the neighborhood, we'd be getting closer to you than if we just said, route it to the Netherlands.

Iljitsch van Beijnum 00:18:20 Right. So I misspoke just now. I said the smallest number after the slash, but it's actually the largest number after the slash, so the longest match. The example that I often use is: suppose you are driving from the East Coast to California, or actually to San Francisco, and the road splits and there are two signs. One says California to the left, and the other says San Francisco to the right. You need to go to San Francisco, and San Francisco is in California, so you go to the left, right? No, that doesn't make any sense, because why would there be a separate sign pointing in a different direction for something smaller? It doesn't make any sense to use the larger, less specific information. So we actually apply this algorithm ourselves as well, without really realizing it.

Robert Blumen 00:19:14 And how big, in terms of either the number of entries or maybe the number of megabytes or gigabytes, are routing tables these days?

Iljitsch van Beijnum 00:19:25 In BGP there are a bit under 900,000 IPv4 prefixes and about 125,000 IPv6 prefixes.

Robert Blumen 00:19:35 So one thing I've wondered about is that certain small countries have created a profit center by licensing their top-level domain, because it happens to match an English word, like .me, which I think is Montenegro. Suppose these routing tables have a premise that a bunch of things are close together because they're all in Montenegro, so we are going to route traffic for those domains to Montenegro, and those entities are assigned certain IP addresses. But now I'm in California, and I get a .me because it's cute and funny. Does that create issues, with routing not working the way it was conceived, because you have people all over the world who are now on this same top-level domain?

Iljitsch van Beijnum 00:20:29 Well, domain names and IP addresses are completely decoupled, because DNS sits in the middle and maps one to the other. So you can easily map one name to addresses that are used in Holland, and the next name, one letter up, to something used in South Africa, with completely different addresses.

Robert Blumen 00:20:50 Okay. So there's no reason to assume that a bunch of domains issued from the same place are going to have IP addresses that are also issued from the same ISP? No. Okay, so that was my flaw. Great. Now, within the routing table, could there be multiple alternative routes to the same range, or has something else, the thing that built the routing table, already decided what the best route is if there were multiple routes?

Iljitsch van Beijnum 00:21:21 Well, obviously, the whole idea is that you need to make a decision about where to send your traffic, so you always, or usually, have multiple options. BGP decides which option, which path, is the best one, and then it gives that one to the master routing table inside the router. And then maybe there's another protocol as well that also says, I can reach this, and then the two protocols have to duke it out in the master routing table. But as far as BGP is concerned, BGP knows what's best within BGP, except when paths are completely equal and you actually want to load balance across multiple paths, but then there are some special conditions that have to be met.
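Here is a heavily simplified sketch of the "which path is best" step. It uses only three of the tie-breakers from the BGP decision process (highest LOCAL_PREF, then shortest AS_PATH, then lowest MED) and, unlike real BGP, compares MED across all neighbors. The `Path` class is hypothetical, and the AS numbers and next-hop addresses come from the documentation ranges.

```python
from dataclasses import dataclass

@dataclass
class Path:
    """One candidate route to a prefix, as learned from one BGP neighbor."""
    next_hop: str
    local_pref: int      # local policy: higher is preferred
    as_path: tuple       # ASes the announcement passed through
    med: int = 0         # multi-exit discriminator: lower is preferred

def best_path(candidates):
    """Simplified decision: highest LOCAL_PREF, then shortest AS_PATH, then lowest MED."""
    return min(candidates, key=lambda p: (-p.local_pref, len(p.as_path), p.med))

paths = [
    Path("192.0.2.1", 100, (64500, 64501, 64502)),
    Path("198.51.100.1", 100, (64510, 64502)),
    Path("203.0.113.1", 90, (64502,)),
]
print(best_path(paths).next_hop)  # 198.51.100.1: shortest AS_PATH among LOCAL_PREF 100
```

The third path has the shortest AS_PATH overall, but local policy (LOCAL_PREF) is checked first, so it loses.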

Robert Blumen 00:22:05 So we may come back to that. My home computer has a simple routing table, which says anything that's not on my local network, send it up to my ISP. And then I would think my ISP would have a relatively simple routing table, because everything it connects to is going to go to one of a number of carriers or tier ones. So it only has to group things into eight or 10 buckets to know which carrier.

Iljitsch van Beijnum 00:22:39 Yeah. But the thing is, from the standpoint of that first router, it doesn't have very many options. It's like a phone book with only 10 different phone numbers in it. You could, for instance, shrink them down to one digit, but it's still the entire phone book, with every name in it; it's just the numbers that are short.

Robert Blumen 00:22:56 So, right. Okay. The number of values is small, but the number of prefixes is still large. Okay. And so, how are these routing tables... I want to build up to where I can then ask you what BGP is. The next question I have is, how are these routing tables constructed? If we have to talk about BGP first, then go ahead and answer that question however it makes the most sense.

Iljitsch van Beijnum 00:23:24 Like I said, a router will probably be running two, or maybe even a few more, routing protocols. Each routing protocol just says, I can reach this prefix, and usually attaches some value, a metric, to it, for how well it thinks it can reach it. Then the master routing table is built from that, and that one is then used to create the forwarding information base. So that's basically just manipulating data structures in software.
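Below is a minimal sketch of how a master routing table might arbitrate between protocols and then feed the forwarding table. The ranking values assume Cisco's default administrative distances (other vendors use different "preference" numbers), and the route offers are hypothetical.

```python
# Assumption: Cisco-style default administrative distances; lower wins.
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ebgp": 20, "ospf": 110, "ibgp": 200}

def build_rib(offers):
    """offers: (prefix, protocol, metric, next_hop) tuples from each routing daemon."""
    rib = {}
    for prefix, proto, metric, next_hop in offers:
        rank = (ADMIN_DISTANCE[proto], metric)   # protocol trust first, then metric
        if prefix not in rib or rank < rib[prefix][0]:
            rib[prefix] = (rank, proto, next_hop)
    return rib

def build_fib(rib):
    """The forwarding table keeps only what forwarding needs: prefix -> next hop."""
    return {prefix: next_hop for prefix, (_rank, _proto, next_hop) in rib.items()}

offers = [
    ("10.0.0.0/8", "ospf", 20, "10.255.0.1"),
    ("10.0.0.0/8", "ebgp", 0, "192.0.2.1"),
    ("172.16.0.0/12", "ospf", 30, "10.255.0.2"),
]
print(build_fib(build_rib(offers)))
# {'10.0.0.0/8': '192.0.2.1', '172.16.0.0/12': '10.255.0.2'}
```

Metrics from different protocols are not comparable, which is why the protocol's ranking is compared before the metric.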

Robert Blumen 00:23:52 Okay. So is there a program running on each router that is taking in information about routes and updating the routing table?

Iljitsch van Beijnum 00:24:03 Right. For instance, there's open source software that implements a bunch of routing protocols on Unix-like systems. It's called Zebra, and it has a daemon for every protocol and then one master daemon that gets all the information from all the other daemons and collects it into the master routing table. And then it goes into the kernel of the Unix system.

Robert Blumen 00:24:29 And then, when it sees changes that would impact the routing table, it applies an update to the routing table?

Iljitsch van Beijnum 00:24:38 Right. Yeah.

Robert Blumen 00:24:39 Okay. And how rapidly are routing tables changing over the course of the...

Iljitsch van Beijnum 00:24:45 Okay, well, OSPF is a widely used one inside an AS, and that one detects other routers going away or appearing within about 10 seconds, or a small multiple of 10 seconds. And if an existing router that's already connected to the other ones has an update, that can happen within a second. BGP takes a bit longer, because it covers the entire internet, especially for an update to be flooded all across the internet. But that could be within a few dozen seconds, or maybe one or two minutes to reach the entire internet. Right.
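The "about 10 seconds, or a small multiple of 10 seconds" matches OSPF's usual defaults on broadcast networks: a HelloInterval of 10 seconds and a RouterDeadInterval of 40 seconds. Here is a minimal sketch of the neighbor liveness check under those assumed, configurable values:

```python
# Assumed OSPF defaults for broadcast links (configurable per interface).
HELLO_INTERVAL = 10                  # seconds between hello packets
DEAD_INTERVAL = 4 * HELLO_INTERVAL   # 40 s without a hello: neighbor declared down

def neighbor_is_down(last_hello_at: float, now: float) -> bool:
    """True once a neighbor has been silent for the full dead interval."""
    return now - last_hello_at >= DEAD_INTERVAL

print(neighbor_is_down(100.0, 135.0))  # False: silent for 35 s
print(neighbor_is_down(100.0, 140.0))  # True: silent for 40 s
```

A new neighbor shows up with its first hello, so within one interval; a vanished one is noticed only after four missed hellos.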

Robert Blumen 00:25:28 Okay. So you mentioned OSPF. I'd like to drill down a bit into that. So first, do you know what it stands for?

Iljitsch van Beijnum 00:25:37 Open Shortest Path First. And shortest path first is the SPF, or Dijkstra, algorithm, by my fellow countryman who worked in Texas for a long time. That's an algorithm to find the shortest path between two places.

Robert Blumen 00:25:52 Okay. So what are the inputs to this algorithm and what does it produce?

Iljitsch van Beijnum 00:25:58 Basically, it's a graph: you have a bunch of nodes, and this one is connected to that one, and so on. Then it runs through that until it has determined the cost to reach every other node from the starting point.
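Dijkstra's algorithm, as SPF uses it, can be sketched in a few lines of Python with a binary heap. The four-router topology and link costs below are hypothetical. Besides the cost, the sketch records the first hop toward each router, since that is what a router ultimately needs for its routing table.

```python
import heapq

def spf(graph, source):
    """Dijkstra's shortest path first over graph: {node: {neighbor: cost}}.

    Returns (cost, first_hop): the lowest total cost to every reachable node,
    and the neighbor of source that begins that cheapest path.
    """
    cost = {source: 0}
    first_hop = {}
    done = set()
    heap = [(0, source, None)]           # (cost so far, node, first hop used)
    while heap:
        dist, node, hop = heapq.heappop(heap)
        if node in done:
            continue                     # stale entry; a cheaper path was already found
        done.add(node)
        first_hop[node] = hop
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = dist + link_cost
            if neighbor not in cost or new_cost < cost[neighbor]:
                cost[neighbor] = new_cost
                # Leaving the source, the first hop is the neighbor itself.
                heapq.heappush(heap, (new_cost, neighbor, hop or neighbor))
    return cost, first_hop

topology = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R4": 1},
    "R3": {"R1": 1, "R2": 2, "R4": 10},
    "R4": {"R2": 1, "R3": 10},
}
cost, first_hop = spf(topology, "R1")
print(cost)        # {'R1': 0, 'R2': 3, 'R3': 1, 'R4': 4}
print(first_hop)   # {'R1': None, 'R3': 'R3', 'R2': 'R3', 'R4': 'R3'}
```

The direct R1 to R2 link (cost 10) loses to the detour through R3 (cost 1 + 2), so all traffic leaves R1 via R3.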

Robert Blumen 00:26:13 Where you are. Okay. So let's back up to an earlier response you gave. You said there'll be a daemon running OSPF on each router, and it's getting updates that it can use to recompute what the graph looks like. Is that correct?

Iljitsch van Beijnum 00:26:33 So in OSPF there's actually what they call the OSPF database. That's basically the graph of the network, with a cost value attached to every pair of nodes that are connected. When there's an update, the router sends the update out to its other neighbors, and then it applies the update to its own database, runs the SPF algorithm again, and sees that it needs to take a different path to reach certain destinations, because now something has changed.
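Here is a heavily simplified sketch of the database side. Each router's advertisement carries a sequence number; a newer copy replaces the stored one and is flooded on to the other neighbors, while an old or duplicate copy is dropped. The class and method names are hypothetical, and real OSPF adds aging, checksums, and acknowledgments. After an accepted update, the router would rerun SPF over `graph()`.

```python
class LinkStateDB:
    """Hypothetical, simplified OSPF-like link-state database."""

    def __init__(self, neighbors):
        self.neighbors = list(neighbors)   # routers we flood updates to
        self.lsas = {}                     # origin router -> (seq, {neighbor: cost})

    def receive(self, origin, seq, links, from_neighbor=None):
        """Store a newer advertisement; return the neighbors to flood it to."""
        current = self.lsas.get(origin)
        if current is not None and current[0] >= seq:
            return []                      # old or duplicate: ignore, don't re-flood
        self.lsas[origin] = (seq, dict(links))
        return [n for n in self.neighbors if n != from_neighbor]

    def graph(self):
        """The topology SPF runs over: origin -> {neighbor: cost}."""
        return {origin: links for origin, (_seq, links) in self.lsas.items()}

db = LinkStateDB(neighbors=["R2", "R3"])
print(db.receive("R4", 1, {"R2": 1}, from_neighbor="R2"))             # ['R3']
print(db.receive("R4", 1, {"R2": 1}, from_neighbor="R3"))             # []
print(db.receive("R4", 2, {"R2": 1, "R3": 10}, from_neighbor="R3"))   # ['R2']
print(db.graph())   # {'R4': {'R2': 1, 'R3': 10}}
```

Dropping copies it already has is what keeps flooding from looping forever between routers.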

Robert Blumen 00:27:05 OSPF, if I understood this, maintains its own model of what it thinks the entire internet looks like?

Iljitsch van Beijnum 00:27:12 No, OSPF doesn't work internet-wide. It's what we call an IGP, an interior gateway protocol, an internal routing protocol. So it runs within a network operated by one organization, within one AS.

Robert Blumen 00:27:28 Okay. What is the extent of the graph that OSPF models?

Iljitsch van Beijnum 00:27:33 It's the connections between all the routers. So if you have, for instance, 20 routers, and on average they are connected to three others, that's 60 links that you have to put in the database. And then the list of prefixes that each router sends out into the network.

Robert Blumen 00:27:54 So things that would change the graph would be: a new router is added, a router goes away, or an existing router becomes aware of a change in its ability to reach parts of the internet. Are there any other types of events that would cause a rerun of SPF?

Iljitsch van Beijnum 00:28:13 Well, from the simple brain cells of this daemon running inside a router, it's very hard to tell the difference between a router going away and the link to a neighboring router going away. So I'm not sure whether OSPF treats those differently. But one disclaimer I have to make is that BGP, which I wrote the book on, is relatively simple: the BGP standard, I think the old one, is about 50 pages, while OSPF is 150 pages, much more complex. So I'm not an OSPF expert. Basically, you see a router on a network interface that wasn't there before. That could be because the router just turned on, or because a link came up. And the opposite: a router goes away and doesn't answer the keepalive packets, the hello packets, any more. That could be because the router went away, or it could be that the connection went away. So those are basically the two events. And then, of course, what can also happen is that a prefix goes away. The routers are still there, but now one of them says, don't send me traffic for this prefix anymore. Or a new prefix is advertised.

Robert Blumen 00:29:27 If I had in my routing table that this router was formerly the best route to that prefix, and now it says that prefix has gone away, don't send me any more traffic for it, does that force SPF to revise its notion of where the best route to that prefix is, and possibly change the routing table?

Iljitsch van Beijnum 00:29:50 It would change the routing table, but it wouldn't have any impact on SPF. SPF is just a graph of the connectivity between the routers. Then there's a second part of the database that maps the prefixes to the routers that advertise them.
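The split between topology and prefixes can be sketched like this. SPF results for the routers are taken as given, and prefixes are mapped onto them in a separate step, so a withdrawal only redoes the cheap mapping. All router names, costs, and prefixes are hypothetical, and real OSPF implementations differ in the details.

```python
# Results of an earlier SPF run, taken as given (hypothetical values):
# cost from this router to each other router, and the first hop on that path.
SPF_COST = {"R2": 2, "R3": 1, "R4": 5}
SPF_FIRST_HOP = {"R2": "R2", "R3": "R3", "R4": "R3"}

def prefix_routes(prefix_origins):
    """Map each prefix to the first hop toward its cheapest advertising router."""
    table = {}
    for prefix, origins in prefix_origins.items():
        reachable = [r for r in origins if r in SPF_COST]
        if reachable:
            best = min(reachable, key=lambda r: (SPF_COST[r], r))
            table[prefix] = SPF_FIRST_HOP[best]
    return table

advertised = {"10.4.0.0/16": {"R2", "R4"}, "10.3.0.0/16": {"R3"}}
print(prefix_routes(advertised))   # {'10.4.0.0/16': 'R2', '10.3.0.0/16': 'R3'}

# R2 withdraws 10.4.0.0/16: the router graph is unchanged, so there is no SPF
# rerun; only the prefix-to-router mapping is redone.
advertised["10.4.0.0/16"].discard("R2")
print(prefix_routes(advertised))   # {'10.4.0.0/16': 'R3', '10.3.0.0/16': 'R3'}
```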

SE Radio 00:30:08 Elastic enables the world's leading organizations to put their data to work using the power of search. Whether it's connecting people and teams with content that matters, keeping applications and infrastructure online, or protecting entire digital ecosystems, Elastic's search platform is able to surface relevant results with speed and at scale. Learn how you can get started with Elastic's search platform for free at elastic.co/se radio.

Robert Blumen 00:30:36 So I think with these building blocks, we're ready to take on BGP. I want to start with: what does it stand for?

Iljitsch van Beijnum 00:30:46 Well, BGP is the Border Gateway Protocol. And now you may ask yourself, what is a border gateway? Back in 1989, when BGP version 1 was created, they often used the word gateway for what we now call a router. So basically it's the border router protocol. And a border router, well, that makes sense: that's the last router in your network, which talks to the first router in the next network. So it's the protocol that the border routers in different networks use to talk to each other.

Robert Blumen 00:31:20 If you had to come up with a better name for it, one that's more in line with modern usage, do you have an idea?

Iljitsch van Beijnum 00:31:28 I think border router protocol would make more sense. The protocol used before we had BGP was EGP, the Exterior Gateway Protocol, so I don't think people would understand that name, and it was already taken in the past anyway. So something like inter-domain routing protocol, but that one is also used for something that nobody remembers anymore. So it's hard to find good names.

Robert Blumen 00:31:56 Okay. And what is BGP?

Iljitsch van Beijnum 00:31:58 Well, like I said, it’s a routing protocol that your routers use to talk to routers operated by other people. Okay.

Robert Blumen 00:32:08 And that is BGP. Could you give us a brief history of BGP?

Iljitsch van Beijnum 00:32:14 Well, the first version was in '89, and then within a few years they went first to version two and then to three. Version three was the one in use when the routing tables started to explode, because people went from class B networks to multiple class C networks. So they had to figure something out, and that was classless inter-domain routing, and BGP-4 is the BGP version that supports classless inter-domain routing. We're still using BGP-4. That was 1993, and it's now 2021, so that was a very successful protocol version.

Robert Blumen 00:32:55 Pretty stable.

Iljitsch van Beijnum 00:32:56 Yeah. Well, that doesn't mean that nothing has changed. For instance, right around the same time they created BGP-4, they were working on IPv6. So BGP-4 predates IPv6, but we can still use BGP-4 to route IPv6. That's because extensions were added to BGP-4, but they didn't have to go to a new version number.

Robert Blumen 00:33:21 Something I wanted to ask before, and I think it makes sense now: in terms of megabytes or gigabytes, how big are these routing tables?

Iljitsch van Beijnum 00:33:32 It's hard to say. The first time I ran BGP was in 1996, on a Cisco 2500 router. That one has a 25 megahertz 68030 CPU and 16 megabytes of memory, and that just about fit. There were five megabytes for BGP, with 30,000 prefixes, and five megabytes for the main routing table. We're now at about 30 times that, so that would be about 150 megabytes for each table, but that assumes the data structures are the same. Because memory is cheap now, it's probably a bit bigger than that, but on the order of a few hundred megabytes for one BGP feed. If you connect to multiple other networks, multiple routers, they all send a copy of their BGP table, so that can add up. So it's one copy of the BGP table for every BGP router that you talk to, then one extra for the main table, and a last one for the forwarding information base.
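The back-of-the-envelope estimate above can be written out. This sketch assumes, as in the answer, that memory per prefix stays at the 1996 ratio of 5 MB per 30,000 prefixes, and that the router holds one BGP table per peer plus one main routing table plus one forwarding table of similar size; the three-peer router is a hypothetical example.

```python
def estimate_routing_memory_mb(prefixes, bgp_peers, ref_table_mb=5, ref_prefixes=30_000):
    """Scale the 1996 reference (5 MB for 30,000 prefixes) linearly, then count
    one BGP table per peer, plus the main routing table, plus the FIB."""
    per_table_mb = ref_table_mb * prefixes / ref_prefixes
    return per_table_mb, per_table_mb * (bgp_peers + 2)

per_table, total = estimate_routing_memory_mb(900_000, bgp_peers=3)
print(per_table, total)  # 150.0 750.0
```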

Robert Blumen 00:34:40 Okay. I can do the math in my head. But to what extent are changes in how the internet works driven by the realistic amount of memory that you could put in a router?

Iljitsch van Beijnum 00:34:56 I don’t think that was a big limitation. It’s always possible to add more memory. It might be expensive, but there’s not really a limit on how much memory you can attach to a CPU, except, of course, when you have to jump from 32 bits to 64 bits. But I don’t think that was an issue; that jump happened for other reasons than purely memory size in routers. Even today you probably don’t need more than four gigs in any router, except maybe the largest ones.

Robert Blumen 00:35:28 Within the BGP protocol, what are the most important messages that are exchanged between routers?

Iljitsch van Beijnum 00:35:36 Well, there are basically only five messages. The main ones are the open message, which starts the whole thing; the update message, which tells the other router about prefixes, with some extra information attached, or withdraws prefixes that were sent in earlier updates; and then, when there are no updates to send, there are keepalive messages to make sure that the other side doesn’t think we’ve gone away.

Robert Blumen 00:35:51 Does the BGP network bootstrap itself when routers come on board?

Iljitsch van Beijnum 00:36:14 Well, an interesting thing about BGP is that, unlike all other routing protocols, it doesn’t automatically discover other routers. It needs to be configured on both routers for them to talk to each other. So when they are booted up and their network interface comes up, they start connecting to the IP address of the other router over TCP. When there’s a TCP connection, they send the open message and start exchanging information. Each router has one or more prefixes for the IP addresses used in its own AS, so they exchange those. Then maybe one of the routers connects to a third network and gets prefixes from that network, and it sends an update to the first one. So the more stuff connects, the more updates flow in all directions, and that’s how you would end up with those 900K prefixes in the table, if you turned off the entire internet and turned it back on at the same time, of course.
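To make the session start-up concrete: once the TCP connection to the configured neighbor is up (BGP uses TCP port 179), everything is exchanged as BGP messages that share a common 19-byte header. The sketch below builds and parses that header following the layout and type codes in RFC 4271, with ROUTE-REFRESH taken from RFC 2918. The function names are illustrative, and it does not implement the session state machine.

```python
import struct
from enum import IntEnum
from typing import Tuple


class BgpMessageType(IntEnum):
    """BGP message type codes (RFC 4271, plus ROUTE-REFRESH from RFC 2918)."""
    OPEN = 1
    UPDATE = 2
    NOTIFICATION = 3
    KEEPALIVE = 4
    ROUTE_REFRESH = 5


HEADER_LEN = 19          # 16-byte marker + 2-byte length + 1-byte type
MAX_MESSAGE_LEN = 4096   # maximum message size in RFC 4271
MARKER = b"\xff" * 16    # the marker field is all ones


def build_message(msg_type: BgpMessageType, body: bytes = b"") -> bytes:
    """Prefix a message body with the common BGP header."""
    length = HEADER_LEN + len(body)
    if length > MAX_MESSAGE_LEN:
        raise ValueError("BGP message too long")
    # Network byte order: unsigned 16-bit length, unsigned 8-bit type.
    return MARKER + struct.pack("!HB", length, msg_type) + body


def parse_header(data: bytes) -> Tuple[BgpMessageType, int]:
    """Return (message type, total length) from the first 19 bytes."""
    if len(data) < HEADER_LEN or data[:16] != MARKER:
        raise ValueError("not a valid BGP header")
    length, msg_type = struct.unpack("!HB", data[16:HEADER_LEN])
    return BgpMessageType(msg_type), length


# A KEEPALIVE carries no body: it is just the 19-byte header.
keepalive = build_message(BgpMessageType.KEEPALIVE)
print(len(keepalive), parse_header(keepalive))
```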

Robert Blumen 00:37:21 So if you are going to add a new router in an ISP, then you need to configure your other routers to say, for BGP purposes, here’s a new router that you need to connect to that you did not know about before.

Iljitsch van Beijnum 00:37:37 Yeah, that’s a really annoying limitation, because the job of the BGP routers is to talk to other networks, but they also have to coordinate their information with each other. So they also need to talk to the other BGP routers in your own network. Originally, the basic rule was that every BGP router in an AS must talk directly to every other one. That way you can’t have loops in the information, because it can only come from the source. Now, if you have a hundred routers and you put in number 101, you have to log in to a hundred routers and add a BGP neighbor for the new one. Well, hopefully if you have a hundred routers, you have some automated system for that. But of course that is quite unworkable. So there are solutions to get around that limitation.
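The scaling problem with the full-mesh rule is easy to quantify. A sketch, assuming plain iBGP sessions counted once per router pair: n routers need n(n-1)/2 sessions, so router number 101 adds 100 of them. For comparison it also counts a simplified route-reflector design (RFC 4456), one of the standard workarounds alongside confederations (RFC 5065). In that design every client peers only with each reflector and the reflectors are fully meshed among themselves; this topology is an illustrative assumption.

```python
def full_mesh_sessions(routers: int) -> int:
    """iBGP sessions when every router peers directly with every other one."""
    return routers * (routers - 1) // 2


def sessions_added_by_new_router(existing: int) -> int:
    """Extra sessions (one config change per existing router) to join a full mesh."""
    return full_mesh_sessions(existing + 1) - full_mesh_sessions(existing)


def route_reflector_sessions(clients: int, reflectors: int) -> int:
    """Simplified route-reflector design: each client peers with every reflector,
    and the reflectors form a full mesh among themselves."""
    return clients * reflectors + full_mesh_sessions(reflectors)


if __name__ == "__main__":
    print(full_mesh_sessions(100))             # 4950
    print(sessions_added_by_new_router(100))   # 100
    print(route_reflector_sessions(98, 2))     # 197
```

With reflectors, adding a client only touches the reflectors rather than every router in the AS.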

Robert Blumen 00:38:30 I think this illustrates a general principle you see in a lot of things: we have all these great protocols like DNS and BGP that help our applications discover things, but at some point something can’t be discovered. Something has to know where stuff is.

Iljitsch van Beijnum 00:38:51 Right.

Robert Blumen 00:38:53 Okay. Now, suppose I’m an ISP and I’m going to add a new router that I want to interconnect with a tier one or other ISP. Do I have to tell them, “Guys, I’m adding this new router, here’s the IP address”? Whichever one of your routers you want to connect to me now has to know about this new IP address.

Iljitsch van Beijnum 00:39:15 Yeah. If you have an existing router and you replace it, you just put all the information from the old one into the new one, and then basically the other side doesn’t have to know anything. Well, you probably want to tell them, “I’m going to do maintenance, so we’ll be down for an hour or something,” but there’s no change for them. But usually the way it works is that if you want to connect a new router, of course it has to connect over some network connection. So usually you order a connection from an ISP, and then you talk about the BGP settings on the two sides that you’re going to use.

Robert Blumen 00:39:54 Okay. And what happens if a router cannot connect to an IP address where it believes there should be another router?

Iljitsch van Beijnum 00:40:04 It just keeps retrying.
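“Keeps retrying” matches how the BGP state machine is specified: RFC 4271 restarts a ConnectRetryTimer and tries the TCP connection again, with a suggested default ConnectRetryTime of 120 seconds (actual router defaults vary). A minimal sketch of that loop, assuming a hypothetical `connect` callable that stands in for opening TCP port 179 to the configured neighbor; injecting `sleep` keeps it testable.

```python
import time
from typing import Callable, Optional

CONNECT_RETRY_SECONDS = 120   # suggested ConnectRetryTime in RFC 4271


def connect_with_retry(
    connect: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: Optional[int] = None,
) -> int:
    """Keep trying to reach a configured neighbor; return the attempt that worked.

    `connect` is a hypothetical stand-in for opening the TCP session and
    returns True on success. max_attempts=None retries forever, as a router does.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        if connect():
            return attempt
        sleep(CONNECT_RETRY_SECONDS)   # wait for the retry timer, then try again
    raise ConnectionError(f"no session after {attempt} attempts")
```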

Robert Blumen 00:40:06 Keeps trying. Okay. Now let’s drill down a bit more into the update message. What are the fields in the update?

Iljitsch van Beijnum 00:40:18 Basically it’s all binary, right? This is all from the nineties, so no XML or anything. There are three parts: two parts that have a length, and then, because the message itself also has a length, the length of the last part is implied. The first part is the NLRI, that is, network layer reachability information, and that is a really fancy way of saying “here are my prefixes.” So that’s just IP address prefixes. Then we get the path attributes. That’s additional information attached to these prefixes. These attributes all have their own structure, because they’re all different; some are optional and some are required. And then the last part is the withdrawn routes. Those are prefixes that are no longer reachable. So that’s what an update looks like.

Robert Blumen 00:41:18 So an update is a router saying, here are some prefixes which I am able to route to, or here are some prefixes which I am no longer able to route to. Yes. Okay. You’re a router. You’re getting BGP updates, and the updates tell you that certain routes that you were not aware of before now exist, or that routes which you had have gone away. And then that drives the routing algorithm, which will then eventually apply updates to the routing table, if either you have a new route that’s better, or a route that was the best route is no longer available. Was any of that correct?

Iljitsch van Beijnum 00:41:59 Yeah, that’s right. And then there’s a third thing that can happen: you have a prefix that was already there, but now the path attributes have changed because there was some update somewhere else. For instance, the path got longer. So it’s still reachable, but now, because it’s longer, maybe you want to use another one.

Robert Blumen 00:42:22 Okay. So previously it took me five hops to get to a certain address range, but the topology of the network between me and that address has changed, and now it takes seven hops. So you want the other routers to know that, because if it’s gone from five to seven, that may no longer be the shortest route.

Iljitsch van Beijnum 00:42:47 Right. It could be that it still uses the longer one, because the length of the path is not the most important thing, but it is important. So it could easily be that it now selects another one.
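The three cases (a new route, a withdrawn route, and a changed route for a prefix that is already known) can be sketched as a toy routing table. Per RFC 4271, a new announcement for a prefix from the same neighbor replaces the previous one, which is how a longer path shows up. To stay short, the sketch assumes “best” means shortest AS path, with ties broken by neighbor name; as the answer notes, real BGP weighs other things too. The class and method names are hypothetical.

```python
from typing import Dict, Tuple

AsPath = Tuple[int, ...]


class ToyRib:
    """Per-neighbor routes (Adj-RIB-In) plus the selected best route per prefix.

    Simplification: best = shortest AS path, ties broken by neighbor name.
    """

    def __init__(self) -> None:
        self.adj_rib_in: Dict[str, Dict[str, AsPath]] = {}   # neighbor -> prefix -> path
        self.best: Dict[str, Tuple[str, AsPath]] = {}         # prefix -> (neighbor, path)

    def _reselect(self, prefix: str) -> str:
        """Recompute the best route for one prefix and report what changed."""
        candidates = [(len(routes[prefix]), neighbor, routes[prefix])
                      for neighbor, routes in self.adj_rib_in.items()
                      if prefix in routes]
        old = self.best.get(prefix)
        if not candidates:
            self.best.pop(prefix, None)
            return "removed" if old else "unchanged"
        _, neighbor, path = min(candidates)
        self.best[prefix] = (neighbor, path)
        if old is None:
            return "added"
        return "unchanged" if old == (neighbor, path) else "changed"

    def announce(self, neighbor: str, prefix: str, path: AsPath) -> str:
        # A new announcement for a prefix this neighbor already sent replaces the old route.
        self.adj_rib_in.setdefault(neighbor, {})[prefix] = path
        return self._reselect(prefix)

    def withdraw(self, neighbor: str, prefix: str) -> str:
        self.adj_rib_in.get(neighbor, {}).pop(prefix, None)
        return self._reselect(prefix)
```

Replaying the five-to-seven-hop example: a five-hop route from one neighbor beats a six-hop route from another, but once the first neighbor re-announces the prefix with a seven-hop path, the six-hop route takes over.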

Robert Blumen 00:43:00 Yeah. That gets into what we mean by shortest or best route. What kind of metric are we using to decide on the best route?

Iljitsch van Beijnum 00:43:09 Well, I’m glad you asked, because there are 13 simple rules. It’s actually a fairly involved algorithm. And the thing is that you need to resolve this; you can’t say, “Okay, I don’t know, I can’t make a choice.” You have to make a choice. In the BGP specification, it goes up to g. How many is that? That is seven, plus another one, so that’s eight. And like I said, the 13, that’s on Cisco. Cisco has a few extra that they invented themselves, and most other routers use the same logic as Cisco. So do you want me to discuss the main ones?
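A compressed sketch of that decision process, assuming the eight standard steps mentioned: highest LOCAL_PREF, then RFC 4271’s tie-breakers a through g (shortest AS_PATH, lowest ORIGIN, lowest MED, eBGP over iBGP, lowest interior cost to the next hop, lowest BGP Identifier, lowest peer address). Two simplifications are deliberate. MED is compared across all routes, whereas RFC 4271 only compares it between routes from the same neighboring AS, and vendor-specific steps such as Cisco’s weight are left out. Written as a sort key, the first differing field decides.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple

ORIGIN_IGP, ORIGIN_EGP, ORIGIN_INCOMPLETE = 0, 1, 2   # lower is preferred


@dataclass
class Route:
    peer_addr: str                  # address of the neighbor that sent the route
    router_id: int                  # BGP Identifier of that neighbor
    local_pref: int = 100
    as_path: Tuple[int, ...] = ()
    origin: int = ORIGIN_IGP
    med: int = 0
    ebgp: bool = True
    igp_cost: int = 0               # interior cost to reach the NEXT_HOP


def decision_key(r: Route):
    """Smaller tuple = more preferred route."""
    return (
        -r.local_pref,                          # highest LOCAL_PREF
        len(r.as_path),                         # a) shortest AS_PATH
        r.origin,                               # b) IGP < EGP < INCOMPLETE
        r.med,                                  # c) lowest MED (simplified, see above)
        0 if r.ebgp else 1,                     # d) eBGP over iBGP
        r.igp_cost,                             # e) lowest interior cost
        r.router_id,                            # f) lowest BGP Identifier
        int(ipaddress.ip_address(r.peer_addr)), # g) lowest peer address
    )


def best_path(routes: List[Route]) -> Route:
    return min(routes, key=decision_key)
```

Because LOCAL_PREF comes first, an operator’s preference for one neighbor can beat a shorter AS path, which is why path length matters but is not decisive.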

Robert Blumen 00:43:50 You know, we have a bit of time left, and I’d like to save it for another topic, which is what can go wrong with BGP. As I understand it, BGP is based on a trust system where, if I’m a router and I say, “Hey, I have some great routes to these prefixes,” other routers trust that. Is that correct?

Iljitsch van Beijnum 00:44:16 Yes and no. The idea, of course, is that people may make mistakes. If you sign up with an ISP, buy a book about BGP, and start typing, you could make a mistake; it’s possible. So what ISPs really should do, and usually do, is have filters that only accept from their customers what those customers are supposed to send: only the prefixes that they know belong to that customer. For simple customers that only have one or a few prefixes, that’s fine; that works. There are, of course, some ISPs that don’t do this, and then bad stuff happens sometimes. But the trouble is when ISPs connect to each other and they all have hundreds of customers with a few prefixes each. That’s a thousand prefixes for one ISP, so that would be a very long filter, and also a filter that changes every week. So it’s not doable to filter for that manually. So basically the big issue is between ISPs. If you don’t have any mechanism to make sure only the correct stuff gets in, then, well, I don’t know if that means you trust them, but you don’t really have another option. Fortunately, we do have a relatively new mechanism, RPKI, that helps, but it’s not foolproof.
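Both mechanisms mentioned here can be sketched in a few lines. The first function is a prefix-list style customer filter: accept an announcement only if it falls inside a block assigned to that customer and is not more specific than allowed. The second is RPKI route origin validation as described in RFC 6811: a route is valid if some covering ROA authorizes its origin AS at that prefix length, invalid if ROAs cover it but none match, and not-found otherwise. It checks only the origin AS, not the rest of the path, which is one reason it is not foolproof. The function names, tuple formats, and example prefixes and AS numbers (from documentation ranges) are illustrative.

```python
import ipaddress
from typing import Iterable, Tuple


def _within(prefix, block) -> bool:
    """True if prefix lies inside block (same IP version required)."""
    return prefix.version == block.version and prefix.subnet_of(block)


def customer_filter_accepts(prefix: str, allowed: Iterable[Tuple[str, int]]) -> bool:
    """Accept only prefixes inside an allowed (block, max_length) entry."""
    p = ipaddress.ip_network(prefix)
    return any(_within(p, ipaddress.ip_network(block)) and p.prefixlen <= max_len
               for block, max_len in allowed)


def rov_state(prefix: str, origin_as: int, roas: Iterable[Tuple[str, int, int]]) -> str:
    """Route origin validation; each ROA is (prefix, max_length, authorized AS)."""
    p = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_as in roas:
        if not _within(p, ipaddress.ip_network(roa_prefix)):
            continue
        covered = True                       # some ROA covers this prefix
        if p.prefixlen <= max_len and origin_as == roa_as:
            return "valid"
    return "invalid" if covered else "not-found"
```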

Robert Blumen 00:45:51 I am aware from some security news sites that sometimes an ISP, either maliciously or by accident, advertises routes that it does not own. How can that happen?

Iljitsch van Beijnum 00:46:05 Oh, there are a bunch of ways. There’s actually an RFC from the IETF that lists six of them, and you can think of a few others. Do you want some detailed examples?

Robert Blumen 00:46:17 Yeah, sure. That would be great.

Iljitsch van Beijnum 00:46:18 Okay. The most famous one is the whole YouTube Pakistan incident in 2008. What happened there is that the Pakistani government didn’t like some videos on YouTube, so they told the ISPs in the country, “I want you to block YouTube.” One ISP did that by creating a route in the routing table that points to a null interface, so all the packets that match that route basically go away. That’s a really good way to get rid of packets you don’t like without having to set up all kinds of firewalling rules. But what they also had was a mechanism where all the locally known routes were injected into BGP. So without anyone specifically telling the router to put that null route in BGP, that happened. And then it went out to their ISP, which didn’t filter customer routes. So that ISP got the prefix for the YouTube servers from this Pakistani ISP and sent it out to the rest of the world. And to make things even worse, it was a longer prefix, so the longest-match-first rule kicked in, which completely overrode other considerations such as the length of the path. So even though the path was long, it would still draw all the traffic for the YouTube streaming servers to the Pakistani ISP, where it disappeared. So YouTube became unreachable.
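Longest-prefix match is why the bogus announcement won regardless of path length: a more specific prefix is a different route, so both end up installed, and the forwarding lookup always picks the most specific prefix that contains the destination. A minimal sketch with hypothetical private-range prefixes standing in for the real ones (a covering /22 from the legitimate origin and a more specific /24 from the bad announcement). Real routers use tries or specialized hardware rather than this linear scan.

```python
import ipaddress
from typing import Dict, Optional


def lookup(fib: Dict[str, str], destination: str) -> Optional[str]:
    """Return the next hop of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, next_hop)
    return best[1] if best else None


# Hypothetical table: the /24 wins for any address it covers.
fib = {
    "10.0.0.0/22": "legitimate origin (short path)",
    "10.0.1.0/24": "hijacker (long path)",
}
print(lookup(fib, "10.0.1.10"))
print(lookup(fib, "10.0.2.10"))
```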

Robert Blumen 00:47:49 How long did it take for people to figure out what happened and fix it?

Iljitsch van Beijnum 00:47:56 Well, it was a long time ago. I think people started realizing what was going on pretty quickly, within maybe 10 or 15 minutes. And then there are these forums where operators talk to each other, such as NANOG, the North American Network Operators’ Group, so they warn each other that this is going on. And then I think people started filtering out this incorrect information in BGP. I don’t know how long it took to actually be solved and go away. If I had to say something, I think some number of hours.

Robert Blumen 00:48:37 That sounds like it was a mistake, but are there security attacks involving BGP, where you’re intentionally trying to route traffic somewhere that it doesn’t really belong?

Iljitsch van Beijnum 00:48:50 Yeah. The thing is, it’s hard to tell. For instance, there was one time in 2010 where, for I think 15 minutes or something, a huge part of the internet was routed to China Telecom, and people were asking: is this an attack, are they trying something to see if it works, or was it just a stupid mistake? But there are things that were obviously attacks. For instance, one thing I’ve heard about, although I don’t think I’ve seen any actual detailed write-ups, is where spammers take unused IP address space, announce it in BGP, and start spamming, because these addresses are unknown to the anti-spam software. Then they go away, and nobody can see where it came from. I’m not sure to what degree that actually happens. But there was one incident, and I don’t know too many of the details, where someone injected the IP addresses of a DNS server into BGP to send out fake DNS replies, to reroute a domain name, to intercept cryptocurrency.

Robert Blumen 00:49:58 Last thing I’d like to ask, since this is Software Engineering Radio. As a software engineer, I don’t get exposed much to BGP. If I’m running some application in a particular data center and I’m going to move it physically somewhere else, I might reach for DNS: I’ll get a new IP address in my new data center and then change the DNS record to serve the new IP address. But are there cases where I want to take the IP address with me when I move something?

Iljitsch van Beijnum 00:50:40 There are a bunch of applications where they hard-code IP addresses, sometimes as a way to limit the number of licenses that can be used or something, so that’s always very annoying. But I think the main case where you would want to do that for good reason is if you want very high-availability or very high-performance services on the internet. Of course, if you put a service somewhere on the other side of the earth, it takes a long time for the packets to get there, and if it goes down, then you’re gone. So then you would like to use anycast. That means you have servers with the same IP address in different places; this is something that happens a lot with DNS in particular. BGP will then route the packets to the closest one, so you have the best performance. But then the thing is, if the service stops working, you need to withdraw the prefix from that location so the rerouting can happen to another location. So there you have to have a tight integration between monitoring the service and influencing BGP.
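The tight coupling between service monitoring and BGP can be sketched as a small controller that withdraws the anycast prefix after several failed health checks and re-announces it after several good ones. The `announce` and `withdraw` hooks are hypothetical placeholders for whatever drives the site’s BGP speaker, and the thresholds are assumptions chosen so that a single failed check does not make the route flap.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AnycastSite:
    """Announce or withdraw one site's anycast prefix based on health checks.

    `announce` and `withdraw` are hypothetical hooks into the local BGP speaker.
    """
    prefix: str
    announce: Callable[[str], None]
    withdraw: Callable[[str], None]
    fail_threshold: int = 3      # consecutive failures before withdrawing
    recover_threshold: int = 2   # consecutive successes before (re)announcing
    announced: bool = False
    _fails: int = 0
    _oks: int = 0

    def report(self, healthy: bool) -> None:
        """Feed one health-check result into the controller."""
        if healthy:
            self._oks, self._fails = self._oks + 1, 0
            if not self.announced and self._oks >= self.recover_threshold:
                self.announce(self.prefix)   # traffic can be drawn to this site
                self.announced = True
        else:
            self._fails, self._oks = self._fails + 1, 0
            if self.announced and self._fails >= self.fail_threshold:
                self.withdraw(self.prefix)   # BGP reroutes to another site
                self.announced = False
```

Counting consecutive results in both directions trades a few seconds of reaction time for stability, since every announce or withdraw propagates through BGP to the rest of the internet.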

Robert Blumen 00:51:46 Great. Okay, that makes sense. I could see that for DNS, where a lot of services have DNS servers hard-coded by IP address, so it would break a lot of things if you issued a new IP address for your DNS server. You really are stuck with it, right? Okay. I mentioned your book that you’ve already published, which is available everywhere. Your new ebook, when will that be available?

Iljitsch van Beijnum 00:52:18 Well, the thing is, life keeps getting in the way. And writing is like programming: it always takes longer than you think. So hopefully in maybe six weeks or so I’ll be finished, and it will be up on the Amazon and Apple ebook stores. And of course, if you look me up on Twitter, I’ll send out a tweet to tell everyone about it. It’s very easy to find me, because you just have to type my first name, and then you find all the links to everything.

Robert Blumen 00:52:46 Do you have any other presence on the internet you’d like people to check out?

Iljitsch van Beijnum 00:52:51 Yeah. When I wrote the book for O’Reilly, I created a website which, with some modesty, I called BGP Expert, bgpexpert.com. But I basically moved that stuff to iljitsch.com, my first name dot com, where I have sections for IPv6, for BGP, and for some personal stuff. So that’s a good way to keep track of what I write and what I do.

Robert Blumen 00:53:17 Thank you very much for speaking to Software Engineering Radio.

Iljitsch van Beijnum Thank you for having me.

Robert Blumen For Software Engineering Radio, this has been Robert Blumen. Thank you for listening.

SE Radio 00:53:29 Thanks for listening to SE Radio, an educational program brought to you by IEEE Software magazine. For more about the podcast, including other episodes, visit our [email protected]. To provide feedback, you can comment on each episode on the website or reach us on LinkedIn, Facebook, Twitter, or through our Slack [email protected]. You can also email [email protected]. This and all other episodes of SE Radio are licensed under Creative Commons License 2.5. Thanks for listening.

[End of Audio]

SE Radio theme: “Broken Reality” by Kevin MacLeod (incompetech.com — Licensed under Creative Commons: By Attribution 3.0)

