SE Radio 527: Adrian Kennard and Kevin Hones on Writing a Network OS from Scratch

Adrian Kennard and Kevin Hones, founders of FireBrick routers and firewalls, discuss how to design, build, test, and support a hardware router and network operating system from scratch. Host Gavin Henry spoke with them about a vast array of topics, starting with component choices, embedded operating system design, testing, and release cycles. The conversation explores more detailed areas like configuration management, Ethernet packet processing, RF engineering, power engineering, VoIP, network protocol design, RFCs, documentation, broadband, network monitoring, semaphores, CE marks, EMC testing, IPv6, L2TP, electromagnetic compatibility, emissions and immunity, EN55022/EN55024, safety EN60950, XML, XSD, JSON, and not being afraid to create something that fits your exact requirements and no more.

Show Notes

Other References

Adrian’s Blog – RevK®’s ramblings
Firebrick Website – FireBrick – For Firewalls, Bonding ADSL, Routers, Traffic Shaping…
Firebrick History – History
Kevin’s business – Business Network Solutions
Adrian’s ISP – A&A – Home

Transcript

Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Gavin Henry 00:00:16 Welcome to Software Engineering Radio. I’m your host, Gavin Henry, and today my guests are Adrian Kennard and Kevin Hones. Adrian has worked in software and telecom for over 40 years and has watched the internet from the start. He’s worked for SDL, Nokia, on GSM standards and even on Tote machines for race tracks. He’s an IPv6 and open software advocate with lots of published works on GitHub. He currently works at Andrews & Arnold Ltd. (AAISP), which he started over 25 years ago, and is the founder and lead developer of FireBrick Routers/Firewalls. Kevin has worked in hardware and software and telecoms since the early 1980s. He has enjoyed microcontrollers that range from 4 to 64 bits, and power electronics. He has experience in communication and network technologies from serial, PSTN and ISDN through to 10Gig Ethernet. He currently works at Andrews & Arnold Ltd., which he started in 1999, and is the founder and lead hardware designer at FireBrick Routers/Firewalls. Adrian and Kevin, welcome to Software Engineering Radio. Is there anything I missed in your bio that you’d like to add, or did we cover everything?

Adrian Kennard 00:01:24 I think that’s very comprehensive.

Kevin Hones 00:01:26 Think that’s fine. Yeah. I never know what to say about myself.

Gavin Henry 00:01:30 Just a note for you guys and the listeners: this is my first ever show where I’ve had two guests, so hopefully it won’t be messy. We just need to bear in mind that we’re going to chat over each other, potentially. I’m really looking forward to this, but you’ll need to take your turn, however excited you get; that’s the risk. So we’re going to have a chat about five or six topics, hopefully about 10 minutes each, related to the creation of the FireBrick router, which you can tell me more about in a minute. So let’s start. Adrian, am I correct in my understanding that you designed and built an ISP carrier-grade router from scratch?

Adrian Kennard 00:02:04 Well it takes a little bit of explaining here because this is a series of products over more than two decades. So what we started with was a much smaller product. But yes, we do now have equipment that is in ISP networks, such as ours and Kevin’s and many others that handles many thousands of customers, broadband connections as a full ISP grade router. So, yes.

Gavin Henry 00:02:30 So why on earth did you decide to build your own hardware and software from scratch?

Adrian Kennard 00:02:35 So I’ll let Kevin explain a bit about the hardware to start with, then.

Gavin Henry 00:02:39 Okay. Thanks. That’d be great.

Kevin Hones 00:02:41 Well, back in 1999, when we started this, there wasn’t anything like what there is now just available off the shelf. My background’s in designing industrial control equipment and things, and we figured, well, how hard is it to do such a thing? We basically need a microcontroller with enough resources and some Ethernet controllers; how difficult could it be to do that? And we were literally sat around talking about such things one day and we decided, let’s do this. Adrian’s side was software, mine hardware. From a hardware point of view, it was very much a mainstream thing that we did at the time, designing with microcontrollers, so we got the data sheets and started putting a design together, meanwhile talking to Adrian about what software we were going to run on this thing.

Gavin Henry 00:03:28 It does seem like a common thread; we often hear the sentence, “How difficult can it be?” You know, you have no idea what you’re getting into, but you give it a shot anyway.

Adrian Kennard 00:03:37 Oh I think it should perhaps be our motto, how hard can it be? Yes.

Kevin Hones 00:03:39 And now we know how hard it is.

Gavin Henry 00:03:43 So could you give me an overview of the main components probably in version one or something that you created to give us an idea of what you shouldn’t have taken on?

Kevin Hones 00:03:52 Well, by modern standards, it’s very, very primitive. It was a good-for-its-time Hitachi microcontroller from their H8S family, which is basically a 16-bit machine. We had two Ethernet controllers running at the speed of 10 megabits a second on it, an Ethernet hub, and about a megabyte of RAM and some Flash memory built into the thing. If anybody’s interested in the specifics, it was an H8S/F2357F microcontroller.

Gavin Henry 00:04:23 I’ll get some links off you and put it all in the show notes.

Kevin Hones 00:04:25 By all means, and it all sat in a fairly small metal box with an external 12-volt, small wallwart type power supply. It all went together reasonably well. So, we got some hardware up and running in quite quick order and put it in front of Adrian.

Adrian Kennard 00:04:43 Yeah. That’s where it got fun.

Gavin Henry 00:04:44 So did all the electronics speak to each other at that point or…?

Kevin Hones 00:04:48 Pretty much. There were some minor things — there always are some minor things — but the fundamentals, it worked, it talked to its controllers. It spoke Ethernet, which was smiles all around.

Gavin Henry 00:04:59 Excellent. And what was Adrian’s software remit at that point?

Adrian Kennard 00:05:02 Well, we started, Kevin already had a very simple task-switching sort of operating system for the Hitachi H8S. So we had to write everything from scratch, basically. This is the first time we’d done anything with Ethernet, and so the software had to handle Ethernet packets at the lowest level of bytes that come in. The hardware didn’t even have DMA, so we had to actually have a loop in the software to transfer byte by byte from the Ethernet controller to receive packets and send packets. So very, very simple, very basic stuff.

Gavin Henry 00:05:36 What’s DMA?

Adrian Kennard 00:05:37 Sorry, Direct Memory Access. These days Ethernet controllers will transfer the packets directly into memory. They will handle whole queues of packets being stored for you, all behind the scenes, in the hardware. And the software can then go in and look at the header of a packet and manipulate it without even having to bring anything else in from memory, so it’s very quick. But back in those days, the Ethernet controller was so simple we had to literally read a byte at a time of a packet and put it in memory, and then write it out a byte at a time to the other controller to send it on its way. So very low level. And we had to write everything from scratch, building up from there, with IP and TCP and HTTP for the web interface and so on. So a lot of work in software.
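To make the byte-at-a-time transfer Adrian describes concrete, here is a minimal Python simulation. It is only a sketch: the `SimpleEthernetController` class and its one-byte data register are hypothetical stand-ins, not the actual Hitachi-era hardware interface, and real firmware would do this in C or assembler against memory-mapped registers.

```python
from collections import deque


class SimpleEthernetController:
    """Hypothetical controller exposing a single one-byte data register."""

    def __init__(self):
        self._fifo = deque()

    def load_frame(self, frame: bytes) -> None:
        # Simulation helper: pretend a frame has just arrived from the wire.
        self._fifo.extend(frame)

    def bytes_waiting(self) -> int:
        return len(self._fifo)

    def read_data_register(self) -> int:
        # Each read hands over exactly one byte of the frame.
        return self._fifo.popleft()

    def write_data_register(self, value: int) -> None:
        # Each write queues exactly one byte for transmission.
        self._fifo.append(value)


def receive_frame(controller: SimpleEthernetController) -> bytearray:
    """Copy a received frame into memory one byte at a time (no DMA)."""
    buffer = bytearray()
    while controller.bytes_waiting():
        buffer.append(controller.read_data_register())
    return buffer


def send_frame(controller: SimpleEthernetController, frame: bytes) -> None:
    """Write a frame out to a controller one byte at a time."""
    for value in frame:
        controller.write_data_register(value)


wan, lan = SimpleEthernetController(), SimpleEthernetController()
wan.load_frame(b"\x01\x02\x03\x04")
packet = receive_frame(wan)  # CPU loop: one register read per byte
send_frame(lan, packet)      # CPU loop: one register write per byte
```

With DMA, both loops disappear from the software: the controller places whole frames in memory itself and the CPU only touches the headers it needs.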

Gavin Henry 00:06:21 And this was what, 1999?

Adrian Kennard 00:06:24 Yeah. That’s when we started. And this was before modern broadband had even got off the ground. The very first FireBricks were coming out, in terms of working hardware, as we were installing the very first broadband lines. So it was really early on.

Gavin Henry 00:06:40 Wow. And what does a FireBrick router look like now?

Adrian Kennard 00:06:44 Well, it’s moved on. Back then it was a small metal case with one WAN port (so the Wide Area Network, the outside) and four LAN ports as a hub. These days, we have two main products. The smaller one is very similar; it’s a slightly bigger metal box. It still has five ports on it, but they can be configured pretty much any way you like, and you can even plug in a fiber on this small box, which is kind of aimed at the sort of home or office gateway product with firewalling. But we also have a larger 1U-high, 19-inch rackmount box, which provides internet-grade gigabit routing. And we are working on the successors to both of those, where we are looking at 10-gigabit, but they’re all made in the UK, unlike a lot of routers and firewalls. So, it’s all sort of designed, hardware and software, and actually manufactured in the UK.

Gavin Henry 00:07:35 Thanks, Adrian. What I think we’ll focus on for the rest of the show is the kit that you can get now. It was a good journey and I’ll make sure we put some links in for those that want to look at the original chip sets. So I’m going to move us on, Adrian and Kevin. We’ll talk about, let’s say, a version that’s available this year or the past couple of years, and we’re going to talk about the various decisions you had to make. Selecting the components to build the reach, I think would be a good place to start.

Adrian Kennard 00:08:00 It’s probably worth thinking a little bit about what we are selecting right now in terms of the hardware for the, the next generation, as well as part of this, I suspect.

Gavin Henry 00:08:08 Yeah. If that fits better, let’s go for that, because obviously you’ve got new decisions to make and supply chain changes with what’s going on in the world.

Kevin Hones 00:08:16 Well, that is the biggest issue at the moment.

Gavin Henry 00:08:19 Yeah, so version. Is there a version train for these things? What do you call the next-gen one that you’re working on?

Kevin Hones 00:08:25 Well, the current product for the small devices is the FB2900 and the current data center product, which is very old now, is the FB6000.

Gavin Henry 00:08:36 So is that the one you’re looking to redo?

Kevin Hones 00:08:37 That is in the process. There is very nearly a product called an FB9000.

Adrian Kennard 00:08:43 We have prototypes.

Kevin Hones 00:08:44 We have prototypes. They work. It isn’t finished, but it’s a very good work in progress. The biggest limitation to when it’ll be something people can buy won’t actually be development for a change. It will be component availability. As you touched on just now, supply chain issues: they affect us just like they’re affecting pretty much the entire world. There are components which are completely ordinary components from an engineer’s point of view that if you try to buy, they’ll tell you, you might be able to get them in 52 weeks, but we can’t even promise that. It’s unprecedented. We’ve never seen anything quite like it. So we do have a very good manufacturing company who assembles the PCBs for us and does the buying and they’re doing the best job they can of finding things. We just have to hope that that comes up trumps soon enough.

Gavin Henry 00:09:32 So let’s take a step back from supply. And if either yourself, Kevin or Adrian, wants to take us through the design process of this is what we’d like to put in it. This is how we think it’s going to work. We can write some software with it, but until we actually get our hands on it, we’re not going to know if it all works because. . .

Adrian Kennard 00:09:48 That is very much the challenge here. Looking just at the data sheets, you have a very good idea that it will do what you want, but not the exact details. We are building the FB9000 with 10-gigabit ports, for example, but it’s likely to be a maximum of 10 gigabits of throughput through those, even though there are two ports, because of the way the hardware works. And we didn’t really appreciate exactly how that was going to play together until we had the boards built and the software working, ran performance tests, and worked out that that’s the best it’s going to do on those ports, which is fine for the product we want to build here. It’s a 10-gigabit ISP-grade router, mainly as an LNS, which is what handles things like broadband connections. So it’s really good for that. And the two ports provide the redundancy, but learning that lesson is a complicated process that you can’t just glean from a datasheet, sadly.

Gavin Henry 00:10:42 Yeah. And you also have to match customer expectations for the fact they’ve got two ports.

Adrian Kennard 00:10:47 Oh, very much so. And now we understand exactly how this works, it’s going to be very clear in the documentation that the two ports are primarily for redundancy, which is a very important factor in a data center. You typically connect them to different switches in a cluster so that if you have to reboot a switch for any reason, or it fails, everything carries on seamlessly, which is, you know, essential when you’re running ISP-grade type stuff.

Gavin Henry 00:11:11 So if you were to take the case off of the FireBrick 9000, what would you see before you, component-wise?

Adrian Kennard 00:11:18 Oh, they look lovely.

Kevin Hones 00:11:19 What you’d see: you’d initially see a heat sink covering the main event, the CPU underneath it. If you took the lid off that, you’d see a CPU which looks superficially like the CPU in a PC or something. It isn’t; it’s not an x86-based system. It’s an ARM-based system. In this particular case, it’s one from TI, and it’s got four cores running at about one and a half gigahertz, I think. Again, by modern PC standards, that doesn’t actually sound a huge amount. But the way it works with our systems, which Adrian will explain later, actually gives incredibly good performance with that hardware. Around that, you’d see a very large PCB with a couple of fans on it. The whole philosophy of FireBricks for data centers has been to engineer them to last. So there are two fans. It’s actually marginal whether a fan is needed at all, because another nice thing about ARMs is they’re very low power. But it’s going to carry on working even if one fan fails; the whole thing is done like that. The power supplies, which form a fair bit of the design, are very over-rated. The end result of this is it’s very efficient. It runs very cool and it’s…

Adrian Kennard 00:12:32 Very green as well in that respect, low power.

Kevin Hones 00:12:34 To point, it is indeed very green because the CPU uses a very low amount of power for the job it’s doing. Along the front of the case, you’ll see a row of 10 SFPs. We’ve decided for the data center units to stick with SFPs rather than have any copper ports at all.

Gavin Henry 00:12:50 And what does that stand for, for the non-networking listeners?

Kevin Hones 00:12:53 What is it?

Adrian Kennard 00:12:54 That’s a good point. What are SFPs? It’s one of those acronyms we use all the time and you don’t necessarily know what it exactly stands for, yes.

Kevin Hones 00:13:02 Pass on that. Apologies, it’s just an industry bit of jargon, I guess.

Adrian Kennard 00:13:08 But it’s a shell with a connector that lets you plug in your choice of network connection. It could be a single fiber, a dual fiber, which is more common transmit and receive, or even a copper port, like an ordinary Ethernet connection. And you can choose what to plug in. That’s the key thing there.

Gavin Henry 00:13:24 Yeah. So a little rectangle square that you slot in. I think it’s “small form pluggable” or something like that.

Kevin Hones 00:13:30 That could well be the case. Yes. Yeah. Sounds like.

Gavin Henry 00:13:31 I’ll put some links in.

Kevin Hones 00:13:35 So then at the sides of this unit, carry on with the description, there are two power supply boards. We’re using a bought in modular power supply, which takes incoming mains and turns it to 12 volts. We have two of them for resilience as well, of course. Two completely separate mains feeds. They’re combined on the main board, and a row of pretty flashing lights at the front above the ports. Pretty much describes the whole thing.

Adrian Kennard 00:14:00 One of the clever things there that Kevin hasn’t mentioned is that, in a data center, where you want to plug the power at the front or the back is always a controversial issue. Some kit has it at the back, some at the front, and sometimes you want the network connections at the back or the front, and it’s a pain in the neck. And what we’ve chosen to do is make these power supplies reversible. You can have them both at the back, both at the front, one of each, if you really wanted, which would be a little bit bizarre, but they unplug and swap round.

Gavin Henry 00:14:30 Yeah. So that’s the standard, sort of, cupboard-sized rack that you’d slide a bit of equipment into for the listeners that aren’t familiar with rackable equipment. You see it on nice marketing pictures. So one of the main business use cases for the whole thing was that there was nothing like this that you wanted out there and it’s extremely power efficient.

Adrian Kennard 00:14:51 Yes. These days, of course, there’s lots of different routers, especially for an internet service provider. But when we started, having a firewall itself wasn’t even something that you necessarily had. When broadband first launched, one of the clever things the very early models did is they could sit in your network and firewall. And they had to do this because the routers you could get from BT at the time, would have a single subnet on them. You’d have a sort of joining subnet to connect between your router and your firewall, and then another one on your firewall these days. But you couldn’t do that with the BT router. It had a single subnet and didn’t have any firewall. So what you’d get as a broadband service didn’t have firewalls. People weren’t attacking your network. It was rare when we first started, you look at the logs and see, oh, someone’s attacking me. This is exciting.

Adrian Kennard 00:15:37 It’s not like that these days; it’s a steady stream of all sorts of attacks. So there really wasn’t anything back then, and there wasn’t anything we could just buy in and use. There weren’t Raspberry Pis, for example, which you might just write your own software on entirely. So we had to start from scratch, and we’ve taken that philosophy forward. The current FireBrick we revamped completely when we moved to an ARM platform, so we started from scratch: completely new Ethernet controller drivers and network stack. And we built in IPv6 from scratch at that point as well. So the current version of internet protocol, IP version 6, is built in from the ground up in the software now.

Gavin Henry 00:16:21 Thank you. And Kevin, you touched on the CPUs and ARM 64 bit. Is that correct?

Kevin Hones 00:16:26 This one’s actually an ARM 32-bit.

Gavin Henry 00:16:29 Okay, is that what we’ve got in our mobile phones or?

Kevin Hones 00:16:31 No, you’ve probably got something more advanced in your mobile phones these days. The things that we tend to use in industrial control are usually a few years behind the cutting edge that appears in phones, because one of the things, supply chain issues aside, is we want continuity of supply, and industrial parts tend to be things that you can design in now and still buy from the manufacturer in a decade’s time if you need to. As a result of that, they tend to be a little behind the frontage, but they’re perfectly adequate for switching 10 gigabits of Ethernet, which is what we need them to do for this product.

Gavin Henry 00:17:04 And is there a concept of RAM or memory in this?

Kevin Hones 00:17:08 Very good point. There is; there’s a single SODIMM socket, in which I think we have eight gigabytes of SDRAM, which again doesn’t sound a huge amount by modern PC standards, but actually for a router, it’s plenty.

Adrian Kennard 00:17:23 Oh, it’s luxury. I can’t remember what we started with. It was tiny.

Kevin Hones 00:17:27 The very first Brick had a megabyte, eight gigs is quite a luxury.

Gavin Henry 00:17:32 Thank you. That’s a good summary of what we’ve got today. I think even from the latest model or, you know, up until that point, you can argue forever on this one, I think, but which is the hardest part, the software or the hardware?

Kevin Hones 00:17:45 Actually, I would concede on this one, the amount of work that goes into the software exceeds that in the hardware. So it’s also never ending. The hardware is a discrete thing. Once you’ve built it and it’s in manufacture, you don’t need to do a great deal apart from component sourcing.

Adrian Kennard 00:18:01 Oh, I remember the days when software was like that and you could make a software and it was put in a mask ROM and it was done, but no, it is never ending now.

Gavin Henry 00:18:09 So you are constantly waiting for Adrian, Kevin?

Kevin Hones 00:18:12 It’s not quite like that. I tend to be moving on to the next product in the line by the time Adrian’s in full flow on the current product. It’s just, there’s a phase shift. The hardware has to exist before the software can be done, but once it exists, there’s often some more hardware needs to be done.

Adrian Kennard 00:18:31 So to be fair, you do make it sound a little bit like it’s just me and Kevin. We do now have a bit of a team working on all of this. And thankfully I’m not having to spend all of my time working on the software at the moment. And the same with the hardware, there’s people doing PCB layout and things like this as well. So it isn’t just the two of us, thankfully.

Gavin Henry 00:18:50 Thank you. And if you feel confident enough, could you give me one disaster that you overcame, an example of?

Kevin Hones 00:18:56 Oh, easily software or hardware?

Gavin Henry 00:18:59 I’ll give you a minute on each.

Adrian Kennard 00:19:00 You go first, Kevin.

Kevin Hones 00:19:04 Thank you. Well, we’ve not had any huge disasters. In the current FB9000, which is most topical, we’ve had a few challenges, in particular to do with clock chips. That’s probably something that, as a radio guy, is going to be quite obvious to you, but things like 100 MHz oscillators are not trivial things to make. Good, we’re using bought-in ones. Well, it turns out there’s actually a huge difference in practice between different oscillators from very good manufacturers, in particular with jitter. And we did have one particularly thorny problem, which took a while to diagnose, which turned out to be one brand of oscillator that jittered in a way which prevented 10 gigabits from working well, which is obviously a fairly fundamental thing for a 10-gigabit router.

Gavin Henry 00:19:54 Now it gives you your timing, does it?

Kevin Hones 00:19:56 Yes, the basic timing for the processor and the Ethernet subsystems. It was difficult because you had to be looking at it in the right way to actually find it electrically. If you looked at it with the normal tools, oscilloscopes and frequency counters, it was bang on, but the jitter showed up best as a spectrum analyzer plot where you could see, as well as the peak at a hundred megahertz, side bands of noise that in this case were far higher than they should have been. And once we got rid of those, suddenly the 10 gig was working rock solid.

Adrian Kennard 00:20:28 Yeah, the trick was just to use a different manufacturer.

Kevin Hones 00:20:30 In this case. And we’d had some that worked, so we knew the 10 gig worked. It’s just that it didn’t on some of the prototypes.

Gavin Henry 00:20:37 But that comes down to, you know, almost 30 years’ experience how to troubleshoot things.

Kevin Hones 00:20:42 Very much so. Yeah.

Gavin Henry 00:20:44 And the time delay with getting a new component as well.

Kevin Hones 00:20:47 To task as well. So that’s probably the closest we’ve had to a disaster on the 9000 in terms of design.

Adrian Kennard 00:20:52 I think we had something with the 6000 where the first ARM processor we were using turned out to be a horrible bodge of different components of different speeds and behaved very strangely. And we essentially moved on to a completely different chip afterwards, didn’t we?

Kevin Hones 00:21:07 That’s a good point. The first one was a very early Intel XScale, which is another ARM architecture. It was a 3-chip chip set and the chips didn’t integrate very well. Fortunately, we never ended up having to use that in production because Intel came up with a one-chip solution, which worked far better.

Adrian Kennard 00:21:26 And that’s when we started the software from scratch to do the ARM software. And thankfully that was the same software on that other chip set, essentially with very minor changes, so we could move forward. In terms of the software, I’m not sure there were disasters necessarily, unless you count OSPF? We’ll mention that later, but we have had some challenges.

Gavin Henry 00:21:49 That’s routing protocol, guys, if anyone’s listening.

Adrian Kennard 00:21:53 It’s a horrible routing protocol, but that’s just my opinion. We did have some interesting challenges when we started all this and we had the smaller FireBrick. Because we were only selling very slow broadband lines, like 500K, we only had a 2-megabit link into BT in our offices in Reading. And that grew surprisingly quickly. Broadband was a thing we were just trying out, as in, will this take off? We had no idea, and so we stopped selling new lines quite quickly because people would have slow service. But we ended up having to build traffic shaping into the FireBrick to manage the speeds of business and residential customers at different times of day, and time profiles to understand what time of day it was. And we built those features into the software very quickly to handle the demand from customers on a small link while we waited for BT to spend months installing a bigger link for us in a data center. So we had to work fairly quickly to overcome a requirements change that we weren’t expecting in the early FireBricks. And those features are still in there now.
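As a rough illustration of shaping by customer class and time of day, the following Python sketch pairs a time profile with a token bucket. The token bucket is a common shaping technique chosen here for illustration; the transcript does not say how FireBrick implements shaping, and the profile table, class names, and rates below are all invented.

```python
from datetime import time

# Hypothetical time profiles: allowed rate (bits per second) per customer class.
PROFILES = [
    (time(9, 0), time(18, 0), {"business": 1_500_000, "residential": 500_000}),
    (time(18, 0), time(23, 59, 59), {"business": 500_000, "residential": 1_500_000}),
]
DEFAULT_RATES = {"business": 1_000_000, "residential": 1_000_000}


def rate_for(customer_class: str, now: time) -> int:
    """Pick the shaping rate for a customer class at a given time of day."""
    for start, end, rates in PROFILES:
        if start <= now < end:
            return rates[customer_class]
    return DEFAULT_RATES[customer_class]


class TokenBucket:
    """Token bucket: packets pass only while tokens (bits) remain."""

    def __init__(self, burst_bits: int):
        self.burst_bits = burst_bits
        self.tokens = float(burst_bits)

    def refill(self, rate_bps: int, elapsed_seconds: float) -> None:
        # Tokens accumulate at the current rate, capped at the burst size.
        self.tokens = min(self.burst_bits, self.tokens + rate_bps * elapsed_seconds)

    def allow(self, packet_bits: int) -> bool:
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False


bucket = TokenBucket(burst_bits=12_000)
first = bucket.allow(12_000)    # uses the whole burst allowance
second = bucket.allow(12_000)   # no tokens left: packet must wait or be dropped
bucket.refill(rate_for("residential", time(10, 30)), elapsed_seconds=0.025)
third = bucket.allow(12_000)    # enough tokens have accumulated again
```

Because the rate is looked up at each refill, switching from the daytime to the evening profile changes how quickly a customer’s bucket refills without touching any per-packet logic.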

Gavin Henry 00:22:54 And that gives you some reassurance or quite a lot of reassurance that your software development practice is in good shape because you can move quite quickly and get those things in place with confidence.

Adrian Kennard 00:23:04 Oh, definitely. And we’ve had to do some, well, you’re going to ask about features later, when I’ll explain some of the things that we’ve done during the pandemic, for example, where we’ve had to react quickly to changes in requirements.

Gavin Henry 00:23:15 Excellent. I think that’s a good place to move us on to Adrian’s remit now and his team, the operating system. Thanks Kevin, for that last bit. So you’ve designed the hardware and you’ve got to have some type of operating system to speak to it. Can you take me through process management, network stack?

Adrian Kennard 00:23:30 Yeah. The key thing here is the operating system isn’t like the operating system you may be familiar with in a PC or a Linux box or something like that. There you have an operating system as a sort of baseline. You can then install your own programs. And the operating system has to protect the users from themselves very much because it could be any program. With an embedded system like this, the operating system does play an important role. It does manage the different processes and memory management and semaphores and signals and so on, but it’s not having to quite play the same role where it’s unexpected end user software being thrown at it. The whole system is tightly controlled. It only runs our software. So there isn’t quite the same dividing line between the operating system and the application that you would see normally. In some ways that makes life a lot easier.

Adrian Kennard 00:24:20 But in other ways it means the whole lot’s one big product we have to manage and test all together, rather than separate things necessarily. The original simple process-switching stuff that we had in the very first FireBrick was redone as part of moving towards an ARM processor. And it has to allow lots of different processes to run, although they’re generally not starting and stopping dynamically. They can do, but mostly they’re all fixed processes that do a particular job as part of the overall function and have to work together with each other, with messages between them. So that’s the sort of process management, if that makes sense.

Gavin Henry 00:24:54 So that would be, is it a process or a daemon or a server that would take in network packets and then do something with them?

Adrian Kennard 00:25:01 Yeah. There’s actually a surprisingly large number of processes. You can go into the web interface and get a list of them. So there are things to handle packets; that’s mostly done on interrupts rather than in a separate process. We try and shift packets in and out as quickly as possible, but there are processes to handle each protocol. So things like BGP, DRP and so on, DHCP, they all have processes that run. And there are queues of packets that go into those processes that they then handle and send out packets. The whole job’s packets in, packets out, one way or another.

Gavin Henry 00:25:34 And so if we had a packet come in through the Ethernet interface, as it were, could you take us through a flow of that?

Adrian Kennard 00:25:41 Yeah, sure. Fortunately, we do have this DMA, direct memory access. So we get an interrupt to say there’s one or more packets waiting, and there are two key sorts of paths for those packets. If we are passing the packet through (we are acting as a router or as a firewall, or doing network address translation, whatever), the packet comes in, we work out where it’s going, and we may have to make changes to the header. The simplest is just the Ethernet address it’s going to, to send it on to the next gateway, but we may have to make changes in the IP layer, things like network address translation, and even add or remove headers for tunneling protocols. But we make those changes and we send the packet on its way, and that’s all handled in the interrupt to move that packet in and out as quickly as possible.

Adrian Kennard 00:26:24 However, there’s a lot of functionality where the FireBrick is the end point of the communications. So any of the protocols (accessing its web interface, talking BGP, DHCP, et cetera) involve the packet coming in and being put in a queue. That queue then causes a process that’s waiting for packets on that queue to run, pull in that packet, do its job and send it on its way. And that’s handled more as a sort of main task that’s task-switched between the different processes, and the queues have semaphores, so it wakes up the right process. That’s separate from the shifting of packets in and out as quickly as possible for routing.
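The two packet paths Adrian describes can be sketched as a single dispatch function. This is an illustrative Python model only: the addresses, the route table, the queue names, and the `on_receive_interrupt` function are all hypothetical, and a real router would do this work in an interrupt handler on raw frames rather than on dictionaries.

```python
import ipaddress
from collections import deque

# Hypothetical configuration for the sketch (documentation address ranges).
LOCAL_ADDRESSES = {ipaddress.ip_address("192.0.2.1")}
ROUTES = [(ipaddress.ip_network("198.51.100.0/24"), "02:00:00:00:00:02")]
PROTOCOL_QUEUES = {"bgp": deque(), "dhcp": deque(), "http": deque()}
transmitted = []  # stands in for the outgoing Ethernet controller


def next_hop_mac(destination):
    """Return the Ethernet address of the next gateway, if a route exists."""
    for network, mac in ROUTES:
        if destination in network:
            return mac
    return None


def on_receive_interrupt(packet: dict) -> str:
    destination = ipaddress.ip_address(packet["dst"])
    if destination in LOCAL_ADDRESSES:
        # Slow path: the router is the end point, so queue the packet for
        # the process that handles this protocol.
        queue = PROTOCOL_QUEUES.get(packet["protocol"])
        if queue is None:
            return "dropped"
        queue.append(packet)
        return "queued"
    mac = next_hop_mac(destination)
    if mac is None:
        return "dropped"
    # Fast path: rewrite the Ethernet destination and send straight out.
    transmitted.append({**packet, "eth_dst": mac})
    return "forwarded"


result_transit = on_receive_interrupt({"dst": "198.51.100.7", "protocol": "udp"})
result_local = on_receive_interrupt({"dst": "192.0.2.1", "protocol": "bgp"})
```

The design point is that transit traffic never waits for a task switch, while traffic addressed to the router itself is handed to ordinary scheduled processes through queues.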

Gavin Henry 00:27:01 You mentioned the word semaphore there. Could you just explain to the listeners what that is and how you use it in the router?

Adrian Kennard 00:27:07 Yeah, it’s a flag or a counter sort of thing; it’s used for things like knowing whether there’s a message in a queue, or if you need to lock out two things trying to do something at the same time. And it’s important that it’s part of the operating system, because you can have a process waiting on a semaphore; it’s waiting until a packet’s ready or something. And so the operating system knows not to even try running that process, because it’s waiting. And as soon as the semaphore is set to the right state, it can then add one or more processes that are waiting onto the queue of processes to run and make sure they all run when they’re meant to.

Gavin Henry 00:27:44 Is that similar to mutex or is that something completely different?

Adrian Kennard 00:27:48 Well, it’s all part of the same mechanism in the operating system. It’s used for a mutex where it’s a semaphore that’s just one or naught, but it can also be used as a counter.
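To make the semaphore description concrete, here is a minimal single-threaded sketch in C of a counting semaphore that parks waiting tasks so a scheduler knows not to run them. Everything here (the `task` and `semaphore` structs, `sem_wait_task`, `sem_signal_task`, `MAX_WAITERS`) is hypothetical and invented for illustration; it is not FireBrick code, and a real kernel would also need to make these operations atomic, for example by masking interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8   /* arbitrary limit for this toy model */

/* Hypothetical task states for a toy scheduler. */
enum task_state { TASK_READY, TASK_WAITING };

struct task {
    int id;
    enum task_state state;
};

struct semaphore {
    int count;                          /* units available; 0 or 1 when used as a mutex */
    struct task *waiters[MAX_WAITERS];  /* FIFO of tasks blocked on this semaphore */
    size_t n_waiters;
};

/* Try to take one unit. If none is available, the task is parked and
 * the scheduler should skip it until sem_signal_task() wakes it.
 * Returns true if the unit was taken immediately. */
bool sem_wait_task(struct semaphore *s, struct task *t)
{
    if (s->count > 0) {
        s->count--;
        return true;
    }
    assert(s->n_waiters < MAX_WAITERS);
    s->waiters[s->n_waiters++] = t;
    t->state = TASK_WAITING;
    return false;
}

/* Release one unit: hand it straight to the oldest waiter if there is
 * one, otherwise bank it in the counter for a later wait. */
void sem_signal_task(struct semaphore *s)
{
    if (s->n_waiters > 0) {
        struct task *t = s->waiters[0];
        for (size_t i = 1; i < s->n_waiters; i++)
            s->waiters[i - 1] = s->waiters[i];
        s->n_waiters--;
        t->state = TASK_READY;   /* back on the run queue */
        return;
    }
    s->count++;
}
```

Starting the count at 1 and never letting it rise above 1 gives the mutex behaviour mentioned above; starting it at 0 and signalling once per queued packet gives the counter behaviour.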

Gavin Henry 00:27:57 And does this go back to what you said, Kevin, about the oscillator being the key thing to make sure that all moves along for the right speed predictably?

Kevin Hones 00:28:05 Yeah. The oscillator is the fundamental system clock, which all computers have. In a way, yes. It’s a bit like a metronome, but rather higher speed, telling the insides: do something, do something, do something. The whole architecture of modern electronics works around that, like its heartbeat.

Adrian Kennard 00:28:22 Yeah. So the software does have sort of like a heartbeat. It has timers, it has functions that run periodically. But a lot of what we are doing is based on queues of packets. So the interrupt controller says it’s got a packet, puts it on a queue for a particular process. And then the operating system has to decide which process to run next, depending on which processes are more important, which have been waiting too long, and which have things waiting in their queue. And it makes that decision and runs the relevant process to handle that next job.
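The scheduling decision described here weighs three things: importance, how long a process has waited, and whether its queue has work. A rough C sketch of one possible policy follows. The scoring rule, the `STARVATION_TICKS` threshold, and all names are assumptions made up for this example; the interview does not say how the FireBrick scheduler actually weighs these factors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process bookkeeping. */
struct proc {
    int priority;      /* higher means more important */
    unsigned queued;   /* packets waiting in this process's queue */
    unsigned waited;   /* ticks since the process last ran */
};

#define STARVATION_TICKS 100u   /* assumed threshold for "waiting too long" */

/* Return the index of the process to run next, or -1 if no process
 * has anything queued. Processes with empty queues are never chosen;
 * a process that has waited too long gets a large score boost. */
int pick_next(const struct proc *p, size_t n)
{
    int best = -1;
    long best_score = 0;

    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (p[i].queued == 0)
            continue;                       /* nothing to do: stay asleep */
        long score = p[i].priority;
        if (p[i].waited >= STARVATION_TICKS)
            score += 1000;                  /* starved: jump the line */
        if (best < 0 || score > best_score) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}
```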

Gavin Henry 00:28:52 So what looks after if one of these processes has an issue or is slow or disappears?

Adrian Kennard 00:28:59 Ah, well it’s an embedded system. So as I said, it’s a little bit different to your average user programs running on a PC where yes, they can hang up or go wrong. Basically, they don’t, or rather they shouldn’t. So no, a process can’t really lock up like that. It has to get on and do its job. There are built-in software and hardware watchdogs just in case something unexpected does happen. And that actually causes the whole system to reset and generate a report that’s emailed to us to tell us that something stupid happened, and those are relatively rare. It’s not like a PC where you might stop that task and restart it. It shouldn’t stop. That’s the whole point.

Gavin Henry 00:29:39 Okay. Thank you. And you spoke about the packet coming in, depending on what it looks like it might go straight out to its next pop or endpoint or the router itself might have some type of services on it that it will use that packet for and make replies and things. So obviously that has loads of different protocols involved in there. You have to write them all, I take it?

Adrian Kennard 00:30:00 Absolutely. And when a packet comes in, it’s just a sequence of bytes and you have to break it down. It starts with MAC addresses and then it has internet protocol, IP headers, and then it might have UDP or TCP or IPsec or something else. And then there’s payloads in that. And even when you get up to TCP, you’ve then got protocols on top of that, like HTTP for the webpages and BGP, which is a routing protocol to manage routes between routers. So all of these layers have their own protocols, and we’ve had to write everything from scratch to do all of that, largely because of where we started from: there weren’t readily available embedded system IP stacks you could use. So we had to write them, and these days it’s more policy. We’ve had to write them, we build on them, and we do write all our own protocols.
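The layering described above can be illustrated with a short C sketch that walks a raw frame from the Ethernet header through IPv4 to the TCP or UDP ports. This is a simplified example under stated assumptions: untagged Ethernet II, IPv4 only, no fragment handling, and no checksum validation. The `flow` struct and `parse_frame` function are names invented here, not anything from the FireBrick code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what the parser extracts. */
struct flow {
    uint8_t  ip_proto;            /* 6 = TCP, 17 = UDP */
    uint32_t src_ip, dst_ip;      /* host byte order */
    uint16_t src_port, dst_port;  /* host byte order */
};

/* Network byte order (big-endian) readers. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the frame is truncated or is not
 * untagged IPv4 carrying TCP or UDP. */
int parse_frame(const uint8_t *f, size_t len, struct flow *out)
{
    assert(f != NULL && out != NULL);

    /* Layer 2: 6 bytes dst MAC, 6 bytes src MAC, 2 bytes EtherType. */
    if (len < 14 + 20)
        return -1;
    if (rd16(f + 12) != 0x0800)              /* EtherType for IPv4 */
        return -1;

    /* Layer 3: IPv4 header; its length is in the low nibble, in 32-bit words. */
    const uint8_t *ip = f + 14;
    if ((ip[0] >> 4) != 4)
        return -1;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;
    if (ihl < 20 || len < 14 + ihl + 4)
        return -1;
    uint8_t proto = ip[9];
    if (proto != 6 && proto != 17)
        return -1;

    /* Layer 4: TCP and UDP both start with source and destination port. */
    const uint8_t *l4 = ip + ihl;
    out->ip_proto = proto;
    out->src_ip   = rd32(ip + 12);
    out->dst_ip   = rd32(ip + 16);
    out->src_port = rd16(l4);
    out->dst_port = rd16(l4 + 2);
    return 0;
}
```

Application protocols such as HTTP or BGP would then be parsed from the bytes that follow the TCP header, which is the next layer up in the description above.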

Gavin Henry 00:30:47 And what was your language of choice for all of this?

Adrian Kennard 00:30:51 Ah, yes. One of your trick questions here. It’s all done in C. There’s a little bit of assembler. There has to be in any low-level operating system, but we use C. None of us are really keen on C++. So it’s all in C and we are very experienced C coders. But the other thing you did ask before we started here is what would we use if we were to start again, and we’ve discussed this a bit and we’ve actually considered the possibility of even using Ada because of the very strong typing and controls it gives. Even C programmers with lots of experience do sometimes need these extra controls to make sure things don’t break.

Gavin Henry 00:31:26 Yeah. We did a show on that, about Ada, that I’ll put in the show notes. I did a bit of research on that after. It’s quite an interesting language too.

Adrian Kennard 00:31:35 It is interesting, but I think because it got mandated for military projects, everyone shied away from it, which is a shame, because it’s quite a good language.

Gavin Henry 00:31:43 And it’s not something that a lot of people say, oh, you should use Rust for everything, but that’s not something that would work in this type of environment.

Adrian Kennard 00:31:50 I suspect any language would work, but C’s what we use because that’s the experience we had when we started. That’s where we’re coming from in terms of what we’ve used most in the past.

Gavin Henry 00:32:00 Okay, thank you. I’m going to move us on to how you test all of this next. There’s lots of different moving parts. So, obviously you’re selling these things. So there’s certain legal and government type certifications you need to put on things. So that will probably help with what you need to get test and certified. Can you just take us through what a modern router in 2022 needs to have for it to be able to be plugged into a data center?

Kevin Hones 00:32:25 A lot of it is very similar legislation to any electronic product. I must say, electronic testing standards have improved immensely in the years I’ve been in the business. Back in the day equipment often didn’t work with each other, failed in silly bizarre ways, because there was no testing. There is now. Effectively we have two types of standards we have to comply with. First is electromagnetic compatibility, both for emissions and immunity. And secondly is for safety. Obviously both are rather important things. EMC makes sure that you can have one piece of equipment sat next to another piece of equipment and they don’t interfere with each other. In a data center rack full of equipment, that’s absolutely fundamental to the whole thing working. Secondly safety testing, you can’t be too safe. And there were devices in the past, which literally burnt buildings down because they weren’t thoroughly thought through. Not our devices, I second.

Kevin Hones 00:33:24 We always follow the safety standards and often exceed them, whatever they are. But in order to sell a product, you need to put a CE mark, or now a UKCA mark, which is pretty much the same thing, on it. And in order to do that, you need to make sure that it does meet the standards. And in practice, the only way to do that is to employ a test house, certainly for the EMC. In practice, what that means is you send your product or go along with your product to a test house. And they work on it for typically about three or four days, running all sorts of tests: pointing aerials at it and bombarding it with quite high energy RF, having very sensitive receive aerials listening to see what’s coming out of it, sending nasty spikes and surges up mains inputs and any other connections that it has. And if it survives all this and it’s still working at the end and hasn’t radiated anything that it shouldn’t do, then it gets a pass.

Gavin Henry 00:34:18 And how much of that do you have control over? I mean, sorry, from the point of view of you’ve potentially put some of your own electronics in to make components speak together. Obviously, the components are manufactured by the manufacturers, so they’ll have some type of certifications they’ve got. So do you have to tweak your power supplies that you’ve built or the …?

Kevin Hones 00:34:37 Very much so. It’s more a case of just good engineering practice. Very often a lot of problems for complex systems are in power supplies, or poor grounding is a good one. If the grounding isn’t right, you’ll get currents flowing in paths that they shouldn’t. And even down to cabling, the layout of cables within boxes can pick up bits of mush from one component and carry it straight out the front panel. So it’s down to experience again. Once you’ve been through a few EMC tests, you learn pretty quickly the kind of things that affect it, and you make sure your next design is as good as possible before you go and test it. And all things being well, it’ll be okay. We’ve got a good track record in that now, but with the very first things, like anybody, you learn as you go.

Gavin Henry 00:35:24 Thinking back to my uni days and RF stuff, it’s all a bit of an art, isn’t it, RF engineering, radio frequency engineering?

Kevin Hones 00:35:31 Very much so. And it does help to have some people which we do know who are very much into RF to advise certain things. A lot of it, like so many things in life, turns out to be common sense once you think it through, but it’s not necessarily easy stuff to think through if you haven’t grown up in the field.

Gavin Henry 00:35:49 Thank you. And so, from the network point of view?

Adrian Kennard 00:35:53 Ah, well in some ways, life’s a lot easier because there isn’t formal testing you have to do before you can sell a network product. And that might sound like it’s easy. You don’t have to do all this certification and sending off to test houses. But on the other hand, you haven’t got someone you can send it off to just as easily and say, does it all meet these specs? So, you have to do a lot of in-house testing and a lot of testing of whether it works with other products. The specifications are, in most cases, in RFCs, the network standards that exist. Writing the protocols to follow those RFCs strictly is great, but you don’t always find everything else quite follows them perfectly. So sometimes you have to find a lowest common denominator in terms of how the protocols work, to work with the most other equipment.

Adrian Kennard 00:36:44 And we’ve had to do testing of things like the complete voice over IP telephone system we have in the FireBrick now. So, it can be your office phone system. And we’ve had to set up voice over IP telephones from dozens of different manufacturers. I’ve got a picture somewhere of an office full of weird and wonderful telephones and different service providers, to check how they all work together and identify when they don’t and work out the best way of making them work. Even when we are doing it right and someone else is doing it wrong, we still try and make it work if we can.

Gavin Henry 00:37:16 So would this be a case of, you’ve looked at the request for comments that are RFC standards, that everyone works on to agree a common way to do something. You’ve taken that protocol, you’ve gone through the must, it must do this. And it may do that.

Adrian Kennard 00:37:32 Yeah, must, may, should. And all this.

Gavin Henry 00:37:33 Yeah. And you’ve found that the musts are not all there or?

Adrian Kennard 00:37:37 Well, one of the problems is that not all these protocols are necessarily operating completely in isolation. So you may have firewalling getting in the way of allowing a protocol to work the way it was designed, particularly with voice over IP phones. They can work with a subset of the RFC. We’ve gone through many iterations of making a voice service for Andrews & Arnold. And we now use FireBricks as our core voice over IP service. But in the early iterations we expected to be able to do it in a certain way, to have lots of different sorts of call routing back ends. And then we found loads of phones can’t cope if they’re told to do a call setup to one IP address, but the actual audio goes to another one, for example. They just won’t do it even though the RFC says they should. So we’ve had to design the system to be, let’s say, lowest common denominator.

Adrian Kennard 00:38:29 We only use one codec, which is a codec everybody uses as a common one, rather than doing any conversion. So, we have to make these decisions in terms of designing the protocol. And sometimes we design protocols with extra features as well. Our voice over IP deliberately has situations where it won’t respond to requests, even to say, no, you are wrong, because that then tells someone attacking your network that you’ve got a voice over IP server sat there, and they’re going to go ahead and keep attacking until they get in. So we have settings where if you’re trying to talk to a voice over IP server from outside, even though that’s allowed because you’ve got some phones for people working from home or something, it won’t respond unless you’ve got all the credentials right. Whereas from the inside, it’ll respond and say, no, you’ve got the password wrong, try again, sort of thing. Which means technically we’re not following the spec, as we’re meant to respond, but we have an option to say, don’t do that on the outside.

Adrian Kennard 00:39:28 Extend the protocols.

Gavin Henry 00:39:28 Sorry, that fits nicely with our OWASP show that just came out on security vulnerabilities. Because that would be just like a website’s login page where it says that user doesn’t exist, or that user exists but your password’s incorrect. So it’s that type of hiding.

Adrian Kennard 00:39:43 Exactly. And in this case, we are actually not responding at all. You know, we are not a VOIP server. We are not answering because that’s the best way to not then get hammered with lots of different password requests.

Gavin Henry 00:39:54 And these types of tests, do you do any sort of unit tests or integration tests on the software side before you actually test the protocols live? Do you have to create your own protocol simulators, or are there tests for that?

Adrian Kennard 00:40:09 In some cases we have to simulate the protocol. In a lot of cases we can set up other equipment that already talks the protocol to test it. So during development, we will sometimes be setting up several different things, you know, like a Linux box or a PC or, as I said, several VoIP phones to test. On occasion, we’ve had to create something specifically to simulate a protocol. But you always run into the problem there that if you create your simulator to how you’ve read the RFC and you create your code to how you’ve read the RFC, and especially if how you’ve read the RFC isn’t quite correct, it’ll work perfectly because they’re talking to the same understanding. So simulators that you’ve made aren’t always the best answer. We do have a test setup that is used for performance testing and regression testing before software builds come out. This is several different versions of FireBrick and various other equipment that communicates with it to do various tests.

Gavin Henry 00:41:01 Yeah, we’ve done quite a few shows on software engineering and testing where that exact point you’ve raised, where the test is only as good as the person that’s written the test. And if they’ve written the code, the test is generally going to pass. So it’s best to have those slightly separate.

Adrian Kennard 00:41:17 It helps if you’ve got a team where it’s different people that do different things, but even then there’s no substitute for some real world testing as well with other equipment and other manufacturers just to make sure you’re not getting the wrong end of the stick somewhere with how it should work.

Gavin Henry 00:41:31 I’m going to have to move us along a bit to try and get as much covered as I can, but can we just finish up this section on testing with how you bring in security testing for these and one example of something you found that you had to fix?

Adrian Kennard 00:41:45 I’m not sure I can think of, I mean, security is one of those things you always have to be working on and always improving. We’ve improved things like how we do password hashing, that sort of thing, just as later standards come along. But as I said, we don’t have to do any formal testing before you sell a product like this. But we do have a lot of our customers that have been involved in formal penetration testing of their networks protected by Firebricks. So we know in that environment, we pass those tests with no problems, mostly it’s our own testing to try and work out can we attack Firebrick rather than separate test houses for that.

Gavin Henry 00:42:19 Okay. And is there anything that you can recall in the specs that you, or the features set of a protocol that you thought you’d done and picked up?

Kevin Hones 00:42:28 Can I just add something here? We have implicitly had testing done in customers premises. Lots of our customers use Firebrick to protect their networks and they have had those pen tested by professional pen testing companies. So we know that there have never been any problems with any of those sorts of pen tests. I know it’s not a sort of scientific way of doing it, but it’s real world we’ve been implicitly tested more than once.

Gavin Henry 00:42:53 I’m going to move us on to you’ve built the software. You’ve tested it. You’re happy with it, but that’s not the end of it. So you’ve got to keep constantly fixing any issues that come up or handling feature request. This is commonly called the release cycles of software training as it were. Can you tell us a little bit how you deal with release cycles or if you get a feature request?

Adrian Kennard 00:43:14 The releases are fairly straightforward in that, obviously, we can build the software ourselves with changes as we’re working on them to do testing. We will then make an alpha release, and this is something that’s on the FireBrick website and you can download an alpha release. Normally, customer FireBricks won’t run one of these alpha releases. The customer needs to speak to us first and say that they want to try out an early release of software, and we’ll enable it on their FireBrick. And this helps avoid just people being gung-ho and saying, I want the latest software, and then getting code that doesn’t necessarily work 100%. So we do have some customers that do load these alpha releases. And it’s usually when we are working with someone on a feature change or request that they’ve got. We will do ongoing alpha releases regularly, sometimes several a day.

Adrian Kennard 00:44:02 Sometimes, you know, it could be a week apart, but we’ll release these so that people who are testing them can try them out and give us feedback. When we’re happy with a milestone, that we’ve got new features or we want to make a release, then we make a beta release and this is available to everybody. Anybody can load one of these, but FireBricks aren’t automatically loading a beta release. You have to tell your FireBrick you want to be a bit more leading edge and try the beta release. And if there’s any problems, we’ll withdraw that. And that’s happened a couple of times where we’ve done all our testing, we’ve had customers doing various testing on alphas, we’ve done a beta and someone’s found something significantly wrong with it so that we need to withdraw it, fix it, and make another beta release. Happens occasionally, but not very often.

Gavin Henry 00:44:43 What sort of thing would that be?

Adrian Kennard 00:44:45 Yeah, I knew you’d ask that and I’m trying to think. It’s quite a while since we did that last time. So I’m not sure I can actually think of a specific example for that, to be honest. It’s usually the sort of thing where there’s a customer with something very obscure in their setup that isn’t covered by normal testing. Because there’s so many different ways people can use a FireBrick that we can’t test every possible way. We have to test each subsystem as much as we can, but with some of the combinations, we have had occasions where that’s happened. But I can’t think of a specific example.

Gavin Henry 00:45:15 So presumably you then incorporate that test for the next time. Yeah,

Adrian Kennard 00:45:20 Yeah. So once a beta has been released, usually for a few weeks, and we test it on our core network as well to make sure, especially for ISP infrastructure, that there aren’t any issues with that, then we promote that to a full release. At that point, most FireBricks will automatically upgrade to that at some point over the next 24 hours, and most people don’t even notice their FireBrick’s upgraded. It downloads the new software automatically. It re-flashes it, it reboots, and the reboot is well under a second. So most people don’t even realize their FireBrick upgraded. The core network ones in data centers are not set to do that. Mostly the IT people involved in those want to carefully manage when they do an upgrade. And so, they’ll look at a release note from us and decide when to do it. But the smaller FireBricks automatically upgrade, and we give customers a lot of choice about how much risk they want to take.

Adrian Kennard 00:46:11 Customers can be loading alpha releases if they want, they can load betas, they can load releases. They can even set the system to say, I don’t want a release until it’s been out for two weeks, just in case something happens, and they can tell their FireBrick, don’t load it straight away when it’s available, leave it some time. They can tell it to only do it in the middle of the night if they want. So they’ve got a lot of control, or they can tell their FireBrick not to upgrade. We obviously don’t recommend that, especially as it’s a security product with firewalling and things. If we are improving features or security, it’s best if everyone gets an upgrade, but you can do that even.
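The upgrade choices listed here (which release track to follow, a hold-off period, a time-of-day window, or no upgrades at all) amount to a small policy check. The following C sketch shows one way such a check could look. The struct fields, the `should_upgrade` function, and the idea of a boolean `alpha_capability` flag are all assumptions for illustration and do not reflect the FireBrick’s real configuration schema.

```c
#include <assert.h>
#include <stdbool.h>

/* Release tracks, ordered from least to most stable. */
enum track { TRACK_ALPHA, TRACK_BETA, TRACK_RELEASE };

/* Hypothetical per-device upgrade policy. */
struct upgrade_policy {
    bool enabled;            /* false = never upgrade automatically */
    bool alpha_capability;   /* vendor has allowed alpha images on this unit */
    enum track min_track;    /* least stable track the owner will accept */
    unsigned hold_days;      /* wait until an image has been out this long */
    int window_start;        /* allowed hours [start, end); equal = any time */
    int window_end;
};

struct image {
    enum track track;
    unsigned age_days;       /* days since the image was published */
};

/* True if hour falls in [start, end), allowing windows that wrap midnight. */
static bool in_window(int start, int end, int hour)
{
    if (start == end)
        return true;
    if (start < end)
        return hour >= start && hour < end;
    return hour >= start || hour < end;
}

bool should_upgrade(const struct upgrade_policy *p,
                    const struct image *img, int hour)
{
    assert(hour >= 0 && hour < 24);
    if (!p->enabled)
        return false;
    if (img->track == TRACK_ALPHA && !p->alpha_capability)
        return false;                   /* alphas need explicit permission */
    if (img->track < p->min_track)
        return false;                   /* image is less stable than accepted */
    if (img->age_days < p->hold_days)
        return false;                   /* let others find the problems first */
    return in_window(p->window_start, p->window_end, hour);
}
```

A cautious data center setup in this model would be `min_track = TRACK_RELEASE`, `hold_days = 14`, and a window of 01:00 to 05:00.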

Gavin Henry 00:46:43 Are you able to share — happy if you say no — how you get a user to opt in to run an alpha? You know, what you changing there? Is that a software toggle or a hardware toggle?

Adrian Kennard 00:46:54 Uh, yeah. We have settings in a database in terms of what a FireBrick’s capabilities are, and we can change those and provide a new signed configuration for the FireBrick so that it then knows whether it’s allowed to load an alpha release or not. Both the code and the configuration are digitally signed. Well, it’s called capabilities in our setup rather than configuration. Configuration is what the customer does to set their FireBrick up. The FireBrick’s underlying capability is a digitally signed bit of XML data that can be sent to the FireBrick so that it knows it’s come from us.

Gavin Henry 00:47:26 They have to ask you to re-sign a bit of something? Yeah, okay. That makes sense. So they can’t just go and download it because they’re. . .

Adrian Kennard 00:47:33 No, it’s to protect customers from themselves, really. We know there are plenty of people who would say, oh, I want the latest alpha software. And we don’t make any guarantee that the alpha software actually works. It’s mainly for the people who are looking at the features we’re working on now, to try out, rather than just for everybody.

Gavin Henry 00:47:50 But is that how you validate everything, with PGP signatures or similar, a private key or…?

Adrian Kennard 00:47:55 There’s different security for different things. So the code, as I say, is signed. And so is this capability, but things like IPsec tunnels and HTTPS certificates and so on are all managed in different ways. So things like HTTPS certificates are managed typically using Let’s Encrypt. And that’s also what a lot of people use for IPsec, where they validate the domain name at the end using a Let’s Encrypt certificate. So there’s different levels of,

Gavin Henry 00:48:20 Yeah, I meant the software, the firmware, sorry,

Adrian Kennard 00:48:22 The firmware is digitally signed, and it’s a different signature level for alphas and releases. So even though there’s a team of software engineers, only specific people can sign a release, for example.

Gavin Henry 00:48:33 I’m going to move us on to the last section of the show, it’s gone really quick. So just to summarize again the lessons from where we are: there was nothing like this out there, efficient or low power, at the time, and you’ve evolved with that. You’re C engineers, so that was the right choice at the right time and still is today. It’s extremely feature rich and low energy use equipment. You can upgrade them on the fly, but they need to be told how to do that. They use all the standard protocols.

Adrian Kennard 00:49:01 Well, by default, a customer FireBrick will just upgrade itself with new releases automatically. You don’t have to do anything special with that at all. It’s only the alpha releases that we treat specially like that.

Gavin Henry 00:49:10 So we’ve got the ongoing life cycle of the product and it’s all certified and tested. But now as a user of that system and product, I want to make a change. And that’s a whole different thing, isn’t it? Managing configuration, validating that, checking the user’s not messing their own thing up.

Adrian Kennard 00:49:30 Yes. We

Gavin Henry 00:49:30 Changes remotely. You can support a product. It’s very easy for people to think, oh yeah, I’ll just create a network operating system from scratch with the hardware. But until it gets out there in the real world, there’s so much more that you’re missing. So could you take us through the ongoing configuration and upgrades that you had to think about?

Adrian Kennard 00:49:48 Well, customers configure their own FireBricks. As an ISP, when we sell a FireBrick, we do offer a service to help someone configure their FireBrick if they want, for a small fee. And we also provide sample configurations for their broadband lines. So if you buy a broadband line from us and a FireBrick, we can say, well, here’s a starting point for your config, with your logins and everything for your broadband to work and firewall settings to protect your LAN, and here’s something to get started.

Gavin Henry 00:50:12 That’s a good point. I haven’t actually said that this router isn’t just to work with your own ISP. It can work with anything.

Adrian Kennard 00:50:18 Yes, it’s an Ethernet-level router, so it works with Ethernet, but it works with the PPP protocol as well. So if you’ve got a broadband modem, it will work with that. I’ve got one on a Starlink satellite here acting as a gateway to work as a backup, for example. So there’s lots of ways you can use this. In terms of the configuration, we made a decision very early on to make a single definition of the configuration. And this is XSD-based. It’s an XML protocol to define XML, which is all just a bit incestuous, but it defines all of the settings and fields in the configuration. And that single master file is what generates all of the headers and definitions in the C code, so the actual code using the config. It also generates a published XSD so people can actually use it with tools to validate the XML config themselves, if they want.

Adrian Kennard 00:51:10 And people do that. It generates the manuals for the config fields. It generates the JavaScript based web config editor. So on the config webpage, you go in and it’s got icons and labels and fields you fill in and help text. All that’s generated from this single master definition. Obviously that’s what gets updated when we add new things to the config, but it means that they’re all consistent. And we’ve seen so many routers where the command line has some config settings that don’t exist in the web interface or, or the saved file or whatever. With the Firebrick, they’re always consistent because they’re made from a single file, which I think is an important feature.
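The “single master definition” idea can be illustrated in plain C with the X-macro idiom, where one list of fields is expanded several times to produce a struct, its defaults, and a help table that cannot drift apart. This is a swapped-in technique chosen for a compact example: the interview describes an XSD-based master file feeding a generator, not X-macros, and the field names below (`mtu`, `nat`, `log_level`) are invented.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical master list: field name, C type, default, help text.
 * Adding a line here updates everything generated below. */
#define CONFIG_FIELDS(X) \
    X(mtu,       int, 1500, "Maximum transmission unit in bytes") \
    X(nat,       int, 0,    "Set to 1 to enable network address translation") \
    X(log_level, int, 3,    "Severity threshold for logging")

/* Expansion 1: the struct the C code uses. */
struct config {
#define X(name, type, def, help) type name;
    CONFIG_FIELDS(X)
#undef X
};

/* Expansion 2: default values for fields missing from an old config. */
static const struct config config_defaults = {
#define X(name, type, def, help) .name = def,
    CONFIG_FIELDS(X)
#undef X
};

/* Expansion 3: name and help text, as a manual or web editor might show. */
struct field_doc {
    const char *name;
    const char *help;
};

static const struct field_doc config_docs[] = {
#define X(name, type, def, help) { #name, help },
    CONFIG_FIELDS(X)
#undef X
};

/* Look up the help text for a field, or NULL if no such field exists. */
const char *config_help(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof config_docs / sizeof config_docs[0]; i++)
        if (strcmp(config_docs[i].name, name) == 0)
            return config_docs[i].help;
    return NULL;
}
```

Because the struct, the defaults, and the documentation all come from `CONFIG_FIELDS`, a setting cannot exist in one and be missing from another, which is the consistency property described above.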

Gavin Henry 00:51:44 Yeah. I think one of the things as a user and engineer that you experience over your lifetime is evaluating products and, you know, the life cycle of upgrades, you’ve got to constantly check the change logs, you know, is this deprecated, is this still there? And if you do it on the XML side of thing, you can instantly do that, can’t you?

Adrian Kennard 00:52:02 Well, one of the reasons XML was chosen as the underlying config format is that it’s extensible; the clue’s in the X. So when we add new features, we generally try to make sure that you don’t have to mess about with the config when you do an upgrade. It’s partly why the upgrades are automatic. You don’t have to think about it. The config carries on working. The new features are extra fields or settings, which if necessary have defaults so that they just become available as new features. And we don’t very often deprecate something. So, XML’s worked really well as the config, and you can edit it in XML, even through the web interface. But a lot of people use this web-based sort of graphical interface to edit it, where you can go through different icons and listed sections and open them up and fill in the fields.

Adrian Kennard 00:52:45 So, we have this relatively easy to edit web-based config. But one of the things you were saying was that there’s nothing like trying this out in the field with real customers. One of the important things with a router and a firewall is customers can dig themselves in a hole. You can very easily configure the FireBrick to shut you out. And that’s not too bad if it’s sat in front of you, there’s a factory reset process. But if it’s a hundred miles away in a data center, that’s a pain. And one of the features we put in (it wasn’t there in the beginning, it was a few years ago) is a test config: you press test and it applies the config. And if you don’t do anything for five minutes, it puts it back. So when you lock yourself out, you just have to wait five minutes and then it starts working again.

Adrian Kennard 00:53:27 And you can work out what you did wrong. Of course, if it does work, you can then say no, make the config permanent. So that was an invaluable feature we put in to help protect users from themselves and make it so you can test a config, and we certainly recommend it. You can even make it so that a certain user on the FireBrick can only test the config first. You define which users are allowed to make changes and which aren’t, and you can say: yes, you’re allowed to make a change, but you have to press the test button. Only when you’ve done that can you then commit it.
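The test-then-commit behaviour described here is essentially a small state machine with a timer. Below is a minimal C sketch of it, with configs reduced to integer version IDs and time passed in as seconds. The five-minute constant comes from the interview; the struct, the function names, and the tick-driven design are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TRIAL_SECONDS 300u   /* five minutes, as described in the interview */

/* Hypothetical config state; a config is just a version ID here. */
struct cfg_state {
    int saved;               /* last committed config */
    int active;              /* config currently applied */
    bool on_trial;           /* true while a test config is running */
    unsigned trial_start;    /* time the trial began, in seconds */
};

/* Apply a candidate config on trial without making it permanent. */
void cfg_test(struct cfg_state *s, int candidate, unsigned now)
{
    assert(s != NULL);
    s->active = candidate;
    s->on_trial = true;
    s->trial_start = now;
}

/* Make the trial config permanent. Fails if nothing is on trial, which
 * models the "must press test before commit" rule. */
bool cfg_commit(struct cfg_state *s)
{
    assert(s != NULL);
    if (!s->on_trial)
        return false;
    s->saved = s->active;
    s->on_trial = false;
    return true;
}

/* Called periodically: revert to the saved config if the trial has
 * run for the full period without being committed. */
void cfg_tick(struct cfg_state *s, unsigned now)
{
    assert(s != NULL);
    if (s->on_trial && now - s->trial_start >= TRIAL_SECONDS) {
        s->active = s->saved;
        s->on_trial = false;
    }
}
```

If the candidate locks the administrator out, no commit can arrive, so the tick handler restores the saved config once the trial period has elapsed.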

Gavin Henry 00:53:57 And is this a benefit of using XML for that type of thing, or just a design pattern?

Adrian Kennard 00:54:01 That could have been done with whatever type of config we used. It’s not really an XML-specific thing, but we chose XML because it’s extensible, and it’s what’s stored in memory. It’s also something people can work with remotely. It’s very easy to use external tools to manage XML. And we know lots of customers who generate configs on the fly using other systems in XML, because it’s such a standard. And actually we do that on our core routers. We take the XML from the router and we set certain things and send it back to the router or the FireBrick. So it’s very easy to write tools to manage XML. And that’s another reason we’re using it. And it works really well.

Gavin Henry 00:54:39 And was it always like that with the XML configuration or was it something. . .?

Adrian Kennard 00:54:43 You know I can’t remember the very first Firebrick. I think XML came in with the rewrite for ARM, I think.

Kevin Hones 00:54:49 It was web-based only the very first.

Adrian Kennard 00:54:51 Yes. Yes. And when we moved to ARM, we decided on this single config definition and all XML-based.

Gavin Henry 00:54:58 And I know a lot of our listeners will be thinking about the times that they’ve used XML and SOAP APIs, and they’d be thinking, why not JSON or something like that?

Adrian Kennard 00:55:08 No, I spoke with Kevin about this earlier. I was saying, if we did it now, it might well be JSON, but it’s XML. And it may as well stay like that. Mostly people aren’t editing the XML. Mostly they are working with the web interface, the graphical interface. But XML works for this purpose, it’s fine, and to be fair, when we started, XML was the thing everyone was doing. And JSON really didn’t get a look in back then. These days, perhaps it would be a different decision.

Gavin Henry 00:55:36 And you’ve got code that does it. It’s tested. It’s, you know, it’s mature, it’s been out in the field. It would need to be a major decision really to justify,

Adrian Kennard 00:55:43 Well, we’d almost certainly engineer it so that you could do XML or JSON, and there’d be a compatible translation between the two. But yes, that would be something to think about, depending on whether enough customers come to us saying that they really want to work in JSON, not XML. Then we might consider it.
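
A compatible translation between the two formats could look something like the sketch below. It assumes an attribute-only config (element text is ignored), and the JSON shape with `tag`, `attrs`, and `children` keys is an invented convention for this example, not anything FireBrick has published.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to a dict, keeping tag, attributes, and child order."""
    return {
        "tag": elem.tag,
        "attrs": dict(elem.attrib),
        "children": [element_to_dict(child) for child in elem],
    }


def dict_to_element(node):
    """Inverse of element_to_dict."""
    elem = ET.Element(node["tag"], node["attrs"])
    for child in node["children"]:
        elem.append(dict_to_element(child))
    return elem


def xml_to_json(xml_text):
    return json.dumps(element_to_dict(ET.fromstring(xml_text)), indent=2)


def json_to_xml(json_text):
    return ET.tostring(dict_to_element(json.loads(json_text)), encoding="unicode")
```

Keeping children as an ordered list rather than keying them by tag name is what lets repeated elements survive the round trip.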

Gavin Henry 00:56:01 Thank you. That takes us up nicely to starting to wrap up the show. We could have done a show on each of those subtopics. It’s very difficult to give an overview and get enough technical detail, so thank you for that. I think we’ve done a great job of covering what goes into spec-ing up a router: the history of it, the components, the testing of the hardware and the software, and building everything from scratch. But if there’s one thing you’d want, I don’t know, a sane software engineer to take away from our show, what would you want it to be? What is the thing that you would like to instill?

Adrian Kennard 00:56:34 We did think about this. To some extent it’s really that reinventing the wheel is not always a bad thing. The history of the FireBrick means we weren’t just reinventing the wheel. We were coming up with new things from scratch because a lot of what we wanted to do wasn’t there, but reinventing the wheel is what we get accused of a lot. Particularly these days: putting voice over IP on there, we could have taken a standard off-the-shelf, open source voice over IP platform and tweaked it to work on the FireBrick. And to be honest, I think if we’d done that it wouldn’t be anywhere near as good. I think we’ve done a much better job because we did it from scratch. So I think the message there is don’t be afraid to reinvent the wheel sometimes. I mean, not always, but it’s definitely worth considering.

Gavin Henry 00:57:16 We hear that a lot actually. And you see it on some of the articles online and some of the sort of thought leaders in the software engineering space where sometimes, you know, a less feature-rich specific version of something is better.

Adrian Kennard 00:57:29 Absolutely. Obviously, as part of doing this, we’ve considered other libraries, and I do lots of other software. I’ll look at a library to do something, and sometimes you’ll find the library is so bloated, and what you actually want is a tiny subset. So sometimes it’s actually a lot easier to just write that specific bit that you need. Other times, you’ll see a library that doesn’t work very well. Or, particularly with the FireBrick, the way we handle packets efficiently and try to do things at a very low level, as fast and as reliably as we can, means you have to write it in a different way to a conventional operating system for an embedded system. So sometimes the libraries out there just don’t fit, and sometimes they’re too big and you want a small bit, so it’s always worth considering.

Gavin Henry 00:58:12 And Kevin, would your message be always make sure you’ve got a good earth?

Adrian Kennard 00:58:18 That’s a good one. Sums it up nicely. I like that. You’ve got to be well grounded to be a hardware engineer.

Gavin Henry 00:58:24 Yes. Was there anything we missed that you’d like to mention?

Adrian Kennard 00:58:27 The only other thing: you asked about features and we didn’t really cover it. We do take feature requests from customers. We try to do things if we think lots of customers would want them, or sometimes if we think it’s a really nice feature. And in the pandemic, we did have to react quite quickly to requests from several people who wanted high-availability internet. They wanted to be able to use multiple internet connections at once and, if one of them broke, not drop a packet, because they’re doing things like this podcast; the recording here is all done over the internet. And if your internet drops out, even if it’s quick to react and fall back and only takes a minute, it breaks things. We have people like judges doing video conferencing from home and things like this. They wanted a way to do high availability so that when the link breaks, because it will, they don’t lose anything. We created a custom package based on L2TP and multiple links and tunnels to do this. And it’s worked very well for them, but it was a case of us having to react to changing circumstances that no one could predict and implement a feature fairly quickly for some customers who were in a fix. And that’s the sort of thing we still do. We still try to react and meet our customers’ requirements.
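
The episode only says the feature is built on L2TP with multiple links and tunnels; it does not describe the mechanism. One plausible way to avoid losing a packet when a link fails is to send every packet over all links and discard duplicates by sequence number at the far end. The sketch below illustrates that assumed technique only; the class names are hypothetical and links are modelled as plain callables rather than tunnels.

```python
class DuplicatingSender:
    """Sends each payload over every link, tagged with a sequence number."""

    def __init__(self, links):
        self.links = links   # callables taking (seq, payload); stand-ins for tunnels
        self.next_seq = 0

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        for link in self.links:
            try:
                link(seq, payload)
            except ConnectionError:
                pass  # a dead link must not stop the copies on other links
        return seq


class DedupReceiver:
    """Delivers the first copy of each sequence number and drops later copies."""

    def __init__(self):
        # A real implementation would use a bounded sliding window, not a growing set.
        self.seen = set()
        self.delivered = []

    def receive(self, seq, payload):
        if seq in self.seen:
            return False
        self.seen.add(seq)
        self.delivered.append(payload)
        return True
```

The trade-off in this approach is bandwidth: every link carries the full traffic, in exchange for there being no failover delay at all when one link drops.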

Gavin Henry 00:59:37 So when a feature request comes through like that, do you have to bypass your release cycle and alpha beta?

Adrian Kennard 00:59:42 No, no. We still do that. That’s where the alpha releases really come into their own. A feature like that, especially where it’s a completely new feature, we can include in the FireBrick and label it experimental. We can include it in a particular version of the build so that it’s only available to some people, and we can include it in alpha releases so that people who want to try it can, without upsetting our normal releases. But ultimately it does then end up in a normal beta release and then a release.

Gavin Henry 01:00:09 I think I’ve got time quickly for one last question. When you look back at the whole thing, yourself and Kevin and your team, and you have your list of protocols or hardware, is there one thing there that makes you go, wow, we did that, or is it just the project as a whole? What gives you that smile when you go to bed at night when you’ve had a rough day and you think, ah, doesn’t matter, I did that?

Kevin Hones 01:00:31 I would say just the fact that we have products that we’re essentially running our businesses on.

Adrian Kennard 01:00:35 Yes, that’s a good point.

Kevin Hones 01:00:37 They sit there working 24 hours a day and do a good job.

Adrian Kennard 01:00:42 Yeah. One of the features we put in was constant quality monitoring: monitoring every single line every second on our broadband network. And that has allowed us to pull apart major problems in networks like BT’s, because we’ve had this monitoring and they don’t. So we’re this tiny player ISP, and we went in and told BT they had core network problems and proved it. Our monitoring graphs ended up on reports to BT directors and things like that, and I thought, you know, that’s amazing that we are a small manufacturer and a small ISP, and we are talking to the big guy like this and saying, no, fix your network.
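
To give a feel for what per-second line monitoring produces, the sketch below summarises a window of probe results into the loss and latency figures a graph would plot. It assumes one probe per second per line, recorded as a round-trip time in milliseconds or `None` when no reply came back; the function name and output fields are hypothetical, and the episode does not say what kind of probe is used.

```python
from statistics import mean


def summarise(samples):
    """Summarise one window of per-second probe results for a single line.

    samples: list of round-trip times in ms, with None for a lost probe.
    """
    replies = [rtt for rtt in samples if rtt is not None]
    lost = len(samples) - len(replies)
    return {
        "sent": len(samples),
        "loss_pct": 100.0 * lost / len(samples) if samples else 0.0,
        "min_ms": min(replies) if replies else None,
        "avg_ms": mean(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }
```

Collected for every line, summaries like this make it possible to spot loss or latency patterns shared by many lines, which is the kind of evidence that points at a fault in a carrier's core network rather than in one customer's line.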

Gavin Henry 01:01:18 And that’s because you know every bit of your own stack and hardware inside out and can prove it that easily, can’t you? Excellent. So where can people find out more? They can follow you on Twitter or…?

Adrian Kennard 01:01:28 Well, the FireBrick website is FireBrick.co.uk. I suppose there’s not a lot on there apart from the release notes. Obviously, when we come out with new products, we put a lot on there, and there is a Twitter account, which doesn’t post very often, if at all. So yeah. What do you think, Kevin, in terms of the best way?

Kevin Hones 01:01:43 The best way to get in touch with us after looking at the website is either to pick up the phone or send us an email. We’re very approachable. And if it’s something appropriate, you can talk directly to the people actually designing things. Sometimes that’s what someone wants.

Gavin Henry 01:01:56 And you’ve both got your own Twitter accounts, don’t you? And Adrian, you’ve got a blog where you,

Adrian Kennard 01:02:00 The blog, probably. When I’m doing something new on the FireBrick or coming up with a new idea, that’s often on my blog. So that’s well worth looking at. You can get us on an IRC channel as well, believe it or not.

Gavin Henry 01:02:12 Perfect. Adrian, Kevin, thank you for coming on the show. It’s been a real pleasure and this is Gavin Henry for Software Engineering Radio. Thank you for listening.

[End of Audio]

SE Radio theme: “Broken Reality” by Kevin MacLeod (incompetech.com — Licensed under Creative Commons: By Attribution 3.0)
