June 20, 2024

Break the Glass and Walk Away: A (VERY) Brief Overview of BGP


Ned and Chris give a very brief overview of BGP, its place in the history of the internet, and how it works today.

It’s a Confusing Day in the Neighborship

Sure, Kim Kardashian broke the internet that one time, but she’s not the only one capable of such a feat. In this episode, Ned and Chris recount the tale of how Verizon and a BGP optimizer took large swaths of the internet offline in 2019. This leads them into the intricacies of the Border Gateway Protocol, tracing its evolution from a temporary solution for NSFNET in the 1980s to a foundational element of internet routing today. Along the way, they explore version four's operational details, including key attributes like local preference and AS path length.



1
00:00:00,530 --> 00:00:04,910
Ned: I made the unfortunate decision to just use chaoslever.com

2
00:00:05,170 --> 00:00:08,220
and no subdomain [laugh] . So, there’s two problems.

3
00:00:08,320 --> 00:00:09,049
Chris: One is Ned.

4
00:00:09,330 --> 00:00:13,179
Ned: One is me [laugh] . I am always the perennial problem.

5
00:00:13,710 --> 00:00:19,390
They go with the assumption you want to use ‘www’ as your subdomain, so they

6
00:00:19,390 --> 00:00:25,689
do support setting your apex record—the at record—for chaoslever.com to—

7
00:00:25,689 --> 00:00:25,709
Chris: [loud snores]

8
00:00:25,709 --> 00:00:30,050
Ned: [laugh] you’re very—you’re cruel.

9
00:00:30,050 --> 00:00:31,280
Chris: [more loud snores]

10
00:00:31,280 --> 00:00:33,730
Ned: [laugh] . Goddammit.

11
00:00:43,200 --> 00:00:46,000
Hello, alleged human, and welcome to the Chaos Lever podcast.

12
00:00:46,000 --> 00:00:48,550
My name is Ned, and I’m definitely not a robot.

13
00:00:48,620 --> 00:00:55,500
I am a sentient, real human person with feelings, dreams, and just the general

14
00:00:55,500 --> 00:00:59,710
desire to smoothly migrate a website and not have everything go to shit.

15
00:01:00,670 --> 00:01:04,259
[sigh] . With me is Chris, who was also here?

16
00:01:04,719 --> 00:01:05,279
Mostly.

17
00:01:05,669 --> 00:01:08,180
Chris: Have you ever read my favorite philosophical tract?

18
00:01:08,330 --> 00:01:08,880
Ned: I don’t know.

19
00:01:09,070 --> 00:01:09,830
Chris: It’s a short one.

20
00:01:09,839 --> 00:01:10,870
It’s ancient text.

21
00:01:10,910 --> 00:01:12,929
It was translated, I think, from the Sumerian.

22
00:01:13,250 --> 00:01:13,680
Ned: Okay.

23
00:01:14,080 --> 00:01:17,450
Chris: And the title is, “Whatever You’re Trying To Do,”

24
00:01:17,760 --> 00:01:22,929
Sumerian question mark, dot dot dot, “Yeah, Good Luck With That.”

25
00:01:25,719 --> 00:01:27,210
Ned: [laugh] . Wow, that is a philosophy that

26
00:01:27,210 --> 00:01:29,940
is just broadly applicable to every situation.

27
00:01:30,709 --> 00:01:33,479
Chris: I believe—and this is, you know, it’s really tough

28
00:01:33,500 --> 00:01:36,200
with archaeology because you get a lot of incomplete records—

29
00:01:36,680 --> 00:01:37,140
Ned: It’s true.

30
00:01:37,610 --> 00:01:42,060
Chris: But I believe, and modern science agrees with me on this, the

31
00:01:42,060 --> 00:01:45,550
follow-up book to that is, “I Fucking Told You It Wasn’t Going To Work.”

32
00:01:46,670 --> 00:01:48,429
Ned: [laugh] . I’m glad to know that the

33
00:01:48,500 --> 00:01:51,770
Sumerians were so blunt in their philosophy.

34
00:01:52,030 --> 00:01:53,370
There’s nothing aesthetic about it.

35
00:01:53,640 --> 00:01:54,800
I appreciate it.

36
00:01:55,090 --> 00:01:57,820
Chris: I mean, it’s really, really hot in Sumer

37
00:01:58,219 --> 00:01:58,630
Ned: Is it?

38
00:01:59,210 --> 00:01:59,800
Chris: Sure.

39
00:02:00,860 --> 00:02:05,289
Ned: Whenever people would bring up ancient civilizations, Babylon,

40
00:02:05,299 --> 00:02:11,600
Sumer, et cetera, I always thought of those as in, sort of, some

41
00:02:11,800 --> 00:02:16,700
mythical place that didn’t actually exist on the modern map of today, and

42
00:02:16,700 --> 00:02:20,700
I’m sad to realize at some point that was not true, and that these are

43
00:02:20,730 --> 00:02:24,809
actual locations that you can go to; they just have different names now.

44
00:02:25,330 --> 00:02:27,130
Chris: Yeah, Ur still exists.

45
00:02:27,210 --> 00:02:28,200
I think it’s in Iraq.

46
00:02:29,040 --> 00:02:34,090
Ned: I don’t like it [laugh] . Yeah… oh, well.

47
00:02:34,280 --> 00:02:34,899
Here we are.

48
00:02:34,990 --> 00:02:38,130
Let’s talk about another mythical thing that shouldn’t exist, but does.

49
00:02:38,679 --> 00:02:39,620
It’s BGP.

50
00:02:40,250 --> 00:02:43,210
Chris: I’m not going to lie, that is, like, top ten transitions for you.

51
00:02:43,820 --> 00:02:44,340
Ned: [laugh] . Thank you.

52
00:02:44,450 --> 00:02:45,770
Chris: Might even be top five.

53
00:02:46,290 --> 00:02:48,920
Ned: [laugh] . I felt really good about it, in part

54
00:02:48,920 --> 00:02:51,220
because it was completely organic and not planned.

55
00:02:51,570 --> 00:02:53,390
And now I’m ruining it by talking about it.

56
00:02:53,800 --> 00:02:55,460
So, another top five right there.

57
00:02:55,889 --> 00:02:56,680
Chris: Different five.

58
00:02:57,040 --> 00:02:57,579
Ned: Yes.

59
00:02:58,020 --> 00:02:58,819
So Chris—

60
00:02:58,959 --> 00:02:59,249
Chris: What?

61
00:02:59,430 --> 00:03:01,950
Ned: What’s your general feeling on BGP?

62
00:03:02,960 --> 00:03:04,650
Chris: Anytime people start talking about it

63
00:03:05,760 --> 00:03:08,200
enthusiastically, I break a glass and walk away.

64
00:03:08,910 --> 00:03:10,520
Ned: [laugh] . You don’t threaten them with it?

65
00:03:10,830 --> 00:03:12,410
Chris: No, no, no, I just want the distraction.

66
00:03:12,790 --> 00:03:14,539
I understand and respect this conversation,

67
00:03:14,540 --> 00:03:17,769
but I don’t need it to be in my life at all.

68
00:03:18,449 --> 00:03:22,770
Ned: It does seem like one of those mysteries of

69
00:03:22,770 --> 00:03:25,080
the faith when it comes to network engineering.

70
00:03:25,120 --> 00:03:28,740
Like, BGP, it’s overseen by wizards—

71
00:03:28,960 --> 00:03:29,419
Chris: Oh, yeah.

72
00:03:29,469 --> 00:03:30,449
Ned: And warlocks.

73
00:03:30,609 --> 00:03:33,420
Chris: There are robes involved, incantations.

74
00:03:33,849 --> 00:03:35,730
Ned: At least one animal sacrifice.

75
00:03:36,139 --> 00:03:37,320
Chris: But not, like, a cute animal.

76
00:03:37,440 --> 00:03:38,430
They’re not monsters.

77
00:03:38,750 --> 00:03:39,160
Ned: No.

78
00:03:39,820 --> 00:03:42,950
I’m trying to think of a non-cute animal, but they’re also adorable.

79
00:03:43,220 --> 00:03:44,829
Chris: Only when they’re made into a Squishable.

80
00:03:45,250 --> 00:03:46,059
Ned: Oh, that’s true.

81
00:03:46,620 --> 00:03:48,584
So many Squishmallows.

82
00:03:48,830 --> 00:03:50,320
My house is infested with them.

83
00:03:50,330 --> 00:03:52,329
It’s a real Tribbles kind of situation.

84
00:03:53,090 --> 00:03:53,990
What were we talking about?

85
00:03:54,529 --> 00:03:55,439
Chris: Uh, peanut butter?

86
00:03:55,819 --> 00:03:56,019
Ned: Yes.

87
00:03:56,290 --> 00:03:56,940
Chris: No, not again.

88
00:03:56,980 --> 00:03:57,450
Not again.

89
00:03:57,510 --> 00:03:59,360
Ned: No, no, no, no, we’re not going down that again.

90
00:03:59,430 --> 00:04:04,740
Okay, so I want to start today’s episode with a story from 2019, a

91
00:04:04,740 --> 00:04:09,359
story that involves messing up the internet for, kind of, everyone.

92
00:04:09,780 --> 00:04:13,099
A story that begins with a small company in rural Pennsylvania.

93
00:04:13,490 --> 00:04:19,090
The main culprit: BGP, aka, Border Gateway Protocol.

94
00:04:19,790 --> 00:04:23,499
Chris, you may remember this, but for those who aren’t familiar, the

95
00:04:23,500 --> 00:04:27,530
small company involved is called Allegheny Technologies Incorporated.

96
00:04:28,090 --> 00:04:32,320
And like any good technology company, when they needed to set up internet

97
00:04:32,320 --> 00:04:37,990
service, they didn’t just contract with one ISP, but instead they got

98
00:04:37,990 --> 00:04:42,890
connectivity from two, one from Verizon and one from a provider called DQE.

99
00:04:43,259 --> 00:04:44,300
That’s smart, you know?

100
00:04:44,300 --> 00:04:46,759
If DQE goes down, they can still get out through

101
00:04:46,760 --> 00:04:49,450
Verizon and people can reach them, et cetera, et cetera.

102
00:04:49,570 --> 00:04:50,180
You get the idea.

103
00:04:50,920 --> 00:04:54,120
Unfortunately, through a series of configuration errors

104
00:04:54,150 --> 00:04:57,670
and incompetence or laziness on the part of Verizon—

105
00:04:58,220 --> 00:04:58,260
Chris: [gasp]

106
00:04:58,840 --> 00:05:04,890
Ned: —shocking, I know [laugh] —deep breaths—large swaths of clients on the

107
00:05:04,890 --> 00:05:10,230
internet suddenly had their traffic routed through DQE to Allegheny Inc.

108
00:05:10,690 --> 00:05:12,469
and then back out through Verizon.

109
00:05:13,230 --> 00:05:18,539
An article on Cloudflare’s website compared it to routing all of the

110
00:05:18,540 --> 00:05:23,270
traffic for a major highway through a small suburban development.

111
00:05:24,049 --> 00:05:25,880
I think that’s actually an understatement.

112
00:05:26,960 --> 00:05:29,730
This would be like taking all the traffic from all the

113
00:05:29,730 --> 00:05:33,210
major highways in the United States and putting them

114
00:05:33,240 --> 00:05:37,580
through one small street in, like, gridlock Philadelphia.

115
00:05:38,340 --> 00:05:42,449
Chris: Or, like, an unpaved one-lane road.

116
00:05:42,620 --> 00:05:44,219
Ned: In Old City [laugh] yes.

117
00:05:45,240 --> 00:05:50,489
DQE and Allegheny obviously did not have the capacity to handle such a

118
00:05:50,490 --> 00:05:55,270
ridiculous increase in traffic, so they started dropping packets like crazy, and

119
00:05:55,270 --> 00:06:00,169
I’d imagine that one or more routers in the path just completely melted down.

120
00:06:00,549 --> 00:06:06,060
Eventually Cloudflare was able to reach engineers at DQE and get the

121
00:06:06,060 --> 00:06:10,929
situation resolved, but even with the fix in place, it took a few hours for

122
00:06:10,929 --> 00:06:15,250
the global internet to converge on the updated and now corrected routing.

123
00:06:15,880 --> 00:06:19,920
The Cloudflare article also details three different ways that this

124
00:06:19,930 --> 00:06:25,330
particular incident could have been avoided, specifically, prefix limits,

125
00:06:25,340 --> 00:06:31,600
IRR filtering, and RPKI. Don’t worry about what those things are just yet.

126
00:06:31,889 --> 00:06:35,649
We will get to them later, and by later I mean, next episode.

127
00:06:36,219 --> 00:06:36,499
Chris: [laugh]

128
00:06:36,810 --> 00:06:37,410
Ned: Probably.

129
00:06:38,150 --> 00:06:41,660
We’re going to use this little tale that I’ve told as a touchstone

130
00:06:42,030 --> 00:06:46,619
for this and however many more episodes it takes me to cover BGP.

131
00:06:46,870 --> 00:06:48,250
Chris: My guess is ten.

132
00:06:48,259 --> 00:06:48,269
Ned: Ahhh.

133
00:06:49,340 --> 00:06:50,609
I mean, at least.

134
00:06:51,040 --> 00:06:51,580
Minimum.

135
00:06:52,010 --> 00:06:56,330
I also plan on bringing on a real BGP expert in a later

136
00:06:56,330 --> 00:06:59,590
episode who can help us understand how to operate BGP

137
00:07:00,380 --> 00:07:05,469
securely because—spoiler—it’s horribly insecure right now.

138
00:07:05,469 --> 00:07:06,434
Chris: Ahhh.

139
00:07:07,400 --> 00:07:09,580
Ned: Yeah, shocking, I know.

140
00:07:10,230 --> 00:07:14,900
But first, what the hell is BGP, and how can it wreck a whole person’s day?

141
00:07:15,320 --> 00:07:16,430
Chris: Or even half a person.

142
00:07:17,130 --> 00:07:18,200
Ned: BGP history.

143
00:07:19,060 --> 00:07:22,670
I recommend drinking during this portion [laugh] . Okay, so,

144
00:07:23,130 --> 00:07:27,580
as I said earlier, BGP, it stands for Border Gateway Protocol.

145
00:07:27,880 --> 00:07:30,830
There’s a border, and it involves a gateway, and this is a protocol.

146
00:07:30,880 --> 00:07:33,020
It is exactly what it says on the tin.

147
00:07:33,310 --> 00:07:34,770
Chris: You needed 3500 words?

148
00:07:34,770 --> 00:07:35,710
You could have just said that?

149
00:07:36,400 --> 00:07:38,670
I thought this was going to be, like, a full episode.

150
00:07:38,889 --> 00:07:39,530
Ned: Oh, no, that’s it.

151
00:07:39,540 --> 00:07:40,030
We’re done.

152
00:07:40,059 --> 00:07:40,449
Chris: Yeah.

153
00:07:40,530 --> 00:07:41,500
Ned: Everybody can go home.

154
00:07:41,969 --> 00:07:42,890
I explained it all.

155
00:07:43,590 --> 00:07:46,080
Okay, everybody that’s still here, let’s get into it.

156
00:07:46,609 --> 00:07:51,239
So, it is the exterior gateway protocol that the internet uses to figure

157
00:07:51,240 --> 00:07:55,650
out how to get packets from a source to a destination, and then back again.

158
00:07:56,139 --> 00:07:58,700
To understand why BGP exists and how it

159
00:07:58,700 --> 00:08:00,989
functions, we’re going to have to go back in time.

160
00:08:01,690 --> 00:08:04,710
Grab your best leg warmers, your heather gray sweatshirt,

161
00:08:04,990 --> 00:08:08,039
and red bandana because it’s time to get totally ’80s.

162
00:08:09,800 --> 00:08:10,690
No comment on that?

163
00:08:10,950 --> 00:08:12,500
Chris: No I’m just a little offended that you

164
00:08:12,500 --> 00:08:15,680
used my current outfit as some kind of joke.

165
00:08:16,090 --> 00:08:17,990
Ned: It was an inspiration, if you will.

166
00:08:18,750 --> 00:08:21,969
As we covered in a previous episode about DNS, the modern

167
00:08:22,000 --> 00:08:26,110
internet grew out of ARPANET, and its replacement NSFNET.

168
00:08:26,620 --> 00:08:28,960
Chris: Which is totally different than NsfwNET,

169
00:08:29,260 --> 00:08:31,140
which we’ll talk about on a later episode.

170
00:08:31,310 --> 00:08:35,069
Ned: [laugh] . That’s behind the Patreon paywall.

171
00:08:35,189 --> 00:08:35,229
Chris: [laugh]

172
00:08:36,829 --> 00:08:38,029
Ned: Ned and Chris after dark.

173
00:08:38,669 --> 00:08:40,130
If you want that, let us know.

174
00:08:40,190 --> 00:08:43,555
I think it’d be awful, but you know, you’re willing to pay for it [laugh]

175
00:08:43,890 --> 00:08:46,610
Chris: Previous evidence has shown that no one will ever want that.

176
00:08:46,890 --> 00:08:47,110
Ned: Okay,

177
00:08:49,370 --> 00:08:51,580
good [laugh] . NSFNET was established by the National Science

178
00:08:51,580 --> 00:08:55,569
Foundation, and its original intention was to connect five

179
00:08:55,570 --> 00:09:00,220
supercomputers in the US and various campus networks, tie them all

180
00:09:00,220 --> 00:09:04,990
together using a backbone network that NSF would help fund and manage.

181
00:09:05,930 --> 00:09:10,909
The backbone network was run by a single entity, and used leased lines

182
00:09:10,910 --> 00:09:15,940
from telcos that were running at a blazing 56 kilobits per second.

183
00:09:15,950 --> 00:09:17,430
Chris: Oof, Mario Andretti.

184
00:09:18,320 --> 00:09:19,140
Ned: Scorching.

185
00:09:19,860 --> 00:09:24,610
If you had a 56k modem in the early-’90s, you had the

186
00:09:24,610 --> 00:09:29,500
same network bandwidth as NSFNET at its inception in 1986.

187
00:09:30,219 --> 00:09:31,189
You probably didn’t have a supercomputer,

188
00:09:31,559 --> 00:09:33,480
but I mean, you had the effective bandwidth.

189
00:09:34,809 --> 00:09:37,170
NSFNET wasn’t open to just anyone.

190
00:09:37,180 --> 00:09:41,300
You couldn’t dial up and, you know, put it on the little cradle thing

191
00:09:41,620 --> 00:09:46,319
for your modem; they had a process by which regional networks could join.

192
00:09:47,170 --> 00:09:51,759
And those regional networks in turn had to adhere to the acceptable

193
00:09:51,759 --> 00:09:58,220
use policy of NSFNET, which precluded using NSFNET for making money.

194
00:09:58,600 --> 00:10:03,080
This was supposed to be campuses, and universities, and educational

195
00:10:03,080 --> 00:10:06,970
institutions all coming together to do research and trade information.

196
00:10:06,980 --> 00:10:08,510
So, this wasn’t about making money.

197
00:10:08,900 --> 00:10:09,750
That comes later.

198
00:10:10,260 --> 00:10:14,829
The whole thing was overseen by Merit Network, which was a networking consortium

199
00:10:14,830 --> 00:10:19,979
out of Michigan, and they ran a network operation center, and they worked to

200
00:10:19,980 --> 00:10:24,440
design and implement the network connectivity that was used by the backbone.

201
00:10:25,219 --> 00:10:28,860
Since the NSFNET formed the backbone of all of these different

202
00:10:28,879 --> 00:10:32,520
networks and their interconnectivity, there was a hierarchy,

203
00:10:32,680 --> 00:10:37,310
and all inter-network traffic had to traverse this backbone.

204
00:10:37,570 --> 00:10:41,400
So, if Regional Network A wanted to talk to Regional Network B,

205
00:10:41,670 --> 00:10:44,999
it would go up [background noise] to the backbone—what was that?

206
00:10:45,210 --> 00:10:46,389
Chris: I didn’t drop my fidget toy.

207
00:10:46,469 --> 00:10:47,470
I don’t have a fidget toy.

208
00:10:47,480 --> 00:10:51,830
Ned: [laugh] It would send the traffic up to the backbone, and then the

209
00:10:51,840 --> 00:10:56,810
backbone would take it to Regional Network B, and send the traffic back down.

210
00:10:57,120 --> 00:11:00,380
So, it was a relatively simple network when it comes to the

211
00:11:01,030 --> 00:11:04,040
interconnectivity between all these regional networks and the supercomputers.

212
00:11:04,559 --> 00:11:07,989
The NSFNET knew all the connected networks and could pretty

213
00:11:08,000 --> 00:11:11,939
easily route traffic from one network to another, but it also came

214
00:11:11,940 --> 00:11:15,520
with a lack of resiliency and serious bandwidth constraints.

215
00:11:16,130 --> 00:11:19,020
You only had one connection to the other regional network, and if the

216
00:11:19,030 --> 00:11:22,490
backbone went down or was congested, you were kind of out of luck.

217
00:11:23,330 --> 00:11:28,569
NSFNET had to pretty quickly update their backbone from these 56 kilobit

218
00:11:28,600 --> 00:11:34,829
per second lines to T1 lines that ran at 1.5 megabits per second.

219
00:11:35,389 --> 00:11:37,170
That happened in 1988.

220
00:11:37,549 --> 00:11:41,410
And then they had to upgrade them again in 1991 to

221
00:11:41,500 --> 00:11:45,450
45 megabits per second, which was known as a T3 line.

222
00:11:46,050 --> 00:11:49,510
While it was possible to keep increasing the speed of the leased

223
00:11:49,520 --> 00:11:54,280
lines that formed the NSFNET backbone, additional lines were

224
00:11:54,340 --> 00:11:58,770
added, which introduced multiple paths for traffic to travel.

225
00:11:59,549 --> 00:12:03,610
At the same time, NSFNET was connecting with networks in other countries

226
00:12:03,810 --> 00:12:08,010
and to even more networks in the US, so the idea of handcrafting

227
00:12:08,040 --> 00:12:12,709
traffic routing tables to efficiently move traffic was no longer viable.

228
00:12:13,590 --> 00:12:17,819
Back in the early-’80s, the networking group at the IETF was aware

229
00:12:17,820 --> 00:12:21,910
of the looming issues behind the inter-network routing, and so they

230
00:12:21,910 --> 00:12:28,119
proposed what they called the Exterior Gateway Protocol in RFC 827.

231
00:12:28,510 --> 00:12:33,199
And that was in 1982, and then it was updated further in 1984.

232
00:12:34,229 --> 00:12:40,270
And EGP was actually used by NSFNET, but it had some serious shortcomings,

233
00:12:40,340 --> 00:12:48,139
so in 1989, RFC 1105 proposed the Border Gateway Protocol to replace EGP.

234
00:12:48,830 --> 00:12:52,409
To make it even more confusing, all routing protocols that

235
00:12:52,449 --> 00:12:55,780
are inter-network routing protocols are called ‘exterior

236
00:12:55,810 --> 00:12:59,650
gateway protocols.’ That’s not going to be confusing at all.

237
00:13:00,250 --> 00:13:00,870
Chris: Definitely not.

238
00:13:01,300 --> 00:13:02,890
Ned: The important thing to understand is that

239
00:13:02,940 --> 00:13:05,750
EGP as its own standard has since been retired.

240
00:13:05,860 --> 00:13:10,340
So, you can refer to EGP as broadly any protocol

241
00:13:10,340 --> 00:13:12,599
that handles this inter-network traffic.

242
00:13:13,450 --> 00:13:16,820
BGP itself is sometimes referred to as the three-napkin

243
00:13:16,830 --> 00:13:21,310
protocol, as the original ideas that underpin it were scribbled

244
00:13:21,310 --> 00:13:25,310
out by two engineers in Austin across three ketchup napkins.

245
00:13:25,920 --> 00:13:27,669
There’s no ketchup on the actual napkins; they were

246
00:13:27,670 --> 00:13:30,790
just, I guess, at a fast food place that served fries,

247
00:13:30,790 --> 00:13:32,569
and you were supposed to put ketchup on the napkins.

248
00:13:32,620 --> 00:13:33,190
I don’t know.

249
00:13:33,400 --> 00:13:34,410
Weird terminology.

250
00:13:35,340 --> 00:13:37,680
Chris: Maybe the napkins were sponsored by big ketchup.

251
00:13:38,059 --> 00:13:38,459
Ned: Ohhh.

252
00:13:39,139 --> 00:13:39,719
Heinz.

253
00:13:39,800 --> 00:13:40,600
Got to watch out.

254
00:13:40,730 --> 00:13:42,389
They get their paws into everything.

255
00:13:42,550 --> 00:13:44,870
They’re red, yucky paws.

256
00:13:45,530 --> 00:13:47,329
That’s an awful visual, I’m sorry.

257
00:13:47,820 --> 00:13:50,680
So, while this story might seem apocryphal,

258
00:13:51,100 --> 00:13:53,530
they have actual pictures of the napkins.

259
00:13:53,820 --> 00:13:57,060
There’s no ketchup stains, but it does have the actual diagrams

260
00:13:57,080 --> 00:14:00,919
and sort of the flow for distributing routes in a BGP system.

261
00:14:01,170 --> 00:14:02,300
Chris: All right, I’m going to ignore you for a

262
00:14:02,300 --> 00:14:04,510
minute and actually look this up because I’m curious.

263
00:14:05,609 --> 00:14:06,290
Ned: [laugh] . Fair enough.

264
00:14:07,210 --> 00:14:13,000
BGP was not meant to be a long-term fix for the problems that NSFNET

265
00:14:13,770 --> 00:14:17,150
was experiencing, and that the larger internet would experience.

266
00:14:17,620 --> 00:14:21,100
It was just meant to be a relatively short-term fix to deal with

267
00:14:21,110 --> 00:14:25,080
the explosion of networks that were now forming the internet.

268
00:14:26,040 --> 00:14:29,570
The engineers really thought that they would come along later and replace

269
00:14:29,570 --> 00:14:33,930
it at some future point with a more robust and well-thought-out protocol.

270
00:14:33,990 --> 00:14:36,210
And that’s adorable.

271
00:14:36,970 --> 00:14:37,640
Chris: Still searching.

272
00:14:37,650 --> 00:14:38,910
I’m sure what you’re saying is interesting.

273
00:14:39,240 --> 00:14:39,720
Ned: Mm-hm.

274
00:14:40,510 --> 00:14:45,160
It’s a well-known fact that anything that you put into production, even if it’s

275
00:14:45,160 --> 00:14:51,949
supposed to be a temporary fix, will become a [laugh] a pillar of everything

276
00:14:51,950 --> 00:14:56,119
else that’s built later, and it’s going to be very hard to remove that pillar.

277
00:14:58,190 --> 00:14:59,260
BGP is no exception.

278
00:14:59,889 --> 00:15:03,860
They mapped it out in 1989, and we’re still waiting for its replacement.

279
00:15:04,650 --> 00:15:07,140
This is going to become important as we start to talk about

280
00:15:07,590 --> 00:15:11,530
BGP and its security controls, or its complete lack thereof.

281
00:15:11,920 --> 00:15:13,619
They didn’t think they needed them because

282
00:15:13,620 --> 00:15:15,479
this was supposed to be a stopgap measure.

283
00:15:16,240 --> 00:15:20,589
BGP was iterated on quickly, with version two coming in 1990.

284
00:15:20,650 --> 00:15:22,770
So, that’s a year later from the original idea.

285
00:15:23,110 --> 00:15:27,280
Version three came in 1991, and version four came in 1994.

286
00:15:28,219 --> 00:15:31,760
Version four is the current version of BGP in use by

287
00:15:31,770 --> 00:15:35,549
the internet today, so let’s talk about how it works.

288
00:15:35,830 --> 00:15:38,900
Unless you have some interesting information about these ketchup napkins.

289
00:15:39,170 --> 00:15:41,590
Chris: Are you sure it wasn’t called the two-napkin protocol?

290
00:15:41,890 --> 00:15:42,210
Ned: Nope.

291
00:15:42,250 --> 00:15:42,960
Three napkins.

292
00:15:43,130 --> 00:15:44,520
It had a picture of three napkins.

293
00:15:44,520 --> 00:15:47,339
It’s not the first thing to be drawn out on napkins, though.

294
00:15:47,670 --> 00:15:48,750
Because engineers—

295
00:15:48,759 --> 00:15:51,249
Chris: We could do a whole episode on things that were drawn out on napkins.

296
00:15:51,259 --> 00:15:54,370
Ned: [laugh] . Oh, and how they’re all universally terrible.

297
00:15:55,340 --> 00:15:55,360
[sigh]

298
00:15:56,060 --> 00:15:56,340
Chris: Anyway.

299
00:15:56,340 --> 00:15:56,360
Ned: So—

300
00:15:57,000 --> 00:15:58,109
Chris: Back to whatever it is we—

301
00:15:58,110 --> 00:15:58,390
Ned: BGP.

302
00:15:58,460 --> 00:16:00,010
Chris: Which was—oh right, BGP.

303
00:16:00,020 --> 00:16:00,750
That’s what you were saying.

304
00:16:00,790 --> 00:16:01,089
Okay.

305
00:16:01,200 --> 00:16:02,610
Ned: We’re going to—not napkins—

306
00:16:02,710 --> 00:16:02,900
Chris: I’m back.

307
00:16:02,900 --> 00:16:04,230
Ned: —but we can talk about napkins still.

308
00:16:04,360 --> 00:16:05,549
I have strong opinions.

309
00:16:06,170 --> 00:16:09,280
How expansive do we need to get here about BGP?

310
00:16:09,969 --> 00:16:13,230
I’m going to assume that most people listening

311
00:16:13,490 --> 00:16:15,730
know at least a bit about networking.

312
00:16:16,050 --> 00:16:17,310
At least, I hope so.

313
00:16:17,320 --> 00:16:21,680
Like, otherwise, why are you tuning into this podcast [laugh] ? Be super weird.

314
00:16:21,960 --> 00:16:22,469
Except for you.

315
00:16:22,469 --> 00:16:22,889
Hi, mom.

316
00:16:23,230 --> 00:16:24,990
Chris: Oh, don’t act like your mother listens.

317
00:16:25,170 --> 00:16:26,300
Ned: It’s cruel and true.

318
00:16:27,170 --> 00:16:31,200
So, I’m going to take it as a given that most people know what an IP address

319
00:16:31,200 --> 00:16:36,370
is, are vaguely aware of TCP and how it works, and have at least heard

320
00:16:36,370 --> 00:16:40,400
of routing protocols, even if you don’t understand any of them, even RIP.

321
00:16:41,300 --> 00:16:44,290
Maybe the best thing here would be a packet walk.

322
00:16:44,849 --> 00:16:51,300
How does a packet on my desktop make its way to pod.chaoslever.com?

323
00:16:51,310 --> 00:16:53,199
Just pulling an address out of the air.

324
00:16:53,580 --> 00:16:54,370
Chris: Totally random.

325
00:16:54,580 --> 00:16:55,300
Ned: Totally random.

326
00:16:55,860 --> 00:16:59,860
First, my desktop has to figure out the IP address to

327
00:16:59,870 --> 00:17:03,079
send the web request to, and that’s a function of DNS.

328
00:17:04,099 --> 00:17:08,210
And Chris, as you know, we did two whole last shows about DNS.

329
00:17:08,589 --> 00:17:09,409
Go look them up.

330
00:17:09,980 --> 00:17:10,589
Enjoy them.

331
00:17:11,240 --> 00:17:16,629
Pod.chaoslever.com is hosted on Podpage, which has a few

332
00:17:16,630 --> 00:17:24,099
different public IP addresses on the 216.239.32.0/19 network.

333
00:17:24,389 --> 00:17:25,430
Make sure you remember that.

334
00:17:25,440 --> 00:17:26,530
There will be a test later.

335
00:17:27,210 --> 00:17:30,470
Once I have an IP address, how does my

336
00:17:30,470 --> 00:17:33,550
desktop know where to send that web request?

337
00:17:33,820 --> 00:17:35,939
How does it actually route the packet there?

338
00:17:36,559 --> 00:17:39,789
Well, my desktop’s networking stack has a route table in it.

339
00:17:40,490 --> 00:17:43,190
If you’re on a Windows box like me, open up a

340
00:17:43,190 --> 00:17:47,020
terminal and run the command ‘route print -4’.

341
00:17:47,490 --> 00:17:51,359
That will give you all the routes stored locally for IPv4.

342
00:17:52,170 --> 00:17:57,969
On Linux, it’s probably something like ‘ip route list.’ On Mac, I have no idea.

343
00:17:57,969 --> 00:18:00,600
I think it’s also ‘ip route list’ or something similar?

344
00:18:00,750 --> 00:18:01,250
Chris: Correct.

345
00:18:01,660 --> 00:18:04,060
Ned: This list determines where a packet is

346
00:18:04,060 --> 00:18:07,370
sent, with the most specific entry winning.

347
00:18:07,860 --> 00:18:12,720
Now, since the website I’m trying to contact has a public IP address, my desktop

348
00:18:12,730 --> 00:18:18,440
is going to use what’s called the default route, which looks like 0.0.0.0, which

349
00:18:18,460 --> 00:18:26,700
in my case, points to the home router as the next hop, which is 192.168.1.1.

350
00:18:26,740 --> 00:18:27,620
I’m very creative.

351
00:18:27,650 --> 00:18:28,560
Yes, you’re welcome.

352
00:18:29,130 --> 00:18:32,879
Chances are that is the [laugh] gateway of your home router as well.
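
The lookup Ned describes, where the most specific matching entry wins, can be sketched in Python. This is a minimal illustration under stated assumptions, not how any operating system actually implements it: the route table entries and the destination address 216.239.32.21 are hypothetical values picked to line up with the addresses mentioned in the episode.

```python
import ipaddress

# Hypothetical route table for a home desktop: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1"),   # default route
    (ipaddress.ip_network("192.168.1.0/24"), "on-link"),  # local LAN
    (ipaddress.ip_network("127.0.0.0/8"), "loopback"),    # this machine
]


def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route that contains destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # The longest prefix (largest prefixlen) is the most specific match.
    # The /0 default route matches every IPv4 address, so matches is never empty.
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


# A public address has no specific entry, so it falls through to the default route.
print(next_hop("216.239.32.21"))
# A LAN address matches both /0 and /24; the /24 wins because it is more specific.
print(next_hop("192.168.1.50"))
```

Only the default route matches the public address, so the packet goes to the home router. The LAN address shows why prefix length matters: two entries match and the narrower one is chosen.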

353
00:18:33,620 --> 00:18:38,199
Once my packet hits that router, it checks the route table there—or the

354
00:18:38,200 --> 00:18:42,380
router checks its route table—and decides where to send the traffic next.

355
00:18:43,320 --> 00:18:47,420
My router has a single WAN interface, and that WAN interface

356
00:18:47,429 --> 00:18:50,999
has a public IP address that was handed out by my ISP.

357
00:18:51,820 --> 00:18:55,700
There is a default route on my router that sends traffic to

358
00:18:55,700 --> 00:18:59,980
the next hop that my ISP lists, which is going to be some

359
00:19:00,020 --> 00:19:03,849
kind of router on their side that has its own routing table.

360
00:19:04,530 --> 00:19:09,649
My ISP is Verizon, and my packet may bounce around inside of the Verizon

361
00:19:09,660 --> 00:19:13,790
network for a while before emerging at one of their peering endpoints.

362
00:19:14,150 --> 00:19:16,100
And we’ll cover peering in a little bit.

363
00:19:16,590 --> 00:19:20,310
So, we’ve gone from my desktop to my home router to one

364
00:19:20,310 --> 00:19:22,840
of Verizon’s routers, and then it bounces around inside

365
00:19:22,950 --> 00:19:25,610
of their network until it emerges to go get to Podpage.

366
00:19:27,170 --> 00:19:30,650
That network—Verizon’s network that’s all the various routers that

367
00:19:30,650 --> 00:19:35,480
they control—is what’s referred to as an autonomous system, or AS.

368
00:19:36,180 --> 00:19:40,359
That network is privately managed by Verizon, and all traffic inside their

369
00:19:40,360 --> 00:19:45,909
network is routed using whatever Interior Gateway Protocol they want to use.

370
00:19:46,180 --> 00:19:46,510
That’s an IGP.

371
00:19:47,820 --> 00:19:48,129
Wooo.

372
00:19:48,750 --> 00:19:56,300
That could be IS-IS, OSPF, or even an internal version of BGP called iBGP.

373
00:19:56,830 --> 00:19:59,090
We’re not going to get into that; just know it exists.

374
00:19:59,860 --> 00:20:02,450
That internal routing protocol is going to decide

375
00:20:02,460 --> 00:20:05,789
where my packet emerges from the Verizon network.

376
00:20:06,559 --> 00:20:11,970
The path that my packet takes once it hits the border between Verizon and other

377
00:20:11,990 --> 00:20:17,510
autonomous systems will depend on external BGP and how it makes decisions.

378
00:20:18,450 --> 00:20:22,899
Each autonomous system on the internet gets an AS number or ASN.

379
00:20:24,480 --> 00:20:30,130
The original ASN specification used 16 bits, so the

380
00:20:30,130 --> 00:20:36,429
maximum AS number was 65,535, because we count from zero.

381
00:20:37,210 --> 00:20:40,850
And just like IPv4, there is a range of ASNs

382
00:20:40,959 --> 00:20:43,640
that are reserved for private or internal use.

383
00:20:43,830 --> 00:20:48,000
So, if you were setting up iBGP, you would use those internal ASNs.

384
00:20:49,640 --> 00:20:53,370
The rest of them are managed by the Internet Assigned Numbers Authority

385
00:20:53,389 --> 00:20:58,169
or IANA, which maybe has an acronym pronunciation, I’m not sure.

386
00:20:58,180 --> 00:20:59,210
Have you ever heard one?

387
00:21:01,360 --> 00:21:01,720
Chris: Uh, Jana?

388
00:21:01,980 --> 00:21:02,240
Ned: Ayana?

389
00:21:02,250 --> 00:21:02,260
Eh.

390
00:21:02,780 --> 00:21:03,370
It’s IANA.

391
00:21:03,370 --> 00:21:05,129
Chris: I think that was a Fleetwood Mac song.

392
00:21:05,710 --> 00:21:06,090
Ned: Nice.

393
00:21:07,300 --> 00:21:09,040
[sigh] . Wonder where they got that name,

394
00:21:09,460 --> 00:21:11,650
the Internet Assigned Numbers Authority.

395
00:21:12,440 --> 00:21:13,820
They assign numbers.

396
00:21:14,750 --> 00:21:20,120
Blocks of ASNs are handed out from the IANA to regional

397
00:21:20,150 --> 00:21:23,580
internet registries, and those handle the actual assignment

398
00:21:23,630 --> 00:21:29,280
of ASNs to people who want ASNs, these regional networks.

399
00:21:29,910 --> 00:21:34,360
When BGP was first implemented 16 bits probably seemed like plenty,

400
00:21:34,820 --> 00:21:38,650
and also was what routers were capable of handling at the time.

401
00:21:39,230 --> 00:21:46,630
In 2012, RFC 6793 expanded ASN to use four octets, or 32 bits,

402
00:21:47,130 --> 00:21:51,430
which raised the number of available numbers to roughly 4 billion.
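
The arithmetic behind the 16-bit and 32-bit limits is easy to check. The sketch below is illustrative only; the helper names are made up, and the private-use ranges are the ones reserved in RFC 6996, which the episode does not name.

```python
def max_asn(bits):
    """Largest value that fits in an unsigned field of the given width."""
    return 2**bits - 1

MAX_ASN_16 = max_asn(16)
MAX_ASN_32 = max_asn(32)

# Private-use ranges per RFC 6996 (inclusive bounds).
PRIVATE_16 = range(64512, 65534 + 1)
PRIVATE_32 = range(4200000000, 4294967294 + 1)

def is_private_asn(asn):
    """True if the ASN falls in either private-use range."""
    return asn in PRIVATE_16 or asn in PRIVATE_32

print(MAX_ASN_16)             # 65535
print(MAX_ASN_32)             # 4294967295
print(is_private_asn(64512))  # True
print(is_private_asn(3000))   # False, an arbitrary number in the public range
```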

403
00:21:51,910 --> 00:21:52,880
Will that be enough?

404
00:21:53,309 --> 00:21:57,550
At the moment, current statistics show that regional internet registries

405
00:21:57,550 --> 00:22:02,919
have handed out 130,000 ASNs, so, um… I think we’ll be all right, for a while.

406
00:22:03,400 --> 00:22:04,270
Chris: We’ll be good, I think.

407
00:22:04,270 --> 00:22:04,830
We’ll be good.

408
00:22:05,219 --> 00:22:07,990
Ned: This is very different than the lack of available public

409
00:22:09,080 --> 00:22:12,099
IPv4 addresses because it’s not like every device gets an ASN.

410
00:22:12,560 --> 00:22:15,070
It’s every large network gets one.

411
00:22:15,950 --> 00:22:21,099
Still, though, that’s 130,000 public-facing ASNs that BGP

412
00:22:21,110 --> 00:22:23,930
has to worry about when it comes to routing your packets.

413
00:22:24,360 --> 00:22:25,389
This thing has to be scalable.

414
00:22:26,120 --> 00:22:27,449
So, how does it do that?

415
00:22:28,160 --> 00:22:30,229
Chris: I thought we already established that: magic.

416
00:22:30,510 --> 00:22:30,770
Ned: Yes.

417
00:22:31,190 --> 00:22:32,210
That’s essentially what it is.

418
00:22:32,240 --> 00:22:35,879
And if you want to stop there, and just know that that’s what BGP is responsible

419
00:22:35,880 --> 00:22:41,110
for, you can ignore the next, like, ten minutes [laugh] . To get into some of

420
00:22:41,110 --> 00:22:44,509
the detail—and we’re not going to get down to nitty gritty here, but just some

421
00:22:44,509 --> 00:22:49,560
of the detail here—BGP is what’s called a path vector-based routing protocol,

422
00:22:49,870 --> 00:22:55,230
which means that it decides on a specific path for a route based on attributes.

423
00:22:55,770 --> 00:22:59,090
Vector is the direction and path is the selection.

424
00:22:59,690 --> 00:23:02,840
BGP doesn’t understand or care about things like

425
00:23:02,920 --> 00:23:07,080
bandwidth, or latency, or even hops, really.

426
00:23:07,670 --> 00:23:10,870
Instead, it has a path selection algorithm that walks

427
00:23:10,870 --> 00:23:14,190
through the attributes of each possible path for a packet,

428
00:23:14,599 --> 00:23:17,899
and then picks one based on the selection criteria.

429
00:23:18,700 --> 00:23:21,379
We’ll get into the actual process it uses in a

430
00:23:21,380 --> 00:23:24,189
moment, but where is it getting this information from?

431
00:23:24,959 --> 00:23:26,010
From its neighbors.

432
00:23:26,530 --> 00:23:27,929
Oh, they have neighbors.

433
00:23:27,980 --> 00:23:29,110
It’s like a community.

434
00:23:29,520 --> 00:23:31,460
And there’s also communities [laugh].

435
00:23:31,480 --> 00:23:33,989
Chris: I would just like to pause and remind everybody that Ned

436
00:23:34,050 --> 00:23:37,360
explicitly said he wasn’t going to get into the nitty-gritty.

437
00:23:37,590 --> 00:23:38,709
Ned: I’m not [laugh].

438
00:23:39,130 --> 00:23:40,269
Chris: That’s the thing.

439
00:23:41,980 --> 00:23:44,680
Ned: This is the high-level stuff [laugh] . It gets so much deeper.

440
00:23:44,890 --> 00:23:47,730
Chris: No, no, I just wanted to point that out to explain to people

441
00:23:47,830 --> 00:23:52,340
a little more justification as to why my run away screaming protocol

442
00:23:52,420 --> 00:23:56,420
is what I operate upon when BGP comes up in quiet conversation.

443
00:23:57,360 --> 00:23:57,649
Ned: Right.

444
00:23:57,649 --> 00:24:01,970
All right, so if I’m a BGP—I’m a router running BGP,

445
00:24:01,970 --> 00:24:06,090
you can call me a node—I form relationships with other

446
00:24:06,090 --> 00:24:09,340
routers running BGP through what’s called neighborships.

447
00:24:09,340 --> 00:24:11,605
I don’t like the term, but apparently it’s used.

448
00:24:11,605 --> 00:24:12,670
Chris: Please tell me that’s not real.

449
00:24:12,830 --> 00:24:13,520
Ned: That’s real.

450
00:24:13,910 --> 00:24:14,529
I’m sorry.

451
00:24:15,009 --> 00:24:18,049
Setting up a neighborship is very, very simple.

452
00:24:18,370 --> 00:24:21,149
Let’s say we’ve got two routers: Router A and Router B.

453
00:24:21,720 --> 00:24:21,996
On Router—

454
00:24:21,996 --> 00:24:23,080
Chris: I just got—oh, my God.

455
00:24:23,320 --> 00:24:23,610
Ned: What?

456
00:24:24,270 --> 00:24:24,590
Chris: Neighborship?

457
00:24:24,590 --> 00:24:24,600
Ned: Neighborship.

458
00:24:26,849 --> 00:24:30,950
When I heard it first, I was like, that can’t possibly be the real term.

459
00:24:31,639 --> 00:24:34,689
They’re also called peers, and I like that better, but

460
00:24:34,709 --> 00:24:38,110
that gets into the difference between peering and transit.

461
00:24:38,590 --> 00:24:39,669
And so…

462
00:24:39,969 --> 00:24:42,690
Chris: Can you hold on for one second, I got to go get a glass.

463
00:24:44,550 --> 00:24:45,409
Ned: [laugh] . Smash it real hard.

464
00:24:47,179 --> 00:24:50,159
[sigh] . The problem is that we use the same terms to mean

465
00:24:50,160 --> 00:24:52,990
too many different things in technology, and so sometimes

466
00:24:52,990 --> 00:24:56,010
we just got to make up a word, and it’s not always good.

467
00:24:56,990 --> 00:24:57,490
Anyway.

468
00:24:58,740 --> 00:25:01,320
So, let’s say I have two routers: Router A, Router B.

469
00:25:01,600 --> 00:25:06,829
On Router A, I tell it the IP address of Router B and its ASN.

470
00:25:06,829 --> 00:25:13,810
And then over on Router B, I tell it the IP address of Router A and its ASN.

471
00:25:13,820 --> 00:25:16,600
On Router A, I add any networks that I want to

472
00:25:16,620 --> 00:25:21,560
advertise, and same thing for Router B, and that’s it.

473
00:25:22,360 --> 00:25:25,480
The two routers will establish a TCP connection over

474
00:25:25,480 --> 00:25:29,449
port 179, and start exchanging route information.
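
The two-sided setup described above can be modeled as plain data. This is a sketch under stated assumptions, not any vendor's configuration syntax: the `Neighbor` and `BgpRouter` classes, the `session_can_form` check, and every address and ASN here are hypothetical (the ASNs come from the private-use range).

```python
from dataclasses import dataclass, field

BGP_PORT = 179  # TCP port the session runs over

@dataclass
class Neighbor:
    ip: str
    remote_asn: int

@dataclass
class BgpRouter:
    name: str
    ip: str
    asn: int
    neighbors: list = field(default_factory=list)
    networks: list = field(default_factory=list)  # prefixes to advertise

def session_can_form(a, b):
    """True when each side lists the other's IP address and ASN correctly."""
    a_knows_b = any(n.ip == b.ip and n.remote_asn == b.asn for n in a.neighbors)
    b_knows_a = any(n.ip == a.ip and n.remote_asn == a.asn for n in b.neighbors)
    return a_knows_b and b_knows_a

router_a = BgpRouter("A", "10.0.0.1", 64512,
                     [Neighbor("10.0.0.2", 64513)], ["172.16.0.0/16"])
router_b = BgpRouter("B", "10.0.0.2", 64513,
                     [Neighbor("10.0.0.1", 64512)], ["172.17.0.0/16"])

print(session_can_form(router_a, router_b))  # True: both sides agree
```

If either side has the wrong IP address or ASN for the other, the check fails, which mirrors why both routers have to be configured before anything is exchanged.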

475
00:25:29,959 --> 00:25:33,200
Each router will share the networks that it is advertising

476
00:25:33,380 --> 00:25:36,470
and any networks it learned about from other routers.

477
00:25:37,250 --> 00:25:41,600
And BGP only sends messages across that link

478
00:25:41,650 --> 00:25:44,210
when there’s an update to its advertised routes.

479
00:25:44,230 --> 00:25:48,910
So, unlike something like RIP that, every 30 seconds goes, “Here’s all my

480
00:25:48,910 --> 00:25:54,630
routes.” “Here’s all my routes.” That would be bad and awful, so BGP just

481
00:25:54,710 --> 00:25:59,200
sends information when something changes about one of the advertised routes.
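
The difference between resending everything and sending only changes can be sketched as a set difference. This is an illustration of the idea only; the function names and message strings are hypothetical and do not reflect the real BGP wire format.

```python
def diff_routes(previous, current):
    """Return (announce, withdraw) needed to bring a peer from previous to current."""
    announce = current - previous   # routes the peer has not heard about yet
    withdraw = previous - current   # routes that are no longer valid
    return announce, withdraw

def messages_to_send(previous, current):
    """Send an update only when something changed, otherwise a keep-alive."""
    announce, withdraw = diff_routes(previous, current)
    if not announce and not withdraw:
        return ["KEEPALIVE"]
    return [f"UPDATE announce={sorted(announce)} withdraw={sorted(withdraw)}"]

before = {"172.16.0.0/16", "172.17.0.0/16"}
after = {"172.16.0.0/16", "172.18.0.0/16"}

print(messages_to_send(before, before))  # ['KEEPALIVE']
print(messages_to_send(before, after))
```

The second call reports one new prefix and one withdrawn prefix, rather than repeating the unchanged 172.16.0.0/16.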

482
00:25:59,550 --> 00:26:05,239
Otherwise, just hangs out, chills, plays Pinochle, and every 30

483
00:26:05,239 --> 00:26:07,639
or 60 seconds, it sends a keep-alive saying, “Yep, I’m still here.

484
00:26:07,770 --> 00:26:10,260
I got nothing new to say.” Kind of like you, Chris.

485
00:26:10,330 --> 00:26:13,760
I check in every 30 to 60 seconds to make sure you’re still here [laugh].

486
00:26:14,170 --> 00:26:15,870
Chris: As usual, I’ve got nothing new to say.

487
00:26:16,899 --> 00:26:20,839
Ned: [laugh] . Indeed, the routing decisions made by Router A

488
00:26:20,839 --> 00:26:24,449
will depend on the advertisements it gets from its neighbors.

489
00:26:25,000 --> 00:26:28,160
So, so far, we’ve just got Router A and B, but we

490
00:26:28,160 --> 00:26:31,800
can add additional routers as neighbors: C, D, and E.

491
00:26:32,510 --> 00:26:37,640
Router A learns about routes to different networks from all of these neighbors,

492
00:26:37,950 --> 00:26:41,820
and then makes path-based decisions based on the routes that it learned.

493
00:26:41,820 --> 00:26:47,510
BGP network advertisements can have a ton of attributes, but

494
00:26:47,510 --> 00:26:51,080
there’s really only about eight standard ones that are commonly

495
00:26:51,080 --> 00:26:55,020
used, and honestly, there’s probably only about three or four

496
00:26:55,020 --> 00:26:58,510
that actually matter, so we’re just going to talk about those.

497
00:26:59,000 --> 00:26:59,539
Chris: Thank God.

498
00:26:59,920 --> 00:27:00,380
Ned: Yes.

499
00:27:01,139 --> 00:27:06,750
Local preference is an attribute that lets you prefer one route over another.

500
00:27:07,120 --> 00:27:11,029
I could give Router B preference over Router C.

501
00:27:11,759 --> 00:27:12,639
Very straightforward.

502
00:27:13,469 --> 00:27:17,039
If both routers are an option for a given destination,

503
00:27:17,619 --> 00:27:19,980
the one with the higher preference gets the nod.

504
00:27:20,040 --> 00:27:23,880
So, Router B would get—I’d send my traffic to Router B instead of Router C.

505
00:27:24,540 --> 00:27:27,639
That’s useful if, say, the link on Router B is a

506
00:27:27,650 --> 00:27:30,680
ten gig link and the link to Router C is one gig.

507
00:27:30,990 --> 00:27:34,050
I probably want to use the link to Router B if I can help it.

508
00:27:34,590 --> 00:27:36,750
BGP doesn’t know about link speed, but you do.

509
00:27:37,460 --> 00:27:39,989
The next attribute is AS path length.

510
00:27:41,180 --> 00:27:44,970
The AS path is a list of every autonomous system a

511
00:27:44,970 --> 00:27:48,210
packet will pass through, from source to destination.

512
00:27:48,849 --> 00:27:51,560
So, when a router learns about a route from one of its

513
00:27:51,570 --> 00:27:55,199
neighbors and wants to share that route with the next router

514
00:27:55,200 --> 00:27:59,890
in line, it tacks on its AS number to the end of the AS path.

515
00:28:00,720 --> 00:28:05,909
So, the more autonomous systems a route travels through, the longer the

516
00:28:05,910 --> 00:28:12,500
path length becomes, and that makes it less preferred as a path to choose.

517
00:28:13,080 --> 00:28:16,350
That doesn’t mean that the shorter AS path route is

518
00:28:16,360 --> 00:28:20,370
actually faster, it just means that it’s shorter.

519
00:28:20,950 --> 00:28:24,720
Inside that autonomous system, there could be way more hops

520
00:28:24,860 --> 00:28:28,819
between the ingress and egress routers, so that’s why you might

521
00:28:28,820 --> 00:28:32,090
want to use something like local preference if you know that,

522
00:28:32,100 --> 00:28:36,500
say, Joe’s ISP and Crab Shack kind of sucks at passing traffic.

523
00:28:36,980 --> 00:28:38,160
Chris: Phenomenal crabs, though.

524
00:28:38,500 --> 00:28:39,540
Ned: Really good crabs.

525
00:28:40,120 --> 00:28:42,210
The last attribute is the router ID.

526
00:28:42,660 --> 00:28:48,899
If all other attributes for a route are the same, the lower router ID wins.

527
00:28:49,830 --> 00:28:51,590
Where does that router ID come from?

528
00:28:52,410 --> 00:28:53,290
That’s weird.

529
00:28:53,300 --> 00:28:54,540
It’s kind of up to the admin.

530
00:28:55,300 --> 00:28:58,960
The form looks exactly like an IPv4 address, and it’s

531
00:28:58,960 --> 00:29:01,940
usually set to the first loopback interface on the router.

532
00:29:02,759 --> 00:29:06,500
The router ID needs to be unique within an individual

533
00:29:06,590 --> 00:29:09,530
autonomous system and unique among its peers.

534
00:29:10,550 --> 00:29:13,290
So, you know, you can’t have two routers in the same

535
00:29:13,540 --> 00:29:17,349
neighborship—so sorry—that have the same router ID.

536
00:29:18,190 --> 00:29:19,209
Bad things will happen.
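
The three attributes just covered can be combined into a simplified tie-breaking order: highest local preference, then shortest AS path, then lowest router ID. The sketch below assumes only those three steps; the real BGP decision process has more, and the `Path` class, field names, and sample values are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    via: str          # neighbor the route was learned from
    local_pref: int   # higher is preferred
    as_path: tuple    # ASNs the route has passed through
    router_id: str    # written like an IPv4 address, lower wins ties

def preference_key(path):
    """Sort key where the smallest tuple is the most preferred path."""
    return (
        -path.local_pref,                            # highest local preference first
        len(path.as_path),                           # then shortest AS path
        int(ipaddress.IPv4Address(path.router_id)),  # then lowest router ID
    )

def best_path(paths):
    return min(paths, key=preference_key)

candidates = [
    Path("B", 200, (64513, 64520, 64530), "10.0.0.2"),
    Path("C", 100, (64514,), "10.0.0.3"),
]
print(best_path(candidates).via)  # B: local preference outranks the shorter AS path
```

With equal local preference the shorter AS path through C would win instead, and the router ID only matters when both earlier comparisons tie.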

537
00:29:20,040 --> 00:29:25,890
Speaking of peers—back to our packet walk—the request has now left the Verizon

538
00:29:25,890 --> 00:29:30,530
network and it’s gone to some other network based on advertised routes.

539
00:29:31,260 --> 00:29:35,330
The Verizon router made a decision based on the path attributes for each route.

540
00:29:35,849 --> 00:29:37,439
Where is this all happening?

541
00:29:38,139 --> 00:29:40,070
Physically, where’s this actually happening?

542
00:29:40,770 --> 00:29:45,260
It’s at an internet exchange point of some kind—most likely—where a peering

543
00:29:45,270 --> 00:29:49,170
or transit arrangement has been created between two or more routers.

544
00:29:50,080 --> 00:29:54,380
So, at this point, we’re kind of done with BGP, but that led me

545
00:29:54,630 --> 00:29:58,470
to another rabbit hole, which is okay, I understand the theory.

546
00:29:58,580 --> 00:30:00,600
Where’s all this stuff actually happening?

547
00:30:01,350 --> 00:30:04,130
And it’s happening at these dedicated colocation

548
00:30:04,130 --> 00:30:06,430
facilities and internet exchange points.

549
00:30:06,570 --> 00:30:11,499
They used to be called NAPs, which was like Network Access… something.

550
00:30:11,980 --> 00:30:16,060
And there was a place called a SUPERNAP, down in Virginia,

551
00:30:16,139 --> 00:30:19,530
I think, where there were, like, a metric shit ton of these

552
00:30:19,730 --> 00:30:22,510
different ISP lines all coming into the same facility.

553
00:30:22,920 --> 00:30:24,430
I don’t know if it’s still called the SUPERNAP.

554
00:30:25,050 --> 00:30:27,430
Chris: I think I’m lined up for a super nap, if you know what I’m saying.

555
00:30:27,530 --> 00:30:28,320
Ned: I do.

556
00:30:28,380 --> 00:30:29,960
I set you up for that one.

557
00:30:29,960 --> 00:30:30,540
You’re welcome.

558
00:30:31,259 --> 00:30:33,769
So, this isn’t entirely relevant to BGP,

559
00:30:33,790 --> 00:30:36,019
except it filled in some mental gaps for me.

560
00:30:36,920 --> 00:30:39,220
How are two autonomous systems connected?

561
00:30:39,880 --> 00:30:43,480
Well, they’re connected by two routers, but there’s two basic physical

562
00:30:43,490 --> 00:30:47,970
topologies that are followed: you can have a public peering arrangement

563
00:30:47,970 --> 00:30:52,280
between a bunch of ASs, and that usually happens at one of these internet

564
00:30:52,289 --> 00:30:57,460
exchange points, or a rented colocation space from a neutral provider.

565
00:30:57,470 --> 00:31:00,900
Think Equinix, or Digital Realty would be examples.

566
00:31:01,920 --> 00:31:04,960
Each ISP’s router will be connected into a common

567
00:31:04,960 --> 00:31:09,190
switch fabric, and peering relationships will be formed

568
00:31:09,219 --> 00:31:11,849
between each router that’s connected into the switch.

569
00:31:12,559 --> 00:31:15,639
So, they’re all exchanging routing information with each other.

570
00:31:16,480 --> 00:31:21,780
The other option is a direct router-to-router connection between two ASs.

571
00:31:22,179 --> 00:31:23,749
That’s known as private peering.

572
00:31:24,590 --> 00:31:27,860
If you’ve ever been involved in setting up a connection to AWS with

573
00:31:27,860 --> 00:31:31,879
Direct Connect, or Azure with Express Connect—or Express Route.

574
00:31:32,070 --> 00:31:36,600
Sorry, stupid names—both of those use private peering and a

575
00:31:36,600 --> 00:31:41,490
direct physical connection from your network to Azure or AWS.

576
00:31:41,620 --> 00:31:44,880
You have to set up what’s called a cross-connect, which is essentially, from

577
00:31:44,880 --> 00:31:49,720
your router—or a router that you’re leasing through your ISP—it’s a cable

578
00:31:49,720 --> 00:31:54,280
that runs to the router or the switch that the cloud router is hooked into.

579
00:31:55,190 --> 00:31:58,420
There’s also a slight difference between peering and transit.

580
00:31:58,960 --> 00:32:01,670
Peering means that I can send traffic to your network,

581
00:32:01,700 --> 00:32:04,190
and you can send traffic to my network, and we don’t

582
00:32:04,190 --> 00:32:07,399
charge each other any money for accepting that traffic.

583
00:32:08,170 --> 00:32:11,860
Consider a scenario where you have a few different regional

584
00:32:11,860 --> 00:32:14,770
networks that want to pass network traffic between each other,

585
00:32:14,820 --> 00:32:17,669
rather than sending the traffic across a transit network.

586
00:32:18,620 --> 00:32:22,819
They can all rent space together at a colocation data center, and set up a

587
00:32:22,900 --> 00:32:26,889
public peering arrangement where they’ll exchange routes and pass traffic.

588
00:32:27,320 --> 00:32:30,689
It’s beneficial for all the networks involved to be able to

589
00:32:30,700 --> 00:32:34,310
communicate freely, and there’s a verbal peering agreement,

590
00:32:34,429 --> 00:32:37,380
or handshake agreement, to not be an asshole about it.

591
00:32:38,520 --> 00:32:38,550
Chris: [laugh].

592
00:32:38,940 --> 00:32:39,490
Ned: I’m serious.

593
00:32:39,490 --> 00:32:41,099
They’re like, “Just don’t be a dick.

594
00:32:41,440 --> 00:32:44,790
Don’t overwhelm my network with traffic that’s destined for somewhere else.

595
00:32:44,800 --> 00:32:46,860
Don’t try to use me as a transit network, and

596
00:32:46,860 --> 00:32:50,439
we’ll all get along.” And yes, I’m very serious.

597
00:32:51,020 --> 00:32:55,370
A study in 2011 showed that only 0.05% of

598
00:32:55,370 --> 00:32:58,309
peering agreements were actual written contracts.

599
00:32:59,199 --> 00:33:02,920
I imagine that’s grown in the last 13 years with the explosion of cloud

600
00:33:02,920 --> 00:33:06,470
where, like, if you want a peering agreement with Azure, it is absolutely

601
00:33:06,470 --> 00:33:10,589
a written contract, but from what I’ve heard, that’s in the minority.

602
00:33:10,630 --> 00:33:13,050
These regional networks are still using just

603
00:33:13,050 --> 00:33:16,350
handshakes and, like, firm nods at each other.

604
00:33:17,190 --> 00:33:20,000
Transit relationships are where a network is paying

605
00:33:20,000 --> 00:33:22,700
another network for access to the general internet.

606
00:33:23,820 --> 00:33:27,500
There’s a few giant tier one operators that lots of other

607
00:33:27,500 --> 00:33:30,710
networks pay to transmit their traffic across the internet.

608
00:33:31,540 --> 00:33:35,589
A regional network in, say, Luxembourg is unlikely to have a

609
00:33:35,590 --> 00:33:39,680
direct peering relationship with a network in Omaha, Nebraska,

610
00:33:40,240 --> 00:33:42,850
so that traffic needs to transit through another provider.

611
00:33:43,520 --> 00:33:45,870
That provider doesn’t see a mutual benefit for

612
00:33:45,880 --> 00:33:48,530
providing that transit, so they charge for it.

613
00:33:49,770 --> 00:33:52,530
Tier one networks are those networks that can reach all

614
00:33:52,530 --> 00:33:55,870
other networks on the internet using settlement-free peering.

615
00:33:56,690 --> 00:34:01,060
Tier two networks have to pay for at least some transit to other networks.

616
00:34:01,630 --> 00:34:05,680
And tier three networks pay for transit to all networks.
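
The tier definitions above boil down to two yes/no questions. This is only a restatement of the episode's description as code; the function and its parameter names are hypothetical, and real-world tier labels are fuzzier than this.

```python
def classify_tier(uses_peering, pays_for_transit):
    """Classify a network using the three-tier description given in the episode."""
    if not pays_for_transit:
        return 1  # reaches every other network via settlement-free peering
    if uses_peering:
        return 2  # peers with some networks, pays for transit to the rest
    return 3      # pays for transit to reach everything

print(classify_tier(uses_peering=True, pays_for_transit=False))  # 1
print(classify_tier(uses_peering=True, pays_for_transit=True))   # 2
print(classify_tier(uses_peering=False, pays_for_transit=True))  # 3
```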

617
00:34:06,650 --> 00:34:09,210
Who are these mysterious tier one providers?

618
00:34:09,420 --> 00:34:11,570
Well, Verizon is one.

619
00:34:12,190 --> 00:34:17,920
So is AT&T, and Comcast, and Lumen, who you might not have

620
00:34:17,920 --> 00:34:20,950
heard of, but that’s because they used to be called CenturyLink.

621
00:34:21,489 --> 00:34:23,360
They changed their name because they had a

622
00:34:23,360 --> 00:34:25,540
terrible reputation, and that was going to help.

623
00:34:26,110 --> 00:34:30,580
They’re also the biggest tier one provider in the world as far as I can tell.

624
00:34:31,659 --> 00:34:35,490
Since Verizon is a tier one network—going back to our packet walk,

625
00:34:35,490 --> 00:34:39,980
and to round this all out—since it’s a tier one network, my packet

626
00:34:40,010 --> 00:34:44,100
doesn’t have to go across another transit network to get to Podpage.

627
00:34:45,320 --> 00:34:47,209
I looked it up, and Podpage is actually

628
00:34:47,209 --> 00:34:49,759
using Google Cloud to host their service.

629
00:34:50,489 --> 00:34:55,530
So, when I looked at it, the ASNs for Podpage—or the public IP

630
00:34:55,530 --> 00:35:00,399
addresses they’re using—lined up to Google’s ASNs, and so my

631
00:35:00,400 --> 00:35:03,830
little packet will go directly from Verizon network to Google.

632
00:35:04,059 --> 00:35:05,980
No other transit required.

633
00:35:06,000 --> 00:35:08,090
And in fact, that’s exactly what it does.

634
00:35:08,550 --> 00:35:13,040
Through the magic of traceroute, I can see my packet hop from Verizon, to

635
00:35:13,040 --> 00:35:19,110
Verizon business, to Google, to another Google AS because they have multiples.

636
00:35:19,929 --> 00:35:23,139
BGP has done its job, and all is well with the internet.

637
00:35:23,930 --> 00:35:24,770
But what if it isn’t?

638
00:35:25,500 --> 00:35:25,570
Chris: [laugh].

639
00:35:26,260 --> 00:35:27,750
Ned: How can BGP break?

640
00:35:28,090 --> 00:35:29,810
And can people do it on purpose?

641
00:35:30,520 --> 00:35:31,870
The answers will shock you.

642
00:35:32,360 --> 00:35:36,710
I—they probably won’t shock you [laugh] . The answer is there

643
00:35:36,710 --> 00:35:41,230
are many ways to break BGP, and yes, it can be done on purpose.

644
00:35:41,500 --> 00:35:46,770
But that is the story for another time, a future episode, and a guest

645
00:35:46,920 --> 00:35:50,279
who’s more eloquent than me at explaining security issues with BGP.

646
00:35:50,279 --> 00:35:50,299
[sigh].

647
00:35:52,820 --> 00:35:53,499
You feel better?

648
00:35:53,700 --> 00:35:54,379
Chris: No.

649
00:35:54,570 --> 00:35:57,359
Ned: Have I demystified some of the magic of the internet for you?

650
00:35:57,660 --> 00:35:59,290
Chris: I’m more confused than when I started,

651
00:35:59,290 --> 00:36:00,780
and I didn’t think that was possible.

652
00:36:01,110 --> 00:36:01,350
Ned: Good.

653
00:36:01,350 --> 00:36:04,560
Then my job… [laugh] is a complete success.

654
00:36:04,670 --> 00:36:05,610
My job here is done.

655
00:36:06,550 --> 00:36:07,890
Hey, thanks for listening or something.

656
00:36:07,890 --> 00:36:10,610
I guess you found it worthwhile enough if you made it all the way to the

657
00:36:10,610 --> 00:36:14,219
end, so congratulations to you, friend, you accomplished something today.

658
00:36:14,670 --> 00:36:15,170
Maybe.

659
00:36:15,790 --> 00:36:18,640
Now, you can sit on the couch, think about the magic of

660
00:36:18,910 --> 00:36:22,049
BGP, and just get hopelessly confused like the rest of us.

661
00:36:22,330 --> 00:36:22,860
You’ve earned it.

662
00:36:23,390 --> 00:36:26,220
You can find more about this show by going to our LinkedIn page, just

663
00:36:26,220 --> 00:36:30,680
search ‘Chaos Lever,’ or go to the website, pod.chaoslever.com, where

664
00:36:30,680 --> 00:36:34,170
you’ll find show notes, blog posts, and general tomfoolery, and you

665
00:36:34,170 --> 00:36:37,590
can leave a comment that we might read on the Tech News of the Week.

666
00:36:37,980 --> 00:36:40,570
We’ll be back next week to see what fresh hell is upon us.

667
00:36:40,730 --> 00:36:41,620
Ta-ta for now.

668
00:36:49,740 --> 00:36:53,290
Chris: And just to make things even more unnecessarily confusing,

669
00:36:53,879 --> 00:36:57,530
it was originally called the two-napkin protocol, when it was first

670
00:36:57,550 --> 00:37:03,460
proposed and first published in a Cisco internal blog in 1989.

671
00:37:04,030 --> 00:37:06,190
Ned: [laugh] . And then a third napkin arose?

672
00:37:06,480 --> 00:37:07,030
Oh, no.

673
00:37:07,309 --> 00:37:08,970
Chris: Look, I mean, math is hard.