Welcome to the Chaos
June 20, 2024

Break the Glass and Walk Away: A (VERY) Brief Overview of BGP

Ned and Chris give a very brief overview of BGP, its place in the history of the internet, and how it works today.

It’s a Confusing Day in the Neighborship

Sure, Kim Kardashian broke the internet that one time, but she’s not the only one capable of such a feat. In this episode, Ned and Chris recount the tale of how Verizon and a BGP optimizer took large swaths of the internet offline in 2019. This leads them into the intricacies of the Border Gateway Protocol, tracing its evolution from a temporary solution for NSFNET in the 1980s to a foundational element of internet routing today. Along the way, they explore version four’s operational details, including key attributes like local preference and AS path length.


Transcript
1
00:00:00,530 --> 00:00:04,910
Ned: I made the unfortunate decision to just use chaoslever.com

2
00:00:05,170 --> 00:00:08,220
and no subdomain [laugh] . So, there’s two problems.

3
00:00:08,320 --> 00:00:09,049
Chris: One is Ned.

4
00:00:09,330 --> 00:00:13,179
Ned: One is me [laugh] . I am always the perennial problem.

5
00:00:13,710 --> 00:00:19,390
They go with the assumption you want to use ‘www’ as your subdomain, so they

6
00:00:19,390 --> 00:00:25,689
do support setting your apex record—the at record—for chaoslever.com to—

7
00:00:25,689 --> 00:00:25,709
Chris: [loud snores]

8
00:00:25,709 --> 00:00:30,050
Ned: [laugh] you’re very—you’re cruel.

9
00:00:30,050 --> 00:00:31,280
Chris: [more loud snores]

10
00:00:31,280 --> 00:00:33,730
Ned: [laugh] . Goddammit.

11
00:00:43,200 --> 00:00:46,000
Hello, alleged human, and welcome to the Chaos Lever podcast.

12
00:00:46,000 --> 00:00:48,550
My name is Ned, and I’m definitely not a robot.

13
00:00:48,620 --> 00:00:55,500
I am a sentient, real human person with feelings, dreams, and just the general

14
00:00:55,500 --> 00:00:59,710
desire to smoothly migrate a website and not have everything go to shit.

15
00:01:00,670 --> 00:01:04,259
[sigh] . With me is Chris, who was also here?

16
00:01:04,719 --> 00:01:05,279
Mostly.

17
00:01:05,669 --> 00:01:08,180
Chris: Have you ever read my favorite philosophical tract?

18
00:01:08,330 --> 00:01:08,880
Ned: I don’t know.

19
00:01:09,070 --> 00:01:09,830
Chris: It’s a short one.

20
00:01:09,839 --> 00:01:10,870
It’s an ancient text.

21
00:01:10,910 --> 00:01:12,929
It was translated, I think, from the Sumerian.

22
00:01:13,250 --> 00:01:13,680
Ned: Okay.

23
00:01:14,080 --> 00:01:17,450
Chris: And the title is, “Whatever You’re Trying To Do,”

24
00:01:17,760 --> 00:01:22,929
Sumerian question mark, dot dot dot, “Yeah, Good Luck With That.”

25
00:01:25,719 --> 00:01:27,210
Ned: [laugh] . Wow, that is a philosophy that

26
00:01:27,210 --> 00:01:29,940
is just broadly applicable to every situation.

27
00:01:30,709 --> 00:01:33,479
Chris: I believe—and this is, you know, it’s really tough

28
00:01:33,500 --> 00:01:36,200
with archaeology because you get a lot of incomplete records—

29
00:01:36,680 --> 00:01:37,140
Ned: It’s true.

30
00:01:37,610 --> 00:01:42,060
Chris: But I believe, and modern science agrees with me on this, the

31
00:01:42,060 --> 00:01:45,550
follow-up book to that is, “I Fucking Told You It Wasn’t Going To Work.”

32
00:01:46,670 --> 00:01:48,429
Ned: [laugh] . I’m glad to know that the

33
00:01:48,500 --> 00:01:51,770
Sumerians were so blunt in their philosophy.

34
00:01:52,030 --> 00:01:53,370
There’s nothing aesthetic about it.

35
00:01:53,640 --> 00:01:54,800
I appreciate it.

36
00:01:55,090 --> 00:01:57,820
Chris: I mean, it’s really, really hot in Sumer.

37
00:01:58,219 --> 00:01:58,630
Ned: Is it?

38
00:01:59,210 --> 00:01:59,800
Chris: Sure.

39
00:02:00,860 --> 00:02:05,289
Ned: Whenever people would bring up ancient civilizations, Babylon,

40
00:02:05,299 --> 00:02:11,600
Sumer, et cetera, I always thought of those as in, sort of, some

41
00:02:11,800 --> 00:02:16,700
mythical place that didn’t actually exist on the modern map of today, and

42
00:02:16,700 --> 00:02:20,700
I was sad to realize at some point that that was not true, and that these are

43
00:02:20,730 --> 00:02:24,809
actual locations that you can go to; they just have different names now.

44
00:02:25,330 --> 00:02:27,130
Chris: Yeah, Ur still exists.

45
00:02:27,210 --> 00:02:28,200
I think it’s in Iraq.

46
00:02:29,040 --> 00:02:34,090
Ned: I don’t like it [laugh] . Yeah… oh, well.

47
00:02:34,280 --> 00:02:34,899
Here we are.

48
00:02:34,990 --> 00:02:38,130
Let’s talk about another mythical thing that shouldn’t exist, but does.

49
00:02:38,679 --> 00:02:39,620
It’s BGP.

50
00:02:40,250 --> 00:02:43,210
Chris: I’m not going to lie, that is, like, top ten transitions for you.

51
00:02:43,820 --> 00:02:44,340
Ned: [laugh] . Thank you.

52
00:02:44,450 --> 00:02:45,770
Chris: Might even be top five.

53
00:02:46,290 --> 00:02:48,920
Ned: [laugh] . I felt really good about it, in part

54
00:02:48,920 --> 00:02:51,220
because it was completely organic and not planned.

55
00:02:51,570 --> 00:02:53,390
And now I’m ruining it by talking about it.

56
00:02:53,800 --> 00:02:55,460
So, another top five right there.

57
00:02:55,889 --> 00:02:56,680
Chris: Different five.

58
00:02:57,040 --> 00:02:57,579
Ned: Yes.

59
00:02:58,020 --> 00:02:58,819
So Chris—

60
00:02:58,959 --> 00:02:59,249
Chris: What?

61
00:02:59,430 --> 00:03:01,950
Ned: What’s your general feeling on BGP?

62
00:03:02,960 --> 00:03:04,650
Chris: Anytime people start talking about it

63
00:03:05,760 --> 00:03:08,200
enthusiastically, I break a glass and walk away.

64
00:03:08,910 --> 00:03:10,520
Ned: [laugh] . You don’t threaten them with it?

65
00:03:10,830 --> 00:03:12,410
Chris: No, no, no, I just want the distraction.

66
00:03:12,790 --> 00:03:14,539
I understand and respect this conversation,

67
00:03:14,540 --> 00:03:17,769
but I don’t need it to be in my life at all.

68
00:03:18,449 --> 00:03:22,770
Ned: It does seem like one of those mysteries of

69
00:03:22,770 --> 00:03:25,080
the faith when it comes to network engineering.

70
00:03:25,120 --> 00:03:28,740
Like, BGP, it’s overseen by wizards—

71
00:03:28,960 --> 00:03:29,419
Chris: Oh, yeah.

72
00:03:29,469 --> 00:03:30,449
Ned: And warlocks.

73
00:03:30,609 --> 00:03:33,420
Chris: There are robes involved, incantations.

74
00:03:33,849 --> 00:03:35,730
Ned: At least one animal sacrifice.

75
00:03:36,139 --> 00:03:37,320
Chris: But not, like, a cute animal.

76
00:03:37,440 --> 00:03:38,430
They’re not monsters.

77
00:03:38,750 --> 00:03:39,160
Ned: No.

78
00:03:39,820 --> 00:03:42,950
I’m trying to think of a non-cute animal, but they’re also adorable.

79
00:03:43,220 --> 00:03:44,829
Chris: Only when they’re made into a Squishable.

80
00:03:45,250 --> 00:03:46,059
Ned: Oh, that’s true.

81
00:03:46,620 --> 00:03:48,584
So many Squishmallows.

82
00:03:48,830 --> 00:03:50,320
My house is infested with them.

83
00:03:50,330 --> 00:03:52,329
It’s a real Tribbles kind of situation.

84
00:03:53,090 --> 00:03:53,990
What were we talking about?

85
00:03:54,529 --> 00:03:55,439
Chris: Uh, peanut butter?

86
00:03:55,819 --> 00:03:56,019
Ned: Yes.

87
00:03:56,290 --> 00:03:56,940
Chris: No, not again.

88
00:03:56,980 --> 00:03:57,450
Not again.

89
00:03:57,510 --> 00:03:59,360
Ned: No, no, no, no, we’re not going down that again.

90
00:03:59,430 --> 00:04:04,740
Okay, so I want to start today’s episode with a story from 2019, a

91
00:04:04,740 --> 00:04:09,359
story that involves messing up the internet for, kind of, everyone.

92
00:04:09,780 --> 00:04:13,099
A story that begins with a small company in rural Pennsylvania.

93
00:04:13,490 --> 00:04:19,090
The main culprit: BGP, aka, Border Gateway Protocol.

94
00:04:19,790 --> 00:04:23,499
Chris, you may remember this, but for those who aren’t familiar, the

95
00:04:23,500 --> 00:04:27,530
small company involved is called Allegheny Technologies Incorporated.

96
00:04:28,090 --> 00:04:32,320
And like any good technology company, when they needed to set up internet

97
00:04:32,320 --> 00:04:37,990
service, they didn’t just contract with one ISP, but instead they got

98
00:04:37,990 --> 00:04:42,890
connectivity from two, one from Verizon and one from a provider called DQE.

99
00:04:43,259 --> 00:04:44,300
That’s smart, you know?

100
00:04:44,300 --> 00:04:46,759
If DQE goes down, they can still get out through

101
00:04:46,760 --> 00:04:49,450
Verizon and people can reach them, et cetera, et cetera.

102
00:04:49,570 --> 00:04:50,180
You get the idea.

103
00:04:50,920 --> 00:04:54,120
Unfortunately, through a series of configuration errors

104
00:04:54,150 --> 00:04:57,670
and incompetence or laziness on the part of Verizon—

105
00:04:58,220 --> 00:04:58,260
Chris: [gasp]

106
00:04:58,840 --> 00:05:04,890
Ned: —shocking, I know [laugh] —deep breaths—large swaths of clients on the

107
00:05:04,890 --> 00:05:10,230
internet suddenly had their traffic routed through DQE to Allegheny Inc.,

108
00:05:10,690 --> 00:05:12,469
and then back out through Verizon.
And then back out through Verizon.

109
00:05:13,230 --> 00:05:18,539
An article on Cloudflare’s website compared it to routing all of the

110
00:05:18,540 --> 00:05:23,270
traffic for a major highway through a small suburban development.

111
00:05:24,049 --> 00:05:25,880
I think that’s actually an understatement.

112
00:05:26,960 --> 00:05:29,730
This would be like taking all the traffic from all the

113
00:05:29,730 --> 00:05:33,210
major highways in the United States and putting them

114
00:05:33,240 --> 00:05:37,580
through one small street in, like, gridlock Philadelphia.

115
00:05:38,340 --> 00:05:42,449
Chris: Or, like an unpaved one lane road.

116
00:05:42,620 --> 00:05:44,219
Ned: In Old City [laugh] yes.

117
00:05:45,240 --> 00:05:50,489
DQE and Allegheny obviously did not have the capacity to handle such a

118
00:05:50,490 --> 00:05:55,270
ridiculous increase in traffic, so they started dropping packets like crazy, and

119
00:05:55,270 --> 00:06:00,169
I’d imagine that one or more routers in the path just completely melted down.

120
00:06:00,549 --> 00:06:06,060
Eventually Cloudflare was able to reach engineers at DQE and get the

121
00:06:06,060 --> 00:06:10,929
situation resolved, but even with the fix in place, it took a few hours for

122
00:06:10,929 --> 00:06:15,250
the global internet to converge on the updated and now corrected routing.

123
00:06:15,880 --> 00:06:19,920
The Cloudflare article also details three different ways that this

124
00:06:19,930 --> 00:06:25,330
particular incident could have been avoided, specifically, prefix limits,

125
00:06:25,340 --> 00:06:31,600
IRR filtering, and RPKI. Don’t worry about what those things are just yet.

126
00:06:31,889 --> 00:06:35,649
We will get to them later, and by later I mean, next episode.

127
00:06:36,219 --> 00:06:36,499
Chris: [laugh]

128
00:06:36,810 --> 00:06:37,410
Ned: Probably.

129
00:06:38,150 --> 00:06:41,660
We’re going to use this little tale that I’ve told as a touchstone

130
00:06:42,030 --> 00:06:46,619
for this and however many more episodes it takes me to cover BGP.

131
00:06:46,870 --> 00:06:48,250
Chris: My guess is ten.

132
00:06:48,259 --> 00:06:48,269
Ned: Ahhh.

133
00:06:49,340 --> 00:06:50,609
I mean, at least.

134
00:06:51,040 --> 00:06:51,580
Minimum.

135
00:06:52,010 --> 00:06:56,330
I also plan on bringing on a real BGP expert in a later

136
00:06:56,330 --> 00:06:59,590
episode who can help us understand how to operate BGP

137
00:07:00,380 --> 00:07:05,469
securely because—spoiler—it’s horribly insecure right now.

138
00:07:05,469 --> 00:07:06,434
Chris: Ahhh.

139
00:07:07,400 --> 00:07:09,580
Ned: Yeah, shocking, I know.

140
00:07:10,230 --> 00:07:14,900
But first, what the hell is BGP, and how can it wreck a whole person’s day?

141
00:07:15,320 --> 00:07:16,430
Chris: Or even half a person.

142
00:07:17,130 --> 00:07:18,200
Ned: BGP history.

143
00:07:19,060 --> 00:07:22,670
I recommend drinking during this portion [laugh] . Okay, so,

144
00:07:23,130 --> 00:07:27,580
as I said earlier, BGP, it stands for Border Gateway Protocol.

145
00:07:27,880 --> 00:07:30,830
There’s a border, and it involves a gateway, and this is a protocol.

146
00:07:30,880 --> 00:07:33,020
It is exactly what it says on the tin.

147
00:07:33,310 --> 00:07:34,770
Chris: You needed 3500 words?

148
00:07:34,770 --> 00:07:35,710
You could have just said that?

149
00:07:36,400 --> 00:07:38,670
I thought this was going to be, like, a full episode.

150
00:07:38,889 --> 00:07:39,530
Ned: Oh, no, that’s it.

151
00:07:39,540 --> 00:07:40,030
We’re done.

152
00:07:40,059 --> 00:07:40,449
Chris: Yeah.

153
00:07:40,530 --> 00:07:41,500
Ned: Everybody can go home.

154
00:07:41,969 --> 00:07:42,890
I explained it all.

155
00:07:43,590 --> 00:07:46,080
Okay, everybody that’s still here, let’s get into it.

156
00:07:46,609 --> 00:07:51,239
So, it is the exterior gateway protocol that the internet uses to figure

157
00:07:51,240 --> 00:07:55,650
out how to get packets from a source to a destination, and then back again.

158
00:07:56,139 --> 00:07:58,700
To understand why BGP exists and how it

159
00:07:58,700 --> 00:08:00,989
functions, we’re going to have to go back in time.

160
00:08:01,690 --> 00:08:04,710
Grab your best leg warmers, your heather gray sweatshirt,

161
00:08:04,990 --> 00:08:08,039
and red bandana because it’s time to get totally ’80s.

162
00:08:09,800 --> 00:08:10,690
No comment on that?

163
00:08:10,950 --> 00:08:12,500
Chris: No, I’m just a little offended that you

164
00:08:12,500 --> 00:08:15,680
used my current outfit as some kind of joke.

165
00:08:16,090 --> 00:08:17,990
Ned: It was an inspiration, if you will.

166
00:08:18,750 --> 00:08:21,969
As we covered in a previous episode about DNS, the modern

167
00:08:22,000 --> 00:08:26,110
internet grew out of ARPANET, and its replacement NSFNET.

168
00:08:26,620 --> 00:08:28,960
Chris: Which is totally different than NsfwNET,

169
00:08:29,260 --> 00:08:31,140
which we’ll talk about on a later episode.

170
00:08:31,310 --> 00:08:35,069
Ned: [laugh] . That’s behind the Patreon paywall.

171
00:08:35,189 --> 00:08:35,229
Chris: [laugh]

172
00:08:36,829 --> 00:08:38,029
Ned: Ned and Chris after dark.

173
00:08:38,669 --> 00:08:40,130
If you want that, let us know.

174
00:08:40,190 --> 00:08:43,555
I think it’d be awful, but you know, you’re willing to pay for it [laugh]

175
00:08:43,890 --> 00:08:46,610
Chris: Previous evidence has shown that no one will ever want that.

176
00:08:46,890 --> 00:08:47,110
Ned: Okay,

177
00:08:49,370 --> 00:08:51,580
good [laugh] . NSFNET was established by the National Science

178
00:08:51,580 --> 00:08:55,569
Foundation, and its original intention was to connect five

179
00:08:55,570 --> 00:09:00,220
supercomputers in the US and various campus networks, tie them all

180
00:09:00,220 --> 00:09:04,990
together using a backbone network that NSF would help fund and manage.

181
00:09:05,930 --> 00:09:10,909
The backbone network was run by a single entity, and used leased lines

182
00:09:10,910 --> 00:09:15,940
from telcos that were running at a blazing 56 kilobits per second.

183
00:09:15,950 --> 00:09:17,430
Chris: Oof, Mario Andretti.

184
00:09:18,320 --> 00:09:19,140
Ned: Scorching.

185
00:09:19,860 --> 00:09:24,610
If you had a 56k modem in the late-’90s, you had the

186
00:09:24,610 --> 00:09:29,500
same network bandwidth as NSFNET at its inception in 1986.

187
00:09:30,219 --> 00:09:31,189
You probably didn’t have a supercomputer,

188
00:09:31,559 --> 00:09:33,480
but I mean, you had the effective bandwidth.

189
00:09:34,809 --> 00:09:37,170
NSFNET wasn’t open to just anyone.

190
00:09:37,180 --> 00:09:41,300
You couldn’t dial up and, you know, put it on the little cradle thing

191
00:09:41,620 --> 00:09:46,319
for your modem; they had a process by which regional networks could join.

192
00:09:47,170 --> 00:09:51,759
And those regional networks in turn had to adhere to the acceptable

193
00:09:51,759 --> 00:09:58,220
use policy of NSFNET, which precluded using NSFNET for making money.

194
00:09:58,600 --> 00:10:03,080
This was supposed to be campuses, and universities, and educational

195
00:10:03,080 --> 00:10:06,970
institutions all coming together to do research and trade information.

196
00:10:06,980 --> 00:10:08,510
So, this wasn’t about making money.

197
00:10:08,900 --> 00:10:09,750
That comes later.

198
00:10:10,260 --> 00:10:14,829
The whole thing was overseen by Merit Network, which was a networking consortium

199
00:10:14,830 --> 00:10:19,979
out of Michigan, and they ran a network operation center, and they worked to

200
00:10:19,980 --> 00:10:24,440
design and implement the network connectivity that was used by the backbone.

201
00:10:25,219 --> 00:10:28,860
Since the NSFNET formed the backbone of all of these different

202
00:10:28,879 --> 00:10:32,520
networks and their interconnectivity, there was a hierarchy,

203
00:10:32,680 --> 00:10:37,310
and all inter-network traffic had to traverse this backbone.

204
00:10:37,570 --> 00:10:41,400
So, if Regional Network A wanted to talk to Regional Network B,

205
00:10:41,670 --> 00:10:44,999
it would go up [background noise] to the backbone—what was that?

206
00:10:45,210 --> 00:10:46,389
Chris: I didn’t drop my fidget toy.

207
00:10:46,469 --> 00:10:47,470
I don’t have a fidget toy.

208
00:10:47,480 --> 00:10:51,830
Ned: [laugh] It would send the traffic up to the backbone, and then the

209
00:10:51,840 --> 00:10:56,810
backbone would take it to Regional Network B, and send the traffic back down.

210
00:10:57,120 --> 00:11:00,380
So, it was a relatively simple network when it comes to the

211
00:11:01,030 --> 00:11:04,040
interconnectivity between all these regional networks and the supercomputers.

212
00:11:04,559 --> 00:11:07,989
The NSFNET knew all the connected networks and could pretty

213
00:11:08,000 --> 00:11:11,939
easily route traffic from one network to another, but it also came

214
00:11:11,940 --> 00:11:15,520
with a lack of resiliency and serious bandwidth constraints.

215
00:11:16,130 --> 00:11:19,020
You only had one connection to the other regional network, and if the

216
00:11:19,030 --> 00:11:22,490
backbone went down or was congested, you were kind of out of luck.

217
00:11:23,330 --> 00:11:28,569
NSFNET had to pretty quickly update their backbone from these 56 kilobit

218
00:11:28,600 --> 00:11:34,829
per second lines to T1 lines that ran at 1.5 megabits per second.

219
00:11:35,389 --> 00:11:37,170
That happened in 1988.

220
00:11:37,549 --> 00:11:41,410
And then they had to upgrade them again in 1991 to

221
00:11:41,500 --> 00:11:45,450
45 megabits per second, which was known as a T3 line.
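To put those three backbone speeds side by side, here is a rough Python sketch. The line rates are the rounded figures quoted in the episode; the one-megabyte payload is an assumed example, and the math ignores protocol overhead and congestion entirely.

```python
# Backbone line rates as quoted in the episode, in bits per second.
LINE_RATES_BPS = {
    "56k leased line (1986)": 56_000,
    "T1 (1988)": 1_500_000,
    "T3 (1991)": 45_000_000,
}


def transfer_seconds(size_bytes: int, rate_bps: int) -> float:
    """Idealized transfer time: payload bits divided by line rate."""
    return size_bytes * 8 / rate_bps


ONE_MEGABYTE = 1_000_000  # assumed example payload, not from the episode

for name, rate in LINE_RATES_BPS.items():
    print(f"{name}: {transfer_seconds(ONE_MEGABYTE, rate):.2f} s")
```

Under those assumptions, the same megabyte takes roughly 143 seconds on the original backbone, a little over 5 seconds on a T1, and under a fifth of a second on a T3.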

222
00:11:46,050 --> 00:11:49,510
While it was possible to keep increasing the speed of the leased

223
00:11:49,520 --> 00:11:54,280
lines that formed the NSFNET backbone, additional lines were

224
00:11:54,340 --> 00:11:58,770
added, which introduced multiple paths for traffic to travel.

225
00:11:59,549 --> 00:12:03,610
At the same time, NSFNET was connecting with networks in other countries

226
00:12:03,810 --> 00:12:08,010
and to even more networks in the US, so the idea of handcrafting

227
00:12:08,040 --> 00:12:12,709
traffic routing tables to efficiently move traffic was no longer viable.

228
00:12:13,590 --> 00:12:17,819
Back in the early-’80s, the networking group at the IETF was aware

229
00:12:17,820 --> 00:12:21,910
of the looming issues behind the inter-network routing, and so they

230
00:12:21,910 --> 00:12:28,119
proposed what they called the Exterior Gateway Protocol in RFC 827.

231
00:12:28,510 --> 00:12:33,199
And that was in 1982, and then it was updated further in 1984.

232
00:12:34,229 --> 00:12:40,270
And EGP was actually used by NSFNET, but it had some serious shortcomings,

233
00:12:40,340 --> 00:12:48,139
so in 1989, RFC 1105 proposed the Border Gateway Protocol to replace EGP.

234
00:12:48,830 --> 00:12:52,409
To make it even more confusing, all routing protocols that

235
00:12:52,449 --> 00:12:55,780
are inter-network routing protocols are called ‘exterior

236
00:12:55,810 --> 00:12:59,650
gateway protocols.’ That’s not going to be confusing at all.

237
00:13:00,250 --> 00:13:00,870
Chris: Definitely not.

238
00:13:01,300 --> 00:13:02,890
Ned: The important thing to understand is that

239
00:13:02,940 --> 00:13:05,750
EGP as its own standard has since been retired.

240
00:13:05,860 --> 00:13:10,340
So, you can refer to EGP as broadly any protocol

241
00:13:10,340 --> 00:13:12,599
that handles this inter-network traffic.

242
00:13:13,450 --> 00:13:16,820
BGP itself is sometimes referred to as the three-napkin

243
00:13:16,830 --> 00:13:21,310
protocol, as the original ideas that underpin it were scribbled

244
00:13:21,310 --> 00:13:25,310
out by two engineers in Austin across three ketchup napkins.

245
00:13:25,920 --> 00:13:27,669
There’s no ketchup on the actual napkins; they were

246
00:13:27,670 --> 00:13:30,790
just, I guess, at a fast food place that served fries,

247
00:13:30,790 --> 00:13:32,569
and you were supposed to put ketchup on the napkins.

248
00:13:32,620 --> 00:13:33,190
I don’t know.

249
00:13:33,400 --> 00:13:34,410
Weird terminology.

250
00:13:35,340 --> 00:13:37,680
Chris: Maybe the napkins were sponsored by big ketchup.

251
00:13:38,059 --> 00:13:38,459
Ned: Ohhh.

252
00:13:39,139 --> 00:13:39,719
Heinz.

253
00:13:39,800 --> 00:13:40,600
Got to watch out.

254
00:13:40,730 --> 00:13:42,389
They get their paws into everything.

255
00:13:42,550 --> 00:13:44,870
Their red, yucky paws.

256
00:13:45,530 --> 00:13:47,329
That’s an awful visual, I’m sorry.

257
00:13:47,820 --> 00:13:50,680
So, while this story might seem apocryphal,

258
00:13:51,100 --> 00:13:53,530
they have actual pictures of the napkins.

259
00:13:53,820 --> 00:13:57,060
There’s no ketchup stains, but it does have the actual diagrams

260
00:13:57,080 --> 00:14:00,919
and sort of the flow for distributing routes in a BGP system.

261
00:14:01,170 --> 00:14:02,300
Chris: All right, I’m going to ignore you for a

262
00:14:02,300 --> 00:14:04,510
minute and actually look this up because I’m curious.

263
00:14:05,609 --> 00:14:06,290
Ned: [laugh] . Fair enough.

264
00:14:07,210 --> 00:14:13,000
BGP was not meant to be a long-term fix for the problems that NSFNET

265
00:14:13,770 --> 00:14:17,150
was experiencing, and that the larger internet would experience.

266
00:14:17,620 --> 00:14:21,100
It was just meant to be a relatively short-term fix to deal with

267
00:14:21,110 --> 00:14:25,080
the explosion of networks that were now forming the internet.

268
00:14:26,040 --> 00:14:29,570
The engineers really thought that they would come along later and replace

269
00:14:29,570 --> 00:14:33,930
it at some future point with a more robust and well-thought-out protocol.

270
00:14:33,990 --> 00:14:36,210
And that’s adorable.

271
00:14:36,970 --> 00:14:37,640
Chris: Still searching.

272
00:14:37,650 --> 00:14:38,910
I’m sure what you’re saying is interesting.

273
00:14:39,240 --> 00:14:39,720
Ned: Mm-hm.

274
00:14:40,510 --> 00:14:45,160
It’s a well-known fact that anything that you put into production, even if it’s

275
00:14:45,160 --> 00:14:51,949
supposed to be a temporary fix, will become a [laugh] a pillar of everything

276
00:14:51,950 --> 00:14:56,119
else that’s built later, and it’s going to be very hard to remove that pillar.

277
00:14:58,190 --> 00:14:59,260
BGP is no exception.

278
00:14:59,889 --> 00:15:03,860
They mapped it out in 1989, and we’re still waiting for its replacement.

279
00:15:04,650 --> 00:15:07,140
This is going to become important as we start to talk about

280
00:15:07,590 --> 00:15:11,530
BGP and its security controls, or its complete lack thereof.

281
00:15:11,920 --> 00:15:13,619
They didn’t think they needed them because

282
00:15:13,620 --> 00:15:15,479
this was supposed to be a stopgap measure.

283
00:15:16,240 --> 00:15:20,589
BGP was iterated on quickly, with version two coming in 1990.

284
00:15:20,650 --> 00:15:22,770
So, that’s a year later from the original idea.

285
00:15:23,110 --> 00:15:27,280
Version three came in 1991, and version four came in 1994.

286
00:15:28,219 --> 00:15:31,760
Version four is the current version of BGP in use by

287
00:15:31,770 --> 00:15:35,549
the internet today, so let’s talk about how it works.

288
00:15:35,830 --> 00:15:38,900
Unless you have some interesting information about these ketchup napkins.

289
00:15:39,170 --> 00:15:41,590
Chris: Are you sure it wasn’t called the two-napkin protocol?

290
00:15:41,890 --> 00:15:42,210
Ned: Nope.

291
00:15:42,250 --> 00:15:42,960
Three napkins.

292
00:15:43,130 --> 00:15:44,520
It had a picture of three napkins.

293
00:15:44,520 --> 00:15:47,339
It’s not the first thing to be drawn out on napkins, though.

294
00:15:47,670 --> 00:15:48,750
Because engineers—

295
00:15:48,759 --> 00:15:51,249
Chris: We could do a whole episode on things that were drawn out on napkins.

296
00:15:51,259 --> 00:15:54,370
Ned: [laugh] . Oh, and how they’re all universally terrible.

297
00:15:55,340 --> 00:15:55,360
[sigh]

298
00:15:56,060 --> 00:15:56,340
Chris: Anyway.

299
00:15:56,340 --> 00:15:56,360
Ned: So—

300
00:15:57,000 --> 00:15:58,109
Chris: Back to whatever it is we—

301
00:15:58,110 --> 00:15:58,390
Ned: BGP.

302
00:15:58,460 --> 00:16:00,010
Chris: Which was—oh right, BGP.

303
00:16:00,020 --> 00:16:00,750
That’s what you were saying.

304
00:16:00,790 --> 00:16:01,089
Okay.

305
00:16:01,200 --> 00:16:02,610
Ned: We’re going to—not napkins—

306
00:16:02,710 --> 00:16:02,900
Chris: I’m back.

307
00:16:02,900 --> 00:16:04,230
Ned: —but we can talk about napkins still.

308
00:16:04,360 --> 00:16:05,549
I have strong opinions.

309
00:16:06,170 --> 00:16:09,280
How expansive do we need to get here about BGP?

310
00:16:09,969 --> 00:16:13,230
I’m going to assume that most people listening

311
00:16:13,490 --> 00:16:15,730
know at least a bit about networking.

312
00:16:16,050 --> 00:16:17,310
At least, I hope so.

313
00:16:17,320 --> 00:16:21,680
Like, otherwise, why are you tuning into this podcast [laugh] ? Be super weird.

314
00:16:21,960 --> 00:16:22,469
Except for you.

315
00:16:22,469 --> 00:16:22,889
Hi, mom.

316
00:16:23,230 --> 00:16:24,990
Chris: Oh, don’t act like your mother listens.

317
00:16:25,170 --> 00:16:26,300
Ned: It’s cruel and true.

318
00:16:27,170 --> 00:16:31,200
So, I’m going to take it as a given that most people know what an IP address

319
00:16:31,200 --> 00:16:36,370
is, are vaguely aware of TCP and how it works, and have at least heard

320
00:16:36,370 --> 00:16:40,400
of routing protocols, even if you don’t understand any of them, even RIP.

321
00:16:41,300 --> 00:16:44,290
Maybe the best thing here would be a packet walk.

322
00:16:44,849 --> 00:16:51,300
How does a packet on my desktop make its way to pod.chaoslever.com?

323
00:16:51,310 --> 00:16:53,199
Just pulling an address out of the air.

324
00:16:53,580 --> 00:16:54,370
Chris: Totally random.

325
00:16:54,580 --> 00:16:55,300
Ned: Totally random.

326
00:16:55,860 --> 00:16:59,860
First, my desktop has to figure out the IP address to

327
00:16:59,870 --> 00:17:03,079
send the web request to, and that’s a function of DNS.

328
00:17:04,099 --> 00:17:08,210
And Chris, as you know, we did two whole last shows about DNS.

329
00:17:08,589 --> 00:17:09,409
Go look them up.

330
00:17:09,980 --> 00:17:10,589
Enjoy them.

331
00:17:11,240 --> 00:17:16,629
Pod.chaoslever.com is hosted on Podpage, which has a few

332
00:17:16,630 --> 00:17:24,099
different public IP addresses on the 216.239.32.0/19 network.
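For a sense of what a /19 actually covers, here is a small sketch using Python's standard `ipaddress` module. The network is the one quoted in the episode; the two addresses being tested are made-up examples, not real Podpage hosts.

```python
import ipaddress

# The network quoted in the episode for the podcast host's public addresses.
network = ipaddress.ip_network("216.239.32.0/19")


def in_network(address: str) -> bool:
    """Return True if the address falls inside the /19 block."""
    return ipaddress.ip_address(address) in network


print(network.num_addresses)         # 2 ** (32 - 19) = 8192 addresses
print(network[0], network[-1])       # 216.239.32.0 216.239.63.255
print(in_network("216.239.34.21"))   # hypothetical address inside the block
print(in_network("216.239.64.1"))    # hypothetical address just outside it
```

A /19 leaves 13 bits for hosts, which is why the block spans 216.239.32.0 through 216.239.63.255.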

333
00:17:24,389 --> 00:17:25,430
Make sure you remember that.

334
00:17:25,440 --> 00:17:26,530
There will be a test later.

335
00:17:27,210 --> 00:17:30,470
Once I have an IP address, how does my

336
00:17:30,470 --> 00:17:33,550
desktop know where to send that web request?

337
00:17:33,820 --> 00:17:35,939
How does it actually route the packet there?

338
00:17:36,559 --> 00:17:39,789
Well, my desktop’s networking stack has a route table in it.

339
00:17:40,490 --> 00:17:43,190
If you’re on a Windows box like me, open up a

340
00:17:43,190 --> 00:17:47,020
terminal and run the command ‘route print -4’.

341
00:17:47,490 --> 00:17:51,359
That will give you all the routes stored locally for IPv4.

342
00:17:52,170 --> 00:17:57,969
On Linux, it’s probably something like ‘ip route list.’ On Mac, I have no idea.

343
00:17:57,969 --> 00:18:00,600
I think it’s also ‘ip route list’ or something similar?

344
00:18:00,750 --> 00:18:01,250
Chris: Correct.
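If you want to try this on whatever machine is in front of you, here is a hedged Python sketch that picks a route-listing command per operating system. The Windows and Linux commands are the ones mentioned above. The macOS entry is an assumption on my part: as far as I know, a stock Mac does not ship the `ip` tool, and `netstat -rn` is the usual way to list routes there. The function names are hypothetical.

```python
import platform
import subprocess

# Commands that print the local IPv4 routing table, keyed by platform.system().
ROUTE_COMMANDS = {
    "Windows": ["route", "print", "-4"],
    "Linux": ["ip", "-4", "route", "list"],
    "Darwin": ["netstat", "-rn", "-f", "inet"],  # macOS (assumed, see above)
}


def route_command(system: str) -> list[str]:
    """Look up the route-listing command for a platform name."""
    try:
        return ROUTE_COMMANDS[system]
    except KeyError:
        raise ValueError(f"no known route command for {system!r}") from None


def show_routes() -> str:
    """Run the command for the current platform and return its output."""
    cmd = route_command(platform.system())
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `print(show_routes())` should dump the same table you would get by typing the command into a terminal yourself.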

345
00:18:01,660 --> 00:18:04,060
Ned: This list determines where a packet is

346
00:18:04,060 --> 00:18:07,370
sent, with the most specific entry winning.

347
00:18:07,860 --> 00:18:12,720
Now, since the website I’m trying to contact has a public IP address, my desktop

348
00:18:12,730 --> 00:18:18,440
is going to use what’s called the default route, which looks like 0.0.0.0, which

349
00:18:18,460 --> 00:18:26,700
in my case, points to the home router as the next hop, which is 192.168.1.1.

350
00:18:26,740 --> 00:18:27,620
I’m very creative.

351
00:18:27,650 --> 00:18:28,560
Yes, you’re welcome.

352
00:18:29,130 --> 00:18:32,879
Chances are that is the [laugh] gateway of your home router as well.
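The "most specific entry wins" rule is usually called longest-prefix match. Here is a minimal Python sketch of it. The route table is a toy modeled on a typical home network, not a dump of any real machine, and the `lookup` name is hypothetical.

```python
import ipaddress

# Toy route table: (destination prefix, next hop). Assumed values only.
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1"),     # default route via the home router
    ("192.168.1.0/24", "on-link"),    # the local subnet
    ("192.168.1.50/32", "on-link"),   # a single host route
]


def lookup(destination: str) -> tuple[str, str]:
    """Return the (prefix, next hop) of the most specific matching route."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # Most specific entry wins: choose the longest prefix length.
    net, hop = max(matches, key=lambda match: match[0].prefixlen)
    return str(net), hop


print(lookup("216.239.34.21"))  # public address: only the default route matches
print(lookup("192.168.1.7"))    # LAN address: the /24 beats the /0
```

The default route matches every IPv4 address, which is exactly why it only wins when nothing more specific does.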

353
00:18:33,620 --> 00:18:38,199
Once my packet hits that router, it checks the route table there—or the

354
00:18:38,200 --> 00:18:42,380
router checks its route table—and decides where to send the traffic next.

355
00:18:43,320 --> 00:18:47,420
My router has a single WAN interface, and that WAN interface

356
00:18:47,429 --> 00:18:50,999
has a public IP address that was handed out by my ISP.

357
00:18:51,820 --> 00:18:55,700
There is a default route on my router that sends traffic to

358
00:18:55,700 --> 00:18:59,980
the next hop that my ISP lists, which is going to be some

359
00:19:00,020 --> 00:19:03,849
kind of router on their side that has its own routing table.

360
00:19:04,530 --> 00:19:09,649
My ISP is Verizon, and my packet may bounce around inside of the Verizon

361
00:19:09,660 --> 00:19:13,790
network for a while before emerging at one of their peering endpoints.

362
00:19:14,150 --> 00:19:16,100
And we’ll cover peering in a little bit.

363
00:19:16,590 --> 00:19:20,310
So, we’ve gone from my desktop to my home router to one

364
00:19:20,310 --> 00:19:22,840
of Verizon’s routers, and then it bounces around inside

365
00:19:22,950 --> 00:19:25,610
of their network until it emerges to go get to Podpage.

366
00:19:27,170 --> 00:19:30,650
That network—Verizon’s network that’s all the various routers that

367
00:19:30,650 --> 00:19:35,480
they control—is what’s referred to as an autonomous system, or AS.

368
00:19:36,180 --> 00:19:40,359
That network is privately managed by Verizon, and all traffic inside their

369
00:19:40,360 --> 00:19:45,909
network is routed using whatever Interior Gateway Protocol they want to use.

370
00:19:46,180 --> 00:19:46,510
That’s an IGP.

371
00:19:47,820 --> 00:19:48,129
Wooo.

372
00:19:48,750 --> 00:19:56,300
That could be IS-IS, OSPF, or even an internal version of BGP called iBGP.

373
00:19:56,830 --> 00:19:59,090
We’re not going to get into that; just know it exists.

374
00:19:59,860 --> 00:20:02,450
That internal routing protocol is going to decide

375
00:20:02,460 --> 00:20:05,789
where my packet emerges from the Verizon network.

376
00:20:06,559 --> 00:20:11,970
The path that my packet takes once it hits the border between Verizon and other

377
00:20:11,990 --> 00:20:17,510
autonomous systems will depend on external BGP and how it makes decisions.

378
00:20:18,450 --> 00:20:22,899
Each autonomous system on the internet gets an AS number or ASN.

379
00:20:24,480 --> 00:20:30,130
The original ASN specification used 16 bits, so the

380
00:20:30,130 --> 00:20:36,429
maximum AS number was 65,535, because we count from zero.

381
00:20:37,210 --> 00:20:40,850
And just like IPv4, there is a range of ASNs

382
00:20:40,959 --> 00:20:43,640
that are reserved for private or internal use.

383
00:20:43,830 --> 00:20:48,000
So, if you were setting up iBGP, you would use those internal ASNs.
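
As a rough sketch of what "reserved for private use" means in practice, the check below tests an ASN against the private-use ranges currently reserved by RFC 6996. The function name `is_private_asn` is an assumption for illustration.

```python
# Private-use ASN ranges as reserved by RFC 6996.
PRIVATE_ASN_RANGES = [
    (64512, 65534),            # private range within the original 16-bit space
    (4200000000, 4294967294),  # private range within the 32-bit space
]

def is_private_asn(asn: int) -> bool:
    """True if the ASN falls inside one of the private-use ranges."""
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)

print(is_private_asn(64512))  # inside the 16-bit private range
print(is_private_asn(64496))  # reserved for documentation, not private use
```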

384
00:20:49,640 --> 00:20:53,370
The rest of them are managed by the Internet Assigned Numbers Authority

385
00:20:53,389 --> 00:20:58,169
or IANA, which maybe has an acronym pronunciation, I’m not sure.

386
00:20:58,180 --> 00:20:59,210
Have you ever heard one?

387
00:21:01,360 --> 00:21:01,720
Chris: Uh, Jana?

388
00:21:01,980 --> 00:21:02,240
Ned: Ayana?

389
00:21:02,250 --> 00:21:02,260
Eh.

390
00:21:02,780 --> 00:21:03,370
It’s IANA.

391
00:21:03,370 --> 00:21:05,129
Chris: I think that was a Fleetwood Mac song.

392
00:21:05,710 --> 00:21:06,090
Ned: Nice.

393
00:21:07,300 --> 00:21:09,040
[sigh] . Wonder where they got that name,

394
00:21:09,460 --> 00:21:11,650
the Internet Assigned Numbers Authority.

395
00:21:12,440 --> 00:21:13,820
They assign numbers.

396
00:21:14,750 --> 00:21:20,120
Blocks of ASNs are handed out from the IANA to regional

397
00:21:20,150 --> 00:21:23,580
internet registries, and those handle the actual assignment

398
00:21:23,630 --> 00:21:29,280
of ASNs to people who want ASNs, these regional networks.

399
00:21:29,910 --> 00:21:34,360
When BGP was first implemented, 16 bits probably seemed like plenty,

400
00:21:34,820 --> 00:21:38,650
and also was what routers were capable of handling at the time.

401
00:21:39,230 --> 00:21:46,630
In 2012, RFC 6793 expanded ASNs to use four octets, or 32 bits,

402
00:21:47,130 --> 00:21:51,430
which raised the number of available numbers to roughly 4 billion.
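
The arithmetic behind the two ASN sizes is short enough to check directly. The constant names are assumptions for illustration.

```python
# Highest value that fits in two octets (the original ASN size).
MAX_ASN_16_BIT = 2**16 - 1
# Highest value that fits in four octets (the expanded ASN size).
MAX_ASN_32_BIT = 2**32 - 1

print(MAX_ASN_16_BIT)
print(MAX_ASN_32_BIT)
print(f"{2**32:,} possible four-octet values")
```

Counting from zero, 16 bits gives 65,536 possible values and 32 bits gives a little under 4.3 billion.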

403
00:21:51,910 --> 00:21:52,880
Will that be enough?

404
00:21:53,309 --> 00:21:57,550
At the moment, current statistics show that regional internet registries

405
00:21:57,550 --> 00:22:02,919
have handed out 130,000 ASNs, so, um… I think we’ll be all right, for a while.

406
00:22:03,400 --> 00:22:04,270
Chris: We’ll be good, I think.

407
00:22:04,270 --> 00:22:04,830
We’ll be good.

408
00:22:05,219 --> 00:22:07,990
Ned: This is very different than the lack of available public

409
00:22:09,080 --> 00:22:12,099
IPv4 addresses because it’s not like every device gets an ASN.

410
00:22:12,560 --> 00:22:15,070
It’s every large network gets one.

411
00:22:15,950 --> 00:22:21,099
Still, though, that’s 130,000 public-facing ASNs that BGP

412
00:22:21,110 --> 00:22:23,930
has to worry about when it comes to routing your packets.

413
00:22:24,360 --> 00:22:25,389
This thing has to be scalable.

414
00:22:26,120 --> 00:22:27,449
So, how does it do that?

415
00:22:28,160 --> 00:22:30,229
Chris: I thought we already established that: magic.

416
00:22:30,510 --> 00:22:30,770
Ned: Yes.

417
00:22:31,190 --> 00:22:32,210
That’s essentially what it is.

418
00:22:32,240 --> 00:22:35,879
And if you want to stop there, and just know that that’s what BGP is responsible

419
00:22:35,880 --> 00:22:41,110
for, you can ignore the next, like, ten minutes [laugh] . To get into some of

420
00:22:41,110 --> 00:22:44,509
the detail—and we’re not going to get down to nitty gritty here, but just some

421
00:22:44,509 --> 00:22:49,560
of the detail here—BGP is what’s called a path vector-based routing protocol,

422
00:22:49,870 --> 00:22:55,230
which means that it decides on a specific path for a route based on attributes.

423
00:22:55,770 --> 00:22:59,090
Vector is the direction and path is the selection.

424
00:22:59,690 --> 00:23:02,840
BGP doesn’t understand or care about things like

425
00:23:02,920 --> 00:23:07,080
bandwidth, or latency, or even hops, really.

426
00:23:07,670 --> 00:23:10,870
Instead, it has a path selection algorithm that walks

427
00:23:10,870 --> 00:23:14,190
through the attributes of each possible path for a packet,

428
00:23:14,599 --> 00:23:17,899
and then picks one based on the selection criteria.

429
00:23:18,700 --> 00:23:21,379
We’ll get into the actual process it uses in a

430
00:23:21,380 --> 00:23:24,189
moment, but where is it getting this information from?

431
00:23:24,959 --> 00:23:26,010
From its neighbors.

432
00:23:26,530 --> 00:23:27,929
Oh, they have neighbors.

433
00:23:27,980 --> 00:23:29,110
It’s like a community.

434
00:23:29,520 --> 00:23:31,460
And there’s also communities [laugh].

435
00:23:31,480 --> 00:23:33,989
Chris: I would just like to pause and remind everybody that Ned

436
00:23:34,050 --> 00:23:37,360
explicitly said he wasn’t going to get into the nitty-gritty.

437
00:23:37,590 --> 00:23:38,709
Ned: I’m not [laugh].

438
00:23:39,130 --> 00:23:40,269
Chris: That’s the thing.

439
00:23:41,980 --> 00:23:44,680
Ned: This is the high-level stuff [laugh] . It gets so much deeper.

440
00:23:44,890 --> 00:23:47,730
Chris: No, no, I just wanted to point that out to explain to people

441
00:23:47,830 --> 00:23:52,340
a little more justification as to why my run away screaming protocol

442
00:23:52,420 --> 00:23:56,420
is what I operate upon when BGP comes up in quiet conversation.

443
00:23:57,360 --> 00:23:57,649
Ned: Right.

444
00:23:57,649 --> 00:24:01,970
All right, so if I’m a BGP—I’m a router running BGP,

445
00:24:01,970 --> 00:24:06,090
you can call me a node—I form relationships with other

446
00:24:06,090 --> 00:24:09,340
routers running BGP through what’s called neighborships.

447
00:24:09,340 --> 00:24:11,605
I don’t like the term, but apparently it’s used.

448
00:24:11,605 --> 00:24:12,670
Chris: Please tell me that’s not real.

449
00:24:12,830 --> 00:24:13,520
Ned: That’s real.

450
00:24:13,910 --> 00:24:14,529
I’m sorry.

451
00:24:15,009 --> 00:24:18,049
Setting up a neighborship is very, very simple.

452
00:24:18,370 --> 00:24:21,149
Let’s say we’ve got two routers: Router A and Router B.

453
00:24:21,720 --> 00:24:21,996
On Router—

454
00:24:21,996 --> 00:24:23,080
Chris: I just got—oh, my God.

455
00:24:23,320 --> 00:24:23,610
Ned: What?

456
00:24:24,270 --> 00:24:24,590
Chris: Neighborship?

457
00:24:24,590 --> 00:24:24,600
Ned: Neighborship.

458
00:24:26,849 --> 00:24:30,950
I heard it first, and I was like, that can’t possibly be the real term.

459
00:24:31,639 --> 00:24:34,689
They’re also called peers, and I like that better, but

460
00:24:34,709 --> 00:24:38,110
that gets into the difference between peering and transit.

461
00:24:38,590 --> 00:24:39,669
And so…

462
00:24:39,969 --> 00:24:42,690
Chris: Can you hold on for one second, I got to go get a glass.

463
00:24:44,550 --> 00:24:45,409
Ned: [laugh] . Smash it real hard.

464
00:24:47,179 --> 00:24:50,159
[sigh] . The problem is that we use the same terms to mean

465
00:24:50,160 --> 00:24:52,990
too many different things in technology, and so sometimes

466
00:24:52,990 --> 00:24:56,010
we just got to make up a word, and it’s not always good.

467
00:24:56,990 --> 00:24:57,490
Anyway.

468
00:24:58,740 --> 00:25:01,320
So, let’s say I have two routers: Router A, Router B.

469
00:25:01,600 --> 00:25:06,829
On Router A, I tell it the IP address of Router B and its ASN.

470
00:25:06,829 --> 00:25:13,810
And then over on Router B, I tell it the IP address of Router A and its ASN.

471
00:25:13,820 --> 00:25:16,600
On Router A, I add any networks that I want to

472
00:25:16,620 --> 00:25:21,560
advertise, and same thing for Router B, and that’s it.

473
00:25:22,360 --> 00:25:25,480
The two routers will establish a TCP connection over

474
00:25:25,480 --> 00:25:29,449
port 179, and start exchanging route information.
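
The two-sided setup described above can be sketched as a small data model. This is not any vendor's configuration syntax; the `Neighbor` and `RouterConfig` classes, the `can_peer` check, and all addresses and ASNs are assumptions made up for illustration.

```python
from dataclasses import dataclass, field

BGP_PORT = 179  # BGP sessions run over TCP port 179

@dataclass
class Neighbor:
    address: str     # IP address of the other router
    remote_asn: int  # ASN the other router is expected to belong to

@dataclass
class RouterConfig:
    name: str
    address: str
    local_asn: int
    neighbors: list[Neighbor] = field(default_factory=list)
    networks: list[str] = field(default_factory=list)  # prefixes to advertise

def can_peer(a: RouterConfig, b: RouterConfig) -> bool:
    """Each side must list the other's address together with the other's ASN."""
    a_knows_b = any(n.address == b.address and n.remote_asn == b.local_asn
                    for n in a.neighbors)
    b_knows_a = any(n.address == a.address and n.remote_asn == a.local_asn
                    for n in b.neighbors)
    return a_knows_b and b_knows_a

router_a = RouterConfig("A", "192.0.2.1", 64512,
                        neighbors=[Neighbor("192.0.2.2", 64513)],
                        networks=["198.51.100.0/24"])
router_b = RouterConfig("B", "192.0.2.2", 64513,
                        neighbors=[Neighbor("192.0.2.1", 64512)],
                        networks=["203.0.113.0/24"])

print(can_peer(router_a, router_b))
```

If either side has the wrong address or the wrong ASN for the other, the check fails, which mirrors why both routers need matching configuration before the session comes up.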

475
00:25:29,959 --> 00:25:33,200
Each router will share the networks that it is advertising

476
00:25:33,380 --> 00:25:36,470
and any networks it learned about from other routers.

477
00:25:37,250 --> 00:25:41,600
And BGP only sends messages across that link

478
00:25:41,650 --> 00:25:44,210
when there’s an update to its advertised routes.

479
00:25:44,230 --> 00:25:48,910
So, unlike something like RIP that, every 30 seconds, goes, “Here’s all my

480
00:25:48,910 --> 00:25:54,630
routes.” “Here’s all my routes.” That would be bad and awful, so BGP just

481
00:25:54,710 --> 00:25:59,200
sends information when something changes about one of the advertised routes.

482
00:25:59,550 --> 00:26:05,239
Otherwise, it just hangs out, chills, plays Pinochle, and every 30

483
00:26:05,239 --> 00:26:07,639
or 60 seconds, it sends a keep-alive saying, “Yep, I’m still here.

484
00:26:07,770 --> 00:26:10,260
I got nothing new to say.” Kind of like you, Chris.

485
00:26:10,330 --> 00:26:13,760
I check in every 30 to 60 seconds to make sure you’re still here [laugh].

486
00:26:14,170 --> 00:26:15,870
Chris: As usual, I’ve got nothing new to say.
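
The "only speak when something changes" behavior can be sketched as a toy model. It is not a real BGP implementation: the `Speaker` class and its message strings are assumptions, and in real BGP the keep-alive runs on its own timer rather than being tied to a route check.

```python
class Speaker:
    """Toy model: send UPDATEs only for changes, a KEEPALIVE otherwise."""

    def __init__(self) -> None:
        self.advertised: set[str] = set()

    def tick(self, current_routes: set[str]) -> list[str]:
        # Compare what we want to advertise now with what the neighbor already has.
        announced = current_routes - self.advertised
        withdrawn = self.advertised - current_routes
        self.advertised = set(current_routes)
        if not announced and not withdrawn:
            return ["KEEPALIVE"]  # nothing new to say
        messages = [f"UPDATE announce {p}" for p in sorted(announced)]
        messages += [f"UPDATE withdraw {p}" for p in sorted(withdrawn)]
        return messages

speaker = Speaker()
print(speaker.tick({"198.51.100.0/24"}))  # new route, so an update is sent
print(speaker.tick({"198.51.100.0/24"}))  # unchanged, so only a keep-alive
print(speaker.tick(set()))                # route removed, so a withdrawal is sent
```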

487
00:26:16,899 --> 00:26:20,839
Ned: [laugh] . Indeed, the routing decisions made by Router A

488
00:26:20,839 --> 00:26:24,449
will depend on the advertisements it gets from its neighbors.

489
00:26:25,000 --> 00:26:28,160
So, so far, we’ve just got Router A and B, but we

490
00:26:28,160 --> 00:26:31,800
can add additional routers as neighbors: C, D, and E.

491
00:26:32,510 --> 00:26:37,640
Router A learns about routes to different networks from all of these neighbors,

492
00:26:37,950 --> 00:26:41,820
and then makes path-based decisions based on the routes that it learned.

493
00:26:41,820 --> 00:26:47,510
BGP network advertisements can have a ton of attributes, but

494
00:26:47,510 --> 00:26:51,080
there’s really only about eight standard ones that are commonly

495
00:26:51,080 --> 00:26:55,020
used, and honestly, there’s probably only about three or four

496
00:26:55,020 --> 00:26:58,510
that actually matter, so we’re just going to talk about those.

497
00:26:59,000 --> 00:26:59,539
Chris: Thank God.

498
00:26:59,920 --> 00:27:00,380
Ned: Yes.

499
00:27:01,139 --> 00:27:06,750
Local preference is an attribute that lets you prefer one route over another.

500
00:27:07,120 --> 00:27:11,029
I could give Router B preference over Router C.

501
00:27:11,759 --> 00:27:12,639
Very straightforward.

502
00:27:13,469 --> 00:27:17,039
If both routers are an option for a given destination,

503
00:27:17,619 --> 00:27:19,980
the one with the higher preference gets the nod.

504
00:27:20,040 --> 00:27:23,880
So, Router B would get—I’d send my traffic to Router B instead of Router C.

505
00:27:24,540 --> 00:27:27,639
That’s useful if, say, the link on Router B is a

506
00:27:27,650 --> 00:27:30,680
ten gig link and the link to Router C is one gig.

507
00:27:30,990 --> 00:27:34,050
I probably want to use the link to Router B if I can help it.

508
00:27:34,590 --> 00:27:36,750
BGP doesn’t know about link speed, but you do.

509
00:27:37,460 --> 00:27:39,989
The next attribute is AS path length.

510
00:27:41,180 --> 00:27:44,970
The AS path is a list of every autonomous system a

511
00:27:44,970 --> 00:27:48,210
packet will pass through, from source to destination.

512
00:27:48,849 --> 00:27:51,560
So, when a router learns about a route from one of its

513
00:27:51,570 --> 00:27:55,199
neighbors and wants to share that route with the next router

514
00:27:55,200 --> 00:27:59,890
in line, it tacks on its AS number to the front of the AS path.

515
00:28:00,720 --> 00:28:05,909
So, the more autonomous systems a route travels through, the longer the

516
00:28:05,910 --> 00:28:12,500
path length becomes, and that makes it less preferred as a path to choose.
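
How the AS path grows as a route is passed along can be sketched in a few lines. The `advertise` helper and the ASNs (taken from the range reserved for documentation) are assumptions for illustration; the sketch places each new AS number at the front of the list, which is where BGP speakers conventionally record it.

```python
def advertise(as_path: list[int], local_asn: int) -> list[int]:
    """Return the AS path a router passes on to a neighbor in another AS."""
    return [local_asn] + as_path

path = advertise([], 64500)    # the origin AS announces its own prefix
path = advertise(path, 64501)  # a second AS passes the route along
path = advertise(path, 64502)  # and a third

print(path, len(path))
```

Each AS the route passes through adds one entry, so a route learned through three networks carries a path length of three.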

517
00:28:13,080 --> 00:28:16,350
That doesn’t mean that the shorter AS path route is

518
00:28:16,360 --> 00:28:20,370
actually faster, it just means that it’s shorter.

519
00:28:20,950 --> 00:28:24,720
Inside that autonomous system, there could be way more hops

520
00:28:24,860 --> 00:28:28,819
between the ingress and egress routers, so that’s why you might

521
00:28:28,820 --> 00:28:32,090
want to use something like local preference if you know that,

522
00:28:32,100 --> 00:28:36,500
say, Joe’s ISP and Crab Shack kind of sucks at passing traffic.

523
00:28:36,980 --> 00:28:38,160
Chris: Phenomenal crabs, though.

524
00:28:38,500 --> 00:28:39,540
Ned: Really good crabs.

525
00:28:40,120 --> 00:28:42,210
The last attribute is the router ID.

526
00:28:42,660 --> 00:28:48,899
If all other attributes for a route are the same, the lower router ID wins.
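
The three attributes discussed here can be combined into a simplified best-path sketch. Real implementations check more attributes between these steps, so treat this as an illustration under stated assumptions: the `Route` class, the `best_path` function, and every address and ASN below are made up.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    neighbor: str       # label for the neighbor that advertised the route
    local_pref: int
    as_path: list[int]
    router_id: str      # advertising neighbor's router ID, formatted like an IPv4 address

def best_path(candidates: list[Route]) -> Route:
    """Pick a route using only local preference, AS path length, and router ID."""
    return min(
        candidates,
        key=lambda r: (
            -r.local_pref,                            # higher local preference wins
            len(r.as_path),                           # then the shorter AS path
            int(ipaddress.IPv4Address(r.router_id)),  # then the lower router ID
        ),
    )

via_b = Route("198.51.100.0/24", "B", 200, [64501, 64502, 64503], "192.0.2.2")
via_c = Route("198.51.100.0/24", "C", 100, [64504], "192.0.2.3")
via_d = Route("198.51.100.0/24", "D", 100, [64505], "192.0.2.1")

print(best_path([via_b, via_c]).neighbor)  # local preference outranks the shorter path
print(best_path([via_c, via_d]).neighbor)  # tie on both, so the lower router ID decides
```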

527
00:28:49,830 --> 00:28:51,590
Where does that router ID come from?

528
00:28:52,410 --> 00:28:53,290
That’s weird.

529
00:28:53,300 --> 00:28:54,540
It’s kind of up to the admin.

530
00:28:55,300 --> 00:28:58,960
The form looks exactly like an IPv4 address, and it’s

531
00:28:58,960 --> 00:29:01,940
usually set to the first loopback interface on the router.

532
00:29:02,759 --> 00:29:06,500
The router ID needs to be unique within an individual

533
00:29:06,590 --> 00:29:09,530
autonomous system and unique among its peers.

534
00:29:10,550 --> 00:29:13,290
So, you know, you can’t have two routers in the same

535
00:29:13,540 --> 00:29:17,349
neighborship—so sorry—that have the same router ID.

536
00:29:18,190 --> 00:29:19,209
Bad things will happen.

537
00:29:20,040 --> 00:29:25,890
Speaking of peers—back to our packet walk—the request has now left the Verizon

538
00:29:25,890 --> 00:29:30,530
network and it’s gone to some other network based on advertised routes.

539
00:29:31,260 --> 00:29:35,330
The Verizon router made a decision based on the path attributes for each route.

540
00:29:35,849 --> 00:29:37,439
Where is this all happening?

541
00:29:38,139 --> 00:29:40,070
Physically, where’s this actually happening?

542
00:29:40,770 --> 00:29:45,260
It’s at an internet exchange point of some kind—most likely—where a peering

543
00:29:45,270 --> 00:29:49,170
or transit arrangement has been created between two or more routers.

544
00:29:50,080 --> 00:29:54,380
So, at this point, we’re kind of done with BGP, but that led me

545
00:29:54,630 --> 00:29:58,470
to another rabbit hole, which is okay, I understand the theory.

546
00:29:58,580 --> 00:30:00,600
Where’s all this stuff actually happening?

547
00:30:01,350 --> 00:30:04,130
And it’s happening at these dedicated colocation

548
00:30:04,130 --> 00:30:06,430
facilities and internet exchange points.

549
00:30:06,570 --> 00:30:11,499
They used to be called NAPs, which was like Network Access… something.

550
00:30:11,980 --> 00:30:16,060
And there was a place called a SUPERNAP, down in Virginia,

551
00:30:16,139 --> 00:30:19,530
I think, where there were, like, a metric shit ton of these

552
00:30:19,730 --> 00:30:22,510
different ISP lines all coming into the same facility.

553
00:30:22,920 --> 00:30:24,430
I don’t know if it’s still called the SUPERNAP.

554
00:30:25,050 --> 00:30:27,430
Chris: I think I’m lined up for a super nap, if you know what I’m saying.

555
00:30:27,530 --> 00:30:28,320
Ned: I do.

556
00:30:28,380 --> 00:30:29,960
I set you up for that one.

557
00:30:29,960 --> 00:30:30,540
You’re welcome.

558
00:30:31,259 --> 00:30:33,769
So, this isn’t entirely relevant to BGP,

559
00:30:33,790 --> 00:30:36,019
except it filled in some mental gaps for me.

560
00:30:36,920 --> 00:30:39,220
How are two autonomous systems connected?

561
00:30:39,880 --> 00:30:43,480
Well, they’re connected by two routers, but there’s two basic physical

562
00:30:43,490 --> 00:30:47,970
topologies that are followed: you can have a public peering arrangement

563
00:30:47,970 --> 00:30:52,280
between a bunch of ASs, and that usually happens at one of these internet

564
00:30:52,289 --> 00:30:57,460
exchange points, or a rented colocation space from a neutral provider.

565
00:30:57,470 --> 00:31:00,900
Think Equinix, or Digital Realty would be examples.

566
00:31:01,920 --> 00:31:04,960
Each ISP’s router will be connected into a common

567
00:31:04,960 --> 00:31:09,190
switch fabric, and peering relationships will be formed

568
00:31:09,219 --> 00:31:11,849
between each router that’s connected into the switch.

569
00:31:12,559 --> 00:31:15,639
So, they’re all exchanging routing information with each other.

570
00:31:16,480 --> 00:31:21,780
The other option is a direct router-to-router connection between two ASs.

571
00:31:22,179 --> 00:31:23,749
That’s known as private peering.

572
00:31:24,590 --> 00:31:27,860
If you’ve ever been involved in setting up a connection to AWS with

573
00:31:27,860 --> 00:31:31,879
Direct Connect, or Azure with Express Connect—or ExpressRoute.

574
00:31:32,070 --> 00:31:36,600
Sorry, stupid names—both of those use private peering and a

575
00:31:36,600 --> 00:31:41,490
direct physical connection from your network to Azure or AWS.

576
00:31:41,620 --> 00:31:44,880
You have to set up what’s called a cross-connect, which is essentially, from

577
00:31:44,880 --> 00:31:49,720
your router—or a router that you’re leasing through your ISP—it’s a cable

578
00:31:49,720 --> 00:31:54,280
that runs to the router or the switch that the cloud router is hooked into.

579
00:31:55,190 --> 00:31:58,420
There’s also a slight difference between peering and transit.

580
00:31:58,960 --> 00:32:01,670
Peering means that I can send traffic to your network,

581
00:32:01,700 --> 00:32:04,190
and you can send traffic to my network, and we don’t

582
00:32:04,190 --> 00:32:07,399
charge each other any money for accepting that traffic.

583
00:32:08,170 --> 00:32:11,860
Consider a scenario where you have a few different regional

584
00:32:11,860 --> 00:32:14,770
networks that want to pass network traffic between each other,

585
00:32:14,820 --> 00:32:17,669
rather than sending the traffic across a transit network.

586
00:32:18,620 --> 00:32:22,819
They can all rent space together at a colocation data center, and set up a

587
00:32:22,900 --> 00:32:26,889
public peering arrangement where they’ll exchange routes and pass traffic.

588
00:32:27,320 --> 00:32:30,689
It’s beneficial for all the networks involved to be able to

589
00:32:30,700 --> 00:32:34,310
communicate freely, and there’s a verbal peering agreement,

590
00:32:34,429 --> 00:32:37,380
or handshake agreement, to not be an asshole about it.

591
00:32:38,520 --> 00:32:38,550
Chris: [laugh].

592
00:32:38,940 --> 00:32:39,490
Ned: I’m serious.

593
00:32:39,490 --> 00:32:41,099
They’re like, “Just don’t be a dick.

594
00:32:41,440 --> 00:32:44,790
Don’t overwhelm my network with traffic that’s destined for somewhere else.

595
00:32:44,800 --> 00:32:46,860
Don’t try to use me as a transit network, and

596
00:32:46,860 --> 00:32:50,439
we’ll all get along.” And yes, I’m very serious.

597
00:32:51,020 --> 00:32:55,370
A study in 2011 showed that only 0.05% of

598
00:32:55,370 --> 00:32:58,309
peering agreements were actual written contracts.

599
00:32:59,199 --> 00:33:02,920
I imagine that’s grown in the last 13 years with the explosion of cloud

600
00:33:02,920 --> 00:33:06,470
where, like, if you want a peering agreement with Azure, it is absolutely

601
00:33:06,470 --> 00:33:10,589
a written contract, but from what I’ve heard, that’s in the minority.

602
00:33:10,630 --> 00:33:13,050
These regional networks are still using just

603
00:33:13,050 --> 00:33:16,350
handshakes and, like, firm nods at each other.

604
00:33:17,190 --> 00:33:20,000
Transit relationships are where a network is paying

605
00:33:20,000 --> 00:33:22,700
another network for access to the general internet.

606
00:33:23,820 --> 00:33:27,500
There’s a few giant tier one operators that lots of other

607
00:33:27,500 --> 00:33:30,710
networks pay to transmit their traffic across the internet.

608
00:33:31,540 --> 00:33:35,589
A regional network in, say, Luxembourg is unlikely to have a

609
00:33:35,590 --> 00:33:39,680
direct peering relationship with a network in Omaha, Nebraska,

610
00:33:40,240 --> 00:33:42,850
so that traffic needs to transit through another provider.

611
00:33:43,520 --> 00:33:45,870
That provider doesn’t see a mutual benefit for

612
00:33:45,880 --> 00:33:48,530
providing that transit, so they charge for it.

613
00:33:49,770 --> 00:33:52,530
Tier one networks are those networks that can reach all

614
00:33:52,530 --> 00:33:55,870
other networks on the internet using settlement-free peering.

615
00:33:56,690 --> 00:34:01,060
Tier two networks have to pay for at least some transit to other networks.

616
00:34:01,630 --> 00:34:05,680
And tier three networks pay for transit to all networks.
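
The three tier definitions reduce to two yes-or-no questions, sketched below. The `classify_tier` function and its parameters are assumptions for illustration, and real-world tier labels are fuzzier than this.

```python
def classify_tier(uses_peering: bool, buys_transit: bool) -> str:
    """Classify a network by how it reaches the rest of the internet."""
    if not buys_transit:
        return "tier 1"  # reaches everything through settlement-free peering
    if uses_peering:
        return "tier 2"  # peers where it can, pays for transit to the rest
    return "tier 3"      # pays for transit to reach everything

print(classify_tier(uses_peering=True, buys_transit=False))
print(classify_tier(uses_peering=True, buys_transit=True))
print(classify_tier(uses_peering=False, buys_transit=True))
```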

617
00:34:06,650 --> 00:34:09,210
Who are these mysterious tier one providers?

618
00:34:09,420 --> 00:34:11,570
Well, Verizon is one.

619
00:34:12,190 --> 00:34:17,920
So is AT&T, and Comcast, and Lumen, who you might not have

620
00:34:17,920 --> 00:34:20,950
heard of, but that’s because they used to be called CenturyLink.

621
00:34:21,489 --> 00:34:23,360
They changed their name because they had a

622
00:34:23,360 --> 00:34:25,540
terrible reputation, and that was going to help.

623
00:34:26,110 --> 00:34:30,580
They’re also the biggest tier one provider in the world as far as I can tell.

624
00:34:31,659 --> 00:34:35,490
Since Verizon is a tier one network—going back to our packet walk,

625
00:34:35,490 --> 00:34:39,980
and to round this all out—since it’s a tier one network, my packet

626
00:34:40,010 --> 00:34:44,100
doesn’t have to go across another transit network to get to Podpage.

627
00:34:45,320 --> 00:34:47,209
I looked it up, and Podpage is actually

628
00:34:47,209 --> 00:34:49,759
using Google Cloud to host their service.

629
00:34:50,489 --> 00:34:55,530
So, when I looked at it, the ASNs for Podpage—or the public IP

630
00:34:55,530 --> 00:35:00,399
addresses they’re using—lined up to Google’s ASNs, and so my

631
00:35:00,400 --> 00:35:03,830
little packet will go directly from Verizon’s network to Google.

632
00:35:04,059 --> 00:35:05,980
No other transit required.

633
00:35:06,000 --> 00:35:08,090
And in fact, that’s exactly what it does.

634
00:35:08,550 --> 00:35:13,040
Through the magic of traceroute, I can see my packet hop from Verizon, to

635
00:35:13,040 --> 00:35:19,110
Verizon business, to Google, to another Google AS because they have multiples.

636
00:35:19,929 --> 00:35:23,139
BGP has done its job, and all is well with the internet.

637
00:35:23,930 --> 00:35:24,770
But what if it isn’t?

638
00:35:25,500 --> 00:35:25,570
Chris: [laugh].

639
00:35:26,260 --> 00:35:27,750
Ned: How can BGP break?

640
00:35:28,090 --> 00:35:29,810
And can people do it on purpose?

641
00:35:30,520 --> 00:35:31,870
The answers will shock you.

642
00:35:32,360 --> 00:35:36,710
I—they probably won’t shock you [laugh] . The answer is there

643
00:35:36,710 --> 00:35:41,230
are many ways to break BGP, and yes, it can be done on purpose.

644
00:35:41,500 --> 00:35:46,770
But that is the story for another time, a future episode, and a guest

645
00:35:46,920 --> 00:35:50,279
who’s more eloquent than me at explaining security issues with BGP.

646
00:35:50,279 --> 00:35:50,299
[sigh].

647
00:35:52,820 --> 00:35:53,499
You feel better?

648
00:35:53,700 --> 00:35:54,379
Chris: No.

649
00:35:54,570 --> 00:35:57,359
Ned: Have I demystified some of the magic of the internet for you?

650
00:35:57,660 --> 00:35:59,290
Chris: I’m more confused than when I started,

651
00:35:59,290 --> 00:36:00,780
and I didn’t think that was possible.

652
00:36:01,110 --> 00:36:01,350
Ned: Good.

653
00:36:01,350 --> 00:36:04,560
Then my job… [laugh] is a complete success.

654
00:36:04,670 --> 00:36:05,610
My job here is done.

655
00:36:06,550 --> 00:36:07,890
Hey, thanks for listening or something.

656
00:36:07,890 --> 00:36:10,610
I guess you found it worthwhile enough if you made it all the way to the

657
00:36:10,610 --> 00:36:14,219
end, so congratulations to you, friend, you accomplished something today.

658
00:36:14,670 --> 00:36:15,170
Maybe.

659
00:36:15,790 --> 00:36:18,640
Now, you can sit on the couch, think about the magic of

660
00:36:18,910 --> 00:36:22,049
BGP, and just get hopelessly confused like the rest of us.

661
00:36:22,330 --> 00:36:22,860
You’ve earned it.

662
00:36:23,390 --> 00:36:26,220
You can find more about this show by going to our LinkedIn page, just

663
00:36:26,220 --> 00:36:30,680
search ‘Chaos Lever,’ or go to the website, pod.chaoslever.com, where

664
00:36:30,680 --> 00:36:34,170
you’ll find show notes, blog posts, and general tomfoolery, and you

665
00:36:34,170 --> 00:36:37,590
can leave a comment that we might read on the Tech News of the Week.

666
00:36:37,980 --> 00:36:40,570
We’ll be back next week to see what fresh hell is upon us.

667
00:36:40,730 --> 00:36:41,620
Ta-ta for now.

668
00:36:49,740 --> 00:36:53,290
Chris: And just to make things even more unnecessarily confusing,

669
00:36:53,879 --> 00:36:57,530
it was originally called the two-napkin protocol, when it was first

670
00:36:57,550 --> 00:37:03,460
proposed and first published in a Cisco internal blog in 1989.

671
00:37:04,030 --> 00:37:06,190
Ned: [laugh] . And then a third napkin arose?

672
00:37:06,480 --> 00:37:07,030
Oh, no.

673
00:37:07,309 --> 00:37:08,970
Chris: Look, I mean, math is hard.