Transcript
1
00:00:00,530 --> 00:00:04,910
Ned: I made the unfortunate decision to just use chaoslever.com
2
00:00:05,170 --> 00:00:08,220
and no subdomain [laugh] . So, there’s two problems.
3
00:00:08,320 --> 00:00:09,049
Chris: One is Ned.
4
00:00:09,330 --> 00:00:13,179
Ned: One is me [laugh] . I am always the perennial problem.
5
00:00:13,710 --> 00:00:19,390
They go with the assumption you want to use ‘www’ as your subdomain, so they
6
00:00:19,390 --> 00:00:25,689
do support setting your apex record—the at record—for chaoslever.com to—
7
00:00:25,689 --> 00:00:25,709
Chris: [loud snores].
8
00:00:25,709 --> 00:00:30,050
Ned: [laugh] you’re very—you’re cruel.
9
00:00:30,050 --> 00:00:31,280
Chris: [more loud snores].
10
00:00:31,280 --> 00:00:33,730
Ned: [laugh] . Goddammit.
11
00:00:43,200 --> 00:00:46,000
Hello, alleged human, and welcome to the Chaos Lever podcast.
12
00:00:46,000 --> 00:00:48,550
My name is Ned, and I’m definitely not a robot.
13
00:00:48,620 --> 00:00:55,500
I am a sentient, real human person with feelings, dreams, and just the general
14
00:00:55,500 --> 00:00:59,710
desire to smoothly migrate a website and not have everything go to shit.
15
00:01:00,670 --> 00:01:04,259
[sigh] . With me is Chris, who was also here?
16
00:01:04,719 --> 00:01:05,279
Mostly.
17
00:01:05,669 --> 00:01:08,180
Chris: Have you ever read my favorite philosophical tract?
18
00:01:08,330 --> 00:01:08,880
Ned: I don’t know.
19
00:01:09,070 --> 00:01:09,830
Chris: It’s a short one.
20
00:01:09,839 --> 00:01:10,870
It’s ancient text.
21
00:01:10,910 --> 00:01:12,929
It was translated, I think, from the Sumerian.
22
00:01:13,250 --> 00:01:13,680
Ned: Okay.
23
00:01:14,080 --> 00:01:17,450
Chris: And the title is, “Whatever You’re Trying To Do,”
24
00:01:17,760 --> 00:01:22,929
Sumerian question mark, dot dot dot, “Yeah, Good Luck With That.”
25
00:01:25,719 --> 00:01:27,210
Ned: [laugh] . Wow, that is a philosophy that
26
00:01:27,210 --> 00:01:29,940
is just broadly applicable to every situation.
27
00:01:30,709 --> 00:01:33,479
Chris: I believe—and this is, you know, it’s really tough
28
00:01:33,500 --> 00:01:36,200
with archaeology because you get a lot of incomplete records—
29
00:01:36,680 --> 00:01:37,140
Ned: It’s true.
30
00:01:37,610 --> 00:01:42,060
Chris: But I believe, and modern science agrees with me on this, the
31
00:01:42,060 --> 00:01:45,550
follow-up book to that is, “I Fucking Told You It Wasn’t Going To Work.”
32
00:01:46,670 --> 00:01:48,429
Ned: [laugh] . I’m glad to know that the
33
00:01:48,500 --> 00:01:51,770
Sumerians were so blunt in their philosophy.
34
00:01:52,030 --> 00:01:53,370
There’s nothing aesthetic about it.
35
00:01:53,640 --> 00:01:54,800
I appreciate it.
36
00:01:55,090 --> 00:01:57,820
Chris: I mean, it’s really, really hot in [Sumaria].
37
00:01:58,219 --> 00:01:58,630
Ned: Is it?
38
00:01:59,210 --> 00:01:59,800
Chris: Sure.
39
00:02:00,860 --> 00:02:05,289
Ned: Whenever people would bring up ancient civilizations, Babylon,
40
00:02:05,299 --> 00:02:11,600
Sumaria, et cetera, I always thought of those as in, sort of, some
41
00:02:11,800 --> 00:02:16,700
mythical place that didn’t actually exist on the modern map of today, and
42
00:02:16,700 --> 00:02:20,700
I was sad to realize at some point that that was not true, and that these are
43
00:02:20,730 --> 00:02:24,809
actual locations that you can go to; they just have different names now.
44
00:02:25,330 --> 00:02:27,130
Chris: Yeah, Ur still exists.
45
00:02:27,210 --> 00:02:28,200
I think it’s in Iraq.
46
00:02:29,040 --> 00:02:34,090
Ned: I don’t like it [laugh] . Yeah… oh, well.
47
00:02:34,280 --> 00:02:34,899
Here we are.
48
00:02:34,990 --> 00:02:38,130
Let’s talk about another mythical thing that shouldn’t exist, but does.
49
00:02:38,679 --> 00:02:39,620
It’s BGP.
50
00:02:40,250 --> 00:02:43,210
Chris: I’m not going to lie, that is, like, top ten transitions for you.
51
00:02:43,820 --> 00:02:44,340
Ned: [laugh] . Thank you.
52
00:02:44,450 --> 00:02:45,770
Chris: Might even be top five.
53
00:02:46,290 --> 00:02:48,920
Ned: [laugh] . I felt really good about it, in part
54
00:02:48,920 --> 00:02:51,220
because it was completely organic and not planned.
55
00:02:51,570 --> 00:02:53,390
And now I’m ruining it by talking about it.
56
00:02:53,800 --> 00:02:55,460
So, another top five right there.
57
00:02:55,889 --> 00:02:56,680
Chris: Different five.
58
00:02:57,040 --> 00:02:57,579
Ned: Yes.
59
00:02:58,020 --> 00:02:58,819
So Chris—
60
00:02:58,959 --> 00:02:59,249
Chris: What?
61
00:02:59,430 --> 00:03:01,950
Ned: What’s your general feeling on BGP?
62
00:03:02,960 --> 00:03:04,650
Chris: Anytime people start talking about it
63
00:03:05,760 --> 00:03:08,200
enthusiastically, I break a glass and walk away.
64
00:03:08,910 --> 00:03:10,520
Ned: [laugh] . You don’t threaten them with it?
65
00:03:10,830 --> 00:03:12,410
Chris: No, no, no, I just want the distraction.
66
00:03:12,790 --> 00:03:14,539
I understand and respect this conversation,
67
00:03:14,540 --> 00:03:17,769
but I don’t need it to be in my life at all.
68
00:03:18,449 --> 00:03:22,770
Ned: It does seem like one of those mysteries of
69
00:03:22,770 --> 00:03:25,080
the faith when it comes to network engineering.
70
00:03:25,120 --> 00:03:28,740
Like, BGP, it’s overseen by wizards—
71
00:03:28,960 --> 00:03:29,419
Chris: Oh, yeah.
72
00:03:29,469 --> 00:03:30,449
Ned: And warlocks.
73
00:03:30,609 --> 00:03:33,420
Chris: There are robes involved, incantations.
74
00:03:33,849 --> 00:03:35,730
Ned: At least one animal sacrifice.
75
00:03:36,139 --> 00:03:37,320
Chris: But not, like, a cute animal.
76
00:03:37,440 --> 00:03:38,430
They’re not monsters.
77
00:03:38,750 --> 00:03:39,160
Ned: No.
78
00:03:39,820 --> 00:03:42,950
I’m trying to think of a non-cute animal, but they’re also adorable.
79
00:03:43,220 --> 00:03:44,829
Chris: Only when they’re made into a Squishable.
80
00:03:45,250 --> 00:03:46,059
Ned: Oh, that’s true.
81
00:03:46,620 --> 00:03:48,584
So many Squishmallows.
82
00:03:48,830 --> 00:03:50,320
My house is infested with them.
83
00:03:50,330 --> 00:03:52,329
It’s a real Tribbles kind of situation.
84
00:03:53,090 --> 00:03:53,990
What were we talking about?
85
00:03:54,529 --> 00:03:55,439
Chris: Uh, peanut butter?
86
00:03:55,819 --> 00:03:56,019
Ned: Yes.
87
00:03:56,290 --> 00:03:56,940
Chris: No, not again.
88
00:03:56,980 --> 00:03:57,450
Not again.
89
00:03:57,510 --> 00:03:59,360
Ned: No, no, no, no, we’re not going down that again.
90
00:03:59,430 --> 00:04:04,740
Okay, so I want to start today’s episode with a story from 2019, a
91
00:04:04,740 --> 00:04:09,359
story that involves messing up the internet for, kind of, everyone.
92
00:04:09,780 --> 00:04:13,099
A story that begins with a small company in rural Pennsylvania.
93
00:04:13,490 --> 00:04:19,090
The main culprit: BGP, aka, Border Gateway Protocol.
94
00:04:19,790 --> 00:04:23,499
Chris, you may remember this, but for those who aren’t familiar, the
95
00:04:23,500 --> 00:04:27,530
small company involved is called Allegheny Technologies Incorporated.
96
00:04:28,090 --> 00:04:32,320
And like any good technology company, when they needed to set up internet
97
00:04:32,320 --> 00:04:37,990
service, they didn’t just contract with one ISP, but instead they got
98
00:04:37,990 --> 00:04:42,890
connectivity from two, one from Verizon and one from a provider called DQE.
99
00:04:43,259 --> 00:04:44,300
That’s smart, you know?
100
00:04:44,300 --> 00:04:46,759
If DQE goes down, they can still get out through
101
00:04:46,760 --> 00:04:49,450
Verizon and people can reach them, et cetera, et cetera.
102
00:04:49,570 --> 00:04:50,180
You get the idea.
103
00:04:50,920 --> 00:04:54,120
Unfortunately, through a series of configuration errors
104
00:04:54,150 --> 00:04:57,670
and incompetence or laziness on the part of Verizon—
105
00:04:58,220 --> 00:04:58,260
Chris: [gasp]
106
00:04:58,840 --> 00:05:04,890
Ned: —shocking, I know [laugh] —deep breaths—large swaths of clients on the
107
00:05:04,890 --> 00:05:10,230
internet suddenly had their traffic routed through DQE to Allegheny Inc.
108
00:05:10,690 --> 00:05:12,469
And then back out through Verizon.
109
00:05:13,230 --> 00:05:18,539
An article on Cloudflare’s website compared it to routing all of the
110
00:05:18,540 --> 00:05:23,270
traffic for a major highway through a small suburban development.
111
00:05:24,049 --> 00:05:25,880
I think that’s actually an understatement.
112
00:05:26,960 --> 00:05:29,730
This would be like taking all the traffic from all the
113
00:05:29,730 --> 00:05:33,210
major highways in the United States and putting them
114
00:05:33,240 --> 00:05:37,580
through one small street in, like, gridlock Philadelphia.
115
00:05:38,340 --> 00:05:42,449
Chris: Or, like an unpaved one lane road.
116
00:05:42,620 --> 00:05:44,219
Ned: In Old City [laugh] yes.
117
00:05:45,240 --> 00:05:50,489
DQE and Allegheny obviously did not have the capacity to handle such a
118
00:05:50,490 --> 00:05:55,270
ridiculous increase in traffic, so they started dropping packets like crazy, and
119
00:05:55,270 --> 00:06:00,169
I’d imagine that one or more routers in the path just completely melted down.
120
00:06:00,549 --> 00:06:06,060
Eventually Cloudflare was able to reach engineers at DQE and get the
121
00:06:06,060 --> 00:06:10,929
situation resolved, but even with the fix in place, it took a few hours for
122
00:06:10,929 --> 00:06:15,250
the global internet to converge on the updated and now corrected routing.
123
00:06:15,880 --> 00:06:19,920
The Cloudflare article also details three different ways that this
124
00:06:19,930 --> 00:06:25,330
particular incident could have been avoided, specifically, prefix limits,
125
00:06:25,340 --> 00:06:31,600
IRR filtering, and RPKI. Don’t worry about what those things are just yet.
126
00:06:31,889 --> 00:06:35,649
We will get to them later, and by later I mean, next episode.
127
00:06:36,219 --> 00:06:36,499
Chris: [laugh].
128
00:06:36,810 --> 00:06:37,410
Ned: Probably.
129
00:06:38,150 --> 00:06:41,660
We’re going to use this little tale that I’ve told as a touchstone
130
00:06:42,030 --> 00:06:46,619
for this and however many more episodes it takes me to cover BGP.
131
00:06:46,870 --> 00:06:48,250
Chris: My guess is ten.
132
00:06:48,259 --> 00:06:48,269
Ned: Ahhh.
133
00:06:49,340 --> 00:06:50,609
I mean, at least.
134
00:06:51,040 --> 00:06:51,580
Minimum.
135
00:06:52,010 --> 00:06:56,330
I also plan on bringing on a real BGP expert in a later
136
00:06:56,330 --> 00:06:59,590
episode who can help us understand how to operate BGP
137
00:07:00,380 --> 00:07:05,469
securely because—spoiler—it’s horribly insecure right now.
138
00:07:05,469 --> 00:07:06,434
Chris: Ahhh.
139
00:07:07,400 --> 00:07:09,580
Ned: Yeah, shocking, I know.
140
00:07:10,230 --> 00:07:14,900
But first, what the hell is BGP, and how can it wreck a whole person’s day?
141
00:07:15,320 --> 00:07:16,430
Chris: Or even half a person.
142
00:07:17,130 --> 00:07:18,200
Ned: BGP history.
143
00:07:19,060 --> 00:07:22,670
I recommend drinking during this portion [laugh] . Okay, so,
144
00:07:23,130 --> 00:07:27,580
as I said earlier, BGP, it stands for Border Gateway Protocol.
145
00:07:27,880 --> 00:07:30,830
There’s a border, and it involves gateway, and this is a protocol.
146
00:07:30,880 --> 00:07:33,020
It is exactly what it says on the tin.
147
00:07:33,310 --> 00:07:34,770
Chris: You needed 3500 words?
148
00:07:34,770 --> 00:07:35,710
You could have just said that?
149
00:07:36,400 --> 00:07:38,670
I thought this was going to be, like, a full episode.
150
00:07:38,889 --> 00:07:39,530
Ned: Oh, no, that’s it.
151
00:07:39,540 --> 00:07:40,030
We’re done.
152
00:07:40,059 --> 00:07:40,449
Chris: Yeah.
153
00:07:40,530 --> 00:07:41,500
Ned: Everybody can go home.
154
00:07:41,969 --> 00:07:42,890
I explained it all.
155
00:07:43,590 --> 00:07:46,080
Okay, everybody that’s still here, let’s get into it.
156
00:07:46,609 --> 00:07:51,239
So, it is the exterior gateway protocol that the internet uses to figure
157
00:07:51,240 --> 00:07:55,650
out how to get packets from a source to a destination, and then back again.
158
00:07:56,139 --> 00:07:58,700
To understand why BGP exists and how it
159
00:07:58,700 --> 00:08:00,989
functions, we’re going to have to go back in time.
160
00:08:01,690 --> 00:08:04,710
Grab your best leg warmers, your heather gray sweatshirt,
161
00:08:04,990 --> 00:08:08,039
and red bandana because it’s time to get totally ’80s.
162
00:08:09,800 --> 00:08:10,690
No comment on that?
163
00:08:10,950 --> 00:08:12,500
Chris: No I’m just a little offended that you
164
00:08:12,500 --> 00:08:15,680
used my current outfit as some kind of joke.
165
00:08:16,090 --> 00:08:17,990
Ned: It was an inspiration, if you will.
166
00:08:18,750 --> 00:08:21,969
As we covered in a previous episode about DNS, the modern
167
00:08:22,000 --> 00:08:26,110
internet grew out of ARPANET, and its replacement NSFNET.
168
00:08:26,620 --> 00:08:28,960
Chris: Which is totally different than NsfwNET,
169
00:08:29,260 --> 00:08:31,140
which we’ll talk about on a later episode.
170
00:08:31,310 --> 00:08:35,069
Ned: [laugh] . That’s behind the Patreon paywall.
171
00:08:35,189 --> 00:08:35,229
Chris: [laugh].
172
00:08:36,829 --> 00:08:38,029
Ned: Ned and Chris after dark.
173
00:08:38,669 --> 00:08:40,130
If you want that, let us know.
174
00:08:40,190 --> 00:08:43,555
I think it’d be awful, but you know, you’re willing to pay for it [laugh].
175
00:08:43,890 --> 00:08:46,610
Chris: Previous evidence has shown that no one will ever want that.
176
00:08:46,890 --> 00:08:47,110
Ned: Okay,
177
00:08:49,370 --> 00:08:51,580
good [laugh] . NSFNET was established by the National Science
178
00:08:51,580 --> 00:08:55,569
Foundation, and its original intention was to connect five
179
00:08:55,570 --> 00:09:00,220
supercomputers in the US and various campus networks, tie them all
180
00:09:00,220 --> 00:09:04,990
together using a backbone network that NSF would help fund and manage.
181
00:09:05,930 --> 00:09:10,909
The backbone network was run by a single entity, and used leased lines
182
00:09:10,910 --> 00:09:15,940
from telcos that were running at a blazing 56 kilobits per second.
183
00:09:15,950 --> 00:09:17,430
Chris: Oof, Mario Andretti.
184
00:09:18,320 --> 00:09:19,140
Ned: Scorching.
185
00:09:19,860 --> 00:09:24,610
If you had a 56k modem in the late-’90s, you had the
186
00:09:24,610 --> 00:09:29,500
same network bandwidth as NSFNET at its inception in 1986.
187
00:09:30,219 --> 00:09:31,189
You probably didn’t have a supercomputer,
188
00:09:31,559 --> 00:09:33,480
but I mean, you had the effective bandwidth.
189
00:09:34,809 --> 00:09:37,170
NSFNET wasn’t open to just anyone.
190
00:09:37,180 --> 00:09:41,300
You couldn’t dial up and, you know, put it on the little cradle thing
191
00:09:41,620 --> 00:09:46,319
for your modem; they had a process by which regional networks could join.
192
00:09:47,170 --> 00:09:51,759
And those regional networks in turn had to adhere to the acceptable
193
00:09:51,759 --> 00:09:58,220
use policy of NSFNET, which precluded using NSFNET for making money.
194
00:09:58,600 --> 00:10:03,080
This was supposed to be campuses, and universities, and educational
195
00:10:03,080 --> 00:10:06,970
institutions all coming together to do research and trade information.
196
00:10:06,980 --> 00:10:08,510
So, this wasn’t about making money.
197
00:10:08,900 --> 00:10:09,750
That comes later.
198
00:10:10,260 --> 00:10:14,829
The whole thing was overseen by Merit Network, which was a networking consortium
199
00:10:14,830 --> 00:10:19,979
out of Michigan, and they ran a network operation center, and they worked to
200
00:10:19,980 --> 00:10:24,440
design and implement the network connectivity that was used by the backbone.
201
00:10:25,219 --> 00:10:28,860
Since the NSFNET formed the backbone of all of these different
202
00:10:28,879 --> 00:10:32,520
networks and their interconnectivity, there was a hierarchy,
203
00:10:32,680 --> 00:10:37,310
and all inter-network traffic had to traverse this backbone.
204
00:10:37,570 --> 00:10:41,400
So, if Regional Network A wanted to talk to Regional Network B,
205
00:10:41,670 --> 00:10:44,999
it would go up [background noise] to the backbone—what was that?
206
00:10:45,210 --> 00:10:46,389
Chris: I didn’t drop my fidget toy.
207
00:10:46,469 --> 00:10:47,470
I don’t have a fidget toy.
208
00:10:47,480 --> 00:10:51,830
Ned: [laugh] —it would send the traffic up to the backbone, and then the
209
00:10:51,840 --> 00:10:56,810
backbone would take it to Regional Network B, and send the traffic back down.
210
00:10:57,120 --> 00:11:00,380
So, it was a relatively simple network when it comes to the
211
00:11:01,030 --> 00:11:04,040
interconnectivity between all these regional networks and the supercomputers.
212
00:11:04,559 --> 00:11:07,989
The NSFNET knew all the connected networks and could pretty
213
00:11:08,000 --> 00:11:11,939
easily route traffic from one network to another, but it also came
214
00:11:11,940 --> 00:11:15,520
with the lack of resiliency and serious bandwidth constraints.
215
00:11:16,130 --> 00:11:19,020
You only had one connection to the other regional network, and if the
216
00:11:19,030 --> 00:11:22,490
backbone went down or was congested, you were kind of out of luck.
217
00:11:23,330 --> 00:11:28,569
NSFNET had to pretty quickly update their backbone from these 56 kilobit
218
00:11:28,600 --> 00:11:34,829
per second lines to T1 lines that ran at 1.5 megabits per second.
219
00:11:35,389 --> 00:11:37,170
That happened in 1988.
220
00:11:37,549 --> 00:11:41,410
And then they had to upgrade them again in 1991 to
221
00:11:41,500 --> 00:11:45,450
45 megabits per second, which was known as a T3 line.
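[Note: the size of those two backbone upgrades can be worked out with a little arithmetic. The sketch below is only an illustration. It assumes the nominal T1 and T3 line rates of 1.544 and 44.736 megabits per second, which the episode rounds to 1.5 and 45, and the names `RATES_BPS`, `speedup`, and `transfer_seconds` are made up for this example.]

```python
# Nominal line rates in bits per second (assumed values; the episode
# rounds the T1 and T3 figures to 1.5 and 45 megabits per second).
RATES_BPS = {
    "56k": 56_000,
    "T1": 1_544_000,
    "T3": 44_736_000,
}


def speedup(old: str, new: str) -> float:
    """How many times faster the new line is than the old one."""
    return RATES_BPS[new] / RATES_BPS[old]


def transfer_seconds(size_bytes: int, line: str) -> float:
    """Idealized time to move size_bytes over a line, ignoring all overhead."""
    return size_bytes * 8 / RATES_BPS[line]


if __name__ == "__main__":
    print(f"56k -> T1: about {speedup('56k', 'T1'):.0f}x")
    print(f"T1 -> T3: about {speedup('T1', 'T3'):.0f}x")
    print(f"1 MB over 56k: about {transfer_seconds(1_000_000, '56k'):.0f} s")
```

[Each upgrade was roughly a thirtyfold jump, which gives a sense of how quickly demand on the backbone was growing.]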
222
00:11:46,050 --> 00:11:49,510
While it was possible to keep increasing the speed of the leased
223
00:11:49,520 --> 00:11:54,280
lines that formed the NSFNET backbone, additional lines were
224
00:11:54,340 --> 00:11:58,770
added, which introduced multiple paths for traffic to travel.
225
00:11:59,549 --> 00:12:03,610
At the same time, NSFNET was connecting with networks in other countries
226
00:12:03,810 --> 00:12:08,010
and to even more networks in the US, so the idea of handcrafting
227
00:12:08,040 --> 00:12:12,709
traffic routing tables to efficiently move traffic was no longer viable.
228
00:12:13,590 --> 00:12:17,819
Back in the early-’80s, the networking group at the IETF was aware
229
00:12:17,820 --> 00:12:21,910
of the looming issues behind the inter-network routing, and so they
230
00:12:21,910 --> 00:12:28,119
proposed what they called the Exterior Gateway Protocol in RFC 827.
231
00:12:28,510 --> 00:12:33,199
And that was in 1982, and then it was updated further in 1984.
232
00:12:34,229 --> 00:12:40,270
And EGP was actually used by NSFNET, but it had some serious shortcomings,
233
00:12:40,340 --> 00:12:48,139
so in 1989, RFC 1105 proposed the Border Gateway Protocol to replace EGP.
234
00:12:48,830 --> 00:12:52,409
To make it even more confusing, all routing protocols that
235
00:12:52,449 --> 00:12:55,780
are inter-network routing protocols are called ‘exterior
236
00:12:55,810 --> 00:12:59,650
gateway protocols.’ That’s not going to be confusing at all.
237
00:13:00,250 --> 00:13:00,870
Chris: Definitely not.
238
00:13:01,300 --> 00:13:02,890
Ned: The important thing to understand is that
239
00:13:02,940 --> 00:13:05,750
EGP as its own standard has since been retired.
240
00:13:05,860 --> 00:13:10,340
So, you can refer to EGP as broadly any protocol
241
00:13:10,340 --> 00:13:12,599
that handles this inter-network traffic.
242
00:13:13,450 --> 00:13:16,820
BGP itself is sometimes referred to as the three-napkin
243
00:13:16,830 --> 00:13:21,310
protocol, as the original ideas that underpin it were scribbled
244
00:13:21,310 --> 00:13:25,310
out by two engineers in Austin across three ketchup napkins.
245
00:13:25,920 --> 00:13:27,669
There’s no ketchup on the actual napkins; they were
246
00:13:27,670 --> 00:13:30,790
just, I guess, at a fast food place that served fries,
247
00:13:30,790 --> 00:13:32,569
and you were supposed to put ketchup on the napkins.
248
00:13:32,620 --> 00:13:33,190
I don’t know.
249
00:13:33,400 --> 00:13:34,410
Weird terminology.
250
00:13:35,340 --> 00:13:37,680
Chris: Maybe the napkins were sponsored by big ketchup.
251
00:13:38,059 --> 00:13:38,459
Ned: Ohhh.
252
00:13:39,139 --> 00:13:39,719
Heinz.
253
00:13:39,800 --> 00:13:40,600
Got to watch out.
254
00:13:40,730 --> 00:13:42,389
They get their paws into everything.
255
00:13:42,550 --> 00:13:44,870
They’re red, yucky paws.
256
00:13:45,530 --> 00:13:47,329
That’s an awful visual, I’m sorry.
257
00:13:47,820 --> 00:13:50,680
So, while this story might seem apocryphal,
258
00:13:51,100 --> 00:13:53,530
they have actual pictures of the napkins.
259
00:13:53,820 --> 00:13:57,060
There’s no ketchup stains, but it does have the actual diagrams
260
00:13:57,080 --> 00:14:00,919
and sort of the flow for distributing routes in a BGP system.
261
00:14:01,170 --> 00:14:02,300
Chris: All right, I’m going to ignore you for a
262
00:14:02,300 --> 00:14:04,510
minute and actually look this up because I’m curious.
263
00:14:05,609 --> 00:14:06,290
Ned: [laugh] . Fair enough.
264
00:14:07,210 --> 00:14:13,000
BGP was not meant to be a long-term fix for the problems that NSFNET
265
00:14:13,770 --> 00:14:17,150
was experiencing, and that the larger internet would experience.
266
00:14:17,620 --> 00:14:21,100
It was just meant to be a relatively short-term fix to deal with
267
00:14:21,110 --> 00:14:25,080
the explosion of networks that were now forming the internet.
268
00:14:26,040 --> 00:14:29,570
The engineers really thought that they would come along later and replace
269
00:14:29,570 --> 00:14:33,930
it at some future point with a more robust and well-thought-out protocol.
270
00:14:33,990 --> 00:14:36,210
And that’s adorable.
271
00:14:36,970 --> 00:14:37,640
Chris: Still searching.
272
00:14:37,650 --> 00:14:38,910
I’m sure what you’re saying is interesting.
273
00:14:39,240 --> 00:14:39,720
Ned: Mm-hm.
274
00:14:40,510 --> 00:14:45,160
It’s a well-known fact that anything that you put into production, even if it’s
275
00:14:45,160 --> 00:14:51,949
supposed to be a temporary fix, will become a [laugh] a pillar of everything
276
00:14:51,950 --> 00:14:56,119
else that’s built later, and it’s going to be very hard to remove that pillar.
277
00:14:58,190 --> 00:14:59,260
BGP is no exception.
278
00:14:59,889 --> 00:15:03,860
They mapped it out in 1989, and we’re still waiting for its replacement.
279
00:15:04,650 --> 00:15:07,140
This is going to become important as we start to talk about
280
00:15:07,590 --> 00:15:11,530
BGP and its security controls, or its complete lack thereof.
281
00:15:11,920 --> 00:15:13,619
They didn’t think they needed them because
282
00:15:13,620 --> 00:15:15,479
this was supposed to be a stopgap measure.
283
00:15:16,240 --> 00:15:20,589
BGP was iterated on quickly, with version two coming in 1990.
284
00:15:20,650 --> 00:15:22,770
So, that’s a year later from the original idea.
285
00:15:23,110 --> 00:15:27,280
Version three came in 1991, and version four came in 1994.
286
00:15:28,219 --> 00:15:31,760
Version four is the current version of BGP in use by
287
00:15:31,770 --> 00:15:35,549
the internet today, so let’s talk about how it works.
288
00:15:35,830 --> 00:15:38,900
Unless you have some interesting information about these ketchup napkins.
289
00:15:39,170 --> 00:15:41,590
Chris: Are you sure it wasn’t called the two-napkin protocol?
290
00:15:41,890 --> 00:15:42,210
Ned: Nope.
291
00:15:42,250 --> 00:15:42,960
Three napkins.
292
00:15:43,130 --> 00:15:44,520
It had a picture of three napkins.
293
00:15:44,520 --> 00:15:47,339
It’s not the first thing to be drawn out on napkins, though.
294
00:15:47,670 --> 00:15:48,750
Because engineers—
295
00:15:48,759 --> 00:15:51,249
Chris: We could do a whole episode on things that were drawn out on napkins.
296
00:15:51,259 --> 00:15:54,370
Ned: [laugh] . Oh, and how they’re all universally terrible.
297
00:15:55,340 --> 00:15:55,360
[sigh].
298
00:15:56,060 --> 00:15:56,340
Chris: Anyway.
299
00:15:56,340 --> 00:15:56,360
Ned: So—
300
00:15:57,000 --> 00:15:58,109
Chris: Back to whatever it is we—
301
00:15:58,110 --> 00:15:58,390
Ned: BGP.
302
00:15:58,460 --> 00:16:00,010
Chris: Which was—oh right, BGP.
303
00:16:00,020 --> 00:16:00,750
That’s what you were saying.
304
00:16:00,790 --> 00:16:01,089
Okay.
305
00:16:01,200 --> 00:16:02,610
Ned: We’re going to—not napkins—
306
00:16:02,710 --> 00:16:02,900
Chris: I’m back.
307
00:16:02,900 --> 00:16:04,230
Ned: —but we can talk about napkins still.
308
00:16:04,360 --> 00:16:05,549
I have strong opinions.
309
00:16:06,170 --> 00:16:09,280
How expansive do we need to get here about BGP?
310
00:16:09,969 --> 00:16:13,230
I’m going to assume that most people listening
311
00:16:13,490 --> 00:16:15,730
know at least a bit about networking.
312
00:16:16,050 --> 00:16:17,310
At least, I hope so.
313
00:16:17,320 --> 00:16:21,680
Like, otherwise, why are you tuning into this podcast [laugh] ? Be super weird.
314
00:16:21,960 --> 00:16:22,469
Except for you.
315
00:16:22,469 --> 00:16:22,889
Hi, mom.
316
00:16:23,230 --> 00:16:24,990
Chris: Oh, don’t act like your mother listens.
317
00:16:25,170 --> 00:16:26,300
Ned: It’s cruel and true.
318
00:16:27,170 --> 00:16:31,200
So, I’m going to take it as a given that most people know what an IP address
319
00:16:31,200 --> 00:16:36,370
is, are vaguely aware of TCP and how it works, and have at least heard
320
00:16:36,370 --> 00:16:40,400
of routing protocols, even if you don’t understand any of them, even RIP.
321
00:16:41,300 --> 00:16:44,290
Maybe the best thing here would be a packet walk.
322
00:16:44,849 --> 00:16:51,300
How does a packet on my desktop make its way to pod.chaoslever.com.
323
00:16:51,310 --> 00:16:53,199
Just pulling an address out of the air.
324
00:16:53,580 --> 00:16:54,370
Chris: Totally random.
325
00:16:54,580 --> 00:16:55,300
Ned: Totally random.
326
00:16:55,860 --> 00:16:59,860
First, my desktop has to figure out the IP address to
327
00:16:59,870 --> 00:17:03,079
send the web request to, and that’s a function of DNS.
328
00:17:04,099 --> 00:17:08,210
And Chris, as you know, we did two whole shows about DNS.
329
00:17:08,589 --> 00:17:09,409
Go look them up.
330
00:17:09,980 --> 00:17:10,589
Enjoy them.
331
00:17:11,240 --> 00:17:16,629
Pod.chaoslever.com is hosted on Podpage, which has a few
332
00:17:16,630 --> 00:17:24,099
different public IP addresses on the 216.239.32.0/19 network.
333
00:17:24,389 --> 00:17:25,430
Make sure you remember that.
334
00:17:25,440 --> 00:17:26,530
There will be a test later.
335
00:17:27,210 --> 00:17:30,470
Once I have an IP address, how does my
336
00:17:30,470 --> 00:17:33,550
desktop know where to send that web request?
337
00:17:33,820 --> 00:17:35,939
How does it actually route the packet there?
338
00:17:36,559 --> 00:17:39,789
Well, my desktop’s networking stack has a route table in it.
339
00:17:40,490 --> 00:17:43,190
If you’re on a Windows box like me, open up a
340
00:17:43,190 --> 00:17:47,020
terminal and run the command ‘route print -4’.
341
00:17:47,490 --> 00:17:51,359
That will give you all the routes stored locally for IPv4.
342
00:17:52,170 --> 00:17:57,969
On Linux, it’s probably something like ‘ip route list.’ On Mac, I have no idea.
343
00:17:57,969 --> 00:18:00,600
I think it’s also ‘ip route list’ or something similar?
344
00:18:00,750 --> 00:18:01,250
Chris: Correct.
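[Note: for anyone following along, here is a small sketch of the usual command for printing the IPv4 route table on each platform. One caveat on the Mac guess: macOS does not ship the `ip` tool by default, and `netstat -rn` is the usual way to see routes there. The helper name `route_table_command` and the `ROUTE_COMMANDS` table are invented for this example, and the function only returns the command; it does not run it.]

```python
import platform
from typing import List, Optional

# Usual commands for listing IPv4 routes, keyed by platform.system() value.
# macOS reports itself as "Darwin" and has no `ip` command by default.
ROUTE_COMMANDS = {
    "Windows": ["route", "print", "-4"],
    "Linux": ["ip", "-4", "route", "show"],
    "Darwin": ["netstat", "-rn", "-f", "inet"],
}


def route_table_command(system: Optional[str] = None) -> List[str]:
    """Return the route-listing command for the given (or current) platform."""
    name = system or platform.system()
    if name not in ROUTE_COMMANDS:
        raise ValueError(f"no known route command for {name!r}")
    # Return a copy so callers cannot modify the shared table.
    return list(ROUTE_COMMANDS[name])
```

[Passing the result to `subprocess.run` would actually print the table; that step is left out here so the sketch stays free of side effects.]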
345
00:18:01,660 --> 00:18:04,060
Ned: This list determines where a packet is
346
00:18:04,060 --> 00:18:07,370
sent, with the most specific entry winning.
347
00:18:07,860 --> 00:18:12,720
Now, since the website I’m trying to contact has a public IP address, my desktop
348
00:18:12,730 --> 00:18:18,440
is going to use what’s called the default route, which looks like 0.0.0.0, which
349
00:18:18,460 --> 00:18:26,700
in my case, points to the home router as the next hop, which is 192.168.1.1.
350
00:18:26,740 --> 00:18:27,620
I’m very creative.
351
00:18:27,650 --> 00:18:28,560
Yes, you’re welcome.
352
00:18:29,130 --> 00:18:32,879
Chances are that is the [laugh] gateway of your home router as well.
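[Note: "most specific entry wins" is longest-prefix matching. The sketch below shows the idea with Python's standard `ipaddress` module. The three-entry route table is a simplified, assumed version of a home desktop's table, and 216.239.32.21 is just an arbitrary address inside the 216.239.32.0/19 block mentioned earlier, not a confirmed address of the site.]

```python
import ipaddress

# Simplified, assumed route table: (destination prefix, next hop).
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1"),   # default route via the home router
    ("192.168.1.0/24", "on-link"),  # local LAN, delivered directly
    ("127.0.0.0/8", "loopback"),    # traffic to the machine itself
]


def next_hop(destination: str) -> str:
    """Pick the next hop using longest-prefix match over ROUTES."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in ROUTES:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, hop))
    if not matches:
        raise LookupError(f"no route to {destination}")
    # The entry with the longest prefix is the most specific one.
    return max(matches)[1]


if __name__ == "__main__":
    print(next_hop("216.239.32.21"))  # public address: falls to the default route
    print(next_hop("192.168.1.50"))   # LAN address: the /24 beats the /0
```

[The default route matches every address, but its prefix length is zero, so it only wins when nothing more specific matches. That is why a public address ends up at the home router.]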
353
00:18:33,620 --> 00:18:38,199
Once my packet hits that router, it checks the route table there—or the
354
00:18:38,200 --> 00:18:42,380
router checks its route table—and decides where to send the traffic next.
355
00:18:43,320 --> 00:18:47,420
My router has a single WAN interface, and that WAN interface
356
00:18:47,429 --> 00:18:50,999
has a public IP address that was handed out by my ISP.
357
00:18:51,820 --> 00:18:55,700
There is a default route on my router that sends traffic to
358
00:18:55,700 --> 00:18:59,980
the next hop that my ISP lists, which is going to be some
359
00:19:00,020 --> 00:19:03,849
kind of router on their side that has its own routing table.
360
00:19:04,530 --> 00:19:09,649
My ISP is Verizon, and my packet may bounce around inside of the Verizon
361
00:19:09,660 --> 00:19:13,790
network for a while before emerging at one of their peering endpoints.
362
00:19:14,150 --> 00:19:16,100
And we’ll cover peering in a little bit.
363
00:19:16,590 --> 00:19:20,310
So, we’ve gone from my desktop to my home router to one
364
00:19:20,310 --> 00:19:22,840
of Verizon’s routers, and then it bounces around inside
365
00:19:22,950 --> 00:19:25,610
of their network until it emerges to go get to Podpage.
366
00:19:27,170 --> 00:19:30,650
That network—Verizon’s network that’s all the various routers that
367
00:19:30,650 --> 00:19:35,480
they control—is what’s referred to as an autonomous system, or AS.
368
00:19:36,180 --> 00:19:40,359
That network is privately managed by Verizon, and all traffic inside their
369
00:19:40,360 --> 00:19:45,909
network is routed using whatever Interior Gateway Protocol they want to use.
370
00:19:46,180 --> 00:19:46,510
That’s an IGP.
371
00:19:47,820 --> 00:19:48,129
Wooo.
372
00:19:48,750 --> 00:19:56,300
That could be IS-IS, OSPF, or even an internal version of BGP called iBGP.
373
00:19:56,830 --> 00:19:59,090
We’re not going to get into that; just know it exists.
374
00:19:59,860 --> 00:20:02,450
That internal routing protocol is going to decide
375
00:20:02,460 --> 00:20:05,789
where my packet emerges from the Verizon network.
376
00:20:06,559 --> 00:20:11,970
The path that my packet takes once it hits the border between Verizon and other
377
00:20:11,990 --> 00:20:17,510
autonomous systems will depend on external BGP and how it makes decisions.
378
00:20:18,450 --> 00:20:22,899
Each autonomous system on the internet gets an AS number or ASN.
379
00:20:24,480 --> 00:20:30,130
The original ASN specification used 16 bits, so the
380
00:20:30,130 --> 00:20:36,429
maximum AS number was 65,535, because we count from zero.
381
00:20:37,210 --> 00:20:40,850
And just like IPv4, there is a range of ASNs
382
00:20:40,959 --> 00:20:43,640
that are reserved for private or internal use.
383
00:20:43,830 --> 00:20:48,000
So, if you were setting up iBGP, you would use those internal ASNs.
384
00:20:49,640 --> 00:20:53,370
The rest of them are managed by the Internet Assigned Numbers Authority
385
00:20:53,389 --> 00:20:58,169
or IANA, which maybe has an acronym pronunciation, I’m not sure.
386
00:20:58,180 --> 00:20:59,210
Have you ever heard one?
387
00:21:01,360 --> 00:21:01,720
Chris: Uh, Jana?
388
00:21:01,980 --> 00:21:02,240
Ned: Ayana?
389
00:21:02,250 --> 00:21:02,260
Eh.
390
00:21:02,780 --> 00:21:03,370
It’s IANA.
391
00:21:03,370 --> 00:21:05,129
Chris: I think that was a Fleetwood Mac song.
392
00:21:05,710 --> 00:21:06,090
Ned: Nice.
393
00:21:07,300 --> 00:21:09,040
[sigh] . Wonder where they got that name,
394
00:21:09,460 --> 00:21:11,650
the Internet Assigned Numbers Authority.
395
00:21:12,440 --> 00:21:13,820
They assign numbers.
396
00:21:14,750 --> 00:21:20,120
Blocks of ASNs are handed out from the IANA to regional
397
00:21:20,150 --> 00:21:23,580
internet registries, and those handle the actual assignment
398
00:21:23,630 --> 00:21:29,280
of ASNs to people who want ASNs, these regional networks.
399
00:21:29,910 --> 00:21:34,360
When BGP was first implemented, 16 bits probably seemed like plenty,
400
00:21:34,820 --> 00:21:38,650
and also was what routers were capable of handling at the time.
401
00:21:39,230 --> 00:21:46,630
In 2012, RFC 6793 expanded ASN to use four octets, or 32 bits,
402
00:21:47,130 --> 00:21:51,430
which raised the number of available numbers to roughly 4 billion.
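The arithmetic behind those two limits, as a quick Python check. The function name `max_asn` is just for illustration.

```python
def max_asn(bits: int) -> int:
    """Largest value that fits in an unsigned field of the given width."""
    return 2**bits - 1


print(max_asn(16))  # 65535
print(max_asn(32))  # 4294967295, roughly 4 billion
```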
403
00:21:51,910 --> 00:21:52,880
Will that be enough?
404
00:21:53,309 --> 00:21:57,550
At the moment, current statistics show that regional internet registries
405
00:21:57,550 --> 00:22:02,919
have handed out 130,000 ASNs, so, um… I think we’ll be all right, for a while.
406
00:22:03,400 --> 00:22:04,270
Chris: We’ll be good, I think.
407
00:22:04,270 --> 00:22:04,830
We’ll be good.
408
00:22:05,219 --> 00:22:07,990
Ned: This is very different than the lack of available public
409
00:22:09,080 --> 00:22:12,099
IPv4 addresses because it’s not like every device gets an ASN.
410
00:22:12,560 --> 00:22:15,070
It’s every large network gets one.
411
00:22:15,950 --> 00:22:21,099
Still, though, that’s 130,000 public-facing ASNs that BGP
412
00:22:21,110 --> 00:22:23,930
has to worry about when it comes to routing your packets.
413
00:22:24,360 --> 00:22:25,389
This thing has to be scalable.
414
00:22:26,120 --> 00:22:27,449
So, how does it do that?
415
00:22:28,160 --> 00:22:30,229
Chris: I thought we already established that: magic.
416
00:22:30,510 --> 00:22:30,770
Ned: Yes.
417
00:22:31,190 --> 00:22:32,210
That’s essentially what it is.
418
00:22:32,240 --> 00:22:35,879
And if you want to stop there, and just know that that’s what BGP is responsible
419
00:22:35,880 --> 00:22:41,110
for, you can ignore the next, like, ten minutes [laugh] . To get into some of
420
00:22:41,110 --> 00:22:44,509
the detail—and we’re not going to get down to nitty gritty here, but just some
421
00:22:44,509 --> 00:22:49,560
of the detail here—BGP is what’s called a path vector-based routing protocol,
422
00:22:49,870 --> 00:22:55,230
which means that it decides on a specific path for a route based on attributes.
423
00:22:55,770 --> 00:22:59,090
Vector is the direction and path is the selection.
424
00:22:59,690 --> 00:23:02,840
BGP doesn’t understand or care about things like
425
00:23:02,920 --> 00:23:07,080
bandwidth, or latency, or even hops, really.
426
00:23:07,670 --> 00:23:10,870
Instead, it has a path selection algorithm that walks
427
00:23:10,870 --> 00:23:14,190
through the attributes of each possible path for a packet,
428
00:23:14,599 --> 00:23:17,899
and then picks one based on the selection criteria.
429
00:23:18,700 --> 00:23:21,379
We’ll get into the actual process it uses in a
430
00:23:21,380 --> 00:23:24,189
moment, but where is it getting this information from?
431
00:23:24,959 --> 00:23:26,010
From its neighbors.
432
00:23:26,530 --> 00:23:27,929
Oh, they have neighbors.
433
00:23:27,980 --> 00:23:29,110
It’s like a community.
434
00:23:29,520 --> 00:23:31,460
And there’s also communities [laugh].
435
00:23:31,480 --> 00:23:33,989
Chris: I would just like to pause and remind everybody that Ned
436
00:23:34,050 --> 00:23:37,360
explicitly said he wasn’t going to get into the nitty-gritty.
437
00:23:37,590 --> 00:23:38,709
Ned: I’m not [laugh].
438
00:23:39,130 --> 00:23:40,269
Chris: That’s the thing.
439
00:23:41,980 --> 00:23:44,680
Ned: This is the high-level stuff [laugh] . It gets so much deeper.
440
00:23:44,890 --> 00:23:47,730
Chris: No, no, I just wanted to point that out to explain to people
441
00:23:47,830 --> 00:23:52,340
a little more justification as to why my run-away-screaming protocol
442
00:23:52,420 --> 00:23:56,420
is what I operate upon when BGP comes up in quiet conversation.
443
00:23:57,360 --> 00:23:57,649
Ned: Right.
444
00:23:57,649 --> 00:24:01,970
All right, so if I’m a BGP—I’m a router running BGP,
445
00:24:01,970 --> 00:24:06,090
you can call me a node—I form relationships with other
446
00:24:06,090 --> 00:24:09,340
routers running BGP through what’s called neighborships.
447
00:24:09,340 --> 00:24:11,605
I don’t like the term, but apparently it’s used.
448
00:24:11,605 --> 00:24:12,670
Chris: Please tell me that’s not real.
449
00:24:12,830 --> 00:24:13,520
Ned: That’s real.
450
00:24:13,910 --> 00:24:14,529
I’m sorry.
451
00:24:15,009 --> 00:24:18,049
Setting up a neighborship is very, very simple.
452
00:24:18,370 --> 00:24:21,149
Let’s say we’ve got two routers: Router A and Router B.
453
00:24:21,720 --> 00:24:21,996
On Router—
454
00:24:21,996 --> 00:24:23,080
Chris: I just got—oh, my God.
455
00:24:23,320 --> 00:24:23,610
Ned: What?
456
00:24:24,270 --> 00:24:24,590
Chris: Neighborship?
457
00:24:24,590 --> 00:24:24,600
Ned: Neighborship.
458
00:24:26,849 --> 00:24:30,950
I heard it first, and I was like, “That can’t possibly be the real term.”
459
00:24:31,639 --> 00:24:34,689
They’re also called peers, and I like that better, but
460
00:24:34,709 --> 00:24:38,110
that gets into the difference between peering and transit.
461
00:24:38,590 --> 00:24:39,669
And so…
462
00:24:39,969 --> 00:24:42,690
Chris: Can you hold on for one second, I got to go get a glass.
463
00:24:44,550 --> 00:24:45,409
Ned: [laugh] . Smash it real hard.
464
00:24:47,179 --> 00:24:50,159
[sigh] . The problem is that we use the same terms to mean
465
00:24:50,160 --> 00:24:52,990
too many different things in technology, and so sometimes
466
00:24:52,990 --> 00:24:56,010
we just got to make up a word, and it’s not always good.
467
00:24:56,990 --> 00:24:57,490
Anyway.
468
00:24:58,740 --> 00:25:01,320
So, let’s say I have two routers: Router A, Router B.
469
00:25:01,600 --> 00:25:06,829
On Router A, I tell it the IP address of Router B and its ASN.
470
00:25:06,829 --> 00:25:13,810
And then over on Router B, I tell it the IP address of Router A and its ASN.
471
00:25:13,820 --> 00:25:16,600
On Router A, I add any networks that I want to
472
00:25:16,620 --> 00:25:21,560
advertise, and same thing for Router B, and that’s it.
473
00:25:22,360 --> 00:25:25,480
The two routers will establish a TCP connection over
474
00:25:25,480 --> 00:25:29,449
port 179, and start exchanging route information.
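A rough Python sketch of that two-sided setup, not real router configuration. The class names, addresses, and AS numbers are all made up for illustration (the addresses and ASNs come from documentation-reserved ranges). It only models the point that each side has to name the other's IP address and ASN before a session on TCP port 179 can form.

```python
from dataclasses import dataclass, field

BGP_PORT = 179  # TCP port used for BGP sessions


@dataclass
class Neighbor:
    ip: str
    asn: int


@dataclass
class Router:
    name: str
    ip: str
    asn: int
    neighbors: list = field(default_factory=list)  # configured Neighbor entries
    networks: list = field(default_factory=list)   # prefixes this router advertises

    def add_neighbor(self, other: "Router") -> None:
        """Configure the other router's IP address and ASN on this router."""
        self.neighbors.append(Neighbor(ip=other.ip, asn=other.asn))


def session_can_form(a: Router, b: Router) -> bool:
    """Both routers must have each other's IP address and ASN configured."""
    a_knows_b = Neighbor(b.ip, b.asn) in a.neighbors
    b_knows_a = Neighbor(a.ip, a.asn) in b.neighbors
    return a_knows_b and b_knows_a


router_a = Router("A", "192.0.2.1", 64496, networks=["198.51.100.0/24"])
router_b = Router("B", "192.0.2.2", 64497, networks=["203.0.113.0/24"])

router_a.add_neighbor(router_b)
print(session_can_form(router_a, router_b))  # False, only one side is configured
router_b.add_neighbor(router_a)
print(session_can_form(router_a, router_b))  # True
```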
475
00:25:29,959 --> 00:25:33,200
Each router will share the networks that it is advertising
476
00:25:33,380 --> 00:25:36,470
and any networks it learned about from other routers.
477
00:25:37,250 --> 00:25:41,600
And BGP only sends messages across that link
478
00:25:41,650 --> 00:25:44,210
when there’s an update to its advertised routes.
479
00:25:44,230 --> 00:25:48,910
So, unlike something like RIP that, every 30 seconds, goes, “Here’s all my
480
00:25:48,910 --> 00:25:54,630
routes.” “Here’s all my routes.” That would be bad and awful, so BGP just
481
00:25:54,710 --> 00:25:59,200
sends information when something changes about one of the advertised routes.
482
00:25:59,550 --> 00:26:05,239
Otherwise, just hangs out, chills, plays Pinochle, and every 30
483
00:26:05,239 --> 00:26:07,639
or 60 seconds, it sends a keep-alive saying, “Yep, I’m still here.
484
00:26:07,770 --> 00:26:10,260
I got nothing new to say.” Kind of like you, Chris.
485
00:26:10,330 --> 00:26:13,760
I check in every 30 to 60 seconds to make sure you’re still here [laugh].
486
00:26:14,170 --> 00:26:15,870
Chris: As usual, I’ve got nothing new to say.
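A toy Python model of that behavior: say something only when the advertised routes change, otherwise send a keep-alive. The `Speaker` class and its message strings are invented for illustration and are not the BGP wire format. A real router sends an update when the change happens rather than waiting for a timer tick; the tick here just keeps the sketch small.

```python
class Speaker:
    """Toy BGP speaker: UPDATE on change, KEEPALIVE otherwise."""

    def __init__(self):
        self.routes = set()     # prefixes currently advertised
        self.last_sent = set()  # what the neighbor was last told

    def advertise(self, prefix: str) -> None:
        self.routes.add(prefix)

    def withdraw(self, prefix: str) -> None:
        self.routes.discard(prefix)

    def tick(self) -> str:
        """Called once per interval; returns the message to send."""
        if self.routes != self.last_sent:
            added = sorted(self.routes - self.last_sent)
            removed = sorted(self.last_sent - self.routes)
            self.last_sent = set(self.routes)
            return f"UPDATE announce={added} withdraw={removed}"
        return "KEEPALIVE"


speaker = Speaker()
speaker.advertise("198.51.100.0/24")
print(speaker.tick())  # UPDATE announce=['198.51.100.0/24'] withdraw=[]
print(speaker.tick())  # KEEPALIVE
```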
487
00:26:16,899 --> 00:26:20,839
Ned: [laugh] . Indeed, the routing decisions made by Router A
488
00:26:20,839 --> 00:26:24,449
will depend on the advertisements it gets from its neighbors.
489
00:26:25,000 --> 00:26:28,160
So, so far, we’ve just got Router A and B, but we
490
00:26:28,160 --> 00:26:31,800
can add additional routers as neighbors: C, D, and E.
491
00:26:32,510 --> 00:26:37,640
Router A learns about routes to different networks from all of these neighbors,
492
00:26:37,950 --> 00:26:41,820
and then makes path-based decisions based on the routes that it learned.
493
00:26:41,820 --> 00:26:47,510
BGP network advertisements can have a ton of attributes, but
494
00:26:47,510 --> 00:26:51,080
there’s really only about eight standard ones that are commonly
495
00:26:51,080 --> 00:26:55,020
used, and honestly, there’s probably only about three or four
496
00:26:55,020 --> 00:26:58,510
that actually matter, so we’re just going to talk about those.
497
00:26:59,000 --> 00:26:59,539
Chris: Thank God.
498
00:26:59,920 --> 00:27:00,380
Ned: Yes.
499
00:27:01,139 --> 00:27:06,750
Local preference is an attribute that lets you prefer one route over another.
500
00:27:07,120 --> 00:27:11,029
I could give Router B preference over Router C.
501
00:27:11,759 --> 00:27:12,639
Very straightforward.
502
00:27:13,469 --> 00:27:17,039
If both routers are an option for a given destination,
503
00:27:17,619 --> 00:27:19,980
the one with the higher preference gets the nod.
504
00:27:20,040 --> 00:27:23,880
So, Router B would get—I’d send my traffic to Router B instead of Router C.
505
00:27:24,540 --> 00:27:27,639
That’s useful if, say, the link on Router B is a
506
00:27:27,650 --> 00:27:30,680
ten gig link and the link to Router C is one gig.
507
00:27:30,990 --> 00:27:34,050
I probably want to use the link to Router B if I can help it.
508
00:27:34,590 --> 00:27:36,750
BGP doesn’t know about link speed, but you do.
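One way to picture that: the admin encodes what they know about link speed as a local preference value, and the higher value wins. This Python sketch uses a made-up policy (a base of 100 plus ten per gigabit) and made-up neighbor names; nothing about the formula comes from BGP itself.

```python
# Hypothetical knowledge the admin has and BGP does not: link speeds in Gbps.
LINK_SPEED_GBPS = {"router_b": 10, "router_c": 1}


def local_pref_for(neighbor: str, base: int = 100) -> int:
    """Admin-chosen policy: faster links get a higher local preference."""
    return base + LINK_SPEED_GBPS[neighbor] * 10


def preferred_neighbor(neighbors):
    """Pick the neighbor with the highest local preference."""
    return max(neighbors, key=local_pref_for)


print(preferred_neighbor(["router_b", "router_c"]))  # router_b
```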
509
00:27:37,460 --> 00:27:39,989
The next attribute is AS path length.
510
00:27:41,180 --> 00:27:44,970
The AS path is a list of every autonomous system a
511
00:27:44,970 --> 00:27:48,210
packet will pass through, from source to destination.
512
00:27:48,849 --> 00:27:51,560
So, when a router learns about a route from one of its
513
00:27:51,570 --> 00:27:55,199
neighbors and wants to share that route with the next router
514
00:27:55,200 --> 00:27:59,890
in line, it tacks on its AS number to the end of the AS path.
515
00:28:00,720 --> 00:28:05,909
So, the more autonomous systems a route travels through, the longer the
516
00:28:05,910 --> 00:28:12,500
path length becomes, and that makes it less preferred as a path to choose.
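A small Python sketch of how the path grows as a route is passed along. The AS numbers are documentation-reserved placeholders and the function name is invented. One detail worth noting: BGP implementations put the new AS number at the front of the list rather than the end, but the length grows by one either way.

```python
def readvertise(as_path, own_asn):
    """Return the AS path a router passes on after adding its own AS number.

    The new AS number goes at the front, so the originating AS stays last.
    """
    return [own_asn] + list(as_path)


path = [64496]                   # route originated by AS 64496
path = readvertise(path, 64497)  # passed on by AS 64497
path = readvertise(path, 64498)  # passed on by AS 64498
print(path, len(path))           # [64498, 64497, 64496] 3
```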
517
00:28:13,080 --> 00:28:16,350
That doesn’t mean that the shorter AS path route is
518
00:28:16,360 --> 00:28:20,370
actually faster, it just means that it’s shorter.
519
00:28:20,950 --> 00:28:24,720
Inside that autonomous system, there could be way more hops
520
00:28:24,860 --> 00:28:28,819
between the ingress and egress routers, so that’s why you might
521
00:28:28,820 --> 00:28:32,090
want to use something like local preference if you know that,
522
00:28:32,100 --> 00:28:36,500
say, Joe’s ISP and Crab Shack kind of sucks at passing traffic.
523
00:28:36,980 --> 00:28:38,160
Chris: Phenomenal crabs, though.
524
00:28:38,500 --> 00:28:39,540
Ned: Really good crabs.
525
00:28:40,120 --> 00:28:42,210
The last attribute is the router ID.
526
00:28:42,660 --> 00:28:48,899
If all other attributes for a route are the same, the lower router ID wins.
527
00:28:49,830 --> 00:28:51,590
Where does that router ID come from?
528
00:28:52,410 --> 00:28:53,290
That’s weird.
529
00:28:53,300 --> 00:28:54,540
It’s kind of up to the admin.
530
00:28:55,300 --> 00:28:58,960
The form looks exactly like an IPv4 address, and it’s
531
00:28:58,960 --> 00:29:01,940
usually set to the first loopback interface on the router.
532
00:29:02,759 --> 00:29:06,500
The router ID needs to be unique within an individual
533
00:29:06,590 --> 00:29:09,530
autonomous system and unique among its peers.
534
00:29:10,550 --> 00:29:13,290
So, you know, you can’t have two routers in the same
535
00:29:13,540 --> 00:29:17,349
neighborship—so sorry—that have the same router ID.
536
00:29:18,190 --> 00:29:19,209
Bad things will happen.
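The three attributes covered here can be strung together into a simplified selection routine. This is a sketch of only those three steps, not the full BGP decision process, and the `Route` fields, neighbor names, and values are all made up for illustration. Router IDs are compared as numbers, since they are written like IPv4 addresses.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Route:
    via: str         # neighbor the route was learned from
    local_pref: int  # higher wins
    as_path: tuple   # shorter wins
    router_id: str   # dotted quad; lower wins as the final tie-break


def selection_key(route: Route):
    """Sort key: the smallest key is the best route in this simplified model."""
    return (
        -route.local_pref,                  # negate so higher preference sorts first
        len(route.as_path),                 # then the shorter AS path
        int(IPv4Address(route.router_id)),  # then the numerically lower router ID
    )


def best_path(routes):
    return min(routes, key=selection_key)


candidates = [
    Route("router_b", 200, (64497, 64499, 64500), "10.0.0.2"),
    Route("router_c", 100, (64498,), "10.0.0.3"),
]
print(best_path(candidates).via)  # router_b, local preference beats the shorter path
```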
537
00:29:20,040 --> 00:29:25,890
Speaking of peers—back to our packet walk—the request has now left the Verizon
538
00:29:25,890 --> 00:29:30,530
network and it’s gone to some other network based on advertised routes.
539
00:29:31,260 --> 00:29:35,330
The Verizon router made a decision based on the path attributes for each route.
540
00:29:35,849 --> 00:29:37,439
Where is this all happening?
541
00:29:38,139 --> 00:29:40,070
Physically, where’s this actually happening?
542
00:29:40,770 --> 00:29:45,260
It’s at an internet exchange point of some kind—most likely—where a peering
543
00:29:45,270 --> 00:29:49,170
or transit arrangement has been created between two or more routers.
544
00:29:50,080 --> 00:29:54,380
So, at this point, we’re kind of done with BGP, but that led me
545
00:29:54,630 --> 00:29:58,470
to another rabbit hole, which is okay, I understand the theory.
546
00:29:58,580 --> 00:30:00,600
Where’s all this stuff actually happening?
547
00:30:01,350 --> 00:30:04,130
And it’s happening at these dedicated colocation
548
00:30:04,130 --> 00:30:06,430
facilities and internet exchange points.
549
00:30:06,570 --> 00:30:11,499
They used to be called NAPs, which was like Network Access… something.
550
00:30:11,980 --> 00:30:16,060
And there was a place called a SUPERNAP, down in Virginia,
551
00:30:16,139 --> 00:30:19,530
I think, where there were, like, a metric shit ton of these
552
00:30:19,730 --> 00:30:22,510
different ISP lines all coming into the same facility.
553
00:30:22,920 --> 00:30:24,430
I don’t know if it’s still called the SUPERNAP.
554
00:30:25,050 --> 00:30:27,430
Chris: I think I’m lined up for a super nap, if you know what I’m saying.
555
00:30:27,530 --> 00:30:28,320
Ned: I do.
556
00:30:28,380 --> 00:30:29,960
I set you up for that one.
557
00:30:29,960 --> 00:30:30,540
You’re welcome.
558
00:30:31,259 --> 00:30:33,769
So, this isn’t entirely relevant to BGP,
559
00:30:33,790 --> 00:30:36,019
except it filled in some mental gaps for me.
560
00:30:36,920 --> 00:30:39,220
How are two autonomous systems connected?
561
00:30:39,880 --> 00:30:43,480
Well, they’re connected by two routers, but there’s two basic physical
562
00:30:43,490 --> 00:30:47,970
topologies that are followed: you can have a public peering arrangement
563
00:30:47,970 --> 00:30:52,280
between a bunch of ASs, and that usually happens at one of these internet
564
00:30:52,289 --> 00:30:57,460
exchange points, or a rented colocation space from a neutral provider.
565
00:30:57,470 --> 00:31:00,900
Think Equinix, or Digital Realty would be examples.
566
00:31:01,920 --> 00:31:04,960
Each ISP’s router will be connected into a common
567
00:31:04,960 --> 00:31:09,190
switch fabric, and peering relationships will be formed
568
00:31:09,219 --> 00:31:11,849
between each router that’s connected into the switch.
569
00:31:12,559 --> 00:31:15,639
So, they’re all exchanging routing information with each other.
570
00:31:16,480 --> 00:31:21,780
The other option is a direct router-to-router connection between two ASs.
571
00:31:22,179 --> 00:31:23,749
That’s known as private peering.
572
00:31:24,590 --> 00:31:27,860
If you’ve ever been involved in setting up a connection to AWS with
573
00:31:27,860 --> 00:31:31,879
Direct Connect, or Azure with Express Connect—or ExpressRoute.
574
00:31:32,070 --> 00:31:36,600
Sorry, stupid names—both of those use private peering and a
575
00:31:36,600 --> 00:31:41,490
direct physical connection from your network to Azure or AWS.
576
00:31:41,620 --> 00:31:44,880
You have to set up what’s called a cross-connect, which is essentially, from
577
00:31:44,880 --> 00:31:49,720
your router—or a router that you’re leasing through your ISP—it’s a cable
578
00:31:49,720 --> 00:31:54,280
that runs to the router or the switch that the cloud router is hooked into.
579
00:31:55,190 --> 00:31:58,420
There’s also a slight difference between peering and transit.
580
00:31:58,960 --> 00:32:01,670
Peering means that I can send traffic to your network,
581
00:32:01,700 --> 00:32:04,190
and you can send traffic to my network, and we don’t
582
00:32:04,190 --> 00:32:07,399
charge each other any money for accepting that traffic.
583
00:32:08,170 --> 00:32:11,860
Consider a scenario where you have a few different regional
584
00:32:11,860 --> 00:32:14,770
networks that want to pass network traffic between each other,
585
00:32:14,820 --> 00:32:17,669
rather than sending the traffic across a transit network.
586
00:32:18,620 --> 00:32:22,819
They can all rent space together at a colocation data center, and set up a
587
00:32:22,900 --> 00:32:26,889
public peering arrangement where they’ll exchange routes and pass traffic.
588
00:32:27,320 --> 00:32:30,689
It’s beneficial for all the networks involved to be able to
589
00:32:30,700 --> 00:32:34,310
communicate freely, and there’s a verbal peering agreement,
590
00:32:34,429 --> 00:32:37,380
or handshake agreement, to not be an asshole about it.
591
00:32:38,520 --> 00:32:38,550
Chris: [laugh].
592
00:32:38,940 --> 00:32:39,490
Ned: I’m serious.
593
00:32:39,490 --> 00:32:41,099
They’re like, “Just don’t be a dick.
594
00:32:41,440 --> 00:32:44,790
Don’t overwhelm my network with traffic that’s destined for somewhere else.
595
00:32:44,800 --> 00:32:46,860
Don’t try to use me as a transit network, and
596
00:32:46,860 --> 00:32:50,439
we’ll all get along.” And yes, I’m very serious.
597
00:32:51,020 --> 00:32:55,370
A study in 2011 showed that only 0.05% of
598
00:32:55,370 --> 00:32:58,309
peering agreements were actual written contracts.
599
00:32:59,199 --> 00:33:02,920
I imagine that’s grown in the last 13 years with the explosion of cloud
600
00:33:02,920 --> 00:33:06,470
where, like, if you want a peering agreement with Azure, it is absolutely
601
00:33:06,470 --> 00:33:10,589
a written contract, but from what I’ve heard, that’s in the minority.
602
00:33:10,630 --> 00:33:13,050
These regional networks are still using just
603
00:33:13,050 --> 00:33:16,350
handshakes and, like, firm nods at each other.
604
00:33:17,190 --> 00:33:20,000
Transit relationships are where a network is paying
605
00:33:20,000 --> 00:33:22,700
another network for access to the general internet.
606
00:33:23,820 --> 00:33:27,500
There’s a few giant tier one operators that lots of other
607
00:33:27,500 --> 00:33:30,710
networks pay to transmit their traffic across the internet.
608
00:33:31,540 --> 00:33:35,589
A regional network in, say, Luxembourg is unlikely to have a
609
00:33:35,590 --> 00:33:39,680
direct peering relationship with a network in Omaha, Nebraska,
610
00:33:40,240 --> 00:33:42,850
so that traffic needs to transit through another provider.
611
00:33:43,520 --> 00:33:45,870
That provider doesn’t see a mutual benefit for
612
00:33:45,880 --> 00:33:48,530
providing that transit, so they charge for it.
613
00:33:49,770 --> 00:33:52,530
Tier one networks are those networks that can reach all
614
00:33:52,530 --> 00:33:55,870
other networks on the internet using settlement-free peering.
615
00:33:56,690 --> 00:34:01,060
Tier two networks have to pay for at least some transit to other networks.
616
00:34:01,630 --> 00:34:05,680
And tier three networks pay for transit to all networks.
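Those three definitions reduce to a simple rule, sketched here in Python. The function name and the idea of counting networks are assumptions made for illustration; real tier labels are informal and not computed this way.

```python
def classify_tier(reachable_via_peering: int, total_networks: int) -> int:
    """Classify a network by how much of the internet it reaches without paying.

    reachable_via_peering: networks reachable through settlement-free peering.
    total_networks: all networks it needs to reach.
    """
    if total_networks <= 0 or not 0 <= reachable_via_peering <= total_networks:
        raise ValueError("counts must satisfy 0 <= reachable <= total and total > 0")
    if reachable_via_peering == total_networks:
        return 1  # reaches everything through peering alone
    if reachable_via_peering > 0:
        return 2  # peers with some networks, pays for transit to the rest
    return 3      # pays for transit to everything


print(classify_tier(1000, 1000))  # 1
print(classify_tier(400, 1000))   # 2
print(classify_tier(0, 1000))     # 3
```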
617
00:34:06,650 --> 00:34:09,210
Who are these mysterious tier one providers?
618
00:34:09,420 --> 00:34:11,570
Well, Verizon is one.
619
00:34:12,190 --> 00:34:17,920
So is AT&T, and Comcast, and Lumen, who you might not have
620
00:34:17,920 --> 00:34:20,950
heard of, but that’s because they used to be called CenturyLink.
621
00:34:21,489 --> 00:34:23,360
They changed their name because they had a
622
00:34:23,360 --> 00:34:25,540
terrible reputation, and that was going to help.
623
00:34:26,110 --> 00:34:30,580
They’re also the biggest tier one provider in the world as far as I can tell.
624
00:34:31,659 --> 00:34:35,490
Since Verizon is a tier one network—going back to our packet walk,
625
00:34:35,490 --> 00:34:39,980
and to round this all out—since it’s a tier one network, my packet
626
00:34:40,010 --> 00:34:44,100
doesn’t have to go across another transit network to get to Podpage.
627
00:34:45,320 --> 00:34:47,209
I looked it up, and Podpage is actually
628
00:34:47,209 --> 00:34:49,759
using Google Cloud to host their service.
629
00:34:50,489 --> 00:34:55,530
So, when I looked at it, the ASNs for Podpage—or the public IP
630
00:34:55,530 --> 00:35:00,399
addresses they’re using—lined up to Google’s ASNs, and so my
631
00:35:00,400 --> 00:35:03,830
little packet will go directly from Verizon network to Google.
632
00:35:04,059 --> 00:35:05,980
No other transit required.
633
00:35:06,000 --> 00:35:08,090
And in fact, that’s exactly what it does.
634
00:35:08,550 --> 00:35:13,040
Through the magic of traceroute, I can see my packet hop from Verizon, to
635
00:35:13,040 --> 00:35:19,110
Verizon Business, to Google, to another Google AS because they have multiples.
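A sketch of turning a hop-by-hop trace into the AS-level path described here, assuming each hop has already been mapped to an ASN by some lookup. The hop addresses and AS numbers below are documentation-reserved placeholders standing in for the networks named in the episode, not their real numbers.

```python
from itertools import groupby

# Hypothetical (hop address, ASN) pairs; every value is a placeholder.
hops = [
    ("192.0.2.1", 64496),     # stand-in for Verizon
    ("192.0.2.9", 64496),
    ("198.51.100.1", 64497),  # stand-in for Verizon Business
    ("203.0.113.1", 64498),   # stand-in for Google
    ("203.0.113.7", 64498),
    ("203.0.113.20", 64499),  # stand-in for a second Google AS
]


def as_level_path(hops):
    """Collapse consecutive hops inside the same AS into a single entry."""
    asns = [asn for _, asn in hops]
    return [key for key, _ in groupby(asns)]


print(as_level_path(hops))  # [64496, 64497, 64498, 64499]
```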
636
00:35:19,929 --> 00:35:23,139
BGP has done its job, and all is well with the internet.
637
00:35:23,930 --> 00:35:24,770
But what if it isn’t?
638
00:35:25,500 --> 00:35:25,570
Chris: [laugh].
639
00:35:26,260 --> 00:35:27,750
Ned: How can BGP break?
640
00:35:28,090 --> 00:35:29,810
And can people do it on purpose?
641
00:35:30,520 --> 00:35:31,870
The answers will shock you.
642
00:35:32,360 --> 00:35:36,710
I—they probably won’t shock you [laugh] . The answer is there
643
00:35:36,710 --> 00:35:41,230
are many ways to break BGP, and yes, it can be done on purpose.
644
00:35:41,500 --> 00:35:46,770
But that is the story for another time, a future episode, and a guest
645
00:35:46,920 --> 00:35:50,279
who’s more eloquent than me at explaining security issues with BGP.
646
00:35:50,279 --> 00:35:50,299
[sigh].
647
00:35:52,820 --> 00:35:53,499
You feel better?
648
00:35:53,700 --> 00:35:54,379
Chris: No.
649
00:35:54,570 --> 00:35:57,359
Ned: Have I demystified some of the magic of the internet for you?
650
00:35:57,660 --> 00:35:59,290
Chris: I’m more confused than when I started,
651
00:35:59,290 --> 00:36:00,780
and I didn’t think that was possible.
652
00:36:01,110 --> 00:36:01,350
Ned: Good.
653
00:36:01,350 --> 00:36:04,560
Then my job… [laugh] is a complete success.
654
00:36:04,670 --> 00:36:05,610
My job here is done.
655
00:36:06,550 --> 00:36:07,890
Hey, thanks for listening or something.
656
00:36:07,890 --> 00:36:10,610
I guess you found it worthwhile enough if you made it all the way to the
657
00:36:10,610 --> 00:36:14,219
end, so congratulations to you, friend, you accomplished something today.
658
00:36:14,670 --> 00:36:15,170
Maybe.
659
00:36:15,790 --> 00:36:18,640
Now, you can sit on the couch, think about the magic of
660
00:36:18,910 --> 00:36:22,049
BGP, and just get hopelessly confused like the rest of us.
661
00:36:22,330 --> 00:36:22,860
You’ve earned it.
662
00:36:23,390 --> 00:36:26,220
You can find more about this show by going to our LinkedIn page, just
663
00:36:26,220 --> 00:36:30,680
search ‘Chaos Lever,’ or go to the website, pod.chaoslever.com, where
664
00:36:30,680 --> 00:36:34,170
you’ll find show notes, blog posts, and general tomfoolery, and you
665
00:36:34,170 --> 00:36:37,590
can leave a comment that we might read on the Tech News of the Week.
666
00:36:37,980 --> 00:36:40,570
We’ll be back next week to see what fresh hell is upon us.
667
00:36:40,730 --> 00:36:41,620
Ta-ta for now.
668
00:36:49,740 --> 00:36:53,290
Chris: And just to make things even more unnecessarily confusing,
669
00:36:53,879 --> 00:36:57,530
it was originally called the two-napkin protocol, when it was first
670
00:36:57,550 --> 00:37:03,460
proposed and first published in a Cisco internal blog in 1989.
671
00:37:04,030 --> 00:37:06,190
Ned: [laugh] . And then a third napkin arose?
672
00:37:06,480 --> 00:37:07,030
Oh, no.
673
00:37:07,309 --> 00:37:08,970
Chris: Look, I mean, math is hard.