Sept. 17, 2024
Tech News of the Week 09-17-2024

In this episode, we discuss the European Court of Justice's decision forcing Apple to pay €13 billion in back taxes to Ireland, marking a major moment in corporate taxation within the EU. We also dive into Microsoft's breakthrough in quantum computing, as they announce the creation of 12 error-corrected qubits, a step forward in the notoriously difficult area of error resilience. Lastly, we explore OpenAI's "Strawberry" model, designed to improve reasoning in AI, and the latest drama involving OthersideAI's inflated claims about their new AI model, Reflection.
Transcript
1
00:00:08,500 --> 00:00:09,566
Welcome to Tech News of the Week.
2
00:00:09,566 --> 00:00:10,699
This is our weekly tech news podcast,
3
00:00:10,699 --> 00:00:11,500
where
4
00:00:11,500 --> 00:00:13,633
Chris and I get into four interesting
5
00:00:14,000 --> 00:00:15,366
articles that caught our attention.
6
00:00:15,933 --> 00:00:17,300
I'm going to go first,
7
00:00:17,300 --> 00:00:19,133
Chris, if that's okay with you.
8
00:00:21,399 --> 00:00:22,199
My mic's muted.
9
00:00:23,399 --> 00:00:23,600
Good.
10
00:00:24,199 --> 00:00:24,866
It should be.
11
00:00:25,833 --> 00:00:27,899
The EU takes a big bite out of Apple.
12
00:00:28,899 --> 00:00:30,933
The European Court of Justice has handed
13
00:00:30,933 --> 00:00:33,333
down a landmark ruling that forces Apple
14
00:00:33,333 --> 00:00:37,100
to pay 13 billion euros in back taxes
15
00:00:37,100 --> 00:00:38,866
based on what they judge.
16
00:00:40,433 --> 00:00:42,799
Based on what they judge to be an illegal
17
00:00:42,799 --> 00:00:45,299
tax structure put forth by Ireland.
18
00:00:46,133 --> 00:00:49,200
The case dates back to 2016 and regards a
19
00:00:49,200 --> 00:00:51,799
period of almost 11 years when Apple's
20
00:00:51,799 --> 00:00:53,266
effective tax burden in
21
00:00:53,266 --> 00:00:55,100
Ireland was a mere 1%.
22
00:00:56,633 --> 00:00:58,466
Since that time, the loophole providing
23
00:00:58,466 --> 00:01:00,899
such relief has been closed and a
24
00:01:00,899 --> 00:01:03,399
universal 15% corporate tax has been
25
00:01:03,399 --> 00:01:04,733
adopted by most of
26
00:01:04,733 --> 00:01:06,066
the EU's member states.
27
00:01:06,966 --> 00:01:09,700
Not surprisingly, Tim Cook claims no
28
00:01:09,700 --> 00:01:12,033
wrongdoing, as does the Irish government.
29
00:01:13,033 --> 00:01:15,033
The case was originally ruled in Apple's
30
00:01:15,033 --> 00:01:17,933
favor back in 2020 by a lower court, but
31
00:01:17,933 --> 00:01:19,966
the European Court of Justice was primed
32
00:01:19,966 --> 00:01:21,500
to have the final say.
33
00:01:21,933 --> 00:01:24,900
And after a brisk four years, I think
34
00:01:24,900 --> 00:01:26,900
that's fast in the legal world.
35
00:01:27,599 --> 00:01:28,533
They decided that Ireland
36
00:01:28,533 --> 00:01:31,000
had granted Apple unlawful aid.
37
00:01:31,700 --> 00:01:33,066
You might be wondering about the money.
38
00:01:34,033 --> 00:01:36,299
Is the EU going to send Apple a PayPal
39
00:01:36,299 --> 00:01:37,833
invoice or a written
40
00:01:38,299 --> 00:01:40,166
proclamation by a carrier pigeon?
41
00:01:41,233 --> 00:01:42,799
Will swallows, laden or
42
00:01:42,799 --> 00:01:44,333
unladen, be involved somehow?
43
00:01:45,400 --> 00:01:46,433
Fortunately, no.
44
00:01:47,099 --> 00:01:48,933
The back taxes have been sitting in an
45
00:01:48,933 --> 00:01:50,533
escrow account since 2018.
46
00:01:51,233 --> 00:01:52,900
And with this judgment, they can finally
47
00:01:52,900 --> 00:01:54,833
be released to the Irish state.
48
00:01:55,633 --> 00:01:58,299
Just like with GDPR, the EU once again
49
00:01:58,299 --> 00:01:59,166
shows us the way
50
00:01:59,166 --> 00:02:00,633
forward on corporate taxation.
51
00:02:01,400 --> 00:02:03,900
If only we could get Amazon or Walmart to
52
00:02:03,900 --> 00:02:07,200
pay 15% of their net income with a lame
53
00:02:07,200 --> 00:02:08,366
duck president, like
54
00:02:08,366 --> 00:02:10,233
maybe he could do it.
55
00:02:10,566 --> 00:02:12,533
Well, or more likely Biden is too busy
56
00:02:12,533 --> 00:02:14,466
racing his Corvette down the Delmarva
57
00:02:14,466 --> 00:02:16,933
Peninsula. Vroom, vroom, bitches.
58
00:02:18,400 --> 00:02:19,666
Seems like a Camaro guy.
59
00:02:21,633 --> 00:02:21,833
Maybe.
60
00:02:23,233 --> 00:02:26,133
Quantum update: error-corrected
61
00:02:26,133 --> 00:02:27,266
qubit count alert.
62
00:02:29,199 --> 00:02:31,000
As we've talked about a number of times
63
00:02:31,000 --> 00:02:33,933
on the show, creating qubits in quantum
64
00:02:33,933 --> 00:02:35,900
computers is getting pretty routine.
65
00:02:36,633 --> 00:02:40,433
I mean, relatively speaking. Creating
66
00:02:40,433 --> 00:02:42,266
systems that can
67
00:02:42,266 --> 00:02:44,199
withstand errors, however,
68
00:02:45,366 --> 00:02:47,099
continues to be devilishly hard.
69
00:02:49,433 --> 00:02:51,099
Just so we're all on the same page, we
70
00:02:51,099 --> 00:02:54,933
are way over 1,000 qubits in a number of
71
00:02:54,933 --> 00:02:55,099
running systems.
72
00:02:55,566 --> 00:03:00,266
So where are we with
73
00:03:00,266 --> 00:03:01,833
error-corrected qubits?
74
00:03:02,300 --> 00:03:02,900
You might ask.
75
00:03:04,199 --> 00:03:08,300
Well, Microsoft of all people announced
76
00:03:08,300 --> 00:03:10,000
an answer with what
77
00:03:10,000 --> 00:03:11,699
they're calling the largest
78
00:03:11,699 --> 00:03:14,699
current number of error-corrected qubits.
79
00:03:15,900 --> 00:03:18,699
And that number is 12.
80
00:03:21,800 --> 00:03:22,233
The approach is interesting.
81
00:03:22,266 --> 00:03:25,766
Microsoft has partnered with a quantum
82
00:03:25,766 --> 00:03:27,866
computing organization called Atom
83
00:03:27,866 --> 00:03:28,366
Computing.
84
00:03:29,433 --> 00:03:30,966
The approach they're taking is to spread
85
00:03:30,966 --> 00:03:33,933
the value of each qubit across several
86
00:03:33,933 --> 00:03:37,099
qubits, thus making any errors or issues
87
00:03:37,099 --> 00:03:39,099
that come up, quote, less catastrophic.
88
00:03:40,533 --> 00:03:41,400
Hilarious language.
89
00:03:42,300 --> 00:03:42,633
Love it.
90
00:03:42,900 --> 00:03:43,800
It looks like they're going with
91
00:03:43,800 --> 00:03:44,833
something around a four
92
00:03:44,833 --> 00:03:47,433
to one ratio, creating 12
93
00:03:47,433 --> 00:03:48,966
logical error-corrected
94
00:03:48,966 --> 00:03:51,000
qubits backed by 56 physical ones.
95
00:03:54,033 --> 00:03:55,833
And the approach does seem to be working
96
00:03:55,833 --> 00:03:58,099
at least for certain algorithms.
97
00:03:59,300 --> 00:04:01,500
The test improved the error rate from
98
00:04:01,500 --> 00:04:07,400
2.4% down to 0.11%, which is substantial.
99
00:04:08,566 --> 00:04:08,766
Yeah.
100
00:04:09,766 --> 00:04:11,366
Now it's important to note that error-
101
00:04:11,366 --> 00:04:13,166
corrected systems are
102
00:04:13,166 --> 00:04:15,199
helpful for a number
103
00:04:15,199 --> 00:04:18,933
of reasons, one of which is sometimes in
104
00:04:18,933 --> 00:04:21,433
quantum, there can be errors that can't
105
00:04:21,433 --> 00:04:25,633
be detected, which is different than
106
00:04:25,633 --> 00:04:26,933
errors that can be detected.
107
00:04:27,300 --> 00:04:29,100
And I will leave the difference and
108
00:04:29,100 --> 00:04:31,300
challenge for both of them as an exercise
109
00:04:31,300 --> 00:04:31,966
to the reader.
110
00:04:33,766 --> 00:04:35,466
Long story short, though, spreading out
111
00:04:35,466 --> 00:04:38,199
the work and creating logical qubits like
112
00:04:38,199 --> 00:04:39,833
Microsoft and Atom are doing in this
113
00:04:39,833 --> 00:04:42,600
means that even these failures, the ones
114
00:04:42,600 --> 00:04:43,699
that are not detected
115
00:04:43,699 --> 00:04:45,833
can at least be mitigated.
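For anyone who wants to poke at the "spread it out" idea: here is a minimal sketch using a classical three-bit repetition code with a majority vote. This is only an analogy, not the scheme Microsoft and its partner actually used, and real quantum codes work differently. The function names, the three-copy layout, and the use of 2.4% as a per-bit flip probability are all assumptions made for illustration.

```python
import random


def encode(bit, copies=3):
    """Spread one logical bit across several physical bits (assumed 3 copies)."""
    return [bit] * copies


def add_noise(bits, p, rng):
    """Flip each physical bit independently with probability p."""
    return [b ^ 1 if rng.random() < p else b for b in bits]


def decode(bits):
    """Majority vote: the logical value is whatever most copies say."""
    return 1 if sum(bits) > len(bits) / 2 else 0


def logical_error_rate(p, trials, seed=0):
    """Estimate how often the decoded bit is wrong after noise."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        noisy = add_noise(encode(0), p, rng)
        if decode(noisy) != 0:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    physical = 0.024  # illustrative per-bit flip probability (assumption)
    print("physical error rate:", physical)
    print("estimated logical error rate:", logical_error_rate(physical, 100_000))
```

A single flipped copy gets outvoted, so the logical bit only fails when two or more copies flip at once, which is much rarer than one flip. The toy numbers will not match the figures quoted in the episode; the point is only the direction of the effect.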
116
00:04:46,833 --> 00:04:48,533
Neat.
117
00:04:51,433 --> 00:04:53,833
OpenAI announces Strawberry models.
118
00:04:54,833 --> 00:04:58,766
Quick, open up ChatGPT or Copilot and ask
119
00:04:58,766 --> 00:05:00,366
it how many R's are
120
00:05:00,366 --> 00:05:01,800
in the word strawberry.
121
00:05:02,633 --> 00:05:02,933
Go ahead.
122
00:05:03,399 --> 00:05:03,699
I'll wait.
123
00:05:06,533 --> 00:05:08,199
Listen, buddy, I've got two liters of
124
00:05:08,199 --> 00:05:09,466
Jolt Cola, a Sudoku
125
00:05:09,466 --> 00:05:10,733
book and adult diapers.
126
00:05:11,500 --> 00:05:12,333
I can wait it out.
127
00:05:14,800 --> 00:05:14,899
You done?
128
00:05:16,333 --> 00:05:18,800
I could let us proceed before my heart
129
00:05:18,800 --> 00:05:20,533
leaps out of my body and strangles my
130
00:05:20,533 --> 00:05:22,833
teeth. Chances are your
131
00:05:22,833 --> 00:05:27,300
friend... I couldn't get through it.
132
00:05:27,300 --> 00:05:29,433
Chances are your friendly LLM told you
133
00:05:29,433 --> 00:05:31,633
that there are two R's in strawberry,
134
00:05:31,933 --> 00:05:33,933
which unless you are terrible at
135
00:05:33,933 --> 00:05:35,666
spelling, you know is wrong.
136
00:05:36,733 --> 00:05:37,066
So what?
137
00:05:37,833 --> 00:05:40,100
LLMs get stuff wrong all the time.
138
00:05:40,800 --> 00:05:41,399
Even better.
139
00:05:41,433 --> 00:05:43,600
If you tell it the correct answer, it
140
00:05:43,600 --> 00:05:46,600
will cheerfully suggest that you are
141
00:05:46,600 --> 00:05:48,266
the one counting stuff wrong.
142
00:05:49,133 --> 00:05:50,066
What is happening?
143
00:05:51,399 --> 00:05:53,300
It's like, I'm afraid you're mistaken.
144
00:05:53,766 --> 00:05:55,000
There are only two R's.
145
00:05:56,600 --> 00:05:59,333
What is happening is that LLMs break
146
00:05:59,333 --> 00:06:02,199
things into tokens to process information
147
00:06:02,633 --> 00:06:04,833
and the word strawberry is broken into
148
00:06:04,833 --> 00:06:06,333
two separate tokens.
149
00:06:07,333 --> 00:06:10,033
The best guess is that ChatGPT sees an R
150
00:06:10,033 --> 00:06:13,300
in each token and counts two R's.
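To make that guess concrete, here is a minimal sketch comparing a real letter count with a "one R per token" count. The split into "straw" and "berry" is an assumption for illustration; real tokenizers may cut the word differently, and nobody outside the model can say for sure that this is how it arrives at two.

```python
# Hypothetical token split; an actual tokenizer may divide the word differently.
tokens = ["straw", "berry"]


def count_letter(text, letter):
    """Count every occurrence of a letter, character by character."""
    return text.lower().count(letter.lower())


def count_tokens_containing(token_list, letter):
    """Count how many tokens contain the letter at least once."""
    return sum(1 for tok in token_list if letter.lower() in tok.lower())


word = "".join(tokens)
true_count = count_letter(word, "r")
token_guess = count_tokens_containing(tokens, "r")

print(f"letters in {word!r}: {true_count}")
print(f"tokens containing 'r': {token_guess}")
```

Counting characters gives three, while counting tokens that contain an R gives two, which lines up with the wrong answer described here.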
151
00:06:13,600 --> 00:06:15,866
This thorny problem is so well known that
152
00:06:15,866 --> 00:06:18,500
OpenAI codenamed their new AI
153
00:06:18,500 --> 00:06:21,233
model line Strawberry,
154
00:06:21,899 --> 00:06:25,600
also known as o1, for reasons.
155
00:06:26,633 --> 00:06:29,033
The new model is allegedly capable of
156
00:06:29,033 --> 00:06:31,300
reasoning through an answer, much like
157
00:06:31,300 --> 00:06:33,533
a person does, instead of just trying to
158
00:06:33,533 --> 00:06:34,966
vomit the whole thing out at once.
159
00:06:35,933 --> 00:06:38,633
o1 is the new model developed in parallel
160
00:06:38,633 --> 00:06:41,100
with the forthcoming GPT-5,
161
00:06:41,633 --> 00:06:43,166
and it makes use of reinforcement
162
00:06:43,166 --> 00:06:45,533
learning, aka telling the model
163
00:06:45,533 --> 00:06:46,899
when it gets things wrong.
164
00:06:47,733 --> 00:06:49,833
The reinforcement learning and multi-step
165
00:06:49,833 --> 00:06:52,433
reasoning should allow o1 to arrive at
166
00:06:52,433 --> 00:06:55,399
the correct answer of three for R's in
167
00:06:55,399 --> 00:06:57,699
strawberry, and also help it solve
168
00:06:57,766 --> 00:06:59,966
math word problems that have so far
169
00:06:59,966 --> 00:07:01,233
stumped previous generations.
170
00:07:02,500 --> 00:07:04,100
I got to try the o1 preview
171
00:07:04,100 --> 00:07:06,866
today and it apologized to me.
172
00:07:07,133 --> 00:07:09,399
Quote, "You are absolutely correct and I
173
00:07:09,399 --> 00:07:11,533
apologize for the oversight earlier.
174
00:07:12,166 --> 00:07:15,133
The word strawberry contains three R's."
175
00:07:15,133 --> 00:07:15,433
End quote.
176
00:07:16,466 --> 00:07:17,899
Absolutely amazing stuff.
177
00:07:21,466 --> 00:07:23,033
AI dude bro lies
178
00:07:23,033 --> 00:07:24,399
about model capabilities.
179
00:07:25,699 --> 00:07:26,199
Gets caught.
180
00:07:27,466 --> 00:07:28,100
Hilarity ensues.
181
00:07:30,500 --> 00:07:33,033
These past two weeks have been
182
00:07:33,033 --> 00:07:36,199
pretty wild for OthersideAI.
183
00:07:37,833 --> 00:07:40,266
The company became AI world famous for
184
00:07:40,266 --> 00:07:42,199
its product, which is called HyperWrite,
185
00:07:42,800 --> 00:07:44,533
which is apparently a writing assistant.
186
00:07:45,166 --> 00:07:46,199
Is it hyper wrong?
187
00:07:47,000 --> 00:07:51,333
But um, but of course, success in the
188
00:07:51,333 --> 00:07:52,899
HyperWrite realm wasn't
189
00:07:52,899 --> 00:07:54,699
enough for OthersideAI.
190
00:07:55,500 --> 00:07:56,300
And thus they started hyping up their own AI.
191
00:07:56,333 --> 00:08:02,100
Going under the brand name Reflection.
192
00:08:03,966 --> 00:08:06,733
Allegedly based on Llama 3.1.
193
00:08:07,000 --> 00:08:09,866
This past week, CEO Matt Shumer
194
00:08:09,866 --> 00:08:11,500
breathlessly announced
195
00:08:11,500 --> 00:08:14,100
Reflection 70B, which
196
00:08:14,100 --> 00:08:16,133
he claimed insane performance on.
197
00:08:16,133 --> 00:08:17,600
He showed tables and everything.
198
00:08:18,100 --> 00:08:21,899
He even published the model and uploaded
199
00:08:21,899 --> 00:08:24,166
it so other people could download it and
200
00:08:24,366 --> 00:08:24,866
test it.
201
00:08:25,233 --> 00:08:25,733
This is the first time that we've ever seen a model like this.
202
00:08:26,033 --> 00:08:29,899
This turned out to be a mistake as nobody
203
00:08:29,899 --> 00:08:31,233
could come close to the claimed
204
00:08:31,300 --> 00:08:32,233
performance numbers.
205
00:08:33,566 --> 00:08:36,333
In order to counter this, Matt went ahead
206
00:08:36,333 --> 00:08:37,033
and claimed that the
207
00:08:37,033 --> 00:08:38,500
upload was corrupted.
208
00:08:39,966 --> 00:08:40,299
Sure, Matt.
209
00:08:41,600 --> 00:08:44,600
OthersideAI opened access to a private
210
00:08:44,600 --> 00:08:48,133
API so that people could test Reflection
211
00:08:48,299 --> 00:08:50,233
70B at home base.
212
00:08:52,133 --> 00:08:54,033
Seems like a not bad idea.
213
00:08:54,966 --> 00:08:59,000
Except that what the testers found was
214
00:08:59,000 --> 00:09:01,600
while there was better performance, there
215
00:09:01,600 --> 00:09:03,700
was plausible evidence that this private
216
00:09:03,700 --> 00:09:05,799
API was simply scrubbing answers pulled
217
00:09:05,799 --> 00:09:08,133
directly from Anthropic's Claude model.
218
00:09:09,333 --> 00:09:12,033
Oh, so that's not a good look.
219
00:09:14,399 --> 00:09:16,833
After this, Matt went dark, basically
220
00:09:16,833 --> 00:09:19,066
hanging all his supporters out to dry.
221
00:09:19,766 --> 00:09:22,700
Eventually he went on Twitter
222
00:09:22,700 --> 00:09:26,100
apologizing, sort of saying that he
223
00:09:26,100 --> 00:09:27,600
quote, got ahead of himself.
224
00:09:29,566 --> 00:09:31,166
This, as I'm sure you
225
00:09:31,166 --> 00:09:34,333
know, is also not a good look.
226
00:09:35,533 --> 00:09:38,500
It turns out fake announcements of wild
227
00:09:38,500 --> 00:09:41,100
success using repeatable tests of known
228
00:09:41,100 --> 00:09:42,833
benchmarks against a product that other
229
00:09:42,833 --> 00:09:44,733
people can download is a bad idea.
230
00:09:45,200 --> 00:09:49,333
But Chris, he was in founder mode.
231
00:09:51,133 --> 00:09:52,433
Move fast and break stuff.
232
00:09:53,200 --> 00:09:53,933
We're done. Go away now. Bye.