-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path2_cap.html
386 lines (342 loc) · 16.7 KB
/
2_cap.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
<!--The design and styling of this site was taken from http://redbook.io-->
<!DOCTYPE html>
<html>
<head>
<title>Consistency in Distributed Systems</title>
<link href='css/style.css' rel='stylesheet'>
<link href='css/svg.css' rel='stylesheet'>
<meta name=viewport content="width=device-width, initial-scale=1">
<link rel="shortcut icon" href="favicon.ico">
</head>
<body>
<div id="header">
<div id="grouptitle">
<a href="index.html">Consistency in Distributed Systems</a>
</div>
</div>
<div id="container">
<article>
<h1 id="baseball" class="lecturetitle">
<a href="https://scholar.google.com/scholar?cluster=17914402714677808535">
Brewer's Conjecture and the Feasibility of Consistent, Available,
Partition-Tolerant Web Services
</a>
</h1>
<p>
In this lecture, we'll define linearizability: a formal consistency model
that's often used synonymously with strong consistency. We'll then
present and prove the infamous CAP theorem which tells us that
partition-tolerant distributed systems have to choose between strong
consistency and availability. In other words, we can't have our strongly
consistent cake and eat it with high availability too.
</p>
<h2 id="linearizability">Linearizability</h2>
<p>
In the last lecture, we established (rather informally) that a
distributed storage system is strongly consistent if it behaves
indistinguishably from a storage system running on a single computer. In
this section, we'll fortify our understanding of strong consistency by
taking a look at <strong>linearizability</strong>: a formalism of strong
consistency initially proposed by <a
href="https://scholar.google.com/scholar?cluster=7860241540823320465">
Maurice Herlihy and Jeannette Wing in 1990</a>.
</p>
<p>
In the last lecture, the only storage system we looked at was a key-value
store. In this lecture, we'll consider an even simpler storage system: a
<strong>register</strong>. A register stores a single value. Clients can
issue
<ul>
<li>
a <code>w(x)</code> request to write <code>x</code> to the value or
</li>
<li>
a <code>r()</code> request to read the value.
</li>
</ul>
</p>
<p>
An example interaction between a client (<code>a</code>) and a register
(<code>s</code>) running on a single computer is given below. The client
first writes the value 7 into the register and then reads it out.
</p>
<div class="svgbox">
<svg viewBox="0 0 400 102" id="simple_reg"></svg>
</div>
<p>
From now on, let's assume that the register running on a single computer
can instantaneously process a request and send a response the exact
moment the request arrives. With that assumption, let's take a look at
how two clients (<code>a</code> and <code>b</code>) might interact with
such a register when the network starts to delay and quicken the delivery
of messages.
</p>
<div class="svgbox">
<svg viewBox="0 0 400 152" id="async_network"></svg>
</div>
<p>
The execution begins as client <code>a</code> sends a very slow
<code>w(9)</code> request to the register. Before the <code>w(9)</code>
request has a chance to make it to the register, client <code>b</code>
sends a very speedy <code>w(4)</code> request. The <code>w(4)</code>
request arrives at the register and the response arrives back at client
<code>b</code> before client <code>a</code>'s request even has a chance
to reach the register. Finally, client <code>a</code>'s write request
arrives at the register, writes a value of <code>9</code>, and returns
back to client </code>a</code>.
</p>
<p>
To make things clearer, we've drawn the contents of the register at the
moment each request arrives at the register and takes effect. When the
<code>w(4)</code> request takes effect, the register has value
<code>4</code>; when the <code>w(9)</code> request takes effect, the
register has value <code>9</code>.
</p>
<p>
Note that from our god's eye view, we know exactly when each request
arrives at the register. Unfortunately, clients don't enjoy such a
luxury. A client only knows when it sends its request and when it
receives the corresponding response. It doesn't know exactly when the
request arrives at the register.
</p>
<p>
Let's take a closer look at an execution from the perspective of the
clients and note something rather profound. We may not know when requests
arrive at the register, but <em>we can guess</em>! That is, for each
request, we can guess a time—between when the request is sent and
when a response is received—that the request <em>could have</em>
arrived at the register.
</p>
<p>
Play around with the following interactive visualization of an execution
in which client <code>a</code> sends a <code>w(3)</code> request and then
client <code>b</code> sends a <code>r()</code> request. You can click
(and drag) on the thick colored bars to place your guess as to when each
request arrived at the server.
</p>
<div class="svgbox">
<svg viewBox="0 0 400 110" id="first_guess"></svg>
</div>
<p>
In the previous execution, all of your guesses <em>could have been
correct</em>. However, this is not always the case. Consider the
following execution in which client <code>a</code> and client
<code>b</code> concurrently issue write requests after which client
<code>a</code> issues a read request. Note that some of your guesses
<em>cannot possibly be correct</em>.
</p>
<div class="svgbox">
<svg viewBox="0 0 400 110" id="first_wrong_guess"></svg>
</div>
<p>
More specifically, any guess in which client <code>b</code>'s
<code>w(8)</code> request follows client <code>a</code>'s
<code>w(2)</code> request cannot possibly be correct because after both
the writes, client <code>a</code> reads a value of <code>2</code>. This
means that even though we don't know exactly when the requests arrive at
the register, we do know that the <code>w(2)</code> request must arrive
after the <code>w(8)</code> request.
</p>
<p>
For read requests, note that we've shaded the value of the register red
whenever the register's value is inconsistent with the value returned by
the read request. For example, client <code>a</code>'s read request above
returns <code>2</code>, so the final state of the register is shaded red
whenever its value is anything other than <code>2</code> (e.g.
<code>8</code>).
</p>
<p>
If any of the registers in our execution are shaded red, then our guess
can not possibly be correct. This is bad. If all of the registers are
green, then our guess could be correct. This is good. Try to find a
potentially correct guess for the following executions.
</p>
<div class="svgbox">
<svg viewBox="0 0 400 160" id="challenge_one"></svg>
</div>
<div class="svgbox">
<svg viewBox="0 0 400 210" id="challenge_two"></svg>
</div>
<p>
At this point, we've seen an execution in which all guesses were
potentially correct. We've also seen a few executions in which some
guesses were potentially correct and some were definitely incorrect. Now
try to find a potentially correct guess for the following execution.
</p>
<div class="svgbox">
<svg viewBox="0 0 400 160" id="no_good_guess"></svg>
</div>
<p>
Alas, all guesses are incorrect; there is no potentially correct guess!
That is, there does not exist a way for us to guess the moment that each
request arrives at the register such that the guess is consistent with
the responses of all the read requests. Thus, this execution could not
have taken place with a register running on a single computer. Instead,
we have deduced that the register must be replicated on multiple
computers. In other words, we were able to distinguish the behavior of
the register from the behavior of a register running on a single computer!
</p>
<p>
Surprise, this is linearizability!
</p>
<p>
For a given execution, if there exists a potentially correct guess, then
we say the execution is linearizable. If all guesses are definitely
incorrect, then we say the execution is not linearizable. Similarly, a
<strong>linearizable register</strong> is one that only allows
linearizable executions.
</p>
<p>
If a (potentially replicated) register is linearizable, then every
interaction with it is indistinguishable from an interaction with a
register running on a single computer in which the requests arrive at the
single-computer register according to one of our potentially correct
guesses. If a replicated register is not linearizable, then there are
some interactions with it that could not possibly occur with a
single-computer register.
</p>
<p>
Thus far, we've only discussed linearizability in the context of a
register, but linearizability can be extended quite naturally to deal
with many other types of objects (e.g. queues, sets). This generalization
of linearizability, as well as a full formalization, can be found in
Herlihy and Wing's 1990 paper: <a
href="https://scholar.google.com/scholar?cluster=7860241540823320465">
<em>Linearizability: A Correctness Condition for Concurrent
Objects</em></a>.
</p>
<h2 id="consistency_availability_and_partition-tolerance">
Consistency, Availability, and Partition-Tolerance
</h2>
<p>
The CAP theorem states that a partition-tolerant replicated register
cannot be both consistent and available. In order to understand (and
prove) the CAP theorem, we first have to define the words
<em>consistent</em>, <em>available</em>, and <em>partition-tolerant</em>.
</p>
<p>
<strong>Consistent.</strong> The CAP theorem uses the word
<em>consistent</em> as a synonym for <em>linearizable</em>. Thanks to the
last section, we already understand what it means for a replicated
register to be linearizable.
</p>
<p>
<strong>Available.</strong> A replicated register is available if every
request sent to a non-failed replica eventually produces a response. In
other words, whenever a non-failed replica receives a request (e.g.
<code>w(9)</code>) from a client, it eventually has to send a response
(e.g. <code>ok</code>) back to the client. The replica is allowed to take
as long as it wants before responding to the client, but it is
<em>not</em> allowed to ignore the request indefinitely.
</p>
<p>
<strong>Partition-tolerant.</strong> The CAP theorem assumes that, at any
time, replicas can be temporarily partitioned from one another and that
any messages sent between the two partitions are lost. Effectively, this
means that the network can drop arbitrary messages sent from any replica
to any other replica at any time. If a replicated register behaves
correctly despite the possibility of arbitrary message loss, we say it is
partition-tolerant.
</p>
<h2 id="proving_the_cap_theorem">Proving The CAP Theorem</h2>
<p>
To reiterate, the CAP theorem states that a partition-tolerant replicated
register cannot be both consistent and available. Now that we've
established the definitions of <em>consistent</em>, <em>available</em>,
and <em>partition-tolerant</em>, we're ready to prove the theorem! Note
that even though the CAP theorem is hugely far-reaching and influential,
we think you'll find the proof remarkably simple.
</p>
<p>
Assume for contradiction that there exists a consistent, available, and
fault-tolerant replicated register. Also assume the replicas are
partitioned in two. For simplicity (and without loss of generality),
consider a replicated register with exactly two replicas (<code>s1</code>
and <code>s2</code>). Assume the register has an initial value of 0 and
consider the following execution.
</p>
<div class="svgbox">
<svg viewBox="0 0 400 242" id="cap"></svg>
</div>
<p>
Our client (<code>a</code>) begins by issuing a <code>w(9)</code> request
to <code>s1</code>. When <code>s1</code> receives the request, it
repeatedly attempts to send a message to <code>s2</code>, but alas the
partition drops all the messages. By our assumption of availability,
<code>s1</code> eventually writes <code>9</code> to the register and
returns a response to the client.
</p>
<p>
Then, the client sends a <code>r()</code> request to <code>s2</code>: the
other replica. As before, <code>s2</code> tries repeatedly to send a
message to <code>s1</code>, but the partition prevents it. Again by the
assumption of availability, <code>s2</code> must eventually return to the
client. Since <code>s2</code> has not been able to communicate with
<code>s1</code> at all, the value of its register is still the initial
value of <code>0</code>.
</p>
</p>
The client wrote <code>9</code> to the register but then read back
<code>0</code>. This execution is not linearizable, and thus the register
is not linearizable. This violates our assumption of consistency and
proves the theorem!
<p>
<h2 id="implications_of_the_cap_theorem">
Implications of the CAP Theorem
</h2>
<p>
The CAP theorem proves that there is a fundamental trade-off between
strong consistency and availability. A distributed system must choose
between being strongly consistent and being highly available.
</p>
<p>
Shortly after Eric Brewer proposed the CAP theorem in his
<a href="https://scholar.google.com/scholar?cluster=14938038383922609098">
2000 PODC keynote</a>, there was a Cambrian explosion of systems that
chose high availability over strong consistency:
<a href="https://scholar.google.com/scholar?cluster=5432858092023181552">
Amazon's Dynamo</a>,
<a href="https://scholar.google.com/scholar?cluster=535416719812038974">
Google's BigTable</a>, and
<a href="https://scholar.google.com/scholar?cluster=9829178954647343079">
Facebook's Cassandra</a> to name a few. More recently, there has a
resurgence of systems that choose strong consistency over availability:
<a href="https://scholar.google.com/scholar?cluster=3523173873845838643">
Google's Spanner</a>,
<a href="https://scholar.google.com/scholar?cluster=15754405687028247503">
Stanford's RAMCloud</a>, and
<a href="https://scholar.google.com/scholar?cluster=8838739194584316753">
Cornell's Hyperdex</a> for example.
</p>
<p>
Furthermore,
<a href="https://scholar.google.com/scholar?cluster=4480351479446148866">
corollaries of the CAP theorem</a> show that there are also fundamental
trade-offs between consistency and latency. Even when the network is not
partitioned, strongly consistent storage systems run with more latency
that their weakly consistent counterparts.
</p>
<p>
The trade-offs detailed by the CAP theorem and its corollaries are the
reason that there are so many varying consistency models (including the
six we saw last lecture). Each consistency model represents one point in
a complex spectrum trading off consistency, availability, latency,
complexity, and so on. With this smorgasbord of consistency models,
systems are free to choose the one that suits their needs best.
</p>
</article>
</div>
<script src="js/snap.svg.js"></script>
<script src="js/distributed_systems.js"></script>
<script src="js/linearizable.js"></script>
<script src="js/2_cap.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-90310997-1', 'auto');
ga('send', 'pageview');
</script>
</body>
</html>