2_cap.html

<!--The design and styling of this site was taken from http://redbook.io-->
<!DOCTYPE html>
<html>
<head>
  <title>Consistency in Distributed Systems</title>
  <link href='css/style.css' rel='stylesheet'>
  <link href='css/svg.css' rel='stylesheet'>
  <meta name=viewport content="width=device-width, initial-scale=1">
  <link rel="shortcut icon" href="favicon.ico">
</head>

<body>
  <div id="header">
    <div id="grouptitle">
      <a href="index.html">Consistency in Distributed Systems</a>
    </div>
  </div>

  <div id="container">
    <article>
      <h1 id="baseball" class="lecturetitle">
        <a href="https://scholar.google.com/scholar?cluster=17914402714677808535">
          Brewer's Conjecture and the Feasibility of Consistent, Available,
          Partition-Tolerant Web Services
        </a>
      </h1>

      <p>
      In this lecture, we'll define linearizability: a formal consistency model
      that's often used synonymously with strong consistency. We'll then
      present and prove the infamous CAP theorem which tells us that
      partition-tolerant distributed systems have to choose between strong
      consistency and availability. In other words, we can't have our strongly
      consistent cake and eat it with high availability too.
      </p>

      <h2 id="linearizability">Linearizability</h2>
      <p>
      In the last lecture, we established (rather informally) that a
      distributed storage system is strongly consistent if it behaves
      indistinguishably from a storage system running on a single computer. In
      this section, we'll fortify our understanding of strong consistency by
      taking a look at <strong>linearizability</strong>: a formalism of strong
      consistency initially proposed by <a
      href="https://scholar.google.com/scholar?cluster=7860241540823320465">
      Maurice Herlihy and Jeannette Wing in 1990</a>.
      </p>

      <p>
      In the last lecture, the only storage system we looked at was a key-value
      store. In this lecture, we'll consider an even simpler storage system: a
      <strong>register</strong>. A register stores a single value.  Clients can
      issue
      <ul>
        <li>
          a <code>w(x)</code> request to write <code>x</code> to the value or
        </li>
        <li>
          a <code>r()</code> request to read the value.
        </li>
      </ul>
      </p>

      <p>
      An example interaction between a client (<code>a</code>) and a register
      (<code>s</code>) running on a single computer is given below. The client
      first writes the value 7 into the register and then reads it out.
      </p>

      <div class="svgbox">
        <svg viewBox="0 0 400 102" id="simple_reg"></svg>
      </div>

      <p>
      From now on, let's assume that the register running on a single computer
      can instantaneously process a request and send a response the exact
      moment the request arrives. With that assumption, let's take a look at
      how two clients (<code>a</code> and <code>b</code>) might interact with
      such a register when the network starts to delay and quicken the delivery
      of messages.
      </p>

      <div class="svgbox">
        <svg viewBox="0 0 400 152" id="async_network"></svg>
      </div>

      <p>
      The execution begins as client <code>a</code> sends a very slow
      <code>w(9)</code> request to the register. Before the <code>w(9)</code>
      request has a chance to make it to the register, client <code>b</code>
      sends a very speedy <code>w(4)</code> request. The <code>w(4)</code>
      request arrives at the register and the response arrives back at client
      <code>b</code> before client <code>a</code>'s request even has a chance
      to reach the register. Finally, client <code>a</code>'s write request
      arrives at the register, writes a value of <code>9</code>, and returns
      back to client </code>a</code>.
      </p>

      <p>
      To make things clearer, we've drawn the contents of the register at the
      moment each request arrives at the register and takes effect. When the
      <code>w(4)</code> request takes effect, the register has value
      <code>4</code>; when the <code>w(9)</code> request takes effect, the
      register has value <code>9</code>.
      </p>

      <p>
      Note that from our god's eye view, we know exactly when each request
      arrives at the register. Unfortunately, clients don't enjoy such a
      luxury. A client only knows when it sends its request and when it
      receives the corresponding response. It doesn't know exactly when the
      request arrives at the register.
      </p>

      <p>
      Let's take a closer look at an execution from the perspective of the
      clients and note something rather profound. We may not know when requests
      arrive at the register, but <em>we can guess</em>! That is, for each
      request, we can guess a time&mdash;between when the request is sent and
      when a response is received&mdash;that the request <em>could have</em>
      arrived at the register.
      </p>

      <p>
      Play around with the following interactive visualization of an execution
      in which client <code>a</code> sends a <code>w(3)</code> request and then
      client <code>b</code> sends a <code>r()</code> request. You can click
      (and drag) on the thick colored bars to place your guess as to when each
      request arrived at the server.
      </p>

      <div class="svgbox">
        <svg viewBox="0 0 400 110" id="first_guess"></svg>
      </div>

      <p>
      In the previous execution, all of your guesses <em>could have been
      correct</em>. However, this is not always the case. Consider the
      following execution in which client <code>a</code> and client
      <code>b</code> concurrently issue write requests after which client
      <code>a</code> issues a read request. Note that some of your guesses
      <em>cannot possibly be correct</em>.
      </p>

      <div class="svgbox">
        <svg viewBox="0 0 400 110" id="first_wrong_guess"></svg>
      </div>

      <p>
      More specifically, any guess in which client <code>b</code>'s
      <code>w(8)</code> request follows client <code>a</code>'s
      <code>w(2)</code> request cannot possibly be correct because after both
      the writes, client <code>a</code> reads a value of <code>2</code>. This
      means that even though we don't know exactly when the requests arrive at
      the register, we do know that the <code>w(2)</code> request must arrive
      after the <code>w(8)</code> request.
      </p>

      <p>
      For read requests, note that we've shaded the value of the register red
      whenever the register's value is inconsistent with the value returned by
      the read request. For example, client <code>a</code>'s read request above
      returns <code>2</code>, so the final state of the register is shaded red
      whenever its value is anything other than <code>2</code> (e.g.
      <code>8</code>).
      </p>

      <p>
      If any of the registers in our execution are shaded red, then our guess
      can not possibly be correct. This is bad. If all of the registers are
      green, then our guess could be correct. This is good. Try to find a
      potentially correct guess for the following executions.
      </p>

      <div class="svgbox">
        <svg viewBox="0 0 400 160" id="challenge_one"></svg>
      </div>

      <div class="svgbox">
        <svg viewBox="0 0 400 210" id="challenge_two"></svg>
      </div>

      <p>
      At this point, we've seen an execution in which all guesses were
      potentially correct. We've also seen a few executions in which some
      guesses were potentially correct and some were definitely incorrect. Now
      try to find a potentially correct guess for the following execution.
      </p>

      <div class="svgbox">
        <svg viewBox="0 0 400 160" id="no_good_guess"></svg>
      </div>

      <p>
      Alas, all guesses are incorrect; there is no potentially correct guess!
      That is, there does not exist a way for us to guess the moment that each
      request arrives at the register such that the guess is consistent with
      the responses of all the read requests. Thus, this execution could not
      have taken place with a register running on a single computer. Instead,
      we have deduced that the register must be replicated on multiple
      computers. In other words, we were able to distinguish the behavior of
      the register from the behavior of a register running on a single computer!
      </p>

      <p>
      Surprise, this is linearizability!
      </p>

      <p>
      For a given execution, if there exists a potentially correct guess, then
      we say the execution is linearizable. If all guesses are definitely
      incorrect, then we say the execution is not linearizable. Similarly, a
      <strong>linearizable register</strong> is one that only allows
      linearizable executions.
      </p>

      <p>
      If a (potentially replicated) register is linearizable, then every
      interaction with it is indistinguishable from an interaction with a
      register running on a single computer in which the requests arrive at the
      single-computer register according to one of our potentially correct
      guesses. If a replicated register is not linearizable, then there are
      some interactions with it that could not possibly occur with a
      single-computer register.
      </p>

      <p>
      Thus far, we've only discussed linearizability in the context of a
      register, but linearizability can be extended quite naturally to deal
      with many other types of objects (e.g. queues, sets). This generalization
      of linearizability, as well as a full formalization, can be found in
      Herlihy and Wing's 1990 paper: <a
      href="https://scholar.google.com/scholar?cluster=7860241540823320465">
      <em>Linearizability: A Correctness Condition for Concurrent
      Objects</em></a>.
      </p>

      <h2 id="consistency_availability_and_partition-tolerance">
        Consistency, Availability, and Partition-Tolerance
      </h2>
      <p>
      The CAP theorem states that a partition-tolerant replicated register
      cannot be both consistent and available. In order to understand (and
      prove) the CAP theorem, we first have to define the words
      <em>consistent</em>, <em>available</em>, and <em>partition-tolerant</em>.
      </p>

      <p>
      <strong>Consistent.</strong> The CAP theorem uses the word
      <em>consistent</em> as a synonym for <em>linearizable</em>. Thanks to the
      last section, we already understand what it means for a replicated
      register to be linearizable.
      </p>

      <p>
      <strong>Available.</strong> A replicated register is available if every
      request sent to a non-failed replica eventually produces a response. In
      other words, whenever a non-failed replica receives a request (e.g.
      <code>w(9)</code>) from a client, it eventually has to send a response
      (e.g. <code>ok</code>) back to the client. The replica is allowed to take
      as long as it wants before responding to the client, but it is
      <em>not</em> allowed to ignore the request indefinitely.
      </p>

      <p>
      <strong>Partition-tolerant.</strong> The CAP theorem assumes that, at any
      time, replicas can be temporarily partitioned from one another and that
      any messages sent between the two partitions are lost. Effectively, this
      means that the network can drop arbitrary messages sent from any replica
      to any other replica at any time. If a replicated register behaves
      correctly despite the possibility of arbitrary message loss, we say it is
      partition-tolerant.
      </p>

      <h2 id="proving_the_cap_theorem">Proving The CAP Theorem</h2>
      <p>
      To reiterate, the CAP theorem states that a partition-tolerant replicated
      register cannot be both consistent and available. Now that we've
      established the definitions of <em>consistent</em>, <em>available</em>,
      and <em>partition-tolerant</em>, we're ready to prove the theorem! Note
      that even though the CAP theorem is hugely far-reaching and influential,
      we think you'll find the proof remarkably simple.
      </p>

      <p>
      Assume for contradiction that there exists a consistent, available, and
      fault-tolerant replicated register. Also assume the replicas are
      partitioned in two.  For simplicity (and without loss of generality),
      consider a replicated register with exactly two replicas (<code>s1</code>
      and <code>s2</code>).  Assume the register has an initial value of 0 and
      consider the following execution.
      </p>

      <div class="svgbox">
        <svg viewBox="0 0 400 242" id="cap"></svg>
      </div>

      <p>
      Our client (<code>a</code>) begins by issuing a <code>w(9)</code> request
      to <code>s1</code>. When <code>s1</code> receives the request, it
      repeatedly attempts to send a message to <code>s2</code>, but alas the
      partition drops all the messages. By our assumption of availability,
      <code>s1</code> eventually writes <code>9</code> to the register and
      returns a response to the client.
      </p>

      <p>
      Then, the client sends a <code>r()</code> request to <code>s2</code>: the
      other replica. As before, <code>s2</code> tries repeatedly to send a
      message to <code>s1</code>, but the partition prevents it. Again by the
      assumption of availability, <code>s2</code> must eventually return to the
      client. Since <code>s2</code> has not been able to communicate with
      <code>s1</code> at all, the value of its register is still the initial
      value of <code>0</code>.
      </p>

      </p>
      The client wrote <code>9</code> to the register but then read back
      <code>0</code>. This execution is not linearizable, and thus the register
      is not linearizable. This violates our assumption of consistency and
      proves the theorem!
      <p>

      <h2 id="implications_of_the_cap_theorem">
        Implications of the CAP Theorem
      </h2>
      <p>
      The CAP theorem proves that there is a fundamental trade-off between
      strong consistency and availability. A distributed system must choose
      between being strongly consistent and being highly available.
      </p>

      <p>
      Shortly after Eric Brewer proposed the CAP theorem in his
      <a href="https://scholar.google.com/scholar?cluster=14938038383922609098">
        2000 PODC keynote</a>, there was a Cambrian explosion of systems that
      chose high availability over strong consistency:
      <a href="https://scholar.google.com/scholar?cluster=5432858092023181552">
        Amazon's Dynamo</a>,
      <a href="https://scholar.google.com/scholar?cluster=535416719812038974">
        Google's BigTable</a>, and
      <a href="https://scholar.google.com/scholar?cluster=9829178954647343079">
        Facebook's Cassandra</a> to name a few. More recently, there has a
      resurgence of systems that choose strong consistency over availability:
      <a href="https://scholar.google.com/scholar?cluster=3523173873845838643">
        Google's Spanner</a>,
      <a href="https://scholar.google.com/scholar?cluster=15754405687028247503">
        Stanford's RAMCloud</a>, and
      <a href="https://scholar.google.com/scholar?cluster=8838739194584316753">
        Cornell's Hyperdex</a> for example.
      </p>

      <p>
      Furthermore,
      <a href="https://scholar.google.com/scholar?cluster=4480351479446148866">
        corollaries of the CAP theorem</a> show that there are also fundamental
      trade-offs between consistency and latency. Even when the network is not
      partitioned, strongly consistent storage systems run with more latency
      that their weakly consistent counterparts.
      </p>

      <p>
      The trade-offs detailed by the CAP theorem and its corollaries are the
      reason that there are so many varying consistency models (including the
      six we saw last lecture). Each consistency model represents one point in
      a complex spectrum trading off consistency, availability, latency,
      complexity, and so on. With this smorgasbord of consistency models,
      systems are free to choose the one that suits their needs best.
      </p>
    </article>
  </div>

  <script src="js/snap.svg.js"></script>
  <script src="js/distributed_systems.js"></script>
  <script src="js/linearizable.js"></script>
  <script src="js/2_cap.js"></script>
  <script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
    ga('create', 'UA-90310997-1', 'auto');
    ga('send', 'pageview');
  </script>
</body>
</html>