quot; All writes are instantly visible and the global order is maintained. However, determining the "most recent" write is only possible within a single processor machine and in a DS (without a global time) is ambiguous. ### Linear consistency (linearizability) **Linearizability** is a strong form of consistency, it ensures that all operations appear to occur instantaneously and exactly once at some point between their invocation and their response. It's a **real-time** guarantee. > "The system is **sequentially consistent** and also if $ts_{OP_1}(x) < ts_{OP_2}(y)$ then operation $OP_1(x)$ precedes $OP_2(y)$ in the operation sequence" If one operation appears to happen before another from any global perspective, then every process in the system must agree on this order. Linearizability is useful in scenarios where the application logic requires a certain ordering between operations to be enforced, and all writes become visible (as if they were executed) at some instant in time, maintaining a global order The difference between **strict consistency** and **linearizability** is subtle and often **theoretical** because, in practice, strict consistency is generally not achievable. Linearizability is often the strongest **practical** consistency model implemented in distributed systems. ### Sequential consistency Processes can agree on a sequence of operations, regardless of the real-world clock time. This agreed-upon sequence preserves the semantics of the writes and reads. Although the DS itself may not have a real clock, we can imagine ourselves outside the system, observing the real order of operations. Let's assume that the x-axis of the schedule represents the definition of real time. > "The result is the same as if the operations by all processes were executed in some sequential order, and the operations by each process appear in this sequence in the order specified by its program" The schedule is sequential, but … At this point in time $B$ thinks $x$ is already $1, C$ thinks $x$ is still 0. If they communicate (through a different channel), they break the **illusion of a single copy**. In practice: - Use a single coordinator (single leader replication): - Sequential consistency **limits availability** since it's necessary to contact the leader (which might be further away from the client) which must propagate synchronously the update to the replicas to achieve fault-tolerance - **No tolerance for network partitions**: in case of net. part. clients are blocked or leader is blocked to contacts followers - **Distributed agreement**: the use of leaderless protocols which are quorum-based where for each operation it's necessary a **quorum** (*quò·rum* is the quotient, in numbers or percentages, of the votes cast or of the voters, required for an election or resolution to be valid) of the servers which agrees on the version number of a resource. With $N R$ ($N W$) number of replicas that the clients contact to read (write) and $N$ the number of all replicas: - $NR + NW > N$ ensures that the sets of replicas involved in read and write operations overlap. It means that at least one replica is common between read and write sets, helping **to avoid read-write conflicts**. - $N W> \frac{N}{2}$ ensures that more than half of the replicas must agree for a write operation to be committed; **to avoid write-write conflicts** In practice, the quickest way to determine if a sequence of operations is sequentially consistent is to identify a valid **interleaving** of operations that is acceptable. It's important to note that an interleaving is valid only if all processes observe the same sequence of operations. When dealing with multiple variables, you need to consider each one individually, as various cases and patterns can emerge. ### Causal consistency Causal consistency is a weaker form of consistency when compared to linearizability or sequential consistency but provides a **balance between availability and consistency**. > "Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in any order at different machines." Causal consistency indeed weakens sequential consistency based on Lamport's notion of happened-before and it's not a total order but a **partial order**. This means that only causally related operations need to be ordered with respect to each other, while concurrent operations can appear in any order). Causal consistency is favored in DSs because it is easier to guarantee with smaller overhead and is easier to implement compared to stronger consistency models. Remember that causal order is **transitive**: it's important to know for understanding how causal relationships are established across different operations. In practice, in the exercises, for each couple of "write-write" and "read-write" in each process, you make a constraint which has to be respected by the reads of all processes. #### FIFO consistency >"Writes done by a single process are seen by all others in the order in which they were issued; writes from different processes may be seen in any order at different machines" Super simple consistency where causality across processes is dropped. It's implied by Causal consistency. In practice, in the exercises, for each couple of "write-write" in each process you make a constraint which has to be respected by the reads of all processes. #### Eventual consistency There are scenarios where simultaneous updates are unlikely and where read operations are more prevalent: examples include web caches, DNS, and geo-distributed data stores like Facebook/Instagram. In these types of systems, eventual consistency is often deemed satisfactory, as it guarantees that updates will **eventually** propagate to all replicas: - Very easy to implement - Very few conflicts in practice - Today's networks offer fast propagation of updates This is widely used in practice, in scenarios where the order of messages is not important. Eventual consistency doesn't imply any fifo/causal/sequential consistency. ## Client-centric consistency models Client centric consistency models take the pov of a single client who is reading/writing from multiple locations. | Concept | Definition | Example | |------------------|-------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------| | Monotonic Reads | Subsequent reads by a process will always see the same or newer values. | Reading a forum thread will always show the latest replies, never reverting to older posts. | | Monotonic Writes | Writes by a process are completed in order. | Comments on a blog are published in the order they're written by a user. | | Read Your Writes | A process will see its own writes in successive reads. | After posting a tweet, you'll see your tweet when you refresh the page. | | Writes Follow Reads | Writes by a process reflect the latest read value. | Replying to a message only after you've seen the most recent messages in the conversation. | ### Client-centric consistency implementations In this scenario, each operation has a unique identifier (ReplicaID + a sequence number) and there are two sets assigned to each client: - the **read-set**: write identifiers relevant for the read operations executed by the client - the **write-set**: the identifiers of the write performed by the client. These sets can be represented using **vector clocks**, which keep track of the latest read/write identifiers from each replica. ## Design Strategies ### Replica placement |Type|Description| |---|---| |**Permanent Replicas**|Statically configured, used in systems like DNS and CDNs.| |**Server-Initiated**|Created dynamically to cope with access load, moving data closer to clients.| |**Client-Initiated**|Rely on client cache, can be shared among clients for enhanced performance.| ### Update Propagation #### What to Propagate |Method|Description| |---|---| |**Notification Only**|Update performed, only notification propagated; used with invalidation protocols. Best if `#reads << #writes`.| |**Transfer Modified Data**|Transfer the modified data to all copies; efficient when `#reads >> #writes`.| |**Enable Update Operation**|Propagate information for the update operation at other copies (active replication).| #### How to Propagate |Approach|Description| |---|---| |**Push-Based**|Update propagated to all replicas, regardless of need; preserves high consistency.| |**Pull-Based**|Update fetched on demand; convenient if reads < writes, manages client caches.| |**Leases**|Used to switch between push and pull approaches.| | |State of server|Messages sent|Response time at client| |---|---|---|---| |**Push-based**|List of client replicas and caches|Update (and possibly fetch update later)|Immediate (or fetch-update time)|| |**Pull-based**|None|Poll and update|Fetch-update time| #### Propagation Strategies |Protocol|Description| |---|---| |**Leader-Based**|Synchronous, asynchronous, or semi-synchronous propagation.| |**Leaderless**|Includes read repair and anti-entropy processes.| ## Case studies ### Case study: Spanner [Spanner](https://cloud.google.com/spanner/docs/true-time-external-consistency) is a globally-distributed database developed by Google. It is specifically designed to handle very large databases, utilizing a partitioned approach where each partition is replicated. - **Design**: Spanner is designed for very large databases with many partitions, each of which is replicated. - **Techniques**: It uses standard techniques: - **single-leader replication** with **Paxos** for fault-tolerant agreement on followers and leader - **2PC** for **atomic commits** - **timestamp protocols** for concurrency control - **TrueTime**: Spanner's novelty lies in TrueTime, which uses very precise clocks (atomic clocks + GPS) to provide an API that returns an uncertainty range. The "real" time is guaranteed to be within this range, which is crucial for deciding when to commit read-write transactions. - **Transactions**: Read-write transactions use TrueTime to decide when to commit, waiting until the uncertainty range has certainly passed. This ensures transactions are ordered based on time, achieving **linearizability**. Read-only transactions also acquire a timestamp through TrueTime, allowing them to read the latest value at that time without locking, optimizing frequent read-only operations ### Case study: Calvin Calvin is designed for the same settings as Spanner. It adopts a sequencing layer to order all incoming requests, both read and write. - **Guarantees**: Calvin provides **linearizability** through a sequencing layer that orders all incoming requests, both read and write. - **Operation**: It uses a replicated log implemented using Paxos, requiring operations to be deterministic and executed in the same order everywhere. - **Advantages**: The system reduces lock contention by achieving agreement on the order of execution before acquiring locks and eliminates the need for 2PC because transactions are deterministic (they either succeed or fail in all replicas) ### Case study: VoltDB - **Developer Role**: Developers specify how to partition database tables and transactions. Specifying this allows the database to organize data efficiently based on query hints. - **Execution**: Single-partition transactions can execute sequentially on that partition without coordinating with other partitions.