Rules of Engagement: Understanding Consistency
Rules of Engagement was written to help shed light on the usefulness of NoSQL in your applications, and get you thinking about where it could fit into your overall architecture. I was originally going to use this space to read your mind and go through a list of some of the top reservations that you might have about giving NoSQL a chance. But when I dug into Eventual Consistency, I found it so compelling that I decided to dedicate an entire section to it.
Speed AND Availability AT Scale
Setting aside the differences, SQL or NoSQL, we all want the same things: speed AND availability AT scale. And, whenever you hear the word scale, distributed is sure to follow. So, given that we are talking about distributed databases (typically replicated), whether it’s SQL Server, Riak, or Cassandra, choices are going to have to be made between consistency and latency. And those decisions will have consequences. For our purposes, latency is the time between when the data is requested and when that request receives a response. Consistency means that we get the same response for a given request, no matter which replica answers it.
Degrees of Consistency
Let’s say that we have 3 replicas of the data. If we want strong or guaranteed consistency, we need to write to all 3 replicas synchronously before allowing any reads on that data. Given that this is a distributed system, this would introduce some significant latency (i.e., slowness). Think about all of the blocking headaches you have dealt with on “small” SQL Server databases, and now imagine trying to coordinate that synchronous work across multiple servers. If we relax that requirement, there is a period of time between the write and the moment when every reader is guaranteed to see the updated value. The CTO of Amazon, Werner Vogels, refers to this as the inconsistency window in Eventually Consistent – Revisited.
You have probably heard that replication is one of several options for scaling out SQL Server reads. Would it surprise you to know that using asynchronous replication in SQL Server results in eventual consistency? Another option available for scaling out SQL Server is database mirroring: high-performance mode is asynchronous (eventual consistency), high-safety mode is synchronous (strong consistency). Weak consistency provides no guarantee that subsequent reads will return the last write. Eventual consistency is a specific type of weak consistency that says, if write activity to a given piece of data were to stop, all replicas would be consistent, eventually.
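To make the inconsistency window concrete, here is a minimal sketch of asynchronous replication between two copies of the data. It is a toy in-memory model, not how SQL Server or any NoSQL store implements replication; the class and method names (AsyncReplicatedStore, read_secondary, replicate) are made up for illustration.

```python
from collections import deque


class AsyncReplicatedStore:
    """Toy primary/secondary pair with asynchronous replication (hypothetical model)."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        # Writes already committed on the primary but not yet applied to the secondary.
        self.pending = deque()

    def write(self, key, value):
        # The write is considered complete as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_secondary(self, key):
        # Reads served by the secondary may be stale.
        return self.secondary.get(key)

    def replicate(self):
        # Apply every queued write, in order, to the secondary.
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary[key] = value


store = AsyncReplicatedStore()
store.write("balance", 100)
stale = store.read_secondary("balance")  # read inside the inconsistency window
store.replicate()                        # write activity has stopped; replicas converge
fresh = store.read_secondary("balance")
print(stale, fresh)
```

The first read lands inside the inconsistency window and misses the write; once the queue drains and no new writes arrive, both copies agree, which is the "eventually" in eventual consistency.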
Alphabet Soup
Many distributed NoSQL data stores support quorum replication, which allows the administrators to turn the knobs that decide between consistency and latency (we have similar knobs for SQL Server replication). So, if your argument against NoSQL has always been “I can’t afford eventual consistency,” know that not only is the level of consistency in your hands, but you have been in the same boat with SQL Server; you just may not have known it.
Let’s talk more about these knobs, and some of the common definitions seen in the world of NoSQL:
N – the number of Nodes (or replicas) that store the data
W – the minimum number of nodes that must acknowledge the Write before it is considered complete
R – the minimum number of Read nodes (or replicas) queried for a read operation
If W+R > N, then the Write nodes overlap the Read nodes, which guarantees strong consistency.
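One way to convince yourself of the overlap rule is to enumerate every possible set of W write nodes and R read nodes and check whether they always share at least one node. The sketch below does that by brute force; the function name quorums_always_overlap is an assumption made for this example, and it only models which nodes are contacted, not timing or failures.

```python
from itertools import combinations


def quorums_always_overlap(n, w, r):
    """Return True if every choice of w write nodes and r read nodes shares a node."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


# Compare the brute-force answer with the W + R > N rule for a few settings.
for n, w, r in [(3, 3, 1), (3, 2, 2), (3, 2, 1), (3, 1, 1)]:
    print(f"N={n} W={w} R={r}: overlap={quorums_always_overlap(n, w, r)}, "
          f"W+R>N={w + r > n}")
```

For every setting the two columns agree: when W + R > N there is no way to pick a read set that misses all of the nodes holding the latest write.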
Let’s go back to SQL Server for our example, and say we have a synchronous AlwaysOn Availability Group set up.
N = 2 Our number of ‘replicas’
W = 2 The write transaction isn’t committed on the primary until it receives acknowledgement from the secondary that it has been written to disk
R = 1 Only one node is queried using AlwaysOn
If W + R <= N, there is the possibility that read and write nodes do NOT overlap, resulting in weak or eventual consistency.
This time we’ll set up an asynchronous AlwaysOn Availability Group.
N = 2 Same as last time
W = 1 In asynchronous commit mode, that write is committed immediately to the primary
R = 1 Only one node is queried using AlwaysOn
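The two AlwaysOn examples can be run through the same arithmetic. This is a minimal sketch that simply plugs the N, W, and R values from above into the rule; the function name consistency_level and the dictionary labels are assumptions for illustration, and the result describes the simplified model used here rather than every behavior of a real Availability Group.

```python
def consistency_level(n, w, r):
    """Classify an (N, W, R) setting using the W + R > N overlap rule."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must each be between 1 and N")
    return "strong" if w + r > n else "eventual"


# (N, W, R) values taken from the two Availability Group examples above.
configs = {
    "synchronous commit": (2, 2, 1),
    "asynchronous commit": (2, 1, 1),
}

for name, (n, w, r) in configs.items():
    print(f"{name}: N={n} W={w} R={r} -> {consistency_level(n, w, r)}")
```

With synchronous commit, W + R = 3 is greater than N = 2, so the rule reports strong consistency. With asynchronous commit, W + R = 2 is not greater than N, so a read can land on a replica that has not yet received the write.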
More
For more information about consistency in NoSQL databases (especially ‘Dynamo style’ databases), I encourage you to watch How Eventual is Eventual Consistency? Introducing Probabilistically Bounded Staleness and check out the PBS site and demo/utility to experiment with different values for N, W, R, and latency.