Sunday, 20 January 2013

The CAP theorem

When designing distributed systems such as distributed grids, distributed caches, distributed web services, etc. I often  see solution architects and  technical architects  struggle with the design of the system while trying to provide three main characteristics : consistency,  availability and partition tolerance.
It appears to be widely accepted, nowadays, that any enterprise technical architecture design should guarantee those three features, almost like a standard.
Unfortunately, when it comes to designing distributed system, that's not immediate and you do see architects continuously reviewing their design in order to provide their systems with those three guarantees and, ultimately, having to find a trade off.
Let's first see what those features mean in a distributed systems:

Consistency
Consistency simply means that each server in the grid returns the right response to each request. That  means that the entire distributed system provides, consistently, the same data on each server where the request is made.

Availability
Availability means that each request eventually receives a response. The system is always available.

Partition tolerance
Communication between servers in the grid can be faulty.  Messages can be delayed or lost and servers can crash.  If that happens, the system should continue working correctly

Trying to satisfy  all the three guarantees in a distributed grid is , theoretically, impossible.
In 2000, professor Eric Brewer, from  the University of California, Berkeley , at the 2000 Symposium on Principles of Distributed Computing, made a conjecture of what became known as the CAP theorem.
According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three.

This is something every  architect should know when dealing with distributed systems.
So, how do we deal with that?  If you google "CAP Theorem" you will find many articles with solutions, panacea, people claiming to have beaten this restriction, etc.
In reality, all comes to two steps:

1- drop one of the three constraints
2- design around it (architects already do that even not knowing the CAP theorem)

Consider that dropping one of the constraints does not mean that the system will never be consistent or available or  partition tolerant. It means that there will be a time window in which the system is not providing that feature (it can either be milliseconds or seconds).
NOSQL databases are a good example of distributed  systems that face the CAP theorem.

Antonio

2 comments:

  1. Hi Antonio.

    I think that its not really a choice of dropping one of the constraints but making choices that result in varying degrees of each. The CAP theorem also equally applies to both SQL and NoSql databases.

    ReplyDelete
    Replies
    1. Hi John
      I think that its not really a choice of dropping one of the constraints but making choices that result in varying degrees of each.
      True. I was considering that having a "degree" of one guarantee means you're basically not providing it 100%, which is like dropping it for some organisations. But I agree with you, rather than saying dropping a constraint, providing "varying degrees of each" is a better way of putting it.

      The CAP theorem also equally applies to both SQL and NoSql databases.
      Indeed, the CAP theorem does apply to SQL databases.

      I might discuss SQL databases and grids in the future. And why thye CAP theorem does apply to them too.

      Delete