Antonio's technology blog: The CAP theorem

When designing distributed systems such as distributed grids, distributed caches, distributed web services, etc. I often see solution architects and technical architects struggle with the design of the system while trying to provide three main characteristics : consistency, availability and partition tolerance.
It appears to be widely accepted, nowadays, that any enterprise technical architecture design should guarantee those three features, almost like a standard.
Unfortunately, when it comes to designing distributed system, that's not immediate and you do see architects continuously reviewing their design in order to provide their systems with those three guarantees and, ultimately, having to find a trade off.
Let's first see what those features mean in a distributed systems:

Consistency
Consistency simply means that each server in the grid returns the right response to each request. That means that the entire distributed system provides, consistently, the same data on each server where the request is made.

Availability
Availability means that each request eventually receives a response. The system is always available.

Partition tolerance
Communication between servers in the grid can be faulty. Messages can be delayed or lost and servers can crash. If that happens, the system should continue working correctly

Trying to satisfy all the three guarantees in a distributed grid is , theoretically, impossible.
In 2000, professor Eric Brewer, from the University of California, Berkeley , at the 2000 Symposium on Principles of Distributed Computing, made a conjecture of what became known as the CAP theorem.
According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three.

This is something every architect should know when dealing with distributed systems.
So, how do we deal with that? If you google "CAP Theorem" you will find many articles with solutions, panacea, people claiming to have beaten this restriction, etc.
In reality, all comes to two steps:

1- drop one of the three constraints
2- design around it (architects already do that even not knowing the CAP theorem)

Consider that dropping one of the constraints does not mean that the system will never be consistent or available or partition tolerant. It means that there will be a time window in which the system is not providing that feature (it can either be milliseconds or seconds).
NOSQL databases are a good example of distributed systems that face the CAP theorem.

Antonio

Antonio's technology blog

Sunday, 20 January 2013

The CAP theorem

2 comments: