From the course: Cloud NoSQL for SQL Professionals

Understand CAP Theorem and data - NoSQL Tutorial

- [Narrator] Next up, we're going to get an introduction to the CAP Theorem. CAP stands for consistency, availability, and partitions, or partition tolerance, and the thinking is that a database system can have two, but not all three, of these capabilities. So what are they?
Consistency has to do with data: whether the data being read is consistent with the data that has been written. In relational databases, we have transactions by default. To summarize, if we had two tables and we wanted to take something out of one and put it into the other, we can group those two statements as a transaction, so that they either all succeed or all fail. Banking is the classic example here. Generally, NoSQL databases don't have transactions, although there are exceptions that we'll see. It's important to understand that transactions carry a performance overhead.
In terms of availability, the system keeps multiple copies of the data, so if one part of the system goes down, there's a redundant copy. Again, there's a cost to this. And partitions increase scalability, because you can simply add more nodes to serve up more of the data. Even if you're new to this, you can already start to see how these different capabilities can be in conflict.
Let's look at them in terms of our systems. For this discussion, we assume that we're going to have more than one partition, and, as we mentioned, with one notable exception that I know of, we can't have all three in one system. Consistency and availability without partitioning is really what we get with our relational databases. For consistency, we get transaction isolation and repeatability; I already talked about the banking application. For availability, we can have clustered or duplicated servers with up to five nines (99.999%) of uptime. Of course, that's a nontrivial thing to do; as all of us who have been DBAs know, there's quite a bit of work around that.
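
The all-or-nothing transfer described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not code from the course: the `checking` and `savings` tables, the starting balances, and the `open_bank`, `transfer`, and `balances` helpers are all hypothetical. A `CHECK` constraint forbids negative balances, so an oversized transfer fails on its second statement, and the first statement is rolled back with it.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical account tables."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint rejects any update that would make a balance negative.
    conn.execute("CREATE TABLE checking (id INTEGER PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.execute("CREATE TABLE savings (id INTEGER PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.execute("INSERT INTO checking VALUES (1, 100)")
    conn.execute("INSERT INTO savings VALUES (1, 0)")
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, amount: int) -> bool:
    """Move `amount` from checking to savings as a single transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            # Credit first, so a failed debit shows the credit being undone.
            conn.execute("UPDATE savings SET balance = balance + ? WHERE id = 1", (amount,))
            conn.execute("UPDATE checking SET balance = balance - ? WHERE id = 1", (amount,))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; both updates were rolled back.
        return False


def balances(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (checking, savings) balances for account 1."""
    checking = conn.execute("SELECT balance FROM checking WHERE id = 1").fetchone()[0]
    savings = conn.execute("SELECT balance FROM savings WHERE id = 1").fetchone()[0]
    return checking, savings


if __name__ == "__main__":
    conn = open_bank()
    print(transfer(conn, 30), balances(conn))   # True (70, 30)
    print(transfer(conn, 500), balances(conn))  # False (70, 30)
```

Without a multi-statement transaction, the two updates would be independent, and a failure between them could leave the money credited to savings but never debited from checking.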
As I mentioned, though, the key drawback is that a relational database is not easy to partition, so it's expensive to scale. Let's contrast that with availability and partitioning, which is what NoSQL classically gives us. Partitioning is fast and globally scalable. If we took a mobile gaming application, we could simply add more nodes, and if the data isn't consistent all the time, our game players will just hit refresh. By contrast, it's not, you know, my bank account. We can also have high availability, with clustered servers and high uptime, but, as I just mentioned, the data is not transactionally consistent.
Of course, Google always has to have the exception case. Of the commercially available products that I'm aware of, the only one that claims to have, and appears to have, all three CAP capabilities is Google Cloud Spanner. As you might imagine, this capability is expensive, and although it's very exciting, I only recommend using it for Google-scale relational data. If you're curious, like me, about how they actually did this, I put a link to the white paper that describes how they created their own time management system, called TrueTime, which is key to the ability to have globally scalable transactional consistency. It's a really compelling read, but it is the exception case: everybody else gets two, and Google gets three.
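
To make the trade-off concrete, here is a toy two-replica key-value store in Python. It is a sketch under simplifying assumptions, not how any particular NoSQL product or Spanner works: `TwoReplicaStore`, its `mode` flag, and the synchronous-replication and catch-up logic are all hypothetical. During a simulated network partition, the consistency-first "CP" mode refuses writes it cannot replicate, while the availability-first "AP" mode accepts them and lets the other replica keep serving an older value.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """One copy of the data, held as a simple key-value dict."""
    name: str
    data: dict = field(default_factory=dict)


class TwoReplicaStore:
    """Hypothetical toy store: clients write via replica a and read via replica b."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True while a and b cannot reach each other

    def partition(self) -> None:
        """Simulate a network split between the two replicas."""
        self.partitioned = True

    def heal(self) -> None:
        """End the split and let b catch up. Only a accepts writes in this toy,
        so copying a's state to b cannot overwrite a newer value."""
        self.partitioned = False
        self.b.data.update(self.a.data)

    def write(self, key: str, value: int) -> bool:
        """Return True if the write is accepted."""
        if not self.partitioned:
            # Normal operation: replicate synchronously to both copies.
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            # b is unreachable, so refuse the write rather than let the copies
            # diverge: consistency is kept, availability is given up.
            return False
        # AP: accept the write on a only; b keeps its older value for now.
        self.a.data[key] = value
        return True

    def read_from_b(self, key: str) -> Optional[int]:
        """Read the value currently held by replica b."""
        return self.b.data.get(key)


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoReplicaStore(mode)
        store.write("high_score", 100)
        store.partition()
        accepted = store.write("high_score", 250)
        print(mode, accepted, store.read_from_b("high_score"))  # CP False 100, AP True 100
        store.heal()
        print(mode, store.read_from_b("high_score"))            # CP 100, AP 250
```

In the AP run, a reader on replica b sees the old score of 100 until the partition heals, which matches the "hit refresh" experience from the gaming example. In the CP run, the reader never sees a stale value, but the player's new score of 250 is simply rejected.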
