From the course: Database Foundations: Database Management

Understanding concurrency and locks

- [Instructor] When you start working with transactions, two important concepts come into play. Let's spend a few minutes understanding how relational databases deal with multiple concurrent users and data access locks. Concurrent access occurs when two or more users are performing activities with the same row of data at the same time. Without careful consideration, a number of inconsistency errors can develop, which can corrupt the data that you've carefully stored.

The first concurrency issue that can occur is when two users attempt to write to the same row in a table at the same time. This is called the lost update problem. Let's say that you're running an online shoe store and you have one pair left of the most popular new sneaker. A user on your site views your inventory and sees that there's one pair left. A second user on their own computer sees the same information. Both users add the sneaker to their shopping cart and attempt to check out. In a lost update scenario, the inventory system would subtract one pair of sneakers from the inventory that the first user saw and also subtract one from inventory for the second user. Both operations would update the inventory level to zero when, in reality, two pairs were actually sold. The problem comes when both shoppers were allowed to read and write to the same database at the same time. To fix this, the database should force one user to wait while the other transaction completes, then reread the inventory level for the second purchase to verify inventory levels again before the second transaction can proceed. You've probably experienced this if you've ever tried to buy a popular item online and added it to your cart, only to be told later that it was actually sold out.

The other three concurrency issues arise during table reads by multiple users. A dirty read occurs when a user is allowed to read uncommitted data from another transaction. Without knowing whether that transaction will complete with a final commit or a rollback, the user won't be able to fully trust that the values they see are accurate. Nonrepeatable reads occur when data changes between two reads of the same value; while inside a transaction, no other operations should be able to modify the values you're working with. And finally, a phantom read occurs when rows seemingly appear or disappear during a transaction. This happens when a transaction reads a table twice, but another process sneaks in and removes or adds rows in between both of those reads.

In order to combat all of these problems, relational database management systems employ a number of table and row locking strategies to prevent simultaneous access by concurrent users. Locks prevent two different transactions from accessing the same data resources at the same time, either at the entire table level or at the individual row level. The problem with row and table locks is that they keep resources from being accessed, which is actually their entire point, but this slows down database performance. Ultimately, the best and most secure way to handle these concurrency problems is to only allow serial execution of transactions. That means that a second transaction can begin only after the first one completes. Obviously, this creates a lot of downtime while users wait around for other users to finish their work. So the goal of the RDBMS with this approach is to create a lock only when required and then release the lock as soon as it's safe to do so. This approach can only go so far, though.
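To make the lost update scenario concrete, here's a minimal sketch of how a row lock can force the second shopper to wait, assuming a hypothetical inventory table with product_id and quantity columns. The SELECT ... FOR UPDATE syntax shown here is supported by platforms like MySQL and PostgreSQL, but locking syntax and behavior vary by RDBMS.

```sql
-- Minimal sketch of pessimistic row locking (hypothetical inventory table).
-- The first transaction to run SELECT ... FOR UPDATE locks the row; a second
-- transaction issuing the same statement waits until this one commits.

BEGIN;

-- Read the current stock level and lock the row against concurrent writers.
SELECT quantity
FROM inventory
WHERE product_id = 42
FOR UPDATE;

-- Sell the last pair only if stock remains; the guard keeps the
-- quantity from dropping below zero.
UPDATE inventory
SET quantity = quantity - 1
WHERE product_id = 42
  AND quantity > 0;

COMMIT;
```

When the second shopper's transaction finally acquires the lock and rereads the row, the quantity is already zero, so its guarded UPDATE affects no rows and the application can report that the item is sold out.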
Alternatively, RDBMSs can also allow parallel execution of simultaneous transactions under certain circumstances, as long as precautions are taken to isolate the effects of these transactions from each other. The SQL specification outlines four isolation levels that can be applied to transactions, depending on the kinds of concurrency issues that need to be mitigated. These give database managers some agency in deciding which trade-offs between performance and reliability are acceptable and which ones aren't.

Here are the four isolation levels that you can choose from. Serializable will prevent all concurrency issues, including dirty reads, nonrepeatable reads, and phantom reads. This isolation level forces transactions to run one at a time. The repeatable read isolation level prevents dirty reads and nonrepeatable reads, but it will allow simultaneous transactions to create new rows or remove existing rows. For transactions that are only concerned with values that are already stored in the table, this might not be much of a problem. Next, we can choose to employ a read committed isolation level. This strategy only prevents dirty reads, so values modified by a transaction in progress can't be seen by others before they're committed or rolled back. And finally, there is the read uncommitted isolation level, which does not prevent any concurrency issues from occurring; transactions are allowed to execute simultaneously as fast as the server's resources will allow.

Different RDBMS platforms will have different levels of support for some or all of these isolation levels, and the specifics of how they're implemented in your transactions will vary from platform to platform. But just know that performance trade-offs can be made when developing your transactions, depending on how sensitive your data is to concurrency anomalies.
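As a rough illustration, here's a hypothetical transaction run under the repeatable read isolation level. The SET TRANSACTION ISOLATION LEVEL statement comes from the SQL standard, but where it can appear and how it's scoped differs by platform (for example, PostgreSQL expects it inside the transaction, while SQL Server applies it to the whole session), so treat this as a sketch rather than portable code. The orders table and its columns are made up for the example.

```sql
-- Minimal sketch: request an isolation level before starting work.
-- Stricter levels trade performance for fewer concurrency anomalies.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

START TRANSACTION;

-- Under repeatable read, both reads of the same rows return the same
-- values even if another session updates them in between (no dirty or
-- nonrepeatable reads). Rows inserted by other sessions may still show
-- up as phantoms, depending on the platform.
SELECT order_id, status FROM orders WHERE customer_id = 7;
SELECT order_id, status FROM orders WHERE customer_id = 7;

COMMIT;
```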
