Robert Downey
News
About Me
My Thoughts
Resume
Code
Contact Me

You Can Kind of Lock What Isn't There

by RMD 2. January 2008 23:01

This is a follow up post to my previous entry titled, You Can't Lock What Isn't There. In that post I explain the folly of trying to use SQL Server locking hints to prevent duplicate row inserts in certain scenarios.

sql serverI exclaimed with great gusto that "you cannot use lock hints to prevent duplicate inserts. Period." Seems pretty final and certain, huh?

Well, it's not entirely true. In fact, you can use locking hints in combination with strict isolation levels to accomplish this feat. Specifically, you can use the serializable isolation level.

Isolation levels are on a per-connection basis in SQL Server. Once they're set, they stick around for that connection until they're changed. The serializable isolation level ensures that not only are results in the results set not updated, like the repeatable read isolation level, but also that no "phantom" rows appear. In other words, it prevents inserts that would be included in the results set from occurring. If I re-run my query again inside the same transaction, I will get the exact same results.

So how does this help us in the scenario I described in my other blog post? Well, the EXISTS check (see below) would include a duplicate value if it were inserted between the check and the subsequent insert and the same code was run again, and since that would violate the "phantom" rows guarantee of the serializable isolation level, SQL Server blocks that insert until the transaction completes.

BEGIN TRAN

IF NOT EXISTS(SELECT 1 FROM Orders WHERE OrderTxID = @OrderTxID)
BEGIN
     INSERT INTO Orders (NEWID(), @ProductID, @OrderTxID)
END

COMMIT TRAN

lockHow does SQL Server accomplish this feat? It's not pretty. SQL Server essentially locks the page that will eventually include the indexed column value being used in the query. In this case, it looks at the OrderTxID column's index, finds the page (or pages, or extents) that would include the next value in the sequence for this query, and places a lock on it.

Here is the real kicker: If no index is present, or if it can't determine what page or extent to lock, guess what... it locks the entire table. Now that's gonna be great for scalability, huh?

So I apologize for misleading you with my previous blog post. It wasn't completely correct. But now that you know the truth, pretend like you don't.

Tags: , , ,

Software Development

You Can't Lock What Isn't There

by RMD 19. June 2007 22:10

lock I recently had a bit of a SQL Server (and databases in general) epiphany. I suddenly realize that a strategy I had employed for years was fatally flawed. Looking back, it seems incredibly obvious, but I managed to miss it until now.

For whatever reason, I never realized that attempting to use lock hints to prevent duplicate row inserts will never work for the simple reason that you can't lock what isn't there.

Imagine you have an Orders table. This Orders table is very high load and has many thousands of rows inserted daily. Due to various business reasons, there are several NULLable columns in this table, preventing you from just creating a unique index on the unique combination of columns. Instead, you decide to do a UPDLOCK or a XLOCK on the table in question when doing a new insert with the hope that it will prevent duplicates. So you have something like:

BEGIN TRAN

IF NOT EXISTS(SELECT 1 FROM Orders (UPDLOCK) WHERE OrderTxID = @OrderTxID)
BEGIN
     INSERT INTO Orders (NEWID(), @ProductID, @OrderTxID)
END

COMMIT TRAN

At best, this won't work. You'll still get duplicates. At worst, you'll create a situation where you can have runaway deadlocking. The latter is how I discovered the flaw in my reasoning. Turns out that SQL Server will typically elevate your UPDLOCK to a page level lock on any index that you may have that covers the columns in your where clause. So with the above example, it would elevate to a page level lock of any index that had OrderTxID. When this happens, all it takes is a few well timed inserts and you'll have deadlocks on your hands. Lots of them.

sql server The basic reason is that a page level lock may include the page in which an insert is destined, and it's not necessarily the current insert. It could be one of the many simultaneous inserts going on during heavy load. This means that the other insert statements will often request a lock on the same page that the current insert has a lock on, and visa versa. The result is a deadlock.

The moral of the story is that unless you have a record that you know exists that you can do your locking on (such as a parent relation, like a Customer in the case of the Orders table example), you cannot use lock hints to prevent duplicate inserts. Period. (*Update 1/2/2008 - This isn't completely accurate. See my latest blog post on this topic for more information.)

So how do you? Well, normally you would just use unique constraints and indexes. In the previous, admittedly trivial example, I would just have a unique index on OrderTxID. But now imagine that OrderTxID is optional and can be NULL. Now what? You can't create a unique index on the column because there could be multiple, perfectly valid rows with NULL values.

All is not lost. You can use a computed column or an indexed view. In my case, I created a series of indexed views with unique constraints on these views. This worked like a charm.

In case you're wondering, if the Orders table had a parent table called Customer that it had a many to one relationship with, you could use the customer row to perform some smart locking. If you perform any kind of exclusive lock on the customer row (UPDLOCK, HOLDLOCK, XLOCK, etc.), this will effectively serialize your inserts on a per customer basis. Be warned, this might be escalated to a table level lock during very high loads, but it will still perform fairly well as long as the Customer table isn't as insert-heavy as the Orders table. So, in using our example:

BEGIN TRAN

SELECT 1 FROM Orders (XLOCK, HOLDLOCK) WHERE ID = @CustomerID

IF NOT EXISTS(SELECT 1 FROM Orders WHERE OrderTxID = @OrderTxID)
BEGIN
     INSERT INTO Orders (NEWID(), @CustomerID, @ProductID, @OrderTxID)
END

COMMIT TRAN

As you can see, by establishing an exclusive lock on the customer row that corresponds to the new order row, I can be sure that there will never be a case where a new order will be inserted between the time I check for a duplicate and insert the new row. This is very similar to a thread synchronization strategy often used in programming languages like C#. (i.e. the "lock" statement.)

When you can, use the database's built in features to prevent duplicates by leveraging unique constraints, indexed views, and computed columns.

Tags: , , ,

Software Development