I recently had a bit of a SQL Server (and databases in general) epiphany. I suddenly realize that a strategy I had employed for years was fatally flawed. Looking back, it seems incredibly obvious, but I managed to miss it until now.
For whatever reason, I never realized that attempting to use lock hints to prevent duplicate row inserts will never work for the simple reason that you can't lock what isn't there.
Imagine you have an Orders table. This Orders table is very high load and has many thousands of rows inserted daily. Due to various business reasons, there are several NULLable columns in this table, preventing you from just creating a unique index on the unique combination of columns. Instead, you decide to do a UPDLOCK or a XLOCK on the table in question when doing a new insert with the hope that it will prevent duplicates. So you have something like:
BEGIN TRAN
IF NOT EXISTS(SELECT 1 FROM Orders (UPDLOCK) WHERE OrderTxID = @OrderTxID)
BEGIN
INSERT INTO Orders (NEWID(), @ProductID, @OrderTxID)
END
COMMIT TRAN
At best, this won't work. You'll still get duplicates. At worst, you'll create a situation where you can have runaway deadlocking. The latter is how I discovered the flaw in my reasoning. Turns out that SQL Server will typically elevate your UPDLOCK to a page level lock on any index that you may have that covers the columns in your where clause. So with the above example, it would elevate to a page level lock of any index that had OrderTxID. When this happens, all it takes is a few well timed inserts and you'll have deadlocks on your hands. Lots of them.
The basic reason is that a page level lock may include the page in which an insert is destined, and it's not necessarily the current insert. It could be one of the many simultaneous inserts going on during heavy load. This means that the other insert statements will often request a lock on the same page that the current insert has a lock on, and visa versa. The result is a deadlock.
The moral of the story is that unless you have a record that you know exists that you can do your locking on (such as a parent relation, like a Customer in the case of the Orders table example), you cannot use lock hints to prevent duplicate inserts. Period. (*Update 1/2/2008 - This isn't completely accurate. See my latest blog post on this topic for more information.)
So how do you? Well, normally you would just use unique constraints and indexes. In the previous, admittedly trivial example, I would just have a unique index on OrderTxID. But now imagine that OrderTxID is optional and can be NULL. Now what? You can't create a unique index on the column because there could be multiple, perfectly valid rows with NULL values.
All is not lost. You can use a computed column or an indexed view. In my case, I created a series of indexed views with unique constraints on these views. This worked like a charm.
In case you're wondering, if the Orders table had a parent table called Customer that it had a many to one relationship with, you could use the customer row to perform some smart locking. If you perform any kind of exclusive lock on the customer row (UPDLOCK, HOLDLOCK, XLOCK, etc.), this will effectively serialize your inserts on a per customer basis. Be warned, this might be escalated to a table level lock during very high loads, but it will still perform fairly well as long as the Customer table isn't as insert-heavy as the Orders table. So, in using our example:
BEGIN TRAN
SELECT 1 FROM Orders (XLOCK, HOLDLOCK) WHERE ID = @CustomerID
IF NOT EXISTS(SELECT 1 FROM Orders WHERE OrderTxID = @OrderTxID)
BEGIN
INSERT INTO Orders (NEWID(), @CustomerID, @ProductID, @OrderTxID)
END
COMMIT TRAN
As you can see, by establishing an exclusive lock on the customer row that corresponds to the new order row, I can be sure that there will never be a case where a new order will be inserted between the time I check for a duplicate and insert the new row. This is very similar to a thread synchronization strategy often used in programming languages like C#. (i.e. the "lock" statement.)
When you can, use the database's built in features to prevent duplicates by leveraging unique constraints, indexed views, and computed columns.