Robert Downey

The Danger of Assuming Read Consistency

by RMD 23. September 2009 00:40

Riddle me this.

Let's say you want to perform a simple select statement against a table in a database. And let's say that table has a primary key, meaning it has a clustered index on some column, say, the ID column. And let's say you select some or all of the rows in that table.

Can you say for sure that the resulting data set will have one and only one entry for each primary key in the source table?

The answer is: it depends. :)

Most people would think, and I can understand why, that if you have a clustered index on a table, and you do something simple like SELECT * FROM myTable, that you'll get one result for each row in that table. But that's not always the case.

Imagine that the table you're selecting from has an index, and that index is on the column "Lastname". And imagine that your select statement is something like: SELECT ID, Lastname FROM myTable. It's fairly likely that your query will take advantage of the index since it "covers" all the data in your query.
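To make that concrete, here's a rough sketch of the kind of schema I have in mind (the table, column, and index names are purely illustrative):

CREATE TABLE myTable
(
    ID int NOT NULL PRIMARY KEY CLUSTERED,  -- the clustered index lives on the primary key
    Lastname varchar(50) NOT NULL
)

-- A nonclustered index on Lastname. Because every nonclustered index row also
-- carries the clustered key (ID), this index "covers" the query below.
CREATE NONCLUSTERED INDEX IX_myTable_Lastname ON myTable (Lastname)

SELECT ID, Lastname FROM myTable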

So what's the issue? Well, let's say when you start your query, the index data looks like this:

ID, Lastname
3, Allen
1, Downey
4, Jacobs
2, Richardson

So now your query starts reading the rows out of the index. You've read "Allen" and "Downey" when all of a sudden, somebody changes the last name "Allen" to "Kelly". Since "Kelly" comes after "J" and before "R", the new index data looks like:

ID, Lastname
1, Downey
4, Jacobs
3, Kelly
2, Richardson

But wait, what happens? You just finished reading through Downey, but now Allen (who I guess got married to Mr. Kelly) is going to be read again. In other words, the output from your select statement will look like:

ID, Lastname
3, Allen
1, Downey
4, Jacobs
3, Kelly
2, Richardson

We have a duplicate primary key value in our output, and 1 more record than currently exists in the original table! What gives?

Well, under the default Read Committed isolation level, that's just the way things work. The database engine reads data in the order of the underlying data source, releasing its shared locks as it goes. If that happens to be a source whose order can change while you're reading it, all bets are off.

If you don't want this to happen, you need to use a stricter isolation level for your query. In SQL Server, the minimum isolation level required to prevent this behavior is Repeatable Read. If you don't want to change isolation levels, you can accomplish essentially the same thing by using a TABLOCK hint on your select, although SQL Server might decide to ignore this hint.
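A quick sketch of both options, using the illustrative table from above:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

BEGIN TRAN
    -- Rows read here hold their shared locks until the transaction ends,
    -- so an update can't move them around in the index mid-scan.
    SELECT ID, Lastname FROM myTable
COMMIT TRAN

-- Or, without changing the isolation level, ask for a table-level lock.
-- It's only a hint, so the optimizer is free to disregard it.
SELECT ID, Lastname FROM myTable WITH (TABLOCK)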

So why don't we see these issues all the time? Well, part of it is a timing issue. You need to be performing a query on the exact set of data that happens to include a row that's being changed, and your query must be against a data set that is ordered on one or more columns with changing values. (That's why I used an index in my previous example, since it's very rare for a primary key on a table to change.)

Duplicate rows aren't the only issue. Using our previous example, what happens when Jacobs is renamed to "Clark"? That row would be moved to before Downey, and if I had already read past Downey, I would miss it. In other words, your results would be entirely missing the record with the ID of 4.

So watch out people!

Oh, and for the love of baby jesus, don't use the NOLOCK hint if you care at ALL about the consistency of the resulting data. Seriously.


Software Development

The Perils of Multithreading

by RMD 30. April 2008 16:45

I've been trying to hunt down a bug for months. It's one of those bugs that you can't replicate, that you can't really predict, and that only happens in production once every few weeks. It's also a bug that can essentially break the system completely, and at the worst possible time... during periods of heavy load.

The basic story is that I have an application that handles tens or hundreds of thousands of requests a day. These requests arrive as MSMQ messages, sent from a variety of sources and for a variety of purposes, to a set of middle tier machines for processing.

Once they reach the middle tier, a component called the Dispatch Manager deserializes the contents of the message, examines the .NET type and various properties of that type that adhere to a particular interface called IDispatchedMessage, and then hands the message off to another component for processing. The process of peeking the message on the queue and dispatching it happens in a multithreaded, object pooled environment.

The problem was that sometimes the message types wouldn't get deserialized properly. I would get errors stating that the Dispatch Manager didn't know how to read that particular message. This almost always happened during periods of heavy load, when there were dozens of threads simultaneously dispatching messages. After a fairly random amount of time, but always when the load started to decrease a little, the Dispatch Manager would magically remember how to deserialize those messages again and the problem would go away until the next time I restarted the application or the internal application cache was cleared.

The message types and where they're supposed to get dispatched to are stored in a SQL Server database. This data doesn't change too often, so I cache it using a singleton pattern. The Dispatch Manager constructs an XmlMessageFormatter class instance using the type names from the database, caches this instance, and returns a reference to this static instance when requested. This instance is then assigned to the MSMQ message and the Body property is accessed to get the deserialized data. Something like:

public IDispatchedMessage DeserializeMessage(System.Messaging.Message message)
{
    IDispatchedMessage dispatchedMessage;
    XmlMessageFormatter messageFormatter;

    // Grab the shared, cached formatter and attach it to the incoming message.
    messageFormatter = MessageFormatterCache.GetMessageFormatter();
    message.Formatter = messageFormatter;

    // Accessing Body is what actually runs the formatter and deserializes the message.
    dispatchedMessage = message.Body as IDispatchedMessage;

    return dispatchedMessage;
}

 

Looking at this code, I concentrated my attention on the MessageFormatterCache. I figured it had to be the problem. Somehow, my singleton pattern was failing under load. I rewrote the code several times. I did extensive research on potential issues with double-checked locking in .NET. No matter what I did, the code still broke.

 

Well, that wasn't the issue. It was a far more insidious issue, and honestly one I should have caught a lot earlier. The XmlMessageFormatter class, like many classes in the .NET Framework, is not thread safe. Usually, this doesn't actually mean that it won't work, it just means that it hasn't been tested or designed with multithreaded access in mind. In the case of the XmlMessageFormatter, they weren't kidding.

 

The clue that tipped me off was that just before the Dispatch Manager would start forgetting message types, I would get an exception stating that a collection had been modified and therefore the enumeration could not continue. I was never creating, modifying, or enumerating collections in the code in question, so I took a look at the stack trace. Sure enough, it pointed directly to the Read method of the XmlMessageFormatter.

 

Here is a slightly abbreviated version of that Read method:

 


public object Read(Message message)
{
    this.CreateTargetSerializerTable();
 
    XmlTextReader xmlReader = new XmlTextReader(message.BodyStream);
  
    foreach (XmlSerializer serializer in this.targetSerializerTable.Values)
    {
        if (serializer.CanDeserialize(xmlReader))
        {
            return serializer.Deserialize(xmlReader);
        }
    }
    
    throw new InvalidOperationException(Res.GetString("InvalidTypeDes..."));

}

 

Well, there is the enumeration that was generating the exception. Wait a minute, what's that method call at the top of the Read method? That looks like something that might change state, and in a multithreaded context, that's a very bad thing. Let's take a look:

 


private void CreateTargetSerializerTable()
{
    if (!this.typeNamesAdded)
    {
        for (int i = 0; i < this.targetTypeNames.Length; i++)
        {
            Type type = Type.GetType(this.targetTypeNames[i], true);
            if (type != null)
            {
                this.targetSerializerTable[type] 
                            = new XmlSerializer(type);
            }
        }
        this.typeNamesAdded = true;
    }
    if (!this.typesAdded)
    {
        for (int j = 0; j < this.targetTypes.Length; j++)
        {
            this.targetSerializerTable[this.targetTypes[j]] 
                        = new XmlSerializer(this.targetTypes[j]);
        }
        this.typesAdded = true;
    }
    if (this.targetSerializerTable.Count == 0)
    {
        throw new InvalidOperationException(Res.GetString("TypeListMissing"));
    }
}

 

That's no good at all. This method is clearly not thread safe. Multiple threads can get to the boolean check and continue on into those if blocks before the first thread has finished building the collection and setting the bool to true.

 

This will cause everything to break until load calms down a little and one thread has time to completely run through the process of deserializing a single message. This is exactly what I was seeing.

 

The fix was fairly simple. Just use a cloned copy of the XmlMessageFormatter for each thread. I modified the MessageFormatterCache to return a cloned instance, and the problem disappeared. Microsoft even implemented ICloneable for the XmlMessageFormatter. That should have tipped me off right from the beginning. Oh well.
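MessageFormatterCache is my own class, so here's just a rough sketch of the shape of the change, with illustrative type and member names:

using System.Messaging;

public static class MessageFormatterCache
{
    // One fully-constructed formatter, built once. The real type names come
    // from the database; the hard-coded name here is purely illustrative.
    private static readonly XmlMessageFormatter cachedFormatter =
        new XmlMessageFormatter(new string[] { "MyApp.Messages.OrderMessage, MyApp.Messages" });

    public static XmlMessageFormatter GetMessageFormatter()
    {
        // Hand every caller its own copy, so no two threads ever share the
        // formatter's internal serializer table while it's being built.
        return (XmlMessageFormatter)cachedFormatter.Clone();
    }
}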

 

I also "primed" the cached instance by forcing it to deserialize a message in a thread safe context beforing letting it out into the wild world of multiple threads. This improved performance a little since it wasn't building the collection over and over again each time a thread used it.

 

Just goes to show, you should never assume that a class is thread safe.


Software Development

You Can Kind of Lock What Isn't There

by RMD 2. January 2008 23:01

This is a follow up post to my previous entry titled, You Can't Lock What Isn't There. In that post I explain the folly of trying to use SQL Server locking hints to prevent duplicate row inserts in certain scenarios.

I exclaimed with great gusto that "you cannot use lock hints to prevent duplicate inserts. Period." Seems pretty final and certain, huh?

Well, it's not entirely true. In fact, you can use locking hints in combination with strict isolation levels to accomplish this feat. Specifically, you can use the serializable isolation level.

Isolation levels are set on a per-connection basis in SQL Server. Once they're set, they stick around for that connection until they're changed. The serializable isolation level ensures not only that rows already in your result set can't be updated (which repeatable read also guarantees), but also that no "phantom" rows appear. In other words, it prevents inserts that would have shown up in the result set from occurring. If I re-run my query inside the same transaction, I will get the exact same results.

So how does this help us in the scenario I described in my other blog post? Well, if a duplicate value were inserted between the EXISTS check (see below) and the subsequent insert, re-running the check would see that new row. Since that would violate the "phantom" rows guarantee of the serializable isolation level, SQL Server blocks the competing insert until the transaction completes.

BEGIN TRAN

IF NOT EXISTS(SELECT 1 FROM Orders WHERE OrderTxID = @OrderTxID)
BEGIN
     INSERT INTO Orders VALUES (NEWID(), @ProductID, @OrderTxID)
END

COMMIT TRAN
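One thing the snippet above takes for granted: the connection has to actually be running under the serializable isolation level before that batch executes. A minimal sketch, and since the level sticks to the connection, it's worth switching back when you're done (especially if the connection gets reused via connection pooling):

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

-- ... run the transaction shown above ...

SET TRANSACTION ISOLATION LEVEL READ COMMITTED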

How does SQL Server accomplish this feat? It's not pretty. SQL Server essentially locks the page that will eventually include the indexed column value being used in the query. In this case, it looks at the OrderTxID column's index, finds the page (or pages, or extents) that would include the next value in the sequence for this query, and places a lock on it.

Here is the real kicker: If no index is present, or if it can't determine what page or extent to lock, guess what... it locks the entire table. Now that's gonna be great for scalability, huh?

So I apologize for misleading you with my previous blog post. It wasn't completely correct. But now that you know the truth, pretend like you don't.


Software Development

You Can't Lock What Isn't There

by RMD 19. June 2007 22:10

I recently had a bit of a SQL Server (and databases in general) epiphany. I suddenly realized that a strategy I had employed for years was fatally flawed. Looking back, it seems incredibly obvious, but I managed to miss it until now.

For whatever reason, I never realized that attempting to use lock hints to prevent duplicate row inserts will never work for the simple reason that you can't lock what isn't there.

Imagine you have an Orders table. This Orders table sees very heavy load and has many thousands of rows inserted daily. Due to various business reasons, there are several NULLable columns in this table, preventing you from just creating a unique index on the unique combination of columns. Instead, you decide to take an UPDLOCK or an XLOCK on the table in question when doing a new insert, with the hope that it will prevent duplicates. So you have something like:

BEGIN TRAN

IF NOT EXISTS(SELECT 1 FROM Orders (UPDLOCK) WHERE OrderTxID = @OrderTxID)
BEGIN
     INSERT INTO Orders VALUES (NEWID(), @ProductID, @OrderTxID)
END

COMMIT TRAN

At best, this won't work. You'll still get duplicates. At worst, you'll create a situation where you can have runaway deadlocking. The latter is how I discovered the flaw in my reasoning. Turns out that SQL Server will typically elevate your UPDLOCK to a page level lock on any index that you may have that covers the columns in your where clause. So with the above example, it would elevate to a page level lock of any index that had OrderTxID. When this happens, all it takes is a few well timed inserts and you'll have deadlocks on your hands. Lots of them.

The basic reason is that the locked page may be the page where some insert is destined, and not necessarily the current insert. It could be one of the many simultaneous inserts going on during heavy load. This means that the other insert statements will often request a lock on the same page that the current insert has a lock on, and vice versa. The result is a deadlock.

The moral of the story is that unless you have a record that you know exists that you can do your locking on (such as a parent relation, like a Customer in the case of the Orders table example), you cannot use lock hints to prevent duplicate inserts. Period. (*Update 1/2/2008 - This isn't completely accurate. See my latest blog post on this topic for more information.)

So how do you prevent duplicates, then? Well, normally you would just use unique constraints and indexes. In the previous, admittedly trivial example, I would just put a unique index on OrderTxID. But now imagine that OrderTxID is optional and can be NULL. Now what? You can't create a unique index on the column because there could be multiple, perfectly valid rows with NULL values.

All is not lost. You can use a computed column or an indexed view. In my case, I created a series of indexed views with unique constraints on these views. This worked like a charm.
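For example, something along these lines (with illustrative names) gives you a unique constraint on OrderTxID that simply ignores the NULL rows:

CREATE VIEW dbo.vOrdersWithTxID
WITH SCHEMABINDING
AS
SELECT OrderTxID
FROM dbo.Orders
WHERE OrderTxID IS NOT NULL
GO

-- The unique clustered index on the view is what actually enforces uniqueness:
-- a second non-NULL duplicate OrderTxID now fails at insert time.
CREATE UNIQUE CLUSTERED INDEX IX_vOrdersWithTxID ON dbo.vOrdersWithTxID (OrderTxID)
GO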

In case you're wondering: if the Orders table had a parent table called Customer that it had a many-to-one relationship with, you could use the customer row to perform some smart locking. If you perform any kind of exclusive lock on the customer row (UPDLOCK, HOLDLOCK, XLOCK, etc.), this will effectively serialize your inserts on a per-customer basis. Be warned, this might be escalated to a table level lock during very high loads, but it will still perform fairly well as long as the Customer table isn't as insert-heavy as the Orders table. So, using our example:

BEGIN TRAN

SELECT 1 FROM Customer (XLOCK, HOLDLOCK) WHERE ID = @CustomerID

IF NOT EXISTS(SELECT 1 FROM Orders WHERE OrderTxID = @OrderTxID)
BEGIN
     INSERT INTO Orders VALUES (NEWID(), @CustomerID, @ProductID, @OrderTxID)
END

COMMIT TRAN

As you can see, by establishing an exclusive lock on the customer row that corresponds to the new order row, I can be sure that there will never be a case where a new order will be inserted between the time I check for a duplicate and insert the new row. This is very similar to a thread synchronization strategy often used in programming languages like C#. (i.e. the "lock" statement.)

When you can, use the database's built-in features to prevent duplicates by leveraging unique constraints, indexed views, and computed columns.


Software Development