I've been trying to hunt down a bug for months. It's one of those bugs that you can't replicate, that you can really predict, and that only happens in production once every few weeks. It's also a bug that can essentially completely break the system, and at the worst possible time... during periods of heavy load.
The basic story is that I have an application that handles tens or hundreds of thousands of requests a day. These requests are in the form of MSMQ messages which are sent to a set of middle tier machines for processing from a variety of sources and for a variety of purposes.
Once they reach the middle tier, a component called the Dispatch Manager deserializes the contents of the message, examines the .NET type and various properties of that type that adhere to a particular interface called IDispatchedMessage, and then hands the message off to another component for processing. The process of peeking the message on the queue and dispatching it happens in a multithreaded, object pooled environment.
The problem was that sometimes the message types wouldn't get deserialized properly. I would get errors stating that the Dispatch Manager didn't know how to read that particular message. This almost always happened during periods of heavy load, when there were dozens of threads simultaneously dispatching messages. After a fairly random amount of time, but always when the load started to decrease a little, the Dispatch Manager would magically remember how to deserialize those messages again and the problem would go away until the next time I restarted the application or the internal application cache was cleared.
The message types and where they're supposed to get dispatched to are stored in a SQL Server database. This data doesn't change too often, so I cache it using a singleton pattern. The Dispatch Manager constructs an XmlMessageFormatter class instance using the type names from the database, caches this instance, and returns a reference to this static instance when requested. This instance is then assigned to the MSMQ message and the Body property is accessed to get the deserialized data. Something like:
public IDispatchedMessage DeserializeMessage(System.Messaging.Message message)
messageFormatter = MessageFormatterCache.GetMessageFormatter();
message.Formatter = messageFormatter;
dispatchedMessage = message.Body as IDispatchedMessage;
Looking at this code, I concentrated my attention on the MessageFormatterCache. I figured it had to be the problem. Some how, my singleton pattern was failing under load. I rewrote the code several times. I did extensive research on potential issues with double checked locking in .NET. Not matter what I did, the code still broke.
Well, that wasn't the issue. It was a far more insidious issue, and honestly one I should have caught a lot earlier. The XmlMessageFormatter class, like many classes in the .NET Framework, is not thread safe. Usually, this doesn't actually mean that it won't work, it just means that it hasn't been tested or designed with multithreaded access in mind. In the case of the XmlMessageFormatter, they weren't kidding.
The clue that tipped me off was that just before the Dispatch Manager would start forgetting message types, I would get an exception stating that a collection had been modified and therefore the enumeration could not continue. I was never creating, modifying, or enumerating collections in the code in question, so I took at look at the stack trace. Sure enough, it pointed directly to the Read method of the XmlMessageFormatter.
Here is a slightly abbreviated version of that Read method:
public object Read(Message message)
XmlTextReader xmlReader = new XmlTextReader(message.BodyStream);
foreach (XmlSerializer serializer in this.targetSerializerTable.Values)
throw new InvalidOperationException(Res.GetString("InvalidTypeDes..."));
Well, there is the enumeration that was generating the exception. Wait a minute, what's that method call at the top of the Read method? That looks like something that might change state, and in a multithreaded context, that's a very bad thing. Let's take a look:
private void CreateTargetSerializerTable()
for (int i = 0; i < this.targetTypeNames.Length; i++)
Type type = Type.GetType(this.targetTypeNames[i], true);
if (type != null)
= new XmlSerializer(type);
this.typeNamesAdded = true;
for (int j = 0; j < this.targetTypes.Length; j++)
= new XmlSerializer(this.targetTypes[j]);
this.typesAdded = true;
if (this.targetSerializerTable.Count == 0)
throw new InvalidOperationException(Res.GetString("TypeListMissing"));
That's no good at all. This method is clearly not thread safe. Multiple threads can get to the boolean check and continue on into those if blocks before the first thread has finished building the collection and setting the bool to true.
This will cause everything to break until load calms down a little and one thread has time to completely run through the process of deserializing a single message. This is exactly what I was seeing.
The fix was fairly simple. Just use a cloned copy of the XmlMessageFormatter for each thread. I modified the MessageFormatterCache to return a cloned instance, and the problem disappeared. Microsoft even implemented ICloneable for the XmlMessageFormatter. That should have tipped me off right from the beginning. Oh well.
I also "primed" the cached instance by forcing it to deserialize a message in a thread safe context beforing letting it out into the wild world of multiple threads. This improved performance a little since it wasn't building the collection over and over again each time a thread used it.
Just goes to show, you should never assume that a class is thread safe.