Robert Downey
News
About Me
My Thoughts
Resume
Code
Contact Me

Trials and Tribulations with SQL Server Compact Edition

by RMD 13. April 2011 17:44

SQL Server Compact 3.5 SP2I recently migrated an application from using a custom pseudo-database that I wrote to using SQL Server Compact Edition 3.5 SP2. SQL CE seemed to fit me needs nicely (no installation, essentially in-process, multi-threading capable), and so with far less research than I should have done, I spent 100 or so hours to integrate it into my software.

The primary reason for the migration was that my custom database code was pretty rudimentary. It had very simple locking mechanisms (global read/write locks, basically), and as a result it really couldn’t scale to lots of concurrent access from multiple threads. Up until this point, my software didn’t really require high levels of concurrency for data access, but the recent addition of some new features meant that it did now.

I needed to get a new data access layer up quickly, so I settled on Entity Framework. I’m stuck with .NET 3.5 for this particular piece of software, so it was the somewhat icky Entity Framework v1 that I had to use. It also meant I was stuck with SQL CE 3.5 rather than the fancy new 4.0. Still, I figured anything would be better that what I currently had.

Starting to Smell Funny

The first inklings I had that something was fishy was some interesting performance issues. It was taking a substantial amount of time to create my Object Context. Something like 500ms or more on a pretty fast desktop box. (Core i70 940, 8GB RAM, etc.) Since I couldn’t share my Object Contexts across threads, I decided I would create a thread-safe object pool and hand out the instantiated contexts on a thread by thread basis, reusing them when the previous thread had placed them back in the pool.

I wrote a nifty generic ObjectPool<T>, tested the hell out of it, and starting happily using it with great success. Subsequent requests for a pooled Object Context would now be fulfilled in 5ms instead of the 500ms it was taking before. Patting myself on the back, I left for the night, leaving my application running just to make sure things continued to work “long term”.

When I got in the next morning, I was greeted with lots and lots of errors. The most common of these was a deadlock timeout like:

An error occurred while executing the command definition. See the inner exception for details. ---> SQL Server Compact timed out waiting for a lock. The default lock time is 2000ms for devices and 5000ms for desktops. The default lock timeout can be increased in the connection string using the ssce: default lock timeout property. [ Session id = 11,Thread id = 7224,Process id = 8592,Table name = __SysObjects,Conflict type = x lock (s blocks),Resource = RID: 1030:29 ]

The deadlocks would always occur during a certain operation that was pretty write-heavy, but only one piece of code in the software ever did this write operation. I racked my brain trying to figure out how the code could deadlock, but couldn’t figure it out. I tried playing with the lock timeout settings, but this would typically just cause it to take longer to fail. Limiting the number of threads to 1 would, of course, resolve the issue – but that wasn’t an option.

In addition to those errors, I was seeing OutOfMemoryException and AccessViolation exceptions being thrown. I had no idea where to start with the OutOfMemoryExceptions, but the AccessViolation exceptions had to be coming from unmanaged code. The only unmanaged code I was using was, you guessed it, SQL CE. I decided that this must mean there is some hidden thread affinity issues with SQL CE connections, and I abandoned my usage of my nifty object pool.

It Gets Worse

I modified the code so that it no longer used the object pool and instead essentially wrapped all usage of the object context in a using statement. Things seemed to get better at first. I was still seeing the deadlock errors, but much less frequently. This scared me a bit, but it was rare enough that I chocked it up to me having a fast machine. I know – that doesn’t make a lot of sense – but whatever. Now, however, we had a new issue. The Dispose call on the ObjectContext was resulting in a NullReferenceException. The exception was being thrown by the Accessor.Dispose call made internally by the SqlCeConnection class. Not always, mind you… but perhaps 25% of the time.

I decided I’d try and ignore the problem by replacing my using statement with a custom delegate which did a try/finally and caught the NullReferenceException. Something like:

   1:  [DebuggerStepThrough()]
   2:  [MethodImpl(MethodImplOptions.NoInlining)]
   3:  public static void SafeUsing(Action<MyObjectContext> action)
   4:  {
   5:      MyObjectContext context = null;
   6:   
   7:      try
   8:      {
   9:          context = GetMyObjectContext();
  10:          action(context);
  11:      }
  12:      finally
  13:      {
  14:          //We do this instead of a normal using statement because
  15:          //SqlCe has the bad habit of throwing NREs when
  16:          //doing its dispose.
  17:          try
  18:          {
  19:              if(context != null)
  20:              {
  21:                 using(context){}
  22:              }
  23:          }catch(Exception){}
  24:      }
  25:  }

Problem solved… if by solved you mean not understood and then dutifully ignored. Not only that, but this didn’t handle the situation where the Finalizer called Dispose automatically. In those cases, a NullReferenceException was being thrown and eaten by .NET itself.

This “fix” really just made things worse. By allowing the application to continue to error over and over, I was actually setting the stage for some major memory corruption. So while the exception I was catching was managed, the root cause was likely some unmanaged code not behaving.

Much Worse

After between 1 second and 3 days of runtime, my software would now crash hard. How long it took to crash seemed to depend on how fast the machine in question was. By “hard” I mean that Windows killed the application outright and popped up a Windows Error Reporting dialog. This usually only happens when your application doesn’t handle a managed exception (which was impossible in my case due to some global error handling I had), or if your application is being a bad citizen and accessing memory it shouldn’t be. Managed, safe code can’t do that – so the culprit had to be SQL CE.

I configured Windows Error Reporting to gather a mini-dump from the crash. Opening up the resulting DMP file in Visual Studio (and configuring VS.NET to load the appropriate symbols from Microsoft’s public symbols server) allowed me to see the following stack trace:

A heap has been corrupted.

Some of the crashes would have a access violation in there instead of a heap corruption, but same difference. You’ll notice the SQL CE native interface DLL in the trace. That’s the DLL responsible for communicating between .NET and the native SQL CE APIs. In this case, it’s clear that SQL CE is attempting to free some memory on the heap, but that memory doesn’t belong to it. Windows detects this and kills the application.

I Give Up – Time to Call Microsoft

At this point, I’ve given up fixing this myself. I seriously started considering SQL Lite, but I just don’t have the time to rip out an entire database engine. I posted some questions on StackOverflow and the MSDN Forums, but got nothing very useful. Finally, I posted feedback on Microsoft Connect, and after a few days I got a response basically telling me to use SQL CE 4.0, and if I needed the fix in 3.5 I should contact support.

So I did. (Or, rather, my co-worker did. I’m a busy guy. Smile) Microsoft’s support teams seem to vary dramatically from product to product. Their SQL Server team is one of the best. Their response times are very low, and the quality of their feedback seems very high. Within a day we had a response. It turns out there had been not 1, not 2, not 3, but 4 “cumulative update” releases for SQL CE 3.5 SP2. I had searched for “hotfix” and “service pack”, but never “cumulative update”, so I never found these things.

Best of all, Cumulative Update 2 fixed two issues:

KB983516, KB2300599

Funny, those seem like exactly the two issues I’ve encountered. Sweet! Problem solved!

Or not.

Assembly Versioning Hell

I happily downloaded the update, installed it locally, and grabbed the new DLLs from the installation directory, popped them into my projects, rebuilt, re-deployed, and… crash. Not the same crash, mind you – a brand new, far more confusing crash.

System.Net.Sockets.SocketException: An operation was attempted on something that is not a socket

Huh? This would also typically be preceded an access violation, but not always. Sigh. Thoughts of SQL Lite started to return, but I decided to take another look at the mini dump just to make myself feel useful. It was then that I noticed the loaded modules list had something I wasn’t expecting:

image

The file version number for SQL CE 3.5 SP2 RTM is 3.5.8080.0. The file version for Cumulative Update 2 is 3.5.8082.0. The files sitting in my application’s directory are file version 3.5.8082.0, yet the ones actually being loaded are 3.5.8080.0 – the RTM version! Also notice that they’re all coming from the Program Files directory rather than my applications directory.

So why are these unmanaged DLLs (sqcee*.dll) being loaded from this Program Files directory rather than my applications directory? Well, the machine in question had SQL CE 3.5 SP2 RTM installed on it. My application had a reference to the managed DLLs System.Data.SqlServerCe.dll and System.Data.SqlServerCe.Entity.dll. When the .NET assembly loader (aka Fusion) resolves the path of the assembly to load, it will always prefer to load the assembly from the GAC if it has the same version number. In this case, despite the fact I had new versions of those managed assemblies sitting in my application folder, .NET saw that their assembly version (rather than the file version) matched one that was already in the GAC, and it loaded that instead. The GAC’s copy of the assembly was the RTM version, so it also loaded the RTM version of the unmanaged DLLs.

So how do we fix this? Well, you can do a few things to force .NET to load the assembly of your choice, but they’re pretty hacky. You can try and do an Assembly.LoadFile, or you can redirect to a specific assembly using the runtime section in the App.Config, but I kept thinking that Microsoft must have anticipated that people who installed SQL CE using the MSI installer might break applications that relied on specific versions of SQL CE that they deployed with their application.

Sure enough, they did. My co-worker discovered that I had accidentally grabbed the DLLs from the root installation folder for SQL CE, which is typically something like C:\Program Files (x86)\Microsoft SQL Server Compact Edition\v3.5\. If you look at that directory, you’ll notice a child directory called Private. This directory contains two DLLs, System.Data.SqlServerCe.dll and System.Data.SqlServerCe.Entity.dll. Just like the DLLs in the root directory, their file version is 3.5.8082.0, but their assembly version (which is the only version number that .NET really cares about) is 3.5.1.50. The assembly version for the DLLs in the root (and, most importantly, the DLLs in the GAC) are 3.5.1.0.

As a side note, an easy way to check the assembly version is to install the .NET Version shell extension which adds a nifty tab to the file properties, as seen below:

image

So by giving us an assembly with a version that does not match the version of the assembly added to the GAC by the SQL CE installer, we have the ability to do an assembly version based redirect and instruct .NET to look for version 3.5.1.50 rather than 3.5.1.0. Since 3.5.1.50 will only be found in our application’s installation directory, it will load that assembly, which will in turn load the unmanaged DLLs also sitting in our application’s installation directory. To do this, we must specify this behavior in the App.config like so:

image

This config entry simply says that anytime .NET tries to load System.Data.SqlServerCe with a version number of 3.5.1.0, it should instead load the assembly with the version number of 3.5.1.50.

Now, had I done any real research regarding deploying SQL CE with an application I would have found some excellent resources that explained all of this. But I didn’t. So I had to experience the pain.

So, did it work, or what?

Yeah. It worked. Now that my application will load the correct, updated SQL CE DLLs no matter what’s already on the machine, it no longer experiences the crashing or dead locks. Yay.

The moral of this story is, of course, do your research. Make sure you know the issues with any 3rd party components before you start using them, and make sure you’re using the latest and greatest when possible.

I’ll be sticking with SQL CE for a while, but the research I did on SQL Lite makes me think it may have been a better choice all along. SQL CE 4 looks awesome, but SQL Lite gets me most of the way there and it has the fact it’s probably the most widely deployed database in the world going for it.

Tags: , ,

Software Development