Damian Hickey

Mostly software and .NET related. Mostly.

RavenDB performance in your automated tests

In our automated tests, we usually have a single place that sets up a configured in-memory Raven database, which might look something like this:

public static EmbeddableDocumentStore CreateDocumentStore(Assembly assembly)
{
	var store = new EmbeddableDocumentStore();
	store.UseEmbeddedHttpServer = false;
	store.RunInMemory = true;
	store.Initialize();
	IndexCreation.CreateIndexes(assembly, store);
	return store;
}
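
For context, each test then typically spins up its own store. Something roughly like this (Orders_ByCustomer is a made-up index name and the xUnit-style test body is purely illustrative):

[Fact]
public void Can_query_orders_by_customer()
{
	// Every test pays the full cost of creating the document store, including index compilation.
	using (var store = CreateDocumentStore(typeof(Orders_ByCustomer).Assembly))
	using (var session = store.OpenSession())
	{
		// ... arrange, act, assert against the in-memory store ...
	}
}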

Over time, I noticed that our tests were getting slower and slower. It turned out that the IndexCreation process, which uses code generation under the hood, is slow, adding ~70 milliseconds per index per test. Multiply that by 10s of indexes and 1000s of tests and you've got quite a performance hit, both on single test runs and on test sessions (CI / NCrunch / Mightymoose / R# Test Session, etc.).

To improve the performance of this, I removed "IndexCreation.CreateIndexes(assembly, store);" and made it so that we only put the exact index definitions that a test needs into the document store, selected by a predicate. This minimises the document store creation time for an individual test.
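
A rough sketch of what that looks like, assuming the RavenDB 1.x embeddable client (the exact Execute overload may differ between client versions, and the Orders_ByCustomer usage below is hypothetical):

// Requires: System, System.Linq, System.Reflection, Raven.Client.Embedded, Raven.Client.Indexes
public static EmbeddableDocumentStore CreateDocumentStore(Assembly assembly, Func<Type, bool> indexPredicate)
{
	var store = new EmbeddableDocumentStore();
	store.UseEmbeddedHttpServer = false;
	store.RunInMemory = true;
	store.Initialize();

	// Only create the index definitions this test actually asked for.
	var indexTypes = assembly.GetTypes()
		.Where(t => typeof(AbstractIndexCreationTask).IsAssignableFrom(t) && !t.IsAbstract)
		.Where(indexPredicate);

	foreach (var indexType in indexTypes)
	{
		var index = (AbstractIndexCreationTask)Activator.CreateInstance(indexType);
		index.Execute(store);
	}

	return store;
}

// Usage in a test that only needs one index:
// var store = CreateDocumentStore(typeof(Orders_ByCustomer).Assembly,
//                                 t => t == typeof(Orders_ByCustomer));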

But I think we can do better...

When running multiple tests in a session, we are often re-creating the same index over and over. Or rather, Raven is emitting the same code, and compiling and loading the same named type, over and over. Adding a static Dictionary<string, Type> cache to the static Compile() method improved the performance of provisioning an embeddable document store on second and subsequent test runs (if those tests used the same indexes) by at least a factor of 10. That is, second-test run times (for a test that used 2 indexes) went from >200ms to ~20ms. This also reduces the impact of a developer failing to properly restrict which indexes are created in a test.
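
The hack boils down to something like the following. This is an illustration of the caching idea rather than RavenDB's actual Compile() method, and CompileAndLoad is a hypothetical stand-in for the existing code-generation and compilation path:

// Requires: System, System.Collections.Generic
// Note: not thread safe as written.
private static readonly Dictionary<string, Type> CompiledTypeCache = new Dictionary<string, Type>();

private static Type Compile(string source, string typeName)
{
	Type cached;
	if (CompiledTypeCache.TryGetValue(source, out cached))
	{
		// Same generated source as a previously created index:
		// skip compiling and loading another assembly entirely.
		return cached;
	}

	Type compiled = CompileAndLoad(source, typeName); // the existing (slow) code-gen, compile and assembly-load step
	CompiledTypeCache[source] = compiled;
	return compiled;
}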

(Side note: a positive side effect of this optimisation is a reduction in the memory that is 'leaked' by dynamically loading assemblies.)

While there will probably be issues with my hack of a compiled-type cache (thread safety, for one), I am going to try to get this improvement, or something similar, officially supported.

Update: Well, that was quick. The optimization has been added to RavenDB and will be available in a build soon. Apparently it knocked 5-7 minutes off Raven's build time. :)

RavenDB NuGet Packages: Coarse grained vs fine grained and sub-dependencies

This topic has recently come up on the RavenDB list (1, 2) and Ayende's blog. I've been down the road of fine-grained packages (for internal code) and back again, so this is my opinion based on recent experience. The current position of the RavenDB team is that they want to avoid having 10s of NuGet packages.

So, are 10s of NuGet packages really a problem and, if so, for whom and how?

The Package Consumer

From the package consumer side, fine-grained packages allow them to pick and choose precisely what they want without crufting up their project's references with extraneous ones. (Unnecessary references are a pet hate of mine.) There are a number of OSS projects that appear to be doing this successfully, such as Ninject, ServiceStack and NServiceBus.

One of the consumer's concerns, if they do pick two packages where one is dependent on the other, is that of package versioning and updating. If they were to pick RavenDB-Client and (a hypothetical) RavenDB-Client(Debug), they would expect that at the precise moment one package is updated, the other is updated too, so that updating a solution stays easy. That is, unless the RavenDB team is exercising flawless semantic versioning, which I doubt.

The other concern, regardless of a coarse-grained or fine-grained packaging strategy, is that of package sub-dependencies. Despite the best intentions of authors with semver and specifying package version ranges in nuspec files, this is an area of brittleness, as demonstrated by a recent log4net package update. Also, specifying a package dependency because your package uses it internally unfairly makes your consumer dependent on it. Introduce another package that has the same dependency but perhaps a different version and they are at risk of runtime exceptions, deployment conflicts and having to perform brittle assembly redirect hacks.

Currently, adding a RavenDB-Client package to a class library adds the following 8 references:

[image: the project references added by the RavenDB-Client package]

… and the following package dependencies:

[image: the package dependencies pulled in by RavenDB-Client]

My fresh class library is now dependent on a specific logging framework, some MVC stuff that has nothing to do with what I am doing, and a Community Technology Preview library that may or may not have redistribution licence concerns. This isn't a great experience. A brief analysis:

  1. AsyncCtpLibrary's usage is entirely internal to Raven.Client.Lightweight, so it could be ILMerged and internalized. Example projects that take this approach include AutoMapper and Moq.
  2. Newtonsoft.Json is exposed through Raven.Client.Lightweight’s exported types so is a real dependency.
  3. NLog? There is a better way.
  4. Raven.Abstractions appears to contain both client-side and server-side concerns. The client-side ones could be extracted and placed into Raven.Client.Lightweight and referenced by the server-side assemblies. (Perhaps; I don't know enough to say for sure.)
  5. Raven.Client.MvcIntegration and .Debug are entirely optional and could be provided by separate packages, if I wanted them.
  6. System.ComponentModel.Composition could probably be removed if the server side abstractions were not part of the client package.

The end result, in my opinion, should look like this:

[image: the proposed, much leaner, set of references]

If the concerns of minimizing sub package dependencies and lock-stepping of package releases are addressed, then I believe that fine-grained packages are desirable to a package consumer.

Package Producer

The primary concern on the producer side is one of maintenance. Adding additional fine-grained package specifications to a single solution does have a cost, but I’d argue that it’s worth it when considering benefits to the consumer.

Where things do get difficult fast for the producer, though, is when the fine-grained packages are independently versioned. Previously I said I doubted Raven is doing flawless semantic versioning. I doubt anyone is doing it flawlessly, because there is no tooling available to enforce it and you can't rely on humans. I've tried the automatic CI-based package updating "ripple", where Solution B, which produces Package B but depends on Package A from Solution A, automatically updates itself when a new version of Package A is produced. It didn't work reliably at all. If the producer has a lot of fine-grained solutions and they have a lot of momentum, package management quickly becomes a mess and a massive time sink.

But if the package producer is using a single solution (as is the case with RavenDB) and releases all the fine-grained packages at the same time, the cost of supporting fine-grained packages is not prohibitive. This is the approach currently taken by ServiceStack.

Extraneous Project References

It’s an attention to detail thing, and for me it’s one of those telling indicators when assessing a code base and those who wrote it.

This is the default set of references when creating a class library in VS2010:

[image: the default references for a new class library in VS2010]

Tell me, have you ever used System.Data.DataSetExtensions? When was the last time you actually used ADO.Net DataSets? 2008? 2005?

Having a reference is explicitly stating “I depend on this and I can’t and won’t work without it.”

When I review a project I haven't seen before, the very first thing I do is check its references. This instantly gives me an indication of how coupled the application / library is to other assemblies, and of what potential difficulties and pain there will be in versioning, maintenance and deployment. I also take into account the assembly type. Applications are at the top of the stack, so their output is unlikely to be depended on. For these, I am less concerned about the number of references, but I would still be concerned about diamond dependencies from a maintenance perspective.

Framework assemblies, such as ORMs, loggers, data clients, etc., are further down the stack, and I am far more critical of them. Any sort of non-BCL dependency in these will undoubtedly cause pain for the application developer. Your 3rd party ORM has a dependency on a certain version of one logger, but someone else's data client uses a different version? Then you are in a world of odd runtime exceptions, deploy-time conflicts and assembly redirect hacks.

And forget about relying on accurate semantic versioning. There isn't the tooling in .NET land to analyse assemblies and 'break the build' if there has been a breaking change without a corresponding bump in the version number. You have to rely on the authors of your 3rd party assemblies being extra vigilant. Good luck with that.

In the end though, if there is a reference there and it’s not even being used, well, that is just sheer laziness.

Getting ReSharper to natively support xUnit

So I started a little online campaign yesterday to get the JetBrains folks to natively support xUnit in ReSharper. It already supports NUnit and MSTest (ugh) so what are the business cases for supporting xUnit?

  1. xUnit was created in part by James Newkirk, the same guy who was involved in creating NUnit. He details his lessons learned here, and thus it can be argued that xUnit is a more modern take on NUnit.
  2. On NuGet gallery, NUnit has ~75K downloads compared to xUnit's ~14K, about 19% of NUnit's total. Not an insignificant market share.
  3. xUnit stats show a slow, but steady increase in adoption rates.
  4. xUnit-contrib, a project whose predominant purpose is to provide a plugin for the ReSharper runner, also shows steady adoption. This project always lags behind ReSharper releases and is completely unusable during EAP phases due to API churn.
  5. xUnit, the current version being 1.9, has had a version-independent runner since 1.6. Thus ReSharper and the developer won't be constrained to any particular xUnit version, as long as it's 1.6 or later.

I hypothesized that xUnit adoption is being held back because ReSharper doesn't natively support it. That is, people are sticking with what they have because using something else is simply too much friction. JetBrains ran a (non-scientific) poll of the NUnit/MSTest user base to see who would move to another unit test framework if the friction were removed. As of posting, this is the current result:

The interesting thing is that over 54% of respondents would make a move if there were no friction. And, at 39%, I conclude that native xUnit support is a feature ReSharper's customers want. I hope JetBrains arrive at the same conclusion.

To vote on the official issue, do so here: http://youtrack.jetbrains.com/issue/RSRP-205721

Getting Visual Studio 11 Beta to recognize Visual Studio 2010 plugins

This has worked for me to get the Code Contracts and VisualHg plugins working in VS11 Beta. YMMV.

  1. Install VS11 Beta
  2. Install plugins
  3. 'Repair' (re-install) VS11 Beta

Works on my machine

Decrapify your Visual Studio installation

Fed up with Visual Studio spamming your Programs and Features list? Make your voice heard over here.

In the meantime, a lot of the MSIs are silently uninstallable, so you can use PC Decrapifier to remove them in one go.