Lazy Singletons in C# - Double Check and Lock Misconceptions

In this article, I want to take a look at two misconceptions relating to the implementation of a lazy singleton in C# (well, .Net really).

  • Double-Check and Lock for thread-safety during initialization.
  • The volatile keyword to “fix” Double-Check and Lock.

The idea for this article came about from a conversation I had on Twitter with Stefan Dokic, when discussing lazy singletons. I urge you to check out his book, “Design Patterns Simplified”. He uses real-world examples for each pattern, so you’ll learn how to use a factory for something besides cats and dogs.

Singleton Pattern - A quick refresher

The Singleton Pattern is a software design pattern which says there must only be a single instance of a given object within the process (in the case of .Net, within a single AppDomain).

A lazy singleton is initialized when it is first accessed. In contrast, an eager singleton is loaded before the first access, usually when the process starts (or AppDomain is loaded).
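
To make the contrast concrete, here is a minimal sketch of an eager singleton (EagerFoo is a hypothetical name for illustration):

class EagerFoo
{
    // Initialized eagerly; the runtime may run this initializer as early
    // as type load, before anyone has asked for the instance.
    public static readonly EagerFoo Instance = new EagerFoo();

    private EagerFoo() { }
}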

Double-Check and Lock

Countless books, blogs, and forum posts tout Double-Check and Lock as a thread-safe way of implementing lazy singletons. In some cases this is true, but in most modern systems, it is not. In large multi-processor systems, it definitely is not. The implementation looks like this:

class Foo
{
    static Foo _instance;
    static readonly object _syncRoot = new object();

    public static Foo GetInstance()
    {
        // First check: skip the lock entirely once the instance exists.
        if (_instance == null)
        {
            lock (_syncRoot)
            {
                // Second check: another thread may have initialized
                // the instance while we were waiting for the lock.
                if (_instance == null)
                {
                    _instance = new Foo();
                }
            }
        }

        return _instance;
    }
}

Looks safe at first glance. The instance is first checked for null. If it is null, a lock is taken to keep other threads out, and the instance is checked for null a second time (the double check) in case another thread grabbed the lock first and initialized the instance while we were waiting. If the instance is still null, we create a new one; otherwise we fall out of the block and return the instance.

So, what’s the problem? The CPU. Specifically, two things:

  • Cache memory consistency in multi-core and multi-processor systems
  • Reordering and other optimizations

Let’s look at cache first

Modern CPUs typically have three levels of cache – Level 1 (L1), Level 2 (L2), and Level 3 (L3). Each core gets its own L1 and L2, while L3 is shared across all cores. It’s important to note that this is per physical CPU. If the computer has multiple CPUs, each CPU has its own set of caches, as these caches are on-die.

The CPU reads and writes data to its registers. That write is then, eventually, propagated to L1, synchronized with the other cache levels, and eventually published to main memory and to other processors. This is not an atomic operation by default! The cache write happens later. In the example above, while the lock prevents other threads on other cores or other processors from entering concurrently, there is no guarantee, once inside, that the write (_instance = new Foo()) will be seen immediately by the other cores and other processors. This is where atomic instructions (like those on System.Threading.Interlocked) and volatile come into play. More on that later.
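
As a preview, here is a minimal sketch of publishing the instance atomically with Interlocked.CompareExchange. Note that this variant may construct a throwaway Foo that loses the race, so it only makes sense when Foo’s constructor is cheap and side-effect free:

using System.Threading;

class Foo
{
    static Foo _instance;

    public static Foo GetInstance()
    {
        if (_instance == null)
        {
            // Atomically set _instance to the new Foo only if it is still null.
            // CompareExchange is a full memory barrier, so the published
            // reference becomes visible to other cores and processors.
            Interlocked.CompareExchange(ref _instance, new Foo(), null);
        }

        return _instance;
    }
}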

Reordering

Even if the writes were atomic, there is nothing stopping the CPU (or even the compiler or JIT) from “optimizing” this code and reordering the writes. If the writes are reordered, the assumption about when threads see each other’s writes breaks down, because the order you wrote the code above may not be the order in which the CPU actually executes those instructions and performs those reads and writes.
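
Conceptually, the single statement _instance = new Foo(); decomposes into several steps, and without a barrier the publish can float ahead of the constructor. This is an illustrative decomposition, not the literal emitted code:

// What you wrote:
_instance = new Foo();

// What the CPU/JIT is conceptually free to execute:
//   1. allocate memory for the Foo object
//   2. _instance = <address of allocation>    // reference published early!
//   3. run Foo's constructor on that memory
//
// A second thread can now observe a non-null _instance, skip the lock,
// and start using a Foo whose constructor has not finished running.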

Volatile

Many articles and forum posts assert that volatile always forces a read/write to main memory, thereby avoiding any consistency issues we may see in cache and memory updates.

This is false!

Microsoft’s documentation clearly states exactly what volatile does and does not do. It prevents the reordering of writes, but does not change how the CPU synchronizes cache writes across its cores or other processors. It will ensure that the data is flushed from the CPU register to cache immediately, which the CPU will eventually write to main memory. The CPU’s cache coherence mechanism ensures caches get updated across cores in the same physical CPU, but there is no guarantee when the data will be available to other CPUs. Unless you are 100% sure that your server does not have multiple physical processors, don’t rely on volatile to save the day here.
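
For reference, the volatile variant that many texts present is the same Double-Check and Lock code as above with one changed declaration:

class Foo
{
    // The only change from the earlier example: _instance is now volatile.
    // This constrains reordering around its reads and writes, but does not
    // change how the write propagates to other physical processors.
    static volatile Foo _instance;

    // ... the rest of the Double-Check and Lock code is unchanged ...
}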

A Safe Way

So what is a safe way? Static initialization. The CLR guarantees that a type’s static initializer runs exactly once (per AppDomain), no matter how many threads race to access the type. So, we can use some .Net static type initialization tricks to implement a lazy singleton:

class Foo
{
    static class Singleton
    {
        // An explicit static constructor tells the compiler not to mark
        // the type beforefieldinit, so initialization only happens on
        // first access of the nested type.
        static Singleton() { }

        internal static readonly Foo Instance = new Foo();
    }

    public static Foo GetInstance()
    {
        return Singleton.Instance;
    }
}

Because a type’s static fields are not initialized until the type is first accessed, internal static readonly Foo Instance = new Foo(); will not execute until Foo.GetInstance() is called. Completely Lazy. Completely Thread-Safe.
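
A quick usage sketch to show the laziness (Debug.Assert is from System.Diagnostics):

using System.Diagnostics;

Foo a = Foo.GetInstance();   // Foo's constructor runs here, exactly once
Foo b = Foo.GetInstance();   // returns the same instance
Debug.Assert(ReferenceEquals(a, b));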

A much simpler way in .Net is to just use System.Lazy<T>:

class Foo
{
    // Passing a factory delegate avoids the reflection-based default
    // constructor lookup that the parameterless Lazy<T>() performs.
    static readonly Lazy<Foo> _instance = new Lazy<Foo>(() => new Foo());

    public static Foo GetInstance()
    {
        return _instance.Value;
    }
}
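
If you prefer the thread-safety guarantee to be explicit rather than implied by the default, Lazy<T> has an overload that takes a LazyThreadSafetyMode (ExecutionAndPublication is the default anyway):

using System;
using System.Threading;

class Foo
{
    // Only one thread ever runs the factory, and every caller
    // sees the same fully constructed instance.
    static readonly Lazy<Foo> _instance =
        new Lazy<Foo>(() => new Foo(), LazyThreadSafetyMode.ExecutionAndPublication);

    public static Foo GetInstance()
    {
        return _instance.Value;
    }
}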

A Note on Virtual CPUs

These days, server-based software often runs on virtual CPUs rather than physical ones – in VMware or Hyper-V on-prem, on a compute instance in the cloud, or even in containers. Virtual CPUs (vCPUs) are not 1:1 with physical cores. They are slices of CPU time managed by the hypervisor and presented to the VM as CPU cores. This means that multiple vCPUs may not be on the same physical die.

Conclusion

Double-Check and Lock will always be “broken” for multi-core and multi-processor environments, especially as presented in most texts. Yes, on a single core, single processor system (do they even make those anymore?), double-check and lock with volatile will work just fine.

Volatile is not the fix in multi-core or multi-processor systems. It does prevent reordering and does force an immediate flush from the register to cache, where it can be snooped by other cores. It does not help at all in multi-processor systems.

Use System.Lazy<T> instead. If you look at the source code, there is a double-check and lock in there, but also a lot of code ensuring as much atomicity as possible and that only one instance can ever be created. The class is over 500 lines of code; far more complex than the textbook double-check and lock examples.

If you don’t want to take the performance hit of the locking and the volatile memory barrier (especially on ARM), lazy static initialization is a safe way to go. It is more performant than lock and volatile and, therefore, more performant than System.Lazy<T>.

I like to play it safe and assume my apps will be deployed to a multi-processor system. Most of what I’m building is in the cloud with no visibility into what I am actually running on. My personal choice when writing singletons? Static Initializers, unless I can let the IoC container control lifetime.
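
For example, with Microsoft.Extensions.DependencyInjection, letting the container own the lifetime is a one-liner (a minimal sketch; the setup code around AddSingleton is illustrative):

using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// The container guarantees a single Foo per container;
// no locking or static tricks needed inside Foo itself.
services.AddSingleton<Foo>();

var provider = services.BuildServiceProvider();
Foo foo = provider.GetRequiredService<Foo>();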

Oh, one more thing: this holds true for both x86 and ARM.

Cheers!

References

The “Double-Checked Locking is Broken” Declaration by Cliff Click and others.

Implementing the Singleton Pattern in C# from C# In Depth by Jon Skeet.

Volatile keyword in C#.

Memory barriers in ARM64 by Kunal Pathak, Microsoft.

Cache coherency Fundamentals (ARM) by Neil Parris.

Cache Coherency by John Wawrzynek, University of California, Berkeley.