How Not To Protect Your Data With Locks

Rico Mariani
8 min read · May 11, 2023


I wanted to write an article on good ways to create thread-safe access to a variety of data structures, but I found myself increasingly frustrated because there are many possible patterns and it's very hard to select any particular one to recommend universally. So instead, I thought I would take a lesson from Peopleware and talk about ways that are almost certainly a bad idea. In that context I can maybe offer some alternatives that are more likely to be successful, and so create some content that might be helpful, which is, after all, the point.

So here are various things that I think are bad ideas pretty much universally.

1. Use lots of ad-hoc locks

No matter the language, no matter the runtime, if you find yourself having to write this sort of thing a lot:

// C#
void Increment() {
    lock (mutex) {
        // putting aside that it's silly to do this for just an int
        count++;
    }
}

// C++
void Increment() {
    std::lock_guard<std::mutex> lock(mutex);
    // putting aside that it's silly to do this for just an int
    count++;
}

You are heading for trouble. The first thing you must remember is that a lock operation in isolation does nothing to protect your data. That's a tough one to swallow, but really, understanding a lock requires non-local knowledge of the code. Specifically, you get nothing unless every other mutation of the protected data (count in this case) also acquires the same mutex. The easiest way to accomplish this is of course if there is only one place where count is modified, so that you're immediately looking at "all" of the mutations because there is only the one. But this is rarely the case.

If you have many different locks that are used in a variety of ways, it becomes increasingly likely that one of the uses will not be quite right, or that you will forget to protect the variable in one of the places.

Using language constructs that strictly limit the visibility and mutability of the protected state is essential to success, but the bigger the protected state becomes, the harder that is to achieve.
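For instance, one way to get that kind of strict limiting in C++ is to make the mutex and the data it protects private members of one small class, so every mutation necessarily goes through the lock. A minimal sketch (Counter is just an illustrative name):

#include <mutex>

// the mutex and the state it protects live together, private to one
// small class, so every mutation goes through the lock by construction
class Counter {
public:
    void Increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        count_++;
    }

    int Get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

private:
    mutable std::mutex mutex_; // protects count_ and nothing else
    int count_ = 0;
};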

2. Nest locks in complicated ways

As soon as there is code that needs to acquire more than one lock, you need some kind of deadlock-avoidance strategy. The more places you do this, the more places you have to get it just right. There are a variety of ways to avoid deadlock; perhaps the easiest is to create a lock order and always acquire the locks in that order (and fail immediately if you try to acquire locks in some other order).

void Foo() {
    // so far so good...
    std::lock_guard<std::mutex> lock1(mutex1);
    std::lock_guard<std::mutex> lock2(mutex2);
    count++;
}

void Bar() {
    // deadlock hazard, requires non-local analysis
    // normally the issue is not "in your face" like this
    std::lock_guard<std::mutex> lock2(mutex2);
    std::lock_guard<std::mutex> lock1(mutex1);
    count++;
}

Understanding and avoiding deadlock situations sometimes requires very broad knowledge of the code and its access patterns, so it is wise to use simple strategies that make deadlocks impossible.
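One such simple strategy, if you're on C++17 or later, is std::scoped_lock, which acquires multiple mutexes using a built-in deadlock-avoidance algorithm, so the textual order at each call site stops mattering. A sketch:

#include <mutex>

std::mutex mutex1;
std::mutex mutex2;
int count = 0;

void FooOrBar() {
    // std::scoped_lock locks both mutexes with a deadlock-avoidance
    // algorithm, so two call sites can list them in either order
    std::scoped_lock lock(mutex1, mutex2);
    count++;
}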

3. Run arbitrary unknown code while holding a lock

It seems tempting to create some kind of general-purpose "do stuff" callback system where you can run whatever code you need under a mutex, but this can lead to huge problems. Probably the most significant issue is simply that if you can run anything, then any caller anywhere might introduce deadlocks because of the code they run. They might not even be aware of exactly which lock or locks are taken by the general code.

// C#
void DoAThing(Action func) {
    lock (mutex) {
        func();
    }
}

// C++
void DoAThing(std::function<void()> func) {
    std::lock_guard<std::mutex> lock(mutex);
    func();
}

In both of these cases func could:

  • take arbitrarily long to run, or,
  • call another similar DoAThing with another mutex, or,
  • who knows what other naughtiness…

In general, you can’t really know that your mutex is doing its job unless you have complete non-local knowledge of how it is used and, in this case, not only do we lack complete knowledge, we’ve no knowledge at all.

Importantly, sometimes simple looking operating system calls actually end up running potentially arbitrary code, and so they are just as bad as the case above. For instance, in Windows if you dared call a harmless looking function like SendMessage you could implicate literally any code via some WndProc or another.

If you need arbitrary code to run, you need to set things up so that it’s more like an async pattern where you acquire the data you need and then call out to the worker having released the lock.

void DoAThing(std::function<void(int, int)> func) {
    int a, b;
    {
        std::lock_guard<std::mutex> lock(mutex);
        // fetch the state under the lock
        a = get_a();
        b = get_b();
    }
    // do the work with the state as it existed; this is a consistent state
    // func should not acquire other protected state
    func(a, b);
}

The above just shows one way that you might be able to reduce the locking. You might also want to combine this with some kind of thread pool so that the func call lands on its own thread with clean parameters, as sketched below. The important aspect for this section, anyway, is that you don't want to hold the lock for a potentially long time while running who-knows-what code.
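As a sketch of that thread-pool variant, std::async can stand in for a real pool here; the snapshot travels by value and the lock is released before the work begins (get_a and get_b are the same assumed helpers as above):

#include <functional>
#include <future>
#include <mutex>

std::mutex mutex;
int get_a(); // fetch pieces of the protected state, as above
int get_b();

std::future<void> DoAThingAsync(std::function<void(int, int)> func) {
    int a, b;
    {
        std::lock_guard<std::mutex> lock(mutex);
        a = get_a(); // snapshot under the lock
        b = get_b();
    }
    // the lock is already released; func runs on another thread with a
    // consistent copy of the state and can take as long as it likes
    return std::async(std::launch::async, std::move(func), a, b);
}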

4. Assume lock acquisition never fails and is timely

Most underlying locks have failure modes.

For instance, the WaitForSingleObject function in Windows returns a DWORD value indicating the outcome of the wait operation. The possible return codes include:

  • WAIT_OBJECT_0 (0x00000000): The specified object is in a signaled state, and the wait operation was successful.
  • WAIT_ABANDONED (0x00000080): The specified mutex object was abandoned by the owning thread.
  • WAIT_TIMEOUT (0x00000102): The wait operation timed out.
  • WAIT_FAILED (0xFFFFFFFF): The wait operation failed due to an error.
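A sketch of what actually handling those codes might look like (AcquireWithTimeout is a hypothetical helper, and 2000 ms is an arbitrary bound):

#include <windows.h>

// returns true only if we own the mutex and the state is trustworthy
bool AcquireWithTimeout(HANDLE hMutex) {
    switch (WaitForSingleObject(hMutex, 2000 /* ms */)) {
    case WAIT_OBJECT_0:
        return true; // we own the mutex; ReleaseMutex when done
    case WAIT_ABANDONED:
        // we do own the mutex, but the previous owner died holding it,
        // so the protected state may be inconsistent; give it back and
        // report failure (a real system might attempt recovery here)
        ReleaseMutex(hMutex);
        return false;
    case WAIT_TIMEOUT:
        return false; // report the error rather than hang forever
    case WAIT_FAILED:
    default:
        return false; // GetLastError() has the details
    }
}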

These failure modes are possible for all waitable handles. But they don’t apply to EnterCriticalSection -- or do they?

typedef struct _CRITICAL_SECTION {
    PVOID DebugInfo;
    LONG LockCount;
    LONG RecursionCount;
    HANDLE OwningThread;
    HANDLE LockSemaphore;
    ULONG_PTR SpinCount;
} CRITICAL_SECTION, *PCRITICAL_SECTION;

That looks remarkably like a waitable handle inside, and it has failure modes… SRWLOCK is completely opaque, but maybe it also has failure modes. It's good to know this stuff...

But more importantly, lock acquisition can take arbitrarily long and this may be totally unacceptable in some contexts. In these cases it’s advisable to always use a strategy that lets you specify timeouts so that you can report errors rather than bring your system to its knees. This is especially true when the lock you are using is widely shared like a generic heap lock or a loading lock.

In C++ STL this might mean using condition variables:

std::mutex mtx;
std::condition_variable cv;
bool isReady = false;

bool WaitForMutexWithTimeout(
    std::unique_lock<std::mutex>& lock,
    const std::chrono::milliseconds& timeout)
{
    auto timeoutTime = std::chrono::steady_clock::now() + timeout;
    while (!isReady)
    {
        if (cv.wait_until(lock, timeoutTime) == std::cv_status::timeout)
        {
            return false; // Timeout occurred
        }
    }
    return true; // Mutex acquired
}

This is more complex than a direct Windows call, but it is portable.
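And if all you need is a bounded wait on the lock itself, rather than on a condition, std::timed_mutex gives you timeouts directly. A sketch:

#include <chrono>
#include <mutex>

std::timed_mutex tmtx;
int count = 0;

bool TryIncrementWithTimeout() {
    // this unique_lock constructor calls try_lock_for, giving up
    // after the timeout instead of waiting indefinitely
    std::unique_lock<std::timed_mutex> lock(tmtx, std::chrono::milliseconds(100));
    if (!lock.owns_lock())
        return false; // report the failure to the caller
    count++;
    return true;
}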

Unbounded lock acquisition can lead to serious problems, especially if the locks do not have clear ownership.

5. Use lock-free operations everywhere

The lock-free choices are by far the hardest to understand and get right in general, although in some special cases they are not so bad.

To do a good job with lock-free data structures you really need a strong understanding of memory models — this knowledge is not common. My “nearly correct” joke version of this is that there are two kinds of people:

  • the kind who know how to write lock-free structures and refuse to do so because they’re suitably terrified, and
  • the kind who have no business writing lock-free data structures

So, in short, if you aren't (1) terrified and (2) being forced to do this metaphorically at gunpoint, you are probably in that second group and you should keep reading until you are terrified. The Dunning-Kruger Effect really tells the story here. Importantly, if you are trying to write portable code you should remember that Intel architectures are far too forgiving because of their strong memory model, and your bugs might not show up until you try things out on processors that have a weak memory model. These days correct ARM code is crucial.

For a good pattern, consider RCU (Read-Copy-Update). This synchronization mechanism is particularly useful in situations where there are many readers and infrequent updates. It involves:

  • Reading: threads can concurrently read data without any synchronization overhead. Readers do not block or contend with each other, and they can freely access the shared data. They acquire a pointer to their data root atomically and then are free to go.
  • Updating: when an update is needed, a writer thread creates a new version of the data structure, leaving the existing version intact and accessible to the readers. The update is performed independently and atomically and does not block readers.
  • Cleanup: when all readers have finished accessing the old version, the writer thread (or any thread, really) can safely free or recycle the old version.
  • It's possible to have many old versions in flight.

RCU can be implemented with just interlocked increments for reference counting (and only when threads acquire access to new readable data), plus a store-release to publish changed data and a load-acquire to read the latest data. Sequential consistency isn't required.

Here’s a picture of how this might work:

root -> x ptr -> complex x data
        y ptr -> complex y data
        z ptr -> complex z data

  • threads read this data as they like; they use load-acquire on root, so if they see a changed root they will also see the changed x, y, z ptr values.

After update:

root' -> x  ptr -> complex x  data
         y' ptr -> complex y' data (maybe partly shared with y)
         z  ptr -> complex z  data

Here an update to y occurred. The writer built a new pointer block, leaving the underlying data unchanged; x and z were re-used. Once the new block was ready, root was changed to root' with a store-release.

  • Readers may still see root for a time and may continue using y.
  • New readers will see the new version quickly.
  • When the last reader has finished with y, it can be deleted, leaving only y'.

Note that y' could likewise be created by partly copying y, just like we did with the root pointer block, and so more fine-grained updates are possible. Basically, you have to ripple all the way up to the root from the point of the change, but that's all.
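Here's a minimal sketch of the publish-and-read part in C++, with std::shared_ptr reference counts standing in for the interlocked increments and C++20's std::atomic<std::shared_ptr> supplying the store-release publish and load-acquire read (Config, Reader, and Writer are illustrative names; a single writer is assumed, and note that some standard libraries implement atomic shared_ptr with an internal lock):

#include <atomic>
#include <memory>

struct Config { int x, y, z; }; // stands in for the "complex data"

std::atomic<std::shared_ptr<const Config>> root{
    std::make_shared<const Config>(Config{1, 2, 3})};

void Reader() {
    // load-acquire a snapshot; we can keep reading it even if a
    // writer publishes a new version in the meantime
    std::shared_ptr<const Config> snap = root.load(std::memory_order_acquire);
    // ... use snap->x, snap->y, snap->z freely ...
}

void Writer(int newY) {
    // copy, modify the copy, then publish with store-release;
    // readers holding the old snapshot are unaffected
    auto oldCfg = root.load(std::memory_order_acquire);
    auto newCfg = std::make_shared<const Config>(Config{oldCfg->x, newY, oldCfg->z});
    root.store(newCfg, std::memory_order_release);
    // the old version is reclaimed when the last reader releases it
}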

Other sharing strategies are also possible; very likely they would rely on some sub-parts of your data being immutable.

If you’re going to attempt to create data structures like the above, you should have a clear understanding of what all the options of std::memory_order mean and when they are needed. Execute while (doubts) read_again();

If this kind of strategy is important in your codebase, then you really want it to be some central clear abstraction that most people can use in a pit-of-success kind of way without having to know the details. This is possible but it requires good planning.

Summary of Red Flags

  • presence of lots of locks
  • presence of nested locks, especially with no total ordering of locks
  • lack of clarity about exactly what code will run in locks
  • no error handling in the core lock code
  • presence of lots of ad hoc atomics, all of which use the strongest memory order

Any of these things are likely to bite you.

Generally, the best advice I can give is to pick one or two simple strategies and use them consistently. In most cases you want to be able to use your memory with impunity so data structures with excellent isolation properties (e.g., immutable, or mostly immutable plus RCU) are solid choices to build on. Another choice is to always do certain kinds of work on designated threads and use async queues to communicate. These can be great patterns, but there are other great patterns.
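As a tiny sketch of that second pattern, here is a single designated worker thread draining a queue of posted jobs, so callers send messages instead of sharing memory (WorkQueue is an illustrative name, not a library type):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// callers never touch the worker's data; they just post jobs and all
// the "real" work happens on one designated thread
class WorkQueue {
public:
    WorkQueue() : worker_([this] { Run(); }) {}

    ~WorkQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    void Post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty())
                    return; // done_ is set and the queue is drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job(); // run outside the lock, per section 3
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;
    // worker_ is declared last so the other members exist before Run starts
    std::thread worker_;
};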

The overriding theme of these bits of advice is to limit cases where you need omniscience to know your solution is working and will continue to work as it composes with other parts of your system and its adjacent systems.


Rico Mariani

I’m an Architect at Microsoft; I specialize in software performance engineering and programming tools.