A few words about data and concurrency

“I need an atomic integer”

Oh Lord, here we go.

I find it completely astonishing observing what people think they need with regard to concurrent constructs and especially data sharing. It’s super hard to even understand all the considerations, even for experts, which is why not sharing data between threads at all has to be the first line of defense.

But ok, you want to go there, I strongly recommend you don’t, but at least let me tell you what some of the concerns are and maybe I will scare you out of it. That would be good.

This is a good place to remind everyone that what I’m going to write is only approximately correct, mainly in the interest of brevity.

Primitives are already atomic for simple read/write

In virtually all the cases you will encounter on any modern processor every primitive type (e.g. int) is an atomic read or write. It writes or it doesn’t. Most times when people are talking about atomic primitives they actually aren’t referring to atomicity at all, but maybe ordering or observability.

Why call it this at all? Well, maybe because of some of the elementary operations you can expect on an “atomic” type. Even if integer storage is atomic, incrementing it might not be, it’s a read-modify-write operation. So maybe you’d like to have something that you can ++ that has an atomic++ operator on it. You might call that an atomic int right? So I guess the question is, what are the atomic operations you want? How are they offered? Do you want/need a datatype for that?

There are some notable exceptions to atomicity. Generally it requires natural alignment of the data type in question (else it might take two writes to update it) and it must be a processor supported type (e.g. some 32 bit systems have C compilers that offer 64 bit integers, but they aren’t a primitive type as far as the processor is concerned).

Can’t I just fix everything by sprinkling volatile everywhere?

Well, no, actually most of the time people don’t even know what it does and it most likely won’t help you by itself. Briefly, the volatile keyword tells the compiler that the value in question might change without the code seeming to do anything. Classically this was because you were storing something like the address of a memory mapped timer device in a variable.

int *timer = (int *)0xF0001122;int time1 = *timer;...int time2 = *timer;

A clever compiler might remove the second timer read noting that you read the same address and nothing modified it in between. The following was designed to prevent just such a thing.

volatile int *timer = (int *)0xF0001122;int time1 = *timer;...int time2 = *timer;

When volatile is used it tells the compiler that it may not elide any reads/writes from/to the pointer. It must emit exactly one read/write for each appearance in the code and it may not reorder them relative to other volatile operations. Note the consequences of getting this wrong can be very painful. This loop for instance could be quite bad if the timer wasn’t read on every iteration:

int *timer = (int *)0xF0001122;*timer = 0;while (*timer < 1000) ;  // ultra dumb example (might spin forever)

Concurrent programming can provide similar kinds of “changed when I wasn’t looking” behavior because some other thread might be modifying the timer variable. Some compilers (e.g. MSC) were changed to include certain processor memory fences on volatile reads and writes to help with that. Those things are generally gone now. You’re left with volatile means read and write exactly the number of times I specify and never skip or reorder a read or write.

Some microprocessor basics

To understand what’s left, you have to know a few things about modern processors with multiple cores and therefore multiple concurrent threads of execution.

  • when you write data, the data might not be written immediately, processors routinely queue up writes and flush them when it’s convenient
  • the processor might not flush the writes in the order that you did the writing
  • on any given thread of execution if you write data and read it back you will see what you wrote, even if it hasn’t been flushed yet, the processor lies :)
  • any other thread looking at the same data might see a previous value because a pending write has not yet retired
  • you might see writes “out of order” because of the processor shenanigans above

Sometimes when people want to avoid these problems they say “I need an atomic int” — atomicity probably isn’t what they need at all, it’s some kind of write ordering guarantee.

Let me give you a concrete semi-realistic scenario. Suppose I have a simple linked list and one thread adding items to it periodically. For simplicity items are never removed.

The writing thread does this:

Foo *head;Foo *foo = malloc(sizeof(Foo));foo->data = 1;
foo->next = head;
head = foo;

You might be forgiven for thinking that this is a safe thing to do. After all, you safely make a nice fresh foo, then you set it up, and finally point the head at it atomically. The other thread will either see the new head or it won’t, and if you don’t care about a little delay all is well. It will see the new head soon enough. Right?

Well, no. There are several problems here.

There are three writes here, foo->data, foo->next, and head. They can be retired in any order. So there is some chance that the reading thread will see the new head before it sees the new foo->next. That’s bad because there’s junk in your foo object at that point… Similarly foo->data might still have junk in it.

So another service sometimes provided by “atomic” primitives is some kind of write-barrier. Applying this to head would effectively turn the code into something like this:

foo->data = 1;
foo->next = head;
_flush_primitive();
head = foo;

Which means that if you see the new head you will for sure also see the other writes that preceded it. The symmetric thing can be done on the reader side. This are sometimes called “load acquire” and “store release” semantics (store release shown above.)

Importantly, if there is no data ordering necessary, then none of this is needed. For instance if you want to tell a background thread that your user did something and you have no other actions to take no synchronization is needed at all! But this is tricky business. Allow me to illustrate:

int user_did_it = 0;// user does something
user_did_it = 1;
// sometime later, on a different thread
if (user_did_it) { coolness_ensues(); }

That’s fair enough, but this code which is nearly the same won’t work:

int user_did_it = 0;// user does something
prepare_for_coolness();
user_did_it = 1;
// sometime later, on a different thread
if (user_did_it) { coolness_ensues(); }

The problem being that the background thread might not see the prepared coolness before it sees user_did_it. Now you need a fence at least.

And I left off all the volatiles that are needed so that the compiler won’t mess it all up.

Summary

  • volatile gives you a fighting chance at getting concurrency correct because it prevents the compiler from removing and/or reordering and rewriting your operations.
  • volatile isn’t sufficient because the processor is also reordering writes, so you need some operations that give you more sane memory order guarantees
  • sometimes these are called “atomic” services but they often have very little to do with atomicity (except maybe atomic ++ operators and such)

In general this stuff is fantastically hard to get right and so you basically never want to do it. It’s hard enough to get simple critical sections correct…

Written by

I’m a software engineer at Facebook; I specialize in software performance engineering and programming tools generally. I survived Microsoft from 1988 to 2017.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store