A discussion of C++ pointer hazards with details

Rico Mariani
15 min read · Apr 11, 2023


I want to talk about pointer hazards today, and I thought this would be a good time to introduce the various assumptions compilers can and can't make, and therefore the assumptions engineers can and can't make. We're going to end up discussing shared_ptr<foo> and threading considerations, but I wanted to start by laying some foundation. These are things people don't usually think about a whole lot.

First let’s consider a small program and we’ll look at the code that’s generated and discuss why and then I’ll give you a more brutal example that illustrates why things have to be the way they are.

When looking at this code I want you to remember that const int *i doesn't mean that the value can't change; it only means that you can't change it through that particular pointer. The compiler is all-too-aware of this.

#include <stdio.h>

__declspec(noinline)
void print_twice_a(const int *i) {
    printf("1: %d\n", *i);
    printf("2: %d\n", *i);
}

__declspec(noinline)
void print_twice_b(const int* i) {
    int d = *i;
    printf("1: %d\n", *i);
    printf("2: %d\n", d);
}

int main()
{
    int i = 5;
    print_twice_a(&i);
    print_twice_b(&i);
}

Let’s have a look at the disassembly for these two:

void print_twice_a(const int *i) {
// boring frame setup, save rbx
01070 push rbx
01072 sub rsp,20h
printf("1: %d\n", *i);
// load *i from [rcx], put it in edx where arg2 goes for the call
01076 mov edx,dword ptr [rcx]
// stash the pointer in rbx, we'll need it later
// rbx is preserved across calls
01078 mov rbx,rcx
// put the format string in rcx for the printf call
0107B lea rcx,[string "1: %d\n"]
// print
01082 call printf
printf("2: %d\n", *i);// we must read the pointed at value again because
// printf might have changed it!
// the compiler doesn't know what printf does and
// it could mutate the underlying value
01087 mov edx,dword ptr [rbx]
// load the format string
01089 lea rcx,[string "2: %d\n"]
}
// fix the stack
01090 add rsp,20h
01094 pop rbx
printf("2: %d\n", *i);
// tail call to printf
01095 jmp printf

The compiler can't see the body of printf so it has to assume that printf might have access to the pointed-at value and might mutate it. I'll give you a simple example where a similar mutation happens below. It will be an ugly example, but what you have to understand here is not that there are ugly code bits out there, but that in real-world code, with various functions having access to their this pointer, and through it other data structures, it's quite common for pointers to be mutated out from under you. To avoid this sort of thing you have to be able to make strong assumptions about how your code works. This is the first example where non-local reasoning is necessary to know some outcome: you can't tell that it's safe to use a pointer twice without knowing non-local facts like "this int is never rewritten". The trouble is of course that:

  • it’s hard to know all such facts
  • they have this horrible habit of changing, rendering previously correct code buggy

To see a different outcome have a look here:

__declspec(noinline)
void print_twice_b(const int* i) {
    int d = *i;
    printf("1: %d\n", *i);
    printf("2: %d\n", d);
}

And in this version *i is not reloaded; have a look:

void print_twice_b(const int* i) {
010A0 push rbx
010A2 sub rsp,20h

int d = *i;
// we fetch the pointer value into ebx and save it this is "d"
010A6 mov ebx,dword ptr [rcx]

printf("1: %d\n", *i);
// we set up rcx and edx as usual for printf
010A8 mov edx,ebx
010AA lea rcx,[string "1: %d\n"]
010B1 call printf

// now we don't have to re-read the pointer value, we can use
// the stashed value because that's in a local variable
010B6 mov edx,ebx
010B8 lea rcx,[string "2: %d\n"]
}

010BF add rsp,20h
010C3 pop rbx
printf("2: %d\n", d);
010C4 jmp printf

As you can see, at address 010B6 we didn't have to dereference the pointer. Of course, in this case we're talking about a trivial amount of code and even the performance hit for re-reading is likely to be small. That isn't the point; what I want you to take away from this is that things can change out from under you and the compiler knows this. Here's an offensive example of how it could happen. This is not code anyone would write as-is, but the general idea is that the function printArg that you are calling might mutate things in a way that you can observe through your const pointer.

int g_i = 5;

__declspec(noinline)
void printArg(int i) {
    // horrible side-effect for illustration only
    // never do this :D
    g_i++;
    printf("%d\n", i);
}

__declspec(noinline)
void print_twice_c(const int *i) {
    printArg(*i); // prints 5
    printArg(*i); // prints 6
}

int main() {
    print_twice_c(&g_i);
}

From here on out I won't include all the codegen because in most cases it's obvious given what we've seen before and I'm not trying to make comments on size today. Suffice it to say that the pattern used in print_twice_a works here: the compiler reads the pointer twice, assuming that *i might be different for the 2nd call. In this case it will be different, and the code is correct. Anyone who has worked on a compiler back-end is no doubt painfully aware of how little you can assume about pointers and maintain correctness. (Rust gets some advantages here because of declared lifetimes.)

Now let’s mess things up a little bit more. Consider this function, and the question “Is it the case that when we do the printf, we always have a == c?"

__declspec(noinline)
void print(const int *i, int *j) {
    int a = *i;
    int b = (*j)++;
    int c = *i;
    printf("%d %d %d", a, b, c);
}

If we look at the code, we’ll see that the compiler doesn’t think so:

void print(const int *i, int *j)

// no frame needed
// compute 'b' using *j (in rdx) store it in r8 (scratch)
01070 mov r8d,dword ptr [rdx]
// compute 'a' using *i (in rcx) store it in r10 (scratch)
01073 mov r10d,dword ptr [rcx]
// do the increment storing the result in eax
01076 lea eax,[r8+1]
// store the incremented value back in *j (still in rdx)
0107A mov dword ptr [rdx],eax

int c = *i;
// 'a' needs to go in edx, arg 2, we already have it in r10
0107C mov edx,r10d
// finally we compute 'c', we have to use *i again (still in rcx)
// we can't use the value in r10d, *i might be different now!
0107F mov r9d,dword ptr [rcx]
// at this point we have edx = a, r8d = b, r9d = c
// we just need the format string
printf("%d %d %d", a, b, c);
// get the format string
01082 lea rcx,[string "%d %d %d"]
// tail call printf
01089 jmp printf

OK, again there is a very subtle code thing going on here. When we computed c we correctly used *i again. Why? Because the compiler also has to deal with aliasing. It's possible that i and j are the same value, or in general that they somehow overlap. Maybe i and j both point into the same array, so certain offsets you read might be the same memory.

Here is a simple set up that forces the issue:

int main()
{
    int i = 1;
    print(&i, &i);
}

That code ends up printing 1 1 2 and once again we see that const only means that you can't change the indicated value with that particular pointer. Not that it won't change. It might even change by your own hands. Note that in this case there weren't even any intervening pieces of unknown code! Even though the compiler could see the entire code flow it couldn't assume i != j or in general that there is no overlap. Because in general, there can be overlap.

These are just normal pointer hazards that every compiler has to know how to deal with.
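As an aside, if you genuinely know two pointers can never overlap, most compilers let you promise that. Here's a minimal sketch assuming MSVC's __restrict extension (C99 has restrict; portable C++ has no equivalent keyword); with the promise in place the compiler is free to assume *i didn't change:

__declspec(noinline)
void print_no_alias(const int* __restrict i, int* __restrict j) {
    int a = *i;
    int b = (*j)++;
    int c = *i;                 // the compiler may now reuse the value it loaded for 'a'
    printf("%d %d %d", a, b, c);
}

// Calling print_no_alias(&i, &i) would break the promise: the compiler is
// trusting you, and the output could silently be wrong.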

Now let's talk about threading for a second. Generally the compiler's position on threading is "What's threading? Never heard of it." Which is to say, it generates code that would be correct if there were only ever one thread, and if that isn't the case then it's not the compiler's problem to make sure the right locking happens. This is not 100% true but it's pretty close (there's stuff like thread-local storage in some compilers, for instance). On the other hand, it's quite normal for runtime libraries, like say the STL, to take some position on threads in some areas. Sometimes they even include thread-pool functions and create affordances for thread-safe data structures. The STL even offers std::atomic to help you do more. I'm not going to go into all of those, but I do want to focus on what safety you do and don't get from things like shared_ptr and the common idiom const shared_ptr<foo> &p.

A common theme you should look for in these examples is that "this stuff doesn't actually work in general but, with reasonable lifetime assumptions, it works pretty well." I'm going to push the boundaries a bit to illustrate the sort of implicit assumptions we normally make. The point of this isn't to claim that shared_ptr made bad choices but rather to show you the rough edges so that, if weird things are happening in a scenario of yours, you'll be well informed about what sorts of things to look for to find the source of your problems.

I’m going to use one example consistently throughout this part and we’ll imagine different threading and lifetime situations. Some will be very unfortunate indeed, but that’s all part of the fun.

Consider these two classes:

struct B {
    int b;
    B() { b = 2; } // lame, but whatever
};

struct A {
    int a;
    std::shared_ptr<B> pb;
    A() { a = 1; pb = std::make_shared<B>(); } // lame, but whatever
};

We're going to consider a bunch of methods that deal specifically with B and we'll talk about how they work, when they work, and what assumptions were made. Importantly, I'm going to pick on the shared_ptr cases pretty hard because shared_ptr isn't thread agnostic; it gives you some promise that you can use it in a thread-safe way. This isn't a lie, it does help, but it's not a complete solution either; it's only part of the solution. You need other things in place to get the rest of the way there, and sometimes those things are implicit. And sometimes shared_ptr isn't actually helping you at all.

Let’s start with this one:

void do_it_1(const std::shared_ptr<B>& rb) {
    printf("%d\n", rb->b);
}

Now how safe is this? It looks pretty safe, right? Well, the assumption is that your caller has some shared_ptr and you are borrowing it. You're relying on the caller to keep a reference for you, some reference the caller knows will not go away during your call. Perhaps surprisingly, this is actually a rather tall order.

For instance:

void caller_1a() {
    auto pb = std::make_shared<B>();
    do_it_1(pb);
}

This works because caller_1a knows for sure that it is keeping the shared B alive. So it can loan the pointer.

How about this one:

void caller_1b(const std::shared_ptr<B>& pb) {
    do_it_1(pb);
}

Well, that’s still pretty good, caller_1b was promised that pb would be good for its life and it lent that promise to someone else. Let's hold on this one and think about it some more later.

How about this one?

void caller_1c(const std::shared_ptr<A>& pa) {
    do_it_1(pa->pb);
}

This one is much less good. caller_1c was given an A and promised that the A would survive. However, nobody said anything about the embedded B. For this to be ok, the code that called caller_1c needs to make an even more complicated lifetime promise, one that covers A and its transitive closure. Remember, we have a simple example here but in general A could have any number of maybe-nested sub-parts. Also the A could be your this pointer.

The reason this stuff usually just works is that we've made important assumptions about when A's storage is modified and how long B lives. If we take away those assumptions, things explode pretty badly.

Let’s go back to do_it_1:

void do_it_1(const std::shared_ptr<B>& rb) {
    printf("%d\n", rb->b);
}

Can we, in general, assume that rb->b is safe here? The answer is a resounding no.

Consider:

  1. We enter do_it_1 with a shared_ptr reference provided by caller_1c
  2. We context switch out and are suspended for several milliseconds
  3. Meanwhile, another thread mutates the A's pb member, maybe even setting it to nullptr (this is allowed!)
  4. We resume
  5. We now find that rb could refer to some other value, maybe nullptr, maybe something else.
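To make that concrete, here's a minimal sketch of the race; the names g_a and writer_thread are hypothetical and the scheduling is imagined, but nothing here is illegal on its own:

std::shared_ptr<A> g_a = std::make_shared<A>();

void writer_thread() {
    g_a->pb = nullptr;   // perfectly legal: pb is a non-const member of the A
}

void reader_thread() {
    caller_1c(g_a);      // caller_1c passes g_a->pb by reference into do_it_1
}

// If writer_thread runs while do_it_1 is still using its const shared_ptr<B>& rb,
// the only strong reference to the B disappears: rb now refers to a shared_ptr
// holding nullptr and the B it used to point at has been destroyed.
// (The unsynchronized write to the shared_ptr member is itself a data race.)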

Now if we're aware of threads generally we know we need some kind of atomicity and isolation story to get meaningful semantics. Fair enough.

But we’re just talking about shared_ptr here today. The real question is: given all of this, is do_it_1 actually any better than this dumber alternative?

void do_it_2(const B &rb) {
    printf("%d\n", rb.b);
}

or equivalently:

void do_it_3(const B *pb) {
    printf("%d\n", pb->b);
}

I think the answer is no. Both do_it_2 and do_it_3 require exactly the same promise as do_it_1. The shared pointer reference did not help us to write do_it_1 safely in any way. It did not provide additional non-null guarantees and if anything it made us more susceptible to aliasing problems. Note that the above are not exactly equivalent because they have different (better?) aliasing issues. And note that all the solutions so far have potential problems with null pointers and wild pointers because in the caller_1c pattern the A could be mutated or even deleted!

Note also that we don’t have to drag threads into this; side-effects are enough to ruin our assumptions. A more normal do_it method might look like this:

void do_it_4(const std::shared_ptr<B>& rb) {
    call_some_stuff(args);
    printf("%d\n", rb->b);
}

Keep in mind the args could be constants, member variables of the class where do_it_4 lives, its this pointer, or some globals or other ambient things it has access to. The net of this is that, as we saw with print_twice_a, we can't assume that rb->b is unchanged. Compared to when we entered do_it_4, rb->b could be changed, null, or even invalid.
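You don't even need a second thread for this to bite. Here's a minimal single-threaded sketch; g_a and this particular body for call_some_stuff are hypothetical, but the shape is the caller_1c pattern from before:

std::shared_ptr<A> g_a = std::make_shared<A>();

void call_some_stuff() {
    g_a->pb = nullptr;         // drops the only strong reference, destroying the B
}

void do_it_4(const std::shared_ptr<B>& rb) {
    call_some_stuff();         // our caller lent us g_a->pb, which this just cleared...
    printf("%d\n", rb->b);     // ...so rb is now a reference to a null shared_ptr: boom
}

int main() {
    do_it_4(g_a->pb);          // lends a reference to the shared_ptr member inside *g_a
}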

We can do better with this pattern:

void do_it_5(const std::shared_ptr<B>& rb) {
    int b = rb->b;
    call_some_stuff(args);
    printf("%d\n", b);
}

In this version at least we've removed side-effects due to call_some_stuff as a source of problems, and we need a comparatively short-duration promise from our caller (whatever that means). This pattern is actually used in many user-mode to kernel-mode layers. In that context you must capture your arguments before you validate them, and then use only the captured values. Otherwise a nefarious caller might get you to validate arguments and then mutate them on another thread, so that you validate one set of arguments but do the actual work on some other set. This change of argument values could violate your access rights or even allow control of the kernel via (e.g.) a buffer overrun caused by previously validated arguments subsequently being changed to invalid arguments.
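Here's a minimal sketch of that capture-first shape; the names handle_request, user_len, and user_buf are hypothetical, nothing kernel-specific, just the pattern:

#include <cstring>  // memcpy

bool handle_request(const size_t* user_len, const char* user_buf,
                    char* out, size_t out_size) {
    size_t len = *user_len;        // 1. capture once
    if (len > out_size)            // 2. validate the captured copy
        return false;
    memcpy(out, user_buf, len);    // 3. use only the captured copy; never re-read *user_len
    return true;
}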

The thing is, the short duration promise mentioned above isn’t really necessarily short duration at all; we could context switch on entry and be delayed arbitrarily long. And, once again, flowing the shared_ptr did nothing to help us write correct code. We're only saved by complex non-local reasoning in both the caller and the callee. Only clear, simple, assumptions about lifetime can make this sort of reasoning possible.

Now I can’t resist pointing out one more thing, you might be thinking, “but surely if I pass in a shared_ptr reference I can save myself by creating my own shared_ptr to what I need.” Like so:

void do_it_6(const std::shared_ptr<B>& rb) {
    std::shared_ptr<B> pb = rb;
    call_some_stuff(args);
    printf("%d\n", pb->b);
}

And with this code you suggest, “Now maybe call_some_stuff somehow transitively ends up modifying my B but at least no wild pointers, right?"

Well, no.

Constructing a std::shared_ptr<B> from a const std::shared_ptr<B>& does these three things:

  • check if the referred pointer is null
  • interlocked increment on the strong reference count if the pointer was not null
  • copy the pointer (data pointer and control block pointer) to the target

This isn’t enough to be thread-safe in general.

The problem is that even if the null check and the increment were one atomic operation (which they are not) you could still lose the object before the pointer was copied.

When a weak_ptr is upgraded to a strong pointer there is a complex loop powered by an interlocked compare-exchange operation that safely upgrades from weak to strong. Importantly, this can fail, and you get a suitable success code that tells you if it has.
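For contrast, here's a minimal sketch of using that upgrade when you genuinely have a weak_ptr (the names wb, pb_strong, and use_b are just for illustration):

std::shared_ptr<B> pb_strong = std::make_shared<B>();
std::weak_ptr<B> wb = pb_strong;             // non-owning observer of an existing shared B

void use_b() {
    if (std::shared_ptr<B> pb = wb.lock()) { // atomic upgrade; yields empty if the B is already gone
        printf("%d\n", pb->b);               // safe: pb keeps the B alive for this scope
    } else {
        printf("B is already gone\n");
    }
}

The key difference is that lock() can fail gracefully; a borrowed const shared_ptr<B>& parameter cannot.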

When passed a shared_ptr reference we effectively have a weak pointer. With a different STL we might try to upgrade it atomically like we do with other weak pointers, but that isn't what std::shared_ptr does. You see, we in fact have something worse than a weak pointer, because we don't even have a reliable control block pointer to tell us the object is dead. The loopy upgrade to strong that worked so well for weak_ptr might even crash if we attempted it without a viable control block.

We can do the upgrade written above safely in many cases, but only if we can make sensible lifetime assumptions about the objects in question.

Now you may be saying “OK fair enough, Mr. Pedantic, but surely if that isn’t enough, then this will save us.”

void do_it_7(const std::shared_ptr<B> pb) {
    printf("%d\n", pb->b);
}

This has to be totally safe right?

Well, no. You have to now ask yourself where did this shared pointer pb come from? You see what actually happened here is we just transferred the responsibility of creating the shared pointer to our caller. Now there are two situations:

  1. the caller has “good reasons” to know that their shared_ptr reference or copy is durable and can use it to make a new shared_ptr, or
  2. they don’t

If (1), they didn't have to give you a shared_ptr in the first place; they could have lent you their copy via a B* or B& as we saw before. If (2), they are just as screwed making the shared_ptr as you were. That code will have to rely on the same non-local lifetime assumptions your code would need to do its work.

In general, you're always going to be dealing with three problems:

  1. Side effects caused by ambient authority (access via globals, or big objects that are common)
  2. Aliasing
  3. Lifetime issues due to other threads

Now if you’re using a language like Rust, their position is that you have to declare lifetimes and then they can enforce that you don’t screw it up.

If you were using a language like C#, and say we have this situation:

void do_it_8(B b) {
    Console.WriteLine("{0}", b.b);
}

You can't really go wrong here. In managed code there is no reference counting going on. The do_it_8 function might get a null reference (so maybe add a null check), but once the B arrives in your function it won't go away on you. Furthermore, managed code frequently uses immutable patterns, which are readily apparent in the class definition, so you might be very thread-safe as well.

In the C++ world you could also reach for immutable patterns to help solve some of these issues, but they aren't really omnipresent, and they are often quite inconvenient. In C++ not even std::string is immutable. Still, some clear isolation pattern, maybe something like a transaction providing lifetime scope and data consistency, will give you good results.

In short, shared_ptr gives you some foundational things to build a correct solution upon. It isn't a full solution by any means. In some ways, the presence of shared_ptr is a good red flag that ownership is complex, and you should be looking carefully to see what the lifetime model is and what the atomicity and isolation rules are if threads are involved, otherwise you'll get into trouble.

I’d like to close with a few words on unique_ptr and we'll tweak the example just a bit to do that, thus:

struct D {
    int d;
    D() { d = 2; } // lame, but whatever
};

struct C {
    int c;
    std::unique_ptr<D> pd;
    C() { c = 1; pd = std::make_unique<D>(); } // lame, but whatever
};

Interestingly, the unique_ptr provides a stronger guarantee than the shared_ptr; well, sort of.

In this case everything in sight is public and so you are faced with all the same sorts of issues as we saw above. But there is an implicit promise that the lifetime of pd here is strongly controlled by C and so in some sense you can feel much better about relying on a C pointer than you could relying on an A pointer.

In the A case the B was declared as shared and so you already get this hint that "complex stuff" is probably going on. With the analogous C class you have some reason to believe that D has simpler rules, especially if pd was private rather than public. All the same things can happen in both variations, but, probably, the lifetime arguments will be simpler if you are dealing with unique_ptr. Importantly the unique pointer is not necessarily immutable. But making it so, if possible, would be a fine pattern and would lend much clarity to your code.
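Here's a minimal sketch of that idea, using hypothetical C2/D2 variants where the D can never be replaced or mutated once the C is built:

struct D2 {
    const int d;
    explicit D2(int v) : d(v) {}
};

struct C2 {
    const std::unique_ptr<const D2> pd;   // can never be reseated, and *pd can never change
    int c;
    C2() : pd(std::make_unique<const D2>(2)), c(1) {}
    // Any D2* or D2& borrowed from pd stays valid for as long as this C2 is alive,
    // which makes the lifetime argument purely about the C2 itself.
};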

Hopefully this discussion was somewhat illuminating. The std::shared_ptr type can be helpful in creating a thread-safe solution but it's only one small piece of a more complete answer. How and when you share, and how you solve your atomicity and isolation problems (e.g. with immutability) will likely be just as complex as it ever was.

But at least you won’t have to worry about getting your atomic reference counting wrong.

Written by Rico Mariani

I’m an Architect at Microsoft; I specialize in software performance engineering and programming tools.