When Bad C++ Happens to Good People
A C++ Hazard Exposé
Introductory Remarks
I can’t comprehensively give examples from the universe of C++ and std::*
in a short article like this. Hopefully, I can give you a sense of how the kind of Bad Stuff I’m going to discuss happens. From this, take away the notion that kicking the tires on what is underneath your abstraction will help you. A lot. You should know what protections you have, and which are illusory. This is especially important when writing systems code! Almost none of this applies to toy whiteboard samples but real systems run into these issues.
Can I do this — a lot?
I have this interesting notion for doing a sanity check, let’s call it the Chen Test because I learned it from Raymond Chen. It goes like this: “This thing we’re talking about. Would it be OK if everyone did this? A lot?”
One example he gave was that people wanted to have a flag for windows that would make their windows go on top of “topmost” windows. What is that, the “Toppermost” flag? People actually asked for this, but it obviously fails the Chen Test.
In Languages and Frameworks, the Chen Test has equal applicability. You just consider, “If you use this thing, a lot, is that OK?” While you’re at it consider how this thing you’re going to do will combine with the other things you’re already doing. The Chen Meta Test.
A Little Journey
At this point let’s pick an interesting example and see how deep the rabbit hole goes. Here’s a fragment to consider:
template <typename T, typename... U>
[[nodiscard]] std::shared_ptr<T> attach(std::shared_ptr<T> ptr, U&&... u)
{
return
{
ptr.get(),
[ptr, capture = std::make_tuple(std::forward<U>(u)...)](T*) {}
};
}
You can be forgiven if you don’t think that’s simple. But don’t worry, we’re going to go over this line by line… And keep in mind this is “an easy one.”
Attach Example
Before we dig in, let’s look at a possible use case for our fragment. With it, you can write something like this:
shared_ptr<Foo> some_function() {
shared_ptr<Bar> bar = make_shared<Bar>(...);
shared_ptr<Baz> baz = make_shared<Baz>(...);
shared_ptr<Foo> foo = make_shared<Foo>(..bar..baz...);
return attach(foo, bar, baz);
}
The idea being that I you attach the lifetime of secondary objects to the first object.
Ok, how does this work?
Inhale deeply…
Attach In Detail
attach
is a function (good start!) that returns ashared_ptr
. We’re constructing theshared_ptr
using the form that takes a raw pointer and a delegate to invoke when the pointer is supposed to be deleted — the so called “deletor”- The first argument to the constructor is the underlying pointer of the first argument to
attach
,ptr.get()
; So, the pointer we create will point to the same thing as the original first argument. You all knew you could justptr.get
ashared_ptr
and lose all the protection at any time, right? - The second argument is a delegate that does… nothing. The body is empty. But (!) the creation incidentally captures the original pointer and makes a new tuple consisting of every other argument. All of these are are going to be copy by value but mostly the arguments are likely to be other shared pointers anyway so it’s upcounts and downcounts galore. Look here to learn more about those costs.
Is that a Functor I See Before Me?
Let’s take a moment to recall The Meaning of [](){}
When you write this:
[a, &b, c](int x, int y) { ... body ... }
What you get is something like the below [assuming the lambda escapes and becomes a std::function]. The actual details are even more complicated and are covered at greater length in this article. But here’s a first approximation to the truth:
// The actual names will of course vary by compiler
// operator() ends up mapping to the Do function below
class A_Functor {
A a;
B& b;
C c;
A_Functor(A _a, B& _b, C _c) : a(_a), b(_b), c(_c) {}
virtual void Do(int x, int y) {
... body ...
}
// These additional "virtual" functions
// They are kind of like virtual because they are in a Functor
// Default Copy Constructor
// Default Copy Assignment Operator
// Default Move Constructor
// Default Move Assignment Operator
// Default Destructor
}
OK so with this in mind, and looking back at attach
you can see that it works because its leveraging the default virtual destructor of the anonymous class generated for the lambda that does nothing.
It’s so obvious now! Of course, the std::function
auto generated virtual destructor. How did I not see that?!?
But wait… every time we make one of these things that looks like an an anonymous function, what we’re actually going to get is an anonymous class? With like a bunch of members we probably don’t even need? And this happens every time???
Oh dear.
Now when you write attach
... someone might be tempted to write something simple like that original sample.
shared_ptr<Foo> some_function() {
shared_ptr<Foo> foo = make_shared<Foo>(...);
shared_ptr<Bar> bar = make_shared<Bar>(...);
shared_ptr<Baz> baz = make_shared<Baz>(...);
return attach(foo, bar, baz);
}
Does this person know they just made a class? But better still suppose they used some helper like this:
shared_ptr<Foo> some_function() {
shared_ptr<Bar> bar = create_bar(...);
shared_ptr<Baz> baz = create_baz(...);
shared_ptr<Foo> foo = create_foo(..bar..baz...);
return attach(foo, bar, baz);
}
Those helper functions might themselves have used attach
so now we have a whole tree of unique anonymous classes. And they are invisible to the caller — it looks "free" to do this. But there is no magic in C++. If your state is getting destroyed, you can be sure there is a generated destructor doing the job. This harmless looking construct has turned into a nightmare of code generation.
And that was an easy one.
A Lambda Monster
Let’s look at this thing… it’s supposed to return something you can call and that something captures some state…. in the end you get just the one pointer.
class Thing {
static shared_ptr<std::function<void()>> Create(args) {
struct data {
A a;
B b;
};
shared_ptr<data> _this = make_shared<data>();
auto f1 = make_shared<std::function<void()>>(
[&_this]() {
... _this.a ... _this.b...
}
);
auto f2 = ...;
return attach(
make_shared<std::function<void()>>([&]{
... f1()... f2()...
}),
_this,
f1,
f2
)
}
So wait I have two things that are sort of private, f1
and f2
and they capture some state that we stashed and then we have the one method we can actually call and it's holding on to all of that.
Wait, now I’m having a flashback. Functions that access common state… I’ve seen this before… is there a name for this? Oh yes.. member functions
Can We Just Do It Old School?
class Thing {
A a;
B b;
void f1() { ... use a/b ... }
void f2() { ... }
public:
Thing() {}
Go() {
... f1()... f2()...
}
}
As you look back to the lambda monster, consider the cost. It’s not just that it looks worse, it’s also far more expensive.
It seems like now that lambdas have crept into the language people feel the need to use them all over. And rather than making a class to capture state for their needs they make some generic template monster that can do it for any state, but they forget the cost… The simple code is also far more debugable.
If you wanted to debug the destruction of a Thing
it would be as simple as adding an empty body to put a breakpoint on and the state is obvious to anyone...
Having a clear understanding of what will actually happen when you write code is going to help you a lot. Seems obvious and yet, people often fall into the trap of using something simply because it looks concise. Brevity is not everything.
On the top list of often abused are these big hitters:
std::shared_ptr
- Lambdas (and by extension
std::function
) std::string
What really kills me is that sometimes we add new constructs that simply make it easier to do the wrong thing. This is never good.
Pop Quiz
Suppose we have this code:
void foo(const Bar& bar, ...) {
// some function calls and uses of bar
}
What are the answers to these questions:
- Is
bar
immutable? - Is
bar
guaranteed not to be a wild pointer? - Is
bar
guaranteed to be not null?
Take a moment to think.
.
.
.
.
The answers are no, no, and no.
To make the point I will write a simple program. This program does have some casts in it, but it doesn’t cast away const or anything rude like that. It doesn’t do anything memory-rude at all actually. The casts are only there to let it print pointer values as hex addresses with printf
.
struct Bar { /* anything */ };
using func = std::function<void(void)>;
void foo(const Bar& bar, func f) {
printf("1: %llx -- did it start valid?\n", (long long)(void*) & bar);
f(); // any function call, what can we assume now?
printf("2: %llx -- does it look valid?\n", (long long)(void*) & bar);
}
int main(int argc, char **argv)
{
std::shared_ptr<Bar> bar1 = std::make_shared<Bar>();
std::shared_ptr<Bar> bar2;
foo(*bar1, [&]() {
printf(" free: %llx (this pointer is now wild)\n",
(long long)(void*)&*bar1);
bar1 = bar2;
printf(" replaced: %llx (new value of bar1)\n",
(long long)(void*)&*bar1);
});
foo(*bar1, [&]() {});
return 0;
}
When I ran this code the output I got was:
Why?
const
does not mean immutable, it means that you cannot mutate that marked value with that particular variable. It says nothing about what other parts of the program can do. It says nothing about what the very same function might be able to do using a different variable.const Bar &bar
is only different fromconst Bar *bar
in a notational way. In terms of the assumptions, dangers, and generated code, there is no difference.
In this sample code I arranged for the callback function to do a simple assignment that would cause the argument to be deleted. In the next call, we pass in the null pointer and just like that we have a null argument.
Sometimes people attribute more null safety to the reference form than the pointer. This is not supportable. The provided reference is not necessarily directly initialized from say a stack or global variable, it could be:
- A dereference of any pointer, good or bad.
- A reference to an object that is embedded in some other structure.
In either case its lifetime is not assured.
What is ’&’
Actually About?
The whole &
notion was invented to solve exactly one problem:
// we don't want to write this
void foo(int *x, int *y) {
*x = 1;
*y = 2;
}
... foo(&x, &y);
// we'd rather write this
void foo(int &x, int &y) {
x = 1;
y = 2;
}
... foo(x, y);
That’s it. The only difference is the notation — if it’s a reference you don’t have to apply &
before you can pass it in and you don’t have to write the *
before you use it. But under the hood it’s still a pointer.
That means foo(const T& t)
has all the same hazards as foo(const T* t)
for all T
. Including, but not limited to:
std::shared_ptr
std::unique_ptr
std::string
And furthermore, foo(const std:string& str)
is never more memory safe than foo(const char *str)
but it is more expensive because of the double pointer indirection and also std::string
is not small.
I’ll say this another way: const std::string&
arguments work out commonly because of lifetime assumptions you make about your incoming argument. Assuming you have the assumptions right, and given those same assumptions, const char *
would also work in the same ways for the same reasons. It just cheaper and it makes the fact that assumptions are required obvious.
What std::string
gives you, the only thing it gives you, is a nice wrapper for a mutable string. If you need string mutation knock yourself out, that’s what it’s for. If you’re done mutating and you need long term storage for what you made, dense arrays of char
and const char *
are the way to go. Or, if you need substrings, then use string_view
over the same arrays is the way to go. Unlike const std::string&
, the string_view
class has useful features!
Conclusion
The big problem with using the more complicated patterns is that they are dev hallucinations. It seems like they offer protection, but they don’t. All you have to do is consider interesting cases like:
What if one of my callback functions is async? What if some of my input data is reachable via state that is stored elsewhere, like a subtree pointer? C++ is silent on who owns what data and this, is what leads to use-after-free and type alias problems. Use of &
and (ugh) tuples full of perfect forwarded&&
stuff does nothing to address the fact that the compiler doesn’t understand lifetime.
Indeed, one of the most important things anyone learns in a new C/C++ codebase is the rules of the road for lifetime in their little local universe. Following those hidden rules is what leads to success and differentiates the experts from the newbies in that codebase. But any time the rules are non-trivial you can expect security and privacy problems.
If you combine the fact that they are not safer than classic patterns and the compose badly, it’s easy to see why my usual advice is “just don’t go there”.
The fact that it’s common to see code where std::string
is used as the default string everywhere is just tragic. If it actually matters, it’s going to let you down.