A Taxonomy of Performance Pitfalls

Rico Mariani
7 min read · May 4, 2023


I’m often asked how I tackle performance problems. There is no one answer of course — there are many ways you can get bad performance in any given system — but I’ve found that just a few high-level categories are pretty good at telling the story. I don’t want to create the overall Venn diagram of these things; I’m sure it’s very complex. But usually one of these categories is a good way to think about your problem, whatever it may be.

1. You have taken a dependency you cannot afford

This happens in lots of ways. My favorite is where people decide that they are going to store their app settings in XML, and so a huge XML subsystem like System.Xml lands in their startup path. Then you look at the startup settings and the total content is something like 6 integers and one string. They took a massive hit just to parse maybe 100 bytes of settings.
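
For contrast, here is a minimal sketch (not from the original post) of what a hundred bytes of settings actually demands; the flat key=value format and the TinySettings name are assumptions for illustration.

// Hypothetical alternative: the settings are ~100 bytes, so a flat
// key=value file and a dictionary cover it with no XML subsystem at all.
using System.Collections.Generic;
using System.IO;

static class TinySettings
{
    // Parses lines like "width=1024" into a dictionary.
    public static Dictionary<string, string> Load(string path)
    {
        var settings = new Dictionary<string, string>();
        foreach (var line in File.ReadAllLines(path))
        {
            int eq = line.IndexOf('=');
            if (eq > 0)
                settings[line.Substring(0, eq).Trim()] = line.Substring(eq + 1).Trim();
        }
        return settings;
    }
}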

I can’t stress enough how important it is to weigh the cost of your dependencies against the value they bring. One of the great tragedies of the .NET Framework was that the system made it very easy to bring in huge amounts of code, giving you a simple, clean way to do something like load your settings, but at great cost.

When this kind of thing happened, I often asked, “Did your startup performance just suddenly go bad on Sunday? Like everything was fine and it was all fast, but now, out of the clear blue sky, System.Xml is too slow?” The point being that nobody had ever tried it out to see if it was even remotely appropriate.

Another favorite of mine was the early version of the binary serializers, which literally invoked the C# compiler at runtime (usually at startup!) to build the serializer and load it as a DLL. This was crazy expensive. And often they were going to read just a few dozen bytes of serialized state.
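
For a few dozen bytes of state, a hand-written reader and writer with no runtime code generation is a perfectly good answer. This is a hedged sketch; the AppState fields are hypothetical, not the actual payload from that story.

// Hypothetical state of a few dozen bytes, read and written by hand
// with BinaryReader/BinaryWriter; no compiler invocation, no codegen.
using System.IO;

sealed class AppState
{
    public int WindowWidth;
    public int WindowHeight;
    public string LastFile = "";

    public void Save(Stream s)
    {
        using var w = new BinaryWriter(s);
        w.Write(WindowWidth);
        w.Write(WindowHeight);
        w.Write(LastFile);
    }

    public static AppState Load(Stream s)
    {
        using var r = new BinaryReader(s);
        return new AppState
        {
            WindowWidth = r.ReadInt32(),   // fields read back in the order they were written
            WindowHeight = r.ReadInt32(),
            LastFile = r.ReadString()
        };
    }
}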

2. You have selected an inappropriate algorithm

This can manifest in a variety of ways. It might be asymptotic complexity, like O(n²) vs. O(n lg(n)), but often it isn’t exactly that, or not just that. More often you have something like a central store that’s very good at insertion but very bad at deletion, and so half your workload is suffering. Or the choice of some common collection type (like the wrong kind of hash table) leaves you with a huge compute cost on some of the elementary operations in your workload.

It might be that to do the job well you actually need two different data structures that are kept synchronized, or loosely synchronized. This can get arbitrarily messy, but remember that in these cases the issue isn’t that performance tuning is wrecking your simple code; the issue is that the simple code fundamentally isn’t suitable. The shape of the problem (e.g., the mix of reads, inserts, and deletes) should describe the shape of the solution, and a complex and nuanced workload likely commands a complex and nuanced response.
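
As one hedged illustration (the OrderedIndex name and its workload are assumptions, not something from the article): if the workload needs O(1) lookup by key, cheap removal, and ordered traversal, neither a dictionary nor a list alone fits, so you keep both and keep them synchronized.

// Two structures kept in sync: a Dictionary for O(1) lookup and a
// LinkedList for ordered traversal with O(1) unlink by node handle.
using System.Collections.Generic;

sealed class OrderedIndex<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> _byKey = new();
    private readonly LinkedList<(TKey Key, TValue Value)> _inOrder = new();

    public void Add(TKey key, TValue value)
    {
        Remove(key);                                   // keep both structures consistent on re-add
        _byKey[key] = _inOrder.AddLast((key, value));
    }

    public bool TryGet(TKey key, out TValue value)
    {
        if (_byKey.TryGetValue(key, out var node)) { value = node.Value.Value; return true; }
        value = default!;
        return false;
    }

    public bool Remove(TKey key)
    {
        if (!_byKey.Remove(key, out var node)) return false;
        _inOrder.Remove(node);                         // O(1): we held the node, no list scan
        return true;
    }

    public IEnumerable<(TKey Key, TValue Value)> InInsertionOrder() => _inOrder;
}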

I often use this metaphor here: when you are writing a proof where the theorem involves absolute values, you can be very nearly positive that the proof will have two cases. This is because absolute value is defined in terms of two cases, and this reality will find its way into any correct proof. Sometimes you get lucky and can fold the two cases together but, even then, you have one case and then an abbreviated proof for the second case where you show it’s actually the same as the first. Now, if you have two absolute values in the theorem, you can reasonably expect the proof to have four cases. The growth here is fundamental to the problem, not a failure of economy in the proof.

Similarly, good engineering solutions reflect the fundamental complexities of the workload. Many things in the workload won’t matter but some things will. Identifying those things is essential and each of those things should see an affordance in the solution. If they are not considered, you get something that is unfit for use.

3. You are using more resources than you need

Here we come to the cases where the code is what is often called “bloated” or “hoggish”. These classes of problems involve waste. Maybe wasted CPU cycles recomputing the same things. Maybe wasted disk writes flushing data that will never be read. Maybe crazy duplication in storage like strings that are repeated over and over…
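
One hedged sketch for the duplicated-strings case (the StringPool name is illustrative): pool the strings so each distinct value is stored once, no matter how many records refer to it. .NET’s built-in String.Intern does something similar but keeps entries for the life of the process; a local pool can be dropped when you are done with it.

// A tiny pool: identical strings are stored once and shared by reference.
using System.Collections.Generic;

sealed class StringPool
{
    private readonly Dictionary<string, string> _pool = new();

    public string Intern(string s)
    {
        if (_pool.TryGetValue(s, out var existing)) return existing;
        _pool[s] = s;
        return s;
    }
}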

A good way to look into these is to think about what a minimal solution to your problem fundamentally must do. This magic solution, which may be unbuildable, can give you the speed of light in your system — no algorithm could be faster. Then you can compare your solution against that to get a sense of what your waste level is. Maybe you’re able to approach the speed of light. Maybe it’s just a useful reference. Maybe 2 times C is the best you think is actually achievable. It’s still a good exercise.

When I worked on the C compiler, we liked to compare its full-build speed to this:

$ cat *.h *.c >/dev/null
$ cp correct*.o correct.exe output

This is the “magic” compiler that does only the minimal I/O, because magic.

If the compiler was writing far more .o data to disk than that minimum, something was going very wrong.

4. You are not using the hardware cost-effectively

In this case, the total volume of work might not be high (so not #3) but we’re not giving the workload to the hardware in a way that it can do the work economically. This is kind of a variation of #2 above but it covers other important considerations. Things like:

  • my network packets are a good size
  • my disk reads/writes are a good size
  • my disk reads/writes have suitable locality
  • my data structures are dense and have good locality
  • my code is dense and has good locality
  • my code lets the CPU go to low power states and stay there

There are many ways to abuse the hardware; my favorite example is a system that needs to read two files from a regular old-school disk. The last thing you want to do is read a few kilobytes from each file in turn, because the disk will likely be seeking back and forth the whole time. You can do the same job much more economically by reading the files one after the other.
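
A hedged sketch of the economical version, assuming the files fit comfortably in memory (the SequentialReads name is illustrative): read each file to completion before touching the next, so a spinning disk streams sequentially instead of seeking between files.

// Read the files one after the other, each front to back.
using System.IO;

static class SequentialReads
{
    public static byte[][] ReadAll(params string[] paths)
    {
        var results = new byte[paths.Length][];
        for (int i = 0; i < paths.Length; i++)
            results[i] = File.ReadAllBytes(paths[i]);   // one whole file at a time
        return results;
    }
}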

This simple phenomenon is analogous to many things that happen in micro-architecture: false sharing in caches, poor dependency chains in code, high levels of indirection. All of these things make it impossible for the hardware to put its best foot forward.

Power is particularly pernicious: simple things like making sure all your timers fire together rather than haphazardly can vastly improve the CPU’s ability to do your work economically. Do all the work, then sleep.

A variation on theme #4 is where you are using your software in a manner that is not cost-effective. For instance, maybe you are using a system with a garbage collector and there are pointers all over your data structures, resulting in high tracing costs, or there is too much promotion into durable generations, resulting in very frequent collections.
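
A hedged sketch of the pointer-heavy versus dense layouts (the type names are hypothetical): an array of plain value structs is a single object with nothing for the collector to trace, while a linked list of small objects gives the GC one object and one reference to chase per element, and scatters the data across the heap as well.

// Dense, trace-free layout vs. a pointer per element.
struct PointValue { public double X, Y; }                               // no references to trace

sealed class PointNode { public double X, Y; public PointNode? Next; }  // one reference per element

static class Layouts
{
    public static PointValue[] Dense(int n) => new PointValue[n];  // one GC object total

    public static PointNode? Linked(int n)                         // n GC objects, n references
    {
        PointNode? head = null;
        for (int i = 0; i < n; i++) head = new PointNode { Next = head };
        return head;
    }
}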

5. Your orchestration is fundamentally wrong

Orchestration is about what work happens when, and it goes wrong when work is scheduled at an unwise time. A simple example is the kind of work that happens at startup. During startup of a client application, you want to do all the things that are needed to create a working UI and pretty much only those things. You know you have bad orchestration when you plot the code that ran on some kind of timeline and find yourself asking “Why is that running now?” or “Does this really need to happen here?”

Often, moving work around gets you a much better result.

Orchestration is broader than just this though. Maybe you have a ton of async work that your system generates. In what order should you process it? Even if you’re very good at doing the work economically there may be natural prioritization that is appropriate. Perhaps some work should be capped, or merged.
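
A hedged sketch of that idea (the CoalescingScheduler name and its policy are assumptions, not from the article): drain higher-priority items first and merge duplicate requests for the same key so repeated invalidations collapse into one unit of work.

// Lower priority number runs first; duplicate keys are merged while pending.
using System;
using System.Collections.Generic;

sealed class CoalescingScheduler
{
    private readonly PriorityQueue<string, int> _queue = new();
    private readonly HashSet<string> _pending = new();

    public void Enqueue(string key, int priority)
    {
        if (_pending.Add(key))              // already queued? merge into the existing entry
            _queue.Enqueue(key, priority);
    }

    public void Drain(Action<string> process)
    {
        while (_queue.TryDequeue(out var key, out _))
        {
            _pending.Remove(key!);
            process(key!);
        }
    }
}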

Bad orchestration can lead to latencies that are very bad even though overall costs for each work item are reasonable. If your product has an SLA then orchestration can be the chief tool to ensure you are compliant.

Conclusion

I can’t cover everything in just these few paragraphs, but these big themes come up over and over again. Usually, solutions do not require new complex craziness but rather a clearer understanding of the workloads and the fitness of the solution. Often complexities arise organically from the nature of the problems, and in those cases I don’t really think of it as “performance issues made the solution more complex” — the problem had those complexities in the first place.

Hopefully this highly simplified taxonomy will help get your creative juices flowing.
