Several people have been asking me for some general advice on how to do performance analysis and so I thought I’d summarize some of the training material I’ve used in a brief article. I’m going to try to cover a variety of cases but, as usual, I’m also going to try to be brief which means the information will only be approximately correct. Please keep that in mind. :)

What your result looks like

It’s important to remember that the output of most performance analysis is a document. The point of that document is to succinctly describe the present situation, what is going wrong, why it’s going wrong, and what might be done about it. Sometimes you’ll be the one doing the work to improve things, especially on smaller projects, but as often as not it’s more efficient to summarize the problem for someone who knows the code better than you do.

Where to Start

There are two initial steps.

  1. Take a very broad look at your scenario and get a sense of what kind of problems you might be looking at.
A summary of the resources on a system
  1. Am I using this resource in an efficient manner?


Most people start with a profiler that will tell them about their code. Before you proceed, read the above: you want to make sure you actually have a CPU problem before using CPU tools.

  • Low-level instrumentation can tell you how many instructions you retired in your workload and how many cycles that took. Dividing the cycle count by the instruction count gives you “CPI”, cycles per instruction. A high CPI indicates that the CPU isn’t working very efficiently. Further instrumentation can tell you which factors are resulting in poor CPI (cache misses are probably the most common).
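To make the CPI arithmetic concrete, here is a minimal sketch. The counter values are made up for illustration; in practice they would come from whatever hardware-counter tool your platform offers, and the function name is just a hypothetical helper.

```python
def cycles_per_instruction(cycles, instructions_retired):
    """CPI = CPU cycles consumed / instructions retired over the same interval."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return cycles / instructions_retired


# Hypothetical counter readings for one run of the workload.
cpi = cycles_per_instruction(cycles=3_000_000, instructions_retired=1_200_000)
print(f"CPI = {cpi:.2f}")  # prints "CPI = 2.50"
```

What counts as a “high” number depends on the processor and the workload, so it’s most useful to compare CPI between runs of the same scenario on the same hardware.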


The attack strategy is actually remarkably similar even though the problem is totally different.


Again remarkably similar. What is the total volume of data? What does efficient transfer look like? What code is driving the network usage? What resources are being fetched?


This should be sounding familiar by now. There are some wrinkles.


Memory is entangled with many of these others, as we just saw when looking at GPU and as I hinted when looking at CPU.

Wait Analysis

If none of your resources are looking very busy then you probably have some contention. One way to think about this is as follows: any data that has been wrapped by a critical section is actually a software resource. It has wait time, a service queue, and a utilization percentage. All those same things.
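One way to see a lock as a resource is to measure it like one. The sketch below is only an illustration under my own assumptions: `InstrumentedLock` and `run_demo` are hypothetical helpers, not part of any library, and the workload is just threads sleeping while they hold the lock.

```python
import threading
import time


class InstrumentedLock:
    """Wraps threading.Lock and records wait time and hold time (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0
        self.total_wait = 0.0  # seconds threads spent blocked waiting for the lock
        self.total_hold = 0.0  # seconds the lock was actually held
        self._acquired_at = 0.0

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()
        now = time.perf_counter()
        # The stats are only touched while the lock is held, so this is safe.
        self._acquired_at = now
        self.acquisitions += 1
        self.total_wait += now - start
        return self

    def __exit__(self, exc_type, exc, tb):
        self.total_hold += time.perf_counter() - self._acquired_at
        self._lock.release()
        return False

    def utilization(self, elapsed):
        """Fraction of the elapsed wall-clock time during which the lock was held."""
        return self.total_hold / elapsed if elapsed > 0 else 0.0


def run_demo(workers=4, iterations=5, hold_seconds=0.002):
    """Run a few threads that fight over one lock; return the lock and elapsed time."""
    lock = InstrumentedLock()

    def worker():
        for _ in range(iterations):
            with lock:
                time.sleep(hold_seconds)  # stand-in for work done inside the critical section

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lock, time.perf_counter() - start


if __name__ == "__main__":
    lock, elapsed = run_demo()
    print(f"acquisitions: {lock.acquisitions}")
    print(f"total wait:   {lock.total_wait:.4f}s")
    print(f"total hold:   {lock.total_hold:.4f}s")
    print(f"utilization:  {lock.utilization(elapsed):.0%}")
```

A lock with high utilization and a lot of accumulated wait time is a busy resource in exactly the same sense as a busy disk, and the same questions apply: can you do less work while holding it, or visit it less often?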


I’m already much longer than I intended, but if you read this far this summary should be self-evident:

  1. Look for wasted work. Anything that isn’t forward progress on the problem at hand, remove it! This is the stuff that shouldn’t be happening at all. Wasted work can come in the form of bad algorithms.
  2. Consider the ways you are using your resources. Are they efficient? Is your access pattern a good one? If not, change your patterns to be more friendly to the resource. Again, bad algorithms can drive this.
  3. Finally, document your findings and learn from them. Documenting also lets you get audits from colleagues and encourages you to be thorough.

I’m a software engineer at Facebook; I specialize in software performance engineering and programming tools generally. I survived Microsoft from 1988 to 2017.
