Casey Muratori has released an excellent video debunking five reasons why people and teams often (in my opinion, nearly always) think that they can simply ignore performance issues and don’t even want to talk about them. His top-level debunked points are:
- No need : i.e. processors are fast enough, you don’t need to worry about this in 2023
- Too small : i.e. the gains are too small anyway
- Not worth it : i.e. you don’t get enough even if you are willing to pay for it
- Niche : i.e. it only matters in some rare subsegments of programming
- Hotspot : i.e. you only have to tune a tiny fraction of the code anyway, just get an expert to do that
This is a brilliant video. Casey cites many sources for real examples and drills in particular into published performance successes at Facebook: what they did and why. I was happy to confirm that the information he was citing was substantially accurate, as I had been directly involved in some and had firsthand knowledge of many others.
These articles are well worth reading and I don’t have much to add other than this: if anything, they are underselling their case.
Now Casey and I sometimes disagree on the best perf strategy in any given situation, but we never disagree that you should have a strategy, one that you have considered carefully, balancing the various costs and benefits. Reasonable people may disagree on the best choice at that point, but Casey’s main thesis is that you should have the discussion, lest you find yourself painted into a corner and then have to do a costly rewrite.
I first wrote about this sort of thing back in 2006.
I was responding briefly to an article written by Randall Hyde, who was in turn responding to an essay by Charles Cook (link no longer works):
I’ve always thought this quote has all too often led software designers into serious mistakes because it has been applied to a different problem domain to what was intended. The full version of the quote is “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” and I agree with this. It’s usually not worth spending a lot of time micro-optimizing code before it’s obvious where the performance bottlenecks are. But, conversely, when designing software at a system level, performance issues should always be considered from the beginning. A good software developer will do this automatically, having developed a feel for where performance issues will cause problems. An inexperienced developer will not bother, misguidedly believing that a bit of fine tuning at a later stage will fix any problems.
Now I have often relied on variations of these words in teaching performance tuning in the last couple of decades-ish.
Knuth called it “Hoare’s Maxim”, here it is again:
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”
Back around 2005 I wrote a preface for the Patterns and Practices Group at Microsoft; they had written a book full of performance guidance (you can still get the .pdf somewhere I’m sure, a lot of it is evergreen content). I wanted to use this quote, giving advice similar to the above, and I wanted a proper citation because there never is one. At the time Tony Hoare was working at Microsoft Research, so I called him :D
Suffice to say he wasn’t happy with how his maxim was misapplied. But he couldn’t tell me where he had first written it; he told me, “It sounds like me and I’m sure I could find the original source given a few hours in a good research library.” In 2006 his maxim was already decades old. I had heard it in the 80s and it likely dated back further than this.
What Casey is saying, what Daniel is saying, what I have been saying… this is stuff that has been well known to experts, Turing Award Winners, for decades. Denying it requires a willingness to be blind to reality that is of absurd proportions.
If you believe Tony Hoare (and you should), small efficiencies will not matter about 97% of the time. But far from being a reason to ignore performance, this is a warning. What it means is that in 97% of the cases, trying to come back and fix things with hotspot tuning will make no difference at all. That code can only be improved by architectural changes. And in 3% of the cases, you must pay attention to the smallest details or be screwed. As Hyde writes, that is about one line in 33. Which means on any given screen of code there are probably two huge whammies. Do you know them? Page down, there’s another one, page down again, maybe two on that page. This stuff is everywhere.
While it’s true that hotspot tuning can get you some speed in many cases, it’s rarely enough, and it won’t get you to anything like the best performance. When you look at real systems what you discover is that after you eliminate the dumb stuff you end up with flat looking profiles. Maybe you can get a few local improvements, but in those cases the likelihood is you fixed something particularly dumb. When those dumb fixes are done, you’re left with hard issues. Issues like the data has crappy locality, the language tech has too many virtual calls, or the garbage collector behaves badly on your workload. These kinds of things don’t show up as a hotspot but rather as a flat tax across your whole system, sometimes 2x, 3x, 10x. They come about as a function of fundamental choices in how the system will work. They can’t be changed after the fact in any reasonable way. You have to rewrite.
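To put rough numbers on the hotspot-versus-flat-tax point, here is a minimal sketch using Amdahl’s law. The percentages are made up for illustration, and the function name `overall_speedup` is my own, not something from Casey’s video or the cited articles.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when `fraction` of the run time
    is made `local_speedup` times faster and the rest is left untouched."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical flat profile: the hottest function is only 5% of run time.
# Even making it infinitely fast barely moves the total.
hotspot = overall_speedup(0.05, float("inf"))
print(f"perfect fix of a 5% hotspot: {hotspot:.2f}x")

# Hypothetical flat tax: poor locality makes everything 3x slower.
# Removing it touches 100% of the run time, which hotspot tuning cannot do.
flat_tax = overall_speedup(1.0, 3.0)
print(f"removing a 3x flat tax:      {flat_tax:.2f}x")
```

Under these assumed numbers the perfect hotspot fix is worth roughly 1.05x, while removing the flat tax is worth 3x. The catch is that the second number is only reachable by changing the fundamental choices.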
I’m sure that premature optimization is bad; I mean, premature anything seems not so good. It’s a rough way to start a sentence. Short of lottery winnings it’s hard to think of anything premature that’s a good idea. But Hoare’s 97% is about what he called “small efficiencies”. It doesn’t mean you ignore the overall design until later; you just don’t tune every damn thing too soon.
When I was teaching, I often used this metaphor: suppose you’re writing some system, you decide that you should avoid premature optimization, so you take the usual advice and build something simple that works. In this metaphor let’s pretend that your whole program is a sort. So you choose a simple sort that works. Bubble Sort. You try it out and it functions perfectly. Now remember Bubble Sort is a metaphor for your whole program. Now we all know that Bubble Sort is crap, so you have to eventually change to Quicksort. Hoare likes you more now. So how do you get there? Do you just, you know, “tune” the Bubble Sort? Of course not, you’re screwed, you have to throw it all out and do it over. OK, except the greater-than test, you can keep that. The rest is going in the trash.
But you got valuable experience, right? No, you didn’t. Anything you learned about the Bubble Sort is worthless. Quicksort has entirely different considerations.
The point here is that a small bit of analysis up front could have told you that you needed an O(n*lg(n)) sort and you would have been better served doing that up front. This does not mean you have to microtune the Quicksort up front. Maybe down the road you’ll discover that part of the sort (remember this is a metaphor) should be written in ASM because it’s just that important. Maybe you won’t. There will be time for that. But getting the right key choices up front was not premature. There is a suitable amount of analysis that is appropriate at each stage of your product.
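If you want to see the metaphor in actual code, here is a rough sketch. It is plain, untuned Python, the step counters are my own addition for illustration, and the Quicksort is a simple list-building variant rather than Hoare’s in-place partition scheme. Notice how little of the first function survives into the second.

```python
import random


def bubble_sort(items):
    """Simple O(n^2) sort. Returns (sorted_list, steps), one step per comparison."""
    data = list(items)
    steps = 0
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            steps += 1
            if data[i] > data[i + 1]:  # the greater-than test: the part you keep
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return data, steps


def quicksort(items):
    """Random-pivot Quicksort, O(n lg n) expected.
    Returns (sorted_list, steps), one step per element examined while partitioning."""
    steps = 0

    def sort(seq):
        nonlocal steps
        if len(seq) <= 1:
            return seq
        pivot = seq[random.randrange(len(seq))]
        less, equal, greater = [], [], []
        for x in seq:
            steps += 1
            if x > pivot:
                greater.append(x)
            elif x == pivot:
                equal.append(x)
            else:
                less.append(x)
        return sort(less) + equal + sort(greater)

    return sort(list(items)), steps


if __name__ == "__main__":
    sample = [random.randrange(1_000_000) for _ in range(2000)]
    _, bubble_steps = bubble_sort(sample)
    _, quick_steps = quicksort(sample)
    print(f"bubble sort steps: {bubble_steps}")
    print(f"quicksort steps:   {quick_steps}")
```

The exact counts depend on the random input, so I won’t quote any, but the gap grows quickly with n. No amount of tuning the inner loop of the first function closes it.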
I have been asked to help with many performance problems over the years. Nobody ever calls me in to help when they are 2% over goal. In most scenarios if you are within 2% the champagne is flowing (there are cases where 2% is the difference between greatness and crap). When I get called in people are usually missing their goals by a factor of 2, or 5, or 20… When that happens, the last thing you want is for someone like me to listen for an hour and then put three numbers on the whiteboard that show that your choices had no hope of working even with the best possible computers and networking now available. That is not a fun moment for anyone.
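The whiteboard exercise is usually nothing fancier than the sketch below. Every number here is hypothetical (the goal, the payload size, the link speed, the round-trip count), and `best_case_seconds` is just a name I made up; the point is only that a lower bound computed from a few inputs can rule out a design before any code is written.

```python
def best_case_seconds(bytes_to_move, bandwidth_bytes_per_sec, round_trips, latency_sec):
    """Lower bound on elapsed time: raw transfer time plus unavoidable
    round-trip latency. Real systems only get slower than this."""
    return bytes_to_move / bandwidth_bytes_per_sec + round_trips * latency_sec


# All of these are made-up numbers for illustration.
goal_sec = 0.2                      # hypothetical response-time goal
bytes_to_move = 50 * 1024 * 1024    # hypothetical 50 MiB moved per request
bandwidth = 1.25e9                  # 10 Gbit/s link expressed in bytes/s
round_trips = 40                    # hypothetical sequential network round trips
latency = 0.010                     # hypothetical 10 ms per round trip

floor_sec = best_case_seconds(bytes_to_move, bandwidth, round_trips, latency)
print(f"best case: {floor_sec:.3f}s, goal: {goal_sec:.3f}s")
if floor_sec > goal_sec:
    print(f"the design misses its goal by at least {floor_sec / goal_sec:.1f}x")
```

With these assumed inputs the round trips alone cost more than the whole budget, so no amount of later tuning inside the request handler can rescue the design.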
There is no software that doesn’t deserve a discussion about how it will perform and what choices should be made, which choices are likely to matter. Maybe it can be a very simple discussion because maybe the goals are very easy to meet. Maybe some bits are contentious. Maybe some should be decided later. But any system that will have a lot of users or will be used repeatedly deserves better than to have no discussion for fear of premature optimization.
Really, avoiding perf discussions totally is likely a sign of reckless disregard for end users and data center costs. I don’t have a lot of sympathy for engineers who put their productivity before costs accrued to 2B users, or the massive CO2 burden their crap will create in a datacenter for no reason.
Have the talks, consider the resource costs, and decide when you’re going to deal with each concern, some early, some later. This will save you time and money, and it will ultimately make your teams more productive and your products better.