I think I end up having this discussion at least quarterly.
If you have “serious software” with “serious problems” you need a performance lab. I don’t think that’s very contentious; predictable performance is just too darn important to skip that investment. You can start small and grow as you need. You can follow good advice like making sure to measure consumption metrics and not just elapsed time. You can probably leverage existing recipes for building your lab. You may need tricky stuff like device racks to power cycle your systems and get new software onto them efficiently and reliably. You may need all kinds of other things peculiar to your particular problem.
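To make “consumption metrics and not just elapsed time” concrete, here is a minimal Python sketch of what a lab harness might record per run. The workload passed in is a hypothetical stand-in for a real scenario; a real harness would record many more counters (I/O, handles, page faults, and so on).

```python
import time
import tracemalloc

def measure(workload):
    """Run a workload and report consumption metrics, not just wall time."""
    tracemalloc.start()
    cpu_start = time.process_time()   # CPU seconds actually consumed
    wall_start = time.perf_counter()  # elapsed wall-clock time

    workload()

    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    _, peak_bytes = tracemalloc.get_traced_memory()  # peak Python allocations
    tracemalloc.stop()

    return {"wall_s": wall, "cpu_s": cpu, "peak_alloc_bytes": peak_bytes}

# Hypothetical scenario: build a big list (stands in for a real workload).
metrics = measure(lambda: [i * i for i in range(100_000)])
print(metrics)
```

Wall time alone can look fine while CPU or memory consumption quietly doubles; capturing all three makes regressions diagnosable.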
No matter what you do, your lab isn’t reality and you don’t want it to be.
Let me say that again. The lab data is a carefully (!) constructed and controlled lie. It is not your real performance and you don’t want it to be.
The purpose of the lab is to detect common mistakes developers make in the most important scenarios. This means that metrics are selected to maximize diagnosability, and the scenarios are selected to help identify problems and get work assigned to the right people quickly.
It all comes down to these desirable properties of your lab:
#1 If you run the same build more than once you get the same result
#2 If you run different builds they get the same workload
#3 If you re-run a previous build you get the same results as before
Property #1 gives you basic confidence in your data. While there is always some variability, if you don’t get consistent results from contemporaneous runs you have nothing.
Property #2 allows you to compare two different builds and meaningfully tell if one is better or worse.
Property #3 allows you to go back in time and re-run builds while still getting comparable results so that you can bisect if you need to.
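Property #1 can even be checked mechanically before any build-to-build comparison is attempted. A minimal sketch, where the 3% coefficient-of-variation threshold is an illustrative assumption rather than a universal rule:

```python
import statistics

def is_stable(samples, max_cv=0.03):
    """Property #1 check: the same build, run repeatedly, should agree.

    Returns True when the coefficient of variation (stddev / mean) of the
    repeated measurements is under max_cv (3% here, an arbitrary choice).
    If this fails, comparisons between builds are meaningless noise.
    """
    mean = statistics.mean(samples)
    cv = statistics.stdev(samples) / mean
    return cv <= max_cv

# Five runs of the same build on the same workload (times in seconds).
runs = [1.02, 0.99, 1.01, 1.00, 0.98]
print(is_stable(runs))  # → True; this spread is well under 3%
```

Only once same-build runs pass a gate like this does it make sense to compare builds (Property #2) or bisect through history (Property #3).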
To get the above properties, outside-world effects are minimized so that the workload remains constant even if that isn’t normally the case for the real product (e.g. no Valentine’s Day Surge, no seasonality at all in fact). Additionally, internal effects are controlled from run to run: things like available memory, disk fragmentation, the presence of other processes, and other things of this ilk.
All of this is done to achieve the purpose of the lab: to find problems and generally ensure that the overall health of the product is where it needs to be; and if things are not good, to point a clear finger at the source of the problem.
And this has almost nothing to do with the performance that actual people will actually see when they use the product in the world.
In the real world none of the things mentioned above are controlled. Users do not get a consistent experience. Even the same user on the same device gets a different experience depending on what else they have been doing and what the current state of the universe is.
Controlling resource usage and otherwise creating an efficient product, whether it’s an operating system, a library, or an application, is the best way to ensure that your users will get the best experience possible in a variety of situations. You can certainly add more scenarios to your lab to try to capture important things you expect your users will experience, but you will never create the variety of signal that is The Real World. Don’t let yourself be fooled into thinking that your artificial reality is the truth. It isn’t. It’s The Matrix.
And the fact is, your lab couldn’t do its job if it were real. Real-world data can only be analyzed with good quality telemetry plus suitable aggregation and visualization. That’s how you find out what’s actually happening out there. While the real-world data is the best, it can only tell you about something you’ve already shipped (and at least somewhat widely if it is to be meaningful), so it’s “backwards looking.” The lab is not real, but it can predict future problems, or the likely absence of them, so it’s “forward looking.”
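The “suitable aggregation” of field telemetry is usually percentile-based, because real-world samples are noisy and heavy-tailed in a way averages hide. A minimal sketch, assuming latency samples have already been collected from users (the numbers below are made up for illustration):

```python
import math

def percentiles(samples, points=(50, 95, 99)):
    """Summarize raw field telemetry with the percentiles dashboards show.

    Uses the nearest-rank method: no interpolation, just the sample at
    the ceiling of p% of the sorted data.
    """
    ordered = sorted(samples)
    out = {}
    for p in points:
        rank = math.ceil(p / 100 * len(ordered)) - 1
        out[f"p{p}"] = ordered[rank]
    return out

# Hypothetical page-load times (ms) reported by real users in the field.
# Note the one 2500 ms outlier: it drags the mean up but the p50 barely moves.
field = [120, 95, 300, 110, 101, 2500, 130, 98, 105, 115]
print(percentiles(field))  # → {'p50': 110, 'p95': 2500, 'p99': 2500}
```

The tail percentiles are exactly where the uncontrolled real world shows up, and they are the part of the picture no lab run can give you.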
You need both if you are to be successful.