Making Great Platforms
I feel like some of the problems that are associated with making great platforms, software platforms of course, seem to come up over and over again. Maybe it’s just that I’m older and I’ve seen it more times now, but I do hear the same stories with the same solutions being re-discovered. Or, worse, being re-ignored only to be re-discovered after re-disaster strikes.
The thing is, most of the platforms you will ever work on are not the biggest in the world. There are few things out there that are the size of, say, Windows (Azure or not), or Visual Studio (VSCode or not), or Office 365, as platforms. In fact, not even Windows is the size of Windows: most of these platforms are broken into largely independent parts that are much more manageable. But even one of those parts is already something to reckon with, maybe with a whole ecosystem of its own. Consider for instance DirectX as a foundational technology and the number of platforms built on top of it. At that size, there is an incredible number of amazing platforms, ranging from SQLite to Unity, with plenty more in between and sideways.
Even so, most things you’re likely to work on are much smaller than these and they are likely to have far fewer customers than the things I’ve mentioned, but they still benefit from the essential techniques used by the best platforms.
Here are some techniques that I think are the most important and that are nearly universal.
1. Understand your boundaries
Whatever your system is, you’d like it to be used in as many contexts as possible. Some platforms need to be “hosted”, some run “on the metal”, some can do both. Good platforms have clear boundaries that people can understand and use.
Let’s consider SQLite as an example. It can run basically on bare metal, using just file I/O APIs, but it also offers things like virtual file systems, allowing it to be used in many contexts where normal storage might not be available (like WASM) or where the backing store might be something other than vanilla files. Virtual file systems are a clear affordance that is part of the platform design to make it more useful in more contexts. Complementing this, SQLite also offers a clear API to call, so if you want to build on top of it, you get something stable that you can reasonably understand. Great platforms offer both things.
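To make the two-sided contract concrete, here is a minimal sketch in Python. It is not SQLite’s actual VFS interface; the names `Storage`, `MemoryStorage`, and `RecordStore` are all hypothetical, invented for illustration. The point is only the shape: a narrow interface the host must supply underneath, and a small stable API on top.

```python
from typing import Protocol


class Storage(Protocol):
    """Lower boundary: the only thing this toy platform asks of its host."""

    def read(self, offset: int, size: int) -> bytes: ...
    def write(self, offset: int, data: bytes) -> None: ...


class MemoryStorage:
    """A host with no file system at all (hypothetical, e.g. a sandbox)."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def read(self, offset: int, size: int) -> bytes:
        # Reading past the end simply returns fewer (or zero) bytes.
        return bytes(self._buf[offset:offset + size])

    def write(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if end > len(self._buf):
            # Grow the buffer with zero bytes so the write fits.
            self._buf.extend(b"\x00" * (end - len(self._buf)))
        self._buf[offset:end] = data


class RecordStore:
    """Upper boundary: the small, stable API that callers build on."""

    RECORD_SIZE = 16

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def put(self, slot: int, value: bytes) -> None:
        if len(value) > self.RECORD_SIZE:
            raise ValueError("value too large for one record")
        # Pad to a fixed size so each slot lives at a predictable offset.
        padded = value.ljust(self.RECORD_SIZE, b"\x00")
        self._storage.write(slot * self.RECORD_SIZE, padded)

    def get(self, slot: int) -> bytes:
        raw = self._storage.read(slot * self.RECORD_SIZE, self.RECORD_SIZE)
        # Simplification: trailing zero bytes are treated as padding.
        return raw.rstrip(b"\x00")


store = RecordStore(MemoryStorage())
store.put(2, b"hello")
print(store.get(2))
```

`RecordStore` never learns what kind of storage it was handed, so a file-backed or network-backed class with the same two methods could be swapped in without touching any caller.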
Having made a contract on either side, it’s the job of a good platform to be as silent as possible on the rest. DirectX mandates as little as possible in the design of the great games that are built on top of it, and it did not have to make specific affordances for, say, Unreal, or Unity, or the various custom engines that are in AAA games. This is a feature.
Platforms mainly fail because either they lack sufficient useful features, and hence are not compelling, or else they have an over-abundance of constraints and prescriptions, and hence cannot be used in the ways that their customers would like to use them.
2. Enable Rapid Iteration
Platforms that take too long to fix bugs or add new features are highly undesirable. And how do you get rapid iteration? In two words: great tests.
It’s ironic that people think of tests as the thing that slows you down, but the reality is quite the reverse. The reason that you can move quickly in an extensive and important codebase, with lots of customers, is that you have a good safety net of tests that ensure you are not about to land some change that is a terrible idea. These tests are often not just about correctness but also about performance and efficiency. A platform that requires a long battery of manual tests, or long alpha/beta times while customers try things out, is going to be significantly disadvantaged. And while it’s basically impossible to prevent every bug that might affect your partners, high-quality tests can go a long way toward giving high-quality results.
SQLite is probably the best study here. Bug fixes frequently land within minutes of being reported (especially with easy repro steps). And why is that team not terrified to make fixes in short order? Because their test suite is all kinds of awesome. The sorts of mistakes the developers are likely to make have plenty of defense and new mistakes are met with yet more tests. The result is that a small set of developers is able to maintain a library used on billions of devices without being paralyzed by fear of breakage.
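The “new mistakes are met with yet more tests” habit can be sketched as follows. This is not taken from SQLite’s test suite; the function `split_csv_line` and the bug it once had are hypothetical, made up to show a fix landing together with the test that pins it.

```python
import unittest
from typing import List


def split_csv_line(line: str) -> List[str]:
    """Split a comma-separated line into trimmed fields."""
    if line == "":
        # The fix: an empty line has no fields at all.
        return []
    return [field.strip() for field in line.split(",")]


class SplitCsvLineRegressionTests(unittest.TestCase):
    def test_empty_line_has_no_fields(self):
        # Hypothetical bug report: "" used to yield [""] instead of [].
        self.assertEqual(split_csv_line(""), [])

    def test_ordinary_line_still_works(self):
        # Guard the existing behavior so the fix cannot quietly break it.
        self.assertEqual(split_csv_line("a, b ,c"), ["a", "b", "c"])
```

Saved in a file, these run with `python -m unittest`. Once the test is in the suite, that particular mistake cannot come back unnoticed, which is what makes the next quick fix less scary.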
Great platforms test in many dimensions, including performance. The theme of “if my customer finds these problems first, I’ve made a mistake” is a great one to get into your team here. The best platforms include many synthetic workloads that are stable, and therefore can be measured build-against-build, but are also highly representative of things that customers actually do. For instance, while I worked on EdgeHTML we had tests that emulated the Facebook site. This was not the actual site, mind you, since that would have been a moving target, but a controlled clone that had the right mix of “Facebooky” things. We couldn’t do this for the entire web of course, but we had good samples for news and sports sites and a few others. With those, plus a much bigger series of micro-benchmarks, we could be very confident on each drop that we were not about to do something horrible. These measures are of course imperfect, but they make failures the exception rather than the rule.
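A build-against-build check on a synthetic workload might look roughly like this. It is only a sketch under stated assumptions: `measure`, `is_regression`, `synthetic_workload`, and the 5% tolerance are hypothetical choices of mine, not anything the EdgeHTML team used, and a real harness would need far more care about noise.

```python
import statistics
import time
from typing import Callable, List


def measure(workload: Callable[[], None], runs: int = 5) -> List[float]:
    """Time a workload several times and return the samples in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples


def is_regression(baseline: List[float], candidate: List[float],
                  tolerance: float = 0.05) -> bool:
    """True if the candidate median exceeds the baseline median by more
    than the given fractional tolerance (assumed here to be 5%)."""
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    return cand > base * (1.0 + tolerance)


def synthetic_workload() -> None:
    """Stand-in for a stable, representative scenario (hypothetical)."""
    sorted(range(50_000, 0, -1))


# In practice the baseline samples would be stored from the previous build.
baseline_samples = measure(synthetic_workload)
candidate_samples = measure(synthetic_workload)
print("regression:", is_regression(baseline_samples, candidate_samples))
```

Comparing medians rather than single runs is one simple way to damp noise. The same shape works for any metric you can sample, not just wall-clock time.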
Whatever your constraints, good tests can help you to understand your code’s efficacy on important workloads. Those constraints can be SLA-type targets, energy usage, frame rates, network traffic, disk wear, and of course correctness, but really anything at all that is important to your platform.
3. Understand your customers
I might be a bit contentious here, but it’s often important to not do what your customers are telling you to do. This is not the same as not listening.
And here I have to bust out the Henry Ford quote that isn’t really a Henry Ford quote:
“If I had asked customers what they wanted, they would’ve asked for a faster horse.”
Ford famously made great progress by locking down designs and optimizing, which worked great until his competitors started delivering much better cars.
So, while doing exactly what your customers ask for is perilous, it’s even more perilous to not recognize unmet needs and start planning for them. Steve Jobs shared some of Ford’s notions about customers, but he also strongly believed that you had to know the customer’s needs even better than they did to be successful. The Jobsian mode is not to ignore customers because they might be wrong, but to listen carefully, think deeply, and then create solutions that are even better than what your customers are asking for. Before they ask, if possible.
How do we do all this?
Well, sometimes I hear that you need specialists in, say, performance to get the great performance work done, or specialists in user interface, or industrial design, or what have you. I can hardly say to you that you don’t need such people at all, but they aren’t enough. And frankly such people will likely be very frustrated if you box them into exactly one role.
More than anything you need a team, or at least “squads” that can come together and look at problems from many angles. For instance, great performance engineers often make great reliability engineers because they know the product deeply. And those are exactly the kinds of people you want around when you are brainstorming new coolness for your customers because they are the ones who can tell you about untapped potential. They will come up with cool ways to do “the impossible”. And you’re going to want to surround them with people who know the customer experience and needs, because that, too, needs to be in the air, or at least in the water.
If you can find squads of people who frequently see things from different perspectives, who maybe begin by disagreeing on what is most important, but who also communicate well and reconcile with each other, then you are going to have some real winners on your hands.