Some Notes on Stateful Servers in .NET

Rico Mariani
5 min read · Sep 13, 2023


Introduction

Before I get into this at all, let’s give the usual disclaimers:

  • the following is only going to be approximately correct because being fully correct takes way too many words
  • this will be a generalization of different kinds of things which are likely to be important but might be more or less important in your system than I indicate below
  • never use the advice of some so-called performance expert such as me as a replacement for good solid measurements

That said, I think this stuff is useful so here goes.

State

State can come in a variety of forms but in this context I mean memory. Which is to say some unique-ish information about some “users” that is stored in the server. If the state is exogenous (like in a database or something) then most of this really doesn’t apply. The pattern of having nodes that compute and nodes that store, each redundant in its own way, is popular for a reason. But sometimes, you need state. Lots of state.

The presumption is that this state is spread across many systems but, by some kind of sharding and local synchronization, any given user can be served by some smaller pool of servers. And these servers are in the business of holding that user’s state and doing the necessary computation on it. Or perhaps it’s only some of the state. But enough.

This means these servers are likely to be using almost all of their memory in the service of state. You probably want as many users as possible per server, again with some redundancy. “User” could mean anything here, whatever the unit of sharding is.
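To make that concrete, here is a tiny sketch of the kind of routing I have in mind; the names are mine and the hash is deliberately naive, just enough to show the shape of the thing.

```
// A minimal sketch of routing users to shards; illustrative names only.
public static class ShardRouter
{
    // Map a user id to one of N shards; each shard is served by a small,
    // redundant pool of servers that hold that user's state in memory.
    public static int GetShard(long userId, int shardCount)
    {
        // A stable hash keeps a given user landing on the same shard.
        // Real systems usually prefer consistent hashing so that adding
        // or removing shards moves as little state as possible.
        return (int)((ulong)userId % (ulong)shardCount);
    }

    // Usage: with 64 shards, user 123456 always lands on the same shard.
    // int shard = ShardRouter.GetShard(123456, 64);
}
```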

Representation

The normal .NET idioms (and this is true of a lot of systems not just .NET) will encourage you to model your state as objects. Assorted objects to represent the User and then more objects to represent the Users’ data. Maybe lots of objects.

This is almost certainly a terrible idea.

It turns out that if you want to create a system with the worst possible performance you could pretty much read any book on Object Oriented Programming and do what it says. This may not be exactly true but you can be sure that creating a sea of objects is going to be terrible. Why?

  • pointers are fat, so they make your data fatter
  • pointers create opportunities to delocalize your data, which is to say that if A points to B points to C we have no a priori reason to believe A, B, and C are close to one another and so following that chain is likely to be uneconomical
  • pointers make more work for the garbage collector and indeed at some interval all the pointers have to be traced, whether the data was used, or not

Remember, in this context, we’re talking about a server that is trying to use all the memory it has to efficiently store as much state as it reasonably can. If only for economy. So we might be talking about a few hundred gigabytes of data.

Small objects with lots of pointers between them will be very uneconomical in terms of Actual Data Stored per byte of memory used, and, will certainly not minimize the amount of housekeeping the garbage collector has to do.
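To put some rough numbers on it: on 64-bit .NET each object carries around 16 bytes of header and every reference field is another 8 bytes, before you store a single byte of actual data. Here is a sketch of the contrast I mean; the types are invented for illustration.

```
// Pointer-rich: every order is its own heap object, every field that refers
// to another object is an 8-byte reference, and the GC has to trace all of it.
public class OrderObject
{
    public CustomerObject Customer;  // reference to another small object
    public string Sku;               // yet another object, somewhere else on the heap
    public int Quantity;
    public decimal Price;
}

public class CustomerObject
{
    public string Name;
    public OrderObject[] Orders;
}

// Dense: the same data as values packed into an array. The array is the only
// GC-visible object; the structs inside it are just contiguous bytes with no
// references for the GC to follow.
public struct OrderRow
{
    public int CustomerId;  // integer key instead of an object reference
    public int SkuId;       // index into a shared SKU table
    public int Quantity;
    public decimal Price;
}

public class OrderTable
{
    // Millions of rows, but only a handful of heap objects.
    public OrderRow[] Rows = System.Array.Empty<OrderRow>();
}
```

For a record this small, roughly half or more of the bytes in the object version go to headers and references before you even count the string objects; the struct version spends none of its bytes that way.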

The Most Durable Objects Must Have The Most Density

I first started discussing this in the context of games. Games are a great incubator for lots of performance problems. Back in 2007 I gave a talk at Gamefest on the XNA Framework and how best to use it. It ran on the Compact Framework, which did not include a generational collector. So full collections only. Ouch.

I had this to say then, and it still applies now:

If your game design includes a lot of objects you are going to get killed by the full collections. The bulk of your data must therefore be devoid of objects. It’s ok to have objects for the player and the things that are around the player, there won’t be so many of those, maybe a few hundred. But the level design — that can’t be objects. You want to represent the levels with arrays of numbers and big chunks of text (not individual strings) that you can randomly access.

When it comes time to collect those things, the GC will discover there are no pointers and will simply skip them. This will make your collections much faster, effectively giving you something more like generational collection algorithms.

Now the full version of .NET of course has a better collector, and it can use lots of tricks to do things faster. But the thing is, in a big server environment, the situation is a lot more like a scaled-up version of that game. If you represent your durable data as values with shape being provided mostly by adjacency you are going to have very economical garbage collection and excellent data density. Both of which are great for getting the maximum value out of your processor.
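Here is one way “shape by adjacency” can look in practice. This is a sketch of my own (the class is invented for illustration and the loading code is elided), not the layout from the talk, but it is the same idea: the text lives in one big blob and the “pointers” are just integer offsets.

```
using System;
using System.Text;

// A sketch of "shape by adjacency": variable-length per-user data stored in a
// few large, pointer-free arrays. The GC sees three big arrays with no
// references inside them and skips their contents entirely.
public class UserNotesStore
{
    // All note text for all users, UTF-8 encoded and concatenated.
    private byte[] _textBlob = Array.Empty<byte>();

    // Per-note (offset, length) pairs into the blob.
    private int[] _noteOffsets = Array.Empty<int>();
    private int[] _noteLengths = Array.Empty<int>();

    // Loading and appending are elided; the point here is the layout.

    public string GetNote(int noteIndex)
    {
        // The "pointer" is a pair of integers; the data is adjacent in one blob.
        return Encoding.UTF8.GetString(
            _textBlob, _noteOffsets[noteIndex], _noteLengths[noteIndex]);
    }
}
```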

I’ve seen big servers with “% time in GC” well above 10%, even approaching 20% or 30%. This is pretty unhealthy. I prefer to see numbers that are more like 2% or 3% in the GC. This is achievable by using fewer objects.
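If you want to know where you stand, the runtime will tell you. A minimal sketch, assuming .NET 5 or later where GCMemoryInfo exposes the pause percentage directly; dotnet-counters gives you the same sort of numbers from outside the process.

```
using System;

// A quick look at GC overhead from inside the process.
public static class GcHealth
{
    public static void Report()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();

        Console.WriteLine($"Pause time:  {info.PauseTimePercentage:F1}%");
        Console.WriteLine($"Heap size:   {info.HeapSizeBytes / (1024.0 * 1024.0):F0} MB");
        Console.WriteLine($"Gen2 counts: {GC.CollectionCount(2)}");
    }
}
```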

Even if the algorithms are a little more complicated, you can easily pay for significant additional computation with savings in the memory layer. I had also written about this in the past.

The general advice there was, “imagine what you would do if you were going to store the data in a database. Now do that in memory.”

It’s pretty good advice.
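To take the analogy one step further, the in-memory version of “a table with a primary key index” might look like the sketch below. The names are mine, and I am waving away concurrency and growth policy; the point is rows in a dense array plus a small index, rather than a web of objects.

```
using System;
using System.Collections.Generic;

// A sketch of "do what a database would do, but in memory": rows live in a
// dense array (the table) and a small index maps a primary key to a row
// number, much like a primary key index maps to a page and slot.
public struct SessionRow
{
    public long UserId;
    public long LastSeenTicks;
    public int ShardId;
}

public class SessionTable
{
    private SessionRow[] _rows = new SessionRow[1024];
    private int _count;
    private readonly Dictionary<long, int> _byUserId = new Dictionary<long, int>();

    public void Upsert(in SessionRow row)
    {
        if (_byUserId.TryGetValue(row.UserId, out int i))
        {
            _rows[i] = row;  // update in place
            return;
        }
        if (_count == _rows.Length)
            Array.Resize(ref _rows, _rows.Length * 2);
        _rows[_count] = row;
        _byUserId[row.UserId] = _count++;
    }

    public bool TryGet(long userId, out SessionRow row)
    {
        if (_byUserId.TryGetValue(userId, out int i)) { row = _rows[i]; return true; }
        row = default;
        return false;
    }
}
```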

Remember that you can retire something like 150 instructions (YMMV) in the time it takes to deal with one L2 cache miss. That is plenty of room to pay for a lot.

In fact, if you literally stored your data in an in-memory SQLite database and then read it with managed-code access to SQLite — saving prepared statements to avoid recompilation, of course — you might come out way ahead on speed, storage, and correctness. You get an obvious transaction story and a replication story. You could even persist and merge if you wanted to do so, making re-sharding all the easier.

When all is said and done such a system might easily outperform a poor OO design while still giving you a nice interface to the parts you need at any given moment. Matt’s prototype above used a similar idea.
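For what it’s worth, here is roughly what the SQLite variant could look like, sketched with the Microsoft.Data.Sqlite package (other SQLite wrappers are similar); the schema and class names are invented for illustration.

```
using Microsoft.Data.Sqlite;

// A sketch of state kept in an in-memory SQLite database, reusing a prepared
// command so the SQL is compiled once. Schema and names here are illustrative.
public sealed class UserStateDb : System.IDisposable
{
    private readonly SqliteConnection _conn;
    private readonly SqliteCommand _getScore;

    public UserStateDb()
    {
        // The database lives only as long as this connection stays open.
        _conn = new SqliteConnection("Data Source=:memory:");
        _conn.Open();

        using (var create = _conn.CreateCommand())
        {
            create.CommandText =
                "CREATE TABLE user_state (user_id INTEGER PRIMARY KEY, score INTEGER NOT NULL)";
            create.ExecuteNonQuery();
        }

        _getScore = _conn.CreateCommand();
        _getScore.CommandText = "SELECT score FROM user_state WHERE user_id = $id";
        _getScore.Parameters.Add("$id", SqliteType.Integer);
        _getScore.Prepare();  // compile once, bind new parameter values per call
    }

    public long? GetScore(long userId)
    {
        _getScore.Parameters["$id"].Value = userId;
        object result = _getScore.ExecuteScalar();
        return result is long score ? score : (long?)null;
    }

    public void Dispose()
    {
        _getScore.Dispose();
        _conn.Dispose();
    }
}
```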

Touch Not What You Need Not Touch

It probably goes without saying, but any system with good performance will rely on touching only the necessary data. Any system that has to visit all the data, even periodically (such as to collect it) is necessarily materially disadvantaged. How disadvantaged will depend on the frequency and extent of the visitation. For instance, if your system basically never promotes objects to Generation 2 (watch the promote rate counters!) then you might go days between Gen2 collections. In which case those costs will be irrelevant. Density and Locality issues on the other hand will hurt you 100% of the time.
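A crude way to keep an eye on that from inside the process is sketched below; dotnet-counters or your favorite APM gives you the same information from outside. If the Gen 2 count barely moves over days, you are in the happy case.

```
using System;

// A sketch of watching how often Gen 2 collections actually happen. If the
// durable data is dense and pointer-free, this count should barely move.
public static class Gen2Watch
{
    private static int _lastGen2 = GC.CollectionCount(2);

    // Call this periodically, e.g. from a timer or a health-check endpoint.
    public static void Sample()
    {
        int gen2 = GC.CollectionCount(2);
        if (gen2 != _lastGen2)
        {
            Console.WriteLine($"{DateTime.UtcNow:O}: Gen 2 collections so far: {gen2}");
            _lastGen2 = gen2;
        }
    }
}
```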

That’s It

It’s not complicated.

  • OO Patterns are very poor for big data, even in memory.
  • The GC follows all your pointers even if you don’t ever use the data.
  • Cache misses are much slower than normal instructions so it’s worth a lot to reduce them.
  • Denser data representation means you can store more useful data in the same memory.
  • All of these things lead to choices that are rich in values and low in pointers for any kind of durable data.

How much this all matters will depend on your system.
