Diagnosing Garbage Collector Problems 101

[originally posted 6/30/2017, seeding some old content here]

I’ve been doing this for some time, and I have a few basic ways to get to the bottom of typical GC problems. Note: this is not even remotely a comprehensive guide on the subject; rather, it’s a way to think about the most basic types of problems you will encounter. I refer to the performance counters available in .NET because I know them best, but I think every collector worth mentioning has equivalent notions (where they apply).

Is there a problem?

Look at the percentage of time spent in the GC (in .NET, the “% Time in GC” performance counter). Generally, if that percentage is in the mid to high single digits, you don’t have a problem; that’s likely comparable to what you would get from a traditional allocator. YMMV.
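Java has no exact counterpart of that counter, but you can compute a rough equivalent from the standard management beans. The sketch below divides the cumulative collection time reported by each GarbageCollectorMXBean by the JVM’s uptime. The class name GcTimeFraction is made up for illustration. The result is a lifetime average rather than a recent window, so sample it twice and diff the readings if you want current behavior. Concurrent collectors may also count background work in their reported time, so treat the number as a rough signal.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: a rough Java analogue of .NET's "% Time in GC".
// Cumulative collection time (ms) summed over all collectors, divided by JVM uptime (ms).
final class GcTimeFraction {
    private GcTimeFraction() {}

    static double percentTimeInGc() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector doesn't report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis > 0 ? 100.0 * gcMillis / uptimeMillis : 0.0;
    }
}
```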

I have a problem, now what?

Frequent Collections

If you find that you are promoting a lot of objects that then die soon afterward, you are going to drive a lot of large (older-generation) collections to get that memory back, and those collections are not cheap. Try to address the lifetime problem by making the objects either more durable (which has its own problems) or less durable (probably the better idea). A classic way to get into this situation is allocating a lot of objects with finalizers: objects that require finalization necessarily survive at least one collection. If you can eliminate the need for finalization (for instance, by forcing the issue with explicit cleanup), you will be doing much better. Alternatively, if you can recycle those objects instead of letting them die, you may be OK. The worst thing you can do is drive a lot of death into the eldest generation.
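Here is a minimal Java sketch of both fixes, built around a hypothetical pool of byte buffers. There is no finalizer. Callers release buffers explicitly with try-with-resources, and released buffers are recycled instead of becoming garbage. The sketch assumes single-threaded use; a real pool would need synchronization and a policy for how much to keep.

```java
import java.util.ArrayDeque;

// Hypothetical single-threaded pool of reusable byte buffers.
final class BufferPool {
    private final ArrayDeque<PooledBuffer> free = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxPooled;

    BufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    // Reuse a released buffer if one is available; allocate only when the pool is empty.
    PooledBuffer acquire() {
        PooledBuffer b = free.pollFirst();
        if (b == null) {
            return new PooledBuffer(this, new byte[bufferSize]);
        }
        b.released = false;
        return b;
    }

    int pooledCount() {
        return free.size();
    }

    private void release(PooledBuffer b) {
        if (free.size() < maxPooled) {
            free.addFirst(b); // recycle it instead of letting it die
        }
        // Otherwise drop it: it becomes ordinary garbage with no finalization step.
    }

    // Deliberately no finalize() override, so instances never wait on the finalizer queue.
    static final class PooledBuffer implements AutoCloseable {
        private final BufferPool owner;
        private boolean released;
        final byte[] data;

        private PooledBuffer(BufferPool owner, byte[] data) {
            this.owner = owner;
            this.data = data;
        }

        // Explicit cleanup: hand the buffer back to its pool (safe to call twice).
        @Override
        public void close() {
            if (!released) {
                released = true;
                owner.release(this);
            }
        }
    }
}
```

Recycling keeps the buffers alive long-term. That is the “more durable” option, with its own costs: pooled objects end up in the eldest generation, and callers must not touch a buffer after returning it.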

To get clues about which objects may be causing this situation, dump the heap. Many heap dumpers will tell you what’s in there that’s already doomed (unreachable but not yet collected); that’s your clue. Alternatively, try to get a dump of what’s being promoted.
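On a HotSpot JVM, you can take such a dump from inside the process with the HotSpotDiagnosticMXBean. This is a JDK-specific API, so the sketch assumes OpenJDK or an Oracle JDK. With live set to false, the dump also includes objects that are unreachable but not yet collected. With live set to true, it contains only reachable objects, so comparing the two dumps is one way to see what’s already doomed. HeapDumper is a hypothetical wrapper name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical wrapper; assumes a HotSpot-based JVM that provides HotSpotDiagnosticMXBean.
final class HeapDumper {
    private HeapDumper() {}

    // Writes an .hprof dump to path. live == false includes unreachable ("doomed")
    // objects still on the heap; live == true dumps only reachable objects.
    // Throws IOException if the file already exists or can't be written.
    static void dump(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
    }
}
```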

If you don’t have a collector with a partial (generational) collection strategy, or if the promotion rate looks OK, then your overall allocation rate is likely the problem. Look for sources of temporary objects. Big sources often turn out to be silly things like comparison methods or hashing methods that allocate (if you have an object with 5 fields, don’t hash it by allocating an array, storing the 5 fields in it, and then hashing the array).
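Java makes this particular mistake easy. Objects.hash is a varargs method, so each call builds an Object[] and boxes any primitive fields, unless the JIT manages to optimize that away. The sketch below uses a hypothetical five-field Key class and computes the same 31-based hash by hand with no allocation.

```java
import java.util.Objects;

// Hypothetical value type with five fields.
final class Key {
    final int a, b, c, d;
    final String name;

    Key(int a, int b, int c, int d, String name) {
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
        this.name = name;
    }

    // Allocating version, shown for contrast: varargs array plus boxed ints.
    int hashCodeAllocating() {
        return Objects.hash(a, b, c, d, name);
    }

    // Same combining formula as Objects.hash / Arrays.hashCode, but no array and no boxing.
    @Override
    public int hashCode() {
        int h = 1;
        h = 31 * h + Integer.hashCode(a);
        h = 31 * h + Integer.hashCode(b);
        h = 31 * h + Integer.hashCode(c);
        h = 31 * h + Integer.hashCode(d);
        h = 31 * h + Objects.hashCode(name); // 0 for null
        return h;
    }

    // Field-by-field comparison; also allocation-free.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Key)) return false;
        Key k = (Key) o;
        return a == k.a && b == k.b && c == k.c && d == k.d && Objects.equals(name, k.name);
    }
}
```

Both methods return the same value because Objects.hash delegates to Arrays.hashCode, which uses the same formula.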

To get clues about what might be driving your allocation rate, attach an allocation profiler to your process and look at the statistics. The data types or allocation stacks should tell you where the spam is coming from. Nix the spammer.
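A real allocation profiler (for example, Java Flight Recorder) is the right tool here. For a quick check of a suspected spammer, though, HotSpot’s com.sun.management.ThreadMXBean can report how many bytes the current thread has allocated. This sketch measures the bytes allocated while running a piece of code, using the hypothetical helper name AllocMeter. It assumes a HotSpot JVM with thread allocation accounting enabled, which is the default, and it ignores allocations made on other threads.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper; not a profiler, just a per-thread allocation counter.
final class AllocMeter {
    private AllocMeter() {}

    // Returns the bytes allocated by the current thread while running work.
    static long bytesAllocatedBy(Runnable work) {
        com.sun.management.ThreadMXBean threads =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        if (!threads.isThreadAllocatedMemorySupported() || !threads.isThreadAllocatedMemoryEnabled()) {
            throw new UnsupportedOperationException("thread allocation accounting unavailable");
        }
        long tid = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(tid);
        work.run();
        long after = threads.getThreadAllocatedBytes(tid);
        return after - before;
    }
}
```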

Slow Collections

If individual collections are slow, you basically have two options:

  • store less overall (this is always a good idea)
  • store the same but do it with fewer objects and especially fewer pointers

If the collector has to trace a lot of objects, this is going to hurt your application/server no matter the strategy. If it stops the world, you get big stalls; if it’s concurrent, dancing around all those pointers in the background while your program is trying to do useful work will ruin your locality and therefore your performance.

To minimize the work the collector has to do, make your durable objects low in pointers and rich in values. Use larger objects (like arrays) and blocky objects (like B-trees) rather than fine-grained objects (like linked lists).
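Here is a Java illustration of “low in pointers, rich in values.” java.util.LinkedList<Integer> costs one node object per element, holding three references (item, next, prev), plus a boxed Integer for values outside the small cached range. The hypothetical IntList below keeps every value in a single int[]. A primitive array contains no references, so the collector never has to trace inside it.

```java
import java.util.Arrays;

// Hypothetical growable list of ints backed by one primitive array.
final class IntList {
    private int[] values = new int[16];
    private int size;

    void add(int v) {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow by doubling
        }
        values[size++] = v;
    }

    int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return values[i];
    }

    int size() {
        return size;
    }

    // Sequential scan over contiguous memory: no pointer chasing.
    long sum() {
        long s = 0;
        for (int i = 0; i < size; i++) {
            s += values[i];
        }
        return s;
    }
}
```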

It’s important to take a close look at your building blocks. For instance, in Java the standard HashMap links its entries together in per-bucket chains, with one node object per entry. That’s gonna be a lot of objects: if you have a huge HashMap, you’ll find that the collector has to work overtime tracing it compared to structures that are low in pointers.
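One pointer-light alternative is an open-addressing hash map with linear probing, where all entries live in three primitive arrays. The Java sketch below assumes that keys and values are both primitive longs. However large the map grows, the collector sees three pointer-free array objects, instead of one node per entry plus boxed Long keys and values. LongLongMap is a hypothetical name. Removal is left out to keep the sketch short, because deletion under linear probing needs tombstones or back-shifting.

```java
// Hypothetical open-addressing map from long keys to long values (no removal).
final class LongLongMap {
    private long[] keys = new long[16];   // capacity is always a power of two
    private long[] vals = new long[16];
    private boolean[] used = new boolean[16];
    private int size;

    // Multiplicative (Fibonacci) hashing to spread key bits across slots.
    private static int slot(long key, int mask) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h >>> 32) & mask;
    }

    void put(long key, long value) {
        if (2 * (size + 1) > keys.length) {
            resize(); // keep the load factor at or below 50%
        }
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i]) {
            if (keys[i] == key) {
                vals[i] = value; // overwrite existing entry
                return;
            }
            i = (i + 1) & mask; // linear probing
        }
        used[i] = true;
        keys[i] = key;
        vals[i] = value;
        size++;
    }

    long getOrDefault(long key, long defaultValue) {
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i]) {
            if (keys[i] == key) {
                return vals[i];
            }
            i = (i + 1) & mask;
        }
        return defaultValue;
    }

    int size() {
        return size;
    }

    // Double the arrays and reinsert every occupied slot.
    private void resize() {
        long[] oldKeys = keys;
        long[] oldVals = vals;
        boolean[] oldUsed = used;
        int n = oldKeys.length * 2;
        keys = new long[n];
        vals = new long[n];
        used = new boolean[n];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                put(oldKeys[j], oldVals[j]);
            }
        }
    }
}
```

The trade-off is generality: this only works for primitive keys and values. It is exactly the kind of structure the paragraph above favors when a map is large and long-lived.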

Summary

I’m a software engineer at Facebook; I specialize in software performance engineering and programming tools generally. I survived Microsoft from 1988 to 2017.