Lunar Panda Optimisation

Written by Dean Edis.

The technical development of Lunar Panda is now coming to an end. We're getting near to being feature complete, and now have to concentrate on level design, artwork, and sound effects. It is time to put on my 'Optimisation hat' (or 'Optimization hat', for the non-Brits :o).
This typically means I'll start up my code profiler of choice, run various parts of the application, and analyse the results to identify and resolve performance bottlenecks. And I am always surprised at what the profiler tells me! In my earlier coding days I used to try to write the most optimised code I could, as I was writing it. Lookup tables, inline assembly code, etc. - it was all done throughout the development of whatever app I was working on. A lesson I've learned now is DON'T OPTIMISE EARLY! More often than not the 'optimised' code would either be refactored out of existence (removing the need for code is always the best sort of optimisation), or it would be called so (comparatively) few times that it was not worth it.
Of course this doesn't mean you should write sloppy code - Just don't go over the top with optimisation until you really know where the bottlenecks are. Having a highly efficient routine might give you a warm feeling inside, but if making it so is at the expense of readability and maintainability it's just not worth it.
So, when you are ready to optimise, there are always a few good techniques to keep in mind.
    1. Is the profiler showing you any code that doesn't really need to be called? Are you calculating values you're never actually using?
      Removing code is always the best optimisation.
    2. Profile a 'release' build of your code (if applicable).
      There are often overheads in debug builds which can make the profiler results misleading or not applicable.
    3. Don't calculate values more times than you need. A well placed 'lookup table' (LUT) might cost some memory, but can result in significant speed increases.
    4. If you really need to make a function difficult to understand (I remember the days of optimising code using inline assembly, unrolling loops, etc...) consider keeping two copies of the same routine. Perhaps use an easier-to-read version in debug builds, and the optimised version in release builds.
      In fact, where I can, I try to call both functions in a debug build, and 'assert' (or similar) that both are giving the same output. That way you maintain readability and have runtime error checking. Bonus!
    5. If you have to run a time consuming piece of code, try and do it at a time the user won't mind a short delay.
      For Lunar Panda we make use of the time spent on the splash screen, and time at the end of each level when your score is being reported.
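To make points 3 and 4 a little more concrete, here is a minimal sketch of a lookup table with a readable reference version checked against it in debug builds. The FastTrig class and its method names are made up for illustration and are not taken from the Lunar Panda source. The sketch assumes whole-degree angles are precise enough.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration only.
public static class FastTrig
{
    // One entry per whole degree, built once when the class is first used.
    private static readonly float[] s_sinTable = BuildSinTable();

    private static float[] BuildSinTable()
    {
        var table = new float[360];
        for (int degrees = 0; degrees < table.Length; degrees++)
        {
            table[degrees] = (float)Math.Sin(degrees * Math.PI / 180.0);
        }
        return table;
    }

    // Easy-to-read reference version: does the full calculation every call.
    public static float SinDegreesReference(int degrees)
    {
        return (float)Math.Sin(degrees * Math.PI / 180.0);
    }

    // Optimised version: wrap the angle into 0..359 and read the table.
    public static float SinDegrees(int degrees)
    {
        int index = ((degrees % 360) + 360) % 360;
        float result = s_sinTable[index];

        // Debug.Assert calls are compiled out of release builds, so the
        // cross-check against the reference version costs nothing there.
        Debug.Assert(
            Math.Abs(result - SinDegreesReference(degrees)) < 1e-4f,
            "Lookup table and reference version disagree");

        return result;
    }
}
```

Calling FastTrig.SinDegrees(angle) in place of the full calculation trades 360 floats of memory for a single array read, and the debug build will complain if the two versions ever drift apart.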
As our latest version of Lunar Panda is written using XNA, I am profiling not only runtime performance but also the impact the code has on the C# garbage collector.
Garbage collection is an enormous subject, but I'd certainly recommend that anyone coding in C# (or any other garbage collected language) read up on the basics of their GC. As we are planning a release on the Xbox 360 the issue of garbage collection is even more important - the C# GC on Windows is head and shoulders faster than its implementation on the Xbox, so the rules of coding change slightly.
Most of the time we're told 'Don't force a garbage collection - Let the garbage collector decide when is the right time to collect'. A good rule. But on the Xbox the lesson is slightly different. Garbage collection tends to happen much more regularly on this platform, and when it does it can make your app 'pause' for a short while. Not ideal if you're trying to land Mr Panda next to a bunch of asteroids!!
The solution is twofold.
  1. Use a tool to monitor your memory allocations during gameplay.

    I recommend Microsoft's free CLR Profiler (http://en.wikipedia.org/wiki/CLR_Profiler) - check the online videos for instructions on how to use it.
    The fewer allocations you make, the longer it will be before garbage collection happens.

    For example, we'd like to avoid a garbage collection whilst playing a level on Lunar Panda. The garbage collector tends to jump in after about 1MB of allocations, so as the game runs at about 60 FPS, and a level will last at most a few minutes, we can calculate how much memory it's 'safe' to allocate per frame.

  2. Force a garbage collection to occur at a time the gamer won't mind.
    In C# this is done using a call to GC.Collect().

    For our game a good time to do this is just before a new level starts but AFTER all the resources for that level are loaded. You won't get good results if you force a GC and then load your resources! You might only get a few seconds into the game before your allocation count triggers the garbage collector to step in!
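The per-frame budget mentioned in step 1, and the ordering described in step 2, can be sketched roughly as follows. The GcBudget class, its method names, and the three-minute level length used in the note below are assumptions for illustration. Only GC.Collect() itself comes from the framework.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class GcBudget
{
    // How many bytes can be allocated each frame, on average, without
    // reaching the collector's threshold before the level ends.
    public static long SafeBytesPerFrame(long bytesBeforeCollection, int framesPerSecond, int levelSeconds)
    {
        long framesPerLevel = (long)framesPerSecond * levelSeconds;
        if (framesPerLevel <= 0)
        {
            throw new ArgumentOutOfRangeException("levelSeconds", "Frame rate and level length must be positive.");
        }
        return bytesBeforeCollection / framesPerLevel;
    }

    // Load first, collect second, so the level starts with the allocation
    // count as low as possible. 'loadLevelResources' stands in for whatever
    // loading code the game really uses.
    public static void StartLevel(Action loadLevelResources)
    {
        loadLevelResources();
        GC.Collect();
    }
}
```

Assuming a 1MB threshold, 60 FPS and a level lasting three minutes, SafeBytesPerFrame(1024 * 1024, 60, 180) comes out at 97 bytes per frame. That is very little room, which is why even a small per-frame string allocation matters.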
As an example of how a simple line of C# can cause a greater-than-expected number of garbage collections, consider my recent finding...
Every frame we display the angle of Mr Panda on the screen. The amount it changes is usually quite small, so at 60 FPS we end up drawing exactly the same text for multiple frames.
Our code looked something like this:
spriteBatch.DrawString(m_scoreFont, "Angle:" + angle, origin, Color.Yellow);
Doesn't look too bad, does it?
But the CLR Profiler reported that of all the allocations we were making, 'string' objects were way up the list! Concatenating "Angle:" and its value triggered some internal string allocations. And then converting the 'angle' int to a string every frame triggered yet more! How do we solve this? The method I opted for was to add an 'AngleString' property. Its 'getter' only bothers generating a new string if the angle has actually changed, otherwise it just returns its previous value.
public string AngleString
{
    get
    {
        // Only build a new string when the angle has changed since the last call.
        if (m_angleString == null || m_angle != m_lastDisplayedAngle)
        {
            m_angleString = "Angle:" + m_angle;
            m_lastDisplayedAngle = m_angle;
        }
        return m_angleString;
    }
}
Bingo! Considerably fewer allocations!
Another 'hidden' overhead was caused by my love of LINQ. I use ReSharper from JetBrains (http://www.jetbrains.com/resharper/ - If you can afford it, get it. Seriously.), and it is very good at reporting 'code smells' and offering LINQ-based alternatives to common coding patterns. LINQ is great for hiding complexity and making code more readable, but there is a small runtime overhead in using it. By occasionally avoiding LINQ's Select() and Any() routines we have saved a good many memory allocations, especially in areas such as our per-pixel collision detection code.
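As a rough illustration of the sort of swap involved, here are two versions of the same check. This is not our real collision code. The CollisionChecks class, the alpha-threshold test and both method names are made up for the example.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only.
public static class CollisionChecks
{
    // LINQ version: short and readable, but each call typically allocates
    // an enumerator for the array, plus a closure and delegate because the
    // lambda captures 'threshold'.
    public static bool AnyOpaqueLinq(byte[] alphaValues, byte threshold)
    {
        return alphaValues.Any(alpha => alpha >= threshold);
    }

    // Plain loop version: the same answer with no heap allocations.
    public static bool AnyOpaqueLoop(byte[] alphaValues, byte threshold)
    {
        for (int i = 0; i < alphaValues.Length; i++)
        {
            if (alphaValues[i] >= threshold)
            {
                return true;
            }
        }
        return false;
    }
}
```

In code that runs once the difference is irrelevant and the LINQ version is nicer to read. In a per-frame or per-pixel path the loop keeps the allocation count flat, so that is where the swap is worth making, and only after the profiler has pointed at it.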
Now I'm off for some more optimising. Good luck!