Coroutines
Strings and text

Asset auditing

Many of the problems found in real projects occur because of honest mistakes – temporary “test” changes and misclicks by a tired developer can silently add poorly-performing Assets, or change the import settings of existing Assets.

For any project of significant scale, it is best to have a first line of defense against human errors. It is relatively simple to write a small piece of code that ensures that no one can add a 4K uncompressed Texture to a project.

And yet, this is a surprisingly common problem. A 4K uncompressed Texture occupies 60 megabytes of memory. On a low-end mobile device, such as an iPhone 4S, it is dangerous to consume more than about 180–200 megabytes of memory. If added by mistake, this Texture inadvertently occupies between a third and a quarter of the application’s memory budget, and causes difficult-to-diagnose out-of-memory errors.

While it is now possible to track these issues down with the 5.3 Memory Profiler, it is arguably better to make sure such mistakes are impossible in the first place.

Using AssetPostprocessor

The AssetPostprocessor class in the Unity Editor can be used to enforce certain minimum standards on a Unity project. This class receives callbacks when Assets are imported. To use it, inherit from AssetPostprocessor and implement one or more OnPreprocess methods. Significant ones include:

  • OnPreprocessTexture

  • OnPreprocessModel

  • OnPreprocessAnimation

  • OnPreprocessAudio

See the Scripting Reference on AssetPostprocessor for more possible OnPreprocess methods.


public class ReadOnlyModelPostprocessor : AssetPostprocessor {

   public void OnPreprocessModel() {

        ModelImporter modelImporter =

 (ModelImporter)assetImporter;

        if(modelImporter.isReadable) {

            modelImporter.isReadable = false;

            modelImporter.SaveAndReimport();

        }

    }

}

This is a simple example of an AssetPostprocessor enforcing rules on a project:

This class is called any time a model is imported into the project, or whenever a model has its import settings changed. The code simply checks to see if the Read/Write enabled flag (represented by the isReadable property) is set to true. If it is, it forces the flag to be false and then saves and reimports the Asset.

Note that calling SaveAndReimport causes this snippet of code to be called again! However, because it is now assured that isReadable is false, this code does not produce an infinite reimport loop.

The reason for this change is discussed in the “Models” section, below.

Common Asset rules

Textures

Disable the read/write enabled flag

The Read/Write enabled flag causes a Texture to be kept twice in memory: once on the GPU and once in CPU-addressable memory(1) (NOTE: This is because, on most platforms, readback from GPU memory is extremely slow. Reading a Texture from GPU memory into a temporary buffer for use by CPU code (e.g. Texture.GetPixel) would be very nonperformant.). In Unity, this setting is disabled by default, but it can be accidentally switched on.

Read/Write Enabled is only necessary when manipulating Texture data outside of Shaders (such as with the Texture.GetPixel and Texture.SetPixel APIs) and should be avoided whenever possible.

Disable Mipmaps if possible

For objects that have a relatively invariant Z-depth relative to the Camera, it is possible to disable mipmaps to save about a third of the memory required to load the Texture. If an object changes Z-depth, disabling mipmaps can lead to worse Texture sampling performance on the GPU.

In general, this is useful for UI Textures and other Textures that are displayed at a constant size on screen.

Compress all Textures

Using a texture compression format suitable for the project’s target platform is crucial for saving memory.

If the selected texture compression format is unsuited to the target platform, Unity decompresses the Texture when the Texture is loaded, consuming both CPU time and an excessive amount of memory. This is most commonly a problem on Android devices, which often support vastly different texture compression formats depending on the chipset.

Enforce sensible Texture size limits

While this is simple, it is also very easy to forget to resize a Texture or to inadvertently alter the Texture size import setting. Determine a sensible maximum size for different types of Textures and enforce these via code.

For many mobile applications, 2048x2048 or 1024x1024 is sufficient for Texture atlases, and 512x512 is sufficient for Textures applied to 3D models.

Models

Disable the Read/Write enabled flag

TheRead/Write enabled flag for models operates identically to the one described for Textures. However, it is enabled by default for models.

Unity requires this flag to be enabled if a project is modifying a Mesh at runtime via script, or if the Mesh is used as the basis for a MeshCollider component. If the model is not used in a MeshCollider and is not manipulated by scripts, disable this flag to save half of the model’s memory.

Disable rigs on non-character models

By default, Unity imports a generic rig for non-character models. This causes an Animator component be to added if the model is instantiated at runtime. If the model is not animated via the Animation system, this adds unnecessary overhead to the animation system, because all active Animators must be ticked once per frame.

Disable the rig on non-animated models to avoid this automatic addition of an Animator component and possible inadvertent addition of unwanted Animators to a Scene.

Enable the Optimize Game Objects option on animated models

The Optimize Game Objects option has a significant performance impact on animated models. When the option is disabled, Unity creates a large transform hierarchy mirroring the model’s bone structure whenever the model is instantiated. This transform hierarchy is expensive to update, especially if other components (such as Particle Systems or Colliders) are attached to it. It also limits Unity’s ability to multithread Mesh skinning and bone animation calculations.

If specific locations on a model’s bone structure need to be exposed (such as exposing a model’s hands in order to dynamically attach weapon models) then these locations can be specifically whitelisted in the Extra Transforms list.

Some additional details can be found in the Unity Manual page on the Model Importer.

Use Mesh compression when possible

Enabling Mesh compression reduces the number of bits used to represent the floating-point numbers for different channels of a model’s data. This can lead to a minor loss of precision, and the effects of this imprecision should be checked by artists before use in a final project.

The specific numbers of bits that a given compression level uses are detailed in the ModelImporterMeshCompression Scripting Reference.

Note that it is possible to use different levels of compression for different channels, so a project may choose to compress only tangents and normals while leaving UVs and vertex positions uncompressed.

Note: Mesh Renderer Settings

When adding Mesh Renderers to Prefabs or GameObjects, take note of the settings on the component. By default, Unity enables Shadow casting and receiving, Light Probe sampling, Reflection Probe sampling, and Motion Vector calculation.

If a project does not require one or more of these features, ensure that they’re turned off by an automated script. Any runtime code that adds MeshRenderers needs to toggle these settings as well.

For 2D games, accidentally adding a MeshRenderer to the Scene with the shadow options turned on adds a full shadow pass to the rendering loop. This is generally a waste of performance.

Audio

Platform-appropriate compression settings

Enable a compression format for audio that matches the available hardware. All iOS devices include hardware MP3 decompressors, and many Android devices support Vorbis natively.

Further, import uncompressed audio files into Unity. Unity always recompresses audio when building a project. There is no need to import compressed audio and then recompress it; this only serves to degrade the quality of the final audio clips.

Force audio clips to mono

Few mobile devices actually have stereo speakers. On a mobile project, forcing the imported audio clips to be mono-channel halves their memory consumption. This setting is also applicable to any audio without stereo effects, such as most UI sound effects.

Reduce audio bitrate

While this requires consultation with an audio designer, attempt to minimize the bitrate of audio files in order to further save on memory consumption and built project size.

Understanding garbage collection & the managed heap

Another common problem faced by many Unity developers is the unexpected expansion of the managed heap. In Unity, the managed heap expands much more readily than it shrinks. Furthermore, Unity’s garbage collection strategy tends to fragment memory, which can prevent a large heap from shrinking.

Technical Details: how the managed heap operates & why it expands

The “managed heap” is a section of memory that is automatically managed by the memory manager of a project’s scripting runtime (Mono or IL2CPP). All objects created in managed code must be allocated on the managed heap(2) (NOTE: Strictly speaking, all non-null reference-typed objects and all boxed value-typed objects must be allocated on the managed heap).

In the above diagram, the white box represents a quantity of memory apportioned to the managed heap, and the colored boxes within it represent data values stored within the managed heap’s memory space. When additional values are needed, more space is allocated from within the managed heap.

The garbage collector runs periodically(3) (NOTE: The exact timing is platform-dependent). This sweeps through all objects on the heap, marking for deletion any objects that are no longer referenced. Unreferenced objects are then deleted, freeing up memory.

Crucially, Unity’s garbage collection – which uses the Boehm GC algorithm – is non-generational and non-compacting. “Non-generational” means that the GC must sweep through the entire heap when performing a collection pass, and its performance therefore degrades as the heap expands. “Non-compacting” means that objects in memory are not relocated in order to close gaps between objects.

The above diagram shows an example of memory fragmentation. When an object is released, its memory is freed. However, the freed space does not become part of a single large pool of “free memory”. The objects on either side of the freed object may still be in use. Because of this, the freed space is a “gap” between other segments of memory (this gap is indicated by the red circle in the diagram). The newly-freed space can therefore only be used to store data of identical or lesser size than the freed object.

When allocating an object, remember that the object must always occupy a contiguous block of space in memory.

This leads to the core problem of memory fragmentation: while the overall amount of space available in the heap may be substantial, it is possible that some or all of that space is in small “gaps” between allocated objects. In this case, even though there may be enough total space to accommodate a certain allocation, the managed heap cannot find a large enough block of contiguous memory in which to fit the allocation.

However, if a large object is allocated and there is insufficient contiguous free space to accommodate the object, as illustrated above, the Unity memory manager performs two operations.

First, if it has not already done so, the garbage collector runs. This attempts to free up enough space to fulfill the allocation request.

If, after the GC runs, there is still not enough contiguous space to fit the requested amount of memory, the heap must expand. The specific amount that the heap expands is platform-dependent; however, most Unity platforms double the size of the managed heap.

Key problems with the heap

The core issues with managed heap expansion are twofold:

  • Unity does not often release the memory pages allocated to the managed heap when it expands; it optimistically retains the expanded heap, even if a large portion of it is empty. This is to prevent the need to re-expand the heap should further large allocations occur.

  • On most platforms, Unity eventually releases the pages used by empty portions of the managed heap back to the operating system. The interval at which this occurs is not guaranteed and should not be relied upon.

  • The address space used by the managed heap is never returned to the operating system.

  • For 32-bit programs, this can lead to address space exhaustion if the managed heap expands and contracts many times. If a program’s available memory address space is exhausted, the operating system will terminate the program.

  • For 64-bit programs, the address space is sufficiently large that this is extremely unlikely to occur for programs whose running time does not exceed the average human lifespan.

Temporary allocations

Many Unity projects are found to operate with several tens or hundreds of kilobytes of temporary data being allocated to the managed heap each frame. This is often extremely detrimental to a project’s performance. Consider the following math:

If a program allocates one kilobyte (1kb) of temporary memory each frame, and is running at 60 frames per second, then it must allocate 60 kilobytes of temporary memory per second. Over the course of a minute, this adds up to 3.6 megabytes of garbage in memory. Invoking the garbage collector once per second is likely to be detrimental to performance, but allocating 3.6 megabytes per minute is problematic when attempting to run on low-memory devices.

Further, consider loading operations. If a large number of temporary objects are generated during a heavy Asset-loading operation, and those objects are referenced until the operation completes, then the garbage collector is unable to release those temporary objects and the managed heap needs to expand – even though many of the objects it contains will be released a short time later.

Keeping track of managed memory allocations is relatively simple. In Unity’s CPU Profiler, the Overview has a “GC Alloc” column. This column displays the number of bytes allocated on the managed heap in a specific frame (4) (NOTE: Note that this is not identical to the number of bytes temporarily allocated during a given frame. The profile displays the number of bytes allocated in a specific frame, even if some/all of the allocated memory is reused in subsequent frames.). With the “Deep Profiling” option enabled, it’s possible to track down the method in which these allocations occur.

Note that some script methods cause allocations when running in the Editor, but do not produce allocations after the project has been built. GetComponent is the most common example; this method always allocates when executed in the Editor, but not in a built project.

In general, it is strongly recommended that all developers minimize managed heap allocations whenever the project is in an interactive state. Allocations during non-interactive operations, such as Scene loading, are less problematic.

Basic memory conservation

There are a handful of relatively simple techniques that can be employed to reduce managed heap allocations.

Collection and array reuse

When using C#’s Collection classes or Arrays, consider reusing or pooling the allocated Collection or Array whenever possible. The Collection classes expose a Clear method which eliminates the Collection’s values but does not release the memory allocated to the Collection.


void Update() {

    List<float> nearestNeighbors = new List<float>();

    findDistancesToNearestNeighbors(nearestNeighbors);

    nearestNeighbors.Sort();

    // … use the sorted list somehow …

}

This is particularly useful when allocating temporary “helper” Collections for complex computations. A very simple example might be the following code:

In this example, the nearestNeighbors List is allocated once per frame in order to collect a set of data points. It’s very simple to hoist this List out of the method and into the containing class, which avoids allocating a new List each frame:


List<float> m_NearestNeighbors = new List<float>();

void Update() {

    m_NearestNeighbors.Clear();

    findDistancesToNearestNeighbors(NearestNeighbors);

    m_NearestNeighbors.Sort();

    // … use the sorted list somehow …

}

In this version, the List’s memory is retained and reused across multiple frames. New memory is only allocated when the List needs to expand.

Closures & anonymous methods

There are two points to consider when using closures and anonymous methods.

First, all method references in C# are reference types, and are therefore allocated on the heap. Temporary allocations can be easily created by passing a method reference as an argument. This allocation occurs regardless of whether the method being passed is an anonymous method or a predefined one.

Second, converting an anonymous method to a closure significantly increases the amount of memory required to pass the closure to method receiving it.

Consider the following code:


List<float> listOfNumbers = createListOfRandomNumbers();

listOfNumbers.Sort( (x, y) =>

(int)x.CompareTo((int)(y/2)) 

);

This snippet uses a simple anonymous method to control the sorting order of the list of numbers created on the first line. However, if a programmer wished to make this snippet reusable, it is tempting to substitute the constant 2 for a variable in local scope, like so:


List<float> listOfNumbers = createListOfRandomNumbers();

int desiredDivisor = getDesiredDivisor();

listOfNumbers.Sort( (x, y) =>

(int)x.CompareTo((int)(y/desiredDivisor))

);

The anonymous method now requires the method to be able to access the state of a variable outside of the method’s scope, and so has become a closure. The desiredDivisor variable must be passed into the closure somehow so that it can be used by the actual code of the closure.

To do this, C# generates an anonymous class that can retain the externally-scoped variables needed by the closure. A copy of this class is instantiated when the closure is passed to the Sort method, and the copy is initialized with the value of the desiredDivisor integer.

Because executing the closure requires instantiation of a copy of its generated class, and all classes are reference types in C#, then executing the closure requires allocation of an object on the managed heap.

In general, it is best to avoid closures in C# whenever possible. Anonymous methods and method references should be minimized in performance-sensitive code, and especially in code that executes on a per-frame basis.

Anonymous methods under IL2CPP

Currently, inspection of code generated by IL2CPP reveals that the simple declaration and assignment of a variable of type System.Function allocates a new object. This is true whether the variable is explicit (declared in a method/class) or implicit (declared as an argument to another method).

As such, any use of anonymous methods under the IL2CPP scripting backend allocates managed memory. This is not the case under the Mono scripting backend.

Furthermore, IL2CPP displays dramatically different levels of managed memory allocation depending on the way in which a method argument is declared. Closures, as expected, allocate the most memory per call.

Unintuitively, predefined methods allocate nearly as much memory as closures when passed as arguments under the IL2CPP scripting backend. Anonymous methods generate the least amount of transient garbage on the heap, by one or more orders of magnitude.

Therefore, if a project is intended to ship on the IL2CPP scripting backend, there are three key recommendations:

  • Prefer coding styles that do not require passing methods as arguments.

  • When unavoidable, prefer anonymous methods over predefined methods.

  • Avoid closures, regardless of scripting backend.

Boxing

Boxing is one of the most common sources of unintended temporary memory allocations found in Unity projects. It occurs whenever a value-typed value is utilized as a reference type; this most often occurs when passing primitive value-typed variables (such as int and float) to object-typed methods.

In this extremely simple example, the integer in x is boxed in order to be passed to the object.Equals method, because the Equals method on object requires that an object be passed to it.


int x = 1;

object y = new object();

y.Equals(x);

C# IDEs and compilers generally do not issue warnings about boxing, even though it leads to unintended memory allocations. This is because the C# language was developed with the assumption that small temporary allocations would be efficiently handled by generational garbage collectors and allocation-size-sensitive memory pools.

While Unity’s allocator does use different memory pools for small and large allocations, Unity’s garbage collector is not generational and therefore cannot efficiently sweep out the small, frequent temporary allocations generated by boxing.

Boxing should be avoided wherever possible when writing C# code for Unity runtimes.

Identifying boxing

Boxing shows up in CPU traces as calls to one of a few methods, depending on the scripting backend in use. These generally take one of the following forms, where <some class> is the name of some other class or struct, and is some number of arguments:

  • <some class>::Box(…)

  • Box(…)

  • <some class>_Box(…)

It can also be located by searching the output of a decompiler or IL viewer, such as the IL viewer tool built into ReSharper or the dotPeek decompiler. The IL instruction is “box”.

Dictionaries and enums

One common cause of boxing is the use of enum types as keys for Dictionaries. Declaring an enum creates a new value type that is treated like an integer behind the scenes, but enforces type-safety rules at compile time.

By default, a call to Dictionary.add(key, value) results in a call to Object.getHashCode(Object). This method is used to obtain the appropriate hash code for the Dictionary’s key, and is used in all methods that accept a key: Dictionary.tryGetValue, Dictionary.remove, etc.

The Object.getHashCode method is reference-typed, but enum values are always value types. Therefore, for enum-keyed Dictionaries, every method call results in the key being boxed at least once.

The following code snippet illustrates a simple example that demonstrates this boxing problem:


enum MyEnum { a, b, c };

var myDictionary = 

new Dictionary<MyEnum, object>();

myDictionary.Add(MyEnum.a, new object());

To solve this problem, it is necessary to write a custom class that implements the IEqualityComparer interface and assign an instance of that class as the Dictionary’s comparer (NOTE: This object is usually stateless, and therefore can be reused with different Dictionary instances to save memory.)

The following is a simple example of an IEqualityComparer for the above code snippet.


public class MyEnumComparer : IEqualityComparer<MyEnum> {

    public bool Equals(MyEnum x, MyEnum y) {

        return x == y;

    }

    public int GetHashCode(MyEnum x) {

        return (int)x;

    }

}

An instance of the above class could be passed to the Dictionary’s constructor.

Foreach loops

In Unity’s version of the Mono C# compiler, use of the foreach loop forces Unity to box a value each time the loop terminates (NOTE: The value is boxed once each time the loop as a whole finishes executing. It does not box once per iteration of the loop, so memory usage remains the same regardless of whether the loop runs 2 times or 200 times.). This is because the IL generated by Unity’s C# compiler constructs a generic value-type Enumerator in order to iterate over the value collection.

This Enumerator implements the IDisposable interface, which must be called when the loop terminates. However, calling interface methods on value-typed objects (such as structs and Enumerators) requires boxing them.

Examine the following very simple example code:


int accum = 0;

foreach(int x in myList) {

    accum += x;

}

The above, when run through Unity’s C# compiler, produces the following Intermediate Language:


   .method private hidebysig instance void 

    ILForeach() cil managed 

  {

    .maxstack 8

    .locals init (

      [0] int32 num,

      [1] int32 current,

      [2] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32> V_2

    )

    // [67 5 - 67 16]

    IL_0000: ldc.i4.0     

    IL_0001: stloc.0      // num

    // [68 5 - 68 74]

    IL_0002: ldarg.0      // this

    IL_0003: ldfld        class [mscorlib]System.Collections.Generic.List`1<int32> test::myList

    IL_0008: callvirt     instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0/*int32*/> class [mscorlib]System.Collections.Generic.List`1<int32>::GetEnumerator()

    IL_000d: stloc.2      // V_2

    .try

    {

      IL_000e: br           IL_001f

    // [72 9 - 72 41]

      IL_0013: ldloca.s     V_2

      IL_0015: call         instance !0/*int32*/ valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::get_Current()

      IL_001a: stloc.1      // current

    // [73 9 - 73 23]

      IL_001b: ldloc.0      // num

      IL_001c: ldloc.1      // current

      IL_001d: add          

      IL_001e: stloc.0      // num

    // [70 7 - 70 36]

      IL_001f: ldloca.s     V_2

      IL_0021: call         instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::MoveNext()

      IL_0026: brtrue       IL_0013

      IL_002b: leave        IL_003c

    } // end of .try

    finally

    {

      IL_0030: ldloc.2      // V_2

      IL_0031: box          valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>

      IL_0036: callvirt     instance void [mscorlib]System.IDisposable::Dispose()

      IL_003b: endfinally   

    } // end of finally

    IL_003c: ret          

  } // end of method test::ILForeach

} // end of class test

The most relevant code is the __finally { … }__ block near the bottom. The callvirt instruction discovers the location of the IDisposable.Dispose method in memory before invoking the method, and requires that the Enumerator be boxed.

In general, foreach loops should be avoided in Unity. Not only do they box, but the method-call cost of iterating over collections via Enumerators is generally much slower than manual iteration via a for or while loop.

Note that the C# compiler upgrade in Unity 5.5 significantly improves Unity’s ability to generate IL. In particular, the boxing operations has been eliminated from foreach loops. This eliminates the memory overhead associated with foreach loops. However, the CPU performance difference compared to equivalent Array-based code remains, due to method-call overhead.

Array-valued Unity APIs

A more pernicious and less-visible cause of spurious array allocation is the repeated accessing of Unity APIs that return arrays. All Unity APIs that return arrays create a new copy of the array each time they are accessed. It is extremely non-optimal to access an array-valued Unity API more often than necessary.

As an example, the following code spuriously creates 4 copies of the vertices array per loop iteration. The allocations are occur each time the .vertices property is accessed.


for(int i = 0; i < mesh.vertices.Length; i++)

{

    float x, y, z;

    x = mesh.vertices[i].x;

    y = mesh.vertices[i].y;

    z = mesh.vertices[i].z;

    // ...

    DoSomething(x, y, z);   

}

This can be trivially refactored into a single array allocation, regardless of the number of loop iterations, by capturing the vertices array before entering the loop:


var vertices = mesh.vertices;

for(int i = 0; i < vertices.Length; i++)

{

    float x, y, z;

    x = vertices[i].x;

    y = vertices[i].y;

    z = vertices[i].z;

    // ...

    DoSomething(x, y, z);   

}

While the CPU cost of accessing a property once is not very high, repeated accesses within tight loops create CPU performance hotspots. Further, repeated accesses unnecessarily expand the managed heap.

This problem is extremely common on mobile, because the Input.touches API behaves similarly to the above. It is extremely common for projects to contain code similar to the following, where an allocation occurs each time the .touches property is accessed.


for ( int i = 0; i < Input.touches.Length; i++ )

{

   Touch touch = Input.touches[i];

    // …

}

This can, of course, be trivially improved by hoisting the array allocation out of the loop condition:


Touch[] touches = Input.touches;

for ( int i = 0; i < touches.Length; i++ )

{

   Touch touch = touches[i];

   // …

}

However, there are now versions of many Unity APIs that do not cause memory allocations. These should generally be favored, when they’re available.


int touchCount = Input.touchCount;

for ( int i = 0; i < touchCount; i++ )

{

   Touch touch = Input.GetTouch(i);

   // …

}

Converting the above example to the allocation-less Touch API is simple:

Note that the property access (Input.touchCount) is still kept outside the loop condition in order to save the CPU cost of invoking the property’s get method.

Empty array reuse

Some teams prefer to return empty arrays instead of null when an array-valued method needs to return an empty set. This coding pattern is common in many managed languages, particularly C# and Java.

In general, when returning a zero-length array from a method, it is considerably more efficient to return a pre-allocated singleton instance of the zero-length array than to repeatedly create empty arrays.(5) (NOTE: Naturally, an exception should be made when the array is resized after being returned.)

Footnotes

  • (1) This is because, on most platforms, readback from GPU memory is extremely slow. Reading a Texture from GPU memory into a temporary buffer for use by CPU code (e.g. Texture.GetPixel) would be very nonperformant.

  • (2) Strictly speaking, all non-null reference-typed objects and all boxed value-typed objects must be allocated on the managed heap.

  • (3) The exact timing is platform-dependent.

  • (4) Note that this is not identical to the number of bytes temporarily allocated during a given frame. The profile displays the number of bytes allocated in a specific frame, even if some/all of the allocated memory is reused in subsequent frames.

  • (5) Naturally, an exception should be made when the array is resized after being returned.

Coroutines
Strings and text