Part 2 - Multi Texture Batching

By Richard Davey on 25th November 2016   @photonstorm

In order to understand what multi-texture batch support does, it's worth explaining how a render pass works.

image

The Pixi Render Pass

Phaser uses a custom build of Pixi v2 internally. Pixi v2 supports Sprite batching under WebGL in order to render lots of sprites, really fast. The way batching works is this:

1) Every frame, Pixi starts at the root of the display list, which is the Phaser.Stage object. It then dives into the children of this, and keeps iterating until it finds a Display Object. With the object in hand, it looks at its BaseTexture. From this it takes a reference to the image being used, and binds it to the GPU. A new sprite batch is then started.

2) After uploading the vertex data of the Display Object it moves to the next item in the display list. If it's using the same BaseTexture then Happy Days. There's no need to bind a new texture, so it just adds the sprite data to the current batch, and carries on.

This continues until one of two things happens:

  1. It encounters a Sprite that has a different BaseTexture, or
  2. It hits 2000 objects in the batch.

In either case the batch is flushed. Flushing is the process of sending all of the texture and vertex data that has been built up to WebGL, and invoking a draw call, effectively rendering the sprites. The list of data is then emptied out.

After this happens, the next texture being used is bound to the GPU. The empty batch is re-populated again, and so it carries on, until the end of the display list is reached, and the render pass is over.

The whole process happens every frame. On modern browsers and GPUs that's 60 times per second.
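To make the flush rules concrete, here is a rough sketch of that loop in plain JavaScript. It is not Pixi's actual code: the countDrawCalls function and the idea of describing a display list as an array of texture keys are made up for illustration. It only counts how many times a batch would be flushed.

```javascript
// Hypothetical model of a single-texture sprite batch. Not Pixi's real code.
var BATCH_LIMIT = 2000; // the batch size quoted above

// textureKeys: one entry per sprite, in display list order
function countDrawCalls(textureKeys, batchLimit) {

    var drawCalls = 0;
    var batchSize = 0;
    var boundTexture = null;

    for (var i = 0; i < textureKeys.length; i++)
    {
        var key = textureKeys[i];

        // Flush when the texture changes, or when the batch is full
        if (batchSize > 0 && (key !== boundTexture || batchSize === batchLimit))
        {
            drawCalls++;
            batchSize = 0;
        }

        boundTexture = key;
        batchSize++;
    }

    // Flush whatever is left at the end of the display list
    if (batchSize > 0)
    {
        drawCalls++;
    }

    return drawCalls;

}

console.log(countDrawCalls(['a', 'a', 'a', 'b', 'b', 'a'], BATCH_LIMIT)); // 3
```

The three sprites using 'a' share a batch, the two using 'b' share another, and the final 'a' needs a third, even though that texture was already used earlier in the list.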

Using Texture Atlases

Obviously you want to keep the number of draw calls, and WebGL operations, to an absolute minimum. This is why using texture atlases is so important.

You can have loads of sprites, all using different parts of the same image, as seen in the screenshot of Texture Packer below:

image

If two, or more, sprites all use different parts of the same atlas then it doesn't cause the batch to flush, because the underlying source image hasn't changed. So no new texture bind needs to take place. This means that consecutive sprites that share an atlas all get bundled into a single draw call. Perfect.

However, there are limits on the maximum texture sizes that GPUs support. An iPhone 6S with an A9 GPU has a maximum texture size of 4096 x 4096 pixels. Gaming-level desktop GPUs can support much more, but if you want the widest possible audience for your game, you can't go there. Use a texture that is too large and lots of browsers will simply display black space where your images should have been.
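If you want to check the limit on a given device, WebGL exposes it through gl.getParameter. The sketch below assumes you already have a WebGL context in hand (for example from canvas.getContext('webgl')); the atlasFits helper is a made-up name for illustration, not part of Phaser or Pixi.

```javascript
// gl is assumed to be a WebGLRenderingContext.
// atlasFits is a hypothetical helper, not a Phaser or Pixi function.
function atlasFits(gl, width, height) {

    // Largest texture width or height, in pixels, that this GPU accepts
    var maxSize = gl.getParameter(gl.MAX_TEXTURE_SIZE);

    return (width <= maxSize && height <= maxSize);

}

// In a browser you might call it like this:
// var gl = document.createElement('canvas').getContext('webgl');
// if (gl && !atlasFits(gl, 8192, 8192)) { /* fall back to smaller atlases */ }
```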

While 4096 x 4096 is quite a decent texture size, it's highly unlikely you'd fit all of your game assets into it. The larger and more complex the game, the more assets you need. With character animations, game backdrops, UI, particles, etc., you often need to use multiple texture atlases just to render a single scene.

And in some cases using an atlas isn't even desirable. For example, a photographic-style game backdrop may be a far smaller download as a JPG, rather than a PNG, which is the format a texture atlas really needs to be.

Breaking the Batch

It's actually incredibly easy to code a really basic scene, using just a handful of sprites, that internally is an absolute nightmare for WebGL to render. Take the following code:

function create() {

    var keys = ['mushroom', 'clown', 'beball', 'coke', 'asuna', 
    'bikkuriman', 'bsquad1', 'bsquad2', 'bsquad3', 'car', 
    'carrot', 'duck', 'diamond', 'eggplant', 'firstaid'];

    var group = game.add.group();

    for (var i = 0; i < 210; i++)
    {
        group.create(0, 0, keys[i % 15]);
    }

    group.align(16, -1, 50, 44, Phaser.CENTER);

}

Here you can see that it's creating 210 sprites in a single Group. As the for loop runs it assigns each sprite one of 15 different textures from the keys array. These are just simple PNGs that were loaded in the preloader.

The end result is a relatively tiny number of sprites displayed (210), which is well under our batch limit. However, due to the order in which they were added to the display list, the WebGL sprite batch gets no chance to do its job.

Remember that sprite batching needs consecutive sprites to share the same texture. In our short code above though, because we alternate between different textures for each sprite, WebGL goes into a spin. And after every sprite, the batch is flushed, a new texture is bound, a draw call is made, and it starts all over again.

If you look at the results in a WebGL frame debugger (we're using Firefox Dev Tools here), it's shocking:

image

Because of the way we structured our display list, we generated 212 draw operations, and a staggering 1911 WebGL calls. Just from 210 tiny sprites.
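You can sanity check where those draw operations come from without a GPU. The sketch below is a toy model, not Phaser code: countBatches is a hypothetical helper that counts runs of consecutive sprites sharing a texture key. It compares the order used in create() with an order where each key's sprites sit next to each other. It only counts sprite batches, so it won't match a frame debugger's total exactly.

```javascript
var keys = ['mushroom', 'clown', 'beball', 'coke', 'asuna',
    'bikkuriman', 'bsquad1', 'bsquad2', 'bsquad3', 'car',
    'carrot', 'duck', 'diamond', 'eggplant', 'firstaid'];

// Hypothetical helper: a new batch starts whenever the texture key changes
function countBatches(displayList) {

    var batches = 0;

    for (var i = 0; i < displayList.length; i++)
    {
        if (i === 0 || displayList[i] !== displayList[i - 1])
        {
            batches++;
        }
    }

    return batches;

}

var interleaved = [];
var grouped = [];

for (var i = 0; i < 210; i++)
{
    interleaved.push(keys[i % 15]);             // same order as the create() example
    grouped.push(keys[Math.floor(i / 14)]);     // 14 sprites per key, back to back
}

console.log(countBatches(interleaved)); // 210
console.log(countBatches(grouped));     // 15
```

Bear in mind that reordering the display list also changes which sprites draw on top of which, so grouping by texture isn't always an option in a real game.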

Now I fully appreciate this is a contrived example, and you could easily fit all of the sprites in our test into a single texture atlas. Yet I don't believe we're straying far from reality here. Very often developers group the assets in their texture atlases based on the type of game element they represent, rather than where they appear in the display list.

Once you've got a nicely animated main character, animated baddies, some explosions going on, maybe some bitmap text floating up, particles, UI buttons, and game scenery, I believe most games are already heavily mixing different images in their display lists, causing constant batch flushes.

The Effect of Multi-Texturing

So, what can we do about it?

If the device supports it, WebGL is capable of using more than one texture at a time in a shader. This is entirely GPU dependent, but thankfully very easy to determine. An iPhone 5S with its A7 GPU can support up to 16 texture image units at once, in a single shader. Even a lowly iPhone 4 can support up to 8. The difference it makes is remarkable:

image

This is the exact same example as above, with multi-texturing enabled in Phaser. We've gone from 212 draw ops down to just 2, and one of those was clearing the screen. 1911 WebGL calls, down to 19.

It's painfully clear which one is better for overall performance.
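As a rough illustration of why the numbers drop so sharply, here is a sketch with two parts. The first queries how many texture units a fragment shader can use, via the standard gl.getParameter call. The second is a simplified model, and an assumption on my part rather than a description of Phaser's implementation: it treats a batch as able to hold sprites from up to maxUnits different textures, and flushes only when one more would be needed. Both function names are hypothetical.

```javascript
// gl is assumed to be a WebGLRenderingContext
function getMaxTextureUnits(gl) {

    return gl.getParameter(gl.MAX_TEXTURE_IMAGE_UNITS);

}

// Simplified model (an assumption, not Phaser's code): a batch may mix
// sprites from up to maxUnits distinct textures before it has to flush
function countMultiTextureBatches(displayList, maxUnits) {

    var batches = 0;
    var inBatch = {};
    var used = 0;

    for (var i = 0; i < displayList.length; i++)
    {
        var key = displayList[i];

        if (inBatch[key] !== true)
        {
            // No free texture unit left, so flush and start a new batch
            if (used === maxUnits)
            {
                batches++;
                inBatch = {};
                used = 0;
            }

            inBatch[key] = true;
            used++;
        }
    }

    // Flush the final batch
    if (used > 0)
    {
        batches++;
    }

    return batches;

}

// 210 sprites cycling through 15 textures, as in the earlier example
var list = [];

for (var i = 0; i < 210; i++)
{
    list.push('texture' + (i % 15));
}

console.log(countMultiTextureBatches(list, 16)); // 1
console.log(countMultiTextureBatches(list, 8));  // 27
```

With 16 units all 15 textures fit at once, so the whole scene becomes a single batch in this model. With 8 units it still needs far fewer batches than the 210 you get when binding one texture at a time.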

In the next part we'll cover how to enable your Phaser games to use this.