Jingle Smash: Performance Work

This is part 5 of my series on how I built Jingle Smash, a block-smashing WebVR game.

Performance was the final step in making Jingle Smash, my block tumbling VR game, ready to ship. WebVR on low-end mobile devices like the Oculus Go can be slow, but with a little work we can get a consistent 30fps at minimum, and usually 45 or above. Here are the steps I used to get Jingle Smash working well.

Merge Geometry

I learned from previous demos like my Halloween game that the limiting factor on a device like the Oculus Go isn't texture memory or number of polygons. No, the limiting factor is draw calls. In general we should try to keep draw calls under 100, preferably a lot under.
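One way to see where a scene stands against that budget is to read the renderer's own counters. The sketch below assumes a ThreeJS WebGLRenderer, which exposes the number of draw calls for the last frame as info.render.calls. The reportDrawCalls helper and its budget parameter are illustrative names, not part of the game's code.

```javascript
// Hypothetical helper: summarize draw calls for the last rendered frame.
// Assumes `renderer` is a THREE.WebGLRenderer (or anything exposing the same
// `info.render.calls` counter) and that it has already rendered a frame.
function reportDrawCalls(renderer, budget = 100) {
    const calls = renderer.info.render.calls
    return {
        calls,
        budget,
        overBudget: calls > budget,
    }
}

// Usage inside a render loop (sketch):
//   renderer.render(scene, camera)
//   const report = reportDrawCalls(renderer)
//   if (report.overBudget) console.warn(`draw calls: ${report.calls}`)
```

Checking this number before and after a change is a quick way to confirm that merging actually removed draw calls.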

One of the easiest ways to reduce draw calls is to combine multiple objects into one. If two objects have the same material (even if they use different UVs for the textures), then you can combine their geometry into one object. However, this is generally only effective for geometry that won't change.

In Jingle Smash the background is composed of multiple cones and spheres that make up the trees and hills. They don't move, so they are good candidates for geometry merging. Each color of cone tree uses the same texture and material, so I was able to combine them all into a single object per color. Now nine draw calls become two.


const tex = game.texture_loader.load('./textures/candycane.png')
tex.wrapS = THREE.RepeatWrapping
tex.wrapT = THREE.RepeatWrapping
tex.repeat.set(8,8)

const background = new THREE.Group()

const candyCones = new THREE.Geometry()
candyCones.merge(new THREE.ConeGeometry(1,10,16,8).translate(-22,5,0))
candyCones.merge(new THREE.ConeGeometry(1,10,16,8).translate(22,5,0))
candyCones.merge(new THREE.ConeGeometry(1,10,16,8).translate(7,5,-30))
candyCones.merge(new THREE.ConeGeometry(1,10,16,8).translate(-13,5,-20))
background.add(new THREE.Mesh(candyCones,new THREE.MeshLambertMaterial({ color:'white', map:tex,})))

const greenCones = new THREE.Geometry()
greenCones.merge(new THREE.ConeGeometry(1,5,16,8).translate(-15,2,-5))
greenCones.merge(new THREE.ConeGeometry(1,5,16,8).translate(-8,2,-28))
greenCones.merge(new THREE.ConeGeometry(1,5,16,8).translate(-8.5,0,-25))
greenCones.merge(new THREE.ConeGeometry(1,5,16,8).translate(15,2,-5))
greenCones.merge(new THREE.ConeGeometry(1,5,16,8).translate(14,0,-3))
background.add(new THREE.Mesh(greenCones,new THREE.MeshLambertMaterial({color:'green', map:tex,})))

The hills also use only a single material (white with lambert reflectance) so I combined them into a single object as well.


const dome_geo = new THREE.Geometry()
//left
dome_geo.merge(new THREE.SphereGeometry(6).translate(-20,-4,0))
dome_geo.merge(new THREE.SphereGeometry(10).translate(-25,-5,-10))
//right
dome_geo.merge(new THREE.SphereGeometry(10).translate(30,-5,-10))
dome_geo.merge(new THREE.SphereGeometry(6).translate(27,-3,2))

//front
dome_geo.merge(new THREE.SphereGeometry(15).translate(0,-6,-40))
dome_geo.merge(new THREE.SphereGeometry(7).translate(-15,-3,-30))
dome_geo.merge(new THREE.SphereGeometry(4).translate(7,-1,-25))

//back
dome_geo.merge(new THREE.SphereGeometry(15).translate(0,-6,40))
dome_geo.merge(new THREE.SphereGeometry(7).translate(-15,-3,30))
dome_geo.merge(new THREE.SphereGeometry(4).translate(7,-1,25))

background.add(new THREE.Mesh(dome_geo,new THREE.MeshLambertMaterial({color:'white'})))

Texture Compression

The next big thing I tried was texture compression. Before I started this project I thought texture compression enabled textures to be uploaded to the GPU faster and take up less RAM, so the init time would be reduced but drawing speed would be un-affected. How wrong I was!

First, texture compression is a very special form of compression that makes the texture images fast to decompress. The textures are stored compressed in GPU memory and then decompressed when accessed. This means less memory has to be read, so memory fetches become faster at the cost of doing decompression. However, GPUs have dedicated hardware for decompression, so that part is effectively free.

Second, the texture compression formats are specifically designed to fit well into GPU core caches and be able to decompress just a portion of a texture at a time. In some cases this can reduce drawing time by an order of magnitude.
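To get a rough feel for how much less memory is involved, here is a back-of-the-envelope calculation. It assumes an uncompressed 32-bit RGBA texture and a block format such as ETC1 or DXT1, which stores each 4x4 pixel block in 8 bytes (4 bits per pixel). The function names are made up for this sketch, and mipmaps are ignored.

```javascript
// Bytes for an uncompressed RGBA8 texture: 4 bytes per pixel.
function rgba8Bytes(width, height) {
    return width * height * 4
}

// Bytes for a block-compressed texture that packs each 4x4 pixel block
// into `bytesPerBlock` bytes (8 for ETC1 and DXT1). Partial blocks round up.
function blockCompressedBytes(width, height, bytesPerBlock = 8) {
    const blocksWide = Math.ceil(width / 4)
    const blocksHigh = Math.ceil(height / 4)
    return blocksWide * blocksHigh * bytesPerBlock
}

const raw = rgba8Bytes(2048, 2048)               // 16777216 bytes (16 MiB)
const packed = blockCompressedBytes(2048, 2048)  // 2097152 bytes (2 MiB)
console.log(raw / packed)                        // 8
```

Under these assumptions a 2048x2048 texture shrinks by a factor of eight, which is that much less data for the GPU to pull through its caches.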

Texture compression is clearly a win, but it does have a downside. The formats are designed to be fast to decompress at the cost of being very slow to do the initial compression. And I don't mean two or three times slower. It can take many minutes to compress a texture in some of the newer formats. This means texture compression must be done offline, and can't be used for textures generated on the fly like I did for most of the game.

So, sadly, texture compression wouldn't help me much here. The big sky image with clouds could benefit, but almost nothing else would. Additionally, every GPU supports different formats, so I'd have to compress the image multiple times. WebGL2 introduces some new common formats that are supported on most GPUs, but currently ThreeJS doesn't use WebGL2.
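Which formats a particular device supports can be probed at runtime through WebGL extensions. The sketch below checks a few of the compressed texture extensions from the WebGL extension registry using gl.getExtension(), which returns null when an extension is unavailable. The supportedCompressedFormats helper is a hypothetical name, and the list is not exhaustive.

```javascript
// A few compressed texture extensions defined in the WebGL extension registry.
const COMPRESSED_TEXTURE_EXTENSIONS = [
    'WEBGL_compressed_texture_s3tc',
    'WEBGL_compressed_texture_etc1',
    'WEBGL_compressed_texture_etc',
    'WEBGL_compressed_texture_astc',
    'WEBGL_compressed_texture_pvrtc',
]

// Hypothetical helper: return the names of the extensions this context supports.
function supportedCompressedFormats(gl) {
    return COMPRESSED_TEXTURE_EXTENSIONS.filter((name) => gl.getExtension(name) !== null)
}

// Usage in a browser (sketch):
//   const gl = document.createElement('canvas').getContext('webgl')
//   console.log(supportedCompressedFormats(gl))
```

A loader could use the result to pick which pre-compressed copy of an image to download, falling back to a plain PNG or JPEG when nothing matches.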

In any case, when I tried compressing the sky it made essentially no difference, and I didn't know why. I started measuring different parts of my game loop and discovered that the rendering time was only a fraction of the loop. My game was slow because of CPU work, not GPU work, so I stopped worrying about texture compression for this project.
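A minimal sketch of that kind of measurement is shown below. It is not the instrumentation from the game; it just wraps each part of the loop in a timer built on performance.now(). The timeSection helper and the placeholder function names in the usage comment are assumptions for illustration.

```javascript
// Accumulated milliseconds per named section of the loop.
const timings = {}

// Hypothetical helper: run `fn`, add its elapsed time under `label`,
// and pass through its return value.
function timeSection(label, fn) {
    const start = performance.now()
    const result = fn()
    timings[label] = (timings[label] || 0) + (performance.now() - start)
    return result
}

// Usage inside a game loop (sketch; the called functions are placeholders):
//   timeSection('physics', () => updatePhysics(dt))
//   timeSection('raycast', () => updatePointer())
//   timeSection('render',  () => renderer.render(scene, camera))
//   console.table(timings)
```

Comparing the accumulated totals after a few hundred frames makes it obvious whether the time is going to rendering or to CPU-side work such as physics and raycasting.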

Raycasting

I noticed while playing my game that performance would drop whenever I pointed the ornament slingshot towards the floor. I thought that was very odd, so I did some more measurements. It turns out I was wasting many milliseconds on raycasting. I knew my raycasting wasn't as fast as it could be, but why would it be slower when pointed at the floor, where it shouldn't intersect anything but the snow?

The default ThreeJS Raycaster is recursive. It will loop through every object in the scene from the root you provide to the intersectObject() function. Alternatively you can turn off recursion and it will check just the object passed in.

I use the Raycaster in my Pointer abstraction which is designed to be useful for all sorts of applications, so it recurses through the whole tree. More importantly, it starts at the scene, so it is recursing through the entire scene graph. I did provide a way to filter objects from being selected, but that doesn't affect the recursion, just the returned list.

Think of it like this: the scene graph is like a tree. By default the raycaster has to look at every branch and every leaf on the entire tree, even if I (the programmer) know that the possible targets are only in one part of the tree. What I needed was a way to tell the raycaster which entire branches could be safely skipped: like the entire background.

Raycaster doesn't provide a way to customize its recursive path, but since ThreeJS is open source I just made a copy and added a property called recurseFilter. This is a function that the raycaster calls on every branch with the current Object3D. It should return false if the raycaster can skip that branch.
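The idea can be sketched without ThreeJS at all. The code below is not the patched Raycaster. It walks a plain tree of objects with children and userData (mirroring Object3D) and uses a hypothetical visit callback in place of the real intersection test.

```javascript
// Walk `node` and its descendants, calling `visit` on each one, but skip
// any branch for which `recurseFilter` returns false.
function traverseFiltered(node, recurseFilter, visit) {
    if (!recurseFilter(node)) return   // prune this node and everything below it
    visit(node)
    for (const child of node.children || []) {
        traverseFiltered(child, recurseFilter, visit)
    }
}

// A tiny stand-in scene graph.
const scene = {
    name: 'scene', userData: {}, children: [
        { name: 'background', userData: { skipRaycast: true }, children: [
            { name: 'hills', userData: {}, children: [] },
            { name: 'cones', userData: {}, children: [] },
        ]},
        { name: 'blocks', userData: {}, children: [
            { name: 'block1', userData: { clickable: true }, children: [] },
        ]},
    ],
}

const visited = []
traverseFiltered(
    scene,
    (o) => !(o.userData && o.userData.skipRaycast === true),
    (o) => visited.push(o.name)
)
// visited is now: scene, blocks, block1 (the background branch is never entered)
console.log(visited)
```

The important detail is that the filter runs before recursing, so a skipped branch costs one function call no matter how many children it has.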

For Jingle Smash I used the filter like this:

const pointer_opts = {
    //Pointer searches everything in the scene by default
    //override this to match just certain things
    intersectionFilter: ((o) => o.userData.clickable),
    //eliminate from raycasting recursion to make it faster
    recurseFilter: (o) => {
        if(o.userData && o.userData.skipRaycast === true) return false
        return true
    },
    // ... rest of the options
}

Now I can set userData.skipRaycast to true on anything I want. For Jingle Smash I skip raycasting on the camera, the sky sphere, the slingshot itself, the particle system used for explosions, the lights, and everything in the background (hills and cones). These changes dropped the cost of raycasting from sometimes over 50 milliseconds to always less than 1 ms. The end result was an improvement of at least 10fps.

Future work

I'm pretty happy with how the game turned out. In the future the main change I'd like to make is to find a faster physics engine. WebAssembly is enabling more of the C/C++/Rust physics engines to be compiled for the web, so I will be able to switch to one of those at some point.

The standard WebXR boilerplate I've been using for the past six months is starting to show its age. I plan to rewrite it from scratch to better handle common use cases and to integrate the raycaster hack. It will also switch to being fully ES6 module compliant, now that browsers support modules everywhere and ThreeJS itself and some of its utils are being ported to modules (check out the jsm directory of the ThreeJS examples).