Building a Smarter Foundation: Math Improvements in Lumberyard
Authored by Karl Berg, Principal Engineer on Amazon Lumberyard
Nothing pushes the computational power of modern computers quite like real-time 3D games. From the vector algebra used to transform 3D scenes to 2D screen space, to the visibility queries used for rendering and AI, to the deformation of surfaces around bones for characters, to the integrators used to solve complex chains of physics constraints: game engines require a robust and extremely efficient mathematical foundation to build these complex systems.
Due to the historical origins of Lumberyard 1.x, what we had was legacy CryEngine math mixed together with a math library inherited from our Double Helix acquisition, along with some Lumberyard-specific additions sprinkled throughout. Over time, we realized our old math library had a lot of shortcomings, including: highly inconsistent API design that confused developers and led to unpredictable results; outdated concepts around compiler auto-vectorization; questionable and undocumented usage of estimate instructions; hardcoded intrinsics woven into class implementations that made updates for modern CPUs (like ARM variants) a real pain; and an archaic transform representation that required far too much effort to integrate into modern game feature implementations.
To support next generation Lumberyard development, we’ve addressed each of the above points, with impressive improvements to both performance and accuracy.
Here is a quick overview of some of the changes that have been made:
- Awkward methods have been deprecated and removed.
- Approx overloads of mathematical functions have been removed. Where an estimate instruction showed a performance benefit, we’ve added Estimate operations instead. All functions not marked Estimate are full precision.
- AZ::Transform has been decomposed so it now contains individual translation, orientation, and scale values.
- We have a brand new C-style SIMD layer, accessible through the AZ::Simd namespace, which now supports ARM Neon in addition to SSE on x64.
- We have a new set of fast SIMD trig functions with much improved accuracy, including atan2, which work across CPU architectures thanks to the new AZ::Simd layer.
- We’ve added an AZ::Frustum type and implemented a robust set of visibility operations required for our next generation networking, streaming, and rendering efforts.
- Over 1,000 unit tests have been added or refactored to validate the correctness of the new math library.
- We’ve added more than 450 microbenchmarks to validate and monitor performance.
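To make the visibility operations mentioned in the list above more concrete, here is a minimal sketch of one common query: testing a bounding sphere against the six planes of a frustum. Every name here (FrustumSketch, Plane, Sphere, IntersectsSphere, MakeBoxFrustum) is hypothetical and chosen for illustration only; this is not the AZ::Frustum API, and the real implementation may differ.

```cpp
#include <array>

// Hypothetical types for illustration; these are NOT the AZ:: math types.
struct Vec3 { float x, y, z; };

// Plane stored as dot(normal, p) + d = 0, with the normal pointing into the volume.
struct Plane { Vec3 normal; float d; };

struct Sphere { Vec3 center; float radius; };

struct FrustumSketch { std::array<Plane, 6> planes; };

inline float SignedDistance(const Plane& plane, const Vec3& p)
{
    return plane.normal.x * p.x + plane.normal.y * p.y + plane.normal.z * p.z + plane.d;
}

// Conservative test: returns false only when the sphere lies entirely
// behind at least one plane, so it can report some false positives near corners.
inline bool IntersectsSphere(const FrustumSketch& frustum, const Sphere& sphere)
{
    for (const Plane& plane : frustum.planes)
    {
        if (SignedDistance(plane, sphere.center) < -sphere.radius)
        {
            return false;
        }
    }
    return true;
}

// Assumption for the demo: a box-shaped volume spanning [-extent, extent]
// on each axis stands in for a real perspective frustum.
inline FrustumSketch MakeBoxFrustum(float extent)
{
    FrustumSketch f{};
    f.planes[0] = Plane{Vec3{ 1.0f,  0.0f,  0.0f}, extent};
    f.planes[1] = Plane{Vec3{-1.0f,  0.0f,  0.0f}, extent};
    f.planes[2] = Plane{Vec3{ 0.0f,  1.0f,  0.0f}, extent};
    f.planes[3] = Plane{Vec3{ 0.0f, -1.0f,  0.0f}, extent};
    f.planes[4] = Plane{Vec3{ 0.0f,  0.0f,  1.0f}, extent};
    f.planes[5] = Plane{Vec3{ 0.0f,  0.0f, -1.0f}, extent};
    return f;
}
```

A sphere that straddles a plane counts as intersecting, which is usually the behavior wanted for culling: it is cheaper to occasionally draw something invisible than to drop something visible.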
The public interfaces for the vector and matrix classes, Matrix4x4 among them, were often quite inconsistent, with functionality organically grafted into these classes over many years. This often led to surprise when methods on one vector or matrix type either didn’t exist or behaved differently than the methods on a different vector or matrix type. We have performed a full audit of these classes and normalized the public interfaces to remove any surprises customers might have when mixing vector or matrix types.
Awkward operators have been removed from these classes. This includes naked equality and product operators where the meaning of such operations was unclear. In our new implementation, callers should explicitly use named methods such as TransformPoint to resolve any ambiguity in what the operations are actually doing.
Finally, poorly defined and confusing overloads, such as NormalizeApprox, have been audited and condensed. Where appropriate estimate instructions exist and show meaningful performance advantages, we have added an Estimate overload that exposes the reduced precision instruction, but by default all operations not marked Estimate are now full precision. We’ve optimized the new code to the point that it is still faster than the old code, despite the increased accuracy.
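The split between full precision and Estimate operations can be sketched as follows. This is an illustrative stand-in, not Lumberyard code: the names Vec3, GetNormalized, and GetNormalizedEstimate are assumptions, and because a hardware estimate instruction is platform specific, the reduced precision path here uses the well-known bit-level inverse square root guess with one Newton-Raphson step in its place.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical vector type for illustration; not the AZ::Vector3 API.
struct Vec3 { float x, y, z; };

inline float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Full precision path: the default for anything not marked Estimate.
inline float InvSqrt(float v)
{
    return 1.0f / std::sqrt(v);
}

// Reduced precision path. Assumption: this bit trick plus one
// Newton-Raphson refinement stands in for a hardware estimate instruction.
// Only valid for positive, finite inputs.
inline float InvSqrtEstimate(float v)
{
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    bits = 0x5f3759dfu - (bits >> 1);
    float y;
    std::memcpy(&y, &bits, sizeof(y));
    return y * (1.5f - 0.5f * v * y * y);
}

// Both functions assume a non-zero input vector.
inline Vec3 GetNormalized(const Vec3& v)
{
    const float s = InvSqrt(Dot(v, v));
    return Vec3{v.x * s, v.y * s, v.z * s};
}

inline Vec3 GetNormalizedEstimate(const Vec3& v)
{
    const float s = InvSqrtEstimate(Dot(v, v));
    return Vec3{v.x * s, v.y * s, v.z * s};
}
```

The point of the naming convention is that the caller opts in to reduced precision at the call site, so a reader can see which results are approximate without consulting documentation.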
Due to historical reasons, the Transform class in Lumberyard stored its internal state in a 3×4 matrix rather than in distinct translation, orientation, and scale fields. Most gameplay systems, including physics, AI, networking, and audio, expect separate translation and orientation values, so we incurred a large cost decomposing the matrix representation into distinct values and then converting back to matrix form again upon fetching results. In addition, the matrix representation allowed invalid transform states to be stored, such as matrices with skew or singular matrices.
Because of this, we have removed matrix-oriented accessors from the AZ::Transform API and moved its internal storage to distinct translation, orientation, and scale values. This allows optimal interaction with gameplay code, physics, AI, pathfinding, networking, and other subsystems that expect distinct values, and we support an efficient conversion to Matrix3x4 for use cases like rendering where the matrix form is preferable to the decomposed form.
Our rewrite includes types and operations for SIMD vector types such as Vec4, covering both float and integral types. This SIMD layer has scalar fallback, SSE 4.1, and ARM Neon backends, and makes it easy to add future support for AVX-512 and other wider-register SIMD extensions. Anyone familiar with the SIMD intrinsics used for the VMX execution units on PowerPC architectures should find the new AZ::Simd layer somewhat familiar.
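The general shape of a C-style SIMD layer with interchangeable backends can be sketched like this. The SimdSketch namespace, the FloatType alias, and the function names are all hypothetical and are not the AZ::Simd API; only the SSE and Neon intrinsics themselves are real. Free functions are selected at compile time, and anything built on top of them stays backend independent.

```cpp
// Pick a backend at compile time. The SIMD_SKETCH_* macros are hypothetical names.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
    #include <xmmintrin.h>
    #define SIMD_SKETCH_SSE 1
#elif defined(__ARM_NEON) || defined(__ARM_NEON__) || defined(_M_ARM64)
    #include <arm_neon.h>
    #define SIMD_SKETCH_NEON 1
#endif

namespace SimdSketch
{
#if defined(SIMD_SKETCH_SSE)
    using FloatType = __m128;
    inline FloatType LoadUnaligned(const float* p) { return _mm_loadu_ps(p); }
    inline void StoreUnaligned(float* p, FloatType v) { _mm_storeu_ps(p, v); }
    inline FloatType Add(FloatType a, FloatType b) { return _mm_add_ps(a, b); }
    inline FloatType Mul(FloatType a, FloatType b) { return _mm_mul_ps(a, b); }
#elif defined(SIMD_SKETCH_NEON)
    using FloatType = float32x4_t;
    inline FloatType LoadUnaligned(const float* p) { return vld1q_f32(p); }
    inline void StoreUnaligned(float* p, FloatType v) { vst1q_f32(p, v); }
    inline FloatType Add(FloatType a, FloatType b) { return vaddq_f32(a, b); }
    inline FloatType Mul(FloatType a, FloatType b) { return vmulq_f32(a, b); }
#else
    // Scalar fallback: same interface, four lanes processed one at a time.
    struct FloatType { float lanes[4]; };
    inline FloatType LoadUnaligned(const float* p)
    {
        return FloatType{{p[0], p[1], p[2], p[3]}};
    }
    inline void StoreUnaligned(float* p, FloatType v)
    {
        for (int i = 0; i < 4; ++i) { p[i] = v.lanes[i]; }
    }
    inline FloatType Add(FloatType a, FloatType b)
    {
        FloatType r;
        for (int i = 0; i < 4; ++i) { r.lanes[i] = a.lanes[i] + b.lanes[i]; }
        return r;
    }
    inline FloatType Mul(FloatType a, FloatType b)
    {
        FloatType r;
        for (int i = 0; i < 4; ++i) { r.lanes[i] = a.lanes[i] * b.lanes[i]; }
        return r;
    }
#endif

    // Written once in terms of the primitives above, so it works on every backend.
    inline FloatType Madd(FloatType a, FloatType b, FloatType c)
    {
        return Add(Mul(a, b), c);
    }

    // Computes out[i] = a[i] * b[i] + c[i] for four floats.
    inline void MultiplyAdd4(const float* a, const float* b, const float* c, float* out)
    {
        StoreUnaligned(out, Madd(LoadUnaligned(a), LoadUnaligned(b), LoadUnaligned(c)));
    }
}
```

Adding another backend in a design like this means supplying one more block of primitive functions, while higher-level routines such as Madd remain untouched.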
The new SIMD trig methods are significantly more accurate than the previous SIMD cosine methods, and are faster by a considerable margin as well.
The VectorFloat SIMD wrapper from Lumberyard 1.x has also been removed. Benchmarking with modern compilers confirmed that it provided no benefit, actually harmed performance in many situations, and only served to make calling code more complex.
Testing and benchmarking
We now have slightly more than 1,000 math unit tests that have very close to 100% coverage across the entirety of the math library. These tests exercise the various math classes and APIs, and we’ve been using them to validate the correctness of all of our changes.
Both the new and old SIMD transcendental functions have been tested for numerical accuracy. We will go into greater detail on this in a future post.
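One simple way to test an approximate function for numerical accuracy is to sweep its input range and record the worst deviation from a higher precision reference. The sketch below does this for sine. It is not the Lumberyard test harness: SinTaylor7 is a hypothetical stand-in approximation (a degree 7 Taylor polynomial), and MaxAbsErrorVsSin is an illustrative helper name.

```cpp
#include <algorithm>
#include <cmath>

// Stand-in approximation for the demo: degree 7 Taylor polynomial of sin(x),
// evaluated in Horner form. Reasonable only near [-pi/2, pi/2].
inline float SinTaylor7(float x)
{
    const float x2 = x * x;
    return x * (1.0f + x2 * (-1.0f / 6.0f + x2 * (1.0f / 120.0f + x2 * (-1.0f / 5040.0f))));
}

// Samples [lo, hi] uniformly and returns the largest absolute difference
// between the approximation and double precision std::sin.
// Requires samples >= 2.
template <typename Approx>
double MaxAbsErrorVsSin(Approx approx, double lo, double hi, int samples)
{
    double worst = 0.0;
    for (int i = 0; i < samples; ++i)
    {
        // Round the sample to float first so both sides see the same input.
        const float x = static_cast<float>(lo + (hi - lo) * i / (samples - 1));
        const double reference = std::sin(static_cast<double>(x));
        const double error = std::fabs(static_cast<double>(approx(x)) - reference);
        worst = std::max(worst, error);
    }
    return worst;
}
```

Uniform sampling is the simplest choice; a more thorough harness might also test every representable float in the range or report error in units in the last place rather than as an absolute value.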
We’ve written over 450 new microbenchmarks to monitor the performance of the vast majority of the Lumberyard math API. We used these benchmarks to validate that work done on the new math library resulted in performance improvements over the old math library.
Overall, we’ve made a number of improvements to the mathematical foundations of Lumberyard in the last year, and we are excited to roll these changes out to you in a future release. The end result will be faster frame rates, better platform support (especially on ARM devices), higher accuracy, and more compute resources available for animation, special effects, and gameplay. Additionally, this new math library enables us to deliver some exciting new features, as well as backfill some longstanding gaps in the engine’s functionality.
Let us know your thoughts in the comments, and look for a follow-up on this topic soon!