[Project Thread] Klein, C++17 PGA

I’ll be using this as a project thread for updates on Klein.

Klein takes a different approach from GAL in that it specializes in $\mathbf{P}(\mathbb{R}^*_{3, 0, 1})$ (with $\mathbf{P}(\mathbb{R}^*_{4, 0, 0})$ for perspectivities hopefully coming soon). This lets me optimize the code even more aggressively, and I don’t have to rely on compiler autovectorization (which I still don’t consider reliable enough for demanding real-time applications).

As a result, you can see from the perf analysis page that Klein is already roughly twice as fast as GLM and faster than or equivalent to other existing solutions (not all of the analysis has been published on that page yet; it’s something I need to automate). Note that the cycle counts there were simulated for Jaguar (the PS4 chip), but I expect similar gains on other x86_64 platforms. In addition, Klein benefits from the GA formulation to support a host of operations that other libraries don’t offer at all. For example, constant-velocity interpolation of motors (dual quaternions), implemented in Klein in terms of the exp/log map, doesn’t exist in GLM, RTM, MathFu, DirectXMath, etc.
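
The exp/log idea behind constant-velocity interpolation can be illustrated on the rotational part alone. The following is a minimal, self-contained sketch and not Klein’s implementation or API: it uses plain unit quaternions in the usual Hamilton convention as a stand-in for rotors, ignores the translational (dual) part of a motor entirely, and every name in it (Rotor, Bivector, rotor_log, rotor_exp, interpolate) is made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Unit quaternion w + xi + yj + zk (hypothetical stand-in for a rotor).
struct Rotor { double w, x, y, z; };

// Rotation axis scaled by the half-angle: the logarithm of a rotor.
struct Bivector { double x, y, z; };

// log: for r = cos(h) + sin(h) * n (n a unit axis), returns h * n.
inline Bivector rotor_log(const Rotor& r) {
    const double s = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z);  // |sin(h)|
    assert(std::abs(r.w * r.w + s * s - 1.0) < 1e-9);  // expects a unit rotor
    if (s < 1e-12) return {0.0, 0.0, 0.0};             // no rotation to encode
    const double h = std::atan2(s, r.w);               // half-angle
    const double k = h / s;
    return {r.x * k, r.y * k, r.z * k};
}

// exp: inverse of rotor_log, maps h * n back to cos(h) + sin(h) * n.
inline Rotor rotor_exp(const Bivector& b) {
    const double h = std::sqrt(b.x * b.x + b.y * b.y + b.z * b.z);
    if (h < 1e-12) return {1.0, 0.0, 0.0, 0.0};
    const double k = std::sin(h) / h;
    return {std::cos(h), b.x * k, b.y * k, b.z * k};
}

// Hamilton product a * b.
inline Rotor mul(const Rotor& a, const Rotor& b) {
    return {a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
            a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
            a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
            a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w};
}

// Conjugate, which is the inverse for a unit rotor.
inline Rotor conj(const Rotor& r) { return {r.w, -r.x, -r.y, -r.z}; }

// Constant-velocity interpolation: a * exp(t * log(conj(a) * b)).
inline Rotor interpolate(const Rotor& a, const Rotor& b, double t) {
    const Bivector step = rotor_log(mul(conj(a), b));
    return mul(a, rotor_exp({step.x * t, step.y * t, step.z * t}));
}
```

For pure rotations this reduces to the familiar slerp (without shortest-path handling, which this sketch omits). The motor case follows the same exp(t * log(...)) pattern, but the logarithm then also has to capture the translation along the screw axis.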

I’ve only just started translating the code to shader source (first GLSL, then using spirv-cross to translate the GLSL to HLSL).

To test Klein, I wrote a miniature CAS that does symbolic manipulation to help verify a whole host of operations comprehensively. The testing is bidirectional: the results for each operation must agree between the shipping library and the CAS. Compilation of the CAS (klein_shell) is optional.

I have a working subset of C bindings that I hope will make it easy to bind to other languages (JavaScript, Python, etc.) that would also benefit from the performance boost. This is still a work in progress, as a number of operations are missing, and it’s another situation that would benefit from automation.

Documentation-wise, I’ve started drafting a “workbook” style guide for people who are learning GA and want practice with actual calculations and worked examples. The documentation in the API section of the site is autogenerated from source and tracks changes pushed to the repository very closely.

It’s hard to say exactly what should come next, as there’s so much to do! I think the first order of business for me is to implement the remaining missing functionality from libraries such as GLM and DirectXMath so I can truly say that Klein can replace them wholesale. A priority list for me is something like the following:

  1. GLM/DirectXMath parity
  2. More demos for common/interesting problems (camera control, IK, etc)
  3. GLSL/HLSL port
  4. SSE2/3 fallback (currently, I’m using the SSE4.1 intrinsics _mm_dp_ps and _mm_blend_ps, so this shouldn’t be too difficult)
  5. QoL improvements (automatic LLVM-MCA analysis, automatic shader tests, etc).
  6. Stability flag to prevent possible precision loss in situations where an implied cancellation was used for performance
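
To give a rough idea of what item 4 involves, here is a minimal sketch of SSE2-only stand-ins for the two SSE4.1 intrinsics mentioned there. This is not code from Klein; the function names dot4_sse2 and blend_sse2 are hypothetical, and the blend takes its mask as a runtime argument whereas _mm_blend_ps requires a compile-time immediate.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics (also provides the SSE ones)

// Four-lane dot product with the sum broadcast to every lane,
// comparable to _mm_dp_ps(a, b, 0xFF) but using only mul/shuffle/add.
inline __m128 dot4_sse2(__m128 a, __m128 b) {
    const __m128 p = _mm_mul_ps(a, b);  // (p0, p1, p2, p3)
    // Swap neighbours: (p1, p0, p3, p2)
    const __m128 swapped = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1));
    // Pairwise sums: (p0+p1, p0+p1, p2+p3, p2+p3)
    const __m128 sums = _mm_add_ps(p, swapped);
    // Swap halves: (p2+p3, p2+p3, p0+p1, p0+p1)
    const __m128 crossed = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(1, 0, 3, 2));
    return _mm_add_ps(sums, crossed);
}

// Per-lane select: lane i comes from b if bit i of mask is set, else from a.
// Same lane semantics as _mm_blend_ps, built from and/andnot/or.
inline __m128 blend_sse2(__m128 a, __m128 b, int mask) {
    assert((mask & ~0xF) == 0);  // only the low four bits select lanes
    const __m128i bits = _mm_set_epi32(8, 4, 2, 1);  // lane i holds 1 << i
    const __m128i m = _mm_set1_epi32(mask);
    // Expand each mask bit into an all-ones or all-zeros 32-bit lane.
    const __m128 sel =
        _mm_castsi128_ps(_mm_cmpeq_epi32(_mm_and_si128(m, bits), bits));
    return _mm_or_ps(_mm_and_ps(sel, b), _mm_andnot_ps(sel, a));
}
```

The summation order here may differ from what the hardware does for _mm_dp_ps, so results can differ in the last bits. With a constant mask, a real fallback would more likely precompute the select mask (or use a shuffle) rather than build it at run time as above.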

For now though, I’m posting this because I believe what’s here is already immediately usable, and I hope it takes the game/graphics/etc community a bit closer to a GA formulation! Another goal of mine is to demonstrate that GA isn’t just a “replacement” for libraries that are based on the quat/dual-quat formulation. It’s a material improvement that enables far more operations/optimizations.

Last note, even though this is a project thread, anyone is welcome to provide feedback, ask questions, or comment.

Cheers,
Jeremy


Well, my evening project is done. I now have an entirely separate codebase called mcruler (mc stands for machine code), which automatically integrates into the toolchain to perform cycle/dispatch simulation of arbitrarily fenced blocks of code using llvm-mca (LLVM’s machine code analyzer). This is now integrated in the perf subtree of the Klein source, so I can easily compare segments of code from one target to another. The project has the lamest name I’ve ever come up with, entirely by accident, but I’m going to roll with it.
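
For readers who haven’t used llvm-mca, “fenced blocks” can be marked with special assembly comments that the tool recognizes. The sketch below shows only the plain llvm-mca markers as documented by LLVM, assuming GCC or Clang inline assembly; it says nothing about how mcruler itself marks or collects regions, and the function dot and the region name dot_loop are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// Dot product whose loop is fenced as a named llvm-mca code region.
// The markers are assembly comments, so they emit no instructions,
// though the "memory" clobber can affect how the compiler optimizes.
inline float dot(const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    float acc = 0.0f;
    __asm volatile("# LLVM-MCA-BEGIN dot_loop" ::: "memory");
    for (std::size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    __asm volatile("# LLVM-MCA-END" ::: "memory");
    return acc;
}
```

Assuming a Clang toolchain, the assembly for such a file could then be piped into the analyzer with something like `clang++ -O2 -S -o - file.cpp | llvm-mca -mcpu=btver2`, where btver2 is LLVM’s name for the Jaguar microarchitecture.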


Hi Jeremy.

There exists a ‘public/klein/detail/x86’ directory.

(1) Does this mean Klein’s implementation only works on an x86 architecture?

(2) Is it possible to cross-compile the C bindings in ./c_src to run on an Arm® Cortex®-M33 processor? The Arm® Cortex®-M33 appears to support SSE-200, which I think uses integer types (for DSP) instead of floats like SSE3. The Arm® Cortex®-M33 does have a hardware floating point unit. My hope is to use PGA3D to fuse MARG sensor data.

(3) Does anyone provide a PGA3D library implemented in C?

Thanks,
Joe