How are algorithms that use Jacobian matrices in vector algebra done in geometric algebra
such as multivariate Newton’s method?

for multivariate Newton’s method
In vector algebra, one solves a system with the Jacobian matrix and function vector and then subtracts that solution vector from the input vector of the previous iteration.

The course notes state that one can use matrices with geometric algebra but that transform matrices are better done with geometric products

Are there any geometric algebra alternatives to jacobian matrices or does one remain using Jacobians? I am asking this since Jacobians can be be seen as transform matrices that map an infintesmal piece of space from one space to another.

Also related is that counterpart of Jacobian determinants in geometric algebra, for purposes like integration by substitution where one divides the integrand by the determinant of the Jacobian matrix to account for changes in density when transforming from one space to another.

Welcome to the forum, thanks for joining my Siggraph talk, and great question. I still have to dig into your specific use case, but I recently wrote a paper together with Dr Dorst that presents a matrix-free implementation of the Levenberg Marquardt algorithm. It includes a description of the geometric algebra version of both the Jacobian and the Hessian matrices.

It is called “Geometric Algebra Levenberg-Marquardt”, published in springer LNCS or just request the full text from my research gate page here.

Thanks for sharing this paper. I found the geometrical perspective really interesting—in particular, the idea of viewing an updated point in parameter space as an intersection of several hyperplanes is something I hadn’t really considered before.

However, I think the discussion of performance/computational complexity and numerical stability is missing some important caveats.

If I’m not misunderstanding something, the implementation of solving linear systems by intersecting hyperplanes with a wedge product (equation 8, or the computation of delta in listing 2) will scale very badly for larger numbers of parameters. Wedging together n vectors one-by-one to form an n-blade requires n! multiplications and additions (it’s a lot like computing a determinant by expanding minors), and also requires exponential memory (because the basis of n/2 vectors has dimension binomial(n, n/2) ~ 2^n/sqrt(n)). This is fine for 2 or 3 parameters, but would become very rapidly infeasible for many more parameters than that.

Instead of wedging the vectors together one-by-one, you could consider all n of them together and compute their wedge product using determinants. Then, it seems like this method of solving linear systems is identical to Cramer’s Rule. Cramer’s rule is usually considered to be an inefficient way to solve large linear systems (but see this approach that makes it more competitive with other techniques).

So I think it would be good to emphasize that the technique as implemented in this paper is really only competitive when the number of parameters is very small.

Regarding numerical stability, the paper compares average error on random matrices with Cholesky decomposition, but I think it’s also important to consider how the algorithm behaves for poorly conditioned matrices. I think it’s common for non-linear models used in many fields to have very poorly conditioned Jacobians. The regularization parameter in LM should help here, but I think you don’t want your step size to be limited by poor conditioning if you don’t have to.

Welcome to the forum, and thank you so much for your feedback, your comments are all very valid, people should be happy to get you as a reviewer !

Thanks for sharing this paper. I found the geometrical perspective really interesting—in particular, the idea of viewing an updated point in parameter space as an intersection of several hyperplanes is something I hadn’t really considered before.

This geometric view of the problem is the core idea of the paper, and I feel I should’ve perhaps focused even more on that. (but then I noticed the option to include the AD, and quickly started bouncing into the page limit.)

Absolutely accurate. When it comes to performance the method as presented in the paper is only competetive for a small (3-4) number of parameters. (and as you suggest, once the hyperplanes are constructed, one can use a high-performant method to calculate the wedge, not limited to Cramers method, as the solution is interpreted projectively).

One advantage that unfortunately I had no room and time to dive into, is the geometry of singular matrices. In the classic approach, the LMA algorithm simply slides towards gradient descent, avoiding singularities using regularisation, and decreasing the step-size and thus the efficiency.

There is however a very specific geometry for singular matrices, in the GA approach one notices this when the wedge product becomes zero. Before it does, however, it has a very clear geometric meaning. A full rank (r=n) matrix represents the intersection point of your set of hyperplanes, while matrices of lower rank are intersection lines (r=n-1), planes (r=n-2), etc …

Instead of simply ignoring this information and leaning towards gradient descent, one could now update the current estimate by projecting it orthogonally on the line, plane, etc …

So in practice, especially ill-conditioned scenarios may actually benefit most from this geometric view, although it will take another paper and more research to get to the bottom of that. (as is the influence of other aspects like bad condition number etc …)

The field of linear algebra has had millions of man hours of effort poured into it, and so in many scenarios it really is clear exactly what combination of multiplications and additions leads to an efficient solution. One would arrive at that solution either way (as you demonstrate, it is easy to recognize the n-1 wedges as solving a matrix and using proven standard methods, without destroying the added geometric insight).

In my view, GA and LA are complementary, where LA gives you the opportunity to use a large body of work for optimal solutions (and ofcourse extra strength when dealing with problems in the general linear group), and GA gives you the geometric view and coordinate free approach (and extra strength when dealing with incidence relations and transformations from orthogonal groups like SO(n), SE(n) or the conformal group).