Using (Euclidean) AD in Manopt.jl

Since Manifolds.jl 0.7 the support of automatic differentiation support has been extended.

This tutorial explains how to use Euclidean tools to derive a gradient for a real-valued function $F\colon \mathcal M → ℝ$. We will consider two methods: an intrinsic variant and a variant employing the embedding. These gradients can then be used within any gradient based optimisation algorithm in Manopt.jl.

While by default we use FiniteDifferences.jl, you can also use FiniteDiff.jl, ForwardDiff.jl, ReverseDiff.jl, or Zygote.jl.

In this Notebook we will take a look at a few possibilities to approximate or derive the gradient of a function $f:\mathcal M \to ℝ$ on a Riemannian manifold, without computing it yourself. There is mainly two different philosophies:

  1. Working instrinsically, i.e. stay on the manifold and in the tangent spaces. Here, we will consider approximating the gradient by forward differences.

  2. Working in an embedding – there we can use all tools from functions on Euclidean spaces – finite differences or automatic differenciation – and then compute the corresponding Riemannian gradient from there.

Let's first load all packages we need.

using Manifolds, Manopt, Random, LinearAlgebra
using FiniteDiff, ReverseDiff

1. (Intrinsic) Forward Differences

A first idea is to generalise (multivariate) finite differences to Riemannian manifolds. Let $X_1,\ldots,X_d ∈ T_p\mathcal M$ denote an orthonormal basis of the tangent space $T_p\mathcal M$ at the point $p∈\mathcal M$ on the Riemannian manifold.

We can generalise the notion of a directional derivative, i.e. for the “direction” $Y∈T_p\mathcal M$ let $c\colon [-ε,ε]$, $ε>0$, be a curve with $c(0) = p$, $\dot c(0) = Y$ and we obtain

$$ Df(p)[Y] = \frac{\mathrm{d}}{\mathrm{d}t} f(c(t)) = \lim_{h \to 0} \frac{1}{h}(f(\exp_p(hY))-f(p))$$

We can approximate $Df(p)[X]$ by a finite difference scheme for an $h>0$ as

$$DF(p)[Y] ≈ G_h(Y) := \frac{1}{h}(f(\exp_p(hY))-f(p))$$

Furthermore the gradient $\operatorname{grad}f$ is the Riesz representer of the differential, ie.

$$ Df(p)[Y] = g_p(\operatorname{grad}f(p), Y),\qquad \text{ for all } Y ∈ T_p\mathcal M$$

and since it is a tangent vector, we can write it in terms of a basis as

$$ \operatorname{grad}f(p) = \sum_{i=1}^{d} g_p(\operatorname{grad}f(p),X_i)X_i = \sum_{i=1}^{d} Df(p)[X_i]X_i$$

and perform the approximation from above to obtain

$$ \operatorname{grad}f(p) ≈ \sum_{i=1}^{d} G_h(X_i)X_i$$

for some suitable step size $h$.This comes at the cost of $d+1$ function evaluations and $d$ exponential maps.

This is the first variant we can use. An advantage is, that it is intrinsic in the sense that it does not require any embedding of the manifold.

An Example: The Rayleigh Quotient

The Rayleigh quotient is concerned with finding Eigenvalues (and Eigenvectors) of a symmetric matrix $A\in ℝ^{(n+1)×(n+1)}$. The optimisation problem reads

$$F\colon ℝ^{n+1} \to ℝ,\quad F(\mathbf x) = \frac{\mathbf x^\mathrm{T}A\mathbf x}{\mathbf x^\mathrm{T}\mathbf x}$$

Minimizing this function yields the smallest eigenvalue $\lambda_1$ as a value and the corresponding minimizer $\mathbf x^*$ is a corresponding eigenvector.

Since the length of an eigenvector is irrelevant, there is an ambiguity in the cost function. It can be better phrased on the sphere $𝕊^n$ of unit vectors in $\mathbb R^{n+1}$, i.e.

$$\operatorname*{arg\,min}_{p \in 𝕊^n} f(p) = \operatorname*{arg\,min}_{p \in 𝕊^n} p^\mathrm{T}Ap$$

We can compute the Riemannian gradient exactly as

$$\operatorname{grad} f(p) = 2(Ap - pp^\mathrm{T}Ap)$$

so we can compare it to the approximation by finite differences.

    n = 200
    A = randn(n + 1, n + 1)
    A = Symmetric(A)
    M = Sphere(n)
f1(p) = p' * A'p
f1 (generic function with 1 method)
gradf1(p) = 2 * (A * p - p * p' * A * p)
gradf1 (generic function with 1 method)

Manifolds provides a finite difference scheme in Tangent spaces, that you can introduce to use an existing framework (if the wrapper is implemented) form Euclidean space. Here we use FiniteDiff.jl.

r_backend = Manifolds.TangentDiffBackend(Manifolds.FiniteDiffBackend())
TangentDiffBackend{Manifolds.FiniteDiffBackend{Val{:central}}, ExponentialRetraction, LogarithmicInverseRetraction, DefaultOrthonormalBasis{ℝ, ManifoldsBase.TangentSpaceType}}(Manifolds.FiniteDiffBackend{Val{:central}}(Val{:central}()), ExponentialRetraction(), LogarithmicInverseRetraction(), DefaultOrthonormalBasis(ℝ))
gradf1_FD(p) = Manifolds.gradient(M, f1, p, r_backend)
gradf1_FD (generic function with 1 method)
    p = zeros(n + 1)
    p[1] = 1.0
    X1 = gradf1(p)
    X2 = gradf1_FD(p)
    norm(M, p, X1 - X2)

We obtain quite a good approximation of the gradient.

2. Conversion of an Euclidean Gradient in the Embedding to a Riemannian Gradient of an (not necessarily isometrically) embedded Manifold

Let $\tilde f\colon\mathbb R^m \to \mathbb R$ be a function in the embedding of an $n$-dimensional manifold $\mathcal M \subset \mathbb R^m$ and $f\colon \mathcal M \to \mathbb R$ denote the restriction of $\tilde f$ to the manifold $\mathcal M$.

Since we can use the push forward of the embedding to also embed the tangent space $T_p\mathcal M$, $p\in \mathcal M$, we can similarly obtain the differential $Df(p)\colon T_p\mathcal M \to \mathbb R$ by restricting the differential $D\tilde f(p)$ to the tangent space.

If both $T_p\mathcal M$ and $T_p\mathcal R^m$ have the same inner product, or in other words the manifold is isometrically embedded in $R^m$ (like for example the sphere $\mathbb S^n\subset\mathbb R^{m+1}$ then this restriction of the differential directly translates to a projection of the gradient, i.e.

$$\operatorname{grad}f(p) = \operatorname{Proj}_{T_p\mathcal M}(\operatorname{grad} \tilde f(p))$$

More generally we might have to take a change of the metric into account, i.e.

$$\langle \operatorname{Proj}_{T_p\mathcal M}(\operatorname{grad} \tilde f(p)), X \rangle = Df(p)[X] = g_p(\operatorname{grad}f(p), X)$$

or in words: we have to change the Riesz representer of the (restricted/projected) differential of $f$ ($\tilde f$) to the one with respect to the Riemannian metric. This is done using change_representer.

A continued Example

We continue with the Rayleigh Quotient from before, now just starting with the defintion of the Euclidean case in the embedding, the function $F$.

F(x) = x' * A * x / (x' * x);

The cost function is the same by restriction

f2(M, p) = F(p);

The gradient is now computed combining our gradient scheme with ReverseDiff.

function grad_f2_AD(M, p)
    return Manifolds.gradient(
        M, F, p, RiemannianProjectionBackend(Manifolds.ReverseDiffBackend())
grad_f2_AD (generic function with 1 method)
X3 = grad_f2_AD(M, p)
201-element Vector{Float64}:
norm(M, p, X1 - X3)

An Example for a nonisometrically embedded Manifold

on the manifold $\mathcal P(3)$ of symmetric positive definite matrices.

The following function computes (half) the distance squared (with respect to the linear affine metric) on the manifold $\mathcal P(3)$ to the identity, i.e. $I_3$. denoting the unit matrix we consider the function

$$ G(q) = \frac{1}{2}d^2_{\mathcal P(3)}(q,I_3) = \lVert \operatorname{Log}(q) \rVert_F^2,$$

where $\operatorname{Log}$ denotes the matrix logarithm and $\lVert \cdot \rVert_F$ is the Frobenius norm. This can be computed for symmetric positive definite matrices by summing the squares of the $\log$arithms of the eigenvalues of $q$ and divide by two:

G(q) = sum(log.(eigvals(Symmetric(q))) .^ 2) / 2
G (generic function with 1 method)

We can also interpret this as a function on the space of matrices and apply the Euclidean finite differences machinery; in this way we can easily derive the Euclidean gradient. But when computing the Riemannian gradient, we have to change the representer (see again change_representer) after projecting onto the tangent space $T_p\mathcal P(n)$ at $p$.

Let's first define a point and the manifold $N=\mathcal P(3)$.

rotM(α) = [1.0 0.0 0.0; 0.0 cos(α) sin(α); 0.0 -sin(α) cos(α)]
rotM (generic function with 1 method)
q = rotM(π / 6) * [1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0] * transpose(rotM(π / 6))
3×3 Matrix{Float64}:
 1.0  0.0       0.0
 0.0  2.25      0.433013
 0.0  0.433013  2.75
N = SymmetricPositiveDefinite(3)
is_point(N, q)

We could first just compute the gradient using FiniteDiff.jl, but this yields the Euclidean gradient:

FiniteDiff.finite_difference_gradient(G, q)
3×3 Matrix{Float64}:
 -1.83343e-11  0.0       0.0
  0.0          0.351481  0.0170005
  0.0          0.0       0.361296

Instead, we use the RiemannianProjectedBackend of Manifolds.jl, which in this case internally uses FiniteDiff.jl to compute a Euclidean gradient but then uses the conversion explained above to derive the Riemannian gradient.

We define this here again as a function grad_G_FD that could be used in the Manopt.jl framework within a gradient based optimisation.

function grad_G_FD(N, q)
    return Manifolds.gradient(
        N, G, q, RiemannianProjectionBackend(Manifolds.FiniteDiffBackend())
grad_G_FD (generic function with 1 method)
G1 = grad_G_FD(N, q)
3×3 Matrix{Float64}:
 -1.83343e-11  0.0       0.0
  0.0          1.86368   0.826856
  0.0          0.826856  2.81845

Now, we can agaon compare this to the (known) solution of the gradient, namely the gradient of (a half) the distance suqared, i.e. $G(q) = \frac{1}{2}d^2_{\mathcal P(3)}(q,I_3)$ is given by $\operatorname{grad} G(q) = -\operatorname{log}_q I_3$, where $\operatorname{log}$ is the logarithmic map on the manifold.

G2 = -log(N, q, Matrix{Float64}(I, 3, 3))
3×3 Matrix{Float64}:
 -0.0  -0.0       -0.0
 -0.0   1.86368    0.826856
 -0.0   0.826856   2.81845

Both terms agree up to $1.2×10^{-10}$:

norm(G1 - G2)
isapprox(M, q, G1, G2; atol=2 * 1e-10)

In this case we can not use ReverseDiff.jl, since it can not handle the eigvals! function that is called internally.


This tutorial illustrates how to use tools from Euclidean spaces, finite differences or automatic differentiation, to compute gradients on Riemannian manifolds. The scheme allows to use any differentiation framework within the embedding to derive a Riemannian gradient.