# Using (Euclidean) AD in Manopt.jl

Since Manifolds.jl 0.7, support for automatic differentiation has been extended.

This tutorial explains how to use Euclidean tools to derive a gradient for a real-valued function $f\colon \mathcal M → ℝ$. We will consider two methods: an intrinsic variant and a variant employing the embedding. These gradients can then be used within any gradient based optimization algorithm in Manopt.jl.

While by default we use FiniteDifferences.jl, you can also use FiniteDiff.jl, ForwardDiff.jl, ReverseDiff.jl, or Zygote.jl.

In this notebook we take a look at a few possibilities to approximate or derive the gradient of a function $f:\mathcal M \to ℝ$ on a Riemannian manifold without computing it yourself. There are mainly two different philosophies:

1. Working intrinsically, i.e. staying on the manifold and in the tangent spaces. Here, we will consider approximating the gradient by forward differences.

2. Working in an embedding – there we can use all tools for functions on Euclidean spaces – finite differences or automatic differentiation – and then compute the corresponding Riemannian gradient from there.

## Setup

If you open this notebook in Pluto locally, it switches between two modes. If the tutorial is within the Manopt.jl repository, this notebook tries to use the local package in development mode. Otherwise, the file uses the Pluto package management version. In this case, the inclusion of images might be broken unless you create a subfolder `optimize` and activate asy-rendering.

Since the loading is a little complicated, we show which versions of the packages were installed in the following.

```julia
with_terminal() do
    Pkg.status()
end
```

```
Status /private/var/folders/_v/wg192lpd3mb1lp55zz7drpcw0000gn/T/jl_hpuuJD/Project.toml
  [26cc04aa] FiniteDifferences v0.12.26
  [af67fdf4] ManifoldDiff v0.2.1
  [0fc0a36d] Manopt v0.4.0 ~/Repositories/Julia/Manopt.jl
  [7f904dfe] PlutoUI v0.7.49
  [37e2e46d] LinearAlgebra
  [44cfe95a] Pkg v1.8.0
  [9a3f8284] Random
```


## 1. (Intrinsic) Forward Differences

A first idea is to generalize (multivariate) finite differences to Riemannian manifolds. Let $X_1,\ldots,X_d ∈ T_p\mathcal M$ denote an orthonormal basis of the tangent space $T_p\mathcal M$ at the point $p∈\mathcal M$ on the Riemannian manifold.

We can generalize the notion of a directional derivative to the “direction” $Y∈T_p\mathcal M$. Let $c\colon [-ε,ε] \to \mathcal M$, $ε>0$, be a curve with $c(0) = p$, $\dot c(0) = Y$, e.g. $c(t)= \exp_p(tY)$. We obtain

$$Df(p)[Y] = \left. \frac{d}{dt} \right|_{t=0} f(c(t)) = \lim_{t \to 0} \frac{1}{t}(f(\exp_p(tY))-f(p))$$

We can approximate $Df(p)[Y]$ by a finite difference scheme for an $h>0$ as

$$Df(p)[Y] ≈ G_h(Y) := \frac{1}{h}(f(\exp_p(hY))-f(p))$$

Furthermore, the gradient $\operatorname{grad}f$ is the Riesz representer of the differential, i.e.

$$Df(p)[Y] = g_p(\operatorname{grad}f(p), Y),\qquad \text{ for all } Y ∈ T_p\mathcal M$$

and since it is a tangent vector, we can write it in terms of a basis as

$$\operatorname{grad}f(p) = \sum_{i=1}^{d} g_p(\operatorname{grad}f(p),X_i)X_i = \sum_{i=1}^{d} Df(p)[X_i]X_i$$

and perform the approximation from above to obtain

$$\operatorname{grad}f(p) ≈ \sum_{i=1}^{d} G_h(X_i)X_i$$

for some suitable step size $h$. This comes at the cost of $d+1$ function evaluations and $d$ exponential maps.

This is the first variant we can use. An advantage is that it is intrinsic in the sense that it does not require any embedding of the manifold.
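To make the scheme concrete, here is a small self-contained sketch for the unit sphere $𝕊^{n-1}\subset ℝ^n$ in plain Julia. This is our own illustration, not the Manopt.jl API: the helpers `sphere_exp` and `grad_fd_sphere` are hypothetical names, and we use the closed-form exponential map of the sphere.

```julia
using LinearAlgebra

# Illustration only (not the Manopt.jl API): on the unit sphere,
# exp_p(tY) = cos(t)p + sin(t)Y for a unit tangent vector Y.
sphere_exp(p, Y, t) = cos(t) * p + sin(t) * Y

# Approximate grad f(p) by the forward differences G_h(X_i) summed over
# an orthonormal basis X_1,…,X_{n-1} of T_p S^{n-1}.
function grad_fd_sphere(f, p; h=1e-6)
    n = length(p)
    # Full QR decomposition of p: the first column of Q is ±p, the
    # remaining n-1 columns form an orthonormal basis of T_p S^{n-1}.
    Q = qr(reshape(p, n, 1)).Q * Matrix(I, n, n)
    fp = f(p)                       # one function evaluation at p
    X = zeros(n)
    for i in 2:n
        Xi = Q[:, i]                # orthonormal tangent vector
        X .+= (f(sphere_exp(p, Xi, h)) - fp) / h .* Xi
    end
    return X
end

# Compare to the exact Riemannian gradient 2(Ap - (pᵀAp)p) of f(p) = pᵀAp
A = Diagonal([1.0, 2.0, 3.0])
p = normalize([1.0, 1.0, 1.0])
X = grad_fd_sphere(q -> q' * A * q, p)
```

Since forward differences have an $O(h)$ error per direction, the result agrees with the exact gradient only up to a tolerance on the order of $h$.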

### An Example: The Rayleigh Quotient

The Rayleigh quotient is concerned with finding eigenvalues (and eigenvectors) of a symmetric matrix $A\in ℝ^{(n+1)×(n+1)}$. The optimization problem reads

$$F\colon ℝ^{n+1} \to ℝ,\quad F(\mathbf x) = \frac{\mathbf x^\mathrm{T}A\mathbf x}{\mathbf x^\mathrm{T}\mathbf x}$$

Minimizing this function yields the smallest eigenvalue $\lambda_1$ as a value, and the minimizer $\mathbf x^*$ is a corresponding eigenvector.

Since the length of an eigenvector is irrelevant, there is an ambiguity in the cost function. It can be better phrased on the sphere $𝕊^n$ of unit vectors in $\mathbb R^{n+1}$, i.e.

$$\operatorname*{arg\,min}_{p \in 𝕊^n}\ f(p), \qquad f(p) = p^\mathrm{T}Ap$$

We can compute the Riemannian gradient exactly as

$$\operatorname{grad} f(p) = 2(Ap - pp^\mathrm{T}Ap)$$

so we can compare it to the approximation by finite differences.

```julia
begin
    Random.seed!(42)
    n = 200
    A = randn(n + 1, n + 1)
    A = Symmetric(A)
    M = Sphere(n)
    nothing
end

f1(p) = p' * A * p
gradf1(p) = 2 * (A * p - p * p' * A * p)
```

ManifoldDiff.jl provides a finite difference scheme in tangent spaces that lets you use an existing framework from Euclidean space (as long as the corresponding wrapper is implemented). Here we use FiniteDifferences.jl.

```julia
r_backend = ManifoldDiff.TangentDiffBackend(ManifoldDiff.FiniteDifferencesBackend())
gradf1_FD(p) = ManifoldDiff.gradient(M, f1, p, r_backend)
```
```julia
begin
    p = zeros(n + 1)
    p[1] = 1.0
    X1 = gradf1(p)
    X2 = gradf1_FD(p)
    norm(M, p, X1 - X2)
end
```

```
1.0183547605045089e-12
```

We obtain quite a good approximation of the gradient.

## 2. Conversion of a Euclidean Gradient in the Embedding to a Riemannian Gradient of a (not Necessarily Isometrically) Embedded Manifold

Let $\tilde f\colon\mathbb R^m \to \mathbb R$ be a function on the embedding of an $n$-dimensional manifold $\mathcal M \subset \mathbb R^m$ and let $f\colon \mathcal M \to \mathbb R$ denote the restriction of $\tilde f$ to the manifold $\mathcal M$.

Since we can use the pushforward of the embedding to also embed the tangent space $T_p\mathcal M$, $p\in \mathcal M$, we can similarly obtain the differential $Df(p)\colon T_p\mathcal M \to \mathbb R$ by restricting the differential $D\tilde f(p)$ to the tangent space.

If both $T_p\mathcal M$ and $T_p\mathbb R^m$ have the same inner product, or in other words if the manifold is isometrically embedded in $\mathbb R^m$ (like, for example, the sphere $\mathbb S^n\subset\mathbb R^{n+1}$), then this restriction of the differential directly translates to a projection of the gradient, i.e.

$$\operatorname{grad}f(p) = \operatorname{Proj}_{T_p\mathcal M}(\operatorname{grad} \tilde f(p))$$
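For the isometrically embedded sphere, this projection can be checked in a few lines of plain Julia (our own sketch, not a library call; `proj_sphere` is a hypothetical helper name):

```julia
using LinearAlgebra

# Sketch: on the sphere, projecting the Euclidean gradient onto the
# tangent space T_p S^{n-1} yields the Riemannian gradient.
proj_sphere(p, X) = X - (p' * X) * p   # orthogonal projection onto T_p S^{n-1}

A = Symmetric([2.0 1.0; 1.0 3.0])
p = normalize([1.0, 1.0])
egrad = 2 * A * p                      # Euclidean gradient of x ↦ xᵀAx
rgrad = proj_sphere(p, egrad)          # equals 2(Ap − (pᵀAp)p)
```

The projected vector coincides with the known Riemannian gradient of the Rayleigh quotient and is orthogonal to $p$, i.e. a tangent vector.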

More generally we might have to take a change of the metric into account, i.e.

$$\langle \operatorname{Proj}_{T_p\mathcal M}(\operatorname{grad} \tilde f(p)), X \rangle = Df(p)[X] = g_p(\operatorname{grad}f(p), X)$$

or in words: we have to change the Riesz representer of the (restricted/projected) differential of $f$ ($\tilde f$) to the one with respect to the Riemannian metric. This is done using `change_representer`.

### A Continued Example

We continue with the Rayleigh quotient from before, now just starting with the definition of the Euclidean case in the embedding, the function $F$.

```julia
F(x) = x' * A * x / (x' * x);
```

The cost function is the same by restriction

```julia
f2(M, p) = F(p);
```

```julia
function grad_f2_AD(M, p)
    return ManifoldDiff.gradient(
        M, F, p, ManifoldDiff.RiemannianProjectionBackend(ManifoldDiff.FiniteDifferencesBackend())
    )
end
```
```julia
X3 = grad_f2_AD(M, p)
```

```
201-element Vector{Float64}:
  0.0
  0.14038510230588766
  1.2081562751116681
 -1.8650092022183193
 -1.3296034534632897
  0.8233596435551424
  0.013307949254910802
  ⋮
 -1.0895103349140873
  2.423453243905215
  2.349106449107357
  0.5799736335284804
 -0.07423232665910076
  2.0859147085739465
```
```julia
norm(M, p, X1 - X3)
```

```
1.7211299825309384e-12
```

### An Example for a Nonisometrically Embedded Manifold

We now consider an example on the manifold $\mathcal P(3)$ of symmetric positive definite matrices.

The following function computes (half) the distance squared (with respect to the linear affine metric) on the manifold $\mathcal P(3)$ to the identity matrix $I_3$, i.e. we consider the function

$$G(q) = \frac{1}{2}d^2_{\mathcal P(3)}(q,I_3) = \frac{1}{2}\lVert \operatorname{Log}(q) \rVert_F^2,$$

where $\operatorname{Log}$ denotes the matrix logarithm and $\lVert \cdot \rVert_F$ is the Frobenius norm. This can be computed for symmetric positive definite matrices by summing the squares of the logarithms of the eigenvalues of $q$ and dividing by two:

```julia
G(q) = sum(log.(eigvals(Symmetric(q))) .^ 2) / 2
```

We can also interpret this as a function on the space of matrices and apply the Euclidean finite differences machinery; in this way we can easily derive the Euclidean gradient. But when computing the Riemannian gradient, we have to change the representer (see again `change_representer`) after projecting onto the tangent space $T_p\mathcal P(n)$ at $p$.
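To see what changing the representer amounts to here: under the affine-invariant metric $g_q(X,Y) = \operatorname{tr}(q^{-1}Xq^{-1}Y)$ on $\mathcal P(n)$, the Riesz representer of $X \mapsto \operatorname{tr}(EX)$ for a symmetric Euclidean gradient $E$ is $qEq$. The following is our own sketch of that computation (the name `change_representer_spd` is hypothetical, not the library call):

```julia
using LinearAlgebra

# Sketch (own computation, not the ManifoldDiff call): under
# g_q(X,Y) = tr(q⁻¹ X q⁻¹ Y), the representer of X ↦ tr(E X)
# for symmetric E is q * E * q, since tr(q⁻¹(qEq)q⁻¹X) = tr(EX).
change_representer_spd(q, E) = q * Symmetric(E) * q

# Check on G(q) = ½‖Log(q)‖²_F for a diagonal q, where the Euclidean
# gradient is Log(q)q⁻¹ and the known Riemannian gradient is q·Log(q)
# (= -log_q(I₃), the negative logarithmic map to the identity).
q = [2.0 0.0; 0.0 3.0]
E = log(q) * inv(q)          # Euclidean gradient (q diagonal ⇒ E symmetric)
rgrad = change_representer_spd(q, E)
```

For this commuting (diagonal) case, the result agrees with $q\operatorname{Log}(q)$, the known Riemannian gradient from the closing comparison below.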

Let's first define a point and the manifold $N=\mathcal P(3)$.

```julia
rotM(α) = [1.0 0.0 0.0; 0.0 cos(α) sin(α); 0.0 -sin(α) cos(α)]
q = rotM(π / 6) * [1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0] * transpose(rotM(π / 6))
```

```
3×3 Matrix{Float64}:
 1.0  0.0       0.0
 0.0  2.25      0.433013
 0.0  0.433013  2.75
```

```julia
N = SymmetricPositiveDefinite(3)
is_point(N, q)
```

```
true
```

We could first just compute the gradient using FiniteDifferences.jl, but this yields the Euclidean gradient:

```julia
FiniteDifferences.grad(central_fdm(5, 1), G, q)
```

```
([3.240417492806275e-14 1.1767558621318606e-14 2.075425785233536e-14; 0.0 0.35148121676557886 0.017000516835447246; 0.0 0.0 0.3612964697372195],)
```

Instead, we use the `RiemannianProjectionBackend` of ManifoldDiff.jl, which in this case internally uses FiniteDifferences.jl to compute a Euclidean gradient but then uses the conversion explained above to derive the Riemannian gradient.

We define this here again as a function grad_G_FD that could be used in the Manopt.jl framework within a gradient based optimization.

```julia
function grad_G_FD(N, q)
    return ManifoldDiff.gradient(
        N, G, q, ManifoldDiff.RiemannianProjectionBackend(ManifoldDiff.FiniteDifferencesBackend())
    )
end
```
```julia
G1 = grad_G_FD(N, q)
```

```
3×3 Matrix{Float64}:
 3.24042e-14  1.77319e-14  3.10849e-14
 1.77319e-14  1.86368      0.826856
 3.10849e-14  0.826856     2.81845
```

Now, we can again compare this to the (known) solution of the gradient, namely the gradient of (half of) the distance squared, i.e. $G(q) = \frac{1}{2}d^2_{\mathcal P(3)}(q,I_3)$ is given by $\operatorname{grad} G(q) = -\operatorname{log}_q I_3$, where $\operatorname{log}$ is the logarithmic map on the manifold.

```julia
G2 = -log(N, q, Matrix{Float64}(I, 3, 3))
```

```
3×3 Matrix{Float64}:
 -0.0  -0.0       -0.0
 -0.0   1.86368    0.826856
 -0.0   0.826856   2.81845
```

Both terms agree up to $1.8×10^{-12}$:

```julia
norm(G1 - G2)
```

```
1.4601907912994879e-12
```

```julia
isapprox(N, q, G1, G2; atol=2 * 1e-12)
```

```
true
```

## Summary

This tutorial illustrates how to use tools from Euclidean spaces, namely finite differences and automatic differentiation, to compute gradients on Riemannian manifolds. This allows using any differentiation framework within the embedding to derive a Riemannian gradient.