Using (Euclidean) AD in Manopt.jl
Since Manifolds.jl 0.7, the support of automatic differentiation support has been extended.
This tutorial explains how to use Euclidean tools to derive a gradient for a real-valued function $f\colon \mathcal M → ℝ$. We will consider two methods: an intrinsic variant and a variant employing the embedding. These gradients can then be used within any gradient based optimization algorithm in Manopt.jl.
While by default we use FiniteDifferences.jl, you can also use FiniteDiff.jl, ForwardDiff.jl, ReverseDiff.jl, or Zygote.jl.
In this Notebook we will take a look at a few possibilities to approximate or derive the gradient of a function $f:\mathcal M \to ℝ$ on a Riemannian manifold, without computing it yourself. There are mainly two different philosophies:
Working instrinsically, i.e. staying on the manifold and in the tangent spaces. Here, we will consider approximating the gradient by forward differences.
Working in an embedding – there we can use all tools from functions on Euclidean spaces – finite differences or automatic differenciation – and then compute the corresponding Riemannian gradient from there.
Let's first load all the packages we need.
using Manifolds, Manopt, Random, LinearAlgebra
using FiniteDifferences
1. (Intrinsic) Forward Differences
A first idea is to generalize (multivariate) finite differences to Riemannian manifolds. Let $X_1,\ldots,X_d ∈ T_p\mathcal M$ denote an orthonormal basis of the tangent space $T_p\mathcal M$ at the point $p∈\mathcal M$ on the Riemannian manifold.
We can generalize the notion of a directional derivative, i.e. for the “direction” $Y∈T_p\mathcal M$. Let $c\colon [-ε,ε]$, $ε>0$, be a curve with $c(0) = p$, $\dot c(0) = Y$, e.g. $c(t)= \exp_p(tY)$. We obtain
$$ Df(p)[Y] = \left. \frac{d}{dt} \right|_{t=0} f(c(t)) = \lim_{t \to 0} \frac{1}{t}(f(\exp_p(tY))-f(p))$$
We can approximate $Df(p)[X]$ by a finite difference scheme for an $h>0$ as
$$DF(p)[Y] ≈ G_h(Y) := \frac{1}{h}(f(\exp_p(hY))-f(p))$$
Furthermore the gradient $\operatorname{grad}f$ is the Riesz representer of the differential, ie.
$$ Df(p)[Y] = g_p(\operatorname{grad}f(p), Y),\qquad \text{ for all } Y ∈ T_p\mathcal M$$
and since it is a tangent vector, we can write it in terms of a basis as
$$ \operatorname{grad}f(p) = \sum_{i=1}^{d} g_p(\operatorname{grad}f(p),X_i)X_i = \sum_{i=1}^{d} Df(p)[X_i]X_i$$
and perform the approximation from above to obtain
$$ \operatorname{grad}f(p) ≈ \sum_{i=1}^{d} G_h(X_i)X_i$$
for some suitable step size $h$. This comes at the cost of $d+1$ function evaluations and $d$ exponential maps.
This is the first variant we can use. An advantage is that it is intrinsic in the sense that it does not require any embedding of the manifold.
An Example: The Rayleigh Quotient
The Rayleigh quotient is concerned with finding eigenvalues (and eigenvectors) of a symmetric matrix $A\in ℝ^{(n+1)×(n+1)}$. The optimization problem reads
$$F\colon ℝ^{n+1} \to ℝ,\quad F(\mathbf x) = \frac{\mathbf x^\mathrm{T}A\mathbf x}{\mathbf x^\mathrm{T}\mathbf x}$$
Minimizing this function yields the smallest eigenvalue $\lambda_1$ as a value and the corresponding minimizer $\mathbf x^*$ is a corresponding eigenvector.
Since the length of an eigenvector is irrelevant, there is an ambiguity in the cost function. It can be better phrased on the sphere $𝕊^n$ of unit vectors in $\mathbb R^{n+1}$, i.e.
$$\operatorname*{arg\,min}_{p \in 𝕊^n}\ f(p) = \operatorname*{arg\,min}_{\ p \in 𝕊^n} p^\mathrm{T}Ap$$
We can compute the Riemannian gradient exactly as
$$\operatorname{grad} f(p) = 2(Ap - pp^\mathrm{T}Ap)$$
so we can compare it to the approximation by finite differences.
begin
Random.seed!(42)
n = 200
A = randn(n + 1, n + 1)
A = Symmetric(A)
M = Sphere(n)
nothing
end
f1(p) = p' * A'p
f1 (generic function with 1 method)
gradf1(p) = 2 * (A * p - p * p' * A * p)
gradf1 (generic function with 1 method)
Manifolds provides a finite difference scheme in tangent spaces, that you can introduce to use an existing framework (if the wrapper is implemented) form Euclidean space. Here we use FiniteDiff.jl
.
r_backend = Manifolds.TangentDiffBackend(Manifolds.FiniteDifferencesBackend())
Manifolds.TangentDiffBackend{Manifolds.FiniteDifferencesBackend{FiniteDifferences.AdaptedFiniteDifferenceMethod{5, 1, FiniteDifferences.UnadaptedFiniteDifferenceMethod{7, 5}}}, ExponentialRetraction, LogarithmicInverseRetraction, DefaultOrthonormalBasis{ℝ, ManifoldsBase.TangentSpaceType}, DefaultOrthonormalBasis{ℝ, ManifoldsBase.TangentSpaceType}}(Manifolds.FiniteDifferencesBackend{FiniteDifferences.AdaptedFiniteDifferenceMethod{5, 1, FiniteDifferences.UnadaptedFiniteDifferenceMethod{7, 5}}}(FiniteDifferences.AdaptedFiniteDifferenceMethod{5, 1, FiniteDifferences.UnadaptedFiniteDifferenceMethod{7, 5}}([-2, -1, 0, 1, 2], [0.08333333333333333, -0.6666666666666666, 0.0, 0.6666666666666666, -0.08333333333333333], ([-0.08333333333333333, 0.5, -1.5, 0.8333333333333334, 0.25], [0.08333333333333333, -0.6666666666666666, 0.0, 0.6666666666666666, -0.08333333333333333], [-0.25, -0.8333333333333334, 1.5, -0.5, 0.08333333333333333]), 10.0, 1.0, Inf, 0.05555555555555555, 1.4999999999999998, FiniteDifferences.UnadaptedFiniteDifferenceMethod{7, 5}([-3, -2, -1, 0, 1, 2, 3], [-0.5, 2.0, -2.5, 0.0, 2.5, -2.0, 0.5], ([0.5, -4.0, 12.5, -20.0, 17.5, -8.0, 1.5], [-0.5, 2.0, -2.5, 0.0, 2.5, -2.0, 0.5], [-1.5, 8.0, -17.5, 20.0, -12.5, 4.0, -0.5]), 10.0, 1.0, Inf, 0.5365079365079365, 10.0))), ExponentialRetraction(), LogarithmicInverseRetraction(), DefaultOrthonormalBasis(ℝ), DefaultOrthonormalBasis(ℝ))
gradf1_FD(p) = Manifolds.gradient(M, f1, p, r_backend)
gradf1_FD (generic function with 1 method)
begin
p = zeros(n + 1)
p[1] = 1.0
X1 = gradf1(p)
X2 = gradf1_FD(p)
norm(M, p, X1 - X2)
end
1.0003414846716736e-12
We obtain quite a good approximation of the gradient.
2. Conversion of a Euclidean Gradient in the Embedding to a Riemannian Gradient of a (not Necessarily Isometrically) Embedded Manifold
Let $\tilde f\colon\mathbb R^m \to \mathbb R$ be a function on the embedding of an $n$-dimensional manifold $\mathcal M \subset \mathbb R^m$ and let $f\colon \mathcal M \to \mathbb R$ denote the restriction of $\tilde f$ to the manifold $\mathcal M$.
Since we can use the pushforward of the embedding to also embed the tangent space $T_p\mathcal M$, $p\in \mathcal M$, we can similarly obtain the differential $Df(p)\colon T_p\mathcal M \to \mathbb R$ by restricting the differential $D\tilde f(p)$ to the tangent space.
If both $T_p\mathcal M$ and $T_p\mathbb R^m$ have the same inner product, or in other words the manifold is isometrically embedded in $\mathbb R^m$ (like for example the sphere $\mathbb S^n\subset\mathbb R^{m+1}$), then this restriction of the differential directly translates to a projection of the gradient, i.e.
$$\operatorname{grad}f(p) = \operatorname{Proj}_{T_p\mathcal M}(\operatorname{grad} \tilde f(p))$$
More generally we might have to take a change of the metric into account, i.e.
$$\langle \operatorname{Proj}_{T_p\mathcal M}(\operatorname{grad} \tilde f(p)), X \rangle = Df(p)[X] = g_p(\operatorname{grad}f(p), X)$$
or in words: we have to change the Riesz representer of the (restricted/projected) differential of $f$ ($\tilde f$) to the one with respect to the Riemannian metric. This is done using change_representer
.
A Continued Example
We continue with the Rayleigh Quotient from before, now just starting with the defintion of the Euclidean case in the embedding, the function $F$.
F(x) = x' * A * x / (x' * x);
The cost function is the same by restriction
f2(M, p) = F(p);
The gradient is now computed combining our gradient scheme with FiniteDifferences.
function grad_f2_AD(M, p)
return Manifolds.gradient(
M, F, p, Manifolds.RiemannianProjectionBackend(Manifolds.FiniteDifferencesBackend())
)
end
grad_f2_AD (generic function with 1 method)
X3 = grad_f2_AD(M, p)
201-element Vector{Float64}: 0.0 0.14038510230588766 1.2081562751116681 -1.8650092022183193 -1.3296034534632897 0.8233596435551424 0.013307949254910802 ⋮ -1.0895103349140873 2.423453243905215 2.349106449107357 0.5799736335284804 -0.07423232665910076 2.0859147085739465
norm(M, p, X1 - X3)
1.69683800899515e-12
An Example for a Nonisometrically Embedded Manifold
on the manifold $\mathcal P(3)$ of symmetric positive definite matrices.
The following function computes (half) the distance squared (with respect to the linear affine metric) on the manifold $\mathcal P(3)$ to the identity, i.e. $I_3$. Denoting the unit matrix we consider the function
$$ G(q) = \frac{1}{2}d^2_{\mathcal P(3)}(q,I_3) = \lVert \operatorname{Log}(q) \rVert_F^2,$$
where $\operatorname{Log}$ denotes the matrix logarithm and $\lVert \cdot \rVert_F$ is the Frobenius norm. This can be computed for symmetric positive definite matrices by summing the squares of the logarithms of the eigenvalues of $q$ and dividing by two:
G(q) = sum(log.(eigvals(Symmetric(q))) .^ 2) / 2
G (generic function with 1 method)
We can also interpret this as a function on the space of matrices and apply the Euclidean finite differences machinery; in this way we can easily derive the Euclidean gradient. But when computing the Riemannian gradient, we have to change the representer (see again change_representer
) after projecting onto the tangent space $T_p\mathcal P(n)$ at $p$.
Let's first define a point and the manifold $N=\mathcal P(3)$.
rotM(α) = [1.0 0.0 0.0; 0.0 cos(α) sin(α); 0.0 -sin(α) cos(α)]
rotM (generic function with 1 method)
q = rotM(π / 6) * [1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0] * transpose(rotM(π / 6))
3×3 Matrix{Float64}: 1.0 0.0 0.0 0.0 2.25 0.433013 0.0 0.433013 2.75
N = SymmetricPositiveDefinite(3)
SymmetricPositiveDefinite(3)
is_point(N, q)
true
We could first just compute the gradient using FiniteDifferences.jl
, but this yields the Euclidean gradient:
FiniteDifferences.grad(central_fdm(5, 1), G, q)
([3.240417492806275e-14 -2.3531899864903462e-14 0.0; 0.0 0.3514812167654708 0.017000516835452926; 0.0 0.0 0.36129646973723023],)
Instead, we use the RiemannianProjectedBackend
of Manifolds.jl
, which in this case internally uses FiniteDifferences.jl
to compute a Euclidean gradient but then uses the conversion explained above to derive the Riemannian gradient.
We define this here again as a function grad_G_FD
that could be used in the Manopt.jl
framework within a gradient based optimization.
function grad_G_FD(N, q)
return Manifolds.gradient(
N, G, q, Manifolds.RiemannianProjectionBackend(Manifolds.FiniteDifferencesBackend())
)
end
grad_G_FD (generic function with 1 method)
G1 = grad_G_FD(N, q)
3×3 Matrix{Float64}: 3.24042e-14 -2.64734e-14 -5.09481e-15 -2.64734e-14 1.86368 0.826856 -5.09481e-15 0.826856 2.81845
Now, we can again compare this to the (known) solution of the gradient, namely the gradient of (half of) the distance squared, i.e. $G(q) = \frac{1}{2}d^2_{\mathcal P(3)}(q,I_3)$ is given by $\operatorname{grad} G(q) = -\operatorname{log}_q I_3$, where $\operatorname{log}$ is the logarithmic map on the manifold.
G2 = -log(N, q, Matrix{Float64}(I, 3, 3))
3×3 Matrix{Float64}: -0.0 -0.0 -0.0 -0.0 1.86368 0.826856 -0.0 0.826856 2.81845
Both terms agree up to $1.8×10^{-12}$:
norm(G1 - G2)
1.776388869742036e-12
isapprox(M, q, G1, G2; atol=2 * 1e-12)
true
Summary
This tutorial illustrates how to use tools from Euclidean spaces, finite differences or automatic differentiation, to compute gradients on Riemannian manifolds. The scheme allows to use any differentiation framework within the embedding to derive a Riemannian gradient.