Optimize on your own manifold
Ronny Bergmann
When you have used a few solvers from Manopt.jl, for example as in the opening tutorial 🏔️ Get started: optimize!, and have also familiarized yourself with how to work with manifolds in general at 🚀 Get Started with Manifolds.jl, you might come to the point where you want to implement a manifold yourself and use it within Manopt.jl. A challenge might be to figure out which functions are actually necessary, since you usually do not need to implement the complete interface of ManifoldsBase.jl.
This tutorial walks you through the steps needed to implement just those parts of a manifold that are required to get started with the solver you have in mind.
An example problem
We get started by loading the packages we need.
using LinearAlgebra, Manifolds, ManifoldsBase, Random
using Manopt
Random.seed!(42)
We also define the same manifold as in the implementing a manifold tutorial.
"""
ScaledSphere <: AbstractManifold{ℝ}
Define a sphere of fixed radius
# Fields
* `dimension` dimension of the sphere
* `radius` the radius of the sphere
# Constructor
ScaledSphere(dimension,radius)
Initialize the manifold to a certain `dimension` and `radius`,
which by default is set to `1.0`
"""
struct ScaledSphere <: AbstractManifold{ℝ}
dimension::Int
radius::Float64
end
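The docstring mentions a default radius of 1.0, which the struct itself does not provide; if you want that convenience, you could, as a small optional addition, define an outer constructor, for example
ScaledSphere(dimension::Int) = ScaledSphere(dimension, 1.0) # hypothetical convenience constructor providing the default radius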
We would like to compute a mean and/or median similar to 🏔️ Get started: optimize!. For a given set of points $q_1,\ldots,q_n$ we want to compute [Kar77]
\[ \operatorname*{arg\,min}_{p∈\mathcal M} \frac{1}{2n} \sum_{i=1}^n d_{\mathcal M}^2(p, q_i)\]
on the ScaledSphere we just defined. We first set a few parameters:
d = 5 # dimension of the sphere - embedded in R^{d+1}
r = 2.0 # radius of the sphere
N = 100 # data set size
M = ScaledSphere(d,r)
ScaledSphere(5, 2.0)
We now generate a few points:
# generate 100 points around the north pole
pts = [ [zeros(d)..., M.radius] .+ 0.5.*([rand(d)...,0.5] .- 0.5) for _=1:N]
# project them onto the r-sphere
pts = [ r/norm(p) .* p for p in pts]
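As a quick sanity check, all projected points should now have norm r; the following optional line should return true.
all(q -> isapprox(norm(q), r), pts) # every point lies on the sphere of radius r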
Before we can start optimizing, we need the distance on the manifold to define the cost function, as well as the logarithmic map to define the gradient. For both we use the “lazy” approach of falling back to the (unit) Sphere. Finally, we have to provide information about how points and tangent vectors are stored on the manifold by implementing the representation_size function, which is often required when allocating memory.
import ManifoldsBase: distance, log, representation_size
function distance(M::ScaledSphere, p, q)
    return M.radius * distance(Sphere(M.dimension), p ./ M.radius, q ./ M.radius)
end
function log(M::ScaledSphere, p, q)
    return M.radius * log(Sphere(M.dimension), p ./ M.radius, q ./ M.radius)
end
representation_size(M::ScaledSphere) = (M.dimension+1,)
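As a small consistency check of these two definitions, the norm of the logarithmic map between two of our points should equal their distance, for example
p1, p2 = pts[1], pts[2]
isapprox(norm(log(M, p1, p2)), distance(M, p1, p2)) # should return true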
Define the cost and gradient
f(M, q) = sum(distance(M, q, p)^2 for p in pts)  # sum of squared distances to the data points
grad_f(M, q) = sum(-log(M, q, p) for p in pts)   # its Riemannian gradient direction, see the identity below
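The gradient definition uses the well-known identity that, for a fixed point $q_i$, the Riemannian gradient of half the squared distance is the negative logarithmic map,
\[ \operatorname{grad}_p \frac{1}{2} d_{\mathcal M}^2(p, q_i) = -\log_p q_i,\]
so grad_f is the Riemannian gradient of f up to a positive constant, which does not change the minimizer.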
Defining the necessary functions to run a solver
The documentation usually lists the necessary functions in a section “Technical details” close to the end of a solver's documentation; for our case that is the gradient descent's Technical details. That section lists all requirements, but here we can proceed step by step if we are a bit careful.
A retraction
We first implement a retraction. Informally, given a current point and a direction to walk in, we need a function that performs that step. Since we choose an easy one that simply adds the step and projects back onto the sphere, we use the ProjectionRetraction type. To be precise, we have to implement the in-place variant retract_project!
import ManifoldsBase: retract_project!
function retract_project!(M::ScaledSphere, q, p, X, t::Number)
    q .= p .+ t .* X
    q .*= M.radius / norm(q)
    return q
end
retract_project! (generic function with 19 methods)
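As a brief optional check, assuming the out-of-place retract (with a step length) falls back to the in-place variant we just implemented, a retracted point should again lie on the sphere of radius r:
p_check = [zeros(d)..., r]   # the (scaled) north pole is a point on M
X_check = [1.0, zeros(d)...] # a tangent vector at p_check, since it is orthogonal to it
q_check = retract(M, p_check, X_check, 0.5, ProjectionRetraction())
isapprox(norm(q_check), r)   # should return true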
The other two technical remarks refer to the step size and the stopping criterion, so if we set these to something simpler, we should already be able to do a first run.
We have to specify
- that we want to use the new retraction,
- a simple step size and a stopping criterion.
We start from the following point, whose cost is
p0 = [zeros(d)...,1.0]
f(M,p0)
444.60374551157634
Then we can run our first solver, where we have to override a few defaults that would otherwise use functions we do not (yet) have. Let's discuss these in the next steps.
q1 = gradient_descent(M, f, grad_f, p0;
    retraction_method = ProjectionRetraction(),  # state that we use the retraction from above
    stepsize = DecreasingLength(M; length=1.0),  # a simple step size
    stopping_criterion = StopAfterIteration(10), # a simple stopping criterion
    X = zeros(d+1),                              # how we define/represent tangent vectors
)
f(M,q1)
162.4000287847332
At least we see that the function value decreased.
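If you want to inspect more than the resulting point, you can also ask the solver to return its whole state; a small variant of the call above, using Manopt.jl's return_state keyword, could look like
s1 = gradient_descent(M, f, grad_f, p0;
    retraction_method = ProjectionRetraction(),
    stepsize = DecreasingLength(M; length=1.0),
    stopping_criterion = StopAfterIteration(10),
    X = zeros(d+1),
    return_state = true, # return the full solver state instead of only the point
)
get_solver_result(s1)    # extract the resulting point from the state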
Norm and maximal step size
To use more advanced stopping criteria and step sizes we first need an inner product inner(M, p, X, Y). We also need a max_stepsize(M), to avoid taking too large steps on positively curved manifolds like our scaled sphere in this example.
import ManifoldsBase: inner
import Manopt: max_stepsize
inner(M::ScaledSphere, p, X, Y) = dot(X, Y) # inherited from the embedding
# set the maximal allowed step size to the injectivity radius
Manopt.max_stepsize(M::ScaledSphere) = M.radius * π
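With the inner product in place, the Riemannian norm of a tangent vector is also available via the usual fallback norm(M, p, X) = sqrt(inner(M, p, X, X)), so we can, for example, already compute the gradient norm at our start point:
norm(M, p0, grad_f(M, p0)) # gradient norm at p0, using the inner product from above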
Then we can use the default step size (ArmijoLinesearch) and the default stopping criterion, which checks for a small gradient norm.
q2 = gradient_descent(M, f, grad_f, p0;
    retraction_method = ProjectionRetraction(), # as before
    X = zeros(d+1),                             # as before
)
f(M, q2)
9.772830131357034
Making life easier: default retraction and zero vector
To initialize tangent vector memory, the function zero_vector(M, p) is called. Similarly, the retraction a solver uses by default is returned by default_retraction_method(M). We can implement both here to make subsequent calls to the solver less verbose. We define
import ManifoldsBase: zero_vector, default_retraction_method
zero_vector(M::ScaledSphere, p) = zeros(M.dimension+1)
default_retraction_method(M::ScaledSphere) = ProjectionRetraction()
default_retraction_method (generic function with 19 methods)
and now we can even just call
q3 = gradient_descent(M, f, grad_f, p0)
f(M, q3)
9.772830131357034
But now we also automatically get, for example, the possibility to print debug information like
gradient_descent(M, f, grad_f, p0; debug = [:Iteration, :Cost, :Stepsize, 25, :GradientNorm, :Stop, "\n"]);
Initial f(x): 444.603746
# 25 f(x): 9.772833s:0.018299583806109226|grad f(p)|:0.020516914880881486
# 50 f(x): 9.772830s:0.018299583806109226|grad f(p)|:0.00013449321419330018
The algorithm reached approximately critical point after 72 iterations; the gradient norm (9.20733514568335e-9) is less than 1.0e-8.
see How to Print Debug Output for more details.
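Besides printing debug output, Manopt.jl can also record values during the iterations, for example the cost in every iteration; a minimal sketch, keeping all the defaults from above, could be
s3 = gradient_descent(M, f, grad_f, p0; record = [:Cost], return_state = true)
get_record(s3) # the recorded cost values, one per iteration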
Technical details
This tutorial is cached. It was last run on the following package versions.
using Pkg
Pkg.status()
Status `~/work/Manopt.jl/Manopt.jl/tutorials/Project.toml`
[6e4b80f9] BenchmarkTools v1.5.0
[5ae59095] Colors v0.12.11
[31c24e10] Distributions v0.25.112
[26cc04aa] FiniteDifferences v0.12.32
[7073ff75] IJulia v1.25.0
[8ac3fa9e] LRUCache v1.6.1
[af67fdf4] ManifoldDiff v0.3.12
[1cead3c2] Manifolds v0.10.3
[3362f125] ManifoldsBase v0.15.17
[0fc0a36d] Manopt v0.5.2 `..`
[91a5bcdd] Plots v1.40.8
[731186ca] RecursiveArrayTools v3.27.0
using Dates
now()
2024-10-18T19:20:52.061
Literature
- [Kar77] H. Karcher. Riemannian center of mass and mollifier smoothing. Communications on Pure and Applied Mathematics 30, 509–541 (1977).