# đď¸ Get started: optimize.

Ronny Bergmann

This tutorial both introduces the basics of optimisation on manifolds as well as how to use `Manopt.jl`

to perform optimisation on manifolds in Julia.

For more theoretical background, see for example [Car92] for an introduction to Riemannian manifolds and [AMS08] or [Bou23] to read more about optimisation thereon.

Let $\mathcal M$ denote a Riemannian manifold and let $f: \mathcal M â â$ be a cost function. The aim is to determine or obtain a point $p^*$ where $f$ is *minimal* or in other words $p^*$ is a *minimizer* of $f$.

This can also be written as

\[ \operatorname*{arg\,min}_{p â \mathcal M} f(p)\]

where the aim is to compute the minimizer $p^*$ numerically. As an example, consider the generalisation of the (arithemtic) mean. In the Euclidean case with $dâ\mathbb N$, that is for $nâ\mathbb N$ data points $y_1,\ldots,y_n â â^d$ the mean

\[ \frac{1}{n}\sum_{i=1}^n y_i\]

can not be directly generalised to data $q_1,\ldots,q_n â \mathcal M$, since on a manifold there is no addition available. But the mean can also be characterised as the following minimizer

\[ \operatorname*{arg\,min}_{xââ^d} \frac{1}{2n}\sum_{i=1}^n \lVert x - y_i\rVert^2\]

and using the Riemannian distance $d_\mathcal M$, this can be written on Riemannian manifolds, which is the so called *Riemannian Center of Mass* [Kar77]

\[ \operatorname*{arg\,min}_{pâ\mathcal M} \frac{1}{2n} \sum_{i=1}^n d_{\mathcal M}^2(p, q_i)\]

Fortunately the gradient can be computed and is

\[ \frac{1}{n} \sum_{i=1}^n -\log_p q_i\]

## Loading the necessary packages

Letâs assume you have already installed both `Manopt.jl`

and `Manifolds.jl`

in Julia (using for example `using Pkg; Pkg.add(["Manopt", "Manifolds"])`

). Then we can get started by loading both packages as well as `Random.jl`

for persistency in this tutorial.

```
using Manopt, Manifolds, Random, LinearAlgebra, ManifoldDiff
using ManifoldDiff: grad_distance, prox_distance
Random.seed!(42);
```

Now assume we are on the Sphere $\mathcal M = \mathbb S^2$ and we generate some random points âaroundâ some initial point $p$

```
n = 100
Ď = Ď / 8
M = Sphere(2)
p = 1 / sqrt(2) * [1.0, 0.0, 1.0]
data = [exp(M, p, Ď * rand(M; vector_at=p)) for i in 1:n];
```

Now we can define the cost function $f$ and its (Riemannian) gradient $\operatorname{grad} f$ for the Riemannian center of mass:

```
f(M, p) = sum(1 / (2 * n) * distance.(Ref(M), Ref(p), data) .^ 2)
grad_f(M, p) = sum(1 / n * grad_distance.(Ref(M), data, Ref(p)));
```

and just call `gradient_descent`

. For a first start, we do not have to provide more than the manifold, the cost, the gradient, and a starting point, which we just set to the first data point

`m1 = gradient_descent(M, f, grad_f, data[1])`

```
3-element Vector{Float64}:
0.6868392807355564
0.006531599748261925
0.7267799809043942
```

In order to get more details, we further add the `debug=`

keyword argument, which act as a decorator pattern.

This way we can easily specify a certain debug to be printed. The goal is to get an output of the form

`# i | Last Change: [...] | F(x): [...] |`

but where we also want to fix the display format for the change and the cost numbers (the `[...]`

) to have a certain format. Furthermore, the reason why the solver stopped should be printed at the end

These can easily be specified using either a Symbol when using the default format for numbers, or a tuple of a symbol and a format-string in the `debug=`

keyword that is available for every solver. We can also,Â for illustration reasons,Â just look at the first 6 steps by setting a `stopping_criterion=`

```
m2 = gradient_descent(M, f, grad_f, data[1];
debug=[:Iteration,(:Change, "|Îp|: %1.9f |"),
(:Cost, " F(x): %1.11f | "), "\n", :Stop],
stopping_criterion = StopAfterIteration(6)
)
```

```
Initial F(x): 0.32487988924 |
# 1 |Îp|: 1.063609017 | F(x): 0.25232524046 |
# 2 |Îp|: 0.809858671 | F(x): 0.20966960102 |
# 3 |Îp|: 0.616665145 | F(x): 0.18546505598 |
# 4 |Îp|: 0.470841764 | F(x): 0.17121604104 |
# 5 |Îp|: 0.359345690 | F(x): 0.16300825911 |
# 6 |Îp|: 0.274597420 | F(x): 0.15818548927 |
The algorithm reached its maximal number of iterations (6).
3-element Vector{Float64}:
0.7533872481682505
-0.06053107055583637
0.6547851890466334
```

See here for the list of available symbols.

The `debug=`

keyword is actually a list of `DebugActions`

added to every iteration, allowing you to write your own ones even. Additionally, `:Stop`

is an action added to the end of the solver to display the reason why the solver stopped.

The default stopping criterion for `gradient_descent`

is, to either stop when the gradient is small (`<1e-9`

) or a max number of iterations is reached (as a fallback). Combining stopping-criteria can be done by `|`

or `&`

. We further pass a number `25`

to `debug=`

to only an output every `25`

th iteration:

```
m3 = gradient_descent(M, f, grad_f, data[1];
debug=[:Iteration,(:Change, "|Îp|: %1.9f |"),
(:Cost, " F(x): %1.11f | "), "\n", :Stop, 25],
stopping_criterion = StopWhenGradientNormLess(1e-14) |Â StopAfterIteration(400),
)
```

```
Initial F(x): 0.32487988924 |
# 25 |Îp|: 0.459715605 | F(x): 0.15145076374 |
# 50 |Îp|: 0.000551270 | F(x): 0.15145051509 |
The algorithm reached approximately critical point after 73 iterations; the gradient norm (9.988871119384563e-16) is less than 1.0e-14.
3-element Vector{Float64}:
0.6868392794788668
0.006531600680779286
0.7267799820836411
```

We can finally use another way to determine the stepsize, for example a little more expensive `ArmijoLineSeach`

than the default stepsize rule used on the Sphere.

```
m4 = gradient_descent(M, f, grad_f, data[1];
debug=[:Iteration,(:Change, "|Îp|: %1.9f |"),
(:Cost, " F(x): %1.11f | "), "\n", :Stop, 2],
stepsize = ArmijoLinesearch(M; contraction_factor=0.999, sufficient_decrease=0.5),
stopping_criterion = StopWhenGradientNormLess(1e-14) |Â StopAfterIteration(400),
)
```

```
Initial F(x): 0.32487988924 |
# 2 |Îp|: 0.001318138 | F(x): 0.15145051509 |
# 4 |Îp|: 0.000000004 | F(x): 0.15145051509 |
# 6 |Îp|: 0.000000000 | F(x): 0.15145051509 |
The algorithm reached approximately critical point after 7 iterations; the gradient norm (5.073696618059386e-15) is less than 1.0e-14.
3-element Vector{Float64}:
0.6868392794788669
0.006531600680779358
0.7267799820836413
```

Then we reach approximately the same point as in the previous run, but in far less steps

`[f(M, m3)-f(M,m4), distance(M, m3, m4)]`

```
2-element Vector{Float64}:
1.6653345369377348e-16
1.727269835930624e-16
```

## Using the âTutorialâ mode

Since a few things on manifolds are a bit different from (classical) Euclidean optimization, `Manopt.jl`

has a mode to warn about a few pitfalls.

It can be set using

`set_manopt_parameter!(:Mode, "Tutorial")`

`[ Info: Setting the `Manopt.jl` parameter :Mode to Tutorial.`

to activate these. Continuing from the example before, one might argue, that the minimizer of $f$ does not depend on the scaling of the function. In theory this is of course also the case on manifolds, but for the optimizations there is a caveat. When we define the Riemannian mean without the scaling

```
f2(M, p) = sum(1 / 2 * distance.(Ref(M), Ref(p), data) .^ 2)
grad_f2(M, p) = sum(grad_distance.(Ref(M), data, Ref(p)));
```

And we consider the gradient at the starting point in norm

`norm(M, data[1], grad_f2(M, data[1]))`

`57.47318616893399`

On the sphere, when we follow a geodesic, we âreturnâ to the start point after length $2Ď$. If we âlandâ short before the starting point due to a gradient of length just shy of $2Ď$, the line search would take the gradient direction (and not the negative gradient direction) as a start. The line search is still performed, but in this case returns a much too small, maybe even nearly zero step size.

In other words âÂ we have to be careful, that the optimisation stays a âlocalâ argument we use.

This is also warned for in `"Tutorial"`

mode. Calling

`mX = gradient_descent(M, f2, grad_f2, data[1])`

```
â Warning: At iteration #0
â the gradient norm (57.47318616893399) is larger that 1.0 times the injectivity radius 3.141592653589793 at the current iterate.
â @ Manopt ~/work/Manopt.jl/Manopt.jl/src/plans/debug.jl:1001
â Warning: Further warnings will be suppressed, use DebugWarnIfGradientNormTooLarge(1.0, :Always) to get all warnings.
â @ Manopt ~/work/Manopt.jl/Manopt.jl/src/plans/debug.jl:1005
3-element Vector{Float64}:
0.6868392794870684
0.006531600674920825
0.7267799820759485
```

So just by chance it seems we still got nearly the same point as before, but when we for example look when this one stops, that is takes more steps.

`gradient_descent(M, f2, grad_f2, data[1], debug=[:Stop]);`

`The algorithm reached approximately critical point after 140 iterations; the gradient norm (6.807380063106406e-9) is less than 1.0e-8.`

This also illustrates one way to deactivate the hints, namely by overwriting the `debug=`

keyword, that in `Tutorial`

mode contains addional warnings.the other option is to globally reset the `:Mode`

back to

`set_manopt_parameter!(:Mode, "")`

`[ Info: Resetting the `Manopt.jl` parameter :Mode to default.`

## Example 2: computing the median of symmetric positive definite matrices.

For the second example letâs consider the manifold of $3 Ă 3$ symmetric positive definite matrices and again 100 random points

```
N = SymmetricPositiveDefinite(3)
m = 100
Ď = 0.005
q = Matrix{Float64}(I, 3, 3)
data2 = [exp(N, q, Ď * rand(N; vector_at=q)) for i in 1:m];
```

Instead of the mean, letâs consider a non-smooth optimisation task: The median can be generalized to Manifolds as the minimiser of the sum of distances, see [Bac14]. We define

`g(N, q) = sum(1 / (2 * m) * distance.(Ref(N), Ref(q), data2))`

`g (generic function with 1 method)`

Since the function is non-smooth, we can not use a gradient-based approach. But since for every summand the proximal map is available, we can use the cyclic proximal point algorithm (CPPA). We hence define the vector of proximal maps as

`proxes_g = Function[(N, Îť, q) -> prox_distance(N, Îť / m, di, q, 1) for di in data2];`

Besides also looking at a some debug prints, we can also easily record these values. Similarly to `debug=`

, `record=`

also accepts Symbols, see list here, to indicate things to record. We further set `return_state`

to true to obtain not just the (approximate) minimizer.

```
res = cyclic_proximal_point(N, g, proxes_g, data2[1];
debug=[:Iteration," | ",:Change," | ",(:Cost, "F(x): %1.12f"),"\n", 1000, :Stop,
],
record=[:Iteration, :Change, :Cost, :Iterate],
return_state=true,
);
```

```
Initial | | F(x): 0.005875512856
# 1000 | Last Change: 0.003704 | F(x): 0.003239019699
# 2000 | Last Change: 0.000015 | F(x): 0.003238996105
# 3000 | Last Change: 0.000005 | F(x): 0.003238991748
# 4000 | Last Change: 0.000002 | F(x): 0.003238990225
# 5000 | Last Change: 0.000001 | F(x): 0.003238989520
The algorithm reached its maximal number of iterations (5000).
```

The recording is realised by `RecordActions`

that are (also) executed at every iteration. These can also be individually implemented and added to the `record=`

array instead of symbols.

First, the computed median can be accessed as

`median = get_solver_result(res)`

```
3Ă3 Matrix{Float64}:
1.0 2.12236e-5 0.000398721
2.12236e-5 1.00044 0.000141798
0.000398721 0.000141798 1.00041
```

but we can also look at the recorded values. For simplicity (of output), lets just look at the recorded values at iteration 42

`get_record(res)[42]`

`(42, 1.0569455860769079e-5, 0.003252547739370045, [0.9998583866917449 0.0002098880312604301 0.0002895445818451581; 0.00020988803126037459 1.0000931572564762 0.0002084371501681892; 0.00028954458184524134 0.0002084371501681892 1.000070920743257])`

But we can also access whole series and see that the cost does not decrease that fast; actually, the CPPA might converge relatively slow. For that we can for example access the `:Cost`

that was recorded every `:Iterate`

as well as the (maybe a little boring) `:Iteration`

-number in a semi-log-plot.

```
x = get_record(res, :Iteration, :Iteration)
y = get_record(res, :Iteration, :Cost)
using Plots
plot(x,y,xaxis=:log, label="CPPA Cost")
```

## Literature

- [AMS08]
- P.-A.Â Absil, R.Â Mahony and R.Â Sepulchre.
*Optimization Algorithms on Matrix Manifolds*(Princeton University Press, 2008), available online at press.princeton.edu/chapters/absil/. - [Bac14]
- M.Â BaÄĂĄk.
*Computing medians and means in Hadamard spaces*. SIAMÂ JournalÂ onÂ Optimization**24**, 1542â1566 (2014), arXiv:1210.2145. - [Bou23]
- N.Â Boumal.
*An Introduction to Optimization on Smooth Manifolds*. FirstÂ Edition (Cambridge University Press, 2023). - [Car92]
- M.Â P.Â doÂ Carmo.
*Riemannian Geometry*.*Mathematics: Theory & Applications*(BirkhĂ¤user Boston, Inc., Boston, MA, 1992); p.Â xiv+300. - [Kar77]
- H.Â Karcher.
*Riemannian center of mass and mollifier smoothing*. CommunicationsÂ onÂ PureÂ andÂ AppliedÂ Mathematics**30**, 509â541 (1977).