# Get Started: Optimize!

In this tutorial, we will both introduce the basics of optimisation on manifolds as well as how to use `Manopt.jl`

to perform optimisation on manifolds in Julia.

For more theoretical background, see e.g. (do Carmo, 1992) for an introduction to Riemannian manifolds and (Absil, Mahony and Sepulchre, 2008) or (Boumal, 2022) to read more about optimisation thereon.

Let $\mathcal M$ denote a Riemannian manifold and let $f\colon \mathcal M → ℝ$ be a cost function. We aim to compute a point $p^*$ where $f$ is *minimal* or in other words $p^*$ is a *minimizer* of $f$.

We also write this as

\[ \operatorname*{arg\,min}_{p ∈ \mathcal M} f(p)\]

and would like to find $p^*$ numerically. As an example we take the generalisation of the (arithemtic) mean. In the Euclidean case with$d\in\mathbb N$, that is for $n\in \mathbb N$ data points $y_1,\ldots,y_n \in \mathbb R^d$ the mean

\[ \sum_{i=1}^n y_i\]

can not be directly generalised to data $q_1,\ldots,q_n$, since on a manifold we do not have an addition. But the mean can also be charcterised as

\[ \operatorname*{arg\,min}_{x\in\mathbb R^d} \frac{1}{2n}\sum_{i=1}^n \lVert x - y_i\rVert^2\]

and using the Riemannian distance $d_\mathcal M$, this can be written on Riemannian manifolds. We obtain the *Riemannian Center of Mass* (Karcher, 1977)

\[ \operatorname*{arg\,min}_{p\in\mathbb R^d} \frac{1}{2n} \sum_{i=1}^n d_{\mathcal M}^2(p, q_i)\]

Fortunately the gradient can be computed and is

\[ \operatorname*{arg\,min}_{p\in\mathbb R^d} \frac{1}{n} \sum_{i=1}^n -\log_p q_i\]

## Loading the necessary packages

Let’s assume you have already installed both Manotp and Manifolds in Julia (using e.g. `using Pkg; Pkg.add(["Manopt", "Manifolds"])`

). Then we can get started by loading both packages – and `Random`

for persistency in this tutorial.

```
using Manopt, Manifolds, Random, LinearAlgebra
Random.seed!(42);
```

Now assume we are on the Sphere $\mathcal M = \mathbb S^2$ and we generate some random points “around” some initial point $p$

```
n = 100
σ = π / 8
M = Sphere(2)
p = 1 / sqrt(2) * [1.0, 0.0, 1.0]
data = [exp(M, p, σ * rand(M; vector_at=p)) for i in 1:n];
```

Now we can define the cost function $f$ and its (Riemannian) gradient $\operatorname{grad} f$ for the Riemannian center of mass:

```
f(M, p) = sum(1 / (2 * n) * distance.(Ref(M), Ref(p), data) .^ 2)
grad_f(M, p) = sum(1 / n * grad_distance.(Ref(M), data, Ref(p)));
```

and just call `gradient_descent`

. For a first start, we do not have to provide more than the manifold, the cost, the gradient, and a starting point, which we just set to the first data point

`m1 = gradient_descent(M, f, grad_f, data[1])`

```
3-element Vector{Float64}:
0.6868392795563908
0.006531600623587405
0.7267799820108911
```

In order to get more details, we further add the `debug=`

keyword argument, which act as a decorator pattern.

This way we can easily specify a certain debug to be printed. The goal is to get an output of the form

`# i | Last Change: [...] | F(x): [...] |`

but where we also want to fix the display format for the change and the cost numbers (the `[...]`

) to have a certain format. Furthermore, the reason why the solver stopped should be printed at the end

These can easily be specified using either a Symbol – using the default format for numbers – or a tuple of a symbol and a format-string in the `debug=`

keyword that is avaiable for every solver. We can also – for illustration reasons – just look at the first 6 steps by setting a `stopping_criterion=`

```
m2 = gradient_descent(M, f, grad_f, data[1];
debug=[:Iteration,(:Change, "|Δp|: %1.9f |"),
(:Cost, " F(x): %1.11f | "), "\n", :Stop],
stopping_criterion = StopAfterIteration(6)
)
```

```
Initial F(x): 0.32487988924 |
# 1 |Δp|: 1.063609017 | F(x): 0.25232524046 |
# 2 |Δp|: 0.809858671 | F(x): 0.20966960102 |
# 3 |Δp|: 0.616665145 | F(x): 0.18546505598 |
# 4 |Δp|: 0.470841764 | F(x): 0.17121604104 |
# 5 |Δp|: 0.359345690 | F(x): 0.16300825911 |
# 6 |Δp|: 0.274597420 | F(x): 0.15818548927 |
The algorithm reached its maximal number of iterations (6).
3-element Vector{Float64}:
0.7533872481682506
-0.060531070555836286
0.6547851890466333
```

See here for the list of available symbols.

The `debug=`

keyword is actually a list of `DebugActions`

added to every iteration, allowing you to write your own ones even. Additionally, `:Stop`

is an action added to the end of the solver to display the reason why the solver stopped.

The default stopping criterion for `gradient_descent`

is, to either stopwhen the gradient is small (`<1e-9`

) or a max number of iterations is reached (as a fallback. Combining stopping-criteria can be done by `|`

or `&`

. We further pass a number `25`

to `debug=`

to only an output every `25`

th iteration:

```
m3 = gradient_descent(M, f, grad_f, data[1];
debug=[:Iteration,(:Change, "|Δp|: %1.9f |"),
(:Cost, " F(x): %1.11f | "), "\n", :Stop, 25],
stopping_criterion = StopWhenGradientNormLess(1e-14) | StopAfterIteration(400),
)
```

```
Initial F(x): 0.32487988924 |
# 25 |Δp|: 0.459715605 | F(x): 0.15145076374 |
# 50 |Δp|: 0.000551270 | F(x): 0.15145051509 |
# 75 |Δp|: 0.000000674 | F(x): 0.15145051509 |
The algorithm reached approximately critical point after 75 iterations; the gradient norm (2.2455867775217738e-15) is less than 1.0e-14.
3-element Vector{Float64}:
0.6868392794788665
0.006531600680779426
0.7267799820836411
```

We can finally use another way to determine the stepsize, for example a little more expensive `ArmijoLineSeach`

than the default stepsize rule used on the Sphere.

```
m4 = gradient_descent(M, f, grad_f, data[1];
debug=[:Iteration,(:Change, "|Δp|: %1.9f |"),
(:Cost, " F(x): %1.11f | "), "\n", :Stop, 2],
stepsize = ArmijoLinesearch(M; contraction_factor=0.999, sufficient_decrease=0.5),
stopping_criterion = StopWhenGradientNormLess(1e-14) | StopAfterIteration(400),
)
```

```
Initial F(x): 0.32487988924 |
# 2 |Δp|: 0.001318138 | F(x): 0.15145051509 |
# 4 |Δp|: 0.000000021 | F(x): 0.15145051509 |
# 6 |Δp|: 0.000000021 | F(x): 0.15145051509 |
The algorithm reached approximately critical point after 7 iterations; the gradient norm (2.014814589152431e-15) is less than 1.0e-14.
3-element Vector{Float64}:
0.6868392794788668
0.006531600680779305
0.7267799820836411
```

Then we reach approximately the same point as in the previous run, but in far less steps

`[f(M, m3)-f(M,m4), distance(M, m3, m4)]`

```
2-element Vector{Float64}:
1.942890293094024e-16
2.9802322387695312e-8
```

## Example 2: Computing the median of symmetric positive definite matrices.

For the second example let’s consider the manifold of $3 × 3$ symmetric positive definite matrices and again 100 random points

```
N = SymmetricPositiveDefinite(3)
m = 100
σ = 0.005
q = Matrix{Float64}(I, 3, 3)
data2 = [exp(N, q, σ * rand(N; vector_at=q)) for i in 1:m];
```

Instead of the mean, let’s consider a non-smooth optimisation task: The median can be generalized to Manifolds as the minimiser of the sum of distances, see e.g. (Bačák, 2014). We define

`g(N, q) = sum(1 / (2 * m) * distance.(Ref(N), Ref(q), data2))`

`g (generic function with 1 method)`

Since the function is non-smooth, we can not use a gradient-based approach. But since for every summand the proximal map is available, we can use the cyclic proximal point algorithm (CPPA). We hence define the vector of proximal maps as

`proxes_g = Function[(N, λ, q) -> prox_distance(N, λ / m, di, q, 1) for di in data2];`

Besides also looking at a some debug prints, we can also easily record these values. Similarly to `debug=`

, `record=`

also accepts Symbols, see list here, to indicate things to record. We further set `return_state`

to true to obtain not just the (approximate) minimizer.

```
s = cyclic_proximal_point(N, g, proxes_g, data2[1];
debug=[:Iteration," | ",:Change," | ",(:Cost, "F(x): %1.12f"),"\n", 1000, :Stop,
],
record=[:Iteration, :Change, :Cost, :Iterate],
return_state=true,
);
```

```
Initial | | F(x): 0.004628871966
# 1000
| Last Change: 0.005022 | F(x): 0.003271791321
# 2000
| Last Change: 0.000013 | F(x): 0.003271774443
# 3000
| Last Change: 0.000004 | F(x): 0.003271771311
# 4000
| Last Change: 0.000002 | F(x): 0.003271770214
# 5000
| Last Change: 0.000001 | F(x): 0.003271769706
The algorithm reached its maximal number of iterations (5000).
```

!!!note "Technical Detail" The recording is realised by `RecordActions`

that are (also) executed at every iteration. These can also be individually implemented and added to the `record=`

array instead of symbols.

First, the computed median can be accessed as

`median = get_solver_result(s)`

```
3×3 Matrix{Float64}:
1.00054 6.1464e-5 0.000343679
6.1464e-5 1.00007 0.000101879
0.000343679 0.000101879 1.00044
```

but we can also look at the recorded values. For simplicity (of output), lets just look at the recorded values at iteration 42

`get_record(s)[42]`

`(42, 7.640883870061361e-6, 0.0032798026498283275, [1.0002117366518994 0.00014357556816643746 0.00028004648707080637; 0.00014357556816643746 0.9999321794912776 0.00029370555482273464; 0.0002800464870708619 0.0002937055548227485 1.000391810942462])`

But we can also access whole serieses and see that the cost does not decrease that fast; actually, the CPPA might converge relatively slow. For that we can for example access the `:Cost`

that was recorded every `:Iterate`

as well as the (maybe a little boring) `:Iteration`

-number in a semilogplot.

```
x = get_record(s, :Iteration, :Iteration)
y = get_record(s, :Iteration, :Cost)
using Plots
plot(x,y,xaxis=:log, label="CPPA Cost")
```

## Literature

Absil, P.-A., Mahony, R. and Sepulchre, R. (2008) *Optimization algorithms on matrix manifolds*. Princeton University Press. Available at: https://doi.org/10.1515/9781400830244.

Bačák, M. (2014) “Computing medians and means in Hadamard spaces,” *SIAM Journal on Optimization*, 24(3), pp. 1542–1566. Available at: https://doi.org/10.1137/140953393.

Boumal, N. (2022) *An introduction to optimization on smooth manifolds*. Available at: https://www.nicolasboumal.net/book.

do Carmo, M.P. (1992) *Riemannian geometry*. Birkhäuser Boston, Inc., Boston, MA (Mathematics: Theory & applications), p. xiv+300. Available at: https://doi.org/10.1007/978-1-4757-2201-7.

Karcher, H. (1977) “Riemannian center of mass and mollifier smoothing,” *Communications on Pure and Applied Mathematics*, 30(5), pp. 509–541. Available at: https://doi.org/10.1002/cpa.3160300502.