# Available solvers in Manopt.jl

Optimisation problems can be classified with respect to several criteria. In the following we provide a grouping of the algorithms with respect to the “information” available about your optimisation problem

\[\operatorname*{arg\,min}_{p∈\mathcal M} f(p)\]

Within the groups we provide short notes on advantages of the individual solvers, pointing out properties the cost $f$ should have. We use 🏅 to indicate state-of-the-art solvers that usually perform best in their corresponding group. Other methods may be neither the fastest nor state of the art, but nevertheless get the job done most reliably.

## Derivative Free

Derivative-free methods use only function evaluations of $f$.

- Nelder-Mead is a simplex-based variant that uses $d+1$ points, where $d$ is the dimension of the manifold.
- Particle Swarm uses the evolution of a set of points, called a swarm, to explore the domain of the cost and find a minimizer.
- CMA-ES uses a stochastic evolutionary strategy to perform minimization robust to local minima of the objective.

## First Order

### Gradient

- Gradient Descent uses the gradient of $f$ to determine a descent direction. Here, the direction can also be changed to be Averaged, Momentum-based, or based on Nesterov's rule.
- Conjugate Gradient Descent uses information from the previous descent direction to improve the current (gradient-based) one; several such update rules are available.
- The Quasi-Newton Method 🏅 uses gradient evaluations to approximate the Hessian, which is then used in a Newton-like scheme, where both a limited memory and a full Hessian approximation are available with several different update rules.
- The Steihaug-Toint Truncated Conjugate-Gradient Method is a solver for a constrained problem defined on a tangent space.
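As a minimal sketch of how two of the gradient-based solvers listed above are typically called, the following computes a Riemannian center of mass on the sphere. It assumes that Manifolds.jl is installed alongside Manopt.jl; the data points and the names `data`, `p_gd`, and `p_qn` are illustrative choices, not part of the library.

```julia
using Manopt, Manifolds, LinearAlgebra

M = Sphere(2)
# Illustrative data: three points close to the north pole
data = [normalize([0.1, 0.0, 1.0]), normalize([0.0, 0.1, 1.0]), normalize([-0.1, -0.1, 1.0])]
n = length(data)

# Cost of the Riemannian center of mass and its Riemannian gradient
f(M, p) = sum(distance(M, p, q)^2 for q in data) / (2n)
grad_f(M, p) = -sum(log(M, p, q) for q in data) / n

p0 = data[1]
p_gd = gradient_descent(M, f, grad_f, p0)  # plain gradient descent
p_qn = quasi_Newton(M, f, grad_f, p0)      # quasi-Newton with default settings
```

Both calls only need the cost and the gradient, so switching between first-order solvers usually means changing just the function name.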

### Subgradient

The following methods require the Riemannian subgradient $∂f$ to be available. While the subgradient might be set-valued, the function should provide one of the subgradients.

- The Subgradient Method takes the negative subgradient as a step direction and can be combined with a step size.
- The Convex Bundle Method (CBM) uses a former collection of subgradients at the previous iterates and iterate candidates to solve a local approximation to `f` in every iteration by solving a quadratic problem in the tangent space.
- The Proximal Bundle Method works similarly to CBM, but solves a proximal map-based problem in every iteration.

## Second Order

- Adaptive Regularisation with Cubics 🏅 locally builds a cubic model to determine the next descent direction.
- The Riemannian Trust-Regions Solver builds a quadratic model within a trust region to determine the next descent direction.

## Splitting based

For splitting methods, the algorithms are based on splitting the cost into different parts, usually in a sum of two or more summands. This is usually very well tailored for non-smooth objectives.

### Smooth

The following methods require that the splitting, for example into several summands, is smooth in the sense that for every summand of the cost, the gradient should still exist everywhere.

- Levenberg-Marquardt minimizes the squared norm of $f: \mathcal M→ℝ^d$, given the gradients of the component functions, or in other words the Jacobian of $f$.
- Stochastic Gradient Descent is based on a splitting of $f$ into a sum of several components $f_i$ whose gradients are provided. Steps are performed according to gradients of randomly selected components.
- The Alternating Gradient Descent alternates gradient descent steps on the components of the product manifold. All these components should be smooth, so that the gradient exists, and (locally) convex.

### Nonsmooth

If the gradient does not exist everywhere, that is if the splitting yields summands that are nonsmooth, usually methods based on proximal maps are used.

- The Chambolle-Pock algorithm uses a splitting $f(p) = F(p) + G(Λ(p))$, where $G$ is defined on a manifold $\mathcal N$ and we need the proximal map of its Fenchel dual. Both these functions can be non-smooth.
- The Cyclic Proximal Point uses proximal maps of the functions from splitting $f$ into summands $f_i$.
- Difference of Convex Algorithm (DCA) uses a splitting of the (nonconvex) function $f = g - h$ into a difference of two functions; we require the gradient of $g$ and the subgradient of $h$ to state a subproblem that is solved in every iteration.
- Difference of Convex Proximal Point uses a splitting of the (nonconvex) function $f = g - h$ into a difference of two functions; provided the proximal map of $g$ and the subgradient of $h$, the next iterate is computed. Compared to DCA, the corresponding subproblem is here written in a form that yields the proximal map.
- Douglas-Rachford uses a splitting $f(p) = F(p) + G(p)$ and their proximal maps to compute a minimizer of $f$, which can be non-smooth.
- Primal-dual Riemannian semismooth Newton Algorithm extends Chambolle-Pock and requires the differentials of the proximal maps additionally.

## Constrained

Constrained problems are of the form

\[\begin{align*} \operatorname*{arg\,min}_{p∈\mathcal M}& f(p)\\ \text{such that } & g(p) \leq 0\\&h(p) = 0 \end{align*}\]

For these you can use

- The Augmented Lagrangian Method (ALM), where both `g` and `grad_g` as well as `h` and `grad_h` are keyword arguments, and one of these pairs is mandatory.
- The Exact Penalty Method (EPM) uses a penalty term instead of augmentation, but has the same interface as ALM.
- Frank-Wolfe algorithm, where besides the gradient of $f$ either a closed form solution or a (maybe even automatically generated) sub problem solver for $\operatorname*{arg\,min}_{q ∈ C} ⟨\operatorname{grad} f(p_k), \log_{p_k}q⟩$ is required, where $p_k$ is a fixed point on the manifold (changed in every iteration).

# Alphabetical list of algorithms

Note that the solvers (their `AbstractManoptSolverState`, to be precise) can also be decorated to enhance your algorithm by general additional properties, see debug output and recording values. This is done using the `debug=` and `record=` keywords in the function calls. Similarly, a `cache=` keyword is available in any of the function calls, that wraps the `AbstractManoptProblem` in a cache for certain parts of the objective.
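The decorator keywords can be illustrated with a short sketch. It assumes Manifolds.jl is available and reuses a small center-of-mass cost on the sphere; the data and the variable names `state`, `p_star`, and `recorded` are illustrative. The keyword `return_state=true` asks the solver to return the (decorated) state instead of only the minimizer, so that the record can be read afterwards.

```julia
using Manopt, Manifolds, LinearAlgebra

M = Sphere(2)
# Illustrative data: three points close to the north pole
data = [normalize([0.1, 0.0, 1.0]), normalize([0.0, 0.1, 1.0]), normalize([-0.1, -0.1, 1.0])]
n = length(data)
f(M, p) = sum(distance(M, p, q)^2 for q in data) / (2n)
grad_f(M, p) = -sum(log(M, p, q) for q in data) / n

state = gradient_descent(
    M, f, grad_f, data[1];
    debug=[:Iteration, " | ", :Cost, "\n", 10],  # print every 10th iteration
    record=[:Iteration, :Cost],                  # store iteration number and cost
    return_state=true,                           # return the state, not just the point
)

p_star = get_solver_result(state)  # the final iterate
recorded = get_record(state)       # the recorded (iteration, cost) entries
```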

## Technical details

The main function a solver calls is

`Manopt.solve!` - Method

`solve!(p::AbstractManoptProblem, s::AbstractManoptSolverState)`

Run the solver implemented for the `AbstractManoptProblem` `p` and the `AbstractManoptSolverState` `s` employing `initialize_solver!`, `step_solver!`, as well as the `stop_solver!` of the solver.

This function is a framework that you in general should not change or redefine. It uses the following methods, which you also need to implement for your own algorithm, if you want to provide one.

`Manopt.initialize_solver!` - Function

`initialize_solver!(amp::AbstractManoptProblem, ams::AbstractManoptSolverState)`

Initialize the solver to the optimization `AbstractManoptProblem` `amp` by initializing the necessary values in the `AbstractManoptSolverState` `ams`.

`initialize_solver!(amp::AbstractManoptProblem, dss::DebugSolverState)`

Extend the initialization of the solver by a hook to run the `DebugAction` that was added to the `:Start` entry of the debug lists. All others are triggered (with iteration number `0`) to trigger possible resets.

`initialize_solver!(amp::AbstractManoptProblem, rss::RecordSolverState)`

Extend the initialization of the solver by a hook to run records that were added to the `:Start` entry.

`Manopt.step_solver!` - Function

`step_solver!(amp::AbstractManoptProblem, ams::AbstractManoptSolverState, i)`

Do one iteration step (the `i`th) for an `AbstractManoptProblem` `amp` by modifying the values in the `AbstractManoptSolverState` `ams`.

`step_solver!(amp::AbstractManoptProblem, dss::DebugSolverState, i)`

Extend the `i`th step of the solver by a hook to run debug prints that were added to the `:BeforeIteration` and `:Iteration` entries of the debug lists.

`step_solver!(amp::AbstractManoptProblem, rss::RecordSolverState, i)`

Extend the `i`th step of the solver by a hook to run records that were added to the `:Iteration` entry.

`Manopt.get_solver_result` - Function

```
get_solver_result(ams::AbstractManoptSolverState)
get_solver_result(tos::Tuple{AbstractManifoldObjective,AbstractManoptSolverState})
get_solver_result(o::AbstractManifoldObjective, s::AbstractManoptSolverState)
```

Return the final result after all iterations that is stored within the `AbstractManoptSolverState` `ams`, which was modified during the iterations.

If the objective is passed as well, then by default the objective is ignored and the solver result for the state is returned.

`Manopt.get_solver_return` - Function

```
get_solver_return(s::AbstractManoptSolverState)
get_solver_return(o::AbstractManifoldObjective, s::AbstractManoptSolverState)
```

Determine the result value of a call to a solver. By default this returns the same as `get_solver_result`.

```
get_solver_return(s::ReturnSolverState)
get_solver_return(o::AbstractManifoldObjective, s::ReturnSolverState)
```

Return the internally stored state of the `ReturnSolverState` instead of the minimizer. This means that when the state is decorated like this, the user still has to call `get_solver_result` on the internal state separately.

`get_solver_return(o::ReturnManifoldObjective, s::AbstractManoptSolverState)`

Return both the objective and the state as a tuple.

`Manopt.stop_solver!` - Method

`stop_solver!(amp::AbstractManoptProblem, ams::AbstractManoptSolverState, i)`

Depending on the current `AbstractManoptProblem` `amp`, the current state of the solver stored in the `AbstractManoptSolverState` `ams`, and the current iteration number `i`, this function determines whether to stop the solver, which by default means to call the internal `StoppingCriterion` `ams.stop`.
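How the functions documented in this section interact can be sketched with a self-contained toy example. All names below (`ToyProblem`, `ToyState`, `toy_initialize!`, `toy_step!`, `toy_stop`, `toy_solve!`, `toy_result`) are hypothetical stand-ins written for illustration; they are not Manopt.jl types or functions, and the loop is only an assumption about the general shape of such a framework, here applied to gradient descent on the real line.

```julia
# Hypothetical stand-in for a problem: a cost and its derivative
struct ToyProblem{F,G}
    cost::F
    grad::G
end

# Hypothetical stand-in for a solver state
mutable struct ToyState
    p::Float64         # current iterate
    X::Float64         # current gradient
    stepsize::Float64  # constant step size
    max_iter::Int      # stop after this many iterations
    tol::Float64       # or when the gradient is this small
end

# Plays the role of initialize_solver!: set up values before iterating
toy_initialize!(pr::ToyProblem, st::ToyState) = (st.X = pr.grad(st.p); st)

# Plays the role of step_solver!: perform the i-th iteration in place
function toy_step!(pr::ToyProblem, st::ToyState, i)
    st.p -= st.stepsize * st.X
    st.X = pr.grad(st.p)
    return st
end

# Plays the role of stop_solver!: decide whether to stop
toy_stop(pr::ToyProblem, st::ToyState, i) = i >= st.max_iter || abs(st.X) < st.tol

# Plays the role of solve!: the generic loop that never needs to change
function toy_solve!(pr::ToyProblem, st::ToyState)
    i = 0
    toy_initialize!(pr, st)
    while !toy_stop(pr, st, i)
        i += 1
        toy_step!(pr, st, i)
    end
    return st
end

# Plays the role of get_solver_result: extract the final iterate
toy_result(st::ToyState) = st.p

pr = ToyProblem(x -> (x - 3)^2, x -> 2 * (x - 3))
st = ToyState(0.0, 0.0, 0.25, 100, 1e-10)
toy_solve!(pr, st)
```

The point of this design is that a new algorithm only supplies the state type together with its initialization and step, while the loop and the stopping check stay generic.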

## API for solvers

This is a short overview of the different types of high-level functions that are usually available for a solver. Assume the solver is called `new_solver` and requires a cost `f` and some first order information `df` as well as a starting point `p` on `M`. Together, `f` and `df` form the objective, called `obj`.

Then there are basically two different variants to call the solver.

### The easy to access call

```
new_solver(M, f, df, p=rand(M); kwargs...)
new_solver!(M, f, df, p; kwargs...)
```

Here the start point should be optional. Keyword arguments include the type of evaluation, decorators like `debug=` or `record=` as well as algorithm specific ones. If you provide an immutable point `p` or the `rand(M)` point is immutable, like on the `Circle()`, this method should turn the point into a mutable one as well.

The second variant works in place of `p`, so there the start point is mandatory.

This first interface would set up the objective and pass all keywords on to the objective-based call.

### Objective based calls to solvers

```
new_solver(M, obj, p=rand(M); kwargs...)
new_solver!(M, obj, p; kwargs...)
```

Here the objective would be created beforehand, for example to compare different solvers on the same objective, and for the first variant the start point is optional. Keyword arguments include decorators like `debug=` or `record=` as well as algorithm specific ones.

This variant would generate the `problem` and the `state` and verify validity of all provided keyword arguments that affect the state. Then it would start the iteration process.
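As a hedged illustration of the objective-based calls, with `gradient_descent` standing in for the generic `new_solver`, the following sketch creates a `ManifoldGradientObjective` once and passes it to both the allocating and the in-place variant. It assumes Manifolds.jl is installed; the data and the names `obj`, `p_new`, and `p_inplace` are illustrative.

```julia
using Manopt, Manifolds, LinearAlgebra

M = Sphere(2)
# Illustrative data: three points close to the north pole
data = [normalize([0.1, 0.0, 1.0]), normalize([0.0, 0.1, 1.0]), normalize([-0.1, -0.1, 1.0])]
n = length(data)
f(M, p) = sum(distance(M, p, q)^2 for q in data) / (2n)
grad_f(M, p) = -sum(log(M, p, q) for q in data) / n

# Create the objective once so it can be shared between calls
obj = ManifoldGradientObjective(f, grad_f)
p0 = normalize([0.1, 0.0, 1.0])

# Allocating variant: p0 is left untouched
p_new = gradient_descent(M, obj, p0)

# In-place variant: the start point is mandatory and gets overwritten
p_inplace = copy(M, p0)
gradient_descent!(M, obj, p_inplace)
```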

### Manual calls

If you generate the corresponding `problem` and `state` as the previous step does, you can also use the third (lowest-level) variant and just call

`solve!(problem, state)`