Available solvers in Manopt.jl

Optimisation problems can be classified with respect to several criteria. The following groups the algorithms by the “information” available about your optimisation problem:

\[\operatorname*{arg\,min}_{p∈\mathcal M} f(p)\]

Within the groups we provide short notes on the advantages of the individual solvers, pointing out properties the cost $f$ should have. We use 🏅 to indicate state-of-the-art solvers that usually perform best in their corresponding group, and 🫏 for a maybe not so fast, maybe not so state-of-the-art method that nevertheless gets the job done most reliably.

Derivative Free

Derivative-free methods use only function evaluations of $f$.

Nelder-Mead is a simplex-based method that uses $d+1$ points, where $d$ is the dimension of the manifold.
Particle Swarm 🫏 uses the evolution of a set of points, called a swarm, to explore the domain of the cost and find a minimizer.
CMA-ES uses a stochastic evolutionary strategy to perform minimization robust to local minima of the objective.
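
As a minimal sketch of a derivative-free call, assuming Manopt.jl and Manifolds.jl are installed: only the cost is passed, and it takes the manifold as its first argument. The Rayleigh quotient and the matrix A are illustrative choices for this sketch, not part of this page, and the start simplex is left to the solver's default.

```julia
using Manopt, Manifolds, LinearAlgebra

# Illustrative problem (an assumption of this sketch): minimise the
# Rayleigh quotient p'Ap over the unit sphere in ℝ³.
M = Sphere(2)
A = Diagonal([1.0, 2.0, 3.0])

# Only the cost is needed; no gradient is provided.
f(M, p) = dot(p, A * p)

# Nelder-Mead with its default (randomly generated) start simplex.
q = NelderMead(M, f)
```

Since the default simplex is random, the returned point varies between runs.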

First Order

Gradient

Gradient Descent uses the gradient of $f$ to determine a descent direction. Here, the direction can also be changed to be Averaged, Momentum-based, or based on Nesterov's rule.
Conjugate Gradient Descent uses information from the previous descent direction to improve the current (gradient-based) one, with several such update rules available.
The Quasi-Newton Method 🏅 uses gradient evaluations to approximate the Hessian, which is then used in a Newton-like scheme. Both a limited-memory and a full Hessian approximation are available, with several different update rules.
The Steihaug-Toint Truncated Conjugate-Gradient Method is a solver for a constrained problem defined on a tangent space.
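
A minimal sketch of calling a gradient-based solver, assuming Manopt.jl and Manifolds.jl are installed. The cost and the Riemannian gradient both take the manifold as first argument. The Rayleigh quotient, the matrix A, and the start point p0 are illustrative assumptions of this sketch.

```julia
using Manopt, Manifolds, LinearAlgebra

# Illustrative problem (an assumption of this sketch): minimise the
# Rayleigh quotient p'Ap over the unit sphere in ℝ³.
M = Sphere(2)
A = Diagonal([1.0, 2.0, 3.0])

# Cost and Riemannian gradient, both with the manifold as first argument.
f(M, p) = dot(p, A * p)
# Project the Euclidean gradient 2Ap onto the tangent space at p.
grad_f(M, p) = 2 .* (A * p .- dot(p, A * p) .* p)

p0 = normalize([1.0, 1.0, 1.0])

# Run gradient descent with the solver's default step size and stopping criterion.
q = gradient_descent(M, f, grad_f, p0)
```

For this cost a minimizer is a unit eigenvector of A belonging to its smallest eigenvalue, so the result can be checked against eigen(A).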

Subgradient

The following methods require the Riemannian subgradient $∂f$ to be available. While the subgradient might be set-valued, the function should provide one of the subgradients.

The Subgradient Method takes the negative subgradient as a step direction and can be combined with a step size.
The Convex Bundle Method (CBM) uses a collection of subgradients from the previous iterates and iterate candidates to build a local approximation to $f$, which is minimized in every iteration by solving a quadratic problem in the tangent space.
The Proximal Bundle Method works similarly to CBM, but solves a proximal map-based problem in every iteration.

Second Order

Adaptive Regularisation with Cubics 🏅 locally builds a cubic model to determine the next descent direction.
The Riemannian Trust-Regions Solver builds a quadratic model within a trust region to determine the next descent direction.

Splitting based

For splitting methods, the algorithms are based on splitting the cost into different parts, usually in a sum of two or more summands. This is usually very well tailored for non-smooth objectives.

Smooth

The following methods require that the splitting, for example into several summands, is smooth in the sense that the gradient of every summand of the cost exists everywhere.

Levenberg-Marquardt minimizes the squared norm of $f: \mathcal M→ℝ^d$, given the gradients of the component functions, or in other words the Jacobian of $f$.
Stochastic Gradient Descent is based on a splitting of $f$ into a sum of several components $f_i$ whose gradients are provided. Steps are performed according to gradients of randomly selected components.
The Alternating Gradient Descent alternates gradient descent steps on the components of the product manifold. All these components should be smooth, so that the gradient exists, and (locally) convex.

Nonsmooth

If the gradient does not exist everywhere, that is if the splitting yields summands that are nonsmooth, usually methods based on proximal maps are used.

The Chambolle-Pock algorithm uses a splitting $f(p) = F(p) + G(Λ(p))$, where $G$ is defined on a manifold $\mathcal N$ and we need the proximal map of its Fenchel dual. Both these functions can be non-smooth.
The Cyclic Proximal Point 🫏 uses proximal maps of the functions obtained from splitting $f$ into summands $f_i$.
Difference of Convex Algorithm (DCA) uses a splitting of the (nonconvex) function $f = g - h$ into a difference of two functions; it requires the gradient of $g$ and the subgradient of $h$ to state a subproblem that is solved in every iteration.
Difference of Convex Proximal Point uses a splitting of the (nonconvex) function $f = g - h$ into a difference of two functions; provided the proximal map of $g$ and the subgradient of $h$, the next iterate is computed. Compared to DCA, the corresponding subproblem is here written in a form that yields the proximal map.
Douglas-Rachford uses a splitting $f(p) = F(p) + G(p)$ and the proximal maps of $F$ and $G$ to compute a minimizer of $f$, which can be non-smooth.
Primal-dual Riemannian semismooth Newton Algorithm extends Chambolle-Pock and requires the differentials of the proximal maps additionally.

Constrained

Constrained problems are of the form

\[\begin{align*} \operatorname*{arg\,min}_{p∈\mathcal M}& f(p)\\ \text{such that } & g(p) \leq 0\\&h(p) = 0 \end{align*}\]

For these you can use

The Augmented Lagrangian Method (ALM), where both g and grad_g as well as h and grad_h are keyword arguments, and one of these pairs is mandatory.
The Exact Penalty Method (EPM) uses a penalty term instead of augmentation, but has the same interface as ALM.
Frank-Wolfe algorithm, where besides the gradient of $f$ either a closed form solution or a (maybe even automatically generated) sub problem solver for $\operatorname*{arg\,min}_{q ∈ C} ⟨\operatorname{grad} f(p_k), \log_{p_k}q⟩$ is required, where $p_k$ is a fixed point on the manifold (changed in every iteration).

Alphabetical list of algorithms

Solver	Function	State
Adaptive Regularisation with Cubics	`adaptive_regularization_with_cubics`	`AdaptiveRegularizationState`
Augmented Lagrangian Method	`augmented_Lagrangian_method`	`AugmentedLagrangianMethodState`
Chambolle-Pock	`ChambollePock`	`ChambollePockState`
Conjugate Gradient Descent	`conjugate_gradient_descent`	`ConjugateGradientDescentState`
Convex Bundle Method	`convex_bundle_method`	`ConvexBundleMethodState`
Cyclic Proximal Point	`cyclic_proximal_point`	`CyclicProximalPointState`
Difference of Convex Algorithm	`difference_of_convex_algorithm`	`DifferenceOfConvexState`
Difference of Convex Proximal Point	`difference_of_convex_proximal_point`	`DifferenceOfConvexProximalState`
Douglas-Rachford	`DouglasRachford`	`DouglasRachfordState`
Exact Penalty Method	`exact_penalty_method`	`ExactPenaltyMethodState`
Frank-Wolfe algorithm	`Frank_Wolfe_method`	`FrankWolfeState`
Gradient Descent	`gradient_descent`	`GradientDescentState`
Levenberg-Marquardt	`LevenbergMarquardt`	`LevenbergMarquardtState`
Nelder-Mead	`NelderMead`	`NelderMeadState`
Particle Swarm	`particle_swarm`	`ParticleSwarmState`
Primal-dual Riemannian semismooth Newton Algorithm	`primal_dual_semismooth_Newton`	`PrimalDualSemismoothNewtonState`
Proximal Bundle Method	`proximal_bundle_method`	`ProximalBundleMethodState`
Quasi-Newton Method	`quasi_Newton`	`QuasiNewtonState`
Steihaug-Toint Truncated Conjugate-Gradient Method	`truncated_conjugate_gradient_descent`	`TruncatedConjugateGradientState`
Subgradient Method	`subgradient_method`	`SubGradientMethodState`
Stochastic Gradient Descent	`stochastic_gradient_descent`	`StochasticGradientDescentState`
Riemannian Trust-Regions	`trust_regions`	`TrustRegionsState`

Note that the solvers (their AbstractManoptSolverState, to be precise) can also be decorated to enhance your algorithm by general additional properties, see debug output and recording values. This is done using the debug= and record= keywords in the function calls. Similarly, a cache= keyword is available in any of the function calls, that wraps the AbstractManoptProblem in a cache for certain parts of the objective.
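
The decoration keywords can be sketched as follows, assuming Manopt.jl and Manifolds.jl are installed. The problem (a Rayleigh quotient on the sphere) and the chosen debug and record entries are illustrative assumptions; return_state=true is used so that the decorated state, and with it the record, is returned instead of only the minimizer.

```julia
using Manopt, Manifolds, LinearAlgebra

# Illustrative problem (an assumption of this sketch).
M = Sphere(2)
A = Diagonal([1.0, 2.0, 3.0])
f(M, p) = dot(p, A * p)
grad_f(M, p) = 2 .* (A * p .- dot(p, A * p) .* p)
p0 = normalize([1.0, 1.0, 1.0])

s = gradient_descent(M, f, grad_f, p0;
    debug=[:Iteration, :Cost, "\n", 10],  # print iteration and cost every 10th iteration
    record=[:Iteration, :Cost],           # store iteration number and cost
    return_state=true,                    # return the (decorated) solver state
)

q = get_solver_result(s)  # the minimizer stored in the state
rec = get_record(s)       # the values recorded during the iterations
```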

Technical details

The main function a solver calls is

Manopt.solve! — Method

solve!(p::AbstractManoptProblem, s::AbstractManoptSolverState)

run the solver implemented for the AbstractManoptProblem p and the AbstractManoptSolverState s employing initialize_solver!, step_solver!, as well as the stop_solver! of the solver.


which is a framework that you in general should not change or redefine. It uses the following methods, which also need to be implemented on your own algorithm, if you want to provide one.

Manopt.initialize_solver! — Function

initialize_solver!(amp::AbstractManoptProblem, ams::AbstractManoptSolverState)

Initialize the solver to the optimization AbstractManoptProblem amp by initializing the necessary values in the AbstractManoptSolverState ams.


initialize_solver!(amp::AbstractManoptProblem, dss::DebugSolverState)

Extend the initialization of the solver by a hook to run the DebugAction that was added to the :Start entry of the debug lists. All others are triggered (with iteration number 0) to trigger possible resets.


initialize_solver!(amp::AbstractManoptProblem, rss::RecordSolverState)

Extend the initialization of the solver by a hook to run records that were added to the :Start entry.


Manopt.step_solver! — Function

step_solver!(amp::AbstractManoptProblem, ams::AbstractManoptSolverState, i)

Do one iteration step (the ith) for an AbstractManoptProblem amp by modifying the values in the AbstractManoptSolverState ams.


step_solver!(amp::AbstractManoptProblem, dss::DebugSolverState, i)

Extend the ith step of the solver by a hook to run debug prints, that were added to the :BeforeIteration and :Iteration entries of the debug lists.


step_solver!(amp::AbstractManoptProblem, rss::RecordSolverState, i)

Extend the ith step of the solver by a hook to run records, that were added to the :Iteration entry.


Manopt.get_solver_result — Function

get_solver_result(ams::AbstractManoptSolverState)
get_solver_result(tos::Tuple{AbstractManifoldObjective,AbstractManoptSolverState})
get_solver_result(o::AbstractManifoldObjective, s::AbstractManoptSolverState)

Return the final result after all iterations that is stored within the AbstractManoptSolverState ams, which was modified during the iterations.

If the objective is passed as well, it is ignored by default, and the solver result for the state is returned.


Manopt.get_solver_return — Function

get_solver_return(s::AbstractManoptSolverState)
get_solver_return(o::AbstractManifoldObjective, s::AbstractManoptSolverState)

determine the result value of a call to a solver. By default this returns the same as get_solver_result.

get_solver_return(s::ReturnSolverState)
get_solver_return(o::AbstractManifoldObjective, s::ReturnSolverState)

return the internally stored state of the ReturnSolverState instead of the minimizer. This means that when the state is decorated like this, the user still has to call get_solver_result on the internal state separately.

get_solver_return(o::ReturnManifoldObjective, s::AbstractManoptSolverState)

return both the objective and the state as a tuple.


Manopt.stop_solver! — Method

stop_solver!(amp::AbstractManoptProblem, ams::AbstractManoptSolverState, i)

depending on the current AbstractManoptProblem amp, the current state of the solver stored in the AbstractManoptSolverState ams, and the current iteration i, this function determines whether to stop the solver, which by default means calling the internal StoppingCriterion ams.stop.
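
How these functions interact can be sketched in plain Julia without loading Manopt.jl. All types below (ToyProblem, ToyState, QuadraticProblem, ToyGradientState) are hypothetical stand-ins, and the functions are local re-definitions that only mirror the roles described above; they are not the package's implementation.

```julia
# Hypothetical stand-ins for AbstractManoptProblem and AbstractManoptSolverState.
abstract type ToyProblem end
abstract type ToyState end

struct QuadraticProblem <: ToyProblem
    grad::Function              # gradient of the cost
end

mutable struct ToyGradientState <: ToyState
    p::Vector{Float64}          # current iterate
    X::Vector{Float64}          # current gradient
    stepsize::Float64
    max_iter::Int
    tol::Float64
end

# Set up everything the first step needs.
function initialize_solver!(pr::QuadraticProblem, st::ToyGradientState)
    st.X = pr.grad(st.p)
    return st
end

# One iteration: modify the values stored in the state.
function step_solver!(pr::QuadraticProblem, st::ToyGradientState, i)
    st.p .-= st.stepsize .* st.X
    st.X = pr.grad(st.p)
    return st
end

# Stop after max_iter iterations or once the gradient norm is small.
stop_solver!(pr::ToyProblem, st::ToyGradientState, i) =
    i >= st.max_iter || sqrt(sum(abs2, st.X)) < st.tol

get_solver_result(st::ToyGradientState) = st.p

# The generic loop: initialize, then step until the stopping check fires.
function solve!(pr::ToyProblem, st::ToyState)
    i = 0
    initialize_solver!(pr, st)
    while !stop_solver!(pr, st, i)
        i += 1
        step_solver!(pr, st, i)
    end
    return st
end

# Minimise |p - c|² for c = (1, -2) in the Euclidean plane.
pr = QuadraticProblem(p -> 2 .* (p .- [1.0, -2.0]))
st = ToyGradientState([0.0, 0.0], zeros(2), 0.25, 100, 1e-10)
solve!(pr, st)
result = get_solver_result(st)
```

A new algorithm only supplies its own state type together with the initialize and step methods; the loop itself stays untouched.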


API for solvers

This is a short overview of the different types of high-level functions that are usually available for a solver. Assume the solver is called new_solver and requires a cost f and some first order information df as well as a starting point p on M. Together, f and df form the objective, called obj.

Then there are basically two different variants to call

The easy to access call

new_solver(M, f, df, p=rand(M); kwargs...)
new_solver!(M, f, df, p; kwargs...)

Here the start point should be optional in the first variant. Keyword arguments include the type of evaluation, decorators like debug= or record=, as well as algorithm-specific ones. If you provide an immutable point p, or the rand(M) point is immutable, like on the Circle(), this method should turn the point into a mutable one as well.

The second variant works in place of p, so p is mandatory.

This first interface would set up the objective and pass all keywords on to the objective-based call.

Objective based calls to solvers

new_solver(M, obj, p=rand(M); kwargs...)
new_solver!(M, obj, p; kwargs...)

Here the objective would be created beforehand for example to compare different solvers on the same objective, and for the first variant the start point is optional. Keyword arguments include decorators like debug= or record= as well as algorithm specific ones.

This variant would generate the problem and the state and verify validity of all provided keyword arguments that affect the state. Then it would call the iterate process.

Manual calls

If you generate the corresponding problem and state as the previous step does, you can also use the third (lowest level) variant and just call

solve!(problem, state)
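
The layering of these three call levels can be sketched in plain Julia. Everything below is a hypothetical stand-in: ToyObjective, ToySolverState, and the Euclidean gradient step are assumptions of this sketch, the manifold argument M is left out for brevity, and new_solver is the placeholder name used above rather than an existing function.

```julia
# Hypothetical objective and state types, not Manopt.jl definitions.
struct ToyObjective{F,G}
    cost::F
    gradient::G
end

mutable struct ToySolverState
    p::Vector{Float64}
    stepsize::Float64
    iterations::Int
end

# Lowest level: iterate on a prepared objective and state.
function solve!(obj::ToyObjective, st::ToySolverState)
    for _ in 1:st.iterations
        st.p .-= st.stepsize .* obj.gradient(st.p)
    end
    return st
end

# Objective-based call, in place: verify keywords, build the state, iterate.
function new_solver!(obj::ToyObjective, p; stepsize=0.25, iterations=50)
    stepsize > 0 || throw(ArgumentError("stepsize must be positive"))
    st = ToySolverState(p, stepsize, iterations)
    return solve!(obj, st).p
end

# Objective-based call, allocating: copy the start point first.
new_solver(obj::ToyObjective, p; kwargs...) = new_solver!(obj, copy(p); kwargs...)

# Easy-access calls: set up the objective and pass all keywords on.
new_solver!(f, df, p; kwargs...) = new_solver!(ToyObjective(f, df), p; kwargs...)
new_solver(f, df, p; kwargs...) = new_solver(ToyObjective(f, df), p; kwargs...)

# Usage: minimise the squared distance to the all-ones vector.
f(p) = sum(abs2, p .- 1)
df(p) = 2 .* (p .- 1)

p0 = zeros(3)
q = new_solver(f, df, p0)    # allocating variant, p0 stays unchanged

p1 = zeros(3)
new_solver!(f, df, p1)       # in-place variant, p1 is overwritten
```

Keeping the allocating variant as a thin wrapper around the in-place one means keyword handling lives in a single place.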