Available solvers in Manopt.jl

Optimisation problems can be classified with respect to several criteria. The following list of the algorithms is a grouped with respect to the “information” available about a optimisation problem

\[\operatorname*{arg\,min}_{p∈\mathbb M} f(p)\]

Within each group short notes on advantages of the individual solvers, and required properties the cost $f$ should have, are provided. In that list a 🏅 is used to indicate state-of-the-art solvers, that usually perform best in their corresponding group and 🫏 for a maybe not so fast, maybe not so state-of-the-art method, that nevertheless gets the job done most reliably.

Derivative free

For derivative free only function evaluations of $f$ are used.

  • Nelder-Mead a simplex based variant, that is using $d+1$ points, where $d$ is the dimension of the manifold.
  • Particle Swarm 🫏 use the evolution of a set of points, called swarm, to explore the domain of the cost and find a minimizer.
  • CMA-ES uses a stochastic evolutionary strategy to perform minimization robust to local minima of the objective.

First order


  • Gradient Descent uses the gradient from $f$ to determine a descent direction. Here, the direction can also be changed to be Averaged, Momentum-based, based on Nesterovs rule.
  • Conjugate Gradient Descent uses information from the previous descent direction to improve the current (gradient-based) one including several such update rules.
  • The Quasi-Newton Method 🏅 uses gradient evaluations to approximate the Hessian, which is then used in a Newton-like scheme, where both a limited memory and a full Hessian approximation are available with several different update rules.
  • Steihaug-Toint Truncated Conjugate-Gradient Method a solver for a constrained problem defined on a tangent space.


The following methods require the Riemannian subgradient $∂f$ to be available. While the subgradient might be set-valued, the function should provide one of the subgradients.

  • The Subgradient Method takes the negative subgradient as a step direction and can be combined with a step size.
  • The Convex Bundle Method (CBM) uses a former collection of sub gradients at the previous iterates and iterate candidates to solve a local approximation to f in every iteration by solving a quadratic problem in the tangent space.
  • The Proximal Bundle Method works similar to CBM, but solves a proximal map-based problem in every iteration.

Second order

Splitting based

For splitting methods, the algorithms are based on splitting the cost into different parts, usually in a sum of two or more summands. This is usually very well tailored for non-smooth objectives.


The following methods require that the splitting, for example into several summands, is smooth in the sense that for every summand of the cost, the gradient should still exist everywhere

  • Levenberg-Marquardt minimizes the square norm of $f: \mathcal M→ℝ^d$ provided the gradients of the component functions, or in other words the Jacobian of $f$.
  • Stochastic Gradient Descent is based on a splitting of $f$ into a sum of several components $f_i$ whose gradients are provided. Steps are performed according to gradients of randomly selected components.
  • The Alternating Gradient Descent alternates gradient descent steps on the components of the product manifold. All these components should be smooth as it is required, that the gradient exists, and is (locally) convex.


If the gradient does not exist everywhere, that is if the splitting yields summands that are nonsmooth, usually methods based on proximal maps are used.

  • The Chambolle-Pock algorithm uses a splitting $f(p) = F(p) + G(Λ(p))$, where $G$ is defined on a manifold $\mathcal N$ and the proximal map of its Fenchel dual is required. Both these functions can be non-smooth.
  • The Cyclic Proximal Point 🫏 uses proximal maps of the functions from splitting $f$ into summands $f_i$
  • Difference of Convex Algorithm (DCA) uses a splitting of the (non-convex) function $f = g - h$ into a difference of two functions; for each of these it is required to have access to the gradient of $g$ and the subgradient of $h$ to state a sub problem in every iteration to be solved.
  • Difference of Convex Proximal Point uses a splitting of the (non-convex) function $f = g - h$ into a difference of two functions; provided the proximal map of $g$ and the subgradient of $h$, the next iterate is computed. Compared to DCA, the corresponding sub problem is here written in a form that yields the proximal map.
  • Douglas—Rachford uses a splitting $f(p) = F(x) + G(x)$ and their proximal maps to compute a minimizer of $f$, which can be non-smooth.
  • Primal-dual Riemannian semismooth Newton Algorithm extends Chambolle-Pock and requires the differentials of the proximal maps additionally.


Constrained problems of the form

\[\begin{align*} \operatorname*{arg\,min}_{p∈\mathbb M}& f(p)\\ \text{such that } & g(p) \leq 0\\&h(p) = 0 \end{align*}\]

For these you can use

  • The Augmented Lagrangian Method (ALM), where both g and grad_g as well as h and grad_h are keyword arguments, and one of these pairs is mandatory.
  • The Exact Penalty Method (EPM) uses a penalty term instead of augmentation, but has the same interface as ALM.
  • Frank-Wolfe algorithm, where besides the gradient of $f$ either a closed form solution or a (maybe even automatically generated) sub problem solver for $\operatorname*{arg\,min}_{q ∈ C} ⟨\operatorname{grad} f(p_k), \log_{p_k}q⟩$ is required, where $p_k$ is a fixed point on the manifold (changed in every iteration).

Alphabetical list List of algorithms

Adaptive Regularisation with Cubicsadaptive_regularization_with_cubicsAdaptiveRegularizationState
Augmented Lagrangian Methodaugmented_Lagrangian_methodAugmentedLagrangianMethodState
Conjugate Gradient Descentconjugate_gradient_descentConjugateGradientDescentState
Convex Bundle Methodconvex_bundle_methodConvexBundleMethodState
Cyclic Proximal Pointcyclic_proximal_pointCyclicProximalPointState
Difference of Convex Algorithmdifference_of_convex_algorithmDifferenceOfConvexState
Difference of Convex Proximal Pointdifference_of_convex_proximal_pointDifferenceOfConvexProximalState
Exact Penalty Methodexact_penalty_methodExactPenaltyMethodState
Frank-Wolfe algorithmFrank_Wolfe_methodFrankWolfeState
Gradient Descentgradient_descentGradientDescentState
Particle Swarmparticle_swarmParticleSwarmState
Primal-dual Riemannian semismooth Newton Algorithmprimal_dual_semismooth_NewtonPrimalDualSemismoothNewtonState
Proximal Bundle Methodproximal_bundle_methodProximalBundleMethodState
Quasi-Newton Methodquasi_NewtonQuasiNewtonState
Steihaug-Toint Truncated Conjugate-Gradient Methodtruncated_conjugate_gradient_descentTruncatedConjugateGradientState
Subgradient Methodsubgradient_methodSubGradientMethodState
Stochastic Gradient Descentstochastic_gradient_descentStochasticGradientDescentState
Riemannian Trust-Regionstrust_regionsTrustRegionsState

Note that the solvers (their AbstractManoptSolverState, to be precise) can also be decorated to enhance your algorithm by general additional properties, see debug output and recording values. This is done using the debug= and record= keywords in the function calls. Similarly, a cache= keyword is available in any of the function calls, that wraps the AbstractManoptProblem in a cache for certain parts of the objective.

Technical details

The main function a solver calls is

which is a framework that you in general should not change or redefine. It uses the following methods, which also need to be implemented on your own algorithm, if you want to provide one.

initialize_solver!(ams::AbstractManoptProblem, amp::AbstractManoptSolverState)

Initialize the solver to the optimization AbstractManoptProblem amp by initializing the necessary values in the AbstractManoptSolverState amp.

initialize_solver!(amp::AbstractManoptProblem, dss::DebugSolverState)

Extend the initialization of the solver by a hook to run the DebugAction that was added to the :Start entry of the debug lists. All others are triggered (with iteration number 0) to trigger possible resets

initialize_solver!(ams::AbstractManoptProblem, rss::RecordSolverState)

Extend the initialization of the solver by a hook to run records that were added to the :Start entry.

step_solver!(amp::AbstractManoptProblem, ams::AbstractManoptSolverState, i)

Do one iteration step (the ith) for an AbstractManoptProblemp by modifying the values in the AbstractManoptSolverState ams.

step_solver!(amp::AbstractManoptProblem, dss::DebugSolverState, i)

Extend the ith step of the solver by a hook to run debug prints, that were added to the :BeforeIteration and :Iteration entries of the debug lists.

step_solver!(amp::AbstractManoptProblem, rss::RecordSolverState, i)

Extend the ith step of the solver by a hook to run records, that were added to the :Iteration entry.

get_solver_result(o::AbstractManifoldObjective, s::AbstractManoptSolverState)

Return the final result after all iterations that is stored within the AbstractManoptSolverState ams, which was modified during the iterations.

For the case the objective is passed as well, but default, the objective is ignored, and the solver result for the state is called.

get_solver_return(o::AbstractManifoldObjective, s::AbstractManoptSolverState)

determine the result value of a call to a solver. By default this returns the same as get_solver_result.

get_solver_return(o::AbstractManifoldObjective, s::ReturnSolverState)

return the internally stored state of the ReturnSolverState instead of the minimizer. This means that when the state are decorated like this, the user still has to call get_solver_result on the internal state separately.

get_solver_return(o::ReturnManifoldObjective, s::AbstractManoptSolverState)

return both the objective and the state as a tuple.


API for solvers

this is a short overview of the different types of high-level functions are usually available for a solver. Assume the solver is called new_solver and requires a cost f and some first order information df as well as a starting point p on M. f and df form the objective together called obj.

Then there are basically two different variants to call

The easy to access call

new_solver(M, f, df, p=rand(M); kwargs...)
new_solver!(M, f, df, p; kwargs...)

Where the start point should be optional. Keyword arguments include the type of evaluation, decorators like debug= or record= as well as algorithm specific ones. If you provide an immutable point p or the rand(M) point is immutable, like on the Circle() this method should turn the point into a mutable one as well.

The third variant works in place of p, so it is mandatory.

This first interface would set up the objective and pass all keywords on the objective based call.

Objective based calls to solvers

new_solver(M, obj, p=rand(M); kwargs...)
new_solver!(M, obj, p; kwargs...)

Here the objective would be created beforehand for example to compare different solvers on the same objective, and for the first variant the start point is optional. Keyword arguments include decorators like debug= or record= as well as algorithm specific ones.

This variant would generate the problem and the state and verify validity of all provided keyword arguments that affect the state. Then it would call the iterate process.

Manual calls

If you generate the corresponding problem and state as the previous step does, you can also use the third (lowest level) and just call

solve!(problem, state)