Adaptive regularization with cubics

adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, f, grad_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, mho, p=rand(M); kwargs...)

Solve an optimization problem on the manifold M by iteratively minimizing

\[ m_k(X) = f(p_k) + ⟨X, \operatorname{grad} f(p_k)⟩ + \frac{1}{2}⟨X, \operatorname{Hess} f(p_k)[X]⟩ + \frac{σ_k}{3}\lVert X \rVert^3\]

on the tangent space at the current iterate $p_k$, where $X ∈ T_{p_k}\mathcal M$ and $σ_k > 0$ is a regularization parameter.

Let $X_k$ denote the minimizer of the model $m_k$ and use the model improvement

\[ ρ_k = \frac{f(p_k) - f(\operatorname{retr}_{p_k}(X_k))}{m_k(0) - m_k(X_k) + \frac{σ_k}{3}\lVert X_k\rVert^3}.\]

With two thresholds $η_2 ≥ η_1 > 0$, set $p_{k+1} = \operatorname{retr}_{p_k}(X_k)$ if $ρ_k ≥ η_1$ and reject the candidate otherwise, that is, set $p_{k+1} = p_k$.

Further update the regularization parameter using factors $0 < γ_1 < 1 < γ_2$

\[σ_{k+1} = \begin{cases} \max\{σ_{\min}, γ_1σ_k\} & \text{ if } ρ_k \geq η_2 &\text{ (the model was very successful)},\\ σ_k & \text{ if } ρ_k ∈ [η_1, η_2)&\text{ (the model was successful)},\\ γ_2σ_k & \text{ if } ρ_k < η_1&\text{ (the model was unsuccessful)}. \end{cases}\]

For more details see [ABBC20].
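The acceptance test and the $σ$ update above can be sketched in plain Julia. These are illustrative, hypothetical helpers, not the internal Manopt.jl implementation:

```julia
# Hypothetical helpers illustrating the two update rules above;
# they are not part of Manopt.jl itself.
accept_step(ρ; η1=0.1) = ρ ≥ η1  # p_{k+1} = retr_{p_k}(X_k) iff true

function update_sigma(σ, ρ; η1=0.1, η2=0.9, γ1=0.1, γ2=2.0, σmin=1e-10)
    if ρ ≥ η2          # very successful: shrink the regularization
        return max(σmin, γ1 * σ)
    elseif ρ ≥ η1      # successful: keep σ
        return σ
    else               # unsuccessful: increase the regularization
        return γ2 * σ
    end
end
```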


  • M: a manifold $\mathcal M$
  • f: a cost function $f: \mathcal M → ℝ$ to minimize
  • grad_f: the gradient $\operatorname{grad} f: \mathcal M → T\mathcal M$ of $f$
  • Hess_f: (optional) the Hessian $\operatorname{Hess} f(p)[X]$ of $f$
  • p: an initial value $p ∈ \mathcal M$

For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference.

The cost f and its gradient and Hessian may also be provided jointly as a ManifoldHessianObjective.
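As a usage sketch, assuming Manopt.jl and Manifolds.jl are loaded, one can minimize the Rayleigh quotient on a sphere; the cost and gradient below are the standard textbook formulas, and the Hessian is omitted so that it is approximated by finite differences:

```julia
using LinearAlgebra, Manopt, Manifolds

n = 8
A = Symmetric(randn(n, n))
M = Sphere(n - 1)

f(M, p) = p' * A * p                           # Rayleigh quotient
grad_f(M, p) = 2 * (A * p - (p' * A * p) * p)  # its Riemannian gradient

# a minimizer is a unit eigenvector to the smallest eigenvalue of A
q = adaptive_regularization_with_cubics(M, f, grad_f)
```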

Keyword arguments

The default values are given in brackets.

  • σ: (100.0 / sqrt(manifold_dimension(M))) initial regularization parameter
  • σmin: (1e-10) minimal regularization value $σ_{\min}$
  • η1: (0.1) lower model success threshold
  • η2: (0.9) upper model success threshold
  • γ1: (0.1) regularization reduction factor (for the very successful case)
  • γ2: (2.0) regularization increment factor (for the unsuccessful case)
  • evaluation: (AllocatingEvaluation) specify whether the gradient is allocating (default), of the form grad_f(M, p), or works InplaceEvaluation, that is of the form grad_f!(M, X, p), and analogously for the Hessian.
  • retraction_method: (default_retraction_method(M, typeof(p))) a retraction to use
  • initial_tangent_vector: (zero_vector(M, p)) initialize any tangent vector data,
  • maxIterLanczos: (200) a shortcut to set the stopping criterion in the sub solver,
  • ρ_regularization: (1e3) a regularization to avoid dividing by zero for small values of cost and model
  • stopping_criterion: (StopAfterIteration(40) | StopWhenGradientNormLess(1e-9) | StopWhenAllLanczosVectorsUsed(maxIterLanczos))
  • sub_state: (LanczosState(M, copy(M, p); maxIterLanczos=maxIterLanczos, σ=σ)) a state for the subproblem, or an AbstractEvaluationType if the problem is a function
  • sub_objective: a shortcut to modify the objective of the subproblem used within the sub_problem
  • sub_problem: DefaultManoptProblem(M, sub_objective) the problem (or a function) for the sub problem

All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively. If you provide the ManifoldGradientObjective directly, these decorations can still be specified.

By default the debug= keyword is set to DebugIfEntry(:ρ_denominator, >(0); message="Denominator nonpositive", type=:error) to raise an error if, due to rounding errors, the denominator in the computation of ρ becomes nonpositive.

adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, f, grad_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, mho, p; kwargs...)

Evaluate the Riemannian adaptive regularization with cubics solver in place of p.


  • M: a manifold $\mathcal M$
  • f: a cost function $f: \mathcal M → ℝ$ to minimize
  • grad_f: the gradient $\operatorname{grad} f: \mathcal M → T\mathcal M$ of $f$
  • Hess_f: (optional) the Hessian $\operatorname{Hess} f(p)[X]$ of $f$
  • p: an initial value $p ∈ \mathcal M$

For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference.

The cost f and its gradient and Hessian may also be provided jointly as a ManifoldHessianObjective.

For more details and all options, see adaptive_regularization_with_cubics.
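A minimal sketch of the in-place variant, assuming M, f, and grad_f are defined as in the allocating example:

```julia
p = rand(M)
adaptive_regularization_with_cubics!(M, f, grad_f, p)
# p has been overwritten with the resulting iterate
```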



AdaptiveRegularizationState{P,T} <: AbstractHessianSolverState

A state for the adaptive_regularization_with_cubics solver.


A default value is given in brackets if a parameter can be left out in initialization.

  • η1, η2: (0.1, 0.9) bounds for evaluating the regularization parameter
  • γ1, γ2: (0.1, 2.0) shrinking and expansion factors for regularization parameter σ
  • p: (rand(M)) the current iterate
  • X: (zero_vector(M,p)) the current gradient $\operatorname{grad}f(p)$
  • s: (zero_vector(M,p)) the tangent vector step resulting from minimizing the model problem in the tangent space $\mathcal T_{p} \mathcal M$
  • σ: the current cubic regularization parameter
  • σmin: (1e-7) lower bound for the cubic regularization parameter
  • ρ_regularization: (1e3) regularization parameter for computing ρ.

When approaching convergence, ρ may be difficult to compute since both numerator and denominator approach zero. Regularizing the ratio lets ρ go to 1 near convergence.
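As a purely illustrative sketch (a hypothetical helper, not the Manopt.jl internals), adding a small regularizer to both numerator and denominator drives the ratio to 1 when both decreases vanish:

```julia
# Hypothetical helper: regularized ratio of actual vs. model decrease.
# For sizeable decreases the regularizer is negligible; for vanishing
# decreases the ratio tends to reg/reg = 1.
function regularized_rho(actual_decrease, model_decrease; ρ_reg=1e3)
    reg = ρ_reg * eps(Float64)
    return (actual_decrease + reg) / (model_decrease + reg)
end
```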

Furthermore, the following internal fields are defined

  • q: (copy(M,p)) a point for the candidates to evaluate model and ρ
  • H: (copy(M, p, X)) the current Hessian, $\operatorname{Hess}F(p)[⋅]$
  • S: (copy(M, p, X)) the current solution from the subsolver
  • ρ: the current regularized ratio of actual improvement and model improvement.
  • ρ_denominator: (one(ρ)) a value to store the denominator from the computation of ρ to allow for a warning or error when this value is non-positive.


AdaptiveRegularizationState(M, p=rand(M); X=zero_vector(M, p); kwargs...)

Construct the solver state with all fields stated as keyword arguments.


Sub solvers

There are several ways to approach the subsolver. The default is the first one.

Lanczos iteration

LanczosState{P,T,SC,B,I,R,TM,V,Y} <: AbstractManoptSolverState

Solve the adaptive regularized subproblem with a Lanczos iteration


  • stop: the stopping criterion
  • σ: the current regularization parameter
  • X: the iterate
  • Lanczos_vectors: the obtained Lanczos vectors
  • tridig_matrix: the tridiagonal coefficient matrix T
  • coefficients: the coefficients $y_1,\ldots,y_k$ that determine the solution
  • Hp: a temporary vector containing the evaluation of the Hessian
  • Hp_residual: a temporary vector containing the residual to the Hessian
  • S: the current obtained / approximated solution

(Conjugate) gradient descent

There is a generic objective that implements the sub problem.


A model for the adaptive regularization with Cubics

\[m(X) = f(p) + ⟨\operatorname{grad} f(p), X ⟩_p + \frac{1}{2} ⟨\operatorname{Hess} f(p)[X], X⟩_p + \frac{σ}{3} \lVert X \rVert^3,\]

cf. Eq. (33) in [ABBC20]



AdaptiveRagularizationWithCubicsModelObjective(mho, σ=1.0)

with either an AbstractManifoldHessianObjective objective or a decorator containing such an objective.


Since the sub problem is given on the tangent space, you have to provide

arc_obj = AdaptiveRagularizationWithCubicsModelObjective(mho, σ)
sub_problem = DefaultManoptProblem(TangentSpaceAt(M, p), arc_obj)

where mho is the Hessian objective of f to solve. Then use this for the sub_problem keyword and your favourite gradient based solver for the sub_state keyword, for example a ConjugateGradientDescentState.
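Put together, a hedged sketch of wiring such a sub problem into the main solver call might look as follows; mho is assumed to be a ManifoldHessianObjective for f, and the exact constructor signatures should be checked against the Manopt.jl documentation:

```julia
# Sketch only: construct the model objective on the tangent space at p
# and hand it to the solver via the sub_problem keyword.
arc_obj = AdaptiveRagularizationWithCubicsModelObjective(mho, 1.0)
sub_problem = DefaultManoptProblem(TangentSpaceAt(M, p), arc_obj)
# a gradient-based sub_state can additionally be supplied here
q = adaptive_regularization_with_cubics(M, mho; sub_problem=sub_problem)
```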

Additional stopping criteria

StopWhenAllLanczosVectorsUsed <: StoppingCriterion

When an inner iteration has used up all Lanczos vectors, this stopping criterion serves as a fallback / security stopping criterion so that a non-existing field in the array allocated for the vectors is not accessed.

Note that this stopping criterion (for now) is only implemented for the case that an AdaptiveRegularizationState uses a LanczosState subsolver.


  • maxLanczosVectors: maximal number of Lanczos vectors
  • reason: a String indicating the reason if the criterion indicated to stop


StopWhenFirstOrderProgress <: StoppingCriterion

A stopping criterion related to the Riemannian adaptive regularization with cubics (ARC) solver indicating that the model function at the current (outer) iterate,

\[ m(X) = f(p) + ⟨X, \operatorname{grad} f(p)⟩ + \frac{1}{2} ⟨X, \operatorname{Hess} f(p)[X]⟩ + \frac{σ}{3} \lVert X \rVert^3,\]

defined on the tangent space $T_{p}\mathcal M$ fulfills at the current iterate $X_k$ that

\[m(X_k) \leq m(0) \quad\text{ and }\quad \lVert \operatorname{grad} m(X_k) \rVert ≤ θ \lVert X_k \rVert^2.\]


  • θ: the factor $θ$ in the second condition
  • reason: a String indicating the reason if the criterion indicated to stop

Technical details

The adaptive_regularization_with_cubics solver requires the following functions of a manifold to be available

  • A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
  • if you do not provide an initial regularization parameter σ, a manifold_dimension is required.
  • By default the tangent vector storing the gradient is initialized calling zero_vector(M,p).
  • inner(M, p, X, Y) is used within the algorithm step

Furthermore, within the Lanczos subsolver, generating a random tangent vector (at p) using rand!(M, X; vector_at=p) in place of X is required.


[ABBC20] N. Agarwal, N. Boumal, B. Bullins and C. Cartis. Adaptive regularization with cubics on manifolds. Mathematical Programming (2020).