Adaptive regularization with Cubics

Manopt.adaptive_regularization_with_cubics (Function)
adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, f, grad_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, mho, p=rand(M); kwargs...)

Solve an optimization problem on the manifold M by iteratively minimizing

\[m_k(X) = f(p_k) + ⟨X, \operatorname{grad} f(p_k)⟩ + \frac{1}{2}⟨X, \operatorname{Hess} f(p_k)[X]⟩ + \frac{σ_k}{3}\lVert X \rVert^3\]

on the tangent space at the current iterate $p_k$, i.e. $X ∈ T_{p_k}\mathcal M$ and where $σ_k > 0$ is a regularization parameter.

Let $X_k$ denote the minimizer of the model $m_k$. Then we use the model improvement

\[ρ_k = \frac{f(p_k) - f(\operatorname{retr}_{p_k}(X_k))}{m_k(0) - m_k(X_k) + \frac{σ_k}{3}\lVert X_k\rVert^3}.\]

We use two thresholds $η_2 ≥ η_1 > 0$ and set $p_{k+1} = \operatorname{retr}_{p_k}(X_k)$ if $ρ_k ≥ η_1$, and reject the candidate otherwise, that is, set $p_{k+1} = p_k$.

We further update the regularization parameter using factors $0 < γ_1 < 1 < γ_2$:

\[σ_{k+1} = \begin{cases} \max\{σ_{\min}, γ_1σ_k\} & \text{ if } ρ_k \geq η_2 &\text{ (the model was very successful)},\\ σ_k & \text{ if } ρ_k ∈ [η_1, η_2) &\text{ (the model was successful)},\\ γ_2σ_k & \text{ if } ρ_k < η_1 &\text{ (the model was unsuccessful)}. \end{cases}\]
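
For illustration, this acceptance and update step can be sketched in plain Julia as follows; the function name and the keyword defaults are chosen for this example only and are not part of the Manopt.jl API.

# Illustrative only: one acceptance / regularization update step following the rules above.
function arc_update(ρ, σ, σ_min; η1=0.1, η2=0.9, γ1=0.1, γ2=2.0)
    accept = ρ ≥ η1              # accept the candidate retr_p(X_k)?
    if ρ ≥ η2                    # very successful: shrink the regularization
        σ_new = max(σ_min, γ1 * σ)
    elseif ρ ≥ η1                # successful: keep the regularization
        σ_new = σ
    else                         # unsuccessful: increase the regularization
        σ_new = γ2 * σ
    end
    return accept, σ_new
end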

For more details see Agarwal, Boumal, Bullins and Cartis [ABBC20].

Input

  • M – a manifold $\mathcal M$
  • f – a cost function $f: \mathcal M → ℝ$ to minimize
  • grad_f – the gradient $\operatorname{grad} f: \mathcal M → T\mathcal M$ of $f$
  • Hess_f – (optional) the Hessian $\operatorname{Hess} f(p)[X]$ of $f$
  • p – an initial value $p ∈ \mathcal M$

If no Hessian is provided, it is approximated by finite differences, see ApproxHessianFiniteDifference.

The cost f and its gradient and Hessian may also be provided as a ManifoldHessianObjective.

Keyword arguments

The default values are given in brackets.

  • σ – (100.0 / sqrt(manifold_dimension(M))) initial regularization parameter
  • σmin - (1e-10) minimal regularization value $σ_{\min}$
  • η1 - (0.1) lower model success threshold
  • η2 - (0.9) upper model success threshold
  • γ1 - (0.1) regularization reduction factor (for the success case)
  • γ2 - (2.0) regularization increment factor (for the non-success case)
  • evaluation – (AllocatingEvaluation) specify whether the gradient works by allocation (default), i.e. is of the form grad_f(M, p), or in place as InplaceEvaluation, i.e. of the form grad_f!(M, X, p), and analogously for the Hessian.
  • retraction_method – (default_retraction_method(M, typeof(p))) a retraction to use
  • initial_tangent_vector – (zero_vector(M, p)) initialize any tangent vector data
  • maxIterLanczos – (200) a shortcut to set the stopping criterion in the sub_solver
  • ρ_regularization – (1e3) a regularization to avoid division by zero for small values of the cost and the model
  • stopping_criterion – (StopAfterIteration(40) | StopWhenGradientNormLess(1e-9) | StopWhenAllLanczosVectorsUsed(maxIterLanczos))
  • sub_state – (LanczosState(M, copy(M, p); maxIterLanczos=maxIterLanczos, σ=σ)) a state for the subproblem, or an AbstractEvaluationType if the subproblem is given as a function
  • sub_objective – a shortcut to modify the objective of the subproblem used within the sub_problem
  • sub_problem – (DefaultManoptProblem(M, sub_objective)) the problem (or a function) for the subproblem

All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively. If you provide the ManifoldGradientObjective directly, these decorations can still be specified.

By default the debug= keyword is set to DebugIfEntry(:ρ_denominator, >(0); message="Denominator nonpositive", type=:error) to avoid that, due to rounding errors, the denominator in the computation of ρ becomes nonpositive.
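
As a minimal usage sketch (the matrix A and the Rayleigh-quotient cost, gradient, and Hessian below are set up only for this illustration), one can compute a unit eigenvector for the smallest eigenvalue of a symmetric matrix:

using Manopt, Manifolds, LinearAlgebra, Random

Random.seed!(42)
n = 5
A = Symmetric(randn(n, n))
M = Sphere(n - 1)

f(M, p) = p' * A * p
grad_f(M, p) = 2 * (A * p - (p' * A * p) * p)                        # Riemannian gradient on the sphere
Hess_f(M, p, X) = 2 * (A * X - (p' * A * X) * p - (p' * A * p) * X)  # Riemannian Hessian on the sphere

q = adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, rand(M); σ=10.0)
# q approximates a unit eigenvector for the smallest eigenvalue of A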

Manopt.adaptive_regularization_with_cubics! (Function)
adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, f, grad_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, mho, p; kwargs...)

Evaluate the Riemannian adaptive regularization with cubics solver in place of p.

Input

  • M – a manifold $\mathcal M$
  • f – a cost function $f: \mathcal M → ℝ$ to minimize
  • grad_f – the gradient $\operatorname{grad} f: \mathcal M → T\mathcal M$ of $f$
  • Hess_f – (optional) the Hessian $\operatorname{Hess} f(p)[X]$ of $f$
  • p – an initial value $p ∈ \mathcal M$

If no Hessian is provided, it is approximated by finite differences, see ApproxHessianFiniteDifference.

The cost f and its gradient and Hessian may also be provided as a ManifoldHessianObjective.

For more details and all options, see adaptive_regularization_with_cubics.

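Continuing the illustrative Rayleigh-quotient sketch from above, the in-place variant overwrites the start point:

p = rand(M)
adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p; σ=10.0)
# p now holds the computed iterate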

State

Manopt.AdaptiveRegularizationState (Type)
AdaptiveRegularizationState{P,T} <: AbstractHessianSolverState

A state for the adaptive_regularization_with_cubics solver.

Fields

A default value is given in brackets if a parameter can be left out in the initialization.

  • η1, η2 – (0.1, 0.9) bounds for evaluating the regularization parameter
  • γ1, γ2 – (0.1, 2.0) shrinking and expansion factors for the regularization parameter σ
  • p – (rand(M)) the current iterate
  • X – (zero_vector(M,p)) the current gradient $\operatorname{grad}f(p)$
  • s - (zero_vector(M,p)) the tangent vector step resulting from minimizing the model problem in the tangent space $\mathcal T_{p} \mathcal M$
  • σ – the current cubic regularization parameter
  • σmin – (1e-7) lower bound for the cubic regularization parameter
  • ρ_regularization – (1e3) regularization parameter for computing ρ. As we approach convergence, ρ may be difficult to compute with both numerator and denominator approaching zero; regularizing the ratio lets ρ go to 1 near convergence.
  • evaluation – (AllocatingEvaluation()) specifies whether provided functions allocate their result or work in place
  • retraction_method – (default_retraction_method(M)) the retraction to use
  • stopping_criterion – (StopAfterIteration(100)) a StoppingCriterion
  • sub_problem – the sub problem solved in each iteration
  • sub_state - sub state for solving the sub problem – either a solver state if the problem is an AbstractManoptProblem or an AbstractEvaluationType if it is a function, where it defaults to AllocatingEvaluation.

Furthermore, the following internal fields are defined:

  • q - (copy(M,p)) a point for the candidates to evaluate model and ρ
  • H – (copy(M, p, X)) the current Hessian, $\operatorname{Hess} f(p)[⋅]$
  • S – (copy(M, p, X)) the current solution from the subsolver
  • ρ – the current regularized ratio of actual improvement and model improvement.
  • ρ_denominator – (one(ρ)) a value to store the denominator from the computation of ρ to allow for a warning or error when this value is non-positive.

Constructor

AdaptiveRegularizationState(M, p=rand(M); X=zero_vector(M, p), kwargs...)

Construct the solver state with all fields stated above as keyword arguments.

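A sketch of constructing the state directly, reusing the illustrative manifold M from above and assuming the remaining fields (in particular the sub problem) keep their defaults; the keyword values are illustrative only:

arcs = AdaptiveRegularizationState(M, rand(M);
    σ=10.0, σmin=1e-10, η1=0.1, η2=0.9, γ1=0.1, γ2=2.0,
    stopping_criterion=StopAfterIteration(60),
)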

Sub solvers

There are several ways to approach the subsolver. The default is the first one.

Lanczos Iteration

Manopt.LanczosState (Type)
LanczosState{P,T,SC,B,I,R,TM,V,Y} <: AbstractManoptSolverState

Solve the adaptive regularized subproblem with a Lanczos iteration

Fields

  • p – the current iterate
  • stop – the stopping criterion
  • σ – the current regularization parameter
  • X – the current gradient
  • Lanczos_vectors – the obtained Lanczos vectors
  • tridig_matrix – the tridiagonal coefficient matrix T
  • coefficients – the coefficients $y_1,\dots,y_k$ that determine the solution
  • Hp – a temporary vector containing the evaluation of the Hessian
  • Hp_residual – a temporary vector containing the residual to the Hessian
  • S – the current obtained / approximated solution
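
The size of this default Lanczos subsolver can be controlled from the main solver call via the maxIterLanczos shortcut mentioned above, for example (reusing the illustrative setup from before):

q = adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, rand(M); maxIterLanczos=50)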

(Conjugate) Gradient Descent

There are two generic functors that implement the sub problem:

Manopt.AdaptiveRegularizationCubicCost (Type)
AdaptiveRegularizationCubicCost

We define the model $m(X)$ in the tangent space of the current iterate $p=p_k$ as

\[ m(X) = f(p) + ⟨X, \operatorname{grad} f(p)⟩ + \frac{1}{2} ⟨X, \operatorname{Hess} f(p)[X]⟩ + \frac{σ}{3} \lVert X \rVert^3\]

Fields

Constructors

AdaptiveRegularizationCubicCost(mho, σ, X)
AdaptiveRegularizationCubicCost(M, mho, σ; p=rand(M), X=get_gradient(M, mho, p))

Initialize the cubic cost to the objective mho, regularization parameter σ, and (temporary) gradient X.

Note

For this gradient function to work, we require the TangentSpaceAtPoint from Manifolds.jl.

Manopt.AdaptiveRegularizationCubicGrad (Type)
AdaptiveRegularizationCubicGrad

We define the model $m(X)$ in the tangent space of the current iterate $p=p_k$ as

\[ m(X) = f(p) + ⟨X, \operatorname{grad} f(p)⟩ + \frac{1}{2} ⟨X, \operatorname{Hess} f(p)[X]⟩ + \frac{σ}{3} \lVert X \rVert^3\]

This struct represents its gradient, given by

\[ \operatorname{grad} m(X) = \operatorname{grad}f(p) + \operatorname{Hess} f(p)[X] + σ \lVert X \rVert X\]

Fields

Constructors

AdaptiveRegularizationCubicGrad(mho, σ, X)
AdaptiveRegularizationCubicGrad(M, mho, σ; p=rand(M), X=get_gradient(M, mho, p))

Initialize the gradient of the cubic model with respect to the original objective mho, regularization parameter σ, and (temporary) gradient X.

Note

For this gradient function to work, we require the TangentSpaceAtPoint from Manifolds.jl.

  • The gradient functor provides both an allocating as well as an in-place variant.

Since the sub problem is given on the tangent space, you have to provide

g = AdaptiveRegularizationCubicCost(M, mho, σ)
grad_g = AdaptiveRegularizationCubicGrad(M, mho, σ)
sub_problem = DefaultManoptProblem(TangentSpaceAtPoint(M, p), ManifoldGradientObjective(g, grad_g))

where mho is the Hessian objective of f. Then use this for the sub_problem keyword and your favourite gradient-based solver for the sub_state keyword, for example a ConjugateGradientDescentState; see also the sketch below.
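
A hedged sketch of the full setup, reusing the illustrative objective from above; the concrete subsolver state (here a plain gradient descent state instead of a conjugate gradient one) and its constructor arguments are assumptions for this example and may differ between Manopt.jl versions:

mho = ManifoldHessianObjective(f, grad_f, Hess_f)
p0 = rand(M)
σ0 = 10.0
g = AdaptiveRegularizationCubicCost(M, mho, σ0)
grad_g = AdaptiveRegularizationCubicGrad(M, mho, σ0)
TpM = TangentSpaceAtPoint(M, p0)                    # the model lives on the tangent space at p0
q = adaptive_regularization_with_cubics(M, mho, p0;
    σ=σ0,
    sub_problem=DefaultManoptProblem(TpM, ManifoldGradientObjective(g, grad_g)),
    sub_state=GradientDescentState(TpM, zero_vector(M, p0)),
)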

Additional Stopping Criteria

Manopt.StopWhenAllLanczosVectorsUsed (Type)
StopWhenAllLanczosVectorsUsed <: StoppingCriterion

When an inner iteration has used up all Lanczos vectors, this stopping criterion serves as a fallback / security stopping criterion in order to not access a non-existing field in the array allocated for the vectors.

Note that this stopping criterion is (for now) only implemented for an AdaptiveRegularizationState using a LanczosState subsolver.

Fields

  • maxLanczosVectors – maximal number of Lanczos vectors
  • reason – a String indicating the reason if the criterion indicated to stop

Constructor

StopWhenAllLanczosVectorsUsed(maxLanczosVectors::Int)
Manopt.StopWhenFirstOrderProgress (Type)
StopWhenFirstOrderProgress <: StoppingCriterion

A stopping criterion related to the Riemannian adaptive regularization with cubics (ARC) solver indicating that the model function at the current (outer) iterate, i.e.

\[ m(X) = f(p) + ⟨X, \operatorname{grad} f(p)⟩ + \frac{1}{2} ⟨X, \operatorname{Hess} f(p)[X]⟩ + \frac{σ}{3} \lVert X \rVert^3,\]

defined on the tangent space $T_{p}\mathcal M$ fulfills at the current iterate $X_k$ that

\[m(X_k) \leq m(0) \quad\text{ and }\quad \lVert \operatorname{grad} m(X_k) \rVert ≤ θ \lVert X_k \rVert^2\]

Fields

  • θ – the factor $θ$ in the second condition above
  • reason – a String indicating the reason if the criterion indicated to stop
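
These criteria can be combined with the other Manopt.jl stopping criteria using |; for example, the documented default stopping criterion of the solver reads

stopping_criterion = StopAfterIteration(40) |
    StopWhenGradientNormLess(1e-9) |
    StopWhenAllLanczosVectorsUsed(200)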

Literature

[ABBC20]
N. Agarwal, N. Boumal, B. Bullins and C. Cartis. Adaptive regularization with cubics on manifolds. Mathematical Programming (2020).