Adaptive regularization with cubics

Manopt.adaptive_regularization_with_cubics — Function

adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, f, grad_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, mho, p=rand(M); kwargs...)
Solve an optimization problem on the manifold M
by iteratively minimizing
\[m_k(X) = f(p_k) + ⟨X, \operatorname{grad} f(p_k)⟩ + \frac{1}{2}⟨X, \operatorname{Hess} f(p_k)[X]⟩ + \frac{σ_k}{3}\lVert X \rVert^3\]
on the tangent space at the current iterate $p_k$, i.e. $X ∈ T_{p_k}\mathcal M$ and where $σ_k > 0$ is a regularization parameter.
Let $X_k$ denote the minimizer of the model $m_k$. We then use the model improvement
\[ρ_k = \frac{f(p_k) - f(\operatorname{retr}_{p_k}(X_k))}{m_k(0) - m_k(X_k) + \frac{σ_k}{3}\lVert X_k\rVert^3}.\]
We use two thresholds $η_2 ≥ η_1 > 0$ and set $p_{k+1} = \operatorname{retr}_{p_k}(X_k)$ if $ρ_k ≥ η_1$ and reject the candidate otherwise, i.e. set $p_{k+1} = p_k$.
We further update the regularization parameter using factors $0 < γ_1 < 1 < γ_2$
\[σ_{k+1} = \begin{cases} \max\{σ_{\min}, γ_1σ_k\} & \text{ if } ρ_k \geq η_2 &\text{ (the model was very successful)},\\ σ_k & \text{ if } ρ_k \in [η_1, η_2)&\text{ (the model was successful)},\\ γ_2σ_k & \text{ if } ρ_k < η_1&\text{ (the model was unsuccessful)}. \end{cases}\]
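The acceptance and update rule above can be sketched in plain Julia; `update_sigma` is an illustrative helper, not part of the Manopt.jl API, with defaults mirroring the solver's keywords:

```julia
# Sketch of the σ update rule above; update_sigma is an illustrative
# helper, not a Manopt.jl function.
function update_sigma(σ, ρ; η1=0.1, η2=0.9, γ1=0.1, γ2=2.0, σmin=1e-10)
    ρ >= η2 && return max(σmin, γ1 * σ)  # very successful: shrink σ
    ρ >= η1 && return σ                  # successful: keep σ
    return γ2 * σ                        # unsuccessful: increase σ
end

update_sigma(1.0, 0.95)  # very successful step: returns 0.1
update_sigma(1.0, 0.5)   # successful step: returns 1.0
update_sigma(1.0, 0.05)  # unsuccessful step: returns 2.0
```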
For more details see Agarwal, Boumal, Bullins, Cartis, Math. Prog., 2020.
Input

- M – a manifold $\mathcal M$
- f – a cost function $f: \mathcal M → ℝ$ to minimize
- grad_f – the gradient $\operatorname{grad} f: \mathcal M → T\mathcal M$ of $f$
- Hess_f – (optional) the Hessian $\operatorname{Hess} f(p)[X]$ of $f$, given as a function Hess_f(M, p, X)
- p – an initial value $p ∈ \mathcal M$

If no Hessian is provided, it is approximated using finite differences, see ApproxHessianFiniteDifference.

The cost f and its gradient and Hessian might also be provided as a ManifoldHessianObjective.
Keyword arguments

The default values are given in brackets.

- σ – (100.0 / sqrt(manifold_dimension(M))) initial regularization parameter
- σmin – (1e-10) minimal regularization value $σ_{\min}$
- η1 – (0.1) lower model success threshold
- η2 – (0.9) upper model success threshold
- γ1 – (0.1) regularization reduction factor (for the success case)
- γ2 – (2.0) regularization increment factor (for the non-success case)
- evaluation – (AllocatingEvaluation()) specify whether the gradient works by allocation (default), i.e. is of the form grad_f(M, p), or InplaceEvaluation in place, i.e. is of the form grad_f!(M, X, p), and analogously for the Hessian
- retraction_method – (default_retraction_method(M, typeof(p))) a retraction to use
- initial_tangent_vector – (zero_vector(M, p)) initialize any tangent vector data
- maxIterLanczos – (200) a shortcut to set the stopping criterion in the sub solver
- ρ_regularization – (1e3) a regularization to avoid dividing by zero for small values of cost and model
- stopping_criterion – (StopAfterIteration(40) | StopWhenGradientNormLess(1e-9) | StopWhenAllLanczosVectorsUsed(maxIterLanczos)) the stopping criterion to use
- sub_state – (LanczosState(M, copy(M, p); maxIterLanczos=maxIterLanczos, σ=σ)) a state for the subproblem, or an AbstractEvaluationType if the problem is a function
- sub_objective – a shortcut to modify the objective of the subproblem used within the sub_problem
- sub_problem – (DefaultManoptProblem(M, sub_objective)) the problem (or a function) for the sub problem
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for the objective, respectively. If you provide the ManifoldGradientObjective directly, these decorations can still be specified.

By default the debug= keyword is set to DebugIfEntry(:ρ_denominator, >(0); message="Denominator nonpositive", type=:error) to avoid that, due to rounding errors, the denominator in the computation of ρ gets nonpositive.
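A minimal call might look as follows. This sketch assumes Manopt.jl and Manifolds.jl are loaded and minimizes the Rayleigh quotient on the sphere, whose minimizer is an eigenvector for the smallest eigenvalue of A; keyword values shown are illustrative:

```julia
using Manopt, Manifolds, LinearAlgebra

# Rayleigh quotient on the sphere: minimized by an eigenvector of the
# smallest eigenvalue of A.
n = 16
A = Symmetric(randn(n, n))
M = Sphere(n - 1)
f(M, p) = p' * A * p
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)   # Riemannian gradient
Hess_f(M, p, X) = 2 .* (A * X .- (p' * A * p) .* X .- (p' * A * X) .* p)

q = adaptive_regularization_with_cubics(
    M, f, grad_f, Hess_f, rand(M);
    σ=10.0,  # illustrative initial regularization
    stopping_criterion=StopAfterIteration(100) | StopWhenGradientNormLess(1e-8),
)
```

Omitting Hess_f falls back to the finite-difference approximation mentioned above.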
Manopt.adaptive_regularization_with_cubics! — Function

adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, f, grad_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, mho, p; kwargs...)

evaluate the Riemannian adaptive regularization with cubics solver in place of p.

Input

- M – a manifold $\mathcal M$
- f – a cost function $f: \mathcal M → ℝ$ to minimize
- grad_f – the gradient $\operatorname{grad} f: \mathcal M → T\mathcal M$ of $f$
- Hess_f – (optional) the Hessian $\operatorname{Hess} f(p)[X]$ of $f$, given as a function Hess_f(M, p, X)
- p – an initial value $p ∈ \mathcal M$

If no Hessian is provided, it is approximated using finite differences, see ApproxHessianFiniteDifference.

The cost f and its gradient and Hessian might also be provided as a ManifoldHessianObjective.

For more details and all options, see adaptive_regularization_with_cubics.
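A sketch of the in-place variant, which overwrites the start point with the result; it assumes Manopt.jl and Manifolds.jl are loaded and reuses the Rayleigh quotient setup:

```julia
using Manopt, Manifolds, LinearAlgebra

n = 8
A = Symmetric(randn(n, n))
M = Sphere(n - 1)
f(M, p) = p' * A * p
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)
Hess_f(M, p, X) = 2 .* (A * X .- (p' * A * p) .* X .- (p' * A * X) .* p)

p = rand(M)
# the in-place variant mutates p to hold the computed minimizer
adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p)
```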
State
Manopt.AdaptiveRegularizationState — Type

AdaptiveRegularizationState{P,T} <: AbstractHessianSolverState

A state for the adaptive_regularization_with_cubics solver.
Fields

A default value is given in brackets if a parameter can be left out in initialization.

- η1, η2 – (0.1, 0.9) bounds for evaluating the regularization parameter
- γ1, γ2 – (0.1, 2.0) shrinking and expansion factors for the regularization parameter σ
- p – (rand(M)) the current iterate
- X – (zero_vector(M, p)) the current gradient $\operatorname{grad} f(p)$
- s – (zero_vector(M, p)) the tangent vector step resulting from minimizing the model problem in the tangent space $T_{p}\mathcal M$
- σ – the current cubic regularization parameter
- σmin – (1e-7) lower bound for the cubic regularization parameter
- ρ_regularization – (1e3) regularization parameter for computing ρ. As we approach convergence, ρ may be difficult to compute with both numerator and denominator approaching zero. Regularizing the ratio lets ρ go to 1 near convergence.
- evaluation – (AllocatingEvaluation()) specify whether the gradient and Hessian work by allocation or in place
- retraction_method – (default_retraction_method(M)) the retraction to use
- stopping_criterion – (StopAfterIteration(100)) a StoppingCriterion
- sub_problem – sub problem solved in each iteration
- sub_state – sub state for solving the sub problem, either a solver state if the problem is an AbstractManoptProblem or an AbstractEvaluationType if it is a function, where it defaults to AllocatingEvaluation

Furthermore the following internal fields are defined:

- q – (copy(M, p)) a point for the candidates to evaluate model and ρ
- H – (copy(M, p, X)) the current Hessian evaluation, $\operatorname{Hess} f(p)[⋅]$
- S – (copy(M, p, X)) the current solution from the subsolver
- ρ – the current regularized ratio of actual improvement and model improvement
- ρ_denominator – (one(ρ)) a value to store the denominator from the computation of ρ, to allow for a warning or error when this value is non-positive
Constructor
AdaptiveRegularizationState(M, p=rand(M); X=zero_vector(M, p), kwargs...)
Construct the solver state with all fields stated above as keyword arguments.
Sub solvers
There are several ways to approach the subsolver. The default is the first one.
Lanczos iteration
Manopt.LanczosState — Type

LanczosState{P,T,SC,B,I,R,TM,V,Y} <: AbstractManoptSolverState

Solve the adaptive regularized subproblem with a Lanczos iteration

Fields

- stop – the stopping criterion
- σ – the current regularization parameter
- X – the iterate
- Lanczos_vectors – the obtained Lanczos vectors
- tridig_matrix – the tridiagonal coefficient matrix T
- coefficients – the coefficients y_1,...,y_k that determine the solution
- Hp – a temporary vector containing the evaluation of the Hessian
- Hp_residual – a temporary vector containing the residual to the Hessian
- S – the current obtained / approximated solution
(Conjugate) gradient descent
There is a generic objective that implements the sub problem

Manopt.AdaptiveRagularizationWithCubicsModelObjective — Type

AdaptiveRagularizationWithCubicsModelObjective

A model for the adaptive regularization with cubics

\[m(X) = f(p) + ⟨\operatorname{grad} f(p), X ⟩_p + \frac{1}{2} ⟨\operatorname{Hess} f(p)[X], X⟩_p + \frac{σ}{3} \lVert X \rVert^3,\]

cf. Eq. (33) in [ABBC20].
Fields

- objective – an AbstractManifoldHessianObjective providing $f$, its gradient and Hessian
- σ – the current (cubic) regularization parameter

Constructors

AdaptiveRagularizationWithCubicsModelObjective(mho, σ=1.0)

with either an AbstractManifoldHessianObjective objective or a decorator containing such an objective.
Since the sub problem is given on the tangent space, you have to provide

arc_obj = AdaptiveRagularizationWithCubicsModelObjective(mho, σ)
sub_problem = DefaultManoptProblem(TangentSpace(M, copy(M, p)), arc_obj)

where mho is the Hessian objective of f to solve. Then use this for the sub_problem keyword and use your favourite gradient based solver for the sub_state keyword, for example a ConjugateGradientDescentState.
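The construction above can be sketched end to end as follows. This is a hedged outline: it assumes Manopt.jl and Manifolds.jl are loaded, and constructor signatures (in particular TangentSpace and the sub solver state) may differ between versions:

```julia
using Manopt, Manifolds, LinearAlgebra

# Rayleigh quotient setup as a Hessian objective
n = 8
A = Symmetric(randn(n, n))
M = Sphere(n - 1)
f(M, p) = p' * A * p
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)
Hess_f(M, p, X) = 2 .* (A * X .- (p' * A * p) .* X .- (p' * A * X) .* p)

p = rand(M)
mho = ManifoldHessianObjective(f, grad_f, Hess_f)

# the cubic model objective on the tangent space at p
arc_obj = AdaptiveRagularizationWithCubicsModelObjective(mho, 1.0)
sub_problem = DefaultManoptProblem(TangentSpace(M, copy(M, p)), arc_obj)

# pass both to the solver together with a gradient-based sub state, e.g.
#   adaptive_regularization_with_cubics(M, mho, p;
#       sub_problem=sub_problem,
#       sub_state=<your ConjugateGradientDescentState on the tangent space>)
```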
Additional stopping criteria
Manopt.StopWhenAllLanczosVectorsUsed — Type

StopWhenAllLanczosVectorsUsed <: StoppingCriterion

When an inner iteration has used up all Lanczos vectors, this stopping criterion serves as a fallback / security stopping criterion, in order not to access a non-existing field in the array allocated for vectors.

Note that this stopping criterion is (for now) only implemented for the case that an AdaptiveRegularizationState uses a LanczosState subsolver.

Fields

- maxLanczosVectors – maximal number of Lanczos vectors
- reason – a String indicating the reason if the criterion indicated to stop

Constructor

StopWhenAllLanczosVectorsUsed(maxLanczosVectors::Int)
Manopt.StopWhenFirstOrderProgress — Type

StopWhenFirstOrderProgress <: StoppingCriterion

A stopping criterion related to the Riemannian adaptive regularization with cubics (ARC) solver, indicating that the model function at the current (outer) iterate, i.e.

\[m(X) = f(p) + ⟨X, \operatorname{grad} f(p)⟩ + \frac{1}{2}⟨X, \operatorname{Hess} f(p)[X]⟩ + \frac{σ}{3}\lVert X \rVert^3,\]

defined on the tangent space $T_{p}\mathcal M$, fulfills at the current iterate $X_k$ that

\[m(X_k) \leq m(0) \quad\text{ and }\quad \lVert \operatorname{grad} m(X_k) \rVert ≤ θ \lVert X_k \rVert^2\]

Fields

- θ – the factor $θ$ in the second condition above
- reason – a String indicating the reason if the criterion indicated to stop
Technical details

The adaptive_regularization_with_cubics solver requires the following functions of a manifold to be available:

- A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
- If you do not provide an initial regularization parameter σ, a manifold_dimension is required.
- By default the tangent vector storing the gradient is initialized calling zero_vector(M, p).
- inner(M, p, X, Y) is used within the algorithm step.

Furthermore, within the Lanczos subsolver, generating a random tangent vector (at p) using rand!(M, X; vector_at=p) in place of X is required.
Literature
- [ABBC20] N. Agarwal, N. Boumal, B. Bullins and C. Cartis. Adaptive regularization with cubics on manifolds. Mathematical Programming (2020).