Adaptive regularization with cubics
Manopt.adaptive_regularization_with_cubics — Function

adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, f, grad_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, mho, p=rand(M); kwargs...)

Solve an optimization problem on the manifold M by iteratively minimizing
\[ m_k(X) = f(p_k) + ⟨X, \operatorname{grad} f(p_k)⟩ + \frac{1}{2}⟨X, \operatorname{Hess} f(p_k)[X]⟩ + \frac{σ_k}{3}\lVert X \rVert^3\]
on the tangent space at the current iterate $p_k$, where $X ∈ T_{p_k}\mathcal M$ and $σ_k > 0$ is a regularization parameter.
Let $X_k$ denote the minimizer of the model $m_k$ and use the model improvement
\[ ρ_k = \frac{f(p_k) - f(\operatorname{retr}_{p_k}(X_k))}{m_k(0) - m_k(X_k) + \frac{σ_k}{3}\lVert X_k\rVert^3}.\]
With two thresholds $η_2 ≥ η_1 > 0$ set $p_{k+1} = \operatorname{retr}_{p_k}(X_k)$ if $ρ ≥ η_1$ and reject the candidate otherwise, that is, set $p_{k+1} = p_k$.
Further update the regularization parameter using factors $0 < γ_1 < 1 < γ_2$
\[σ_{k+1} = \begin{cases} \max\{σ_{\min}, γ_1σ_k\} & \text{ if } ρ \geq η_2 &\text{ (the model was very successful)},\\ σ_k & \text{ if } ρ ∈ [η_1, η_2)&\text{ (the model was successful)},\\ γ_2σ_k & \text{ if } ρ < η_1&\text{ (the model was unsuccessful)}. \end{cases}\]
For more details see [ABBC20].
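As an illustration, here is a minimal sketch (assuming Manifolds.jl is available) that minimizes the Rayleigh quotient on a sphere, whose minimizer is an eigenvector to the smallest eigenvalue of the matrix A; the cost, gradient, and Hessian below are the standard textbook formulas for this problem, not part of the package itself.

```julia
using Manopt, Manifolds, LinearAlgebra, Random

Random.seed!(42)
n = 8
A = Symmetric(randn(n, n))
M = Sphere(n - 1)

f(M, p) = p' * A * p                                   # Rayleigh quotient
grad_f(M, p) = 2 * (A * p - (p' * A * p) * p)          # Riemannian gradient on the sphere
Hess_f(M, p, X) = 2 * (A * X - (p' * A * X) * p - (p' * A * p) * X)  # Riemannian Hessian

p0 = rand(M)
q = adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p0)
```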
Input
- M: a manifold $\mathcal M$
- f: a cost function $F: \mathcal M → ℝ$ to minimize
- grad_f: the gradient $\operatorname{grad}F: \mathcal M → T \mathcal M$ of $F$
- Hess_f: (optional) the Hessian $H( \mathcal M, x, ξ)$ of $F$
- p: an initial value $p ∈ \mathcal M$
For the case that no Hessian is provided, the Hessian is approximated by finite differences, see ApproxHessianFiniteDifference.
The cost f and its gradient and Hessian might also be provided as a ManifoldHessianObjective.
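For illustration, a short sketch of the objective-based call, reusing f, grad_f, Hess_f, M, and p0 from the example above:

```julia
# Bundle cost, gradient, and Hessian into one objective and use the mho-based signature.
mho = ManifoldHessianObjective(f, grad_f, Hess_f)
q = adaptive_regularization_with_cubics(M, mho, p0)
```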
Keyword arguments
the default values are given in brackets
- σ: (100.0 / sqrt(manifold_dimension(M))) initial regularization parameter
- σmin: (1e-10) minimal regularization value $σ_{\min}$
- η1: (0.1) lower model success threshold
- η2: (0.9) upper model success threshold
- γ1: (0.1) regularization reduction factor (for the success case)
- γ2: (2.0) regularization increment factor (for the non-success case)
- evaluation: (AllocatingEvaluation()) specify whether the gradient works by allocation (default), of the form grad_f(M, p), or InplaceEvaluation in place, that is of the form grad_f!(M, X, p), and analogously for the Hessian.
- retraction_method: (default_retraction_method(M, typeof(p))) a retraction to use
- initial_tangent_vector: (zero_vector(M, p)) initialize any tangent vector data
- maxIterLanczos: (200) a shortcut to set the stopping criterion in the sub solver
- ρ_regularization: (1e3) a regularization to avoid dividing by zero for small values of cost and model
- stopping_criterion: (StopAfterIteration(40) | StopWhenGradientNormLess(1e-9) | StopWhenAllLanczosVectorsUsed(maxIterLanczos))
- sub_state: (LanczosState(M, copy(M, p); maxIterLanczos=maxIterLanczos, σ=σ)) a state for the subproblem or an AbstractEvaluationType if the problem is a function
- sub_objective: a shortcut to modify the objective of the subproblem used within the sub_problem
- sub_problem: (DefaultManoptProblem(M, sub_objective)) the problem (or a function) for the sub problem
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively. If you provide the ManifoldHessianObjective directly, these decorations can still be specified.
By default the debug= keyword is set to DebugIfEntry(:ρ_denominator, >(0); message="Denominator nonpositive", type=:error) to error if, due to rounding errors, the denominator in the computation of ρ becomes nonpositive.
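The following sketch (continuing the example above) shows how a few of these keywords might be combined; the values are illustrative only, not tuned recommendations.

```julia
q = adaptive_regularization_with_cubics(
    M, f, grad_f, Hess_f, p0;
    σ=10.0,                    # initial regularization parameter
    σmin=1e-12,                # lower bound σ_min
    η1=0.1, η2=0.9,            # acceptance thresholds
    γ1=0.1, γ2=2.0,            # shrink / expansion factors for σ
    stopping_criterion=StopAfterIteration(100) | StopWhenGradientNormLess(1e-8),
)
```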
Manopt.adaptive_regularization_with_cubics! — Function

adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, f, grad_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, mho, p; kwargs...)

Evaluate the Riemannian adaptive regularization with cubics solver in place of p.
Input
- M: a manifold $\mathcal M$
- f: a cost function $F: \mathcal M → ℝ$ to minimize
- grad_f: the gradient $\operatorname{grad}F: \mathcal M → T \mathcal M$ of $F$
- Hess_f: (optional) the Hessian $H( \mathcal M, x, ξ)$ of $F$
- p: an initial value $p ∈ \mathcal M$
For the case that no Hessian is provided, the Hessian is approximated by finite differences, see ApproxHessianFiniteDifference.
The cost f and its gradient and Hessian might also be provided as a ManifoldHessianObjective.
For more details and all options, see adaptive_regularization_with_cubics.
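A short sketch of the in-place call, again reusing the setup from the examples above; the start point is overwritten with the result.

```julia
p = copy(M, p0)                  # work on a copy so p0 stays unchanged
adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p)
# p now holds the computed candidate minimizer
```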
State
Manopt.AdaptiveRegularizationState — Type

AdaptiveRegularizationState{P,T} <: AbstractHessianSolverState

A state for the adaptive_regularization_with_cubics solver.
Fields
a default value is given in brackets if a parameter can be left out in initialization.
- η1, η2: (0.1, 0.9) bounds for evaluating the regularization parameter
- γ1, γ2: (0.1, 2.0) shrinking and expansion factors for the regularization parameter σ
- p: (rand(M)) the current iterate
- X: (zero_vector(M,p)) the current gradient $\operatorname{grad}f(p)$
- s: (zero_vector(M,p)) the tangent vector step resulting from minimizing the model problem in the tangent space $\mathcal T_{p} \mathcal M$
- σ: the current cubic regularization parameter
- σmin: (1e-7) lower bound for the cubic regularization parameter
- ρ_regularization: (1e3) regularization parameter for computing ρ.
When approaching convergence, ρ may be difficult to compute since both numerator and denominator approach zero. Regularizing the ratio lets ρ go to 1 near convergence.
- evaluation: (AllocatingEvaluation())
- retraction_method: (default_retraction_method(M)) the retraction to use
- stopping_criterion: (StopAfterIteration(100)) a StoppingCriterion
- sub_problem: sub problem solved in each iteration
- sub_state: an AbstractManoptSolverState for the subsolver
Furthermore the following internal fields are defined
- q: (copy(M,p)) a point for the candidates to evaluate model and ρ
- H: (copy(M, p, X)) the current Hessian, $\operatorname{Hess}F(p)[⋅]$
- S: (copy(M, p, X)) the current solution from the subsolver
- ρ: the current regularized ratio of actual improvement and model improvement.
- ρ_denominator: (one(ρ)) a value to store the denominator from the computation of ρ to allow for a warning or error when this value is non-positive.
Constructor
AdaptiveRegularizationState(M, p=rand(M); X=zero_vector(M, p), kwargs...)

Construct the solver state with all fields stated as keyword arguments.
Sub solvers
There are several ways to approach the subsolver. The default is the first one.
Lanczos iteration
Manopt.LanczosState — Type

LanczosState{P,T,SC,B,I,R,TM,V,Y} <: AbstractManoptSolverState

Solve the adaptive regularized subproblem with a Lanczos iteration.
Fields
- stop: the stopping criterion
- σ: the current regularization parameter
- X: the iterate
- Lanczos_vectors: the obtained Lanczos vectors
- tridig_matrix: the tridiagonal coefficient matrix T
- coefficients: the coefficients $y_1,…,y_k$ that determine the solution
- Hp: a temporary vector containing the evaluation of the Hessian
- Hp_residual: a temporary vector containing the residual to the Hessian
- S: the current obtained / approximated solution
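Since the Lanczos-based state is the default sub solver, it is usually configured indirectly through the solver keywords listed above, for example via the maxIterLanczos shortcut; a sketch, reusing the setup from the examples above:

```julia
# Allow up to 500 Lanczos vectors in the default sub solver.
q = adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p0; maxIterLanczos=500)
```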
(Conjugate) gradient descent
There is a generic objective that implements the sub problem
Manopt.AdaptiveRagularizationWithCubicsModelObjective — Type

AdaptiveRagularizationWithCubicsModelObjective

A model for the adaptive regularization with Cubics
\[m(X) = f(p) + ⟨\operatorname{grad} f(p), X ⟩_p + \frac{1}{2} ⟨\operatorname{Hess} f(p)[X], X⟩_p + \frac{σ}{3} \lVert X \rVert^3,\]
cf. Eq. (33) in [ABBC20]
Fields
- objective: an AbstractManifoldHessianObjective providing $f$, its gradient and Hessian
- σ: the current (cubic) regularization parameter
Constructors
AdaptiveRagularizationWithCubicsModelObjective(mho, σ=1.0)

with either an AbstractManifoldHessianObjective objective or a decorator containing such an objective.
Since the sub problem is given on the tangent space, you have to provide
arc_obj = AdaptiveRagularizationWithCubicsModelObjective(mho, σ)
sub_problem = DefaultManoptProblem(TangentSpace(M, p), arc_obj)

where mho is the Hessian objective of f to solve. Then use this for the sub_problem keyword and use your favourite gradient based solver for the sub_state keyword, for example a ConjugateGradientDescentState.
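A hedged sketch of this setup, reusing the cost, gradient, and Hessian from the examples above; the exact constructor keywords of ConjugateGradientDescentState may differ slightly between Manopt versions, so treat the sub_state construction as an assumption rather than the canonical API.

```julia
mho = ManifoldHessianObjective(f, grad_f, Hess_f)
σ0 = 100.0 / sqrt(manifold_dimension(M))
arc_obj = AdaptiveRagularizationWithCubicsModelObjective(mho, σ0)

TpM = TangentSpace(M, p0)                      # the sub problem lives on the tangent space
sub_problem = DefaultManoptProblem(TpM, arc_obj)
sub_state = ConjugateGradientDescentState(     # constructor keywords assumed, see note above
    TpM, zero_vector(M, p0);
    stopping_criterion=StopAfterIteration(50) | StopWhenGradientNormLess(1e-9),
    stepsize=ArmijoLinesearch(TpM),
)

q = adaptive_regularization_with_cubics(
    M, f, grad_f, Hess_f, p0;
    sub_problem=sub_problem, sub_state=sub_state,
)
```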
Additional stopping criteria
Manopt.StopWhenAllLanczosVectorsUsed — Type

StopWhenAllLanczosVectorsUsed <: StoppingCriterion

When an inner iteration has used up all Lanczos vectors, then this stopping criterion is a fallback / security stopping criterion to not access a non-existing field in the array allocated for vectors.
Note that this stopping criterion (for now) is only implemented for the case of an AdaptiveRegularizationState using a LanczosState subsolver.
Fields
- maxLanczosVectors: maximal number of Lanczos vectors
- at_iteration: indicates at which iteration (including i=0) the stopping criterion was fulfilled and is -1 while it is not fulfilled.
Constructor
StopWhenAllLanczosVectorsUsed(maxLanczosVectors::Int)

Manopt.StopWhenFirstOrderProgress — Type

StopWhenFirstOrderProgress <: StoppingCriterion

A stopping criterion related to the Riemannian adaptive regularization with cubics (ARC) solver indicating that the model function at the current (outer) iterate,
\[ m(X) = f(p) + ⟨X, \operatorname{grad}f(p)⟩ + \frac{1}{2} ⟨X, \operatorname{Hess} f(p)[X]⟩ + \frac{σ}{3} \lVert X \rVert^3,\]
defined on the tangent space $T_{p}\mathcal M$ fulfills at the current iterate $X_k$ that
\[m(X_k) \leq m(0) \quad\text{ and }\quad \lVert \operatorname{grad} m(X_k) \rVert ≤ θ \lVert X_k \rVert^2\]
Fields
- θ: the factor $θ$ in the second condition
- at_iteration: indicates at which iteration (including i=0) the stopping criterion was fulfilled and is -1 while it is not fulfilled.
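A small sketch, under the assumption that the constructor simply takes the factor θ; such a criterion is typically combined with an iteration bound when configuring the sub solver.

```julia
# Stop once first-order progress with factor θ = 0.5 is reached,
# or after at most 100 iterations.
sc = StopAfterIteration(100) | StopWhenFirstOrderProgress(0.5)
```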
Technical details
The adaptive_regularization_with_cubics solver requires the following functions of a manifold to be available
- A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
- If you do not provide an initial regularization parameter σ, a manifold_dimension is required.
- By default the tangent vector storing the gradient is initialized calling zero_vector(M, p).
- inner(M, p, X, Y) is used within the algorithm step.

Furthermore, within the Lanczos subsolver, generating a random vector (at p) using rand!(M, X; vector_at=p) in place of X is required.
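If a manifold does not define a default retraction, one can be passed explicitly; a sketch reusing the sphere example from above, where the projection retraction is one available choice:

```julia
q = adaptive_regularization_with_cubics(
    M, f, grad_f, Hess_f, p0;
    retraction_method=ProjectionRetraction(),
)
```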
Literature
- [ABBC20] N. Agarwal, N. Boumal, B. Bullins and C. Cartis. Adaptive regularization with cubics on manifolds. Mathematical Programming (2020).