Adaptive regularization with Cubics
Manopt.adaptive_regularization_with_cubics — Function

adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, f, grad_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, mho, p=rand(M); kwargs...)
Solve an optimization problem on the manifold M
by iteratively minimizing
\[m_k(X) = f(p_k) + ⟨X, \operatorname{grad} f(p_k)⟩ + \frac{1}{2}⟨X, \operatorname{Hess} f(p_k)[X]⟩ + \frac{σ_k}{3}\lVert X \rVert^3\]
on the tangent space at the current iterate $p_k$, i.e. $X ∈ T_{p_k}\mathcal M$ and where $σ_k > 0$ is a regularization parameter.
Let $X_k$ denote the minimizer of the model $m_k$, then we use the model improvement
\[ρ_k = \frac{f(p_k) - f(\operatorname{retr}_{p_k}(X_k))}{m_k(0) - m_k(X_k) + \frac{σ_k}{3}\lVert X_k\rVert^3}.\]
We use two thresholds $η_2 ≥ η_1 > 0$ and set $p_{k+1} = \operatorname{retr}_{p_k}(X_k)$ if $ρ_k ≥ η_1$ and reject the candidate otherwise, i.e. set $p_{k+1} = p_k$.
We further update the regularization parameter using factors $0 < γ_1 < 1 < γ_2$
\[σ_{k+1} = \begin{cases} \max\{σ_{\min}, γ_1σ_k\} & \text{ if } ρ_k \geq η_2 &\text{ (the model was very successful)},\\ σ_k & \text{ if } ρ_k \in [η_1, η_2)&\text{ (the model was successful)},\\ γ_2σ_k & \text{ if } ρ_k < η_1&\text{ (the model was unsuccessful)}. \end{cases}\]
For more details see Agarwal, Boumal, Bullins, Cartis, Math. Prog., 2020.
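In code, one step of this acceptance and update logic looks as follows; this is a minimal sketch of the rules stated above (not the implementation in Manopt.jl), assuming scalar values ρ, σ, σ_min, the thresholds η1, η2, the factors γ1, γ2, the current iterate p, and the retracted candidate q.

# minimal sketch of the acceptance test and regularization update described above
p = ρ >= η1 ? q : p            # accept the candidate only if the model was successful
if ρ >= η2                     # very successful: shrink the regularization
    σ = max(σ_min, γ1 * σ)
elseif ρ < η1                  # unsuccessful: increase the regularization
    σ = γ2 * σ
end                            # for η1 ≤ ρ < η2 the parameter σ is kept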
Input

- M – a manifold $\mathcal M$
- f – a cost function $F: \mathcal M → ℝ$ to minimize
- grad_f – the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of $F$
- Hess_f – (optional) the Hessian $H(\mathcal M, x, ξ)$ of $F$
- p – an initial value $p ∈ \mathcal M$

For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference.

The cost f and its gradient and Hessian might also be provided as a ManifoldHessianObjective.
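As an illustration (not taken from the package documentation), the following sketch minimizes the Rayleigh quotient of a symmetric matrix on a sphere; the gradient and Hessian below are the standard closed-form expressions for this cost on the embedded sphere and should be double-checked for your own setting.

using Manopt, Manifolds, LinearAlgebra, Random

Random.seed!(42)
n = 32
A = Symmetric(randn(n, n))
M = Sphere(n - 1)
f(M, p) = p' * A * p                                    # Rayleigh quotient
grad_f(M, p) = 2 * (A * p - (p' * A * p) * p)           # Riemannian gradient on the sphere
Hess_f(M, p, X) = 2 * (A * X - (p' * A * X) * p - (p' * A * p) * X)  # Riemannian Hessian

p = rand(M)
q = adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p)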
Keyword arguments

The default values are given in brackets.

- σ – (100.0 / sqrt(manifold_dimension(M))) initial regularization parameter
- σmin – (1e-10) minimal regularization value $σ_{\min}$
- η1 – (0.1) lower model success threshold
- η2 – (0.9) upper model success threshold
- γ1 – (0.1) regularization reduction factor (for the success case)
- γ2 – (2.0) regularization increment factor (for the non-success case)
- evaluation – (AllocatingEvaluation()) specify whether the gradient works by allocation (default), i.e. in the form grad_f(M, p), or as an InplaceEvaluation in place, i.e. in the form grad_f!(M, X, p), and analogously for the Hessian
- retraction_method – (default_retraction_method(M, typeof(p))) a retraction to use
- initial_tangent_vector – (zero_vector(M, p)) initialize any tangent vector data
- maxIterLanczos – (200) a shortcut to set the stopping criterion in the sub solver
- ρ_regularization – (1e3) a regularization to avoid dividing by zero for small values of cost and model
- stopping_criterion – (StopAfterIteration(40) | StopWhenGradientNormLess(1e-9) | StopWhenAllLanczosVectorsUsed(maxIterLanczos))
- sub_state – (LanczosState(M, copy(M, p); maxIterLanczos=maxIterLanczos, σ=σ)) a state for the subproblem, or an AbstractEvaluationType if the problem is a function
- sub_objective – a shortcut to modify the objective of the subproblem used within the sub_problem
- sub_problem – (DefaultManoptProblem(M, sub_objective)) the problem (or a function) for the sub problem
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively. If you provide the ManifoldGradientObjective directly, these decorations can still be specified.

By default the debug= keyword is set to DebugIfEntry(:ρ_denominator, >(0); message="Denominator nonpositive", type=:error) to avoid that, due to rounding errors, the denominator in the computation of ρ gets nonpositive.
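For example, a few of these keywords can be combined as follows; a hedged sketch, reusing M, f, grad_f, Hess_f, and the start point p from the example further above.

q = adaptive_regularization_with_cubics(
    M, f, grad_f, Hess_f, p;
    σ=10.0,                         # start with a larger initial regularization
    maxIterLanczos=100,             # allow fewer Lanczos vectors in the sub solver
    stopping_criterion=StopAfterIteration(60) | StopWhenGradientNormLess(1e-10),
)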
Manopt.adaptive_regularization_with_cubics! — Function

adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, f, grad_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, mho, p; kwargs...)
evaluate the Riemannian adaptive regularization with cubics solver in place of p.
Input

- M – a manifold $\mathcal M$
- f – a cost function $F: \mathcal M → ℝ$ to minimize
- grad_f – the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of $F$
- Hess_f – (optional) the Hessian $H(\mathcal M, x, ξ)$ of $F$
- p – an initial value $p ∈ \mathcal M$

For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference.

The cost f and its gradient and Hessian might also be provided as a ManifoldHessianObjective.

For more details and all options, see adaptive_regularization_with_cubics.
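A short sketch of the in-place call, with the same assumptions as in the example above; the start point p is modified and also returned.

p = rand(M)
adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p)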
State
Manopt.AdaptiveRegularizationState — Type

AdaptiveRegularizationState{P,T} <: AbstractHessianSolverState
A state for the adaptive_regularization_with_cubics
solver.
Fields

A default value is given in brackets if a parameter can be left out in initialization.

- η1, η2 – (0.1, 0.9) bounds for evaluating the regularization parameter
- γ1, γ2 – (0.1, 2.0) shrinking and expansion factors for the regularization parameter σ
- p – (rand(M)) the current iterate
- X – (zero_vector(M, p)) the current gradient $\operatorname{grad}f(p)$
- s – (zero_vector(M, p)) the tangent vector step resulting from minimizing the model problem in the tangent space $\mathcal T_{p} \mathcal M$
- σ – the current cubic regularization parameter
- σmin – (1e-7) lower bound for the cubic regularization parameter
- ρ_regularization – (1e3) regularization parameter for computing ρ. As we approach convergence, ρ may be difficult to compute with numerator and denominator approaching zero. Regularizing the ratio lets ρ go to 1 near convergence.
- evaluation – (AllocatingEvaluation()) specify whether provided functions work by allocation or in place
- retraction_method – (default_retraction_method(M)) the retraction to use
- stopping_criterion – (StopAfterIteration(100)) a StoppingCriterion
- sub_problem – sub problem solved in each iteration
- sub_state – sub state for solving the sub problem, either a solver state if the problem is an AbstractManoptProblem or an AbstractEvaluationType if it is a function, where it defaults to AllocatingEvaluation.
Furthermore the following internal fields are defined:

- q – (copy(M, p)) a point for the candidates to evaluate model and ρ
- H – (copy(M, p, X)) the current Hessian, $\operatorname{Hess}F(p)[⋅]$
- S – (copy(M, p, X)) the current solution from the subsolver
- ρ – the current regularized ratio of actual improvement and model improvement
- ρ_denominator – (one(ρ)) a value to store the denominator from the computation of ρ, to allow for a warning or error when this value is non-positive
Constructor
AdaptiveRegularizationState(M, p=rand(M); X=zero_vector(M, p), kwargs...)
Construct the solver state with all fields stated above as keyword arguments.
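For example (a sketch, assuming a manifold M and a point p on it), the state can be set up with a custom initial regularization and stopping criterion; the keyword names correspond to the fields listed above.

arcs = AdaptiveRegularizationState(
    M, copy(M, p);
    σ=10.0,
    stopping_criterion=StopAfterIteration(100) | StopWhenGradientNormLess(1e-9),
)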
Sub solvers
There are several ways to approach the subsolver. The default is the first one.
Lanczos Iteration
Manopt.LanczosState — Type

LanczosState{P,T,SC,B,I,R,TM,V,Y} <: AbstractManoptSolverState
Solve the adaptive regularized subproblem with a Lanczos iteration
Fields

- p – the current iterate
- stop – the stopping criterion
- σ – the current regularization parameter
- X – the current gradient
- Lanczos_vectors – the obtained Lanczos vectors
- tridig_matrix – the tridiagonal coefficient matrix T
- coefficients – the coefficients y_1,...,y_k that determine the solution
- Hp – a temporary vector containing the evaluation of the Hessian
- Hp_residual – a temporary vector containing the residual to the Hessian
- S – the current obtained / approximated solution
(Conjugate) Gradient Descent
There are two generic functors that implement the sub problem:
Manopt.AdaptiveRegularizationCubicCost — Type

AdaptiveRegularizationCubicCost
We define the model $m(X)$ in the tangent space of the current iterate $p=p_k$ as
\[ m(X) = f(p) + ⟨X, \operatorname{grad}f(p)⟩ + \frac{1}{2} ⟨X, \operatorname{Hess} f(p)[X]⟩ + \frac{σ}{3} \lVert X \rVert^3\]
Fields

- mho – an AbstractManifoldObjective that should provide at least get_cost, get_gradient and get_hessian
- σ – the current regularization parameter
- X – a storage for the gradient at p of the original cost
Constructors
AdaptiveRegularizationCubicCost(mho, σ, X)
AdaptiveRegularizationCubicCost(M, mho, σ; p=rand(M), X=get_gradient(M, mho, p))
Initialize the cubic cost to the objective mho, regularization parameter σ, and (temporary) gradient X.
For this cost function to work, we require the TangentSpaceAtPoint from Manifolds.jl.
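In effect, evaluating this cost at a tangent vector X amounts to the following one-liner; a minimal sketch, assuming precomputed values fp = f(p), grad_fp = grad f(p), and hess_fpX = Hess f(p)[X] as well as a manifold M, a point p, and a regularization parameter σ.

m_X = fp + inner(M, p, X, grad_fp) + inner(M, p, X, hess_fpX) / 2 + σ * norm(M, p, X)^3 / 3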
Manopt.AdaptiveRegularizationCubicGrad — Type

AdaptiveRegularizationCubicGrad
We define the model $m(X)$ in the tangent space of the current iterate $p=p_k$ as
\[ m(X) = f(p) + ⟨X, \operatorname{grad}f(p)⟩ + \frac{1}{2} ⟨X, \operatorname{Hess} f(p)[X]⟩ + \frac{σ}{3} \lVert X \rVert^3\]
This struct represents its gradient, given by
\[ \operatorname{grad} m(X) = \operatorname{grad}f(p) + \operatorname{Hess} f(p)[X] + σ \lVert X \rVert X\]
Fields

- mho – an AbstractManifoldObjective that should provide at least get_cost, get_gradient and get_hessian
- σ – the current regularization parameter
- X – a storage for the gradient at p of the original cost
Constructors
AdaptiveRegularizationCubicGrad(mho, σ, X)
AdaptiveRegularizationCubicGrad(M, mho, σ; p=rand(M), X=get_gradient(M, mho, p))
Initialize the gradient of the cubic cost to the original objective mho, regularization parameter σ, and (temporary) gradient X.
- For this gradient function to work, we require the TangentSpaceAtPoint from Manifolds.jl.
- The gradient functor provides both an allocating as well as an in-place variant.
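Accordingly, evaluating this gradient at a tangent vector X amounts to the following; the same assumptions as in the cost sketch above apply.

grad_m_X = grad_fp + hess_fpX + σ * norm(M, p, X) * X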
Since the sub problem is given on the tangent space, you have to provide
g = AdaptiveRegularizationCubicCost(M, mho, σ)
grad_g = AdaptiveRegularizationCubicGrad(M, mho, σ)
sub_problem = DefaultManoptProblem(TangentSpaceAtPoint(M, p), ManifoldGradientObjective(g, grad_g))
where mho is the Hessian objective of f to solve. Then use this for the sub_problem keyword and use your favourite gradient-based solver for the sub_state keyword, for example a ConjugateGradientDescentState; a sketch of this wiring follows below.
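A hedged sketch of this wiring (the sub_state construction is omitted): my_sub_state is a placeholder for a gradient-based solver state constructed on TangentSpaceAtPoint(M, p), for example a ConjugateGradientDescentState; consult its docstring for the exact constructor.

q = adaptive_regularization_with_cubics(
    M, f, grad_f, Hess_f, p;
    sub_problem=sub_problem,   # the tangent space problem defined above
    sub_state=my_sub_state,    # placeholder: a gradient-based solver state
)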
Additional Stopping Criteria
Manopt.StopWhenAllLanczosVectorsUsed — Type

StopWhenAllLanczosVectorsUsed <: StoppingCriterion

When an inner iteration has used up all Lanczos vectors, this stopping criterion acts as a fallback / security stopping criterion in order to not access a non-existing field in the array allocated for vectors.

Note that this stopping criterion is (for now) only implemented for the case of an AdaptiveRegularizationState using a LanczosState subsolver.

Fields

- maxLanczosVectors – maximal number of Lanczos vectors
- reason – a String indicating the reason if the criterion indicated to stop
Constructor
StopWhenAllLanczosVectorsUsed(maxLanczosVectors::Int)
Manopt.StopWhenFirstOrderProgress — Type

StopWhenFirstOrderProgress <: StoppingCriterion
A stopping criterion related to the Riemannian adaptive regularization with cubics (ARC) solver indicating that the model function at the current (outer) iterate, i.e.
\[ m(X) = f(p) + ⟨X, \operatorname{grad}f(p)⟩ + \frac{1}{2} ⟨X, \operatorname{Hess} f(p)[X]⟩ + \frac{σ}{3} \lVert X \rVert^3,\]
defined on the tangent space $T_{p}\mathcal M$ fulfills at the current iterate $X_k$ that
\[m(X_k) \leq m(0) \quad\text{ and }\quad \lVert \operatorname{grad} m(X_k) \rVert ≤ θ \lVert X_k \rVert^2\]
Fields

- θ – the factor $θ$ in the second condition above
- reason – a String indicating the reason if the criterion indicated to stop
Literature
- [ABBC20] N. Agarwal, N. Boumal, B. Bullins and C. Cartis. Adaptive regularization with cubics on manifolds. Mathematical Programming (2020).