Adaptive regularization with cubics
Manopt.adaptive_regularization_with_cubics
— Functionadaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, f, grad_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, mho, p=rand(M); kwargs...)
adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, f, grad_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, mho, p; kwargs...)
Solve an optimization problem on the manifold M
by iteratively minimizing
\[m_k(X) = f(p_k) + ⟨X, \operatorname{grad} f(p^{(k)})⟩ + \frac{1}{2}⟨X, \operatorname{Hess} f(p^{(k)})[X]⟩ + \frac{σ_k}{3}\lVert X \rVert^3\]
on the tangent space at the current iterate $p_k$, where $X ∈ T_{p_k}\mathcal M$ and $σ_k > 0$ is a regularization parameter.
Let $Xp^{(k)}$ denote the minimizer of the model $m_k$ and use the model improvement
\[ ρ_k = \frac{f(p_k) - f(\operatorname{retr}_{p_k}(X_k))}{m_k(0) - m_k(X_k) + \frac{σ_k}{3}\lVert X_k\rVert^3}.\]
With two thresholds $η_2 ≥ η_1 > 0$ set $p_{k+1} = \operatorname{retr}_{p_k}(X_k)$ if $ρ ≥ η_1$ and reject the candidate otherwise, that is, set $p_{k+1} = p_k$.
Further update the regularization parameter using factors $0 < γ_1 < 1 < γ_2$ reads
\[σ_{k+1} = \begin{cases} \max\{σ_{\min}, γ_1σ_k\} & \text{ if } ρ \geq η_2 &\text{ (the model was very successful)},\\ σ_k & \text{ if } ρ ∈ [η_1, η_2)&\text{ (the model was successful)},\\ γ_2σ_k & \text{ if } ρ < η_1&\text{ (the model was unsuccessful)}. \end{cases}\]
For more details see [ABBC20].
Input
M::
AbstractManifold
: a Riemannian manifold $\mathcal M$
f
: a cost function $f: \mathcal M→ ℝ$ implemented as(M, p) -> v
grad_f
: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function(M, p) -> X
or a function(M, X, p) -> X
computingX
in-placeHess_f
: the (Riemannian) Hessian $\operatorname{Hess}f: T_{p}\mathcal M → T_{p}\mathcal M$ of f as a function(M, p, X) -> Y
or a function(M, Y, p, X) -> Y
computingY
in-placep
: a point on the manifold $\mathcal M$
the cost f
and its gradient and Hessian might also be provided as a ManifoldHessianObjective
Keyword arguments
σ=100.0 / sqrt(manifold_dimension(M)
: initial regularization parameterσmin=1e-10
: minimal regularization value $σ_{\min}$η1=0.1
: lower model success thresholdη2=0.9
: upper model success thresholdγ1=0.1
: regularization reduction factor (for the success case)γ2=2.0
: regularization increment factor (for the non-success case)evaluation=
AllocatingEvaluation
()
: specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation
) or whether they modify their input argument to return the result therein (InplaceEvaluation
). Since usually the first argument is the manifold, the modified argument is the second.initial_tangent_vector=zero_vector(M, p)
: initialize any tangent vector data,maxIterLanczos=200
: a shortcut to set the stopping criterion in the sub solver,ρ_regularization=1e3
: a regularization to avoid dividing by zero for small values of cost and modelretraction_method=
default_retraction_method
(M, typeof(p))
: a retraction $\operatorname{retr}$ to use, see the section on retractions:stopping_criterion=
StopAfterIteration
(40)
|
StopWhenGradientNormLess
(1e-9)
|
StopWhenAllLanczosVectorsUsed
(maxIterLanczos)
: a functor indicating that the stopping criterion is fulfilledsub_kwargs=
(;)
: a named tuple of keyword arguments that are passed todecorate_objective!
of the sub solvers objective, thedecorate_state!
of the subsovlers state, and the sub state constructor itself.sub_objective=nothing
: a shortcut to modify the objective of the subproblem used within in thesub_problem=
keyword By default, this is initialized as aAdaptiveRagularizationWithCubicsModelObjective
, which can further be decorated by using thesub_kwargs=
keyword.sub_state=
LanczosState
(M, copy(M,p))
: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function.sub_problem=
DefaultManoptProblem
(M, sub_objective)
: specify a problem for a solver or a closed form solution function, which can be allocating or in-place.
All other keyword arguments are passed to decorate_state!
for state decorators or decorate_objective!
for objective decorators, respectively.
If you provide the ManifoldGradientObjective
directly, the evaluation=
keyword is ignored. The decorations are still applied to the objective.
If you activate tutorial mode (cf. is_tutorial_mode
), this solver provides additional debug warnings.
Output
The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return
for details, especially the return_state=
keyword.
Manopt.adaptive_regularization_with_cubics!
— Functionadaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, f, grad_f, p=rand(M); kwargs...)
adaptive_regularization_with_cubics(M, mho, p=rand(M); kwargs...)
adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, f, grad_f, p; kwargs...)
adaptive_regularization_with_cubics!(M, mho, p; kwargs...)
Solve an optimization problem on the manifold M
by iteratively minimizing
\[m_k(X) = f(p_k) + ⟨X, \operatorname{grad} f(p^{(k)})⟩ + \frac{1}{2}⟨X, \operatorname{Hess} f(p^{(k)})[X]⟩ + \frac{σ_k}{3}\lVert X \rVert^3\]
on the tangent space at the current iterate $p_k$, where $X ∈ T_{p_k}\mathcal M$ and $σ_k > 0$ is a regularization parameter.
Let $Xp^{(k)}$ denote the minimizer of the model $m_k$ and use the model improvement
\[ ρ_k = \frac{f(p_k) - f(\operatorname{retr}_{p_k}(X_k))}{m_k(0) - m_k(X_k) + \frac{σ_k}{3}\lVert X_k\rVert^3}.\]
With two thresholds $η_2 ≥ η_1 > 0$ set $p_{k+1} = \operatorname{retr}_{p_k}(X_k)$ if $ρ ≥ η_1$ and reject the candidate otherwise, that is, set $p_{k+1} = p_k$.
Further update the regularization parameter using factors $0 < γ_1 < 1 < γ_2$ reads
\[σ_{k+1} = \begin{cases} \max\{σ_{\min}, γ_1σ_k\} & \text{ if } ρ \geq η_2 &\text{ (the model was very successful)},\\ σ_k & \text{ if } ρ ∈ [η_1, η_2)&\text{ (the model was successful)},\\ γ_2σ_k & \text{ if } ρ < η_1&\text{ (the model was unsuccessful)}. \end{cases}\]
For more details see [ABBC20].
Input
M::
AbstractManifold
: a Riemannian manifold $\mathcal M$
f
: a cost function $f: \mathcal M→ ℝ$ implemented as(M, p) -> v
grad_f
: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function(M, p) -> X
or a function(M, X, p) -> X
computingX
in-placeHess_f
: the (Riemannian) Hessian $\operatorname{Hess}f: T_{p}\mathcal M → T_{p}\mathcal M$ of f as a function(M, p, X) -> Y
or a function(M, Y, p, X) -> Y
computingY
in-placep
: a point on the manifold $\mathcal M$
the cost f
and its gradient and Hessian might also be provided as a ManifoldHessianObjective
Keyword arguments
σ=100.0 / sqrt(manifold_dimension(M)
: initial regularization parameterσmin=1e-10
: minimal regularization value $σ_{\min}$η1=0.1
: lower model success thresholdη2=0.9
: upper model success thresholdγ1=0.1
: regularization reduction factor (for the success case)γ2=2.0
: regularization increment factor (for the non-success case)evaluation=
AllocatingEvaluation
()
: specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation
) or whether they modify their input argument to return the result therein (InplaceEvaluation
). Since usually the first argument is the manifold, the modified argument is the second.initial_tangent_vector=zero_vector(M, p)
: initialize any tangent vector data,maxIterLanczos=200
: a shortcut to set the stopping criterion in the sub solver,ρ_regularization=1e3
: a regularization to avoid dividing by zero for small values of cost and modelretraction_method=
default_retraction_method
(M, typeof(p))
: a retraction $\operatorname{retr}$ to use, see the section on retractions:stopping_criterion=
StopAfterIteration
(40)
|
StopWhenGradientNormLess
(1e-9)
|
StopWhenAllLanczosVectorsUsed
(maxIterLanczos)
: a functor indicating that the stopping criterion is fulfilledsub_kwargs=
(;)
: a named tuple of keyword arguments that are passed todecorate_objective!
of the sub solvers objective, thedecorate_state!
of the subsovlers state, and the sub state constructor itself.sub_objective=nothing
: a shortcut to modify the objective of the subproblem used within in thesub_problem=
keyword By default, this is initialized as aAdaptiveRagularizationWithCubicsModelObjective
, which can further be decorated by using thesub_kwargs=
keyword.sub_state=
LanczosState
(M, copy(M,p))
: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function.sub_problem=
DefaultManoptProblem
(M, sub_objective)
: specify a problem for a solver or a closed form solution function, which can be allocating or in-place.
All other keyword arguments are passed to decorate_state!
for state decorators or decorate_objective!
for objective decorators, respectively.
If you provide the ManifoldGradientObjective
directly, the evaluation=
keyword is ignored. The decorations are still applied to the objective.
If you activate tutorial mode (cf. is_tutorial_mode
), this solver provides additional debug warnings.
Output
The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return
for details, especially the return_state=
keyword.
State
Manopt.AdaptiveRegularizationState
— TypeAdaptiveRegularizationState{P,T} <: AbstractHessianSolverState
A state for the adaptive_regularization_with_cubics
solver.
Fields
η1
,η1
: bounds for evaluating the regularization parameterγ1
,γ2
: shrinking and expansion factors for regularization parameterσ
H
: the current Hessian evaluations
: the current solution from the subsolverp::P
: a point on the manifold $\mathcal M$storing the current iterateq
: a point for the candidates to evaluate model and ρX::T
: a tangent vector at the point $p$ on the manifold $\mathcal M$storing the gradient at the current iterates
: the tangent vector step resulting from minimizing the model problem in the tangent space $T_{p}\mathcal M$σ
: the current cubic regularization parameterσmin
: lower bound for the cubic regularization parameterρ_regularization
: regularization parameter for computing ρ. When approaching convergence ρ may be difficult to compute with numerator and denominator approaching zero. Regularizing the ratio lets ρ go to 1 near convergence.ρ
: the current regularized ratio of actual improvement and model improvement.ρ_denominator
: a value to store the denominator from the computation of ρ to allow for a warning or error when this value is non-positive.retraction_method::AbstractRetractionMethod
: a retraction $\operatorname{retr}$ to use, see the section on retractionsstop::StoppingCriterion
: a functor indicating that the stopping criterion is fulfilledsub_problem::Union{AbstractManoptProblem, F}
: specify a problem for a solver or a closed form solution function, which can be allocating or in-place.sub_state::Union{AbstractManoptProblem, F}
: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function.
Furthermore the following integral fields are defined
Constructor
AdaptiveRegularizationState(M, sub_problem, sub_state; kwargs...)
Construct the solver state with all fields stated as keyword arguments and the following defaults
Keyword arguments
η1=0.1
η2=0.9
γ1=0.1
γ2=2.0
σ=100/manifold_dimension(M)
- `σmin=1e-7
ρ_regularization=1e3
evaluation=
AllocatingEvaluation
()
: specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation
) or whether they modify their input argument to return the result therein (InplaceEvaluation
). Since usually the first argument is the manifold, the modified argument is the second.p=
rand
(M)
: a point on the manifold $\mathcal M$retraction_method=
default_retraction_method
(M, typeof(p))
: a retraction $\operatorname{retr}$ to use, see the section on retractionsstopping_criterion=
StopAfterIteration
(100)
: a functor indicating that the stopping criterion is fulfilledX=
zero_vector
(M, p)
: a tangent vector at the point $p$ on the manifold $\mathcal M$
Sub solvers
There are several ways to approach the subsolver. The default is the first one.
Lanczos iteration
Manopt.LanczosState
— TypeLanczosState{P,T,SC,B,I,R,TM,V,Y} <: AbstractManoptSolverState
Solve the adaptive regularized subproblem with a Lanczos iteration
Fields
stop::StoppingCriterion
: a functor indicating that the stopping criterion is fulfilledstop_newton::StoppingCriterion
: a functor indicating that the stopping criterion is fulfilledused for the inner Newton iterationσ
: the current regularization parameterX
: the IterateLanczos_vectors
: the obtained Lanczos vectorstridig_matrix
: the tridiagonal coefficient matrix Tcoefficients
: the coefficients $y_1,...y_k$ that determine the solutionHp
: a temporary tangent vector containing the evaluation of the HessianHp_residual
: a temporary tangent vector containing the residual to the HessianS
: the current obtained / approximated solution
Constructor
LanczosState(TpM::TangentSpace; kwargs...)
Keyword arguments
X=
zero_vector
(M, p)
: a tangent vector at the point $p$ on the manifold $\mathcal M$as the iteratemaxIterLanzcos=200
: shortcut to set the maximal number of iterations in thestopping_crtierion=
θ=0.5
: set the parameter in theStopWhenFirstOrderProgress
within the defaultstopping_criterion=
.stopping_criterion=
StopAfterIteration
(maxIterLanczos)
|
StopWhenFirstOrderProgress
(θ)
: a functor indicating that the stopping criterion is fulfilledstopping_criterion_newton=
StopAfterIteration
(200)
: a functor indicating that the stopping criterion is fulfilled used for the inner Newton iterationσ=10.0
: specify the regularization parameter
(Conjugate) gradient descent
There is a generic objective, that implements the sub problem
Manopt.AdaptiveRagularizationWithCubicsModelObjective
— TypeAdaptiveRagularizationWithCubicsModelObjective
A model for the adaptive regularization with Cubics
\[m(X) = f(p) + ⟨\operatorname{grad} f(p), X ⟩_p + \frac{1}{2} ⟨\operatorname{Hess} f(p)[X], X⟩_p + \frac{σ}{3} \lVert X \rVert^3,\]
cf. Eq. (33) in [ABBC20]
Fields
objective
: anAbstractManifoldHessianObjective
proving $f$, its gradient and Hessianσ
: the current (cubic) regularization parameter
Constructors
AdaptiveRagularizationWithCubicsModelObjective(mho, σ=1.0)
with either an AbstractManifoldHessianObjective
objective
or an decorator containing such an objective.
Since the sub problem is given on the tangent space, you have to provide
arc_obj = AdaptiveRagularizationWithCubicsModelObjective(mho, σ)
sub_problem = DefaultProblem(TangentSpaceAt(M,p), arc_obj)
where mho
is the Hessian objective of f
to solve. Then use this for the sub_problem
keyword and use your favourite gradient based solver for the sub_state
keyword, for example a ConjugateGradientDescentState
Additional stopping criteria
Manopt.StopWhenAllLanczosVectorsUsed
— TypeStopWhenAllLanczosVectorsUsed <: StoppingCriterion
When an inner iteration has used up all Lanczos vectors, then this stopping criterion is a fallback / security stopping criterion to not access a non-existing field in the array allocated for vectors.
Note that this stopping criterion (for now) is only implemented for the case that an AdaptiveRegularizationState
when using a LanczosState
subsolver
Fields
maxLanczosVectors
: maximal number of Lanczos vectorsat_iteration
indicates at which iteration (includingi=0
) the stopping criterion was fulfilled and is-1
while it is not fulfilled.
Constructor
StopWhenAllLanczosVectorsUsed(maxLancosVectors::Int)
Manopt.StopWhenFirstOrderProgress
— TypeStopWhenFirstOrderProgress <: StoppingCriterion
A stopping criterion related to the Riemannian adaptive regularization with cubics (ARC) solver indicating that the model function at the current (outer) iterate,
\[m_k(X) = f(p_k) + ⟨X, \operatorname{grad} f(p^{(k)})⟩ + \frac{1}{2}⟨X, \operatorname{Hess} f(p^{(k)})[X]⟩ + \frac{σ_k}{3}\lVert X \rVert^3\]
defined on the tangent space $T_{p}\mathcal M$ fulfills at the current iterate $X_k$ that
\[m(X_k) \leq m(0) \quad\text{ and }\quad \lVert \operatorname{grad} m(X_k) \rVert ≤ θ \lVert X_k \rVert^2\]
Fields
θ
: the factor $θ$ in the second conditionat_iteration::Int
: an integer indicating at which the stopping criterion last indicted to stop, which might also be before the solver started (0
). Any negative value indicates that this was not yet the case;
Constructor
StopWhenAllLanczosVectorsUsed(θ)
Technical details
The adaptive_regularization_with_cubics
requires the following functions of a manifolds to be available
- A
retract!
(M, q, p, X)
; it is recommended to set thedefault_retraction_method
to a favourite retraction. If this default is set, aretraction_method=
does not have to be specified. - if you do not provide an initial regularization parameter
σ
, amanifold_dimension
is required. - By default the tangent vector storing the gradient is initialized calling
zero_vector
(M,p)
. inner
(M, p, X, Y)
is used within the algorithm step
Furthermore, within the Lanczos subsolver, generating a random vector (at p
) using rand!
(M, X; vector_at=p)
in place of X
is required
Literature
- [ABBC20]
- N. Agarwal, N. Boumal, B. Bullins and C. Cartis. Adaptive regularization with cubics on manifolds. Mathematical Programming (2020).