Steihaug-Toint truncated conjugate gradient method
Solve the constrained optimization problem on the tangent space
\[\begin{align*} \operatorname*{arg\,min}_{Y ∈ T_p\mathcal{M}}&\ m_p(Y) = f(p) + ⟨\operatorname{grad}f(p), Y⟩_p + \frac{1}{2} ⟨\mathcal{H}_p[Y], Y⟩_p\\ \text{such that}& \ \lVert Y \rVert_p ≤ Δ \end{align*}\]
on the tangent space $T_p\mathcal M$ of a Riemannian manifold $\mathcal M$ by using the Steihaug-Toint truncated conjugate-gradient (tCG) method, see [ABG06], Algorithm 2, and [CGT00]. Here $\mathcal H_p$ is either the Hessian $\operatorname{Hess} f(p)$ or a linear symmetric operator on the tangent space approximating the Hessian.
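As an illustration, the following minimal sketch solves this subproblem for the Rayleigh quotient on a sphere. It assumes Manifolds.jl provides the Sphere; the matrix A, the point p, and the radius 1.0 are illustrative choices.

```julia
using Manopt, Manifolds, LinearAlgebra

# Rayleigh quotient f(p) = p' * A * p on the sphere; its minimiser over the
# sphere is an eigenvector to the smallest eigenvalue of A.
n = 5
A = Symmetric(randn(n, n))
M = Sphere(n - 1)

f(M, p) = p' * A * p
# Riemannian gradient: the Euclidean gradient 2Ap projected onto T_p M
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)
# Riemannian Hessian applied to a tangent vector X
Hess_f(M, p, X) = 2 .* (A * X .- (p' * A * X) .* p .- (p' * A * p) .* X)

p = [1.0; zeros(n - 1)]
# Solve the model at p within trust-region radius Δ = 1.0,
# starting from the zero tangent vector.
Y = truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f, p, zero_vector(M, p);
    trust_region_radius=1.0,
)
```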
Interface
Manopt.truncated_conjugate_gradient_descent
— Function

truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f, p=rand(M), X=rand(M; vector_at=p);
    kwargs...
)
truncated_conjugate_gradient_descent(M, mho::ManifoldHessianObjective, p=rand(M), X=rand(M; vector_at=p);
    kwargs...
)
truncated_conjugate_gradient_descent(M, trmo::TrustRegionModelObjective, p=rand(M), X=rand(M; vector_at=p);
    kwargs...
)

solve the trust-region subproblem

\[\begin{align*} \operatorname*{arg\,min}_{Y ∈ T_p\mathcal{M}}&\ m_p(Y) = f(p) + ⟨\operatorname{grad}f(p), Y⟩_p + \frac{1}{2} ⟨\mathcal{H}_p[Y], Y⟩_p\\ \text{such that}& \ \lVert Y \rVert_p ≤ Δ \end{align*}\]

on a manifold $\mathcal M$ by using the Steihaug-Toint truncated conjugate-gradient (tCG) method. This can be done in place of X.
For a description of the algorithm and theorems offering convergence guarantees, see [ABG06, CGT00].
Input

- M::AbstractManifold: a Riemannian manifold $\mathcal M$
- f: a cost function $f: \mathcal M → ℝ$ implemented as (M, p) -> v
- grad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function (M, p) -> X or a function (M, X, p) -> X computing X in-place
- Hess_f: the (Riemannian) Hessian $\operatorname{Hess}f: T_{p}\mathcal M → T_{p}\mathcal M$ of f as a function (M, p, X) -> Y or a function (M, Y, p, X) -> Y computing Y in-place
- p: a point on the manifold $\mathcal M$
- X: a tangent vector at the point $p$ on the manifold $\mathcal M$

Instead of the three functions, you can either provide a ManifoldHessianObjective mho, which is then used to build the trust region model, or a TrustRegionModelObjective trmo directly.
Keyword arguments

- evaluation=AllocatingEvaluation(): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating their result (AllocatingEvaluation) or whether they modify their input argument to return the result therein (InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second.
- preconditioner: a preconditioner for the Hessian H. This is either an allocating function (M, p, X) -> Y or an in-place function (M, Y, p, X) -> Y, see evaluation, and by default set to the identity.
- θ=1.0: the superlinear convergence target rate of $1+θ$
- κ=0.1: the linear convergence target rate.
- project!=copyto!: for numerical stability it is possible to project onto the tangent space after every iteration. The function has to work in place of Y, that is (M, Y, p, X) -> Y, where X and Y can be the same memory.
- randomize=false: indicate whether X is initialised to a random vector or not. This disables preconditioning.
- retraction_method=default_retraction_method(M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractions
- stopping_criterion=StopAfterIteration(manifold_dimension(base_manifold(TpM))) | StopWhenResidualIsReducedByFactorOrPower(; κ=κ, θ=θ) | StopWhenTrustRegionIsExceeded() | StopWhenCurvatureIsNegative() | StopWhenModelIncreased(): a functor indicating that the stopping criterion is fulfilled
- trust_region_radius=injectivity_radius(M) / 4: the initial trust-region radius

All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively.
Output

The obtained approximate minimizer $Y^*$ of the subproblem. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
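The objective-based variants can be called as in the following sketch, which reuses f, grad_f, Hess_f, M, and p from the example above and assumes the three functions are bundled into a ManifoldHessianObjective.

```julia
# Bundle cost, gradient, and Hessian into an objective and solve the model.
mho = ManifoldHessianObjective(f, grad_f, Hess_f)
Y = truncated_conjugate_gradient_descent(M, mho, p, zero_vector(M, p);
    trust_region_radius=injectivity_radius(M) / 4,
)
```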
See also
Manopt.truncated_conjugate_gradient_descent!
— Function

truncated_conjugate_gradient_descent!(M, f, grad_f, Hess_f, p, X; kwargs...)
truncated_conjugate_gradient_descent!(M, mho::ManifoldHessianObjective, p, X; kwargs...)
truncated_conjugate_gradient_descent!(M, trmo::TrustRegionModelObjective, p, X; kwargs...)
solve the trust-region subproblem in place of X. The model, input, keyword arguments, and output are the same as for truncated_conjugate_gradient_descent above.
State
Manopt.TruncatedConjugateGradientState
— Type

TruncatedConjugateGradientState <: AbstractHessianSolverState

describe the Steihaug-Toint truncated conjugate-gradient method, with

Fields

Let T denote the type of a tangent vector and R <: Real.

- δ::T: the conjugate gradient search direction
- δHδ, YPδ, δPδ, YPY: temporary inner products with Hδ and preconditioned inner products.
- Hδ, HY: temporary results of the Hessian applied to δ and Y, respectively.
- κ::R: the linear convergence target rate.
- project!: for numerical stability it is possible to project onto the tangent space after every iteration. The function has to work in place of Y, that is (M, Y, p, X) -> Y, where X and Y can be the same memory.
- randomize: indicate whether X is initialised to a random vector or not
- residual::T: the gradient of the model $m(Y)$
- stop::StoppingCriterion: a functor indicating that the stopping criterion is fulfilled
- θ::R: the superlinear convergence target rate of $1+θ$
- trust_region_radius::R: the trust-region radius
- X::T: the gradient $\operatorname{grad}f(p)$
- Y::T: current iterate tangent vector
- z::T: the preconditioned residual
- z_r::R: inner product of the residual and z
Constructor
TruncatedConjugateGradientState(TpM::TangentSpace, Y=rand(TpM); kwargs...)
Initialise the TCG state.
Input

- TpM: a TangentSpace

Keyword arguments

- κ=0.1
- project!::F=copyto!: initialise the numerical stabilisation to just copy the result
- randomize=false
- θ=1.0
- trust_region_radius=injectivity_radius(base_manifold(TpM)) / 4
- stopping_criterion=StopAfterIteration(manifold_dimension(base_manifold(TpM))) | StopWhenResidualIsReducedByFactorOrPower(; κ=κ, θ=θ) | StopWhenTrustRegionIsExceeded() | StopWhenCurvatureIsNegative() | StopWhenModelIncreased(): a functor indicating that the stopping criterion is fulfilled
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$
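A minimal sketch of the constructor, reusing M and p from the examples above and assuming TangentSpace from ManifoldsBase.jl is available:

```julia
# Construct the state on the tangent space at p with a zero initial vector.
TpM = TangentSpace(M, p)
tcgs = TruncatedConjugateGradientState(TpM, zero_vector(M, p);
    trust_region_radius=1.0, κ=0.1, θ=1.0,
)
```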
See also
Stopping criteria
Manopt.StopWhenResidualIsReducedByFactorOrPower
— Type

StopWhenResidualIsReducedByFactorOrPower <: StoppingCriterion

A functor for testing if the norm of the residual at the current iterate is reduced either by a power of 1+θ or by a factor κ compared to the norm of the initial residual. The criterion hence reads

$\lVert r_k \rVert_{p} ≤ \lVert r_0 \rVert_{p^{(0)}} \min\bigl( κ, \lVert r_0 \rVert_{p^{(0)}}^{θ} \bigr)$.

Fields

- κ: the reduction factor
- θ: part of the reduction power
- at_iteration::Int: an integer indicating at which iteration the stopping criterion last indicated to stop, which might also be before the solver started (0). Any negative value indicates that this was not yet the case;
Constructor
StopWhenResidualIsReducedByFactorOrPower(; κ=0.1, θ=1.0)
Initialize the StopWhenResidualIsReducedByFactorOrPower functor to indicate to stop after the norm of the current residual is less than either the norm of the initial residual to the power of 1+θ or the norm of the initial residual times κ.
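For instance, a stricter residual reduction can be requested by combining criteria with | and passing the result to the solver; a sketch reusing the variables from the earlier examples, with illustrative values for κ and θ:

```julia
# Stop after at most 100 iterations, or once the residual norm is reduced
# by the factor κ=0.01 or by the power 1+θ=1.9.
sc = StopAfterIteration(100) |
     StopWhenResidualIsReducedByFactorOrPower(; κ=0.01, θ=0.9)
Y = truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f, p;
    stopping_criterion=sc,
)
```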
See also
Manopt.StopWhenTrustRegionIsExceeded
— Type

StopWhenTrustRegionIsExceeded <: StoppingCriterion

A functor for testing if the norm of the next iterate in the Steihaug-Toint truncated conjugate-gradient method is larger than the trust-region radius, that is $\lVert Y^{(k)*} \rVert_{p^{(k)}} ≥ Δ$, and to end the algorithm when the trust region has been left.

Fields

- at_iteration::Int: an integer indicating at which iteration the stopping criterion last indicated to stop, which might also be before the solver started (0). Any negative value indicates that this was not yet the case;
- trr: the trust region radius
- YPY: the computed norm of $Y$.
Constructor
StopWhenTrustRegionIsExceeded()
initialize the StopWhenTrustRegionIsExceeded functor to indicate to stop after the norm of the next iterate is greater than the trust-region radius.
See also
Manopt.StopWhenCurvatureIsNegative
— Type

StopWhenCurvatureIsNegative <: StoppingCriterion

A functor for testing if the curvature of the model is negative, $⟨δ_k, \operatorname{Hess} f(p)[δ_k]⟩_p ≤ 0$. In this case, the model is not strictly convex, and the stepsize as computed does not yield a reduction of the model.

Fields

- at_iteration::Int: an integer indicating at which iteration the stopping criterion last indicated to stop, which might also be before the solver started (0). Any negative value indicates that this was not yet the case;
- value: store the value of the inner product.
- reason: stores a reason of stopping if the stopping criterion has been reached, see get_reason.
Constructor
StopWhenCurvatureIsNegative()
See also
Manopt.StopWhenModelIncreased
— Type

StopWhenModelIncreased <: StoppingCriterion

A functor for testing if the model value increased.

Fields

- at_iteration::Int: an integer indicating at which iteration the stopping criterion last indicated to stop, which might also be before the solver started (0). Any negative value indicates that this was not yet the case;
- model_value: store the last model value
- inc_model_value: store the model value that increased
Constructor
StopWhenModelIncreased()
See also
Manopt.set_parameter!
— Method

set_parameter!(c::StopWhenResidualIsReducedByFactorOrPower, :ResidualPower, v)

Update the residual power θ to v.

Manopt.set_parameter!
— Method

set_parameter!(c::StopWhenResidualIsReducedByFactorOrPower, :ResidualFactor, v)

Update the residual factor κ to v.
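A sketch of updating both parameters on an existing criterion, with illustrative values:

```julia
sc = StopWhenResidualIsReducedByFactorOrPower(; κ=0.1, θ=1.0)
set_parameter!(sc, :ResidualFactor, 0.05)  # update κ
set_parameter!(sc, :ResidualPower, 0.9)    # update θ
```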
Trust region model
Manopt.TrustRegionModelObjective
— Type

TrustRegionModelObjective{O<:AbstractManifoldHessianObjective} <: AbstractManifoldSubObjective{O}

A trust region model of the form

\[ m(X) = f(p) + ⟨\operatorname{grad} f(p), X⟩_p + \frac{1}{2} ⟨\operatorname{Hess} f(p)[X], X⟩_p\]

Fields

- objective: an AbstractManifoldHessianObjective providing $f$, its gradient and Hessian

Constructors

TrustRegionModelObjective(objective)

with either an AbstractManifoldHessianObjective objective or a decorator containing such an objective.
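A sketch of wrapping a Hessian objective, reusing mho, M, and p from the earlier examples, and passing the model directly to the solver:

```julia
# Build the trust-region model from the Hessian objective and solve it.
trmo = TrustRegionModelObjective(mho)
Y = truncated_conjugate_gradient_descent(M, trmo, p, zero_vector(M, p))
```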
Technical details
The trust_regions solver requires the following functions of a manifold to be available

- if you do not provide a trust_region_radius=, then injectivity_radius on the manifold M is required.
- the norm as well, to stop when the norm of the gradient is small, but if you implemented inner, the norm is provided already.
- a zero_vector!(M, X, p).
- a copyto!(M, q, p) and copy(M, p) for points.
Literature
- [ABG06] P.-A. Absil, C. Baker and K. Gallivan. Trust-Region Methods on Riemannian Manifolds. Foundations of Computational Mathematics 7, 303–330 (2006).
- [CGT00] A. R. Conn, N. I. Gould and P. L. Toint. Trust Region Methods (Society for Industrial and Applied Mathematics, 2000).