Steihaug-Toint Truncated Conjugate-Gradient Method
The aim is to solve the trust-region subproblem
\[\operatorname*{arg\,min}_{η ∈ T_{x}\mathcal{M}} m_{x}(η), \quad\text{where } m_{x}(η) = F(x) + ⟨\operatorname{grad}F(x), η⟩_{x} + \frac{1}{2} ⟨ \mathcal{H}[η], η⟩_{x},\]
\[\text{s.t.} \; ⟨η, η⟩_{x} \leq {Δ}^2\]
on a manifold by using the Steihaug-Toint truncated conjugate-gradient method, abbreviated tCG-method. All terms involving the trust-region radius use an inner product w.r.t. the preconditioner; this is because the iterates grow in length w.r.t. the preconditioner, guaranteeing that once an iterate leaves the trust region it does not re-enter it.
Initialization
Initialize $η_0 = η$ if using the randomized approach, otherwise let $η_0$ be the zero tangent vector; set $r_0 = \operatorname{grad}F(x)$, $z_0 = \operatorname{P}(r_0)$, $δ_0 = -z_0$, and $k=0$.
Iteration
Repeat until a convergence criterion is reached
- Set $α =\frac{⟨r_k, z_k⟩_x}{⟨δ_k, \mathcal{H}[δ_k]⟩_x}$ and $⟨η_k, η_k⟩_{x}^* = ⟨η_k, \operatorname{P}(η_k)⟩_x + 2α ⟨η_k, \operatorname{P}(δ_k)⟩_{x} + {α}^2 ⟨δ_k, \operatorname{P}(δ_k)⟩_{x}$.
- If $⟨δ_k, \mathcal{H}[δ_k]⟩_x ≤ 0$ or $⟨η_k, η_k⟩_x^* ≥ Δ^2$ return $η_{k+1} = η_k + τ δ_k$ and stop.
- Set $η_{k}^* = η_k + α δ_k$. If $⟨\operatorname{grad}F(x), η_k⟩_{x} + \frac{1}{2} ⟨η_k, \mathcal{H}[η_k]⟩_{x} ≤ ⟨\operatorname{grad}F(x), η_k^*⟩_{x} + \frac{1}{2} ⟨η_k^*, \mathcal{H}[η_k^*]⟩_{x}$, i.e. the model value did not decrease, set $η_{k+1} = η_k$, else set $η_{k+1} = η_{k}^*$.
- Set $r_{k+1} = r_k + α \mathcal{H}[δ_k]$, $z_{k+1} = \operatorname{P}(r_{k+1})$, $β = \frac{⟨r_{k+1}, z_{k+1}⟩_{x}}{⟨r_k, z_k ⟩_{x}}$ and $δ_{k+1} = -z_{k+1} + β δ_k$.
- Set $k=k+1$.
Result
The result is given by the last computed $η_k$.
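For intuition, the iteration above can be specialized to the Euclidean case $\mathcal M = ℝ^n$ with the identity preconditioner. The following NumPy sketch is our own illustration, not the Manopt.jl implementation; the function name is ours, and the model-decrease safeguard of step 3 is omitted for brevity:

```python
import numpy as np

def truncated_cg(H, grad, Delta, max_iter=100, theta=1.0, kappa=0.1):
    """Steihaug-Toint tCG for min <grad, eta> + 0.5 <H eta, eta>
    subject to ||eta|| <= Delta (Euclidean case, identity preconditioner)."""
    eta = np.zeros_like(grad)       # eta_0 = 0
    r = grad.astype(float).copy()   # r_0 = grad F(x)
    delta = -r                      # delta_0 = -z_0 with P = identity
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):
        Hd = H @ delta
        dHd = delta @ Hd            # curvature of the model along delta
        if dHd > 0:
            alpha = (r @ r) / dHd
            eta_next = eta + alpha * delta
        # negative curvature, or the step would leave the trust region:
        # move to the boundary along delta and stop
        if dHd <= 0 or np.linalg.norm(eta_next) >= Delta:
            ed, dd = eta @ delta, delta @ delta
            tau = (-ed + np.sqrt(ed**2 + dd * (Delta**2 - eta @ eta))) / dd
            return eta + tau * delta
        r_next = r + alpha * Hd
        # residual-based stopping: linear (kappa) / superlinear (theta) target
        if np.linalg.norm(r_next) <= r0_norm * min(r0_norm**theta, kappa):
            return eta_next
        beta = (r_next @ r_next) / (r @ r)
        delta = -r_next + beta * delta
        eta, r = eta_next, r_next
    return eta
```

Note how both exits to the boundary, negative curvature and trust-region violation, use the same positive root $τ$ derived in the remarks below.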
Remarks
$\operatorname{P}(⋅)$ denotes the symmetric, positive definite preconditioner. If the randomized approach is used, i.e. a random tangent vector $η_0$ as the initial vector, no preconditioner is applied; the idea behind randomization is to escape saddle points. Preconditioning is simply a rescaling of the variables and thus a redefinition of the shape of the trust region. Ideally $\operatorname{P}(⋅)$ is a cheap, positive approximation of the inverse of the Hessian of $F$ at $x$. By default, the preconditioner is the identity.
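To see why a good preconditioner helps, consider the extreme case $\operatorname{P} = \mathcal{H}^{-1}$ in the Euclidean setting: the first preconditioned CG step then has $α = 1$ and already produces the Newton step. A small illustrative NumPy check (all names and values are ours):

```python
import numpy as np

H = np.array([[100.0, 2.0], [2.0, 1.0]])   # ill-conditioned SPD "Hessian"
g = np.array([1.0, -3.0])                  # "gradient" at x, i.e. r_0
z = np.linalg.solve(H, g)                  # z_0 = P(r_0) with P = H^{-1}
delta = -z                                 # delta_0 = -z_0
alpha = (g @ z) / (delta @ (H @ delta))    # <r_0, z_0> / <delta_0, H delta_0> = 1
eta_1 = alpha * delta                      # equals the Newton step -H^{-1} g
```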
Regarding step 2 above: obtain $τ$ as the positive root of $\left\lVert η_k + τ δ_k \right\rVert_{\operatorname{P}, x} = Δ$, which after squaring and solving the resulting quadratic equation becomes
\[ τ = \frac{-⟨η_k, \operatorname{P}(δ_k)⟩_{x} + \sqrt{⟨η_k, \operatorname{P}(δ_k)⟩_{x}^{2} + ⟨δ_k, \operatorname{P}(δ_k)⟩_{x} ( Δ^2 - ⟨η_k, \operatorname{P}(η_k)⟩_{x})}} {⟨δ_k, \operatorname{P}(δ_k)⟩_{x}}.\]
It can occur that $⟨δ_k, \mathcal{H}[δ_k]⟩_{x} = κ ≤ 0$ at iteration $k$. In this case, the model is not strictly convex, and the stepsize $α = \frac{⟨r_k, z_k⟩_{x}}{κ}$ computed in step 1 does not give a reduction in the model function $m_x(⋅)$. Indeed, $m_x(⋅)$ is unbounded from below along the line $η_k + α δ_k$. If the aim is to minimize the model within the trust region, it makes far more sense to reduce $m_x(⋅)$ along $η_k + α δ_k$ as much as possible while staying within the trust region, and this means moving to the trust-region boundary along this line. Thus, when $κ ≤ 0$ at iteration $k$, we replace $α = \frac{⟨r_k, z_k⟩_{x}}{κ}$ with the $τ$ described above. The other possibility is that $η_{k+1}$ would lie outside the trust region at iteration $k$, i.e. $⟨η_k, η_k⟩_{x}^{*} ≥ Δ^2$, where $⟨η_k, η_k⟩_{x}^{*}$ is the squared norm of the candidate $η_{k+1}$. In particular, when $\mathcal{H}$ is positive definite and $η_{k+1}$ lies outside the trust region, the solution of the trust-region problem must lie on the trust-region boundary. Thus, there is no reason to continue with the conjugate-gradient iteration as it stands, since subsequent iterates would move further outside the trust-region boundary. A sensible strategy, just as in the case considered above, is to move to the trust-region boundary by finding $τ$.
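The formula for $τ$ can be checked numerically in the Euclidean case with $\operatorname{P}$ the identity (an illustrative sketch; all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
Delta = 2.0
eta = rng.standard_normal(3)
eta *= 0.5 * Delta / np.linalg.norm(eta)   # interior iterate: ||eta|| < Delta
delta = rng.standard_normal(3)             # arbitrary search direction

# positive root of ||eta + tau * delta|| = Delta
ed, dd, ee = eta @ delta, delta @ delta, eta @ eta
tau = (-ed + np.sqrt(ed**2 + dd * (Delta**2 - ee))) / dd
# ||eta + tau * delta|| now equals Delta: the step lands on the boundary
```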
Although it is virtually impossible in practice to know how many iterations are necessary to provide a good estimate $η_{k}$ of the solution of the trust-region subproblem, the method stops after a certain number of iterations, which is realised by `StopAfterIteration`. In order to increase the convergence rate of the underlying trust-region method, see `trust_regions`, a typical stopping criterion is to stop as soon as an iteration $k$ is reached for which
\[ \Vert r_k \Vert_x \leq \Vert r_0 \Vert_x \min \left( \Vert r_0 \Vert_x^{θ}, κ \right) \]
holds, where $0 < κ < 1$ and $θ > 0$ are chosen in advance. This is realised in this method by `StopIfResidualIsReducedByFactorOrPower`. It can be shown that under appropriate conditions the iterates $x_k$ of the underlying trust-region method converge to nondegenerate critical points with an order of convergence of at least $\min \left( θ + 1, 2 \right)$, see [Absil, Mahony, Sepulchre, 2008]. The method also aborts if the curvature of the model is negative, i.e. if $\langle δ_k, \mathcal{H}[δ_k] \rangle_x \leq 0$, which is realised by `StopWhenCurvatureIsNegative`. If the next possible approximate solution $η_{k}^{*}$ calculated in iteration $k$ lies outside the trust region, i.e. if $\lVert η_{k}^{*} \rVert_x \geq Δ$, then the method aborts, which is realised by `StopWhenTrustRegionIsExceeded`. Furthermore, the method aborts if the new model value evaluated at $η_{k}^{*}$ is greater than the previous model value evaluated at $η_{k}$, which is realised by `StopWhenModelIncreased`.
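The residual criterion combining the linear factor $κ$ and the superlinear power $θ$ can be written as a small predicate (an illustrative sketch, not the package's implementation):

```python
def residual_reduced_enough(r_norm, r0_norm, theta=1.0, kappa=0.1):
    """True once ||r_k|| <= ||r_0|| * min(||r_0||^theta, kappa)."""
    # far from the solution min(...) = kappa gives a linear reduction target;
    # close to it (small ||r_0||) min(...) = ||r_0||^theta gives a
    # superlinear target, yielding convergence of order min(theta + 1, 2)
    return r_norm <= r0_norm * min(r0_norm**theta, kappa)
```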
Interface
Manopt.truncated_conjugate_gradient_descent
— Function: `truncated_conjugate_gradient_descent(M, F, gradF, x, η, HessF, trust_region_radius)`
solve the trust-region subproblem
\[\operatorname*{arg\,min}_{η ∈ T_xM} m_x(η) \quad\text{where}\quad m_x(η) = F(x) + ⟨\operatorname{grad}F(x),η⟩_x + \frac{1}{2}⟨\operatorname{Hess}F(x)[η],η⟩_x,\]
\[\text{such that}\quad ⟨η,η⟩_x ≤ Δ^2\]
on a manifold M by using the Steihaug-Toint truncated conjugate-gradient method, abbreviated tCG-method. For a description of the algorithm and theorems offering convergence guarantees, see the reference:
- P.-A. Absil, C.G. Baker, K.A. Gallivan, Trust-region methods on Riemannian manifolds, FoCM, 2007. doi: 10.1007/s10208-005-0179-9
- A. R. Conn, N. I. M. Gould, P. L. Toint, Trust-region methods, SIAM, MPS, 2000. doi: 10.1137/1.9780898719857
Input
- `M` – a manifold $\mathcal M$
- `F` – a cost function $F: \mathcal M → ℝ$ to minimize
- `gradF` – the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of $F$
- `HessF` – the hessian $\operatorname{Hess}F(x): T_x\mathcal M → T_x\mathcal M$, $X ↦ \operatorname{Hess}F(x)[X] = ∇_X\operatorname{grad}F(x)$
- `x` – a point on the manifold $x ∈ \mathcal M$
- `η` – an update tangential vector $η ∈ T_x\mathcal M$
Optional
- `evaluation` – (`AllocatingEvaluation`) specify whether the gradient and hessian work by allocation (default) or `MutatingEvaluation` in place
- `preconditioner` – a preconditioner for the hessian H
- `θ` – (`1.0`) $1+θ$ is the superlinear convergence target rate. The method aborts if the residual is less than or equal to the initial residual to the power of $1+θ$.
- `κ` – (`0.1`) the linear convergence target rate. The method aborts if the residual is less than or equal to $κ$ times the initial residual.
- `randomize` – set to true if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
- `trust_region_radius` – (`injectivity_radius(M)/4`) a trust-region radius
- `project!` – (`copyto!`) specify a projection operation for tangent vectors for numerical stability. A function `(M, Y, p, X) -> ...` working in place of `Y`. By default, no projection is performed; set it to `project!` to activate projection.
- `stopping_criterion` – (`StopWhenAny`, `StopAfterIteration`, `StopIfResidualIsReducedByFactor`, `StopIfResidualIsReducedByPower`, `StopWhenCurvatureIsNegative`, `StopWhenTrustRegionIsExceeded`) a functor inheriting from `StoppingCriterion` indicating when to stop, where for the default, the maximal number of iterations is set to the dimension of the manifold, the power factor is `θ`, and the reduction factor is `κ`.

and the ones that are passed to `decorate_options` for decorators.
Output
the obtained (approximate) minimizer $η^*$, see `get_solver_return` for details
See also
Manopt.truncated_conjugate_gradient_descent!
— Function: `truncated_conjugate_gradient_descent!(M, F, gradF, x, η, HessF, trust_region_radius; kwargs...)`
solve the trust-region subproblem in place of `x`.
Input
- `M` – a manifold $\mathcal M$
- `F` – a cost function $F: \mathcal M → ℝ$ to minimize
- `gradF` – the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of $F$
- `HessF` – the hessian $\operatorname{Hess}F(x): T_x\mathcal M → T_x\mathcal M$, $X ↦ \operatorname{Hess}F(x)[X] = ∇_X\operatorname{grad}F(x)$
- `x` – a point on the manifold $x ∈ \mathcal M$
- `η` – an update tangential vector $η ∈ T_x\mathcal M$
For more details and all optional arguments, see `truncated_conjugate_gradient_descent`.
Options
Manopt.TruncatedConjugateGradientOptions
— Type: `TruncatedConjugateGradientOptions <: AbstractHessianOptions`
describe the Steihaug-Toint truncated conjugate-gradient method, with
Fields
A default value is given in brackets if a parameter can be left out in initialization.
- `x` : a point, where the trust-region subproblem needs to be solved
- `η` : a tangent vector (called update vector), which solves the trust-region subproblem after successful calculation by the algorithm
- `stop` : a `StoppingCriterion`
- `gradient` : the gradient at the current iterate
- `δ` : search direction
- `trust_region_radius` : (`injectivity_radius(M)/4`) the trust-region radius
- `residual` : the residual $r_k$, initialized to the gradient
- `randomize` : indicates if the trust-region solve, and so the algorithm, is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
- `project!` : (`copyto!`) specify a projection operation for tangent vectors for numerical stability. A function `(M, Y, p, X) -> ...` working in place of `Y`. By default, no projection is performed; set it to `project!` to activate projection.
Constructor
TruncatedConjugateGradientOptions(M, p, x, η;
trust_region_radius=injectivity_radius(M)/4,
randomize=false,
θ=1.0,
κ=0.1,
project! = copyto!,
)
and a slightly involved `stopping_criterion`
See also
Additional Stopping Criteria
Manopt.StopIfResidualIsReducedByPower
— Type: `StopIfResidualIsReducedByPower <: StoppingCriterion`
A functor for testing if the norm of the residual at the current iterate is reduced by a power of $1+θ$ compared to the norm of the initial residual, i.e. $\Vert r_k \Vert_x \leq \Vert r_0 \Vert_{x}^{1+θ}$.
Fields
- `θ` – part of the reduction power
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.
Constructor
StopIfResidualIsReducedByPower(θ)
initialize the StopIfResidualIsReducedByPower functor to indicate to stop after the norm of the current residual is less than the norm of the initial residual to the power of $1+θ$.
See also
Manopt.StopIfResidualIsReducedByFactor
— Type: `StopIfResidualIsReducedByFactor <: StoppingCriterion`
A functor for testing if the norm of the residual at the current iterate is reduced by a factor compared to the norm of the initial residual, i.e. $\Vert r_k \Vert_x \leq κ \Vert r_0 \Vert_x$.
Fields
- `κ` – the reduction factor
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.
Constructor
StopIfResidualIsReducedByFactor(κ)
initialize the StopIfResidualIsReducedByFactor functor to indicate to stop after the norm of the current residual is less than $κ$ times the norm of the initial residual.
See also
Manopt.StopWhenTrustRegionIsExceeded
— Type: `StopWhenTrustRegionIsExceeded <: StoppingCriterion`
A functor for testing if the norm of the next iterate in the Steihaug-Toint tCG method is larger than the trust-region radius, i.e. $\lVert η_{k}^{*} \rVert_x \geq Δ$, and for terminating the algorithm when the trust region has been left.
Fields
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.
- `storage` – stores the necessary parameters `η, δ, residual` to check the criterion.
Constructor
StopWhenTrustRegionIsExceeded([a])
initialize the StopWhenTrustRegionIsExceeded functor to indicate to stop after the norm of the next iterate is greater than the trust-region radius, using the `StoreOptionsAction` `a`, which is initialized to store `:η, :δ, :residual` by default.
See also
Manopt.StopWhenCurvatureIsNegative
— Type: `StopWhenCurvatureIsNegative <: StoppingCriterion`
A functor for testing if the curvature of the model is negative, i.e. $\langle δ_k, \mathcal{H}[δ_k] \rangle_x \leq 0$. In this case, the model is not strictly convex, and the stepsize as computed does not give a reduction of the model.
Fields
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.
Constructor
StopWhenCurvatureIsNegative()
See also
Manopt.StopWhenModelIncreased
— Type: `StopWhenModelIncreased <: StoppingCriterion`
A functor for testing if the model value increased.
Fields
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.
Constructor
StopWhenModelIncreased()
See also
Manopt.update_stopping_criterion!
— Method: `update_stopping_criterion!(c::StoppingCriterion, s::Symbol, v::value)`
`update_stopping_criterion!(o::Options, s::Symbol, v::value)`
`update_stopping_criterion!(c::StoppingCriterion, ::Val{Symbol}, v::value)`
Update a value within a stopping criterion, specified by the symbol s
, to v
. If a criterion does not have a value assigned that corresponds to s
, the update is ignored.
For the second signature, the stopping criterion within the Options
o
is updated.
To see which symbol updates which value, see the specific stopping criteria. They should use dispatch per symbol value (the third signature).
Literature
- [Absil, Mahony, Sepulchre, 2008]
Absil, Pierre-Antoine and Mahony, Robert and Sepulchre, Rodolphe:
Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008. doi: 10.1515/9781400830244