Steihaug-Toint Truncated Conjugate-Gradient Method
The aim is to solve the trust-region subproblem
on a manifold by using the Steihaug-Toint truncated conjugate-gradient method. All terms involving the trust-region radius use an inner product w.r.t. the preconditioner; this is because the iterates grow in length w.r.t. the preconditioner, guaranteeing that we do not re-enter the trust-region.
Initialization
Initialize $\eta_0 = \eta$ if using randomized approach else $\eta_0$=
zeroTVector
(M,x)
, $r_0 = \nabla F(x)$, $z_0 = \operatorname{P}(r_0)$, $\delta_0 = z_0$ and $k=0$
Iteration
Repeat until a convergence criterion is reached
- Set $\kappa = \langle \delta_k, \operatorname{Hess}[F] (\delta_k)_ {x} \rangle_{x}$, $\alpha =\frac{\langle r_k, z_k \rangle_{x}}{\kappa}$ and $\langle \eta_k, \eta_k \rangle_{x}^{* } = \langle \eta_k, \operatorname{P}(\eta_k) \rangle_{x} + 2\alpha \langle \eta_k, \operatorname{P}(\delta_k) \rangle_{x} + {\alpha}^2 \langle \delta_k, \operatorname{P}(\delta_k) \rangle_{x}$.
- If $\kappa \leqq 0$ or $\langle \eta_k, \eta_k \rangle_{x}^{* } \geqq {\Delta}^2$ return $\eta_{k+1} = \eta_k + \tau \delta_k$ and stop.
- Set $\eta_{k}^{* }= \eta_k + \alpha \delta_k$, if $\langle \eta_k, \eta_k \rangle_{x} + \frac{1}{2} \langle \eta_k, \operatorname{Hess}[F] (\eta_k)_ {x} \rangle_{x} \leqq \langle \eta_k^{* }, \eta_k^{* } \rangle_{x} + \frac{1}{2} \langle \eta_k^{* }, \operatorname{Hess}[F] (\eta_k)_ {x} \rangle_{x}$ set $\eta_{k+1} = \eta_k$ else set $\eta_{k+1} = \eta_{k}^{* }$.
- Set $r_{k+1} = r_k + \alpha \operatorname{Hess}[F] (\delta_k)_ {x}$, $z_{k+1} = \operatorname{P}(r_{k+1})$, $\beta = \frac{\langle r_{k+1}, z_{k+1} \rangle_{x}}{\langle r_k, z_k \rangle_{x}}$ and $\delta_{k+1} = -z_{k+1} + \beta \delta_k$.
- Set $k=k+1$.
Result
The result is given by the last computed $η_k$.
Remarks
The $\operatorname{P}(\cdot)$ denotes the symmetric, positive definite preconditioner. It is required if a randomized approach is used i.e. using a random tangent vector $\eta$=
randomTVector
(M,x)
as initial vector. The idea behind it is to avoid saddle points. Preconditioning is simply a rescaling of the variables and thus a redefinition of the shape of the trust region. Ideally $\operatorname{P}(\cdot)$ is a cheap, positive approximation of the inverse of the Hessian of $F$ at $x$. On default, the preconditioner is just the identity.
To step number 2: Obtain $\tau$ from the positive root of $\left\lVert \eta_k + \tau \delta_k \right\rVert_{\operatorname{P}, x} = \Delta$ what becomes after the conversion of the equation to
It can occur that $\langle \delta_k, \operatorname{Hess}[F] (\delta_k)_ {x} \rangle_{x} = \kappa \leqq 0$ at iteration $k$. In this case, the model is not strictly convex, and the stepsize $\alpha =\frac{\langle r_k, z_k \rangle_{x}} {\kappa}$ computed in step 1. does not give a reduction in the modelfunction $m_{x}(\cdot)$. Indeed, $m_{x}(\cdot)$ is unbounded from below along the line $\eta_k + \alpha \delta_k$. If our aim is to minimize the model within the trust-region, it makes far more sense to reduce $m_{x}(\cdot)$ along $\eta_k + \alpha \delta_k$ as much as we can while staying within the trust-region, and this means moving to the trust-region boundary along this line. Thus when $\kappa \leqq 0$ at iteration k, we replace $\alpha = \frac{\langle r_k, z_k \rangle_{x}}{\kappa}$ with $\tau$ described as above. The other possibility is that $\eta_{k+1}$ would lie outside the trust-region at iteration k (i.e. $\langle \eta_k, \eta_k \rangle_{x}^{* } \geqq {\Delta}^2$ what can be identified with the norm of $\eta_{k+1}$). In particular, when $\operatorname{Hess}[F] (\cdot)_ {x}$ is positive definite and $\eta_{k+1}$ lies outside the trust region, the solution to the trust-region problem must lie on the trust-region boundary. Thus, there is no reason to continue with the conjugate gradient iteration, as it stands, as subsequent iterates will move further outside the trust-region boundary. A sensible strategy, just as in the case considered above, is to move to the trust-region boundary by finding $\tau$.
Interface
Manopt.truncatedConjugateGradient
— Function.truncatedConjugateGradient(M, F, ∇F, x, η, H, Δ)
solve the trust-region subproblem
with the Steihaug-Toint truncated conjugate-gradient method. For a description of the algorithm and theorems offering convergence guarantees, see the reference:
- P.-A. Absil, C.G. Baker, K.A. Gallivan, Trust-region methods on Riemannian manifolds, FoCM, 2007. doi: 10.1007/s10208-005-0179-9
- A. R. Conn, N. I. M. Gould, P. L. Toint, Trust-region methods, SIAM, MPS, 2000. doi: 10.1137/1.9780898719857
Input
M
– a manifold $\mathcal M$F
– a cost function $F\colon\mathcal M\to\mathbb R$ to minimize∇F
– the gradient $\nabla F\colon\mathcal M\to T\mathcal M$ of Fx
– a point on the manifold $x\in\mathcal M$η
– an update tangential vector $\eta\in\mathcal{T_{x}M}$H
– the hessian $H( \mathcal M, x, \xi)$ of FΔ
– a trust-region radius
Optional
preconditioner
– a preconditioner for the hessian Hθ
– 1+θ is the superlinear convergence target rate. The algorithm will terminate early if the residual was reduced by a power of 1+theta.κ
– the linear convergence target rate: algorithm will terminate early if the residual was reduced by a factor of kappa.useRandom
– set to true if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.stoppingCriterion
– (stopWhenAny
,stopAfterIteration
,stopIfResidualIsReducedByFactor
,stopIfResidualIsReducedByPower
,stopWhenCurvatureIsNegative
,stopWhenTrustRegionIsExceeded
) a functor inheriting fromStoppingCriterion
indicating when to stop, where for the default, the maximal number of iterations ismanifoldDimension
, the power factor isθ
, the reduction factor isκ
. .returnOptions
– (false
) – if actiavated, the extended result, i.e. the completeOptions
re returned. This can be used to access recorded values. If set to false (default) just the optimal valuexOpt
is returned
and the ones that are passed to decorateOptions
for decorators.
Output
η
– an approximate solution of the trust-region subproblem in $\mathcal{T_{x}M}$.
OR
options
- the options returned by the solver (seereturnOptions
)
see also
Options
TruncatedConjugateGradientOptions <: HessianOptions
describe the Steihaug-Toint truncated conjugate-gradient method, with
Fields
a default value is given in brackets if a parameter can be left out in initialization.
x
: aMPoint
, where the trust-region subproblem needs to be solvedstop
: a function s,r = @(o,iter,ξ,x,xnew) returning a stop indicator and a reason based on an iteration number, the gradient and the last and current iteratesη
: aTVector
(called update vector), which solves the trust-region subproblem after successful calculation by the algorithmδ
: search directionΔ
: the trust-region radiusresidual
: the gradientuseRand
: indicates if the trust-region solve and so the algorithm is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
Constructor
TruncatedConjugateGradientOptions(x, stop, eta, delta, Delta, res, uR)
construct a truncated conjugate-gradient Option with the fields as above.
See also
Additional Stopping Criteria
stopIfResidualIsReducedByPower <: StoppingCriterion
A functor for testing if the norm of residual at the current iterate is reduced by a power of 1+θ compared to the norm of the initial residual, i.e. $\Vert r_k \Vert_x \leqq \Vert r_0 \Vert_{x}^{1+\theta}$. In this case the algorithm reached superlinear convergence.
Fields
θ
– part of the reduction powerinitialResidualNorm
- stores the norm of the residual at the initial vector $\eta$ of the Steihaug-Toint tcg mehtodtruncatedConjugateGradient
reason
– stores a reason of stopping if the stopping criterion has one be reached, seegetReason
.
Constructor
stopIfResidualIsReducedByPower(iRN, θ)
initialize the stopIfResidualIsReducedByFactor functor to indicate to stop after the norm of the current residual is lesser than the norm of the initial residual iRN to the power of 1+θ.
See also
stopIfResidualIsReducedByFactor <: StoppingCriterion
A functor for testing if the norm of residual at the current iterate is reduced by a factor compared to the norm of the initial residual, i.e. $\Vert r_k \Vert_x \leqq \kappa \Vert r_0 \Vert_x$. In this case the algorithm reached linear convergence.
Fields
κ
– the reduction factorinitialResidualNorm
- stores the norm of the residual at the initial vector $\eta$ of the Steihaug-Toint tcg mehtodtruncatedConjugateGradient
reason
– stores a reason of stopping if the stopping criterion has one be reached, seegetReason
.
Constructor
stopIfResidualIsReducedByFactor(iRN, κ)
initialize the stopIfResidualIsReducedByFactor functor to indicate to stop after the norm of the current residual is lesser than the norm of the initial residual iRN times κ.
See also
stopWhenTrustRegionIsExceeded <: StoppingCriterion
A functor for testing if the norm of the next iterate in the Steihaug-Toint tcg mehtod is larger than the trust-region radius, i.e. $\Vert η_{k}^{*} \Vert_x ≧ Δ$. terminate the algorithm when the trust region has been left.
Fields
reason
– stores a reason of stopping if the stopping criterion has one be reached, seegetReason
.storage
– stores the necessary parametersη, δ, residual
to check the criterion.
Constructor
stopWhenTrustRegionIsExceeded([a])
initialize the stopWhenTrustRegionIsExceeded functor to indicate to stop after the norm of the next iterate is greater than the trust-region radius using the StoreOptionsAction
a
, which is initialized to store :η, :δ, :residual
by default.
See also
stopWhenCurvatureIsNegative <: StoppingCriterion
A functor for testing if the curvature of the model is negative, i.e. $\langle \delta_k, \operatorname{Hess}[F](\delta_k)\rangle_x \leqq 0$. In this case, the model is not strictly convex, and the stepsize as computed does not give a reduction of the model.
Fields
reason
– stores a reason of stopping if the stopping criterion has one be reached, seegetReason
.storage
– stores the necessary parameterδ
to check the criterion.
Constructor
stopWhenCurvatureIsNegative([a])
initialize the stopWhenCurvatureIsNegative functor to indicate to stop after the inner product of the search direction and the hessian applied on the search dircetion is less than zero using the StoreOptionsAction
a
, which is initialized to just store :δ
by default.
See also