# Steihaug-Toint Truncated Conjugate-Gradient Method

The aim is to solve the trust-region subproblem

$$\operatorname*{arg\,min}_{\eta \in \mathcal{T_{x}M}} m_{x}(\eta) = F(x) + \langle \nabla F(x), \eta \rangle_{x} + \frac{1}{2} \langle \operatorname{Hess}[F](\eta)_ {x}, \eta \rangle_{x} \quad\text{s.t.}\quad \langle \eta, \eta \rangle_{x} \leqq {\Delta}^2$$

on a manifold by using the Steihaug-Toint truncated conjugate-gradient method. All terms involving the trust-region radius use an inner product w.r.t. the preconditioner; this is because the iterates grow in length w.r.t. the preconditioner, which guarantees that we do not re-enter the trust-region once we have left it.

## Initialization

Initialize $\eta_0 = \eta$ if using the randomized approach, and $\eta_0 = 0$ (the zero tangent vector) otherwise; set $r_0 = \nabla F(x)$, $z_0 = \operatorname{P}(r_0)$, $\delta_0 = -z_0$ and $k=0$.

## Iteration

Repeat until a convergence criterion is reached

- Set $\kappa = \langle \delta_k, \operatorname{Hess}[F] (\delta_k)_ {x} \rangle_{x}$, $\alpha =\frac{\langle r_k, z_k \rangle_{x}}{\kappa}$ and $\langle \eta_k, \eta_k \rangle_{x}^{* } = \langle \eta_k, \operatorname{P}(\eta_k) \rangle_{x} + 2\alpha \langle \eta_k, \operatorname{P}(\delta_k) \rangle_{x} + {\alpha}^2 \langle \delta_k, \operatorname{P}(\delta_k) \rangle_{x}$.
- If $\kappa \leqq 0$ or $\langle \eta_k, \eta_k \rangle_{x}^{* } \geqq {\Delta}^2$, return $\eta_{k+1} = \eta_k + \tau \delta_k$, with $\tau$ the positive root described in the remarks below, and stop.
- Set $\eta_{k}^{* }= \eta_k + \alpha \delta_k$; if $\langle \eta_k, \nabla F(x) \rangle_{x} + \frac{1}{2} \langle \eta_k, \operatorname{Hess}[F] (\eta_k)_ {x} \rangle_{x} \leqq \langle \eta_k^{* }, \nabla F(x) \rangle_{x} + \frac{1}{2} \langle \eta_k^{* }, \operatorname{Hess}[F] (\eta_k^{* })_ {x} \rangle_{x}$, i.e. the candidate does not improve the model value $m_{x}(\cdot)$ (up to the constant $F(x)$), set $\eta_{k+1} = \eta_k$; else set $\eta_{k+1} = \eta_{k}^{* }$.
- Set $r_{k+1} = r_k + \alpha \operatorname{Hess}[F] (\delta_k)_ {x}$, $z_{k+1} = \operatorname{P}(r_{k+1})$, $\beta = \frac{\langle r_{k+1}, z_{k+1} \rangle_{x}}{\langle r_k, z_k \rangle_{x}}$ and $\delta_{k+1} = -z_{k+1} + \beta \delta_k$.
- Set $k=k+1$.
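To make the recursion concrete, here is a minimal, self-contained sketch in plain Julia, with Euclidean vectors standing in for tangent vectors; `Hv` (Hessian application) and `P` (preconditioner) are placeholder names, not Manopt API, and the model-comparison safeguard of step 3 (only relevant for randomized initialization) is omitted.

```julia
using LinearAlgebra

# Sketch of the truncated CG loop; Euclidean stand-in for the manifold case.
# grad: ∇F(x); Hv: ξ -> Hess F(x)[ξ]; P: preconditioner; Δ: trust-region radius.
function truncated_cg(grad, Hv, P, Δ; maxiter=length(grad), tol=1e-10)
    η = zero(grad)            # η₀ = 0 (non-randomized start)
    r = copy(grad)            # r₀ = ∇F(x)
    z = P(r)                  # z₀ = P(r₀)
    δ = -z                    # δ₀ = -z₀
    rz = dot(r, z)
    for k in 1:maxiter
        Hδ = Hv(δ)
        κ = dot(δ, Hδ)        # curvature of the model along δ
        α = rz / κ
        ηPη, ηPδ, δPδ = dot(η, P(η)), dot(η, P(δ)), dot(δ, P(δ))
        # step 2: negative curvature, or the candidate leaves the trust-region
        if κ <= 0 || ηPη + 2α * ηPδ + α^2 * δPδ >= Δ^2
            τ = (-ηPδ + sqrt(ηPδ^2 + δPδ * (Δ^2 - ηPη))) / δPδ
            return η + τ * δ  # stop on the trust-region boundary
        end
        η += α * δ            # accept the interior step
        r += α * Hδ           # update the residual
        norm(r) <= tol && return η
        z = P(r)
        rz_new = dot(r, z)
        δ = -z + rz_new / rz * δ   # β = ⟨r₊, z₊⟩ / ⟨r, z⟩
        rz = rz_new
    end
    return η
end

# Example: quadratic model with A ≻ 0 and identity preconditioner.
A = [4.0 1.0; 1.0 3.0]
η_sol = truncated_cg([1.0, 2.0], ξ -> A * ξ, r -> copy(r), 1.0)
```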

## Result

The result is given by the last computed $\eta_k$.

## Remarks

$\operatorname{P}(\cdot)$ denotes a symmetric, positive definite preconditioner. Ideally, $\operatorname{P}(\cdot)$ is a cheap, positive definite approximation of the inverse of the Hessian of $F$ at $x$; preconditioning is simply a rescaling of the variables and thus a redefinition of the shape of the trust region. By default, the preconditioner is the identity. When the randomized approach is used, i.e. when a random tangent vector $\eta$ serves as the initial vector, no preconditioner is used; the idea behind randomization is to escape saddle points.
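For illustration, in the Euclidean sketch above a cheap preconditioner could approximate the inverse Hessian by its safeguarded diagonal; `hess_diag` is a hypothetical vector of diagonal Hessian estimates, not a Manopt interface:

```julia
# Hypothetical diagonal preconditioner: rescale by the reciprocal of
# (safeguarded, positive) diagonal Hessian estimates, a cheap
# approximation of the inverse Hessian.
make_diag_preconditioner(hess_diag) = r -> r ./ max.(hess_diag, 1e-8)

P = make_diag_preconditioner([4.0, 3.0])  # e.g. diag(A) from the sketch above
```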

Regarding step 2 of the iteration: obtain $\tau$ as the positive root of $\left\lVert \eta_k + \tau \delta_k \right\rVert_{\operatorname{P}, x} = \Delta$, which, after squaring and solving the resulting quadratic equation, becomes

$$\tau = \frac{-\langle \eta_k, \operatorname{P}(\delta_k) \rangle_{x} + \sqrt{\langle \eta_k, \operatorname{P}(\delta_k) \rangle_{x}^2 + \langle \delta_k, \operatorname{P}(\delta_k) \rangle_{x} \left( {\Delta}^2 - \langle \eta_k, \operatorname{P}(\eta_k) \rangle_{x} \right)}}{\langle \delta_k, \operatorname{P}(\delta_k) \rangle_{x}}.$$

It can occur that $\langle \delta_k, \operatorname{Hess}[F] (\delta_k)_ {x} \rangle_{x} = \kappa \leqq 0$ at iteration $k$. In this case, the model is not strictly convex, and the stepsize $\alpha =\frac{\langle r_k, z_k \rangle_{x}} {\kappa}$ computed in step 1 does not give a reduction in the model function $m_{x}(\cdot)$. Indeed, $m_{x}(\cdot)$ is unbounded from below along the line $\eta_k + \alpha \delta_k$. If the aim is to minimize the model within the trust-region, it makes far more sense to reduce $m_{x}(\cdot)$ along $\eta_k + \alpha \delta_k$ as much as possible while staying within the trust-region, and this means moving to the trust-region boundary along this line. Thus, when $\kappa \leqq 0$ at iteration $k$, $\alpha = \frac{\langle r_k, z_k \rangle_{x}}{\kappa}$ is replaced by the $\tau$ described above.

The other possibility is that $\eta_{k+1}$ would lie outside the trust-region at iteration $k$, i.e. $\langle \eta_k, \eta_k \rangle_{x}^{* } \geqq {\Delta}^2$, where $\langle \eta_k, \eta_k \rangle_{x}^{* }$ can be identified with the squared norm of $\eta_{k+1}$. In particular, when $\operatorname{Hess}[F] (\cdot)_ {x}$ is positive definite and $\eta_{k+1}$ lies outside the trust region, the solution of the trust-region problem must lie on the trust-region boundary. Thus, there is no reason to continue with the conjugate gradient iteration as it stands, since subsequent iterates would move further outside the trust-region boundary. A sensible strategy, just as in the case considered above, is to move to the trust-region boundary by finding $\tau$.
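For concreteness, the boundary step from the remark above, computed from the three preconditioned inner products (a sketch matching the Euclidean stand-in used earlier):

```julia
# Positive root τ of ⟨η+τδ, P(η+τδ)⟩ = Δ², i.e. of the quadratic
# δPδ·τ² + 2·ηPδ·τ + (ηPη − Δ²) = 0.
boundary_step(ηPη, ηPδ, δPδ, Δ) = (-ηPδ + sqrt(ηPδ^2 + δPδ * (Δ^2 - ηPη))) / δPδ
```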

## Interface

`Manopt.truncated_conjugate_gradient_descent` — Function

`truncated_conjugate_gradient_descent(M, F, ∇F, x, η, H, Δ)`

solve the trust-region subproblem (stated at the top of this section) with the Steihaug-Toint truncated conjugate-gradient method. For a description of the algorithm and theorems offering convergence guarantees, see the references:

- P.-A. Absil, C.G. Baker, K.A. Gallivan, Trust-region methods on Riemannian manifolds, FoCM, 2007. doi: 10.1007/s10208-005-0179-9
- A. R. Conn, N. I. M. Gould, P. L. Toint, Trust-region methods, SIAM, MPS, 2000. doi: 10.1137/1.9780898719857

**Input**

- `M` – a manifold $\mathcal M$
- `F` – a cost function $F\colon\mathcal M\to\mathbb R$ to minimize
- `∇F` – the gradient $\nabla F\colon\mathcal M\to T\mathcal M$ of $F$
- `x` – a point on the manifold $x ∈ \mathcal M$
- `η` – an update tangent vector $\eta ∈ \mathcal{T_{x}M}$
- `H` – the Hessian $H(\mathcal M, x, \xi)$ of $F$
- `Δ` – a trust-region radius

**Optional**

- `preconditioner` – a preconditioner for the Hessian `H`
- `θ` – $1+\theta$ is the superlinear convergence target rate. The algorithm terminates early if the norm of the residual is reduced by a power of $1+\theta$.
- `κ` – the linear convergence target rate: the algorithm terminates early if the norm of the residual is reduced by the factor $\kappa$.
- `useRandom` – set to true if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner is used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
- `stopping_criterion` – (`StopWhenAny`, `StopAfterIteration`, `stopIfResidualIsReducedByFactor`, `stopIfResidualIsReducedByPower`, `StopWhenCurvatureIsNegative`, `StopWhenTrustRegionIsExceeded`) a functor inheriting from `StoppingCriterion` indicating when to stop, where by default the maximal number of iterations is set to the dimension of the manifold, the power factor is `θ`, and the reduction factor is `κ`.
- `return_options` – (`false`) if activated, the extended result, i.e. the complete `Options`, are returned. This can be used to access recorded values. If set to false (default), just the optimal value `xOpt` is returned.

and the ones that are passed to `decorate_options` for decorators.

**Output**

- `η` – an approximate solution of the trust-region subproblem in $\mathcal{T_{x}M}$.

OR

- `options` – the options returned by the solver (see `return_options`)
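As a usage illustration, a hypothetical call following the positional signature above; the argument forms of `F`, `∇F`, and `H` and the tangent-vector representation are assumptions that depend on the Manopt/Manifolds versions, so treat this as a sketch rather than verified API usage:

```julia
using Manopt, Manifolds, LinearAlgebra

M = Sphere(2)                                  # unit sphere S² ⊂ ℝ³
A = Symmetric([2.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 4.0])
F(x) = x' * A * x                              # Rayleigh-quotient-type cost (assumed signature)
∇F(x) = 2 * (A * x - (x' * A * x) * x)         # Riemannian gradient on S² (assumed signature)
H(x, ξ) = 2 * (A * ξ - (x' * A * ξ) * x) - 2 * (x' * A * x) * ξ  # Riemannian Hessian (assumed)
x = [1.0, 0.0, 0.0]                            # a point on S²
η = zeros(3)                                   # zero tangent vector at x (assumed representation)
Δ = 1.0                                        # trust-region radius

η_opt = truncated_conjugate_gradient_descent(M, F, ∇F, x, η, H, Δ)
```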

**See also**

`trust_regions`

## Options

`Manopt.TruncatedConjugateGradientOptions` — Type

`TruncatedConjugateGradientOptions <: HessianOptions`

describe the Steihaug-Toint truncated conjugate-gradient method, with

**Fields**

A default value is given in brackets if a parameter can be left out in initialization.

- `x` : a point, where the trust-region subproblem needs to be solved
- `stop` : a function `(o, iter, ξ, x, xnew) -> (s, r)` returning a stop indicator `s` and a reason `r`, based on an iteration number, the gradient, and the last and current iterates
- `η` : a tangent vector (called update vector), which solves the trust-region subproblem after successful calculation by the algorithm
- `δ` : the search direction
- `Δ` : the trust-region radius
- `residual` : the residual (initialized to the gradient)
- `useRand` : indicates if the trust-region solve, and thus the algorithm, is to be initiated with a random tangent vector. If set to true, no preconditioner is used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.

**Constructor**

`TruncatedConjugateGradientOptions(x, stop, eta, delta, Delta, res, uR)`

construct a truncated conjugate-gradient Options object with the fields as above.

**See also**

`truncated_conjugate_gradient_descent`, `trust_regions`

## Additional Stopping Criteria

`Manopt.stopIfResidualIsReducedByPower` — Type

`stopIfResidualIsReducedByPower <: StoppingCriterion`

A functor for testing if the norm of the residual at the current iterate is reduced by a power of $1+\theta$ compared to the norm of the initial residual, i.e. $\Vert r_k \Vert_x \leqq \Vert r_0 \Vert_{x}^{1+\theta}$. In this case the algorithm has reached superlinear convergence.

**Fields**

- `θ` – part of the reduction power
- `initialResidualNorm` – stores the norm of the residual at the initial vector $\eta$ of the Steihaug-Toint tcg method `truncated_conjugate_gradient_descent`
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.

**Constructor**

`stopIfResidualIsReducedByPower(iRN, θ)`

initialize the stopIfResidualIsReducedByPower functor to indicate to stop after the norm of the current residual is less than the norm of the initial residual `iRN` to the power of $1+θ$.

**See also**

`truncated_conjugate_gradient_descent`, `trust_regions`

`Manopt.stopIfResidualIsReducedByFactor` — Type

`stopIfResidualIsReducedByFactor <: StoppingCriterion`

A functor for testing if the norm of the residual at the current iterate is reduced by a factor compared to the norm of the initial residual, i.e. $\Vert r_k \Vert_x \leqq \kappa \Vert r_0 \Vert_x$. In this case the algorithm has reached linear convergence.

**Fields**

- `κ` – the reduction factor
- `initialResidualNorm` – stores the norm of the residual at the initial vector $\eta$ of the Steihaug-Toint tcg method `truncated_conjugate_gradient_descent`
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.

**Constructor**

`stopIfResidualIsReducedByFactor(iRN, κ)`

initialize the stopIfResidualIsReducedByFactor functor to indicate to stop after the norm of the current residual is less than the norm of the initial residual `iRN` times $κ$.
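As an illustration, the two residual-based criteria might be combined with an iteration cap into one stopping rule, using the generic criteria named in the interface above; the placeholder values and the composition shown are assumptions sketching the intended use, not verified API:

```julia
using Manopt, Manifolds

M = Sphere(2)              # example manifold (assumed)
iRN = 1.0                  # norm of the initial residual (placeholder value)

# Hypothetical combination: stop at the manifold dimension, on linear
# convergence (factor κ = 0.1), or on superlinear convergence (power 1+θ = 2).
sc = StopWhenAny(
    StopAfterIteration(manifold_dimension(M)),
    stopIfResidualIsReducedByFactor(iRN, 0.1),
    stopIfResidualIsReducedByPower(iRN, 1.0),
)
```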

**See also**

`truncated_conjugate_gradient_descent`, `trust_regions`

`Manopt.StopWhenTrustRegionIsExceeded` — Type

`StopWhenTrustRegionIsExceeded <: StoppingCriterion`

A functor for testing if the norm of the next iterate in the Steihaug-Toint tcg method is larger than the trust-region radius, i.e. $\Vert \eta_{k}^{*} \Vert_x \geqq \Delta$, and terminating the algorithm when the trust region has been left.

**Fields**

- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.
- `storage` – stores the necessary parameters `η, δ, residual` to check the criterion.

**Constructor**

`StopWhenTrustRegionIsExceeded([a])`

initialize the StopWhenTrustRegionIsExceeded functor to indicate to stop after the norm of the next iterate is greater than the trust-region radius, using the `StoreOptionsAction` `a`, which is initialized to store `:η, :δ, :residual` by default.

**See also**

`truncated_conjugate_gradient_descent`, `trust_regions`

`Manopt.StopWhenCurvatureIsNegative` — Type

`StopWhenCurvatureIsNegative <: StoppingCriterion`

A functor for testing if the curvature of the model is negative, i.e. $\langle \delta_k, \operatorname{Hess}[F](\delta_k)\rangle_x \leqq 0$. In this case, the model is not strictly convex, and the stepsize as computed does not give a reduction of the model.

**Fields**

- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.
- `storage` – stores the necessary parameter `δ` to check the criterion.

**Constructor**

`StopWhenCurvatureIsNegative([a])`

initialize the StopWhenCurvatureIsNegative functor to indicate to stop after the inner product of the search direction and the Hessian applied to the search direction is less than zero, using the `StoreOptionsAction` `a`, which is initialized to just store `:δ` by default.

**See also**

`truncated_conjugate_gradient_descent`, `trust_regions`