# Steihaug-Toint Truncated Conjugate-Gradient Method

The aim is to solve the trust-region subproblem

\[\operatorname*{arg\,min}_{η ∈ T_{x}\mathcal{M}} m_{x}(η), \quad\text{where}\quad m_{x}(η) = F(x) + ⟨\operatorname{grad}F(x), η⟩_{x} + \frac{1}{2} ⟨\mathcal{H}[η], η⟩_{x},\]

\[\text{s.t.} \; ⟨η, η⟩_{x} \leq {Δ}^2\]

on a manifold by using the Steihaug-Toint truncated conjugate-gradient method, abbreviated tCG-method. All terms involving the trust-region radius use an inner product w.r.t. the preconditioner; this is because the lengths of the iterates grow monotonically w.r.t. the preconditioner norm, guaranteeing that, once the iterates leave the trust region, they do not re-enter it.

## Initialization

Initialize $η_0 = η$ if using the randomized approach, and $η_0$ as the zero tangent vector otherwise; further set $r_0 = \operatorname{grad}F(x)$, $z_0 = \operatorname{P}(r_0)$, $δ_0 = -z_0$, and $k=0$.

## Iteration

Repeat until a convergence criterion is reached

- Set $α =\frac{⟨r_k, z_k⟩_x}{⟨δ_k, \mathcal{H}[δ_k]⟩_x}$ and $⟨η_k, η_k⟩_{x}^* = ⟨η_k, \operatorname{P}(η_k)⟩_x + 2α ⟨η_k, \operatorname{P}(δ_k)⟩_{x} + {α}^2 ⟨δ_k, \operatorname{P}(δ_k)⟩_{x}$.
- If $⟨δ_k, \mathcal{H}[δ_k]⟩_x ≤ 0$ or $⟨η_k, η_k⟩_x^* ≥ Δ^2$, return $η_{k+1} = η_k + τ δ_k$, where $τ$ is the positive root described in the remarks below, and stop.
- Set $η_{k}^{*} = η_k + α δ_k$. If $⟨\operatorname{grad}F(x), η_k⟩_{x} + \frac{1}{2} ⟨η_k, \mathcal{H}[η_k]⟩_{x} ≤ ⟨\operatorname{grad}F(x), η_k^{*}⟩_{x} + \frac{1}{2} ⟨η_k^{*}, \mathcal{H}[η_k^{*}]⟩_{x}$, i.e. the model value did not decrease, set $η_{k+1} = η_k$; otherwise set $η_{k+1} = η_{k}^{*}$.
- Set $r_{k+1} = r_k + α \mathcal{H}[δ_k]$, $z_{k+1} = \operatorname{P}(r_{k+1})$, $β = \frac{⟨r_{k+1}, z_{k+1}⟩_{x}}{⟨r_k, z_k ⟩_{x}}$ and $δ_{k+1} = -z_{k+1} + β δ_k$.
- Set $k=k+1$.
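The iteration above can be sketched in the flat case $\mathcal M = ℝ^n$, where tangent vectors are plain vectors and $⟨⋅,⋅⟩_x$ is the dot product, with the preconditioner $\operatorname{P}$ taken as the identity. This is only an illustration, not the Manopt.jl implementation; the name `tcg` and its defaults are made up for the example, and the model-decrease safeguard of step 3 is omitted for brevity.

```python
import numpy as np

def tcg(grad, hess_mv, Delta, max_iter=200, kappa=0.1, theta=1.0):
    """Steihaug-Toint truncated CG for min <g,eta> + 0.5<H eta,eta>, ||eta|| <= Delta."""
    eta = np.zeros_like(grad)
    r = grad.copy()              # r_0 = grad F(x)
    z = r                        # z_0 = P(r_0); here P is the identity
    delta = -z                   # delta_0 = -z_0
    rz = r @ z
    norm_r0 = np.sqrt(rz)

    def to_boundary(eta, delta):
        # positive root tau of ||eta + tau * delta|| = Delta
        ed, dd, ee = eta @ delta, delta @ delta, eta @ eta
        return (-ed + np.sqrt(ed**2 + dd * (Delta**2 - ee))) / dd

    for _ in range(max_iter):
        Hd = hess_mv(delta)
        dHd = delta @ Hd
        if dHd <= 0:             # negative curvature: move to the boundary
            return eta + to_boundary(eta, delta) * delta
        alpha = rz / dHd
        eta_new = eta + alpha * delta
        if eta_new @ eta_new >= Delta**2:   # step would leave the trust region
            return eta + to_boundary(eta, delta) * delta
        eta = eta_new
        r = r + alpha * Hd
        z = r                    # z_{k+1} = P(r_{k+1})
        rz_new = r @ z
        # kappa/theta residual criterion (see the remarks below)
        if np.sqrt(rz_new) <= norm_r0 * min(kappa, norm_r0**theta):
            return eta
        delta = -z + (rz_new / rz) * delta
        rz = rz_new
    return eta
```

For example, with $\mathcal{H} = \operatorname{diag}(2, 3)$, gradient $(1,1)$ and a large radius, the sketch returns the Newton step $-\mathcal{H}^{-1}\operatorname{grad}F(x) = (-1/2, -1/3)$; with a small radius it stops on the trust-region boundary.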

## Result

The result is given by the last computed $η_k$.

## Remarks

Here, $\operatorname{P}(⋅)$ denotes the symmetric, positive definite preconditioner. Preconditioning is simply a rescaling of the variables and thus a redefinition of the shape of the trust region. Ideally, $\operatorname{P}(⋅)$ is a cheap, positive approximation of the inverse of the Hessian of $F$ at $x$; by default, the preconditioner is the identity. The randomized approach, i.e. using a random tangent vector $η_0$ as the initial vector, is intended to help the method escape saddle points; in that case, no preconditioner is used.
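To illustrate why a preconditioner approximating the inverse Hessian helps, the following flat-case ($ℝ^n$) experiment compares preconditioned CG iteration counts on an ill-conditioned diagonal quadratic. The code and names are illustrative assumptions for the demo, not Manopt.jl API.

```python
import numpy as np

def cg_steps(h_diag, g, p_diag, tol=1e-10, max_iter=1000):
    """Preconditioned CG on diag(h_diag) * eta = -g; returns iterations needed."""
    eta = np.zeros_like(g)
    r = g.copy()
    z = p_diag * r               # z = P(r), with diagonal preconditioner P
    delta = -z
    rz = r @ z
    for k in range(max_iter):
        Hd = h_diag * delta
        alpha = rz / (delta @ Hd)
        eta = eta + alpha * delta
        r = r + alpha * Hd
        if np.linalg.norm(r) <= tol:
            return k + 1
        z = p_diag * r
        rz_new = r @ z
        delta = -z + (rz_new / rz) * delta
        rz = rz_new
    return max_iter

h = np.logspace(0, 4, 50)            # ill-conditioned diagonal Hessian
g = np.ones(50)
plain = cg_steps(h, g, np.ones(50))  # P = identity
precond = cg_steps(h, g, 1.0 / h)    # P = exact inverse Hessian
```

With the exact inverse Hessian as preconditioner the system is solved in a single CG step, while the unpreconditioned run needs many more; a cheap approximation of the inverse Hessian interpolates between these extremes.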

To step number 2: obtain $τ$ as the positive root of $\left\lVert η_k + τ δ_k \right\rVert_{\operatorname{P}, x} = Δ$, which, after expanding and solving the resulting quadratic equation, becomes

\[ τ = \frac{-⟨η_k, \operatorname{P}(δ_k)⟩_{x} + \sqrt{⟨η_k, \operatorname{P}(δ_k)⟩_{x}^{2} + ⟨δ_k, \operatorname{P}(δ_k)⟩_{x} ( Δ^2 - ⟨η_k, \operatorname{P}(η_k)⟩_{x})}} {⟨δ_k, \operatorname{P}(δ_k)⟩_{x}}.\]
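In the flat case, this root can be computed directly from the three inner products. A small helper (illustrative; `P` is passed as an optional matrix standing in for $\operatorname{P}(⋅)$, defaulting to the identity):

```python
import numpy as np

def boundary_step(eta, delta, Delta, P=None):
    """Positive root tau of ||eta + tau * delta||_P = Delta (P = identity if None)."""
    Pd = delta if P is None else P @ delta
    Pe = eta if P is None else P @ eta
    ed = eta @ Pd      # <eta, P(delta)>
    dd = delta @ Pd    # <delta, P(delta)>
    ee = eta @ Pe      # <eta, P(eta)>
    return (-ed + np.sqrt(ed**2 + dd * (Delta**2 - ee))) / dd
```

For instance, with $η_k = (0.1, 0)$, $δ_k = (1, 0)$, $Δ = 0.5$ and $\operatorname{P} = I$, the root is $τ = 0.4$, and $η_k + τδ_k$ indeed has norm $Δ$.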

It can occur that $⟨δ_k, \mathcal{H}[δ_k]⟩_{x} = κ ≤ 0$ at iteration $k$. In this case, the model is not strictly convex, and the stepsize $α = \frac{⟨r_k, z_k⟩_{x}}{κ}$ computed in step 1 does not give a reduction in the model function $m_x(⋅)$. Indeed, $m_x(⋅)$ is unbounded from below along the line $η_k + α δ_k$. If our aim is to minimize the model within the trust region, it makes far more sense to reduce $m_x(⋅)$ along $η_k + α δ_k$ as much as we can while staying within the trust region, and this means moving to the trust-region boundary along this line. Thus, when $κ ≤ 0$ at iteration $k$, we replace $α = \frac{⟨r_k, z_k⟩_{x}}{κ}$ with the $τ$ described above. The other possibility is that $η_{k+1}$ would lie outside the trust region at iteration $k$, i.e. $⟨η_k, η_k⟩_{x}^{*} ≥ Δ^2$, where $⟨η_k, η_k⟩_{x}^{*}$ is the squared norm of $η_{k+1}$. In particular, when $\mathcal{H}[⋅]$ is positive definite and $η_{k+1}$ lies outside the trust region, the solution of the trust-region subproblem must lie on the trust-region boundary. Thus, there is no reason to continue with the conjugate-gradient iteration as it stands, since subsequent iterates will move further outside the trust-region boundary. A sensible strategy, just as in the case considered above, is to move to the trust-region boundary by finding the corresponding $τ$.

Although it is virtually impossible in practice to know how many iterations are necessary to provide a good estimate $η_{k}$ of the solution of the trust-region subproblem, the method stops after a certain number of iterations, which is realised by `StopAfterIteration`. In order to increase the convergence rate of the underlying trust-region method, see `trust_regions`, a typical stopping criterion is to stop as soon as an iteration $k$ is reached for which

\[ \Vert r_k \Vert_x \leq \Vert r_0 \Vert_x \min \left( \Vert r_0 \Vert^{θ}_x, κ \right)\]

holds, where $0 < κ < 1$ and $θ > 0$ are chosen in advance. This is realised in this method by `StopWhenResidualIsReducedByFactorOrPower`. It can be shown that, under appropriate conditions, the iterates $x_k$ of the underlying trust-region method converge to nondegenerate critical points with an order of convergence of at least $\min \left( θ + 1, 2 \right)$, see [Absil, Mahony, Sepulchre, 2008]. The method also aborts if the curvature of the model is negative, i.e. if $⟨δ_k, \mathcal{H}[δ_k]⟩_x \leq 0$, which is realised by `StopWhenCurvatureIsNegative`. If the next possible approximate solution $η_{k}^{*}$ calculated in iteration $k$ lies outside the trust region, i.e. if $\lVert η_{k}^{*} \rVert_x \geq Δ$, then the method aborts, which is realised by `StopWhenTrustRegionIsExceeded`. Furthermore, the method aborts if the new model value evaluated at $η_{k}^{*}$ is greater than the previous model value evaluated at $η_{k}$, which is realised by `StopWhenModelIncreased`.
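The residual criterion above is a plain comparison of norms, which can be restated directly (flat case, Euclidean norm; the function name is illustrative):

```python
import numpy as np

def residual_reduced(r, r0, kappa=0.1, theta=1.0):
    """True once ||r_k|| <= ||r_0|| * min(||r_0||^theta, kappa)."""
    n0 = np.linalg.norm(r0)
    return np.linalg.norm(r) <= n0 * min(n0**theta, kappa)
```

Near a minimizer $\Vert r_0 \Vert_x$ is small, so the power branch $\Vert r_0 \Vert_x^{θ}$ dominates and yields the superlinear target rate $θ + 1$; far away, the factor $κ$ enforces at least linear reduction.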

## Interface

`Manopt.truncated_conjugate_gradient_descent` — Function

`truncated_conjugate_gradient_descent(M, f, grad_f, p, η, Hess_f, trust_region_radius)`
solve the trust-region subproblem

\[\operatorname*{arg\,min}_{η ∈ T_p\mathcal M} m_p(η) \quad\text{where}\quad m_p(η) = f(p) + ⟨\operatorname{grad} f(p), η⟩_p + \frac{1}{2}⟨\operatorname{Hess} f(p)[η], η⟩_p,\]

\[\text{such that}\quad ⟨η, η⟩_p ≤ Δ^2\]

on a manifold M by using the Steihaug-Toint truncated conjugate-gradient method, abbreviated tCG-method. For a description of the algorithm and theorems offering convergence guarantees, see the references:

- P.-A. Absil, C.G. Baker, K.A. Gallivan, Trust-region methods on Riemannian manifolds, FoCM, 2007. doi: 10.1007/s10208-005-0179-9
- A. R. Conn, N. I. M. Gould, P. L. Toint, Trust-region methods, SIAM, MPS, 2000. doi: 10.1137/1.9780898719857

**Input**

- `M` – a manifold $\mathcal M$
- `f` – a cost function $f: \mathcal M → ℝ$ to minimize
- `grad_f` – the gradient $\operatorname{grad}f: \mathcal M → T\mathcal M$ of `f`
- `Hess_f` – the Hessian $\operatorname{Hess}f: T_p\mathcal M → T_p\mathcal M$, $X ↦ \operatorname{Hess}f(p)[X] = ∇_X\operatorname{grad}f(p)$
- `p` – a point on the manifold $p ∈ \mathcal M$
- `η` – an update tangent vector $η ∈ T_p\mathcal M$

**Optional**

- `evaluation` – (`AllocatingEvaluation`) specify whether the gradient and Hessian work by allocation (default) or `InplaceEvaluation` in place
- `preconditioner` – a preconditioner for the Hessian `H`
- `θ` – (`1.0`) $1+θ$ is the superlinear convergence target rate. The method aborts if the residual is less than or equal to the initial residual to the power of $1+θ$.
- `κ` – (`0.1`) the linear convergence target rate. The method aborts if the residual is less than or equal to $κ$ times the initial residual.
- `randomize` – set to true if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner is used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
- `trust_region_radius` – (`injectivity_radius(M)/4`) a trust-region radius
- `project!` – (`copyto!`) specify a projection operation for tangent vectors for numerical stability. A function `(M, Y, p, X) -> ...` working in place of `Y`. By default, no projection is performed; set it to `project!` to activate projection.
- `stopping_criterion` – (`StopAfterIteration` `|` `StopWhenResidualIsReducedByFactorOrPower` `|` `StopWhenCurvatureIsNegative` `|` `StopWhenTrustRegionIsExceeded`) a functor inheriting from `StoppingCriterion` indicating when to stop, where for the default, the maximal number of iterations is set to the dimension of the manifold, the power factor is `θ`, and the reduction factor is `κ`,

and the ones that are passed to `decorate_state!` for decorators.

**Output**

the obtained (approximate) minimizer $η^*$, see `get_solver_return` for details.

**See also**

`Manopt.truncated_conjugate_gradient_descent!` — Function

`truncated_conjugate_gradient_descent!(M, F, gradF, x, η, HessF, trust_region_radius; kwargs...)`

solve the trust-region subproblem in place of `x`.

**Input**

- `M` – a manifold $\mathcal M$
- `f` – a cost function $f: \mathcal M → ℝ$ to minimize
- `grad_f` – the gradient $\operatorname{grad}f: \mathcal M → T\mathcal M$ of `f`
- `Hess_f` – the Hessian $\operatorname{Hess}f(x): T_p\mathcal M → T_p\mathcal M$, $X ↦ \operatorname{Hess}f(p)[X]$
- `p` – a point on the manifold $p ∈ \mathcal M$
- `X` – an update tangent vector $X ∈ T_p\mathcal M$

For more details and all optional arguments, see `truncated_conjugate_gradient_descent`.

## State

`Manopt.TruncatedConjugateGradientState` — Type

`TruncatedConjugateGradientState <: AbstractHessianSolverState`

describe the Steihaug-Toint truncated conjugate-gradient method, with

**Fields**

a default value is given in brackets if a parameter can be left out in initialization.

- `x` : a point, where the trust-region subproblem needs to be solved
- `η` : a tangent vector (called update vector), which solves the trust-region subproblem after successful calculation by the algorithm
- `stop` : a `StoppingCriterion`
- `gradient` : the gradient at the current iterate
- `δ` : the search direction
- `trust_region_radius` : (`injectivity_radius(M)/4`) the trust-region radius
- `residual` : the residual vector
- `randomize` : indicates if the trust-region solve, and so the algorithm, is to be initiated with a random tangent vector. If set to true, no preconditioner is used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
- `project!` : (`copyto!`) specify a projection operation for tangent vectors for numerical stability. A function `(M, Y, p, X) -> ...` working in place of `Y`. By default, no projection is performed; set it to `project!` to activate projection.

**Constructor**

```
TruncatedConjugateGradientState(M, p, x, η;
    trust_region_radius=injectivity_radius(M)/4,
    randomize=false,
    θ=1.0,
    κ=0.1,
    project!=copyto!,
)
```

and a slightly involved `stopping_criterion`.

**See also**

## Stopping Criteria

`Manopt.StopWhenResidualIsReducedByFactorOrPower` — Type

`StopWhenResidualIsReducedByFactorOrPower <: StoppingCriterion`

A functor for testing if the norm of the residual at the current iterate is reduced either by a power of $1+θ$ or by a factor $κ$ compared to the norm of the initial residual, i.e. $\Vert r_k \Vert_x \leq \Vert r_0 \Vert_{x} \min \left( κ, \Vert r_0 \Vert_{x}^{θ} \right)$.

**Fields**

- `κ` – the reduction factor
- `θ` – part of the reduction power
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.

**Constructor**

`StopWhenResidualIsReducedByFactorOrPower(; κ=0.1, θ=1.0)`

initialize the StopWhenResidualIsReducedByFactorOrPower functor to indicate to stop after the norm of the current residual is less than either the norm of the initial residual to the power of $1+θ$ or the norm of the initial residual times $κ$.

**See also**

`Manopt.StopWhenTrustRegionIsExceeded` — Type

`StopWhenTrustRegionIsExceeded <: StoppingCriterion`

A functor for testing if the norm of the next iterate in the Steihaug-Toint tCG method is larger than the trust-region radius, i.e. $\Vert η_{k}^{*} \Vert_x \geq Δ$, and for terminating the algorithm when the trust region has been left.

**Fields**

- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.

**Constructor**

`StopWhenTrustRegionIsExceeded()`

initialize the StopWhenTrustRegionIsExceeded functor to indicate to stop after the norm of the next iterate is greater than the trust-region radius.

**See also**

`Manopt.StopWhenCurvatureIsNegative` — Type

`StopWhenCurvatureIsNegative <: StoppingCriterion`

A functor for testing if the curvature of the model is negative, i.e. $⟨δ_k, \mathcal{H}[δ_k]⟩_x \leq 0$. In this case, the model is not strictly convex, and the stepsize as computed does not give a reduction of the model.

**Fields**

- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.

**Constructor**

`StopWhenCurvatureIsNegative()`

**See also**

`Manopt.StopWhenModelIncreased` — Type

`StopWhenModelIncreased <: StoppingCriterion`

A functor for testing if the model value increased.

**Fields**

- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.

**Constructor**

`StopWhenModelIncreased()`

**See also**

`Manopt.update_stopping_criterion!` — Method

`update_stopping_criterion!(c::StopWhenResidualIsReducedByFactorOrPower, :ResidualPower, v)`

Update the residual power `θ` to `v`.

`Manopt.update_stopping_criterion!` — Method

`update_stopping_criterion!(c::StopWhenResidualIsReducedByFactorOrPower, :ResidualFactor, v)`

Update the residual factor `κ` to `v`.

## Literature

- [Absil, Mahony, Sepulchre, 2008] Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008. doi: 10.1515/9781400830244