# Steihaug-Toint Truncated Conjugate-Gradient Method

The aim is to solve the trust-region subproblem

\[\operatorname*{arg\,min}_{η ∈ T_{x}\mathcal{M}} m_{x}(η) = F(x) + ⟨\operatorname{grad}F(x), η⟩_{x} + \frac{1}{2} ⟨ \operatorname{Hess}F(x)[η], η⟩_{x}\]

\[\text{s.t.} \; ⟨η, η⟩_{x} \leq {Δ}^2\]

on a manifold by using the Steihaug-Toint truncated conjugate-gradient method. All terms involving the trust-region radius use an inner product w.r.t. the preconditioner; this is because the iterates grow in length w.r.t. the preconditioner, guaranteeing that we do not re-enter the trust-region.

## Initialization

Initialize $η_0 = η$ if using randomized approach and $η$ the zero tangent vector otherwise, $r_0 = \operatorname{grad}F(x)$, $z_0 = \operatorname{P}(r_0)$, $δ_0 = z_0$ and $k=0$

## Iteration

Repeat until a convergence criterion is reached

- Set $κ = ⟨δ_k, \operatorname{Hess}F(x)[δ_k]⟩_x$, $α =\frac{⟨r_k, z_k⟩_x}{κ}$ and $⟨η_k, η_k⟩_{x}^* = ⟨η_k, \operatorname{P}(η_k)⟩_x + 2α ⟨η_k, \operatorname{P}(δ_k)⟩_{x} + {α}^2 ⟨δ_k, \operatorname{P}(δ_k)⟩_{x}$.
- If $κ ≤ 0$ or $⟨η_k, η_k⟩_x^* ≥ Δ^2$ return $η_{k+1} = η_k + τ δ_k$ and stop.
- Set $η_{k}^*= η_k + α δ_k$, if $⟨η_k, η_k⟩_{x} + \frac{1}{2} ⟨η_k, \operatorname{Hess}[F] (η_k)_{x}⟩_{x} ≤ ⟨η_k^*, η_k^*⟩_{x} + \frac{1}{2} ⟨η_k^*, \operatorname{Hess}[F] (η_k)_ {x}⟩_{x}$ set $η_{k+1} = η_k$ else set $η_{k+1} = η_{k}^*$.
- Set $r_{k+1} = r_k + α \operatorname{Hess}[F](δ_k)_x$, $z_{k+1} = \operatorname{P}(r_{k+1})$, $β = \frac{⟨r_{k+1}, z_{k+1}⟩_{x}}{⟨r_k, z_k ⟩_{x}}$ and $δ_{k+1} = -z_{k+1} + β δ_k$.
- Set $k=k+1$.

## Result

The result is given by the last computed $η_k$.

## Remarks

The $\operatorname{P}(⋅)$ denotes the symmetric, positive deﬁnite preconditioner. It is required if a randomized approach is used i.e. using a random tangent vector $η_0$ as initial vector. The idea behind it is to avoid saddle points. Preconditioning is simply a rescaling of the variables and thus a redeﬁnition of the shape of the trust region. Ideally $\operatorname{P}(⋅)$ is a cheap, positive approximation of the inverse of the Hessian of $F$ at $x$. On default, the preconditioner is just the identity.

To step number 2: Obtain $τ$ from the positive root of $\left\lVert η_k + τ δ_k \right\rVert_{\operatorname{P}, x} = Δ$ what becomes after the conversion of the equation to

\[ τ = \frac{-⟨η_k, \operatorname{P}(δ_k)⟩_{x} + \sqrt{⟨η_k, \operatorname{P}(δ_k)⟩_{x}^{2} + ⟨δ_k, \operatorname{P}(δ_k)⟩_{x} ( Δ^2 - ⟨η_k, \operatorname{P}(η_k)⟩_{x})}} {⟨δ_k, \operatorname{P}(δ_k)⟩_{x}}.\]

It can occur that $⟨δ_k, \operatorname{Hess}[F] (δ_k)_{x}⟩_{x} = κ ≤ 0$ at iteration $k$. In this case, the model is not strictly convex, and the stepsize $α =\frac{⟨r_k, z_k⟩_{x}} {κ}$ computed in step 1. does not give a reduction in the modelfunction $m_x(⋅)$. Indeed, $m_x(⋅)$ is unbounded from below along the line $η_k + α δ_k$. If our aim is to minimize the model within the trust-region, it makes far more sense to reduce $m_x(⋅)$ along $η_k + α δ_k$ as much as we can while staying within the trust-region, and this means moving to the trust-region boundary along this line. Thus when $κ ≤ 0$ at iteration k, we replace $α = \frac{⟨r_k, z_k⟩_{x}}{κ}$ with $τ$ described as above. The other possibility is that $η_{k+1}$ would lie outside the trust-region at iteration k (i.e. $⟨η_k, η_k⟩_{x}^{* } ≥ {Δ}^2$ what can be identified with the norm of $η_{k+1}$). In particular, when $\operatorname{Hess}[F] (⋅)_{x}$ is positive deﬁnite and $η_{k+1}$ lies outside the trust region, the solution to the trust-region problem must lie on the trust-region boundary. Thus, there is no reason to continue with the conjugate gradient iteration, as it stands, as subsequent iterates will move further outside the trust-region boundary. A sensible strategy, just as in the case considered above, is to move to the trust-region boundary by ﬁnding $τ$.

## Interface

`Manopt.truncated_conjugate_gradient_descent`

— Function`truncated_conjugate_gradient_descent(M, F, gradF, x, η, HessF, trust_region_radius)`

solve the trust-region subproblem

\[\operatorname*{arg\,min}_{η ∈ T_xM} m_x(η) \quad\text{where} m_x(η) = F(x) + ⟨\operatorname{grad}F(x),η⟩_x + \frac{1}{2}⟨\operatorname{Hess}F(x)[η],η⟩_x,\]

\[\text{such that}\quad ⟨η,η⟩_x ≤ Δ^2\]

with the `truncated_conjugate_gradient_descent`

. For a description of the algorithm and theorems offering convergence guarantees, see the reference:

- P.-A. Absil, C.G. Baker, K.A. Gallivan, Trust-region methods on Riemannian manifolds, FoCM, 2007. doi: 10.1007/s10208-005-0179-9
- A. R. Conn, N. I. M. Gould, P. L. Toint, Trust-region methods, SIAM, MPS, 2000. doi: 10.1137/1.9780898719857

**Input**

`M`

– a manifold $\mathcal M$`F`

– a cost function $F: \mathcal M → ℝ$ to minimize`gradF`

– the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of`F`

`HessF`

– the hessian $\operatorname{Hess}F(x): T_x\mathcal M → T_x\mathcal M$, $X ↦ \operatoname{Hess}F(x)[X] = ∇_ξ\operatorname{grad}f(x)$`x`

– a point on the manifold $x ∈ \mathcal M$`η`

– an update tangential vector $η ∈ T_x\mathcal M$`trust_region_radius`

– a trust-region radius

**Optional**

`evaluation`

– (`AllocatingEvaluation`

) specify whether the gradient and hessian work by allocation (default) or`MutatingEvaluation`

in place`preconditioner`

– a preconditioner for the hessian H`θ`

– 1+θ is the superlinear convergence target rate. The algorithm will terminate early if the residual was reduced by a power of 1+theta.`κ`

– the linear convergence target rate: algorithm will terminate early if the residual was reduced by a factor of kappa.`randomize`

– set to true if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.`project_vector!`

: (`copyto!`

) specify a projection operation for tangent vectors for numerical stability. A function`(M, Y, p, X) -> ...`

working in place of`Y`

. per default, no projection is perfomed, set it to`project!`

to activate projection.`stopping_criterion`

– (`StopWhenAny`

,`StopAfterIteration`

,`StopIfResidualIsReducedByFactor`

,`StopIfResidualIsReducedByPower`

,`StopWhenCurvatureIsNegative`

,`StopWhenTrustRegionIsExceeded`

) a functor inheriting from`StoppingCriterion`

indicating when to stop, where for the default, the maximal number of iterations is set to the dimension of the manifold, the power factor is`θ`

, the reduction factor is`κ`

. .`return_options`

– (`false`

) – if activated, the extended result, i.e. the complete`Options`

re returned. This can be used to access recorded values. If set to false (default) just the optimal value`x_opt`

is returned

and the ones that are passed to `decorate_options`

for decorators.

**Output**

`η`

– an approximate solution of the trust-region subproblem in $T_{x}\mathcal M$.

OR

`options`

- the options returned by the solver (see`return_options`

)

**see also**

`Manopt.truncated_conjugate_gradient_descent!`

— Function`truncated_conjugate_gradient_descent!(M, F, gradF, x, η, HessF, trust_region_radius; kwargs...)`

solve the trust-region subproblem in place of `x`

.

**Input**

**Input**

`M`

– a manifold $\mathcal M$`F`

– a cost function $F: \mathcal M → ℝ$ to minimize`gradF`

– the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of`F`

`HessF`

– the hessian $\operatorname{Hess}F(x): T_x\mathcal M → T_x\mathcal M$, $X ↦ \operatoname{Hess}F(x)[X] = ∇_ξ\operatorname{grad}f(x)$`x`

– a point on the manifold $x ∈ \mathcal M$`η`

– an update tangential vector $η ∈ T_x\mathcal M$`trust_region_radius`

– a trust-region radius

For more details and all optional arguments, see `truncated_conjugate_gradient_descent`

.

## Options

`Manopt.TruncatedConjugateGradientOptions`

— Type`TruncatedConjugateGradientOptions <: AbstractHessianOptions`

describe the Steihaug-Toint truncated conjugate-gradient method, with

**Fields**

a default value is given in brackets if a parameter can be left out in initialization.

`x`

: a point, where the trust-region subproblem needs to be solved`stop`

: a function s,r = @(o,iter,ξ,x,xnew) returning a stop indicator and a reason based on an iteration number, the gradient and the last and current iterates`gradient`

: the gradient at the current iterate`η`

: a tangent vector (called update vector), which solves the trust-region subproblem after successful calculation by the algorithm`δ`

: search direction`trust_region_radius`

: the trust-region radius`residual`

: the gradient`randomize`

: indicates if the trust-region solve and so the algorithm is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.`project_vector!`

: (`copyto!`

) specify a projection operation for tangent vectors for numerical stability. A function`(M, Y, p, X) -> ...`

working in place of`Y`

. per default, no projection is perfomed, set it to`project!`

to activate projection.

**Constructor**

`TruncatedConjugateGradientOptions(x, stop, eta, delta, Delta, res, uR)`

construct a truncated conjugate-gradient Option with the fields as above.

**See also**

## Additional Stopping Criteria

`Manopt.StopIfResidualIsReducedByPower`

— Type`StopIfResidualIsReducedByPower <: StoppingCriterion`

A functor for testing if the norm of residual at the current iterate is reduced by a power of 1+θ compared to the norm of the initial residual, i.e. $\Vert r_k \Vert_x \leqq \Vert r_0 \Vert_{x}^{1+\theta}$. In this case the algorithm reached superlinear convergence.

**Fields**

`θ`

– part of the reduction power`initialResidualNorm`

- stores the norm of the residual at the initial vector $η$ of the Steihaug-Toint tcg mehtod`truncated_conjugate_gradient_descent`

`reason`

– stores a reason of stopping if the stopping criterion has one be reached, see`get_reason`

.

**Constructor**

`StopIfResidualIsReducedByPower(iRN, θ)`

initialize the StopIfResidualIsReducedByFactor functor to indicate to stop after the norm of the current residual is lesser than the norm of the initial residual iRN to the power of 1+θ.

**See also**

`Manopt.StopIfResidualIsReducedByFactor`

— Type`StopIfResidualIsReducedByFactor <: StoppingCriterion`

A functor for testing if the norm of residual at the current iterate is reduced by a factor compared to the norm of the initial residual, i.e. $\Vert r_k \Vert_x \leqq κ \Vert r_0 \Vert_x$. In this case the algorithm reached linear convergence.

**Fields**

`κ`

– the reduction factor`initialResidualNorm`

- stores the norm of the residual at the initial vector $η$ of the Steihaug-Toint tcg mehtod`truncated_conjugate_gradient_descent`

`reason`

– stores a reason of stopping if the stopping criterion has one be reached, see`get_reason`

.

**Constructor**

`StopIfResidualIsReducedByFactor(iRN, κ)`

initialize the StopIfResidualIsReducedByFactor functor to indicate to stop after the norm of the current residual is lesser than the norm of the initial residual iRN times κ.

**See also**

`Manopt.StopWhenTrustRegionIsExceeded`

— Type`StopWhenTrustRegionIsExceeded <: StoppingCriterion`

A functor for testing if the norm of the next iterate in the Steihaug-Toint tcg mehtod is larger than the trust-region radius, i.e. $\Vert η_{k}^{*} \Vert_x ≧ trust_region_radius$. terminate the algorithm when the trust region has been left.

**Fields**

`reason`

– stores a reason of stopping if the stopping criterion has one be reached, see`get_reason`

.`storage`

– stores the necessary parameters`η, δ, residual`

to check the criterion.

**Constructor**

`StopWhenTrustRegionIsExceeded([a])`

initialize the StopWhenTrustRegionIsExceeded functor to indicate to stop after the norm of the next iterate is greater than the trust-region radius using the `StoreOptionsAction`

`a`

, which is initialized to store `:η, :δ, :residual`

by default.

**See also**

`Manopt.StopWhenCurvatureIsNegative`

— Type`StopWhenCurvatureIsNegative <: StoppingCriterion`

A functor for testing if the curvature of the model is negative, i.e. $\langle \delta_k, \operatorname{Hess}[F](\delta_k)\rangle_x \leqq 0$. In this case, the model is not strictly convex, and the stepsize as computed does not give a reduction of the model.

**Fields**

`reason`

– stores a reason of stopping if the stopping criterion has one be reached, see`get_reason`

.

**Constructor**

`StopWhenCurvatureIsNegative()`

**See also**

`Manopt.StopWhenModelIncreased`

— Type`StopWhenModelIncreased <: StoppingCriterion`

A functor for testing if the curvature of the model value increased.

**Fields**

`reason`

– stores a reason of stopping if the stopping criterion has one be reached, see`get_reason`

.

**Constructor**

`StopWhenModelIncreased()`

**See also**