Steihaug-Toint Truncated Conjugate-Gradient Method
The aim is to solve the trust-region subproblem
\[\operatorname*{arg\,min}_{η ∈ T_{x}\mathcal{M}} m_{x}(η) = F(x) + ⟨\operatorname{grad}F(x), η⟩_{x} + \frac{1}{2} ⟨ \mathcal{H}[η], η⟩_{x}\]
\[\text{s.t.} \; ⟨η, η⟩_{x} \leq {Δ}^2\]
on a manifold by using the Steihaug-Toint truncated conjugate-gradient method, abbreviated tCG-method. All terms involving the trust-region radius use an inner product w.r.t. the preconditioner; this is because the iterates grow in length w.r.t. the preconditioner, guaranteeing that we do not re-enter the trust-region.
Initialization
Initialize $η_0 = η$, where $η$ is a random tangent vector if the randomized approach is used and the zero tangent vector otherwise, $r_0 = \operatorname{grad}F(x)$, $z_0 = \operatorname{P}(r_0)$, $δ_0 = -z_0$ and $k=0$.
Iteration
Repeat until a convergence criterion is reached
- Set $α =\frac{⟨r_k, z_k⟩_x}{⟨δ_k, \mathcal{H}[δ_k]⟩_x}$ and $⟨η_k, η_k⟩_{x}^* = ⟨η_k, \operatorname{P}(η_k)⟩_x + 2α ⟨η_k, \operatorname{P}(δ_k)⟩_{x} + {α}^2 ⟨δ_k, \operatorname{P}(δ_k)⟩_{x}$.
- If $⟨δ_k, \mathcal{H}[δ_k]⟩_x ≤ 0$ or $⟨η_k, η_k⟩_x^* ≥ Δ^2$ return $η_{k+1} = η_k + τ δ_k$ and stop.
- Set $η_{k}^* = η_k + α δ_k$. If $⟨\operatorname{grad}F(x), η_k⟩_{x} + \frac{1}{2} ⟨η_k, \mathcal{H}[η_k]⟩_{x} ≤ ⟨\operatorname{grad}F(x), η_k^*⟩_{x} + \frac{1}{2} ⟨η_k^*, \mathcal{H}[η_k^*]⟩_{x}$, i.e. if the model value did not decrease, set $η_{k+1} = η_k$; else set $η_{k+1} = η_{k}^*$.
- Set $r_{k+1} = r_k + α \mathcal{H}[δ_k]$, $z_{k+1} = \operatorname{P}(r_{k+1})$, $β = \frac{⟨r_{k+1}, z_{k+1}⟩_{x}}{⟨r_k, z_k ⟩_{x}}$ and $δ_{k+1} = -z_{k+1} + β δ_k$.
- Set $k=k+1$.
Result
The result is given by the last computed $η_k$.
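To make the initialization and iteration above concrete, the following is a minimal sketch in plain Julia for the Euclidean case $\mathcal M = ℝ^n$, where tangent vectors are ordinary vectors, $\mathcal{H}$ acts as a symmetric matrix, and the preconditioner is the identity; the name `tcg_sketch` and the plain residual test are illustrative, not part of Manopt's API.

```julia
using LinearAlgebra

# Illustrative Euclidean sketch (not Manopt code): tangent vectors are plain
# vectors, H is a symmetric matrix standing in for H[⋅], and P is the identity.
function tcg_sketch(H::AbstractMatrix, g::AbstractVector, Δ::Real; maxiter=length(g))
    η = zero(g)
    r = copy(g)        # r₀ = grad F(x)
    z = copy(r)        # z₀ = P(r₀), here P = I
    δ = -z             # δ₀ = -z₀
    rz = dot(r, z)
    for k in 0:(maxiter - 1)
        Hδ = H * δ
        κ = dot(δ, Hδ)              # curvature of the model along δₖ
        α = rz / κ
        η_new = η + α * δ
        if κ <= 0 || dot(η_new, η_new) >= Δ^2
            # negative curvature or leaving the trust region:
            # move to the boundary via the positive root τ of ‖η + τδ‖ = Δ
            ηδ, δδ = dot(η, δ), dot(δ, δ)
            τ = (-ηδ + sqrt(ηδ^2 + δδ * (Δ^2 - dot(η, η)))) / δδ
            return η + τ * δ
        end
        η = η_new
        r += α * Hδ                 # rₖ₊₁ = rₖ + α H[δₖ]
        z = r                       # zₖ₊₁ = P(rₖ₊₁)
        norm(r) <= 1e-10 && break   # simple residual test for this sketch
        rz_new = dot(r, z)
        β = rz_new / rz             # β = ⟨rₖ₊₁, zₖ₊₁⟩ / ⟨rₖ, zₖ⟩
        δ = -z + β * δ
        rz = rz_new
    end
    return η
end
```

For example, `tcg_sketch([4.0 1.0; 1.0 3.0], [1.0, 2.0], 0.5)` already leaves the trust region in the first iteration and hence returns a boundary point of norm $0.5$.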
Remarks
Here $\operatorname{P}(⋅)$ denotes a symmetric, positive definite preconditioner. It is required if a randomized approach is used, i.e. if a random tangent vector $η_0$ is used as the initial vector; the idea behind this randomization is to avoid saddle points. Preconditioning is simply a rescaling of the variables and thus a redefinition of the shape of the trust region. Ideally, $\operatorname{P}(⋅)$ is a cheap, positive approximation of the inverse of the Hessian of $F$ at $x$. By default, the preconditioner is the identity.
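For instance, if a cheap positive approximation of the inverse Hessian is at hand, it can be supplied via the `preconditioner` keyword listed in the interface below. The diagonal scaling here is a hypothetical example; the allocating signature `(M, p, X)` is assumed to mirror that of the Hessian:

```julia
# Hypothetical diagonal preconditioner: rescale each coefficient of X by dᵢ,
# where d approximates the diagonal of Hess F(p)⁻¹ (assumed positive).
d = [0.5, 0.25, 1.0]
precond(M, p, X) = d .* X

# e.g. truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f, p, X;
#      preconditioner=precond)
```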
Regarding the second step above: obtain $τ$ as the positive root of $\left\lVert η_k + τ δ_k \right\rVert_{\operatorname{P}, x} = Δ$, which after rearranging becomes
\[ τ = \frac{-⟨η_k, \operatorname{P}(δ_k)⟩_{x} + \sqrt{⟨η_k, \operatorname{P}(δ_k)⟩_{x}^{2} + ⟨δ_k, \operatorname{P}(δ_k)⟩_{x} ( Δ^2 - ⟨η_k, \operatorname{P}(η_k)⟩_{x})}} {⟨δ_k, \operatorname{P}(δ_k)⟩_{x}}.\]
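In code this is a one-liner; the helper below is purely illustrative and takes the three preconditioned inner products as scalars:

```julia
# Positive root τ of ‖η + τδ‖_{P,x} = Δ, given
# ηPδ = ⟨η, P(δ)⟩ₓ, δPδ = ⟨δ, P(δ)⟩ₓ and ηPη = ⟨η, P(η)⟩ₓ.
boundary_step(ηPδ, δPδ, ηPη, Δ) = (-ηPδ + sqrt(ηPδ^2 + δPδ * (Δ^2 - ηPη))) / δPδ

boundary_step(0.0, 1.0, 0.0, 2.0)  # from η = 0: τ = Δ / ‖δ‖_P = 2.0
```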
It can occur that $⟨δ_k, \mathcal{H}[δ_k]⟩_{x} = κ ≤ 0$ at iteration $k$. In this case, the model is not strictly convex, and the stepsize $α =\frac{⟨r_k, z_k⟩_{x}}{κ}$ computed in the first step does not give a reduction in the model function $m_x(⋅)$. Indeed, $m_x(⋅)$ is unbounded from below along the line $η_k + α δ_k$. If the aim is to minimize the model within the trust region, it makes far more sense to reduce $m_x(⋅)$ along $η_k + α δ_k$ as much as possible while staying within the trust region, and this means moving to the trust-region boundary along this line. Thus, when $κ ≤ 0$ at iteration $k$, we replace $α = \frac{⟨r_k, z_k⟩_{x}}{κ}$ with the $τ$ described above. The other possibility is that $η_{k+1}$ would lie outside the trust region at iteration $k$, i.e. $⟨η_k, η_k⟩_{x}^{*} ≥ Δ^2$, where the starred quantity can be identified with the norm of $η_{k+1}$. In particular, when $\mathcal{H}$ is positive definite and $η_{k+1}$ lies outside the trust region, the solution of the trust-region subproblem must lie on the trust-region boundary. Thus, there is no reason to continue with the conjugate-gradient iteration as it stands, since subsequent iterates will move further outside the trust-region boundary. A sensible strategy, just as in the case considered above, is to move to the trust-region boundary by finding $τ$.
Although it is virtually impossible in practice to know how many iterations are necessary to provide a good estimate $η_{k}$ of the solution of the trust-region subproblem, the method stops after a certain number of iterations, which is realised by `StopAfterIteration`. In order to increase the convergence rate of the underlying trust-region method, see `trust_regions`, a typical stopping criterion is to stop as soon as an iteration $k$ is reached for which
\[ \Vert r_k \Vert_x \leq \Vert r_0 \Vert_x \min \left( \Vert r_0 \Vert^{θ}_x, κ \right)\]
holds, where $0 < κ < 1$ and $θ > 0$ are chosen in advance. This is realised in this method by `StopWhenResidualIsReducedByFactorOrPower`. It can be shown that under appropriate conditions the iterates $x_k$ of the underlying trust-region method converge to nondegenerate critical points with an order of convergence of at least $\min \left( θ + 1, 2 \right)$, see [AMS08]. The method also aborts if the curvature of the model is negative, i.e. if $⟨δ_k, \mathcal{H}[δ_k]⟩_x ≤ 0$, which is realised by `StopWhenCurvatureIsNegative`. If the next possible approximate solution $η_{k}^{*}$ calculated in iteration $k$ lies outside the trust region, i.e. if $\lVert η_{k}^{*} \rVert_x ≥ Δ$, then the method aborts, which is realised by `StopWhenTrustRegionIsExceeded`. Furthermore, the method aborts if the new model value evaluated at $η_{k}^{*}$ is greater than the previous model value evaluated at $η_{k}$, which is realised by `StopWhenModelIncreased`.
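In Manopt, stopping criteria are combined with `|`, so a custom variant of the default can, for example, be assembled and passed via the `stopping_criterion` keyword of the interface below; `Sphere` and `manifold_dimension` are assumed to come from Manifolds.jl here:

```julia
using Manopt, Manifolds

M = Sphere(2)
# combine the criteria discussed above; any one of them triggers a stop
sc = StopAfterIteration(manifold_dimension(M)) |
     StopWhenResidualIsReducedByFactorOrPower(; κ=0.1, θ=1.0) |
     StopWhenTrustRegionIsExceeded() |
     StopWhenCurvatureIsNegative() |
     StopWhenModelIncreased()
# pass as: truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f, p, X;
#          stopping_criterion=sc)
```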
Interface
Manopt.truncated_conjugate_gradient_descent — Function

truncated_conjugate_gradient_descent(M, f, grad_f, p; kwargs...)
truncated_conjugate_gradient_descent(M, f, grad_f, p, X; kwargs...)
truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f; kwargs...)
truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f, p; kwargs...)
truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f, p, X; kwargs...)
truncated_conjugate_gradient_descent(M, mho::ManifoldHessianObjective, p, X; kwargs...)
solve the trust-region subproblem
\[\operatorname*{arg\,min}_{η ∈ T_p\mathcal M} m_p(η), \quad\text{where}\quad m_p(η) = f(p) + ⟨\operatorname{grad} f(p), η⟩_p + \frac{1}{2}⟨\operatorname{Hess} f(p)[η], η⟩_p,\]
\[\text{such that}\quad ⟨η, η⟩_p ≤ Δ^2\]
on a manifold M by using the Steihaug-Toint truncated conjugate-gradient method, abbreviated tCG-method. For a description of the algorithm and theorems offering convergence guarantees, see the references:
- P.-A. Absil, C.G. Baker, K.A. Gallivan, Trust-region methods on Riemannian manifolds, FoCM, 2007. doi: 10.1007/s10208-005-0179-9
- A. R. Conn, N. I. M. Gould, P. L. Toint, Trust-region methods, SIAM, MPS, 2000. doi: 10.1137/1.9780898719857
Input
See the signatures above: you can leave out the Hessian (a finite-difference approximation is then used, cf. `ApproxHessianFiniteDifference`), the tangent vector, or the point together with the tangent vector.
- `M` – a manifold $\mathcal M$
- `f` – a cost function $F: \mathcal M → ℝ$ to minimize
- `grad_f` – the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of $F$
- `Hess_f` – (optional, cf. `ApproxHessianFiniteDifference`) the Hessian $\operatorname{Hess}F: T_p\mathcal M → T_p\mathcal M$, $X ↦ \operatorname{Hess}F(p)[X] = ∇_X\operatorname{grad}F(p)$
- `p` – a point on the manifold $p ∈ \mathcal M$
- `X` – an update tangent vector $X ∈ T_p\mathcal M$
Optional
- `evaluation` – (`AllocatingEvaluation`) specify whether the gradient and Hessian work by allocation (default) or `InplaceEvaluation` in place
- `preconditioner` – a preconditioner for the Hessian `H`
- `θ` – (`1.0`) $1+θ$ is the superlinear convergence target rate. The method aborts if the residual is less than or equal to the initial residual to the power of $1+θ$.
- `κ` – (`0.1`) the linear convergence target rate. The method aborts if the residual is less than or equal to $κ$ times the initial residual.
- `randomize` – set to true if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
- `trust_region_radius` – (`injectivity_radius(M)/4`) a trust-region radius
- `project!` – (`copyto!`) specify a projection operation for tangent vectors for numerical stability. A function `(M, Y, p, X) -> ...` working in place of `Y`. By default, no projection is performed; set it to `project!` to activate projection.
- `stopping_criterion` – (`StopAfterIteration` | `StopWhenResidualIsReducedByFactorOrPower` | `StopWhenCurvatureIsNegative` | `StopWhenTrustRegionIsExceeded`) a functor inheriting from `StoppingCriterion` indicating when to stop; for the default, the maximal number of iterations is set to the dimension of the manifold, the power factor is `θ`, and the reduction factor is `κ`.

and the ones that are passed to `decorate_state!` for decorators.
Output
the obtained (approximate) minimizer $\eta^*$, see `get_solver_return` for details.
See also: `trust_regions`
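As a small end-to-end sketch, assuming the `Sphere` manifold from Manifolds.jl and the Hessian-free signature above (so the finite-difference Hessian approximation is used), solving the subproblem for a Rayleigh-quotient cost could look like:

```julia
using Manopt, Manifolds, LinearAlgebra

M = Sphere(2)
A = [2.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 4.0]
f(M, p) = p' * A * p
# Riemannian gradient: project the Euclidean gradient 2Ap onto T_pM
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)

p = [1.0, 0.0, 0.0]
η = truncated_conjugate_gradient_descent(M, f, grad_f, p;
    trust_region_radius=1.0)  # tangent vector solving the subproblem at p
```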
Manopt.truncated_conjugate_gradient_descent! — Function

truncated_conjugate_gradient_descent!(M, f, grad_f, Hess_f, p, X; kwargs...)
truncated_conjugate_gradient_descent!(M, f, grad_f, p, X; kwargs...)

solve the trust-region subproblem in place of `X` (and `p`).
Input
- `M` – a manifold $\mathcal M$
- `f` – a cost function $F: \mathcal M → ℝ$ to minimize
- `grad_f` – the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of $F$
- `Hess_f` – the Hessian $\operatorname{Hess}F(p): T_p\mathcal M → T_p\mathcal M$, $X ↦ \operatorname{Hess}F(p)[X]$
- `p` – a point on the manifold $p ∈ \mathcal M$
- `X` – an update tangent vector $X ∈ T_p\mathcal M$
For more details and all optional arguments, see `truncated_conjugate_gradient_descent`.
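Continuing the hypothetical sphere example from above, the in-place variant writes the result into a preallocated tangent vector:

```julia
X = zero_vector(M, p)  # preallocate, here the zero tangent vector at p
truncated_conjugate_gradient_descent!(M, f, grad_f, p, X;
    trust_region_radius=1.0)
# X now holds the (approximate) minimizer η*
```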
State
Manopt.TruncatedConjugateGradientState — Type

TruncatedConjugateGradientState <: AbstractHessianSolverState
describes the Steihaug-Toint truncated conjugate-gradient method, with the following
Fields
a default value is given in brackets if a parameter can be left out in initialization.
- `x` : a point, where the trust-region subproblem needs to be solved
- `η` : a tangent vector (called update vector), which solves the trust-region subproblem after successful calculation by the algorithm
- `stop` : a `StoppingCriterion`
- `gradient` : the gradient at the current iterate
- `δ` : the search direction
- `trust_region_radius` : (`injectivity_radius(M)/4`) the trust-region radius
- `residual` : the residual, initialised to the gradient
- `randomize` : indicates if the trust-region solve, and so the algorithm, is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
- `project!` : (`copyto!`) specify a projection operation for tangent vectors for numerical stability. A function `(M, Y, p, X) -> ...` working in place of `Y`. By default, no projection is performed; set it to `project!` to activate projection.
Constructor
TruncatedConjugateGradientState(M, p=rand(M), η=zero_vector(M,p);
trust_region_radius=injectivity_radius(M)/4,
randomize=false,
θ=1.0,
κ=0.1,
project!=copyto!,
)
and a slightly involved `stopping_criterion`
See also: `truncated_conjugate_gradient_descent`, `trust_regions`
Stopping Criteria
Manopt.StopWhenResidualIsReducedByFactorOrPower — Type

StopWhenResidualIsReducedByFactorOrPower <: StoppingCriterion

A functor for testing if the norm of the residual at the current iterate is reduced either by a power of $1+θ$ or by a factor $κ$ compared to the norm of the initial residual, i.e. $\Vert r_k \Vert_x \leq \Vert r_0 \Vert_{x} \min \left( \kappa, \Vert r_0 \Vert_{x}^{\theta} \right)$.
Fields
- `κ` – the reduction factor
- `θ` – part of the reduction power
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.
Constructor
StopWhenResidualIsReducedByFactorOrPower(; κ=0.1, θ=1.0)
initialize the StopWhenResidualIsReducedByFactorOrPower functor to indicate to stop after the norm of the current residual is less than either the norm of the initial residual to the power of $1+θ$ or the norm of the initial residual times $κ$.
See also: `truncated_conjugate_gradient_descent`, `trust_regions`
Manopt.StopWhenTrustRegionIsExceeded — Type

StopWhenTrustRegionIsExceeded <: StoppingCriterion

A functor for testing if the norm of the next iterate in the Steihaug-Toint tCG method is larger than the trust-region radius, i.e. $\Vert η_{k}^{*} \Vert_x ≥ Δ$, and to terminate the algorithm when the trust region has been left.
Fields
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.
Constructor
StopWhenTrustRegionIsExceeded()
initialize the StopWhenTrustRegionIsExceeded functor to indicate to stop after the norm of the next iterate is greater than the trust-region radius.
See also: `truncated_conjugate_gradient_descent`, `trust_regions`
Manopt.StopWhenCurvatureIsNegative — Type

StopWhenCurvatureIsNegative <: StoppingCriterion

A functor for testing if the curvature of the model is negative, i.e. $⟨δ_k, \mathcal{H}[δ_k]⟩_x ≤ 0$. In this case, the model is not strictly convex, and the stepsize as computed does not give a reduction of the model.
Fields
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.
Constructor
StopWhenCurvatureIsNegative()
See also: `truncated_conjugate_gradient_descent`, `trust_regions`
Manopt.StopWhenModelIncreased — Type

StopWhenModelIncreased <: StoppingCriterion

A functor for testing if the model value increased, i.e. if the model value at $η_{k}^{*}$ is larger than at $η_k$.
Fields
- `reason` – stores a reason of stopping if the stopping criterion has been reached, see `get_reason`.
Constructor
StopWhenModelIncreased()
See also: `truncated_conjugate_gradient_descent`, `trust_regions`
Manopt.update_stopping_criterion! — Method

update_stopping_criterion!(c::StopWhenResidualIsReducedByFactorOrPower, :ResidualPower, v)

Update the residual power `θ` to `v`.
Manopt.update_stopping_criterion! — Method

update_stopping_criterion!(c::StopWhenResidualIsReducedByFactorOrPower, :ResidualFactor, v)

Update the residual factor `κ` to `v`.
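A brief sketch of updating both values on an existing criterion:

```julia
using Manopt

sc = StopWhenResidualIsReducedByFactorOrPower(; κ=0.1, θ=1.0)
update_stopping_criterion!(sc, :ResidualFactor, 0.05)  # κ ← 0.05
update_stopping_criterion!(sc, :ResidualPower, 0.5)    # θ ← 0.5
```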
Literature
- [AMS08] P.-A. Absil, R. Mahony and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press (2008). [open access](http://press.princeton.edu/chapters/absil/).