The Riemannian Trust-Regions Solver
The aim is to solve an optimization problem on a manifold
\[\operatorname*{min}_{x ∈ \mathcal{M}} F(x)\]
by using the Riemannian trust-regions solver, a common first choice for smooth optimization on manifolds. This trust-region method uses the Steihaug-Toint truncated conjugate gradient method truncated_conjugate_gradient_descent to solve the inner minimization problem, called the trust-regions subproblem. This inner solver can be preconditioned by providing a preconditioner (a symmetric and positive definite approximation of the inverse of the Hessian of $F$). If no Hessian of the cost function $F$ is provided, a standard approximation of the Hessian based on the gradient $\operatorname{grad}F$ is computed with ApproxHessianFiniteDifference.
Initialization
Initialize $x_0 = x$ with an initial point $x$ on the manifold. It can be given by the caller or set randomly. Set the initial trust-region radius $\Delta = \frac{1}{8} \bar{\Delta}$, where $\bar{\Delta}$ is the maximum radius the trust region can have. Usually one uses the square root of the manifold's dimension, $\sqrt{\operatorname{dim}(\mathcal{M})}$. For accepting the next iterate and evaluating the new trust-region radius, one needs an accept/reject threshold $\rho' ∈ [0,\frac{1}{4})$, which is $\rho' = 0.1$ by default. Set $k=0$.
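For illustration, these defaults can be written out as follows (a minimal sketch, assuming Manifolds.jl; the variable names are illustrative, not Manopt internals):

```julia
using Manifolds

M = Sphere(2)                        # example manifold
max_Δ = sqrt(manifold_dimension(M))  # usual maximum radius: the root of dim(M)
Δ = max_Δ / 8                        # initial trust-region radius Δ = Δ̄/8
ρ_prime = 0.1                        # default accept/reject threshold in [0, 1/4)
```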
Iteration
Repeat until a convergence criterion is reached (a schematic sketch of this loop follows the step list):
 Set $η$ to a random tangent vector if the randomized approach is used; otherwise set $η$ to the zero vector in the tangent space $T_{x_k}\mathcal{M}$.
 Set $η^*$ to the solution of the trust-regions subproblem, computed by the tCG method with $η$ as the initial vector.
 If using the randomized approach, compare $η^*$ with the Cauchy point $η_{c}^* = -\tau_{c} \frac{\Delta}{\lVert \operatorname{grad}F(x_k) \rVert_{x_k}} \operatorname{grad}F(x_k)$ using the model function $m_{x_k}(⋅)$. If the model decrease is larger for the Cauchy point, set $η^* = η_{c}^*$.
 Set ${x}^* = \operatorname{retr}_{x_k}(η^*)$.
 Set $\rho = \frac{F(x_k)-F({x}^*)}{m_{x_k}(η)-m_{x_k}(η^*)}$, where $m_{x_k}(⋅)$ denotes the quadratic model function.
 Update the trust-region radius: $\Delta = \begin{cases}\frac{1}{4} \Delta &\text{ if } \rho < \frac{1}{4} \text{ or } m_{x_k}(η)-m_{x_k}(η^*) \leq 0 \text{ or } \rho = \pm\infty,\\\operatorname{min}(2 \Delta, \bar{\Delta}) &\text{ if } \rho > \frac{3}{4} \text{ and the tCG method stopped because of negative curvature or exceeding the trust region},\\\Delta & \text{ otherwise.}\end{cases}$
 If $m_{x_k}(η)-m_{x_k}(η^*) \geq 0$ and $\rho > \rho'$, set $x_k = {x}^*$.
 Set $k = k+1$.
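The following is a schematic, self-contained Euclidean sketch of this outer loop, not Manopt's implementation: the retraction is plain addition and the subproblem is "solved" by the Cauchy step alone instead of the tCG method, so the radius-update conditions are simplified accordingly.

```julia
using LinearAlgebra

# Schematic trust-region loop on ℝⁿ; hess_f(x) returns a matrix.
function tr_sketch(f, grad_f, hess_f, x; max_Δ=sqrt(length(x)), ρ_prime=0.1, maxiter=100)
    Δ = max_Δ / 8
    for _ in 1:maxiter
        g = grad_f(x)
        norm(g) < 1e-12 && break
        H = hess_f(x)
        gHg = dot(g, H * g)
        τ = gHg <= 0 ? 1.0 : min(norm(g)^3 / (Δ * gHg), 1.0)
        η = -(τ * Δ / norm(g)) * g                     # Cauchy step as inner "solve"
        decrease = -(dot(g, η) + dot(η, H * η) / 2)    # model decrease m(0) - m(η)
        x_new = x + η                                  # retraction = addition here
        ρ = (f(x) - f(x_new)) / decrease               # performance ratio
        if ρ < 1 / 4 || decrease <= 0 || !isfinite(ρ)
            Δ /= 4                                     # shrink the region
        elseif ρ > 3 / 4
            Δ = min(2Δ, max_Δ)                         # grow the region
        end
        (decrease >= 0 && ρ > ρ_prime) && (x = x_new)  # accept/reject the iterate
    end
    return x
end
```

For instance, `tr_sketch(x -> dot(x, x), x -> 2x, x -> Matrix(2.0I, length(x), length(x)), [1.0, 2.0])` converges to the origin.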
Result
The result is given by the last computed $x_k$.
Remarks
To the initialization: if no initial point is given by the caller, a random point on the manifold is used.
To step number 1: using a randomized approach means using a random tangent vector as the initial vector for the approximate solve of the trust-regions subproblem. If this is the case, keep in mind that the vector must lie within the trust-region radius. This is achieved by multiplying $η$ by the fourth root of the machine epsilon, eps(Float64)^(1/4), as long as its norm is greater than the current trust-region radius $\Delta$. When not using the randomized approach, one uses the zero tangent vector.
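As a sketch (assuming Manifolds.jl; the function name is illustrative), the shrinking could look like:

```julia
using Manifolds

# Repeatedly damp η (in place) until it lies inside the trust region.
function shrink_into_region!(M, p, η, Δ)
    while norm(M, p, η) > Δ
        η .*= eps(Float64)^(1 / 4)   # fourth root of the machine epsilon
    end
    return η
end

M = Sphere(2)
p = rand(M)
η = 10 .* rand(M; vector_at=p)       # a random tangent vector, possibly too long
shrink_into_region!(M, p, η, 1.0)
```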
To step number 2: obtain $η^*$ by (approximately) solving the trust-regions subproblem
\[\operatorname*{arg\,min}_{η ∈ T_{x_k}\mathcal{M}} m_{x_k}(η) = F(x_k) + \langle \operatorname{grad}F(x_k), η \rangle_{x_k} + \frac{1}{2} \langle \operatorname{Hess}F(x_k)[η], η \rangle_{x_k}\]
\[\text{s.t.} \; \langle η, η \rangle_{x_k} \leq {\Delta}^2\]
with the Steihaug-Toint truncated conjugate gradient (tCG) method. The problem as well as the solution method is described in truncated_conjugate_gradient_descent. In this inner solver, the stopping criterion StopWhenResidualIsReducedByFactorOrPower is used, so that superlinear or at least linear convergence in the trust-region method can be achieved.
To step number 3: if using a random tangent vector as the initial vector, compare the result of the tCG method with the Cauchy point. Convergence proofs assume that one achieves at least (a fraction of) the reduction attained by the Cauchy point. The idea is to move along the negative gradient direction to a point minimizing the model; this point can lie on the boundary of the trust region or inside it. The parameter $\tau_{c}$ for the optimal length is defined by
\[\tau_{c} = \begin{cases} 1 & \text{ if } \langle \operatorname{grad}F(x_k), \operatorname{Hess}F(x_k)[\operatorname{grad}F(x_k)]\rangle_{x_k} \leq 0,\\ \operatorname{min}\Bigl(\frac{\lVert \operatorname{grad}F(x_k) \rVert_{x_k}^{3}}{\Delta \, \langle \operatorname{grad}F(x_k), \operatorname{Hess}F(x_k)[\operatorname{grad}F(x_k)]\rangle_{x_k}}, 1\Bigr) & \text{ otherwise.} \end{cases}\]
To check the model decrease one compares
\[m_{x_k}(η_{c}^*) = F(x_k) + \langle η_{c}^*, \operatorname{grad}F(x_k)\rangle_{x_k} + \frac{1}{2}\langle η_{c}^*, \operatorname{Hess}F(x_k)[η_{c}^*]\rangle_{x_k}\]
with
\[m_{x_k}(η^*) = F(x_k) + \langle η^*, \operatorname{grad}F(x_k)\rangle_{x_k} + \frac{1}{2}\langle η^*, \operatorname{Hess}F(x_k)[η^*]\rangle_{x_k}.\]
If $m_{x_k}(η_{c}^*) < m_{x_k}(η^*)$, then $η_{c}^*$ is the better choice.
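A minimal Euclidean sketch of the Cauchy-point computation above, where g plays the role of $\operatorname{grad}F(x_k)$ and Hg that of the Hessian applied to it:

```julia
using LinearAlgebra

# Return the Cauchy step η_c* = -τ_c (Δ/‖g‖) g for gradient g,
# Hessian-vector product Hg, and trust-region radius Δ.
function cauchy_step(g, Hg, Δ)
    gHg = dot(g, Hg)
    τ_c = gHg <= 0 ? 1.0 : min(norm(g)^3 / (Δ * gHg), 1.0)
    return -(τ_c * Δ / norm(g)) * g
end
```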
To step number 4: $\operatorname{retr}_{x_k}(⋅)$ denotes the retraction, a mapping $\operatorname{retr}_{x_k}:T_{x_k}\mathcal{M} \rightarrow \mathcal{M}$ which approximates the exponential map. In some cases it is cheaper to use a retraction instead of the exponential map.
To step number 6: one knows that the truncated_conjugate_gradient_descent algorithm stopped for these reasons when the stopping criteria StopWhenCurvatureIsNegative or StopWhenTrustRegionIsExceeded are activated.
To step number 7: the last step is to decide if the new point ${x}^*$ is accepted.
Interface
Manopt.trust_regions — Function

trust_regions(M, f, grad_f, Hess_f, p)
trust_regions(M, f, grad_f, p)

run the Riemannian trust-regions solver for optimization on manifolds to minimize f, cf. [Absil, Baker, Gallivan, FoCM, 2006; Conn, Gould, Toint, SIAM, 2000].

For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference. For solving the inner trust-region subproblem of finding an update vector, see truncated_conjugate_gradient_descent.
Input

- M – a manifold $\mathcal M$
- f – a cost function $F : \mathcal M → ℝ$ to minimize
- grad_f – the gradient $\operatorname{grad}F : \mathcal M → T \mathcal M$ of $F$
- Hess_f – (optional) the Hessian $\operatorname{Hess}F(x): T_x\mathcal M → T_x\mathcal M$, $X ↦ \operatorname{Hess}F(x)[X] = ∇_X\operatorname{grad}F(x)$
- p – an initial value $x ∈ \mathcal M$
Optional
evaluation
– (AllocatingEvaluation
) specify whether the gradient and hessian work by allocation (default) orInplaceEvaluation
in placemax_trust_region_radius
– the maximum trustregion radiuspreconditioner
– a preconditioner (a symmetric, positive definite operator that should approximate the inverse of the Hessian)randomize
– set to true if the trustregion solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.project!
: (copyto!
) specify a projection operation for tangent vectors within the TCG for numerical stability. A function(M, Y, p, X) > ...
working in place ofY
. per default, no projection is perfomed, set it toproject!
to activate projection.retraction
– (default_retraction_method(M, typeof(p))
) approximation of the exponential mapstopping_criterion
– (StopWhenAny
(StopAfterIteration
(1000)
,StopWhenGradientNormLess
(10^(6))
) a functor inheriting fromStoppingCriterion
indicating when to stop.trust_region_radius
 the initial trustregion radiusρ_prime
– Accept/reject threshold: if ρ (the performance ratio for the iterate) is at least ρ', the outer iteration is accepted. Otherwise, it is rejected. In case it is rejected, the trustregion radius will have been decreased. To ensure this, ρ' >= 0 must be strictly smaller than 1/4. If ρ_prime is negative, the algorithm is not guaranteed to produce monotonically decreasing cost values. It is strongly recommended to set ρ' > 0, to aid convergence.ρ_regularization
– Close to convergence, evaluating the performance ratio ρ is numerically challenging. Meanwhile, close to convergence, the quadratic model should be a good fit and the steps should be accepted. Regularization lets ρ go to 1 as the model decrease and the actual decrease go to zero. Set this option to zero to disable regularization (not recommended). When this is not zero, it may happen that the iterates produced are not monotonically improving the cost when very close to convergence. This is because the corrected cost improvement could change sign if it is negative but very small.θ
– (1.0
) 1+θ is the superlinear convergence target rate of the tCGmethodtruncated_conjugate_gradient_descent
, which computes an approximate solution for the trustregion subproblem. The tCGmethod aborts if the residual is less than or equal to the initial residual to the power of 1+θ.κ
– (0.1
) the linear convergence target rate of the tCGmethodtruncated_conjugate_gradient_descent
, which computes an approximate solution for the trustregion subproblem. The method aborts if the residual is less than or equal to κ times the initial residual.reduction_threshold
– (0.1
) Trustregion reduction threshold: if ρ (the performance ratio for the iterate) is less than this bound, the trustregion radius and thus the trustregions decreases.augmentation_threshold
– (0.75
) Trustregion augmentation threshold: if ρ (the performance ratio for the iterate) is greater than this and further conditions apply, the trustregion radius and thus the trustregions increases.
Output
the obtained (approximate) minimizer $p^*$; see get_solver_return for details
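As a usage sketch (assuming Manopt.jl and Manifolds.jl), minimizing the Rayleigh quotient on a sphere, whose minimizer is an eigenvector to the smallest eigenvalue of A:

```julia
using Manopt, Manifolds, LinearAlgebra

n = 5
A = Symmetric(randn(n, n))
M = Sphere(n - 1)
f(M, p) = p' * A * p
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)  # Euclidean gradient projected to T_pM
p0 = rand(M)
q = trust_regions(M, f, grad_f, p0)               # Hessian approximated by finite differences
```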
see also
Manopt.trust_regions! — Function

trust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)

evaluate the Riemannian trust-regions solver in place of p.
Input

- M – a manifold $\mathcal M$
- f – a cost function $F: \mathcal M → ℝ$ to minimize
- grad_f – the gradient $\operatorname{grad}F: \mathcal M → T \mathcal M$ of $F$
- Hess_f – (optional) the Hessian $H(\mathcal M, x, ξ)$ of $F$
- p – an initial value $p ∈ \mathcal M$
For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference. For more details and all options, see trust_regions.
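A sketch of the in-place call, reusing M, f, grad_f, and p0 from the example above; p0 is overwritten with the result:

```julia
trust_regions!(M, f, grad_f, p0; max_trust_region_radius=1.0)
```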
State
Manopt.TrustRegionsState — Type

TrustRegionsState <: AbstractHessianSolverState

describe the trust-regions solver, with

Fields

where all but p are keyword arguments in the constructor

- p : the current iterate
- stop : (StopAfterIteration(1000) | StopWhenGradientNormLess(1e-6))
- max_trust_region_radius : (sqrt(manifold_dimension(M))) the maximum trust-region radius
- project! : (copyto!) specify a projection operation for tangent vectors for numerical stability. A function (M, Y, p, X) -> ... working in place of Y. By default, no projection is performed; set it to project! to activate projection.
- randomize : (false) indicates if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
- ρ_prime : (0.1) a lower bound of the performance ratio for the iterate that decides if the iteration will be accepted or not. If not, the trust-region radius will have been decreased. To ensure this, ρ' ≥ 0 must be strictly smaller than 1/4. If ρ' is negative, the algorithm is not guaranteed to produce monotonically decreasing cost values. It is strongly recommended to set ρ' > 0, to aid convergence.
- ρ_regularization : (10000.0) close to convergence, evaluating the performance ratio ρ is numerically challenging. Meanwhile, close to convergence, the quadratic model should be a good fit and the steps should be accepted. Regularization lets ρ go to 1 as the model decrease and the actual decrease go to zero. Set this option to zero to disable regularization (not recommended). When this is not zero, it may happen that the iterates produced are not monotonically improving the cost when very close to convergence. This is because the corrected cost improvement could change sign if it is negative but very small.
- trust_region_radius : the (initial) trust-region radius

Constructor

TrustRegionsState(M,
    p=rand(M),
    X=zero_vector(M,p),
    sub_state=TruncatedConjugateGradientState(M, p, X),
)

construct a trust-regions state with all other fields from above being keyword arguments
See also
Approximation of the Hessian
We currently provide a few different methods to approximate the Hessian.
Manopt.ApproxHessianFiniteDifference — Type

ApproxHessianFiniteDifference{E, P, T, G, RTR, VTR, R <: Real} <: AbstractApproxHessian

A functor to approximate the Hessian by a finite difference of gradient evaluations.

Given a point p, a direction X, and the gradient $\operatorname{grad}F: \mathcal M \to T\mathcal M$ of a function $F$, the Hessian is approximated as follows: let $c$ be a stepsize, $X ∈ T_p\mathcal M$ a tangent vector, and $q = \operatorname{retr}_p(\frac{c}{\lVert X \rVert_p}X)$ a step in direction $X$ of length $c$ following a retraction. Then we approximate the Hessian by the finite difference of the gradients, where $\mathcal T_{\cdot\gets\cdot}$ is a vector transport:

\[\operatorname{Hess}F(p)[X] ≈ \frac{\lVert X \rVert_p}{c}\Bigl( \mathcal T_{p\gets q}\bigl(\operatorname{grad}F(q)\bigr) - \operatorname{grad}F(p)\Bigr)\]
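A Euclidean sketch of this approximation, where the retraction is addition and the vector transport is the identity (the function name is illustrative):

```julia
using LinearAlgebra

# Finite-difference approximation of Hess F(p)[X] from two gradient calls.
function approx_hess(grad_f, p, X; c=2.0^(-14))
    nX = norm(X)
    nX == 0 && return zero(X)
    q = p + (c / nX) * X                       # step of length c in direction X
    return (nX / c) * (grad_f(q) - grad_f(p))  # scaled gradient difference
end
```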
Fields

- gradient!! – the gradient function (either allocating or mutating, see evaluation parameter)
- step_length – a step length for the finite difference
- retraction_method – a retraction to use
- vector_transport_method – a vector transport to use
Internal temporary fields

- grad_tmp – a temporary storage for the gradient at the current p
- grad_dir_tmp – a temporary storage for the gradient at the current p_dir
- p_dir::P – a temporary storage for the forward direction (i.e. $q$ above)
Constructor

ApproxHessianFiniteDifference(M, p, grad_f; kwargs...)

Keyword arguments

- evaluation – (AllocatingEvaluation()) whether the gradient is given as an allocating function or in place (InplaceEvaluation()).
- steplength – ($2^{-14}$) step length $c$ to approximate the gradient evaluations
- retraction_method – (default_retraction_method(M, typeof(p))) a retraction(M, p, X) to use in the approximation.
- vector_transport_method – (default_vector_transport_method(M, typeof(p))) a vector transport to use
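As a usage sketch (assuming M, f, grad_f, and p0 as in the trust_regions example above), the functor can also be passed explicitly as the Hessian argument:

```julia
approx_H = ApproxHessianFiniteDifference(M, copy(M, p0), grad_f; steplength=2.0^(-10))
q = trust_regions(M, f, grad_f, approx_H, p0)
```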
Manopt.ApproxHessianSymmetricRankOne — Type

ApproxHessianSymmetricRankOne{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian

A functor to approximate the Hessian by the symmetric rank one update.

Fields

- gradient!! – the gradient function (either allocating or mutating, see evaluation parameter).
- ν – a small real number to ensure that the denominator in the update does not become too small and thus the method does not break down.
- vector_transport_method – a vector transport to use.

Internal temporary fields

- p_tmp – a temporary storage for the current point p.
- grad_tmp – a temporary storage for the gradient at the current p.
- matrix – a temporary storage for the matrix representation of the approximating operator.
- basis – a temporary storage for an orthonormal basis at the current p.

Constructor

ApproxHessianSymmetricRankOne(M, p, gradF; kwargs...)

Keyword arguments

- initial_operator – (Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.
- basis – (DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.
- nu – (-1)
- evaluation – (AllocatingEvaluation()) whether the gradient is given as an allocating function or in place (InplaceEvaluation()).
- vector_transport_method – (ParallelTransport()) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.
Manopt.ApproxHessianBFGS — Type

ApproxHessianBFGS{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian

A functor to approximate the Hessian by the BFGS update.

Fields

- gradient!! – the gradient function (either allocating or mutating, see evaluation parameter).
- scale
- vector_transport_method – a vector transport to use.

Internal temporary fields

- p_tmp – a temporary storage for the current point p.
- grad_tmp – a temporary storage for the gradient at the current p.
- matrix – a temporary storage for the matrix representation of the approximating operator.
- basis – a temporary storage for an orthonormal basis at the current p.

Constructor

ApproxHessianBFGS(M, p, gradF; kwargs...)

Keyword arguments

- initial_operator – (Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.
- basis – (DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.
- nu – (-1)
- evaluation – (AllocatingEvaluation()) whether the gradient is given as an allocating function or in place (InplaceEvaluation()).
- vector_transport_method – (ParallelTransport()) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.
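Both quasi-Newton approximations can be used the same way as the finite-difference functor (a sketch, assuming M, f, grad_f, and p0 from the earlier example):

```julia
sr1 = ApproxHessianSymmetricRankOne(M, copy(M, p0), grad_f)
bfgs = ApproxHessianBFGS(M, copy(M, p0), grad_f)
q = trust_regions(M, f, grad_f, sr1, p0)   # or pass bfgs instead
```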
as well as their (nonexported) common supertype
Manopt.AbstractApproxHessian — Type

AbstractApproxHessian <: Function

An abstract supertype for approximate Hessian functions; it also declares them to be functions.
Literature
[ABG06] P.-A. Absil, C. Baker and K. Gallivan. Trust-Region Methods on Riemannian Manifolds. Foundations of Computational Mathematics 7, 303–330 (2006).

[CGT00] A. R. Conn, N. I. Gould and P. L. Toint. Trust Region Methods. Society for Industrial and Applied Mathematics (2000).