# The Riemannian trust regions solver

Minimize a function

\[\operatorname*{\arg\,min}_{p ∈ \mathcal{M}}\ f(p)\]

by using the Riemannian trust-regions solver following [ABG06] a model is build by lifting the objective at the $k$th iterate $p_k$ by locally mapping the cost function $f$ to the tangent space as $f_k: T_{p_k}\mathcal M → ℝ$ as $f_k(X) = f(\operatorname{retr}_{p_k}(X))$. The trust region subproblem is then defined as

\[\operatorname*{arg\,min}_{X ∈ T_{p_k}\mathcal M}\ m_k(X),\]

where

\[\begin{align*} m_k&: T_{p_K}\mathcal M → ℝ,\\ m_k(X) &= f(p_k) + ⟨\operatorname{grad} f(p_k), X⟩_{p_k} + \frac{1}{2}\langle \mathcal H_k(X),X⟩_{p_k}\\ \text{such that}&\ \lVert X \rVert_{p_k} ≤ Δ_k. \end{align*}\]

Here $Δ_k$ is a trust region radius, that is adapted every iteration, and $\mathcal H_k$ is some symmetric linear operator that approximates the Hessian $\operatorname{Hess} f$ of $f$.

## Interface

`Manopt.trust_regions`

— Function```
trust_regions(M, f, grad_f, hess_f, p=rand(M))
trust_regions(M, f, grad_f, p=rand(M))
```

run the Riemannian trust-regions solver for optimization on manifolds to minimize `f`

, see on [ABG06, CGT00].

For the case that no Hessian is provided, the Hessian is computed using finite differences, see `ApproxHessianFiniteDifference`

. For solving the inner trust-region subproblem of finding an update-vector, by default the `truncated_conjugate_gradient_descent`

is used.

**Input**

`M`

: a manifold $\mathcal M$`f`

: a cost function $f : \mathcal M → ℝ$ to minimize`grad_f`

: the gradient $\operatorname{grad}F : \mathcal M → T \mathcal M$ of $F$`Hess_f`

: (optional), the Hessian $\operatorname{Hess}F(x): T_x\mathcal M → T_x\mathcal M$, $X ↦ \operatorname{Hess}F(x)[X] = ∇_ξ\operatorname{grad}f(x)$`p`

: (`rand(M)`

) an initial value $x ∈ \mathcal M$

**Keyword arguments**

`acceptance_rate`

: Accept/reject threshold: if ρ (the performance ratio for the iterate) is at least the acceptance rate ρ', the candidate is accepted. This value should be between $0$ and $\frac{1}{4}$`augmentation_threshold`

: (`0.75`

) trust-region augmentation threshold: if ρ is larger than this threshold, a solution is on the trust region boundary and negative curvature, and the radius is extended (augmented)`augmentation_factor`

: (`2.0`

) trust-region augmentation factor`evaluation`

: (`AllocatingEvaluation`

) specify whether the gradient and Hessian work by allocation (default) or`InplaceEvaluation`

in place`κ`

: (`0.1`

) the linear convergence target rate of the tCG method`truncated_conjugate_gradient_descent`

, and is used in a stopping criterion therein`max_trust_region_radius`

: the maximum trust-region radius`preconditioner`

: a preconditioner (a symmetric, positive definite operator that should approximate the inverse of the Hessian)`project!`

; (`copyto!`

) specify a projection operation for tangent vectors within the subsolver for numerical stability. The required form is`(M, Y, p, X) -> ...`

working in place of`Y`

.`randomize`

; set to true if the trust-region solve is to be initiated with a random tangent vector and no preconditioner is used.`ρ_regularization`

: (`1e3`

) regularize the performance evaluation $ρ$ to avoid numerical inaccuracies.`reduction_factor`

: (`0.25`

) trust-region reduction factor`reduction_threshold`

: (`0.1`

) trust-region reduction threshold: if ρ is below this threshold, the trust region radius is reduced by`reduction_factor`

.`retraction`

(`default_retraction_method(M, typeof(p))`

) a retraction to use`stopping_criterion`

: (`StopAfterIteration`

`(1000) |`

`StopWhenGradientNormLess`

`(1e-6)`

) a functor inheriting from`StoppingCriterion`

indicating when to stop.`sub_kwargs`

: keyword arguments passed to the sub state and used to decorate the sub options`sub_stopping_criterion`

: a stopping criterion for the sub solver, uses the same standard as TCG.`sub_problem`

: (`DefaultManoptProblem`

`(M,`

`ConstrainedManifoldObjective`

`(subcost, subgrad; evaluation=evaluation))`

) problem for the subsolver`sub_state`

: (`QuasiNewtonState`

) using`QuasiNewtonLimitedMemoryDirectionUpdate`

with`InverseBFGS`

and`sub_stopping_criterion`

as a stopping criterion. See also`sub_kwargs`

.`θ`

: (`1.0`

) 1+θ is the superlinear convergence target rate of the tCG-method`truncated_conjugate_gradient_descent`

, and is used in a stopping criterion therein`trust_region_radius`

: the initial trust-region radius

For the case that no Hessian is provided, the Hessian is computed using finite difference, see `ApproxHessianFiniteDifference`

.

**Output**

the obtained (approximate) minimizer $p^*$, see `get_solver_return`

for details

**See also**

`Manopt.trust_regions!`

— Function```
trust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)
```

evaluate the Riemannian trust-regions solver in place of `p`

.

**Input**

`M`

: a manifold $\mathcal M$`f`

: a cost function $f: \mathcal M → ℝ$ to minimize`grad_f`

: the gradient $\operatorname{grad}f: \mathcal M → T \mathcal M$ of $F$`Hess_f`

: (optional) the Hessian $\operatorname{Hess} f$`p`

: an initial value $p ∈ \mathcal M$

For the case that no Hessian is provided, the Hessian is computed using finite difference, see `ApproxHessianFiniteDifference`

.

for more details and all options, see `trust_regions`

## State

`Manopt.TrustRegionsState`

— Type`TrustRegionsState <: AbstractHessianSolverState`

Store the state of the trust-regions solver.

**Fields**

All the following fields (besides `p`

) can be set by specifying them as keywords.

`acceptance_rate`

: (`0.1`

) a lower bound of the performance ratio for the iterate that decides if the iteration is accepted or not.`max_trust_region_radius`

: (`sqrt(manifold_dimension(M))`

) the maximum trust-region radius`p`

: (`rand(M)`

if a manifold is provided) the current iterate`project!`

: (`copyto!`

) specify a projection operation for tangent vectors for numerical stability. A function`(M, Y, p, X) -> ...`

working in place of`Y`

. per default, no projection is performed, set it to`project!`

to activate projection.`stop`

: (`StopAfterIteration`

`(1000) |`

`StopWhenGradientNormLess`

`(1e-6)`

)`randomize`

: (`false`

) indicates if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner is used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.`ρ_regularization`

: (`10000.0`

) regularize the model fitness $ρ$ to avoid division by zero`sub_problem`

: an`AbstractManoptProblem`

problem or a function`(M, p, X) -> q`

or`(M, q, p, X)`

for the a closed form solution of the sub problem`sub_state`

: (`TruncatedConjugateGradientState`

`(M, p, X)`

)`σ`

: (`0.0`

or`1e-6`

depending on`randomize`

) Gaussian standard deviation when creating the random initial tangent vector`trust_region_radius`

: (`max_trust_region_radius / 8`

) the (initial) trust-region radius`X`

: (`zero_vector(M,p)`

) the current gradient`grad_f(p)`

Use this default to specify the type of tangent vector to allocate also for the internal (tangent vector) fields.

**Internal fields**

`HX`

,`HY`

,`HZ`

: interim storage (to avoid allocation) of ``\operatorname{Hess} f(p)[\cdot]`

of`X`

,`Y`

,`Z`

`Y`

: the solution (tangent vector) of the subsolver`Z`

: the Cauchy point (only used if random is activated)

**Constructors**

All the following constructors have the fields as keyword arguments with the defaults given in brackets. If no initial point `p`

is provided, `p=rand(M)`

is used

```
TrustRegionsState(M, mho; kwargs...)
TrustRegionsState(M, p, mho; kwargs...)
```

A trust region state, where the sub problem is set to a `DefaultManoptProblem`

on the tangent space using the `TrustRegionModelObjective`

to be solved with `truncated_conjugate_gradient_descent!`

or in other words the sub state is set to `TruncatedConjugateGradientState`

.

```
TrustRegionsState(M, sub_problem, sub_state; kwargs...)
TrustRegionsState(M, p, sub_problem, sub_state; kwargs...)
```

A trust region state, where the sub problem is solved using a `AbstractManoptProblem`

`sub_problem`

and an `AbstractManoptSolverState`

`sub_state`

.

```
TrustRegionsState(M, f::Function; evaluation=AllocatingEvaluation, kwargs...)
TrustRegionsState(M, p, f; evaluation=AllocatingEvaluation, kwargs...)
```

A trust region state, where the sub problem is solved in closed form by a function `f(M, p, Y, Δ)`

, where `p`

is the current iterate, `Y`

the initial tangent vector at `p`

and `Δ`

the current trust region radius.

**See also**

## Approximation of the Hessian

Several different methods to approximate the Hessian are available.

`Manopt.ApproxHessianFiniteDifference`

— Type`ApproxHessianFiniteDifference{E, P, T, G, RTR,, VTR, R <: Real} <: AbstractApproxHessian`

A functor to approximate the Hessian by a finite difference of gradient evaluation.

Given a point `p`

and a direction `X`

and the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of a function $F$ the Hessian is approximated as follows: let $c$ be a stepsize, $X∈ T_p\mathcal M$ a tangent vector and $q = \operatorname{retr}_p(\frac{c}{\lVert X \rVert_p}X)$ be a step in direction $X$ of length $c$ following a retraction Then the Hessian is approximated by the finite difference of the gradients, where $\mathcal T_{\cdot\gets\cdot}$ is a vector transport.

\[\operatorname{Hess}F(p)[X] ≈ \frac{\lVert X \rVert_p}{c}\Bigl( \mathcal T_{p\gets q}\bigr(\operatorname{grad}F(q)\bigl) - \operatorname{grad}F(p) \Bigl)\]

**Fields**

`gradient!!`

: the gradient function (either allocating or mutating, see`evaluation`

parameter)`step_length`

: a step length for the finite difference`retraction_method`

: a retraction to use`vector_transport_method`

: a vector transport to use

**Internal temporary fields**

`grad_tmp`

: a temporary storage for the gradient at the current`p`

`grad_dir_tmp`

: a temporary storage for the gradient at the current`p_dir`

`p_dir::P`

: a temporary storage to the forward direction (or the $q$ in the formula)

**Constructor**

`ApproximateFiniteDifference(M, p, grad_f; kwargs...)`

**Keyword arguments**

`evaluation`

: (`AllocatingEvaluation`

) whether the gradient is given as an allocation function or an in-place (`InplaceEvaluation`

).`steplength`

: ($2^{-14}$) step length $c$ to approximate the gradient evaluations`retraction_method`

: (`default_retraction_method(M, typeof(p))`

) a`retraction(M, p, X)`

to use in the approximation.`vector_transport_method`

: (`default_vector_transport_method(M, typeof(p))`

) a vector transport to use

`Manopt.ApproxHessianSymmetricRankOne`

— Type`ApproxHessianSymmetricRankOne{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian`

A functor to approximate the Hessian by the symmetric rank one update.

**Fields**

`gradient!!`

the gradient function (either allocating or mutating, see`evaluation`

parameter).`ν`

a small real number to ensure that the denominator in the update does not become too small and thus the method does not break down.`vector_transport_method`

a vector transport to use.

**Internal temporary fields**

`p_tmp`

a temporary storage the current point`p`

.`grad_tmp`

a temporary storage for the gradient at the current`p`

.`matrix`

a temporary storage for the matrix representation of the approximating operator.`basis`

a temporary storage for an orthonormal basis at the current`p`

.

**Constructor**

`ApproxHessianSymmetricRankOne(M, p, gradF; kwargs...)`

**Keyword arguments**

`initial_operator`

(`Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))`

) the matrix representation of the initial approximating operator.`basis`

(`DefaultOrthonormalBasis()`

) an orthonormal basis in the tangent space of the initial iterate p.`nu`

(`-1`

)`evaluation`

(`AllocatingEvaluation`

) whether the gradient is given as an allocation function or an in-place (`InplaceEvaluation`

).`vector_transport_method`

(`ParallelTransport()`

) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.

`Manopt.ApproxHessianBFGS`

— Type`ApproxHessianBFGS{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian`

A functor to approximate the Hessian by the BFGS update.

**Fields**

`gradient!!`

the gradient function (either allocating or mutating, see`evaluation`

parameter).`scale`

`vector_transport_method`

a vector transport to use.

**Internal temporary fields**

`p_tmp`

a temporary storage the current point`p`

.`grad_tmp`

a temporary storage for the gradient at the current`p`

.`matrix`

a temporary storage for the matrix representation of the approximating operator.`basis`

a temporary storage for an orthonormal basis at the current`p`

.

**Constructor**

`ApproxHessianBFGS(M, p, gradF; kwargs...)`

**Keyword arguments**

`initial_operator`

(`Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))`

) the matrix representation of the initial approximating operator.`basis`

(`DefaultOrthonormalBasis()`

) an orthonormal basis in the tangent space of the initial iterate p.`nu`

(`-1`

)`evaluation`

(`AllocatingEvaluation`

) whether the gradient is given as an allocation function or an in-place (`InplaceEvaluation`

).`vector_transport_method`

(`ParallelTransport()`

) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.

as well as their (non-exported) common supertype

`Manopt.AbstractApproxHessian`

— Type`AbstractApproxHessian <: Function`

An abstract supertypes for approximate Hessian functions, declares them also to be functions.

## Technical details

The `trust_regions`

solver requires the following functions of a manifold to be available

- A
`retract!`

`(M, q, p, X)`

; it is recommended to set the`default_retraction_method`

to a favourite retraction. If this default is set, a`retraction_method=`

does not have to be specified. - By default the stopping criterion uses the
`norm`

as well, to stop when the norm of the gradient is small, but if you implemented`inner`

, the norm is provided already. - if you do not provide an initial
`max_trust_region_radius`

, a`manifold_dimension`

is required. - A `copyto!
`(M, q, p)`

and`copy`

`(M,p)`

for points. - By default the tangent vectors are initialized calling
`zero_vector`

`(M,p)`

.

## Literature

- [ABG06]
- P.-A. Absil, C. Baker and K. Gallivan.
*Trust-Region Methods on Riemannian Manifolds*. Foundations of Computational Mathematics**7**, 303–330 (2006). - [CGT00]
- A. R. Conn, N. I. Gould and P. L. Toint.
*Trust Region Methods*(Society for Industrial and Applied Mathematics, 2000).