The Riemannian Trust-Regions Solver

The aim is to solve an optimization problem on a manifold

\[\operatorname*{min}_{x ∈ \mathcal{M}} F(x)\]

by using the Riemannian trust-regions solver. It is the method of choice for smooth optimization problems on manifolds. This trust-region method uses the Steihaug-Toint truncated conjugate-gradient method truncated_conjugate_gradient_descent to solve the inner minimization problem, called the trust-region subproblem. The inner solver can be preconditioned by providing a preconditioner (a symmetric, positive definite approximation of the inverse of the Hessian of $F$). If no Hessian of the cost function $F$ is provided, a standard approximation of the Hessian based on the gradient $\operatorname{grad}F$ is computed with ApproxHessianFiniteDifference.

Initialization

Initialize $x_0 = x$ with an initial point $x$ on the manifold, either given by the caller or chosen at random. Set the initial trust-region radius $\Delta =\frac{1}{8} \bar{\Delta}$, where $\bar{\Delta}$ is the maximum radius the trust-region may attain; usually one uses the square root of the manifold's dimension, $\sqrt{\operatorname{dim}(\mathcal{M})}$. For accepting the next iterate and updating the trust-region radius, one needs an accept/reject threshold $\rho' ∈ [0,\frac{1}{4})$, which is $\rho' = 0.1$ by default. Set $k=0$.

Iteration

Repeat until a convergence criterion is reached

  1. Set $η$ to a random tangent vector if the randomized approach is used. Otherwise set $η$ to the zero vector in the tangent space $T_{x_k}\mathcal{M}$.
  2. Set $η^*$ as the solution of the trust-region subproblem, computed by the tcg-method with $η$ as initial vector.
  3. If the randomized approach is used, compare $η^*$ with the Cauchy point $η_{c}^* = -\tau_{c} \frac{\Delta}{\lVert \operatorname{Grad}[F] (x_k) \rVert_{x_k}} \operatorname{Grad}[F] (x_k)$ using the model function $m_{x_k}(⋅)$. If the model decrease is larger for the Cauchy point, set $η^* = η_{c}^*$.
  4. Set ${x}^* = \operatorname{retr}_{x_k}(η^*)$.
  5. Set $\rho = \frac{F(x_k)-F({x}^*)}{m_{x_k}(η)-m_{x_k}(η^*)}$, where $m_{x_k}(⋅)$ describes the quadratic model function.
  6. Update the trust-region radius: $\Delta = \begin{cases}\frac{1}{4} \Delta &\text{ if } \rho < \frac{1}{4} \, \text{or} \, m_{x_k}(η)-m_{x_k}(η^*) \leq 0 \, \text{or} \, \rho = \pm\infty , \\\operatorname{min}(2 \Delta, \bar{\Delta}) &\text{ if } \rho > \frac{3}{4} \, \text{and the tcg-method stopped because of negative curvature or exceeding the trust-region},\\\Delta & \, \text{otherwise.}\end{cases}$
  7. If $m_{x_k}(η)-m_{x_k}(η^*) \geq 0$ and $\rho > \rho'$ set $x_k = {x}^*$.
  8. Set $k = k+1$.
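
These steps can be summarized in a schematic Julia sketch. This is not the Manopt.jl implementation: solve_tr_subproblem, retract_step, and model_value are hypothetical helpers standing in for the tcg-method, the retraction, and the quadratic model $m_{x_k}$; the convergence criterion is reduced to a fixed iteration count, and zero_vector and manifold_dimension are assumed to behave as in ManifoldsBase.jl.

```julia
# schematic outer trust-region loop; F, gradF, HessF follow the (M, x[, X]) convention
function trust_region_sketch(M, F, gradF, HessF, x; ρ_prime=0.1,
                             Δ_bar=sqrt(manifold_dimension(M)), maxiter=1000)
    Δ = Δ_bar / 8                                             # initial trust-region radius
    for k in 1:maxiter
        η0 = zero_vector(M, x)                                # step 1 (or a small random tangent vector)
        η = solve_tr_subproblem(M, F, gradF, HessF, x, η0, Δ) # step 2: Steihaug-Toint tcg
        x_trial = retract_step(M, x, η)                       # step 4: candidate iterate
        decrease = model_value(M, F, gradF, HessF, x, η0) -
                   model_value(M, F, gradF, HessF, x, η)
        ρ = (F(M, x) - F(M, x_trial)) / decrease              # step 5: performance ratio
        if ρ < 1 / 4 || decrease <= 0 || !isfinite(ρ)         # step 6: update the radius
            Δ /= 4
        elseif ρ > 3 / 4                                      # (and the tcg hit the boundary or negative curvature)
            Δ = min(2Δ, Δ_bar)
        end
        if decrease >= 0 && ρ > ρ_prime                       # step 7: accept or reject
            x = x_trial
        end
    end
    return x
end
```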

Result

The result is given by the last computed $x_k$.

Remarks

To the initialization: if the caller does not provide an initial point, a random point on the manifold is used.

To step number 1: using the randomized approach means using a random tangent vector as the initial vector for the approximate solve of the trust-region subproblem. In this case, keep in mind that the vector must lie within the trust-region radius. This is achieved by repeatedly multiplying $η$ by sqrt(sqrt(eps(Float64))) as long as its norm is greater than the current trust-region radius $\Delta$, as sketched below. If the randomized approach is not used, $η$ is the zero tangent vector.
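
A minimal sketch of this shrinking step, assuming norm from ManifoldsBase.jl and M, x, η, Δ as in the current iteration:

```julia
# shrink a random initial tangent vector until it lies inside the trust region
while norm(M, x, η) > Δ
    η *= sqrt(sqrt(eps(Float64)))   # multiply by the fourth root of machine epsilon
end
```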

To step number 2: obtain $η^*$ by (approximately) solving the trust-regions subproblem

\[\operatorname*{arg\,min}_{η ∈ T_{x_k}\mathcal{M}} m_{x_k}(η) = F(x_k) + \langle \operatorname{grad}F(x_k), η \rangle_{x_k} + \frac{1}{2} \langle \operatorname{Hess}[F](η)_ {x_k}, η \rangle_{x_k}\]

\[\text{s.t.} \; \langle η, η \rangle_{x_k} \leq {\Delta}^2\]

with the Steihaug-Toint truncated conjugate-gradient (tcg) method. The problem as well as the solution method is described in truncated_conjugate_gradient_descent. In this inner solver, the stopping criteria StopIfResidualIsReducedByFactor and StopIfResidualIsReducedByPower are used so that superlinear or at least linear convergence of the trust-region method can be achieved.
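
Combined, the inner iteration stops once the current residual norm has been reduced either by the factor κ (linear rate) or to the power 1+θ of the initial residual norm (superlinear rate), with θ and κ as listed under Optional below. A minimal sketch of this check (the function name is illustrative, not the Manopt API):

```julia
# stop the tcg once ‖r_j‖ ≤ ‖r_0‖ ⋅ min(κ, ‖r_0‖^θ)
tcg_residual_stop(r_norm, r0_norm; θ=1.0, κ=0.1) = r_norm <= r0_norm * min(κ, r0_norm^θ)
```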

To step number 3: if a random tangent vector is used as the initial vector, the result of the tcg-method is compared with the Cauchy point. Convergence proofs assume that at least (a fraction of) the Cauchy decrease is achieved. The idea is to move in the direction of the negative gradient to the point that minimizes the model along that direction; this point can lie on the boundary of the trust region, but also in its interior. The parameter $\tau_{c}$ for the optimal step length is defined by

\[\tau_{c} = \begin{cases} 1 & \text{ if } \langle \operatorname{Grad}[F] (x_k), \, \operatorname{Hess}[F] (\operatorname{Grad}[F] (x_k))_ {x_k}\rangle_{x_k} \leq 0 , \\ \operatorname{min}\Bigl(\frac{\lVert \operatorname{Grad}[F] (x_k) \rVert_{x_k}^3} {\Delta \langle \operatorname{Grad}[F] (x_k), \, \operatorname{Hess}[F] (\operatorname{Grad}[F] (x_k))_ {x_k}\rangle_{x_k}}, 1\Bigr) & \, \text{otherwise.} \end{cases}\]

To check the model decrease one compares

\[m_{x_k}(η_{c}^*) = F(x_k) + \langle η_{c}^*, \operatorname{Grad}[F] (x_k)\rangle_{x_k} + \frac{1}{2}\langle η_{c}^*, \operatorname{Hess}[F] (η_{c}^*)_ {x_k}\rangle_{x_k}\]

with

\[m_{x_k}(η^*) = F(x_k) + \langle η^*, \operatorname{Grad}[F] (x_k)\rangle_{x_k} + \frac{1}{2}\langle η^*, \operatorname{Hess}[F] (η^*)_ {x_k}\rangle_{x_k}.\]

If $m_{x_k}(η_{c}^*) < m_{x_k}(η^*)$, the Cauchy point $η_{c}^*$ is the better choice; a minimal sketch of this comparison follows.
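
A sketch of this Cauchy-point safeguard, assuming inner and norm from ManifoldsBase.jl, η_star as the tcg result $η^*$, and the (M, x[, X]) calling convention for the cost, gradient, and Hessian; all other names are illustrative:

```julia
g = gradF(M, x)                                   # Grad F(x_k)
Hg = HessF(M, x, g)                               # Hess F(x_k) applied to the gradient
gHg = inner(M, x, g, Hg)
gnorm = norm(M, x, g)
τ_c = gHg <= 0 ? 1.0 : min(gnorm^3 / (Δ * gHg), 1.0)
η_c = -(τ_c * Δ / gnorm) * g                      # the Cauchy point
m(η) = F(M, x) + inner(M, x, g, η) + inner(M, x, HessF(M, x, η), η) / 2
if m(η_c) < m(η_star)
    η_star = η_c                                  # the Cauchy point yields the larger model decrease
end
```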

To step number 4: $\operatorname{retr}_{x_k}(⋅)$ denotes the retraction, a mapping $\operatorname{retr}_{x_k}:T_{x_k}\mathcal{M} \rightarrow \mathcal{M}$ which approximates the exponential map. In some cases it is cheaper to use a retraction instead of the exponential map.

To step number 6: one knows that the truncated_conjugate_gradient_descent algorithm stopped for one of these reasons when one of the stopping criteria StopWhenCurvatureIsNegative or StopWhenTrustRegionIsExceeded is activated.

To step number 7: the last step is to decide if the new point ${x}^*$ is accepted.

Interface

Manopt.trust_regions — Function
trust_regions(M, F, gradF, hessF, x)

evaluate the Riemannian trust-regions solver for optimization on manifolds. It will attempt to minimize the cost function F on the Manifold M. If no Hessian H is provided, a standard approximation of the Hessian based on the gradient gradF will be computed. For solving the inner trust-region subproblem of finding an update vector, it uses the Steihaug-Toint truncated conjugate-gradient method. For a description of the algorithm and more details see the description above.

Input

  • M – a manifold $\mathcal M$
  • F – a cost function $F : \mathcal M → ℝ$ to minimize
  • gradF – the gradient $\operatorname{grad}F : \mathcal M → T \mathcal M$ of $F$
  • x – an initial value $x ∈ \mathcal M$
  • HessF – the Hessian $\operatorname{Hess}F(x): T_x\mathcal M → T_x\mathcal M$, $X ↦ \operatorname{Hess}F(x)[X] = ∇_X\operatorname{grad}F(x)$

Optional

  • evaluation – (AllocatingEvaluation) specify whether the gradient and hessian work by allocation (default) or MutatingEvaluation in place
  • max_trust_region_radius – (sqrt(manifold_dimension(M))) the maximum trust-region radius
  • preconditioner – a preconditioner (a symmetric, positive definite operator that should approximate the inverse of the Hessian)
  • randomize – set to true if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
  • project! : (copyto!) specify a projection operation for tangent vectors within the TCG for numerical stability. A function (M, Y, p, X) -> ... working in place of Y. By default, no projection is performed; set it to project! to activate projection.
  • retraction – (default_retraction_method(M)) approximation of the exponential map
  • stopping_criterion – (StopWhenAny(StopAfterIteration(1000), StopWhenGradientNormLess(10^(-6)))) a functor inheriting from StoppingCriterion indicating when to stop.
  • trust_region_radius - the initial trust-region radius
  • ρ_prime – Accept/reject threshold: if ρ (the performance ratio for the iterate) is at least ρ', the outer iteration is accepted; otherwise it is rejected and the trust-region radius will have been decreased. To ensure this, ρ' ≥ 0 must be strictly smaller than 1/4. If ρ_prime is negative, the algorithm is not guaranteed to produce monotonically decreasing cost values. It is strongly recommended to set ρ' > 0 to aid convergence.
  • ρ_regularization – Close to convergence, evaluating the performance ratio ρ is numerically challenging. Meanwhile, close to convergence, the quadratic model should be a good fit and the steps should be accepted. Regularization lets ρ go to 1 as the model decrease and the actual decrease go to zero. Set this option to zero to disable regularization (not recommended). When this is not zero, it may happen that the iterates produced are not monotonically improving the cost when very close to convergence. This is because the corrected cost improvement could change sign if it is negative but very small.
  • θ – (1.0) 1+θ is the superlinear convergence target rate of the tCG-method truncated_conjugate_gradient_descent, which computes an approximate solution for the trust-region subproblem. The tCG-method aborts if the residual is less than or equal to the initial residual to the power of 1+θ.
  • κ – (0.1) the linear convergence target rate of the tCG-method truncated_conjugate_gradient_descent, which computes an approximate solution for the trust-region subproblem. The method aborts if the residual is less than or equal to κ times the initial residual.
  • η_1 – (0.1) Trust-region reduction threshold: if ρ (the performance ratio for the iterate) is less than η_1, the trust-region radius (and thus the trust region) is decreased.
  • η_2 – (0.75) Trust-region augmentation threshold: if ρ (the performance ratio for the iterate) is greater than η_2 and further conditions apply, the trust-region radius (and thus the trust region) is increased.
  • return_options – (false) if activated, the extended result, i.e. the complete Options, is returned. This can be used to access recorded values. If set to false (default), just the optimal value x_opt is returned.

Output

the obtained (approximate) minimizer $x^*$, see get_solver_return for details
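
A minimal usage sketch: minimizing the Rayleigh quotient on a sphere, whose minimizer is an eigenvector to the smallest eigenvalue of A. The gradient and Hessian formulas are specific to this example, the Sphere manifold is assumed to come from Manifolds.jl, and the (M, x[, X]) call signatures follow the conventions above:

```julia
using Manopt, Manifolds, LinearAlgebra

n = 5
A = Symmetric(randn(n, n))
M = Sphere(n - 1)

F(M, x) = x' * A * x                                                 # cost: Rayleigh quotient
gradF(M, x) = 2 * (A * x - (x' * A * x) * x)                         # Riemannian gradient on the sphere
HessF(M, x, X) = 2 * (A * X - (x' * A * X) * x - (x' * A * x) * X)   # Riemannian Hessian applied to X

x0 = normalize(randn(n))                                             # a (random) point on the sphere
x_opt = trust_regions(M, F, gradF, HessF, x0; max_trust_region_radius=8.0)
```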

see also

truncated_conjugate_gradient_descent

source
Manopt.trust_regions! — Function
trust_regions!(M, F, gradF, hessF, x; kwargs...)

evaluate the Riemannian trust-regions solver for optimization on manifolds in place of x.

Input

  • M – a manifold $\mathcal M$
  • F – a cost function $F: \mathcal M → ℝ$ to minimize
  • gradF – the gradient $\operatorname{grad}F: \mathcal M → T \mathcal M$ of $F$
  • x – an initial value $x ∈ \mathcal M$
  • H – the hessian $H( \mathcal M, x, ξ)$ of $F$

for more details and all options, see trust_regions

source

Options

Manopt.AbstractHessianOptions — Type
AbstractHessianOptions <: Options

An Options type to represent algorithms that employ the Hessian. These options are assumed to have a field (gradient) to store the current gradient $\operatorname{grad}f(x)$

source
Manopt.TrustRegionsOptions — Type
TrustRegionsOptions <: AbstractHessianOptions

describe the trust-regions solver, with

Fields

where all but x are keyword arguments in the constructor

  • x : a point as starting point
  • stop : (StopAfterIteration(1000) | StopWhenGradientNormLess(1e-6)) the stopping criterion
  • trust_region_radius : the (initial) trust-region radius
  • max_trust_region_radius : (sqrt(manifold_dimension(M))) the maximum trust-region radius
  • randomize : (false) indicates if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
  • project! : (copyto!) specify a projection operation for tangent vectors for numerical stability. A function (M, Y, p, X) -> ... working in place of Y. By default, no projection is performed; set it to project! to activate projection.
  • ρ_prime : (0.1) a lower bound of the performance ratio for the iterate that decides if the iteration is accepted or not. If it is not accepted, the trust-region radius will have been decreased. To ensure this, ρ' ≥ 0 must be strictly smaller than 1/4. If ρ' is negative, the algorithm is not guaranteed to produce monotonically decreasing cost values. It is strongly recommended to set ρ' > 0 to aid convergence.
  • ρ_regularization : (10000.0) Close to convergence, evaluating the performance ratio ρ is numerically challenging. Meanwhile, close to convergence, the quadratic model should be a good fit and the steps should be accepted. Regularization lets ρ go to 1 as the model decrease and the actual decrease go to zero. Set this option to zero to disable regularization (not recommended). When this is not zero, it may happen that the iterates produced are not monotonically improving the cost when very close to convergence. This is because the corrected cost improvement could change sign if it is negative but very small.

Constructor

TrustRegionsOptions(M, x)

construct a trust-regions Option with all other fields from above being keyword arguments

See also

trust_regions

source

Approximation of the Hessian

Manopt.ApproxHessianFiniteDifference — Type
ApproxHessianFiniteDifference{E, P, T, G, RTR, VTR, R <: Real}

A functor to approximate the Hessian by a finite difference of gradient evaluation.

Given a point p, a direction X, and the gradient $\operatorname{grad}F: \mathcal M \to T\mathcal M$ of a function $F$, the Hessian is approximated as follows: let $c$ be a stepsize, $X ∈ T_p\mathcal M$ a tangent vector, and $q = \operatorname{retr}_p(\frac{c}{\lVert X \rVert_p}X)$ a step of length $c$ in direction $X$ following a retraction. Then the Hessian is approximated by the finite difference of the gradients, where $\mathcal T_{\cdot\gets\cdot}$ is a vector transport.

\[\operatorname{Hess}F(p)[X] ≈ \frac{\lVert X \rVert_p}{c}\Bigl( \mathcal T_{p\gets q}\bigl(\operatorname{grad}F(q)\bigr) - \operatorname{grad}F(p)\Bigr)\]
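
A minimal sketch of this approximation, assuming ManifoldsBase.jl-style retract, vector_transport_to, norm, and zero_vector; the function name and default step length are illustrative only:

```julia
# approximate Hess F(p)[X] by a finite difference of (transported) gradients
function finite_difference_hessian(M, gradF, p, X; c=2.0^-14)
    nX = norm(M, p, X)
    nX == 0 && return zero_vector(M, p)
    q = retract(M, p, (c / nX) * X)                    # step of length c in direction X
    Gq = vector_transport_to(M, q, gradF(M, q), p)     # transport grad F(q) back to T_p M
    return (nX / c) * (Gq - gradF(M, p))
end
```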

Fields

  • gradient!! the gradient function (either allocating or mutating, see evaluation parameter)
  • step_length a step length for the finite difference
  • retraction_method - a retraction to use
  • vector_transport_method a vector transport to use

Internal temporary fields

  • grad_tmp a temporary storage for the gradient at the current p
  • grad_dir_tmp a temporary storage for the gradient at the current p_dir
  • p_dir::P a temporary storage to the forward direction (i.e. $q$ above)

Constructor

ApproxHessianFiniteDifference(M, p, gradF; kwargs...)

Keyword arguments

  • evaluation (AllocatingEvaluation()) whether the gradient is given as an allocation function or an in-place (MutatingEvaluation()).
  • steplength ($2^{-14}$) step length $c$ used for the finite difference of gradient evaluations
  • retraction_method – (default_retraction_method(M)) a retraction(M, p, X) to use in the approximation.
  • vector_transport_method - (default_vector_transport_method(M)) a vector transport to use
source
Manopt.ApproxHessianSymmetricRankOne — Type
ApproxHessianSymmetricRankOne{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real}

A functor to approximate the Hessian by the symmetric rank one update.

Fields

  • gradient!! the gradient function (either allocating or mutating, see evaluation parameter).
  • ν a small real number to ensure that the denominator in the update does not become too small and thus the method does not break down.
  • vector_transport_method a vector transport to use.

Internal temporary fields

  • p_tmp a temporary storage for the current point p.
  • grad_tmp a temporary storage for the gradient at the current p.
  • matrix a temporary storage for the matrix representation of the approximating operator.
  • basis a temporary storage for an orthonormal basis at the current p.

Constructor

ApproxHessianSymmetricRankOne(M, p, gradF; kwargs...)

Keyword arguments

  • initial_operator (Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.
  • basis (DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.
  • nu (-1)
  • evaluation (AllocatingEvaluation()) whether the gradient is given as an allocation function or an in-place (MutatingEvaluation()).
  • vector_transport_method (ParallelTransport()) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.
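
As an illustration of the update itself, a minimal sketch of the symmetric rank-one formula in matrix coordinates (with respect to the stored orthonormal basis), with a denominator safeguard in the spirit of ν; the vector-transport and basis bookkeeping of the actual functor are omitted:

```julia
using LinearAlgebra

# SR1 update of the matrix representation B of the Hessian approximation;
# s is the (coordinate) step, y the corresponding difference of (transported) gradients
function sr1_update!(B::AbstractMatrix, s::AbstractVector, y::AbstractVector; ν=1e-8)
    r = y - B * s
    d = dot(r, s)
    abs(d) <= ν * norm(r) * norm(s) && return B   # skip the update if the denominator is too small
    B .+= (r * r') ./ d
    return B
end
```
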
source
Manopt.ApproxHessianBFGS — Type
ApproxHessianBFGS{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real}

A functor to approximate the Hessian by the BFGS update.

Fields

  • gradient!! the gradient function (either allocating or mutating, see evaluation parameter).
  • scale
  • vector_transport_method a vector transport to use.

Internal temporary fields

  • p_tmp a temporary storage for the current point p.
  • grad_tmp a temporary storage for the gradient at the current p.
  • matrix a temporary storage for the matrix representation of the approximating operator.
  • basis a temporary storage for an orthonormal basis at the current p.

Constructor

ApproxHessianBFGS(M, p, gradF; kwargs...)

Keyword arguments

  • initial_operator (Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.
  • basis (DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.
  • nu (-1)
  • evaluation (AllocatingEvaluation()) whether the gradient is given as an allocation function or an in-place (MutatingEvaluation()).
  • vector_transport_method (ParallelTransport()) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.
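
Analogously to the SR1 sketch above, a minimal sketch of the BFGS update of the matrix representation, again omitting the vector-transport and basis bookkeeping of the actual functor:

```julia
using LinearAlgebra

# BFGS update of the matrix representation B of the Hessian approximation;
# s is the (coordinate) step, y the corresponding difference of (transported) gradients
function bfgs_update!(B::AbstractMatrix, s::AbstractVector, y::AbstractVector)
    Bs = B * s
    B .+= (y * y') ./ dot(y, s) .- (Bs * Bs') ./ dot(s, Bs)
    return B
end
```
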
source