The Riemannian trust regions solver

Minimize a function

\[\operatorname*{arg\,min}_{p ∈ \mathcal{M}}\ f(p)\]

using the Riemannian trust-regions solver following [ABG06]. At the $k$th iterate $p_k$, a model is built by locally lifting the objective to the tangent space: the cost function $f$ is mapped to $f_k: T_{p_k}\mathcal M → ℝ$, $f_k(X) = f(\operatorname{retr}_{p_k}(X))$. The trust-region subproblem is then defined as

\[\operatorname*{arg\,min}_{X ∈ T_{p_k}\mathcal M}\ m_k(X),\]

where

\[\begin{align*} m_k&: T_{p_k}\mathcal M → ℝ,\\ m_k(X) &= f(p_k) + ⟨\operatorname{grad} f(p_k), X⟩_{p_k} + \frac{1}{2}⟨\mathcal H_k(X), X⟩_{p_k}\\ \text{such that}&\ \lVert X \rVert_{p_k} ≤ Δ_k. \end{align*}\]

Here $Δ_k$ is a trust-region radius that is adapted in every iteration, and $\mathcal H_k$ is some symmetric linear operator that approximates the Hessian $\operatorname{Hess} f$ of $f$.
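
To make this concrete, the following is a minimal sketch that minimizes a Rayleigh quotient on a sphere, assuming Manifolds.jl is available; the cost, gradient, and Hessian below are written out for this illustration only and are not part of the solver interface.

using Manopt, Manifolds, LinearAlgebra

# Minimize the Rayleigh quotient f(p) = pᵀAp over the unit sphere S^{n-1};
# a minimizer is a unit eigenvector to the smallest eigenvalue of A.
n = 5
A = Symmetric(randn(n, n))
M = Sphere(n - 1)

f(M, p) = p' * A * p
# Riemannian gradient: the Euclidean gradient 2Ap projected onto T_pM
grad_f(M, p) = 2 * (A * p - (p' * A * p) * p)
# Riemannian Hessian applied to a tangent vector X ∈ T_pM
Hess_f(M, p, X) = 2 * (A * X - (p' * A * X) * p - (p' * A * p) * X)

q = trust_regions(M, f, grad_f, Hess_f, rand(M))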

Interface

Manopt.trust_regions — Function
trust_regions(M, f, grad_f, Hess_f, p=rand(M))
trust_regions(M, f, grad_f, p=rand(M))

run the Riemannian trust-regions solver for optimization on manifolds to minimize f; see [ABG06, CGT00].

For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference. By default, the inner trust-region subproblem of finding an update vector is solved using truncated_conjugate_gradient_descent.

Input

  • M: a manifold $\mathcal M$
  • f: a cost function $f : \mathcal M → ℝ$ to minimize
  • grad_f: the gradient $\operatorname{grad}f : \mathcal M → T\mathcal M$ of $f$
  • Hess_f: (optional) the Hessian $\operatorname{Hess}f(p): T_p\mathcal M → T_p\mathcal M$, $X ↦ \operatorname{Hess}f(p)[X] = ∇_X\operatorname{grad}f(p)$
  • p: (rand(M)) an initial value $p ∈ \mathcal M$

Keyword arguments

  • acceptance_rate: (0.1) the accept/reject threshold: if ρ (the performance ratio for the iterate) is at least this value, the candidate is accepted. This value should be between $0$ and $\frac{1}{4}$
  • augmentation_threshold: (0.75) trust-region augmentation threshold: if ρ is larger than this threshold and the subsolver stopped at the trust-region boundary or with negative curvature, the radius is extended (augmented)
  • augmentation_factor: (2.0) trust-region augmentation factor
  • evaluation: (AllocatingEvaluation) specify whether the gradient and Hessian work by allocation (default) or in-place (InplaceEvaluation)
  • κ: (0.1) the linear convergence target rate of the tCG method truncated_conjugate_gradient_descent, used in a stopping criterion therein
  • max_trust_region_radius: (sqrt(manifold_dimension(M))) the maximum trust-region radius
  • preconditioner: a preconditioner (a symmetric, positive definite operator that should approximate the inverse of the Hessian)
  • project!: (copyto!) specify a projection operation for tangent vectors within the subsolver for numerical stability. The required form is (M, Y, p, X) -> ..., working in place of Y.
  • randomize: (false) set to true if the trust-region solve should be initiated with a random tangent vector; in that case, no preconditioner is used.
  • ρ_regularization: (1e3) regularize the performance evaluation $ρ$ to avoid numerical inaccuracies.
  • reduction_factor: (0.25) trust-region reduction factor
  • reduction_threshold: (0.1) trust-region reduction threshold: if ρ is below this threshold, the trust region radius is reduced by reduction_factor.
  • retraction_method: (default_retraction_method(M, typeof(p))) a retraction to use
  • stopping_criterion: (StopAfterIteration(1000) | StopWhenGradientNormLess(1e-6)) a functor inheriting from StoppingCriterion indicating when to stop.
  • sub_kwargs: keyword arguments passed to the sub state and used to decorate the sub options
  • sub_stopping_criterion: a stopping criterion for the subsolver; by default the same criteria as in truncated_conjugate_gradient_descent
  • sub_problem: (a DefaultManoptProblem on the tangent space using the TrustRegionModelObjective, cf. TrustRegionsState) the problem for the subsolver
  • sub_state: (TruncatedConjugateGradientState(M, p, X)) the state for the subsolver, using sub_stopping_criterion as its stopping criterion. See also sub_kwargs.
  • θ: (1.0) 1+θ is the superlinear convergence target rate of the tCG method truncated_conjugate_gradient_descent, used in a stopping criterion therein
  • trust_region_radius: (max_trust_region_radius / 8) the initial trust-region radius
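
As a usage sketch with some of these keywords, reusing the illustrative Rayleigh-quotient setup from the introduction (the Hessian is omitted here, so the finite-difference approximation mentioned above is used):

using Manopt, Manifolds, LinearAlgebra

n = 5
A = Symmetric(randn(n, n))
M = Sphere(n - 1)
f(M, p) = p' * A * p
grad_f(M, p) = 2 * (A * p - (p' * A * p) * p)

# No Hessian is passed, so a finite-difference approximation is used internally.
q = trust_regions(
    M, f, grad_f, rand(M);
    trust_region_radius = 0.5,
    max_trust_region_radius = 2.0,
    stopping_criterion = StopAfterIteration(500) | StopWhenGradientNormLess(1e-8),
)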


Output

the obtained (approximate) minimizer $p^*$, see get_solver_return for details

See also

truncated_conjugate_gradient_descent

Manopt.trust_regions! — Function
trust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)

evaluate the Riemannian trust-regions solver in place of p.

Input

  • M: a manifold $\mathcal M$
  • f: a cost function $f: \mathcal M → ℝ$ to minimize
  • grad_f: the gradient $\operatorname{grad}f: \mathcal M → T \mathcal M$ of $f$
  • Hess_f: (optional) the Hessian $\operatorname{Hess} f$
  • p: an initial value $p ∈ \mathcal M$

For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference.

For more details and all options, see trust_regions.
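
A minimal sketch of the in-place call, again with the illustrative Rayleigh-quotient setup:

using Manopt, Manifolds, LinearAlgebra

n = 5
A = Symmetric(randn(n, n))
M = Sphere(n - 1)
f(M, p) = p' * A * p
grad_f(M, p) = 2 * (A * p - (p' * A * p) * p)

p = rand(M)
trust_regions!(M, f, grad_f, p)  # p now holds the (approximate) minimizer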


State

Manopt.TrustRegionsState — Type
TrustRegionsState <: AbstractHessianSolverState

Store the state of the trust-regions solver.

Fields

All the following fields (besides p) can be set by specifying them as keywords.

  • acceptance_rate: (0.1) a lower bound of the performance ratio for the iterate that decides if the iteration is accepted or not.
  • max_trust_region_radius: (sqrt(manifold_dimension(M))) the maximum trust-region radius
  • p: (rand(M) if a manifold is provided) the current iterate
  • project!: (copyto!) specify a projection operation for tangent vectors for numerical stability. A function (M, Y, p, X) -> ..., working in place of Y. By default, no projection is performed; set it to project! to activate projection.
  • stop: (StopAfterIteration(1000) | StopWhenGradientNormLess(1e-6))
  • randomize: (false) indicates if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner is used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
  • ρ_regularization: (10000.0) regularize the model fitness $ρ$ to avoid division by zero
  • sub_problem: an AbstractManoptProblem or a function (M, p, X) -> q or, working in place of q, (M, q, p, X), for a closed-form solution of the sub problem
  • sub_state: (TruncatedConjugateGradientState(M, p, X))
  • σ: (0.0 or 1e-6 depending on randomize) Gaussian standard deviation when creating the random initial tangent vector
  • trust_region_radius: (max_trust_region_radius / 8) the (initial) trust-region radius
  • X: (zero_vector(M,p)) the current gradient grad_f(p). Use this default to specify the type of tangent vector to allocate for the internal (tangent vector) fields as well.

Internal fields

  • HX, HY, HZ: interim storage (to avoid allocation) of $\operatorname{Hess} f(p)[\cdot]$ applied to X, Y, and Z
  • Y: the solution (tangent vector) of the subsolver
  • Z: the Cauchy point (only used if randomize is activated)

Constructors

All the following constructors have the fields as keyword arguments with the defaults given in brackets. If no initial point p is provided, p=rand(M) is used.

TrustRegionsState(M, mho; kwargs...)
TrustRegionsState(M, p, mho; kwargs...)

A trust region state, where the sub problem is set to a DefaultManoptProblem on the tangent space using the TrustRegionModelObjective, to be solved with truncated_conjugate_gradient_descent!; in other words, the sub state is set to a TruncatedConjugateGradientState.
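
A short sketch of this constructor, assuming mho is a ManifoldHessianObjective built from the illustrative Rayleigh-quotient functions used earlier on this page:

using Manopt, Manifolds, LinearAlgebra

n = 5
A = Symmetric(randn(n, n))
M = Sphere(n - 1)
f(M, p) = p' * A * p
grad_f(M, p) = 2 * (A * p - (p' * A * p) * p)
Hess_f(M, p, X) = 2 * (A * X - (p' * A * X) * p - (p' * A * p) * X)

mho = ManifoldHessianObjective(f, grad_f, Hess_f)
trs = TrustRegionsState(M, rand(M), mho)  # sub problem defaults to truncated CG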

TrustRegionsState(M, sub_problem, sub_state; kwargs...)
TrustRegionsState(M, p, sub_problem, sub_state; kwargs...)

A trust region state, where the sub problem is solved using an AbstractManoptProblem sub_problem and an AbstractManoptSolverState sub_state.

TrustRegionsState(M, f::Function; evaluation=AllocatingEvaluation, kwargs...)
TrustRegionsState(M, p, f; evaluation=AllocatingEvaluation, kwargs...)

A trust region state, where the sub problem is solved in closed form by a function f(M, p, Y, Δ), where p is the current iterate, Y the initial tangent vector at p, and Δ the current trust-region radius.
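
As an illustration of this closed-form variant, the following hypothetical helper returns the Cauchy point of the model, closing over grad_f and Hess_f; neither the helper nor its name is part of Manopt.

using Manopt, Manifolds, LinearAlgebra

# Hypothetical closed-form sub solver: minimize the model along -grad f(p),
# truncated to the trust-region radius Δ (the Cauchy point).
function cauchy_point_solver(grad_f, Hess_f)
    return function (M, p, Y, Δ)
        g = grad_f(M, p)
        gnorm = norm(M, p, g)
        gnorm == 0 && return zero_vector(M, p)
        gHg = inner(M, p, g, Hess_f(M, p, g))
        # the unconstrained minimizer along -g is gnorm^2/gHg for positive curvature
        t = gHg > 0 ? min(Δ / gnorm, gnorm^2 / gHg) : Δ / gnorm
        return -t * g
    end
end

# trs = TrustRegionsState(M, p, cauchy_point_solver(grad_f, Hess_f))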

See also

trust_regions, trust_regions!


Approximation of the Hessian

Several different methods to approximate the Hessian are available.

Manopt.ApproxHessianFiniteDifference — Type
ApproxHessianFiniteDifference{E, P, T, G, RTR, VTR, R <: Real} <: AbstractApproxHessian

A functor to approximate the Hessian by a finite difference of gradient evaluations.

Given a point p, a direction X, and the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of a function $F$, the Hessian is approximated as follows: let $c$ be a stepsize, $X ∈ T_p\mathcal M$ a tangent vector, and $q = \operatorname{retr}_p(\frac{c}{\lVert X \rVert_p}X)$ a step of length $c$ in direction $X$ following a retraction. The Hessian is then approximated by the finite difference of the gradients, where $\mathcal T_{\cdot\gets\cdot}$ denotes a vector transport:

\[\operatorname{Hess}F(p)[X] ≈ \frac{\lVert X \rVert_p}{c}\Bigl( \mathcal T_{p\gets q}\bigl(\operatorname{grad}F(q)\bigr) - \operatorname{grad}F(p) \Bigr)\]
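
A minimal hand-rolled sketch of this formula in terms of ManifoldsBase functions; the name finite_difference_hess and the allocating gradient signature are assumptions for illustration.

using Manifolds, LinearAlgebra

# Approximate Hess F(p)[X] by a finite difference of transported gradients.
function finite_difference_hess(M, grad_f, p, X; c = 2.0^-14)
    nX = norm(M, p, X)
    iszero(nX) && return zero_vector(M, p)
    q = retract(M, p, (c / nX) * X)                  # step of length c in direction X
    G = vector_transport_to(M, q, grad_f(M, q), p)   # transport grad F(q) to T_pM
    return (nX / c) * (G - grad_f(M, p))
end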

Fields

  • gradient!!: the gradient function (either allocating or mutating, see evaluation parameter)
  • step_length: a step length for the finite difference
  • retraction_method: a retraction to use
  • vector_transport_method: a vector transport to use

Internal temporary fields

  • grad_tmp: a temporary storage for the gradient at the current p
  • grad_dir_tmp: a temporary storage for the gradient at the current p_dir
  • p_dir::P: a temporary storage for the forward direction (the $q$ in the formula)

Constructor

ApproxHessianFiniteDifference(M, p, grad_f; kwargs...)

Keyword arguments

  • evaluation: (AllocatingEvaluation) whether the gradient is given as an allocating function or an in-place function (InplaceEvaluation).
  • steplength: ($2^{-14}$) step length $c$ used in the finite difference of gradient evaluations
  • retraction_method: (default_retraction_method(M, typeof(p))) a retraction to use in the approximation.
  • vector_transport_method: (default_vector_transport_method(M, typeof(p))) a vector transport to use
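
A sketch of passing this functor explicitly in place of a Hessian to trust_regions, which is what the finite-difference fallback does internally; the Rayleigh-quotient setup is again illustrative:

using Manopt, Manifolds, LinearAlgebra

n = 5
A = Symmetric(randn(n, n))
M = Sphere(n - 1)
f(M, p) = p' * A * p
grad_f(M, p) = 2 * (A * p - (p' * A * p) * p)
p0 = rand(M)

approx_hess = ApproxHessianFiniteDifference(M, p0, grad_f)
q = trust_regions(M, f, grad_f, approx_hess, p0)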
Manopt.ApproxHessianSymmetricRankOne — Type
ApproxHessianSymmetricRankOne{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian

A functor to approximate the Hessian by the symmetric rank one update.

Fields

  • gradient!!: the gradient function (either allocating or mutating, see evaluation parameter).
  • ν: a small real number to ensure that the denominator in the update does not become too small and thus the method does not break down.
  • vector_transport_method: a vector transport to use.

Internal temporary fields

  • p_tmp: a temporary storage for the current point p.
  • grad_tmp: a temporary storage for the gradient at the current p.
  • matrix: a temporary storage for the matrix representation of the approximating operator.
  • basis: a temporary storage for an orthonormal basis at the current p.

Constructor

ApproxHessianSymmetricRankOne(M, p, gradF; kwargs...)

Keyword arguments

  • initial_operator: (Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.
  • basis: (DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.
  • nu: (-1) the value for the field ν.
  • evaluation: (AllocatingEvaluation) whether the gradient is given as an allocating function or an in-place function (InplaceEvaluation).
  • vector_transport_method: (ParallelTransport()) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.
Manopt.ApproxHessianBFGS — Type
ApproxHessianBFGS{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian

A functor to approximate the Hessian by the BFGS update.

Fields

  • gradient!!: the gradient function (either allocating or mutating, see evaluation parameter).
  • scale
  • vector_transport_method: a vector transport to use.

Internal temporary fields

  • p_tmp: a temporary storage for the current point p.
  • grad_tmp: a temporary storage for the gradient at the current p.
  • matrix: a temporary storage for the matrix representation of the approximating operator.
  • basis: a temporary storage for an orthonormal basis at the current p.

Constructor

ApproxHessianBFGS(M, p, gradF; kwargs...)

Keyword arguments

  • initial_operator: (Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.
  • basis: (DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.
  • nu: (-1)
  • evaluation: (AllocatingEvaluation) whether the gradient is given as an allocating function or an in-place function (InplaceEvaluation).
  • vector_transport_method: (ParallelTransport()) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.

as well as their (non-exported) common supertype Manopt.AbstractApproxHessian.

Technical details

The trust_regions solver requires the following functions of a manifold to be available:

  • A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= keyword does not have to be specified.
  • By default the stopping criterion uses the norm of the gradient to stop when it is small; if you implemented inner, the norm is provided already.
  • If you do not provide an initial max_trust_region_radius, a manifold_dimension(M) is required.
  • A copyto!(M, q, p) and copy(M, p) for points.
  • By default the tangent vectors are initialized calling zero_vector(M,p).

Literature

[ABG06]
P.-A. Absil, C. Baker and K. Gallivan. Trust-Region Methods on Riemannian Manifolds. Foundations of Computational Mathematics 7, 303–330 (2006).
[CGT00]
A. R. Conn, N. I. Gould and P. L. Toint. Trust Region Methods (Society for Industrial and Applied Mathematics, 2000).