The Riemannian trust regions solver

Minimize a function

\[\operatorname*{\arg\,min}_{p ∈ \mathcal{M}}\ f(p)\]

by using the Riemannian trust-regions solver following [ABG06] a model is build by lifting the objective at the $k$th iterate $p_k$ by locally mapping the cost function $f$ to the tangent space as $f_k: T_{p_k}\mathcal M → ℝ$ as $f_k(X) = f(\operatorname{retr}_{p_k}(X))$. The trust region subproblem is then defined as

\[\operatorname*{arg\,min}_{X ∈ T_{p_k}\mathcal M}\ m_k(X),\]

where

\[\begin{align*} m_k&: T_{p_K}\mathcal M → ℝ,\\ m_k(X) &= f(p_k) + ⟨\operatorname{grad} f(p_k), X⟩_{p_k} + \frac{1}{2}\langle \mathcal H_k(X),X⟩_{p_k}\\ \text{such that}&\ \lVert X \rVert_{p_k} ≤ Δ_k. \end{align*}\]

Here $Δ_k$ is a trust region radius, that is adapted every iteration, and $\mathcal H_k$ is some symmetric linear operator that approximates the Hessian $\operatorname{Hess} f$ of $f$.

Interface

Manopt.trust_regions — Function

trust_regions(M, f, grad_f, hess_f, p=rand(M))
trust_regions(M, f, grad_f, p=rand(M))

run the Riemannian trust-regions solver for optimization on manifolds to minimize f, see on [ABG06, CGT00].

For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference. For solving the inner trust-region subproblem of finding an update-vector, by default the truncated_conjugate_gradient_descent is used.

Input

M: a manifold $\mathcal M$
f: a cost function $f : \mathcal M → ℝ$ to minimize
grad_f: the gradient $\operatorname{grad}F : \mathcal M → T \mathcal M$ of $F$
Hess_f: (optional), the Hessian $\operatorname{Hess}F(x): T_x\mathcal M → T_x\mathcal M$, $X ↦ \operatorname{Hess}F(x)[X] = ∇_ξ\operatorname{grad}f(x)$
p: (rand(M)) an initial value $x ∈ \mathcal M$

Keyword arguments

acceptance_rate: Accept/reject threshold: if ρ (the performance ratio for the iterate) is at least the acceptance rate ρ', the candidate is accepted. This value should be between $0$ and $\frac{1}{4}$
augmentation_threshold: (0.75) trust-region augmentation threshold: if ρ is larger than this threshold, a solution is on the trust region boundary and negative curvature, and the radius is extended (augmented)
augmentation_factor: (2.0) trust-region augmentation factor
evaluation: (AllocatingEvaluation) specify whether the gradient and Hessian work by allocation (default) or InplaceEvaluation in place
κ: (0.1) the linear convergence target rate of the tCG method truncated_conjugate_gradient_descent, and is used in a stopping criterion therein
max_trust_region_radius: the maximum trust-region radius
preconditioner: a preconditioner (a symmetric, positive definite operator that should approximate the inverse of the Hessian)
project!; (copyto!) specify a projection operation for tangent vectors within the subsolver for numerical stability. The required form is (M, Y, p, X) -> ... working in place of Y.
randomize; set to true if the trust-region solve is to be initiated with a random tangent vector and no preconditioner is used.
ρ_regularization: (1e3) regularize the performance evaluation $ρ$ to avoid numerical inaccuracies.
reduction_factor: (0.25) trust-region reduction factor
reduction_threshold: (0.1) trust-region reduction threshold: if ρ is below this threshold, the trust region radius is reduced by reduction_factor.
retraction (default_retraction_method(M, typeof(p))) a retraction to use
stopping_criterion: (StopAfterIteration(1000) |StopWhenGradientNormLess(1e-6)) a functor inheriting from StoppingCriterion indicating when to stop.
sub_kwargs: keyword arguments passed to the sub state and used to decorate the sub options
sub_stopping_criterion: a stopping criterion for the sub solver, uses the same standard as TCG.
sub_problem: (DefaultManoptProblem(M,ConstrainedManifoldObjective(subcost, subgrad; evaluation=evaluation))) problem for the subsolver
sub_state: (QuasiNewtonState) using QuasiNewtonLimitedMemoryDirectionUpdate with InverseBFGS and sub_stopping_criterion as a stopping criterion. See also sub_kwargs.
θ: (1.0) 1+θ is the superlinear convergence target rate of the tCG-method truncated_conjugate_gradient_descent, and is used in a stopping criterion therein
trust_region_radius: the initial trust-region radius

For the case that no Hessian is provided, the Hessian is computed using finite difference, see ApproxHessianFiniteDifference.

Output

the obtained (approximate) minimizer $p^*$, see get_solver_return for details

See also

truncated_conjugate_gradient_descent

source

Manopt.trust_regions! — Function

trust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)

evaluate the Riemannian trust-regions solver in place of p.

Input

M: a manifold $\mathcal M$
f: a cost function $f: \mathcal M → ℝ$ to minimize
grad_f: the gradient $\operatorname{grad}f: \mathcal M → T \mathcal M$ of $F$
Hess_f: (optional) the Hessian $\operatorname{Hess} f$
p: an initial value $p ∈ \mathcal M$

For the case that no Hessian is provided, the Hessian is computed using finite difference, see ApproxHessianFiniteDifference.

for more details and all options, see trust_regions

source

State

Manopt.TrustRegionsState — Type

TrustRegionsState <: AbstractHessianSolverState

Store the state of the trust-regions solver.

Fields

All the following fields (besides p) can be set by specifying them as keywords.

acceptance_rate: (0.1) a lower bound of the performance ratio for the iterate that decides if the iteration is accepted or not.
max_trust_region_radius: (sqrt(manifold_dimension(M))) the maximum trust-region radius
p: (rand(M) if a manifold is provided) the current iterate
project!: (copyto!) specify a projection operation for tangent vectors for numerical stability. A function (M, Y, p, X) -> ... working in place of Y. per default, no projection is performed, set it to project! to activate projection.
stop: (StopAfterIteration(1000) |StopWhenGradientNormLess(1e-6))
randomize: (false) indicates if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner is used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
ρ_regularization: (10000.0) regularize the model fitness $ρ$ to avoid division by zero
sub_problem: an AbstractManoptProblem problem or a function (M, p, X) -> q or (M, q, p, X) for the a closed form solution of the sub problem
sub_state: (TruncatedConjugateGradientState(M, p, X))
σ: (0.0 or 1e-6 depending on randomize) Gaussian standard deviation when creating the random initial tangent vector
trust_region_radius: (max_trust_region_radius / 8) the (initial) trust-region radius
X: (zero_vector(M,p)) the current gradient grad_f(p) Use this default to specify the type of tangent vector to allocate also for the internal (tangent vector) fields.

Internal fields

HX, HY, HZ: interim storage (to avoid allocation) of `\operatorname{Hess} f(p)[\cdot] of X, Y, Z
Y: the solution (tangent vector) of the subsolver
Z: the Cauchy point (only used if random is activated)

Constructors

All the following constructors have the fields as keyword arguments with the defaults given in brackets. If no initial point p is provided, p=rand(M) is used

TrustRegionsState(M, mho; kwargs...)
TrustRegionsState(M, p, mho; kwargs...)

A trust region state, where the sub problem is set to a DefaultManoptProblem on the tangent space using the TrustRegionModelObjective to be solved with truncated_conjugate_gradient_descent! or in other words the sub state is set to TruncatedConjugateGradientState.

TrustRegionsState(M, sub_problem, sub_state; kwargs...)
TrustRegionsState(M, p, sub_problem, sub_state; kwargs...)

A trust region state, where the sub problem is solved using a AbstractManoptProblem sub_problem and an AbstractManoptSolverState sub_state.

TrustRegionsState(M, f::Function; evaluation=AllocatingEvaluation, kwargs...)
TrustRegionsState(M, p, f; evaluation=AllocatingEvaluation, kwargs...)

A trust region state, where the sub problem is solved in closed form by a function f(M, p, Y, Δ), where p is the current iterate, Y the initial tangent vector at p and Δ the current trust region radius.

See also

trust_regions, trust_regions!

source

Approximation of the Hessian

Several different methods to approximate the Hessian are available.

Manopt.ApproxHessianFiniteDifference — Type

ApproxHessianFiniteDifference{E, P, T, G, RTR, VTR, R <: Real} <: AbstractApproxHessian

A functor to approximate the Hessian by a finite difference of gradient evaluation.

Given a point p and a direction X and the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of a function $F$ the Hessian is approximated as follows: let $c$ be a stepsize, $X∈ T_p\mathcal M$ a tangent vector and $q = \operatorname{retr}_p(\frac{c}{\lVert X \rVert_p}X)$ be a step in direction $X$ of length $c$ following a retraction Then the Hessian is approximated by the finite difference of the gradients, where $\mathcal T_{\cdot\gets\cdot}$ is a vector transport.

\[\operatorname{Hess}F(p)[X] ≈ \frac{\lVert X \rVert_p}{c}\Bigl( \mathcal T_{p\gets q}\bigr(\operatorname{grad}F(q)\bigl) - \operatorname{grad}F(p) \Bigl)\]

Fields

gradient!!: the gradient function (either allocating or mutating, see evaluation parameter)
step_length: a step length for the finite difference
retraction_method: a retraction to use
vector_transport_method: a vector transport to use

Internal temporary fields

grad_tmp: a temporary storage for the gradient at the current p
grad_dir_tmp: a temporary storage for the gradient at the current p_dir
p_dir::P: a temporary storage to the forward direction (or the $q$ in the formula)

Constructor

ApproximateFiniteDifference(M, p, grad_f; kwargs...)

Keyword arguments

evaluation: (AllocatingEvaluation) whether the gradient is given as an allocation function or an in-place (InplaceEvaluation).
steplength: ($2^{-14}$) step length $c$ to approximate the gradient evaluations
retraction_method: (default_retraction_method(M, typeof(p))) a retraction(M, p, X) to use in the approximation.
vector_transport_method: (default_vector_transport_method(M, typeof(p))) a vector transport to use

source

Manopt.ApproxHessianSymmetricRankOne — Type

ApproxHessianSymmetricRankOne{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian

A functor to approximate the Hessian by the symmetric rank one update.

Fields

gradient!! the gradient function (either allocating or mutating, see evaluation parameter).
ν a small real number to ensure that the denominator in the update does not become too small and thus the method does not break down.
vector_transport_method a vector transport to use.

Internal temporary fields

p_tmp a temporary storage the current point p.
grad_tmp a temporary storage for the gradient at the current p.
matrix a temporary storage for the matrix representation of the approximating operator.
basis a temporary storage for an orthonormal basis at the current p.

Constructor

ApproxHessianSymmetricRankOne(M, p, gradF; kwargs...)

Keyword arguments

initial_operator (Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.
basis (DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.
nu (-1)
evaluation (AllocatingEvaluation) whether the gradient is given as an allocation function or an in-place (InplaceEvaluation).
vector_transport_method (ParallelTransport()) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.

source

Manopt.ApproxHessianBFGS — Type

ApproxHessianBFGS{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian

A functor to approximate the Hessian by the BFGS update.

Fields

gradient!! the gradient function (either allocating or mutating, see evaluation parameter).
scale
vector_transport_method a vector transport to use.

Internal temporary fields

p_tmp a temporary storage the current point p.
grad_tmp a temporary storage for the gradient at the current p.
matrix a temporary storage for the matrix representation of the approximating operator.
basis a temporary storage for an orthonormal basis at the current p.

Constructor

ApproxHessianBFGS(M, p, gradF; kwargs...)

Keyword arguments

initial_operator (Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.
basis (DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.
nu (-1)
evaluation (AllocatingEvaluation) whether the gradient is given as an allocation function or an in-place (InplaceEvaluation).
vector_transport_method (ParallelTransport()) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.

source

as well as their (non-exported) common supertype

Manopt.AbstractApproxHessian — Type

AbstractApproxHessian <: Function

An abstract supertype for approximate Hessian functions, declares them also to be functions.

source

Technical details

The trust_regions solver requires the following functions of a manifold to be available

A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
By default the stopping criterion uses the norm as well, to stop when the norm of the gradient is small, but if you implemented inner, the norm is provided already.
if you do not provide an initial max_trust_region_radius, a manifold_dimension is required.
A `copyto!(M, q, p) and copy(M,p) for points.
By default the tangent vectors are initialized calling zero_vector(M,p).

Literature

[ABG06]: P.-A. Absil, C. Baker and K. Gallivan. Trust-Region Methods on Riemannian Manifolds. Foundations of Computational Mathematics 7, 303–330 (2006).
[CGT00]: A. R. Conn, N. I. Gould and P. L. Toint. Trust Region Methods (Society for Industrial and Applied Mathematics, 2000).