The Riemannian trust regions solver
Minimize a function
\[\operatorname*{\arg\,min}_{p ∈ \mathcal{M}}\ f(p)\]
by using the Riemannian trust-regions solver following [ABG06] a model is build by lifting the objective at the $k$th iterate $p_k$ by locally mapping the cost function $f$ to the tangent space as $f_k: T_{p_k}\mathcal M → ℝ$ as $f_k(X) = f(\operatorname{retr}_{p_k}(X))$. The trust region subproblem is then defined as
\[\operatorname*{arg\,min}_{X ∈ T_{p_k}\mathcal M}\ m_k(X),\]
where
\[\begin{align*} m_k&: T_{p_K}\mathcal M → ℝ,\\ m_k(X) &= f(p_k) + ⟨\operatorname{grad} f(p_k), X⟩_{p_k} + \frac{1}{2}\langle \mathcal H_k(X),X⟩_{p_k}\\ \text{such that}&\ \lVert X \rVert_{p_k} ≤ Δ_k. \end{align*}\]
Here $Δ_k$ is a trust region radius, that is adapted every iteration, and $\mathcal H_k$ is some symmetric linear operator that approximates the Hessian $\operatorname{Hess} f$ of $f$.
Interface
Manopt.trust_regions
— Functiontrust_regions(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
trust_regions(M, f, grad_f, p=rand(M); kwargs...)
trust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)
run the Riemannian trust-regions solver for optimization on manifolds to minimize f
, see on [ABG06, CGT00].
For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference
. For solving the inner trust-region subproblem of finding an update-vector, by default the truncated_conjugate_gradient_descent
is used.
Input
M::
AbstractManifold
: a Riemannian manifold $\mathcal M$
f
: a cost function $f: \mathcal M→ ℝ$ implemented as(M, p) -> v
grad_f
: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function(M, p) -> X
or a function(M, X, p) -> X
computingX
in-placeHess_f
: the (Riemannian) Hessian $\operatorname{Hess}f: T_{p}\mathcal M → T_{p}\mathcal M$ of f as a function(M, p, X) -> Y
or a function(M, Y, p, X) -> Y
computingY
in-placep
: a point on the manifold $\mathcal M$
Keyword arguments
acceptance_rate
: accept/reject threshold: if ρ (the performance ratio for the iterate) is at least the acceptance rate ρ', the candidate is accepted. This value should be between $0$ and $rac{1}{4}$augmentation_threshold=0.75
: trust-region augmentation threshold: if ρ is larger than this threshold, a solution is on the trust region boundary and negative curvature, and the radius is extended (augmented)augmentation_factor=2.0
: trust-region augmentation factorevaluation=
AllocatingEvaluation
()
: specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation
) or whether they modify their input argument to return the result therein (InplaceEvaluation
). Since usually the first argument is the manifold, the modified argument is the second.κ=0.1
: the linear convergence target rate of the tCG methodtruncated_conjugate_gradient_descent
, and is used in a stopping criterion thereinmax_trust_region_radius
: the maximum trust-region radiuspreconditioner
: a preconditioner for the Hessian H. This is either an allocating function(M, p, X) -> Y
or an in-place function(M, Y, p, X) -> Y
, seeevaluation
, and by default set to the identity.project!=copyto!
: for numerical stability it is possible to project onto the tangent space after every iteration. the function has to work inplace ofY
, that is(M, Y, p, X) -> Y
, whereX
andY
can be the same memory.randomize=false
: indicate whetherX
is initialised to a random vector or not. This disables preconditioning.ρ_regularization=1e3
: regularize the performance evaluation $ρ$ to avoid numerical inaccuracies.reduction_factor=0.25
: trust-region reduction factorreduction_threshold=0.1
: trust-region reduction threshold: if ρ is below this threshold, the trust region radius is reduced byreduction_factor
.retraction_method=
default_retraction_method
(M, typeof(p))
: a retraction $\operatorname{retr}$ to use, see the section on retractionsstopping_criterion=
StopAfterIteration
(1000)
|
StopWhenGradientNormLess
(1e-6)
: a functor indicating that the stopping criterion is fulfilledsub_kwargs=
(;)
: a named tuple of keyword arguments that are passed todecorate_objective!
of the sub solvers objective, thedecorate_state!
of the subsovlers state, and the sub state constructor itself.sub_stopping_criterion=
( seetruncated_conjugate_gradient_descent
): a functor indicating that the stopping criterion is fulfilledsub_problem=
DefaultManoptProblem
(M,
ConstrainedManifoldObjective
(subcost, subgrad; evaluation=evaluation))
: specify a problem for a solver or a closed form solution function, which can be allocating or in-place.sub_state=
QuasiNewtonState
: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function. whereQuasiNewtonLimitedMemoryDirectionUpdate
withInverseBFGS
is usedθ=1.0
: the superlinear convergence target rate of $1+θ$ of the tCG-methodtruncated_conjugate_gradient_descent
, and is used in a stopping criterion thereintrust_region_radius=
injectivity_radius
(M) / 4
: the initial trust-region radius
For the case that no Hessian is provided, the Hessian is computed using finite difference, see ApproxHessianFiniteDifference
.
All other keyword arguments are passed to decorate_state!
for state decorators or decorate_objective!
for objective decorators, respectively.
Output
The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return
for details, especially the return_state=
keyword.
See also
Manopt.trust_regions!
— Functiontrust_regions(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
trust_regions(M, f, grad_f, p=rand(M); kwargs...)
trust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)
run the Riemannian trust-regions solver for optimization on manifolds to minimize f
, see on [ABG06, CGT00].
For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference
. For solving the inner trust-region subproblem of finding an update-vector, by default the truncated_conjugate_gradient_descent
is used.
Input
M::
AbstractManifold
: a Riemannian manifold $\mathcal M$
f
: a cost function $f: \mathcal M→ ℝ$ implemented as(M, p) -> v
grad_f
: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function(M, p) -> X
or a function(M, X, p) -> X
computingX
in-placeHess_f
: the (Riemannian) Hessian $\operatorname{Hess}f: T_{p}\mathcal M → T_{p}\mathcal M$ of f as a function(M, p, X) -> Y
or a function(M, Y, p, X) -> Y
computingY
in-placep
: a point on the manifold $\mathcal M$
Keyword arguments
acceptance_rate
: accept/reject threshold: if ρ (the performance ratio for the iterate) is at least the acceptance rate ρ', the candidate is accepted. This value should be between $0$ and $rac{1}{4}$augmentation_threshold=0.75
: trust-region augmentation threshold: if ρ is larger than this threshold, a solution is on the trust region boundary and negative curvature, and the radius is extended (augmented)augmentation_factor=2.0
: trust-region augmentation factorevaluation=
AllocatingEvaluation
()
: specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation
) or whether they modify their input argument to return the result therein (InplaceEvaluation
). Since usually the first argument is the manifold, the modified argument is the second.κ=0.1
: the linear convergence target rate of the tCG methodtruncated_conjugate_gradient_descent
, and is used in a stopping criterion thereinmax_trust_region_radius
: the maximum trust-region radiuspreconditioner
: a preconditioner for the Hessian H. This is either an allocating function(M, p, X) -> Y
or an in-place function(M, Y, p, X) -> Y
, seeevaluation
, and by default set to the identity.project!=copyto!
: for numerical stability it is possible to project onto the tangent space after every iteration. the function has to work inplace ofY
, that is(M, Y, p, X) -> Y
, whereX
andY
can be the same memory.randomize=false
: indicate whetherX
is initialised to a random vector or not. This disables preconditioning.ρ_regularization=1e3
: regularize the performance evaluation $ρ$ to avoid numerical inaccuracies.reduction_factor=0.25
: trust-region reduction factorreduction_threshold=0.1
: trust-region reduction threshold: if ρ is below this threshold, the trust region radius is reduced byreduction_factor
.retraction_method=
default_retraction_method
(M, typeof(p))
: a retraction $\operatorname{retr}$ to use, see the section on retractionsstopping_criterion=
StopAfterIteration
(1000)
|
StopWhenGradientNormLess
(1e-6)
: a functor indicating that the stopping criterion is fulfilledsub_kwargs=
(;)
: a named tuple of keyword arguments that are passed todecorate_objective!
of the sub solvers objective, thedecorate_state!
of the subsovlers state, and the sub state constructor itself.sub_stopping_criterion=
( seetruncated_conjugate_gradient_descent
): a functor indicating that the stopping criterion is fulfilledsub_problem=
DefaultManoptProblem
(M,
ConstrainedManifoldObjective
(subcost, subgrad; evaluation=evaluation))
: specify a problem for a solver or a closed form solution function, which can be allocating or in-place.sub_state=
QuasiNewtonState
: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function. whereQuasiNewtonLimitedMemoryDirectionUpdate
withInverseBFGS
is usedθ=1.0
: the superlinear convergence target rate of $1+θ$ of the tCG-methodtruncated_conjugate_gradient_descent
, and is used in a stopping criterion thereintrust_region_radius=
injectivity_radius
(M) / 4
: the initial trust-region radius
For the case that no Hessian is provided, the Hessian is computed using finite difference, see ApproxHessianFiniteDifference
.
All other keyword arguments are passed to decorate_state!
for state decorators or decorate_objective!
for objective decorators, respectively.
Output
The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return
for details, especially the return_state=
keyword.
See also
State
Manopt.TrustRegionsState
— TypeTrustRegionsState <: AbstractHessianSolverState
Store the state of the trust-regions solver.
Fields
acceptance_rate
: a lower bound of the performance ratio for the iterate that decides if the iteration is accepted or not.HX
,HY
,HZ
: interim storage (to avoid allocation) of `\operatorname{Hess} f(p)[⋅]
ofX
,Y
,Z
max_trust_region_radius
: the maximum trust-region radiusp::P
: a point on the manifold $\mathcal M$storing the current iterateproject!
: for numerical stability it is possible to project onto the tangent space after every iteration. the function has to work inplace ofY
, that is(M, Y, p, X) -> Y
, whereX
andY
can be the same memory.stop::StoppingCriterion
: a functor indicating that the stopping criterion is fulfilledrandomize
: indicate whetherX
is initialised to a random vector or notρ_regularization
: regularize the model fitness $ρ$ to avoid division by zerosub_problem::Union{AbstractManoptProblem, F}
: specify a problem for a solver or a closed form solution function, which can be allocating or in-place.sub_state::Union{AbstractManoptProblem, F}
: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function.σ
: Gaussian standard deviation when creating the random initial tangent vector This field has no effect, whenrandomize
is false.trust_region_radius
: the trust-region radiusX::T
: a tangent vector at the point $p$ on the manifold $\mathcal M$Y
: the solution (tangent vector) of the subsolverZ
: the Cauchy point (only used if random is activated)
Constructors
TrustRegionsState(M, mho::AbstractManifoldHessianObjective; kwargs...)
TrustRegionsState(M, sub_problem, sub_state; kwargs...)
TrustRegionsState(M, sub_problem; evaluation=AllocatingEvaluation(), kwargs...)
create a trust region state.
- given a
AbstractManifoldHessianObjective
mho
, the default sub solver, aTruncatedConjugateGradientState
withmho
used to define the problem on a tangent space is created - given a
sub_problem
and anevaluation=
keyword, the sub problem solver is assumed to be the closed form solution, whereevaluation
determines how to call the sub function.
Input
M::
AbstractManifold
: a Riemannian manifold $\mathcal M$
sub_problem
: specify a problem for a solver or a closed form solution function, which can be allocating or in-place.sub_state
: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function.
Keyword arguments
acceptance_rate=0.1
max_trust_region_radius=sqrt(manifold_dimension(M))
p=
rand
(M)
: a point on the manifold $\mathcal M$to specify the initial valueproject!=copyto!
stopping_criterion=
StopAfterIteration
(1000)
|
StopWhenGradientNormLess
(1e-6)
: a functor indicating that the stopping criterion is fulfilledrandomize=false
ρ_regularization=10000.0
θ=1.0
trust_region_radius=max_trust_region_radius / 8
X=
zero_vector
(M, p)
: a tangent vector at the point $p$ on the manifold $\mathcal M$to specify the representation of a tangent vector
See also
Approximation of the Hessian
Several different methods to approximate the Hessian are available.
Manopt.ApproxHessianFiniteDifference
— TypeApproxHessianFiniteDifference{E, P, T, G, RTR, VTR, R <: Real} <: AbstractApproxHessian
A functor to approximate the Hessian by a finite difference of gradient evaluation.
Given a point p
and a direction X
and the gradient $\operatorname{grad} f(p)$ of a function $f$ the Hessian is approximated as follows: let $c$ be a stepsize, $X ∈ T_{p}\mathcal M$ a tangent vector and $q = \operatorname{retr}_p(\frac{c}{\lVert X \rVert_p}X)$ be a step in direction $X$ of length $c$ following a retraction Then the Hessian is approximated by the finite difference of the gradients, where $\mathcal T_{⋅←⋅}$ is a vector transport.
\[\operatorname{Hess}f(p)[X] ≈ \frac{\lVert X \rVert_p}{c}\Bigl( \mathcal T_{p\gets q}\bigr(\operatorname{grad}f(q)\bigl) - \operatorname{grad}f(p) \Bigl)\]
Fields
gradient!!
: the gradient function (either allocating or mutating, seeevaluation
parameter)step_length
: a step length for the finite differenceretraction_method=
default_retraction_method
(M, typeof(p))
: a retraction $\operatorname{retr}$ to use, see the section on retractionsvector_transport_method=
default_vector_transport_method
(M, typeof(p))
: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Internal temporary fields
grad_tmp
: a temporary storage for the gradient at the currentp
grad_dir_tmp
: a temporary storage for the gradient at the currentp_dir
p_dir::P
: a temporary storage to the forward direction (or the $q$ in the formula)
Constructor
ApproximateFiniteDifference(M, p, grad_f; kwargs...)
Keyword arguments
evaluation=
AllocatingEvaluation
()
: specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation
) or whether they modify their input argument to return the result therein (InplaceEvaluation
). Since usually the first argument is the manifold, the modified argument is the second.steplength=
2^{-14}$: step length$c`` to approximate the gradient evaluationsretraction_method=
default_retraction_method
(M, typeof(p))
: a retraction $\operatorname{retr}$ to use, see the section on retractionsvector_transport_method=
default_vector_transport_method
(M, typeof(p))
: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Manopt.ApproxHessianSymmetricRankOne
— TypeApproxHessianSymmetricRankOne{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian
A functor to approximate the Hessian by the symmetric rank one update.
Fields
gradient!!
: the gradient function (either allocating or mutating, seeevaluation
parameter).ν
: a small real number to ensure that the denominator in the update does not become too small and thus the method does not break down.vector_transport_method=
default_vector_transport_method
(M, typeof(p))
: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports.
Internal temporary fields
p_tmp
: a temporary storage the current pointp
.grad_tmp
: a temporary storage for the gradient at the currentp
.matrix
: a temporary storage for the matrix representation of the approximating operator.basis
: a temporary storage for an orthonormal basis at the currentp
.
Constructor
ApproxHessianSymmetricRankOne(M, p, gradF; kwargs...)
Keyword arguments
initial_operator
(Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))
) the matrix representation of the initial approximating operator.basis
(DefaultOrthonormalBasis()
) an orthonormal basis in the tangent space of the initial iterate p.nu
(-1
)evaluation=
AllocatingEvaluation
()
: specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation
) or whether they modify their input argument to return the result therein (InplaceEvaluation
). Since usually the first argument is the manifold, the modified argument is the second.vector_transport_method=
default_vector_transport_method
(M, typeof(p))
: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Manopt.ApproxHessianBFGS
— TypeApproxHessianBFGS{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian
A functor to approximate the Hessian by the BFGS update.
Fields
gradient!!
the gradient function (either allocating or mutating, seeevaluation
parameter).scale
vector_transport_method::AbstractVectorTransportMethodP
: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Internal temporary fields
p_tmp
a temporary storage the current pointp
.grad_tmp
a temporary storage for the gradient at the currentp
.matrix
a temporary storage for the matrix representation of the approximating operator.basis
a temporary storage for an orthonormal basis at the currentp
.
Constructor
ApproxHessianBFGS(M, p, gradF; kwargs...)
Keyword arguments
initial_operator
(Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))
) the matrix representation of the initial approximating operator.basis
(DefaultOrthonormalBasis()
) an orthonormal basis in the tangent space of the initial iterate p.nu
(-1
)evaluation=
AllocatingEvaluation
()
: specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation
) or whether they modify their input argument to return the result therein (InplaceEvaluation
). Since usually the first argument is the manifold, the modified argument is the second.vector_transport_method=
default_vector_transport_method
(M, typeof(p))
: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
as well as their (non-exported) common supertype
Manopt.AbstractApproxHessian
— TypeAbstractApproxHessian <: Function
An abstract supertype for approximate Hessian functions, declares them also to be functions.
Technical details
The trust_regions
solver requires the following functions of a manifold to be available
- A
retract!
(M, q, p, X)
; it is recommended to set thedefault_retraction_method
to a favourite retraction. If this default is set, aretraction_method=
does not have to be specified. - By default the stopping criterion uses the
norm
as well, to stop when the norm of the gradient is small, but if you implementedinner
, the norm is provided already. - if you do not provide an initial
max_trust_region_radius
, amanifold_dimension
is required. - A `copyto!
(M, q, p)
andcopy
(M,p)
for points. - By default the tangent vectors are initialized calling
zero_vector
(M,p)
.
Literature
- [ABG06]
- P.-A. Absil, C. Baker and K. Gallivan. Trust-Region Methods on Riemannian Manifolds. Foundations of Computational Mathematics 7, 303–330 (2006).
- [CGT00]
- A. R. Conn, N. I. Gould and P. L. Toint. Trust Region Methods (Society for Industrial and Applied Mathematics, 2000).