The Riemannian trust-regions solver
Minimize a function
\[\operatorname*{\arg\,min}_{p ∈ \mathcal{M}}\ f(p)\]
by using the Riemannian trust-regions solver following [ABG06] a model is build by lifting the objective at the $k$th iterate $p_k$ by locally mapping the cost function $f$ to the tangent space as $f_k: T_{p_k}\mathcal M → \mathbb R$ as $f_k(X) = f(\operatorname{retr}_{p_k}(X))$. The trust region subproblem is then defined as
\[\operatorname*{arg\,min}_{X ∈ T_{p_k}\mathcal M}\ m_k(X),\]
where
\[\begin{align*} m_k&: T_{p_K}\mathcal M → \mathbb R,\\ m_k(X) &= f(p_k) + ⟨\operatorname{grad} f(p_k), X⟩_{p_k} + \frac{1}{2}\langle \mathcal H_k(X),X⟩_{p_k}\\ \text{such that}&\ \lVert X \rVert_{p_k} ≤ Δ_k. \end{align*}\]
Here $Δ_k$ is a trust region radius, that is adapted every iteration, and $\mathcal H_k$ is some symmetric linear operator that approximates the Hessian $\operatorname{Hess} f$ of $f$.
Interface
Manopt.trust_regions
— Functiontrust_regions(M, f, grad_f, hess_f, p=rand(M))
trust_regions(M, f, grad_f, p=rand(M))
run the Riemannian trust-regions solver for optimization on manifolds to minimize f
cf. [Absil, Baker, Gallivan, FoCM, 2006; Conn, Gould, Toint, SIAM, 2000].
For the case that no hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference
. For solving the the inner trust-region subproblem of finding an update-vector, by default the truncated_conjugate_gradient_descent
is used.
Input
M
– a manifold $\mathcal M$f
– a cost function $f : \mathcal M → ℝ$ to minimizegrad_f
– the gradient $\operatorname{grad}F : \mathcal M → T \mathcal M$ of $F$Hess_f
– (optional), the hessian $\operatorname{Hess}F(x): T_x\mathcal M → T_x\mathcal M$, $X ↦ \operatorname{Hess}F(x)[X] = ∇_ξ\operatorname{grad}f(x)$p
– (rand(M)
) an initial value $x ∈ \mathcal M$
Keyword Arguments
acceptance_rate
– Accept/reject threshold: if ρ (the performance ratio for the iterate) is at least the acceptance rate ρ', the candidate is accepted. This value should be between $0$ and $\frac{1}{4}$ (formerly this was called `ρ_prime, which will be removed on the next breaking change)augmentation_threshold
– (0.75
) trust-region augmentation threshold: if ρ is above this threshold, we have a solution on the trust region boundary and negative curvature, we extend (augment) the radiusaugmentation_factor
– (2.0
) trust-region augmentation factorevaluation
– (AllocatingEvaluation
) specify whether the gradient and hessian work by allocation (default) orInplaceEvaluation
in placeκ
– (0.1
) the linear convergence target rate of the tCG methodtruncated_conjugate_gradient_descent
, and is used in a stopping crierion thereinmax_trust_region_radius
– the maximum trust-region radiuspreconditioner
– a preconditioner (a symmetric, positive definite operator that should approximate the inverse of the Hessian)project!
– (copyto!
) specify a projection operation for tangent vectors within the subsolver for numerical stability. this means we require a function(M, Y, p, X) -> ...
working in place ofY
.randomize
– set to true if the trust-region solve is to be initiated with a random tangent vector and no preconditioner will be used.ρ_regularization
– (1e3
) regularize the performance evaluation $ρ$ to avoid numerical inaccuracies.reduction_factor
– (0.25
) trust-region reduction factorreduction_threshold
– (0.1
) trust-region reduction threshold: if ρ is below this threshold, the trust region radius is reduced byreduction_factor
.retraction
– (default_retraction_method(M, typeof(p))
) a retraction to usestopping_criterion
– (StopAfterIteration
(1000) |
StopWhenGradientNormLess
(1e-6)
) a functor inheriting fromStoppingCriterion
indicating when to stop.sub_kwargs
– keyword arguments passed to the sub state and used to decorate the sub options, e.g. with debug.sub_stopping_criterion
– a stopping criterion for the sub solver, uses the same standard as TCG.sub_problem
– (DefaultManoptProblem
(M,
ConstrainedManifoldObjective
(subcost, subgrad; evaluation=evaluation))
) problem for the subsolversub_state
– (QuasiNewtonState
) usingQuasiNewtonLimitedMemoryDirectionUpdate
withInverseBFGS
andsub_stopping_criterion
as a stopping criterion. See alsosub_kwargs
.θ
– (1.0
) 1+θ is the superlinear convergence target rate of the tCG-methodtruncated_conjugate_gradient_descent
, and is used in a stopping crierion thereintrust_region_radius
– the initial trust-region radius
For the case that no hessian is provided, the Hessian is computed using finite difference, see ApproxHessianFiniteDifference
.
Output
the obtained (approximate) minimizer $p^*$, see get_solver_return
for details
see also
Manopt.trust_regions!
— Functiontrust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)
evaluate the Riemannian trust-regions solver in place of p
.
Input
M
– a manifold $\mathcal M$f
– a cost function $f: \mathcal M → ℝ$ to minimizegrad_f
– the gradient $\operatorname{grad}f: \mathcal M → T \mathcal M$ of $F$Hess_f
– (optional) the hessian $\operatorname{Hess} f$p
– an initial value $p ∈ \mathcal M$
For the case that no hessian is provided, the Hessian is computed using finite difference, see ApproxHessianFiniteDifference
.
for more details and all options, see trust_regions
State
Manopt.TrustRegionsState
— TypeTrustRegionsState <: AbstractHessianSolverState
Store the state of the trust-regions solver.
Fields
All the following fields (besides p
) can be set by specifying them as keywords.
acceptance_rate
– (0.1
) a lower bound of the performance ratio for the iterate that decides if the iteration will be accepted or not.max_trust_region_radius
– (sqrt(manifold_dimension(M))
) the maximum trust-region radiusp
– (rand(M)
if a manifold is provided) the current iterateproject!
– (copyto!
) specify a projection operation for tangent vectors for numerical stability. A function(M, Y, p, X) -> ...
working in place ofY
. per default, no projection is perfomed, set it toproject!
to activate projection.stop
– (StopAfterIteration
(1000) |
StopWhenGradientNormLess
(1e-6)
)randomize
– (false
) indicates if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.ρ_regularization
– (10000.0
) regularize the model fitness $ρ$ to avoid division by zerosub_problem
– anAbstractManoptProblem
problem or a function(M, p, X) -> q
or(M, q, p, X)
for the a closed form solution of the sub problemsub_state
– (TruncatedConjugateGradientState
(M, p, X)
)σ
– (0.0
or1e-6
depending onrandomize
) Gaussian standard deviation when creating the random initial tangent vectortrust_region_radius
– (max_trust_region_radius / 8
) the (initial) trust-region radiusX
– (zero_vector(M,p)
) the current gradientgrad_f(p)
Use this default to specify the type of tangent vector to allocate also for the internal (tangent vector) fields.
Internal Fields
HX
,HY
,HZ
– interims storage (to avoid allocation) of `\operatorname{Hess} f(p)[\cdot]
ofX
,Y
,Z
Y
– the solution (tangent vector) of the subsolverZ
– the Cauchy point (only used if random is activated)
Constructors
All the following constructors have the above fields as keyword arguents with the defaults given in brackets. If no initial point p
is provided, p=rand(M)
is used
TrustRegionsState(M, mho; kwargs...)
TrustRegionsState(M, p, mho; kwargs...)
A trust region state, where the sub problem is set to a DefaultManoptProblem
on the tangent space using the TrustRegionModelObjective
to be solved with truncated_conjugate_gradient_descent!
or in other words the sub state is set to TruncatedConjugateGradientState
.
TrustRegionsState(M, sub_problem, sub_state; kwargs...)
TrustRegionsState(M, p, sub_problem, sub_state; kwargs...)
A trust region state, where the sub problem is solved using a AbstractManoptProblem
sub_problem
and an AbstractManoptSolverState
sub_state
.
TrustRegionsState(M, f::Function; evaluation=AllocatingEvaluation, kwargs...)
TrustRegionsState(M, p, f; evaluation=AllocatingEvaluation, kwargs...)
A trust region state, where the sub problem is solved in closed form by a funtion f(M, p, Y, Δ)
, where p
is the current iterate, Y
the inital tangent vector at p
and Δ
the current trust region radius.
See also
Approximation of the Hessian
Several different methods to approximate the Hessian are available.
Manopt.ApproxHessianFiniteDifference
— TypeApproxHessianFiniteDifference{E, P, T, G, RTR,, VTR, R <: Real} <: AbstractApproxHessian
A functor to approximate the Hessian by a finite difference of gradient evaluation.
Given a point p
and a direction X
and the gradient $\operatorname{grad}F: \mathcal M \to T\mathcal M$ of a function $F$ the Hessian is approximated as follows: Let $c$ be a stepsize, $X∈ T_p\mathcal M$ a tangent vector and $q = \operatorname{retr}_p(\frac{c}{\lVert X \rVert_p}X)$ be a step in direction $X$ of length $c$ following a retraction Then we approximate the Hessian by the finite difference of the gradients, where $\mathcal T_{\cdot\gets\cdot}$ is a vector transport.
\[\operatorname{Hess}F(p)[X] ≈ \frac{\lVert X \rVert_p}{c}\Bigl( \mathcal T_{p\gets q}\bigr(\operatorname{grad}F(q)\bigl) - \operatorname{grad}F(p)\Bigl)\]
Fields
gradient!!
the gradient function (either allocating or mutating, seeevaluation
parameter)step_length
a step length for the finite differenceretraction_method
- a retraction to usevector_transport_method
a vector transport to use
Internal temporary fields
grad_tmp
a temporary storage for the gradient at the currentp
grad_dir_tmp
a temporary storage for the gradient at the currentp_dir
p_dir::P
a temporary storage to the forward direction (i.e. $q$ above)
Constructor
ApproximateFiniteDifference(M, p, grad_f; kwargs...)
Keyword arguments
evaluation
(AllocatingEvaluation
) whether the gradient is given as an allocation function or an in-place (InplaceEvaluation
).steplength
($2^{-14}$) step length $c$ to approximate the gradient evaluationsretraction_method
– (default_retraction_method(M, typeof(p))
) aretraction(M, p, X)
to use in the approximation.vector_transport_method
- (default_vector_transport_method(M, typeof(p))
) a vector transport to use
Manopt.ApproxHessianSymmetricRankOne
— TypeApproxHessianSymmetricRankOne{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian
A functor to approximate the Hessian by the symmetric rank one update.
Fields
gradient!!
the gradient function (either allocating or mutating, seeevaluation
parameter).ν
a small real number to ensure that the denominator in the update does not become too small and thus the method does not break down.vector_transport_method
a vector transport to use.
Internal temporary fields
p_tmp
a temporary storage the current pointp
.grad_tmp
a temporary storage for the gradient at the currentp
.matrix
a temporary storage for the matrix representation of the approximating operator.basis
a temporary storage for an orthonormal basis at the currentp
.
Constructor
ApproxHessianSymmetricRankOne(M, p, gradF; kwargs...)
Keyword arguments
initial_operator
(Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))
) the matrix representation of the initial approximating operator.basis
(DefaultOrthonormalBasis()
) an orthonormal basis in the tangent space of the initial iterate p.nu
(-1
)evaluation
(AllocatingEvaluation
) whether the gradient is given as an allocation function or an in-place (InplaceEvaluation
).vector_transport_method
(ParallelTransport()
) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.
Manopt.ApproxHessianBFGS
— TypeApproxHessianBFGS{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian
A functor to approximate the Hessian by the BFGS update.
Fields
gradient!!
the gradient function (either allocating or mutating, seeevaluation
parameter).scale
vector_transport_method
a vector transport to use.
Internal temporary fields
p_tmp
a temporary storage the current pointp
.grad_tmp
a temporary storage for the gradient at the currentp
.matrix
a temporary storage for the matrix representation of the approximating operator.basis
a temporary storage for an orthonormal basis at the currentp
.
Constructor
ApproxHessianBFGS(M, p, gradF; kwargs...)
Keyword arguments
initial_operator
(Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))
) the matrix representation of the initial approximating operator.basis
(DefaultOrthonormalBasis()
) an orthonormal basis in the tangent space of the initial iterate p.nu
(-1
)evaluation
(AllocatingEvaluation
) whether the gradient is given as an allocation function or an in-place (InplaceEvaluation
).vector_transport_method
(ParallelTransport()
) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use.
as well as their (non-exported) common supertype
Manopt.AbstractApproxHessian
— TypeAbstractApproxHessian <: Function
An abstract supertypes for approximate hessian functions, declares them also to be functions.
Technical details
The trust_regions
solver requires the following functions of a manifold to be available
- A
retract!
(M, q, p, X)
; it is recommended to set thedefault_retraction_method
to a favourite retraction. If this default is set, aretraction_method=
does not have to be specified. - By default the stopping criterion uses the
norm
as well, to stop when the norm of the gradient is small, but if you implementedinner
, the norm is provided already. - if you do not provide an initial
max_trudst_region_radius
, amanifold_dimension
is required. - A
copyto!
(M, q, p)
andcopy
(M,p)
for points. - By default the tangent vectors are initialized calling
zero_vector
(M,p)
.
Literature
- [ABG06]
- P.-A. Absil, C. Baker and K. Gallivan. Trust-Region Methods on Riemannian Manifolds. Foundations of Computational Mathematics 7, 303–330 (2006).
- [CGT00]
- A. R. Conn, N. I. Gould and P. L. Toint. Trust Region Methods (Society for Industrial and Applied Mathematics, 2000).