The Riemannian Trust-Regions Solver
The aim is to solve an optimization problem on a manifold
by using the Riemannian trust-regions solver, which is often the method of choice for smooth optimization on manifolds. This trust-region method uses the Steihaug-Toint truncated conjugate-gradient method truncatedConjugateGradient to solve the inner minimization problem, called the trust-regions subproblem. This inner solve can be preconditioned by providing a preconditioner (a symmetric, positive definite operator approximating the inverse of the Hessian of $F$). If no Hessian of the cost function $F$ is provided, a standard approximation of the Hessian based on the gradient $\nabla F$ is computed with approxHessianFD.
Initialization
Initialize $x_0 = x$ with an initial point $x$ on the manifold. It can be given by the caller or set randomly. Set the initial trust-region radius $\Delta =\frac{1}{8} \bar{\Delta}$, where $\bar{\Delta}$ is the maximum radius the trust-region can have; usually one sets $\bar{\Delta} = \sqrt{\operatorname{dim}(\mathcal{M})}$, the square root of the manifold dimension. For accepting the next iterate and evaluating the new trust-region radius one needs an accept/reject threshold $\rho' \in [0,\frac{1}{4})$, which is $\rho' = 0.1$ by default. Set $k=0$.
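As an illustration, here is a minimal sketch of this initialization in Julia, assuming a manifold M for which manifoldDimension(M) returns $\operatorname{dim}(\mathcal{M})$ (the variable names are only illustrative):

```julia
# illustrative initialization, mirroring the description above
Δ_bar   = sqrt(manifoldDimension(M))   # maximum trust-region radius
Δ       = Δ_bar / 8                    # initial trust-region radius
ρ_prime = 0.1                          # accept/reject threshold in [0, 1/4)
k       = 0
```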
Iteration
Repeat until a convergence criterion is reached
- If using the randomized approach, set $\eta$ to a random tangent vector; otherwise set $\eta$ to the zero vector in the tangent space $T_{x_k}\mathcal{M}$.
- Set $\eta^{* }$ as the solution of the trust-region subproblem, computed by the tcg-method with $\eta$ as the initial vector.
- If using the randomized approach, compare $\eta^{* }$ with the Cauchy point $\eta_{c}^{* } = -\tau_{c} \frac{\Delta}{\operatorname{norm}(\operatorname{Grad}[F] (x_k))} \operatorname{Grad}[F] (x_k)$ using the model function $m_{x_k}(\cdot)$. If the Cauchy point yields the larger model decrease, set $\eta^{* } = \eta_{c}^{* }$.
- Set ${x}^{* } = \operatorname{Retr}_{x_k}(\eta^{* })$.
- Set $\rho = \frac{F(x_k)-F({x}^{* })}{m_{x_k}(\eta)-m_{x_k}(\eta^{* })}$, where $m_{x_k}(\cdot)$ describes the quadratic model function.
- Update the trust-region radius: $\Delta = \begin{cases} \frac{1}{4} \Delta & \rho < \frac{1}{4} \, \text{or} \, m_{x_k}(\eta)-m_{x_k}(\eta^{* }) \leq 0 \, \text{or} \, \rho = \pm \infty , \\ \operatorname{min}(2 \Delta, \bar{\Delta}) & \rho > \frac{3}{4} \, \text{and the tcg-method stopped because of negative curvature or exceeding the trust-region}, \\ \Delta & \, \text{otherwise.} \end{cases}$
- If $m_{x_k}(\eta)-m_{x_k}(\eta^{* }) \geq 0$ and $\rho > \rho'$ set $x_k = {x}^{* }$.
- Set $k = k+1$.
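The iteration above can be summarized in the following schematic Julia sketch; the helper names (converged, random_tangent, zero_tangent, solve_tr_subproblem, retract, model_decrease, tcg_hit_boundary_or_negative_curvature) are placeholders for illustration and not part of the Manopt API:

```julia
# schematic outer trust-region loop (illustrative, not the Manopt internals)
while !converged(k, x)
    η = useRandom ? random_tangent(M, x) : zero_tangent(M, x)
    η_opt = solve_tr_subproblem(M, F, ∇F, H, x, η, Δ)   # Steihaug-Toint tCG
    x_cand = retract(M, x, η_opt)
    decrease = model_decrease(x, η, η_opt)               # m(η) - m(η_opt)
    ρ = (F(x) - F(x_cand)) / decrease                    # actual vs. predicted decrease
    if ρ < 1/4 || decrease <= 0 || !isfinite(ρ)
        Δ = Δ / 4                                        # shrink the trust region
    elseif ρ > 3/4 && tcg_hit_boundary_or_negative_curvature
        Δ = min(2Δ, Δ_bar)                               # enlarge, capped at Δ_bar
    end
    if decrease >= 0 && ρ > ρ_prime
        x = x_cand                                       # accept the candidate
    end
    k += 1
end
```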
Result
The result is given by the last computed $x_k$.
Remarks
To the Initialization: A random point on the manifold can be generated with randomMPoint(M).
To step number 1: Using the randomized approach means using a random tangent vector as the initial vector for the approximate solve of the trust-regions subproblem. If this is the case, keep in mind that the vector must lie within the trust-region radius. This is achieved by repeatedly multiplying η = randomTVector(M,x) by the fourth root of the machine precision, eps(Float64)^(1/4), as long as its norm is greater than the current trust-region radius $\Delta$. When not using the randomized approach, one can get the zero tangent vector with η = zeroTVector(M,x).
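A minimal sketch of this shrink-to-fit step, assuming that norm(M,x,η) computes the Riemannian norm of a tangent vector and that tangent vectors support scalar multiplication (both are assumptions made here for illustration):

```julia
# draw a random tangent vector and shrink it until it lies inside the trust region
η = randomTVector(M, x)
while norm(M, x, η) > Δ
    η = eps(Float64)^(1/4) * η   # scale by the fourth root of the machine precision
end
```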
To step number 2: Obtain $\eta^{* }$ by (approximately) solving the trust-regions subproblem with the Steihaug-Toint truncated conjugate-gradient (tcg) method. The problem as well as the solution method are described in truncatedConjugateGradient.
To step number 3: If using a random tangent vector as the initial vector, compare the result of the tcg-method with the Cauchy point. Convergence proofs assume that one achieves at least (a fraction of) the reduction obtained by the Cauchy point. The idea is to move from $x_k$ in the direction of the negative gradient as far as the model allows; the resulting point can lie on the boundary of the trust region or inside it. The parameter $\tau_{c}$ for the optimal length of this step is defined by $\tau_{c} = \begin{cases} 1 & \langle \operatorname{Grad}[F] (x_k), \operatorname{Hess}[F] (\operatorname{Grad}[F] (x_k))_ {x_k}\rangle_{x_k} \leq 0, \\ \operatorname{min}\left(\frac{\operatorname{norm}(\operatorname{Grad}[F] (x_k))^3}{\Delta \, \langle \operatorname{Grad}[F] (x_k), \operatorname{Hess}[F] (\operatorname{Grad}[F] (x_k))_ {x_k}\rangle_{x_k}}, 1\right) & \text{otherwise.} \end{cases}$
To check the model decrease one compares $m_{x_k}(\eta_{c}^{* }) = F(x_k) + \langle \eta_{c}^{* }, \operatorname{Grad}[F] (x_k)\rangle_{x_k} + \frac{1}{2}\langle \eta_{c}^{* }, \operatorname{Hess}[F] (\eta_{c}^{* })_ {x_k}\rangle_{x_k}$ with $m_{x_k}(\eta^{* }) = F(x_k) + \langle \eta^{* }, \operatorname{Grad}[F] (x_k)\rangle_{x_k} + \frac{1}{2}\langle \eta^{* }, \operatorname{Hess}[F] (\eta^{* })_ {x_k}\rangle_{x_k}$. If $m_{x_k}(\eta_{c}^{* }) < m_{x_k}(\eta^{* })$, the Cauchy point is the better choice and replaces $\eta^{* }$.
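As an illustration, this comparison could be written as follows, where g is the gradient at $x_k$, hess(ξ) applies the Hessian, and inner(M,x,·,·) denotes the Riemannian inner product; all of these names are placeholders for illustration, not the Manopt API:

```julia
# Cauchy point and comparison with the tcg result (illustrative helper names)
g   = ∇F(x)                               # Grad[F](x_k)
gHg = inner(M, x, g, hess(g))             # ⟨g, Hess[F](g)⟩
τ_c = gHg <= 0 ? 1.0 : min(norm(M, x, g)^3 / (Δ * gHg), 1.0)
η_c = (-τ_c * Δ / norm(M, x, g)) * g      # Cauchy point
m(ξ) = F(x) + inner(M, x, ξ, g) + inner(M, x, ξ, hess(ξ)) / 2   # quadratic model
if m(η_c) < m(η_opt)
    η_opt = η_c                           # the Cauchy point gives the smaller model value
end
```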
To step number 4: $\operatorname{Retr}_{x_k}(\cdot)$ denotes the retraction, a mapping $\operatorname{Retr}_{x_k}:T_{x_k}\mathcal{M} \rightarrow \mathcal{M}$ which approximates the exponential map. In some cases it is cheaper to evaluate than the exponential map.
To step number 6: One knows that the truncatedConjugateGradient algorithm stopped for these reasons when the stopping criteria stopWhenCurvatureIsNegative and stopWhenTrustRegionIsExceeded are activated.
To step number 7: The last step is to decide if the new point ${x}^{* }$ is accepted.
Interface
Manopt.trustRegions — Function

trustRegions(M, F, ∇F, x, H)

evaluate the Riemannian trust-regions solver for optimization on manifolds. It will attempt to minimize the cost function F on the manifold M. If no Hessian H is provided, a standard approximation of the Hessian based on the gradient ∇F will be computed. For solving the inner trust-region subproblem of finding an update vector, it uses the Steihaug-Toint truncated conjugate-gradient method. For a description of the algorithm and more details see
- P.-A. Absil, C.G. Baker, K.A. Gallivan, Trust-region methods on Riemannian manifolds, FoCM, 2007. doi: 10.1007/s10208-005-0179-9
- A. R. Conn, N. I. M. Gould, P. L. Toint, Trust-region methods, SIAM, MPS, 2000. doi: 10.1137/1.9780898719857
Input
- M – a manifold $\mathcal M$
- F – a cost function $F \colon \mathcal M \to \mathbb R$ to minimize
- ∇F – the gradient $\nabla F \colon \mathcal M \to T \mathcal M$ of $F$
- x – an initial value $x \in \mathcal M$
- H – the Hessian $H(\mathcal M, x, \xi)$ of $F$
Optional
- retraction – (exp) approximation of the exponential map
- preconditioner – a preconditioner (a symmetric, positive definite operator that should approximate the inverse of the Hessian)
- stoppingCriterion – (stopWhenAny(stopAfterIteration(1000), stopWhenGradientNormLess(10^(-6)))) a functor inheriting from StoppingCriterion indicating when to stop.
- Δ_bar – the maximum trust-region radius
- Δ – the (initial) trust-region radius
- useRandom – set to true if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
- ρ_prime – accept/reject threshold: if ρ (the performance ratio for the iterate) is at least ρ', the outer iteration is accepted, otherwise it is rejected. In case it is rejected, the trust-region radius will have been decreased. To ensure this, ρ' ≥ 0 must be strictly smaller than 1/4. If ρ_prime is negative, the algorithm is not guaranteed to produce monotonically decreasing cost values. It is strongly recommended to set ρ' > 0, to aid convergence.
- ρ_regularization – Close to convergence, evaluating the performance ratio ρ is numerically challenging. Meanwhile, close to convergence, the quadratic model should be a good fit and the steps should be accepted. Regularization lets ρ go to 1 as the model decrease and the actual decrease go to zero. Set this option to zero to disable regularization (not recommended). When this is not zero, it may happen that the iterates produced are not monotonically improving the cost when very close to convergence. This is because the corrected cost improvement could change sign if it is negative but very small.
- returnOptions – (false) if activated, the extended result, i.e. the complete Options, are returned. This can be used to access recorded values. If set to false (default) just the optimal value xOpt is returned.
Output
- x – the last reached point on the manifold
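A minimal usage sketch, assuming a manifold M, cost F, gradient ∇F, Hessian H, and an initial point x have already been set up; the keyword values below simply restate the options described above, and manifoldDimension(M) is assumed to return the manifold dimension:

```julia
using Manopt
# run the trust-regions solver with explicitly chosen options
xOpt = trustRegions(M, F, ∇F, x, H;
    Δ_bar = sqrt(manifoldDimension(M)),          # maximum trust-region radius
    ρ_prime = 0.1,                               # accept/reject threshold
    stoppingCriterion = stopWhenAny(
        stopAfterIteration(1000),
        stopWhenGradientNormLess(10^(-6)),
    ),
)
```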
Options
Manopt.TrustRegionsOptions — Type

TrustRegionsOptions <: HessianOptions
describe the trust-regions solver, with
Fields
a default value is given in brackets if a parameter can be left out in initialization.
- x : a MPoint as starting point
- stop : a function s,r = @(o,iter) returning a stop indicator and a reason based on an iteration number and the gradient
- Δ : the (initial) trust-region radius
- Δ_bar : the maximum trust-region radius
- useRand : indicates if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
- ρ_prime : a lower bound of the performance ratio for the iterate that decides if the iteration will be accepted or not. If not, the trust-region radius will have been decreased. To ensure this, ρ' ≥ 0 must be strictly smaller than 1/4. If ρ' is negative, the algorithm is not guaranteed to produce monotonically decreasing cost values. It is strongly recommended to set ρ' > 0, to aid convergence.
- ρ_regularization : Close to convergence, evaluating the performance ratio ρ is numerically challenging. Meanwhile, close to convergence, the quadratic model should be a good fit and the steps should be accepted. Regularization lets ρ go to 1 as the model decrease and the actual decrease go to zero. Set this option to zero to disable regularization (not recommended). When this is not zero, it may happen that the iterates produced are not monotonically improving the cost when very close to convergence. This is because the corrected cost improvement could change sign if it is negative but very small.
Constructor
TrustRegionsOptions(x, stop, delta, delta_bar, uR, rho_prime, rho_reg)
construct a trust-regions Options object with the fields as above.
Approximation of the Hessian
Manopt.approxHessianFD — Function

approxHessianFD(p, x, ξ, [stepsize=2.0^(-14)])

return an approximation of the Hessian of the cost function applied to a TVector ξ by using a generic finite difference approximation based on computations of the gradient.
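A common finite difference scheme of this kind, stated here as a sketch of the idea rather than the exact internal formula, retracts a small step of length $h$ (the stepsize) in the direction of $\xi$ and compares gradients: $\operatorname{Hess}[F](\xi)_ {x} \approx \frac{\operatorname{norm}(\xi)}{h}\left( \mathcal{T}_{x \leftarrow \tilde{x}}\bigl(\operatorname{Grad}[F](\tilde{x})\bigr) - \operatorname{Grad}[F](x) \right)$ with $\tilde{x} = \operatorname{Retr}_{x}\left(\frac{h}{\operatorname{norm}(\xi)}\, \xi\right)$, where $\mathcal{T}_{x \leftarrow \tilde{x}}$ denotes a vector transport of the gradient at $\tilde{x}$ back to the tangent space at $x$.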
Input
- p – a Manopt problem structure (already containing the manifold and enough information to compute the cost gradient)
- x – a MPoint where the Hessian is to be approximated
- ξ – a TVector on which the approximated Hessian is to be applied
Optional
- stepsize – the length of the step with which the method should work
Output