Conjugate gradient descent

Manopt.conjugate_gradient_descent – Function
conjugate_gradient_descent(M, f, grad_f, p=rand(M))
conjugate_gradient_descent(M, gradient_objective, p)

perform a conjugate gradient based descent

\[p_{k+1} = \operatorname{retr}_{p_k} \bigl( s_kδ_k \bigr),\]

where $\operatorname{retr}$ denotes a retraction on the manifold M. Different rules can be employed to update the descent direction $δ_k$, based on the last direction $δ_{k-1}$ and the two gradients $\operatorname{grad}f(p_k)$ and $\operatorname{grad}f(p_{k-1})$. The Stepsize $s_k$ may be determined by a Linesearch.

Alternatively to f and grad_f, you can provide the AbstractManifoldGradientObjective gradient_objective directly.

Available update rules are SteepestDirectionUpdateRule, which yields a gradient_descent, ConjugateDescentCoefficient (the default), DaiYuanCoefficient, FletcherReevesCoefficient, HagerZhangCoefficient, HestenesStiefelCoefficient, LiuStoreyCoefficient, and PolakRibiereCoefficient. These can all be combined with a ConjugateGradientBealeRestart rule.

They all compute $β_k$ such that this algorithm updates the search direction as

\[\delta_k= -\operatorname{grad}f(p_k) + β_k \delta_{k-1}\]

Input

  • M: a manifold $\mathcal M$
  • f: a cost function $f:\mathcal M→ℝ$ to minimize, implemented as a function (M, p) -> v
  • grad_f: the gradient $\operatorname{grad}f:\mathcal M → T\mathcal M$ of $f$, implemented as a function (M, p) -> X
  • p: an initial value $p∈\mathcal M$

Optional

  • coefficient: (ConjugateDescentCoefficient <: DirectionUpdateRule) rule to compute the descent direction update coefficient $β_k$, as a functor mapping (amp, cgs, i) -> β, where amp is an AbstractManoptProblem, cgs is the ConjugateGradientDescentState, and i is the current iteration number.
  • evaluation: (AllocatingEvaluation) specify whether the gradient works by allocation (default), that is of the form grad_f(M, p), or as an InplaceEvaluation working in place, that is of the form grad_f!(M, X, p).
  • retraction_method: (default_retraction_method(M, typeof(p))) a retraction method to use.
  • stepsize: (ArmijoLinesearch via default_stepsize) a Stepsize function applied to the search direction.
  • stopping_criterion: (StopAfterIteration(200) | StopWhenGradientNormLess(1e-8)) a StoppingCriterion indicating when to stop.
  • vector_transport_method: (default_vector_transport_method(M, typeof(p))) vector transport method to transport the old descent direction when computing the new descent direction.

If you provide the ManifoldGradientObjective directly, evaluation is ignored.

Output

the obtained (approximate) minimizer $p^*$, see get_solver_return for details
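
For illustration, here is a minimal usage sketch of this solver, assuming Manifolds.jl is loaded to provide the Sphere manifold; the cost (a Rayleigh quotient) and its gradient are chosen purely for this example, and the keywords restate options documented above.

using Manopt, Manifolds, LinearAlgebra

M = Sphere(2)                      # unit sphere in ℝ³
A = Symmetric(randn(3, 3))         # example cost: Rayleigh quotient p ↦ pᵀAp
f(M, p) = p' * A * p
# Riemannian gradient: projection of the Euclidean gradient 2Ap onto the tangent space at p
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)

p0 = rand(M)
p_star = conjugate_gradient_descent(
    M, f, grad_f, p0;
    coefficient=FletcherReevesCoefficient(),
    stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-8),
)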

source
Manopt.conjugate_gradient_descent! – Function
conjugate_gradient_descent!(M, f, grad_f, p)
conjugate_gradient_descent!(M, gradient_objective, p; kwargs...)

perform a conjugate gradient based descent in place of p as

\[p_{k+1} = \operatorname{retr}_{p_k} \bigl( s_k\delta_k \bigr),\]

where $\operatorname{retr}$ denotes a retraction on the manifold M.

Input

  • M: a manifold $\mathcal M$
  • f: a cost function $f:\mathcal M→ℝ$ to minimize
  • grad_f: the gradient $\operatorname{grad}f:\mathcal M→ T\mathcal M$ of $f$
  • p: an initial value $p∈\mathcal M$

Alternatively to f and grad_f, you can provide the AbstractManifoldGradientObjective gradient_objective directly.

For more details and options, especially the DirectionUpdateRules, see conjugate_gradient_descent.
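
As a hedged sketch of the in-place call, reusing M, f, and the matrix A from the example for conjugate_gradient_descent above, and evaluating the gradient in place via the evaluation keyword described there:

# in-place gradient, writing the result into X
grad_f!(M, X, p) = (X .= 2 .* (A * p .- (p' * A * p) .* p); X)

p0 = rand(M)
conjugate_gradient_descent!(
    M, f, grad_f!, p0;
    evaluation=InplaceEvaluation(),
)
# p0 now holds the obtained (approximate) minimizer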

source

State

Manopt.ConjugateGradientDescentState – Type
ConjugateGradientDescentState <: AbstractGradientSolverState

specify options for a conjugate gradient descent algorithm that solves a DefaultManoptProblem.

Fields

  • p: the current iterate, a point on a manifold
  • X: the current gradient, also denoted as $ξ$ or $X_k$ for the gradient in the $k$th step.
  • δ: the current descent direction, also a tangent vector
  • β: the currently computed update coefficient, determined by the coefficient rule below.
  • coefficient: (ConjugateDescentCoefficient()) a DirectionUpdateRule function to determine the new β
  • stepsize: (default_stepsize(M, ConjugateGradientDescentState; retraction_method=retraction_method)) a Stepsize function
  • stop: (StopAfterIteration(500) | StopWhenGradientNormLess(1e-8)) a StoppingCriterion
  • retraction_method: (default_retraction_method(M, typeof(p))) a type of retraction
  • vector_transport_method: (default_vector_transport_method(M, typeof(p))) a type of vector transport

Constructor

ConjugateGradientDescentState(M, p)

where the last five fields can be set by their names as keyword arguments. The gradient X can be set to a tangent vector using the keyword initial_gradient, which defaults to zero_vector(M, p); δ is initialized to a copy of this vector.
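
As a sketch of constructing the state directly, assuming the keyword names match the field and keyword names stated above (illustrative, not a verified signature):

using Manopt, Manifolds

M = Sphere(2)
p = rand(M)
# keyword names assumed to match the fields/keywords described above
cgs = ConjugateGradientDescentState(
    M, p;
    initial_gradient=zero_vector(M, p),
    coefficient=ConjugateDescentCoefficient(),
)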

See also

conjugate_gradient_descent, DefaultManoptProblem, ArmijoLinesearch

source

Available coefficients

The update rules act as a DirectionUpdateRule, which internally always evaluates the gradient itself first.
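
As a hedged sketch based on the constructors documented below, a coefficient rule is constructed once and then passed to the solver via the coefficient= keyword; rules based on a vector transport take the manifold (or a vector transport method) as argument:

using Manopt, Manifolds

M = Sphere(2)
fr = FletcherReevesCoefficient()      # storage-based rule, no vector transport needed
dy = DaiYuanCoefficient(M)            # picks the default vector transport of M
hs = HestenesStiefelCoefficient(M)    # likewise
# for example: conjugate_gradient_descent(M, f, grad_f, rand(M); coefficient=dy)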

Manopt.ConjugateGradientBealeRestart – Type
ConjugateGradientBealeRestart <: DirectionUpdateRule

An update rule might require a restart, that is using just the gradient as the descent direction, when the last two gradients are no longer nearly orthogonal; see [HZ06], page 12 of the preprint (page 46 in journal page numbering). This method is named after E. Beale, from his proceedings paper from 1972 [Bea72]. It acts as a decorator to any existing DirectionUpdateRule direction_update.

Denote by $p_k,X_k$ the last iterate and gradient, and by $p_{k+1},X_{k+1}$ the current ones, as obtained from the ConjugateGradientDescentState cgs.

Then a restart is performed, that is $β_k = 0$ is returned, if

\[ \frac{ ⟨X_{k+1}, P_{p_{k+1}\gets p_k}X_k⟩}{\lVert X_k \rVert_{p_k}} > ξ,\]

where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$, and $ξ$ is the threshold. The default threshold is chosen as 0.2, as recommended in [Pow77].

Constructor

ConjugateGradientBealeRestart(
    direction_update::D,
    threshold=0.2;
    manifold::AbstractManifold = DefaultManifold(),
    vector_transport_method::V=default_vector_transport_method(manifold),
)
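
Following this constructor, a hedged usage sketch that decorates a Polak-Ribière rule with the Beale restart and passes it to the solver:

using Manopt, Manifolds

M = Sphere(2)
restarted = ConjugateGradientBealeRestart(
    PolakRibiereCoefficient(M), 0.2; manifold=M
)
# then, for example: conjugate_gradient_descent(M, f, grad_f, rand(M); coefficient=restarted)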
source
Manopt.ConjugateDescentCoefficient – Type
ConjugateDescentCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [Fle87] adapted to manifolds:

\[β_k = \frac{ \lVert X_{k+1} \rVert_{p_{k+1}}^2 } {\langle -\delta_k,X_k \rangle_{p_k}}.\]

See also conjugate_gradient_descent

Constructor

ConjugateDescentCoefficient(a::StoreStateAction=())

Construct the conjugate descent coefficient update rule, a new storage is created by default.

source
Manopt.DaiYuanCoefficient – Type
DaiYuanCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [DY99] adapted to manifolds:

Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.

Then the coefficient reads

\[β_k = \frac{ \lVert X_{k+1} \rVert_{p_{k+1}}^2 } {\langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}}}.\]

See also conjugate_gradient_descent

Constructor

function DaiYuanCoefficient(
    M::AbstractManifold=DefaultManifold(2);
    t::AbstractVectorTransportMethod=default_vector_transport_method(M)
)

Construct the Dai-Yuan coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.

source
Manopt.FletcherReevesCoefficient – Type
FletcherReevesCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [FR64] adapted to manifolds:

\[β_k = \frac{\lVert X_{k+1}\rVert_{p_{k+1}}^2}{\lVert X_k\rVert_{p_{k}}^2}.\]

See also conjugate_gradient_descent

Constructor

FletcherReevesCoefficient(a::StoreStateAction=())

Construct the Fletcher-Reeves coefficient update rule, a new storage is created by default.

source
Manopt.HagerZhangCoefficient – Type
HagerZhangCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [HZ05] adapted to manifolds: let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.

\[β_k = \Bigl\langle\nu_k - \frac{ 2\lVert \nu_k\rVert_{p_{k+1}}^2 }{ \langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}} } P_{p_{k+1}\gets p_k}\delta_k, \frac{X_{k+1}}{ \langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}} } \Bigr\rangle_{p_{k+1}}.\]

This method includes a numerical stability safeguard proposed by those authors.

See also conjugate_gradient_descent

Constructor

function HagerZhangCoefficient(t::AbstractVectorTransportMethod)
function HagerZhangCoefficient(M::AbstractManifold = DefaultManifold(2))

Construct the Hager-Zhang coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.

source
Manopt.HestenesStiefelCoefficient – Type
HestenesStiefelCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [HS52] adapted to manifolds as follows:

Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$. Then the update reads

\[β_k = \frac{\langle X_{k+1}, \nu_k \rangle_{p_{k+1}} } { \langle P_{p_{k+1}\gets p_k} \delta_k, \nu_k\rangle_{p_{k+1}} },\]

where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.

Constructor

function HestenesStiefelCoefficient(transport_method::AbstractVectorTransportMethod)
function HestenesStiefelCoefficient(M::AbstractManifold = DefaultManifold(2))

Construct the Hestenes-Stiefel coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.

See also conjugate_gradient_descent

source
Manopt.LiuStoreyCoefficient – Type
LiuStoreyCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [LS91] adapted to manifolds:

Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.

Then the coefficient reads

\[β_k = - \frac{ \langle X_{k+1},\nu_k \rangle_{p_{k+1}} } {\langle \delta_k,X_k \rangle_{p_k}}.\]

See also conjugate_gradient_descent

Constructor

function LiuStoreyCoefficient(t::AbstractVectorTransportMethod)
function LiuStoreyCoefficient(M::AbstractManifold = DefaultManifold(2))

Construct the Liu-Storey coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.

source
Manopt.PolakRibiereCoefficient – Type
PolakRibiereCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [PR69] and [Pol69] adapted to manifolds:

Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.

Then the update reads

\[β_k = \frac{ \langle X_{k+1}, \nu_k \rangle_{p_{k+1}} } {\lVert X_k \rVert_{p_k}^2 }.\]

Constructor

function PolakRibiereCoefficient(
    M::AbstractManifold=DefaultManifold(2);
    t::AbstractVectorTransportMethod=default_vector_transport_method(M)
)

Construct the Polak-Ribière coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.

See also conjugate_gradient_descent

source

Technical details

The conjugate_gradient_descent solver requires the following functions of a manifold to be available

  • A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
  • A vector_transport_to!(M, Y, p, X, q); it is recommended to set the default_vector_transport_method to a favourite vector transport. If this default is set, a vector_transport_method= does not have to be specified (see the sketch after this list).
  • By default this solver uses ArmijoLinesearch, which requires max_stepsize(M) to be set and an implementation of inner(M, p, X).
  • By default the stopping criterion also uses the norm, to stop when the norm of the gradient is small; if you implemented inner, the norm is already provided.
  • By default the tangent vector storing the gradient is initialized calling zero_vector(M,p).
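
For instance, if the manifold at hand does not define these defaults, the methods can be passed explicitly. A hedged sketch, reusing a cost f and gradient grad_f as in the usage example further above and assuming the projection-based retraction and vector transport from ManifoldsBase.jl apply to the manifold:

using Manopt, Manifolds

M = Sphere(2)
p_star = conjugate_gradient_descent(
    M, f, grad_f, rand(M);
    retraction_method=ProjectionRetraction(),
    vector_transport_method=ProjectionTransport(),
)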

Literature

[Bea72]
E. M. Beale. A derivation of conjugate gradients. In: Numerical methods for nonlinear optimization, edited by F. A. Lootsma (Academic Press, London, 1972); pp. 39–43.
[DY99]
Y. H. Dai and Y. Yuan. A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM Journal on Optimization 10, 177–182 (1999).
[Fle87]
R. Fletcher. Practical Methods of Optimization. 2 Edition, A Wiley-Interscience Publication (John Wiley & Sons Ltd., 1987).
[FR64]
R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. The Computer Journal 7, 149–154 (1964).
[HZ06]
W. W. Hager and H. Zhang. A survey of nonlinear conjugate gradient methods. Pacific Journal of Optimization 2, 35–58 (2006).
[HZ05]
W. W. Hager and H. Zhang. A New Conjugate Gradient Method with Guaranteed Descent and an Efficient Line Search. SIAM Journal on Optimization 16, 170–192 (2005).
[HS52]
M. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards 49, 409 (1952).
[LS91]
Y. Liu and C. Storey. Efficient generalized conjugate gradient algorithms, part 1: Theory. Journal of Optimization Theory and Applications 69, 129–137 (1991).
[PR69]
E. Polak and G. Ribière. Note sur la convergence de méthodes de directions conjuguées. Revue française d’informatique et de recherche opérationnelle 3, 35–43 (1969).
[Pol69]
B. T. Polyak. The conjugate gradient method in extremal problems. USSR Computational Mathematics and Mathematical Physics 9, 94–112 (1969).
[Pow77]
M. J. Powell. Restart procedures for the conjugate gradient method. Mathematical Programming 12, 241–254 (1977).