Conjugate gradient descent

Manopt.conjugate_gradient_descent – Function
conjugate_gradient_descent(M, f, grad_f, p=rand(M))
conjugate_gradient_descent(M, gradient_objective, p)

perform a conjugate gradient based descent

\[p_{k+1} = \operatorname{retr}_{p_k} \bigl( s_kδ_k \bigr),\]

where $\operatorname{retr}$ denotes a retraction on the manifold M. Different rules can be employed to update the descent direction $δ_k$, based on the last direction $δ_{k-1}$ and the two gradients $\operatorname{grad}f(p_k)$ and $\operatorname{grad}f(p_{k-1})$. The Stepsize $s_k$ may be determined by a Linesearch.

Alternatively to f and grad_f, you can provide the AbstractManifoldGradientObjective gradient_objective directly.

Available update rules are SteepestDirectionUpdateRule, which yields a gradient_descent, ConjugateDescentCoefficient (the default), DaiYuanCoefficient, FletcherReevesCoefficient, HagerZhangCoefficient, HestenesStiefelCoefficient, LiuStoreyCoefficient, and PolakRibiereCoefficient. These can all be combined with a ConjugateGradientBealeRestart rule.

They all compute $β_k$ such that this algorithm updates the search direction as

\[\delta_k= -\operatorname{grad}f(p_k) + β_k \delta_{k-1}\]

Input

  • M: a manifold $\mathcal M$
  • f: a cost function $f:\mathcal M→ℝ$ to minimize, implemented as a function (M, p) -> v
  • grad_f: the gradient $\operatorname{grad}f:\mathcal M → T\mathcal M$ of $f$, implemented as a function (M, p) -> X
  • p: an initial value $p∈\mathcal M$

Optional

  • coefficient: (ConjugateDescentCoefficient <: DirectionUpdateRule) rule to compute the descent direction update coefficient $β_k$, as a functor mapping (amp, cgs, i) -> β, where amp is an AbstractManoptProblem, cgs is the ConjugateGradientDescentState, and i is the current iteration number.
  • evaluation: (AllocatingEvaluation) specify whether the gradient works by allocation (default), that is of the form grad_f(M, p), or as an InplaceEvaluation working in place, that is of the form grad_f!(M, X, p).
  • retraction_method: (default_retraction_method(M, typeof(p))) a retraction method to use.
  • stepsize: (ArmijoLinesearch via default_stepsize) a Stepsize function applied to the search direction.
  • stopping_criterion: (StopAfterIteration(200) | StopWhenGradientNormLess(1e-8)) a StoppingCriterion indicating when to stop.
  • vector_transport_method: (default_vector_transport_method(M, typeof(p))) vector transport method to transport the old descent direction when computing the new descent direction.

If you provide the ManifoldGradientObjective directly, evaluation is ignored.

Output

the obtained (approximate) minimizer $p^*$, see get_solver_return for details
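
For illustration, here is a minimal usage sketch of this solver, assuming Manifolds.jl is loaded to provide the Sphere manifold; the cost (a Rayleigh quotient) and its gradient are chosen purely for this example, and the keywords restate options documented above.

using Manopt, Manifolds, LinearAlgebra

M = Sphere(2)                      # unit sphere in ℝ³
A = Symmetric(randn(3, 3))         # example cost: Rayleigh quotient p ↦ pᵀAp
f(M, p) = p' * A * p
# Riemannian gradient: projection of the Euclidean gradient 2Ap onto the tangent space at p
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)

p0 = rand(M)
p_star = conjugate_gradient_descent(
    M, f, grad_f, p0;
    coefficient=FletcherReevesCoefficient(),
    stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-8),
)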

source
Manopt.conjugate_gradient_descent! – Function
conjugate_gradient_descent!(M, f, grad_f, p)
conjugate_gradient_descent!(M, gradient_objective, p; kwargs...)

perform a conjugate gradient based descent in place of p as

\[p_{k+1} = \operatorname{retr}_{p_k} \bigl( s_k\delta_k \bigr),\]

where $\operatorname{retr}$ denotes a retraction on the manifold M.

Input

  • M: a manifold $\mathcal M$
  • f: a cost function $f:\mathcal M→ℝ$ to minimize
  • grad_f: the gradient $\operatorname{grad}f:\mathcal M→ T\mathcal M$ of $f$
  • p: an initial value $p∈\mathcal M$

Alternatively to f and grad_f, you can provide the AbstractManifoldGradientObjective gradient_objective directly.

For more details and options, especially the DirectionUpdateRules, see conjugate_gradient_descent.
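
As a hedged sketch of the in-place call, reusing M, f, and the matrix A from the example for conjugate_gradient_descent above, and evaluating the gradient in place via the evaluation keyword described there:

# in-place gradient, writing the result into X
grad_f!(M, X, p) = (X .= 2 .* (A * p .- (p' * A * p) .* p); X)

p0 = rand(M)
conjugate_gradient_descent!(
    M, f, grad_f!, p0;
    evaluation=InplaceEvaluation(),
)
# p0 now holds the obtained (approximate) minimizer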

source

State

Manopt.ConjugateGradientDescentState – Type
ConjugateGradientDescentState <: AbstractGradientSolverState

specify options for a conjugate gradient descent algorithm that solves a DefaultManoptProblem.

Fields

  • p: the current iterate, a point on a manifold
  • X: the current gradient, also denoted as $ξ$ or $X_k$ for the gradient in the $k$th step.
  • δ: the current descent direction, also a tangent vector
  • β: the currently computed update coefficient, determined by the coefficient rule below.
  • coefficient: (ConjugateDescentCoefficient()) a DirectionUpdateRule function to determine the new β
  • stepsize: (default_stepsize(M, ConjugateGradientDescentState; retraction_method=retraction_method)) a Stepsize function
  • stop: (StopAfterIteration(500) | StopWhenGradientNormLess(1e-8)) a StoppingCriterion
  • retraction_method: (default_retraction_method(M, typeof(p))) a type of retraction
  • vector_transport_method: (default_vector_transport_method(M, typeof(p))) a type of vector transport

Constructor

ConjugateGradientDescentState(M, p)

where the last five fields can be set by their names as keyword arguments. The gradient X can be set to a tangent vector using the keyword initial_gradient, which defaults to zero_vector(M, p); δ is initialized to a copy of this vector.
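
As a sketch of constructing the state directly, assuming the keyword names match the field and keyword names stated above (illustrative, not a verified signature):

using Manopt, Manifolds

M = Sphere(2)
p = rand(M)
# keyword names assumed to match the fields/keywords described above
cgs = ConjugateGradientDescentState(
    M, p;
    initial_gradient=zero_vector(M, p),
    coefficient=ConjugateDescentCoefficient(),
)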

See also

conjugate_gradient_descent, DefaultManoptProblem, ArmijoLinesearch

source

Available coefficients

The update rules act as a DirectionUpdateRule, which internally always evaluates the gradient itself first.
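
As a hedged sketch based on the constructors documented below, a coefficient rule is constructed once and then passed to the solver via the coefficient= keyword; rules based on a vector transport take the manifold (or a vector transport method) as argument:

using Manopt, Manifolds

M = Sphere(2)
fr = FletcherReevesCoefficient()      # storage-based rule, no vector transport needed
dy = DaiYuanCoefficient(M)            # picks the default vector transport of M
hs = HestenesStiefelCoefficient(M)    # likewise
# for example: conjugate_gradient_descent(M, f, grad_f, rand(M); coefficient=dy)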

Manopt.ConjugateGradientBealeRestart – Type
ConjugateGradientBealeRestart <: DirectionUpdateRule

An update rule might require a restart, that is using just the gradient as the descent direction, when the last two gradients are no longer nearly orthogonal; see [HZ06], page 12 of the preprint (page 46 in journal page numbering). This method is named after E. Beale, from his proceedings paper from 1972 [Bea72]. It acts as a decorator to any existing DirectionUpdateRule direction_update.

Denote by $p_k,X_k$ the last iterate and gradient, and by $p_{k+1},X_{k+1}$ the current ones, as obtained from the ConjugateGradientDescentState cgs.

Then a restart is performed, that is $β_k = 0$ is returned, if

\[ \frac{ ⟨X_{k+1}, P_{p_{k+1}\gets p_k}X_k⟩}{\lVert X_k \rVert_{p_k}} > ξ,\]

where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$, and $ξ$ is the threshold. The default threshold is chosen as 0.2, as recommended in [Pow77].

Constructor

ConjugateGradientBealeRestart(
    direction_update::D,
    threshold=0.2;
    manifold::AbstractManifold = DefaultManifold(),
    vector_transport_method::V=default_vector_transport_method(manifold),
)
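
Following this constructor, a hedged usage sketch that decorates a Polak-Ribière rule with the Beale restart and passes it to the solver:

using Manopt, Manifolds

M = Sphere(2)
restarted = ConjugateGradientBealeRestart(
    PolakRibiereCoefficient(M), 0.2; manifold=M
)
# then, for example: conjugate_gradient_descent(M, f, grad_f, rand(M); coefficient=restarted)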
source
Manopt.ConjugateDescentCoefficient – Type
ConjugateDescentCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [Fle87] adapted to manifolds:

\[β_k = \frac{ \lVert X_{k+1} \rVert_{p_{k+1}}^2 } {\langle -\delta_k,X_k \rangle_{p_k}}.\]

See also conjugate_gradient_descent

Constructor

ConjugateDescentCoefficient(a::StoreStateAction=())

Construct the conjugate descent coefficient update rule, a new storage is created by default.

source
Manopt.DaiYuanCoefficient – Type
DaiYuanCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [DY99] adapted to manifolds:

Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.

Then the coefficient reads

\[β_k = \frac{ \lVert X_{k+1} \rVert_{p_{k+1}}^2 } {\langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}}}.\]

See also conjugate_gradient_descent

Constructor

function DaiYuanCoefficient(
    M::AbstractManifold=DefaultManifold(2);
    t::AbstractVectorTransportMethod=default_vector_transport_method(M)
)

Construct the Dai-Yuan coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.

source
Manopt.FletcherReevesCoefficient – Type
FletcherReevesCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [FR64] adapted to manifolds:

\[β_k = \frac{\lVert X_{k+1}\rVert_{p_{k+1}}^2}{\lVert X_k\rVert_{p_{k}}^2}.\]

See also conjugate_gradient_descent

Constructor

FletcherReevesCoefficient(a::StoreStateAction=())

Construct the Fletcher-Reeves coefficient update rule, a new storage is created by default.

source
Manopt.HagerZhangCoefficient – Type
HagerZhangCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [HZ05] adapted to manifolds: let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.

\[β_k = \Bigl\langle\nu_k - \frac{ 2\lVert \nu_k\rVert_{p_{k+1}}^2 }{ \langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}} } P_{p_{k+1}\gets p_k}\delta_k, \frac{X_{k+1}}{ \langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}} } \Bigr\rangle_{p_{k+1}}.\]

This method includes a numerical stability safeguard proposed by those authors.

See also conjugate_gradient_descent

Constructor

function HagerZhangCoefficient(t::AbstractVectorTransportMethod)
function HagerZhangCoefficient(M::AbstractManifold = DefaultManifold(2))

Construct the Hager-Zhang coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.

source
Manopt.HestenesStiefelCoefficient – Type
HestenesStiefelCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [HS52] adapted to manifolds as follows:

Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$. Then the update reads

\[β_k = \frac{\langle X_{k+1}, \nu_k \rangle_{p_{k+1}} } { \langle P_{p_{k+1}\gets p_k} \delta_k, \nu_k\rangle_{p_{k+1}} },\]

where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.

Constructor

function HestenesStiefelCoefficient(transport_method::AbstractVectorTransportMethod)
function HestenesStiefelCoefficient(M::AbstractManifold = DefaultManifold(2))

Construct the Hestenes-Stiefel coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.

See also conjugate_gradient_descent

source
Manopt.LiuStoreyCoefficient – Type
LiuStoreyCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [LS91] adapted to manifolds:

Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.

Then the coefficient reads

\[β_k = - \frac{ \langle X_{k+1},\nu_k \rangle_{p_{k+1}} } {\langle \delta_k,X_k \rangle_{p_k}}.\]

See also conjugate_gradient_descent

Constructor

function LiuStoreyCoefficient(t::AbstractVectorTransportMethod)
function LiuStoreyCoefficient(M::AbstractManifold = DefaultManifold(2))

Construct the Liu-Storey coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.

source
Manopt.PolakRibiereCoefficient – Type
PolakRibiereCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k,X_k$, the current iterate and gradient $p_{k+1},X_{k+1}$, and the last update direction $\delta=\delta_k$, based on [PR69] and [Pol69] adapted to manifolds:

Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.

Then the update reads

\[β_k = \frac{ \langle X_{k+1}, \nu_k \rangle_{p_{k+1}} } {\lVert X_k \rVert_{p_k}^2 }.\]

Constructor

function PolakRibiereCoefficient(
    M::AbstractManifold=DefaultManifold(2);
    t::AbstractVectorTransportMethod=default_vector_transport_method(M)
)

Construct the Polak-Ribière coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.

See also conjugate_gradient_descent

source

Technical details

The conjugate_gradient_descent solver requires the following functions of a manifold to be available

  • A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
  • A vector_transport_to!(M, Y, p, X, q); it is recommended to set the default_vector_transport_method to a favourite vector transport. If this default is set, a vector_transport_method= does not have to be specified (see the sketch after this list).
  • By default this solver uses ArmijoLinesearch, which requires max_stepsize(M) to be set and an implementation of inner(M, p, X).
  • By default the stopping criterion also uses the norm, to stop when the norm of the gradient is small; if you implemented inner, the norm is already provided.
  • By default the tangent vector storing the gradient is initialized calling zero_vector(M,p).
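
For instance, if the manifold at hand does not define these defaults, the methods can be passed explicitly. A hedged sketch, reusing a cost f and gradient grad_f as in the usage example further above and assuming the projection-based retraction and vector transport from ManifoldsBase.jl apply to the manifold:

using Manopt, Manifolds

M = Sphere(2)
p_star = conjugate_gradient_descent(
    M, f, grad_f, rand(M);
    retraction_method=ProjectionRetraction(),
    vector_transport_method=ProjectionTransport(),
)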

Literature

[Bea72]
E. M. Beale. A derivation of conjugate gradients. In: Numerical methods for nonlinear optimization, edited by F. A. Lootsma (Academic Press, London, 1972); pp. 39–43.
[DY99]
Y. H. Dai and Y. Yuan. A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM Journal on Optimization 10, 177–182 (1999).
[Fle87]
R. Fletcher. Practical Methods of Optimization. 2 Edition, A Wiley-Interscience Publication (John Wiley & Sons Ltd., 1987).
[FR64]
R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. The Computer Journal 7, 149–154 (1964).
[HZ06]
W. W. Hager and H. Zhang. A survey of nonlinear conjugate gradient methods. Pacific Journal of Optimization 2, 35–58 (2006).
[HZ05]
W. W. Hager and H. Zhang. A New Conjugate Gradient Method with Guaranteed Descent and an Efficient Line Search. SIAM Journal on Optimization 16, 170–192 (2005).
[HS52]
M. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards 49, 409 (1952).
[LS91]
Y. Liu and C. Storey. Efficient generalized conjugate gradient algorithms, part 1: Theory. Journal of Optimization Theory and Applications 69, 129–137 (1991).
[PR69]
E. Polak and G. Ribière. Note sur la convergence de méthodes de directions conjuguées. Revue française d’informatique et de recherche opérationnelle 3, 35–43 (1969).
[Pol69]
B. T. Polyak. The conjugate gradient method in extremal problems. USSR Computational Mathematics and Mathematical Physics 9, 94–112 (1969).
[Pow77]
M. J. Powell. Restart procedures for the conjugate gradient method. Mathematical Programming 12, 241–254 (1977).