Conjugate gradient descent

Manopt.conjugate_gradient_descentFunction
conjugate_gradient_descent(M, f, grad_f, p=rand(M))
conjugate_gradient_descent!(M, f, grad_f, p)
conjugate_gradient_descent(M, gradient_objective, p)
conjugate_gradient_descent!(M, gradient_objective, p; kwargs...)

perform a conjugate gradient based descent

\[p_{k+1} = \operatorname{retr}_{p_k} \bigl( s_kδ_k \bigr),\]

where $\operatorname{retr}$ denotes a retraction on the manifold M. Different rules can be employed to update the descent direction $δ_k$ based on the previous direction $δ_{k-1}$ and the gradients $\operatorname{grad}f(p_k)$ and $\operatorname{grad} f(p_{k-1})$. The Stepsize $s_k$ may be determined by a Linesearch.

Instead of f and grad_f, you can also provide the AbstractManifoldFirstOrderObjective gradient_objective directly.

Available update rules are SteepestDescentCoefficientRule, which yields a gradient_descent, ConjugateDescentCoefficient (the default), DaiYuanCoefficientRule, FletcherReevesCoefficient, HagerZhangCoefficient, HestenesStiefelCoefficient, LiuStoreyCoefficient, and PolakRibiereCoefficient. These can all be combined with a ConjugateGradientBealeRestartRule.

They all compute $β_k$ such that this algorithm updates the search direction as

\[δ_k = -\operatorname{grad}f(p_k) + β_k δ_{k-1}\]

Input

  • M::AbstractManifold: a Riemannian manifold $\mathcal M$
  • f: a cost function $f: \mathcal M→ ℝ$ implemented as (M, p) -> v
  • grad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function (M, p) -> X or a function (M, X, p) -> X computing X in-place
  • p: a point on the manifold $\mathcal M$

Keyword arguments

If you provide the ManifoldFirstOrderObjective directly, the evaluation= keyword is ignored. The decorations are still applied to the objective.

Output

The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
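
For example, the following is a minimal usage sketch. It assumes Manifolds.jl provides the Sphere manifold; the Rayleigh quotient cost, its gradient, and the random starting point are illustrative choices, not part of the solver's API.

using Manopt, Manifolds, LinearAlgebra

# Illustrative problem (an assumption for this sketch): minimize the Rayleigh quotient
# f(p) = p' * A * p on the unit sphere; a minimizer is a unit eigenvector of A
# corresponding to its smallest eigenvalue.
n = 5
A = Symmetric(randn(n, n))
M = Sphere(n - 1)

f(M, p) = p' * A * p
# Riemannian gradient: the Euclidean gradient 2A*p projected onto the tangent space at p
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)

p0 = rand(M)
p_star = conjugate_gradient_descent(M, f, grad_f, p0)

By default the ConjugateDescentCoefficient is used (see the available coefficients below); the step size rule and stopping criterion can be changed via keyword arguments.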

source
Manopt.conjugate_gradient_descent!Function
conjugate_gradient_descent(M, f, grad_f, p=rand(M))
conjugate_gradient_descent!(M, f, grad_f, p)
conjugate_gradient_descent(M, gradient_objective, p)
conjugate_gradient_descent!(M, gradient_objective, p; kwargs...)

perform a conjugate gradient based descent

\[p_{k+1} = \operatorname{retr}_{p_k} \bigl( s_kδ_k \bigr),\]

where $\operatorname{retr}$ denotes a retraction on the manifold M. Different rules can be employed to update the descent direction $δ_k$ based on the previous direction $δ_{k-1}$ and the gradients $\operatorname{grad}f(p_k)$ and $\operatorname{grad} f(p_{k-1})$. The Stepsize $s_k$ may be determined by a Linesearch.

Instead of f and grad_f, you can also provide the AbstractManifoldFirstOrderObjective gradient_objective directly.

Available update rules are SteepestDescentCoefficientRule, which yields a gradient_descent, ConjugateDescentCoefficient (the default), DaiYuanCoefficientRule, FletcherReevesCoefficient, HagerZhangCoefficient, HestenesStiefelCoefficient, LiuStoreyCoefficient, and PolakRibiereCoefficient. These can all be combined with a ConjugateGradientBealeRestartRule.

They all compute $β_k$ such that this algorithm updates the search direction as

\[δ_k = -\operatorname{grad}f(p_k) + β_k δ_{k-1}\]

Input

  • M::AbstractManifold: a Riemannian manifold $\mathcal M$
  • f: a cost function $f: \mathcal M→ ℝ$ implemented as (M, p) -> v
  • grad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function (M, p) -> X or a function (M, X, p) -> X computing X in-place
  • p: a point on the manifold $\mathcal M$

Keyword arguments

If you provide the ManifoldFirstOrderObjective directly, the evaluation= keyword is ignored. The decorations are still applied to the objective.

Output

The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
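
Reusing M, f, and grad_f from the example above, the in-place variant overwrites its start point with the result, for example:

q = rand(M)
conjugate_gradient_descent!(M, f, grad_f, q)   # q now holds the approximate minimizer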

source

State

Manopt.ConjugateGradientDescentStateType
ConjugateGradientDescentState <: AbstractGradientSolverState

Specify options for a conjugate gradient descent algorithm that solves a DefaultManoptProblem.

Fields

  • p::P: a point on the manifold $\mathcal M$ storing the current iterate
  • X::T: a tangent vector at the point $p$ on the manifold $\mathcal M$
  • δ: the current descent direction, also a tangent vector
  • β: the current update coefficient, see coefficient below
  • coefficient: function to determine the new β
  • stepsize::Stepsize: a functor inheriting from Stepsize to determine a step size
  • stop::StoppingCriterion: a functor indicating that the stopping criterion is fulfilled
  • retraction_method::AbstractRetractionMethod: a retraction $\operatorname{retr}$ to use, see the section on retractions
  • vector_transport_method::AbstractVectorTransportMethodP: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports

Constructor

ConjugateGradientDescentState(M::AbstractManifold; kwargs...)

where the last five fields can be set by their names as keyword arguments. The field X can be set using the keyword initial_gradient, which defaults to zero_vector(M,p); δ is initialized to a copy of this vector.

Keyword arguments

The fields listed above can also be set as keyword arguments.

See also

conjugate_gradient_descent, DefaultManoptProblem, ArmijoLinesearch
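
A hedged sketch of constructing the state by hand follows; the keyword names mirror the fields listed above and are assumptions where the docstring does not spell them out, and Sphere is assumed from Manifolds.jl.

using Manopt, Manifolds

M = Sphere(2)
# keyword names p, stopping_criterion, and coefficient are assumptions based on the fields above
cgs = ConjugateGradientDescentState(M;
    p = rand(M),
    stopping_criterion = StopAfterIteration(200) | StopWhenGradientNormLess(1e-8),
    coefficient = ConjugateDescentCoefficient(),
)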

source

Available coefficients

The update rules act as a DirectionUpdateRule, which internally always evaluates the gradient first.
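
For example, a rule can be selected directly when calling the solver; this sketch assumes the coefficient= keyword of conjugate_gradient_descent and reuses M, f, grad_f, and p0 from the first example above.

# select the Polak-Ribière rule instead of the default conjugate descent coefficient
p_star = conjugate_gradient_descent(M, f, grad_f, p0; coefficient = PolakRibiereCoefficient())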

Manopt.ConjugateDescentCoefficientFunction
ConjugateDescentCoefficient()
ConjugateDescentCoefficient(M::AbstractManifold)

Compute the (classical) conjugate gradient coefficient based on [Fle87] adapted to manifolds

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Then the coefficient reads

\[β_k = \frac{\mathrm{D}f(p_{k+1})[X_{k+1}]}{\mathrm{D}f(p_k)[-δ_k]} = \frac{\lVert X_{k+1} \rVert_{p_{k+1}}^2}{⟨-δ_k,X_k⟩_{p_k}}\]

The second form is the one usually stated, while the first avoids using the inner product of the metric. The first form is implemented here, but falls back to calling inner if no dedicated differential is available.

Info

This function generates a ManifoldDefaultsFactory for ConjugateDescentCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.ConjugateGradientBealeRestartFunction
ConjugateGradientBealeRestart(direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory}; kwargs...)
ConjugateGradientBealeRestart(M::AbstractManifold, direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory}; kwargs...)

Compute a conjugate gradient coefficient with a potential restart, performed when two subsequent gradients are no longer nearly orthogonal. See [HZ06, page 12] (page 46 in the journal's page numbers). The method is named after E. Beale, following his 1972 proceedings paper [Bea72]. It acts as a decorator to any existing DirectionUpdateRule direction_update.

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Then a restart is performed, that is $β_k = 0$ is returned, if

\[ \frac{⟨X_{k+1}, \mathcal T_{p_{k+1}←p_k}X_k⟩}{\lVert X_k \rVert_{p_k}} > ε,\]

where $ε$ is the threshold, which is set to 0.2 by default, see [Pow77].

Input

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for ConjugateGradientBealeRestartRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.
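
As a sketch of the decorator pattern, an existing coefficient rule can be wrapped before it is passed to the solver; the coefficient= keyword and the reuse of M, f, grad_f, and p0 from the first example are again assumptions for illustration.

# wrap a Polak-Ribière coefficient in a Beale restart check
dir = ConjugateGradientBealeRestart(PolakRibiereCoefficient())
p_star = conjugate_gradient_descent(M, f, grad_f, p0; coefficient = dir)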

source
Manopt.DaiYuanCoefficientFunction
DaiYuanCoefficient(; kwargs...)
DaiYuanCoefficient(M::AbstractManifold; kwargs...)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [DY99] adapted to Riemannian manifolds.

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Let $ν_k = X_{k+1} - \mathcal T_{p_{k+1}←p_k}X_k$, where $\mathcal T_{⋅←⋅}$ denotes a vector transport.

Then the coefficient reads

\[β_k = \frac{\mathrm{D}f(p_{k+1})[X_{k+1}]}{⟨δ_k,ν_k⟩_{p_{k+1}}} = \frac{\lVert X_{k+1} \rVert_{p_{k+1}}^2}{⟨\mathcal T_{p_{k+1}←p_k}δ_k, ν_k⟩_{p_{k+1}}}\]

The second form is the one usually stated, while the first avoids using the inner product of the metric. The first form is implemented here, but falls back to calling inner if no dedicated differential is available.

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for DaiYuanCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.FletcherReevesCoefficientFunction
FletcherReevesCoefficient()
FletcherReevesCoefficient(M::AbstractManifold)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [FR64] adapted to manifolds

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Then the coefficient reads

\[β_k = \frac{\mathrm{D}f(p_{k+1})[X_{k+1}]}{\mathrm{D}f(p_k)[X_k]} = \frac{\lVert X_{k+1} \rVert_{p_{k+1}}^2}{\lVert X_k \rVert_{p_k}^2}\]

The second form is the one usually stated, while the first avoids using the inner product of the metric. The first form is implemented here, but falls back to calling inner if no dedicated differential is available.

Info

This function generates a ManifoldDefaultsFactory for FletcherReevesCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.HagerZhangCoefficientFunction
HagerZhangCoefficient(; kwargs...)
HagerZhangCoefficient(M::AbstractManifold; kwargs...)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [HZ05] adapted to manifolds

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Let $ν_k = X_{k+1} - \mathcal T_{p_{k+1}←p_k}X_k$, where $\mathcal T_{⋅←⋅}$ denotes a vector transport.

Then the coefficient reads

\[β_k = \Bigl⟨ν_k - \frac{2\lVert ν_k \rVert_{p_{k+1}}^2}{⟨\mathcal T_{p_{k+1}←p_k}δ_k, ν_k⟩_{p_{k+1}}} \mathcal T_{p_{k+1}←p_k}δ_k, \frac{X_{k+1}}{⟨\mathcal T_{p_{k+1}←p_k}δ_k, ν_k⟩_{p_{k+1}}} \Bigr⟩_{p_{k+1}}.\]

This method includes a numerical stability improvement proposed by these authors.

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for HagerZhangCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.HestenesStiefelCoefficientFunction
HestenesStiefelCoefficient(; kwargs...)
HestenesStiefelCoefficient(M::AbstractManifold; kwargs...)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [HS52] adapted to manifolds

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Let $ν_k = X_{k+1} - \mathcal T_{p_{k+1}←p_k}X_k$, where $\mathcal T_{⋅←⋅}$ denotes a vector transport.

Then the coefficient reads

\[\begin{aligned} β_k &= \frac{\mathrm{D}f(p_{k+1})[ν_k]}{\mathrm{D}f(p_{k+1})[\mathcal T_{p_{k+1}←p_k}δ_k] - \mathrm{D}f(p_k)[δ_k]} \\&= \frac{⟨X_{k+1},ν_k⟩_{p_{k+1}}}{⟨\mathcal T_{p_{k+1}←p_k}δ_k,X_{k+1}⟩_{p_{k+1}} - ⟨δ_k,X_k⟩_{p_{k}}} \\&= \frac{⟨X_{k+1},ν_k⟩_{p_{k+1}}}{⟨\mathcal T_{p_{k+1}←p_k}δ_k,ν_k⟩_{p_{k+1}}}. \end{aligned}\]

The third form is the one usually stated, while the first avoids using the inner product of the metric. The first form is implemented here, but falls back to calling inner if no dedicated differential is available.

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for HestenesStiefelCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.LiuStoreyCoefficientFunction
LiuStoreyCoefficient(; kwargs...)
LiuStoreyCoefficient(M::AbstractManifold; kwargs...)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [LS91] adapted to manifolds

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Let $ν_k = X_{k+1} - \mathcal T_{p_{k+1}←p_k}X_k$, where $\mathcal T_{⋅←⋅}$ denotes a vector transport.

Then the coefficient reads

\[β_k = - \frac{\mathrm{D}f(p_{k+1})[ν_k]}{\mathrm{D}f(p_k)[δ_k]} = - \frac{⟨X_{k+1},ν_k⟩_{p_{k+1}}}{⟨δ_k,X_k⟩_{p_k}}.\]

The second form is the one usually stated, while the first avoids using the inner product of the metric. The first form is implemented here, but falls back to calling inner if no dedicated differential is available.

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for LiuStoreyCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.PolakRibiereCoefficientFunction
PolakRibiereCoefficient(; kwargs...)
PolakRibiereCoefficient(M::AbstractManifold; kwargs...)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [PR69] adapted to Riemannian manifolds.

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Let $ν_k = X_{k+1} - \mathcal T_{p_{k+1}←p_k}X_k$, where $\mathcal T_{⋅←⋅}$ denotes a vector transport.

Then the coefficient reads

\[β_k = \frac{\mathrm{D}f(p_{k+1})[ν_k]}{\mathrm{D}f(p_k)[X_k]} = \frac{⟨X_{k+1},ν_k⟩_{p_{k+1}}}{\lVert X_k \rVert_{p_k}^2}.\]

The second form is the one usually stated, while the first avoids using the inner product of the metric. The first form is implemented here, but falls back to calling inner if no dedicated differential is available.

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for PolakRibiereCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.SteepestDescentCoefficientFunction
SteepestDescentCoefficient()
SteepestDescentCoefficient(M::AbstractManifold)

Computes an update coefficient for the conjugate_gradient_descent algorithm so that it falls back to a gradient_descent method, that is

\[β_k = 0\]

Info

This function generates a ManifoldDefaultsFactory for SteepestDescentCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.
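
As a sketch, and reusing M, f, grad_f, and p0 from the first example, this choice makes the conjugate gradient solver behave like a plain gradient descent (up to the respective default step size rules):

p1 = conjugate_gradient_descent(M, f, grad_f, p0; coefficient = SteepestDescentCoefficient())
p2 = gradient_descent(M, f, grad_f, p0)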

source

Internal rules for coefficients

Manopt.ConjugateGradientBealeRestartRuleType
ConjugateGradientBealeRestartRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on a restart idea of [Bea72], following [HZ06, page 12] adapted to manifolds.

Fields

  • direction_update::DirectionUpdateRule: the actual rule, that is restarted
  • threshold::Real: a threshold for the restart check.
  • vector_transport_method::AbstractVectorTransportMethodP: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports

Constructor

ConjugateGradientBealeRestartRule(
    direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory};
    kwargs...
)
ConjugateGradientBealeRestartRule(
    M::AbstractManifold=DefaultManifold(),
    direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory};
    kwargs...
)

Construct the Beale restart coefficient update rule adapted to manifolds.

Input

Keyword arguments

See also

ConjugateGradientBealeRestart, conjugate_gradient_descent

source
Manopt.DaiYuanCoefficientRuleType
DaiYuanCoefficientRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on [DY99] adapted to manifolds

Fields

Constructor

DaiYuanCoefficientRule(M::AbstractManifold; kwargs...)

Construct the Dai-Yuan coefficient update rule.

Keyword arguments

See also

DaiYuanCoefficient, conjugate_gradient_descent

source
Manopt.HagerZhangCoefficientRuleType
HagerZhangCoefficientRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on [HZ05] adapted to manifolds

Fields

Constructor

HagerZhangCoefficientRule(M::AbstractManifold; kwargs...)

Construct the Hager-Zhang coefficient update rule based on [HZ05] adapted to manifolds.

Keyword arguments

See also

HagerZhangCoefficient, conjugate_gradient_descent

source
Manopt.HestenesStiefelCoefficientRuleType
HestenesStiefelCoefficientRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on [HS52] adapted to manifolds

Fields

Constructor

HestenesStiefelCoefficientRule(M::AbstractManifold; kwargs...)

Construct the Hestenes-Stiefel coefficient update rule based on [HS52] adapted to manifolds.

Keyword arguments

See also

HestenesStiefelCoefficient, conjugate_gradient_descent

source
Manopt.LiuStoreyCoefficientRuleType
LiuStoreyCoefficientRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on [LS91] adapted to manifolds

Fields

Constructor

LiuStoreyCoefficientRule(M::AbstractManifold; kwargs...)

Construct the Liu-Storey coefficient update rule based on [LS91] adapted to manifolds.

Keyword arguments

See also

LiuStoreyCoefficient, conjugate_gradient_descent

source
Manopt.PolakRibiereCoefficientRuleType
PolakRibiereCoefficientRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on [PR69] adapted to manifolds

Fields

Constructor

PolakRibiereCoefficientRule(M::AbstractManifold; kwargs...)

Construct the Polak-Ribière coefficient update rule.

Keyword arguments

See also

PolakRibiereCoefficient, conjugate_gradient_descent

source

Technical details

The conjugate_gradient_descent solver requires the following functions of a manifold to be available

  • A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
  • A vector_transport_to!(M, Y, p, X, q); it is recommended to set the default_vector_transport_method to a favourite vector transport. If this default is set, a vector_transport_method= does not have to be specified (see the sketch after this list).
  • By default this solver uses ArmijoLinesearch, which requires max_stepsize(M) to be set and an implementation of inner(M, p, X, Y).
  • By default the stopping criterion also uses the norm, to stop when the norm of the gradient is small; if you implemented inner, the norm is already provided.
  • By default the tangent vector storing the gradient is initialized calling zero_vector(M,p).
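
As a sketch of passing both methods explicitly instead of relying on the manifold defaults, reusing M, f, and grad_f from the first example; ProjectionRetraction and ProjectionTransport are ManifoldsBase types and are assumed to be applicable to the chosen manifold.

p_star = conjugate_gradient_descent(M, f, grad_f, rand(M);
    retraction_method = ProjectionRetraction(),
    vector_transport_method = ProjectionTransport(),
)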

Literature

[Bea72]
E. M. Beale. A derivation of conjugate gradients. In: Numerical methods for nonlinear optimization, edited by F. A. Lootsma (Academic Press, London, 1972); pp. 39–43.
[DY99]
Y. H. Dai and Y. Yuan. A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM Journal on Optimization 10, 177–182 (1999).
[Fle87]
R. Fletcher. Practical Methods of Optimization. 2 Edition, A Wiley-Interscience Publication (John Wiley & Sons Ltd., 1987).
[FR64]
R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. The Computer Journal 7, 149–154 (1964).
[HZ06]
W. W. Hager and H. Zhang. A survey of nonlinear conjugate gradient methods. Pacific Journal of Optimization 2, 35–58 (2006).
[HZ05]
W. W. Hager and H. Zhang. A New Conjugate Gradient Method with Guaranteed Descent and an Efficient Line Search. SIAM Journal on Optimization 16, 170–192 (2005).
[HS52]
M. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards 49, 409 (1952).
[LS91]
Y. Liu and C. Storey. Efficient generalized conjugate gradient algorithms, part 1: Theory. Journal of Optimization Theory and Applications 69, 129–137 (1991).
[PR69]
E. Polak and G. Ribière. Note sur la convergence de méthodes de directions conjuguées. Revue française d’informatique et de recherche opérationnelle 3, 35–43 (1969).
[Pow77]
M. J. Powell. Restart procedures for the conjugate gradient method. Mathematical Programming 12, 241–254 (1977).