Conjugate gradient descent

Manopt.conjugate_gradient_descent – Function
conjugate_gradient_descent(M, f, grad_f, p=rand(M))
conjugate_gradient_descent!(M, f, grad_f, p)
conjugate_gradient_descent(M, gradient_objective, p)
conjugate_gradient_descent!(M, gradient_objective, p; kwargs...)

perform a conjugate gradient based descent

\[p_{k+1} = \operatorname{retr}_{p_k} \bigl( s_kδ_k \bigr),\]

where $\operatorname{retr}$ denotes a retraction on the manifold M and one can employ different rules to update the descent direction $δ_k$ based on the last direction $δ_{k-1}$ and both gradients $\operatorname{grad}f(p_k)$ and $\operatorname{grad}f(p_{k-1})$. The step size $s_k$ may be determined by a Linesearch.

Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.

Available update rules are SteepestDescentCoefficientRule, which yields a gradient_descent, ConjugateDescentCoefficient (the default), DaiYuanCoefficientRule, FletcherReevesCoefficient, HagerZhangCoefficient, HestenesStiefelCoefficient, LiuStoreyCoefficient, and PolakRibiereCoefficient. These can all be combined with the ConjugateGradientBealeRestartRule.

They all compute $β_k$ such that this algorithm updates the search direction as

\[δ_k = \operatorname{grad}f(p_k) + β_k δ_{k-1}\]

Input

  • M::AbstractManifold: a Riemannian manifold $\mathcal M$
  • f: a cost function $f: \mathcal M→ ℝ$ implemented as (M, p) -> v
  • grad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function (M, p) -> X or a function (M, X, p) -> X computing X in-place
  • p: a point on the manifold $\mathcal M$

Keyword arguments

If you provide the ManifoldGradientObjective directly, the evaluation= keyword is ignored. The decorations are still applied to the objective.

Output

The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
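
A minimal usage sketch, assuming Manifolds.jl provides the Sphere and that the coefficient= keyword selects the update rule; the Rayleigh quotient cost is purely illustrative.

using Manopt, Manifolds, LinearAlgebra

# Minimize the Rayleigh quotient f(p) = p'Ap over the unit sphere in ℝ³;
# a minimizer is a unit eigenvector for the smallest eigenvalue of A.
A = Symmetric([2.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 4.0])
M = Sphere(2)
f(M, p) = p' * A * p
# Riemannian gradient: project the Euclidean gradient 2Ap onto the tangent space at p
grad_f(M, p) = project(M, p, 2 * A * p)
p0 = [1.0, 0.0, 0.0]
p_star = conjugate_gradient_descent(
    M, f, grad_f, p0;
    coefficient=FletcherReevesCoefficient(),  # the default would be ConjugateDescentCoefficient()
)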

source
Manopt.conjugate_gradient_descent! – Function
conjugate_gradient_descent(M, f, grad_f, p=rand(M))
conjugate_gradient_descent!(M, f, grad_f, p)
conjugate_gradient_descent(M, gradient_objective, p)
conjugate_gradient_descent!(M, gradient_objective, p; kwargs...)

perform a conjugate gradient based descent

\[p_{k+1} = \operatorname{retr}_{p_k} \bigl( s_kδ_k \bigr),\]

where $\operatorname{retr}$ denotes a retraction on the manifold M and one can employ different rules to update the descent direction $δ_k$ based on the last direction $δ_{k-1}$ and both gradients $\operatorname{grad}f(p_k)$ and $\operatorname{grad}f(p_{k-1})$. The step size $s_k$ may be determined by a Linesearch.

Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.

Available update rules are SteepestDescentCoefficientRule, which yields a gradient_descent, ConjugateDescentCoefficient (the default), DaiYuanCoefficientRule, FletcherReevesCoefficient, HagerZhangCoefficient, HestenesStiefelCoefficient, LiuStoreyCoefficient, and PolakRibiereCoefficient. These can all be combined with the ConjugateGradientBealeRestartRule.

They all compute $β_k$ such that this algorithm updates the search direction as

\[δ_k = \operatorname{grad}f(p_k) + β_k δ_{k-1}\]

Input

  • M::AbstractManifold: a Riemannian manifold $\mathcal M$
  • f: a cost function $f: \mathcal M→ ℝ$ implemented as (M, p) -> v
  • grad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function (M, p) -> X or a function (M, X, p) -> X computing X in-place
  • p: a point on the manifold $\mathcal M$

Keyword arguments

If you provide the ManifoldGradientObjective directly, the evaluation= keyword is ignored. The decorations are still applied to the objective.

Output

The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
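
A sketch of the in-place variant, assuming M, f, and grad_f as in the example further above; the start point is overwritten with the result.

p = rand(M)
conjugate_gradient_descent!(M, f, grad_f, p)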

source

State

Manopt.ConjugateGradientDescentState – Type
ConjugateGradientDescentState <: AbstractGradientSolverState

specify options for a conjugate gradient descent algorithm that solves a DefaultManoptProblem.

Fields

  • p::P: a point on the manifold $\mathcal M$ storing the current iterate
  • X::T: a tangent vector at the point $p$ on the manifold $\mathcal M$
  • δ: the current descent direction, also a tangent vector
  • β: the current update coefficient, as computed by coefficient
  • coefficient: function to determine the new β
  • stepsize::Stepsize: a functor inheriting from Stepsize to determine a step size
  • stop::StoppingCriterion: a functor indicating that the stopping criterion is fulfilled
  • retraction_method::AbstractRetractionMethod: a retraction $\operatorname{retr}$ to use, see the section on retractions
  • vector_transport_method::AbstractVectorTransportMethod: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports

Constructor

ConjugateGradientDescentState(M::AbstractManifold; kwargs...)

where the last five fields can be set by their names as keyword arguments. The initial gradient X can be set using the keyword initial_gradient, which defaults to zero_vector(M, p); δ is initialized to a copy of this vector.

Keyword arguments

The fields listed above can also be set as keyword arguments of the same name.
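
A sketch of obtaining and inspecting this state from the high-level interface, using the return_state= keyword mentioned in the solver docstring; get_solver_result and get_gradient are assumed to be the corresponding accessors.

# assumes M, f, grad_f, p0 as in the solver example above
st = conjugate_gradient_descent(M, f, grad_f, p0; return_state=true)
q = get_solver_result(st)  # the resulting point
X = get_gradient(st)       # the gradient stored in the state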

See also

conjugate_gradient_descent, DefaultManoptProblem, ArmijoLinesearch

source

Available coefficients

The update rules act as a DirectionUpdateRule, which internally always evaluates the gradient itself first.

Manopt.ConjugateDescentCoefficient – Function
ConjugateDescentCoefficient()
ConjugateDescentCoefficient(M::AbstractManifold)

Compute the (classical) conjugate gradient coefficient based on [Fle87] adapted to manifolds.

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Then the coefficient reads

\[β_k = \frac{\lVert X_{k+1} \rVert_{p_{k+1}}^2}{⟨-δ_k,X_k⟩_{p_k}}\]

Info

This function generates a ManifoldDefaultsFactory for ConjugateDescentCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.ConjugateGradientBealeRestart – Function
ConjugateGradientBealeRestart(direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory}; kwargs...)
ConjugateGradientBealeRestart(M::AbstractManifold, direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory}; kwargs...)

Compute a conjugate gradient coefficient with a potential restart when two directions are nearly orthogonal; see [HZ06], page 12 of the preprint (page 46 in the journal's page numbering). This method is named after E. Beale, based on his 1972 proceedings paper [Bea72]. It acts as a decorator to any existing DirectionUpdateRule direction_update.

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Then a restart is performed, that is $β_k = 0$ is returned, if

\[ \frac{⟨X_{k+1}, \mathcal T_{p_{k+1}←p_k}X_k⟩}{\lVert X_k \rVert_{p_k}} > ε,\]

where $ε$ is the threshold, which is set to 0.2 by default, see [Pow77].

Input

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for ConjugateGradientBealeRestartRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.
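
A sketch of decorating an existing coefficient with this restart rule and passing it to the solver; the coefficient= keyword is the one from the solver docstring above, and M, f, grad_f, p0 are as in that example.

coeff = ConjugateGradientBealeRestart(PolakRibiereCoefficient())
# p_min = conjugate_gradient_descent(M, f, grad_f, p0; coefficient=coeff)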

source
Manopt.DaiYuanCoefficient – Function
DaiYuanCoefficient(; kwargs...)
DaiYuanCoefficient(M::AbstractManifold; kwargs...)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [DY99] adapted to Riemannian manifolds.

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Let $ν_k = X_{k+1} - \mathcal T_{p_{k+1}←p_k}X_k$, where $\mathcal T_{⋅←⋅}$ denotes a vector transport.

Then the coefficient reads

\[β_k = \frac{\lVert X_{k+1} \rVert_{p_{k+1}}^2}{⟨\mathcal T_{p_{k+1}←p_k}δ_k, ν_k⟩_{p_{k+1}}}\]

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for DaiYuanCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.FletcherReevesCoefficient – Function
FletcherReevesCoefficient()
FletcherReevesCoefficient(M::AbstractManifold)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [FR64] adapted to manifolds.

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Then the coefficient reads

\[β_k = \frac{\lVert X_{k+1} \rVert_{p_{k+1}}^2}{\lVert X_k \rVert_{p_{k}}^2}.\]
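
A sketch of this coefficient written out with the Riemannian norm from ManifoldsBase.jl, where X_k denotes the gradient at p_k and X_kp1 the gradient at p_kp1.

# Fletcher-Reeves: ratio of squared gradient norms at the new and the old iterate
beta_fr(M, p_k, X_k, p_kp1, X_kp1) = norm(M, p_kp1, X_kp1)^2 / norm(M, p_k, X_k)^2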

Info

This function generates a ManifoldDefaultsFactory for FletcherReevesCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.HagerZhangCoefficient – Function
HagerZhangCoefficient(; kwargs...)
HagerZhangCoefficient(M::AbstractManifold; kwargs...)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [HZ05] adapted to manifolds.

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Let $ν_k = X_{k+1} - \mathcal T_{p_{k+1}←p_k}X_k$, where $\mathcal T_{⋅←⋅}$ denotes a vector transport.

Then the coefficient reads

\[β_k = \Bigl⟨ν_k - \frac{2\lVert ν_k \rVert_{p_{k+1}}^2}{⟨\mathcal T_{p_{k+1}←p_k}δ_k, ν_k⟩_{p_{k+1}}} \mathcal T_{p_{k+1}←p_k}δ_k, \frac{X_{k+1}}{⟨\mathcal T_{p_{k+1}←p_k}δ_k, ν_k⟩_{p_{k+1}}} \Bigr⟩_{p_{k+1}}.\]

This method includes a numerical stability safeguard proposed by those authors.

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for HagerZhangCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.HestenesStiefelCoefficient – Function
HestenesStiefelCoefficient(; kwargs...)
HestenesStiefelCoefficient(M::AbstractManifold; kwargs...)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [HS52] adapted to manifolds.

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Let $ν_k = X_{k+1} - \mathcal T_{p_{k+1}←p_k}X_k$, where $\mathcal T_{⋅←⋅}$ denotes a vector transport.

Then the coefficient reads

\[β_k = \frac{⟨ X_{k+1}, ν_k ⟩_{p_{k+1}}}{⟨ \mathcal T_{p_{k+1}←p_k}δ_k, ν_k⟩_{p_{k+1}}}.\]

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for HestenesStiefelCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.LiuStoreyCoefficient – Function
LiuStoreyCoefficient(; kwargs...)
LiuStoreyCoefficient(M::AbstractManifold; kwargs...)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [LS91] adapted to manifolds.

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Let $ν_k = X_{k+1} - \mathcal T_{p_{k+1}←p_k}X_k$, where $\mathcal T_{⋅←⋅}$ denotes a vector transport.

Then the coefficient reads

\[β_k = - \frac{⟨ X_{k+1},ν_k ⟩_{p_{k+1}}}{⟨ δ_k,X_k ⟩_{p_k}}.\]

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for LiuStoreyCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.PolakRibiereCoefficient – Function
PolakRibiereCoefficient(; kwargs...)
PolakRibiereCoefficient(M::AbstractManifold; kwargs...)

Computes an update coefficient for the conjugate_gradient_descent algorithm based on [PR69] adapted to Riemannian manifolds.

Denote the last iterate and gradient by $p_k,X_k$, the current iterate and gradient by $p_{k+1}, X_{k+1}$, respectively, as well as the last update direction by $δ_k$.

Let $ν_k = X_{k+1} - \mathcal T_{p_{k+1}←p_k}X_k$, where $\mathcal T_{⋅←⋅}$ denotes a vector transport.

Then the coefficient reads

\[β_k = \frac{⟨ X_{k+1}, ν_k ⟩_{p_{k+1}}}{\lVert X_k \rVert_{{p_k}}^2}.\]
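
A sketch of this coefficient with an explicit vector transport, using inner, norm, and vector_transport_to from ManifoldsBase.jl; the transport method is an illustrative choice.

function beta_pr(M, p_k, X_k, p_kp1, X_kp1; vtm=ParallelTransport())
    # ν_k = X_{k+1} - T_{p_{k+1}←p_k} X_k as defined above
    ν = X_kp1 - vector_transport_to(M, p_k, X_k, p_kp1, vtm)
    return inner(M, p_kp1, X_kp1, ν) / norm(M, p_k, X_k)^2
end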

Keyword arguments

Info

This function generates a ManifoldDefaultsFactory for PolakRibiereCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source
Manopt.SteepestDescentCoefficient – Function
SteepestDescentCoefficient()
SteepestDescentCoefficient(M::AbstractManifold)

Computes an update coefficient for the conjugate_gradient_descent algorithm so that it falls back to a gradient_descent method, that is

\[β_k = 0\]

Info

This function generates a ManifoldDefaultsFactory for SteepestDescentCoefficientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold is available, for example from a corresponding AbstractManoptSolverState.

source

Internal rules for coefficients

Manopt.ConjugateGradientBealeRestartRule – Type
ConjugateGradientBealeRestartRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on a restart idea of [Bea72], following [HZ06, page 12] adapted to manifolds.
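
A sketch of a minimal custom rule following the functor signature stated above; that new rules subtype Manopt.DirectionUpdateRule is an assumption about the intended extension point.

# A rule that always returns β_k = 0, reproducing steepest descent
struct AlwaysRestartRule <: Manopt.DirectionUpdateRule end
(::AlwaysRestartRule)(problem, state, k) = 0.0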

Fields

  • direction_update::DirectionUpdateRule: the actual rule, that is restarted
  • threshold::Real: a threshold for the restart check.
  • vector_transport_method::AbstractVectorTransportMethod: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports

Constructor

ConjugateGradientBealeRestartRule(
    direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory};
    kwargs...
)
ConjugateGradientBealeRestartRule(
    M::AbstractManifold=DefaultManifold(),
    direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory};
    kwargs...
)

Construct the Beale restart coefficient update rule adapted to manifolds.

Input

Keyword arguments

See also

ConjugateGradientBealeRestart, conjugate_gradient_descent

source
Manopt.DaiYuanCoefficientRule – Type
DaiYuanCoefficientRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on [DY99] adapted to manifolds

Fields

Constructor

DaiYuanCoefficientRule(M::AbstractManifold; kwargs...)

Construct the Dai-Yuan coefficient update rule.

Keyword arguments

See also

DaiYuanCoefficient, conjugate_gradient_descent

source
Manopt.HagerZhangCoefficientRule – Type
HagerZhangCoefficientRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on [HZ05] adapted to manifolds

Fields

Constructor

HagerZhangCoefficientRule(M::AbstractManifold; kwargs...)

Construct the Hager-Zhang coefficient update rule based on [HZ05] adapted to manifolds.

Keyword arguments

See also

HagerZhangCoefficient, conjugate_gradient_descent

source
Manopt.HestenesStiefelCoefficientRule – Type
HestenesStiefelCoefficientRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on [HS52] adapted to manifolds

Fields

Constructor

HestenesStiefelCoefficientRule(M::AbstractManifold; kwargs...)

Construct the Hestenes-Stiefel coefficient update rule based on [HS52] adapted to manifolds.

Keyword arguments

See also

HestenesStiefelCoefficient, conjugate_gradient_descent

source
Manopt.LiuStoreyCoefficientRule – Type
LiuStoreyCoefficientRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on [LS91] adapted to manifolds

Fields

Constructor

LiuStoreyCoefficientRule(M::AbstractManifold; kwargs...)

Construct the Liu-Storey coefficient update rule based on [LS91] adapted to manifolds.

Keyword arguments

See also

LiuStoreyCoefficient, conjugate_gradient_descent

source
Manopt.PolakRibiereCoefficientRule – Type
PolakRibiereCoefficientRule <: DirectionUpdateRule

A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient based on [PR69] adapted to manifolds

Fields

Constructor

PolakRibiereCoefficientRule(M::AbstractManifold; kwargs...)

Construct the Polak-Ribière coefficient update rule.

Keyword arguments

See also

PolakRibiereCoefficient, conjugate_gradient_descent

source

Technical details

The conjugate_gradient_descent solver requires the following functions of a manifold to be available

  • A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
  • A vector_transport_to!(M, Y, p, X, q); it is recommended to set the default_vector_transport_method to a favourite vector transport. If this default is set, a vector_transport_method= does not have to be specified (see the sketch after this list).
  • By default the solver uses ArmijoLinesearch, which requires max_stepsize(M) to be set and an implementation of inner(M, p, X).
  • By default the stopping criterion also uses the norm, to stop when the norm of the gradient is small; if you have implemented inner, the norm is already provided.
  • By default the tangent vector storing the gradient is initialized calling zero_vector(M,p).
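
A sketch of passing a retraction and a vector transport explicitly instead of relying on the manifold defaults; ExponentialRetraction and ParallelTransport are generic choices from ManifoldsBase.jl, and M, f, grad_f, p0 are as in the solver example above.

p_min = conjugate_gradient_descent(
    M, f, grad_f, p0;
    retraction_method=ExponentialRetraction(),
    vector_transport_method=ParallelTransport(),
)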

Literature

[Bea72]
E. M. Beale. A derivation of conjugate gradients. In: Numerical methods for nonlinear optimization, edited by F. A. Lootsma (Academic Press, London, 1972); pp. 39–43.
[DY99]
Y. H. Dai and Y. Yuan. A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM Journal on Optimization 10, 177–182 (1999).
[Fle87]
R. Fletcher. Practical Methods of Optimization. 2 Edition, A Wiley-Interscience Publication (John Wiley & Sons Ltd., 1987).
[FR64]
R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. The Computer Journal 7, 149–154 (1964).
[HZ06]
W. W. Hager and H. Zhang. A survey of nonlinear conjugate gradient methods. Pacific Journal of Optimization 2, 35–58 (2006).
[HZ05]
W. W. Hager and H. Zhang. A New Conjugate Gradient Method with Guaranteed Descent and an Efficient Line Search. SIAM Journal on Optimization 16, 170–192 (2005).
[HS52]
M. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards 49, 409 (1952).
[LS91]
Y. Liu and C. Storey. Efficient generalized conjugate gradient algorithms, part 1: Theory. Journal of Optimization Theory and Applications 69, 129–137 (1991).
[PR69]
E. Polak and G. Ribière. Note sur la convergence de méthodes de directions conjuguées. Revue française d’informatique et de recherche opérationnelle 3, 35–43 (1969).
[Pow77]
M. J. Powell. Restart procedures for the conjugate gradient method. Mathematical Programming 12, 241–254 (1977).