Conjugate Gradient Descent
Manopt.conjugate_gradient_descent — Function

conjugate_gradient_descent(M, f, grad_f, p=rand(M))
conjugate_gradient_descent(M, gradient_objective, p)
perform a conjugate gradient based descent
\[p_{k+1} = \operatorname{retr}_{p_k} \bigl( s_kδ_k \bigr),\]
where $\operatorname{retr}$ denotes a retraction on the manifold M and one can employ different rules to update the descent direction $δ_k$ based on the last direction $δ_{k-1}$ and both gradients $\operatorname{grad}f(p_k)$ and $\operatorname{grad}f(p_{k-1})$. The stepsize $s_k$ may be determined by a Linesearch.
Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.
Available update rules are SteepestDirectionUpdateRule, which yields a gradient_descent, ConjugateDescentCoefficient (the default), DaiYuanCoefficient, FletcherReevesCoefficient, HagerZhangCoefficient, HestenesStiefelCoefficient, LiuStoreyCoefficient, and PolakRibiereCoefficient. These can all be combined with a ConjugateGradientBealeRestart rule.
They all compute $β_k$ such that this algorithm updates the search direction as
\[\delta_k=\operatorname{grad}f(p_k) + β_k \delta_{k-1}\]
Input
- M: a manifold $\mathcal M$
- f: a cost function $f:\mathcal M→ℝ$ to minimize, implemented as a function (M, p) -> v
- grad_f: the gradient $\operatorname{grad}f:\mathcal M → T\mathcal M$ of $f$, implemented also as (M, p) -> X
- p: an initial value $p∈\mathcal M$
Optional
- coefficient: (ConjugateDescentCoefficient <: DirectionUpdateRule) rule to compute the descent direction update coefficient $β_k$, as a functor, i.e. the resulting function maps (amp, cgs, i) -> β, where amp is an AbstractManoptProblem, cgs is the ConjugateGradientDescentState, and i is the current iterate.
- evaluation: (AllocatingEvaluation) specify whether the gradient works by allocation (default), i.e. is of the form grad_f(M, p), or in place (InplaceEvaluation), i.e. is of the form grad_f!(M, X, p).
- retraction_method: (default_retraction_method(M, typeof(p))) a retraction method to use.
- stepsize: (ArmijoLinesearch via default_stepsize) a Stepsize function applied to the search direction.
- stopping_criterion: (StopAfterIteration(200) | StopWhenGradientNormLess(1e-8)) a stopping criterion indicating when to stop.
- vector_transport_method: (default_vector_transport_method(M, typeof(p))) vector transport method to transport the old descent direction when computing the new descent direction.
If you provide the ManifoldGradientObjective directly, evaluation is ignored.
Output
the obtained (approximate) minimizer $p^*$, see get_solver_return for details.
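For instance, one may compute an eigenvector to the smallest eigenvalue of a symmetric matrix by minimizing the Rayleigh quotient on a sphere. The following is a minimal sketch, assuming Manifolds.jl provides the Sphere manifold; the matrix, cost, and gradient are illustrative choices, not part of Manopt itself.

```julia
using Manopt, Manifolds, LinearAlgebra, Random

Random.seed!(42)
n = 5
A = Symmetric(randn(n, n))      # illustrative symmetric matrix
M = Sphere(n - 1)               # unit sphere in ℝ^n (from Manifolds.jl)

# cost: Rayleigh quotient; its minimizer is an eigenvector of the smallest eigenvalue
f(M, p) = p' * A * p
# Riemannian gradient: project the Euclidean gradient 2Ap onto the tangent space at p
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)

p0 = rand(M)
p_star = conjugate_gradient_descent(
    M, f, grad_f, p0;
    coefficient=FletcherReevesCoefficient(),
    stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-8),
)
```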
Manopt.conjugate_gradient_descent! — Function

conjugate_gradient_descent!(M, f, grad_f, p)
conjugate_gradient_descent!(M, gradient_objective, p; kwargs...)
perform a conjugate gradient based descent in place of p, i.e.
\[p_{k+1} = \operatorname{retr}_{p_k} \bigl( s_k\delta_k \bigr),\]
where $\operatorname{retr}$ denotes a retraction on the manifold M.
Input
- M: a manifold $\mathcal M$
- f: a cost function $f:\mathcal M→ℝ$ to minimize
- grad_f: the gradient $\operatorname{grad}f:\mathcal M→ T\mathcal M$ of $f$
- p: an initial value $p∈\mathcal M$
Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.
For more details and options, especially the DirectionUpdateRules, see conjugate_gradient_descent.
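A minimal sketch of the in-place call, with an illustrative Rayleigh quotient problem on a sphere (the setup assumes Manifolds.jl and is not part of the solver itself):

```julia
using Manopt, Manifolds, LinearAlgebra

A = Symmetric([2.0 0.0 0.0; 0.0 1.0 0.0; 0.0 0.0 3.0])   # illustrative matrix
M = Sphere(2)
f(M, p) = p' * A * p                                      # Rayleigh quotient
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)          # projected Euclidean gradient

p = normalize([1.0, 1.0, 1.0])   # starting point, overwritten with the result
conjugate_gradient_descent!(M, f, grad_f, p)
# p now holds the (approximate) minimizer, here ±e₂, the eigenvector of the smallest eigenvalue
```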
State
Manopt.ConjugateGradientDescentState — Type

ConjugateGradientDescentState <: AbstractGradientSolverState

specify options for a conjugate gradient descent algorithm that solves a DefaultManoptProblem.
Fields
- p – the current iterate, a point on a manifold
- X – the current gradient, also denoted as $ξ$ or $X_k$ for the gradient in the $k$th step
- δ – the current descent direction, i.e. also a tangent vector
- β – the current update coefficient, see coefficient
- coefficient – (ConjugateDescentCoefficient()) a DirectionUpdateRule function to determine the new β
- stepsize – (default_stepsize(M, ConjugateGradientDescentState; retraction_method=retraction_method)) a Stepsize function
- stop – (StopAfterIteration(500) | StopWhenGradientNormLess(1e-8)) a StoppingCriterion
- retraction_method – (default_retraction_method(M, typeof(p))) a type of retraction
- vector_transport_method – (default_vector_transport_method(M, typeof(p))) a type of vector transport
Constructor

ConjugateGradientDescentState(M, p)

where the last five fields above can be set by their names as keywords. The initial gradient X can be set via the keyword initial_gradient, which defaults to zero_vector(M,p), and δ is initialized to a copy of this vector.
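A small sketch of constructing such a state directly; the keyword shown follows the field name listed above, and the manifold and point are illustrative:

```julia
using Manopt, Manifolds

M = Sphere(2)          # illustrative manifold
p = [1.0, 0.0, 0.0]    # a point on M

# construct a CG state at p; the coefficient keyword matches the field of the same name
cgs = ConjugateGradientDescentState(M, p; coefficient=FletcherReevesCoefficient())
```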
See also conjugate_gradient_descent, DefaultManoptProblem, ArmijoLinesearch
Available Coefficients
The update rules act as DirectionUpdateRules, which internally always first evaluate the gradient itself.
Manopt.ConjugateGradientBealeRestart — Type

ConjugateGradientBealeRestart <: DirectionUpdateRule

An update rule might require a restart, that is using the pure gradient as descent direction, if the last two gradients are no longer nearly orthogonal, cf. Hager, Zhang, Pacific J Optim, 2006 [HZ06], page 12 (page 46 in journal page numbers). This method is named after E. Beale from his proceedings paper in 1972 [Bea72]. It acts as a decorator to any existing DirectionUpdateRule direction_update.
Denote by $p_k, X_k$ the last iterate and gradient and by $p_{k+1}, X_{k+1}$ the current ones, as obtained from the ConjugateGradientDescentState cgs. Then a restart is performed, i.e. $β_k = 0$ is returned, if
\[ \frac{ ⟨X_{k+1}, P_{p_{k+1}\gets p_k}X_k⟩}{\lVert X_k \rVert_{p_k}} > ξ,\]
where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$, and $ξ$ is the threshold. The default threshold is chosen as 0.2 as recommended in Powell, Math. Prog., 1977 [Pow77].
Constructor
ConjugateGradientBealeRestart(
direction_update::D,
threshold=0.2;
manifold::AbstractManifold = DefaultManifold(),
vector_transport_method::V=default_vector_transport_method(manifold),
)
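For example, an existing rule can be wrapped to obtain the restart behaviour; in this sketch the manifold is an illustrative choice and the result is then passed to the solver via its coefficient= keyword:

```julia
using Manopt, Manifolds

M = Sphere(2)   # illustrative manifold

# wrap a Polak-Ribière rule so that β_k is reset to 0 whenever the restart condition above holds
restarted = ConjugateGradientBealeRestart(
    PolakRibiereCoefficient(M), 0.2;
    manifold=M,
)
# e.g. conjugate_gradient_descent(M, f, grad_f, p0; coefficient=restarted)
```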
Manopt.ConjugateDescentCoefficient — Type

ConjugateDescentCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k, X_k$, the current iterate and gradient $p_{k+1}, X_{k+1}$, and the last update direction $\delta=\delta_k$, based on Fletcher, 1987 [Fle87] adapted to manifolds:
\[β_k = \frac{ \lVert X_{k+1} \rVert_{p_{k+1}}^2 } {\langle -\delta_k,X_k \rangle_{p_k}}.\]
See also conjugate_gradient_descent
Constructor
ConjugateDescentCoefficient(a::StoreStateAction=())
Construct the conjugate descent coefficient update rule, a new storage is created by default.
Manopt.DaiYuanCoefficient — Type

DaiYuanCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k, X_k$, the current iterate and gradient $p_{k+1}, X_{k+1}$, and the last update direction $\delta=\delta_k$, based on Dai, Yuan, SIAM J Optim, 1999 [DY99] adapted to manifolds:

Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.
Then the coefficient reads
\[β_k = \frac{ \lVert X_{k+1} \rVert_{p_{k+1}}^2 } {\langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}}}.\]
See also conjugate_gradient_descent
Constructor
function DaiYuanCoefficient(
M::AbstractManifold=DefaultManifold(2);
t::AbstractVectorTransportMethod=default_vector_transport_method(M)
)
Construct the Dai Yuan coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.
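A short sketch of selecting this rule with an explicitly chosen vector transport; the manifold is illustrative, and ParallelTransport is the transport type from ManifoldsBase.jl:

```julia
using Manopt, Manifolds

M = Sphere(2)   # illustrative manifold

# Dai-Yuan rule using an explicit parallel transport instead of the manifold's default
dy = DaiYuanCoefficient(M; t=ParallelTransport())
# pass it to the solver via the coefficient= keyword, e.g.
# conjugate_gradient_descent(M, f, grad_f, p0; coefficient=dy)
```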
Manopt.FletcherReevesCoefficient — Type

FletcherReevesCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k, X_k$, the current iterate and gradient $p_{k+1}, X_{k+1}$, and the last update direction $\delta=\delta_k$, based on Fletcher, Reeves, Comput. J, 1964 [FR64] adapted to manifolds:

\[β_k = \frac{\lVert X_{k+1}\rVert_{p_{k+1}}^2}{\lVert X_k\rVert_{p_{k}}^2}.\]
See also conjugate_gradient_descent
Constructor
FletcherReevesCoefficient(a::StoreStateAction=())
Construct the Fletcher Reeves coefficient update rule, a new storage is created by default.
Manopt.HagerZhangCoefficient — Type

HagerZhangCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k, X_k$, the current iterate and gradient $p_{k+1}, X_{k+1}$, and the last update direction $\delta=\delta_k$, based on Hager, Zhang, SIAM J Optim, 2005 [HZ05] adapted to manifolds: let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.
\[β_k = \Bigl\langle\nu_k - \frac{ 2\lVert \nu_k\rVert_{p_{k+1}}^2 }{ \langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}} } P_{p_{k+1}\gets p_k}\delta_k, \frac{X_{k+1}}{ \langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}} } \Bigr\rangle_{p_{k+1}}.\]
This method includes a numerical stability safeguard proposed by those authors.
See also conjugate_gradient_descent
Constructor
function HagerZhangCoefficient(t::AbstractVectorTransportMethod)
function HagerZhangCoefficient(M::AbstractManifold = DefaultManifold(2))
Construct the Hager Zhang coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.
Manopt.HestenesStiefelCoefficient — Type

HestenesStiefelCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k, X_k$, the current iterate and gradient $p_{k+1}, X_{k+1}$, and the last update direction $\delta=\delta_k$, based on Hestenes, Stiefel, J. Research Nat. Bur. Standards, 1952 [HS52] adapted to manifolds as follows:
Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$. Then the update reads
\[β_k = \frac{\langle X_{k+1}, \nu_k \rangle_{p_{k+1}} } { \langle P_{p_{k+1}\gets p_k} \delta_k, \nu_k\rangle_{p_{k+1}} },\]
where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.
Constructor
function HestenesStiefelCoefficient(transport_method::AbstractVectorTransportMethod)
function HestenesStiefelCoefficient(M::AbstractManifold = DefaultManifold(2))
Construct the Hestenes-Stiefel coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.
See also conjugate_gradient_descent
Manopt.LiuStoreyCoefficient — Type

LiuStoreyCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k, X_k$, the current iterate and gradient $p_{k+1}, X_{k+1}$, and the last update direction $\delta=\delta_k$, based on Liu, Storey, J. Optim. Theory Appl., 1991 [LS91] adapted to manifolds:

Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.
Then the coefficient reads
\[β_k = - \frac{ \langle X_{k+1},\nu_k \rangle_{p_{k+1}} } {\langle \delta_k,X_k \rangle_{p_k}}.\]
See also conjugate_gradient_descent
Constructor
function LiuStoreyCoefficient(t::AbstractVectorTransportMethod)
function LiuStoreyCoefficient(M::AbstractManifold = DefaultManifold(2))
Construct the Liu-Storey coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.
Manopt.PolakRibiereCoefficient — Type

PolakRibiereCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentState cgds stores the last iterate and gradient $p_k, X_k$, the current iterate and gradient $p_{k+1}, X_{k+1}$, and the last update direction $\delta=\delta_k$, based on Polak, Ribière [PR69] and Polyak [Pol69] adapted to manifolds:

Let $\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.
Then the update reads
\[β_k = \frac{ \langle X_{k+1}, \nu_k \rangle_{p_{k+1}} } {\lVert X_k \rVert_{p_k}^2 }.\]
Constructor
function PolakRibiereCoefficient(
M::AbstractManifold=DefaultManifold(2);
t::AbstractVectorTransportMethod=default_vector_transport_method(M)
)
Construct the Polak-Ribière coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.
See also conjugate_gradient_descent
Manopt.SteepestDirectionUpdateRule — Type

SteepestDirectionUpdateRule <: DirectionUpdateRule

The simplest rule to update is to have no influence of the last direction and hence return an update of $β = 0$ for all ConjugateGradientDescentState cgds.
See also conjugate_gradient_descent
Literature
- [Bea72]
-
E. M. Beale. A derivation of conjugate gradients. In: Numerical methods for nonlinear optimization, 39–43, Academic Press, London (1972).
- [DY99]
-
Y. H. Dai and Y. Yuan. A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM Journal on Optimization 10, 177–182 (1999).
- [Fle87]
-
R. Fletcher. Practical Methods of Optimization. John Wiley & Sons Ltd. (1987).
- [FR64]
-
R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. The Computer Journal 7, 149–154 (1964).
- [HZ06]
-
W. W. Hager and H. Zhang. A survey of nonlinear conjugate gradient methods. Pacific Journal of Optimization 2, 35–58 (2006).
- [HZ05]
-
W. W. Hager and H. Zhang. A New Conjugate Gradient Method with Guaranteed Descent and an Efficient Line Search. SIAM Journal on Optimization 16, 170–192 (2005).
- [HS52]
-
M. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards 49, 409 (1952).
- [LS91]
-
Y. Liu and C. Storey. Efficient generalized conjugate gradient algorithms, part 1: Theory. Journal of Optimization Theory and Applications 69, 129–137 (1991).
- [PR69]
-
E. Polak and G. Ribière. Note sur la convergence de méthodes de directions conjuguées. Revue française d’informatique et de recherche opérationnelle 3, 35–43 (1969).
- [Pol69]
-
B. T. Polyak. The conjugate gradient method in extremal problems. USSR Computational Mathematics and Mathematical Physics 9, 94–112 (1969).
- [Pow77]
-
M. J. Powell. Restart procedures for the conjugate gradient method. Mathematical Programming 12, 241–254 (1977).