Conjugate Gradient Descent
Manopt.conjugate_gradient_descent — Function

    conjugate_gradient_descent(M, F, gradF, x)

perform a conjugate gradient based descent
\[x_{k+1} = \operatorname{retr}_{x_k} \bigl( s_kδ_k \bigr),\]
where $\operatorname{retr}$ denotes a retraction on the manifold M and one can employ different rules to update the descent direction $δ_k$ based on the last direction $δ_{k-1}$ and both gradients $\operatorname{grad}f(x_k)$ and $\operatorname{grad}f(x_{k-1})$. The Stepsize $s_k$ may be determined by a Linesearch.
Available update rules are SteepestDirectionUpdateRule, which yields a gradient_descent, ConjugateDescentCoefficient (the default), DaiYuanCoefficient, FletcherReevesCoefficient, HagerZhangCoefficient, HeestenesStiefelCoefficient, LiuStoreyCoefficient, and PolakRibiereCoefficient.
They all compute $β_k$ such that this algorithm updates the search direction as
\[\delta_k=\operatorname{grad}f(x_k) + β_k \delta_{k-1}\]
Input
- M: a manifold $\mathcal M$
- F: a cost function $F:\mathcal M→ℝ$ to minimize, implemented as a function (M,p) -> v
- gradF: the gradient $\operatorname{grad}F:\mathcal M → T\mathcal M$ of $F$, implemented also as (M,x) -> X
- x: an initial value $x∈\mathcal M$
Optional
- coefficient – (ConjugateDescentCoefficient <: DirectionUpdateRule) rule to compute the descent direction update coefficient $β_k$, as a functor, i.e. the resulting function maps (p,o,i) -> β, where p is the current GradientProblem, o are the ConjugateGradientDescentOptions, and i is the current iterate.
- evaluation – (AllocatingEvaluation) specify whether the gradient works by allocation (default), i.e. is of the form gradF(M, x), or MutatingEvaluation in place, i.e. is of the form gradF!(M, X, x).
- retraction_method – (default_retraction_method(M)) a retraction method to use.
- stepsize – (Constant(1.)) a Stepsize function applied to the search direction. The default is a constant step size 1.
- stopping_criterion – (StopWhenAny(StopAfterIteration(200), StopWhenGradientNormLess(10.0^-8))) a function indicating when to stop.
- vector_transport_method – (default_vector_transport_method(M)) vector transport method to transport the old descent direction when computing the new descent direction.
Output
the obtained (approximate) minimizer $x^*$, see get_solver_return for details
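For orientation, here is a minimal usage sketch that computes a Riemannian mean of a few points on a sphere. It assumes Manifolds.jl for Sphere, distance, and log; the data points and variable names are purely illustrative.

```julia
using Manopt, Manifolds

M = Sphere(2)
# three illustrative unit vectors serving as data points on the sphere
data = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# cost and gradient of the Riemannian mean, following the (M,p) -> v
# and (M,x) -> X signatures documented above
F(M, p) = sum(distance(M, p, q)^2 for q in data) / (2 * length(data))
gradF(M, p) = -sum(log(M, p, q) for q in data) / length(data)

x0 = [1.0, 0.0, 0.0]
x_star = conjugate_gradient_descent(M, F, gradF, x0)
```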
Manopt.conjugate_gradient_descent! — Function

    conjugate_gradient_descent!(M, F, gradF, x)

perform a conjugate gradient based descent in place of x, i.e.
\[x_{k+1} = \operatorname{retr}_{x_k} \bigl( s_k\delta_k \bigr),\]
where $\operatorname{retr}$ denotes a retraction on the manifold M.
Input
- M: a manifold $\mathcal M$
- F: a cost function $F:\mathcal M→ℝ$ to minimize
- gradF: the gradient $\operatorname{grad}F:\mathcal M→ T\mathcal M$ of F
- x: an initial value $x∈\mathcal M$
For more details and options, especially the DirectionUpdateRules, see conjugate_gradient_descent.
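As a hedged sketch of the in-place call, reusing M, F, gradF, and x0 from the usage sketch above, the initial value is simply overwritten with the result:

```julia
x = copy(x0)              # keep x0, work on a copy
conjugate_gradient_descent!(M, F, gradF, x)
# x now holds the (approximate) minimizer
```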
Options
Manopt.ConjugateGradientDescentOptions — Type

    ConjugateGradientDescentOptions <: AbstractGradientOptions

specify options for a conjugate gradient descent algorithm that solves a GradientProblem.
Fields
- x – the current iterate, a point on a manifold
- gradient – the current gradient, also denoted as $ξ$ or $ξ_k$ for the gradient in the $k$th step
- δ – the current descent direction, i.e. also a tangent vector
- β – the current update coefficient
- coefficient – a DirectionUpdateRule function to determine the new β
- stepsize – a Stepsize function
- stop – a StoppingCriterion
- retraction_method – (default_retraction_method(M)) a type of retraction
See also
conjugate_gradient_descent, GradientProblem, ArmijoLinesearch
Available Coefficients
The update rules act as DirectionUpdateRule, which internally always first evaluate the gradient itself.
Manopt.ConjugateDescentCoefficient — Type

    ConjugateDescentCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentOptions o include the last iterate and gradient $x_k, ξ_k$, the current iterate and gradient $x_{k+1}, ξ_{k+1}$, and the last update direction $\delta=\delta_k$, based on [Fletcher1987] adapted to manifolds:
\[β_k = \frac{ \lVert ξ_{k+1} \rVert_{x_{k+1}}^2 } {\langle -\delta_k,ξ_k \rangle_{x_k}}.\]
See also conjugate_gradient_descent
Constructor
    ConjugateDescentCoefficient(a::StoreOptionsAction=())

Construct the conjugate descent coefficient update rule, a new storage is created by default.
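A hypothetical usage sketch, reusing M, F, gradF, and x0 from the sketch above: the rule is handed to the solver through the coefficient keyword (shown explicitly here, although it is already the default).

```julia
x_cd = conjugate_gradient_descent(
    M, F, gradF, x0;
    coefficient=ConjugateDescentCoefficient(),
)
```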
Manopt.DaiYuanCoefficient — Type

    DaiYuanCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentOptions o include the last iterate and gradient $x_k, ξ_k$, the current iterate and gradient $x_{k+1}, ξ_{k+1}$, and the last update direction $\delta=\delta_k$, based on [DaiYuan1999] adapted to manifolds:
Let $\nu_k = ξ_{k+1} - P_{x_{k+1}\gets x_k}ξ_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.
Then the coefficient reads
\[β_k = \frac{ \lVert ξ_{k+1} \rVert_{x_{k+1}}^2 } {\langle P_{x_{k+1}\gets x_k}\delta_k, \nu_k \rangle_{x_{k+1}}}.\]
See also conjugate_gradient_descent
Constructor
    DaiYuanCoefficient(
        t::AbstractVectorTransportMethod=ParallelTransport(),
        a::StoreOptionsAction=(),
    )

Construct the Dai Yuan coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.
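As an illustration of selecting a transport-based rule, the following sketch (again reusing M, F, gradF, and x0 from above) passes the Dai Yuan rule with an explicit vector transport; ParallelTransport is assumed to be available from Manifolds.jl.

```julia
x_dy = conjugate_gradient_descent(
    M, F, gradF, x0;
    coefficient=DaiYuanCoefficient(ParallelTransport()),
    vector_transport_method=ParallelTransport(),
)
```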
Manopt.FletcherReevesCoefficient — Type

    FletcherReevesCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentOptions o include the last iterate and gradient $x_k, ξ_k$, the current iterate and gradient $x_{k+1}, ξ_{k+1}$, and the last update direction $\delta=\delta_k$, based on [FletcherReeves1964] adapted to manifolds:
\[β_k = \frac{\lVert ξ_{k+1}\rVert_{x_{k+1}}^2}{\lVert ξ_{k}\rVert_{x_{k}}^2}.\]
See also conjugate_gradient_descent
Constructor
    FletcherReevesCoefficient(a::StoreOptionsAction=())

Construct the Fletcher Reeves coefficient update rule, a new storage is created by default.
Manopt.HagerZhangCoefficient — Type

    HagerZhangCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentOptions o include the last iterate and gradient $x_k, ξ_k$, the current iterate and gradient $x_{k+1}, ξ_{k+1}$, and the last update direction $\delta=\delta_k$, based on [HagerZhang2005] adapted to manifolds: let $\nu_k = ξ_{k+1} - P_{x_{k+1}\gets x_k}ξ_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.
\[β_k = \Bigl\langle\nu_k - \frac{ 2\lVert \nu_k\rVert_{x_{k+1}}^2 }{ \langle P_{x_{k+1}\gets x_k}\delta_k, \nu_k \rangle_{x_{k+1}} } P_{x_{k+1}\gets x_k}\delta_k, \frac{ξ_{k+1}}{ \langle P_{x_{k+1}\gets x_k}\delta_k, \nu_k \rangle_{x_{k+1}} } \Bigr\rangle_{x_{k+1}}.\]
This method includes a numerical stability safeguard proposed by those authors.
See also conjugate_gradient_descent
Constructor
    HagerZhangCoefficient(
        t::AbstractVectorTransportMethod=ParallelTransport(),
        a::StoreOptionsAction=(),
    )

Construct the Hager Zhang coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.
Manopt.HeestenesStiefelCoefficient — Type

    HeestenesStiefelCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentOptions o include the last iterate and gradient $x_k, ξ_k$, the current iterate and gradient $x_{k+1}, ξ_{k+1}$, and the last update direction $\delta=\delta_k$, based on [HestenesStiefel1952] adapted to manifolds as follows:
Let $\nu_k = ξ_{k+1} - P_{x_{k+1}\gets x_k}ξ_k$. Then the update reads
\[β_k = \frac{\langle ξ_{k+1}, \nu_k \rangle_{x_{k+1}} } { \langle P_{x_{k+1}\gets x_k} \delta_k, \nu_k\rangle_{x_{k+1}} },\]
where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.
Constructor
    HeestenesStiefelCoefficient(
        t::AbstractVectorTransportMethod=ParallelTransport(),
        a::StoreOptionsAction=()
    )

Construct the Hestenes Stiefel coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.
See also conjugate_gradient_descent
Manopt.LiuStoreyCoefficient — Type

    LiuStoreyCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentOptions o include the last iterate and gradient $x_k, ξ_k$, the current iterate and gradient $x_{k+1}, ξ_{k+1}$, and the last update direction $\delta=\delta_k$, based on [LiuStorey1991] adapted to manifolds:
Let $\nu_k = ξ_{k+1} - P_{x_{k+1}\gets x_k}ξ_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.
Then the coefficient reads
\[β_k = - \frac{ \langle ξ_{k+1},\nu_k \rangle_{x_{k+1}} } {\langle \delta_k,ξ_k \rangle_{x_k}}.\]
See also conjugate_gradient_descent
Constructor
    LiuStoreyCoefficient(
        t::AbstractVectorTransportMethod=ParallelTransport(),
        a::StoreOptionsAction=()
    )

Construct the Liu Storey coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.
Manopt.PolakRibiereCoefficient — Type

    PolakRibiereCoefficient <: DirectionUpdateRule

Computes an update coefficient for the conjugate gradient method, where the ConjugateGradientDescentOptions o include the last iterate and gradient $x_k, ξ_k$, the current iterate and gradient $x_{k+1}, ξ_{k+1}$, and the last update direction $\delta=\delta_k$, based on [PolakRibiere1969] and [Polyak1969] adapted to manifolds:
Let $\nu_k = ξ_{k+1} - P_{x_{k+1}\gets x_k}ξ_k$, where $P_{a\gets b}(⋅)$ denotes a vector transport from the tangent space at $b$ to the tangent space at $a$.
Then the update reads
\[β_k = \frac{ \langle ξ_{k+1}, \nu_k \rangle_{x_{k+1}} } {\lVert ξ_k \rVert_{x_k}^2 }.\]
Constructor
    PolakRibiereCoefficient(
        t::AbstractVectorTransportMethod=ParallelTransport(),
        a::StoreOptionsAction=()
    )

Construct the PolakRibiere coefficient update rule, where the parallel transport is the default vector transport and a new storage is created by default.
See also conjugate_gradient_descent
Manopt.SteepestDirectionUpdateRule — Type

    SteepestDirectionUpdateRule <: DirectionUpdateRule

The simplest rule is to let the last direction have no influence and hence return an update $β = 0$ for all ConjugateGradientDescentOptions o.
See also conjugate_gradient_descent
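A short sketch, assuming the zero-argument constructor and reusing M, F, gradF, and x0 from the usage sketch above: with this rule the conjugate gradient solver effectively performs a plain gradient descent.

```julia
x_sd = conjugate_gradient_descent(
    M, F, gradF, x0;
    coefficient=SteepestDirectionUpdateRule(),
)
```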
Literature
- Fletcher1987
R. Fletcher, Practical Methods of Optimization, Vol. 1: Unconstrained Optimization, John Wiley & Sons, New York, 1987. doi: 10.1137/1024028
- DaiYuan1999
Y. H. Dai and Y. Yuan, A nonlinear conjugate gradient method with a strong global convergence property, SIAM J. Optim., 10 (1999), pp. 177–182. doi: 10.1137/S1052623497318992
- FletcherReeves1964
R. Fletcher and C. Reeves, Function minimization by conjugate gradients, Comput. J., 7 (1964), pp. 149–154. doi: 10.1093/comjnl/7.2.149
- HagerZhang2005
W. W. Hager and H. Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM J. Optim., 16 (2005), pp. 170–192. doi: 10.1137/030601880
- HestenesStiefel1952
M.R. Hestenes, E.L. Stiefel, Methods of conjugate gradients for solving linear systems, J. Research Nat. Bur. Standards, 49 (1952), pp. 409–436. doi: 10.6028/jres.049.044
- LiuStorey1991
Y. Liu and C. Storey, Efficient generalized conjugate gradient algorithms, Part 1: Theory, J. Optim. Theory Appl., 69 (1991), pp. 129–137. doi: 10.1007/BF00940464
- PolakRibiere1969
E. Polak and G. Ribière, Note sur la convergence de méthodes de directions conjuguées, ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, Tome 3 (1969), no. R1, pp. 35–43. url: http://www.numdam.org/item/?id=M2AN1969__31350
- Polyak1969
B. T. Polyak, The conjugate gradient method in extreme problems, USSR Comp. Math. Math. Phys., 9 (1969), pp. 94–112. doi: 10.1016/0041-5553(69)90035-4