Gradient Descent
Manopt.gradient_descent — Function

    gradient_descent(M, F, gradF, x)

perform a gradient descent
\[x_{k+1} = \operatorname{retr}_{x_k}\bigl( -s_k\operatorname{grad}f(x_k) \bigr)\]
with different choices of $s_k$ available (see stepsize option below).
Input
- M – a manifold $\mathcal M$
- F – a cost function $F: \mathcal M → ℝ$ to minimize
- gradF – the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of F
- x – an initial value $x ∈ \mathcal M$
Optional
- direction – (IdentityUpdateRule) perform a processing of the direction, e.g. add momentum via MomentumGradient
- evaluation – (AllocatingEvaluation) specify whether the gradient works by allocation (default), i.e. in the form gradF(M, x), or as a MutatingEvaluation in place, i.e. in the form gradF!(M, X, x).
- retraction_method – (default_retraction_method(M)) a retraction(M, x, ξ) to use.
- stepsize – (ConstantStepsize(1.)) specify a Stepsize functor.
- stopping_criterion – (StopWhenAny(StopAfterIteration(200), StopWhenGradientNormLess(10.0^-8))) a functor inheriting from StoppingCriterion indicating when to stop.
... and the ones that are passed to decorate_options for decorators.
Output
the obtained (approximate) minimizer $x^*$, see get_solver_return for details
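As a short usage sketch (assuming Manifolds.jl provides Sphere, distance, and log, and that the cost takes the same (M, x) signature as the gradient above; the mean-on-the-sphere cost and the data points are only illustrative choices):

    using Manopt, Manifolds

    # Riemannian center of mass of three points on the 2-sphere (illustrative data)
    M = Sphere(2)
    pts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    F(M, x) = sum(distance(M, x, p)^2 for p in pts) / (2 * length(pts))
    gradF(M, x) = -sum(log(M, x, p) for p in pts) / length(pts)

    x0 = [1.0, 0.0, 0.0]
    # uses the defaults listed above: constant step size, identity direction update
    x_star = gradient_descent(M, F, gradF, x0)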
Manopt.gradient_descent! — Function

    gradient_descent!(M, F, gradF, x)

perform a gradient descent
\[x_{k+1} = \operatorname{retr}_{x_k}\bigl( -s_k\operatorname{grad}f(x_k) \bigr)\]
in place of x with different choices of $s_k$ available.
Input
- M – a manifold $\mathcal M$
- F – a cost function $F: \mathcal M → ℝ$ to minimize
- gradF – the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of F
- x – an initial value $x ∈ \mathcal M$
For more options, especially Stepsizes for $s_k$, see gradient_descent
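Reusing the cost, gradient, and manifold from the sketch above, the in-place variant overwrites its start point:

    x = copy(x0)
    gradient_descent!(M, F, gradF, x)  # x now holds the (approximate) minimizer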
Options
Manopt.AbstractGradientOptions — Type

    AbstractGradientOptions <: Options

A generic Options type for gradient based options data.
Manopt.GradientDescentOptions — Type

    GradientDescentOptions{P,T} <: AbstractGradientOptions

Describes a gradient based descent algorithm, with
Fields
A default value is given in brackets if a parameter can be left out in initialization.
- x0 – a point (of type P) on a manifold as starting point
- gradient – the current gradient $\operatorname{grad}f(x)$
- stopping_criterion – (StopAfterIteration(100)) a StoppingCriterion
- stepsize – (ConstantStepsize()) a Stepsize
- direction – (IdentityUpdateRule) a processor to compute the gradient
- retraction_method – (default_retraction_method(M)) the retraction to use, defaults to the default set for your manifold.
Constructor
    GradientDescentOptions(M, x; initial_vector=zero_vector(M, x), kwargs...)

Generate gradient descent options, where initial_vector can be used to set the type of tangent vector used to store the gradient. All following fields are keyword arguments.
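A sketch of constructing the options directly rather than through the high-level gradient_descent call; Euclidean(2) from Manifolds.jl and the chosen keyword values are illustrative assumptions:

    using Manopt, Manifolds

    M2 = Euclidean(2)
    y0 = [2.0, 1.0]
    # the fields listed above are passed as keyword arguments
    o = GradientDescentOptions(
        M2, y0;
        stopping_criterion=StopAfterIteration(50),
        stepsize=ConstantStepsize(0.5),
    )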
See also

gradient_descent
Direction Update Rules
A field of the options is the direction, a DirectionUpdateRule, which by default (IdentityUpdateRule) just evaluates the gradient, but can be enhanced, for example, with the following rules.
Manopt.DirectionUpdateRule — Type

    DirectionUpdateRule

A general functor that handles direction update rules. Its field(s) is usually only a StoreOptionsAction, by default initialized to the fields required for the specific coefficient, but it can also be replaced by a (common, global) individual one that provides these values.
Manopt.IdentityUpdateRule — Type

    IdentityUpdateRule <: DirectionUpdateRule

The default gradient direction update is the identity, i.e. it just evaluates the gradient.
Manopt.MomentumGradient — Type

    MomentumGradient <: DirectionUpdateRule

Append a momentum to a gradient processor, where the last direction and last iterate are stored, and the new direction is composed as $η_i = m\,η_{i-1}' - s d_i$, where $s d_i$ is the current (inner) direction scaled by the step size and $η_{i-1}'$ is the last direction, vector transported to the current point and multiplied by the momentum $m$.
Fields
- gradient – (zero_vector(M, x0)) the last gradient/direction update added as momentum
- last_iterate – remember the last iterate for parallel transporting the last direction
- momentum – (0.2) factor for momentum
- direction – internal DirectionUpdateRule to determine directions to add the momentum to.
- vector_transport_method – vector transport method to use
Constructors
MomentumGradient(
    p::GradientProblem,
    x0,
    s::DirectionUpdateRule=IdentityUpdateRule();
    gradient=zero_vector(p.M, x0), momentum=0.2,
    vector_transport_method=ParallelTransport(),
)

Add momentum to a gradient problem, where by default just a gradient evaluation is used. Equivalently, you can also use a Manifold M instead of the GradientProblem p.
MomentumGradient(
    p::StochasticGradientProblem,
    x0,
    s::DirectionUpdateRule=IdentityUpdateRule();
    gradient=zero_vector(p.M, x0), momentum=0.2,
    vector_transport_method=ParallelTransport(),
)

Add momentum to a stochastic gradient problem, where by default just a stochastic gradient evaluation is used.
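As a sketch of how such a rule plugs into the solver, the momentum rule can be passed via the direction keyword of gradient_descent; this assumes the manifold-based constructor mentioned above and reuses M, F, gradF, and x0 from the first example:

    # add momentum with factor 0.5 to the plain gradient step
    x_m = gradient_descent(
        M, F, gradF, x0;
        direction=MomentumGradient(M, x0; momentum=0.5),
    )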
Manopt.AverageGradient — Type

    AverageGradient <: DirectionUpdateRule

Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored; the average is taken after vector transporting them to the current iterate's tangent space.
Fields
- gradients – (fill(zero_vector(M, x0), n)) the last n gradient/direction updates
- last_iterate – last iterate (needed to transport the gradients)
- direction – internal DirectionUpdateRule to determine directions to apply the averaging to
- vector_transport_method – vector transport method to use
Constructors
AverageGradient(
    p::GradientProblem,
    x0,
    n::Int=10,
    s::DirectionUpdateRule=IdentityUpdateRule();
    gradients = fill(zero_vector(p.M, x0), n),
    last_iterate = deepcopy(x0),
    vector_transport_method = ParallelTransport()
)

Add averaging to a gradient problem, where n determines the number of gradients to average and gradients can be prefilled with some history. Equivalently, you can also use a Manifold M instead of the GradientProblem p.
AverageGradient(
    p::StochasticGradientProblem,
    x0,
    n::Int=10,
    s::DirectionUpdateRule=IdentityUpdateRule();
    gradients = fill(zero_vector(p.M, x0), n),
    last_iterate = deepcopy(x0),
    vector_transport_method = ParallelTransport()
)

Add averaging to a stochastic gradient problem, where n determines the number of gradients to average and gradients can be prefilled with some history.
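Analogously, a sketch that averages the last 5 gradients (same assumptions and variables as in the momentum sketch):

    x_a = gradient_descent(
        M, F, gradF, x0;
        direction=AverageGradient(M, x0, 5),
    )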
Manopt.Nesterov — Type

    Nesterov <: DirectionUpdateRule

Fields
- γ
- μ – the strong convexity coefficient
- v – (= $v_k$, with $v_0 = x_0$) an interim point to compute the next gradient evaluation point $y_k$
- shrinkage – (= i -> 0.8) a function to compute the shrinkage $β_k$ per iterate.
Let's assume $f$ is $L$-Lipschitz and $μ$-strongly convex. Given
- a step size $h_k<\frac{1}{L}$ (from the GradientDescentOptions)
- a shrinkage parameter $β_k$
- a current iterate $x_k$
- as well as the interim values $γ_k$ and $v_k$ from the previous iterate.
This rule computes a Nesterov-type update using the following steps, see [ZhangSra2018]:
- Compute the positive root $α_k∈(0,1)$ of $α_k^2 = h_k\bigl((1-α_k)γ_k+α_k μ\bigr)$.
- Set $\bar γ_{k+1} = (1-α_k)γ_k + α_kμ$
- $y_k = \operatorname{retr}_{x_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{x_k}v_k \Bigr)$
- $x_{k+1} = \operatorname{retr}_{y_k}(-h_k \operatorname{grad}f(y_k))$
- $v_{k+1} = \operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\bar γ_{k+1}}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\bar γ_{k+1}}\operatorname{grad}f(y_k) \Bigr)$
- $γ_{k+1} = \frac{1}{1+β_k}\bar γ_{k+1}$
Then the direction from $x_k$ to $x_{k+1}$, i.e. $d = \operatorname{retr}^{-1}_{x_k}x_{k+1}$, is returned.
Constructor
Nesterov(x0::P, γ=0.001, μ=0.9, shrinkage = k -> 0.8;
    inverse_retraction_method=LogarithmicInverseRetraction())

Initialize the Nesterov acceleration, where x0 initializes v.
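A sketch of passing this rule to the solver, keeping the default γ, μ, and shrinkage and reusing M, F, gradF, and x0 from the first example:

    x_n = gradient_descent(
        M, F, gradF, x0;
        direction=Nesterov(x0),
    )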
Debug Actions
Manopt.DebugGradient — Type

    DebugGradient <: DebugAction

debug for the gradient evaluated at the current iterate
Constructors
    DebugGradient(; long=false, prefix= , format= "$prefix%s", io=stdout)

display the short (false) or long (true) default text for the gradient, or set the prefix manually. Alternatively the complete format can be set.
Manopt.DebugGradientNorm — Type

    DebugGradientNorm <: DebugAction

debug for the norm of the gradient evaluated at the current iterate.
Constructors
    DebugGradientNorm([long=false, p=print])

display the short (false) or long (true) default text for the gradient norm.

    DebugGradientNorm(prefix[, p=print])

display a prefix in front of the gradient norm.
Manopt.DebugStepsize — Type

    DebugStepsize <: DebugAction

debug for the current step size.
Constructors
    DebugStepsize(; long=false, prefix="step size:", format="$prefix%s", io=stdout)

display a prefix in front of the step size.
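A sketch of attaching these actions via the debug keyword, which is handled by the decorators mentioned above; the exact array syntax (symbols, strings, and a print interval) is an assumption here, and M, F, gradF, and x0 are reused from the first example:

    # print the iteration number, gradient norm, and step size every 10 iterations
    x_d = gradient_descent(
        M, F, gradF, x0;
        debug=[:Iteration, DebugGradientNorm(), DebugStepsize(), "\n", 10],
    )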
Record Actions
Manopt.RecordGradient — Type

    RecordGradient <: RecordAction

record the gradient evaluated at the current iterate
Constructors
    RecordGradient(ξ)

initialize the RecordAction to the corresponding type of the tangent vector.
Manopt.RecordGradientNorm — Type

    RecordGradientNorm <: RecordAction

record the norm of the current gradient
Manopt.RecordStepsize — Type

    RecordStepsize <: RecordAction

record the step size
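A sketch of recording the gradient norm in every iteration; return_options and get_record are assumed to be the way to return the decorated options and read out the record (M, F, gradF, and x0 as in the first example):

    o = gradient_descent(
        M, F, gradF, x0;
        record=[RecordGradientNorm()],
        return_options=true,
    )
    norms = get_record(o)  # vector of recorded gradient norms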
[ZhangSra2018]
    H. Zhang, S. Sra: Towards Riemannian Accelerated Gradient Methods, preprint, 2018, arXiv: 1806.02812