Gradient Descent
Manopt.gradient_descent
— Function
gradient_descent(M, f, grad_f, p=rand(M); kwargs...)
gradient_descent(M, gradient_objective, p=rand(M); kwargs...)
perform a gradient descent
\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr), \qquad k=0,1,…\]
with different choices of the stepsize $s_k$ available (see stepsize
option below).
Input
- M – a manifold $\mathcal M$
- f – a cost function $f: \mathcal M→ℝ$ to find a minimizer $p^*$ for
- grad_f – the gradient $\operatorname{grad}f: \mathcal M → T\mathcal M$ of f, either as a function (M, p) -> X (allocating) or as a function (M, X, p) -> X (in place)
- p – an initial value $p = p_0 ∈ \mathcal M$
Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.
Optional
- direction – (IdentityUpdateRule) perform a processing of the direction, e.g. add momentum or averaging (see MomentumGradient and AverageGradient below)
- evaluation – (AllocatingEvaluation) specify whether the gradient works by allocation (default), i.e. is of the form grad_f(M, p), or InplaceEvaluation, i.e. works in place and is of the form grad_f!(M, X, p)
- retraction_method – (default_retraction_method(M, typeof(p))) a retraction to use
- stepsize – (ConstantStepsize(1.)) specify a Stepsize functor
- stopping_criterion – (StopWhenAny(StopAfterIteration(200), StopWhenGradientNormLess(10.0^-8))) a functor inheriting from StoppingCriterion indicating when to stop
If you provide the ManifoldGradientObjective directly, evaluation is ignored.
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively. If you provide the ManifoldGradientObjective directly, these decorations can still be specified.
Output
the obtained (approximate) minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details.
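For illustration, here is a minimal usage sketch. It assumes Manifolds.jl is available; the cost (a Riemannian center of mass on the sphere), the data, and the keyword choices are made up for this example and are not part of the docstring above.

using Manopt, Manifolds, Random
Random.seed!(42)

# Riemannian center of mass of a few points on the 2-sphere
M = Sphere(2)
data = [rand(M) for _ in 1:10]

f(M, p) = sum(distance(M, p, q)^2 for q in data) / (2 * length(data))
grad_f(M, p) = -sum(log(M, p, q) for q in data) / length(data)

p0 = data[1]
p_star = gradient_descent(M, f, grad_f, p0)

# the same call with an explicit line search and stopping criterion
p_star = gradient_descent(
    M, f, grad_f, p0;
    stepsize=ArmijoLinesearch(M),
    stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-9),
)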
Manopt.gradient_descent!
— Function
gradient_descent!(M, f, grad_f, p; kwargs...)
gradient_descent!(M, gradient_objective, p; kwargs...)
perform a gradient_descent
\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr)\]
in place of p
with different choices of $s_k$ available.
Input
- M – a manifold $\mathcal M$
- f – a cost function $f:\mathcal M→ℝ$ to minimize
- grad_f – the gradient $\operatorname{grad}f:\mathcal M→ T\mathcal M$ of f
- p – an initial value $p ∈ \mathcal M$
Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.
For more options, especially the Stepsizes available for $s_k$, see gradient_descent.
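A short sketch of the in-place call, reusing M, f, data, and p0 from the example under gradient_descent above; the in-place gradient grad_f! is made up for this illustration.

# grad_f! writes the gradient into X instead of allocating a new tangent vector
function grad_f!(M, X, p)
    X .= -sum(log(M, p, q) for q in data) / length(data)
    return X
end

p = copy(M, p0)
gradient_descent!(M, f, grad_f!, p; evaluation=InplaceEvaluation())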
State
Manopt.GradientDescentState
— Type
GradientDescentState{P,T} <: AbstractGradientSolverState
Describes a gradient-based descent algorithm, with
Fields
a default value is given in brackets if a parameter can be left out in initialization.
- p – (rand(M)) the current iterate
- X – (zero_vector(M,p)) the current gradient $\operatorname{grad}f(p)$, initialised to the zero vector
- stopping_criterion – (StopAfterIteration(100)) a StoppingCriterion
- stepsize – (default_stepsize(M, GradientDescentState)) a Stepsize
- direction – (IdentityUpdateRule) a processor to compute the gradient
- retraction_method – (default_retraction_method(M, typeof(p))) the retraction to use, defaults to the default set for your manifold
Constructor
GradientDescentState(M, p=rand(M); X=zero_vector(M, p), kwargs...)
Generate gradient descent options, where X
can be used to set the tangent vector to store the gradient in a certain type; it will be initialised accordingly at a later stage. All following fields are keyword arguments.
See also gradient_descent.
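A sketch of driving the solver manually via an objective, a problem, and this state, reusing M, f, grad_f, and p0 from the example under gradient_descent above:

obj = ManifoldGradientObjective(f, grad_f)
dmp = DefaultManoptProblem(M, obj)
gds = GradientDescentState(
    M, copy(M, p0);
    stopping_criterion=StopAfterIteration(100) | StopWhenGradientNormLess(1e-8),
)
solve!(dmp, gds)
q = get_solver_result(gds)  # the resulting (approximate) minimizer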
Direction Update Rules
A field of the options is the direction, a DirectionUpdateRule, which by default (IdentityUpdateRule) just evaluates the gradient, but can be enhanced, for example, to one of the following rules.
Manopt.DirectionUpdateRule
— Type
DirectionUpdateRule
A general functor that handles direction update rules. Its field(s) usually consist only of a StoreStateAction, by default initialized to the fields required for the specific coefficient, but this can also be replaced by a (common, global) individual one that provides these values.
Manopt.IdentityUpdateRule
— Type
IdentityUpdateRule <: DirectionUpdateRule
The default gradient direction update is the identity, i.e. it just evaluates the gradient.
Manopt.MomentumGradient
— Type
MomentumGradient <: DirectionUpdateRule
Append a momentum to a gradient processor, where the last direction and last iterate are stored. The new direction is composed as $η_i = m\,η_{i-1}' - s\,d_i$, where $s\,d_i$ is the current (inner) direction scaled by the step size and $η_{i-1}'$ is the last direction, vector transported to the current tangent space and multiplied by the momentum $m$.
Fields
- p_old – (rand(M)) remember the last iterate for parallel transporting the last direction
- momentum – (0.2) factor for the momentum
- direction – internal DirectionUpdateRule to determine the directions to add the momentum to
- vector_transport_method – (default_vector_transport_method(M, typeof(p))) vector transport method to use
- X_old – (zero_vector(M, p)) the last gradient/direction update added as momentum
Constructors
Add momentum to a gradient problem, where by default just a gradient evaluation is used
MomentumGradient(
    M::AbstractManifold;
    p=rand(M),
    s::DirectionUpdateRule=IdentityUpdateRule(),
    X=zero_vector(M, p),
    momentum=0.2,
    vector_transport_method=default_vector_transport_method(M, typeof(p)),
)
Initialize a momentum gradient rule to s. Note that the keyword arguments p and X will often be overridden, so their initialisation is meant to set them to certain types of points or tangent vectors, if you do not use the default types with respect to M.
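A sketch of using this rule with gradient_descent via its direction keyword, reusing M, f, grad_f, and p0 from the example further above and assuming the constructor signature documented here:

# add momentum to plain gradient descent
p_star = gradient_descent(
    M, f, grad_f, p0;
    direction=MomentumGradient(M; p=p0, momentum=0.2),
)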
Manopt.AverageGradient
— Type
AverageGradient <: DirectionUpdateRule
Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored; the average is taken after vector transporting them to the current iterate's tangent space.
Fields
- gradients – (fill(zero_vector(M, p), n)) the last n gradient/direction updates
- last_iterate – the last iterate (needed to transport the gradients)
- direction – internal DirectionUpdateRule to determine the directions to apply the averaging to
- vector_transport_method – vector transport method to use
Constructors
AverageGradient(
    M::AbstractManifold,
    p::P=rand(M);
    n::Int=10,
    s::DirectionUpdateRule=IdentityUpdateRule(),
    gradients=fill(zero_vector(M, p), n),
    last_iterate=deepcopy(p),
    vector_transport_method=default_vector_transport_method(M, typeof(p)),
)
Add average to a gradient problem, where

- n determines the size of the averaging
- s is the internal DirectionUpdateRule to determine the gradients to store
- gradients can be prefilled with some history
- last_iterate stores the last iterate
- vector_transport_method determines how to transport all gradients to the current iterate's tangent space before averaging
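A sketch of averaging the last gradients within gradient_descent, reusing M, f, grad_f, and p0 from the example further above and assuming the constructor signature documented here:

# average the last 10 gradients before taking a step
p_star = gradient_descent(
    M, f, grad_f, p0;
    direction=AverageGradient(M, p0; n=10),
)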
Manopt.Nesterov
— Type
Nesterov <: DirectionUpdateRule
Fields
- γ
- μ – the strong convexity coefficient
- v – (=$v_k$, $v_0=x_0$) an interim point to compute the next gradient evaluation point $y_k$
- shrinkage – (i -> 0.8) a function to compute the shrinkage $β_k$ per iterate
Let's assume $f$ is $L$-Lipschitz and $μ$-strongly convex. Given

- a step size $h_k<\frac{1}{L}$ (from the GradientDescentState)
- a shrinkage parameter $β_k$
- a current iterate $x_k$
- as well as the interim values $γ_k$ and $v_k$ from the previous iterate.
This computes a Nesterov-type update using the following steps, see Zhang and Sra [ZS18]:
- Compute the positive root $α_k∈(0,1)$ of $α_k^2 = h_k\bigl((1-α_k)γ_k+α_kμ\bigr)$.
- Set $\bar γ_{k+1} = (1-α_k)γ_k + α_kμ$.
- $y_k = \operatorname{retr}_{x_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{x_k}v_k \Bigr)$
- $x_{k+1} = \operatorname{retr}_{y_k}(-h_k \operatorname{grad}f(y_k))$
- $v_{k+1} = \operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\bar γ_{k+1}}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\bar γ_{k+1}}\operatorname{grad}f(y_k) \Bigr)$
- $γ_{k+1} = \frac{1}{1+β_k}\bar γ_{k+1}$
Then the direction from $x_k$ to $x_{k+1}$, i.e. $d = \operatorname{retr}^{-1}_{x_k}x_{k+1}$, is returned.
Constructor
Nesterov(M::AbstractManifold, p::P; γ=0.001, μ=0.9, shrinkage = k -> 0.8,
    inverse_retraction_method=LogarithmicInverseRetraction())
Initialize the Nesterov acceleration, where p initializes v.
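A sketch of Nesterov-accelerated gradient descent, reusing M, f, grad_f, and p0 from the example further above; the parameter values are made up and μ should reflect the strong convexity of your cost:

p_star = gradient_descent(
    M, f, grad_f, p0;
    direction=Nesterov(M, copy(M, p0); γ=0.001, μ=0.9),
)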
Debug Actions
Manopt.DebugGradient
— Type
DebugGradient <: DebugAction
debug for the gradient evaluated at the current iterate
Constructors
DebugGradient(; long=false, prefix= , format= "$prefix%s", io=stdout)
display the short (false) or long (true) default text for the gradient, or set the prefix manually. Alternatively the complete format can be set.
Manopt.DebugGradientNorm
— Type
DebugGradientNorm <: DebugAction
debug for the norm of the gradient evaluated at the current iterate.
Constructors
DebugGradientNorm([long=false,p=print])
display the short (false) or long (true) default text for the gradient norm.
DebugGradientNorm(prefix[, p=print])
display the prefix in front of the gradient norm.
Manopt.DebugStepsize
— Type
DebugStepsize <: DebugAction
debug for the current step size.
Constructors
DebugStepsize(;long=false,prefix="step size:", format="$prefix%s", io=stdout)
display the prefix in front of the step size.
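These actions are passed to the solver via the debug keyword. A sketch, reusing M, f, grad_f, and p0 from the example further above:

# print iteration number, gradient norm, and step size every 10 iterations
p_star = gradient_descent(
    M, f, grad_f, p0;
    debug=[:Iteration, " ", DebugGradientNorm(), " ", DebugStepsize(), "\n", 10],
)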
Record Actions
Manopt.RecordGradient
— Type
RecordGradient <: RecordAction
record the gradient evaluated at the current iterate
Constructors
RecordGradient(ξ)
initialize the RecordAction
to the corresponding type of the tangent vector.
Manopt.RecordGradientNorm
— Type
RecordGradientNorm <: RecordAction
record the norm of the current gradient
Manopt.RecordStepsize
— Type
RecordStepsize <: RecordAction
record the step size
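These actions are passed to the solver via the record keyword. A sketch, reusing M, f, grad_f, and p0 from the example further above; return_state=true returns the decorated solver state so the record can be retrieved:

R = gradient_descent(
    M, f, grad_f, p0;
    record=[RecordGradientNorm(), RecordStepsize()],
    return_state=true,
)
rec = get_record(R)  # one (gradient norm, step size) entry per iteration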
Literature
- [Lue72] D. G. Luenberger. The gradient projection method along geodesics. Management Science 18, 620–631 (1972).
- [ZS18] H. Zhang and S. Sra. Towards Riemannian accelerated gradient methods, arXiv Preprint, 1806.02812 (2018).