Gradient descent

Manopt.gradient_descentFunction
gradient_descent(M, f, grad_f, p=rand(M); kwargs...)
gradient_descent(M, gradient_objective, p=rand(M); kwargs...)

perform a gradient descent

\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr), \qquad k=0,1,…\]

with different choices of the stepsize $s_k$ available (see stepsize option below).

Input

  • M – a manifold $\mathcal M$
  • f – a cost function $f: \mathcal M→ℝ$ to find a minimizer $p^*$ for
  • grad_f – the gradient $\operatorname{grad}f: \mathcal M → T\mathcal M$ of f as a function (M, p) -> X or a function (M, X, p) -> X
  • p – an initial value $p = p_0 ∈ \mathcal M$

Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.

Optional

If you provide the ManifoldGradientObjective directly, evaluation is ignored.

All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for the objective, respectively. If you provide the ManifoldGradientObjective directly, these decorations can still be specified.

Output

the obtained (approximate) minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details.
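
For illustration, a minimal usage sketch, assuming Manifolds.jl is available; the manifold, data points, cost, and gradient below are made up for this example and compute the Riemannian mean of three points on the sphere.

using Manopt, Manifolds

# hypothetical example data: three unit vectors on the 2-sphere
M = Sphere(2)
data = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# cost: mean squared Riemannian distance to the data points
f(M, p) = sum(distance(M, p, q)^2 for q in data) / (2 * length(data))
# its Riemannian gradient: minus the mean of the logarithmic maps
grad_f(M, p) = -sum(log(M, p, q) for q in data) / length(data)

p0 = [0.0, 0.0, 1.0]
p_star = gradient_descent(M, f, grad_f, p0)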

source
Manopt.gradient_descent!Function
gradient_descent!(M, f, grad_f, p; kwargs...)
gradient_descent!(M, gradient_objective, p; kwargs...)

perform a gradient descent

\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr)\]

in place of p with different choices of $s_k$ available.

Input

  • M – a manifold $\mathcal M$
  • f – a cost function $f:\mathcal M→ℝ$ to minimize
  • grad_f – the gradient $\operatorname{grad}f:\mathcal M→ T\mathcal M$ of $f$
  • p – an initial value $p ∈ \mathcal M$

Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.

For more options, especially Stepsizes for $s_k$, see gradient_descent
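
A minimal sketch of the in-place call, reusing M, f, grad_f from the example above; the stopping criterion is only an illustrative choice.

p0 = [0.0, 0.0, 1.0]
# p0 is modified in place and holds the (approximate) minimizer afterwards
gradient_descent!(M, f, grad_f, p0; stopping_criterion=StopWhenGradientNormLess(1e-8))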

source

State

Manopt.GradientDescentStateType
GradientDescentState{P,T} <: AbstractGradientSolverState

Describes a Gradient based descent algorithm, with

Fields

A default value is given in brackets if a parameter can be left out in initialization.

  • p – (rand(M)) the current iterate
  • X – (zero_vector(M,p)) the current gradient $\operatorname{grad}f(p)$, initialised to the zero vector.
  • stopping_criterion – (StopAfterIteration(100)) a StoppingCriterion
  • stepsize – (default_stepsize(M, GradientDescentState)) a Stepsize
  • direction - (IdentityUpdateRule) a processor to compute the gradient
  • retraction_method – (default_retraction_method(M, typeof(p))) the retraction to use, defaults to the default set for your manifold.

Constructor

GradientDescentState(M, p=rand(M); X=zero_vector(M, p), kwargs...)

Generate gradient descent options, where X can be used to set the tangent vector to store the gradient in a certain type; it will be initialised accordingly at a later stage. All following fields are keyword arguments.
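
As a sketch, the state can also be constructed directly; the keyword choices below are illustrative and assume the field names listed above are accepted as keyword arguments.

using Manopt, Manifolds

M = Sphere(2)
state = GradientDescentState(
    M, rand(M);
    stopping_criterion=StopAfterIteration(200),
    stepsize=ArmijoLinesearch(M),
)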

See also

gradient_descent

source

Direction update rules

A field of the options is the direction, a DirectionUpdateRule, which by default is the IdentityUpdateRule, i.e. the update just evaluates the gradient; it can be enhanced, for example, to one of the following rules.

Manopt.DirectionUpdateRuleType
DirectionUpdateRule

A general functor that handles direction update rules. Its fields usually consist only of a StoreStateAction, by default initialized to the fields required for the specific coefficient, but it can also be replaced by a (common, global) individual one that provides these values.

source
Manopt.IdentityUpdateRuleType
IdentityUpdateRule <: DirectionUpdateRule

The default gradient direction update is the identity, i.e. it just evaluates the gradient.

source
Manopt.MomentumGradientType
MomentumGradient <: DirectionUpdateRule

Append a momentum to a gradient processor: the last direction and last iterate are stored, and the new direction is composed as $η_i = m\,η_{i-1}' - s\,d_i$, where $d_i$ is the current (inner) direction, $s$ the step size, and $η_{i-1}'$ is the last direction vector transported to the current tangent space and multiplied by the momentum $m$.

Fields

  • p_old - (rand(M)) remember the last iterate for parallel transporting the last direction
  • momentum – (0.2) factor for momentum
  • direction – internal DirectionUpdateRule to determine directions to add the momentum to.
  • vector_transport_method – (default_vector_transport_method(M, typeof(p))) vector transport method to use
  • X_old – (zero_vector(M,p)) the last gradient/direction update added as momentum

Constructors

Add momentum to a gradient problem, where by default just a gradient evaluation is used

MomentumGradient(
    M::AbstractManifold;
    p=rand(M),
    s::DirectionUpdateRule=IdentityUpdateRule(),
    X=zero_vector(M, p),
    momentum=0.2,
    vector_transport_method=default_vector_transport_method(M, typeof(p)),
)

Initialize a momentum gradient rule to s. Note that the keyword arguments p and X will be overridden often, so their initialisation is meant to set them to certain types of points or tangent vectors, if you do not use the default types with respect to M.
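
A usage sketch, passing the rule via the direction= keyword of gradient_descent (M, f, grad_f, and p0 as in the earlier example); the momentum value is arbitrary.

p_star = gradient_descent(
    M, f, grad_f, p0;
    direction=MomentumGradient(M; momentum=0.3),
)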

source
Manopt.AverageGradientType
AverageGradient <: DirectionUpdateRule

Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored; the average is taken after vector transporting them to the current iterate's tangent space (see the usage sketch below).

Fields

  • gradients – (fill(zero_vector(M,p),n)) the last n gradient/direction updates
  • last_iterate – last iterate (needed to transport the gradients)
  • direction – internal DirectionUpdateRule to determine directions to apply the averaging to
  • vector_transport_method - vector transport method to use

Constructors

AverageGradient(
    M::AbstractManifold,
    p::P=rand(M);
    n::Int=10,
    s::DirectionUpdateRule=IdentityUpdateRule(),
    gradients = fill(zero_vector(M, p), n),
    last_iterate = deepcopy(p),
    vector_transport_method = default_vector_transport_method(M, typeof(p)),
)

Add average to a gradient problem, where

  • n determines the number of gradients to average
  • s is the internal DirectionUpdateRule to determine the gradients to store
  • gradients can be prefilled with some history
  • last_iterate stores the last iterate
  • vector_transport_method determines how to transport all gradients to the current iterate's tangent space before averaging
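
A usage sketch with the same setup as the earlier example; averaging over the last 5 gradients is an arbitrary choice.

p_star = gradient_descent(
    M, f, grad_f, p0;
    direction=AverageGradient(M; n=5),
)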
source
Manopt.NesterovType
Nesterov <: DirectionUpdateRule

Fields

  • γ
  • μ the strong convexity coefficient
  • v ($=v_k$, $v_0=x_0$) an interim point to compute the next gradient evaluation point y_k
  • shrinkage (= i -> 0.8) a function to compute the shrinkage $β_k$ per iterate.

Let's assume $f$ is $L$-Lipschitz and $μ$-strongly convex. Given

  • a step size $h_k<\frac{1}{L}$ (from the GradientDescentState)
  • a shrinkage parameter $β_k$
  • and a current iterate $x_k$
  • as well as the interim values $γ_k$ and $v_k$ from the previous iterate.

This computes a Nesterov-type update using the following steps, see Zhang and Sra [ZS18]:

  1. Compute the positive root $α_k∈(0,1)$ of $α_k^2 = h_k\bigl((1-α_k)γ_k+α_k μ\bigr)$.
  2. Set $\bar γ_{k+1} = (1-α_k)γ_k + α_kμ$
  3. $y_k = \operatorname{retr}_{x_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{x_k}v_k \Bigr)$
  4. $x_{k+1} = \operatorname{retr}_{y_k}(-h_k \operatorname{grad}f(y_k))$
  5. $v_{k+1} = \operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\bar γ_{k+1}}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\bar γ_{k+1}}\operatorname{grad}f(y_k) \Bigr)$
  6. $γ_{k+1} = \frac{1}{1+β_k}\bar γ_{k+1}$

Then the direction from $x_k$ to $x_{k+1}$, i.e. $d = \operatorname{retr}^{-1}_{x_k}x_{k+1}$, is returned.

Constructor

Nesterov(M::AbstractManifold, p::P; γ=0.001, μ=0.9, shrinkage = k -> 0.8,
    inverse_retraction_method=LogarithmicInverseRetraction())

Initialize the Nesterov acceleration, where p initializes v.
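
A usage sketch; the parameter values are placeholders and assume the default logarithmic inverse retraction is available on M.

p_star = gradient_descent(
    M, f, grad_f, p0;
    direction=Nesterov(M, p0; γ=0.001, μ=0.9),
)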

source

Debug actions

Manopt.DebugGradientType
DebugGradient <: DebugAction

debug for the gradient evaluated at the current iterate

Constructors

DebugGradient(; long=false, prefix= , format= "$prefix%s", io=stdout)

display the short (false) or long (true) default text for the gradient, or set the prefix manually. Alternatively the complete format can be set.

source
Manopt.DebugGradientNormType
DebugGradientNorm <: DebugAction

debug for the norm of the gradient evaluated at the current iterate.

Constructors

DebugGradientNorm([long=false,p=print])

display the short (false) or long (true) default text for the gradient norm.

DebugGradientNorm(prefix[, p=print])

display a prefix in front of the gradient norm.

source
Manopt.DebugStepsizeType
DebugStepsize <: DebugAction

debug for the current step size.

Constructors

DebugStepsize(;long=false,prefix="step size:", format="$prefix%s", io=stdout)

display a prefix in front of the step size.
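
A sketch combining the debug actions above via the debug= keyword of the solver (same setup as the earlier example); printing every 10 iterations is an arbitrary choice.

gradient_descent(
    M, f, grad_f, p0;
    debug=[:Iteration, DebugGradientNorm(), DebugStepsize(), "\n", 10],
)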

source

Record actions

Manopt.RecordGradientType
RecordGradient <: RecordAction

record the gradient evaluated at the current iterate

Constructors

RecordGradient(ξ)

initialize the RecordAction to the corresponding type of the tangent vector.
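
A sketch of recording the gradient in every iteration; return_state=true returns the full solver state, from which the record can be retrieved afterwards.

res = gradient_descent(
    M, f, grad_f, p0;
    record=[RecordGradient(zero_vector(M, p0))],
    return_state=true,
)
rec = get_record(res)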

source

Technical details

The gradient_descent solver requires the following functions of a manifold to be available

  • A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified (a sketch of passing it explicitly follows this list).
  • By default gradient descent uses ArmijoLinesearch which requires max_stepsize(M) to be set and an implementation of inner(M, p, X).
  • By default the stopping criterion also uses the norm of the gradient, to stop when this norm is small; if you have implemented inner, the norm is already provided.
  • By default the tangent vector storing the gradient is initialized calling zero_vector(M,p).
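
A sketch of passing these choices explicitly, for a manifold where no defaults are set; the concrete retraction and step size are only examples.

gradient_descent(
    M, f, grad_f, p0;
    retraction_method=ExponentialRetraction(),
    stepsize=ConstantStepsize(0.1),
)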

Literature

[Lue72]
D. G. Luenberger. The gradient projection method along geodesics. Management Science 18, 620–631 (1972).
[ZS18]
H. Zhang and S. Sra. Towards Riemannian accelerated gradient methods, arXiv Preprint, 1806.02812 (2018).