Gradient descent

Manopt.gradient_descent – Function
gradient_descent(M, f, grad_f, p=rand(M); kwargs...)
gradient_descent(M, gradient_objective, p=rand(M); kwargs...)

perform a gradient descent

\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr), \qquad k=0,1,…\]

with different choices of the stepsize $s_k$ available (see stepsize option below).

Input

  • M a manifold $\mathcal M$
  • f a cost function $f: \mathcal M→ℝ$ to find a minimizer $p^*$ for
  • grad_f the gradient $\operatorname{grad}f: \mathcal M → T\mathcal M$ of f, as a function (M, p) -> X (allocating) or (M, X, p) -> X (working in-place of X)
  • p an initial value $p = p_0 ∈ \mathcal M$

Alternatively, instead of f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.

Optional

If you provide the ManifoldGradientObjective directly, the evaluation= keyword is ignored.

All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively. If you provide the ManifoldGradientObjective directly, these decorations can still be specified.

Output

the obtained (approximate) minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details.
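
For example, a minimal usage sketch, assuming Manifolds.jl provides the Sphere and using three illustrative points whose Riemannian mean is sought:

using Manopt, Manifolds

M = Sphere(2)
data = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# cost: half the sum of squared Riemannian distances to the data points
f(M, p) = sum(distance(M, p, q)^2 for q in data) / 2
# its Riemannian gradient: minus the sum of logarithmic maps to the data points
grad_f(M, p) = -sum(log(M, p, q) for q in data)
p_star = gradient_descent(M, f, grad_f, data[1])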

source
Manopt.gradient_descent! – Function
gradient_descent!(M, f, grad_f, p; kwargs...)
gradient_descent!(M, gradient_objective, p; kwargs...)

perform a gradient descent

\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr)\]

in place of p, with different choices of the stepsize $s_k$ available.

Input

  • M a manifold $\mathcal M$
  • f a cost function $f:\mathcal M→ℝ$ to minimize
  • grad_f the gradient $\operatorname{grad}f:\mathcal M→ T\mathcal M$ of f
  • p an initial value $p ∈ \mathcal M$

Alternatively, instead of f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.

For more options, especially Stepsizes for $s_k$, see gradient_descent
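
As a small sketch, reusing f, grad_f, M, and data from the example above, the in-place variant overwrites its start point:

q = copy(M, data[1])               # mutable copy of the start point
gradient_descent!(M, f, grad_f, q) # q now holds the (approximate) minimizer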

source

State

Manopt.GradientDescentState – Type
GradientDescentState{P,T} <: AbstractGradientSolverState

Describes the state of a gradient based descent algorithm.

Fields

a default value is given in brackets if a parameter can be left out in initialization.

  • p: (rand(M)) the current iterate
  • X: (zero_vector(M,p)) the current gradient $\operatorname{grad}f(p)$, initialised to the zero vector
  • stopping_criterion: (StopAfterIteration(100)) a StoppingCriterion
  • stepsize: (default_stepsize(M, GradientDescentState)) a Stepsize
  • direction: (IdentityUpdateRule) a processor to compute the gradient
  • retraction_method: (default_retraction_method(M, typeof(p))) the retraction to use, defaults to the default set for your manifold.

Constructor

GradientDescentState(M, p=rand(M); X=zero_vector(M, p), kwargs...)

Generate gradient descent options, where X can be used to set the tangent vector to store the gradient in a certain type. All other fields are keyword arguments.
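
As a sketch of how this state interacts with a problem, reusing f, grad_f, M, and data from the examples above, together with the usual Manopt building blocks ManifoldGradientObjective, DefaultManoptProblem, solve!, and get_solver_result:

objective = ManifoldGradientObjective(f, grad_f)
problem = DefaultManoptProblem(M, objective)
state = GradientDescentState(
    M, copy(M, data[1]);
    stopping_criterion=StopAfterIteration(100) | StopWhenGradientNormLess(1e-8),
)
solve!(problem, state)            # run the gradient descent loop
p_star = get_solver_result(state) # extract the resulting point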

See also

gradient_descent

source

Direction update rules

A field of the state is the direction, a DirectionUpdateRule, which by default (IdentityUpdateRule) just evaluates the gradient, but it can be enhanced, for example to

Manopt.DirectionUpdateRule – Type
DirectionUpdateRule

A general functor that handles direction update rules. Its fields are usually only a StoreStateAction, by default initialized to the fields required for the specific coefficient, but it can also be replaced by a (common, global) individual one that provides these values.

source
Manopt.IdentityUpdateRule – Type
IdentityUpdateRule <: DirectionUpdateRule

The default gradient direction update is the identity; it usually just evaluates the gradient.

source
Manopt.MomentumGradient – Type
MomentumGradient <: DirectionUpdateRule

Append a momentum to a gradient processor, where the last direction and last iterate are stored. The new direction is composed as $η_i = m\,η_{i-1}' - s\,d_i$, where $d_i$ is the current (inner) direction, $s$ the stepsize, $m$ the momentum, and $η_{i-1}'$ is the last direction vector transported to the current tangent space.

Fields

  • p_old: (rand(M)) remember the last iterate for parallel transporting the last direction
  • momentum: (0.2) factor for momentum
  • direction: internal DirectionUpdateRule to determine directions to add the momentum to.
  • vector_transport_method: (default_vector_transport_method(M, typeof(p))) vector transport method to use
  • X_old: (zero_vector(M,p)) the last gradient/direction update added as momentum

Constructors

Add momentum to a gradient problem, where by default just a gradient evaluation is used

MomentumGradient(
    M::AbstractManifold;
    p=rand(M),
    s::DirectionUpdateRule=IdentityUpdateRule(),
    X=zero_vector(M, p),
    momentum=0.2,
    vector_transport_method=default_vector_transport_method(M, typeof(p)),
)

Initialize a momentum gradient rule to s, where p and X are memory for interim values.
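
A short usage sketch, reusing the setup from the first example above; the rule is passed via the direction= keyword of gradient_descent, and momentum=0.5 is only an illustrative value:

p_star = gradient_descent(
    M, f, grad_f, data[1];
    direction=MomentumGradient(M; momentum=0.5),
)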

source
Manopt.AverageGradient – Type
AverageGradient <: DirectionUpdateRule

Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored; the average is taken after vector transporting them to the current iterate's tangent space.

Fields

  • gradients: the last n gradient/direction updates
  • last_iterate: last iterate (needed to transport the gradients)
  • direction: internal DirectionUpdateRule to determine directions to apply the averaging to
  • vector_transport_method: vector transport method to use

Constructors

AverageGradient(
    M::AbstractManifold,
    p::P=rand(M);
    n::Int=10,
    s::DirectionUpdateRule=IdentityUpdateRule(),
    gradients=fill(zero_vector(M, p), n),
    last_iterate=deepcopy(p),
    vector_transport_method=default_vector_transport_method(M, typeof(p)),
)

Add average to a gradient problem, where

  • n: determines the number of gradients to average
  • s: is the internal DirectionUpdateRule to determine the gradients to store
  • gradients: can be pre-filled with some history
  • last_iterate: stores the last iterate
  • vector_transport_method: determines how to transport all gradients to the current iterate's tangent space before averaging
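
Analogously to the momentum rule, a usage sketch with the setup from the first example above, averaging over the last 5 directions:

p_star = gradient_descent(
    M, f, grad_f, data[1];
    direction=AverageGradient(M, data[1]; n=5),
)
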
source
Manopt.Nesterov – Type
Nesterov <: DirectionUpdateRule

Fields

  • γ
  • μ the strong convexity coefficient
  • v (=$v_k$, $v_0=x_0$) an interim point to compute the next gradient evaluation point $y_k$
  • shrinkage (= i -> 0.8) a function to compute the shrinkage $β_k$ per iterate.

Assume $f$ is $L$-Lipschitz and $μ$-strongly convex. Given

  • a step size $h_k<\frac{1}{L}$ (from the GradientDescentState)
  • a shrinkage parameter $β_k$
  • and a current iterate $x_k$
  • as well as the interim values $γ_k$ and $v_k$ from the previous iterate.

This computes a Nesterov-type update using the following steps, see [ZS18]:

  1. Compute the positive root $α_k∈(0,1)$ of $α_k^2 = h_k\bigl((1-α_k)γ_k+α_kμ\bigr)$.
  2. Set $\bar γ_{k+1} = (1-α_k)γ_k + α_kμ$
  3. $y_k = \operatorname{retr}_{x_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{x_k}v_k \Bigr)$
  4. $x_{k+1} = \operatorname{retr}_{y_k}(-h_k \operatorname{grad}f(y_k))$
  5. $v_{k+1} = \operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\bar γ_{k+1}}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\bar γ_{k+1}}\operatorname{grad}f(y_k) \Bigr)$
  6. $γ_{k+1} = \frac{1}{1+β_k}\bar γ_{k+1}$

Then the direction from $x_k$ to $x_{k+1}$, given by $d = \operatorname{retr}^{-1}_{x_k}x_{k+1}$, is returned.

Constructor

Nesterov(M::AbstractManifold, p::P; γ=0.001, μ=0.9, shrinkage = k -> 0.8,
    inverse_retraction_method=LogarithmicInverseRetraction())

Initialize the Nesterov acceleration, where p initializes v.
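
A usage sketch with the setup from the first example above; μ=1.0 is only an illustrative choice of the strong convexity coefficient:

p_star = gradient_descent(
    M, f, grad_f, data[1];
    direction=Nesterov(M, data[1]; μ=1.0),
)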

source

Debug actions

Manopt.DebugGradient – Type
DebugGradient <: DebugAction

debug for the gradient evaluated at the current iterate

Constructors

DebugGradient(; long=false, prefix= , format= "$prefix%s", io=stdout)

display the short (false) or long (true) default text for the gradient, or set the prefix manually. Alternatively the complete format can be set.

source
Manopt.DebugGradientNorm – Type
DebugGradientNorm <: DebugAction

debug for the norm of the gradient evaluated at the current iterate.

Constructors

DebugGradientNorm([long=false,p=print])

display the short (false) or long (true) default text for the gradient norm.

DebugGradientNorm(prefix[, p=print])

display a prefix in front of the gradient norm.

source
Manopt.DebugStepsize – Type
DebugStepsize <: DebugAction

debug for the current step size.

Constructors

DebugStepsize(;long=false,prefix="step size:", format="$prefix%s", io=stdout)

display a prefix in front of the step size.

source
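
These actions can be combined in the debug= keyword of the solver; a sketch using the setup from the first example above:

gradient_descent(
    M, f, grad_f, data[1];
    debug=[:Iteration, DebugGradientNorm(), " ", DebugStepsize(), "\n"],
)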

Record actions

Manopt.RecordGradient – Type
RecordGradient <: RecordAction

record the gradient evaluated at the current iterate

Constructors

RecordGradient(ξ)

initialize the RecordAction to the corresponding type of the tangent vector.
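
A usage sketch with the setup from the first example above: pass the action via the record= keyword and return the full solver state, so the recorded gradients can be read back with get_record:

st = gradient_descent(
    M, f, grad_f, data[1];
    record=[RecordGradient(zero_vector(M, data[1]))],
    return_state=true,
)
recorded_gradients = get_record(st)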

source

Technical details

The gradient_descent solver requires the following functions of a manifold to be available:

  • A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified; alternatively, it can be passed explicitly, as sketched after this list.
  • By default gradient descent uses ArmijoLinesearch which requires max_stepsize(M) to be set and an implementation of inner(M, p, X).
  • By default the stopping criterion also uses the norm, to stop when the norm of the gradient is small; if you implemented inner, the norm is already provided.
  • By default the tangent vector storing the gradient is initialized calling zero_vector(M,p).
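
For example, on the sphere from the first example above, a retraction can also be passed explicitly via the retraction_method= keyword; ProjectionRetraction is one choice available there:

p_star = gradient_descent(
    M, f, grad_f, data[1];
    retraction_method=ProjectionRetraction(),
)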

Literature

[Lue72]
D. G. Luenberger. The gradient projection method along geodesics. Management Science 18, 620–631 (1972).
[ZS18]
H. Zhang and S. Sra. Towards Riemannian accelerated gradient methods, arXiv Preprint, 1806.02812 (2018).