Gradient Descent

Manopt.gradient_descent — Function

gradient_descent(M, F, ∇F, x)

perform a gradientdescent ``x{k+1} = \mathrm{retr}{xk} sk\nabla f(xk)$with different choices of$s_k`available (seestepsize` option below).

Input

M – a manifold $\mathcal M$
F – a cost function $F\colon\mathcal M\to\mathbb R$ to minimize
∇F – the gradient $\nabla F\colon\mathcal M\to T\mathcal M$ of F
x – an initial value $x ∈ \mathcal M$

Optional

stepsize – (ConstantStepsize(1.)) specify a Stepsize functor.
retraction_method – (ExponentialRetraction()) a retraction(M,x,ξ) to use.
stopping_criterion – (StopWhenAny(StopAfterIteration(200),StopWhenGradientNormLess(10.0^-8))) a functor inheriting from StoppingCriterion indicating when to stop.
return_options – (false) – if activated, the extended result, i.e. the complete Options are returned. This can be used to access recorded values. If set to false (default) just the optimal value x_opt if returned

... and the ones that are passed to decorate_options for decorators.

Output

x_opt – the resulting (approximately critical) point of gradientDescent

options - the options returned by the solver (see return_options)

source

Manopt.gradient_descent! — Function

gradient_descent!(M, F, ∇F, x)

perform a gradientdescent ``x{k+1} = \mathrm{retr}{xk} sk\nabla f(xk)$inplace of `x` with different choices of$s_k`` available.

Input

M – a manifold $\mathcal M$
F – a cost function $F\colon\mathcal M\to\mathbb R$ to minimize
∇F – the gradient $\nabla F\colon\mathcal M\to T\mathcal M$ of F
x – an initial value $x ∈ \mathcal M$

For more options, especially Stepsizes for $s_k$, see gradient_descent

source

Options

Manopt.AbstractGradientDescentOptions — Type

AbstractGradientDescentOptions <: Options

A generic Options type for gradient based options data.

source

Manopt.GradientDescentOptions — Type

GradientDescentOptions{P,T} <: AbstractGradientDescentOptions

Describes a Gradient based descent algorithm, with

Fields

a default value is given in brackets if a parameter can be left out in initialization.

x0 – an a point (of type P) on a manifold as starting point
stopping_criterion – (StopAfterIteration(100)) a StoppingCriterion
stepsize – (ConstantStepsize(1.))a Stepsize
direction - (IdentityUpdateRule) a processor to compute the gradient
retraction_method – (ExponentialRetraction()) the rectraction to use, defaults to the exponential map

Constructor

GradientDescentOptions(x, stop, s [, retr=ExponentialRetraction()])

construct a Gradient Descent Option with the fields and defaults as above

See also

gradient_descent, GradientProblem

source

Direction Update Rules

A field of the options is the direction, a DirectionUpdateRule, which by default IdentityUpdateRule just evaluates the gradient but can be enhanced for example to

Manopt.DirectionUpdateRule — Type

DirectionUpdateRule

A general functor, that handles direction update rules. It's field(s) is usually only a StoreOptionsAction by default initialized to the fields required for the specific coefficient, but can also be replaced by a (common, global) individual one that provides these values.

source

Manopt.IdentityUpdateRule — Type

IdentityUpdateRule <: DirectionUpdateRule

The default gradient direction update is the identity, i.e. it just evaluates the gradient.

source

Manopt.MomentumGradient — Type

MomentumGradient <: DirectionUpdateRule

Append a momentum to a gradient processor, where the last direction and last iterate are stored and the new is composed as $abla_i = m* abla_{i-1}' - s d_i$, where $sd_i$ is the current (inner) direction and $abla_{i-1}'$ is the vector transported last direction multiplied by momentum $m$.

Fields

∇ – (zero_tangent_vector(M,x0)) the last gradient/direction update added as momentum
last_iterate - remember the last iterate for parallel transporting the last direction
momentum – (0.2) factor for momentum
direction – internal DirectionUpdateRule to determine directions to add the momentum to.
vector_transport_method vector transport method to use

Constructors

MomentumGradient(
    p::GradientProlem,
    x0,
    s::DirectionUpdateRule=Gradient();
    ∇=zero_tangent_vector(p.M, o.x), momentum=0.2
   vector_transport_method=ParallelTransport(),
)

Add momentum to a gradient problem, where by default just a gradient evaluation is used Equivalently you can also use a Manifold M instead of the GradientProblem p.

MomentumGradient(
    p::StochasticGradientProblem
    x0
    s::DirectionUpdateRule=StochasticGradient();
    ∇=zero_tangent_vector(p.M, x0), momentum=0.2
   vector_transport_method=ParallelTransport(),
)

Add momentum to a stochastic gradient problem, where by default just a stochastic gradient evaluation is used

source

Manopt.AverageGradient — Type

AverageGradient <: DirectionUpdateRule

Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored, average is taken after vector transporting them to the current iterates tangent space.

Fields

gradients – (fill(zero_tangent_vector(M,x0),n)) the last n gradient/direction updates
last_iterate – last iterate (needed to transport the gradients)
direction – internal DirectionUpdateRule to determine directions to apply the averaging to
vector_transport_method - vector transport method to use

Constructors

AverageGradient(
    p::GradientProlem,
    x0,
    n::Int=10
    s::DirectionUpdateRule=IdentityUpdateRule();
    gradients = fill(zero_tangent_vector(p.M, o.x),n),
    last_iterate = deepcopy(x0),
    vector_transport_method = ParallelTransport()
)

Add average to a gradient problem, n determines the size of averaging and gradients can be prefilled with some history Equivalently you can also use a Manifold M instead of the GradientProblem p.

AverageGradient(
    p::StochasticGradientProblem
    x0
    n::Int=10
    s::DirectionUpdateRule=StochasticGradient();
    gradients = fill(zero_tangent_vector(p.M, o.x),n),
    last_iterate = deepcopy(x0),
    vector_transport_method = ParallelTransport()
)

Add average to a stochastic gradient problem, n determines the size of averaging and gradients can be prefilled with some history

source

Manopt.Nesterov — Type

Nesterov <: DirectionUpdateRule

Fields

γ
μ the strong convexity coefficient
v (=$=v_k$, $v_0=x_0$) an interims point to compute the next gradient evaluation point y_k
shrinkage (= i -> 0.8) a function to compute the shrinkage $β_k$ per iterate.

Let's assume $f$ is $L$-Lipschitz and $μ$-strongly convex. Given

a step size $h_k<\frac{1}{L}$ (from the GradientDescentOptions
a shrinkage parameter $β_k$
and a current iterate $x_k$
as well as the interims values γ_k$and$v_k`` from the previous iterate.

This compute a Nesterov type update using the following steps, see ^{[ZhangSra2018]}

Copute the positive root, i.e. $α_k∈(0,1)$ of $α^2 = h_k\bigl((1-α_k)γ_k+α_k μ\bigr)$.
Set $\bar γ_k+1 = (1-α_k)γ_k + α_kμ$
$y_k = \operatorname{retr}_{x_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{x_k}v_k \Bigr)$
$x_{k+1} = \operatorname{retr}_{y_k}(-h_k ∇f(y_k))$
$v_{k+1} = `\operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\barγ_k}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\bar γ_{k+1}}∇f(y_k) \Bigr)$
$γ_{k+1} = \frac{1}{1+β_k}\bar γ_{k+1}$

Then the direction from $x_k$ to $x_k+1$, i.e. $d = \operatorname{retr}^{-1}_{x_k}x_{k+1}$ is returned.

Constructor

Nesterov(x0::P, γ=0.001, μ=0.9, schrinkage = k -> 0.8;
    inverse_retraction_method=LogarithmicInverseRetraction())

Initialize the Nesterov acceleration, where x0 initializes v.

source

Debug Actions

Manopt.DebugGradient — Type

DebugGradient <: DebugAction

debug for the gradient evaluated at the current iterate

Constructors

DebugGradient([long=false,p=print])

display the short (false) or long (true) default text for the gradient.

DebugGradient(prefix[, p=print])

display the a prefix in front of the gradient.

source

Manopt.DebugGradientNorm — Type

DebugGradientNorm <: DebugAction

debug for gradient evaluated at the current iterate.

Constructors

DebugGradientNorm([long=false,p=print])

display the short (false) or long (true) default text for the gradient norm.

DebugGradientNorm(prefix[, p=print])

display the a prefix in front of the gradientnorm.

source

Manopt.DebugStepsize — Type

DebugStepsize <: DebugAction

debug for the current step size.

Constructors

DebugStepsize([long=false,p=print])

display the short (false) or long (true) default text for the step size.

DebugStepsize(prefix[, p=print])

display the a prefix in front of the step size.

source

Record Actions

Manopt.RecordGradient — Type

RecordGradient <: RecordAction

record the gradient evaluated at the current iterate

Constructors

RecordGradient(ξ)

initialize the RecordAction to the corresponding type of the tangent vector.

source

Manopt.RecordGradientNorm — Type

RecordGradientNorm <: RecordAction

record the norm of the current gradient

source

Manopt.RecordStepsize — Type

RecordStepsize <: RecordAction

record the step size

source

ZhangSra2018
H. Zhang, S. Sra: Towards Riemannian Accelerated Gradient Methods, Preprint, 2018, arXiv: 1806.02812