Manopt.gradient_descent – Function
gradient_descent(M, F, gradF, x)

perform a gradient descent

$x_{k+1} = \operatorname{retr}_{x_k}\bigl( -s_k\operatorname{grad}f(x_k) \bigr)$

with different choices of the step size $s_k$ available (see the stepsize option below).

Input

• M – a manifold $\mathcal M$
• F – a cost function $F: \mathcal M→ℝ$ to minimize
• gradF – the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of F
• x – an initial value $x ∈ \mathcal M$

Optional

... and the ones that are passed to decorate_options for decorators.

Output

• x_opt – the resulting (approximately critical) point of gradient_descent

OR

• options - the options returned by the solver (see return_options)
source
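
As a usage illustration (not part of the original docstring), here is a minimal sketch that computes a Riemannian center of mass on the sphere. It assumes Manifolds.jl provides the Sphere manifold and that the cost and gradient take the manifold as their first argument; older Manopt versions may expect functions of the point only.

using Manopt, Manifolds, LinearAlgebra, Random

Random.seed!(42)
M = Sphere(2)                                 # the unit sphere S² in ℝ³
data = [normalize(randn(3)) for _ in 1:100]   # sample points on M
n = length(data)

# cost: mean of (half) squared Riemannian distances to the data points
F(M, y) = sum(1 / (2n) * distance.(Ref(M), data, Ref(y)) .^ 2)
# its Riemannian gradient, using grad_distance from Manopt
gradF(M, y) = sum(1 / n * grad_distance.(Ref(M), data, Ref(y)))

x_opt = gradient_descent(M, F, gradF, data[1])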
Manopt.gradient_descent! – Function
gradient_descent!(M, F, gradF, x)

perform a gradient descent

$x_{k+1} = \operatorname{retr}_{x_k}\bigl( -s_k\operatorname{grad}f(x_k) \bigr)$

in place of x, with different choices of the step size $s_k$ available.

Input

• M – a manifold $\mathcal M$
• F – a cost function $F:\mathcal M→ℝ$ to minimize
• gradF – the gradient $\operatorname{grad}F:\mathcal M→ T\mathcal M$ of F
• x – an initial value $x ∈ \mathcal M$

For more options, especially the Stepsizes available for $s_k$, see gradient_descent.

source
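
A brief sketch of the in-place variant, reusing M, F, gradF, and data from the example above; the start point is overwritten with the result:

x0 = copy(data[1])
gradient_descent!(M, F, gradF, x0)   # x0 now holds the (approximate) minimizer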

Options

Manopt.GradientDescentOptions – Type
GradientDescentOptions{P,T} <: AbstractGradientOptions

Describes a gradient-based descent algorithm, with

Fields

a default value is given in brackets if a parameter can be left out in initialization.

Constructor

GradientDescentOptions(x, stop, s [, retr=ExponentialRetraction()])
GradientDescentOptions(M, x, stop, s [, retr=ExponentialRetraction()])
GradientDescentOptions(x, X, stop, s [, retr=ExponentialRetraction()])

construct Gradient Descent Options with the fields and defaults as above. The first constructor can be used if points (x) and tangent vectors (gradient) have the same type, for example when they are matrices. The second uses the Manifold M to set gradient=zero_vector(M,x), and the third explicitly provides the initial gradient as a tangent vector X.

source
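
For illustration, constructing such an options object via the second signature above might look as follows; StopAfterIteration, StopWhenGradientNormLess, StopWhenAny, and ConstantStepsize are stopping criteria and step sizes provided by Manopt, while M and data are taken from the gradient_descent sketch above.

# sketch: stop after 200 iterations or once the gradient norm drops below 1e-8,
# using a constant step size of 0.1
o = GradientDescentOptions(
    M, data[1],
    StopWhenAny(StopAfterIteration(200), StopWhenGradientNormLess(1e-8)),
    ConstantStepsize(0.1),
)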

Direction Update Rules

One field of the options is the direction, a DirectionUpdateRule, which by default (IdentityUpdateRule) just evaluates the gradient, but which can be enhanced, for example to

Manopt.MomentumGradient – Type
MomentumGradient <: DirectionUpdateRule

Append a momentum to a gradient processor, where the last direction and last iterate are stored and the new direction is computed as $η_i = m η_{i-1}' - s d_i$, where $s d_i$ is the current (inner) direction scaled by the step size and $η_{i-1}'$ is the last direction, vector transported to the current tangent space and multiplied by the momentum $m$.

Fields

• gradient – (zero_vector(M,x0)) the last gradient/direction update added as momentum
• last_iterate - remember the last iterate for parallel transporting the last direction
• momentum – (0.2) factor for momentum
• direction – internal DirectionUpdateRule to determine directions to add the momentum to.
• vector_transport_method – vector transport method to use

Constructors

MomentumGradient(
x0,
vector_transport_method=ParallelTransport(),
)

Add momentum to a gradient problem, where by default just a gradient evaluation is used. Equivalently, you can also use a Manifold M instead of the GradientProblem p.

MomentumGradient(
x0,
s::DirectionUpdateRule=IdentityUpdateRule();
vector_transport_method=ParallelTransport(),
)

Add momentum to a stochastic gradient problem, where by default just a stochastic gradient evaluation is used.

source
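
A hedged sketch of plugging this rule into gradient_descent, assuming the solver accepts a direction keyword (forwarded to the options) and that the momentum keyword matches the field listed above; M, F, gradF, and data are from the gradient_descent example.

# sketch: gradient descent with momentum instead of the plain gradient step
x_opt = gradient_descent(
    M, F, gradF, data[1];
    direction=MomentumGradient(data[1]; momentum=0.2),
)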
Manopt.AverageGradient – Type
AverageGradient <: DirectionUpdateRule

Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored; the average is taken after vector transporting them to the current iterate's tangent space.

Fields

• gradients – (fill(zero_vector(M,x0),n)) the last n gradient/direction updates
• last_iterate – last iterate (needed to transport the gradients)
• direction – internal DirectionUpdateRule to determine directions to apply the averaging to
• vector_transport_method - vector transport method to use

Constructors

AverageGradient(
x0,
n::Int=10,
s::DirectionUpdateRule=IdentityUpdateRule();
last_iterate = deepcopy(x0),
vector_transport_method = ParallelTransport()
)

Add averaging to a gradient problem, where n determines the size of the average and gradients can be prefilled with some history. Equivalently, you can also use a Manifold M instead of the GradientProblem p.

AverageGradient(
x0,
n::Int=10,
s::DirectionUpdateRule=IdentityUpdateRule();
last_iterate = deepcopy(x0),
vector_transport_method = ParallelTransport()
)

Add averaging to a stochastic gradient problem, where n determines the size of the average and gradients can be prefilled with some history.

source
Manopt.Nesterov – Type
Nesterov <: DirectionUpdateRule

Fields

• γ
• μ the strong convexity coefficient
• v ($=v_k$, $v_0=x_0$) an interim point used to compute the next gradient evaluation point y_k
• shrinkage (= i -> 0.8) a function to compute the shrinkage $β_k$ per iterate.

Let's assume $f$ is $L$-Lipschitz and $μ$-strongly convex. Given

• a step size $h_k<\frac{1}{L}$ (from the GradientDescentOptions)
• a shrinkage parameter $β_k$
• and a current iterate $x_k$
• as well as the interim values $γ_k$ and $v_k$ from the previous iterate.

Then this rule computes a Nesterov-type update using the following steps, see [ZhangSra2018]:

1. Compute the positive root $α_k∈(0,1)$ of $α_k^2 = h_k\bigl((1-α_k)γ_k+α_kμ\bigr)$.
2. Set $\bar γ_{k+1} = (1-α_k)γ_k + α_kμ$.
3. $y_k = \operatorname{retr}_{x_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{x_k}v_k \Bigr)$
4. $x_{k+1} = \operatorname{retr}_{y_k}(-h_k \operatorname{grad}f(y_k))$
5. $v_{k+1} = \operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\bar γ_{k+1}}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\bar γ_{k+1}}\operatorname{grad}f(y_k) \Bigr)$
6. $γ_{k+1} = \frac{1}{1+β_k}\bar γ_{k+1}$

Then the direction from $x_k$ to $x_{k+1}$, i.e. $d = \operatorname{retr}^{-1}_{x_k}x_{k+1}$, is returned.
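
To make the six steps concrete, here is a minimal sketch of one such update in Julia, assuming retract and inverse_retract from ManifoldsBase and a gradient function grad_f(M, y); it illustrates the scheme above and is not the internal Manopt implementation.

function nesterov_step(M, grad_f, x, v, γ, μ, h, β)
    # 1. positive root α ∈ (0,1) of α² = h((1-α)γ + αμ)
    b, c = h * (γ - μ), -h * γ
    α = (-b + sqrt(b^2 - 4c)) / 2
    # 2. intermediate value γbar = (1-α)γ + αμ
    γbar = (1 - α) * γ + α * μ
    # 3. evaluation point y_k
    y = retract(M, x, (α * γ / (γ + α * μ)) * inverse_retract(M, x, v))
    X = grad_f(M, y)
    # 4. new iterate x_{k+1}
    x_new = retract(M, y, -h * X)
    # 5. new interim point v_{k+1}
    v_new = retract(M, y, ((1 - α) * γ / γbar) * inverse_retract(M, y, v) - (α / γbar) * X)
    # 6. γ_{k+1}
    γ_new = γbar / (1 + β)
    return x_new, v_new, γ_new
end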

Constructor

Nesterov(x0::P, γ=0.001, μ=0.9, shrinkage = k -> 0.8;
inverse_retraction_method=LogarithmicInverseRetraction())

Initialize the Nesterov acceleration, where x0 initializes v.

source

Debug Actions

Manopt.DebugGradient – Type
DebugGradient <: DebugAction

debug for the gradient evaluated at the current iterate

Constructors

DebugGradient([long=false,p=print])

display the short (false) or long (true) default text for the gradient.

DebugGradient(prefix[, p=print])

display the gradient with a given prefix in front of it.

source
Manopt.DebugGradientNorm – Type
DebugGradientNorm <: DebugAction

debug for the norm of the gradient evaluated at the current iterate.

Constructors

DebugGradientNorm([long=false,p=print])

display the short (false) or long (true) default text for the gradient norm.

DebugGradientNorm(prefix[, p=print])

display the gradient norm with a given prefix in front of it.

source
Manopt.DebugStepsize – Type
DebugStepsize <: DebugAction

debug for the current step size.

Constructors

DebugStepsize([long=false,p=print])

display the short (false) or long (true) default text for the step size.

DebugStepsize(prefix[, p=print])

display the step size with a given prefix in front of it.

source
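
These debug actions are decorators (cf. decorate_options mentioned above), so they would typically be passed through the debug keyword of a solver; a hedged sketch, reusing the setup from the gradient_descent example:

# sketch: print iteration number, gradient norm and step size in every iteration
x_opt = gradient_descent(
    M, F, gradF, data[1];
    debug=[:Iteration, " | ", DebugGradientNorm(), " | ", DebugStepsize(), "\n"],
)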

Record Actions

• ZhangSra2018

H. Zhang, S. Sra: Towards Riemannian Accelerated Gradient Methods, preprint, 2018, arXiv:1806.02812.