Gradient Descent
Manopt.gradient_descent — Function

    gradient_descent(M, F, ∇F, x)

perform a gradient descent

$x_{k+1} = \operatorname{retr}_{x_k}\bigl( -s_k \nabla F(x_k) \bigr)$

with different choices of $s_k$ available (see the `stepsize` option below).
Input

- `M` – a manifold $\mathcal M$
- `F` – a cost function $F\colon\mathcal M\to\mathbb R$ to minimize
- `∇F` – the gradient $\nabla F\colon\mathcal M\to T\mathcal M$ of $F$
- `x` – an initial value $x ∈ \mathcal M$
Optional

- `stepsize` – (`ConstantStepsize(1.)`) specify a `Stepsize` functor.
- `retraction_method` – (`ExponentialRetraction()`) a retraction `(M, x, ξ)` to use.
- `stopping_criterion` – (`StopWhenAny(StopAfterIteration(200), StopWhenGradientNormLess(10.0^-8))`) a functor inheriting from `StoppingCriterion` indicating when to stop.
- `return_options` – (`false`) if activated, the extended result, i.e. the complete `Options`, is returned. This can be used to access recorded values. If set to `false` (default), just the optimal value `x_opt` is returned.

... and the ones that are passed to `decorate_options` for decorators.
Output

- `x_opt` – the resulting (approximately critical) point of gradient descent

OR

- `options` – the options returned by the solver (see `return_options`)
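As a usage sketch (assuming Manopt.jl is used together with Manifolds.jl; the three data points below are made up purely for illustration), the Riemannian center of mass of a few points on the sphere can be computed as follows:

```julia
using Manopt, Manifolds

M = Sphere(2)                       # the unit sphere S² ⊂ ℝ³
pts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # made-up data points

# Riemannian center of mass: F(x) = 1/(2n) ∑ d²(x, pᵢ) with gradient ∇F(x) = -1/n ∑ log_x(pᵢ)
F(x) = sum(distance(M, x, p)^2 for p in pts) / (2 * length(pts))
∇F(x) = -sum(log(M, x, p) for p in pts) / length(pts)

x0 = [1.0, 0.0, 0.0]                # initial value on M
x_opt = gradient_descent(
    M, F, ∇F, x0;
    stopping_criterion = StopWhenAny(
        StopAfterIteration(200), StopWhenGradientNormLess(10.0^-8)
    ),
)
```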
Manopt.gradient_descent! — Function

    gradient_descent!(M, F, ∇F, x)

perform a gradient descent

$x_{k+1} = \operatorname{retr}_{x_k}\bigl( -s_k \nabla F(x_k) \bigr)$

in place of `x`, with different choices of $s_k$ available.
Input

- `M` – a manifold $\mathcal M$
- `F` – a cost function $F\colon\mathcal M\to\mathbb R$ to minimize
- `∇F` – the gradient $\nabla F\colon\mathcal M\to T\mathcal M$ of $F$
- `x` – an initial value $x ∈ \mathcal M$
For more options, especially the choices of `Stepsize` for $s_k$, see `gradient_descent`.
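A corresponding in-place sketch, reusing `M`, `F`, and `∇F` from the example above; the starting point is overwritten with the result:

```julia
x = [1.0, 0.0, 0.0]             # mutable starting point on M
gradient_descent!(M, F, ∇F, x)
# x now holds the (approximately critical) point
```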
Options
Manopt.AbstractGradientDescentOptions — Type

    AbstractGradientDescentOptions <: Options

A generic `Options` type for gradient-based options data.
Manopt.GradientDescentOptions — Type

    GradientDescentOptions{P,T} <: AbstractGradientDescentOptions

Describes a gradient-based descent algorithm, with
Fields

A default value is given in brackets if a parameter can be left out in initialization.

- `x0` – a point (of type `P`) on a manifold as starting point
- `stopping_criterion` – (`StopAfterIteration(100)`) a `StoppingCriterion`
- `stepsize` – (`ConstantStepsize(1.)`) a `Stepsize`
- `direction` – (`IdentityUpdateRule`) a processor to compute the gradient
- `retraction_method` – (`ExponentialRetraction()`) the retraction to use, defaults to the exponential map
Constructor

    GradientDescentOptions(x, stop, s [, retr=ExponentialRetraction()])

construct a Gradient Descent Options object with the fields and defaults as above.
See also gradient_descent
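In most cases these options are created internally by `gradient_descent`, but they can also be instantiated directly. A minimal sketch following the positional constructor documented above (the starting point is an arbitrary example; keyword-based constructors may also exist depending on the Manopt.jl version):

```julia
using Manopt

x0 = [1.0, 0.0, 0.0]                # a point on, e.g., Sphere(2)
o = GradientDescentOptions(
    x0,
    StopAfterIteration(100),        # stopping_criterion default
    ConstantStepsize(1.),           # stepsize default
)
```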
Direction Update Rules

A field of the options is the `direction`, a `DirectionUpdateRule`, which by default, `IdentityUpdateRule`, just evaluates the gradient, but can be enhanced, for example, to one of the following rules.
Manopt.DirectionUpdateRule — Type

    DirectionUpdateRule

A general functor that handles direction update rules. Its field(s) is usually only a `StoreOptionsAction` by default, initialized to the fields required for the specific coefficient, but it can also be replaced by a (common, global) individual one that provides these values.
Manopt.IdentityUpdateRule — Type

    IdentityUpdateRule <: DirectionUpdateRule

The default gradient direction update is the identity, i.e. it just evaluates the gradient.
Manopt.MomentumGradient — Type

    MomentumGradient <: DirectionUpdateRule

Append a momentum to a gradient processor, where the last direction and last iterate are stored. The new direction is composed as

$\nabla_i = m \nabla_{i-1}' - s d_i$,

where $s d_i$ is the current (inner) direction and $\nabla_{i-1}'$ is the vector-transported last direction multiplied by the momentum $m$.
Fields

- `∇` – (`zero_tangent_vector(M,x0)`) the last gradient/direction update added as momentum
- `last_iterate` – remember the last iterate for parallel transporting the last direction
- `momentum` – (`0.2`) factor for momentum
- `direction` – internal `DirectionUpdateRule` to determine directions to add the momentum to.
- `vector_transport_method` – vector transport method to use
Constructors

    MomentumGradient(
        p::GradientProblem,
        x0,
        s::DirectionUpdateRule=Gradient();
        ∇=zero_tangent_vector(p.M, x0), momentum=0.2,
        vector_transport_method=ParallelTransport(),
    )

Add momentum to a gradient problem, where by default just a gradient evaluation is used. Equivalently, you can also use a `Manifold` `M` instead of the `GradientProblem` `p`.
    MomentumGradient(
        p::StochasticGradientProblem,
        x0,
        s::DirectionUpdateRule=StochasticGradient();
        ∇=zero_tangent_vector(p.M, x0), momentum=0.2,
        vector_transport_method=ParallelTransport(),
    )

Add momentum to a stochastic gradient problem, where by default just a stochastic gradient evaluation is used.
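A usage sketch, reusing `M`, `F`, `∇F`, and `x0` from the sphere example further above; it assumes that `gradient_descent` forwards a `direction` keyword into the corresponding `GradientDescentOptions` field (that keyword is an assumption, not part of the signature shown above):

```julia
m_rule = MomentumGradient(M, x0; momentum = 0.2)   # Manifold form of the constructor
x_opt = gradient_descent(M, F, ∇F, x0; direction = m_rule)
```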
Manopt.AverageGradient — Type

    AverageGradient <: DirectionUpdateRule

Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored; the average is taken after vector transporting them to the current iterate's tangent space.
Fields

- `gradients` – (`fill(zero_tangent_vector(M,x0),n)`) the last `n` gradient/direction updates
- `last_iterate` – last iterate (needed to transport the gradients)
- `direction` – internal `DirectionUpdateRule` to determine directions to apply the averaging to
- `vector_transport_method` – vector transport method to use
Constructors

    AverageGradient(
        p::GradientProblem,
        x0,
        n::Int=10,
        s::DirectionUpdateRule=IdentityUpdateRule();
        gradients = fill(zero_tangent_vector(p.M, x0), n),
        last_iterate = deepcopy(x0),
        vector_transport_method = ParallelTransport(),
    )

Add an average to a gradient problem, where `n` determines the size of averaging and `gradients` can be prefilled with some history. Equivalently, you can also use a `Manifold` `M` instead of the `GradientProblem` `p`.
    AverageGradient(
        p::StochasticGradientProblem,
        x0,
        n::Int=10,
        s::DirectionUpdateRule=StochasticGradient();
        gradients = fill(zero_tangent_vector(p.M, x0), n),
        last_iterate = deepcopy(x0),
        vector_transport_method = ParallelTransport(),
    )

Add an average to a stochastic gradient problem, where `n` determines the size of averaging and `gradients` can be prefilled with some history.
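An analogous sketch for averaging, under the same `direction`-keyword assumption as the momentum sketch above:

```julia
a_rule = AverageGradient(M, x0, 10)                # average over the last 10 directions
x_opt = gradient_descent(M, F, ∇F, x0; direction = a_rule)
```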
Manopt.Nesterov — Type

    Nesterov <: DirectionUpdateRule

Fields

- `γ`
- `μ` – the strong convexity coefficient
- `v` – (=$v_k$, $v_0=x_0$) an interim point to compute the next gradient evaluation point $y_k$
- `shrinkage` – (`i -> 0.8`) a function to compute the shrinkage $β_k$ per iterate.
Let's assume $f$ is $L$-Lipschitz and $μ$-strongly convex. Given

- a step size $h_k<\frac{1}{L}$ (from the GradientDescentOptions),
- a shrinkage parameter $β_k$,
- a current iterate $x_k$,
- as well as the interim values $γ_k$ and $v_k$ from the previous iterate,

this computes a Nesterov-type update using the following steps, see [ZhangSra2018]:

- Compute the positive root $α_k∈(0,1)$ of $α_k^2 = h_k\bigl((1-α_k)γ_k+α_kμ\bigr)$.
- Set $\barγ_{k+1} = (1-α_k)γ_k + α_kμ$.
- $y_k = \operatorname{retr}_{x_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{x_k}v_k \Bigr)$
- $x_{k+1} = \operatorname{retr}_{y_k}(-h_k ∇f(y_k))$
- $v_{k+1} = \operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\barγ_{k+1}}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\barγ_{k+1}}∇f(y_k) \Bigr)$
- $γ_{k+1} = \frac{1}{1+β_k}\barγ_{k+1}$

Then the direction from $x_k$ to $x_{k+1}$, i.e. $d = \operatorname{retr}^{-1}_{x_k}x_{k+1}$, is returned.
Constructor

    Nesterov(x0::P, γ=0.001, μ=0.9, shrinkage = k -> 0.8;
        inverse_retraction_method=LogarithmicInverseRetraction())

Initialize the Nesterov acceleration, where `x0` initializes `v`.
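A usage sketch under the same `direction`-keyword assumption as the momentum sketch above; the step size is chosen small here since the analysis above requires $h_k < \frac{1}{L}$:

```julia
n_rule = Nesterov(x0)              # v is initialized to x0, defaults for γ, μ, shrinkage as above
x_opt = gradient_descent(M, F, ∇F, x0; direction = n_rule, stepsize = ConstantStepsize(0.1))
```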
Debug Actions
Manopt.DebugGradient — Type

    DebugGradient <: DebugAction

debug for the gradient evaluated at the current iterate

Constructors

    DebugGradient([long=false, p=print])

display the short (`false`) or long (`true`) default text for the gradient.

    DebugGradient(prefix[, p=print])

display the `prefix` in front of the gradient.
Manopt.DebugGradientNorm — Type

    DebugGradientNorm <: DebugAction

debug for the norm of the gradient evaluated at the current iterate.

Constructors

    DebugGradientNorm([long=false, p=print])

display the short (`false`) or long (`true`) default text for the gradient norm.

    DebugGradientNorm(prefix[, p=print])

display the `prefix` in front of the gradient norm.
Manopt.DebugStepsize — Type

    DebugStepsize <: DebugAction

debug for the current step size.

Constructors

    DebugStepsize([long=false, p=print])

display the short (`false`) or long (`true`) default text for the step size.

    DebugStepsize(prefix[, p=print])

display the `prefix` in front of the step size.
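These actions can be combined through the `debug=` keyword, which `gradient_descent` passes on to `decorate_options` (see the solver's optional arguments). A sketch, reusing the sphere example from above:

```julia
x_opt = gradient_descent(
    M, F, ∇F, x0;
    # mix of symbols, strings, debug actions, and an Int (print only every 10th iteration)
    debug = [:Iteration, " ", DebugGradientNorm(), " ", DebugStepsize(), "\n", 10],
)
```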
Record Actions
Manopt.RecordGradient — Type

    RecordGradient <: RecordAction

record the gradient evaluated at the current iterate

Constructors

    RecordGradient(ξ)

initialize the `RecordAction` to the corresponding type of the tangent vector.
Manopt.RecordGradientNorm — Type

    RecordGradientNorm <: RecordAction

record the norm of the current gradient
Manopt.RecordStepsize — Type

    RecordStepsize <: RecordAction

record the step size
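A sketch of recording values during a run, again reusing the sphere example; with `return_options=true` the decorated `Options` are returned and the recorded values can then be accessed, e.g. via `get_record`:

```julia
o = gradient_descent(
    M, F, ∇F, x0;
    record = [:Iteration, RecordGradientNorm(), RecordStepsize()],
    return_options = true,
)
rec = get_record(o)   # per-iteration tuples of (iteration, gradient norm, step size)
```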
[ZhangSra2018] H. Zhang, S. Sra: Towards Riemannian Accelerated Gradient Methods, Preprint, 2018, arXiv:1806.02812.