Gradient descent
Manopt.gradient_descent
— Function
gradient_descent(M, f, grad_f, p=rand(M); kwargs...)
gradient_descent(M, gradient_objective, p=rand(M); kwargs...)
gradient_descent!(M, f, grad_f, p; kwargs...)
gradient_descent!(M, gradient_objective, p; kwargs...)
perform the gradient descent algorithm
\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr), \qquad k=0,1,…\]
where $s_k > 0$ denotes a step size. The algorithm can be performed in-place of p.
Input
- M::AbstractManifold: a Riemannian manifold $\mathcal M$
- f: a cost function $f: \mathcal M → ℝ$ implemented as (M, p) -> v
- grad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function (M, p) -> X or a function (M, X, p) -> X computing X in-place
- p: a point on the manifold $\mathcal M$
Alternatively to f and grad_f you can provide the corresponding AbstractManifoldGradientObjective gradient_objective directly.
Keyword arguments
- direction=IdentityUpdateRule(): specify a certain processing of the direction, for example Nesterov, MomentumGradient or AverageGradient.
- evaluation=AllocatingEvaluation(): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating their result (AllocatingEvaluation) or whether they modify their input argument to return the result therein (InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second. For example grad_f(M, p) allocates, but grad_f!(M, X, p) computes the result in-place of X.
- retraction_method=default_retraction_method(M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractions
- stepsize=default_stepsize(M, GradientDescentState): a functor inheriting from Stepsize to determine a step size
- stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-8): a functor indicating that the stopping criterion is fulfilled
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$ storing the gradient at the current iterate
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively.
If you provide the ManifoldGradientObjective directly, the evaluation= keyword is ignored. The decorations are still applied to the objective.
If you activate tutorial mode (cf. is_tutorial_mode), this solver provides additional debug warnings.
Output
The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
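For illustration, a minimal usage sketch (the manifold, target point q, cost, and gradient are illustrative assumptions, not part of the solver API): minimizing half the squared Riemannian distance to a fixed point q on the sphere, whose gradient is $-\operatorname{log}_p q$.

using Manopt, Manifolds

M = Sphere(2)
q = [1.0, 0.0, 0.0]                  # hypothetical target point
f(M, p) = distance(M, p, q)^2 / 2    # cost f: M → ℝ
grad_f(M, p) = -log(M, p, q)         # Riemannian gradient of f
p0 = [0.0, 0.0, 1.0]
p_star = gradient_descent(M, f, grad_f, p0)

# equivalently, wrap cost and gradient into an objective first
obj = ManifoldGradientObjective(f, grad_f)
p_star = gradient_descent(M, obj, p0)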
Manopt.gradient_descent!
— Function
gradient_descent!(M, f, grad_f, p; kwargs...)
gradient_descent!(M, gradient_objective, p; kwargs...)
The in-place variant of gradient_descent. It performs the same algorithm, with the same input and keyword arguments as documented above, but works in place of the point p and returns it.
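A short sketch of the in-place call, reusing M, f, grad_f, and p0 from the example above:

p = copy(M, p0)
gradient_descent!(M, f, grad_f, p)   # p is overwritten with the approximate minimizer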
State
Manopt.GradientDescentState
— Type
GradientDescentState{P,T} <: AbstractGradientSolverState
Describes the state of a gradient based descent algorithm.
Fields
- p::P: a point on the manifold $\mathcal M$ storing the current iterate
- X::T: a tangent vector at the point $p$ on the manifold $\mathcal M$ storing the gradient at the current iterate
- stop::StoppingCriterion: a functor indicating that the stopping criterion is fulfilled
- stepsize::Stepsize: a functor inheriting from Stepsize to determine a step size
- direction::DirectionUpdateRule: a processor to handle the obtained gradient and compute a direction to “walk into”
- retraction_method::AbstractRetractionMethod: a retraction $\operatorname{retr}$ to use, see the section on retractions
Constructor
GradientDescentState(M::AbstractManifold; kwargs...)
Initialize the gradient descent solver state, where
Input
- M::AbstractManifold: a Riemannian manifold $\mathcal M$
Keyword arguments
- direction=IdentityUpdateRule()
- p=rand(M): a point on the manifold $\mathcal M$ to specify the initial value
- stopping_criterion=StopAfterIteration(100): a functor indicating that the stopping criterion is fulfilled
- stepsize=default_stepsize(M, GradientDescentState; retraction_method=retraction_method): a functor inheriting from Stepsize to determine a step size
- retraction_method=default_retraction_method(M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractions
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$ to specify the representation of a tangent vector
See also gradient_descent
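A minimal construction sketch using the keywords documented above (the stopping criterion values are illustrative); in typical use this state is created internally by gradient_descent:

using Manopt, Manifolds

M = Sphere(2)
gds = GradientDescentState(M;
    p=rand(M),
    stopping_criterion=StopAfterIteration(100) | StopWhenGradientNormLess(1e-7),
)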
Direction update rules
A field of the options is the direction, a DirectionUpdateRule, which by default is the IdentityUpdateRule that just evaluates the gradient, but it can be enhanced for example to
Manopt.AverageGradient
— Function
AverageGradient(; kwargs...)
AverageGradient(M::AbstractManifold; kwargs...)
Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored; the average is taken after vector transporting them to the current iterate's tangent space.
Input
- M (optional)
Keyword arguments
- p=rand(M): a point on the manifold $\mathcal M$ to specify the initial value
- direction=IdentityUpdateRule: preprocess the actual gradient before averaging
- gradients=[zero_vector(M, p) for _ in 1:n]: how to initialise the internal storage
- n=10: number of gradient evaluations to take the mean over
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$
- vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
This function generates a ManifoldDefaultsFactory for AverageGradientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold from, for example, a corresponding AbstractManoptSolverState is available.
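A hedged usage sketch, passing the factory via the direction= keyword of gradient_descent (M, f, grad_f, and p0 as in the first example above; n=10 averages the last ten gradients):

p_avg = gradient_descent(M, f, grad_f, p0; direction=AverageGradient(; n=10))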
Manopt.DirectionUpdateRule
— Type
DirectionUpdateRule
A general functor that handles direction update rules. Its fields are usually only a StoreStateAction, by default initialized to the fields required for the specific coefficient, but this can also be replaced by a (common, global) individual one that provides these values.
Manopt.IdentityUpdateRule
— Type
IdentityUpdateRule <: DirectionUpdateRule
The default gradient direction update is the identity; usually it just evaluates the gradient.
You can also use Gradient() to create the corresponding factory, though this only delays this parameter-free instantiation to later.
Manopt.MomentumGradient
— Function
MomentumGradient()
Append a momentum to a gradient processor, where the last direction and last iterate are stored and the new direction is composed as $η_i = m*η_{i-1}' - s d_i$, where $s d_i$ is the current (inner) direction and $η_{i-1}'$ is the vector-transported last direction multiplied by the momentum $m$.
Input
- M (optional)
Keyword arguments
- p=rand(M): a point on the manifold $\mathcal M$
- direction=IdentityUpdateRule: preprocess the actual gradient before adding momentum
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$
- momentum=0.2: amount of momentum to use
- vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
This function generates a ManifoldDefaultsFactory for MomentumGradientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold from, for example, a corresponding AbstractManoptSolverState is available.
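A usage sketch (the momentum value is illustrative; M, f, grad_f, and p0 as in the first example above):

p_mom = gradient_descent(M, f, grad_f, p0; direction=MomentumGradient(; momentum=0.5))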
Manopt.Nesterov
— Function
Nesterov(; kwargs...)
Nesterov(M::AbstractManifold; kwargs...)
Assume $f$ is $L$-Lipschitz and $μ$-strongly convex. Given
- a step size $h_k<\frac{1}{L}$ (from the GradientDescentState)
- a shrinkage parameter $β_k$
- a current iterate $p_k$
- as well as the interim values $γ_k$ and $v_k$ from the previous iterate,
this computes a Nesterov-type update using the following steps, see [ZS18]:
- Compute the positive root $α_k∈(0,1)$ of $α^2 = h_k\bigl((1-α_k)γ_k+α_k μ\bigr)$.
- Set $\barγ_{k+1} = (1-α_k)γ_k + α_kμ$.
- $y_k = \operatorname{retr}_{p_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{p_k}v_k \Bigr)$
- $x_{k+1} = \operatorname{retr}_{y_k}(-h_k \operatorname{grad}f(y_k))$
- $v_{k+1} = \operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\barγ_{k+1}}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\barγ_{k+1}}\operatorname{grad}f(y_k) \Bigr)$
- $γ_{k+1} = \frac{1}{1+β_k}\barγ_{k+1}$
Then the direction from $p_k$ to $p_{k+1}$, that is $d = \operatorname{retr}^{-1}_{p_k}p_{k+1}$, is returned.
Input
- M (optional)
Keyword arguments
- p=rand(M): a point on the manifold $\mathcal M$ to specify the initial value
- γ=0.001
- μ=0.9
- shrinkage = k -> 0.8
- inverse_retraction_method=default_inverse_retraction_method(M, typeof(p)): an inverse retraction $\operatorname{retr}^{-1}$ to use, see the section on retractions and their inverses
This function generates a ManifoldDefaultsFactory for NesterovRule. For default values that depend on the manifold, this factory postpones the construction until the manifold from, for example, a corresponding AbstractManoptSolverState is available.
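A usage sketch with the documented keywords (the values are illustrative; M, f, grad_f, and p0 as in the first example above):

p_nes = gradient_descent(M, f, grad_f, p0; direction=Nesterov(; γ=0.001, μ=0.9))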
which internally use the ManifoldDefaultsFactory and produce the internal elements
Manopt.AverageGradientRule
— Type
AverageGradientRule <: DirectionUpdateRule
Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored. The average is taken after vector transporting them to the current iterate's tangent space.
Fields
- gradients: the last n gradient/direction updates
- last_iterate: the last iterate (needed to transport the gradients)
- direction: internal DirectionUpdateRule to determine directions to apply the averaging to
- vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Constructors
AverageGradientRule(
    M::AbstractManifold;
    p=rand(M),
    n::Int=10,
    direction::Union{<:DirectionUpdateRule,ManifoldDefaultsFactory}=IdentityUpdateRule(),
    gradients = fill(zero_vector(M, p), n),
    last_iterate = deepcopy(p),
    vector_transport_method = default_vector_transport_method(M, typeof(p)),
)
Add average to a gradient problem, where
- n: determines the size of averaging
- direction: is the internal DirectionUpdateRule to determine the gradients to store
- gradients: can be pre-filled with some history
- last_iterate: stores the last iterate
- vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Manopt.ConjugateDescentCoefficientRule
— Type
ConjugateDescentCoefficientRule <: DirectionUpdateRule
A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient adapted to manifolds.
See also conjugate_gradient_descent
Constructor
ConjugateDescentCoefficientRule()
Construct the conjugate descent coefficient update rule; a new storage is created by default.
Manopt.MomentumGradientRule
— Type
MomentumGradientRule <: DirectionUpdateRule
Store the necessary information to compute the MomentumGradient direction update.
Fields
- p_old::P: a point on the manifold $\mathcal M$
- momentum::Real: factor for the momentum
- direction: internal DirectionUpdateRule to determine directions to add the momentum to
- vector_transport_method::AbstractVectorTransportMethod: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
- X_old::T: a tangent vector at the point $p$ on the manifold $\mathcal M$
Constructors
MomentumGradientRule(M::AbstractManifold; kwargs...)
Initialize a momentum gradient rule to s, where p and X are memory for interim values.
Keyword arguments
- p=rand(M): a point on the manifold $\mathcal M$
- s=IdentityUpdateRule()
- momentum=0.2
- vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$
See also MomentumGradient
Manopt.NesterovRule
— Type
NesterovRule <: DirectionUpdateRule
Compute a Nesterov-inspired direction update rule. See Nesterov for details.
Fields
- γ::Real, μ::Real: coefficients from the last iterate
- v::P: an interim point to compute the next gradient evaluation point y_k
- shrinkage: a function k -> ... to compute the shrinkage $β_k$ per iterate k
- inverse_retraction_method=default_inverse_retraction_method(M, typeof(p)): an inverse retraction $\operatorname{retr}^{-1}$ to use, see the section on retractions and their inverses
Constructor
NesterovRule(M::AbstractManifold; kwargs...)
Keyword arguments
- p=rand(M): a point on the manifold $\mathcal M$ to specify the initial value
- γ=0.001
- μ=0.9
- shrinkage = k -> 0.8
- inverse_retraction_method=default_inverse_retraction_method(M, typeof(p)): an inverse retraction $\operatorname{retr}^{-1}$ to use, see the section on retractions and their inverses
See also Nesterov
Debug actions
Manopt.DebugGradient
— Type
DebugGradient <: DebugAction
debug for the gradient evaluated at the current iterate
Constructors
DebugGradient(; long=false, prefix= , format= "$prefix%s", io=stdout)
display the short (false) or long (true) default text for the gradient, or set the prefix manually. Alternatively the complete format can be set.
Manopt.DebugGradientNorm
— Type
DebugGradientNorm <: DebugAction
debug for the norm of the gradient evaluated at the current iterate.
Constructors
DebugGradientNorm([long=false, p=print])
display the short (false) or long (true) default text for the gradient norm.
DebugGradientNorm(prefix[, p=print])
display a prefix in front of the gradient norm.
Manopt.DebugStepsize
— Type
DebugStepsize <: DebugAction
debug for the current step size.
Constructors
DebugStepsize(;long=false, prefix="step size:", format="$prefix%s", io=stdout)
display a prefix in front of the step size.
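A sketch combining these actions via the debug= keyword of gradient_descent (the symbols :Iteration and :Stop and the trailing integer, which prints only every 10th iteration, follow Manopt's debug vector convention; M, f, grad_f, and p0 as in the first example above):

gradient_descent(M, f, grad_f, p0;
    debug=[:Iteration, DebugGradientNorm(), DebugStepsize(), :Stop, 10],
)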
Record actions
Manopt.RecordGradient
— Type
RecordGradient <: RecordAction
record the gradient evaluated at the current iterate
Constructors
RecordGradient(ξ)
initialize the RecordAction to the corresponding type of the tangent vector.
Manopt.RecordGradientNorm
— Type
RecordGradientNorm <: RecordAction
record the norm of the current gradient
Manopt.RecordStepsize
— Type
RecordStepsize <: RecordAction
record the step size
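A sketch of recording per-iteration values via the record= keyword; with return_state=true the full solver state is returned, so the record can then be queried with get_record (M, f, grad_f, and p0 as in the first example above):

res = gradient_descent(M, f, grad_f, p0;
    record=[:Iteration, RecordGradientNorm(), RecordStepsize()],
    return_state=true,
)
values = get_record(res)   # recorded iterations, gradient norms, and step sizes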
Technical details
The gradient_descent solver requires the following functions of a manifold to be available:
- A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
- By default gradient descent uses ArmijoLinesearch, which requires max_stepsize(M) to be set and an implementation of inner(M, p, X).
- By default the stopping criterion uses the norm as well, to stop when the norm of the gradient is small; but if you implemented inner, the norm is provided already.
- By default the tangent vector storing the gradient is initialized by calling zero_vector(M, p).
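If a different retraction is preferred, it can also be passed explicitly; a small sketch (ProjectionRetraction from ManifoldsBase is used here only as an assumed-available example for the sphere; M, f, grad_f, and p0 as in the first example above):

gradient_descent(M, f, grad_f, p0; retraction_method=ProjectionRetraction())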
Literature
- [Lue72] D. G. Luenberger. The gradient projection method along geodesics. Management Science 18, 620–631 (1972).
- [ZS18] H. Zhang and S. Sra. Towards Riemannian accelerated gradient methods. arXiv preprint arXiv:1806.02812 (2018).