# Stepsize and line search

Most iterative algorithms determine a direction along which the algorithm shall proceed and determine a step size to find the next iterate. How advanced the step size computation can be implemented depends (among others) on the properties the corresponding problem provides.

Within `Manopt.jl`

, the step size determination is implemented as a `functor`

which is a subtype of `Stepsize`

based on

`Manopt.Stepsize`

— Type`Stepsize`

An abstract type for the functors representing step sizes. These are callable structures. The naming scheme is `TypeOfStepSize`

, for example `ConstantStepsize`

.

Every Stepsize has to provide a constructor and its function has to have the interface `(p,o,i)`

where a `AbstractManoptProblem`

as well as `AbstractManoptSolverState`

and the current number of iterations are the arguments and returns a number, namely the stepsize to use.

**See also**

Usually, a constructor should take the manifold `M`

as its first argument, for consistency, to allow general step size functors to be set up based on default values that might depend on the manifold currently under consideration.

Currently, the following step sizes are available

`Manopt.AdaptiveWNGradient`

— Type`AdaptiveWNGradient <: DirectionUpdateRule`

Represent an adaptive gradient method introduced by [GS23].

Given a positive threshold $\hat c \mathbb N$, an minimal bound $b_{\mathrm{min}} > 0$, an initial $b_0 ≥ b_{\mathrm{min}}$, and a gradient reduction factor threshold $\alpha ∈ [0,1)$.

Set $c_0=0$ and use $\omega_0 = \lVert \operatorname{grad} f(p_0) \rvert_{p_0}$.

For the first iterate use the initial step size $s_0 = \frac{1}{b_0}$.

Then, given the last gradient $X_{k-1} = \operatorname{grad} f(x_{k-1})$, and a previous $\omega_{k-1}$, the values $(b_k, \omega_k, c_k)$ are computed using $X_k = \operatorname{grad} f(p_k)$ and the following cases

If $\lVert X_k \rVert_{p_k} \leq \alpha\omega_{k-1}$, then let $\hat b_{k-1} ∈ [b_\mathrm{min},b_{k-1}]$ and set

\[(b_k, \omega_k, c_k) = \begin{cases} \bigl(\hat b_{k-1}, \lVert X_k\rVert_{p_k}, 0 \bigr) & \text{ if } c_{k-1}+1 = \hat c\\ \Bigl(b_{k-1} + \frac{\lVert X_k\rVert_{p_k}^2}{b_{k-1}}, \omega_{k-1}, c_{k-1}+1 \Bigr) & \text{ if } c_{k-1}+1<\hat c \end{cases}\]

If $\lVert X_k \rVert_{p_k} > \alpha\omega_{k-1}$, the set

\[(b_k, \omega_k, c_k) = \Bigl( b_{k-1} + \frac{\lVert X_k\rVert_{p_k}^2}{b_{k-1}}, \omega_{k-1}, 0)\]

and return the step size $s_k = \frac{1}{b_k}$.

Note that for $α=0$ this is the Riemannian variant of `WNGRad`

.

**Fields**

`count_threshold::Int`

: (`4`

) an`Integer`

for $\hat c$`minimal_bound::Float64`

: (`1e-4`

) for $b_{\mathrm{min}}$`alternate_bound::Function`

: (`(bk, hat_c) -> min(gradient_bound, max(gradient_bound, bk/(3*hat_c)`

) how to determine $\hat b_k$ as a function of`(bmin, bk, hat_c) -> hat_bk`

`gradient_reduction::Float64`

: (`0.9`

)`gradient_bound`

`norm(M, p0, grad_f(M,p0))`

the bound $b_k$.

as well as the internal fields

`weight`

for $ω_k$ initialised to $ω_0 =$`norm(M, p0, grad_f(M,p0))`

if this is not zero,`1.0`

otherwise.`count`

for the $c_k$, initialised to $c_0 = 0$.

**Constructor**

`AdaptiveWNGrad(M=DefaultManifold, grad_f=(M, p) -> zero_vector(M, rand(M)), p=rand(M); kwargs...)`

Where all fields with defaults are keyword arguments and additional keyword arguments are

`adaptive`

: (`true`

) switches the`gradient_reduction`

`α`

`to`

0`.`evaluation`

: (`AllocatingEvaluation()`

) specifies whether the gradient (that is used for initialisation only) is mutating or allocating

`Manopt.ArmijoLinesearch`

— Type`ArmijoLinesearch <: Linesearch`

A functor representing Armijo line search including the last runs state string the last stepsize.

**Fields**

`initial_stepsize`

: (`1.0`

) and initial step size`retraction_method`

: (`default_retraction_method(M)`

) the retraction to use`contraction_factor`

: (`0.95`

) exponent for line search reduction`sufficient_decrease`

: (`0.1`

) gain within Armijo's rule`last_stepsize`

: (`initialstepsize`

) the last step size to start the search with`initial_guess`

: (`(p,s,i,l) -> l`

) based on a`AbstractManoptProblem`

`p`

,`AbstractManoptSolverState`

`s`

and a current iterate`i`

and a last step size`l`

, this returns an initial guess. The default uses the last obtained stepsize

as well as for internal use

`candidate_point`

: (`allocate_result(M, rand)`

) to store an interim result

Furthermore the following fields act as safeguards

`stop_when_stepsize_less`

: (`0.0`

) smallest stepsize when to stop (the last one before is taken)`stop_when_stepsize_exceeds`

: (`max_stepsize`

`(M, p)`

) largest stepsize when to stop.`stop_increasing_at_step`

: (`100`

) last step to increase the stepsize (phase 1),`stop_decreasing_at_step`

: (`1000`

) last step size to decrease the stepsize (phase 2),

Pass `:Messages`

to a `debug=`

to see `@info`

s when these happen.

**Constructor**

`ArmijoLinesearch(M=DefaultManifold())`

with the fields keyword arguments and the retraction is set to the default retraction on `M`

.

The constructors return the functor to perform Armijo line search, where

`(a::ArmijoLinesearch)(amp::AbstractManoptProblem, ams::AbstractManoptSolverState, i)`

of a `AbstractManoptProblem`

`amp`

, `AbstractManoptSolverState`

`ams`

and a current iterate `i`

with keywords.

**Keyword arguments**

`candidate_point`

: (`allocate_result(M, rand)`

) to pass memory for the candidate point`η`

: (`-get_gradient(mp, get_iterate(s));`

) the search direction to use,

by default the steepest descent direction.

`Manopt.ConstantStepsize`

— Type`ConstantStepsize <: Stepsize`

A functor that always returns a fixed step size.

**Fields**

`length`

: constant value for the step size`type`

: a symbol that indicates whether the stepsize is relatively (:relative), with respect to the gradient norm, or absolutely (:absolute) constant.

**Constructors**

`ConstantStepsize(s::Real, t::Symbol=:relative)`

initialize the stepsize to a constant `s`

of type `t`

.

```
ConstantStepsize(M::AbstractManifold=DefaultManifold(2);
stepsize=injectivity_radius(M)/2, type::Symbol=:relative
)
```

initialize the stepsize to a constant `stepsize`

, which by default is half the injectivity radius, unless the radius is infinity, then the default step size is `1`

.

`Manopt.DecreasingStepsize`

— Type`DecreasingStepsize()`

A functor that represents several decreasing step sizes

**Fields**

`exponent`

: (`1`

) a value $e$ the current iteration numbers $e$th exponential is taken of`factor`

: (`1`

) a value $f$ to multiply the initial step size with every iteration`length`

: (`1`

) the initial step size $l$.`subtrahend`

: (`0`

) a value $a$ that is subtracted every iteration`shift`

: (`0`

) shift the denominator iterator $i$ by $s$`.`type`

: a symbol that indicates whether the stepsize is relatively (:relative), with respect to the gradient norm, or absolutely (:absolute) constant.

In total the complete formulae reads for the $i$th iterate as

\[s_i = \frac{(l - i a)f^i}{(i+s)^e}\]

and hence the default simplifies to just $s_i = \frac{l}{i}$

**Constructor**

`DecreasingStepsize(l=1,f=1,a=0,e=1,s=0,type=:relative)`

Alternatively one can also use the following keyword.

```
DecreasingStepsize(
M::AbstractManifold=DefaultManifold(3);
length=injectivity_radius(M)/2, multiplier=1.0, subtrahend=0.0,
exponent=1.0, shift=0, type=:relative
)
```

initializes all fields, where none of them is mandatory and the length is set to half and to $1$ if the injectivity radius is infinite.

`Manopt.Linesearch`

— Type`Linesearch <: Stepsize`

An abstract functor to represent line search type step size determinations, see `Stepsize`

for details. One example is the `ArmijoLinesearch`

functor.

Compared to simple step sizes, the line search functors provide an interface of the form `(p,o,i,η) -> s`

with an additional (but optional) fourth parameter to provide a search direction; this should default to something reasonable, most prominently the negative gradient.

`Manopt.NonmonotoneLinesearch`

— Type`NonmonotoneLinesearch <: Linesearch`

A functor representing a nonmonotone line search using the Barzilai-Borwein step size [IP17]. Together with a gradient descent algorithm this line search represents the Riemannian Barzilai-Borwein with nonmonotone line-search (RBBNMLS) algorithm. The order is shifted in comparison of the algorithm steps from the paper by Iannazzo and Porcelli so that in each iteration this line search first finds

\[y_{k} = \operatorname{grad}F(x_{k}) - \operatorname{T}_{x_{k-1} → x_k}(\operatorname{grad}F(x_{k-1}))\]

and

\[s_{k} = - α_{k-1} * \operatorname{T}_{x_{k-1} → x_k}(\operatorname{grad}F(x_{k-1})),\]

where $α_{k-1}$ is the step size computed in the last iteration and $\operatorname{T}$ is a vector transport. Then the Barzilai—Borwein step size is

\[α_k^{\text{BB}} = \begin{cases} \min(α_{\text{max}}, \max(α_{\text{min}}, τ_{k})), & \text{if } ⟨s_{k}, y_{k}⟩_{x_k} > 0,\\ α_{\text{max}}, & \text{else,} \end{cases}\]

where

\[τ_{k} = \frac{⟨s_{k}, s_{k}⟩_{x_k}}{⟨s_{k}, y_{k}⟩_{x_k}},\]

if the direct strategy is chosen,

\[τ_{k} = \frac{⟨s_{k}, y_{k}⟩_{x_k}}{⟨y_{k}, y_{k}⟩_{x_k}},\]

in case of the inverse strategy and an alternation between the two in case of the alternating strategy. Then find the smallest $h = 0, 1, 2, …$ such that

\[F(\operatorname{retr}_{x_k}(- σ^h α_k^{\text{BB}} \operatorname{grad}F(x_k))) \leq \max_{1 ≤ j ≤ \min(k+1,m)} F(x_{k+1-j}) - γ σ^h α_k^{\text{BB}} ⟨\operatorname{grad}F(x_k), \operatorname{grad}F(x_k)⟩_{x_k},\]

where $σ$ is a step length reduction factor $∈ (0,1)$, $m$ is the number of iterations after which the function value has to be lower than the current one and $γ$ is the sufficient decrease parameter $∈(0,1)$. Then find the new stepsize by

\[α_k = σ^h α_k^{\text{BB}}.\]

**Fields**

`initial_stepsize`

: (`1.0`

) the step size to start the search with`memory_size`

: (`10`

) number of iterations after which the cost value needs to be lower than the current one`bb_min_stepsize`

: (`1e-3`

) lower bound for the Barzilai-Borwein step size greater than zero`bb_max_stepsize`

: (`1e3`

) upper bound for the Barzilai-Borwein step size greater than min_stepsize`retraction_method`

: (`ExponentialRetraction()`

) the retraction to use`strategy`

: (`direct`

) defines if the new step size is computed using the direct, indirect or alternating strategy`storage`

: (for`:Iterate`

and`:Gradient`

) a`StoreStateAction`

`stepsize_reduction`

: (`0.5`

) step size reduction factor contained in the interval (0,1)`sufficient_decrease`

: (`1e-4`

) sufficient decrease parameter contained in the interval (0,1)`vector_transport_method`

: (`ParallelTransport()`

) the vector transport method to use

as well as for internal use

`candidate_point`

: (`allocate_result(M, rand)`

) to store an interim result

Furthermore the following fields act as safeguards

`stop_when_stepsize_less: (`

0.0`) smallest stepsize when to stop (the last one before is taken)`stop_when_stepsize_exceeds`

: (`max_stepsize`

`(M, p)`

) largest stepsize when to stop.`stop_increasing_at_step`

: (^100`) last step to increase the stepsize (phase 1),`stop_decreasing_at_step`

: (`1000`

) last step size to decrease the stepsize (phase 2),

Pass `:Messages`

to a `debug=`

to see `@info`

s when these happen.

**Constructor**

`NonmonotoneLinesearch()`

with the fields their order as optional arguments (deprecated). THis is deprecated, since both defaults and the memory allocation for the candidate do not take into account which manifold the line search operates on.

`NonmonotoneLinesearch(M)`

with the fields as keyword arguments and where the retraction and vector transport are set to the default ones on `M`

, respectively.

The constructors return the functor to perform nonmonotone line search.

`Manopt.PolyakStepsize`

— Type`PolyakStepsize <: Stepsize`

Compute a step size

\[α_i = \frac{f(p^{(i)}) - f_{\text{best}}+γ_i}{\lVert{ ∂f(p^{(i)})} \rVert^2},\]

where $f_{\text{best}}$ is the best cost value seen until the current iteration, and $γ_i$ is a sequence of numbers that is square summable, but not summable (the sum must diverge to infinity). Furthermore $∂f$ denotes a subgradient of $f$ at the current iterate $p^{(i)}$.

The step size is an adaption from the Dynamic step size discussed in Section 3.2 of [Ber15], both to the Riemannian case and to approximate the minimum cost value.

**Fields**

`γ`

: a function`i -> ...`

representing the sequence.`best_cost_value`

: storing the value`f_{\text{best}}`

**Constructor**

```
PolyakStepsize(;
γ = i -> 1/i,
initial_cost_estimate=0.0
)
```

initialize the Polyak stepsize to a certain sequence and an initial estimate of $f_{\text{best}}$

`Manopt.WolfePowellBinaryLinesearch`

— Type`WolfePowellBinaryLinesearch <: Linesearch`

A `Linesearch`

method that determines a step size `t`

fulfilling the Wolfe conditions

based on a binary chop. Let $η$ be a search direction and $c1,c_2>0$ be two constants. Then with

\[A(t) = f(x_+) ≤ c1 t ⟨\operatorname{grad}f(x), η⟩_{x} \quad\text{and}\quad W(t) = ⟨\operatorname{grad}f(x_+), \text{V}_{x_+\gets x}η⟩_{x_+} ≥ c_2 ⟨η, \operatorname{grad}f(x)⟩_x,\]

where $x_+ = \operatorname{retr}_x(tη)$ is the current trial point, and $\text{V}$ is a vector transport. Then the following Algorithm is performed similar to Algorithm 7 from [Hua14]

- set $α=0$, $β=∞$ and $t=1$.
- While either $A(t)$ does not hold or $W(t)$ does not hold do steps 3-5.
- If $A(t)$ fails, set $β=t$.
- If $A(t)$ holds but $W(t)$ fails, set $α=t$.
- If $β<∞$ set $t=\frac{α+β}{2}$, otherwise set $t=2α$.

**Constructors**

There exist two constructors, where, when provided the manifold `M`

as a first (optional) parameter, its default retraction and vector transport are the default. In this case the retraction and the vector transport are also keyword arguments for ease of use. The other constructor is kept for backward compatibility.

```
WolfePowellLinesearch(
M=DefaultManifold(),
c1::Float64=10^(-4),
c2::Float64=0.999;
retraction_method = default_retraction_method(M),
vector_transport_method = default_vector_transport(M),
linesearch_stopsize = 0.0
)
```

`Manopt.WolfePowellLinesearch`

— Type`WolfePowellLinesearch <: Linesearch`

Do a backtracking line search to find a step size $α$ that fulfils the Wolfe conditions along a search direction $η$ starting from $x$ by

\[f\bigl( \operatorname{retr}_x(αη) \bigr) ≤ f(x_k) + c_1 α_k ⟨\operatorname{grad}f(x), η⟩_x \quad\text{and}\quad \frac{\mathrm{d}}{\mathrm{d}t} f\bigr(\operatorname{retr}_x(tη)\bigr) \Big\vert_{t=α} ≥ c_2 \frac{\mathrm{d}}{\mathrm{d}t} f\bigl(\operatorname{retr}_x(tη)\bigr)\Big\vert_{t=0}.\]

**Constructors**

There exist two constructors, where, when provided the manifold `M`

as a first (optional) parameter, its default retraction and vector transport are the default. In this case the retraction and the vector transport are also keyword arguments for ease of use. The other constructor is kept for backward compatibility. Note that the `stop_when_stepsize_less`

to stop for too small stepsizes is only available in the new signature including `M`

.

`WolfePowellLinesearch(M, c1::Float64=10^(-4), c2::Float64=0.999; kwargs...`

Generate a Wolfe-Powell line search

**Keyword arguments**

`candidate_point`

: (`allocate_result(M, rand)`

) memory for a candidate`candidate_tangent`

: (`allocate_result(M, zero_vector, candidate_point)`

) memory for a gradient`candidate_direcntion`

: (`allocate_result(M, zero_vector, candidate_point)`

) memory for a direction`max_stepsize`

: (`max_stepsize`

`(M, p)`

) largest stepsize allowed here.`retraction_method`

: (`ExponentialRetraction()`

) the retraction to use`stop_when_stepsize_less`

: (`0.0`

) smallest stepsize when to stop (the last one before is taken)`vector_transport_method`

: (`ParallelTransport()`

) the vector transport method to use

`Manopt.default_stepsize`

— Method`default_stepsize(M::AbstractManifold, ams::AbstractManoptSolverState)`

Returns the default `Stepsize`

functor used when running the solver specified by the `AbstractManoptSolverState`

`ams`

running with an objective on the `AbstractManifold M`

.

`Manopt.get_last_stepsize`

— Method`get_last_stepsize(amp::AbstractManoptProblem, ams::AbstractManoptSolverState, vars...)`

return the last computed stepsize stored within `AbstractManoptSolverState`

`ams`

when solving the `AbstractManoptProblem`

`amp`

.

This method takes into account that `ams`

might be decorated. In case this returns `NaN`

, a concrete call to the stored stepsize is performed. For this, usually, the first of the `vars...`

should be the current iterate.

`Manopt.get_last_stepsize`

— Method`get_last_stepsize(::Stepsize, vars...)`

return the last computed stepsize from within the stepsize. If no last step size is stored, this returns `NaN`

.

`Manopt.get_stepsize`

— Method`get_stepsize(amp::AbstractManoptProblem, ams::AbstractManoptSolverState, vars...)`

return the stepsize stored within `AbstractManoptSolverState`

`ams`

when solving the `AbstractManoptProblem`

`amp`

. This method also works for decorated options and the `Stepsize`

function within the options, by default stored in `ams.stepsize`

.

`Manopt.linesearch_backtrack!`

— Method`(s, msg) = linesearch_backtrack!(M, q, F, p, X, s, decrease, contract η = -X, f0 = f(p))`

Perform a line search backtrack in-place of `q`

. For all details and options, see `linesearch_backtrack`

`Manopt.linesearch_backtrack`

— Method```
(s, msg) = linesearch_backtrack(M, F, p, X, s, decrease, contract η = -X, f0 = f(p))
(s, msg) = linesearch_backtrack!(M, q, F, p, X, s, decrease, contract η = -X, f0 = f(p))
```

perform a line search

- on manifold
`M`

- for the cost function
`f`

, - at the current point
`p`

- with current gradient provided in
`X`

- an initial stepsize
`s`

- a sufficient
`decrease`

- a
`contract`

ion factor $σ$ - a
`retr`

action, which defaults to the`default_retraction_method(M)`

- a search direction $η = -X$
- an offset, $f_0 = F(x)$

the method can also be performed in-place of `q`

, that is the resulting best point one reaches with the step size `s`

as second argument.

**Keywords**

`retraction_method`

: (`default_retraction_method(M)`

) the retraction to use.`stop_when_stepsize_less`

: (`0.0`

) to avoid numerical underflow`stop_when_stepsize_exceeds`

: (`max_stepsize`

`(M, p) / norm(M, p, η)`

) to avoid leaving the injectivity radius on a manifold`stop_increasing_at_step`

: (`100`

) stop the initial increase of step size after these many steps`stop_decreasing_at_step`

: (`1000`

) stop the decreasing search after these many steps

These keywords are used as safeguards, where only the max stepsize is a very manifold specific one.

**Return value**

A stepsize `s`

and a message `msg`

(in case any of the 4 criteria hit)

`Manopt.max_stepsize`

— Method```
max_stepsize(M::AbstractManifold, p)
max_stepsize(M::AbstractManifold)
```

Get the maximum stepsize (at point `p`

) on manifold `M`

. It should be used to limit the distance an algorithm is trying to move in a single step.

## Literature

- [Ber15]
- D. P. Bertsekas.
*Convex Optimization Algorithms*(Athena Scientific, 2015); p. 576. - [GS23]
- G. N. Grapiglia and G. F. Stella.
*An Adaptive Riemannian Gradient Method Without Function Evaluations*. Journal of Optimization Theory and Applications**197**, 1140–1160 (2023). - [Hua14]
- W. Huang.
*Optimization algorithms on Riemannian manifolds with applications*. Ph.D. Thesis, Flordia State University (2014). - [IP17]
- B. Iannazzo and M. Porcelli.
*The Riemannian Barzilai–Borwein method with nonmonotone line search and the matrix geometric mean computation*. IMA Journal of Numerical Analysis**38**, 495–517 (2017).