# Illustration how to use mutating gradient functions

When it comes to time critital operations, a main ingredient in Julia are mutating functions, i.e. those that compute in place without additional Memory allocations. In the following the illustrate how to do this with Manopt.jl.

Let's start with the same function as in Get Started: Optimize! and compute the mean of some points. Just that here we use the sphere $\mathbb S^{30}$ and n=800 points.

From the just mentioned example, the implementation looks like

using Manopt, Manifolds, Random, BenchmarkTools
begin
Random.seed!(42)
m = 30
M = Sphere(m)
n = 800
σ = π / 8
x = zeros(Float64, m + 1)
x[2] = 1.0
data = [exp(M, x, random_tangent(M, x, Val(:Gaussian), σ)) for i in 1:n]
end;

## Classical definition

The variant from the previous tutorial defines a cost $F(x)$ and its gradient $gradF(x)$

F(M, x) = sum(1 / (2 * n) * distance.(Ref(M), Ref(x), data) .^ 2)
F (generic function with 1 method)
gradF(M, x) = sum(1 / n * grad_distance.(Ref(M), data, Ref(x)))
gradF (generic function with 1 method)

we further set the stopping criterion to be a little more strict, then we obtain

begin
x0 = random_point(M)
@benchmark gradient_descent($M,$F, $gradF,$x0; stopping_criterion=$sc) end BenchmarkTools.Trial: 413 samples with 1 evaluation. Range (min … max): 6.059 ms … 47.352 ms ┊ GC (min … max): 0.00% … 76.55% Time (median): 11.801 ms ┊ GC (median): 0.00% Time (mean ± σ): 12.109 ms ± 8.161 ms ┊ GC (mean ± σ): 17.63% ± 19.32% █▅▁ ▆█▅ ███▇▅▇▇▆▇███▇▄▄▁▁▄▁▄▄▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▅▅▄▆▇▇▇▅▅ ▇ 6.06 ms Histogram: log(frequency) by time 41 ms < Memory estimate: 8.61 MiB, allocs estimate: 29433. ## Inplace computation of the gradient We can reduce the memory allocations, by implementing the gradient as a functor. The motivation is twofold: On the one hand, we want to avoid variables from global scope, for example the manifold M or the data, to be used within the function For more complicated cost functions it might also be worth considering to do the same. Here we store the data (as reference) and one temporary memory in order to avoid reallocation of memory per grad_distance computation. We have begin struct grad!{TD,TTMP} data::TD tmp::TTMP end function (gradf!::grad!)(M, X, x) fill!(X, 0) for di in gradf!.data grad_distance!(M, gradf!.tmp, di, x) X .+= gradf!.tmp end X ./= length(gradf!.data) return X end end Then we just have to initialize the gradient and perform our final benchmark. Note that we also have to interpolate all variables passed to the benchmark with a $.

begin
m2 = deepcopy(x0)
M, F, gradF2!, m2; evaluation=MutatingEvaluation(), stopping_criterion=sc
)
$M,$F, $gradF2!, m2; evaluation=$(MutatingEvaluation()), stopping_criterion=$sc ) setup = (m2 = deepcopy($x0))
end
BenchmarkTools.Trial: 1334 samples with 1 evaluation.
Range (min … max):  3.470 ms …  18.176 ms  ┊ GC (min … max): 0.00% … 75.39%
Time  (median):     3.676 ms               ┊ GC (median):    0.00%
Time  (mean ± σ):   3.737 ms ± 471.707 μs  ┊ GC (mean ± σ):  0.27% ±  2.06%

▁▇██▆█▆▇▆▄▆▂▃
▃▄▆██████████████▇█▇▅▅▄▄▂▄▄▄▃▃▂▃▃▃▂▃▂▃▂▂▁▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▂▂ ▄
3.47 ms         Histogram: frequency by time        4.58 ms <

Memory estimate: 149.94 KiB, allocs estimate: 521.

Mote that the results m1and m2 are of course (approximately) the same.

distance(M, m1, m2)
0.0