Get Started: Optimize!

This example illustrates how to set up and solve optimization problems and how to further get data from the algorithm using debug output and record data. We will use the Riemannian mean and median as simple examples.

To start from the quite general case: a solver is an algorithm that aims to solve

$$\operatorname*{argmin}_{x∈\mathcal M} f(x)$$

where $\mathcal M$ is a Manifold and $f:\mathcal M → ℝ$ is the cost function.

In Manopt.jl a solver is an algorithm that requires a Problemp and Optionso. While the former contains static data, most prominently the manifold $\mathcal M$ (usually as p.M) and the cost function $f$ (usually as x->get_cost(p, x)), the latter contains dynamic data, i.e. things that usually change during the algorithm, are allowed to change, or specify the details of the algorithm to use. Together they form a plan. A plan uniquely determines the algorithm to use and provides all necessary information to run it.

You can either follow the code here and look at the preprinted output, or, if you want to experiment things for yourself, you can directly access the Pluto notebooks related to the tutorials by going to "/your_Julia_installation_folder/packages/Manopt/tutorials/".

Setup

Let's first set up a few variables

begin
    localpath = join(splitpath(@__FILE__)[1:(end - 1)], "/") # files folder
    image_prefix = localpath * "/optimize"
    @info image_prefix
    render_asy = false # on CI or when you do not have asymptote, this should be false
end;

Example

A gradient plan consists of a GradientProblem with the fields M, cost function $f$ as well as gradient storing the gradient function corresponding to $f$. Accessing both functions can be done directly but should be encapsulated using get_cost(p,x) and get_gradient(p,x), where in both cases x is a point on the ManifoldM. Second, the GradientDescentOptions specify that the algorithm to solve the GradientProblem is the gradient descent algorithm. It requires an initial value o.x0, a StoppingCriteriono.stop, a Stepsizeo.stepsize and a retraction o.retraction. Internally, is stores the last evaluation of the gradient at o.gradient for convenience. The only mandatory parameter is the initial value x0, though the defaults for both the stopping criterion (StopAfterIteration(100)) as well as the stepsize (ConstantStepsize(1.)) are quite conservative, but are chosen to be as simple as possible.

With these two at hand, running the algorithm just requires to call x_opt = solve(p,o).

In the following two examples we will see, how to use a higher level interface that allows to more easily activate, for example, a debug output or record values during the iterations.

using Manopt, Manifolds, Random, Colors, PlutoUI

Let's load a few colors from Paul Tol.

begin
    black = RGBA{Float64}(colorant"#000000")
    TolVibrantOrange = RGBA{Float64}(colorant"#EE7733")
    TolVibrantBlue = RGBA{Float64}(colorant"#0077BB")
    TolVibrantTeal = RGBA{Float64}(colorant"#009988")
    TolVibrantMagenta = RGBA{Float64}(colorant"#EE3377")
end;

The data set

We take a look at a srandom set of points.

begin
    n = 100
    σ = π / 8
    M = Sphere(2)
    x = 1 / sqrt(2) * [1.0, 0.0, 1.0]
    Random.seed!(42)
    data = [exp(M, x, random_tangent(M, x, Val(:Gaussian), σ)) for i in 1:n]
end
100-element Vector{Vector{Float64}}:
 [0.9013817292525768, -0.3266930346539438, 0.2842228690275139]
 [0.5413128376734956, -0.27073756886516653, 0.7960411927625647]
 [0.11640703550675724, 0.22835279004025902, 0.9665942299462364]
 [0.8120710958917794, 0.3048940575972932, 0.4975742646670772]
 [0.3403759726989105, 0.16278898828946248, 0.9260906772562628]
 [0.7937947657628919, -0.11512297480761367, 0.5971905646599591]
 [0.8657115032997766, 0.0994828021701859, 0.49055760632856193]
 ⋮
 [-0.24540230556351394, 0.6192543288278474, 0.7458564102104935]
 [0.7206649443671707, 0.026606223795364103, 0.6927727959552205]
 [-0.1556170638176626, 0.4709844443817593, 0.8683069633483166]
 [0.8646459186062441, 0.15287339377544437, 0.47855737473488225]
 [0.8283553683993905, 0.539804408097735, 0.14979514225155605]
 [0.7766227187182986, -0.11478608022900104, 0.6194201389656778]
asymptote_export_S2_signals(
    image_prefix * "/startDataAndCenter.asy";
    points=[[x], data],
    colors=Dict(:points => [TolVibrantBlue, TolVibrantTeal]),
    dot_size=3.5,
    camera_position=(1.0, 0.5, 0.5),
)
render_asy && render_asymptote(image_prefix * "/startDataAndCenter.asy"; render=2)
false

Computing the Mean

To compute the mean on the manifold we use the characterization that the Euclidean mean minimizes the sum of squared distances, and end up with the following cost function. Its minimizer is called Riemannian Center of Mass.

Note. There are more sophisticated methods tailored for the specific manifolds available in Manifolds.jl see mean.

F(M, y) = sum(1 / (2 * n) * distance.(Ref(M), Ref(y), data) .^ 2)
F (generic function with 1 method)
gradF(M, y) = sum(1 / n * grad_distance.(Ref(M), data, Ref(y)))
gradF (generic function with 1 method)

Note that the grad_distance defaults to the case p=2, i.e. the gradient of the squared distance. For details on convergence of the gradient descent for this problem, see [AfsariTronVidal2013].

The easiest way to call the gradient descent is now to call gradient_descent.

gradient_descent(M, F, gradF, data[1])
3-element Vector{Float64}:
 0.6868392795848524
 0.00653160034916762
 0.7267799819864603

In order to get more details, we further add the debug= options, which act as a decorator pattern.

The following debug prints

# i | Last Change: | F(x): | x:

as well as the reason why the algorithm stopped at the end.

The elements passed to debug= are postprocessed, there are specifiic symbols and formats we can use. For example :Iteration to plot the iteratin, but we can also add a format for the print using (:Change, "| Last change: %e3.8"), i.e. the second string is a format for printf.

Note that here we use PlutoUI to see the output also here in the notebook

with_terminal() do
    global xMean = gradient_descent(
        M,
        F,
        gradF,
        data[1];
        debug=[
            :Iteration,
            (:Change, "change: %1.9f | "),
            (:Cost, " F(x): %1.11f | "),
            "\n",
            :Stop,
        ],
    )
end
Initial F(x): 0.32487988924 | 
# 1     change: 0.574731862 |  F(x): 0.15183688506 | 
# 2     change: 0.027140072 |  F(x): 0.15145136400 | 
# 3     change: 0.001271911 |  F(x): 0.15145051699 | 
# 4     change: 0.000060078 |  F(x): 0.15145051510 | 
# 5     change: 0.000002864 |  F(x): 0.15145051509 | 
# 6     change: 0.000000140 |  F(x): 0.15145051509 | 
# 7     change: 0.000000021 |  F(x): 0.15145051509 | 
The algorithm reached approximately critical point after 7 iterations; the gradient norm (6.808576745410295e-9) is less than 1.0e-8.

A way to get better performance, and for convex and coercive costs a guaranteed convergence, is to switch the default ConstantStepsize(1.0) with a step size that performs better, for example the ArmijoLinesearch. We can tweak the default values for the contraction_factor and the sufficient_decrease beyond constant step size which is already quite fast. We get

with_terminal() do
    global xMean2 = gradient_descent(
        M,
        F,
        gradF,
        data[1];
        stepsize=ArmijoLinesearch(1.0, ExponentialRetraction(), 0.99, 0.5),
        debug=[
            :Iteration,
            (:Change, "change: %1.9f | "),
            :Cost,
            (:Iterate, " | %s"),
            "\n",
            :Stop,
        ],
    )
end
InitialF(x): 0.324880
# 1     change: 0.598307567 | F(x): 0.151463 | [0.6891648805180924, 0.0026275812635667695, 0.7245997952505829]
# 2     change: 0.004992928 | F(x): 0.151451 | [0.6868477184189742, 0.006479791589843487, 0.7267724705873233]
# 3     change: 0.000052757 | F(x): 0.151451 | [0.6868390793259461, 0.0065312628686383335, 0.7267801742728178]
# 4     change: 0.000000433 | F(x): 0.151451 | [0.6868392781604383, 0.006531596591309469, 0.7267799833663667]
# 5     change: 0.000000015 | F(x): 0.151451 | [0.6868392794629642, 0.00653160063944062, 0.7267799820990414]
The algorithm reached approximately critical point after 5 iterations; the gradient norm (4.2198733567896755e-9) is less than 1.0e-8.

which finishes in 5 steaps, just slightly better than the previous computation.

F(M, xMean) - F(M, xMean2)
2.7755575615628914e-17

Note that other optimization tasks may have other speedup opportunities.

For even more precision, we can further require a smaller gradient norm. This is done by changing the StoppingCriterion used, where several criteria can be combined using & and/or |. If we want to decrease the final gradient (from less that 1e-8) norm but keep the maximal number of iterations to be 200, we can run

with_terminal() do
    global xMean3 = gradient_descent(
        M,
        F,
        gradF,
        data[1];
        stepsize=ArmijoLinesearch(1.0, ExponentialRetraction(), 0.99, 0.5),
        stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-15),
        debug=[
            :Iteration,
            (:Change, "change: %1.9f | "),
            (:Cost, "F(x): %1.9f"),
            (:Iterate, " | %s"),
            "\n",
            :Stop,
        ],
    )
end
InitialF(x): 0.324879889
# 1     change: 0.598307567 | F(x): 0.151462602 | [0.6891648805180924, 0.0026275812635667695, 0.7245997952505829]
# 2     change: 0.004992928 | F(x): 0.151450516 | [0.6868477184189742, 0.006479791589843487, 0.7267724705873233]
# 3     change: 0.000052757 | F(x): 0.151450515 | [0.6868390793259461, 0.0065312628686383335, 0.7267801742728178]
# 4     change: 0.000000433 | F(x): 0.151450515 | [0.6868392781604383, 0.006531596591309469, 0.7267799833663667]
# 5     change: 0.000000015 | F(x): 0.151450515 | [0.6868392794629642, 0.00653160063944062, 0.7267799820990414]
# 6     change: 0.000000000 | F(x): 0.151450515 | [0.6868392794782384, 0.006531600679127565, 0.72677998208425]
# 7     change: 0.000000000 | F(x): 0.151450515 | [0.6868392794788359, 0.0065316006806974995, 0.7267799820836712]
# 8     change: 0.000000000 | F(x): 0.151450515 | [0.686839279478865, 0.006531600680774491, 0.726779982083643]
# 9     change: 0.000000000 | F(x): 0.151450515 | [0.6868392794788668, 0.006531600680779075, 0.7267799820836414]
# 10    change: 0.000000000 | F(x): 0.151450515 | [0.6868392794788669, 0.006531600680779287, 0.7267799820836413]
The algorithm reached approximately critical point after 10 iterations; the gradient norm (2.533780477098624e-16) is less than 1.0e-15.

Let's add this point to our data image

asymptote_export_S2_signals(
    image_prefix * "/startDataCenterMean.asy";
    points=[[x], data, [xMean3]],
    colors=Dict(:points => [TolVibrantBlue, TolVibrantTeal, TolVibrantOrange]),
    dot_size=3.5,
    camera_position=(1.0, 0.5, 0.5),
)
render_asy && render_asymptote(image_prefix * "/startDataCenterMean.asy"; render=2)
false

Computing the Median

Note. There are more sophisticated methods tailored for the specific manifolds available in Manifolds.jl see median.

Similarly to the mean, you can also define the median as the minimizer of the distances, see for example [Bačák2014], but since this problem is not differentiable, we employ the Cyclic Proximal Point (CPP) algorithm, described in the same reference. We define

F2(M, y) = sum(1 / (2 * n) * distance.(Ref(M), Ref(y), data))
F2 (generic function with 1 method)
proxes = Function[(M, λ, y) -> prox_distance(M, λ / n, di, y, 1) for di in data]
100-element Vector{Function}:
 #3 (generic function with 1 method)
 #3 (generic function with 1 method)
 #3 (generic function with 1 method)
 #3 (generic function with 1 method)
 #3 (generic function with 1 method)
 #3 (generic function with 1 method)
 #3 (generic function with 1 method)
 ⋮
 #3 (generic function with 1 method)
 #3 (generic function with 1 method)
 #3 (generic function with 1 method)
 #3 (generic function with 1 method)
 #3 (generic function with 1 method)
 #3 (generic function with 1 method)

So we call the cyclic proximal point algorithm, this time with a recording, and activate the return of the complete options to access the recorded values. We further increase the display of the cost function to more digits.

with_terminal() do
    global o = cyclic_proximal_point(
        M,
        F2,
        proxes,
        data[1];
        debug=[
            :Iteration,
            " | ",
            :Change,
            " | ",
            (:Cost, "F(x): %1.12f"),
            " | ",
            :Iterate,
            "\n",
            50,
            :Stop,
        ],
        record=[:Iteration, :Change, :Cost, :Iterate],
        return_options=true,
    )
end
Initial |  | F(x): 0.362208118208 | 
# 50     | Last Change: 0.134517 | F(x): 0.244370141855 | x: [0.6899935005916552, -0.01896177317094577, 0.7235671498205912]
# 100    | Last Change: 0.000738 | F(x): 0.244369477428 | x: [0.6896989553827807, -0.019620062719681985, 0.7238303696880813]
# 150    | Last Change: 0.000249 | F(x): 0.244369347858 | x: [0.689627709392341, -0.01985855470460776, 0.7238917462185616]
# 200    | Last Change: 0.000125 | F(x): 0.244369301328 | x: [0.6895992327648918, -0.019981699583130282, 0.7239154852963935]
# 250    | Last Change: 0.000075 | F(x): 0.244369279442 | x: [0.689584990429764, -0.020056887244235468, 0.7239269730077795]
# 300    | Last Change: 0.000050 | F(x): 0.244369267418 | x: [0.6895768884180585, -0.020107567336531233, 0.7239332846993348]
# 350    | Last Change: 0.000036 | F(x): 0.244369260105 | x: [0.6895718777269851, -0.020144043032678447, 0.7239370435185876]
# 400    | Last Change: 0.000027 | F(x): 0.244369255326 | x: [0.6895685928055623, -0.020171551692442447, 0.7239394065241385]
# 450    | Last Change: 0.000021 | F(x): 0.244369252032 | x: [0.6895663455251906, -0.020193037616745145, 0.7239409481103406]
The algorithm performed a step with a change (0.0) less than 1.0000000000000002e-12.
xMedian = get_solver_result(o)
3-element Vector{Float64}:
  0.689566161646394
 -0.020194930269146674
  0.7239410704634466

where the differences to gradient_descent are as follows

  • the third parameter is now an Array of proximal maps

  • debug is reduced to only every 50th iteration

  • we further activated a RecordAction using the record= optional parameter. These work very similarly to those in debug, but they collect their data in an array. The high level interface then returns two variables; the values do contain an array of recorded datum per iteration. Here a tuple containing the iteration, last change and cost respectively; see RedordOptions for details.

We can access the recorded values using get_record, that consists of a tuple per iteration and contains the iteration number, the change and the cost.

values = get_record(o)
455-element Vector{Tuple{Int64, Float64, Float64, Vector{Float64}}}:
 (1, 0.0, 0.25122077492914485, [0.7725364607817969, -0.07080234064197369, 0.6310106538897339])
 (2, 0.07946683931821867, 0.24559809471687633, [0.7299963552108888, -0.028225181418871065, 0.6828679671156679])
 (3, 0.025458747868814337, 0.24476563892261732, [0.7136354125490967, -0.019375250667647812, 0.7002493110438961])
 (4, 0.011128576957056035, 0.24455205841482008, [0.7060405381827376, -0.016820341944346014, 0.7079716339935571])
 (5, 0.0060558247322904735, 0.24447264231227728, [0.7018014581124855, -0.01589109362362358, 0.7121953289194127])
 (6, 0.0037583754395689806, 0.24443554267519943, [0.6991317792895255, -0.015584930889514396, 0.7148229606809152])
 (7, 0.0025374768125748263, 0.24441546613892312, [0.6973159854821999, -0.015551439967875317, 0.7165951221616729])
 ⋮
 (450, 1.7756782901008428e-7, 0.2443692520317774, [0.6895663455251906, -0.020193037616745145, 0.7239409481103406])
 (451, 1.7055158529072421e-7, 0.24436925197668855, [0.6895663082401089, -0.02019341945530099, 0.7239409729742278])
 (452, 1.5123026915032014e-7, 0.24436925192196457, [0.689566271211845, -0.02019379962900449, 0.7239409976397229])
 (453, 1.144579909424277e-7, 0.24436925186758998, [0.6895662344382351, -0.02019417814871737, 0.7239410221085981])
 (454, 3.332000937312528e-8, 0.24436925181357558, [0.689566197917125, -0.020194555025206732, 0.7239410463825947])
 (455, 0.0, 0.24436925175990884, [0.689566161646394, -0.020194930269146674, 0.7239410704634466])
asymptote_export_S2_signals(
    image_prefix * "/startDataCenterMedianAndMean.asy";
    points=[[x], data, [xMean], [xMedian]],
    colors=Dict(
        :points => [TolVibrantBlue, TolVibrantTeal, TolVibrantOrange, TolVibrantMagenta]
    ),
    dot_size=3.5,
    camera_position=(1.0, 0.5, 0.5),
)
render_asy && render_asymptote(image_prefix * "/startDataCenterMedianAndMean.asy"; render=2)
false

In the following image the mean (orange), median (magenta) are shown.

Literature

Bačák2014

Bačák, M: Computing Medians and Means in Hadamard Spaces. SIAM Journal on Optimization, Volume 24, Number 3, pp. 1542–1566, doi: 10.1137/140953393, arxiv: 1210.2145

AfsariTronVidal2013

Afsari, B; Tron, R.; Vidal, R.: On the Convergence of Gradient Descent for Finding the Riemannian Center of Mass, SIAM Journal on Control and Optimization, Volume 51, Issue 3, pp. 2230–2260. doi: 10.1137/12086282X, arxiv: 1201.0925.