Getting Started: Optimize!

Getting Started: Optimize!

This example illustrates how to set up and solve optimization problems and how to further get data from the algorithm using DebugOptions and RecordOptions

To start from the quite general case: A Solver is an algorithm that aims to solve

\[\operatorname*{argmin}_{x\in\mathcal M} f(x)\]

where $\mathcal M$ is a Manifold and $f\colon\mathcal M \to \mathbb R$ is the cost function.

In Manopt.jl a Solver is an algorithm that requires a Problem p and Options o. While former contains static data, most prominently the manifold $\mathcal M$ (usually as p.M) and the cost function $f$ (usually as p.costFunction), the latter contains dynamic data, i.e. things that usually change during the algorithm, are allowed to change, or specify the details of the algorithm to use. Together they form a plan. A plan uniquely determines the algorithm to use and provide all necessary information to run the algorithm.

Example

A gradient plan consists of a GradientProblem with the fields M, cost function $f$ as well as gradient storing the gradient function corresponding to $f$. Accessing both functions can be done directly but should be encapsulated using getCost(p,x) and getGradient(p,x), where in both cases x is an MPoint on the Manifold M. Second, the GradientDescentOptions specify that the algorithm to solve the GradientProblem will be the gradient descent algorithm. It requires an initial value o.x0, a StoppingCriterion o.stop, a Stepsize o.stepsize and a retraction o.retraction and it internally stores the last evaluation of the gradient at o.∇ for convenience. The only mandatory parameter is the initial value x0, though the defaults for both the stopping criterion (stopAfterIteration(100)) as well as the stepsize (ConstantStepsize(1.) are quite conservative, but are chosen to be as simple as possible.

With these two at hand, running the algorithm just requires to call xOpt = solve(p,o).

In the following two examples we will see, how to use a higher level interface that allows to more easily activate for example a debug output or record values during the iterations

The given Dataset

using Manopt
using Random, Colors

For a persistent random set we use

n = 100
σ = π/8
M = Sphere(2)
x = SnPoint(1/sqrt(2)*[1., 0., 1.])
Random.seed!(42)
data = addNoise.(Ref(M), repeat([x],n),Ref(σ))

and we define some colors from Paul Tol

black = RGBA{Float64}(colorant"#000000")
TolVibrantOrange = RGBA{Float64}(colorant"#EE7733")
TolVibrantBlue = RGBA{Float64}(colorant"#0077BB")
TolVibrantTeal = RGBA{Float64}(colorant"#009988")
TolVibrantMagenta = RGBA{Float64}(colorant"#EE3377")

Then our data looks like

asyResolution = 2
renderAsymptote("startDataAndCenter.asy",asyExportS2Signals;
    render = asyResolution,
    points = [ [x], data],
    colors=Dict(:points => [TolVibrantBlue, TolVibrantTeal]),
    dotSize = 3.5, cameraPosition = (1.,.5,.5)
)

The data of noisy versions of \$x\$

Computing the Mean

To compute the mean on the manifold we use the characterization, that the Euclidean mean minimizes the sum of squared distances, and end up with the following cost function. Its minimizer is called Riemannian Center of Mass.

F = y -> sum(1/(2*n) * distance.(Ref(M),Ref(y),data).^2)
∇F = y -> sum(1/n*gradDistance.(Ref(M),data,Ref(y)))

note that the gradDistance defaults to the case p=2, i.e. the gradient of the squared distance. For details on convergence of the gradient descent for this problem, see [Afsari, Tron, Vidal, 2013]

The easiest way to call the gradient descent is now to call steepestDescent

xMean = steepestDescent(M,F,∇F,data[1])

but in order to get more details, we further add the debug= options, which act as a decorator pattern using the DebugOptions and DebugActions. The latter store values if that's necessary, for example for the DebugChange that prints the change during the last iteration. The following debug prints

# i | x: | Last Change: | F(x):`

as well as the reason why the algorithm stopped at the end. Here, the format shorthand and the [DebugFactory] are used, which returns a DebugGroup of DebugAction performed each iteration and the stop, respectively.

xMean = steepestDescent(M,F,∇F,data[1];
   debug = [:Iteration," | ", :x, " | ", :Change, " | ", :Cost, "\n", :Stop]
)
Initial | x: Sn([0.5737338264338114, -0.17286515131186522, 0.8005917410687817]) |  | F(x): 0.22606088442202962
# 1 | x: Sn([0.7823885991620455, 0.08556904440746381, 0.6168841208366818]) | Last Change: 0.38188652881305785 | F(x): 0.14924728281088762
# 2 | x: Sn([0.7917377109136624, 0.09720922045898242, 0.6030769143110611]) | Last Change: 0.020335997468975846 | F(x): 0.14902948995714208
# 3 | x: Sn([0.792283271084327, 0.09771599132874592, 0.6022780117176447]) | Last Change: 0.001092107177198496 | F(x): 0.1490288608629298
# 4 | x: Sn([0.7923155575566625, 0.0977379000751398, 0.6022319819991107]) | Last Change: 6.034189486653822e-5 | F(x): 0.14902885894006088
# 5 | x: Sn([0.79231746502594, 0.09773883592992512, 0.602229320579737]) | Last Change: 3.405428846127275e-6 | F(x): 0.14902885893393106
# 6 | x: Sn([0.7923175775065071, 0.09773887521445886, 0.6022291662199969]) | Last Change: 1.9428737179048833e-7 | F(x): 0.149028858933911
# 7 | x: Sn([0.7923175841297171, 0.09773887682107822, 0.6022291572454815]) | Last Change: 0.0 | F(x): 0.1490288589339109
The algorithm reached approximately critical point; the gradient norm (6.549990973843524e-10) is less than 1.0e-8.
renderAsymptote("startDataCenterMean.asy",asyExportS2Signals;
    render = asyResolution,
    points = [ [x], data, [xMean] ],
    colors=Dict(:points => [TolVibrantBlue, TolVibrantTeal, TolVibrantOrange]),
    dotSize = 3.5, cameraPosition = (1.,.5,.5)
)

The resulting mean (orange)

Computing the Median

Similar to the mean you can also define the median as the minimizer of the distances, see for example [Bačák, 2014], but since this problem is not differentiable, we employ the Cyclic Proximal Point (CPP) algorithm, described in the same reference. We define

F2 = y -> sum( 1/(2*n) * distance.(Ref(M),Ref(y),data))
proxes = Function[ (λ,y) -> proxDistance(M,λ/n,di,y,1) for di in data ]

where the Function is a helper for global scope to infer the correct type.

We then call the cyclicProximalPoint as

o = cyclicProximalPoint(M,F2,proxes,data[1];
    debug = [:Iteration," | ", :x, " | ", :Change, " | ", :Cost, "\n", 50, :Stop],
    record = [:Iteration, :Change, :Cost],
    returnOptions = true
)
xMedian = getSolverResult(o)
values = getRecord(o)
Initial | x: Sn([0.5737338264338114, -0.17286515131186522, 0.8005917410687817]) |  | F(x): 0.29868007928810264
# 50 | x: Sn([0.7902068927064384, 0.07905752387374188, 0.6077194868014257]) | Last Change: 0.024872284599065892 | F(x): 0.24545404985253558
# 100 | x: Sn([0.7901763339094678, 0.07891420169780099, 0.6077778460093722]) | Last Change: 0.00015168198899823736 | F(x): 0.2454540249491582
# 150 | x: Sn([0.7901640149475095, 0.07886656476881941, 0.6078000447874266]) | Last Change: 5.23856377841844e-5 | F(x): 0.24545402020641968
# 200 | x: Sn([0.7901574583210484, 0.07884284499708583, 0.6078116458676868]) | Last Change: 2.6487672240370526e-5 | F(x): 0.24545401853477888
# 250 | x: Sn([0.7901533992744378, 0.07882865828353179, 0.6078187626655407]) | Last Change: 1.597551121821253e-5 | F(x): 0.24545401775904627
# 300 | x: Sn([0.7901506420043442, 0.07881922329299917, 0.607823570602203]) | Last Change: 1.0681066695042033e-5 | F(x): 0.24545401733720773
# 350 | x: Sn([0.7901486478380554, 0.07881249671096521, 0.6078270351686217]) | Last Change: 7.642872252139065e-6 | F(x): 0.24545401708274203
# 400 | x: Sn([0.7901471389219392, 0.0788074593967528, 0.6078296498172023]) | Last Change: 5.73880456603777e-6 | F(x): 0.2454540169175623
# 450 | x: Sn([0.7901459575601557, 0.07880354634232323, 0.6078316928519094]) | Last Change: 4.4670937532181725e-6 | F(x): 0.24545401680431822
# 500 | x: Sn([0.7901450076500379, 0.07880041914367038, 0.6078333331008621]) | Last Change: 3.5757198493180048e-6 | F(x): 0.24545401672332384
# 550 | x: Sn([0.790144227305975, 0.07879786277991105, 0.6078346789023823]) | Last Change: 2.926741314875479e-6 | F(x): 0.24545401666340602
# 600 | x: Sn([0.7901435748845504, 0.07879573409228191, 0.6078358029579382]) | Last Change: 2.439789287793762e-6 | F(x): 0.24545401661784125
# 650 | x: Sn([0.7901430213382369, 0.0787939340847595, 0.6078367558661857]) | Last Change: 2.06492676586287e-6 | F(x): 0.24545401658238772
# 700 | x: Sn([0.7901425457901493, 0.07879239211753257, 0.6078375739263401]) | Last Change: 1.7702303548713047e-6 | F(x): 0.2454540165542612
# 750 | x: Sn([0.7901421328511954, 0.07879105643470423, 0.6078382838545687]) | Last Change: 1.5343131692766127e-6 | F(x): 0.2454540165315743
# 800 | x: Sn([0.7901417709288321, 0.07878988825061885, 0.6078389057496272]) | Last Change: 1.342759171255786e-6 | F(x): 0.24545401651300994
The algorithm performed a step with a change (0.0) less than 1.0e-12.

where the differences to steepestDescent are as follows

These recorded entries read

values
819-element Array{Tuple{Int64,Float64,Float64},1}:
 (1, 0.3623800525332345, 0.24564104473275306)     
 (2, 0.018093060506321777, 0.24547241697343178)   
 (3, 0.004524562287615739, 0.24545924567218239)   
 (4, 0.0016977276678350063, 0.24545672024424584)  
 (5, 0.0008065520447357241, 0.24545583524261108)  
 (6, 0.00045408132637338595, 0.24545538066895142) 
 (7, 0.00029150358326490163, 0.24545509464651913) 
 (8, 0.0002066764508664904, 0.24545489511003585)  
 (9, 0.0001573613711552206, 0.24545474773232623)  
 (10, 0.00012587009545694195, 0.24545463490344238)
 ⋮                                                
 (811, 2.1073424255447017e-8, 0.24545401650937992)
 (812, 2.1073424255447017e-8, 0.24545401650905738)
 (813, 2.580956827951785e-8, 0.2454540165087359)  
 (814, 2.1073424255447017e-8, 0.24545401650841542)
 (815, 2.1073424255447017e-8, 0.2454540165080966) 
 (816, 2.9802322387695312e-8, 0.24545401650777848)
 (817, 2.580956827951785e-8, 0.2454540165074617)  
 (818, 2.1073424255447017e-8, 0.2454540165071459) 
 (819, 0.0, 0.24545401650683144)                  

The resulting median and mean for the data hence are

renderAsymptote("startDataCenterMean.asy",asyExportS2Signals;
    render = asyResolution,
    points = [ [x], data, [xMean], [xMedian] ],
    colors=Dict(:points => [TolVibrantBlue, TolVibrantTeal, TolVibrantOrange, TolVibrantMagenta]),
    dotSize = 3.5, cameraPosition = (1.,.5,.5)
)

The resulting mean (orange) and median (magenta)

Literature