Difference of convex
Difference of convex algorithm
Manopt.difference_of_convex_algorithm — Function

    difference_of_convex_algorithm(M, f, g, ∂h, p=rand(M); kwargs...)
    difference_of_convex_algorithm(M, mdco, p; kwargs...)

Compute the difference of convex algorithm [BFSS23] to minimize
\[ \operatorname*{arg\,min}_{p∈\mathcal M}\ g(p) - h(p)\]
where you need to provide $f(p) = g(p) - h(p)$, $g$ and the subdifferential $∂h$ of $h$.
This algorithm performs the following steps given a start point p= $p^{(0)}$. Then repeat for $k=0,1,\ldots$
- Take $X^{(k)} ∈ ∂h(p^{(k)})$
- Set the next iterate to the solution of the subproblem
\[ p^{(k+1)} ∈ \operatorname*{argmin}_{q ∈ \mathcal M} g(q) - ⟨X^{(k)}, \log_{p^{(k)}}q⟩\]
until the stopping_criterion is fulfilled.
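In Euclidean space ($\mathcal M = ℝ^n$, where $\log_{p}q = q - p$), the two steps above can be sketched as follows. This is a minimal illustration, not the Manopt.jl API; the functions `g`, `grad_h`, and `subsolver` are illustrative choices for which the subproblem has a closed form solution:

```python
import numpy as np

# Euclidean sketch of the DCA: here the subproblem reads
#   p_{k+1} = argmin_q g(q) - <X_k, q - p_k>,   X_k in ∂h(p_k).
g = lambda q: 2.0 * np.dot(q, q)   # g(p) = 2||p||^2
grad_h = lambda p: 2.0 * p         # h(p) = ||p||^2, so ∂h(p) = {2p}
# closed form solution of the subproblem: grad g(q) = X_k  =>  4q = X_k
subsolver = lambda p, X: X / 4.0

p = np.array([2.0, -1.0])
for k in range(50):
    X = grad_h(p)        # take X^(k) in ∂h(p^(k))
    p = subsolver(p, X)  # solve the linearized subproblem
print(p)  # converges to 0, the minimizer of f = g - h = ||p||^2
```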
Optional parameters
- `evaluation` (`AllocatingEvaluation`) specify whether the gradient works by allocation (default) in the form `grad_f(M, p)` or as an `InplaceEvaluation` in the form `grad_f!(M, X, x)`
- `gradient` (`nothing`) specify $\operatorname{grad} f$, for debug / analysis or enhancing the `stopping_criterion`
- `grad_g` (`nothing`) specify the gradient of `g`. If specified, a subsolver is automatically set up.
- `initial_vector` (`zero_vector(M, p)`) initialise the inner tangent vector to store the subgradient result.
- `stopping_criterion` (`StopAfterIteration(200) | StopWhenChangeLess(1e-8)`) a `StoppingCriterion` for the algorithm. This includes a `StopWhenGradientNormLess(1e-8)`, when a `gradient` is provided.
If you specify the ManifoldDifferenceOfConvexObjective `mdco`, additionally

- `g` (`nothing`) specify the function `g`. If specified, a subsolver is automatically set up.
While there are several parameters for a sub solver, the easiest is to provide the function grad_g=, such that together with the mandatory function g a default cost and gradient can be generated and passed to a default subsolver. Hence the easiest example call looks like
    difference_of_convex_algorithm(M, f, g, grad_h, p; grad_g=grad_g)

Optional parameters for the sub problem
- `sub_cost` (`LinearizedDCCost(g, p, initial_vector)`) a cost to be used within the default `sub_problem`. Use this if you have a more efficient version than the default that is built using `g` from before.
- `sub_grad` (`LinearizedDCGrad(grad_g, p, initial_vector; evaluation=evaluation)`) gradient to be used within the default `sub_problem`. This is generated by default when `grad_g` is provided. You can specify your own by overwriting this keyword.
- `sub_hess` (a finite difference approximation by default) specify a Hessian of the subproblem, which the default solver, see `sub_state`, needs.
- `sub_kwargs` (`(;)`) pass keyword arguments to the `sub_state`, in form of a `Dict(:kwname=>value)`, unless you set the `sub_state` directly.
- `sub_objective` (a gradient or Hessian objective based on the last 3 keywords) provide the objective used within `sub_problem` (if that is not specified by the user)
- `sub_problem` (`DefaultManoptProblem(M, sub_objective)`) specify a Manopt problem for the sub-solver runs. You can also provide a function for a closed form solution. Then `evaluation=` is taken into account for the form of this function.
- `sub_state` (`TrustRegionsState` by default, requires `sub_hess` to be provided; decorated with `sub_kwargs`) choose the solver by specifying a solver state to solve the `sub_problem`. If the `sub_problem` is a function (a closed form solution), this is set to `evaluation` and can be changed to the evaluation type of the closed form solution accordingly.
- `sub_stopping_criterion` (`StopAfterIteration(300) | StopWhenStepsizeLess(1e-9) | StopWhenGradientNormLess(1e-9)`) a stopping criterion used within the default `sub_state=`
- `sub_stepsize` (`ArmijoLinesearch(M)`) specify a step size used within the `sub_state`
all others are passed on to decorate the inner DifferenceOfConvexState.
Output
the obtained (approximate) minimizer $p^*$, see get_solver_return for details
Manopt.difference_of_convex_algorithm! — Function

    difference_of_convex_algorithm!(M, f, g, ∂h, p; kwargs...)
    difference_of_convex_algorithm!(M, mdco, p; kwargs...)

Run the difference of convex algorithm and perform the steps in place of `p`. See difference_of_convex_algorithm for more details.
If you specify the ManifoldDifferenceOfConvexObjective `mdco`, then `g` is a keyword argument.
Difference of convex proximal point
Manopt.difference_of_convex_proximal_point — Function

    difference_of_convex_proximal_point(M, grad_h, p=rand(M); kwargs...)
    difference_of_convex_proximal_point(M, mdcpo, p=rand(M); kwargs...)

Compute the difference of convex proximal point algorithm [SO15] to minimize
\[ \operatorname*{arg\,min}_{p∈\mathcal M} g(p) - h(p)\]
where you have to provide the (sub) gradient $∂h$ of $h$ and either
- the proximal map $\operatorname{prox}_{\lambda g}$ of `g` as a function `prox_g(M, λ, p)` or `prox_g(M, q, λ, p)`
- the functions `g` and `grad_g` to compute the proximal map using a sub solver
- your own sub-solver, see optional keywords below
This algorithm performs the following steps given a start point p= $p^{(0)}$. Then repeat for $k=0,1,\ldots$
- $X^{(k)} ∈ \operatorname{grad} h(p^{(k)})$
- $q^{(k)} = \operatorname{retr}_{p^{(k)}}(λ_kX^{(k)})$
- $r^{(k)} = \operatorname{prox}_{λ_kg}(q^{(k)})$
- $X^{(k)} = \operatorname{retr}^{-1}_{p^{(k)}}(r^{(k)})$
- Compute a stepsize $s_k$ and
- set $p^{(k+1)} = \operatorname{retr}_{p^{(k)}}(s_kX^{(k)})$.
until the stopping_criterion is fulfilled. See [ACOO20] for more details on the modified variant, where steps 4-6 are slightly changed, since here the classical proximal point method for DC functions is obtained for $s_k = 1$ and one can hence employ the usual line search methods.
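The steps above can be sketched in Euclidean space, where the retraction is addition and its inverse is subtraction, and where $g(p) = \lVert p\rVert^2$ admits a closed-form proximal map. All names below are illustrative, not Manopt.jl API:

```python
import numpy as np

# Euclidean sketch of the DC proximal point iteration with
#   g(p) = ||p||^2,  h(p) = 1/2 ||p||^2,  so f = g - h = 1/2 ||p||^2.
lam = lambda k: 0.5                                # prox parameter sequence
grad_h = lambda p: p                               # grad of h
prox_g = lambda l, q: q / (1.0 + 2.0 * l)          # prox_{λ g} for g = ||.||^2

p = np.array([4.0, -3.0])
for k in range(60):
    X = grad_h(p)          # 1: X^(k) = grad h(p^(k))
    q = p + lam(k) * X     # 2: retraction step q^(k)
    r = prox_g(lam(k), q)  # 3: proximal map r^(k)
    X = r - p              # 4: inverse retraction
    s = 1.0                # 5: classical DC prox point uses s_k = 1
    p = p + s * X          # 6: update the iterate
print(p)  # converges to 0, the minimizer of f
```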
Optional parameters
- `λ`: (`i -> 1/2`) a function returning the sequence of prox parameters $λ_i$
- `evaluation`: (`AllocatingEvaluation`) specify whether the gradient works by allocation (default) in the form `gradF(M, x)` or as an `InplaceEvaluation` in the form `gradF!(M, X, x)`.
- `cost`: (`nothing`) provide the cost `f`, for debug reasons / analysis
- `gradient`: (`nothing`) specify $\operatorname{grad} f$, for debug / analysis or enhancing the `stopping_criterion`
- `prox_g`: (`nothing`) specify a proximal map for the sub problem or both of the following
- `g`: (`nothing`) specify the function `g`.
- `grad_g`: (`nothing`) specify the gradient of `g`. If both `g` and `grad_g` are specified, a subsolver is automatically set up.
- `inverse_retraction_method`: (`default_inverse_retraction_method(M)`) an inverse retraction method to use (see step 4).
- `retraction_method`: (`default_retraction_method(M)`) a retraction to use (see step 2)
- `stepsize`: (`ConstantStepsize(M)`) specify a `Stepsize` functor to run the modified algorithm (experimental).
- `stopping_criterion`: (`StopAfterIteration(200) | StopWhenChangeLess(1e-8)`) a `StoppingCriterion` for the algorithm. This also includes a `StopWhenGradientNormLess(1e-8)`, when a `gradient` is provided.
While there are several parameters for a sub solver, the easiest is to provide the functions g and grad_g, such that a default cost and gradient can be generated and passed to a default subsolver. Hence the easiest example call looks like
    difference_of_convex_proximal_point(M, grad_h, p0; g=g, grad_g=grad_g)

Optional parameters for the sub problem
- `sub_cost`: (`ProximalDCCost(g, copy(M, p), λ(1))`) cost to be used within the default `sub_problem`, that is initialized as soon as `g` is provided.
- `sub_grad`: (`ProximalDCGrad(grad_g, copy(M, p), λ(1); evaluation=evaluation)`) gradient to be used within the default `sub_problem`, that is initialized as soon as `grad_g` is provided. You can specify your own by overwriting this keyword.
- `sub_hess`: (a finite difference approximation by default) specify a Hessian of the subproblem, which the default solver, see `sub_state`, needs.
- `sub_kwargs`: (`(;)`) pass keyword arguments to the `sub_state`, in form of a `Dict(:kwname=>value)`, unless you set the `sub_state` directly.
- `sub_objective`: (a gradient or Hessian objective based on the last 3 keywords) provide the objective used within `sub_problem` (if that is not specified by the user)
- `sub_problem`: (`DefaultManoptProblem(M, sub_objective)`) specify a Manopt problem for the sub-solver runs. You can also provide a function for a closed form solution. Then `evaluation=` is taken into account for the form of this function.
- `sub_state`: (`TrustRegionsState`; requires the `sub_hess` to be provided, decorated with `sub_kwargs`) choose the solver by specifying a solver state to solve the `sub_problem`
- `sub_stopping_criterion`: (`StopAfterIteration(300) | StopWhenStepsizeLess(1e-9) | StopWhenGradientNormLess(1e-9)`) a stopping criterion used within the default `sub_state=`
all others are passed on to decorate the inner DifferenceOfConvexProximalState.
Output
the obtained (approximate) minimizer $p^*$, see get_solver_return for details
Manopt.difference_of_convex_proximal_point! — Function

    difference_of_convex_proximal_point!(M, grad_h, p; cost=nothing, kwargs...)
    difference_of_convex_proximal_point!(M, mdcpo, p; cost=nothing, kwargs...)
    difference_of_convex_proximal_point!(M, mdcpo, prox_g, p; cost=nothing, kwargs...)

Compute the difference of convex proximal point algorithm to minimize
\[ \operatorname*{arg\,min}_{p∈\mathcal M} g(p) - h(p)\]
where you have to provide the proximal map of g and the gradient of h.
The computation is done in-place of p.
For all further details, especially the keyword arguments, see difference_of_convex_proximal_point.
Solver states
Manopt.DifferenceOfConvexState — Type

    DifferenceOfConvexState{Pr,St,P,T,SC<:StoppingCriterion} <: AbstractManoptSolverState

A struct to store the current state of the difference_of_convex_algorithm. It comes in two forms, depending on the realisation of the subproblem.
Fields
- `p` the current iterate, a point on the manifold
- `X` the current subgradient, a tangent vector to `p`.
- `sub_problem` problem for the subsolver
- `sub_state` state of the subproblem
- `stop` a functor inheriting from `StoppingCriterion` indicating when to stop.
For the sub task, a method to solve
\[ \operatorname*{argmin}_{q∈\mathcal M}\ g(p) - ⟨X, \log_p q⟩\]
is needed. Besides a problem and options, one can also provide a function and an AbstractEvaluationType, respectively, to indicate a closed form solution for the sub task.
Constructors
    DifferenceOfConvexState(M, p, sub_problem, sub_state; kwargs...)
    DifferenceOfConvexState(M, p, sub_solver; evaluation=InplaceEvaluation(), kwargs...)

Generate the state either using a solver from Manopt, given by an AbstractManoptProblem `sub_problem` and an AbstractManoptSolverState `sub_state`, or a closed form solution `sub_solver` for the sub-problem. In the latter case, the function is expected to be of the form `(M, p, X) -> q` or `(M, q, p, X) -> q`, where by default its AbstractEvaluationType `evaluation` is in-place of `q`. The elements passed to that function are the current iterate `p` and the subgradient `X` of `h`.
further keyword arguments
- `initial_vector` (`zero_vector(M, p)`) how to initialize the inner gradient tangent vector
- `stopping_criterion` (`StopAfterIteration(200)`) a stopping criterion
Manopt.DifferenceOfConvexProximalState — Type

    DifferenceOfConvexProximalState{Type} <: AbstractSubProblemSolverState

A struct to store the current state of the algorithm as well as the form. It comes in two forms, depending on the realisation of the subproblem.
Fields
- `inverse_retraction_method`: (`default_inverse_retraction_method(M)`) an inverse retraction method to use within the algorithm
- `retraction_method`: (`default_retraction_method(M)`) a type of retraction
- `p`, `q`, `r`: the current iterate, the gradient step and the prox, respectively; their type is set by initializing `p`
- `stepsize`: (`ConstantStepsize(1.0)`) a `Stepsize` function to run the modified algorithm (experimental)
- `stop`: (`StopWhenChangeLess(1e-8)`) a `StoppingCriterion`
- `X`, `Y`: (`zero_vector(M,p)`) the current gradient and descent direction, respectively; their common type is set by the keyword `X`
- `sub_problem`: an `AbstractManoptProblem` problem or a function `(M, p, X) -> q` or `(M, q, p, X)` for a closed form solution of the sub problem
- `sub_state`: an `AbstractManoptSolverState` for the subsolver or an `AbstractEvaluationType` in case the sub problem is provided as a function
Constructor
    DifferenceOfConvexProximalState(M, p; kwargs...)

Keyword arguments

- `X`, `retraction_method`, `inverse_retraction_method`, `stepsize` for the corresponding fields
- `stopping_criterion` for the `StoppingCriterion`
The difference of convex objective
Manopt.ManifoldDifferenceOfConvexObjective — Type

    ManifoldDifferenceOfConvexObjective{E} <: AbstractManifoldCostObjective{E}

Specify an objective for a difference_of_convex_algorithm.
The objective $f: \mathcal M → ℝ$ is given as
\[ f(p) = g(p) - h(p)\]
where both $g$ and $h$ are convex, lower semicontinuous and proper. Furthermore the subdifferential $∂h$ of $h$ is required.
Fields
- `cost`: an implementation of $f(p) = g(p)-h(p)$ as a function `f(M, p)`.
- `∂h!!`: a deterministic version of $∂h: \mathcal M → T\mathcal M$, in the sense that calling `∂h(M, p)` returns a subgradient of $h$ at `p` and, if there is more than one, it returns a deterministic choice.
Note that the subdifferential might be given in two possible signatures
- `∂h(M, p)` which does an `AllocatingEvaluation`
- `∂h!(M, X, p)` which does an `InplaceEvaluation` in place of `X`.
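A minimal sketch of the two calling conventions, in Python rather than Julia and with an illustrative $h(p) = \lVert p\rVert_1$ (both function names and the sign-based subgradient choice are assumptions for illustration):

```python
import numpy as np

# Allocating convention: the (sub)gradient is returned.
def dh(M, p):
    return np.sign(p)  # a deterministic choice from ∂h(p) for h = ||.||_1

# In-place convention: the result is written into X.
def dh_inplace(M, X, p):
    X[:] = np.sign(p)
    return X

p = np.array([1.5, -2.0, 0.0])
X = np.empty_like(p)
dh_inplace(None, X, p)
print(np.array_equal(dh(None, p), X))  # both conventions agree
```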
as well as for the corresponding sub problem
Manopt.LinearizedDCCost — Type

    LinearizedDCCost

A functor `(M, q) → ℝ` to represent the inner problem of a ManifoldDifferenceOfConvexObjective. This is a cost function of the form
\[ F_{p_k,X_k}(p) = g(p) - ⟨X_k, \log_{p_k}p⟩\]
for a point p_k and a tangent vector X_k at p_k (for example outer iterates) that are stored within this functor as well.
Fields
- `g` a function
- `pk` a point on a manifold
- `Xk` a tangent vector at `pk`
Both interim values can be set using set_manopt_parameter!(::LinearizedDCCost, ::Val{:p}, p) and set_manopt_parameter!(::LinearizedDCCost, ::Val{:X}, X), respectively.
Constructor
    LinearizedDCCost(g, p, X)

Manopt.LinearizedDCGrad — Type

    LinearizedDCGrad

A functor to represent the gradient of the inner problem of a ManifoldDifferenceOfConvexObjective, for a cost function of the form
\[ F_{p_k,X_k}(p) = g(p) - ⟨X_k, \log_{p_k}p⟩\]
its gradient is given by writing $F = g - F_1∘F_2$, where $F_1(X) = ⟨X_k,X⟩$ and $F_2(p) = \log_{p_k}p$, and using the chain rule as well as the adjoint differential $D^*F_2(p)$ of the logarithmic map with respect to its argument
\[ \operatorname{grad} F(q) = \operatorname{grad} g(q) - D^*F_2(q)[X_k]\]
for a point `pk` and a tangent vector `Xk` at `pk` (the outer iterates) that are stored within this functor as well.
Fields
- `grad_g!!` the gradient of $g$ (see also `LinearizedDCCost`)
- `pk` a point on a manifold
- `Xk` a tangent vector at `pk`
Both interim values can be set using set_manopt_parameter!(::LinearizedDCGrad, ::Val{:p}, p) and set_manopt_parameter!(::LinearizedDCGrad, ::Val{:X}, X), respectively.
Constructor
    LinearizedDCGrad(grad_g, p, X; evaluation=AllocatingEvaluation())

Where you specify whether `grad_g` is `AllocatingEvaluation` or `InplaceEvaluation`, while this function still provides both signatures.
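As a sanity check of the gradient formula in Euclidean space (where $\log_{p_k}p = p - p_k$ and $D^*F_2(q)$ is the identity), one can compare against central finite differences; the names `g`, `F`, and `grad_F` below are illustrative, not Manopt.jl API:

```python
import numpy as np

# Linearized DC cost in R^n:  F(p) = g(p) - <X_k, p - p_k>,
# with gradient grad F(q) = grad g(q) - X_k.
g = lambda p: np.dot(p, p)          # g(p) = ||p||^2, grad g(p) = 2p
pk = np.array([1.0, 2.0])
Xk = np.array([0.5, -1.0])
F = lambda p: g(p) - np.dot(Xk, p - pk)
grad_F = lambda p: 2.0 * p - Xk

q = np.array([0.3, -0.7])
eps = 1e-6
fd = np.array([(F(q + eps * e) - F(q - eps * e)) / (2 * eps)
               for e in np.eye(2)])  # central finite differences
print(np.max(np.abs(fd - grad_F(q))))  # agrees up to rounding
```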
Manopt.ManifoldDifferenceOfConvexProximalObjective — Type

    ManifoldDifferenceOfConvexProximalObjective{E} <: Problem

Specify an objective for a difference_of_convex_proximal_point algorithm. The problem is of the form
\[ \operatorname*{argmin}_{p∈\mathcal M} g(p) - h(p)\]
where both $g$ and $h$ are convex, lower semicontinuous and proper.
Fields
- `cost`: (`nothing`) implementation of $f(p) = g(p)-h(p)$ (optional)
- `gradient`: the gradient of the cost
- `grad_h!!`: a function $\operatorname{grad}h: \mathcal M → T\mathcal M$
Note that both the gradients might be given in two possible signatures as allocating or in-place.
Constructor
    ManifoldDifferenceOfConvexProximalObjective(grad_h; cost=nothing, gradient=nothing)

Note that neither the cost nor the gradient is required for the algorithm, just for eventual debug or stopping criteria.
as well as for the corresponding sub problems
Manopt.ProximalDCCost — Type

    ProximalDCCost

A functor `(M, p) → ℝ` to represent the inner cost function of a ManifoldDifferenceOfConvexProximalObjective. This is the cost function of the proximal map of `g`.
\[ F_{p_k}(p) = \frac{1}{2λ}d_{\mathcal M}(p_k,p)^2 + g(p)\]
for a point pk and a proximal parameter $λ$.
Fields
- `g` - a function
- `pk` - a point on a manifold
- `λ` - the prox parameter
Both interim values can be set using set_manopt_parameter!(::ProximalDCCost, ::Val{:p}, p) and set_manopt_parameter!(::ProximalDCCost, ::Val{:λ}, λ), respectively.
Constructor
    ProximalDCCost(g, p, λ)

Manopt.ProximalDCGrad — Type

    ProximalDCGrad

A functor to represent the gradient of the inner cost function of a ManifoldDifferenceOfConvexProximalObjective. This is the gradient of the proximal map cost function of `g`. Based on
\[ F_{p_k}(p) = \frac{1}{2λ}d_{\mathcal M}(p_k,p)^2 + g(p)\]
it reads
\[ \operatorname{grad} F_{p_k}(p) = \operatorname{grad} g(p) - \frac{1}{λ}\log_p p_k\]
for a point pk and a proximal parameter λ.
Fields
- `grad_g` - a gradient function
- `pk` - a point on a manifold
- `λ` - the prox parameter
Both interim values can be set using set_manopt_parameter!(::ProximalDCGrad, ::Val{:p}, p) and set_manopt_parameter!(::ProximalDCGrad, ::Val{:λ}, λ), respectively.
Constructor
    ProximalDCGrad(grad_g, pk, λ; evaluation=AllocatingEvaluation())

Where you specify whether `grad_g` is `AllocatingEvaluation` or `InplaceEvaluation`, while this function still always provides both signatures.
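As a Euclidean sanity check (where $\log_p p_k = p_k - p$), this gradient vanishes exactly at the proximal point of $g$, which for the illustrative choice $g(p) = \lVert p\rVert^2$ has the closed form $\operatorname{prox}_{λg}(p_k) = p_k/(1+2λ)$. The snippet below is a sketch, not Manopt.jl API:

```python
import numpy as np

# grad F_{p_k}(p) = grad g(p) - (1/λ) log_p p_k  =  grad g(p) - (p_k - p)/λ
lam = 0.7
pk = np.array([3.0, -1.0])
grad_g = lambda p: 2.0 * p                      # g(p) = ||p||^2
grad_F = lambda p: grad_g(p) - (pk - p) / lam

p_star = pk / (1.0 + 2.0 * lam)                 # closed-form proximal point
print(np.max(np.abs(grad_F(p_star))))           # vanishes up to rounding
```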
Helper functions
Manopt.get_subtrahend_gradient — Function

    X = get_subtrahend_gradient(amp, q)
    get_subtrahend_gradient!(amp, X, q)

Evaluate the (sub)gradient of the subtrahend `h` from within a ManifoldDifferenceOfConvexObjective `amp` at the point `q` (in place of `X`).
The evaluation is done in place of `X` for the `!`-variant. A problem with `T=AllocatingEvaluation` might still allocate memory internally, and when the non-mutating variant is called on a `T=InplaceEvaluation` problem, memory for the result is allocated.
    X = get_subtrahend_gradient(M::AbstractManifold, dcpo::ManifoldDifferenceOfConvexProximalObjective, p)
    get_subtrahend_gradient!(M::AbstractManifold, X, dcpo::ManifoldDifferenceOfConvexProximalObjective, p)

Evaluate the gradient of the subtrahend $h$ from within a ManifoldDifferenceOfConvexProximalObjective `dcpo` at the point `p` (in place of `X`).
Technical details
The difference_of_convex_algorithm and difference_of_convex_proximal_point solvers require the following functions of a manifold to be available

- A `retract!(M, q, p, X)`; it is recommended to set the `default_retraction_method` to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
- An `inverse_retract!(M, X, p, q)`; it is recommended to set the `default_inverse_retraction_method` to a favourite inverse retraction. If this default is set, an `inverse_retraction_method=` does not have to be specified.
- By default, one of the stopping criteria is StopWhenChangeLess, which requires either the inverse retraction above or the `distance(M, p, q)` for said default inverse retraction.
- A `copyto!(M, q, p)` and `copy(M, p)` for points.
- By default the tangent vector storing the gradient is initialized calling `zero_vector(M, p)`.
- Everything the subsolver requires, which by default is trust_regions, or, if you do not provide a Hessian, gradient_descent.
Literature
- [ACOO20] Y. T. Almeida, J. X. Cruz Neto, P. R. Oliveira and J. C. Oliveira Souza. A modified proximal point method for DC functions on Hadamard manifolds. Computational Optimization and Applications 76, 649–673 (2020).
- [BFSS23] R. Bergmann, O. P. Ferreira, E. M. Santos and J. C. Souza. The difference of convex algorithm on Hadamard manifolds. arXiv preprint (2023).
- [SO15] J. C. Souza and P. R. Oliveira. A proximal point algorithm for DC functions on Hadamard manifolds. Journal of Global Optimization 63, 797–810 (2015).