How to do optimization on box domains
Mateusz Baran
Optimization with box domains and products of manifolds and boxes
A Hyperrectangle is, in general, not a manifold but a manifold with corners because locally at the boundary it looks like $\mathbb{R}^{n-k} \times \mathbb{R}^k_{\geq 0}$ for some $k > 0$, instead of $\mathbb{R}^n$ as required by the definition of a manifold.
Such spaces require special handling when used as domains in optimization. For simple methods like gradient descent using projected gradient and a stopping criterion involving StopWhenProjectedNegativeGradientNormLess may be sufficient, however methods that approximate the Hessian can benefit from a more advanced approach. The core idea is considering a piecewise quadratic approximation of the objective along the descent direction in the tangent space at the current iterate, and selecting the generalized Cauchy direction – its minimizer. The points at which the approximation might not be differentiable correspond to hitting new boundaries along the initially selected descent direction. Then, we can perform standard line search from the initial iterate in the generalized Cauchy direction.
Currently Manopt.jl can handle domains that are either a Hyperrectangle or a ProductManifold containing a Hyperrectangle as its first factor and other manifolds as subsequent factors.
Example
Consider the problem of fitting covariance matrix with box constraints on variance in principal directions. The objective is log-probability of data under a multivariate normal distribution with zero mean and covariance matrix given by the variable to optimize. Although there are better ways to solve this problem, expressing it this way allows us to freely extend the objective to more complex scenarios beyond what is possible with closed-form solutions.
First, we set up the problem by generating synthetic data. The data is sampled from a multivariate normal distribution with known covariance matrix.
using Manopt, Manifolds, LinearAlgebra, Random, Distributions
using ForwardDiff, DifferentiationInterface, RecursiveArrayTools
Random.seed!(41)
N = 5 # dimensionality of data
M_spd = SymmetricPositiveDefinite(N)
M_rot = Rotations(N)
V = rand(M_rot)
cov_matrix = Symmetric(V * Diagonal([0.5; 2.0; 5.0; 10.0; 20.0]) * V')
distr = MvNormal(zeros(N), cov_matrix)
data = Matrix(rand(distr, 200)') # 200 samples200×5 Matrix{Float64}:
-1.71519 7.90097 4.28701 -3.91166 -3.52822
-6.21275 -7.19569 0.237793 -0.0950468 -2.14304
-5.14083 -3.87093 0.0175424 0.519236 -1.69341
-0.798949 3.30071 1.79817 0.45475 -1.413
-3.99451 -4.82816 0.12508 0.875412 -0.185293
0.789528 -0.291098 1.64547 1.79327 2.21059
-2.78019 3.11309 -0.744708 -1.26952 -0.803289
-1.23441 -10.4689 0.0889228 4.49377 1.66938
-3.39835 2.37467 -4.88986 -1.13343 -0.630297
2.82479 5.33666 -3.10837 0.293323 2.00837
⋮
1.44015 -1.21025 -2.8555 0.0928556 -0.267104
0.018282 4.6514 3.53344 -0.77436 0.551689
-3.62832 -9.36256 3.06575 1.7817 -0.842772
-3.61414 -3.33447 -2.34463 -0.0186825 -0.214014
-0.651025 -2.31167 1.12179 -0.411333 -0.05979
3.23986 1.85521 -1.09497 2.16809 0.915976
2.4391 -1.01357 -0.894058 0.306356 0.461368
-2.5047 -8.02181 1.46 0.143221 -0.790536
-1.1727 0.0601283 -4.88644 -0.0923105 -1.28085The objective function is defined as follows, with gradient calculated using automatic differentiation.
function logprob_cost(::AbstractManifold, p)
D, R = p.x
logdet = sum(log, D)
invΣ = R * Diagonal(1 ./ D) * R'
ll = - 0.5 * size(data, 1) * logdet
for row in eachrow(data)
ll -= 0.5 * row' * invΣ * row
end
return -ll # We minimize negative log-likelihood
end
function logprob_gradient(M::AbstractManifold, p)
Y = DifferentiationInterface.gradient(q -> logprob_cost(M, q), AutoForwardDiff(), p)
return riemannian_gradient(M, p, Y)
endlogprob_gradient (generic function with 1 method)Finally, we can solve the optimization problem using a quasi-Newton method with box domain support. We restrict the variances (diagonal elements of the covariance matrix) to be between 1.0 and 100.0. The covariance matrix is represented using its eigendecomposition $\Sigma = R D R^{\top}$, where $D$ is a diagonal matrix of variances and $R$ is an orthogonal matrix of principal directions. With constraints on variances, the optimization variable belongs to $[1,100]^N \times \mathrm{SO}(N)$.
M = ProductManifold(Hyperrectangle(fill(1.0, N), fill(100.0, N)), M_rot)
p0 = ArrayPartition(fill(10.0, N), Matrix{Float64}(I(5)))
p_mle = quasi_Newton(M, logprob_cost, logprob_gradient, p0; stopping_criterion = StopAfterIteration(100) | StopWhenProjectedNegativeGradientNormLess(1e-6))
println("Estimated variances: $(p_mle.x[1])")
cov_matrix_mle = p_mle.x[2] * Diagonal(p_mle.x[1]) * p_mle.x[2]'
println("Estimated covariance matrix:")
println(cov_matrix_mle)
nothingEstimated variances: [9.337970649440651, 27.216612038545097, 4.9867072858156, 1.0, 1.0]
Estimated covariance matrix:
[6.754760409607135 3.463074094662132 … 0.9779826359789503 2.4475170855713917; 3.4630740946621326 23.537855765177717 … -7.285756807086521 -3.7754147823525464; … ; 0.9779826359789499 -7.285756807086521 … 4.373435619817869 2.7236152843597954; 2.4475170855713917 -3.7754147823525464 … 2.7236152843597954 3.8567865662085374]We see that despite the original covariance matrix having variances ranging from 0.5 to 20.0, the estimated covariance matrix respects the box constraints of variances between 1.0 and 100.0.