How to do optimization on box domains

Mateusz Baran

Optimization with box domains and products of manifolds and boxes

A Hyperrectangle is, in general, not a manifold but a manifold with corners because locally at the boundary it looks like $\mathbb{R}^{n-k} \times \mathbb{R}^k_{\geq 0}$ for some $k > 0$, instead of $\mathbb{R}^n$ as required by the definition of a manifold.

Such spaces require special handling when used as domains in optimization. For simple methods like gradient descent using projected gradient and a stopping criterion involving StopWhenProjectedNegativeGradientNormLess may be sufficient, however methods that approximate the Hessian can benefit from a more advanced approach. The core idea is considering a piecewise quadratic approximation of the objective along the descent direction in the tangent space at the current iterate, and selecting the generalized Cauchy direction – its minimizer. The points at which the approximation might not be differentiable correspond to hitting new boundaries along the initially selected descent direction. Then, we can perform standard line search from the initial iterate in the generalized Cauchy direction.

Currently Manopt.jl can handle domains that are either a Hyperrectangle or a ProductManifold containing a Hyperrectangle as its first factor and other manifolds as subsequent factors.

Example

Consider the problem of fitting covariance matrix with box constraints on variance in principal directions. The objective is log-probability of data under a multivariate normal distribution with zero mean and covariance matrix given by the variable to optimize. Although there are better ways to solve this problem, expressing it this way allows us to freely extend the objective to more complex scenarios beyond what is possible with closed-form solutions.

First, we set up the problem by generating synthetic data. The data is sampled from a multivariate normal distribution with known covariance matrix.

using Manopt, Manifolds, LinearAlgebra, Random, Distributions
using ForwardDiff, DifferentiationInterface, RecursiveArrayTools
Random.seed!(41)

N = 5 # dimensionality of data
M_spd = SymmetricPositiveDefinite(N)
M_rot = Rotations(N)
V = rand(M_rot)
cov_matrix = Symmetric(V * Diagonal([0.5; 2.0; 5.0; 10.0; 20.0]) * V')
distr = MvNormal(zeros(N), cov_matrix)
data = Matrix(rand(distr, 200)')  # 200 samples

200×5 Matrix{Float64}:
 -1.71519     7.90097     4.28701    -3.91166    -3.52822
 -6.21275    -7.19569     0.237793   -0.0950468  -2.14304
 -5.14083    -3.87093     0.0175424   0.519236   -1.69341
 -0.798949    3.30071     1.79817     0.45475    -1.413
 -3.99451    -4.82816     0.12508     0.875412   -0.185293
  0.789528   -0.291098    1.64547     1.79327     2.21059
 -2.78019     3.11309    -0.744708   -1.26952    -0.803289
 -1.23441   -10.4689      0.0889228   4.49377     1.66938
 -3.39835     2.37467    -4.88986    -1.13343    -0.630297
  2.82479     5.33666    -3.10837     0.293323    2.00837
  ⋮                                              
  1.44015    -1.21025    -2.8555      0.0928556  -0.267104
  0.018282    4.6514      3.53344    -0.77436     0.551689
 -3.62832    -9.36256     3.06575     1.7817     -0.842772
 -3.61414    -3.33447    -2.34463    -0.0186825  -0.214014
 -0.651025   -2.31167     1.12179    -0.411333   -0.05979
  3.23986     1.85521    -1.09497     2.16809     0.915976
  2.4391     -1.01357    -0.894058    0.306356    0.461368
 -2.5047     -8.02181     1.46        0.143221   -0.790536
 -1.1727      0.0601283  -4.88644    -0.0923105  -1.28085

The objective function is defined as follows, with gradient calculated using automatic differentiation.

function logprob_cost(::AbstractManifold, p)
    D, R = p.x
    logdet = sum(log, D)
    invΣ = R * Diagonal(1 ./ D) * R'
    ll = - 0.5 * size(data, 1) * logdet
    for row in eachrow(data)
        ll -= 0.5 * row' * invΣ * row
    end
    return -ll  # We minimize negative log-likelihood
end

function logprob_gradient(M::AbstractManifold, p)
    Y = DifferentiationInterface.gradient(q -> logprob_cost(M, q), AutoForwardDiff(), p)
    return riemannian_gradient(M, p, Y)
end

logprob_gradient (generic function with 1 method)

Finally, we can solve the optimization problem using a quasi-Newton method with box domain support. We restrict the variances (diagonal elements of the covariance matrix) to be between 1.0 and 100.0. The covariance matrix is represented using its eigendecomposition $\Sigma = R D R^{\top}$, where $D$ is a diagonal matrix of variances and $R$ is an orthogonal matrix of principal directions. With constraints on variances, the optimization variable belongs to $[1,100]^N \times \mathrm{SO}(N)$.

M = ProductManifold(Hyperrectangle(fill(1.0, N), fill(100.0, N)), M_rot)

p0 = ArrayPartition(fill(10.0, N), Matrix{Float64}(I(5)))
p_mle = quasi_Newton(M, logprob_cost, logprob_gradient, p0; stopping_criterion = StopAfterIteration(100) | StopWhenProjectedNegativeGradientNormLess(1e-6))
println("Estimated variances: $(p_mle.x[1])")
cov_matrix_mle = p_mle.x[2] * Diagonal(p_mle.x[1]) * p_mle.x[2]'
println("Estimated covariance matrix:")
println(cov_matrix_mle)
nothing

Estimated variances: [9.337970649440651, 27.216612038545097, 4.9867072858156, 1.0, 1.0]
Estimated covariance matrix:
[6.754760409607135 3.463074094662132 … 0.9779826359789503 2.4475170855713917; 3.4630740946621326 23.537855765177717 … -7.285756807086521 -3.7754147823525464; … ; 0.9779826359789499 -7.285756807086521 … 4.373435619817869 2.7236152843597954; 2.4475170855713917 -3.7754147823525464 … 2.7236152843597954 3.8567865662085374]

We see that despite the original covariance matrix having variances ranging from 0.5 to 20.0, the estimated covariance matrix respects the box constraints of variances between 1.0 and 100.0.

Technical Details

This tutorial is cached. It was last run on the following package versions.

Status `~/work/Manopt.jl/Manopt.jl/tutorials/Project.toml`
  [47edcb42] ADTypes v1.22.2
  [6e4b80f9] BenchmarkTools v1.8.0
  [5ae59095] Colors v0.13.1
  [a0c0ee7d] DifferentiationInterface v0.7.20
  [31c24e10] Distributions v0.25.129
  [26cc04aa] FiniteDifferences v0.12.34
  [f6369f11] ForwardDiff v1.4.1
  [8ac3fa9e] LRUCache v1.6.2
  [af67fdf4] ManifoldDiff v0.4.5
  [1cead3c2] Manifolds v0.11.28
  [3362f125] ManifoldsBase v2.5.0
  [0fc0a36d] Manopt v0.6.2 `.`
  [91a5bcdd] Plots v1.41.6
  [731186ca] RecursiveArrayTools v4.3.4
  [37e2e46d] LinearAlgebra v1.12.0
  [9a3f8284] Random v1.11.0

This tutorial was last rendered July 17, 2026, 6:49:44.