Computing a Robust PCA
For a given matrix $X ∈ ℝ^{p×n}$ whose columns $X_i$ represent points in $ℝ^p$, we compute a matrix $U ∈ ℝ^{p×d}$ for a given dimension $d < p$: the columns of $U$ form an orthonormal basis (ONB) of a $d$-dimensional subspace of $ℝ^p$, chosen such that the column space of $U$ approximates the points $X_i$ (the columns of $X$) as well as possible.
We compute $U$ as a minimizer over the Grassmann manifold of the cost function:
$$\begin{split} f(U) & = \frac{1}{n}\sum_{i=1}^{n}{\operatorname{dist}(X_i, \operatorname{span}(U))}\\ & = \frac{1}{n} \sum_{i=1}^{n}\lVert UU^{\mathrm{T}}X_i - X_i\rVert \end{split}$$
The output cost is the average distance achieved with the returned $U$, an orthonormal basis (a point on the Stiefel manifold) representing the subspace (a point on the Grassmann manifold). Notice that the norms are not squared, which makes the cost robust against outliers. The price is that $f$ is nonsmooth, so we regularize it with a pseudo-Huber loss with smoothing parameter $ϵ$:
$$f_ϵ(U) = \frac{1}{n} \sum_{i=1}^n{ℓ_ϵ(\lVert UU^{\mathrm{T}}X_i - X_i\rVert)},$$
where $ℓ_ϵ(x) = \sqrt{x^2 + ϵ^2} - ϵ$.
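To get a feeling for the smoothing, here is a quick numerical illustration (our addition, not part of the tutorial code): for residuals much larger than $ϵ$ the loss behaves like the plain norm, while near zero it is smooth.
ℓ(x, ϵ) = sqrt(x^2 + ϵ^2) - ϵ   # the pseudo-Huber loss defined above
ℓ(10.0, 0.1)   # ≈ 9.9005: nearly |x| for large residuals, so outliers are not amplified
ℓ(0.01, 0.1)   # ≈ 0.0005: smooth, quadratic-like behavior near zero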
The smoothing parameter is iteratively reduced (with warm starts).
First, we generate random data for illustration purposes:
using Random
begin
    samples = 40
    outliers = 15
    Random.seed!(42)
    # Points along a random direction, with noise that grows with the index
    X = randn(2, 1) * [1:samples;]' + 0.05 * randn(2, samples) .* [1:samples 1:samples]'
    # Replace `outliers` randomly chosen columns of X by large-variance noise
    P = shuffle(1:size(X, 2))'
    X[:, P[1:outliers]] = 30 * randn(2, outliers)
    d = 1
end;
We use Manopt.jl to optimize the regularized cost function over the Grassmann manifold. To do this, we first set up the manifold with Manifolds.jl.
using Manifolds
p, n = size(X);
M = Grassmann(p, d);
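As a quick sanity check (our addition; manifold_dimension and is_point are standard Manifolds.jl functions), we can query the manifold's dimension and verify that a matrix with an orthonormal column is a valid point on it:
manifold_dimension(M)                    # d * (p - d), i.e. 1 for p = 2, d = 1
is_point(M, reshape([1.0, 0.0], 2, 1))   # true: a 2×1 matrix with an orthonormal column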
For the initial matrix $U_0$ we use classical PCA via the singular value decomposition, that is, the first $d$ left singular vectors of $X$.
Then we compute a minimizer of the cost function over the Grassmann manifold using the trust-region method implemented in Manopt.jl. For this, the objective robustpca_cost and the Riemannian gradient grad_pca_cost must first be implemented as functions; since we apply the pseudo-Huber regularization, both take the smoothing parameter ϵ as an additional argument. To obtain the Riemannian gradient, we first compute the Euclidean gradient and then project it onto the tangent space using the orthogonal projection $I - UU^{\mathrm{T}}$. The trust-region method also requires a Hessian; with ApproxHessianFiniteDifference we use a finite-difference approximation in place of an exact one.
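For reference, here is the derivation behind grad_pca_cost (our addition, consistent with the code below). Writing $v_i = UU^{\mathrm{T}}X_i - X_i$ and using that $U^{\mathrm{T}}v_i = 0$ for orthonormal $U$, the Euclidean gradient of $f_ϵ$ is
$$\nabla f_ϵ(U) = \frac{1}{n}\sum_{i=1}^{n} \frac{v_i\,(U^{\mathrm{T}}X_i)^{\mathrm{T}}}{\sqrt{\lVert v_i\rVert^2 + ϵ^2}},$$
and the Riemannian gradient is its projection onto the tangent space, $\operatorname{grad} f_ϵ(U) = (I - UU^{\mathrm{T}})\,\nabla f_ϵ(U)$.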
We run the procedure several times, where the smoothing parameter $ϵ$ is reduced iteratively.
using LinearAlgebra, Manopt, Statistics
begin
    ϵ = 1.0
    iterations = 6
    reduction = 0.5
end;
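Over the six runs below, the smoothing parameter therefore takes the values $1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125$; a one-liner to check (our addition):
[ϵ * reduction^(i - 1) for i in 1:iterations]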
U, S, V = svd(X);
U0 = U[:, 1:d];
U0
2×1 Matrix{Float64}:
 -0.7494248652139397
  0.6620893983436593
function robustpca_cost(M, U, ϵ)
    # Residuals of all points from the subspace spanned by U
    vecs = U * U' * X - X
    sqnrms = sum(vecs .^ 2; dims=1)
    # Pseudo-Huber loss of each residual norm, averaged over all points
    return mean(sqrt.(sqnrms .+ ϵ^2) .- ϵ)
end
robustpca_cost (generic function with 1 method)
function grad_pca_cost(M, U, ϵ)
    UtX = U' * X
    vecs = U * UtX - X
    sqnrms = sum(vecs .^ 2; dims=1)
    # Compute the Euclidean gradient
    G = zeros(p, d)
    for i in 1:n
        G = G + (1 / sqrt(sqnrms[i] + ϵ^2)) * vecs[:, i] * UtX[:, i]'
    end
    G = 1 / n * G
    # Convert to the Riemannian gradient: project onto the tangent space at U
    return (I - U * U') * G
end
grad_pca_cost (generic function with 1 method)
With both functions in place, we can evaluate the unsmoothed (ϵ = 0) cost at the initial point:
robustpca_cost(M, U0, 0.0)
10.218786443576352
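As a quick check of the gradient implementation (our addition), the Riemannian gradient must lie in the tangent space at $U_0$, which in this representation of the Grassmannian means $U_0^{\mathrm{T}}\,\operatorname{grad} f_ϵ(U_0) = 0$:
norm(U0' * grad_pca_cost(M, U0, ϵ))   # ≈ 0 up to floating-point error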
begin
    Ui = copy(M, U0)
    ϵi = ϵ
    for i in 1:iterations
        # Warm start: each run starts from the previous minimizer Ui
        global Ui = trust_regions(
            M,
            (M, p) -> robustpca_cost(M, p, ϵi),
            (M, p) -> grad_pca_cost(M, p, ϵi),
            ApproxHessianFiniteDifference(
                M,
                Ui,
                (M, p) -> grad_pca_cost(M, p, ϵi);
                vector_transport_method=ProjectionTransport(),
                retraction_method=PolarRetraction(),
            ),
            Ui;
            (project!)=project!,
        )
        # Shrink the smoothing parameter for the next run
        global ϵi = ϵi * reduction
    end
end;
Ui
2×1 Matrix{Float64}:
 -0.6795640993446082
  0.7336161359198359
robustpca_cost(M, Ui, 0.0)
9.412961993577344
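If you want to watch the solver's progress during a run, Manopt.jl solvers accept a debug keyword; here is a sketch for a single run at a fixed ϵ (our addition, reusing the setup from above):
trust_regions(
    M,
    (M, p) -> robustpca_cost(M, p, ϵ),
    (M, p) -> grad_pca_cost(M, p, ϵ),
    ApproxHessianFiniteDifference(
        M, U0, (M, p) -> grad_pca_cost(M, p, ϵ);
        vector_transport_method=ProjectionTransport(),
        retraction_method=PolarRetraction(),
    ),
    U0;
    (project!)=project!,
    debug=[:Iteration, " | ", :Cost, "\n"],   # print iteration number and current cost
)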
Finally, we present the results graphically: the data points in a scatter plot, together with the result of the robust PCA and (for comparison) the standard SVD solution, each drawn as a straight line through the origin.
using Plots
fig = plot(X[1, :], X[2, :]; seriestype=:scatter, label="Data points");
plot!(
    fig,
    Ui[1] * [-1, 1] * 100,
    Ui[2] * [-1, 1] * 100;
    linecolor=:red,
    linewidth=2,
    label="Robust PCA",
);
plot!(
    fig,
    U0[1] * [-1, 1] * 100,
    U0[2] * [-1, 1] * 100;
    xlims=1.1 * [minimum(X[1, :]), maximum(X[1, :])],
    ylims=1.1 * [minimum(X[2, :]), maximum(X[2, :])],
    linewidth=2,
    linecolor=:black,
    label="Standard SVD",
)
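To quantify how much the robust direction differs from the plain SVD direction, we can compute the angle between the two unit vectors (our addition); for the outputs shown above it is about 5.7°:
acosd(clamp(abs(only(U0' * Ui)), 0, 1))   # angle in degrees between span(U0) and span(Ui)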