Verifying gradients and Hessians
If you have computed a gradient or differential and you are not sure whether it is correct, the following functions allow you to verify it numerically.
Manopt.check_Hessian — Function

check_Hessian(M, f, grad_f, Hess_f, p=rand(M), X=rand(M; vector_at=p), Y=rand(M; vector_at=p); kwargs...)
Verify numerically whether the Hessian $\operatorname{Hess} f(M,p, X)$ of f(M,p)
is correct.
For this either a second-order retraction or a critical point $p$ of f
is required. The approximation is then
\[f(\operatorname{retr}_p(tX)) = f(p) + t⟨\operatorname{grad} f(p), X⟩ + \frac{t^2}{2}⟨\operatorname{Hess}f(p)[X], X⟩ + \mathcal O(t^3)\]
or, in other words, that the error between the function $f$ and its second-order Taylor expansion behaves like $\mathcal O(t^3)$, which indicates that the Hessian is correct, cf. also [Bou23, Section 6.8].
Note that if the errors are below the given tolerance and the method is exact, no plot is generated.
Keyword arguments
- check_grad: (true) verify that $\operatorname{grad} f(p) ∈ T_p\mathcal M$.
- check_linearity: (true) verify that the Hessian is linear, see is_Hessian_linear, using a, b, X, and Y.
- check_symmetry: (true) verify that the Hessian is symmetric, see is_Hessian_symmetric.
- check_vector: (false) verify that $\operatorname{Hess} f(p)[X] ∈ T_p\mathcal M$ using is_vector.
- mode: (:Default) specify the mode for the verification; the default assumption is that the retraction provided is of second order. Otherwise one can also verify the Hessian if the point p is a critical point. Then set the mode to :CriticalPoint to use gradient_descent to find a critical point. Note: this requires (and evaluates) new tangent vectors X and Y.
- atol, rtol: (same defaults as isapprox) tolerances that are passed down to all checks.
- a, b: two real values to verify linearity of the Hessian (if check_linearity=true).
- N: (101) number of points to verify within the log_range default range $[10^{-8},10^{0}]$.
- exactness_tol: (1e-12) if all errors are below this tolerance, the verification is considered to be exact.
- io: (nothing) provide an IO to print the result to.
- gradient: (grad_f(M, p)) instead of the gradient function you can also provide the gradient at p directly.
- Hessian: (Hess_f(M, p, X)) instead of the Hessian function you can provide the result of $\operatorname{Hess} f(p)[X]$ directly. Note that evaluations of the Hessian might still be necessary for checking linearity and symmetry and/or when using :CriticalPoint mode.
- limits: ((1e-8, 1)) specify the limits in the log_range.
- log_range: (range(limits[1], limits[2]; length=N)) specify the range of points (in log scale) to sample the Hessian line.
- plot: (false) whether to plot the resulting verification (requires Plots.jl to be loaded). The plot is in log-log scale; it is returned and can then also be saved.
- retraction_method: (default_retraction_method(M, typeof(p))) retraction method to use for the verification.
- slope_tol: (0.1) tolerance for the (global) slope of the approximation.
- error: (:none) how to handle errors; possible values: :error, :info, :warn.
- window: (nothing) specify window sizes within the log_range that are used for the slope estimation; the default is to use all window sizes 2:N.
The kwargs... are also passed down to the check_vector and the check_gradient calls, such that tolerances can easily be set.
While check_vector and the retraction_method are also passed to the inner call to check_gradient, this inner check_gradient is meant only for internal verification, so it neither throws an error nor produces a plot itself.
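For illustration, a minimal usage sketch (not part of the docstring; the cost, gradient, and Hessian below are the standard formulas for the quadratic cost $p^{\mathrm{T}}Ap$ on the sphere, and the manifold comes from Manifolds.jl):

```julia
# Hypothetical usage sketch: verify the Hessian of f(p) = pᵀAp on the sphere S².
using Manopt, Manifolds, LinearAlgebra, Random

Random.seed!(42)
M = Sphere(2)
A = Symmetric(randn(3, 3))

f(M, p) = p' * A * p
grad_f(M, p) = 2 * (A * p - (p' * A * p) * p)   # projection of the Euclidean gradient 2Ap
# projection of the Euclidean Hessian 2AX plus the curvature (Weingarten) correction
Hess_f(M, p, X) = 2 * (A * X - (p' * A * X) * p - (p' * A * p) * X)

check_Hessian(M, f, grad_f, Hess_f; error=:info)
# with Plots.jl loaded, pass plot=true to obtain the log-log error plot
```

Here the default retraction on the sphere is the exponential map, which is of second order, so the default mode=:Default applies.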
Manopt.check_differential — Function

check_differential(M, F, dF, p=rand(M), X=rand(M; vector_at=p); kwargs...)
Check numerically whether the differential dF(M,p,X)
of F(M,p)
is correct.
This implements the method described in [Bou23, Section 4.8].
Note that if the errors are below the given tolerance and the method is exact, no plot is generated.
Keyword arguments
- exactness_tol: (1e-12) if all errors are below this tolerance, the differential is considered to be exact.
- io: (nothing) provide an IO to print the result to.
- limits: ((1e-8, 1)) specify the limits in the log_range.
- log_range: (range(limits[1], limits[2]; length=N)) specify the range of points (in log scale) to sample the differential line.
- N: (101) number of points to verify within the log_range default range $[10^{-8},10^{0}]$.
- name: ("differential") name to display in the plot.
- plot: (false) whether to plot the result (if Plots.jl is loaded). The plot is in log-log scale; it is returned and can then also be saved.
- retraction_method: (default_retraction_method(M, typeof(p))) retraction method to use.
- slope_tol: (0.1) tolerance for the (global) slope of the approximation.
- throw_error: (false) throw an error message if the differential is wrong.
- window: (nothing) specify window sizes within the log_range that are used for the slope estimation; the default is to use all window sizes 2:N.
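A minimal usage sketch (an assumed example, not from the docstring), verifying the differential of a linear cost restricted to the sphere:

```julia
# Hypothetical usage sketch: verify the differential of F(p) = ⟨q, p⟩ on S².
using Manopt, Manifolds

M = Sphere(2)
q = [1.0, 2.0, 3.0]

F(M, p) = q' * p
dF(M, p, X) = q' * X   # differential of the linear map along the tangent direction X

check_differential(M, F, dF; name="inner product with q", throw_error=true)
```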
Manopt.check_gradient — Function

check_gradient(M, F, gradF, p=rand(M), X=rand(M; vector_at=p); kwargs...)
Verify numerically whether the gradient gradF(M,p)
of F(M,p)
is correct, that is whether
\[f(\operatorname{retr}_p(tX)) = f(p) + t⟨\operatorname{grad} f(p), X⟩ + \mathcal O(t^2)\]
or, in other words, that the error between the function $f$ and its first-order Taylor expansion behaves like $\mathcal O(t^2)$, which indicates that the gradient is correct, cf. also [Bou23, Section 4.8].
Note that if the errors are below the given tolerance and the method is exact, no plot is generated.
Keyword arguments
- check_vector: (true) verify that $\operatorname{grad} f(p) ∈ T_p\mathcal M$ using is_vector.
- exactness_tol: (1e-12) if all errors are below this tolerance, the gradient is considered to be exact.
- io: (nothing) provide an IO to print the result to.
- gradient: (grad_f(M, p)) instead of the gradient function you can also provide the gradient at p directly.
- limits: ((1e-8, 1)) specify the limits in the log_range.
- log_range: (range(limits[1], limits[2]; length=N)) specify the range of points (in log scale) to sample the gradient line.
- N: (101) number of points to verify within the log_range default range $[10^{-8},10^{0}]$.
- plot: (false) whether to plot the result (if Plots.jl is loaded). The plot is in log-log scale; it is returned and can then also be saved.
- retraction_method: (default_retraction_method(M, typeof(p))) retraction method to use.
- slope_tol: (0.1) tolerance for the (global) slope of the approximation.
- atol, rtol: (same defaults as isapprox) tolerances that are passed down to is_vector if check_vector is set to true.
- error: (:none) how to handle errors; possible values: :error, :info, :warn.
- window: (nothing) specify window sizes within the log_range that are used for the slope estimation; the default is to use all window sizes 2:N.
The remaining keyword arguments are also passed down to the check_vector
call, such that tolerances can easily be set.
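A minimal usage sketch (assumed example, not from the docstring): the Riemannian gradient of the halved squared distance $f(p) = \frac{1}{2}d^2(p,q)$ is $-\log_p q$, which the check should confirm with a slope of approximately 2:

```julia
# Hypothetical usage sketch: verify grad f for f(p) = ½ d(p, q)² on the sphere.
using Manopt, Manifolds

M = Sphere(2)
q = [0.0, 0.0, 1.0]

f(M, p) = distance(M, p, q)^2 / 2
grad_f(M, p) = -log(M, p, q)   # Riemannian gradient of the halved squared distance

check_gradient(M, f, grad_f; error=:info)
```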
Manopt.is_Hessian_linear — Function

is_Hessian_linear(M, Hess_f, p,
X=rand(M; vector_at=p), Y=rand(M; vector_at=p), a=randn(), b=randn();
error=:none, io=nothing, kwargs...
)
Verify whether the Hessian function Hess_f
fulfills linearity,
\[\operatorname{Hess} f(p)[aX + bY] = a\operatorname{Hess} f(p)[X] + b\operatorname{Hess} f(p)[Y]\]
which is checked using isapprox
and the keyword arguments are passed to this function.
Optional arguments
- error: (:none) how to handle errors; possible values: :error, :info, :warn.
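A minimal sketch (assumed example, not from the docstring): on the Euclidean space $\mathbb R^3$ the Hessian of $f(p) = \lVert p\rVert^2$ is $2\,\mathrm{id}$, which is clearly linear:

```julia
# Hypothetical usage sketch: check linearity of a simple Hessian on ℝ³.
using Manopt, Manifolds

M = Euclidean(3)
Hess_f(M, p, X) = 2X   # Hessian of f(p) = ‖p‖²

p = rand(M)
is_Hessian_linear(M, Hess_f, p; error=:warn)   # returns true for this Hessian
```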
Manopt.is_Hessian_symmetric — Function

is_Hessian_symmetric(M, Hess_f, p=rand(M), X=rand(M; vector_at=p), Y=rand(M; vector_at=p);
error=:none, io=nothing, atol::Real=0, rtol::Real=atol>0 ? 0 : √eps
)
Verify whether the Hessian function Hess_f
fulfills symmetry, which means that
\[⟨\operatorname{Hess} f(p)[X], Y⟩ = ⟨X, \operatorname{Hess} f(p)[Y]⟩\]
which is checked using isapprox
and the kwargs...
are passed to this function.
Optional arguments
- atol, rtol: with the same defaults as the usual isapprox.
- error: (:none) how to handle errors; possible values: :error, :info, :warn.
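Similarly, a minimal sketch (assumed example) for symmetry of the same Hessian, since $⟨2X, Y⟩ = ⟨X, 2Y⟩$:

```julia
# Hypothetical usage sketch: check symmetry of the same Hessian on ℝ³.
using Manopt, Manifolds

M = Euclidean(3)
Hess_f(M, p, X) = 2X   # Hessian of f(p) = ‖p‖²

is_Hessian_symmetric(M, Hess_f, rand(M); error=:warn)   # returns true for this Hessian
```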
Literature
- [Bou23] N. Boumal. An Introduction to Optimization on Smooth Manifolds. First Edition (Cambridge University Press, 2023).