Singular Value Decomposition
We redesign the API of the SVD function of SciPy with wish.
Context¶
The SciPy library implements singular value decomposition (SVD), a matrix factorization that is instrumental in numerical analysis, machine learning and data science.
The SVD of a matrix A
provides three matrices U
, Sigma
and V
.
The matrix S
has the same shape as A
and is diagonal
(S[i,j] == 0
unless i == j
). The matrices U
and V
are unitary.
Refer for example to SVD on Wikipedia
for more information.
In any case, to compute A
from U
, S
and V
, do:
>>> from numpy import dot
>>> Vh = V.conjugate().transpose()
>>> A = dot(U, dot(S, Vh))
Basic API¶
The svd
function of SciPy works like this: you provide the matrix A
and it computes a matrix U
, a vector s
and a matrix Vh
1:
>>> from scipy.linalg import svd
>>> U, s, Vh = svd(A)
The default convention deviates a little from the definition of the SVD:
the vector s
is the diagonal of S
instead of S
itself
and Vh
corresponds to the conjugate transpose of V
instead of V
.
While there are good reasons for these conventions,
I'd rather follow the principle of least astonishment.
Thus, by default our SVD
implementation returns the triple (U, S, V)
:
>>> from wish.examples import svd
>>> U, S, V = svd(A)
Say that our starting point is
def svd(A):
U, S, V = ...
return U, S, V
To enable the simplest version of selectable return values, we change the code to:
import wish
def svd(A, returns="U, S, V"):
U, S, V = ...
return wish.grant(returns)
If you want to check the names that the user requires, do instead
def svd(A, returns="U, S, V"):
wishlist = wish.make(returns)
for name in wishlist:
if name not in ["U", "S", "V"]:
error = "{0!r} is not a valid return value"
raise NameError(error.format(name))
U, S, V = ...
return wishlist.grant()
Now, we believe that it should be possible to get the same output of SciPy. Tt's rather simple since we have enabled selectable return values:
>>> U, s, Vh = svd(A, returns="U, s, Vh")
The corresponding implementation is:
import numpy
import wish
def svd(A, returns="U, S, V"):
wishlist = wish.make(returns)
for name in wishlist:
if name not in ["U", "S", "V", "s", "Vh"]:
error = "{0!r} is not a valid return value"
raise NameError(error.format(name))
U, S, V = ...
if "s" in wishlist:
s = numpy.diagonal(S)
if "Vh" in wishlist:
Vh = V.conjugate().transpose()
return wishlist.grant()
Singular Values¶
We frequently need to compute only the singular values of a matrix.
The scipy svd
achieves this with an additional option compute_uv
that can be set to False
as describe in the documentation:
compute_uv: bool, optional.
Whether to compute also U
and Vh
in addition to s
.
Default is True.
It is used like this:
>>> from scipy.linalg import svd
>>> s = svd(M, compute_uv=False)
This interface is not totally satisfactory:
while we easily understand that we won't
compute -- and therefore won't return -- U
and Vh
,
it is not totally obvious what is returned.
And if in the first place you were only interested
in the s
array, you may not know what U
and Vh
are ...
There is nothing here that cannot be solved by reading
the documentation of svd
, but still this is not optimal.
The SciPy folks are apparently aware of this because they have implemented
a helper function, svdvals
, to avoid calling svd
in this case:
svdvals(a, overwrite_a=False, check_finite=True)
Compute singular values of a matrix.
With svdvals
, we can simply do
>>> from scipy.linalg import svdvals
>>> s = svdvals(A)
but of course now, you have another function to remember of.
And while it is a one-liner, its full documentation essentially
duplicates the one of svd
.
With the wish version, the equivalent function call is simply
>>> from wish.examples import svd
>>> s = svd(A, returns="s")
We know what is returned and a no new function is necessary.
Now, if we have a simpler algorithm to provide only S
,
we may check the list of wishes and proceed accordingly:
import numpy
import wish
def svd(A, returns="U, S, V"):
wishlist = wish.make(returns)
for name in wishlist:
if name not in ["U", "S", "V", "s", "Vh"]:
error = "{0!r} is not a valid return value"
raise NameError(error.format(name))
if "U" in wishlist or "V" in wishlist or "Vh" in wishlist:
U, S, V = ...
else:
S = ...
if "s" in wishlist:
s = numpy.diagonal(S)
if "Vh" in wishlist:
Vh = V.conjugate().transpose()
return wishlist.grant()
Reconstruction¶
Since SciPy svd
returns the diagonal s
of S
instead of S
itself,
it comes with the helper function diagsvd
to build
S
from the singular values:
diagsvd(s, M, N)
Construct the sigma matrix in SVD from singular values and size M
, N
.
It is used like this:
>>> U, s, Vh = svd(A)
>>> S = diagsvd(s, shape(U)[1], shape(Vh)[0])
The matrix S
is typically used to reconstruct A
:
>>> from numpy import dot
>>> A = dot(U, dot(S, Vh))
Of course with the wish-enabled version, the same effect is obtained with
>>> from wish.examples import svd
>>> U, S, Vh = wsvd(A, returns="U, S, Vh")
>>> A = dot(U, dot(S, Vh))
Now, if you don't want to reconstruct the matrix A
but a version of it
based on a different set of singular values -- for example to achieve
some dimensionality reduction
-- this is easy because NumPy already provides
the appropriate functions. First, get the singular values with
>>> U, S, Vh, s = w_svd(A, returns="U, S, Vh, s")
then replace the vector of singular values s
with a modified vector s2
and inject the result in S
>>> s2 = ...
>>> from numpy import fill_diagonal
>>> fill_diagonal(S, s)
>>> A2 = dot(U, dot(S, Vh))
The New API¶
The redesigned svd
function now has arguably a simpler and more consistent
interface:
-
The simplest and most common usage follows closely the domain conventions,
-
The "configurable return values" pattern is supported with the extra parameter
returns
-
The ad hoc parameter
compute_uv
is not needed anymore. -
Two helper functions
svdvals
anddiagsvd
are not needed anymore
The description of parameters and returned values is only slightly different2:
Parameters
-
A
:(M, N)
array_likeMatrix to decompose.
-
...
-
returns
:string
, optionalSelect the returned values, among
U
,S
,s
,V
andVh
. Default is"U, S, V"
.
Returns
The selection of return values is configurable with the returns
parameter.
-
U
:ndarray
Unitary matrix having left singular vectors as columns.
-
S
:ndarray
A matrix with the singular values of
A
, sorted in non-increasing order, in the main diagonal and zeros elsewhere. -
V
:ndarray
Unitary matrix having right singular vectors as columns.
-
s
:ndarray
, not returned by defaultThe singular values, sorted in non-increasing order.
-
Vh
:ndarray
, not returned by defaultConjugate transpose of
V
-
For example, if
>>> A = [[ 0.0, -2.0, 0.0], ... [ 1.0, 0.0, 0.0]]
then
>>> U array([[1., 0.], [ 0., 1.]]) >>> s array([ 2., 1.]) >>> Vh array([[ 0., -1., 0.], [ 1., 0., 0.], [ 0., 0., 1.]])
-
we have omitted any reference to the the parameters
full_matrices
,overwrite_a
,check_finite
andlapack_driver
since the behavior ofsvd
has not changed in this respect. Refer to the Scipy documentation for details. ↩