The index for the UF Collection contains a set of statistics about each
matrix in the collection. A table of these statistics can be loaded
into MATLAB via index=UFget, and by list=UFkinds. Below is a summary
of what they are and how they are computed. The first two are for
the whole collection, and the rest are statistics for each matrix.
- LastRevisionDate:
this is a single string kept in the UF_index MATLAB
struct that states when the collection or the index was last modified.
- DownloadTimeStamp:
the date and time the index that you last downloaded the index.
- Group:
a cell array. Group{id} is the group name for the matrix whose
serial number is 'id'. Each matrix has a unique id number in the range of
1 to the number of matrices in the collection. Once an id is assigned to
a matrix, it never changes.
- Name:
Name{id} is the name of the matrix (excluding the Group). Name{id} is not
unique. The full name of a matrix should always be given as Group/Name.
- nrows: the number of rows in the matrix.
- ncols: the number of columns in the matrix.
- nnz: the number of numerically nonzero entries in the matrix,
or nnz(Problem.A) in MATLAB, where Problem=UFget(id) is a struct containing
the MATLAB format of the problem. This statistic can differ from the number of
'entries' explicitly stored in the matrix, however, since some of these entries
may be numerically zero. In the MATLAB format, these explicit zero entries are
stored in the binary Problem.Zeros matrix, since MATLAB drops all explicit
zeros from its sparse matrix storage. The Problem.A matrix in MATLAB has nnz
entries in it, with no explicit zeros. In the Matrix Market and
Rutherford-Boeing format, a single file holds all entries, both nonzero and the
explicit zero entries.
- nzero: the number of explicit entries present in the matrix
that are provided by the matrix author but which are numerically zero.
nzero(id) is nnz(Problem.Zeros).
- pattern_symmetry: Let S=spones(Problem.A) be the binary pattern
of the explicit nonzeros in the matrix. Let pmatched be the number of
matched offdiagonal entries, where both S(i,j) and S(j,i) are one, with
i not equal to j. Let nzoffdiag be the number of offdiagonal entries in S.
Then pattern_symmetry is the ratio of pmatched/nzoffdiag.
Note that if S(i,j) and S(j,i) are both one, then this pair of entries
is counted twice in both pmatched and nzoffdiag.
If the matrix is rectangular, this statistic is zero.
If there are no offdiagonal entries, the statistic is 1.
- numerical_symmetry:
Let xmatched be the number of matched offdiagonal entries, where
A(i,j) is equal to the complex conjugate of A(j,i) and where i and j are
not equal. Then numerical_symmetry is the ration xmatched / nzoffdiag
(or 1 if nzoffdiag is zero). This statistic is zero for rectangular matrices.
Note that this statistic measures how close a matrix is to being
Hermitian (A=A' in MATLAB). For complex symmetric matrices (A=A.' in
MATLAB), this ratio will be less than one (unless all offdiagonal entries
are real).
- isBinary: 1 if the matrix is binary (all entries are 0 or 1),
0 otherwise.
- isReal: 1 if the matrix is real, 0 if complex.
- nnzdiag: the number of numerically nonzero entries on the
diagonal (nnz (diag (Problem.A)) in MATLAB notation). This statistic
ignores explicit zero entries (Problem.Zeros in the MATLAB struct).
- posdef: 1 if the matrix is known to be symmetric positive
definite (or Hermitian positive definite for the complex case),
0 if it is known not to be, -1 if it is symmetric (or Hermitian)
but with unknown positive-definiteness. If the statistic is unknown (-1)
this may be revised in subsequent versions of the index.
- amd_lnz: Let C=S+S' where S=spones(A) is the binary pattern
of A. Then amd_lnz is number of nonzeros in the Cholesky factorization of
the matrix C(p,p) (assuming C is positive definite) where p=amd(C) is
the AMD fill-reducing ordering. This statistic is -2 for rectangular matrices
or for graph problems. It is -1 if it is not computed.
This figure gives an estimate of the memory requirements for x=A\b in MATLAB
for square matrices.
- amd_flops: The floating-point operation count for computing
the Cholesky factorization of C(p,p) (see above).
- amd_vnz: The number of entries in a Householder-vector
representation of the Q factor of R (but not the QR in MATLAB),
after a COLAMD fill-reducing ordering.
This is an upper bound on L for the LU factorization of A.
- amd_rnz: The number of entries in R for the QR factorization of A,
after a COLAMD fill-reducing ordering.
This is an upper bound on U for the LU factorization of A.
- nblocks: The number of blocks from dmperm
(see dmperm in MATLAB).
- sprank: the structural rank of the matrix, which is the number
of maximual number of nonzero entries that can be permuted to the diagonal
(see dmperm, or sprank in MATLAB). This statistic is not computed
for problems that represent graphs, since in those cases the diagonal
of the matrix is often not relevant (self-edges are often ignored).
- RBtype: the Rutherford Boeing type of the matrix (ignoring
explicit zeros in Problem.Zeros). RBtype is a a 3-letter lower-case
string. The first
letter is:
- r if A is real but not binary
- p if A is binary
- c if A is complex
- i if A is integer
The second letter:
- r if A is rectangular
- u if A is square and unsymmetric
- s if A is symmetric (nnz(A-A.') is zero in MATLAB)
- h if A is Hermitian (nnz(A-A') is zero in MATLAB)
- z if A is skew-symmetric (nnz(A+A.') is zero in MATLAB)
The third letter is always 'a' (for 'assembled').
The RB format allows for unassembled finite-element matrices, but they
are converted to assembled format for this collection.
- cholcand: 1 if the matrix is symmetric (Hermitian if complex)
and if all the diagonal entries are postive and real. Zero otherwise.
If 1, the matrix is a candidate for a Cholesky factorization, which
MATLAB will first try when computing x=A\b.
- ncc: the number of of strongly-connected components in the
graph of A (if A is square) or in the bipartite graph if A is rectangular.
The diagonal is ignored.
- isND: 1 if the problem comes from a 2D/3D discretization,
zero otherwise.
This determination is not a property of the matrix,
but a qualitative assesment of the kind of problem the matrix represents.
- isGraph: 1 if the problem is best considered as a graph rather
than a system of equations, zero otherwise.
This determination is not a property of the matrix,
but a qualitative assesment of the kind of problem the matrix represents.
UFstats.csv
A CSV file is also available with some of this index information
(UFstats.csv). The first line of the CSV file gives the number of matrices in
the collection, and the second line gives the LastRevisionDate.
Line k+2 in the file lists the following statistics for the matrix whose
id number is k: Group, Name, nrows, ncols, nnz, isReal, isBinary, isND,
posdef, pattern_symmetry, numerical_symmetry, and
kind.
SVD-based statistics
The following statistics are not (yet) in the UFindex. They are currently
available only on the web page for each matrix. You can also
download the singular values at
www.cise.ufl.edu/research/sparse/svd.
These were typically calculated with s=svd(full(A)) in MATLAB, are are thus
only available for modest-sized matrices.
- norm(A): the 2-norm of A (the largest singular value)
- min(svd(A)): the smallest singular value
- cond(A): the 2-norm condition number, which is the ratio of
the largest over the smallest singular value.
- rank(A): the rank of the matrix, which is the number of
singular values larger than the tolerance of
max(m,n)*eps(norm(A)). This tolerance is plotted in green in the
figure.
- sprank(A)-rank(A): sprank(A) (see above)
is an upper bound on the rank of A.
- null space dimension: the dimension of the null space
(zero if it has full numerical rank). This is simply min(m,n)-rank(A).
- full numerical rank?: 'yes' or 'no'.
- singular value gap: if k=rank(A), and the
matrix is rank deficient, then this is the ratio s(k)/s(k+1).
A red line between the kth and (k+1)st singular value is drawn to
illustrate this gap.
- singular values: these can be downloaded as a MATLAB MAT-file.
Each file contains a struct with the fields: s (a vector containing
the singular values), how (a string stating how the SVD was
computed), and status (a string that is either 'ok' or a warning).
If the status shows that the SVD did not converge, the singular
values are probably not computed accurately.