Theory

Theory#

Setup#

We import the Python packages used throughout the project.

import numpy as np
import matplotlib.pyplot as plt

from numpy.linalg import svd, matrix_rank
from matplotlib.colors import ListedColormap, BoundaryNorm
from PIL import Image

Singular Value Decomposition (SVD)#

For any real matrix \(A \in \mathbb{R}^{m\times n}\), there exist orthogonal matrices \(U \in \mathbb{R}^{m\times m}\) and \(V \in \mathbb{R}^{n\times n}\), and a diagonal (rectangular) matrix \(\Sigma \in \mathbb{R}^{m\times n}\) such that

\[\begin{equation*} A = U \,\Sigma\, V^T. \end{equation*}\]

The diagonal entries of \(\Sigma\) are the singular values \(\sigma_1 \ge \sigma_2 \ge \cdots \ge 0\). They measure how strongly \(A\) stretches space in certain orthogonal directions.

The columns of \(V\) are the right singular vectors (input directions).
The columns of \(U\) are the left singular vectors (output directions).
\(\Sigma\) contains the stretch factors along those directions.

A convenient mental picture of the decomposition is:

\(V^T\): rotate (or reflect) the input coordinates into a special orthonormal basis,
\(\Sigma\): stretch (and possibly squash) along orthogonal axes,
\(U\): rotate (or reflect) again in the output space.

Computing SVD in NumPy#

A = np.array([[1, 2],
              [-1, 1]], dtype=float)

U, s, Vt = svd(A)          # s is a vector of singular values
Sigma = np.zeros_like(A)    # for square A; for rectangular use (m,n)
np.fill_diagonal(Sigma, s)

# Verify reconstruction
A_recon = U @ Sigma @ Vt
np.allclose(A, A_recon)

True

Geometry of SVD in 2D: Unit Circle \(\rightarrow\) Ellipse#

For a full-rank \(2\times2\) matrix \(A\), the image of the unit circle

\[\begin{equation*} S = \{x\in\mathbb{R}^2 \mid \|x\|=1\} \end{equation*}\]

under the linear map \(x \mapsto Ax\) is an ellipse. SVD explains this ellipse precisely:

The right singular vectors \(v_1, v_2\) (columns of \(V\)) are orthonormal directions in the input plane.
Their images \(Av_1\) and \(Av_2\) are orthogonal in the output plane.
The lengths of these images are the singular values: \(\|A v_1\| = \sigma_1\), \(\|A v_2\| = \sigma_2\).
After normalizing \(u_i = \frac{1}{\sigma_i}Av_i\), the vectors \(u_1, u_2\) form an orthonormal basis in the output plane (columns of \(U\)).

So, for any unit vector \(x\), writing it in the \(V\)-basis as \(x = V z\) (with \(\|z\|=1\)) gives

\[\begin{equation*} Ax = U\Sigma V^T x = U\Sigma z. \end{equation*}\]

This means \(\Sigma\) maps the unit circle to an axis-aligned ellipse with semi-axes \(\sigma_1\) and \(\sigma_2\), and then \(U\) rotates that ellipse to its final position.

Singular values and ellipse semi-axes#

The largest singular value \(\sigma_1\) equals the maximum stretch of any unit vector.
The smallest singular value \(\sigma_2\) equals the minimum stretch (for full rank, \(\sigma_2>0\)).

Rank and singular values#

The rank of \(A\) equals the number of non-zero singular values:

\[\begin{equation*} \mathrm{rank}(A) = \#\{i \mid \sigma_i > 0\}. \end{equation*}\]

In practice (floating point), “non-zero” means “larger than a small numerical tolerance.”

A = np.array([[11, 2, 8],
              [ 5,-10, 8],
              [ 3, -6,12],
              [13, -2, 4]], dtype=float)

matrix_rank(A), svd(A, compute_uv=False)

(3, array([24., 12.,  6.]))

Outer-product expansion of \(A\)#

SVD gives a very useful decomposition of \(A\) as a sum of rank-1 matrices:

\[\begin{equation*} A = \sum_{i=1}^{r} \sigma_i\, u_i\, v_i^T, \end{equation*}\]

where \(r=\mathrm{rank}(A)\), \(u_i\) is column \(i\) of \(U\), and \(v_i^T\) is row \(i\) of \(V^T\). Each term \(\sigma_i\,u_i v_i^T\) is a rank-1 “pattern” in \(A\), scaled by \(\sigma_i\).

In computations, the matrix \(u_i v_i^T\) can be formed with np.outer(u_i, v_i).

U, s, Vt = svd(A)
A_from_rank1 = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
np.allclose(A, A_from_rank1)

True

Low-rank approximation \(A_k\)#

If \(\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0\), then the rank-\(k\) approximation is

\[\begin{equation*} A_k = \sum_{i=1}^{k} \sigma_i\, u_i\, v_i^T, \qquad k \le r. \end{equation*}\]

This keeps only the \(k\) most important rank-1 components and discards smaller singular values, giving a compressed representation. Geometrically, it keeps only the \(k\) dominant orthogonal stretch directions.

A compact implementation:

def svd_approximation(A, k):
    U, s, Vt = svd(A, full_matrices=False)
    A_k = np.zeros_like(A, dtype=float)
    for i in range(k):
        A_k += s[i] * np.outer(U[:, i], Vt[i, :])
    return A_k

Storage ratio for SVD compression#

Storing a full \(m\times n\) matrix requires \(mn\) numbers.

A rank-\(r\) SVD-style representation stores:

\(r\) singular values,
\(m r\) numbers for the first \(r\) columns of \(U\),
\(n r\) numbers for the first \(r\) columns of \(V\).

So the number count is \(r(1+m+n)\), and the storage ratio is

\[\begin{equation*} SR(r,m,n)=\frac{r(1+m+n)}{mn}. \end{equation*}\]

If \(r \ll \min(m,n)\), then \(SR\) can be far below 1 (strong compression).
If \(r\) approaches \(\min(m,n)\), the ratio approaches (and may exceed) 1, so compression becomes ineffective.

def storage_ratio(r, m, n):
    return r * (1 + m + n) / (m * n)

Retained information#

A simple way to quantify how much of the matrix “signal” is retained by \(A_k\) is to look at the singular values:

\[\begin{equation*} RI(k,r)=\frac{\sigma_1+\cdots+\sigma_k}{\sigma_1+\cdots+\sigma_r}. \end{equation*}\]

This is a cumulative measure: increasing \(k\) increases \(RI\), with \(RI(r,r)=1\).

def retained_information(A):
    s = svd(A, compute_uv=False)
    return np.cumsum(s / np.sum(s))

def plot_retained_information(A):
    ri = retained_information(A) * 100
    ri = np.insert(ri, 0, 0)
    ks = np.arange(len(ri))
    plt.plot(ks, ri, marker='o')
    plt.xlabel("Number of singular values (k)")
    plt.ylabel("Retained information [%]")
    plt.title("Retained information")
    plt.grid(True, linestyle='--', alpha=0.6)
    plt.show()

SVD for images#

An image can be represented as a matrix of pixel intensities.

Grayscale images#

A grayscale image is a 2D array \(A \in \mathbb{R}^{m\times n}\) with values typically in \([0,255]\) (integers) or normalized to \([0,1]\) (floats). Applying SVD and forming \(A_k\) yields a compressed approximation that often remains visually close to the original for moderate \(k\).

A typical workflow is:

load the image as a matrix \(A\),
compute singular values and inspect their decay,
choose \(k\) based on desired retained information and storage ratio,
reconstruct \(A_k\) and compare visually.

def load_image_matrix(path, max_dim=1000, to_gray=True):
    im = Image.open(path)
    im = im.convert("L" if to_gray else "RGB")

    w, h = im.size
    scale = min(1.0, max_dim / max(w, h))
    if scale < 1.0:
        new_size = (max(1, int(round(w * scale))), max(1, int(round(h * scale))))
        im = im.resize(new_size, Image.LANCZOS)

    arr = np.asarray(im, dtype=np.float64) / 255.0
    return arr

RGB images#

An RGB image is a 3D array of shape \((m,n,3)\). A standard approach is to apply SVD separately to each channel (R, G, B), approximate each channel by rank \(k\), and then stack them back together.

def svd_rgb_approximation(img, k):
    img = np.asarray(img, dtype=float)
    img_recon = np.zeros_like(img, dtype=float)

    for c in range(3):
        img_recon[:, :, c] = svd_approximation(img[:, :, c], k)

    return np.clip(img_recon, 0.0, 1.0)