import matplotlib
if not hasattr(matplotlib.RcParams, "_get"):
matplotlib.RcParams._get = dict.get
Theory#
Setup#
We import the Python packages used throughout the project.
import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import svd, matrix_rank
from matplotlib.colors import ListedColormap, BoundaryNorm
from PIL import Image
Singular Value Decomposition (SVD)#
For any real matrix \(A \in \mathbb{R}^{m\times n}\), there exist orthogonal matrices \(U \in \mathbb{R}^{m\times m}\) and \(V \in \mathbb{R}^{n\times n}\), and a diagonal (rectangular) matrix \(\Sigma \in \mathbb{R}^{m\times n}\) such that
The diagonal entries of \(\Sigma\) are the singular values \(\sigma_1 \ge \sigma_2 \ge \cdots \ge 0\). They measure how strongly \(A\) stretches space in certain orthogonal directions.
The columns of \(V\) are the right singular vectors (input directions).
The columns of \(U\) are the left singular vectors (output directions).
\(\Sigma\) contains the stretch factors along those directions.
A convenient mental picture of the decomposition is:
\(V^T\): rotate (or reflect) the input coordinates into a special orthonormal basis,
\(\Sigma\): stretch (and possibly squash) along orthogonal axes,
\(U\): rotate (or reflect) again in the output space.
Computing SVD in NumPy#
A = np.array([[1, 2],
[-1, 1]], dtype=float)
U, s, Vt = svd(A) # s is a vector of singular values
Sigma = np.zeros_like(A) # for square A; for rectangular use (m,n)
np.fill_diagonal(Sigma, s)
# Verify reconstruction
A_recon = U @ Sigma @ Vt
np.allclose(A, A_recon)
True
Geometry of SVD in 2D: Unit Circle \(\rightarrow\) Ellipse#
For a full-rank \(2\times2\) matrix \(A\), the image of the unit circle
under the linear map \(x \mapsto Ax\) is an ellipse. SVD explains this ellipse precisely:
The right singular vectors \(v_1, v_2\) (columns of \(V\)) are orthonormal directions in the input plane.
Their images \(Av_1\) and \(Av_2\) are orthogonal in the output plane.
The lengths of these images are the singular values: \(\|A v_1\| = \sigma_1\), \(\|A v_2\| = \sigma_2\).
After normalizing \(u_i = \frac{1}{\sigma_i}Av_i\), the vectors \(u_1, u_2\) form an orthonormal basis in the output plane (columns of \(U\)).
So, for any unit vector \(x\), writing it in the \(V\)-basis as \(x = V z\) (with \(\|z\|=1\)) gives
This means \(\Sigma\) maps the unit circle to an axis-aligned ellipse with semi-axes \(\sigma_1\) and \(\sigma_2\), and then \(U\) rotates that ellipse to its final position.
Singular values and ellipse semi-axes#
The largest singular value \(\sigma_1\) equals the maximum stretch of any unit vector.
The smallest singular value \(\sigma_2\) equals the minimum stretch (for full rank, \(\sigma_2>0\)).
Rank and singular values#
The rank of \(A\) equals the number of non-zero singular values:
In practice (floating point), “non-zero” means “larger than a small numerical tolerance.”
A = np.array([[11, 2, 8],
[ 5,-10, 8],
[ 3, -6,12],
[13, -2, 4]], dtype=float)
matrix_rank(A), svd(A, compute_uv=False)
(3, array([24., 12., 6.]))
Outer-product expansion of \(A\)#
SVD gives a very useful decomposition of \(A\) as a sum of rank-1 matrices:
where \(r=\mathrm{rank}(A)\), \(u_i\) is column \(i\) of \(U\), and \(v_i^T\) is row \(i\) of \(V^T\). Each term \(\sigma_i\,u_i v_i^T\) is a rank-1 “pattern” in \(A\), scaled by \(\sigma_i\).
In computations, the matrix \(u_i v_i^T\) can be formed with np.outer(u_i, v_i).
U, s, Vt = svd(A)
A_from_rank1 = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
np.allclose(A, A_from_rank1)
True
Low-rank approximation \(A_k\)#
If \(\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0\), then the rank-\(k\) approximation is
This keeps only the \(k\) most important rank-1 components and discards smaller singular values, giving a compressed representation. Geometrically, it keeps only the \(k\) dominant orthogonal stretch directions.
A compact implementation:
def svd_approximation(A, k):
U, s, Vt = svd(A, full_matrices=False)
A_k = np.zeros_like(A, dtype=float)
for i in range(k):
A_k += s[i] * np.outer(U[:, i], Vt[i, :])
return A_k
Storage ratio for SVD compression#
Storing a full \(m\times n\) matrix requires \(mn\) numbers.
A rank-\(r\) SVD-style representation stores:
\(r\) singular values,
\(m r\) numbers for the first \(r\) columns of \(U\),
\(n r\) numbers for the first \(r\) columns of \(V\).
So the number count is \(r(1+m+n)\), and the storage ratio is
If \(r \ll \min(m,n)\), then \(SR\) can be far below 1 (strong compression).
If \(r\) approaches \(\min(m,n)\), the ratio approaches (and may exceed) 1, so compression becomes ineffective.
def storage_ratio(r, m, n):
return r * (1 + m + n) / (m * n)
Retained information#
A simple way to quantify how much of the matrix “signal” is retained by \(A_k\) is to look at the singular values:
This is a cumulative measure: increasing \(k\) increases \(RI\), with \(RI(r,r)=1\).
def retained_information(A):
s = svd(A, compute_uv=False)
return np.cumsum(s / np.sum(s))
def plot_retained_information(A):
ri = retained_information(A) * 100
ri = np.insert(ri, 0, 0)
ks = np.arange(len(ri))
plt.plot(ks, ri, marker='o')
plt.xlabel("Number of singular values (k)")
plt.ylabel("Retained information [%]")
plt.title("Retained information")
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
SVD for images#
An image can be represented as a matrix of pixel intensities.
Grayscale images#
A grayscale image is a 2D array \(A \in \mathbb{R}^{m\times n}\) with values typically in \([0,255]\) (integers) or normalized to \([0,1]\) (floats). Applying SVD and forming \(A_k\) yields a compressed approximation that often remains visually close to the original for moderate \(k\).
A typical workflow is:
load the image as a matrix \(A\),
compute singular values and inspect their decay,
choose \(k\) based on desired retained information and storage ratio,
reconstruct \(A_k\) and compare visually.
def load_image_matrix(path, max_dim=1000, to_gray=True):
im = Image.open(path)
im = im.convert("L" if to_gray else "RGB")
w, h = im.size
scale = min(1.0, max_dim / max(w, h))
if scale < 1.0:
new_size = (max(1, int(round(w * scale))), max(1, int(round(h * scale))))
im = im.resize(new_size, Image.LANCZOS)
arr = np.asarray(im, dtype=np.float64) / 255.0
return arr
RGB images#
An RGB image is a 3D array of shape \((m,n,3)\). A standard approach is to apply SVD separately to each channel (R, G, B), approximate each channel by rank \(k\), and then stack them back together.
def svd_rgb_approximation(img, k):
img = np.asarray(img, dtype=float)
img_recon = np.zeros_like(img, dtype=float)
for c in range(3):
img_recon[:, :, c] = svd_approximation(img[:, :, c], k)
return np.clip(img_recon, 0.0, 1.0)