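import matplotlib
# Compatibility shim: provide the private RcParams._get accessor (which some tools call) on Matplotlib versions that lack it
if not hasattr(matplotlib.RcParams, "_get"):
    matplotlib.RcParams._get = dict.get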

Theory

Setup

We import the necessary Python packages.

import numpy as np
import matplotlib.pyplot as plt

1. From scalar functions to vector functions

From earlier mathematics courses you know scalar functions such as

\[ f(x)=x^{2}, \qquad x\in\mathbb{R}, \qquad (f:\mathbb{R}\to \mathbb{R}). \]

For a value \(x\in\mathbb{R}\) the function returns a value \(f(x)\in\mathbb{R}\). For example \(f(3)=9\).

The range of \(f(x)=x^2\) is

\[ \mathrm{range}(f)=[0,\infty[, \]

because squaring a real number can never produce a negative value, while every \(y\ge 0\) is attained, namely as \(f(\sqrt{y})=y\).

We will focus on a special type of function that takes vectors as input and returns vectors as output, for example

\[ f:\mathbb{R}^{2}\to\mathbb{R}^{2}. \]

To define these functions in a practical way, we introduce matrices.

2. Matrices

A real \(m\times n\) matrix is a rectangular table of numbers with \(m\) rows and \(n\) columns.

Examples

\[\begin{split} M_1 = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \qquad\text{is a }2\times 2\text{ matrix.} \end{split}\]
\[\begin{split} M_2=\begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6\end{bmatrix} \qquad\text{is a }2\times 3\text{ matrix.} \end{split}\]

A matrix is described by its rows and columns. For \(M_2\),

  • row 1 is \(\begin{bmatrix}1&2&3\end{bmatrix}\),

  • row 2 is \(\begin{bmatrix}4&5&6\end{bmatrix}\),

and the columns are

\[\begin{split} \begin{bmatrix}1\\4\end{bmatrix},\quad \begin{bmatrix}2\\5\end{bmatrix},\quad \begin{bmatrix}3\\6\end{bmatrix}. \end{split}\]
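In NumPy a matrix can be stored as a two-dimensional array. The sketch below reads off the shape, one row and one column of \(M_2\); note that NumPy indexes from zero, so "row 1" is index 0 and "column 3" is index 2.

```python
import numpy as np

# The 2x3 matrix M_2 from the example above
M2 = np.array([[1, 2, 3],
               [4, 5, 6]])

m, n = M2.shape      # number of rows, number of columns
row1 = M2[0, :]      # first row (index 0)
col3 = M2[:, 2]      # third column (index 2)

print(m, n)          # 2 3
print(row1)          # [1 2 3]
print(col3)          # [3 6]
```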

3. Matrix–vector products

Let

\[ M\in\mathbb{R}^{m\times n}, \qquad \mathbf{v}\in\mathbb{R}^{n}. \]

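Writing \(\mathbf{r}_1,\dots,\mathbf{r}_m\) for the rows of \(M\), the matrix–vector product \(M\mathbf{v}\) is defined by taking the dot product of each row with \(\mathbf{v}\) (this requires the number of columns of \(M\) to equal the length of \(\mathbf{v}\)):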

\[\begin{split} M\mathbf{v}= \begin{bmatrix} \mathbf{r}_1\cdot \mathbf{v}\\ \mathbf{r}_2\cdot \mathbf{v}\\ \vdots\\ \mathbf{r}_m\cdot \mathbf{v} \end{bmatrix} \in\mathbb{R}^{m}. \end{split}\]

Example

\[\begin{split} M=\begin{bmatrix}1 & 2 \\ 3 & 4\end{bmatrix},\qquad \mathbf{v}=\begin{bmatrix}-1\\1\end{bmatrix} \end{split}\]
\[\begin{split} M\mathbf{v}= \begin{bmatrix} 1\cdot(-1)+2\cdot 1\\ 3\cdot(-1)+4\cdot 1 \end{bmatrix} = \begin{bmatrix}1\\1\end{bmatrix}. \end{split}\]

In NumPy, matrix products (including matrix–vector products) are computed with the @ operator:

M = np.array([[1,2],[3,4]])
v = np.array([-1,1])
print(M @ v)
[1 1]
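To connect the @ operator with the row-wise definition, here is a small sketch that computes \(M\mathbf{v}\) literally as one dot product per row. The helper name matvec_by_rows is hypothetical and only meant for illustration; in practice you would simply use M @ v.

```python
import numpy as np

def matvec_by_rows(M, v):
    """Hypothetical helper: compute M v as the dot product of each row of M with v."""
    M = np.asarray(M)
    v = np.asarray(v)
    if M.shape[1] != v.shape[0]:
        raise ValueError("number of columns of M must equal the length of v")
    # entry i of the result is r_i . v, where r_i is row i of M
    return np.array([np.dot(M[i, :], v) for i in range(M.shape[0])])

M = np.array([[1, 2], [3, 4]])
v = np.array([-1, 1])
print(matvec_by_rows(M, v))                         # [1 1]
print(np.array_equal(matvec_by_rows(M, v), M @ v))  # True
```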

4. Linear transformations

A linear transformation (also called a linear mapping) is a function of the form

\[ f:\mathbb{R}^{n}\to\mathbb{R}^{m},\qquad f(\mathbf{v})=A\mathbf{v}, \]

where \(A\in\mathbb{R}^{m\times n}\) is fixed.
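A map of the form \(f(\mathbf{v})=A\mathbf{v}\) respects linear combinations, \(f(a\mathbf{u}+b\mathbf{w})=a f(\mathbf{u})+b f(\mathbf{w})\), since \(A(\mathbf{u}+\mathbf{w})=A\mathbf{u}+A\mathbf{w}\) and \(A(c\mathbf{v})=c\,A\mathbf{v}\). The sketch below checks this numerically; the matrix, vectors and scalars are arbitrary illustrative choices.

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [0.5,  3.0]])      # an arbitrary fixed 2x2 matrix (illustrative)

def f(v):
    """The linear transformation f(v) = A v."""
    return A @ v

u = np.array([1.0, 2.0])
w = np.array([-3.0, 0.5])
a, b = 2.0, -1.5

lhs = f(a*u + b*w)             # transform the linear combination
rhs = a*f(u) + b*f(w)          # combine the transformed vectors
print(np.allclose(lhs, rhs))   # True
```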

5. Column picture

Let \(A\in\mathbb{R}^{2\times 2}\) with columns \(\mathbf{s}_1,\mathbf{s}_2\):

\[\begin{split} A= \begin{bmatrix} | & |\\ \mathbf{s}_1 & \mathbf{s}_2\\ | & | \end{bmatrix}. \end{split}\]

For \(\mathbf{v}=\begin{bmatrix}v_1\\v_2\end{bmatrix}\),

\[ A\mathbf{v}=v_1\mathbf{s}_1+v_2\mathbf{s}_2. \]

In other words, the image of \(f(\mathbf{v})=A\mathbf{v}\) (the set of all outputs) consists of all linear combinations of the columns of \(A\).
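A quick numerical check of \(A\mathbf{v}=v_1\mathbf{s}_1+v_2\mathbf{s}_2\), using the same example values for \(\mathbf{s}_1\), \(\mathbf{s}_2\), \(v_1\), \(v_2\) as the plotting cell in this section. The matrix is assembled from its columns with np.column_stack.

```python
import numpy as np

s1 = np.array([1.0, 3.0])        # first column
s2 = np.array([2.0, 0.5])        # second column
A = np.column_stack([s1, s2])    # A has s1 and s2 as its columns

v = np.array([2.0, 3.0])         # v = (v1, v2)
row_picture = A @ v                     # row-wise dot products
column_picture = v[0]*s1 + v[1]*s2      # v1*s1 + v2*s2
print(row_picture, column_picture)
print(np.allclose(row_picture, column_picture))   # True
```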

The following code cell visualizes the decomposition \(A\mathbf{v}=v_1\mathbf{s}_1+v_2\mathbf{s}_2\):

# --- input (experiment with the numerical values here) ---
s1 = np.array([1, 3])      # first column
s2 = np.array([2, 0.5])    # second column
v1, v2 = 2, 3              # coefficients
# ------------------------------------------

f_v = v1*s1 + v2*s2

fig, ax = plt.subplots()

# draw the scaled columns v1*s1 and v2*s2; dashed lines complete the parallelogram
ax.quiver(0, 0, *v1*s1, angles='xy', scale_units='xy', scale=1, color="blue")
ax.text(*v1*s1/2, r'$v_1\mathbf{s}_1$', fontsize=12, color="k", ha='left', va='top')
ax.plot([v1*s1[0], f_v[0]], [v1*s1[1], f_v[1]], 'b--', linewidth=1)

ax.quiver(0, 0, *v2*s2, angles='xy', scale_units='xy', scale=1, color="blue")
ax.text(*v2*s2/2, r'$v_2\mathbf{s}_2$', fontsize=12, color="k", ha='left', va='top')
ax.plot([v2*s2[0], f_v[0]], [v2*s2[1], f_v[1]], 'b--', linewidth=1)

# draw columns
ax.quiver(0, 0, *s1, angles='xy', scale_units='xy', scale=1, color="green")
ax.text(*s1/2, r'$\mathbf{s}_1$', fontsize=12, color="k", ha='right', va='bottom')
ax.quiver(0, 0, *s2, angles='xy', scale_units='xy', scale=1, color="green")
ax.text(*s2/2, r'$\mathbf{s}_2$', fontsize=12, color="k", ha='right', va='bottom')

# draw f(v)
ax.quiver(0, 0, *f_v, angles='xy', scale_units='xy', scale=1, color="r")
ax.text(*f_v/2, r'$f(\mathbf{v})$', fontsize=12, color="k", ha='right', va='bottom')

# plot appearance
ax.axhline(0, color="black", linewidth=1)
ax.axvline(0, color="black", linewidth=1)
ax.set_aspect("equal")
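# axis limits: include the origin, both columns, both scaled columns and f(v),
# so that no arrow is cut off when the input values are changed
xs = [0, s1[0], s2[0], v1*s1[0], v2*s2[0], f_v[0]]
ys = [0, s1[1], s2[1], v1*s1[1], v2*s2[1], f_v[1]]
ax.set_xlim(min(xs) - 1, max(xs) + 1)
ax.set_ylim(min(ys) - 1, max(ys) + 1)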
ax.grid(True, which='both', linestyle='--', linewidth=0.5)
ax.set_xlabel("$x$")
ax.set_ylabel("$y$")
plt.show()

6. Rank and invertibility

The rank of a matrix \(A\) is the dimension of the space spanned by its columns (equivalently: the maximal number of linearly independent columns).

  • For a matrix in \(\mathbb{R}^{2\times 2}\), the rank can be \(0\), \(1\), or \(2\).

  • If \(\mathrm{rank}(A)=2\), the columns are linearly independent (neither is zero and they are not parallel), the image is all of \(\mathbb{R}^2\), and \(A\) is invertible.

  • If \(\mathrm{rank}(A)<2\), the transformation “collapses” the plane onto a line through the origin (rank \(1\)) or onto the origin itself (rank \(0\)), and \(A\) is not invertible.

In NumPy, the rank can be computed with np.linalg.matrix_rank:

# Example:
A = np.array([[1, 1], [-1, 1]])
print(np.linalg.matrix_rank(A))
2
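The three cases from the list above can be illustrated with one matrix of each rank. The example matrices are illustrative choices: A_line has a second column equal to twice the first, so both columns lie on the same line. The sketch also shows that np.linalg.inv raises np.linalg.LinAlgError for this rank-deficient (singular) matrix.

```python
import numpy as np

A_full = np.array([[1, 1], [-1, 1]])   # columns not parallel
A_line = np.array([[1, 2], [2, 4]])    # second column = 2 * first column
A_zero = np.zeros((2, 2))              # zero matrix

for name, A in [("A_full", A_full), ("A_line", A_line), ("A_zero", A_zero)]:
    print(name, "rank =", np.linalg.matrix_rank(A))   # expected ranks: 2, 1, 0

# A rank-deficient matrix has no inverse; NumPy signals this with LinAlgError
try:
    np.linalg.inv(A_line)
    line_invertible = True
except np.linalg.LinAlgError:
    line_invertible = False
print("A_line invertible:", line_invertible)          # False
```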

7. Rotations

A rotation about the origin by an angle \(\theta\) (in radians) in the positive, i.e. counterclockwise, direction is given by

\[\begin{split} R(\theta)= \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix}. \end{split}\]

Then \(R(\theta)\mathbf{v}\) is \(\mathbf{v}\) rotated by \(\theta\), and importantly the length is preserved:

\[ \lVert R(\theta)\mathbf{v}\rVert = \lVert \mathbf{v}\rVert. \]
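Length preservation can be checked numerically with np.linalg.norm. In this sketch, rotation is a hypothetical helper that builds \(R(\theta)\); the vector \((3,4)\) has length \(5\), and the angles are arbitrary test values.

```python
import numpy as np

def rotation(theta):
    """Hypothetical helper: the rotation matrix R(theta), counterclockwise by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

v = np.array([3.0, 4.0])                 # ||v|| = 5
for theta in [0.3, np.pi/2, 2.0, -1.0]:
    w = rotation(theta) @ v
    print(round(theta, 3), np.linalg.norm(w))   # the norm stays 5 up to rounding
```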

A simple visualization of \(\mathbf{e}_1\) and its rotation \(R(\pi/4)\mathbf{e}_1\):

def draw_vectors(vector_list):
    """Draw each 2D vector in vector_list as an arrow from the origin."""
    fig, ax = plt.subplots()
    for v in vector_list:
        ax.quiver(0, 0, *v, angles='xy', scale_units='xy', scale=1, color="blue")
    all_x = [v[0] for v in vector_list] + [0]
    all_y = [v[1] for v in vector_list] + [0]
    ax.set_xlim(min(all_x) - 1, max(all_x) + 1)
    ax.set_ylim(min(all_y) - 1, max(all_y) + 1)
    ax.axhline(0, color="black", linewidth=1)
    ax.axvline(0, color="black", linewidth=1)
    ax.set_aspect("equal")
    ax.grid(True, which='both', linestyle='--', linewidth=0.5)
    ax.set_xlabel("$x$")
    ax.set_ylabel("$y$")
    plt.show()

e1 = np.array([1,0])
theta = np.pi/4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
draw_vectors([e1, R @ e1])

8. Matrix–matrix products and composition of transformations

If \(M\in\mathbb{R}^{m\times n}\) and \(N\in\mathbb{R}^{n\times k}\), the product \(MN\) is an \(m\times k\) matrix.

The key interpretation is composition:

\[ \mathbf{v}\xmapsto{\,N\,} N\mathbf{v} \xmapsto{\,M\,} M(N\mathbf{v}) = (MN)\mathbf{v}. \]

So multiplying matrices corresponds to applying linear maps one after the other. Note the order: \((MN)\mathbf{v}\) applies \(N\) first and then \(M\).
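The composition rule \(M(N\mathbf{v})=(MN)\mathbf{v}\) can be verified for a concrete example. Here \(M\) and \(\mathbf{v}\) are taken from the matrix–vector example, and \(N=R(\pi/2)\) is an illustrative choice; the last line shows that swapping the order generally gives a different matrix.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
N = np.array([[0, -1], [1, 0]])    # rotation by pi/2
v = np.array([-1, 1])

step_by_step = M @ (N @ v)         # first apply N, then M
combined = (M @ N) @ v             # apply the single matrix MN
print(step_by_step, combined)           # [-3 -7] [-3 -7]
print(np.array_equal(M @ N, N @ M))     # False: the order matters
```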

9. Inverse matrices

A square matrix \(A\in\mathbb{R}^{n\times n}\) is invertible if there exists a matrix \(A^{-1}\) such that

\[ A^{-1}A = AA^{-1} = I, \]

where \(I\) is the identity matrix.

For a rotation matrix \(R(\theta)\),

\[\begin{split} R(\theta)^{-1}=R(-\theta)= \begin{bmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{bmatrix}. \end{split}\]
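A numerical check that \(R(-\theta)\) really is the inverse of \(R(\theta)\): np.linalg.inv computes the inverse, and multiplying by \(R(-\theta)\) should give the identity matrix np.eye(2). The helper rotation and the angle \(0.8\) are illustrative assumptions.

```python
import numpy as np

def rotation(theta):
    """Hypothetical helper: the rotation matrix R(theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

theta = 0.8
R = rotation(theta)
R_inv = np.linalg.inv(R)

print(np.allclose(R_inv, rotation(-theta)))           # True: R(theta)^{-1} = R(-theta)
print(np.allclose(rotation(-theta) @ R, np.eye(2)))   # True: R(-theta) R(theta) = I
```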

10. Diagonal matrices: scaling and coordinate-wise effects

A diagonal matrix has the form

\[\begin{split} D= \begin{bmatrix} d_1 & 0\\ 0 & d_2 \end{bmatrix}. \end{split}\]

Then

\[\begin{split} D\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}d_1 x\\ d_2 y\end{bmatrix}. \end{split}\]

So diagonal matrices scale the \(x\)- and \(y\)-coordinates separately.
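In NumPy, np.diag builds a diagonal matrix from a list of diagonal entries. The sketch below, with illustrative values \(d_1=2\), \(d_2=0.5\) and the point \((3,4)\), shows that \(D\) scales each coordinate separately, which is the same as multiplying coordinate-wise by \((d_1,d_2)\).

```python
import numpy as np

d1, d2 = 2.0, 0.5
D = np.diag([d1, d2])            # [[2, 0], [0, 0.5]]
p = np.array([3.0, 4.0])         # the point (x, y)

print(D @ p)                     # [6. 2.], i.e. (d1*x, d2*y)
# same result as multiplying coordinate-wise by (d1, d2)
print(np.allclose(D @ p, np.array([d1, d2]) * p))   # True
```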

11. Change of coordinates (rotating the coordinate system)

Suppose we have an \(x\)-\(y\) coordinate system and a second \(x'\)-\(y'\) system whose axes are obtained by rotating the \(x\)- and \(y\)-axes by \(\theta\).

  • If a vector has coordinates \((x',y')\) in the rotated system, then its coordinates \((x,y)\) in the original system satisfy:

\[\begin{split} \begin{pmatrix}x\\y\end{pmatrix} = R(\theta)\begin{pmatrix}x'\\y'\end{pmatrix}. \end{split}\]

  • Conversely,

\[\begin{split} \begin{pmatrix}x'\\y'\end{pmatrix} = R(-\theta)\begin{pmatrix}x\\y\end{pmatrix}. \end{split}\]

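This is the same idea as “rotating the axes” versus “rotating the vector”: rotating the axes by \(\theta\) changes the coordinates of a fixed vector exactly as rotating the vector by \(-\theta\) would, so the two operations are inverses of each other.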
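A small round-trip example of the two formulas. The rotated axes point along the columns of \(R(\theta)\), that is along \(R(\theta)\mathbf{e}_1\) and \(R(\theta)\mathbf{e}_2\), so converting \((x',y')\) to \((x,y)\) is the column picture from section 5 again. The helper rotation, the angle \(\pi/6\) and the coordinates \((2,1)\) are illustrative assumptions.

```python
import numpy as np

def rotation(theta):
    """Hypothetical helper: the rotation matrix R(theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

theta = np.pi / 6                   # the x'-y' axes are rotated by 30 degrees
xy_prime = np.array([2.0, 1.0])     # coordinates (x', y') in the rotated system

xy = rotation(theta) @ xy_prime     # coordinates (x, y) in the original system
back = rotation(-theta) @ xy        # convert back to (x', y')
print(xy)
print(np.allclose(back, xy_prime))  # True

# the rotated axes have unit directions R(theta) e1 and R(theta) e2,
# so (x, y) = x' * e1' + y' * e2'
e1p, e2p = rotation(theta)[:, 0], rotation(theta)[:, 1]
print(np.allclose(xy, xy_prime[0]*e1p + xy_prime[1]*e2p))   # True
```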