Linear Algebra and Calculus

The Nature of Mathematical Modeling (draft)

$ \def\%#1{\mathbf{#1}} \def\*#1{\vec{#1}} \def\ds#1#2{\cfrac{d #1}{d #2}} \def\ps#1#2{\cfrac{\partial #1}{\partial #2}} $

Background

  • Assumed for the rest of the text

Variables

integers

  • the set $\mathbb{Z}$
  • $\ldots,-1,0,1,\ldots$

real numbers

  • the set $\mathbb{R}$
  • 1.0, 1/3, $\pi,\ldots$

complex numbers

  • the set $\mathbb{C}$
  • $z = x+iy$ = real + imaginary parts
  • $i^2=-1$
  • $e^{i\theta}=\cos\theta+i\sin\theta$
  • complex conjugate $z^*=x-iy$
  • squared magnitude $|z|^2 = zz^*=x^2+y^2$
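
A quick numerical check of Euler's formula and the conjugate (a minimal sketch; the angle is arbitrary):
import numpy as np
theta = 0.5
z = np.exp(1j*theta)
print(z,np.cos(theta)+1j*np.sin(theta))   # the two expressions agree
print(z*np.conj(z))                       # |z|^2 = 1 for a point on the unit circle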

quaternions

  • generalization of complex numbers
  • $q = a+bi+cj+dk$
  • $i^2=j^2=k^2=ijk=-1$
  • useful for representing 3D rotations (a sketch follows)
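
A minimal sketch (not from the text) of the Hamilton product, verifying $i^2=j^2=k^2=ijk=-1$; the function name hamilton and the component ordering are illustrative choices:
import numpy as np
def hamilton(q1,q2):
    # Hamilton product of quaternions stored as (a,b,c,d) = a+bi+cj+dk
    a1,b1,c1,d1 = q1
    a2,b2,c2,d2 = q2
    return np.array([a1*a2-b1*b2-c1*c2-d1*d2,
                     a1*b2+b1*a2+c1*d2-d1*c2,
                     a1*c2-b1*d2+c1*a2+d1*b2,
                     a1*d2+b1*c2-c1*b2+d1*a2])
i = np.array([0,1,0,0])
j = np.array([0,0,1,0])
k = np.array([0,0,0,1])
print(hamilton(i,i))             # i^2 = -1: [-1 0 0 0]
print(hamilton(hamilton(i,j),k)) # ijk = -1: [-1 0 0 0]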

scalar

  • 0D, e.g. position $x$

vector

  • 1D, e.g. 3D point $\*p=(x,y,z)$
  • transpose $\*p^T=\begin{pmatrix}x\\y\\z\end{pmatrix}$

matrix

  • 2D, e.g. collection of points $\%M= \begin{bmatrix} x_1 & x_2 & x_3\\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{bmatrix} $
  • transpose $\%M^T = \begin{bmatrix} x_1 & y_1 & z_1\\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{bmatrix} $

  • identity matrix $\%I = \begin{pmatrix} 1 & 0 & \cdots \\ 0 & 1 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} $

tensor

  • $>$2D, generalizes matrices to higher dimensions

Norms

L2

$$|\*a| = |\*a|_2 = \sqrt{\sum_i a_i^2}$$
  • Equals the familiar Euclidean length (the distance from the origin)

L1

$$|\*a|_1 = \sum_i |a_i|$$
  • Equals the Manhattan norm, measuring distance in city blocks
  • Will be important in machine learning

unit vector

$$\hat{a} = \dfrac{\*a}{|\*a|}$$
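
A quick numerical check of the norms and unit vector with NumPy (a minimal sketch; the vector is arbitrary):
import numpy as np
a = np.array([3.0,4.0])
print(np.linalg.norm(a))      # L2 norm: 5.0
print(np.linalg.norm(a,1))    # L1 norm: 7.0
print(a/np.linalg.norm(a))    # unit vector: [0.6 0.8]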

Determinant

  • Equal to:

    • the signed volume of the parallelepiped whose edges are the rows (or columns) of a square matrix
    • the factor by which the linear transformation defined by the matrix scales volumes
  • 2D

$$ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix}= \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad-bc$$
  • 3D
$$ \det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}= \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a(ei-fh)-b(di-fg)+c(dh-eg) $$
  • D dimensions
$$ \det(\%M) = \sum_{i_1,\ldots,i_D} \epsilon_{i_1\ldots i_D} m_{1,i_1}\cdots m_{D,i_D} $$
  • $\epsilon_{i_1\ldots i_D}$ is the Levi-Civita symbol, equal to 1 for even permutations of the indices, $-1$ for odd permutations, and 0 if any index is repeated
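
A minimal numerical check of the 2D formula with NumPy (the matrix is arbitrary):
import numpy as np
M = np.array([[1,2],[3,4]])
print(np.linalg.det(M))   # ad - bc = 1*4 - 2*3 = -2, up to floating-point roundoff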

Products

dot (inner)

$$ \begin{eqnarray} \*a\cdot\*b &=& \sum_i a_i b_i \\ &=& |\*a||\*b|\cos\theta \\ &=& \*a^T\*b \end{eqnarray} $$
  • Measures the overlap between vectors, where $\theta$ is the angle between them
  • NumPy notation: a@b

cross

$$ \begin{eqnarray} \vec{a}\times\vec{b} &=& \begin{vmatrix} \hat{x} & \hat{y} & \hat{z} \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{vmatrix} \\ &=& |\vec{a}||\vec{b}|\sin(\theta)\hat{n} \end{eqnarray} $$
  • Its magnitude measures the area of the parallelogram bounded by the vectors, where $\theta$ is the angle between them
  • $\hat{n}$ is a perpendicular unit vector; if your right-hand fingers curl from $\vec{a}$ to $\vec{b}$ your thumb points in the direction of $\hat{n}$

  • NumPy routine: cross

outer

$$ \begin{eqnarray} (\*a\otimes \*b)_{ij} &=& a_i b_j \\ \*a\otimes \*b &=& \*a\*b^T \end{eqnarray} $$
  • NumPy routine: outer

matrix

$$ \begin{eqnarray} \%C &=& \%A\%B \\ c_{ij} &=& \sum_k a_{ik}b_{kj} \end{eqnarray} $$
  • NumPy notation: A@B (see the examples below)
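
A minimal sketch of these products in NumPy (the vectors and matrices are arbitrary):
import numpy as np
a = np.array([1.0,0.0,0.0])
b = np.array([0.0,1.0,0.0])
print(a@b)             # dot product: 0.0, since the vectors are orthogonal
print(np.cross(a,b))   # cross product: [0. 0. 1.]
print(np.outer(a,b))   # outer product: a 3x3 matrix with a single 1 in row 0, column 1
A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
print(A@B)             # matrix product: [[19 22] [43 50]]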

Inverse

$$\%M~\%M^{-1} = \%I$$
  • Limited to square matrices
  • Can be used to solve a linear system of equations
$$ \begin{eqnarray} \%M~\*a &=& \*b \\ \*a &=& \%M^{-1}\*b \end{eqnarray} $$
  • The determinant of the inverse of a matrix is equal to the inverse of its determinant. If the determinant vanishes the matrix cannot be inverted; this corresponds to a transformation that reduces the dimension of the vector space.

algorithm

  • Can be found by Gaussian elimination (row reduction), sketched after this list
  • Is computationally expensive, requiring $O(N^3)$ operations
  • Can be simplified for sparse matrices, which frequently occur in numerical methods
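
A minimal Gauss-Jordan elimination sketch (an illustration, not the text's implementation; in practice use np.linalg.inv):
import numpy as np
def gauss_jordan_inverse(M):
    # invert a square matrix by reducing the augmented matrix [M | I] to [I | M^-1]
    N = M.shape[0]
    A = np.hstack([M.astype(float),np.eye(N)])
    for col in range(N):
        pivot = col+np.argmax(np.abs(A[col:,col]))   # partial pivoting for numerical stability
        A[[col,pivot]] = A[[pivot,col]]              # swap the pivot row into place
        A[col] /= A[col,col]                         # normalize the pivot row
        for row in range(N):
            if row != col:
                A[row] -= A[row,col]*A[col]          # eliminate the column from the other rows
    return A[:,N:]
M = np.array([[1,2],[3,4]])
print(gauss_jordan_inverse(M))   # matches np.linalg.inv(M): [[-2. 1.] [1.5 -0.5]]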

routines

  • numerical: NumPy inv
import numpy as np
np.set_printoptions(suppress=True,precision=1)
M = np.array([[1,2],[3,4]])
print(f"M:\n{M}")
Minv = np.linalg.inv(M)
print(f"M inverse:\n{Minv}")
print(f"product:\n{M@Minv}")
M:
[[1 2]
 [3 4]]
M inverse:
[[-2.   1. ]
 [ 1.5 -0.5]]
product:
[[1. 0.]
 [0. 1.]]
  • symbolic: SymPy inv
import sympy as sy
from IPython.display import display,Math
sy.init_printing()
a,b,c,d = sy.symbols('a b c d')
M = sy.Matrix([[a,b],[c,d]])
Minv = M.inv()
display(Math(f"M:{sy.latex(M)}"))
display(Math(f"M^{{-1}}:{sy.latex(Minv)}"))
display(Math(f"\\mathrm{{product:}}{sy.latex(M*Minv)}"))
display(Math(f"\\mathrm{{simplify:}}{sy.latex(sy.simplify(M*Minv))}"))
$\displaystyle M:\left[\begin{matrix}a & b\\c & d\end{matrix}\right]$ $\displaystyle M^{-1}:\left[\begin{matrix}\frac{d}{a d - b c} & - \frac{b}{a d - b c}\\- \frac{c}{a d - b c} & \frac{a}{a d - b c}\end{matrix}\right]$ $\displaystyle \mathrm{product:}\left[\begin{matrix}\frac{a d}{a d - b c} - \frac{b c}{a d - b c} & 0\\0 & \frac{a d}{a d - b c} - \frac{b c}{a d - b c}\end{matrix}\right]$ $\displaystyle \mathrm{simplify:}\left[\begin{matrix}1 & 0\\0 & 1\end{matrix}\right]$

Eigenvectors, Values

  • eigen in German means own, characteristic, proper
  • Named by [Hilbert:1904]

definition

  • An eigenvector $\*v$ of a matrix $\%M$, when multiplied by the matrix, is equal to itself times an eigenvalue $\lambda$:
$$\%M\*v = \lambda \*v$$

interpretation

  • Multiplying a vector by a matrix is a linear transformation; the transformation of an eigenvector can change its scale, but does not rotate or shear it

algorithm

$$ \begin{eqnarray} \%M\*v &=& \lambda\*v \\ \%M\*v &=& \lambda\%I\*v \\ (\%M-\lambda\%I)\*v &=& 0 \end{eqnarray} $$

If $\%M-\lambda\%I$ can be inverted then:

$$ \begin{eqnarray} \*v &=& \left(\%M-\lambda\%I\right)^{-1} 0 \\ &=& 0 \end{eqnarray} $$

Therefore, for a nonzero eigenvector to exist, $\%M-\lambda\%I$ must not be invertible:

$$|\%M-\lambda\%I|=0$$

$|\%M-\lambda\%I|$ is called the characteristic polynomial, and its roots give the eigenvalues $\lambda_i$. The eigenvectors are then found by solving $\%M\*v = \lambda_i \*v$ for each eigenvalue, with an arbitrary scaling of their magnitude.

The product of the eigenvalues is equal to the determinant of the matrix.

diagonalization

If $\%V$ is a matrix with the eigenvectors of $\%M$ as its columns, and $\%\Lambda$ is a matrix with the eigenvalues of $\%M$ on its diagonal, then:

$$\%M\%V = \%V\%\Lambda$$

therefore:

$$\%V^{-1}\%M\%V = \%\Lambda$$

i.e. $\%M$ is transformed into a diagonal matrix. This defines a transformation to an independent set of coordinates, which we'll use for finding normal modes and principal components.
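
A minimal numerical check of this diagonalization (the matrix is arbitrary):
import numpy as np
np.set_printoptions(suppress=True,precision=1)
M = np.array([[1,2],[3,4]])
vals,V = np.linalg.eig(M)          # columns of V are the eigenvectors of M
print(np.linalg.inv(V)@M@V)        # approximately diag(eigenvalues), up to roundoff
print(np.diag(vals))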

routines

  • numerical: NumPy eig
import numpy as np
M = np.array([[1,2],[3,4]])
vals,vects = np.linalg.eig(M)
print(f"M:\n{M}")
print(f"eigenvalues:\n{vals}")
print(f"eigenvectors:\n{vects}")
M:
[[1 2]
 [3 4]]
eigenvalues:
[-0.4  5.4]
eigenvectors:
[[-0.8 -0.4]
 [ 0.6 -0.9]]
  • symbolic: SymPy eigenvects
import sympy as sy
from IPython.display import display,Math,Markdown
sy.init_printing()
a,b,c,d = sy.symbols('a b c d')
M = sy.Matrix([[a,b],[c,d]])
result = M.eigenvects()
display(Math(f"M:{sy.latex(M)}"))
display(Markdown("eigenvalues, multiplicity, eigenvectors:"))
display(Math(sy.latex(result[0])))
display(Math(sy.latex(result[1])))
$\displaystyle M:\left[\begin{matrix}a & b\\c & d\end{matrix}\right]$

eigenvalues, multiplicity, eigenvectors:

$\displaystyle \left( \frac{a}{2} + \frac{d}{2} - \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}, \ 1, \ \left[ \left[\begin{matrix}- \frac{d}{c} + \frac{\frac{a}{2} + \frac{d}{2} - \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}}{c}\\1\end{matrix}\right]\right]\right)$ $\displaystyle \left( \frac{a}{2} + \frac{d}{2} + \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}, \ 1, \ \left[ \left[\begin{matrix}- \frac{d}{c} + \frac{\frac{a}{2} + \frac{d}{2} + \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}}{c}\\1\end{matrix}\right]\right]\right)$

Singular Vectors, Values

  • Linear problem $\%M\*x=\*b$

  • The space of $\*b$ for which this is solvable is the range

  • The dimension of the range is the rank

  • The space of $\*x$ for which $\%M\*x=0$ is the nullspace

  • If the nullspace contains only the zero vector, the matrix is full rank

Singular Value Decomposition (SVD)

  • For an $H\times W$ matrix $\%M$, the singular value decomposition is:
$$\%M = \%U\%W\%V^T$$
  • $\%U$ and $\%V$ are orthonormal matrices: $\%U^T\%U = \%V\%V^T = \%I$

  • $\%I$ is the $W\times W$ identity matrix

  • $\%W$ is a diagonal matrix with elements $w_i$, the singular values

  • The columns of $\%U$ associated with nonzero singular values form an orthonormal basis for the range of $\%M$

  • The columns of $\%V$ associated with zero singular values form an orthonormal basis for the nullspace of $\%M$

  • The singular values give the lengths of the principal axes of the hyper-ellipsoid defined by $\%M\*x$ where $\*x$ lies on the hyper-sphere $|\*x|^2=1$

  • NumPy routine: svd
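
A minimal numerical sketch (the matrix is arbitrary); note that NumPy returns the singular values as a vector, and returns $\%V^T$ rather than $\%V$:
import numpy as np
M = np.array([[1.0,2.0],[3.0,4.0],[5.0,6.0]])    # an HxW = 3x2 matrix
U,w,VT = np.linalg.svd(M,full_matrices=False)    # economy SVD: U is 3x2, w has 2 singular values
print(w)                                         # singular values, largest first
print(np.allclose(M,U@np.diag(w)@VT))            # True: M = U W V^T
print(np.allclose(U.T@U,np.eye(2)))              # True: the columns of U are orthonormal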


Pseudoinverse

  • Singular and rectangular matrices cannot be exactly inverted
  • The pseudoinverse $\%M^+$ applied to solving a linear system of equations has two cases:
    • if the equations are over-determined, so that there is no exact solution, it finds the one that minimizes the least-squares error
    • if the equations are under-determined, so that there are many solutions, it finds the one with the minimum norm
  • The pseudoinverse is also called the Moore-Penrose or generalized inverse
  • It can be found with the SVD: $$\*x = \%V\%W^{-1}\%U^T\*b$$
    • In $\%W^{-1}$ set 1/0=0 (Problem 1.)
  • NumPy routines: pinv, lstsq
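
A minimal numerical sketch (not from the text) constructing the pseudoinverse from the SVD, setting $1/0=0$ for vanishing singular values, and comparing with NumPy's pinv:
import numpy as np
M = np.array([[1.0,2.0,3.0],[4.0,5.0,6.0]])             # under-determined: 2 equations, 3 unknowns
b = np.array([1.0,2.0])
U,w,VT = np.linalg.svd(M,full_matrices=False)
winv = np.array([1/x if x > 1e-12 else 0 for x in w])   # invert singular values, with 1/0 -> 0
Mplus = VT.T@np.diag(winv)@U.T                          # M+ = V W^-1 U^T
print(np.allclose(Mplus,np.linalg.pinv(M)))             # True
print(Mplus@b)                                          # the minimum-norm solution of Mx = b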

Derivatives

definition

  • The derivative of a function measures the slope:
$$\ds{f(x)}{x} = \lim_{dx\rightarrow 0}\cfrac{f(x+dx)-f(x)}{dx}$$
  • The second derivative measures the curvature:
$$\cfrac{d^2f(x)}{dx^2} = \cfrac{d}{dx}\cfrac{df(x)}{dx}$$
  • Derivatives with respect to time are commonly notated with dots:
$$\cfrac{dx}{dt} = \dot{x} ~~~ \cfrac{d^2x}{dt^2} = \ddot{x}$$
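
A minimal numerical illustration of this limit, approximating the derivative of $\sin$ with a forward difference:
import numpy as np
def derivative(f,x,dx=1e-6):
    # forward-difference approximation to df/dx
    return (f(x+dx)-f(x))/dx
print(derivative(np.sin,1.0))   # approximately cos(1) = 0.5403...
print(np.cos(1.0))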

partial derivatives

  • The partial derivative of a function with multiple variables measures the slope in the direction of selected variables:
$$\ps{f(x,y)}{x} = \lim_{dx\rightarrow 0}\cfrac{f(x+dx,y)-f(x,y)}{dx}$$
$$\cfrac{\partial^2 f(x,y)}{\partial y\partial x} = \ps{}{y}\ps{f(x,y)}{x}$$

chain rule

$$\ds{f\left(g(x)\right)}{x} = \ds{f}{g}\ds{g}{x}$$

symbolic differentiation

import sympy as sy
from IPython.display import display,Math
sy.init_printing()
x,y = sy.symbols('x,y')
display(Math(f"\\dfrac{{d}}{{dx}}\\sin(x)=\
   {sy.latex(sy.diff(sy.sin(x)))}"))
$\displaystyle \dfrac{d}{dx}\sin(x)= \cos{\left(x \right)}$
display(Math(f"\\dfrac{{d^2}}{{dx^2}}\\sin(x)=\
   {sy.latex(sy.diff(sy.diff(sy.sin(x))))}"))
$\displaystyle \dfrac{d^2}{dx^2}\sin(x)= - \sin{\left(x \right)}$
display(Math(f"\\dfrac{{\\partial}}{{\\partial x}}\\sin(x^2y)=\
   {sy.latex(sy.diff(sy.sin(x**2*y),x))}"))
$\displaystyle \dfrac{\partial}{\partial x}\sin(x^2y)= 2 x y \cos{\left(x^{2} y \right)}$

del or nabla operator

$$\nabla = \left(\ps{}{x_1},\ldots,\ps{}{x_D}\right)$$

grad

$$\nabla f(x,y,z) = \ps{f}{x}\hat x+\ps{f}{y}\hat y+\ps{f}{z}\hat z$$
  • Grad (or gradient) points in the direction of steepest increase, with the magnitude giving the slope
import sympy as sy
from sympy.vector import CoordSys3D,gradient
from IPython.display import display,Math
sy.init_printing()
p = CoordSys3D('p')
f = p.x**2+p.y**2+p.z**2
display(Math(f"\\nabla\\cdot(x^2+y^2+z^2)=\
   {sy.latex(gradient(f))}"))
$\displaystyle \nabla(x^2+y^2+z^2)= \left(2 \mathbf{{x}_{p}}\right)\mathbf{\hat{i}_{p}} + \left(2 \mathbf{{y}_{p}}\right)\mathbf{\hat{j}_{p}} + \left(2 \mathbf{{z}_{p}}\right)\mathbf{\hat{k}_{p}}$

div

$$\begin{eqnarray} \nabla\cdot\*f(\*x) &=& \nabla\cdot\left(a(\*x),b(\*x),c(\*x)\right)\\ &=& \ps{a}{x}+\ps{b}{y}+\ps{c}{z} \end{eqnarray} $$
  • Div (or divergence) measures whether a vector field is expanding or contracting at a point

curl

$$ \begin{eqnarray} \nabla\times\*f(\*x) &=& \nabla\times(a(\*x),b(\*x),c(\*x))\\ &=& \begin{vmatrix} \hat x&\hat y&\hat z\\ \ps{}{x}&\ps{}{y}&\ps{}{z}\\ a&b&c \end{vmatrix} \end{eqnarray} $$
  • The curl measures the circulation around a point
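
A minimal symbolic sketch of div and curl with SymPy's vector module (the fields are arbitrary illustrations):
import sympy as sy
from sympy.vector import CoordSys3D,divergence,curl
from IPython.display import display,Math
sy.init_printing()
p = CoordSys3D('p')
f = p.x*p.i+p.y*p.j+p.z*p.k                # a radially expanding field
g = -p.y*p.i+p.x*p.j                       # a rotating field
display(Math(f"\\nabla\\cdot(x,y,z)={sy.latex(divergence(f))}"))    # divergence is 3
display(Math(f"\\nabla\\times(-y,x,0)={sy.latex(curl(g))}"))        # curl is 2 k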

Integrals

  • The indefinite integral is the opposite of the derivative
$$\int \cfrac{df(x)}{dx}~dx = f(x)$$
  • The definite integral is evaluated between limits
$$\int_a^b \cfrac{df(x)}{dx}~dx = f(b)-f(a)$$
  • The definite integral gives the area under the curve of a function, and is equal to the limit of breaking it up into a sum over the area of infinitesimal rectangles
$$\int_a^b f(x)~dx = \lim_{N\rightarrow \infty}\sum_{i=0}^{N-1} f(x_i)~dx$$

where $x_i = a+(b-a)i/N$ and $dx = (b-a)/N$
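
A minimal numerical check of this limit, comparing a left Riemann sum to the exact value $\int_1^{10} x^2~dx = 333$:
import numpy as np
a,b,N = 1,10,100000
dx = (b-a)/N
x = a+(b-a)*np.arange(N)/N       # the left endpoints x_i
print(np.sum(x**2)*dx)           # approximately 333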

symbolic integration

import sympy as sy
from IPython.display import display,Math
sy.init_printing()
x,y = sy.symbols('x,y')
display(Math(f"\\int x^2=\
   {sy.latex(sy.integrate(x**2))}"))
$\displaystyle \int x^2= \frac{x^{3}}{3}$
display(Math(f"\\int_1^{{10}} x^2=\
   {sy.latex(sy.integrate(x**2,(x,1,10)))}"))
$\displaystyle \int_1^{10} x^2= 333$

References

  • [Hilbert:1904] David Hilbert, Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen. (Erste Mitteilung) (Outline of a General Theory of Linear Integral Equations. (First Communication)), Nachrichten der Wissenschaftlichen Gesellschaft zu Göttingen (1904, pp. 49-91)
    • Foundational work
  • [Strang:1986] Gilbert Strang, Introduction to Applied Mathematics (Vol. 16), Wellesley, MA: Wellesley-Cambridge Press, 1986.
    • Classic introduction

Problems

  1. Use a numerical example to verify that the SVD pseudoinverse gives the minimum norm solution for underdetermined problems and the minimum residual for overdetermined problems.
  2. Prove that the SVD pseudoinverse gives the minimum norm solution for underdetermined problems and the minimum residual for overdetermined problems.

(c) Neil Gershenfeld 1/4/26