Linear Algebra and Calculus¶
The Nature of Mathematical Modeling (draft)¶
$ \def\%#1{\mathbf{#1}} \def\*#1{\vec{#1}} \def\ds#1#2{\cfrac{d #1}{d #2}} \def\ps#1#2{\cfrac{\partial #1}{\partial #2}} $
Background¶
- Assumed for the rest of the text
Variables¶
integers¶
- the set $\mathbb{Z}$
- $\ldots,-1,0,1,\dots$
real numbers¶
- the set $\mathbb{R}$
- 1.0, 1/3, $\pi,\ldots$
complex numbers¶
- the set $\mathbb{C}$
- $z = x+iy$ = real + imaginary parts
- $i^2=-1$
- $e^{i\theta}=\cos\theta+i\sin\theta$
- complex conjugate $z^*=x-iy$
- squared magnitude $|z|^2=zz^*=x^2+y^2$
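- A quick numerical check of these identities (a made-up value of $z$, using Python's built-in complex type and NumPy):
import numpy as np
z = 1+2j                               # z = x+iy with x=1, y=2
print(z.conjugate())                   # complex conjugate z* = x-iy
print((z*z.conjugate()).real)          # z z* = x^2+y^2 = 5
theta = np.pi/3
print(np.exp(1j*theta))                # e^(i theta)
print(np.cos(theta)+1j*np.sin(theta))  # equals cos(theta)+i sin(theta)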
quaternions¶
- generalization of complex numbers
- $q = a+bi+cj+dk$
- $i^2=j^2=k^2=ijk=-1$
- useful to represent 3D rotations
scalar¶
- 0D, e.g. position $x$
vector¶
- 1D, e.g. 3D point $\*p=(x,y,z)$
- transpose $\*p^T=\begin{pmatrix}x\\y\\z\end{pmatrix}$
matrix¶
- 2D, e.g. collection of points $\%M= \begin{bmatrix} x_1 & x_2 & x_3\\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{bmatrix} $
- transpose $\%M^T = \begin{bmatrix} x_1 & y_1 & z_1\\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{bmatrix} $
- identity matrix $\%I = \begin{pmatrix} 1 & 0 & \cdots \\ 0 & 1 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} $
tensor¶
- $>$2D, generalizes matrices to higher dimensions
Norms¶
L2¶
$$|\*a| = |\*a|_2 = \sqrt{\sum_i a_i^2}$$
- Equals the familiar Euclidean distance
L1¶
$$|\*a|_1 = \sum_i |a_i|$$
- Equals the Manhattan norm, measuring distance in city blocks
- Will be important in machine learning
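- A minimal numerical check of these norms (the vector is a made-up example):
import numpy as np
a = np.array([3.0,4.0])
print(np.linalg.norm(a))    # L2 norm: sqrt(9+16) = 5
print(np.linalg.norm(a,1))  # L1 norm: |3|+|4| = 7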
unit vector¶
$$\hat{a} = \dfrac{\*a}{|\*a|}$$
Determinant¶
Equal to:
- the volume of a parallelepiped formed by the rows or columns of a square matrix
- the scaling factor of a volume multiplied by a matrix defining a linear transformation
- 2D
$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad-bc$$
- 3D
$$\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a(ei-fh)-b(di-fg)+c(dh-eg)$$
- D dimensions
$$|\%M| = \sum_{i_1\ldots i_d} \epsilon_{i_1\ldots i_d} M_{1 i_1} M_{2 i_2} \cdots M_{d i_d}$$
- $\epsilon_{i_1\ldots i_d}$ is the Levi-Civita symbol, equal to 1 for even permutations of the indices, -1 for odd permutations, 0 otherwise
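- A numerical illustration of the determinant (the matrix is a made-up example):
import numpy as np
M = np.array([[1,2],[3,4]])
print(np.linalg.det(M))  # 1*4 - 2*3 = -2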
Products¶
dot (inner)¶
$$ \begin{eqnarray} \*a\cdot\*b &=& \sum_i a_i b_i \\ &=& |\*a||\*b|\cos\theta \\ &=& \*a^T\*b \end{eqnarray} $$
- Measures the overlap between vectors, where $\theta$ is the angle between them
- NumPy notation: a@b
cross¶
$$ \begin{eqnarray} \vec{a}\times\vec{b} &=& \begin{vmatrix} \hat{x} & \hat{y} & \hat{z} \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{vmatrix} \\ &=& |\vec{a}||\vec{b}|\sin(\theta)\hat{n} \end{eqnarray} $$
- Its magnitude equals the area of the parallelogram bounded by the vectors, where $\theta$ is the angle between them
- $\hat{n}$ is a perpendicular unit vector; if your right-hand fingers curl from $\vec{a}$ to $\vec{b}$ your thumb points in the direction of $\hat{n}$
- NumPy routine: cross
outer¶
$$ \begin{eqnarray} (\*a\otimes \*b)_{ij} &=& a_i b_j \\ \*a\otimes \*b &=& \*a\*b^T \end{eqnarray} $$
- NumPy routine: outer
matrix¶
- NumPy notation: A@B
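- A minimal sketch of these products in NumPy (the vectors and matrices are made-up examples):
import numpy as np
a = np.array([1.0,0,0])
b = np.array([0,1.0,0])
print(a@b)            # dot product: orthogonal vectors give 0
print(np.cross(a,b))  # cross product: [0 0 1], perpendicular to a and b
print(np.outer(a,b))  # outer product: the 3x3 matrix a_i b_j
A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
print(A@B)            # matrix product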
Inverse¶
$$\%M~\%M^{-1} = \%I$$
- Limited to square matrices
- Can be used to solve a linear system of equations (see the sketch after this list)
- The determinant of the inverse of a matrix is equal to the inverse of its determinant. If the determinant vanishes then the matrix cannot be inverted, which corresponds to a matrix transformation that reduces the dimension of the vector space.
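- A minimal sketch of solving a linear system with the inverse (made-up values; NumPy's solve gives the same answer without forming the inverse explicitly):
import numpy as np
M = np.array([[1.0,2],[3,4]])
b = np.array([5.0,6])
x = np.linalg.inv(M)@b       # solve M x = b with the inverse
print(x)
print(np.linalg.solve(M,b))  # same solution, without forming the inverse
print(M@x)                   # recovers b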
algorithm¶
- Can be found by Gaussian elimination (row reduction)
- Is computationally expensive, requiring $O(N^3)$ operations
- Can be simplified for sparse matrices, which frequently occur in numerical methods
routines¶
- numerical: NumPy inv
import numpy as np
np.set_printoptions(suppress=True,precision=1)
M = np.array([[1,2],[3,4]])
print(f"M:\n{M}")
Minv = np.linalg.inv(M)
print(f"M inverse:\n{Minv}")
print(f"product:\n{M@Minv}")
M:
[[1 2]
 [3 4]]
M inverse:
[[-2.   1. ]
 [ 1.5 -0.5]]
product:
[[1. 0.]
 [0. 1.]]
- symbolic: SymPy inv
import sympy as sy
from IPython.display import display,Math
sy.init_printing()
a,b,c,d = sy.symbols('a b c d')
M = sy.Matrix([[a,b],[c,d]])
Minv = M.inv()
display(Math(f"M:{sy.latex(M)}"))
display(Math(f"M^{{-1}}:{sy.latex(Minv)}"))
display(Math(f"\\mathrm{{product:}}{sy.latex(M*Minv)}"))
display(Math(f"\\mathrm{{simplify:}}{sy.latex(sy.simplify(M*Minv))}"))
Eigenvectors, Values¶
- eigen in German means own, characteristic, proper
- Named by [Hilbert:1904]
definition¶
- An eigenvector $\*v$ of a matrix $\%M$, when multiplied by the matrix, is equal to itself times an eigenvalue $\lambda$:
$$\%M\*v = \lambda\*v$$
interpretation¶
- Multiplying a vector by a matrix is a linear transformation; the transformation of an eigenvector can change its scale, but does not rotate or shear it
algorithm¶
$$ \begin{eqnarray} \%M\*v &=& \lambda\*v \\ \%M\*v &=& \lambda\%I\*v \\ (\%M-\lambda\%I)\*v &=& 0 \end{eqnarray} $$
If $\%M-\lambda\%I$ could be inverted then:
$$ \begin{eqnarray} \*v &=& \left(\%M-\lambda\%I\right)^{-1} 0 \\ &=& 0 \end{eqnarray} $$
Since an eigenvector must be nonzero, $\%M-\lambda\%I$ cannot be invertible, and so its determinant must vanish:
$$|\%M-\lambda\%I|=0$$
$|\%M-\lambda\%I|$ is called the characteristic polynomial, and its roots give the eigenvalues $\lambda_i$. The eigenvectors are then found by solving $\%M\*v = \lambda_i \*v$ for each eigenvalue, with an arbitrary scaling of their magnitude.
The product of the eigenvalues is equal to the determinant of the matrix.
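- A minimal symbolic sketch of this algorithm (the matrix is the same made-up example used in the routines below):
import sympy as sy
lam = sy.symbols('lambda')
M = sy.Matrix([[1,2],[3,4]])
p = (M-lam*sy.eye(2)).det()  # characteristic polynomial |M - lambda I|
print(sy.expand(p))          # lambda**2 - 5*lambda - 2
print(sy.solve(p,lam))       # its roots are the eigenvalues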
diagonalization¶
If $\%V$ is a matrix with the eigenvectors of $\%M$ as its columns, and $\%\Lambda$ is a matrix with the eigenvalues of $\%M$ on its diagonal, then:
$$\%M\%V = \%V\%\Lambda$$
therefore:
$$\%V^{-1}\%M\%V = \%\Lambda$$
i.e. $\%M$ is transformed into a diagonal matrix. This defines a transformation to an independent set of coordinates, which we'll use for finding normal modes and principal components.
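- A quick numerical check of this diagonalization (the matrix is the same made-up example used in the routines below):
import numpy as np
M = np.array([[1.0,2],[3,4]])
vals,V = np.linalg.eig(M)    # eigenvalues, and eigenvectors as columns of V
print(np.linalg.inv(V)@M@V)  # approximately the diagonal matrix Lambda
print(np.diag(vals))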
routines¶
- numerical: NumPy eig
import numpy as np
M = np.array([[1,2],[3,4]])
vals,vects = np.linalg.eig(M)
print(f"M:\n{M}")
print(f"eigenvalues:\n{vals}")
print(f"eigenvectors:\n{vects}")
M:
[[1 2]
 [3 4]]
eigenvalues:
[-0.4  5.4]
eigenvectors:
[[-0.8 -0.4]
 [ 0.6 -0.9]]
- symbolic: SymPy eigenvects
import sympy as sy
from IPython.display import display,Math,Markdown
sy.init_printing()
a,b,c,d = sy.symbols('a b c d')
M = sy.Matrix([[a,b],[c,d]])
result = M.eigenvects()
display(Math(f"M:{sy.latex(M)}"))
display(Markdown("eigenvalues, multiplicity, eigenvectors:"))
display(Math(sy.latex(result[0])))
display(Math(sy.latex(result[1])))
eigenvalues, multiplicity, eigenvectors:
$\displaystyle \left( \frac{a}{2} + \frac{d}{2} - \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}, \ 1, \ \left[ \left[\begin{matrix}- \frac{d}{c} + \frac{\frac{a}{2} + \frac{d}{2} - \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}}{c}\\1\end{matrix}\right]\right]\right)$
$\displaystyle \left( \frac{a}{2} + \frac{d}{2} + \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}, \ 1, \ \left[ \left[\begin{matrix}- \frac{d}{c} + \frac{\frac{a}{2} + \frac{d}{2} + \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}}{c}\\1\end{matrix}\right]\right]\right)$
Singular Vectors, Values¶
- Linear problem $\%M\*x=\*b$
- The space of $\*b$ for which this is solvable is the range
- The dimension of the range is the rank
- The space of $\*x$ for which $\%M\*x=0$ is the nullspace
- If the nullspace contains only the zero vector, the matrix is full rank
Singular Value Decomposition (SVD)¶
- For an $H\times W$ matrix $\%M$, the singular value decomposition is:
$$\%M = \%U\%W\%V^T$$
- $\%U$ and $\%V$ are orthonormal matrices: $\%U^T\%U = \%V\%V^T = \%I$
- $\%I$ is the $W\times W$ identity matrix
- $\%W$ is a diagonal matrix with elements $w_i$, the singular values
- The columns of $\%U$ associated with nonzero singular values form an orthonormal basis for the range of $\%M$
- The columns of $\%V$ associated with zero singular values form an orthonormal basis for the nullspace of $\%M$
- The singular values give the lengths of the principal axes of the hyper-ellipsoid defined by $\%M\*x$ where $\*x$ lies on the hyper-sphere $|\*x|^2=1$
- NumPy routine: svd
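- A minimal sketch with NumPy (the matrix is a made-up example; full_matrices=False returns the $H\times W$, $W\times W$, $W\times W$ shapes used above):
import numpy as np
M = np.array([[1.0,2],[3,4],[5,6]])            # H=3, W=2
U,w,Vt = np.linalg.svd(M,full_matrices=False)  # M = U W V^T
print(w)                                       # singular values
print(U@np.diag(w)@Vt)                         # reconstructs M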
Pseudoinverse¶
- Singular and rectangular matrices cannot be exactly inverted
- The pseudoinverse $\%M^+$ applied to solving a linear system of equations has two cases:
- if the equations are over-determined, so that there is no exact solution, it finds the one that minimizes the least-squares error
- if the equations are under-determined, so that there are many solutions, it finds the one with the minimum norm
- The pseudoinverse is also called the Moore-Penrose or generalized inverse
- It can be found from the SVD as $\%M^+ = \%V\%W^{-1}\%U^T$, giving the solution
$$\*x = \%M^+\*b = \%V\%W^{-1}\%U^T\*b$$
- In $\%W^{-1}$ set 1/0=0 (Problem 1.)
- NumPy routines: pinv, lstsq
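- A minimal sketch, assuming a made-up over-determined system, checking that the SVD construction agrees with NumPy's pinv:
import numpy as np
M = np.array([[1.0,1],[1,2],[1,3]])            # 3 equations, 2 unknowns
b = np.array([1.0,2,2])
U,w,Vt = np.linalg.svd(M,full_matrices=False)
winv = np.where(w > 1e-12, 1/w, 0)             # invert the singular values, setting 1/0 to 0
Mplus = Vt.T@np.diag(winv)@U.T                 # pseudoinverse from the SVD
print(Mplus@b)                                 # least-squares solution
print(np.linalg.pinv(M)@b)                     # agrees with NumPy's pinv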
Derivatives¶
definition¶
- The derivative of a function measures the slope:
$$\ds{f}{x} = \lim_{\Delta x\to 0}\cfrac{f(x+\Delta x)-f(x)}{\Delta x}$$
- The second derivative measures the curvature:
$$\cfrac{d^2 f}{dx^2} = \ds{}{x}\ds{f}{x}$$
- Derivatives with respect to time are commonly notated with dots:
$$\dot{x} = \ds{x}{t} ~~~~ \ddot{x} = \cfrac{d^2 x}{dt^2}$$
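- A quick numerical check of the slope definition, approximating the limit with a small step:
import numpy as np
x,dx = 1.0,1e-6
print((np.sin(x+dx)-np.sin(x))/dx)  # finite-difference estimate of the slope of sin at x
print(np.cos(x))                    # exact derivative of sin is cos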
partial derivatives¶
- The partial derivative of a function with multiple variables measures the slope in the direction of selected variables, holding the others fixed:
$$\ps{f(x,y)}{x} = \lim_{\Delta x\to 0}\cfrac{f(x+\Delta x,y)-f(x,y)}{\Delta x}$$
chain rule¶
$$\ds{f\left(g(x)\right)}{x} = \ds{f}{g}\ds{g}{x}$$
symbolic differentiation¶
import sympy as sy
from IPython.display import display,Math
sy.init_printing()
x,y = sy.symbols('x,y')
display(Math(f"\\dfrac{{d}}{{dx}}\\sin(x)=\
{sy.latex(sy.diff(sy.sin(x)))}"))
display(Math(f"\\dfrac{{d^2}}{{dx^2}}\\sin(x)=\
{sy.latex(sy.diff(sy.diff(sy.sin(x))))}"))
display(Math(f"\\dfrac{{\\partial}}{{\\partial x}}\\sin(x^2y)=\
{sy.latex(sy.diff(sy.sin(x**2*y),x))}"))
del or nabla operator¶
$$\nabla = \left(\ps{}{x_1},\ldots,\ps{}{x_D}\right)$$
grad¶
$$\nabla f(x,y,z) = \ps{f}{x}\hat x+\ps{f}{y}\hat y+\ps{f}{z}\hat z$$
- Grad (or gradient) points in the direction of steepest increase, with the magnitude giving the slope
import sympy as sy
from sympy.vector import CoordSys3D,gradient
from IPython.display import display,Math
sy.init_printing()
p = CoordSys3D('p')
f = p.x**2+p.y**2+p.z**2
display(Math(f"\\nabla\\cdot(x^2+y^2+z^2)=\
{sy.latex(gradient(f))}"))
div¶
$$\begin{eqnarray} \nabla\cdot\*f(\*x) &=& \nabla\cdot\left(a(\*x),b(\*x),c(\*x)\right)\\ &=& \ps{a}{x}+\ps{b}{y}+\ps{c}{z} \end{eqnarray} $$
- Div (or divergence) measures whether a vector field is expanding or contracting at a point
curl¶
$$ \begin{eqnarray} \nabla\times\*f(\*x) &=& \nabla\times(a(\*x),b(\*x),c(\*x))\\ &=& \begin{vmatrix} \hat x&\hat y&\hat z\\ \ps{}{x}&\ps{}{y}&\ps{}{z}\\ a&b&c \end{vmatrix} \end{eqnarray} $$
- The curl measures the circulation around a point
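- A minimal symbolic sketch of div and curl, in the style of the gradient example above (the vector field is a made-up example):
from sympy.vector import CoordSys3D,divergence,curl
p = CoordSys3D('p')
f = p.x*p.i + p.y*p.j + p.z*p.k  # a radially expanding vector field
print(divergence(f))             # 3: the field is expanding everywhere
print(curl(f))                   # 0: the field has no circulation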
Integrals¶
- The indefinite integral is the opposite of the derivative
- The definite integral is evaluated between limits
- The definite integral gives the area under the curve of a function, and is equal to the limit of breaking it up into a sum over the area of infinitesimal rectangles:
$$\int_a^b f(x)~dx = \lim_{N\to\infty}\sum_{i=1}^N f(x_i)~dx$$
where $x_i = a+(b-a)i/N$ and $dx = (b-a)/N$
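- A quick numerical check of this limit against the exact value (same integrand as the symbolic example below):
import numpy as np
a,b,N = 1,10,100000
x = a+(b-a)*np.arange(1,N+1)/N  # x_i = a+(b-a)i/N
dx = (b-a)/N
print(np.sum(x**2*dx))          # Riemann sum approximation of the integral of x^2
print((b**3-a**3)/3)            # exact value: 333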
symbolic integration¶
import sympy as sy
from IPython.display import display,Math
sy.init_printing()
x,y = sy.symbols('x,y')
display(Math(f"\\int x^2=\
{sy.latex(sy.integrate(x**2))}"))
display(Math(f"\\int_1^{{10}} x^2=\
{sy.latex(sy.integrate(x**2,(x,1,10)))}"))
References¶
- [Hilbert:1904] David Hilbert, Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen. (Erste Mitteilung) (Outline of a General Theory of Linear Integral Equations. (First Communication)), Nachrichten der Wissenschaftlichen Gesellschaft zu Göttingen (1904, pp. 49-91)
- Foundational work
- [Strang:1986] Gilbert Strang, Introduction to Applied Mathematics, Wellesley-Cambridge Press, Wellesley, MA (1986)
- Classic introduction
Problems¶
- Use a numerical example to verify that the SVD pseudoinverse gives the minimum norm solution for underdetermined problems and the minimum residual for overdetermined problems.
- Prove that the SVD pseudoinverse gives the minimum norm solution for underdetermined problems and the minimum residual for overdetermined problems.
(c) Neil Gershenfeld 1/4/26