CS2900 Revision Notes
Multi-dimensional Data Processing Revision Notes
Section 1: Vectors & Matrices Basics
1.1 Vector Operations
Dot Product:
For vectors ( \underline{u} = (u_1, u_2, …, u_n) ) and ( \underline{v} = (v_1, v_2, …, v_n) ):
[ \underline{u} \cdot \underline{v} = u_1v_1 + u_2v_2 + … + u_nv_n ]
Example: For ( \underline{u} = \begin{pmatrix}1\\2\\7\\-3\end{pmatrix} ) and ( \underline{v} = \begin{pmatrix}-4\\2\\1\\1\end{pmatrix} ):
[ \underline{u} \cdot \underline{v} = (1)(-4) + (2)(2) + (7)(1) + (-3)(1) = -4 + 4 + 7 - 3 = 4 ]
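As a quick numerical check of the example above (a sketch using NumPy, which the notes do not require):

```python
import numpy as np

u = np.array([1, 2, 7, -3])
v = np.array([-4, 2, 1, 1])

# Dot product: sum of element-wise products
print(np.dot(u, v))  # 4
```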
Vector Norm (Length):
[ ||\underline{u}|| = \sqrt{u_1^2 + u_2^2 + … + u_n^2} ]
Cosine Similarity:
[ \cos\theta = \frac{\underline{u} \cdot \underline{v}}{||\underline{u}|| \cdot ||\underline{v}||} ]
Used to measure the angle between two vectors.
Example: For ( \underline{u} = \begin{pmatrix}1\\1\\1\\\sqrt{2}\\\sqrt{2}\end{pmatrix} ) and ( \underline{v} = \begin{pmatrix}2\\-1\\0\\0\\0\end{pmatrix} ):
[ ||\underline{u}|| = \sqrt{1^2 + 1^2 + 1^2 + (\sqrt{2})^2 + (\sqrt{2})^2} = \sqrt{7} ]
[ ||\underline{v}|| = \sqrt{2^2 + (-1)^2} = \sqrt{5} ]
[ \underline{u} \cdot \underline{v} = 2 - 1 + 0 + 0 + 0 = 1 ]
[ \cos\theta = \frac{1}{\sqrt{7}\sqrt{5}} = \frac{1}{\sqrt{35}} ]
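The same computation in NumPy (a sketch for checking your working):

```python
import numpy as np

u = np.array([1, 1, 1, np.sqrt(2), np.sqrt(2)])
v = np.array([2, -1, 0, 0, 0])

# cos(theta) = (u . v) / (||u|| ||v||)
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos_theta)  # 1/sqrt(35) ≈ 0.169
```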
1.2 Matrix Terminology
Diagonal Matrix:
Square matrix where all non-diagonal entries are zero.
Example: ( \mathbf{D} = \begin{pmatrix}d_1 & 0\\0 & d_2\end{pmatrix} ).
Symmetric Matrix:
Matrix equals its transpose: ( \mathbf{A} = \mathbf{A}^\intercal ).
Orthogonal Matrix:
Columns are orthonormal vectors: ( \mathbf{Q}^\intercal\mathbf{Q} = \mathbf{I} ).
Key property: Preserves vector lengths, i.e., ( ||\mathbf{Q}\underline{u}|| = ||\underline{u}|| ).
Rank of Matrix:
Maximum number of linearly independent rows/columns.
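These definitions are easy to verify numerically; a short NumPy sketch (the matrices are illustrative):

```python
import numpy as np

# Symmetric: A equals its transpose
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.allclose(A, A.T))  # True

# Orthogonal: Q^T Q = I (a rotation matrix is orthogonal)
theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(Q.T @ Q, np.eye(2)))  # True

# Length preservation: ||Q u|| == ||u||
u = np.array([3.0, 4.0])
print(np.linalg.norm(Q @ u), np.linalg.norm(u))  # both 5.0

# Rank: number of linearly independent rows/columns
print(np.linalg.matrix_rank(np.array([[1, 2], [2, 4]])))  # 1
```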
Section 2: Singular Value Decomposition (SVD)
2.1 Definition
For an ( M \times N ) matrix ( \mathbf{B} ), SVD is:
[ \mathbf{B} = \mathbf{U}\mathbf{D}\mathbf{V}^\intercal ]
- ( \mathbf{U} ): ( M \times M ) orthogonal matrix (left singular vectors).
- ( \mathbf{D} ): ( M \times N ) diagonal matrix with singular values ( d_i \geq 0 ).
- ( \mathbf{V} ): ( N \times N ) orthogonal matrix whose columns are the right singular vectors (it appears transposed in the product).
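A minimal NumPy illustration of the decomposition (the matrix ( \mathbf{B} ) here is arbitrary):

```python
import numpy as np

B = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 1.0]])  # M = 2, N = 3

U, d, Vt = np.linalg.svd(B)  # U: 2x2, d: singular values, Vt: 3x3

# Rebuild the M x N "diagonal" matrix D and check B = U D V^T
D = np.zeros(B.shape)
np.fill_diagonal(D, d)
print(np.allclose(B, U @ D @ Vt))  # True
```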
2.2 Pseudo-Inverse
For matrix ( \mathbf{B} ), if ( \mathbf{B}^\dagger ) is its pseudo-inverse:
[ \mathbf{B}^\dagger = \mathbf{V}\mathbf{D}^\dagger\mathbf{U}^\intercal ]
Where ( \mathbf{D}^\dagger ) is the ( N \times M ) matrix obtained by transposing ( \mathbf{D} ) and replacing each non-zero singular value ( d_i ) with ( 1/d_i ).
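Building the pseudo-inverse from the SVD and checking it against NumPy's built-in (the test matrix is arbitrary and has full rank, so all ( d_i > 0 )):

```python
import numpy as np

B = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 1.0]])

U, d, Vt = np.linalg.svd(B)

# D^dagger: transposed shape, non-zero singular values inverted
D_dag = np.zeros((B.shape[1], B.shape[0]))
np.fill_diagonal(D_dag, 1.0 / d)  # assumes all d_i > 0 here

B_dag = Vt.T @ D_dag @ U.T  # V D^dagger U^T
print(np.allclose(B_dag, np.linalg.pinv(B)))  # matches NumPy's pinv
```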
Section 3: PageRank Algorithm
3.1 Key Concepts
Adjacency Matrix (A):
- Represents directed links between web pages.
- ( A_{ij} = 1 ) if page ( i ) links to page ( j ); 0 otherwise.
Diffusion Matrix (P):
[ P_{ij} = \frac{A_{ij}}{\text{deg}(i)} ]
Where ( \text{deg}(i) ) = number of outgoing links from page ( i ).
Teleportation Adjustment:
To handle dangling nodes (pages with no outgoing links) and guarantee convergence, random jumps are mixed into the adjusted matrix ( \mathbf{P}' ):
[ \mathbf{P}'' = c\mathbf{P}' + (1 - c)\mathbf{v}\mathbf{e}^\intercal ]
- ( c ): Damping factor (typically 0.85).
- ( \mathbf{v} = (1/n, …, 1/n) ): Uniform distribution vector.
3.2 Power Method Steps
- Initialize ( \mathbf{x}^{(0)} = \mathbf{v} ).
- Iterate until convergence:
[ \mathbf{x}^{(k+1)} = c\mathbf{P}\mathbf{x}^{(k)} + (1 - c)\mathbf{v} ]
- Check stability using ( ||\mathbf{x}^{(k+1)} - \mathbf{x}^{(k)}|| < \epsilon ).
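A sketch of the full iteration in NumPy; the 4-node graph is illustrative, not from the notes. With ( P_{ij} = A_{ij}/\deg(i) ) the matrix is row-stochastic, so the column-vector update below multiplies by ( \mathbf{P}^\intercal ) to push rank along the links from ( i ) to ( j ):

```python
import numpy as np

# Toy directed graph: A[i, j] = 1 if page i links to page j
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

deg = A.sum(axis=1)       # out-degree of each page (all > 0 here)
P = A / deg[:, None]      # P_ij = A_ij / deg(i)

n = len(A)
c = 0.85                  # damping factor
v = np.full(n, 1.0 / n)   # uniform teleportation vector

x = v.copy()
for _ in range(100):
    # Rank flows along links (P^T); teleportation mixes in v
    x_new = c * P.T @ x + (1 - c) * v
    if np.linalg.norm(x_new - x) < 1e-10:  # stability check
        break
    x = x_new

print(x_new, x_new.sum())  # ranks sum to 1
```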
Section 4: Vector Projections & Orthogonality
4.1 Projection Formula
The projection of ( \underline{u} ) onto ( \underline{v} ):
[ \text{proj}_{\underline{v}}(\underline{u}) = \frac{\underline{u} \cdot \underline{v}}{||\underline{v}||^2}\underline{v} ]
4.2 Orthogonal Component
Vector ( \underline{w} = \underline{u} - \text{proj}_{\underline{v}}(\underline{u}) ) is orthogonal to ( \underline{v} ).
Property: ( \underline{w} \cdot \underline{v} = \underline{u} \cdot \underline{v} - \frac{\underline{u} \cdot \underline{v}}{||\underline{v}||^2}(\underline{v} \cdot \underline{v}) = 0 ).
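A small NumPy check that the orthogonal component really is orthogonal (vectors chosen for illustration):

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])

# Projection of u onto v: (u . v / ||v||^2) v
proj = (np.dot(u, v) / np.dot(v, v)) * v

# Orthogonal component
w = u - proj
print(proj, w, np.dot(w, v))  # [3. 0.] [0. 4.] 0.0
```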
Section 5: Graph Theory & Adjacency Matrices
5.1 Directed vs. Undirected Graphs
- Directed: Links have direction (adjacency matrix generally asymmetric).
- Undirected: Links are bidirectional (adjacency matrix always symmetric).
5.2 Path Counting
The entry ( (\mathbf{A}^k)_{ij} ) counts the paths (more precisely, walks) of length ( k ) from node ( i ) to node ( j ).
Example: For adjacency matrix ( \mathbf{A} ):
- Paths of length 2: entries of ( \mathbf{A}^2 ); a non-zero entry means at least one such path exists.
- Paths of length 3: entries of ( \mathbf{A}^3 ).
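A NumPy sketch using a 3-node directed cycle (illustrative graph, not from the notes):

```python
import numpy as np

# Directed cycle: 0 -> 1 -> 2 -> 0
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)
print(A2)  # (A^2)[0,2] == 1: one walk of length 2 from node 0 to node 2
print(A3)  # A^3 == I: every node returns to itself in exactly 3 steps
```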
Section 6: Python & Special Matrix Classes
6.1 specialMatrix Structure
- Purpose: Efficiently store structured matrices (e.g., diagonal, triangular).
- Function func1: Likely performs matrix-vector multiplication optimized for sparsity.
- Trace Calculation: Sum of diagonal elements.
```python
def trace(self):
    # Trace: sum of the stored diagonal entries
    return sum(self.diagonal_elements)
```
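The notes only guess at what func1 does, so the class below is a hypothetical sketch of a diagonal-only specialMatrix; the names matvec and diagonal_elements are illustrative assumptions, not the module's actual API:

```python
class specialMatrix:
    """Hypothetical sketch: a diagonal matrix stored as its diagonal only."""

    def __init__(self, diagonal_elements):
        self.diagonal_elements = list(diagonal_elements)

    def matvec(self, x):
        # Matrix-vector product: only the diagonal contributes,
        # so this is O(n) rather than O(n^2)
        return [d * xi for d, xi in zip(self.diagonal_elements, x)]

    def trace(self):
        # Trace: sum of the stored diagonal entries
        return sum(self.diagonal_elements)


D = specialMatrix([1.0, 2.0, 3.0])
print(D.matvec([1.0, 1.0, 1.0]))  # [1.0, 2.0, 3.0]
print(D.trace())                  # 6.0
```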
6.2 Advantages of Custom Classes
- Memory efficiency for sparse matrices.
- Faster computations by avoiding unnecessary zero operations.
Section 7: Covariance & SVD in Data
7.1 Covariance Matrix
For a centered data matrix ( \mathbf{X} ):
[ \mathbf{C}_X = \frac{1}{n-1}\mathbf{X}^\intercal\mathbf{X} ]
Decomposition via SVD:
[ \mathbf{C}_X = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^\intercal ]
- Columns of ( \mathbf{V} ) are the principal directions.
- ( \mathbf{\Lambda} ) contains the eigenvalues (variances along those directions); if ( \mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}^\intercal ), then ( \lambda_i = d_i^2/(n-1) ).
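A sketch tying the two decompositions together (NumPy, with synthetic data purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)       # centre the data

n = X.shape[0]
C = (X.T @ X) / (n - 1)      # covariance matrix

# Eigendecomposition of C: principal directions and variances
eigvals, V = np.linalg.eigh(C)

# Equivalently, from the SVD of X: eigenvalues are d_i^2 / (n-1)
_, d, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(sorted(eigvals), sorted(d**2 / (n - 1))))  # True
```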