161762 Multivariate Analysis for Big Data

Lecture 2: Matrices as Operators for Multivariate Analytics

Nick Knowlton

Massey University

Fall 2026

Why Linear Algebra?

You will not do matrix arithmetic by hand

Contract for today: We will not compute. We will interpret pictures.

But you need to understand what multivariate tools do under the hood:

  • Covariance captures the shape of your data (Lecture 1)
  • Matrices transform point clouds
  • Singularity signals redundancy (collinearity)
  • These ideas lead directly to PCA (Lecture 3)

Matrices as operators

Key idea

A matrix is a function that takes a vector in and produces a new vector out.

\[\mathbf{y} = \mathbf{M}\,\mathbf{x}\]

Every multivariate technique — PCA, LDA, regression — applies a matrix operator to your data.

A matrix doesn’t just store numbers. It rotates, stretches, squeezes, or collapses your data cloud.

  • Business analogy: A matrix is a recipe that mixes inputs into new composite metrics.
  • Feature engineering: PCA is automated feature engineering using rotations.

Example of a matrix operation

A data observation is a vector — one column of numbers (\(n \times 1\)):

\[\mathbf{x} = \begin{pmatrix} 230 \\ 37 \\ 22 \end{pmatrix} {\small \begin{array}{l} \leftarrow \text{TV spend (\$k)} \\ \leftarrow \text{radio spend (\$k)} \\ \leftarrow \text{sales (\$k)} \end{array}}\]

Apply a diagonal scaling matrix \(\mathbf{M}\) to standardise each variable by its SD (in practice the observation is centred first, by subtracting each variable's mean — that is why a z-score can come out negative):

\[\underbrace{\begin{pmatrix} 1/s_1 & 0 & 0 \\ 0 & 1/s_2 & 0 \\ 0 & 0 & 1/s_3 \end{pmatrix}}_{\mathbf{M}\;(3 \times 3)} \underbrace{\begin{pmatrix} 230 \\ 37 \\ 22 \end{pmatrix}}_{\mathbf{x}\;(3 \times 1)} =\underbrace{\begin{pmatrix} {\approx}1.23 \\ {\approx}0.91 \\ {\approx}{-0.45} \end{pmatrix}}_{\mathbf{y}\;(3 \times 1)\;\text{(z-scores)}}\]

Same idea in PCA, LDA, regression — just a different choice of \(\mathbf{M}\) in \(\mathbf{y} = \mathbf{M}\,\mathbf{x}\).
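In code, this operation is a one-liner. A minimal numpy sketch — the means and SDs below are assumed for illustration, not the dataset's actual statistics:

```python
import numpy as np

x  = np.array([230.0, 37.0, 22.0])   # TV, radio, sales (raw observation)
mu = np.array([147.0, 23.0, 24.0])   # assumed variable means (illustrative)
s  = np.array([85.0, 15.0, 5.0])     # assumed standard deviations (illustrative)

M = np.diag(1.0 / s)                 # the diagonal scaling operator
y = M @ (x - mu)                     # centre, then scale: z-scores
print(y)
```

With these assumed means, sales sits below its average, so its z-score is negative — matching the sign pattern on the slide.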

Rotation matrices

A rotation changes our coordinate directions by an angle \(\theta\), without stretching the cloud.

\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = \underbrace{ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} }_{\text{rotation by }\theta} \begin{pmatrix} x \\ y \end{pmatrix} \]

  • \(\cos\theta\) keeps the “same-axis” contribution
  • \(\sin\theta\) mixes in the perpendicular axis
  • Signs control the direction of rotation
  • The columns are the new unit axes expressed in old coordinates

Numbers you will see in the transformation playground

Angle | \(\cos\theta\) | \(\sin\theta\)
\(30°\) | \(0.866\) | \(0.500\)
\(60°\) | \(0.500\) | \(0.866\)

Key properties

  • Rotation preserves shape: all distances unchanged
  • For a pure rotation matrix, \(\det = 1\)
  • That follows from \(\cos^2\theta + \sin^2\theta = 1\)
  • \(\det = 1\) → area preserved, no reflection
  • Columns are orthonormal: each has length 1, they are perpendicular
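All three key properties can be checked directly in numpy — a quick sketch using a 30° rotation:

```python
import numpy as np

theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# det = 1: area preserved, no reflection
print(np.linalg.det(R))

# Columns are orthonormal: R^T R = I
print(np.allclose(R.T @ R, np.eye(2)))

# Distances preserved: |Ru - Rv| = |u - v| for any two points
u, v = np.array([1.0, 2.0]), np.array([-0.5, 0.3])
print(np.isclose(np.linalg.norm(R @ u - R @ v), np.linalg.norm(u - v)))
```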

Transformation playground

Select a matrix transformation and see how it reshapes the point cloud. Blue = original (standardised TV vs sales). Red = transformed. Watch: distances, area, and whether the cloud collapses.

What did you notice?

Quick discussion (turn to a neighbour)

  1. Which transformation preserved distances between points?

Rotation — all pairwise distances preserved (orthogonal matrix; a pure rotation has det = 1)

  2. Which one collapsed the cloud to a line?

Singular (rank 1) — det = 0, the 2D cloud was squashed onto 1D

  3. What was special about det = 0?

Information is destroyed. Different inputs map to the same output — you cannot invert the transformation.
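You can see the information loss numerically. A small sketch with a rank-1 matrix (row 2 = 2 × row 1, mirroring the playground's singular option):

```python
import numpy as np

# Rank-1 matrix: the whole plane is squashed onto one line
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # row 2 = 2 x row 1

print(np.linalg.det(A))             # 0: singular
print(np.linalg.matrix_rank(A))     # 1

# Two different inputs, one output: the map cannot be inverted
x1 = np.array([3.0, 1.0])
x2 = np.array([5.0, 0.0])
print(A @ x1, A @ x2)               # identical outputs
```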

Correlation vs covariance

Toggle standardisation to see the difference between the covariance and correlation matrices.

  • The ellipse tilt stays the same — correlation is scale-free — but the axes are now measured in standard-deviation units.
  • S for scale-dependent covariance; R for scale-free correlation.
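The toggle is exactly the standardisation from earlier: z-scoring the data turns its covariance matrix into the correlation matrix. A sketch with simulated data (the numbers are assumed, chosen only to put the two variables on very different scales):

```python
import numpy as np

rng = np.random.default_rng(0)
tv = rng.normal(150, 85, size=500)                  # $k scale
sales = 0.05 * tv + rng.normal(14, 3, size=500)     # much smaller scale
X = np.column_stack([tv, sales])

S = np.cov(X, rowvar=False)        # scale-dependent covariance
R = np.corrcoef(X, rowvar=False)   # scale-free correlation

# Standardising first turns covariance into correlation
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(np.allclose(np.cov(Z, rowvar=False), R))
```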

What is an eigenvector?

A direction the matrix only stretches — never rotates

For a square matrix S, an eigenvector \(\mathbf{v}\) satisfies:

\[\mathbf{S}\,\mathbf{v} = \lambda\,\mathbf{v}\]

  • \(\mathbf{v}\) is the direction (eigenvector) — the matrix doesn’t rotate it
  • \(\lambda\) is the eigenvalue — how much the matrix stretches along that direction
  • A 2×2 covariance matrix has two eigenvectors → the axes of the ellipse
  • Large \(\lambda\) = lots of spread; tiny \(\lambda\) = almost no information that way

Business intuition: eigenvectors are the natural “axes” of your data cloud. They become the principal components in Lecture 3.
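The defining property \(\mathbf{S}\,\mathbf{v} = \lambda\,\mathbf{v}\) is easy to verify numerically — a sketch with an illustrative 2×2 covariance matrix:

```python
import numpy as np

# A 2x2 covariance matrix (illustrative numbers)
S = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# eigh is for symmetric matrices; eigenvalues come back in ascending order
lam, V = np.linalg.eigh(S)

# Check S v = lambda v for each eigenvector: the matrix only
# stretches these directions, it never rotates them
for i in range(2):
    print(np.allclose(S @ V[:, i], lam[i] * V[:, i]))
```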

When variables are redundant

Drag the slider towards 1 and watch the ellipse collapse → singularity. Toggle eigenvectors to see the axes shrink.

What a degenerate matrix looks like

A concrete numerical example

Suppose a dataset records TV spend and TV spend doubled as a second column:

Obs | TV | TV×2 | sales
1 | 100 | 200 | 12
2 | 230 | 460 | 22
3 | 45 | 90 | 8
4 | 180 | 360 | 18

Key insight: Column 2 is \(\text{TV} \times 2 = 2 \cdot \text{TV}\), so it carries no new information. This forces Row 2 = 2 × Row 1 in the covariance matrix below.

  • The \((3 \times 3)\) covariance matrix collapses: \[ \mathbf{S} = \begin{pmatrix} 4{,}900 & 9{,}800 & 280 \\ 9{,}800 & 19{,}600 & 560 \\ 280 & 560 & 33 \end{pmatrix} \]
  • Row 2 = 2 × Row 1: no independent second dimension
  • \(\det(\mathbf{S}) = 0\): singular, cannot be inverted
  • One eigenvalue = 0: a direction with zero variance
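All three symptoms show up at once when you feed this covariance matrix to numpy:

```python
import numpy as np

S = np.array([[ 4900.0,  9800.0, 280.0],
              [ 9800.0, 19600.0, 560.0],
              [  280.0,   560.0,  33.0]])

print(np.allclose(S[1], 2 * S[0]))   # row 2 = 2 x row 1
print(np.linalg.det(S))              # numerically zero: singular

lam = np.linalg.eigvalsh(S)
print(lam.min())                     # one eigenvalue is (numerically) zero
```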

Why singularity matters

When det(S) → 0

  • The covariance matrix cannot be inverted
  • Parameter estimates become unstable (huge standard errors)
  • Effective dimensionality is reduced — redundant variables
  • Near-zero eigenvalues = directions without variance

Business translation: If two KPIs move in lockstep you are measuring the same thing twice. Drop one — or let a multivariate technique handle it.
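The instability is easy to quantify with the condition number, which measures how much inversion amplifies noise. A sketch using a 2×2 correlation matrix (for this matrix the condition number is \((1+r)/(1-r)\), so it explodes as \(r \to 1\)):

```python
import numpy as np

conds = []
for r in [0.5, 0.99, 0.9999]:
    S = np.array([[1.0, r],
                  [r, 1.0]])
    conds.append(np.linalg.cond(S))   # = (1 + r) / (1 - r) here
    print(r, conds[-1])
```

A condition number in the thousands means tiny measurement noise can swing the inverted matrix — and hence regression coefficients — wildly.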

Advertising in 3D — the full point cloud (time permitting)

PCA preview: PCA finds a new axis system where variables are uncorrelated, ordered by variance.
“Project” means drop a perpendicular shadow onto a line.

Rotate and zoom to explore how three variables relate simultaneously. Toggle Show eigenvectors to draw the three PC lines — each point projected onto the PC axis, mapped back into original TV/radio/sales space.
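The "drop a shadow, map it back" step can be sketched in a few lines — the 3×3 covariance below is an assumed stand-in for the TV/radio/sales cloud:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0, 0],
                            [[4, 2, 1], [2, 3, 1], [1, 1, 2]], size=300)

S = np.cov(X, rowvar=False)
lam, V = np.linalg.eigh(S)       # eigenvalues ascending
pc1 = V[:, -1]                   # direction of greatest variance (PC1)

# Project: perpendicular shadow onto the PC1 line...
scores = X @ pc1                 # 1D coordinate of each point along PC1
# ...then map back into the original 3D space
shadow = np.outer(scores, pc1)
print(shadow.shape)
```

Each row of `shadow` is a point's position on the PC1 line, expressed back in original coordinates — exactly what the eigenvector toggle draws.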

Wrapping Up

  1. When \(r \to 1\), what happens geometrically, and what does it imply about information?

→ The ellipse collapses toward a line; one eigenvalue goes to \(\approx 0\) — one direction contains almost no independent information.

  2. Which methods become unstable when redundancy is extreme, and why?

→ Anything needing \(\mathbf{S}^{-1}\) becomes unstable or undefined: inversion amplifies noise along tiny-variance directions.

  3. What does standardisation change, and what does it preserve?

→ It rescales axes to unit variance, turning covariance into correlation, while preserving the relationship pattern (the ellipse tilt).

Next: Lecture 3 — Principal Component Analysis

What’s coming

  • Eigendecomposition: \(\mathbf{S} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^\top\)
  • The eigenvectors you just saw become the principal components
  • Covariance or correlation matrix? Which to feed PCA and why
  • Choosing how many components to keep (scree plot, Kaiser rule)
  • Business applications: dimensionality reduction, visualisation, feature engineering