Dimensionality reduction for input data
Dimensionality reduction plays an essential part in speeding up learning processes.
Basic ideas
Data compression
- e.g. reduce data from 2D to 1D
- e.g. reduce data from 3D to 2D
Data visualization
- reduce data to 1D, 2D, or 3D so that it can be plotted
Principal component analysis (PCA)
Try to find a lower-dimensional surface onto which to project the data, so that the sum of the squared lengths of the projection line segments is as small as possible.
Reduce from $n$ dimensions to $k$ dimensions: find $k$ vectors $u^{(1)}, u^{(2)}, \dots, u^{(k)}$ onto which to project the data so as to minimize the projection error.
PCA is not linear regression
Linear regression vs. PCA: linear regression fits a mapping $x \to y$ and minimizes the vertical (squared) errors to the labels, whereas PCA has no label $y$ and treats all the features $x_1, x_2, \dots, x_n$ equally, minimizing the orthogonal projection error.
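To make the contrast concrete, the two objectives can be written side by side (a standard formulation, assuming mean-normalized data and $U_{reduce}$ with orthonormal columns; not taken verbatim from the course):

$$\text{Linear regression:}\quad \min_{\theta}\; \frac{1}{m}\sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)^2 \quad \text{(vertical distance to the fit)}$$

$$\text{PCA:}\quad \min_{U_{reduce}}\; \frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)} - U_{reduce} U_{reduce}^T x^{(i)}\right\|^2 \quad \text{(orthogonal projection distance)}$$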
Principal component analysis algorithm
- Pre-processing (feature scaling/mean normalization):
Replace each $x_j^{(i)}$ with $x_j^{(i)} - \mu_j$, where $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$.
Scale features so that they have a comparable range of values.
- Compute the covariance matrix: $\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)}) (x^{(i)})^T$
($\Sigma$ is symmetric and positive semi-definite)
- Compute the eigenvectors of matrix $\Sigma$:
[U, S, V] = svd(Sigma);
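A minimal Octave/MATLAB sketch of these steps, assuming the $m$ training examples are stored as rows of an $m \times n$ matrix `X` (feature scaling omitted for brevity):

```matlab
m = size(X, 1);                      % number of training examples
mu = mean(X);                        % 1 x n row vector of feature means
X_norm = bsxfun(@minus, X, mu);      % mean normalization
Sigma = (1/m) * (X_norm' * X_norm);  % n x n covariance matrix
[U, S, V] = svd(Sigma);              % columns of U are the eigenvectors of Sigma
Ureduce = U(:, 1:k);                 % first k principal components (n x k); k is chosen below
Z = X_norm * Ureduce;                % projected data (m x k); row i is z^(i) transposed
```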
Reconstruction from the compressed representation
Given the projection
$$z = U_{reduce}^T x$$
the compressed representation is mapped back by
$$x_{approx}^{(i)} = U_{reduce} z^{(i)}$$
$$x_{approx} \in \mathbb{R}^n, \quad \text{since } (n\times k)(k\times 1) = n \times 1$$
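Continuing the sketch above, reconstructing all examples at once is a single matrix product:

```matlab
X_approx = Z * Ureduce';             % m x n approximation of the (mean-normalized) data
% For a single column-vector example x:  z = Ureduce' * x;  x_approx = Ureduce * z;
```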
Choosing k (the number of principal components)
- Average squared projection error: $$\frac{1}{m} \sum^m_{i=1} ||x^{(i)} - x^{(i)}_{approx}|| ^2$$
- Total variation in the data: $\frac{1}{m} \sum^m_{i=1} ||x^{(i)}|| ^2$
Typically, choose $k$ to be the smallest value such that
$$\frac{\frac{1}{m} \sum^m_{i=1} ||x^{(i)} - x^{(i)}_{approx}||^2}{\frac{1}{m} \sum^m_{i=1} ||x^{(i)}||^2} \le 0.01$$
i.e. 99% of the variance is retained.
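Because `svd(Sigma)` also returns the diagonal matrix `S`, the ratio above equals $1 - \frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}}$, so $k$ can be chosen without recomputing any projections; a small sketch using the `S` from the code above:

```matlab
sv = diag(S);                        % singular values of Sigma, length n
for k = 1:length(sv)
  if sum(sv(1:k)) / sum(sv) >= 0.99  % at least 99% of the variance retained
    break;
  end
end
```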
Advice for applying PCA
- Extract the inputs $x^{(1)}, \dots, x^{(m)}$ (ignoring the labels) as an unlabelled dataset and run PCA on them to get $z^{(1)}, \dots, z^{(m)}$
- Form a new training set $(z^{(1)}, y^{(1)}), \dots, (z^{(m)}, y^{(m)})$
Note: the mapping $x \to z$ should be defined by running PCA only on the training set. This mapping can then be applied to the examples $x_{cv}$ and $x_{test}$ in the cross-validation and test sets.
Design of ML system
- Get the training set $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$
- Run PCA to reduce the dimension of $x^{(i)}$, obtaining $z^{(i)}$
- Train logistic regression on $(z^{(1)}, y^{(1)}), \dots, (z^{(m)}, y^{(m)})$
- Test on the test set: map $x_{test}^{(i)} \to z_{test}^{(i)}$ and evaluate the classifier on $(z_{test}^{(i)}, y_{test}^{(i)})$
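An end-to-end sketch of this pipeline in Octave/MATLAB (the functions `trainLogReg` and `predictLogReg` are hypothetical placeholders for any logistic-regression trainer and predictor, not functions from the course code):

```matlab
mu = mean(X_train);                             % statistics from the training set only
X_norm = bsxfun(@minus, X_train, mu);
Sigma = (1 / size(X_train, 1)) * (X_norm' * X_norm);
[U, S, V] = svd(Sigma);
Ureduce = U(:, 1:k);

Z_train = X_norm * Ureduce;                     % z^(i) for the training set
theta = trainLogReg(Z_train, y_train);          % hypothetical logistic-regression trainer

Z_test = bsxfun(@minus, X_test, mu) * Ureduce;  % reuse mu and Ureduce learned on the training set
pred = predictLogReg(theta, Z_test);            % hypothetical predictor
```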
Before implementing PCA, first try running whatever you want to do with the original/raw data $x^{(i)}$. Only if that does not do what you want should you implement PCA and consider using $z^{(i)}$.