Machine Learning Notes (2): PCA and SVD

Introduction

PCA and SVD are two widely used dimensionality reduction techniques. Dimensionality reduction, in short, aims to preserve as much of the information in the original data as possible while reducing the number of attributes the data has. PCA and SVD are closely related; the main difference is that SVD keeps the means of the data, while PCA subtracts them first.

Core Concepts

  1. Principal Component Analysis (PCA);
  2. Singular Value Decomposition (SVD);

Principal Component Analysis

The goal of PCA is to find a new set of dimensions that better captures the variability of the data. PCA has several appealing characteristics:

  1. It tends to identify the strongest patterns in the data, so PCA can be used as a pattern-finding technique;
  2. Most of the variability can be captured by a small fraction of the total set of dimensions;
  3. It eliminates much of the noise, provided the noise in the data is weaker than the patterns;

Let’s look at the details of PCA first.

Details of PCA

The goal of PCA is to find a transformation of data that satisfies:

  1. Each pair of new attributes has zero covariance;
  2. The attributes are ordered by how much of the variance of the data each one captures;
  3. The first attribute captures as much of the variance of the data as possible;

The procedure of PCA is:

  1. Calculate the covariance matrix S of the attributes of the data set D;
  2. Calculate the matrix U whose columns are the eigenvectors of S;
  3. Calculate the linear transformation D’ = DU, keeping only the eigenvectors with the k largest eigenvalues; then D’ is the new transformed data set, and the dimension is reduced to k (see the sketch after this list);
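
To make these steps concrete, here is a minimal sketch in NumPy; the data set D, the random seed, and the choice of k = 2 are illustrative assumptions, not part of the notes:

    import numpy as np

    rng = np.random.default_rng(0)
    D = rng.normal(size=(100, 5))        # made-up data set: 100 objects, 5 attributes
    D = D - D.mean(axis=0)               # PCA works on mean-centered data

    S = np.cov(D, rowvar=False)          # step 1: covariance matrix of the attributes
    eigvals, U = np.linalg.eigh(S)       # step 2: eigenvectors of S (eigh, since S is symmetric)

    order = np.argsort(eigvals)[::-1]    # order components by decreasing eigenvalue
    U = U[:, order]

    k = 2                                # keep the k strongest components
    D_new = D @ U[:, :k]                 # step 3: D' = DU, dimension reduced to k
    print(D_new.shape)                   # (100, 2)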

Singular Value Decomposition

Singular Value Decomposition is a highlight of linear algebra. If we want to diagonalize an m x n matrix A, we cannot rely on eigenvectors: they are usually not orthogonal, there may not be enough of them, and eigendecomposition requires A to be a square matrix. SVD solves these problems in a perfect way.

SVD decomposes an m x n matrix A of rank r into the form

    A = U Σ V^T

such that:

  1. U is an m x r matrix whose orthonormal columns are the left singular vectors, i.e., U^T U = I;
  2. Σ is an r x r diagonal matrix whose diagonal entries are the singular values σ1 ≥ σ2 ≥ … ≥ σr > 0;
  3. V is an n x r matrix whose orthonormal columns are the right singular vectors, i.e., V^T V = I;
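
As a quick sanity check of these conditions, assuming NumPy (note that np.linalg.svd returns V^T directly; the matrix A here is an arbitrary example):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 3))                       # an example m x n matrix (m=4, n=3)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # reduced SVD: A = U @ diag(s) @ Vt

    print(np.allclose(A, U @ np.diag(s) @ Vt))         # A = U Σ V^T
    print(np.allclose(U.T @ U, np.eye(U.shape[1])))    # U^T U = I
    print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))) # V^T V = I
    print(s)                                           # singular values in decreasing order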

With all these mathematical tools, if we have a data set D, we can reduce its attributes as follows:

  1. Compute the left singular vectors U of D and its right singular vectors V;
  2. Compute the matrix D’ = DV; then D’ is the new transformed data we want. Therefore we reduce the data set’s dimension to r (see the sketch after this list);
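
A minimal sketch of this reduction, again assuming NumPy; keeping only the first k right singular vectors (k ≤ r) reduces the dimension further than r, which is the usual practice:

    import numpy as np

    rng = np.random.default_rng(0)
    D = rng.normal(size=(100, 5))                     # made-up data set

    U, s, Vt = np.linalg.svd(D, full_matrices=False)  # step 1: left/right singular vectors
    k = 2                                             # keep the k largest singular directions
    D_new = D @ Vt[:k].T                              # step 2: D' = DV with the first k columns of V
    print(D_new.shape)                                # (100, 2)
    print(np.allclose(D_new, U[:, :k] * s[:k]))       # equivalently D' = UΣ restricted to k columns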

SVD has the following properties:

  1. The right singular vectors, i.e., the columns of V, capture patterns among the attributes;
  2. The left singular vectors, i.e., the columns of U, capture patterns among the objects;

A Closer Look at PCA and SVD

  1. PCA and SVD are closely related from a linear algebra point of view, because they are both matrix decomposition techniques. In fact, PCA is equivalent to the SVD of the mean-centered data matrix (see the check after this list);
  2. After processing by PCA, the new set of attributes is linearly uncorrelated; after SVD this is not necessarily true, since SVD does not remove the means;
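
A quick check of this relationship, assuming NumPy; the data is made up for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    D = rng.normal(size=(100, 5))
    Dc = D - D.mean(axis=0)                                # centering: the extra step PCA performs

    eigvals, _ = np.linalg.eigh(np.cov(Dc, rowvar=False))  # PCA: eigenvalues of the covariance matrix S
    _, s, _ = np.linalg.svd(Dc, full_matrices=False)       # SVD of the centered data

    # The eigenvalues of S equal the squared singular values divided by n - 1.
    print(np.allclose(np.sort(eigvals)[::-1], s**2 / (len(D) - 1)))  # True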