First eigenvector vs principle component of a symmetric matrix

gilgi

Two concepts that are easy to confuse are eigenvectors and principle components. When the matrix in question is symmetric, there is a relationship between the first eigenvector and the projection of the data onto its first principle component. In this post, we'll use diagonalization and singular value decomposition to try to shed some light on this.

In [0]:

import numpy as np
from sklearn.decomposition import PCA

Set random seed for reproducibility.

In [0]:

np.random.seed(42)

Make a symmetric matrix $A$ which represents the contact matrix. This is typically made from $\frac{\textrm{observed}}{\textrm{1-D expected}}$ data but for this demo it doesn't matter. For simplicity we will make sure the row and column means are zero to avoid running into questions about whether or not PCA should perform an extra centering step.

In [0]:

a = np.random.normal(size=(4, 4))
a += a.T
a -= np.mean(a, axis=0)[None, :]
a -= np.mean(a, axis=1)[:, None]
assert np.allclose(np.mean(a, axis=0), 0)
assert np.allclose(np.mean(a, axis=1), 0)
assert np.allclose(a, a.T)
a

Out[0]:

array([[-0.66845153,  2.26173989, -0.07248698, -1.52080138],
       [ 2.26173989, -0.84507574,  0.68300369, -2.09966784],
       [-0.07248698,  0.68300369, -0.89935823,  0.28884153],
       [-1.52080138, -2.09966784,  0.28884153,  3.33162769]])

Make the covariance matrix of the contact matrix $C=AA^T=A^TA$. Each entry $C_{i,j}$ describes the covariance between the interaction pattern of the $i$th bin $A_{i,:}$ and the interaction pattern of the $j$th bin $A_{j,:}$. np.cov() scales the covariance matrix by dividing it by $N-1$.

$$ %Make the Pearson correlation matrix of the contact matrix. Each entry $C_{i,j}$ describes how correlated the interaction pattern of the $i$th bin $A_{i,:}$ is with the interaction pattern of the $j$th bin $A_{j,:}$. $$

In [0]:

#c = a @ a.T
#assert np.allclose(c, np.cov(a)*(a.shape[0] - 1))
c = np.corrcoef(a)
c

Out[0]:

array([[ 1.        , -0.03043523,  0.37185465, -0.74252732],
       [-0.03043523,  1.        , -0.51341948, -0.61132217],
       [ 0.37185465, -0.51341948,  1.        , -0.12572149],
       [-0.74252732, -0.61132217, -0.12572149,  1.        ]])

Perform PCA on $A$, then project the data onto the first principle component. Scale this projected representation to unit length (since the units on the PC1 axis are arbitrary anyway).

In [0]:

pca = PCA()
pca_proj = pca.fit_transform(c)[:, 0]
norm_pca_proj = pca_proj / np.linalg.norm(pca_proj)
norm_pca_proj

Out[0]:

array([-0.47695205, -0.35476371,  0.02805388,  0.80366189])

Check the vector that describes the first principle component.

In [0]:

pc1 = pca.components_[0, :]
pc1

Out[0]:

array([-0.54655655, -0.43931021, -0.03541597,  0.71205911])

PCA identifies the principle components as the eigenvectors of the covariance matrix. np.linalg.eig() performs the eigendecomposition of $C$ into a diagonal matrix of eigenvalues $\Lambda$ and a matrix of eigenvectors $Q$ such that

$$C=Q \Lambda Q^{-1}$$

where $Q_{:, i}$ is the $i$th eigenvector, associated with the $i$th eigenvalue $\Lambda_{i,i}$.

In the code, eigvals is the diagonal of $\Lambda$ and is therefore a vector of the eigenvalues. q is $Q$ and is therefore the matrix of the eigenvectors. The eigenvalues can be in any order, but we are only interested in the largest one, which represents the first principle component.

In [0]:

d = c - np.mean(c, axis=1)[:, None]
eigvals, q = np.linalg.eig(d @ d.T)
top_eigval_idx = np.argmax(np.abs(eigvals))
top_eigvec = q[:, top_eigval_idx]
top_eigvec

Out[0]:

array([ 0.54655655,  0.43931021,  0.03541597, -0.71205911])

In [0]:

e = a - np.mean(a, axis=1)[:, None]
e /= np.std(a, axis=1)[:, None]
np.allclose((e @ e.T)/e.shape[0], np.corrcoef(a))

Out[0]:

True

Another way to implement PCA is to compute the SVD of $A$, which decomposes $A$ into a diagonal matrix of singular values $\Sigma$, an orthogonal matrix of right singular vectors $V$, and an orthogonal matrix of left singular vectors $U$ such that

$$A=U \Sigma V^T$$

We can then write the covariance matrix $C$ in terms of these singular values and vectors

\begin{align} C &= A^TA \\ &= (U \Sigma V^T)^T(U \Sigma V^T) \\ &= V \Sigma U^T U \Sigma V^T \\ &= V \Sigma \Sigma V^T \quad \textrm{because }U\textrm{ is orthogonal}\\ &= V \Sigma^2 V^T \\ \end{align}

Since $A$ is symmetric, we could equivalently have written its covariance matrix as $C=AA^T$ to obtain

\begin{align} C &= AA^T \\ &= (U \Sigma V^T)(U \Sigma V^T)^T \\ &= U \Sigma V^T V \Sigma U^T \\ &= U \Sigma \Sigma U^T \quad \textrm{because }V\textrm{ is orthogonal}\\ &= U \Sigma^2 U^T \\ \end{align}

Since the diagonalization of a matrix is unique (assuming we make all the eigenvalues positive and order them from largest to smallest and ignoring the possibility of degenerate eigenvalues), we can see that $Q=V=U$ and $\Lambda=\Sigma^2$.

This means that we can obtain the first princple component of $A$ by computing its SVD and taking the right singular vector in $V$ with the largest singular value.

In [0]:

u, s, vt = np.linalg.svd(a)
v = vt.T
top_sv_idx = np.argmax(np.abs(s))
top_right_singvec = v[:, top_sv_idx]
top_left_singvec = u[:, top_sv_idx]
assert np.allclose(top_right_singvec, top_left_singvec) or np.allclose(top_right_singvec, -top_left_singvec)
top_right_singvec

Out[0]:

array([ 0.44819699, -0.49072754, -0.50665589,  0.54918644])

As expected, the first principle component pc1 of $A$ is the same as the first eigenvector (the eigenvector corresponding to the largest eigenvalue) of the covariance matrix $C$, and is also equal to the first right singular vector of $A$.

In [0]:

np.allclose(pc1, top_eigvec) or np.allclose(pc1, -top_eigvec),\
np.allclose(pc1, top_right_singvec) or np.allclose(pc1, -top_right_singvec)

Out[0]:

(False, True)

What is perhaps less expected is that the projection of the data onto the first principle component is also equal to this first eigenvector.

In [0]:

np.allclose(norm_pca_proj, top_eigvec) or np.allclose(norm_pca_proj, -top_eigvec),\
np.allclose(norm_pca_proj, top_right_singvec) or np.allclose(norm_pca_proj, -top_right_singvec)

Out[0]:

(False, True)

We can check this by projecting the data $A$ onto the subspace spanned by the principle components (which we can find in $Q$, $V$, or $U$):

\begin{align} A_\textrm{proj} &= AV \\ &= U\Sigma V^T V \quad \textrm{plug in SVD of }A \\ &= U\Sigma \quad \textrm{because }V\textrm{ is orthogonal} \end{align}

As we noted above, $Q=V=U$, so the projection is just the eigenvector scaled by its associated singular value. In other words, the projection of the data in $A$ onto the first principle component of $A$ is just the first principle component of $A$ scaled by the first singular value of $A$.

If $A$ was not symmetric, we would still have $V=Q$ (SVD is still a valid way to perform PCA) but $V\neq U$ and therefore the projection of $A$ onto the first principle component would not be equal to the first eigenvector.

Comments