Welcome, learners! Today, we step into an exciting chapter on non-linear dimensionality reduction techniques, focusing on Kernel Principal Component Analysis (Kernel PCA), a variant of Principal Component Analysis (PCA). Kernel PCA builds on PCA by extending it to non-linear data.
The aim of today's lesson is to guide you to understand, uncover, and master Kernel PCA using sklearn. We'll cover everything from its theoretical foundation and the nuances of kernel selection to its practical applications.
Kernel PCA, a variant of PCA, handles non-linear structure in data using kernel methods. It relies on the "kernel trick", a technique that implicitly maps input data into a higher-dimensional feature space where it becomes linearly separable, with the mapping defined by a kernel function.
Kernels measure the similarity between two observations. Kernel selection, that is, choosing a suitable kernel such as Linear, Polynomial, or Radial Basis Function (RBF), plays a pivotal role in Kernel PCA and has a significant impact on model performance.
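As a minimal sketch of this idea (the rbf_similarity function below is a hypothetical helper for illustration, not part of sklearn), the RBF kernel scores two points as similar when they are close in the input space:

```python
import numpy as np

def rbf_similarity(x1, x2, gamma=10):
    # RBF (Gaussian) kernel: k(x1, x2) = exp(-gamma * ||x1 - x2||^2)
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

a = np.array([0.0, 0.0])
b = np.array([0.1, 0.0])   # close to a
c = np.array([1.0, 1.0])   # far from a

print(rbf_similarity(a, b))  # close to 1: highly similar
print(rbf_similarity(a, c))  # close to 0: dissimilar
```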
Before we begin, let's import the necessary libraries: sklearn's PCA and KernelPCA classes, the train_test_split function, matplotlib for plotting, and sklearn's make_circles to create a non-linearly separable dataset.
```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA, KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_circles
```
We dive into the crux of our lesson by creating a non-linearly separable dataset using make_circles(). We then split the dataset into training and testing sets, stratifying by class with sklearn's train_test_split().
```python
X, y = make_circles(n_samples=1000, factor=0.01, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
```
The make_circles() function generates a large circle containing a smaller circle in 2D, forming our non-linearly separable dataset. train_test_split() then splits the dataset into a training set and a test set. Our plot showcases the training data with two classes:
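If you want to reproduce that plot yourself, a minimal sketch (assuming the X_train and y_train variables from the split above) could look like this:

```python
# Visualize the two concentric classes in the training set
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='viridis')
plt.title("Training data")
plt.xlabel("1st feature")
plt.ylabel("2nd feature")
plt.show()
```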
Next, we instantiate an object of the KernelPCA class and fit it to our training data.
```python
kernel_pca = KernelPCA(kernel="rbf", gamma=10, fit_inverse_transform=True, alpha=0.1)
kernel_pca.fit(X_train)
```
When we call fit() on our training data, sklearn's KernelPCA computes the kernel matrix of the training samples and its eigendecomposition. Since we did not specify n_components, all components with non-zero eigenvalues are retained; we will focus on the first two, which capture the most variance.
Note: We set fit_inverse_transform=True to enable the inverse transformation of the projected data back to the original space.
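To see how many components were actually retained, a quick check (assuming the kernel_pca object fitted above) is to inspect the eigenvalues_ attribute:

```python
# Number of retained components (all non-zero eigenvalues, since n_components was not set)
print(kernel_pca.eigenvalues_.shape)

# Eigenvalues are stored in decreasing order, so the first two carry the most variance
print(kernel_pca.eigenvalues_[:2])
```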
We transform our test data onto the principal components and visualize the eigenvalues as well as the resulting projection.
```python
score_kernel_pca = kernel_pca.transform(X_test)

fig, axs = plt.subplots(1, 2, figsize=(10, 5))
axs[0].plot(kernel_pca.eigenvalues_)
axs[0].set_title("Principal components and their eigenvalues")
axs[0].set_xlabel("nth principal component")
axs[0].set_ylabel("Eigenvalue magnitude")

axs[1].scatter(score_kernel_pca[:, 0], score_kernel_pca[:, 1], c=y_test, cmap='viridis')
axs[1].set_title("Projection onto PCs (kernel)")
axs[1].set_xlabel("1st principal component")
axs[1].set_ylabel("2nd principal component")

plt.tight_layout(pad=2.0)
plt.show()
```
The first plot displays the eigenvalues of the principal components, while the second plot visualizes the projection of the test dataset onto the first and second principal components:
We can also inversely transform the projected data back to the original space using the inverse_transform() method and visualize it to understand the reconstruction.
Let's visualize the original data and the inversely transformed data to see how well the Kernel PCA model reconstructs the original data:
```python
X_hat_kpca = kernel_pca.inverse_transform(kernel_pca.transform(X_test))

# Plot the original data and inverse transformed data
fig, axs = plt.subplots(1, 2, figsize=(10, 5))
axs[0].scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='viridis')
axs[0].set_title("Original data")
axs[0].set_xlabel("1st feature")
axs[0].set_ylabel("2nd feature")

axs[1].scatter(X_hat_kpca[:, 0], X_hat_kpca[:, 1], c=y_test, cmap='viridis')
axs[1].set_title("Inverse transformed data (kernel)")
axs[1].set_xlabel("1st feature")
axs[1].set_ylabel("2nd feature")

plt.tight_layout(pad=2.0)
plt.show()
```
The plot showcases the inversely transformed data, which is a reconstruction of the original data:
Lastly, we compute the reconstruction error, the mean squared difference between our test dataset and the inversely transformed dataset.
```python
print("Mean squared error for Kernel PCA is:", ((X_test - X_hat_kpca) ** 2).mean())  # ~0.01
```
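For context, a quick sketch using the linear PCA class we imported earlier shows why reconstruction error alone is not the whole story: with two components on 2D data, linear PCA is just a rotation, so its reconstruction error is essentially zero, yet its projection cannot separate the two circles the way the RBF kernel does.

```python
# Comparison sketch: linear PCA on the same train/test split
pca = PCA(n_components=2)
pca.fit(X_train)
X_hat_pca = pca.inverse_transform(pca.transform(X_test))
print("Mean squared error for linear PCA is:", ((X_test - X_hat_pca) ** 2).mean())
```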
Kernel PCA offers a range of hyperparameters to fine-tune the model. Here are a few key parameters (a short configuration sketch follows the list):

- kernel: The kernel function to use, such as 'linear', 'poly', 'rbf', 'sigmoid', or 'cosine'. The default is 'linear', which reduces Kernel PCA to standard PCA. The 'rbf' kernel is commonly used for non-linear data, 'poly' for polynomial transformations, and 'sigmoid' for sigmoid transformations.
- gamma: The kernel coefficient for the 'rbf', 'poly', and 'sigmoid' kernels. A higher gamma value leads to a more complex model.
- degree: The degree of the polynomial kernel function, which defines the non-linearity of the transformation for the 'poly' kernel. The higher the degree, the more complex the transformation.
- alpha: The regularization strength of the ridge regression that learns the inverse transform (used only when fit_inverse_transform=True). A higher alpha value produces a smoother, more regularized reconstruction.
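As an illustration of how these hyperparameters fit together, a configuration with a polynomial kernel might look like the sketch below (the values are purely illustrative, not recommendations, and X_train is the training set from earlier):

```python
# Illustrative configuration only -- values are examples, not tuned settings
poly_kpca = KernelPCA(
    kernel="poly",               # polynomial kernel
    degree=3,                    # cubic non-linearity
    gamma=1,                     # kernel coefficient
    n_components=2,              # keep only the first two components
    fit_inverse_transform=True,  # required to call inverse_transform() later
    alpha=0.1,                   # ridge regularization for the learned inverse map
)
poly_kpca.fit(X_train)
```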
Kudos on completing the lesson on Kernel PCA! Today, we have covered non-trivial aspects, such as how to work with non-linearly separable data, kernel techniques, and PCA. Now, it's time for some hands-on practice! Keep up the good work and happy coding!