Lesson 4

Welcome, learners! Today, we step into an exciting chapter on non-linear dimensionality reduction, where our focus will be **Kernel Principal Component Analysis** (Kernel PCA), a variant of Principal Component Analysis (PCA). Kernel PCA builds on PCA by extending it to data with non-linear structure, which ordinary PCA, being a linear projection, cannot capture.

The aim of today's lesson is to guide you to understand, uncover, and master Kernel PCA using `sklearn`. We'll cover everything from its theoretical foundation and the nuances of kernel selection to its practical applications.

Kernel PCA is a variant of PCA that handles non-linear structure using kernel methods. It relies on the "kernel trick": a kernel function computes inner products between observations as if they had been mapped into a higher-dimensional feature space, where non-linear patterns can become linear, without ever computing that mapping explicitly. Ordinary PCA is then carried out in this implicit feature space.
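To make the kernel trick concrete, here is a minimal sketch using a degree-2 polynomial kernel, chosen because its feature map is small enough to write out by hand. The helpers `phi` and `poly2_kernel` are hypothetical names used only for illustration. The RBF kernel used later in this lesson corresponds to an infinite-dimensional feature map, so it cannot be written out this way, which is exactly why the trick matters.

```python
import numpy as np

# Hypothetical explicit feature map for 2-D inputs:
# phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2)
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

# Homogeneous degree-2 polynomial kernel, computed in the original 2-D space
def poly2_kernel(x, y):
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(y))  # inner product after mapping to 3-D
implicit = poly2_kernel(x, y)      # same value without ever building phi(x)
print(explicit, implicit)
```

Both numbers agree (up to floating-point rounding), even though the second never leaves the original 2-D space.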

A kernel function measures the similarity between two observations. Kernel selection, that is, choosing a suitable kernel such as Linear, Polynomial, or Radial Basis Function (RBF), plays a pivotal role in Kernel PCA and has a significant impact on the resulting projection and on model performance.
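The sketch below shows how differently these kernels score the same pairs of points, using sklearn's pairwise kernel functions. The points and the parameter values (`degree=3`, `gamma=1`, `coef0=1` for the polynomial kernel, `gamma=10` for RBF) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

# A reference observation, a nearby one, and a distant one (illustrative values)
ref = np.array([[1.0, 0.0]])
others = np.array([[0.9, 0.1],    # near ref
                   [-1.0, 0.0]])  # far from ref

lin = linear_kernel(ref, others)                                   # x . y
poly = polynomial_kernel(ref, others, degree=3, gamma=1, coef0=1)  # (gamma * x . y + coef0)^degree
rbf = rbf_kernel(ref, others, gamma=10)                            # exp(-gamma * ||x - y||^2)

for name, K in [("linear", lin), ("poly", poly), ("rbf", rbf)]:
    print(f"{name:>6}: near={K[0, 0]:.4f}  far={K[0, 1]:.4f}")
```

The RBF kernel depends only on the distance between points, so with `gamma=10` the distant point gets a similarity of essentially zero, while the linear and polynomial kernels depend on the inner product instead.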

Before we begin, let's import the necessary tools: the `PCA` and `KernelPCA` classes and the `train_test_split()` function from sklearn, matplotlib for plotting, and sklearn's `make_circles()` to create a non-linearly separable dataset.

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA, KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_circles
```

We now dive into the crux of our lesson by creating a non-linearly separable dataset with `make_circles()`. We then split it into training and testing sets with sklearn's `train_test_split()`, passing `stratify=y` so that both sets keep the same class proportions.

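```python
# 1,000 points: a large outer circle and a much smaller inner circle, plus Gaussian noise
X, y = make_circles(n_samples=1000, factor=0.01, noise=0.05, random_state=0)
# Stratified split; the default test_size is 0.25
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
```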

`make_circles()` generates a large circle containing a smaller circle in 2D (`factor` sets the ratio of the inner radius to the outer one), which forms our non-linearly separable dataset. Then `train_test_split()` divides the dataset into a training set and a test set. Our plot showcases the training data with its two classes.
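A figure like that can be drawn with a short sketch such as the following, which recreates the split above; the figure size, title, and equal aspect ratio are illustrative choices. It also prints the class counts per split, showing what `stratify=y` does: with 500 points per class, the default 25% test split leaves 375 points of each class for training and 125 of each for testing.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Same data and split as in the lesson
X, y = make_circles(n_samples=1000, factor=0.01, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Stratification keeps the 50/50 class balance in both splits
print("train class counts:", np.bincount(y_train))  # -> [375 375]
print("test class counts: ", np.bincount(y_test))   # -> [125 125]

# Scatter the training data, colored by class
fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap="viridis")
ax.set_title("Training data")
ax.set_xlabel("1st feature")
ax.set_ylabel("2nd feature")
ax.set_aspect("equal")
plt.show()
```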

Next, we instantiate an object of the `KernelPCA` class and fit it to our training data.

```python
# RBF kernel with gamma=10; also learn an inverse map, regularized by alpha=0.1
kernel_pca = KernelPCA(kernel="rbf", gamma=10, fit_inverse_transform=True, alpha=0.1)
kernel_pca.fit(X_train)
```

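When we call the `fit()` method on our training data, sklearn's `KernelPCA` computes the RBF kernel matrix of the training points, centers it, and eigendecomposes it. Because `n_components` is left at its default (`None`), all components with non-zero eigenvalues are kept, not just the first two; we simply look at the first two when plotting.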

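**Note:** We set `fit_inverse_transform=True` so that the model also learns an approximate inverse transformation from the projected space back to the original space. sklearn learns this map with kernel ridge regression, whose regularization strength is `alpha`.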

We then project our test data onto the kernel principal components and plot two things: the eigenvalue of each component, and the test points in the space of the first two components.

```python
# Project the test data onto the kernel principal components
score_kernel_pca = kernel_pca.transform(X_test)

fig, axs = plt.subplots(1, 2, figsize=(10, 5))
axs[0].plot(kernel_pca.eigenvalues_)
axs[0].set_title("Principal components and their eigenvalues")
axs[0].set_xlabel("nth principal component")
axs[0].set_ylabel("Eigenvalue magnitude")

axs[1].scatter(score_kernel_pca[:, 0], score_kernel_pca[:, 1], c=y_test, cmap="viridis")
axs[1].set_title("Projection onto PCs (kernel)")
axs[1].set_xlabel("1st principal component")
axs[1].set_ylabel("2nd principal component")

plt.tight_layout(pad=2.0)
plt.show()
```

The first plot displays the eigenvalues of the principal components in decreasing order, while the second visualizes the projection of the test dataset onto the first and second principal components.
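One way to quantify what the projection buys us is to train the same linear classifier on the raw features and on the first kernel principal component alone. The sketch below uses `LogisticRegression` for this; the choice of classifier and of a single component is an assumption for illustration, not part of the lesson.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=1000, factor=0.01, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

kernel_pca = KernelPCA(kernel="rbf", gamma=10, fit_inverse_transform=True, alpha=0.1).fit(X_train)

# Linear classifier on the raw 2-D features
raw_clf = LogisticRegression().fit(X_train, y_train)
raw_acc = raw_clf.score(X_test, y_test)

# Same linear classifier on the first kernel principal component only
Z_train = kernel_pca.transform(X_train)[:, :1]
Z_test = kernel_pca.transform(X_test)[:, :1]
kpca_clf = LogisticRegression().fit(Z_train, y_train)
kpca_acc = kpca_clf.score(Z_test, y_test)

print(f"raw features accuracy:  {raw_acc:.3f}")
print(f"1st kernel PC accuracy: {kpca_acc:.3f}")
```

No straight line can separate a circle from the points it surrounds, whereas after the RBF Kernel PCA projection a simple threshold on the first component can.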

We can also map the projected data back to the original space with the `inverse_transform()` method. Let's visualize the original test data next to the inversely transformed data to see how well the Kernel PCA model reconstructs it:

```python
# Project the test data, then map it back to the original 2-D space
X_hat_kpca = kernel_pca.inverse_transform(kernel_pca.transform(X_test))

# Plot the original data and inverse transformed data
fig, axs = plt.subplots(1, 2, figsize=(10, 5))
axs[0].scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap="viridis")
axs[0].set_title("Original data")
axs[0].set_xlabel("1st feature")
axs[0].set_ylabel("2nd feature")

axs[1].scatter(X_hat_kpca[:, 0], X_hat_kpca[:, 1], c=y_test, cmap="viridis")
axs[1].set_title("Inverse transformed data (kernel)")
axs[1].set_xlabel("1st feature")
axs[1].set_ylabel("2nd feature")

plt.tight_layout(pad=2.0)
plt.show()
```

The right-hand plot shows the inversely transformed data, that is, the model's reconstruction of the original test points.

Lastly, we compute the reconstruction error, the mean squared difference between our test dataset and the inversely transformed dataset.

```python
print("Mean squared error for Kernel PCA is:", ((X_test - X_hat_kpca) ** 2).mean())  # ~0.01
```
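The `alpha` passed to `KernelPCA` only regularizes the learned inverse map (the projection itself does not depend on it), so a quick way to see its effect is to sweep it and recompute the same error. The grid of alpha values below is an arbitrary choice for illustration, and the exact numbers you get will depend on your data and scikit-learn version.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=1000, factor=0.01, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Refit with several ridge penalties for the learned inverse map
errors = {}
for alpha in [0.01, 0.1, 1.0, 10.0]:
    kpca = KernelPCA(kernel="rbf", gamma=10, fit_inverse_transform=True, alpha=alpha).fit(X_train)
    X_hat = kpca.inverse_transform(kpca.transform(X_test))
    errors[alpha] = ((X_test - X_hat) ** 2).mean()

for alpha, mse in errors.items():
    print(f"alpha={alpha:<5}  test reconstruction MSE={mse:.4f}")
```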

Kernel PCA offers a range of hyperparameters to fine-tune the model. Here are a few key parameters:

- `kernel`: The kernel function to use: 'linear' (the default, which makes Kernel PCA equivalent to ordinary PCA), 'poly', 'rbf', 'sigmoid', 'cosine', or 'precomputed'. The 'rbf' kernel is a common choice for non-linear data, 'poly' captures polynomial relationships up to the chosen `degree`, and 'sigmoid' applies `tanh` to a scaled inner product.
- `gamma`: The kernel coefficient for the 'rbf', 'poly', and 'sigmoid' kernels (if left as `None`, sklearn uses `1/n_features`). For 'rbf', a higher gamma makes similarity decay faster with distance, so the embedding captures finer, more local structure and the model becomes more complex.
- `degree`: The degree of the 'poly' kernel, which sets how non-linear the transformation is (ignored by other kernels; default 3). The higher the degree, the more complex the transformation.
- `alpha`: The regularization strength of the ridge regression that learns the inverse transform (default 1.0). It is only used when `fit_inverse_transform=True` and does not affect the projection itself. A higher alpha gives a smoother, more strongly regularized inverse map, which can increase the reconstruction error.

Kudos on completing the lesson on Kernel PCA! Today, we have covered non-trivial aspects, such as how to work with non-linearly separable data, kernel techniques, and PCA. Now, it's time for some hands-on practice! Keep up the good work and happy coding!