Lesson 3
Exploring Locally Linear Embedding: A Dimensionality Reduction Technique
Introduction

Welcome to the fascinating world of Locally Linear Embedding (LLE), a vital tool within our dimensionality reduction toolbox. Unlike linear techniques like Principal Component Analysis (PCA), LLE shines in preserving local properties when dealing with high-dimensional data.

In this lesson, we'll unravel the algorithm behind LLE, discuss its uses, and reveal how it offers unique insights compared to techniques like PCA. We'll use Python and libraries like numpy, matplotlib, and sklearn to illustrate these concepts.

Prepare to dive in and explore the depths of LLE!

What Is Locally Linear Embedding and What Are Its Use Cases?

LLE stands out for its ability to preserve relationships within local neighborhoods while reducing high-dimensional data, capturing the twists and turns in the data.

An intuitive analogy is shrinking a street map. A linear projection such as PCA can distort local structure, much like a distant bird's-eye view can misrepresent the distances between nearby landmarks. LLE, however, keeps local distances largely intact, preserving the neighborhood structure in the reduced version.

LLE performs notably well on data with intricate, non-linear structure. When handling high-dimensional data such as facial image features or genomics data, the LLE technique proves to be quite beneficial. But remember: the technique requires appropriate tuning of its hyperparameters.

Understanding the Theory Behind LLE

Let's delve deeper into the theoretical underpinnings of the LLE algorithm. In a nutshell, LLE can be conceptualized as a two-step optimization that minimizes reconstruction error, first in the high-dimensional space and then in the low-dimensional embedding.

Step 1: Approximating neighbors in high-dimensional space

First, for every data point $x_i$ in the high-dimensional dataset, we find its $K$ nearest neighbors. The value $K$ is essential to this process: it is the number of nearby data points the algorithm considers when constructing the lower-dimensional mapping.

This step solves the following optimization problem:

$$\min_{W} \sum_{i=1}^{N} \left\| x_i - \sum_{j \in N(i)} W_{ij} x_j \right\|^2$$

where $N(i)$ is the set of neighbors of $x_i$, and $W_{ij}$ is a set of weights that minimizes the cost function. This cost function measures how well each point $x_i$ is reconstructed from its neighbors, given the weights $W_{ij}$. The weights $W_{ij}$ are computed to minimize this cost, subject to the constraint that for each data point $x_i$, $\sum_j W_{ij} = 1$.
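
To make this step concrete, here is a minimal sketch of how the reconstruction weights for a single point might be computed: the sum-to-one constraint is handled by solving a small regularized linear system and normalizing the result. The function name reconstruction_weights and the reg parameter are illustrative choices, not part of scikit-learn's API.

Python
import numpy as np

def reconstruction_weights(x_i, neighbors, reg=1e-3):
    # neighbors has shape (K, D): the K nearest neighbors of point x_i.
    Z = neighbors - x_i                              # offsets from x_i, shape (K, D)
    C = Z @ Z.T                                      # local Gram matrix, shape (K, K)
    C += np.eye(len(neighbors)) * reg * np.trace(C)  # regularize (needed when K > D)
    w = np.linalg.solve(C, np.ones(len(neighbors)))  # solve C w = 1
    return w / w.sum()                               # enforce the sum-to-one constraint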

Step 2: Preserving weights in low-dimensional space

Once we obtain the weights, we then use these weights to map the data points into a lower-dimensional space which best preserves these neighborhood relationships. The mapping seeks a set of low-dimensional points $Y$ that minimally distorts these relationships. It solves the following optimization problem:

$$\min_{Y} \sum_{i=1}^{N} \left\| y_i - \sum_{j \in N(i)} W_{ij} y_j \right\|^2$$

where the weights $W_{ij}$ are fixed. The low-dimensional representations $y_i$ are hence computed to preserve the high-dimensional relationships encapsulated by the weights, so the structure of each neighborhood is preserved.
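
Continuing the sketch above, once a full $N \times N$ weight matrix $W$ has been assembled (one row per point, zeros outside each neighborhood), the embedding $Y$ can be obtained from the bottom eigenvectors of $M = (I - W)^T (I - W)$. This is a dense, illustrative version; real implementations such as scikit-learn's rely on sparse eigensolvers.

Python
import numpy as np

def embed(W, n_components=2):
    # W is the (N, N) weight matrix; each row sums to one.
    N = W.shape[0]
    I_minus_W = np.eye(N) - W
    M = I_minus_W.T @ I_minus_W
    eigenvalues, eigenvectors = np.linalg.eigh(M)  # eigenvalues in ascending order
    # The first eigenvector is (approximately) constant with eigenvalue ~0, so skip it
    # and keep the next n_components eigenvectors as the low-dimensional coordinates Y.
    return eigenvectors[:, 1:n_components + 1]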

These two steps illustrate how LLE reduces dimensionality: it sacrifices some global fidelity but prioritizes preserving local relationships within the data.

By maintaining local properties, LLE effectively captures non-linear structures and distinguishes itself from linear methods such as PCA. Leveraging these local relationships can provide us with a better understanding of high-dimensional data, which is particularly beneficial in fields such as machine learning, computer vision, and bioinformatics. Now that we've gained a more profound insight into LLE, let's proceed and see it in practice.

Breaking down the LLE Algorithm: Generating the Data

The LLE algorithm can be summarized in three main steps: identifying neighbors, reconstructing each point with linear weights, and mapping to lower dimensions.

To illustrate these steps, we'll use the Swiss Roll dataset, a popular choice for visualizing non-linear dimensionality reduction techniques. Let's generate the Swiss Roll data first:

Python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding
import matplotlib.pyplot as plt

# Generate Swiss Roll data
n_samples = 1500
noise = 0.05
X, color = make_swiss_roll(n_samples=n_samples, noise=noise, random_state=42)

The make_swiss_roll function generates a Swiss Roll dataset: a set of data points that form a 3D spiral when plotted and therefore cannot be correctly flattened by linear techniques like PCA, making it an ideal candidate for non-linear dimensionality reduction techniques like LLE.

The n_samples parameter determines the number of data points in the dataset, while noise adds random noise to the data points. The random_state parameter ensures reproducibility of the data. We can now visualize the Swiss Roll data:

Python
# Plotting the Swiss Roll
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap=plt.cm.viridis)
plt.title("Swiss Roll")
plt.show()

We'll see the following plot:

[Figure: 3D scatter plot of the Swiss Roll dataset]

Applying Locally Linear Embedding

Let's now apply LLE to our Swiss Roll data. We are using 12 neighbors for each point and seeking a 2-dimensional output:

Python
from sklearn.manifold import LocallyLinearEmbedding

# Apply Locally Linear Embedding
n_neighbors = 12
n_components = 2
lle = LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=n_components, random_state=42)
X_reduced = lle.fit_transform(X)

In the above code, n_components=2 is used to reduce the data to 2 dimensions. The n_neighbors=12 parameter is essential here. This is the number of neighbors for each data point to consider when performing the embedding.

The parameter n_neighbors effectively determines the size of each local neighborhood. A value that is too low restricts each point to a tiny neighborhood, which captures only fine-grained local detail and can make the embedding noisy or fragmented, while a value that is too high pulls in points from more distant parts of the manifold, which can blur the very local structure LLE is designed to preserve.

As such, careful tuning of n_neighbors is necessary to maintain a balance between preserving local data distinctions and recognizing larger dataset patterns.
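
One practical way to get a feel for this trade-off is to fit LLE with several candidate values of n_neighbors and compare the resulting reconstruction errors, alongside visual inspection of each embedding. The candidate values below are arbitrary examples:

Python
from sklearn.manifold import LocallyLinearEmbedding

# Fit LLE for a few candidate neighborhood sizes (using X from earlier)
# and report the reconstruction error for each.
for k in [5, 12, 20, 50]:
    lle_k = LocallyLinearEmbedding(n_neighbors=k, n_components=2, random_state=42)
    lle_k.fit(X)
    print(f"n_neighbors={k}: reconstruction error = {lle_k.reconstruction_error_:.2e}")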

Lastly, lle.fit_transform(X) is used to fit the model on X and then apply the dimensionality reduction to X. The transformed data is stored in X_reduced.

Again, remember that careful selection of the number of neighbors, n_neighbors, is pivotal in capturing the correct local and global structures within your data.

This transformed data can then be used for subsequent machine learning tasks, offering potential improvements in computation time and algorithm performance.
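
As a hypothetical example of such a downstream task, we could try to predict each point's position along the roll (the color value returned by make_swiss_roll) from the two LLE coordinates. The model and scoring choices below are illustrative, not part of the lesson's original pipeline:

Python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Use the 2D LLE coordinates (X_reduced from earlier) as features
# for a simple regression task predicting the color value.
model = LinearRegression()
scores = cross_val_score(model, X_reduced, color, cv=5, scoring="r2")
print(f"Mean R^2 using LLE features: {scores.mean():.3f}")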

Applying PCA for Comparison

For comparison, we will also apply PCA on the same Swiss Roll data:

Python
from sklearn.decomposition import PCA

# Apply PCA
pca = PCA(n_components=n_components, random_state=42)
X_pca = pca.fit_transform(X)

PCA represents a linear technique that contrasts with the non-linear nature of LLE, allowing us to spotlight the unique advantages of each method.

Data Visualization

Now, let's visualize the original data, the data reduced by LLE, and the data reduced by PCA side by side:

Python
import matplotlib.pyplot as plt

# Visualize the original data and the results after applying LLE and PCA
fig = plt.figure(figsize=(15, 6))
fig.suptitle("Dimensionality Reduction using LLE and PCA", fontsize=16)

# Original data
ax = fig.add_subplot(131, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap=plt.cm.viridis)
ax.view_init(10, -60)
ax.set_title("Original data")

# Data after LLE
ax = fig.add_subplot(132)
ax.scatter(X_reduced[:, 0], X_reduced[:, 1], c=color, cmap=plt.cm.viridis)
ax.set_title("LLE")

# Data after PCA
ax = fig.add_subplot(133)
ax.scatter(X_pca[:, 0], X_pca[:, 1], c=color, cmap=plt.cm.viridis)
ax.set_title("PCA")

plt.show()

We can observe the Swiss Roll data in its original form, the data after applying LLE, and the data after PCA:

[Figure: the original Swiss Roll alongside the 2D embeddings produced by LLE and PCA]

The visual comparison may not be immediately obvious, but LLE has successfully unfolded the Swiss Roll into a 2D plane while preserving the local relationships within the data. PCA, on the other hand, has flattened the data onto a 2D plane, losing those local relationships in the process. We can confirm this by comparing the reconstruction errors of LLE and PCA.

Note: The plots might vary slightly due to version differences in the libraries.

Comparing Reconstruction Errors

Finally, we compare the reconstruction errors of LLE and PCA. The reconstruction error represents the error in approximating the actual data points, with a lower value implying a better approximation:

Python
# Compare the reconstruction error of LLE and PCA
lle_reconstruction_error = lle.reconstruction_error_
print(f"LLE Reconstruction Error: {lle_reconstruction_error}")  # ~8.5 x 10^-8

pca_reconstruction_error = 1 - np.sum(pca.explained_variance_ratio_)
print(f"PCA Reconstruction Error: {pca_reconstruction_error}")  # ~0.28; values may vary slightly due to randomness

Our results further support the effectiveness of LLE compared to PCA for this specific dataset, with one caveat: the two numbers are computed differently (LLE reports its local reconstruction cost, while the PCA value is the fraction of variance left unexplained), so they are rough indicators rather than directly comparable quantities.

Lesson Summary and Practice

Congratulations! You've just immersed yourself into the complex landscape of LLE and its comparison with PCA. You have also seen how this knowledge can be applied practically in the context of the Swiss Roll dataset, a common benchmark for non-linear techniques.

Stepping into the practice zone now will help consolidate your newfound understanding. We're excited for you to dive deeper into non-linear dimensionality reduction techniques—and beyond. Happy learning!
