Lesson 3

Welcome to the fascinating world of Locally Linear Embedding (LLE), a vital tool within our dimensionality reduction toolbox. Unlike linear techniques like Principal Component Analysis (PCA), LLE shines in preserving local properties when dealing with high-dimensional data.

In this lesson, we'll unravel the algorithm behind LLE, discuss its uses, and reveal how it offers unique insights compared to techniques like PCA. We'll use Python and libraries like `numpy`

, `matplotlib`

, and `sklearn`

to illustrate these concepts.

Prepare to dive in and explore the depths of LLE!

LLE stands in the spotlight for its prowess in preserving relationships within local neighborhoods while reducing high-dimensional data. It captures the twists and turns within our data.

An intuitive example would be comparing a street map, which undergoes a reduce-in-size transformation. PCA would distort local structures, much like a bird's eye view would misrepresent the distances between landmarks. LLE, however, keeps the local distances intact, preserving the neighborhood structure just as well in the reduced version.

LLE performs notably well while navigating through data with intricate, non-linear structures. When handling high-dimensional data such as facial image feature extractions or genomics data, the LLE technique proves to be quite beneficial. But remember: the technique does require the appropriate tuning of its hyperparameters.

Let's delve deeper into understanding the theoretical underpinnings of the LLE algorithm. In a nutshell, the LLE algorithm can be conceptualized as solving an optimization problem in two steps to minimize the error between manifolds in high dimensional and low dimensional spaces.

**Step 1: Approximating neighbors in high-dimensional space**

Firstly, for every data point $x_i$ in the high-dimensional dataset, we compute its $K$ nearest neighbors. Essential to this process is the concept of 'k'. This value corresponds to the number of nearby data points which it considers when constructing the lower-dimensional mapping.

This step solves the following optimization problem:

$\min_{W} \sum_{i=1}^{N} \|x_i - \sum_{j \in N(i)} W_{ij}x_j\|^2$where N(i) is the set of neighbors of $x_i$, and $W_{ij}$ is a set of weights that minimizes the cost function. This cost function measures how well each point $x_i$ is reconstructed from its neighbors, given the weights $W_{ij}$. The weights $W_{ij}$ are computed to minimize this cost, subject to the constraint that for each data point $x_i$, $\sum_j W_{ij} = 1$.

**Step 2: Preserving weights in low-dimensional space**

Once we obtain the weights, we then use these weights to map the data points into a lower-dimensional space, which best preserves these neighborhood relationships. The mapping seeks a set of low-dimensional points $Y$ which minimally distorts these relationships. It solves the following optimization problem:

$\min_{Y} \sum_{i=1}^{N} \|y_i - \sum_{j \in N(i)} W_{ij}y_j\|^2$where the weights $W_{ij}$ are fixed. The low-dimensional representations $y_i$ are hence computed to preserve the high-dimensional relationships encapsulated by the weights. The structure of each neighborhood is preserved.

These two steps illustrate how LLE reduces dimensionality by sacrificing some data fidelity, but prioritizes preserving local relationships within the data.

By maintaining local properties, LLE effectively captures non-linear structures and distinguishes itself from linear methods such as PCA. Leveraging these local relationships can provide us with a better understanding of high-dimensional data, which is particularly beneficial in fields such as machine learning, computer vision, and bioinformatics. Now that we’ve gained a more profound insight into LLE let’s proceed and see it in practice.

The algorithm of LLE can be summarized into three main steps: identifying neighbors, reconstructing with linear weights, and mapping to lower dimensions.

To illustrate these steps, we're employing a popular choice in visualizing non-linear dimensionality reduction techniques—the Swiss Roll dataset. Let’s generate the Swiss Roll data first:

Python`1import numpy as np 2from sklearn.datasets import make_swiss_roll 3from sklearn.manifold import LocallyLinearEmbedding 4import matplotlib.pyplot as plt 5 6# Generate Swiss Roll data 7n_samples = 1500 8noise = 0.05 9X, color = make_swiss_roll(n_samples=n_samples, noise=noise, random_state=42)`

The `make_swiss_roll`

function generates a Swiss Roll dataset — a dataset which, naturally, cannot be correctly flattened using linear techniques like PCA. Swiss Roll is a set of data points that form a 3D spiral when plotted, making it an ideal candidate for non-linear dimensionality reduction techniques like LLE.

The `n_samples`

parameter determines the number of data points in the dataset, while `noise`

adds random noise to the data points. The `random_state`

parameter ensures reproducibility of the data. We can now visualize the Swiss Roll data:

Python`1# Plottting the Swiss Roll 2fig = plt.figure() 3ax = fig.add_subplot(111, projection='3d') 4ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap=plt.cm.viridis) 5plt.title("Swiss Roll") 6plt.show()`

We'll see the following plot:

Let's now apply LLE to our Swiss Roll data. We are using 12 neighbors for each point and seeking a 2-dimensional output:

Python`1from sklearn.manifold import LocallyLinearEmbedding 2 3# Apply Locally Linear Embedding 4n_neighbors = 12 5n_components = 2 6lle = LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=n_components, random_state=42) 7X_reduced = lle.fit_transform(X)`

In the above code, `n_components=2`

is used to reduce the data to 2 dimensions. The `n_neighbors=12`

parameter is essential here. This is the number of neighbors for each data point to consider when performing the embedding.

The parameter `n_neighbors`

effectively determines the size of the local neighborhood. A low value will limit the number of neighbors and may miss broader trends in the data, while a high value could incorporate too much of the data structure, potentially including noise.

As such, careful tuning of `n_neighbors`

is necessary to maintain a balance between preserving local data distinctions and recognizing larger dataset patterns.

Lastly, `embedding.fit_transform(X)`

is used to fit the model with `X`

, then applies the dimensionality reduction on `X`

. The transformed data is stored in `X_reduced`

.

Again, remember that careful selection of the number of neighbors, `n_neighbors`

, is pivotal in capturing the correct local and global structures within your data.

This transformed data can then be used for subsequent machine learning tasks, offering potential improvements in computation time and algorithm performance.

For comparison, we will also apply PCA on the same Swiss Roll data:

Python`1from sklearn.decomposition import PCA 2 3# Apply PCA 4pca = PCA(n_components=n_components, random_state=42) 5X_pca = pca.fit_transform(X)`

PCA represents a linear technique that contrasts with the non-linear nature of LLE, allowing us to spotlight the unique advantages of each method.

Now, let's visualize our original data, relegated data from LLE, and data steeped through PCA side-by-side:

Python`1import matplotlib.pyplot as plt 2 3# Visualize the original data and the results after applying LLE and PCA 4fig = plt.figure(figsize=(15, 6)) 5fig.suptitle("Dimensionality Reduction using LLE and PCA", fontsize=16) 6 7# Original data 8ax = fig.add_subplot(131, projection='3d') 9ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap=plt.cm.viridis) 10ax.view_init(10, -60) 11ax.set_title("Original data") 12 13# Data after LLE 14ax = fig.add_subplot(132) 15ax.scatter(X_reduced[:, 0], X_reduced[:, 1], c=color, cmap=plt.cm.viridis) 16ax.set_title("LLE") 17 18# Data after PCA 19ax = fig.add_subplot(133) 20ax.scatter(X_pca[:, 0], X_pca[:, 1], c=color, cmap=plt.cm.viridis) 21ax.set_title("PCA") 22 23plt.show()`

We can observe the Swiss Roll data in its original form, the data after applying LLE, and the data after PCA:

The visual coparison might not be as clear for the eye, but the LLE method has successfully unfolded the Swiss Roll data into a 2D plane, preserving the local relationships within the data. PCA, on the other hand, has compressed the data into a 2D plane, losing the local relationships in the process. We can confirm this by comparing the reconstruction errors of LLE and PCA.

**Note:** The plots might vary slightly due to version differences in the libraries.

Finally, we compare the reconstruction errors of LLE and PCA. The reconstruction error represents the error in approximating the actual data points, with a lower value implying a better approximation:

Python`1# Compare the reconstruction error of LLE and PCA 2lle_reconstruction_error = lle.reconstruction_error_ 3pca_reconstruction_error = np.sum(pca.explained_variance_ratio_) 4print(f"LLE Reconstruction Error: {lle_reconstruction_error}") # ~8.5 x 10^-8 5pca_reconstruction_error = 1 - np.sum(pca.explained_variance_ratio_) 6print(f"PCA Reconstruction Error: {pca_reconstruction_error}") # ~0.28 the error values may vary slightly due to randomness`

Our results further confirm the effectiveness of LLE compared to PCA for this specific dataset.

Congratulations! You've just immersed yourself into the complex landscape of LLE and its comparison with PCA. You have also seen how this knowledge can be applied practically in the context of the Swiss Roll dataset, a common benchmark for non-linear techniques.

Stepping into the practice zone now will help consolidate your newfound understanding. We're excited for you to dive deeper into non-linear dimensionality reduction techniques—and beyond. Happy learning!