Lesson 2

Mastering t-SNE Parameter Tuning in Scikit-learn

Introduction

Welcome! Today's focus is t-SNE parameter tuning in Scikit-learn. This lesson explains the most important t-SNE parameters, walks through tuning them in practice, and shows how they affect the resulting visualization.

Preparing the Data

Before delving into parameter tuning, let's quickly set up the dataset. Here's a basic setup:

Python
from sklearn.datasets import make_circles
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Generate a non-linearly separable dataset
X, y = make_circles(n_samples=500, factor=0.3, noise=0.1, random_state=42)

# Plot the dataset
plt.figure(figsize=(6, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
plt.title("Original Data")
plt.show()

[Figure: scatter plot titled "Original Data"]

Understanding t-SNE Parameters: Perplexity

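We will now look at the key parameters of Scikit-learn's t-SNE. The first is perplexity, which loosely corresponds to the effective number of nearest neighbors considered for each point. It balances local against global structure: low values emphasize small neighborhoods, while higher values take more of the overall layout into account. Scikit-learn's default is 30, and values between 5 and 50 are the usual range to try.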
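
As a rough illustration of the role perplexity plays, the sketch below fits t-SNE several times on a make_circles dataset, changing only perplexity. The specific values (5, 30, 50) and the smaller sample size of 200 are assumptions chosen to keep the sweep quick; they are not recommendations from this lesson.

```python
from sklearn.datasets import make_circles
from sklearn.manifold import TSNE

# Smaller sample than the lesson's dataset (an assumption, to keep the sweep fast)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=42)

# Illustrative values only; each must be smaller than the number of samples
perplexities = [5, 30, 50]

embeddings = {}
for perplexity in perplexities:
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    embeddings[perplexity] = tsne.fit_transform(X)
    print(f"perplexity={perplexity}: embedding shape {embeddings[perplexity].shape}")
```

Each entry in embeddings is a two-column array that can be passed to plt.scatter to see how the neighborhood size changes the layout.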

Understanding t-SNE Parameters: Early Exaggeration

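The next parameter is early_exaggeration. It controls how tightly natural clusters from the original space are packed in the embedded space and how much space separates them. Higher values tend to produce denser clusters with larger gaps between them. The default is 12.0.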
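
One way to get a feel for early_exaggeration is to vary it while holding everything else fixed. The following is a minimal sketch under that assumption; the values 4, 12, and 40 and the sample size of 200 are illustrative choices. It records the final KL divergence that Scikit-learn exposes as kl_divergence_ after fitting.

```python
from sklearn.datasets import make_circles
from sklearn.manifold import TSNE

# Smaller sample than the lesson's dataset (an assumption, to keep the runs fast)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=42)

# Illustrative values around the default of 12.0
exaggerations = [4.0, 12.0, 40.0]

results = {}
for value in exaggerations:
    tsne = TSNE(n_components=2, early_exaggeration=value, random_state=42)
    embedding = tsne.fit_transform(X)
    # kl_divergence_ is the final value of the objective after optimization
    results[value] = (embedding, tsne.kl_divergence_)
    print(f"early_exaggeration={value}: KL divergence {tsne.kl_divergence_:.3f}")
```

Because perplexity is the same in every run, the KL divergences here are measured against the same target and can be compared with each other; the scatter plots remain the more informative check.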

Understanding t-SNE Parameters: Learning Rate

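The final parameter, learning_rate, sets the step size of the gradient updates during optimization. Values usually fall between 10 and 1000. If the learning rate is too high, the embedding can look like a ball of roughly equidistant points; if it is too low, most points can end up compressed in a dense cloud with a few outliers.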
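
To see the effect of learning_rate in numbers rather than only in plots, the sketch below fits t-SNE with a few step sizes and reports the per-axis standard deviation of each embedding. The chosen rates (10, 200, 1000), the sample size of 200, and the use of standard deviation as a crude spread measure are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.manifold import TSNE

# Smaller sample than the lesson's dataset (an assumption, to keep the runs fast)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=42)

# Illustrative step sizes spanning the commonly used range
learning_rates = [10.0, 200.0, 1000.0]

spreads = {}
for lr in learning_rates:
    tsne = TSNE(n_components=2, learning_rate=lr, random_state=42)
    embedding = tsne.fit_transform(X)
    # Standard deviation along each embedding axis as a crude measure of spread
    spreads[lr] = embedding.std(axis=0)
    print(f"learning_rate={lr}: spread per axis {np.round(spreads[lr], 2)}")
```

A very small spread hints at a compressed embedding, but the number alone does not say whether the structure is good, so it is best read alongside a scatter plot.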

Parameter Tuning Practice

With the dataset prepared and the parameters introduced, we can adjust the t-SNE settings and observe their visual impact.

Python
# Fit t-SNE with default parameters
tsne_1 = TSNE(n_components=2, random_state=42)
tsne_transformed_1 = tsne_1.fit_transform(X)

# Fit t-SNE with a custom configuration
tsne_2 = TSNE(n_components=2, random_state=42, perplexity=50, early_exaggeration=20, learning_rate=500)
tsne_transformed_2 = tsne_2.fit_transform(X)

# Plot the t-SNE transformed data for both configurations side by side
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.scatter(tsne_transformed_1[:, 0], tsne_transformed_1[:, 1], c=y, cmap='viridis')
plt.title("t-SNE with default parameters")

plt.subplot(1, 2, 2)
plt.scatter(tsne_transformed_2[:, 0], tsne_transformed_2[:, 1], c=y, cmap='viridis')
plt.title("t-SNE with perplexity=50, early_exaggeration=20, learning_rate=500")

plt.show()

We can now compare the t-SNE results for the default parameters and the custom configuration:

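[Figure: side-by-side scatter plots titled "t-SNE with default parameters" and "t-SNE with perplexity=50, early_exaggeration=20, learning_rate=500"]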

Note: the plot might be different for you due to version differences in the libraries.
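
Since plots can vary between library versions, a numeric check can complement the visual comparison. The sketch below is one possible approach, not part of the original lesson: it scores the default and custom configurations with Scikit-learn's trustworthiness function, which returns a value between 0 and 1 describing how well local neighborhoods are preserved. The choice of n_neighbors=5 is an assumption.

```python
from sklearn.datasets import make_circles
from sklearn.manifold import TSNE, trustworthiness

# Same dataset settings as in the lesson
X, y = make_circles(n_samples=500, factor=0.3, noise=0.1, random_state=42)

# The two configurations compared in the lesson
configs = {
    "default": {},
    "custom": {"perplexity": 50, "early_exaggeration": 20, "learning_rate": 500},
}

scores = {}
for name, params in configs.items():
    embedding = TSNE(n_components=2, random_state=42, **params).fit_transform(X)
    # Closer to 1 means neighbors in the embedding were also neighbors in the input
    scores[name] = trustworthiness(X, embedding, n_neighbors=5)
    print(f"{name}: trustworthiness = {scores[name]:.3f}")
```

A higher score suggests better local fidelity, though it says little about global layout, so it works best as a sanity check next to the plots.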

Conclusion and Practice

In conclusion, mastering Scikit-learn's t-SNE means understanding its tunable parameters and adjusting them effectively. In this lesson, we examined perplexity, early_exaggeration, and learning_rate, and compared a default configuration with a custom one. Now it's time for hands-on practice to put the theory to work.
