Lesson 3
Advanced Scatter Plot Customization
Introduction to Advanced Scatter Plot Customization

Hello and welcome! In today’s lesson, we’re diving into advanced scatter plot customization using the diamonds dataset in Python. By the end of this lesson, you will have learned how to customize scatter plots to reveal complex patterns and provide better insights into your data. We'll do this by adjusting aesthetics like colors, markers, and transparency, as well as incorporating additional details using size and hue.

Marker Size Adjustment

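Setting the marker size with the s parameter can make a scatter plot more readable, especially when points overlap. The value is given in points squared and applies to every point in the plot. The example also lowers alpha so that dense regions remain visible.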

Python
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd

    # Load the diamonds dataset
    diamonds = sns.load_dataset('diamonds')

    # Creating a new feature 'volume' (x * y * z)
    diamonds['volume'] = diamonds['x'] * diamonds['y'] * diamonds['z']

    # Filter out unrealistic volumes
    diamonds = diamonds[(diamonds['volume'] > 0) & (diamonds['volume'] < 300)]

    # Scatter plot with adjusted marker size
    plt.figure(figsize=(10,6))
    sns.scatterplot(x='volume', y='price', data=diamonds, s=100, alpha=0.4)
    plt.title('Scatter Plot of Volume vs. Price (Adjusted Marker Size)')
    plt.xlabel('Volume')
    plt.ylabel('Price')
    plt.show()
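To judge how much s affects readability, it can help to draw the same data at two sizes side by side. The following is a minimal sketch rather than part of the lesson's own code: demo is a small synthetic DataFrame (a hypothetical stand-in for diamonds, so the snippet runs without downloading anything), and the sizes 10 and 100 are arbitrary choices.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the diamonds data (assumption, not real values)
rng = np.random.default_rng(0)
volume = rng.uniform(20, 250, 300)
demo = pd.DataFrame({
    "volume": volume,
    "price": 60 * volume + rng.normal(0, 800, 300),
})

# Draw the same data twice, once per marker size, on shared axes
fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharex=True, sharey=True)
for ax, marker_size in zip(axes, (10, 100)):
    sns.scatterplot(x="volume", y="price", data=demo,
                    s=marker_size, alpha=0.4, ax=ax)
    ax.set_title(f"s={marker_size}")
```

Call plt.show() to display the figure. Because s is expressed in points squared, s=100 draws markers roughly 10 points across.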

Changing Marker Style

The marker parameter changes the symbol drawn for every point in the plot, which can make individual points easier to pick out. In the example, all points are drawn as 'x' rather than the default circle.

Python
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd

    # Load the diamonds dataset
    diamonds = sns.load_dataset('diamonds')

    # Creating a new feature 'volume' (x * y * z)
    diamonds['volume'] = diamonds['x'] * diamonds['y'] * diamonds['z']

    # Filter out unrealistic volumes
    diamonds = diamonds[(diamonds['volume'] > 0) & (diamonds['volume'] < 300)]

    # Scatter plot with adjusted marker style
    plt.figure(figsize=(10,6))
    sns.scatterplot(x='volume', y='price', data=diamonds, marker='x', alpha=0.6, s=100)
    plt.title('Scatter Plot of Volume vs. Price (Marker Style X)')
    plt.xlabel('Volume')
    plt.ylabel('Price')
    plt.show()
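The marker parameter applies one symbol to all points. To vary the symbol by category, scatterplot also accepts the style and markers parameters. The sketch below illustrates that idea under some assumptions: demo is synthetic stand-in data with only three made-up cut levels, and the cut_markers mapping is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the diamonds data (assumption, not real values)
rng = np.random.default_rng(1)
volume = rng.uniform(20, 250, 300)
demo = pd.DataFrame({
    "volume": volume,
    "price": 60 * volume + rng.normal(0, 800, 300),
    "cut": rng.choice(["Good", "Premium", "Ideal"], size=300),
})

# One filled marker per category (seaborn rejects a mix of filled
# and unfilled markers in the same mapping)
cut_markers = {"Good": "^", "Premium": "s", "Ideal": "o"}

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(x="volume", y="price", data=demo,
                style="cut", markers=cut_markers, s=60, alpha=0.6, ax=ax)
ax.set_title("Marker style by cut (synthetic data)")
```

Call plt.show() to display the figure. The legend that seaborn adds lists one marker per cut level.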

Differentiating Data Points with Hue

Beyond basic aesthetics, the hue parameter adds another dimension to a scatter plot by coloring each point according to the category of a variable. The following example passes the 'cut' feature to hue to differentiate between the cut categories:

Python
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd

    # Load the diamonds dataset
    diamonds = sns.load_dataset('diamonds')

    # Creating a new feature 'volume' (x * y * z)
    diamonds['volume'] = diamonds['x'] * diamonds['y'] * diamonds['z']

    # Filter out unrealistic volumes
    diamonds = diamonds[(diamonds['volume'] > 0) & (diamonds['volume'] < 300)]

    # Scatter plot using hue to differentiate cut categories
    plt.figure(figsize=(10,6))
    sns.scatterplot(x='volume', y='price', hue='cut', data=diamonds, alpha=0.6, s=100)
    plt.title('Scatter Plot of Volume vs. Price (Hue by Cut)')
    plt.xlabel('Volume')
    plt.ylabel('Price')
    plt.legend(title='Cut', bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.show()
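When hue encodes an ordered category such as cut quality, the hue_order and palette parameters let you control which color each level receives. The sketch below is one way to do this, not the lesson's own method: demo is synthetic stand-in data, and both the cut_order list and the "viridis" palette are illustrative choices.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Order of the categories from lowest to highest quality (assumed ordering)
cut_order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]

# Synthetic stand-in for the diamonds data (assumption, not real values)
rng = np.random.default_rng(2)
volume = rng.uniform(20, 250, 300)
demo = pd.DataFrame({
    "volume": volume,
    "price": 60 * volume + rng.normal(0, 800, 300),
    "cut": rng.choice(cut_order, size=300),
})

# hue_order fixes the category order; a sequential palette then
# assigns colors that progress along that order
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(x="volume", y="price", data=demo,
                hue="cut", hue_order=cut_order, palette="viridis",
                alpha=0.6, s=60, ax=ax)
ax.set_title("Hue by cut with an explicit order (synthetic data)")
```

Call plt.show() to display the figure. A sequential palette suits ordered categories, while a qualitative palette is usually the better choice for unordered ones.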

Using Size to Represent Another Variable

Passing the 'carat' variable to the size parameter scales each marker according to its carat value, as the following plot shows. Note that this differs from the s parameter, which sets one constant marker size for all the points in the plot. The sizes parameter sets the smallest and largest marker sizes that the carat values are scaled between.

Python
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd

    # Load the diamonds dataset
    diamonds = sns.load_dataset('diamonds')

    # Creating a new feature 'volume' (x * y * z)
    diamonds['volume'] = diamonds['x'] * diamonds['y'] * diamonds['z']

    # Filter out unrealistic volumes
    diamonds = diamonds[(diamonds['volume'] > 0) & (diamonds['volume'] < 300)]

    # Scatter plot with marker sizes representing carat values
    plt.figure(figsize=(10,6))
    sns.scatterplot(x='volume', y='price', hue='cut', size='carat', sizes=(20, 200), data=diamonds, alpha=0.6)
    plt.title('Scatter Plot of Volume vs. Price (Size by Carat)')
    plt.xlabel('Volume')
    plt.ylabel('Price')
    # The legend lists both cut (color) and carat (size), so no single title is set
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.show()
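To check how sizes=(20, 200) translates into actual marker sizes, you can inspect the sizes stored on the plotted points. The sketch below assumes synthetic stand-in data (demo, with made-up carat values) and reads the sizes back from the first collection on the Axes, which is where matplotlib keeps the scatter points.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the diamonds data (assumption, not real values)
rng = np.random.default_rng(3)
carat = rng.uniform(0.2, 1.5, 300)
volume = 160 * carat + rng.normal(0, 5, 300)
demo = pd.DataFrame({
    "carat": carat,
    "volume": volume,
    "price": 60 * volume + rng.normal(0, 800, 300),
})

# size maps a variable to marker size; sizes bounds the resulting range
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(x="volume", y="price", data=demo,
                size="carat", sizes=(20, 200), alpha=0.6, ax=ax)

# Read back the per-point marker sizes that were assigned
point_sizes = ax.collections[0].get_sizes()
print(f"smallest: {point_sizes.min():.1f}, largest: {point_sizes.max():.1f}")
```

Every point gets its own size here, whereas a constant s would give all points the same one.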

Adding Regression Lines

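Adding a regression line can help to identify trends. This is done using the regplot function, which takes the same primary parameters (x, y, and data) as scatterplot. Passing scatter=False makes regplot draw only the fitted line, so it can be overlaid on an existing scatter plot.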

Python
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd

    # Load the diamonds dataset
    diamonds = sns.load_dataset('diamonds')

    # Creating a new feature 'volume' (x * y * z)
    diamonds['volume'] = diamonds['x'] * diamonds['y'] * diamonds['z']

    # Filter out unrealistic volumes
    diamonds = diamonds[(diamonds['volume'] > 0) & (diamonds['volume'] < 300)]

    # Scatter plot with a regression line drawn on top
    plt.figure(figsize=(10,6))
    sns.scatterplot(x='volume', y='price', hue='cut', size='carat', sizes=(20, 200), data=diamonds, alpha=0.6)
    sns.regplot(x='volume', y='price', data=diamonds, scatter=False, color='gray')
    plt.title('Scatter Plot with Regression Line')
    plt.xlabel('Volume')
    plt.ylabel('Price')
    # The legend lists both cut (color) and carat (size), so no single title is set
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.show()
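regplot draws the fitted line and returns the Axes, not the fitted coefficients. If you also want the slope and intercept as numbers, one option is to compute a least-squares line separately with NumPy's polyfit. The sketch below assumes synthetic stand-in data (demo) generated with a known slope of 60, so the values it prints describe that made-up data and not the diamonds dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the diamonds data (assumption, not real values)
rng = np.random.default_rng(4)
volume = rng.uniform(20, 250, 300)
demo = pd.DataFrame({
    "volume": volume,
    "price": 60 * volume + rng.normal(0, 800, 300),
})

# A degree-1 polyfit returns the slope and intercept of the least-squares line
slope, intercept = np.polyfit(demo["volume"].to_numpy(),
                              demo["price"].to_numpy(), deg=1)
print(f"price ~ {slope:.1f} * volume + {intercept:.1f}")

# regplot draws the points and the fitted line; ci=None skips the confidence band
fig, ax = plt.subplots(figsize=(10, 6))
sns.regplot(x="volume", y="price", data=demo, ci=None,
            scatter_kws={"alpha": 0.4}, line_kws={"color": "gray"}, ax=ax)
ax.set_title(f"Fitted slope: {slope:.1f} (synthetic data)")
```

Both regplot and a degree-1 polyfit fit an ordinary least-squares line by default, so the printed slope should describe the line that regplot draws.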

Conclusion and Summary

In this lesson, you mastered advanced scatter plot customization techniques, including adjusting aesthetic properties, using size and hue to encode additional information, and adding regression lines. These skills are essential for better data representation and uncovering deeper insights.

Next, we'll have practice exercises to help solidify these concepts and further enhance your data visualization skills. Customizing scatter plots in such detail will make your data storytelling more effective and impactful!
