Lesson 3
Extending Data Visualization: Enhancing Plots and Analyzing with Matplotlib
Setting Foot on Matplotlib: Basics of Plotting Categorical Data

Welcome to another exciting session! Today, we're stepping into the world of data visualization by introducing Matplotlib's visualization tools. We'll be learning the basics of plotting categorical data from our dataset and understanding the insight such visualization can provide.

Data visualization is an essential tool in data analysis: it lets you communicate complex data and uncover relationships, trends, and patterns. It plays a pivotal role in exploratory data analysis, a fundamental skill for all data scientists.

Take the passengers aboard the Titanic as an example: each passenger has a recorded sex and traveled in one of three passenger classes. Can we observe any underlying pattern that might be of interest? Are survival rates higher for a certain sex or passenger class? Or does the embarkation point play a role? We'll address these questions as we traverse the path of data visualization.

Introduction to Matplotlib

Matplotlib is an extensive library for creating static, animated, and interactive visualizations in Python. Its pyplot module offers a MATLAB-like interface, which makes it quick to pick up for anyone who has plotted in MATLAB.

Let's start by importing the pyplot module of the Matplotlib library:

Python
import matplotlib.pyplot as plt

pyplot provides a high-level interface for creating attractive graphs. To demonstrate this, we'll first analyze the sex column of the Titanic dataset.

We retrieve the count of each category, male and female, with value_counts(). The result is a pandas Series, so plotting it is as simple as calling its plot() method with kind='bar'; pandas draws the chart with Matplotlib behind the scenes:

Python
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
titanic_df = sns.load_dataset('titanic')

# Count total males and females
gender_data = titanic_df['sex'].value_counts()

# Create a bar chart
gender_data.plot(kind='bar', title='Sex Distribution')
plt.show()
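To see what value_counts() hands to plot(), it can help to try it on a tiny hand-made Series. The sketch below uses made-up values (not the Titanic data) and then draws the same kind of chart with pyplot's bar() function directly, which is roughly the work the pandas plot() call does for us.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample, NOT the Titanic data: three 'male' and two 'female' entries
sample = pd.Series(['male', 'female', 'male', 'male', 'female'])

# value_counts() returns a Series: categories in the index, counts as values,
# ordered from most to least frequent
counts = sample.value_counts()
print(counts)

# Draw the bars with pyplot directly instead of Series.plot()
plt.bar(list(counts.index), counts.values)
plt.title('Sex Distribution (toy sample)')
plt.show()
```

The index of the counts becomes the bar labels and the values become the bar heights, which is the same mapping plot(kind='bar') applies.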

Enhancing Plots: Labels and Title

It's good practice to include a title and axis labels to make your plot easier to understand. The previous example set the title through plot()'s title argument; pyplot's xlabel(), ylabel(), and title() functions let you set the axis labels and the title separately. Let's enhance our plot:

Python
gender_data = titanic_df['sex'].value_counts()

gender_data.plot(kind='bar')
plt.xlabel("Sex")
plt.ylabel("Count")
plt.title("Sex Distribution")
plt.show()

In this code, plt.xlabel("Sex") adds 'Sex' as the label for the x-axis, plt.ylabel("Count") adds 'Count' as the label for the y-axis, and plt.title("Sex Distribution") sets 'Sex Distribution' as the title for the plot.
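The plot() method of a Series returns the Matplotlib Axes it drew on, so the same labels can also be set on that object instead of through pyplot functions. A minimal sketch, assuming placeholder counts rather than the real Titanic numbers:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder counts for illustration only, not the real dataset values
counts = pd.Series({'male': 3, 'female': 2})

# Series.plot() returns the Axes object that holds the chart
ax = counts.plot(kind='bar')

# Axes.set() applies several properties at once; here it mirrors
# plt.xlabel(), plt.ylabel(), and plt.title()
ax.set(xlabel='Sex', ylabel='Count', title='Sex Distribution')
plt.show()
```

Working with the Axes object tends to be handier once a figure holds more than one chart, because each label call is tied to a specific plot.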

A Look at Other Categories

Just as we did with the sex column, we can also analyze the pclass (passenger class) and embarked (embarkation point) columns:

Python
# Passenger class distribution
class_data = titanic_df['pclass'].value_counts()
class_data.plot(kind='bar')
plt.xlabel("Passenger Class")
plt.ylabel("Count")
plt.title("Passenger Class Distribution")
plt.show()

Python
# Embarkation point distribution
embark_data = titanic_df['embarked'].value_counts()
embark_data.plot(kind='bar')
plt.xlabel("Embarkation Point")
plt.ylabel("Count")
plt.title("Embarkation Point Distribution")
plt.show()

These plots show the number of passengers in each passenger class and at each embarkation point, so we can see at a glance which categories dominate the dataset.
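One detail worth knowing: value_counts() orders its result by count, largest first, and leaves out missing values by default. For an ordered category such as passenger class, sorting by the index puts the bars in class order, and dropna=False keeps missing entries visible. A small sketch, using made-up values that stand in for the pclass and embarked columns:

```python
import pandas as pd

# Made-up stand-ins for the 'pclass' and 'embarked' columns
pclass = pd.Series([3, 1, 3, 2, 3, 1])
embarked = pd.Series(['S', 'C', 'S', None, 'Q', 'S'])

# Default order: most frequent class first (3, then 1, then 2)
by_count = pclass.value_counts()

# sort_index() reorders the result by class label: 1, 2, 3
by_class = pclass.value_counts().sort_index()

# dropna=False adds a row for missing values instead of skipping them
with_missing = embarked.value_counts(dropna=False)

print(by_count, by_class, with_missing, sep='\n\n')
```

Calling by_class.plot(kind='bar') would then draw the bars in class order rather than in order of frequency.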

Customizing Your Plot

Not only does the plot() method enable us to generate various types of charts, but it also allows us to adjust many parameters for better visualization.

  • color: Sets the fill color of the bars.
  • alpha: Sets the transparency level, from 0 (fully transparent) to 1 (fully opaque).
  • grid: Whether or not to display grid lines.

Let's experiment with these parameters:

Python
gender_data.plot(kind='bar', color='skyblue', alpha=0.7, grid=True)
plt.xlabel("Sex")
plt.ylabel("Count")
plt.title("Sex Distribution")
plt.show()
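The same call accepts further keyword arguments. As an illustration, the sketch below adds rot=0, a pandas plotting option that keeps the category labels horizontal (bar charts rotate them by default), on top of the three parameters above. The counts are placeholders rather than real Titanic figures.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder counts for illustration only
counts = pd.Series({'male': 3, 'female': 2})

ax = counts.plot(
    kind='bar',
    color='skyblue',  # fill color of the bars
    alpha=0.7,        # 0 is fully transparent, 1 is fully opaque
    grid=True,        # show grid lines
    rot=0,            # keep the x tick labels horizontal
)
ax.set(xlabel='Sex', ylabel='Count', title='Sex Distribution')
plt.show()
```

Each bar is stored on the Axes as a patch, so ax.patches can be inspected to confirm that settings such as alpha were applied.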

Wrapping Up

Congratulations! You have taken your first steps into the world of data visualization, learning how to create bar plots with Matplotlib. You've learned about the significance of data visualization and discovered how to make your plots more readable by adding labels and titles.

With this foundation, you are well placed to explore what else the pyplot interface offers, such as line plots, scatter plots, and much more.

Ready to Practice?

Next are several practice sessions that allow you to apply what you've learned. Remember, practice is key to mastering these concepts and developing your skills further!
