Welcome to another exciting session! Today, we're stepping into the world of data visualization by introducing Matplotlib's visualization tools. We'll be learning the basics of plotting categorical data from our dataset and understanding the insight such visualization can provide.
Data visualization is an essential tool in data analysis: it lets you communicate complex data structures and uncover relationships, trends, and patterns in the data. It plays a pivotal role in exploratory data analysis, a fundamental skill for all data scientists.
Take the passengers aboard the Titanic as an example: each passenger has a recorded sex (`sex`) and a passenger class (`pclass`). Can we observe any underlying pattern that might be of interest? Are survival rates higher for a certain sex or passenger class? Or does the embarkation point play a role? We'll address these questions as we traverse the path of data visualization.
Matplotlib is an extensive library for creating static, animated, and interactive visualizations in Python. It runs across multiple platforms and offers a MATLAB-like interface through its `pyplot` module.
Let's start by importing the `pyplot` module of the Matplotlib library:

```python
import matplotlib.pyplot as plt
```
`pyplot` provides a high-level interface for creating attractive graphs. To demonstrate this, we'll first analyze the `sex` column of the Titanic dataset.
We retrieve the counts of each category, `male` and `female`, with `value_counts()`. Plotting them is then as simple as calling the resulting Series' `plot()` method with the argument `kind='bar'`:
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
titanic_df = sns.load_dataset('titanic')

# Count total males and females
gender_data = titanic_df['sex'].value_counts()

# Create a bar chart
gender_data.plot(kind='bar', title='Sex Distribution')
plt.show()
```
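To see what `value_counts()` actually hands to `plot()`, here is a minimal sketch on a small made-up Series (the values are illustrative, not taken from the Titanic dataset). The result is itself a Series whose index holds the categories and whose values hold the counts, sorted from most to least frequent.

```python
import pandas as pd

# Made-up sample for illustration; not real Titanic rows
sample = pd.Series(['male', 'female', 'male', 'male', 'female'], name='sex')

# value_counts() returns a Series: index = categories, values = counts,
# ordered from the most to the least frequent category
counts = sample.value_counts()
print(counts)
```

When a Series like this is plotted with `kind='bar'`, the index becomes the x-axis tick labels and the counts become the bar heights.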
It's good practice to include a title and labels for the axes to make your plot more understandable. You can achieve this using the `xlabel()`, `ylabel()`, and `title()` functions. Let's enhance our plot:
```python
gender_data = titanic_df['sex'].value_counts()

gender_data.plot(kind='bar')
plt.xlabel("Sex")
plt.ylabel("Count")
plt.title("Sex Distribution")
plt.show()
```
In this code, `plt.xlabel("Sex")` adds 'Sex' as the label for the x-axis, `plt.ylabel("Count")` adds 'Count' as the label for the y-axis, and `plt.title("Sex Distribution")` sets 'Sex Distribution' as the title for the plot.
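The `plt.*` calls act on whichever plot is currently active. As an alternative sketch, the pandas `plot()` method returns the Matplotlib `Axes` object it drew on, so the same labels can be set on that object directly. The counts below are a made-up stand-in for `gender_data` so that the example runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts standing in for gender_data (illustrative values only)
sample_counts = pd.Series({'male': 3, 'female': 2})

# Series.plot() returns the Axes object that holds the chart
ax = sample_counts.plot(kind='bar')

# Axes setters mirror plt.xlabel(), plt.ylabel(), and plt.title()
ax.set_xlabel("Sex")
ax.set_ylabel("Count")
ax.set_title("Sex Distribution")
plt.show()
```

Both styles produce the same chart; holding on to `ax` mainly helps once several plots are open at the same time.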
Just as we did with the `sex` column, we can also analyze the `pclass` (passenger class) and `embarked` (embarkation point) columns:
```python
# Passenger class distribution
class_data = titanic_df['pclass'].value_counts()
class_data.plot(kind='bar')
plt.xlabel("Passenger Class")
plt.ylabel("Count")
plt.title("Passenger Class Distribution")
plt.show()
```
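One detail worth noting: `value_counts()` orders its result by count, not by category, so the bars may not appear in class order 1, 2, 3. Here is a minimal sketch, using a made-up sample rather than the real dataset, that chains `sort_index()` to order the bars by class instead:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up passenger classes for illustration; not real Titanic rows
sample = pd.Series([3, 1, 3, 2, 3, 1], name='pclass')

# Default order is by count (most frequent first): 3, 1, 2
by_count = sample.value_counts()

# sort_index() reorders by the category labels: 1, 2, 3
by_class = sample.value_counts().sort_index()

by_class.plot(kind='bar')
plt.xlabel("Passenger Class")
plt.ylabel("Count")
plt.title("Passenger Class Distribution (sorted by class)")
plt.show()
```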
```python
# Embarkation point distribution
embark_data = titanic_df['embarked'].value_counts()
embark_data.plot(kind='bar')
plt.xlabel("Embarkation Point")
plt.ylabel("Count")
plt.title("Embarkation Point Distribution")
plt.show()
```
These plots show the number of passengers in each passenger class (`pclass`) and at each embarkation point (`embarked`), giving us some insight into how the dataset is composed.
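Counts alone do not answer the earlier question about survival rates. As a rough sketch, assuming a `survived` column coded 0/1 (as in seaborn's Titanic dataset), the mean of that column within each group gives the group's survival rate, and it plots the same way as the counts. The rows below are made up so the example runs without downloading anything; they are not real results.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only; swap in titanic_df for real figures
sample_df = pd.DataFrame({
    'sex': ['female', 'female', 'female', 'male', 'male', 'male', 'male', 'male'],
    'survived': [1, 1, 0, 0, 0, 1, 0, 0],
})

# The mean of a 0/1 column is the fraction of 1s, i.e. the survival rate
survival_rate = sample_df.groupby('sex')['survived'].mean()

survival_rate.plot(kind='bar')
plt.xlabel("Sex")
plt.ylabel("Survival Rate")
plt.title("Survival Rate by Sex (sample data)")
plt.show()
```

Replacing `'sex'` with `'pclass'` or `'embarked'` in the `groupby()` call would give the corresponding rates for those columns.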
Not only does the `plot()` method enable us to generate various types of charts, but it also allows us to adjust many parameters for better visualization:

- `color`: Sets the color of the plot.
- `alpha`: Sets the transparency level.
- `grid`: Whether or not to display grid lines.
Let's experiment with these parameters:
```python
gender_data.plot(kind='bar', color='skyblue', alpha=0.7, grid=True)
plt.xlabel("Sex")
plt.ylabel("Count")
plt.title("Sex Distribution")
plt.show()
```
Congratulations! You have taken your first steps into the world of data visualization, learning how to create bar plots with Matplotlib. You've learned about the significance of data visualization and discovered how to make your plots more readable by adding labels and titles.
With this foundation, you are well-placed to explore the further capabilities that the `pyplot` interface provides, such as line plots, scatter plots, and much more.
Next are several practice sessions that allow you to apply what you've learned. Remember, practice is key to mastering these concepts and developing your skills further!