Welcome to our lesson on Heat Maps for Monthly Data Analysis. In this session, we'll dive deeper into the more complex visualizations using the Seaborn
library to plot heat maps.
Heat maps are a superb tool for displaying multivariate datasets in a two-dimensional image. They visually represent data through colors, where different color gradients represent different values. This is very useful in fields like Data Science, as heat maps are powerful tools for exploring and understanding patterns in a given dataset.
In our context of analyzing air travel data, what if we could find out how the monthly passenger count has fluctuated over the years? Which month or year had the highest passenger count? Does the count exhibit a pattern or trend? Heat maps are a great tool to answer these questions, and we'll learn to do exactly that in this lesson.
Heat maps are generated using the Seaborn
library, which builds on Matplotlib
and integrates seamlessly with pandas data structures. Let's start by developing a heat map for monthly passenger trends in air travel.
We start by loading up the flights
dataset, as before:
Python1import seaborn as sns 2 3# Load the dataset 4flights = sns.load_dataset('flights')
Since our interest is on a year-by-year and month-by-month basis, a pivot table fits our requirements best. The pivot table will have months as rows, years as columns, and passenger counts as the cell values. Python's pandas
library makes creating this pivot table straightforward:
Python1# Pivot the dataset 2flights_pivot = flights.pivot(index="month", columns="year", values="passengers") 3print(flights_pivot) 4""" 5year 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 6month 7Jan 112 115 145 171 196 204 242 284 315 340 360 417 8Feb 118 126 150 180 196 188 233 277 301 318 342 391 9Mar 132 141 178 193 236 235 267 317 356 362 406 419 10Apr 129 135 163 181 235 227 269 313 348 348 396 461 11May 121 125 172 183 229 234 270 318 355 363 420 472 12Jun 135 149 178 218 243 264 315 374 422 435 472 535 13Jul 148 170 199 230 264 302 364 413 465 491 548 622 14Aug 148 170 199 242 272 293 347 405 467 505 559 606 15Sep 136 158 184 209 237 259 312 355 404 404 463 508 16Oct 119 133 162 191 211 229 274 306 347 359 407 461 17Nov 104 114 146 172 180 203 237 271 305 310 362 390 18Dec 118 140 166 194 201 229 278 306 336 337 405 432 19"""
Now that we have our pivot table, we can create a heat map. We use Seaborn
's heatmap()
function, passing in our pivot table as an argument:
Python1# Plot a heatmap 2sns.heatmap(flights_pivot) 3 4plt.show()
This heat map immediately provides insight into the passenger count over the years. The color gradient (warmer for higher values, cooler for lower values) makes it easy to spot patterns and trends over time.
Seaborn
offers several parameters to customize heatmaps for better readability and presentation. Here, we'll revamp our heatmap by tinkering with these parameters:
cmap
: This parameter controls the colormap for the heat map. Different colormaps can be used to enhance the heat map's visual appeal and help interpret the data better.annot
: If set to True, this parameter allows the data values to be written on each cell in the heat map.fmt
: This is a string formatting code to use when adding annotations. While it's unnecessary when the annot
is not True
, you need to specify a string formatting code if you add annotations.linewidths
: This parameter allows adding lines between each cell in the heatmap. This helps to distinguish between each cell, especially when the colors amongst cells do not vary greatly.cbar
: This parameter adds a color bar to the heatmap when set to True. The color bar helps in understanding the color coding of the heatmap cells.center
: This parameter defines the value at which to center the colormap. This is useful in cases where the heatmap cells take values diverging around zero.Let's experiment with these parameters and enhance our heatmap:
Python1# Detailed heat map 2sns.heatmap(flights_pivot, 3 cmap='YlGnBu', # choosing a yellow-green-blue colormap 4 annot=True, # Turning on annotations 5 fmt="d", # displaying annotations as integer 6 linewidths=.5, # Add gridlines with width 0.5 7 cbar=True, # Include color bar 8 center = flights_pivot.loc["Jan", 1955] # Center colormap at the value of passengers in January 1955 9) 10plt.show()
Running this code block would produce a more detailed heat map. The numbers in each cell correspond to the actual count of passengers. There are lines of width 0.5
distinguishing every two heatmap cells. The color bar to the right of the heatmap serves as a reference for interpreting the heatmap colors. The center
parameter ensures that the colormap's neutral point corresponds to the number of passengers in January 1955.
We can quickly spot a significant insight: as the years go on, the passenger count increases, indicating the growth in the air travel industry. Moreover, we can observe a pattern in passenger counts attaining a peak during the summers each year, displaying the seasonality in air travel.
In this case, "seasonality" refers to a periodic and consistent fluctuation in the number of passengers over different months of the year. This type of pattern often occurs when the observations are collected over time. For instance, a school might display seasonality in electricity consumption, using more during the academic year and less during the holidays.
Appreciation for your accomplishment! Now you know how to create and customize heat maps using Python and Seaborn
, and how to extract valuable insights from them. Heat maps are a powerful tool for your data visualization skill set.
Remember the colors, patterns, and trends we identified in our heatmap? They answered our initial questions and painted a comprehensive picture of the growth and seasonality of air travel.
Next, it's time to reinforce and practice these skills! In the upcoming hands-on exercises, you will find basic exercises for beginners and challenges to test your new skills. Remember, the true art of learning to code is coding! Enjoy the thrill of these practice sessions, and best of luck!