Lesson 4
Creating Bar Plots and Histograms

Welcome back! In the last lesson, you learned how to make your plots more informative by adding labels and titles. Now it's time to expand your skills with two new types of visualization: bar plots, which summarize categorical data, and histograms, which summarize continuous data.

What You'll Learn

In this unit, you’ll learn how to create bar plots and histograms using ggplot2. We'll start with a simple bar plot to show the count of different species in the iris dataset. Then, we'll move on to creating a histogram to represent the distribution of Sepal Length.

Bar Plots

Bar plots are a straightforward way to visualize categorical data. They display rectangular bars with lengths proportional to the values they represent. Here, we'll use the geom_bar function to create a bar plot of species counts in the iris dataset.

By the end of this lesson, you'll be able to generate a basic bar plot like the one shown below:

Here's the code to create a bar plot:

R
# Load ggplot2 and the built-in dataset
library(ggplot2)
data(iris)

# Bar plot of Species count
bar_plot <- ggplot(iris, aes(x = Species)) +
  geom_bar() +
  theme_light() +
  labs(title = "Count of Species")

# Display the plot
print(bar_plot)

Let's break down the code:

  • data(iris) loads the built-in iris dataset.
  • ggplot(iris, aes(x = Species)) initializes the plot with the iris dataset and sets the x-aesthetic to the Species variable.
  • geom_bar() tells ggplot2 to create a bar plot.
  • theme_light() applies a lighter theme to the plot.
  • labs(title = "Count of Species") adds a title to the plot.
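
To make it clearer what geom_bar() is counting, here is a minimal sketch that tabulates the rows per species by hand and plots those counts with geom_col(), which draws bars from values you supply yourself. The names species_counts and col_plot are made up for this illustration, and the result should look the same as the bar plot above.

```r
library(ggplot2)

# Count the rows for each species by hand.
# geom_bar() performs an equivalent count for you.
species_counts <- as.data.frame(
  table(Species = iris$Species),
  responseName = "n"
)
print(species_counts)  # each of the three species has 50 rows

# geom_col() uses the y values as given instead of counting rows
col_plot <- ggplot(species_counts, aes(x = Species, y = n)) +
  geom_col() +
  theme_light() +
  labs(title = "Count of Species", y = "count")

print(col_plot)
```

As a rule of thumb, geom_bar() fits raw data with one row per observation, while geom_col() fits data that has already been summarized.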

Histograms

Histograms are used to visualize the distribution of a continuous variable. They divide the data into bins and display the count of data points in each bin. We'll use the geom_histogram function to create a histogram of Sepal Length, filled by species and adjusted with specific bin widths.

By the end of this lesson, you'll be able to generate a basic histogram like the one shown below:

Here's the code to create a histogram:

R
# Load ggplot2 and the built-in dataset
library(ggplot2)
data(iris)

# Histogram of Sepal Length
hist_plot <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.5, position = "dodge") +
  theme_light() +
  labs(title = "Distribution of Sepal Length")

# Display the plot
print(hist_plot)

Let's break down this code:

  • data(iris) loads the built-in iris dataset.
  • ggplot(iris, aes(x = Sepal.Length, fill = Species)) initializes the plot with the iris dataset and sets the x-aesthetic to Sepal.Length and the fill to Species. The fill aesthetic is used to color the bars based on the Species variable, making it easier to distinguish between different species within the distribution.
  • geom_histogram(binwidth = 0.5, position = "dodge") creates a histogram with bins of width 0.5 and places the bars for each species side by side within each bin so they don't overlap.
  • theme_light() applies a lighter theme to the plot.
  • labs(title = "Distribution of Sepal Length") adds a title to the plot.
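
The binning step can also be sketched by hand in base R. The example below cuts Sepal.Length into bins of width 0.5 and counts the observations per bin and species. The bin edges (4 to 8) and the names breaks, bins, and bin_counts are assumptions chosen for this illustration; ggplot2 may align its bins differently, so the per-bin counts need not match the plot exactly.

```r
# Bin edges from 4 to 8 in steps of 0.5 (an assumed alignment for illustration)
breaks <- seq(4, 8, by = 0.5)

# Assign every observation to a bin; right = FALSE makes intervals like [4, 4.5)
bins <- cut(iris$Sepal.Length, breaks = breaks, right = FALSE)

# Count observations per bin and species.
# A dodged histogram shows counts like these as side-by-side bars.
bin_counts <- table(bin = bins, Species = iris$Species)
print(bin_counts)
```

Each row of the table corresponds to one bin and each column to one species, which is the same breakdown the dodged bars display.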

Why It Matters

Bar plots and histograms are essential for data analysis:

  • Bar plots help you compare counts across different categories, making it easy to spot patterns in categorical data. For example, a bar plot of the iris dataset shows at a glance that the three species are equally represented, with 50 observations each.
  • Histograms are well suited to understanding the distribution of numerical data. They let you see the shape of the data, gauge its spread, and spot outliers or unusual values. For example, a histogram of Sepal Length can show whether the data looks roughly symmetric or skewed in some way.
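
As a small companion to reading shape off a histogram, the sketch below draws the overall Sepal Length histogram with reference lines for the mean and the median. The names sepal_mean, sepal_median, and shape_plot are made up for this illustration, and a gap between mean and median is only a rough hint of skew, not a formal test.

```r
library(ggplot2)

# Summary statistics for the variable shown in the histogram
sepal_mean <- mean(iris$Sepal.Length)
sepal_median <- median(iris$Sepal.Length)

# Overall histogram (no species split) with two reference lines:
# solid line = mean, dashed line = median
shape_plot <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.5) +
  geom_vline(xintercept = sepal_mean) +
  geom_vline(xintercept = sepal_median, linetype = "dashed") +
  theme_light() +
  labs(title = "Sepal Length with Mean and Median")

print(shape_plot)
```

When the two lines sit close together, the distribution is likely close to symmetric; a mean pulled clearly to one side of the median suggests a longer tail on that side.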

By mastering these types of plots, you’ll be able to present your data in ways that are both informative and visually appealing. This is crucial whether you're doing academic research, business analysis, or personal projects. Clear and compelling visualizations make your insights more impactful and easier to understand.

Exciting, right? Let’s start the practice section and bring these concepts to life.
