Lesson 2

Visualizing Data with Bar Plots and Histograms in R

Introduction and Overview

Welcome to this interactive lesson on bar plots and histograms in R! In this lesson, we will focus on constructing bar plots and histograms using ggplot2. Are you ready? Let's begin!

Building Bar Plots with `ggplot2`

A bar plot visually represents categorical data as rectangular bars, the lengths of which are proportional to their respective values. For instance, a bar plot is an ideal choice if we want to visualize a bookstore's sales data, where the categories are book names and the values are the sales numbers.

We can build a bar plot using the geom_bar() function from ggplot2. Observe the following example:

R
library(ggplot2)

books <- c('Book1', 'Book2', 'Book3', 'Book4', 'Book5') # Book names
sales <- c(123, 432, 567, 245, 312) # Corresponding number of copies sold
data <- data.frame(books, sales)

plot <- ggplot(data, aes(x=books, y=sales)) + geom_bar(stat="identity", color="black", fill="lightblue") +
  labs(title="Book Sales", x="Books", y="Number of Sold Copies")

Let's break down the arguments for the geom_bar function:

  • stat: Specifies the statistical transformation this layer applies to the data. Here, we use stat="identity", which sets the bar heights to the values already present in the data (the sales column). By default, geom_bar() uses stat="count", which counts the number of rows at each x position and uses that count as the bar height.
  • color: Defines the color of the bar's edges. Here, the edges are colored black.
  • fill: Sets the fill color of the bars. In this case, the bars are filled with lightblue.
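To make the difference between the two stat settings concrete, here is a minimal sketch. The purchases and totals data frames are hypothetical and not part of the lesson's bookstore example; layer_data() is used only to inspect the bar heights that ggplot2 computes.

```r
library(ggplot2)

# Hypothetical raw data: one row per copy sold, not aggregated yet
purchases <- data.frame(
  book = c("Book1", "Book2", "Book2", "Book3", "Book3", "Book3")
)

# Default stat = "count": ggplot2 tallies the rows for each book
count_plot <- ggplot(purchases, aes(x = book)) +
  geom_bar(color = "black", fill = "lightblue")

# Hypothetical pre-aggregated data: heights come straight from 'sold'
totals <- data.frame(book = c("Book1", "Book2", "Book3"), sold = c(1, 2, 3))
identity_plot <- ggplot(totals, aes(x = book, y = sold)) +
  geom_bar(stat = "identity", color = "black", fill = "lightblue")

# Bar heights computed for each layer
count_heights <- layer_data(count_plot)$count
identity_heights <- layer_data(identity_plot)$y
```

Both plots should show the same three bars: the first counts rows for us, while the second trusts the totals we supply. For pre-aggregated data, ggplot2 also provides geom_col(), which behaves like geom_bar(stat="identity").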

Building Histograms with `ggplot2`: Dataset

Now, let's move on to histograms! Unlike bar plots, histograms are designed for visualizing the distributions of continuous, numeric data. In a histogram, each bar represents the number of data points that fall within a specific range, called a bin. Let's generate some normally distributed sample data using the rnorm function in R.

R
set.seed(123) # for reproducible results
ages <- rnorm(n=150, mean=27, sd=12)

Building Histograms with `ggplot2`

We'll use this data to create a histogram that visualizes the age distribution.

R
library(ggplot2)

data <- data.frame(ages)

plot <- ggplot(data, aes(ages)) + geom_histogram(binwidth=10, color="black", fill="lightblue") +
  labs(title="Ages in City X", x="Ages", y="Number of People")

Let's break down the geom_histogram function's parameters:

  • binwidth: Specifies the width of the bins. In this example, each bin has a width of 10 years.
  • color: Defines the color of the bin edges. Here, the edges are colored black.
  • fill: Sets the fill color of the bins. In this case, the bins are filled with lightblue.
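The effect of binwidth is easiest to see by plotting the same data twice. The sketch below regenerates the ages with the same seed and compares two widths; the second width of 5 is an arbitrary choice for illustration, and layer_data() is used only to read back the per-bin counts.

```r
library(ggplot2)

set.seed(123)  # same seed as above, for reproducible results
data <- data.frame(ages = rnorm(n = 150, mean = 27, sd = 12))

# Same data, two bin widths: wider bins give a coarser picture
wide_bins <- ggplot(data, aes(ages)) +
  geom_histogram(binwidth = 10, color = "black", fill = "lightblue")
narrow_bins <- ggplot(data, aes(ages)) +
  geom_histogram(binwidth = 5, color = "black", fill = "lightblue")

# layer_data() returns the per-bin values ggplot2 computed for a layer
wide_counts <- layer_data(wide_bins)$count
narrow_counts <- layer_data(narrow_bins)$count

length(wide_counts)    # number of bins at binwidth = 10
length(narrow_counts)  # number of bins at binwidth = 5
sum(wide_counts)       # each of the 150 observations falls in one bin
```

Halving the bin width produces more, narrower bars over the same range, while the counts still add up to the total number of observations.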

Distinguishing Between Bar Plots and Histograms

While they may possess visual similarities, bar plots and histograms offer distinct views of data. Bar plots excel when displaying categorical data, whereas histograms provide insights into the distributions of numerical data.
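As a rough illustration of this rule of thumb, the sketch below picks the layer from the column's type. The plot_column helper and the survey data frame are hypothetical names invented for this example, and the choice of 10 bins is arbitrary.

```r
library(ggplot2)

# Hypothetical helper: choose the layer based on the column's type
plot_column <- function(df, column) {
  base <- ggplot(df, aes(x = .data[[column]]))
  if (is.numeric(df[[column]])) {
    # Numeric data: show the distribution with a histogram
    base + geom_histogram(bins = 10, color = "black", fill = "lightblue")
  } else {
    # Categorical data: count the rows per category with a bar plot
    base + geom_bar(color = "black", fill = "lightblue")
  }
}

# Hypothetical data with one categorical and one numeric column
survey <- data.frame(
  genre = c("Fiction", "Poetry", "Fiction", "History"),
  age = c(23, 35, 41, 29)
)

genre_plot <- plot_column(survey, "genre")  # bar plot
age_plot <- plot_column(survey, "age")      # histogram
```

Real data often needs more judgment than a type check, for example numeric codes that actually stand for categories, so treat this as a starting point rather than a rule.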

Lesson Summary

Great job navigating through the basics of visualizing data with bar plots and histograms! Now, prepare for some practical exercises designed to give you hands-on experience. Let's get to work and practice these newfound skills! Happy learning!
