Welcome to this interactive lesson on bar plots and histograms in R! In this lesson, we will embark on a beautiful journey through data visualization. We will focus on constructing bar plots and histograms using `ggplot2`. Are you ready? Let's begin!
A bar plot visually represents categorical data as rectangular bars, the lengths of which are proportional to their respective values. For instance, a bar plot is an ideal choice if we want to visualize a bookstore's sales data, where the categories are book names and the values are the sales numbers.
We can build a bar plot using the `geom_bar()` function from `ggplot2`. Observe the following example:
```r
library(ggplot2)

books <- c('Book1', 'Book2', 'Book3', 'Book4', 'Book5') # Book names
sales <- c(123, 432, 567, 245, 312) # Corresponding number of copies sold
data <- data.frame(books, sales)

plot <- ggplot(data, aes(x=books, y=sales)) + geom_bar(stat="identity", color="black", fill="lightblue") +
  labs(title="Book Sales", x="Books", y="Number of Sold Copies")
```
Let's break down the arguments for the `geom_bar` function:
- `stat`: Specifies the statistical transformation that this layer applies to the data. In the provided example, we use `stat="identity"`, which means that the heights of the bars are set to the values in the data. By default, `geom_bar()` uses `stat="count"`, which counts the number of cases at each x position and plots a bar with the corresponding height.
- `color`: Defines the color of the bars' edges. Here, the edges are colored black.
- `fill`: Sets the fill color of the bars. In this case, the bars are filled with lightblue.
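To see the default `stat="count"` behavior in action, here is a minimal sketch. It assumes a hypothetical `purchases` data frame (not part of the bookstore example) with one row per purchase instead of pre-computed totals. The names `purchases`, `purchase_counts`, and `count_plot` are made up for illustration.

```r
library(ggplot2)

# Hypothetical raw data: one row per purchase instead of pre-computed totals
purchases <- data.frame(
  book = c('Book1', 'Book2', 'Book2', 'Book3', 'Book3', 'Book3')
)

# The per-category tallies that stat="count" works out for each x position
purchase_counts <- table(purchases$book)
print(purchase_counts)

# Default stat="count": only x is mapped, and bar heights are the row counts
count_plot <- ggplot(purchases, aes(x=book)) +
  geom_bar(color="black", fill="lightblue") +
  labs(title="Purchases per Book", x="Books", y="Number of Purchases")
```

Calling `print(count_plot)` renders the chart. When the data already holds the bar heights, as in the book sales example, `geom_col()` is ggplot2's shorthand for `geom_bar(stat="identity")`.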
Now, let's move on to histograms! Unlike bar plots, histograms are designed for visualizing the distributions of continuous, numeric data. In a histogram, bars represent the frequency of data points that fall within specific ranges, or bins. Let's generate some normally distributed data using the `rnorm` function in R.
```r
set.seed(123) # for reproducible results
ages <- rnorm(n=150, mean=27, sd=12)
```
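As a quick check of what `set.seed` buys us, the sketch below draws the same sample twice and compares the results. The variable names `first_draw` and `second_draw` are illustrative only.

```r
set.seed(123)
first_draw <- rnorm(n=150, mean=27, sd=12)

set.seed(123) # Resetting the seed restarts the random number sequence
second_draw <- rnorm(n=150, mean=27, sd=12)

# Both draws contain exactly the same 150 values
print(identical(first_draw, second_draw))

# A quick look at the simulated values before plotting them
print(length(first_draw))
print(summary(first_draw))
```

Because a normal distribution is unbounded, a simulated sample like this one may contain a few values below zero, which is worth keeping in mind when the numbers stand in for ages.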
We'll use this data to create a histogram that visualizes the age distribution.
```r
library(ggplot2)

data <- data.frame(ages)

plot <- ggplot(data, aes(ages)) + geom_histogram(binwidth=10, color="black", fill="lightblue") +
  labs(title="Ages in City X", x="Ages", y="Number of People")
```
Let's break down the `geom_histogram` function's parameters:
- `binwidth`: Specifies the width of the bins. In this example, each bin has a width of 10 years.
- `color`: Defines the color of the bin edges. Here, the edges are colored black.
- `fill`: Sets the fill color of the bins. In this case, the bins are filled with lightblue.
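The choice of bins shapes how a histogram reads, so it can help to try more than one setting. The following sketch regenerates the same ages and builds two variants: one with a narrower `binwidth`, and one using `bins`, the `geom_histogram` argument that fixes the number of bins instead of their width. The names `narrow_plot`, `bins_plot`, and `narrow_counts` are illustrative, and the values 5 and 12 are arbitrary choices.

```r
library(ggplot2)

set.seed(123) # Same seed as before, so the ages match the earlier example
ages <- rnorm(n=150, mean=27, sd=12)
data <- data.frame(ages)

# Narrower bins (5 years wide) reveal finer detail in the distribution
narrow_plot <- ggplot(data, aes(ages)) +
  geom_histogram(binwidth=5, color="black", fill="lightblue") +
  labs(title="Ages in City X", x="Ages", y="Number of People")

# Alternatively, fix the number of bins and let ggplot2 derive their width
bins_plot <- ggplot(data, aes(ages)) +
  geom_histogram(bins=12, color="black", fill="lightblue") +
  labs(title="Ages in City X", x="Ages", y="Number of People")

# layer_data() returns the values computed for a layer, including bin counts
narrow_counts <- layer_data(narrow_plot)$count

# Whatever the bin width, the counts add up to the number of observations
print(sum(narrow_counts) == nrow(data))
```

Narrow bins show more detail but can look noisy with only 150 observations, while wide bins smooth the shape at the cost of hiding local structure.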
While they may possess visual similarities, bar plots and histograms offer distinct views of data. Bar plots excel when displaying categorical data, whereas histograms provide insights into the distributions of numerical data.
Great job navigating through the basics of interpreting data using bar plots and histograms! Now, prepare for some practical exercises designed to give you hands-on experience. Let's get to work and practice these newfound skills! Remember, practice enhances understanding! Happy learning!