Understanding Skewness and Kurtosis in R

Lesson 4

Introduction and Topic Overview

Welcome! Today, we're going to explore the stats package, which ships with every R installation and provides core statistical tools such as random number generators and distribution functions. A major advantage of a package like stats is that it handles computations that would otherwise take many manual steps, which matters in engineering, data science, and any field that relies heavily on data analysis. In this lesson, you'll use functions from stats together with the e1071 package to measure the shape of a distribution through skewness and kurtosis, adding new tools to your data analytics toolbox.

Generating Normally Distributed Random Numbers in R

In statistics, distribution functions play a vital role: they tell us the probability of each possible outcome of a random event. For example, in a dice game with a fair die, the distribution tells us that the chance of rolling a six is 1/6. Because we need some data to explore, let's generate a sample with the rnorm() function from the stats package, which draws random values from a normal distribution:

```r
# Simulating temperature data for a year in a city
set.seed(42)  # fix the random seed so the draws are reproducible
temp_data <- rnorm(n = 365, mean = 30, sd = 10)
```

In this example, we generate a vector of 365 values, which are normally distributed with a mean of 30 and a standard deviation of 10.
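Since rnorm() draws from a normal distribution, the matching distribution function pnorm() from the same stats package can tell us how often a value should exceed a threshold. The sketch below compares that theoretical probability with the fraction of simulated days above the threshold. The cutoff of 40 (one standard deviation above the mean) and the seed value 42 are illustrative choices, not part of the original lesson.

```r
# Re-create the simulated temperature data (seed chosen arbitrarily)
set.seed(42)
temp_data <- rnorm(n = 365, mean = 30, sd = 10)

# Illustrative threshold: one standard deviation above the mean
threshold <- 40

# Theoretical probability that a day exceeds the threshold:
# P(X > 40) for X ~ Normal(mean = 30, sd = 10)
p_theoretical <- pnorm(threshold, mean = 30, sd = 10, lower.tail = FALSE)

# Empirical fraction of simulated days above the threshold
p_empirical <- mean(temp_data > threshold)

cat("Theoretical P(temp > 40):", round(p_theoretical, 4), "\n")
cat("Simulated fraction above 40:", round(p_empirical, 4), "\n")
```

With only 365 draws, the simulated fraction will wander a little around the theoretical value of about 0.159; increasing n brings the two closer together.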

Using Descriptive Statistics Functions in R

The stats package in R offers many statistical functions, but it has no built-in functions for skewness and kurtosis, so we'll use the e1071 package, which is installed separately with install.packages("e1071"). Skewness measures the asymmetry of a distribution around its mean, while kurtosis measures the weight of its tails, that is, how prone the distribution is to producing outliers. For example, these metrics could help us spot unusual variations in a city's annual temperature data.

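```r
# install.packages("e1071")  # run once if the package is not installed yet
library(e1071)

# 1,000 draws from a standard normal distribution (mean 0, sd 1)
sample_data <- rnorm(n = 1000)

# Compute skewness - a measure of data asymmetry
data_skewness <- skewness(sample_data)

# Compute kurtosis - a measure of data "tailedness" or outliers
# (e1071 reports excess kurtosis, so normal data scores close to 0)
data_kurtosis <- kurtosis(sample_data)

print(paste("Skewness:", data_skewness))
print(paste("Kurtosis:", data_kurtosis))
```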
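To see what skewness() and kurtosis() actually compute, the formulas can be written out by hand. With sample mean xbar, sample standard deviation s, and central moments m_k = mean((x - xbar)^k), e1071's default estimator (type = 3) is skewness = m_3 / s^3 and excess kurtosis = m_4 / s^4 - 3. The helpers manual_skewness() and manual_kurtosis() below are hypothetical names for this sketch, and the comparison assumes e1071's default type = 3.

```r
library(e1071)

# Hypothetical helpers reproducing e1071's default (type = 3) estimators
manual_skewness <- function(x) {
  m3 <- mean((x - mean(x))^3)  # third central moment
  m3 / sd(x)^3                 # sd() uses the n - 1 denominator
}

manual_kurtosis <- function(x) {
  m4 <- mean((x - mean(x))^4)  # fourth central moment
  m4 / sd(x)^4 - 3             # subtract 3 so normal data scores about 0
}

set.seed(1)
sample_data <- rnorm(n = 1000)

skew_manual <- manual_skewness(sample_data)
kurt_manual <- manual_kurtosis(sample_data)

# Compare the hand-written formulas with e1071's results
print(isTRUE(all.equal(skew_manual, skewness(sample_data))))
print(isTRUE(all.equal(kurt_manual, kurtosis(sample_data))))
```

Both comparisons print TRUE, which confirms that the default estimator divides by the sample standard deviation rather than the population one. Other type values in e1071 apply different small-sample corrections.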

Interpretation of Skewness

Take a look at the picture below. This graph illustrates asymmetry in statistical distributions. A negative skewness (blue curve) indicates that the left tail is longer or fatter than the right one: a few unusually low values stretch the distribution toward the left. Conversely, a positive skewness (red curve) indicates that the right tail is longer or fatter, with a few unusually high values stretching the distribution toward the right. Skewness therefore tells us both whether the data is lopsided and in which direction its tail extends.
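The sign convention can be checked on simulated data. The sketch below assumes the exponential distribution (rexp()) as a convenient right-skewed example and negates it to obtain a left-skewed one; neither choice comes from the original lesson. It also compares the mean with the median, since a long tail pulls the mean toward it.

```r
library(e1071)

set.seed(123)

# Right-skewed sample: exponential draws have a long tail toward high values
right_skewed <- rexp(n = 10000, rate = 1)

# Left-skewed sample: negating the values flips the tail toward low values
left_skewed <- -right_skewed

cat("Right-skewed: skewness =", round(skewness(right_skewed), 2),
    "| mean =", round(mean(right_skewed), 2),
    "| median =", round(median(right_skewed), 2), "\n")
cat("Left-skewed:  skewness =", round(skewness(left_skewed), 2),
    "| mean =", round(mean(left_skewed), 2),
    "| median =", round(median(left_skewed), 2), "\n")
```

The theoretical skewness of an exponential distribution is 2, so the first value should land roughly near 2 and the second roughly near -2, with the mean above the median in the right-skewed case and below it in the left-skewed case.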

Interpretation of Kurtosis

The next image shows how kurtosis relates to the shape of a distribution's peak and tails. The blue curve represents a normal distribution with an excess kurtosis of 0 (e1071's kurtosis() reports excess kurtosis, which subtracts 3 from the raw value). Its tails are moderate, so extreme values do occur but are rare. The red curve, a Laplace distribution with higher kurtosis, has a sharper, more 'pointy' peak and heavier tails, meaning extreme values occur more often. High kurtosis therefore warns that rare, extreme outcomes, such as a black swan event in finance, are more likely than a normal model would suggest.
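A similar experiment contrasts the two curves numerically. Base R has no Laplace sampler, so the sketch below assumes the standard construction of a Laplace draw as an exponential value with a random sign. Because e1071's kurtosis() reports excess kurtosis, the normal sample should score near 0 and the Laplace sample near its theoretical excess kurtosis of 3. The seed and sample size are arbitrary choices.

```r
library(e1071)

set.seed(2024)
n <- 100000

# Normal sample: excess kurtosis should be close to 0
normal_sample <- rnorm(n)

# Laplace(0, 1) sample: attach a random sign to exponential draws
random_sign <- sample(c(-1, 1), size = n, replace = TRUE)
laplace_sample <- random_sign * rexp(n, rate = 1)

normal_kurt <- kurtosis(normal_sample)
laplace_kurt <- kurtosis(laplace_sample)

cat("Excess kurtosis, normal: ", round(normal_kurt, 2), "\n")
cat("Excess kurtosis, Laplace:", round(laplace_kurt, 2), "\n")

# Raw (non-excess) kurtosis is the reported value plus 3
cat("Raw kurtosis, Laplace:   ", round(laplace_kurt + 3, 2), "\n")
```

Keeping the excess convention in mind avoids a common mix-up: a reported kurtosis of 3 from e1071 means heavy tails, whereas a raw kurtosis of 3 describes a normal distribution.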

Summary and Reflection on Learned Skills

Well done today! We used the stats package in R to generate normally distributed random numbers with rnorm(), and we calculated skewness and kurtosis with the e1071 package to describe a distribution's asymmetry and tail weight. I encourage you to keep practicing to build your confidence and to explore more of what R can do with data. Remember, your data analysis journey is just beginning! Happy analyzing!
