Welcome! Today, we're going to explore the stats package available in R, a powerful tool created for advanced statistical computations. One of the major advantages of using a tool like the stats package is its ability to handle complex problems that require multiple calculations, a key feature in areas such as engineering, data science, or any field that relies heavily on data analysis. In this lesson, you'll familiarize yourself with numerous features in the stats package, which will serve as additional tools in your data analytics toolbox.
In statistics, distribution functions play a vital role because they tell us the probability of each potential outcome of a random event. For example, in a dice game, the distribution function tells us the chance of rolling a six (1/6 for a fair six-sided die). Because we need some data to explore the stats package, let's generate a meaningful data sample using the rnorm() function:
```r
# Simulating temperature data for a year in a city
temp_data <- rnorm(n = 365, mean = 30, sd = 10)
```
In this example, we generate a vector of 365 values, which are normally distributed with a mean of 30 and a standard deviation of 10.
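To see how the simulated sample relates to the distribution it was drawn from, here is a minimal sketch that compares sample statistics with the requested parameters. The seed value and the 40-degree threshold are assumptions chosen for illustration; pnorm() and sd() come from the stats package.

```r
# Fix the seed so the simulated values are reproducible (the seed value is arbitrary)
set.seed(42)

# Same call as in the lesson: 365 draws from a normal distribution
temp_data <- rnorm(n = 365, mean = 30, sd = 10)

# Sample statistics should land near the parameters we asked for
sample_mean <- mean(temp_data)
sample_sd <- sd(temp_data)

# Theoretical probability that a day exceeds 40 degrees, from the normal CDF
p_theory <- pnorm(40, mean = 30, sd = 10, lower.tail = FALSE)

# Empirical share of simulated days above 40 degrees
p_sample <- mean(temp_data > 40)

print(round(c(mean = sample_mean, sd = sample_sd), 2))
print(round(c(theory = p_theory, sample = p_sample), 3))
```

The theoretical and empirical probabilities will not match exactly, because 365 draws are only a finite sample of the underlying distribution.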
The stats package in R offers numerous statistical functions. However, it does not include functions for skewness and kurtosis, so for those we'll use the e1071 package. Skewness measures the asymmetry of a probability distribution around its mean, while kurtosis gauges how heavy the tails of a distribution are, that is, how prone it is to outliers. For example, these metrics could help us understand unusual variations in a city's annual temperature data.
```r
# Load the e1071 package
library(e1071)

data <- rnorm(n = 1000)

# Compute skewness - a measure of data asymmetry
data_skewness <- skewness(data)

# Compute kurtosis - a measure of data "tailedness" or outliers
data_kurtosis <- kurtosis(data)

print(paste("Skewness: ", data_skewness))
print(paste("Kurtosis: ", data_kurtosis))
```
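To make these two measures less of a black box, the sketch below computes them directly from central moments using only base R. The helper names moment_skewness and moment_excess_kurtosis are hypothetical and belong to neither stats nor e1071. The e1071 functions accept a type argument that selects among closely related estimators, so their results may differ slightly from these on small samples.

```r
# Hypothetical helpers (not part of stats or e1071): moment-based definitions

moment_skewness <- function(x) {
  centered <- x - mean(x)
  m2 <- mean(centered^2)  # second central moment
  m3 <- mean(centered^3)  # third central moment
  m3 / m2^1.5             # standardize so the result is unit-free
}

moment_excess_kurtosis <- function(x) {
  centered <- x - mean(x)
  m2 <- mean(centered^2)
  m4 <- mean(centered^4)  # fourth central moment
  m4 / m2^2 - 3           # subtract 3 so a normal distribution scores about 0
}

# A tiny symmetric sample: skewness is 0 by symmetry
x <- c(1, 2, 3, 4, 5)
print(moment_skewness(x))         # 0
print(moment_excess_kurtosis(x))  # -1.3 (flatter than a normal distribution)
```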
Please take a look at the picture below. This graph showcases the asymmetry in statistical distributions. A negative skewness (blue curve) indicates that the left tail is longer or fatter than the right, so the unusual values tend to be low ones while most of the data sits toward the higher end. Conversely, a positive skewness (red curve) indicates a distribution where the right tail is longer or fatter, so the unusual values tend to be high ones. Skewness helps us identify the shape and direction of the spread in our data.
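The sign convention can be checked numerically. The sketch below is an illustration under assumptions: it uses exponential draws from rexp() as a stand-in for right-skewed data, an arbitrary seed, and a hypothetical helper moment_skewness written in base R.

```r
# Arbitrary seed for reproducibility
set.seed(1)

# Hypothetical helper: moment-based skewness (standardized third central moment)
moment_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Exponential draws pile up near 0 with a long right tail: positive skew
right_skewed <- rexp(n = 5000, rate = 1)

# Mirroring the values flips the long tail to the left: negative skew
left_skewed <- -right_skewed

skew_right <- moment_skewness(right_skewed)
skew_left <- moment_skewness(left_skewed)

print(c(right = skew_right, left = skew_left))
```

Because the second sample is just the mirror image of the first, the two skewness values have the same magnitude and opposite signs.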
The subsequent image informs us about the shape of a distribution's tail and peak. The blue curve represents a normal distribution, which has an excess kurtosis of 0 (kurtosis minus 3, the convention that kurtosis() in e1071 reports), so extreme values are rare. The red curve, with a higher kurtosis (a Laplace distribution, whose excess kurtosis is 3), has a more pronounced or 'pointy' peak with heavier tails, indicating more extreme values in the data. High kurtosis signals that extraordinary events, like a black swan event in finance, are more likely than a normal distribution would suggest.
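As a rough numerical counterpart to the comparison above, the sketch below estimates excess kurtosis for a normal sample and a Laplace sample. It rests on a few assumptions: the Laplace draws are built as the difference of two independent exponential draws (base R has no Laplace generator), the seed and sample size are arbitrary, and moment_excess_kurtosis is a hypothetical helper.

```r
# Arbitrary seed for reproducibility
set.seed(7)

# Hypothetical helper: moment-based excess kurtosis (a normal sample scores about 0)
moment_excess_kurtosis <- function(x) {
  centered <- x - mean(x)
  mean(centered^4) / mean(centered^2)^2 - 3
}

n <- 50000

normal_sample <- rnorm(n)

# The difference of two independent Exp(1) draws follows a Laplace distribution
laplace_sample <- rexp(n) - rexp(n)

kurt_normal <- moment_excess_kurtosis(normal_sample)
kurt_laplace <- moment_excess_kurtosis(laplace_sample)

print(round(c(normal = kurt_normal, laplace = kurt_laplace), 2))
```

The normal estimate should sit close to 0 and the Laplace estimate close to 3, although heavy-tailed samples make kurtosis estimates noisy.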
Well done today! We became familiar with the stats package in R and its application in statistical computations. We learned how to generate normally distributed random numbers in R and how to calculate skewness and kurtosis using the e1071 package. I encourage you to continue practicing to build your confidence and to further explore the possibilities in the data world with R. Remember, your data analysis journey is just beginning! Happy analyzing!