Lesson 4

Welcome! Today, we're going to explore the **stats package** in R, a base package built for statistical computation. One of its major advantages is that it handles problems requiring many calculations, which matters in engineering, data science, and any other field that relies heavily on data analysis. In this lesson, you'll get familiar with several features of the `stats` package that will serve as additional tools in your data analytics toolbox.

In statistics, distribution functions play a vital role because they tell us the probability of each possible outcome of a random event. For example, in a dice game, the distribution function tells us the chance of rolling a six. Because we need some data to explore the `stats` package, let's generate a meaningful data sample using the `rnorm()` function:

```r
# Simulating temperature data for a year in a city
temp_data <- rnorm(n = 365, mean = 30, sd = 10)
```

In this example, we generate a vector of `365` values drawn from a normal distribution with a mean of `30` and a standard deviation of `10`.
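To connect the dice example and the simulated temperatures to the `stats` package itself, here is a small sketch using two of its built-in distribution functions, `pnorm()` and `dbinom()`. The seed, the 40-degree threshold, and the variable names are illustrative assumptions rather than part of the lesson's dataset.

```r
# Illustrative sketch: seed, threshold, and names are assumptions
set.seed(42)
temp_data <- rnorm(n = 365, mean = 30, sd = 10)

# Theoretical probability that a day exceeds 40 under Normal(mean = 30, sd = 10)
p_hot_theory <- pnorm(40, mean = 30, sd = 10, lower.tail = FALSE)

# Share of simulated days that actually exceed 40
p_hot_sample <- mean(temp_data > 40)

# Probability of rolling a six on a single fair die
p_six <- dbinom(1, size = 1, prob = 1 / 6)

# Probability of exactly two sixes in ten rolls
p_two_sixes <- dbinom(2, size = 10, prob = 1 / 6)

print(round(c(theory = p_hot_theory, sample = p_hot_sample), 3))
print(round(c(one_six = p_six, two_sixes_in_ten = p_two_sixes), 3))
```

The theoretical and sample values for hot days should be close but not identical, since 365 draws are only one random sample from the distribution.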

The `stats` package in R offers numerous statistical functions. However, for skewness and kurtosis, we'll need to use the `e1071` package. **Skewness** measures the asymmetry of a probability distribution around its mean, while **kurtosis** gauges how prone a distribution is to outliers. For example, these metrics could help us understand unusual variations in a city's annual temperature data.

```r
# Load the e1071 package (run install.packages("e1071") first if needed)
library(e1071)

data <- rnorm(n = 1000)

# Compute skewness - a measure of data asymmetry
data_skewness <- skewness(data)

# Compute kurtosis - a measure of data "tailedness" or outliers
data_kurtosis <- kurtosis(data)

print(paste("Skewness:", data_skewness))
print(paste("Kurtosis:", data_kurtosis))
```
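To make these two measures less of a black box, the sketch below computes moment-based skewness and excess kurtosis in base R. The helper names are hypothetical. To the best of my knowledge these formulas match what `e1071` calls `type = 1`; the package's default is `type = 3`, which applies a sample-size adjustment, so its output will differ slightly on small samples.

```r
# Hypothetical helpers, base R only: moment-based definitions
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5      # third central moment, standardized
}

moment_excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3    # fourth central moment, minus 3 so a normal scores 0
}

x <- c(1, 2, 3, 4, 5)
print(moment_skewness(x))         # symmetric data, so skewness is 0
print(moment_excess_kurtosis(x))  # flat, light-tailed data, so the value is negative (-1.3)
```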

Please take a look at the picture below, which illustrates asymmetry in statistical distributions. Negative skewness (blue curve) means the left tail is longer or fatter than the right: most values sit toward the high end, with a tail of unusually low values. Conversely, positive skewness (red curve) means the right tail is longer or fatter, with a tail of unusually high values. Skewness tells us the direction in which our data is stretched.
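One quick way to see the direction of skew without any extra package is to compare the mean with the median, since the long tail pulls the mean toward it. The sketch below uses exponential samples from `rexp()` as an assumed stand-in for skewed data; the seed and sample size are arbitrary choices.

```r
# Illustrative sketch: exponential data is an assumed example of skewed data
set.seed(1)
right_skewed <- rexp(10000, rate = 1)  # long right tail (positive skew)
left_skewed <- -right_skewed           # mirror image: long left tail (negative skew)

# The mean is dragged toward the long tail, the median much less so
print(c(mean = mean(right_skewed), median = median(right_skewed)))
print(c(mean = mean(left_skewed), median = median(left_skewed)))
```

For the right-skewed sample the mean lands above the median, and for the mirrored sample it lands below.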

The subsequent image shows the shape of a distribution's tails and peak. The blue curve represents a normal distribution, which has an excess kurtosis of `0` and serves as the baseline: extreme values are rare. The red curve, a Laplace distribution with higher kurtosis, has a sharper, 'pointy' peak and heavier tails, indicating that extreme values are more likely. High kurtosis signals that rare, extreme events, such as a black swan event in finance, occur more often than a normal distribution would suggest.
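The "heavier tails" idea can be checked by counting how often values fall more than three standard deviations from the mean. The sketch below is a rough illustration: base R has no built-in Laplace generator, so it relies on the fact that the difference of two independent Exponential(1) draws follows a Laplace distribution. The helper name, seed, and sample size are assumptions.

```r
# Illustrative sketch: helper name, seed, and sample size are assumptions
set.seed(7)
n <- 100000

normal_sample <- rnorm(n)
laplace_sample <- rexp(n) - rexp(n)  # difference of two Exp(1) draws is Laplace(0, 1)

# Share of observations more than 3 standard deviations from the mean
share_beyond_3sd <- function(x) mean(abs(x - mean(x)) > 3 * sd(x))

print(share_beyond_3sd(normal_sample))
print(share_beyond_3sd(laplace_sample))
```

The Laplace sample should show a noticeably larger share of extreme values, which is exactly what its higher kurtosis summarizes.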

Well done today! We became familiar with the `stats` package in R and its application in statistical computations. We learned how to generate normally distributed random numbers in R and how to calculate skewness and kurtosis using the `e1071` package. I encourage you to continue practicing to build your confidence and to further explore the possibilities in the data world with `R`. Remember, your data analysis journey is just beginning! Happy analyzing!