Hello again! In the last lesson, we learned how to split your dataset into training and testing sets using the `caret` package in R. Now, we are ready to step into the next phase of our machine learning journey: training a simple model.
In this lesson, you'll discover how to train a Linear Support Vector Machine (SVM) model using the `caret` package. Specifically, you will learn to:
- Understand the purpose and basic concept of a Linear SVM.
- Train a Linear SVM model on your training dataset.
- Display and interpret some basic model details.
This process is straightforward and builds nicely on what you’ve learned so far.
Training your first machine learning model is a significant milestone. The Linear SVM is a powerful and commonly used model in machine learning for classification tasks. By mastering the basics of training this model, you'll gain essential skills that will serve as a foundation for more advanced machine learning techniques and algorithms. This is where your data preprocessing and dataset splitting efforts come together to create a predictive model.
A Linear Support Vector Machine (SVM) is a type of algorithm used primarily for classification tasks. The basic concept is to find a hyperplane that best separates the classes in the feature space. In a two-dimensional space, this hyperplane is a line; in three dimensions it is a plane, and in higher dimensions it is called a hyperplane. The objective is to maximize the margin, the distance between the hyperplane and the closest points of each class, which helps in achieving better generalization on unseen data. Linear SVMs work best when the data is linearly separable, or nearly so, meaning that a straight line (or hyperplane) can separate the classes with few mistakes. When perfect separation is not possible, the cost parameter C controls how heavily misclassified training points are penalized.
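To make the hyperplane idea concrete, here is a minimal base R sketch. The weight vector `w`, offset `b`, and the four points are made-up illustrative values, not anything fitted by an SVM. It only shows how the sign of `w . x + b` assigns a class and how the margin width follows from `w`.

```r
# Hypothetical 2-D example: a hand-picked hyperplane w . x + b = 0.
# w, b, and the points below are illustrative, not fitted by any SVM.
w <- c(0.5, -0.5)  # normal vector of the separating line
b <- 0             # offset

# Four made-up points, two per class
X <- rbind(c(3, 1), c(4, 2), c(1, 3), c(2, 4))

# The decision value is w . x + b; its sign gives the predicted class
decision <- as.vector(X %*% w + b)
predicted <- ifelse(decision >= 0, "A", "B")
print(predicted)

# Here the closest points have decision values of +1 and -1,
# so the margin width is 2 / ||w||
margin <- 2 / sqrt(sum(w^2))
print(margin)
```

An SVM searches for the `w` and `b` that make this margin as wide as possible while still separating the classes.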
Before we train the Linear SVM model, let's quickly prepare our dataset by loading it and splitting it into training and testing sets. This process was covered in the previous lesson.
```r
# Load the caret package (provides createDataPartition and train)
library(caret)

# Load iris dataset
data(iris)

# For reproducibility
set.seed(123)

# Splitting data into train and test sets
trainIndex <- createDataPartition(iris$Species, p = 0.7, list = FALSE, times = 1)
irisTrain <- iris[trainIndex, ]
irisTest <- iris[-trainIndex, ]
```
- `data(iris)` loads the iris dataset.
- `set.seed(123)` ensures reproducibility.
- `createDataPartition(iris$Species, p = 0.7, list = FALSE, times = 1)` splits the dataset, with 70% allocated for training.
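A quick sanity check on the split can be sketched as follows, assuming the `caret` package is installed. It repeats the split so that it runs on its own, then counts rows and classes. With 150 rows and `p = 0.7`, the training set should hold 105 rows and the test set 45.

```r
library(caret)

data(iris)
set.seed(123)
trainIndex <- createDataPartition(iris$Species, p = 0.7, list = FALSE, times = 1)
irisTrain <- iris[trainIndex, ]
irisTest <- iris[-trainIndex, ]

# Row counts of each subset
print(nrow(irisTrain))
print(nrow(irisTest))

# Class counts in the training set; createDataPartition samples
# within each level of Species, so the classes stay balanced
print(table(irisTrain$Species))
```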
Now, let's train the Linear SVM model using the `train` function from the `caret` package. This function allows us to specify the model formula (the relationship between the target variable and the predictors), the dataset to use, and the method for training the model. In this case, we use `svmLinear` to train a linear SVM. `caret` fits this method through the `kernlab` package, so `kernlab` needs to be installed as well.
```r
# Training a Linear SVM model
model <- train(Species ~ ., data = irisTrain, method = "svmLinear")
```
- `Species ~ .` specifies that `Species` is the target variable we aim to predict, and `.` means using all other variables in the dataset as predictors.
- `data = irisTrain` indicates that we are using the training subset of the iris dataset.
- `method = "svmLinear"` tells `caret` to use a Linear SVM for training.
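To see what the `.` in the formula stands for, the short base R sketch below expands it against the iris columns. It uses `terms()` from the `stats` package, which is only one way to inspect a formula and is not something `train` requires you to call.

```r
data(iris)

# Expand the "." in the formula against the columns of iris
expanded <- terms(Species ~ ., data = iris)

# The term labels are the predictors the formula refers to
predictors <- attr(expanded, "term.labels")
print(predictors)
```

In other words, `Species ~ .` is shorthand for listing the four measurement columns on the right-hand side of the formula.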
Once the model is trained, it’s essential to understand and interpret the details of the trained model.
```r
# Display the model details
print(model)
```
The `print` function displays a summary of the model: the size of the training data, the resampling scheme, the estimated accuracy and Kappa, and the tuning parameters used. This gives you a first indication of how well the model is likely to perform. Here’s a rough idea of what you might see:
```
Support Vector Machines with Linear Kernel

105 samples
  4 predictor
  3 classes: 'setosa', 'versicolor', 'virginica'

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 105, 105, 105, 105, 105, 105, ...
Resampling results:

  Accuracy  Kappa
  0.962487  0.9428477

Tuning parameter 'C' was held constant at a value of 1
```
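- Accuracy: The proportion of correct predictions, averaged over the 25 bootstrap resamples. In each resample, the model is fit on rows drawn with replacement from the training data and scored on the rows that were left out.
- Kappa: Cohen's Kappa, a statistic that compares the observed accuracy with the accuracy expected by chance. It indicates how far the model’s agreement with the true labels goes beyond what random guessing would achieve.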
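Kappa can be worked out by hand from a confusion matrix. The base R sketch below uses a made-up 3x3 matrix (not output from the model above) purely to show the arithmetic: observed agreement, chance agreement, and the resulting Kappa.

```r
# Made-up confusion matrix (rows: predicted, columns: actual).
# These counts are illustrative and not real model output.
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,
                0, 9, 2,
                0, 1, 8),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = classes, actual = classes))

n <- sum(cm)

# Observed agreement is the plain accuracy
observed <- sum(diag(cm)) / n

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Kappa rescales accuracy so that 0 means chance-level and 1 means perfect
kappa_value <- (observed - expected) / (1 - expected)

print(observed)
print(kappa_value)
```

With these counts the accuracy is 0.9 and Kappa is 0.85, because a third of the agreement would be expected by chance alone.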
Note that these scores are resampling estimates computed from the training set alone; the test set has not been used yet.
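The same information that `print(model)` shows can also be read from the fitted object. The sketch below assumes `caret` and `kernlab` are installed and retrains the model so that it runs on its own. The exact accuracy values may differ from the sample output above.

```r
library(caret)

data(iris)
set.seed(123)
trainIndex <- createDataPartition(iris$Species, p = 0.7, list = FALSE, times = 1)
irisTrain <- iris[trainIndex, ]

model <- train(Species ~ ., data = irisTrain, method = "svmLinear")

# Resampling scheme caret applied by default
print(model$control$method)
print(model$control$number)

# Resampled performance as a data frame (one row per tuning value)
print(model$results)

# Tuning value used for the final model
print(model$bestTune)
```

Reading `model$results` directly is handy when you want to store or compare the metrics instead of just printing them.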
Ready to see your efforts come to life? Let's start the practice section and begin training our first model together.