Mastering Model Evaluation: Performance Metrics & Selection in Machine Learning

Lesson Overview

In this final lesson on Model Evaluation Post-Optimization, we tune a logistic regression model, evaluate two tree-based ensemble techniques (Random Forest and Gradient Boosting), and compare the models post-optimization to make an informed selection that yields reliable predictions.

Model Evaluation and Selection: A Practical Approach

Post-optimization model evaluation goes beyond comparing accuracy. It also examines precision, recall, and f1-score, so that the model we select generalizes well to unseen data rather than merely scoring well on a single number.
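To make these metrics concrete, here is a minimal sketch that computes them by hand from true and predicted labels. The labels are toy values invented for illustration and are not taken from the lesson's dataset.

```python
# Toy labels, invented for illustration only (1 = positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)      # share of all predictions that are correct
precision = tp / (tp + fp)             # of the predicted positives, how many are right
recall = tp / (tp + fn)                # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
print("f1:", f1)
```

Accuracy counts every correct prediction equally, while precision and recall each focus on one kind of error, which is why they can disagree with accuracy on imbalanced data.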

Logistic Regression Optimization

We begin by fine-tuning our Logistic Regression model:

Python
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# X_train, X_test, y_train, y_test are assumed to come from an earlier train/test split

# Hyperparameter tuning with GridSearchCV (5-fold cross-validation over C)
lr_params = {'C': [0.001, 0.01, 0.1, 1, 10, 100], 'penalty': ['l2']}
lr = LogisticRegression(random_state=42, max_iter=1000)
clf_lr = GridSearchCV(lr, lr_params, cv=5)
clf_lr.fit(X_train, y_train)

# Evaluating optimized Logistic Regression on the held-out test set
lr_pred = clf_lr.predict(X_test)
print("Logistic Regression Classification Report:\n", classification_report(y_test, lr_pred))

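The best parameters found are {'C': 1, 'penalty': 'l2'}. These match scikit-learn's default regularization settings for LogisticRegression, so the search confirms that the default strength suits this data rather than uncovering a different setting. Evaluating the optimized Logistic Regression on the test set yields: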

Plain text
Logistic Regression Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.98      0.98        63
           1       0.99      0.98      0.99       108

    accuracy                           0.98       171
   macro avg       0.98      0.98      0.98       171
weighted avg       0.98      0.98      0.98       171
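The best parameters mentioned above can be read directly from a fitted GridSearchCV object. The following is a minimal sketch under stated assumptions: it uses synthetic data from make_classification as a stand-in for the lesson's dataset, and the name `search` is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption; not the lesson's dataset).
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

param_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X_train, y_train)

# The winning combination and its mean cross-validated accuracy.
print("Best parameters:", search.best_params_)
print("Best mean CV accuracy:", round(search.best_score_, 3))

# Mean cross-validated accuracy for every candidate value of C.
results = search.cv_results_
for params, mean_score in zip(results["params"], results["mean_test_score"]):
    print(params, round(mean_score, 3))

# Accuracy of the refitted best estimator on the held-out test set.
print("Held-out test accuracy:", round(search.score(X_test, y_test), 3))
```

Looking at the full table of scores, not just the winner, shows how sensitive the model is to C: if several values score almost identically, the exact choice matters little.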

Random Forest & Gradient Boosting Performance

Next, we evaluate Random Forest and Gradient Boosting:

Python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Random Forest with default hyperparameters
rf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
rf_pred = rf.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, rf_pred))

# Gradient Boosting with default hyperparameters
gb = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)
gb_pred = gb.predict(X_test)
print("Gradient Boosting Accuracy:", accuracy_score(y_test, gb_pred))

Plain text
Random Forest Accuracy: 0.9707602339181286
Gradient Boosting Accuracy: 0.9590643274853801

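On this test set, the tuned Logistic Regression (about 98.2% accuracy) edges out Random Forest (about 97.1%) and Gradient Boosting (about 95.9%), while also showing balanced precision and recall across both classes. Note that the two ensembles were trained with default hyperparameters and scored on accuracy only, so the comparison holds for this scenario rather than in general.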
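One way to put all three models on the same footing is to score each of them with the same set of metrics. The sketch below is an illustration under stated assumptions: it uses synthetic data, 5-fold cross-validation instead of the lesson's single test split, and smaller ensembles (n_estimators=50) to keep it quick. The names `models` and `summary` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (an assumption; not the lesson's dataset).
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=42),
}
scoring = ["accuracy", "precision", "recall", "f1"]

summary = {}
for name, model in models.items():
    # cross_validate returns one array of fold scores per metric, keyed "test_<metric>".
    cv = cross_validate(model, X, y, cv=5, scoring=scoring)
    summary[name] = {metric: cv[f"test_{metric}"].mean() for metric in scoring}

for name, scores in summary.items():
    formatted = ", ".join(f"{metric}={value:.3f}" for metric, value in scores.items())
    print(f"{name}: {formatted}")
```

Averaging over folds makes the ranking less dependent on one particular split, which matters when the models differ by only a few test samples.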

Final Model Selection and Evaluation

After optimizing and evaluating our models, the crucial step is to select the best-performing one. This decision rests not on accuracy alone but on the full set of performance metrics.

Given the results above, the optimized Logistic Regression emerges as the preferred choice. Adopting it as our final model, we run one last evaluation:

Python
from sklearn.metrics import accuracy_score, classification_report

# Final Model Selection and Evaluation
# Based on the accuracy and classification report, choose the best model
final_model = clf_lr.best_estimator_  # best estimator, refit by GridSearchCV on the full training set
final_predictions = final_model.predict(X_test)
print("Final Model Accuracy:", accuracy_score(y_test, final_predictions))
print("Final Model Classification Report:\n", classification_report(y_test, final_predictions))

Plain text
Final Model Accuracy: 0.9824561403508771
Final Model Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.98      0.98        63
           1       0.99      0.98      0.99       108

    accuracy                           0.98       171
   macro avg       0.98      0.98      0.98       171
weighted avg       0.98      0.98      0.98       171

With precision, recall, and f1-score of at least 0.97 on both classes and the highest accuracy of the three models (about 0.982), our final model choice is justified.
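A confusion matrix shows the raw counts behind a classification report, which helps when checking what kind of mistakes a chosen model makes. This is a minimal sketch, assuming synthetic data and an untuned LogisticRegression as a stand-in for the lesson's final model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data and model (assumptions; not the lesson's dataset or tuned model).
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes: [[tn, fp], [fn, tp]].
cm = confusion_matrix(y_test, pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print("Correct predictions:", tn + tp)
print("False positives (true 0 predicted as 1):", fp)
print("False negatives (true 1 predicted as 0):", fn)
```

Two models with the same accuracy can split their errors very differently between false positives and false negatives, so the counts are worth a look before committing to a final model.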

Key Takeaways

Selecting a model after optimization requires balancing several performance metrics rather than relying on one. Our work with Logistic Regression shows that a carefully tuned and evaluated simple model can match or outperform more complex ensembles. Remember, the essence of machine learning lies in experimenting, evaluating, and selecting the model best suited to the dataset at hand.

Wow, another course almost done! You are getting so good at this! Let's do some more practice and call it complete.
