In this final lesson on Model Evaluation Post-Optimization, we revisit logistic regression, decision trees, and ensemble techniques to enhance accuracy. We tie it all together by comparing models post-optimization and making an informed selection to ensure reliable predictions.
Post-optimization model evaluation goes beyond mere accuracy comparisons. It involves analyzing performance metrics such as precision, recall, and F1-score to choose a model that truly understands the data and generalizes well to unseen examples.
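To make these metrics concrete before we apply them, here is a minimal sketch of computing each one with scikit-learn; the labels below are toy values for illustration, not the lesson's dataset:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy labels purely to illustrate the metrics (not the lesson's dataset)
y_true = [0, 1, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 1, 0, 1, 1, 1]

print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP) = 4/5
print("Recall:", recall_score(y_true, y_pred))        # TP / (TP + FN) = 4/5
print("F1-score:", f1_score(y_true, y_pred))          # harmonic mean of the two
```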
We begin by fine-tuning our Logistic Regression model:
```python
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Hyperparameter tuning with GridSearchCV
lr_params = {'C': [0.001, 0.01, 0.1, 1, 10, 100], 'penalty': ['l2']}
lr = LogisticRegression(random_state=42, max_iter=1000)
clf_lr = GridSearchCV(lr, lr_params, cv=5)
clf_lr.fit(X_train, y_train)

# Evaluating the optimized Logistic Regression on the held-out test set
lr_pred = clf_lr.predict(X_test)
print("Logistic Regression Classification Report:\n", classification_report(y_test, lr_pred))
```
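Once the search finishes, `GridSearchCV` exposes the winning configuration and its mean cross-validated score through its standard attributes:

```python
# Inspect the winning hyperparameters and their mean cross-validated score
print("Best parameters:", clf_lr.best_params_)
print("Best CV score:", clf_lr.best_score_)
```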
The best parameters found are `{'C': 1, 'penalty': 'l2'}`, significantly enhancing our model's performance. Evaluating the optimized Logistic Regression yields:
```
Logistic Regression Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.98      0.98        63
           1       0.99      0.98      0.99       108

    accuracy                           0.98       171
   macro avg       0.98      0.98      0.98       171
weighted avg       0.98      0.98      0.98       171
```
Next, we evaluate Random Forest and Gradient Boosting:
```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Random Forest with default hyperparameters
rf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
rf_pred = rf.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, rf_pred))

# Gradient Boosting with default hyperparameters
gb = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)
gb_pred = gb.predict(X_test)
print("Gradient Boosting Accuracy:", accuracy_score(y_test, gb_pred))
```
- Random Forest Classifier Accuracy: 0.9707602339181286
- Gradient Boosting Classifier Accuracy: 0.9590643274853801
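Because accuracy from a single train/test split can be noisy, a cross-validated comparison makes a useful sanity check. The sketch below reuses the estimators defined above; `cross_val_score` refits clones of them internally on each fold:

```python
from sklearn.model_selection import cross_val_score

# 5-fold cross-validated accuracy on the training data for each candidate
for name, model in [("Logistic Regression", clf_lr.best_estimator_),
                    ("Random Forest", rf),
                    ("Gradient Boosting", gb)]:
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```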
These comparisons highlight the balanced performance of Logistic Regression across all metrics, edging out both ensemble techniques in our scenario.
After optimizing and evaluating our models, the crucial step is selecting the best-performing one. This decision rests not solely on accuracy but on a holistic view of the performance metrics.
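One way to take that holistic view is to line up the headline metrics for every candidate; this sketch reuses the test-set predictions computed above:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Headline metrics for each candidate, side by side
for name, pred in [("Logistic Regression", lr_pred),
                   ("Random Forest", rf_pred),
                   ("Gradient Boosting", gb_pred)]:
    print(f"{name}: precision={precision_score(y_test, pred):.3f}, "
          f"recall={recall_score(y_test, pred):.3f}, "
          f"f1={f1_score(y_test, pred):.3f}")
```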
Given the comprehensive analysis, the optimized Logistic Regression emerges as the preferred choice. Adopting it as our final model, we run one last evaluation:
```python
# Final model selection and evaluation
# Based on the accuracy scores and classification reports, choose the best model
final_model = clf_lr.best_estimator_
final_predictions = final_model.predict(X_test)
print("Final Model Accuracy:", accuracy_score(y_test, final_predictions))
print("Final Model Classification Report:\n", classification_report(y_test, final_predictions))
```
```
Final Model Accuracy: 0.9824561403508771
Final Model Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.98      0.98        63
           1       0.99      0.98      0.99       108

    accuracy                           0.98       171
   macro avg       0.98      0.98      0.98       171
weighted avg       0.98      0.98      0.98       171
```
Given its superior precision, recall, and F1-score alongside outstanding accuracy, our final model choice is well justified.
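As a quick usage check, the selected model can now score new observations directly; here we simply predict a few held-out samples:

```python
# Predict labels for the first five held-out samples with the chosen model
sample_predictions = final_model.predict(X_test[:5])
print("Predictions for the first five test samples:", sample_predictions)
```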
The process of model selection post-optimization is pivotal, requiring a careful balance between different performance metrics. Our effort with Logistic Regression demonstrates that meticulously tuning and evaluating models can significantly enhance their predictive capabilities. Remember, the essence of machine learning lies in experimenting, evaluating, and selecting the best model tailored to the specific needs of the dataset at hand.
Wow, another course almost done! You are getting so good at this! Let's do a few more practices and call it complete.