Greetings and welcome! In today's lesson, we explore a crucial yet often overlooked topic: bridging TensorFlow and Scikit-learn through Keras wrappers. Despite TensorFlow's powerful capabilities for machine learning, it doesn't offer the extensive utilities for model selection and evaluation that Scikit-learn provides. Tools such as cross-validation, grid search, and a wide range of metrics are invaluable. But how can we use them with TensorFlow models? The answer lies in wrappers, specifically the `KerasClassifier` wrapper, which will be our primary focus today.
`KerasClassifier` is a wrapper designed to bridge TensorFlow's Keras deep learning module with Scikit-learn's tools for model selection and evaluation. Keras, an integral part of TensorFlow, is an open-source library that provides a Python interface for artificial neural networks. For newer versions of TensorFlow, these wrappers are included in a separate library called `scikeras`, which specifically facilitates the integration between TensorFlow and Scikit-learn. To understand it better, let's dive into the implementation:
The first step is to define a build function that returns a model to the wrapper. This model is suited to the Iris dataset we have been using throughout this course:
```python
import tensorflow as tf

def build_model():
    # Small feed-forward network: 4 Iris features in, 3 class probabilities out
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(3, activation='softmax')
    ])
    # categorical_crossentropy expects one-hot encoded labels
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model
```
Next, we import the module and create an instance of `KerasClassifier`. It requires a function that constructs and compiles a model, passed as the `model` parameter. Additional arguments, such as `epochs` and `verbose`, can also be set; they are passed to `fit` when it is called.
```python
from scikeras.wrappers import KerasClassifier

model = KerasClassifier(model=build_model, epochs=30, verbose=0)
```
In this instance, we use the `build_model` function we defined earlier. `epochs` is set to 30, the number of passes over the training data, and `verbose` is set to 0, meaning no output is shown during training.
One of the many powerful utilities Scikit-learn provides is K-Fold Cross-validation, a resampling procedure used to evaluate machine learning models on a limited data sample. The dataset is divided into K folds (subsets) of roughly equal size. The model is then trained on K-1 folds, and the remaining fold is used as the test fold. This process is repeated K times, each time with a different test fold.
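To make the splitting step concrete, here is a minimal pure-Python sketch of how K contiguous folds could be generated. The helper `k_fold_indices` is a hypothetical name introduced for illustration; it is not part of Scikit-learn, and it only approximates what an unshuffled K-Fold splitter does.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Hypothetical helper for illustration only, not a Scikit-learn function.
    """
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        # Everything outside the current test fold is used for training.
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for i, (train, test) in enumerate(folds):
    print(f"fold {i}: train={train} test={test}")
```

Each sample lands in the test fold exactly once, and every round trains on the remaining K-1 folds.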
Now, we will apply K-Fold Cross-validation to our model using Scikit-learn's `cross_val_score` function. For this, we load the preprocessed Iris dataset using a function imported from another file named `data_preprocessing.py`:
```python
from data_preprocessing import load_preprocessed_data
from sklearn.model_selection import cross_val_score, KFold

# Load preprocessed Iris dataset
X_train, X_test, y_train, y_test = load_preprocessed_data()

# Define KFold
kfold = KFold(n_splits=5)

# Compute the accuracy scores
scores = cross_val_score(model, X_train, y_train, cv=kfold)
```
In this example, we first create a `KFold` object, `kfold`, with 5 splits. This object is then passed as the `cv` parameter to `cross_val_score`, which computes the cross-validation scores for our model using the training data and the specified K-Fold cross-validator.
When no `scoring` argument is given, `cross_val_score` relies on the estimator's own `score` method, which returns accuracy for classification models. Our model works fine with Scikit-learn because the `KerasClassifier` wrapper ensures the TensorFlow Keras model conforms to the Scikit-learn estimator interface, allowing seamless integration for tasks such as cross-validation, hyperparameter tuning, and other model evaluation strategies.
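As a rough illustration of what "conforming to the estimator interface" means, the sketch below defines a toy classifier with the methods evaluation utilities typically rely on: `fit`, `predict`, `score`, `get_params`, and `set_params`. The names `TinyKNN` and `fit_and_score` are hypothetical and written only for this example; they do not reproduce how `KerasClassifier` or Scikit-learn are implemented internally.

```python
from collections import Counter


class TinyKNN:
    """Hypothetical k-nearest-neighbours classifier with estimator-style methods."""

    def __init__(self, k=1):
        self.k = k

    def get_params(self, deep=True):
        # Hyperparameters are exposed so a tool can rebuild an unfitted copy.
        return {"k": self.k}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # "Training" here just memorises the data.
        self.X_ = [tuple(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Rank training points by squared Euclidean distance to this row.
            nearest = sorted(
                range(len(self.X_)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(self.X_[i], row)),
            )[: self.k]
            votes = Counter(self.y_[i] for i in nearest)
            predictions.append(votes.most_common(1)[0][0])
        return predictions

    def score(self, X, y):
        # Mean accuracy over the given samples.
        predicted = self.predict(X)
        return sum(p == t for p, t in zip(predicted, y)) / len(y)


def fit_and_score(estimator, X_train, y_train, X_val, y_val):
    """Evaluate through the shared methods only, never the concrete class."""
    fresh = type(estimator)(**estimator.get_params())  # unfitted copy
    fresh.fit(X_train, y_train)
    return fresh.score(X_val, y_val)


X_train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
y_train = ["a", "a", "b", "b"]
X_val = [(0.05, 0.1), (1.05, 0.95)]
y_val = ["a", "b"]

accuracy = fit_and_score(TinyKNN(k=1), X_train, y_train, X_val, y_val)
print(accuracy)
```

Because `fit_and_score` touches only the shared methods, any object that provides them can be evaluated the same way. A wrapper plays this role for a Keras model.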
Cross-validation scores offer a good estimate of how well our model will perform on unseen data. Because the model is tested on different subsets of the data, cross-validation helps reveal overfitting that a single train/test split might hide. In this case, each score is the accuracy of the model on one fold. After carrying out cross-validation using `cross_val_score`, it's time to print the scores:
```python
print("Cross-validation scores:\n", scores)
```
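The output will be similar to the following (exact values vary between runs because neural network training is randomized):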
```
Cross-validation scores:
 [0.80952381 0.85714286 0.61904762 0.66666667 0.57142857]
```
The scores differ noticeably across folds, ranging from about 0.57 to 0.86, which shows that the model's ability to generalize depends on which subset of the Iris dataset it is evaluated on. This variation suggests that further tuning or model enhancements could improve overall performance.
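A common way to condense per-fold scores into a single figure is to report their mean together with their spread. The sketch below does this with Python's standard `statistics` module, using the five fold accuracies printed above; the variable names are chosen for this example only.

```python
from statistics import mean, stdev

# Fold accuracies copied from the printed cross-validation output.
scores = [0.80952381, 0.85714286, 0.61904762, 0.66666667, 0.57142857]

mean_accuracy = mean(scores)
spread = stdev(scores)  # sample standard deviation across the five folds

print(f"Mean accuracy: {mean_accuracy:.3f} (+/- {spread:.3f})")
```

For these values the mean is roughly 0.705 with a standard deviation of roughly 0.123, a fairly wide spread for five folds.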
Great work! You now understand how to bridge TensorFlow and Scikit-learn through wrappers such as `KerasClassifier`. This strategy plays a vital role in machine learning workflows, especially in model selection and evaluation, where Scikit-learn excels.
To master these techniques, practice is crucial. That's exactly what we are going to do next, with a series of coding exercises designed to solidify your understanding and skills. Happy coding!