Greetings, Machine Learning enthusiast! Let's take a deep dive into the fascinating realm of neural networks. Specifically, we'll focus on constructing a neural network using the Pythonic TensorFlow library.
Neural networks attract global interest due to their ability to learn patterns from vast volumes of complex data, mimicking the way the human brain learns. They have broken the barriers of conventional computing and ushered in a new era where machines can handle complex tasks that were once exclusive to human intellect.
In today's tutorial, we'll explore how to build these intelligent systems by leveraging TensorFlow's robust functionalities. Our goal is to give you a behind-the-scenes look at the inner workings of these systems.
TensorFlow, an open-source library developed by the Google Brain Team, serves as a powerful tool for numerical computations, making it a popular choice for large-scale machine learning.
Let's familiarize ourselves with the three pillars of TensorFlow's architecture:

- Tensors: These are essentially multi-dimensional arrays with a uniform data type, and they act as the heart of TensorFlow.
- Computation Graphs: Classic TensorFlow (1.x) operates using 'lazy execution': it first designs a computational graph representing various tensor operations, which is then executed later. In TensorFlow 2.x, operations run eagerly by default, and graphs are built explicitly (for example, with `tf.function`) when you want them.
- Sessions: In TensorFlow 1.x, computations don't manifest instantaneously; they run within a scope called a session. TensorFlow 2.x drops sessions in favor of eager execution.
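To make the tensor and graph concepts concrete, here is a minimal sketch (assuming TensorFlow 2.x, where eager execution is the default):

```python
import tensorflow as tf

# A tensor is a multi-dimensional array with a uniform dtype
t = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(t.shape, t.dtype)  # (2, 2) <dtype: 'float32'>

# In TensorFlow 2.x, operations execute eagerly (immediately)
print(tf.reduce_sum(t))  # tf.Tensor(10.0, shape=(), dtype=float32)

# A computation graph can still be built explicitly with tf.function,
# which traces the Python function into a graph
@tf.function
def double(x):
    return x * 2.0

print(double(t))  # [[2. 4.] [6. 8.]]
```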
TensorFlow provides numerous utilities and methods to design, train, and execute neural networks, making it a powerful asset in any neural network project. It manages the low-level nuances, allowing you to concentrate more on improving your model.
Before we dive into creating a neural network, let's understand its structure. Neural networks consist of a collection of interconnected artificial neurons, or "nodes." They typically comprise three types of layers:

- Input Layer: receives the raw data; in diagrams its nodes are often labeled I1, I2, I3, and so on.
- Hidden Layers: intermediate layers that transform the inputs on their way to the output.
- Output Layer: produces the final prediction; its nodes are often labeled O1, O2, and so on.

Each layer plays a significant role, similar to neurons in our brain. When neurons in our brains receive inputs, they process them and generate outputs. A similar mechanism occurs in a neural network. The number of input nodes in the input layer depends on the data we are working with, and the number of output nodes in the output layer depends on the type of prediction we are making.
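To make that concrete, here's a minimal sketch (using the same scikit-learn Digits dataset we'll load in the code below) showing how the input and output sizes fall straight out of the data:

```python
from sklearn import datasets

digits = datasets.load_digits()

# Each image is 8x8 pixels, so flattening yields 64 features,
# which means 64 input nodes
n_inputs = digits.images[0].size

# There are ten digit classes (0-9), so we need 10 output nodes
n_outputs = len(set(digits.target))

print(n_inputs, n_outputs)  # 64 10
```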
As for the number and size of the hidden layers, that's where the art of machine learning comes in. There is no single best way to construct a neural network, though there are best practices. It comes down to developing a deep understanding of the dataset and experimenting continuously.
Next, we have "activation functions," one of the most critical aspects of neural networks. An activation function decides whether (and how strongly) a neuron contributes to the next layer based on its input.
In the next section, you'll see an example where `relu` and `softmax` are used as activation functions. `relu` allows positive values to pass through while replacing negative values with zeros. `softmax`, on the other hand, transforms a list of numbers into a probability distribution.
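Conceptually, here's what the two functions compute. This is a minimal NumPy sketch of the standard formulas, not the TensorFlow implementations themselves:

```python
import numpy as np

def relu(x):
    # Positive values pass through unchanged; negatives become zero
    return np.maximum(0.0, x)

def softmax(x):
    # Subtracting the max first is a standard numerical-stability trick
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([-1.0, 0.5, 2.0])
print(relu(scores))     # [0.  0.5 2. ]
print(softmax(scores))  # three probabilities that sum to 1.0
```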
Next time, we'll go a little deeper into the mathematical representation of neural networks, but for now, let's see how to create them in TensorFlow.
Let's start by defining a neural network using TensorFlow. We'll utilize TensorFlow's Keras Sequential API to create a Sequential model. This model allows for the easy construction of a neural network in which each layer has exactly one input tensor and one output tensor.
Consider the following:
```python
# Import the necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.utils import to_categorical
import tensorflow as tf

# Load the Digits dataset
digits = datasets.load_digits()

# Split the data into features and target labels
X = digits.images
y = digits.target

# Flatten the images
n_samples = len(X)
X = X.reshape((n_samples, -1))

# Normalize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Convert labels to one-hot encoding
y_categorical = to_categorical(y)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_categorical, test_size=0.3, random_state=42)

# Define the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')  # 10 classes for digits 0-9
])
```
Let's break down the key pieces:

- `tf.keras.models.Sequential()`: initializes our network as a Sequential model.
- `tf.keras.layers.Dense()`: creates a densely connected layer, meaning every neuron in the layer connects to every output of the previous layer.
- `64`: the number of neurons in the layer.
- `activation='relu'`: `relu` stands for "Rectified Linear Unit." It's a simple function that allows positive values to pass through while turning negative values into zero.
- `input_shape=(X_train.shape[1],)`: specifies the shape of the input the model will receive. It must be specified for the first layer in a Sequential model.
- `tf.keras.layers.Dense(10, activation='softmax')`: our final layer has 10 neurons, one for each of the digits 0-9. The `softmax` activation turns the layer's outputs into a probability distribution over the digits.

Given how large this network architecture is, it's hard to visualize, but worth a try. In the image below you can see the three layers that are 64 neurons wide (including the input layer) and the output layer with 10 neurons. Because these are dense layers, there are 64 * 64 = 4096 edges between each pair of 64-wide layers, so each individual connection is hard to see.
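If you'd like to generate a similar diagram yourself, Keras includes a plotting utility. Here is a small sketch (note: it requires the pydot package plus a system Graphviz installation, and this isn't necessarily how the image above was produced):

```python
# Requires: pip install pydot, plus a system-level Graphviz install
tf.keras.utils.plot_model(
    model,
    to_file='model.png',    # writes the diagram to disk
    show_shapes=True,       # annotate layers with input/output shapes
    show_layer_names=True,  # label each layer (dense, dense_1, ...)
)
```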
Let's further examine our model structure using the `summary()` method. It provides a quick overview of the model's architecture: the layers, their output shapes, and the number of parameters.
```python
model.summary()
```

Output:

```
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 64)                4160

 dense_1 (Dense)             (None, 64)                4160

 dense_2 (Dense)             (None, 10)                650

=================================================================
Total params: 8970 (35.04 KB)
Trainable params: 8970 (35.04 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
```
The `model.summary()` method provides detailed model information, notably the `Param #` column in the output, which shows the number of trainable parameters in each layer. You might notice that we said there are 4096 edges between the 64-neuron layers, yet the summary reports 4160 trainable parameters between the dense layers. Where is the difference coming from? It comes from the bias term associated with each neuron. More about weights and biases in the next lesson!
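As a quick sanity check, we can reproduce those counts with plain arithmetic: a Dense layer has one weight per input-neuron pair plus one bias per neuron:

```python
# Dense layer parameters = (inputs * neurons) weights + neurons biases
input_dim = 64  # flattened 8x8 digit images

hidden1 = input_dim * 64 + 64  # 4096 weights + 64 biases = 4160
hidden2 = 64 * 64 + 64         # 4160
output  = 64 * 10 + 10         # 650

print(hidden1 + hidden2 + output)  # 8970, matching model.summary()
```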
Congratulations on making it this far in this insightful lesson on neural networks with TensorFlow! Today, we reviewed the fundamentals of TensorFlow, delved into the structure of neural networks, and built a deep learning model using TensorFlow.
By now, you have learned about:

- The fundamentals of TensorFlow: tensors, computation graphs, and sessions
- The structure of neural networks and how to build one with TensorFlow's Keras Sequential API
Now it's time to solidify your newfound understanding with some hands-on practice exercises. These exercises encourage active learning and will establish a strong foundation for advanced neural network concepts. Good luck, and enjoy your journey with neural networks!