Hello and welcome to today's lesson! We are now embarking on an exciting journey into the field of Neural Networks, significant players in the Natural Language Processing (NLP) arena. Neural Networks implicitly capture the structure of the data, which is especially valuable for text, given its sequential nature. Remember how our ensemble models did a good job on the Reuters-21578 Text Categorization Collection? Now, imagine how much higher we can push performance by using these powerful models.
Before discussing Neural Networks in detail, let's recall the code we have already executed:
```python
# Importing libraries
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
import numpy as np
import nltk
from nltk.corpus import reuters

nltk.download('reuters', quiet=True)

# Loading and preparing the Reuters-21578 Text Categorization Collection dataset
categories = reuters.categories()[:3]
documents = reuters.fileids(categories)
text_data = [" ".join([word for word in reuters.words(fileid)]) for fileid in documents]
categories_data = [reuters.categories(fileid)[0] for fileid in documents]

# Tokenizing and padding sequences
tokenizer = Tokenizer(num_words=500, oov_token="<OOV>")
tokenizer.fit_on_texts(text_data)
sequences = tokenizer.texts_to_sequences(text_data)
X = pad_sequences(sequences, padding='post')

# Label Encoding
y = LabelEncoder().fit_transform(categories_data)

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
```
So far, we have preprocessed our text data and transformed it into a format suitable for input into models. We have our train and test datasets ready, which means we are all set to dive into creating our Neural Network model for text classification.
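Before building the model, it can help to sanity-check the arrays we just created. This is an optional check, not part of the lesson's required code; the exact shapes printed depend on your local copy of the Reuters corpus.

```python
# Optional sanity check of the preprocessed data (shapes depend on your corpus download)
print(X_train.shape, X_test.shape)   # (num_train_docs, max_seq_len), (num_test_docs, max_seq_len)
print(np.unique(y))                  # the three integer-encoded categories, e.g. [0 1 2]
```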
When dealing with text data, our neural network usually starts with an Embedding layer. This layer converts the tokenized text into dense vector representations that the neural network can work with. The embedding matrix learned by this layer captures information about words and their contextual meanings.
Here's our simple, initial neural network model with the embedding layer:
```python
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=500, output_dim=16),
])
```
Notice the parameters we passed to the embedding layer: `input_dim` and `output_dim`. The `input_dim` is set to 500, matching the vocabulary size we set with `num_words` in our tokenizer. The `output_dim` sets how many dimensions we want in the dense vector representing each word; we set it to 16.
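To make these parameters concrete, here is a small optional sketch that passes a couple of padded sequences through the embedding-only model defined above; the exact sequence length in the printed shape depends on your preprocessing.

```python
# Optional sketch: each token ID in a padded sequence becomes a 16-dimensional vector
sample_batch = X_train[:2]        # two padded sequences of token IDs
embedded = model(sample_batch)    # shape: (2, sequence_length, 16)
print(embedded.shape)
```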
Still, the model is not yet complete. Let's add the next layer.
Next, we will use a pooling layer, `GlobalAveragePooling1D`. This layer reduces the dimensionality of the model's representation by averaging the word vectors across each sequence, effectively creating one overall context vector per text sequence, a necessary step before predicting the text category.

Our model with the `GlobalAveragePooling1D` layer now looks like this:
```python
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=500, output_dim=16),
    tf.keras.layers.GlobalAveragePooling1D(),
])
```
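If you want to see what the pooling layer is doing, here is an optional sketch (assuming the two-layer model above) that compares the model's pooled output with an average of the embedding vectors computed by hand; without masking, `GlobalAveragePooling1D` simply averages over the sequence axis, so the two should match.

```python
# Optional sketch: the pooled output equals the mean of the embedding vectors over the sequence axis
emb_out = model.layers[0](X_train[:1])         # embedding output: (1, sequence_length, 16)
pooled = model(X_train[:1])                    # after pooling: (1, 16)
manual = tf.reduce_mean(emb_out, axis=1)       # averaging by hand
print(np.allclose(pooled.numpy(), manual.numpy()))  # True
```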
Our last layer is a Dense output layer with three units and a 'softmax' activation function. Three is the number of output categories we selected from the Reuters dataset, and the 'softmax' activation ensures the predicted probabilities across all categories sum to 1.
Lastly, we compile our model with the 'sparse_categorical_crossentropy' loss function, the 'adam' optimizer, and 'accuracy' as the metric. We then train the model for 10 epochs on our training set and evaluate it on the test set:
```python
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=500, output_dim=16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(3, activation='softmax')
])

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
loss, accuracy = model.evaluate(X_test, y_test)

print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")
```
The output of the above code will be:
```text
Test Loss: 0.22081851959228516
Test Accuracy: 0.9556451439857483
```
This output indicates that our Neural Network model for text classification trained successfully on the Reuters dataset, achieving high accuracy with low loss. The combination of an Embedding layer, `GlobalAveragePooling1D`, and a Dense output layer allows the model to effectively represent and categorize text sequences.
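Once the model is trained, we can also use it directly for predictions. The snippet below is a hypothetical usage example: it predicts probabilities for the first few test documents and converts them back to integer-encoded categories, also illustrating that each row of softmax probabilities sums to 1.

```python
# Hypothetical usage: predict categories for the first five test documents
probs = model.predict(X_test[:5])       # shape (5, 3); each row sums to 1 thanks to softmax
predicted = np.argmax(probs, axis=1)    # integer-encoded category per document
print(predicted)
print(y_test[:5])                       # true integer-encoded categories for comparison
```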
We use `sparse_categorical_crossentropy` as our loss function because our labels are integers. In multi-class classification tasks where labels are not one-hot encoded (which would require `categorical_crossentropy`), `sparse_categorical_crossentropy` handles the labels more efficiently and straightforwardly: it expects integer labels and computes the loss between the true and predicted labels, guiding the model's optimization.
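As a quick illustration of this point, the following optional sketch (using made-up labels and probabilities, not the Reuters data) shows that `sparse_categorical_crossentropy` on integer labels gives the same values as `categorical_crossentropy` on the one-hot encoded versions of those labels.

```python
# Optional sketch: sparse vs. one-hot crossentropy on the same (made-up) predictions
y_true_int = np.array([0, 2, 1])                       # integer labels, as in our dataset
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.3, 0.5],
                   [0.1, 0.7, 0.2]])                   # made-up softmax outputs
sparse_loss = tf.keras.losses.sparse_categorical_crossentropy(y_true_int, y_pred)
dense_loss = tf.keras.losses.categorical_crossentropy(tf.one_hot(y_true_int, depth=3), y_pred)
print(np.allclose(sparse_loss.numpy(), dense_loss.numpy()))  # True
```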
Congratulations on taking a big step in your NLP journey! You've learned how to prepare and use Neural Networks for text classification efficiently. You've come a long way, and now is the perfect time to apply these concepts. In the next set of exercises, you will get to practice and consolidate your learning. Practice is crucial: it deepens our understanding of the concepts and gives us the confidence to handle real-world datasets and tasks. Remember, you're just one lesson away from unlocking the power of the Simple RNN, which we will cover in our next class. Let's get practicing!