Hello again! In today's lesson, we'll delve into the fascinating world of Recurrent Neural Networks (RNNs) and explore their application in text classification. Whether you are new to this concept or have some familiarity with it from your Natural Language Processing (NLP) journey, you'll appreciate the unique capabilities of RNNs in handling sequential data, such as text or time series.
RNNs are distinctive because they have a form of memory. They retain the output from the previous time step and feed it back in alongside the next input, so each prediction is informed by what came before. To understand this better, think of how we read a novel: we don't start from scratch on each new page but build our comprehension based on all the previous pages. Similarly, an RNN carries forward a summary of everything it has processed up to a given point and uses that information to generate its current output.
Due to their ability to capture temporal dependencies in sequences, RNNs excel in NLP tasks. They leverage past information to understand context more effectively, making them ideal for language modeling, translation, sentiment analysis, and our focus for today — text classification.
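To make this "memory" concrete, here is a minimal sketch of a single recurrent cell written in plain NumPy. The weights and the input sequence are made up purely for illustration; the point is that each new hidden state depends on both the current input and the previous hidden state:

```python
import numpy as np

input_dim, hidden_dim = 4, 3  # toy sizes, chosen arbitrarily

# Randomly initialized weights, purely for illustration
W_x = np.random.randn(input_dim, hidden_dim) * 0.1   # input -> hidden
W_h = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden -> hidden (the "memory" path)
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                   # hidden state starts empty
sequence = np.random.randn(5, input_dim)   # a toy sequence of 5 time steps

for x_t in sequence:
    # The new hidden state mixes the current input with the previous hidden state
    h = np.tanh(x_t @ W_x + h @ W_h + b)

print(h)  # a summary of everything the cell has "read" so far
```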
Before we proceed, it's crucial to recall the pre-processing steps performed on our data:
```python
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from nltk.corpus import reuters
import numpy as np
import nltk

nltk.download('reuters', quiet=True)

categories = reuters.categories()[:2]
documents = reuters.fileids(categories)

text_data = [" ".join([word for word in reuters.words(fileid)]) for fileid in documents]
categories_data = [reuters.categories(fileid)[0] for fileid in documents]

tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")
tokenizer.fit_on_texts(text_data)
sequences = tokenizer.texts_to_sequences(text_data)
X = pad_sequences(sequences, padding='post', maxlen=50)

y = LabelEncoder().fit_transform(categories_data)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
```
In this pre-processing step, we transformed our text data into sequences of integers, where each integer represents a word token. We used the `Tokenizer` to convert text into sequences and `pad_sequences` to give all sequences a uniform length. The parameter `maxlen=50` in `pad_sequences` caps every sequence at 50 tokens: longer sequences are truncated (by default from the front), and shorter ones are padded with zeros at the end (because of `padding='post'`) until they reach a length of 50. This uniformity in sequence length is necessary because neural networks require inputs of the same dimensions. In our RNN model, each input sequence will be exactly 50 tokens long, ensuring compatibility with the model's architecture and simplifying the learning process. The choice of sequence length affects both model performance and computational efficiency; here, `maxlen=50` is chosen based on the characteristics of the dataset and what typically works well for text classification tasks.
This careful pre-processing of text data ensures our RNN model receives inputs in a compatible and meaningful format, allowing it to learn effectively from the textual information presented.
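If you'd like to see tokenization, padding, and truncation on a tiny example, the following snippet uses a couple of made-up sentences (separate from our Reuters data):

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

toy_texts = ["the cat sat on the mat", "the dog barked"]

toy_tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")
toy_tokenizer.fit_on_texts(toy_texts)
toy_sequences = toy_tokenizer.texts_to_sequences(toy_texts)
print(toy_sequences)   # e.g. [[2, 3, 4, 5, 2, 6], [2, 7, 8]]

# Pad (or truncate) every sequence to exactly 5 tokens
padded = pad_sequences(toy_sequences, padding='post', maxlen=5)
print(padded)
# [[3 4 5 2 6]     <- 6 tokens truncated to 5; by default tokens are dropped from the front
#  [2 7 8 0 0]]    <- shorter sequence padded with zeros at the end (padding='post')
```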
Armed with an understanding of RNNs, it's time to build and train a simple RNN model with TensorFlow.
We create a `Sequential` model comprising an `Embedding` layer, a `SimpleRNN` layer, and a `Dense` layer. The `Embedding` layer transforms our integer tokens into fixed-size dense vectors.

The `SimpleRNN` layer is the heart of the model: it carries its hidden state forward from one time step to the next, which lets it capture temporal relationships in the sequence. In our case we use `tf.keras.layers.SimpleRNN(16)`, where `16` is the number of units (neurons) in the RNN layer. This parameter defines the dimensionality of the layer's output and significantly shapes the model's capacity to learn from sequential data. Other noteworthy arguments of `SimpleRNN`, left at their defaults in our model, include `activation`, which sets the activation function (default `tanh`), and `return_sequences`, a boolean that specifies whether the layer returns only its last output or the full sequence of outputs for every time step.

Lastly, the `Dense` layer processes the RNN's output, using a `softmax` activation to produce a probability for each of our categories.
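For reference — and purely as an illustration of the API, not a change to the model we train below — here is how those optional arguments could be spelled out explicitly:

```python
# Illustrative only: a SimpleRNN layer with its optional arguments made explicit.
# With return_sequences=True the layer emits an output at every time step
# (shape (batch, 50, 16)) instead of only the final one (shape (batch, 16)),
# which is what you need when stacking another recurrent layer on top of it.
rnn_layer = tf.keras.layers.SimpleRNN(16, activation='tanh', return_sequences=True)
```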
After defining the model, we immediately compile and train it to learn from our dataset:
```python
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=100, output_dim=8),
    tf.keras.layers.SimpleRNN(16),
    tf.keras.layers.Dense(len(categories), activation='softmax')
])

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=1, validation_data=(X_test, y_test), batch_size=64)
```
The training process indicates a gradual improvement in accuracy and a decrease in loss, demonstrating our model's learning journey:
```
 1/27 - accuracy: 0.0469 - loss: 0.8066
...
27/27 - accuracy: 0.6404 - loss: 0.6420 - val_accuracy: 0.9657 - val_loss: 0.2967
```
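Don't be surprised that the validation accuracy (0.9657) ends up higher than the reported training accuracy (0.6404) after a single epoch: Keras reports training metrics as a running average over all batches in the epoch, including the early batches when the model was still mostly untrained, whereas the validation metrics are computed once, with the weights the model has at the end of the epoch.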
After training, let's examine our model's architecture and parameters with `model.summary()`:
```python
model.summary()
```
This reveals the structure and parameters of our RNN model:
```
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ embedding (Embedding)           │ (None, 50, 8)          │           800 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ simple_rnn (SimpleRNN)          │ (None, 16)             │           400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 2)              │            34 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 3,704 (14.47 KB)
 Trainable params: 1,234 (4.82 KB)
 Non-trainable params: 0 (0.00 B)
 Optimizer params: 2,470 (9.65 KB)
```
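You can verify these counts by hand: the `Embedding` layer stores 100 × 8 = 800 weights (one 8-dimensional vector per vocabulary index), the `SimpleRNN` layer stores 8 × 16 input weights plus 16 × 16 recurrent weights plus 16 biases, i.e. 400, and the `Dense` layer stores 16 × 2 weights plus 2 biases, i.e. 34, giving 1,234 trainable parameters in total. The 2,470 optimizer parameters are the extra variables Adam keeps alongside the weights (roughly two moment estimates per trainable parameter), which is why the total parameter count is larger than the trainable count.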
After understanding our model's architecture, we evaluate its performance on unseen data (`X_test`, `y_test`) to gauge its effectiveness:
```python
loss, accuracy = model.evaluate(X_test, y_test)

print(f"Loss: {loss:.4f}")
print(f"Accuracy: {accuracy:.4f}")
```
The output will be:
```
Loss: 0.3834
Accuracy: 0.9700
```
This step rounds out our exploration of text classification with RNNs, giving us concrete metrics on how well the model performs.
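If you'd like to see the trained model in action, the sketch below classifies a brand-new piece of text. The sample headline is made up, and the snippet assumes the `tokenizer` fitted earlier is still available; the prediction comes back as an integer label matching the encoding produced by the `LabelEncoder`:

```python
# A made-up headline, purely for illustration
new_text = ["oil prices rise as markets react to supply cuts"]

# Reuse the tokenizer fitted on the training data, then pad to the same length
new_sequence = tokenizer.texts_to_sequences(new_text)
new_padded = pad_sequences(new_sequence, padding='post', maxlen=50)

# predict() returns one softmax probability per category
probabilities = model.predict(new_padded)
predicted_label = np.argmax(probabilities, axis=-1)[0]

print(probabilities)     # e.g. [[0.12 0.88]] (values will vary)
print(predicted_label)   # integer label as produced by the LabelEncoder
```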
By walking through the construction, training, and evaluation of a Simple RNN for text classification, you've gained a practical insight into harnessing the power of RNNs within TensorFlow for NLP tasks. Understanding how to leverage past information in sequential data opens up numerous avenues for effective text analysis.
To solidify your comprehension, proceed to the practice exercises in the next section. These exercises are tailored to challenge and expand your understanding further. Happy learning!