Text Data Preprocessing in Python

Learn to clean and prepare text data for machine learning models using Python. This course teaches you to apply basic preprocessing steps, such as lowercasing, stopword removal, tokenization, and stemming, to the SMS Spam Collection dataset. By the end of the course, you'll be able to transform raw text into a format that's ready for NLP tasks.
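
As a rough preview of the kind of steps the course covers, here is a minimal sketch using only the Python standard library. The `preprocess` function, the tiny `STOPWORDS` set, and the sample message are all hypothetical and made up for illustration; they are not taken from the course or from the SMS Spam Collection dataset. Stemming is left out because it would normally rely on a library such as NLTK.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration only.
# A real project would typically use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "to", "is", "you", "your", "for", "of", "and"}


def preprocess(message: str) -> list[str]:
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    # Step 1: lowercase so "WON" and "won" count as the same token.
    lowered = message.lower()
    # Step 2: delete every ASCII punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Step 3: naive whitespace tokenization.
    tokens = no_punct.split()
    # Step 4: keep only tokens that are not in the stopword set.
    return [token for token in tokens if token not in STOPWORDS]


# Made-up message, not a row from the dataset.
sample = "Congratulations! You have WON a free ticket to the show."
print(preprocess(sample))
# ['congratulations', 'have', 'won', 'free', 'ticket', 'show']
```

Whitespace splitting is the simplest possible tokenizer; a dedicated tokenizer handles contractions and embedded punctuation more carefully.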

Lessons and practices

Lesson 1: Lowercasing Text for Uniformity in NLP

Introduction to Lowercase Text Conversion

Lowercasing Spam Dataset Messages

Transforming Text to Lowercase for Data Uniformity

Mastering Text Lowercasing in Python

Lesson 2: Punctuating Punctuation: Streamlining Text for NLP

Removing Text Punctuation Simplified

Removing Commas from Text

Debugging Punctuation Removal Exercise

Crafting Clean Text Data

Lesson 3: Tokenizing Text Data in NLP with Python and NLTK