Feature Engineering for Text Classification
Learn how to transform raw text into features that machine learning models can use. Through hands-on practice, you'll cover tokenization, Bag-of-Words and TF-IDF representations, sparse feature handling, and dimensionality reduction with TruncatedSVD.
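To give a feel for the first half of that pipeline, here is a minimal from-scratch sketch of tokenization, Bag-of-Words counting, and TF-IDF weighting using only the Python standard library. It is an illustration, not the course's own code: the helper names (`tokenize`, `bag_of_words`, `tf_idf`), the two sample sentences, and the unsmoothed `log(N / df)` weighting are all assumptions, and library vectorizers typically use a slightly different (smoothed, normalized) formula.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    counts = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok, n in Counter(toks).items():
            row[index[tok]] = n
        counts.append(row)
    return vocab, counts


def tf_idf(counts):
    """Weight counts by term frequency times log(N / document frequency).

    Assumption: a plain, unsmoothed textbook variant chosen for clarity.
    """
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    # Every vocabulary term occurs in at least one document, so df >= 1.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) for j in range(n_terms)]
    weighted = []
    for row in counts:
        total = sum(row) or 1  # guard against an empty document
        weighted.append([(c / total) * idf[j] for j, c in enumerate(row)])
    return weighted


# Hypothetical sample data, not taken from any course dataset.
docs = ["The movie was great, great fun!", "The movie was dull."]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)

print(vocab)
print(counts)
print(weights)
```

Words that appear in every document (here "the", "movie", "was") receive a weight of zero under this variant, which is the intuition behind TF-IDF: terms shared by all documents carry little signal for telling them apart.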
Lessons and practices
Filter Punctuation from Tokenized Review
Filtering Word Tokens from a Sentence
Completing Code for Data Loading and Tokenizing
Tokenizing and Filtering a Movie Review
Tokenizing First Review and Printing Tokens
Customizing Bag-of-Words Representation
Applying CountVectorizer on Sentences
Bag-of-Words Transformation on IMDB Reviews Dataset
Creating Bag-of-Words Representation Yourself
Turn Rich Text into Bag-of-Words Representation
Change TF-IDF Vector for Different Sentence
Implementing TF-IDF Vectorizer on Provided Text
Understanding Sparse Matrix Components
Applying TF-IDF Vectorizer On Reviews Dataset
Implementing TF-IDF Vectorizer from Scratch
Switching from CSC to CSR Representation
Creating a Coordinate Format Matrix with Duplicates
Performing Vectorized Operations on Sparse Matrices
Creating CSR Matrix from Larger Array
Performing Subtraction Operation on Sparse Matrix
Change TruncatedSVD Components Number
Implement Dimensionality Reduction with TruncatedSVD
Applying TruncatedSVD on Bag-of-Words Matrix
Implement TruncatedSVD on Bag-of-Words Matrix
Implementing TruncatedSVD on IMDB Movie Reviews Dataset
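The sparse-matrix and TruncatedSVD practices listed above can be previewed with a short sketch. It assumes SciPy and scikit-learn are installed, and the matrix values and the four sample reviews are made up for illustration; the course's own exercises and datasets may differ.

```python
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Coordinate (COO) format may list the same (row, col) position more than once.
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 0, 2, 1])
vals = np.array([1, 2, 3, 4])
coo = coo_matrix((vals, (rows, cols)), shape=(3, 3))

# Converting to CSR sums the duplicates: position (0, 0) becomes 1 + 2 = 3.
csr = coo.tocsr()
dense = csr.toarray()

# Switching between column-oriented (CSC) and row-oriented (CSR) storage.
csc = csr.tocsc()
back_to_csr = csc.tocsr()

# Vectorized arithmetic stays sparse: scaling, then subtracting the original.
doubled = csr * 2
difference = doubled - csr  # holds the same values as csr

# Hypothetical sample reviews, not drawn from the IMDB dataset.
reviews = [
    "a wonderful and moving film",
    "a dull and boring film",
    "wonderful acting and a moving story",
    "boring story and dull acting",
]

# Bag-of-Words counts come back as a sparse matrix.
bow = CountVectorizer().fit_transform(reviews)

# Reduce the Bag-of-Words matrix to two components.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(bow)

print(dense)
print(bow.shape, "->", reduced.shape)
```

TruncatedSVD is used here rather than PCA because it works directly on sparse input without centering the data, which would otherwise turn the sparse matrix into a dense one.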