Welcome to our focused exploration of C++'s std::unordered_set
and its remarkable applications in solving algorithmic challenges. In this lesson, "Mastering Unique Elements and Anagram Detection with C++ std::unordered_set," we'll delve into how this efficient data structure can be leveraged to address and solve various types of problems commonly encountered in technical interviews.
Picture this: you're given a vast list of words, and you must identify the final word that stands proudly solitary — the last word that is not repeated. Imagine sorting through a database of unique identifiers and finding one identifier towards the end of the list that is unlike any other.
The straightforward approach would be to examine each word in reverse, comparing it to every other word for uniqueness. This brute-force method would result in poor time complexity, O(n^2)
, which is less than ideal for large datasets.
Here is the naive approach in C++:
C++1#include <vector> 2#include <string> 3 4std::string FindLastUniqueWordNaive(const std::vector<std::string>& words) 5{ 6 for (int i = words.size() - 1; i >= 0; i--) 7 { 8 bool isUnique = true; 9 for (int j = 0; j < words.size(); j++) 10 { 11 if (i != j && words[i] == words[j]) 12 { 13 isUnique = false; 14 break; 15 } 16 } 17 18 if (isUnique) 19 { 20 return words[i]; 21 } 22 } 23 24 return ""; // In case no unique word is found 25}
As you might notice, the naive solution checks each word against every other word, leading to a time complexity of O(n^2)
due to the nested loops. For each word, you perform n comparisons with the remaining words in the list.
We can utilize two std::unordered_set<std::string>
instances: wordsSet
to maintain unique words and duplicatesSet
to keep track of duplicate words. By the end, we can remove all duplicated words from wordsSet
to achieve our goal. Here is how to use std::unordered_set
to solve the problem in C++:
Create a std::unordered_set
instance to store unique words:
C++1#include <unordered_set> 2 3std::unordered_set<std::string> wordsSet;
Initialize another std::unordered_set
to monitor duplicates:
C++1std::unordered_set<std::string> duplicatesSet;
Iterate through the word list, filling wordsSet
and duplicatesSet
:
C++1for (const auto& word : words) 2{ 3 if (wordsSet.find(word) != wordsSet.end()) 4 { 5 duplicatesSet.insert(word); 6 } 7 else 8 { 9 wordsSet.insert(word); 10 } 11}
Manually remove all duplicated words from wordsSet
:
C++1for (const auto& dup : duplicatesSet) 2{ 3 wordsSet.erase(dup); 4}
Now, wordsSet
only contains unique words. Find the last unique word by iterating through the original word list from the end:
C++1std::string lastUniqueWord = ""; 2for (int i = words.size() - 1; i >= 0; i--) 3{ 4 if (wordsSet.find(words[i]) != wordsSet.end()) 5 { 6 lastUniqueWord = words[i]; 7 break; 8 } 9}
And finally, return the last unique word:
C++1std::string FindLastUniqueWordEfficient(const std::vector<std::string>& words) 2{ 3 std::unordered_set<std::string> wordsSet; 4 std::unordered_set<std::string> duplicatesSet; 5 6 for (const auto& word : words) 7 { 8 if (wordsSet.find(word) != wordsSet.end()) 9 { 10 duplicatesSet.insert(word); 11 } 12 else 13 { 14 wordsSet.insert(word); 15 } 16 } 17 18 for (const auto& dup : duplicatesSet) 19 { 20 wordsSet.erase(dup); 21 } 22 23 std::string lastUniqueWord = ""; 24 for (int i = words.size() - 1; i >= 0; i--) 25 { 26 if (wordsSet.find(words[i]) != wordsSet.end()) 27 { 28 lastUniqueWord = words[i]; 29 break; 30 } 31 } 32 33 return lastUniqueWord; 34}
This efficient approach, with a time complexity close to O(n)
, is far superior to the naive method and showcases your proficiency in solving algorithmic problems with C++'s std::unordered_set
.
Now, imagine a different scenario in which you have two arrays of strings, and your task is to find all the unique words from the first array that have an anagram in the second array. An anagram is a word or phrase formed by rearranging the letters of another word or phrase, such as forming "listen" from "silent."
A basic approach would involve checking every word in the first array against every word in the second array by generating and comparing their sorted character strings. Here is the naive approach in C++, where we use a a std::unordered_set
to eliminate duplicate anagrams in the result:
C++1#include <vector> 2#include <string> 3#include <algorithm> 4#include <set> 5 6std::string SortCharacters(const std::string& input) 7{ 8 std::string sorted = input; 9 std::sort(sorted.begin(), sorted.end()); 10 return sorted; 11} 12 13std::vector<std::string> FindAnagramsNaive(const std::vector<std::string>& array1, const std::vector<std::string>& array2) 14{ 15 std::unordered_set<std::string> result; 16 17 for (const auto& word1 : array1) 18 { 19 for (const auto& word2 : array2) 20 { 21 if (SortCharacters(word1) == SortCharacters(word2)) 22 { 23 result.insert(word1); 24 break; 25 } 26 } 27 } 28 29 return std::vector<std::string>(result.begin(), result.end()); // Ensure unique matches 30}
In the naive anagram matcher, every word in array1
is compared with every word in array2
. Sorting the characters of each word takes O(m log m)
, where m
is the word length, and performing this comparison for all n
words in array1
and n
words in array2
results in O(n^2 * m log m)
. This complexity quickly becomes infeasible for large datasets.
By sorting the characters of each word, we can generate a 'signature' that uniquely identifies anagrams. For example, 'listen' and 'silent' both have the signature 'eilnst' after sorting. This allows us to compare these signatures instead of comparing the words themselves, making it much easier and faster to identify anagrams. We'll use a dictionary-like structure to store signatures for efficient access.
Let's decompose the anagram matcher:
Construct a method to create sorted character signatures from the input string:
C++1std::string SortCharacters(const std::string& input) 2{ 3 std::string sorted = input; 4 std::sort(sorted.begin(), sorted.end()); 5 return sorted; 6}
Store these sorted characters from array2
in a std::unordered_set
for fast lookup:
C++1std::unordered_set<std::string> sortedWordsInArray2; 2for (const auto& word : array2) 3{ 4 sortedWordsInArray2.insert(SortCharacters(word)); 5}
For each word in array1
, check for its sorted signature in the std::unordered_set
and track the found anagrams:
C++1std::unordered_set<std::string> anagramsMatched; 2std::vector<std::string> result; 3for (const auto& word : array1) 4{ 5 if (sortedWordsInArray2.find(SortCharacters(word)) != sortedWordsInArray2.end()) 6 { 7 if (anagramsMatched.find(word) == anagramsMatched.end()) 8 { 9 result.push_back(word); 10 anagramsMatched.insert(word); 11 } 12 } 13}
The std::vector<std::string>
result
stores the matches, ensuring that we return unique anagrams, while the std::unordered_set<std::string>
anagramsMatched
prevents duplication in our result
.
Our final step is to return the list of anagrams found:
C++1std::vector<std::string> FindAnagramsEfficient(const std::vector<std::string>& array1, const std::vector<std::string>& array2) 2{ 3 std::unordered_set<std::string> sortedWordsInArray2; 4 for (const auto& word : array2) 5 { 6 sortedWordsInArray2.insert(SortCharacters(word)); 7 } 8 9 std::unordered_set<std::string> anagramsMatched; 10 std::vector<std::string> result; 11 for (const auto& word : array1) 12 { 13 if (sortedWordsInArray2.find(SortCharacters(word)) != sortedWordsInArray2.end() && anagramsMatched.find(word) == anagramsMatched.end()) 14 { 15 result.push_back(word); 16 anagramsMatched.insert(word); 17 } 18 } 19 20 return result; 21}
By utilizing std::unordered_set
in this manner, we achieve efficient anagram checking with reduced complexity, considering both the O(m log m)
character sorting for each word and the O(n)
comparison for n
words.
In this lesson, we have utilized C++'s std::unordered_set
to improve the efficiency of solving the "Unique Echo" and "Anagram Matcher" problems. These strategies help us manage complexity by leveraging the constant-time performance of std::unordered_set
operations and maintaining the ability to efficiently manage unique collections. This steers us away from less efficient methods and aligns us with the standards expected in technical interviews. As we progress, you'll encounter hands-on practice problems that will test your ability to apply these concepts. Through nuanced algorithmic practice with std::unordered_set
, you'll refine your skills and deepen your understanding of their computational benefits.