Mastering Unique Elements and Anagram Detection with C# HashSet

Lesson 3

Introduction

Welcome to our focused exploration of C#'s HashSet and its remarkable applications in solving algorithmic challenges. In this lesson, "Mastering Unique Elements and Anagram Detection with C# HashSet," we'll delve into how this efficient data structure can be leveraged to address and solve various types of problems commonly encountered in technical interviews.

Problem 1: Unique Echo

Picture this: you're given a vast list of words, and you must identify the final word that stands proudly solitary — the last word that is not repeated. Imagine sorting through a database of unique identifiers and finding one identifier towards the end of the list that is unlike any other.

Naive Approach

The straightforward approach would be to examine each word in reverse, comparing it to every other word for uniqueness. This brute-force method would result in poor time complexity, O(n^2), which is less than ideal for large datasets.

Here is the naive approach in C#:

C#
1string FindLastUniqueWordNaive(string[] words)
2{
3    for (int i = words.Length - 1; i >= 0; i--)
4    {
5        bool isUnique = true;
6        for (int j = 0; j < words.Length; j++)
7        {
8            if (i != j && words[i] == words[j])
9            {
10                isUnique = false;
11                break;
12            }
13        }
14        
15        if (isUnique)
16        {
17            return words[i];
18        }
19    }
20
21    return null; // In case no unique word is found
22}

Efficient Approach

We can utilize two HashSet<string> instances: wordsSet to maintain unique words and duplicatesSet to keep track of duplicate words. By the end, we can remove all duplicated words from wordsSet to achieve our goal. Here is how to use HashSet to solve the problem in C#:

Create a HashSet instance to store unique words:

C#
1HashSet<string> wordsSet = new HashSet<string>();

Initialize another HashSet to monitor duplicates:

C#
1HashSet<string> duplicatesSet = new HashSet<string>();

Iterate the word array, filling wordsSet and duplicatesSet:

C#
1foreach (string word in words)
2{
3    if (wordsSet.Contains(word))
4    {
5        duplicatesSet.Add(word);
6    }
7    else
8    {
9        wordsSet.Add(word);
10    }
11}

Use the ExceptWith method from the HashSet API to remove all duplicated words from wordsSet:

C#
1wordsSet.ExceptWith(duplicatesSet);

Now, wordsSet only contains unique words. Find the last unique word by iterating through the original word list from the end:

C#
1string lastUniqueWord = "";
2for (int i = words.Length - 1; i >= 0; i--)
3{
4   if (wordsSet.Contains(words[i]))
5   {
6       lastUniqueWord = words[i];
7       break;
8   }
9}

And finally, return the last unique word:

C#
1string FindLastUniqueWordEfficient(string[] words)
2{
3    HashSet<string> wordsSet = new HashSet<string>();
4    HashSet<string> duplicatesSet = new HashSet<string>();
5
6    foreach (string word in words)
7    {
8        if (wordsSet.Contains(word))
9        {
10            duplicatesSet.Add(word);
11        }
12        else
13        {
14            wordsSet.Add(word);
15        }
16    }
17
18    wordsSet.ExceptWith(duplicatesSet);
19
20    string lastUniqueWord = "";
21    for (int i = words.Length - 1; i >= 0; i--)
22    {
23       if (wordsSet.Contains(words[i]))
24       {
25           lastUniqueWord = words[i];
26           break;
27       }
28    }
29
30    return lastUniqueWord;
31}

This efficient approach, with a time complexity close to O(n), is far superior to the naive method and showcases your proficiency in solving algorithmic problems with C#'s HashSet.

Problem 2: Anagram Matcher

Now, imagine a different scenario in which you have two arrays of strings, and your task is to find all the unique words from the first array that have an anagram in the second array. An anagram is a word or phrase formed by rearranging the letters of another word or phrase, such as forming "listen" from "silent".

Naive Approach

A basic approach would involve checking every word in the first array against every word in the second array by generating and comparing their sorted character strings. This results in an O(n^2) time complexity due to the pairwise comparison of words.

Here is the naive approach in C#:

C#
1List<string> FindAnagramsNaive(string[] array1, string[] array2)
2{
3    List<string> result = new List<string>();
4
5    foreach (string word1 in array1)
6    {
7        foreach (string word2 in array2)
8        {
9            if (SortCharacters(word1) == SortCharacters(word2))
10            {
11                result.Add(word1);
12                break;
13            }
14        }
15    }
16
17    return result.Distinct().ToList(); // Ensure unique matches
18}

In this naive implementation, the Distinct method is used to eliminate duplicate anagrams in the result.

Efficient Approach

We'll create a unique signature for each word by sorting its characters and then compare these signatures for matches. We'll use a dictionary to store signatures for efficient access.

Let's decompose the anagram matcher:

Construct a method to create sorted character signatures from the input string:

C#
1private static string SortCharacters(string input)
2{
3    char[] chars = input.ToCharArray();
4    Array.Sort(chars);
5    return new string(chars);
6}

Store these sorted characters from array2 in a HashSet for fast lookup:

C#
1HashSet<string> sortedWordsInArray2 = new HashSet<string>();
2foreach (string word in array2)
3{
4    sortedWordsInArray2.Add(SortCharacters(word));
5}

For each word in array1, check for its sorted signature in the HashSet and track the found anagrams:

C#
1HashSet<string> anagramsMatched = new HashSet<string>();
2List<string> result = new List<string>();
3foreach (string word in array1)
4{
5    if (sortedWordsInArray2.Contains(SortCharacters(word)))
6    {
7        if (!anagramsMatched.Contains(word))
8        {
9            result.Add(word);
10            anagramsMatched.Add(word);
11        }
12    }
13}

The List<string> result stores the matches, ensuring that we return unique anagrams, while the HashSet<string> anagramsMatched prevents duplication in our result.

Our final step is to return the list of anagrams found:

C#
1List<string> FindAnagramsEfficient(string[] array1, string[] array2)
2{
3    HashSet<string> sortedWordsInArray2 = new HashSet<string>();
4    foreach (string word in array2)
5    {
6        sortedWordsInArray2.Add(SortCharacters(word));
7    }
8
9    HashSet<string> anagramsMatched = new HashSet<string>();
10    List<string> result = new List<string>();
11    foreach (string word in array1)
12    {
13        if (sortedWordsInArray2.Contains(SortCharacters(word)) && !anagramsMatched.Contains(word))
14        {
15            result.Add(word);
16            anagramsMatched.Add(word);
17        }
18    }
19
20    return result;
21}

By utilizing HashSet in this manner, we achieve efficient anagram checking with reduced complexity, considering both the O(m log m) character sorting for each word and the O(n) comparison for n words.

Lesson Summary

In this lesson, we have utilized C#'s HashSet to improve the efficiency of solving the "Unique Echo" and "Anagram Matcher" problems. These strategies help us manage complexity by leveraging the constant-time performance of HashSet operations and maintaining the ability to efficiently manage unique collections. This steers us away from less efficient methods and aligns us with the standards expected in technical interviews. As we progress, you'll encounter hands-on practice problems, which will test your ability to apply these concepts. Through nuanced algorithmic practice with HashSet, you'll refine your skills and deepen your understanding of their computational benefits.

Enjoy this lesson? Now it's time to practice with Cosmo!

Practice is how you turn knowledge into actual skills.