Welcome back! Today, we are focusing on solving complex algorithmic problems using Python dictionaries — the data structure that enables us to store data in key-value pairs. This powerful data structure facilitates lightning-fast retrieval and insertion operations, playing a pivotal role in algorithm efficiency. We will be tackling two carefully selected problems designed to highlight the practical uses of dictionaries: "Majority Vote Problem" and "Implement a Keyword Index". These problems serve to illustrate real-world applications of dictionaries, helping us understand them in a deeper, more meaningful way. Let's dive in!
Our first problem is about identifying the "majority" element in a list. The "majority element" in a list is an element that appears more than n / 2
times. Given a list of integers, our aim is to identify the majority element.
This problem could arise on numerous occasions. Imagine running a survey where each participant selects a number from 1
to n
to rate a product. After the survey, you want to find out if there is a feature that received more than n / 2
votes. Or, consider an internet voting system for an online event. You may need to identify if a candidate is leading by more than half the total votes.
One naive approach is for each element, iterate over the entire list, and count the number of occurrences. However, this approach could lead to $O(N^2)$ time complexity, which is not suitable for a large dataset, where $N$ is the size of the list.
A more efficient approach is to use a Python dictionary to count the occurrences of each element in the list. If the count of any element exceeds n / 2
at any point during our iteration, we immediately return that element as the "majority element". If no such element is found, we return -1
after we've iterated through all elements.
Let's break down the solution step-by-step:
First, let's set up an empty dictionary count_dict
:
Python1count_dict = {}
This dictionary will help us keep track of the count or occurrences of each element in our list.
We iterate through the elements of listA
. For each element, we increment its count in count_dict
. If the count exceeds n / 2
, we return that element:
Python1for element in listA: 2 count_dict[element] = count_dict.get(element, 0) + 1 3 if count_dict[element] > len(listA) // 2: 4 return element
Here, the method dict.get(key, default)
returns the value for a key if it exists in the dictionary. If not, it returns the default value.
If no majority element is found after the full iteration, we return -1
:
Python1return -1
Such a default value is a common approach in problems where the answer may not exist.
Python1def solution(listA): 2 count_dict = {} 3 for element in listA: 4 count_dict[element] = count_dict.get(element, 0) + 1 5 if count_dict[element] > len(listA) // 2: 6 return element 7 return -1
On a separate note, do you think we can apply defaultdict
here? How will the code change in that case?
Congratulations! You have successfully optimized the solution to the majority vote problem using Python dictionaries.
Moving on from counting pair frequencies, we arrive at our second problem: creating a keyword index. In this situation, we are given a list of strings, with each string representing a document. Our task is to generate an index of all the distinct words in the documents for quick reference. We need to create a dictionary where each unique word is a key, and the corresponding value is a list of indices pointing to the documents where the word can be found.
Imagine you are building a search feature for an app, and you need a function that quickly retrieves all documents where a certain keyword is present. If you have ever searched for something using your browser's Ctrl+F
or Command+F
function, you have utilized such an index! Our task is to create a dictionary that behaves like such an index, mapping a keyword to all the documents where it can be found.
A naive solution to this problem could involve iterating through the entire list for each word storing document indices for each word. While this approach does yield the outcome we are looking for, it is inefficient with a time complexity of $O(N^2)$.
Again, dictionaries provide an efficient solution. As we pass through each document, we update our dictionary — each word becomes a key, and every time we encounter it, we append the current document's index to the list of values for that key. This approach requires only linear time in the total number of words in all documents — a significant improvement on the previous method.
Let's work through the solution. We first initialize an empty dictionary index
. Then, we start looping through our list of documents. For each document we split each document into words, and for each word, we check if it is already in our dictionary index
. If it's in the dictionary, we append the current document's index to the list of indices for that word. If the word is not in the dictionary, then it's the first time we've seen this word, and we add it to the index with a list that contains the current document's index as the value. Here's how to do it:
Python1def keyword_index(docs): 2 index = {} 3 for doc_idx, doc in enumerate(docs): 4 for word in doc.split(): 5 if word in index: 6 index[word].append(doc_idx) 7 else: 8 index[word] = [doc_idx] 9 return index
As evident from the function, the keyword_index
function runs in $O(N)$ time, achieving the task efficiently with the help of dictionaries.
In this lesson, we have taken a hands-on approach to solving practical problems using dictionaries in Python. Through these problems, we have learned how dictionaries can be used to efficiently track frequencies and generate indexes — two tasks that are widely used in real-world applications.
To recap, we have covered:
Now that we have covered the theory and seen these concepts in action, it's time to get some hands-on experience! The upcoming practice exercises will help you understand dictionaries better. Apply these techniques to solve complex problems and enhance your skill set. Let's go!