Lesson 5
Efficient Set Operations with Sorted Data Structures
Introduction

Greetings, aspiring coders! Today, we're going to delve deep into the complexities of data structures, specifically the SortedList, and explore how to handle queries efficiently. This is a common problem, often encountered in numerous data science and algorithmic problems. So let's gear up to unravel the mysteries of SortedList operations, and get our hands dirty with some interactive problem solving!

SortedList Operations and Time Complexity

Before delving into the task, let's understand what a SortedList is and why we would use it. SortedList is a data structure from the sortedcontainers Python module. As the name suggests, it keeps the data sorted in an ascending order after every insertion or deletion.

Advantages of using SortedList:

  1. Extracting minimum (sorted_list[0]) or maximum (sorted_list[-1]) values will be a constant time operation, i.e., O(1)O(1) as they are always at the start or end of the list.
  2. Achieving sorted order after every insertion or deletion is quicker with SortedList (with time complexity O(logN)O(log N)) compared to re-sorting a normal list after every mutation (which has a time complexity O(NlogN)O(N log N)).

Understanding these operations can help us utilize SortedList efficiently for our problem.

The bisect Functions

The SortedList data structure includes a useful function called bisect_right.

The bisect_right function finds the insertion point for a given value in the sorted list to maintain sorted order. If the element already exists in the list, the insertion point is after (or to the right of) any existing entries. The method returns an index representing the first element in the list that is greater than the value provided.

For example, if we have a SortedList as [1, 2, 4, 6, 8], bisect_right(4) will return 3, as index 3 is the first element greater than 4, and this is where 4 would be inserted to maintain sorted order if duplicates were allowed in the list.

Similarly, the bisect_left function finds the leftmost insertion point for a given value in the sorted list, which also happens to be the index of the first element that is not less than the value, i.e., equal to or greater than the value.

Both operations are performed quickly, with a time complexity of O(logN)O(log N).

Here is an example of how you would use bisect_left and bisect_right on a SortedList:

Python
1from sortedcontainers import SortedList 2 3sorted_list = SortedList([1, 2, 4, 4, 6]) 4idx_1 = sorted_list.bisect_left(4) 5idx_2 = sorted_list.bisect_right(4) 6print(idx_1, idx_2) # Output: 2 4
Task Statement

We are tasked with designing a Python function named process_queries(), that can process a series of distinct requests or queries efficiently. The queries comprise a list of two integers — type of operation and the operand.

There are three types of operations we'll handle:

  • Adding an integer to the set (operation type 0)
  • Removing an integer from the set (operation type 1). Whenever this operation is invoked, we can guarantee that the integer exists in the set.
  • Finding the smallest integer that is greater than or equal to a given value (operation type 2).

The function should return the current size of the set when operation type is 0 or 1, and the smallest possible integer when operation type is 2. If such an integer does not exist, the function should return -1.

Given a list of queries:

Python
1[ 2 [0, 10], 3 [2, 10], 4 [0, 20], 5 [1, 10], 6 [2, 10] 7]

The function should return: [1, 10, 2, 1, 20]

Solution Building: Step 1

To start, we'll initialize our SortedList from the sortedContainers library in Python. We'll also create an empty list labeled 'results' to store the outputs for each request.

Python
1from sortedcontainers import SortedList 2 3def process_queries(queries): 4 sorted_list = SortedList() 5 results = []
Solution Building: Step 2

Next, we utilize a for loop to traverse through all the queries. For an operation type of 0 or 1, we either add or remove the provided value from our sorted list. Subsequently, we append the size of the current list to 'results'.

Python
1from sortedcontainers import SortedList 2 3def process_queries(queries): 4 sorted_list = SortedList() 5 results = [] 6 7 for query in queries: 8 operation, value = query 9 10 if operation == 0: 11 sorted_list.add(value) 12 elif operation == 1: 13 sorted_list.discard(value) 14 15 results.append(len(sorted_list))
Solution Building: Step 3

Lastly, when the operation type is 2, we need to find the minimum bound, i.e., the smallest value greater than or equal to our provided value in the set. We perform this using the bisect_left operation from our SortedList. If such a value does not exist, we append -1 to 'results'.

Python
1from sortedcontainers import SortedList 2 3def process_queries(queries): 4 sorted_list = SortedList() 5 results = [] 6 7 for query in queries: 8 operation, value = query 9 10 if operation == 0: 11 sorted_list.add(value) 12 elif operation == 1: 13 sorted_list.discard(value) 14 elif operation == 2: 15 idx = sorted_list.bisect_left(value) 16 if idx != len(sorted_list): 17 results.append(sorted_list[idx]) 18 continue 19 results.append(-1) 20 continue 21 22 results.append(len(sorted_list)) 23 24 return results
Lesson Summary

Well done! You've successfully navigated the complexities of SortedList operations and developed an understanding of how to handle various types of queries efficiently using Python. Resolving the problem involved incorporating Python lists, conditional statements, and an understanding of binary search within a sorted list.

The next step in your learning journey involves tackling similar challenges on your own using the concepts that you've just learned. Be sure to review this lesson as needed, and always remember: practice and apply these concepts. Happy coding!

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.