Lesson 3
C++ Sets and Their Operations
Introduction

I'm delighted to welcome you to our C++ Sets lesson! Remember, std::set in C++ is similar to sets in other programming languages. It is a container that stores unique elements, following a specific order. They're especially useful when you need to ensure that elements in a collection appear only once.

In this lesson, you'll consolidate your knowledge of creating and operating on sets using std::set. You will learn about immutable sets concepts through const correctness and discover how sets enhance performance. Ready, set, go!

Creating and Manipulating Sets

Let's begin by creating a set in C++. It can be done using the std::set from the C++ Standard Library.

C++
1#include <iostream> 2#include <set> 3 4int main() { 5 // Creating a set and printing it 6 std::set<int> my_set = {1, 2, 3, 4, 5, 5, 5}; // Duplicates will be omitted 7 for(const auto& elem : my_set) 8 std::cout << elem << " "; // Output: 1 2 3 4 5 9 std::cout << std::endl; 10 11 return 0; 12}

C++ provides methods to manipulate sets, such as insert(), find(), erase(), and clear().

C++
1#include <iostream> 2#include <set> 3 4int main() { 5 std::set<int> my_set = {1, 2, 3, 4, 5}; 6 7 // Adding an element 8 my_set.insert(6); // my_set is now {1, 2, 3, 4, 5, 6} 9 10 std::cout << (my_set.find(1) != my_set.end()) << std::endl; // Output: 1 (true), as my_set includes an element 1 11 12 // Removing an element by key 13 my_set.erase(1); // my_set becomes {2, 3, 4, 5, 6} 14 15 std::cout << (my_set.find(1) != my_set.end()) << std::endl; // Output: 0 (false), as my_set doesn't include 1 anymore 16 17 // Removing an element by iterator 18 auto it = my_set.find(2); 19 if (it != my_set.end()) { 20 my_set.erase(it); // my_set becomes {3, 4, 5, 6} 21 } 22 23 // Discarding an element (no function needed, erase does nothing if element doesn't exist) 24 my_set.erase(7); // No changes - 7 doesn't exist in my_set 25 26 // Clearing the set 27 my_set.clear(); // my_set becomes empty 28 29 return 0; 30}

Both erase() methods can be used for removing elements from a set, but they behave slightly differently depending on their parameters:

  • erase(iterator): Erases an element by iterator.
  • erase(key): Erases elements by key. If the element is not found, it does nothing.

In addition to std::set, the C++ Standard Library also includes std::unordered_set, which is similar in functionality but differs in terms of element ordering. Unlike std::set, std::unordered_set does not maintain any specific order for its elements.

The methods discussed above for std::set such as insert(), find(), erase(), and clear() also apply to std::unordered_set. This makes std::unordered_set a convenient choice when order is not important and you want to prioritize faster average membership tests.

Set Operations

C++ provides built-in operations for std::set, such as union, intersection, and difference using functions and iterators from the <algorithm> header.

C++
1#include <iostream> 2#include <set> 3#include <algorithm> 4#include <iterator> 5 6int main() { 7 std::set<int> set_1 = {1, 2, 3, 4}; // First set 8 std::set<int> set_2 = {3, 4, 5, 6}; // Second set 9 10 // Set union 11 std::set<int> union_set; 12 std::set_union(set_1.begin(), set_1.end(), set_2.begin(), set_2.end(), std::inserter(union_set, union_set.begin())); 13 14 for(const auto& elem : union_set) 15 std::cout << elem << " "; // Output: 1 2 3 4 5 6 16 std::cout << std::endl; 17 18 // Set intersection 19 std::set<int> intersection_set; 20 std::set_intersection(set_1.begin(), set_1.end(), set_2.begin(), set_2.end(), std::inserter(intersection_set, intersection_set.begin())); 21 22 for(const auto& elem : intersection_set) 23 std::cout << elem << " "; // Output: 3 4 24 std::cout << std::endl; 25 26 // Set difference 27 std::set<int> difference_set; 28 std::set_difference(set_1.begin(), set_1.end(), set_2.begin(), set_2.end(), std::inserter(difference_set, difference_set.begin())); 29 30 for(const auto& elem : difference_set) 31 std::cout << elem << " "; // Output: 1 2 32 std::cout << std::endl; 33 34 return 0; 35}
  • std::set_union(): Combines elements from both sets, excluding any duplicates. In this case, the result is a set containing {1, 2, 3, 4, 5, 6}.
  • std::set_intersection(): Returns a set with only the elements that are common to both sets. For these sets, the intersection is {3, 4}.
  • std::set_difference(): Returns a set containing elements that are in the first set but not in the second set. Here, the result is {1, 2} for set_1.difference(set_2).
  • std::inserter: A function that generates an iterator which inserts elements into the container at the specified position. It is used here to insert the resulting elements of std::set_union, std::set_intersection, and std::set_difference into the destination containers (union_set, intersection_set, difference_set) at their respective beginning positions.
Immutable Sets

While C++ does not have a built-in immutable set type, const std::set can be used to create an unmodifiable set; this practice is called const correctness. This is useful in scenarios where you need to guarantee the set's elements remain unchanged after creation.

C++
1#include <iostream> 2#include <set> 3 4void printSet(const std::set<int>& my_fset) { 5 for(const auto& elem : my_fset) { 6 std::cout << elem << " "; 7 } 8 std::cout << std::endl; 9} 10 11int main() { 12 const std::set<int> my_fset = {1, 2, 3, 4, 5, 5, 5}; 13 printSet(my_fset); // Output: 1 2 3 4 5 14 15 // my_fset.insert(6); // Error: cannot modify a const set 16 return 0; 17}

Using const with a set ensures that the set cannot be modified after it's created, similar to Python's frozenset.

Performance Benefits of Sets

One of the key advantages of sets is their faster performance in membership tests, which results from their underlying data structures. The two primary types of sets in C++ — std::set and std::unordered_set — employ different structures to achieve their performance characteristics.

std::set uses a balanced binary search tree, which maintains elements in a specific sorted order. In contrast, std::unordered_set relies on a hash table. Hash tables map keys to indices in an array using a hash function, which distributes keys uniformly across the array. However, unlike std::set, elements in std::unordered_set are not stored in any particular order.

C++
1#include <iostream> 2#include <set> 3#include <unordered_set> 4#include <vector> 5#include <chrono> 6#include <algorithm> 7 8int main() { 9 std::set<int> my_set; 10 std::unordered_set<int> my_unordered_set; 11 std::vector<int> my_array; 12 for(int i = 0; i < 1000000; ++i){ 13 my_set.insert(i); 14 my_unordered_set.insert(i); 15 my_array.push_back(i); 16 } 17 18 // Sets allow for fast retrieval of elements 19 auto start = std::chrono::high_resolution_clock::now(); 20 std::cout << (my_set.find(999999) != my_set.end()) << std::endl; 21 auto end = std::chrono::high_resolution_clock::now(); 22 std::cout << "Time taken for set: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "µs" << std::endl; 23 // Time taken for set: 40µs 24 25 // Unordered sets (hash tables) find the number even faster 26 start = std::chrono::high_resolution_clock::now(); 27 std::cout << (my_unordered_set.find(999999) != my_unordered_set.end()) << std::endl; 28 end = std::chrono::high_resolution_clock::now(); 29 std::cout << "Time taken for unordered set: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "µs" << std::endl; 30 // Time taken for unordered set: 2µs 31 32 // Arrays take the longest 33 start = std::chrono::high_resolution_clock::now(); 34 std::cout << (std::find(my_array.begin(), my_array.end(), 999999) != my_array.end()) << std::endl; 35 end = std::chrono::high_resolution_clock::now(); 36 std::cout << "Time taken for array: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "ms" << std::endl; 37 // Time taken for array: 500µs 38 39 return 0; 40}

Please note that the runtimes reported in the above code snippet may vary when running the code multiple times or on different environments; that is, you will most probably get different numbers when running the code. What matters is the relative speed of retrieval for one data structure with respect to the others. Let's briefly discuss these results from a theoretical point of view:

  • Set: Membership test with std::set is efficient due to the balanced tree structure, leading to logarithmic time complexity (O(log n)).
  • Unordered set: std::unordered_set uses a hash table, which allows average constant time complexity (O(1)) for membership tests.
  • Array (vector): Checking for membership in a std::vector requires a linear search, leading to linear time complexity (O(n)).
Lesson Summary

Congratulations! You've just explored creating and manipulating sets, performing set operations, using immutable sets with const correctness, and understanding the performance benefits of sets in C++.

Remember, practice is key to solidifying your understanding. Happy coding!

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.