Lesson 3
Mastering Data Aggregation and Data Stream Handling with C++
Introduction

Welcome to our lesson on mastering data aggregation and data streams with C++. In this lesson, you'll learn to build a basic sales records aggregator using C++'s standard library containers. Then, we'll extend its functionality to handle more complex operations such as filtering, data aggregation, and formatting. By the end of this session, you'll be proficient in managing and formatting data streams efficiently in C++.

Starter Task Methods and Their Definitions

To get started, we'll create a simple sales record aggregator in C++. Here are the methods we'll focus on:

  • void add_sale(const std::string& sale_id, double amount, const std::string& date); - Adds or updates a sale record with a unique identifier sale_id, amount, and a date in the format "YYYY-MM-DD".

  • std::optional<double> get_sale(const std::string& sale_id) const; - Retrieves the sale amount associated with the sale_id. If the sale does not exist, it returns an empty optional.

  • bool delete_sale(const std::string& sale_id); - Deletes the sale record with the given sale_id. Returns true if the sale was deleted and false if the sale does not exist.

Are these methods clear so far? Great! Let's now look at how we would implement them.

Starter Task Implementation

Here is the complete code for the starter task:

C++
#include <iostream>
#include <map>
#include <optional>
#include <string>
#include <utility>

class SalesAggregator {
public:
    // Adds a new sale or overwrites the record stored under an existing sale_id.
    void add_sale(const std::string& sale_id, double amount, const std::string& date) {
        sales[sale_id] = {amount, date};
    }

    // Returns the amount for sale_id, or std::nullopt if no such sale exists.
    std::optional<double> get_sale(const std::string& sale_id) const {
        auto it = sales.find(sale_id);
        if (it != sales.end()) {
            return it->second.first;
        }
        return std::nullopt;
    }

    // Returns true if a record was removed, false if sale_id was not present.
    bool delete_sale(const std::string& sale_id) {
        return sales.erase(sale_id) > 0;
    }

private:
    // sale_id -> (amount, date)
    std::map<std::string, std::pair<double, std::string>> sales;
};

// Example Usage
int main() {
    SalesAggregator aggregator;

    // Add sales
    aggregator.add_sale("001", 100.50, "2023-01-01");
    aggregator.add_sale("002", 200.75, "2023-01-15");

    // Get sale
    if (auto sale = aggregator.get_sale("001")) {
        std::cout << *sale << std::endl; // Output: 100.5
    }

    // Delete sale
    std::cout << std::boolalpha << aggregator.delete_sale("002") << std::endl; // Output: true
    if (auto sale = aggregator.get_sale("002")) {
        std::cout << *sale << std::endl;
    } else {
        std::cout << "Sale not found" << std::endl; // Output: Sale not found
    }

    return 0;
}

Explanation:

  • The sales map stores sale records with sale_id as the key and a pair of amount and date as the value.
  • add_sale adds a new sale or updates an existing sale ID.
  • get_sale retrieves the amount for a given sale ID or returns std::nullopt if the sale does not exist.
  • delete_sale removes the sale record for the given sale ID and returns true, or returns false if the sale does not exist.

Now that we have our basic aggregator, let's extend it to include more advanced functionalities.

New Methods and Their Definitions

To add complexity and usefulness to our sales aggregator, we'll introduce some additional methods for advanced data aggregation, filtering, and formatting functionalities.

  • std::pair<int, double> aggregate_sales(double min_amount = 0) const; - Returns the number of sales and the total sales amount, counting only sales whose amount is strictly greater than min_amount.

  • std::string format_sales(double min_amount = 0) const; - Returns the sales data, filtered by min_amount, formatted as a plain text string. Includes sales statistics in the output.

  • std::vector<std::tuple<std::string, double, std::string>> get_sales_in_date_range(const std::string& start_date, const std::string& end_date) const; - Retrieves all sales that occurred within the given date range, inclusive. Each sale includes sale_id, amount, and date.

Let's implement these methods step by step.

Step 1: Implementing the 'Aggregate Sales' Method

We'll create the aggregate_sales method:

C++
#include <iostream>
#include <map>
#include <string>
#include <utility>

class SalesAggregator {
public:
    // Returns {number of sales, total amount} over sales with amount > min_amount.
    std::pair<int, double> aggregate_sales(double min_amount = 0) const {
        int total_sales = 0;
        double total_amount = 0.0;
        for (const auto& [sale_id, data] : sales) {
            if (data.first > min_amount) {
                total_sales++;
                total_amount += data.first;
            }
        }
        return {total_sales, total_amount};
    }

    void add_sale(const std::string& sale_id, double amount, const std::string& date) {
        sales[sale_id] = {amount, date};
    }

private:
    // sale_id -> (amount, date)
    std::map<std::string, std::pair<double, std::string>> sales;
};

// Example Usage
int main() {
    SalesAggregator aggregator;
    aggregator.add_sale("001", 100.50, "2023-01-01");
    aggregator.add_sale("002", 200.75, "2023-01-15");

    auto result = aggregator.aggregate_sales(50);
    std::cout << "Total Sales: " << result.first << ", Total Amount: " << result.second << std::endl;
    // Output: Total Sales: 2, Total Amount: 301.25

    return 0;
}

This method iterates over the sales map, counting and summing the amounts of those sales that exceed min_amount, and returns both values as a pair.

Step 2: Implementing the 'Format Sales' Method

Next, we create the format_sales method to output data in plain text format.

C++
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

class SalesAggregator {
public:
    std::string format_sales(double min_amount = 0) const {
        std::ostringstream oss; // Output string stream that accumulates the formatted text.
        auto statistics = aggregate_sales(min_amount);

        oss << "Sales:\n";
        for (const auto& [sale_id, data] : sales) {
            if (data.first > min_amount) {
                oss << "Sale ID: " << sale_id << ", Amount: " << data.first << ", Date: " << data.second << "\n";
            }
        }

        oss << "Summary:\nTotal Sales: " << statistics.first << ", Total Amount: " << statistics.second << "\n";

        return oss.str(); // Returns the accumulated contents as a single string.
    }

    std::pair<int, double> aggregate_sales(double min_amount = 0) const {
        int total_sales = 0;
        double total_amount = 0.0;
        for (const auto& [sale_id, data] : sales) {
            if (data.first > min_amount) {
                total_sales++;
                total_amount += data.first;
            }
        }
        return {total_sales, total_amount};
    }

    void add_sale(const std::string& sale_id, double amount, const std::string& date) {
        sales[sale_id] = {amount, date};
    }

private:
    // sale_id -> (amount, date)
    std::map<std::string, std::pair<double, std::string>> sales;
};

// Example Usage
int main() {
    SalesAggregator aggregator;
    aggregator.add_sale("001", 100.50, "2023-01-01");
    aggregator.add_sale("002", 200.75, "2023-01-15");

    std::cout << aggregator.format_sales(50) << std::endl;
    // Output:
    // Sales:
    // Sale ID: 001, Amount: 100.5, Date: 2023-01-01
    // Sale ID: 002, Amount: 200.75, Date: 2023-01-15
    // Summary:
    // Total Sales: 2, Total Amount: 301.25

    return 0;
}

In this method, we use std::ostringstream, a stream class that writes into an in-memory string. It lets you format text the same way you would with std::cout. We append formatted data to the stream using the insertion operator (<<), then call str() to retrieve the final string. This keeps the formatting type-safe and avoids manual string concatenation and number-to-string conversions.

Step 3: Implementing the 'Get Sales in Date Range' Method

Let's implement get_sales_in_date_range, which relies on parsing dates and comparing them to filter sales records.

C++
#include <chrono>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

class SalesAggregator {
public:
    std::vector<std::tuple<std::string, double, std::string>> get_sales_in_date_range(const std::string& start_date, const std::string& end_date) const {
        std::vector<std::tuple<std::string, double, std::string>> result;

        // Helper function to parse a date string formatted as "YYYY-MM-DD" into a time_point for comparison.
        // Note: a failed parse is not detected here; checking ss.fail() would be one way to do so.
        auto parse_date = [](const std::string& s) -> std::chrono::system_clock::time_point {
            std::istringstream ss(s);
            std::tm tm = {};
            ss >> std::get_time(&tm, "%Y-%m-%d"); // Reads time from string into a tm struct
            // std::mktime interprets tm as local time and yields a time_t, which is then converted to a time_point
            return std::chrono::system_clock::from_time_t(std::mktime(&tm));
        };

        // Parse the start and end dates
        auto start = parse_date(start_date);
        auto end = parse_date(end_date);

        // Iterate over the sales records to find those within the date range (inclusive)
        for (const auto& [sale_id, data] : sales) {
            auto sale_date = parse_date(data.second);
            if (sale_date >= start && sale_date <= end) {
                result.emplace_back(sale_id, data.first, data.second);
            }
        }
        return result;
    }

    void add_sale(const std::string& sale_id, double amount, const std::string& date) {
        sales[sale_id] = {amount, date};
    }

private:
    // sale_id -> (amount, date)
    std::map<std::string, std::pair<double, std::string>> sales;
};

// Example Usage
int main() {
    SalesAggregator aggregator;
    aggregator.add_sale("001", 100.50, "2023-01-01");
    aggregator.add_sale("002", 200.75, "2023-01-15");

    auto sales_in_range = aggregator.get_sales_in_date_range("2023-01-01", "2023-12-31");
    for (const auto& sale : sales_in_range) {
        std::cout << "Sale ID: " << std::get<0>(sale) << ", Amount: " << std::get<1>(sale) << ", Date: " << std::get<2>(sale) << std::endl;
    }
    // Output:
    // Sale ID: 001, Amount: 100.5, Date: 2023-01-01
    // Sale ID: 002, Amount: 200.75, Date: 2023-01-15

    return 0;
}

In this implementation, parse_date is a helper lambda that converts a date string into a std::chrono::system_clock::time_point so that dates can be compared with the usual relational operators. It reads the date into a std::tm structure with std::get_time, converts that structure to a time_t with std::mktime (which interprets it as local time), and then converts the time_t to a time_point with std::chrono::system_clock::from_time_t. Both range boundaries are inclusive, so a sale dated exactly on start_date or end_date is returned.

Lesson Summary

Congratulations! You've extended a basic C++ sales aggregator into a more capable one that filters, aggregates, and formats data, using standard library containers such as std::map, std::pair, and std::tuple to hold the records. These skills carry over directly to managing larger datasets and data streams in C++. Feel free to experiment with similar challenges to reinforce your understanding. Well done!
