Hello and welcome! Today, we're exploring practical data manipulation techniques in C++. We'll use `std::vector` and `std::map` from the C++ Standard Library (often still called the STL) to represent our data and perform projection, filtering, and aggregation. And here's the star of the show: our operations will be neatly packaged within a C++ class! No mess, all clean code.
Data manipulation is akin to being a sculptor but for data. We chisel and shape our data to get the desired structure. C++ vectors and maps are perfect for this, and our operations will be conveniently bundled inside a C++ class. So, let's get our toolbox ready! Here's a simple C++ class, `DataStream`, that will serve as our toolbox:
```cpp
#include <vector>
#include <map>
#include <string>
#include <iostream>
#include <functional>

class DataStream {
public:
    DataStream(const std::vector<std::map<std::string, std::string>>& data) : data(data) {}

private:
    std::vector<std::map<std::string, std::string>> data;
};
```
Our first stop is data projection. Think of it like capturing a photo of our desired features. Suppose we have data about people. If we're only interested in names and ages, we project our data to include just these details. We'll extend our `DataStream` class with a `project` method for this:
```cpp
#include <vector>
#include <map>
#include <string>
#include <iostream>
#include <functional>

class DataStream {
public:
    DataStream(const std::vector<std::map<std::string, std::string>>& data) : data(data) {}

    // Apply projectFunc to every record and wrap the results in a new DataStream.
    DataStream project(std::function<std::map<std::string, std::string>(const std::map<std::string, std::string>&)> projectFunc) const {
        std::vector<std::map<std::string, std::string>> projectedData;
        for (const auto& entry : data) {
            projectedData.push_back(projectFunc(entry));
        }
        return DataStream(projectedData);
    }

    // Print each record on its own line as "key: value, " pairs.
    void printData() const {
        for (const auto& entry : data) {
            for (const auto& pair : entry) {
                std::cout << pair.first << ": " << pair.second << ", ";
            }
            std::cout << std::endl;
        }
    }

private:
    std::vector<std::map<std::string, std::string>> data;
};

// Let's use it!
int main() {
    DataStream ds({
        { {"name", "Alice"}, {"age", "25"}, {"profession", "Engineer"} },
        { {"name", "Bob"}, {"age", "30"}, {"profession", "Doctor"} }
    });

    DataStream projectedDs = ds.project([](const std::map<std::string, std::string>& entry) {
        return std::map<std::string, std::string>{{"name", entry.at("name")}, {"age", entry.at("age")}};
    });
    projectedDs.printData();
    // Outputs (std::map iterates its keys in sorted order, so "age" comes before "name"):
    // age: 25, name: Alice,
    // age: 30, name: Bob,
    return 0;
}
```
As you can see, `project` gives us a new `DataStream` holding just the names and ages!
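Writing a fresh lambda for every projection gets repetitive. As a minimal sketch, the helper below builds a projection function from a list of keys. Note that `selectKeys` and the `Record` alias are hypothetical names introduced here for illustration; they are not part of the lesson's `DataStream` class. Unlike `entry.at(...)`, this version skips keys that a record lacks instead of throwing.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Alias assumed for brevity; it matches the record type used by DataStream.
using Record = std::map<std::string, std::string>;

// Hypothetical helper: returns a projection function that keeps only the
// listed keys. Keys missing from a record are skipped rather than throwing.
std::function<Record(const Record&)> selectKeys(std::vector<std::string> keys) {
    return [keys = std::move(keys)](const Record& entry) {
        Record projected;
        for (const auto& key : keys) {
            auto it = entry.find(key);
            if (it != entry.end()) {
                projected.insert(*it);  // copy the matching key/value pair
            }
        }
        return projected;
    };
}
```

Because the returned object has the same signature that `project` expects, a call such as `ds.project(selectKeys({"name", "age"}))` should slot straight into the example above.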
Next, we have data filtering, which is like cherry-picking our preferred data entries. We'll extend our `DataStream` class with a `filter` method that uses a "test" lambda function to filter data:
```cpp
#include <vector>
#include <map>
#include <string>
#include <iostream>
#include <algorithm>
#include <functional>
#include <iterator>  // std::back_inserter

class DataStream {
public:
    DataStream(const std::vector<std::map<std::string, std::string>>& data) : data(data) {}

    DataStream project(std::function<std::map<std::string, std::string>(const std::map<std::string, std::string>&)> projectFunc) const {
        std::vector<std::map<std::string, std::string>> projectedData;
        for (const auto& entry : data) {
            projectedData.push_back(projectFunc(entry));
        }
        return DataStream(projectedData);
    }

    // Keep only the records for which testFunc returns true.
    DataStream filter(std::function<bool(const std::map<std::string, std::string>&)> testFunc) const {
        std::vector<std::map<std::string, std::string>> filteredData;
        std::copy_if(data.begin(), data.end(), std::back_inserter(filteredData), testFunc);
        return DataStream(filteredData);
    }

    void printData() const {
        for (const auto& entry : data) {
            for (const auto& pair : entry) {
                std::cout << pair.first << ": " << pair.second << ", ";
            }
            std::cout << std::endl;
        }
    }

private:
    std::vector<std::map<std::string, std::string>> data;
};

// Applying it:
int main() {
    DataStream ds({
        { {"name", "Alice"}, {"age", "25"}, {"profession", "Engineer"} },
        { {"name", "Bob"}, {"age", "30"}, {"profession", "Doctor"} }
    });

    auto ageTest = [](const std::map<std::string, std::string>& entry) {
        return std::stoi(entry.at("age")) > 26;
    };

    DataStream filteredDs = ds.filter(ageTest);
    filteredDs.printData();
    // Outputs (keys appear in sorted order):
    // age: 30, name: Bob, profession: Doctor,
    return 0;
}
```
With the `filter` method, we get a new `DataStream` containing only Bob's data, as he's the only one who passes the 'age over 26' test.
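One thing to watch with string-valued records: `entry.at("age")` throws `std::out_of_range` when the key is absent, and `std::stoi` throws `std::invalid_argument` when the text is not a number. The sketch below shows one way a test function could treat such records as "does not pass" instead of aborting the whole filter. The name `intFieldGreaterThan` and the `Record` alias are assumptions made for this example only.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Alias assumed for brevity; it matches the record type used by DataStream.
using Record = std::map<std::string, std::string>;

// Hypothetical helper: true only when `key` exists, its value starts with a
// parsable integer, and that integer exceeds `threshold`.
// Note: std::stoi stops at the first non-digit, so "30abc" still parses as 30.
bool intFieldGreaterThan(const Record& entry, const std::string& key, int threshold) {
    auto it = entry.find(key);
    if (it == entry.end()) {
        return false;  // missing field: the record fails the test
    }
    try {
        return std::stoi(it->second) > threshold;
    } catch (const std::invalid_argument&) {
        return false;  // value is not numeric
    } catch (const std::out_of_range&) {
        return false;  // value does not fit in an int
    }
}
```

It could be plugged into the lesson's method with a small lambda, for example `ds.filter([](const Record& e) { return intFieldGreaterThan(e, "age", 26); })`.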
Lastly, we have data aggregation, where we condense our data into a summary. We will add an `aggregate` method to our `DataStream` class for this:
```cpp
#include <vector>
#include <map>
#include <string>
#include <iostream>
#include <algorithm>
#include <functional>
#include <iterator>  // std::back_inserter
#include <numeric>

class DataStream {
public:
    DataStream(const std::vector<std::map<std::string, std::string>>& data) : data(data) {}

    DataStream project(std::function<std::map<std::string, std::string>(const std::map<std::string, std::string>&)> projectFunc) const {
        std::vector<std::map<std::string, std::string>> projectedData;
        for (const auto& entry : data) {
            projectedData.push_back(projectFunc(entry));
        }
        return DataStream(projectedData);
    }

    DataStream filter(std::function<bool(const std::map<std::string, std::string>&)> testFunc) const {
        std::vector<std::map<std::string, std::string>> filteredData;
        std::copy_if(data.begin(), data.end(), std::back_inserter(filteredData), testFunc);
        return DataStream(filteredData);
    }

    // Collect the values stored under `key` (skipping records that lack it)
    // and reduce them to a single number with aggFunc.
    double aggregate(const std::string& key, std::function<double(const std::vector<std::string>&)> aggFunc) const {
        std::vector<std::string> values;
        for (const auto& entry : data) {
            if (entry.find(key) != entry.end()) {
                values.push_back(entry.at(key));
            }
        }
        return aggFunc(values);
    }

    void printData() const {
        for (const auto& entry : data) {
            for (const auto& pair : entry) {
                std::cout << pair.first << ": " << pair.second << ", ";
            }
            std::cout << std::endl;
        }
    }

private:
    std::vector<std::map<std::string, std::string>> data;
};

// Let's put it to use
int main() {
    DataStream ds({
        { {"name", "Alice"}, {"age", "25"}, {"profession", "Engineer"} },
        { {"name", "Bob"}, {"age", "30"}, {"profession", "Doctor"} }
    });

    auto averageAgeFunc = [](const std::vector<std::string>& ages) {
        double sum = 0;
        for (const auto& age : ages) {
            sum += std::stoi(age);
        }
        return sum / ages.size();
    };

    double averageAge = ds.aggregate("age", averageAgeFunc);
    std::cout << averageAge << std::endl; // Outputs: 27.5
    return 0;
}
```
With this program, we get the average age of Alice and Bob, which is `27.5`.
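The averaging lambda divides by `ages.size()`, so an empty input (for example, after a filter that matches nothing) would produce NaN rather than a meaningful number. Here is a hedged sketch of two aggregation functions that guard against that case. The names `safeAverage` and `maxValue` are hypothetical, and returning `0.0` for empty input is an assumption you may want to change for your own data.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical aggregator: mean of the numeric strings, or 0.0 when there
// is nothing to average (an assumed convention, not the lesson's).
double safeAverage(const std::vector<std::string>& values) {
    if (values.empty()) {
        return 0.0;
    }
    double sum = std::accumulate(
        values.begin(), values.end(), 0.0,
        [](double acc, const std::string& v) { return acc + std::stod(v); });
    return sum / static_cast<double>(values.size());
}

// Hypothetical aggregator: largest numeric value, or 0.0 for empty input.
double maxValue(const std::vector<std::string>& values) {
    if (values.empty()) {
        return 0.0;
    }
    double best = std::stod(values.front());
    for (const auto& v : values) {
        best = std::max(best, std::stod(v));
    }
    return best;
}
```

Both functions match the signature that `aggregate` expects, so calls like `ds.aggregate("age", safeAverage)` or `ds.aggregate("age", maxValue)` should work with the class shown above.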
Now, let's combine projection, filtering, and aggregation to see the collective power of these techniques. We'll extend our example to demonstrate this flow:
- Data Projection: Choose only the desired fields.
- Data Filtering: Filter the data based on certain conditions.
- Data Aggregation: Summarize the filtered data.
Our `DataStream` class now includes all three methods, so we can use them together in one workflow. Because `project` and `filter` each return a new `DataStream` rather than a raw vector, the result of one step can feed directly into the next:
```cpp
#include <vector>
#include <map>
#include <string>
#include <iostream>
#include <algorithm>
#include <functional>
#include <iterator>  // std::back_inserter
#include <numeric>

class DataStream {
public:
    DataStream(const std::vector<std::map<std::string, std::string>>& data) : data(data) {}

    DataStream project(std::function<std::map<std::string, std::string>(const std::map<std::string, std::string>&)> projectFunc) const {
        std::vector<std::map<std::string, std::string>> projectedData;
        for (const auto& entry : data) {
            projectedData.push_back(projectFunc(entry));
        }
        return DataStream(projectedData);
    }

    DataStream filter(std::function<bool(const std::map<std::string, std::string>&)> testFunc) const {
        std::vector<std::map<std::string, std::string>> filteredData;
        std::copy_if(data.begin(), data.end(), std::back_inserter(filteredData), testFunc);
        return DataStream(filteredData);
    }

    double aggregate(const std::string& key, std::function<double(const std::vector<std::string>&)> aggFunc) const {
        std::vector<std::string> values;
        for (const auto& entry : data) {
            if (entry.find(key) != entry.end()) {
                values.push_back(entry.at(key));
            }
        }
        return aggFunc(values);
    }

    void printData() const {
        for (const auto& entry : data) {
            for (const auto& pair : entry) {
                std::cout << pair.first << ": " << pair.second << ", ";
            }
            std::cout << std::endl;
        }
    }

private:
    std::vector<std::map<std::string, std::string>> data;
};

// Example usage
int main() {
    DataStream ds({
        { {"name", "Alice"}, {"age", "25"}, {"profession", "Engineer"}, {"salary", "70000"} },
        { {"name", "Bob"}, {"age", "30"}, {"profession", "Doctor"}, {"salary", "120000"} },
        { {"name", "Carol"}, {"age", "35"}, {"profession", "Artist"}, {"salary", "50000"} },
        { {"name", "David"}, {"age", "40"}, {"profession", "Engineer"}, {"salary", "90000"} }
    });

    // Step 1: Project the data to include only 'name', 'age', and 'salary'
    DataStream projectedDs = ds.project([](const std::map<std::string, std::string>& entry) {
        return std::map<std::string, std::string>{{"name", entry.at("name")}, {"age", entry.at("age")}, {"salary", entry.at("salary")}};
    });

    // Step 2: Filter the projected data to include only those with age > 30
    DataStream filteredDs = projectedDs.filter([](const std::map<std::string, std::string>& entry) {
        return std::stoi(entry.at("age")) > 30;
    });

    // Step 3: Aggregate the filtered data to compute the average salary
    double averageSalary = filteredDs.aggregate("salary", [](const std::vector<std::string>& salaries) {
        double sum = 0;
        for (const auto& salary : salaries) {
            sum += std::stoi(salary);
        }
        return sum / salaries.size();
    });

    std::cout << averageSalary << std::endl; // Outputs: 70000
    return 0;
}
```
Here:
- Projection: We choose only the `name`, `age`, and `salary` fields from our data. The `project` method returns a `DataStream` object, allowing us to chain multiple operations.
- Filtering: We filter the projected data to include only those persons whose age is greater than 30, which leaves Carol and David. The `filter` method also returns a `DataStream` object for chaining.
- Aggregation: We calculate the average salary of the filtered data. The final output shows the average salary for those aged over 30, which is (50,000 + 90,000) / 2 = `70,000`.
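Since each intermediate step returns a `DataStream`, the three steps can also be written as one chained expression with no named intermediates. The sketch below uses a condensed copy of the class so that it stands alone. The `Record` alias and the function `averageSalaryOverAge` are hypothetical names added for illustration, and returning `0.0` when no record passes the filter is an assumed convention.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <map>
#include <string>
#include <vector>

using Record = std::map<std::string, std::string>;  // alias assumed for brevity

// Condensed copy of the lesson's DataStream (printData omitted).
class DataStream {
public:
    DataStream(const std::vector<Record>& data) : data(data) {}

    DataStream project(std::function<Record(const Record&)> projectFunc) const {
        std::vector<Record> out;
        std::transform(data.begin(), data.end(), std::back_inserter(out), projectFunc);
        return DataStream(out);
    }

    DataStream filter(std::function<bool(const Record&)> testFunc) const {
        std::vector<Record> out;
        std::copy_if(data.begin(), data.end(), std::back_inserter(out), testFunc);
        return DataStream(out);
    }

    double aggregate(const std::string& key,
                     std::function<double(const std::vector<std::string>&)> aggFunc) const {
        std::vector<std::string> values;
        for (const auto& entry : data) {
            auto it = entry.find(key);
            if (it != entry.end()) values.push_back(it->second);
        }
        return aggFunc(values);
    }

private:
    std::vector<Record> data;
};

// Hypothetical function: project -> filter -> aggregate in a single chain.
double averageSalaryOverAge(const DataStream& ds, int minAge) {
    return ds
        .project([](const Record& e) {
            return Record{{"age", e.at("age")}, {"salary", e.at("salary")}};
        })
        .filter([minAge](const Record& e) { return std::stoi(e.at("age")) > minAge; })
        .aggregate("salary", [](const std::vector<std::string>& salaries) {
            if (salaries.empty()) return 0.0;  // assumed result for "no matches"
            double sum = 0;
            for (const auto& s : salaries) sum += std::stod(s);
            return sum / salaries.size();
        });
}
```

With the four-person dataset from the example, `averageSalaryOverAge(ds, 30)` should give the same 70000 as the step-by-step version. Each call in the chain copies the data into a new `DataStream`, which keeps the original untouched at the cost of some extra allocation.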
By combining these methods, our data manipulation becomes both powerful and concise. Try experimenting and see what you can create!
Brilliant job! You've now grasped the basics of data projection, filtering, and aggregation using C++ vectors and maps. Plus, you've learned to package these operations in a C++ class: a neat bundle of reusable code magic!
Now, why not try applying these fresh skills with some practice exercises? They're just around the corner. Ready? Let's dive into more fun with data manipulation!