Hello and welcome! Today, we're exploring practical data manipulation techniques in Java. We'll use Java lists to represent our data stream and perform projection, filtering, and aggregation. And here's the star of the show: our operations will be neatly packaged within a Java class! No mess, all clean code.
Data manipulation is a bit like sculpting: we chisel and shape our data until it has the structure we want. Java lists are well suited to this, and our operations will be conveniently bundled inside a Java class. So, let's get our toolbox ready! Here's a simple Java class, `DataStream`, that will serve as our toolbox:
```java
import java.util.List;

public class DataStream<T> {
    private List<T> data;

    public DataStream(List<T> data) {
        this.data = data;
    }

    public List<T> getData() {
        return data;
    }
}
```
Our first stop is data projection. Think of it like capturing a photo of our desired features. Suppose we have data about people. If we're only interested in names and ages, we project our data to include just these details. We'll extend our `DataStream` class with a `projectData` method for this. The method finds each field's getter by reflection, so the key `"Name"` maps to `getName()` and the key `"Age"` maps to `getAge()`:
```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DataStream<T> {
    private List<T> data;

    public DataStream(List<T> data) {
        this.data = data;
    }

    public List<Map<String, Object>> projectData(List<String> keys) {
        return data.stream()
            .map(d -> {
                Map<String, Object> projectedMap = new HashMap<>();
                for (String key : keys) {
                    try {
                        // Look up the getter by name, e.g. "Name" -> getName()
                        projectedMap.put(key, d.getClass().getDeclaredMethod("get" + key).invoke(d));
                    } catch (Exception e) {
                        // A misspelled key ends up here and is left out of the map
                        e.printStackTrace();
                    }
                }
                return projectedMap;
            })
            .collect(Collectors.toList());
    }
}

class Person {
    private String name;
    private int age;
    private String profession;

    // Constructor and getters
    public Person(String name, int age, String profession) {
        this.name = name;
        this.age = age;
        this.profession = profession;
    }

    public String getName() { return name; }
    public int getAge() { return age; }
    public String getProfession() { return profession; }
}

// Each public class goes in its own file (here, Main.java)
public class Main {
    public static void main(String[] args) {
        List<Person> people = List.of(
            new Person("Alice", 25, "Engineer"),
            new Person("Bob", 30, "Doctor")
        );

        DataStream<Person> ds = new DataStream<>(people);
        List<Map<String, Object>> projectedData = ds.projectData(List.of("Name", "Age"));

        for (Map<String, Object> item : projectedData) {
            System.out.println(item.get("Name") + ", " + item.get("Age"));
        }
        // Outputs:
        // Alice, 25
        // Bob, 30
    }
}
```
As you can see, we now have a new list with just the names and ages!
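Looking up getters by name is flexible, but a misspelled key is only noticed at run time. As a point of comparison, here is a minimal sketch of a projection that takes getter functions instead of strings. The `ProjectionSketch` class, its `project` helper, and the trimmed-down `Person` are hypothetical names used for illustration, not part of the lesson's `DataStream`:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ProjectionSketch {
    // Minimal stand-in for the Person class used in this lesson.
    static class Person {
        private final String name;
        private final int age;
        private final String profession;

        Person(String name, int age, String profession) {
            this.name = name;
            this.age = age;
            this.profession = profession;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getProfession() { return profession; }
    }

    // Hypothetical helper: each output key is paired with a function that extracts its value.
    static <T> List<Map<String, Object>> project(List<T> data,
                                                 Map<String, Function<T, Object>> getters) {
        return data.stream()
                .map(item -> {
                    // LinkedHashMap keeps the keys in the order they were supplied
                    Map<String, Object> row = new LinkedHashMap<>();
                    getters.forEach((key, getter) -> row.put(key, getter.apply(item)));
                    return row;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Alice", 25, "Engineer"),
                new Person("Bob", 30, "Doctor"));

        Map<String, Function<Person, Object>> getters = new LinkedHashMap<>();
        getters.put("Name", Person::getName);
        getters.put("Age", Person::getAge);

        for (Map<String, Object> row : project(people, getters)) {
            System.out.println(row.get("Name") + ", " + row.get("Age"));
        }
        // Outputs:
        // Alice, 25
        // Bob, 30
    }
}
```

With method references such as `Person::getName`, a wrong getter name fails at compile time rather than being printed as a stack trace while the program runs.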
Next, we have data filtering, which is like cherry-picking our preferred data entries. We'll extend our `DataStream` class with a `filterData` method that uses a "test" function to filter data:
```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DataStream<T> {
    // Existing code...

    public DataStream<T> filterData(Predicate<T> predicate) {
        // Keep only the elements for which the predicate returns true
        List<T> filteredData = data.stream()
            .filter(predicate)
            .collect(Collectors.toList());
        return new DataStream<>(filteredData);
    }
}

public class Main {
    public static void main(String[] args) {
        List<Person> people = List.of(
            new Person("Alice", 25, "Engineer"),
            new Person("Bob", 30, "Doctor")
        );

        DataStream<Person> ds = new DataStream<>(people);
        DataStream<Person> filteredDs = ds.filterData(person -> person.getAge() > 26);

        for (Person item : filteredDs.getData()) {
            System.out.println(item.getName() + ", " + item.getAge() + ", " + item.getProfession());
        }
        // Outputs:
        // Bob, 30, Doctor
    }
}
```
With the filter method, our output is a list with only Bob’s data, as he's the only one who passes the "age over 26" test.
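Because the "test" is an ordinary `java.util.function.Predicate`, several tests can be combined before they are passed to a filter. The sketch below uses a hypothetical `FilterSketch.filter` helper that mirrors what `filterData` does internally, and plain ages instead of `Person` objects to keep it short:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FilterSketch {
    // Hypothetical helper mirroring the body of filterData.
    static <T> List<T> filter(List<T> data, Predicate<T> predicate) {
        return data.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = List.of(25, 30, 35, 40);

        Predicate<Integer> over26 = age -> age > 26;
        Predicate<Integer> under40 = age -> age < 40;

        // and() keeps elements that pass both tests
        System.out.println(filter(ages, over26.and(under40))); // [30, 35]
        // negate() flips a test
        System.out.println(filter(ages, over26.negate()));     // [25]
    }
}
```

The same `and`, `or`, and `negate` combinators should work with the lambda passed to `filterData`, since it accepts a `Predicate<T>`.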
Last is data aggregation, where we condense our data into a summary. We will add an `aggregateData` method to our `DataStream` class for this:
```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DataStream<T> {
    // Existing code...

    public double aggregateData(Function<List<Double>, Double> aggregationFunction,
                                String key) {
        List<Double> values = data.stream()
            .map(d -> {
                try {
                    // getAge() returns an int, which reflection boxes as an Integer.
                    // Casting that to Double would fail, so convert through Number.
                    Object raw = d.getClass().getDeclaredMethod("get" + key).invoke(d);
                    return ((Number) raw).doubleValue();
                } catch (Exception e) {
                    e.printStackTrace();
                    return 0.0;
                }
            })
            .collect(Collectors.toList());

        return aggregationFunction.apply(values);
    }
}

public class Main {
    public static void main(String[] args) {
        List<Person> people = List.of(
            new Person("Alice", 25, "Engineer"),
            new Person("Bob", 30, "Doctor")
        );

        DataStream<Person> ds = new DataStream<>(people);
        double averageAge = ds.aggregateData(
            values -> values.stream().mapToDouble(v -> v).average().orElse(0),
            "Age"
        );

        System.out.println(averageAge); // Outputs: 27.5
    }
}
```
With this script, we calculate the average age, which is `27.5`.
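When several summaries of the same field are needed, the standard library's `DoubleSummaryStatistics` can collect them in one pass. The following is only a sketch: `AggregateSketch` and its `summarize` helper are hypothetical, and the helper takes a typed extractor function rather than a getter name:

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.function.ToDoubleFunction;

class AggregateSketch {
    // Hypothetical helper: the caller supplies a typed extractor instead of a getter name.
    static <T> DoubleSummaryStatistics summarize(List<T> data, ToDoubleFunction<T> extractor) {
        return data.stream().mapToDouble(extractor).summaryStatistics();
    }

    public static void main(String[] args) {
        // The ages of Alice and Bob from the example above
        List<Integer> ages = List.of(25, 30);

        DoubleSummaryStatistics stats = summarize(ages, Integer::doubleValue);
        System.out.println(stats.getAverage()); // 27.5
        System.out.println(stats.getMax());     // 30.0
        System.out.println(stats.getCount());   // 2
    }
}
```

For a list of `Person` objects, the extractor could be `Person::getAge`, which avoids reflection and the numeric conversion entirely.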
Now, let's combine projection, filtering, and aggregation to see the collective power of these techniques. We'll extend our example to demonstrate this flow:
- Data Projection: Choose only the desired fields.
- Data Filtering: Filter the data based on certain conditions.
- Data Aggregation: Summarize the filtered data.
We'll modify our `DataStream` class to include all the methods and then use them together in a workflow. On top of that, `projectData` will now return a `DataStream<Map<String, Object>>` instead of a list, just as `filterData` already returns a `DataStream<T>`, so that we can chain these methods when calling them:
```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DataStream<T> {
    private List<T> data;

    public DataStream(List<T> data) {
        this.data = data;
    }

    public DataStream<Map<String, Object>> projectData(List<String> keys) {
        List<Map<String, Object>> projectedData = data.stream()
            .map(d -> {
                Map<String, Object> projectedMap = new HashMap<>();
                for (String key : keys) {
                    try {
                        projectedMap.put(key, d.getClass().getDeclaredMethod("get" + key).invoke(d));
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
                return projectedMap;
            })
            .collect(Collectors.toList());

        return new DataStream<>(projectedData);
    }

    public DataStream<T> filterData(Predicate<T> predicate) {
        List<T> filteredData = data.stream()
            .filter(predicate)
            .collect(Collectors.toList());
        return new DataStream<>(filteredData);
    }

    public double aggregateData(Function<List<Double>, Double> aggregationFunction,
                                String key) {
        List<Double> values = data.stream()
            .map(d -> {
                try {
                    // Projected rows are maps and have no getters, so read the value by key;
                    // for any other object, fall back to calling its getter by reflection.
                    Object raw = (d instanceof Map)
                        ? ((Map<?, ?>) d).get(key)
                        : d.getClass().getDeclaredMethod("get" + key).invoke(d);
                    // Convert through Number so that int and double fields both work
                    return ((Number) raw).doubleValue();
                } catch (Exception e) {
                    e.printStackTrace();
                    return 0.0;
                }
            })
            .collect(Collectors.toList());

        return aggregationFunction.apply(values);
    }
}

class Person {
    private String name;
    private int age;
    private String profession;
    private double salary;

    // Constructor and getters
    public Person(String name, int age, String profession, double salary) {
        this.name = name;
        this.age = age;
        this.profession = profession;
        this.salary = salary;
    }

    public String getName() { return name; }
    public int getAge() { return age; }
    public double getSalary() { return salary; }
    public String getProfession() { return profession; }
}

// Each public class goes in its own file (here, Main.java)
public class Main {
    public static void main(String[] args) {
        List<Person> people = List.of(
            new Person("Alice", 25, "Engineer", 70000),
            new Person("Bob", 30, "Doctor", 120000),
            new Person("Carol", 35, "Artist", 50000),
            new Person("David", 40, "Engineer", 90000)
        );

        // Step 1: Project the data to include only 'Name', 'Age', and 'Salary'
        DataStream<Map<String, Object>> projectedDs = new DataStream<>(people)
            .projectData(List.of("Name", "Age", "Salary"));

        // Step 2: Filter the projected data to include only those with age > 30
        DataStream<Map<String, Object>> filteredDs = projectedDs.filterData(
            map -> (int) map.get("Age") > 30
        );

        // Step 3: Aggregate the filtered data to compute the average salary
        double averageSalary = filteredDs.aggregateData(
            salaries -> salaries.stream().mapToDouble(v -> v).average().orElse(0),
            "Salary"
        );

        System.out.println(averageSalary); // Outputs: 70000.0
    }
}
```
Here:
- Projection: We choose only the `Name`, `Age`, and `Salary` fields from our data. The `projectData` method now returns a `DataStream<Map<String, Object>>` object, allowing us to chain multiple operations.
- Filtering: We filter the projected data to include only those persons whose age is greater than 30. The `filterData` method also returns a `DataStream<T>` object for chaining.
- Aggregation: We calculate the average salary of the filtered data. The final output shows the average salary for those aged over 30, which is `70,000`.
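For comparison, the same three steps can be written directly against the Java stream API, without a wrapper class. This is a rough sketch rather than a replacement for `DataStream`; `PipelineSketch`, `averageSalaryOver`, and the trimmed-down `Person` are hypothetical names:

```java
import java.util.List;

class PipelineSketch {
    // Minimal stand-in for the lesson's Person class (profession omitted).
    static class Person {
        private final String name;
        private final int age;
        private final double salary;

        Person(String name, int age, double salary) {
            this.name = name;
            this.age = age;
            this.salary = salary;
        }

        String getName() { return name; }
        int getAge() { return age; }
        double getSalary() { return salary; }
    }

    // Hypothetical helper: average salary of everyone older than minAge, or 0 if nobody matches.
    static double averageSalaryOver(List<Person> people, int minAge) {
        return people.stream()
                .filter(p -> p.getAge() > minAge)  // filtering
                .mapToDouble(Person::getSalary)    // projection down to a single field
                .average()                         // aggregation
                .orElse(0);
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Alice", 25, 70000),
                new Person("Bob", 30, 120000),
                new Person("Carol", 35, 50000),
                new Person("David", 40, 90000));

        System.out.println(averageSalaryOver(people, 30)); // 70000.0
    }
}
```

Here the filter runs before the projection, which is fine because each step still sees the field it needs. The `DataStream` class wraps the same ideas behind named methods, which can make the intent of each step easier to read.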
By combining these methods, our data manipulation becomes both powerful and concise. Try experimenting and see what you can create!
Brilliant job! You've now grasped the basics of data projection, filtering, and aggregation on Java lists. Plus, you've learned to package these operations in a Java class, a neat bundle of reusable code magic!
Now, why not try applying these fresh skills with some practice exercises? They're just around the corner. Ready? Let's dive into more fun with data manipulation!