Lesson 4
Reading and Processing CSV Files in Batches with C#
Introduction to Reading Data in Batches with C#

In previous lessons, you learned how to handle datasets stored in compressed formats and manage large numerical datasets efficiently. Building on that foundation, today's lesson will teach you how to read and process data in batches from multiple CSV files using C#. Working through data in smaller chunks, or batches, matters because a large dataset is often split across several files, and reading it one file and one line at a time keeps each step small and predictable.

Our focus in this lesson will be on a practical scenario where a dataset containing car information is spread across multiple files. You will learn to read, process, and analyze this data to extract meaningful insights, such as determining the car with the lowest price.

Understanding CSV Data Structure

In this lesson, we will work with a set of CSV files containing car data. Here's what a typical CSV file might look like:

```csv
transmission,price,color,year,model,distance_traveled_km
Automatic,60383.80,Silver,2013,Ford Focus,10437
Manual,82471.28,White,2011,Toyota Corolla,221662
Automatic,52266.72,Black,2012,BMW Series 5,30296
...
```

Each line represents a car record with the following attributes:

  • Transmission: Type of transmission (e.g., Automatic, Manual)
  • Price: The price of the car
  • Color: The color of the car
  • Year: The manufacturing year of the car
  • Model: The model of the car
  • Distance Traveled (km): Kilometers the car has traveled

These files are divided into multiple parts to allow batch processing, and understanding their structure is crucial as you learn to read and process them efficiently.
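
Because the column order is fixed by the header row, column positions can be looked up by name instead of being hard-coded. The following is a minimal sketch, assuming the header shown above; `ColumnIndex` is a hypothetical helper introduced only for illustration.

```csharp
using System;

// Header row copied from the sample file above.
string header = "transmission,price,color,year,model,distance_traveled_km";
string[] columnNames = header.Split(',');

// Resolve positions by name so the code still works if columns are reordered.
int priceIndex = ColumnIndex(columnNames, "price");
int modelIndex = ColumnIndex(columnNames, "model");

Console.WriteLine($"price is column {priceIndex}, model is column {modelIndex}");

// Hypothetical helper: returns the zero-based position of a column name.
static int ColumnIndex(string[] names, string name)
{
    int index = Array.IndexOf(names, name);
    if (index < 0)
    {
        throw new ArgumentException($"Column '{name}' not found in header.");
    }
    return index;
}
```

For the sample header, this reports index 1 for `price` and index 4 for `model`.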

Implementing Batch Reading of CSV Files

Now, let's delve into reading these CSV files in batches using C# constructs. We'll build our solution step-by-step.

First, we need to specify the filenames for our CSV files and prepare a data structure to hold the combined data.

```csharp
using System.Collections.Generic;

// Class to represent a car
// (in a file with top-level statements, type declarations must follow the statements or live in their own file)
class Car
{
    public string? Model { get; set; }
    public double? Price { get; set; }
}

// Filenames to read
string[] filenames = { "data_part1.csv", "data_part2.csv", "data_part3.csv" };

// List to store all car data
List<Car> carData = new List<Car>();
```

Here, we declare an array filenames to hold the names of the CSV files and a List<Car> with a custom class Car to store the car data read from the files.

Reading Data from Each File

Now, we'll loop through each filename, read the data using StreamReader, and append it to our carData structure.

```csharp
// Requires `using System.IO;` and `using System.Globalization;` at the top of the file.
foreach (string filename in filenames)
{
    using (StreamReader reader = new StreamReader(filename))
    {
        string? line;
        reader.ReadLine(); // Skip header line

        // Read rows with car data
        while ((line = reader.ReadLine()) != null)
        {
            string[] columns = line.Split(',');

            // Add car data to the list
            carData.Add(new Car
            {
                Model = columns[4],
                // InvariantCulture so "60383.80" parses the same regardless of system locale
                Price = double.Parse(columns[1], CultureInfo.InvariantCulture)
            });
        }
    }
}
```

In this code:

  • We open each file using StreamReader and use a loop to read lines.
  • We skip the header with reader.ReadLine().
  • For each row, we use string.Split(',') to divide the line into components.
  • We read the model from index 4, parse the price from index 1, and append the resulting Car to carData.
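
The steps above assume every row is well formed. As a hedged sketch of how a malformed row could be skipped instead of throwing an exception, the example below uses `double.TryParse`. `TryParseCar` is a hypothetical helper, and the second sample row is deliberately broken for illustration. Note that `Split(',')` does not handle quoted fields that contain commas; the sample data has none.

```csharp
using System;
using System.Globalization;

// Sample rows; the second one is deliberately malformed for illustration.
string[] rows =
{
    "Automatic,60383.80,Silver,2013,Ford Focus,10437",
    "Manual,not-a-number,White,2011,Toyota Corolla,221662",
};

foreach (string row in rows)
{
    if (TryParseCar(row, out string carModel, out double carPrice))
    {
        Console.WriteLine($"Parsed {carModel}: {carPrice.ToString("F2", CultureInfo.InvariantCulture)}");
    }
    else
    {
        Console.WriteLine($"Skipped malformed row: {row}");
    }
}

// Hypothetical helper: extracts model and price, returning false on bad input.
static bool TryParseCar(string line, out string model, out double price)
{
    model = string.Empty;
    price = 0;

    string[] columns = line.Split(',');
    if (columns.Length < 6)
    {
        return false; // Not enough fields for a full record.
    }

    model = columns[4];
    // InvariantCulture makes "60383.80" parse the same on every machine.
    return double.TryParse(columns[1], NumberStyles.Float, CultureInfo.InvariantCulture, out price);
}
```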

Finding the Car with the Lowest Price

With all data combined in carData, the next step is identifying the car with the lowest price in C#.

```csharp
if (carData.Count > 0)
{
    Car lowestCostCar = carData[0];
    foreach (Car car in carData)
    {
        if (car.Price < lowestCostCar.Price)
        {
            lowestCostCar = car;
        }
    }

    // Display the car with the lowest price
    Console.WriteLine($"Model: {lowestCostCar.Model}");
    Console.WriteLine($"Price: ${lowestCostCar.Price}");
}
else
{
    Console.WriteLine("No valid car data available.");
}
```

Here:

  • We initialize lowestCostCar with the first car in carData.
  • A loop evaluates each car to find the one with the minimum price.
  • Finally, we print the model and price of the car with the lowest price.
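
If only the cheapest car is needed, the full list does not have to be kept in memory: the minimum can be tracked while rows are read. This is an alternative sketch, not the lesson's own approach. `FindCheapest` is a hypothetical helper, and the in-memory `sampleRows` array stands in for lines read from the CSV files.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// In-memory stand-in for rows read from the CSV files (header already skipped).
string[] sampleRows =
{
    "Automatic,60383.80,Silver,2013,Ford Focus,10437",
    "Manual,82471.28,White,2011,Toyota Corolla,221662",
    "Automatic,52266.72,Black,2012,BMW Series 5,30296",
};

var (cheapestModel, cheapestPrice) = FindCheapest(sampleRows);
Console.WriteLine($"Model: {cheapestModel}");
Console.WriteLine($"Price: {cheapestPrice.ToString("F2", CultureInfo.InvariantCulture)}");

// Hypothetical helper: keeps only the cheapest row seen so far.
static (string? Model, double Price) FindCheapest(IEnumerable<string> rows)
{
    string? bestModel = null;
    double bestPrice = double.PositiveInfinity;

    foreach (string row in rows)
    {
        string[] columns = row.Split(',');
        double rowPrice = double.Parse(columns[1], CultureInfo.InvariantCulture);
        if (rowPrice < bestPrice)
        {
            bestPrice = rowPrice;
            bestModel = columns[4];
        }
    }

    return (bestModel, bestPrice);
}
```

With the three sample rows, this selects BMW Series 5 at 52266.72. Because the helper accepts any `IEnumerable<string>`, it could also be fed lines from a lazily evaluated source such as `File.ReadLines`.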

Summary and Practice Preparation

In this lesson, you have learned how to:

  • Read data in batches from multiple CSV files using C# file handling with StreamReader.
  • Process the data efficiently with string and data type conversions using string.Split() and double.Parse().
  • Identify insights, such as the car with the lowest price, using loops to evaluate data elements.

These techniques prepare you to handle similar multi-file datasets in C#. Practice these skills with the exercises that follow to reinforce your understanding of efficient data handling.
