Working with dates and times is crucial in data analysis. Imagine analyzing sales data over time to understand seasonal trends. To make sense of such data, you need to handle dates and times accurately.
Today's goals:
- Convert date strings to `datetime` format, even if they are in different formats.
- Extract components such as the year, month, and day from `datetime` data.
- Perform `datetime` operations such as finding time differences and obtaining today's date.

By the end, you'll be comfortable manipulating dates and times in Pandas. Let's start!
Date information often arrives as text, which Pandas treats as plain strings, so it can't be sorted chronologically or used in date arithmetic. Converting this text to `datetime` format lets us use Pandas' date-aware features.
The `pd.to_datetime()` function converts date strings in a variety of formats into `datetime` values. Here's an example:
```python
import pandas as pd

# Sample data: the same kind of date written in four different formats
data = {
    'order_date': ['2023-10-01', '10/02/2023', 'October 3 2023', '2023.10.04']
}
sales = pd.DataFrame(data)

# Convert 'order_date' to datetime
sales['order_date'] = pd.to_datetime(sales['order_date'], format='mixed')

print(sales)
```
Output:
```
  order_date
0 2023-10-01
1 2023-10-02
2 2023-10-03
3 2023-10-04
```
This example converts several date formats into `datetime` values, making date operations easier. Note that you need to specify `format='mixed'` (available in pandas 2.0 and later) so that the format is inferred for each element individually.
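Because the format is inferred per element, an ambiguous string such as `'10/02/2023'` is read month-first by default. The following is a minimal sketch of two related options of `pd.to_datetime()`, `dayfirst` and `errors='coerce'`; the sample strings are invented for illustration and are not part of the lesson's data.

```python
import pandas as pd

ambiguous = '10/02/2023'

# Default behaviour: the first number is treated as the month (2 October 2023)
month_first = pd.to_datetime(ambiguous)

# dayfirst=True treats the first number as the day (10 February 2023)
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# errors='coerce' turns values that cannot be parsed into NaT instead of raising
messy = pd.Series(['2023-10-01', 'not a date'])  # hypothetical input
parsed = pd.to_datetime(messy, errors='coerce')

print(month_first.date(), day_first.date())
print(parsed)
```

If your source data uses day-first dates, it is worth checking a few converted values by hand before relying on the result.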
With a column in `datetime` format, we can extract components like the year, month, or day using the `.dt` accessor. Here’s how:
```python
# Extract year, month, and day from datetime
sales['year'] = sales['order_date'].dt.year
sales['month'] = sales['order_date'].dt.month
sales['day'] = sales['order_date'].dt.day

print(sales)
```
Output:
```
  order_date  year  month  day
0 2023-10-01  2023     10    1
1 2023-10-02  2023     10    2
2 2023-10-03  2023     10    3
3 2023-10-04  2023     10    4
```
This code creates new columns for the year, month, and day, which can be useful for time-based analyses like finding monthly or seasonal trends.
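As a rough illustration of such a time-based analysis, the sketch below totals sales per calendar month and labels each order with its weekday. The `amount` column and all of its values are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sales data; 'amount' is invented for this sketch
orders = pd.DataFrame({
    'order_date': pd.to_datetime(
        ['2023-09-15', '2023-10-01', '2023-10-02', '2023-11-20']
    ),
    'amount': [120.0, 80.0, 45.5, 200.0],
})

# Group by the month number extracted with .dt and sum the amounts
monthly_totals = orders.groupby(orders['order_date'].dt.month)['amount'].sum()

# .dt.day_name() returns the weekday name for each date
orders['weekday'] = orders['order_date'].dt.day_name()

print(monthly_totals)
print(orders)
```

Grouping on `orders['order_date'].dt.month` directly avoids creating a helper column, though storing the month in its own column works just as well.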
Pandas also supports arithmetic on `datetime` values. For example, we can find the time elapsed since each order and obtain today's date:
```python
from datetime import datetime

# Time elapsed between each order and the current moment
sales['time_since_order'] = datetime.now() - sales['order_date']

# pd.to_datetime('today') returns the current timestamp;
# .normalize() resets the time of day to midnight
today = pd.to_datetime('today').normalize()

print(sales)
print("Today's date:", today.date())
```
Output (the exact values depend on when you run the code):
```
  order_date  year  month  day        time_since_order
0 2023-10-01  2023     10    1  4 days 10:23:30.456789
1 2023-10-02  2023     10    2  3 days 10:23:30.456789
2 2023-10-03  2023     10    3  2 days 10:23:30.456789
3 2023-10-04  2023     10    4  1 days 10:23:30.456789
Today's date: 2023-10-05
```
This code calculates the time elapsed between each order date and the current moment, and also retrieves today's date.
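Subtracting dates yields timedelta values, which have their own `.dt` accessor. The sketch below uses a fixed reference date rather than the current time so that the results are reproducible; the reference date is an assumption chosen for illustration.

```python
import pandas as pd

order_dates = pd.Series(
    pd.to_datetime(['2023-10-01', '2023-10-02', '2023-10-03', '2023-10-04'])
)

# Fixed reference date (assumed for this example) instead of "now"
reference = pd.Timestamp('2023-10-05')

# Subtracting a datetime Series from a Timestamp gives a timedelta Series
elapsed = reference - order_dates

# Whole days, and the same duration expressed in hours
days_elapsed = elapsed.dt.days
hours_elapsed = elapsed.dt.total_seconds() / 3600

print(days_elapsed.tolist())   # [4, 3, 2, 1]
print(hours_elapsed.tolist())  # [96.0, 72.0, 48.0, 24.0]
```

Converting to plain numbers like this is handy when the elapsed time feeds into further calculations or filters.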
Today, we learned:
- How to convert date strings to `datetime` format using `pd.to_datetime()`, even for multiple formats.
- How to extract date components using the `.dt` accessor.

Understanding `datetime` manipulation is essential for efficient data analysis, enabling easy time-based computations.
Now it's time to apply your new skills. In the practice session, you’ll convert columns, extract date components, and explore more `datetime` features. Dive into the hands-on practice to reinforce today's knowledge!