When working with time series data, Pandas provides powerful tools for resampling, shifting, and rolling calculations.
a. Creating a Time Series DataFrame
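A time series DataFrame can be created by generating a DatetimeIndex with pd.date_range() and pairing it with data. In this example the dates are stored in a regular column named date rather than set as the index.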
import numpy as np
import pandas as pd

# Create a time series DataFrame with daily frequency
date_rng = pd.date_range(start='2024-11-01', end='2024-11-10', freq='D')
df_time_series = pd.DataFrame(date_rng, columns=['date'])
# randn draws fresh random values on each run, so your numbers will differ
df_time_series['data'] = np.random.randn(len(df_time_series))
print(df_time_series)
Output:
        date      data
0 2024-11-01  0.623982
1 2024-11-02  0.196127
2 2024-11-03  0.292334
3 2024-11-04 -0.179540
4 2024-11-05 -1.050276
5 2024-11-06  0.612374
6 2024-11-07 -0.739206
7 2024-11-08  0.350067
8 2024-11-09  0.573210
9 2024-11-10  0.402214
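In the example above the dates live in an ordinary column. Many pandas time series features, such as slicing by date label, work most naturally when the dates are the index instead. The following is a minimal sketch of that variant; the fixed seed and the names rng, df_indexed, and first_three_days are assumptions added for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed is used here only to make the sketch reproducible
rng = np.random.default_rng(42)

date_rng = pd.date_range(start='2024-11-01', end='2024-11-10', freq='D')

# Pass the DatetimeIndex as the index instead of storing it in a column
df_indexed = pd.DataFrame({'data': rng.standard_normal(len(date_rng))}, index=date_rng)
df_indexed.index.name = 'date'

# With a DatetimeIndex, rows can be selected by date label (both ends inclusive)
first_three_days = df_indexed.loc['2024-11-01':'2024-11-03']

print(type(df_indexed.index).__name__)  # DatetimeIndex
print(len(first_three_days))            # 3
```

With the dates in the index, methods such as resample() can be called without the on= argument.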
b. Resampling Time Series Data
Resampling allows you to change the frequency of the time series data, such as converting daily data to monthly data.
# Resample the data to monthly frequency and take the mean
# (pandas 2.2 and later deprecate the 'M' alias in favor of 'ME' for month end)
df_resampled = df_time_series.resample('M', on='date').mean()
print(df_resampled)
Output:
                data
date                
2024-11-30  0.108129
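Here, the resample() method groups the rows by calendar month and calculates the mean of the data column. Because all ten dates fall in November 2024, the result is a single row labeled with the month-end date.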
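Resampling is not limited to monthly bins or to the mean. As a rough illustration, the sketch below groups ten daily values into 5-day bins and computes two aggregates at once. The values 1 through 10 and the names df_example and df_5d are hypothetical, chosen so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical data: the integers 1..10 on ten consecutive days
dates = pd.date_range(start='2024-11-01', end='2024-11-10', freq='D')
df_example = pd.DataFrame({'date': dates, 'data': list(range(1, 11))})

# Group into 5-day bins (starting at the first date) and aggregate each bin
df_5d = df_example.resample('5D', on='date')['data'].agg(['sum', 'mean'])
print(df_5d)
```

The first bin covers 2024-11-01 through 2024-11-05 (sum 15, mean 3.0) and the second covers 2024-11-06 through 2024-11-10 (sum 40, mean 8.0). Each bin is labeled with its start date.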
c. Shifting Time Series Data
Shifting is a useful technique for comparing current values with past or future values.
# Shift the values down by one row (one day here, because the data is daily)
df_time_series['shifted'] = df_time_series['data'].shift(1)
print(df_time_series)
Output:
        date      data   shifted
0 2024-11-01  0.623982       NaN
1 2024-11-02  0.196127  0.623982
2 2024-11-03  0.292334  0.196127
3 2024-11-04 -0.179540  0.292334
4 2024-11-05 -1.050276 -0.179540
5 2024-11-06  0.612374 -1.050276
6 2024-11-07 -0.739206  0.612374
7 2024-11-08  0.350067 -0.739206
8 2024-11-09  0.573210  0.350067
9 2024-11-10  0.402214  0.573210
In the output, the shifted column holds the previous day's value. The first row is NaN because there is no earlier observation to shift into it.
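A common use of shift() is computing the change from one day to the next, and a negative period looks ahead instead of back. The sketch below shows both on a small hand-written series; the values and the column names previous, change, and next are hypothetical.

```python
import pandas as pd

# Hypothetical data: five daily observations
dates = pd.date_range(start='2024-11-01', periods=5, freq='D')
df_example = pd.DataFrame({'date': dates, 'data': [10.0, 12.0, 9.0, 15.0, 15.0]})

# shift(1) moves values down one row, so each row sees the previous day's value
df_example['previous'] = df_example['data'].shift(1)
df_example['change'] = df_example['data'] - df_example['previous']

# shift(-1) moves values up one row, so each row sees the next day's value
df_example['next'] = df_example['data'].shift(-1)

print(df_example)
```

The change column is NaN for the first row and then 2.0, -3.0, 6.0, 0.0. The next column is NaN for the last row, since no later observation exists.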
d. Rolling Calculations
Rolling calculations, such as moving averages, are often used in time series analysis.
# Calculate a rolling mean with a window of 3 days
df_time_series['rolling_mean'] = df_time_series['data'].rolling(window=3).mean()
print(df_time_series)
Output:
        date      data   shifted  rolling_mean
0 2024-11-01  0.623982       NaN           NaN
1 2024-11-02  0.196127  0.623982           NaN
2 2024-11-03  0.292334  0.196127      0.370814
3 2024-11-04 -0.179540  0.292334      0.102974
4 2024-11-05 -1.050276 -0.179540     -0.312494
5 2024-11-06  0.612374 -1.050276     -0.205814
6 2024-11-07 -0.739206  0.612374     -0.392369
7 2024-11-08  0.350067 -0.739206      0.074412
8 2024-11-09  0.573210  0.350067      0.061357
9 2024-11-10  0.402214  0.573210      0.441830
Here, rolling_mean is the average of the current row and the two preceding rows. The first two rows are NaN because a full window of 3 observations is not yet available.
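If NaN values at the start of the series are not wanted, rolling() accepts a min_periods argument that lets a window produce a result with fewer observations. The sketch below compares the two behaviours on a small hand-written series; the values and the names s, full, and partial are hypothetical.

```python
import pandas as pd

# Hypothetical data: five daily observations
s = pd.Series(
    [3.0, 6.0, 9.0, 12.0, 15.0],
    index=pd.date_range('2024-11-01', periods=5, freq='D'),
)

# Default: a result is produced only once 3 observations are available
full = s.rolling(window=3).mean()

# min_periods=1: average whatever is available, up to 3 observations
partial = s.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({'data': s, 'full': full, 'partial': partial}))
```

Here full is NaN, NaN, 6.0, 9.0, 12.0, while partial is 3.0, 4.5, 6.0, 9.0, 12.0. The early partial values are based on fewer points, so they are noisier than the rest of the series.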