Time Series Data – Geek Slack

Pandas Tutorial

When working with time series data, Pandas provides powerful tools for resampling, shifting, and rolling calculations.

a. Creating a Time Series DataFrame

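A time series DataFrame can be created by generating a DatetimeIndex with pd.date_range() and pairing it with data. In this example the dates are stored in a regular column named date rather than as the DataFrame's index.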

import numpy as np
import pandas as pd

# Create a time series DataFrame with daily frequency
date_rng = pd.date_range(start='2024-11-01', end='2024-11-10', freq='D')
df_time_series = pd.DataFrame(date_rng, columns=['date'])
# Fill the data column with random values drawn from a standard normal distribution
df_time_series['data'] = np.random.randn(len(df_time_series))

print(df_time_series)

Output (the data values are random, so yours will differ):

         date      data
0  2024-11-01  0.623982
1  2024-11-02  0.196127
2  2024-11-03  0.292334
3  2024-11-04 -0.179540
4  2024-11-05 -1.050276
5  2024-11-06  0.612374
6  2024-11-07 -0.739206
7  2024-11-08  0.350067
8  2024-11-09  0.573210
9  2024-11-10  0.402214
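
The example above keeps the dates in an ordinary column. As a minimal sketch of the alternative, the snippet below moves them into the index so that the DataFrame is backed by a DatetimeIndex. The names rng and df_indexed and the fixed seed are assumptions made for illustration; they are not part of the lesson.

```python
import numpy as np
import pandas as pd

# Seeded generator so the illustration is reproducible (an assumption, not in the lesson)
rng = np.random.default_rng(seed=0)

dates = pd.date_range(start='2024-11-01', end='2024-11-10', freq='D')
df_indexed = pd.DataFrame({'date': dates, 'data': rng.standard_normal(len(dates))})

# Move the date column into the index; the index becomes a DatetimeIndex
df_indexed = df_indexed.set_index('date')

print(type(df_indexed.index).__name__)
# Label-based slicing by date string works on a DatetimeIndex (both ends inclusive)
print(df_indexed.loc['2024-11-03':'2024-11-05'])
```

With the dates in the index, methods such as resample() can be called without the on= argument.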

b. Resampling Time Series Data

Resampling allows you to change the frequency of the time series data, such as converting daily data to monthly data.

# Resample the data to monthly frequency and take the mean
# Note: pandas 2.2 and later deprecate the 'M' alias in favor of 'ME' (month end)
df_resampled = df_time_series.resample('M', on='date').mean()
print(df_resampled)

Output:

                data
date                
2024-11-30  0.108129

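Here, the resample() method groups the rows by calendar month and calculates the mean of the data column. All ten days fall in November 2024, so the result is a single row labeled with the month's end date.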
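
Resampling is not limited to monthly means. The sketch below, which uses a hypothetical daily series holding the values 1 to 10 (not the lesson's random data), downsamples to weekly bins and applies several aggregations at once.

```python
import pandas as pd

# Hypothetical daily series with values 1..10 for 2024-11-01 through 2024-11-10
dates = pd.date_range(start='2024-11-01', end='2024-11-10', freq='D')
daily = pd.Series(range(1, 11), index=dates, name='data')

# Downsample to weekly bins (weeks end on Sunday by default) and aggregate
weekly_sum = daily.resample('W').sum()
weekly_stats = daily.resample('W').agg(['mean', 'min', 'max'])

print(weekly_sum)
print(weekly_stats)
```

Because 1 November 2024 is a Friday, the first bin covers only three days (sum 6) and the second covers the remaining seven (sum 49).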

c. Shifting Time Series Data

Shifting is a useful technique for comparing current values with past or future values.

# Shift the data forward by one period (one row, which is one day here)
df_time_series['shifted'] = df_time_series['data'].shift(1)
print(df_time_series)

Output:

         date      data   shifted
0  2024-11-01  0.623982       NaN
1  2024-11-02  0.196127  0.623982
2  2024-11-03  0.292334  0.196127
3  2024-11-04 -0.179540  0.292334
4  2024-11-05 -1.050276 -0.179540
5  2024-11-06  0.612374 -1.050276
6  2024-11-07 -0.739206  0.612374
7  2024-11-08  0.350067 -0.739206
8  2024-11-09  0.573210  0.350067
9  2024-11-10  0.402214  0.573210

In the output, the shifted column holds the previous day’s value. The first row is NaN because there is no earlier observation to shift in.
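
A common use of shift() is computing the change between consecutive observations. The following is a minimal sketch with hypothetical values (the series name prices and its numbers are assumptions chosen to keep the arithmetic easy to follow); it also shows a negative period, which pulls in the next day's value instead.

```python
import pandas as pd

# Hypothetical values chosen so the arithmetic is easy to follow
dates = pd.date_range(start='2024-11-01', periods=4, freq='D')
prices = pd.Series([10.0, 12.0, 15.0, 11.0], index=dates)

# Day-over-day change: current value minus the previous row's value
change = prices - prices.shift(1)      # NaN, 2.0, 3.0, -4.0

# A negative period shifts backward, aligning each row with the next day's value
next_day = prices.shift(-1)            # 12.0, 15.0, 11.0, NaN

print(pd.DataFrame({'price': prices, 'change': change, 'next_day': next_day}))
```

The same change column can also be obtained with prices.diff().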

d. Rolling Calculations

Rolling calculations, such as moving averages, are often used in time series analysis.

# Calculate a rolling mean over a window of 3 rows (3 days for this daily data)
df_time_series['rolling_mean'] = df_time_series['data'].rolling(window=3).mean()
print(df_time_series)

Output:

         date      data   shifted  rolling_mean
0  2024-11-01  0.623982       NaN           NaN
1  2024-11-02  0.196127  0.623982           NaN
2  2024-11-03  0.292334  0.196127      0.370814
3  2024-11-04 -0.179540  0.292334      0.102974
4  2024-11-05 -1.050276 -0.179540     -0.312494
5  2024-11-06  0.612374 -1.050276     -0.205814
6  2024-11-07 -0.739206  0.612374     -0.392369
7  2024-11-08  0.350067 -0.739206      0.074412
8  2024-11-09  0.573210  0.350067      0.061357
9  2024-11-10  0.402214  0.573210      0.441830

Here, the rolling_mean column is the average of the current day and the two preceding days. The first two rows are NaN because a full 3-value window is not yet available.
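
The leading NaN values can be avoided, and the window can be defined in calendar time rather than rows. The sketch below illustrates both on a small hypothetical series (the name values and its numbers are assumptions, not the lesson's data).

```python
import pandas as pd

dates = pd.date_range(start='2024-11-01', periods=5, freq='D')
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=dates)

# Default: a result appears only once 3 values are available
strict = values.rolling(window=3).mean()                  # NaN, NaN, 2.0, 3.0, 4.0

# min_periods=1 averages whatever is available at the start of the series
partial = values.rolling(window=3, min_periods=1).mean()  # 1.0, 1.5, 2.0, 3.0, 4.0

# A time-based window ('3D') needs a DatetimeIndex and covers the last 3 calendar days
by_time = values.rolling('3D').mean()                     # 1.0, 1.5, 2.0, 3.0, 4.0

print(pd.DataFrame({'strict': strict, 'partial': partial, 'by_time': by_time}))
```

For evenly spaced daily data the row-based and time-based windows agree; they differ when dates are missing, since '3D' then spans fewer rows.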
