Pandas Tutorial

    When working with time series data, Pandas provides powerful tools for resampling, shifting, and rolling calculations.

    a. Creating a Time Series DataFrame

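    A time series DataFrame can be created by generating a DatetimeIndex with pd.date_range() and associating it with data. In this example the dates are stored in a regular date column next to the values.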

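    import numpy as np
    import pandas as pd

    # Create a time series DataFrame with daily frequency
    date_rng = pd.date_range(start='2024-11-01', end='2024-11-10', freq='D')
    df_time_series = pd.DataFrame(date_rng, columns=['date'])
    # Fill the data column with random values drawn from a standard normal distribution
    df_time_series['data'] = np.random.randn(len(df_time_series))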
    
    print(df_time_series)
    

    Output (the values in the data column are random, so they will differ on each run):

             date      data
    0  2024-11-01  0.623982
    1  2024-11-02  0.196127
    2  2024-11-03  0.292334
    3  2024-11-04 -0.179540
    4  2024-11-05 -1.050276
    5  2024-11-06  0.612374
    6  2024-11-07 -0.739206
    7  2024-11-08  0.350067
    8  2024-11-09  0.573210
    9  2024-11-10  0.402214
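
The example above keeps the dates in an ordinary column. As a minimal sketch of the alternative, the dates can serve as the DataFrame's index, which allows selecting rows by date strings. The names df_indexed and first_three_days and the seed value are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Ten consecutive daily timestamps; date_range returns a DatetimeIndex
date_rng = pd.date_range(start='2024-11-01', end='2024-11-10', freq='D')

# Seeded generator so the example is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(0)

# Use the dates as the index rather than as a regular column
df_indexed = pd.DataFrame({'data': rng.standard_normal(len(date_rng))},
                          index=date_rng)

# With a DatetimeIndex, rows can be sliced by date strings (both ends inclusive)
first_three_days = df_indexed.loc['2024-11-01':'2024-11-03']
print(first_three_days)
```

With the dates in the index, methods such as resample() can be called without the on= argument.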
    
    b. Resampling Time Series Data

    Resampling allows you to change the frequency of the time series data, such as converting daily data to monthly data.

    # Resample the data to monthly frequency and take the mean
    # 'ME' (month end) is the alias on pandas 2.2+; older versions use 'M' instead
    df_resampled = df_time_series.resample('ME', on='date').mean()
    print(df_resampled)
    

    Output:

                    data
    date                
    2024-11-30  0.108129
    

    Here, the resample() method groups the rows by month and calculates the mean of the data column. All ten days fall in November 2024, so the result is a single row labeled with the month-end date.
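
Resampling is not limited to monthly bins. The following sketch uses fixed integer values instead of random data so the result is easy to verify, and groups the ten days into 5-day bins. The name df_daily and the choice of a 5-day frequency are assumptions made for illustration.

```python
import pandas as pd

# Ten daily rows with known values 1..10, indexed by date
date_rng = pd.date_range(start='2024-11-01', end='2024-11-10', freq='D')
df_daily = pd.DataFrame({'data': range(1, 11)}, index=date_rng)

# Group into 5-day bins (Nov 1-5 and Nov 6-10) and sum each bin
df_5day = df_daily.resample('5D').sum()
print(df_5day)
# Nov 1-5 sums to 15 and Nov 6-10 sums to 40
```

Each bin is labeled with its first day, and other aggregations such as mean(), max(), or min() can replace sum().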

    c. Shifting Time Series Data

    Shifting is a useful technique for comparing current values with past or future values.

    # Shift the data down by one row so each row holds the previous day's value
    df_time_series['shifted'] = df_time_series['data'].shift(1)
    print(df_time_series)
    

    Output:

             date      data   shifted
    0  2024-11-01  0.623982       NaN
    1  2024-11-02  0.196127  0.623982
    2  2024-11-03  0.292334  0.196127
    3  2024-11-04 -0.179540  0.292334
    4  2024-11-05 -1.050276 -0.179540
    5  2024-11-06  0.612374 -1.050276
    6  2024-11-07 -0.739206  0.612374
    7  2024-11-08  0.350067 -0.739206
    8  2024-11-09  0.573210  0.350067
    9  2024-11-10  0.402214  0.573210
    

    In the output, each row of the shifted column holds the previous day’s value. The first row is NaN because there is no earlier value to pull from.
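
A common use of shifting is computing the change from one day to the next. The sketch below uses a small set of hand-picked values so the arithmetic can be checked by eye. The column names previous, next, and change are assumptions made for illustration.

```python
import pandas as pd

date_rng = pd.date_range(start='2024-11-01', periods=5, freq='D')
df_shift = pd.DataFrame({'date': date_rng,
                         'data': [10.0, 12.0, 9.0, 15.0, 14.0]})

# shift(1) pulls the previous row's value down; shift(-1) pulls the next row's value up
df_shift['previous'] = df_shift['data'].shift(1)
df_shift['next'] = df_shift['data'].shift(-1)

# Day-over-day change: current value minus the previous day's value
df_shift['change'] = df_shift['data'] - df_shift['previous']
print(df_shift)
# change column: NaN, 2.0, -3.0, 6.0, -1.0
```

For this particular calculation, df_shift['data'].diff() gives the same result in a single call.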

    d. Rolling Calculations

    Rolling calculations, such as moving averages, are often used in time series analysis.

    # Calculate a rolling mean with a window of 3 days
    df_time_series['rolling_mean'] = df_time_series['data'].rolling(window=3).mean()
    print(df_time_series)
    

    Output:

             date      data   shifted  rolling_mean
    0  2024-11-01  0.623982       NaN           NaN
    1  2024-11-02  0.196127  0.623982           NaN
    2  2024-11-03  0.292334  0.196127      0.370814
    3  2024-11-04 -0.179540  0.292334      0.102974
    4  2024-11-05 -1.050276 -0.179540     -0.312494
    5  2024-11-06  0.612374 -1.050276     -0.205814
    6  2024-11-07 -0.739206  0.612374     -0.392369
    7  2024-11-08  0.350067 -0.739206      0.074412
    8  2024-11-09  0.573210  0.350067      0.061357
    9  2024-11-10  0.402214  0.573210      0.441830
    

    Here, rolling_mean is the average of the current row and the two rows before it. The first two rows are NaN because a full 3-row window is not yet available.
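
If the leading NaN values are not wanted, rolling() accepts a min_periods argument that lets a window produce a result with fewer rows. The sketch below compares the two behaviors on a small series of hand-picked values. The names s, strict, and partial are assumptions made for illustration.

```python
import pandas as pd

s = pd.Series([3.0, 6.0, 9.0, 12.0],
              index=pd.date_range('2024-11-01', periods=4, freq='D'))

# Default: a result appears only once all 3 rows of the window are available
strict = s.rolling(window=3).mean()
print(strict.tolist())   # [nan, nan, 6.0, 9.0]

# min_periods=1: average whatever rows are available, up to 3
partial = s.rolling(window=3, min_periods=1).mean()
print(partial.tolist())  # [3.0, 4.5, 6.0, 9.0]
```

Note that with min_periods=1 the first values are averages over fewer rows, so they are not directly comparable to the full-window values that follow.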