Merging and Joining DataFrames

In real-world applications, data is often spread across multiple DataFrames. Pandas provides powerful functions to merge and join data.

1. Merging DataFrames

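pd.merge() combines two DataFrames on one or more key columns, much like a SQL join. The how parameter selects the join type: 'inner' (the default), 'left', 'right', 'outer', or 'cross'.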

import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
})

df2 = pd.DataFrame({
    'ID': [1, 2, 4],
    'Age': [25, 30, 35]
})

merged_df = pd.merge(df1, df2, on='ID', how='inner')  # Merge on 'ID' with inner join
print(merged_df)

Output:

   ID     Name  Age
0   1    Alice   25
1   2      Bob   30
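
The inner join above drops ID 3 (present only in df1) and ID 4 (present only in df2). The following is a minimal sketch of how two of the other how options treat those unmatched rows, reusing the same two frames. The indicator=True flag and the variable names left_df and outer_df are additions for illustration, not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 35]})

# Left join: keep every row of df1; Age is NaN where df2 has no matching ID
left_df = pd.merge(df1, df2, on='ID', how='left')

# Outer join: keep rows from both sides; indicator=True adds a '_merge'
# column recording whether each row came from the left, the right, or both
outer_df = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

print(left_df)
print(outer_df)
```

With how='left', Charlie is kept with a missing Age; with how='outer', ID 4 also appears, with a missing Name.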

2. Concatenating DataFrames

Use pd.concat() to stack DataFrames vertically (axis=0, the default) or place them side by side (axis=1):

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

concatenated_df = pd.concat([df1, df2], axis=0)  # Concatenate vertically
print(concatenated_df)

Output:

   A  B
0  1  3
1  2  4
0  5  7
1  6  8
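
Vertical concatenation keeps each frame's original index, so concatenated_df carries the index labels 0, 1, 0, 1. The sketch below shows two common variations: ignore_index=True to renumber the rows, and axis=1 for the horizontal case mentioned earlier. Renaming df2's columns to 'C' and 'D' is an illustrative choice made here to avoid duplicate column names.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Vertical stack with a fresh 0..n-1 index instead of the repeated labels
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Side by side: rows are aligned on the index and columns are appended.
# df2's columns are renamed (illustrative) so the result has unique names.
df2_renamed = df2.rename(columns={'A': 'C', 'B': 'D'})
side_by_side = pd.concat([df1, df2_renamed], axis=1)

print(stacked)
print(side_by_side)
```

Because axis=1 aligns on the index rather than on row position, frames with different index labels would produce NaN in the non-matching rows.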

GroupBy Operations

The groupby() method splits a DataFrame into groups by the values of one or more columns, so you can apply aggregate operations such as sum, mean, or count to each group.

data = {'Category': ['A', 'B', 'A', 'B', 'A'],
        'Value': [10, 20, 30, 40, 50]}

df = pd.DataFrame(data)

grouped = df.groupby('Category').sum()  # Group by 'Category' and sum 'Value'
print(grouped)

Output:

          Value
Category       
A            90
B            60

You can apply other aggregate functions, such as mean(), count(), or max(), on the grouped data.
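
As a rough illustration of those other aggregates, the sketch below reuses the same data. Selecting the 'Value' column before aggregating and the variable names means and summary are choices made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A'],
                   'Value': [10, 20, 30, 40, 50]})

# Select the column first, then apply a single aggregate per group
means = df.groupby('Category')['Value'].mean()

# agg() computes several aggregates at once, one result column per function
summary = df.groupby('Category')['Value'].agg(['mean', 'count', 'max'])

print(means)
print(summary)
```

Passing a list to agg() returns a DataFrame with one row per group and one column per aggregate, which is often easier to read than calling each function separately.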
