In real-world applications, data is often spread across multiple DataFrames. Pandas provides powerful functions to merge and join data.
1. Merging DataFrames
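You can merge two DataFrames on a shared key column using merge(), much like a SQL join. The how argument selects the join type ('inner', 'left', 'right', or 'outer'); an inner join keeps only the keys that appear in both DataFrames.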
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
})
df2 = pd.DataFrame({
    'ID': [1, 2, 4],
    'Age': [25, 30, 35]
})
merged_df = pd.merge(df1, df2, on='ID', how='inner')  # Inner join on 'ID': only IDs 1 and 2 are in both
print(merged_df)
Output:
   ID   Name  Age
0   1  Alice   25
1   2    Bob   30
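The inner join above drops ID 3 (no age) and ID 4 (no name). As a minimal sketch of the other join types, the example below reuses the same two DataFrames with how='left' and how='outer'. The variable names left_df and outer_df are illustrative only; indicator=True is an optional merge() argument that adds a '_merge' column showing which side each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 35]})

# Left join: keep every row of df1; Age is NaN where df2 has no matching ID.
left_df = pd.merge(df1, df2, on='ID', how='left')

# Outer join: keep rows from both sides; '_merge' records the origin of each row
# ('both', 'left_only', or 'right_only').
outer_df = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

print(left_df)
print(outer_df)
```

Because missing values are stored as NaN, the Age column becomes floating point in both results.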
2. Concatenating DataFrames
Use concat() to stack DataFrames vertically (axis=0, one on top of the other) or to place them side by side (axis=1):
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
concatenated_df = pd.concat([df1, df2], axis=0) # Concatenate vertically
print(concatenated_df)
Output:
   A  B
0  1  3
1  2  4
0  5  7
1  6  8
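Note that the output above keeps each DataFrame's original index, so the labels 0 and 1 appear twice. The sketch below shows one way to get a fresh index with ignore_index=True, and what horizontal concatenation with axis=1 looks like. The names stacked and side_by_side, and the '_right' suffix, are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Vertical stack with a new 0..3 index instead of the repeated 0, 1, 0, 1.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Horizontal concatenation aligns rows on the index; df2's columns are
# renamed first so the result does not contain duplicate column labels.
side_by_side = pd.concat([df1, df2.add_suffix('_right')], axis=1)

print(stacked)
print(side_by_side)
```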
GroupBy Operations
The groupby() method in Pandas splits a DataFrame into groups based on the values in one or more columns, so that an aggregate such as sum, mean, or count can be computed for each group.
data = {'Category': ['A', 'B', 'A', 'B', 'A'],
        'Value': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
grouped = df.groupby('Category').sum() # Group by 'Category' and sum 'Value'
print(grouped)
Output:
          Value
Category
A            90
B            60
You can apply other aggregate functions, such as mean(), count(), or max(), on the grouped data.
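As a short sketch of applying several aggregates at once, the example below passes a list of function names to agg() on the same data. The variable name summary is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 20, 30, 40, 50]
})

# Select the 'Value' column, then compute one result column per aggregate.
summary = df.groupby('Category')['Value'].agg(['mean', 'count', 'max'])
print(summary)
```

Each name in the list becomes a column in the result, with one row per category.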