In real-world applications, data is often spread across multiple DataFrames. Pandas provides functions such as merge() and concat() to combine them.
1. Merging DataFrames
You can merge two DataFrames on a shared key column using merge(). This works like a SQL join: the how parameter selects the join type (inner, left, right, or outer).
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
})
df2 = pd.DataFrame({
    'ID': [1, 2, 4],
    'Age': [25, 30, 35]
})
# Inner join on 'ID': keep only the IDs present in both DataFrames (1 and 2)
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)
Output:
   ID   Name  Age
0   1  Alice   25
1   2    Bob   30
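The inner join above drops ID 3 and ID 4 because each appears in only one DataFrame. As a minimal sketch of the other join types, the example below reuses the same data with how='left' and how='outer'. The variable names left_df and outer_df are illustrative, and indicator=True is an optional merge() argument that adds a '_merge' column showing which side each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 35]})

# Left join: keep every row of df1; Age is NaN where df2 has no matching ID
left_df = pd.merge(df1, df2, on='ID', how='left')

# Outer join: keep rows from both sides; indicator=True adds a '_merge' column
# whose values are 'both', 'left_only', or 'right_only'
outer_df = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

print(left_df)
print(outer_df)
```

A left join is usually the safer choice when df1 is the primary table and you do not want to lose its rows. An outer join is handy for spotting keys that are missing from either side.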
2. Concatenating DataFrames
Use pd.concat() to stack DataFrames vertically with axis=0 (one on top of the other) or place them side by side with axis=1:
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
concatenated_df = pd.concat([df1, df2], axis=0) # Concatenate vertically
print(concatenated_df)
Output:
   A  B
0  1  3
1  2  4
0  5  7
1  6  8
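Note that the vertical result above keeps each DataFrame's original index, so the labels 0 and 1 appear twice. The sketch below shows two variations on the same data: ignore_index=True to renumber the rows, and axis=1 for the side-by-side case mentioned earlier. The names stacked and side_by_side, and the '_right' suffix, are illustrative choices rather than anything from the lesson.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Vertical stack with a fresh 0..n-1 index instead of the repeated 0, 1, 0, 1
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Horizontal concat aligns rows on the index; df2's columns are renamed
# with add_suffix() so the result has no duplicate column names
side_by_side = pd.concat([df1, df2.add_suffix('_right')], axis=1)

print(stacked)
print(side_by_side)
```

Because axis=1 aligns on index labels rather than row position, DataFrames whose indexes differ will produce NaN in the rows that do not match.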
GroupBy Operations
The groupby() method in Pandas groups rows by the values in one or more columns so that you can apply an aggregate operation such as sum, mean, or count to each group.
data = {'Category': ['A', 'B', 'A', 'B', 'A'],
        'Value': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
grouped = df.groupby('Category').sum() # Group by 'Category' and sum 'Value'
print(grouped)
Output:
          Value
Category
A            90
B            60
You can apply other aggregate functions, such as mean(), count(), or max(), on the grouped data.
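As a brief sketch of those other aggregates, the example below reuses the same data and computes the per-category mean, then several statistics at once with agg(). The names means and summary are illustrative. Selecting the 'Value' column before aggregating is a design choice that keeps non-numeric columns out of the calculation.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A'],
                   'Value': [10, 20, 30, 40, 50]})

# Single aggregate on one column: mean of 'Value' per category
means = df.groupby('Category')['Value'].mean()

# Several aggregates at once: agg() takes a list of function names
# and returns one column per statistic
summary = df.groupby('Category')['Value'].agg(['mean', 'count', 'max'])

print(means)
print(summary)
```

Here both categories have a mean of 30.0. Category A has 3 rows with a maximum of 50, and category B has 2 rows with a maximum of 40.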