In real-world scenarios, text data often requires cleaning. This could involve removing extra whitespace, replacing characters, or fixing inconsistent capitalization.
a. Removing Whitespace
Strings sometimes contain leading or trailing whitespace that needs to be removed. The .str.strip() method removes it from both ends of every value in the column.
df['City_cleaned'] = df['City'].str.strip()
print(df)
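The lesson's City values do not visibly contain stray spaces, so the effect of .str.strip() is easy to miss. The following is a minimal sketch on a small hypothetical Series (the sample values and variable names are assumptions, not part of the lesson's df). It also shows .str.lstrip() and .str.rstrip(), which trim only one side.

```python
import pandas as pd

# Hypothetical sample data with stray spaces (not the lesson's df)
cities = pd.Series(["  New York", "Chicago  ", "  San Francisco  "])

stripped = cities.str.strip()     # remove whitespace from both ends
left_only = cities.str.lstrip()   # remove leading whitespace only
right_only = cities.str.rstrip()  # remove trailing whitespace only

print(stripped.tolist())    # ['New York', 'Chicago', 'San Francisco']
print(left_only.tolist())   # ['New York', 'Chicago  ', 'San Francisco  ']
print(right_only.tolist())  # ['  New York', 'Chicago', '  San Francisco']
```

Whitespace inside a value (between "New" and "York") is left untouched by all three methods.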
b. Replacing Substrings
You can replace substrings using .str.replace(). This is useful when you need to modify parts of text, such as standardizing a word or correcting a typo.
df['City_corrected'] = df['City'].str.replace('Los', 'L.A.')
print(df)
Output:
      Name           City  Occupation     City_upper  City_length  \
0    Alice       New York    Engineer       NEW YORK            8
1      Bob    Los Angeles      Artist    LOS ANGELES           11
2  Charlie  San Francisco   Scientist  SAN FRANCISCO           13
3    David        Chicago        Chef        CHICAGO            7

    City_cleaned  City_corrected
0       New York        New York
1    Los Angeles    L.A. Angeles
2  San Francisco   San Francisco
3        Chicago         Chicago
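Replacing only 'Los' leaves the rest of the value in place, which is why the result above reads "L.A. Angeles". The sketch below, on a hypothetical Series rather than the lesson's df, shows two common variations: replacing a whole phrase literally, and using a regular expression to collapse repeated spaces. Passing regex explicitly is a suggestion here, since the default for this parameter has differed between pandas versions.

```python
import pandas as pd

# Hypothetical sample data (not the lesson's df)
cities = pd.Series(["New York", "Los Angeles", "San Francisco", "Chicago"])

# Literal replacement of the whole phrase; regex=False treats the
# pattern as plain text, so special characters need no escaping
abbreviated = cities.str.replace("Los Angeles", "L.A.", regex=False)
print(abbreviated.tolist())  # ['New York', 'L.A.', 'San Francisco', 'Chicago']

# Regex replacement: collapse any run of whitespace into one space
messy = pd.Series(["San   Francisco", "New  York"])
collapsed = messy.str.replace(r"\s+", " ", regex=True)
print(collapsed.tolist())  # ['San Francisco', 'New York']
```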
c. Converting Case
To standardize case formatting, you can convert text to lowercase, uppercase, or title case.
df['City_lower'] = df['City'].str.lower()
df['City_title'] = df['City'].str.title()
print(df)
Output:
      Name           City  Occupation     City_upper  City_length  \
0    Alice       New York    Engineer       NEW YORK            8
1      Bob    Los Angeles      Artist    LOS ANGELES           11
2  Charlie  San Francisco   Scientist  SAN FRANCISCO           13
3    David        Chicago        Chef        CHICAGO            7

    City_cleaned  City_corrected     City_lower     City_title
0       New York        New York       new york       New York
1    Los Angeles    L.A. Angeles    los angeles    Los Angeles
2  San Francisco   San Francisco  san francisco  San Francisco
3        Chicago         Chicago        chicago        Chicago
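Because the lesson's City column is already consistently capitalized, the title-case column looks identical to the original. A minimal sketch on hypothetical, inconsistently capitalized data (values and names are assumptions for illustration) shows why case conversion matters: values that differ only by case count as distinct until they are normalized.

```python
import pandas as pd

# Hypothetical data with inconsistent capitalization (not the lesson's df)
raw = pd.Series(["new york", "NEW YORK", "New York", "chicago"])

print(raw.nunique())  # 4 -- every spelling counts as a different city

# Title-case every value so spellings that differ only by case match
normalized = raw.str.title()
print(normalized.tolist())   # ['New York', 'New York', 'New York', 'Chicago']
print(normalized.nunique())  # 2
```

The same approach works with .str.lower() or .str.upper() when the goal is matching or grouping rather than display.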