When working with text data in Pandas, you can apply string operations to an entire column at once through the .str accessor, which provides vectorized string functions.
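To make "vectorized" concrete, here is a minimal sketch comparing a single .str call with a roughly equivalent Python loop. The Series name cities and its values, including the missing entry, are made up for this illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
cities = pd.Series(['New York', None, 'Chicago'])

# One .str call applies the string method to every element
# and leaves the missing entry missing.
lowered = cities.str.lower()

# Roughly equivalent explicit loop; the missing entry needs a manual check.
lowered_loop = pd.Series(
    [c.lower() if isinstance(c, str) else c for c in cities]
)

print(lowered)
print(lowered_loop)
```

The .str version is shorter and does not need the explicit check for non-string entries.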
a. Creating a DataFrame with Text Data
Let’s start by creating a DataFrame that contains text data to work with.
import pandas as pd

# Sample records: each dictionary key becomes a column of strings.
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'City': ['New York', 'Los Angeles', 'San Francisco', 'Chicago'],
    'Occupation': ['Engineer', 'Artist', 'Scientist', 'Chef']
}
df = pd.DataFrame(data)
print(df)
Output:
      Name           City  Occupation
0    Alice       New York    Engineer
1      Bob    Los Angeles      Artist
2  Charlie  San Francisco   Scientist
3    David        Chicago        Chef
b. Accessing Text Data with .str
To access the string methods, use the .str accessor, followed by the string function you want to use.
Example 1: Converting to Uppercase
df['City_upper'] = df['City'].str.upper()
print(df)
Output:
      Name           City  Occupation     City_upper
0    Alice       New York    Engineer       NEW YORK
1      Bob    Los Angeles      Artist    LOS ANGELES
2  Charlie  San Francisco   Scientist  SAN FRANCISCO
3    David        Chicago        Chef        CHICAGO
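Other case-conversion methods follow the same pattern as .str.upper(). The sketch below rebuilds the City column as a standalone Series (the variable names city, lower, and round_trip are chosen here for illustration) and shows that each .str call returns a new Series, so .str has to be written again before the next string method when chaining.

```python
import pandas as pd

# Same City values as in the lesson's DataFrame.
city = pd.Series(['New York', 'Los Angeles', 'San Francisco', 'Chicago'])

# Lowercase every value.
lower = city.str.lower()

# Chaining: upper() returns a Series, so .str is needed again for title().
round_trip = city.str.upper().str.title()

print(lower)
print(round_trip)
```

For these four city names, converting to uppercase and then to title case gives back the original spelling.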
Example 2: Finding the Length of Strings
You can get the length of each string in a column, counting spaces, using the .str.len() method.
df['City_length'] = df['City'].str.len()
print(df)
Output:
      Name           City  Occupation     City_upper  City_length
0    Alice       New York    Engineer       NEW YORK            8
1      Bob    Los Angeles      Artist    LOS ANGELES           11
2  Charlie  San Francisco   Scientist  SAN FRANCISCO           13
3    David        Chicago        Chef        CHICAGO            7
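Because .str.len() returns a numeric Series, its result can also be compared against a number to build a boolean mask. The sketch below is one possible use, not part of the lesson's own steps: it keeps only the rows whose city name is longer than 8 characters. The threshold and the variable names long_name and long_cities are arbitrary choices for illustration.

```python
import pandas as pd

# Two of the lesson's columns, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'City': ['New York', 'Los Angeles', 'San Francisco', 'Chicago'],
})

# Boolean mask: True where the city name has more than 8 characters.
# The threshold of 8 is arbitrary, picked only for this illustration.
long_name = df['City'].str.len() > 8

# Keep only the rows where the mask is True.
long_cities = df[long_name]
print(long_cities)
```

With the lengths shown above (8, 11, 13, 7), only the rows for Los Angeles and San Francisco pass the filter.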