Handling Missing Data in Text Columns

Missing or null values are common in text data. Pandas marks them with NaN (Not a Number), a floating-point value borrowed from NumPy, or with None in object columns, and treats both as missing. Use .fillna() to replace the missing values in a text column with a value of your choice.
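Before filling anything, it can help to see which entries pandas counts as missing. The snippet below is a minimal sketch; the `cities` Series and its values are hypothetical sample data, not part of the lesson's DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one None and one np.nan in a text column
cities = pd.Series(['New York', None, 'Chicago', np.nan], name='City')

# isna() flags both None and NaN as missing
missing_mask = cities.isna()
print(missing_mask.tolist())

# Count the missing entries before deciding how to fill them
n_missing = int(missing_mask.sum())
print(n_missing)
```

Checking the count first shows how many rows a later .fillna() call will change.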

# Introduce missing data
df.loc[2, 'City'] = None

# Fill missing values with a placeholder
df['City_filled'] = df['City'].fillna('Unknown')
print(df)

Output:

      Name         City  Occupation   City_upper  City_length  \
0    Alice     New York    Engineer     NEW YORK            8
1      Bob  Los Angeles      Artist  LOS ANGELES           11
2  Charlie         None   Scientist          NaN          NaN
3    David      Chicago        Chef      CHICAGO            7

   City_cleaned  City_corrected   City_lower   City_title  City_first_letter  Has_LA  City_part1  City_part2  City_filled
0      New York        New York     new york     New York                  N   False         New        York     New York
1   Los Angeles    L.A. Angeles  los angeles  Los Angeles                  L    True         Los     Angeles  Los Angeles
2           ...             ...          ...          ...                ...     ...         ...         ...      Unknown
3           ...             ...          ...          ...                ...     ...         ...         ...      Chicago

Here, the new City_filled column contains the placeholder 'Unknown' wherever City was missing. The original City column is left unchanged.
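The same idea can be tried on its own with a small, self-contained example. This is only a sketch: the DataFrame below is hypothetical data modeled on the Name, City and Occupation columns shown in the output, and the `upper_raw` and `upper_filled` names are introduced here for illustration.

```python
import pandas as pd

# Hypothetical data mirroring the lesson's Name/City/Occupation columns
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'City': ['New York', 'Los Angeles', None, 'Chicago'],
    'Occupation': ['Engineer', 'Artist', 'Scientist', 'Chef'],
})

# fillna() returns a new Series; the original City column is not modified
df['City_filled'] = df['City'].fillna('Unknown')

# String methods skip missing values, so the result is still missing there
upper_raw = df['City'].str.upper()

# Filling first gives every row a usable string
upper_filled = df['City'].fillna('Unknown').str.upper()

print(df[['Name', 'City', 'City_filled']])
print(upper_filled.tolist())
```

Filling before applying string methods avoids carrying missing values into derived columns, which is why City_upper shows NaN for the row whose City is missing.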
