Missing Values in the Data Frame

Missing values are common in real-world datasets. This section shows how to detect them with pandas and introduces a few simple ways to impute them.

Import the library:

import pandas as pd

Given the following dataset with missing values:

data = {
    "WorkerID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Age": [25, None, 35, 30, 24, 28, None, 32, 27, None],
    "Salary": [50000, 54000, None, 58000, 45000, 60000, 49000, None, None, 47000]
}

Display which values are missing

# Build a DataFrame from the dictionary
df = pd.DataFrame(data)
# isnull() returns a DataFrame of booleans with the same shape as df:
# True where the value is missing, False otherwise
print("Data Frame (True and False):\n", df.isnull())

Output:

Data Frame (True and False):
    WorkerID    Age  Salary
0     False  False   False
1     False   True   False
2     False  False    True
3     False  False   False
4     False  False   False
5     False  False   False
6     False   True   False
7     False  False    True
8     False  False    True
9     False   True   False
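
The boolean mask above can also be used to pull out only the rows that contain a missing value. The following is a minimal sketch under that assumption; the variable name rows_with_missing is illustrative and not part of the original example. It also prints the column dtypes, because pandas stores None in a numeric column as NaN, which turns that column into float64.

```python
import pandas as pd

data = {
    "WorkerID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Age": [25, None, 35, 30, 24, 28, None, 32, 27, None],
    "Salary": [50000, 54000, None, 58000, 45000, 60000, 49000, None, None, 47000]
}
df = pd.DataFrame(data)

# None in a numeric column is stored as NaN, so Age and Salary
# become float64 while WorkerID stays int64
print(df.dtypes)

# any(axis=1) is True for a row if at least one of its columns is missing
# (rows_with_missing is a hypothetical name used for illustration)
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```

With this dataset, the filtered frame should contain the six workers with IDs 2, 3, 7, 8, 9, and 10.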

Count the number of missing values

df = pd.DataFrame(data)
# Check for missing values
print("Missing values in each column:")
print(df.isnull().sum())

Output:

Missing values in each column:
WorkerID    0
Age         3
Salary      3
dtype: int64
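
Raw counts are easier to judge next to the size of the dataset. As a rough sketch, the share of missing values per column can be obtained by averaging the boolean mask, since True counts as 1 and False as 0. The names missing_share, total_missing, and complete_rows are illustrative assumptions, not part of the original example.

```python
import pandas as pd

data = {
    "WorkerID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Age": [25, None, 35, 30, 24, 28, None, 32, 27, None],
    "Salary": [50000, 54000, None, 58000, 45000, 60000, 49000, None, None, 47000]
}
df = pd.DataFrame(data)

# Fraction of missing values in each column (mean of the True/False mask)
missing_share = df.isnull().mean()
print("Share of missing values in each column:")
print(missing_share)

# Total number of missing cells in the whole DataFrame
total_missing = int(df.isnull().sum().sum())
print("Total missing values:", total_missing)

# Number of rows with no missing value in any column
complete_rows = int(df.notnull().all(axis=1).sum())
print("Complete rows:", complete_rows)
```

Here Age and Salary are each missing 3 of 10 values (a share of 0.3), there are 6 missing cells in total, and only 4 rows are complete.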

We can fill these missing values using the mean, the median, or the mode of each column: