Missing values are common in real-world datasets. This section shows some simple, brute-force ways to detect and impute them.
Import the library:
import pandas as pd
Consider the following dataset, in which some values are missing (None):
data = {
"WorkerID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"Age": [25, None, 35, 30, 24, 28, None, 32, 27, None],
"Salary": [50000, 54000, None, 58000, 45000, 60000, 49000, None, None, 47000]
}
# Build a DataFrame from the dictionary
df = pd.DataFrame(data)
# isnull() returns a DataFrame of booleans:
# True where the value is missing, False otherwise
print("Data Frame (True and False):\n", df.isnull())
Output:
Data Frame (True and False):
   WorkerID    Age  Salary
0     False  False   False
1     False   True   False
2     False  False    True
3     False  False   False
4     False  False   False
5     False  False   False
6     False   True   False
7     False  False    True
8     False  False    True
9     False   True   False
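pandas stores each None in a numeric column as NaN, which is why Age and Salary are no longer integer columns. The sketch below, which rebuilds the same example data, checks the resulting dtypes, confirms that `isna()` is an alias of `isnull()`, and uses `any(axis=1)` to select only the rows that contain at least one missing value. The variable name `rows_with_missing` is illustrative.

```python
import pandas as pd

data = {
    "WorkerID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Age": [25, None, 35, 30, 24, 28, None, 32, 27, None],
    "Salary": [50000, 54000, None, 58000, 45000, 60000, 49000, None, None, 47000],
}
df = pd.DataFrame(data)

# None in a numeric column is stored as NaN, so Age and Salary become float64
print(df.dtypes)

# isna() and isnull() return the same boolean mask
print(df.isna().equals(df.isnull()))

# any(axis=1) is True for a row if any of its columns is missing
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```

Here `rows_with_missing` keeps the rows with index 1, 2, 6, 7, 8, and 9, matching the True cells in the mask above.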
Because True counts as 1 and False as 0, summing the boolean mask gives the number of missing values in each column:
# Check for missing values
print("Missing values in each column:")
print(df.isnull().sum())
Output:
Missing values in each column:
WorkerID    0
Age         3
Salary      3
dtype: int64
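The same idea extends to proportions: the mean of a boolean mask is the fraction of True values. The following sketch, which rebuilds the example data, reports the share of missing entries per column as a percentage and the total number of missing cells. The names `missing_pct` and `total_missing` are illustrative.

```python
import pandas as pd

data = {
    "WorkerID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Age": [25, None, 35, 30, 24, 28, None, 32, 27, None],
    "Salary": [50000, 54000, None, 58000, 45000, 60000, 49000, None, None, 47000],
}
df = pd.DataFrame(data)

# Mean of the boolean mask = fraction of missing entries; scale to a percentage
missing_pct = df.isnull().mean() * 100
print(missing_pct)

# Summing twice (first per column, then across columns) gives the total count
total_missing = int(df.isnull().sum().sum())
print("Total missing:", total_missing)
```

With 3 missing values out of 10 rows, Age and Salary each come out at about 30%, and the total is 6 missing cells.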
We can fill these missing values with each column's mean, median, or mode: