Retrieving all the unique values, finding duplicate values, or selecting values that meet a condition are all ways of filtering a data frame.

Given the following data frame:

import pandas as pd

# Expanded data dictionary with 31 samples and new departments
data = {
    "WorkerID": list(range(31)),
    "Age": [25, None, 35, 30, 24, 28, None, 32, 27, None, 33, 29, 40, 50,
            45, 37, 31, 34, 26, 38, 48, 27, 41, 32, 29, 47, 46, 33, 39, 36, 30],
    "Salary": [50000, 54000, None, 58000, 45000, 60000, 49000, None, None,
               47000, 55000, 60000, 52000, 64000, 51000, 47000, 58000, 54000,
               57000, 53000, 60000, 55000, 56000, 59000, 55000, 62000, 61000, 53000, 58000, 56000, 60000],
    "Department": ["HR", "Finance", "IT", "HR", "IT", "Finance", "IT",
                   "HR", "Finance", "HR", "AI", "Marketing", "Business",
                   "Finance", "IT", "Marketing", "AI", "HR", "Business",
                   "Finance", "IT", "AI", "HR", "Business", "Marketing", "HR",
                   "Finance", "IT", "Business", "AI", "HR"]
}

# Convert to DataFrame
df = pd.DataFrame(data).fillna(30)  # Fill every missing value with 30 (this affects Salary as well as Age)
# Output the first 10 rows of the DataFrame
print(df.head(10))

Output:

   WorkerID   Age   Salary Department
0         0  25.0  50000.0         HR
1         1  30.0  54000.0    Finance
2         2  35.0     30.0         IT
3         3  30.0  58000.0         HR
4         4  24.0  45000.0         IT
5         5  28.0  60000.0    Finance
6         6  30.0  49000.0         IT
7         7  32.0     30.0         HR
8         8  27.0     30.0    Finance
9         9  30.0  47000.0         HR
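
Because fillna(30) applies to every column, the missing salaries also become 30. As a minimal sketch of an alternative, fillna also accepts a dictionary that maps column names to fill values. The small frame and the default values below are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Small stand-in frame (hypothetical values, not the full data set above)
sample = pd.DataFrame({
    "Age": [25, None, 35],
    "Salary": [50000, 54000, None],
})

# Assumed defaults for illustration: 30 for Age, 40000 for Salary
filled = sample.fillna({"Age": 30, "Salary": 40000})
print(filled)
```

With a per-column mapping, each column gets a fill value that makes sense for it, so no placeholder has to be corrected afterwards.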

Unique values are the distinct values in a column: each value that appears at least once is listed exactly once, in order of first appearance.

# unique() is a Series method, so call it on each column in turn
for i in df.columns:
    print(f"Unique values for column {i}: {df[i].unique()}")

Output:

Unique values for column WorkerID: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30]
Unique values for column Age: [25. 30. 35. 24. 28. 32. 27. 33. 29. 40. 50. 45. 37. 31. 34. 26. 38. 48.
 41. 47. 46. 39. 36.]
Unique values for column Salary: [5.0e+04 5.4e+04 3.0e+01 5.8e+04 4.5e+04 6.0e+04 4.9e+04 4.7e+04 5.5e+04
 5.2e+04 6.4e+04 5.1e+04 5.7e+04 5.3e+04 5.6e+04 5.9e+04 6.2e+04 6.1e+04]
Unique values for column Department: ['HR' 'Finance' 'IT' 'AI' 'Marketing' 'Business']
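
The Salary column now contains 30, a side effect of filling every missing value with 30. Replacing that placeholder with 40000 fixes the column:

df['Salary'] = df['Salary'].replace(30, 40000)  # Replace the placeholder salary 30 with 40000
# unique() is a Series method, so loop over the columns again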
for i in df.columns:
    print(f"Unique values for column {i}: {df[i].unique()}")

Output:

Unique values for column WorkerID: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30]
Unique values for column Age: [25. 30. 35. 24. 28. 32. 27. 33. 29. 40. 50. 45. 37. 31. 34. 26. 38. 48.
 41. 47. 46. 39. 36.]
Unique values for column Salary: [50000. 54000. 40000. 58000. 45000. 60000. 49000. 47000. 55000. 52000.
 64000. 51000. 57000. 53000. 56000. 59000. 62000. 61000.]
Unique values for column Department: ['HR' 'Finance' 'IT' 'AI' 'Marketing' 'Business']
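
When only the number of distinct values matters, or how often each one occurs, a loop is not needed. The sketch below uses nunique() and value_counts() on a small hypothetical frame rather than the full data set above:

```python
import pandas as pd

# Small stand-in frame (hypothetical values chosen for illustration)
sample = pd.DataFrame({
    "Salary": [50000.0, 54000.0, 40000.0, 58000.0, 40000.0, 54000.0],
    "Department": ["HR", "Finance", "IT", "HR", "IT", "Finance"],
})

# nunique() works on the whole frame: one distinct-value count per column
unique_counts = sample.nunique()
print(unique_counts)

# value_counts() shows how often each distinct value occurs in one column
dept_counts = sample["Department"].value_counts()
print(dept_counts)
```

Unlike unique(), nunique() can be called on the whole DataFrame at once, which makes it a quick first check before inspecting individual columns.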