Select a Subset of a DataFrame

This section shows how to retrieve specific rows or columns from a DataFrame.

Import the library:

import pandas as pd

Create a DataFrame from the following data:

data = {
    "WorkerID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Age": [25, None, 35, 30, 24, 28, None, 32, 27, None],
    "Salary": [50000, 54000, None, 58000, 45000, 60000, 49000, None, None, 47000]
}

df = pd.DataFrame(data)
print("DataFrame:\n", df)

Output:

DataFrame:
    WorkerID   Age   Salary
0         1  25.0  50000.0
1         2   NaN  54000.0
2         3  35.0      NaN
3         4  30.0  58000.0
4         5  24.0  45000.0
5         6  28.0  60000.0
6         7   NaN  49000.0
7         8  32.0      NaN
8         9  27.0      NaN
9        10   NaN  47000.0

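The df.values attribute returns a NumPy representation of the DataFrame as a two-dimensional array. Because the columns mix integers and floats (missing values are stored as NaN), every value is cast to float64. This attribute will be useful later.

# df.values returns the DataFrame's data as a 2-D NumPy array
print("\nNumpy representation of DataFrame:\n", df.values)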

Output:

Numpy representation of DataFrame:
 [[1.0e+00 2.5e+01 5.0e+04]
 [2.0e+00     nan 5.4e+04]
 [3.0e+00 3.5e+01     nan]
 [4.0e+00 3.0e+01 5.8e+04]
 [5.0e+00 2.4e+01 4.5e+04]
 [6.0e+00 2.8e+01 6.0e+04]
 [7.0e+00     nan 4.9e+04]
 [8.0e+00 3.2e+01     nan]
 [9.0e+00 2.7e+01     nan]
 [1.0e+01     nan 4.7e+04]]
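The pandas documentation recommends DataFrame.to_numpy() over .values for new code. The following is a minimal sketch of that alternative, assuming a shortened three-row version of the data above; the names arr and same are illustrative only.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative version of the data used above
df = pd.DataFrame({
    "WorkerID": [1, 2, 3],
    "Age": [25, None, 35],
    "Salary": [50000, 54000, None],
})

# to_numpy() returns the same 2-D array as .values for this frame
arr = df.to_numpy()
print(arr.dtype)
print(arr.shape)

# Compare with .values, treating NaN entries in the same position as equal
same = np.array_equal(arr, df.values, equal_nan=True)
print(same)
```

Both calls give a float64 array here, because the integer WorkerID column is upcast to share a dtype with the float columns.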

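Selecting a single column first and then reading its .values attribute returns a one-dimensional NumPy array containing only the 'Age' column:

# df['Age'] selects the column as a Series; .values converts it to a 1-D NumPy array
print("\nNumpy representation of the 'Age' column:\n", df['Age'].values)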
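Beyond converting a column to an array, the common ways of selecting a subset can be sketched as follows. This is an illustrative example on a shortened four-row version of the data, not the only way to do it; the variable names are assumptions introduced here.

```python
import pandas as pd

# Shortened, illustrative version of the data used above
df = pd.DataFrame({
    "WorkerID": [1, 2, 3, 4],
    "Age": [25, None, 35, 30],
    "Salary": [50000, 54000, None, 58000],
})

# A single column label returns a Series
age = df["Age"]

# A list of column labels returns a DataFrame
age_salary = df[["Age", "Salary"]]

# iloc selects rows by integer position; one row comes back as a Series
first_row = df.iloc[0]

# A boolean mask keeps rows where the condition is True;
# comparisons against NaN are False, so rows with a missing Age are dropped
over_28 = df[df["Age"] > 28]

# Drop missing values before converting a column to a NumPy array
known_age_values = df["Age"].dropna().to_numpy()

print(over_28)
print(known_age_values)
```

Single brackets with one label give a Series, while double brackets give a DataFrame, which matters when later code expects one type or the other.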