Python Pandas - Boolean Indexing

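Boolean indexing in Pandas is an effective technique for filtering data based on specific conditions. It lets us build a mask, or filter, that extracts the subset of data meeting those conditions. Applying a condition to a Pandas data structure returns an object of the same shape that holds True or False for each element. These boolean values can then be used to filter a DataFrame or Series, giving selective access to the data that satisfies the condition.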

In this tutorial, we will learn how to create a boolean index and use it with .loc and .iloc.

Creating a Boolean Index

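A boolean index is created by applying a conditional expression to a DataFrame or Series object. For example, if you specify a condition that checks whether the values in a column are greater than a particular number, Pandas returns a Series of True and False values, and that Series is the boolean index.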

Example: Creating a Boolean Index

The following example demonstrates how to create a boolean index from a condition.

import pandas as pd

# Create a Pandas DataFrame
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# Display the DataFrame
print("Input DataFrame:\n", df)

# Create Boolean Index
result = df > 2

print('Boolean Index:\n', result)

Following is the output of the above code:

Input DataFrame:
    A  B
0  1  2
1  3  4
2  5  6
Boolean Index:
        A      B
0  False  False
1   True   True
2   True   True
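
A condition applied to a single column yields a boolean Series rather than a boolean DataFrame. The short sketch below reuses the DataFrame from the example above; the variable name mask is only illustrative. It shows the resulting Series and one way to count how many rows satisfy the condition.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# Comparing one column produces a boolean Series aligned with the row index
mask = df['A'] > 2

print(mask.dtype)        # bool
print(mask.tolist())     # [False, True, True]

# True counts as 1, so sum() gives the number of rows that satisfy the condition
print(int(mask.sum()))   # 2
```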

Filtering Data with a Boolean Index

Once we have a boolean index, we can use it to filter the rows or columns of a DataFrame. This is done with .loc[] for label-based indexing and .iloc[] for position-based indexing.

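Example: Filtering Data with a Boolean Index Using .loc

The following example demonstrates how to filter data with a boolean index using .loc. Here .loc filters the rows by the boolean index and selects the column by its label.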

import pandas as pd

# Create a Pandas DataFrame
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# Display the DataFrame
print("Input DataFrame:\n", df)

# Create Boolean Index
s = (df['A'] > 2)

# Filter DataFrame using the Boolean Index with .loc
print('Output Filtered DataFrame:\n',df.loc[s, 'B'])

Following is the output of the above code:

Input DataFrame:
    A  B
0  1  2
1  3  4
2  5  6
Output Filtered DataFrame:
 1    4
2    6
Name: B, dtype: int64
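
The column selector passed to .loc determines the shape of the result. As a minimal sketch using the same DataFrame (the names col_b, cols_ab, and all_cols are illustrative), a single label returns a Series, a list of labels returns a DataFrame, and omitting the selector keeps every column.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])
s = df['A'] > 2

# A single column label returns a Series
col_b = df.loc[s, 'B']

# A list of labels returns a DataFrame
cols_ab = df.loc[s, ['A', 'B']]

# Omitting the column selector keeps every column of the matching rows
all_cols = df.loc[s]

print(type(col_b).__name__, type(cols_ab).__name__, all_cols.shape)
```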

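Filtering Data with a Boolean Index Using .iloc

Similar to the approach above, .iloc can be used with a boolean index, but it selects by position rather than by label.

Example: Using .iloc with a Boolean Index

This example uses .iloc for position-based indexing. Because .iloc does not accept a boolean Series directly, the boolean index is first converted to an array with the .values attribute. The DataFrame can then be filtered in the same way as with .loc.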

import pandas as pd

# Create a Pandas DataFrame
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# Display the DataFrame
print("Input DataFrame:\n", df)

# Create Boolean Index
s = (df['A'] > 2)

# Filter data using .iloc and the Boolean Index
print('Output Filtered Data:\n',df.iloc[s.values, 1])

Following is the output of the above code:

Input DataFrame:
    A  B
0  1  2
1  3  4
2  5  6
Output Filtered Data:
 1    4
2    6
Name: B, dtype: int64
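
As a quick check, the sketch below compares the .iloc result with the .loc result from the earlier example. It uses Series.to_numpy() in place of .values, which is a substitution made here for illustration and not part of the original example; the names by_position and by_label are also illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])
s = df['A'] > 2

# Convert the boolean Series to a plain NumPy array before passing it to .iloc
by_position = df.iloc[s.to_numpy(), 1]   # the column at position 1 is 'B'

# The label-based equivalent
by_label = df.loc[s, 'B']

# Both selections contain the same rows and values
print(by_position.equals(by_label))
```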

Advanced Boolean Indexing

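Pandas supports more complex boolean indexing by combining multiple conditions with the operators & (and), | (or), and ~ (not). These conditions can also span different columns to build highly specific filters.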

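Example: Multiple Conditions Across Columns

The following example demonstrates how to apply boolean indexing with multiple conditions across columns.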

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 3, 5, 7],'B': [5, 2, 8, 4],'C': ['x', 'y', 'x', 'z']})

# Display the DataFrame
print("Input DataFrame:\n", df)

# Apply multiple conditions using boolean indexing
result = df.loc[(df['A'] > 2) & (df['B'] < 5), 'A':'C']

print('Output Filtered DataFrame:\n',result)

Following is the output of the above code:

Input DataFrame:
    A  B  C
0  1  5  x
1  3  2  y
2  5  8  x
3  7  4  z
Output Filtered DataFrame:
    A  B  C
1  3  2  y
3  7  4  z
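
The example above uses only &. The sketch below applies | and ~ to the same DataFrame; the conditions and the names either and not_x are chosen purely for illustration. Each comparison is wrapped in parentheses because &, | and ~ bind more tightly than comparison operators in Python.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7], 'B': [5, 2, 8, 4], 'C': ['x', 'y', 'x', 'z']})

# OR: keep rows where A is greater than 5 or C equals 'x'
either = df.loc[(df['A'] > 5) | (df['C'] == 'x')]

# NOT: keep rows where C is anything other than 'x'
not_x = df.loc[~(df['C'] == 'x')]

print(either.index.tolist())  # [0, 2, 3]
print(not_x.index.tolist())   # [1, 3]
```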
