Python Pandas - Stacking and Unstacking

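Stacking and unstacking in Pandas are techniques for reshaping a DataFrame so that the same data can be viewed and analyzed in different layouts. They also work well with multi-level (hierarchical) indexes. Whether you are compressing columns into a row level or expanding a row level into columns, these operations are essential for working with complex datasets.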

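The Pandas library provides two main methods for this purpose: stack() and unstack(). In this tutorial, we will learn how stacking and unstacking work in Pandas, with examples that include handling missing data.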

Stacking in Pandas

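Stacking in Pandas is the process of compressing DataFrame columns into rows. The DataFrame.stack() method pivots a level of the (possibly hierarchical) column labels into the row index and returns a new DataFrame or Series with a multi-level index.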

Example

The following example uses the df.stack() method to pivot the columns into the row index.

import pandas as pd
import numpy as np

# Create MultiIndex
tuples = [["x", "x", "y", "y", "", "f", "z", "z"],["1", "2", "1", "2", "1", "2", "1", "2"]]
index = pd.MultiIndex.from_arrays(tuples, names=["first", "second"])

# Create a DataFrame
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=["A", "B"])

# Display the input DataFrame
print('Input DataFrame:\n', df)

# Stack columns
stacked = df.stack()

print('Output Reshaped DataFrame:\n', stacked)

The output of the above code is as follows:

Input DataFrame:
                     A         B
first second                    
x     1       0.596485 -1.356041
      2      -1.091407  0.246216
y     1       0.499328 -1.346817
      2      -0.893557  0.014678
      1      -0.059916  0.106597
f     2      -0.315096 -0.950424
z     1       1.050350 -1.744569
      2      -0.255863  0.539803

Output Reshaped DataFrame:
first  second   
x      1       A    0.596485
               B   -1.356041
       2       A   -1.091407
               B    0.246216
y      1       A    0.499328
               B   -1.346817
       2       A   -0.893557
               B    0.014678
       1       A   -0.059916
               B    0.106597
f      2       A   -0.315096
               B   -0.950424
z      1       A    1.050350
               B   -1.744569
       2       A   -0.255863
               B    0.539803
dtype: float64

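Here, the stack() method pivots columns A and B into the index, compressing the DataFrame into a long format. Because the input has only a single level of columns, the result is a Series, which is why the output ends with dtype: float64.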
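To make the reshaping easier to verify than with random numbers, here is a minimal sketch that uses a small fixed dataset. The index labels and values are hypothetical, chosen only for illustration; the calls themselves are the same DataFrame.stack() shown above.

```python
import pandas as pd

# Hypothetical, deterministic data (not from the example above)
index = pd.MultiIndex.from_arrays(
    [["x", "x", "y", "y"], ["1", "2", "1", "2"]],
    names=["first", "second"],
)
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=index,
)

stacked = df.stack()

# With a single level of columns, stacking yields a Series
print(type(stacked).__name__)

# The former column labels become the innermost index level,
# so the index now has three levels: first, second, and A/B
print(stacked.index.nlevels)

# Each value is addressed by (first, second, former column label)
print(stacked.loc[("x", "2", "B")])
```

Each of the 4 rows contributes one entry per column, so the stacked result holds 8 values in total.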

Unstacking in Pandas

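Unstacking reverses the stacking operation by moving a row index level back into the columns. The DataFrame.unstack() method pivots a level of the row index into columns, which is useful for converting a long-format DataFrame into a wide format.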

Example

The following example demonstrates how the df.unstack() method works when unstacking a DataFrame.

import pandas as pd
import numpy as np

# Create MultiIndex
tuples = [["x", "x", "y", "y", "", "f", "z", "z"],["1", "2", "1", "2", "1", "2", "1", "2"]]
index = pd.MultiIndex.from_arrays(tuples, names=["first", "second"])

# Create a DataFrame
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=["A", "B"])

# Display the input DataFrame
print('Input DataFrame:\n', df)

# Unstack the DataFrame
unstacked = df.unstack()

print('Output Reshaped DataFrame:\n', unstacked)

The output of the above code is as follows:

Input DataFrame:
                     A         B
first second                    
x     1      -0.407537 -0.957010
      2       0.045479  0.789849
y     1       0.751488 -0.474536
      2      -1.043122 -0.015152
      1      -0.133349  1.094900
f     2       1.681111  2.480652
z     1       0.283679  0.769553
      2      -2.034907  0.301275

Output Reshaped DataFrame:
                A                   B          
second         1         2         1         2
first                                         
       -0.133349       NaN  1.094900       NaN
f            NaN  1.681111       NaN  2.480652
x      -0.407537  0.045479 -0.957010  0.789849
y       0.751488 -1.043122 -0.474536 -0.015152
z       0.283679 -2.034907  0.769553  0.301275
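By default, unstack() pivots the innermost index level, which is why the "second" level moved into the columns above. The sketch below, using hypothetical fixed data, shows how the level argument selects a different level by name, and that stacking followed by unstacking is expected to give back the original layout when no labels are missing.

```python
import pandas as pd

# Hypothetical, deterministic data chosen for illustration
index = pd.MultiIndex.from_arrays(
    [["x", "x", "y", "y"], ["1", "2", "1", "2"]],
    names=["first", "second"],
)
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]}, index=index)

# Default: the innermost index level ("second") moves into the columns
by_second = df.unstack()
print(list(by_second.columns.names))

# Select the outer level by name instead
by_first = df.unstack(level="first")
print(list(by_first.columns.names))

# Rows are now labelled by "second"; columns by (original column, first)
print(by_first.loc["1", ("A", "y")])

# Stack the columns, then unstack them again
round_trip = df.stack().unstack()
print(round_trip)
```

Passing a level name rather than a position keeps the call readable when the index has several levels.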

Handling Missing Data During Unstacking

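When the subgroups of the DataFrame being reshaped do not share the same set of labels, unstacking can produce missing values. By default, Pandas fills these positions with NaN, but you can specify a custom fill value with the fill_value parameter.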

Example

This example demonstrates how to handle missing values when unstacking a DataFrame.

import pandas as pd
import numpy as np

# Create Data
index = pd.MultiIndex.from_product([["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"])
columns = pd.MultiIndex.from_tuples([("A", "cat"), ("B", "dog"), ("B", "cat"), ("A", "dog")], names=["exp", "animal"])

df = pd.DataFrame(np.random.randn(8, 4), index=index, columns=columns)

# Select a subset of rows and columns so that some (first, second) pairs are missing
df3 = df.iloc[[0, 1, 4, 7], [1, 2]]

print(df3)

# Unstack the DataFrame
unstacked = df3.unstack()

# Display the Unstacked DataFrame
print("Unstacked DataFrame without Filling:\n",unstacked)

unstacked_filled = df3.unstack(fill_value=1)
print("Unstacked DataFrame with Filling:\n",unstacked_filled)

The output of the above code is as follows:

exp                  B          
animal             dog       cat
first second                    
bar   one    -0.556587 -0.157084
      two     0.109060  0.856019
foo   one    -1.034260  1.548955
qux   two    -0.644370 -1.871248

Unstacked DataFrame without Filling:
exp            B                             
animal       dog                cat          
second       one      two       one       two
first                                        
bar    -0.556587  0.10906 -0.157084  0.856019
foo    -1.034260      NaN  1.548955       NaN
qux          NaN -0.64437       NaN -1.871248

Unstacked DataFrame with Filling:
exp            B                             
animal       dog                cat          
second       one      two       one       two
first                                        
bar    -0.556587  0.10906 -0.157084  0.856019
foo    -1.034260  1.00000  1.548955  1.000000
qux     1.000000 -0.64437  1.000000 -1.871248
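The same behaviour can be reproduced on a smaller scale. The following sketch assumes a hypothetical three-row Series in which the pair ("b", "two") is deliberately absent, so exactly one cell of the unstacked result has no source value.

```python
import pandas as pd

# Hypothetical data: the pair ("b", "two") is deliberately left out
index = pd.MultiIndex.from_tuples(
    [("a", "one"), ("a", "two"), ("b", "one")],
    names=["first", "second"],
)
s = pd.Series([1, 2, 3], index=index)

# Default behaviour: the missing combination becomes NaN
default = s.unstack()
print(default)
print(int(default.isna().sum().sum()))  # number of missing cells

# fill_value replaces only the cells created by the reshape
filled = s.unstack(fill_value=0)
print(filled)
print(filled.loc["b", "two"])
```

Note that fill_value applies only to the gaps introduced by unstacking. Missing values that were already present in the data before reshaping are a separate concern and can be handled afterwards, for example with fillna().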
