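How to Merge Two Pandas DataFrames on Index?

Merging two Pandas DataFrames on their index is useful in many data analysis scenarios. For example, you may have two datasets that hold different features or data points but share a common index. Merging the two DataFrames on that index lets you combine the data in a meaningful way.

In this article, we will learn how to merge two Pandas DataFrames on their index in Python. We will walk through each step of the merge and illustrate it with a code example.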

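What Is a DataFrame in Pandas?

The DataFrame is one of the most important data structures in the Pandas library. It is a labeled, two-dimensional data structure whose columns can have different types, similar to a spreadsheet, an SQL table, or a dictionary of Series objects. As the primary data structure in Pandas, it is widely used for data manipulation, data cleaning, and analysis.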

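A DataFrame consists of rows and columns, and each column can have its own data type (for example int, float, or string). Because every row and every column carries a label, the data is easy to access and manipulate. The row labels form the index, and the column labels form the columns.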
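
As a minimal sketch of the structure described above, the following builds a small DataFrame and inspects its labels and column types. The column names and values are made up for illustration only.

```python
import pandas as pd

# A small DataFrame with a string column, an integer column, and a float column.
# The data is hypothetical and only serves to show the structure.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 45, 27], "score": [88.5, 92.0, 79.5]},
    index=["a", "b", "c"],
)

print(df.index.tolist())    # row labels (the index)
print(df.columns.tolist())  # column labels
print(df.dtypes)            # each column has its own dtype
print(df.loc["b", "age"])   # label-based access to a single cell
```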

Steps to Merge Two DataFrames with Pandas in Python

Step 1: Import the modules

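The first step is to import the required libraries. We use pandas, conventionally imported under the alias pd, for data handling and merging. The original setup also imports NumPy; none of the examples in this article call it, so that import is optional. The import statements are:

import pandas as pd
import numpy as np  # optional: not used by the examples below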

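Step 2: Create sample DataFrames

The next step is to create some sample DataFrames to merge. For this example, we create two small DataFrames with hand-written values that share the same index but have different columns. The code to create them is: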

# Creating two DataFrames having the same index
mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['a', 'b', 'c'])
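
Before merging, it can help to confirm that the two DataFrames really do share an index and have no column names in common. The following is a small sanity check along those lines; the variable names same_index and shared_columns are introduced here for illustration.

```python
import pandas as pd

mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['a', 'b', 'c'])

# True when both indexes hold the same labels in the same order
same_index = mydf1.index.equals(mydf2.index)

# Column names that appear in both DataFrames (empty for this sample data)
shared_columns = mydf1.columns.intersection(mydf2.columns).tolist()

print(same_index)
print(shared_columns)
```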

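Step 3: Merge on the index

Next, we merge the two DataFrames on their index. To do this, we call the merge() function with both the left_index and right_index arguments set to True, which tells Pandas to match rows by index label rather than by a column.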

# Merge DataFrames on index
merged_df = pd.merge(mydf1, mydf2, left_index=True, right_index=True)
print(merged_df)

In the merged DataFrame, the columns from both DataFrames are combined, and the data points are matched on the common index.

Step 4: Merge DataFrames with different indexes

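If the two DataFrames have different indexes, we can still merge them on the index with the join() method. join() matches rows on the index by default, so no on argument is needed (on refers to a column or index level of the calling DataFrame). Its default is how='left', which keeps only the rows of the calling DataFrame; setting how='outer' keeps every row from both DataFrames and fills the missing values with NaN.

# Creating two DataFrames with partly different indexes
mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['b', 'c', 'd'])

# Merging the DataFrames on index using join(); how='outer' keeps labels 'a' through 'd'
mymerged_df = mydf1.join(mydf2, how='outer')

# Print the merged DataFrame
print(mymerged_df)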
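
To make the effect of the how argument concrete, the sketch below joins two frames with partly different indexes using three of the join types and records which index labels survive each time. The sample values and the rows dictionary are introduced here for illustration.

```python
import pandas as pd

mydf1 = pd.DataFrame({"First": [10, 20, 30]}, index=["a", "b", "c"])
mydf2 = pd.DataFrame({"Third": [70, 80, 90]}, index=["b", "c", "d"])

# Collect the surviving index labels for each join type
rows = {}
for how in ("left", "inner", "outer"):
    result = mydf1.join(mydf2, how=how)
    rows[how] = result.index.tolist()
    print(how, rows[how])
```

With how="left" (the default) only the labels of mydf1 are kept, with how="inner" only the labels common to both, and with how="outer" the union of the two indexes.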

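That's it! Now let's look at some complete examples that merge two Pandas DataFrames on their index using different methods.

Example 1: Using the merge() function

In this example, we use the merge() function to merge two DataFrames on their index. We create two DataFrames, mydf1 and mydf2, that share the same index, and then call merge() with left_index=True and right_index=True. The result is stored in mymerged_df, which contains the columns from both DataFrames, aligned on their index.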

import pandas as pd

# Creating two DataFrames having the same index
mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['a', 'b', 'c'])

# Merging the DataFrames on index using merge() function
mymerged_df = pd.merge(mydf1, mydf2, left_index=True, right_index=True)

# Print the merged DataFrame
print(mymerged_df)

Output

   First  Second  Third  Four
a     10      40     70   100
b     20      50     80   110
c     30      60     90   120

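Example 2: Using the join() method

In this example, we use the join() method to merge two DataFrames on their index. We create two DataFrames, mydf1 and mydf2, that share the same index, and then call mydf1.join(mydf2). The result is stored in mymerged_df, which contains the columns from both DataFrames, aligned on their index. Because the indexes match, no values are missing. If the indexes differed, join() would by default (how='left') keep the rows of mydf1 and fill the columns coming from mydf2 with NaN wherever a label has no match.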

import pandas as pd

# Creating two DataFrames having the same index
mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['a', 'b', 'c'])

# Merging the DataFrames on index using join() function
mymerged_df = mydf1.join(mydf2)

# Print the merged DataFrame
print(mymerged_df)

Output

   First  Second  Third  Four
a     10      40     70   100
b     20      50     80   110
c     30      60     90   120
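
One practical difference between join() and merge() shows up when both DataFrames contain a column with the same name. The sketch below, using hypothetical frames sales_q1 and sales_q2, illustrates that join() needs explicit suffixes in that case, while merge() falls back to its default suffixes.

```python
import pandas as pd

# Hypothetical frames that both have a column named 'Units'
sales_q1 = pd.DataFrame({"Units": [10, 20, 30]}, index=["a", "b", "c"])
sales_q2 = pd.DataFrame({"Units": [15, 25, 35]}, index=["a", "b", "c"])

# join() raises ValueError on overlapping column names unless suffixes are given
joined = sales_q1.join(sales_q2, lsuffix="_q1", rsuffix="_q2")
print(joined.columns.tolist())

# merge() applies its default suffixes ('_x', '_y') to the overlapping names
merged = pd.merge(sales_q1, sales_q2, left_index=True, right_index=True)
print(merged.columns.tolist())
```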

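Example 3: Using the concat() function

In this example, we use the concat() function to merge two DataFrames on their index. Passing axis=1 concatenates the DataFrames horizontally, placing their columns side by side while aligning the rows by index label. The resulting DataFrame, mymerged_df, contains the columns from both DataFrames, aligned on their index.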

import pandas as pd

# Creating two DataFrames having the same index
mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['a', 'b', 'c'])

# Merging the DataFrames on index using concat() function
mymerged_df = pd.concat([mydf1, mydf2], axis=1)

# Print the merged DataFrame
print(mymerged_df)

Output

   First  Second  Third  Four
a     10      40     70   100
b     20      50     80   110
c     30      60     90   120
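
Unlike merge(), concat() accepts a list with any number of DataFrames, and its join argument controls what happens when the indexes differ. The following sketch uses three hypothetical frames with partly overlapping indexes to contrast the default join="outer" with join="inner".

```python
import pandas as pd

# Three hypothetical frames; only the label 'c' appears in all of them
df_a = pd.DataFrame({"First": [10, 20, 30]}, index=["a", "b", "c"])
df_b = pd.DataFrame({"Third": [70, 80, 90]}, index=["b", "c", "d"])
df_c = pd.DataFrame({"Fifth": [1, 2, 3]}, index=["c", "d", "e"])

# Default join='outer': union of all index labels, NaN where a frame has no row
union = pd.concat([df_a, df_b, df_c], axis=1)

# join='inner': only the labels present in every frame
common = pd.concat([df_a, df_b, df_c], axis=1, join="inner")

print(union)
print(common)
```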

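Conclusion

In this article, we learned how to combine two Pandas DataFrames on their index. This kind of merge is useful whenever two datasets share a common index but hold different features or data points. Pandas makes it straightforward through functions such as merge(), join(), and concat(): the columns of both DataFrames end up in a single DataFrame, with rows aligned on the index. A DataFrame is a labeled, two-dimensional data structure with rows and columns, in which each column can have its own data type and both rows and columns carry labels.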
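
As a closing check, the sketch below runs all three approaches on the sample data used throughout this article and compares the results with DataFrame.equals(). The variable names via_merge, via_join, and via_concat are introduced here for illustration.

```python
import pandas as pd

mydf1 = pd.DataFrame({'First': [10, 20, 30], 'Second': [40, 50, 60]}, index=['a', 'b', 'c'])
mydf2 = pd.DataFrame({'Third': [70, 80, 90], 'Four': [100, 110, 120]}, index=['a', 'b', 'c'])

# The same index-based merge expressed three ways
via_merge = pd.merge(mydf1, mydf2, left_index=True, right_index=True)
via_join = mydf1.join(mydf2)
via_concat = pd.concat([mydf1, mydf2], axis=1)

# Compare the results pairwise
print(via_merge.equals(via_join))
print(via_join.equals(via_concat))
```

The three calls are interchangeable only when the indexes match and are unique; once the indexes differ, their default join types (inner for merge(), left for join(), outer for concat()) lead to different results.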

Tarun Singh

Updated on: 2023-07-31
