Python Pandas - Get Unique Values from a Column

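There are several ways to work with the unique values of a DataFrame column in Python Pandas, including **unique()** and **nunique()**. Pandas is a Python library used mainly for data analysis and manipulation, and finding the unique values in a DataFrame column is a common task.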

Some of the methods for getting unique values from a column are listed below:

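unique(): Returns the unique values of a Series (for example, a single DataFrame column) as a NumPy array, or as a pandas extension array for extension dtypes.
drop_duplicates(): Returns a DataFrame or Series with the duplicate values removed.
nunique(): Returns the number of unique values in a Series or in each column of a DataFrame.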
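
As a quick orientation, the sketch below applies all three methods to the same column and prints what each one returns. The column name `City` and its values are made up for illustration and do not come from the examples in this article.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
cities = pd.Series(['Delhi', 'Pune', 'Delhi', 'Chennai'], name='City')

as_array = cities.unique()            # array of distinct values, in order of first appearance
as_series = cities.drop_duplicates()  # Series that keeps the original index labels
count = cities.nunique()              # number of distinct values

print(list(as_array))
print(as_series.index.tolist())
print(count)
```

Which one to use depends on what comes next: unique() is convenient when only the values matter, drop_duplicates() when the index labels or the other columns are still needed, and nunique() when only the count is required.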

Using the 'unique()' method

'unique()' returns a NumPy array containing the unique values, and it is efficient for finding the unique values in a single column.

Import the library

import pandas as pd

Create the DataFrame

# Create a DataFrame
data = {
    'Name': ['Robert', 'Naveen', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
}

df = pd.DataFrame(data)

Example

In the code below, the unique() method eliminates the duplicates and returns each distinct entry once.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Robert', 'Naveen', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
}

df = pd.DataFrame(data)

# Get unique values from 'Name' column
unique_names = df['Name'].unique()

print("Unique Names:", unique_names)

Output

Unique Names: ['Robert' 'Naveen' 'Charlie' 'Kumar']
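
Two details of unique() are easy to miss: the result follows the order of first appearance rather than being sorted, and a missing value is reported as one of the unique values. The sketch below illustrates both on a hypothetical numeric column named `scores`, which is not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical column with missing values, for illustration only
scores = pd.Series([3.0, 1.0, np.nan, 3.0, 2.0, np.nan])

# Order of first appearance is kept; NaN shows up once
values = scores.unique()
print(values)

# Drop missing values first if they should not count as a value
clean = scores.dropna().unique()
print(clean)

# Sort explicitly when sorted output is needed
sorted_values = np.sort(clean)
print(sorted_values)
```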

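Using the 'drop_duplicates()' method

This method returns a DataFrame or Series with the duplicate values removed. drop_duplicates() is mainly used to remove duplicates based on several columns or on entire rows.

Syntax

In the syntax below, subset limits the duplicate check to the given column(s), and keep determines which occurrence is retained when duplicates exist: 'first' (keep the first occurrence), 'last' (keep the last occurrence), or False (drop all duplicated rows).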

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)

Example

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Robert', 'John', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
})

# Removing duplicates from the 'Name' column, keeping the first occurrence
unique_df = df.drop_duplicates(subset='Name', keep='first')

print(unique_df)

Following is the output:

	Name	Age
0	Robert	26
1	John	30
2	Charlie	35
4	Kumar	40
5	Naveen	30
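
The example above only exercises keep='first'. As a minimal sketch reusing the same DataFrame, the other two keep options and the Series form of drop_duplicates() behave as follows. The variable names `last_kept`, `no_dupes` and `unique_ages` are chosen here for illustration.

```python
import pandas as pd

# Same data as the example above
df = pd.DataFrame({
    'Name': ['Robert', 'John', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
})

# keep='last': the later 'Robert' row (index 3) survives, index 0 is dropped
last_kept = df.drop_duplicates(subset='Name', keep='last')
print(last_kept.index.tolist())

# keep=False: every row whose 'Name' occurs more than once is dropped
no_dupes = df.drop_duplicates(subset='Name', keep=False)
print(no_dupes['Name'].tolist())

# On a single column, drop_duplicates() returns a Series of that column's unique values
unique_ages = df['Age'].drop_duplicates()
print(unique_ages.tolist())
```

Unlike unique(), the Series returned in the last step keeps the original index labels, which can be useful for locating the first occurrence of each value.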

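Using the 'nunique()' method

The nunique() method counts the distinct values. Called on a Series (a single DataFrame column), it returns an integer; called on a DataFrame, it returns a Series holding the count for each column. It is an efficient way to get the number of unique entries.

Example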

import pandas as pd

# Create a DataFrame from the dictionary
df = pd.DataFrame({
    'Name': ['Robert', 'John', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
})

# Counting unique values from the 'Name' column
unique_name_count = df['Name'].nunique()

print(f"Num of Unique Names: {unique_name_count}")

# Counting unique values for each column in the DataFrame
unique_counts = df.nunique()

print(unique_counts)

Output

Num of Unique Names: 5
Name    5
Age     4
dtype: int64
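
One point worth knowing is how nunique() treats missing values: by default they are not counted, which is controlled by the dropna parameter. The sketch below uses a small hypothetical DataFrame with one missing name, which is not part of the example above.

```python
import pandas as pd

# Hypothetical data with a missing entry, for illustration only
df = pd.DataFrame({
    'Name': ['Robert', 'John', None, 'Robert'],
    'Age': [26, 30, 35, 26]
})

# Missing values are ignored by default (dropna=True)
default_count = df['Name'].nunique()
print(default_count)

# dropna=False counts the missing value as one more distinct value
with_missing = df['Name'].nunique(dropna=False)
print(with_missing)

# unique() keeps the missing value, so its length matches the dropna=False count
unique_len = len(df['Name'].unique())
print(unique_len)
```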

SaiKrishna Tavva

Updated on: 23 September 2024
