Find the frequency of each word in a string in Python

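As part of text analysis, we often need to count words and assign them weights so that various algorithms can process them. In this article, we will see how to find the frequency of each word in a given sentence, using three different approaches.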

Using Counter

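We can use Counter() from the collections module to get word frequencies. Here, we first apply split() to break the line into words and then call most_common(), which returns (word, count) pairs sorted from most to least frequent.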

Example

from collections import Counter
line_text = "Learn and practice and learn to practice"
# Split on whitespace, count each word, and sort by count (highest first)
freq = Counter(line_text.split()).most_common()
print(freq)

Running the above code gives the following result:

[('and', 2), ('practice', 2), ('Learn', 1), ('learn', 1), ('to', 1)]
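In the output above, "Learn" and "learn" are counted as different words because split() does not change case. A minimal sketch, assuming you want case-insensitive counts, lowercases the text before counting; the helper name word_frequencies is hypothetical and not part of the original example:

```python
from collections import Counter

def word_frequencies(text):
    """Return (word, count) pairs, most frequent first, ignoring case."""
    # Lowercase first so "Learn" and "learn" are counted as the same word
    words = text.lower().split()
    return Counter(words).most_common()

line_text = "Learn and practice and learn to practice"
freq = word_frequencies(line_text)
print(freq)
# [('learn', 2), ('and', 2), ('practice', 2), ('to', 1)]
```

Punctuation is not stripped in this sketch, so a token such as "practice." would still be counted separately from "practice".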

Using FreqDist()

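The Natural Language Toolkit (NLTK) provides the FreqDist class, which reports the total number of words in a string (outcomes) and the number of distinct words (samples). Calling most_common() on it gives the frequency of each word.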

Example

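from nltk import FreqDist  # third-party package: pip install nltk
text = "Learn and practice and learn to practice"
words = text.split()
fdist1 = FreqDist(words)
# Summary: distinct words (samples) and total words (outcomes)
print(fdist1)
# (word, count) pairs, most frequent first
print(fdist1.most_common())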

Running the above code gives the following result:

<FreqDist with 5 samples and 7 outcomes>
[('and', 2), ('practice', 2), ('Learn', 1), ('learn', 1), ('to', 1)]
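FreqDist is a subclass of collections.Counter, so the two numbers in its summary line can be reproduced with the standard library alone: "samples" is the number of distinct words and "outcomes" is the total number of words. A minimal sketch, assuming NLTK is not available:

```python
from collections import Counter

text = "Learn and practice and learn to practice"
counts = Counter(text.split())

samples = len(counts)            # distinct words, as reported by FreqDist.B()
outcomes = sum(counts.values())  # total words, as reported by FreqDist.N()
print(f"{samples} samples and {outcomes} outcomes")  # 5 samples and 7 outcomes
```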

Using a dictionary

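In this approach, we split the line into a list of words. We then call count() on that list to get the frequency of each word and zip each word with its frequency. Passing the pairs to dict() displays the final result as a dictionary, in which repeated words collapse into a single key.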

Example


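text = "Learn and practice and learn to practice"
# Split the line into a list of words
words = text.split()
# Count occurrences of each word (count() rescans the whole list every time)
wfreq = [words.count(w) for w in words]
# Pair each word with its count; duplicate words collapse into one key
print(dict(zip(words, wfreq)))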

Running the above code gives the following result:

{'Learn': 1, 'and': 2, 'practice': 2, 'learn': 1, 'to': 1}
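Because count() rescans the whole word list once for every word, the approach above takes time proportional to the square of the number of words. A single-pass alternative (a different technique from the one above, not the original author's method) increments each word's entry in a dictionary using dict.get(); the function name count_words is hypothetical:

```python
def count_words(text):
    """Count each word in a single pass over the split text."""
    freq = {}
    for word in text.split():
        # get() returns 0 the first time a word is seen
        freq[word] = freq.get(word, 0) + 1
    return freq

text = "Learn and practice and learn to practice"
print(count_words(text))
# {'Learn': 1, 'and': 2, 'practice': 2, 'learn': 1, 'to': 1}
```

Since dictionaries preserve insertion order in Python 3.7 and later, the words appear in the order they first occur, matching the output of the count() version.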

Pradeep Elance

Updated on: December 20, 2019
