Python - Stemming Algorithms
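In natural language processing, we often encounter two or more words that share the same root. For example, the three words agreed, agreeing, and agreeable all share the root word agree. A search involving any of these words should treat them as the same word, namely the root. It is therefore essential to link every word to its root. The NLTK library provides methods that perform this linking and output the root word.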
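To make the idea of linking words to a shared root concrete, here is a minimal sketch of rule-based suffix stripping. The `RULES` table and the `toy_stem` function are hypothetical names invented for this illustration; this is not the Porter, Lancaster, or Snowball algorithm, only a toy that shows the general principle.

```python
# Hypothetical toy stemmer for illustration only.
# It is NOT one of NLTK's algorithms; real stemmers use many more rules.

# (suffix, replacement) pairs, tried in order; "eed" must come before "ed".
RULES = [("eed", "ee"), ("ed", ""), ("ing", ""), ("able", ""), ("s", "")]


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix, replacement in RULES:
        base = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(base) >= min_stem_len:
            return base + replacement
    return word  # no rule applied


for w in ["agreed", "agreeing", "agreeable"]:
    print(w, "->", toy_stem(w))
```

With these rules, all three example words reduce to the same string, so a search index keyed on that string would treat them as one term. Real stemmers do not always agree this neatly, which is why the algorithms compared below give slightly different results.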
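NLTK offers three commonly used stemming algorithms: Porter, Lancaster, and Snowball. They give slightly different results. The following example shows how to use all three and prints the stem each one produces.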
import nltk
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem import SnowballStemmer

# word_tokenize needs NLTK's tokenizer data. If it is missing, download it once
# with nltk.download('punkt') (newer NLTK releases may ask for 'punkt_tab').
porter_stemmer = PorterStemmer()
lanca_stemmer = LancasterStemmer()
sb_stemmer = SnowballStemmer("english")
word_data = "Aging head of famous crime family decides to transfer his position to one of his subalterns"
# First, split the sentence into word tokens
nltk_tokens = nltk.word_tokenize(word_data)
# Next, find the stem of each word with each algorithm
print('***PorterStemmer****\n')
for w_port in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_port, porter_stemmer.stem(w_port)))
print('\n***LancasterStemmer****\n')
for w_lanca in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_lanca, lanca_stemmer.stem(w_lanca)))
print('\n***SnowballStemmer****\n')
for w_snow in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_snow, sb_stemmer.stem(w_snow)))
When we run the above program, we get the following output:
***PorterStemmer****

Actual: Aging || Stem: age
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: famou
Actual: crime || Stem: crime
Actual: family || Stem: famili
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transfer
Actual: his || Stem: hi
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: one
Actual: of || Stem: of
Actual: his || Stem: hi
Actual: subalterns || Stem: subaltern

***LancasterStemmer****

Actual: Aging || Stem: ag
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: fam
Actual: crime || Stem: crim
Actual: family || Stem: famy
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transf
Actual: his || Stem: his
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: on
Actual: of || Stem: of
Actual: his || Stem: his
Actual: subalterns || Stem: subaltern

***SnowballStemmer****

Actual: Aging || Stem: age
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: famous
Actual: crime || Stem: crime
Actual: family || Stem: famili
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transfer
Actual: his || Stem: his
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: one
Actual: of || Stem: of
Actual: his || Stem: his
Actual: subalterns || Stem: subaltern
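The search use case mentioned at the start of this chapter can be sketched by indexing words under their stems. The helper names `build_stem_index`, `search`, and `demo_with_nltk` are hypothetical and not part of NLTK; the helpers accept any stemming function, and the demo assumes NLTK is installed and uses its `SnowballStemmer`.

```python
from collections import defaultdict


def build_stem_index(words, stem):
    """Map each stem to the distinct original words that reduce to it."""
    index = defaultdict(list)
    for word in words:
        key = stem(word)
        if word not in index[key]:
            index[key].append(word)
    return dict(index)


def search(index, query, stem):
    """Return every indexed word that shares the query's stem."""
    return index.get(stem(query), [])


def demo_with_nltk():
    # Imported here so the helpers above also work without NLTK installed.
    from nltk.stem import SnowballStemmer

    stem = SnowballStemmer("english").stem
    words = "Aging head of famous crime family decides to transfer his position".split()
    index = build_stem_index(words, stem)
    # A query in a different inflected form is looked up by its stem.
    print(search(index, "decided", stem))
```

Calling `demo_with_nltk()` looks up an inflected form that does not appear in the sentence. Because the lookup goes through the stem rather than the surface form, any indexed word that the stemmer reduces to the same stem as the query is returned.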