Python - Text Summarization
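Text summarization produces a short summary of a larger body of text that conveys, to some extent, what the full text is about. In the following example, we use the summarize function from the gensim module for this purpose. Install the following package. Note that the gensim.summarization module used in these examples was removed in gensim 4.0, so it is only available in earlier releases (for example, pip install gensim==3.8.3).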
pip install gensim_sum_ext
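The following paragraph describes the plot of a movie. Passing this text to the summarize function returns the few sentences that best summarize it.
# Requires gensim < 4.0 (gensim.summarization was removed in 4.0)
from gensim.summarization import summarize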
text = "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones " + \
"daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando)," + \
"the head of the Corleone Mafia family, is known to friends and associates as Godfather. " + \
"He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors " + \
"because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding " + \
" day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician " + \
"and acquaintance of the Don, whose daughter was brutally beaten by two young men because she" + \
"refused their advances; the men received minimal punishment from the presiding judge. " + \
"The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's" + \
"nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, " + \
"a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees " + \
"to have his men punish the young men responsible (in a non-lethal manner) in return for " + \
"future service if necessary."
print(summarize(text))
When we run the above program, we get the following output:
He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding day.
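gensim documents its summarizer as a variation of the TextRank algorithm: sentences become nodes in a graph, edges are weighted by sentence similarity, and PageRank picks the most central sentences. To show that idea without a pre-4.0 gensim install, here is a minimal, standard-library-only sketch. It is not gensim's implementation: it assumes a naive regex sentence splitter, a small hand-picked stopword list, and the word-overlap similarity from the original TextRank paper (gensim uses a BM25-based weighting instead). The function names are hypothetical, and `ratio=0.2` simply mirrors gensim's default of keeping about a fifth of the sentences.

```python
import math
import re

# Illustrative stopword list (an assumption; not gensim's list).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "he", "his",
             "in", "is", "of", "on", "she", "the", "to", "who", "whose", "with"}


def split_sentences(text):
    """Naively split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def similarity(a, b):
    """Original TextRank overlap similarity: shared words / (log|a| + log|b|)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:
        return 0.0  # both sentences have exactly one token
    return len(set(a) & set(b)) / denom


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over an adjacency matrix."""
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # The comprehension reads the previous iteration's scores.
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sums[j] * scores[j]
                            for j in range(n) if out_sums[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize_textrank(text, ratio=0.2):
    """Return the top `ratio` fraction of sentences (at least one), in original order."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    toks = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[similarity(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    scores = pagerank(weights)
    k = max(1, int(n * ratio))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))


sample = ("Cats chase mice. Dogs chase cats. "
          "Cats and dogs chase mice daily. The sky is blue.")
print(summarize_textrank(sample))  # prints: Cats and dogs chase mice daily.
```

The third sample sentence wins because it shares the most words with the others, while "The sky is blue." shares none and keeps only the baseline score. Running the sketch on the movie plot above is possible, but its choice of sentence need not match gensim's, since the similarity measure and preprocessing differ.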
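Extracting Keywords
We can also use the keywords function of the gensim library to extract keywords from the text, as shown below.
# Requires gensim < 4.0 (gensim.summarization was removed in 4.0)
from gensim.summarization import keywords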
text = "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones " + \
"daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando)," + \
"the head of the Corleone Mafia family, is known to friends and associates as Godfather. " + \
"He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors " + \
"because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding " + \
" day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician " + \
"and acquaintance of the Don, whose daughter was brutally beaten by two young men because she" + \
"refused their advances; the men received minimal punishment from the presiding judge. " + \
"The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's" + \
"nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, " + \
"a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees " + \
"to have his men punish the young men responsible (in a non-lethal manner) in return for " + \
"future service if necessary."
print(keywords(text))
When we run the above program, we get the following output:
corleone men corleones daughter wedding summer new vito family hagen robert
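Keyword extraction in the TextRank family works on words instead of sentences: words that occur near each other are linked in a graph, and PageRank scores each word by how well connected it is. The following is a minimal, standard-library-only sketch of that idea, not gensim's keywords function. It assumes a letters-only tokenizer, a small hand-picked stopword list, an unweighted graph linking adjacent words (`window=2`), and a hypothetical function name `extract_keywords`.

```python
import re
from collections import defaultdict

# Illustrative stopword list (an assumption; not gensim's list).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "he", "his",
             "in", "is", "of", "on", "s", "she", "the", "to", "who", "whose", "with"}


def extract_keywords(text, window=2, ratio=0.2, damping=0.85, iterations=50):
    """Rank words with PageRank over a word co-occurrence graph (TextRank style)."""
    # Letters only, so "Don's" becomes "don" and "s" ("s" is a stopword here).
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    if not words:
        return []
    # Undirected edges between distinct words occurring within `window` positions.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != w:
                neighbors[w].add(other)
                neighbors[other].add(w)
    vocab = sorted(set(words))  # alphabetical order makes ties deterministic
    scores = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        # The comprehension reads the previous iteration's scores.
        scores = {
            w: (1 - damping) + damping * sum(scores[v] / len(neighbors[v])
                                             for v in neighbors[w])
            for w in vocab
        }
    k = max(1, int(len(vocab) * ratio))
    return sorted(vocab, key=lambda w: scores[w], reverse=True)[:k]


print(extract_keywords("alpha beta gamma beta delta beta"))  # prints: ['beta']
```

In the toy input, "beta" is linked to every other word, so it receives the highest score. Calling extract_keywords(text) on the movie plot returns a ranked list as well, but gensim applies its own preprocessing, so the two results should not be expected to match.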