Python - Chunks and Chinks



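Chunking is the process of grouping words into phrases based on their part-of-speech tags. In the example below, we define a grammar that determines how chunks are formed. The grammar specifies the sequence of tags, such as nouns and adjectives, that a run of words must follow to become a chunk. The resulting chunks are shown as a tree diagram below.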

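import nltk

# A POS-tagged sentence as (word, tag) pairs
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# NP chunk: an optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)
result.draw()  # opens a window showing the chunk tree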

When we run the above program, we get the following output:

[Image: chunk_1.PNG]
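A tag pattern such as `<DT>?<JJ>*<NN>` can be read as a regular expression over the sequence of part-of-speech tags rather than over characters. The following is a minimal sketch of that idea using only the standard library. It is not how NLTK is implemented, the function names `tag_pattern_to_regex` and `simple_chunk` are hypothetical, and it assumes patterns that contain only literal tags.

```python
import re

def tag_pattern_to_regex(tag_pattern):
    """Wrap each <TAG> in a non-capturing group so ?, * and + apply to whole tags."""
    return re.sub(r"<([^>]+)>", r"(?:<\1>)", tag_pattern)

def simple_chunk(tagged, tag_pattern):
    """Return the word lists matched by tag_pattern (illustrative helper, literal tags only)."""
    # Encode the tag sequence as one string, e.g. "<DT><JJ><JJ><NN>..."
    tags = "".join(f"<{tag}>" for _, tag in tagged)
    regex = re.compile(tag_pattern_to_regex(tag_pattern))
    chunks = []
    for match in regex.finditer(tags):
        if match.start() == match.end():
            continue  # ignore empty matches produced by fully optional patterns
        # Convert character offsets back to token indices by counting tags
        first = tags[:match.start()].count("<")
        last = tags[:match.end()].count("<")
        chunks.append([word for word, _ in tagged[first:last]])
    return chunks

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

chunks = simple_chunk(sentence, "<DT>?<JJ>*<NN>")
print(chunks)
```

With this pattern the sketch groups "The small red flower" and "the window", because each run is an optional determiner, zero or more adjectives, and a noun. Dropping the trailing `<NN>` from the pattern would leave the nouns outside the chunks.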

Changing the grammar produces a different output, as shown below:

import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# NP chunk: an optional determiner followed by any number of adjectives
grammar = "NP: {<DT>?<JJ>*}"
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()  # opens a window showing the chunk tree

When we run the above program, we get the following output:

[Image: chunk_2.PNG]

Chinking

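Chinking is the process of removing a sequence of tokens from a chunk. If the sequence appears in the middle of a chunk, those tokens are removed and the chunk is split in two, leaving one chunk on each side of the removed tokens.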

import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"),("flower", "NN"), ("flew", "VBD"), ("through", "IN"),  ("the", "DT"), ("window", "NN")]

grammar = r"""
  NP:
    {<.*>+}         # Chunk everything
    }<JJ|NN>+{      # Chink sequences of JJ and NN
  """
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence) 
print(result)
result.draw()

When we run the above program, we get the following output:

[Image: chink.PNG]

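As you can see, the tokens that match the chink pattern are taken out of the noun-phrase chunk and left outside it as separate tokens. This process of removing text that does not belong in the desired chunk is called chinking.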
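The "chunk everything, then chink" idea can also be sketched without NLTK: treat the whole sentence as one chunk, then cut it wherever a token carries a tag from the chink set. The function name `chink_by_tags` is hypothetical, and chinking the tags JJ and NN is an assumption made for illustration. This is a simplified model, not NLTK's algorithm.

```python
def chink_by_tags(tagged, chink_tags):
    """Split one all-inclusive chunk at every token whose tag is in chink_tags."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in chink_tags:
            # A chinked token ends the chunk that was being built, if any
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(word)
    if current:
        chunks.append(current)
    return chunks

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

remaining = chink_by_tags(sentence, {"JJ", "NN"})
print(remaining)
```

Here the adjectives and nouns are cut out, so the single initial chunk breaks into "The" and "flew through the". Choosing a different chink set, for example VBD and IN, would instead leave the two noun phrases as the remaining chunks.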
