spaCy - Util.compile_infix_regex

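This utility function compiles a sequence of infix rules into a single regular expression object.

Arguments

The following table describes its arguments:

Name	Type	Description
entries	Tuple	The infix rules to compile, for example lang.punctuation.TOKENIZER_INFIXES.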
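To make the compile step concrete, here is a minimal sketch that uses only Python's `re` module. It assumes the function joins the non-empty rules with `|` into one alternation and compiles the result; `compile_infix_regex_sketch` is a hypothetical stand-in written for illustration, not spaCy's own code.

```python
import re


def compile_infix_regex_sketch(entries):
    """Hypothetical stand-in: join non-empty rules with '|' and compile them."""
    expression = "|".join(piece for piece in entries if piece.strip())
    return re.compile(expression)


# Three illustrative rules: two literal characters and one context-sensitive
# rule that matches an operator only when it sits between digits.
rules = ("…", "—", r"(?<=[0-9])[+\-\*^](?=[0-9-])")
infix_re = compile_infix_regex_sketch(rules)

matches = [m.group() for m in infix_re.finditer("3+4—and…so")]
print(matches)  # ['+', '—', '…']
```

Because the rules become alternatives of one pattern, a single `finditer` call reports every infix position in a string.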

Syntax

infixes = ("…", "-", "—", r"(?<=[0-9])[+\-\*^](?=[0-9-])")
infix_reg = util.compile_infix_regex(infixes)
nlp.tokenizer.infix_finditer = infix_reg.finditer
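Assigning `infix_reg.finditer` to the tokenizer gives it a callable that yields one match per infix. The sketch below shows, with plain `re`, how such matches could be used to cut a string into pieces. `split_on_infixes` is a hypothetical helper and a simplification: spaCy's real tokenizer also applies prefix, suffix, and exception rules.

```python
import re

# A hyphen rule plus an operator rule that only fires between digits.
infix_re = re.compile(r"-|(?<=[0-9])[+\-\*^](?=[0-9-])")


def split_on_infixes(text, infix_finditer):
    """Hypothetical helper: cut text at each infix match, keeping the infix."""
    pieces, start = [], 0
    for match in infix_finditer(text):
        if match.start() > start:
            pieces.append(text[start:match.start()])  # text before the infix
        pieces.append(match.group())                  # the infix itself
        start = match.end()
    if start < len(text):
        pieces.append(text[start:])                   # trailing text
    return pieces


print(split_on_infixes("well-known", infix_re.finditer))  # ['well', '-', 'known']
print(split_on_infixes("10+20", infix_re.finditer))       # ['10', '+', '20']
print(split_on_infixes("a+b", infix_re.finditer))         # ['a+b']
```

The last call shows the effect of the lookbehind and lookahead: the `+` is left alone because it is not surrounded by digits.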

Example

import spacy

nlp = spacy.load('en_core_web_sm')
# Note: ('') is an empty string, not a tuple, so no infix rules are passed in.
# The compiled pattern is therefore empty and matches at every position.
infixes = ('')
infix_reg = spacy.util.compile_infix_regex(infixes)
nlp.tokenizer.infix_finditer = infix_reg.finditer
doc = nlp("[A] works for [B] in [C].")
print([t.text for t in doc])

Output
['[', 'A', ']', 'w', 'o', 'r', 'k', 's', 'f', 'o', 'r', '[', 'B', ']', 'i', 'n', '[', 'C', ']', '.']
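The character-by-character split appears to follow from the empty rule set. The sketch below checks the two underlying facts with plain Python: `('')` is a string, and an empty pattern produces a zero-width match at every position. How the tokenizer handles zero-width infix matches is an assumption here and may differ between spaCy versions.

```python
import re

infixes = ('')  # parentheses alone do not create a tuple
print(type(infixes).__name__)  # str

# An empty pattern matches the empty string before each character and at the end.
empty_re = re.compile(infixes)
spans = [m.span() for m in empty_re.finditer("for")]
print(spans)  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

To pass a single rule as a tuple, a trailing comma is needed, as in `("-",)`.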
