如何在 Python 的正則表示式中找到每個匹配的精確位置？

介紹

我們在 Python 中使用 re 模組來處理正則表示式。文字搜尋和更復雜的文字操作都使用正則表示式。像 grep 和 sed 這樣的工具，像 vi 和 emacs 這樣的文字編輯器，以及像 Tcl、Perl 和 Python 這樣的計算機語言都內建了正則表示式支援。

Python 中的 re 模組提供了用於匹配正則表示式的函式。

定義我們要查詢或修改的文字的正則表示式稱為模式。此字串由文字文字和元字元組成。compile 函式用於建立模式。建議使用原始字串，因為正則表示式經常包含特殊字元。（r 字元用於指示原始字串。）在這些字元被組裝成模式之前，不會以這種方式解釋它們。

在模式組裝後，可以使用其中一個函式將模式應用於文字字串。可用的函式包括 Match、search、find 和 finditer。

使用語法

這裡使用的正則表示式函式是：我們用正則表示式函式查詢匹配項。

re.match(): Determines if the RE matches at the beginning of the string. If zero or more characters at the beginning of the string match the regular expression pattern, the match method returns a match object.

p.finditer(): Finds all substrings where the RE matches and returns them as an iterator. An iterator delivering match objects across all non-overlapping matches for the pattern in a string is the result of the finditer method.

re.compile(): Compile a regular expression pattern into a regular expression object, which can be used for matching using its match(), search(), and other methods described below. The expression’s behavior can be modified by specifying a flag's value. Values can be any of the following variables combined using bitwise OR (the | operator).

m.start(): m.start() returns the offset in the string at the match's start.

m.group(): You may use the multiple-assignment approach to assign each value to a different variable when mo.groups() returns a tuple of values, as in the areaCode, mainNumber = mo.groups() line below.

search: It is comparable to re.match() but does not require that we just look for matches at the beginning of the text. The search() function can locate a pattern in the string at any location, but it only returns the first instance of the pattern.

演算法

使用 import re 匯入正則表示式模組。
使用 re.compile() 函式建立一個正則表示式物件。（請記住使用原始字串。）
將要搜尋的字串傳遞到正則表示式物件的 finditer() 方法中。這將返回一個 Match 物件。
呼叫 Match 物件的 group() 方法以返回實際匹配文字的字串。
我們還可以使用 span() 方法獲得單個元組中的起始和結束索引。

示例


 
#importing re functions
import re
#compiling [A-Z0-9] and storing it in a variable p
p = re.compile("[A-Z0-9]")
#looping m times in p.finditer
for m in p.finditer('A5B6C7D8'):
#printing the m.start and m.group
   print m.start(), m.group()

輸出

這將給出以下輸出：

程式碼解釋

使用 import re 匯入正則表示式模組。使用 re.compile() 函式建立一個正則表示式物件（“[A-Z0-9]”），並將其賦值給變數 p。為 m 執行迴圈，並將要搜尋的字串傳遞到正則表示式物件的 finditer() 方法中。這將返回一個 Match 物件。呼叫 Match 物件的 m.group() 和 m.start() 方法以返回實際匹配文字的字串。

示例


# Python program to illustrate
# Matching regex objects
# with groups
import re
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print(mo.groups())

輸出

這將給出以下輸出：

('415', '555-4242')

程式碼解釋

使用 import re 匯入正則表示式模組。使用 re.compile() 函式建立一個正則表示式物件（r'(\d\d\d)-(\d\d\d-\d\d\d\d)'），並將其賦值給變數 phoneNumRegex。將要搜尋的字串傳遞到正則表示式物件的 search() 方法中，並將其儲存在變數 mo 中。這將返回一個 Match 物件。呼叫 Match 物件的 mo.groups() 方法以返回實際匹配文字的字串。

結論

Python re 模組提供的 search()、match() 和 finditer() 方法允許我們匹配正則表示式模式，如果匹配成功，它將提供 Match 物件例項。使用此 Match 物件利用 start()、end() 和 span() 方法檢索有關匹配字串的詳細資訊。

當存在多個匹配項時，如果您使用 findall() 載入所有匹配項，則存在超載 RAM 的風險。您可以獲得所有潛在匹配項的形式的迭代器物件，而不是使用 finditer() 方法，這將提高效率。

這意味著 finditer() 提供了一個可呼叫的物件，當呼叫時，它將把結果載入到記憶體中。

Md Waqar Tabish

更新於： 2022 年 9 月 20 日

2K+ 瀏覽量

啟動您的職業生涯

透過完成課程獲得認證

開始學習

如何在 Python 的正則表示式中找到每個匹配的精確位置？

介紹

使用語法

演算法

示例

輸出

程式碼解釋

示例

輸出

程式碼解釋

結論

啟動您的 職業生涯

啟動您的職業生涯