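What is the match() function in Python?

Text processing and pattern matching are tasks that programmers encounter in many kinds of applications. Python provides a number of tools and modules for string manipulation and pattern matching. One of the essential ones is the match() function in the 're' module, which uses regular expressions to check for a pattern at the beginning of a string. This article explores the match() function: its purpose, its usage, and practical code examples with detailed explanations.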

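Introduction to regular expressions in Python

Before looking at match() in detail, it helps to understand the role regular expressions (regex) play in Python. A regular expression is a sequence of characters that defines a search pattern. Regular expressions are widely used to match and manipulate strings according to specific rules, and they provide a concise, flexible way to perform complex text searches and replacements.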
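As a quick illustration of searching and replacing with a pattern, here is a minimal sketch. The sample text and the placeholder string are made up for this example; re.findall() and re.sub() are standard functions of the 're' module.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Order 66 shipped on 2023-08-22; order 67 is pending."

# \d+ matches one or more consecutive digits.
numbers = re.findall(r"\d+", text)

# Replace each date written as yyyy-mm-dd with a placeholder.
redacted = re.sub(r"\d{4}-\d{2}-\d{2}", "<date>", text)

print(numbers)
print(redacted)
```

Note that findall() returns every run of digits, including the individual parts of the date.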

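Purpose of the match() function

The match() function lives in Python's 're' module and performs pattern matching only at the beginning of a given string. Unlike search(), which scans the string for the first location where the pattern matches, match() attempts a match only at the start. If the pattern matches there, match() returns a match object describing the match; otherwise it returns None.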
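The difference between match() and search() is easiest to see side by side. The following sketch uses a made-up sample string in which the digits appear at the end rather than at the start.

```python
import re

# Hypothetical sample string: the digits are not at the beginning.
text = "Product code: 100"

# match() only tries position 0, where the text has "P", so it fails.
at_start = re.match(r"\d+", text)

# search() scans forward and finds the digits later in the string.
anywhere = re.search(r"\d+", text)

# match() succeeds when the pattern fits the start of the string.
prefix = re.match(r"Product", text)

print(at_start)
print(anywhere.group())
print(prefix.group())
```

Because match() can return None, it is safest to test the result before calling methods such as group() on it.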

Syntax of the match() function

The match() function is called with the following syntax:

re.match(pattern, string, flags=0)

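Where:

pattern - the regular expression pattern to match at the beginning of the string.
string - the input string against which the match is attempted.
flags (optional) - flags that modify how the regular expression behaves, usually given as constants from the 're' module. Several flags can be combined with the bitwise OR operator (|).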
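To make the three parameters concrete, here is a small sketch that passes flags as a keyword argument and, for comparison, does the same thing with a pattern compiled ahead of time. The sample strings are invented for illustration.

```python
import re

# pattern, string, and flags passed directly to re.match().
m1 = re.match(r"abc", "ABCdef", flags=re.IGNORECASE)

# The same check with a precompiled pattern; the flags go to re.compile().
compiled = re.compile(r"abc", re.IGNORECASE)
m2 = compiled.match("ABCdef")

print(m1.group())
print(m2.group())
```

Compiling is mainly a convenience when the same pattern is reused many times; both calls return an equivalent match object here.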

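Basic usage of match()

Let us start with a basic example that demonstrates the match() function:

Example

In this example, we define a function named match_example that takes a regular expression pattern and a text string as arguments. Inside the function, re.match() checks for the pattern at the beginning of the text. The pattern r'\d+' matches one or more digits. When we call the function with the sample text, the digits "100" at the start of the text match, and the function reports that the pattern was found.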

import re

def match_example(pattern, text):
   matched = re.match(pattern, text)
   if matched:
      print(f"Pattern '{pattern}' found at the beginning of the text.")
   else:
      print(f"Pattern '{pattern}' not found at the beginning of the text.")

# Example usage
pattern = r'\d+'
text = "100 is the product code."
match_example(pattern, text)

Output

Pattern '\d+' found at the beginning of the text.

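Flags in the match() function

Like search(), match() accepts flags that modify the behavior of the regular expression. One such flag is re.IGNORECASE, which makes matching case-insensitive. The following example explores this flag:

Using the re.IGNORECASE flag

In this example, we define a function named case_insensitive_match that takes a regular expression pattern and a text string as arguments. By passing the re.IGNORECASE flag to re.match(), we match the pattern at the beginning of the text without regard to case. The pattern r'\bhello\b' matches the word "hello" surrounded by word boundaries. When we call the function with the sample text, it detects the word "Hello" at the beginning of the text, confirming that the pattern is present when case is ignored.

Example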

import re

def case_insensitive_match(pattern, text):
   matched = re.match(pattern, text, re.IGNORECASE)
   if matched:
      print(f"Pattern '{pattern}' found (case-insensitive) at the beginning of the text.")
   else:
      print(f"Pattern '{pattern}' not found at the beginning of the text.")

# Example usage
pattern = r'\bhello\b'
text = "Hello, World! Welcome to the Hello World program."
case_insensitive_match(pattern, text)

Output

Pattern '\bhello\b' found (case-insensitive) at the beginning of the text.
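To see what the flag changes, the sketch below runs the same pattern with and without re.IGNORECASE, and once more with the inline (?i) flag, which is an alternative way to request case-insensitive matching inside the pattern itself. The sample text is a shortened version of the one above.

```python
import re

text = "Hello, World!"

# Case-sensitive by default: "hello" does not match "Hello".
strict = re.match(r"hello", text)

# With the flag, the capital "H" is accepted.
relaxed = re.match(r"hello", text, re.IGNORECASE)

# The inline flag (?i) at the start of the pattern has the same effect.
inline = re.match(r"(?i)hello", text)

print(strict)
print(relaxed.group())
print(inline.group())
```

The matched text keeps its original casing; the flag only affects how the comparison is made.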

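Capturing matched text with groups

Like search(), match() lets us capture specific parts of the matched text by using groups. A group is a part of the pattern enclosed in parentheses, and it allows us to extract specific pieces of information from the matched text. The match object's group() method returns the whole match when called without arguments. Let us look at the following example:

Example

In this example, we define a function named capture_matched_text that takes a regular expression pattern and a text string as arguments. We use re.match() to try to match the pattern at the beginning of the text. The pattern r'\d{2}-\d{2}-\d{4}' describes a date written as two digits, two digits, and four digits separated by hyphens, such as "07-31-1990" (mm-dd-yyyy). The sample text does contain that date, but it begins with "Date of birth: " rather than with a digit, so re.match() returns None and the function reports that the pattern was not found at the beginning of the text. Had the text started with the date, matched.group() would have returned the matched text "07-31-1990". Note that this pattern contains no parentheses, so there are no sub-groups to extract and group() can only return the entire match.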

import re

def capture_matched_text(pattern, text):
   matched = re.match(pattern, text)
   if matched:
      matched_text = matched.group()
      print(f"Pattern '{pattern}' found. Matched text: '{matched_text}'")
   else:
      print(f"Pattern '{pattern}' not found at the beginning of the text.")

# Example usage
pattern = r'\d{2}-\d{2}-\d{4}'
text = "Date of birth: 07-31-1990"
capture_matched_text(pattern, text)

Output

Pattern '\d{2}-\d{2}-\d{4}' not found at the beginning of the text.
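The pattern in the example above has no parentheses and the sample text does not start with the date, so nothing is captured. The following sketch shows what group capture could look like under two assumptions that differ from the example: the text begins with the date, and each date component is wrapped in parentheses. The group names month, day, and year are chosen here purely for illustration.

```python
import re

# Assumed variation of the sample text: the date now comes first.
text = "07-31-1990 is the date of birth."

# Numbered groups: each pair of parentheses captures one component.
m = re.match(r"(\d{2})-(\d{2})-(\d{4})", text)
whole = m.group(0)          # the entire match
month, day, year = m.groups()

# Named groups make the intent of each component explicit.
named = re.match(r"(?P<month>\d{2})-(?P<day>\d{2})-(?P<year>\d{4})", text)
parts = named.groupdict()

print(whole)
print(month, day, year)
print(parts)
```

group(0) and group() are equivalent, while group(1), group(2), and so on return the individual captures.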

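Getting the match position with the span() method

The match object's span() method returns the position of the matched text in the input string as a (start, end) tuple, where end is the index just past the last matched character. This information is useful for further processing or for highlighting the matched substring. Let us illustrate the concept with the following example:

Example

In this example, we define a function named retrieve_match_position that takes a regular expression pattern and a text string as arguments. Using re.match(), we try to match the pattern at the beginning of the text. The pattern r'\b\d+\b' matches one or more digits surrounded by word boundaries. The sample text contains the numbers "100" and "50", but it begins with the word "The", so re.match() returns None and the function reports that the pattern was not found at the beginning of the text. If the text had started with a number, the function would have printed the inclusive start and end indices taken from span() together with the matched text returned by group(). Keep in mind that match() returns at most one match, so it could never report both numbers in a single call.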

import re

def retrieve_match_position(pattern, text):
   matched = re.match(pattern, text)
   if matched:
      matched_text = matched.group()
      start_index, end_index = matched.span()
      print(f"Pattern '{pattern}' found at indices {start_index} to {end_index - 1}.")
      print(f"Matched text: '{matched_text}'")
   else:
      print(f"Pattern '{pattern}' not found at the beginning of the text.")

# Example usage
pattern = r'\b\d+\b'
text = "The price of the product is $100. The discounted price is $50."
retrieve_match_position(pattern, text)

Output

Pattern '\b\d+\b' not found at the beginning of the text.
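Since match() cannot reach the numbers in this text, one possible way to obtain their positions is to switch to re.search() for the first occurrence or re.finditer() for all of them. The sketch below reuses the sample text and pattern from the example; the extra string "100 units" is invented to show span() on a successful match().

```python
import re

text = "The price of the product is $100. The discounted price is $50."
pattern = r"\b\d+\b"

# search() finds the first number anywhere in the text.
first = re.search(pattern, text)
first_span = first.span()

# finditer() yields one match object per occurrence.
all_spans = [(m.group(), m.span()) for m in re.finditer(pattern, text)]

# match() does succeed when the string starts with digits.
at_start = re.match(pattern, "100 units")

print(first_span)
print(all_spans)
print(at_start.span())
```

The end value in each span is exclusive, so text[start:end] yields exactly the matched substring.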

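Using match() with multiline text

A common assumption is that the re.MULTILINE flag lets match() find a pattern at the beginning of every line. It does not: even in MULTILINE mode, re.match() only matches at the beginning of the string. The flag changes the meaning of '^' so that it also matches right after each newline, which matters for functions such as search() and findall(), but match() never starts looking anywhere other than the start of the string. The following example demonstrates this:

Example

In this example, we define a function named match_multiline_text that takes a regular expression pattern and a text string as arguments and calls re.match() with the re.MULTILINE flag. The pattern r'^python' matches the lowercase word "python" at the start of a line. The sample text has "Python" on the first line, "python" on the second, and "PYTHON" on the third. Matching is case-sensitive by default, so "Python" on the first line does not match, and because match() only tries the start of the string, the lowercase "python" on the second line is never considered. The function therefore reports that the pattern was not found.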

import re

def match_multiline_text(pattern, text):
   matched = re.match(pattern, text, re.MULTILINE)
   if matched:
      print(f"Pattern '{pattern}' found at the beginning of a line.")
   else:
      print(f"Pattern '{pattern}' not found at the beginning of any line.")

# Example usage
pattern = r'^python'
text = "Python is an amazing language.\npython is a snake.\nPYTHON is great."
match_multiline_text(pattern, text)

Output

Pattern '^python' not found at the beginning of any line.
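To see how re.MULTILINE interacts with match(), the sketch below reuses the sample text and compares several calls. Using re.findall() to match at the start of every line is one possible alternative, not something the original example does; the short string "intro\npython" is invented for illustration.

```python
import re

text = "Python is an amazing language.\npython is a snake.\nPYTHON is great."

# Case-sensitive match at the start of the string: "Python" != "python".
m_plain = re.match(r"^python", text, re.MULTILINE)

# Adding IGNORECASE lets the first line match, still only at the start.
m_icase = re.match(r"^python", text, re.MULTILINE | re.IGNORECASE)

# match() ignores later lines even though '^' could match there.
second_line_only = re.match(r"^python", "intro\npython", re.MULTILINE)

# findall() with MULTILINE does check the start of every line.
line_starts = re.findall(r"^python", text, re.MULTILINE | re.IGNORECASE)

print(m_plain)
print(m_icase.group())
print(second_line_only)
print(line_starts)
```

When every line needs to be checked, findall(), finditer(), or a loop over text.splitlines() with match() are the usual options.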

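This article examined the match() function in Python's 're' module, a tool for matching a pattern at the beginning of a string. We covered its purpose, syntax, and usage, including flags that modify its behavior. We also worked through practical examples with step-by-step explanations, such as retrieving the matched text with group() and finding its position in the input string with span(), and saw that match() returns None whenever the pattern does not fit the very start of the string. With this knowledge, you can use match() with confidence in your Python projects for text processing and pattern matching tasks.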

Rajendra Dharmkar

Updated on: 2023-08-22
