Python - Extracting URLs from Text

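URLs can be extracted from a text file using a regular expression. The expression picks out every piece of text that matches the specified pattern. Only the re module is needed for this.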

Example

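We can take an input file that contains some URLs and process it with the following program to extract them. We use the findall() function to find all instances that match the regular expression.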
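Before reading from a file, it may help to see findall() on a plain string. The following is a minimal sketch that reuses the pattern from the program in this article; the sample sentence and its example.org addresses are made up for illustration. Because the pattern contains only non-capturing groups, findall() returns the full matched strings, and because "/" is not in the character class, the path after the host name is not captured.

```python
import re

# Same pattern as the program in this article; compiled once for reuse.
URL_PATTERN = re.compile(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+')

# Hypothetical sample text, not taken from the input file.
sample = "Docs live at https://example.org and a mirror at http://mirror.example.org/files today."

# findall() returns a list of all non-overlapping matches, left to right.
matches = URL_PATTERN.findall(sample)
print(matches)  # ['https://example.org', 'http://mirror.example.org']
```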

Input File

The input file is shown below. It contains two URLs.

Now a days you can learn almost anything by just visiting http://www.google.com. But if you are completely new to computers or internet then first you need to leanr those fundamentals. Next
you can visit a good e-learning site like - https://tutorialspoint.tw to learn further on a variety of subjects.

Now, when we process the above input file with the following program, we get the required output: the URLs extracted from the file.

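import re

# Raw strings keep the backslashes in the path and the pattern literal.
url_pattern = re.compile(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+')

with open(r"path\url_example.txt") as file:
    for line in file:
        # findall() returns every non-overlapping match on this line.
        urls = url_pattern.findall(line)
        print(urls)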

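When we run the above program, we get the following output. Note that the first URL keeps the sentence's trailing period, because the pattern's character class also matches '.'.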

['http://www.google.com.']
['https://tutorialspoint.tw']
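If the trailing period is unwanted, one possible approach is to strip it after matching. The sketch below is an assumption-laden illustration rather than part of the original program: extract_urls is a hypothetical helper name, and the text is an abbreviated copy of the input file held in a string so that the example runs without a file on disk.

```python
import re

# Same pattern as the program in this article.
URL_PATTERN = re.compile(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+')

def extract_urls(text):
    """Hypothetical helper: return matched URLs without trailing periods."""
    # rstrip('.') drops periods that end the sentence rather than the URL.
    return [url.rstrip('.') for url in URL_PATTERN.findall(text)]

# Abbreviated copy of the input file shown above.
text = (
    "Now a days you can learn almost anything by just visiting http://www.google.com. "
    "you can visit a good e-learning site like - https://tutorialspoint.tw to learn further."
)
print(extract_urls(text))  # ['http://www.google.com', 'https://tutorialspoint.tw']
```

Stripping after the match keeps the regular expression unchanged. The trade-off is that a URL that genuinely ends in a period would also be shortened.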
