HTML (Hypertext Markup Language) support in Python
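Python can process HTML files through the HTMLParser class in the html.parser module. The class reports the HTML tags it encounters, their attributes, and their position (line number and offset) in the input. It also provides handler methods for identifying and extracting the data in an HTML file.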
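As a minimal sketch of how tag attributes and positions can be read, the snippet below collects the href of every <a> tag together with the position reported by getpos(). The class name LinkCollector, the links attribute, and the sample markup are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: records each <a> tag's href and where the tag starts."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, (line, offset)) tuples

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append((href, self.getpos()))


collector = LinkCollector()
# Illustrative markup, fed as a string so that no file is needed
collector.feed('<p>See <a href="https://example.com/docs">the docs</a>.</p>')
print(collector.links)
```

Storing results on the instance rather than printing them inside the handler makes the parser easier to reuse, since the caller decides what to do with the collected values.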
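The example that follows uses the HTMLParser class to build a custom parser class that handles only the tags and data for which we define handler methods. Here we handle start tags, end tags, and data.
The following is the HTML that the custom Python parser processes.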
Example
<html>
<head>
<title>welcome to Tutorials Point!</title>
</head>
<body>
<h1>Learn anything !</h1>
</body>
</html>
The following program parses the file above and prints results according to the custom parser.
Example
from html.parser import HTMLParser

class Custom_Parser(HTMLParser):
    # Called for each opening tag, e.g. <title>
    def handle_starttag(self, tag, attrs):
        print("Line and Offset ==", self.getpos())
        print("Encountered a start tag:", tag)

    # Called for each closing tag, e.g. </title>
    def handle_endtag(self, tag):
        print("Line and Offset ==", self.getpos())
        print("Encountered an end tag :", tag)

    # Called for text between tags, including whitespace such as newlines
    def handle_data(self, data):
        print("Line and Offset ==", self.getpos())
        print("Encountered some data :", data)

parser = Custom_Parser()
# Raw string so that "\t" in the Windows path is not read as a tab character
with open(r"E:\test.html", "r") as stream:
    parser.feed(stream.read())
Output
Running the above code gives the following result:
Line and Offset == (1, 0)
Encountered a start tag: html
Line and Offset == (1, 6)
Encountered some data :
Line and Offset == (2, 0)
Encountered a start tag: head
Line and Offset == (2, 6)
Encountered some data :
Line and Offset == (3, 0)
Encountered a start tag: title
Line and Offset == (3, 7)
Encountered some data : welcome to Tutorials Point!
Line and Offset == (3, 34)
Encountered an end tag : title
Line and Offset == (3, 42)
Encountered some data :
Line and Offset == (4, 0)
Encountered an end tag : head
Line and Offset == (4, 7)
Encountered some data :
Line and Offset == (5, 0)
Encountered a start tag: body
Line and Offset == (5, 6)
Encountered some data :
Line and Offset == (6, 0)
Encountered a start tag: h1
Line and Offset == (6, 4)
Encountered some data : Learn anything !
Line and Offset == (6, 20)
Encountered an end tag : h1
Line and Offset == (6, 25)
Encountered some data :
Line and Offset == (7, 0)
Encountered an end tag : body
Line and Offset == (7, 7)
Encountered some data :
Line and Offset == (8, 0)
Encountered an end tag : html
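Many of the "Encountered some data" entries above carry only the newline between two tags. One possible way to keep just the meaningful text is to strip each chunk and skip the empty ones, as in the sketch below. The TextCollector class, its chunks attribute, and the SAMPLE_HTML constant are hypothetical names; the markup is the same document as above, supplied as a string so that no file is needed.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps non-blank text chunks and their positions."""

    def __init__(self):
        super().__init__()
        self.chunks = []  # list of ((line, offset), text) tuples

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore chunks that are only whitespace, such as newlines
            self.chunks.append((self.getpos(), text))


# Same document as the example file, as an in-memory string
SAMPLE_HTML = """<html>
<head>
<title>welcome to Tutorials Point!</title>
</head>
<body>
<h1>Learn anything !</h1>
</body>
</html>"""

collector = TextCollector()
collector.feed(SAMPLE_HTML)
collector.close()  # flush any input still buffered by the parser

for position, text in collector.chunks:
    print(position, text)
```

With this input, only the title text and the heading text should remain, each paired with the line and offset where it starts.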