How can the BeautifulSoup package be used to parse data from a webpage in Python?

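BeautifulSoup is a third-party Python library for parsing data out of web pages. It supports web scraping, which is the process of fetching data from different sources and then using and manipulating it.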
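As a minimal sketch of what "parsing" means here, the snippet below builds a BeautifulSoup object from an inline HTML string and reads a few elements from it. The HTML content and the variable names are hypothetical and chosen only for illustration; no network access is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used so the example runs without network access.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Algorithms</h1>
    <p>See <a href="/wiki/Sorting">sorting</a> and <a href="/wiki/Searching">searching</a>.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

page_title = soup.title.string                    # text inside <title>
heading = soup.find("h1").get_text()              # text of the first <h1> element
links = [a["href"] for a in soup.find_all("a")]   # href attribute of every <a> tag

print(page_title)
print(heading)
print(links)
```

The same calls (find, find_all, get_text) work on a page downloaded from the web, since BeautifulSoup only needs the HTML as a string or bytes.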

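Web scraping can also be used to extract data for research purposes, to understand or compare market trends, to perform SEO monitoring, and so on.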

BeautifulSoup can be installed by running the following command, for example from a Windows command prompt:

pip install beautifulsoup4
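One simple way to check that the installation worked is to import the package and print its version. This is only a quick sanity check, and the version string printed depends on what pip installed.

```python
import bs4

# Prints the installed version string; an ImportError here means
# the package is not available in the current Python environment.
print(bs4.__version__)
```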

Let us look at an example:

Example

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Algorithm'
print("Reading the webpage...")
html = urlopen(url).read()  # download the raw HTML as bytes
print("Parsing the webpage...")
soup = BeautifulSoup(html, features="html.parser")
# Remove <script> and <style> elements so their contents are not treated as text
for script in soup(["script", "style"]):
    script.extract()
print("Extracting text from the webpage...")
text = soup.get_text()
print("Data cleaning...")
# Strip leading and trailing whitespace from every line
lines = (line.strip() for line in text.splitlines())
# Split on double spaces so that phrases separated by wide gaps land on separate lines
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
# Drop empty chunks and join the rest, one per line
text = '\n'.join(chunk for chunk in chunks if chunk)
print(text)

Output

Reading the webpage...
Parsing the webpage...
Extracting text from the webpage...
Data cleaning...
Recursive C implementation of Euclid's algorithm from the above flowchart
Recursion
A recursive algorithm is one that invokes (makes reference to) itself repeatedly until a certain condition (also known as termination condition) matches, which is a method common to functional programming….
…..
Developers
Statistics
Cookie statement

Explanation

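The required packages are imported.
The URL of the page is defined.
The URL is opened, the HTML is parsed, and the "script" and "style" elements are removed.
The "get_text" function is used to extract the text from the parsed page.
Leading and trailing whitespace and empty lines are removed.
The text is printed on the console.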
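The steps above can also be wrapped in a small function and tried on an inline HTML string, which makes the effect of each step easy to see without downloading a page. This is a sketch under assumptions: the function name visible_text and the sample HTML are hypothetical, and phrases are split on double spaces.

```python
from bs4 import BeautifulSoup

def visible_text(html):
    """Return the text of an HTML document, one non-empty chunk per line."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop <script> and <style> elements so their contents do not appear in the text.
    for tag in soup(["script", "style"]):
        tag.extract()
    text = soup.get_text()
    # Strip each line, split on double spaces, and discard empty chunks.
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    return "\n".join(chunk for chunk in chunks if chunk)

# Hypothetical sample document containing a style rule and a script.
sample = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body>
  <script>console.log("hidden");</script>
  <h1>Heading</h1>

  <p>First paragraph.</p>
</body></html>
"""

cleaned = visible_text(sample)
print(cleaned)
```

The printed text contains the title, the heading, and the paragraph, while the CSS rule and the JavaScript call are gone because their elements were removed before get_text was called.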

AmitDiwan

Updated on: 2021-01-18
