Beautiful Soup - get_text() 方法



方法描述

get_text() 方法返回整個 HTML 文件或給定標籤中所有可讀文字。所有子字串將使用給定的分隔符連線,預設情況下為 null 字串。

語法

get_text(separator, strip)

引數

  • separator − 子字串將使用此引數連線。預設值為 ""。

  • strip − 在連線之前,字串將被去除空格。

返回值型別

get_Text() 方法返回一個字串。

示例 1

在下面的示例中,get_text() 方法刪除了所有 HTML 標籤。

html = '''
<html>
<body>
   <p> The quick, brown fox jumps over a lazy dog.</p>
   <p> DJs flock by when MTV ax quiz prog.</p>
   <p> Junk MTV quiz graced by fox whelps.</p>
   <p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
text = soup.get_text()
print(text)

輸出

The quick, brown fox jumps over a lazy dog.
DJs flock by when MTV ax quiz prog.
Junk MTV quiz graced by fox whelps.
Bawds jog, flick quartz, vex nymphs.

示例 2

在下面的示例中,我們將 get_text() 方法的 separator 引數指定為 '#'。

html = '''
   <p>The quick, brown fox jumps over a lazy dog.</p>
   <p>DJs flock by when MTV ax quiz prog.</p>
   <p>Junk MTV quiz graced by fox whelps.</p>
   <p>Bawds jog, flick quartz, vex nymphs.</p>
'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
text = soup.get_text(separator='#')
print(text)

輸出

#The quick, brown fox jumps over a lazy dog.#
#DJs flock by when MTV ax quiz prog.#
#Junk MTV quiz graced by fox whelps.#
#Bawds jog, flick quartz, vex nymphs.#

示例 3

讓我們檢查當 strip 引數設定為 True 時產生的效果。預設情況下為 False。

html = '''
   <p>The quick, brown fox jumps over a lazy dog.</p>
   <p>DJs flock by when MTV ax quiz prog.</p>
   <p>Junk MTV quiz graced by fox whelps.</p>
   <p>Bawds jog, flick quartz, vex nymphs.</p>
'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
text = soup.get_text(strip=True)
print(text)

輸出

The quick, brown fox jumps over a lazy dog.DJs flock by when MTV ax quiz prog.Junk MTV quiz graced by fox whelps.Bawds jog, flick quartz, vex nymphs.
廣告