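Scrapy - SelectorList Objects

Selector Examples on an HTML Response

The following examples work with an HtmlResponse. We use a Selector instantiated with the HtmlResponse object, as shown below: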

res = Selector(html_response)

You can select the h2 elements from the HTML response body; this returns a SelectorList object:

>>> res.xpath("//h2")

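Calling extract() on the selected h2 elements returns a list of Unicode strings instead:

>>> res.xpath("//h2").extract()

It returns the h2 elements as strings, tags included.

Similarly,

>>> res.xpath("//h2/text()").extract()

It returns the text defined under the h2 tags, excluding the h2 tags themselves.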

You can iterate over the p tags and print the class attribute of each:

for ele in res.xpath("//p"):
    print(ele.xpath("@class").extract())

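The following examples work with an XmlResponse. We use a Selector instantiated with the XmlResponse object, as shown below:

res = Selector(xml_response)

You can select the description elements from the XML response body; this returns a SelectorList object:

>>> res.xpath("//description")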

You can get the price values from a Google Base XML feed by registering its namespace:

>>> res.register_namespace("g", "http://base.google.com/ns/1.0")
>>> res.xpath("//g:price").extract()

When working on a Scrapy project, you can use the Selector.remove_namespaces() method to strip the namespaces and write simpler XPath expressions that use plain element names.

The namespace removal procedure is not always called in a project, for two reasons:
