Scrapy - Crawling
Description
To run your spider, execute the following command from within the first_scrapy directory:
scrapy crawl first
Here, first is the name given to the spider when it was created.
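The same command can also be launched from a script. The following is a minimal sketch, not part of the original tutorial: the helper names build_crawl_command and run_crawl are hypothetical, and it assumes the scrapy executable is on your PATH and that the project directory is named first_scrapy. It only illustrates that the command must be run with the project directory as the working directory.

```python
import subprocess
from typing import List


def build_crawl_command(spider_name: str) -> List[str]:
    """Build the argument list for `scrapy crawl <spider_name>`."""
    return ["scrapy", "crawl", spider_name]


def run_crawl(spider_name: str, project_dir: str) -> int:
    """Run the crawl inside project_dir and return the process exit code.

    Assumes the `scrapy` executable is installed and on PATH.
    """
    # cwd makes the command run from inside the Scrapy project directory.
    result = subprocess.run(build_crawl_command(spider_name), cwd=project_dir)
    return result.returncode


# Example call (hypothetical path, not executed here):
# exit_code = run_crawl("first", "first_scrapy")
```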
Once the spider has finished crawling, you will see output like the following:
2016-08-09 18:13:07-0400 [scrapy] INFO: Scrapy started (bot: tutorial)
2016-08-09 18:13:07-0400 [scrapy] INFO: Optional features available: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Overridden settings: {}
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled extensions: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled downloader middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled spider middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled item pipelines: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Spider opened
2016-08-09 18:13:08-0400 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] INFO: Closing spider (finished)
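As the output shows, there is one log line for each URL, and each of these lines states (referer: None), which indicates that the URLs are start URLs and have no referrers. Next, you should see two new files named Books.html and Resources.html created in your first_scrapy directory.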
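The file names follow from the crawled URLs. The sketch below shows one way this mapping could work. It assumes the spider names each file after the last path segment of the URL, which this section does not state explicitly. The function name page_filename is hypothetical.

```python
def page_filename(url: str) -> str:
    """Derive an HTML file name from a crawled URL (assumed naming rule)."""
    # The URLs end with "/", so the last element after splitting is empty
    # and the second-to-last element is the final path segment.
    return url.split("/")[-2] + ".html"


start_urls = [
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
]

for url in start_urls:
    print(page_filename(url))
# prints:
# Resources.html
# Books.html
```

Note that this rule relies on the trailing slash. A URL without one would yield the parent segment instead.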