How to use Boto3 to update the schedule of a crawler in the AWS Glue Data Catalog


In this article, we will see how to update the schedule of an existing crawler in an AWS account.

Example

Problem Statement: Use the boto3 library in Python to update the schedule of a crawler.

Approach/Algorithm to solve this problem

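Step 1: Import boto3 and botocore exceptions to handle exceptions.
Step 2: crawler_name and scheduler are the required parameters of this function.
scheduler must be written as cron(cron_expression). For example, cron(15 12 * * ? *) runs the crawler every day at 12:15 UTC.
Step 3: Create an AWS session using the boto3 library. Make sure region_name is set in the default profile. If it is not, pass region_name explicitly when creating the session.
Step 4: Create an AWS client for glue.
Step 5: Call update_crawler_schedule, passing crawler_name as CrawlerName and scheduler as Schedule.
Step 6: The call updates the crawler's schedule and returns the response metadata.
Step 7: Handle exceptions if anything goes wrong while updating the crawler's schedule: botocore's ClientError first, then any other exception.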
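Step 2 relies on AWS's six-field cron format (Minutes Hours Day-of-month Month Day-of-week Year), where Day-of-month and Day-of-week cannot both be set, so one of them must be ?. The helpers daily_cron and has_valid_shape below are hypothetical, not part of boto3; they are a minimal sketch for building a daily schedule string and doing a rough shape check before calling the API. They do not validate value ranges the way AWS does.

```python
import re


def daily_cron(hour: int, minute: int) -> str:
    """Hypothetical helper: build a Glue schedule that fires once a day at hour:minute UTC."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be 0-23, got {hour}")
    if not 0 <= minute <= 59:
        raise ValueError(f"minute must be 0-59, got {minute}")
    # Fields: Minutes Hours Day-of-month Month Day-of-week Year.
    # Day-of-week is '?' because Day-of-month is already '*'.
    return f"cron({minute} {hour} * * ? *)"


def has_valid_shape(schedule: str) -> bool:
    """Hypothetical rough check: six fields inside cron(...), with '?' in
    exactly one of Day-of-month / Day-of-week. Ranges are not checked."""
    match = re.fullmatch(r"cron\((.+)\)", schedule.strip())
    if not match:
        return False
    fields = match.group(1).split()
    if len(fields) != 6:
        return False
    day_of_month, day_of_week = fields[2], fields[4]
    return (day_of_month == "?") != (day_of_week == "?")


print(daily_cron(12, 15))  # cron(15 12 * * ? *)
```

The string produced for 12:15 UTC is the same one passed as scheduler in the example code.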

Example Code

The following code updates the schedule of a crawler:

import boto3
from botocore.exceptions import ClientError

def update_scheduler_of_a_crawler(crawler_name, scheduler):
   # Uses the region from the default profile; pass region_name=... to Session() if none is set
   session = boto3.session.Session()
   glue_client = session.client('glue')
   try:
      # scheduler must be a cron expression, e.g. "cron(15 12 * * ? *)"
      response = glue_client.update_crawler_schedule(CrawlerName=crawler_name, Schedule=scheduler)
      return response
   except ClientError as e:
      # Errors reported by the Glue service (e.g. crawler not found, invalid schedule)
      raise Exception("boto3 client error in update_scheduler_of_a_crawler: " + str(e)) from e
   except Exception as e:
      raise Exception("Unexpected error in update_scheduler_of_a_crawler: " + str(e)) from e

print(update_scheduler_of_a_crawler("Data Dimension", "cron(15 12 * * ? *)"))

Output

{'ResponseMetadata': {'RequestId': '73e50130-*****************8e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sun, 28 Mar 2021 07:26:55 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': '73e50130-***************8e'}, 'RetryAttempts': 0}}
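The output contains only ResponseMetadata because UpdateCrawlerSchedule returns no result fields; the 'content-length': '2' header is consistent with an empty JSON body {}. The hypothetical split_response helper below is a small sketch that separates the operation's own fields from the metadata, for example for logging. The sample dict is made up in the same shape, with a placeholder request ID.

```python
def split_response(response: dict):
    """Hypothetical helper: return (operation fields, HTTP status code)."""
    # Build a new dict so the caller's response is left unchanged.
    payload = {k: v for k, v in response.items() if k != "ResponseMetadata"}
    status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
    return payload, status


# Made-up sample in the shape of the output above (trimmed).
sample = {
    "ResponseMetadata": {
        "RequestId": "example-request-id",
        "HTTPStatusCode": 200,
        "RetryAttempts": 0,
    }
}

payload, status = split_response(sample)
print(payload, status)  # {} 200
```

Note that boto3 raises ClientError for error responses, so a returned response normally already implies success; the status code is mostly useful for logs.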

Ashish Anand

Updated on: April 15, 2021
