Python Digital Forensics - Quick Guide

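Python Digital Forensics - Introduction

This chapter introduces the nature of digital forensics and reviews its history. It also covers where digital forensics is applied in practice and what its limitations are.

What is Digital Forensics?

Digital forensics is the branch of forensic science that analyzes, examines, identifies, and recovers digital evidence residing on electronic devices. It is commonly used in criminal law and in private investigations.

For example, if someone steals data from an electronic device, digital forensics can be used to extract the evidence.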

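Brief Historical Review of Digital Forensics

This section reviews the history of computer crime and of digital forensics:

1970s-1980s: First Computer Crime

Before this period, computer crime was not recognized as a distinct offense; any such act was handled under the laws that already existed. In 1978, the Florida Computer Crimes Act recognized the first computer crimes and included legislation against the unauthorized modification or deletion of data on a computer system. As technology advanced, the range of computer crimes grew, and further laws were passed to deal with crimes related to copyright, privacy, and child pornography.

1980s-1990s: Development Decade

This was the decade in which digital forensics developed, largely because of the first investigation (1986), in which Cliff Stoll tracked a hacker named Markus Hess. Two strands of the discipline emerged during this period: one built on ad hoc tools and techniques developed by practitioners as a hobby, and one developed by the scientific community. In 1992, the term "computer forensics" was used in academic literature.

2000s-2010s: Decade of Standardization

Once digital forensics had matured to a certain level, specific standards were needed that investigators could follow. Various scientific agencies and bodies therefore published guidelines for digital forensics. In 2002, the Scientific Working Group on Digital Evidence (SWGDE) published a paper titled "Best practices for Computer Forensics". Another notable achievement was a European-led international treaty, the "Convention on Cybercrime", which was signed by 43 nations and ratified by 16. Even with such standards in place, researchers have identified issues that still need to be resolved.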

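Process of Digital Forensics

Digital criminal activity has increased sharply since the first computer crime was recognized in 1978, and a structured way of handling it became necessary. A formalized process was introduced in 1984, and many new and improved computer forensic investigation processes have been developed since then.

A computer forensic investigation involves three major phases:

Phase 1: Acquisition or Imaging of Exhibits

The first phase saves the state of the digital system so that it can be analyzed later. It is comparable to taking photographs and collecting blood samples at a crime scene. For example, it involves capturing an image of the allocated and unallocated areas of a hard disk or of RAM.

Phase 2: Analysis

The input to this phase is the data acquired in the acquisition phase. The data is examined to identify evidence, for instance by inspecting file and directory contents and recovering deleted files. This phase yields three kinds of evidence:

Inculpatory evidence - Evidence that supports a given history.
Exculpatory evidence - Evidence that contradicts a given history.
Evidence of tampering - Evidence that shows the system was tampered with to avoid identification.

Phase 3: Presentation or Reporting

As the name suggests, this phase presents the conclusion of the investigation and the corresponding evidence.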

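Applications of Digital Forensics

Digital forensics deals with gathering, analyzing, and preserving the evidence contained in any digital device. How it is used depends on the application. As mentioned earlier, it is used mainly in the following two areas:

Criminal Law

In criminal law, evidence is collected to support or oppose a hypothesis in court. Forensic procedures closely resemble those used in criminal investigations, but with different legal requirements and limitations.

Private Investigation

The corporate world is the main user of digital forensics for private investigation. It is used when a company suspects that an employee may be using company computers for an illegal activity that violates company policy. Digital forensics offers one of the best routes for a company or an individual to take when investigating someone for digital misconduct.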

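Branches of Digital Forensics

Digital crime is not restricted to computers; hackers and criminals also make wide use of small digital devices such as tablets and smartphones. Some devices have volatile memory, while others have non-volatile memory. Depending on the type of device, digital forensics has the following branches:

Computer Forensics

This branch deals with computers, embedded systems, and static memory such as USB drives. It can investigate a wide range of information, from logs to the actual files on a drive.

Mobile Forensics

This branch deals with investigating data from mobile devices. It differs from computer forensics in that mobile devices have a built-in communication system, which can provide useful location-related information.

Network Forensics

This branch deals with monitoring and analyzing computer network traffic, both local and WAN (wide area network), for information gathering, evidence collection, or intrusion detection.

Database Forensics

This branch deals with the forensic study of databases and their metadata.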

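Skills Required for Digital Forensics Investigation

Digital forensics examiners help to track hackers, recover stolen data, trace computer attacks back to their source, and assist in other kinds of investigations involving computers. Some of the key skills an examiner needs are discussed below:

Outstanding Thinking Capabilities

An investigator must be an outstanding thinker, able to apply different tools and methodologies to a given task to obtain a result, and able to find patterns and draw correlations among them.

Technical Skills

An examiner needs good technical skills, because the field requires knowledge of networks and of how digital systems interact.

Passionate about Cyber Security

Digital forensics is all about solving cyber crimes, which is tedious work, so becoming an excellent investigator takes a great deal of passion.

Communication Skills

Good communication skills are essential for coordinating with various teams and for extracting any missing data or information.

Skillful in Report Making

After acquisition and analysis are complete, the examiner must record all findings in the final report and presentation. This calls for good report-writing skills and attention to detail.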

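Limitations

Digital forensic investigation has certain limitations, discussed here:

Need to Produce Convincing Evidence

One of the major hurdles is that the examiner must follow the standards required for evidence in a court of law, because digital data can easily be tampered with. The investigator therefore needs thorough knowledge of legal requirements, evidence handling, and documentation procedures in order to present convincing evidence in court.

Investigating Tools

The effectiveness of a digital investigation depends on the expertise of the examiner and on selecting the right investigation tools. If the tools used do not meet the specified standards, a judge may reject the evidence in court.

Lack of Technical Knowledge among the Audience

Many people, including those in court, are not familiar with computer forensics. Investigators must therefore communicate their findings to the court in a way that everyone can understand.

Cost

Producing and preserving digital evidence is very costly, so this process may not be an option for those who cannot afford it.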

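Python Digital Forensics - Getting Started

The previous chapter covered the basics of digital forensics, its applications, and its limitations. This chapter introduces Python, the main tool used for the digital forensic investigations in this guide.

Why Python for Digital Forensics?

Python is a popular programming language that is used as a tool for cyber security, penetration testing, and digital forensic investigation. Many forensic tasks can be completed with Python and its standard library alone, although some examples in this guide install third-party packages with pip.

Some features of Python that make it a good fit for digital forensics projects are listed below:

Simplicity of syntax - Python's syntax is simple compared with other languages, which makes it easier to learn and to put to use for digital forensics.
Comprehensive inbuilt modules - Python's comprehensive built-in modules are a great help when performing a complete digital forensic investigation.
Help and support - As an open source language, Python enjoys excellent support from its developer and user community.

Features of Python

Python is a high-level, interpreted, interactive, and object-oriented scripting language with the following features:

Easy to learn - Python is developer friendly and easy to learn because it has few keywords and a simple structure.
Expressive and easy to read - Python is expressive, so its code is easier to understand and read.
Cross-platform compatible - Python runs on many platforms, such as UNIX, Windows, and Macintosh.
Interactive mode programming - Python supports an interactive mode, so code can be tested and debugged interactively.
Provides various modules and functions - Python has a large standard library that offers a rich set of modules and functions for scripts.
Supports dynamic type checking - Python supports dynamic type checking and provides very high-level dynamic data types.
GUI programming - Python supports GUI programming for developing graphical user interfaces.
Integration with other programming languages - Python can be integrated with other languages such as C, C++, and Java.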

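Installing Python

Python distributions are available for various platforms such as Windows, UNIX, Linux, and Mac. We only need to download the binary code for our platform. If no binary is available for a platform, we need a C compiler so that the source code can be compiled manually.

This section covers installing Python on various platforms:

Python Installation on Unix and Linux

Follow the steps below to install Python on a Unix/Linux machine.

Step 1 - Open a web browser and go to www.python.org/downloads/

Step 2 - Download the zipped source code available for Unix/Linux.

Step 3 - Extract the downloaded archive.

Step 4 - If you wish to customize some options, edit the Modules/Setup file.

Step 5 - Run the following commands from the extracted directory to complete the installation:

./configure
make
make install

After these steps complete, Python is installed at its standard location /usr/local/bin, and its libraries at /usr/local/lib/pythonXX, where XX is the Python version.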

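Python Installation on Windows

Follow these steps to install Python on a Windows machine.

Step 1 - Open a web browser and go to www.python.org/downloads/

Step 2 - Download the Windows installer file python-XYZ.msi, where XYZ is the version to install.

Step 3 - Save the installer file to the local machine.

Step 4 - Run the downloaded file. This starts the Python installation wizard.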

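Python Installation on Macintosh

To install Python 3 on Mac OS X, we can use the package installer named Homebrew.

If Homebrew is not on your system, it can be installed with the following command:

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

To update the package manager, use the following command:

$ brew update

Now install Python 3 with the following command:

$ brew install python3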

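Setting the PATH

We need to add the Python installation directory to the PATH. How this is done differs between platforms such as UNIX, Windows, and Mac.

Path Setting on Unix/Linux

Use one of the following options to set the path on Unix/Linux:

If using the csh shell - Type setenv PATH "$PATH:/usr/local/bin" and press Enter.
If using the bash shell (Linux) - Type export PATH="$PATH:/usr/local/bin" and press Enter.
If using the sh or ksh shell - Type PATH="$PATH:/usr/local/bin" and press Enter.

Path Setting on Windows

At the command prompt, type path %path%;C:\Python and press Enter.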

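Running Python

You can start the Python interpreter in any of the following three ways:

Method 1: Using the Interactive Interpreter

Python can be started from any system that provides a command-line interpreter or shell, such as Unix or DOS. Follow these steps to start coding in the interactive interpreter:

Step 1 - Enter python at the command line.

Step 2 - Start coding right away in the interactive interpreter:

$ python       # Unix/Linux (sh, bash)
or
% python       # Unix/Linux (csh)
or
C:\> python    # Windows/DOS

Method 2: Using a Script from the Command Line

A Python script can also be run from the command line by invoking the interpreter on the script:

$ python script.py       # Unix/Linux (sh, bash)
or
% python script.py       # Unix/Linux (csh)
or
C:\> python script.py    # Windows/DOS

Method 3: Integrated Development Environment

If the system has a GUI application that supports Python, Python can be run from that GUI environment. Some IDEs for various platforms are listed below:

Unix IDE - UNIX has the IDLE IDE for Python.
Windows IDE - Windows has PythonWin, the first Windows interface for Python, which includes a GUI.
Macintosh IDE - Macintosh has the IDLE IDE, available from the main website as MacBinary or BinHex'd files.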

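Artifact Report

Now that you are comfortable installing and running Python on your local system, we can look at forensic concepts in detail. This chapter explains the concepts involved in handling artifacts in Python digital forensics.

Need for Report Creation

Reporting is the third phase of the digital forensics process and one of its most important parts. A report is necessary for the following reasons:

It is the document in which the examiner outlines the investigation process and its findings.
Another examiner can refer to a good report and reach the same result from the same repositories.
It is a technical and scientific document containing the facts found within the 1s and 0s of the digital evidence.

General Guidelines for Report Creation

Reports are written to inform the reader and must start from a solid foundation. Investigators can struggle to present their findings effectively if a report is prepared without general guidelines or standards. Some general guidelines to follow when creating digital forensic reports are given below:

Summary - The report must contain a brief summary of the information so that the reader can ascertain its purpose.
Tools used - The report must mention the tools used to carry out the digital forensics process, along with their purpose.
Repository - If we investigated someone's computer, for example, the report must include a summary of the evidence and an analysis of the relevant material (such as email and internal search history) so that the case is presented clearly.
Recommendations for counsel - The report must include recommendations for counsel on whether to continue or cease the investigation, based on its findings.

Creating Different Types of Reports

The previous section covered the importance of reports in digital forensics and the guidelines for creating them. Some formats for creating different kinds of reports in Python are discussed below: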

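CSV Reports

One of the most common output formats for reports is a CSV spreadsheet. The Python code below creates a CSV report of the processed data.

First, import the libraries needed for writing the spreadsheet:

from __future__ import print_function
import csv
import os
import sys

We use the following global variable to represent sample data:

TEST_DATA_LIST = [["Ram", 32, "Bhopal", "Manager"],
   ["Raman", 42, "Indore", "Engg."],
   ["Mohan", 25, "Chandigarh", "HR"],
   ["Parkash", 45, "Delhi", "IT"]]

Next, define the method that writes the report. It opens the file in "w" mode with the newline keyword argument set to an empty string, writes the header row, and then writes one row per record.

def Write_csv(data, header, output_directory, name = None):
   if name is None:
      name = "report1.csv"
   print("[+] Writing {} to {}".format(name, output_directory))
   
   with open(os.path.join(output_directory, name), "w", newline = "") as csvfile:
      writer = csv.writer(csvfile)
      writer.writerow(header)
      # writerows() emits each inner list as its own CSV row
      writer.writerows(data)

Finally, call the method once it has been defined:

Write_csv(TEST_DATA_LIST, ["Name", "Age", "City", "Job description"], os.getcwd())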

Running the above script stores the following details in the report1.csv file.

Name	Age	City	Job description
Ram	32	Bhopal	Manager
Raman	42	Indore	Engg.
Mohan	25	Chandigarh	HR
Parkash	45	Delhi	IT
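Before a CSV report is handed over, it can be worth reading it back to confirm that it holds what was intended. The following is a minimal sketch of such a round-trip check. The names write_report and read_report and the sample records are hypothetical, and the file is written to a temporary directory rather than the working directory.

```python
import csv
import os
import tempfile


def write_report(rows, header, output_directory, name="report1.csv"):
    """Write a header row followed by one CSV row per record."""
    path = os.path.join(output_directory, name)
    # newline="" lets the csv module manage line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def read_report(path):
    """Read the report back as a list of rows (every value returns as a string)."""
    with open(path, newline="", encoding="utf-8") as csvfile:
        return list(csv.reader(csvfile))


# Hypothetical sample records with the same shape as the report data.
SAMPLE_ROWS = [["Ram", 32, "Bhopal", "Manager"], ["Mohan", 25, "Chandigarh", "HR"]]
HEADER = ["Name", "Age", "City", "Job description"]

with tempfile.TemporaryDirectory() as tmp_dir:
    report_path = write_report(SAMPLE_ROWS, HEADER, tmp_dir)
    ROUND_TRIP = read_report(report_path)

print(ROUND_TRIP)
```

Note that numbers such as the age come back as strings, because CSV stores no type information.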

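Excel Reports

Another common output format for reports is an Excel (.xlsx) spreadsheet. Excel lets us create tables and plot graphs. The Python code below creates a report of the processed data in Excel format.

First, import the XlsxWriter module for creating the spreadsheet:

import xlsxwriter

Now create a workbook object using the Workbook() constructor.

workbook = xlsxwriter.Workbook('report2.xlsx')

Now create a new worksheet using the add_worksheet() method.

worksheet = workbook.add_worksheet()

Next, prepare the following data to write into the worksheet:

report2 = (['Ram', 32, 'Bhopal'], ['Mohan', 25, 'Chandigarh'], ['Parkash', 45, 'Delhi'])

row = 0
col = 0

Iterate over this data and write it out as follows:

for name, age, city in report2:
   worksheet.write(row, col, name)
   worksheet.write(row, col + 1, age)
   worksheet.write(row, col + 2, city)
   row += 1

Now close the Excel file using the close() method.

workbook.close()

The above script creates an Excel file named report2.xlsx containing the following data:

Ram	32	Bhopal
Mohan	25	Chandigarh
Parkash	45	Delhi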

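Investigation Acquisition Media

It is important for an investigator to keep detailed investigative notes, both to recall findings accurately and to piece together all parts of an investigation. Screenshots are very useful for keeping track of the steps taken in a particular investigation. With the following Python code, we can take a screenshot and save it to the hard disk for future use.

First, install the Python module named pyscreenshot with the following command:

pip install pyscreenshot

Now import the module as shown:

import pyscreenshot as ImageGrab

Use the following line of code to take the screenshot:

image = ImageGrab.grab()

Use the following line of code to save the screenshot to a given location:

image.save('d:/image123.png')

To display the screenshot in a window, use the following Python code (it also requires matplotlib):

import matplotlib.pyplot as plt
import pyscreenshot as ImageGrab
image = ImageGrab.grab()
plt.imshow(image, cmap='gray', interpolation='bilinear')
plt.show()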

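Python Digital Mobile Device Forensics

This chapter explains Python digital forensics on mobile devices and the concepts involved.

Introduction

Mobile device forensics is the branch of digital forensics that deals with the acquisition and analysis of mobile devices to recover digital evidence of investigative interest. It differs from computer forensics in that mobile devices have a built-in communication system, which can provide useful location-related information.

Although smartphones appear in digital forensics more and more often, they are still considered non-standard because of their heterogeneity. Computer hardware such as hard disks, by contrast, is considered standard, and its examination has developed into a stable discipline. Within the digital forensics industry there is much debate about the techniques to use for non-standard devices that hold transient evidence, such as smartphones.

Artifacts Extractible from Mobile Devices

Modern mobile devices hold far more digital information than older phones, which had only a call log or SMS messages. They can therefore give investigators a great deal of insight into their users. Some artifacts that can be extracted from mobile devices are listed below:

Messages - These artifacts can reveal the owner's state of mind and can even give the investigator previously unknown information.
Location history - Location history data can be used by investigators to validate a person's whereabouts.
Applications installed - By examining the kinds of applications installed, an investigator gains insight into the habits and thinking of the mobile user.

Evidence Sources and Processing in Python

Smartphones use SQLite databases and PLIST files as major sources of evidence. This section processes these sources of evidence with Python.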

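Analyzing PLIST Files

A PLIST (property list) is a flexible and convenient format for storing application data, especially on iPhone devices. It uses the extension .plist and stores information about bundles and applications. It comes in two formats: XML and binary. The following Python code opens and reads a PLIST file. Note that we need our own Info.plist file before proceeding.

First, install the third-party library named biplist with the following command:

pip install biplist

Now import the libraries needed to process plist files:

import biplist
import os
import sys

Now use the following code in the main method to read the plist file into a variable:

def main(plist):
   try:
      data = biplist.readPlist(plist)
   except (biplist.InvalidPlistException, biplist.NotBinaryPlistException) as e:
      print("[-] Invalid PLIST file - unable to be opened by biplist")
      sys.exit(1)

We can now print the data held in this variable to the console or process it further.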
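As an alternative to biplist, Python 3's standard library includes the plistlib module, which reads both XML and binary property lists. The sketch below swaps in plistlib and is not the biplist approach used above. It first writes a small sample file so that there is something to read; the keys and values in that sample are hypothetical.

```python
import os
import plistlib
import tempfile


def read_plist(path):
    """Parse an XML or binary property list and return its contents."""
    with open(path, "rb") as handle:
        # plistlib.load() detects the XML and binary formats automatically.
        return plistlib.load(handle)


def write_sample_plist(path):
    """Create a small binary plist so the reader has something to open."""
    sample = {"CFBundleName": "ExampleApp", "CFBundleVersion": "1.0"}
    with open(path, "wb") as handle:
        plistlib.dump(sample, handle, fmt=plistlib.FMT_BINARY)


with tempfile.TemporaryDirectory() as tmp_dir:
    plist_path = os.path.join(tmp_dir, "Info.plist")
    write_sample_plist(plist_path)
    PLIST_DATA = read_plist(plist_path)

for key, value in sorted(PLIST_DATA.items()):
    print("{}: {}".format(key, value))
```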

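SQLite Databases

SQLite serves as the primary data repository on mobile devices. SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. Zero-configuration means that, unlike other databases, it does not need to be configured on the system.

If you are new to SQLite databases, see www.tutorialspoint.com/sqlite/index.htm. For more detail on using SQLite with Python, see www.tutorialspoint.com/sqlite/sqlite_python.htm.

During mobile forensics, we can interact with the sms.db file of a mobile device and extract valuable information from its message table. Python has a built-in library named sqlite3 for connecting to SQLite databases. Import it with the following command: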

import sqlite3

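Now connect to the database, say sms.db from a mobile device, with the following commands:

conn = sqlite3.connect('sms.db')
c = conn.cursor()

Here, c is the cursor object through which we interact with the database.

Suppose we want to run a particular command, say to get the details from a table named abc. This can be done with the following command:

c.execute("SELECT * FROM abc")

The result of this command is held by the cursor object. We can then use the fetchall() method to load the result into a variable that we can manipulate. Close the cursor with c.close() only after the results have been fetched.

Use the following commands to get the column names of the message table in sms.db:

c.execute("PRAGMA table_info(message)")
table_data = c.fetchall()
columns = [x[1] for x in table_data]

Note that this uses the SQLite PRAGMA command, a special command for controlling various environment variables and state flags within the SQLite environment. Here, fetchall() returns a list of tuples, one per column, and the column name is stored at index 1 of each tuple.

Now query the table for all of its data and store it in a variable named data_msg:

c.execute("SELECT * FROM message")
data_msg = c.fetchall()

This stores the data in the variable; the csv.writer() method can then be used to write it to a CSV file.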
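The steps above can be combined into one short, self-contained sketch. Since no real sms.db is available here, it builds an in-memory database with a hypothetical message table (the columns msg_id, text, and date are assumptions, not the real sms.db schema), reads the column names with PRAGMA table_info, and writes the rows out as CSV.

```python
import csv
import io
import sqlite3


def table_columns(conn, table):
    """Return the column names of a table using PRAGMA table_info."""
    # PRAGMA statements cannot take bound parameters, so the table name is
    # interpolated; only pass trusted table names here.
    cursor = conn.execute("PRAGMA table_info({})".format(table))
    return [row[1] for row in cursor.fetchall()]


def dump_table_to_csv(conn, table, out_stream):
    """Write a header row and every row of the table to a CSV stream."""
    columns = table_columns(conn, table)
    rows = conn.execute("SELECT * FROM {}".format(table)).fetchall()
    writer = csv.writer(out_stream)
    writer.writerow(columns)
    writer.writerows(rows)
    return len(rows)


def build_sample_db():
    """Create an in-memory stand-in for sms.db with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message (msg_id INTEGER PRIMARY KEY, text TEXT, date INTEGER)"
    )
    conn.executemany(
        "INSERT INTO message (text, date) VALUES (?, ?)",
        [("hello", 1), ("see you at 5", 2)],
    )
    conn.commit()
    return conn


conn = build_sample_db()
buffer = io.StringIO()
ROW_COUNT = dump_table_to_csv(conn, "message", buffer)
conn.close()
CSV_LINES = buffer.getvalue().splitlines()
print(CSV_LINES)
```

Against a real device database, sqlite3.connect(":memory:") would be replaced by a connection to a working copy of sms.db, never the original evidence file.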

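iTunes Backups

iPhone mobile forensics can be performed on the backups made by iTunes. Forensic examiners rely on analyzing the iPhone logical backups acquired through iTunes. iTunes uses the AFC (Apple File Connection) protocol to take the backup. The backup process does not modify anything on the iPhone except the escrow key records.

Why does a digital forensics expert need to understand iTunes backups? They matter when we gain access to a suspect's computer rather than to the iPhone itself: when a computer is used to sync with an iPhone, most of the information on the iPhone is likely to be backed up on that computer.

Process of Backup and its Location

Whenever an Apple product is backed up to a computer, it syncs with iTunes, and a specific folder named after the device's unique ID is created. In the latest backup format, files are stored in subfolders named after the first two hexadecimal characters of the file name. Among the backup files, some such as Info.plist are useful, along with the database named Manifest.db. The following table shows the backup locations, which vary with the operating system on which the iTunes backup is made:

OS	Backup Location
Win7	C:\Users\[username]\AppData\Roaming\Apple Computer\MobileSync\Backup\
MAC OS X	~/Library/Application Support/MobileSync/Backup/

To process an iTunes backup with Python, we first identify all the backups in the backup location for our operating system. We then iterate through each backup and read its Manifest.db database.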
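The folder layout described above can be expressed in a few lines. This is a minimal sketch under stated assumptions: the helper names and the example file ID are hypothetical, and the hexadecimal test is an added check rather than something the script in this chapter performs (that script only checks for a 40-character folder name).

```python
import os
import string


def looks_like_backup_folder(name):
    """Return True for a 40-character hexadecimal folder name."""
    return len(name) == 40 and all(ch in string.hexdigits for ch in name)


def backup_file_path(backup_dir, file_id):
    """Locate a file inside a backup from its file ID.

    Files sit in a subfolder named after the first two hex characters
    of the file ID.
    """
    return os.path.join(backup_dir, file_id[:2], file_id)


# Hypothetical 40-character file ID, used only for illustration.
EXAMPLE_ID = "3d0d7e5fb2ce288813306e4d4636395e047a3d28"
EXAMPLE_PATH = backup_file_path("Backup", EXAMPLE_ID)
print(EXAMPLE_PATH)
```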

The following Python code does this.

First, import the necessary libraries:

from __future__ import print_function
import argparse
import logging
import os

from shutil import copyfile
import sqlite3
import sys
logger = logging.getLogger(__name__)

Now provide two positional arguments, INPUT_DIR and OUTPUT_DIR, which represent the iTunes backup and the desired output folder:

if __name__ == "__main__":
   # create the parser before adding arguments to it
   parser = argparse.ArgumentParser()
   parser.add_argument("INPUT_DIR", help = "Location of folder containing iOS backups, "
      "e.g. ~/Library/Application Support/MobileSync/Backup folder")
   parser.add_argument("OUTPUT_DIR", help = "Output Directory")
   parser.add_argument("-l", help = "Log file path", default = __file__[:-2] + "log")
   parser.add_argument("-v", help = "Increase verbosity", action = "store_true")
   args = parser.parse_args()

Now set up logging as follows:

if args.v:
   logger.setLevel(logging.DEBUG)
else:
   logger.setLevel(logging.INFO)

Now set the message format for this log as follows:

msg_fmt = logging.Formatter("%(asctime)-15s %(funcName)-13s""%(levelname)-8s %(message)s")
strhndl = logging.StreamHandler(sys.stderr)
strhndl.setFormatter(fmt = msg_fmt)

fhndl = logging.FileHandler(args.l, mode = 'a')
fhndl.setFormatter(fmt = msg_fmt)

logger.addHandler(strhndl)
logger.addHandler(fhndl)
logger.info("Starting iBackup Visualizer")
logger.debug("Supplied arguments: {}".format(" ".join(sys.argv[1:])))
logger.debug("System: " + sys.platform)
logger.debug("Python Version: " + sys.version)

The following lines create the necessary folders for the desired output directory using the os.makedirs() function:

if not os.path.exists(args.OUTPUT_DIR):
   os.makedirs(args.OUTPUT_DIR)

Now pass the supplied input and output directories to the main() function as follows:

if os.path.exists(args.INPUT_DIR) and os.path.isdir(args.INPUT_DIR):
   main(args.INPUT_DIR, args.OUTPUT_DIR)
else:
   logger.error("Supplied input directory does not exist or is not ""a directory")
   sys.exit(1)

Now write the main() function, which calls the backup_summary() function to identify all the backups present in the input folder:

def main(in_dir, out_dir):
   backups = backup_summary(in_dir)

def backup_summary(in_dir):
   logger.info("Identifying all iOS backups in {}".format(in_dir))
   root = os.listdir(in_dir)
   backups = {}
   
   for x in root:
      temp_dir = os.path.join(in_dir, x)
      if os.path.isdir(temp_dir) and len(x) == 40:
         num_files = 0
         size = 0
         
         for root, subdir, files in os.walk(temp_dir):
            num_files += len(files)
            size += sum(os.path.getsize(os.path.join(root, name))
               for name in files)
         backups[x] = [temp_dir, num_files, size]
   return backups

Now, inside main(), print a summary of each backup to the console as follows:

print("Backup Summary")
print("=" * 20)

if len(backups) > 0:
   for i, b in enumerate(backups):
      print("Backup No.: {} \n""Backup Dev. Name: {} \n""# Files: {} \n""Backup Size (Bytes): {}\n".format(i, b, backups[b][1], backups[b][2]))

Inside the for loop over the backups, load the contents of the Manifest.db file into a variable named db_items.

      try:
         db_items = process_manifest(backups[b][0])
      except IOError:
         logger.warning("Non-iOS 10 backup encountered or " "invalid backup. Continuing to next backup.")
         continue

Now define process_manifest(), which takes the directory path of a backup:

def process_manifest(backup):
   manifest = os.path.join(backup, "Manifest.db")
   
   if not os.path.exists(manifest):
      logger.error("Manifest DB not found in {}".format(manifest))
      raise IOError

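Still inside process_manifest(), connect to the database with sqlite3 and read the Files table through a cursor named c:

   conn = sqlite3.connect(manifest)
   c = conn.cursor()
   items = {}
   
   for row in c.execute("SELECT * from Files;"):
      items[row[0]] = [row[2], row[1], row[3]]
   return items

Back in main(), the loop over the backups calls create_files() for each backup, and the else branch handles the case where no backups were found:

      create_files(in_dir, out_dir, b, db_items)
   print("=" * 20)
else:
   logger.warning("No valid backups found. The input directory should be "
      "the parent-directory immediately above the SHA-1 hash " "iOS device backups")
   sys.exit(2)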

Now define the create_files() method as follows:

def create_files(in_dir, out_dir, b, db_items):
   msg = "Copying Files for backup {} to {}".format(b, os.path.join(out_dir, b))
   logger.info(msg)
   # count files listed in Manifest.db that are missing from the backup
   files_not_found = 0

Now, inside create_files(), iterate through each key in the db_items dictionary:

   for x, key in enumerate(db_items):
      if db_items[key][0] is None or db_items[key][0] == "":
         continue
      else:
         dirpath = os.path.join(out_dir, b,
            os.path.dirname(db_items[key][0]))
         filepath = os.path.join(out_dir, b, db_items[key][0])
      
      if not os.path.exists(dirpath):
         os.makedirs(dirpath)
      original_dir = b + "/" + key[0:2] + "/" + key
      path = os.path.join(in_dir, original_dir)
      
      if os.path.exists(filepath):
         filepath = filepath + "_{}".format(x)

Now, still inside the loop, use the shutil.copyfile() method to copy each backed-up file. After the loop, report any missing files and copy the backup's own metadata files:

      try:
         copyfile(path, filepath)
      except IOError:
         logger.debug("File not found in backup: {}".format(path))
         files_not_found += 1
   if files_not_found > 0:
      logger.warning("{} files listed in the Manifest.db not "
         "found in backup".format(files_not_found))
   copyfile(os.path.join(in_dir, b, "Info.plist"),
      os.path.join(out_dir, b, "Info.plist"))
   copyfile(os.path.join(in_dir, b, "Manifest.db"),
      os.path.join(out_dir, b, "Manifest.db"))
   copyfile(os.path.join(in_dir, b, "Manifest.plist"),
      os.path.join(out_dir, b, "Manifest.plist"))
   copyfile(os.path.join(in_dir, b, "Status.plist"),
      os.path.join(out_dir, b, "Status.plist"))

The above Python script rebuilds the backup's file structure in the output folder. The pycrypto Python library can be used to decrypt the backups.

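Wi-Fi

Mobile devices can connect to the outside world through the Wi-Fi networks that are available almost everywhere. Sometimes a device connects to these open networks automatically.

On an iPhone, the list of open Wi-Fi connections the device has joined is stored in a PLIST file named com.apple.wifi.plist. This file contains the Wi-Fi SSID, BSSID, and connection time.

We need to extract the Wi-Fi details from a standard Cellebrite XML report using Python. To locate the device, we use the API of the Wireless Geographic Logging Engine (WIGLE), a popular platform for finding the location of a device from the names of Wi-Fi networks.

The Python library named requests can be used to access the WIGLE API. It can be installed as follows: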

pip install requests

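API from WIGLE

We need to register on the WIGLE website https://wigle.net/account to get a free API key. The Python script for getting information about a user's device and its connections through the WIGLE API is discussed below:

First, import the following libraries: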

from __future__ import print_function

import argparse
import csv
import os
import sys
import xml.etree.ElementTree as ET
import requests

Now provide two positional arguments, INPUT_FILE and OUTPUT_CSV, which represent the input file containing Wi-Fi MAC addresses and the desired output CSV file respectively:

if __name__ == "__main__":
   # create the parser before adding arguments to it
   parser = argparse.ArgumentParser()
   parser.add_argument("INPUT_FILE", help = "INPUT FILE with MAC Addresses")
   parser.add_argument("OUTPUT_CSV", help = "Output CSV File")
   parser.add_argument("-t", help = "Input type: Cellebrite XML report or TXT file",
      choices = ('xml', 'txt'), default = "xml")
   parser.add_argument('--api', help = "Path to API key file",
      default = os.path.expanduser("~/.wigle_api"),
      type = argparse.FileType('r'))
   args = parser.parse_args()

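The following lines check that the input file exists and is a file, and exit the script if it is not. They then create the output directory if needed and read the API key:

if not os.path.exists(args.INPUT_FILE) or \
      not os.path.isfile(args.INPUT_FILE):
   print("[-] {} does not exist or is not a file".format(args.INPUT_FILE))
   sys.exit(1)
directory = os.path.dirname(args.OUTPUT_CSV)
if directory != '' and not os.path.exists(directory):
   os.makedirs(directory)
# the key file holds two colon-separated values, used later as the auth pair
api_key = args.api.readline().strip().split(":")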

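Now pass the arguments to main() as follows:

main(args.INPUT_FILE, args.OUTPUT_CSV, args.t, api_key)

The main() function, which must be defined before this call in the script, selects a parser based on the input type and then queries WIGLE:

def main(in_file, out_csv, type, api_key):
   if type == 'xml':
      wifi = parse_xml(in_file)
   else:
      wifi = parse_txt(in_file)
   query_wigle(wifi, out_csv, api_key)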

Now parse the XML file as follows:

def parse_xml(xml_file):
   wifi = {}
   xmlns = "{http://pa.cellebrite.com/report/2.0}"
   print("[+] Opening {} report".format(xml_file))
   
   xml_tree = ET.parse(xml_file)
   print("[+] Parsing report for all connected WiFi addresses")
   
   root = xml_tree.getroot()

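Now iterate through the children of the root, still inside parse_xml():

   for child in root.iter():
      if child.tag == xmlns + "model":
         if child.get("type") == "Location":
            for field in child.findall(xmlns + "field"):
               if field.get("name") == "TimeStamp":
                  ts_value = field.find(xmlns + "value")
                  try:
                     ts = ts_value.text
                  except AttributeError:
                     continue

Next, within the same loop over the fields, check whether the string "SSID" is present in the text of the field's value element:

               # the source does not show where value is assigned; here it is
               # taken to be the <value> child of the current field
               value = field.find(xmlns + "value")
               if value is not None and value.text and "SSID" in value.text:
                  bssid, ssid = value.text.split("\t")
                  bssid = bssid[7:]
                  ssid = ssid[6:]

Now add the BSSID, SSID, and timestamp to the wifi dictionary, inside the "SSID" check above:

                  if bssid in wifi.keys():
                     wifi[bssid]["Timestamps"].append(ts)
                     wifi[bssid]["SSID"].append(ssid)
                  else:
                     wifi[bssid] = {"Timestamps": [ts], "SSID": [ssid], "Wigle": {}}
   return wifi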
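To see the XML parsing work end to end, the following sketch runs the same idea against a small inline document. The namespace is the one used above, but the sample report itself is hypothetical: the element layout, the "Description" field name, and the "BSSID: ...<tab>SSID: ..." text format are assumptions about how a Cellebrite report presents this data, not a confirmed schema.

```python
import xml.etree.ElementTree as ET

XMLNS = "{http://pa.cellebrite.com/report/2.0}"

# Hypothetical sample report; "\t" becomes a literal tab inside the text.
SAMPLE_REPORT = (
    '<project xmlns="http://pa.cellebrite.com/report/2.0">'
    '<model type="Location">'
    '<field name="TimeStamp"><value>2017-01-01T10:00:00</value></field>'
    '<field name="Description">'
    "<value>BSSID: 00:11:22:33:44:55\tSSID: CafeWiFi</value>"
    "</field>"
    "</model>"
    "</project>"
)


def parse_wifi(xml_text):
    """Collect timestamps and SSIDs per BSSID from Location models."""
    wifi = {}
    root = ET.fromstring(xml_text)
    for model in root.iter(XMLNS + "model"):
        if model.get("type") != "Location":
            continue
        ts, bssid, ssid = "N/A", None, None
        for field in model.findall(XMLNS + "field"):
            value = field.find(XMLNS + "value")
            if value is None or value.text is None:
                continue
            if field.get("name") == "TimeStamp":
                ts = value.text
            elif "SSID" in value.text:
                bssid_part, ssid_part = value.text.split("\t")
                bssid = bssid_part[7:]  # drop the "BSSID: " prefix
                ssid = ssid_part[6:]    # drop the "SSID: " prefix
        if bssid is None:
            continue
        entry = wifi.setdefault(bssid, {"Timestamps": [], "SSID": [], "Wigle": {}})
        entry["Timestamps"].append(ts)
        entry["SSID"].append(ssid)
    return wifi


WIFI = parse_wifi(SAMPLE_REPORT)
print(WIFI)
```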

The text parser is much simpler than the XML parser:

def parse_txt(txt_file):
   wifi = {}
   print("[+] Extracting MAC addresses from {}".format(txt_file))
   
   with open(txt_file) as mac_file:
      for line in mac_file:
         wifi[line.strip()] = {"Timestamps": ["N/A"], "SSID": ["N/A"], "Wigle": {}}
   return wifi

Now use the requests module to make the WIGLE API calls. The query_mac_addr() helper sends one request per MAC address:

def query_mac_addr(mac_addr, api_key):
   query_url = "https://api.wigle.net/api/v2/network/search?" \
      "onlymine=false&freenet=false&paynet=false" \
      "&netid={}".format(mac_addr)
   req = requests.get(query_url, auth = (api_key[0], api_key[1]))
   return req.json()

The WIGLE API limits the number of calls per day. The query_wigle() method loops over the MAC addresses, stores each result, and reports an error if that limit is exceeded:

def query_wigle(wifi_dictionary, out_csv, api_key):
   print("[+] Querying Wigle.net through Python API for {} "
      "APs".format(len(wifi_dictionary)))
   for mac in wifi_dictionary:
      wigle_results = query_mac_addr(mac, api_key)
      try:
         if wigle_results["resultCount"] == 0:
            wifi_dictionary[mac]["Wigle"]["results"] = []
            continue
         else:
            wifi_dictionary[mac]["Wigle"] = wigle_results
      except KeyError:
         if wigle_results["error"] == "too many queries today":
            print("[-] Wigle daily query limit exceeded")
            wifi_dictionary[mac]["Wigle"]["results"] = []
            continue
         else:
            print("[-] Other error encountered for " "address {}: {}".format(mac,
               wigle_results['error']))
            wifi_dictionary[mac]["Wigle"]["results"] = []
            continue
   prep_output(out_csv, wifi_dictionary)
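Building the query string by hand is error-prone, since stray spaces or unescaped characters in the URL change what the server receives. One option is to let the standard library encode the parameters, as in the sketch below. The endpoint and parameter names are taken from the script above; the helper name build_query_url is hypothetical, and no network request is made here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

WIGLE_SEARCH = "https://api.wigle.net/api/v2/network/search"


def build_query_url(mac_addr):
    """Return the search URL for one MAC address with encoded parameters."""
    params = {
        "onlymine": "false",
        "freenet": "false",
        "paynet": "false",
        "netid": mac_addr,
    }
    # urlencode() percent-encodes characters such as ":" in the MAC address.
    return WIGLE_SEARCH + "?" + urlencode(params)


QUERY_URL = build_query_url("00:11:22:33:44:55")
print(QUERY_URL)
```

The requests library can do the same encoding itself when the dictionary is passed through the params argument of requests.get().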

Now use the prep_output() method to flatten the dictionary into chunks that are easy to write:

def prep_output(output, data):
   csv_data = {}
   google_map = "https://www.google.com/maps/search/"

Still inside prep_output(), walk through all the data collected so far:

   for x, mac in enumerate(data):
      for y, ts in enumerate(data[mac]["Timestamps"]):
         for z, result in enumerate(data[mac]["Wigle"]["results"]):
            shortres = data[mac]["Wigle"]["results"][z]
            g_map_url = "{}{},{}".format(google_map, shortres["trilat"], shortres["trilong"])

The output can now be written to a CSV file with the write_csv() function, as in the earlier scripts in this chapter.
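The flattening step can be sketched as a function that turns the nested dictionary into one flat row per timestamp and result, ready for csv.DictWriter or a similar writer. The column names and the sample data below are hypothetical; only the dictionary shape and the trilat and trilong keys come from the script above.

```python
def flatten_results(data, map_base="https://www.google.com/maps/search/"):
    """Return one flat dict per (timestamp, WIGLE result) pair."""
    rows = []
    for mac, details in data.items():
        results = details["Wigle"].get("results", [])
        for idx, ts in enumerate(details["Timestamps"]):
            for result in results:
                rows.append({
                    "BSSID": mac,
                    "SSID": details["SSID"][idx],
                    "Timestamp": ts,
                    "Latitude": result["trilat"],
                    "Longitude": result["trilong"],
                    "Map URL": "{}{},{}".format(
                        map_base, result["trilat"], result["trilong"]),
                })
    return rows


# Hypothetical data in the shape built by the parsers and query step.
SAMPLE = {
    "00:11:22:33:44:55": {
        "Timestamps": ["2017-01-01T10:00:00", "2017-01-02T11:30:00"],
        "SSID": ["CafeWiFi", "CafeWiFi"],
        "Wigle": {"results": [{"trilat": 28.6, "trilong": 77.2}]},
    }
}

ROWS = flatten_results(SAMPLE)
for row in ROWS:
    print(row)
```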

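Investigating Embedded Metadata

This chapter looks in detail at investigating embedded metadata using Python digital forensics.

Introduction

Embedded metadata is information about data that is stored in the same file as the object it describes. In other words, it is information about a digital asset stored in the digital file itself. Because it is embedded, it stays with the file wherever the file is copied, unless it is deliberately stripped.

In digital forensics, the contents of a file do not tell us everything about it. Embedded metadata, however, can provide information that is critical to an investigation. For example, a text file's metadata may contain information about the author, its length, the date it was written, and even a short summary of the document. A digital image may include metadata such as the image dimensions and the shutter speed.

Artifacts Containing Metadata Attributes and their Extraction

This section covers various artifacts that contain metadata attributes and how to extract them using Python.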

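Audio and Video

These are two very common artifacts that have embedded metadata, which can be extracted for the purpose of investigation.

The following Python script extracts common attributes or metadata from an audio (MP3) file and a video (MP4) file.

Note that this script requires a third-party Python library named mutagen, which lets us extract metadata from audio and video files. It can be installed with the following command: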

pip install mutagen

Some of the libraries we need to import for this Python script are as follows:

from __future__ import print_function

import argparse
import json
import mutagen

The command line handler takes one argument, which represents the path to the MP3 or MP4 file. We then use the mutagen.File() method to open a handle to the file as follows:

if __name__ == '__main__':
   parser = argparse.ArgumentParser('Python Metadata Extractor')
   parser.add_argument("AV_FILE", help="File to extract metadata from")
   args = parser.parse_args()
   av_file = mutagen.File(args.AV_FILE)
   file_ext = args.AV_FILE.rsplit('.', 1)[-1]
   
   if file_ext.lower() == 'mp3':
      handle_id3(av_file)
   elif file_ext.lower() == 'mp4':
      handle_mp4(av_file)

Now we need two handlers, one to extract the data from MP3 files and one for MP4 files. They can be defined as follows:

def handle_id3(id3_file):
   id3_frames = {'TIT2': 'Title', 'TPE1': 'Artist', 'TALB': 'Album','TXXX':
      'Custom', 'TCON': 'Content Type', 'TDRL': 'Date released','COMM': 'Comments',
         'TDRC': 'Recording Date'}
   print("{:15} | {:15} | {:38} | {}".format("Frame", "Description","Text","Value"))
   print("-" * 85)
   
   for frames in id3_file.tags.values():
      frame_name = id3_frames.get(frames.FrameID, frames.FrameID)
      desc = getattr(frames, 'desc', "N/A")
      text = getattr(frames, 'text', ["N/A"])[0]
      value = getattr(frames, 'value', "N/A")
      
      if "date" in frame_name.lower():
         text = str(text)
      print("{:15} | {:15} | {:38} | {}".format(
         frame_name, desc, text, value))
def handle_mp4(mp4_file):
   cp_sym = u"\u00A9"
   qt_tag = {
      cp_sym + 'nam': 'Title', cp_sym + 'art': 'Artist',
      cp_sym + 'alb': 'Album', cp_sym + 'gen': 'Genre',
      'cpil': 'Compilation', cp_sym + 'day': 'Creation Date',
      'cnID': 'Apple Store Content ID', 'atID': 'Album Title ID',
      'plID': 'Playlist ID', 'geID': 'Genre ID', 'pcst': 'Podcast',
      'purl': 'Podcast URL', 'egid': 'Episode Global ID',
      'cmID': 'Camera ID', 'sfID': 'Apple Store Country',
      'desc': 'Description', 'ldes': 'Long Description'}
   # apple_genres.json is expected to sit next to the script
   genre_ids = json.load(open('apple_genres.json'))

Still inside handle_mp4(), iterate through the tags of the MP4 file as follows:

   print("{:22} | {}".format('Name', 'Value'))
   print("-" * 40)
   
   for name, value in mp4_file.tags.items():
      tag_name = qt_tag.get(name, name)
      
      if isinstance(value, list):
         value = "; ".join([str(x) for x in value])
      if name == 'geID':
         value = "{}: {}".format(
            value, genre_ids[str(value)].replace("|", " - "))
      print("{:22} | {}".format(tag_name, value))

The above script prints the metadata stored in MP3 and MP4 files.

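Images

Images may contain different kinds of metadata depending on their file format. Many images, particularly photos taken with smartphones, embed GPS information, which can be extracted with third-party Python libraries. The following Python script does this.

First, install the third-party Python library **Pillow**, the maintained fork of the **Python Imaging Library (PIL)**, as follows: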

pip install pillow

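This will help us to extract metadata from images.

We can also write the GPS details embedded in images to a KML file, but for this we need the third-party Python library named **simplekml**, installed as follows: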

pip install simplekml

In this script, first import the following libraries:

from __future__ import print_function
import argparse

from PIL import Image
from PIL.ExifTags import TAGS

import simplekml
import sys

Now the command line handler accepts one positional argument, which represents the file path of the photo.

parser = argparse.ArgumentParser('Metadata from images')
parser.add_argument('PICTURE_FILE', help = "Path to picture")
args = parser.parse_args()

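Now we specify the URLs that will be populated with the coordinate information, **gmaps** and **open_maps**. We also need a function to convert the degree-minute-second (DMS) tuple coordinates provided by the PIL library into decimal degrees. This can be done as follows: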

gmaps = "https://www.google.com/maps?q={},{}"
open_maps = "http://www.openstreetmap.org/?mlat={}&mlon={}"

def process_coords(coord):
   coord_deg = 0
   
   for count, values in enumerate(coord):
      coord_deg += (float(values[0]) / values[1]) / 60**count
   return coord_deg
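The conversion itself is simple arithmetic: decimal degrees = degrees + minutes / 60 + seconds / 3600, negated for southern latitudes and western longitudes. The sketch below shows that formula on plain numbers. The function name and the sample coordinate are hypothetical, and it takes already-divided values rather than the rational pairs handled by process_coords().

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees, minutes, seconds plus a hemisphere letter to decimal degrees."""
    decimal = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South and West are represented as negative decimal degrees.
    return -decimal if ref in ("S", "W") else decimal


# Hypothetical coordinate: 40 deg 26 min 46 s North, 79 deg 58 min 56 s West.
LAT = dms_to_decimal(40, 26, 46, "N")
LON = dms_to_decimal(79, 58, 56, "W")
print("{:.6f}, {:.6f}".format(LAT, LON))
```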

Now use the **Image.open()** function to open the file as a PIL object, and loop over the EXIF tags until the GPSInfo tag is found.

img_file = Image.open(args.PICTURE_FILE)
exif_data = img_file._getexif()

if exif_data is None:
   print("No EXIF data found")
   sys.exit()
for name, value in exif_data.items():
   gps_tag = TAGS.get(name, name)
   # compare strings with != rather than "is not"
   if gps_tag != 'GPSInfo':
      continue

After finding the **GPSInfo** tag, still inside the loop, store the GPS references and process the coordinates with the **process_coords()** method.

   lat_ref = value[1] == u'N'
   lat = process_coords(value[2])
   
   if not lat_ref:
      lat = lat * -1
   lon_ref = value[3] == u'E'
   lon = process_coords(value[4])
   
   if not lon_ref:
      lon = lon * -1

Now create a **kml** object from the **simplekml** library as follows:

kml = simplekml.Kml()
kml.newpoint(name = args.PICTURE_FILE, coords = [(lon, lat)])
kml.save(args.PICTURE_FILE + ".kml")

Now we can print the coordinates from the processed information as follows:

print("GPS Coordinates: {}, {}".format(lat, lon))
print("Google Maps URL: {}".format(gmaps.format(lat, lon)))
print("OpenStreetMap URL: {}".format(open_maps.format(lat, lon)))
print("KML File {} created".format(args.PICTURE_FILE + ".kml"))

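PDF Documents

PDF documents contain a wide variety of media, including images, text, and forms. When we extract the metadata embedded in PDF documents, we may get the resulting data in a format called Extensible Metadata Platform (XMP). We can extract the metadata with the following Python code.

First, install the third-party Python library named **PyPDF2** to read metadata stored in XMP format. It can be installed as follows: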

pip install PyPDF2

Now import the following libraries for extracting the metadata from PDF files:

from __future__ import print_function
from argparse import ArgumentParser, FileType

import datetime
from PyPDF2 import PdfFileReader
import sys

Now the command line handler accepts one positional argument, which represents the file path of the PDF file.

parser = ArgumentParser('Metadata from PDF')
parser.add_argument('PDF_FILE', help='Path to PDF file', type=FileType('rb'))
args = parser.parse_args()

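Now we can use the **getXmpMetadata()** method to obtain an object containing the available metadata. These names belong to PyPDF2 releases before 3.0; later releases provide PdfReader and the xmp_metadata property instead.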

pdf_file = PdfFileReader(args.PDF_FILE)
xmpm = pdf_file.getXmpMetadata()

if xmpm is None:
   print("No XMP metadata found in document.")
   sys.exit()

We can use the **custom_print()** method to extract and print the relevant values, such as the title, creator, and contributor, as follows:

custom_print("Title: {}", xmpm.dc_title)
custom_print("Creator(s): {}", xmpm.dc_creator)
custom_print("Contributors: {}", xmpm.dc_contributor)
custom_print("Subject: {}", xmpm.dc_subject)
custom_print("Description: {}", xmpm.dc_description)
custom_print("Created: {}", xmpm.xmp_createDate)
custom_print("Modified: {}", xmpm.xmp_modifyDate)
custom_print("Event Dates: {}", xmpm.dc_date)

The **custom_print()** method, which must be defined before the calls above, handles the different value types that PDFs created by different software may contain:

def custom_print(fmt_str, value):
   if isinstance(value, list):
      print(fmt_str.format(", ".join(value)))
   elif isinstance(value, dict):
      fmt_value = [":".join((k, v)) for k, v in value.items()]
      print(fmt_str.format(", ".join(fmt_value)))
   elif isinstance(value, str) or isinstance(value, bool):
      print(fmt_str.format(value))
   elif isinstance(value, bytes):
      print(fmt_str.format(value.decode()))
   elif isinstance(value, datetime.datetime):
      print(fmt_str.format(value.isoformat()))
   elif value is None:
      print(fmt_str.format("N/A"))
   else:
      print("warn: unhandled type {} found".format(type(value)))

We can also extract any other custom properties saved by the software as follows:

if xmpm.custom_properties:
   print("Custom Properties:")
   
   for k, v in xmpm.custom_properties.items():
      print("\t{}: {}".format(k, v))

The above script reads the PDF document and prints the metadata stored in XMP format, including any custom properties stored by the software that created the PDF.

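Windows Executable Files

Sometimes we may come across a suspicious or unauthorized executable file. It can still be useful to an investigation because of its embedded metadata, which can tell us about its location, its purpose, and other attributes such as the manufacturer and the compilation date. With the following Python script, we can get the compilation date, useful data from the headers, and the imported and exported symbols.

For this purpose, first install the third-party Python library **pefile**. It can be installed as follows: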

pip install pefile

Once it is installed, import the following libraries:

from __future__ import print_function

import argparse
from datetime import datetime
from pefile import PE

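Now the command line handler accepts one positional argument, which represents the file path of the executable file. You can also choose the style of output, either detailed and verbose or simplified, through the optional argument shown below: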

parser = argparse.ArgumentParser('Metadata from executable file')
parser.add_argument("EXE_FILE", help = "Path to exe file")
parser.add_argument("-v", "--verbose", help = "Increase verbosity of output",
action = 'store_true', default = False)
args = parser.parse_args()

Now load the input executable file using the PE class, and dump the executable data to a dictionary object using the **dump_dict()** method.

pe = PE(args.EXE_FILE)
ped = pe.dump_dict()

We can extract basic file metadata such as embedded authorship, version, and compilation time with the code shown below:

file_info = {}
for structure in pe.FileInfo:
   if structure.Key == b'StringFileInfo':
      for s_table in structure.StringTable:
         for key, value in s_table.entries.items():
            if value is None or len(value) == 0:
               value = "Unknown"
            file_info[key] = value
print("File Information: ")
print("==================")

for k, v in file_info.items():
   if isinstance(k, bytes):
      k = k.decode()
   if isinstance(v, bytes):
      v = v.decode()
   print("{}: {}".format(k, v))
comp_time = ped['FILE_HEADER']['TimeDateStamp']['Value']
comp_time = comp_time.split("[")[-1].strip("]")
time_stamp, timezone = comp_time.rsplit(" ", 1)
comp_time = datetime.strptime(time_stamp, "%a %b %d %H:%M:%S %Y")
print("Compiled on {} {}".format(comp_time, timezone.strip()))
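The script above recovers the compile time by parsing the human-readable string in the dumped dictionary. The underlying header field is a count of seconds since 1970-01-01 UTC, so an alternative is to convert the raw integer directly. The sketch below shows that conversion on its own. The function name and the sample value are hypothetical, and it assumes the integer has already been read from the file header (for example from the pefile attribute FILE_HEADER.TimeDateStamp).

```python
from datetime import datetime, timezone


def pe_timestamp_to_utc(time_date_stamp):
    """Convert a PE TimeDateStamp (seconds since 1970-01-01 UTC) to a datetime."""
    return datetime.fromtimestamp(time_date_stamp, tz=timezone.utc)


# Hypothetical header value used only for illustration.
EXAMPLE = pe_timestamp_to_utc(1500000000)
print("Compiled on {}".format(EXAMPLE.isoformat()))
```

Keep in mind that this field is written by the build tools and can be altered, so it is one indicator among several rather than proof of when a file was built.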

We can extract useful data from the section headers as follows:

for section in ped['PE Sections']:
   print("Section '{}' at {}: {}/{} {}".format(
      section['Name']['Value'], hex(section['VirtualAddress']['Value']),
      section['Misc_VirtualSize']['Value'],
      section['SizeOfRawData']['Value'], section['MD5'])
   )

Now extract the lists of imports and exports from the executable file as shown below:

if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
   print("\nImports: ")
   print("=========")
   
   for dir_entry in pe.DIRECTORY_ENTRY_IMPORT:
      dll = dir_entry.dll
      
      if not args.verbose:
         print(dll.decode(), end=", ")
         continue
      name_list = []
      
      for impts in dir_entry.imports:
         if getattr(impts, "name", b"Unknown") is None:
            name = b"Unknown"
         else:
            name = getattr(impts, "name", b"Unknown")
         name_list.append([name.decode(), hex(impts.address)])
      name_fmt = ["{} ({})".format(x[0], x[1]) for x in name_list]
      print('- {}: {}'.format(dll.decode(), ", ".join(name_fmt)))
   if not args.verbose:
      print()

Now print the **exports**, **names**, and **addresses** using the code shown below:

if hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
   print("\nExports: ")
   print("=========")
   
   for sym in pe.DIRECTORY_ENTRY_EXPORT.symbols:
      print('- {}: {}'.format(sym.name.decode(), hex(sym.address)))

The above script extracts the basic metadata and header information from Windows executable files.

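Office Document Metadata

Much everyday computer work is done in three MS Office applications: Word, PowerPoint, and Excel. Their files carry a large amount of metadata, which can reveal interesting information about their authorship and history.

Note that metadata in the 2007 formats of Word (.docx), Excel (.xlsx), and PowerPoint (.pptx) is stored in XML files. We can process these XML files in Python with the following script.

First, import the required libraries and set up the command line handler as shown below: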

from __future__ import print_function
from argparse import ArgumentParser
from datetime import datetime as dt
from xml.etree import ElementTree as etree

import zipfile
parser = ArgumentParser('Office Document Metadata')
parser.add_argument("Office_File", help="Path to office file to read")
args = parser.parse_args()

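Now check whether the file is a ZIP file, and raise an error if it is not. Then open the file and extract the key elements for processing with the following code:

if not zipfile.is_zipfile(args.Office_File):
   raise ValueError("{} is not a ZIP-based Office file".format(args.Office_File))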
zfile = zipfile.ZipFile(args.Office_File)
core_xml = etree.fromstring(zfile.read('docProps/core.xml'))
app_xml = etree.fromstring(zfile.read('docProps/app.xml'))

Now, create a dictionary that maps the tags of interest to display titles, to start the metadata extraction:

core_mapping = {
   'title': 'Title',
   'subject': 'Subject',
   'creator': 'Author(s)',
   'keywords': 'Keywords',
   'description': 'Description',
   'lastModifiedBy': 'Last Modified By',
   'modified': 'Modified Date',
   'created': 'Created Date',
   'category': 'Category',
   'contentStatus': 'Status',
   'revision': 'Revision'
}

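Iterate directly over the root element to visit each tag in the XML file; this works on all Python 3 versions, whereas the older getchildren() method was removed in Python 3.9:

for element in core_xml: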
   for key, title in core_mapping.items():
      if key in element.tag:
         if 'date' in title.lower():
            text = dt.strptime(element.text, "%Y-%m-%dT%H:%M:%SZ")
         else:
            text = element.text
         print("{}: {}".format(title, text))

Similarly, do the same for the app.xml file, which contains statistical information about the contents of the document:

app_mapping = {
   'TotalTime': 'Edit Time (minutes)',
   'Pages': 'Page Count',
   'Words': 'Word Count',
   'Characters': 'Character Count',
   'Lines': 'Line Count',
   'Paragraphs': 'Paragraph Count',
   'Company': 'Company',
   'HyperlinkBase': 'Hyperlink Base',
   'Slides': 'Slide count',
   'Notes': 'Note Count',
   'HiddenSlides': 'Hidden Slide Count',
}
for element in app_xml:
   for key, title in app_mapping.items():
      if key in element.tag:
         if 'date' in title.lower():
            text = dt.strptime(element.text, "%Y-%m-%dT%H:%M:%SZ")
         else:
            text = element.text
         print("{}: {}".format(title, text))

Now, after running the above script, we get various details about the given document. Note that this script applies only to documents created with Office 2007 or later.
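The core.xml step can also be tried without a real Office file. The following is a minimal sketch, not the script above: the helper names read_core_properties() and build_sample_document() and the sample author are hypothetical, and the document is built in memory purely for illustration. It strips the XML namespace from each tag instead of matching substrings.

```python
import io
import zipfile
from xml.etree import ElementTree as etree

# Hypothetical, hand-written core.xml used only for this illustration.
SAMPLE_CORE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:creator>Example Author</dc:creator>'
    '<cp:revision>3</cp:revision>'
    '</cp:coreProperties>'
)


def build_sample_document():
    """Return an in-memory ZIP that mimics the layout of a .docx file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zfile:
        zfile.writestr("docProps/core.xml", SAMPLE_CORE)
    buffer.seek(0)
    return buffer


def read_core_properties(source):
    """Return {local_tag: text} for each child of docProps/core.xml."""
    if not zipfile.is_zipfile(source):
        raise ValueError("not a ZIP-based Office document")
    with zipfile.ZipFile(source) as zfile:
        root = etree.fromstring(zfile.read("docProps/core.xml"))
    props = {}
    for element in root:
        # Tags look like '{namespace}creator'; keep only the local name.
        props[element.tag.rsplit("}", 1)[-1]] = element.text
    return props


if __name__ == "__main__":
    for key, value in read_core_properties(build_sample_document()).items():
        print("{}: {}".format(key, value))
```

Passing a path to a real .docx, .xlsx, or .pptx file to read_core_properties() should work the same way, since zipfile accepts both paths and file-like objects.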

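Python Digital Network Forensics-I

This chapter explains the fundamentals involved in performing network forensics using Python.

Understanding Network Forensics

Network forensics is a branch of digital forensics that deals with the monitoring and analysis of computer network traffic, both local and WAN (wide area network), for the purposes of information gathering, evidence collection, or intrusion detection. Network forensics plays a critical role in investigating digital crimes such as theft of intellectual property or leakage of information. A picture of network communications helps an investigator answer some crucial questions, such as the following:

Which websites have been accessed?
What kind of content has been uploaded on our network?
What kind of content has been downloaded from our network?
Which servers are being accessed?
Is somebody sending sensitive information outside of company firewalls?

Internet Evidence Finder (IEF)

IEF is a digital forensic tool for finding, analyzing, and presenting digital evidence found on different digital media such as computers, smartphones, and tablets. It is very popular and is used by thousands of forensics professionals.

Use of IEF

Owing to its popularity, IEF is used extensively by forensics professionals. Some of its uses are as follows:

Because of its powerful search capabilities, it is used to search multiple files or data media simultaneously.
It is also used to recover deleted data from the unallocated space of RAM through new carving techniques.
If investigators want to rebuild web pages in their original format as of the date they were opened, they can use IEF.
It is also used to search logical or physical disk volumes.

Dumping Reports from IEF to CSV using Python

IEF stores data in a SQLite database, and the following Python script dynamically identifies the result tables within the IEF database and dumps each of them to its own CSV file.

This process is done in the steps shown below:

First, generate the IEF result database, which is a SQLite database file with the .db extension.
Then, query that database to identify all the tables.
Lastly, write each result table to an individual CSV file.

Python Code

Let us see how to use Python code for this purpose:

For the Python script, import the necessary libraries as follows: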

from __future__ import print_function

import argparse
import csv
import os
import sqlite3
import sys

Now, we need to provide the path to the IEF database file and the output directory:

if __name__ == '__main__':
   parser = argparse.ArgumentParser('IEF to CSV')
   parser.add_argument("IEF_DATABASE", help="Input IEF database")
   parser.add_argument("OUTPUT_DIR", help="Output DIR")
   args = parser.parse_args()

Now, we create the output directory if it is missing and confirm that the IEF database exists, as follows:

if not os.path.exists(args.OUTPUT_DIR):
   os.makedirs(args.OUTPUT_DIR)
if os.path.exists(args.IEF_DATABASE) and \
      os.path.isfile(args.IEF_DATABASE):
   main(args.IEF_DATABASE, args.OUTPUT_DIR)
else:
   print("[-] Supplied input file {} does not exist or is not a "
         "file".format(args.IEF_DATABASE))
   sys.exit(1)

Now, as we did in earlier scripts, connect to the SQLite database as follows so that queries can be executed through a cursor:

def main(database, out_directory):
   print("[+] Connecting to SQLite database")
   conn = sqlite3.connect(database)
   c = conn.cursor()

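The following lines of code fetch the names of the tables from the database, skipping names that start with an underscore or end with _DATA: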

print("List of all tables to extract")
c.execute("select * from sqlite_master where type = 'table'")
tables = [x[2] for x in c.fetchall() if not x[2].startswith('_') and not x[2].endswith('_DATA')]

Now, we select all the data from each table and, by using the **fetchall()** method on the cursor object, store the list of tuples containing the table's data in its entirety in a variable:

print("Dumping {} tables to CSV files in {}".format(len(tables), out_directory))

for table in tables:
   c.execute("pragma table_info('{}')".format(table))
   table_columns = [x[1] for x in c.fetchall()]
   
   c.execute("select * from '{}'".format(table))
   table_data = c.fetchall()

Now, still inside the loop, we use the **csv.writer()** function to write the content to a CSV file:

   csv_name = table + '.csv'
   csv_path = os.path.join(out_directory, csv_name)
   print('[+] Writing {} table to {} CSV file'.format(table, csv_name))
   
   with open(csv_path, "w", newline = "") as csvfile:
      csv_writer = csv.writer(csvfile)
      csv_writer.writerow(table_columns)
      csv_writer.writerows(table_data)

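The above script fetches all the data from the tables of the IEF database and writes the contents of each table to a CSV file in the directory of our choice.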
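The table-to-CSV procedure can be tried on any SQLite database. The sketch below is an illustration under stated assumptions, not the IEF script itself: the function names dump_tables() and demo(), the table names, and the sample row are all hypothetical, and an in-memory database stands in for a real IEF result file. It reads column names from cursor.description rather than from a pragma query.

```python
import csv
import os
import sqlite3
import tempfile


def dump_tables(conn, out_directory):
    """Write every selected table of the SQLite connection to <table>.csv."""
    cursor = conn.cursor()
    cursor.execute("select name from sqlite_master where type = 'table'")
    tables = [row[0] for row in cursor.fetchall()
              if not row[0].startswith("_") and not row[0].endswith("_DATA")]
    written = []
    for table in tables:
        # Quote the identifier so that table names with spaces still work.
        quoted = '"{}"'.format(table.replace('"', '""'))
        cursor.execute("select * from {}".format(quoted))
        columns = [col[0] for col in cursor.description]
        csv_path = os.path.join(out_directory, table + ".csv")
        with open(csv_path, "w", newline="") as csvfile:
            writer = csv.writer(csvfile)
            writer.writerow(columns)
            writer.writerows(cursor.fetchall())
        written.append(csv_path)
    return written


def demo():
    """Build a tiny stand-in database, dump it, and return the CSV rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute('create table "Chrome Cache Records" (URL text, Hits integer)')
    conn.execute('insert into "Chrome Cache Records" values (?, ?)',
                 ("https://example.com", 2))
    conn.execute("create table _internal (x integer)")  # skipped by the filter
    paths = dump_tables(conn, tempfile.mkdtemp())
    conn.close()
    with open(paths[0], newline="") as handle:
        return list(csv.reader(handle))


if __name__ == "__main__":
    print(demo())
```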

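Working with Cached Data

From the IEF result database, we can fetch more information than IEF itself necessarily supports. By using the IEF result database we can fetch cached data, a by-product of email service providers such as Yahoo and Google.

The following Python script uses the IEF database to access the cached data of Yahoo Mail as accessed in Google Chrome. Note that the steps are more or less the same as in the previous Python script.

First, import the necessary Python libraries as follows: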

from __future__ import print_function
import argparse
import csv
import os
import sqlite3
import sys
import json

Now, provide the path to the IEF database file and the path of the output CSV file, the two positional arguments accepted by the command-line handler, much as in the previous script:

if __name__ == '__main__':
   parser = argparse.ArgumentParser('IEF to CSV')
   parser.add_argument("IEF_DATABASE", help="Input IEF database")
   parser.add_argument("OUTPUT_CSV", help="Output CSV file")
   args = parser.parse_args()

Now, confirm that the IEF database exists, as follows:

directory = os.path.dirname(args.OUTPUT_CSV)

# An empty directory name means the CSV goes into the current directory
if directory and not os.path.exists(directory):
   os.makedirs(directory)
if os.path.exists(args.IEF_DATABASE) and \
      os.path.isfile(args.IEF_DATABASE):
   main(args.IEF_DATABASE, args.OUTPUT_CSV)
else:
   print("Supplied input file {} does not exist or is not a "
         "file".format(args.IEF_DATABASE))
   sys.exit(1)

Now, connect to the SQLite database as follows so that queries can be executed through a cursor:

def main(database, out_csv):
   print("[+] Connecting to SQLite database")
   conn = sqlite3.connect(database)
   c = conn.cursor()

You can use the following lines of code to fetch the instances of the Yahoo Mail contact cache record:

print("Querying IEF database for Yahoo Contact Fragments from "
      "the Chrome Cache Records Table")
try:
   c.execute(
      "select * from 'Chrome Cache Records' where URL like "
      "'https://data.mail.yahoo.com"
      "/classicab/v2/contacts/?format=json%'")
except sqlite3.OperationalError:
   print("Received an error querying the database -- database may be "
         "corrupt or not have a Chrome Cache Records table")
   sys.exit(2)

Now, store the list of tuples returned by the above query in a variable and pass it on for processing, as follows:

contact_cache = c.fetchall()
contact_data = process_contacts(contact_cache)
write_csv(contact_data, out_csv)

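Note that here we use two methods: **process_contacts()**, which sets up the result list and iterates through each contact cache record, and **json.loads()**, which stores the JSON data extracted from the table in a variable for further manipulation: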

def process_contacts(contact_cache):
   print("[+] Processing {} cache files matching Yahoo contact cache " " data".format(len(contact_cache)))
   results = []
   
   for contact in contact_cache:
      url = contact[0]
      first_visit = contact[1]
      last_visit = contact[2]
      last_sync = contact[3]
      loc = contact[8]
      contact_json = json.loads(contact[7].decode())
      total_contacts = contact_json["total"]
      total_count = contact_json["count"]
      
      if "contacts" not in contact_json:
         continue
      for c in contact_json["contacts"]:
         name, anni, bday, emails, phones, links = ("", "", "", "", "", "")
         if "name" in c:
            name = c["name"]["givenName"] + " " + \
               c["name"]["middleName"] + " " + c["name"]["familyName"]
         
         if "anniversary" in c:
            anni = c["anniversary"]["month"] + "/" + \
               c["anniversary"]["day"] + "/" + c["anniversary"]["year"]
         
         if "birthday" in c:
            bday = c["birthday"]["month"] + "/" + \
               c["birthday"]["day"] + "/" + c["birthday"]["year"]
         
         if "emails" in c:
            emails = ', '.join([x["ep"] for x in c["emails"]])
         
         if "phones" in c:
            phones = ', '.join([x["ep"] for x in c["phones"]])
         
         if "links" in c:
            links = ', '.join([x["ep"] for x in c["links"]])

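Now, for the company, title, and notes, still inside the inner loop, use the dictionary get() method, which returns a default value when the key is absent, as shown below: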

company = c.get("company", "")
title = c.get("jobTitle", "")
notes = c.get("notes", "")

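Now, we append the list of metadata and extracted data elements to the result list; once both loops have finished, the function returns the results, as follows:

results.append([
   url, first_visit, last_visit, last_sync, loc, name, bday, anni,
   emails, phones, links, company, title, notes, total_contacts,
   total_count])
return results

Now, by using the **csv.writer()** function, we write the content to a CSV file: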

def write_csv(data, output):
   print("[+] Writing {} contacts to {}".format(len(data), output))
   with open(output, "w", newline="") as csvfile:
      csv_writer = csv.writer(csvfile)
      csv_writer.writerow([
         "URL", "First Visit (UTC)", "Last Visit (UTC)",
         "Last Sync (UTC)", "Location", "Contact Name", "Bday",
         "Anniversary", "Emails", "Phones", "Links", "Company", "Title",
         "Notes", "Total Contacts", "Count of Contacts in Cache"])
      csv_writer.writerows(data)

With the help of the above script, we can process the cached data of Yahoo Mail by using the IEF database.
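The name and date handling above indexes keys such as middleName directly, so a contact record that lacks one of them would raise a KeyError. A defensive variant can be sketched as follows; the helper names contact_name() and contact_date() and the sample JSON are hypothetical, and the record layout is only assumed from the field names used in the script.

```python
import json


def contact_name(contact):
    """Join whichever name parts are present, skipping missing ones."""
    name = contact.get("name", {})
    parts = [name.get(key, "") for key in ("givenName", "middleName", "familyName")]
    return " ".join(part for part in parts if part)


def contact_date(contact, field):
    """Format a month/day/year record, or return '' if any part is missing."""
    record = contact.get(field)
    if not record:
        return ""
    try:
        return "{}/{}/{}".format(record["month"], record["day"], record["year"])
    except KeyError:
        return ""


# Hypothetical cache fragment, written by hand for this illustration.
sample = json.loads(
    '{"contacts": [{"name": {"givenName": "Ada", "familyName": "Lovelace"},'
    ' "birthday": {"month": "12", "day": "10", "year": "1815"}}]}'
)
first = sample["contacts"][0]

if __name__ == "__main__":
    print(contact_name(first), contact_date(first, "birthday"))
```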

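Python Digital Network Forensics-II

The previous chapter dealt with some of the concepts of network forensics using Python. In this chapter, let us look at network forensics using Python in more depth.

Web Page Preservation with Beautiful Soup

The World Wide Web (WWW) is a unique repository of information. However, its legacy is at high risk because content is being lost at an alarming rate. A number of cultural heritage and academic institutions, non-profit organizations, and private businesses have explored the issues involved and contributed to the development of technical solutions for web archiving.

Web page preservation, or web archiving, is the process of gathering data from the World Wide Web, ensuring that the data is preserved in an archive, and making it available to future researchers, historians, and the public. Before proceeding further, let us discuss some important issues related to web page preservation:

Change in web resources - Web resources keep changing every day, which is a challenge for web page preservation.
Large quantity of resources - Another issue is the large quantity of resources that must be preserved.
Integrity - Web pages must be protected from unauthorized amendment, deletion, or removal to protect their integrity.
Dealing with multimedia data - While preserving web pages we also need to deal with multimedia data, which can cause issues.
Providing access - Besides preservation, the issues of providing access to web resources and dealing with ownership also need to be solved.

In this chapter, we are going to use the Python library named Beautiful Soup for web page preservation.

What is Beautiful Soup?

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It cannot fetch a web page by itself, so it needs an input (a document or the content of a URL) to create a soup object; for this reason it is commonly used together with urllib. You can learn more about it at www.crummy.com/software/BeautifulSoup/bs4/doc/

Note that before using it, we must install this third-party library using the following command:

pip install beautifulsoup4

Alternatively, using the Anaconda package manager, we can install Beautiful Soup as follows:

conda install -c anaconda beautifulsoup4

Python Script for Preserving Web Pages

The Python script for preserving web pages using the third-party library Beautiful Soup is discussed here:

First, import the required libraries as follows: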

from __future__ import print_function
import argparse

from bs4 import BeautifulSoup, SoupStrainer
from datetime import datetime

import hashlib
import logging
import os
import ssl
import sys
from urllib.request import urlopen

import urllib.error
logger = logging.getLogger(__name__)

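Note that this script takes two positional arguments, the URL to be preserved and the desired output directory, plus an optional log file path, as shown below: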

if __name__ == "__main__":
   parser = argparse.ArgumentParser('Web Page preservation')
   parser.add_argument("DOMAIN", help="Website Domain")
   parser.add_argument("OUTPUT_DIR", help="Preservation Output Directory")
   parser.add_argument("-l", help="Log file path",
   default=__file__[:-3] + ".log")
   args = parser.parse_args()

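Now, set up logging for the script by specifying a file handler and a stream handler, so that the acquisition process is documented both on the console and in the log file, as shown:

logger.setLevel(logging.DEBUG)
msg_fmt = logging.Formatter("%(asctime)-15s %(funcName)-10s "
   "%(levelname)-8s %(message)s")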
strhndl = logging.StreamHandler(sys.stderr)
strhndl.setFormatter(fmt=msg_fmt)
fhndl = logging.FileHandler(args.l, mode='a')
fhndl.setFormatter(fmt=msg_fmt)

logger.addHandler(strhndl)
logger.addHandler(fhndl)
logger.info("Starting BS Preservation")
logger.debug("Supplied arguments: {}".format(sys.argv[1:]))
logger.debug("System " + sys.platform)
logger.debug("Version " + sys.version)

Now, let us perform input validation on the desired output directory as follows:

if not os.path.exists(args.OUTPUT_DIR):
   os.makedirs(args.OUTPUT_DIR)
main(args.DOMAIN, args.OUTPUT_DIR)

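Now, we define the main() function, which strips the unnecessary elements before the actual name to extract the base name of the website, and performs additional validation on the input URL, as follows: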

def main(website, output_dir):
   base_name = website.replace("https://", "").replace("http://", "").replace("www.", "")
   link_queue = set()
   
   if "http://" not in website and "https://" not in website:
      logger.error("Exiting preservation - invalid user input: {}".format(website))
      sys.exit(1)
   logger.info("Accessing {} webpage".format(website))
   context = ssl._create_unverified_context()

Now, we need to open a connection to the URL by using the urlopen() method. Let us use a try-except block as follows:

try:
   index = urlopen(website, context=context).read().decode("utf-8")
except urllib.error.HTTPError as e:
   logger.error("Exiting preservation - unable to access page: {}".format(website))
   sys.exit(2)
logger.debug("Successfully accessed {}".format(website))

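The next lines of code call three functions, explained below:

write_output() writes the first web page to the output directory.
find_links() identifies the links on this web page.
recurse_pages() iterates through and discovers all the links on the web page.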

write_output(website, index, output_dir)
link_queue = find_links(base_name, index, link_queue)
logger.info("Found {} initial links on webpage".format(len(link_queue)))
recurse_pages(website, link_queue, context, output_dir)
logger.info("Completed preservation of {}".format(website))

Now, let us define the write_output() method as follows:

def write_output(name, data, output_dir, counter=0):
   name = name.replace("http://", "").replace("https://", "").rstrip("//")
   directory = os.path.join(output_dir, os.path.dirname(name))
   
   if not os.path.exists(directory) and os.path.dirname(name) != "":
      os.makedirs(directory)

We need to log some details about the web page, and then we log the hash of the data by using the hash_data() method as follows:

logger.debug("Writing {} to {}".format(name, output_dir))
logger.debug("Data Hash: {}".format(hash_data(data)))
path = os.path.join(output_dir, name)
path = path + "_" + str(counter)
with open(path, "w") as outfile:
   outfile.write(data)
logger.debug("Output File Hash: {}".format(hash_file(path)))

Now, define the hash_data() method, which encodes the data as UTF-8 and then generates its SHA-256 hash, along with hash_file(), which hashes the contents of the file that was written, as follows:

def hash_data(data):
   sha256 = hashlib.sha256()
   sha256.update(data.encode("utf-8"))
   return sha256.hexdigest()

def hash_file(file):
   sha256 = hashlib.sha256()
   with open(file, "rb") as in_file:
      sha256.update(in_file.read())
   return sha256.hexdigest()
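hash_file() reads the whole file into memory, which is fine for single web pages. For larger evidence files, a chunked variant is a common alternative. The sketch below assumes a hypothetical helper name, hash_file_chunked(), and a 64 KiB chunk size chosen only for illustration.

```python
import hashlib
import os
import tempfile


def hash_file_chunked(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as in_file:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: in_file.read(chunk_size), b""):
            sha256.update(chunk)
    return sha256.hexdigest()


def demo():
    """Hash the same bytes in memory and from disk; return both digests."""
    data = "<html>example</html>".encode("utf-8")
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    try:
        return hash_file_chunked(path, chunk_size=4), hashlib.sha256(data).hexdigest()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Note that the two digests agree here because the file is written in binary mode. When a page is written in text mode, as write_output() does, newline translation or the platform's default encoding may change the bytes on disk, in which case the data hash and the file hash can legitimately differ.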

Now, let us create a BeautifulSoup object from the web page data inside the find_links() method as follows:

def find_links(website, page, queue):
   for link in BeautifulSoup(page, "html.parser",parse_only = SoupStrainer("a", href = True)):
      if website in link.get("href"):
         if not os.path.basename(link.get("href")).startswith("#"):
            queue.add(link.get("href"))
   return queue
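To experiment with the same filtering logic without installing Beautiful Soup, the standard library's html.parser can be swapped in. This is a different technique from the one the script uses, shown only as a sketch; the names LinkCollector, find_links_stdlib(), and SAMPLE_PAGE are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that stay within one website."""

    def __init__(self, base_name):
        super().__init__()
        self.base_name = base_name
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href or self.base_name not in href:
            return
        # Skip in-page anchors such as https://example.com/#top
        if href.rsplit("/", 1)[-1].startswith("#"):
            return
        self.links.add(href)


def find_links_stdlib(base_name, page):
    collector = LinkCollector(base_name)
    collector.feed(page)
    return collector.links


# Hand-written page used only for this illustration.
SAMPLE_PAGE = (
    '<a href="https://example.com/about">About</a>'
    '<a href="https://other.org/x">Elsewhere</a>'
    '<a href="https://example.com/#top">Top</a>'
    '<a>No target</a>'
)

if __name__ == "__main__":
    print(find_links_stdlib("example.com", SAMPLE_PAGE))
```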

Now, we need to define the recurse_pages() method, giving it the website URL, the current link queue, the unverified SSL context, and the output directory as inputs, as follows:

def recurse_pages(website, queue, context, output_dir):
   processed = []
   counter = 0
   
   while True:
      counter += 1
      if len(processed) == len(queue):
         break
      for link in queue.copy():
         if link in processed:
            continue
         processed.append(link)
         try:
            page = urlopen(link, context=context).read().decode("utf-8")
         except urllib.error.HTTPError as e:
            msg = "Error accessing webpage: {}".format(link)
            logger.error(msg)
            continue

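Now, still inside the for loop, write the output of each web page accessed to a file by passing the link name, page data, output directory, and counter; once the while loop ends, log the total number of links, as follows:

         write_output(link, page, output_dir, counter)
         queue = find_links(website, page, queue)
   logger.info("Identified {} links throughout website".format(
      len(queue)))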

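Now, when we run this script by providing the URL of the website, the output directory, and a path to the log file, we get copies of the site's pages, along with their logged hashes, that can be kept for future use.

Virus Hunting

Have you ever wondered how forensic analysts, security researchers, and incident responders can tell useful software from malware? The answer lies in the question itself: without studying the malware that attackers generate so rapidly, it is nearly impossible for researchers and specialists to tell the difference between useful software and malware. In this section, let us discuss VirusShare, a tool to accomplish this task.

Understanding VirusShare

VirusShare is the largest privately owned collection of malware samples, providing security researchers, incident responders, and forensic analysts with samples of live malicious code. It contains over 30 million samples.

The benefit of VirusShare is its freely available list of malware hashes. Anybody can use these hashes to create a very comprehensive hash set and use that to identify potentially malicious files. But before using VirusShare, we suggest you visit https://virusshare.com for more details.

Creating a Newline-Delimited Hash List from VirusShare using Python

A hash list from VirusShare can be used by various forensic tools such as X-Ways and EnCase. In the script discussed below, we automate downloading the hash lists from VirusShare to create a newline-delimited hash list.

For this script, we need a third-party Python library, tqdm, which can be installed as follows:

pip install tqdm

Note that in this script, we first read the VirusShare hashes page and dynamically identify the most recent hash list. Then we initialize the progress bar and download the hash lists in the desired range.

First, import the following libraries: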

from __future__ import print_function

import argparse
import os
import ssl
import sys
import tqdm

from urllib.request import urlopen
import urllib.error

This script takes one positional argument, the desired path for the hash set, plus an optional --start argument:

if __name__ == '__main__':
   parser = argparse.ArgumentParser('Hash set from VirusShare')
   parser.add_argument("OUTPUT_HASH", help = "Output Hashset")
   parser.add_argument("--start", type = int, help = "Optional starting location")
   args = parser.parse_args()

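Now, we perform the standard input validation as follows:

directory = os.path.dirname(args.OUTPUT_HASH)
# An empty directory name means the hash set goes into the current directory
if directory and not os.path.exists(directory):
   os.makedirs(directory)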
if args.start:
   main(args.OUTPUT_HASH, start=args.start)
else:
   main(args.OUTPUT_HASH)

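Now, we need to define the main() function with **kwargs as a parameter, because this creates a dictionary that we can consult to support the supplied keyword arguments, as shown below: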

def main(hashset, **kwargs):
   url = "https://virusshare.com/hashes.4n6"
   print("[+] Identifying hash set range from {}".format(url))
   context = ssl._create_unverified_context()

Now, we need to open the VirusShare hashes page by using the urllib.request.urlopen() method. We use a try-except block as follows:

try:
   index = urlopen(url, context = context).read().decode("utf-8")
except urllib.error.HTTPError as e:
   print("[-] Error accessing webpage - exiting..")
   sys.exit(1)

Now, identify the latest hash list from the downloaded page. You can do this by finding the last instance of the HTML href attribute that points to a VirusShare hash list. It can be done with the following lines of code:

# The search string is 27 characters long; the five characters after it
# are the zero-padded number of the hash list
tag = index.rfind(r'<a href="hashes/VirusShare_')
stop = int(index[tag + 27: tag + 27 + 5].lstrip("0"))

if "start" not in kwargs:
   start = 0
else:
   start = kwargs["start"]

if start < 0 or start > stop:
   print("[-] Supplied start argument must be greater than or equal "
         "to zero but less than the latest hash list, "
         "currently: {}".format(stop))
   sys.exit(2)
print("[+] Creating a hashset from hash lists {} to {}".format(start, stop))
hashes_downloaded = 0

Now, we use the tqdm.trange() method to create the loop and the progress bar, as follows:

for x in tqdm.trange(start, stop + 1, unit_scale=True, desc="Progress"):
   url_hash = "https://virusshare.com/hashes/VirusShare_" \
      "{}.md5".format(str(x).zfill(5))
   try:
      hashes = urlopen(url_hash, context=context).read().decode("utf-8")
      hashes_list = hashes.split("\n")
   except urllib.error.HTTPError as e:
      print("[-] Error accessing webpage for hash list {}"
            " - continuing..".format(x))
      continue

After the above steps succeed, and still inside the loop, we open the hash set text file in a+ mode to append to the bottom of the text file. The final message is printed once the loop has finished.

   with open(hashset, "a+") as hashfile:
      for line in hashes_list:
         if not line.startswith("#") and line != "":
            hashes_downloaded += 1
            hashfile.write(line + '\n')

print("[+] Finished downloading {} hashes into {}".format(
   hashes_downloaded, hashset))

After running the above script, you will have the latest hash list, containing MD5 hash values in text format.
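Once such a newline-delimited list exists, checking a file against it is a set lookup. The following is a minimal sketch under stated assumptions: the helper names parse_hash_list() and is_known() are hypothetical, and the listing is generated locally rather than downloaded, so that no real malware hash is implied.

```python
import hashlib


def parse_hash_list(text):
    """Return the MD5 values from a hash list, skipping '#' comments and blanks."""
    return {line.strip().lower() for line in text.splitlines()
            if line.strip() and not line.startswith("#")}


def is_known(data, hashset):
    """Report whether the MD5 of the given bytes appears in the hash set."""
    return hashlib.md5(data).hexdigest() in hashset


# Stand-in listing: one comment line, one hash computed locally, one blank line.
known_digest = hashlib.md5(b"sample payload").hexdigest()
listing = "# illustrative header comment\n" + known_digest + "\n\n"
hashset = parse_hash_list(listing)

if __name__ == "__main__":
    print(is_known(b"sample payload", hashset), is_known(b"other data", hashset))
```

A set is used because membership tests stay fast even when the combined list holds tens of millions of entries.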

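Investigation Using Emails

The previous chapters discussed the importance and process of network forensics and the concepts involved. In this chapter, let us learn about the role of emails in digital forensics and their investigation using Python.

Role of Email in Investigation

Email plays a very important role in business communication and has emerged as one of the most important applications on the internet. It is a convenient way to send messages and documents, not only from computers but also from other electronic devices such as mobile phones and tablets.

The negative side of email is that criminals may leak important information about their company. Hence, the role of email in digital forensics has grown in recent years. In digital forensics, emails are considered crucial evidence, and email header analysis has become an important way to collect evidence during the forensic process.

An investigator has the following goals while performing email forensics:

To identify the main criminal
To collect necessary evidence
To present the findings
To build the case

Challenges in Email Forensics

Email forensics plays a very important role in investigations, as most present-day communication relies on email. However, an email forensic investigator may face the following challenges during the investigation:

Fake Emails

The biggest challenge in email forensics is the use of fake emails, which are created by manipulating and scripting headers and so on. In this category, criminals also use temporary email, a service that allows a registered user to receive email at a temporary address that expires after a certain time period.

Spoofing

Another challenge in email forensics is spoofing, in which criminals present an email as someone else's. In this case the machine will receive both the fake and the original IP address.

Anonymous Re-emailing

Here, the email server strips identifying information from the email message before forwarding it further. This poses another big challenge for email investigations.

Techniques Used in Email Forensic Investigation

Email forensics is the study of the source and content of email as evidence to identify the actual sender and recipient of a message, along with other information such as the date and time of transmission and the intention of the sender. It involves investigating metadata, port scanning, and keyword searching.

Some of the common techniques that can be used for email forensic investigation are:

Header analysis
Server investigation
Network device investigation
Sender mailer fingerprints
Software embedded identifiers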
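As a small illustration of header analysis, the first technique in the list, the sketch below rebuilds the relay path of a message from its Received headers using only the standard library. The function name received_hops() and the sample message, with its documentation-range IP addresses, are hypothetical; real headers vary widely and can be forged, so this is a starting point rather than a verdict.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hand-written message used only for this illustration.
SAMPLE = (
    "Received: from mx.example.org (mx.example.org [203.0.113.5])\n"
    "\tby mail.example.com with ESMTP; Tue, 02 Jan 2024 10:00:05 +0000\n"
    "Received: from client.example.net (client.example.net [198.51.100.7])\n"
    "\tby mx.example.org with ESMTP; Tue, 02 Jan 2024 10:00:01 +0000\n"
    "From: sender@example.net\n"
    "To: recipient@example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def received_hops(raw_message):
    """Return (route, timestamp) tuples ordered from first relay to last."""
    msg = message_from_string(raw_message)
    hops = []
    for header in msg.get_all("Received", []):
        # The timestamp follows the last ';' in a Received header.
        route, _, stamp = header.rpartition(";")
        hops.append((" ".join(route.split()), parsedate_to_datetime(stamp.strip())))
    # Each relay prepends its own header, so reverse to get sending order.
    hops.reverse()
    return hops


if __name__ == "__main__":
    for route, when in received_hops(SAMPLE):
        print(when.isoformat(), route)
```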

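In the following sections, we are going to learn how to fetch information using Python for the purpose of email investigation.

Extracting Information from EML Files

EML files are emails stored in a file format that is widely used for saving email messages. They are structured text files that are compatible across multiple email clients such as Microsoft Outlook, Outlook Express, and Windows Live Mail.

An EML file stores email headers, body content, and attachment data as plain text. It uses base64 to encode binary data and Quoted-Printable (QP) encoding to store content information. The Python script that can be used to extract information from an EML file is given below:

First, import the following Python libraries as shown below: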

from __future__ import print_function
from argparse import ArgumentParser, FileType
from email import message_from_file

import os
import quopri
import base64

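Among the above libraries, quopri is used to decode the QP-encoded values from EML files. Any base64-encoded data can be decoded with the help of the base64 library.

Next, let us provide an argument for the command-line handler. Note that here it accepts only one argument, the path to the EML file, as shown below: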

if __name__ == '__main__':
   parser = ArgumentParser('Extracting information from EML file')
   parser.add_argument("EML_FILE",help="Path to EML File", type=FileType('r'))
   args = parser.parse_args()
   main(args.EML_FILE)

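Now, we need to define the main() function, in which we use the message_from_file() method from the email library to read the file-like object. Through the resulting variable, named emlfile, we access the headers, body content, attachments, and other payload information, as shown in the code given below:

def main(input_file):
   emlfile = message_from_file(input_file)
   for key, value in emlfile._headers:
      print("{}: {}".format(key, value))
   print("\nBody\n")
   
   if emlfile.is_multipart():
      for part in emlfile.get_payload():
         process_payload(part)
   else:
      # A non-multipart message is its own single payload
      process_payload(emlfile)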

Now, we need to define the process_payload() method, in which we extract the message body content by using the get_payload() method. We decode QP-encoded data by using the quopri.decodestring() function. We also check the MIME type of the content so that each kind of content is stored properly. Observe the code given below:

def process_payload(payload):
   print(payload.get_content_type() + "\n" + "=" * len(payload.get_content_type()))
   body = quopri.decodestring(payload.get_payload())
   
   if payload.get_charset():
      body = body.decode(payload.get_charset())
   else:
      try:
         body = body.decode()
      except UnicodeDecodeError:
         body = body.decode('cp1252')
   
   if payload.get_content_type() == "text/html":
      # Save HTML bodies next to the script, named after the EML file
      outfile = os.path.basename(args.EML_FILE.name) + ".html"
      open(outfile, 'w').write(body)
   elif payload.get_content_type().startswith('application'):
      # Attachments are base64-encoded; decode and export them
      outfile = open(payload.get_filename(), 'wb')
      body = base64.b64decode(payload.get_payload())
      outfile.write(body)
      outfile.close()
      print("Exported: {}\n".format(outfile.name))
   else:
      print(body)

After running the above script, we get the header information along with the various payloads on the console.
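The email library can also undo the transfer encoding itself: get_payload(decode=True) returns the decoded bytes for both quoted-printable and base64 parts. The sketch below shows this as an alternative to calling quopri and base64 by hand; the function name decoded_parts() and the sample message are hypothetical.

```python
from email import message_from_string

# Hand-written message used only for this illustration.
SAMPLE_EML = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: Quarterly report\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Caf=C3=A9 meeting at 10=3A00\n"
)


def decoded_parts(message):
    """Return (content_type, text) for every non-multipart part of a message."""
    results = []
    for part in message.walk():
        if part.is_multipart():
            continue
        raw = part.get_payload(decode=True)  # bytes with QP/base64 undone
        charset = part.get_content_charset() or "utf-8"
        results.append((part.get_content_type(),
                        raw.decode(charset, errors="replace")))
    return results


if __name__ == "__main__":
    for content_type, text in decoded_parts(message_from_string(SAMPLE_EML)):
        print(content_type, "->", text.strip())
```

Using walk() also covers nested multipart messages, which a single loop over get_payload() does not descend into.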

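Analyzing MSG Files using Python

Email messages come in many different formats. MSG is one such format, used by Microsoft Outlook and Exchange. Files with the MSG extension may contain plain ASCII text for the headers and the main message body, as well as hyperlinks and attachments.

In this section, we will learn how to extract information from an MSG file using the Outlook API. Note that the following Python script works only on Windows. For this, we need to install a third-party Python library named pywin32 as follows:

pip install pywin32

Now, import the following libraries using the commands shown: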

from __future__ import print_function
from argparse import ArgumentParser

import os
import win32com.client
import pywintypes

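Now, let us provide arguments for the command-line handler. Here it accepts two arguments: one is the path to the MSG file and the other is the desired output folder, as follows:

if __name__ == '__main__':
   parser = ArgumentParser('Extracting information from MSG file')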
   parser.add_argument("MSG_FILE", help="Path to MSG file")
   parser.add_argument("OUTPUT_DIR", help="Path to output folder")
   args = parser.parse_args()
   out_dir = args.OUTPUT_DIR
   
   if not os.path.exists(out_dir):
      os.makedirs(out_dir)
   main(args.MSG_FILE, args.OUTPUT_DIR)

Now, we need to define the main() function, in which we call the win32com library to set up the Outlook API, which in turn allows access to the MAPI namespace.

def main(msg_file, output_dir):
   mapi = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
   msg = mapi.OpenSharedItem(os.path.abspath(msg_file))
   
   display_msg_attribs(msg)
   display_msg_recipients(msg)
   
   extract_msg_body(msg, output_dir)
   extract_attachments(msg, output_dir)

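Now, define the different functions that we use in this script. The code given below defines the display_msg_attribs() function, which lets us display various attributes of a message such as subject, to, BCC, CC, size, sender name, and sent time.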

def display_msg_attribs(msg):
   attribs = [
      'Application', 'AutoForwarded', 'BCC', 'CC', 'Class',
      'ConversationID', 'ConversationTopic', 'CreationTime',
      'ExpiryTime', 'Importance', 'InternetCodePage', 'IsMarkedAsTask',
      'LastModificationTime', 'Links','ReceivedTime', 'ReminderSet',
      'ReminderTime', 'ReplyRecipientNames', 'Saved', 'Sender',
      'SenderEmailAddress', 'SenderEmailType', 'SenderName', 'Sent',
      'SentOn', 'SentOnBehalfOfName', 'Size', 'Subject',
      'TaskCompletedDate', 'TaskDueDate', 'To', 'UnRead'
   ]
   print("\nMessage Attributes")
   for entry in attribs:
      print("{}: {}".format(entry, getattr(msg, entry, 'N/A')))

Now, define the display_msg_recipients() function, which iterates through the message and displays the recipient details.

def display_msg_recipients(msg):
   recipient_attrib = ['Address', 'AutoResponse', 'Name', 'Resolved', 'Sendable']
   i = 1  # Recipients are indexed from 1
   
   while True:
      try:
         recipient = msg.Recipients(i)
      except pywintypes.com_error:
         break
      print("\nRecipient {}".format(i))
      print("=" * 15)
      
      for entry in recipient_attrib:
         print("{}: {}".format(entry, getattr(recipient, entry, 'N/A')))
      i += 1

Next, we define the extract_msg_body() function, which extracts the body content, both HTML and plain text, from the message.

def extract_msg_body(msg, out_dir):
   html_data = msg.HTMLBody.encode('cp1252')
   outfile = os.path.join(out_dir, os.path.basename(args.MSG_FILE))
   
   open(outfile + ".body.html", 'wb').write(html_data)
   print("Exported: {}".format(outfile + ".body.html"))
   body_data = msg.Body.encode('cp1252')
   
   open(outfile + ".body.txt", 'wb').write(body_data)
   print("Exported: {}".format(outfile + ".body.txt"))

Next, we define the extract_attachments() function, which exports attachment data into the desired output directory.

def extract_attachments(msg, out_dir):
   attachment_attribs = ['DisplayName', 'FileName', 'PathName', 'Position', 'Size']
   i = 1 # Attachments start at 1
   
   while True:
      try:
         attachment = msg.Attachments(i)
      except pywintypes.com_error:
         break

Still inside the loop, we print the attributes of each attachment to the console and save the attachment to the output directory, using the following lines of code:

      print("\nAttachment {}".format(i))
      print("=" * 15)
      
      for entry in attachment_attribs:
         print('{}: {}'.format(entry, getattr(attachment, entry, "N/A")))
      outfile = os.path.join(os.path.abspath(out_dir), os.path.split(args.MSG_FILE)[-1])
      
      if not os.path.exists(outfile):
         os.makedirs(outfile)
      outfile = os.path.join(outfile, attachment.FileName)
      attachment.SaveAsFile(outfile)
      
      print("Exported: {}".format(outfile))
      i += 1

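After running the above script, we get the attributes of the message and its attachments in the console window, along with several files in the output directory.

Structuring MBOX Files from Google Takeout using Python

MBOX files are text files with special formatting that separates the messages stored within. They are often found in association with UNIX systems, Thunderbird, and Google Takeout.

In this section, you will see a Python script with which we will structure MBOX files obtained from Google Takeout. But before that, we must know how to generate these MBOX files using our Google account or Gmail account.

Acquiring a Google Account Mailbox in MBOX Format

Acquiring a Google account mailbox means taking a backup of our Gmail account. A backup can be taken for various personal or professional reasons. Note that Google provides backup of Gmail data. To acquire our Google account mailbox in MBOX format, follow the steps given below:

Open the My Account dashboard.
Go to the Personal info & privacy section and select the Control your content link.
You can create a new archive or manage an existing one. If we click the Create archive link, we get check boxes for each Google product we wish to include.
After selecting the products, we can choose the file type and maximum size for our archive, along with the delivery method, from the list.
Finally, we receive this backup in MBOX format.

Python Code

Now, the MBOX file discussed above can be structured using Python as shown below:

First, import the Python libraries as follows: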

from __future__ import print_function
from argparse import ArgumentParser

import mailbox
import os
import time
import csv
from tqdm import tqdm

import base64

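All of these libraries, except mailbox, which is used to parse MBOX files, have been used and explained in earlier scripts.

Now, provide arguments for the command-line handler. Here it accepts two arguments: one is the path to the MBOX file, and the other is the desired output folder.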

if __name__ == '__main__':
   parser = ArgumentParser('Parsing MBOX files')
   parser.add_argument("MBOX", help="Path to mbox file")
   parser.add_argument(
      "OUTPUT_DIR",help = "Path to output directory to write report ""and exported content")
   args = parser.parse_args()
   main(args.MBOX, args.OUTPUT_DIR)

Now, define the main() function and call the mbox class of the mailbox library, with which we can parse an MBOX file by providing its path:

def main(mbox_file, output_dir):
   print("Reading mbox file")
   mbox = mailbox.mbox(mbox_file, factory=custom_reader)
   print("{} messages to parse".format(len(mbox)))

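Now, define a reader method for the mailbox library; it decodes each message as ASCII and falls back to cp1252 when that fails, as follows: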

def custom_reader(data_stream):
   data = data_stream.read()
   try:
      content = data.decode("ascii")
   except (UnicodeDecodeError, UnicodeEncodeError) as e:
      content = data.decode("cp1252", errors="replace")
   return mailbox.mboxMessage(content)

Now, back in main(), create some variables for further processing as follows:

parsed_data = []
attachments_dir = os.path.join(output_dir, "attachments")

if not os.path.exists(attachments_dir):
   os.makedirs(attachments_dir)
columns = [
   "Date", "From", "To", "Subject", "X-Gmail-Labels", "Return-Path", "Received", 
   "Content-Type", "Message-ID","X-GM-THRID", "num_attachments_exported", "export_path"]

Next, use tqdm to generate a progress bar and to track the iteration process as follows:

for message in tqdm(mbox):
   msg_data = dict()
   header_data = dict(message._headers)
   for hdr in columns:
      msg_data[hdr] = header_data.get(hdr, "N/A")

Now, still inside the loop, check whether the message has payloads. If it does, we call the write_payload() method, which is defined further below, as follows:

   if len(message.get_payload()):
      export_path = write_payload(message, attachments_dir)
      msg_data['num_attachments_exported'] = len(export_path)
      msg_data['export_path'] = ", ".join(export_path)

Now, the data needs to be appended at the end of each iteration. Once the loop has finished, we call the create_report() method. The write_payload() method is then defined as follows:

   parsed_data.append(msg_data)

create_report(
   parsed_data, os.path.join(output_dir, "mbox_report.csv"), columns)

def write_payload(msg, out_dir):
   pyld = msg.get_payload()
   export_path = []
   
   if msg.is_multipart():
      # Recurse into every part of a multipart message
      for entry in pyld:
         export_path += write_payload(entry, out_dir)
   else:
      content_type = msg.get_content_type()
      if "application/" in content_type.lower():
         content = base64.b64decode(msg.get_payload())
         export_path.append(export_content(msg, out_dir, content))
      elif "image/" in content_type.lower():
         content = base64.b64decode(msg.get_payload())
         export_path.append(export_content(msg, out_dir, content))
      elif "video/" in content_type.lower():
         content = base64.b64decode(msg.get_payload())
         export_path.append(export_content(msg, out_dir, content))
      elif "audio/" in content_type.lower():
         content = base64.b64decode(msg.get_payload())
         export_path.append(export_content(msg, out_dir, content))
      elif "text/csv" in content_type.lower():
         content = base64.b64decode(msg.get_payload())
         export_path.append(export_content(msg, out_dir, content))
      elif "info/" in content_type.lower():
         export_path.append(export_content(msg, out_dir,
            msg.get_payload()))
      elif "text/calendar" in content_type.lower():
         export_path.append(export_content(msg, out_dir,
            msg.get_payload()))
      elif "text/rtf" in content_type.lower():
         export_path.append(export_content(msg, out_dir,
            msg.get_payload()))
      else:
         # Unknown type: export it only if a file name is declared
         if "name=" in msg.get('Content-Disposition', "N/A"):
            content = base64.b64decode(msg.get_payload())
            export_path.append(export_content(msg, out_dir, content))
         elif "name=" in msg.get('Content-Type', "N/A"):
            content = base64.b64decode(msg.get_payload())
            export_path.append(export_content(msg, out_dir, content))
   return export_path

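Note that the above if-else statements are easy to follow: each branch decodes the payload where necessary and exports it. Now, we need to define a method that builds the output file name from the msg object and a timestamp, as follows:

def export_content(msg, out_dir, content_data):
   file_name = get_filename(msg)
   file_ext = "FILE"
   
   if "." in file_name:
      file_ext = file_name.rsplit(".", 1)[-1]
   file_name = "{}_{:.4f}.{}".format(file_name.rsplit(".", 1)[0], time.time(), file_ext)
   file_name = os.path.join(out_dir, file_name)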

Now, with the help of the following lines of code, which complete export_content(), the file is actually exported:

if isinstance(content_data, str):
   open(file_name, 'w').write(content_data)
else:
   open(file_name, 'wb').write(content_data)
return file_name

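Now, let us define a function that extracts the file name from the message, so that the names of these files are represented accurately, as follows: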

def get_filename(msg):
   if 'name=' in msg.get("Content-Disposition", "N/A"):
      fname_data = msg["Content-Disposition"].replace("\r\n", " ")
      fname = [x for x in fname_data.split("; ") if 'name=' in x]
      file_name = fname[0].split("=", 1)[-1]
   elif 'name=' in msg.get("Content-Type", "N/A"):
      fname_data = msg["Content-Type"].replace("\r\n", " ")
      fname = [x for x in fname_data.split("; ") if 'name=' in x]
      file_name = fname[0].split("=", 1)[-1]
   else:
      file_name = "NO_FILENAME"
   fchars = [x for x in file_name if x.isalnum() or x.isspace() or x == "."]
   return "".join(fchars)
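The standard library offers a shortcut for the same lookup: Message.get_filename() reads the filename parameter of Content-Disposition and falls back to the name parameter of Content-Type. The sketch below combines it with the same character filter as above; the names attachment_names() and build_sample(), and the sample attachment, are hypothetical.

```python
from email.message import EmailMessage


def attachment_names(message):
    """Return sanitized file names of all parts that declare one."""
    names = []
    for part in message.walk():
        filename = part.get_filename()
        if filename:
            # Same filter as get_filename() above: letters, digits, spaces, dots.
            safe = "".join(ch for ch in filename
                           if ch.isalnum() or ch.isspace() or ch == ".")
            names.append(safe)
    return names


def build_sample():
    """Build a small message with one attachment, purely for illustration."""
    msg = EmailMessage()
    msg["Subject"] = "report"
    msg.set_content("see attachment")
    msg.add_attachment(b"a,b\n1,2\n", maintype="application",
                       subtype="octet-stream", filename="q1-report.csv")
    return msg


if __name__ == "__main__":
    print(attachment_names(build_sample()))
```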

Now, we can write the CSV file by defining the create_report() function as follows:

def create_report(output_data, output_file, columns):
   with open(output_file, 'w', newline="") as outfile:
      csvfile = csv.DictWriter(outfile, columns)
      csvfile.writeheader()
      csvfile.writerows(output_data)

After running the above script, we get the CSV report and a directory full of attachments.
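To try the header-extraction part without a Google Takeout export, a small MBOX file can be generated locally. The following sketch makes that assumption explicit: the names summarize_mbox() and demo(), the addresses, and the subjects are hypothetical, and the default mailbox reader is used instead of the custom one above.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def summarize_mbox(path, columns=("From", "To", "Subject")):
    """Return one dict of selected headers per message in the MBOX file."""
    rows = []
    box = mailbox.mbox(path)
    try:
        for message in box:
            rows.append({name: message.get(name, "N/A") for name in columns})
    finally:
        box.close()
    return rows


def demo():
    """Write a two-message MBOX file to a temporary directory and summarize it."""
    path = os.path.join(tempfile.mkdtemp(), "sample.mbox")
    box = mailbox.mbox(path)
    for subject in ("first", "second"):
        msg = EmailMessage()
        msg["From"] = "sender@example.com"
        msg["To"] = "recipient@example.com"
        msg["Subject"] = subject
        msg.set_content("body of " + subject)
        box.add(msg)
    box.flush()
    box.close()
    return summarize_mbox(path)


if __name__ == "__main__":
    for row in demo():
        print(row)
```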

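Important Artifacts in Windows-I

This chapter explains various concepts involved in Microsoft Windows forensics and the important artifacts that an investigator can obtain during the investigation process.

Introduction

Artifacts are the objects or areas within a computer system that hold important information related to the activities performed by the computer user. The type and location of this information depend on the operating system. During forensic analysis, these artifacts play a very important role in confirming or refuting the investigator's observations.

Importance of Windows Artifacts for Forensics

Windows artifacts are significant for the following reasons:

Around 90% of the traffic in the world comes from computers using Windows as their operating system. That is why examining Windows artifacts is essential for digital forensics examiners.
The Windows operating system stores different types of evidence related to user activity on the computer system. This is another reason that shows the importance of Windows artifacts for digital forensics.
Many times, investigators center their investigation on old and traditional areas such as user-created data. Windows artifacts can lead the investigation towards non-traditional areas such as system-created data or artifacts.
Windows provides a great abundance of artifacts, which is helpful for investigators as well as for companies and individuals performing informal investigations.
The increase in cyber-crime in recent years is another reason that Windows artifacts are important.

Windows Artifacts and their Python Scripts

In this section, we are going to discuss some Windows artifacts and the Python scripts used to fetch information from them.

Recycle Bin

It is one of the important Windows artifacts for forensic investigation. The Windows Recycle Bin contains files that have been deleted by the user but not yet physically removed by the system. Even if the user later removes a file from the system completely, the Recycle Bin remains an important source for investigation, because the examiner can extract valuable information, such as the original file path and the time the file was sent to the Recycle Bin, from the deleted files.

Note that the way Recycle Bin evidence is stored depends on the version of Windows. In the following Python script, we deal with Windows 7, which creates two files: a $R file that contains the actual content of the recycled file, and a $I file that contains the original file name, path, and size of the file when it was deleted.

For the Python script we need to install third-party modules, namely pytsk3, pyewf, and unicodecsv. We can use pip to install them. We can follow the steps below to extract information from the Recycle Bin:

First, we need to use a recursive method to scan through the $Recycle.bin folder and select all the files starting with $I.
Next, we read the contents of the files and parse the available metadata structures.
Now, we search for the associated $R file.
Finally, we write the results to a CSV file for review.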
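The second step, parsing the $I metadata, can be sketched on its own with the struct module. The layout assumed here is the one commonly documented for Windows Vista and 7: an 8-byte header, an 8-byte file size, an 8-byte FILETIME deletion stamp, and a 520-byte UTF-16LE path. That layout, the helper names parse_dollar_i() and build_sample(), and the sample values are assumptions for illustration and should be checked against the evidence at hand.

```python
import datetime
import struct

HEADER = b"\x01\x00\x00\x00\x00\x00\x00\x00"  # assumed Vista/7 $I signature


def parse_dollar_i(data):
    """Parse a Vista/7-style $I record held in a bytes object."""
    if len(data) < 24 or data[:8] != HEADER:
        return None
    file_size, filetime = struct.unpack("<qq", data[8:24])
    # FILETIME counts 100-nanosecond intervals since 1601-01-01 (UTC).
    deleted = datetime.datetime(1601, 1, 1) + \
        datetime.timedelta(microseconds=filetime // 10)
    # The original path is stored as a NUL-terminated UTF-16LE string.
    path = data[24:].decode("utf-16-le").split("\x00", 1)[0]
    return {"file_size": file_size, "deleted_time": deleted, "file_path": path}


def build_sample():
    """Build a synthetic $I record for a file 'deleted' on 2020-01-01."""
    delta = datetime.datetime(2020, 1, 1) - datetime.datetime(1601, 1, 1)
    filetime = (delta.days * 86400 + delta.seconds) * 10 ** 7
    path = "C:\\Users\\demo\\secret.txt".encode("utf-16-le").ljust(520, b"\x00")
    return struct.pack("<qqq", 1, 2048, filetime) + path


if __name__ == "__main__":
    print(parse_dollar_i(build_sample()))
```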

Let us see how to use Python code for this purpose:

First, we need to import the following Python libraries:

from __future__ import print_function
from argparse import ArgumentParser

import datetime
import os
import struct

from utility.pytskutil import TSKUtil
import unicodecsv as csv

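Next, we need to provide arguments for the command-line handler. Note that here it accepts three arguments: the first is the path to the evidence file, the second is the type of evidence file, and the third is the desired output path of the CSV report, as shown below:

if __name__ == '__main__':
   parser = ArgumentParser('Recycle Bin evidences')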
   parser.add_argument('EVIDENCE_FILE', help = "Path to evidence file")
   parser.add_argument('IMAGE_TYPE', help = "Evidence file format",
   choices = ('ewf', 'raw'))
   parser.add_argument('CSV_REPORT', help = "Path to CSV report")
   args = parser.parse_args()
   main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.CSV_REPORT)

Now, define the main() function, which handles all the processing. It searches for the $I files as follows:

def main(evidence, image_type, report_file):
   tsk_util = TSKUtil(evidence, image_type)
   dollar_i_files = tsk_util.recurse_files("$I", path = '/$Recycle.bin',logic = "startswith")
   
   if dollar_i_files is not None:
      processed_files = process_dollar_i(tsk_util, dollar_i_files)
      write_csv(report_file,['file_path', 'file_size', 'deleted_time','dollar_i_file', 'dollar_r_file', 'is_directory'],processed_files)
   else:
      print("No $I files found")

Now, if $I files are found, they are passed to the process_dollar_i() function, which accepts the tsk_util object as well as the list of $I files, as shown below -

def process_dollar_i(tsk_util, dollar_i_files):
   processed_files = []
   
   for dollar_i in dollar_i_files:
      file_attribs = read_dollar_i(dollar_i[2])
      if file_attribs is None:
         continue
      file_attribs['dollar_i_file'] = os.path.join('/$Recycle.bin', dollar_i[1][1:])

Now, still inside the loop of process_dollar_i(), search for the matching $R file, as follows -

      recycle_file_path = os.path.join('/$Recycle.bin',dollar_i[1].rsplit("/", 1)[0][1:])
      dollar_r_files = tsk_util.recurse_files(
         "$R" + dollar_i[0][2:],path = recycle_file_path, logic = "startswith")
      
      if dollar_r_files is None:
         # No $R file found: the deleted item may have been a directory
         dollar_r_dir = os.path.join(recycle_file_path,"$R" + dollar_i[0][2:])
         dollar_r_dirs = tsk_util.query_directory(dollar_r_dir)
         
         if dollar_r_dirs is None:
            file_attribs['dollar_r_file'] = "Not Found"
            file_attribs['is_directory'] = 'Unknown'
         else:
            file_attribs['dollar_r_file'] = dollar_r_dir
            file_attribs['is_directory'] = True
      else:
         dollar_r = [os.path.join(recycle_file_path, r[1][1:])for r in dollar_r_files]
         file_attribs['dollar_r_file'] = ";".join(dollar_r)
         file_attribs['is_directory'] = False
      processed_files.append(file_attribs)
   return processed_files

Now, define the read_dollar_i() method to read the $I file, in other words, to parse the metadata. We use the read_random() method to read the first eight bytes, which hold the signature. If the signature does not match, the method returns None. If the file is valid, we read and unpack the values from the $I file.

def read_dollar_i(file_obj):
   if file_obj.read_random(0, 8) != b'\x01\x00\x00\x00\x00\x00\x00\x00':
      return None
   raw_file_size = struct.unpack('<q', file_obj.read_random(8, 8))
   raw_deleted_time = struct.unpack('<q', file_obj.read_random(16, 8))
   raw_file_path = file_obj.read_random(24, 520)

Now, after extracting these values, the same function interprets the integers as human-readable values using the sizeof_fmt() and parse_windows_filetime() functions, as shown below -

   file_size = sizeof_fmt(raw_file_size[0])
   deleted_time = parse_windows_filetime(raw_deleted_time[0])
   
   file_path = raw_file_path.decode("utf16").strip("\x00")
   return {'file_size': file_size, 'file_path': file_path,'deleted_time': deleted_time}

Now, we need to define the sizeof_fmt() function, as follows -

def sizeof_fmt(num, suffix = 'B'):
   for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
      if abs(num) < 1024.0:
         return "%3.1f%s%s" % (num, unit, suffix)
      num /= 1024.0
   return "%.1f%s%s" % (num, 'Yi', suffix)

Now, define a function that converts the interpreted integer into a formatted date and time, as follows -

def parse_windows_filetime(date_value):
   microseconds = float(date_value) / 10
   ts = datetime.datetime(1601, 1, 1) + datetime.timedelta(
      microseconds = microseconds)
   return ts.strftime('%Y-%m-%d %H:%M:%S.%f')

Now, we define the write_csv() method to write the processed results into a CSV file, as follows -

def write_csv(outfile, fieldnames, data):
   with open(outfile, 'wb') as open_outfile:
      csvfile = csv.DictWriter(open_outfile, fieldnames)
      csvfile.writeheader()
      csvfile.writerows(data)

After running the above script, we will get the data from the $I and $R files.
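
The $I parsing step can be tried without an evidence image. The following is a minimal sketch, assuming the Windows 7 layout described above (8-byte header, 8-byte size, 8-byte FILETIME, 520-byte UTF-16 path). The helper name parse_dollar_i() and the synthetic record are hypothetical and are not part of the original script.

```python
import struct
from datetime import datetime, timedelta

DOLLAR_I_HEADER = b'\x01\x00\x00\x00\x00\x00\x00\x00'  # Windows 7 style header


def parse_dollar_i(raw):
    """Parse an in-memory $I record; return None if it does not look valid."""
    if len(raw) < 24 or raw[:8] != DOLLAR_I_HEADER:
        return None
    # Two little-endian signed 64-bit integers: file size and deletion FILETIME
    file_size, filetime = struct.unpack('<qq', raw[8:24])
    # The path is a fixed 520-byte UTF-16 field padded with null characters
    file_path = raw[24:24 + 520].decode('utf-16-le').split('\x00', 1)[0]
    # FILETIME counts 100-nanosecond ticks since 1601-01-01
    deleted = datetime(1601, 1, 1) + timedelta(microseconds=filetime // 10)
    return {'file_size': file_size, 'file_path': file_path,
            'deleted_time': deleted}


# Build a synthetic record purely for illustration
original_path = 'C:\\Users\\demo\\report.docx'
encoded_path = original_path.encode('utf-16-le').ljust(520, b'\x00')
unix_epoch_as_filetime = 116444736000000000  # 1970-01-01 expressed in ticks
sample = (DOLLAR_I_HEADER
          + struct.pack('<qq', 2048, unix_epoch_as_filetime)
          + encoded_path)
record = parse_dollar_i(sample)
print(record)
```

Working on raw bytes keeps the parsing logic independent of pytsk3, which makes it easier to check before pointing it at real evidence.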

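Sticky Notes

Windows Sticky Notes replace the real-world habit of writing with pen and paper. The notes float on the desktop and offer options such as different colors and fonts. In Windows 7 the Sticky Notes file is stored as an OLE file, so in the following Python script we investigate this OLE file to extract the metadata of the sticky notes.

For this Python script we need to install the third-party modules olefile, pytsk3, pyewf and unicodecsv. We can use pip to install them.

We can follow the steps discussed below to extract information from the Sticky Notes file, namely StickyNotes.snt -

First, open the evidence file and find all the StickyNotes.snt files.
Then, parse the metadata and content from the OLE streams and write the RTF content to files.
Finally, create a CSV report of this metadata.

Python Code

Let us see how to use Python code for this purpose:

First, import the following Python libraries -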

from __future__ import print_function
from argparse import ArgumentParser

import unicodecsv as csv
import os
import StringIO

from utility.pytskutil import TSKUtil
import olefile

Next, define a global variable that is used throughout the script -

REPORT_COLS = ['note_id', 'created', 'modified', 'note_text', 'note_file']

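Next, we need to provide the arguments for the command-line handler. Note that it accepts three arguments - the first is the path to the evidence file, the second is the type of the evidence file, and the third is the desired output path, as follows -

if __name__ == '__main__':
   # ArgumentParser is imported directly, so it is called without the argparse prefix
   parser = ArgumentParser('Evidence from Sticky Notes')
   parser.add_argument('EVIDENCE_FILE', help="Path to evidence file")
   parser.add_argument('IMAGE_TYPE', help="Evidence file format",choices=('ewf', 'raw'))
   parser.add_argument('REPORT_FOLDER', help="Path to report folder")
   args = parser.parse_args()
   main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.REPORT_FOLDER)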

Now, we define the main() function, which is similar to the one in the previous script, as shown below -

def main(evidence, image_type, report_folder):
   tsk_util = TSKUtil(evidence, image_type)
   note_files = tsk_util.recurse_files('StickyNotes.snt', '/Users','equals')

Now, still inside main(), iterate over the resulting files. For each one we call the **parse_snt_file()** function to process the file and then write an RTF file with the **write_note_rtf()** method. Once the loop has finished, the CSV report is written, as follows:

   report_details = []
   for note_file in note_files:
      user_dir = note_file[1].split("/")[1]
      file_like_obj = create_file_like_obj(note_file[2])
      note_data = parse_snt_file(file_like_obj)
      
      if note_data is None:
         continue
      write_note_rtf(note_data, os.path.join(report_folder, user_dir))
      report_details += prep_note_report(note_data, REPORT_COLS,"/Users" + note_file[1])
   write_csv(os.path.join(report_folder, 'sticky_notes.csv'), REPORT_COLS,report_details)

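Next, we need to define the various functions used in this script.

First, the script relies on a **create_file_like_obj()** function, which takes a **pytsk** file object, reads the file size from it, and returns the content as a file-like object; its listing is not included here. Then, we define the **parse_snt_file()** function, which accepts that file-like object as its input and is used to read and interpret the sticky note file.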

def parse_snt_file(snt_file):
   
   if not olefile.isOleFile(snt_file):
      print("This is not an OLE file")
      return None
   ole = olefile.OleFileIO(snt_file)
   note = {}
   
   for stream in ole.listdir():
      if stream[0].count("-") == 3:
         if stream[0] not in note:
            note[stream[0]] = {"created": ole.getctime(stream[0]),"modified": ole.getmtime(stream[0])}
         content = None
         if stream[1] == '0':
            content = ole.openstream(stream).read()
         elif stream[1] == '3':
            content = ole.openstream(stream).read().decode("utf-16")
         if content:
            note[stream[0]][stream[1]] = content
   return note
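
The listing for create_file_like_obj() is not shown in this chapter. A minimal sketch follows, assuming it mirrors the open_file_as_reg() helper used by the registry script later on: read the size from the pytsk metadata, read the whole file, and wrap the bytes in an in-memory buffer. The stand-in object built with SimpleNamespace is purely hypothetical and only imitates the two pytsk3 attributes the function touches; io.BytesIO is used here in place of the Python 2 StringIO module.

```python
import io
from types import SimpleNamespace


def create_file_like_obj(tsk_file):
    """Return the full content of a pytsk3-style file as a seekable buffer."""
    size = tsk_file.info.meta.size            # file size from the metadata
    content = tsk_file.read_random(0, size)   # read everything from offset 0
    return io.BytesIO(content)


# Hypothetical stand-in for a pytsk3 file object, for illustration only
data = b'\xd0\xcf\x11\xe0 arbitrary sample bytes'
fake_file = SimpleNamespace(
    info=SimpleNamespace(meta=SimpleNamespace(size=len(data))),
    read_random=lambda offset, length: data[offset:offset + length])

buffer = create_file_like_obj(fake_file)
print(len(buffer.getvalue()))
```

Because the result is seekable, it can be handed to libraries such as olefile that expect a file-like object rather than a path.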

Now, create the RTF files by defining the **write_note_rtf()** function, as follows:

def write_note_rtf(note_data, report_folder):
   if not os.path.exists(report_folder):
      os.makedirs(report_folder)
   
   for note_id, stream_data in note_data.items():
      fname = os.path.join(report_folder, note_id + ".rtf")
      with open(fname, 'w') as open_file:
         open_file.write(stream_data['0'])

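Now, we translate the nested dictionary into a flat list of dictionaries, which is better suited to a CSV spreadsheet. This is done by defining the **prep_note_report()** function. Lastly, we define the **write_csv()** function.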

def prep_note_report(note_data, report_cols, note_file):
   report_details = []
   
   for note_id, stream_data in note_data.items():
      report_details.append({
         "note_id": note_id,
         "created": stream_data['created'],
         "modified": stream_data['modified'],
         "note_text": stream_data['3'].strip("\x00"),
         "note_file": note_file
      })
   return report_details

def write_csv(outfile, fieldnames, data):
   with open(outfile, 'wb') as open_outfile:
      csvfile = csv.DictWriter(open_outfile, fieldnames)
      csvfile.writeheader()
      csvfile.writerows(data)

After running the above script, we will get the metadata from the Sticky Notes file.

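Registry Files

Windows registry files contain many important details and are a treasure trove of information for a forensic analyst. The registry is a hierarchical database that holds details about the operating system configuration, user activity, software installation and so on. In the following Python script we access common baseline information from the **SYSTEM** and **SOFTWARE** hives.

For this Python script we need to install the third-party modules **pytsk3, pyewf** and **Registry** (provided by the python-registry package). We can use **pip** to install them.

We can follow the steps given below to extract information from the Windows registry:

First, find the registry hives to process by their name and path.
Then, open these files using the StringIO and Registry modules.
Finally, process each hive and print the parsed values to the console for interpretation.

Python Code

Let us see how to use Python code for this purpose:

First, import the following Python libraries -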

from __future__ import print_function
from argparse import ArgumentParser

import datetime
import StringIO
import struct

from utility.pytskutil import TSKUtil
from Registry import Registry

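Now, provide the arguments for the command-line handler. Here it accepts two arguments - the first is the path to the evidence file and the second is the type of the evidence file, as shown below:

if __name__ == '__main__':
   # ArgumentParser is imported directly, so it is called without the argparse prefix
   parser = ArgumentParser('Evidence from Windows Registry')
   parser.add_argument('EVIDENCE_FILE', help = "Path to evidence file")
   parser.add_argument('IMAGE_TYPE', help = "Evidence file format",
      choices = ('ewf', 'raw'))
   args = parser.parse_args()
   main(args.EVIDENCE_FILE, args.IMAGE_TYPE)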

Now, we define the **main()** function, which searches for the **SYSTEM** and **SOFTWARE** hives within the **/Windows/System32/config** folder, as follows:

def main(evidence, image_type):
   tsk_util = TSKUtil(evidence, image_type)
   tsk_system_hive = tsk_util.recurse_files('system', '/Windows/system32/config', 'equals')
   tsk_software_hive = tsk_util.recurse_files('software', '/Windows/system32/config', 'equals')
   system_hive = open_file_as_reg(tsk_system_hive[0][2])
   software_hive = open_file_as_reg(tsk_software_hive[0][2])
   process_system_hive(system_hive)
   process_software_hive(software_hive)

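Now, define the function that opens the registry file. For this purpose, we need to obtain the size of the file from the **pytsk** metadata, as follows: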

def open_file_as_reg(reg_file):
   file_size = reg_file.info.meta.size
   file_content = reg_file.read_random(0, file_size)
   file_like_obj = StringIO.StringIO(file_content)
   return Registry.Registry(file_like_obj)

Now, with the help of the following method, we can process the **SYSTEM** hive:

def process_system_hive(hive):
   root = hive.root()
   current_control_set = root.find_key("Select").value("Current").value()
   control_set = root.find_key("ControlSet{:03d}".format(current_control_set))
   raw_shutdown_time = struct.unpack(
      '<Q', control_set.find_key("Control").find_key("Windows").value("ShutdownTime").value())
   
   shutdown_time = parse_windows_filetime(raw_shutdown_time[0])
   print("Last Shutdown Time: {}".format(shutdown_time))
   
   # The call chains are wrapped in parentheses so they can span several lines
   time_zone = (control_set.find_key("Control").find_key("TimeZoneInformation")
      .value("TimeZoneKeyName").value())
   
   print("Machine Time Zone: {}".format(time_zone))
   computer_name = (control_set.find_key("Control").find_key("ComputerName")
      .find_key("ComputerName").value("ComputerName").value())
   
   print("Machine Name: {}".format(computer_name))
   last_access = (control_set.find_key("Control").find_key("FileSystem")
      .value("NtfsDisableLastAccessUpdate").value())
   last_access = "Disabled" if last_access == 1 else "Enabled"
   print("Last Access Updates: {}".format(last_access))

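Now, we need to define the functions that format the interpreted integers as dates and times, one for Windows FILETIME values and one for Unix epoch values, as follows: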

def parse_windows_filetime(date_value):
   microseconds = float(date_value) / 10
   ts = datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds = microseconds)
   return ts.strftime('%Y-%m-%d %H:%M:%S.%f')

def parse_unix_epoch(date_value):
   ts = datetime.datetime.fromtimestamp(date_value)
   return ts.strftime('%Y-%m-%d %H:%M:%S.%f')
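
The two conversions can be checked on known values. The following is a minimal sketch using timezone-aware datetimes; the function names and the sample ShutdownTime bytes are hypothetical. Note that parse_unix_epoch() above calls fromtimestamp() without a time zone, so its output is in the local time of the analysis machine, whereas this sketch assumes UTC output is wanted.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(ticks):
    """Convert 100-nanosecond ticks since 1601-01-01 to an aware datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def epoch_to_utc(seconds):
    """Convert seconds since 1970-01-01 to an aware datetime in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# A registry value such as ShutdownTime is stored as 8 raw little-endian bytes
raw_shutdown = struct.pack('<Q', 132223104000000000)  # illustrative value
ticks = struct.unpack('<Q', raw_shutdown)[0]
print(filetime_to_utc(ticks).isoformat())

# An InstallDate-style value is a plain integer count of seconds
print(epoch_to_utc(1577836800).isoformat())
```

Keeping both conversions in UTC makes timestamps from different artifacts directly comparable in a timeline.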

Now, with the help of the following method, we can process the **SOFTWARE** hive:

def process_software_hive(hive):
   root = hive.root()
   nt_curr_ver = (root.find_key("Microsoft").find_key("Windows NT")
      .find_key("CurrentVersion"))
   
   print("Product name: {}".format(nt_curr_ver.value("ProductName").value()))
   print("CSD Version: {}".format(nt_curr_ver.value("CSDVersion").value()))
   print("Current Build: {}".format(nt_curr_ver.value("CurrentBuild").value()))
   print("Registered Owner: {}".format(nt_curr_ver.value("RegisteredOwner").value()))
   print("Registered Org: {}".format(
      nt_curr_ver.value("RegisteredOrganization").value()))
   
   raw_install_date = nt_curr_ver.value("InstallDate").value()
   install_date = parse_unix_epoch(raw_install_date)
   print("Installation Date: {}".format(install_date))

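After running the above script, we will get the metadata stored in the Windows registry files.

Important Artifacts in Windows-II

This chapter discusses some more important artifacts in Windows and how to extract them using Python.

User Activities

Windows uses the **NTUSER.DAT** file to store various user activities. Every user profile has its own **NTUSER.DAT** hive, which stores the information and configuration specific to that user. It is therefore highly useful to forensic analysts during an investigation.

The following Python script parses some of the keys of **NTUSER.DAT** to explore the actions of a user on the system. Before proceeding, we need to install the third-party modules **Registry, pytsk3, pyewf** and **Jinja2**. We can use pip to install them.

We can follow these steps to extract information from the **NTUSER.DAT** file:

First, search for all the **NTUSER.DAT** files in the system.
Then, parse the **WordWheelQuery, TypedPaths and RunMRU** keys for each **NTUSER.DAT** file.
Finally, write these processed artifacts into an HTML report using the **Jinja2** module.

Python Code

Let us see how to use Python code for this purpose:

First, we need to import the following Python modules: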

from __future__ import print_function
from argparse import ArgumentParser

import os
import StringIO
import struct

from utility.pytskutil import TSKUtil
from Registry import Registry
import jinja2

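Now, provide the arguments for the command-line handler. Here it accepts three arguments - the first is the path to the evidence file, the second is the type of the evidence file, and the third is the desired output path for the HTML report, as shown below:

if __name__ == '__main__':
   # ArgumentParser is imported directly, so it is called without the argparse prefix
   parser = ArgumentParser('Information from user activities')
   parser.add_argument('EVIDENCE_FILE',help = "Path to evidence file")
   parser.add_argument('IMAGE_TYPE',help = "Evidence file format",choices = ('ewf', 'raw'))
   parser.add_argument('REPORT',help = "Path to report file")
   args = parser.parse_args()
   main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.REPORT)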

Now, let us define the **main()** function, which searches for all the **NTUSER.DAT** files, as shown:

def main(evidence, image_type, report):
   tsk_util = TSKUtil(evidence, image_type)
   tsk_ntuser_hives = tsk_util.recurse_files('ntuser.dat','/Users', 'equals')
   
   nt_rec = {
      'wordwheel': {'data': [], 'title': 'WordWheel Query'},
      'typed_path': {'data': [], 'title': 'Typed Paths'},
      'run_mru': {'data': [], 'title': 'Run MRU'}
   }

Now, still inside main(), we try to find the Explorer key in each **NTUSER.DAT** file. Once it is found, the user processing functions are called, as shown below:

   for ntuser in tsk_ntuser_hives:
      # The first path component below /Users is the user name
      uname = ntuser[1].split("/")[1]
      open_ntuser = open_file_as_reg(ntuser[2])
      try:
         explorer_key = (open_ntuser.root().find_key("Software").find_key("Microsoft")
            .find_key("Windows").find_key("CurrentVersion").find_key("Explorer"))
      except Registry.RegistryKeyNotFoundException:
         continue
      nt_rec['wordwheel']['data'] += parse_wordwheel(explorer_key, uname)
      nt_rec['typed_path']['data'] += parse_typed_paths(explorer_key, uname)
      nt_rec['run_mru']['data'] += parse_run_mru(explorer_key, uname)
   # The header rows are taken from the first entry, so each list must not be empty
   nt_rec['wordwheel']['headers'] = nt_rec['wordwheel']['data'][0].keys()
   nt_rec['typed_path']['headers'] = nt_rec['typed_path']['data'][0].keys()
   nt_rec['run_mru']['headers'] = nt_rec['run_mru']['data'][0].keys()

Now, pass the report path and the dictionary object to the **write_html()** method, as follows:

   write_html(report, nt_rec)

Now, define a method that takes a **pytsk** file handle and reads it into the Registry class through the **StringIO** class.

def open_file_as_reg(reg_file):
   file_size = reg_file.info.meta.size
   file_content = reg_file.read_random(0, file_size)
   file_like_obj = StringIO.StringIO(file_content)
   return Registry.Registry(file_like_obj)

Now, we define the function that parses and processes the **WordWheelQuery** key from the **NTUSER.DAT** file, as follows:

def parse_wordwheel(explorer_key, username):
   try:
      wwq = explorer_key.find_key("WordWheelQuery")
   except Registry.RegistryKeyNotFoundException:
      return []
   mru_list = wwq.value("MRUListEx").value()
   mru_order = []
   
   for i in xrange(0, len(mru_list), 2):
      order_val = struct.unpack('h', mru_list[i:i + 2])[0]
      if order_val in mru_order and order_val in (0, -1):
         break
      else:
         mru_order.append(order_val)
   search_list = []
   
   for count, val in enumerate(mru_order):
      ts = "N/A"
      if count == 0:
         # Only the most recent entry can be tied to the key's last-written time
         ts = wwq.timestamp()
      search_list.append({
         'timestamp': ts,
         'username': username,
         'order': count,
         'value_name': str(val),
         'search': wwq.value(str(val)).value().decode("UTF-16").strip("\x00")
      })
   return search_list
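
The listing above walks the MRUListEx value two bytes at a time. MRUListEx is commonly described as a sequence of 32-bit little-endian integers terminated by -1 (0xFFFFFFFF); the following minimal sketch decodes it under that assumption. The function name decode_mru_list_ex() and the sample bytes are hypothetical and are not taken from the original script.

```python
import struct


def decode_mru_list_ex(raw):
    """Return value names in most-recently-used order from MRUListEx bytes."""
    order = []
    # Step through complete 4-byte entries only; a trailing partial entry is ignored
    for offset in range(0, len(raw) - 3, 4):
        (index,) = struct.unpack_from('<i', raw, offset)
        if index == -1:       # terminator marks the end of the list
            break
        order.append(index)
    return order


# Synthetic value: entry 2 was used most recently, then 0, then 1
sample = struct.pack('<4i', 2, 0, 1, -1)
print(decode_mru_list_ex(sample))
```

The decoded integers are the names of the sibling registry values ("2", "0", "1") that hold the actual search terms.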

Now, we define the function that parses and processes the **TypedPaths** key from the **NTUSER.DAT** file, as follows:

def parse_typed_paths(explorer_key, username):
   try:
      typed_paths = explorer_key.find_key("TypedPaths")
   except Registry.RegistryKeyNotFoundException:
      return []
   typed_path_details = []
   
   for val in typed_paths.values():
      typed_path_details.append({
         "username": username,
         "value_name": val.name(),
         "path": val.value()
      })
   return typed_path_details

Now, we define the function that parses and processes the **RunMRU** key from the **NTUSER.DAT** file, as follows:

def parse_run_mru(explorer_key, username):
   try:
      run_mru = explorer_key.find_key("RunMRU")
   except Registry.RegistryKeyNotFoundException:
      return []
   
   if len(run_mru.values()) == 0:
      return []
   mru_list = run_mru.value("MRUList").value()
   mru_order = []
   
   for i in mru_list:
      mru_order.append(i)
   mru_details = []
   
   for count, val in enumerate(mru_order):
      ts = "N/A"
      if count == 0:
         ts = run_mru.timestamp()
      mru_details.append({
         "username": username,
         "timestamp": ts,
         "order": count,
         "value_name": val,
         "run_statement": run_mru.value(val).value()
      })
   return mru_details
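
The ordering logic of RunMRU can be illustrated on plain data. This is a minimal sketch, assuming the key holds an MRUList string of single-letter value names (most recent first) and that each command ends with a "\1" suffix, as is commonly reported for this key. The dictionary input shape and the function name order_run_mru() are hypothetical.

```python
def order_run_mru(values):
    """Return Run dialog commands in most-recently-used order.

    `values` maps registry value names to their data (hypothetical input shape).
    """
    commands = []
    for name in values.get('MRUList', ''):
        command = values.get(name)
        if command is None:
            continue                  # listed in MRUList but value is missing
        if command.endswith('\\1'):
            command = command[:-2]    # drop the trailing "\1" marker
        commands.append(command)
    return commands


sample = {
    'MRUList': 'cab',
    'a': 'cmd\\1',
    'b': 'regedit\\1',
    'c': 'notepad\\1',
}
print(order_run_mru(sample))
```

As in parse_run_mru(), only the first entry in the ordered result can be associated with the key's last-written timestamp.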

Now, the following function handles the creation of the HTML report:

def write_html(outfile, data_dict):
   cwd = os.path.dirname(os.path.abspath(__file__))
   env = jinja2.Environment(loader=jinja2.FileSystemLoader(cwd))
   template = env.get_template("user_activity.html")
   rendering = template.render(nt_data=data_dict)
   
   with open(outfile, 'w') as open_outfile:
      open_outfile.write(rendering)

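Finally, we can write the HTML template document for the report (user_activity.html, loaded from the script's directory). After running the above script, we will get the information from the NTUSER.DAT files as an HTML document.

LINK files

Shortcut files are created when a user or the operating system makes a shortcut to a file that is frequently used, double-clicked, or accessed from a system drive such as attached storage. Such shortcut files are called link files. By examining them, an investigator can uncover activity on the Windows system, such as the time and the location from which these files were accessed.

Let us discuss the Python script that we can use to get information from these Windows LINK files.

For the Python script, install the third-party modules **pylnk, pytsk3, pyewf**. We can follow these steps to extract information from **lnk** files:

First, search for the **lnk** files within the system.
Then, extract the information from each file by iterating through them.
Finally, write this information to a CSV report.

Python Code

Let us see how to use Python code for this purpose:

First, import the following Python libraries -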

from __future__ import print_function
from argparse import ArgumentParser

import csv
import StringIO

from utility.pytskutil import TSKUtil
import pylnk

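Now, provide the arguments for the command-line handler. Here it accepts three arguments - the first is the path to the evidence file, the second is the type of the evidence file, and the third is the desired output path for the CSV report, as shown below:

if __name__ == '__main__':
   # ArgumentParser is imported directly, so it is called without the argparse prefix
   parser = ArgumentParser('Parsing LNK files')
   parser.add_argument('EVIDENCE_FILE', help = "Path to evidence file")
   parser.add_argument('IMAGE_TYPE', help = "Evidence file format",choices = ('ewf', 'raw'))
   parser.add_argument('CSV_REPORT', help = "Path to CSV report")
   args = parser.parse_args()
   main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.CSV_REPORT)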

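Now, interpret the evidence file by creating a **TSKUtil** object and iterate through the file system to find files ending with **lnk**. This is done by defining the **main()** function, as follows: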

def main(evidence, image_type, report):
   tsk_util = TSKUtil(evidence, image_type)
   lnk_files = tsk_util.recurse_files("lnk", path="/", logic="endswith")
   
   if lnk_files is None:
      print("No lnk files found")
      exit(0)
   columns = [
      'command_line_arguments', 'description', 'drive_serial_number',
      'drive_type', 'file_access_time', 'file_attribute_flags',
      'file_creation_time', 'file_modification_time', 'file_size',
      'environmental_variables_location', 'volume_label',
      'machine_identifier', 'local_path', 'network_path',
      'relative_path', 'working_directory'
   ]

Now, with the help of the following code, still inside main(), we iterate through the **lnk** files and collect the attributes of each one, as follows:

   parsed_lnks = []
   
   for entry in lnk_files:
      lnk = open_file_as_lnk(entry[2])
      lnk_data = {'lnk_path': entry[1], 'lnk_name': entry[0]}
      
      for col in columns:
         # Fall back to "N/A" when the attribute is not available
         lnk_data[col] = getattr(lnk, col, "N/A")
      lnk.close()
      parsed_lnks.append(lnk_data)
   write_csv(report, columns + ['lnk_path', 'lnk_name'], parsed_lnks)
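
The column-driven extraction above relies on getattr() with a default value. The following minimal sketch shows that pattern in isolation; the FakeLnk class is a hypothetical stand-in for a parsed link object and does not reflect the real pylnk API beyond attribute access by name.

```python
class FakeLnk(object):
    """Hypothetical stand-in exposing two of the attributes used as columns."""
    local_path = 'C:\\Users\\demo\\Desktop\\notes.txt'
    file_size = 1024


def lnk_to_row(lnk, columns, missing='N/A'):
    """Build one report row, substituting `missing` for absent attributes."""
    return {col: getattr(lnk, col, missing) for col in columns}


columns = ['local_path', 'file_size', 'network_path']
row = lnk_to_row(FakeLnk(), columns)
print(row)
```

Driving the extraction from a list of column names keeps the report header and the row contents in sync, since both come from the same list.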

Now, we need to define two functions: one opens the **pytsk** file object, and the other writes the CSV report, as shown below:

def open_file_as_lnk(lnk_file):
   file_size = lnk_file.info.meta.size
   file_content = lnk_file.read_random(0, file_size)
   file_like_obj = StringIO.StringIO(file_content)
   lnk = pylnk.file()
   lnk.open_file_object(file_like_obj)
   return lnk

def write_csv(outfile, fieldnames, data):
   with open(outfile, 'wb') as open_outfile:
      csvfile = csv.DictWriter(open_outfile, fieldnames)
      csvfile.writeheader()
      csvfile.writerows(data)

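After running the above script, we will get the information contained in the discovered **lnk** files, stored in a CSV report.

Pre-fetch files

Whenever an application runs for the first time from a specific location, Windows creates a **prefetch file**. These files are used to speed up the application start-up process. They have the **.PF** extension and are stored in the **"\Root\Windows\Prefetch"** folder.

From these files, digital forensics experts can reveal evidence of program execution from a specified location along with details of the user. Prefetch files are useful artifacts for the examiner because their entries remain even after the program has been deleted or uninstalled.

Let us discuss the Python script that fetches information from Windows prefetch files, as given below:

For the Python script, install the third-party modules **pytsk3, pyewf** and **unicodecsv**. Recall that we have already worked with these libraries in the Python scripts discussed in the previous chapters.

We have to follow the steps given below to extract information from **prefetch** files:

First, scan for files with the **.pf** extension, that is, the prefetch files.
Now, perform signature verification to eliminate false positives.
Next, parse the Windows prefetch file format. The format version differs with the Windows version: it is 17 for Windows XP, 23 for Windows Vista and Windows 7, 26 for Windows 8.1, and 30 for Windows 10.
Finally, write the parsed results to a CSV file.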
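
The signature and version checks in the steps above can be sketched on raw header bytes. This is a minimal illustration: the helper name identify_prefetch() and the synthetic headers are hypothetical. It reads the version as a 4-byte little-endian integer followed by the 4-byte signature SCCA, which is the same value as the integer 1094927187 tested in the script below.

```python
import struct

# Format versions as listed in the steps above
PREFETCH_VERSIONS = {
    17: 'Windows XP',
    23: 'Windows Vista / 7',
    26: 'Windows 8.1',
    30: 'Windows 10',
}


def identify_prefetch(header):
    """Return the Windows release for a prefetch header, or None if invalid."""
    if len(header) < 8:
        return None
    version, signature = struct.unpack('<I4s', header[:8])
    if signature != b'SCCA':
        return None                     # not a prefetch file: false positive
    return PREFETCH_VERSIONS.get(version, 'Unknown version {}'.format(version))


xp_header = struct.pack('<I4s', 17, b'SCCA')
print(identify_prefetch(xp_header))
print(identify_prefetch(b'MZ\x90\x00\x03\x00\x00\x00'))
```

Checking the signature before parsing avoids treating arbitrary files that merely carry a .pf extension as prefetch data.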

Python Code

Let us see how to use Python code for this purpose:

First, import the following Python libraries -

from __future__ import print_function
import argparse
from datetime import datetime, timedelta

import os
import pytsk3
import pyewf
import struct
import sys
import unicodecsv as csv
from utility.pytskutil import TSKUtil

Now, provide the arguments for the command-line handler. Here it accepts three positional arguments: the first is the path to the evidence file, the second is the type of the evidence file, and the third is the path of the output CSV. It also accepts an optional argument that specifies the path to scan for prefetch files:

if __name__ == "__main__":
   parser = argparse.ArgumentParser('Parsing Prefetch files')
   parser.add_argument("EVIDENCE_FILE", help = "Evidence file path")
   parser.add_argument("TYPE", help = "Type of Evidence",choices = ("raw", "ewf"))
   parser.add_argument("OUTPUT_CSV", help = "Path to write output csv")
   parser.add_argument("-d", help = "Prefetch directory to scan",default = "/WINDOWS/PREFETCH")
   args = parser.parse_args()
   
   if os.path.exists(args.EVIDENCE_FILE) and \
         os.path.isfile(args.EVIDENCE_FILE):
      main(args.EVIDENCE_FILE, args.TYPE, args.OUTPUT_CSV, args.d)
   else:
      print("[-] Supplied input file {} does not exist or is not a ""file".format(args.EVIDENCE_FILE))
      sys.exit(1)

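Now, interpret the evidence file by creating a **TSKUtil** object and iterate through the file system to find files ending with **.pf**. This is done by defining the **main()** function, as follows: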

def main(evidence, image_type, output_csv, path):
   tsk_util = TSKUtil(evidence, image_type)
   prefetch_dir = tsk_util.query_directory(path)
   prefetch_files = None
   
   if prefetch_dir is not None:
      prefetch_files = tsk_util.recurse_files(".pf", path=path, logic="endswith")
   
   if prefetch_files is None:
      print("[-] No .pf files found")
      sys.exit(2)
   print("[+] Identified {} potential prefetch files".format(len(prefetch_files)))
   prefetch_data = []
   
   for hit in prefetch_files:
      prefetch_file = hit[2]
      pf_version = check_signature(prefetch_file)

Now, define a method that performs the signature verification, as shown below:

def check_signature(prefetch_file):
   version, signature = struct.unpack("<2i", prefetch_file.read_random(0, 8))
   
   if signature == 1094927187:
      return version
   else:
      return None

Back in the loop of main(), files whose signature does not match are skipped, and Windows XP (version 17) files are parsed:

      if pf_version is None:
         continue
      pf_name = hit[0]
      
      if pf_version == 17:
         parsed_data = parse_pf_17(prefetch_file, pf_name)
         parsed_data.append(os.path.join(path, hit[1].lstrip("//")))
         prefetch_data.append(parsed_data)

Now, start processing the Windows prefetch files. Here we take the Windows XP prefetch file as the example. Two helper functions convert Unix and FILETIME timestamps:

def convert_unix(ts):
   if int(ts) == 0:
      return ""
   return datetime.utcfromtimestamp(ts)

def convert_filetime(ts):
   if int(ts) == 0:
      return ""
   return datetime(1601, 1, 1) + timedelta(microseconds=ts / 10)

Now, define parse_pf_17(), which uses struct to extract the data embedded within the prefetch file, as follows:

def parse_pf_17(prefetch_file, pf_name):
   create = convert_unix(prefetch_file.info.meta.crtime)
   modify = convert_unix(prefetch_file.info.meta.mtime)
   
   pf_size, name, vol_info, vol_entries, vol_size, filetime, \
      count = struct.unpack("<i60s32x3iq16xi",prefetch_file.read_random(12, 136))
   # Null characters are written "\x00"; the executable name ends at the first one
   name = name.decode("utf-16", "ignore").strip("\x00").split("\x00")[0]
   
   vol_name_offset, vol_name_length, vol_create, \
      vol_serial = struct.unpack("<2iqi",prefetch_file.read_random(vol_info, 20))
   vol_serial = hex(vol_serial).lstrip("0x")
   vol_serial = vol_serial[:4] + "-" + vol_serial[4:]
   vol_name = struct.unpack(
      "<{}s".format(2 * vol_name_length),
      prefetch_file.read_random(vol_info + vol_name_offset,vol_name_length * 2))[0]
   
   vol_name = vol_name.decode("utf-16", "ignore").strip("\x00").split("\x00")[0]
   return [
      pf_name, name, pf_size, create,
      modify, convert_filetime(filetime), count, vol_name,
      convert_filetime(vol_create), vol_serial ]

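We have handled the prefetch version for Windows XP, but what happens if the script encounters prefetch versions from other Windows releases? It must then display an error message. The following branches continue the if statement inside the loop of main(), and the report is written once the loop has finished:

      elif pf_version == 23:
         print("[-] Windows Vista / 7 PF file {} -- unsupported".format(pf_name))
         continue
      elif pf_version == 26:
         print("[-] Windows 8 PF file {} -- unsupported".format(pf_name))
         continue
      elif pf_version == 30:
         print("[-] Windows 10 PF file {} -- unsupported".format(pf_name))
         continue
      else:
         print("[-] Signature mismatch - Name: {}\nPath: {}".format(hit[0], hit[1]))
         continue
   write_output(prefetch_data, output_csv)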

Now, define the method that writes the results into the CSV report, as follows:

def write_output(data, output_csv):
   print("[+] Writing csv report")
   with open(output_csv, "wb") as outfile:
      writer = csv.writer(outfile)
      writer.writerow([
         "File Name", "Prefetch Name", "File Size (bytes)",
         "File Create Date (UTC)", "File Modify Date (UTC)",
         "Prefetch Last Execution Date (UTC)",
         "Prefetch Execution Count", "Volume", "Volume Create Date",
         "Volume Serial", "File Path" ])
      writer.writerows(data)

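After running the above script, we will get the information from the Windows XP version prefetch files in a spreadsheet.

Important Artifacts in Windows-III

This chapter explains further artifacts that an investigator can obtain during forensic analysis on Windows.

Event Logs

Windows event log files, as the name suggests, are special files that store significant events, such as when a user logs on to the computer, when a program encounters an error, system changes, RDP access, application-specific events and so on. Cyber investigators are always interested in event log information because it provides a great deal of useful historical information about access to the system. In the following Python script we process both the legacy and the current Windows event log formats.

For the Python script, we need to install the following third-party modules: **pytsk3, pyewf, unicodecsv, pyevt and pyevtx**. We can follow the steps given below to extract information from event logs:

First, search for all the event logs that match the input argument.
Then, perform file signature verification.
Now, process each event log found with the appropriate library.
Finally, write the output to a spreadsheet.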
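
The signature verification step decides which library should handle a log. The following is a minimal sketch of that decision on raw header bytes, assuming the commonly documented signatures: "LfLe" at offset 4 for the legacy EVT format and "ElfFile" followed by a null byte at offset 0 for EVTX. The function name classify_event_log() is hypothetical; the script below delegates this check to pyevt and pyevtx instead.

```python
import struct


def classify_event_log(header):
    """Return 'evt', 'evtx' or None based on the leading bytes of a file."""
    if header[:8] == b'ElfFile\x00':
        return 'evtx'                  # current format (Vista and later)
    if header[4:8] == b'LfLe':
        return 'evt'                   # legacy format (Windows XP era)
    return None


# Synthetic headers for illustration only
legacy_header = struct.pack('<I', 0x30) + b'LfLe'
current_header = b'ElfFile\x00' + b'\x00' * 8

print(classify_event_log(legacy_header))
print(classify_event_log(current_header))
print(classify_event_log(b'not an event log'))
```

A check like this can be applied to the first few bytes read with read_random() before a temporary copy of the file is written to disk.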

Python Code

Let us see how to use Python code for this purpose:

First, import the following Python libraries -

from __future__ import print_function
import argparse
import unicodecsv as csv
import os
import pytsk3
import pyewf
import pyevt
import pyevtx
import sys
from utility.pytskutil import TSKUtil

Now, provide the arguments for the command-line handler. Note that it accepts three positional arguments: the first is the path to the evidence file, the second is the type of the evidence file, and the third is the name of the event log to process. Two optional arguments set the directory to scan (-d) and enable fuzzy matching of the log name (-f).

if __name__ == "__main__":
   parser = argparse.ArgumentParser('Information from Event Logs')
   parser.add_argument("EVIDENCE_FILE", help = "Evidence file path")
   parser.add_argument("TYPE", help = "Type of Evidence",choices = ("raw", "ewf"))
   parser.add_argument(
      "LOG_NAME",help = "Event Log Name (SecEvent.Evt, SysEvent.Evt, ""etc.)")
   
   parser.add_argument(
      "-d", help = "Event log directory to scan",default = "/WINDOWS/SYSTEM32/WINEVT")
   
   parser.add_argument(
      "-f", help = "Enable fuzzy search for either evt or"" evtx extension", action = "store_true")
   args = parser.parse_args()
   
   if os.path.exists(args.EVIDENCE_FILE) and os.path.isfile(args.EVIDENCE_FILE):
      main(args.EVIDENCE_FILE, args.TYPE, args.LOG_NAME, args.d, args.f)
   else:
      print("[-] Supplied input file {} does not exist or is not a ""file".format(args.EVIDENCE_FILE))
      sys.exit(1)

Now, interact with the event logs by creating our **TSKUtil** object and checking whether the user-supplied path exists. This is done with the help of the **main()** method, as follows:

def main(evidence, image_type, log, win_event, fuzzy):
   tsk_util = TSKUtil(evidence, image_type)
   event_dir = tsk_util.query_directory(win_event)
   
   if event_dir is not None:
      if fuzzy is True:
         event_log = tsk_util.recurse_files(log, path=win_event)
      else:
         event_log = tsk_util.recurse_files(log, path=win_event, logic="equal")
      
      if event_log is not None:
         event_data = []
         for hit in event_log:
            event_file = hit[2]
            temp_evt = write_file(event_file)

Now, we need to perform signature verification. Before that, define a method that writes the entire content of the log to the current directory. The file is opened in binary mode because event logs are binary data:

def write_file(event_file):
   with open(event_file.info.name.name, "wb") as outfile:
      outfile.write(event_file.read_random(0, event_file.info.meta.size))
   return event_file.info.name.name

The remainder of main() continues inside the loop, verifies the signature of each temporary copy, and processes it with the matching library:

            if pyevt.check_file_signature(temp_evt):
               evt_log = pyevt.open(temp_evt)
               print("[+] Identified {} records in {}".format(
                  evt_log.number_of_records, temp_evt))
               
               for i, record in enumerate(evt_log.records):
                  strings = ""
                  for s in record.strings:
                     if s is not None:
                        strings += s + "\n"
                  event_data.append([
                     i, hit[0], record.computer_name,
                     record.user_security_identifier,
                     record.creation_time, record.written_time,
                     record.event_category, record.source_name,
                     record.event_identifier, record.event_type,
                     strings, "",
                     os.path.join(win_event, hit[1].lstrip("//"))
                  ])
            elif pyevtx.check_file_signature(temp_evt):
               evtx_log = pyevtx.open(temp_evt)
               print("[+] Identified {} records in {}".format(
                  evtx_log.number_of_records, temp_evt))
               for i, record in enumerate(evtx_log.records):
                  strings = ""
                  for s in record.strings:
                     if s is not None:
                        strings += s + "\n"
                  event_data.append([
                     i, hit[0], record.computer_name,
                     record.user_security_identifier, "",
                     record.written_time, record.event_level,
                     record.source_name, record.event_identifier,
                     "", strings, record.xml_string,
                     os.path.join(win_event, hit[1].lstrip("//"))
                  ])
            else:
               print("[-] {} not a valid event log. Removing temp file...".format(temp_evt))
               os.remove(temp_evt)
               continue
         write_output(event_data)
      else:
         print("[-] {} Event log not found in {} directory".format(log, win_event))
         sys.exit(3)
   else:
      print("[-] Win XP Event Log Directory {} not found".format(win_event))
      sys.exit(2)

Lastly, define a method that writes the output to a spreadsheet, as follows:

def write_output(data):
   output_name = "parsed_event_logs.csv"
   print("[+] Writing {} to current working directory: {}".format(
      output_name, os.getcwd()))
   
   with open(output_name, "wb") as outfile:
      writer = csv.writer(outfile)
      writer.writerow([
         "Index", "File name", "Computer Name", "SID",
         "Event Create Date", "Event Written Date",
         "Event Category/Level", "Event Source", "Event ID",
         "Event Type", "Data", "XML Data", "File Path"
      ])
      writer.writerows(data)

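After the above script runs successfully, we will get the event log information in a spreadsheet.

Internet History

Internet history is very useful for forensic analysts, since a large share of cyber-crime takes place over the internet. Let us see how to extract internet history from Internet Explorer, as we are discussing Windows forensics and Internet Explorer comes by default with Windows.

In Internet Explorer, the internet history is saved in the **index.dat** file. Let us look at a Python script that extracts the information from the **index.dat** file.

We can follow the steps given below to extract information from **index.dat** files:

First, search for **index.dat** files within the system.
Then, extract the information from each file by iterating through them.
Now, write all this information to a CSV report.

Python Code

Let us see how to use Python code for this purpose:

First, import the following Python libraries -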

from __future__ import print_function
import argparse

from datetime import datetime, timedelta
import os
import pytsk3
import pyewf
import pymsiecf
import sys
import unicodecsv as csv

from utility.pytskutil import TSKUtil

Now, provide the arguments for the command-line handler. Note that it accepts two positional arguments: the first is the path to the evidence file and the second is the type of the evidence file. An optional argument (-d) sets the directory to scan for index.dat files.

if __name__ == "__main__":
   parser = argparse.ArgumentParser('getting information from internet history')
   parser.add_argument("EVIDENCE_FILE", help = "Evidence file path")
   parser.add_argument("TYPE", help = "Type of Evidence",choices = ("raw", "ewf"))
   parser.add_argument("-d", help = "Index.dat directory to scan",default = "/USERS")
   args = parser.parse_args()
   
   if os.path.exists(args.EVIDENCE_FILE) and os.path.isfile(args.EVIDENCE_FILE):
      main(args.EVIDENCE_FILE, args.TYPE, args.d)
   else:
      print("[-] Supplied input file {} does not exist or is not a ""file".format(args.EVIDENCE_FILE))
      sys.exit(1)

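Now, interpret the evidence file by creating a **TSKUtil** object and iterate through the file system to find index.dat files. This is done by defining the **main()** function as follows: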

def main(evidence, image_type, path):
   tsk_util = TSKUtil(evidence, image_type)
   index_dir = tsk_util.query_directory(path)
   
   if index_dir is not None:
      index_files = tsk_util.recurse_files("index.dat", path = path,logic = "equal")
      
      if index_files is not None:
         print("[+] Identified {} potential index.dat files".format(len(index_files)))
         index_data = []
         
         for hit in index_files:
            index_file = hit[2]
            temp_index = write_file(index_file)

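Now, define a function that copies the content of an index.dat file to the current working directory, where it can later be processed by the third-party module. The file is opened in binary mode because index.dat is binary data:

def write_file(index_file):
   with open(index_file.info.name.name, "wb") as outfile:
      outfile.write(index_file.read_random(0, index_file.info.meta.size))
   return index_file.info.name.name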
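
Writing every hit to the current working directory under its original name means that files with the same name, such as several index.dat files, overwrite one another. A minimal alternative sketch follows, assuming a uniquely named temporary file is acceptable. The function name write_temp_copy() and the sample payload are hypothetical and are not part of the original script.

```python
import os
import tempfile


def write_temp_copy(content, suffix='.dat'):
    """Write bytes to a uniquely named temporary file and return its path."""
    handle, temp_path = tempfile.mkstemp(suffix=suffix)
    # Binary mode keeps the evidence bytes unchanged on every platform
    with os.fdopen(handle, 'wb') as outfile:
        outfile.write(content)
    return temp_path


payload = b'Client UrlCache \x00\x01\x02 sample bytes'
temp_path = write_temp_copy(payload)
try:
    with open(temp_path, 'rb') as infile:
        round_trip = infile.read()
finally:
    os.remove(temp_path)   # always clean up the working copy
print(round_trip == payload)
```

In the script, the content would come from read_random(0, size) on the pytsk file object, and the returned path would be passed to the parsing library.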

Now, perform the signature validation with the help of the built-in function **check_file_signature()**. The following code continues inside the loop of main(); it parses each record, removes the temporary copy, and writes the report once all files have been processed:

            if pymsiecf.check_file_signature(temp_index):
               index_dat = pymsiecf.open(temp_index)
               print("[+] Identified {} records in {}".format(
                  index_dat.number_of_items, temp_index))
               
               for i, record in enumerate(index_dat.items):
                  try:
                     data = record.data
                     if data is not None:
                        data = data.rstrip("\x00")
                  except AttributeError:
                     # Redirected and leak records carry no data attribute
                     if isinstance(record, pymsiecf.redirected):
                        index_data.append([
                           i, temp_index, "", "", "", "", "",record.location, "", "", record.offset,os.path.join(path, hit[1].lstrip("//"))])
                     
                     elif isinstance(record, pymsiecf.leak):
                        index_data.append([
                           i, temp_index, record.filename, "","", "", "", "", "", "", record.offset,os.path.join(path, hit[1].lstrip("//"))])
                     continue
                  
                  index_data.append([
                     i, temp_index, record.filename,
                     record.type, record.primary_time,
                     record.secondary_time,
                     record.last_checked_time, record.location,
                     record.number_of_hits, data, record.offset,
                     os.path.join(path, hit[1].lstrip("//"))
                  ])
            else:
               print("[-] {} not a valid index.dat file. Removing "
                  "temp file..".format(temp_index))
               os.remove(temp_index)
               continue
            os.remove(temp_index)
         write_output(index_data)
      else:
         print("[-] Index.dat files not found in {} directory".format(path))
         sys.exit(3)
   else:
      print("[-] Directory {} not found".format(path))
      sys.exit(2)

Now, define a method that writes the output to a CSV file, as shown below:

def write_output(data):
   output_name = "Internet_Indexdat_Summary_Report.csv"
   print("[+] Writing {} with {} parsed index.dat files to current "
   "working directory: {}".format(output_name, len(data),os.getcwd()))
   
   with open(output_name, "wb") as outfile:
      writer = csv.writer(outfile)
      writer.writerow(["Index", "File Name", "Record Name",
      "Record Type", "Primary Date", "Secondary Date",
      "Last Checked Date", "Location", "No. of Hits",
      "Record Data", "Record Offset", "File Path"])
      writer.writerows(data)

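After running the above script, we obtain the information from the index.dat files in a CSV file.

Volume Shadow Copies

Volume Shadow Copy is a technology included in Windows for taking backup copies or snapshots of computer files, either manually or automatically. It is also called Volume Snapshot Service or Volume Shadow Service (VSS).

With the help of these VSS files, forensic experts can recover historical information about how the system changed over time and which files existed on the computer. Shadow copy technology requires the file system to be NTFS in order to create and store shadow copies.

In this section, we will look at a Python script that helps in accessing any volume of shadow copies present in a forensic image.

For this Python script, we need to install the following third-party modules: **pytsk3, pyewf, unicodecsv, pyvshadow** and **vss**. We can follow the steps below to extract information from VSS files:

First, access the volume of the raw image and identify all NTFS partitions.
Then, extract information from the shadow copies by iterating through them.
Finally, create a file listing of the data within the snapshots.

Python Code

Let us see how to use Python code for this purpose:

First, import the following Python libraries: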

from __future__ import print_function
import argparse
from datetime import datetime, timedelta

import os
import pytsk3
import pyewf
import pyvshadow
import sys
import unicodecsv as csv

from utility import vss
from utility.pytskutil import TSKUtil
from utility import pytskutil

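Now, provide arguments for the command-line handler. It accepts two arguments: the first is the path to the evidence file and the second is the output file.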

if __name__ == "__main__":
   parser = argparse.ArgumentParser('Parsing Shadow Copies')
   parser.add_argument("EVIDENCE_FILE", help = "Evidence file path")
   parser.add_argument("OUTPUT_CSV", help = "Output CSV with VSS file listing")
   args = parser.parse_args()

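Now, validate that the input file path exists, and separate the directory from the output file path so that it can be created if necessary.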

   # Create the output directory if needed, then validate the evidence path
   directory = os.path.dirname(args.OUTPUT_CSV)
   if not os.path.exists(directory) and directory != "":
      os.makedirs(directory)
   if os.path.exists(args.EVIDENCE_FILE) and \
         os.path.isfile(args.EVIDENCE_FILE):
      main(args.EVIDENCE_FILE, args.OUTPUT_CSV)
   else:
      print("[-] Supplied input file {} does not exist or is not a "
         "file".format(args.EVIDENCE_FILE))
      sys.exit(1)

Now, interact with the volume of the evidence file by creating a **TSKUtil** object. This can be done with the **main()** method as follows:

def main(evidence, output):
   tsk_util = TSKUtil(evidence, "raw")
   img_vol = tsk_util.return_vol()

   if img_vol is not None:
      for part in img_vol:
         if tsk_util.detect_ntfs(img_vol, part):
            print("Exploring NTFS Partition for VSS")
            # Byte offset of the partition = start sector * sector size
            explore_vss(evidence, part.start * img_vol.info.block_size, output)
   else:
      print("[-] Must be a physical preservation to be compatible "
         "with this script")
      sys.exit(2)

Now, define a method for exploring the parsed volume shadow copies, as follows:

def explore_vss(evidence, part_offset, output):
   vss_volume = pyvshadow.volume()
   vss_handle = vss.VShadowVolume(evidence, part_offset)
   vss_count = vss.GetVssStoreCount(evidence, part_offset)
   
   if vss_count > 0:
      vss_volume.open_file_object(vss_handle)
      vss_data = []
      
      for x in range(vss_count):
         print("Gathering data for VSC {} of {}".format(x, vss_count))
         vss_store = vss_volume.get_store(x)
         image = vss.VShadowImgInfo(vss_store)
         vss_data.append(pytskutil.openVSSFS(image, x))
      write_csv(vss_data, output)

Finally, define a method that writes the results to a spreadsheet, as follows:

def write_csv(data, output):
   if data == []:
      print("[-] No output results to write")
      sys.exit(3)
   print("[+] Writing output to {}".format(output))
   # Write the header row only when a new file is being created
   append = os.path.exists(output)
   with open(output, "ab") as csvfile:
      csv_writer = csv.writer(csvfile)
      headers = ["VSS", "File", "File Ext", "File Type", "Create Date",
         "Modify Date", "Change Date", "Size", "File Path"]
      if not append:
         csv_writer.writerow(headers)
      for result_list in data:
         csv_writer.writerows(result_list)
After this Python script runs successfully, the information residing in the VSS is written to a spreadsheet.
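
The header-once behaviour of an appending CSV writer can be checked in isolation. The following is a minimal sketch, not the script's own code: it uses the standard library csv module in Python 3 text mode instead of unicodecsv, and the helper name append_rows, the file name and the sample rows are all hypothetical.

```python
import csv
import os
import tempfile

def append_rows(path, headers, rows):
    """Append rows to a CSV file, writing the header only for a new file."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as csvfile:
        writer = csv.writer(csvfile)
        if write_header:
            writer.writerow(headers)
        writer.writerows(rows)

# Hypothetical report path inside a throwaway directory
report = os.path.join(tempfile.mkdtemp(), "vss_listing.csv")
append_rows(report, ["VSS", "File"], [[0, "a.txt"]])
append_rows(report, ["VSS", "File"], [[1, "b.txt"]])

with open(report, newline="") as csvfile:
    lines = list(csv.reader(csvfile))
print(lines)
```

Calling the helper twice against the same path yields a single header row followed by both data rows, which is the behaviour the append flag is meant to provide.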

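Investigation of Log-Based Artifacts

So far, we have seen how to obtain artifacts in Windows using Python. In this chapter, let us learn how to investigate log-based artifacts using Python.

Introduction

Log-based artifacts are a treasure trove of information and are very useful to a digital forensic expert. Although various monitoring tools exist for collecting this information, the main difficulty in extracting anything useful from them is the large volume of data that has to be parsed.

Various Log-Based Artifacts and Their Investigation in Python

In this section, let us discuss various log-based artifacts and how to investigate them in Python:

Timestamps

A timestamp conveys the date and time of an activity in a log. It is one of the most important elements of any log file. Note that these date and time values can come in various formats.

The Python script shown below takes a raw date-time value as input and produces a formatted timestamp as output.

For this script, we need to follow these steps:

First, set up the arguments that take the raw data value along with its source and data type.
Then, provide a class that offers a common interface for data across different date formats.

Python Code

Let us see how to use Python code for this purpose:

First, import the following Python modules: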

from __future__ import print_function
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
from datetime import datetime as dt
from datetime import timedelta
import sys  # needed for the sys.exit() calls in ParseDate

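Now, as usual, we need to provide arguments for the command-line handler. It accepts three arguments: the first is the date value to be processed, the second is the source of that date value, and the third is its type: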

if __name__ == '__main__':
   parser = ArgumentParser('Timestamp Log-based artifact')
   parser.add_argument("date_value", help="Raw date value to parse")
   parser.add_argument(
      "source", help = "Source format of date",choices = ParseDate.get_supported_formats())
   parser.add_argument(
      "type", help = "Data type of input value",choices = ('number', 'hex'),
      nargs = '?', default = 'number')  # optional; falls back to 'number'
   
   args = parser.parse_args()
   date_parser = ParseDate(args.date_value, args.source, args.type)
   date_parser.run()
   print(date_parser.timestamp)

Now, we need to define a class that accepts the date value, the date source and the value type as arguments:

class ParseDate(object):
   def __init__(self, date_value, source, data_type):
      self.date_value = date_value
      self.source = source
      self.data_type = data_type
      self.timestamp = None

Now, we will define a method of this class that acts as a controller, much like the **main()** method:

   def run(self):
      if self.source == 'unix-epoch':
         self.parse_unix_epoch()
      elif self.source == 'unix-epoch-ms':
         self.parse_unix_epoch(True)
      elif self.source == 'windows-filetime':
         self.parse_windows_filetime()

   @classmethod
   def get_supported_formats(cls):
      return ['unix-epoch', 'unix-epoch-ms', 'windows-filetime']

Now, we need to define two methods of the class that handle Unix epoch time and FILETIME respectively:

   def parse_unix_epoch(self, milliseconds=False):
      if self.data_type == 'hex':
         # Hexadecimal input must be parsed with base 16
         conv_value = int(self.date_value, 16)
         if milliseconds:
            conv_value = conv_value / 1000.0
      elif self.data_type == 'number':
         conv_value = float(self.date_value)
         if milliseconds:
            conv_value = conv_value / 1000.0
      else:
         print("Unsupported data type '{}' provided".format(self.data_type))
         sys.exit(1)
      # fromtimestamp() renders the value in the local time zone
      ts = dt.fromtimestamp(conv_value)
      self.timestamp = ts.strftime('%Y-%m-%d %H:%M:%S.%f')

   def parse_windows_filetime(self):
      # FILETIME counts 100-nanosecond intervals since 1601-01-01
      if self.data_type == 'hex':
         microseconds = int(self.date_value, 16) / 10.0
      elif self.data_type == 'number':
         microseconds = float(self.date_value) / 10
      else:
         print("Unsupported data type '{}' provided".format(self.data_type))
         sys.exit(1)
      ts = dt(1601, 1, 1) + timedelta(microseconds=microseconds)
      self.timestamp = ts.strftime('%Y-%m-%d %H:%M:%S.%f')

After running the above script and supplying a timestamp, we get the converted value in an easy-to-read format.
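
As a quick sanity check of the two conversions, the following minimal sketch works through known reference values. The function names are hypothetical, and unlike the script it performs plain epoch arithmetic, so the results are in UTC rather than local time.

```python
from datetime import datetime, timedelta

FILETIME_EPOCH = datetime(1601, 1, 1)  # start of the Windows FILETIME scale
UNIX_EPOCH = datetime(1970, 1, 1)      # start of the Unix epoch

def filetime_to_datetime(value):
    """Convert a count of 100-nanosecond intervals since 1601 to a datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=value // 10)

def unix_to_datetime(value, milliseconds=False):
    """Convert seconds (or milliseconds) since 1970 to a datetime."""
    seconds = value / 1000.0 if milliseconds else value
    return UNIX_EPOCH + timedelta(seconds=seconds)

# 116444736000000000 is the FILETIME value of the Unix epoch itself
print(filetime_to_datetime(116444736000000000))   # 1970-01-01 00:00:00
print(unix_to_datetime(86400))                    # 1970-01-02 00:00:00
print(unix_to_datetime(1500, milliseconds=True))  # 1970-01-01 00:00:01.500000
```

Dividing the FILETIME value by 10 turns 100-nanosecond intervals into microseconds, which is the finest unit timedelta supports.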

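Web Server Logs

From a digital forensic expert's point of view, web server logs are another important artifact, because they can yield useful user statistics along with information about users and their geographic locations. The following Python script processes web server logs and creates a spreadsheet so that the information can be analyzed easily.

First, we need to import the following Python modules: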

from __future__ import print_function
from argparse import ArgumentParser, FileType

import re
import shlex
import logging
import sys
import csv

logger = logging.getLogger(__file__)

Now, we need to define the patterns that will be parsed from the logs:

iis_log_format = [
   ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
   ("time", re.compile(r"\d\d:\d\d:\d\d")),
   ("s-ip", re.compile(
      r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}")),
   ("cs-method", re.compile(
      r"(GET)|(POST)|(PUT)|(DELETE)|(OPTIONS)|(HEAD)|(CONNECT)")),
   ("cs-uri-stem", re.compile(r"([A-Za-z0-9/\.-]*)")),
   ("cs-uri-query", re.compile(r"([A-Za-z0-9/\.-]*)")),
   ("s-port", re.compile(r"\d*")),
   ("cs-username", re.compile(r"([A-Za-z0-9/\.-]*)")),
   ("c-ip", re.compile(
      r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}")),
   ("cs(User-Agent)", re.compile(r".*")),
   ("sc-status", re.compile(r"\d*")),
   ("sc-substatus", re.compile(r"\d*")),
   ("sc-win32-status", re.compile(r"\d*")),
   ("time-taken", re.compile(r"\d*"))]
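
To see how such a list of (column name, pattern) pairs is applied, here is a minimal sketch that validates one log line field by field. The shortened field list, the stricter anchored patterns, the helper name parse_line and the sample line are all assumptions made for illustration; they are not taken from a real IIS log.

```python
import re

# Hypothetical, shortened field list; patterns are anchored with $ so that
# a value only passes when the whole field matches
fields = [
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}$")),
    ("time", re.compile(r"\d{2}:\d{2}:\d{2}$")),
    ("s-ip", re.compile(r"(\d{1,3}\.){3}\d{1,3}$")),
    ("cs-method", re.compile(r"(GET|POST|PUT|DELETE|OPTIONS|HEAD|CONNECT)$")),
    ("cs-uri-stem", re.compile(r"\S+$")),
]

def parse_line(line):
    """Pair each space-separated value with its column and keep valid ones."""
    entry = {}
    for (name, pattern), value in zip(fields, line.split(" ")):
        if pattern.match(value):
            entry[name] = value
    return entry

# Made-up sample line in the same column order as the field list
sample = "2019-03-01 12:30:45 192.168.1.10 GET /index.html"
entry = parse_line(sample)
print(entry)

# A value that fails its pattern is simply left out of the result
bad_entry = parse_line("yesterday 12:30:45 192.168.1.10 GET /index.html")
print(sorted(bad_entry))
```

Because the columns are matched by position, the order of the pairs has to follow the order of the fields in the log.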

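Now, provide arguments for the command-line handler. It accepts two positional arguments, the IIS log to be processed and the desired CSV file path, plus an optional -l argument for the path of the processing log.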

if __name__ == '__main__':
   parser = ArgumentParser('Parsing Server Based Logs')
   parser.add_argument('iis_log', help = "Path to IIS Log",type = FileType('r'))
   parser.add_argument('csv_report', help = "Path to CSV report")
   parser.add_argument('-l', dest = 'log', help = "Path to processing log",default=__name__ + '.log')
   args = parser.parse_args()
   logger.setLevel(logging.DEBUG)
   msg_fmt = logging.Formatter(
      "%(asctime)-15s %(funcName)-10s ""%(levelname)-8s %(message)s")
   
   strhndl = logging.StreamHandler(sys.stdout)
   strhndl.setFormatter(fmt = msg_fmt)
   fhndl = logging.FileHandler(args.log, mode = 'a')
   fhndl.setFormatter(fmt = msg_fmt)
   
   logger.addHandler(strhndl)
   logger.addHandler(fhndl)
   logger.info("Starting IIS Parsing ")
   logger.debug("Supplied arguments: {}".format(", ".join(sys.argv[1:])))
   logger.debug("System " + sys.platform)
   logger.debug("Version " + sys.version)
   main(args.iis_log, args.csv_report, logger)
   logger.info("IIS Parsing Complete")
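
The two-handler logging setup can be tried without touching the file system. The sketch below is an illustration under stated assumptions: an in-memory buffer stands in for the log file, and the logger name and message values are hypothetical.

```python
import io
import logging

logger = logging.getLogger("iis_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False                # keep the demo output self-contained

msg_fmt = logging.Formatter(
    "%(asctime)-15s %(funcName)-10s %(levelname)-8s %(message)s")

buffer = io.StringIO()  # stands in for the processing log file
for handler in (logging.StreamHandler(), logging.StreamHandler(buffer)):
    handler.setFormatter(msg_fmt)
    logger.addHandler(handler)

logger.info("Starting IIS Parsing")
logger.debug("Supplied arguments: %s", "access.log, report.csv")

# Every record reaches both handlers with the same format
captured = buffer.getvalue().splitlines()
print(len(captured))
```

Attaching the same formatter to both handlers keeps the console output and the stored log identical, which makes a processing log easier to compare with what the examiner saw at run time.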

Now, we need to define the **main()** method, which handles the bulk of the log processing:

def main(iis_log, report_file, logger):
   parsed_logs = []

   for raw_line in iis_log:
      line = raw_line.strip()
      log_entry = {}

      # Skip comment/header lines and blank lines
      if line.startswith("#") or len(line) == 0:
         continue

      # Quoted fields may contain spaces, so split them shell-style
      if '\"' in line:
         line_iter = shlex.split(line)
      else:
         line_iter = line.split(" ")

      for count, split_entry in enumerate(line_iter):
         col_name, col_pattern = iis_log_format[count]

         if col_pattern.match(split_entry):
            log_entry[col_name] = split_entry
         else:
            logger.error("Unknown column pattern discovered. "
               "Line preserved in full below")
            logger.error("Unparsed Line: {}".format(line))
      parsed_logs.append(log_entry)

   logger.info("Parsed {} lines".format(len(parsed_logs)))
   cols = [x[0] for x in iis_log_format]

   logger.info("Creating report file: {}".format(report_file))
   write_csv(report_file, cols, parsed_logs)
   logger.info("Report created")

Finally, we need to define a method that writes the output to a spreadsheet:

def write_csv(outfile, fieldnames, data):
   with open(outfile, 'w', newline="") as open_outfile:
      csvfile = csv.DictWriter(open_outfile, fieldnames)
      csvfile.writeheader()
      csvfile.writerows(data)

After running the above script, we get the web server logs in a spreadsheet.
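
A DictWriter copes with entries that lack some columns, which matters here because a field that fails validation is never added to the dictionary. The following minimal sketch shows this with an in-memory buffer; the helper name rows_to_csv and the sample rows are hypothetical.

```python
import csv
import io

def rows_to_csv(fieldnames, data):
    """Render a list of dictionaries as CSV text, header row first."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames)
    writer.writeheader()
    writer.writerows(data)
    return buffer.getvalue()

cols = ["date", "cs-method", "sc-status"]
parsed = [
    {"date": "2019-03-01", "cs-method": "GET", "sc-status": "200"},
    {"date": "2019-03-02", "cs-method": "POST"},  # sc-status missing
]
report = rows_to_csv(cols, parsed)
print(report)
```

The second row ends with an empty cell, so an incomplete entry still lines up with the header instead of shifting columns.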

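Scanning Important Files Using YARA

YARA (Yet Another Recursive Acronym) is a pattern-matching utility designed for malware identification and incident response. In the following Python script, we will use YARA to scan files.

We can install the YARA bindings for Python with the following command:

pip install yara-python

We can follow the steps below to scan files with YARA rules:

First, set up and compile the YARA rules.
Then, scan a single file, or iterate through a directory to process each file in it.
Finally, export the results to CSV.

Python Code

Let us see how to use Python code for this purpose:

First, we need to import the following Python modules: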

from __future__ import print_function
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter

import os
import csv
import yara

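Next, provide arguments for the command-line handler. Note that it accepts two positional arguments, the path to the YARA rules and the file or folder to be scanned, plus an optional --output argument for a CSV report.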

if __name__ == '__main__':
   parser = ArgumentParser('Scanning files by YARA')
   parser.add_argument(
      'yara_rules',help = "Path to Yara rule to scan with. May be file or folder path.")
   parser.add_argument('path_to_scan',help = "Path to file or folder to scan")
   parser.add_argument('--output',help = "Path to output a CSV report of scan results")
   args = parser.parse_args()
   main(args.yara_rules, args.path_to_scan, args.output)

Now, we will define the **main()** function, which accepts the path to the YARA rules, the path to scan and the optional output file:

def main(yara_rules, path_to_scan, output):
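   if os.path.isdir(yara_rules):
      # Compile every rule file in the folder, using the file name as its namespace
      rule_files = {
         name: os.path.join(yara_rules, name)
         for name in sorted(os.listdir(yara_rules))
         if os.path.isfile(os.path.join(yara_rules, name))}
      yrules = yara.compile(filepaths=rule_files)
   else:
      yrules = yara.compile(filepath=yara_rules)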
   if os.path.isdir(path_to_scan):
      match_info = process_directory(yrules, path_to_scan)
   else:
      match_info = process_file(yrules, path_to_scan)
   columns = ['rule_name', 'hit_value', 'hit_offset', 'file_name',
   'rule_string', 'rule_tag']
   
   if output is None:
      write_stdout(columns, match_info)
   else:
      write_csv(output, columns, match_info)

Now, define a method that iterates through the directory and passes each file to another method for further processing:

def process_directory(yrules, folder_path):
   match_info = []
   for root, _, files in os.walk(folder_path):
      for entry in files:
         file_entry = os.path.join(root, entry)
         match_info += process_file(yrules, file_entry)
   return match_info
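
The walk-and-collect pattern can be exercised without YARA installed. In the minimal sketch below, a plain byte-string search stands in for the rule matching step, so it is not a substitute for YARA; the function names, the search term and the demo files are hypothetical.

```python
import os
import tempfile

def scan_file(file_path, needle=b"malware"):
    """Stand-in for rule matching: report each offset of a byte string."""
    with open(file_path, "rb") as infile:
        data = infile.read()
    hits = []
    offset = data.find(needle)
    while offset != -1:
        hits.append({"file_name": file_path, "hit_offset": offset})
        offset = data.find(needle, offset + 1)
    return hits

def scan_directory(folder_path):
    """Walk the folder tree and collect the hits from every file."""
    results = []
    for root, _, files in os.walk(folder_path):
        for entry in sorted(files):
            results += scan_file(os.path.join(root, entry))
    return results

# Build a throwaway directory tree to exercise the walk
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "sub"))
with open(os.path.join(demo_dir, "clean.txt"), "wb") as outfile:
    outfile.write(b"nothing to see")
with open(os.path.join(demo_dir, "sub", "bad.bin"), "wb") as outfile:
    outfile.write(b"xxmalwareyymalware")

results = scan_directory(demo_dir)
for hit in results:
    print(hit["file_name"], hit["hit_offset"])
```

Returning a list from the per-file function, even when it is empty, lets the directory walker concatenate results without special cases.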

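Next, define two functions. The first applies the **match()** method of the **yrules** object to a file, and the second reports the match information to the console when the user has not specified an output file. Observe the code shown below: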

def process_file(yrules, file_path):
   match = yrules.match(file_path)
   match_info = []
   
   for rule_set in match:
      # Assumes each hit is an (offset, identifier, data) tuple, as in
      # yara-python releases before 4.3; newer releases return match objects
      for hit in rule_set.strings:
         match_info.append({
            'file_name': file_path,
            'rule_name': rule_set.rule,
            'rule_tag': ",".join(rule_set.tags),
            'hit_offset': hit[0],
            'rule_string': hit[1],
            'hit_value': hit[2]
         })
   return match_info

def write_stdout(columns, match_info):
   for entry in match_info:
      for col in columns:
         print("{}: {}".format(col, entry[col]))
      # Separator line after each match entry
      print("=" * 30)

Finally, we will define a method that writes the output to a CSV file, as shown below:

def write_csv(outfile, fieldnames, data):
   with open(outfile, 'w', newline="") as open_outfile:
      csvfile = csv.DictWriter(open_outfile, fieldnames)
      csvfile.writeheader()
      csvfile.writerows(data)

Once the script is in place, we can run it with appropriate command-line arguments to generate a CSV report.
