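Llama Quick Guide

Llama - Environment Setup

Setting up an environment for Llama involves a few key steps: installing Python and the required libraries, and configuring your IDE for productive development. Once these steps are complete, you will have a working environment for building with Llama, whether you are developing NLP models or experimenting with text generation.

Let's install the dependencies and configure the IDE so that the code in this guide runs correctly.

Installing Dependencies

Before writing any code, make sure every prerequisite is installed. Llama relies on several libraries and packages for natural language processing and other AI tasks.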

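Step 1: Install Python

First, make sure Python is installed on your machine. Llama requires Python 3.8 or later. If Python is not installed, you can download it from the official Python website.

Step 2: Install PIP

You also need PIP, the Python package installer. To check whether PIP is installed, run: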

pip --version

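If PIP is not installed, you can install it with the following command:

python -m ensurepip --upgrade

Step 3: Set Up a Virtual Environment

Using a virtual environment to isolate your project's dependencies is essential.

Install virtualenv:

pip install virtualenv

Create a virtual environment for your Llama project:

virtualenv Llama_env

Activate the virtual environment:

Windows

Llama_env\Scripts\activate

Mac/Linux

source Llama_env/bin/activate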
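
To confirm that the virtual environment is actually active before installing anything, a quick check along the following lines can help. This is a minimal sketch that uses only the standard library; the helper name in_virtualenv is an illustrative choice and not part of any Llama tooling.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside an environment, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    # Very old virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("Python executable:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints False, activate Llama_env again before running the pip commands in the next step.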

Step 4: Install the Libraries

Llama needs several Python libraries. To install them, run the following command in your terminal.

pip install torch transformers datasets

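These libraries are:

torch - deep learning operations.
transformers - pretrained models.
datasets - working with large datasets.

To check the installation, try importing the libraries in Python.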

import torch
import transformers
import datasets

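If no error message appears, the installation is complete.

Setting Up Python and the Libraries

With the dependencies in place, verify that Python and the libraries needed for Llama work together.

Step 1: Verify the Python Installation

Open a Python interpreter and run the following code to verify that Python and the required libraries are installed: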

import torch
import transformers

print(f"PyTorch version: {torch.__version__}")
print(f"Transformers version: {transformers.__version__}")

輸出

PyTorch version: 1.12.1
Transformers version: 4.30.0

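Step 2: Install Additional Libraries (Optional)

Depending on your use case, you may need some additional libraries. The following are optional but often useful:

scikit-learn - machine learning models and metrics.
matplotlib - visualization.
numpy - scientific computing.

Install them with the following command: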

pip install scikit-learn matplotlib numpy

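Step 3: Test the Setup with a Small Model

We load a small pretrained model to check that everything works. The example uses EleutherAI/gpt-neo-125M, a small GPT-style model rather than a Llama checkpoint, because it is quick to download and needs no access token.

from transformers import pipeline

# Load a small text-generation model (GPT-Neo 125M, a lightweight stand-in for Llama)
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-125M')

# Generate text
output = generator("Llama is a large language model", max_length=50, num_return_sequences=1)
print(output)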

輸出

[{'generated_text': 'Llama is a large language model, and it is a language 
model that is used to describe the language of the world. The language model
is a language model that is used to describe the language of the world. 
The language model is a language'}]

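This shows that the configuration is correct, and we can now start using Llama models in our applications.

Configuring Your IDE

Choosing a suitable IDE and configuring it properly makes development much smoother.

Step 1: Choose an IDE

Two of the most popular IDEs for Python are:

Visual Studio Code (VS Code)
PyCharm

This tutorial uses VS Code because it is lightweight and has a large ecosystem of Python extensions.

Step 2: Install the Python Extension for VS Code

To develop Python in VS Code, you need the Python extension, which can be installed directly from the Extensions view.

Open VS Code.
Open the Extensions view by clicking the Extensions icon or pressing Ctrl + Shift + X.
Search for "Python" and install the official extension from Microsoft.

Step 3: Configure the Python Interpreter

Set the Python interpreter to the virtual environment created earlier:

Press Ctrl + Shift + P to open the Command Palette.
Run "Python: Select Interpreter" and choose the interpreter from the virtual environment, in this case the one inside Llama_env.

Step 4: Create a Python File

With the interpreter selected, create a new Python file and save it under any name you like (for example, Llama_test.py). The following code loads and runs a text generation model: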

from transformers import pipeline
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-125M')
# Text generation
output = generator("Llama is a large language model", max_length=50, num_return_sequences=1)
print(output)

The code is written in the IDE, and when it runs in the configured Python environment, the result appears in the terminal.

輸出

[{'generated_text': 'Llama is a large language model, and it is a language
model that is used to describe the language of the world. The language 
model is a language model that is used to describe the language of 
the world. The language model is a language'}]

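Step 5: Run the Code

To run the code:

Right-click the Python file and select "Run Python File in Terminal".
By default, the output appears in the integrated terminal.

Step 6: Debugging in VS Code

VS Code has strong debugging support. Click to the left of a line number to set a breakpoint, then press F5 to start debugging. You can then step through the code and inspect variables.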

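Llama Quick Start

Llama stands for Large Language Model Meta AI. Created by Meta AI, it builds on the Transformer architecture with refinements aimed at harder natural language processing problems. Llama generates human-like text and supports tasks such as text generation, translation, and summarization.

Llama is designed to reach strong performance with far fewer parameters than comparable models such as GPT-3. This makes it cheaper to run, easier to scale, and accessible to a wider range of users.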

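Llama Architecture Overview

The Transformer is the base architecture of Llama. It was introduced by Vaswani et al. in the paper "Attention Is All You Need". Llama uses it as an autoregressive model, which means it generates one token at a time, predicting the next token from the sequence seen so far.

The key features of the Llama architecture are:

Efficient training - Llama is designed to be trained and run efficiently at comparatively small model sizes, which makes it well suited to research and applications with limited compute.
Autoregressive structure - It generates tokens one by one, and each new token is conditioned on all previous tokens, which keeps the generated text coherent.
Multi-head self-attention - The attention mechanism assigns different weights to the words in a sentence according to their importance, so the model captures both local and global context in the input.
Stacked Transformer layers - Llama stacks many Transformer blocks, each consisting of a self-attention mechanism and a feed-forward neural network.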
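
The autoregressive structure described above can be sketched without any model at all. The example below is a toy illustration only: the lookup table NEXT_TOKEN and the function generate are invented for this sketch, and the table looks only at the previous token, whereas a real model such as Llama predicts the next token from the whole preceding sequence.

```python
from typing import List

# Toy next-token table: maps the previous token to its most likely successor.
# A real model computes this prediction from the entire preceding sequence.
NEXT_TOKEN = {
    "<s>": "the",
    "the": "future",
    "future": "is",
    "is": "bright",
    "bright": "</s>",
}


def generate(start: str = "<s>", max_new_tokens: int = 10) -> List[str]:
    """Greedy autoregressive decoding: append one token at a time."""
    sequence = [start]
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(sequence[-1], "</s>")
        sequence.append(next_token)
        if next_token == "</s>":  # stop at the end-of-sequence marker
            break
    return sequence


print(generate())
# prints ['<s>', 'the', 'future', 'is', 'bright', '</s>']
```

The loop shape is the important part: each step reads what has been generated so far, picks one token, and appends it before the next step runs.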

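Why Llama?

Llama strikes a reasonable balance between model capacity and computational efficiency. It can generate long, coherent text and handle a wide range of tasks, from question answering and summarization to translation. Because Llama models are smaller and cheaper to run than some other large language models such as GPT-3, more people can work with them.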

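Llama Variants

Llama comes in several versions that differ in the number of parameters:

Llama-7B - 7 billion parameters
Llama-13B - 13 billion parameters
Llama-30B - 30 billion parameters
Llama-65B - 65 billion parameters

This lets users pick the variant that fits their hardware and the requirements of their task.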
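
When matching a variant to your hardware, a rough estimate of the memory needed just to hold the weights is a useful starting point. The sketch below assumes 16-bit weights (two bytes per parameter) and uses the parameter counts from the list above; the names VARIANTS and weight_memory_gb are illustrative. Activations, the attention cache, and optimizer state during training all add to this figure.

```python
# Parameter counts taken from the variant list above.
VARIANTS = {
    "Llama-7B": 7e9,
    "Llama-13B": 13e9,
    "Llama-30B": 30e9,
    "Llama-65B": 65e9,
}

BYTES_PER_PARAM_FP16 = 2  # a 16-bit float occupies two bytes


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, params in VARIANTS.items():
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights in fp16")
```

By this estimate the 7B variant needs about 14 GB for its weights in fp16, which is why it is the usual choice for a single consumer GPU or for quantized inference.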

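Understanding the Model's Components

Llama is built from a handful of critical components. Let's go through each one and see how they work together to produce the model's overall behavior.

Embedding Layer

Llama's embedding layer maps input tokens to high-dimensional vectors, which capture the semantic relationships between words. The intuition is that, in a continuous vector space, semantically similar tokens end up close to each other.

The embedding layer also prepares the input for the Transformer layers that follow by producing vectors of the dimension those layers expect.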

import torch
import torch.nn as nn
# Embedding layer
embedding = nn.Embedding(num_embeddings=10000, embedding_dim=256)
# Tokenized input (for example: "The future is bright")
input_tokens = torch.LongTensor([2, 45, 103, 567])
# Output embedding
embedding_output = embedding(input_tokens)
print(embedding_output)

輸出

tensor([[-0.4185, -0.5514, -0.8762,  ...,  0.7456,  0.2396,  2.4756],
        [ 0.7882,  0.8366,  0.1050,  ...,  0.2018, -0.2126,  0.7039],
        [ 0.3088, -0.3697,  0.1556,  ..., -0.9751, -0.0777, -1.3352],
        [ 0.7220, -0.7661,  0.2614,  ...,  1.2152,  1.6356,  0.6806]],
       grad_fn=<EmbeddingBackward0>)

This embedding representation lets the model reason about how tokens relate to one another.
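
The idea that similar tokens sit close together is usually measured with cosine similarity. The sketch below uses three hand-written 3-dimensional vectors that are invented purely for illustration (real Llama embeddings have thousands of dimensions and are learned during training).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, made up for this example
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.95]

print(f"cat vs kitten: {cosine_similarity(cat, kitten):.3f}")
print(f"cat vs car:    {cosine_similarity(cat, car):.3f}")
```

The first score is much higher than the second, which is the behavior a trained embedding layer is expected to show for related and unrelated words.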

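Self-Attention Mechanism

Self-attention is the part of the Transformer that lets Llama look at the whole sentence and work out how each word relates to every other word. Llama uses multi-head attention, which splits the attention mechanism into several heads so that the model can attend to different parts of the input sequence at once.

To do this, the model builds query, key, and value matrices and uses them to decide how much weight (attention) each word should give to every other word.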

import torch
import torch.nn.functional as F

# Sample query, key, value tensors
queries = torch.rand(1, 4, 16)  # (batch_size, seq_length, embedding_dim)
keys = torch.rand(1, 4, 16)
values = torch.rand(1, 4, 16)

# Compute scaled dot-product attention
scores = torch.bmm(queries, keys.transpose(1, 2)) / (16 ** 0.5)
attention_weights = F.softmax(scores, dim=-1)

# apply attention weights to values
output = torch.bmm(attention_weights, values)
print(output)

輸出

tensor([[[0.4782, 0.5340, 0.4079, 0.4829, 0.4172, 0.5398, 0.3584, 0.6369,
          0.5429, 0.7614, 0.5928, 0.5989, 0.6796, 0.7634, 0.6868, 0.5903],
         [0.4651, 0.5553, 0.4406, 0.4909, 0.3724, 0.5828, 0.3781, 0.6293,
          0.5463, 0.7658, 0.5828, 0.5964, 0.6699, 0.7652, 0.6770, 0.5583],
         [0.4675, 0.5414, 0.4212, 0.4895, 0.3983, 0.5619, 0.3676, 0.6234,
          0.5400, 0.7646, 0.5865, 0.5936, 0.6742, 0.7704, 0.6792, 0.5767],
         [0.4722, 0.5550, 0.4352, 0.4829, 0.3769, 0.5802, 0.3673, 0.6354,
          0.5525, 0.7641, 0.5722, 0.6045, 0.6644, 0.7693, 0.6745, 0.5674]]])

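This attention mechanism lets the model "focus" on different parts of the sequence, so it can learn long-range dependencies between the words in a sentence.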
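
To see where the attention weights come from, it can help to work through scaled dot-product attention for a single query by hand. The following is a minimal pure-Python sketch with made-up 2-dimensional vectors; the function names softmax and attention are illustrative and simply restate the computation in the PyTorch example above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d_k = len(query)
    # Similarity of the query with every key, scaled by sqrt(d_k)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

weights, output = attention(query, keys, values)
print("weights:", [round(w, 3) for w in weights])
print("output: ", [round(o, 3) for o in output])
```

The query matches the first key, so the first weight is the larger one and the output leans toward the first value vector. The weights always sum to 1 because of the softmax.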

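Multi-Head Attention

Multi-head attention extends self-attention by applying several attention heads in parallel. Each head can focus on a different aspect of the input, so together they capture a wider range of dependencies in the data.

The outputs of the heads are then combined, projected, and passed on to a feed-forward network.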

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    def __init__(self, dim_model, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.dim_head = dim_model // num_heads

        # Linear projections for queries, keys, values, and the final output
        self.query = nn.Linear(dim_model, dim_model)
        self.key = nn.Linear(dim_model, dim_model)
        self.value = nn.Linear(dim_model, dim_model)
        self.out = nn.Linear(dim_model, dim_model)

    def forward(self, x):
        B, N, C = x.shape  # batch size, sequence length, model dimension
        # Project the input and split it into heads: (B, num_heads, N, dim_head)
        queries = self.query(x).reshape(B, N, self.num_heads, self.dim_head).transpose(1, 2)
        keys = self.key(x).reshape(B, N, self.num_heads, self.dim_head).transpose(1, 2)
        values = self.value(x).reshape(B, N, self.num_heads, self.dim_head).transpose(1, 2)
        # Scaled dot-product attention within each head
        scores = torch.matmul(queries, keys.transpose(-2, -1)) / (self.dim_head ** 0.5)
        attention_weights = F.softmax(scores, dim=-1)
        # Merge the heads back into shape (B, N, C)
        out = torch.matmul(attention_weights, values).transpose(1, 2).reshape(B, N, C)
        return self.out(out)

# Build the multi-head attention layer and apply it to a random input
attention_layer = MultiHeadAttention(128, 8)
output = attention_layer(torch.rand(1, 10, 128))  # (batch_size, seq_length, embedding_dim)
print(output)

輸出

tensor([[[-0.1015, -0.1076,  0.2237,  ...,  0.1794, -0.3297,  0.1177],
         [-0.1028, -0.1068,  0.2219,  ...,  0.1798, -0.3307,  0.1175],
         [-0.1018, -0.1070,  0.2228,  ...,  0.1793, -0.3294,  0.1183],
         ...,
         [-0.1021, -0.1075,  0.2245,  ...,  0.1803, -0.3312,  0.1171],
         [-0.1041, -0.1070,  0.2232,  ...,  0.1817, -0.3308,  0.1184],
         [-0.1027, -0.1087,  0.2223,  ...,  0.1801, -0.3295,  0.1179]]],
       grad_fn=<ViewBackward0>)

Feed-Forward Network

The feed-forward network is probably the simplest component of a Transformer block, yet it is essential. It applies a non-linear transformation to each position of the input sequence, which lets the model learn more complex patterns.

In Llama, every attention layer is followed by a feed-forward network that performs this transformation.

import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, dim_model, dim_ff):
        super().__init__()
        self.fc1 = nn.Linear(dim_model, dim_ff)   # expand to the hidden size
        self.fc2 = nn.Linear(dim_ff, dim_model)   # project back to the model size
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

# Define and use the feed-forward network
ffn = FeedForward(128, 512)
ffn_output = ffn(torch.rand(1, 10, 128))  # (batch_size, seq_length, embedding_dim)
print(ffn_output)

輸出

tensor([[[ 0.0222, -0.1035, -0.1494,  ...,  0.0891,  0.2920, -0.1607],
         [ 0.0313, -0.2393, -0.2456,  ...,  0.0704,  0.1300, -0.1176],
         [-0.0838, -0.0756, -0.1824,  ...,  0.2570,  0.0700, -0.1471],
         ...,
         [ 0.0146, -0.0733, -0.0649,  ...,  0.0465,  0.2674, -0.1506],
         [-0.0152, -0.0657, -0.0991,  ...,  0.2389,  0.2404, -0.1785],
         [ 0.0095, -0.1162, -0.0693,  ...,  0.0919,  0.1621, -0.1421]]],
       grad_fn=<ViewBackward0>)

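Steps to Create an Access Token for Llama Models

Before you can access Llama models, you need to create an access token on Hugging Face. This guide uses the Llama 2 7B model because it is comparatively lightweight, but you can choose any model. Follow the steps below to get started.

Step 1: Sign Up for a Hugging Face Account (If You Do Not Have One)

On the Hugging Face home page, click "Sign Up".
If you do not have an account yet, create one now.

Step 2: Fill Out the Request Form to Access Llama Models

To download and use Llama models, you need to fill out a request form:

Visit the Llama download page and fill in all required fields.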

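Select your model (we use Llama 2 here to keep things simple and lightweight) and click "Next" in the form.
Accept the Llama 2 terms and conditions, then click "Accept and Continue".
The access request is now complete.

Step 3: Get an Access Token

Go to your Hugging Face account.
Click your profile picture in the top-right corner and open the "Settings" page.
Navigate to "Access Tokens".
Click "Create new token".
- Give it a name, for example "Llama access token".
- Set the permissions. The scope must be at least "Read" to access gated models.
- Click "Create token".
Copy the token. You will use it in the next step.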

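Step 4: Authenticate in Your Script with the Token

Once you have a Hugging Face token, use it to authenticate in your Python script.

First, install the required packages if you have not already done so (the leading ! is notebook syntax; omit it in a regular terminal):

!pip install transformers huggingface_hub torch

Import the login function from the Hugging Face Hub and log in with your token:

from huggingface_hub import login
# Replace <your_token> with your token
login(token="<your_token>")

Alternatively, you can pass your token directly in code when loading the model.

Step 5: Update the Code to Load the Model with the Token

Use your token to load the gated model.

The token can be passed directly to the from_pretrained() method.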

from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import login
 
token = "your_token"
# Login with your token (put <your_token> in quotes)
login(token=token)
 
# Loading tokenizer and model from gated repository and using auth token
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf', token=token)
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf', token=token)
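
Hardcoding a token in a script makes it easy to leak through version control. One common alternative, sketched below, is to read it from an environment variable. The variable name HF_TOKEN and the helper get_hf_token are choices made for this example; set the variable in your shell before running the script.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the access token from an environment variable instead of the source code."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token."
        )
    return token


try:
    token = get_hf_token()
    print("Token loaded from the environment.")
except RuntimeError as error:
    print(error)
```

The resulting token value can then be passed to login(token=token) or to from_pretrained(..., token=token) exactly as in the example above.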

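Step 6: Run the Code

After you log in or pass your token to the model loading function, your script should be able to access the gated repository and generate text with the Llama model.

Running Your First Llama Script

With the token and authentication in place, it is time to run your first Llama script. You can use a pretrained Llama model for text generation. We use Llama-2-7b-hf, one of the Llama 2 models.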

from transformers import AutoModelForCausalLM, AutoTokenizer
#import tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf', token=token)
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf', token=token)
#Encode input text and generate
input_text = "The future of AI is"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(input_ids, max_length=50, num_return_sequences=1)

# Decode and print output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

輸出

The future of AI is a subject of great interest, and it is not surprising
that many people are interested in the subject. It is a very interesting
topic, and it is a subject that is likely to be discussed for many years to come

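Generated text - The script above produces a text sequence that shows how Llama interprets the prompt and continues it coherently.

Summary

Llama owes its strengths to its Transformer-based architecture, multi-head attention, and autoregressive generation. Its balance between computational efficiency and model performance makes it suitable for a wide range of natural language processing tasks. Knowing its main components and architecture puts you in a position to experiment with text generation, translation, summarization, and more.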

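Data Preparation for Llama

Good data preparation is key to training any high-performing language model, Llama included. It covers collecting and cleaning data, putting it into a form Llama can consume, and applying preprocessing tools such as NLTK, spaCy, and the Hugging Face tokenizers. Understanding these stages helps you get better results from a Llama model.

Data preparation is one of the most critical stages of any machine learning project, especially with large language models. This chapter covers the following topics:

Collecting and cleaning data
Formatting data for Llama
Tools used in data preprocessing

Together, these steps make sure the data is clean and well structured for Llama's training pipeline.

Collecting and Cleaning Data

Data Collection

The most important requirement for training a model like Llama is high-quality, diverse data. Text for training language models typically comes from many kinds of sources, including books, articles, blog posts, social media content, forums, and other publicly available text.

Scraping Text Data from a Website with Python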

import requests
from bs4 import BeautifulSoup
# URL to fetch data from
url = 'https://tutorialspoint.tw/Llama/index.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# Now, extract text data
text_data = soup.get_text()
# Now, save data to the file
with open('raw_data.txt', 'w', encoding='utf-8') as file:
    file.write(text_data)

輸出

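When the script runs, it saves the scraped text to a file named raw_data.txt. This raw text is cleaned in the next step.

Data Cleaning

Raw data is full of noise, such as HTML tags, special characters, and irrelevant content, so it must be cleaned before it is given to Llama. Data cleaning may include:

Removing HTML tags
Removing special characters
Normalizing letter case
Tokenization
Removing stop words

Example: Preprocessing Text Data with Python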

import re

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download the tokenizer data and stop-word list (only needed once).
# Newer NLTK releases may also require nltk.download('punkt_tab').
nltk.download('punkt')
nltk.download('stopwords')

# Load the raw data written by the scraping script
with open('raw_data.txt', 'r', encoding='utf-8') as file:
    text_data = file.read()

# Remove HTML tags
clean_data = re.sub(r'<.*?>', '', text_data)

# Remove special characters, keeping letters, digits, and whitespace
clean_data = re.sub(r'[^A-Za-z0-9\s]', '', clean_data)

# Split the text into tokens
tokens = word_tokenize(clean_data)

stop_words = set(stopwords.words('english'))

# Filter out stop words (case-insensitive)
filtered_tokens = [w for w in tokens if w.lower() not in stop_words]

# Save the cleaned data
with open('cleaned_data.txt', 'w', encoding='utf-8') as file:
    file.write(' '.join(filtered_tokens))

print("Data cleaned and saved to cleaned_data.txt")

輸出

Data cleaned and saved to cleaned_data.txt

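The cleaned data is saved to cleaned_data.txt. The file now contains tokenized, cleaned text that is ready for further formatting and preprocessing for Llama.

Preprocessing Your Data for Use with Llama

Llama expects its training input in a structured form. The text should be tokenized, and depending on the training pipeline it may also be converted to a format such as JSON or CSV.

Text Tokenization

Tokenization splits a sentence into smaller units, usually words or subwords, that Llama can process. You can use existing libraries for this, including the Hugging Face tokenizers.

from transformers import LlamaTokenizer

token = "your_token"  # your Hugging Face access token

# Sample sentence
text = "Llama is an innovative language model."

# Load the Llama tokenizer
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token=token)

# Tokenize
encoded_input = tokenizer(text)

print("Original Text:", text)
print("Tokenized Output:", encoded_input)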

輸出

Original Text: Llama is an innovative language model.
Tokenized Output: {'input_ids': [1, 365, 29880, 3304, 338, 385, 24233, 1230, 4086, 1904, 29889], 
   'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

Converting Data to JSON Format

JSON is a convenient format for Llama training data because it stores text in a structured way.

import json
    
# Data structure
data = {
"id": "1",
"text": "Llama is a powerful language model for AI research."
}
# Save data as JSON
with open('formatted_data.json', 'w', encoding='utf-8') as json_file:
    json.dump(data, json_file, indent=4)
    
print("Data formatted and saved to formatted_data.json")

輸出

Data formatted and saved to formatted_data.json

The program writes a file named formatted_data.json that contains the text data in JSON format.
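
Real training sets contain many records rather than one. A common convention for this is JSON Lines, where each line of the file is one JSON object. The sketch below is an illustration under that assumption; the file name formatted_data.jsonl and the second record are invented for the example.

```python
import json

records = [
    {"id": "1", "text": "Llama is a powerful language model for AI research."},
    {"id": "2", "text": "Data preparation comes before training."},
]

# Write one JSON object per line (the JSON Lines convention)
with open("formatted_data.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read the file back, one record per line
with open("formatted_data.jsonl", "r", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(loaded)} records")
```

Because every line is independent, such a file can be read and processed record by record without loading the whole dataset into memory.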

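Data Preprocessing Tools

Several tools help with cleaning, tokenizing, and formatting data for Llama. The most common ones are Python libraries and text processing frameworks. The following are widely used in data preparation for Llama.

1. NLTK (Natural Language Toolkit)

NLTK is one of the best-known libraries for natural language processing. It supports cleaning, tokenizing, and stemming text data.

Example: Removing Stop Words with NLTK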

import nltk
from nltk.corpus import stopwords
nltk.download('punkt')
nltk.download('stopwords')

# Test Data
text = "This is a simple sentence with stopwords."
 
# Tokenization
words = nltk.word_tokenize(text)

# Stopwords
stop_words = set(stopwords.words('english'))

# Keep only the words that are not stop words (case-insensitive)
filtered_text = [w for w in words if w.lower() not in stop_words]
print("Original Text:", text)
print("Filtered Text:", filtered_text)

輸出

Original Text: This is a simple sentence with stopwords.
Filtered Text: ['simple', 'sentence', 'stopwords', '.']

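2. spaCy

spaCy is another advanced library for text preprocessing. It is fast and efficient, and it is built for production use in NLP tasks.

Example: Tokenization with spaCy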

import spacy

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Sample sentence
text = "Llama is an innovative language model."

# Process the text
doc = nlp(text)

# Tokenize
tokens = [token.text for token in doc]

print("Tokens:", tokens)

輸出

Tokens: ['Llama', 'is', 'an', 'innovative', 'language', 'model', '.']

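3. Hugging Face Tokenizers

Hugging Face provides high-performance tokenizers for training and running language models in general, including the tokenizer that ships with Llama.

Example: Using a Hugging Face Tokenizer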

from transformers import AutoTokenizer
token = "your_token"
# Sample sentence
text = "Llama is an innovative language model."

#Load Llama tokenizer
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf', token=token)

#Tokenize
encoded_input = tokenizer(text)
print("Original Text:", text)
print("Tokenized Output:", encoded_input)

輸出

Original Text: Llama is an innovative language model.
Tokenized Output: {'input_ids': [1, 365, 29880, 3304, 338, 385, 24233, 1230, 4086, 1904, 29889], 
   'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

4. Pandas for Data Formatting

Pandas is useful when you work with structured data. You can use it to format data as CSV or JSON before passing it to Llama.

import pandas as pd

# Data structure
data = {
"id": "1",
"text": "Llama is a powerful language model for AI research."
}

# Create DataFrame with an explicit index
df = pd.DataFrame([data], index=[0])  # wrap the dict in a list so it becomes one row

# Save DataFrame to CSV
df.to_csv('formatted_data.csv', index=False)

print("Data saved to formatted_data.csv")

輸出

Data saved to formatted_data.csv

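The formatted text data is saved in the CSV file formatted_data.csv.

Training Llama from Scratch

Training Llama from scratch is very resource intensive, but it is also rewarding. With a well-prepared training dataset and sensible training parameters, the training loop can produce a language model that is reliable enough for many NLP tasks. The keys to success are careful preprocessing, parameter tuning, and optimization during training.

Unlike many other GPT-style models, Llama has been released openly. Training such a model from scratch requires substantial resources and thorough preparation. This chapter walks through the process, from preparing the training dataset to configuring the training parameters and running the training itself.

Llama is designed to support almost all NLP applications, including text generation, translation, and summarization. Training a large language model from scratch involves three key steps:

Preparing the training dataset
Choosing appropriate training parameters
Managing the training process and making sure optimization is effective

Each step is covered in turn, with code snippets and an explanation of their output.

Preparing Your Training Dataset

The most important first step in training any LLM is to give it a large, diverse, high-quality dataset. Llama needs massive amounts of text to capture the richness of human language.

Collecting Data

Training Llama requires a large corpus with text samples from many domains. Examples of datasets used to train LLMs include Common Crawl, Wikipedia, BooksCorpus, and OpenWebText.

Example: Downloading a Dataset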

import requests
import os

# Create a directory for datasets
os.makedirs("datasets", exist_ok=True)

# URL to dataset
url = "https://example.com/openwebtext.zip"
output = "datasets/openwebtext.zip"

# Download the dataset
response = requests.get(url)
with open(output, "wb") as file:
    file.write(response.content)
print(f"Dataset downloaded and saved at {output}")

輸出

Dataset downloaded and saved at datasets/openwebtext.zip

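After downloading the dataset, you need to preprocess the text before training. Preprocessing mostly involves tokenization, lowercasing, removing special characters, and arranging the data in the structure the training code expects.

Example: Preprocessing the Dataset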

from transformers import LlamaTokenizer

# Load pre-trained tokenizer 
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token=token)

# Load raw text
with open('/content/raw_data.txt', 'r') as file:
    raw_text = file.read()

# Tokenize the text
tokens = tokenizer.encode(raw_text, add_special_tokens=True)

# Save tokens to a file
with open('/tokenized_text.txt', 'w') as token_file:
    token_file.write(str(tokens))
    
print(f"Text tokenized and saved as tokens.")

輸出

Text tokenized and saved as tokens.

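Setting the Model Training Parameters

Next, we set the training parameters. They control how the model learns from the dataset, so they directly affect its performance.

Main Training Parameters

Batch size − The number of samples processed before the model weights are updated.
Learning rate − How strongly the model parameters are updated in response to the loss gradient.
Epochs − The number of times the model passes over the entire dataset.
Optimizer − The algorithm that minimizes the loss function by changing the weights.

You can train Llama with AdamW as the optimizer and a learning rate scheduler with warmup.

Example: Configuring the Training Parameters

import torch
from torch.optim import AdamW
from transformers import LlamaForCausalLM, get_linear_schedule_with_warmup

token = "your_token"  # your Hugging Face access token

# Load the model and move it to the GPU if one is available
model = LlamaForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf', token=token)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Training parameters
epochs = 3
batch_size = 8
learning_rate = 5e-5
warmup_steps = 200
# Assumption: placeholder value; replace it with len(train_loader) once the DataLoader exists
steps_per_epoch = 1000

# Set the optimizer and scheduler
optimizer = AdamW(model.parameters(), lr=learning_rate)
# The scheduler counts optimizer steps, so the total is steps per epoch times epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=steps_per_epoch * epochs,
)
print("Training parameters set.")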

輸出

Training parameters set.
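
The effect of a warmup scheduler is easier to see in numbers. The sketch below computes a linear warmup followed by a linear decay by hand, which is the general shape of the schedule used above. The function name linear_warmup_decay and the value total_steps = 1000 are assumptions made for this illustration.

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Learning-rate multiplier: rise linearly from 0 to 1, then fall linearly to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


base_lr = 5e-5
warmup_steps = 200
total_steps = 1000  # assumed value for illustration

for step in (0, 100, 200, 600, 1000):
    lr = base_lr * linear_warmup_decay(step, warmup_steps, total_steps)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

The learning rate reaches its peak of 5e-5 at step 200 and returns to zero at the final step. This is why the total number of training steps has to cover the whole run: if it is set too low, the rate drops to zero long before training ends.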

DataLoader for Batching

Training requires data in batches. PyTorch's DataLoader makes this easy.

import ast

from torch.utils.data import DataLoader, Dataset

# Custom dataset: each item is a block of `batch_size` consecutive token IDs
class TextDataset(Dataset):
    def __init__(self, tokenized_text):
        self.data = tokenized_text

    def __len__(self):
        return len(self.data) // batch_size

    def __getitem__(self, idx):
        return self.data[idx * batch_size : (idx + 1) * batch_size]

with open("/tokenized_text.txt", 'r') as f:
    tokens_str = f.read()
# Parse the saved list safely; ast.literal_eval only accepts Python literals
tokens = ast.literal_eval(tokens_str)

# DataLoader definition
train_data = TextDataset(tokens)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)

print(f"DataLoader created with batch size {batch_size}.")

輸出

DataLoader created with batch size 8.
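
The dataset above works by cutting one long list of token IDs into equal-sized blocks. That step can be looked at in isolation with the following sketch, which uses plain Python lists; the function name chunk_tokens and the sample values are illustrative only.

```python
def chunk_tokens(tokens, block_size):
    """Split a flat list of token IDs into consecutive blocks of equal length."""
    usable = len(tokens) - len(tokens) % block_size  # drop the incomplete tail
    return [tokens[i:i + block_size] for i in range(0, usable, block_size)]


tokens = list(range(10))
blocks = chunk_tokens(tokens, block_size=4)
print(blocks)
# prints [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The last two tokens are dropped because they do not fill a complete block. In practice the block size is usually the model's context length rather than a value as small as 4.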

With the training parameters and the data loading in place, it is time for the actual training phase.

Training the Model

All of the preparation comes together in the training loop. The loop feeds the data to the model batch by batch and uses the loss function to update the model's parameters.

Running the Training Loop

In each step, the model receives a batch, computes the loss, and updates its weights from the gradients of that loss.

import tqdm

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

for epoch in range(epochs):
    print(f"Epoch {epoch + 1}/{epochs}")
    model.train()  # switch to training mode
    total_loss = 0
    for batch in tqdm.tqdm(train_loader):
        # Pad the sequences in the batch to a common length
        batch = [torch.as_tensor(sub_batch, device=device) for sub_batch in batch]
        max_len = max(len(seq) for seq in batch)
        padded_batch = torch.zeros((len(batch), max_len), dtype=torch.long, device=device)
        for i, seq in enumerate(batch):
            padded_batch[i, :len(seq)] = seq

        # Forward pass; the inputs also serve as labels for language modeling
        outputs = model(padded_batch, labels=padded_batch)
        loss = outputs.loss

        # Backward pass
        optimizer.zero_grad()  # Reset gradients.
        loss.backward()        # Calculate gradients.
        optimizer.step()       # Update model parameters.
        scheduler.step()       # Update learning rate.

        total_loss += loss.item()  # Accumulate loss.

    print(f"Epoch {epoch + 1} completed. Loss: {total_loss:.4f}")

輸出

Epoch 1 completed. Loss: 424.4011
Epoch 2 completed. Loss: 343.4245
Epoch 3 completed. Loss: 328.7054

Saving the Model

Save the model once training is complete. Otherwise you would have to retrain it every time you need it.

# Save the trained model
model.save_pretrained('trained_Llama_model')
print("Model saved successfully.")

輸出

Model saved successfully.

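We have now trained the model and saved it. Note that the example loads pretrained weights with from_pretrained, so strictly speaking it continues training an existing checkpoint rather than starting from random weights. The saved model can be used to predict new tokens, which later chapters cover in detail.

Fine-Tuning Llama 2 for Specific Tasks

Fine-tuning customizes a pretrained large language model (LLM) by adjusting its parameters so that it performs better on a specific task or dataset. The same process can adapt Llama 2 to a wide variety of tasks.

This chapter covers the concept of transfer learning, the main fine-tuning techniques, and examples of fine-tuning Llama for different tasks.

Understanding Transfer Learning

Transfer learning takes a model that was pretrained on a large corpus and adapts it to a related task with a much smaller dataset. It reuses the knowledge the model has already acquired instead of training from scratch, which is computationally expensive and time consuming.

Take Llama as an example. It was pretrained on a large amount of text. With transfer learning, we fine-tune it on a much smaller dataset for a different NLP task, such as sentiment analysis, text classification, or question answering.

Main Advantages of Transfer Learning

Saves time − Fine-tuning takes far less time than training a model on the original dataset.
Better generalization − The pretrained model has already learned general language patterns that apply to many NLP applications.
Data efficiency − Fine-tuning makes the model effective even with a small dataset.

Fine-Tuning Techniques

Fine-tuning Llama, or any other large language model, means adjusting the model's parameters for a specific task. There are several techniques for doing so.

Full Model Fine-Tuning

This updates the parameters of every layer of the model. It requires a lot of computation, but it can give the best task-specific performance.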

from datasets import load_dataset
from transformers import (
    LlamaForSequenceClassification,
    LlamaTokenizer,
    Trainer,
    TrainingArguments,
)

# Load the tokenizer
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
# Llama has no padding token by default, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

# Load dataset
dataset = load_dataset("imdb")

# Preprocess dataset (max_length=512 is an example value; adjust as needed)
def preprocess_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True, max_length=512)

tokenized_dataset = dataset.map(preprocess_function, batched=True)

# Set up training arguments
# (evaluation_strategy is named eval_strategy in newer transformers releases)
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01
)

model = LlamaForSequenceClassification.from_pretrained("meta-llama/Llama-2-7b-chat-hf", num_labels=2)
# The classification head needs to know which token is padding
model.config.pad_token_id = tokenizer.pad_token_id

# Trainer Initialization
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"]
)

# Fine-tune the model
trainer.train()

輸出

Epoch 1/3
Training Loss: 0.1345, Evaluation Loss: 0.1523
Epoch 2/3
Training Loss: 0.0821, Evaluation Loss: 0.1042
Epoch 3/3
Training Loss: 0.0468, Evaluation Loss: 0.0879

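Layer Freezing

With layer freezing, only the last few layers of the model are trained while the earlier layers stay frozen. It is mainly used to reduce memory usage and training time, and it works well when the target task is close to the pretraining data.

# Freeze the base model so that only the classification head is trained
for param in model.base_model.parameters():
    param.requires_grad = False

# Fine-tune only the layers that are still trainable
trainer.train()

Learning Rate Tuning

Another approach is to tune the learning rate used for fine-tuning. A low learning rate generally works better because it disturbs the pretrained knowledge as little as possible.

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,  # low learning rate for gentle fine-tuning
    num_train_epochs=3,
    evaluation_strategy="epoch"
)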

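Prompt-Based Fine-Tuning

This approach uses carefully designed prompts to steer the model toward a specific task without updating the model's weights. It is very useful for zero-shot and few-shot learning across many kinds of tasks.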
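
A few-shot prompt is simply a string that contains a task description, some solved examples, and the new input. The sketch below assembles one for sentiment classification. The function name build_few_shot_prompt, the instruction wording, and the sample reviews are all invented for illustration; the resulting string would be passed to a text generation model.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: instruction, solved examples, then the new input."""
    lines = ["Classify the sentiment of each review as Positive or Negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # Leave the final label blank so the model completes it
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


examples = [
    ("The plot was gripping from start to finish.", "Positive"),
    ("I walked out halfway through.", "Negative"),
]
prompt = build_few_shot_prompt(examples, "A beautiful film with a weak ending.")
print(prompt)
```

With an empty examples list, the same function produces a zero-shot prompt that contains only the instruction and the new review.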

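Fine-Tuning Examples for Other Tasks

Let's look at some practical examples of fine-tuning a Llama model.

1. Fine-Tuning for Sentiment Analysis

Sentiment analysis classifies a text as positive, negative, or neutral. A fine-tuned Llama model can become better at recognizing the sentiment behind different text inputs.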

from transformers import LlamaForSequenceClassification, Trainer, TrainingArguments, LlamaTokenizer
from datasets import load_dataset
from huggingface_hub import login

access_token_read = "<Enter token>"

# Authenticate with the Hugging Face Hub
login(token=access_token_read)

# Load the tokenizer
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
# Llama has no padding token by default, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

# Download sentiment analysis dataset
dataset = load_dataset("yelp_polarity")

# Preprocess dataset (max_length=512 is an example value; adjust as needed)
def preprocess_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True, max_length=512)

tokenized_dataset = dataset.map(preprocess_function, batched=True)

# Download pre-trained Llama for classification
model = LlamaForSequenceClassification.from_pretrained("meta-llama/Llama-2-7b-chat-hf", num_labels=2)
# The classification head needs to know which token is padding
model.config.pad_token_id = tokenizer.pad_token_id

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"]
)

# Fine-tune model for sentiment analysis
trainer.train()

輸出

Epoch 1/3
Training Loss: 0.2954, Evaluation Loss: 0.3121
Epoch 2/3
Training Loss: 0.1786, Evaluation Loss: 0.2245
Epoch 3/3
Training Loss: 0.1024, Evaluation Loss: 0.1893

2. Fine-Tuning for Question Answering

Fine-tuning also helps the model produce short, relevant answers to questions about a given text.

from transformers import LlamaForQuestionAnswering, Trainer, TrainingArguments, LlamaTokenizer
from datasets import load_dataset
from huggingface_hub import login

access_token_read = "<Enter token>"

# Authenticate with the Hugging Face Hub
login(token=access_token_read)

# Load the tokenizer
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
# Llama has no padding token by default, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

# Load the SQuAD dataset for question answering
dataset = load_dataset("squad")

# Preprocess dataset
# Simplified: extractive QA training also needs start_positions and
# end_positions labels, which this function does not create.
def preprocess_function(examples):
    return tokenizer(
        examples['question'],
        examples['context'],
        truncation=True,
        padding="max_length",  # Adjust padding to your needs
        max_length=512         # Adjust max_length as necessary
    )

tokenized_dataset = dataset.map(preprocess_function, batched=True)

# Load pre-trained Llama for question answering
model = LlamaForQuestionAnswering.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"]
)

# Fine-tune model on question answering
trainer.train()

輸出

Epoch 1/3
Training Loss: 1.8234, Eval. Loss: 1.5243
Epoch 2/3
Training Loss: 1.3451, Eval. Loss: 1.2212
Epoch 3/3
Training Loss: 1.0152, Eval. Loss: 1.0435

3. Fine-Tuning for Text Generation

Llama can be fine-tuned to improve its text generation, which is useful for applications such as story generation, dialogue systems, and creative writing.

from transformers import (
    DataCollatorForLanguageModeling,
    LlamaForCausalLM,
    LlamaTokenizer,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset
from huggingface_hub import login

access_token_read = "<Enter token>"

login(token=access_token_read)

# Load the tokenizer
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
# Llama has no padding token by default, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

# Load dataset for text generation
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# Preprocess dataset (max_length=512 is an example value; adjust as needed)
def preprocess_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True, max_length=512)

tokenized_dataset = dataset.map(preprocess_function, batched=True, remove_columns=["text"])

# Load the pre-trained Llama model for causal language modeling
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Build the labels from the input IDs (mlm=False selects causal language modeling)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
)

# Fine-tune the model for text generation
trainer.train()

輸出

Epoch 1/3
Training Loss: 2.9854, Eval Loss: 2.6452
Epoch 2/3
Training Loss: 2.5423, Eval Loss: 2.4321
Epoch 3/3
Training Loss: 2.2356, Eval Loss: 2.1987

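Summary

Fine-tuning Llama on specific tasks, whether sentiment analysis, question answering, or text generation, shows the power of transfer learning. Starting from a large pretrained model, fine-tuning tailors the model to a particular use case with comparatively little data and computation. This chapter described the techniques and gave examples that illustrate Llama's versatility, along with practical steps for adapting it to many different NLP challenges.

Llama - Evaluating Model Performance

Evaluating a large language model such as Llama shows how well it performs specific tasks and how well it understands and responds to questions. Evaluation is essential for making sure the model behaves well and generates high-quality text.

You need to evaluate any large language model, Llama included, to know whether it is useful for a given NLP task. Many metrics are available, such as perplexity, accuracy, and the F1 score. Perplexity is a positive number where lower is better, while accuracy and the F1 score range from 0 to 1 (or 0% to 100%), where higher is better.

The following sections discuss the main aspects of evaluating Llama: metrics, performance benchmarking, and interpreting the results.

Model Evaluation Metrics

When evaluating a language model like Llama, different metrics cover different aspects of performance. Accuracy, fluency, efficiency, and generalization can be assessed with the following metrics.

1. Perplexity (PPL)

Perplexity is one of the most commonly used metrics for language models. It measures how surprised the model is by the text: the lower the perplexity, the better the model predicts the data.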

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
from huggingface_hub import login

access_token_read = "<Enter token>"
login(token=access_token_read)

def calculate_perplexity(model, tokenizer, text):
    tokens = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Pass the input IDs as labels so that the model returns a loss
        outputs = model(**tokens, labels=tokens["input_ids"])
        loss = outputs.loss
    # Perplexity is the exponential of the average cross-entropy loss
    perplexity = torch.exp(loss)
    return perplexity.item()

# Initialize the tokenizer and model
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Example text to evaluate perplexity
text = "This is a sample text for calculating perplexity."
print(f"Perplexity: {calculate_perplexity(model, tokenizer, text)}")

輸出

Perplexity: 8.22
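
The relationship between token probabilities and perplexity can be checked without a model. The sketch below computes perplexity as the exponential of the mean negative log-likelihood; the probability lists are made-up values standing in for the probabilities a model assigned to the tokens that actually occurred.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood of the observed tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


confident = [0.5, 0.5, 0.5, 0.5]   # model assigns each correct token 50%
uncertain = [0.1, 0.1, 0.1, 0.1]   # model assigns each correct token 10%

print(f"confident model: {perplexity(confident):.2f}")
print(f"uncertain model: {perplexity(uncertain):.2f}")
```

A model that gives every correct token a probability of 0.5 has a perplexity of about 2, as if it were choosing between two equally likely options at each step, while a probability of 0.1 gives a perplexity of about 10.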

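2. Accuracy

Accuracy is the proportion of correct predictions out of all predictions the model makes. It is a very useful score for evaluating classification tasks.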

import torch
def calculate_accuracy(predictions, labels):
    correct = (predictions == labels).sum().item()
    accuracy = correct / len(labels) * 100
    return accuracy

# Example of predictions and labels
predictions = torch.tensor([1, 0, 1, 1, 0])
labels = torch.tensor([1, 0, 1, 0, 0])
accuracy = calculate_accuracy(predictions, labels)
print(f"Accuracy: {accuracy}%")

輸出

Accuracy: 80.0%

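3. F1 Score

The F1 score is the harmonic mean of precision and recall. It is especially useful with imbalanced datasets, where it reflects misclassifications better than accuracy does.

Formula

F1 = 2 × (precision × recall) / (precision + recall)

Example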

from sklearn.metrics import f1_score

def calculate_f1(predictions, labels):
    # "weighted" averages the per-class F1 scores by class frequency
    return f1_score(labels, predictions, average="weighted")

predictions = [1, 0, 1, 1, 0]
labels = [1, 0, 1, 0, 0]
f1 = calculate_f1(predictions, labels)
print(f"F1 Score: {f1:.2f}")

Output

F1 Score: 0.80
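
To connect the formula with the example, the same predictions and labels can be scored by hand. The sketch below counts true positives, false positives, and false negatives for the positive class and applies the F1 formula directly; the function name precision_recall_f1 is an illustrative choice.

```python
def precision_recall_f1(predictions, labels, positive=1):
    """Compute precision, recall, and F1 for one class from raw counts."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == positive and y == positive)
    fp = sum(1 for p, y in pairs if p == positive and y != positive)
    fn = sum(1 for p, y in pairs if p != positive and y == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


predictions = [1, 0, 1, 1, 0]
labels = [1, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(predictions, labels)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints precision=0.67 recall=1.00 f1=0.80
```

Here the model finds both positive examples (recall 1.0) but also flags one negative example as positive (precision 2/3), and the F1 score of 0.8 sits between the two.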

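Performance Benchmarking

Benchmarks show how Llama performs across different kinds of tasks and datasets. A benchmark can cover language modeling, classification, summarization, question answering, and other tasks. Benchmarking involves the following steps.

1. Dataset Selection

Effective benchmarking needs datasets that are relevant to your application domain. Some of the datasets most commonly used to benchmark Llama are:

WikiText-103 − Tests language modeling.
SQuAD − Tests question answering.
GLUE benchmark − Tests general NLP understanding by combining several tasks, such as sentiment analysis and paraphrase detection.

2. Data Preprocessing

Before benchmarking, the datasets must be cleaned and tokenized. For Llama models, you can use the tokenizer from the Hugging Face Transformers library.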

from transformers import LlamaTokenizer
from huggingface_hub import login

login(token="<your_token>")

# Load the tokenizer once instead of on every call
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

def preprocess_text(text):
    # Convert raw text into input IDs and an attention mask (PyTorch tensors)
    tokens = tokenizer(text, return_tensors="pt")
    return tokens

sample_text = "This is an example sentence for preprocessing."
preprocessed_data = preprocess_text(sample_text)
print(preprocessed_data)

Output (token IDs are illustrative)

{'input_ids': tensor([[ 27, 91, 101, 34, 55, 89, 1024]]), 
   'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}
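
To show what input_ids and attention_mask mean when several texts are batched together, here is a minimal sketch with a toy whitespace tokenizer. The vocabulary, the IDs, and the helper encode_batch are hypothetical; the real Llama tokenizer splits text into subword pieces and uses its own vocabulary.

```python
def encode_batch(texts, vocab, pad_id=0, unk_id=1):
    """Toy whitespace tokenizer: map words to IDs, then pad to equal length."""
    encoded = [[vocab.get(w, unk_id) for w in t.lower().split()] for t in texts]
    max_len = max(len(ids) for ids in encoded)
    input_ids, attention_mask = [], []
    for ids in encoded:
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Hypothetical vocabulary; IDs 0 and 1 are reserved for padding and unknown words
vocab = {"this": 2, "is": 3, "an": 4, "example": 5, "sentence": 6}
batch = encode_batch(["This is an example sentence", "This is unknown"], vocab)

print(batch["input_ids"])       # [[2, 3, 4, 5, 6], [2, 3, 1, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

The shorter sentence is padded to the length of the longer one, and the attention mask records which positions hold real tokens.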

3. Running the Benchmark

The preprocessed data can now be used to run an evaluation pass on the model.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import login

login(token="<your_token>")

def run_benchmark(model, tokens):
    with torch.no_grad():
        # Passing labels makes the output include a language-modeling loss
        outputs = model(**tokens, labels=tokens["input_ids"])
    return outputs

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # Update model path as needed
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # Update model path as needed
model.eval()

# Preprocess your input data
sample_text = "This is an example sentence for benchmarking."
preprocessed_data = tokenizer(sample_text, return_tensors="pt")

# Run the benchmark
benchmark_results = run_benchmark(model, preprocessed_data)

# Print the results
print(benchmark_results)

Output (abridged; values are illustrative)

{'logits': tensor([[ 0.1, -0.2, 0.3, ...]]), 'loss': tensor(0.5), 'past_key_values': (...) }

4. Multi-Task Benchmarking

Benchmarking can also span a set of tasks, such as classification, language modeling, and text generation.

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
from datasets import load_dataset
from huggingface_hub import login

login(token="<your_token>")

# Load in the SQuAD dataset
dataset = load_dataset("squad")

# Load the model and tokenizer for question answering.
# Note: this checkpoint ships without a trained question-answering head, so the
# head is randomly initialized; fine-tune on SQuAD before trusting the answers.
# This also requires a Transformers version that provides a Llama QA head.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # Update with correct model path
model = AutoModelForQuestionAnswering.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # Update with correct model path
model.eval()

# Benchmark function for question-answering
def benchmark_question_answering(model, tokenizer, question, context):
    inputs = tokenizer(question, context, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    answer_start = outputs.start_logits.argmax(-1).item()  # Index of the start of the answer
    answer_end = outputs.end_logits.argmax(-1).item()      # Index of the end of the answer

    # Decode the answer span from the input tokens
    answer_ids = inputs["input_ids"][0][answer_start:answer_end + 1]
    return tokenizer.decode(answer_ids, skip_special_tokens=True)

# Sample question and context
question = "What is Llama?"
context = "Llama (Large Language Model Meta AI) is a family of foundational language models developed by Meta AI."

# Run the benchmark
answer = benchmark_question_answering(model, tokenizer, question, context)
print(f"Answer: {answer}")

Output

Answer: Llama is a Meta AI-created large language model.
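
To turn individual answers into a benchmark score, predicted answers are usually compared against reference answers. The sketch below shows exact match and token-level F1 in the style commonly used for SQuAD-like evaluation; it is a simplified illustration, and the function names and example strings are hypothetical.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction, reference):
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection counts tokens shared by both answers
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical reference and prediction
reference = "a family of foundational language models"
prediction = "foundational language models"

em = exact_match(prediction, reference)
f1 = token_f1(prediction, reference)
print(f"Exact match: {em}")   # Exact match: 0.0
print(f"Token F1: {f1:.2f}")  # Token F1: 0.75
```

Averaging these two scores over every question in the dataset gives a single number per metric that can be compared across models.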

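Interpreting Evaluation Results

Compare performance metrics such as perplexity, accuracy, and F1 score across the benchmark tasks and datasets. At this stage, the collected evaluation data is used to interpret the results.

1. Model Efficiency

An efficient model achieves low latency with minimal resources without sacrificing performance.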
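
Latency can be measured with a small timing harness. The following is a minimal sketch using only the standard library; measure_latency and dummy_generate are hypothetical names, and dummy_generate stands in for a real model call.

```python
import statistics
import time

def measure_latency(fn, *args, warmup=2, runs=10):
    """Return median and max wall-clock latency (in seconds) of fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # Warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return {"median_s": statistics.median(timings), "max_s": max(timings)}

def dummy_generate(prompt):
    # Hypothetical stand-in for a model call such as model.generate(...)
    return " ".join(reversed(prompt.split()))

stats = measure_latency(dummy_generate, "Llama is a large language model")
print(stats)
```

Reporting the median alongside the maximum makes occasional slow calls visible without letting them dominate the summary.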

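2. Comparison with Baselines

When interpreting results, compare them against baselines from models such as GPT-3 or BERT. For example, if Llama's perplexity on the same dataset is much lower than GPT-3's and its accuracy is much higher, that is a good indication that it performs better.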
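
A comparison like this has to account for the direction of each metric, since lower is better for perplexity while higher is better for accuracy and F1. The sketch below is one way to do that; the function name and all scores are made up for illustration and are not measured results.

```python
# Metrics where a smaller value means a better model
LOWER_IS_BETTER = {"perplexity"}

def compare_to_baseline(candidate_scores, baseline_scores):
    """Report the difference to the baseline and whether it is an improvement."""
    report = {}
    for metric, value in candidate_scores.items():
        delta = value - baseline_scores[metric]
        improved = delta < 0 if metric in LOWER_IS_BETTER else delta > 0
        report[metric] = {"delta": round(delta, 4), "improved": improved}
    return report

# Placeholder numbers for illustration only
candidate_scores = {"perplexity": 8.0, "accuracy": 0.80, "f1": 0.78}
baseline_scores = {"perplexity": 9.5, "accuracy": 0.75, "f1": 0.81}

report = compare_to_baseline(candidate_scores, baseline_scores)
for metric, result in report.items():
    print(f"{metric}: delta={result['delta']}, improved={result['improved']}")
```

With these placeholder values the candidate improves on perplexity and accuracy but falls behind on F1, which is the kind of mixed result that needs interpretation rather than a single verdict.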

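3. Identifying Strengths and Weaknesses

Consider the areas where Llama may be stronger or weaker. For example, if the model's accuracy is nearly perfect on sentiment analysis but still poor on question answering, you can conclude that Llama is more effective on some tasks than on others.

4. Practicality

Finally, consider how useful the output is in real applications. Can Llama be applied to real customer support systems, content creation, or other NLP-related tasks? The insights gained from these results determine its practicality in real-world use.

This structured evaluation process gives users a clear picture of the model's performance and helps them make informed choices about deploying it in NLP applications.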

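Optimizing Llama Models

Machine learning models such as LLaMA (Large Language Model Meta AI) gain accuracy at the cost of increased computation. Llama relies heavily on the Transformer architecture; optimizing it reduces training time and memory usage while preserving or improving overall accuracy. This chapter discusses model optimization techniques and strategies for reducing training time. It also covers techniques for improving model accuracy, with practical examples and code snippets.

Model Optimization Techniques

Many techniques are used to optimize large language models (LLMs), including hyperparameter tuning, gradient accumulation, model pruning, and quantization. Let us discuss them -

1. Hyperparameter Tuning

Hyperparameter tuning is a convenient and very effective model optimization technique. A model's performance depends heavily on the learning rate, batch size, and number of epochs; these are all hyperparameters.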

from huggingface_hub import login
from transformers import LlamaForCausalLM, LlamaTokenizer
from torch.optim import AdamW
from torch.utils.data import DataLoader

# Log in to Hugging Face Hub
login(token="<your_token>")  # Replace <your_token> with your actual Hugging Face token

# Load pre-trained model and tokenizer
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# The Llama tokenizer has no padding token by default; reuse EOS so padding=True works
tokenizer.pad_token = tokenizer.eos_token

# Hyperparameters to tune
learning_rate = 3e-5
batch_size = 32
num_epochs = 3

# Optimizer
optimizer = AdamW(model.parameters(), lr=learning_rate)

# Create your training dataset
# Ensure you have a train_dataset prepared as a list of dictionaries with a 'text' key.
train_dataset = [{"text": "This is an example sentence."}]  # Placeholder dataset
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

for epoch in range(num_epochs):
    model.train()  # Set the model to training mode
    for batch in train_dataloader:
        # Tokenize the input data
        inputs = tokenizer(batch["text"], return_tensors="pt", padding=True, truncation=True)

        # Move inputs to the same device as the model
        inputs = {key: value.to(model.device) for key, value in inputs.items()}

        # Use the input IDs as labels, ignoring padded positions in the loss
        labels = inputs["input_ids"].clone()
        labels[inputs["attention_mask"] == 0] = -100

        # Forward pass
        outputs = model(**inputs, labels=labels)
        loss = outputs.loss

        # Backward pass and optimization
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    print(f"Epoch {epoch + 1}, Loss: {loss.item()}")

Output

Epoch 1, Loss: 2.345
Epoch 2, Loss: 1.892
Epoch 3, Loss: 1.567

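Hyperparameters such as the learning rate and batch size can also be set according to the available compute resources or the characteristics of the task to achieve better training results.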
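
One simple way to choose these values is a grid search: try each combination and keep the one with the lowest validation loss. The sketch below illustrates the idea; validation_loss is a hypothetical stand-in with a made-up loss surface, where a real run would train briefly and evaluate on held-out data.

```python
from itertools import product

def validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for 'train briefly, then return validation loss'."""
    # Toy bowl-shaped surface whose minimum sits at lr=3e-5, batch_size=32
    return (learning_rate / 3e-5 - 1) ** 2 + (batch_size / 32 - 1) ** 2

learning_rates = [1e-5, 3e-5, 1e-4]
batch_sizes = [8, 16, 32]

# Evaluate every combination of the candidate values
results = {
    (lr, bs): validation_loss(lr, bs)
    for lr, bs in product(learning_rates, batch_sizes)
}

best_lr, best_bs = min(results, key=results.get)
print(f"Best configuration: learning_rate={best_lr}, batch_size={best_bs}")
```

Grid search grows quickly with the number of hyperparameters, so with large models it is common to search a small grid on a subset of the data first.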

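2. Gradient Accumulation

Gradient accumulation lets you use a small batch size while simulating a larger one during training. It is useful when you run into out-of-memory errors.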

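accumulation_steps = 4  # Effective batch size = batch_size * accumulation_steps

# Reuses model, tokenizer, optimizer, and train_dataloader from the previous example
tokenizer.pad_token = tokenizer.eos_token  # Llama has no padding token by default

for epoch in range(3):
    model.train()
    optimizer.zero_grad()

    for step, batch in enumerate(train_dataloader):
        inputs = tokenizer(batch["text"], return_tensors="pt", padding=True, truncation=True)
        inputs = {key: value.to(model.device) for key, value in inputs.items()}
        outputs = model(**inputs, labels=inputs["input_ids"])

        # Scale the loss so the accumulated gradient is an average, not a sum
        loss = outputs.loss / accumulation_steps
        loss.backward()  # Gradients accumulate across backward passes

        # Update the weights after the specified number of steps
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()  # Clear gradients after updating

    print(f"Epoch {epoch + 1}, Loss: {loss.item() * accumulation_steps}")

Output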

Epoch 1, Loss: 2.567
Epoch 2, Loss: 2.100
Epoch 3, Loss: 1.856
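
The reason accumulation can stand in for a larger batch is that averaging the gradients of equally sized micro-batches gives the same gradient as one pass over the full batch. The sketch below checks this numerically for a one-parameter linear model; the data and the helper grad_mse are hypothetical.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# Toy data, roughly y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5

# Gradient from one large batch
full_grad = grad_mse(w, xs, ys)

# Same gradient built up from 4 micro-batches of 2 samples each
accumulation_steps = 4
micro = len(xs) // accumulation_steps
accumulated_grad = 0.0
for i in range(accumulation_steps):
    chunk_x = xs[i * micro:(i + 1) * micro]
    chunk_y = ys[i * micro:(i + 1) * micro]
    # Dividing by accumulation_steps mirrors scaling the loss before backward()
    accumulated_grad += grad_mse(w, chunk_x, chunk_y) / accumulation_steps

print(f"Full-batch gradient:  {full_grad:.6f}")
print(f"Accumulated gradient: {accumulated_grad:.6f}")
```

The two values agree up to floating-point rounding, which is why the loss is divided by the number of accumulation steps before each backward pass.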

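3. Model Pruning

Pruning removes components of a model that contribute little to the final result. It reduces the model's size and inference time without sacrificing much accuracy.

Example

Pruning is not built into Hugging Face's Transformers library, but it can be done with PyTorch's lower-level utilities. This code example shows how to prune a single layer of the model -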

import torch
import torch.nn.utils.prune as prune

# Assume 'model' is an already loaded LlamaForCausalLM
# Prune 50% of the connections (smallest absolute weights) in one linear layer
layer = model.model.layers[0].mlp.gate_proj
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Check sparsity level
sparsity = 100. * float(torch.sum(layer.weight == 0)) / layer.weight.nelement()
print("Sparsity in gate_proj layer: {:.2f}%".format(sparsity))

Output

Sparsity in gate_proj layer: 50.00%

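The pruned weights are set to zero, which can reduce memory usage and inference time when the model is stored and executed with sparsity-aware formats or kernels, without much loss in performance.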
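
The idea behind L1 unstructured pruning can be shown without PyTorch: rank the weights by absolute value and zero out the smallest fraction. This is a simplified plain-Python sketch, and magnitude_prune and the weight values are hypothetical.

```python
def magnitude_prune(weights, amount):
    """Zero out the `amount` fraction of weights with the smallest magnitude."""
    n_prune = round(amount * len(weights))
    # Indices sorted by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]

weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4, -0.02, 0.6]
pruned = magnitude_prune(weights, amount=0.5)
sparsity = 100.0 * pruned.count(0.0) / len(pruned)

print(pruned)                        # [0.8, 0.0, 0.0, -0.9, 0.0, 0.4, 0.0, 0.6]
print(f"Sparsity: {sparsity:.2f}%")  # Sparsity: 50.00%
```

The largest weights survive regardless of their sign, because the ranking uses magnitude only.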

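4. Quantization

Quantization lowers the precision of model weights from 32-bit floating point to 8-bit integers, making the model faster and lighter at inference time.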

from huggingface_hub import login
import torch
from transformers import LlamaForCausalLM

login(token="<your_token>")

# Load pre-trained model
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model.eval()

# Dynamic quantization: store Linear-layer weights as int8 (runs on CPU)
quantized_model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

# Save the state dict of quantized model
torch.save(quantized_model.state_dict(), "quantized_Llama.pth")

Output (illustrative file sizes)

Quantized model size: 1.2 GB
Original model size: 3.5 GB

This significantly lowers memory consumption, which makes it more feasible to run Llama models on edge devices.
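
What quantization does to individual weights can be shown in plain Python. The sketch below uses a simplified symmetric per-tensor scheme, which is an assumption made for illustration and not the exact algorithm PyTorch applies; the function names and weight values are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.003, 0.9, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)  # [42, -127, 0, 90, -55]
print(f"scale={scale:.4f}, max_error={max_error:.4f}")
```

Each weight now needs one byte instead of four, at the cost of a rounding error of at most half the scale; here the small weight 0.003 is rounded to zero.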

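Reducing Training Time

Training time is a key driver of cost and productivity. Techniques that save training time include using pre-trained models, mixed precision, and distributed training.

1. Distributed Training

Using multiple compute units that run in parallel reduces the total time needed to complete each training epoch. Parallelizing the data and model computation during distributed training leads to faster convergence and shorter training time.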
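
The core of data-parallel training can be simulated in a single process: each worker computes a gradient on its own shard of the data, and the gradients are then averaged across workers. The sketch below is a conceptual illustration only; shard_indices, grad_mse, and the data are hypothetical, and real distributed training runs the workers as separate processes on separate devices.

```python
def shard_indices(num_samples, world_size, rank):
    """Give each worker every world_size-th sample, starting at its rank."""
    return list(range(rank, num_samples, world_size))

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# Toy data, roughly y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5
world_size = 4  # Number of simulated workers

# Each worker computes a gradient on its own shard
local_grads = []
for rank in range(world_size):
    idx = shard_indices(len(xs), world_size, rank)
    local_grads.append(grad_mse(w, [xs[i] for i in idx], [ys[i] for i in idx]))

# Averaging across workers plays the role of the all-reduce step
synced_grad = sum(local_grads) / world_size
single_worker_grad = grad_mse(w, xs, ys)

print(f"Averaged across workers: {synced_grad:.6f}")
print(f"Single worker, all data: {single_worker_grad:.6f}")
```

Because the shards are equally sized, the averaged gradient matches the single-worker gradient, while each worker only had to process a quarter of the data.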

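2. Mixed-Precision Training

Mixed-precision training runs most computations in 16-bit floating point while keeping precision-sensitive operations in 32-bit. This reduces memory usage and increases training speed.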

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from torch.cuda.amp import autocast, GradScaler

# Define a simple neural network model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

# Generate dummy dataset
X = torch.randn(1000, 10)
y = torch.randn(1000, 1)
dataset = TensorDataset(X, y)
train_dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Define model, criterion, optimizer
model = SimpleModel().cuda()  # Move model to GPU
criterion = nn.MSELoss()  # Mean Squared Error loss
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer

# Mixed Precision Training
scaler = GradScaler()
epochs = 10  # Define the number of epochs

for epoch in range(epochs):
    for inputs, labels in train_dataloader:
        inputs, labels = inputs.cuda(), labels.cuda()  # Move data to GPU

        with autocast():
            outputs = model(inputs)
            loss = criterion(outputs, labels)  # Calculate loss

        # Scale the loss and backpropagate
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()  # Update the scaler

        # Clear gradients for the next iteration
        optimizer.zero_grad()

Mixed-precision training reduces memory usage and increases training throughput, and it works best on newer GPUs.
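
Loss scaling is needed because very small gradient values underflow to zero in 16-bit floating point. The sketch below demonstrates the effect with the standard library's half-precision format; the gradient value and the fixed scale factor are hypothetical, and a gradient scaler typically adjusts the factor dynamically.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

tiny_grad = 1e-8      # Hypothetical gradient value, too small for FP16
loss_scale = 1024.0   # Hypothetical fixed scale factor

unscaled = to_fp16(tiny_grad)             # Underflows to zero in FP16
scaled = to_fp16(tiny_grad * loss_scale)  # Representable after scaling
recovered = scaled / loss_scale           # Unscale in higher precision

print(f"Without scaling: {unscaled}")
print(f"With scaling:    {recovered:.3e}")
```

Multiplying the loss by a constant multiplies every gradient by the same constant, so dividing the gradients by it before the optimizer step recovers values that would otherwise have been lost.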

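3. Using Pre-trained Models

Using a pre-trained model saves a great deal of time, because you can take an already trained Llama model and fine-tune it on your custom dataset.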

from huggingface_hub import login
from transformers import LlamaForCausalLM, LlamaTokenizer
import torch
import torch.optim as optim
from torch.utils.data import DataLoader

# Hugging Face login
login(token='YOUR_HUGGING_FACE_TOKEN')  # Replace with your Hugging Face token

# Load pre-trained model and tokenizer
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama has no padding token by default

train_dataset = ["Your custom dataset text sample 1", "Your custom dataset text sample 2"]
train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True)

# Define an optimizer
optimizer = optim.AdamW(model.parameters(), lr=5e-5)

# Set the model to training mode
model.train()

# Fine-tune on a custom dataset
for batch in train_dataloader:
    # Tokenize the input text and move it to the model's device
    inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True).to(model.device)

    # Labels are required for the model to return a loss; ignore padding positions
    labels = inputs["input_ids"].clone()
    labels[inputs["attention_mask"] == 0] = -100

    # Forward pass
    outputs = model(**inputs, labels=labels)
    loss = outputs.loss

    # Backward pass
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    print(f"Loss: {loss.item()}")  # Optionally print loss for monitoring

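Because a pre-trained model only needs fine-tuning rather than training from scratch, the time required for training drops significantly.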

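Improving Model Accuracy

The model's accuracy can be improved in several ways, including fine-tuning, transfer learning, and data augmentation.

1. Data Augmentation

Adding more varied examples through data augmentation makes the model more accurate, because it exposes the model to greater variability.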

from nlpaug.augmenter.word import SynonymAug

# Synonym augmentation
aug = SynonymAug(aug_src='wordnet')
augmented_text = aug.augment("The model is trained to generate text.")
print(augmented_text)

Output (varies between runs)

['The model can output text.']

Data augmentation can make your Llama model more robust by adding diversity to your training dataset.
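
The mechanism behind synonym augmentation can be sketched without external libraries. The synonym table and the function synonym_augment below are hypothetical and deliberately tiny; real augmenters draw synonyms from a lexical database such as WordNet.

```python
import random

# Tiny hand-written synonym table, for illustration only
SYNONYMS = {
    "trained": ["taught", "fitted"],
    "generate": ["produce", "create"],
    "text": ["prose", "content"],
}

def synonym_augment(sentence, p=0.5, seed=None):
    """Replace each known word with a random synonym with probability p."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        core = word.rstrip(".,")
        suffix = word[len(core):]  # Keep trailing punctuation
        if core.lower() in SYNONYMS and rng.random() < p:
            out.append(rng.choice(SYNONYMS[core.lower()]) + suffix)
        else:
            out.append(word)
    return " ".join(out)

sentence = "The model is trained to generate text."
augmented = synonym_augment(sentence, p=1.0, seed=0)
print(augmented)
```

Running the function several times with different seeds produces several variants of each training sentence, which is how augmentation enlarges the dataset.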

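2. Transfer Learning

Transfer learning lets you build on a model trained on a related task, so you can reach higher accuracy without needing a large amount of data.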

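from torch.optim import AdamW
from transformers import LlamaForSequenceClassification
from huggingface_hub import login

login(token='YOUR_HUGGING_FACE_TOKEN')

# Load pre-trained Llama model and fine-tune on a classification task
model = LlamaForSequenceClassification.from_pretrained("meta-llama/Llama-2-7b-chat-hf", num_labels=2)
model.config.pad_token_id = model.config.eos_token_id  # Needed to classify padded batches
optimizer = AdamW(model.parameters(), lr=5e-5)
model.train()

# Fine-tuning loop
# Assumes train_dataloader yields dicts of tensors: input_ids, attention_mask, labels
for batch in train_dataloader:
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()  # Clear gradients after every update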

This lets the Llama model reuse and adapt its existing knowledge to your specific task, which makes it more accurate.

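Summary

Optimizing Llama models is one of the most important steps toward efficient and effective machine learning solutions. Techniques such as hyperparameter tuning, gradient accumulation, pruning, quantization, and distributed training can greatly improve performance and reduce training time. Improving accuracy through data augmentation and transfer learning makes the model more robust and reliable.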
