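Getting Started with Llama

Llama stands for Large Language Model Meta AI. Developed by Meta AI, it refines the Transformer architecture to handle harder problems in natural language processing. Llama generates human-like text and supports tasks such as text generation, translation, and summarization.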

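Llama is designed to reach strong performance with far fewer parameters than comparable models such as GPT-3. Because smaller models are cheaper to run, Llama is accessible to a wider range of users, and it scales across several model sizes.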

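Llama Architecture Overview

The Transformer is Llama's core architecture. It was first introduced by Vaswani et al. in the paper "Attention Is All You Need". Llama uses it as an autoregressive model: it generates one token at a time, predicting the next token in the sequence from the tokens that came before it.

The key features of the Llama architecture are: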

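Efficient training - Llama is designed to train and run efficiently at a modest model size, which makes it well suited to research and applications where compute is limited.
Autoregressive structure - It generates tokens one by one, which keeps the generated text coherent because each new token is conditioned on all the tokens so far.
Multi-head self-attention - The attention mechanism assigns different weights to the words in a sentence according to their importance, so the model captures both local and global context in the input.
Stacked Transformer layers - Llama stacks many Transformer blocks, each consisting of a self-attention mechanism and a feed-forward neural network.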
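
To make the autoregressive loop concrete, here is a minimal plain-Python sketch of greedy decoding. The score table and the `toy_scores` and `generate` names are hypothetical stand-ins for a trained model. Unlike a real Transformer, this toy only looks at the last token rather than the full context.

```python
# Hypothetical next-token scores standing in for a trained model.
# Keys are the most recent token; values map candidate next tokens to scores.
NEXT_TOKEN_SCORES = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"future": 0.7, "past": 0.3},
    "future": {"is": 0.8, "was": 0.2},
    "is": {"bright": 0.6, "near": 0.4},
    "bright": {"</s>": 1.0},
}


def toy_scores(tokens):
    """Return next-token scores for a context.

    A real model conditions on every token in `tokens`;
    this simplified stand-in only uses the last one.
    """
    return NEXT_TOKEN_SCORES.get(tokens[-1], {})


def generate(prompt, max_new_tokens=10):
    """Greedy autoregressive decoding: append one token at a time."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        candidates = toy_scores(tokens)
        if not candidates:
            break  # no known continuation
        next_token = max(candidates, key=candidates.get)
        if next_token == "</s>":
            break  # end-of-sequence token
        tokens.append(next_token)
    return tokens


print(generate(["<s>", "the"]))
# ['<s>', 'the', 'future', 'is', 'bright']
```

Each pass through the loop feeds the sequence produced so far back into the scoring function, which is the pattern a real model follows at much larger scale.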

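Why Llama?

Llama balances model capacity against computational cost. It can generate long, coherent passages of text and handle a wide range of tasks, including question answering, summarization, and translation. Compared with some other large language models such as GPT-3, Llama models are smaller and cheaper to run, which opens this kind of work to more people.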

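Llama Variants

Llama comes in several versions, each with a different number of parameters:

Llama-7B = 7 billion parameters
Llama-13B = 13 billion parameters
Llama-30B = 30 billion parameters
Llama-65B = 65 billion parameters

This lets users pick the model version that fits their hardware and the needs of a particular task.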
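
One rough way to match a variant to your hardware is to estimate how much memory the weights alone take up. The sketch below multiplies the parameter counts listed above by the bytes per parameter of a given data type. The function name is made up for illustration, and the estimate ignores activations and other runtime overhead, so treat the numbers as a lower bound.

```python
# Parameter counts taken from the list of variants above.
VARIANTS = {
    "Llama-7B": 7_000_000_000,
    "Llama-13B": 13_000_000_000,
    "Llama-30B": 30_000_000_000,
    "Llama-65B": 65_000_000_000,
}

# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="float16"):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, num_params in VARIANTS.items():
    print(f"{name}: ~{weight_memory_gb(num_params):.0f} GB in float16")
```

For example, the 7-billion-parameter model needs about 14 GB for its weights in float16 and about 28 GB in float32.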

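Understanding the Model's Components

Llama's capabilities rest on a few key components. Let's go through each one and see how they work together to improve the model's overall performance.

Embedding Layer

Llama's embedding layer maps input tokens to high-dimensional vectors, which lets it capture semantic relationships between words. The intuition is that semantically similar tokens end up close to each other in a continuous vector space.

The embedding layer also prepares the input for the subsequent Transformer layers by producing vectors of the dimension those layers expect.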

import torch
import torch.nn as nn
# Embedding layer
embedding = nn.Embedding(num_embeddings=10000, embedding_dim=256)
# Tokenized input (for example: "The future is bright")
input_tokens = torch.LongTensor([2, 45, 103, 567])
# Output embedding
embedding_output = embedding(input_tokens)
print(embedding_output)

Output (the values differ from run to run because the embedding weights are randomly initialized)

tensor([[-0.4185, -0.5514, -0.8762,  ...,  0.7456,  0.2396,  2.4756],
        [ 0.7882,  0.8366,  0.1050,  ...,  0.2018, -0.2126,  0.7039],
        [ 0.3088, -0.3697,  0.1556,  ..., -0.9751, -0.0777, -1.3352],
        [ 0.7220, -0.7661,  0.2614,  ...,  1.2152,  1.6356,  0.6806]],
       grad_fn=<EmbeddingBackward0>)

This embedding representation also lets the model learn how tokens relate to one another in nuanced ways.
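
The idea that similar tokens sit close together can be illustrated without any library. The sketch below treats the embedding layer as a lookup table and compares vectors with cosine similarity. The table values, the token labels in the comments, and the function names are all made up for illustration; a trained model learns its own values.

```python
import math

# Hypothetical embedding table: one 3-dimensional vector per token id.
EMBEDDING_TABLE = [
    [0.0, 0.0, 0.0],  # id 0: padding
    [0.9, 0.1, 0.0],  # id 1: "cat"
    [0.8, 0.2, 0.1],  # id 2: "kitten"
    [0.0, 0.1, 0.9],  # id 3: "car"
]


def embed(token_ids):
    """Look up the vector for each token id (what an embedding layer does)."""
    return [EMBEDDING_TABLE[i] for i in token_ids]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


cat, kitten, car = embed([1, 2, 3])
print(f"cat vs kitten: {cosine_similarity(cat, kitten):.3f}")
print(f"cat vs car:    {cosine_similarity(cat, car):.3f}")
```

With these hand-picked vectors, "cat" scores much closer to "kitten" than to "car", which is the kind of geometry training is meant to produce.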

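Self-Attention Mechanism

Self-attention is the core mechanism of the Transformer that Llama builds on. It applies attention across the parts of a sentence to work out how each word relates to the others. Llama uses multi-head attention, which splits the attention mechanism into several heads so that the model can attend to different parts of the input sequence.

To do this, the model builds query, key, and value matrices and uses them to decide how much weight (attention) each word gives to every other word.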

import torch
import torch.nn.functional as F

# Sample query, key, value tensors
queries = torch.rand(1, 4, 16)  # (batch_size, seq_length, embedding_dim)
keys = torch.rand(1, 4, 16)
values = torch.rand(1, 4, 16)

# Compute scaled dot-product attention
scores = torch.bmm(queries, keys.transpose(1, 2)) / (16 ** 0.5)
attention_weights = F.softmax(scores, dim=-1)

# apply attention weights to values
output = torch.bmm(attention_weights, values)
print(output)

Output (the values differ from run to run because the inputs are random)

tensor([[[0.4782, 0.5340, 0.4079, 0.4829, 0.4172, 0.5398, 0.3584, 0.6369,
          0.5429, 0.7614, 0.5928, 0.5989, 0.6796, 0.7634, 0.6868, 0.5903],
         [0.4651, 0.5553, 0.4406, 0.4909, 0.3724, 0.5828, 0.3781, 0.6293,
          0.5463, 0.7658, 0.5828, 0.5964, 0.6699, 0.7652, 0.6770, 0.5583],
         [0.4675, 0.5414, 0.4212, 0.4895, 0.3983, 0.5619, 0.3676, 0.6234,
          0.5400, 0.7646, 0.5865, 0.5936, 0.6742, 0.7704, 0.6792, 0.5767],
         [0.4722, 0.5550, 0.4352, 0.4829, 0.3769, 0.5802, 0.3673, 0.6354,
          0.5525, 0.7641, 0.5722, 0.6045, 0.6644, 0.7693, 0.6745, 0.5674]]])

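This attention mechanism lets the model "attend" to different parts of the sequence, so it can learn long-range dependencies between the words in a sentence.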
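
In an autoregressive model, a position can't look at tokens that come after it, so the attention scores for future positions are masked out before the softmax. The following is a minimal plain-Python sketch of that causal mask, not Llama's actual implementation. The function names and the tiny example vectors are made up for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    largest = max(row)
    exps = [math.exp(x - largest) for x in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(queries, keys):
    """Scaled dot-product attention weights with a causal mask.

    Position i may only attend to positions j <= i.
    """
    dim = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        row = []
        for j, k in enumerate(keys):
            if j > i:
                row.append(float("-inf"))  # hide future positions
            else:
                dot = sum(a * b for a, b in zip(q, k))
                row.append(dot / math.sqrt(dim))
        weights.append(softmax(row))
    return weights


# Three positions with 2-dimensional query/key vectors (illustrative values).
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = causal_attention_weights(vectors, vectors)
for row in weights:
    print([round(w, 3) for w in row])
```

The first row puts all of its weight on the first position, because nothing earlier exists, and every entry above the diagonal is zero.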

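Multi-Head Attention

Multi-head attention extends self-attention by applying several attention heads in parallel. Each head can focus on a different aspect of the input, which helps the model capture a wider range of dependencies in the data.

The outputs of the heads are then concatenated and passed through a final linear projection before going on to the feed-forward network.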

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    def __init__(self, dim_model, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.dim_head = dim_model // num_heads

        # Linear projections for queries, keys, values, and the final output
        self.query = nn.Linear(dim_model, dim_model)
        self.key = nn.Linear(dim_model, dim_model)
        self.value = nn.Linear(dim_model, dim_model)
        self.out = nn.Linear(dim_model, dim_model)

    def forward(self, x):
        B, N, C = x.shape  # (batch_size, seq_length, dim_model)
        # Project, then split the last dimension into heads: (B, num_heads, N, dim_head)
        queries = self.query(x).reshape(B, N, self.num_heads, self.dim_head).transpose(1, 2)
        keys = self.key(x).reshape(B, N, self.num_heads, self.dim_head).transpose(1, 2)
        values = self.value(x).reshape(B, N, self.num_heads, self.dim_head).transpose(1, 2)
        # Scaled dot-product attention, computed independently for each head
        scores = torch.matmul(queries, keys.transpose(-2, -1)) / (self.dim_head ** 0.5)
        attention_weights = F.softmax(scores, dim=-1)
        # Merge the heads back into (B, N, C) and apply the output projection
        out = torch.matmul(attention_weights, values).transpose(1, 2).reshape(B, N, C)
        return self.out(out)

# Build the multi-head attention layer and call it
attention_layer = MultiHeadAttention(128, 8)
output = attention_layer(torch.rand(1, 10, 128))  # (batch_size, seq_length, embedding_dim)
print(output)

Output (the values differ from run to run because the weights and inputs are random)

tensor([[[-0.1015, -0.1076,  0.2237,  ...,  0.1794, -0.3297,  0.1177],
         [-0.1028, -0.1068,  0.2219,  ...,  0.1798, -0.3307,  0.1175],
         [-0.1018, -0.1070,  0.2228,  ...,  0.1793, -0.3294,  0.1183],
         ...,
         [-0.1021, -0.1075,  0.2245,  ...,  0.1803, -0.3312,  0.1171],
         [-0.1041, -0.1070,  0.2232,  ...,  0.1817, -0.3308,  0.1184],
         [-0.1027, -0.1087,  0.2223,  ...,  0.1801, -0.3295,  0.1179]]],
       grad_fn=<ViewBackward0>)

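Feed-Forward Network

The feed-forward network is perhaps the simplest yet most essential part of a Transformer block. It applies a non-linear transformation to each position of the input sequence, which lets the model learn more complex patterns.

Every Llama layer follows its attention sub-layer with a feed-forward network that performs this transformation. The example below uses a plain ReLU network for simplicity; Llama itself uses a SwiGLU-gated variant.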

import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, dim_model, dim_ff):
        super().__init__()
        self.fc1 = nn.Linear(dim_model, dim_ff)  # expand to the hidden dimension
        self.fc2 = nn.Linear(dim_ff, dim_model)  # project back to the model dimension
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

# Define and use the feed-forward network
ffn = FeedForward(128, 512)
ffn_output = ffn(torch.rand(1, 10, 128))  # (batch_size, seq_length, embedding_dim)
print(ffn_output)

Output (the values differ from run to run because the weights and inputs are random)

tensor([[[ 0.0222, -0.1035, -0.1494,  ...,  0.0891,  0.2920, -0.1607],
         [ 0.0313, -0.2393, -0.2456,  ...,  0.0704,  0.1300, -0.1176],
         [-0.0838, -0.0756, -0.1824,  ...,  0.2570,  0.0700, -0.1471],
         ...,
         [ 0.0146, -0.0733, -0.0649,  ...,  0.0465,  0.2674, -0.1506],
         [-0.0152, -0.0657, -0.0991,  ...,  0.2389,  0.2404, -0.1785],
         [ 0.0095, -0.1162, -0.0693,  ...,  0.0919,  0.1621, -0.1421]]],
       grad_fn=<ViewBackward0>)
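
To get a feel for where the parameters of a block live, the sketch below counts the weights and biases in the MultiHeadAttention and FeedForward classes defined above, using the same sizes (128 and 512). The helper names are made up for illustration. The count covers only these two simplified example classes, not a full Llama layer.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameters in one fully connected layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


def attention_params(dim_model):
    """Query, key, value, and output projections, each dim_model x dim_model."""
    return 4 * linear_params(dim_model, dim_model)


def feed_forward_params(dim_model, dim_ff):
    """Two linear layers: dim_model -> dim_ff -> dim_model."""
    return linear_params(dim_model, dim_ff) + linear_params(dim_ff, dim_model)


print(attention_params(128))          # 66048
print(feed_forward_params(128, 512))  # 131712
```

With a hidden size four times the model dimension, the feed-forward network holds roughly twice as many parameters as the attention projections.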

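Steps to Create an Access Token for Llama Models

Before you can access Llama models, you need to create an access token on Hugging Face. We use a Llama 2 model here because it is relatively lightweight, but you can choose any model. Follow the steps below to get started.

Step 1: Sign up for a Hugging Face account (if you haven't already)

On the Hugging Face home page, click "Sign Up".
If you don't have an account yet, create one now.

Step 2: Fill out the request form to access Llama models

To download and use Llama models, you need to fill out a request form. To do so:

Go to the Llama download page and fill in all the required fields.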

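Select your model (we use Llama 2 here because it is simple and lightweight) and click Next on the form.
Accept the Llama 2 terms and conditions, then click "Accept and continue".
Your access is now set up.

Step 3: Get an access token

Go to your Hugging Face account.
Click your profile picture in the top-right corner and open the Settings page.
Navigate to Access Tokens.
Click Create new token:
- Give it a name, for example "Llama access token".
- Select the user permissions. The scope must be at least Read to access gated models.
- Click Create token.
Copy the token; you will use it in the next step.

Step 4: Authenticate in your script with the token

Once you have your Hugging Face token, use it to authenticate in your Python script.

First, install the required packages if you haven't already (the leading ! is for notebook cells; drop it in a regular shell):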

!pip install transformers huggingface_hub torch

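Import the login function from Hugging Face Hub and log in with your token:

from huggingface_hub import login
# Replace <your_token> with your access token
login(token="<your_token>")

Alternatively, instead of calling login(), you can pass your token directly in code when you load the model.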

Step 5: Update your code to load the model with the token

Use your token to load the gated model.

The token can be passed directly to the from_pretrained() method.

from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import login

token = "<your_token>"  # replace with your access token
# Log in with your token
login(token=token)

# Load the tokenizer and model from the gated repository, passing the token
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf', token=token)
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf', token=token)
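
The script above writes the token directly into the source file, which makes it easy to leak if the file is shared. One possible alternative, sketched here with only the standard library, is to read the token from an environment variable. The helper name `read_hf_token` is made up for illustration, and `HF_TOKEN` is the variable name assumed in this sketch.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the access token stored in an environment variable.

    Raises RuntimeError if the variable is missing or empty, so the
    script fails early with a clear message instead of during download.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face access token."
        )
    return token


if __name__ == "__main__":
    try:
        token = read_hf_token()
        print("Token loaded from the environment.")
    except RuntimeError as err:
        print(err)
```

The returned value can then be used wherever the scripts in this tutorial use token, for example login(token=token) or from_pretrained(..., token=token).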

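Step 6: Run the code

After inserting your token and either logging in or passing the token to the model-loading functions, your script should be able to access the gated repository and generate text with the Llama model.

Running Your First Llama Script

With the token and authentication in place, it's time to run your first Llama script. You can use a pretrained Llama model for text generation. We use Llama-2-7b-hf, one of the Llama 2 models.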

from transformers import AutoModelForCausalLM, AutoTokenizer

token = "<your_token>"  # replace with your access token

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf', token=token)
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf', token=token)

# Encode the input text and generate a continuation
input_text = "The future of AI is"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
# max_length counts the prompt tokens plus the newly generated tokens
outputs = model.generate(input_ids, max_length=50, num_return_sequences=1)

# Decode and print the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Output

The future of AI is a subject of great interest, and it is not surprising
that many people are interested in the subject. It is a very interesting
topic, and it is a subject that is likely to be discussed for many years to come

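Generated text - The script above produces a text sequence that shows how Llama interprets the context and writes a coherent continuation.

Summary

With its Transformer-based architecture, multi-head attention, and autoregressive generation, Llama is an impressive model. Its balance between computational efficiency and model performance makes it suitable for a wide range of natural language processing tasks. Knowing Llama's main components and architecture puts you in a position to experiment with text generation, translation, summarization, and more.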