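Building a Basic Hard-Coded Chatbot with Python and NLTK

What Is a Chatbot?

Chatbots have grown popular in recent years because they automate simple conversations between users and software platforms. A chatbot responds to user input and, to varying degrees, understands natural language. NLTK (the Natural Language Toolkit) is a Python library for natural language processing (NLP) tasks. In this tutorial, we will use NLTK to build a simple hard-coded chatbot.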

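What Are the Core Concepts Behind Building a Chatbot?

The core concepts behind building a chatbot include:

Natural language processing (NLP) - Chatbots use NLP to understand human language and interpret the user's intent. NLP includes techniques such as tokenization, part-of-speech tagging, and named entity recognition.
Dialogue management - Dialogue management controls the flow of the conversation and keeps context across multiple turns.
Machine learning - Machine learning is used to train a chatbot to recognize patterns in data, make predictions, and improve over time. Chatbot development draws on supervised learning, unsupervised learning, and reinforcement learning.
APIs and integrations - Chatbots often need to integrate with external services and APIs to provide information or complete tasks for the user.
User experience (UX) - User experience is critical for chatbots, which should be intuitive and easy to use. UX considerations include designing the conversation flow, choosing suitable response types, and giving users clear and helpful feedback.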
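
To make the idea of dialogue management concrete, here is a minimal sketch of a bot that keeps context between turns. The `Dialogue` class, its `context` dictionary, and the "my name is" rule are all hypothetical names chosen for illustration; they are not part of NLTK or of the chatbot built later in this tutorial.

```python
import re


class Dialogue:
    """Tiny dialogue manager that remembers the user's name across turns."""

    def __init__(self):
        self.context = {}   # facts remembered between turns
        self.history = []   # every user message, in order

    def reply(self, message):
        self.history.append(message)
        # Hypothetical rule: capture the name when the user introduces themselves
        match = re.search(r"my name is (\w+)", message, re.IGNORECASE)
        if match:
            self.context["name"] = match.group(1)
            return f"Nice to meet you, {self.context['name']}!"
        # Later turns can reuse what was stored earlier
        if "name" in self.context:
            return f"Tell me more, {self.context['name']}."
        return "Tell me more."


bot = Dialogue()
print(bot.reply("Hello"))
print(bot.reply("My name is Ada"))
print(bot.reply("I like Python"))
```

The third reply uses the name captured in the second turn, which is the kind of context a purely stateless pattern matcher cannot provide.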

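Prerequisites

Before we dive into the task, a few things need to be in place on your system:

Recommended setup:

pip install pandas matplotlib
You should have access to a standalone IDE such as VS Code, PyCharm, Atom, or Sublime Text.
You can also use an online Python environment such as Kaggle.com, Google Cloud Platform, or any similar service.
A recent version of Python. At the time of writing, I used version 3.10.9.
Familiarity with Jupyter Notebook.
Knowledge of virtual environments is helpful but not required.
A good understanding of statistics and mathematics is also expected.
Install NLTK (http://www.nltk.org/install.html).
Familiarity with text processing (tokenization, lemmatization, stemming).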

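Installing the Required Libraries

First, we need the libraries used to develop the chatbot: NLTK, re, random, and string. Only NLTK has to be installed with pip. The re, random, and string modules are part of the Python standard library and need no installation, and because the code uses the built-in re module, the third-party regex package is not required either. In a notebook cell, run:

!pip install nltk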
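
If you are unsure which modules are already available, a quick check like the one below can help. This is a minimal sketch; `missing_modules` and `REQUIRED` are hypothetical names introduced here, while `importlib.util.find_spec` is the standard-library call that looks a module up without importing it.

```python
import importlib.util

# Modules the tutorial imports. Only nltk is a third-party package;
# re, random and string ship with Python itself.
REQUIRED = ["nltk", "re", "random", "string"]


def missing_modules(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing:", missing or "none")
```

If nltk shows up as missing, install it with pip; the standard-library modules should never appear in the list.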

Importing the Required Libraries

Once the necessary libraries are installed, we import them in the Python notebook. The following code imports them.

import nltk
import re
import random
import string
from string import punctuation

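Data Preprocessing

After installing and importing the required packages, we need to preprocess the data. Preprocessing involves removing unnecessary data, tokenizing the text into sentences and words, and removing stop words. Stop words are very common words, such as "a" and "is", that carry little or no meaning in the context of a conversation.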

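# Download the tokenizer models and the stop-word list from nltk
# (recent NLTK releases may also need nltk.download('punkt_tab'))
nltk.download('punkt')
nltk.download('stopwords')
stop_words = set(nltk.corpus.stopwords.words('english'))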

def sentence_tokenizer(data):
   # Function for Sentence Tokenization
   return nltk.sent_tokenize(data.lower())

def word_tokenizer(data):
   # Function for Word Tokenization
   return nltk.word_tokenize(data.lower())

def remove_noise(word_tokens):
   # Function to remove stop words and punctuation
   cleaned_tokens = []
   for token in word_tokens:
      if token not in stop_words and token not in punctuation:
         cleaned_tokens.append(token)
   return cleaned_tokens
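
To see what this preprocessing does to a sentence without downloading any NLTK data, here is a rough stand-in that uses only the standard library. It is a sketch under stated assumptions: `DEMO_STOP_WORDS` is a hypothetical, much shorter substitute for NLTK's English stop-word list, and `simple_word_tokenizer` is a simplified regex tokenizer, not `nltk.word_tokenize`, so results on real text may differ.

```python
import re
from string import punctuation

# Hypothetical stand-in for NLTK's English stop-word list (much shorter).
DEMO_STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "what"}


def simple_word_tokenizer(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def remove_noise(word_tokens, stop_words=DEMO_STOP_WORDS):
    """Drop stop words and single punctuation characters."""
    return [t for t in word_tokens if t not in stop_words and t not in punctuation]


tokens = simple_word_tokenizer("What is the capital of France?")
print(tokens)                # ['what', 'is', 'the', 'capital', 'of', 'france', '?']
print(remove_noise(tokens))  # ['capital', 'france']
```

Note that the question mark is removed along with the stop words, so any later step that needs to detect a question has to look at the original text.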

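Building the Chatbot

Now that the preprocessing functions are in place, we are ready to build the chatbot. Its flow can be summarized in the following steps:

Define a list of patterns and responses
Start an infinite while loop
Let the user enter a query
Tokenize the query and remove stop words
Match the query against one of the patterns and return a response.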

# Define the Patterns and Responses. The first matching pattern wins,
# so the catch-all (\w+) pattern goes last. \b keeps 'hi' from matching inside 'this'.
patterns = [
   (r'\b(hi|hello|hey)\b', ['Hi there!', 'Hello!', 'Hey!']),
   (r'\b(bye|goodbye)\b', ['Bye', 'Goodbye!']),
   (r'\?', ['I’m sorry, but I can’t answer that', 'Please ask me another question', 'I’m not sure what you mean.']),
   (r'\w+', ['Yes, go on', 'Tell me more', 'I’m listening...'])
]

# Function to generate response for the user input
def generate_response(user_input, cleaned_tokens):
   # Append the cleaned tokens to chat history
   conversation_history.append(cleaned_tokens)
   # Match the raw text, because remove_noise strips the '?' that the question pattern needs
   for pattern, responses in patterns:
      if re.search(pattern, user_input.lower()):
         # Pick a random response for the first matching pattern
         return random.choice(responses)
   # Fallback when nothing matches, e.g. input with no letters or digits
   return 'I’m not sure what you mean.'

# Main loop of chatbot
conversation_history = []
while True:
   # User Input
   user_input = input("You: ")
   # End the Loop if the User Says Bye or Goodbye
   if user_input.strip().lower() in ['bye', 'goodbye']:
      print('Chatbot: Goodbye!')
      break
   # Tokenize the User Input
   user_input_tokenized = word_tokenizer(user_input)
   # Remove Stop Words
   user_input_nostops = remove_noise(user_input_tokenized)
   # Process Query and Generate Response
   chatbot_response = generate_response(user_input, user_input_nostops)
   # Print Response
   print('Chatbot:', chatbot_response)
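
Because the main loop blocks on `input()`, it is awkward to check the matching step on its own. The sketch below isolates that step in a small function that can be called directly. The names `RULES` and `respond`, the shortened reply lists, and the optional `rng` parameter are assumptions made for illustration, not part of the program above.

```python
import random
import re

# Hypothetical rule table: word boundaries (\b) stop 'hi' from matching
# inside words such as 'this', and the catch-all rule comes last.
RULES = [
    (r"\b(hi|hello|hey)\b", ["Hi there!", "Hello!", "Hey!"]),
    (r"\b(bye|goodbye)\b", ["Bye", "Goodbye!"]),
    (r"\?", ["Please ask me another question"]),
    (r"\w+", ["Yes, go on", "Tell me more"]),
]


def respond(text, rules=RULES, rng=random):
    """Return a random reply from the first rule whose pattern matches."""
    for pattern, replies in rules:
        if re.search(pattern, text.lower()):
            return rng.choice(replies)
    # Nothing matched, e.g. the input contained no letters, digits or '?'
    return "I'm not sure what you mean."


print(respond("Hello, bot"))
print(respond("Is this working?"))
print(respond("..."))
```

Passing a seeded `random.Random` instance as `rng` makes the chosen reply reproducible, which is convenient when writing tests for a rule table.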

The Final Program

import nltk
import re
import random
import string

from string import punctuation

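# Download the tokenizer models and the stop-word list from nltk
# (recent NLTK releases may also need nltk.download('punkt_tab'))
nltk.download('punkt')
nltk.download('stopwords')
stop_words = set(nltk.corpus.stopwords.words('english'))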

def sentence_tokenizer(data):
   # Function for Sentence Tokenization
   return nltk.sent_tokenize(data.lower())

def word_tokenizer(data):
   # Function for Word Tokenization
   return nltk.word_tokenize(data.lower())

def remove_noise(word_tokens):
   # Function to remove stop words and punctuation
   cleaned_tokens = []
   for token in word_tokens:
      if token not in stop_words and token not in punctuation:
         cleaned_tokens.append(token)
   return cleaned_tokens

# Define the Patterns and Responses. The first matching pattern wins,
# so the catch-all (\w+) pattern goes last. \b keeps 'hi' from matching inside 'this'.
patterns = [
   (r'\b(hi|hello|hey)\b', ['Hi there!', 'Hello!', 'Hey!']),
   (r'\b(bye|goodbye)\b', ['Bye', 'Goodbye!']),
   (r'\?', ['I’m sorry, but I can’t answer that', 'Please ask me another question', 'I’m not sure what you mean.']),
   (r'\w+', ['Yes, go on', 'Tell me more', 'I’m listening...'])
]

# Function to generate response for the user input
def generate_response(user_input, cleaned_tokens):
   # Append the cleaned tokens to chat history
   conversation_history.append(cleaned_tokens)
   # Match the raw text, because remove_noise strips the '?' that the question pattern needs
   for pattern, responses in patterns:
      if re.search(pattern, user_input.lower()):
         # Pick a random response for the first matching pattern
         return random.choice(responses)
   # Fallback when nothing matches, e.g. input with no letters or digits
   return 'I’m not sure what you mean.'

# Main loop of chatbot
conversation_history = []
while True:
   # User Input
   user_input = input("You: ")
   # End the Loop if the User Says Bye or Goodbye
   if user_input.strip().lower() in ['bye', 'goodbye']:
      print('Chatbot: Goodbye!')
      break
   # Tokenize the User Input
   user_input_tokenized = word_tokenizer(user_input)
   # Remove Stop Words
   user_input_nostops = remove_noise(user_input_tokenized)
   # Process Query and Generate Response
   chatbot_response = generate_response(user_input, user_input_nostops)
   # Print Response
   print('Chatbot:', chatbot_response)