Python: Word Embedding Using Word2Vec

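Word embedding is a language modeling technique that maps words to vectors of real numbers. It represents words or phrases as points in a vector space with many dimensions. Word embeddings can be generated with a variety of methods, such as neural networks, co-occurrence matrices, and probabilistic models.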
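
To make the idea of mapping words to vectors concrete, here is a minimal sketch of one of the methods mentioned above, the co-occurrence matrix. The toy corpus, the function name `cooccurrence_vectors`, and the window size of 2 are assumptions chosen for illustration; this is not how Word2Vec itself builds its vectors.

```python
# Toy corpus (hypothetical, for illustration only): each sentence is a list of tokens
corpus = [
    ["alice", "saw", "the", "rabbit"],
    ["alice", "followed", "the", "rabbit"],
    ["the", "queen", "saw", "alice"],
]


def cooccurrence_vectors(sentences, window=2):
    """Count how often each word appears within `window` positions of every other word."""
    vocab = sorted({word for sentence in sentences for word in sentence})
    index = {word: i for i, word in enumerate(vocab)}
    # One count vector per word, with one dimension per vocabulary entry
    vectors = {word: [0] * len(vocab) for word in vocab}
    for sentence in sentences:
        for pos, word in enumerate(sentence):
            start = max(0, pos - window)
            end = min(len(sentence), pos + window + 1)
            for ctx_pos in range(start, end):
                if ctx_pos != pos:
                    vectors[word][index[sentence[ctx_pos]]] += 1
    return vocab, vectors


vocab, vectors = cooccurrence_vectors(corpus)
for word in vocab:
    print(word, vectors[word])
```

Each row is a simple count-based embedding: words that occur in similar contexts end up with similar rows. Neural methods such as Word2Vec learn much denser vectors, but the goal of capturing context is the same.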

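Word2Vec is a family of models for generating word embeddings. These models are shallow, two-layer neural networks with an input layer, a single hidden layer, and an output layer. There are two architectures: Continuous Bag of Words (CBOW), which predicts a word from its surrounding context, and Skip-gram, which predicts the surrounding context from a word.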
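
The input, hidden, and output layers can be sketched as a single forward pass of a CBOW-style network. This is a simplified illustration under stated assumptions, not gensim's implementation: the vocabulary size, embedding dimension, random weights, and the function name `cbow_forward` are all hypothetical, and it uses a full softmax where practical implementations typically use cheaper approximations such as negative sampling.

```python
import math
import random

random.seed(0)

VOCAB_SIZE = 10      # assumed toy vocabulary size
EMBEDDING_DIM = 4    # assumed size of the hidden layer (the embedding dimension)


def random_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Input-to-hidden weights: row i is the embedding of word i
input_weights = random_matrix(VOCAB_SIZE, EMBEDDING_DIM)
# Hidden-to-output weights: column j scores word j as the predicted word
output_weights = random_matrix(EMBEDDING_DIM, VOCAB_SIZE)


def cbow_forward(context_ids):
    """Predict a probability distribution over the vocabulary from context word ids."""
    # Hidden layer: average of the context words' embedding vectors
    hidden = [
        sum(input_weights[i][d] for i in context_ids) / len(context_ids)
        for d in range(EMBEDDING_DIM)
    ]
    # Output layer: one score per vocabulary word
    scores = [
        sum(hidden[d] * output_weights[d][j] for d in range(EMBEDDING_DIM))
        for j in range(VOCAB_SIZE)
    ]
    # Softmax (shifted by the maximum score for numerical stability)
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = cbow_forward([1, 3, 5, 7])
print(len(probs), round(sum(probs), 6))
```

After training, the rows of `input_weights` are what would be used as the word embeddings; the output layer only serves the prediction task.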

Example

# Import the required modules
import warnings

from gensim.models import Word2Vec
from nltk.tokenize import sent_tokenize, word_tokenize

warnings.filterwarnings(action='ignore')

# Note: sent_tokenize and word_tokenize need NLTK's tokenizer data (see nltk.download)
# Read the 'alice.txt' file (raw string so the backslashes are not treated as escapes)
with open(r"C:\Users\Vishesh\Desktop\alice.txt", "r") as sample:
    s = sample.read()
# Replace newline characters with spaces
f = s.replace("\n", " ")
data = []
# Iterate through each sentence in the file
for i in sent_tokenize(f):
    temp = []
    # Tokenize the sentence into lowercase words
    for j in word_tokenize(i):
        temp.append(j.lower())
    data.append(temp)
# Create the CBOW model (sg=0 is the default)
# gensim 4.x uses vector_size; in gensim 3.x this parameter was called size
model1 = Word2Vec(data, min_count=1, vector_size=100, window=5)
# Print results; word vectors are accessed through model.wv in gensim 4.x
# (similarity raises KeyError if a word is not in the vocabulary)
print("Cosine similarity between 'alice' and 'wonderland' - CBOW : ", model1.wv.similarity('alice', 'wonderland'))
print("Cosine similarity between 'alice' and 'machines' - CBOW : ", model1.wv.similarity('alice', 'machines'))
# Create the Skip-gram model (sg=1)
model2 = Word2Vec(data, min_count=1, vector_size=100, window=5, sg=1)
# Print results
print("Cosine similarity between 'alice' and 'wonderland' - Skip Gram : ", model2.wv.similarity('alice', 'wonderland'))
print("Cosine similarity between 'alice' and 'machines' - Skip Gram : ", model2.wv.similarity('alice', 'machines'))
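
The values printed by the example are cosine similarities between two word vectors. As a rough sketch of what that measure computes, the function below implements the standard formula (dot product divided by the product of the vector lengths) in plain Python. The function name and the sample vectors are made up for illustration and do not come from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical vectors: the first pair points in the same direction,
# the second pair is perpendicular
parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(parallel, orthogonal)
```

A value near 1 means the two words appear in very similar contexts according to the model, while a value near 0 means the vectors are unrelated.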

Nizamuddin Siddiqui

Updated on: 08-8-2020

