How can TensorFlow and Python be used to vectorize text from the Stack Overflow question dataset?

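TensorFlow is a machine learning framework provided by Google. It is open source and is used together with Python to implement algorithms, deep learning applications and much more. It is used for both research and production.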

The 'tensorflow' package can be installed on Windows using the following line of code:

pip install tensorflow

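A tensor is the data structure used in TensorFlow. Tensors are the values that travel along the edges of a flow graph, connecting one operation to the next. This flow graph is known as the 'data flow graph'. A tensor is nothing but a multidimensional array or list.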
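To make the "multidimensional array" idea concrete, here is a minimal sketch that builds three small tensors and inspects their shape, rank and data type. The variable names and values are made up for illustration only.

```python
import tensorflow as tf

# Illustrative tensors of increasing rank (values chosen arbitrarily)
scalar = tf.constant(7)                       # rank 0: a single number
vector = tf.constant([1.0, 2.0, 3.0])         # rank 1: a list of numbers
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])  # rank 2: a table of numbers

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # len(t.shape) is the number of dimensions (the rank) of the tensor
    print(name, "shape:", t.shape, "rank:", len(t.shape), "dtype:", t.dtype.name)
```

The text batches and the vectorized outputs used later in this article are tensors of exactly this kind, only with more elements.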

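We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration and gives free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook.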

Example

Following is the code snippet:

# Assumes int_vectorize_layer, binary_vectorize_text, int_vectorize_text and the
# raw_*_ds datasets were created in the earlier steps of the tutorial linked below.

# Look up the tokens stored at two vocabulary indices
print("1289 ---> ", int_vectorize_layer.get_vocabulary()[1289])
print("313 ---> ", int_vectorize_layer.get_vocabulary()[313])
print("Vocabulary size is : {}".format(len(int_vectorize_layer.get_vocabulary())))

# Apply the binary vectorization to every split
print("The text vectorization is applied to the training dataset")
binary_train_ds = raw_train_ds.map(binary_vectorize_text)
print("The text vectorization is applied to the validation dataset")
binary_val_ds = raw_val_ds.map(binary_vectorize_text)
print("The text vectorization is applied to the test dataset")
binary_test_ds = raw_test_ds.map(binary_vectorize_text)

# Apply the int vectorization to every split
int_train_ds = raw_train_ds.map(int_vectorize_text)
int_val_ds = raw_val_ds.map(int_vectorize_text)
int_test_ds = raw_test_ds.map(int_vectorize_text)

Code credit - https://www.tensorflow.org/tutorials/load_data/text

Output

1289 ---> substring
313 ---> 20
Vocabulary size is : 10000
The text vectorization is applied to the training dataset
The text vectorization is applied to the validation dataset
The text vectorization is applied to the test dataset

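Explanation

As a final preprocessing step, the 'TextVectorization' layers are applied to the training, validation and test datasets. Each split is mapped twice, once with the binary vectorization function and once with the int vectorization function, so that both representations are available for training and evaluation.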
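The snippet above relies on layers and helper functions defined earlier in the source tutorial. The following is a minimal, self-contained sketch of how they might be set up, not the tutorial's exact code. It assumes a recent TensorFlow release where the layer is available as tf.keras.layers.TextVectorization and the binary mode is named "multi_hot" (older releases called it "binary"). The four sample questions, their labels and the sequence length of 8 are hypothetical stand-ins for the real Stack Overflow data.

```python
import tensorflow as tf

# Hypothetical stand-in for the Stack Overflow dataset: batches of (text, label)
texts = [
    "how do i sort a list in python",
    "java null pointer exception when calling a method",
    "python list comprehension with a condition",
    "javascript array sort is not working",
]
labels = [3, 1, 3, 2]
raw_train_ds = tf.data.Dataset.from_tensor_slices((texts, labels)).batch(2)

VOCAB_SIZE = 10000       # upper bound on vocabulary size; this toy data uses far fewer tokens
MAX_SEQUENCE_LENGTH = 8  # assumed value for this toy example

# Binary (multi-hot) mode: one slot per vocabulary token, 1 if the token occurs
binary_vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_mode="multi_hot")

# Int mode: each token is replaced by its vocabulary index, padded or truncated
int_vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_mode="int",
    output_sequence_length=MAX_SEQUENCE_LENGTH)

# Build both vocabularies from the training text only (labels are dropped)
train_text = raw_train_ds.map(lambda text, label: text)
binary_vectorize_layer.adapt(train_text)
int_vectorize_layer.adapt(train_text)

def binary_vectorize_text(text, label):
    # Vectorize the text and pass the label through unchanged
    return binary_vectorize_layer(text), label

def int_vectorize_text(text, label):
    return int_vectorize_layer(text), label

binary_train_ds = raw_train_ds.map(binary_vectorize_text)
int_train_ds = raw_train_ds.map(int_vectorize_text)

# Inspect one vectorized batch
int_batch, label_batch = next(iter(int_train_ds))
print("Int batch shape:", int_batch.shape)
print("First vocabulary entries:", int_vectorize_layer.get_vocabulary()[:5])
```

Calling adapt on the training text alone keeps validation and test vocabulary out of the preprocessing step, which is why the same fitted layers are then mapped over all three splits.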

AmitDiwan

Updated on: 18 January 2021
