How can a dataset of Stack Overflow questions be loaded using Python and TensorFlow?


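TensorFlow is an open-source machine learning framework provided by Google. It is used with Python to implement algorithms, deep learning applications, and more, in both research and production. It includes optimization techniques that help perform complex mathematical operations quickly.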

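This is because it works with NumPy and multidimensional arrays, which are also called "tensors". The framework supports deep neural networks, is highly scalable, and comes with many popular datasets. It uses GPU computation and manages resources automatically. It ships with a large collection of machine learning libraries and is well supported and documented. The framework can run deep neural network models, train them, and build applications that predict relevant features of the corresponding datasets.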

The "tensorflow" package can be installed on Windows with the following command:

pip install tensorflow
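After installing, it can be useful to confirm that the package is visible to the interpreter in use. The following is a minimal sketch using only the standard library; the helper name `tensorflow_available` is hypothetical and not part of TensorFlow.

```python
import importlib.util


def tensorflow_available() -> bool:
    """Return True if the 'tensorflow' package can be found by this interpreter."""
    return importlib.util.find_spec("tensorflow") is not None


if tensorflow_available():
    import tensorflow as tf
    print("TensorFlow version:", tf.__version__)
else:
    print("TensorFlow is not installed; run: pip install tensorflow")
```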

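The code below is run in Google Colaboratory. Google Colab (Colaboratory) runs Python code in the browser, requires no configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook. The following code snippet loads a dataset of Stack Overflow questions using Python: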

Example

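from tensorflow.keras import preprocessing

# train_dir is assumed to be defined beforehand and to point to the dataset's
# training directory, which contains one subfolder per class.
batch_size = 32
seed = 42
print("The training parameters have been defined")

# Build a labeled dataset from the directory, holding out 25% of the files for validation.
raw_train_ds = preprocessing.text_dataset_from_directory(
    train_dir,
    batch_size=batch_size,
    validation_split=0.25,
    subset='training',
    seed=seed)

# Print the first 10 questions (truncated to 100 bytes) and their labels from one batch.
for text_batch, label_batch in raw_train_ds.take(1):
    for i in range(10):
        print("Question:", text_batch.numpy()[i][:100], '...')
        print("Label:", label_batch.numpy()[i])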

Code source: https://www.tensorflow.org/tutorials/load_data/text
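The snippet above loads only the 'training' subset. A matching validation subset can be requested with the same `validation_split` and `seed` values, so that the two subsets do not overlap. The following is a hedged sketch: `load_validation_split` is a hypothetical wrapper, and it assumes `tf.keras.preprocessing.text_dataset_from_directory` is available in the installed TensorFlow version.

```python
def load_validation_split(train_dir, batch_size=32, seed=42):
    """Hypothetical helper: load the held-out 25% of files under train_dir.

    The split fraction and seed must match the ones used for the training
    subset; otherwise the two subsets may share files.
    """
    # Imported lazily so that defining this helper does not require TensorFlow.
    import tensorflow as tf

    return tf.keras.preprocessing.text_dataset_from_directory(
        train_dir,
        batch_size=batch_size,
        validation_split=0.25,
        subset='validation',
        seed=seed)
```

Calling `load_validation_split(train_dir)` with the same directory as above would return the files that were excluded from the training subset.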

Output

The training parameters have been defined
Found 8000 files belonging to 4 classes.
Using 6000 files for training.
Question: b'"my tester is going to the wrong constructor i am new to programming so if i ask a question that can' ...
Label: 1
Question: b'"blank code slow skin detection this code changes the color space to lab and using a threshold finds' ...
Label: 3
Question: b'"option and validation in blank i want to add a new option on my system where i want to add two text' ...
Label: 1
Question: b'"exception: dynamic sql generation for the updatecommand is not supported against a selectcommand th' ...
Label: 0
Question: b'"parameter with question mark and super in blank, i\'ve come across a method that is formatted like t' ...
Label: 1
Question: b'call two objects wsdl the first time i got a very strange wsdl. ..i would like to call the object (i' ...
Label: 0
Question: b'how to correctly make the icon for systemtray in blank using icon sizes of any dimension for systemt' ...
Label: 0
Question: b'"is there a way to check a variable that exists in a different script than the original one? i\'m try' ...
Label: 3
Question: b'"blank control flow i made a number which asks for 2 numbers with blank and responds with the corre' ...
Label: 0
Question: b'"credentials cannot be used for ntlm authentication i am getting org.apache.commons.httpclient.auth.' ...
Label: 1
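The file counts reported at the top of the output follow from the split fraction. The following is a minimal sketch of that arithmetic, assuming the validation count is the file count multiplied by the fraction and truncated to an integer; `split_counts` is a hypothetical helper, not a TensorFlow function.

```python
def split_counts(num_files: int, validation_split: float) -> tuple:
    """Return (training_count, validation_count) for a given split fraction."""
    num_val = int(validation_split * num_files)
    return num_files - num_val, num_val


train_count, val_count = split_counts(8000, 0.25)
print(train_count, "training files,", val_count, "validation files")
# prints: 6000 training files, 2000 validation files
```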

Explanation

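  • The data is loaded from disk and prepared in a form suitable for training.

  • The "text_dataset_from_directory" utility is used to create a labeled dataset.

  • "tf.data" is a powerful collection of tools for building input pipelines.

  • The directory structure, with one subfolder per class, is passed to the "text_dataset_from_directory" utility.

  • The Stack Overflow question dataset is divided into a training dataset and a test dataset.

  • A validation set is created from the training data with the "validation_split" argument. Here 25% of the files are held out, leaving 6000 of the 8000 files for training.

  • The labels are 0, 1, 2, or 3, one for each of the four classes.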
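The directory layout and the integer labels described above can be illustrated with a small stand-in dataset. This is a sketch using only the standard library: the class folder names (csharp, java, javascript, python), the sample texts, and both helper functions are assumptions made for illustration. It relies on the Keras convention that, when labels are inferred, class folders are indexed in alphanumeric order.

```python
import tempfile
from pathlib import Path


def make_toy_dataset(root: Path, samples: dict) -> None:
    """Write one subfolder per class, with one .txt file per sample text."""
    for class_name, texts in samples.items():
        class_dir = root / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i, text in enumerate(texts):
            (class_dir / f"{i}.txt").write_text(text, encoding="utf-8")


def infer_label_map(root: Path) -> dict:
    """Map each class folder name to an integer index in sorted order."""
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(class_names)}


# Hypothetical sample texts; the real dataset holds thousands of questions.
toy_samples = {
    "python": ["how do i reverse a list"],
    "java": ["what is a null pointer exception"],
    "csharp": ["how does linq work"],
    "javascript": ["why is this undefined"],
}

with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "train"
    make_toy_dataset(train_dir, toy_samples)
    label_map = infer_label_map(train_dir)

print(label_map)
# prints: {'csharp': 0, 'java': 1, 'javascript': 2, 'python': 3}
```

A directory built this way has the shape that "text_dataset_from_directory" expects for its first argument.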

Updated on: 18 January 2021
