How can TensorFlow be used in Python to prepare the IMDB dataset for training?
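TensorFlow is a machine learning framework provided by Google. It is an open-source framework used with Python to implement algorithms, deep learning applications, and more. It is used in both research and production. It includes optimization techniques that help perform complicated mathematical operations quickly.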
The "tensorflow" package can be installed on Windows with the following command:
pip install tensorflow
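A tensor is the data structure used in TensorFlow. Tensors flow along the edges of a graph known as the "data flow graph". A tensor is essentially a multidimensional array or list.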
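To make the "multidimensional array" idea concrete, here is a minimal plain-Python sketch that treats nested lists as a tensor and reports its shape. The helper `shape_of` is hypothetical and written only for illustration; it is not part of TensorFlow, where a tensor exposes the same information through its `shape` attribute.

```python
def shape_of(data):
    """Return the shape of a regular (non-ragged) nested list as a tuple."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]  # descend into the first element of the next axis
    return tuple(shape)


scalar = 7                          # rank 0: a single value
vector = [1, 2, 3]                  # rank 1: one axis
matrix = [[1, 2, 3], [4, 5, 6]]     # rank 2: rows and columns

for value in (scalar, vector, matrix):
    shape = shape_of(value)
    print("rank", len(shape), "shape", shape)
```

The number of axes (the rank) is simply the length of the shape tuple, which is why a single review string in the output further down is reported with `shape=()`.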
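The "IMDB" dataset contains 50,000 movie reviews. It is commonly used for tasks related to natural language processing, such as sentiment classification.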
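We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook.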
Following is the code snippet that prepares the IMDB dataset:
Example
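import tensorflow as tf

# Assumes raw_train_ds, raw_val_ds, raw_test_ds and an already adapted
# vectorize_layer were created earlier, as in the source tutorial.

def vectorize_text(text, label):
    # Add a trailing axis before passing the text to the vectorization layer
    text = tf.expand_dims(text, -1)
    return vectorize_layer(text), label

# Take one batch from the training set and inspect its first review
text_batch, label_batch = next(iter(raw_train_ds))
first_review, first_label = text_batch[0], label_batch[0]
print("Review is ", first_review)
print("Label is ", raw_train_ds.class_names[first_label])
print("Vectorized review is ", vectorize_text(first_review, first_label))

# Look up the tokens that two integer ids stand for
print("1222 ---> ", vectorize_layer.get_vocabulary()[1222])
print(" 451 ---> ", vectorize_layer.get_vocabulary()[451])
print('Vocabulary size: {}'.format(len(vectorize_layer.get_vocabulary())))

# Apply the vectorization to the training, validation and test sets
train_ds = raw_train_ds.map(vectorize_text)
val_ds = raw_val_ds.map(vectorize_text)
test_ds = raw_test_ds.map(vectorize_text)

Code credit: https://www.tensorflow.org/tutorials/keras/text_classification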
Output
Review is tf.Tensor(b'Silent Night, Deadly Night 5 is the very last of the series, and like part 4, it\'s unrelated to the first three except by title and the fact that it\'s a Christmas-themed horror flick.<br /><br />Except to the oblivious, there\'s some obvious things going on here...Mickey Rooney plays a toymaker named Joe Petto and his creepy son\'s name is Pino. Ring a bell, anyone? Now, a little boy named Derek heard a knock at the door one evening, and opened it to find a present on the doorstep for him. Even though it said "don\'t open till Christmas", he begins to open it anyway but is stopped by his dad, who scolds him and sends him to bed, and opens the gift himself. Inside is a little red ball that sprouts Santa arms and a head, and proceeds to kill dad. Oops, maybe he should have left well-enough alone. Of course Derek is then traumatized by the incident since he watched it from the stairs, but he doesn\'t grow up to be some killer Santa, he just stops talking.<br /><br />There\'s a mysterious stranger lurking around, who seems very interested in the toys that Joe Petto makes. We even see him buying a bunch when Derek\'s mom takes him to the store to find a gift for him to bring him out of his trauma. And what exactly is this guy doing? Well, we\'re not sure but he does seem to be taking these toys apart to see what makes them tick. He does keep his landlord from evicting him by promising him to pay him in cash the next day and presents him with a "Larry the Larvae" toy for his kid, but of course "Larry" is not a good toy and gets out of the box in the car and of course, well, things aren\'t pretty.<br /><br />Anyway, eventually what\'s going on with Joe Petto and Pino is of course revealed, and as with the old story, Pino is not a "real boy". Pino is probably even more agitated and naughty because he suffers from "Kenitalia" (a smooth plastic crotch) so that could account for his evil ways. And the identity of the lurking stranger is revealed too, and there\'s even kind of a happy ending of sorts. Whee.<br /><br />A step up from part 4, but not much of one. Again, Brian Yuzna is involved, and Screaming Mad George, so some decent special effects, but not enough to make this great. A few leftovers from part 4 are hanging around too, like Clint Howard and Neith Hunter, but that doesn\'t really make any difference. Anyway, I now have seeing the whole series out of my system. Now if I could get some of it out of my brain. 4 out of 5.', shape=(), dtype=string)
Label is neg
Vectorized review is (<tf.Tensor: shape=(1, 250), dtype=int64, numpy= array([[1287, 313, 2380, 313, 661, 7, 2, 52, 229, 5, 2, 200, 3, 38, 170, 669, 29, 5492, 6, 2, 83, 297, 549, 32, 410, 3, 2, 186, 12, 29, 4, 1, 191, 510, 549, 6, 2, 8229, 212, 46, 576, 175, 168, 20, 1, 5361, 290, 4, 1, 761, 969, 1, 3, 24, 935, 2271, 393, 7, 1, 1675, 4, 3747, 250, 148, 4, 112, 436, 761, 3529, 548, 4, 3633, 31, 2, 1331, 28, 2096, 3, 2912, 9, 6, 163, 4, 1006, 20, 2, 1, 15, 85, 53, 147, 9, 292, 89, 959, 2314, 984, 27, 762, 6, 959, 9, 564, 18, 7, 2140, 32, 24, 1254, 36, 1, 85, 3, 3298, 85, 6, 1410, 3, 1936, 2, 3408, 301, 965, 7, 4, 112, 740, 1977, 12, 1, 2014, 2772, 3, 4, 428, 3, 5177, 6, 512, 1254, 1, 278, 27, 139, 25, 308, 1, 579, 5, 259, 3529, 7, 92, 8981, 32, 2, 3842, 230, 27, 289, 9, 35, 2, 5712, 18, 27, 144, 2166, 56, 6, 26, 46, 466, 2014, 27, 40, 2745, 657, 212, 4, 1376, 3002, 7080, 183, 36, 180, 52, 920, 8, 2, 4028, 12, 969, 1, 158, 71, 53, 67, 85, 2754, 4, 734, 51, 1, 1611, 294, 85, 6, 2, 1164, 6, 163, 4, 3408, 15, 85, 6, 717, 85, 44, 5, 24, 7158, 3, 48, 604, 7, 11, 225, 384, 73, 65, 21, 242, 18, 27, 120, 295, 6, 26, 667, 129, 4028, 948, 6, 67, 48, 158, 93, 1]])>, <tf.Tensor: shape=(), dtype=int32, numpy=0>)
1222 ---> stick
451 ---> already
Vocabulary size: 10000
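Explanation
A function named "vectorize_text" is defined; it converts the given text into integer token ids so that the model can process it.
The IMDB dataset is used to train the model.
A sample review, its label, and its vectorized form are displayed on the console.
The training, validation, and test data are all vectorized.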
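The vectorization step summarized above can be approximated in plain Python to show what happens to each review. This is a minimal sketch, not the Keras implementation: the helpers `standardize`, `build_vocabulary` and `vectorize` are hypothetical names, and the sketch assumes the convention visible in the output above, where id 0 is padding and id 1 marks a word outside the vocabulary.

```python
import re
from collections import Counter

PAD_ID, UNK_ID = 0, 1  # assumed convention: 0 = padding, 1 = unknown word


def standardize(text):
    """Lowercase, drop HTML line breaks, and strip punctuation."""
    text = text.lower().replace("<br />", " ")
    return re.sub(r"[^\w\s]", "", text)


def build_vocabulary(texts, max_tokens):
    """List the most frequent words after the padding and unknown tokens."""
    counts = Counter(w for t in texts for w in standardize(t).split())
    words = [w for w, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + words


def vectorize(text, vocabulary, sequence_length):
    """Map words to integer ids, then truncate or zero-pad to a fixed length."""
    index = {word: i for i, word in enumerate(vocabulary)}
    ids = [index.get(w, UNK_ID) for w in standardize(text).split()]
    ids = ids[:sequence_length]
    return ids + [PAD_ID] * (sequence_length - len(ids))


# Toy reviews invented for illustration; they are not from the IMDB dataset
reviews = ["A great movie!<br />Great cast.", "A dull movie."]
vocab = build_vocabulary(reviews, max_tokens=10)
print(vocab)
print(vectorize("A great but odd movie", vocab, sequence_length=8))
```

Every review comes out as a list of the same length, which is why the vectorized review in the output has shape (1, 250), and words missing from the vocabulary all collapse to the same id.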