How can a ragged tensor be built from a list of words using TensorFlow and Python?

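A RaggedTensor of words can be built from the start offset of each word in a sentence. First, the word start offsets are determined. Next, the code points of every character in every word are grouped into one row per word, and the result is displayed on the console. Finally, the number of words in each sentence is computed.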
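To make the offset idea concrete, here is a minimal sketch of `tf.RaggedTensor.from_row_starts` on a single sentence, assuming TensorFlow 2.x in eager mode. The offsets `[0, 5, 7, 12]` are hard-coded for illustration (in the tutorial they come from script detection), and the flat code points are produced with Python's `ord()` rather than with TensorFlow.

```python
import tensorflow as tf

# Flat list of code points for one sentence, computed with Python's ord().
text = "Hello, there."
values = tf.constant([ord(c) for c in text], dtype=tf.int32)

# Hard-coded (illustrative) word offsets: index of each word's first character.
word_starts = tf.constant([0, 5, 7, 12], dtype=tf.int64)

# Row i spans values[word_starts[i]:word_starts[i + 1]]; the last row runs to the end.
words = tf.RaggedTensor.from_row_starts(values=values, row_starts=word_starts)
print(words)

# The same partition expressed as the number of characters per word.
print(words.row_lengths())
```

Each row ends where the next one starts, so only the start offsets are needed to split the flat values into words.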

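Unicode strings can be represented in Python and manipulated using the Unicode equivalents of standard string operations. Here, the Unicode strings are first split into tokens based on script detection: a new token starts wherever the Unicode script (for example Latin, Han, or Hiragana) changes from one character to the next.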
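A minimal sketch of script-based token detection on one string, assuming TensorFlow 2.x in eager mode. `tf.strings.unicode_script` returns a script code for each character, and a token starts at position 0 and wherever the script changes. The input `'世界こんにちは'` is chosen only for illustration because it mixes Han and Hiragana characters.

```python
import tensorflow as tf

# Decode a scalar string into a dense 1-D tensor of code points.
codepoints = tf.strings.unicode_decode(u'世界こんにちは', 'UTF-8')

# Unicode script code of every character (Han for 世界, Hiragana for こんにちは).
scripts = tf.strings.unicode_script(codepoints)

# A token starts at position 0 and wherever the script differs from the previous character.
starts_token = tf.concat([[True], tf.not_equal(scripts[1:], scripts[:-1])], axis=0)
print(starts_token)

# Turn the boolean mask into start offsets, split the code points, and encode each token.
token_starts = tf.squeeze(tf.where(starts_token), axis=1)
tokens = tf.strings.unicode_encode(
    tf.RaggedTensor.from_row_starts(values=codepoints, row_starts=token_starts), 'UTF-8')
token_strings = [t.decode('utf-8') for t in tokens.numpy()]
print(token_strings)
```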

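We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires no configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.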

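import tensorflow as tf

# Assumes sentence_char_codepoint, word_starts and sentence_char_starts_word
# were created in the earlier steps of the TensorFlow Unicode tutorial.
print("Get the code point of every character in every word")
# Split the flat code points into one row per word; each row begins at a word offset.
word_char_codepoint = tf.RaggedTensor.from_row_starts(
   values=sentence_char_codepoint.values,
   row_starts=word_starts)
print(word_char_codepoint)
print("Get the number of words in the specific sentence")
# Each True in sentence_char_starts_word marks a word start, so the row sums are word counts.
sentence_num_words = tf.reduce_sum(tf.cast(sentence_char_starts_word, tf.int64), axis=1)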

Code credit: https://www.tensorflow.org/tutorials/load_data/unicode

Output

Get the code point of every character in every word
<tf.RaggedTensor [[72, 101, 108, 108, 111], [44, 32], [116, 104, 101, 114, 101], [46], [19990, 30028], [12371, 12435, 12395, 12385, 12399]]>
Get the number of words in the specific sentence

Explanation

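The code points of every character in every word are grouped into one row per word.
These code points are displayed on the console.
The number of words in each sentence is computed as sentence_num_words; the snippet prints only its label, so no value follows the last line of the output.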
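The per-sentence word counts make it possible to group the flat list of words back into sentences. A minimal sketch of that step, following the same TensorFlow tutorial, uses `tf.RaggedTensor.from_row_lengths`; to stay self-contained it hard-codes the word rows from the output and assumes word counts of 4 and 2 for the two sentences.

```python
import tensorflow as tf

# Word-level code points, copied from the printed output.
word_char_codepoint = tf.ragged.constant(
    [[72, 101, 108, 108, 111], [44, 32], [116, 104, 101, 114, 101], [46],
     [19990, 30028], [12371, 12435, 12395, 12385, 12399]])

# Assumed words per sentence: 4 in "Hello, there.", 2 in "世界こんにちは".
sentence_num_words = tf.constant([4, 2], dtype=tf.int64)

# Row i of the result takes the next sentence_num_words[i] words.
sentence_word_char_codepoint = tf.RaggedTensor.from_row_lengths(
    values=word_char_codepoint, row_lengths=sentence_num_words)

# Encode every word back to UTF-8 so the grouping is easy to read.
sentence_words = tf.strings.unicode_encode(sentence_word_char_codepoint, 'UTF-8')
decoded = [[w.decode('utf-8') for w in words] for words in sentence_words.to_list()]
print(decoded)
```

`from_row_lengths` fits here because what is known per sentence is a count of words rather than an offset; `from_row_starts` would need the cumulative offsets instead.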

AmitDiwan

Updated on: 20-Feb-2021
