How can TensorFlow and Python be used to encode multiple strings of the same length?

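Multiple strings of the same length, given as Unicode code points, can be encoded by passing a tf.Tensor as the input. When the strings have different lengths, a tf.RaggedTensor should be used as the input instead. If a tensor holds multiple strings in padded or sparse format, it must first be converted to a tf.RaggedTensor, and unicode_encode is then called on the result.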

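Let us understand how to represent Unicode strings in Python and how to manipulate them using Unicode equivalents. First, we split the Unicode strings into tokens based on script detection, with the help of the Unicode equivalents of the standard string ops.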
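To make the idea of a Unicode string as a sequence of code points concrete, here is a minimal plain-Python sketch that does not use TensorFlow. The helper names encode_code_points and decode_to_code_points are hypothetical and exist only for this illustration.

```python
def encode_code_points(code_points, encoding="utf-8"):
    """Join Unicode code points into a string, then encode it to bytes."""
    return "".join(chr(cp) for cp in code_points).encode(encoding)


def decode_to_code_points(data, encoding="utf-8"):
    """Decode bytes to a string and return its Unicode code points."""
    return [ord(ch) for ch in data.decode(encoding)]


cat = encode_code_points([99, 97, 116])
print(cat)                         # b'cat'
print(decode_to_code_points(cat))  # [99, 97, 116]
```

This is the same mapping that tf.strings.unicode_encode applies to each row of code points, only for a single string at a time.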

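We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires no configuration, and provides free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook.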

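import tensorflow as tf

# The batch_* inputs were not defined in the original snippet. The sample values
# here are assumed for illustration, following the tutorial credited below.
batch_utf8 = [s.encode('UTF-8') for s in
              ['hÃllo', 'What is the weather tomorrow', 'Göödnight', '😊']]
batch_chars_ragged = tf.strings.unicode_decode(batch_utf8, input_encoding='UTF-8')
batch_chars_padded = batch_chars_ragged.to_tensor(default_value=-1)
batch_chars_sparse = batch_chars_ragged.to_sparse()

print("When encoding multiple strings of   same lengths, tf.Tensor is used as input")
tf.strings.unicode_encode([[99, 97, 116], [100, 111, 103], [99, 111, 119]], output_encoding='UTF-8')
print("When encoding multiple strings with varying length, a tf.RaggedTensor should be used as input:")
tf.strings.unicode_encode(batch_chars_ragged, output_encoding='UTF-8')
print("If there is a tensor with multiple strings in padded/sparse format, convert it to a tf.RaggedTensor before calling unicode_encode")
tf.strings.unicode_encode(
    tf.RaggedTensor.from_sparse(batch_chars_sparse),
    output_encoding='UTF-8')
tf.strings.unicode_encode(
    tf.RaggedTensor.from_tensor(batch_chars_padded, padding=-1),
    output_encoding='UTF-8')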

Code credit: https://www.tensorflow.org/tutorials/load_data/unicode

Output

When encoding multiple strings of   same lengths, tf.Tensor is used as input
When encoding multiple strings with varying length, a tf.RaggedTensor should be used as input:
If there is a tensor with multiple strings in padded/sparse format, convert it to a tf.RaggedTensor before calling unicode_encode
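The output listed above contains only the printed messages; the encoded tensors themselves are not shown. As a minimal sketch, assuming TensorFlow 2.x with eager execution, the result of the same-length case can be inspected by converting it to NumPy. The variable names code_points, encoded, and words are illustrative.

```python
import tensorflow as tf

# Three strings of equal length, given as rows of Unicode code points.
code_points = tf.constant([[99, 97, 116], [100, 111, 103], [99, 111, 119]])

# Each row is encoded into one UTF-8 string, giving a tensor of shape (3,).
encoded = tf.strings.unicode_encode(code_points, output_encoding='UTF-8')

# The tensor holds bytes objects, so decode them to Python str for display.
words = [w.decode('UTF-8') for w in encoded.numpy()]
print(words)  # ['cat', 'dog', 'cow']
```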

Explanation

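When encoding multiple strings of the same length, a tf.Tensor can be used as the input.
When encoding multiple strings of different lengths, a tf.RaggedTensor should be used as the input.
If a tensor contains multiple strings in padded or sparse format, it must be converted to a tf.RaggedTensor before unicode_encode is called on it.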
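The three cases above can be checked side by side. The following is a sketch under stated assumptions: TensorFlow 2.x with eager execution, and a small hypothetical batch named texts chosen only for illustration. It builds ragged, padded, and sparse versions of the same code points and confirms that all three encode back to the original strings.

```python
import tensorflow as tf

# Hypothetical sample batch with strings of different lengths.
texts = ['hi', 'hello', 'hey!']

# Decoding a batch of strings yields a tf.RaggedTensor of code points.
ragged = tf.strings.unicode_decode(texts, input_encoding='UTF-8')
padded = ragged.to_tensor(default_value=-1)  # dense tensor, padded with -1
sparse = ragged.to_sparse()                  # tf.SparseTensor

# Ragged input can be encoded directly.
from_ragged = tf.strings.unicode_encode(ragged, output_encoding='UTF-8')

# Padded and sparse inputs are converted to tf.RaggedTensor first.
from_padded = tf.strings.unicode_encode(
    tf.RaggedTensor.from_tensor(padded, padding=-1),
    output_encoding='UTF-8')
from_sparse = tf.strings.unicode_encode(
    tf.RaggedTensor.from_sparse(sparse),
    output_encoding='UTF-8')

expected = [t.encode('UTF-8') for t in texts]
print(from_ragged.numpy().tolist() == expected)
print(from_padded.numpy().tolist() == expected)
print(from_sparse.numpy().tolist() == expected)
```

Passing padding=-1 to from_tensor strips the trailing padding values, so the padding is not encoded as part of the strings.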

AmitDiwan

Updated on: 20 February 2021
