TensorFlow 如何用於不同字串表示之間的轉換？

編碼的字串標量可以使用“decode”方法轉換為程式碼點向量。程式碼點向量可以使用“encode”方法轉換為編碼的字串標量。“transcode”方法可以將編碼的字串標量轉換為不同的編碼。

閱讀更多：什麼是 TensorFlow，以及 Keras 如何與 TensorFlow 協作建立神經網路？

讓我們瞭解如何使用 Python 表示 Unicode 字串，以及如何使用 Unicode 等效項來操作它們。首先，我們藉助標準字串操作的 Unicode 等效項，基於指令碼檢測將 Unicode 字串分成標記。

我們使用 Google Colaboratory 來執行以下程式碼。Google Colab 或 Colaboratory 幫助在瀏覽器上執行 Python 程式碼，無需任何配置，並可免費訪問 GPU（圖形處理單元）。Colaboratory 基於 Jupyter Notebook 構建。

print("Converting encoded string scalar to a vector of code points")
tf.strings.unicode_decode(text_utf8,input_encoding='UTF-8')
print("Converting vector of code points to an encoded string scalar")
tf.strings.unicode_encode(text_chars, output_encoding='UTF-8')
print("Converting encoded string scalar to a different encoding")
tf.strings.unicode_transcode(text_utf8, input_encoding='UTF8', output_encoding='UTF-16-BE')

程式碼來源：https://www.tensorflow.org/tutorials/load_data/unicode

輸出

Converting encoded string scalar to a vector of code points
Converting vector of code points to an encoded string scalar
Converting encoded string scalar to a different encoding
<tf.Tensor: shape=(), dtype=string, numpy=b'\x8b\xed\x8a\x00Y\x04t\x06'>

解釋

函式 'unicode_decode' 用於將編碼的字串標量轉換為程式碼點向量。
函式 'unicode_encode' 用於將程式碼點向量轉換為編碼的字串標量。
函式 'unicode_transcode' 用於將編碼的字串標量轉換為不同的編碼。

AmitDiwan

更新於：2021年2月19日

267 次瀏覽

啟動您的職業生涯

透過完成課程獲得認證

開始學習