How can TensorFlow be used to work with character substrings in Python?

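Character substrings can be extracted with the `substr` method of TensorFlow's `strings` module. Calling `.numpy()` on the result converts it to a NumPy value that can be displayed.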

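We will learn how to represent Unicode strings in Python and manipulate them with the Unicode equivalents of standard string operations. With the help of these Unicode equivalents, a Unicode string can be split into tokens based on script detection.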

We use Google Colaboratory to run the code below. Google Colab (Colaboratory) runs Python code in the browser with no configuration and provides free access to GPUs (graphics processing units). Colaboratory is built on top of Jupyter Notebook.

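import tensorflow as tf
# Assumption: `thanks` is not defined in this snippet; this definition follows the cited tutorial.
thanks = u'Thanks 😊'.encode('UTF-8')

print("The default unit is byte")
print("When len is 1, a single byte is returned")
# Not printed: a notebook cell echoes only its last expression.
tf.strings.substr(thanks, pos=7, len=1).numpy()
print("The unit is specified as UTF8_CHAR")
print("It takes up 4 bytes")
print(tf.strings.substr(thanks, pos=7, len=1, unit='UTF8_CHAR').numpy())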

Code credit: https://www.tensorflow.org/tutorials/load_data/unicode

Output

The default unit is byte
When len is 1, a single byte is returned
The unit is specified as UTF8_CHAR
It takes up 4 bytes
b'\xf0\x9f\x98\x8a'

Explanation

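The tf.strings.substr operation accepts a "unit" argument, which is "BYTE" by default and can be set to "UTF8_CHAR".
It uses this argument to decide whether the "pos" and "len" arguments are counted in bytes or in UTF-8 characters.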
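The difference between the two kinds of offsets can be illustrated without TensorFlow. The following is a minimal plain-Python sketch, not TensorFlow's implementation: the helper names `substr_bytes` and `substr_chars` are hypothetical, and the sample text is assumed to match the string used in the cited tutorial.

```python
# Plain-Python analogue of byte offsets versus UTF-8 character offsets.
# Assumption: the sample text mirrors the cited tutorial's 'Thanks' + emoji string.
text = "Thanks \U0001F60A"
encoded = text.encode("utf-8")


def substr_bytes(data: bytes, pos: int, length: int) -> bytes:
    """Hypothetical helper: slice by byte offsets, similar to unit='BYTE'."""
    return data[pos:pos + length]


def substr_chars(data: bytes, pos: int, length: int) -> bytes:
    """Hypothetical helper: slice by character offsets, similar to unit='UTF8_CHAR'."""
    return data.decode("utf-8")[pos:pos + length].encode("utf-8")


byte_piece = substr_bytes(encoded, 7, 1)   # first byte of the emoji only
char_piece = substr_chars(encoded, 7, 1)   # the whole emoji, 4 bytes in UTF-8

print(len(text), len(encoded))  # character count, then byte count
print(byte_piece)
print(char_piece)
```

With byte offsets, position 7 lands on the first byte of the emoji, so a length of 1 cuts the character apart. With character offsets, the same position and length return the complete 4-byte character.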

AmitDiwan

Updated on: February 20, 2021
