NumPy Data Types

NumPy supports a much greater variety of numerical types than Python does. The following table lists the scalar data types defined in NumPy.

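Sr.No.	Data Type and Description
1	bool_ Boolean (True or False), stored as one byte
2	int_ Default integer type (traditionally the same as C long; normally int64 or int32)
3	intc Identical to C int (normally int32 or int64)
4	intp Integer used for indexing (same as C ssize_t; normally int32 or int64)
5	int8 Byte (-128 to 127)
6	int16 Integer (-32768 to 32767)
7	int32 Integer (-2147483648 to 2147483647)
8	int64 Integer (-9223372036854775808 to 9223372036854775807)
9	uint8 Unsigned integer (0 to 255)
10	uint16 Unsigned integer (0 to 65535)
11	uint32 Unsigned integer (0 to 4294967295)
12	uint64 Unsigned integer (0 to 18446744073709551615)
13	float_ Shorthand for float64 (this alias was removed in NumPy 2.0; use float64 instead)
14	float16 Half-precision float: sign bit, 5-bit exponent, 10-bit mantissa
15	float32 Single-precision float: sign bit, 8-bit exponent, 23-bit mantissa
16	float64 Double-precision float: sign bit, 11-bit exponent, 52-bit mantissa
17	complex_ Shorthand for complex128 (this alias was removed in NumPy 2.0; use complex128 instead)
18	complex64 Complex number represented by two 32-bit floats (real and imaginary parts)
19	complex128 Complex number represented by two 64-bit floats (real and imaginary parts)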
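The ranges and bit layouts in the table can be checked at run time. The following is a minimal sketch using np.iinfo and np.finfo; the particular types inspected are chosen only for illustration.

```python
import numpy as np

# Integer limits reported by NumPy match the ranges listed in the table
int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
print(int8_info.min, int8_info.max)      # -128 127
print(uint16_info.min, uint16_info.max)  # 0 65535

# Floating-point layout: nexp is the number of exponent bits,
# nmant the number of mantissa bits
half = np.finfo(np.float16)
single = np.finfo(np.float32)
print(half.nexp, half.nmant)      # 5 10
print(single.nexp, single.nmant)  # 8 23

# itemsize gives the storage size in bytes
bool_size = np.dtype(np.bool_).itemsize
complex64_size = np.dtype(np.complex64).itemsize
print(bool_size)       # 1
print(complex64_size)  # 8
```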

NumPy numerical types are instances of dtype (data type) objects, each having its own characteristics. The types are available as np.bool_, np.float32, and so on.

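Data Type Objects (dtype)

A data type object describes how the fixed block of memory corresponding to an array is interpreted, depending on the following aspects:

Type of data (integer, float, or Python object)
Size of the data
Byte order (little-endian or big-endian)
For a structured type, the names of the fields, the data type of each field, and the part of the memory block taken by each field.
If the data type is a subarray, its shape and data type.

The byte order is specified by prefixing the data type with '<' or '>'. '<' means the encoding is little-endian (the least significant byte is stored at the smallest address). '>' means the encoding is big-endian (the most significant byte is stored at the smallest address).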
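To make the two byte orders concrete, the sketch below stores the same value with both prefixes and prints the raw bytes. The value 1 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# The same value stored with explicit little- and big-endian dtypes
little = np.array([1], dtype='<i4')
big = np.array([1], dtype='>i4')

# tobytes() exposes the raw memory layout
little_bytes = little.tobytes()
big_bytes = big.tobytes()
print(little_bytes)  # b'\x01\x00\x00\x00'
print(big_bytes)     # b'\x00\x00\x00\x01'

# Both arrays still represent the same number
same_value = bool(little[0] == big[0])
print(same_value)  # True
```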

A dtype object is constructed using the following syntax:

numpy.dtype(object, align, copy)

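The parameters are:

object - The object to be converted to a data type object.
align - If True, adds padding to the fields so that the layout resembles a C struct.
copy - Makes a new copy of the dtype object. If False, the result is a reference to the built-in data type object.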
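The effect of the align parameter is easiest to see on a structured type. This is a small sketch; the field names flag and count are hypothetical, and the aligned sizes shown in the comments are what common platforms produce rather than a guarantee.

```python
import numpy as np

fields = [('flag', 'i1'), ('count', 'i4')]  # hypothetical field names

packed = np.dtype(fields)               # no padding between fields
aligned = np.dtype(fields, align=True)  # padding added as a C compiler would

print(packed.itemsize)             # 5: 1 byte + 4 bytes, tightly packed
print(aligned.itemsize)            # typically 8 on common platforms
print(packed.fields['count'][1])   # 1: byte offset of 'count' when packed
print(aligned.fields['count'][1])  # typically 4
print(aligned.isalignedstruct)     # True
```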

Example: Using an Array-Scalar Type

import numpy as np
dt = np.dtype(np.int32)
print(dt)

Following is the output obtained:

int32

Example: Using the Equivalent String for a Data Type

import numpy as np
dt = np.dtype('i4')
print(dt)

This produces the following result:

int32

Example: Using Byte-Order Notation

import numpy as np
dt = np.dtype('>i4')
print(dt)

Following is the output of the above code:

>i4

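Example: Creating a Structured Data Type

The following example shows the use of a structured data type. Here, the field name and the corresponding scalar data type are declared: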

import numpy as np
dt = np.dtype([('age', np.int8)])
print(dt)

The output obtained is as shown below:

[('age', 'i1')]

Example: Applying a Structured Data Type to an ndarray

import numpy as np
dt = np.dtype([('age', np.int8)])
a = np.array([(10,), (20,), (30,)], dtype=dt)
print(a)

After executing the above code, we get the following output:

[(10,) (20,) (30,)]

Example: Accessing the Contents of a Field of a Structured Data Type

import numpy as np
dt = np.dtype([('age', np.int8)])
a = np.array([(10,), (20,), (30,)], dtype=dt)
print(a['age'])

The result produced is as follows:

[10 20 30]

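Example: Defining a Complex Structured Data Type

The following example defines a structured data type called student with a string field 'name', an integer field 'age', and a float field 'marks'. This dtype is applied to an ndarray object in the example that follows: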

import numpy as np
student = np.dtype([('name', 'S20'), ('age', 'i1'), ('marks', 'f4')])
print(student)

We get the output as shown below:

[('name', 'S20'), ('age', 'i1'), ('marks', '<f4')]

Example: Applying a Complex Structured Data Type to an ndarray

import numpy as np
student = np.dtype([('name', 'S20'), ('age', 'i1'), ('marks', 'f4')])
a = np.array([('abc', 21, 50), ('xyz', 18, 75)], dtype=student)
print(a)

The output is as follows (under Python 3 the 'S20' field is shown as a bytes literal):

[(b'abc', 21, 50.) (b'xyz', 18, 75.)]
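Once a structured array exists, each field can be used like an ordinary array, and a condition on one field can select whole records. The sketch below reuses the student dtype from the example above; the age threshold of 20 is an arbitrary choice for illustration.

```python
import numpy as np

student = np.dtype([('name', 'S20'), ('age', 'i1'), ('marks', 'f4')])
a = np.array([('abc', 21, 50), ('xyz', 18, 75)], dtype=student)

# Each field behaves like an ordinary array
ages = a['age']
mean_marks = a['marks'].mean()
print(ages)        # [21 18]
print(mean_marks)  # 62.5

# A boolean mask built from one field selects whole records
older = a[a['age'] > 20]
print(older['name'])  # [b'abc']
```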

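Each built-in data type has a character code that identifies its kind:

'b' - boolean
'i' - (signed) integer
'u' - unsigned integer
'f' - floating-point
'c' - complex floating-point
'm' - timedelta
'M' - datetime
'O' - (Python) object
'S', 'a' - (byte) string ('a' is a legacy alias of 'S')
'U' - Unicode
'V' - raw data (void)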
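These codes are exposed through the kind attribute of a dtype object. The following sketch looks up the kind of a few representative types; the list of type strings is just a sample chosen for illustration.

```python
import numpy as np

# Collect the kind code of a few representative dtypes
examples = ['bool', 'int16', 'uint8', 'float32', 'complex64',
            'm8[s]', 'M8[D]', 'O', 'S5', 'U5', 'V4']
kinds = {name: np.dtype(name).kind for name in examples}

for name, kind in kinds.items():
    print(name, '->', kind)
```

Types of different sizes share a kind: int16 and int64 both report 'i', so the kind is usually combined with the item size (as in 'i4') to name a concrete type.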

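Checking the Data Type of an Array

You can check the data type of an array using its dtype attribute. This attribute returns a dtype object that describes the type of the elements in the array, as shown below: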

import numpy as np
a = np.array([1, 2, 3])
print(a.dtype)

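Following is the output obtained (int64 on most 64-bit platforms; the default integer type is platform-dependent):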

int64

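Creating Arrays with a Defined Data Type

In NumPy, you can explicitly specify the data type (dtype) of the elements when creating an array.

The dtype parameter of array creation functions such as np.array(), np.zeros(), and np.ones() defines the data type of the array elements. By default, NumPy infers the data type from the input data.

Example: Creating an Integer Array

In this example, we create an array named a whose element type is int32, meaning that each element is a 32-bit integer: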

import numpy as np

# Creating an array of integers with a specified dtype
a = np.array([1, 2, 3], dtype=np.int32)
print("Array:", a)
print("Data type:", a.dtype)

This produces the following result:

Array: [1 2 3]
Data type: int32

Example: Creating a Complex Array

Here, we create an array named c whose element type is complex64, a 64-bit complex number (32-bit real part and 32-bit imaginary part):

import numpy as np

# Creating an array of complex numbers with a specified dtype
c = np.array([1+2j, 3+4j, 5+6j], dtype=np.complex64)
print("Array:", c)
print("Data type:", c.dtype)

Following is the output of the above code:

Array: [1.+2.j 3.+4.j 5.+6.j]
Data type: complex64
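The same dtype parameter works with the other creation functions, and leaving it out lets NumPy pick a type from the data. The following is a minimal sketch; the array contents are arbitrary.

```python
import numpy as np

# Explicit dtype passed to creation functions
zeros = np.zeros(3, dtype=np.int16)
ones = np.ones(3, dtype=np.float32)
print(zeros, zeros.dtype)  # [0 0 0] int16
print(ones, ones.dtype)    # [1. 1. 1.] float32

# Without dtype, NumPy infers a type that can hold every input value
mixed = np.array([1, 2.5])
with_complex = np.array([1, 2 + 0j])
print(mixed.dtype)         # float64
print(with_complex.dtype)  # complex128

# np.zeros() and np.ones() default to float64 when dtype is omitted
default_zeros = np.zeros(3)
print(default_zeros.dtype)  # float64
```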

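Converting the Data Type of a NumPy Array

NumPy provides several ways to convert the data type of an array, letting you change how the data is stored and processed. Widening conversions preserve the values; narrowing ones, such as float to integer, may not.

astype() method - The most commonly used method for type conversion.
Type constructor functions - Scalar types such as numpy.int32() can be called on an array to convert it. (The older numpy.cast helper was removed in NumPy 2.0.)
Specifying the type at creation - Set the dtype directly when the array is created, so that no later conversion is needed.

Example: Using the "astype" Method

The astype method creates a copy of the array converted to the specified type. It is the most common way to change the data type of an array.

Here, we use the astype() method to convert an integer array to a floating-point data type: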

import numpy as np

# Creating an array of integers
a = np.array([1, 2, 3, 4, 5])
print("Original array:", a)
print("Original dtype:", a.dtype)

# Converting to float
a_float = a.astype(np.float32)
print("Converted array:", a_float)
print("Converted dtype:", a_float.dtype)

The output obtained is as shown below:

Original array: [1 2 3 4 5]
Original dtype: int64
Converted array: [1. 2. 3. 4. 5.]
Converted dtype: float32
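Converting in the other direction, from float to integer, discards information. The sketch below illustrates two properties of astype() worth keeping in mind: the fractional part is dropped rather than rounded, and the result is a new array. The sample values are arbitrary.

```python
import numpy as np

values = np.array([1.9, -1.9, 2.5])

# Float-to-integer conversion drops the fractional part (rounds toward zero)
truncated = values.astype(np.int32)
print(truncated)  # [ 1 -1  2]

# Round explicitly first if nearest-integer behaviour is wanted
# (np.rint rounds halves to the nearest even integer)
rounded = np.rint(values).astype(np.int32)
print(rounded)  # [ 2 -2  2]

# astype() returns a new array, so changing it leaves the original intact
modified = values.astype(np.int32)
modified[0] = 99
print(values)  # [ 1.9 -1.9  2.5]
```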

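Example: Using Type Constructor Functions

NumPy scalar types can also be called as functions to convert an array to a specific type. This is less common than astype(), but can be useful in some cases.

In this example, we create an array of floats and convert it to integers using the numpy.int32() function: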

import numpy as np

# Creating an array of floats
d = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
print("Original array:", d)
print("Original dtype:", d.dtype)

# Converting to integer using numpy.int32
d_int = np.int32(d)
print("Converted array:", d_int)
print("Converted dtype:", d_int.dtype)

After executing the above code, we get the following output:

Original array: [1.1 2.2 3.3 4.4 5.5]
Original dtype: float64
Converted array: [1 2 3 4 5]
Converted dtype: int32

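Example: Specifying the Type at Creation

You can also specify the data type when creating an array, which avoids the need to convert it later.

Now, we create an array from integer values while requesting a floating-point data type through the dtype=np.float32 argument: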

import numpy as np

# Creating an array from integer values with a float32 dtype
e = np.array([1, 2, 3, 4, 5], dtype=np.float32)
print("Array:", e)
print("Data type:", e.dtype)

The result produced is as follows:

Array: [1. 2. 3. 4. 5.]
Data type: float32

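What if a Value Cannot Be Converted?

When converting data types in NumPy, you may encounter values that cannot be represented in the target type. Depending on the case, this raises an error, issues a warning, or silently produces unexpected values.

Let us explore different scenarios in which values cannot be converted, and how to handle them:

Scenario 1: Converting Non-Numeric Strings to Numbers

If you try to convert a non-numeric string to an integer or a float, NumPy raises a ValueError, as shown below: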

import numpy as np

# Creating an array with non-numeric strings
a = np.array(['1', '2', 'three', '4', '5'])
print("Original array:", a)
print("Original dtype:", a.dtype)

try:
   # Attempting to convert to integer
   a_int = a.astype(np.int32)
   print("Converted array:", a_int)
   print("Converted dtype:", a_int.dtype)
except ValueError as e:
   print("Error:", e)

In this case, the string "three" cannot be converted to an integer, resulting in a ValueError as shown below:

Original array: ['1' '2' 'three' '4' '5']
Original dtype: <U5
Error: invalid literal for int() with base 10: 'three'

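Scenario 2: Converting Out-of-Range Numbers

astype() does not raise an error when a floating-point value lies outside the range of the target integer type. The result of such a cast is undefined and platform-dependent, and recent NumPy versions emit a RuntimeWarning rather than an exception. To detect the problem reliably, compare the values against the limits reported by np.iinfo before converting:

import numpy as np

# Creating an array with large float values
b = np.array([1.1e10, 2.2e10, 3.3e10])
print("Original array:", b)
print("Original dtype:", b.dtype)

# astype() does not raise on overflow, so check the range explicitly
info = np.iinfo(np.int32)
if (b < info.min).any() or (b > info.max).any():
   print("Error: values fall outside the int32 range")
else:
   b_int = b.astype(np.int32)
   print("Converted array:", b_int)
   print("Converted dtype:", b_int.dtype)

Here, the large float values do not fit in int32, so the range check reports the problem before any conversion takes place:

Original array: [1.1e+10 2.2e+10 3.3e+10]
Original dtype: float64
Error: values fall outside the int32 range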
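A cast that may lose information can also be rejected up front. The sketch below relies on np.can_cast, whose default rule is 'safe', and on the casting parameter of astype(); the array b is the one from this scenario.

```python
import numpy as np

# can_cast() reports whether a conversion is safe under NumPy's casting rules
int_to_float = np.can_cast(np.int32, np.float64)
float_to_int = np.can_cast(np.float64, np.int32)
print(int_to_float)  # True: every int32 value fits in float64
print(float_to_int)  # False: information may be lost

b = np.array([1.1e10, 2.2e10, 3.3e10])

# Requesting casting='safe' turns an unsafe astype() into a TypeError
try:
    b.astype(np.int32, casting='safe')
    outcome = "converted"
except TypeError as exc:
    outcome = "rejected"
    print("Error:", exc)

print(outcome)  # rejected
```

Note that the 'safe' rule looks only at the two types, not at the actual values, so it also rejects float arrays whose values would fit in int32.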

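Scenario 3: Converting Complex Numbers to Real Numbers

When converting complex numbers to real numbers, NumPy discards the imaginary part and issues a ComplexWarning: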

import numpy as np

# Creating an array with complex numbers
c = np.array([1+2j, 3+4j, 5+6j])
print("Original array:", c)
print("Original dtype:", c.dtype)

# Converting to float, discarding imaginary part
c_float = c.astype(np.float32)
print("Converted array:", c_float)
print("Converted dtype:", c_float.dtype)

In this case, NumPy issues a ComplexWarning and discards the imaginary part during the conversion:

Original array: [1.+2.j 3.+4.j 5.+6.j]
Original dtype: complex128
ComplexWarning: Casting complex values to real discards the imaginary part
  c_float = c.astype(np.float32)
Converted array: [1. 3. 5.]
Converted dtype: float32
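If dropping the imaginary part is intentional, taking the real attribute first states that intent and avoids the warning. This is a minimal sketch; the warnings filter is there only to demonstrate that no warning is issued.

```python
import warnings
import numpy as np

c = np.array([1+2j, 3+4j, 5+6j])

# Taking .real makes the intent explicit, so no warning is issued
with warnings.catch_warnings():
    warnings.simplefilter("error")  # any warning would become an exception
    real_part = c.real.astype(np.float32)

print(real_part)        # [1. 3. 5.]
print(real_part.dtype)  # float32

# The imaginary part and the magnitude remain available separately
imag_part = c.imag
magnitude = np.abs(c)
print(imag_part)  # [2. 4. 6.]
print(magnitude)
```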

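Scenario 4: Handling Conversion Errors

To handle conversion errors, you can use error-handling techniques such as try-except blocks to catch and handle the exceptions.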

import numpy as np

# Creating an array with mixed data
d = np.array(['1', '2', 'three', '4', '5'])
print("Original array:", d)
print("Original dtype:", d.dtype)

def safe_convert(arr, target_type):
   try:
      return arr.astype(target_type)
   except ValueError as e:
      print("Conversion error:", e)
      return None

# Attempting to convert to integer
d_int = safe_convert(d, np.int32)
if d_int is not None:
   print("Converted array:", d_int)
   print("Converted dtype:", d_int.dtype)
else:
   print("Conversion failed.")

In this example, the safe_convert() function catches the ValueError and handles it by printing the error message and returning None, as shown below:

Original array: ['1' '2' 'three' '4' '5']
Original dtype: <U5
Conversion error: invalid literal for int() with base 10: 'three'
Conversion failed.

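Scenario 5: Using "np.nan" for Invalid Conversions

For numeric conversions, you can use np.nan (Not a Number) to stand in for invalid values. This approach is useful when dealing with missing or corrupted data.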

import numpy as np

# Creating an array with strings, including an invalid entry
e = np.array(['1.1', '2.2', 'three', '4.4', '5.5'])
print("Original array:", e)
print("Original dtype:", e.dtype)

def convert_with_nan(arr):
   result = []
   for item in arr:
      try:
         result.append(float(item))
      except ValueError:
         result.append(np.nan)
   return np.array(result)

# Converting to float with np.nan for invalid entries
e_float = convert_with_nan(e)
print("Converted array:", e_float)
print("Converted dtype:", e_float.dtype)

Here, the invalid entry is replaced with np.nan:

Original array: ['1.1' '2.2' 'three' '4.4' '5.5']
Original dtype: <U5
Converted array: [1.1 2.2 nan 4.4 5.5]
Converted dtype: float64
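Once invalid entries are marked with np.nan, they can be located, skipped, or removed. The sketch below starts from an array equal to the converted result above and uses np.isnan and np.nanmean.

```python
import numpy as np

e_float = np.array([1.1, 2.2, np.nan, 4.4, 5.5])

# np.isnan() locates the entries that failed to convert
invalid = np.isnan(e_float)
print(invalid)        # [False False  True False False]
print(invalid.sum())  # 1

# Ordinary reductions propagate nan; the nan-aware variants skip it
plain_mean = np.mean(e_float)
valid_mean = np.nanmean(e_float)
print(plain_mean)  # nan
print(valid_mean)

# Keep only the valid values
clean = e_float[~invalid]
print(clean)  # [1.1 2.2 4.4 5.5]
```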

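Reinterpreting the Data Type of an Existing Array

You can also use the view() method on an existing array to change how its data is interpreted without changing the underlying bytes.

Example

Here, the data is reinterpreted as "float32". Because the underlying bytes stay the same, the resulting values are unexpected: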

import numpy as np

# Creating an array of integers
g = np.array([1, 2, 3, 4], dtype=np.int32)
print("Original array:", g)
print("Original dtype:", g.dtype)

# Viewing the array as float32
g_view = g.view(np.float32)
print("Viewed array:", g_view)
print("Viewed dtype:", g_view.dtype)

Following is the output of the above code:

Original array: [1 2 3 4]
Original dtype: int32
Viewed array: [1.4012985e-45 2.8025969e-45 4.2038954e-45 5.6051939e-45]
Viewed dtype: float32
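The difference between view() and astype() comes down to memory: a view shares the bytes of the original array, while astype() converts the values into newly allocated memory. The following sketch, using the same array g, illustrates this with np.shares_memory.

```python
import numpy as np

g = np.array([1, 2, 3, 4], dtype=np.int32)

# A view shares memory with the original array
g_bytes = g.view(np.uint8)
view_shares = np.shares_memory(g, g_bytes)
print(view_shares)    # True
print(g_bytes.shape)  # (16,): four bytes per int32 element

# astype() converts the values and allocates new memory instead
g_float = g.astype(np.float32)
copy_shares = np.shares_memory(g, g_float)
print(g_float)      # [1. 2. 3. 4.]
print(copy_shares)  # False

# Writing through the view changes the original array
g_bytes[:] = 0
print(g)  # [0 0 0 0]
```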
