How to implement random projection using Python Scikit-learn?


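Random projection is a dimensionality reduction and data visualization method that simplifies high-dimensional data. It is mainly used on data for which other dimensionality reduction techniques, such as **Principal Component Analysis** (PCA), are not well suited, typically because they become too expensive to compute when the number of features is very large.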

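Python Scikit-learn provides a module named sklearn.random_projection that implements a computationally efficient way to reduce the dimensionality of data. It implements the following two types of unstructured random matrices:

Gaussian random matrix
Sparse random matrix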
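As a quick orientation, the two matrix types above correspond to two transformer classes with the same interface. The sketch below applies both to made-up data; the data shape and n_components=500 are arbitrary choices for illustration, not values from this article.

```python
import numpy as np
from scipy import sparse as sp
from sklearn.random_projection import (
    GaussianRandomProjection,
    SparseRandomProjection,
)

# Made-up data: 100 samples with 10,000 features
X = np.random.RandomState(0).rand(100, 10000)

# n_components=500 is an arbitrary target dimension chosen for this sketch
gaussian = GaussianRandomProjection(n_components=500, random_state=0)
sparse_proj = SparseRandomProjection(n_components=500, random_state=0)

X_gauss = gaussian.fit_transform(X)
X_sparse = sparse_proj.fit_transform(X)

print(X_gauss.shape)   # (100, 500)
print(X_sparse.shape)  # (100, 500)

# The Gaussian projection matrix is a dense array,
# while the sparse one is stored in a SciPy sparse format
print(isinstance(gaussian.components_, np.ndarray))
print(sp.issparse(sparse_proj.components_))
```

Both transformers are used the same way; the practical difference lies in how the projection matrix is generated and stored.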

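Implementing Gaussian Random Projection

To implement a Gaussian random matrix, the random_projection module provides the GaussianRandomProjection class (a transformer rather than a plain function), which reduces dimensionality by projecting the original input space onto a randomly generated matrix whose entries are drawn from the distribution N(0, 1/n_components).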
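To make the projection step concrete, here is a minimal NumPy sketch of the same idea, followed by a check that the Scikit-learn transformer computes the product of the data with its transposed components_ matrix. The data shape and n_components=50 are assumptions made for this example only.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.RandomState(0)
X = rng.rand(20, 1000)   # made-up data: 20 samples, 1,000 features
n_components = 50        # arbitrary target dimension for this sketch

# Hand-rolled projection: entries drawn from N(0, 1 / n_components)
R = rng.normal(loc=0.0,
               scale=1.0 / np.sqrt(n_components),
               size=(n_components, X.shape[1]))
X_manual = X @ R.T
print(X_manual.shape)  # (20, 50)

# The Scikit-learn transformer stores its own random matrix in components_
grp = GaussianRandomProjection(n_components=n_components, random_state=0)
X_sklearn = grp.fit_transform(X)

# The transformed data equals X multiplied by the transposed matrix
same = np.allclose(X_sklearn, X @ grp.components_.T)
print(same)  # True
```

X_manual and X_sklearn differ numerically because they use different random matrices; only the construction is the same.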

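Example

Let's look at an example that uses the Gaussian random projection transformer and visualizes the values of the projection matrix as a histogram: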

# Importing the necessary packages
import numpy as np
from matplotlib import pyplot as plt
from sklearn.random_projection import GaussianRandomProjection

# Random data and its transformation
X_random = np.random.RandomState(0).rand(100, 10000)
gauss_data = GaussianRandomProjection(random_state=0)
X_transformed = gauss_data.fit_transform(X_random)

# Get the size of the transformed data
print('Shape of transformed data is: ' + str(X_transformed.shape))

# Set the figure size
plt.figure(figsize=(7.50, 3.50))
plt.subplots_adjust(bottom=0.05, top=0.9, left=0.05, right=0.95)

# Histogram for visualizing the elements of the transformation matrix
plt.hist(gauss_data.components_.flatten())
plt.title('Histogram of the flattened transformation matrix', size=18)
plt.show()

Output

It will produce the following output, along with the histogram plot:

Shape of transformed data is: (100, 3947)
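The target dimension of 3947 is not arbitrary. When n_components is left at its default of 'auto', the transformer derives it from the number of samples and the eps parameter (default 0.1) using the Johnson-Lindenstrauss bound. The sketch below reproduces that number with the helper function johnson_lindenstrauss_min_dim from the same module; the looser eps values in the loop are illustrative choices only.

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# Same setting as the example above: 100 samples, default eps=0.1
k_100 = johnson_lindenstrauss_min_dim(n_samples=100, eps=0.1)
print(k_100)  # 3947

# The bound depends on the number of samples, not the number of features
k_25 = johnson_lindenstrauss_min_dim(n_samples=25, eps=0.1)
print(k_25)  # 2759

# A larger eps tolerates more distortion and needs fewer dimensions
for eps in (0.2, 0.5):
    print(eps, johnson_lindenstrauss_min_dim(n_samples=100, eps=eps))
```

If the computed dimension is larger than the number of input features, fitting with n_components='auto' raises an error, so in that case n_components should be set explicitly.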

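Implementing Sparse Random Projection

To implement a sparse random matrix, the random_projection module provides the SparseRandomProjection class, which reduces dimensionality by projecting the original input space using a sparse random matrix.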
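The structure of that sparse matrix can be inspected directly. As a rough sketch: with the default density='auto', the fitted density_ is 1 / sqrt(n_features), and every non-zero entry is either +v or -v with v = sqrt((1 / density) / n_components). The data shape and n_components=200 below are assumptions made for this example.

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection

rng = np.random.RandomState(42)
X = rng.rand(25, 3000)   # made-up data: 25 samples, 3,000 features

# n_components=200 is an arbitrary choice for this sketch
srp = SparseRandomProjection(n_components=200, random_state=0)
srp.fit(X)

# With density='auto', the density is 1 / sqrt(n_features)
expected_density = 1.0 / np.sqrt(X.shape[1])
print(srp.density_, expected_density)

# Every stored (non-zero) entry has the same magnitude
expected_value = np.sqrt((1.0 / srp.density_) / srp.n_components_)
magnitudes = np.abs(srp.components_.data)
print(np.allclose(magnitudes, expected_value))  # True

# Fraction of entries that are actually stored, close to density_
total = srp.components_.shape[0] * srp.components_.shape[1]
stored_fraction = srp.components_.nnz / total
print(stored_fraction)
```

Because most entries are zero, the matrix takes less memory and the projection is cheaper to compute than with a dense Gaussian matrix of the same shape.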

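Example

Let's look at an example that uses the sparse random projection transformer and visualizes the values of the projection matrix as a histogram: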

# Importing the necessary packages
import numpy as np
from matplotlib import pyplot as plt
from sklearn.random_projection import SparseRandomProjection

# Random data and its Sparse transformation
rng = np.random.RandomState(42)
X_rand = rng.rand(25, 3000)
sparse_data = SparseRandomProjection(random_state=0)
X_transformed = sparse_data.fit_transform(X_rand)

# Get the size of the transformed data
print('Shape of transformed data is: ' + str(X_transformed.shape))

# Non-zero entries of the sparse transformation matrix
s = sparse_data.components_.data
total_elements = (sparse_data.components_.shape[0]
                  * sparse_data.components_.shape[1])
pos = s[s > 0][0]
neg = s[s < 0][0]
print('Shape of transformation matrix is: ' + str(sparse_data.components_.shape))
# Counts of negative, zero, and positive entries
counts = (np.sum(s == neg), total_elements - len(s), np.sum(s == pos))

# Set the figure size
plt.figure(figsize=(7.16, 3.50))
plt.subplots_adjust(bottom=0.05, top=0.9, left=0.05, right=0.95)

# Histogram for visualizing the elements of the transformation matrix
plt.bar([neg, 0, pos], counts, width=0.1)
plt.xticks([neg, 0, pos])
plt.suptitle('Histogram of flattened transformation matrix, ' +
             'density = ' +
             '{:.2f}'.format(sparse_data.density_), size=14)
plt.show()

Output

It will produce the following output, along with the bar plot:

Shape of transformed data is: (25, 2759)
Shape of transformation matrix is: (2759, 3000)
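A useful sanity check on a projection like the one above is to compare pairwise distances before and after the transformation, since random projection is intended to keep them approximately unchanged. The following is a minimal sketch on the same kind of random data; the use of euclidean_distances and the off-diagonal mask are choices made for this illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.random_projection import SparseRandomProjection

rng = np.random.RandomState(42)
X = rng.rand(25, 3000)

# Default settings: n_components='auto', eps=0.1
X_proj = SparseRandomProjection(random_state=0).fit_transform(X)

# Pairwise distances in the original and the projected space
d_orig = euclidean_distances(X)
d_proj = euclidean_distances(X_proj)

# Ignore the diagonal, where every distance is zero
mask = ~np.eye(X.shape[0], dtype=bool)
ratios = d_proj[mask] / d_orig[mask]

# Ratios close to 1 mean the distances are well preserved
print(ratios.min(), ratios.max())
```

The exact ratios depend on the random seed, so no specific values are listed here; they should stay close to 1.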

Gaurav Leekha

Updated on: October 4, 2022
