How to Implement Random Projection Using Python Scikit-learn?
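Random projection is a dimensionality reduction technique used to simplify high-dimensional data. It is mainly applied to data for which other dimensionality reduction techniques, such as **principal component analysis** (PCA), are too computationally expensive.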
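The core idea can be sketched in plain NumPy, without scikit-learn. Multiply the data by a random matrix whose entries are drawn from N(0, 1/k), where k is the target dimension, then check that pairwise distances survive roughly unchanged. The sizes below and the helper `pairwise_sq_dists` are illustrative choices, not part of any library.

```python
import numpy as np

# Toy sizes, chosen only for illustration
n_samples, n_features, n_components = 50, 2000, 1000
rng = np.random.default_rng(0)
X = rng.random((n_samples, n_features))

# Gaussian random matrix: entries drawn from N(0, 1/n_components),
# i.e. standard deviation 1/sqrt(n_components)
R = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(n_components, n_features))

# Project every row of X from n_features down to n_components dimensions
X_proj = X @ R.T

def pairwise_sq_dists(A):
    """Hypothetical helper: squared Euclidean distances between all row pairs."""
    sq = np.sum(A ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * A @ A.T

# Ratio of each pair's squared distance after vs. before projection
iu = np.triu_indices(n_samples, k=1)
ratio = pairwise_sq_dists(X_proj)[iu] / pairwise_sq_dists(X)[iu]
print(X_proj.shape)  # (50, 1000)
print(ratio.min(), ratio.max())
```

The ratios cluster around 1. Choosing a smaller `n_components` widens their spread, which is the accuracy-for-speed trade-off random projection makes.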
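Python Scikit-learn provides a module called `sklearn.random_projection` that implements simple, computationally efficient ways to reduce the dimensionality of data. These methods trade a controlled amount of accuracy for faster processing and smaller models. The module implements the following two types of unstructured random matrices:
- Gaussian random matrix
- Sparse random matrix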
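One practical difference between the two is how the fitted matrix is stored. The following is a minimal sketch that inspects the `components_` attribute of each transformer, assuming arbitrary toy sizes (20 samples, 500 features, 50 components).

```python
import numpy as np
from scipy import sparse
from sklearn.random_projection import GaussianRandomProjection, SparseRandomProjection

# Toy data: 20 samples with 500 features each
X = np.random.RandomState(0).rand(20, 500)

gauss = GaussianRandomProjection(n_components=50, random_state=0).fit(X)
sparse_rp = SparseRandomProjection(n_components=50, random_state=0).fit(X)

# Gaussian: a dense NumPy array of shape (n_components, n_features)
print(type(gauss.components_), gauss.components_.shape)

# Sparse: a SciPy sparse matrix of the same shape, mostly zeros
print(sparse.issparse(sparse_rp.components_), sparse_rp.components_.shape)

# Fraction of entries that are actually stored (non-zero)
print(sparse_rp.components_.nnz / (50 * 500))
```

The Gaussian matrix is a dense NumPy array. The sparse variant stores a SciPy sparse matrix in which only a small fraction of entries are non-zero, so it is cheaper to store and to apply.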
Implementing Gaussian Random Projection
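To implement a Gaussian random matrix, the `random_projection` module provides the `GaussianRandomProjection` class. This transformer reduces dimensionality by projecting the original input space onto a randomly generated matrix whose entries are drawn from the normal distribution N(0, 1/n_components).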
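Because the projection is just a matrix product, the fitted transformer's output can be reproduced by hand. The sketch below uses toy sizes chosen for illustration. It checks `fit_transform` against `X @ components_.T` and compares the empirical variance of the matrix entries with 1/n_components.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

# Toy data: 10 samples, 1000 features, projected down to 200 dimensions
X = np.random.RandomState(0).rand(10, 1000)
grp = GaussianRandomProjection(n_components=200, random_state=0)
X_new = grp.fit_transform(X)

# The same projection computed manually from the fitted matrix
manual = X @ grp.components_.T
print(np.allclose(X_new, manual))

# Empirical variance of the random entries vs. the theoretical 1/n_components
print(grp.components_.var(), 1 / 200)
```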
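Example
The following example applies the Gaussian random projection transformer to random data and visualizes the values of the projection matrix as a histogram: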
```python
# Importing the necessary packages
from sklearn.random_projection import GaussianRandomProjection
import numpy as np
from matplotlib import pyplot as plt

# Random data and its transformation
X_random = np.random.RandomState(0).rand(100, 10000)
gauss_data = GaussianRandomProjection(random_state=0)
X_transformed = gauss_data.fit_transform(X_random)

# Get the size of the transformed data
print('Shape of transformed data is: ' + str(X_transformed.shape))

# Set the figure size
plt.figure(figsize=(7.50, 3.50))
plt.subplots_adjust(bottom=0.05, top=0.9, left=0.05, right=0.95)

# Histogram for visualizing the elements of the transformation matrix
plt.hist(gauss_data.components_.flatten())
plt.title('Histogram of the flattened transformation matrix', size=18)
plt.show()
```
Output
The code prints the shape of the projected data and then displays the histogram:
```
Shape of transformed data is: (100, 3947)
```
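The 3947 output columns are not arbitrary. With the default `n_components='auto'`, the transformer picks the smallest dimension that the Johnson-Lindenstrauss lemma allows for the given number of samples and tolerance `eps` (default 0.1). That dimension is 4 ln(n_samples) / (eps^2/2 - eps^3/3). scikit-learn exposes this bound as `johnson_lindenstrauss_min_dim`. The sketch below reproduces the dimensions seen in both examples in this article.

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# 100 samples, as in the Gaussian example above
print(johnson_lindenstrauss_min_dim(n_samples=100, eps=0.1))  # 3947

# 25 samples, as in the sparse example below
print(johnson_lindenstrauss_min_dim(n_samples=25, eps=0.1))   # 2759
```

The bound depends only on the number of samples, not on the number of features. The 10000 input features shrink to 3947 because there are 100 samples.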
Implementing Sparse Random Projection
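To implement a sparse random matrix, the `random_projection` module provides the `SparseRandomProjection` class. This transformer reduces dimensionality by projecting the original input space onto a sparse random matrix. Compared with a dense Gaussian matrix, it offers similar embedding quality while using much less memory and computing the projection faster.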
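Because the random matrix is itself sparse, this transformer also accepts SciPy sparse input. By default (`dense_output=False`), it keeps the result sparse. The sketch below uses a randomly generated sparse matrix; the sizes and the 1% input density are arbitrary.

```python
import numpy as np
from scipy import sparse
from sklearn.random_projection import SparseRandomProjection

# Sparse input: 30 samples, 5000 features, about 1% non-zero entries
X_sparse = sparse.random(30, 5000, density=0.01, format="csr", random_state=0)

# Default dense_output=False: sparse input gives sparse output
srp = SparseRandomProjection(n_components=300, random_state=0)
X_new = srp.fit_transform(X_sparse)
print(sparse.issparse(X_new), X_new.shape)

# dense_output=True always returns a dense NumPy array
srp_dense = SparseRandomProjection(n_components=300, dense_output=True, random_state=0)
X_dense_out = srp_dense.fit_transform(X_sparse)
print(type(X_dense_out))
```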
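Example
The following example applies the sparse random projection transformer to random data and visualizes the values of the projection matrix as a histogram: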
```python
# Importing the necessary packages
from sklearn.random_projection import SparseRandomProjection
import numpy as np
from matplotlib import pyplot as plt

# Random data and its sparse transformation
rng = np.random.RandomState(42)
X_rand = rng.rand(25, 3000)
sparse_data = SparseRandomProjection(random_state=0)
X_transformed = sparse_data.fit_transform(X_rand)

# Get the size of the transformed data
print('Shape of transformed data is: ' + str(X_transformed.shape))

# Non-zero values of the (sparse) transformation matrix, stored in s
s = sparse_data.components_.data
total_elements = sparse_data.components_.shape[0] * sparse_data.components_.shape[1]
pos = s[s > 0][0]  # the positive non-zero value
neg = s[s < 0][0]  # the negative non-zero value
print('Shape of transformation matrix is: ' + str(sparse_data.components_.shape))

# Count the negative, zero and positive entries
counts = (np.sum(s == neg), total_elements - len(s), np.sum(s == pos))

# Set the figure size
plt.figure(figsize=(7.16, 3.50))
plt.subplots_adjust(bottom=0.05, top=0.9, left=0.05, right=0.95)

# Histogram for visualizing the elements of the transformation matrix
plt.bar([neg, 0, pos], counts, width=0.1)
plt.xticks([neg, 0, pos])
plt.suptitle('Histogram of flattened transformation matrix, '
             + 'density = ' + '{:.2f}'.format(sparse_data.density_), size='14')
plt.show()
```
Output
The code prints the following shapes and then displays the histogram:
```
Shape of transformed data is: (25, 2759)
Shape of transformation matrix is: (2759, 3000)
```
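The histogram shows only three bars, and the scikit-learn documentation explains why. Each entry of the sparse matrix is 0 with probability 1 - density, and ±sqrt(1/density)/sqrt(n_components) otherwise. In addition, `density='auto'` resolves to 1/sqrt(n_features). The sketch below checks both facts by refitting on the same data and `random_state` as the example above.

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection

# Same data and random_state as the sparse example above
rng = np.random.RandomState(42)
X_rand = rng.rand(25, 3000)
sparse_data = SparseRandomProjection(random_state=0).fit(X_rand)

n_components, n_features = sparse_data.components_.shape
density = sparse_data.density_

# density='auto' resolves to 1 / sqrt(n_features)
print(np.isclose(density, 1 / np.sqrt(n_features)))

# Every stored (non-zero) entry is +/- sqrt(1/density) / sqrt(n_components)
expected = np.sqrt(1 / density) / np.sqrt(n_components)
values = np.unique(sparse_data.components_.data)
print(values, expected)

# Observed fraction of non-zero entries, which should be close to density
print(sparse_data.components_.nnz / (n_components * n_features))
```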