How to get dictionary-like objects from a dataset using Python Scikit-learn?


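With the Scikit-learn Python library, we can load a dataset as a dictionary-like object (an instance of sklearn.utils.Bunch). Some of its most useful attributes are as follows: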

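  • data - the data to learn from (the feature values).

  • target - the regression target.

  • DESCR - the description of the dataset.

  • target_names - the names of the dataset's targets.

  • feature_names - the names of the dataset's features.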
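
To make "dictionary-like" concrete, here is a minimal sketch of a container that supports both key access and attribute access. The class name MiniBunch and the toy values are hypothetical and used only for illustration; the container Scikit-learn actually returns is sklearn.utils.Bunch, whose implementation may differ in detail.

```python
class MiniBunch(dict):
    """Hypothetical stand-in for a dictionary-like dataset container."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods
        # such as keys() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attributes as dictionary entries.
        self[name] = value


# Toy values for illustration, not taken from a real dataset.
toy = MiniBunch(
    data=[[8.3, 41.0], [7.2, 21.0]],
    target=[4.5, 3.6],
    feature_names=["MedInc", "HouseAge"],
)

print(list(toy.keys()))             # ['data', 'target', 'feature_names']
print(toy["target"] is toy.target)  # True
```

The point of the sketch is that toy["target"] and toy.target refer to the same entry, which is why the examples in this article can use housing.keys() and housing.data on the same object.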

Example 1

In the example below, we load the California housing dataset and print the keys of its dictionary-like object.

# Import the dataset loader
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset (downloaded and cached on first use)
housing = fetch_california_housing()

# Print the keys of the dictionary-like object
print(housing.keys())

Output

It will produce the following output:

dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])

Example 2

We can also inspect the individual entries of the dictionary-like object, as shown below:

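# Import the dataset loader
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset
housing = fetch_california_housing()

# Shape of the feature data and of the target
print(housing.data.shape)
print('\n')
print(housing.target.shape)
print('\n')

# Names of the features and of the target
print(housing.feature_names)
print('\n')
print(housing.target_names)
print('\n')

# Full description of the dataset
print(housing.DESCR)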

Output

It will produce the following output:

(20640, 8)
(20640,)
['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
['MedHouseVal']
.. _california_housing_dataset:
California Housing dataset
--------------------------
**Data Set Characteristics:**
   :Number of Instances: 20640
   :Number of Attributes: 8 numeric, predictive attributes and the target
   :Attribute Information:
      - MedInc median income in block group
      - HouseAge median house age in block group
      - AveRooms average number of rooms per household
      - AveBedrms average number of bedrooms per household
      - Population block group population
      - AveOccup average number of household members
      - Latitude block group latitude
      - Longitude block group longitude
   :Missing Attribute Values: None
Omitted due to length of the output…
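
Rather than printing each entry by hand, the entries can be summarised in a loop. The sketch below is one possible way to do this; the helper name summarize is hypothetical, and the stand-in dictionary uses toy values so that the block runs without downloading anything.

```python
def summarize(bunch):
    """Return one summary line per entry of a dictionary-like object."""
    lines = []
    for key, value in bunch.items():
        shape = getattr(value, "shape", None)
        if shape is not None:
            # Arrays and DataFrames expose a shape attribute.
            detail = f"shape={tuple(shape)}"
        elif isinstance(value, str):
            detail = f"{len(value)} characters"
        elif hasattr(value, "__len__"):
            detail = f"{len(value)} items"
        else:
            detail = repr(value)
        lines.append(f"{key}: {type(value).__name__}, {detail}")
    return lines


# Toy stand-in for a loaded dataset (illustrative values only).
toy = {
    "data": [[1.0, 2.0], [3.0, 4.0]],
    "target": [0.5, 0.7],
    "frame": None,
    "DESCR": "toy description",
}

for line in summarize(toy):
    print(line)
```

Because the helper relies only on items(), it should also accept the object returned by fetch_california_housing(), in which case the data and target entries would be reported by their shape.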

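Example 3

In the example below, we pass as_frame=True so that the dataset is also available as a pandas DataFrame (housing.frame), and print a summary of that DataFrame: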

# Import the dataset loader (as_frame=True requires pandas to be installed)
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset as a pandas DataFrame
housing = fetch_california_housing(as_frame=True)

# Print a summary of the DataFrame (features plus target column)
print(housing.frame.info())

Output

It will produce the following output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   MedInc       20640 non-null  float64
 1   HouseAge     20640 non-null  float64
 2   AveRooms     20640 non-null  float64
 3   AveBedrms    20640 non-null  float64
 4   Population   20640 non-null  float64
 5   AveOccup     20640 non-null  float64
 6   Latitude     20640 non-null  float64
 7   Longitude    20640 non-null  float64
 8   MedHouseVal  20640 non-null  float64
dtypes: float64(9)
memory usage: 1.4 MB

Updated on: 4 October 2022
