How to create a sample from an R data frame when the row values have assigned weights?
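To draw a random sample in R we can use the sample function. If weights are supplied for the values, pass them through the prob argument so that each row's chance of selection is proportional to its weight. Because sample draws without replacement by default, the sample size cannot exceed the number of rows. For example, if we have a data frame df with a column X of values and a column Weight of corresponding weights, a random sample of size 10 can be generated as follows:
df[sample(seq_len(nrow(df)), 10, prob = df$Weight), ]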
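The call above can be tried on a small data frame. The following is a minimal sketch: the data values, the seed, and the names idx and sampled are made up for illustration, while the column names X and Weight follow the description above. It relies on the documented behaviour that the prob weights need not sum to one.

```r
# Hypothetical data frame for illustration only; the values are made up.
df <- data.frame(X = c(2.5, 3.1, 4.8, 5.0, 6.2),
                 Weight = c(1, 1, 2, 3, 3))

set.seed(42)  # arbitrary seed, used only to make the draw repeatable

# prob takes the raw weights; they do not have to sum to 1.
# Sampling is without replacement, so size must not exceed nrow(df).
idx <- sample(seq_len(nrow(df)), size = 3, prob = df$Weight)

# Use the sampled row indices to subset the data frame.
sampled <- df[idx, ]
print(sampled)
```

Rows with larger weights are more likely to appear, but since the draw is without replacement each row can be selected at most once.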
Example
Consider the following data frame:
set.seed(1256)
x <- rnorm(20, 5, 1)
weight_x <- sample(1:10, 20, replace = TRUE)
df <- data.frame(x, weight_x)
df
Output
          x weight_x
1  4.126636       10
2  5.806501        1
3  5.768463       10
4  5.980315        8
5  6.593158        2
6  4.298533       10
7  6.196574        4
8  4.136517        5
9  4.504645       10
10 4.416107        6
11 5.257177       10
12 5.836453        1
13 5.334041       10
14 4.959786        2
15 3.406828        7
16 4.149746        2
17 4.657464        4
18 4.820102       10
19 5.401021        9
20 6.718216        6
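Drawing samples of different sizes using the weight column. Each call draws a new sample, so the rows returned depend on the state of the random number generator:
Example
df[sample(seq_len(nrow(df)), 5, prob = df$weight_x), ]
Output
          x weight_x
11 5.257177       10
19 5.401021        9
13 5.334041       10
10 4.416107        6
5  6.593158        2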
Example
df[sample(seq_len(nrow(df)), 3, prob = df$weight_x), ]
Output
          x weight_x
13 5.334041       10
3  5.768463       10
18 4.820102       10
Example
df[sample(seq_len(nrow(df)), 7, prob = df$weight_x), ]
Output
          x weight_x
9  4.504645       10
19 5.401021        9
12 5.836453        1
5  6.593158        2
15 3.406828        7
11 5.257177       10
6  4.298533       10
Example
df[sample(seq_len(nrow(df)), 10, prob = df$weight_x), ]
Output
          x weight_x
4  5.980315        8
9  4.504645       10
19 5.401021        9
1  4.126636       10
13 5.334041       10
12 5.836453        1
11 5.257177       10
18 4.820102       10
10 4.416107        6
3  5.768463       10
Example
df[sample(seq_len(nrow(df)), 9, prob = df$weight_x), ]
Output
          x weight_x
8  4.136517        5
11 5.257177       10
7  6.196574        4
4  5.980315        8
9  4.504645       10
6  4.298533       10
19 5.401021        9
18 4.820102       10
16 4.149746        2
Example
df[sample(seq_len(nrow(df)), 4, prob = df$weight_x), ]
Output
          x weight_x
1  4.126636       10
6  4.298533       10
11 5.257177       10
7  6.196574        4
Example
df[sample(seq_len(nrow(df)), 15, prob = df$weight_x), ]
Output
          x weight_x
3  5.768463       10
15 3.406828        7
19 5.401021        9
16 4.149746        2
9  4.504645       10
8  4.136517        5
11 5.257177       10
10 4.416107        6
18 4.820102       10
6  4.298533       10
4  5.980315        8
17 4.657464        4
1  4.126636       10
20 6.718216        6
13 5.334041       10
Example
df[sample(seq_len(nrow(df)), 2, prob = df$weight_x), ]
Output
          x weight_x
11 5.257177       10
13 5.334041       10
Example
df[sample(seq_len(nrow(df)), 12, prob = df$weight_x), ]
Output
          x weight_x
1  4.126636       10
3  5.768463       10
8  4.136517        5
11 5.257177       10
10 4.416107        6
6  4.298533       10
13 5.334041       10
4  5.980315        8
20 6.718216        6
12 5.836453        1
18 4.820102       10
19 5.401021        9
Example
df[sample(seq_len(nrow(df)), 18, prob = df$weight_x), ]
Output
          x weight_x
5  6.593158        2
4  5.980315        8
6  4.298533       10
20 6.718216        6
15 3.406828        7
3  5.768463       10
9  4.504645       10
10 4.416107        6
13 5.334041       10
19 5.401021        9
8  4.136517        5
11 5.257177       10
18 4.820102       10
1  4.126636       10
7  6.196574        4
12 5.836453        1
17 4.657464        4
16 4.149746        2
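One way to see that the weights really drive the selection is to draw many rows with replacement and compare how often each row appears with its normalised weight. The sketch below rebuilds the data frame df from the example above; the number of draws and the names n_draws, observed, expected and comparison are assumptions chosen for illustration. It uses replace = TRUE, which differs from the calls above, so that every draw has the same selection probabilities.

```r
# Rebuild the example data frame so this block runs on its own.
set.seed(1256)
x <- rnorm(20, 5, 1)
weight_x <- sample(1:10, 20, replace = TRUE)
df <- data.frame(x, weight_x)

# Draw many row indices WITH replacement, weighted by weight_x.
n_draws <- 100000  # arbitrary large number of draws
idx <- sample(seq_len(nrow(df)), n_draws, replace = TRUE, prob = df$weight_x)

# Observed share of draws per row (factor() keeps rows that were never drawn).
observed <- as.numeric(table(factor(idx, levels = seq_len(nrow(df))))) / n_draws

# Expected share: each weight divided by the sum of all weights.
expected <- df$weight_x / sum(df$weight_x)

comparison <- data.frame(row = seq_len(nrow(df)),
                         weight_x = df$weight_x,
                         expected = round(expected, 3),
                         observed = round(observed, 3))
print(comparison)
```

The observed and expected columns should agree closely. When sampling without replacement, as in the examples above, the weights apply only among the rows that have not yet been drawn, so the overall inclusion frequencies are not simply proportional to the weights.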