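How to create a random sample based on a grouping column of a data.table in R?
Random sampling helps reduce bias in an analysis. When the data is organized by group, we may need a random sample drawn within each group. For example, if a data table has a grouping variable and each group contains ten values, we might want to select two values at random from every group. This can be done by indexing .SD with the sample function while grouping with by.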
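Before the full example, the sketch below shows the two building blocks on a tiny table. The table `toy` and its columns `g` and `v` are hypothetical names used only for illustration. Inside a grouped query, `.N` is the number of rows in the current group and `.SD` is the subset of rows for that group.

```r
library(data.table)

# Hypothetical toy table: group "x" has 3 rows, group "y" has 1 row
toy <- data.table(g = c("x", "x", "x", "y"), v = 1:4)

# .N is the number of rows in the current group
print(toy[, .N, by = g])

# .SD holds the current group's rows (without the grouping column).
# sample(.N, k) returns k random row positions between 1 and .N,
# and min(2, .N) caps k so that group "y" (1 row) does not cause an error.
picked <- toy[, .SD[sample(.N, min(2, .N))], by = g]
print(picked)
```

Without the `min(2, .N)` guard, `sample(.N, 2)` would fail for any group with fewer than two rows, because sampling without replacement cannot return more items than exist.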
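Example
Consider the following data.table:
library(data.table)
# Five groups, each repeated four times (20 rows in total)
Group <- rep(c("A", "B", "C", "D", "E"), times = 4)
# 20 distinct values drawn without replacement from 1 to 100
Percentage <- sample(1:100, 20)
dt1 <- data.table(Group, Percentage)
dt1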
Output
    Group Percentage
 1:     A         97
 2:     B         68
 3:     C         19
 4:     D         32
 5:     E         98
 6:     A         48
 7:     B         94
 8:     C         54
 9:     D          7
10:     E         76
11:     A         10
12:     B         31
13:     C         59
14:     D         84
15:     E         41
16:     A         99
17:     B          1
18:     C         72
19:     D         42
20:     E         17
Create a random sample of size 2 from each group:
Example
dt1[, .SD[sample(.N, min(2, .N))], by = Group]
Output
    Group Percentage
 1:     A         48
 2:     A         99
 3:     B         94
 4:     B         31
 5:     C         54
 6:     C         59
 7:     D         42
 8:     D         84
 9:     E         98
10:     E         76
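Because no seed is set above, the sampled rows differ from run to run. The sketch below is one way to make the result reproducible and to confirm that every group contributes exactly two rows. The seed value 42 and the names `sampled` and `counts` are arbitrary choices made for this illustration, not part of the original example.

```r
library(data.table)

set.seed(42)  # assumption: any fixed seed works; 42 is arbitrary

Group <- rep(c("A", "B", "C", "D", "E"), times = 4)
Percentage <- sample(1:100, 20)
dt1 <- data.table(Group, Percentage)

# Draw up to 2 rows per group; min(2, .N) guards against small groups
sampled <- dt1[, .SD[sample(.N, min(2, .N))], by = Group]

# Count how many rows each group contributed to the sample
counts <- sampled[, .N, by = Group]
print(counts)
```

Each of the five groups has four rows, so `counts` should report two rows for every group regardless of which seed is used.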
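Let's look at another example:
Example
# Four classes, each repeated ten times (40 rows in total)
Class <- rep(c("First", "Second", "Third", "Fourth"), times = 10)
# 40 values between 1 and 5, drawn with replacement
Experience <- sample(1:5, 40, replace = TRUE)
dt2 <- data.table(Class, Experience)
head(dt2, 10)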
Output
     Class Experience
 1:  First          4
 2: Second          2
 3:  Third          4
 4: Fourth          2
 5:  First          4
 6: Second          5
 7:  Third          3
 8: Fourth          5
 9:  First          3
10: Second          5
Example
tail(dt2, 10)
Output
     Class Experience
 1:  Third          4
 2: Fourth          2
 3:  First          5
 4: Second          2
 5:  Third          1
 6: Fourth          4
 7:  First          5
 8: Second          2
 9:  Third          4
10: Fourth          4
Create a random sample of size 5 from each class:
Example
dt2[, .SD[sample(.N, min(5, .N))], by = Class]
Output
     Class Experience
 1:  First          3
 2:  First          3
 3:  First          4
 4:  First          5
 5:  First          5
 6: Second          5
 7: Second          2
 8: Second          5
 9: Second          2
10: Second          1
11:  Third          3
12:  Third          1
13:  Third          4
14:  Third          3
15:  Third          4
16: Fourth          2
17: Fourth          5
18: Fourth          2
19: Fourth          4
20: Fourth          2
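A variation on the same idea, swapped in here as an alternative rather than taken from the example above, samples row numbers with `.I` and then subsets the table once. This is often suggested for larger tables because it avoids building `.SD` for every group. The seed and the names `idx` and `sampled2` are assumptions made for illustration.

```r
library(data.table)

set.seed(1)  # assumption: arbitrary seed, used only for reproducibility

Class <- rep(c("First", "Second", "Third", "Fourth"), times = 10)
Experience <- sample(1:5, 40, replace = TRUE)
dt2 <- data.table(Class, Experience)

# .I holds the row numbers (in the full table) of the current group.
# Sampling positions within .I yields up to 5 row numbers per class;
# the unnamed result column is called V1 by default.
idx <- dt2[, .I[sample(.N, min(5, .N))], by = Class]$V1

# Subset the original table once using the collected row numbers
sampled2 <- dt2[idx]
print(sampled2)
```

The result has the same shape as the `.SD` version: up to five rows per class, with every column of the original table kept.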