How to remove rows from an R data frame when a combination of categorical column values is repeated three or fewer times?
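In data analysis, we sometimes decide the size of the data or sample based on our own criteria, which can mean removing part of the data. One such case is removing rows whose value, or combination of values, in the categorical columns is repeated three or fewer times. This can be done by grouping the data with the group_by function of the dplyr package and then calling its filter function with the condition n() >= 4, where n() is the number of rows in the current group.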
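The rule itself does not depend on dplyr. As a minimal sketch, assuming only base R, the hypothetical helper `keep_frequent_groups()` below (it is not part of dplyr or of this article) computes each row's group size with `ave()` and keeps the rows whose group has at least `min_n` rows. Setting `min_n = 4` corresponds to the `n() >= 4` condition used with dplyr.

```r
# Hypothetical helper (not part of dplyr): keep only the rows whose
# combination of values in `cols` occurs at least `min_n` times in `df`.
keep_frequent_groups <- function(df, cols, min_n) {
  # One grouping key per row, built from the chosen categorical columns
  key <- interaction(df[cols], drop = TRUE)
  # For every row, the number of rows that share its key
  group_size <- ave(seq_len(nrow(df)), key, FUN = length)
  df[group_size >= min_n, , drop = FALSE]
}

# Small illustrative data: group "a" has 4 rows, "b" has 2, "c" has 1
toy <- data.frame(
  g = c("a", "a", "a", "a", "b", "b", "c"),
  v = 1:7
)

# With min_n = 4, only the four rows of group "a" are kept
print(keep_frequent_groups(toy, "g", 4))
```

Passing several column names in `cols`, for example `c("x1", "x2")`, groups by their combination, just as `group_by(x1, x2)` does.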
Example 1
Consider the following data frame:
set.seed(121)
x1 <- sample(LETTERS[1:6], 20, replace = TRUE)
x2 <- sample(c("Male", "Female"), 20, replace = TRUE)
x3 <- rpois(20, 5)
df1 <- data.frame(x1, x2, x3)
df1
Output
   x1     x2 x3
1   D Female  5
2   D Female  2
3   D   Male  7
4   D Female  8
5   A   Male  6
6   C Female  7
7   A Female  3
8   C Female  1
9   C Female  7
10  E   Male  2
11  D Female  3
12  E Female  6
13  F Female  3
14  D Female  4
15  A   Male  4
16  E   Male  4
17  B Female  8
18  B Female  7
19  C Female  5
20  A Female  9
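Load the dplyr package and remove the rows whose (x1, x2) combination occurs three or fewer times, that is, keep only combinations with at least four rows: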
Example
library(dplyr)
df1 %>% group_by(x1, x2) %>% filter(n() >= 4)
Output
# A tibble: 9 x 3
# Groups:   x1, x2 [2]
  x1    x2        x3
  <chr> <chr>  <int>
1 D     Female     5
2 D     Female     2
3 D     Female     8
4 C     Female     7
5 C     Female     1
6 C     Female     7
7 D     Female     3
8 D     Female     4
9 C     Female     5
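To see why exactly nine rows remain, it helps to count each (x1, x2) combination. The sketch below rebuilds df1 by hand from the printed data frame, so it does not depend on what `sample()` and `rpois()` return in a particular R version, and uses base R's `table()` and `ave()` as a swapped-in, dependency-free equivalent of the dplyr pipeline rather than the article's own method.

```r
# df1 re-entered from the printed output, independent of the random seed
df1 <- data.frame(
  x1 = c("D", "D", "D", "D", "A", "C", "A", "C", "C", "E",
         "D", "E", "F", "D", "A", "E", "B", "B", "C", "A"),
  x2 = c("Female", "Female", "Male", "Female", "Male", "Female", "Female",
         "Female", "Female", "Male", "Female", "Female", "Female", "Female",
         "Male", "Male", "Female", "Female", "Female", "Female"),
  x3 = c(5L, 2L, 7L, 8L, 6L, 7L, 3L, 1L, 7L, 2L,
         3L, 6L, 3L, 4L, 4L, 4L, 8L, 7L, 5L, 9L)
)

# Number of rows for every (x1, x2) combination
sizes <- table(df1$x1, df1$x2)
print(sizes)

# Group size attached to each row, then the same rule as filter(n() >= 4)
n_per_row <- ave(seq_len(nrow(df1)), df1$x1, df1$x2, FUN = length)
kept <- df1[n_per_row >= 4, ]
print(kept)
```

Only D/Female (5 rows) and C/Female (4 rows) reach four. Because no combination in this data occurs exactly three times, a threshold of `>= 3` would keep the same nine rows here.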
Example 2
Consider another data frame:
y1 <- sample(c("S1", "S2", "S3", "S4", "S5", "S6"), 20, replace = TRUE)
y2 <- sample(c("Winter", "Summer"), 20, replace = TRUE)
y3 <- rnorm(20, 3)
df2 <- data.frame(y1, y2, y3)
df2
Output
   y1     y2       y3
1  S1 Winter 2.683082
2  S4 Summer 1.141916
3  S6 Winter 3.371681
4  S2 Winter 3.191187
5  S3 Summer 2.195504
6  S5 Summer 2.631736
7  S3 Winter 3.303605
8  S6 Summer 3.074344
9  S5 Summer 2.663724
10 S5 Winter 2.281991
11 S6 Summer 4.174418
12 S4 Winter 6.081246
13 S4 Summer 3.202913
14 S2 Winter 5.557243
15 S2 Winter 3.747462
16 S2 Winter 2.621571
17 S2 Summer 3.909743
18 S5 Winter 2.325663
19 S5 Summer 3.749852
20 S5 Winter 2.331191
Example
df2 %>% group_by(y1, y2) %>% filter(n() >= 4)
Output
# A tibble: 4 x 3
# Groups:   y1, y2 [1]
  y1    y2        y3
  <chr> <chr>  <dbl>
1 S2    Winter  3.19
2 S2    Winter  5.56
3 S2    Winter  3.75
4 S2    Winter  2.62
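Example 2 shows where the choice of threshold matters. The sketch below rebuilds df2 from the printed values (y3 as displayed, to six decimals) and counts the rows in each (y1, y2) combination with base R's `ave()`, a swapped-in alternative to dplyr. With `>= 4` only S2/Winter survives, while `>= 3` would also keep the three-row groups S5/Winter and S5/Summer.

```r
# df2 re-entered from the printed output, independent of the random seed
df2 <- data.frame(
  y1 = c("S1", "S4", "S6", "S2", "S3", "S5", "S3", "S6", "S5", "S5",
         "S6", "S4", "S4", "S2", "S2", "S2", "S2", "S5", "S5", "S5"),
  y2 = c("Winter", "Summer", "Winter", "Winter", "Summer", "Summer", "Winter",
         "Summer", "Summer", "Winter", "Summer", "Winter", "Summer", "Winter",
         "Winter", "Winter", "Summer", "Winter", "Summer", "Winter"),
  y3 = c(2.683082, 1.141916, 3.371681, 3.191187, 2.195504,
         2.631736, 3.303605, 3.074344, 2.663724, 2.281991,
         4.174418, 6.081246, 3.202913, 5.557243, 3.747462,
         2.621571, 3.909743, 2.325663, 3.749852, 2.331191)
)

# Size of the (y1, y2) group that each row belongs to
n_per_row <- ave(seq_len(nrow(df2)), df2$y1, df2$y2, FUN = length)

# Same rule as filter(n() >= 4): drops groups with three or fewer rows
at_least_4 <- df2[n_per_row >= 4, ]
# A looser rule, filter(n() >= 3): drops only groups with fewer than three rows
at_least_3 <- df2[n_per_row >= 3, ]

print(at_least_4)
print(at_least_3)
```

The two conditions describe different intents ("three or fewer" versus "fewer than three"), so it is worth checking which one the analysis actually needs.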