How to remove rows that have fewer than three duplicates in a categorical column of an R data frame?


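In data analysis, we sometimes decide the size of the data or sample based on our own judgment, which can mean discarding part of the data. One such case is removing rows whose value in a categorical column (or combination of categorical columns) is repeated too few times. This can be done with the filter function of the dplyr package after grouping the data with the group_by function. The examples below use filter(n() >= 4), which keeps a combination only if it occurs at least four times, that is, if it has at least three duplicates besides its first occurrence.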
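The pattern described above can be sketched on a tiny hand-made data frame before turning to the random examples. The data frame `toy`, its columns `grp` and `val`, and the `min_rows` threshold are illustrative assumptions and do not appear in the original examples.

```r
library(dplyr)

# Hypothetical toy data: "a" occurs 4 times, "b" twice, "c" once
toy <- data.frame(
  grp = c("a", "a", "b", "a", "c", "b", "a"),
  val = 1:7
)

# Assumed threshold: keep groups with at least this many rows
min_rows <- 4

kept <- toy %>%
  group_by(grp) %>%
  filter(n() >= min_rows) %>%  # n() is the row count of the current group
  ungroup()

kept
```

Only the four "a" rows remain, because "b" and "c" occur fewer than four times.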

Example 1

Consider the following data frame:

set.seed(121)
# Two categorical columns and one count column, 20 rows each
x1 <- sample(LETTERS[1:6], 20, replace = TRUE)
x2 <- sample(c("Male", "Female"), 20, replace = TRUE)
x3 <- rpois(20, 5)
df1 <- data.frame(x1, x2, x3)
df1

Output

   x1     x2 x3
1   D Female  5
2   D Female  2
3   D   Male  7
4   D Female  8
5   A   Male  6
6   C Female  7
7   A Female  3
8   C Female  1
9   C Female  7
10  E   Male  2
11  D Female  3
12  E Female  6
13  F Female  3
14  D Female  4
15  A   Male  4
16  E   Male  4
17  B Female  8
18  B Female  7
19  C Female  5
20  A Female  9
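Before filtering, it can help to tabulate how often each combination of x1 and x2 occurs. The sketch below re-enters df1 by hand from the printed output above (so it does not depend on the random number generator) and uses dplyr's `count()`; the name `combo_counts` is an assumption made for illustration.

```r
library(dplyr)

# df1 re-entered by hand from the printed output above
df1 <- data.frame(
  x1 = c("D", "D", "D", "D", "A", "C", "A", "C", "C", "E",
         "D", "E", "F", "D", "A", "E", "B", "B", "C", "A"),
  x2 = c("Female", "Female", "Male", "Female", "Male", "Female", "Female",
         "Female", "Female", "Male", "Female", "Female", "Female", "Female",
         "Male", "Male", "Female", "Female", "Female", "Female"),
  x3 = c(5, 2, 7, 8, 6, 7, 3, 1, 7, 2, 3, 6, 3, 4, 4, 4, 8, 7, 5, 9)
)

# Number of rows for each (x1, x2) combination, largest first
combo_counts <- df1 %>% count(x1, x2, sort = TRUE)
combo_counts
```

In this tabulation only D/Female (5 rows) and C/Female (4 rows) reach four occurrences, so a threshold of four keeps 9 of the 20 rows.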

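Load the dplyr package and remove the rows whose combination of categorical values has fewer than three duplicates (fewer than four occurrences):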

Example

library(dplyr)
# Keep only rows whose (x1, x2) combination occurs at least four times
df1 %>% group_by(x1, x2) %>% filter(n() >= 4)

Output

# A tibble: 9 x 3
# Groups:   x1, x2 [2]
  x1    x2        x3
  <chr> <chr>  <int>
1 D     Female     5
2 D     Female     2
3 D     Female     8
4 C     Female     7
5 C     Female     1
6 C     Female     7
7 D     Female     3
8 D     Female     4
9 C     Female     5
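The "# Groups:" line in the output shows that the result is still a grouped tibble. If later steps should not run per group, one possible alternative is dplyr's `add_count()`, which appends the group size as an ordinary column without grouping the data. This is a sketch on a small hypothetical data frame; the names `d`, `k1`, `k2`, `v`, and `n_rows` are assumptions and not part of the original example.

```r
library(dplyr)

# Hypothetical data: the (A, x) combination occurs 4 times, all others once
d <- data.frame(
  k1 = c("A", "A", "A", "A", "B", "B", "A"),
  k2 = c("x", "x", "x", "x", "x", "y", "y"),
  v  = 1:7
)

res <- d %>%
  add_count(k1, k2, name = "n_rows") %>%  # size of each (k1, k2) combination
  filter(n_rows >= 4) %>%                 # same threshold as in the example
  select(-n_rows)                         # drop the helper column again

is_grouped_df(res)
```

Calling `ungroup()` after the original group_by/filter pipeline would be another way to get an ungrouped result.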

Example 2

# Two categorical columns and one numeric column, 20 rows each
y1 <- sample(c("S1", "S2", "S3", "S4", "S5", "S6"), 20, replace = TRUE)
y2 <- sample(c("Winter", "Summer"), 20, replace = TRUE)
y3 <- rnorm(20, 3)
df2 <- data.frame(y1, y2, y3)
df2

Output

   y1     y2       y3
1  S1 Winter 2.683082
2  S4 Summer 1.141916
3  S6 Winter 3.371681
4  S2 Winter 3.191187
5  S3 Summer 2.195504
6  S5 Summer 2.631736
7  S3 Winter 3.303605
8  S6 Summer 3.074344
9  S5 Summer 2.663724
10 S5 Winter 2.281991
11 S6 Summer 4.174418
12 S4 Winter 6.081246
13 S4 Summer 3.202913
14 S2 Winter 5.557243
15 S2 Winter 3.747462
16 S2 Winter 2.621571
17 S2 Summer 3.909743
18 S5 Winter 2.325663
19 S5 Summer 3.749852
20 S5 Winter 2.331191

Example

# Keep only rows whose (y1, y2) combination occurs at least four times
df2 %>% group_by(y1, y2) %>% filter(n() >= 4)

Output

# A tibble: 4 x 3
# Groups:   y1, y2 [1]
  y1    y2        y3
  <chr> <chr>  <dbl>
1 S2    Winter  3.19
2 S2    Winter  5.56
3 S2    Winter  3.75
4 S2    Winter  2.62
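The same filtering can also be approximated without dplyr. The sketch below is an alternative that the original article does not use: it relies on base R's `ave()` to compute the size of each row's group. df2 is re-entered by hand from the printed output above, and the names `group_size` and `base_result` are assumptions.

```r
# df2 re-entered by hand from the printed output above
df2 <- data.frame(
  y1 = c("S1", "S4", "S6", "S2", "S3", "S5", "S3", "S6", "S5", "S5",
         "S6", "S4", "S4", "S2", "S2", "S2", "S2", "S5", "S5", "S5"),
  y2 = c("Winter", "Summer", "Winter", "Winter", "Summer", "Summer", "Winter",
         "Summer", "Summer", "Winter", "Summer", "Winter", "Summer", "Winter",
         "Winter", "Winter", "Summer", "Winter", "Summer", "Winter"),
  y3 = c(2.683082, 1.141916, 3.371681, 3.191187, 2.195504, 2.631736, 3.303605,
         3.074344, 2.663724, 2.281991, 4.174418, 6.081246, 3.202913, 5.557243,
         3.747462, 2.621571, 3.909743, 2.325663, 3.749852, 2.331191)
)

# For every row, the number of rows sharing its (y1, y2) combination
group_size <- ave(seq_len(nrow(df2)), df2$y1, df2$y2, FUN = length)

# Logical subsetting keeps the original row names (4, 14, 15, 16)
base_result <- df2[group_size >= 4, ]
base_result
```

Unlike the dplyr version, the result here is a plain data frame that keeps the original row names, which can make it easier to see which rows survived.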

Updated on: 8 February 2021
