How to delete multiple rows from an R data frame using the dplyr package?
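Datasets sometimes contain information we do not need: a single case, several cases, an entire variable, or anything else that does not serve the goal of the analysis. To remove such rows from an R data frame with the dplyr package, we can use the anti_join function, which returns the rows of its first argument that have no match in its second.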
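As a minimal sketch of how anti_join behaves, the snippet below uses a small made-up data frame. The names scores, unwanted, and kept are hypothetical and are not part of the example in this article.

```r
library(dplyr)

# Hypothetical toy data, not taken from the article
scores <- data.frame(id = 1:5, score = c(90, 85, 70, 60, 95))

# Rows to remove, identified by their id values
unwanted <- data.frame(id = c(2L, 4L))

# Keep only the rows of `scores` whose id has no match in `unwanted`
kept <- anti_join(scores, unwanted, by = "id")
kept
```

Supplying by explicitly also suppresses the "Joining, by = ..." message that dplyr prints when it has to guess the key columns.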
Example
Consider the following data frame:
> set.seed(2514)
> x1<-rnorm(20,5)
> x2<-rnorm(20,5,0.05)
> df1<-data.frame(x1,x2)
> df1
Output
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 3.814416 4.990552
12 3.359167 4.891964
13 5.304671 4.950883
14 4.768564 4.953290
15 3.842797 4.950219
16 5.270018 4.995953
17 6.344269 5.008545
18 5.366249 4.905290
19 5.547608 5.098554
20 5.266844 5.003416
Load the dplyr package:
> library(dplyr)
Deleting rows 1 to 5 from df1:
> anti_join(df1,df1[1:5,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.950218 5.038626
2  4.903268 5.010087
3  7.462286 4.974513
4  5.056762 5.097812
5  6.031768 5.002989
6  3.814416 4.990552
7  3.359167 4.891964
8  5.304671 4.950883
9  4.768564 4.953290
10 3.842797 4.950219
11 5.270018 4.995953
12 6.344269 5.008545
13 5.366249 4.905290
14 5.547608 5.098554
15 5.266844 5.003416
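When the rows to drop are known by position, as they are here, dplyr's slice function with negative indices is a possible alternative to anti_join. The sketch below is not the method used in this article; it rebuilds df1 the same way and stores the result in df1_trimmed, a name chosen only for illustration.

```r
library(dplyr)

# Rebuild the article's data frame
set.seed(2514)
x1 <- rnorm(20, 5)
x2 <- rnorm(20, 5, 0.05)
df1 <- data.frame(x1, x2)

# Negative positions tell slice() which rows to drop: here rows 1 to 5
df1_trimmed <- slice(df1, -(1:5))

# 15 of the original 20 rows remain
nrow(df1_trimmed)
```

Because slice works on row positions rather than on values, it needs no second data frame and prints no join message.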
Deleting rows 11 to 18 from df1:
> anti_join(df1,df1[11:18,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 5.547608 5.098554
12 5.266844 5.003416
Deleting rows 6 to 12 from df1:
> anti_join(df1,df1[6:12,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.304671 4.950883
7  4.768564 4.953290
8  3.842797 4.950219
9  5.270018 4.995953
10 6.344269 5.008545
11 5.366249 4.905290
12 5.547608 5.098554
13 5.266844 5.003416
Deleting rows 15 to 20 from df1:
> anti_join(df1,df1[15:20,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 3.814416 4.990552
12 3.359167 4.891964
13 5.304671 4.950883
14 4.768564 4.953290
Deleting rows 5 to 18 from df1:
> anti_join(df1,df1[5:18,])
Joining, by = c("x1", "x2")
        x1       x2
1 5.567262 4.998607
2 5.343063 4.931962
3 2.211267 5.034461
4 5.092191 5.075641
5 5.547608 5.098554
6 5.266844 5.003416
Deleting rows 11 to 20 from df1:
> anti_join(df1,df1[11:20,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
Deleting rows 1 to 10 from df1:
> anti_join(df1,df1[1:10,])
Joining, by = c("x1", "x2")
         x1       x2
1  3.814416 4.990552
2  3.359167 4.891964
3  5.304671 4.950883
4  4.768564 4.953290
5  3.842797 4.950219
6  5.270018 4.995953
7  6.344269 5.008545
8  5.366249 4.905290
9  5.547608 5.098554
10 5.266844 5.003416
Deleting rows 2 to 11 from df1:
> anti_join(df1,df1[2:11,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  3.359167 4.891964
3  5.304671 4.950883
4  4.768564 4.953290
5  3.842797 4.950219
6  5.270018 4.995953
7  6.344269 5.008545
8  5.366249 4.905290
9  5.547608 5.098554
10 5.266844 5.003416
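One caveat worth keeping in mind: anti_join matches rows by their values, not by their positions. The examples above behave like positional deletion because every row of df1 is unique. The following sketch, using a hypothetical data frame dup with a repeated row, illustrates how the two approaches can differ; slice is shown only for comparison.

```r
library(dplyr)

# Hypothetical data in which rows 1 and 4 hold identical values
dup <- data.frame(a = c(1, 2, 3, 1), b = c("x", "y", "z", "x"))

# Intended to drop row 1 only, but row 4 matches the same values
# and is removed as well
by_value <- anti_join(dup, dup[1, ], by = c("a", "b"))
nrow(by_value)     # 2

# Dropping by position removes row 1 only
by_position <- slice(dup, -1)
nrow(by_position)  # 3
```

If the data may contain duplicate rows and only specific positions should go, a positional approach is likely the safer choice.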