How to change the row index after sampling an R data frame?
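When we take a random sample from an R data frame, each sampled row keeps the row name (row number) it had in the original data frame. Because the rows are drawn at random, the sample's row names come out unordered and non-consecutive. This can cause confusion during analysis, especially when rows need to be referred to by number, so we can replace the row names with the numbers from 1 to the number of rows in the sample.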
Example
Consider the following data frame:
> set.seed(111)
> x1 <- rnorm(20, 1.5)
> x2 <- rnorm(20, 2.5)
> x3 <- rnorm(20, 3)
> df1 <- data.frame(x1, x2, x3)
> df1
Output
             x1         x2       x3
1   1.735220712  2.8616625 1.824274
2   1.169264128  2.8469644 1.878784
3   1.188376176  2.6897365 1.638096
4  -0.802345658  2.3404232 3.481125
5   1.329123955  2.8265492 3.741972
6   1.640278225  3.0982542 3.027825
7   0.002573344  0.6584657 3.331380
8   0.489811581  5.2180556 3.644114
9   0.551524395  2.6912444 5.485662
10  1.006037783  1.1987039 4.959982
11  1.326325872 -0.6132173 3.191663
12  1.093401220  1.5586426 4.552544
13  3.345636264  3.9002588 3.914242
14  1.894054110  0.8795300 3.358625
15  2.297528501  0.2340040 3.175096
16 -0.066665360  3.6629936 2.152732
17  1.414148991  2.3838450 3.978232
18  1.140860519  2.8342560 4.805868
19  0.306391033  1.8791419 3.122915
20  1.864186737  1.1901551 2.870228
Create a sample of size 5 from df1:
> df1_sample <- df1[sample(nrow(df1), 5), ]
> df1_sample
Output
         x1       x2       x3
18 1.140861 2.834256 4.805868
6  1.640278 3.098254 3.027825
13 3.345636 3.900259 3.914242
5  1.329124 2.826549 3.741972
15 2.297529 0.234004 3.175096
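The original row names in a sample are not just noise: they record which rows of the source data frame were drawn. As a minimal sketch, assuming you want to keep that information before renumbering, you could copy the row names into a column first. The data frame `df` and the column name `orig_row` are illustrative and not part of the example above.

```r
set.seed(111)
df <- data.frame(x1 = rnorm(20, 1.5), x2 = rnorm(20, 2.5))

# Draw 5 rows without replacement; row names are carried over from df
smp <- df[sample(nrow(df), 5), ]

# Store the original row numbers in a column (hypothetical name: orig_row)
smp$orig_row <- as.integer(rownames(smp))

# Renumber the rows from 1 to the sample size
rownames(smp) <- 1:nrow(smp)
smp
```

This sketch relies on sampling without replacement. If rows are drawn with replacement, R makes the duplicated row names unique by adding suffixes, so they can no longer be converted to integers directly.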
Renumber the row indices in the sample:
> rownames(df1_sample) <- 1:nrow(df1_sample)
> df1_sample
Output
        x1       x2       x3
1 1.140861 2.834256 4.805868
2 1.640278 3.098254 3.027825
3 3.345636 3.900259 3.914242
4 1.329124 2.826549 3.741972
5 2.297529 0.234004 3.175096
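Another way to get the same result in base R is to assign `NULL` to the row names, which restores the automatic 1, 2, ... numbering without having to compute the row count. The following is a minimal sketch on an illustrative data frame `df`, not the article's `df1`:

```r
set.seed(111)
df <- data.frame(x1 = rnorm(20, 1.5), x2 = rnorm(20, 2.5))

# Sampled rows keep their original row names
smp <- df[sample(nrow(df), 5), ]

# Assigning NULL resets the row names to the automatic sequence 1..nrow(smp)
rownames(smp) <- NULL
rownames(smp)
```

Both forms leave the data untouched and only change the row labels, so the choice between them is mostly a matter of taste.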
Let's look at another example:
Example
> y1 <- runif(20, 2, 5)
> y2 <- runif(20, 3, 5)
> y3 <- runif(20, 5, 10)
> y4 <- runif(20, 5, 12)
> df2 <- data.frame(y1, y2, y3, y4)
> df2
Output
         y1       y2       y3        y4
1  2.881213 4.894022 7.797367  6.487594
2  3.052896 3.223898 7.527572  6.695535
3  2.237543 4.127740 9.864026  8.754048
4  4.475907 4.696651 5.403004  6.239423
5  2.792642 4.023536 7.786222  8.992823
6  2.791539 4.333093 9.480036  6.087904
7  2.271143 3.053019 5.539486  8.320935
8  3.382534 3.212921 7.246406 10.091843
9  4.074728 4.390884 6.544056 10.924127
10 4.546881 3.546689 6.164413 11.710035
11 2.738344 4.489939 9.140333  8.211822
12 3.952763 4.490791 5.564392  7.542578
13 4.040586 3.333465 9.420011 11.554599
14 2.313604 4.959709 8.628101 11.193405
15 2.335957 4.189517 9.601667  9.694433
16 2.646964 4.376438 5.614787 10.929413
17 2.390349 3.343716 9.755718 11.017555
18 3.999001 3.083366 8.348515  8.370818
19 3.463324 3.379700 5.425484  7.219430
20 3.059911 4.522844 7.905784 11.420429
Create a sample of size 7 from df2:
> df2_sample <- df2[sample(nrow(df2), 7), ]
> df2_sample
Output
         y1       y2       y3        y4
20 3.059911 4.522844 7.905784 11.420429
3  2.237543 4.127740 9.864026  8.754048
10 4.546881 3.546689 6.164413 11.710035
12 3.952763 4.490791 5.564392  7.542578
15 2.335957 4.189517 9.601667  9.694433
18 3.999001 3.083366 8.348515  8.370818
5  2.792642 4.023536 7.786222  8.992823
Renumber the row indices in the sample:
> rownames(df2_sample) <- 1:nrow(df2_sample)
> df2_sample
Output
        y1       y2       y3        y4
1 3.059911 4.522844 7.905784 11.420429
2 2.237543 4.127740 9.864026  8.754048
3 4.546881 3.546689 6.164413 11.710035
4 3.952763 4.490791 5.564392  7.542578
5 2.335957 4.189517 9.601667  9.694433
6 3.999001 3.083366 8.348515  8.370818
7 2.792642 4.023536 7.786222  8.992823
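Since both examples repeat the same two steps (sample, then renumber), they could be combined into a small helper. The function below is a hypothetical sketch, and the name `sample_rows` is an assumption made for illustration; it is not a base R function or part of the examples above.

```r
# Hypothetical helper: draw n rows at random and renumber them from 1 to n
sample_rows <- function(df, n) {
  stopifnot(is.data.frame(df), n <= nrow(df))
  # drop = FALSE keeps the result a data frame even if df has one column
  out <- df[sample(nrow(df), n), , drop = FALSE]
  # Reset the row names to the automatic 1..n sequence
  rownames(out) <- NULL
  out
}

set.seed(111)
df <- data.frame(y1 = runif(20, 2, 5), y2 = runif(20, 3, 5))
smp <- sample_rows(df, 7)
rownames(smp)
```

Calling `set.seed()` before the helper, as here, makes the draw reproducible.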