How to extract the first digit from a character column in an R data frame?
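Suppose a data frame has a character column whose values mix letters and numbers, and the first digit in each value has a particular meaning that is useful for data analysis. In that case we can extract those first digits. To do this, we can use the `stri_extract_first` function from the stringi package with the regular expression `\d`, which matches a single digit. Inside an R string literal the backslash must be doubled, so the pattern is written `"\\d"`.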
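For comparison, the same extraction can be sketched in base R without stringi. `regexpr()` finds the position of the first digit in each string, and `substring()` cuts out that one character. This is a minimal sketch, not the article's method. The helper name `first_digit` is hypothetical. The sketch also assumes that a string with no digit (or a missing value) should map to `NA`, which is what `stri_extract_first` returns when there is no match.

```r
# Hypothetical helper (not part of stringi): first digit of each string, or NA
first_digit <- function(x) {
  x <- as.character(x)                          # also accepts factor columns
  pos <- regexpr("[0-9]", x)                    # index of first digit; -1 if none, NA for NA input
  out <- substring(x, pos, pos)                 # the single character at that index
  out[is.na(x) | pos == -1] <- NA_character_    # no digit or missing input -> NA
  out
}

first_digit(c("HT23L", "UK5RT1", "abc", NA))    # returns c("2", "5", NA, NA)
```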
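Example 1
Consider the following data frame. The values of `y1` are drawn at random with `sample()`, so your rows will differ unless you call `set.seed()` first: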
```r
x1 <- 1:20
y1 <- sample(c("HT23L", "HT14L", "HT32L"), 20, replace = TRUE)
df1 <- data.frame(x1, y1)
df1
```
Output
```
   x1    y1
1   1 HT14L
2   2 HT14L
3   3 HT23L
4   4 HT14L
5   5 HT32L
6   6 HT32L
7   7 HT14L
8   8 HT32L
9   9 HT32L
10 10 HT32L
11 11 HT23L
12 12 HT32L
13 13 HT14L
14 14 HT23L
15 15 HT14L
16 16 HT23L
17 17 HT23L
18 18 HT23L
19 19 HT23L
20 20 HT23L
```
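Load the stringi package and extract the first digit from column y1:
```r
library(stringi)
# "\\d" in R source is the regex \d: one digit (a bare "\d" is an R syntax error)
stri_extract_first(df1$y1, regex = "\\d")
```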
Output
```
 [1] "1" "1" "2" "1" "3" "3" "1" "3" "3" "3" "2" "3" "1" "2" "1" "2" "2" "2" "2"
[20] "2"
```
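`stri_extract_first` returns a character vector. The digits therefore usually need converting with `as.integer()` before they can be compared, counted, or used in arithmetic, and it is often convenient to store them as a new column. The sketch below makes three assumptions:

- It uses a small fixed stand-in for `df1`, so the result is reproducible.
- The column name `first_digit` is hypothetical.
- It uses base R `sub()` so that it runs without stringi. With stringi loaded, `as.integer(stri_extract_first(df1$y1, regex = "\\d"))` should give the same integers.

```r
# Fixed stand-in for df1 so the output does not depend on sample()
df1 <- data.frame(x1 = 1:5,
                  y1 = c("HT14L", "HT23L", "HT32L", "HT14L", "HT23L"))

# Skip leading non-digits, capture one digit, drop the rest.
# Caveat: sub() leaves a string with no digit unchanged, and as.integer()
# then turns it into NA with a coercion warning.
digits <- sub("^[^0-9]*([0-9]).*$", "\\1", df1$y1)

# Convert "1", "2", ... to integers and store them in a new (hypothetical) column
df1$first_digit <- as.integer(digits)
df1
```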
Example 2
```r
x2 <- sample(c("India1RT1", "UK5RT1", "Egypt2PT4"), 20, replace = TRUE)
y2 <- rpois(20, 5)
df2 <- data.frame(x2, y2)
df2
```
Output
```
          x2 y2
1  India1RT1  2
2  India1RT1  8
3  India1RT1  7
4  India1RT1  6
5     UK5RT1  6
6  India1RT1  5
7     UK5RT1  6
8  India1RT1  6
9  India1RT1  7
10    UK5RT1 10
11 Egypt2PT4  8
12 Egypt2PT4  5
13 Egypt2PT4  7
14 India1RT1  2
15    UK5RT1  3
16 Egypt2PT4  5
17    UK5RT1  3
18 Egypt2PT4  6
19 Egypt2PT4  3
20    UK5RT1  5
```
Extract the first digit from column x2:
```r
stri_extract_first(df2$x2, regex = "\\d")
```
Output
```
 [1] "1" "1" "1" "1" "5" "1" "5" "1" "1" "5" "2" "2" "2" "1" "5" "2" "5" "2" "2"
[20] "5"
```
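The pattern `\\d` matches exactly one digit. For `Egypt2PT4` only the `2` is returned, and for `HT23L` in Example 1 only the `2` of `23`. If the whole first number is wanted instead, the pattern can be widened to `\\d+` (one or more digits), for example `stri_extract_first(df2$x2, regex = "\\d+")`.

A base R sketch of the difference follows, using a few illustrative strings. Note that `regmatches()` silently drops strings with no match. That is harmless here only because every string contains a digit.

```r
x <- c("HT23L", "Egypt2PT4", "abc123")

# "[0-9]" matches a single digit; "[0-9]+" matches the whole first run of digits
first_digit  <- regmatches(x, regexpr("[0-9]", x))
first_number <- regmatches(x, regexpr("[0-9]+", x))

first_digit   # "2" "2" "1"
first_number  # "23" "2" "123"
```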
Example 3
```r
x3 <- sample(c("abc123", "dfe456"), 20, replace = TRUE)
y3 <- rnorm(20)
df3 <- data.frame(x3, y3)
df3
```
Output
```
       x3         y3
1  abc123  0.1027005
2  dfe456  0.2297002
3  dfe456 -0.1441151
4  dfe456  1.0510760
5  abc123  0.8182656
6  dfe456 -0.5018968
7  dfe456  0.2957634
8  abc123 -0.4240910
9  dfe456 -1.0700713
10 dfe456 -0.3374661
11 dfe456 -0.4654241
12 dfe456 -0.4542710
13 abc123  0.6969808
14 dfe456 -0.6514574
15 abc123  0.2258769
16 dfe456 -0.5348958
17 abc123  0.6629195
18 dfe456  1.0998636
19 dfe456 -1.3147809
20 dfe456 -2.3015384
```
Extract the first digit from column x3:
```r
stri_extract_first(df3$x3, regex = "\\d")
```
Output
```
 [1] "1" "4" "4" "4" "1" "4" "4" "1" "4" "4" "4" "4" "1" "4" "1" "4" "1" "4" "4"
[20] "4"
```
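Once extracted, the first digit can serve as a grouping variable. This is one way it can feed into the data analysis mentioned at the start. The sketch below uses made-up `y3` values (not the random ones above) so the group means are easy to check by hand, and a hypothetical column name `group`. `tapply()` then computes the mean of `y3` for each first digit of `x3`.

```r
# Made-up stand-in for df3 with easy-to-check values
df3 <- data.frame(x3 = c("abc123", "dfe456", "dfe456", "abc123"),
                  y3 = c(1, 2, 4, 3))

# First digit of x3, via the position returned by regexpr()
pos <- regexpr("[0-9]", df3$x3)
df3$group <- substring(df3$x3, pos, pos)

# Mean of y3 within each first-digit group: "1" -> (1 + 3) / 2, "4" -> (2 + 4) / 2
means <- tapply(df3$y3, df3$group, mean)
means
```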