How to find the number of characters in each row of a string column in R?


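Suppose an R data frame has a column of strings in which letters are mixed with digits, and we want to count the letters in each row of that column. We can strip the unwanted characters with the gsub function and count what remains with the nchar function, as the examples below show.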
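
As a minimal sketch of how the two functions combine, the snippet below applies them to a single string in two separate steps. The variable names s and letters_only are illustrative and are not part of the original examples.

```r
# A single mixed string (letters and digits)
s <- "A255SW"

# Step 1: gsub() replaces every character that is NOT an uppercase letter with ""
letters_only <- gsub("[^A-Z]", "", s)
print(letters_only)         # "ASW"

# Step 2: nchar() counts the characters that remain
print(nchar(letters_only))  # 3
```

Nesting the two calls, as in nchar(gsub("[^A-Z]", "", s)), gives the same count in one expression.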

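Because R is case-sensitive, the character class in the pattern must match the case of the letters in the data: [^A-Z] keeps only uppercase letters, while [^a-z] keeps only lowercase letters.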
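
The effect of case on the pattern can be sketched with a few mixed-case strings. The vector mixed and its values are hypothetical, chosen only to show how the three patterns differ; the counts in the comments assume plain ASCII input.

```r
# Hypothetical mixed-case values (not from the original data)
mixed <- c("Ab12cD", "xyz9", "QR7")

# Uppercase letters only
upper_count <- nchar(gsub("[^A-Z]", "", mixed))
print(upper_count)  # 2 0 2

# Lowercase letters only
lower_count <- nchar(gsub("[^a-z]", "", mixed))
print(lower_count)  # 2 3 0

# Letters of either case
any_count <- nchar(gsub("[^A-Za-z]", "", mixed))
print(any_count)    # 4 3 2
```

Choosing the wrong case does not raise an error; it silently returns a smaller count, so it is worth checking a few rows by hand.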

Example 1

The following snippet creates a sample data frame:

x<-c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O", "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU", "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1<-data.frame(x)
df1

This creates the following data frame:

     x
1  A01K
2  140AL
3  A142R
4  A255SW
5  A2474EZ
6  CA214N
7  C14O
8  CGSLT
9  DC23QW
10 D2411RWEDE
11 FL233EGV
12 G36521VCLPBA
13 G54TRU
14 H214FI
15 245IA
16 ID3699
17 IL01
18 IFDFDN
19 K2254FDES
20 KY244RLPKJ

To count the uppercase letters in each row of column x, add the nchar and gsub line to the snippet above:

x<-c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O", "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU", "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1<-data.frame(x)
# Remove every character that is not an uppercase letter, then count what is left
df1$No_of_Chars <- nchar(gsub("[^A-Z]", "", df1$x))
df1

Output

Running all of the snippets above as a single program produces the following output:

    x    No_of_Chars
1  A01K         2
2  140AL        2
3  A142R        2
4  A255SW       3
5  A2474EZ      3
6  CA214N       3
7  C14O         2
8  CGSLT        5
9  DC23QW       4
10 D2411RWEDE   6
11 FL233EGV     5
12 G36521VCLPBA 7
13 G54TRU       4
14 H214FI       3
15 245IA        2
16 ID3699       2
17 IL01         2
18 IFDFDN       6
19 K2254FDES    5
20 KY244RLPKJ   7
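
No_of_Chars above counts only the uppercase letters, not the digits. If the total length or the digit count is also of interest, the same approach extends naturally. The following is a sketch on a subset of the values above; the column names Total, Letters and Digits and the data frame name df are illustrative choices.

```r
# Subset of the values used above; stringsAsFactors = FALSE keeps x as character
x <- c("A01K", "140AL", "CGSLT", "G36521VCLPBA")
df <- data.frame(x, stringsAsFactors = FALSE)

df$Total   <- nchar(df$x)                      # every character in the string
df$Letters <- nchar(gsub("[^A-Z]", "", df$x))  # uppercase letters only
df$Digits  <- nchar(gsub("[^0-9]", "", df$x))  # digits only
print(df)
```

For these strings, which contain nothing but uppercase letters and digits, Letters plus Digits equals Total in every row.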

Example 2

The following snippet creates a sample data frame:

y<-c("ala5412bama","ala1475ska","american11022samoa","arizona3652","arkan1475sas","califor2365nia","co1475lorado","0014connecticut","dela25366ware","district257of22columbia","florid02535a","57412georgia","gu25987am","hawaii36250","20057idaho","i369852llinois","indiana0146563","3255iowa","kansas3682701","kentucky2574")
df2<-data.frame(y)
df2

This creates the following data frame:

      y
1  ala5412bama
2  ala1475ska
3  american11022samoa
4  arizona3652
5  arkan1475sas
6  califor2365nia
7  co1475lorado
8  0014connecticut
9  dela25366ware
10 district257of22columbia
11 florid02535a
12 57412georgia
13 gu25987am
14 hawaii36250
15 20057idaho
16 i369852llinois
17 indiana0146563
18 3255iowa
19 kansas3682701
20 kentucky2574

To count the lowercase letters in each row of column y, add the nchar and gsub line to the snippet above:

y<-c("ala5412bama","ala1475ska","american11022samoa","arizona3652","arkan1475sas","califor2365nia","co1475lorado","0014connecticut","dela25366ware","district257of22columbia","florid02535a","57412georgia","gu25987am","hawaii36250","20057idaho","i369852llinois","indiana0146563","3255iowa","kansas3682701","kentucky2574")
df2<-data.frame(y)
# Remove every character that is not a lowercase letter, then count what is left
df2$No_of_Chars <- nchar(gsub("[^a-z]", "", df2$y))
df2

Output

Running all of the snippets above as a single program produces the following output:

          y          No_of_Chars
1  ala5412bama              7
2  ala1475ska               6
3  american11022samoa      13
4  arizona3652              7
5  arkan1475sas             8
6  califor2365nia          10
7  co1475lorado             8
8  0014connecticut         11
9  dela25366ware            8
10 district257of22columbia 18
11 florid02535a             7
12 57412georgia             7
13 gu25987am                4
14 hawaii36250              6
15 20057idaho               5
16 i369852llinois           8
17 indiana0146563           7
18 3255iowa                 4
19 kansas3682701            6
20 kentucky2574             8
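
The same counts can also be obtained by counting matches directly instead of deleting the non-matching characters. The sketch below is an alternative to the gsub approach used in the examples, built from the base R functions gregexpr, regmatches and lengths. The helper name count_matches and the extra test value "12345" are hypothetical.

```r
# Hypothetical helper: number of times `pattern` matches in each string of x
count_matches <- function(x, pattern) {
  # gregexpr() locates all matches, regmatches() extracts them,
  # lengths() counts the extracted matches per string
  lengths(regmatches(x, gregexpr(pattern, x)))
}

y <- c("ala5412bama", "57412georgia", "3255iowa", "12345")

print(count_matches(y, "[a-z]"))  # 7 7 4 0
print(count_matches(y, "[0-9]"))  # 4 5 4 5
```

This version takes the characters to keep as the pattern rather than the characters to remove, which some readers may find easier to follow.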

Updated on: 11 November 2021
