How to find the number of characters in each row of a string column in R?
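Suppose an R data frame has a column of strings in which letters are mixed with digits, and we want to count the letters in each row while ignoring the digits. We can do this by combining the gsub and nchar functions: gsub deletes every character that is not a letter, and nchar counts the characters that remain, as the examples below show.
Pattern matching in R is case sensitive, so the character class must match the case of the letters in the data: [^A-Z] when the letters are uppercase and [^a-z] when they are lowercase.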
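The two steps can be sketched on a single value before applying them to a whole column. This is a minimal illustration; the variable names s, letters_only, n_upper and n_from_lowercase are chosen here for clarity and are not part of the original examples.

```r
# One value in the same style as the data used in the examples
s <- "A01K"

# Step 1: delete every character that is not an uppercase letter
letters_only <- gsub("[^A-Z]", "", s)

# Step 2: count the characters that are left
n_upper <- nchar(letters_only)

# The same pattern applied to a lowercase version of the string;
# because matching is case sensitive, the lowercase letters are removed too
n_from_lowercase <- nchar(gsub("[^A-Z]", "", "a01k"))

print(letters_only)
print(n_upper)
```

Here letters_only holds "AK" and n_upper is 2. The last line is included only to show why the case of the pattern has to agree with the case of the data.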
Example 1
The following snippet creates a sample data frame:
x<-c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O", "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU", "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1<-data.frame(x)
df1
This creates the following data frame:
              x
1          A01K
2         140AL
3         A142R
4        A255SW
5       A2474EZ
6        CA214N
7          C14O
8         CGSLT
9        DC23QW
10   D2411RWEDE
11     FL233EGV
12 G36521VCLPBA
13       G54TRU
14       H214FI
15        245IA
16       ID3699
17         IL01
18       IFDFDN
19    K2254FDES
20   KY244RLPKJ
To count the uppercase letters in each row of column x, add the following code to the snippet above:
x<-c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O", "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU", "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1<-data.frame(x)
df1$No_of_Chars<-nchar(gsub("[^A-Z]","",df1$x))
df1
Output
If you run all of the snippets above as a single program, it produces the following output:
              x No_of_Chars
1          A01K           2
2         140AL           2
3         A142R           2
4        A255SW           3
5       A2474EZ           3
6        CA214N           3
7          C14O           2
8         CGSLT           5
9        DC23QW           4
10   D2411RWEDE           6
11     FL233EGV           5
12 G36521VCLPBA           7
13       G54TRU           4
14       H214FI           3
15        245IA           2
16       ID3699           2
17         IL01           2
18       IFDFDN           6
19    K2254FDES           5
20   KY244RLPKJ           7
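Note that No_of_Chars counts only the uppercase letters, not every character in the string. If the total length or the number of digits is also needed, one possible sketch uses nchar directly and the POSIX classes [[:alpha:]] and [[:digit:]], which do not depend on the case of the letters. The value "c14o" and the column names n_total, n_letters and n_digits are made up for this illustration.

```r
# Two values from Example 1 plus a hypothetical lowercase one ("c14o")
x <- c("A01K", "140AL", "c14o")

# Total number of characters, digits included
n_total <- nchar(x)

# Letters of either case: remove everything that is not alphabetic
n_letters <- nchar(gsub("[^[:alpha:]]", "", x))

# Digits only: remove everything that is not a digit
n_digits <- nchar(gsub("[^[:digit:]]", "", x))

print(data.frame(x, n_total, n_letters, n_digits))
```

For strings that contain only letters and digits, as here, n_letters and n_digits add up to n_total.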
Example 2
The following snippet creates a sample data frame:
y<-c("ala5412bama","ala1475ska","american11022samoa","arizona3652","arkan1475sas","califor2365nia","co1475lorado","0014connecticut","dela25366ware","district257of22columbia","florid02535a","57412georgia","gu25987am","hawaii36250","20057idaho","i369852llinois","indiana0146563","3255iowa","kansas3682701","kentucky2574")
df2<-data.frame(y)
df2
This creates the following data frame:
                         y
1              ala5412bama
2               ala1475ska
3       american11022samoa
4              arizona3652
5             arkan1475sas
6           califor2365nia
7             co1475lorado
8          0014connecticut
9            dela25366ware
10 district257of22columbia
11            florid02535a
12            57412georgia
13               gu25987am
14             hawaii36250
15              20057idaho
16          i369852llinois
17          indiana0146563
18                3255iowa
19           kansas3682701
20            kentucky2574
To count the lowercase letters in each row of column y, add the following code to the snippet above:
y<-c("ala5412bama","ala1475ska","american11022samoa","arizona3652","arkan1475sas","califor2365nia","co1475lorado","0014connecticut","dela25366ware","district257of22columbia","florid02535a","57412georgia","gu25987am","hawaii36250","20057idaho","i369852llinois","indiana0146563","3255iowa","kansas3682701","kentucky2574")
df2<-data.frame(y)
df2$No_of_Chars<-nchar(gsub("[^a-z]","",df2$y))
df2
Output
If you run all of the snippets above as a single program, it produces the following output:
                         y No_of_Chars
1              ala5412bama           7
2               ala1475ska           6
3       american11022samoa          13
4              arizona3652           7
5             arkan1475sas           8
6           califor2365nia          10
7             co1475lorado           8
8          0014connecticut          11
9            dela25366ware           8
10 district257of22columbia          18
11            florid02535a           7
12            57412georgia           7
13               gu25987am           4
14             hawaii36250           6
15              20057idaho           5
16          i369852llinois           8
17          indiana0146563           7
18                3255iowa           4
19           kansas3682701           6
20            kentucky2574           8
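As a cross-check, the same counts can be obtained by counting the matches of [a-z] directly instead of deleting everything else. The sketch below uses base R's gregexpr, regmatches and lengths; it is an alternative to the gsub approach shown above, not a replacement for it. The value "12345" and the names n_gsub and n_match are made up for this illustration.

```r
# Three values from Example 2 plus a hypothetical digits-only value
y <- c("ala5412bama", "gu25987am", "3255iowa", "12345")

# Approach used in the article: strip non-letters, then count what is left
n_gsub <- nchar(gsub("[^a-z]", "", y))

# Alternative: extract every lowercase letter and count the matches per string
n_match <- lengths(regmatches(y, gregexpr("[a-z]", y)))

print(data.frame(y, n_gsub, n_match))
```

Both columns agree for every row, including the digits-only value, where no letter matches and the count is 0.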