How to find similar words in a character vector in R?


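Strings in a character vector sometimes contain spelling mistakes. Extracting similar words helps catch them, because similar words are likely to be the correct and misspelled forms of the same word. This can be done by calling agrep on every element of the vector through lapply. agrep performs approximate (fuzzy) matching, and value=TRUE makes it return the matching strings instead of their positions.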
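Before looping over a whole vector, it can help to see what a single agrep call does. The sketch below uses a small made-up vector (the name words is hypothetical and not part of the examples that follow) and looks up one pattern in it.

```r
# Hypothetical sample vector, used only for illustration
words <- c("Egyypt", "Turkey", "Egypt")

# Strings in `words` that approximately match the pattern "Egypt"
hits <- agrep("Egypt", words, value = TRUE)
print(hits)

# Without value = TRUE, agrep returns the positions of the matches instead
positions <- agrep("Egypt", words)
print(positions)
```

lapply simply repeats this call once per element, using each element in turn as the pattern.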

Example 1

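# Country names, some of them misspelled
x1 <- c("India", "United Kingdoms", "Indiaa", "Egyypt", "United Kingdom",
        "Turkey", "Egypt", "Belaarus", "Belarus")
# Use each element as the pattern and return its approximate matches in x1
lapply(x1, agrep, x1, value = TRUE)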

Output

[[1]]
[1] "India" "Indiaa"
[[2]]
[1] "United Kingdoms" "United Kingdom"
[[3]]
[1] "India" "Indiaa"
[[4]]
[1] "Egyypt" "Egypt"
[[5]]
[1] "United Kingdoms" "United Kingdom"
[[6]]
[1] "Turkey"
[[7]]
[1] "Egyypt" "Egypt"
[[8]]
[1] "Belaarus" "Belarus"
[[9]]
[1] "Belaarus" "Belarus"
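The list above repeats each group once per member. One possible way to tidy it, sketched here as an optional follow-up rather than part of the original method, is to drop the repeated groups and keep only those with more than one member. The names matches, groups and suspects are illustrative.

```r
x1 <- c("India", "United Kingdoms", "Indiaa", "Egyypt", "United Kingdom",
        "Turkey", "Egypt", "Belaarus", "Belarus")

# One set of approximate matches per element, as in Example 1
matches <- lapply(x1, agrep, x1, value = TRUE)

# Drop repeated groups so each cluster of similar words is listed once
groups <- unique(matches)

# Keep only clusters with more than one member (candidate misspellings)
suspects <- groups[lengths(groups) > 1]
print(suspects)
```

Words such as "Turkey", which only match themselves, fall out of suspects, leaving the clusters that probably need a manual check.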

Example 2

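# Personal names, some of them misspelled
x2 <- c("Al-hadi", "Umair", "Omar", "Alhadi", "Shanti", "Shant", "Umaer",
        "Peter", "Rahul", "Pattrick", "Peeter", "Rahuls")
# Use each element as the pattern and return its approximate matches in x2
lapply(x2, agrep, x2, value = TRUE)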

Output

[[1]]
[1] "Al-hadi" "Alhadi"
[[2]]
[1] "Umair" "Umaer"
[[3]]
[1] "Omar"
[[4]]
[1] "Al-hadi" "Alhadi"
[[5]]
[1] "Shanti" "Shant"
[[6]]
[1] "Shanti" "Shant"
[[7]]
[1] "Umair" "Umaer"
[[8]]
[1] "Peter" "Peeter"
[[9]]
[1] "Rahul" "Rahuls"
[[10]]
[1] "Pattrick"
[[11]]
[1] "Peter" "Peeter"
[[12]]
[1] "Rahul" "Rahuls"

Example 3

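# US state names, some of them misspelled
x3 <- c("Alabamaa", "New Yorky", "New Yok", "Alabma",
        "Florida", "Illinois", "Texas", "Illinoise")
# Use each element as the pattern and return its approximate matches in x3
lapply(x3, agrep, x3, value = TRUE)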

Output

[[1]]
[1] "Alabamaa"
[[2]]
[1] "New Yorky"
[[3]]
[1] "New Yorky" "New Yok"
[[4]]
[1] "Alabamaa" "Alabma"
[[5]]
[1] "Florida"
[[6]]
[1] "Illinois" "Illinoise"
[[7]]
[1] "Texas"
[[8]]
[1] "Illinois" "Illinoise"
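Note that the result is not symmetric: "Alabamaa" returns only itself, while "Alabma" returns both spellings. A likely explanation is that agrep looks for an approximate match of the pattern inside each string, and its default max.distance of 0.1 is a fraction of the pattern length, so the outcome depends on which string plays the role of the pattern. The sketch below illustrates this with adist from the utils package; the variable names and the max.distance value of 2 are arbitrary choices for illustration.

```r
x3 <- c("Alabamaa", "New Yorky", "New Yok", "Alabma",
        "Florida", "Illinois", "Texas", "Illinoise")

# Edit distance between the two Alabama variants (two deletions)
d <- adist("Alabamaa", "Alabma")
print(d)

# Matching depends on which string acts as the pattern
long_as_pattern  <- agrep("Alabamaa", x3, value = TRUE)
short_as_pattern <- agrep("Alabma", x3, value = TRUE)
print(long_as_pattern)
print(short_as_pattern)

# max.distance can be raised to loosen the match; 2 is an arbitrary choice
looser <- agrep("Alabamaa", x3, max.distance = 2, value = TRUE)
print(looser)
```

Raising max.distance makes matching more permissive, so it is worth checking the results for false positives before treating them as misspellings.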

Updated on: 09-Sep-2020
