How to collapse factor levels in an R data frame?


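Factor levels are sometimes recorded inconsistently. For example, male may be entered as M in some places and as Mal in others, so the data ends up with several levels that all mean male. Such inconsistencies inflate the number of levels, and any analysis that uses the factor will be wrong until they are merged. To collapse the misrecorded levels into the correct ones, assign a named list to levels(): each name is a target level, and its value is a character vector of the existing levels to merge into it.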
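
As a minimal sketch of the idea, before the full data frame examples, the snippet below collapses a small factor. The vector `sex` and its values are hypothetical and chosen only for illustration.

```r
# Hypothetical toy factor with five distinct spellings
sex <- factor(c("M", "Male", "F", "Female", "Mal"))
nlevels(sex)   # number of levels before collapsing

# Each list name is a target level; its vector lists the old levels to merge
levels(sex) <- list(Male = c("Male", "Mal", "M"), Female = c("Female", "F"))

levels(sex)    # levels after collapsing
nlevels(sex)
```

The order of the names in the list determines the order of the resulting levels.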

Example 1

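# Gender recorded with inconsistent spellings
F<-c("Male","Ma","Fem","Female","M","Male","Mal","Male","Fe","Female","M","Fema","Ma","Femal","F","Fem","Male","Ma","Male","Female")
Rate<-rep(c(25,30,37,56),times=5)
# stringsAsFactors=TRUE is required in R 4.0.0 and later, where
# data.frame() no longer converts strings to factors by default
df1<-data.frame(F,Rate,stringsAsFactors=TRUE)
df1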

Output

        F Rate
1    Male   25
2      Ma   30
3     Fem   37
4  Female   56
5       M   25
6    Male   30
7     Mal   37
8    Male   56
9      Fe   25
10 Female   30
11      M   37
12   Fema   56
13     Ma   25
14  Femal   30
15      F   37
16    Fem   56
17   Male   25
18     Ma   30
19   Male   37
20 Female   56
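
Collapsing the misrecorded levels into Male and Female:

# Each list name is a target level; its vector lists the old levels to merge
levels(df1$F)<-list("Male"=c("Male","Ma","Mal","M"),"Female"=c("Female","Fe","Fem","Fema","Femal","F"))
df1

Output
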
        F Rate
1    Male   25
2    Male   30
3  Female   37
4  Female   56
5    Male   25
6    Male   30
7    Male   37
8    Male   56
9  Female   25
10 Female   30
11   Male   37
12 Female   56
13   Male   25
14 Female   30
15 Female   37
16 Female   56
17   Male   25
18   Male   30
19   Male   37
20 Female   56
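
One thing worth checking after a collapse like this: any existing level that is not mentioned in the list becomes NA. The sketch below illustrates this on a small hypothetical factor `g`; the "Unknown" value is an assumption added for the demonstration and does not appear in the data above.

```r
# Hypothetical factor containing one level that the list will not mention
g <- factor(c("Male", "Ma", "Fem", "Female", "Unknown"))

levels(g) <- list(Male = c("Male", "Ma"), Female = c("Female", "Fem"))

# Count each level, including any values that turned into NA
table(g, useNA = "ifany")

# TRUE here signals that some original level was left out of the list
anyNA(g)
```

Running `table()` with `useNA = "ifany"` right after collapsing is a quick way to confirm that every original level was accounted for.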

Example 2

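# Motorcycle types recorded with inconsistent abbreviations
MotorCycleTypes<-c("Cru","Sp","Sport","Tour","Endu","Cruiser","Touri","Enduro","Spo","Cruise","Touring","To","Sp","End","Cruis","Cruiser","Sport","End","Tour","Enduro")
# Random values without a seed, so Frequency will differ from run to run
Frequency<-sample(1:30,20,replace=TRUE)
# stringsAsFactors=TRUE is required in R 4.0.0 and later
df2<-data.frame(MotorCycleTypes,Frequency,stringsAsFactors=TRUE)
df2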

Output

   MotorCycleTypes Frequency
1              Cru         5
2               Sp        15
3            Sport        10
4             Tour         2
5             Endu        25
6          Cruiser         6
7            Touri        17
8           Enduro         5
9              Spo        15
10          Cruise        25
11         Touring        12
12              To        11
13              Sp        20
14             End         6
15           Cruis         1
16         Cruiser        12
17           Sport        21
18             End         5
19            Tour        23
20          Enduro         2
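
Collapsing the misrecorded levels into Cruise, Sport, Enduro and Touring:

# Each list name is a target level; its vector lists the old levels to merge
levels(df2$MotorCycleTypes)<-list("Cruise"=c("Cruiser","Cru","Cruis","Cruise"),"Sport"=c("Sport","Sp","Spo"),"Enduro"=c("Enduro","Endu","End"),"Touring"=c("Touring","Tour","To","Touri"))
df2

Output
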
   MotorCycleTypes Frequency
1           Cruise         5
2            Sport        15
3            Sport        10
4          Touring         2
5           Enduro        25
6           Cruise         6
7          Touring        17
8           Enduro         5
9            Sport        15
10          Cruise        25
11         Touring        12
12         Touring        11
13           Sport        20
14          Enduro         6
15          Cruise         1
16          Cruise        12
17           Sport        21
18          Enduro         5
19         Touring        23
20          Enduro         2
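
When every misrecorded level is a truncated prefix of its full name, as in this example, the list does not have to be typed by hand. The following is a rough sketch of one way to build it in base R. It is not part of the method above, and it assumes that each existing level is a prefix of exactly one canonical name. The vectors `types` and `full` are hypothetical, and the sketch uses "Cruiser" as the canonical name rather than the "Cruise" used above.

```r
# Hypothetical sample of misrecorded levels, using values from this example
types <- factor(c("Cru", "Sp", "Sport", "Tour", "Endu", "Cruiser"))

# Assumed canonical names; each existing level should be a prefix of exactly one
full <- c("Cruiser", "Sport", "Enduro", "Touring")

# For each canonical name, collect the existing levels that are a prefix of it
mapping <- lapply(setNames(full, full), function(nm) {
  lv <- levels(types)
  lv[startsWith(nm, lv)]
})

# Assign the generated named list exactly as a hand-written one would be assigned
levels(types) <- mapping

levels(types)
as.character(types)
```

If a level could be a prefix of more than one canonical name, this shortcut is ambiguous, and an explicit hand-written list is the safer choice.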

Updated on: 2020-08-21