How to create separate columns for first name and last name in R?


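In data analysis, people's first and last names are often stored together in a single column, so we need to split them apart to make the data easier to read and work with. To create separate columns for the first name and the last name in R, we can use the extract function from the tidyr package.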

The following examples show how to do this.

Example 1

The following snippet creates a sample data frame:

Names<-c("John Jones","Steve Smith","Pat Cummins","David Warner","Andrew Flintoff","Aaron Finch","Mitchell Starc","Nathan Lyon","Mathew Wade","Adam Zampa","Adam Gilchrist","Ricky Ponting","Glenn McGrath","Ben Cutting","John Cena","Brock Williams","Rubel Hussain","Soumya Sarkar","Mehidy Hasan","Liton Das")
df1<-data.frame(Names)
df1

The following data frame is created:

    Names
1  John Jones
2  Steve Smith
3  Pat Cummins
4  David Warner
5  Andrew Flintoff
6  Aaron Finch
7  Mitchell Starc
8  Nathan Lyon
9  Mathew Wade
10 Adam Zampa
11 Adam Gilchrist
12 Ricky Ponting
13 Glenn McGrath
14 Ben Cutting
15 John Cena
16 Brock Williams
17 Rubel Hussain
18 Soumya Sarkar
19 Mehidy Hasan
20 Liton Das

To load the tidyr package and create separate columns for first and last names in df1, add the following code to the snippet above:

library(tidyr)
# The first group captures everything before the first space, the second group the rest
extract(df1,Names,c("First_Name","Last_Name"), "([^ ]+) (.*)")

Output

If you run all the snippets above as a single program, it produces the following output:

  First_Name Last_Name
1  John      Jones
2  Steve     Smith
3  Pat       Cummins
4  David     Warner
5  Andrew    Flintoff
6  Aaron     Finch
7  Mitchell  Starc
8  Nathan    Lyon
9  Mathew    Wade
10 Adam      Zampa
11 Adam      Gilchrist
12 Ricky     Ponting
13 Glenn     McGrath
14 Ben       Cutting
15 John      Cena
16 Brock     Williams
17 Rubel     Hussain
18 Soumya    Sarkar
19 Mehidy    Hasan
20 Liton     Das
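The pattern "([^ ]+) (.*)" splits at the first space only, so it helps to see what happens with names that do not have exactly two parts. The sketch below uses a small hypothetical data frame (the names are made up and are not part of the examples in this article) and assumes tidyr is installed. It also passes remove = FALSE, an argument of extract that keeps the original column alongside the new ones.

```r
library(tidyr)

# Hypothetical names chosen to show edge cases; not taken from the article's data
people <- data.frame(
  Names = c("John Jones", "Mary Ann Smith", "Cher"),
  stringsAsFactors = FALSE
)

# remove = FALSE keeps the original Names column in the result
result <- extract(people, Names, c("First_Name", "Last_Name"),
                  "([^ ]+) (.*)", remove = FALSE)
result
```

With this pattern, "Mary Ann Smith" should yield "Mary" as the first name and "Ann Smith" as the last name, while "Cher" contains no space, does not match, and gets NA in both new columns. If your data contains middle names or single-word names, the pattern likely needs adjusting.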

Example 2

The following snippet creates a sample data frame:

Names<-c("Kane Williamson","Devon Conway","Trent Boult","Ross Taylor","Martin Guptill","Tim Southee","James Neesham","Lockie Ferguson","Ish Sodhi","Matt Henry","Tom Latham","Mark Chapman","Henry Nicholos","Tom Bundell","Sachin Tendulkar","Rahul Dravid","Chris Gayle","Tabraiz Shamsi","Aiden Makram","David Miller")
df2<-data.frame(Names)
df2

The following data frame is created:

    Names
1  Kane Williamson
2  Devon Conway
3  Trent Boult
4  Ross Taylor
5  Martin Guptill
6  Tim Southee
7  James Neesham
8  Lockie Ferguson
9  Ish Sodhi
10 Matt Henry
11 Tom Latham
12 Mark Chapman
13 Henry Nicholos
14 Tom Bundell
15 Sachin Tendulkar
16 Rahul Dravid
17 Chris Gayle
18 Tabraiz Shamsi
19 Aiden Makram
20 David Miller

To create separate columns for first and last names in df2, add the following code to the snippet above:

library(tidyr)  # only needed if Example 2 is run on its own
extract(df2,Names,c("First_Name","Last_Name"), "([^ ]+) (.*)")

Output

If you run all the snippets above as a single program, it produces the following output:

 First_Name Last_Name
1  Kane     Williamson
2  Devon    Conway
3  Trent    Boult
4  Ross     Taylor
5  Martin   Guptill
6  Tim      Southee
7  James    Neesham
8  Lockie   Ferguson
9  Ish      Sodhi
10 Matt     Henry
11 Tom      Latham
12 Mark     Chapman
13 Henry    Nicholos
14 Tom      Bundell
15 Sachin   Tendulkar
16 Rahul    Dravid
17 Chris    Gayle
18 Tabraiz  Shamsi
19 Aiden    Makram
20 David    Miller
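If installing tidyr is not an option, a similar split can be sketched in base R with sub. This is an alternative to the extract approach used in this article, not a replacement for it; the data frame name df3 and the shortened list of names are assumptions made for illustration.

```r
# A shortened, illustrative list of names
Names <- c("Kane Williamson", "Devon Conway", "Trent Boult")
df3 <- data.frame(Names, stringsAsFactors = FALSE)

# Drop everything from the first space onward to get the first name
df3$First_Name <- sub(" .*$", "", df3$Names)

# Drop everything up to and including the first space to get the last name
df3$Last_Name <- sub("^[^ ]+ ", "", df3$Names)

df3
```

Unlike extract, this version keeps the original Names column, and a name without a space is copied unchanged into both new columns instead of becoming NA.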

Updated on: 11 November 2021
