How to extract website names from website links in R?


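If we have a list of website links and want the website name from each one, copying the names by hand one at a time is slow and error-prone. It is better to let R do the extraction. The urltools package provides two functions for this: domain reduces each link to its host name, and suffix_extract splits that host into four columns: host, subdomain, domain, and suffix. The domain column is generally the website name. One caveat: for links served through a cache or CDN, such as the cdn.ampproject.org links in the example, the domain column holds the cache's name (ampproject) rather than the name of the original site.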

Load the urltools package -

library(urltools)

Store the website links in a vector -

Web_Links <- c(
  "https://www.grammarly.com/grammar-check",
  "https://sceptermarketing.com/comma-separated-lists-of-us-states-abbreviations-select-options-etc/",
  "https://tutorialspoint.tw/machine_learning/index.htm",
  "https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/sort",
  "https://www-islaah-in.cdn.ampproject.org/v/s/www.islaah.in/masail/13977/?amp=&usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16016175660203&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Fwww.islaah.in%2Fmasail%2F13977%2F",
  "http://qoitrat.org/Qa/searchtopic.php?Main=76&MainTopc=245",
  "https://theislamicinformation-com.cdn.ampproject.org/v/s/theislamicinformation.com/aqeeqah-for-baby-boy-and-girl/amp/?usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16015741096047&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Ftheislamicinformation.com%2Faqeeqah-for-baby-boy-and-girl%2F",
  "https://parenting.firstcry.com/articles/50-popular-turkish-baby-names-for-girls/",
  "https://www.amazon.in/SELF-CHEF-Delhi-Aloo-Tikki/dp/B089GW5ZPL/ref=asc_df_B089GW5ZPL/?tag=googleshopmob-21&linkCode=df0&hvadid=397060787211&hvpos=&hvnetw=g&hvrand=3239398407570685332&hvpone=&hvptwo=&hvqmt=&hvdev=m&hvdvcmdl=&hvlocint=&hvlocphy=9040189&hvtargid=pla-923173707999&psc=1&ext_vrnc=hi",
  "http://ridenow.co.in/?From=Bareilly&To=Delhi&submit=",
  "https://www.savaari.com/delhi/delhi-to-bareilly-cabs",
  "https://www.olxgroup.com/search/operations/delhi-ncr/all-brands",
  "https://unbelievable-facts.com/work-with-us",
  "https://www.tataaiginsurance.in/taig/taig/tata_aig/CorporateCustomerPortal/login.jsp",
  "https://www.dummies.com/programming/r/how-to-change-plot-options-in-r/",
  "http://www.sthda.com/english/wiki/add-titles-to-a-plot-in-r-software"
)

Print the vector of website links -

Web_Links

 [1] "https://www.grammarly.com/grammar-check"
 [2] "https://sceptermarketing.com/comma-separated-lists-of-us-states-abbreviations-select-options-etc/"
 [3] "https://tutorialspoint.tw/machine_learning/index.htm"
 [4] "https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/sort"
 [5] "https://www-islaah-in.cdn.ampproject.org/v/s/www.islaah.in/masail/13977/?amp=&usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16016175660203&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Fwww.islaah.in%2Fmasail%2F13977%2F"
 [6] "http://qoitrat.org/Qa/searchtopic.php?Main=76&MainTopc=245"
 [7] "https://theislamicinformation-com.cdn.ampproject.org/v/s/theislamicinformation.com/aqeeqah-for-baby-boy-and-girl/amp/?usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16015741096047&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Ftheislamicinformation.com%2Faqeeqah-for-baby-boy-and-girl%2F"
 [8] "https://parenting.firstcry.com/articles/50-popular-turkish-baby-names-for-girls/"
 [9] "https://www.amazon.in/SELF-CHEF-Delhi-Aloo-Tikki/dp/B089GW5ZPL/ref=asc_df_B089GW5ZPL/?tag=googleshopmob-21&linkCode=df0&hvadid=397060787211&hvpos=&hvnetw=g&hvrand=3239398407570685332&hvpone=&hvptwo=&hvqmt=&hvdev=m&hvdvcmdl=&hvlocint=&hvlocphy=9040189&hvtargid=pla-923173707999&psc=1&ext_vrnc=hi"
[10] "http://ridenow.co.in/?From=Bareilly&To=Delhi&submit="
[11] "https://www.savaari.com/delhi/delhi-to-bareilly-cabs"
[12] "https://www.olxgroup.com/search/operations/delhi-ncr/all-brands"
[13] "https://unbelievable-facts.com/work-with-us"
[14] "https://www.tataaiginsurance.in/taig/taig/tata_aig/CorporateCustomerPortal/login.jsp"
[15] "https://www.dummies.com/programming/r/how-to-change-plot-options-in-r/"
[16] "http://www.sthda.com/english/wiki/add-titles-to-a-plot-in-r-software"

Extract the website names -
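The call that produces the extraction output is not shown in this article. What follows is a minimal sketch, assuming urltools is installed and that domain() is applied first to reduce each link to its host before suffix_extract() splits it. The names links, hosts and parts are illustrative only, and the two links are a small subset of Web_Links.

```r
library(urltools)

# Illustrative subset of the links used in this article
links <- c("https://www.grammarly.com/grammar-check",
           "http://ridenow.co.in/?From=Bareilly&To=Delhi&submit=")

# domain() drops the scheme, path and query string, leaving the host name
hosts <- domain(links)

# suffix_extract() splits each host into subdomain, domain and suffix
parts <- suffix_extract(hosts)

# The domain column holds the website names
parts$domain
```

Applying the same two steps to the full vector, that is suffix_extract(domain(Web_Links)), should give a table with one row per link and the columns host, subdomain, domain and suffix.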

   host                                          subdomain                      domain              suffix
1  www.grammarly.com                             www                            grammarly           com
2  sceptermarketing.com                          <NA>                           sceptermarketing    com
3  tutorialspoint.tw                             <NA>                           tutorialspoint      tw
4  www.rdocumentation.org                        www                            rdocumentation      org
5  www-islaah-in.cdn.ampproject.org              www-islaah-in.cdn              ampproject          org
6  qoitrat.org                                   <NA>                           qoitrat             org
7  theislamicinformation-com.cdn.ampproject.org  theislamicinformation-com.cdn  ampproject          org
8  parenting.firstcry.com                        parenting                      firstcry            com
9  www.amazon.in                                 www                            amazon              in
10 ridenow.co.in                                 <NA>                           ridenow             co.in
11 www.savaari.com                               www                            savaari             com
12 www.olxgroup.com                              www                            olxgroup            com
13 unbelievable-facts.com                        <NA>                           unbelievable-facts  com
14 www.tataaiginsurance.in                       www                            tataaiginsurance    in
15 www.dummies.com                               www                            dummies             com
16 www.sthda.com                                 www                            sthda               com

Updated on: 16-Oct-2020
