How to find the number of common words between two string vectors in R?


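To find the number of common words between two strings, first split each string into a vector of words with strsplit and flatten the result with unlist, then pass the two word vectors to intersect and count the result with length. Because intersect returns each shared element only once and compares strings exactly, the count is the number of distinct words that match in both case and attached punctuation.

See the examples below to understand how this works.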
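As a minimal sketch of the approach described above, the following uses two short, made-up sentences (a and b are illustrative and not part of the examples that follow) so that the result is easy to check by eye.

```r
# Two short, made-up sentences used only for illustration
a <- "the quick brown fox"
b <- "the lazy brown dog"

# strsplit() returns a list; unlist() flattens it to a character vector
a_words <- unlist(strsplit(a, split = " "))
b_words <- unlist(strsplit(b, split = " "))

# intersect() keeps each shared word once, in the order of its first argument
common <- intersect(a_words, b_words)
common          # "the" "brown"
length(common)  # 2
```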

Example 1

The following snippet defines two strings, x1 and y1, and splits x1 into a vector of words -

x1<-"Deep Learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks."
y1<-"Deep learning is an artificial intelligence (AI) function that imitates the workings of the human brain in processing data and creating patterns for use in decision making. Deep learning is a subset of machine learning in artificial intelligence that has networks capable of learning unsupervised from data that is unstructured or unlabeled. Also known as deep neural learning or deep neural network."
x1_split <- unlist(strsplit(x1,split=" "))
x1_split

This creates the following vector -

[1] "Deep" "Learning" "is" "a" "subfield"
[6] "of" "machine" "learning" "concerned" "with"
[11] "algorithms" "inspired" "by" "the" "structure"
[16] "and" "function" "of" "the" "brain"
[21] "called" "artificial" "neural" "networks."

Next, split the second string, y1, into words in the same way by adding the following code to the above snippet -

y1_split <- unlist(strsplit(y1,split=" "))
y1_split

Output

If you run all the snippets above as a single program, it produces the following output -

[1] "Deep" "learning" "is" "an" "artificial"
[6] "intelligence" "(AI)" "function" "that" "imitates"
[11] "the" "workings" "of" "the" "human"
[16] "brain" "in" "processing" "data" "and"
[21] "creating" "patterns" "for" "use" "in"
[26] "decision" "making." "Deep" "learning" "is"
[31] "a" "subset" "of" "machine" "learning"
[36] "in" "artificial" "intelligence" "that" "has"
[41] "networks" "capable" "of" "learning" "unsupervised"
[46] "from" "data" "that" "is" "unstructured"
[51] "or" "unlabeled." "Also" "known" "as"
[56] "deep" "neural" "learning" "or" "deep"
[61] "neural" "network."

To find the number of common words between the two strings, add the following code to the above snippet -

length(intersect(x1_split,y1_split))

Output

If you run all the snippets above as a single program, it produces the following output -

[1] 12
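The count of 12 reflects exact matching: "Learning" and "learning" are treated as different words, and "networks." (with its trailing period) does not match "networks". If a case-insensitive, punctuation-free comparison is wanted, one possible sketch is shown below. The helper normalize_words and the strings s1 and s2 are hypothetical names introduced here for illustration; note that stripping [[:punct:]] also removes hyphens inside words.

```r
# Hypothetical helper: lower-case, strip punctuation, split on whitespace
normalize_words <- function(s) {
  s <- tolower(s)
  s <- gsub("[[:punct:]]", "", s)
  words <- unlist(strsplit(s, split = "\\s+"))
  words[nzchar(words)]  # drop any empty strings left by the split
}

# Short, made-up strings used only for illustration
s1 <- "Deep Learning uses neural networks."
s2 <- "Deep learning is based on a neural network. Some networks are deep."

# Exact matching, as in the example above
raw_count <- length(intersect(unlist(strsplit(s1, split = " ")),
                              unlist(strsplit(s2, split = " "))))
raw_count   # 2 : "Deep", "neural"

# Matching after normalization
norm_count <- length(intersect(normalize_words(s1), normalize_words(s2)))
norm_count  # 4 : "deep", "learning", "neural", "networks"
```

Whether to normalize depends on what should count as the same word; the examples in this article use exact matching throughout.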

Example 2

The following snippet defines two strings, x2 and y2, and splits x2 into a vector of words -

x2<-"Digital marketing is the act of promoting and selling products and services by leveraging online marketing tactics such as social media marketing, search marketing, and email marketing."
y2<-"Basically, digital marketing refers to any online marketing efforts or assets. Email marketing, pay-per-click advertising, social media marketing and even blogging are all great examples of digital marketing—they help introduce people to your company and convince them to buy."
x2_split<-unlist(strsplit(x2,split=" "))
x2_split

This creates the following vector -

[1] "Digital" "marketing" "is" "the" "act"
[6] "of" "promoting" "and" "selling" "products"
[11] "and" "services" "by" "leveraging" "online"
[16] "marketing" "tactics" "such" "as" "social"
[21] "media" "marketing," "search" "marketing," "and"
[26] "email" "marketing."

Next, split the second string, y2, into words in the same way by adding the following code to the above snippet -

y2_split<-unlist(strsplit(y2,split=" "))
y2_split

Output

If you run all the snippets above as a single program, it produces the following output -

[1] "Basically," "digital" "marketing" "refers"
[5] "to" "any" "online" "marketing"
[9] "efforts" "or" "assets." "Email"
[13] "marketing," "pay-per-click" "advertising," "social"
[17] "media" "marketing" "and" "even"
[21] "blogging" "are" "all" "great"
[25] "examples" "of" "digital" "marketing—they"
[29] "help" "introduce" "people" "to"
[33] "your" "company" "and" "convince"
[37] "them" "to" "buy."

To find the number of common words between the two strings, add the following code to the above snippet -

length(intersect(x2_split,y2_split))

Output

If you run all the snippets above as a single program, it produces the following output -

[1] 7
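Although "marketing" appears several times in both strings, it contributes only once to this count, because intersect discards duplicates. The sketch below contrasts the distinct-word count with a count of every matching occurrence, using %in% as an alternative that is not part of the original method; the short strings are made up for illustration.

```r
# Made-up strings in which "marketing" is repeated
a_words <- unlist(strsplit("email marketing and search marketing", split = " "))
b_words <- unlist(strsplit("digital marketing and email", split = " "))

# Distinct shared words: "email", "marketing", "and"
distinct_count <- length(intersect(a_words, b_words))
distinct_count    # 3

# Every word of a_words that occurs in b_words, repeats included
occurrence_count <- sum(a_words %in% b_words)
occurrence_count  # 4
```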

Example 3

The following snippet defines two strings, x3 and y3, and splits x3 into a vector of words -

x3<-"Data science is an essential part of any industry today, given the massive amounts of data that are produced. Data science is one of the most debated topics in the industries these days. Its popularity has grown over the years, and companies have started implementing data science techniques to grow their business and increase customer satisfaction. In this article, we’ll learn what data science is, and how you can become a data scientist."
y3<-"As the world entered the era of big data, the need for its storage also grew. It was the main challenge and concern for the enterprise industries until 2010. The main focus was on building a framework and solutions to store data. Now when Hadoop and other frameworks have successfully solved the problem of storage, the focus has shifted to the processing of this data. Data Science is the secret sauce here. All the ideas which you see in Hollywood sci-fi movies can actually turn into reality by Data Science. Data Science is the future of Artificial Intelligence. Therefore, it is very important to understand what is Data Science and how can it add value to your business."
x3_split<-unlist(strsplit(x3,split=" "))
x3_split

This creates the following vector -

[1] "Data" "science" "is" "an"
[5] "essential" "part" "of" "any"
[9] "industry" "today," "given" "the"
[13] "massive" "amounts" "of" "data"
[17] "that" "are" "produced." "Data"
[21] "science" "is" "one" "of"
[25] "the" "most" "debated" "topics"
[29] "in" "the" "industries" "these"
[33] "days." "Its" "popularity" "has"
[37] "grown" "over" "the" "years,"
[41] "and" "companies" "have" "started"
[45] "implementing" "data" "science" "techniques"
[49] "to" "grow" "their" "business"
[53] "and" "increase" "customer" "satisfaction."
[57] "In" "this" "article," "we’ll"
[61] "learn" "what" "data" "science"
[65] "is," "and" "how" "you"
[69] "can" "become" "a" "data"
[73] "scientist."

Next, split the second string, y3, into words in the same way by adding the following code to the above snippet -

y3_split<-unlist(strsplit(y3,split=" "))
y3_split

Output

If you run all the snippets above as a single program, it produces the following output -

[1] "As" "the" "world" "entered"
[5] "the" "era" "of" "big"
[9] "data," "the" "need" "for"
[13] "its" "storage" "also" "grew."
[17] "It" "was" "the" "main"
[21] "challenge" "and" "concern" "for"
[25] "the" "enterprise" "industries" "until"
[29] "2010." "The" "main" "focus"
[33] "was" "on" "building" "a"
[37] "framework" "and" "solutions" "to"
[41] "store" "data." "Now" "when"
[45] "Hadoop" "and" "other" "frameworks"
[49] "have" "successfully" "solved" "the"
[53] "problem" "of" "storage," "the"
[57] "focus" "has" "shifted" "to"
[61] "the" "processing" "of" "this"
[65] "data." "Data" "Science" "is"
[69] "the" "secret" "sauce" "here."
[73] "All" "the" "ideas" "which"
[77] "you" "see" "in" "Hollywood"
[81] "sci-fi" "movies" "can" "actually"
[85] "turn" "into" "reality" "by"
[89] "Data" "Science." "Data" "Science"
[93] "is" "the" "future" "of"
[97] "Artificial" "Intelligence." "Therefore," "it"
[101] "is" "very" "important" "to"
[105] "understand" "what" "is" "Data"
[109] "Science" "and" "how" "can"
[113] "it" "add" "value" "to"
[117] "your" "business."

To find the number of common words between the two strings, add the following code to the above snippet -

length(intersect(x3_split,y3_split))

Output

If you run all the snippets above as a single program, it produces the following output -

[1] 16
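Since all three examples repeat the same split, intersect and length steps, they could be wrapped in a small function. The following is a sketch only: count_common_words is a hypothetical helper name, and the vectors first and second are made-up inputs. It uses mapply to apply the helper to each pair of strings.

```r
# Hypothetical helper wrapping the steps used in the examples above
count_common_words <- function(a, b, split = " ") {
  a_words <- unlist(strsplit(a, split = split))
  b_words <- unlist(strsplit(b, split = split))
  length(intersect(a_words, b_words))
}

# Made-up inputs: the i-th element of first is compared with the i-th of second
first  <- c("red green blue", "one two three")
second <- c("blue green yellow", "four five six")

# mapply() calls the helper once per pair of strings
counts <- mapply(count_common_words, first, second, USE.NAMES = FALSE)
counts  # 2 0
```

Calling count_common_words(x1, y1) with the strings from Example 1 would perform the same steps shown there.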

Updated on: 11 November 2021
