所有後綴的 Trie 樹
從文字中,我們可以生成所有後綴以構建樹形結構。我們知道文字中出現的每個模式都必須是文字中某個可能字尾的字首。透過構建所有後綴的 Trie 樹,我們可以線上性時間內找到任何子字串。每個字尾都以字串終止符號結尾。從每個節點出發,如果存在任何路徑,則向前移動,否則返回該模式未找到。
對於此演算法,時間複雜度為 O(m+k),其中 m 是字串的長度,k 是模式在文字中出現的頻率。
輸入和輸出
Input: Main String: “ABAAABCDBBABCDDEBCABC”. Pattern “ABC” Output: Pattern found at position: 4 Pattern found at position: 10 Pattern found at position: 18
演算法
在此演算法中,我們將使用一個特殊的節點,稱為 Trie 節點。它將儲存所有後綴的索引以及另一個 Trie 節點的地址作為連結。
createTrie(root: trieNode, text)
輸入:一個 trieNode 型別的根節點。
輸出:使用主字串構建的字尾樹
Begin for i := 0 to length of text, do substring from ith position to end as suffix, and add in index i in tire. done End
findPat(pattern, node)
輸入:要查詢的模式和節點,用於在其後綴子樹中進行檢查
輸出 − 找到模式的索引列表
Begin if pattern size is 0, then return suffIndex of node if node.suff[patten[0]] ≠φ, then return node.suff[pattern[0]].findPat(substring from 1 to end o pattern) else return φ End
searchPat(pattern)
輸入 − 將要搜尋的模式
輸出 − 文字中找到模式的索引列表
Begin define res as list. res := findPat(pattern) if res ≠φ, then patLen := length of pattern for i := 0 to end of res list, do print all indexes where pattern was found done End
示例
#include<iostream>
#include<list>
#define MAXCHAR 256
using namespace std;
class trieNode { //node to hold all suffixes
private:
trieNode *suff[MAXCHAR];
list<int> *suffIndex;
public:
trieNode() {
suffIndex = new list<int>;
for (int i = 0; i < MAXCHAR; i++)
suff[i] = NULL; //no child initially
}
void addSuffix(string suffix, int sIndex);
list<int>* searchPattern(string pat);
};
void trieNode::addSuffix(string suffix, int sIndex) {
suffIndex->push_back(sIndex); //store index initially
if (suffix.size() > 0) {
char cIndex = suffix[0];
if (suff[cIndex] == NULL) //if no sub tree present for this character
suff[cIndex] = new trieNode(); //create new node
suff[cIndex]->addSuffix(suffix.substr(1), sIndex+1); //for next suffix
}
}
list<int>* trieNode::searchPattern(string pattern) {
if (pattern.size() == 0)
return suffIndex;
if (suff[pattern[0]] != NULL)
return (suff[pattern[0]])->searchPattern(pattern.substr(1)); //follow to next node
else
return NULL; //when no node are there to jump
}
class trieSuffix { //trie for all suffixes
trieNode root;
public:
trieSuffix(string mainString) { //add suffixes and make trie
for (int i = 0; i < mainString.length(); i++)
root.addSuffix(mainString.substr(i), i);
}
void searchPat(string pattern, int *locArray, int *index);
};
void trieSuffix::searchPat(string pattern, int *locArray, int *index) {
list<int> *res = root.searchPattern(pattern);
// Check if the list of indexes is empty or not
if (res != NULL) {
list<int>::iterator it;
int patLen = pattern.length();
for (it = res->begin(); it != res->end(); it++) {
(*index)++;
locArray[(*index)] = *it - patLen;
}
}
}
int main() {
string mainString = "ABAAABCDBBABCDDEBCABC";
string pattern = "ABC";
int locArray[mainString.size()];
int index = -1;
trieSuffix trie(mainString);
trie.searchPat(pattern, locArray, &index);
for(int i = 0; i <= index; i++) {
cout << "Pattern found at position: " << locArray[i]<<endl;
}
}輸出
Pattern found at position: 4 Pattern found at position: 10 Pattern found at position: 18
廣告
資料結構
網路
關係型資料庫管理系統 (RDBMS)
作業系統
Java
iOS
HTML
CSS
Android
Python
C 程式設計
C++
C#
MongoDB
MySQL
Javascript
PHP