Lucene - StandardAnalyzer

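This is the most sophisticated of the basic analyzers and is capable of handling text such as names and email addresses. It lowercases each token and removes common English stop words and punctuation.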
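To make the lowercasing and stop-word steps concrete, here is a simplified sketch in plain Java with no Lucene dependency. It is not Lucene's implementation. The class name `SimplifiedStandardPipeline`, the regex-based splitting, and the small stop list are assumptions for illustration; the real `StandardTokenizer` in Lucene 3.x follows Unicode word-break rules rather than a regex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, JDK-only approximation of the StandardAnalyzer chain:
// split into word tokens, lowercase them, then drop stop words.
final class SimplifiedStandardPipeline {

    // A small illustrative subset of English stop words (not Lucene's full set).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Any run of characters that are not letters or decimal digits is a separator,
        // so punctuation such as the trailing period never becomes a token.
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            String lower = raw.toLowerCase(Locale.ROOT); // lowercasing step
            if (!STOP_WORDS.contains(lower)) {           // stop-word removal step
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Lucene is simple yet powerful java based search library."));
    }
}
```

Running `main` prints `[lucene, simple, yet, powerful, java, based, search, library]`, the same token sequence the real analyzer produces for this sentence in the example application.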

Class Declaration

The following is the declaration of the org.apache.lucene.analysis.standard.StandardAnalyzer class:

public final class StandardAnalyzer
   extends StopwordAnalyzerBase

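Fields

The following are the fields of the org.apache.lucene.analysis.standard.StandardAnalyzer class:

static int DEFAULT_MAX_TOKEN_LENGTH - The default maximum allowed token length.
static Set<?> STOP_WORDS_SET - An unmodifiable set containing common English words that are usually not useful for searching.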
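For reference, in Lucene 3.x `STOP_WORDS_SET` is, to our understanding, the same set as `StopAnalyzer.ENGLISH_STOP_WORDS_SET`. The JDK-only sketch below copies those words into an unmodifiable `Set` to show which tokens of the sample sentence would be dropped. The class `EnglishStopWords` and its `isStopWord` method are hypothetical, and the word list should be checked against the Lucene version you use.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper mirroring the English stop words used by
// StandardAnalyzer.STOP_WORDS_SET in Lucene 3.x (verify against your version).
final class EnglishStopWords {

    static final Set<String> WORDS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by",
            "for", "if", "in", "into", "is", "it", "no", "not", "of",
            "on", "or", "such", "that", "the", "their", "then", "there",
            "these", "they", "this", "to", "was", "will", "with")));

    // The analyzer lowercases tokens before stop-word removal,
    // so this check expects an already lowercased token.
    static boolean isStopWord(String lowercasedToken) {
        return WORDS.contains(lowercasedToken);
    }

    public static void main(String[] args) {
        for (String token : new String[] {"lucene", "is", "simple", "yet"}) {
            System.out.println(token + " -> " + (isStopWord(token) ? "dropped" : "kept"));
        }
    }
}
```

This is why "is" disappears from the sample output while "yet" is kept: only the former is in the set.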

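Class Constructors

The following table shows the different class constructors:

S.No.	Constructor & Description
1	StandardAnalyzer(Version matchVersion) Builds an analyzer with the default stop words (STOP_WORDS_SET).
2	StandardAnalyzer(Version matchVersion, File stopwords) Deprecated. Use StandardAnalyzer(Version, Reader) instead.
3	StandardAnalyzer(Version matchVersion, Reader stopwords) Builds an analyzer with the stop words read from the given reader.
4	StandardAnalyzer(Version matchVersion, Set<?> stopWords) Builds an analyzer with the given stop words.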
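The `Reader`-based constructor takes the stop words as plain text. The JDK-only sketch below shows one way such a list could be parsed, assuming one word per line (the format Lucene's `WordlistLoader` reads, to our understanding). `StopWordReaderSketch` and `readStopWords` are hypothetical names, and skipping blank lines is a choice made by this sketch, not documented Lucene behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: build a stop-word set from a Reader,
// assuming a plain-text format with one word per line.
final class StopWordReaderSketch {

    static Set<String> readStopWords(Reader reader) throws IOException {
        Set<String> words = new HashSet<>();
        // try-with-resources closes the BufferedReader and the Reader it wraps.
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();   // ignore surrounding whitespace
                if (!word.isEmpty()) {       // this sketch skips blank lines
                    words.add(word);
                }
            }
        }
        return Collections.unmodifiableSet(words);
    }

    public static void main(String[] args) throws IOException {
        Set<String> stop = readStopWords(new StringReader("the\n  is \n\nyet\n"));
        System.out.println(stop.contains("is") + " " + stop.size());
    }
}
```

In Lucene itself you would either hand the `Reader` directly to `StandardAnalyzer(Version, Reader)` or pass a prepared set to `StandardAnalyzer(Version, Set<?>)`.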

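Class Methods

The following table shows the different class methods:

S.No.	Method & Description
1	protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader) Creates a new ReusableAnalyzerBase.TokenStreamComponents instance for this analyzer.
2	int getMaxTokenLength() Returns the maximum allowed token length.
3	void setMaxTokenLength(int length) Sets the maximum allowed token length.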
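According to the Lucene 3.x Javadoc, a token longer than the configured maximum is discarded rather than truncated, and a new value takes effect the next time a token stream is created. The JDK-only sketch below models just that discard rule. `MaxTokenLengthSketch` and its `filter` method are hypothetical, and the default of 255 reflects our understanding of `DEFAULT_MAX_TOKEN_LENGTH` in Lucene 3.x, so check it against your version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a maximum-token-length rule:
// tokens longer than the limit are dropped entirely, not shortened.
final class MaxTokenLengthSketch {

    // Assumed to mirror StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH in Lucene 3.x.
    static final int DEFAULT_MAX_TOKEN_LENGTH = 255;

    private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;

    int getMaxTokenLength() {
        return maxTokenLength;
    }

    void setMaxTokenLength(int length) {
        this.maxTokenLength = length;
    }

    // Keeps only tokens whose length does not exceed the current limit.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        MaxTokenLengthSketch sketch = new MaxTokenLengthSketch();
        sketch.setMaxTokenLength(6);
        System.out.println(sketch.filter(List.of("lucene", "powerful", "search")));
    }
}
```

With the limit set to 6, "powerful" (8 characters) is dropped while "lucene" and "search" (6 characters each) are kept.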

Inherited Methods

This class inherits methods from the following classes:

org.apache.lucene.analysis.StopwordAnalyzerBase
org.apache.lucene.analysis.ReusableAnalyzerBase
org.apache.lucene.analysis.Analyzer
java.lang.Object

Usage

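private void displayTokenUsingStandardAnalyzer() throws IOException {
   String text 
      = "Lucene is simple yet powerful java based search library.";
   Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
   TokenStream tokenStream 
      = analyzer.tokenStream(LuceneConstants.CONTENTS,
        new StringReader(text));
   // Register the attribute that exposes the text of each token.
   TermAttribute term = tokenStream.addAttribute(TermAttribute.class);

   tokenStream.reset(); // prepare the stream before consuming tokens
   while(tokenStream.incrementToken()) {
      System.out.print("[" + term.term() + "] ");
   }
   tokenStream.end();   // finalize end-of-stream state
   tokenStream.close(); // release resources held by the stream
}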

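Example Application

Let us create a test Lucene application to see how StandardAnalyzer breaks text into tokens.

Step	Description
1	Create a project named LuceneFirstApplication under the package com.tutorialspoint.lucene, as explained in the Lucene - First Application chapter. You can also use the project created in that chapter as-is to follow this example.
2	Create LuceneConstants.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.
3	Create LuceneTester.java as shown below.
4	Clean and build the application to make sure the business logic works as required.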

LuceneConstants.java

This class provides various constants used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

LuceneTester.java

This class demonstrates how StandardAnalyzer tokenizes a piece of text.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class LuceneTester {
	
   public static void main(String[] args) {
      LuceneTester tester;

      tester = new LuceneTester();
   
      try {
         tester.displayTokenUsingStandardAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   private void displayTokenUsingStandardAnalyzer() throws IOException {
      String text 
         = "Lucene is simple yet powerful java based search library.";
      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
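      // Register the attribute that exposes the text of each token.
      TermAttribute term = tokenStream.addAttribute(TermAttribute.class);
      tokenStream.reset(); // prepare the stream before consuming tokens
      while(tokenStream.incrementToken()) {
         System.out.print("[" + term.term() + "] ");
      }
      tokenStream.end();   // finalize end-of-stream state
      tokenStream.close(); // release resources held by the stream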
   }
}

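Running the Program

Once you have created the source, you can compile and run the program. To do this, keep the LuceneTester.java file tab active and either use the Run option in the Eclipse IDE or press Ctrl + F11 to compile and run the LuceneTester application. If the application runs successfully, it prints the following message in the Eclipse IDE console: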

[lucene] [simple] [yet] [powerful] [java] [based] [search] [library]
