Configuring Analyzers for UCS Search
Basic options controlling indexing and searching are described in the
eServices 8.1 Reference Manual. This page describes one further option that controls the use of analyzers.
The analyzers that are supplied with UCS are:
- WhitespaceAnalyzer: Splits the text into tokens separated by white space characters (specifically, SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, HORIZONTAL TABULATION, LINE FEED, VERTICAL TABULATION, FORM FEED, CARRIAGE RETURN, FILE SEPARATOR, GROUP SEPARATOR, RECORD SEPARATOR, or UNIT SEPARATOR).
- StandardAnalyzer: Converts the text to lower case, splits the text into tokens separated by the white space character, and removes high-frequency English words (called stop words).
- LowerCaseAnalyzer: Converts the text to lower case and splits the text into tokens separated by the white space character.
- SimpleAnalyzer: Divides text at non-letters and converts to lower case. This works well for languages in which words are separated by spaces, such as most European languages, but is of little use for languages in which words are not separated by spaces, such as many Asian languages.
- KeywordAnalyzer: Treats the entire text as a single token. This is useful for data like zip codes, IDs, and some product names.
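The differences among these analyzers are easiest to see on a sample string. The following Python sketch only approximates the behavior described in the list above; it is not the UCS implementation, and the function names are hypothetical. The stop-word set is the one this page gives for StandardAnalyzer.

```python
import re

# Stop words listed on this page for StandardAnalyzer.
STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def whitespace_analyzer(text):
    # Split on runs of white space; case is preserved.
    return text.split()


def lower_case_analyzer(text):
    # Lower-case first, then split on white space.
    return text.lower().split()


def simple_analyzer(text):
    # Keep only runs of letters (digits and punctuation act as dividers),
    # then lower-case each token.
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def keyword_analyzer(text):
    # The entire text is a single token.
    return [text]


def standard_analyzer(text):
    # Approximation of the description above: lower-case, split on
    # white space, then drop stop words.
    return [t for t in text.lower().split() if t not in STOP_WORDS]


sample = "The Zip code is 94105-1234"
for analyzer in (whitespace_analyzer, lower_case_analyzer, simple_analyzer,
                 keyword_analyzer, standard_analyzer):
    print(analyzer.__name__, analyzer(sample))
```

In this sketch only the keyword and white-space-based variants keep 94105-1234 intact, which illustrates why KeywordAnalyzer suits fields such as zip codes and IDs.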
By default, UCS search uses the StandardAnalyzer for all fields in all tables in the database.
To override the default analyzer, use the following option.
- Optional: Yes
- Default value: StandardAnalyzer
- Valid values: See below
- Changes take effect: After restart
Sets the analyzer used for any table or field. In the option name, <table_name> is one of the following tables in the UCS database:
<any> can be anything, including zero. Use it to differentiate among multiple field-analyzer options referring to the same table.
Values for this option have the general form
<field>=<analyzer>, <field>=<analyzer>, ...
where <field> is the name of a field in the table and <analyzer> is the name of a supported and installed analyzer. For example:
- Option name: interaction-field-analyzer
- Option value: Text=GermanAnalyzer,StructuredText=StandardAnalyzer
With this option name and value, when searching the Interaction table, the search operation applies GermanAnalyzer to the Text field and StandardAnalyzer to the StructuredText field.
You can achieve the same result by creating two options:
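As an illustration of why one combined option and two separate options are equivalent, the following Python sketch parses values of the form <field>=<analyzer>, <field>=<analyzer>, ... and merges them into one field-to-analyzer mapping. It does not show how UCS itself reads its options. The function names are hypothetical, and the option names interaction-field-analyzer1 and interaction-field-analyzer2 are assumptions that use 1 and 2 as the <any> part, placed at the end of the name.

```python
def parse_field_analyzers(value):
    """Parse '<field>=<analyzer>, <field>=<analyzer>, ...' into a dict."""
    mapping = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        field, sep, analyzer = pair.partition("=")
        field, analyzer = field.strip(), analyzer.strip()
        if not sep or not field or not analyzer:
            raise ValueError(f"expected <field>=<analyzer>, got {pair!r}")
        mapping[field] = analyzer
    return mapping


def merge_options(options):
    """Combine the field-to-analyzer pairs of several options for one table."""
    merged = {}
    for name in sorted(options):
        merged.update(parse_field_analyzers(options[name]))
    return merged


# One option carrying both fields (the example from this page).
combined = {
    "interaction-field-analyzer":
        "Text=GermanAnalyzer,StructuredText=StandardAnalyzer",
}

# Two options, one field each (hypothetical option names).
separate = {
    "interaction-field-analyzer1": "Text=GermanAnalyzer",
    "interaction-field-analyzer2": "StructuredText=StandardAnalyzer",
}

print(merge_options(combined) == merge_options(separate))
```

Both forms yield the same mapping for the Interaction table: GermanAnalyzer for the Text field and StandardAnalyzer for the StructuredText field.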
Language-specific analyzers are the same as SimpleAnalyzer but also remove stop words, that is, words that are so common that there is little to be gained in searching for them or listing their occurrences.
As an example, the stop words used by StandardAnalyzer, the language-specific analyzer for English, are: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with.
The language-specific analyzers installed with UCS are:
- CJKAnalyzer (Chinese/Japanese/Korean; any language that uses Chinese characters/kanji/hanja)
- StandardAnalyzer (English)