Creating and Testing Filters
This topic describes part of the functionality of Genesys Content Analyzer.
Text Preprocessing Tab
The Text Preprocessing tab, shown in "Model Training Schedule: Text Preprocessing Tab," enables you to remove extraneous text from the text objects of a training object. From this tab, you can create filters (patterns) that search for text and perform various deletion operations. This can be helpful when the e-mails that you want to use for training contain significant amounts of text that has both of these characteristics:
- It is predictable enough in content to be identifiable by a regular expression.
- It is irrelevant or misleading for classification purposes.
To create a filter:
- Click Add filter. The New Filter dialog box appears, as shown in "New Filter Dialog Box."
- Choose a type from the Filter type drop-down list. The filter type specifies the action to take; for example, delete all text up to and including the matched text. The list of available filter types below describes each one. Filter type is called Pattern Type on the main Text Preprocessing tab.
- Enter text in the Filter body box. The filter body contains the text to match, as either a literal string or a regular expression. Filter body is called Pattern Body on the main Text Preprocessing tab.
- Click OK.
The figure "Model Training Schedule: Text Preprocessing Tab" above shows an example using two filters. The first deletes the text "IDnumber=" and anything following it. The second deletes the text "messageStart" and anything preceding it.
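The effect of these two filters can be approximated outside the product with a short Python sketch. This is only an illustration: the helper names `delete_after` and `delete_before` and the sample e-mail text are hypothetical, and the sketch assumes that a filter acts on the first match and that filters are applied in the order listed, which this topic does not state.

```python
import re

def delete_after(pattern: str, text: str) -> str:
    """Delete the first match of pattern and everything after it (assumed behavior)."""
    m = re.search(pattern, text)
    return text[:m.start()] if m else text

def delete_before(pattern: str, text: str) -> str:
    """Delete the first match of pattern and everything before it (assumed behavior)."""
    m = re.search(pattern, text)
    return text[m.end():] if m else text

# Hypothetical e-mail text containing both markers from the example.
sample = "X-Mailer: demo messageStart Please reset my password. IDnumber=12345 footer"

# First filter: remove "IDnumber=" and anything following it.
step1 = delete_after(r"IDnumber=", sample)
# Second filter: remove "messageStart" and anything preceding it.
cleaned = delete_before(r"messageStart", step1)

print(repr(cleaned))
```

Only the text between the two markers survives, which is the part most likely to be useful for training.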
The following is a list of the available filter types:
- DELETE AFTER: Search for a match to the pattern body, then delete the matching text and all text after it.
- DELETE BEFORE: Search for a match to the pattern body, then delete the matching text and all text before it.
- DELETE ALL IF FIND: Search for a match to the pattern body, then delete the entire e-mail if it includes matching text.
- DELETE ALL IF NOT FIND: Search for a match to the pattern body, then delete the entire e-mail if it does not include matching text.
- DELETE PATTERN: Search for a match to the pattern body, then delete only the text that matches the pattern.
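The five filter types can be modeled as follows. This is a minimal sketch and not the product's implementation: `FilterType`, `apply_filter`, and `preprocess` are hypothetical names, Python's `re` syntax stands in for whatever regular expression dialect Content Analyzer uses, and the sketch assumes that DELETE AFTER and DELETE BEFORE act on the first match while DELETE PATTERN removes every match. A return value of `None` represents an e-mail that is deleted entirely.

```python
import re
from enum import Enum
from typing import Iterable, Optional, Tuple

class FilterType(Enum):
    DELETE_AFTER = "DELETE AFTER"
    DELETE_BEFORE = "DELETE BEFORE"
    DELETE_ALL_IF_FIND = "DELETE ALL IF FIND"
    DELETE_ALL_IF_NOT_FIND = "DELETE ALL IF NOT FIND"
    DELETE_PATTERN = "DELETE PATTERN"

def apply_filter(ftype: FilterType, body: str, text: str) -> Optional[str]:
    """Apply one filter; return the remaining text, or None if the e-mail is dropped."""
    match = re.search(body, text)
    if ftype is FilterType.DELETE_AFTER:
        return text[:match.start()] if match else text
    if ftype is FilterType.DELETE_BEFORE:
        return text[match.end():] if match else text
    if ftype is FilterType.DELETE_ALL_IF_FIND:
        return None if match else text
    if ftype is FilterType.DELETE_ALL_IF_NOT_FIND:
        return text if match else None
    if ftype is FilterType.DELETE_PATTERN:
        return re.sub(body, "", text)  # assumption: all matches are removed
    raise ValueError(f"unknown filter type: {ftype}")

def preprocess(filters: Iterable[Tuple[FilterType, str]], text: str) -> Optional[str]:
    """Apply filters in order (an assumption); stop as soon as the e-mail is dropped."""
    for ftype, body in filters:
        result = apply_filter(ftype, body, text)
        if result is None:
            return None
        text = result
    return text
```

For example, `preprocess([(FilterType.DELETE_ALL_IF_FIND, r"Out of Office")], "Out of Office until Monday")` returns `None` in this sketch, modeling an auto-reply that is excluded from training.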