
Creating and Testing Filters

This topic describes part of the functionality of Genesys Content Analyzer.

Text Preprocessing Tab

The Text Preprocessing tab, shown in "Model Training Schedule: Text Preprocessing Tab," enables you to remove extraneous text from the text objects of a training object. From this tab, you can create filters (patterns) that search for text and perform various deletion operations. This can be helpful when the e-mails that you want to use for training contain significant amounts of text that has both of these characteristics:

  • It is predictable enough in content to be identifiable by a regular expression.
  • It is irrelevant or misleading for classification purposes.
Model Training Schedule: Text Preprocessing Tab
  1. Click Add filter. The New Filter dialog box appears, as shown in "New Filter Dialog Box."
New Filter Dialog Box
  2. Choose a type from the Filter type drop-down list. The filter type specifies the action to take; for example, delete all text up to and including the matched text. See Filter Types below for descriptions. Filter type is called Pattern Type on the main Text Preprocessing tab.
  3. Enter text in the Filter body box. The filter body contains the text to match, as either a literal string or a regular expression. Filter body is called Pattern Body on the main Text Preprocessing tab.
  4. Click OK.
      The figure "Model Training Schedule: Text Preprocessing Tab" above shows an example using two filters. The first deletes the text "IDnumber=" and anything following it. The second deletes the text "messageStart" and anything preceding it.
  5. Test the filter: enter sample text in the window in the Test Filter area.
  6. Click Test. A new window displays the result of applying all filters, in order. The figure "Filter Test Result" shows the result of the test on the text shown in "New Filter Dialog Box."
Filter Test Result
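
The effect of the two example filters can be approximated outside the product with ordinary regular-expression substitutions. The following Python sketch is illustrative only: the function name `preview_filters` and the sample text are hypothetical, Python's `re` syntax stands in for whatever regular-expression dialect Content Analyzer uses, and the sketch assumes that each filter acts on the first match and runs on the output of the previous filter.

```python
import re

def preview_filters(text: str) -> str:
    """Approximate the two example filters, applied in order."""
    # Filter 1 (DELETE AFTER, body "IDnumber="): remove the match and
    # everything that follows it. DOTALL lets "." span line breaks.
    text = re.sub(r"IDnumber=.*", "", text, flags=re.DOTALL)
    # Filter 2 (DELETE BEFORE, body "messageStart"): remove everything
    # up to and including the first match (non-greedy, one substitution).
    text = re.sub(r".*?messageStart", "", text, count=1, flags=re.DOTALL)
    return text

# Hypothetical sample text, not the text shown in the product figures.
sample = "header junk messageStart Please reset my password. IDnumber=12345 footer"
print(repr(preview_filters(sample)))
```

Because the filters run in order, swapping them can change the result whenever one pattern occurs inside the text that the other filter deletes.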


    Filter Types

    The following is a list of the available filter types:

    • DELETE AFTER: Search for a match to the pattern body, then delete the matching text and all text after it.
    • DELETE BEFORE: Search for a match to the pattern body, then delete the matching text and all text before it.
    • DELETE ALL IF FIND: Search for a match to the pattern body; if a match is found, delete the entire e-mail.
    • DELETE ALL IF NOT FIND: Search for a match to the pattern body; if no match is found, delete the entire e-mail.
    • DELETE PATTERN: Search for a match to the pattern body, then delete only the text that matches the pattern.
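
    As a rough illustration of how the five filter types differ, the following Python sketch models each one as an operation on a string. It is not Genesys code: the function `apply_filter` is hypothetical, and several behaviors are assumptions that the descriptions above do not settle, namely that the first match is used for DELETE AFTER and DELETE BEFORE, that DELETE PATTERN removes every match, that text without a match is left unchanged, and that a deleted e-mail is represented by an empty string.

```python
import re

def apply_filter(filter_type: str, pattern_body: str, text: str) -> str:
    """Return the text that remains after one filter is applied."""
    match = re.search(pattern_body, text)  # first match, or None
    if filter_type == "DELETE AFTER":
        # Keep only what precedes the match.
        return text[:match.start()] if match else text
    if filter_type == "DELETE BEFORE":
        # Keep only what follows the match.
        return text[match.end():] if match else text
    if filter_type == "DELETE ALL IF FIND":
        # A match discards the whole e-mail (modeled as "").
        return "" if match else text
    if filter_type == "DELETE ALL IF NOT FIND":
        # The absence of a match discards the whole e-mail.
        return text if match else ""
    if filter_type == "DELETE PATTERN":
        # Assumption: every occurrence is removed, not just the first.
        return re.sub(pattern_body, "", text)
    raise ValueError(f"Unknown filter type: {filter_type}")

# Hypothetical usage: strip a predictable mobile signature.
email = "Hello, my order is late. Sent from my phone"
print(repr(apply_filter("DELETE PATTERN", r"Sent from my \w+", email)))
```

    A sequence of filters can be modeled by calling this function once per filter, passing each result to the next call.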
    This page was last modified on December 17, 2013, at 11:54.
