
Procedure: Schedule training using the Model Options tab

This topic describes part of the functionality of Genesys Content Analyzer.

Purpose: To specify how and when a training object will be processed to produce a model.


Prerequisites

  • A training object containing a sufficient number of e-mails or other text objects. When to Train provides suggestions about judging whether there are enough text objects.
  1. Model Name —Enter a name for the model that will result from the scheduled training. Creating a Category Tree explains restrictions on the names of Knowledge Manager objects.
  2. Training Object —Select a training object.
  3. Subject Field Treatment —Select from the following treatments of the Subject field of e-mails:
    • Ignore —Training does not consider the content of the Subject field.
    • Add to the text —Training includes the content of the Subject field along with the e-mail body.
    • Add with double weight —Training gives the content of the Subject field twice as much importance as the content of the e-mail body.
  4. Training Quality —Select a quality level. Draft is the lowest quality, 6 is the highest. Note the following:
    • Training time increases as you move from Draft quality to level 3. Above level 3, training time changes little.
    • Genesys recommends that you use Draft quality only when you want to obtain a preliminary reading of the model’s quality estimation. For production, use quality 2–6.
  5. Cross Validation —Select either no cross-validation, or cross-validation that splits the data into 3, 5, or 10 sets. See Cross-Validation for an explanation.
    • If you select cross-validation, training produces an accuracy rating for the model along with the model itself. This has the advantage of not requiring an extra testing step, but it increases the training time.
  6. Start Time —Enter a start time, or select a unit (day, month, hour, minute) and change its value using the up and down arrows. Because training can use a large proportion of system resources, consider scheduling it for nonpeak hours.
    Important: Be sure to set a time later than the present moment.
  7. Min Samples in Category —Enter the minimum number of text objects that a category must have in order to be included in training. Categories with few or no text objects make poor subjects for training.
  8. Keyword Threshold —Enter the minimum number of text objects that a keyword must occur in for that keyword to be considered in training.
    • A relatively high value for this setting can reduce training time, but it can also reduce quality. What counts as a high or low value depends on the total size of the training object. For example, if a training object has 5 to 10 text objects per category, a high keyword threshold might be 2 or 3. If a training object has 30 to 50 text objects per category, a high keyword threshold might be 20.
  9. Categories for Training —Select All Categories or Terminal Categories Only. A "terminal category" is one that contains no subcategories. A category tree may use nonterminal categories mainly or solely to organize the terminal categories. In this case few or no text objects are associated with the nonterminal categories, and there is little to be gained by including the nonterminal categories in training.
  10. Training Data Quality —Select Regular unless you know that the training object contains many wrongly categorized text objects. If it does, select Unreliable to set the categorization algorithm to run in an altered mode that gives better results with this type of data.
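
To make the effect of some of these settings more concrete, the following Python sketch imitates what Subject Field Treatment, Min Samples in Category, and Keyword Threshold might do to the training data. It is an illustration only and does not show how Genesys Content Analyzer is implemented. All function names, the option strings "ignore", "add", and "double", and the sample e-mails are hypothetical. Repeating the Subject text is just one simple way to approximate double weight in a word-count model.

```python
from collections import Counter


def build_text(subject, body, treatment):
    """Combine Subject and body according to a hypothetical treatment option."""
    if treatment == "ignore":
        return body
    if treatment == "add":
        return f"{subject} {body}"
    if treatment == "double":
        # Assumption: repeating the Subject doubles its word counts.
        return f"{subject} {subject} {body}"
    raise ValueError(f"unknown treatment: {treatment}")


def filter_categories(samples, min_samples):
    """Keep only (category, text) pairs whose category has enough text objects."""
    counts = Counter(category for category, _ in samples)
    return [(c, t) for c, t in samples if counts[c] >= min_samples]


def frequent_keywords(texts, keyword_threshold):
    """Return words that occur in at least keyword_threshold text objects."""
    doc_freq = Counter()
    for text in texts:
        # Count each word once per text object, however often it repeats.
        doc_freq.update(set(text.lower().split()))
    return {word for word, n in doc_freq.items() if n >= keyword_threshold}


# Hypothetical training object: (category, subject, body).
emails = [
    ("billing", "Invoice question", "please resend my invoice"),
    ("billing", "Invoice overdue", "my invoice is overdue"),
    ("shipping", "Where is it", "my parcel is late"),
]

samples = [(cat, build_text(subj, body, "double")) for cat, subj, body in emails]
kept = filter_categories(samples, min_samples=2)
vocabulary = frequent_keywords([text for _, text in kept], keyword_threshold=2)
```

In this sketch the "shipping" category is dropped because it has only one text object, and only words that appear in both remaining e-mails survive the keyword threshold. This shows why a threshold that is high relative to the size of the training object can remove most of the vocabulary.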

Next Steps

  • Optionally, remove superfluous or misleading text from the training object (next section).
  • Once the model is trained, test it. See Testing Models.

This page was last modified on December 17, 2013, at 11:54.
