
Indexing Tab

This topic describes part of the functionality of Genesys Content Analyzer.


This tab, shown in the figure "Indexing Tab", displays information on cooccurrence patterns of words in uncategorized e-mails.

Indexing Tab

The tab displays, in tree form, a list of all the words (except Stop Words) that occur in uncategorized e-mails.

The index tree consists of folder icons, each labeled with a word followed, in square brackets, by the number of e-mails that the word occurs in. These words are called head words.

Each head word folder expands to a list of the words (also shown as folders) that cooccur with the head word, that is, words that occur together with the head word in one or more e-mails. Each cooccurring word is followed by square brackets containing two numbers: the number of e-mails in which the word occurs together with the head word, and a ratio. The ratio is the word's rate of occurrence in e-mails that contain the head word, divided by its rate of occurrence in the whole corpus of uncategorized e-mails. The figure "Indexing Tab Example" provides an example.
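The ratio described above can be sketched in a few lines of Python. This is only an illustration of the definition, not Content Analyzer's actual implementation: the function name cooccurrence_ratio and the sample corpus are hypothetical, and each e-mail is assumed to be already reduced to its set of words with Stop Words removed.

```python
def cooccurrence_ratio(emails, head_word, other_word):
    """Rate of other_word among e-mails containing head_word,
    divided by the rate of other_word across all e-mails."""
    with_head = [e for e in emails if head_word in e]
    both = sum(1 for e in with_head if other_word in e)
    overall = sum(1 for e in emails if other_word in e)
    if not with_head or overall == 0:
        return 0.0
    rate_with_head = both / len(with_head)
    rate_in_corpus = overall / len(emails)
    return rate_with_head / rate_in_corpus


# Hypothetical corpus: each e-mail is represented as a set of words.
emails = [
    {"magazines", "articles"},
    {"magazines", "articles", "newsstand"},
    {"magazines"},
    {"articles"},
    {"billing"},
    {"billing", "refund"},
    {"refund"},
    {"delivery"},
]

ratio = cooccurrence_ratio(emails, "magazines", "articles")
print(round(ratio, 2))  # (2/3) / (3/8) = 1.78
```

A ratio above 1 means the word is more common in e-mails containing the head word than in the corpus as a whole.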

Indexing Tab Example

Among the information displayed in this example is the following:

  • magazines occurs in seven uncategorized e-mails.
  • articles occurs in three of those seven e-mails, which is 4.4 times as often as it occurs in the entire corpus of uncategorized e-mails.
  • Of the three e-mails containing magazines and articles, two also contain newsstand. This is 13.7 times as often as newsstand occurs in the entire corpus.

This indicates that the words articles and newsstand are highly likely to occur together, which means that e-mails containing both words are good candidates for grouping together in a category. If you select newsstand and then click Select Texts, the display switches to the Main tab, where all e-mails that contain magazines, articles, and newsstand have been put in the Candidate messages list.
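As a rough check on the example figures, the ratio definition can be inverted to estimate how common each word is in the whole corpus. The sketch below assumes the ratio is computed exactly as defined earlier and that the displayed ratios are rounded, so the results are approximate; the function name implied_corpus_rate is hypothetical.

```python
def implied_corpus_rate(cooccurring, base_total, ratio):
    """Invert ratio = (cooccurring / base_total) / corpus_rate."""
    return (cooccurring / base_total) / ratio


# articles: 3 of the 7 e-mails containing magazines, ratio 4.4
articles_rate = implied_corpus_rate(3, 7, 4.4)

# newsstand: 2 of the 3 e-mails containing magazines and articles, ratio 13.7
newsstand_rate = implied_corpus_rate(2, 3, 13.7)

print(f"{articles_rate:.3f}")   # 0.097
print(f"{newsstand_rate:.3f}")  # 0.049
```

Under these assumptions, articles occurs in roughly 10% of all uncategorized e-mails and newsstand in roughly 5%, which is why their much higher rates under magazines stand out.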

At the bottom of the tab are the following:

  • Two boxes for filtering the words that are displayed:
    • Find Words: Restrict the words displayed. More on this below.
    • Min. Texts with words: The word must occur in at least this number of e-mails to be displayed in the list.
  • Two buttons that initiate actions:
    • Rebuild Index Tree: Rebuild the tree to apply the filters that you set in the Find Words and Min. Texts with words boxes.
    • Select Texts: Select a word in the index tree, then click this button to put all e-mails containing this word in the Candidate messages list.

Use the Find Words box to restrict the words displayed. Enter a single word to display only that word and the words that cooccur with it. Enter multiple words to specify which cooccurring words the list starts with. The figure "Find Words = “mystery”" shows the result of entering mystery in the Find Words box, then clicking Rebuild Index Tree.

Find Words = “mystery”

The figure "Find Words = “mystery reading”" shows the result of entering mystery reading in the Find Words box: the index tree shows only the head word mystery and the cooccurring word reading.

Find Words = “mystery reading”
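
The effect of the two filter boxes can be illustrated with a small Python sketch. It is a simplified model, not the product's implementation: the function build_index and its parameters min_texts and find_word are hypothetical names, the minimum is assumed to apply to head words only, and only the single-word form of Find Words is modeled.

```python
def build_index(emails, min_texts=1, find_word=None):
    """Map each head word to (e-mail count, {cooccurring word: count})."""
    counts = {}
    for words in emails:
        for w in words:
            counts[w] = counts.get(w, 0) + 1

    index = {}
    for head, n in sorted(counts.items()):
        if n < min_texts:
            continue  # models the Min. Texts with words box
        if find_word is not None and head != find_word:
            continue  # models a single word in the Find Words box
        cooccurring = {}
        for words in emails:
            if head in words:
                for w in words:
                    if w != head:
                        cooccurring[w] = cooccurring.get(w, 0) + 1
        index[head] = (n, cooccurring)
    return index


# Hypothetical corpus: each e-mail is represented as a set of words.
emails = [
    {"mystery", "reading"},
    {"mystery", "novel"},
    {"reading"},
    {"cooking"},
]

index = build_index(emails, min_texts=2)
only_mystery = build_index(emails, find_word="mystery")
print(sorted(index))        # ['mystery', 'reading']
print(list(only_mystery))   # ['mystery']
```

Calling build_index again with new arguments plays the role of clicking Rebuild Index Tree after changing the filter boxes.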
This page was last modified on December 17, 2013, at 11:54.
