
Planning: The Data Pipeline

This topic addresses issues such as data sources, what kind of data Predictive Routing needs in order to function, how much data you need, and how best to structure it. It also includes data size guidelines.

In general, you will need to collect the following types of data:

  • Agent
  • Customer
  • Interaction
  • Outcome

While Genesys Info Mart provides a very rich source for the required data, and Configuration Server can supply the Agent Profile schema data, your environment might include various other relevant data sources, such as a CRM system. Because environments differ and the exact metrics you want to target for improvement are also dependent on your environment and business needs, this topic offers only general guidelines.

Predictive Routing can use any relevant data available in your environment. Data must be combined into a consistent schema and saved as a CSV file. You can use either the Predictive Routing application or the API to import this CSV data.
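
As a rough illustration of combining sources into one consistent schema, the following Python sketch joins interaction records with agent and customer attributes and writes the result as CSV. All column names, the sample records, and the choice of the interaction ID as the context_id are hypothetical assumptions for illustration; they are not a schema that Predictive Routing requires.

```python
import csv
import io

# Hypothetical schema: substitute the fields available in your environment.
SCHEMA = ["context_id", "agent_id", "agent_skill",
          "customer_segment", "media_type", "outcome"]


def build_rows(interactions, agents, customers):
    """Join each interaction with its agent and customer attributes."""
    rows = []
    for rec in interactions:
        agent = agents.get(rec["agent_id"], {})
        customer = customers.get(rec["customer_id"], {})
        rows.append({
            # Assumption: the interaction ID doubles as the context_id.
            "context_id": rec["interaction_id"],
            "agent_id": rec["agent_id"],
            "agent_skill": agent.get("skill", ""),
            "customer_segment": customer.get("segment", ""),
            "media_type": rec["media_type"],
            "outcome": rec["outcome"],
        })
    return rows


def write_csv(rows, out):
    """Write rows with a fixed header so every batch has the same columns."""
    writer = csv.DictWriter(out, fieldnames=SCHEMA)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample data standing in for Info Mart, Configuration Server, and CRM.
agents = {"a1": {"skill": "billing"}}
customers = {"c1": {"segment": "gold"}}
interactions = [
    {"interaction_id": "i1", "agent_id": "a1", "customer_id": "c1",
     "media_type": "voice", "outcome": 1},
]

buffer = io.StringIO()
write_csv(build_rows(interactions, agents, customers), buffer)
csv_text = buffer.getvalue()
```

Writing every batch with the same fixed header keeps appended data consistent with the schema of the first upload.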

Important
This Guide assumes you are using the Predictive Routing application. For details about the API, see the Predictive Routing API Reference.

Predictive Routing evaluates the schema, enables you to modify it if necessary, and synchronizes the verified dataset to the database. You can update your dataset by appending additional data.

Important
Genesys testing has identified data size guidelines to keep in mind when you create datasets and plan modeling and feature analysis. See Data Size Guidelines - Data Import, Model Training, and Feature Analysis in this topic.

Considerations

  • What metrics of Contact Center operation are we going to optimize?
  • What databases are available?
  • Is it possible to join the data from those databases?
  • Is the data clean and consistent in format? This is especially an issue if you are joining data from various sources.
  • Do we have enough data to draw conclusions about agent performance?
  • What data is available from the IVR and the CRM database at runtime?
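
One way to approach the question of whether there is enough data per agent is to count records for each agent before building a dataset. The sketch below assumes rows already loaded as dictionaries with a hypothetical agent_id column. The threshold of 10 comes from the data size guidelines in this topic (minimum number of records needed to train a DISJOINT model for an agent); the function and variable names are illustrative only.

```python
from collections import Counter

# Minimum records per agent for a DISJOINT model, per the data size guidelines.
MIN_RECORDS_PER_AGENT = 10


def agents_below_minimum(rows, agent_column="agent_id",
                         minimum=MIN_RECORDS_PER_AGENT):
    """Return {agent: record_count} for agents with fewer than `minimum` rows."""
    counts = Counter(row[agent_column] for row in rows)
    return {agent: n for agent, n in counts.items() if n < minimum}


# Made-up sample: agent a1 has 12 records, agent a2 has only 3.
rows = [{"agent_id": "a1"}] * 12 + [{"agent_id": "a2"}] * 3
sparse_agents = agents_below_minimum(rows)
```

Agents reported by such a check might need more history collected before a per-agent model is meaningful.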

When you create your CSV file, include an identifying value, such as a context_id or customer ID, that your strategy can pass in its scoring request and that enables you to retrieve the relevant record.

Data Size Guidelines - Data Import, Model Training, and Feature Analysis

Description | Column Count | Row Count/File Size
Size of data that can be uploaded to create a dataset in a single batch. You can append data, so you can load bigger datasets in multiple batches. A file with 250 columns uploads successfully only if it contains few enough records to keep the total file size under the 250 MB limit. | 250 columns maximum | 250 MB file size
Minimum number of records needed to train a DISJOINT model for an agent. | Not Applicable | 10
Total Cardinality limit for model training. Total Cardinality = the number of numeric columns plus the sum of the number of unique values across all string columns within a given dataset. | No specific column count; has been tested up to 250 columns. | Total Cardinality should be less than 2 to the power of 29.
Record count limit for GLOBAL model training. | Not Applicable; from a model-training perspective there is virtually no limit on the number of columns. The constraining issue is the possibility of compromising the GLOBAL model quality by ending up with a reduced number of samples for training. | The total number of records should be less than 2 to the power of 29 (that is, 536870912) divided by Total Cardinality as defined above. See the examples following this table.
Column count limitation on the Feature Analysis report, Agent Variance report, Lift Estimation report, and Model Quality report. | 250 | Not Applicable

Examples for the GLOBAL model record count limit:

  • Example 1:
    You are required to use ALL of the data for training the GLOBAL model (note that the GLOBAL model is trained even if you select DISJOINT, so that the scoring engine can rank agents who do not yet have data). The dataset contains 1 million records. Therefore the maximum total cardinality is 536 (536870912 divided by 1 million, rounded down).
  • Example 2:
    You can undersample the data for training the GLOBAL model, that is, use fewer than the ideal number of records for training. If the total cardinality is 10,000, only 53,687 of your total of 1 million records will be used for training. The calculation to determine this is 536870912 (2 to the power of 29) divided by 10,000, which is approximately 53,687.
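
The cardinality arithmetic in these guidelines can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function names, the sample rows, and the split into numeric and string columns are assumptions. The limits are rounded down, matching the two worked examples in this topic.

```python
# 2 to the power of 29, the limit used in the data size guidelines.
CARDINALITY_LIMIT = 2 ** 29  # 536870912


def total_cardinality(rows, numeric_columns, string_columns):
    """Number of numeric columns plus the sum of unique values per string column."""
    unique_values = sum(
        len({row[column] for row in rows}) for column in string_columns
    )
    return len(numeric_columns) + unique_values


def max_training_records(cardinality):
    """Approximate record limit for GLOBAL model training at a given cardinality."""
    return CARDINALITY_LIMIT // cardinality


def max_cardinality(record_count):
    """Approximate cardinality limit when all records must be used for training."""
    return CARDINALITY_LIMIT // record_count


# Made-up sample rows with one numeric and two string columns.
rows = [
    {"handle_time": 120, "media_type": "voice", "segment": "gold"},
    {"handle_time": 95, "media_type": "chat", "segment": "gold"},
    {"handle_time": 240, "media_type": "voice", "segment": "silver"},
]
sample_cardinality = total_cardinality(
    rows, numeric_columns=["handle_time"], string_columns=["media_type", "segment"]
)  # 1 numeric column + 2 media types + 2 segments = 5

example_1 = max_cardinality(1_000_000)   # 536, as in Example 1
example_2 = max_training_records(10_000)  # 53687, as in Example 2
```

Running a check like this on a candidate dataset before upload can indicate whether high-cardinality string columns, such as free-text or ID-like fields, would force undersampling.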

Genesys Info Mart Data

The following table lists the data required from Genesys Info Mart (GIM) and the Genesys Configuration Database.

Domain | Fields | Table
Agent Profile | Agent username | RESOURCE_
Agent Profile | Employee ID | RESOURCE_
Customer Profile | CustomerID | Attached Data to map with interaction metadata record
Customer Profile | SERVICE_TYPE | Attached Data to map with interaction metadata record
Interaction Metadata | INTERACTION ID | Interaction_Fact (IF)
Interaction Metadata | INTERACTION TYPE | IRF
Interaction Metadata | MEDIA TYPE | Media_Type
Interaction Metadata | RESOURCE ROLE | TECHNICAL_DESCRIPTOR
Interaction Metadata | ROLE REASON | TECHNICAL_DESCRIPTOR
Interaction Metadata | TECHNICAL RESULT | TECHNICAL_DESCRIPTOR
Interaction Metadata | RESULT REASON | TECHNICAL_DESCRIPTOR
Interaction Metadata | MEDIA RESOURCE (VIRTUAL QUEUE) | RESOURCE_
Interaction Metadata | RESOURCE GROUP COMBINATION | RESOURCE_GROUP_COMBINATION
Interaction Metadata | ROUTING TARGET | ROUTING_TARGET
Interaction Metadata | TARGET_OBJECT_SELECTED | TARGET_OBJECT_SELECTED
Interaction Metadata | SKILL EXPRESSION | REQUESTED_SKILL
Interaction Metadata | START_TS | INTERACTION_FACT
Interaction Metadata | END_TS | INTERACTION_FACT
Interaction Metadata | LAST ROUTING POINT | RESOURCE_
Interaction Metadata | LAST VQ |
Interaction Metadata | IS_LAST_RESOURCE | IRF
Interaction Metadata | SOURCE_ADDRESS (ANI,..) | SOURCE_ADDRESS
Interaction Metadata | TARGET_ADDRESS (DNIS,..) | TARGET_ADDRESS
Interaction Metadata | WAITING TIME | (QUEUE DURATION, ROUTING POINT DURATION, MEDIATION DURATION)
Interaction Metadata | RINGING TIME | RING_DURATION
Interaction Metadata | HANDLE TIME | (Talk_Duration, ACW Duration,..)
Interaction Metadata | HOLD TIME | HOLD_DURATION
Interaction Metadata | HOLD COUNT | HOLD_COUNT
Interaction Metadata | ACW COUNT | AFTER_CALL_WORK_COUNT
Interaction Metadata | FOCUS TIME | FOCUS_TIME
Interaction Metadata | CONSULTATION TIME (CONS_RCV_TALK_DURATION, POST_CONS_XFER_TALK_COUNT,…) |

This page was last modified on 5 March 2018, at 14:02.