Planning: The Data Pipeline
This topic addresses issues such as data sources, what kind of data Predictive Routing needs in order to function, how much data you need, and how best to structure it. It also includes data size guidelines and a list of Genesys Info Mart tables that provide data for GPR.
In general, you will need to collect the types of data described in the sections below.
While Genesys Info Mart provides a very rich source for the required data, and Configuration Server can supply the Agent Profile schema data, your environment might include various other relevant data sources, such as a CRM system. Because environments differ and the exact metrics you want to target for improvement are also dependent on your environment and business needs, this topic offers only general guidelines.
Predictive Routing can use any relevant data available in your environment. Data must be combined into a consistent schema and saved as a CSV file. You can use either the Predictive Routing application or the API to import this CSV data.
Predictive Routing evaluates the schema, enables you to modify it if necessary, and synchronizes the verified dataset to the database. You can update your dataset by appending additional data.
Correctly Specifying Data Types in Your Dataset
GPR automatically determines the data types of the columns in your dataset during dataset initialization by analyzing the first 1000 rows of each column. To ensure that GPR can make a correct determination, Genesys recommends that you insert a "dummy" row at the beginning of your dataset that contains values that can be unambiguously interpreted as the expected data types for each column. This prevents cases in which the first 1000 rows may contain all NULL or 0 values, which might lead to an incorrect data type assignment (since 0 can be a valid integer, float, or Boolean value). If a column does contain meaningful values, the dummy row is analyzed along with the other values and contributes to the data type determination. Genesys recommends you use the following data type specifications in your dummy row:
- ‘a_string’ - Is recognized as a string.
- 2.1, or any integer or float value > 1 - Is recognized as a float.
- False or True - Is recognized as Boolean.
- Unix Timestamp (such as 1535538976) - Is recognized as a timestamp.
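The dummy-row technique above can be sketched as follows. This is a minimal illustration, assuming a hypothetical four-column schema; the column names and values are examples, not a required GPR layout.

```python
import csv

# Hypothetical column layout; adjust to match your own schema.
columns = ["customer_segment", "handle_time", "is_repeat_caller", "created_at"]

# Dummy first row: each value unambiguously signals the intended type
# (string, float, Boolean, Unix timestamp), per the guidelines above.
dummy_row = ["a_string", 2.1, "True", 1535538976]

rows = [
    ["gold", 312.5, "False", 1535539000],
    ["silver", 0, "True", 1535539100],  # a 0 alone would be ambiguous
]

with open("dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    writer.writerow(dummy_row)  # inserted ahead of the real data
    writer.writerows(rows)
```

Because GPR inspects the first 1000 rows, placing the dummy row first guarantees it is always included in the type analysis.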
Unsupported Characters in Agent and Customer Profiles and Datasets
Certain characters in column names are ignored, are unsupported, or cause an upload to fail, as explained in the following points:
- Columns with the following symbols in their column names are not added to Agent Profiles or Customer Profiles:
- *, !, %, ^, (, ), ', &, /, â, è, ü, ó, â, ï
- Columns with the following symbols in their column names have the symbol silently dropped; the column is added as though the symbol had not been entered:
- [Space], -, <
- Non-ASCII characters are not supported. How they are handled differs depending on what data you are uploading:
- In Agent Profiles and Customer Profiles, columns with non-ASCII characters in the column name are not added.
- In Datasets, when a column name contains a mix of ASCII and non-ASCII characters, GPR removes the non-ASCII characters from the column name as though they had not been entered and correctly uploads all column values.
- In Datasets, when a column name contains only non-ASCII characters, the column name is entirely omitted. All the column values are preserved, but you cannot modify or save the schema. In this scenario, GPR generates the following error message: An unhandled exception has occurred: KeyError('name').
Logs for Unsupported Characters
The following Agent State Connector log messages record issues with unsupported characters:
- <datetime>  ERROR <BOTTLE> schema_based.py:63 Invalid expression while parsing: <fieldname> = None
- <datetime>  ERROR <BOTTLE> agents.py:172 Fields set([u'<fieldname>']) were ignored because names were invalid.
When planning your data pipeline, consider the following questions:
- What metrics of contact center operation do you want to optimize?
- What databases are available?
- Is it possible to join the data from those databases?
- Is the data clean and consistent in format? This is especially an issue if you are joining data from various sources.
- Do you have enough data to draw conclusions about agent performance?
- What data is available from the IVR and the CRM database at runtime?
Note that when you create your CSV file, you must include a value, such as a context_id or customer ID, that your strategy can pass in the scoring request so that GPR can retrieve the relevant record.
Data Size Guidelines - Data Import, Model Training, and Feature Analysis
|Description||Column Count||Row Count/File Size|
|Size of data that can be uploaded to create a dataset in a single batch. To load a larger dataset, append additional data in multiple batches. A file with the maximum 250 columns uploads successfully as long as it contains few enough records to keep the total file size under the 250 MB limit.||250 columns maximum||250 MB file size|
|Minimum number of records needed to train a DISJOINT model for an agent.||Not Applicable||10|
|Total Cardinality limit for model training. Total Cardinality = the number of numeric columns plus the sum of the number of unique values across all string columns within a given dataset.||No specific column count; has been tested up to 250 columns.||Total Cardinality should be less than 2 to the power of 29.|
|Record count limit for GLOBAL model training.||Not Applicable; from a model-training perspective there is virtually no limit on the number of columns. The constraining issue is the possibility of compromising Global model quality by ending up with a reduced number of samples for training.||The total number of records should be less than 2 to the power of 29 (that is, 536870912) divided by the Total Cardinality as defined above.|
|Column count limitation on the Feature Analysis report, Agent Variance report, Lift Estimation report, and Model Quality report.||250||Not Applicable|
Genesys Info Mart Data
The following table lists the data required from Genesys Info Mart (GIM) and the Genesys Configuration Database.
|Data Category||Field||GIM Table or Source|
|Agent Profile||Agent username||RESOURCE_|
|Agent Profile||Employee ID||RESOURCE_|
|Customer Profile||CustomerID||Attached Data to map with interaction metadata record|
|Customer Profile||SERVICE_TYPE||Attached Data to map with interaction metadata record|
|Interaction Metadata||INTERACTION ID||Interaction_Fact (IF)|
|Interaction Metadata||INTERACTION TYPE||IRF|
|Interaction Metadata||MEDIA TYPE||Media_Type|
|Interaction Metadata||RESOURCE ROLE||TECHNICAL_DESCRIPTOR|
|Interaction Metadata||ROLE REASON||TECHNICAL_DESCRIPTOR|
|Interaction Metadata||TECHNICAL RESULT||TECHNICAL_DESCRIPTOR|
|Interaction Metadata||RESULT REASON||TECHNICAL_DESCRIPTOR|
|Interaction Metadata||MEDIA RESOURCE (VIRTUAL QUEUE)||RESOURCE_|
|Interaction Metadata||RESOURCE GROUP COMBINATION||RESOURCE_GROUP_COMBINATION|
|Interaction Metadata||ROUTING TARGET||ROUTING_TARGET|
|Interaction Metadata||SKILL EXPRESSION||REQUESTED_SKILL|
|Interaction Metadata||LAST ROUTING POINT||RESOURCE_|
|Interaction Metadata||LAST VQ|| |
|Interaction Metadata||SOURCE_ADDRESS (ANI,..)||SOURCE_ADDRESS|
|Interaction Metadata||TARGET_ADDRESS (DNIS,..)||TARGET_ADDRESS|
|Interaction Metadata||WAITING TIME||(QUEUE DURATION, ROUTING POINT DURATION, MEDIATION DURATION)|
|Interaction Metadata||RINGING TIME||RING_DURATION|
|Interaction Metadata||HANDLE TIME||(Talk_Duration, ACW Duration,..)|
|Interaction Metadata||HOLD TIME||HOLD_DURATION|
|Interaction Metadata||HOLD COUNT||HOLD_COUNT|
|Interaction Metadata||ACW COUNT||AFTER_CALL_WORK_COUNT|
|Interaction Metadata||FOCUS TIME||FOCUS_TIME|
|Interaction Metadata||CONSULTATION TIME||(CONS_RCV_TALK_DURATION, POST_CONS_XFER_TALK_COUNT,…)|