
Flexible Data Extraction

Release 9.1 enables you to perform the following data extraction tasks:

  • Create comprehensive data selection criteria and combinations of selection criteria.
  • Schedule the data extraction.
  • Protect data by using stringent security, including compression and encryption of exported data. Only PGP (Pretty Good Privacy) encryption is supported.
    • The name of an export file looks like this—export-2fd9c025-d959-436b-a83d-e357d59285d9.zip.pgp
    • Decrypting and decompressing this file provides the following (see the sketch after this list):
      Interaction-2fd9c025-d959-436b-a83d-e357d59285d9.json
      plus the following, but only if export of attachments was requested:
      Interaction-Attachment-2fd9c025-d959-436b-a83d-e357d59285d9.json
  • Use UCS in test mode (see the COUNT_ONLY parameter below) to find out how many records and how much data will be exported.
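
If you later need to recover the exported data manually, decryption and decompression can be done with standard tools. A minimal sketch, assuming gpg holds the matching PGP private key and the export was not split into chunks:

gpg --output export-2fd9c025-d959-436b-a83d-e357d59285d9.zip \
    --decrypt export-2fd9c025-d959-436b-a83d-e357d59285d9.zip.pgp
unzip export-2fd9c025-d959-436b-a83d-e357d59285d9.zip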

You can use these data export format options:

  • JSON
  • XML
Important
Flexible Data Extraction supports Elasticsearch 5 only. No other versions of Elasticsearch are currently supported.

Recommendations

Genesys recommends running data extraction and/or analytics on a Cassandra data center that mirrors the active Cassandra data center. This approach avoids potential delays, timeouts, high CPU load, and similar issues that might affect the active site.

Replication can be near real-time (using native Cassandra replication with NetworkTopologyStrategy, or NTS) or triggered in batches by a GDPS job (a distinct job run at regular intervals).
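
For near real-time replication, the analytics data center is typically added to the keyspace's replication settings. Here is a minimal sketch using cqlsh, assuming a keyspace named ks_ucs9, data centers named DC_active and DC_analytics, and illustrative replication factors (all of these values are assumptions; use the names and factors from your own deployment):

cqlsh hostcass -e "ALTER KEYSPACE ks_ucs9 WITH replication = \
  {'class': 'NetworkTopologyStrategy', 'DC_active': 3, 'DC_analytics': 2};"

After changing the replication settings, stream the existing data to the new data center (for example, with nodetool rebuild) before running extraction jobs against it.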

Summary

Extraction of interactions consists of:

  1. Uploading the extraction .jar file to GDPS (the same file that is used in migration) by using a command line.
  2. Starting the extraction process in GDPS.

The extraction is applicable only to interactions and their attachments.

Required Files

To run the data extraction, the following files are needed:

  • analytic-package-<version number>.jar, available in the deploy\package directory of an installed GDPS. This is the file that will be uploaded to GDPS.
  • ucs-job.bat or ucs-job.sh, available in the deploy\package directory of an installed GDPS. This file is used to run the data extraction from the command line.
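
On Linux, a quick way to confirm that both files are present (assuming the default installation layout, run from the GDPS installation directory):

ls deploy/package/analytic-package-*.jar deploy/package/ucs-job.sh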

Extraction Jobs

To run the data extraction, execute ucs-job.bat (Windows) or ucs-job.sh (Linux) with several parameters. You can consult the parameter list using either of the following:

  • ucs-job /help or ucs-job /h on the Windows platform
  • ./ucs-job.sh -help or ./ucs-job.sh -h on the Linux platform

The extraction job does the following:

  1. Exports data to a file.
  2. Zips the file.
  3. Encrypts the file.
  4. Uploads the file.

The zipping process can optionally split the file into several smaller files.
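
For example, to split the export into chunks of roughly 100 MB, you could add the following to the submit command shown later on this page (the value is expressed in bytes and is purely illustrative):

-p EXPORT_FILE_SPLIT_SIZE 104857600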

Data Extraction Parameters

The following parameters are available. Each entry indicates whether the parameter is mandatory.

-p PACKAGE_NAME (Mandatory)
Name of the uploaded package in GDPS. It can be any string.

-p PATH_CURL (Optional)
Directory where curl is installed. The default value is /usr/bin. If the path contains spaces, surround it with double quotes—for example: -p PATH_CURL "C:\Program Files\tools".

-p JOB_NAME (Mandatory)
Name of the job to run within the package. For data extraction the value must be ExportUcs9Interactions.

-p START_DATE_FROM (Optional)
The first of the range of start dates for extracting items. The format is yyyy-MM-dd HH:mm:ss.SSS and the value must be surrounded with quotation marks. The first date of the range is included in the selection. For example, to export only interactions created in 2017:

-p START_DATE_FROM "2017-01-01 00:00:00.000" -p START_DATE_TO "2018-01-01 00:00:00.000"

-p START_DATE_TO (Optional)
The last of the range of start dates for extracting items. The format is yyyy-MM-dd HH:mm:ss.SSS and the value must be surrounded with quotation marks. The last date of the range is excluded from the selection. See the example under START_DATE_FROM.

-p EXPIRATION_DATE_FROM (Optional)
The first of the range of expiration dates for extracting items. The format is yyyy-MM-dd HH:mm:ss.SSS and the value must be surrounded with quotation marks. The first date of the range is included in the selection. For example, to export only interactions expiring in 2018:

-p EXPIRATION_DATE_FROM "2018-01-01 00:00:00.000" -p EXPIRATION_DATE_TO "2019-01-01 00:00:00.000"

-p EXPIRATION_DATE_TO (Optional)
The last of the range of expiration dates for extracting items. The format is yyyy-MM-dd HH:mm:ss.SSS and the value must be surrounded with quotation marks. The last date of the range is excluded from the selection. See the example under EXPIRATION_DATE_FROM.

-p MEDIA_TYPE_ID (Optional)
The MediaTypeID of the interactions that should be extracted. If not specified, all interactions matching the date criteria are extracted. Valid values are any media types configured in Configuration Server, such as:
  • chat
  • email
  • sms
  • voice
To select several media types, use a comma-separated list surrounded by quotation marks—for example: "chat,email,voice".

-p ES_HOST (Optional)
The Elasticsearch host, such as hostindex. This can be a comma-separated list of hostnames in the case of a cluster.

-p ES_PORT (Optional)
The Elasticsearch port. The default port is 9200.

-p ES_INDEX (Optional)
The Elasticsearch index name. Set this to the value configured in your UCS 9.1 (the name option in the [cassandra-keyspace] section).

-p EXPORT_FILE_SPLIT_SIZE (Optional)
If used, the export file is split into chunks of the specified length, expressed in bytes.

-p EXPORT_ENCRYPTION (Mandatory)
The encryption algorithm. The provided value must be pgp.

-p CASSANDRA_HOST (Optional)
Host of Cassandra (default: 127.0.0.1). This can be a comma-separated list of hostnames in the case of a cluster.

-p CASSANDRA_PORT (Optional)
Native port of Cassandra (default: 9042).

-p JAR_FILE (Optional)
The full path to the file to be uploaded. This option uploads the .jar file to the server. The file is available in the deploy/package directory of the GDPS installation. For example, the file name can be analytic-package-<version number>.jar.
Important
Using this option stops and unloads any current activity on the server. If the path contains spaces, use double quotes.

-p INCLUDE_ATTACHMENTS (Optional)
Default value is true. Included attachments (value = true) are dumped to a distinct file.

-p DATA_FORMAT (Optional)
xml or json. Default value is json.

-p SPARK_PARTITIONS (Optional)
The number of Spark partitions used for processing data. Default value is 16.

-p GDPS_HOST (Mandatory)
IP address or host name of the GDPS server.

-p GDPS_PORT (Mandatory)
The TCP port for GDPS.

-p UPLOAD_PROTOCOL (Mandatory)
Valid values:
  • ftp—regular unsecured FTP
  • ftps—FTP over SSL/TLS
  • sftp—FTP over SSH

-p FTP_HOST (Mandatory)
Host of the FTP server.

-p FTP_PORT (Mandatory)
Port of the FTP server.

-p FTP_USER (Mandatory)
User login for the FTP server.

-p FTP_PWD (Mandatory)
Password of the specified FTP user.

-p FTP_PASSIVE_MODE (Optional)
Default is false. Sets the passive mode of the FTP server.

-p FTP_DIR (Optional)
Specifies a default directory when using ftp or sftp.

-p FTP_AUTH_TYPE (Mandatory for sftp)
Specifies whether authentication is done by password or by key. Possible values:
  • password (default)
  • key
If the value key is used, the key must be provided by the customer (the sFTP server administrator). In both cases, set the credential in the FTP_PWD parameter.

-p CERTIFICATE (Mandatory if CERTIFICATE_PATH is not defined)
Sets the user certificate for encryption (inline). Only PGP encryption is supported.

-p CERTIFICATE_PATH (Mandatory if CERTIFICATE is not defined)
Sets the path to the user certificate file. The path must be accessible from the GDPS server. Only PGP encryption is supported.

-p COUNT_ONLY (Optional)
Default value is false. If enabled (value = true), the job does not export the interactions but only returns an estimated number of elements and an estimated size. If COUNT_ONLY is used with value true, there is no need to provide the following options:
  • UPLOAD_PROTOCOL
  • FTP_x (any of the "FTP"-prefixed options)
  • CERTIFICATE
  • CERTIFICATE_PATH
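
For example, here is a minimal dry-run sketch that only estimates the number and size of the matching interactions. It reuses the host and port values from the examples below (adjust them to your deployment); EXPORT_ENCRYPTION is kept because that parameter is mandatory, while the upload and certificate options are omitted as COUNT_ONLY allows:

./ucs-job.sh submit \
-p PACKAGE_NAME extraction \
-p PATH_CURL /usr/bin \
-p JOB_NAME ExportUcs9Interactions \
-p GDPS_HOST localhost \
-p GDPS_PORT 17009 \
-p EXPORT_ENCRYPTION pgp \
-p START_DATE_FROM "2017-01-01 00:00:00.000" \
-p START_DATE_TO "2018-01-01 00:00:00.000" \
-p COUNT_ONLY true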

Upload the GDPS Data Extraction Package

You must first upload the data extraction package to GDPS. To do this, run the command ucs-job add with the following parameters:

-p PACKAGE_NAME <Name_Of_Package>
-p JAR_FILE <FULL_PATH_TO_JAR_FILE>
-p PATH_CURL <Directory where curl is installed>
-p GDPS_HOST <IP OR HOSTNAME>
-p GDPS_PORT <TCP PORT>

Example:

ucs-job.sh add \
-p PACKAGE_NAME extraction \
-p PATH_CURL /usr/bin \
-p GDPS_HOST localhost \
-p GDPS_PORT 17009 \
-p JAR_FILE /home/migration/analytic-package-8.5.000.39.jar

Perform the Data Extraction

Perform the data extraction by running the command ucs-job submit with the appropriate parameters.

Example:

./ucs-job.sh submit \
-p PACKAGE_NAME extraction \
-p PATH_CURL /usr/bin \
-p JOB_NAME ExportUcs9Interactions \
-p SPARK_PARTITIONS 50 \
-p GDPS_HOST localhost \
-p GDPS_PORT 17009 \
-p CASSANDRA_HOST hostcass \
-p CASSANDRA_PORT 9042 \
-p ES_HOST hostindex \
-p ES_PORT 9200 \
-p ES_INDEX ks_ucs9 \
-p START_DATE_FROM "2017-01-01 00:00:00.000" \
-p START_DATE_TO "2018-01-01 00:00:00.000" \
-p MEDIA_TYPE_ID chat \
-p EXPORT_ENCRYPTION pgp \
-p CERTIFICATE_PATH "/home/genesys/export/certificate/certucs.pem" \
-p UPLOAD_PROTOCOL sftp \
-p FTP_HOST 9e4b781ac709.local. \
-p FTP_PORT 22 \
-p FTP_USER vp \
-p FTP_PWD genesys \
-p FTP_DIR /upload/ucs \
-p FTP_AUTH_TYPE password
