
Flexible Data Extraction

Release 9.1 enables you to perform the following data extraction tasks:

  • Create comprehensive data selection criteria and combinations of selection criteria.
  • Schedule the data extraction.
  • Protect data by utilizing stringent security, including compression and encryption of exported data. PGP (Pretty Good Privacy) and X509 encryption are supported.
    • The name of an export file looks like export-2fd9c025-d959-436b-a83d-e357d59285d9.zip.pgp with PGP encryption, or export-2fd9c025-d959-436b-a83d-e357d59285d9.zip.smime with X509 encryption.
    • Sample command to decrypt export-2fd9c025-d959-436b-a83d-e357d59285d9.zip.pgp, where genesys is the passphrase that was used to generate the certificate:
      echo "genesys" | gpg --batch --passphrase-fd 0 --output dataexportplaintext.zip --decrypt export-2fd9c025-d959-436b-a83d-e357d59285d9.zip.pgp
    • Sample command to decrypt export-2fd9c025-d959-436b-a83d-e357d59285d9.zip.smime, where genesys is the passphrase that was used to generate the certificate:
      openssl smime -decrypt -in export-2fd9c025-d959-436b-a83d-e357d59285d9.zip.smime -inkey privkey.pem -out dataexportplaintext.zip -passin pass:genesys
  • Decrypting and uncompressing this file provides the following:
    • Interaction-2fd9c025-d959-436b-a83d-e357d59285d9.json
    • Interaction-Attachment-2fd9c025-d959-436b-a83d-e357d59285d9.json (only if export of attachments was requested)
    • Interaction-BinaryContent-2fd9c025-d959-436b-a83d-e357d59285d9.json (only if export of binary content was requested)
  • Use UCS in test mode to provide details of how many records and how much data will be exported.
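The naming convention described above can be sketched as a small helper. This is illustrative only; the helper function is hypothetical, and the export id is the sample one from this page, not a product API:

```python
# Illustrative sketch (not a product API): the files expected inside a
# decrypted export archive, following the naming pattern described above.
def expected_files(export_id, attachments=False, binary_content=False):
    files = ["Interaction-%s.json" % export_id]
    if attachments:  # present only if export of attachments was requested
        files.append("Interaction-Attachment-%s.json" % export_id)
    if binary_content:  # present only if export of binary content was requested
        files.append("Interaction-BinaryContent-%s.json" % export_id)
    return files

print(expected_files("2fd9c025-d959-436b-a83d-e357d59285d9", attachments=True))
```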

You can use these data export format options:

  • JSON
  • XML


Recommendations

Genesys recommends running data extraction and/or analytics on a Cassandra data center that mirrors the active Cassandra data center. This approach avoids potential delays, timeouts, and high CPU load that might otherwise affect the active site.

Replication can be near real-time (using native Cassandra features with Network Topology Strategy, or NTS) or triggered with a batch from a GDPS job (a distinct job run at regular intervals).

If export of attachments is requested, Genesys recommends configuring 8 GB of memory for GDPS workers (the executorMemory option in the [spark] section of the GDPS application).
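For reference, a sketch of how this setting might look among the GDPS application options; the 8g value format is an assumption borrowed from Spark's memory-string convention, so verify the exact syntax for your GDPS release:

```
[spark]
executorMemory=8g
```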

Summary

Extraction of interactions consists of:

  1. Uploading the extraction .jar file to GDPS (the same file used for migration) from the command line.
  2. Starting the extraction process in GDPS.

The extraction is applicable only to interactions and their attachments.

Required Files

To run the data extraction, the following files are needed:

  • analytic-package-<version number>.jar, available in the deploy\package directory of an installed GDPS. This is the file that is uploaded to GDPS.
  • ucs-job.bat or ucs-job.sh, available in the deploy\package directory of an installed GDPS. This file runs the data extraction command line.

Extraction Jobs

To run the data extraction, execute ucs-job.bat or ucs-job.sh with several parameters. You can consult the parameter list using either of the following:

  • ucs-job /help or ucs-job /h on the Windows platform
  • ./ucs-job.sh -help or ./ucs-job.sh -h on the Linux platform

The extraction job does the following:

  1. Exports data to a file.
  2. Zips the file.
  3. Encrypts the file.
  4. Uploads the file.

The zipping process can optionally split the file into several smaller files.
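The effect of the split option can be sketched as fixed-size chunking; this is an illustration only, not the product's actual chunk-naming or splitting implementation:

```python
# Sketch: splitting a byte stream into fixed-size chunks, as the optional
# split step does (see EXPORT_FILE_SPLIT_SIZE below). Illustrative only.
def split_bytes(data, chunk_size):
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

chunks = split_bytes(b"example-archive-bytes", 8)
assert b"".join(chunks) == b"example-archive-bytes"  # chunks rejoin losslessly
print(len(chunks))  # 21 bytes at chunk size 8 -> 3 chunks
```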

Data Extraction Parameters

Parameter Mandatory Notes Format
-p PACKAGE_NAME Yes Name of the uploaded package in GDPS. It can be any string.  
-p PATH_CURL No Directory where curl is installed. The default value is /usr/bin. If the path contains spaces, surround it with double quotes. For example: -p PATH_CURL "C:\Program Files\tools".
-p JOB_NAME Yes Name of the job to run within the package. For data extraction the value must be ExportUcs9Interactions.  
-p START_DATE_FROM No The first of the range of start dates for extracting items. Format is yyyy-MM-dd HH:mm:ss.SSS and must be surrounded with quotation marks. The first date of the range is included in the selection. Example to export only interactions created in 2017:

-p START_DATE_FROM "2017-01-01 00:00:00.000" -p START_DATE_TO "2018-01-01 00:00:00.000"

-p START_DATE_TO No The last of the range of start dates for extracting items. Format is yyyy-MM-dd HH:mm:ss.SSS and must be surrounded with quotation marks. The last date of the range is excluded from the selection. Example to export only interactions created in 2017:

-p START_DATE_FROM "2017-01-01 00:00:00.000" -p START_DATE_TO "2018-01-01 00:00:00.000"

-p EXPIRATION_DATE_FROM No The first of the range of expiration dates for extracting items. Format is yyyy-MM-dd HH:mm:ss.SSS and must be surrounded with quotation marks. The first date of the range is included in the selection. Example to export only interactions expiring in 2018:

-p EXPIRATION_DATE_FROM "2018-01-01 00:00:00.000" -p EXPIRATION_DATE_TO "2019-01-01 00:00:00.000"

-p EXPIRATION_DATE_TO No The last of the range of expiration dates for extracting items. Format is yyyy-MM-dd HH:mm:ss.SSS and must be surrounded with quotation marks. The last date of the range is excluded from the selection. Example to export only interactions expiring in 2018:

-p EXPIRATION_DATE_FROM "2018-01-01 00:00:00.000" -p EXPIRATION_DATE_TO "2019-01-01 00:00:00.000"

-p MEDIA_TYPE_ID No The MediaTypeID of interactions that should be extracted. If not specified, all interactions matching the date criteria are extracted. The selection can be reduced to interactions of a specific media type. Valid values are any media type configured in Configuration Server, such as:
  • chat
  • email
  • sms
  • voice
These should appear in a comma-separated list with quotation marks around the list. For example: "chat,email,voice"
-p ES_HOST No The Elasticsearch host, such as hostindex. This can be a comma-separated list of hostnames in the case of a cluster.
-p ES_PORT No The Elasticsearch port. The default port is 9200.  
-p ES_INDEX No Elasticsearch index name. Set the value configured in your UCS 9.1 (the name option in the [cassandra-keyspace] section).  
-p EXPORT_FILE_SPLIT_SIZE No If used, the export file is split into chunks of the specified length, expressed as: <length in bytes>  
-p EXPORT_ENCRYPTION Yes The encryption algorithm. The provided value must be pgp or X509.  
-p CASSANDRA_HOST No Host of Cassandra (default: 127.0.0.1). This can be a comma-separated list of hostnames in the case of a cluster.
-p CASSANDRA_PORT No Native port of Cassandra (default: 9042)  
-p JAR_FILE No Sets the full path to the file to be uploaded. This option uploads the .jar file to the server. The file is available in the deploy/package directory of the GDPS installation. For example, the file name can be analytic-package-<version number>.jar.
Important
Using this option stops and unloads any current activity on the server. If the path contains spaces, use double quotes.
 
-p INCLUDE_BINARY_CONTENT No Default value is false. When set to true, binary content is dumped to a distinct file.  
-p INCLUDE_ATTACHMENTS No Default value is true. When set to true, attachments are dumped to a distinct file.  
-p DATA_FORMAT No xml or json. Default value is json.  
-p GDPS_HOST Yes IP or host name of the GDPS server.  
-p GDPS_PORT Yes The TCP port for GDPS.  
-p UPLOAD_PROTOCOL Yes Valid values:
  • ftp—regular unsecured ftp
  • ftps—ftp over SSL/TLS
  • sftp—ftp over SSH
 
-p FTP_HOST Yes Host of the FTP server  
-p FTP_PORT Yes Port of the FTP server  
-p FTP_USER Yes User login to the FTP server  
-p FTP_PWD Yes Sets the password of the specified FTP user.  
-p FTP_PASSIVE_MODE No Default is false. Sets the passive mode of the FTP server.  
-p FTP_DIR No Specifies a default directory when using ftp or sftp. It is recommended to provide an absolute path rather than a relative one.  
-p FTP_AUTH_TYPE Yes for sFTP Specifies whether authentication is done by password or by certificate. Possible values:
  • password (default)
  • key

If the value key is used, the key must be provided by the customer (the sFTP server administrator). In both cases, use the value set in the FTP_PWD parameter.

 
-p CERTIFICATE Yes if CERTIFICATE_PATH not defined. Sets the user certificate for encryption (inline). PGP and X509 encryption are supported.  
-p CERTIFICATE_PATH Yes if CERTIFICATE not defined. Sets the path to the user certificate file. The path must be accessible from the GDPS server. Only PGP encryption is supported.  
-p COUNT_ONLY No Default value is false. If enabled (value = true), the job does not export the interactions but only returns an estimated number of elements and an estimated size. If COUNT_ONLY is used with value true, there is no need to provide the following options:
  • UPLOAD_PROTOCOL
  • FTP_x (any of the "FTP"-prefixed options)
  • CERTIFICATE
  • CERTIFICATE_PATH
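The date-range parameters above treat the FROM bound as inclusive and the TO bound as exclusive, so back-to-back yearly ranges neither overlap nor leave gaps. A minimal sketch of this semantics (the in_range helper is hypothetical, for illustration only):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # yyyy-MM-dd HH:mm:ss.SSS

def in_range(ts, date_from, date_to):
    # FROM is included in the selection; TO is excluded.
    t = datetime.strptime(ts, FMT)
    return datetime.strptime(date_from, FMT) <= t < datetime.strptime(date_to, FMT)

print(in_range("2017-12-31 23:59:59.999",
               "2017-01-01 00:00:00.000", "2018-01-01 00:00:00.000"))  # True
print(in_range("2018-01-01 00:00:00.000",
               "2017-01-01 00:00:00.000", "2018-01-01 00:00:00.000"))  # False
```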
 

Upload the GDPS Data Extraction Package

You must first upload the data extraction package to GDPS. This is done by using the command ucs-job add with the following parameters:

-p PACKAGE_NAME <Name_Of_Package>
-p JAR_FILE <FULL_PATH_TO_JAR_FILE>
-p PATH_CURL <Directory where curl is installed>
-p GDPS_HOST <IP OR HOSTNAME>
-p GDPS_PORT <TCP PORT>

Example:

ucs-job.sh add \
-p PACKAGE_NAME extraction \
-p PATH_CURL /usr/bin \
-p GDPS_HOST localhost \
-p GDPS_PORT 17009 \
-p JAR_FILE /home/migration/analytic-package-8.5.000.39.jar

Perform the Data Extraction

Perform the data extraction by using the command ucs-job submit with the appropriate parameters.

Example

./ucs-job.sh submit \
-p PACKAGE_NAME extraction \
-p PATH_CURL /usr/bin \
-p JOB_NAME ExportUcs9Interactions \
-p GDPS_HOST localhost \
-p GDPS_PORT 17009 \
-p CASSANDRA_HOST hostcass \
-p CASSANDRA_PORT 9042 \
-p ES_HOST hostindex \
-p ES_PORT 9200 \
-p ES_INDEX ks_ucs9 \
-p START_DATE_FROM "2017-01-01 00:00:00.000" \
-p START_DATE_TO "2018-01-01 00:00:00.000" \
-p MEDIA_TYPE_ID chat \
-p EXPORT_ENCRYPTION pgp \
-p CERTIFICATE_PATH "/home/genesys/export/certificate/certucs.pem" \
-p UPLOAD_PROTOCOL sftp \
-p FTP_HOST 9e4b781ac709.local. \
-p FTP_PORT 22 \
-p FTP_USER vp \
-p FTP_PWD genesys \
-p FTP_DIR /upload/ucs \
-p FTP_AUTH_TYPE password

Fast Count Feature

A Fast Count feature is available that enables a quick estimation to be run without impacting the Cassandra and Elasticsearch databases. To use this feature, set the following parameters when invoking the export job:

  • COUNT_ONLY = true
  • INCLUDE_ATTACHMENTS = false
  • INCLUDE_BINARY_CONTENT = false

Example of response

AttachmentsCount: 412
AttachmentsEstimatedSize: N/A
BinariesCount: 17920
BinariesEstimatedSize: 41 MB
InteractionsCount: 70274
InteractionsEstimatedSize: 274 MB

Note that AttachmentsEstimatedSize is not available with this feature.
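The response is a simple list of Name: value lines, so it can be consumed with minimal parsing. A sketch, with the response text copied from the example above (the parsing code is illustrative, not a product API):

```python
# Sketch: parsing a Fast Count response into a dict.
response = """AttachmentsCount: 412
AttachmentsEstimatedSize: N/A
BinariesCount: 17920
BinariesEstimatedSize: 41 MB
InteractionsCount: 70274
InteractionsEstimatedSize: 274 MB"""

result = {}
for line in response.splitlines():
    key, _, value = line.partition(": ")
    result[key] = value

print(result["InteractionsCount"])  # 70274 (as a string)
```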

Fast Count issues a single query to the Elasticsearch cluster to get all information. To get a precise count instead, specify the following parameters:

  • INCLUDE_ATTACHMENTS = true
  • and/or INCLUDE_BINARY_CONTENT = true

If a flag is omitted, check its default value—for example, INCLUDE_ATTACHMENTS has a default value of true.

Important
Getting a precise count (as distinct from a fast count) will crawl the Elasticsearch and Cassandra clusters for aggregating data. Be mindful of the impact on the system and avoid doing a precise count when the system is under heavy load.
This page was last edited on March 13, 2019, at 17:56.
