
accept-clients-in-backup-mode

Section: statserver
Default Value: yes
Valid Values: yes, no
Changes Take Effect: After restart
Modified: 8.5.1. New default value is yes

Specifies whether Stat Server accepts client connections when operating in backup mode.

With this option set to yes, Stat Server notifies the clients about its redundancy mode after a client's registration and after a change in mode. Moreover, when its redundancy mode is changed to backup, Stat Server does not close the communication port and accepts clients' connections and requests.
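For reference, the option belongs to the [statserver] section, so in section/option form it looks like the following sketch. The value no shown here is the setting recommended for Predictive Routing in the HA for ASC section below; the default is yes.

[statserver]
accept-clients-in-backup-mode = no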

Deploying in High Availability Environments

Both AI Core Services (AICS) and Agent State Connector (ASC) support High Availability (HA).

High availability (HA) is configured differently for each Predictive Routing component:

  • AI Core Services (AICS) uses a multi-server architecture. It can be installed at a single site, or in a multi-site architecture. Genesys recommends that you install AICS on three to five servers. More servers mean higher availability: with three servers, the system can survive the failure of only one machine; with five servers, the system can survive the failure of two machines; and so on.
Important
  • AICS is installed in Docker containers. Genesys does not ship Docker as a part of AICS. You must install Docker in your environment before you can load the AICS containers.
  • You might need an active internet connection to download additional libraries when installing Docker.
  • Agent State Connector (ASC) is deployed in warm-standby mode, with primary and backup servers.
  • The strategy subroutines run as part of your routing solution, and therefore use the HA architecture established for that solution.

HA for AICS

The HA deployment and operating information for AICS is divided into the following sections:

Installing HA AICS - Single Data Center Architecture

Important
  • The following instructions enable you to set up a new AICS HA deployment in a single data center. If you already have a single-server deployment of AICS installed, contact Genesys Customer Care for help migrating to an HA architecture.
  • If you need to remove AICS from an HA environment, contact Genesys Customer Care for assistance.

Hardware Requirements

  • AICS HA requires a cluster of at least three servers. Genesys recommends that you deploy an odd number of servers (3, 5, 7, and so on) to host the highly available AICS system.
  • Every server must meet the preconditions stated in the single-host installation instructions. These preconditions are verified during installation.
  • All servers must have networking set up between them, with the ports stated in Required Opened Ports for Firewall Configuration open.
  • All servers must have synchronized system clocks. You can use Network Time Protocol (NTP) for this.
  • On every target server, port 3031 must be reachable by the load balancer.
  • On every target server, you MUST create a separate disk partition for storing MongoDB data. Mount this partition as /datadir. The partition size depends on your expected data usage, but must be at least 50 GB. For disk partitioning, use standard Linux tools (a minimal example appears after this list). The /datadir partition MUST exist before you install GPR, and the user who is executing the GPR installation should have write access to the partition. Preliminary Step: Create a Separate Disk for the MongoDB Database explains how to check the free space in your mongodb directory.
  • Each server must have at least 50 GB of free disk space on the root partition.
Important
If you are running VMWare VXLAN, you might encounter a port conflict between VMWare VXLAN and Docker, both of which require port 4789. If you encounter this issue, Genesys recommends that you use a networking application such as Weave Net to manage networking among Docker containers. For additional information, consult the documentation for the respective products.
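The following is a minimal sketch, not part of the GPR installer, of one way to prepare the /datadir partition and verify clock synchronization on each target server. The device name /dev/sdb, the xfs file system, and the PR_USER account are assumptions; substitute the values used in your environment.

# Assumption: /dev/sdb is an unused disk reserved for MongoDB data
sudo parted -s /dev/sdb mklabel gpt mkpart primary xfs 0% 100%
sudo mkfs.xfs /dev/sdb1
sudo mkdir -p /datadir
echo '/dev/sdb1 /datadir xfs defaults 0 0' | sudo tee -a /etc/fstab
sudo mount /datadir
sudo chown PR_USER:PR_USER /datadir    # the installing user needs write access

# Confirm that the system clock is synchronized (for example, through NTP)
timedatectl status | grep -i synchronized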

Installation Procedure

Important
Some installation steps require you to know the hostnames of the target servers. You can run the hostname command on every server in the cluster to get its hostname. This document uses the terms node-1-hostname, node-2-hostname, node-3-hostname, and so on, to refer to the real hostnames of the servers. You must use the actual hostnames when executing the example commands shown in the following sections.
  1. Copy the installation binary file (*.tar.gz) to every server in the cluster. Make sure you follow recommendations about the user PR_USER and the installation location described in single-host installation.
  2. Unpack the installation binary file on every server in the cluster. To unpack, follow these steps:
    1. Copy the IP_JOP_PRR_<version_number>_ENU_linux.tar.gz installation binary file to the desired installation directory. Genesys recommends that you use the PR_USER home directory as the destination for the AICS installation package.
    2. From a command prompt, unpack the file using the following command to create the IP_JOP_PRR_<version_number>_ENU_linux directory:
      tar -xvzf IP_JOP_PRR_<version_number>_ENU_linux.tar.gz
      Note the following points:
      • All scripts for installing and operating AICS in an HA setup can be found in the IP_JOP_PRR_<version_number>_ENU_linux/ha-scripts/ directory.
    3. Create a Docker Swarm cluster.
      AICS uses Docker Swarm technology to ensure high availability of all its components. For AICS to be deployed in a highly available manner, you must properly form the Docker Swarm cluster on your target servers.
      1. On the target server with the hostname node-1-hostname, execute the following command to initiate the Docker Swarm cluster:
        docker swarm init
        Important
        If the system has multiple IP addresses, specify the --advertise-addr parameter so the correct address is chosen for communication between all nodes in the cluster. If you do not specify this parameter, an error similar to the following is generated: Error response from daemon: could not choose an IP address to advertise since this system has multiple addresses on different interfaces (10.33.181.18 on ens160 and 178.139.129.20 on ens192) - specify one with --advertise-addr.
        Example of the command to initiate the Docker Swarm cluster, specifying the address that is advertised to other members of the cluster:
        docker swarm init --advertise-addr YOUR_IP_ADDRESS
        You can also specify a network interface to advertise the interface address, as in the following example:
        docker swarm init --advertise-addr YOUR_NETWORK_INTERFACE
      2. After that, still on the node with the hostname node-1-hostname, execute the following command:
        docker swarm join-token manager
        The output of this command should look similar to the following:
        docker swarm join --token SWMTKN-1-4d6wgar0nbghws5gx6j912zf2fdawpud42njjwwkso1rf9sy9y-dsbdfid1ilds081yyy30rof1t 172.31.18.159:2377
      3. Copy this command and execute it on all other nodes in the cluster. This ensures that all other nodes join the same cluster, which coordinates the AICS deployment.
      4. Now execute the following command on the node with the hostname node-1-hostname to verify that the cluster has been properly formed and that you can continue with the installation:
        docker node ls
        The output of this command MUST show you all target servers in the cluster (node-1-hostname, node-2-hostname, ..., node-X-hostname). If you do not see a complete list of servers, do not proceed with AICS installation. The following is an example of output where all nodes joined the cluster and are all reachable:
        ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
        vdxn4uzuvaxly9i0je8g0bhps *node-1-hostname Ready Active Leader
        908bvibmyg9w87la6php11q96 node-2-hostname Ready Active Reachable
        ersak4msppm0ymgd2y7lbkgne node-3-hostname Ready Active Reachable
        shzyj970n5932h3z7pdvyvjes node-4-hostname Ready Active Reachable
        zjy3ltqsp3m5uekci7nr06tlj node-5-hostname Ready Active Reachable
    4. Label MongoDB Nodes in the Cluster
      Follow the steps below to define your MongoDB nodes:
      1. Decide how many MongoDB instances to install in your deployment. This can be only 3 or 5. The higher number means higher availability.
        Important
        Only one MongoDB instance can run per target server.
      2. On the server with the host name node-1-hostname, execute the following command to see all the nodes currently in the cluster:
        docker node ls
      3. Choose the servers where MongoDB instances will run. In a single data center deployment, it does not matter which servers you choose, as long as they have fast disks (SSD) and enough disk space.
        The examples assume you chose the servers with the host names node-1-hostname, node-2-hostname, and node-3-hostname to run MongoDB instances.
      4. Label the selected nodes appropriately. To do this, execute the following commands on node-1-hostname:
        docker node update --label-add mongo.replica=1 $(docker node ls -q -f name=node-1-hostname)
        docker node update --label-add mongo.replica=2 $(docker node ls -q -f name=node-2-hostname)
        docker node update --label-add mongo.replica=3 $(docker node ls -q -f name=node-3-hostname)
      5. For a cluster with five MongoDB instances, you would also run these two additional commands (and you would have to have at least five servers in the cluster):
        docker node update --label-add mongo.replica=4 $(docker node ls -q -f name=node-4-hostname)
        docker node update --label-add mongo.replica=5 $(docker node ls -q -f name=node-5-hostname)
    5. Label the Worker Nodes in the Cluster
      Decide how many workers you want to run and on which servers.
      • The minimum number of servers marked to run worker instances is two, but you can have more for increased scalability and high availability. This configuration is verified during AICS installation.
      • Each worker container scales independently and you can have multiple instances of the same worker type running on the same server.
      • Workers can be co-located with other containers (such as MongoDB).
      1. Execute the following commands on the node with the hostname node-1-hostname to ensure that worker instances will run on nodes node-1-hostname, node-2-hostname, and node-3-hostname:
        docker node update --label-add worker=true $(docker node ls -q -f name=node-1-hostname)
        docker node update --label-add worker=true $(docker node ls -q -f name=node-2-hostname)
        docker node update --label-add worker=true $(docker node ls -q -f name=node-3-hostname)
        You can choose to label more nodes and make them available to run worker instances. You cannot label fewer than two nodes with worker=true.
    6. Label the Minio Nodes in the Cluster
      You should always label at least one node to run the Minio container, which is used for faster dataset uploads. Genesys recommends that you label two nodes to run Minio.
      1. Execute the following commands on the node with the hostname node-1-hostname to ensure that a Minio instance will run on one of the nodes node-1-hostname or node-2-hostname:
        docker node update --label-add minio=true $(docker node ls -q -f name=node-1-hostname)
        docker node update --label-add minio=true $(docker node ls -q -f name=node-2-hostname)
        • There is always only one Minio instance running and it only runs on one of the properly-labeled nodes.
      2. Find the node on which the Minio container is running by executing the following command:
        docker service ps minio_server_minio --format {{.Node}}
      3. Find the public IP address of the Minio node and make sure that the S3_ENDPOINT configuration parameter in IP_JOP_PRR_<version_number>_ENU_linux/conf/tango.env is configured in the following way:
        S3_ENDPOINT=https://PUBLIC_IP_OF_NODE_WHERE_MINIO_CONTAINER_RUNS:9000
        • The Minio container can be co-located with other containers (such as MongoDB or workers).
    7. Note the Tango Instances
      There is automatically one Tango instance running on every node (server) in the cluster. As you expand the cluster, new Tango instances are installed and started on the newly-created nodes.
    8. Install AICS in HA Mode
      Your Docker Swarm cluster is now ready for AICS installation.
      1. To make the Docker images needed by AICS available on every server in the cluster, execute the following command on every server in the cluster:
        bash ha-scripts/install.sh
        If you are managing your MongoDB deployment externally, run the install.sh script with the -externalMongo flag, as follows:
        bash ha-scripts/install.sh -externalMongo
      2. To initialize the HA AICS deployment and start the application, execute the following command. This command also sets the password for your default user, super_user@genesys.com. Replace the variable <'my_password'> in the command below with a strong password, and record it securely for future reference.
        cd ha-scripts; bash start.sh -l -p <'my_password'>
    9. Access AICS in HA Mode
      Once your fully-installed AICS deployment has started up correctly, you can access AICS by using the IP address of any server in the cluster on port 3031 as: https://<IP_ADDRESS>:3031
      Important
      Genesys recommends that you install a load balancer in front of the cluster to make it easier to access AICS. See Load Balancing for HA AICS for details.

Installing HA AICS - Multiple Data Center Architecture

Important
The following instructions enable you to set up a new AICS HA deployment in a multiple data center environment. If you already have a single-server deployment of AICS installed, contact Genesys Customer Care for help migrating to an HA architecture.

The basic procedure for installing AICS in multiple data centers is the same as installing AICS in a single data center. However, when deploying AICS in an environment with multiple data centers, there are some considerations and requirements in addition to those for a single data center.

  • Before starting, ensure that you have a fast LAN/WAN that connects all of the servers and that all ports are open.
  • Plan to spread all instances of the AICS components (Workers, MongoDB, Tango, Minio) across your data centers to ensure that AICS continues to operate correctly if a single data center fails. This is most important for servers running MongoDB.

Special Considerations for MongoDB Instances

  • Spread labels across the data centers when labeling servers to run MongoDB replica set members.
    Important
    The AICS installation procedure does not validate whether MongoDB instances are spread across data centers. Failing to ensure this even distribution can compromise overall availability of the AICS deployment.
  • Every data center should have similar hardware capacity (RAM, CPU, disk).
  • When using three data centers, no single data center should run a majority of the MongoDB servers.

Using Only Two Data Centers

You can use only two data centers when installing AICS in HA mode, but this reduces overall availability of AICS. In this scenario, one data center always has the majority of the MongoDB servers running in it. If that data center fails, the second data center goes into read-only mode. You must then execute a manual recovery action, using the following procedure:

Execute Manual Recovery

To recover if your system enters read-only mode:

  1. Find the current status of the MongoDB cluster by entering the following command:
    docker exec -it $(docker ps -qf label=com.docker.swarm.service.name=mongo_mongo3) mongo --ssl --sslCAFile /etc/ssl/mongodb.pem --sslAllowInvalidHostnames --eval "for (i=0; i<rs.status().members.length; i++) { member = rs.status().members[i]; print(member.name + \" : \" + member.stateStr) }"
    For example, you might enter:
    [pm@hostname ha-scripts]$ docker exec -it $(docker ps -qf label=com.docker.swarm.service.name=mongo_mongo3) mongo --ssl --sslCAFile /etc/ssl/mongodb.pem --sslAllowInvalidHostnames --eval "for (i=0; i<rs.status().members.length; i++) { member = rs.status().members[i]; print(member.name + \" : \" + member.stateStr) }"
    And receive back the following:
    MongoDB shell version: 3.2.18
    connecting to: test
    mongo_mongo1:27017 : SECONDARY
    mongo_mongo2:27017 : SECONDARY
    mongo_mongo3:27017 : PRIMARY
    [pm@node-3 ha-scripts]$
    The primary MongoDB node is mongo_mongo3. The following command shows the number of members in the MongoDB cluster:
    rs.status().members.length;
  2. Remove any unreachable MongoDB members. If necessary, change the service-name label in the command so that it targets the primary node; in this example, the primary is selected with the following label:
    com.docker.swarm.service.name=mongo_mongo3
  3. Run the following command on the primary MongoDB node to recover the MongoDB cluster:
    docker exec -it $(docker ps -qf label=com.docker.swarm.service.name=mongo_mongo3) mongo --ssl --sslCAFile /etc/ssl/mongodb.pem --sslAllowInvalidHostnames --eval "members = rs.status().members; cfgmembers = rs.conf().members; for (i=members.length; i>0; i--) { j = i - 1; if (members[j].health == 0) { cfgmembers.splice(j,1) } }; cfg = rs.conf(); cfg.members = cfgmembers; printjson(rs.reconfig(cfg, {force: 1}))"
    For example, you might enter:
    [pm@hostname ha-scripts]$ docker exec -it $(docker ps -qf label=com.docker.swarm.service.name=mongo_mongo3) mongo --ssl --sslCAFile /etc/ssl/mongodb.pem --sslAllowInvalidHostnames --eval "members = rs.status().members; cfgmembers = rs.conf().members; for (i=members.length; i>0; i--) { j = i - 1; if (members[j].health == 0) { cfgmembers.splice(j,1) } }; cfg = rs.conf(); cfg.members = cfgmembers; printjson(rs.reconfig(cfg, {force: 1}))"
    And receive back the following:
    MongoDB shell version: 3.2.18
    connecting to: test
    { "ok" : 1 }
    [pm@node-3 ha-scripts]$

The minority members in the reachable data center can now form a quorum, which returns the running data center to read-write mode.

For other useful commands, including commands for checking node status and removing non-functional nodes, see Troubleshooting Your HA AICS Deployment, below.

Set Values for Environment Variables

This section lists environment variables that should be configured for optimal GPR performance, and the recommended values. Adjust these values as necessary, based on your specific environment.

Warning
The tango.env file, which contains the environment variables, is overwritten when you perform a software upgrade. Before upgrading, save a copy of the tango.env file and refer to it to reset your variables. Note that if you simply overwrite the new tango.env file with your existing one, any environment variables added in the new release are removed.

Environment variables are defined in the IP_JOP_PRR_<version_number>_ENU_linux/conf/tango.env file. The same file is used for both single node and HA deployments.

To add a new variable:

  1. Create a new line in the tango.env file.
  2. Add the variable and its value, using the following format:
    <NEW_ENV_VAR>=value
Important
  • Do not use quotes for string parameters.
  • Remove trailing spaces.

Changes take effect on restart of the tango container (run the bash scripts/restart.sh command). In an HA environment, with multiple instances of the containers running, the restart is performed sequentially (a rolling restart), so that there is no downtime of the GPR application.

Configurable Environment Variables

  • ADD_CARDINALITIES_EVERY_N_RECORDS - Specifies how many appended records are added to an Agent or Customer Profile before GPR recalculates cardinalities. When you append data to an Agent or Customer Profile via the API, cardinalities are computed only for the appended data portion, and only when the number of agents or customers set in the ADD_CARDINALITIES_EVERY_N_RECORDS parameter is reached. The results of the computation are added to the already-stored cardinality values. This significantly improves speed when loading new data, because it avoids recomputing cardinalities on the full data collection when there are multiple frequent appends done in small batches. The default value is 1000.
    • Notes:
      • This functionality is available only when you use the Predictive Routing API. If you append using the Predictive Routing application interface, all cardinalities are recalculated.
      • Full automatic computation happens only once, when an Agent or Customer Profile is uploaded the first time for schema discovery.
      • You can force recomputation of cardinalities on the full Agent or Customer Profiles collection using the POST compute_cardinalities API endpoint. For details, see the Predictive Routing API Reference. (This file requires a password to open it. Contact your Genesys representative if you need access.)
  • HOST_DOMAIN - Use this variable to specify the public IP address or host name used for your deployment. The value should be one of the following, depending on your environment type:
    • For single-server deployments, specify the public IP address or the host name of the host where GPR is deployed.
    • For high availability (HA) deployments, specify the IP address of your load balancer.
  • LOG_LEVEL
    • INFO - Informational messages that highlight the progress of the application: LOG_LEVEL=INFO. This setting is recommended for production deployments.
    • DEBUG - Fine-grained informational events that are most useful to debug the application: LOG_LEVEL=DEBUG. This setting should be used only for short periods of time because it can fill the disk.
  • LOGIN_MESSAGES - Enables you to have the Predictive Routing application display a custom message on the login screen.
    • When you enter this message, make sure that all special characters are properly escaped. Special characters are ones not part of the standard English alphabet, such as symbols, letters with umlauts, cedillas, and other such marks, and letters from other alphabets, such as the Greek or Cyrillic alphabets.
    • To simplify the task of converting characters, Genesys recommends an online conversion tool, such as https://www.freeformatter.com/html-escape.html.
    • For example, make the following substitutions:
      • & becomes &amp;
      • < becomes &lt;
      • > becomes &gt;
      • " becomes &quot;
      • ' becomes &#39;
  • OMP_NUM_THREADS (required for releases prior to 9.0.011.00; in releases 9.0.011.00 and higher, this parameter is set automatically)
    • Genesys recommends that you set the value to OMP_NUM_THREADS=1 for the best performance.
    • If you do not specify a value, GPR spawns one thread for each core it detects in your environment. The system assumes it can use all available cores for tasks such as analysis and model training, leaving no CPU resources for other processes running on the same machine, such as reading/writing to the database. The result is an overall slowdown of the application. Set this variable to allow the operating system to properly distribute CPU threads among the various running processes.
  • S3_ENDPOINT - To achieve the fastest Dataset uploads offered by GPR, configure this variable to point to the Minio container, which was introduced in AICS release 9.0.013.01. The default value does not provide the fastest possible upload speed. The value set for this variable must be the public IP address or domain name of the server where the Minio container is running, followed, optionally, by the port number allocated for the container.
    • The S3_ENDPOINT value must specify the protocol (http or https) to be used, which must match the protocol used for accessing the GPR APIs.
    In single host deployments, use the public IP address or domain name of the server where GPR is installed. In HA environments, locate the server on which the Minio container is running and use the public IP address or the domain name of that server. For example:
    • For an IP address - S3_ENDPOINT=https://<public_ip_address>:9000
    • For a domain name - S3_ENDPOINT=https://<your_domain_name>:9000
  • (Optional) GUNICORN_TIMEOUT
    • Adjust the timeout if you need to accommodate a large dataset. The current default value is 600 seconds.
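Putting these settings together, a tango.env excerpt might look like the following sketch. The IP addresses are placeholders (for HA deployments, HOST_DOMAIN is your load balancer address and S3_ENDPOINT points to the Minio node), OMP_NUM_THREADS is needed only in releases prior to 9.0.011.00, and you should include only the variables you actually need to change. Remember: no quotes and no trailing spaces.

ADD_CARDINALITIES_EVERY_N_RECORDS=1000
HOST_DOMAIN=203.0.113.10
LOG_LEVEL=INFO
OMP_NUM_THREADS=1
S3_ENDPOINT=https://203.0.113.25:9000
GUNICORN_TIMEOUT=600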

Load Balancing for HA AICS

Once AICS has been installed and started, you can access it using the IP address of any node in the cluster on port 3031. To enable load balancing:

  1. Your load balancer should have its health-check functionality turned on.
  2. The load balancer should check for HTTP code 200 to be returned on https://IP:3031/login (a manual check using curl is shown below).
Important
  • Genesys recommends a third-party highly available load balancer, such as F5, to ensure that all requests to the AICS platform are spread evenly across all nodes in the AICS cluster.
  • If you need SSL, set it up on the third-party load balancer.
  • If you are using a domain name instead of a numeric IP address, configure the S3_ENDPOINT environment variable in the tango.env file as follows: S3_ENDPOINT=https://<your_domain_name>:9000
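To verify the health-check target manually from any host that can reach the cluster, a quick curl check along the following lines should print 200. The address is a placeholder, and the -k flag (which skips certificate validation) is appropriate only for testing:

curl -k -s -o /dev/null -w "%{http_code}\n" https://<node_or_load_balancer_IP>:3031/login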

Using the NGINX Load Balancer

Important
The NGINX container was removed from AICS in release 9.0.013.01.

In releases through 9.0.012.01, Genesys shipped the NGINX load balancer as part of AICS. It is intended for use only in prototype scenarios.

Important
The NGINX load balancer is a single point of failure and should not be used in production deployments.

To use NGINX, follow the procedure below:

  1. Edit the ha-scripts/nginx/nginx.conf file by putting the IP addresses of all nodes in your cluster into the upstream tango section, using syntax such as IP1:3031, IP2:3031, IP3:3031. For example, your configuration might look similar to the following:
    upstream tango {  
        server 18.220.11.120:3031;  
        server 18.216.235.201:3031;  
        server 13.59.93.192:3031;  
        }
  2. Execute the following command to start the NGINX container:
    bash ha-scripts/nginx/start.sh
  3. Verify that you can access AICS by pointing your browser to the IP address where NGINX is running.

To stop NGINX, run the following command:

bash ha-scripts/nginx/stop.sh

To fix a 413 (Request Entity Too Large) NGINX error, follow these steps:

  1. Open the nginx.conf file.
  2. Increase the value for the client_max_body_size parameter to 3g (see the snippet after these steps).
  3. Restart NGINX using the command:
docker restart nginx
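For reference, the directive sits at the http (or server) level of nginx.conf. The following is an abbreviated sketch; only the client_max_body_size line is new, and the existing upstream tango and server blocks remain unchanged:

http {
    client_max_body_size 3g;
    # existing upstream tango and server blocks go here, unchanged
}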

Clean Up Disk Space

Starting in release 9.0.013.00, GPR performs automatic cleanup processes which should maintain an adequate amount of free disk space. However, if you are running an earlier version of AICS, or are running 9.0.013.00 or higher and continue to encounter disk space problems, refer to the instructions in this section.

You might encounter performance issues if you do not clean up Docker data that is no longer required. The Docker prune command enables you to clean up your Docker environment. The Docker user documentation provides a detailed discussion of the prune command and how to use it to clean up images, containers, and volumes; see Prune unused Docker objects.

Important
The clean-up process does not affect normal GPR operation. It does not require downtime, there is no need to restart any component, and performance is unaffected.

Clean-Up Procedure

Genesys recommends that you use the following commands to remove unnecessary Docker data:

docker container prune -f
docker volume prune -f
docker network prune -f 

To schedule regular cleanup jobs, use the crontab functionality to execute the appropriate command on every server where GPR is installed. The following example schedules the cleanup job for every Saturday at 1:00 am:

echo "0 1 * * Sat (docker container prune -f; docker volume prune -f; docker network prune -f)" ) | crontab -

In an HA environment, Genesys recommends that you perform the cleanup on each node in turn.

If you need to configure your logging settings to avoid unacceptably large log files, see the related logging configuration information.

Installing into an Existing HA AICS Deployment

Important
There is no downtime during this process and no data is lost. Executing this script only upgrades the services and does not stop or upgrade MongoDB.
Important
Review the Upgrade Notes section of the Release Notes for all releases later than your starting release, including your target release. Follow any procedures specified for the interim releases, such as running scripts. If there is no Upgrade Notes section, or the section is empty, no additional steps are required for the associated release. The following AICS releases do require special upgrade procedures:

To perform the upgrade:

  1. Copy the new AICS release package (the *.tar.gz file) to all servers in the cluster. Use the same user and procedure as if you are installing AICS for the first time. All the recommendations about the user who performs the installation and operates AICS still apply.
  2. After unpacking the new version of AICS in the PR_USER home directory on all target servers, you will have multiple subdirectories named IP_JOP_PRR_<version_number>_ENU_linux. For example, you might have two subdirectories:
    • IP_JOP_PRR_<old_version_number>_ENU_linux
    • IP_JOP_PRR_<new_version_number>_ENU_linux
  3. Assuming you are installing the new version of the application and removing the old version, execute the following command in the IP_JOP_PRR_<new_version_number>_ENU_linux directory on all target servers:
    bash ha-scripts/install.sh
  4. Then, on any one of the servers, execute the following command in the IP_JOP_PRR_<new_version_number>_ENU_linux directory:
    bash ha-scripts/upgrade_gpr_services.sh

This command executes the upgrade of Tango (AICS) on all nodes in the cluster, one by one, and rolls back the change if there is a problem. There is no downtime during this upgrade, and no data loss.

Troubleshooting an AICS HA Deployment

The following sections offer information that can help you identify issues within your deployment.

Handling Server Failure

If a server (node) restarts, the HA deployment recovers automatically as long as the server keeps its previous IP address and the data on the disk is not corrupted.

The following command identifies a non-working node as an unreachable node:

docker node ls

If a server needs to be decommissioned and replaced with a new one, the following manual steps are necessary to preserve the health of the cluster. After shutting down the server that is to be decommissioned, execute the following two commands, where NODE_ID is the unique node identifier of the server to be decommissioned:

docker node demote <NODE_ID>
docker node rm <NODE_ID>

After this, you can add a new server to your environment. Label it the same way as the decommissioned server and execute the procedure for joining that server to the cluster as described in Installation Procedure, above (a condensed example follows).
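Condensed into commands, the sequence for the replacement server might look like the following sketch. The hostnames and labels are placeholders; re-apply whichever labels (mongo.replica, worker, minio) the decommissioned server carried:

# On node-1-hostname: print the join command for the new server
docker swarm join-token manager

# On the new server: run the docker swarm join command printed by the previous step

# Back on node-1-hostname: re-apply the labels that the old server carried, for example
docker node update --label-add worker=true $(docker node ls -q -f name=<new-server-hostname>)
docker node update --label-add mongo.replica=2 $(docker node ls -q -f name=<new-server-hostname>)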

Handling Failover

When a server hosting MongoDB and the AICS application (the Tango container) experiences a failover, a certain number of API requests to AICS might fail during the few seconds it takes for the system to recover. The routing strategy attempts to resend any failed request, but Agent State Connector (ASC) does not have this capability. As a result, there is a risk of a small data loss.

Note that error messages appear in the logs for both MongoDB and the AICS application when a failover occurs.

Health Checks for Your Deployment

To check the health of your Predictive Routing HA deployment, perform the following steps:

  1. Verify that all nodes are up and running. On any node in the cluster, execute the following command:
    docker node ls
    You should receive output similar to the following:
    [pm@hostname ~]$ docker node ls
    ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
    mc0bgyueb3c0h9drsy3j0i2ty node-1-hostname Ready Active Leader
    vm1csljly66vwguxzaz8ly98r *node-2-hostname Ready Active Reachable
    z2vlnldcyh0y57jwns0bz9jxe node-3-hostname Ready Active Reachable
    All nodes should be reachable.
  2. Check that all services are running by executing the following command on any node in the cluster:
    docker service ls
    You should receive output similar to the following:
    [pm@hostname ~]$ docker service ls
    ID NAME MODE REPLICAS IMAGE PORTS
    jzjitn8lp78t mongo_mongo1 replicated 1/1 mongo:3.2
    iqntp5eabfnw mongo_mongo2 replicated 1/1 mongo:3.2
    whw05twosi9s mongo_mongo3 replicated 1/1 mongo:3.2
    1jp3sgt16czw tango_tango global 3/3 jop_tango:2017_12_12_15_17
    hu3kvkzxn88r workers_workers replicated 2/2 jop_tango:2017_12_12_15_17
    • The important column here is REPLICAS.
    • The Tango service should always be global and reachable on port 3031 on every node in the cluster.
    • The MongoDB service is replicated, and should show 3/3 or 5/5 replicas (or however many are actually present in your environment). See Checking the Health of MongoDB (below) for how to check health of MongoDB database.
    • The Workers service is replicated and should show as many replicas as there are nodes labeled with the Workers label. See Label the Worker Nodes in the Cluster (above) for how to label nodes.

Checking the Health of MongoDB

All the commands listed below should show your MongoDB cluster with one PRIMARY instance and all other instances should be healthy SECONDARY instances.

  • To check the health of the MongoDB cluster while logged into the node with hostname node-1-hostname, execute the following command on node-1-hostname:
    [pm@node-1-hostname ~]$ docker exec -it $(docker ps -qf label=com.docker.swarm.service.name=mongo_mongo1) mongo --ssl --sslCAFile /etc/ssl/mongodb.pem --sslAllowInvalidHostnames --eval 'rs.status()'
  • To check the health of the MongoDB cluster while logged into the node with hostname node-2-hostname, execute the following command on node-2-hostname:
    [pm@node-2-hostname ~]$ docker exec -it $(docker ps -qf label=com.docker.swarm.service.name=mongo_mongo2) mongo --ssl --sslCAFile /etc/ssl/mongodb.pem --sslAllowInvalidHostnames --eval 'rs.status()'
  • To check the health of the MongoDB cluster while logged into the node with hostname node-3-hostname, execute the following command on node-3-hostname:
    [pm@node-3-hostname ~]$ docker exec -it $(docker ps -qf label=com.docker.swarm.service.name=mongo_mongo3) mongo --ssl --sslCAFile /etc/ssl/mongodb.pem --sslAllowInvalidHostnames --eval 'rs.status()'

Similarly, you can check the health of the MongoDB cluster from any other node where a MongoDB replica is running.

Other Useful Commands

Here are a few more useful commands for troubleshooting MongoDB:

To find out the status of all members in the replica set, use the following command:

docker exec -it $(docker ps -qf label=com.docker.swarm.service.name=mongo_mongo3) mongo --ssl --sslCAFile /etc/ssl/mongodb.pem --sslAllowInvalidHostnames --eval "rs.status().members"

To remove an unreachable member, execute the following command (this has to be repeated for each unreachable member in a failed data center):

docker exec -it $(docker ps -qf label=com.docker.swarm.service.name=mongo_mongo3) mongo --ssl --sslCAFile /etc/ssl/mongodb.pem --sslAllowInvalidHostnames --eval 'rs.remove("HOST:PORT")'

(Optional) Backing Up Your Data

This section applies specifically to backing up and restoring in an HA environment. For instructions to back up and restore MongoDB in a single-site/single-server AICS deployment, see Backing Up and Restoring Your Data.

Although HA greatly reduces the likelihood of data loss, Genesys recommends that you back up your data to safeguard it. This section explains how to back up and restore your data in an HA environment.

Important
All MongoDB backup and restore operations should be performed on the PRIMARY MongoDB instance.

Backing Up

On every server where MongoDB is running, there is one important directory:

  • The /data/db directory in every MongoDB container is mapped to the /datadir directory on the server file system.

To back up your MongoDB data, run the following mongodump command from inside the container:

mongodump --out /data/db/`date +"%m-%d-%Y"`

This command backs up all databases to the /data/db/<date +"%m-%d-%Y"> directory located in the container. For example, you might back up to the /data/db/12-18-2017 directory.

The backed-up data is located in the /datadir/<date +"%m-%d-%Y"> directory on the server host computer. For the example backup command above, the output would be located in the /datadir/12-18-2017 directory.
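If you prefer not to open a shell in the container first, you can run the same backup from the host using the docker exec pattern shown elsewhere on this page. This is a sketch; the mongo_mongo1 service label is an example, so run it against the service hosting the PRIMARY instance. The same pattern applies to mongorestore in the Restoring section below.

docker exec -it $(docker ps -qf label=com.docker.swarm.service.name=mongo_mongo1) mongodump --ssl --sslCAFile /etc/ssl/mongodb.pem --sslAllowInvalidHostnames --out /data/db/`date +"%m-%d-%Y"`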

Restoring

To restore data, you must first make the data files available in the appropriate directory on the server host computer.

Use the following command inside the container:

mongorestore /data/db/<PATH_TO_SPECIFIC_BACKUP_DIRECTORY>

For example, you might run the command:

mongorestore /data/db/12-18-2017

For extra information about backing up MongoDB and data preservation strategies, see the following topic on the MongoDB site: https://docs.mongodb.com/manual/core/backups/.

(Optional) Map a Local Volume into a Container

Local directories or files can be mapped into any of the containers used by the application in an HA deployment: tango, workers, or mongo.

Tip
An HA deployment in Production mode should not use NGINX.

To mount a local directory or file into a container, edit the volumes declaration in the file corresponding to the desired container:

  • tango: IP_JOP_PRR_<version_number>_ENU_linux/ha-scripts/swarm/tango-swarm.yml
  • mongo: IP_JOP_PRR_<version_number>_ENU_linux/ha-scripts/swarm/mongo-swarm5.yml or IP_JOP_PRR_<version_number>_ENU_linux/ha-scripts/swarm/mongo-swarm.yml
  • workers: IP_JOP_PRR_<version_number>_ENU_linux/ha-scripts/swarm/worker-swarm.yml

Important
Mapping a directory or file on a node makes it available only on that host. It does not create or imply any type of file replication.

To mount a local directory, follow the format presented in the following example:

  • To mount /some_local_directory into /custom_mount_point in the mongo container on node-1, edit the IP_JOP_PRR_<version_number>_ENU_linux/ha-scripts/swarm/mongo-swarm.yml file as follows:
   volumes:
     - mongodata1:/data/db
     - mongoconfig1:/data/configdb
     - ../conf/mongodb.pem:/etc/ssl/mongodb.pem
     - /some_local_directory:/custom_mount_point

To make the changes take effect, restart the application:

bash IP_JOP_PRR_<version_number>_ENU_linux/ha-scripts/restart.sh
Important
Additional information can be found at https://docs.docker.com/compose/compose-file/compose-file-v2/#volumes

Required Ports for AICS Servers

The following ports are required for communication among all target servers in the cluster. Note that some ports are specific to high availability (HA) environments (such as the Docker swarm ports), while others apply to all deployments.

Component        Protocol  Port Number  Type              Description
Docker           TCP       2377         Inbound/Outbound  Cluster management communications
Docker swarm     TCP/UDP   7946         Inbound/Outbound  Required for Docker Swarm communication among nodes
Docker swarm     UDP       4789         Outbound/Inbound  For overlay network traffic
MongoDB          TCP       27017        Inbound/Outbound  Default port for MongoDB
Tango container  TCP       3031         Inbound           Required to access the Predictive Routing API and Predictive Routing web application
SSH              TCP       22           Inbound/Outbound  Required to access all target servers using SSH

To open a port, use the following syntax:

firewall-cmd --zone=public --add-port=<port_number>/<protocol> --permanent
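For example, to open every port listed in the table above on a server and make the change active immediately, you might run the following commands (port 22 for SSH is usually already open):

firewall-cmd --zone=public --add-port=2377/tcp --permanent
firewall-cmd --zone=public --add-port=7946/tcp --permanent
firewall-cmd --zone=public --add-port=7946/udp --permanent
firewall-cmd --zone=public --add-port=4789/udp --permanent
firewall-cmd --zone=public --add-port=27017/tcp --permanent
firewall-cmd --zone=public --add-port=3031/tcp --permanent
firewall-cmd --reload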
Important
If you are running VMWare VXLAN, you might encounter a port conflict between VMWare VXLAN and Docker, both of which require port 4789. If you encounter this issue, Genesys recommends that you use a networking application such as Weave Net to manage networking among Docker containers. For additional information, consult the documentation for the respective products.

HA for ASC

Agent State Connector (ASC) has a standard primary-backup warm-standby high availability configuration. The backup server application remains initialized and ready to take over the operations of the primary server. It maintains connections to Configuration Server and Stat Server, but does not send agent profile updates to AICS.

To configure a primary-backup pair of ASC instances, create two ASC Application objects. Open the Server Info tab for the backup ASC and set warm standby as the redundancy mode. When Local Control Agent (LCA) determines that the primary ASC is unavailable, it implements a changeover of the backup to primary mode.

(Figure: ASC high availability architecture)

Important
If the Stat Server instance you are using for Predictive Routing is release 8.5.100.10 or higher, you must set the value for the accept-clients-in-backup-mode configuration option in the Stat Server Application object to no to ensure normal backup switchover between ASC instances.
