System Monitoring and Logging

Monitoring AICS and its Containers

AICS uses a number of logs to track the status of the various containers: Tango, Gunicorn worker containers, NGINX, and MongoDB. This section covers logging for each container.

Access Logs for AICS

Important
To access the logs conveniently, add your username (PR_USER, by default) to the systemd-journal Linux group, for example: sudo usermod -aG systemd-journal pm. Otherwise, you must use the sudo command to see the logs for the various containers.
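
The group change takes effect only in a new login session. After logging out and back in, you can verify membership with a standard Linux command such as the following (PR_USER is shown as a placeholder; substitute the account you added):

id -nG PR_USER | grep systemd-journal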

To access the AICS logs, run the following commands:

  • For Tango logs:
journalctl CONTAINER_NAME=tango -o cat
  • For MongoDB logs:
journalctl CONTAINER_NAME=mongo -o cat
  • For Model Training worker logs (two containers run by default):
journalctl CONTAINER_NAME=workers_model_training_1 -o cat
journalctl CONTAINER_NAME=workers_model_training_2 -o cat
  • For Analysis worker logs:
journalctl CONTAINER_NAME=workers_analysis_1 -o cat
  • For Purging worker logs:
journalctl CONTAINER_NAME=workers_purging_1 -o cat
  • For Dataset Upload worker logs:
journalctl CONTAINER_NAME=workers_dataset_upload_1 -o cat
  • For NGINX logs:
journalctl CONTAINER_NAME=nginx -o cat

Examples:

1) To get the last 100 lines of the Tango log, run:

journalctl CONTAINER_NAME=tango -n 100 -o cat

2) To get the last 60 minutes of the Tango log, run:

journalctl CONTAINER_NAME=tango --since="1 hour ago" -o cat

3) To get the last ten hours of the MongoDB log, run:

journalctl CONTAINER_NAME=mongo --since="10 hours ago" -o cat

4) To tail Tango logs, run:

journalctl CONTAINER_NAME=tango -f -o cat
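
5) These commands combine with standard shell tools. For example, to search the Tango log for error messages (a plain grep filter, not an AICS-specific command), run:

journalctl CONTAINER_NAME=tango -o cat | grep -i error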

Configure Log Level

To set the appropriate log level on the Tango container, configure the LOG_LEVEL variable in the tango.env file.

There are two possible values:

  • INFO - Informational messages that highlight the progress of the application: LOG_LEVEL=INFO
  • DEBUG - Fine-grained informational events that are most useful to debug the application: LOG_LEVEL=DEBUG

If you change the log level setting, run the bash scripts/restart.sh command for the change to take effect.
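
For example, assuming tango.env and scripts/restart.sh are located in your current working directory (adjust the paths to your deployment), the following commands switch the Tango container to debug logging and restart the containers:

sed -i 's/^LOG_LEVEL=.*/LOG_LEVEL=DEBUG/' tango.env
bash scripts/restart.sh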

Checking the Logs for HA AICS Containers

To access AICS logs when it is running in an HA architecture, execute the following commands on any node in the cluster:

  • For Tango logs:
    docker service logs tango_tango
  • For MongoDB logs:
    docker service logs mongo_mongo1
    docker service logs mongo_mongo2
    docker service logs mongo_mongo3

And so on, for however many MongoDB nodes you have configured.

  • For Workers logs:
    docker service logs workers_analysis
    docker service logs workers_model_training
    docker service logs workers_purging
    docker service logs workers_dataset_upload

To return only the last N lines of a log, use the same commands as above, appending the --tail N option, as in the following example:

docker service logs workers_analysis --tail 100

To continuously stream the output of a log, use the same commands as above, appending the -f option, as in the following example:

docker service logs workers_analysis -f
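
To check several services at once, for example the three MongoDB replicas listed above, a simple shell loop (a sketch, assuming the default service names) avoids repeating the command:

for svc in mongo_mongo1 mongo_mongo2 mongo_mongo3; do
    docker service logs --tail 50 "$svc"
done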

Monitoring Agent State Connector

Agent State Connector writes log data to Message Server, using the usual Genesys logging parameters. These are configured using the Agent State Connector [log] Section configuration options.
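
As an illustration only, a minimal [log] configuration that sends Standard-level messages to Message Server (so that the alarms described below can trigger on them) and also writes generated output to local files might look like the following. The option names shown are the generic Genesys common log options; check the Agent State Connector [log] Section reference for the exact options and values that ASC supports.

[log]
verbose = standard
standard = network
all = asc_log

Here, verbose = standard limits generated messages to the Standard level, standard = network sends Standard-level messages to Message Server, and all = asc_log writes whatever is generated to local log files whose names start with asc_log.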

Genesys recommends that you configure alarms to notify you when the Agent State Connector generates the following Standard-level log messages:

  • 60400|STANDARD|error1|error... %s
  • 60401|STANDARD|error2|%s error... %s
  • 60402|STANDARD|exception1|exception caught and processed... %s
  • 60403|STANDARD|exception2|%s exception caught and processed... %s
  • 60404|STANDARD|failed_exc|%s failed, exception caught and processed... %s
  • 60701|STANDARD - Stat Server has experienced multiple switchovers during the period specified in the ss-monitoring-reconnect-min option.
  • 60702|STANDARD - Configuration Server has experienced multiple switchovers during the period specified in the confserv-monitoring-reconnect-min option.
  • 60703|STANDARD - Stat Server is losing connection with ASC. The number of times the connection is lost before this alarm is triggered is set in the ss-monitoring-reconnect-count option.
  • 60704|STANDARD - The cancel event for 60703.
  • 60706|STANDARD - Configuration Server is losing connection with ASC. The number of times the connection is lost before this alarm is triggered is set in the confserv-monitoring-reconnect-count option.
  • 60707|STANDARD - The cancel event for 60706.
Important

Alarm conditions are configured for all ASC instances.

  • If you receive alarm condition 60402 (AgentStateConnectorException1) or 60403 (AgentStateConnectorException2), ASC cannot recover and you must restart it. Genesys recommends that you also contact Genesys to evaluate your environment and prevent such conditions in the future.
  • If you receive alarm condition 60400 (AgentStateConnectorError1), 60401 (AgentStateConnectorError2), or 60404 (AgentStateConnectorExceptionCaught), the application continues to operate normally.

Example ASC Log Messages

  • ASC starts to read Person profiles from Configuration Server:
    04:16:52.702 Trc 09900 (AgentStateMonitor).(run):Main loop just started!
    04:16:52.703 Dbg 09900 (ConfigServerQueryEngine).(readAllAgents):Trying to query config server
  • One Person record was added to the ASC internal cache:
    04:22:48.367 Dbg 09900 (AgentStateMonitor).(initializeAgentData):Adding agent into current map: 6003880
  • The data for an Agent Group was added to the appropriate Agent Profiles in the cache:
    04:30:06.171 Dbg 09900 (AgentStateMonitor).(initializeAgentData):Adding group: EWT_ROG_VO_QA_CTI_01 to agent in the current map: T_QA_CTI_01
  • ASC starts loading agent configuration data to JOP:
    04:30:06.210 Dbg 09900 (JOPConnector).(upsertAgentBatch):Upsert request:
    [{lastName=Johnston, loginId=, native_id=8582230, RS_TPV_ALPCC00001=2, RS_SUP_851457=2, RS_SID_52=2, employeeId=8582230, attached_data={groupNames=[AG_LOC_ALPCC, AG_SID_52, AG_TPV_ALPCC00001, AG_SUP_851457, TP_ALP, DORENE_MOORE_851457, KAREN_PAVICIC_848751, NATASHA_MCMURRAY_890166, PETER_FINLAY_861575, COLLIN_MASON_874176, AG_TPW_CLB, DD_Not_skilled], RS_TPV_ALPCC00001=2, RS_SUP_851457=2, RS_SID_52=2, RS_LOC_ALPCC=2, RS_TPW_CLB=2}, userName=8582230, loginStatus=-1, on_call=false, skills={RS_TPV_ALPCC00001=2, RS_SUP_851457=2, RS_SID_52=2, RS_LOC_ALPCC=2, RS_TPW_CLB=2}, groupNames=[AG_LOC_ALPCC, AG_SID_52, AG_TPV_AL…
  • The agent configuration data was submitted successfully:
    04:30:58.164 Trc 09900 (JOPConnector).(upsertAgentBatch):Status ok: true// 0.0//200.0//0.0
  • ASC finished reading agent configuration data and starts to subscribe to agent login statistics:
    05:40:20.899 Trc 09900 (AgentStateMonitor).(initializeAgentData):TimeDelta: Total time to push 17641 agents to JOP=4214727
    05:40:20.906 Trc 09900 (StatServerCL).(subscribeAgents):need to subscrib agent count:17641
  • Subscription to agent login statistics is completed:
    05:40:21.635 Trc 09900 (StatServerCL).(registerEventCallbacks):TimeDelta: Took a total of 736 to register Stat Server callbacks for 17641 agents.
  • ASC startup is completed:
    05:40:21.635 Trc 09900 (AgentStateMonitor).(run):Driver main loop - starting a new iteration.
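
If ASC also writes its log to a local file (shown here as asc_log.log, a placeholder name; use the actual file produced by your [log] configuration), you can check for these startup milestones quickly with a grep filter such as:

grep -E "Main loop just started|TimeDelta|Driver main loop" asc_log.log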

Logging Strategy Subroutines Performance

The Predictive Routing strategy subroutines write both error messages and informational messages into the URS log file and attach data to the processed interactions for reporting purposes.

A macro, PRRLog, which is called by the subroutines supplied with Genesys Predictive Routing, logs messages in the following situations:

  • No agents are returned for the skill expression.
  • There is no response from Predictive Routing within an acceptable amount of time.
  • There is an exception of some kind from the Predictive Routing scoring engine.

In addition, you can configure URS to log HTTP requests and responses in a separate file.

The PrrIxnLog subroutine captures which agent an interaction was actually routed to and which predictive model was used for scoring. This is essential to properly conduct A/B testing, which leads to improved models and predictions.

Troubleshooting the Strategy Subroutines Using the URS Log

When monitoring the IRD strategy in a URS-based Predictive Routing environment, note the following alarm conditions:

  • Alarm condition 23001 in URS indicates authentication failure in the attempt to connect to the Journey Optimization Platform.
  • Alarm condition 23002 in URS indicates an empty list of agents provided by the scoring engine in response to a scoring request.

Monitoring Queue-Level Statistics

You can configure a Pulse dashboard to monitor queue-level statistics for the interactions processed by Predictive Routing. Templates for real-time Predictive Routing reporting are available from the Genesys Dashboard Community Center.
