
protection

Section: overload
Default Value: false
Valid Values: true, false
Changes Take Effect: Immediately
Introduced: 8.5.108

Controls whether overload protection is applied when Stat Server is in overload.

qos-default-overload-policy

Section: overload
Default Value: 0
Valid Values: 0, 1, 2
Changes Take Effect: After restart
Introduced: 8.5.108

Defines the global overload policy.

If this option is set to:

  • 0 (zero) - sends and updates for requested statistics can be cut
  • 1 - only sends of statistics to Stat Server clients can be cut
  • 2 - nothing can be cut. Stat Server updates and sends all requested statistics.

qos-recovery-enable-lms-messages

Section: overload
Default Value: false
Valid Values: true, false
Changes Take Effect: After restart
Introduced: 8.5.108

Enables the following Standard-level, recovery-related log messages, which are intended for debugging:

10072 “GCTI_SS_OVERLOAD_RECOVERY_STARTED - Overload recovery started on %s (%d current CPU usage)”

10073 “GCTI_SS_OVERLOAD_RECOVERY_FAILED - Overload recovery failed on %s (%d current CPU usage)”

cut-debug-log

Section: overload
Default Value: true
Valid Values: true, false
Changes Take Effect: Immediately
Introduced: 8.5.108

Controls debug logging during overload. If set to true, debug logging is cut (suppressed) while Stat Server is in overload.
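Stat Server is not a Python application, so the following is only a model of the cut-debug-log behavior, built on Python's standard logging module. The in_overload flag is a hypothetical stand-in for Stat Server's internal overload state, and INFO plays the role of the Standard log level; neither name comes from Stat Server itself.

```python
import logging


class CutDebugLogFilter(logging.Filter):
    """Model of cut-debug-log: drop DEBUG records while in overload."""

    def __init__(self, cut_debug_log: bool = True):
        super().__init__()
        self.cut_debug_log = cut_debug_log
        self.in_overload = False  # hypothetical stand-in for the overload state

    def filter(self, record: logging.LogRecord) -> bool:
        if self.cut_debug_log and self.in_overload and record.levelno <= logging.DEBUG:
            return False  # debug output is cut during overload
        return True  # INFO (standing in for Standard level) and above always pass


class ListHandler(logging.Handler):
    """Collect messages in memory so the effect of the filter is easy to see."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


log = logging.getLogger("statserver-model")
log.setLevel(logging.DEBUG)
log.propagate = False
handler = ListHandler()
cut_filter = CutDebugLogFilter(cut_debug_log=True)
handler.addFilter(cut_filter)
log.addHandler(handler)

log.debug("debug before overload")
cut_filter.in_overload = True  # overload detected
log.debug("debug during overload")
log.info("standard message during overload")
print(handler.messages)
# ['debug before overload', 'standard message during overload']
```

Because cut-debug-log takes effect immediately, the model keeps it as a mutable attribute on the filter rather than fixing it when the logger is built.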

cpu-threshold-low

Section: overload
Default Value: 60
Valid Values: 0-100
Changes Take Effect: After restart
Introduced: 8.5.108

Defines the lower main-thread CPU utilization threshold, in percent. When Stat Server is in overload and main-thread CPU usage drops below this value, recovery starts.

cpu-threshold-high

Section: overload
Default Value: 80
Valid Values: 0-100
Changes Take Effect: After restart
Introduced: 8.5.108

Defines the upper main-thread CPU utilization threshold, in percent. When main-thread CPU usage exceeds this value, Stat Server enters overload.

cpu-poll-timeout

Section: overload
Default Value: 10
Valid Values: 1-60
Changes Take Effect: After restart
Introduced: 8.5.108

Defines, in seconds, how often main-thread CPU usage is polled.

cpu-cooldown-cycles

Section: overload
Default Value: 30
Valid Values: 1-100
Changes Take Effect: After restart
Introduced: 8.5.108

Defines the number of cpu-poll-timeout cycles in a cooldown period.

For example, if cpu-poll-timeout = 10 seconds and cpu-cooldown-cycles = 30, the cooldown period is 10 x 30 = 300 seconds. This means that main-thread CPU usage must stay below the cpu-threshold-low value for 300 seconds; after this period, overload recovery is considered complete.
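The cooldown arithmetic above can be written as a small helper. This is a sketch: the function name cooldown_period_seconds is hypothetical, and the range checks simply mirror the Valid Values listed for the two options.

```python
def cooldown_period_seconds(cpu_poll_timeout: int = 10, cpu_cooldown_cycles: int = 30) -> int:
    """Return the cooldown period in seconds (poll timeout x number of cycles)."""
    # Valid Values from the option descriptions: 1-60 seconds and 1-100 cycles.
    if not 1 <= cpu_poll_timeout <= 60:
        raise ValueError("cpu-poll-timeout must be in the range 1-60")
    if not 1 <= cpu_cooldown_cycles <= 100:
        raise ValueError("cpu-cooldown-cycles must be in the range 1-100")
    return cpu_poll_timeout * cpu_cooldown_cycles


print(cooldown_period_seconds())         # 300 (default values: 10 x 30)
print(cooldown_period_seconds(60, 100))  # 6000 (longest configurable cooldown)
```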

allow-new-requests-during-overload

Section: overload
Default Value: true
Valid Values: true, false
Changes Take Effect: Immediately
Introduced: 8.5.108

Controls whether clients can make new requests (open new statistics) while Stat Server is in overload.

allow-new-connections-during-overload

Section: overload
Default Value: true
Valid Values: true, false
Changes Take Effect: Immediately
Introduced: 8.5.108

Controls whether new clients can connect while Stat Server is in overload.

Overload Protection

Starting with release 8.5.108, Stat Server supports overload protection.

Introduction

When and why should you use overload protection?

The number of open statistics depends on client demand. The more statistics are open, or the more incoming events are received, the higher Stat Server's CPU consumption. The Stat Server application does not scale, and in certain circumstances it may start behaving unreliably: disconnecting clients, getting disconnected from servers, or delaying computations.

Stat Server load is the percentage of CPU consumed by its main thread. It depends on the rate of incoming events and on the number (and parameters) of open statistics. Overload protection is a method of reducing CPU consumption in response to Stat Server overload. The load range is defined as [min, max], where min and max are the lower and upper boundaries set by cpu-threshold-low and cpu-threshold-high. The cooldown is a predefined period of time during which the load stays below min. Stat Server is in overload if the load has exceeded max and no cooldown has completed since then. Stat Server is in recovery if it is in overload and the current load is below min.
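The definitions above can be modeled as a small state machine. This is a sketch, not Stat Server's implementation, and it rests on several assumptions: one load sample arrives per cpu-poll-timeout; "exceeded max" means strictly greater than cpu-threshold-high; the cooldown is counted as cpu-cooldown-cycles consecutive samples below min; and a sample between min and max during recovery returns the model to plain overload and resets the cooldown. RECOVERY is modeled as its own state even though, per the definitions, it is a phase of overload. The class and state names are hypothetical.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    OVERLOAD = "overload"
    RECOVERY = "recovery"  # still in overload, but the load is below min


class OverloadTracker:
    """Hypothetical model of the overload and recovery states."""

    def __init__(self, low: int = 60, high: int = 80, cooldown_cycles: int = 30):
        self.low = low                      # cpu-threshold-low (min)
        self.high = high                    # cpu-threshold-high (max)
        self.cooldown_cycles = cooldown_cycles
        self.state = State.NORMAL
        self.quiet_polls = 0                # consecutive samples below min while overloaded

    def poll(self, load: int) -> State:
        """Feed one main-thread CPU sample (percent) and return the new state."""
        if load > self.high:
            # Exceeding max (re)starts overload and discards any partial cooldown.
            self.state = State.OVERLOAD
            self.quiet_polls = 0
        elif self.state is State.NORMAL:
            pass  # not overloaded and below max: nothing changes
        elif load < self.low:
            # Overloaded with load below min: recovery, counting toward the cooldown.
            self.quiet_polls += 1
            if self.quiet_polls >= self.cooldown_cycles:
                self.state = State.NORMAL  # cooldown completed, overload is over
                self.quiet_polls = 0
            else:
                self.state = State.RECOVERY
        else:
            # Between min and max while overloaded: not recovering, cooldown resets.
            self.state = State.OVERLOAD
            self.quiet_polls = 0
        return self.state


tracker = OverloadTracker(cooldown_cycles=3)  # short cooldown to keep the trace small
samples = [50, 85, 70, 55, 55, 65, 50, 50, 50]
print([tracker.poll(s).value for s in samples])
# ['normal', 'overload', 'overload', 'recovery', 'recovery', 'overload', 'recovery', 'recovery', 'normal']
```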

Overload protection consists of the following load reducing measures:

  • Measure 1. Cut debug logging, controlled by the cut-debug-log option.
  • Measure 2. Stat Server cannot skip incoming events and always processes them. However, it can lower the quality of service for some statistics in order to reduce CPU consumption by skipping later processing steps: for some statistics, it may stop recalculating values and sending them to clients.
  • Measure 3. For some statistics, Stat Server may also stop updating aggregates. Note that measure 3 includes measure 2.

As soon as the load reaches the high CPU threshold, Stat Server enters the overload state. To leave that state, the load must remain below the low CPU threshold for the whole cooldown period.

The goal of overload protection is to skip as few statistic sends and updates as possible while reducing CPU consumption to an acceptable level.

Tip
The following statistical categories are not affected by overload protection:
  • CurrentTargetState
  • CurrentState
  • CurrentStateReasons

Configuration Options

The following configuration options were added to Stat Server in release 8.5.108:

Option Summary
Option                                  Description
allow-new-connections-during-overload   Allows new clients to connect during overload.
allow-new-requests-during-overload      Allows opening new statistics during overload.
cpu-cooldown-cycles                     The number of cpu-poll-timeout cycles in a cooldown period (cooldown period / cpu-poll-timeout).
cpu-poll-timeout                        Timeout for polling main-thread CPU usage, in seconds.
cpu-threshold-high                      The upper boundary of the load range.
cpu-threshold-low                       The lower boundary of the load range.
cut-debug-log                           Controls the debug log during overload.
protection                              Enables/disables overload protection.
qos-default-overload-policy             Default overload policy.
qos-recovery-enable-lms-messages        Enables recovery-related LMS messages.

The above options are configured in the [overload] section of the Stat Server application.
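As a sketch of how the options fit together, the following reads an [overload] section with Python's configparser, fills in the defaults listed on this page, and checks the documented ranges. It assumes the section is available as INI-style text (for example, for a pre-deployment review); the function name load_overload_section is hypothetical, and the low-below-high check is an assumed sanity rule that this page does not state.

```python
import configparser

# Defaults and valid ranges as listed on this page for the [overload] section.
BOOL_OPTIONS = {
    "protection": False,
    "cut-debug-log": True,
    "allow-new-requests-during-overload": True,
    "allow-new-connections-during-overload": True,
    "qos-recovery-enable-lms-messages": False,
}
INT_OPTIONS = {  # name: (default, minimum, maximum)
    "cpu-threshold-low": (60, 0, 100),
    "cpu-threshold-high": (80, 0, 100),
    "cpu-poll-timeout": (10, 1, 60),
    "cpu-cooldown-cycles": (30, 1, 100),
    "qos-default-overload-policy": (0, 0, 2),
}


def load_overload_section(text: str) -> dict:
    """Parse INI-style text and return the [overload] options with defaults applied."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    options = {}
    for name, default in BOOL_OPTIONS.items():
        # fallback is used when the option or the whole section is missing
        options[name] = parser.getboolean("overload", name, fallback=default)
    for name, (default, low, high) in INT_OPTIONS.items():
        value = parser.getint("overload", name, fallback=default)
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the valid range {low}-{high}")
        options[name] = value
    # Assumed sanity check, not stated on this page.
    if options["cpu-threshold-low"] >= options["cpu-threshold-high"]:
        raise ValueError("cpu-threshold-low should be lower than cpu-threshold-high")
    return options


sample = """
[overload]
protection = true
cpu-threshold-high = 85
"""
opts = load_overload_section(sample)
print(opts["protection"], opts["cpu-threshold-high"], opts["cpu-threshold-low"])  # True 85 60
```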

The overload policy may vary from statistic to statistic, depending on end-user preferences. The default overload policy, set by the qos-default-overload-policy option, can be overridden at the stat type level by the DynamicOverloadPolicy option in the [<stat type>] section:

DynamicOverloadPolicy

Section: <stat type>
Valid Values: 0, 1, 2

Defines the actions that Stat Server may apply to a given statistic to reduce the overload:

  • 0 (default) - sends and updates for requested statistics can be cut
  • 1 - only sends of statistics to Stat Server clients can be cut
  • 2 - nothing can be cut. Stat Server updates and sends all requested statistics.
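The override rule and the meaning of each policy value can be sketched as follows. The function names and the per-stat-type option dictionaries are hypothetical, as are the stat type names MyTotalCalls and MyWaitingCalls; the mapping of values to cuttable operations is taken from the value lists above.

```python
from typing import Mapping

# 0 = sends and updates can be cut, 1 = only sends can be cut, 2 = nothing can be cut.
CUTTABLE = {
    0: frozenset({"send", "update"}),
    1: frozenset({"send"}),
    2: frozenset(),
}


def effective_policy(default_policy: int, stat_type_options: Mapping[str, str]) -> int:
    """Return the overload policy for one stat type.

    DynamicOverloadPolicy in the [<stat type>] section, when present, overrides
    the global qos-default-overload-policy value.
    """
    raw = stat_type_options.get("DynamicOverloadPolicy")
    policy = default_policy if raw is None else int(raw)
    if policy not in CUTTABLE:
        raise ValueError(f"invalid overload policy: {policy}")
    return policy


def may_cut(policy: int, operation: str) -> bool:
    """True if Stat Server may skip `operation` ("send" or "update") under `policy`."""
    return operation in CUTTABLE[policy]


stat_types = {  # hypothetical stat type sections
    "MyTotalCalls": {},                                # inherits the global default
    "MyWaitingCalls": {"DynamicOverloadPolicy": "2"},  # never cut
}
for name, options in stat_types.items():
    p = effective_policy(0, options)
    print(name, p, may_cut(p, "send"), may_cut(p, "update"))
# MyTotalCalls 0 True True
# MyWaitingCalls 2 False False
```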

LMS Messages

The following new LMS messages are associated with overload protection:

  • 10070|STANDARD|GCTI_SS_OVERLOAD_DETECT|Overload detected on %s (%d current CPU usage)
  • 10071|STANDARD|GCTI_SS_OVERLOAD_END|Overload ended on %s (%d current CPU usage)
  • 10072|STANDARD|GCTI_SS_OVERLOAD_RECOVERY_STARTED|Overload recovery started on %s (%d current CPU usage)
  • 10073|STANDARD|GCTI_SS_OVERLOAD_RECOVERY_FAILED|Overload recovery failed on %s (%d current CPU usage)
  • 10074|STANDARD|GCTI_SS_OVERLOAD_PROTECTION_ACTIVATED|Overload protection on %s activated
  • 10075|STANDARD|GCTI_SS_OVERLOAD_PROTECTION_DEACTIVATED|Overload protection on %s deactivated
Important
  • Messages 10070 and 10071 are recommended for operations monitoring.
  • Messages 10072 and 10073 are for debugging purposes only and are disabled by default; they are enabled by the qos-recovery-enable-lms-messages option.
  • Messages 10074 and 10075 are generated when the value of the protection configuration option changes (or at startup). They are written to the standard log because debug logging is cut while Stat Server is in overload. These messages are for troubleshooting only.
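The message list above uses a pipe-separated id|level|name|template layout. The sketch below parses those entries and fills a template with Python's % operator, which understands the same %s and %d placeholders. The helper names are hypothetical, and so is the value StatServer_1 substituted for %s; this page does not say exactly what %s refers to.

```python
from typing import NamedTuple


class LmsEntry(NamedTuple):
    msg_id: int
    level: str
    name: str
    template: str


CATALOG = """\
10070|STANDARD|GCTI_SS_OVERLOAD_DETECT|Overload detected on %s (%d current CPU usage)
10071|STANDARD|GCTI_SS_OVERLOAD_END|Overload ended on %s (%d current CPU usage)
10072|STANDARD|GCTI_SS_OVERLOAD_RECOVERY_STARTED|Overload recovery started on %s (%d current CPU usage)
10073|STANDARD|GCTI_SS_OVERLOAD_RECOVERY_FAILED|Overload recovery failed on %s (%d current CPU usage)
10074|STANDARD|GCTI_SS_OVERLOAD_PROTECTION_ACTIVATED|Overload protection on %s activated
10075|STANDARD|GCTI_SS_OVERLOAD_PROTECTION_DEACTIVATED|Overload protection on %s deactivated
"""

MONITORING_IDS = {10070, 10071}  # recommended for operations monitoring
DEBUG_ONLY_IDS = {10072, 10073}  # off unless qos-recovery-enable-lms-messages = true


def parse_catalog(text: str) -> dict:
    """Map message id -> LmsEntry for each id|level|name|template line."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        msg_id, level, name, template = line.split("|", 3)
        entries[int(msg_id)] = LmsEntry(int(msg_id), level, name, template)
    return entries


def is_logged(msg_id: int, recovery_lms_enabled: bool) -> bool:
    """Debug-only recovery messages are written only when explicitly enabled."""
    return recovery_lms_enabled or msg_id not in DEBUG_ONLY_IDS


catalog = parse_catalog(CATALOG)
entry = catalog[10070]
print(entry.name, "->", entry.template % ("StatServer_1", 87))
# GCTI_SS_OVERLOAD_DETECT -> Overload detected on StatServer_1 (87 current CPU usage)
```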

See the Stat Server Deployment Guide for more information on LMS messages.

Performance Counters

The following table includes new performance counters:

Counter  Description
cpu      Main-thread CPU usage, % of a single processor
pcpu     Process CPU usage, % of total
shc      Statistics hit count
shcs     Statistics hit count, suppressed
clens    Client events not sent
opc      Overload periods count
opd      Overload periods duration, in seconds
osn      Overload stats, normal
osns     Overload stats, not sent
osnu     Overload stats, not updated
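As one example of using these counters, the average length of an overload period can be derived from opc and opd. This sketch assumes that opd is the cumulative duration of the periods counted by opc, and that the counters have already been collected into a dictionary; how they are retrieved is not covered here, and the snapshot values are made up for illustration.

```python
from typing import Mapping


def overload_summary(counters: Mapping[str, float]) -> dict:
    """Summarize the overload-period counters from one snapshot.

    Assumes opd is the total duration, in seconds, of the opc overload periods.
    """
    periods = counters.get("opc", 0)
    total_sec = counters.get("opd", 0)
    average = total_sec / periods if periods else 0.0  # avoid dividing by zero
    return {"periods": periods, "total_sec": total_sec, "avg_sec": average}


snapshot = {"cpu": 42, "pcpu": 9, "opc": 4, "opd": 1200}  # hypothetical values
print(overload_summary(snapshot))
# {'periods': 4, 'total_sec': 1200, 'avg_sec': 300.0}
```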

This page was last modified on December 19, 2017, at 14:11.