
GWS Disaster Recovery Scenarios

Important
This page lists several DR scenarios that can occur when using GWS with SIP Cluster.


The disaster recovery scenarios assume the following standard environment setup:

  • Two PSTN agents are connected to GWS1@DC1.
  • SIP Cluster DN ownership for the PSTN agents is established on the SIP Cluster node where GWS is connected (SIP Server 1).
  • Both agents are logged in.
  • Agent1 handles Call1 and Agent2 handles Call2.
[Figure: Gws sipcluster initial dr environment.png]
Warning
If best practices are applied, failures that occur in DC2 or on the connection link between DC1 and DC2 do not affect calls and agents in DC1. Local DC1 failures still do.

This page covers the following sample DR scenarios:

  • WWE reconnects to a new GWS API node
  • GWS disconnects from SIP Server
  • TC-layer between DC1 and DC2 fails
  • Softphone disconnects from DC1


WWE reconnects to a new GWS API node

[Figure: Gws sipcluster dr wwe reconnects to new GWS.png]

Failure Condition

  • WWE disconnects from GWS1 (e.g., NIC failure).
  • SIP and media paths are not impacted.
  • Agent can continue talking to the customer.

Summary

  • Agents can continue talking to the customers at all times.
  • Agents are logged out from their desktop and need to log on again.
  • Agent 2 may need to log on twice.
  • Agents temporarily lose call control through WWE.

Agent 1

WWE Error messages

  • “Connection to the server has been lost. Reconnecting…”
  • If not recovered after 1 minute, “Unable to establish connection. Please refresh your browser and log in again”
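
The two messages above can be read as a simple time-based rule. The following is a minimal sketch of that rule only, not WWE code; the function name `banner_for` and the constant names are hypothetical, and the 60-second threshold is taken from the "1 minute" stated above.

```python
from typing import Optional

# Threshold taken from the text ("If not recovered after 1 minute").
RECONNECT_TIMEOUT_S = 60

RECONNECTING_MSG = "Connection to the server has been lost. Reconnecting…"
FAILED_MSG = ("Unable to establish connection. "
              "Please refresh your browser and log in again")


def banner_for(seconds_since_disconnect: float, reconnected: bool) -> Optional[str]:
    """Return the message the desktop would show (hypothetical helper).

    None means the connection was recovered and no error is shown.
    """
    if reconnected:
        return None
    if seconds_since_disconnect < RECONNECT_TIMEOUT_S:
        return RECONNECTING_MSG
    return FAILED_MSG
```

For example, `banner_for(10, False)` yields the reconnecting message, while `banner_for(75, False)` yields the message asking the agent to refresh the browser.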

Recovery

  1. Agent 1 connects to GWS2@DC1 and NGINX forwards WWE to the other node in the same DC.
    • If GWS1 is down, NGINX drops the sticky session and opens a new session to GWS2.
  2. Agent 1 logs on again and GWS2@DC1 submits the following requests:
    • TRegisterAddress(DN1): existing call details are reported.
    • TAgentLogin(DN1): rejected because there is a call in progress.
  3. Agent 1 can control the call through WWE but cannot log in until the call is released and ACW is over.
  4. Agent 1 logs in, and GWS1@DC1 submits the TUnregisterAddress(DN1) request.
    • SIP Cluster doesn’t move the ownership because the new desktop application connects to the same SIP Cluster node.
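
The request outcomes in the recovery steps above can be illustrated with a small model. This is a sketch under stated assumptions, not SIP Server or GWS code: `DnState`, `register_address`, and `agent_login` are hypothetical names, and treating ACW as also blocking login is inferred from step 3 rather than stated explicitly.

```python
from dataclasses import dataclass


@dataclass
class DnState:
    """Hypothetical snapshot of a DN as seen by the owning SIP Cluster node."""
    call_in_progress: bool
    acw_in_progress: bool


def register_address(dn: DnState) -> dict:
    """Model of TRegisterAddress: succeeds and reports any existing call."""
    return {"registered": True, "existing_call_reported": dn.call_in_progress}


def agent_login(dn: DnState) -> bool:
    """Model of TAgentLogin: rejected while a call is in progress.

    Assumption: ACW also blocks login, inferred from the step in which
    the agent logs in only after the call is released and ACW is over.
    """
    return not (dn.call_in_progress or dn.acw_in_progress)
```

With `DnState(call_in_progress=True, acw_in_progress=False)`, registration reports the existing call and login is rejected; once both flags are false, login is accepted.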

Agent 2

  1. Agent 2 undergoes the same process as Agent 1 through Step 4 (when GWS1@DC1 submits TUnregisterAddress(DN2)).
    • It happens independently of call or ACW state.
  2. SIP Cluster waits until the call and ACW are over and then moves the ownership to SIP Server 3 @DC2.
  3. If Agent 2 logs on successfully before that, then SIP Server logs Agent 2 out before moving the ownership and Agent 2 has to log on again.
  4. If GWS1 fails and SIP Server 1 detects it, no TUnregisterAddress(DN2) is received, but SIP Server triggers the logout-on-disconnect mechanism, which logs Agent 2 out.
  5. If GWS1@DC1 doesn’t submit TUnregisterAddress(DN2) and doesn’t disconnect from SIP Server, then ownership is not transferred.
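
The three branches described for Agent 2 can be summarized as a decision function. The sketch below is only a restatement of the list above; the names `Outcome`, `old_node_outcome`, and `must_log_on_again` are hypothetical, and checking the unregister request before the disconnect is an assumption about precedence that the text does not state.

```python
from enum import Enum


class Outcome(Enum):
    MOVE_AFTER_CALL_AND_ACW = "ownership moves once call and ACW are over"
    LOGOUT_ON_DISCONNECT = "agent is logged out by logout-on-disconnect"
    NO_TRANSFER = "ownership is not transferred"


def old_node_outcome(unregister_received: bool, disconnect_detected: bool) -> Outcome:
    """Map what SIP Server 1 observes from GWS1 to the outcome in the text."""
    if unregister_received:
        # GWS1 submitted TUnregisterAddress(DN2).
        return Outcome.MOVE_AFTER_CALL_AND_ACW
    if disconnect_detected:
        # GWS1 failed and SIP Server 1 detected it.
        return Outcome.LOGOUT_ON_DISCONNECT
    # Neither an unregister request nor a disconnect was seen.
    return Outcome.NO_TRANSFER


def must_log_on_again(logged_on_before_move: bool) -> bool:
    """Agent 2 is logged out before the move if already logged on."""
    return logged_on_before_move
```

This is why Agent 2 may need to log on twice: a login that succeeds before the ownership move is undone by the logout that precedes the move.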

GWS disconnects from SIP Server

[Figure: Gws sipcluster dr gws disconnected.png]

Failure Condition

  • GWS1 disconnects from SIP Server 1 @ DC1 (e.g., NIC or network failure).
  • GWS1 is still running.
  • SIP and media paths are not impacted.

Summary

  • Agents can continue talking to the customers at all times.
  • Agents are logged out from their desktop and need to log on again.
  • Agent 2 may need to log on twice.
  • Agents temporarily lose call control through WWE.

Details

  1. GWS disconnects from SIP Server.
  2. SIP Server detects that the desktop client is disconnected.
  3. SIP Server logs the agent out (logout-on-disconnect = true), but the call is preserved and the agent continues talking to the customer.
  4. GWS notifies the agent that the voice channel is in the OutOfService state and that the connection to SIP Server is lost.
  5. WWE reports the voice channel as unavailable.
  6. WWE reconnects to an available GWS node while the call is still in progress.
  7. The agent can control the call but cannot log in.
  8. The call is released and ACW is over.
  9. The agent logs on.
  10. If the agent is connected to the new DC, SIP Server waits for 30 seconds, logs the agent out, and moves the ownership.
  11. Agent has to log on again.
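
The sequence above differs only in its tail, depending on whether the agent reconnected to a new DC. A minimal sketch that lists the expected events for each case follows; `recovery_events` and the event strings are hypothetical, and the 30-second delay is the value stated in step 10.

```python
from typing import List

# Delay before the ownership move, as stated in step 10.
OWNERSHIP_MOVE_DELAY_S = 30


def recovery_events(reconnected_to_new_dc: bool) -> List[str]:
    """Return the ordered events an agent would experience (illustrative)."""
    events = [
        "agent logged out by logout-on-disconnect, call preserved",
        "voice channel reported as OutOfService",
        "WWE reconnects to an available GWS node",
        "call released and ACW over",
        "agent logs on",
    ]
    if reconnected_to_new_dc:
        events += [
            f"SIP Server waits {OWNERSHIP_MOVE_DELAY_S} seconds",
            "SIP Server logs the agent out and moves the ownership",
            "agent logs on",
        ]
    return events
```

Counting the `"agent logs on"` entries gives one login when the agent stays in the same DC and two when the agent reconnects to a new DC.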

TC-layer between DC1 and DC2 fails

[Figure: Gws sipcluster tclayer failure.png]

Conditions

  • Agent 1: PSTN agent as in the previous example
  • Call1: SIP Server 3 @ DC2
  • Agent 2 is a softphone agent:
    • Ownership: SIP Server 4 @ DC2
    • Desktop: GWS1@DC1 connected to SIP Server 1 @ DC1
    • Call2: SIP Server 2 @ DC1

Failure Condition

  • TC-layer fails between DC1 and DC2.
  • SIP and media paths are not impacted.

Summary

  • Best practices are not followed.
  • Agent 1: The call and the agent are in different DCs (no local routing).
  • Agent 2: Ownership and desktop are in different DCs (the priorities of the WWE FQDN and the SIP FQDNs are not aligned).
  • Agents can continue talking to the customers at all times.
  • Agents lose control over the call (for different reasons).
  • Agent 2 may need to log on twice.

Agent 1

  1. Call continues, audio is not impacted.
  2. Agent loses 3pcc control over the call and doesn’t receive call-related T-Events.
  3. As soon as the actual call is over, Agent 1 can submit TReleaseCall from WWE to avoid a stuck call on the desktop.
  4. Call control comes back if TC-layer is recovered.
  5. Agent 1 can restore 3pcc control over the call by reconnecting to DC2 but there is no notification suggesting to do that.

Agent 2

  1. There is no problem for Agent 2 after the failure as SIP and RTP connections are still OK between the SIP Server 2 @ DC1 and agent’s phone.
  2. SIP Server 1 @ DC1 has lost connection to the DN owner.
    • Agent 2 is logged out and the DN is set to OutOfService state.
    • Audio channel is not impacted.
  3. WWE reports that voice channel is not available.
  4. SIP Server 4 @ DC2 logs the agent out (logout-on-disconnect = true).
    • The call continues.
  5. Agent 2 cannot log on to any GWS node in DC1.
  6. Agent 2 should reconnect to DC2 to log on.
  7. Agent 2 can log in.
    • Agent 2 can continue talking to the customer.
    • However, no call is reported on the desktop because the call is in the disconnected DC.

Softphone disconnects from DC1

[Figure: Gws sipcluster softphone disconnect.png]

Conditions

  • Agent 2 is a softphone agent:
    • Ownership: SIP Server 1 @ DC1
    • Desktop: GWS1@DC1 connected to SIP Server 1 @ DC1
    • Call2: SIP Server 2 @ DC1

Failure Condition

  • The softphone cannot connect to DC1.
  • The call is lost because both SIP and media connections are affected at the same time.
  • If the connection is restored before the existing registration expires, then the DN stays in service and the agent is not affected.
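
The last condition above is a comparison between the time the connection returns and the time the existing registration expires. A minimal sketch of that comparison follows; `dn_stays_in_service` is a hypothetical helper, and the timestamps are plain seconds on any common clock.

```python
from typing import Optional


def dn_stays_in_service(restored_at_s: Optional[float],
                        registration_expires_at_s: float) -> bool:
    """True if the connection returns before the registration expires.

    restored_at_s is None when the connection is lost permanently.
    """
    if restored_at_s is None:
        return False
    return restored_at_s < registration_expires_at_s
```

For example, with a registration that expires at second 300, a connection restored at second 120 keeps the DN in service, while a permanent loss (`None`) does not.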

Details

  1. The connection to DC1 is lost permanently.
  2. Softphone detects the failure when attempting to refresh the registration and registers at SIP Server 3 @ DC2.
  3. There is no call or ACW.
  4. SIP Server 1 @ DC1 logs the agent out and immediately moves the ownership.
  5. WWE reports the “Unable to establish connection. Please refresh your browser and log in again” error message.
  6. Agent has to log on again.
  7. Agent can log on to any DC.

This page was last modified on June 28, 2018, at 15:18.