Voice Quality Impairment
Incident Report for Cool Telecom
Postmortem

Date: 04/29/21
Prepared by: Eric Nelson
Market: Dallas, TX
IQNT Trouble Ticket: 331288
Date and Time Event Began: 04/28/21 approximately 1245 CT
Date and Time Event Ended: 04/28/21 approximately 1500 CT

Event Summary:

Some customers may have experienced a service impact on April 28th between approximately 1245 and 1500 CT due to a network event in Dallas, TX. Our monitoring and alerting systems indicated performance issues within our Dallas market: packet latency, errors, and drops were degrading performance on some of our platforms and services hosted in or transiting the Dallas market.

Customer reports of poor quality, call failures, and packet loss started coming in around 1300 CT. At approximately 1315 CT, it was determined that one of our core routers had what appeared to be an ARP caching issue, which seemed to impair its ability to forward packets and affected the services passing through it. The engineering team routed traffic away from the affected router at approximately 1320 CT, which alleviated most of the impact, and engaged the hardware vendor at nearly the same time. A configuration change made at approximately 1500 CT effectively resolved the issue. Engineering reverted all traffic re-routing and manual intervention changes on the router and restored it to normal duties and loads by approximately 1600 CT.

Reason For Outage:

Recent configuration changes made in preparation for migration work caused a VLAN that already existed on another path to allow looping on the layer 2 network. This loop impacted a router and four voice gateways.

The configuration that allowed the loop was reversed at approximately 1500 CT. The four gateways and all services were restored once the loop was removed. Engineering reverted all traffic re-routing and manual intervention changes on the router and restored it to normal duties and loads by approximately 1600 CT.

Recommendations for preventing a recurrence:

Preparation work of this type is to be conducted after hours (and off peak) to minimize any potential impact from configuration changes made during typically non-impactful work. This is already standard policy within Inteliquent; the policy was reinforced with the team, and the engineers who made the configuration changes were re-educated. - Complete

Additionally, process steps will be added to MOPs for this type of maintenance effort (a.k.a. Router Platform Migrations). These steps will prevent potential looping issues by restricting the number of interconnects between old platforms and new platforms to one during the migration work. - Complete
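
As an illustration of why limiting interconnects prevents loops, the following is a minimal sketch of a pre-change check that flags any VLAN whose links would form a cycle. It is not Inteliquent's actual tooling: the function name, the device names, and VLAN 100 are hypothetical, and the check assumes a plain list of layer 2 links with no spanning tree or link aggregation taken into account.

```python
def find_vlan_loops(links):
    """Return the set of VLAN IDs whose links form a layer 2 cycle.

    links: iterable of (device_a, device_b, vlan_id) tuples (hypothetical format).
    Uses union-find: a link whose two ends are already connected within the
    same VLAN closes a loop.
    """
    parent = {}

    def find(node):
        # Locate the root of the node's connected component, compressing the path.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    looped = set()
    for a, b, vlan in links:
        # Key each device by VLAN so separate VLANs never share a component.
        root_a, root_b = find((vlan, a)), find((vlan, b))
        if root_a == root_b:
            looped.add(vlan)
        else:
            parent[root_a] = root_b
    return looped


# Hypothetical migration topology: VLAN 100 reaches the old and new routers
# through an aggregation switch, and a second old-to-new interconnect is added.
one_interconnect = [
    ("old-core", "agg-sw", 100),
    ("new-core", "agg-sw", 100),
]
two_interconnects = one_interconnect + [("old-core", "new-core", 100)]

print(find_vlan_loops(one_interconnect))
print(find_vlan_loops(two_interconnects))
```

With a single path between the old and new platforms the VLAN stays loop-free; adding a second path that carries the same VLAN closes the cycle, which is the condition the MOP change is meant to rule out.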

Note to Cool Telecom Customers

Inteliquent is one of our several upstream providers for dial tone connectivity and call routing. This is why only some customers were affected.

Posted Apr 29, 2021 - 16:07 CDT

Resolved
This incident has been resolved.
Posted Apr 28, 2021 - 16:05 CDT
Identified
The issue has been identified and a fix is being implemented.
Posted Apr 28, 2021 - 15:38 CDT
Update
We are continuing to investigate this issue. If you are experiencing call quality issues, please open a support ticket at HTTP://cooltele.com/ticket to let us know you are among the affected customers. As of the writing of this update, we are aware of only one customer that has been affected by this issue. We will continue to investigate and update this status page as we learn more.
Posted Apr 28, 2021 - 15:15 CDT
Update
We are continuing to investigate this issue.
Posted Apr 28, 2021 - 14:45 CDT
Investigating
Cool Telecom teams are currently investigating reports of degraded audio quality and dropped calls on inbound and outbound traffic for some customers. All appropriate teams are actively engaged. We will provide an update once more information becomes available.
Posted Apr 28, 2021 - 14:18 CDT
This incident affected: Calling (Inbound Calling, Outbound Calling).