Inbound/Outbound Service Impact
Incident Report for Cool Telecom
Postmortem

Event Summary:

At 10:47 AM Central time on September 22nd, Inteliquent (one of our upstream voice providers) experienced an issue that impacted some customers and call flows transiting our Denver POP. Inteliquent engineering teams and on-site field staff were engaged immediately. The issue caused severe protocol flapping on the routers, which also made it difficult to get consistent access to the routers for diagnosis.

At 11:40 AM, the engineers determined that there was a loop and, because of the access issues, decided to turn down one of the routers in the pair to stabilize the network. The shutdown of the router immediately stabilized the local transit infrastructure and restored most of the impacted services. However, there were still some indications of call failures through the Denver infrastructure. Additional investigation revealed call processing issues on a couple of voice network elements. The engineering teams determined that the logic in those devices had been corrupted by the routing instability and that the devices needed to be failed over to their redundant sides to clear the issues.

These devices were switched over between 12:32 PM and 12:34 PM CT on the same day, after which call flows stabilized and all services were restored.

Reason For Outage:

A loop between two edge routers in the Denver POP was created during new activation work; the underlying cause was an error in our records and assignment processes. Essentially, a port was assigned to the new activation even though it had previously been used for a connection between the routers, and that connection had not been physically removed.
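
The failure mode described above (records showing a port as free while a physical connection was still in place) can be illustrated with a small sketch. This is not the provider's actual tooling; the `PortRecord` type, its fields, and the port names are hypothetical, and the only assumption is that both the inventory state and the observed link state are available before assignment.

```python
from dataclasses import dataclass


@dataclass
class PortRecord:
    """Hypothetical view of one router port (names are illustrative only)."""
    port_id: str
    assigned_in_records: bool  # what the inventory/assignment system says
    link_detected: bool        # what the device itself reports (cable/link present)


def safe_to_assign(port: PortRecord) -> bool:
    """Treat a port as free only if records AND observed state both agree."""
    return not port.assigned_in_records and not port.link_detected


# Records say "free", but an old inter-router cable is still plugged in.
stale = PortRecord("edge1:port-7", assigned_in_records=False, link_detected=True)
# Records say "free" and nothing is connected.
free = PortRecord("edge1:port-8", assigned_in_records=False, link_detected=False)

print(safe_to_assign(stale))
print(safe_to_assign(free))
```

In this sketch the stale port is rejected because the observed link state disagrees with the records, which is the kind of mismatch that led to the loop.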

A review of the migration process determined that this migration was atypical due to some hardware resourcing issues in supporting the migration. The additional delay disrupted the typical activation-migration-disconnect process, and the disconnect portion was missed.
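
As an illustration of how a missed step in the activation-migration-disconnect sequence might be caught, here is a minimal sketch. The step names come from the report; the functions `missing_steps` and `can_close` are hypothetical and do not represent any real work-order system.

```python
# The three stages named in the report, in their usual order.
REQUIRED_STEPS = ("activation", "migration", "disconnect")


def missing_steps(completed):
    """Return the required steps that have not been recorded as completed."""
    done = set(completed)
    return [step for step in REQUIRED_STEPS if step not in done]


def can_close(completed):
    """A work order may be closed only when no required step is missing."""
    return not missing_steps(completed)


# A delayed migration where the disconnect was never recorded.
delayed_order = ["activation", "migration"]
print(missing_steps(delayed_order))
print(can_close(delayed_order))
```

The point of the sketch is that completion is checked against the full list of required steps rather than inferred from the most recent one, so a delay between stages does not let the final disconnect go unnoticed.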

Posted Sep 24, 2021 - 14:41 CDT

Resolved
This incident has been resolved.
Posted Sep 22, 2021 - 14:47 CDT
Monitoring
Thank you for your patience while our engineering team worked through this network disruption. Our engineers were able to divert traffic to alleviate the impact of this issue. We have performed detailed internal testing, and all services appear to be operational at this time. Please retest and reach out to our support team at http://ticket.cooltele.com for further assistance.
Posted Sep 22, 2021 - 12:55 CDT
Identified
The issue has been identified and a fix is being implemented.
Posted Sep 22, 2021 - 11:58 CDT
Update
We are continuing to investigate this issue.
Posted Sep 22, 2021 - 11:51 CDT
Investigating
We have reports from some of our users that they are unable to make outbound calls. A few of those customers have also reported an inability to receive calls. We are investigating this issue and will provide more information as soon as we receive it.
Posted Sep 22, 2021 - 11:27 CDT
This incident affected: Calling (Inbound Calling, Outbound Calling).