Partial Inbound/Outbound Calling Outage
Incident Report for OneCloud
Postmortem

Incident Summary:
On July 3, 2024, between 15:15 UTC and 17:34 UTC, OneCloud experienced degraded inbound and outbound calling. An upstream carrier sent incorrectly tagged calls, which triggered OneCloud's DDoS protections and significantly reduced call processing speed.

Root Cause:
The upstream carrier was sending calls that were not correctly tagged for OneCloud services. OneCloud rejected these calls with error messages, but the carrier kept retrying them instead of stopping. The resulting flood of requests triggered OneCloud's DDoS protections.
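
The retry behavior described above can be illustrated with a minimal sketch of a sender that stops as soon as a call is rejected with a non-retryable error. This is not the carrier's or OneCloud's actual logic. The `Response` type, its `retryable` flag, and the `place_call` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Response:
    """Hypothetical result of one call attempt (not a real OneCloud type)."""
    accepted: bool
    retryable: bool  # assumption: the error tells the sender whether a retry can succeed


def place_call(send: Callable[[], Response], max_attempts: int = 3) -> Tuple[Response, int]:
    """Attempt a call, retrying only while the failure is marked retryable.

    Returns the last response and the number of attempts made.
    """
    attempts = 0
    while True:
        attempts += 1
        resp = send()
        # Stop on success, on a permanent rejection, or when the budget is spent.
        if resp.accepted or not resp.retryable or attempts >= max_attempts:
            return resp, attempts
```

Under this sketch, a call rejected as non-retryable (for example, because it is incorrectly tagged) is attempted once and then dropped. A sender that ignores the rejection would instead generate one new request for every retry.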

Impact:

  • Reduced speed of call processing
  • Potential disruption to customer communications
  • Temporary degradation of service quality

Detection:

  • OneCloud engineers identified the issue through monitoring systems and error logs. The first status update was posted at 15:15 UTC, and the issue was reported as identified at 16:48 UTC.

Resolution:

  • OneCloud engineers immediately engaged with the upstream carrier to resolve the incorrectly tagged calls.
  • The carrier corrected the tagging issue, stopping the flood of erroneous call attempts.

Corrective Actions:

  • OneCloud has refined its DDoS protection mechanisms to narrow the impact when triggered, improving service resilience.
  • OneCloud has enhanced its monitoring and alerting systems to detect similar issues more quickly in the future.
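
One way to narrow the impact of a triggered protection is to scope it to the offending source, so that traffic from other sources is not slowed. The sketch below shows that idea with a per-source token bucket. It is an assumption about the general technique, not a description of OneCloud's actual DDoS protection. The class name `PerSourceLimiter` and its parameters are hypothetical.

```python
import time
from collections import defaultdict
from typing import Callable, Dict


class PerSourceLimiter:
    """Token-bucket rate limiter keyed by source (hypothetical sketch).

    Each source gets its own bucket, so one source exceeding its rate
    is throttled without affecting the others.
    """

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate      # tokens refilled per second, per source
        self.burst = burst    # maximum tokens a source can hold
        self.clock = clock    # injectable clock, useful for testing
        self._tokens: Dict[str, float] = defaultdict(lambda: float(burst))
        self._last: Dict[str, float] = {}

    def allow(self, source: str) -> bool:
        """Return True if a request from `source` may proceed now."""
        now = self.clock()
        last = self._last.get(source, now)
        self._last[source] = now
        # Refill this source's bucket for the time elapsed, capped at burst.
        tokens = min(float(self.burst),
                     self._tokens[source] + (now - last) * self.rate)
        if tokens >= 1.0:
            self._tokens[source] = tokens - 1.0
            return True
        self._tokens[source] = tokens
        return False
```

With `PerSourceLimiter(rate=1.0, burst=2)`, a source that sends three requests at once has the third one rejected, while a request arriving from a different source at the same moment is still allowed.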

Future Prevention:

  • Regular audits of carrier integrations and tagging protocols
  • Continued refinement of DDoS protection systems
  • Improved communication channels with upstream carriers for faster issue resolution

We apologize for any inconvenience caused to our customers during this incident. OneCloud remains committed to providing the highest quality of service and will continue to improve our systems to prevent similar occurrences in the future.

OneCloud-After-Incident-RCA-07032024.pdf

Posted Jul 03, 2024 - 20:35 UTC

Resolved
The issue causing the partial outage affecting inbound and outbound calling services has been fully resolved. All services are now operating normally.
Posted Jul 03, 2024 - 18:24 UTC

Monitoring
We have successfully implemented a fix for the partial outage affecting inbound and outbound calling services. Our team is now monitoring the situation to ensure that the issue has been fully resolved.
Posted Jul 03, 2024 - 17:34 UTC

Identified
We have identified the issue causing the partial outage affecting inbound and outbound calling services. Our engineering team is actively working on implementing a fix.
Posted Jul 03, 2024 - 16:48 UTC

Investigating
We are currently investigating reports of a partial outage affecting inbound and outbound calling services. Our engineering team is actively investigating the issue to identify the root cause and restore full service as quickly as possible.
Posted Jul 03, 2024 - 15:15 UTC

This incident affected: OneCloud (Inbound Calling, Outbound Calling).