All machines disconnected

Incident Report for Connect

Postmortem

Incident Summary

On March 23 at 06:24 UTC, we became aware of an issue where machine production events were not appearing in our cloud platform. Outbound commands to machines continued to function normally, but inbound production data was not being processed. Engineering teams were engaged immediately, and communication with machines was restored by 07:29 UTC. By 09:33 UTC, all carpet machine weaving events had been fully processed.

Impact

Machine operations were not affected during the incident. However, production data sent to the cloud during this period may have been delayed or incomplete. In some cases, customers may notice missing or inconsistent data for the affected timeframe. Our support team is available to assist with any required corrections.

Root Cause

The issue was traced back to a problem in the underlying database layer that temporarily prevented inbound messages from being processed. In addition, automated monitoring did not detect the issue as early as intended.

Resolution

Once the cause was identified, engineering teams restored normal message processing. Data integrity checks were then performed to confirm system consistency.

Preventive Actions

To reduce the likelihood of recurrence and further improve detection and response, we are implementing the following measures:

  • Increased consistency and resilience across infrastructure configurations
  • Enhanced monitoring and alerting mechanisms with improved escalation paths
  • Additional automated checks to detect data flow interruptions earlier
  • Clearer internal procedures for incident response coordination

These improvements will strengthen our ability to detect and resolve issues proactively while maintaining platform reliability.

Posted Mar 25, 2026 - 10:40 UTC

Resolved

This incident has been resolved.
Posted Mar 24, 2026 - 13:44 UTC

Monitoring

The plant view is now fully back online, and all machines have successfully reconnected.
We will continue to monitor the recovery process closely and provide an update as soon as additional information becomes available.
Posted Mar 23, 2026 - 07:32 UTC

Update

We are continuing to work on a fix for this issue and are actively progressing toward full restoration.
Our team remains fully engaged, and we will provide further updates as soon as new information becomes available.
Posted Mar 23, 2026 - 07:31 UTC

Identified

We have identified the root cause of the issue and are currently restoring live communication.
The system should begin reconnecting progressively, and we will continue to monitor the recovery closely.
We will provide a further update once full service has been confirmed.
Posted Mar 23, 2026 - 07:27 UTC

Investigating

In Connect, all machines across all subscriptions are currently showing as disconnected.
We are already looking into this and will carry out the necessary checks to determine the scope and root cause of the issue.
We will keep you updated as soon as we have more information.
Posted Mar 23, 2026 - 07:27 UTC
This incident affected: Monitoring and APIs.