Between 2:19 and 2:31 JST, our internal time-series database (TSDB) experienced an issue that affected a portion of metrics data. As a result:
• BizStack Console and Assistant failed to display metric data from the most recent 24 hours.
• Threshold-based alerts intermittently did not trigger as expected.
• Heartbeat (liveness) alerts produced false positives.
These issues were intermittent and did not affect all customers or all metrics.
The incident was triggered during maintenance work to add a new read replica to the MongoDB cluster backing our TSDB.
• The new read replica was mistakenly created from a one-month-old snapshot rather than from the latest snapshot, which standard operations use.
• Because the snapshot was outdated, the new replica required additional time (about 10 minutes) to catch up with recent data.
• Some TSDB queries were routed to this replica before it had fully synchronized, causing queries to return incomplete or stale data.
Under normal procedures, replicas are created from up-to-date snapshots, allowing near-immediate synchronization. This operational error introduced unexpected replication lag.
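The following Python sketch is a simplified, hypothetical model of why reads from a lagging replica could produce both alert symptoms listed above. It is not MODE's TSDB, routing, or alerting code. It assumes a replica can only serve points it has already replicated, so recent points are invisible until it catches up. All names (`Replica`, `applied_until`, `heartbeat_alert`, `threshold_alert`) and all numbers in the scenario are illustrative assumptions.

```python
from dataclasses import dataclass

Point = tuple[int, float]  # (timestamp in seconds, metric value)


@dataclass
class Replica:
    """Hypothetical TSDB replica that can only serve data it has already replicated."""
    name: str
    applied_until: int  # newest timestamp this replica has applied

    def query(self, points: list[Point], start: int, end: int) -> list[Point]:
        # Points newer than applied_until have not reached this replica yet,
        # so they are silently missing from the result (a stale read).
        return [(t, v) for t, v in points
                if start <= t <= end and t <= self.applied_until]


def replication_lag(primary_latest: int, replica: Replica) -> int:
    """Seconds of data the replica is still missing relative to the primary."""
    return max(0, primary_latest - replica.applied_until)


def heartbeat_alert(points: list[Point], now: int, timeout: int) -> bool:
    """Liveness check: fire if no point arrived within `timeout` seconds."""
    if not points:
        return True
    return now - max(t for t, _ in points) > timeout


def threshold_alert(points: list[Point], threshold: float) -> bool:
    """Fire if any visible value exceeds `threshold`."""
    return any(v > threshold for _, v in points)


# Illustrative scenario: a device reports every 60 s for one hour,
# with a single spike shortly before "now".
now, window = 3600, 300
points = [(t, 99.0 if t == 3540 else 10.0) for t in range(0, now + 1, 60)]
synced = Replica("synced", applied_until=now)
lagging = Replica("lagging", applied_until=now - 1200)  # still catching up

for replica in (synced, lagging):
    visible = replica.query(points, now - window, now)
    print(replica.name,
          "lag_s:", replication_lag(now, replica),
          "heartbeat_alert:", heartbeat_alert(visible, now, timeout=120),
          "threshold_alert:", threshold_alert(visible, threshold=50.0))
```

With these assumed numbers, the synchronized replica returns a fresh data point and the spike. The lagging replica returns nothing for the last five minutes, so the heartbeat check fires even though the device is healthy (a false positive), and the threshold check misses the spike.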
MODE regularly adds and rotates replica nodes in production as part of routine maintenance and to keep the cluster resilient. The issue occurred during one of these routine operations.
After synchronization completed, metrics and alerting behavior returned to normal.
We are implementing the following improvements to prevent recurrence:
We apologize for the disruption and appreciate your understanding as we continue to improve the reliability of MODE’s monitoring infrastructure.