Deliver Web and Route Outage

Incident Report for Innovo

Resolved

This incident has been resolved. We created a couple of indexes over the weekend to speed up some slower queries, and we believe that should address this issue moving forward. We appreciate everyone's patience!

Posted Jan 27, 2025 - 07:17 MST

Update

We still haven't found the actual root cause of yesterday's outage, but we did find a spike in IOPS (input/output operations per second), the standard measure of how many read and write operations storage performs each second. This could have been caused by the portal change implemented this past weekend to turn on Deliver alerts by default for everyone, or it could have been caused by a bad query. We are just not sure yet. There were also out-of-memory issues, so we increased the heap size to account for that. We will monitor IOPS today to see whether the IOPS limit needs to be increased as well. Again, we really appreciate everyone's patience yesterday and apologize for the inconvenience. Fortunately, operations were only affected for a short time.

Posted Jan 24, 2025 - 07:29 MST

Monitoring

We have identified and fixed the issue, and all services should be coming back online. This will be a rolling process because we had to make a DNS change. We have not yet found the root cause, so we will continue to monitor the systems throughout the day and provide additional updates as needed. Unfortunately, any breadcrumb data that accumulated in the Deliver app during the outage will not show in Deliver Web. You may also see manifests that were completed during the outage that do not show a completed status in Deliver Web. All communication with Eclipse remained intact during the outage, so signatures, photos, comments, etc. were updated as normal. We really appreciate your patience and will let you know as soon as we know more.

Posted Jan 23, 2025 - 09:50 MST

Investigating

We are currently investigating the issue and will provide status updates.

Posted Jan 23, 2025 - 08:32 MST

This incident affected: Deliver and Route.