Notification Type
Emergency Maintenance
Service Affecting
Yes
Message
Summary: This afternoon at approximately 12:04, one of the hypervisor nodes in the cloud crashed. While the system was moving the handful of VMs to other nodes after this event, it became unstable. The software in question is OnApp, and it appears to be quite unstable.
Current situation:
There are a lot of pending tasks in the queue that are supposed to be starting Virtual Machines. Most, if not all, of these appear to be defunct and will not cancel. We have a number of VMs back online and we're working to bring the rest of them back online.
We're requesting a full root cause analysis from OnApp on this issue. Moving VMs off a failed node is a standard part of what the software is supposed to manage, so we can't see why it became so unstable.
Update 14:45: All customer VMs should now be back online. OnApp want us to upgrade to the latest code base, which they say will avoid such instability issues in the future. We are going to postpone this until next week, when we'll have ample staff resources on the development side to fix any issues that might crop up.