pemvzlin02 difficulties

Notification Type

Emergency Maintenance

Service Affecting

Yes

Message

The above Linux VPS server is having difficulty this morning, and we're working on it now to bring it back. It hosts a number of VPSs, whose primary IPs are:

78.153.208.8
78.153.208.32
78.153.208.50
78.153.208.41
78.153.208.17
78.153.208.79
78.153.208.95
78.153.208.100
78.153.208.122
78.153.208.36
78.153.208.119
78.153.208.193
78.153.208.196
78.153.209.110
78.153.209.144
78.153.209.182
78.153.209.191
78.153.209.201
78.153.209.209
78.153.209.215
78.153.209.224
78.153.209.228
78.153.209.229
78.153.208.186
78.153.208.46
78.153.210.2

We hope to have it back up and running as soon as possible.

Update: 11:10 The machine has successfully come back up after a reboot. However, we're currently running some diagnostics on it to see whether we can find out exactly what is causing the hanging that has been occurring. These diagnostics cannot be run while the VPSs are online, so please hang in there while we work on it.

Update: 11:40 We've replayed the logs from last night and can see that the machine ran out of steam at 02:20, and that at 04:50 it stopped responding for most of the containers on it. The cause appears to be two conflicting backup processes: the CDP backup kicked off at midnight, and then around 02:00 the internal Virtuozzo backups kicked off. Running both at once is not a good idea, so we're going to disable user-scheduled backups on this node. It's far safer to have CDP backups than the internal Virtuozzo ones.
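
For anyone scheduling their own backups, here is a minimal sketch of how a clash like this could be spotted ahead of time. Only the two start times (midnight and 02:00) come from the timeline above. The durations, the function name and the variable names are assumptions made up for illustration, and this is not the tooling used on the node.

```python
from datetime import datetime, timedelta


def windows_overlap(start_a, dur_a, start_b, dur_b):
    """Return True if the half-open windows [start, start + dur) overlap."""
    return start_a < start_b + dur_b and start_b < start_a + dur_a


# Arbitrary calendar date; only the time of day matters here.
day = datetime(2000, 1, 1)

# Start times taken from the incident timeline.
cdp_start = day                       # CDP backup at midnight
vz_start = day + timedelta(hours=2)   # internal Virtuozzo backups around 02:00

# Hypothetical durations, NOT from the incident report.
cdp_duration = timedelta(hours=3)
vz_duration = timedelta(hours=1)

conflict = windows_overlap(cdp_start, cdp_duration, vz_start, vz_duration)
print("Backup windows overlap:", conflict)
```

With real durations taken from the backup logs, the same comparison shows whether a second job would start before the first has finished.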

FYI, we're also running a forced fsck (disk check) on the filesystem that houses the VPSs on this node to ensure that it is in a consistent state. So far it hasn't found any issues, but we want to be 100% sure. In a recent DR test we found that the bare-metal restored filesystem mounted fine but had some underlying issues. A forced fsck fixed the issues we found.

Update: 12:20 All our checks and testing were completed at around 12:10. We rebooted the machine once again, but unfortunately it is now doing another fsck on the /vz filesystem. This will take about 60 minutes to complete.

Update: 12:45 We've managed to get the machine to skip the disk check, as we know there is no issue with the filesystem. It has now booted and the containers are starting.

ETA for being back online is on or before 12:45

