Notification Type
Technical Information
Date
December 3, 2009 11:05 AM
Service Affecting
Yes
Message
Summary: Today this server has powered itself off twice. We've been working on the issue since 8am this morning, and it appears that it may be due to a cooling problem localised to this rack. We may have to swap its position in the rack, and while doing so we'll check that all the fans are operating properly; if they're not, we'll swap them.
We've also ordered additional vented tiles for this part of the data centre, which will improve air flow for the servers in this rack. These will be installed today.
The following VPSs are affected by this issue:
vps-642.cp.blacknight.com
vps-841.cp.blacknight.com
vps-890.cp.blacknight.com
vps-1110.cp.blacknight.com
vps-1118.cp.blacknight.com
vps-1126.cp.blacknight.com
vps-1142.cp.blacknight.com
vps-1164.cp.blacknight.com
vps-1194.cp.blacknight.com
vps-1198.cp.blacknight.com
We hope to have service reliably restored within the next few hours. An emergency maintenance window may be necessary.
Update: 11:31 December 3rd
We're shutting this machine down to replace a dead fan on CPU0. The dead fan is causing CPU0's core temperature to rise above 100°C, at which point the machine shuts itself down to protect the CPU.
Update: 11:42 December 3rd
This machine is back in production now with a new heatsink. VMs are starting up and should be fully online by 11:55 or 12:00.
Update: 11:56 December 3rd
All VMs are now back. The core temperatures of all CPUs in this node are now returning to normal. This issue is now closed.