Summary: One of the SAN devices connected to this cluster has a dodgy file system. It's currently being recovered, but this will take up to 8 hours. There will be no data loss. This is semi-expected, as the file system hasn't been checked for over a year and has recently had huge volumes of data removed from it.
Update: 11:00 am
This file system has been restored and the services are now back online.
We have been alerted to a disk failure in both PEMVZMPS20 and PEMVZMPS23. Thanks to their RAID configuration, no data loss has occurred.
To get the RAID arrays back up and running to their full potential, we are going to replace the failed disks tonight.
When: 3rd of March 2010 at 21:00 (expected downtime: 15 minutes per node)
Affected nodes are:
81.17.254.85 pemlinweb04.blacknight.com
81.17.254.86 pemlinweb05.blacknight.com
81.17.254.45 mysql71.cp.blacknight.com
We will update this blog post once completed.
Update 21:32 - PEMVZMPS23 has been completed, and pemlinweb04/05 are both back online. We are waiting on a disk check to finish on PEMVZMPS20. Once it's complete, the node will be brought online. ETA 15 minutes.
Update: 22:20 - both nodes are back up and the services are restored. As the RAID arrays are rebuilding, things will be a little sluggish until the rebuild completes. This will take several hours.
Summary: Tonight, Monday March 1st, we're moving the provisioning node known as OSSCORE to new hardware. This is due to the sheer volume of customers going into the system on a daily basis, which has made our provisioning back end slower and slower and is in turn causing cp.blacknight.com (web hosting) to slow down.
When: March 1st 2010 starting at 22:00 until 01:00 on March 2nd.
What: From 22:00 we'll be working on the migration. The control panel on cp.blacknight.com will be turned off for the duration of this migration in order to prevent inconsistencies.
Services affected: Management of your web hosting plans, e-mail, databases, etc. will not be available during this window, but your hosting will be unaffected.
Update 23:00: This window has been completed successfully, a full 2 hours early.
Summary: pemwinweb08 is currently experiencing issues. Our engineering team are on the case and hopefully it'll return to normal service shortly. We'll post further updates here.
Update: 08:20 The server is back now. The Frontnet (public-facing network adapter) disappeared. We believe this to be related to a bug in our provisioning software or some sort of configuration conflict. On reboot the interface came back up and the machine is back on the air. We're investigating this issue further.
Summary: Some of the services behind cp.blacknight.com are currently unstable, which is causing the control panel to show 404 and proxy errors. We're working with our software vendor to resolve this as soon as possible. We hope to put the problem behind us this week once and for all. Further updates will be posted here when we have them.
Update: 09:38
cp.blacknight.com is now back up, but it's considered unstable until further notice. As mentioned above, our vendor (Parallels) have been working on this for approximately 2 weeks, and twice they've informed us that it's been fixed, yet we keep seeing more problems. Further updates will be posted here when we have them.
We've been experiencing some issues with the store this evening.
Our technical team are working on resolving the issue as quickly as possible.
This should not have any impact on any existing services - just the actual shopfront.
UPDATE 10:00 - this issue has been resolved.