Summary: Sites hosted on pemlinweb19 and pemlinweb20 are currently down due to a faulty PSU (power supply unit) in the hardware node that hosts them.
Update: 14:58 - Both nodes are now back online.
Update: 14:45 - The PSU has been replaced and the hardware node is back up. The two nodes are currently running disk checks and should be back online as soon as possible.
Domains hosted on the following IPs are affected:
81.17.254.79
81.17.254.57
We have spare PSUs and chassis for this type of server on site and one of our engineers is currently performing the swap out.
Further updates will be posted here.
We are seeing disk errors on Ector's RAID array. Due to the nature of the errors, we would like to replace the disk urgently.
This will involve bringing the server down, replacing the disk, and powering it back up again.
Downtime should be no more than 15 minutes.
I will update this blog post once the work is completed. Thanks for your patience.
UPDATE 11:40 - All went as expected. Thanks for your understanding.
We are currently having issues with Ector. It has been rebooted, but we're still waiting for it to come back up.
We are seeing an increased load on our hardware node PEMVZMPS17. This is mainly due to increased visitor traffic to some of the sites on this node.
It is starting to outrun its capacity. To ensure the best customer experience, the node's CPU and memory are being upgraded immediately.
The downtime is estimated at no more than 15 minutes, beginning at 08:10AM.
Affected Linux Shared Hosting nodes:
81.17.254.77 pemlinweb17.blacknight.com
81.17.254.78 pemlinweb18.blacknight.com
I'll update this blog post once the maintenance is completed.
UPDATE 08:22AM - This is now complete
We are currently experiencing issues with one of our shared Windows hosting servers - pemwinweb08 / 81.17.250.65
We hope to resume normal service shortly.
Update: 08:12 - All services are now running normally.
Update: September 16th 08:40
We have identified a 3km fibre link between our two data centres that is suffering intermittent packet loss, which has been causing slowdowns for some customers on their websites and VPS servers. This link has been taken out of service apart from one network segment that we're using to diagnose the problem as we replace equipment on either end. This is NOT service affecting.
Last night at 23:00 we replaced the media converters (see Note 1) in both locations, one at a time, and performed our throughput and packet loss tests. Replacing both media converters had no effect on the issue, so this morning we're looking at other equipment, including cross connects in both locations. Further updates will be posted when we have more information.
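For readers curious what a packet loss figure means in practice, here is a minimal sketch in Python. It is only an illustration, not the tooling our engineers used: the function names and the 1% threshold are assumptions chosen for the example.

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Return the percentage of test packets that never arrived."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


def link_is_lossy(sent: int, received: int, threshold_percent: float = 1.0) -> bool:
    """Flag a link whose loss exceeds the threshold.

    The 1% default is a hypothetical value for illustration only.
    """
    return packet_loss_percent(sent, received) > threshold_percent


if __name__ == "__main__":
    # Example: 1000 probes sent across the link, 950 answered.
    print(packet_loss_percent(1000, 950))
    print(link_is_lossy(1000, 950))
```

Running the same calculation before and after swapping a piece of equipment shows whether the swap changed the loss rate on the link.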
For now we're considered "At Risk" as our redundancy between the two facilities is affected by this issue.
Note 1: A media converter is a device that converts optical signals into electrical signals (and vice versa) for network purposes, i.e. it takes a fibre connection on one side and a copper connection on the other and passes the network data from one to the other.
We have received several reports of network issues including packet loss.
Our technical team are investigating the issue in depth.