We are currently experiencing issues with our Linux shared hosting node, Balin.
Our engineers are working on resolving this as soon as possible. This blog post will be updated once resolved.
Update 16:47: This issue has now been resolved.
We have been alerted to a disk failure in both PEMVZMPS20 and PEMVZMPS23. Thanks to their RAID configuration, no data has been lost.
To restore the RAID arrays to full redundancy, we are going to replace the failed disks tonight.
When: 3rd of March 2010 at 21:00 (expected downtime: 15 minutes per node).
Affected nodes are:
81.17.254.85 pemlinweb04.blacknight.com
81.17.254.86 pemlinweb05.blacknight.com
81.17.254.45 mysql71.cp.blacknight.com
We will update this blog post once completed.
Update 21:32: Work on PEMVZMPS23 has been completed, and both pemlinweb04 and pemlinweb05 are back online. We are waiting for a disk check to finish on PEMVZMPS20; once it completes, the node will be brought back online. ETA: 15 minutes.
Update 22:20: Both nodes are back up and services have been restored. While the RAID arrays rebuild, things will be a little sluggish. The rebuild will take several hours to complete.
Ragnell is currently having issues that are causing it to become unresponsive. An engineer is on site bringing it back up, and it should be available again within the next 10 minutes.
We are currently experiencing issues with our shared hosting server Ragnell. Our engineers are currently working on resolving this.
Update 5:58PM: This has been resolved.
Due to the increased traffic we are seeing on one of our MySQL nodes, we are going to migrate it to its own dedicated hardware.
The downtime will take place tonight, Feb 8th 2010, at 22:00.
The downtime for the affected node will be no more than 45 minutes.
The affected node is:
mysql106.cp.blacknight.com
mysql106int.cp.blacknight.com
We will update this post once completed.
Update: This work has now been completed.
We are currently experiencing issues with our shared hosting server Bors (81.17.252.40).
Our engineers are currently working on resolving this and will update this post once more information is available.
Update 6:52PM: This issue has been resolved.
At around 15:30, balin.blacknight.ie stopped responding to requests. Our engineer on site was also unable to log in locally, so we rebooted the server straight away, and it was back up by 15:40.
While we don't yet know exactly what caused the issue, it appears that the server ran out of memory, possibly due to a massive surge of MySQL queries.
This evening, PHP on Gorlois, Priamus and Rivalin will be upgraded to PHP 5.2.12 to fully resolve an issue we have been having with Installatron. Only a few seconds of downtime should be required while Apache restarts with the new version.
UPDATE: 18:10
All three servers have been successfully updated. Gorlois had slightly more downtime than expected, because an automated process restarted Apache in the background while PHP was being updated. This caused PHP to stop working for about 2 minutes.
UPDATE: 08:52 Tuesday Feb 2nd
As mentioned above, the server Gorlois had some additional knock-on problems after this upgrade. During the PHP compile, Exim stopped accepting e-mail from external senders. This was not caught until some time later, when the alerting system sent its first reminder that the service was down. Gorlois' SMTP service came back at 20:17 last night, at which point people were able to send e-mail again and inbound e-mail from outside started to flow. We don't believe any e-mail was lost during this window.
Due to the increased amount of traffic to the shared Linux servers located on the hardware node PEMVZMPS15, we will be increasing its performance with a CPU upgrade.
The affected services are:
81.17.254.72 pemlinweb13.blacknight.com
81.17.254.73 pemlinweb14.blacknight.com
The upgrade will begin at 06:30 on the 21st of January 2010 and should cause no more than 15 minutes of downtime.
We will update this blog post once completed.
This is now completed.
A disk within the RAID array on the hardware node PEMVZMPS28 has failed. We are scheduling some emergency maintenance to resolve this as soon as possible.
The window for the maintenance will begin at 7:00AM on the 15th of January.
The downtime is estimated at no more than 20 minutes.
The affected nodes are as follows:
81.17.254.64 pemlinweb25.blacknight.com
81.17.254.67 pemlinweb26.blacknight.com
This post will be updated once completed.
Update: This has been completed.
Update 10:37: There is currently an issue with the data on these two nodes. When the disk was replaced, the wrong drive was removed, which resulted in the dead drive springing back to life this morning.
The drive died on the 10th of January, and as a result all data has reverted to that date. Our engineers are en route to replace it with the drive that was in use until 8AM this morning. We hope to have this resolved within the next 2 hours.
Update 11:39: This server is now being taken offline to replace the hard drive.
Update 11:54: All services have been restored to the correct disk.
Note: As the faulty hard drive was in the server between 08:00AM and 11:39AM, any changes made to your files during that period have been lost. We apologise for any inconvenience this may have caused.