Summary: pemwinweb08 is currently experiencing issues. Our engineering team are on the case and hopefully it'll return to normal service shortly. We'll post further updates here.
Update: 08:20 The server is back now. The Frontnet (public-facing network adapter) disappeared. We believe this to be related to a bug in our provisioning software or some sort of configuration conflict. On reboot the interface came back up and the machine is back online. We're investigating this issue further.
Summary: Some of the services behind cp.blacknight.com are currently unstable, which is causing the control panel to show 404 and proxy errors. We're working with our software vendor to resolve this as soon as possible. We hope to put the problem behind us this week once and for all. Further updates will be posted here when we have them.
Update: 09:38
cp.blacknight.com is now back up, but it's considered unstable until further notice. As mentioned above, our vendor (Parallels) have been working on this for approx 2 weeks and twice they've informed us that it's been fixed, yet we keep seeing more problems. Further updates will be posted here when we have them.
We've been experiencing some issues with the store this evening.
Our technical team are working on resolving the issue as quickly as possible.
This should not have any impact on any existing services - just the actual shopfront.
UPDATE 10:00 - this issue has been resolved.
When: 03:00 to 06:00 on Friday Feb 12th.
Summary: In order to diagnose issues with the Java service that drives the UI in our control panel, our software vendor wishes to perform several restarts of the service. We'll minimise the hit on customers by doing this between 03:00 and 06:00 tomorrow, Feb 12th.
What: In order to diagnose the problems which cause 404 Tomcat errors and proxy gateway errors on cp.blacknight.com, Parallels want to put their application into "deep investigation" mode. This will involve several restarts and potentially prolonged bouts of downtime (10-15 minutes) during this maintenance window.
We'll hopefully see a more stable control panel once they've done this and found the root cause of the problem.
The backup service, backup.mybackup.ie, will be offline for the majority of the day due to some file system errors we've seen on one of its SAN volumes.
Our engineers are working on this currently and this blog post will be updated once completed.
Update: 16:50
The file system check completed at approx 14:30. We restarted the system and it's now back working 100%. We've also taken some steps to get notified more quickly if a file system changes from RW to RO (which happens when the FS is damaged for some reason).
This issue is now resolved fully.
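For anyone wanting a similar check on their own servers, here is a minimal sketch of how a read-only file system could be detected. It is not the monitoring we actually run; it assumes a Linux host where the mount table is readable from /proc/mounts, and the function name read_only_mounts is purely illustrative.

```python
from typing import List


def read_only_mounts(mounts_text: str) -> List[str]:
    """Return the mount points whose option list contains 'ro'.

    Expects text in the /proc/mounts layout (an assumption of this sketch):
    device mountpoint fstype options dump pass
    """
    read_only = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Match the exact option 'ro', not substrings such as
        # 'errors=remount-ro'.
        if "ro" in options:
            read_only.append(mount_point)
    return read_only
```

In practice the function would be fed the contents of /proc/mounts from a scheduled job, and an alert raised for any volume that is expected to be writable. Some mounts are legitimately read-only, so a real check would compare the result against a list of volumes that should be RW.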
Ragnell is currently having issues which are causing it to become unresponsive. There is an engineer on site getting it back up, and it should reappear within the next 10 minutes.
Summary: The hardware node pemvzmps32, which contains pemlinweb31 and mysql452, is experiencing issues. This is resulting in strange behaviour such as FTP timing out and high load on the MySQL node. We plan to move these two containers off this node tonight between 22:00 and 01:00, at which time the issues that people are experiencing will be fixed. Unfortunately this can't be resolved any sooner without significant downtime.
EURid, the registry operator for .eu, are conducting maintenance work tomorrow, February 10th 2010, from 06:00 to 06:30 CET.
During this period no new registrations or updates will be possible. WHOIS will also not be available.
Existing .eu domain names will continue to work as normal.
We are currently experiencing issues with our shared hosting server Ragnell. Our engineers are working on resolving this currently.
Update 5:58PM This has been resolved.
Due to increased traffic we are seeing on one of our MySQL nodes, we are going to migrate it to its own dedicated hardware.
The downtime will occur tonight at 22:00, Feb 8th 2010.
The downtime for the affected node will be no more than 45 minutes.
The affected node is:
mysql106.cp.blacknight.com
mysql106int.cp.blacknight.com
We will update this post once completed.
Update: This work has now been completed.
We are currently experiencing issues with our shared hosting server Bors (81.17.252.40)
Our engineers are working on resolving this currently and will update this post once more information is available.
Update 6:52PM: This issue has been resolved.
Due to a software error we are seeing on the hardware node PEMVZWIN02, we need to reboot this node as soon as possible.
The node will be down for no more than 10 minutes per VPS on the server.
The affected VPSs are:
78.153.208.116 VPS-238
78.153.208.123 VPS-258
78.153.208.16 VPS-262
78.153.208.127 VPS-282
78.153.208.28 VPS-287
78.153.209.211 VPS-288
78.153.208.154 VPS-317
78.153.208.168 VPS-331
78.153.208.149 VPS-337
78.153.208.170 VPS-341
78.153.208.172 VPS-342
78.153.208.169 VPS
78.153.208.171 VPS-344
78.153.209.151 VPS-346
78.153.208.15 VPS-347
78.153.208.62 VPS-353
78.153.208.176 VPS-354
78.153.208.177 VPS-355
78.153.208.179 VPS-357
78.153.208.184 VPS-362
78.153.210.75 VPS-387
78.153.208.205 VPS-390
78.153.208.218 VPS-407
78.153.208.222 VPS-409
78.153.208.223 VPS-410
78.153.209.118 VPS-620
78.153.209.160 VPS-664
78.153.209.164 VPS-667
78.153.209.176 PARTYCENTRAL
78.153.209.107 VPS-710
We will update this post once completed.
Update 12:57: This server has now been rebooted. We are bringing all VEs back online.
Update 13:17: All services are resumed.
At around 15:30 balin.blacknight.ie stopped responding to requests. As we had an engineer on site who also couldn't log in locally, we rebooted it straight away and it was back up by 15:40.
While we don't know exactly what caused the issue yet, it looks like the server ran out of memory, possibly due to a massive surge of queries to MySQL.
The server Bediver (81.17.248.30) needs to be rebooted urgently.
Our engineers are doing this now and the downtime will be no more than 5 minutes.
UPDATE: This is complete.
Pemwinweb09 (81.17.250.41) is currently experiencing issues. We're looking into it and will update here as soon as we know more.
UPDATE 17:38: The server is up and running again and we're currently investigating to find the cause of the outage.
There will be an upgrade of the PHP versions on Gorlois, Priamus and Rivalin to PHP 5.2.12 this evening in order to fully sort an issue we've been having with Installatron. There should be only a few seconds downtime required as Apache restarts with the new version.
UPDATE: 18:10
All three servers have been successfully updated. Gorlois had slightly more downtime than expected as an automated process restarted Apache in the background while PHP was being updated. This caused PHP to stop working for about 2 minutes.
UPDATE: 08:52 Tuesday Feb 2nd
As mentioned above, the server Gorlois had some additional knock-on problems after this upgrade. During the compile of PHP, Exim stopped accepting e-mail from outside. This wasn't caught until the alerting system sent its first reminder about the service being down some time later. Gorlois' SMTP service came back at 20:17 last night, at which point people were able to send e-mail again and e-mail from outside started to flow. We don't believe any e-mail was lost during this window.