December 2010 Archives

PEMVZWIN14 Server Issues

We are currently experiencing licensing issues on PEMVZWIN14, which have caused it to shut down the VPSs running on it. We are working with the vendors to get it back up and running as soon as possible.

UPDATE 12:45: This has now been resolved and all VPSs should be back up and running.

Windows VPS Hardware node pemvzwin02

The above server has stopped responding. We have dispatched an engineer to have a look at it.

Update 11:30: The server was rebooted and all services & VPSs have returned to normal.

Brief network interruption between 13:46 and 13:50

Summary: At approx. 13:46 today we experienced some packet loss on our INEX-facing connections. This caused some BGP flaps: our routes were withdrawn from several ISPs and then re-advertised, which would have appeared as a brief network outage. We've received a report from INEX explaining what caused this issue.

The report follows:

"Due to a port configuration issue, some INEX members may have experienced
packet loss today between 13:46:04 UTC and 13:50:46 UTC today (2010/12/23).

The reason for this packet loss was that the configuration on some
member-facing ports was changed to include unicast storm control.  While
this should have been a non service-affecting operation, it appears that
this was not the case.  We will shortly be opening up a support case about
this, as the packet loss was unexpected."

We don't expect any further issues as a result of this.

Linux DirectAdmin Server: Gorlois

We are currently experiencing issues with our shared Linux server gorlois.blacknight.ie. Our engineers are working on this issue at the moment and will update this post once complete.

UPDATE 13:47 - This issue has been resolved.


Legacy Linux Shared Hosting Server - rivalin

This server is currently experiencing a high load, and as such may be slow.

We are trying to bring the load down and restore normal services.

 

Update: We had to reboot this server, and it is now carrying out a file system check. We apologise for this unexpected downtime.

Update 2: The file system check has been completed and all services have returned to normal.

smtp1r.cp.blacknight.com / mail.blacknight.com physical move

Summary: As many of our customers will have noticed, there have been problems with e-mail over the past 2 months or so. Our plan for this is simple. We've built a new storage system designed specifically to take the current load. The move to this new platform will be in two parts. The first is to consolidate all mail equipment in a rack with enough space for expansion. The second phase we'll announce next week, after the successful move of the servers.

The first part of the plan is to move all the existing equipment to a new rack in the same cage. On Monday 20th of December at 07:00 we will shut down e-mail and proceed to re-rack the equipment.

This will mean there will be a 1 hour outage of e-mail services where you won't be able to check your e-mail or send any e-mail.

What's affected: All e-mail services for hosting accounts administered from cp.blacknight.com will be affected for a duration of 1 hour on the morning of Monday 20th of December.

UPDATE 20/12/2010 07:45AM - We are rescheduling this maintenance for the 21st of December 2010 at the same time of 07:00AM.

UPDATE 21/12/2010 08:30AM - This move is complete. It ran over the 1 hour by about 15 minutes; apologies for that. A further post will be put in place over the Christmas break, when we'll move the mail to the new system.


pemvzwin02 unexpected restart

Summary: At 10:26 on Thursday December 16th the above named server blue-screened and rebooted itself. We are currently unsure whether Patch Tuesday is to blame. Our own internal SUS shouldn't have pushed updates to it, but it's possible it did.

The containers on this node are starting already and should be back by 10:50 approx. We'll post further updates on this thread if required.

Before we had finished writing this status update, all the containers were already fully back up and running. Please check to ensure that your own internal services have resumed correctly.



Christmas / New Year Opening Hours

As Christmas is next week, here are our opening hours over the break:

Friday 24th December 2010 - 0900 to 1300
Saturday 25th / Sunday 26th December 2010 - closed
Monday 27th / Tuesday 28th December 2010 - 1200 to 1600
Wednesday 29th / Thursday 30th / Friday 31st December 2010 - 0900 to 1800
Saturday 1st / Sunday 2nd January 2011 - closed
Monday 3rd January 2011 - 1200 to 1600
Tuesday 4th January 2011 - normal office hours resume

NB: We have staff on duty 24/7/365. If you have a dedicated server or colo, please refer to the "out of hours" contact details.

And, in case we forget, have a good break!

Issues On Igraine

We were forced to remotely reboot igraine as the load went up so high that it became unresponsive. Unfortunately, on reboot a file system check was required, which slowed things down. This has completed, and the server should be back now.

PEMVZWIN04 Hardware Node

We are currently experiencing issues with our hardware node PEMVZWIN04.

An engineer is currently en route to resolve the issue, and this status post will be updated with details as soon as they are available.

UPDATE 7:31AM: The issue has been found to be a hard disk failure. The engineers are still working on this.

UPDATE 08:40:
This server has suffered a multiple disk failure in a RAID 10 array. When multiple disks fail in a RAID 10 array and both disks of a mirrored pair are among them, the result is typically a complete loss of data. This is the case in this instance.
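To illustrate why some multiple-disk failures are fatal to RAID 10 and others are not, here is a minimal sketch in Python. The four-disk layout and the disk names are hypothetical; the actual layout of the array in PEMVZWIN04 is not stated in this post. The sketch treats the array as lost as soon as every disk in any one mirrored pair has failed.

```python
from itertools import combinations


def raid10_survives(mirror_pairs, failed_disks):
    """Return True if every mirrored pair still has at least one working disk."""
    failed = set(failed_disks)
    return all(not set(pair) <= failed for pair in mirror_pairs)


def fatal_failure_sets(mirror_pairs, failures):
    """List every combination of `failures` failed disks that destroys the array."""
    disks = sorted(d for pair in mirror_pairs for d in pair)
    return [combo for combo in combinations(disks, failures)
            if not raid10_survives(mirror_pairs, combo)]


# Hypothetical four-disk array: two mirrored pairs striped together.
PAIRS = [("disk0", "disk1"), ("disk2", "disk3")]

if __name__ == "__main__":
    fatal = fatal_failure_sets(PAIRS, 2)
    print(f"{len(fatal)} of 6 possible two-disk failures are fatal: {fatal}")
```

In this hypothetical layout, two of the six possible two-disk failures take out a whole mirrored pair, and any three-disk failure does.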

We will work to get a replacement machine back up and have people's Virtual Private Servers re-created on the new node as quickly as possible. This is going to take some time, but customers are advised to get their data backups ready to be restored once their VPS has been reinstated in their account.

UPDATE 11:00: We're currently in the process of examining the disks to see whether we can retrieve data from them. Some customers have indicated to us that they have no backups of their own data at all. Our intention is to re-create all the VPSs on a new node and put people in a position where they can re-upload their data and restore their own backups.

So to summarise, we intend to re-create all VPSs on a new hardware node as "newly installed VPS" so people can move on and get their sites etc. back up.

Secondly, we're examining the old disk(s) forensically to see whether we can get the data off them. While this is unlikely to be an immediate resolution, there's a good chance that the data can be retrieved, but it could take anywhere from a few days to a number of weeks.

Our technical team have been in direct contact with all clients affected by this issue.

Linux VPS Hardware Node: PEMVZLIN18

We are currently experiencing issues with the VPSs located on the hardware node PEMVZLIN18.

Our engineers are working to resolve this issue asap.

Affected VPSs:

vps-294-1918.cp.blacknight.com 78.153.211.115
nrgserver.com 78.153.211.196
vps-1116091-1944.cp.blacknight.com 78.153.211.199
vps-1116157-1945.cp.blacknight.com 78.153.211.200
vps-1116274-1946.cp.blacknight.com 78.153.210.80
vps-1116314-1948.cp.blacknight.com 78.153.211.204
vps-1116348-1949.cp.blacknight.com 78.153.211.208
vps-1116623-1950.cp.blacknight.com 78.153.211.210
vps-1116611-1951.cp.blacknight.com 78.153.211.211
vps-1117281-1955.cp.blacknight.com 78.153.208.13
vps-1117336-1958.cp.blacknight.com 78.153.208.52
vps-1117585-1961.cp.blacknight.com 78.153.208.23
vps-1117592-1962.cp.blacknight.com 78.153.208.80
vps-1118587-1967.cp.blacknight.com 78.153.208.153
vps-1118776-1969.cp.blacknight.com 78.153.208.162
vps-1119048-1971.cp.blacknight.com 78.153.209.81
vps-1119083-1973.cp.blacknight.com 78.153.209.119
vps-1119175-1975.cp.blacknight.com 78.153.209.132
vps-1119524-1980.cp.blacknight.com 78.153.209.200
vps-1119640-1981.cp.blacknight.com 78.153.209.230
vps-1119698-1982.cp.blacknight.com 78.153.209.253
vps-1119699-1983.cp.blacknight.com 78.153.211.14
vps-1111443-1988.cp.blacknight.com 78.153.211.37
vps-1120226-1989.cp.blacknight.com 78.153.211.38
vps-1120517-1992.cp.blacknight.com 78.153.209.197
vps-296-1994.cp.blacknight.com 78.153.211.39
vps-1120779-1996.cp.blacknight.com 78.153.211.40

09:13AM - This issue has now been resolved.

PEMVZMPS22 - Server Reboot

Due to an error we were seeing on this Linux shared hosting hardware node, a reboot was needed.

The affected nodes are:

pemlinweb01.blacknight.com 81.17.254.70
pemlinweb07.blacknight.com 81.17.254.88

ETA until they are back online is approx. 10 minutes.

EURid Registry Maintenance (.eu)

EURid, the .eu registry, will be carrying out maintenance on their systems on Wednesday 15 December 2010 from 2100 to 2230 CET.

Registrations and updates will not be available during this period.

mail.blacknight.com Down

We are currently experiencing issues with email on mail.blacknight.com. Our engineers are working to resolve this issue. Updates will be posted here.

UPDATE 01:53AM - The issue has been diagnosed and the engineers are now getting the servers back online.

UPDATE 02:01AM - The front end nodes are now starting to route their mail correctly again. In about 15 minutes the load will calm down drastically and all will return to normal.

Shared Hosting Linux: PEMVZMPS43 Reboot

Because the hardware node PEMVZMPS43 became unresponsive, the server is being rebooted.

This will affect the following nodes:

pemlinweb39.blacknight.com 81.17.254.12
pemlinweb40.blacknight.com 81.17.254.13
mysql551.cp.blacknight.com
mysql554.cp.blacknight.com

The downtime is expected to be 10 minutes. We'll keep this post updated.


mail.blacknight.com Issues

We are currently experiencing issues with our mail cluster storage. Our engineers are working on diagnosing and resolving the issue asap.

11:19AM - One of our mail cluster storage platforms has kernel panicked. A reboot will be needed, which may take time depending on the speed of the FS check on boot.

11:34AM - All is functioning correctly again. Just waiting on the backlog of mail to clear.

11:51AM - Due to the high volume of clients hitting the servers to fetch their email over POP3/IMAP etc., the servers are running with a very high load. This will balance out, and our engineers are monitoring it closely.

Linux Shared Host Morgana Compromised

Sometime before midnight last night, morgana was compromised with a root-level exploit. It looks like every index file has been overwritten on all sites. Mail seems to be unaffected.

The most recent uncompromised backup seems to be 16:00 yesterday, so we're working to restore the overwritten html files from that.

When we've more data available, it will be put up here.

Update 15:30: Mail is back up and running; however, we're running into delays restoring compromised index pages from the backups. The backups are fine, but the program for restoring them seems to be sulking due to the presence of dead symlinks in the backup.
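For readers unfamiliar with the term, a dead symlink is a symbolic link whose target no longer exists. The following is a minimal sketch in Python of how such links could be located in a directory tree. It is an illustration only, not the vendor's restore program, and the function name is hypothetical.

```python
import os


def find_dead_symlinks(root):
    """Return paths under `root` that are symlinks whose target does not exist."""
    dead = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk reports symlinks to directories in dirnames and all other
        # symlinks (including broken ones) in filenames, so check both.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                dead.append(path)
    return sorted(dead)
```

Calling find_dead_symlinks on the top of a backup tree would return the list of links that a restore tool might trip over.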

We are working with the vendors to resolve this and get a full restore. In the meantime, we're going to use a previously restored backup from Oct 18th to get sites up and running again. Once the bug with the more current restore is resolved, we will bring back yesterday's versions of the files.

Update 01:30: The mass restore of files from Oct 18th is in full swing and should be completed within an hour or two.

Update: 08:30 Monday Dec 6th.

The mass restore is taking a little longer than anticipated. It's currently working its way through alphabetically. This will probably take another 16-24 hours, but we might stop it and run it in a threaded fashion instead of as a single thread, which will be much faster.
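As a rough illustration of the difference between a single-threaded and a threaded restore, here is a minimal Python sketch. It assumes one directory per site under a backup root, which is a hypothetical layout; the function names and the worker count are also illustrative and are not taken from the restore tool actually in use.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def restore_site(backup_root, live_root, site):
    """Copy one site's directory from the backup tree over the live tree."""
    shutil.copytree(Path(backup_root) / site, Path(live_root) / site,
                    dirs_exist_ok=True)
    return site


def restore_all(backup_root, live_root, workers=4):
    """Restore every site directory, several at a time rather than one by one."""
    sites = sorted(p.name for p in Path(backup_root).iterdir() if p.is_dir())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in alphabetical order even though the copies
        # themselves run concurrently.
        return list(pool.map(lambda s: restore_site(backup_root, live_root, s),
                             sites))
```

How much faster this is in practice depends on where the bottleneck lies; if the disks or the network are already saturated, extra threads help little.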

A large number of sites have been restored to their state from the 4th of December. However, as all files matching index* were overwritten, there are many subdirectories of websites not working.

Please open a support ticket and our support team can restore individual files for you. However we will ask for some patience as we want the mass restore to finish before we start doing further restores.

We are still working with the vendor in order to get the restore of yesterday's backups running properly. We do have a full backup, and we can easily restore any individual files or directories if there's something badly needed, however a mass restore like we can do on the backup from Oct 18th is failing.

December 6th @ 12:00

The restore of the full backup from October 18th 2010 has completed. The restore of the full CDP snapshot from December 4th @ 15:21 is at 65.7 GB of 111.1 GB. This will take another 2 hours or so, at which point we will restore all public_html and private_html directories completely to this date/time.
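The figures above can be checked with a little arithmetic. The sketch below, in Python, works out how much of the snapshot remains and what transfer rate the 2 hour estimate implies; the function names are illustrative, and the rate is an inference from the stated numbers rather than a measured value.

```python
def remaining_gb(done_gb, total_gb):
    """Data still to be restored, in GB."""
    return total_gb - done_gb


def implied_rate_gb_per_hour(done_gb, total_gb, hours_left):
    """Average rate needed to finish within the estimated time."""
    return remaining_gb(done_gb, total_gb) / hours_left


def percent_done(done_gb, total_gb):
    """Progress so far as a percentage."""
    return 100.0 * done_gb / total_gb


# Figures taken from the status update above.
DONE_GB, TOTAL_GB, HOURS_LEFT = 65.7, 111.1, 2.0

if __name__ == "__main__":
    print(f"{percent_done(DONE_GB, TOTAL_GB):.1f}% done, "
          f"{remaining_gb(DONE_GB, TOTAL_GB):.1f} GB left, "
          f"{implied_rate_gb_per_hour(DONE_GB, TOTAL_GB, HOURS_LEFT):.1f} GB/hour needed")
```

That is about 45.4 GB left to restore, or roughly 22.7 GB per hour to finish in 2 hours.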

If you have critical sites which are still down please contact us immediately via our helpdesk, however until this full restore is complete we can't access this snapshot so we can't do individual file restores.

December 6th @ 14:30

The restore from the backup of Dec 4th @ 15:21 has now completed. This has been restored to another server. We are now proceeding with the restoration of changed files using rsync. This will take several hours to complete and we will update this post again once we're done.
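The idea of restoring only changed files can be sketched in a few lines of Python. This is a simplified illustration and not rsync itself: rsync by default decides what to transfer from file size and modification time and can send only the differing parts of a file, whereas this sketch compares whole file contents and copies whole files. The function name and directory arguments are hypothetical.

```python
import filecmp
import shutil
from pathlib import Path


def restore_changed_files(snapshot_root, live_root):
    """Copy files from the snapshot that are missing or different in the live tree."""
    snapshot_root, live_root = Path(snapshot_root), Path(live_root)
    restored = []
    for src in sorted(snapshot_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(snapshot_root)
        dst = live_root / rel
        # shallow=False compares file contents rather than just size and mtime.
        if dst.is_file() and filecmp.cmp(src, dst, shallow=False):
            continue  # identical file, nothing to restore
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        restored.append(str(rel))
    return restored
```

Files that are already identical on the live server are skipped, which is why this kind of restore touches only the overwritten pages.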

We are now in a position to restore individual files on request from people so please let us know if there is something urgent that you need done.

December 6th @ 16:45

The restore of files from December 4th should now be complete. This issue is considered closed. If you're having any problems please let us know and we can restore files manually for you very quickly.