August 2010 Archives

Emergency reboot of pemlinweb15 and pemlinweb16

Summary: We're rebooting pemvzmps16 NOW (11:30am, Friday 27th) to bring it back up on the correct kernel after the disk replacement last night.

Details: The hardware node that these two VMs reside on had a failed disk replaced last night. As a result, a new kernel was loaded. We've identified that the kernel currently loaded is the incorrect one and is causing some load issues on these two VMs. A reboot onto the correct kernel will restore normal operations.

Update 11:37: this is now complete and normal service has been restored.

Hard Disk replacement: Linux Shared Hosting Hardware Node PEMVZMPS16

We have been alerted to a failed hard disk on one of our Linux shared hosting nodes, PEMVZMPS16. To ensure the RAID array is fully optimal again as soon as possible, we are going to replace this disk tonight at 9PM.

What's affected?
Two Linux shared hosting web servers are located on this node. They are:
  • 81.17.254.74    pemlinweb15.blacknight.com
  • 81.17.254.75    pemlinweb16.blacknight.com 
What sort of downtime is to be expected?
The plan is to bring the server offline, replace the disk, boot it back up and then let the RAID resync itself.

We expect around 15 minutes of downtime, but we are scheduling a window of one hour.
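
For anyone who wants to follow a resync like this on their own server, here is a minimal sketch. It assumes Linux software RAID (md), which reports rebuild progress in /proc/mdstat. The post does not say what kind of RAID PEMVZMPS16 uses, and a hardware controller would need the vendor's own tool instead. The function name and the sample text are illustrative only.

```python
import re

# Matches the progress line the md driver prints while an array rebuilds,
# e.g. "recovery = 12.6% (...) finish=127.5min".
_PROGRESS = re.compile(r"(resync|recovery)\s*=\s*([\d.]+)%")


def resync_progress(mdstat_text):
    """Return (kind, percent) if a rebuild is running, else None."""
    match = _PROGRESS.search(mdstat_text)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


# Illustrative sample only; on a real host, read /proc/mdstat instead.
SAMPLE = """\
md0 : active raid1 sdb1[2] sda1[0]
      292945152 blocks [2/1] [U_]
      [==>..................]  recovery = 12.6% (37043392/292945152) finish=127.5min speed=33440K/sec
"""

if __name__ == "__main__":
    print(resync_progress(SAMPLE))  # ('recovery', 12.6)
```

Once the progress line disappears and the array shows all members up, the resync has finished.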

When is this happening?
The downtime will occur tonight, 26th of August 2010, at 21:00 hours.

Once the maintenance is complete we'll update this blog post.
 
UPDATE 21:08 - The disk change is complete and the server is now back online.

Issues with PEMLINWEB32/33 Linux Shared Hosting

We are currently experiencing issues with two Linux shared hosting nodes:

pemlinweb32.blacknight.com - 81.17.254.44
pemlinweb33.blacknight.com - 81.17.254.48

Our engineers are working to resolve this issue as soon as possible and will keep this blog post updated.

UPDATE 12:20AM - This issue has been resolved.

Email changes post cp.blacknight.com upgrade

There has been a slight change in how the email accounts work since our major cp.blacknight.com upgrade on Monday the 16th of August:

http://www.blacknightstatus.com/2010/08/cpblacknightcom-major-upgrade.html

This will not affect the vast majority of customers, but if you are having problems emailing an address on your domain that worked fine before Monday and no longer works, this might be the cause.

Email accounts have two parts: a service user and an email address connected to that service user. The service user is the username and password that you use to access the email address itself. While we always recommend that the service user have the same username as your email address, you might not have set up your email account this way.

Prior to Monday, your email account would accept email sent to either the service user username or the email address. For example, if your username was info@test.com but the email address associated with it was actually john@test.com, then emails to info@test.com would still get delivered to the john@test.com address without error.

This was not how the system should work, however, and the upgrade on Monday fixed this "bug". Now email will only go to the email address itself, so email to, for example, info@test.com no longer gets through. You can resolve this easily for yourself as follows:

1. http://cp.blacknight.com > Email > Email Addresses
2. Click on the affected service user name in the list
3. Go to Email Addresses > Add > Add the email address that no longer works > Submit

The address should start to work again straight away. If you continue to have problems, or have any questions, please let us know as soon as possible.
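
The change in behaviour can be pictured with a small model. This is only an illustrative sketch, not the control panel's actual code: the `Mailbox` class, the `accepts` function and the `legacy` flag are made up for the example, and the addresses are the ones used in the explanation above.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Hypothetical model: one service user plus its attached addresses."""
    service_user: str
    addresses: set = field(default_factory=set)


def accepts(mailbox, recipient, legacy=False):
    """Return True if mail for `recipient` would be delivered.

    legacy=True mimics the pre-upgrade behaviour, where the service user
    name was also accepted as a delivery address.
    """
    if recipient in mailbox.addresses:
        return True
    return legacy and recipient == mailbox.service_user


box = Mailbox(service_user="info@test.com", addresses={"john@test.com"})

print(accepts(box, "info@test.com", legacy=True))  # True  (before the upgrade)
print(accepts(box, "info@test.com"))               # False (after the upgrade)

# The fix from the steps above: add the address to the service user.
box.addresses.add("info@test.com")
print(accepts(box, "info@test.com"))               # True
```

In this model the fix changes nothing about the service user itself; it only adds the missing address to the set that is checked on delivery.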

EU Registry Maintenance August 25th 2010


EURid, the registry operator for .eu, will be conducting maintenance on August 25th 2010 from 0600 to 0700 CEST.

During this period we will not be able to process any new registrations or updates.

Existing domain names will not be affected.

Control Panel Unavailable


The control panel is currently unavailable.

More information as soon as we have it.

1230 Update: It looks like there is a configuration issue, presumably related to yesterday's upgrade. We have contacted the software vendors and hope to have an update and resolution quickly.

Update 1243 - We have restarted parts of the system, so the control panel login is currently available. We will update when we have more information.

Shared Hosting Server Ragnell Compromised

TrackBacks (0) Comments (7)
Our shared hosting server Ragnell has been compromised, and the majority of the index.php files have been replaced with a hacked version. We have already disabled all copies of the compromised index files.

We are currently making sure the hole that was used is fixed before re-enabling Apache. As part of this, PHP is being upgraded to PHP5.

We are also going to look into restoring the disabled index files; however, this is going to take longer. The backup system we use is geared towards full system backups, so restoring individual files is likely to take a while. If you have an up-to-date copy of your index file, it will probably be faster to upload it yourself. This can be done even while Apache is down.

Update 1430: The upgrade of PHP / Apache is almost complete. Once it's finished we will be able to start restoring index files from backups.

Update 1515: Apache is back up and running.  We are currently restoring the index files from backup. This is going to take a long time.

UPDATE 1615: If your site's index file has been restored or if you've restored it yourself let us know if there are any issues.

UPDATE 16:52: As restoring individual index files is proving to be far too unwieldy, we are currently restoring the whole partition to another box. This will allow us to script the restore of any index files which are still showing as compromised. 

UPDATE 1910: The restoration of the index files is progressing, but it's slow, as we are checking each index file to see if it has been compromised or simply replaced from a customer's own backup. If you have a backup / replacement index file and are having issues uploading it, you may need to chmod the current index.php to 644.

UPDATE 09:30 Friday Aug 20th

After three failed attempts at a full restore to a machine in our offices, we have successfully done a full restore to a machine in the data centre. This morning around 9am we restored any files whose checksum matched that of the defaced files that were placed there during the compromise on Saturday last.

Anyone who requires other files to be restored for any reason should contact us ASAP so we can restore them for you.
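
For readers curious how a checksum-based restore like the one in the 09:30 update can be scripted, here is a minimal sketch. It is not the script we ran: the function names, the choice of SHA-256 and the directory layout (a restored backup tree that mirrors the live tree) are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_defaced(live_root, backup_root, defaced_digests, name="index.php"):
    """Replace live files whose checksum matches a known defaced file.

    Files that do not match (for example ones a customer already fixed)
    are left alone. Returns the list of restored paths.
    """
    live_root, backup_root = Path(live_root), Path(backup_root)
    restored = []
    # sorted() collects the matches first, so copying does not disturb the walk.
    for live_file in sorted(live_root.rglob(name)):
        if sha256_of(live_file) not in defaced_digests:
            continue
        backup_file = backup_root / live_file.relative_to(live_root)
        if backup_file.is_file():
            shutil.copy2(backup_file, live_file)
            restored.append(live_file)
    return restored
```

Matching on the checksum of the defaced file, rather than on the file name alone, is what keeps a customer's own replacement index file from being overwritten.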

COCCA Registry Maintenance


The COCCA registry platform, which is used by .fm, .gs, .cx, .ht and several other country code top level domains, has scheduled maintenance on August 18th.

When?

August 18th 2300 - 2330

What is impacted?

New registrations, updates and lookups for .fm, .gs and other country codes.

Existing .fm domain names will not be impacted.

CP.blacknight.com MAJOR UPGRADE

TrackBacks (0) Comments (8)

When: Monday August 16th from 02:00 until 08:00

What: The control panel software, provisioning system, agents on all hardware nodes, MySQL nodes and web servers will all be upgraded from version 2.9.4 to version 5.0. During this window access to the control panel will be restricted. However, e-mail, hosting and other services should not be affected by this upgrade.

Making changes to webspaces, adding or modifying e-mail accounts, creating new databases and so on during this window would be ill advised.

Services Affected:

cp.blacknight.com - Control Panel only.

Windows Shared hosting servers may be restarted, but only if the upgrade of the management application proves to be problematic.

Linux Shared hosting servers may be restarted, but only if the upgrade of the management application proves to be problematic.

Exchange Servers may be restarted, but due to the design of the Hosted Exchange no downtime should be noticed.

Qmail servers will have their imap/pop3/smtp services restarted as new versions of the software get put in place. Also changes may be made to LDAP so there could be intermittent authentication issues.

Sitebuilder servers should not be affected.

VPS nodes will have some software upgraded, but end users should experience no downtime.

Domain registrations and modifications will not be affected by this upgrade.

Update 07:40 Monday 16th:

The upgrade is still being performed at the moment, but it's in its final stages. We'll post an "all clear" notice when it's complete.

Update 08:45 Monday 16th:

During the upgrade Parallels appear to have broken the provisioning system somehow. As this drives the Control Panel the CP is still down. They're working to resolve this issue but we've got no ETA as of yet.

Update 1045
The maintenance and upgrade have been completed. If anyone has any specific issues post-upgrade, please contact our helpdesk.

IE Registry Scheduled Maintenance


The IE domain registry has informed us that they will be conducting maintenance on Thursday 12th August from 0700 to 0730.

During this time period we will not be able to process any new registrations or updates.

Registered domains will not be impacted.

Shared Direct Admin server - Ector down

Our engineers are currently seeing issues with one of our older Direct Admin servers:

Ector.blacknight.ie - 81.17.252.50

The server was rebooted not long ago and there are some lingering issues that may affect the receiving of email and possibly cause downtime on your websites.  We are working on this issue at the moment and our highest priority is to get this server back up and running as soon as possible.

We also hope to implement some changes on this server over the upcoming week to resolve the intermittent issues it has been having over the last couple of weeks.


Update 12:50 - The server is back up now and our engineers are checking all services to ensure they are up and running. They are still working on the server to restore everything to a normal level of service.

Update 15:10 - I'm afraid the server has gone down again and our engineers are working on restoring service once more.  We are still investigating the root cause of these issues today.

Update 15:26 - The server is still a little slow but services are back now.  We will be contacting some of the busier sites on this server in order to alleviate the load issues.

Update Aug 5th 11:22 - This server has had some config changes: we've moved a few of the busier WordPress blog sites off it, and we've also found a problem in the AntiSpam system that was causing lookups on dead realtime blacklists. All of these changes appear to have resolved the problems we were seeing on this server for the past couple of weeks. We'll be monitoring it closely for the next week and if there are no issues in that timeframe we'll consider this issue closed once and for all.
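
A dead realtime blacklist can be spotted with a simple probe. The sketch below assumes the lists are standard DNS-based blacklists, where RFC 5782 says an IPv4 list should always contain the test address 127.0.0.2. It is not the check used on Ector: the function names and zone names are placeholders, and the resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets + list zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if `ip` is listed in `zone`, False if the lookup fails."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # socket.gaierror and socket.timeout are both OSError subclasses.
        return False


def dead_lists(zones, resolve=socket.gethostbyname):
    """Return the zones that fail to list the 127.0.0.2 test address."""
    return [z for z in zones if not is_listed("127.0.0.2", z, resolve)]
```

A list that no longer answers for the test address is a candidate for removal from the mail server's configuration, since every incoming message otherwise waits on a lookup that cannot succeed. This probe would not catch a defunct list that answers every query, so it is only a first check.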

Shared Hosting Linux - Gorlois

We are currently experiencing issues with our shared hosting Linux node Gorlois. Our engineers are working on the issue and hope to have it resolved shortly.

We will keep this post updated with progress.

UPDATE 09:46 - This issue has been resolved.

mail.blacknight.com - emergency maintenance

Summary: We're going to tweak the storage backend for mail.blacknight.com tonight around 22:00 hours. We estimate approx. 30 minutes where e-mail, webmail, imap(s), pop(s) and smtp will be unavailable.

When: Tuesday August 3rd at 22:00 hours until 22:30

What:
We've had some complaints about sending and receiving being slow. This is caused by a problem in Courier-IMAP, which does a "list" on the user's home directory during authentication, before the mail client has requested a list. We believe that the NFS servers have more capacity, so we wish to restart the daemons after some configuration changes. As this means the NFS shares will all have to be unmounted and offline, we'll need to shut down all mail services while we do this. The change isn't major and should take less than 30 seconds; however, we have to ensure that no mail is lost, so we're shutting the system down.

We also have further storage nodes to go into the cluster but we're putting off this install until we stabilise the current solution.


Shared Hosting Windows - Pendragon

We are currently experiencing issues with our Windows Shared hosting node Pendragon.

Our engineers are currently working to resolve this and we expect to have it resolved shortly.


UPDATE 11:55 - Our engineers are on site and working on this issue. The server is having issues with some services causing it not to boot.

UPDATE 14:04 - This issue seems to lie within the hardware of the old server. We're moving the data onto a new server at the moment. ETA until the server is online is 2 hours.

UPDATE 20:15 - This issue is now resolved. The fix took much longer than anticipated, but the machine has been moved to more modern hardware and new hard drives as a result of this issue. The underlying problem which caused the issue will be diagnosed tomorrow during business hours, but we don't expect any further disruption to customers' services.