April 2011 Archives

Windows VPS node emergency reboot

I'm afraid that our engineers have had to reboot one of our Windows VPS nodes. If your Windows VPS is down in the next few minutes, this may be the cause.

Affected VPSs and IP addresses are:
1703    vps-1703.cp.blacknight.com     78.153.196.51
1706    vps-1706.cp.blacknight.com     78.153.196.52
1708    vps-1708.cp.blacknight.com     78.153.196.53
1711    vps-1711.cp.blacknight.com     78.153.196.54
1712    vps-1712.cp.blacknight.com     78.153.196.55
1714    vps-1714.cp.blacknight.com     78.153.196.56
1719    vps-1719.cp.blacknight.com     78.153.196.59
1720    vps-1720.cp.blacknight.com     78.153.196.60
1721    vps-1721.cp.blacknight.com     78.153.196.61
1729    vps-1729.cp.blacknight.com     78.153.196.64
1742    vps-1742.cp.blacknight.com     78.153.196.67
1746    vps-1746.cp.blacknight.com     78.153.196.71
1749    vps-1749.cp.blacknight.com     78.153.196.72
1750    vps-1750.cp.blacknight.com     78.153.196.73
1753    vps-1753.cp.blacknight.com     78.153.196.76
1755    vps-1755.cp.blacknight.com     78.153.196.79
1762    vps-1762.cp.blacknight.com     78.153.196.83
2035    ....     78.153.196.50
2040    vps-2040.cp.blacknight.com     78.153.196.146
2042    vps-2042.cp.blacknight.com     78.153.196.181
2050    vps-2050.cp.blacknight.com     78.153.196.183
2054    vps-2054.cp.blacknight.com     78.153.196.184
2067    vps-2067.cp.blacknight.com     78.153.196.189
2069    vps-2069.cp.blacknight.com     78.153.196.190
2082    vps-2082.cp.blacknight.com     78.153.196.196

Our engineers will update this as soon as service is restored.

Pendragon Compromised, Old Windows Shared Hosting

It looks like Pendragon was compromised over the weekend and all index files were replaced with a "Hacked By" message. We haven't been able to find the point of entry yet, so IIS is being disabled.

All replaced files are being restored from backup.

More details and ETA will be posted here once we have them.

Update 17:30: We've found the source and blocked it. We're now waiting for the restore to complete before restarting IIS.

Update 17:55: All files are now restored and all sites are back up and running. If you spot any lingering issues, please let support know.

pemlinweb02 experiencing difficulties

The above Linux Shared Hosting server is currently experiencing difficulties. We have dispatched an engineer to investigate.

Update 19:40 - This server has now returned to normal.

EU Registry - Scheduled Maintenance

EURid, the .eu registry, will be conducting scheduled maintenance on Wednesday 27 April from 06:00 to 09:00 CEST.

During this period new registrations, updates and WHOIS will be unavailable.

Update 09:32: We received notification from the registry that this maintenance window has been completed and all services have returned to normal.
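
For anyone working in a different time zone, the window can be converted with a few lines of Python. This is only a sketch: the year 2011 is taken from the archive title, CEST is treated as a fixed UTC+2 offset, and the UTC+1 zone labelled `IST` is an assumption about where the reader is, not something the registry stated.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets; assumptions for illustration rather than a full tz database.
CEST = timezone(timedelta(hours=2), "CEST")
IST = timezone(timedelta(hours=1), "IST")  # assumed reader zone (UTC+1)

# Maintenance window as announced: 06:00 to 09:00 CEST on 27 April.
window_start = datetime(2011, 4, 27, 6, 0, tzinfo=CEST)
window_end = datetime(2011, 4, 27, 9, 0, tzinfo=CEST)


def window_in(tz):
    """Return the (start, end) of the maintenance window in the given zone."""
    return window_start.astimezone(tz), window_end.astimezone(tz)


if __name__ == "__main__":
    for tz in (timezone.utc, IST):
        start, end = window_in(tz)
        print(f"{start:%H:%M} to {end:%H:%M} {start.tzname()}")
```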

AHSAY Server Upgrade

Summary: We're going to upgrade our Ahsay server to a later version tomorrow at 12:00 in order to fix a bug with restores. This is being done during business hours as the vast majority of backups run outside business hours.

Downtime should be less than an hour.

Services affected: All backup and restore operations that use the Ahsay backup client with http://backup.mybackup.ie. As daytime is the quietest part of the day for the backup system, it's the best time to do work on the server.

Note: This does not affect any shared hosting customers, nor is it service-affecting for any customers.

PEMLINWEB02 Issues

PEMLINWEB02 is currently unresponsive. We have an engineer on site and we're working on getting it back up ASAP.

UPDATE 11:08 - This issue is fully resolved.

Disk Replacements - PEMVZMPS64/69

We have been alerted to some failed disks within the RAID arrays of PEMVZMPS64 & PEMVZMPS69. Because of this we are going to bring these nodes offline tomorrow morning and replace the dead disks.

When: Wednesday 20th of April at 07:30. The maintenance window will be 1 hour. 

What's affected?

78.153.215.156  pemlinweb65.blacknight.com      
78.153.215.157  pemlinweb66.blacknight.com
78.153.215.165  pemlinweb72.blacknight.com      
78.153.215.164  pemlinweb71.blacknight.com 

We will update this blog post once everything is completed.

UPDATE 08:36 - These works are fully completed.

Disk Replacements - PEMVZMPS21/61

We have been alerted to some failed disks within the RAID arrays of PEMVZMPS21 & PEMVZMPS61. Because of this we are going to bring these nodes offline tomorrow morning and replace the dead disks.

When: Tuesday 19th of April at 07:30. The maintenance window will be 1 hour. 

What's affected?

81.17.254.62    pemlinweb23.blacknight.com      
81.17.254.63    pemlinweb24.blacknight.com
78.153.214.54   pemlinweb59.blacknight.com      
78.153.214.55   pemlinweb60.blacknight.com

We will update this blog post once everything is completed.

UPDATE 08:26 - The following nodes are fully back online:

78.153.214.54   pemlinweb59.blacknight.com      
78.153.214.55   pemlinweb60.blacknight.com

We are just waiting on a file system check to complete on the remaining nodes to complete this maintenance.

UPDATE 09:11 - All services are fully back online.

Server reboots for April 18th

Summary: To diagnose slowdowns that we're seeing on a number of MySQL nodes, and which don't appear to be caused by customer load, we're going to install the latest kernel on the following nodes and reboot them.

What will be restarted: pemvzmps55

What will be affected: All databases on the following nodes will be affected by this:

pemmysql14-5.blacknight.com    mysql643.cp.blacknight.com    mysql643int.cp.blacknight.com
pemmysql15-5.blacknight.com    mysql646.cp.blacknight.com    mysql646int.cp.blacknight.com

There will be approximately 30 minutes of downtime during this window. The node may need to run a disk check, which will take some time.

Update 22:40: This maintenance window is now complete.

PEMLINWEB34/35 Issues

We are currently experiencing issues with the following two Linux shared hosting nodes:

pemlinweb34.blacknight.com -> 81.17.254.21
pemlinweb35.blacknight.com -> 81.17.254.22

Our engineers are working to resolve this issue ASAP.

16:24 - This issue is now resolved.

pemlinweb34 & pemlinweb35 unresponsive

The above Linux shared hosting servers have become unresponsive, and we have dispatched an engineer to investigate.

Update 16:15 - these servers are now returning to normal.

Problem with Provisioning System

We are currently experiencing a problem with our Microsoft Provisioning System, which is causing a backlog of automated tasks. Changes to Windows Shared Hosting packages are delayed as a result.

We have opened a ticket with our software vendors and hope to have a resolution as soon as possible.

Update: This has now been fixed and tasks are processing as normal.

Qmail Mail Issue

The Qmail server was down for a short while between 19:35 and 20:10 due to an error with the LDAP servers backing the service.

This has been rectified and mail is flowing again. No mail will have been lost, as the mail servers were returning a temporary error and sending servers will simply retry later.
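
As a rough illustration of why a temporary error does not lose mail: SMTP reply codes are grouped by their first digit (RFC 5321), and a sending server queues and retries on a 4xx reply but gives up and bounces on a 5xx reply. The sketch below only shows that classification; the function name and the returned labels are made up for this example and are not part of Qmail or any mail server's API.

```python
def classify_smtp_reply(code: int) -> str:
    """Classify an SMTP reply code by its first digit (RFC 5321 reply classes).

    The returned labels are hypothetical names chosen for this sketch.
    """
    if 200 <= code <= 299:
        return "ok"            # positive completion: message accepted
    if 300 <= code <= 399:
        return "intermediate"  # positive intermediate: server wants more input
    if 400 <= code <= 499:
        return "retry-later"   # transient failure: sender keeps the mail queued
    if 500 <= code <= 599:
        return "bounce"        # permanent failure: sender returns the mail
    raise ValueError(f"not a valid SMTP reply code: {code}")


if __name__ == "__main__":
    for code in (250, 451, 550):
        print(code, classify_smtp_reply(code))
```

A temporary failure such as 451 falls into the "retry-later" class, so the sending side keeps the message in its queue and tries again after a delay.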

pemlinweb69 / 70 experiencing issues

Summary: These two servers are currently down. An engineer is on site and working on it.

Update 10:19: These servers are fully back. This was a network port misconfiguration for the public interface on the hardware node that these servers currently reside on. This issue is now fully resolved.

pemwinweb19 authentication / mysql communication issues

Summary: The above server is having intermittent backnet problems. The backnet is used for Active Directory authentication and communication with MySQL and MSSQL servers. Until now we had not been able to catch the problem while it was occurring, so we didn't put a post up about it.

We're working now to find the problem and put a fix in place. We'll post further updates as we have more information.

Update 13:15 - We will be taking this server offline for the next while to try and fix the problem once and for all. We apologise for this unexpected downtime.

Update 15:10 - This node has been up and down for about an hour while we worked on it. It has since been down completely for the last 40 minutes or so. There appears to be corruption in the registry and/or system files. Right now we're backing out of our debug work and we're going to restore the machine to a known good state. The ETA for this is unknown at this time, but it will be a complete server restore. Once the restore is under way and is giving us a realistic restore time, we'll put an ETA on this status site.

Update 19:00 - The restore is still under way and is approximately 80% complete. When the server returns, all data will be as it was when we took it offline.

Update 23:00 - The server restore has completed successfully. Because we had to roll back, the initial problem may still exist, and we will keep monitoring to see whether it recurs.

Update 16:00 - It appears the initial problem is indeed still occurring. We want to take the server offline for approximately 10 minutes at 18:00 to apply a fix.

Legacy linux shared server - balin

This server has stopped responding to requests. We have dispatched someone to have a look at it.

Update 17:50 - all services on this server have returned to normal.

Legacy linux shared server - gorlois

gorlois is currently experiencing load issues. We have rebooted it, and it is coming back online at the moment.

pemlinweb69 & pemlinweb70 unresponsive

The above Linux shared hosting servers have become unresponsive, and we have dispatched an engineer to investigate.

Update 02:00 - both servers have now returned to normal.

pemlinweb59 / pemlinweb60 down

Summary: Due to a hardware issue in pemvzmps61 the above two linwebs are having issues. They're completely down at the moment and we're diagnosing the issue.

What's affected:


Sites on pemlinweb59 or the following IPs:

78.153.214.54 78.153.214.141 78.153.214.171 78.153.214.193

Sites on pemlinweb60 or the following IPs:

78.153.214.55 78.153.214.128 78.153.214.172

We'll have more information when it's available.

Update 11:15: These two servers are now back online. The machine has a failed disk, which we'll replace next week to ensure this doesn't occur again.

Windows VPS Hardware node reboot

An issue has occurred that requires an immediate reboot of one of our Windows VPS nodes. VPSs on this node will be down for approximately 20 minutes.

The affected VPSs are:

78.153.196.63
78.153.196.92
78.153.196.87
78.153.196.96
78.153.196.97
78.153.196.101
78.153.196.102
78.153.196.103
78.153.196.104
78.153.196.105
78.153.196.114
78.153.196.115
78.153.196.116
78.153.196.117
78.153.196.121
78.153.196.124
78.153.196.191
78.153.196.194
78.153.196.195
78.153.196.193
78.153.196.197
78.153.196.199
78.153.196.212
78.153.196.213
78.153.196.214
78.153.196.215
78.153.196.216

We apologise for any inconvenience this unexpected reboot may cause.

pemlinweb21 / 22 server issues

Summary: This server is having an issue due to a failed disk. The combination of the nightly 3am snapshot and its log rotation around 4am has caused it to become unresponsive and then responsive again repeatedly over the past few hours. Additionally, our SMS provider doesn't appear to be relaying SMS alerts, so we did not see this issue until now.

We're working on it now and should have it resolved ASAP.

Email delay on one of the cp.blacknight.com mailservers

I'm afraid we are currently experiencing an email delay on one of the shared mailservers for email for any shared hosting package set up through cp.blacknight.com.

This may result in some emails sent to you being delayed. Our engineers are working on this issue with the control panel vendors and we hope to get it resolved as soon as possible.

This does not affect our Hosted Exchange services.

Update 15:00 - Our engineers are still working on this issue with the control panel vendors, but the queue is much lower now and seems to be getting back to normal.

mysql71.cp.blacknight.com/mysql71int.cp.blacknight.com Issues

We're currently experiencing an issue with mysql71.cp.blacknight.com/mysql71int.cp.blacknight.com. We are investigating at the moment.

Update 22:14: This issue is now resolved. It appears to have been an issue we've seen in the past in Virtuozzo, where the node is constantly swapping, which causes extensive I/O. Once we updated the kernel and rebooted the machine, it came back. We're monitoring this node closely to see if the issue recurs.
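
For readers who want to check whether one of their own Linux machines is swapping heavily, one common approach is to sample the `pswpin` and `pswpout` counters from `/proc/vmstat` twice and compare them. The sketch below is a generic illustration, not the procedure our engineers used; the function names and the 10-second interval are assumptions, and the counters may look different inside a Virtuozzo container.

```python
import time
from typing import Dict

SWAP_KEYS = ("pswpin", "pswpout")  # pages swapped in / out since boot


def parse_swap_counters(vmstat_text: str) -> Dict[str, int]:
    """Extract the swap-in and swap-out page counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in SWAP_KEYS:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before: Dict[str, int], after: Dict[str, int],
              seconds: float) -> Dict[str, float]:
    """Pages swapped in and out per second between two samples."""
    return {key: (after[key] - before[key]) / seconds for key in SWAP_KEYS}


if __name__ == "__main__":
    interval = 10.0  # assumed sampling interval in seconds
    with open("/proc/vmstat") as f:
        first = parse_swap_counters(f.read())
    time.sleep(interval)
    with open("/proc/vmstat") as f:
        second = parse_swap_counters(f.read())
    print(swap_rate(first, second, interval))
```

Sustained non-zero rates for both counters suggest the machine is continually moving pages to and from swap, which shows up as heavy disk I/O.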