November 2011 Archives

Network Upgrade 1/12/11 23:00 - 02/12/11 01:00

Introduction
As our network continues to grow, we need to ensure there is ample hardware to support it. We've had a few DDoS attacks over the past year, which we want to guard against in future. Finally, we need to ensure that there is enough room for growth within our core network to support the current demands.

What's going to happen?
Between the hours of 23:00 and 01:00 on the 1st of December 2011 we'll be making some infrastructure changes. The maintenance window of 2 hours does not mean services will be down for this length of time. You'll only see some network latency for a couple of minutes here and there as our engineering team make the necessary changes.

We've planned out exactly how the upgrade is going to run. Our core network edge and transit routers will be upgraded. The new edge routers will allow far more packets per second through our multiple transit links, which will improve resiliency against DDoS attacks.

Each edge router is in a HA pair. As we move our transit links from their old homes and on to the new edge routers, the routes to the servers will need to be recalculated by the new edge routers. This can take a few minutes.

We'll also be changing how we are doing some interior routing to maximize stability.

When is it going to take place?
The maintenance is scheduled between 23:00 and 01:00 on 01/12/11.

As always, we'll keep this status blog post up to date with how the upgrade is going.

Thank you for your patience.

Update 01:00 The upgrade is taking longer than expected. We now expect to complete it by 03:00.

Update 03:00 We had an outage during this upgrade. Connectivity is now restored and we apologise for any inconvenience.

We are continuing with the upgrade and will update this post again once it has completed.

FM Domain Registry Scheduled Maintenance

The .fm domain name registry has scheduled maintenance between 23:00 and 01:00 on Sunday 4th to Monday 5th December 2011.

No new registrations or updates will be possible during this window.

Existing .fm domain names will not be impacted.

Windows VPS Platform Updates 29th Nov 21:30

We will be rebooting the following nodes to allow for updates at 21:30 29/11/11

The downtime should be only about 10 minutes

pemvzwin08
pemvzwin11
pemvzwin13

Affected VPS's:

pemvzwin08:
78.153.208.25
78.153.208.106
78.153.208.113
78.153.209.121
78.153.209.208
78.153.210.105
78.153.210.116
78.153.211.6
78.153.211.18
78.153.211.20
78.153.211.25
78.153.211.34
78.153.209.204
78.153.208.139

pemvzwin11:
78.153.196.51
78.153.196.53
78.153.196.54
78.153.196.55
78.153.196.56
78.153.196.59
78.153.196.61
78.153.196.64
78.153.196.67
78.153.196.72
78.153.196.73
78.153.196.76
78.153.196.79
78.153.196.83
78.153.196.146
78.153.196.181
78.153.196.182
78.153.196.183
78.153.196.184
78.153.196.189
78.153.196.190
78.153.196.196

pemvzwin13:
78.153.196.130
78.153.196.131
78.153.196.133
78.153.196.135
78.153.196.138
78.153.196.144
78.153.196.145
78.153.196.152
78.153.196.155
78.153.196.153
78.153.196.161
78.153.196.167
78.153.196.170
78.153.196.171
78.153.196.172
78.153.196.173
78.153.196.177
78.153.196.178
78.153.196.180
78.153.196.62
78.153.197.33



pemvzmps22 scheduled maintenance

The server pemvzmps22 will be taken offline at 9pm this evening in order to perform some maintenance.  We expect this to take no more than 2-3 hours.

This will affect the following shared hosting servers:

pemlinweb01.blacknight.com
pemlinweb07.blacknight.com


Update: Updated expected window to more accurately reflect time required to perform the maintenance tasks.

Update 23:30 pemlinweb01 has been completed but took longer than expected, so we will postpone pemlinweb07 until tomorrow night.

Update Tuesday 22:00 pemlinweb07 has completed now

cp.blacknight.com - location change for website settings

Summary: Our friends in Parallels have given us an updated control panel during the night. The changes are subtle but they're there and might cause some confusion.

Firstly, the idea of webspaces can now really be forgotten. All FTP and website settings (PHP, error logging etc.) have been moved to Sites & Domains > $Domainname. We've updated the knowledge base article for this; please see:

https://support.blacknight.ie/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=274

Secondly, Linux web hosting now defaults to adding all domains to one webspace and automatically populates the location. For example:

If you are adding the domain name "myidea.ie", the location will be set to /myidea.ie. This means that you should not be able to accidentally overwrite another website on the system. It also means that for new domains you should be able to reliably guess the location within your FTP location.

pemlinweb32 offline


pemlinweb32 is currently offline due to a DDoS attack.

 

Update 15:00 The server is back online 

Update 19:20 The server has been attacked again, so we have removed access to it again. We will update once we have restored access.


The server is back online

pemlinweb59 / pemlinweb60 down

Summary: Due to an issue on pemvzmps61 the above two linwebs are offline. The servers are running but we have had to take them offline to identify a network issue.

What's affected:

Sites on pemlinweb59 or the following IPs:

78.153.214.54 78.153.214.141 78.153.214.171 78.153.214.193

Sites on pemlinweb60 or the following IPs:

78.153.214.55 78.153.214.128 78.153.214.172

We'll have more information when it's available.

Update 15:00 Both servers are back online now

Reboot of MySQL servers mysql870 + mysql873

The hardware node for the MySQL servers requires an emergency reboot.  The following MySQL servers are affected:

mysql873.cp.blacknight.com
mysql870.cp.blacknight.com

We anticipate they will return in 5-10 minutes, but will update this post should there be any further developments.

Update @ 09:38: Both servers are now fully back up.  Apologies for any inconvenience.

Legacy DNS Server down

ns2.blacknightsolutions.com in Germany is currently down. We're in contact with the local provider in order to get it back up as soon as possible. ns.blacknightsolutions.com is responding to queries as normal.

This was resolved at 4am

pemlinweb67 Shared Linux server offline


We've noticed some errors in the logs that indicate a vzfs (Virtuozzo file system) issue on this server. These inconsistencies could cause issues if they're not fixed immediately, so we've taken this server offline and we're running a utility against it to resolve the problem.

We endeavour to bring the server back online as soon as possible.


The server is back online now

Windows VPS Platform Updates 07:30 24th Nov

We will be rebooting the following nodes to allow for updates at 07:30 24/11/11

The downtime should be only about 10 minutes

pemvzwin03
pemvzwin08
pemvzwin09
pemvzwin11
pemvzwin13

Affected VPS's:

pemvzwin03:
78.153.208.229
78.153.208.230
78.153.208.248
78.153.209.12
78.153.208.96
78.153.208.185
78.153.209.23
78.153.209.26
78.153.209.29
78.153.209.43
78.153.209.44
78.153.209.48
78.153.209.88
78.153.209.98
78.153.209.97
78.153.209.117
78.153.209.123
78.153.209.153
78.153.208.19

pemvzwin08:
78.153.208.25
78.153.208.106
78.153.208.113
78.153.209.121
78.153.209.208
78.153.210.105
78.153.210.116
78.153.211.6
78.153.211.18
78.153.211.20
78.153.211.25
78.153.211.34
78.153.209.204
78.153.208.139

pemvzwin09:
78.153.211.74
78.153.211.77
78.153.211.80
78.153.211.81
78.153.211.83
78.153.211.84
78.153.211.94
78.153.211.92
78.153.211.113
78.153.211.121
78.153.211.125
78.153.211.127
78.153.211.132
78.153.211.141

pemvzwin11:
78.153.196.51
78.153.196.53
78.153.196.54
78.153.196.55
78.153.196.56
78.153.196.59
78.153.196.61
78.153.196.64
78.153.196.67
78.153.196.72
78.153.196.73
78.153.196.76
78.153.196.79
78.153.196.83
78.153.196.146
78.153.196.181
78.153.196.182
78.153.196.183
78.153.196.184
78.153.196.189
78.153.196.190
78.153.196.196

pemvzwin13:
78.153.196.130
78.153.196.131
78.153.196.133
78.153.196.135
78.153.196.138
78.153.196.144
78.153.196.145
78.153.196.152
78.153.196.155
78.153.196.153
78.153.196.161
78.153.196.167
78.153.196.170
78.153.196.171
78.153.196.172
78.153.196.173
78.153.196.177
78.153.196.178
78.153.196.180
78.153.196.62
78.153.197.33

Update 07:55
All servers are back online and VPS's started


mysql71.cp.blacknight.com offline

Server mysql71.cp.blacknight.com with IP 81.17.254.45 is being rebooted to resolve a load issue.

Update @ 10:55:  The container has had to be taken offline for a RAID resync.  We will update with more information shortly.

Update 11:25 We are going to migrate the server to another hardware node as this will provide the quickest solution. This will take 45 mins approximately

Update 13:25 The server is coming back online on the original hardware now, we will look at moving it to new hardware in the coming days.

mysql71.cp.blacknight.com offline

Server mysql71.cp.blacknight.com with IP 81.17.254.45 rebooted and is running a file system check.

Once it is back up we will update this page.

Update: This post previously referenced the wrong server.  Updated to reflect correct MySQL server affected.

Update @ 15:10:  This server is now back up.

pemvzmps39 & pemvzmps22 drive replacements

We will be taking pemvzmps39 & pemvzmps22 offline for a brief period tonight to allow disks to be replaced.

When: Tuesday 22/11/11 18:00

What's affected:
pemlinweb31.blacknight.com 81.17.254.38
pemlinweb01.blacknight.com 81.17.254.70
pemlinweb07.blacknight.com 81.17.254.88

The replacements should be completed within 45 minutes.

We'll update here once the servers are back online

Update 18:30
pemvzmps39 is back online now, we have postponed pemvzmps22.

Linux VPS Hardware Node - PEMVZLIN06

Summary:  Further to this morning's drive failure, we have had to take the VPSs offline until the rebuild of the RAID array is finished.  We will update this post as soon as we have any further information.

When: 16:20

VPS's Affected:

78.153.210.17
78.153.210.60
78.153.210.115
78.153.210.144
78.153.210.153
78.153.210.168
78.153.211.145
78.153.211.146
78.153.211.149
78.153.211.150
78.153.211.151
78.153.211.158
78.153.211.162
78.153.211.173
78.153.211.180
78.153.208.27
78.153.208.134
78.153.208.194
78.153.208.214
78.153.209.16
78.153.209.77
78.153.209.113
78.153.209.192
78.153.209.235
78.153.211.65

Update 19:15
The rebuild of the RAID is at 60%, we will update again in an hour or sooner if we have more information.

Update 21:00
The rebuild has encountered errors on the good drive on the second half of the RAID 10 array. This means that it has found bad sectors on the disk and has restarted. It may still complete. However, we have done a restore from this morning's backup, so we may bring that up instead, but we'll wait and see whether the rebuild actually completes.

Update 23:30
We have begun copying VPS data to new servers. We will work through the night and expect the VPS's to be back up and running by morning.


Update 05:00
We are now syncing the data one last time to ensure we have the latest version, this is taking a little longer than expected but we will start the VPS's as soon as we have all data in sync.

Update 06:00
We have begun to bring VPS's back online, 50% are back online and the remaining 50% should be back online by 8am.

Update 07:35
All VPS's are now back online, if you have any issues please raise a Support ticket.

We will update again when we have more information.


pemvzmps19 migration


We are migrating PEMVZMPS19 to new hardware tonight to allow for both drives to be replaced.


When: 21st November at 20:00. The maintenance window will be up to 2 hours for each node. pemlinweb67 from 20:00 and pemlinweb68 from 21:30


What's affected?


78.153.215.160  pemlinweb67.blacknight.com      

78.153.215.161  pemlinweb68.blacknight.com


We will update this blog post once everything is completed

Update 20:55
pemlinweb67 has completed and is now back online. pemlinweb68 will start at 21:30 as planned.

Update 22:05
pemlinweb68 has now also completed.

Shared Hosting Server pemvzmps33 emergency maintenance

pemvzmps33 has become non-responsive; an onsite engineer is investigating.

What's affected:
  • PEMVZMPS33
    • pemlinweb32.blacknight.com (81.17.254.44)
    • pemlinweb33.blacknight.com (81.17.254.48)

Update 13:50
The server is booting and running a file system check

Update 14:00
Both pemlinweb32 and  pemlinweb33 are back online now

Linux VPS Hardware Node - PEMVZLIN06

Summary: pemvzlin06 has a failed drive and is being shut down to allow replacement. We will update this page once we have the server back up and running.

When: 09:45

VPS's Affected:

78.153.210.17
78.153.210.60
78.153.210.115
78.153.210.144
78.153.210.153
78.153.210.168
78.153.211.145
78.153.211.146
78.153.211.149
78.153.211.150
78.153.211.151
78.153.211.158
78.153.211.162
78.153.211.173
78.153.211.180
78.153.208.27
78.153.208.134
78.153.208.194
78.153.208.214
78.153.209.16
78.153.209.77
78.153.209.113
78.153.209.192
78.153.209.235
78.153.211.65

Update 10:10
The drive has been replaced and the server has booted. The VPS's are now starting and should all be up shortly.

Update 10:35
All VPS's are now started

Billing and Store Offline

The billing section of our control panel and our online store are currently offline.

We have taken both facilities offline as there are some database issues and inconsistencies that need to be repaired. Since new orders, renewals etc. use the database, we have disabled everything while working on the issue.

If you need to change nameservers on a domain name please contact our support desk.

UPDATE 0921 - we brought the system back online around midnight last night

Linux Shared hosting server problems

We are currently experiencing difficulties with one of our Linux Shared Hosting servers. We have dispatched an Engineer to investigate.

Update - 19:40: This server has returned to normal, and we apologise for the unexpected downtime.

Store & Billing issues


Our store and billing are currently unavailable. We are investigating, and hope to have these back online as soon as possible.

 

Update: 13:50 - these services have now returned to normal.

SSL certificate update on cp.blacknight.com

Summary: The SSL cert for cp.blacknight.com expires on November 21st. We're replacing it today to ensure continuity of service.
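
A check along the following lines can flag a certificate before it lapses. This is only a minimal Python sketch and not the procedure used for this replacement: the function name `days_until_expiry` and the time of day in `NOT_AFTER` are assumptions for illustration (only the date, November 21st, comes from this post). `ssl.cert_time_to_seconds` is a standard-library call that parses the `notAfter` format used in certificates.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: float) -> float:
    """Return the number of days left before a certificate expires.

    not_after uses the certificate 'notAfter' text format,
    e.g. "Nov 21 12:00:00 2011 GMT". now is seconds since the epoch.
    A negative result means the certificate has already expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / SECONDS_PER_DAY


# Hypothetical expiry timestamp: the time of day is an assumption.
NOT_AFTER = "Nov 21 12:00:00 2011 GMT"
```

Passing `time.time()` as `now` gives the days remaining today. Replacing the certificate while the result is still comfortably positive avoids any gap in service.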

PEMVZMPS19 Migration


We are migrating PEMVZMPS19 to new hardware tonight to allow for both drives to be replaced.


When: 17th November at 20:00. The maintenance window will be 1 hour. 


What's affected?


78.153.215.160  pemlinweb67.blacknight.com      

78.153.215.161  pemlinweb68.blacknight.com


We will update this blog post once everything is completed.


UPDATE 19:48 - This has been rescheduled due to current issues that have arisen.

Network Latency

Summary: We're currently experiencing some network latency on our shared hosting networks. Our engineers are working to resolve this issue ASAP.

Update: 20:00: Everything bar pemlinweb32 is back up and running. During tonight's outage we've moved cp.blacknight.com and ns1.blacknight.com to different network infrastructure.

pemlinweb32 is still down because the attack that caused this outage is targeted at this node.

The following IPs on this node are currently not reachable.

81.17.254.44
81.17.255.52
81.17.255.91
81.17.255.103

Update: Friday 18th @ 11:00: This issue was fully resolved last night at 21:00.




Windows VPS Platform upgrade - part 4


As part of our ongoing commitment to bring you the latest in technology improvement, we are performing a major upgrade to our Windows VPS platform. The next server will be upgraded next Thursday, 17/11/11, at 23:00 GMT.

The following VPSs are on this server, and will be offline for the period of this upgrade, which should take no more than 90-120 minutes. We apologise for this downtime, and hope no inconvenience will be caused.

Affected VPSs

78.153.208.160
78.153.209.232
78.153.209.161
78.153.210.20
78.153.210.160
78.153.209.166
78.153.210.203
78.153.210.81
78.153.208.217
78.153.208.163
78.153.210.28
78.153.210.88
78.153.209.142
78.153.210.126

PEMVZMPS19 - Reboot

We've noticed a dramatic, sustained increase in the server load on PEMVZMPS19. We need to update this hardware node to its latest kernel.

Thus we need to reboot it to ensure stability.

When: 08:40AM 15th of Nov 2011

Affected? : 

78.153.215.160  pemlinweb67.blacknight.com      

78.153.215.161  pemlinweb68.blacknight.com


We'll update this post once completed. Thanks as always for your understanding.


UPDATE: This is completed.


PEMVZWIN02 Issues

We're currently experiencing some issues with our Windows VPS hardware node: PEMVZWIN02.

We've sent an engineer to resolve this issue ASAP:

Affected VPSs

78.153.208.116  VPS-238
78.153.208.123  VPS-258
78.153.208.127  VPS-282
78.153.208.28   VPS-287
78.153.209.211  VPS-288
78.153.208.168  VPS-331
78.153.208.149  VPS-337
78.153.208.172  VPS-342
78.153.208.169  VPS
78.153.208.171  VPS-344
78.153.209.151  VPS-346
78.153.208.15   VPS-347
78.153.208.62   VPS-353
78.153.208.176  VPS-354
78.153.208.179  VPS-357
78.153.208.184  VPS-362
78.153.210.75   VPS-387
78.153.208.205  VPS-390
78.153.208.222  VPS-409
78.153.208.223  VPS-410
78.153.209.118  VPS-620
78.153.209.160  VPS-664
78.153.209.164  VPS-667
78.153.209.107  VPS-710

UPDATE: The server is booting and the VEs shall be coming back online shortly.

UPDATE 22:13 - VEs are coming back online now without issue. 

pemvzmps19 reboot

Server pemvzmps19 required a reboot. We have started the server now and will update once it is back up.

What's affected?


78.153.215.160  pemlinweb67.blacknight.com      

78.153.215.161  pemlinweb68.blacknight.com


Update 09:40

The host server pemvzmps19 is back up but we are still working on finding what is causing the issue so pemlinweb67 and pemlinweb68 are not yet back online fully. We will update once they are.


Update 10:40

Both servers have been online now for an hour without any issues, we will keep a close eye on this server.


pemvzmps19 reboot

Server pemvzmps19 required a reboot. We have started the server now and will update once it is back up.

What's affected?


78.153.215.160  pemlinweb67.blacknight.com      

78.153.215.161  pemlinweb68.blacknight.com


Update 15:05

Both servers are back online, we will investigate the issue.

DotCo Registry Maintenance

The .co registry will be conducting scheduled maintenance this weekend.
As a result of this work the registry will be offline for new registrations and updates between 1300 and 1500 UTC on Saturday November 12th 2011

Existing domain names will continue to resolve as normal

Windows VPS Platform upgrade - part 3


As part of our ongoing commitment to bring you the latest in technology improvement, we are performing a major upgrade to our Windows VPS platform. The next server will be upgraded next Monday, 14/11/11, at 23:00 GMT.

The following VPSs are on this server, and will be offline for the period of this upgrade, which should take no more than 90-120 minutes. We apologise for this downtime, and hope no inconvenience will be caused.

Affected VPSs

78.153.208.229
78.153.208.230
78.153.208.248
78.153.209.12
78.153.208.96
78.153.208.185
78.153.209.23
78.153.209.26
78.153.209.29
78.153.209.43
78.153.209.44
78.153.209.48
78.153.209.88
78.153.209.98
78.153.209.97
78.153.209.117
78.153.209.123
78.153.209.153
78.153.208.19

I can't send e-mail, what's up?

Summary: Nothing. After the downtime on Monday and Tuesday we put a new mail system in place that deals with POP and IMAP. It's more robust, and faster for you and for us. It's better in many different ways than the old one.
There is a slight drawback: you can no longer send e-mail without explicitly turning on SMTP authentication in Outlook, Apple Mail etc.

We have gathered a few articles for you to help you out on this:

1) https://support.blacknight.ie/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=472
    - this gives you instructions on how to enable this in most common e-mail clients.

OR

2) http://wiki.blacknight.com/index.php/SMTP
    - this gives you a list of SMTP servers for most Irish ISPs; at the bottom is a link to a page that lists SMTP servers for many ISPs outside of Ireland.
An ISP is the company that provides your broadband, cable, dsl, dial up, satellite net connect etc.
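
For anyone sending mail from a script rather than a desktop client, the same requirement applies: the script has to log in before it sends. The following is a minimal Python sketch of an authenticated send, not an official configuration. The function name `send_with_auth`, the use of STARTTLS, and the host, port and credentials you pass in are all assumptions; take the real values from the knowledge base article above.

```python
import smtplib
from email.message import EmailMessage


def send_with_auth(host, port, username, password, sender, recipient,
                   subject, body, smtp_class=smtplib.SMTP):
    """Send one message, authenticating to the SMTP server first.

    smtp_class is a hypothetical hook that lets a stand-in class be used
    for testing; by default the real smtplib.SMTP client is used.
    """
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)

    with smtp_class(host, port) as server:
        # Assumption: the server offers STARTTLS on this port.
        server.starttls()
        # This login call is the SMTP authentication step.
        server.login(username, password)
        server.send_message(message)
    return message
```

A call would look like `send_with_auth("smtp.example.com", 587, "you@example.com", "password", "you@example.com", "friend@example.com", "Hello", "Test message")`, where every value shown is a placeholder.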

I've requested that a newsletter go out to all customers with the above information in it as a technical notice for the next few runs.

Update: 09:35

Webmail users using Firefox 4 or IE 7 or above can use our "alternative" webmail application here: https://altmail.blacknight.com

PEMVZMPS67 Migration


We are migrating PEMVZMPS67 to new hardware tonight to allow for both drives to be replaced.


When: Tuesday 8th November at 21:00. The maintenance window will be 1 hour. 


What's affected?


78.153.215.160  pemlinweb67.blacknight.com      

78.153.215.161  pemlinweb68.blacknight.com


We will update this blog post once everything is completed.


Update 21:55

pemlinweb68 is complete and back online, pemlinweb67 is still ongoing but should complete shortly.


Update 22:30

The migration has completed now and both servers are online, sorry for the delay but it was important to get it completed.


EU (Eurid) Scheduled Maintenance

Eurid, the domain registry for .eu, will be conducting maintenance on their backend on Wednesday November 9th 2011 from 0600 to 0900 CET.

During this period updates, registrations and whois will not be available.

Existing .eu domain names will continue to resolve as normal.

mail.blacknight.com / smtpr1.cp.blacknight.com performance issues

Summary: This mail cluster is having some performance issues this morning. We're working on a fix right now.

Update: 10:45: We have been busy working away trying to resolve this issue. At the moment, however, the cause isn't at all clear, so it's proving difficult to find a fix. This system has been stable since the last round of hardware updates we put in place a couple of weeks ago. The only thing that has changed is that Compellent, our SAN vendor, swapped out the iSCSI cards in the two SAN controllers yesterday. This should not have had a negative impact on the system, but it appears that it has, so we're working with them to find the cause of the problem.

Update: 11:40: We are still working on this issue. It's the top most priority for our engineering and support teams this morning.

Update: 13:00: This issue is still ongoing. Unfortunately we've not made any progress in finding a cause for the slowdown.

Update: 13:25: Some people are getting abusive towards our helpdesk staff. This is not helpful for anyone. The issue at the moment is that while they are working properly they're not fulfilling their duties and thus causing this service issue for you all. We are still working on this issue and are investigating all avenues, including blocking certain services to see if it's some sort of inbound attack on the mail servers.

Update: 14:30: I've removed some of the previous commentary from this thread as it was causing people issues. I'm sorry about that. Right now we're on the phone to Compellent and we're hoping that their escalation team has found something in the logs we sent them.

Update: 14:55: We have currently taken the entire system offline completely and we're examining each part. The Qmail cluster is made up of 4 service groups.

1) SAN + NFS server
2) POP/IMAP/SMTP servers
3) Authentication - LDAP and WHOSOND
4) Mail Scanning / Anti Spam prevention.

We're fairly confident that groups 3 and 4 are functioning perfectly, as we're not seeing the type of problems you would see if they were failing. That leaves the POP/IMAP and SAN systems. The SAN had some cards replaced yesterday by Compellent, so we immediately thought that this was the cause of the problem and asked them to give us back the old cards. They've told us this isn't possible. We've had two hour-long phone calls with them so far today, where we went over all the metrics on the SAN: disk latency, network latency, volume latency, IO throughput etc. Everything on the SAN looks normal. That leaves the NFS server and NFS clients. We would normally see upward of 300Mbit/s of traffic between the clients and the server; today this is showing as 10-20Mbit/s, so it's fairly obvious that the problem is centred around NFS. This is where we are now concentrating all of our efforts: figuring out what is causing this and fixing it.
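
To put those traffic figures in perspective, the arithmetic can be sketched in a few lines of Python. The function name `throughput_fraction` is hypothetical, and 300Mbit/s is used as the baseline even though the normal level is described as "upward of" that, so the real drop may be slightly larger.

```python
def throughput_fraction(observed_mbit: float, normal_mbit: float) -> float:
    """Return observed throughput as a fraction of the normal level."""
    if normal_mbit <= 0:
        raise ValueError("normal_mbit must be positive")
    return observed_mbit / normal_mbit


# Baseline taken from the update above; treated as a lower bound.
NORMAL_MBIT = 300

for observed in (10, 20):
    fraction = throughput_fraction(observed, NORMAL_MBIT)
    print(f"{observed} Mbit/s is {fraction:.1%} of normal")
```

In other words, NFS traffic is running at roughly 3% to 7% of its usual level, which is why that layer is the focus.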

Update: 15:45: A number of people who forward their email on to Gmail, Hotmail etc. have been getting their email all day. This is expected. Inbound SMTP, i.e. mail delivery from others to us, is working OK. The problem is with the POP/IMAP connections from your e-mail clients. For those that asked, all the servers are back online now. We're still seeing the performance issue after the tweaks and changes we've made, but forwarding should be working OK right now. Again, please accept our sincerest apologies for the issues this is causing you all.

Update: 16:50: Sorry about the previous comment. It was a direct response to some customers having issues with forwarding. E-mail is still down but no e-mail will be lost. Again sorry about this outage, it's the single longest outage we have ever had. It is the number 1 priority and has been all day.

Update: 18:45: Sorry about the delayed response since our last update. We believe we have identified the cause of the issue. We're not sure exactly where the problem lies but we can see some weird network traffic between the NFS server and the SAN. We're in discussions with Compellent now to get them to shine some light on the situation.

Update: 20:50: Having spent most of the evening with Compellent they did find a problem with the Write Cache on one of the controllers. This happens to be the primary controller for the mail storage system. So this has been resolved. It hasn't fixed the issue completely but we turned e-mail back on for 30 minutes and we saw a lot more traffic over the network so it looks like we're quite close.

At this point we're going to begin syncing e-mail back to the old mailstore in order to have a fallback. This will also allow us to eliminate the current server as the problem, if that is the cause of the issues we've been having.

Update: 21:15: We're instigating a roll back plan to the old mail storage box until we can nail down what is causing the issues on the newer one. 

Update: 23:30: The roll back plan is going to take a number of hours to put in place. Currently e-mail is syncing back to the old mail store and it's about 25% done right now. Despite Compellent  finding an issue with the Cache settings on the SAN controller for this volume it didn't have a positive impact on the mail performance. So mail is currently completely switched off.

Day Changed to November 8th:

Update: 03:30: The data copy back to the old storage node is progressing well. Will check in on it again at 06:00.

Update: 06:15: The copy to the old storage node is almost completed. The ETA still stands at 09:00 to have mail back up and running.

Update: 06:56: The 9am ETA is this morning Tuesday 8th of November.

Update: 08:45: POP and IMAP have been switched back on. During the night we moved back to mailstore1 and we also converted the mail system away from Courier-IMAP to Dovecot. This change we hope brings significant performance improvements through better indexing and logging. SMTP will take a while to turn back on unfortunately. ETA for smtp is now 11am.

Update: 09:15: People are saying to our helpdesk that they're having problems with IMAP connections. They can't sync folders. We're investigating this now.

Update: 09:45: POP3 seems to be working ok for most customers. IMAP is intermittent and we're trying to figure that out. Webmail relies heavily on IMAP, so when IMAP is fully working so will Webmail.

Update: 10:44: We are working our way through some file permission issues. Once we get these sorted we'll have everything back up. The main issue right now is e-mail delivery and IMAP/Webmail access. We are not going to make the 11am deadline on this unfortunately. The ETA is being pushed out to midday.

Update: 11:55: Right now we have e-mail flowing from the general internet and our inbound scanning boxes into Qmail, so people who are able to get onto POP3 will begin receiving email in the next while. We estimate around 1,000,000 or so e-mails are queued for delivery, a lot of which will bounce because they're spam messages. So far we've seen around 250k of these go into the local delivery queues on the mail servers. So things are progressing, albeit slower than you would like. The reason for this is that we have an abnormally large number of users trying to get their e-mail because of the prolonged outage.

Update: 12:45: We have been working with Parallels to get Dovecot working properly. Dovecot is built to work with NFS storage and is programmed in such a way that it is NFS friendly. We have got it working on 2 of the 4 mail servers currently and we've processed well over 500k mails and delivered them to your inboxes. Some of you may also have noticed that SMTP is working but it's still a little patchy due to the high volume of inbound e-mail however it's not as bad as it was at 11:30. There is still a fair bit of e-mail to get through right now but the system is handling it very well.

Update: 14:10: All e-mail has been delivered to their respective mailboxes at this stage. POP3 is working but not on SSL. IMAP and SMTP are intermittent still but we're close to having those resolved. Also as mentioned earlier IMAP being offline or not working fully means webmail isn't working yet. ETA for full restoration is another 2 hours unfortunately.
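
If you want to see for yourself which mail services are reachable, a rough check is to test whether each port accepts a TCP connection. The Python sketch below does only that. The names `MAIL_PORTS`, `port_accepts_connections` and `check_mail_ports` are hypothetical, the port numbers are the standard ones for each protocol rather than anything confirmed for this platform, and an open port does not prove the service behind it is healthy.

```python
import socket

# Standard port numbers for the protocols mentioned in these updates.
MAIL_PORTS = {
    "SMTP": 25,
    "POP3": 110,
    "IMAP": 143,
    "IMAP over SSL": 993,
    "POP3 over SSL": 995,
}


def port_accepts_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


def check_mail_ports(host: str, timeout: float = 3.0) -> dict:
    """Map each protocol name to whether its port accepted a connection."""
    return {
        name: port_accepts_connections(host, port, timeout)
        for name, port in MAIL_PORTS.items()
    }
```

Calling `check_mail_ports("mail.blacknight.com")` returns a dictionary of protocol names and True/False values. A False for "POP3 over SSL" alongside a True for "POP3" would match the state described in this update.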

Calling support to look for an update is futile, as the engineering team are putting the updates here first and passing the URL on to support. They do not know more than the information being put here.

Update: 16:15: We believe we have nailed down the right combination of limits for IMAP to be stable. We made some changes about 15 minutes ago and we're monitoring connections to it right now. Once we deem it stable we'll turn webmail back on as we're acutely aware that a number of customers only use Webmail.

Update: 17:10: We turned webmail back on at 16:20 this evening. We've been monitoring it closely and so far we're happy with the performance. As of now this issue is finally resolved.

A few points to note:

1) if you used to pop mail and leave it on the server, you'll have to re-download all your e-mail. This is unfortunately unavoidable.
2) we have moved away from Courier-IMAP to Dovecot. Dovecot does some very smart caching on the mail server and this appears to be doing great things for performance.
3) pop before smtp is no longer supported. We appreciate that this might cause issues for customers but unfortunately we can't turn it back on.

We will post an update on our main company blog and here on the status blog with further information about this issue once we've had time to diagnose it fully and produce a report for the management team here.

All services should be functioning normally as of 16:20 this evening.
 


 

Hosted Exchange Mailbox issue

Summary: One of the databases on one of our Mailbox servers is in an inconsistent state. As a result, a number of customers are unable to get access to their e-mail. When we have more information on the issue we'll post an update. I apologize for this issue, as it should have been spotted sooner. A number of actions will be taken internally to ensure that this doesn't happen again.

Update: 14:40: Due to file system corruption on one of our mailbox servers, one of the databases became corrupted. This has been resolved now. To prevent it from occurring again we'll be moving some of the mail from the old mailbox stores to newer servers that we put in place 2 months ago.

mail.blacknight.com / smtpr1.cp.blacknight.com pop/imap/smtp performance degradation

Summary: We have reported an odd issue to Compellent about drop-offs in performance on our SAN. They've come back to us saying that they wish to replace all 4 dual-port 10GigE cards in both controllers as a result of the issue we are seeing.

While the issue at hand isn't causing a lot of problems, it could do in the coming hours. Compellent have dispatched an engineer with all new cards for both controllers. The SAN in this instance provides the storage for all of the Qmail services that we currently operate. The connections from the SAN to the mailstore and other services are working OK, i.e. all 8 logical paths to the storage network are working, but we are seeing a high error rate on the ports in the controllers which could cause performance issues.

At approx 17:00 today we'll be replacing all the cards in the SAN while it is online. This can be done because it's built in a resilient fashion. We'll post an update in a few hours once we have firm timelines from Compellent.

mail delivery delay last night Nov 3rd

Summary: We have had some complaints that mail delivery was delayed last night. We have found that the cause of this is similar to an issue we've found on other machines with very new 10GigE interfaces. There is a new driver that we will apply to all machines involved and that will prevent the issue from occurring again.

This affected inbound e-mail into our spam scanning server that sits in the Cloud. No e-mail was lost during this window. It was resolved around 00:00 last night.

pemvzlin13 down

Summary: This machine is down. An engineer is looking into the issue.

VPS affected:

78.153.208.024
78.153.208.183
78.153.209.035
78.153.209.104
78.153.209.186
78.153.209.221
78.153.210.019
78.153.210.059
78.153.210.102
78.153.208.061
78.153.210.208
78.153.210.131
78.153.211.022
78.153.209.236
78.153.208.055
78.153.211.045
78.153.211.050
78.153.209.220
78.153.208.253
78.153.211.176
78.153.209.171
78.153.211.036
78.153.208.227
78.153.209.101
78.153.210.013
78.153.210.252
78.153.211.026
78.153.211.058
78.153.211.124
78.153.211.129
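
Note that the addresses above are written with zero-padded octets (for example 78.153.208.024). Some tools read a leading zero as octal, so it is safer to compare addresses in plain decimal form. The following Python sketch is purely illustrative: the function names are hypothetical, and the sample list holds only three of the addresses above.

```python
def normalize_ipv4(text):
    """Return a dotted-quad address with zero padding removed.

    Each octet is read as decimal, so "024" becomes "24".
    """
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected four octets: %r" % text)
    octets = [int(part, 10) for part in parts]
    if any(octet < 0 or octet > 255 for octet in octets):
        raise ValueError("octet out of range: %r" % text)
    return ".".join(str(octet) for octet in octets)


def is_affected(address, affected_list):
    """Check whether address appears in affected_list, ignoring zero padding."""
    normalized = {normalize_ipv4(item) for item in affected_list}
    return normalize_ipv4(address) in normalized


# Sample only: three entries copied from the list above.
SAMPLE_AFFECTED = ["78.153.208.024", "78.153.209.035", "78.153.210.019"]
```

With this sample, `is_affected("78.153.208.24", SAMPLE_AFFECTED)` is true even though the list entry is written as 78.153.208.024.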

Update: 08:50: It looks like a drive failure caused the RAID array to go offline on this host. The machine is back up now and we'll replace the drive today.

All the VPS are booting now, quite a number are already back online.

Update: 09:30: There are 9 VMs left to boot right now. Their VEIDs are:

2279
2258
2254
3073
2484
3062
2877
2221
2311

The reason for this is that a quota check / fsck equivalent is required to boot the VMs after the node went offline.

Update: 10:25: There are 2 containers still running disk checks, they are:

2484
2877

We believe they should be back by 11am at the latest.

Update: 11:45: All containers have now started.

Windows VPS Platform upgrade - part 2

As part of our ongoing commitment to bring you the latest in technology improvement, we are performing a major upgrade to our Windows VPS platform. The second server will be upgraded next Monday, 07/11/11, at 23:00 GMT.

The following VPSs are on this server, and will be offline for the period of this upgrade, which should take no more than 90-120 minutes. We apologise for this downtime, and hope no inconvenience will be caused.

Affected VPSs

78.153.208.116
78.153.208.127
78.153.208.28
78.153.209.211
78.153.208.168
78.153.208.172
78.153.208.169
78.153.208.171
78.153.209.151
78.153.208.62
78.153.208.176
78.153.208.179
78.153.208.184
78.153.210.75
78.153.208.222
78.153.208.223
78.153.209.118
78.153.209.160
78.153.209.164
78.153.209.107

Disk Replacement - Pemvzmps53

Our hardware node PEMVZMPS53 will be brought offline for a period of 15 minutes to replace a failed disk in its RAID array.

When: 02/11/11 18:30

Services Affected: 
pemlinweb53.blacknight.com
pemlinweb54.blacknight.com

We'll update this blog post once completed.

Update 18:45
This completed at 18:45

Disk Replacement - Pemvzmps67

We have been alerted to a failed disk within the RAID array of PEMVZMPS67. Because of this we are going to bring this node offline tonight to replace the dead disk.


When: Wednesday 2nd November at 18:10. The maintenance window will be 1 hour. 


What's affected?


78.153.215.160  pemlinweb67.blacknight.com      

78.153.215.161  pemlinweb68.blacknight.com


We will update this blog post once everything is completed.


Update 18:20

This completed at 18:20


Network issue InterXion Dub01

Summary: We're having a slight network issue in InterXion Dub01 at the moment. We have engineers on site and we're looking to find the cause and hopefully we'll fix it very shortly.

Services affected: Some access to MySQL servers, some access to some websites, intermittent e-mail access.

Symptoms included, but were not limited to: Some shared hosts not being able to connect to MySQL servers, with unknown host errors. Some dedicated and colocated servers were inaccessible because the IP space they were on predates this data centre. Various oddities within the network that people might have observed. DNS-related lookup problems which may have caused slowdowns for login attempts to various systems.

Update: 12:20: this issue is fully resolved now.

Windows VPS Platform upgrade

As part of our ongoing commitment to bring you the latest in technology improvement, we are performing a major upgrade to our Windows VPS platform. The first server will be upgraded tomorrow (Wednesday 02/11/11) at 23:00 GMT.

The following VPSs are on this server, and will be offline for the period of this upgrade, which should take no more than 90 minutes. We apologise for this downtime, and hope no inconvenience will be caused.

Affected VPSs

78.153.208.57
78.153.208.86
78.153.208.91
78.153.208.94
78.153.208.111
78.153.208.67
78.153.209.136
78.153.208.155
78.153.211.69

pemlinweb95, pemlinweb96, pemlinweb97 and pemlinweb98 Issues

pemlinweb95, pemlinweb96, pemlinweb97 and pemlinweb98 are currently offline. We are investigating the issue and will post an update.

Servers Affected:
pemlinweb95 78.153.215.233
pemlinweb96 78.153.215.234
pemlinweb97 78.153.215.235
pemlinweb98 78.153.215.236

Update @ 12:04pm:  Servers are back up and we will investigate the cause further.