October 2011 Archives

Extreme amounts of incoming traffic

We're currently experiencing high volumes of traffic flooding into our network. Our engineers are working to resolve this.

Your services may be slow to respond as a result.

10:53 - This issue is still ongoing as our engineers continue to track down the cause.

Update 11:25: The network has been stable for the last 10 minutes, since 11:15. We found the destination of the attack and blocked it within our network. Normally this would be much quicker to do, and we're going to investigate ways to find these types of attacks more quickly so that they are less troublesome.
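As a rough illustration of how the destination of a flood can be located, the sketch below tallies bytes per destination IP across a set of flow records and reports the busiest one. The record format, the `top_destination` helper and the sample figures are all hypothetical, and this is not necessarily how the attack above was traced.

```python
from collections import Counter


def top_destination(flows):
    """Return (ip, total_bytes) for the destination receiving the most traffic.

    `flows` is an iterable of (src_ip, dst_ip, byte_count) tuples; this
    record shape is an assumption made for the example.
    """
    totals = Counter()
    for _src, dst, nbytes in flows:
        totals[dst] += nbytes
    if not totals:
        return None
    return totals.most_common(1)[0]


# Made-up sample records using documentation address ranges, not real traffic.
sample_flows = [
    ("198.51.100.7", "192.0.2.10", 1_200),
    ("198.51.100.8", "192.0.2.44", 900_000),
    ("203.0.113.5", "192.0.2.44", 750_000),
    ("203.0.113.9", "192.0.2.12", 3_400),
]

print(top_destination(sample_flows))
```

Once the busiest destination is known, traffic to it can be blocked or blackholed upstream so the rest of the network stays reachable.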

Currently the shared IP address of pemlinweb32, 81.17.254.44, is blackholed and as such no traffic can get to it. If your site is still down, this is why. We hope to have a resolution shortly and get this back up and running as well. Thank you for your patience.

Update 15:40
Access to pemlinweb32 81.17.254.44 is now restored. Thank you for your patience.

Linux Legacy Hosting - Gorlois

We are currently experiencing issues with our legacy shared hosting Linux server - Gorlois.

Our engineers are working to get the issue resolved and the server back online ASAP.

Scheduled Maintenance pemvzwin13


The above Windows VPS Hardware node requires a reboot to correct an issue.

This will be carried out at 22:00 this evening (Thursday 27/10)

The following customer VPSs will be affected:

78.153.196.130
78.153.196.131
78.153.196.133
78.153.196.135
78.153.196.138
78.153.196.139
78.153.196.144
78.153.196.145
78.153.196.152
78.153.196.155
78.153.196.153
78.153.196.161
78.153.196.167
78.153.196.170
78.153.196.171
78.153.196.172
78.153.196.173
78.153.196.177
78.153.196.178
78.153.196.180
78.153.196.65
78.153.196.62
78.153.197.33

 

We apologise for any inconvenience this may cause.

Main site flagged as containing malware by Google

It would appear that a third-party site that we were using to include some HTML on our main website was compromised, and this resulted in the main Blacknight website being flagged by Google as containing malware. We have removed the included code from our website and are currently awaiting review by the Google webmasters team. We hope to have an update and the problem resolved shortly.

Update: 08:10: This also affects the control panel, webmail etc. Any service originating from the blacknight.com domain name is affected.

Update: 11:50: Google have removed the block, and access to all services such as Webmail, Exchange OWA and our main site has been restored. It will probably take a bit more time for all browsers to pick up the change.

Scheduled Maintenance mysql452

The MySQL Server mysql452.cp.blacknight.com with IP address 81.17.254.3 will be offline for approximately 45 minutes from 9pm Wednesday the 26th of October for scheduled maintenance.

Update 21:06

Server is back online.

pemlinweb08 Upgrade 9pm 25/10/2011

pemlinweb08 is being upgraded tonight, 25/10/2011, at 9pm. We expect the upgrade to take up to 2 hours.

Shared hosting server affected:
Pemlinweb08

Update 22:10
Server is back online after upgrade

mail.blacknight.com / smtpr1.cp.blacknight.com SSL cert update

Summary: It has been 2 years since we last had to update the SSL cert for mail.blacknight.com. Yesterday we applied the new SSL cert on the mail servers. At the time we didn't include the CA bundle. However, this morning we've added the CA bundle, so this should mean fewer errors in mail clients.

If you have a mail client that is throwing an error because the cert changed, please tell your mail client (Outlook, Thunderbird, Apple Mail etc.) to accept the new certificate.


PEMVZWIN16 Scheduled Maintenance

PEMVZWIN16 requires a reboot for some updates. The server will be rebooted at 7:45am Tuesday the 25th of October.

VPSs Affected:
78.153.196.96       
78.153.196.141     
78.153.196.237     
78.153.196.248            
78.153.196.249      
78.153.196.251   
78.153.196.254      
78.153.197.1
78.153.197.5       
78.153.197.9        
78.153.197.12       
78.153.197.10  
78.153.197.13       
78.153.197.2      
78.153.197.14 
78.153.197.15      
78.153.197.16   
78.153.197.17     
78.153.197.18      
78.153.197.20     
78.153.197.21   
78.153.197.30   
78.153.197.23   

Update 08:00
Server has rebooted and VPSs are starting

pemvzlin02 difficulties

The above Linux VPS Server seems to be having difficulty this morning. We're working on it now to get it back. This has a number of VPS on it and their primary IPs are:

78.153.208.8
78.153.208.32
78.153.208.50
78.153.208.41
78.153.208.17
78.153.208.79
78.153.208.95
78.153.208.100
78.153.208.122
78.153.208.36
78.153.208.119
78.153.208.193
78.153.208.196
78.153.209.110
78.153.209.144
78.153.209.182
78.153.209.191
78.153.209.201
78.153.209.209
78.153.209.215
78.153.209.224
78.153.209.228
78.153.209.229
78.153.208.186
78.153.208.46
78.153.210.2

We hope to have it back up and running asap.

Update: 11:10 The machine has successfully come back up after a reboot. However, we're currently running some diagnostics on it to see whether we can find out exactly what is causing the hanging that has been occurring. This diagnosis is impossible when the VPSs are online, so please hang in there while we work on it.

Update: 11:40 We've replayed the logs from last night and we can see that the machine ran out of steam at 02:20, and at 04:50 it stopped responding for most containers that are on it. We can also see that the cause appears to be two conflicting backup processes. The CDP kicked off at midnight and then around 02:00 the internal Virtuozzo backups kicked off. Obviously this is not a good idea, so we're going to disable user scheduled backups on this node. It's far safer to have CDP backups than the internal Virtuozzo ones.
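To illustrate the kind of clash described above, here is a minimal sketch that flags backup jobs whose run windows overlap. Times are minutes after midnight. The start times follow the update above, but the durations, the job names and the `overlapping_jobs` helper are assumptions made purely for the example.

```python
from itertools import combinations


def overlapping_jobs(jobs):
    """Return pairs of job names whose run windows overlap.

    `jobs` maps a job name to (start_minute, length_minutes). Two windows
    overlap when each one starts before the other one ends.
    """
    windows = {name: (start, start + length) for name, (start, length) in jobs.items()}
    clashes = []
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(windows.items(), 2):
        if a_start < b_end and b_start < a_end:
            clashes.append((a, b))
    return clashes


jobs = {
    "cdp": (0, 180),         # starts 00:00; the 180-minute length is assumed
    "virtuozzo": (120, 90),  # starts 02:00; the 90-minute length is assumed
}

print(overlapping_jobs(jobs))
```

A check like this could be run against a node's backup schedule before a new job is added, so that two disk-heavy jobs are never due to run at the same time.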

FYI we're also running a forced fsck (disk check) on the filesystem that houses the VPSs on this node to ensure that it is in a consistent state. So far it hasn't found any issues but we want to be 100% sure. In a recent DR test we found that the bare metal restored filesystem mounted fine but had some underlying issues. A forced fsck fixed the issues we found.

Update: 12:20 All our checks and testing were completed at around 12:10. We rebooted the machine once again but unfortunately it is now doing another fsck on the /vz file system. This will take about 60 minutes to complete.

Update 12:45: We've managed to get the machine to skip the disk check as we know there is no issue with it. It is booted now and containers are starting.

ETA for being back online is on or before 12:45


mail.blacknight.com / smtpr1.cp.blacknight.com brief outage notification

Summary: At 23:00 tonight, Friday 21st of October, we're going to update the drivers for the 10GigE network cards that provide connectivity to the SAN. To do this we have to take mail down for a window of approx 15 minutes.

This is to hopefully resolve a bug that we've seen manifest a number of times since the mail storage platform was upgraded last Friday night. Once this is complete we hope that the mail system will function better moving forward.

Update 23:30: The driver upgrade was successful and mail was down for approx 20 minutes. We're monitoring this situation very closely. 

mail.blacknight.com / smtpr1.cp.blacknight.com mail delivery delays

Summary: 2 of the 4 servers that deliver mail to mailboxes currently have a few thousand messages in their queues. We expect this to die down in the coming 60-90 minutes, as it was due to a number of large mailers from various companies hosted outside the qmail cluster. As a result the spam scanning servers are very busy.

Additionally, we're moving the primary spam scanning box to new hardware, which should improve performance tenfold and reduce delivery times in periods of high inbound email load.

Update:
The upgrade has greatly improved performance.

Emergency Maintenance - pemvzlin13

The server pemvzlin13 needs to be rebooted for emergency maintenance purposes.  The server will be taken offline to perform a standard file system check and then rebooted.

The maintenance window will start at 07:00 tomorrow morning (20/10/2011) and is expected to take between thirty minutes and two hours, depending on the speed of the file system check.

The following IP addresses will be affected:

78.153.208.24
78.153.208.55
78.153.208.61
78.153.208.146
78.153.208.183
78.153.208.207
78.153.208.227
78.153.208.253
78.153.209.35
78.153.209.101
78.153.209.104
78.153.209.168
78.153.209.171
78.153.209.186
78.153.209.216
78.153.209.220
78.153.209.221
78.153.209.236
78.153.210.13
78.153.210.19
78.153.210.59
78.153.210.102
78.153.210.131
78.153.210.167
78.153.210.185
78.153.210.208
78.153.210.252
78.153.211.13
78.153.211.15
78.153.211.22
78.153.211.26
78.153.211.36
78.153.211.45
78.153.211.50
78.153.211.54
78.153.211.58
78.153.211.124
78.153.211.129
78.153.211.155
78.153.211.157
78.153.211.161
78.153.211.172
78.153.211.176
78.153.211.177

Update 8:00

Maintenance completed

pemvzlin02 having issues

Summary: pemvzlin02 seems to be having difficulty this morning. We're working on it now to get it back. This has a number of VPS on it and their primary IPs are:

78.153.208.8
78.153.208.32
78.153.208.50
78.153.208.41
78.153.208.17
78.153.208.79
78.153.208.95
78.153.208.100
78.153.208.122
78.153.208.36
78.153.208.119
78.153.208.193
78.153.208.196
78.153.209.110
78.153.209.144
78.153.209.182
78.153.209.191
78.153.209.201
78.153.209.209
78.153.209.215
78.153.209.224
78.153.209.228
78.153.209.229
78.153.208.186
78.153.208.46
78.153.210.2

We hope to have it back up and running within the next 15 minutes

Update 9:10
VPS servers are now coming back online

pemlinweb71 & pemlinweb72 Scheduled Maintenance


pemlinweb71 & pemlinweb72 will be offline this evening the 17th October for approximately 45 minutes from 9pm to allow for migration to hardware.


Servers Affected:
pemlinweb71 78.153.215.164
pemlinweb72 78.153.215.165

Update 21:40
The migration is going more slowly than expected, but it's 100% required to ensure continuity of service. We will update as soon as we have more information.

Update 22:00
pemlinweb72 78.153.215.165 has been migrated and booted. pemlinweb71 is ongoing.

Update 22:20
Both servers are now migrated and online.

mail.blacknight.com

We're currently experiencing issues with our qmail cluster, mail.blacknight.com - Our engineers are working to resolve this ASAP and all updates will be posted here as they become available.

UPDATE 11:32 - All services have been resumed. 

pemlinweb71 & pemlinweb72 issues

pemlinweb71 & pemlinweb72 are currently offline. We are investigating the issue and will update.

Servers Affected:
pemlinweb71 78.153.215.164
pemlinweb72 78.153.215.165

An engineer has been dispatched to the data center to resolve.

UPDATE 09:57 - There was an issue with the servers' RAID. It has been resolved and a file system check is currently in progress.

UPDATE 10:27 - Both servers are back online.

mail.blacknight.com / smtpr1.cp.blacknight.com major maintenance

Summary: Following on from this week's earlier maintenance, we're going to do the final move of e-mail to the new storage platform tonight, October 14th.

We'll take mail offline again at 23:00 hours for approx 2 hours until 1am. This move is essentially just an rsync of data from the old system to the new system and that's it. We've already seeded the data to the new platform using a recent backup so there isn't a huge amount of data to be copied. We will then re-mount the new drives onto the mail servers and begin the mail flow once again after some essential tests.

Tomorrow Saturday 15th we'll monitor everything closely and see how it all performs.

Update: 01:20: We've gone over time on this, as the difference between the restored backup and the data on the old mail storage node is more significant than we expected. The copy is approx 50% done and we estimate about another hour to two hours for completion.

Update: 03:30: The copy is going very well and shouldn't take too much longer. The current estimate for completion is approx 04:15.

Update: 04:15: It's very close to being completed right now. There's probably another 30-40 minutes max left. ETA is now 05:00.

Update: 04:45: Mail has been back and stable for the past 10 minutes. We have full visibility on the storage platform's performance now and everything is well within specified parameters.

pemlinweb71 & pemlinweb72 Issues

pemlinweb71 & pemlinweb72 are currently offline. We are investigating the issue and will update.

Servers Affected:
pemlinweb71 78.153.215.164
pemlinweb72 78.153.215.165

Update 8:50
Servers are back online after a reboot

pemvzlin02 having issues

Summary: pemvzlin02 seems to be having difficulty this morning. We're working on it now to get it back. This has a number of VPS on it and their primary IPs are:

78.153.208.8
78.153.208.32
78.153.208.50
78.153.208.41
78.153.208.17
78.153.208.79
78.153.208.95
78.153.208.100
78.153.208.122
78.153.208.36
78.153.208.119
78.153.208.193
78.153.208.196
78.153.209.110
78.153.209.144
78.153.209.182
78.153.209.191
78.153.209.201
78.153.209.209
78.153.209.215
78.153.209.224
78.153.209.228
78.153.209.229
78.153.208.186
78.153.208.46
78.153.210.2

We hope to have it back up and running within the next 15 minutes

Update 9:00

All VPS are now back online

mail.blacknight.com / smtpr1.cp.blacknight.com mail download slowness

Summary: Since approx 11:30 this morning people have been experiencing some delays in downloading e-mail: it appears to download very slowly and sometimes times out. Between 11:30 and approx 13:30 is the busiest time of day for the mail servers, and we suspect that a rebuilding RAID array may be causing this issue. The array should be rebuilt completely by mid to late afternoon today, at which time things should improve dramatically. We can't really do anything to speed this up in the interim.

Additionally once we perform the major upgrade later this week of the storage backend we expect mail to function 100% and perform much much better than it has been.

Update @ 16:45: The mail RAID rebuild is still continuing at this time.

Update: 19:55: The RAID array that powers the current mail cluster finished rebuilding at 18:45:54 this evening. We expect its performance to return to normal now.

pemwinweb24, 25 and 26 outage

Summary: These 3 virtual machines are in the bottom of the rack where the mail equipment was being moved from. The power cable got snagged when we were removing the equipment from the rack.

The following IPs were on these nodes:

pemwinweb24

81.17.250.61
81.17.250.168
81.17.250.185
81.17.250.210

pemwinweb25

81.17.250.62
81.17.250.169
81.17.250.173
81.17.250.181
81.17.250.202

pemwinweb26

81.17.250.63
81.17.250.171
81.17.250.175
81.17.250.199

We are working to get this node back online as soon as possible.

Update: 03:00: The above servers are still down due to an issue with the file system on this machine. We're attempting to do a recovery of some system files but this is taking a long time. We have some other options on the cards such as restoring backups. We'll post more updates as they become available.

Update 07:15 The above servers are now back online, we have repaired the file system and the servers have booted.

pemlinweb91 & pemlinweb92 Outage

pemlinweb91 & pemlinweb92 are currently offline, we are investigating the issue and will update.

Servers Affected:
pemlinweb91 78.153.215.218
pemlinweb92 78.153.215.219

Update 16:15

Node has been rebooted and both servers are back online.

mail.blacknight.com / smtpr1.cp.blacknight.com major maintenance

Summary: At 23:00 on Monday 10th of October we're going to take the mail cluster offline completely for approx 4 hours until 03:00. This is to facilitate the following:

1) Upgraded storage space for the mail store
2) Upgraded performance for the mail store
3) Upgraded file system from ext3 to ext4 for performance reasons

We're doing this in stages.

Stage 1)

We've restored the existing mail system from a recent backup to a new location. We'll begin copying that data to the new Mail Storage node which has 20Gbit/s of connectivity to our SAN. We estimate that this will take approx 24-48 hours to complete.

Stage 2)

We will then take the mail system offline again in a couple of nights to do a final sync of the changes on the new system. This will depend largely on the amount of time it takes to do the copy over to the SAN. Estimates range from 36 to 50 hours approx.

Tonight we're simply relocating the servers to a new rack in close proximity to our SAN. And while they're down we're doing a clean up of old mailboxes and purging data that is no longer required.

Update: 03:30: Everything went as planned tonight regarding this move. The servers are now in their new home. This week we hope to swap the storage backends. We'll put a further maintenance window in place for this with an ETA (as of now) for Friday night this week. 

pemvzlin02 having issues

Summary: pemvzlin02 seems to be having difficulty this morning. We're working on it now to get it back. This has a number of VPS on it and their primary IPs are:

78.153.208.8
78.153.208.32
78.153.208.50
78.153.208.41
78.153.208.17
78.153.208.79
78.153.208.95
78.153.208.100
78.153.208.122
78.153.208.36
78.153.208.119
78.153.208.193
78.153.208.196
78.153.209.110
78.153.209.144
78.153.209.182
78.153.209.191
78.153.209.201
78.153.209.209
78.153.209.215
78.153.209.224
78.153.209.228
78.153.209.229
78.153.208.186
78.153.208.46
78.153.210.2

We hope to have it back up and running asap.

UPDATE 08:08 - The server is currently running a file system check on its main partition, ETA 20 mins.

UPDATE 08:43 - File System Check is still running but currently at 67.7%

UPDATE 10:59 - We are still experiencing issues with the file system. We are working continually to fix it ASAP. We appreciate your patience.

UPDATE 11:29 - Again, thanks everyone for your patience. The server is now back online and your VPSs are currently booting.

Billing upgrade - Thursday 6th at 03:00

Summary: Parallels have informed us of an update for our billing system which prevents a number of serious problems from recurring. Namely, it will prevent more than 1 daily billing process from running. After the upgrade from PBA 5.0 to 5.1 we had a situation where up to 3 daily billing processes were able to run. This resulted in around 150 subscriptions being double billed. This update fixes this problem.
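For readers curious how a job can be limited to a single running instance, one common general technique is an exclusive lock file. The sketch below is only an illustration of that idea on a Unix-like system and is not a description of how the Parallels update works; the lock path and the `try_acquire` and `run_daily_billing` names are hypothetical.

```python
import fcntl
import os
import tempfile

# Hypothetical lock location chosen for the example.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "daily_billing.lock")


def try_acquire(path=LOCK_PATH):
    """Return an open lock file if nobody else holds the lock, else None."""
    handle = open(path, "w")
    try:
        # Non-blocking exclusive lock: fails at once if another holder exists.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_daily_billing(bill):
    """Run `bill()` only if no other run holds the lock; report whether it ran."""
    handle = try_acquire()
    if handle is None:
        return False
    try:
        bill()
        return True
    finally:
        fcntl.flock(handle, fcntl.LOCK_UN)
        handle.close()
```

With a guard like this, a second run started while the first is still in progress returns without billing anyone, which is the behaviour needed to avoid double billing.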

Between 03:00 and 06:00 the store will be up and down, also the billing tab in the control panel will be down for much of it.

Legacy Shared Windows server - dinadan


This server is currently experiencing difficulties serving HTTP traffic.

 

We are aware of the issue, and hope to have a resolution as soon as possible.


Update 17:30

Issue is now resolved

Linux Shared Hosting PEMLINWEB15/16 Reboot

In an effort to keep all shared hosting environments up to date with the most recent kernels, tonight we need to reboot the hardware node that both pemlinweb15 and pemlinweb16 live on.

The estimated downtime is no more than 10 minutes.

When: 03 Oct 2011 @ 20:00

What's affected: 

pemlinweb15.blacknight.com
pemlinweb16.blacknight.com

Phone issues

We are currently experiencing issues with our phones in our Carlow offices

UPDATE 13:00 The phones appear to be working properly at present; however, we are still investigating what caused the problem.

UPDATE 13:28 We are still looking into the cause of the issue, but rebooting one of the firewalls seems to have fixed it. At present the network in the office and our phone lines appear to be stable, so we'll tentatively say that this issue has been resolved.

gorlois.blacknight.ie outage

Services on gorlois.blacknight.ie have become non-responsive and the server requires a reboot.

We hope that the server will return to normal after the reboot; however, engineers will be monitoring it in case there are further issues.

Update @ 11:08:

The issue has now been resolved

Scheduled Maintenance mysql452

The MySQL Server mysql452.cp.blacknight.com with IP address 81.17.254.3 will be offline for approximately 30 minutes from 7am Tuesday the 4th of October for scheduled maintenance.