February 2009 Archives

Linux VPS node - pemvzlin05 currently down

One of our Linux VPS nodes, pemvzlin05, is currently experiencing issues, so if your VPS appears to be down it could be on this node.

We hope to have service back up within the next few minutes, though our engineers will need to wait until service is restored before they can fully investigate the issue and resolve any lingering problems.

Update: 12:23

All containers on this node have now been started and it is back to normal. The problem is related to a back office issue, which we're going to resolve with the help of Parallels.

Emergency Notification - Plesk on Windows VPSs

The following VPSs will need to be rebooted to fix an issue occurring with Plesk on one of the Windows hardware nodes.

78.153.208.143
78.153.208.183
78.153.208.251
78.153.209.106
78.153.209.162
78.153.209.234
78.153.209.240
78.153.208.109
78.153.208.110
78.153.208.90
78.153.208.163
78.153.209.132
78.153.209.161
78.153.209.226

It is proposed to carry out this maintenance at 12:00 midday today. Expected downtime is 20 minutes.

Emergency Maintenance - Windows VPSs

An issue has arisen on one of our Windows VPS hardware nodes which has caused Plesk not to function as expected for all VPSs on this node. The list of affected VPSs is as follows:

78.153.208.229
78.153.208.230
78.153.208.209
78.153.208.254
78.153.208.248
78.153.208.249
78.153.208.251
78.153.209.12
78.153.208.96
78.153.208.185
78.153.209.23
78.153.209.26
78.153.209.29
78.153.209.159
78.153.209.40
78.153.209.43
78.153.209.44
78.153.209.48
78.153.208.97
78.153.209.59
78.153.209.57
78.153.209.65
78.153.209.84
78.153.209.88
78.153.209.106
78.153.209.98
78.153.209.97
78.153.209.117
78.153.209.123
78.153.209.133
78.153.209.68
78.153.209.134
78.153.209.138
78.153.209.95
78.153.209.152
78.153.209.150
78.153.209.153
78.153.209.162
78.153.208.19
78.153.209.200
78.153.209.167

These VPSs will need to be rebooted. I propose to carry out this reboot at 23:00 GMT today.

Update: 23:50

This maintenance window was successful and is now closed.

Intermittent email issues on cp.blacknight.com mailservers

If your hosting package is through cp.blacknight.com (minimus, medius, maximus) and you are having some email issues either receiving, sending, or with the webmail, then it is most likely related to an issue we are currently having on our new shared mailserver cluster.

The load balancer that divides the email load between our shared mailservers is not functioning fully at the moment and is sending most mail through a single server rather than sharing it equally. As a result, that mailserver is being overloaded, which is causing intermittent mail issues for some users.

Our engineers are looking into this now and we hope to have it resolved as soon as we can.

Update: 12:15 - This issue should be fully resolved now. We have liaised with the control panel vendors about this and it looks like the issue was due to a configuration problem where one mailserver was taking the vast majority of email requests because it was identifying itself as the load balancer.

Issues With Shared Server Morgana

There have been intermittent issues with the Linux shared server Morgana this evening.

Our technical staff are aware of the issue and are working on it.

Update: 23:30

We have isolated the site causing the issue and disabled it. All services should now be normal.

OpenSRS / Tucows Maintenance

OpenSRS / Tucows, who we use for some ccTLD registrations as well as .tel, have informed us of a maintenance window this weekend.

Domain registration and management for any TLDs via OpenSRS will be unavailable between 23:00 UTC on February 28 and 03:00 UTC on March 1, 2009.

Currently registered domains will not be affected, although no changes / updates or new registrations will be possible.

Affected TLDs include:
.es
.asia
.be
.at
.ch
.it
.tel
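
For anyone scripting domain changes around this window, here is a minimal sketch of a check, assuming the times and TLDs exactly as listed in this notice. The function name `changes_blocked` is our own invention for illustration, and since the notice only says the affected TLDs "include" the ones above, the set may not be exhaustive.

```python
from datetime import datetime, timezone

# Window as stated in the notice: 23:00 UTC on 28 Feb to 03:00 UTC on 1 Mar 2009.
WINDOW_START = datetime(2009, 2, 28, 23, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2009, 3, 1, 3, 0, tzinfo=timezone.utc)

# TLDs named in the notice (possibly not a complete list).
AFFECTED_TLDS = {"es", "asia", "be", "at", "ch", "it", "tel"}


def changes_blocked(domain: str, when: datetime) -> bool:
    """Return True if `domain` uses a listed TLD and `when` is inside the window.

    `when` must be timezone-aware; the end of the window is treated as exclusive.
    """
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in AFFECTED_TLDS and WINDOW_START <= when < WINDOW_END
```

A call such as `changes_blocked("example.tel", datetime(2009, 3, 1, 1, 0, tzinfo=timezone.utc))` would report that the change should wait until the window closes.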

Shared server "Morgana" is currently down

Morgana has been having intermittent service issues this evening. An engineer is currently working on it and we'll update this ticket when we have anything to report.

Update: 21:00

The server has been back up since 20:45. We're monitoring the situation for the next few hours and this ticket will remain open until we're satisfied that the server has stabilised. We believe this issue was caused by a surge of traffic to a popular blog that is hosted on this machine.

Accidental auto-responder on Helpdesk

We've all had those bad Monday mornings and unfortunately today was no exception for one of our developers just back from her holidays!

An auto-responder she had set up before she left got a bit out of control this morning when she turned her PC on. If you have received a new reply to any older tickets from our helpdesk, please ignore it.

All tickets replied to by the auto-responder have now been closed. If one of those tickets had not been fully resolved, please respond and let us know you are still having an issue.

Mail auth issues on qmail cluster

We're investigating an issue with our qmail cluster at the moment. This issue is causing authentication problems for POP and IMAP; however, mail is still being received into the system, so it's not being lost.

I'll post an update as soon as we've got further information.

Update: 11:15

This issue is now resolved. The main problem was IMAP auth: further testing showed that POP3 was working but IMAP wasn't. We've restarted services and rebooted each mail server in succession, which has resolved the issue.


cp.blacknight.com throwing unforeseen errors

Currently our primary control panel is throwing errors for new logins. We're working on this issue and we've escalated it to our control panel vendors.

The ETA is currently unknown.

Update: 16:05

The control panel has been back up since 16:05. The Tomcat process that drives the UI for the control panel has been having issues of late and we're awaiting a fix from Parallels to get rid of the problem. In the meantime we're monitoring this very closely and we'll post further updates here as we have them.

Eurid Scheduled Downtime Tuesday 17 February 2009

Eurid have informed us that they will be conducting maintenance on the registry backend on Tuesday 17th February 2009 between 1800 and 2300 CET.

During this window existing .eu domains will not be affected, but WHOIS, DAS and registration services will be impacted.

Full details as supplied by Eurid are below:

EURid will perform system maintenance on Tuesday, 17 February. The
maintenance will be carried out during the maintenance window, between 18:00
and 23:00, Central European Time. We expect to limit downtime to the first
half hour.

During the downtime, we will implement a bug fix to ensure that transfer
authorisation codes are sent under all conditions. We will also fine-tune
the public DAS and WHOIS services in response to the brief, unexpected
downtimes that occurred on Tuesday morning and earlier today.


Network Issues

We have been experiencing some network issues this afternoon, resulting in packet loss.

Our technical team are working on a full resolution and a more detailed report will be published as soon as it is available.

Update: 16:40

The issue is resolved and we're investigating the problem. We believe a compromised machine was sending out large volumes of UDP traffic that was causing the packet loss. Once we've got further information we'll send it out.

Update: 16:55

We've narrowed this issue down to the HA firewall pair in DEG that protects certain IP ranges, which was running out of available sessions. Currently this equipment is an HA pair of Cisco ASA 5520 firewall appliances. They have a maximum concurrent session count of 280k.

bk1-fw3# sh conn count
185559 in use, 280000 most used


The "most used" figure above is the maximum this firewall pair can handle. We're ordering another pair that is similarly configured and we'll move some networks off this pair. This will prevent this issue from occurring again.

However, *note* that the networks hanging off this firewall are considered at risk until we've put in new hardware. If you have any questions please let us know.
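
To give a rough idea of how the figures above can be read, here is a minimal sketch that parses the connection count output and works out how close the pair is to its ceiling. The 280,000 limit is the figure quoted in this post; the function names are hypothetical and this is not a tool we run in production.

```python
import re
from typing import Tuple

# Ceiling quoted in this post for the firewall pair (an assumption for other hardware).
SESSION_LIMIT = 280_000


def parse_conn_count(output: str) -> Tuple[int, int]:
    """Extract (in_use, most_used) from text like '185559 in use, 280000 most used'."""
    match = re.search(r"(\d+)\s+in use,\s+(\d+)\s+most used", output)
    if match is None:
        raise ValueError("unrecognised connection count output")
    return int(match.group(1)), int(match.group(2))


def utilisation(count: int, limit: int = SESSION_LIMIT) -> float:
    """Return the fraction of the session ceiling that `count` represents."""
    return count / limit


in_use, most_used = parse_conn_count("185559 in use, 280000 most used")
hit_ceiling = most_used >= SESSION_LIMIT  # the high-water mark reached the limit
```

With the output shown above, the high-water mark equals the limit, which is what tells us the pair ran out of sessions at some point, even though the current count is lower.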

cp.blacknight.com pop3/pop3s currently not working fully

On our new shared hosting platform, we have a problem with locking on the NFS file servers which is currently preventing mail delivery and collection.

This issue is sporadic but we hope to have things back to normal shortly.

Update: 15:35

Mail is back. We're still working on the control panel issue. Hopefully this will be resolved shortly.

Update: 16:10

The control panel is now working again.


Cisco ASA Software Upgrades Feb 11th 23:00

When: Wednesday February 11th from 23:00 until 02:00

What: Starting at approx 23:00 hours we'll be installing new ASA software versions on the two HA pairs in DEG and InterXion. We'll do DEG first, which will affect anyone in the following IP ranges:

81.17.244.0/22
81.17.248.0/23 (Windows Hosting, Helm)
81.17.252.0/23 (Linux Hosting, Directadmin)
78.153.222.128/27
81.17.242.104/29 (Blacknight Website Infrastructure)

InterXion will follow and the following ranges there will be affected:

78.153.212.0/24 PEM CP/Backend infrastructure (cp.blacknight.com)
81.17.254.0/23 New shared linux hosting services, mysql servers, web servers, mail servers
81.17.250.0/23 New shared windows hosting servers, SQL servers etc
78.153.200.0/23
78.153.208.0/22 VPS public network block
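
If you're not sure whether one of your IPs falls inside these ranges, a quick check along the following lines may help. It is only a sketch using Python's standard `ipaddress` module; the ranges are copied from the lists above and the function name `affected_site` is made up for this example.

```python
import ipaddress
from typing import Optional

# Ranges copied from this notice, grouped by data centre.
AFFECTED_RANGES = {
    "DEG": [
        "81.17.244.0/22",
        "81.17.248.0/23",
        "81.17.252.0/23",
        "78.153.222.128/27",
        "81.17.242.104/29",
    ],
    "InterXion": [
        "78.153.212.0/24",
        "81.17.254.0/23",
        "81.17.250.0/23",
        "78.153.200.0/23",
        "78.153.208.0/22",
    ],
}

# Pre-parse the CIDR strings once so lookups are cheap.
NETWORKS = {
    site: [ipaddress.ip_network(cidr) for cidr in cidrs]
    for site, cidrs in AFFECTED_RANGES.items()
}


def affected_site(ip: str) -> Optional[str]:
    """Return the site whose listed ranges contain `ip`, or None if unaffected."""
    addr = ipaddress.ip_address(ip)
    for site, networks in NETWORKS.items():
        if any(addr in net for net in networks):
            return site
    return None
```

For example, `affected_site("78.153.208.229")` falls inside the VPS public network block, so it would be reported under InterXion.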

Outage possibilities:

While outage possibilities are slim, there could be hits of up to 5 seconds as the ASAs fail over during the upgrade. Each HA pair is completely resilient in design and normally updates are hitless, but we've learned from experience that this isn't always the case.

We're classifying this notification as non-service-affecting and for information purposes only, but we would say that services to the above network ranges are at risk during this window.

If you have any questions please contact us ASAP.

Update: 22:00 Wednesday 11th

This is just to notify people that we'll be starting this maintenance window in approx 1 hour from now.

Update: 23:10 Wednesday 11th

This maintenance window is complete. We reloaded the standby firewalls between 22:30 and 23:00, so the reload of the live firewalls lasted only around 15 seconds each. We recorded approximately 19 dropped packets during each reload. However, all TCP sessions stayed up and web requests were simply queued for several seconds during the reload. So all in all a hitless maintenance window, as we predicted.

Emergency Notification - Windows VPS Node

One of our Windows VPS nodes requires an emergency reboot this evening due to some performance issues. We propose to carry out this maintenance at 23:00. All VPSs on this node will be affected. The IPs of these VPSs are as follows:

78.153.208.229
78.153.208.230
78.153.208.209
78.153.208.254
78.153.208.248
78.153.208.249
78.153.208.251
78.153.209.12
78.153.208.96
78.153.208.185
78.153.209.23
78.153.209.26
78.153.209.29
78.153.209.159
78.153.209.40
78.153.209.43
78.153.209.44
78.153.209.48
78.153.208.97
78.153.209.59
78.153.209.57
78.153.209.65
78.153.209.84
78.153.209.88
78.153.209.106
78.153.209.98
78.153.209.97
78.153.209.117
78.153.209.123
78.153.209.133
78.153.209.68
78.153.209.134
78.153.209.138
78.153.209.95
78.153.209.152
78.153.209.150
78.153.209.153
78.153.209.162
78.153.208.19
78.153.209.200
78.153.209.167

Some VPSs will also be moved off this node onto a new node, but these customers have already been informed separately.

Update 1:

We have run into some complicated issues - who said a reboot was going to be easy! We are currently working with the vendors of the virtualisation platform to try to provide a speedy resolution. Rest assured, no data has been lost, and we will attempt to get the VPSs back up and running as soon as possible.

Update 2: 09:50

The VPS servers are all running now. There are some lingering issues with the hardware node where it's showing as "offline" to our provisioning system. Parallels are working to resolve this issue. Until we get the "all clear" from them, the Plesk and Virtuozzo control panels will be unavailable, but services shouldn't be affected. We'll also need to apply Windows updates to this node in the very near future, as they had to be rolled back.

Update 3: 10:20

We've re-installed some components on the service VPS that drives the Plesk and Virtuozzo services on that hardware node. The node is now showing as active, as are all the VPS servers, and Plesk and Virtuozzo are available to all customers again. This matter is now resolved.

HelpDesk Emergency Maintenance

Email and online support will be offline for an hour this evening while the helpdesk is moved over to new hardware. Any emails sent in will be queued until the system is back up and running again.

Transient Network Issues On Some VLANs

Our technical team is currently investigating some transient network issues, including packet loss, affecting some VLANs.

Update: 20:45

We've narrowed this issue down to a firewall pair in Data Electronics that protects about 8 VLANs. These range from customer VLANs to VLANs for our old shared hosting platform. All servers connected to these networks experienced sporadic packet loss ranging from 2% to 50% during this extended issue.

We've still not found the root cause, but services are currently fully restored and our investigation continues.

If there are any lingering problems from this issue, please e-mail the support desk. Dedicated and colo customers should call the on-call number.

Co.uk Registry Maintenance

Nominet, which manages the co.uk domain registry, will be conducting maintenance on Tuesday February 3rd from 8am for approximately one hour.

During this period no new domain registrations or updates will be processed.

Existing domains will not be affected.