July 2011 Archives

gb.com Offline

All gb.com domain names are currently offline.

It's not very clear exactly what is happening, but Centralnic, the company that was acting as registry for *.gb.com domain names, issued a statement a short time ago.

We will post updates when we receive them.

UPDATE Sunday 1400: Centralnic have confirmed that the issue relates to a legal dispute.

UPDATE Sunday 1750: Centralnic have informed us that they will be signing up all registrars' clients to the gb.com service to mitigate the damage. As Centralnic hold the registration details for all gb.com domains, they should be in a position to provide DNS server details etc.

MySQL server problem

We are currently experiencing problems with 2 MySQL servers. We have dispatched someone to look at them to see if the problem can be identified.

Update: 23:35: These two nodes are back online.

Scheduled Maintenance Pemvzmps42 & Pemvzmps63

Summary: pemvzmps42 and pemvzmps63 require a reboot to facilitate maintenance. The servers will be offline from 8am on Monday the 25th of July for approximately 1 hour.

The following VPS servers will be down during this operation:

pemlinweb38.blacknight.com
mysql519.cp.blacknight.com
pemlinweb64.blacknight.com
pemlinweb63.blacknight.com

Update 08:50: Completed successfully. The two nodes were down until 08:40.


pemvzlin16 issue - all vps down


Summary: pemvzlin16 is down at the moment and we're looking at it to see what the problem might be. An engineer is on site with it now.

Update: 10:00: We are continuing to work on this server and hope to have it back online by approximately 11:00 am. Further updates will follow.

Update: 11:20: This machine is still being worked on. We've run over our last ETA, sorry about that. We hope to get the node back up and running as soon as possible. The current situation is that we're waiting for a RAID array to rebuild; this is a time-consuming process, but we want to be sure that it completes before booting the machine.

Update: 12:00: Several attempts to do a manual disk check on this server have been unsuccessful. The data on the /vz partition where the VPS servers reside appears to be badly corrupted. At this juncture we're looking at doing a restore, which could take up to 48 hours or maybe longer to complete due to the sheer volume of data contained on the node. We're talking several TB of data.

If customers have their own offsite backups that they wish to restore, let us know and we can re-provision your VPS on another node. Otherwise you'll have to wait up to 48 hours, or possibly longer, before we will be able to get you back up and running. We apologise for this, but it was outside of our control. RAID cards can sometimes do very weird things, and in this instance the card has somehow corrupted the filesystem on one of the arrays in this machine.

Update: 22nd July 14:16: Currently the restoration process is going OK: our engineers have built a new array and begun restoring data onto it. We have encountered some problems with the data we are restoring, as there is some level of corruption. We have already raised this with the R1soft CDP vendors (the backup software we use) but are continuing to work on the data in the meantime. We will not have a better picture of the level of corruption and of how much data is restorable until tomorrow afternoon at the earliest. We hope to have another update at that time, but we are looking at a few days until relatively full service is restored.

Update: 23rd July 15:05: Unfortunately the file transfer process is taking longer than expected due to the volume of small files. We now hope to be able to fully install and test the recovered data sometime between 8 and 10pm tonight. We can see that the level of corruption on files is low, around 10%, though we do not yet know if any vital files are corrupted. Once we are able to test the recovered data we will see if there are any further ways to recover or repair any corrupted files.

Update: 23rd July 23:15: Our engineers are now restoring the data in VPS format onto a new VPS node. This restore will run overnight. In the meantime, if you have provisioned a new VPS and/or pointed your domains elsewhere during this outage, please let us know by sending an email to our Support team (or replying to any ticket you may already have open) and we will be sure to help you get your data onto your new VPS where possible, or pointed back to the restored data.

Update 24th July 19:11: The restore has been running since last night and has been very slow due to the large number of small files involved.

The following VPS are affected:

78.153.211.72
78.153.208.216
78.153.211.82
78.153.211.85
78.153.211.90
78.153.211.97
78.153.209.218
78.153.210.170
78.153.211.100
78.153.211.105
78.153.211.106
78.153.211.108
78.153.211.111
78.153.211.114
78.153.211.122
78.153.210.184
78.153.211.128
78.153.211.109
78.153.211.139
78.153.211.140
78.153.211.142
78.153.211.143
78.153.209.206
78.153.208.56
78.153.208.144
78.153.208.201
78.153.208.220
78.153.208.232
78.153.211.169
78.153.211.179
78.153.211.181
78.153.208.31
78.153.211.214
78.153.208.215
78.153.209.18
78.153.209.127
78.153.209.195
78.153.210.45
78.153.210.219
78.153.210.238
78.153.211.11

UPDATE 25/07/11 - 9:54AM: We have fully recovered the data from all VPSs. The data is in good condition; however, we are currently unable to boot the old VPSs. We are trying to boot them from previous backups.

If you would like us to generate a new VPS for you and dump the data from your previous VPS into a folder on your new one we can do that easily. Please email support@blacknight.com if you would like us to do this.

In the interim we will continue to try to boot the old restores.

UPDATE 26/7/11 - 10:30AM: We've worked through the night to try to restore all the VPS to their former state. However, we can now say with some certainty that this is not going to be possible, and I'll explain why. Virtuozzo doesn't use a traditional image-based file system like Xen, KVM, Hyper-V, VMware etc. It has a template-based system with a master template for each OS type, e.g. CentOS. When a VPS gets installed, most of the system binaries etc. are symlinks which link to this template. The restore didn't understand these symlinks, so it was unable to restore them. As a result, the VPS can't boot because most of the operating system is missing.

Symlinks are pointers: to most applications they look like a real file, but at the file system level they point to the real file. When a symlinked file gets updated, say you upgrade Apache via yum or apt, the symlink gets replaced with the real file.
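
For anyone who wants to see the effect for themselves, here is a minimal sketch using ordinary POSIX symlinks in a temporary directory. It is only an illustration of the general behaviour described above: the directory names (`template`, `vps`, `restored`) and the file contents are made up, and it does not reproduce Virtuozzo's actual template mechanism or the logic of our backup software.

```python
import os
import shutil
import tempfile


def demo_symlink_restore():
    """Show why a restore that skips symlinks leaves files missing."""
    root = tempfile.mkdtemp()
    template = os.path.join(root, "template")  # hypothetical shared OS template
    vps = os.path.join(root, "vps")            # hypothetical VPS file tree
    restored = os.path.join(root, "restored")  # where the naive restore writes
    for directory in (template, vps, restored):
        os.makedirs(directory)

    # The real binary exists once, in the template.
    with open(os.path.join(template, "httpd"), "w") as f:
        f.write("apache 2.2.3")
    # The VPS only holds a pointer to it, plus the customer's own data.
    os.symlink(os.path.join(template, "httpd"), os.path.join(vps, "httpd"))
    with open(os.path.join(vps, "index.html"), "w") as f:
        f.write("customer page")

    # Reading through the symlink looks exactly like reading a normal file.
    with open(os.path.join(vps, "httpd")) as f:
        seen_through_link = f.read()

    # A restore that only understands regular files: symlinks are skipped.
    for name in os.listdir(vps):
        src = os.path.join(vps, name)
        if os.path.islink(src):
            continue
        shutil.copy2(src, os.path.join(restored, name))

    result = {
        "seen_through_link": seen_through_link,
        "is_link": os.path.islink(os.path.join(vps, "httpd")),
        "restored_files": sorted(os.listdir(restored)),
    }

    # An upgrade replaces the symlink with a real, private copy of the file.
    os.remove(os.path.join(vps, "httpd"))
    with open(os.path.join(vps, "httpd"), "w") as f:
        f.write("apache 2.2.17")
    result["is_link_after_upgrade"] = os.path.islink(os.path.join(vps, "httpd"))

    shutil.rmtree(root)
    return result
```

In this toy example the customer's `index.html` survives the restore but the `httpd` entry does not, which mirrors the situation above: the customer data is there, but the operating system files that were only pointers are gone.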

It's not possible to fix these VPS servers right now and we believe our efforts are better spent restoring your data in a new VPS and helping you to get everything back online. We do have Parallels working with us to try to restore the automatic backups that we perform each night, but we haven't had any success with this yet. If this proves to be fruitful we'll let everyone know.

All customer data, including modified files, web pages, email, databases, log files etc., is restorable, and we've been creating new VPS for all customers and putting the old data in them.

To reassure everyone at this stage, we're going to do better backup checking in future: we will perform weekly and daily test restores of all VPS hardware nodes to ensure that a) the backups are working properly and b) the restored data is working as we would expect. We will also most likely be discontinuing this product line. By this I mean we're going to replace it with a better product, one that will be more flexible and will actually operate like a real server, e.g. Xen, KVM or Hyper-V. The new product won't have any single points of failure and will be "Cloud" based, i.e. it'll be clusters of hardware nodes and no single VM will live on a single RAID array; rather, it'll be stored on our new cloud storage platform.
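
To give an idea of what a test restore check can look like, here is a rough sketch that compares a restored file tree against its source and reports anything missing, changed, or no longer a symlink. This is an illustration only, not the tooling we will actually use: the function names and the problem labels are made up for the example, and a real check would also need to confirm that the restored VPS boots.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(source_root, restored_root):
    """Compare a restored tree with its source and list the problems found."""
    problems = []
    # os.walk does not follow symlinks by default, so links are checked as links.
    for dirpath, dirnames, filenames in os.walk(source_root):
        rel_dir = os.path.relpath(dirpath, source_root)
        for name in dirnames + filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.normpath(os.path.join(rel_dir, name))
            dst = os.path.join(restored_root, rel)
            if os.path.islink(src):
                if not os.path.islink(dst):
                    problems.append(("symlink lost", rel))
                elif os.readlink(src) != os.readlink(dst):
                    problems.append(("symlink target changed", rel))
            elif os.path.isdir(src):
                if not os.path.isdir(dst):
                    problems.append(("directory missing", rel))
            elif not os.path.isfile(dst):
                problems.append(("file missing", rel))
            elif file_digest(src) != file_digest(dst):
                problems.append(("content differs", rel))
    return problems
```

An empty list means the restored tree matches the source as far as this check can tell; any "symlink lost" entries would have flagged the kind of problem described in the 10:30AM update.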

UPDATE 26/7/11 - 11:30AM: Some customers have asked how we are providing them with their data. Basically we'll give you a new VPS and drop the restored data into a folder for you. If you haven't already contacted support, please do so immediately. We are re-creating VPS servers at the request of customers and we will place your old data in /restored/ on the VPS once it is created. Also, if you are on Ubuntu 9.x or lower we'll create the new VPS using 10.04 LTS.

pemvzlin01 reboot Wednesday July 20th @ 08:00

Summary: We're working with Parallels to complete the diagnosis of a provisioning problem, which should increase the performance of the VPS platform overall, and they've asked us to upgrade this node to the latest Virtuozzo version. We've done this already and the node needs to reboot to complete it.

Tomorrow morning at 08:00 we'll reboot this node and the following Virtual Private Servers will reboot:

vps-1007204-554.cp.blacknight.com
vps-1007518-564.cp.blacknight.com
vps-1009247-613.cp.blacknight.com
vps-1010280-631.cp.blacknight.com
vps-1011752-666.cp.blacknight.com
vps-1011869-673.cp.blacknight.com
vps-1014079-738.cp.blacknight.com
vps-1015387-764.cp.blacknight.com
vps-1017302-815.cp.blacknight.com
vps-1017526-822.cp.blacknight.com
vps-1020039-878.cp.blacknight.com

We will post an update once the reboot is complete and all the Virtual Private Servers are back online. We estimate that it might take around 1 hour to complete as the /vz partition will require a disk check on boot.

Update: 08:29: All the VPS servers are back online and they were only down for a couple of minutes.

slow database access

Summary: Customers have reported slow database access to us. We're investigating this at present. It doesn't appear to be a problem on the MySQL servers themselves or a network issue. It may well be a problem with DNS. We'll post further updates when we have more information.

Update: 09:40: We believed that we had found a fix for this and notified customers accordingly; however, we're still working on this issue. It is most certainly a DNS problem. It appears to be related to the stateful firewall on the client DNS servers.

Update: 09:51:
This issue is resolved for the moment. We're reviewing a number of factors that may have caused this issue. It _is_ DNS related but we haven't found the exact cause just yet.
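
If you want to check from your own side whether a slow database connection is down to DNS or to the network, one rough approach is to time the name lookup and the TCP connection separately. The sketch below is an assumption-laden illustration rather than our diagnostic tooling: the function names, the 0.5 second threshold and the result labels are made up, and port 3306 is simply the MySQL default.

```python
import socket
import time


def time_lookup(hostname, resolve=socket.gethostbyname):
    """Return (address, seconds) for a single name lookup."""
    start = time.perf_counter()
    address = resolve(hostname)
    return address, time.perf_counter() - start


def time_connect(address, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection to an IP address."""
    start = time.perf_counter()
    with socket.create_connection((address, port), timeout=timeout):
        pass
    return time.perf_counter() - start


def diagnose(hostname, port=3306, slow_after=0.5,
             resolve=socket.gethostbyname, connect=time_connect):
    """Time the lookup and the connection separately and name the slower step."""
    address, lookup_s = time_lookup(hostname, resolve)
    connect_s = connect(address, port)
    if lookup_s >= slow_after and lookup_s >= connect_s:
        suspect = "dns"
    elif connect_s >= slow_after:
        suspect = "network or server"
    else:
        suspect = "none"
    return {"address": address, "lookup_s": lookup_s,
            "connect_s": connect_s, "suspect": suspect}
```

Calling `diagnose()` with the hostname of your database server returns both timings; a lookup that takes far longer than the connection itself points at name resolution rather than the database.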

server reboot: pemlinweb49 and pemlinweb50

Summary: At 14:35 we're rebooting the hardware node that these two nodes are on, as it is having memory issues which are causing instability.

It shouldn't take too long.

Update 15:00:
These two nodes were down until 14:42.

phone call quality issues Thursday July 14th

Summary: We've been noticing call quality issues this morning, both inbound and outbound, via our normal 059 numbers. This is due to a problem within our phone provider's network. We're seeing approximately 25% packet loss to them. We've notified them of the problem and they're working on it at the moment.

In the meantime, if you contact us via our 1850 (929929) number you won't notice call quality issues, as that's routed via another company.

Update: 14:00: We had a call just before lunch from our telco provider: they've re-routed their traffic within their network and our call quality is now back to normal.

Power Failure - Main Carlow Offices

We are currently experiencing a power outage in our main Carlow offices. This is affecting our telephone system which handles our main support and telephone lines. This will also affect our business email within the company.

Our engineers are working to resolve this ASAP and we'll keep this blog post updated.

Update: 11:40: We have restored power to the rack in question and we're working to get all our infrastructure back up and running. Right now phones are operational and have been since around 11:20.

Update: 14:00: We're fully back in our Carlow office. We've had our electrician get us up and running and early next week we're going to swap out some of the existing power infrastructure in order to make it more redundant.

Tasks stuck in cp.blacknight.com

There is currently a backlog of control panel and system tasks in our cp.blacknight.com system. Our developers are working on this at the moment.

This would affect any changes, additions, or modifications made in cp.blacknight.com, including:

# Adding, removing, or changing the hosting on domains or subdomains
# Changing any nameservers or DNS records
# Adding, removing, or making changes on email accounts or FTP accounts
# Changing any passwords in cp.blacknight.com
# Installing, removing, or modifying any Application Vault applications

We hope to have this resolved as soon as possible.

Update 16:00 - The control panel vendors have resolved the stuck tasks causing this issue and the system is working through the queue at the moment. Most tasks should be completed at this stage.

PEMLINWEB49/50 Unresponsive

Two Linux shared hosting webspace servers have become unresponsive. The details of the servers are:

pemlinweb49.blacknight.com 78.153.214.40
pemlinweb50.blacknight.com 78.153.214.41

Our engineering team are working to get the nodes back online ASAP.

UPDATE 13:49 - This issue has been resolved.

Windows Shared Hosting - Palamedes

Our Windows shared hosting server Palamedes was compromised. Since 8PM on Saturday evening, the attackers have created both index.html and index.asp files at the document root of every website on the server.

We've removed and restored the majority of the offending files; however, some are still in the process of being removed.

UPDATE 11:01AM - All offending files have been removed now.
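
For customers who want to double-check their own sites, a simple way to spot files of this kind is to look for index.html and index.asp files at the document root that were modified after a given time. The sketch below is a hedged example and not the procedure our engineers used: it assumes a hypothetical layout where each subdirectory of `sites_root` is one site's document root, and it relies on modification times, which are not the same as creation times and which an attacker can alter.

```python
import os
from datetime import datetime

SUSPECT_NAMES = {"index.html", "index.asp"}


def find_recent_index_files(sites_root, since):
    """List index.html / index.asp files at each document root modified at or after `since`."""
    cutoff = since.timestamp()
    hits = []
    for site in sorted(os.listdir(sites_root)):
        docroot = os.path.join(sites_root, site)
        if not os.path.isdir(docroot):
            continue
        # Only the top level of each document root is inspected.
        for name in sorted(os.listdir(docroot)):
            path = os.path.join(docroot, name)
            if name.lower() in SUSPECT_NAMES and os.path.isfile(path):
                if os.path.getmtime(path) >= cutoff:
                    hits.append(path)
    return hits
```

For example, passing `datetime(2011, 7, 9, 20, 0)` as `since` would list index files changed from 8PM on that (illustrative) date onwards; treat the result as a list of files to review, not as proof of compromise.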


Display issues within the cp.blacknight.com control panel

After a minor update to the control panel this morning there are some bugs in the control panel display. The main issue is that the Application Vault and Domain Management links no longer show.

We are working with the control panel vendors to get this resolved as soon as possible.

Update 15:00 - The control panel vendors have restored the Application Vault link. They are still working on the other display issues (lack of Domain Overview/Domain Management link, and changes to the All Domains > domain name page).

Billing / Store down for 60 minutes.

Summary: We've identified a problem within our billing system that is causing issues today. Fixing it requires us to perform some actions on certain database tables, so we'll be turning off the store and the billing system for about 1 hour, from 12:10 until 13:10 today.

Update 1316

The developers are still working on resolving this issue.

UPDATE 1441
The store and billing system have been stable for the last few minutes, but our staff are keeping an eye on them. All services should be accessible and normal.

EU Registry Maintenance July 6 2011

Eurid, the .eu domain name registry, will be conducting maintenance on July 6th. They're doing a test of their business continuity plans.

The window starts at 10 am CEST and services will be interrupted for short periods of time throughout the day.

Services affected:
  • whois lookups
  • new registrations
  • updates

Services NOT affected
  • domain name resolution

UPDATE 6th July 1200

Eurid inform us that this exercise is complete and that they are currently running from their Amsterdam data centre.

DotCat Registry Emergency Maintenance

We have been informed that the .cat registry will be conducting emergency maintenance later this morning.

The maintenance will begin at 1200 CEST and should last about 10 minutes.