We experienced some issues with one of our MySQL nodes at 19:35. Some users may have seen connection timeout errors on their database connections for a period of 3 to 5 minutes.
The affected node was: mysql71.cp.blacknight.com
This has been resolved now and normal service is restored.
Due to a hardware upgrade, we are migrating the MySQL node listed below to new hardware this evening. All details are listed below.
When?
The downtime will occur tonight at 22:00, 27th of January 2010.
What will be offline?
Any databases located on these servers:
mysql360.cp.blacknight.com
mysql360int.cp.blacknight.com
For how long?
The downtime will last an hour.
We will update this post once completed.
Update 23:00
This migration was completed at 22:24. The MySQL server node is now live on a new hardware node and is already showing better performance for customers using it.
Due to an issue with Installatron, we have to upgrade the version of PHP on Ragnell to 5.2.12. This shouldn't result in any downtime, but it is possible. Any downtime should be no more than a minute or two.
UPDATE 18:11
This upgrade has been completed and the issue with Installatron is now resolved.
Summary: At 3:35 pm we started experiencing routing issues in one of our facilities, InterXion. We're working as fast as possible to resolve this routing issue and we hope to have it back up ASAP.
Update: 15:50 Any connectivity coming in via INEX LAN2 and Cogent was unavailable. This issue affects AS39122 in its entirety, i.e. not InterXion only. Both DEG and InterXion customers are affected due to the nature of this issue.
Update: 15:58 The issue has been resolved. The cause and the fix will be documented and we'll publish a full RFO when it's available.
Update: 16:05 Services affected by this issue were: Colo/Dedicated networks in DEG and InterXion, shared hosting LANs 81.17.252.0/22 and 81.17.248.0/22, control panel access to cp.blacknight.com, Hosted Exchange services, the Qmail mail cluster, and transit customers in DEG and InterXion.
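For customers unsure whether a shared hosting address was inside the two affected LANs, a minimal sketch using Python's standard `ipaddress` module is shown here. The function name `in_affected_lan` is hypothetical and introduced only for illustration; the two CIDR ranges are the ones listed in the 16:05 update.

```python
import ipaddress

# Shared hosting LANs named in the 16:05 update.
AFFECTED_LANS = [
    ipaddress.ip_network("81.17.252.0/22"),
    ipaddress.ip_network("81.17.248.0/22"),
]


def in_affected_lan(address: str) -> bool:
    """Return True if `address` falls inside one of the affected LANs."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in AFFECTED_LANS)
```

Calling `in_affected_lan("81.17.254.72")` returns `True`, since that address sits inside 81.17.252.0/22. This only covers the shared hosting LANs; the other affected services listed above are not identified by these ranges.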
We are currently seeing some template issues on the server PEMVZWIN04. We are working with our software vendor to resolve this as soon as possible.
A reboot will be required. Estimated downtime for customer VPSs is 15 minutes.
This blog post will be kept updated during the process.
Affected VPSs:
78.153.209.65 VPS-544
78.153.209.175 VPS-880
78.153.210.23 VPS-893
78.153.210.27 VPS-896
78.153.210.28 VPS
78.153.210.30 VPS-899
78.153.210.18 VPS-904
78.153.210.42 VPS-910
78.153.210.53 VPS-921
78.153.210.54 VPS-922
78.153.209.95 VPS-940
78.153.210.69 VPS-958
78.153.209.253 VPS-960
78.153.210.70 VPS-961
78.153.210.76 VPS-975
78.153.210.80 VPS-977
78.153.210.88 VPS-988
78.153.210.46 VPS
78.153.210.121 VPS-1030
78.153.210.126 VPS-1035
78.153.210.132 VPS-1048
Update: All VEs are now back online.
Summary: Tomorrow morning, starting at 04:00, our billing system, found under the "billing" tab of cp.blacknight.com, is going to be upgraded by two minor revisions.
Details: During the upgrade our billing system will be unavailable. This means that domain and hosting purchases, upgrades, licence upgrades, and domain contact and name server management will all be unavailable. The upgrade shouldn't take longer than about 2 hours, so it should be finished by 6am.
During this window we'll be installing a new build of our own registrar and payment plugins. The payment plugin has no updates but the registrar plugin has some fixes that we've been waiting to deploy.
We are scheduling a reboot of the VPS hardware node PEMVZWIN04. This is a routine procedure and will involve rebooting the customer VPSs (listed below) on the hardware node. If your VPS is listed below, it will be offline for no more than 10 minutes.
The maintenance will take place on the 20th of January at 22:00.
This blog post will be updated once completed.
Affected VPSs:
78.153.209.65 VPS-544
78.153.209.175 VPS-880
78.153.210.23 VPS-893
78.153.210.27 VPS-896
78.153.210.28 VPS
78.153.210.30 VPS-899
78.153.210.18 VPS-904
78.153.210.42 VPS-910
78.153.210.53 VPS-921
78.153.210.54 VPS-922
78.153.209.95 VPS-940
78.153.210.69 VPS-958
78.153.209.253 VPS-960
78.153.210.70 VPS-961
78.153.210.76 VPS-975
78.153.210.80 VPS-977
78.153.210.88 VPS-988
78.153.210.46 VPS
78.153.210.121 VPS-1030
78.153.210.126 VPS-1035
78.153.210.132 VPS-1048
Due to the increased amount of traffic to the shared Linux servers located on the hardware node PEMVZMPS15, we will be increasing its performance potential with a CPU upgrade.
The affected services are:
81.17.254.72 pemlinweb13.blacknight.com
81.17.254.73 pemlinweb14.blacknight.com
The upgrade will commence at 06:30 on the 21st of January 2010 and should involve no more than 15 minutes of downtime.
We will update this blog post once completed.
This is now completed.
Summary: There's an HTTP DDoS inbound to the above-named shared hosting server. We've mitigated the attack already; however, it is still ongoing.
ETA to full resolution: unknown
Summary: mail.blacknight.com and its associated services are running slower than normal today due to increased activity from end users. This increase in activity is not part of the normal pattern of use, so we're trying to pinpoint the cause.
Services affected: SMTP, IMAP and POP3.
ETA For a fix: Approx 1 hour
Update: Jan 18th @ 12:41 pm
We've switched storage appliances on the back end of the mail cluster. E-mail is out of sync by 1 day but it's currently synchronising. As a result, connection times for mail authentication and general mail delivery have decreased dramatically.
You may get some duplicated e-mail, but this was preferable to e-mail being down for several hours while we fight with the current storage platform.
Verisign will be conducting routine maintenance on the .com and .net production environments on Sunday, January 24 2010 from 0100 to 0145.
During this period no new registrations, updates or whois lookups will be available.
Existing domain names will continue to function as normal.
Nominet, who run the .co.uk domain registry, will be conducting maintenance on Wednesday January 20th 2010 from 0700 to 0800.
During this time period we will not be able to process any new registrations or updates.
Existing .co.uk domain names will continue to resolve as normal.
A disk within the RAID array on the hardware node PEMVZMPS28 has failed. We are scheduling some emergency maintenance to resolve this as soon as possible.
The window for the maintenance will begin at 7:00AM on the 15th of January.
The estimated downtime is no longer than 20 minutes.
The affected nodes are as follows:
81.17.254.64 pemlinweb25.blacknight.com
81.17.254.67 pemlinweb26.blacknight.com
This post will be updated once completed.
Update: This has been completed.
Update 10:37: There is currently an issue with the data on these two nodes. When the drive was replaced, the wrong drive was removed. This resulted in the dead drive springing back to life this morning.
The drive died on the 10th of January and as a result all data has been reverted back to this date. Our engineers are en route to replace the drive with the one from 8AM this morning. We hope to have this resolved within the next 2 hours.
Update 11:39: This server is being brought offline now to replace the hard drive.
Update 11:54: All services have been restored to the correct disk.
Note: As the faulty hard drive was in the server between the hours of 08:00AM and 11:39AM, any changes made to your files on this disk during that time will be lost. We apologise for any inconvenience this may have caused.
At 8:08AM the hardware node PEMVZMPS16 was rebooted to carry out essential maintenance.
During the essential maintenance you may have noticed the following nodes being offline for a period of 5 minutes:
81.17.254.74 pemlinweb15.blacknight.com
81.17.254.75 pemlinweb16.blacknight.com
All services have now resumed and are operating as normal.
We have been informed that the .info (dotInfo) registry will be conducting maintenance on January 9th 2010 between 1500 and 1900 UTC.
During this period no new registrations, updates or whois lookups will be available.
Existing .info domain names will not be affected.