June 2010 Archives

Network Latency issues


We are currently experiencing high latency on our shared firewall infrastructure.

 

Update: This issue has now been resolved.

Shared Hosting Linux - Ector

We are currently experiencing issues with our old shared hosting server, Ector.

Our engineers are working to resolve this, and this post will be updated as soon as possible.

UPDATE 12:49PM - This issue has now been resolved.


Legacy Shared Hosting Server - Galahad


This server requires some unscheduled downtime to apply some Windows Updates. This will take place at 8am tomorrow (Monday 28/06) - downtime is expected to be under 1 hour.

 

Update: 09:14 - The update is taking a little longer than expected. We are in the last stages of the update and hope to be back up and running in the next 10-15 minutes.

Update: 09:30 - The update was successfully applied, and services have returned to normal.

Linux VPS Hardware Node - PEMVZLIN06

We are currently experiencing issues with the hardware node PEMVZLIN06.

Our engineers are working on this issue currently and will have it resolved asap.

UPDATE 10:57AM: The hardware node is now back online. We are now waiting on the VEs to start up.

UPDATE 11:36AM: All VPSs are now online

Linux VPS Hardware Node - PEMVZLIN12

We are currently experiencing issues with the Linux VPS hardware node PEMVZLIN12.

Our engineers are looking into this and will resolve it as soon as possible.

Affected VPSs:

78.153.209.216
78.153.208.72
78.153.210.253
78.153.209.13
78.153.209.49
78.153.209.81
78.153.209.114
78.153.209.178
78.153.209.219
78.153.210.12
78.153.210.54
78.153.210.101
78.153.210.113
78.153.210.161
78.153.210.173
78.153.209.180
78.153.210.197
78.153.210.254
78.153.211.23
78.153.211.27
78.153.209.150
78.153.211.30
78.153.211.47
78.153.211.51
78.153.211.56
78.153.211.61
78.153.211.64

UPDATE 5:33AM - The server is back online - We are just waiting on the VPSs themselves to boot now.
UPDATE 5:48AM - All VPSs are back online - This issue is now fully resolved.

Shared Hosting Linux - Gorlois

We are currently experiencing issues with our shared hosting Linux server Gorlois.

Our engineers are looking into this and will resolve it as soon as possible.

UPDATE 14:21: This issue is now resolved

pemvzlin06 down

Summary: pemvzlin06 is currently down. We're working on it via its KVM and hope to have it back up shortly.

Update 09:33: the machine is back up and running, and containers are starting. This will take some time as each individual VPS is doing a disk check.

Update 11:03: the RAID controller has marked all file systems as read only due to a second disk failure in this machine. As it's RAID 10 this isn't a problem for data integrity; however, the file systems have been marked as read only because of the multiple disk failures.

We're going to need to take the machine offline for examination. ETA for a fix is approx 2 hours and 30 minutes, i.e. 13:30 or thereabouts.
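To illustrate why a second disk failure in RAID 10 need not mean data loss: the array stripes data across mirrored pairs, so the data survives as long as every pair keeps at least one working disk. The sketch below is a simplified model that assumes two-way mirrors and uses hypothetical disk numbers; it does not describe the actual disk layout of pemvzlin06.

```python
def raid10_data_intact(mirror_pairs, failed_disks):
    """Return True if every mirror pair still has at least one working disk."""
    failed = set(failed_disks)
    return all(any(disk not in failed for disk in pair) for pair in mirror_pairs)


# Hypothetical 4-disk array: disks 0 and 1 mirror each other, as do disks 2 and 3.
pairs = [(0, 1), (2, 3)]

print(raid10_data_intact(pairs, [0]))     # one failed disk
print(raid10_data_intact(pairs, [0, 2]))  # two failures, in different pairs
print(raid10_data_intact(pairs, [0, 1]))  # two failures, in the same pair
```

In this model the first two cases keep the data intact and only the last one loses it, which is why a second failure is survivable but leaves the array with little margin until the rebuild finishes.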

Update 12:40: the disk has been replaced and the RAID array is rebuilding. As this process can take a long time, we're going to let it run for as long as possible. We understand that people want access to their e-mail and websites etc. However, data integrity is our main concern right now.

Update 14:30: All 25 VPSs on this node are back online. A number had some issues with quotas that we're going to fix after the RAID array is fully rebuilt. We'll contact those customers separately at a later date.

Network Connectivity

We were experiencing issues with a network switch in one of our cabinets in our Interxion facility.

This issue has been fully resolved and all network connectivity has been restored.

Here is a full timeline of the events that occurred yesterday:

18:30 - An attack began against one of our shared Linux web servers, pemlinweb04.blacknight.com. This was not picked up by our monitors at first due to the low volume of attack traffic.

19:45 - The attack began to increase, sending huge amounts of UDP traffic to pemlinweb04.blacknight.com. At this point alerts began to filter through and the on-call engineer was alerted.

20:00 - It became apparent that, because of the sheer volume of UDP traffic, the network switch in the rack where pemlinweb04 was located was being overwhelmed and was dropping traffic to all devices in the rack served by that switch.

20:15 - The engineers disabled the site the traffic was being directed to and began null routing the sources the traffic was coming from.

20:25 - A noticeable drop in the traffic began to show, allowing the switch to resume normal operations.

20:40 - All services were returned to normal operation.
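As a rough illustration of the 20:15 step, null routing starts with working out which sources are sending the bulk of the UDP traffic. The sketch below shows one simple way to pick candidates from per-flow byte counts. The record format, the threshold, the function name and the addresses (taken from documentation ranges) are all assumptions made for the example; this is not the tooling our engineers used.

```python
from collections import Counter


def null_route_candidates(flows, min_bytes):
    """Sum UDP bytes per source and return sources at or above min_bytes, largest first."""
    totals = Counter()
    for src, proto, nbytes in flows:
        if proto == "udp":
            totals[src] += nbytes
    return [src for src, total in totals.most_common() if total >= min_bytes]


# Hypothetical flow records: (source IP, protocol, bytes).
flows = [
    ("192.0.2.10", "udp", 900_000),
    ("198.51.100.7", "udp", 400_000),
    ("192.0.2.10", "udp", 700_000),
    ("203.0.113.5", "tcp", 950_000),  # ignored: not UDP
    ("203.0.113.9", "udp", 1_000),    # ignored: below the threshold
]

print(null_route_candidates(flows, min_bytes=100_000))
```

With the sample data this prints the two heavy UDP senders, heaviest first. A real deployment would read flow data from the network equipment and would need care not to null route legitimate sources.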

Windows VPS Nodes
78.153.208.163
78.153.210.20
78.153.210.160
78.153.210.184
78.153.209.166
78.153.210.203
78.153.210.81
78.153.208.70
78.153.208.162

Shared Windows Hosting Nodes:
pemwinweb07
pemwinweb08
pemwinweb09
pemwinweb10

Shared Linux Hosting Nodes
pemlinweb01
pemlinweb04
pemlinweb05
pemlinweb07
pemlinweb19
pemlinweb20
pemlinweb21
pemlinweb22
pemlinweb23
pemlinweb24


MySQL DB Nodes
mysql71.cp.blacknight.com
 

MySQL Database Issues: mysql71.cp.blacknight.com

We were experiencing issues with the MySQL database node: mysql71.cp.blacknight.com.

This would have caused issues for users trying to connect to their databases on this node for a period of 15 to 20 minutes.

The engineers have investigated and fully resolved this issue.

We are monitoring this node closely.

Linux Shared Hosting Disk Replacement - Gorlois

A disk has failed in the RAID array belonging to the shared hosting server, gorlois.blacknight.ie.

We would like to replace this disk asap and thus will be scheduling downtime for it on Wednesday the 23rd of June 2010.

The downtime window will be from 21:00 to 22:00 to ensure there are no issues, but the work should be completed a lot quicker than that.

We will update this blog post during the maintenance window with more information as it becomes available.

PEMVZWIN04 - Windows VPS Hardware Node

We are currently experiencing issues with the Windows VPS Hardware Node PEMVZWIN04. Our engineers are currently in the middle of resolving this issue and will respond with an update once one is available.

Update 15:30: This issue is now resolved

Emergency maintenance for pemwinweb11 June 17th at 22:00

Summary: In order to ensure that services on this node are maintained to the highest standards we need to migrate this machine to new hardware. This will take several hours and sites hosted on it will be down during the migration.

When: Thursday June 17th at 22:00 until approx 04:00 Friday June 18th

Update: 00:30 Friday June 18th

The machine is back up after only approx 1 hour of downtime. It's safely on its new hardware and performance should be better than before. This maintenance window is considered complete.

What: pemlinweb11 is being moved to new hardware. The following webspaces will be affected:

105765
105797
105890
105905
105964
105980
105989
106020
106081
106088
106122
106155
106172
106187
106249
106266
106341
106351
106411
106484
106500
106505
106611
106622
106635
106683
106709
106716
106769
106790
106831
106854
106859
106902
106956
106961
106966
106984
106997
107011
107021
107022
107027
107032
107036
107041
107048
107052
107062
107072
107085
107093
107106
107119
107123
107129
107138
107144
107159
107164
107170
107179
107195
107200
107206
107217
107227
107237
107243
107256
107265
107270
107276
107291
107307
107316
107328
107336
107341
107353
107369
107375
107391
107398
107410
107423
107429
107437
107443
107450
107462
107477
107479
107484
107498
107507
107514
107525
107534
107542
107557
107568
107576
107585
107609
107613
107620
107628
107645
107660
107673
107680
107690
107710
107719
107729
107734
107756
107765
107781
107791
107795
107813
107823
107832
107847
107860
107886
107903
107909
107924
107939
107972
107994
107999
108009
108035
108063
108099
108123
108144
108180
108192
108226
108232
108245
108287
108316
108350
108374
108379
108400
108428
108447
108475
108490
108512
108540
108564
108572
108584
108612
108628
108647
108658
108683
108694
108702
108727
108738
108750
108779
108812
108830
108835
108909
108955
108978
109005
109029
109085
109099
109130
109155
109165
109166
109199
109210
109239
109268
109283
109296
109309
109330
109344
109350
109352
109379
109389
109424
109438
109454
109464
109501
109519
109533
109568
109576
109587
109596
109618
109637
109655
109667
109688
109701
109718
109745
109759
109769
109782
109809
109818
109877
109886
109934
109958
109982
110001
110004
110019
110036
110084
110101
110139
110158
110176
110199
110201
110211
110224
110252
110274
110290
110309
110343
110352
110376
110399
110402
110437
110464
110483
110513
110530
110557
110588
110598
110616
110640
110701
110712
110751
110762
110766
110781
110818
110850
110856
110869
110902
110914
110946
110966
111007
111025
111065
111120
111136
111162
111173
111174
111194
111214
111236
111237
111255
111279
111300
111320
111329
111389
111405
111416
111437
111455
111483
111518
111547
111562
111582
111585
111606
111637
111643
111653
111665
111701
111704
111726
111768
111792
111793
111860
111867
111877
111890
111898
111915
111940
111960
111999
112024
112049
112084
112092
112101
112121
112141
112163
112170
112208
112236

Further updates will be posted here during the maintenance window.

PEMVZLIN16 - Linux VPS Hardware Node Reboot

We need to install some Virtuozzo updates on this node, which will require the node to be rebooted.

The maintenance will occur tonight at 21:00 - 14/06/2010

The downtime is expected to be no more than 10 minutes.

The affected nodes are:

78.153.211.72   vps-1088048-1575.cp.blacknight.com
78.153.208.216  vps-1088878-1580.cp.blacknight.com
78.153.211.82   vps-1089504-1583.cp.blacknight.com
78.153.211.85   vpn1.eselabs.ie                 
78.153.211.86   vps-262-1587.cp.blacknight.com  
78.153.211.89   tjbdesigns.com                  
78.153.211.90   vps-1091097-1592.cp.blacknight.com
78.153.211.93   vps-1091486-1595.cp.blacknight.com
78.153.211.95   vps-1091674-1597.cp.blacknight.com
78.153.211.97   vps-1091785-1598.cp.blacknight.com
78.153.209.218  vps-1092050-1601.cp.blacknight.com
78.153.210.170  vps-1091805-1602.cp.blacknight.com
78.153.210.79   vps-1092131-1603.cp.blacknight.com
78.153.211.100  vps-1092399-1606.cp.blacknight.com
78.153.211.104  vps-1092809-1610.cp.blacknight.com
78.153.211.107  vps-1093022-1612.cp.blacknight.com

We will update this blog post once completed.

UPDATE 21:33: This was completed between 21:22 and 21:31

Eurid - Possible Delays In Updates


Eurid are signing their zone tomorrow, so the zone will not be dynamic for about two hours, between 10:00 and 12:00 UTC.

During this time period there may be some slight delays in DNS updates, though the delay is expected to be negligible.

IE Domain Order Issues

Due to a software issue, which is being worked on, we are currently not accepting orders for IE domain names.

UPDATE 14:58: This issue has now been resolved and IE orders should be working as normal.





ns2.blacknightsolutions.com issues


We are experiencing problems with one of the Name Servers for our legacy shared-hosting services.

 

These should not be service affecting.

Intermittent MySQL Issues Affecting One Node


Our technical team have been working on an issue with one of the MySQL server nodes which has been giving intermittent issues over the last few hours.

You can check the current server status here

Update @ 22:00 : This server has been stable for several hours at this stage, but we're continuing to keep an eye on it.

Linux VPS Hardware Node Reboot

We are scheduling a reboot of the Linux VPS Hardware Node: PEMVZLIN12

The downtime will occur on 03/06/2010 at 21:00

Estimated downtime per VPS is no longer than 10 minutes.

The affected VPSs are:

78.153.209.216  vps-183-1342.cp.blacknight.com 
78.153.208.72   vps-184-1350.cp.blacknight.com 
78.153.210.253  hosting.manalog.net            
78.153.209.13   be.perews.com                  
78.153.209.49   completely.bonkers.ie          
78.153.209.81   vps-1074340-1419.cp.blacknight.com
78.153.209.114  int.brimbrothers.com           
78.153.209.178  vps-1075535-1430.cp.blacknight.com
78.153.209.219  vps-1075589-1435.cp.blacknight.com
78.153.210.12   vps-1075962-1439.cp.blacknight.com
78.153.210.54   vps-1076472-1442.cp.blacknight.com
78.153.210.101  vps-1077078-1447.cp.blacknight.com
78.153.210.113  vps-1077886-1453.cp.blacknight.com
78.153.210.161  vps-1078138-1458.cp.blacknight.com
78.153.210.173  vps-1076912-1460.cp.blacknight.com
78.153.209.180  vps-1077847-1465.cp.blacknight.com
78.153.210.197  vps-1080296-1469.cp.blacknight.com
78.153.210.254  vps-1081102-1477.cp.blacknight.com
78.153.211.23   vps-1083452-1507.cp.blacknight.com
78.153.211.27   vps-259-1511.blacknight.com    
78.153.209.150  vps-1084389-1518.cp.blacknight.com
78.153.211.30   vps-1085007-1530.cp.blacknight.com
78.153.211.47   vps-1085765-1536.cp.blacknight.com
78.153.211.51   vps-1086258-1548.cp.blacknight.com
78.153.211.56   vps-1086646-1552.cp.blacknight.com
78.153.211.61   vps-1087057-1565.cp.blacknight.com
78.153.211.64   vps-1087820-1570.cp.blacknight.com

We will update this blog post once completed.


UPDATE: This has now been completed.

Core Network Switch Upgrade And Firewall Move

On the 11th of June we will be completing the upgrade of the core access switches in Interxion. As part of this, we need to physically move the current shared firewalls in Interxion to a different position within the rack. As they are a HA pair, it should be possible to do this without affecting connectivity.

The timeline will be as follows:

02:00 Fail over all the traffic to the second firewall in the pair. Once we're sure traffic is flowing through the second firewall, power down the first, move it to its new position in the rack and recable. Power it back up and ensure that it's working as expected.

02:30 Repeat the procedure with the second firewall in the pair.

03:00 Install the new switches, and start swapping over from the current switches. There should be minimal downtime involved with this as each customer and rack switch has redundant connectivity back to the core.

The maintenance window will end at 06:00 once we're sure that everything is back up and running as expected.

UPDATE Jun 12th 06:00 This maintenance window has now completed. Unfortunately not everything was completed, but the main work of moving the firewalls and getting the new switches in has been done. The few remaining bits will be completed during a future maintenance window.

Priamus PHP Upgrade

We are upgrading PHP on one of our DirectAdmin hosts, Priamus, in order to add libraries requested by customers.

There's likely to be a small bit of downtime during the upgrade as the extra modules are compiled and Apache restarts, but this should be minimal.

UPDATE 18:10: This has been completed without issue.

Network Connectivity

We are currently experiencing some network connectivity issues at our Interxion facility.

Our engineers are working to resolve this asap.

Update: 05:40

The following timeline details the events of tonight.

03:00 Switch swap maintenance begins. Engineer decides that he can't proceed and attempts to carry out some non-intrusive maintenance on access router 2 (Hot Standby Router for customers on unfirewalled VLANs, BGP customers and customers with HA firewall setups).

03:10 access router 1 reboots and traffic to the above mentioned VLANs goes down.

03:20 the on-call engineer calls the engineer doing the maintenance informing him of an issue

03:22 onsite engineer begins investigation on access router 1 over its console cable.

03:29 access router 1 is power cycled

03:30 access router 1 returns to service.

03:45 - 04:25 access router 1 was down again due to human error. During the investigation of access router 1's problems the onsite engineer was using the same console cable he had been using on access router 2. The engineer then proceeded to work on access router 1 as if it were access router 2, and this is what caused the downtime. It took until 04:00 to realise the mistake and a further 25 minutes to undo what had been done. Unfortunately the rollback command in JunOS wasn't used in this case; it would have put the system back online in under 60 seconds. In future, as part of our maintenance policy, we'll do a forced rollback in the event of any issues and ensure that all engineering staff are up to date on both JunOS and IOS procedures for rolling back config changes.
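For readers unfamiliar with the idea: JunOS keeps previously committed configurations, and in configuration mode `rollback 1` followed by `commit` restores the one that was active before the last commit. The sketch below models that behaviour in Python purely as an illustration. The class name and the configuration keys are hypothetical, and this is not router code.

```python
class ConfigHistory:
    """Keeps committed configurations, newest first, so an earlier one can be restored."""

    def __init__(self, initial):
        self._history = [dict(initial)]

    @property
    def active(self):
        """Return a copy of the currently active configuration."""
        return dict(self._history[0])

    def commit(self, candidate):
        """Make candidate the active configuration, keeping the old one for rollback."""
        self._history.insert(0, dict(candidate))

    def rollback(self, n=1):
        """Re-commit the configuration that was active n commits ago."""
        if not 0 <= n < len(self._history):
            raise ValueError(f"no rollback point {n}")
        self.commit(self._history[n])


# Hypothetical example: a mistaken change is committed, then undone in one step.
router = ConfigHistory({"hostname": "access-router-1", "vlan_gateway": "enabled"})
router.commit({"hostname": "access-router-1", "vlan_gateway": "disabled"})
router.rollback(1)
print(router.active["vlan_gateway"])
```

The point of the model is that undoing a bad change does not require working out and reversing each individual edit; the last known-good configuration is restored as a whole.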