Recently in network Category

Firewall Upgrades

In order to add more redundancy to the network, we're moving certain segments of the network over to their own dedicated pairs of firewalls.

The following services will be affected:
  • Minimus, Medius and Maximus Windows and Linux shared hosting
  • DirectAdmin shared hosting
  • Windows Helm
  • cp.blacknight.com
  • Hosted Exchange
In each case, there should be minimal downtime as we're just shutting down the interface on the old firewalls and bringing it up on the new firewalls.

Update: 23:52

The above work is mostly complete. Due to time constraints we were unable to finish everything; however, cp.blacknight.com and all our Minimus, Medius and Maximus hosting packages have been moved to the new firewall infrastructure.

We'll complete the remaining work during another maintenance window. The current window is now closed.

Network Latency issues

We are currently experiencing high latency on our shared firewall infrastructure.

 

Update: This issue has now been resolved.

Network Connectivity

We were experiencing issues with a network switch in one of our cabinets in our Interxion facility.

This issue has been fully resolved and all network connectivity has been restored.

Here is the full timeline of the events that occurred yesterday:

18:30 - An attack began against one of our shared Linux web servers, pemlinweb04.blacknight.com. This was not picked up by our monitors at first due to the low volume of attack traffic.

19:45 - The attack began to increase, sending huge amounts of UDP traffic to pemlinweb04.blacknight.com. At this point alerts began to filter through and the on-call engineer was alerted.

20:00 - It became apparent that, because of the sheer volume of UDP traffic, the network switch in the rack housing pemlinweb04 was being overwhelmed and was dropping traffic to all devices connected to it.

20:15 - The engineers disabled the site the traffic was being directed to and began null routing the sources the traffic was coming from.

20:25 - A noticeable drop in traffic followed, allowing the switch to resume normal operation.

20:40 - All services were returned to normal operation.
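
The null-routing step at 20:15 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a record of the tooling actually used: the sample flow records, the helper names `top_sources` and `blackhole_commands`, and the packet threshold are all hypothetical, and the generated `ip route add blackhole` lines assume a Linux router with iproute2, which this post does not state.

```python
from collections import Counter
import ipaddress

def top_sources(flows, min_packets):
    """Return source IPs whose total UDP packet count reaches min_packets, busiest first."""
    counts = Counter()
    for src, proto, packets in flows:
        if proto == "udp":
            counts[src] += packets
    return [src for src, n in counts.most_common() if n >= min_packets]

def blackhole_commands(sources):
    """Build one iproute2 blackhole (null) route per validated IPv4 source address."""
    cmds = []
    for src in sources:
        addr = ipaddress.IPv4Address(src)  # raises ValueError on malformed input
        cmds.append(f"ip route add blackhole {addr}/32")
    return cmds

# Hypothetical flow records: (source IP, protocol, packet count).
# Addresses come from the RFC 5737 documentation ranges, not from the incident.
sample_flows = [
    ("192.0.2.10", "udp", 900_000),
    ("198.51.100.7", "udp", 450_000),
    ("203.0.113.5", "tcp", 1_200),
    ("192.0.2.10", "udp", 300_000),
    ("203.0.113.9", "udp", 40),
]

for cmd in blackhole_commands(top_sources(sample_flows, min_packets=100_000)):
    print(cmd)
```

On its own, a blackhole route matches on destination address. Dropping inbound packets by source usually also relies on reverse-path filtering on the router, which is outside the scope of this sketch.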

Windows VPS Nodes:
78.153.208.163
78.153.210.20
78.153.210.160
78.153.210.184
78.153.209.166
78.153.210.203
78.153.210.81
78.153.208.70
78.153.208.162

Shared Windows Hosting Nodes:
pemwinweb07
pemwinweb08
pemwinweb09
pemwinweb10

Shared Linux Hosting Nodes:
pemlinweb01
pemlinweb04
pemlinweb05
pemlinweb07
pemlinweb19
pemlinweb20
pemlinweb21
pemlinweb22
pemlinweb23
pemlinweb24


MySQL DB Nodes:
mysql71.cp.blacknight.com
 

Core Network Switch Upgrade And Firewall Move

On the 11th of June we will be completing the upgrade of the core access switches in Interxion. As part of this, we need to physically move the current shared firewalls in Interxion to a different position within the rack. As they are an HA pair, it should be possible to do this without affecting connectivity.

The timeline will be as follows:

02:00 Fail over all the traffic to the second firewall in the pair. Once we're sure traffic is flowing through the second firewall, power down the first firewall, move it to its new position in the rack and recable it. Power it back up and ensure that it's working as expected.

02:30 Repeat the procedure with the second firewall in the pair.

03:00 Install the new switches, and start swapping over from the current switches. There should be minimal downtime involved with this as each customer and rack switch has redundant connectivity back to the core.

The maintenance window will end at 06:00 once we're sure that everything is back up and running as expected.

UPDATE Jun 12th 06:00 This maintenance window has now completed. Unfortunately not everything was completed, but the main work of moving the firewalls and getting the new switches in has been done. The few bits remaining will be completed during a future maintenance window.

Network Connectivity

We are currently experiencing some network connectivity issues at our Interxion facility.

Our engineers are working to resolve this asap.

Update: 05:40

The following timeline details tonight's events.

03:00 Switch swap maintenance begins. The engineer decides that he can't proceed and instead attempts to carry out some non-intrusive maintenance on access router 2 (the hot standby router for customers on unfirewalled VLANs, BGP customers and customers with HA firewall setups).

03:10 access router 1 reboots and traffic to the above mentioned VLANs goes down.

03:20 the on-call engineer calls the engineer doing the maintenance informing him of an issue

03:22 onsite engineer begins investigation on access router 1 over its console cable.

03:29 access router 1 is power cycled

03:30 access router 1 returns to service.

03:45 - 04:25 access router 1 was down again due to human error. While investigating access router 1's problems, the onsite engineer was using the same console cable he had been using on access router 2. He then proceeded to work on access router 1 as if it were access router 2, and this is what caused the downtime. It took until 04:00 to realise the mistake and a further 25 minutes to undo what had been done. Unfortunately the JunOS rollback command wasn't used in this case; it would have put the system back online in under 60 seconds. In future, as part of our maintenance policy, we'll do a forced rollback in the event of any issues and ensure that all engineering staff are up to date on both JunOS and IOS procedures for rolling back config changes.
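
For reference, the two outage windows for access router 1 in the timeline above can be totalled with a short calculation. This is only a worked arithmetic sketch: the `minutes_between` helper is a hypothetical name, and the windows are taken directly from the 03:10 to 03:30 and 03:45 to 04:25 entries.

```python
from datetime import datetime

def minutes_between(start, end):
    """Whole minutes between two HH:MM times on the same night."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

# Outage windows for access router 1, as listed in the timeline.
outages = [("03:10", "03:30"), ("03:45", "04:25")]

durations = [minutes_between(s, e) for s, e in outages]
print(durations)       # [20, 40]
print(sum(durations))  # 60
```

The second window breaks down as 15 minutes to notice the mistake (03:45 to 04:00) plus the 25 minutes it took to undo it.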

Core Access Switch Upgrade

The core access switches in Interxion are being upgraded to new hardware next Wednesday at 3 in the morning. There should be no disruption of service during this upgrade as all racks have multiple links back to the core of the network. However, it is possible that there will be slight blips in connectivity as the switches figure out their new paths back to the core.

UPDATE: Unfortunately, due to unforeseen issues with cabling, we were unable to complete this upgrade. While attempting to sort out the offending cabling, some parts of the network were rendered inaccessible for a period of time.

We're going to reschedule this upgrade for a night next week.

Router Upgrades

On the morning of Saturday the 15th we're going to be upgrading the core distribution routers in Interxion. While the downtime should be minimal, there are likely to be blips in connectivity during the upgrade.

The new Cisco routers are already in place and hooked up to the network in parallel with the current Junipers where possible. However, in order to make them the primary distribution routers, extra configuration and cabling are required.

While the network should route around any work that's being done, there are likely to be short outages as routes converge. These outages are likely to affect servers in both DEG and Interxion.

The work will be done between 03:00 and 06:00 on Saturday May 15th.

UPDATE: This work has been completed.

Dedicated / Co-Location Network Switch Upgrade

We are planning an upgrade of a switch that is used by some of our dedicated and co-location customers. The plan is to replace the 24-port switch with a new 48-port switch.

The upgrade will take place on Wednesday the 11th of November at 22:00 hours.

All dedicated and co-location clients with equipment in the cabinet shall be notified separately via email to ensure they are fully up to speed on the matter.

The new switch will be mounted and connected, and customers will then be migrated over to it one by one. There should be a blip of no more than 5 to 10 seconds per port.

This upgrade will allow for expansion in the future.

Once the upgrade is complete I will post a status update.

Thank you for your patience and understanding.

Update 10:16PM - The switch migration was completed. The downtime per port was approximately 10 seconds.

Network blip in InterXion this morning

Summary: At 09:04 we had a 30 second blip in InterXion. This happened during the process of bringing a series of new switches online. Ordinarily this operation does not affect service; however, for some reason it caused a problem on a number of VLANs.

All service was restored on or before 09:05 and the case is closed.

Network DDoS Attack

We are currently experiencing a network DDoS attack. We are working to block the attack asap and will update this post with regular status updates.

Update 11:05: The situation is currently back to normal. The attack traffic has been blackholed and the routers are showing normal loads again. The engineering team is monitoring the network closely.

Update 11:41: The issue has been traced back to one of our peers causing floods of traffic on our transit routers. Engineers are currently blacklisting the erroneous traffic.

Update 11:49: The network has stabilized again. We are monitoring it closely.