Network Connectivity


Notification Type

Technical Information

Service Affecting

Yes

Message

We are currently experiencing some network connectivity issues at our Interxion facility.

Our engineers are working to resolve this as soon as possible.

Update: 05:40

The following timeline details the events of tonight.

03:00 Switch swap maintenance begins. The engineer decides that he can't proceed and instead attempts to carry out some non-intrusive maintenance on access router 2 (the hot standby router for customers on unfirewalled VLANs, BGP customers and customers with HA firewall setups).

03:10 access router 1 reboots and traffic to the above-mentioned VLANs and customers goes down.

03:20 the on-call engineer calls the engineer doing the maintenance to inform him of an issue.

03:22 onsite engineer begins investigation on access router 1 over its console cable.

03:29 access router 1 is power cycled

03:30 access router 1 returns to service.

03:45 - 04:25 access router 1 was down again due to human error. While investigating access router 1's problems, the onsite engineer was using the same console cable he had been using on access router 2. He then proceeded to work on access router 1 as if it were access router 2, and this is what caused the downtime. It took until 04:00 to realise the mistake and a further 25 minutes to undo what had been done. Unfortunately the JunOS rollback command wasn't used in this case; it would have put the system back online in under 60 seconds. In future, as part of our maintenance policy, we'll do a forced rollback in the event of any issues and ensure that all engineering staff are up to date on both JunOS and IOS procedures for rolling back config changes.
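
For anyone who wants to total the impact from the timeline above, the arithmetic can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and it assumes traffic was affected for the whole of each window (03:10 to 03:30 and 03:45 to 04:25), which the timeline implies but does not state outright.

```python
from datetime import datetime

# Outage windows taken from the timeline above (same night, local time).
# Assumption: traffic was affected for the whole of each window.
OUTAGE_WINDOWS = [
    ("03:10", "03:30"),  # access router 1 reboots, then returns to service
    ("03:45", "04:25"),  # access router 1 down again due to human error
]


def window_minutes(start: str, end: str) -> int:
    """Length of one HH:MM-to-HH:MM window in whole minutes."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def total_downtime_minutes(windows) -> int:
    """Sum the lengths of all outage windows."""
    return sum(window_minutes(start, end) for start, end in windows)


if __name__ == "__main__":
    for start, end in OUTAGE_WINDOWS:
        print(f"{start}-{end}: {window_minutes(start, end)} min")
    print(f"total: {total_downtime_minutes(OUTAGE_WINDOWS)} min")
```

Under those assumptions the two windows come to 20 and 40 minutes, roughly an hour of disruption in total, of which the second window is the part a rollback completing in under 60 seconds could have shortened.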
