
The Network is Down: Round Up the Usual Suspects
By Roger Boivin
Network downtime happens, and as your network grows, it seems to happen with increasing frequency. After a while you become familiar with the causes, and when an outage occurs you “round up the usual suspects” based on historical data, then spend a great deal of time and effort (it can feel like forever) determining the root cause and fixing the problem. Outages bring a whole list of costly problems: if your network is down for more than a few minutes, you can be facing angry bosses and angry customers, often leading to lost revenue. David Large and James Farmer, in Broadband Cable Access Networks (2009), “Network Availability-Failure Related,” state that “Given a network failure rate, the next important parameter is the percentage of time that the network (or a specific service) is available for use.” The general equation for availability (A) is:

A = MTBF / (MTBF + MTTR)
MTBF is the mean time between failures and MTTR is the mean time to restore service when failures occur. MTBF is governed by the reliability of network elements and how they are interconnected; MTTR, however, is both an operational and a network-design issue.
Network-monitoring capabilities are also part of network design: they control how soon network operators learn of an outage, and better monitoring can substantially improve network availability.
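To see how much MTTR matters, here is a minimal sketch of the availability equation in Python. The MTBF and MTTR figures are hypothetical, for illustration only; real values come from your own failure records.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """A = MTBF / (MTBF + MTTR), expressed as a fraction of uptime."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical element: fails every 2,000 hours, takes 4 hours to restore.
print(f"{availability(2000, 4):.4%}")  # 99.8004%

# Faster detection and restoration (halving MTTR) raises availability
# without touching the element's reliability at all:
print(f"{availability(2000, 2):.4%}")  # 99.9001%
```

The point of the comparison: monitoring that shortens time-to-detection improves the MTTR term directly, which is often cheaper than improving MTBF.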
What can I do to improve my network availability?
Thankfully, rather than waiting for the next failure to occur and THEN doing something about it, there is a better way. Imagine having a holistic view of your network, all its elements and connections, with a time-series database that establishes a baseline of normal activity. Then imagine that when something happens outside this “normal network,” you are automatically notified so you can act before the issue becomes service-affecting.
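The baseline-and-notify idea can be sketched in a few lines. This is a deliberately simplified illustration, not any vendor's implementation: it keeps a rolling window of recent measurements as the “normal” baseline and flags any sample that deviates too far from it. The traffic numbers and the three-sigma threshold are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(samples, window=20, threshold=3.0):
    """Return indices of samples that deviate more than `threshold`
    standard deviations from the rolling baseline of the previous
    `window` samples."""
    history = deque(maxlen=window)
    anomalies = []
    for i, x in enumerate(samples):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(x - mu) > threshold * sigma:
                anomalies.append(i)
                # Don't let the outlier skew the baseline itself.
                continue
        history.append(x)
    return anomalies

# Steady traffic around 100-104 Mbps with one sudden spike:
traffic = [100.0 + (i % 5) for i in range(40)]
traffic[30] = 500.0  # e.g. a failing element or a flood of retries
print(detect_anomalies(traffic))  # [30]
```

Production systems use far more sophisticated models (seasonality, machine learning), but the principle is the same: learn what normal looks like, then alert on deviation before users notice.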
Imagine further that when an anomaly occurs, rather than simply raising yet another alarm (we have enough of those, don’t we?), it kicks off a workflow that does most of the manual investigation for you and then lets you know the most probable cause and the best solution to implement.
How does that help?
Not only would you catch the “usual suspects,” but you would also be warned of impending network-element failures, zero-day attacks, and hacking attempts, whether internal or external, BEFORE your network is affected.
How does it work?
Far too often, network management is a reactive discipline. You can get ahead of that with the latest packet brokers, which provide better network visibility and control by capturing and analyzing 100% of the packets traversing your network.
I’m not suggesting a “rip and replace” of your network infrastructure, but a careful analysis of new and emerging network-visibility technologies: real-time analytics enhanced with machine learning. These can help your organization achieve its business goals by ensuring a high Quality of Experience (QoE) for the users of your network.
Contact Cirries at [email protected] for more details on new network visibility tools.