This week’s guest blog post is by Bruce Kosbab. Bruce is CTO at Fluke Networks – Visual. If you haven’t already, check out the real user reviews of Visual TruView here on IT Central Station.
If you are a network manager, you have likely faced two conflicting business directives when it comes to managing your network: 1) ensuring that your network delivers the optimal end-user experience, and 2) reducing the operational cost of your network.
The need to ensure an adequate end-user experience puts constant pressure on IT to increase bandwidth in order to provide an effective service to the business, while cost management requires that bandwidth be limited, or even reduced. So, how can you manage these conflicting pressures?
Frequently, when an application has persistent performance problems, the initial reaction is to throw bandwidth at the problem. However, you can often substantially improve the end-user experience and reduce operational costs merely by using the bandwidth you already have more efficiently.
Throwing bandwidth at an application performance problem may be the right answer, but this solution is not immediate. New circuits can take anywhere from 30 to 90 days to order and deploy. In addition, gathering data to understand the true bandwidth usage of a link can be time-consuming and error-prone.
- Which network links need the most attention?
- Is the bandwidth being used for business purposes?
- Can I downsize a link while maintaining business service quality?
- How can I demonstrate that an increase in capacity is warranted?
There are three common approaches used to manage network capacity:
- Long-range views of average utilization – shows a long-term trend of utilization, but traffic spikes and even brief periods of congestion are hidden by the highly aggregated averages
- Peak utilization – shows the days in a month that had a busy minute but doesn’t give insight into the amount of time during which a link is congested
- Traffic usage totals – does not give any indication of congestion except in extreme cases
None of these approaches provides adequate information to make informed decisions.
The Problem With Utilization
Let’s look at an example in which average utilization is used. In this example, I’ve chosen a short time-frame, but it demonstrates the problem with average utilization.
The average utilization over the selected time period, Oct 9 through Oct 14, is approximately 45 percent. By looking at this information one could assume that bandwidth congestion on this interface is not an issue. If we were to use network utilization as a yardstick for bandwidth capacity planning then this interface would most likely not appear on our radar.
Data aggregation is at the core of the problem in using utilization for capacity planning. The utilization values in the above chart are aggregated into 2-hour intervals. This means that each point in the report represents an average over a 2-hour timeframe. This aggregation has a smoothing effect on the data that masks high-congestion periods.
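To make the smoothing effect concrete, here is a minimal Python sketch. The sample values are hypothetical (they are not the data behind the chart), and the `aggregate` helper is a name I made up for illustration. It averages 1-minute utilization samples into a single 2-hour point and compares that point with the busiest minute.

```python
from statistics import mean


def aggregate(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical 1-minute utilization samples (percent) for one 2-hour interval:
# 90 quiet minutes at 30 percent followed by a 30-minute burst at 90 percent.
one_minute = [30.0] * 90 + [90.0] * 30

two_hour_points = aggregate(one_minute, window=120)
print(two_hour_points)   # [45.0]
print(max(one_minute))   # 90.0
```

The single 2-hour point reports 45 percent, even though the link spent a quarter of that interval at 90 percent.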
To demonstrate the smoothing effect, let’s zoom in on a 60-minute timeframe within the time period shown in the above chart. On the afternoon of Oct 11th, utilization peaks at around 80 percent, which is not evident in a 5-day view of the data.
This level of granularity is what is needed to truly understand the network utilization. The problem is that getting this fine level of granularity over a month or a year is not feasible because it requires a vast amount of data to be stored and displayed in order for the real utilization to be visually and quantitatively apparent.
There is a Better Way
There is a technique for analyzing network utilization, which Fluke Networks’ products use. It provides more actionable and accurate decision-making information. We call this data Network Burst data. Burst Utilization indicates the amount of time interface utilization is greater than specified thresholds.
By using Burst Utilization you can determine how long the congestion of a link exceeded 80 percent utilization or other utilization thresholds. With this type of information you can make decisions on whether to upsize (or downsize) a link or whether to investigate how the link is being used.
The advantage of Burst Utilization is that link congestion levels can be reported at 1-minute granularity regardless of the reported time frame (a day, a month, or a year) without loss of information fidelity. Contrast this with average utilization over 15-, 30-, or 60-minute time ranges, which dampens the utilization trend and makes accurate capacity planning decisions very difficult, if not impossible.
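As a rough illustration of the idea (a sketch under my own assumptions, not the way any Fluke Networks product implements it), the Python below counts how many 1-minute samples exceed each threshold. Because it stores counts rather than averages, two reporting periods can be combined by simple addition with no loss of detail. The function names, the sample values, and the choice of 30/60/80 percent thresholds are illustrative.

```python
THRESHOLDS = (30, 60, 80)  # percent utilization; illustrative choice


def burst_minutes(samples, thresholds=THRESHOLDS):
    """Count the 1-minute samples strictly above each threshold."""
    return {t: sum(1 for s in samples if s > t) for t in thresholds}


def burst_percent(minutes_over, total_minutes):
    """Convert per-threshold minute counts into percent of time."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return {t: 100.0 * n / total_minutes for t, n in minutes_over.items()}


# Two hypothetical 2-hour periods of 1-minute samples (percent utilization).
period_a = [20] * 60 + [50] * 30 + [70] * 18 + [90] * 12
period_b = [20] * 120

counts_a = burst_minutes(period_a)
counts_b = burst_minutes(period_b)
print(burst_percent(counts_a, len(period_a)))  # {30: 50.0, 60: 25.0, 80: 10.0}

# Rolling up to a longer time frame is just adding the counts.
combined = {t: counts_a[t] + counts_b[t] for t in THRESHOLDS}
print(burst_percent(combined, len(period_a) + len(period_b)))
# {30: 25.0, 60: 12.5, 80: 5.0}
```

The roll-up gives the same answer you would get by scanning every 1-minute sample in the combined period, which is what averaging cannot offer.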
Network managers typically want to begin keeping an eye on a particular interface when it spends more than 10 percent of the time above 80 percent utilization. This translates to a little more than half a day out of a typical workweek. When the time spent above the 80 percent threshold reaches 20 percent, i.e., a full workday, it may be time to either upgrade the link or investigate how it’s being used.
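The arithmetic behind that rule of thumb can be sketched in a few lines of Python. I am assuming a 40-hour workweek of five 8-hour days, and the function names and return labels are hypothetical.

```python
WORKWEEK_HOURS = 40  # assumption: five 8-hour workdays


def hours_over_threshold(percent_time, workweek_hours=WORKWEEK_HOURS):
    """Hours per workweek represented by a percent-of-time figure."""
    return percent_time * workweek_hours / 100


def capacity_action(percent_time_over_80):
    """Apply the rule of thumb: watch above 10 percent, act at 20 percent."""
    if percent_time_over_80 >= 20:
        return "upgrade or investigate"
    if percent_time_over_80 > 10:
        return "watch"
    return "ok"


print(hours_over_threshold(10))  # 4.0 (half of an 8-hour day)
print(hours_over_threshold(20))  # 8.0 (a full 8-hour day)
print(capacity_action(12))       # watch
```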
The chart shown above lists the interfaces being monitored and their respective burst data. The color breakdown indicates, for each interface, the time spent over 30 percent utilization (yellow), 60 percent utilization (orange), and 80 percent utilization (red). The interface listed first is obviously in trouble. It is running above 80 percent utilization all of the time.
Gathering Data to Make a Decision
If we look at Burst Utilization for a given interface, we can determine, at a glance, which days of the week and hours of the day are most congested.
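A day-and-hour breakdown like this could be built from timestamped 1-minute samples along the following lines. This is a sketch with hypothetical data and a made-up function name, assuming each sample is a `(datetime, utilization_percent)` pair.

```python
from collections import defaultdict
from datetime import datetime, timedelta

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def congestion_by_day_and_hour(samples, threshold=80):
    """Return {(weekday, hour): minutes above threshold} for 1-minute samples."""
    minutes = defaultdict(int)
    for ts, util in samples:
        if util > threshold:
            minutes[(DAYS[ts.weekday()], ts.hour)] += 1
    return dict(minutes)


# Hypothetical data: two hours starting Monday 2024-01-01 14:00.
# 14:00-14:59 runs at 85 percent; 15:00-15:44 at 50; 15:45-15:59 at 95.
start = datetime(2024, 1, 1, 14, 0)
samples = [
    (start + timedelta(minutes=i), 85 if i < 60 else (95 if i >= 105 else 50))
    for i in range(120)
]

print(congestion_by_day_and_hour(samples))
# {('Mon', 14): 60, ('Mon', 15): 15}
```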
And then, we can investigate whether the bandwidth is being used for business purposes or for recreational use.
In the chart shown below, it appears that most of the bandwidth is consumed by legitimate business applications. The 1755/TCP application bears further investigation though.
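If you have flow records to work with, a per-application breakdown of this kind can be approximated by summing bytes per port and protocol. The records below are invented for illustration and the `top_applications` helper is hypothetical; a real deployment would read them from whatever flow or packet data source you have.

```python
from collections import defaultdict


def top_applications(flows, top_n=3):
    """Rank applications by bytes.

    flows: iterable of (port, protocol, byte_count) tuples.
    Returns a list of (application, bytes, share_percent), largest first.
    """
    totals = defaultdict(int)
    for port, proto, nbytes in flows:
        totals[f"{port}/{proto}"] += nbytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(app, n, 100.0 * n / grand_total) for app, n in ranked[:top_n]]


# Hypothetical flow records: (port, protocol, bytes).
flows = [(443, "TCP", 500), (1755, "TCP", 300), (443, "TCP", 100), (53, "UDP", 100)]

for app, nbytes, share in top_applications(flows):
    print(f"{app}: {nbytes} bytes ({share:.1f}%)")
# 443/TCP: 600 bytes (60.0%)
# 1755/TCP: 300 bytes (30.0%)
# 53/UDP: 100 bytes (10.0%)
```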
When all of the information shown above is accessible in one place in a single solution, making bandwidth-sizing decisions can be quick and easy.
Call To Action
The goal of managing network bandwidth is not to report on the utilization of a link over time, but rather to ensure that you are buying the right amount of bandwidth to meet the needs of the business.
Please let me know how you perform bandwidth management:
- Is WAN capacity management part of your standard process?
- What tools do you use?
- What are your biggest challenges in managing bandwidth?
Also, please take a look at Visual TruView from Fluke Networks. That solution can help you with your network capacity management chores, plus it can help you understand whether network congestion issues are indeed causing application performance issues.
Don’t throw bandwidth at the problem. Make informed decisions with the right data.