Troubleshooting application performance problems can be tricky. Perhaps your application performance has become rather sluggish, or maybe you've just realized that you have no idea what kind of data is actually traveling through your network. Either way, you need tools that not only let you peek at network traffic and data but also let you perform analysis and troubleshooting. Fortunately, you can choose from a wide range of tools and tool suites that deliver just the monitoring you seek.
At the highest level are performance-monitoring tools to track LAN and WAN traffic loads as well as availability and response times for specific server services, such as email, authentication, and database. Such tools let you establish your network's baseline performance and can give you early warning of potential problems or identify network bottlenecks. But sometimes what you really want is to be able to study the content and characteristics of your network traffic at the lowest level: the packets themselves as they flow between your IBM i and other devices. You can buy tools for any of these needs, either as task-specific utilities or complete Network Management System (NMS) consoles encompassing dozens of monitoring functions. Because these tools sport graphical interfaces, they run on desktop workstations (Windows, Mac, or Unix), rather than directly on your IBM i.
This Buyer's Guide [3] helps you select among a myriad of products that have capabilities in all these realms. To best use the guide, you'll need to know what kind of tools are out there and what they're best suited to accomplish. Then you'll be ready to go shopping.
Beyond measuring and tracking performance, you'll want some kind of notification mechanism to alert you to detected problems. At a minimum, you'll want to be notified when critical devices and services go down and come back up again, and virtually all products offer such notifications in the form of email alerts and/or various kinds of paging (cellular SMS, dial-up modem TAP paging, etc.). More sophisticated tools can alert on threshold crossings, such as excessively high (or low) CPU, disk, or network activity. Some even have built-in duty rosters to specify which IT staff member should be notified at various times on specific days.
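To make threshold alerting and duty rosters concrete, here is a minimal sketch in Python. It is not taken from any product in this guide; the `Threshold` class, the metric names, and the roster layout are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Allowed band for one metric (hypothetical structure)."""
    metric: str
    low: float
    high: float


def check_thresholds(sample, thresholds):
    """Return an alert message for every metric outside its allowed band."""
    alerts = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is None:
            continue  # metric not reported in this sample
        if value > t.high:
            alerts.append(f"{t.metric} high: {value} > {t.high}")
        elif value < t.low:
            alerts.append(f"{t.metric} low: {value} < {t.low}")
    return alerts


def on_call(roster, weekday, hour):
    """Pick who to notify from a roster of (weekdays, start_hour, end_hour, name).

    weekday follows Python's convention: 0 = Monday ... 6 = Sunday.
    """
    for days, start, end, name in roster:
        if weekday in days and start <= hour < end:
            return name
    return None  # nobody is scheduled for this slot
```

A monitoring loop would call `check_thresholds` on each new sample and send any resulting messages to whoever `on_call` returns for the current day and hour.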
Data-monitoring products, or packet sniffers, examine the entire contents of individual packets, giving you the power to monitor the data traversing your network at the content level. For example, you can keep an eye on FTP, HTTP, and SMTP packets to reveal inappropriate URLs or email messages involving those particular protocols.
Your IBM i has a built-in packet sniffer, the communications trace (CMNTRC) tool [4], but it captures only packets sent from or received by your IBM i. Often, you need to capture packets between other network components, such as a Windows desktop and another server. In these cases, you need a more generalized capture tool that connects to your network through an Ethernet switch monitoring port (sometimes called a mirroring port) or through a dedicated hardware network tap (sometimes called a "three-car garage" because of its three ports: in, out, and monitoring). When you're shopping for a packet sniffer, check out the granularity of the tool's reach and get a feel for the types of information the tool can discern from the captured data. Most give you statistics on low-level traffic, and some can decode packets down to the bit level.
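As a rough illustration of what "decoding packets" involves, the following Python sketch unpacks the fixed 20-byte portion of an IPv4 header using only the standard library. The function name `parse_ipv4_header` is hypothetical, and the sketch assumes the bytes have already been captured by some other tool; it ignores IP options and everything above the IP layer.

```python
import socket
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    if len(raw) < 20:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,                 # high 4 bits of the first byte
        "header_len": (ver_ihl & 0x0F) * 4,      # low 4 bits, in 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,                       # 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Build a sample header by hand (documentation-range addresses) and decode it.
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 1, 0, 64, 6, 0,
    socket.inet_aton("192.0.2.10"),
    socket.inet_aton("198.51.100.7"),
)
header = parse_ipv4_header(sample)
```

Commercial sniffers do the same kind of field-by-field decoding for hundreds of protocols, which is why the depth of a tool's decoders is worth comparing when you shop.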
Flow monitoring (also known as statistical monitoring), by contrast, examines streams of data in the network. Instead of examining each packet's detailed contents, a statistical monitor looks only at the source and destination IP addresses and the source and destination port numbers; each unique address/port combination is considered a single flow. Note that flows aren't the same thing as TCP/IP sessions: a single flow could encompass many TCP/IP sessions as long as the source and destination port numbers don't change.
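The flow definition above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular product works: the packet tuples and the `aggregate_flows` name are hypothetical, and real flow records typically carry more fields (protocol, timestamps, interface).

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packets into flows keyed by (src_ip, src_port, dst_ip, dst_port).

    Each packet is a tuple: (src_ip, src_port, dst_ip, dst_port, size_in_bytes).
    Only the addressing fields are inspected; payload contents are never read.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src_ip, src_port, dst_ip, dst_port, size in packets:
        key = (src_ip, src_port, dst_ip, dst_port)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)


# Three packets, two distinct address/port combinations, so two flows.
packets = [
    ("10.0.0.5", 40000, "10.0.0.1", 80, 600),
    ("10.0.0.5", 40000, "10.0.0.1", 80, 400),
    ("10.0.0.6", 40001, "10.0.0.1", 80, 100),
]
flows = aggregate_flows(packets)
```

Because the monitor keeps only a counter pair per flow rather than every packet, it needs far less storage than a packet sniffer watching the same traffic.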
Flow analysis can not only show you traffic trends (e.g., peak usage, bottlenecks) per protocol and device, but can also expose vulnerabilities and even ongoing attacks. Using flow analysis, you might see many packets bombarding your network at once, indicating some kind of internal misconfiguration or malicious activity. A packet sniffer might obscure that kind of problem by presenting too much detail. With a traffic-flow monitoring solution, you can also identify network abusers—so-called "top talkers."
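Finding top talkers is essentially a ranking over flow records. The sketch below assumes flow records have already been collected as simple (source, destination, byte count) tuples; that layout and the `top_talkers` name are assumptions made for illustration.

```python
from collections import Counter


def top_talkers(flow_records, n=3):
    """Rank source addresses by total bytes sent across all of their flows.

    Each record is a tuple: (src_ip, dst_ip, byte_count).
    Returns a list of (src_ip, total_bytes), largest first.
    """
    totals = Counter()
    for src_ip, _dst_ip, byte_count in flow_records:
        totals[src_ip] += byte_count
    return totals.most_common(n)


records = [
    ("10.0.0.5", "10.0.0.1", 5000),
    ("10.0.0.6", "10.0.0.1", 1200),
    ("10.0.0.5", "10.0.0.2", 3000),
]
busiest = top_talkers(records, n=1)
```

Here `10.0.0.5` comes out on top because its two flows together total 8,000 bytes, more than any other source.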
Network flow tools have two components. The network flow generator looks at every packet and identifies flows. The flow generator sends statistics to a flow collector, which stores them in a database and generates various kinds of graphs and tables so you can analyze them. The most common protocol used by flow generators and collectors is NetFlow, a proprietary but freely usable standard developed by Cisco. If you have a Cisco router in your network, you can configure it as a NetFlow generator by pointing it at your chosen network monitoring tool, which then acts as a collector. Some flow analysis tools have built-in NetFlow or sFlow software agents, and you can purchase dedicated hardware flow generator appliances, such as ntop's nBox.
NetFlow collectors can bog down when traffic levels exceed a few hundred megabits per second. To accommodate higher flow rates (up to 10 Gbps), a new open industry standard called sFlow was developed. sFlow samples traffic at fixed intervals rather than examining every packet, which makes it somewhat less accurate than NetFlow, but it usually provides adequate information about the most active flows in your network.
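The trade-off behind sampling can be shown with a deliberately simplified Python sketch. It assumes the simplest possible scheme, keeping every Nth packet and scaling the result back up by N; it does not reproduce the sFlow record format or the sampling algorithm of any real agent, and both function names are hypothetical.

```python
def sample_every_nth(packet_sizes, n):
    """Keep only every nth packet size (simplified, deterministic sampling)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return packet_sizes[n - 1::n]


def estimate_total_bytes(sampled_sizes, n):
    """Scale the sampled byte count back up to estimate the true total."""
    return sum(sampled_sizes) * n


# 1,000 packets of 100 bytes each, sampled 1 in 10.
sizes = [100] * 1000
sampled = sample_every_nth(sizes, 10)
estimate = estimate_total_bytes(sampled, 10)
```

The collector processes only a tenth of the packets yet still arrives at a usable traffic estimate. With uniform traffic like this the estimate is exact; with mixed packet sizes it is only approximate, which is the accuracy cost of sampling.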
Think of NetFlow and sFlow as reporting technologies, not specifically as troubleshooting technologies. Embedded in your monitoring infrastructure, these tools are best used to get an idea of traffic flow, usage patterns, and highly used applications in the environment. They don't let you look at the data itself and perform low-level troubleshooting. The best network-monitoring tactic uses both packet sniffing and statistical monitoring approaches.
Another network traffic analysis feature you might have come across in your network gear is Remote Monitoring (RMON). RMON uses Ethernet switches and routers to collect protocol-level statistics, but because it doesn't pay attention to flows, it turns out not to be very useful, and it's now considered defunct. We don't look at RMON features or products at all in this Buyer's Guide.
For small to mid-sized networks, network monitoring delivered as software as a service (SaaS) makes a lot of sense. For a small monthly fee per device or service monitored, you get immediate network monitoring with lots of features and virtually no capital investment or IT staff time expenditure. The downside is that, over time, SaaS may cost more than purchasing a dedicated NMS, particularly for larger networks. SaaS offerings also by nature have somewhat coarser reporting granularity than a dedicated NMS: typically five minutes or more, as opposed to as little as ten seconds for in-house monitoring.
You'll see increasing support for 10GbE bandwidth. Maybe you don't need it today, but making the investment in that future 10GbE visibility is worth considering. Finally, retrospective analysis is gaining popularity in the market, letting you funnel data to a disk array and perform retrospective troubleshooting—a new mainstay of the industry.
Links:
[1] http://systeminetwork.com/author/jason-bovberg
[2] http://systeminetwork.com/author/mel-beckman
[3] http://systeminetwork.com/files/64059table.xls
[4] http://systeminetwork.com/article/identify-network-problems-communications-trace
[5] http://systeminetwork.com/files/64059table.xls