Agent-based or Agent-less Network Monitoring

By | June 18, 2006

Another factor to consider is uniformity in the response of the service and the importance this could have on the selection of active monitoring agents. If, for example, a server returns corrupt data once in every million transactions and an agent is configured to perform a test transaction every minute, then the agent is unlikely to encounter and report the error. If a client is performing a thousand transactions a second, however, the error will occur roughly every seventeen minutes (1,000 seconds) for that client, yet will almost never be reported by the agent.
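The arithmetic behind the one-in-a-million example can be checked with a short sketch. It assumes errors are independent and evenly spread across transactions, which real faults often are not; the function names are illustrative only.

```python
def seconds_between_errors(error_rate: float, transactions_per_second: float) -> float:
    """Mean time between errors seen by a source sending at a steady rate."""
    return 1.0 / (error_rate * transactions_per_second)


def chance_of_seeing_error(error_rate: float, transactions: int) -> float:
    """Probability that at least one of `transactions` independent attempts fails."""
    return 1.0 - (1.0 - error_rate) ** transactions


ERROR_RATE = 1e-6  # one corrupt response per million transactions (from the text)

client_gap = seconds_between_errors(ERROR_RATE, 1000.0)     # busy client
agent_gap = seconds_between_errors(ERROR_RATE, 1.0 / 60.0)  # one probe per minute
agent_daily_chance = chance_of_seeing_error(ERROR_RATE, 24 * 60)

print(f"client sees an error about every {client_gap / 60:.1f} minutes")
print(f"agent sees an error about every {agent_gap / 86400 / 365:.1f} years")
print(f"chance the agent sees an error on a given day: {agent_daily_chance:.4%}")
```

Under these assumptions the busy client hits the fault many times a day, while a once-a-minute probe would be expected to wait well over a year between sightings.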

For highly meshed connectivity, thought should be given as to which pairs of endpoints should be used for monitoring. Typically the number of possible connections is far higher than the number of strategically significant connections, and it is unnecessary to monitor every possible connection (for passive monitoring this is not an issue). Active agents include Ixia IxChariot, NetIQ Vivinet, Concord SystemEdge, and Cisco Service Assurance Agent (SAA). The protocols for communicating between the agents and the NMS are often proprietary, but are occasionally standards based: for example, Cisco’s SAA can be configured and polled using SNMP (as performed by Entuity’s EYE and CA’s Concord).
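To illustrate how quickly the number of possible connections outgrows the number worth monitoring, the sketch below enumerates every endpoint pair in a full mesh and then keeps only an operator-supplied shortlist. The site names and the shortlist are hypothetical; how "strategically significant" is decided is left to the operator.

```python
from itertools import combinations


def all_pairs(endpoints):
    """Every unordered endpoint pair in a full mesh: n * (n - 1) / 2 of them."""
    return list(combinations(sorted(endpoints), 2))


def select_pairs(endpoints, significant_pairs):
    """Keep only the pairs the operator has marked as worth probing."""
    wanted = {tuple(sorted(pair)) for pair in significant_pairs}
    return [pair for pair in all_pairs(endpoints) if pair in wanted]


# 50 hypothetical sites, fully meshed.
sites = [f"site-{i:02d}" for i in range(1, 51)]
candidates = all_pairs(sites)

# A hypothetical shortlist, e.g. branch-to-data-centre paths.
chosen = select_pairs(sites, [("site-01", "site-02"), ("site-03", "site-01")])

print(f"{len(candidates)} possible pairs, {len(chosen)} selected for active probing")
```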

Passive Monitoring – Passive agents monitor real-time client-server transactions, examining the behavior of the server by observing communication between client and server. There are many advantages to this technique: no additional application/server/network load, fewer meshing/scaling issues, reliable detection of intermittent and infrequent errors, etc. Passive monitoring is far less prevalent than active monitoring due to the difficulty of ‘snooping’ real traffic flows and extracting useful data. Coping with throughput is often problematic, since analysing large numbers of transactions and large volumes of traffic (e.g. at line-speed) is difficult and is often only achievable using dedicated hardware or by analysing a subset of the data flows.

Passive monitoring is effective for availability and performance recording, but not for fault alerting, as it is not known prior to a transaction being attempted whether or not the service is available. In this case, the first indication of a problem with the service is the failure of a real client-server transaction. Examples of passive monitoring agents include Psytechnics, Cisco NetFlow, and Nmon Nbox. As with active agents, passive agents communicate with an NMS via either proprietary or standards-based protocols (e.g. SNMP).
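As a rough illustration of the availability and performance recording described above, the sketch below summarises a batch of transactions that a passive agent has already observed. The record layout is an assumption made for this example and does not reflect the output format of any product named here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ObservedTransaction:
    """One real client-server exchange seen on the wire (hypothetical layout)."""
    succeeded: bool
    response_ms: float


def summarise(transactions):
    """Return (availability, mean response time of successes), or None if idle."""
    if not transactions:
        # No traffic observed: a passive agent cannot say anything about the service.
        return None
    successes = [t for t in transactions if t.succeeded]
    availability = len(successes) / len(transactions)
    mean_ms = mean(t.response_ms for t in successes) if successes else None
    return availability, mean_ms
```

The `None` result for an empty batch reflects the limitation noted above: with no real traffic to observe, the agent has no basis for reporting whether the service is up.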

When And Where Are Agents Required?

For monitoring the networks interconnecting clients and servers, device-centered monitoring is usually sufficient (i.e. monitoring each device, port, virtual circuit endpoint, etc.). With a suitable NMS, these objects can be logically grouped to provide a reasonable indication of end-to-end connectivity between the client and server switch ports. Clearly this cannot take account of traffic flows failing due to access list violations, QoS restrictions and routing changes, but in the majority of cases it proves sufficient. It is certainly adequate for non-critical service monitoring. Two simple extensions to this composite service model are to configure the NMS to check for server reachability (e.g. ping) and for application availability (check for connectivity to specific ports on a server). While neither of these is a guarantee of application availability from a specific client, they add sufficient extra value that they are normally included in such a configuration.
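The application availability check mentioned above can be as simple as attempting a TCP connection to the service port. The following is a minimal sketch using Python's standard `socket` module; the function name and default timeout are illustrative, and a reachability (ping) check is omitted because ICMP usually needs elevated privileges.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within `timeout`."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A call such as `tcp_port_open("intranet.example.com", 443)` (a hypothetical host) only shows that the port accepts connections from wherever the check runs, which is why it is no guarantee of availability from a specific client.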

There are circumstances when the use of agents is required:

Service Level Agreements are usually based on the provision of a service to a customer, and verifying that the service is available from some other location (i.e. the NMS) is not sufficient.

Certain applications have particularly stringent requirements of a network and the measurement of bulk traffic characteristics is inadequate. One such application is Voice over IP (VoIP) for which inferences as to call quality may be made from key network transport metrics, but for accurate call quality measurement voice-aware agents are required.

Highly redundant configurations or networks, where dynamic routing protocols are extensively used, make the construction (within the NMS) of composite services on the basis of the availability and performance of specific links and devices impractical and unreliable. In these situations, agent based in-band measurements provide significantly more accurate results.

Business critical services mandate the use of agents to provide as much data about a service, in as timely a manner as possible.

If data gathering must continue even when network connectivity to the NMS is lost, agents will be required (as is the case for SLA monitoring and billing).

If agents are required, then factors affecting the choice of agent are:

Will infrastructure agents suffice? Modern infrastructure agents are capable of providing a growing range of application-level monitoring on a periodic basis, producing alerts when user-specified conditions are breached. For example, Cisco’s Service Assurance Agent (SAA) can monitor DHCP, DLSW, DNS, HTTP, and FTP servers and provide basic statistical analysis and data aggregation.

If network, bandwidth, router CPU/memory or server resources are fully utilised, then passive agents should be deployed to avoid additional burden.

If advance notification of availability/performance issues is required then active agents should be used.
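The three factors above can be written down as a simple checklist. The sketch below encodes them exactly as stated and makes no attempt to rank them when more than one applies; the field and label names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Answers to the three selection questions (hypothetical field names)."""
    infrastructure_agents_suffice: bool
    resources_fully_utilised: bool
    advance_notification_required: bool


def recommend_agents(req: Requirements) -> list:
    """List the agent types suggested by each factor that applies."""
    recommendations = []
    if req.infrastructure_agents_suffice:
        recommendations.append("infrastructure")
    if req.resources_fully_utilised:
        # Passive agents add no extra load to the network or servers.
        recommendations.append("passive")
    if req.advance_notification_required:
        # Only active probing can find a fault before a real client does.
        recommendations.append("active")
    return recommendations
```

When both "passive" and "active" come back, the environment is short of capacity and also needs early warning; that trade-off has to be settled by the operator and is not resolved here.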

Conclusions

Agent-based monitoring is showing a resurgence in popularity. Historically, agents were used to augment the minimal monitoring provided by early hubs, switches and routers. As instrumentation in network devices improved, agent-based monitoring became less important, especially since increased processing power and the migration to hardware switching made enabling such options more viable. Recently, however, with the advent of more demanding/sensitive applications (such as voice and video over the network) and more commercial dependence on service availability (eCommerce, email, CRM systems, ePOS, etc.) the need for high quality and high availability services has emerged. This, in turn, has led to the need for agent-based monitoring solutions for key business systems.

While an increasing number of environments now suggest or mandate the use of agents, this does not necessarily mean the purchase and management of large quantities of additional hardware and software. Most modern network infrastructure devices, servers and clients provide some form of embedded agent which can be remotely queried (or configured). The sophistication of these agents and the quality of the data they can produce vary widely, but they are often sufficient to avoid additional CAPEX and OPEX when used in conjunction with an appropriate NMS.

Agents can provide very high quality data in very large volumes, but consideration should always be given as to whether or not agents are really necessary (or whether some cheaper alternative will suffice), and what is to be done with the data. Without some automated means of processing, the data will provide little additional benefit in maintaining a high quality, high availability network.

For the lower layers of the OSI model, application flow aware agents are not normally necessary. However, as one moves up the layers of the OSI model, agents become increasingly relevant. In general, a combination of agent-less monitoring for layers 2 and 3, with agent-based monitoring at layer 4 and above, coupled with other sources of data such as SNMP traps and syslog message parsing, provides solutions for managing the majority of enterprise networks and monitoring their end-to-end SLAs.

NMSs with open architectures allow for the integration of additional agents as new protocols and applications are deployed and requirements change.
