When monitoring the availability of services between networked clients and servers, it is important to verify that those devices respond to one another correctly and in a timely manner, for example to meet service level agreements (SLAs).
This is often referred to as “end-to-end” service management and encompasses the need to monitor applications, servers and interconnecting networks. For each of these components, decisions must be made as to whether to use specially deployed agents (software or hardware) to conduct the monitoring, or whether sufficient data can be gathered by other means. There are advantages and disadvantages to both agent-based technologies and their alternatives. This article explores the issues associated with the decision of which to use, and suggests where in the service model each technology should be employed.
Advantages of Agent-Based Monitoring
Agents can monitor the status (availability and performance) of applications, servers, and network components in significantly more depth than generic management tools, since they are able, for example, to gather data through application-specific interfaces, exercise the full application functionality, and perform localised aggregation and summarisation of high-volume metrics. This data can be used, in conjunction with information obtained using traditional methods such as ping, SNMP polling, trap decoding, and syslog message analysis, to enhance the overall visibility of the network and end-to-end services.
Agents have the ability to check local and remote devices, applications and services. With local monitoring, the agent checks the status of the device upon which it resides, an application running on that device, or something ‘nearby’. Remote monitoring agents can be used to provide a true end-to-end perspective, i.e. tracking the availability and performance of remote applications from local clients.
Advances in networking technologies, particularly fault-tolerant, dynamic (policy-based) routing, make prediction of end-to-end path availability and characteristics exceedingly difficult. This is exacerbated when only a limited part of the network is visible – e.g. across WAN links, within tunneling protocols, etc. The only reliable way to accurately measure true end-to-end characteristics is to measure the traffic “in-band” – i.e. to generate traffic which is indistinguishable from genuine application data, or to monitor real application traffic flows (and correlate the endpoints).
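As an illustration of in-band measurement, the sketch below times real TCP handshakes to an endpoint, so the probes traverse the same forwarding path and receive the same routing treatment as genuine connection set-up traffic. This is a minimal Python sketch: the local listener, sample count and timeout are illustrative assumptions, not part of any particular product.

```python
import socket
import statistics
import time

def tcp_connect_rtt_ms(host, port, samples=5):
    """Median TCP connection set-up time in milliseconds, measured with
    real handshakes so the probes follow the actual forwarding path."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=2.0):
            rtts.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(rtts)

# Self-contained demonstration against a local listener (a stand-in for a real server).
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
rtt_ms = tcp_connect_rtt_ms("127.0.0.1", listener.getsockname()[1])
print(f"median connect time: {rtt_ms:.3f} ms")
listener.close()
```

Because each probe is a genuine TCP connection, any policy-based routing applied to real connection attempts applies equally to the measurement.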
When assessing service availability, conventional techniques such as testing for connectivity to specific ports are often inadequate. If a client is unable to connect to port 80 on a remote web server, the web service is certainly unavailable to that client; however, the fact that another client can connect to port 80 does not necessarily mean that the full web server functionality is available – all that can be concluded is that it might be. To conclusively determine whether or not a service is available, one must exercise it in the same manner as a client – i.e. request a web page from the server and validate the result, request a DHCP-assigned IP address from a DHCP server and validate the result, perform a database query, etc.
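The distinction can be sketched in a few lines of Python using only the standard library. The toy HTTP server below accepts TCP connections (on an ephemeral localhost port standing in for port 80) but returns errors, so a bare port check reports success while a client-style request-and-validate check correctly reports the service as unavailable. All names and parameters here are illustrative.

```python
import http.server
import socket
import threading
import urllib.request

class BrokenHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in for a web server that accepts connections but is not working."""
    def do_GET(self):
        self.send_response(500)
        self.end_headers()
        self.wfile.write(b"internal error")
    def log_message(self, *args):
        pass  # keep the demonstration quiet

def port_open(host, port, timeout=2.0):
    """Conventional check: does the port accept a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def service_available(url, expected, timeout=2.0):
    """Client-style check: request a page and validate the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and expected in resp.read()
    except OSError:  # covers connection failures and HTTP errors alike
        return False

server = http.server.HTTPServer(("127.0.0.1", 0), BrokenHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

port_reachable = port_open("127.0.0.1", port)                               # True
really_working = service_available(f"http://127.0.0.1:{port}/", b"<html>")  # False
print(port_reachable, really_working)
server.shutdown()
```

The port check concludes the service “might be available”; only the request-and-validate check can say whether it actually is.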
Another advantage of agent-based solutions is that, in the event of a network outage (where agents are temporarily unreachable from the management station), agents can continue to collect data locally and transfer it to the management solution once connectivity is restored.
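A minimal sketch of this store-and-forward behaviour, assuming a hypothetical `send` callable that delivers one sample to the management station and raises `ConnectionError` while it is unreachable:

```python
import collections

class BufferingAgent:
    """Toy store-and-forward agent: samples are queued locally during an
    outage and delivered in order once the management station is reachable."""
    def __init__(self, send):
        self.send = send                  # delivers one sample; raises ConnectionError on failure
        self.backlog = collections.deque()

    def record(self, sample):
        self.backlog.append(sample)
        self.flush()

    def flush(self):
        while self.backlog:
            try:
                self.send(self.backlog[0])
            except ConnectionError:
                return                    # network down: keep the data, retry later
            self.backlog.popleft()        # delivered: safe to discard locally

# Demonstration with a simulated outage.
received = []
link_up = [False]
def send(sample):
    if not link_up[0]:
        raise ConnectionError("management station unreachable")
    received.append(sample)

agent = BufferingAgent(send)
agent.record(("cpu", 0.71))
agent.record(("cpu", 0.65))   # outage: both samples held locally
link_up[0] = True             # connectivity restored
agent.flush()
print(received)               # both samples arrive, in order
```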
Disadvantages of Deploying Agents
Agents tend to be expensive; to assess the true cost of employing an agent-based solution, one should look at all of the associated expenses – initial purchase, additional hardware, rack space, operating system licenses, maintenance (configuration, patches, upgrades), training and integration.
Agents usually place additional load on the network, servers and applications. The majority of agents work by generating synthetic transactions (i.e. they request services from a remote server, then validate and characterise the responses). For these agents, consideration should be given to the additional bandwidth required, increased connection concurrency, and increased server load, especially in a highly meshed environment with a large number of agents. Agents which monitor real application flows generate no such additional load.
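The extra load is straightforward to estimate up front. The sketch below computes the aggregate bandwidth added by synthetic transactions; the figures used (50 agents, 20 targets each, 10 kB per transaction, one probe per target per minute) are illustrative assumptions, not recommendations.

```python
def synthetic_load_bps(n_agents, targets_per_agent, bytes_per_txn, interval_s):
    """Aggregate extra traffic (bits per second) generated by synthetic
    transactions, assuming each agent probes each target once per interval."""
    txns_per_sec = n_agents * targets_per_agent / interval_s
    return txns_per_sec * bytes_per_txn * 8

# Illustrative deployment: 50 agents x 20 targets, 10 kB per transaction, 60 s interval.
load = synthetic_load_bps(50, 20, 10_000, 60)
print(f"{load / 1e6:.2f} Mbit/s")  # roughly 1.33 Mbit/s of added traffic
```

Even modest per-agent figures aggregate quickly in a meshed deployment, which is why the real-flow-monitoring alternative is attractive at scale.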
Not all applications and services can be monitored using agents. While suitable agents exist for generic applications and services (web services, mail services, database services, etc.), agents will not be available for bespoke (or custom) applications. It may be possible to procure custom agents for these, but at significant additional cost and risk, and at the sacrifice of flexibility whenever the application is upgraded or updated.
One of the biggest concerns when deploying agent-based solutions is scalability. While using agents between small numbers of clients and servers is readily achievable, deploying, managing and monitoring any-to-any connectivity between large numbers of clients and servers rapidly becomes untenable. The problem is even more acute for network infrastructure devices, where the number of possible connection paths is vast. This can be mitigated by using agents which monitor real application flows as and when they occur. When examining scale, one needs to account for the number of agents required (based on the number of paths to be monitored) and the number of agents that each device can support, and then provision the hardware and software accordingly.
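The combinatorics behind this concern are easy to demonstrate: a full mesh of n endpoints contains n(n−1)/2 distinct pairs, each a path to probe. The helper below is illustrative, with an assumed per-device capacity of 50 actively probed paths per agent.

```python
import math

def mesh_paths(n_endpoints):
    """Distinct endpoint pairs in a full mesh (one end-to-end path per pair)."""
    return n_endpoints * (n_endpoints - 1) // 2

def agents_required(n_endpoints, paths_per_agent):
    """Agents needed if each agent can actively probe only a limited
    number of paths (paths_per_agent is an assumed capacity)."""
    return math.ceil(mesh_paths(n_endpoints) / paths_per_agent)

for n in (10, 100, 1000):
    print(n, mesh_paths(n), agents_required(n, 50))
```

Ten endpoints need 45 paths; a thousand need 499,500 – quadratic growth that quickly outstrips any per-device agent capacity, which is the untenability described above.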
Types of Agents
There are several different types of agents, varying in where they are deployed and how they function; these are addressed below.
Hardware agents – Hardware agents are typically network “appliances” which feed data to, or are queried by, remote management stations. (e.g. Fluke OptiView Link Analyser, nMon nBox, Packeteer PacketSeeker and Niksun NetDetector)
Software Agents – Software agents are specially developed applications which monitor server and application “health”, performance and availability. They range from agents that monitor remote applications (by watching live traffic or by using synthetic transactions) to those that monitor local application status and local server health. (e.g. Ixia IxChariot, NetIQ Vivinet and Concord SystemEdge)
Intrinsic/Infrastructure Agents – Intrinsic agents are already present within an organisation’s existing infrastructure (network devices, servers’ operating systems and applications) (e.g. Cisco SAA, ping MIB, RMON, SMON, Netflow, OS SNMP agents and LFAP). They can provide a wealth of information to a network management system (NMS) covering physical assets, configuration, performance, faults, status and traffic flows (typically via SNMP) and are heavily used by most NMSs. Examples of SNMP-based NMSs include Entuity Eye of the Storm (EYE), HP Openview, and CA Concord. Additional information can be gathered from device and OS logs, which can be pulled by the NMS (via SNMP, FTP, telnet, etc.) or pushed by the device (as syslog messages, SNMP traps, Netflow records, etc.).
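As a small illustration of the push model, the sketch below receives and parses a classic RFC 3164-style syslog datagram over UDP. Production collectors listen on UDP port 514; an ephemeral localhost port is used here so the sketch is self-contained, and the message content is invented.

```python
import socket

def parse_syslog(datagram):
    """Split a classic '<PRI>message' syslog datagram into
    (facility, severity, message); PRI = facility * 8 + severity."""
    text = datagram.decode("utf-8", "replace")
    if text.startswith("<"):
        pri, _, rest = text[1:].partition(">")
        pri = int(pri)
        return pri // 8, pri % 8, rest
    return None, None, text

# Push model: the device sends messages to the NMS collector over UDP.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))                 # port 514 in a real deployment
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"<134>router1: interface Gi0/1 down", rx.getsockname())
facility, severity, msg = parse_syslog(rx.recv(1024))
print(facility, severity, msg)            # 16 6 (local0.info) plus the message
rx.close(); tx.close()
```

Because the devices push unsolicited, the collector sees events as they happen, complementing the polled (pulled) SNMP data.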
Active Monitoring – Agents which actively monitor services using ‘synthetic’ transactions to simulate client requests create additional load on servers and the network. When configuring such agents, it is important to ensure that the synthetic transactions match those generated by real clients – in QoS parameters, routing parameters, and transaction size and type – so that the measurements are meaningful. It should be noted that it might not always be possible to replicate “real” transactions (for example, replicating bulk updates to a large financial database on a regular and frequent basis would be undesirable).
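One concrete example of matching QoS parameters is marking synthetic probe traffic with the same DSCP value as the real traffic class. The sketch below sets the IP TOS byte on a UDP socket; the choice of EF (DSCP 46) is an illustrative assumption and should be replaced with whatever marking the real application actually uses.

```python
import socket

DSCP_EF = 46          # assumed marking; substitute the real application's DSCP
tos = DSCP_EF << 2    # DSCP occupies the top six bits of the 8-bit TOS byte

probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
probe.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
applied = probe.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
print(applied)        # 184 (0xB8), i.e. EF in the DSCP field
probe.close()
```

Datagrams sent from this socket then receive the same per-hop treatment as the emulated traffic class, so queueing and policing along the path affect the probe and the real traffic alike.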