Stephen Fritz on Systems Engineering: Nagios/Icinga Exchange 2010 Performance Monitoring

This article describes Exchange 2010 Nagios/Icinga Templates that monitor various Exchange Roles: Common Services, Mailbox, Hub Transport and Client Access Servers. While built specifically for those systems, it is likely the Templates are also compatible with other Exchange versions.

For those with Nagios/Icinga and Windows experience, the counters used are available from the Monitoring Exchange page.

Windows Performance Monitor

Windows Server ships with an excellent monitoring and trend analysis tool: Performance Monitor. As illustrated below, it allows administrators to select and graph counters drawn from a long list of system metrics. These measurements may also be saved as delimited text files for future analysis and visualization. A centralized server may connect to other servers to collect data remotely. Since the API is well-documented, it is integrated into other value-added systems monitoring software.

Selecting Windows Server Performance Monitor Counters

The illustration below depicts the text format of a Windows Performance Monitor counter. Commands conforming to this syntax may be sent from remote monitoring servers whose applications comply with the Windows API.

Displaying Windows Server Performance Monitor Syntax
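
For readers without access to the illustration, the general shape of a counter path is \\Computer\Object(Instance)\Counter, where the computer and instance parts are optional. The short Python sketch below splits such a path into its parts. It is only an illustration of the text format: the function name parse_counter_path and the host name EXCH01 are hypothetical, and the pattern covers the common cases rather than every variation Windows accepts.

```python
import re
from typing import NamedTuple, Optional


class CounterPath(NamedTuple):
    computer: Optional[str]   # None when the path refers to the local machine
    obj: str                  # performance object, e.g. "Processor"
    instance: Optional[str]   # None for objects without instances
    counter: str              # counter name, e.g. "% Processor Time"


# Assumed simplified grammar: [\\Computer]\Object[(Instance)]\Counter
_COUNTER_PATH = re.compile(
    r"^(?:\\\\(?P<computer>[^\\]+))?"   # optional \\Computer prefix
    r"\\(?P<obj>[^\\(]+)"               # \Object
    r"(?:\((?P<instance>.*)\))?"        # optional (Instance)
    r"\\(?P<counter>.+)$"               # \Counter
)


def parse_counter_path(path: str) -> CounterPath:
    """Split a Performance Monitor counter path into its components."""
    match = _COUNTER_PATH.match(path)
    if match is None:
        raise ValueError(f"not a counter path: {path!r}")
    return CounterPath(
        match["computer"], match["obj"], match["instance"], match["counter"]
    )
```

Calling parse_counter_path(r"\\EXCH01\Processor(_Total)\% Processor Time") separates the server, the object, the instance and the counter, which is handy when building check definitions from a list of counters exported from Performance Monitor.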

Exchange 2010 Management Console

The Exchange Management Console (EMC) provides another set of tools to monitor and diagnose Exchange performance. Many of the screens are overviews that offer sets of PowerShell-scripted tools to manage the enterprise.

The EMC Toolbox offers a set of more detailed management and monitoring tools such as Message and Queue Tracking, Log Viewers and preconfigured Performance Monitor and Troubleshooter screens.

These tools are valuable and (for the most part) provide a centralized console for enterprise management. However, Nagios/Icinga is also centralized and can collect information that may be displayed in customizable formats. Nagios/Icinga also automates many of the visualization and presentation tasks in a far more flexible way.

Prerequisites

A thorough knowledge of Nagios/Icinga installation and configuration is necessary. The fundamentals of Nagios/Icinga are presented in the following articles:
Nagios/Icinga Installation and Initial Configuration
Nagios/Icinga Database Integration
Nagios/Icinga Configuration Files
Nagios / Icinga Configuration with NConf - a Graphical Web / Database Tool
Nagios / Icinga Performance Data Graphing with PNP4Nagios

Information specific to Windows Server enterprise monitoring is presented in these two articles:

Nagios/Icinga Templates for Windows 2008 R2 OS, Domain Controllers, DNS and IIS Servers
Automated Nagios/Icinga NSClient++ Deployment and Configuration for Windows Enterprises

Overview of Exchange 2010 Roles

Exchange Server 2010 allows the administrator to install different roles.

Mailbox Role (Mandatory)
Hub Transport Role (Mandatory)
Client Access Server Role (Mandatory)
Unified Messaging Server Role (Optional)
Edge Transport Role (Optional and Unique)

The first three roles must be present in the enterprise. They may be deployed on a single server or deployed individually or in combination on multiple servers. The Unified Messaging role is optional; the Edge Transport Role is also optional and (if installed) no other roles may be present. A more detailed description of Exchange 2010 Roles is available in the article Exchange 2010 Architecture.
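
The placement rules above can be written down as a small check. The Python sketch below is only an illustration of those two rules (the three mandatory roles must exist somewhere in the enterprise, and Edge Transport must be alone on its server). The role keys, the server names and the function name validate_roles are all hypothetical.

```python
from typing import Dict, List, Set

# Short role keys chosen for this sketch, not official identifiers.
REQUIRED_ROLES = {"mailbox", "hubtransport", "cas"}
EDGE_ROLE = "edge"


def validate_roles(servers: Dict[str, Set[str]]) -> List[str]:
    """Return a list of problems found in a server-to-roles mapping."""
    problems: List[str] = []

    # Rule 1: every mandatory role must be deployed on at least one server.
    deployed = set().union(*servers.values())
    for role in sorted(REQUIRED_ROLES - deployed):
        problems.append(f"required role missing from the enterprise: {role}")

    # Rule 2: a server holding the Edge Transport role may hold no other role.
    for name, roles in sorted(servers.items()):
        if EDGE_ROLE in roles and len(roles) > 1:
            problems.append(
                f"{name}: Edge Transport cannot share a server with other roles"
            )
    return problems
```

An empty list means the layout satisfies both rules. The same mapping of servers to roles can later drive which Templates each server receives.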

The templates for each Role/Role Service/Feature set monitor the availability of services using both NRPE Windows service checks to query the OS and the check_tcp or check_udp plugins to query the TCP/UDP ports exposed on the network. For instance, you may use NRPE to query the "Microsoft Exchange IMAP Service" and the check_tcp plugin to check the associated TCP services on ports 143 (IMAP) and 993 (IMAPS) to determine if the IMAP Mailbox Access service is available.
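
To make the network half of that pairing concrete, the Python sketch below performs the kind of test a TCP check boils down to: it tries to open a connection to each port and reports whether the connection succeeded. It is not the check_tcp plugin and does not replace it; the function names are hypothetical and the ports are the IMAP and IMAPS ports mentioned above.

```python
import socket
from typing import Dict, Iterable


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def imap_ports_status(host: str, ports: Iterable[int] = (143, 993)) -> Dict[int, bool]:
    """Check the IMAP (143) and IMAPS (993) listeners on one server."""
    return {port: tcp_port_open(host, port) for port in ports}
```

A result such as {143: True, 993: False} would suggest that the Windows service is running but the IMAPS listener is not reachable, which is exactly the case the service check alone would miss.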

The Templates are further divided into two sets: information important for day-to-day monitoring, and information important for troubleshooting, trend analysis and scalability design. Thus, for each set of Exchange data collected, there is a standard set that includes Application service checks, TCP service checks and Performance Counters, and a second, more detailed set that consists primarily of Performance Counters. Following the examples above, the "Exchange 2010 Client Access Server" Template provides important day-to-day triggers and performance information, while a second Template, "Exchange 2010 Client Access Server Performance Counters", provides highly detailed data for troubleshooting, trend analysis, etc.

Description of Monitored Exchange 2010 Services and Counters

Counter Overview

Nagios/Icinga use Check Commands and Service Checks to monitor Windows. They use an NRPE Windows Services check to determine if all automatically started and disabled services are in their proper state. They use NRPE Windows Performance Monitoring Counter checks to monitor various aspects of the Exchange Server Roles. Finally, they use TCP Service Checks to determine if services are available to the network.
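
Whatever the source of the value, each check ends the same way: the value is compared with warning and critical thresholds and reported with one of the standard plugin states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), optionally followed by performance data after a "|" character. The Python sketch below shows that last step only. The function name, the label and the thresholds are hypothetical, and it assumes a counter where higher values are worse.

```python
from typing import Tuple

# Standard Nagios/Icinga plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
_STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_counter(label: str, value: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a counter value to a plugin state and a one-line status message.

    Assumes "higher is worse", as with a queue length or CPU percentage.
    """
    if value >= crit:
        state = CRITICAL
    elif value >= warn:
        state = WARNING
    else:
        state = OK
    # Performance data uses the 'label'=value;warn;crit layout so that
    # graphing add-ons can pick the numbers up.
    perfdata = f"'{label}'={value:g};{warn:g};{crit:g}"
    return state, f"{_STATE_NAMES[state]}: {label} = {value:g} | {perfdata}"
```

For a counter that should stay high instead (free disk space, for example), the comparisons would need to be reversed.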

Baseline Windows Server Service Checks are available for download from the Monitoring Exchange site. These checks were gathered from a variety of Microsoft sources and provide details of basic Processor, Memory, Disk and Network utilization statistics that are fundamental to systems monitoring.

Each Exchange Server Role (Mailbox, Hub Transport and Client Access) has its own set of templates for both standard, day-to-day monitoring and more complete Performance Monitoring for detailed performance, trend and capacity planning analyses. These are available at:
Exchange Common Services
Exchange Mailbox Services
Exchange Hub Transport Services
Exchange Client Access Services

Host Group Assignments

The counters are intended to be associated with Host Groups for deployment. Each Host is then assigned to the appropriate Host Groups, at which point the counters become active. The Host Groups are:

exchange-common-servers
exchange-common-servers-perfmon
exchange-mailbox-servers
exchange-mailbox-servers-perfmon
exchange-hubtransport-servers
exchange-hubtransport-servers-perfmon
exchange-cas-servers
exchange-cas-servers-perfmon

Once the Host Groups are created, applying the checks requires no more than adding the Host to the appropriate groups.
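
As a rough illustration of how little is needed per server, the Python sketch below derives the Host Group names listed above from a server's roles and renders a host definition that joins those groups. The role keys, the function names, the windows-server parent template, the host name and the address are all hypothetical; only the Host Group names come from this article.

```python
from typing import Iterable, List

# Role key (chosen for this sketch) -> Host Group name from the list above.
ROLE_HOST_GROUPS = {
    "common": "exchange-common-servers",
    "mailbox": "exchange-mailbox-servers",
    "hubtransport": "exchange-hubtransport-servers",
    "cas": "exchange-cas-servers",
}


def host_groups_for(roles: Iterable[str], perfmon: bool = False) -> List[str]:
    """Return the Host Groups for the given roles, in order, without repeats."""
    groups: List[str] = []
    for role in roles:
        base = ROLE_HOST_GROUPS[role]
        candidates = (base, base + "-perfmon") if perfmon else (base,)
        for name in candidates:
            if name not in groups:
                groups.append(name)
    return groups


def render_host_definition(host_name: str, address: str, roles: Iterable[str],
                           perfmon: bool = False,
                           template: str = "windows-server") -> str:
    """Render a host object whose hostgroups line activates the checks."""
    groups = ",".join(host_groups_for(roles, perfmon))
    lines = [
        "define host {",
        f"    use         {template}",
        f"    host_name   {host_name}",
        f"    address     {address}",
        f"    hostgroups  {groups}",
        "}",
    ]
    return "\n".join(lines)
```

Printing render_host_definition("exch01", "192.0.2.10", ["common", "cas"], perfmon=True) produces a definition that can be reviewed before it is added to the configuration, whether by hand or through a tool such as NConf.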

Service Group Assignments

Service Groups provide a logical grouping of hosts that each monitor specified Service Checks. They are created by assigning specified Service Checks to the Service Group, after which Nagios/Icinga will include the appropriate Hosts. This will necessarily include overlapping Hosts and Host Groups within defined Service Groups because Exchange services are not necessarily unique to each Role. For instance, monitoring Outlook Web Access (OWA) requires monitoring web-based services. While we associate OWA with Client Access Servers, some of these services are common to all Exchange Roles and (in fact) necessary for end-to-end OWA connectivity throughout the Exchange Enterprise; web-based services must be functioning on Hub Transport and Mailbox Role Servers for OWA to work properly. This overlap is a good thing: OWA will fail if the web services on Mailbox and Hub Transport Servers are not working correctly, and defining the Service Group this way provides a complete view of the required counters.
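
The overlap described above is easier to see with a small model. The Python sketch below is not how Nagios/Icinga computes membership internally; it simply follows the chain from the checks in a Service Group, to the Host Groups those checks are applied to, to the Hosts in those groups. The function name, the check names and the host names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def service_group_hosts(checks: Iterable[str],
                        check_host_groups: Dict[str, List[str]],
                        host_group_members: Dict[str, List[str]]) -> List[str]:
    """Return every Host that ends up in a Service Group, sorted by name."""
    hosts: Set[str] = set()
    for check in checks:
        # A check can be applied to several Host Groups at once.
        for group in check_host_groups.get(check, []):
            hosts.update(host_group_members.get(group, []))
    return sorted(hosts)


# Hypothetical layout: a web service check shared by all roles, plus a
# check that only applies to Client Access Servers.
check_host_groups = {
    "web-service": ["exchange-common-servers"],
    "owa-port-443": ["exchange-cas-servers"],
}
host_group_members = {
    "exchange-common-servers": ["cas01", "hub01", "mbx01"],
    "exchange-cas-servers": ["cas01"],
}
owa_hosts = service_group_hosts(
    ["web-service", "owa-port-443"], check_host_groups, host_group_members
)
```

In this layout owa_hosts contains the Hub Transport and Mailbox servers as well as the Client Access Server, which mirrors the point made above: an OWA Service Group spans more than one Role.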

PNP4Nagios and NagVis Integration

Web-based monitoring using the default applications is helpful; however, additional visualizations provide an easier-to-understand overview and facilitate more rapid troubleshooting.

PNP4Nagios is an add-on that provides performance graphing. While useful for identifying immediate issues (such as overloaded mail queues), it is even more useful for identifying issues before they happen, that is, for trend analysis. For instance, you can check mail queue performance, disk and network activity to identify the specific Hub Transport and Mailbox servers that are approaching excessive mail message capacity. The need for added capacity may thus be identified in advance so that a planned upgrade can be specified and implemented before system performance degrades.
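
The idea behind trend analysis can be sketched in a few lines of Python. The function below fits a straight line through a series of samples with ordinary least squares and estimates how many days remain before the line crosses a capacity threshold. It is a simplified illustration and not what PNP4Nagios does internally; the function name, the sample values and the threshold are hypothetical, and real workloads are rarely this linear.

```python
from typing import Optional, Sequence


def days_until_threshold(days: Sequence[float], values: Sequence[float],
                         threshold: float) -> Optional[float]:
    """Estimate the days left before a linear trend reaches the threshold.

    Returns None when the trend is flat or falling, and 0.0 when the
    fitted line has already reached the threshold at the last sample.
    """
    n = len(days)
    if n < 2 or n != len(values):
        raise ValueError("need at least two samples of equal length")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    # Least-squares slope and intercept of value against time.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values)) / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])
```

With daily samples of 100, 120, 140 and 160 and a threshold of 200, the fitted line gains 20 per day and the estimate is about two days after the last sample. In practice the samples would come from the stored performance data, and the estimate is a prompt to investigate, not a prediction.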

NagVis is another visualization add-on that provides a map-like or diagram-like interface. The selection of information is limited only by that available in Nagios/Icinga, ranging from generalized overviews to highly-detailed (Service Check specific) information. The examples provided show Host and Service Groups status for the Enterprise.

Example

The video below depicts Nagios/Icinga Exchange Enterprise monitoring in operation. Also included are examples of PNP4Nagios and NagVis.

Tuesday, August 12, 2014
