
Tuesday, August 12, 2014

Nagios/Icinga Exchange 2010 Performance Monitoring

This article describes Exchange 2010 Nagios/Icinga Templates that monitor the Exchange Common Services and the Mailbox, Hub Transport and Client Access Server Roles. Although built specifically for Exchange 2010, the Templates are likely compatible with other Exchange versions as well.

For those with Nagios/Icinga and Windows experience, the counters used are available from the Monitoring Exchange page.

Elements of an Exchange Enterprise

Windows Performance Monitor

Windows Server ships with an excellent monitoring and trend analysis tool: Performance Monitor. As illustrated below, it allows administrators to select and graph counters from an extensive list of system metrics. These measurements may also be saved as delimited text files for later analysis and visualization, and a centralized server may connect to other servers to collect data remotely. Because its API is well documented, Performance Monitor data is also used by many third-party systems monitoring tools.

Windows Server Performance Monitor

Selecting Windows Server Performance Monitor Counters

The illustration below depicts the text format of a Windows Performance Monitor counter path. Remote monitoring servers can issue queries in this syntax through any application that uses the Windows API.
Displaying Windows Server Performance Monitor Syntax
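Because the illustration is not reproduced here, it helps to note that a counter path follows the pattern \\Computer\Object(Parent/Instance#Index)\Counter, where the computer, parent, instance and index parts are optional. The following Python sketch splits such a path into its components, which can be handy for validating counter strings before they go into a Template. The helper name `parse_counter_path` is hypothetical, and the regular expression is a simplification that does not handle instance names containing parentheses, slashes or "#".

```python
import re
from typing import NamedTuple, Optional


class CounterPath(NamedTuple):
    machine: Optional[str]   # None for a local path
    obj: str                 # performance object, e.g. "Processor"
    parent: Optional[str]    # parent instance, if any
    instance: Optional[str]  # object instance, e.g. "_Total"
    index: Optional[int]     # duplicate-instance index after "#"
    counter: str             # counter name, e.g. "% Processor Time"


# \\Machine\Object(Parent/Instance#Index)\Counter, machine and instance parts optional.
# Simplification: instance names containing ")", "/" or "#" are not handled.
_PATH_RE = re.compile(
    r"^(?:\\\\(?P<machine>[^\\]+))?"
    r"\\(?P<obj>[^\\(]+)"
    r"(?:\((?:(?P<parent>[^/)]+)/)?(?P<instance>[^#)]+)(?:#(?P<index>\d+))?\))?"
    r"\\(?P<counter>[^\\]+)$"
)


def parse_counter_path(path: str) -> CounterPath:
    """Split a Performance Monitor counter path into its components (hypothetical helper)."""
    m = _PATH_RE.match(path)
    if m is None:
        raise ValueError(f"not a counter path: {path!r}")
    index = m.group("index")
    return CounterPath(
        machine=m.group("machine"),
        obj=m.group("obj"),
        parent=m.group("parent"),
        instance=m.group("instance"),
        index=int(index) if index is not None else None,
        counter=m.group("counter"),
    )
```

For example, `parse_counter_path(r"\\EXCH01\Processor(_Total)\% Processor Time")` yields machine `EXCH01`, object `Processor`, instance `_Total` and counter `% Processor Time`.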

Exchange 2010 Management Console

The Exchange Management Console (EMC) provides another set of tools to monitor and diagnose Exchange performance. Many of its screens are overviews that offer PowerShell-based tools to manage the enterprise.

The EMC Toolbox offers a set of more detailed management and monitoring tools such as Message and Queue Tracking, Log Viewers and preconfigured Performance Monitor and Troubleshooter screens.
These tools are valuable and, for the most part, provide a centralized console for enterprise management. However, Nagios/Icinga also centralizes collection across the enterprise and can display the information in customizable formats. It also automates many of the visualization and presentation tasks in a far more flexible way.


A thorough knowledge of Nagios/Icinga installation and configuration is necessary. The fundamentals are covered in the following articles:
Nagios/Icinga Installation and Initial Configuration
Nagios/Icinga Database Integration
Nagios/Icinga Configuration Files
Nagios / Icinga Configuration with NConf - a Graphical Web / Database Tool

Nagios / Icinga Performance Data Graphing with PNP4Nagios

Information specific to Windows Server enterprise monitoring is presented in these two articles:

Nagios/Icinga Templates for Windows 2008 R2 OS, Domain Controllers, DNS and IIS Servers 

Automated Nagios/Icinga NSClient++ Deployment and Configuration for Windows Enterprises

Overview of Exchange 2010 Roles

Exchange Server 2010 allows the administrator to install the following roles:
  1. Mailbox Role (Mandatory)
  2. Hub Transport Role (Mandatory)
  3. Client Access Server Role (Mandatory)
  4. Unified Messaging Server Role (Optional)
  5. Edge Transport Role (Optional and Unique)
The first three roles must be present in the enterprise. They may be deployed together on a single server, or individually or in combination on multiple servers. The Unified Messaging role is optional. The Edge Transport Role is also optional, but if it is installed, no other roles may be present on that server. A more detailed description of Exchange 2010 Roles is available in the article Exchange 2010 Architecture.

The templates for each Role, Role Service or Feature set monitor the availability of services in two ways: NRPE Windows service checks query the operating system, and the check_tcp or check_udp plugins query the TCP/UDP ports exposed on the network interface. For instance, you may use NRPE to query the "Microsoft Exchange IMAP Service" and the check_tcp plugin to check the associated TCP ports 143 (IMAP) and 993 (IMAPS) to determine whether the IMAP Mailbox Access service is available.
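The TCP side of this two-part check can be sketched in Python. This is a rough stand-in for the check_tcp plugin, not a replacement for it: it only tests whether each port accepts a connection and reports the result with the standard Nagios exit codes. The function names and the default IMAP/IMAPS ports are illustrative assumptions, and the NRPE service query is not shown.

```python
import socket

# Standard Nagios/Icinga plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp_port(host: str, port: int, timeout: float = 5.0) -> tuple[int, str]:
    """Return (exit_code, message) depending on whether host:port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepting connections"
    except OSError as exc:  # refused, unreachable, timed out, name resolution failure
        return CRITICAL, f"TCP CRITICAL - {host}:{port} ({exc})"


def check_service_ports(host: str, ports=(143, 993), timeout: float = 5.0) -> tuple[int, str]:
    """Check every port of one service (IMAP and IMAPS by default); the worst result wins."""
    results = [check_tcp_port(host, port, timeout) for port in ports]
    worst = max(code for code, _ in results)
    return worst, "; ".join(message for _, message in results)
```

A plugin wrapper would print the message and exit with the returned code, e.g. after `code, message = check_service_ports("exch01.example.com")`, where the host name is hypothetical.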

Templates are further divided into two tiers: information important for day-to-day monitoring, and information important for troubleshooting, trend analysis and scalability design. The second tier consists primarily of Performance Counters. Thus, each set of Exchange data has a standard template (Application service checks, TCP service checks and key Performance Counters) and a second, more detailed template consisting primarily of Performance Counters. For example, the "Exchange 2010 Client Access Server" Template provides important day-to-day triggers and performance information, while a second "Exchange 2010 Client Access Server Performance Counters" Template provides highly detailed data for troubleshooting and trend analysis.

Description of Monitored Exchange 2010 Services and Counters

Counter Overview

Nagios/Icinga use Check Commands and Service Checks to monitor Windows. An NRPE Windows Services check determines whether all automatically started services are running and all disabled services are stopped. NRPE Windows Performance Monitoring Counter checks monitor various aspects of the Exchange Server Roles. Finally, TCP Service Checks determine whether services are available to the network.
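Each counter check ultimately reports back the way any Nagios/Icinga plugin does: an exit code (0 OK, 1 WARNING, 2 CRITICAL) and a status line, optionally followed by "|" and performance data that graphing add-ons can store. A minimal Python sketch of that evaluation step, assuming simple upper thresholds that alert when the value exceeds them; the function name is hypothetical, and the counter name and 50/100 thresholds in the example are illustrative only.

```python
OK, WARNING, CRITICAL = 0, 1, 2
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def evaluate_counter(label: str, value: float, warn: float, crit: float) -> tuple[int, str]:
    """Compare a counter against upper thresholds and build a plugin status line.

    Text after "|" is performance data in the form 'label'=value;warn;crit.
    """
    if value > crit:
        code = CRITICAL
    elif value > warn:
        code = WARNING
    else:
        code = OK
    perfdata = f"'{label}'={value};{warn};{crit}"
    return code, f"{STATE_NAMES[code]} - {label} is {value} | {perfdata}"
```

For instance, `evaluate_counter("RPC Averaged Latency", 55, 50, 100)` returns code 1 and the line `WARNING - RPC Averaged Latency is 55 | 'RPC Averaged Latency'=55;50;100`.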

Baseline Windows Server Service Checks are available for download from the Monitoring Exchange site. These checks were gathered from a variety of Microsoft sources and provide details of basic Processor, Memory, Disk and Network utilization statistics that are fundamental to systems monitoring. 

Each Exchange Server Role (Mailbox, Hub Transport and Client Access) has its own set of templates: one for standard, day-to-day monitoring and a more complete Performance Monitoring set for detailed performance, trend and capacity planning analyses. These are available at:
Exchange Common Services 
Exchange Mailbox Services 
Exchange Hub Transport Services 
Exchange Client Access Services

Host Group Assignments

The counters are intended to be associated with Host Groups for deployment.  Each Host is then assigned to the appropriate Host Group, at which point the counters become active.  These include:

  1. exchange-common-servers
  2. exchange-common-servers-perfmon
  3. exchange-mailbox-servers
  4. exchange-mailbox-servers-perfmon
  5. exchange-hubtransport-servers
  6. exchange-hubtransport-servers-perfmon
  7. exchange-cas-servers
  8. exchange-cas-servers-perfmon
Once the Host Groups are created, applying the checks to a server requires no more than adding the host to the appropriate groups.
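With eight Host Groups, generating their definitions from a simple role inventory avoids maintaining member lists by hand. This Python sketch assumes every Exchange server belongs to the common groups and that each installed Role adds its standard and -perfmon group; the inventory format and function names are hypothetical, and the output is ordinary Nagios/Icinga `define hostgroup` objects.

```python
from collections import defaultdict

ROLES = ("mailbox", "hubtransport", "cas")


def groups_for_host(roles, perfmon: bool = True) -> list[str]:
    """Every Exchange host joins the common groups; each installed Role adds its own pair."""
    unknown = set(roles) - set(ROLES)
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    bases = ["exchange-common-servers"] + [f"exchange-{role}-servers" for role in roles]
    groups = []
    for base in bases:
        groups.append(base)
        if perfmon:
            groups.append(f"{base}-perfmon")
    return groups


def render_hostgroups(hosts: dict) -> str:
    """hosts maps host_name -> roles (hypothetical inventory); returns hostgroup objects."""
    members = defaultdict(list)
    for host in sorted(hosts):
        for group in groups_for_host(hosts[host]):
            members[group].append(host)
    blocks = []
    for group in sorted(members):
        blocks.append(
            "define hostgroup {\n"
            f"    hostgroup_name  {group}\n"
            f"    alias           {group}\n"
            f"    members         {','.join(members[group])}\n"
            "}"
        )
    return "\n\n".join(blocks)
```

Adding a new server then means adding one line to the inventory and regenerating the file.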

Service Group Assignments 

Service Groups provide a logical grouping of Service Checks across hosts. They are created by assigning specified Service Checks to the Service Group, after which Nagios/Icinga includes the appropriate Hosts. Service Groups will therefore contain overlapping Hosts and Host Groups, because Exchange services are not necessarily unique to each Role. For instance, monitoring Outlook Web Access (OWA) requires monitoring web-based services. While OWA is associated with Client Access Servers, some of these services are common to all Exchange Roles and are in fact necessary for end-to-end OWA connectivity throughout the Exchange Enterprise: web-based services must be functioning on Hub Transport and Mailbox Role Servers for OWA to work properly. This overlap is desirable. Because OWA will fail if the web services on Mailbox and Hub Transport Servers are not working correctly, defining the Service Group this way provides a complete view of the required checks.
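The overlap can be made concrete with a small Python sketch: hosts inherit Service Checks from their Host Groups, and a Service Group collects every matching host/service pair regardless of Role. The check names, group mapping, function names and Service Group name used here are hypothetical; the rendered object uses the `members host,service,host,service` form of a Nagios `define servicegroup`.

```python
# Hypothetical Service Check names attached to each Host Group (illustration only)
HOSTGROUP_SERVICES = {
    "exchange-common-servers": {"IIS W3SVC", "TCP 443"},
    "exchange-cas-servers": {"OWA Login Page"},
    "exchange-mailbox-servers": {"Mailbox Store"},
}


def services_per_host(host_groups: dict) -> dict:
    """host_groups maps host -> Host Groups; a host inherits every check of its groups."""
    return {
        host: set().union(*(HOSTGROUP_SERVICES.get(g, set()) for g in groups))
        for host, groups in host_groups.items()
    }


def servicegroup_members(wanted, host_services: dict) -> list[tuple[str, str]]:
    """Every (host, service) pair whose check belongs to the Service Group, on any Role."""
    wanted = set(wanted)
    return [
        (host, service)
        for host in sorted(host_services)
        for service in sorted(host_services[host])
        if service in wanted
    ]


def render_servicegroup(name: str, pairs) -> str:
    flat = ",".join(f"{host},{service}" for host, service in pairs)
    return (
        "define servicegroup {\n"
        f"    servicegroup_name  {name}\n"
        f"    alias              {name}\n"
        f"    members            {flat}\n"
        "}"
    )
```

With a Client Access host and a Mailbox host, an OWA Service Group built from the web checks picks up the Mailbox server's web services as well, which is exactly the end-to-end view described above.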


PNP4Nagios and NagVis Integration

Web-based monitoring using the default applications is helpful; however, additional visualizations provide an easier-to-understand overview and facilitate more rapid troubleshooting.

PNP4Nagios is an add-on that provides performance graphing. It is useful for identifying immediate issues (such as overloaded mail queues), and even more so for identifying issues before they happen through trend analysis. For instance, by checking mail queue performance and disk and network activity, you can identify the specific Hub Transport and Mailbox servers that are approaching their mail message capacity. The need for added capacity can thus be identified in advance, so that a planned upgrade can be specified and implemented before system performance degrades.
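Trend analysis of this kind boils down to fitting a line through historical samples and projecting when it crosses a capacity limit. The Python sketch below uses an ordinary least-squares fit on (day, value) pairs; it assumes the samples (for example, daily peak queue lengths) have already been exported from PNP4Nagios, and the function name is hypothetical. A straight line is a deliberate simplification; bursty or seasonal mail traffic may call for a more careful model.

```python
def days_until_threshold(samples, threshold: float):
    """Least-squares linear trend over (day, value) samples.

    Returns the estimated days after the last sample until the trend line reaches
    `threshold`, 0.0 if it already has, or None if the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, crossing_day - last_day)
```

For example, samples of 100, 110, 120 and 130 on days 0 to 3 project a limit of 200 being reached 7 days after the last sample.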

NagVis is another visualization add-on that provides a map-like or diagram-like interface. The selection of information is limited only by what is available in Nagios/Icinga, ranging from general overviews to highly detailed (Service Check specific) information. The examples provided show Host and Service Group status for the Enterprise.



The video below shows Nagios/Icinga Exchange Enterprise monitoring in operation, including examples of PNP4Nagios and NagVis.

Monday, August 4, 2014

Zabbix in Multilingual Windows and Exchange Enterprises

Windows Performance counters are a valuable monitoring tool, but they require modifications in a Zabbix-monitored multilingual environment.  This article explains how to support English language-based Zabbix queries by modifying the language support in Windows Performance Monitor.

Issues with Zabbix Templates in Windows Enterprises

Many Zabbix queries are language-independent. However, Windows Performance counter names are language-specific. Because the Zabbix Server passes the query through the Windows Zabbix Agent to the operating system as written, English-language queries fail on servers running other languages.

There are two options:
  1. Maintain Zabbix Templates for each supported locale (language) or
  2. Modify the Windows Enterprise to support a single locale in the Performance Monitor language

Zabbix expects responses in a UTF-8 format.  This supports a wide range of languages, but presents some difficulties at the application layer.  IBM provides a brief description of UTF-8 transformations.  From the author's perspective and experience with mixed Linux and Windows environments, modifying the underlying Windows Performance Monitoring locale is preferred as it may be more easily automated.

Several places in the Zabbix documentation suggest replacing the English counter names in queries with the corresponding numeric indexes found in the Registry. THIS WILL ONLY WORK FOR BASIC PERFORMANCE MONITORING COUNTERS. It will NOT work for enterprise queries. The indexes of Performance Monitoring counters added after OS installation are machine-specific and WILL ASSUREDLY vary throughout the enterprise, so this is not an option.
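To see why numeric indexes do not scale across an enterprise, consider two servers whose index tables differ only for counters registered after OS installation. In this Python sketch all index values are invented for illustration and the helper name is hypothetical; it simply builds the numeric path from each server's name-to-index map.

```python
# Invented index values for two servers (illustration only). Base OS counters share
# the same indexes, while counters registered later (for example by Exchange) get
# whatever index is free on that particular machine.
SERVER_A = {"Processor": 238, "% Processor Time": 6,
            "MSExchangeIS": 10300, "RPC Averaged Latency": 10318}
SERVER_B = {"Processor": 238, "% Processor Time": 6,
            "MSExchangeIS": 11522, "RPC Averaged Latency": 11540}


def numeric_counter_path(obj: str, counter: str, name_to_index: dict, instance=None) -> str:
    """Build the numeric-index form of a counter path from one server's name->index map."""
    inst = f"({instance})" if instance is not None else ""
    return f"\\{name_to_index[obj]}{inst}\\{name_to_index[counter]}"
```

The processor path comes out identical on both servers, but the Exchange path differs, so a single numeric Template item cannot serve both.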

Windows, too, supports many locales and character sets.  Although there is a paucity of documentation of the underlying operating system and locales implementation, there is enough to develop an automated solution.

Windows Performance Monitoring Locales

The methods by which Windows implements locale support are poorly documented.  The Zabbix Forums, in a thread begun in 2007 and last updated in 2010, provides insight into locale support of Performance Monitoring.  The thread suggests several options:
  1. Use PDH APIs to address the registry and select the supported language.
  2. Modify the Registry (manual or automated) to change the Performance Monitor Language.
  3. Copy the c:\windows\system32\perf*###.dat files to overwrite the current locale language.  This is not perfect and may require additional Zabbix Agent and VBS-scripted modifications.
The thread is old and has been overtaken by more recent events. At the time of writing, Option 3 is the only one that works, and, contrary to the thread's caveat, it does not require extensive Agent and VBS-scripted modifications.

Windows, with its long record of security problems, changed with the release of Windows 7 and Windows Server 2008 R2. Previously, applications vulnerable to privilege elevation could overwrite the Registry. There is no published discussion of the change, but rather than fixing the root cause (application privilege escalation vulnerabilities), Microsoft implemented a solution that treats the symptom (the ability to overwrite the Registry). This "fix" is Windows Resource Protection, which prevents any user account, including Administrator accounts and groups, from modifying selected Registry keys. Microsoft provides a brief description of Registry Key Protection, and its answer to developers is, essentially, change your application or go away. Modifying the protected Registry keys used by Windows Performance Monitoring is therefore not an option.

That leaves Option 3: modifying files in the c:\windows\system32 directory. It is an inelegant solution and may in the future be blocked by Microsoft as a security fix, but it is workable.

Modifying Language Support in Windows Server 2008 R2 and Above

Windows must have English language support installed.  This may be checked in the Control Panel or by listing with 
dir c:\windows\system32\perf*.dat
English language support is installed if it returns perf*009.dat listings.  If not installed, add it through the Control Panel, but leave the locale unchanged.
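The same check can be scripted, for example as part of a deployment script. This Python sketch assumes the file naming seen in the dir listing (perf, a letter c, d, h or i, a three-digit hexadecimal language ID, then .dat) and groups the files it finds by language; the function names are hypothetical.

```python
import re
from pathlib import Path

# perfc009.dat, perfh007.dat, ...: letter c/d/h/i plus a 3-digit hex language ID
PERF_DAT = re.compile(r"^perf([cdhi])([0-9a-f]{3})\.dat$", re.IGNORECASE)


def perf_languages(filenames) -> dict[str, set[str]]:
    """Map each language ID found (e.g. '009') to the perf file letters present for it."""
    langs: dict[str, set[str]] = {}
    for name in filenames:
        m = PERF_DAT.match(name)
        if m:
            langs.setdefault(m.group(2).upper(), set()).add(m.group(1).lower())
    return langs


def installed_perf_languages(system32: str = r"C:\Windows\System32") -> dict[str, set[str]]:
    """List the perf*.dat files in System32, as the dir command does, grouped by language."""
    return perf_languages(p.name for p in Path(system32).glob("perf*.dat"))
```

On a server, `"009" in installed_perf_languages()` is True when English support is installed.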

Prepare a script to overwrite the existing language perf*###.dat files with the corresponding perf*009.dat files, where * is c, d, h and i. For instance:
if exist c:\windows\system32\perfc007.dat (
  copy /Y c:\windows\system32\perfc009.dat c:\windows\system32\perfc007.dat
  copy /Y c:\windows\system32\perfd009.dat c:\windows\system32\perfd007.dat
  copy /Y c:\windows\system32\perfh009.dat c:\windows\system32\perfh007.dat
  copy /Y c:\windows\system32\perfi009.dat c:\windows\system32\perfi007.dat
)
will overwrite the German (007) .dat files with the English (009) ones. A complete script may incorporate any or all of the supported Windows languages, which you can identify as described above. The scripts can be deployed in several ways, including Group Policy Shutdown Scripts or running the script manually on each server.
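A complete script covering several languages can be generated rather than written by hand. This Python sketch emits the same copy pattern for each language ID passed in. The mapping of 007, 00C, 016 and 019 to German, French, Portuguese and Russian follows the Windows primary language IDs; the function name and the `rem` labels are our own additions.

```python
LETTERS = "cdhi"
# Primary language IDs used in the perf?###.dat names (hex); 009 is English
LANGUAGE_NAMES = {"007": "German", "00C": "French", "016": "Portuguese", "019": "Russian"}


def overwrite_script(lang_ids, system32: str = r"c:\windows\system32") -> str:
    """Generate a batch script copying the English perf?009.dat files over each language's files."""
    lines = ["@echo off"]
    for lang_id in (lid.upper() for lid in lang_ids):
        if lang_id == "009":
            continue  # English is the source; never overwrite it
        lines.append(f"rem {LANGUAGE_NAMES.get(lang_id, 'language ' + lang_id)}")
        lines.append(f"if exist {system32}\\perfc{lang_id}.dat (")
        for letter in LETTERS:
            lines.append(
                f"  copy /Y {system32}\\perf{letter}009.dat {system32}\\perf{letter}{lang_id}.dat"
            )
        lines.append(")")
    return "\r\n".join(lines) + "\r\n"
```

Writing the result to a .cmd file gives a script that can be distributed through the deployment options mentioned above.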

Once the Performance Monitor .dat files are changed to English, Windows, as it loads the Registry, writes the values into the CurrentLanguage key under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Perflib. The Performance Monitoring application (and its OS interface) obtains values from this Registry key and not from the .dat files. You may check the CurrentLanguage key to make sure the language has been changed to English.
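This check can also be automated. The Counter value under the Perflib language keys is a multi-string list that alternates counter indexes and names, so a quick test is whether common English object names appear in it. In this Python sketch, the required names, the sample lists in the example and the function names are illustrative assumptions; the registry reader assumes CurrentLanguage can be read like the numbered language keys and runs only on Windows.

```python
def counter_names(multi_sz: list[str]) -> dict[str, str]:
    """Perflib 'Counter' values alternate index and name: ['1', '1847', '2', 'System', ...]."""
    return {multi_sz[i]: multi_sz[i + 1] for i in range(0, len(multi_sz) - 1, 2)}


def missing_english_objects(multi_sz, required=("System", "Memory", "Processor")) -> list[str]:
    """Names from `required` that do not appear; an empty list suggests English text is loaded."""
    names = set(counter_names(multi_sz).values())
    return [name for name in required if name not in names]


def read_current_language_counters() -> list[str]:
    """Windows only. Assumes CurrentLanguage exposes a REG_MULTI_SZ 'Counter' value."""
    import winreg  # standard library module, available on Windows only

    path = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Perflib\CurrentLanguage"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_type = winreg.QueryValueEx(key, "Counter")
    return value
```

On a converted server, `missing_english_objects(read_current_language_counters())` should return an empty list.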

The Zabbix Forums thread referenced above includes a comment that some (unspecified) counters do not work, and testing confirmed this. Newer processor counters use the "Processor Information" object, and queries against it failed in all cases on non-English servers. The older "Processor" object worked in most cases, and the Templates will be updated to use it. Queries for specific Instances may also fail if the process name differs by language; this occurred for a limited number of the Exchange counters. The fix appears to be changing the Item to use the language-specific process name. This will not be incorporated into the Templates, and any such fix is left to the systems administrator deploying them. Overall, only 1% to 2% of Items fail.
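Switching Template items from "Processor Information" to "Processor" can be done mechanically for the _Total instance. This Python sketch rewrites such Zabbix `perf_counter` item keys and flags any remaining "Processor Information" instances for manual review, on the assumption that per-core instances (such as 0,0) are named differently from "Processor" instances; the function names are hypothetical.

```python
import re

# "\Processor Information(_Total)\" inside a counter path or Zabbix item key
_PROC_INFO_TOTAL = re.compile(r"\\Processor Information\(_Total\)\\")


def use_legacy_processor_object(item_key: str) -> str:
    """Rewrite _Total queries from the newer object to the older 'Processor' object."""
    return _PROC_INFO_TOTAL.sub(r"\\Processor(_Total)\\", item_key)


def needs_manual_review(item_key: str) -> bool:
    """Flag keys still using 'Processor Information' after the _Total rewrite,
    since per-core instance names do not map one-to-one onto 'Processor'."""
    return "\\Processor Information(" in use_legacy_processor_object(item_key)
```

Keys that do not reference "Processor Information" pass through unchanged, so the function can be applied to every item in an exported Template.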

The video below depicts an English language Zabbix Server successfully monitoring French, German, Brazilian Portuguese and Russian Windows Servers using the OS, Domain Controller, DNS Server and IIS Templates.

The video below depicts an English language Zabbix Server successfully monitoring French and German Exchange Servers with the Mailbox, Hub Transport and Client Access Roles installed on each.