Zabbix Template Formats
Applications
Applications are logical groupings of Items. These should reflect physical and logical aspects of the monitored device. For Exchange Server, this includes Services (both Operating System and TCP) and Performance Counters. The underlying hardware is monitored at the Operating System level, as described in the article Zabbix Templates for Windows 2008 R2 OS and Domain Controllers.
Items
Items are individual, measurable counters that may be retrieved by querying the OS or the Zabbix Agent. The administrator may define time-weighted average sampling to smooth "peaks" and "valleys" in the calculated values. A sampling rate (frequency) and a sampling period (time) together define a time-weighted average. For instance, if the sampling period is 300 seconds and the rate is one sample per 30 seconds, each stored Item value is the mean of ten samples.
Triggers
Triggers define the counter values that represent degraded conditions. Zabbix provides the following classifications of Trigger values:
- Not Classified
- Information
- Warning
- Average
- High
- Disaster
Each Trigger also has administratively assigned fields for Description and URL. These are very useful: the administrator may enter the URL of a reliable source for the Trigger's definition and thresholds (e.g. Microsoft Technet Performance Monitor references) and copy a brief description from that source into the Trigger.
Graphs
Graphs are the primary visualization for collected data and the feature at which Zabbix excels. Multiple, related items may be assigned to a single graph (e.g. Processor Idle % and Processor Utilization %) for easy comparison. Graphs should be grouped according to the types of data present (e.g. rates in counters/sec for one graph, percentages in a second, total counters in a third, etc.).
Graphs, unlike those generated by many RRDTool implementations, are generated just-in-time, which saves resources. Without a CGI implementation, RRDTool typically generates all graphs at once using cron jobs. For examples, see this article on Munin and this article on Icinga/Nagios graphing. Zabbix only generates a graph when the administrator requests it, thus saving much of the periodic processing time that a cron job implementation requires.
Screens
Screens are logical groupings of graphs and may be designed to present a quick overview of a partial or entire subsystem (e.g. several graphs of processor counters that present the entire set of collected processor data).
The Importance of Defining Appropriate Sets of Zabbix Template Information for Exchange Enterprises
Exchange Server Roles
Zabbix will not allow you to assign Templates to a host if the Templates contain duplicates. Thus, if Template A defines an item "service_state[MSExchangeADTopology]" and Template B defines the same item, Zabbix will only allow you to assign one of the two Templates. It is critical to design Templates in advance so that Item, Trigger, Graph and Screen names are not duplicated. For an Exchange Enterprise, the author has selected a framework based upon the overall architecture of the Application, its Performance Counters and its Services. Yet even then it is more complex than just those three general groups.
Exchange Server 2010 allows the administrator to install different roles:
- Mailbox Role (Mandatory)
- Hub Transport Role (Mandatory)
- Client Access Server Role (Mandatory)
- Unified Messaging Server Role (Optional)
- Edge Transport Role (Optional and Unique)
The templates for each Role/Role Service/Feature set monitor the availability of services using both the Zabbix "service_state" check to query the OS and "net.tcp.listen" (or a variant thereof) to query the TCP service(s) in question. For instance, you may check the Service state of "Microsoft Exchange IMAP Service" and the associated TCP services on ports 143 (IMAP) and 993 (IMAPS) to determine whether the IMAP Mailbox Access service is available. Service state queries may have multiple triggers that depend on the state returned by Windows (Up, Down, Restarting, etc.), while TCP service checks need only a single trigger for Up or Down. Templates also collect fundamental Performance Counters.
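The combination of a service state check and TCP listen checks can be sketched as follows. This is only an illustration of the logic, not the author's Trigger definitions: the function name and its inputs are hypothetical, and the numeric codes are the ones the Zabbix agent documents for "service_state" (0 = running, 6 = stopped, 255 = no such service), which should be verified against the agent version in use. The item keys mentioned in the comments, including the short service name MSExchangeIMAP4, are assumptions.

```python
# Hypothetical helper: decide whether IMAP access is available from
# one service_state value and the net.tcp.listen results for its ports.
# Assumed item keys: service_state[MSExchangeIMAP4], net.tcp.listen[143],
# net.tcp.listen[993].

# State codes as documented for the Zabbix agent's service_state item.
SERVICE_STATES = {
    0: "running", 1: "paused", 2: "start pending", 3: "pause pending",
    4: "continue pending", 5: "stop pending", 6: "stopped", 7: "unknown",
    255: "no such service",
}


def imap_availability(service_state, listening):
    """Return (available, problems).

    service_state -- integer returned by the service state check
    listening     -- dict mapping TCP port to 1 (listening) or 0 (not)
    """
    problems = []
    if service_state != 0:
        name = SERVICE_STATES.get(service_state, "unrecognized")
        problems.append(f"service is {name}")
    for port, is_listening in sorted(listening.items()):
        if is_listening != 1:
            problems.append(f"TCP port {port} is not listening")
    return (not problems, problems)


if __name__ == "__main__":
    print(imap_availability(0, {143: 1, 993: 1}))
    print(imap_availability(6, {143: 0, 993: 1}))
```

The service state maps to several distinct problem descriptions, while each TCP port contributes a single up-or-down result, mirroring the trigger layout described above.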
There is still another distinction between Templates: information important for day-to-day monitoring versus information important for troubleshooting, trend analysis and scalability design. The second set consists primarily of Performance Counters. Thus, for each set of Exchange data collected, there is a standard set that includes Application service checks, TCP service checks and Performance Counters, and a second, more detailed set that consists primarily of Performance Counters. Following the examples above, the "Exchange 2010 Client Access Server" Template would provide important day-to-day triggers and performance information, while a second Template, "Exchange 2010 Client Access Server Performance Counters", provides highly detailed data for troubleshooting, trend analysis, etc.
Implementation
Avoiding duplicate definitions that may lead to the inability to assign multiple Templates to a Zabbix host requires planning and a thorough understanding of Exchange Server Architecture and Topology. A thorough understanding of the critical core Items versus specialized troubleshooting ones is also very important to provide both day-to-day functionality and less commonly used troubleshooting, trend analysis, and scalability design information.
Using the above guidelines, we may define an (incomplete) example set of Templates to be implemented for Exchange Enterprises:
- Exchange Common Servers
- Exchange Common Servers Performance Counters
- Exchange Mailbox Servers
- Exchange Mailbox Servers Performance Counters
- Exchange Hub Transport Servers
- Exchange Hub Transport Servers Performance Counters
- Exchange Client Access Servers
- Exchange Client Access Servers Performance Counters
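Before building a set of Templates like this one, it can help to check the planned Item keys for overlaps, since duplicates prevent the Templates from being assigned together. The following is a minimal sketch of such a check, assuming the planned keys are kept in a plain dictionary. The function name and the sample key assignments are hypothetical and do not describe the author's actual Templates.

```python
from collections import defaultdict


def find_duplicate_keys(templates):
    """Return {item_key: [template names]} for keys defined more than once.

    templates -- dict mapping a Template name to an iterable of item keys
    """
    owners = defaultdict(set)
    for name, keys in templates.items():
        for key in keys:
            owners[key].add(name)
    return {
        key: sorted(names)
        for key, names in owners.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    # Illustrative plan only; the key-to-Template assignment is invented.
    planned = {
        "Exchange Common Servers": [
            "service_state[MSExchangeADTopology]",
            "service_state[MSExchangeServiceHost]",
        ],
        "Exchange Mailbox Servers": [
            "service_state[MSExchangeIS]",
            "service_state[MSExchangeADTopology]",  # deliberate duplicate
        ],
    }
    for key, names in find_duplicate_keys(planned).items():
        print(f"{key} is defined in: {', '.join(names)}")
```

An empty result means no two planned Templates define the same key, so they should not conflict on Item keys when assigned to the same host.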
Each Template contains (where available and appropriate):
- Item and Trigger Descriptions
- Time-weighted Item collection (generally a 30-second collection interval averaged over a 300-second collection period)
- Trigger URL references to the source reference page for the definition
- Graphs of individual and sets of items
- Screens of Application Groups that collect various screens into logical and complete visualization sets
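The time-weighted collection mentioned in the list above (a 30-second interval averaged over a 300-second period) amounts to taking the mean of ten evenly spaced samples. A minimal sketch of that arithmetic follows. The function name and parameters are hypothetical, and the code only illustrates the calculation, not how Zabbix implements it internally.

```python
def window_means(samples, interval=30, period=300):
    """Average consecutive samples into one value per collection period.

    samples  -- raw values collected every `interval` seconds
    interval -- seconds between samples
    period   -- seconds covered by each averaged value
    Incomplete trailing windows are dropped.
    """
    if interval <= 0 or period < interval or period % interval != 0:
        raise ValueError("period must be a positive multiple of interval")
    per_window = period // interval  # 300 // 30 = 10 samples per value
    means = []
    for start in range(0, len(samples) - per_window + 1, per_window):
        window = samples[start:start + per_window]
        means.append(sum(window) / per_window)
    return means


if __name__ == "__main__":
    # Nine quiet samples and one spike: the spike is smoothed into the mean.
    print(window_means([10] * 9 + [110]))
```

With evenly spaced samples the time-weighted average reduces to the arithmetic mean, which is why a single spike of 110 among nine readings of 10 is stored as 20.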
Discovery
Discovery uses the same methodology as described in the article Automated Zabbix Deployment and Configuration for Windows Enterprises. The Zabbix configuration file deployed to each node through Active Directory Group Policy Objects uses Windows commands to check for specific services, each unique to a Role, during discovery. In the case of Exchange, the following service check statements are added to the file:
Common to All Exchange Servers
UserParameter=services.MSExchangeServiceHost,net start MSExchangeServiceHost
Mailbox Servers
UserParameter=services.MSExchangeIS,net start MSExchangeIS
Hub Transport Servers
UserParameter=services.MSExchangeTransport,net start MSExchangeTransport
Client Access Servers
UserParameter=services.MSExchangeFBA,net start MSExchangeFBA
The Zabbix Discovery Process runs those queries only during discovery and adds the hosts to the proper groups and templates using Actions.
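The mapping that the discovery Actions perform can be pictured as a simple lookup from a successful service check to a Template. The sketch below assumes each UserParameter key above corresponds to the like-named Template from the example set. The function name is hypothetical, and in Zabbix itself this logic is configured through Actions rather than written as code.

```python
# Assumed mapping from discovery check key to the Template it selects.
ROLE_TEMPLATES = {
    "services.MSExchangeServiceHost": "Exchange Common Servers",
    "services.MSExchangeIS": "Exchange Mailbox Servers",
    "services.MSExchangeTransport": "Exchange Hub Transport Servers",
    "services.MSExchangeFBA": "Exchange Client Access Servers",
}


def templates_for(found_keys):
    """Return the Templates selected by the checks that succeeded.

    found_keys -- collection of UserParameter keys whose service was found
    """
    return [
        template
        for key, template in ROLE_TEMPLATES.items()
        if key in found_keys
    ]


if __name__ == "__main__":
    # A host running the service host and information store services.
    found = {"services.MSExchangeServiceHost", "services.MSExchangeIS"}
    print(templates_for(found))
```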