Zabbix Template Formats
Applications
Applications are logical groupings of Items. These should reflect physical and logical aspects of the device monitored. For Windows Server, this includes hardware (e.g. Disk, Processor, Network and Memory) and software (e.g. Services and Processes).
Items
Items are individual, measurable counters that may be retrieved by querying the OS or the Zabbix Agent. The administrator may define time-weighted average sampling to smooth "peaks" and "valleys" in the collected values. A sampling rate (frequency) and a sampling period (time) define a time-weighted average. For instance, if the sampling period is 300 seconds and the rate is one sample per 30 seconds, each stored Item value is the mean of ten samples (300 / 30 = 10).
Triggers
Triggers are conditions on counter values that represent degraded states. Zabbix provides the following severity classifications for Triggers:
- Not Classified
- Information
- Warning
- Average
- High
- Disaster
Each Trigger also has administratively assigned Description and URL fields. These are very useful: the administrator may assign the URL of a reliable source that documents the Trigger (e.g. the Microsoft TechNet Performance Monitor references for descriptions and thresholds) together with a brief description copied from that source.
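As a small illustration of the classifications and fields above, the following Python sketch models a Trigger's severity, Description and URL and reports Triggers that still lack documentation. The numeric values follow the order of the list above; the `TriggerDoc` class and `missing_documentation` helper are hypothetical names used for illustration only and are not part of Zabbix.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Trigger severities in the order listed above, least severe first."""
    NOT_CLASSIFIED = 0
    INFORMATION = 1
    WARNING = 2
    AVERAGE = 3
    HIGH = 4
    DISASTER = 5


@dataclass
class TriggerDoc:
    """Hypothetical record of one Trigger and its documentation fields."""
    name: str
    severity: Severity
    description: str = ""
    url: str = ""


def missing_documentation(triggers):
    """Return the names of Triggers that lack a Description or a URL."""
    return [t.name for t in triggers if not t.description or not t.url]


# Example data (invented for illustration).
triggers = [
    TriggerDoc("Processor time high", Severity.HIGH,
               "Sustained high processor utilization.", "http://example.com/cpu"),
    TriggerDoc("Disk queue length high", Severity.WARNING),
]
undocumented = missing_documentation(triggers)
```

Running a check like this over a Template before publishing it is one way to keep every Trigger tied to its reference source.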
Graphs
Graphs are the primary visualization for collected data and the feature at which Zabbix excels. Multiple, related items may be assigned to a single graph (e.g. Processor Idle % and Processor Utilization %) for easy comparison. Graphs should be grouped according to the types of data present (e.g. rates in counters/sec for one graph, percentages in a second, total counters in a third, etc.).
Unlike the graphs produced by many RRDTool implementations, Zabbix graphs are generated just-in-time, which saves resources. Without a CGI implementation, RRDTool typically generates all graphs on a schedule using cron jobs; for examples, see this article on Munin and this article on Icinga/Nagios graphing. Zabbix generates a graph only when the administrator requests it, thus saving much of the periodic processing time a cron job implementation requires.
Screens
Screens are logical groupings of graphs and may be designed to present a quick overview of a partial or entire subsystem (e.g. several graphs of processor counters that present the entire set of collected processor data).
The Importance of Defining Appropriate Sets of Zabbix Template Information for Windows Enterprises
Windows 2008 R2 Operating System Roles, Role Services and Features
Zabbix will not allow you to assign Templates to a host if the Templates contain duplicate definitions. Thus, if Template A defines an item "perf_counter["\Processor Information(_Total)\% Processor Time",300]" and Template B defines the same item, Zabbix will only allow you to assign one of the Templates. It is critical to design Templates in advance so that Item, Trigger, Graph and Screen names are not duplicated. For a Windows Enterprise, the author has selected a framework based upon the overall architecture of the Operating System, its Services and Server Applications. Yet even then it is more complex than just those three general groups.
A default Windows Server installation only installs the software necessary to operate as a server. Windows then provides "Roles" and "Features" to add specific functionality. For instance, one Server may be assigned the "Active Directory Domain Services" and "DNS Server" roles, which include one set of functionality, while another may be assigned the "Web Server (IIS)" role, which includes a different set. Role Services and Features may also be installed, further adding to the complexity of defining sets of monitored information. Following from these definitions, the fundamental hardware and OS data may be collected by a Template "Windows Server 2008 R2." Additional Templates then define the "Active Directory Domain Services," "DNS Server" and "Web Server (IIS)" Roles and Features.
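The duplicate-item constraint described above can be checked on paper before any Templates are built. The sketch below is a minimal, offline illustration: it assumes each planned Template is represented simply as a list of item key strings, and the `find_conflicts` helper and the sample data are hypothetical. It does not talk to Zabbix.

```python
from itertools import combinations


def find_conflicts(templates):
    """Map each pair of Template names to the item keys they share.

    `templates` is a dict of Template name -> list of item key strings.
    Pairs with no shared keys are omitted from the result.
    """
    conflicts = {}
    for (name_a, keys_a), (name_b, keys_b) in combinations(templates.items(), 2):
        shared = set(keys_a) & set(keys_b)
        if shared:
            conflicts[(name_a, name_b)] = sorted(shared)
    return conflicts


# Example plan (invented for illustration). Template A and Template B
# both define the same processor counter, so they cannot share a host.
CPU_KEY = r'perf_counter["\Processor Information(_Total)\% Processor Time",300]'
templates = {
    "Template A": [CPU_KEY],
    "Template B": [CPU_KEY, "net.tcp.listen[389]"],
    "Template C": ["net.tcp.listen[53]"],
}
conflicts = find_conflicts(templates)
for pair, keys in conflicts.items():
    print(pair, keys)
```

Any pair reported here would need the shared item moved into a single common Template before both could be assigned to the same host.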
The Templates for each Role/Role Service/Feature set should monitor the availability of services using both the Zabbix "service_state" check, which queries the OS, and "net.tcp.listen" (or a variant thereof), which checks whether the TCP port(s) in question are listening. For instance, you may check the OS Service state of "NTDS" and the associated TCP services on ports 389 (LDAP), 636 (LDAPS) and 464 (Kerberos Password) to determine whether the services necessary for authentication are available. Service state queries may have multiple triggers that depend on the state returned by Windows (Up, Down, Restarting, etc.), while TCP services need only a single trigger for Up or Down. These Templates also collect fundamental Performance Counters.
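The NTDS example above can be written out as a list of item keys. The sketch below only builds the key strings named in the text ("service_state" and "net.tcp.listen"); the `service_check_keys` helper is a hypothetical name, and the exact key names available may differ between agent versions, so treat the output as a planning aid rather than a definitive configuration.

```python
def service_check_keys(service, tcp_ports):
    """Build item keys for one OS service and its TCP listening ports."""
    keys = [f"service_state[{service}]"]
    keys.extend(f"net.tcp.listen[{port}]" for port in tcp_ports)
    return keys


# Ports taken from the authentication example in the text.
AD_PORTS = {389: "LDAP", 636: "LDAPS", 464: "Kerberos Password"}

ad_keys = service_check_keys("NTDS", AD_PORTS)
for key in ad_keys:
    print(key)
```

This yields one service state item for NTDS plus one listening-port item each for 389, 636 and 464, matching the one-trigger-per-port approach described above.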
There is still another distinction to draw when defining Templates: information important for day-to-day monitoring versus information important for troubleshooting, trend analysis and scalability design. The second set consists primarily of Performance Counters. Thus, for each set of Windows data collected, there is a standard set of data including OS service checks, TCP service checks and Performance Counters, and a second, more detailed set that consists primarily of Performance Counters. Following from the examples above, the "Windows Server 2008 R2" Template would provide important day-to-day triggers and performance information, while a second Template, "Windows Server 2008 R2 Performance Counters," provides highly detailed data for troubleshooting, trend analysis, etc.
Windows Application Servers
The Windows Operating System (with its Roles, Role Services and Features) is also a platform for additional Servers, such as the Exchange E-Mail and Collaboration Server and SQL Database Server. These servers provide common core software and specialized role-based software; the modular aspect of Windows Application Servers provides fault-tolerance and scalability. For example, Exchange may be deployed on a single server with all roles and services for small-business environments, or it may be deployed on many servers with the roles (individually or in combination) of Mailbox, Hub Transport, Edge Transport, Client Access and Unified Messaging for large enterprises. Microsoft provides design guidelines for large Exchange 2010 and Exchange 2013 deployments.
Template designs for Windows Application Servers must also reflect common and specific sets of data, in much the same way as (for example) "Windows Server 2008 R2" is common core functionality and "Active Directory Domain Services" is specific to a role. Windows Application Server Templates thus defined must also provide both the "Core" day-to-day Items, Triggers and Performance Counters and the less commonly used Performance Counters set of information.
Conclusion
Avoiding duplicate definitions, which may make it impossible to assign multiple Templates to a Zabbix host, requires planning and a thorough understanding of Windows Server, Roles, Role Services, Features and Applications design. A thorough understanding of the critical core Items versus the specialized troubleshooting ones is also very important in order to provide both day-to-day functionality and less commonly used troubleshooting, trend analysis and scalability design information.
Using the above guidelines, we may define an (incomplete) example set of Templates to be implemented for the Windows Operating System:
- Windows Server 2008 R2
- Windows Server 2008 R2 Performance Counters
- DNS Server
- DNS Server Performance Counters
- Active Directory Domain Services
- Active Directory Domain Services Performance Counters
- Web Server (IIS)
- Web Server (IIS) Performance Counters
We may also define an (incomplete) set of Templates to be implemented for the Exchange Server 2010 Application Server:
- Exchange Server Common
- Exchange Server Common Performance Counters
- Exchange Hub Transport Server
- Exchange Hub Transport Server Performance Counters
- Exchange Mailbox Server
- Exchange Mailbox Server Performance Counters
- Exchange Client Access Server
- Exchange Client Access Server Performance Counters
Each Template contains (where available and appropriate):
- Item and Trigger Descriptions
- Time-weighted Item collection (generally a 30 second collection interval averaged over a 300 second collection period)
- Trigger URL references to the source reference page for the definition
- Graphs of individual and sets of items
- Screens of Application Groups that collect various graphs into logical and complete visualization sets
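The time-weighted Item collection listed above (a 30 second collection interval averaged over a 300 second period) comes down to simple arithmetic: 300 / 30 = 10 samples per stored value. The Python sketch below illustrates that arithmetic with a rolling window; the `RollingAverage` class is a hypothetical name, and this is not a description of how the Zabbix Agent implements averaging internally.

```python
from collections import deque


class RollingAverage:
    """Mean of the most recent samples covering one collection period."""

    def __init__(self, interval_s=30, period_s=300):
        if interval_s <= 0 or period_s % interval_s != 0:
            raise ValueError("period must be a positive multiple of interval")
        # 300 s period / 30 s interval = 10 samples in the window.
        self.samples = deque(maxlen=period_s // interval_s)

    def add(self, value):
        """Record one sample and return the average over the window."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


# Example: feed ten samples (1..10), then an eleventh that pushes out the first.
avg = RollingAverage()
for sample in range(1, 11):
    latest = avg.add(sample)
after_eleventh = avg.add(11)
```

Because the samples are equally spaced, the time-weighted average reduces to the plain mean of the ten values in the window, which is what smooths short "peaks" and "valleys."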