
Saturday, July 26, 2014

Zabbix Templates for Microsoft Exchange 2010 Server Roles

This article describes how to build Zabbix Templates for Microsoft Exchange 2010 enterprise deployments and includes Zabbix Templates for them.  Zabbix Template items must be unique because duplicate items may not be assigned to the same host.  In an enterprise Exchange environment, a thorough understanding of the Application Roles and their architecture is required to successfully define Templates for automated deployment.

Zabbix Template Formats

Zabbix Applications

Applications 

Applications are logical groupings of Items.  These should reflect physical and logical aspects of the device monitored.  For Exchange Server, this includes Services (both Operating System and TCP) and Performance Counters.  The underlying hardware is monitored at the Operating System level, described in the article Zabbix Templates for Windows 2008 R2 OS and Domain Controllers.
Zabbix Items

Items

Items are individual, measurable counters that may be retrieved by querying the OS or Zabbix Agent.  The administrator may easily define time-weighted average sampling to smooth "peaks" and "valleys" in the calculated values.  A sampling rate (frequency) and period (time) define a time-weighted average.  For instance, if the sampling period is defined as 300 seconds and the rate at one sample per 30 seconds, a time-weighted average value for each stored Item value consists of the mean of ten samples.
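
The arithmetic in the example above can be sketched in a few lines of Python.  This is only an illustration of the averaging, not how Zabbix implements it; the function name and the sample values are hypothetical.

```python
from statistics import mean

def averaged_value(samples, period_s=300, interval_s=30):
    """Return the mean of the most recent samples that fall inside one period."""
    # Number of samples that fit in one period (300 s / 30 s = 10).
    count = period_s // interval_s
    window = samples[-count:]
    return mean(window)

# Hypothetical processor utilization samples, one per 30 seconds.
cpu_samples = [12, 15, 90, 14, 13, 16, 11, 15, 14, 10]
print(averaged_value(cpu_samples))
```

The single spike of 90 is smoothed considerably in the stored value, which is the point of averaging over the period.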
Zabbix Triggers

Triggers

Triggers are counter values that represent degraded conditions.  Zabbix provides the following classifications of Trigger values:

  1. Not Classified
  2. Information
  3. Warning
  4. Average
  5. High
  6. Disaster

Each Trigger also has administratively assigned fields for Description and URL.  These are very useful: the administrator may assign the URL of a reliable source for the Trigger (e.g. the Microsoft TechNet Performance Monitor references for descriptions and thresholds) and copy a brief description from that source into the Trigger.
Zabbix Graphs

Graphs

Graphs are the primary visualization for collected data and the feature at which Zabbix excels.  Multiple, related items may be assigned to a single graph (e.g. Processor Idle % and Processor Utilization %) for easy comparison.  Graphs should be grouped according to the types of data present (e.g. rates in counters/sec for one graph, percentages in a second, total counters in a third, etc.).

Unlike the graphs generated by many RRDTool implementations, Zabbix graphs are generated just-in-time, which saves resources.  Without a CGI implementation, RRDTool typically generates all graphs simultaneously using cron jobs; see this article on Munin and this article on Icinga/Nagios graphing for examples.  Zabbix only generates graphs when called upon by the administrator, thus saving much of the periodic processing time a cron job implementation requires.
Zabbix Screens

Screens

Screens are logical groupings of graphs and may be designed to present a quick overview of a partial or entire subsystem (e.g. several graphs of processor counters that present the entire set of collected processor data).

The Importance of Defining Appropriate Sets of Zabbix Template Information for Exchange Enterprises

Exchange Server Roles

Zabbix will not allow you to assign Templates to a host if the Templates contain duplicates.  Thus, if Template A defines an item "service_state[MSExchangeADTopology]" and Template B defines the same item, Zabbix will only allow you to assign one of the Templates.  It is critical to design Templates in advance so that Item, Trigger, Graph and Screen names are not duplicated.  For an Exchange Enterprise, the author has selected a framework based upon the overall architecture of the Application, its Performance Counters and its Services.  Yet even then it is more complex than just those three general groups.
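
A quick way to catch such conflicts while planning is to compare the item keys of the proposed Templates before building them.  The sketch below is a minimal illustration in Python; the Template names and contents are hypothetical, and it works on plain lists of keys rather than on Zabbix exports.

```python
from collections import defaultdict

def find_duplicate_keys(templates):
    """Map each item key to the templates defining it, keeping only conflicts."""
    owners = defaultdict(list)
    for template_name, item_keys in templates.items():
        for key in item_keys:
            owners[key].append(template_name)
    return {key: names for key, names in owners.items() if len(names) > 1}

# Hypothetical template contents for illustration.
templates = {
    "Template A": ["service_state[MSExchangeADTopology]", "service_state[MSExchangeIS]"],
    "Template B": ["service_state[MSExchangeADTopology]"],
}
for key, names in find_duplicate_keys(templates).items():
    print(f"{key} is defined in: {', '.join(names)}")
```

Any key reported here would have to be moved into a shared Template (or removed from one of the two) before both Templates could be assigned to the same host.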


Exchange Server 2010 allows the administrator to install different roles.
  1. Mailbox Role (Mandatory)
  2. Hub Transport Role (Mandatory)
  3. Client Access Server Role (Mandatory)
  4. Unified Messaging Server Role (Optional)
  5. Edge Transport Role (Optional and Unique)
The first three roles must be present in the enterprise. They may be deployed on a single server or deployed individually or in combination on multiple servers.  The Unified Messaging role is optional; the Edge Transport Role is also optional and (if installed) no other roles may be present.  A more detailed description of Exchange 2010 Roles is available in the article Exchange 2010 Architecture.



The templates for each Role/Role Service/Feature set monitor the availability of services using both the Zabbix "service_state" check to query the OS and "net.tcp.listen" (or a variant thereof) to query the NIC TCP service(s) in question.  For instance, you may check the Service state "Microsoft Exchange IMAP Service" and the associated TCP services on ports 143 (IMAP) and 993 (IMAPS) to determine if the IMAP Mailbox Access service is available.  Service state queries may have multiple triggers that depend on the state returned by Windows (Up, Down, Restarting, etc.), while TCP services have a single trigger for Up or Down.  Templates also collect fundamental Performance Counters.
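
To make the TCP half of the IMAP example concrete, the following Python sketch tests whether ports 143 and 993 accept connections.  It is an assumption-laden stand-in, not a Zabbix check: it connects from a remote machine, whereas "net.tcp.listen" is evaluated by the agent on the monitored host, and the function names are hypothetical.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def imap_available(host):
    """Report the state of the IMAP (143) and IMAPS (993) listeners on a host."""
    return {port: tcp_port_open(host, port) for port in (143, 993)}
```

Calling imap_available() with the name of a Client Access Server returns one Up/Down value per port, which mirrors the single Up or Down trigger described above for TCP services.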

There is still another set of definitions for Templates -- information important for day-to-day monitoring and information important for troubleshooting, trend analysis and scalability design.  The second set includes primarily Performance Counters.  Thus, for each set of Exchange data collected, there is a standard set of data including Application service checks, TCP service checks and Performance Counters and a second, more detailed set that includes primarily Performance Counters.  Flowing from the examples above, the "Exchange 2010 Client Access Server" Template would provide important day-to-day triggers and performance information while a second Template, "Exchange 2010 Client Access Server Performance Counters," provides highly detailed data for troubleshooting, trend analysis, etc.

Implementation

Avoiding duplicate definitions that may lead to the inability to assign multiple Templates to a Zabbix host requires planning and a thorough understanding of Exchange Server Architecture and Topology.  A thorough understanding of the critical core Items versus specialized troubleshooting ones is also very important to provide both day-to-day functionality and less-commonly used troubleshooting, trend analysis, and scalability design information.

Using the above guidelines, we may define an (incomplete) example set of Templates to be implemented for Exchange Enterprises:
  1. Exchange Common Servers
  2. Exchange Common Servers Performance Counters
  3. Exchange Mailbox Servers
  4. Exchange Mailbox Servers Performance Counters
  5. Exchange Hub Transport Servers
  6. Exchange Hub Transport Servers Performance Counters
  7. Exchange Client Access Servers
  8. Exchange Client Access Servers Performance Counters
The optional Unified Messaging and Edge Transport Roles are not included in this list, but may also be defined.  The Edge Transport Role is a subset of the Hub Transport Role functionality and the Templates may be easily added by cloning and modifying the existing Hub Transport Templates.

Each Template contains (where available and appropriate):
  1. Item and Trigger Descriptions
  2. Time-weighted Item collection (generally a 30 second collection interval averaged over a 300 second collection period)
  3. Trigger URL references to the source reference page for the definition
  4. Graphs of individual and sets of items
  5. Screens of Application Groups that collect various screens into logical and complete visualization sets

Discovery

Discovery uses the same methodology as described in the article Automated Zabbix Deployment and Configuration for Windows Enterprises.  The Zabbix configuration file deployed to each node through Active Directory Group Policy Objects uses Windows commands to check for specific services -- unique to Roles -- during discovery.  In the case of Exchange, the following service check statements are added to the file:

Common to All Exchange Servers
UserParameter=services.MSExchangeServiceHost,net start MSExchangeServiceHost
 
Mailbox Servers
UserParameter=services.MSExchangeIS,net start MSExchangeIS 

Hub Transport Servers
UserParameter=services.MSExchangeTransport,net start MSExchangeTransport

Client Access Servers
UserParameter=services.MSExchangeFBA,net start MSExchangeFBA
 

The Zabbix Discovery Process runs those queries only during discovery and adds the hosts to the proper groups and templates using Actions.
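
The effect of those Actions can be pictured as a lookup from the service key that answered during discovery to the Template that should be linked.  The Python sketch below is only a model of that decision; the pairing of keys to the example Template names from this article is an assumption, and real Actions are configured in the Zabbix front end rather than in code.

```python
# Service keys from the UserParameter lines above, mapped to the example
# Template names used in this article. The mapping itself is an assumption.
ROLE_TEMPLATES = {
    "services.MSExchangeServiceHost": "Exchange Common Servers",
    "services.MSExchangeIS": "Exchange Mailbox Servers",
    "services.MSExchangeTransport": "Exchange Hub Transport Servers",
    "services.MSExchangeFBA": "Exchange Client Access Servers",
}

def templates_for_host(detected_keys):
    """Return the Templates to link, in a stable order, for the detected keys."""
    return [tpl for key, tpl in ROLE_TEMPLATES.items() if key in detected_keys]

# Hypothetical host on which the common and Mailbox services were detected.
print(templates_for_host({"services.MSExchangeServiceHost", "services.MSExchangeIS"}))
```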



Zabbix Actions for Exchange 2010


Wednesday, July 23, 2014

Book Review -- Learning Nagios 4 by Wojciech Kocjan




Summary

Learning Nagios 4 by Wojciech Kocjan documents the new Nagios 4 project, released in September, 2013. It is a practical guide written by an experienced Nagios administrator in a format more practical than the technical documentation provided by the project maintainers. It is useful not only for those new to Nagios, but as a comprehensive continuing education review of the Nagios 4 milestone release. Its structure begins with the basics and proceeds through the most important advanced and add-on features that make Nagios such a powerful systems monitoring tool.

Review

Learning Nagios 4 by Wojciech Kocjan is an ambitious project. Its preface sets out its goal: to be a practical guide for setting up Nagios 4. It begins with installation, describes the tools available and their configuration and concludes with more advanced topics such as programming service checks and using query handlers. In between, it systematically covers the most important tools available to the sysadmin and how to use them. If that scope sounds ambitious, its 400 page length presages the thoroughness of its content.

Although there is abundant documentation available from the project maintainers, that documentation is thorough, almost too much so. Equal weight is given to the less-commonly used options in that documentation, and reading it can become a burden. Kocjan's book, by selecting the most important topics, is better focused for practical implementations. The book thus achieves a practicality that only an experienced professional can attain.

Nagios 4 -- a September 2013 milestone release -- is a good point for experienced administrators to review the application from the basics up. Periodic continuing education is important and milestone releases are an appropriate time to thoroughly review skills from the foundation up. Yet the book is also written at a level appropriate to new Nagios administrators. While a thorough knowledge of basic Linux skills is necessary, even those unfamiliar with Nagios will be able to build a monitoring system. Although the book states that it is focused on Ubuntu, there is adequate discussion of installations from source and on RPM-based distributions for administrators of other Linux distributions to understand Nagios. However, there are likely details specific to those other distributions that are not covered and will require additional research on the administrator's part.


Installation and Configuration

The book begins with basic installation and configuration tasks. The author's experience is evident. While some of the material looks like it is drawn directly from the maintainer's documentation, the format is much more practical. As opposed to the topic-based organization of the maintainer's technical documentation, the author's format is organized in an order that reflects a real-world implementation. For instance, the author includes a concise discussion of topology definitions with host definitions -- the point at which an administrator would define topology. Descriptions of the web interface and basic plugins follow. These are illustrated with practical examples.

The author proceeds with advanced topics, such as organizing definitions in a maintainable manner with suggested sets of definitions and version control. Indeed, troubleshooting a Nagios installation that is not well organized will invariably add a great deal of time simply searching for errors in a poorly organized system. That organization is the foundation upon which more advanced definitions such as dependencies and templates are built.

Having established a well-defined framework, the author then addresses the whole point of a monitoring system: events, notifications, escalations and event handlers. These are also illustrated with practical examples. Event handlers are often afforded only light coverage in Nagios manuals; that is not the case here. Event handlers are a Nagios strength that automates responses to events; the code required to restart a web server serves as an example. Adaptive monitoring is also often overlooked altogether, but is adequately described here.


Advanced Nagios 4 Configuration and Features

Establishing this solid foundation is only half the book. The second half explores much more advanced topics such as workload distribution, scalability and extending Nagios to monitor additional platforms.

The Nagios Service Check Acceptor (NSCA) is more difficult to understand and implement, but the author does a good job explaining it and providing an example. The same is true of the description and illustration of load distribution using ssh and the Nagios Remote Plugin Executor (NRPE), which is preferable to ssh because it reduces Nagios server overhead. SNMP is a protocol that, well-implemented, affords a great deal of availability and performance data. However, it can be a bit difficult to learn and understand. Here, the author's experience is evident as the text provides a very practical, understandable and thorough description of the protocol and its application.

Finally, the last quarter of the book addresses the most advanced topics, such as Windows, distributed monitoring, programming and query handlers.

Windows is ubiquitous in the enterprise, but requires expertise to monitor using Nagios. NSClient++ is the agent used and it provides NSCA, NRPE and other functionality. It also acts as an "interpreter" for Nagios to record Windows-specific data. Yet here, the author does not explore deploying and maintaining NSClient++ using Active Directory Group Policy Objects. Rather, the example provided is limited to manual installation and configuration -- an onerous and possibly impractical task for a Windows enterprise. However, the descriptions and examples provided are otherwise thorough and practical.

Distributed monitoring is discussed only at a high level and examples are rather basic. However, implementing a distributed Nagios implementation is a complex task worthy of a book itself. This book lays out the reasons and higher-level architecture of distributed Nagios well enough for an administrator to recognize when it is required and with the necessary architectural understanding to research and design it.

Programming, too, is covered at a high level. Several languages may be used and the author uses C for examples. Examples include web services, VMware and Amazon Web Services -- topics of current and practical interest.

The book ends with a discussion of Query Handlers -- a feature new to Nagios 4. Think of it as a Unix domain sockets communications implementation for Nagios. Query Handlers, using tools such as the Nagios Event Radio Dispatcher (NERD) and the open source Gource visualization tool, provide a framework to receive real-time updates from Nagios.

Conclusion

Learning Nagios 4 has an ambitious scope. Kocjan has the experience to deliver a thorough and well-organized book.  The expertise is apparent from his recommendations for organized definitions and the logic flow of the presentation.  The book is detailed enough for a new Nagios administrator to learn the application quickly.  There is enough detail for seasoned administrators to learn about advanced features and how they are implemented.  Even experts can benefit from a top to bottom review of the milestone Nagios 4 release.

Sunday, July 6, 2014

Nagios/Icinga Windows Server, Domain Controller, DNS and IIS Performance Monitoring

This article describes Windows Server 2008 R2 Nagios/Icinga Templates that monitor core server functions, Domain Controllers, DNS Servers and IIS 7.5.  While built specifically for those systems, the Templates are likely compatible with other versions of Windows as well.

For those with Nagios/Icinga and Windows experience, the counters used are available from the Monitoring Exchange page.

Windows Server ships with an excellent monitoring and trend analysis tool: Performance Monitor.  As illustrated below, it allows administrators to select and graph counters from a list of system metrics.  These measurements may also be saved as delimited text files for future analysis and visualization.  A centralized server may connect to other servers to remotely collect data.  Since the API is well-documented, it is integrated into other value-added systems monitoring software.


Windows Server Performance Monitor


Selecting Windows Server Performance Monitor Counters

The illustration below depicts the text format of a Windows Performance Monitor counter.  Commands conforming to this syntax may be sent from remote monitoring servers whose applications comply with the Windows API.
Displaying Windows Server Performance Monitor Syntax
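
For readers without the illustration at hand, a counter path generally takes the form \\Computer\Object(Instance)\Counter, where the computer and instance parts may be omitted.  The Python sketch below splits such a path into its parts; the function name and the SERVER01 host are hypothetical, and the pattern deliberately ignores rarer forms such as parent instances and instance indexes.

```python
import re

# \\Computer\Object(Instance)\Counter, with computer and instance optional.
COUNTER_PATH = re.compile(
    r"^(?:\\\\(?P<computer>[^\\]+))?"   # optional \\Computer
    r"\\(?P<object>[^\\(]+)"            # \Object
    r"(?:\((?P<instance>[^)]*)\))?"     # optional (Instance)
    r"\\(?P<counter>.+)$"               # \Counter
)

def parse_counter_path(path):
    """Split a Performance Monitor counter path into its named parts."""
    match = COUNTER_PATH.match(path)
    if match is None:
        raise ValueError(f"not a counter path: {path!r}")
    return match.groupdict()

print(parse_counter_path(r"\Processor Information(_Total)\% Processor Time"))
print(parse_counter_path(r"\\SERVER01\Memory\Available MBytes"))
```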

Prerequisites

A thorough knowledge of Nagios/Icinga installation and configuration is necessary.  The fundamentals of Nagios/Icinga are presented in the following articles:
Nagios/Icinga Installation and Initial Configuration
Nagios/Icinga Database Integration
Nagios/Icinga Configuration Files
Nagios / Icinga Configuration with NConf - a Graphical Web / Database Tool

Nagios / Icinga Performance Data Graphing with PNP4Nagios


Information specific to Windows Server enterprise monitoring is presented in these two articles:

Nagios/Icinga Templates for Windows 2008 R2 OS, Domain Controllers, DNS and IIS Servers 

Automated Nagios/Icinga NSClient++ Deployment and Configuration for Windows Enterprises


Description of Monitored Windows 2008 R2 Services and Counters


Nagios/Icinga use Check Commands and Service Checks to monitor Windows.  They use an NRPE Windows Services check to determine if all automatically started and disabled services are in their proper state. They use NRPE Windows Performance Monitoring Counter checks to monitor Processor, Memory, Disk and Network counters (among others).  The services and counters listed below are common to Windows Server 2008 R2 regardless of applications installed.  They are indicative of overall performance (or problems) but do not necessarily pinpoint the root issue(s); more advanced -- and specific -- checks are required to diagnose application issues.

Three template files containing the required Check Commands, Service Checks and advanced Performance Monitoring Windows Server Service Checks are available for download from the Monitoring Exchange site.
These checks were gathered from a variety of Microsoft sources.

TCP Ports

135 MSRPC
139 NetBIOS-ssn
445 Microsoft-DS (SMB)

Processor

Processor Information Total Percent Utilization
Processor Information Total Idle Time
System Processor Queue Length
Server Work Queues Length

Memory

Memory Available MBytes
Memory Free System Page Table Entries
Memory Pages Input/sec
Memory Pages/sec
Memory Pool Nonpaged Bytes
Memory Pool Paged Bytes
Memory Cache Bytes
Memory Percent Registry Quota in Use
Memory Percent Committed Bytes in Use

Paging File Total Percent Usage

Disk

LogicalDisk Avg. Disk sec/Read
LogicalDisk Avg. Disk sec/Write
LogicalDisk Disk Transfers/sec
PhysicalDisk Total Disk sec/Read
PhysicalDisk Percent Idle Time
PhysicalDisk Total Disk sec/Write
LogicalDisk C Free Megabytes
LogicalDisk C Percent Free Space

Network

Network Interface Output Queue Length
Network Interface Bytes Total/sec
Network Interface Bytes Sent/sec
Network Interface Bytes Received/sec

Windows Domain Controller Monitoring and Trend Analysis

Three template files containing the required Check Commands, Service Checks and advanced Performance Monitoring Windows Domain Controller Service Checks are available for download from the Monitoring Exchange site.

Microsoft provides a summary of Windows Domain Controller Performance Counters.

Summary of Monitored Services and Counters

TCP Ports

389 LDAP
464 Kerberos Password
636 LDAPS
3268 Global Catalog
3269 Global Catalog

Windows Server Domain Controller (NTDS) Counters

NTDS DRA Inbound Full Sync Objects Remaining
NTDS DS Notify Queue Size
NTDS LDAP Bind Time
NTDS SAM Account Group Evaluation Latency

Summary of DNS Server Services and Counters

Three template files containing the required Check Commands, Service Checks and advanced Performance Monitoring Windows DNS Server Service Checks are available for download from the Monitoring Exchange site.

Microsoft provides a summary of Windows DNS Server Performance Counters.

TCP and UDP Ports

53 (TCP) DNS
53 (UDP) DNS

Windows DNS Server Counters

Caching Memory
Database Node Memory
Record Flow Memory
Recursive Query Errors
Secure Update Failure
TCP Message Memory
Total Query Received
Total Query Received/sec

UDP Message Memory 
Zone Transfer Failure
Zone Transfer Success

Summary of IIS Server Services and Counters

IIS has changed repeatedly over time and Microsoft-recommended performance counters are generally out-of-date.  The list was developed from a variety of sources and is intended to reflect the basic IIS 7.5 Server functions.  Other counters (such as ASP.NET, etc.) are more appropriate to various application environments, such as the Windows Application Server Role, which adds the .NET environment.

Three template files containing the required Check Commands, Service Checks and advanced Performance Monitoring Windows IIS Server Service Checks are available for download from the Monitoring Exchange site.

TCP Ports

80 HTTP
443 HTTPS

Windows IIS Server Counters

Bytes Received/sec
Bytes Sent/sec
Bytes Total/sec
Current Connections
GET Requests/sec
POST Requests/sec
Current Files Cached
Current Metadata Cached
Current URIs Cached
File Cache Hits %
Metadata Cache Hits
URI Cache Hits %

Server 2008 R2, Domain Controller and DNS Server Performance Monitoring Templates

Performance Monitoring Counters are included for advanced troubleshooting, trending and capacity planning.  These counters are unlikely to be useful for day-to-day monitoring and should be used only when needed in those scenarios.

Host Group Assignments

Advanced Services -- as defined above -- are assigned to Host Groups.  When a host is added to a Host Group, all of the Advanced Services in that group are then applied to the host.  The OS and Windows Role definitions of Host Groups and Advanced Services determine the groups that must be defined.  These include:
  1. windows-servers
  2. windows-servers-perfmon
  3. windows-domain-controllers
  4. windows-domain-controllers-perfmon
  5. windows-dns-servers
  6. windows-dns-servers-perfmon
  7. windows-iis-servers
Applying the checks now requires no more than adding the host to the appropriate groups.
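
The way group membership translates into applied checks can be modeled as a simple lookup.  The Python sketch below is an illustration only: the dictionary holds a hypothetical handful of the service names listed in this article, and Nagios/Icinga resolve the real assignments from the hostgroup_name directive in each service definition.

```python
# Hypothetical subset of the Advanced Services attached to each Host Group.
HOSTGROUP_SERVICES = {
    "windows-servers": [
        "Processor Information Total Percent Utilization",
        "Memory Available MBytes",
    ],
    "windows-domain-controllers": ["NTDS LDAP Bind Time", "NTDS DS Notify Queue Size"],
    "windows-dns-servers": ["Recursive Query Errors"],
}

def services_for_host(hostgroups):
    """Collect the services a host inherits from its groups, without repeats."""
    services = []
    for group in hostgroups:
        for service in HOSTGROUP_SERVICES.get(group, []):
            if service not in services:
                services.append(service)
    return services

# A Domain Controller that is also a member of the base Windows group.
print(services_for_host(["windows-servers", "windows-domain-controllers"]))
```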

Example

The video below depicts a Processor stress test conducted on a Windows Server 2008 R2 Domain Controller/DNS Server.  The Warning and Critical values returned for Processor Counters are expected.  Under this high load, there are also Warning and Critical values returned for Domain Controller functions, indicating that there may be issues with authentication and services relying upon it.

Saturday, July 5, 2014

Nagios/Icinga Templates for Windows 2008 R2 OS, Domain Controllers, DNS and IIS Servers

This article describes how to use Template Check Commands, Service Checks and Host Groups to monitor Windows Server, Domain Controllers, DNS Servers and IIS -- applications that ship with Windows Server.  Links to sample templates are also provided.  In an enterprise Windows environment, a thorough understanding of the Operating System and Applications is required to successfully utilize sets of commands and checks to monitor and diagnose problems.

Nagios/Icinga Template Formats

Although Nagios/Icinga configuration files allow the administrator to insert any properly formatted definition or command into a file ending in .cfg, there are good reasons to observe several formats.  First, many of the add-on graphical configuration applications (for instance, NConf discussed here) require only a single type of definition or command in each file.  Not only does NConf generate configuration files in this format, it requires them to perform imports.  Service Checks are split into two types:
  1. Services -- applied only to single hosts
  2. Advanced Services -- applied to multiple hosts and/or service groups
Check Commands are also split into two groups, but they are not used in this article and are not discussed.

It is also easier to troubleshoot configuration errors if a strict format is observed.  This need not split each type of command and check into separate and dedicated files.  However, experience and portability to graphical configuration applications indicate such a structure is the easiest to use and most compatible.

Therefore, the templates used for this article consist of Check Commands (specifying Windows NRPE Service and Performance Monitoring Counter definitions) and Advanced Services (Service Checks defined with a Host Group assignment).  Two examples are provided below:

Check Command

define command {
                command_name                          check_WinNRPEPerf_4ArgMax
                command_line                          $USER1$/check_nrpe -H $HOSTADDRESS$ -n -p 5666 -c CheckCounter -a "Counter:$ARG1$=$ARG2$" ShowAll MaxWarn=$ARG3$ MaxCrit=$ARG4$
}

Advanced Service

define service {
                service_description                   Processor Information Total Percent Utilization
                check_command                         check_WinNRPEPerf_4ArgMax!CPUPercent!\\Processor Information(_Total)\\% Processor Time!80!90
                check_period                          24x7
                notification_period                   24x7
                hostgroup_name                        windows-servers
                use                                   generic-service
                contact_groups                        +admins
}


Several points deserve explanation.  In the Check Command command_line definition, the -n option specifies no SSL, which may or may not apply to specific deployments.  The -c option defines the type of command (which must be supported in the NSClient configuration file).  The -a option passes specific arguments to the agent.  $ARG1$ is the alias under which the counter value is reported (CPUPercent in the Advanced Service above).  $ARG2$ is the Windows-formatted Performance Counter to be queried.  $ARG3$ and $ARG4$ pass warning and critical threshold values to the agent, which evaluates them and returns the resulting state to Nagios/Icinga.  The Advanced Service check_command definition then calls the Check Command and specifies the arguments, separated by exclamation marks.  Finally, the Advanced Service is only applied to hosts belonging to the windows-servers Host Group.
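
The substitution of arguments into the command line can be traced with a short Python sketch.  It is a simplified model, not the Nagios/Icinga macro processor: the function name is hypothetical, escaping rules (such as an escaped exclamation mark) are ignored, and $USER1$ and $HOSTADDRESS$ are left unexpanded.

```python
def expand_command_line(command_line, check_command):
    """Substitute $ARGn$ macros with the '!'-separated arguments of check_command."""
    name, *args = check_command.split("!")
    for index, value in enumerate(args, start=1):
        command_line = command_line.replace(f"$ARG{index}$", value)
    return name, command_line

# The two definitions from the examples above, copied verbatim.
COMMAND_LINE = (
    '$USER1$/check_nrpe -H $HOSTADDRESS$ -n -p 5666 -c CheckCounter '
    '-a "Counter:$ARG1$=$ARG2$" ShowAll MaxWarn=$ARG3$ MaxCrit=$ARG4$'
)
CHECK_COMMAND = (
    r"check_WinNRPEPerf_4ArgMax!CPUPercent!"
    r"\\Processor Information(_Total)\\% Processor Time!80!90"
)

name, expanded = expand_command_line(COMMAND_LINE, CHECK_COMMAND)
print(name)
print(expanded)
```

Reading the expanded line makes it easy to confirm that each threshold landed in the intended MaxWarn or MaxCrit position before the definition is deployed.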

The Importance of Defining Appropriate Sets of Nagios/Icinga Template Information for Windows Enterprises

Windows 2008 R2 Operating System Roles, Role Services and Features


A default Windows Server installation only installs the software necessary to operate as a server.  Windows then provides "Roles" and "Features" to provide specific additional functionality.  For instance, a Server may be assigned the "Active Directory Domain Services" and "DNS Server" roles that include one set of functionality while another may be assigned the "Web Server (IIS)" role that includes a different set.  Role Services and Features may also be installed, further adding to the complexity of defining sets of monitored information.  Flowing from the definitions, the fundamental hardware and OS data may be collected by a Template "Windows Server 2008 R2."  Additional Templates then define "Active Directory Domain Services," "DNS Server" and "Web Server (IIS)" Roles and Features.

The templates for each Role/Role Service/Feature set should monitor the availability of services using both the Nagios/Icinga Plugin "check_nrpe" to query the OS Services and "check_tcp"/"check_udp" to query the NIC TCP and UDP service(s) in question.  The NRPE Services check selected queries all services set to "automatically started" and "disabled"; it returns OK if all services are in the proper state or lists those that are not.  The associated TCP and UDP services on ports 389 (LDAP), 636 (LDAPS) and 464 (Kerberos Password), among others, are queried to determine if the Active Directory services necessary for authentication are available.  Finally, the data collection includes the Performance Monitoring Counters.

There is still another set of definitions for Templates -- information important for day-to-day monitoring and information important for troubleshooting, trend analysis and scalability design.  The second set includes primarily Performance Counters.  Thus, for each set of Windows Role data collected, there is a standard set of data including OS service checks, TCP service checks and Performance Counters and a second, more detailed set that includes primarily Performance Counters.  Flowing from the examples above, the "Windows Server" Template would provide important day-to-day triggers and performance information while a second, "Windows Server Performance Counters" provides highly detailed data for troubleshooting, trend analysis, etc.

Windows Application Servers

The Windows Operating System (with its Roles, Role Services and Features) is also a platform for additional Servers, such as the Exchange E-Mail and Collaboration Server and SQL Database Server.  These servers provide common core software and specialized role-based software; the modular aspect of Windows Application Servers provides fault-tolerance and scalability.  For example, Exchange may be deployed on a single server with all roles and services for small-business environments or may be deployed on many servers with roles (individually or in combination) of Mailbox, Hub Transport, Edge Transport, Client Access and Unified Messaging for large enterprises.  Microsoft provides design guidelines for large Exchange 2010 deployments here and for large Exchange 2013 deployments here.

Template designs for Windows Application Servers must also reflect common and specific sets of data in much the same way as (for example) "Windows Server 2008 R2" is common core functionality and "Active Directory Domain Services" is specific to a role.  Windows Application Server Templates thus defined must also provide both the "Core" day-to-day Items, Triggers and Performance Counters and the less commonly used Performance Counters set of information.

Conclusion

Using the above guidelines, we may define an (incomplete) example set of Templates to be implemented for the Windows Operating System:

  1. Windows Server 2008 R2
  2. Windows Server 2008 R2 Performance Counters
  3. DNS Server
  4. DNS Server Performance Counters
  5. Active Directory Domain Services
  6. Active Directory Domain Services Performance Counters
  7. Web Server (IIS)
  8. Web Server (IIS) Performance Counters
We may also define an (incomplete) set of Templates to be implemented for the Exchange Server 2010 Application Server:
  1. Exchange Server Common
  2. Exchange Server Common Performance Counters
  3. Exchange Hub Transport Server
  4. Exchange Hub Transport Server Performance Counters
  5. Exchange Mailbox Server
  6. Exchange Mailbox Server Performance Counters
  7. Exchange Client Access Server
  8. Exchange Client Access Server Performance Counters
The following links are Templates hosted on the Monitoring Exchange web site.
Windows Server Check Commands and Service Checks
Windows Domain Controller Check Commands and Service Checks
Windows DNS Server Check Commands and Service Checks

Windows IIS Check Commands and Service Checks

Friday, July 4, 2014

Automated Nagios/Icinga NSClient++ Deployment and Configuration for Windows Enterprises

Much of the NSClient++ deployment and configuration work may be centralized and automated using Active Directory Group Policy Objects (GPOs).  This article describes agent deployment and .ini file configuration to support core modules, check_nt and check_nrpe commands. 

Deploying the NSClient++ Agent through Active Directory GPOs

Deploying the NSClient++ Program through Active Directory GPOs

The NSClient++ Agent is deployed from a Microsoft Software Installer (.msi) package available and documented here. The package provides everything required to install a default NSClient++ agent. For the purposes of this article, all configurations will be applied to the Default Domain Policy Organizational Unit (OU). The OU(s) utilized in a production environment will vary depending upon the structure of the actual Active Directory domain.  Save the .msi file to a shared directory. Then, open the Group Policy Management Console and edit the policy for the selected OU. The package is then selected for deployment by its shared path (\\<Server>\<Share>) under the Computer Configuration > Policies > Software Settings > Software Installation policy. Upon rebooting, each server to which the GPO is applied will then install the agent.  However, the configuration file will be the installation default and will not be operational.  Deploying the customized configuration file (described below) requires a second reboot.

If you wish to support Python scripting, also install the .msi file that may be downloaded here.


The .msi install will also add a Windows Firewall rule (pictured above) to allow any inbound traffic to connect to the nscp.exe process.

The following video depicts the Windows boot process and application installation, admittedly not exciting, but included as a demonstration.

Deploying the NSClient++ Agent Configuration File through Active Directory GPOs

The default configuration file -- c:\program files\nsclient++\nsclient.ini -- deployed above is not customized for a production environment.  You may generate new configurations using the nsclient++ program or simply issue the following command to generate a configuration file with all options supported (but not enabled):
c:\program files\nsclient++>nscp settings --generate --add-defaults --load-all





If you have not installed the Python .msi file above, you will receive an error that the Python .dll file is missing; simply click through the warning and the process will complete.

A configuration file defining the live environment must be deployed to each node before the agents may communicate with the server.  The nsclient.ini file defines many different features (modules, Nagios Remote Plugin Executor (NRPE) Server, Nagios Service Check Acceptor (NSCA), scripts, etc.).  A full description of the capabilities is beyond the scope of this article, so I will focus on modules and NRPE Server settings.

Default settings are located at the bottom of the file.  These include:

allowed ciphers =  [set to: ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH]
allowed hosts = [set to IP address(es) of allowed Nagios/Icinga Servers]
cache allowed hosts = true
certificate = [specify path to SSL certificate]
password = [set to shared password]
timeout = 30
use ssl = [true or false; if true, requires certificate]
verify mode = none

Modules provide the core functionality of NSClient++.  To support Nagios/Icinga check_nt, check_nrpe and NSCA commands, the following apply:
CheckSystem = 1 (important for supporting check_nrpe Performance Counter commands)
NRPEServer = 1 (for Nagios/Icinga check_nrpe commands)
NSCAClient = 1 (for Nagios/Icinga NSCA commands)
NSCAServer = 1 (for Nagios/Icinga NSCA commands)
NSClientServer = 1 (for Nagios/Icinga check_nt commands)
To implement the NRPE Server, the following modifications apply:
allowed ciphers = ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
allow arguments = true
allow nasty characters = true
port = 5666
These settings define the allowable ciphers, whether NSClient++ accepts command arguments and characters that are normally rejected, and the port on which it listens.  By default, use ssl = true requires SSL encryption; if you disable this, any check_nrpe commands issued by the Nagios/Icinga server must include the -n option to disable SSL.

The file is deployed through the GPO Computer Configuration > Preferences > Windows Settings > Files.  The illustration below depicts deploying the Zabbix Agent configuration file, but the process is the same; simply substitute the nsclient.ini file in the shared folder and GPO configuration.


Nagios/Icinga NRPE Commands

Two of the most useful groups of commands check:
  1. Windows Service Availability
  2. Windows Performance Monitoring Counters
TCP and UDP Port availability is also important, but is checked directly rather than through the agent.
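
As a minimal sketch of such a direct check, the command below wraps the check_tcp plugin from the standard Nagios Plugins package. The command name check_WinTCPPort, the generic-service template, the host name winserver01 and the choice of port 25 are assumptions made for illustration and do not come from the configuration described in this article.

```cfg
# Hypothetical command: test a TCP port directly from the Nagios/Icinga server
define command {
                command_name                          check_WinTCPPort
                command_line                          $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$
}

# Hypothetical service: host name, template and port are placeholders
define service {
                use                                   generic-service
                host_name                             winserver01
                service_description                   TCP Port 25
                check_command                         check_WinTCPPort!25
}
```

Because the check runs on the monitoring server itself, it keeps working when the NSClient++ agent on the target is down.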

Windows Service Availability

The easiest way to implement service checks is to define a Nagios/Icinga command:
 

define command {
                command_name                          check_WinNRPEService
                command_line                          $USER1$/check_nrpe -H $HOSTADDRESS$ -n -p 5666 -c CheckServiceState -a CheckAll exclude=WLMS exclude=ShellHWDetection
}

Where:

-H $HOSTADDRESS$ -- variable for the checked IP address
-n -- no SSL Encryption
-p 5666 -- TCP port for NRPE communications
-c CheckServiceState -- specifies a Windows Service check
-a CheckAll -- specifies to check that all automatically started services are running and all disabled services are stopped
exclude=WLMS exclude=ShellHWDetection -- excludes the Windows Licensing Monitoring Service and Shell Hardware Detection services from the check

This command will return either OK or a list of services in the critical state.
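
A service definition along the following lines could attach the command to a host. This is only a sketch: the generic-service template and the host name winserver01 are placeholders, not names taken from this article, and should be replaced with the ones used in your own object configuration.

```cfg
# Hypothetical service: checks all automatically started services on one host
define service {
                use                                   generic-service
                host_name                             winserver01
                service_description                   Windows Services
                check_command                         check_WinNRPEService
}
```

The command takes no arguments, so the check_command line needs no "!" separators.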

Alternatively, this is a more granular check for individual services:

define command {
  command_name check_WinNRPEServiceState
  command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckServiceState -a ShowAll $ARG1$ $ARG2$=stopped
}


Here, the "-a ShowAll $ARG1$ $ARG2$=stopped" options check that the service named in $ARG1$ is running and that the service named in $ARG2$ is stopped.
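
The sketch below shows how a service might pass the two service names to this command. It assumes the command has been given a valid name such as check_WinNRPEServiceState (a name chosen here for illustration); the generic-service template, the host name winserver01 and the choice of Spooler and RemoteRegistry are likewise only examples.

```cfg
# Hypothetical service: expects Spooler to be running and RemoteRegistry to be stopped
define service {
                use                                   generic-service
                host_name                             winserver01
                service_description                   Spooler and Remote Registry State
                check_command                         check_WinNRPEServiceState!Spooler!RemoteRegistry
}
```

Nagios/Icinga splits the check_command value on "!", so Spooler becomes $ARG1$ and RemoteRegistry becomes $ARG2$.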

Full Service Check documentation is available here.

Windows Performance Monitoring Counters

Collecting Windows Performance Monitoring Counters requires at least two variables supplied to the check_nrpe command:  a name and the formatted Windows Performance Counter path.  However, to get the most out of the check, it may be desirable to supply additional warning and critical threshold values.  If the threshold is simple (such as exceeds 0 or less than 100), a single variable -- either MinCrit or MaxCrit -- is supplied.  If there is both a warning and a critical threshold, two variables -- either MinWarn and MinCrit or MaxWarn and MaxCrit -- are supplied.  Thus, there are five possible Performance Check commands:

define command {
                command_name                          check_WinNRPEPerf_2Arg
                command_line                          $USER1$/check_nrpe -H $HOSTADDRESS$ -n -p 5666 -c CheckCounter -a "Counter:$ARG1$=$ARG2$" ShowAll
}

define command {
                command_name                          check_WinNRPEPerf_3ArgMax    
                command_line                          $USER1$/check_nrpe -H $HOSTADDRESS$ -n -p 5666 -c CheckCounter -a "Counter:$ARG1$=$ARG2$" ShowAll MaxCrit=$ARG3$
}

define command {
                command_name                          check_WinNRPEPerf_3ArgMin    
                command_line                          $USER1$/check_nrpe -H $HOSTADDRESS$ -n -p 5666 -c CheckCounter -a "Counter:$ARG1$=$ARG2$" ShowAll MinCrit=$ARG3$
}

define command {
                command_name                          check_WinNRPEPerf_4ArgMax    
                command_line                          $USER1$/check_nrpe -H $HOSTADDRESS$ -n -p 5666 -c CheckCounter -a "Counter:$ARG1$=$ARG2$" ShowAll MaxWarn=$ARG3$ MaxCrit=$ARG4$
}

define command {
                command_name                          check_WinNRPEPerf_4ArgMin
                command_line                          $USER1$/check_nrpe -H $HOSTADDRESS$ -n -p 5666 -c CheckCounter -a "Counter:$ARG1$=$ARG2$" ShowAll MinWarn=$ARG3$ MinCrit=$ARG4$
}
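
As an illustration of how these commands might be called, the two services below use the standard Windows counters \Processor(_Total)\% Processor Time and \Memory\Available MBytes. The host name, the generic-service template, the aliases CPU and FreeMem and all threshold values are assumptions chosen for the example. Backslash handling is also an assumption: depending on the Nagios/Icinga version and shell quoting, the backslashes in the counter path may need to be doubled.

```cfg
# Hypothetical service: warn above 80% CPU, critical above 90%
define service {
                use                                   generic-service
                host_name                             winserver01
                service_description                   CPU Utilization
                check_command                         check_WinNRPEPerf_4ArgMax!CPU!\Processor(_Total)\% Processor Time!80!90
}

# Hypothetical service: critical when available memory falls below 512 MB
define service {
                use                                   generic-service
                host_name                             winserver01
                service_description                   Available Memory
                check_command                         check_WinNRPEPerf_3ArgMin!FreeMem!\Memory\Available MBytes!512
}
```

In each case $ARG1$ is the display name, $ARG2$ is the counter path, and the remaining arguments are the thresholds in the order the command definition expects them.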