Thursday, May 29, 2014

Nagios/Icinga Windows Server and Domain Controller Performance Monitoring


Introduction

Windows Server ships with an excellent monitoring and trend analysis tool: Performance Monitor.  As illustrated below, it allows administrators to select and graph counters covering a long list of system metrics.  These measurements may also be saved as delimited text files for future analysis and visualization.  A centralized server may connect to other servers to collect data remotely.  Because the API is well documented, it has been integrated into other value-added systems monitoring software.





The illustration below depicts the text format of a Windows Performance Monitor counter.  Commands conforming to this syntax may be sent from remote monitoring servers whose applications comply with the Windows API.
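As a rough illustration of that text format, the Python sketch below splits a counter path into its parts. It assumes the shortened form used by the check commands in this article, \\Object(Instance)\Counter, with the instance optional; the function and field names are hypothetical and it does not handle a path that begins with a computer name.

```python
import re
from typing import NamedTuple, Optional


class CounterPath(NamedTuple):
    perf_object: str          # e.g. "Processor"
    instance: Optional[str]   # e.g. "_Total", or None when absent
    counter: str              # e.g. "% Idle Time"


# Assumed shape: one or two leading backslashes, the object name,
# an optional "(instance)", a backslash, then the counter name.
_PATH = re.compile(
    r"^\\{1,2}(?P<perf_object>[^\\(]+)"
    r"(?:\((?P<instance>[^\\]*)\))?"
    r"\\(?P<counter>.+)$"
)


def parse_counter_path(path: str) -> CounterPath:
    """Split a counter path such as \\\\Processor(_Total)\\% Idle Time."""
    match = _PATH.match(path)
    if match is None:
        raise ValueError("not a recognised counter path: %r" % path)
    return CounterPath(
        match.group("perf_object"),
        match.group("instance"),
        match.group("counter"),
    )


idle = parse_counter_path(r"\\Processor(_Total)\% Idle Time")
queue = parse_counter_path(r"\\System\Processor Queue Length")
```

Here `idle` carries the object `Processor`, the instance `_Total` and the counter `% Idle Time`, while `queue` has no instance at all.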
Nagios and Icinga include a plugin, check_nt, that can query Windows Performance Monitor counters as well as several other Windows system performance indicators.  The balance of this document describes how to use the check_nt command to build a comprehensive monitoring, alerting and trend analysis system for Windows Server 2008 R2 and Windows Domain Controllers.

Prerequisites


A thorough knowledge of Nagios/Icinga, including PNP4Nagios, is necessary.  Several articles that describe how to configure the necessary components are available here.

Windows Server 2008 R2 Monitoring and Trend Analysis


Check_nt Commands and Services

The command set is based upon the Microsoft recommended 

Three check_nt variables (-v) are used in the command definitions: SERVICESTATE for Windows services, COUNTER for Windows Performance Monitor counters and CPULOAD for averaged processor load.  Each command is used in a corresponding service definition that assigns templates, service groups and hostgroups.  Sample commands that illustrate the formats follow:

Windows Server Services Format

define command {
command_name Win_2008R2_Server_Services
command_line /usr/lib/nagios/plugins/check_nt -H $HOSTADDRESS$ -s <password> -p 12489 -v SERVICESTATE -l MSDTC,gpsvc,Netlogon,netprofm,nlasvc,nsi,RpcEptMapper,SamSs,LanmanServer,eventlog,MpsSvc,W32Time,LanmanWorkstation,Dnscache
}

define service{
use                               generic-service,pnpsvc
servicegroups               win_2008r2_servers_servicegroup
hostgroup_name          win_2008r2_servers_hostgroup
service_description      Windows 2008R2 Server Services
check_command          Win_2008R2_Server_Services
notification_interval      0
}

Windows Performance Monitor Counters Format

define command {
command_name Win_2008R2_Processor_Processor_Percent_Idle_Time
command_line /usr/lib/nagios/plugins/check_nt -H $HOSTADDRESS$ -s <password> -p 12489 -v COUNTER -l "\\Processor(_Total)\% Idle Time","Processor Percent Idle Time is %.f"
}

define service{
use                                  generic-service,pnpsvc
servicegroups                 win_2008r2_servers_servicegroup
hostgroup_name            win_2008r2_servers_hostgroup
service_description        Windows 2008R2 Server Processor Percent Idle Time
check_command            Win_2008R2_Processor_Processor_Percent_Idle_Time
notification_interval        0
}

Windows Server CPU Load Format

define command{
command_name Win_2008R2_CPU_Avg
command_line /usr/lib/nagios/plugins/check_nt -H $HOSTADDRESS$ -s <password> -p 12489 -v CPULOAD -l 1,60,95,5,60,95,15,60,95,60,60,95
}

define service{
use                                  generic-service,pnpsvc
servicegroups                 win_2008r2_servers_servicegroup
hostgroup_name            win_2008r2_servers_hostgroup
service_description        Windows 2008R2 Server Processor Averaged Load
check_command            Win_2008R2_CPU_Avg
notification_interval        0
}
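The CPULOAD -l argument above is a flat list of triples: an averaging window in minutes, a warning threshold and a critical threshold, both in percent. The Python sketch below builds that string from explicit triples so the grouping is easier to see. The function name is hypothetical, and the limits it enforces (windows shorter than 24 hours, at most ten triples) are taken from the plugin's help text as I recall it, so treat them as assumptions.

```python
from typing import Iterable, Tuple


def cpuload_arg(windows: Iterable[Tuple[int, int, int]]) -> str:
    """Build the value of -l for check_nt -v CPULOAD.

    Each item is (minutes, warning_percent, critical_percent).
    """
    triples = list(windows)
    if not 1 <= len(triples) <= 10:          # assumed plugin limit
        raise ValueError("expected between 1 and 10 windows")
    parts = []
    for minutes, warn, crit in triples:
        if not 0 < minutes < 24 * 60:        # assumed plugin limit
            raise ValueError("window out of range: %r" % minutes)
        if not 0 <= warn <= crit <= 100:
            raise ValueError("thresholds must satisfy 0 <= warn <= crit <= 100")
        parts.extend(str(value) for value in (minutes, warn, crit))
    return ",".join(parts)


# The 1-, 5-, 15- and 60-minute averages used in the command above,
# each warning at 60 percent and going critical at 95 percent.
arg = cpuload_arg([(1, 60, 95), (5, 60, 95), (15, 60, 95), (60, 60, 95)])
```

With these four triples, `arg` reproduces the string used in the Win_2008R2_CPU_Avg command definition.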



Download Windows Server 2008R2 Nagios/Icinga Command and Service Definitions
Download Windows Domain Controller Nagios/Icinga Command and Service Definitions

Summary of Monitored Services and Counters

Windows Server Services

Distributed Transaction Coordinator (MSDTC)
Group Policy Client (gpsvc)
Netlogon (Netlogon)
Network List Service (netprofm)
Network Location Awareness Service (nlasvc)
Network Store Interface (nsi)
RPC Endpoint Mapper (RpcEptMapper)
Security Accounts Manager (SamSs)
Server Service (LanmanServer)
Event Log Service (eventlog)
Windows Firewall Service (MpsSvc)
Windows Time Service (W32Time)
Workstation Service (LanmanWorkstation)
DNS Cache (Dnscache)

Processor

Current work queue (Processor 0)
Current work queue (Processor 1)
Processor Percent C1 Time
Processor Percent C2 Time
Processor Percent C3 Time
Processor Percent DPC Time
Processor Percent Idle Time
Processor Percent Interrupt Time
Processor Percent Maximum Frequency Time
Processor Percent Priority Time
Processor Percent Privileged Time
Processor Percent Processor Time
Processor Percent User Time
System Context Switches/sec
System Processor Queue Length
CPULOAD (1-, 5-, 15- and 60-minute averages)

Memory

Memory Available MBytes
Memory Free System Page Table Entries
Memory Pages Input/sec
Memory Pages/sec
Memory Pool Nonpaged Bytes
Memory Pool Paged Bytes

Disk

LogicalDisk Avg. Disk sec/Read
LogicalDisk Avg. Disk sec/Write
LogicalDisk Disk Transfers/sec
PhysicalDisk Avg. Disk sec/Read
PhysicalDisk Avg. Disk sec/Write

Network

Network Interface Output Queue Length
Network Interface Packets Outbound Discarded
Network Interface Packets Outbound Errors
Network Interface Packets Inbound Discarded
Network Interface Packets Inbound Errors
Network Interface Bytes Total/sec
Server Bytes Total per sec

Other

Percent Registry Quota in Use
Process IO Data Operations/sec (Typically Disk)
Process IO Other Operations/sec (Typically Disk)
Process IO Read Operations/sec (Typically Disk)
Process IO Write Operations/sec (Typically Disk)

Description of the Command and Service Definitions

Using the check commands requires a working knowledge of both Nagios/Icinga and Windows Performance Monitor.  The included definitions are relatively generic, using the _Total instance instead of specific targets.  For instance, instead of monitoring C:\, D:\, etc. and individual physical RAID arrays, the commands merely monitor total activity, thus:
\\LogicalDisk(_Total)\Avg. Disk sec/Read
\\PhysicalDisk(_Total)\Avg. Disk sec/Read
A more precise definition would replace _Total with the instance name of a specific logical disk or physical array.
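One way to produce such per-instance definitions without hand-editing each one is to generate them. The Python sketch below renders a command definition in the same layout as the COUNTER sample earlier in this article. The helper names, the generated command names and the instance names C: and D: are illustrative assumptions; confirm the real instance names in Performance Monitor on each server.

```python
import re

PLUGIN = "/usr/lib/nagios/plugins/check_nt"


def command_name(prefix, perf_object, instance, counter):
    """Join the parts and collapse anything non-alphanumeric to '_'."""
    raw = "_".join([prefix, perf_object, instance, counter])
    return re.sub(r"[^A-Za-z0-9]+", "_", raw).strip("_")


def counter_command(prefix, perf_object, instance, counter, value_format="%.f"):
    """Render one 'define command' block for a single counter instance.

    value_format is the printf-style format placed in the description;
    "%.f" matches the sample in the article.
    """
    path = "\\\\{}({})\\{}".format(perf_object, instance, counter)
    label = "{} {} {} is {}".format(perf_object, instance, counter, value_format)
    return "\n".join([
        "define command {",
        "command_name " + command_name(prefix, perf_object, instance, counter),
        'command_line {} -H $HOSTADDRESS$ -s <password> -p 12489 '
        '-v COUNTER -l "{}","{}"'.format(PLUGIN, path, label),
        "}",
    ])


# Hypothetical drive letters; "%.3f" keeps sub-second latencies visible.
definitions = [
    counter_command("Win_2008R2", "LogicalDisk", drive,
                    "Avg. Disk sec/Read", value_format="%.3f")
    for drive in ("C:", "D:")
]
print("\n\n".join(definitions))
```

Each generated command still needs a matching service definition, assigned only to the hosts that actually have that drive.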

Network adapter counters must be specified by instance name, exactly as the name appears in Performance Monitor.  Thus, the files used in this configuration define an Intel Desktop NIC:
\\Network Interface(Intel[R] PRO_1000 MT Desktop Adapter)\Output Queue Length
The adapter name must be changed for each specific server.  In fact, it is necessary to write individual check commands and service definitions for each type of adapter and assign them on a host-by-host basis.
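Note that the instance name in the example differs from the adapter's usual description: parentheses appear as square brackets and the slash as an underscore. The Python sketch below applies that substitution to an adapter description and builds the counter path. The substitution table is an assumption inferred from the example above and from how Performance Monitor generally rewrites instance names; the name shown in Performance Monitor on the target server remains the authority.

```python
# Assumed character substitutions applied to adapter descriptions
# when they are used as Performance Monitor instance names.
_SUBSTITUTIONS = str.maketrans({
    "(": "[",
    ")": "]",
    "#": "_",
    "/": "_",
    "\\": "_",
})


def perfmon_instance_name(adapter_description: str) -> str:
    """Convert an adapter description into its likely instance name."""
    return adapter_description.translate(_SUBSTITUTIONS)


def network_counter_path(adapter_description: str, counter: str) -> str:
    """Build a Network Interface counter path for check_nt -v COUNTER."""
    instance = perfmon_instance_name(adapter_description)
    return "\\\\Network Interface({})\\{}".format(instance, counter)


path = network_counter_path(
    "Intel(R) PRO/1000 MT Desktop Adapter", "Output Queue Length"
)
print(path)
```

For the Intel adapter used in this configuration, the generated path matches the counter shown above.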

Windows Domain Controller Monitoring and Trend Analysis

The format and use of the command and service definitions are the same as those described above.

Summary of Monitored Services and Counters

Windows Server Services

DNS Server (DNS)
Intersite Messaging Service (IsmServ)
Kerberos Key Distribution Center (kdc)

Windows Server Domain Controller (NTDS) Counters

NTDS Address Book Client Sessions
NTDS ATQ Estimated Queue Delay
NTDS ATQ Outstanding Queued Requests
NTDS ATQ Request Latency
NTDS DRA Pending Replication Synchronizations
NTDS DRA Inbound Full Sync Objects Remaining
NTDS DRA Outbound Values (DNs only) per sec
NTDS DS Threads in Use
NTDS DS Directory Reads per sec
NTDS DS Directory Writes per sec
NTDS DS Name Cache hit rate
NTDS DS Notify Queue Size
NTDS LDAP Active Threads
NTDS LDAP Bind Time
NTDS LDAP Client Sessions
NTDS LDAP Successful Binds per sec
NTDS LDAP Searches per sec
NTDS NTLM Binds per second
NTDS SAM Account Group Evaluation Latency
NTDS SAM Enumerations per sec
NTDS Simple Binds per sec

The configuration files are straightforward.

Configuration and CPU Stress Test Demonstration

The two videos below illustrate the configuration files (as viewed through NConf) and a CPU stress test with a warning alert and graphing of relevant counters.







