Stephen Fritz on Systems Engineering: June 2014

Wednesday, June 25, 2014

Zabbix Templates for Windows 2008 R2 OS and Domain Controllers

This article describes how to build Zabbix templates for Windows Server and Application enterprises. Zabbix Template items must be unique because duplicate items may not be assigned to the same host. In an enterprise Windows environment, a thorough understanding of the Operating System and Applications is required to successfully define templates for automated deployment.

Zabbix Graph of Windows Physical/Logical Disk Performance Counters

Zabbix Template Formats

Applications

Applications are logical groupings of Items. These should reflect physical and logical aspects of the device monitored. For Windows Server, this includes hardware (e.g. Disk, Processor, Network and Memory) and software (e.g. Services and Processes).

Items

Items are individual, measurable counters that may be retrieved by querying the OS or Zabbix Agent. The administrator may easily define time-weighted average sampling to smooth "peaks" and "valleys" in the calculated values. A sampling rate (frequency) and period (time) define a time-weighted average. For instance, if the sampling period is defined as 300 seconds and the rate at one sample per 30 seconds, a time-weighted average value for each stored Item value consists of the mean of ten samples.
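The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the averaging, not Zabbix code; the function names and the sample readings are hypothetical, and equal spacing between samples is assumed so that the time-weighted average reduces to a plain mean.

```python
from statistics import mean


def samples_per_period(period_s: int, interval_s: int) -> int:
    """Number of samples that fall inside one averaging period."""
    if interval_s <= 0 or period_s < interval_s:
        raise ValueError("period must be at least one sampling interval")
    return period_s // interval_s


def averaged_value(samples, period_s: int = 300, interval_s: int = 30):
    """Mean of the most recent samples covering one period (equal spacing assumed)."""
    n = samples_per_period(period_s, interval_s)
    window = list(samples)[-n:]
    return mean(window)


# Hypothetical counter readings taken every 30 seconds; one of them is a short peak.
readings = [10, 12, 95, 11, 9, 10, 13, 12, 10, 8]

print(samples_per_period(300, 30))  # samples per stored value
print(averaged_value(readings))     # the peak of 95 is smoothed into the mean
```

With a 300 second period and a 30 second interval, each stored value is the mean of ten samples, so a single spike moves the stored value far less than it would move a raw reading.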

Triggers

Triggers are counter values that represent degraded conditions. Zabbix provides the following classifications of Trigger values:

Not Classified
Information
Warning
Average
High
Disaster

Each Trigger also has administratively assigned fields for Description and URL. These are very useful: the administrator may assign the URL of a reliable source for the Trigger (e.g. the Microsoft TechNet Performance Monitor references for descriptions and thresholds) and a brief description copied from it.

Graphs

Graphs are the primary visualization for collected data and the feature at which Zabbix excels. Multiple, related items may be assigned to a single graph (e.g. Processor Idle % and Processor Utilization %) for easy comparison. Graphs should be grouped according to the types of data present (e.g. rates in counters/sec for one graph, percentages in a second, total counters in a third, etc.).

Zabbix graphs, unlike those generated by many RRDTool implementations, are generated just-in-time, saving resources. That is, without a CGI implementation, RRDTool typically generates all graphs simultaneously using cron jobs. For examples, see this article on Munin and this article on Icinga/Nagios graphing. Zabbix only generates graphs when called upon by the administrator, thus saving much of the periodic processing time a cron job implementation requires.

Screens

Screens are logical groupings of graphs and may be designed to present a quick overview of a partial or entire subsystem (e.g. several graphs of processor counters that present the entire set of collected processor data).

The Importance of Defining Appropriate Sets of Zabbix Template Information for Windows Enterprises

Windows 2008 R2 Operating System Roles, Role Services and Features

Zabbix will not allow you to assign Templates to a host if there are duplicates in the Templates. Thus, if Template A defines an item "perf_counter["\Processor Information(_Total)\% Processor Time",300]" and Template B defines the same item, Zabbix will only allow you to assign one of the Templates. It is critical to design Templates in advance so that Item, Trigger, Graph and Screen names are not duplicated. For a Windows Enterprise, the author has selected a framework based upon the overall architecture of the Operating System, its Services and Server Applications. Yet even then it is more complex than just those three general groups.
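One way to catch such collisions before assigning Templates is to compare item keys across the planned set. The Python sketch below is a hypothetical planning aid, not part of Zabbix: it assumes each Template's item keys have already been collected into a plain dictionary, and the second key in "Template B" is invented purely for illustration.

```python
from collections import defaultdict


def find_duplicate_items(templates):
    """Return {item_key: [template names]} for keys defined in more than one template."""
    owners = defaultdict(set)
    for template_name, item_keys in templates.items():
        for key in item_keys:
            owners[key].add(template_name)
    return {key: sorted(names) for key, names in owners.items() if len(names) > 1}


# Hypothetical planning data: template name -> item keys it would define.
templates = {
    "Template A": [
        r'perf_counter["\Processor Information(_Total)\% Processor Time",300]',
    ],
    "Template B": [
        r'perf_counter["\Processor Information(_Total)\% Processor Time",300]',
        r'perf_counter["\Memory\Pages/sec",300]',
    ],
}

duplicates = find_duplicate_items(templates)
for key, names in duplicates.items():
    print(f"{key} is defined in: {', '.join(names)}")
```

Any key reported here would need to live in exactly one Template before both Templates could be linked to the same host.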

A default Windows Server installation only installs the software necessary to operate as a server. Windows then provides "Roles" and "Features" to provide specific additional functionality. For instance, a Server may be assigned the "Active Directory Domain Services" and "DNS Server" roles that include one set of functionality while another may be assigned the "Web Server (IIS)" role that includes a different set. Role Services and Features may also be installed, further adding to the complexity of defining sets of monitored information. Flowing from the definitions, the fundamental hardware and OS data may be collected by a Template "Windows Server 2008 R2." Additional Templates then define "Active Directory Domain Services," "DNS Server" and "Web Server (IIS)" Roles and Features.

The templates for each Role/Role Service/Feature set should monitor the availability of services using both the Zabbix "service_state" check to query the OS and "net.tcp.listen" (or variant thereof) to query the NIC TCP service(s) in question. For instance, you may check the OS Service state "NTDS" and associated TCP services on ports 389 (LDAP), 636 (LDAPS) and 464 (Kerberos Password) to determine if services necessary for authentication are available. Service state queries may have multiple triggers that depend on the state returned by Windows (Up, Down, Restarting, etc.) and TCP services a single trigger for Up or Down. Templates also collect fundamental Performance Counters.

There is still another set of definitions for Templates -- information important for day-to-day monitoring and information important for troubleshooting, trend analysis and scalability design. The second set includes primarily Performance Counters. Thus, for each set of Windows data collected, there is a standard set of data including OS service checks, TCP service checks and Performance Counters and a second, more detailed set that includes primarily Performance Counters. Flowing from the examples above, the "Windows 2008 R2 Server" Template would provide important day-to-day triggers and performance information while a second, "Windows 2008 R2 Performance Counters" provides highly detailed data for troubleshooting, trend analysis, etc.

Windows Application Servers

The Windows Operating System (with its Roles, Role Services and Features) is also a platform for additional Servers, such as the Exchange E-Mail and Collaboration Server and SQL Database Server. These servers provide common core software and specialized role-based software; the modular aspect of Windows Application Servers provides fault-tolerance and scalability. For example, Exchange may be deployed on a single server with all roles and services for small-business environments or may be deployed on many servers with roles (individually or in combination) of Mailbox, Hub Transport, Edge Transport, Client Access and Unified Messaging for large enterprises. Microsoft provides design guidelines for large Exchange 2010 deployments here and for large Exchange 2013 deployments here.

Template designs for Windows Application Servers must also reflect common and specific sets of data in much the same way as (for example) "Windows Server 2008 R2" is common core functionality and "Active Directory Domain Services" is specific to a role. Windows Application Server Templates thus defined must also provide both the "Core" day-to-day Items, Triggers and Performance Counters and the less commonly used Performance Counters set of information.

Conclusion

Avoiding duplicate definitions that may lead to the inability to assign multiple Templates to a Zabbix host requires planning and a thorough understanding of Windows Server, Roles, Role Services, Features and Applications design. A thorough understanding of the critical core Items versus specialized troubleshooting ones is also very important to provide both day-to-day functionality and less-commonly used troubleshooting, trend analysis, and scalability design information.

Using the above guidelines, we may define an (incomplete) example set of Templates to be implemented for the Windows Operating System:

Windows Server 2008 R2
Windows Server 2008 R2 Performance Counters
DNS Server
DNS Server Performance Counters
Active Directory Domain Services
Active Directory Domain Services Performance Counters
Web Server (IIS)
Web Server (IIS) Performance Counters

We may also define an (incomplete) set of Templates to be implemented for the Exchange Server 2010 Application Server:

Exchange Server Common
Exchange Server Common Performance Counters
Exchange Hub Transport Server
Exchange Hub Transport Server Performance Counters
Exchange Mailbox Server
Exchange Mailbox Server Performance Counters
Exchange Client Access Server
Exchange Client Access Server Performance Counters

Each Template contains (where available and appropriate):

Item and Trigger Descriptions
Time-weighted Item collection (generally a 30 second collection interval averaged over a 300 second collection period)
Trigger URL references to the source reference page for the definition
Graphs of individual and sets of items
Screens of Application Groups that collect various screens into logical and complete visualization sets

Tuesday, June 24, 2014

Automated Zabbix Deployment and Configuration for Windows Enterprises

Deploying Zabbix in the Windows Enterprise consists of three automated tasks:

Deploying the Zabbix Windows Agent and configuration file through Active Directory Group Policy Objects (GPOs)
Configuring Windows for the Zabbix Agent through GPOs to collect data and
Configuring Zabbix Discovery of Windows Servers and Services

All of these tasks may be automated for enterprise deployments using Active Directory and the Zabbix Server. This document shall describe these tasks in detail.

Deploying the Windows Zabbix Agent through Active Directory GPOs

The Zabbix Windows Agent is deployed from a Microsoft Software Installer (.msi) package available and documented here. The package provides everything required to install a default Zabbix agent. For the purposes of this article, all configurations will be applied to the Default Domain Policy Organizational Unit (OU). The OU(s) utilized in a production environment will vary depending upon the structure of the actual Active Directory domain.

Save the .msi file to a shared directory. Then, open the Group Policy Management Console and edit the policy for the selected OU. The package is then selected for deployment by its shared path (<Server>\<Share>) under the Computer Configuration > Policies > Software Settings > Software Installation policy. Upon rebooting, each server to which the GPO is applied will then install the agent.

Deploying the Zabbix Windows Agent Configuration File through Active Directory GPOs

The default configuration file -- c:\program files\zabbix agent\zabbix_agentd.conf -- deployed above is not customized for a production environment. A configuration file defining the live deployment must be deployed to each node before the agent may communicate with the server.

At a minimum, the following items should be configured in the customized agent .conf file:

LogFile=<Path and Name of Zabbix Agent Log File>
Server=<IP Address of Zabbix Server>
ServerActive=<IP Address of Zabbix Server>
EnableRemoteCommands=1
LogRemoteCommands=1
HostnameItem=system.hostname

These settings configure the agent to log to a defined location, communicate with a specified Zabbix server, receive and log remote commands and, finally, report the system NetBIOS name as the host name.
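When the same file must be produced for more than one environment, the minimum settings listed above can be generated rather than edited by hand. The following Python sketch is one possible approach; the function name, the IP address (taken from the 192.0.2.0/24 documentation range) and the log path are placeholders, not values from the original deployment.

```python
def render_agent_conf(server_ip: str, log_file: str) -> str:
    """Build the minimal zabbix_agentd.conf text from the settings listed above."""
    settings = [
        ("LogFile", log_file),
        ("Server", server_ip),
        ("ServerActive", server_ip),
        ("EnableRemoteCommands", "1"),
        ("LogRemoteCommands", "1"),
        ("HostnameItem", "system.hostname"),
    ]
    return "\n".join(f"{name}={value}" for name, value in settings) + "\n"


# Placeholder values for illustration only.
conf_text = render_agent_conf("192.0.2.10", r"C:\zabbix_agentd.log")
print(conf_text)
```

The resulting text can be saved as zabbix_agentd.conf on the share from which the GPO distributes the file.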

Windows Service Discovery using the Zabbix Agent

Describing the variety of agent configuration options available is beyond the scope of this article. However, one other option is used:

UserParameter=services.NTDS,net start NTDS

This option defines a Zabbix Agent parameter "services.NTDS" that issues the shell command "net start NTDS." The shell command attempts to start the Active Directory Domain Service. If the host server is not a Domain Controller, it replies with an error; if it is a Domain Controller, it will reply that the service is already started.

The services.NTDS Zabbix Agent parameter thus defined provides Windows Service Discovery, a feature lacking in Zabbix at the time of writing. There are other methods of Windows Service Discovery; however, this one is chosen for simplicity and ease of configuration and deployment through GPOs.

Once the configuration file is prepared, it is placed in a shared folder. It is deployed by editing the GPO's Computer Configuration > Preferences > Windows Settings > Files option. Specify the network path (<server>\<share>) to the file and configure it to Replace within the c:\program files\zabbix agent folder.

Updating the Zabbix Configuration File from Active Directory

Configuring Windows Firewall for Zabbix Agent-Server communications through Active Directory GPOs

Edit the GPO's Computer Configuration > Policies > Windows Settings > Security Settings > Windows Firewall with Advanced Security > Inbound Rules to install a Firewall Rule. The rule must allow TCP port 10050 Inbound. It may be more restrictive, but that is the minimum required.

Configuring Windows Firewall for the Zabbix Agent with Active Directory

Configuring a Registry Key to Collect Windows Database Advanced Counters

While not necessary for a basic deployment, there are circumstances in which collecting advanced ESENT (database) counters is helpful. An example will be reviewed in an article on monitoring Exchange Mailbox Servers.

Configuring Advanced Counters support requires a registry edit. On the Domain Controller, open the Registry Editor (Run regedit.exe) and navigate to \HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ESENT\Performance. Add a DWORD Value named "Show Advanced Counters" and set its value to "1."

Creating the Registry Key for Zabbix to Read Windows Database Advanced Counters

Then, edit the GPO's Computer Configuration > Preferences > Windows Settings > Registry and select the local Show Advanced Counters Value configured above.

Selecting the Registry Key for Windows Database Advanced Counters Deployment

Finally, set the Action to Replace.

Defining the Registry Key Deployment Properties for Windows Database Advanced Counters

Configuring Zabbix Autodiscovery of the Windows Operating System and Defined Services

Although the Windows Servers are now configured to communicate with the Zabbix Server, the server itself does not have any recognized nodes. Manual configuration is impractical in an Enterprise, so Zabbix provides Autodiscovery. The video at the bottom of the page illustrates a general configuration case. The following sections illustrate how to configure Zabbix for Operating System and Service Autodiscovery of Windows Servers.

Configuring Zabbix Autodiscovery of the Windows Operating System

The first step is to define a Discovery Rule. From the Zabbix Web Interface, select the Configuration tab and Discovery item. Then, create a Discovery Rule "Windows Server," specify an IP address range and define the Check Zabbix agent "system.uname" to return a verbose operating system description from discovered agents.

Configuring Zabbix Windows OS Autodiscovery

Next, select the Actions item and define an Action. Under the Conditions tab, choose the previous Discovery Rule name and define the Received Value "Like" and "Windows." This condition looks for the word "Windows" in the returned verbose operating system description and evaluates to True if "Windows" appears in it. Also define the Discovery Status as "Up" and the Service Type "Zabbix Agent."

Configuring Zabbix Windows OS Action Conditions

Finally, define Operations for the Action. Define the actions "Add Host," "Add to Host Group," and "Link to Template," to add the discovered node to a predefined host group and template. You may also wish to add the operation Remove from host group "Discovered Hosts."

Templates for Windows Server are included with the default Zabbix installation. You may also customize or import templates.

Configuring Zabbix Windows OS Action Operations

Configuring Zabbix Autodiscovery of the Windows Domain Controller Service

The item "UserParameter=services.NTDS,net start NTDS" deployed in the Zabbix Agent configuration file now comes into play. Service Autodiscovery is no more complicated than the built-in OS Autodiscovery described above.

Once again, the first step is to define a Discovery Rule. From the Zabbix Web Interface, select the Configuration tab and Discovery item. Then, create a Discovery Rule "Windows Domain Controller," specify an IP address range and define the Check Zabbix agent "services.NTDS" to return a verbose Service response.

Next, select the Actions item and define an Action. Under the Conditions tab, choose the previous Discovery Rule name and define the Received Value "Like" and "already been started." This condition looks for the phrase "already been started" in the returned verbose service response and evaluates to True if the phrase appears in it. Also define the Discovery Status as "Up" and the Service Type "Zabbix Agent."
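The logic of the "Like" condition can be illustrated with a simple substring test. The Python sketch below is not Zabbix code; the function name is hypothetical, the match is assumed to be case-sensitive, and the two sample responses are only illustrative of what "net start" typically prints (the exact wording may differ by Windows version and language).

```python
def received_value_like(received: str, pattern: str) -> bool:
    """Approximate a 'like' condition as a case-sensitive substring match (assumption)."""
    return pattern in received


# Illustrative responses only; actual text depends on the Windows version and locale.
dc_response = "The requested service has already been started."
member_response = "The service name is invalid."

print(received_value_like(dc_response, "already been started"))      # Domain Controller
print(received_value_like(member_response, "already been started"))  # not a Domain Controller
```

Only hosts whose response contains the phrase satisfy the condition, so only Domain Controllers receive the operations defined for this Action.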

At this point, the Windows Servers and Zabbix Server are configured to automatically deploy all required software and settings and perform Autodiscovery. Simply enable the Autodiscovery and Actions items and all discovered servers will be added as hosts with host groups and templates applied.

Enabling Zabbix Windows Server and Domain Controller Autodiscovery

This demonstration includes five Windows 2008 R2 Servers, one of which is a Domain Controller. As shown below, Zabbix discovers these hosts and adds them to the proper groups. However, it recognizes neither the NetBIOS nor the DNS names and adds the servers by IP address. This is an ongoing problem and requires manual updates for each host.

Zabbix after discovering Windows 2008 R2 Servers and Domain Controller

Thursday, June 19, 2014

Windows Server Performance Monitoring with Munin

Using Munin to collect Windows Server Performance Counters comes down to this: knowing the important Windows Performance Monitoring counters and adding them to the munin-node.ini file in the proper format. Below is the Processor\% C1 Time counter (a low-power state) and the .ini file entry that produces the graph.

[PerfCounterPlugin_Processor%C1Time]
Object=Processor Information
Counter=% C1 Time
GraphTitle=% C1 Time
GraphCategory=processor
DropTotal=1
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000

That's all it takes to produce daily, weekly, monthly and annual graphs for a counter. Lather, rinse and repeat as often as you like.

Installing the Windows Munin Agent

This topic is covered in another Munin article. At the time of writing, the installation package is available here.

Configuring the Windows Munin Agent

Windows Performance Monitoring commands are defined in the agent munin-node.ini file, not on the server. Each counter is defined by a set of commands that define three things: name, Windows Performance Counter and graph display properties.

Name

[PerfCounterPlugin_<name>]

is the format used. Simply insert a name that adequately describes the counter.

Windows Performance Counter

Object=<windows counter group object>
Counter=<windows counter item>
DropTotal=<0 or 1>

The first two entries define the counter to be collected. Munin does not require quotes or special formatting to interpret the counters. Needless to say, Windows provides hundreds of individual counters from which to choose. The most difficult part of the process is selecting a set of counters that adequately monitors all of the important subsystems of interest. For instance, hardware limitations include Processor Item, Memory, Disk and Network counters. Server limitations (e.g. Microsoft SQL Server) also include application layer counters that define configuration errors.

DropTotal instructs the agent to drop the last counter when a set of items appear under a counter. For instance, in a multiprocessor server, the individual items include Processor 0, Processor 1, Processor 2... and finally Processor _Total. The last item Windows collects (Processor _Total in this case) may or may not be of interest. If you do not want to see this item, add DropTotal=1. Otherwise the default (DropTotal=0) will collect and display that information.
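The effect of DropTotal can be pictured as removing the last entry from the list of instances. The short Python sketch below only models that behaviour as described above; it is not the agent's code, and the function name and instance list are hypothetical.

```python
def apply_drop_total(instances, drop_total: int):
    """Drop the last instance (often _Total) when drop_total is 1; otherwise keep all."""
    items = list(instances)
    if drop_total and items:
        return items[:-1]
    return items


# Hypothetical instance list for a three-processor server.
processors = ["0", "1", "2", "_Total"]

print(apply_drop_total(processors, 1))  # per-processor instances only
print(apply_drop_total(processors, 0))  # includes _Total
```

Because the last instance is dropped by position rather than by name, the setting is only useful where the final instance really is the aggregate.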

Graph Display Properties

GraphTitle=<name>
GraphCategory=<name>
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000

GraphTitle instructs rrdtool to add a title item to the graph. GraphCategory instructs Munin how to group sets of graphs. For a Windows Server performance analysis, appropriate GraphCategory groups include processor, memory, disk, system and network. The graphs are then sorted alphabetically by GraphCategory then GraphTitle.

GraphDraw instructs rrdtool how to display and group items. A counter that collects only one parameter may be displayed as a LINE (a single line on the graph) or an AREA (a line filled down to zero). The illustration below depicts the difference between LINE and AREA. You may also use LINESTACK or AREASTACK when multiple items occur for each counter (as in Processor above). These definitions sequentially stack items one on top of the previous. The difference between the two is whether or not the area between them is filled.

GraphArgs supplies rrdtool with additional instructions. There are many available, and the full list of rrd graph arguments is documented here. If you want the y-axis to always start at 0, specify --lower-limit=0. If counters are in percent and you always want the greatest y-axis value to be 100, specify --upper-limit=100.

CounterFormat defines the format of the numerical counter and may be either int (integer), double or large (int64).
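Since only three CounterFormat values are listed above, a long munin-node.ini can be checked for anything else before it is deployed. The Python sketch below uses the standard configparser module for this; the function name and the two sample sections are hypothetical, and how the agent itself treats an unlisted value is not covered here.

```python
import configparser

ALLOWED_FORMATS = {"int", "double", "large"}


def invalid_counter_formats(ini_text: str) -> dict:
    """Return {section: value} for PerfCounterPlugin sections with an unlisted CounterFormat."""
    # Interpolation is disabled because counter names contain literal '%' characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    problems = {}
    for section in parser.sections():
        if not section.startswith("PerfCounterPlugin_"):
            continue
        value = parser.get(section, "CounterFormat", fallback=None)
        if value is not None and value not in ALLOWED_FORMATS:
            problems[section] = value
    return problems


# Hypothetical sections for illustration.
sample = """
[PerfCounterPlugin_Example%IdleTime]
Object=Processor Information
Counter=% Idle Time
CounterFormat=double

[PerfCounterPlugin_ExampleQueueLength]
Object=Network Interface
Counter=Output Queue Length
CounterFormat=integer
"""

print(invalid_counter_formats(sample))
```

Here the second section is reported because "integer" is not one of int, double or large.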

CounterMultiply specifies a scaling factor. For instance, system Uptime is reported in seconds; to change the value to days, multiply seconds by 1.1574074074074073e-005.
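The scaling factor quoted above is simply the reciprocal of the number of seconds in a day. A brief Python check, with hypothetical names, shows where the value comes from and how it converts an uptime reading:

```python
SECONDS_PER_DAY = 86400

# Multiplying by 1/86400 is the same as dividing seconds by the length of a day.
COUNTER_MULTIPLY = 1 / SECONDS_PER_DAY


def uptime_days(uptime_seconds: float) -> float:
    """Scale an uptime reading in seconds to days, as CounterMultiply would."""
    return uptime_seconds * COUNTER_MULTIPLY


print(COUNTER_MULTIPLY)     # the value to place in CounterMultiply
print(uptime_days(172800))  # two days' worth of seconds
```

The same approach gives a factor for any other unit change, for example 1/3600 to report hours.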

Example Munin Analysis of a Windows Memory Stress Test

This stress test utilizes HeavyLoad, a simple-to-configure application for processor, memory, disk read and disk write tests.

The two graphs above display information indicative of excessive memory load: Pages/sec and Total Page File % Usage. Once physical memory (RAM) is full of data, Windows uses the Page File (the equivalent of Linux swap space) to store additional data in demand by the processor. Pages/sec measures the rate at which pages are read from or written to disk to resolve hard page faults; Total Page File % Usage measures how much of the page file space is used for storing data.

This excessive memory usage is also manifest in processor utilization, depicted in the graphs above. At low utilization, the processor is in a high Idle state and also using a low-power state -- C1 -- to conserve energy.

The remaining counters will typically increase with load. DPC is a lower-priority deferred processing queue. User, privileged and priority time display the types of processes consuming CPU cycles; Processor time is analogous to total individual processor utilization.

The video below depicts a basic review of the stress test counters for this experiment.

List of Important Windows Server System Performance Counters

The following is the template for Windows Server 2008 R2 Performance Counters (munin-node.ini).

[Plugins]
; Plugin Section, 1 enables plugin, 0 disables
Disk=1
Memory=0
Processes=0
Network=0
MbmTemp=1
MbmVoltage=1
MbmFan=1
MbmMhz=1
SMART=0
HD=0
Cpu=0
SpeedFan=1
External=1

[DiskPlugin]
; Default Warning and Critical values for % space used
Warning=92
Critical=98

[ExternalPlugin]
; For External Plugins just add an entry with the path to the program to run
; It doesn't matter what the name of the name=value pair is
Plugin01=C:\Users\Jory\Documents\Visual Studio Projects\munin-node\src\plugins\python\disk_free.py

[PerfCounterPlugin_disktime]
DropTotal=1
Object=LogicalDisk
Counter=% Disk Time
CounterFormat=double
CounterMultiply=1.000000
GraphTitle=Disk Time
GraphCategory=disk
GraphArgs=--base 1000 -l 0
GraphDraw=LINE

[PerfCounterPlugin_uptime]
; This is a section for the Performance Counter plugin
; The Object and Counter settings are used to access the Performance Counter
; For uptime this would result in \System\System Up Time
; The Graph settings are reported to munin
; The DropTotal setting will drop the last instance from the list, which is often _Total
; Has no effect on single instance counters (Uptime)
; The CounterFormat setting controls what format the counter value is read in as a double, int, or large (int64).
; The plugin always outputs doubles, so this shouldn't have that much effect
; The CounterMultiply setting sets a value the counter value is multiplied by, use it to adjust the scale
; 1.1574074074074073e-005 is the result of(1 / 86400.0), the uptime counter reports seconds and we want to report days.
; So we want to divide the counter value by the number of seconds in a day, 86400.
Object=System
Counter=System Up Time
GraphTitle=Uptime
GraphCategory=system
GraphDraw=AREA
GraphArgs=--base 1000 -l 0
DropTotal=0
CounterFormat=large
CounterMultiply=1.1574074074074073e-005

[SpeedFanPlugin]
;\System\Threads
;------------------------------------------------------------------------------
BroadcastIP=192.168.0.255
UID=FF671100

[PerfCounterPlugin_Threads]
Object=System
Counter=Threads
GraphTitle=Number of Threads
GraphCategory=System
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0 --upper-limit 100
DropTotal=0
CounterFormat=int
CounterMultiply=1.000000

[PerfCounterPlugin_ErrorSystem]
Object=Server
Counter=Errors System
GraphTitle=Errors System
GraphCategory=System
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0 --upper-limit 100
DropTotal=0
CounterFormat=int
CounterMultiply=1.000000
CounterType=DERIVE

[PerfCounterPlugin_MemoryAvailableMBytes]
Object=Memory
Counter=Available Bytes
GraphTitle=Memory Available Bytes
GraphCategory=Memory
GraphDraw=AREA
GraphArgs=--base 1024 --lower-limit 0
DropTotal=0
CounterFormat=large
CounterMultiply=1.000000

[PerfCounterPlugin_PageingFileUsage]
Object=Paging File
Counter=% Usage
GraphTitle=Paging File(_Total) % Usage
GraphCategory=Memory
GraphDraw=AREA
GraphArgs=--base 1000 --lower-limit 0 --upper-limit 100
DropTotal=1
CounterFormat=int
CounterMultiply=1.000000

[PerfCounterPlugin_PageFaultsSec]
Object=Memory
Counter=Page Faults/sec
GraphTitle=Page Faults/sec
GraphCategory=Memory
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0 --upper-limit 100
DropTotal=0
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_PagesSec]
Object=Memory
Counter=Pages/sec
GraphTitle=Pages/sec
GraphCategory=Memory
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=0
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_PageInputsSec]
Object=Memory
Counter=Page Input/sec
GraphTitle=Page Input/sec
GraphCategory=Memory
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=0
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_CacheBytes]
Object=Memory
Counter=Cache Bytes
GraphTitle=Cache Bytes
GraphCategory=Memory
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=0
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_PhysicalDiskSecRead]
Object=PhysicalDisk
Counter=Avg. Disk sec/Read
GraphTitle=PhysicalDisk(_Total) Avg. Disk sec/Read
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_PhysicalDiskSecWrite]
Object=PhysicalDisk
Counter=Avg. Disk sec/Write
GraphTitle=PhysicalDisk(_Total) Avg. Disk sec/Write
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_FileReadOpSec]
Object=System
Counter=File Read Operations/sec
GraphTitle=File Read Operations/sec
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_FileWriteOpSec]
Object=System
Counter=File Write Operations/sec
GraphTitle=File Write Operations/sec
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_DiskWriteBytes]
Object=PhysicalDisk
Counter=Avg. Disk Bytes/Write
GraphTitle=Avg. Disk Bytes/Write
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_DiskReadBytes]
Object=PhysicalDisk
Counter=Avg. Disk Bytes/Read
GraphTitle=Avg. Disk Bytes/Read
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_IOReadOpSec]
Object=Process
Counter=IO Read Operations/sec
GraphTitle=IO Read Operations/sec
GraphCategory=processes
GraphDraw=AREASTACK
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_IOWriteOpSec]
Object=Process
Counter=IO Write Operations/sec
GraphTitle=IO Write Operations/sec
GraphCategory=processes
GraphDraw=AREASTACK
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_IODataOpSec]
Object=Process
Counter=IO Data Operations/sec
GraphTitle=IO Data Operations/sec
GraphCategory=processes
GraphDraw=AREASTACK
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_IOOtherOpSec]
Object=Process
Counter=IO Other Operations/sec
GraphTitle=IO Other Operations/sec
GraphCategory=processes
GraphDraw=AREASTACK
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_Processor%IdleTime]
Object=Processor Information
Counter=% Idle Time
GraphTitle=% Idle Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%InterruptTime]
Object=Processor Information
Counter=% Interrupt Time
GraphTitle=% Interrupt Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%MaximumFrequencyTime]
Object=Processor Information
Counter=% Maximum Frequency Time
GraphTitle=% Maximum Frequency Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%PriorityTime]
Object=Processor Information
Counter=% Priority Time
GraphTitle=% Priority Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%PrivilegedTime]
Object=Processor Information
Counter=% Privileged Time
GraphTitle=% Privileged Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%UserTime]
Object=Processor Information
Counter=% User Time
GraphTitle=% User Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%C1Time]
Object=Processor Information
Counter=% C1 Time
GraphTitle=% C1 Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%DPCTime]
Object=Processor Information
Counter=% DPC Time
GraphTitle=% DPC Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceOutputQueueLength]
Object=Network Interface
Counter=Output Queue Length
GraphTitle=Output Queue Length
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=int
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceOutboundDiscarded]
Object=Network Interface
Counter=Packets Outbound Discarded
GraphTitle=Packets Outbound Discarded
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceOutboundErrors]
Object=Network Interface
Counter=Packets Outbound Errors
GraphTitle=Packets Outbound Errors
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceReceivedDiscarded]
Object=Network Interface
Counter=Packets Received Discarded
GraphTitle=Packets Received Discarded
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceReceivedErrors]
Object=Network Interface
Counter=Packets Received Errors
GraphTitle=Packets Received Errors
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceBytesTotal/sec]
Object=Network Interface
Counter=Bytes Total/sec
GraphTitle=Bytes Total/sec
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceBytesSent/sec]
Object=Network Interface
Counter=Bytes Sent/sec
GraphTitle=Bytes Sent/sec
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceBytesReceived/sec]
Object=Network Interface
Counter=Bytes Received/sec
GraphTitle=Bytes Received/sec
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

Wednesday, June 11, 2014

Exchange Client Access Server Stress Test -- Zabbix Performance Counters

This simple example of an Exchange stress test uses Zabbix to collect performance counters and identify the bottlenecks on a Client Access Server. Zabbix provides a fast and effective setup because, once it is installed, one simply uploads pre-formatted templates to record data and present it as screens and graphs.

The test environment consists of two Mailbox Servers, one Hub Transport Server and one Client Access Server, which are stressed from a member server using the Exchange 2010 Load Generator in Outlook 2007 Online mode. Each Exchange Server has a configured Zabbix Agent.

Outlook 2007 Stress Test Topology

The Exchange Topology is illustrated below. Each server is installed on an Oracle VirtualBox VM with two Intel i5 processors and 768 MB RAM. While not an ideal environment -- due to a shared SATA drive and limited RAM -- it suffices to show how Zabbix collects Windows Performance Counters and displays them in pre-configured graphs and screens designed for parameters specific to Windows Server and Exchange Hub Transport Server.

This link provides the Exchange 2010 LoadGen Manual. This link is the Exchange 2010 LoadGen Download. Finally, this link documents Exchange 2010 Client Access Server Performance Counters.

The illustration below is an overview of processor loads on the four servers during an Outlook 2007 stress test. EXCHANGE01 is the target Mailbox and EXCHANGE04 is the Client Access Server. These both show heavy CPU loads while the Hub Transport and second Mailbox Servers show very low loads.

Processors on All Four Servers During an Outlook Stress Test

Detailed Stress Test Analysis

The video below illustrates the test and Zabbix data presentation. As expected, the test environment is hardware-limited. The data review at the end of the video illustrates:

Disk performance is the limiting factor. This is no surprise considering six VMs and a host share access to a single SATA drive.
Processor load, while high, is not critically so. However, if the test environment's disk subsystem were upgraded, two Intel i5 processors would soon be a bottleneck.
The hardware limitations (and likely the Exchange 2010 LoadGen software) do not allow realistic tests of Outlook 2007 loads on the Exchange hardware-software system.

The reader is encouraged to review the video for a detailed data presentation. The Zabbix Templates have not been fully validated and formatted. However, the templates may be downloaded for testing.

Links to Exchange Server Zabbix Templates.

Sunday, June 8, 2014

Creating a Testing Environment with Oracle VirtualBox on Ubuntu 13.04 Desktop

Testing new systems is necessary before deployment, but maintaining the hardware to do so is often impractical. Fortunately, virtual systems are available to build test environments. This article describes building Oracle VirtualBox systems on Ubuntu 13.04 desktop. Ultimately, there will be a four-office network connected by routers. There will be web, e-mail and database services, a backup and a systems monitoring strategy. However, the basics come first, and that requires a brief review of installing operating systems.

The Host laptop for this testing environment:

Four-core, 2.6 GHz Intel i5 processor
6 GB RAM
500 GB SATA Hard Disk with 6 GB swap space
Ubuntu 13.04 Desktop
Lubuntu Desktop (an Ubuntu LXDE configuration)
Oracle VirtualBox 4.2.10_Ubuntu r84101

The host laptop shall also act as one of the offices (Coudersport). The virtual machines will be configured to model three offices -- Philadelphia, Harrisburg and Pittsburgh -- connected by routers and redundant WAN links. The following illustrates and describes creating a monitored Debian Wheezy SAN device.

First, start the VirtualBox management application from the desktop.

The host laptop will act as the central point through which all traffic is routed to the Internet. From "File > Preferences", configure three Host Only network adapters:

Interface vboxnet0: Address 10.0.0.1, Netmask 255.255.255.0 (connected to the eventual Philadelphia router)
Interface vboxnet1: Address 10.100.0.1, Netmask 255.255.255.0 (acting as a router loopback address)
Interface vboxnet2: Address 10.0.0.1, Netmask 255.255.255.0 (connected to the eventual Pittsburgh router)
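
The same host-only networks can also be set up from the command line. The following is a minimal sketch, assuming the VBoxManage tool that ships with VirtualBox is on the PATH and that no host-only interfaces exist yet, so that new ones are named vboxnet0 and up in order. The helper names run and configure_hostonly are illustrative only. The script prints the commands for review unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 in the environment to actually run the commands.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assign an IPv4 address and the /24 netmask used above to one interface.
# $1 = host-only interface name, $2 = IPv4 address
configure_hostonly() {
    run VBoxManage hostonlyif ipconfig "$1" --ip "$2" --netmask 255.255.255.0
}

# Create three host-only interfaces.
for i in 1 2 3; do
    run VBoxManage hostonlyif create
done

configure_hostonly vboxnet0 10.0.0.1
configure_hostonly vboxnet1 10.100.0.1
# The third interface is configured the same way with its own address.
```

Running "VBoxManage list hostonlyifs" afterwards should show the interfaces and their addresses.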

The virtual media -- hard disks -- may also be managed through "File > Virtual Media Manager."

The Settings icon opens a dialog that provides detailed configuration of the virtual hardware. A previously configured monitoring server -- requiring web, database and e-mail servers -- is depicted below.

The motherboard resources are available from the System icon. The monitoring server consists of 512 MB main memory, two processors and hardware acceleration.

The disk subsystem consists of a Serial ATA controller for the DVD drive and a Serial Attached SCSI (SAS) controller for the hard disk.

Switching to a previously configured router, the network interfaces consist of a Host-only adapter and three Internal Network adapters. The Host-only adapter option connects to virtual network adapters configured above on the host laptop. Internal Network adapters run on a virtual switched environment, each switch identified by a unique name. Each virtual switched network acts as a single broadcast domain, isolating the virtual machines from not only hosts on other virtual switches, but also from the host operating system. Thus, routers are needed to connect virtual switches to one another. This router is configured to connect to one Host-only Network (the host laptop's operating system through network vboxnet2) and three virtual Internal Networks:

PHL-PIT - the WAN link between Philadelphia and Pittsburgh
PIT-HBG - the WAN link between Harrisburg and Pittsburgh
PIT - the Pittsburgh Office private network
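
The router's adapter layout described above can be sketched with VBoxManage as well. This is an illustration under assumptions: the VM name pit-router is hypothetical, the VM must be powered off when modifyvm runs, and the order of the two WAN links on adapters 2 and 3 is a guess (only adapter 4 is tied to the PIT network in the text). The script prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Attach the router's four adapters. $1 = VM name.
wire_router() {
    # Adapter 1: host-only network shared with the host laptop.
    run VBoxManage modifyvm "$1" --nic1 hostonly --hostonlyadapter1 vboxnet2
    # Adapters 2-4: one named internal network (virtual switch) each.
    n=2
    for net in PHL-PIT PIT-HBG PIT; do
        run VBoxManage modifyvm "$1" --nic"$n" intnet --intnet"$n" "$net"
        n=$((n + 1))
    done
}

wire_router pit-router   # hypothetical VM name
```

An internal network comes into existence as soon as a VM adapter refers to its name, so no separate step is needed to create the PHL-PIT, PIT-HBG and PIT switches.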

Oracle provides a detailed description of the networking options.

The above information defines the networking required to build a SAN server on the Pittsburgh private network (VirtualBox adapter 4, OS adapter eth3, 10.202.0.0/24). The following illustrations depict setting up the hardware for the server.

First, name the machine and specify the operating system -- in this case 64-bit Debian Linux.

Specify the installed main memory.

Define a virtual hard disk. By default, VirtualBox will assign a dynamically-sized disk that grows as needed up to the specified size. However, there is a small performance penalty compared to a fixed virtual disk in which all disk space is allocated upon creation. Since there will be many hosts operating simultaneously, select a Fixed Size virtual disk.
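
For readers who prefer the command line, the three wizard steps above (name and OS type, memory, fixed-size disk) map roughly onto the VBoxManage calls below. This is a sketch, not the procedure used for the screenshots: the VM name pit-san, the 512 MB memory size, the 8192 MB disk size and the disk path are all assumed values. The createhd subcommand is the VirtualBox 4.x name; later releases call it createmedium. The script prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = VM name, $2 = memory in MB, $3 = disk file path, $4 = disk size in MB
create_vm() {
    # Register a new 64-bit Debian machine.
    run VBoxManage createvm --name "$1" --ostype Debian_64 --register
    # Set the installed main memory.
    run VBoxManage modifyvm "$1" --memory "$2"
    # Allocate all disk space up front instead of growing on demand.
    run VBoxManage createhd --filename "$3" --size "$4" --format VDI --variant Fixed
}

# All four values here are hypothetical.
create_vm pit-san 512 "$HOME/VirtualBox VMs/pit-san/pit-san.vdi" 8192
```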

The machine is ready to boot after the hard disk is created; however, it is not yet configured for the test network and selected operating system. VirtualBox creates an IDE controller for the DVD device and a Serial ATA controller for the hard disk. Debian Linux suffers a significant performance penalty for ATA and SATA hard disks on VirtualBox -- despite the host laptop's SATA architecture. Remove the virtual controllers (NOT the disks), then add a SATA controller for the DVD and a SAS controller for the hard disk. VirtualBox will prompt you to create a new hard disk or select an existing one; use the existing hard disk that is in the directory VirtualBox created for the new server.
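
The controller layout can be sketched from the command line as follows. The VM name, the controller names SATA and SAS, and the disk path are assumptions chosen for illustration. A machine registered with "VBoxManage createvm" starts with no controllers, so this sketch only adds them; it does not remove the ones the GUI wizard creates. The script prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = VM name, $2 = path to the existing virtual hard disk
add_storage() {
    # SATA controller for the DVD drive, SAS controller for the hard disk.
    run VBoxManage storagectl "$1" --name SATA --add sata
    run VBoxManage storagectl "$1" --name SAS --add sas
    # Attach the existing disk to the first SAS port.
    run VBoxManage storageattach "$1" --storagectl SAS --port 0 --device 0 \
        --type hdd --medium "$2"
    # Attach an empty DVD drive; the install .iso is inserted later.
    run VBoxManage storageattach "$1" --storagectl SATA --port 0 --device 0 \
        --type dvddrive --medium emptydrive
}

# Hypothetical VM name and disk path.
add_storage pit-san "$HOME/VirtualBox VMs/pit-san/pit-san.vdi"
```

For a machine built with the wizard, an existing controller can first be dropped with "VBoxManage storagectl" and its --remove option, using the controller name reported by "VBoxManage showvminfo".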

By default, VirtualBox creates one network adapter that attaches to the host network adapter by Network Address Translation. Thus, we need to change the network adapters for the SAN Server to four devices (Intel PRO/1000 MT Desktop adapters) connected to the PIT switch for the Pittsburgh private network.
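
A command-line sketch of the same adapter change follows. It assumes a hypothetical VM named pit-san that is powered off; 82540EM is the VBoxManage type name for the Intel PRO/1000 MT Desktop adapter. The script prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Put all four adapters of VM $1 on the PIT internal network.
attach_pit_nics() {
    for n in 1 2 3 4; do
        run VBoxManage modifyvm "$1" --nic"$n" intnet --intnet"$n" PIT \
            --nictype"$n" 82540EM
    done
}

attach_pit_nics pit-san   # hypothetical VM name
```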

Once the network adapters are configured, go to Storage and add the bootable Debian Wheezy DVD .iso as the DVD on the SATA controller.

Power on the virtual machine and install just as you would on hardware.

Once the machine is installed, there are some minor adjustments to prepare it for deployment to the test environment. Debian configures only one network adapter during installation, so edit the /etc/network/interfaces file to add the additional adapters and then restart networking with the two commands "service networking stop" and "service networking start" ("service networking restart" is unreliable under Debian Wheezy; use the two commands to ensure networking restarts correctly).
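
As an illustration of the stanzas to add, the short script below prints static entries in the format /etc/network/interfaces expects. The addresses are hypothetical hosts on the Pittsburgh private network (10.202.0.0/24), and the helper name iface_stanza is made up for this sketch. The script only writes to standard output, so the result can be reviewed before it is appended to the file as root.

```shell
#!/bin/sh
# Print one static stanza for /etc/network/interfaces.
# $1 = interface name, $2 = IPv4 address, $3 = netmask
iface_stanza() {
    printf 'auto %s\niface %s inet static\n    address %s\n    netmask %s\n' \
        "$1" "$1" "$2" "$3"
}

# Hypothetical addresses for the three adapters the installer skipped.
iface_stanza eth1 10.202.0.11 255.255.255.0
iface_stanza eth2 10.202.0.12 255.255.255.0
iface_stanza eth3 10.202.0.13 255.255.255.0
```

After the stanzas are in place, run "service networking stop" followed by "service networking start" as described above.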

Modify the /etc/apt/sources.list file to add the "wheezy", "wheezy-updates" and "wheezy-backports" "main" and "contrib" mirrors of your choice. We will install webmin -- the web-based system configuration package. Add the webmin and somersettechsolutions sites as well, and then fetch the key from webmin and add it to the apt keys.
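
The Debian entries take the form printed by the sketch below. The mirror URL is only an example; substitute the mirror of your choice. The Webmin and somersettechsolutions entries are not shown here and should be taken from those sites' own instructions. The helper name debian_sources is illustrative.

```shell
#!/bin/sh
# Print sources.list lines for the three Wheezy suites.
# $1 = base URL of a Debian mirror
debian_sources() {
    for suite in wheezy wheezy-updates wheezy-backports; do
        echo "deb $1 $suite main contrib"
    done
}

# Example mirror only; any Debian mirror carrying Wheezy will do.
debian_sources http://ftp.us.debian.org/debian
```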

Next, update the repository list with "apt-get update". This will add the sources we need to install applications required for the test environment.

The applications we need can be downloaded and installed using apt-get install. snmp and snmpd are the Simple Network Management Protocol packages. The SAN will provide network storage services with iscsitarget. Webmin is a management package. Nagios, Xymon, Munin and Zabbix are network and host monitoring applications that will be discussed in a later article.
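
A minimal sketch of the install step, limited to the packages named explicitly above. The package names for the Nagios, Xymon, Munin and Zabbix components are not listed in this article, so they are left out rather than guessed. The script prints the command unless DRY_RUN=0 is set, in which case it must run as root.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install every package passed as an argument in one apt-get call.
install_packages() {
    run apt-get install "$@"
}

# SNMP tools and daemon, the iSCSI target and Webmin.
install_packages snmp snmpd iscsitarget webmin
```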

Notice while installing there are 20 packages listed that may be upgraded (patched). Updates will be applied later.

While installing, Xymon will prompt for the address of a monitoring server. If this is installed already -- or if its address has been previously planned -- enter it. The configuration file may be changed later, so it is not critical to assign it at this time.

Upon completion, the system is ready to configure. Below are sample Webmin Main and iSCSI Target configuration screens. These, too, will be discussed in a later article.

Upon installing all required packages, install all available updates with the command "apt-get dist-upgrade". Reboot if kernel updates are installed.

The host is installed and ready to deploy for final configuration. The tasks to build the network are the subjects of future articles.
