Thursday, June 19, 2014

Windows Server Performance Monitoring with Munin

Using Munin to collect Windows Server Performance Counters comes down to this: knowing the important Windows Performance Monitoring counters and adding them to the munin-node.ini file in the proper format.  Below is the Processor\% C1 Time counter (a low-power state) and the .ini file entry that produces the graph.


[PerfCounterPlugin_Processor%C1Time]
Object=Processor Information
Counter=% C1 Time
GraphTitle=% C1 Time
GraphCategory=processor
DropTotal=1
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000

That's all it takes to produce daily, weekly, monthly and annual graphs for a counter.  Lather, rinse and repeat as often as you like.

Installing the Windows Munin Agent

This topic is covered in another Munin article.  At the time of writing, the installation package is available here.

Configuring the Windows Munin Agent

Windows Performance Monitoring commands are defined in the agent munin-node.ini file, not on the server.  Each counter is defined by a set of commands that define three things:  name, Windows Performance Counter and graph display properties.
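
Since every counter section has the same shape, the entries can be generated rather than typed by hand. The following is a minimal sketch in Python, not part of the Munin agent: the function name render_counter_section and its parameters are hypothetical, and the defaults simply mirror the values used in the example at the top of this article.

```python
def render_counter_section(name, obj, counter, title, category,
                           draw="LINE", drop_total=0,
                           graph_args="--base 1000 --lower-limit 0",
                           counter_format="double", multiply="1.000000"):
    """Return the text of one [PerfCounterPlugin_<name>] section.

    Hypothetical helper: it only formats strings and does not check
    that the Windows counter actually exists on the server.
    """
    lines = [
        f"[PerfCounterPlugin_{name}]",        # name
        f"Object={obj}",                      # Windows counter group
        f"Counter={counter}",                 # Windows counter item
        f"GraphTitle={title}",                # graph display properties
        f"GraphCategory={category}",
        f"DropTotal={drop_total}",
        f"GraphDraw={draw}",
        f"GraphArgs={graph_args}",
        f"CounterFormat={counter_format}",
        f"CounterMultiply={multiply}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Reproduces the % C1 Time entry shown at the top of the article.
    print(render_counter_section(
        name="Processor%C1Time",
        obj="Processor Information",
        counter="% C1 Time",
        title="% C1 Time",
        category="processor",
        drop_total=1,
    ))
```

The multiplier is passed as a string so that values such as 1.1574074074074073e-005 are written out exactly as given.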

Name

[PerfCounterPlugin_<name>] 
is the format used.  Simply insert a name that adequately describes the counter.

Windows Performance Counter

Object=<windows counter group object>
Counter=<windows counter item>
DropTotal=<0 or 1>

The first two entries define the counter to be collected.  Munin does not require quotes or special formatting to interpret the counters.  Windows provides hundreds of individual counters from which to choose, so the most difficult part of the process is selecting a set that adequately monitors all of the subsystems of interest.  For instance, hardware limitations show up in Processor, Memory, Disk and Network counters.  Server applications (e.g. Microsoft SQL Server) also expose application-layer counters that can reveal configuration errors.

DropTotal instructs the agent to drop the last item (instance) when a set of items appears under a counter.  For instance, in a multiprocessor server, the individual items include Processor 0, Processor 1, Processor 2... and finally Processor _Total.  The last item Windows collects (Processor _Total in this case) may or may not be of interest.  If you do not want to see this item, add DropTotal=1.  Otherwise the default (DropTotal=0) will collect and display that information.
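
The effect of DropTotal can be pictured with a few lines of Python. This is only an illustration of the behaviour described above, not the agent's actual code; the function name visible_instances and the instance names are made up for the example.

```python
def visible_instances(instances, drop_total):
    """Return the counter instances that would be graphed.

    Illustrative model only: with drop_total=1 the last instance
    (often _Total) is dropped; a single-instance counter is left alone.
    """
    if drop_total == 1 and len(instances) > 1:
        return instances[:-1]
    return list(instances)


if __name__ == "__main__":
    cpus = ["0", "1", "2", "_Total"]        # hypothetical instance list
    print(visible_instances(cpus, 1))       # per-processor items only
    print(visible_instances(cpus, 0))       # everything, including _Total
```

Note that the rule is positional: whatever instance comes last is dropped, whether or not it is named _Total.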

Graph Display Properties

GraphTitle=<name>
GraphCategory=<name>
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000

GraphTitle instructs rrdtool to add a title to the graph.  GraphCategory instructs Munin how to group sets of graphs.  For a Windows Server performance analysis, appropriate GraphCategory groups include processor, memory, disk, system and network.  The graphs are then sorted alphabetically by GraphCategory, then by GraphTitle.

GraphDraw instructs rrdtool how to display and group items.  A counter that collects only one parameter may be displayed as a LINE (a single line on the graph) or an AREA (a line filled down to zero).  The illustration below depicts the difference between LINE and AREA.  You may also use LINESTACK or AREASTACK when multiple items occur for each counter (as in Processor above).  These definitions stack the items sequentially, each one on top of the previous.  The difference between the two is whether the area between items is filled.
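
To make the stacking concrete, the sketch below computes where the top of each item lands when items are stacked. It is a plain Python illustration with made-up sample values, not anything rrdtool itself runs; stacked_tops is a hypothetical name.

```python
from itertools import accumulate


def stacked_tops(values):
    """Return the y-value at the top of each item once stacked.

    Each item is drawn on top of the running total of the items before it,
    so the tops are simply the cumulative sums.
    """
    return list(accumulate(values))


if __name__ == "__main__":
    per_processor = [10.0, 20.0, 5.0]   # hypothetical readings for CPU 0..2
    print(stacked_tops(per_processor))  # the last value is the combined total
```

This is also why DropTotal=1 is usually wanted with LINESTACK or AREASTACK: stacking _Total on top of the individual items would count everything twice.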

GraphArgs supplies rrdtool with additional instructions.  Many are available; the full list of rrdtool graph arguments is documented here.  If you want the y-axis to always start at 0, specify --lower-limit=0.  If counters are in percent and you always want the greatest y-axis value to be 100, specify --upper-limit=100.
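
A GraphArgs line is just a string of rrdtool graph options, so it can be assembled programmatically. The helper below is a small sketch with a hypothetical name (build_graph_args); it only uses the --base, --lower-limit, --upper-limit and --rigid options of rrdtool graph. The rigid flag is an addition not used elsewhere in this article: without it, rrdtool may still expand the axis when values fall outside the limits.

```python
def build_graph_args(base=1000, lower=None, upper=None, rigid=False):
    """Build a GraphArgs= line from optional axis limits (illustrative helper)."""
    parts = [f"--base {base}"]
    if lower is not None:
        parts.append(f"--lower-limit {lower}")
    if upper is not None:
        parts.append(f"--upper-limit {upper}")
    if rigid:
        # Ask rrdtool not to auto-expand beyond the limits given above.
        parts.append("--rigid")
    return "GraphArgs=" + " ".join(parts)


if __name__ == "__main__":
    # A percentage counter pinned to the 0..100 range.
    print(build_graph_args(lower=0, upper=100))
    # A byte counter, scaled in powers of 1024.
    print(build_graph_args(base=1024, lower=0))
```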



CounterFormat defines the format of the numerical counter and may be either int (integer), double or large (int64).


CounterMultiply specifies a scaling factor.  For instance, system Uptime is reported in seconds; to change the value to days, multiply seconds by 1.1574074074074073e-005.
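
The uptime multiplier is simply 1 divided by the number of seconds in a day (86400). The short Python check below shows the arithmetic; the names are illustrative, and the bytes-to-MiB factor is an extra example of the same idea that does not appear in the template further down.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86400

# Scaling factor that turns a value in seconds into days.
SECONDS_TO_DAYS = 1 / SECONDS_PER_DAY

# Same idea for another unit change (an extra illustration):
# bytes to mebibytes.
BYTES_TO_MIB = 1 / (1024 * 1024)


def uptime_days(uptime_seconds):
    """Apply the multiplier the way CounterMultiply would scale the counter."""
    return uptime_seconds * SECONDS_TO_DAYS


if __name__ == "__main__":
    print(SECONDS_TO_DAYS)              # value to use as CounterMultiply
    print(uptime_days(3 * 86400))       # three days of uptime, in days
    print(BYTES_TO_MIB)
```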

Example Munin Analysis of a Windows Memory Stress Test

This stress test utilizes HeavyLoad, a simple-to-configure application for processor, memory, disk read and disk write tests.


The two graphs above display information indicative of excessive memory load: Pages/sec and Total Page File % Usage.  Once physical memory (RAM) is full of data, Windows uses the Page File (the equivalent of Linux swap space) to store additional data in demand by the processor.  Pages/sec measures the rate at which pages are read from or written to disk to resolve hard page faults; Total Page File % Usage measures how much of the page file space is used for storing data.

This excessive memory usage is also manifest in processor utilization, depicted in the graphs above.  At low utilization, the processor is mostly in the Idle state and also uses a low-power state (C1) to conserve energy.

The remaining counters will typically increase with load.  DPC (deferred procedure call) time reflects work that interrupt handlers defer to run at a lower priority.  User, privileged and priority time display the types of processes consuming CPU cycles; Processor time is analogous to total individual processor utilization.

The video below depicts a basic review of the stress test counters for this experiment.




List of Important Windows Server System Performance Counters

The following is the template for Windows Server 2008 R2 Performance Counters (munin-node.ini).

[Plugins]
; Plugin Section, 1 enables plugin, 0 disables
Disk=1
Memory=0
Processes=0
Network=0
MbmTemp=1
MbmVoltage=1
MbmFan=1
MbmMhz=1
SMART=0
HD=0
Cpu=0
SpeedFan=1
External=1

[DiskPlugin]
; Default Warning and Critical values for % space used
Warning=92
Critical=98

[ExternalPlugin]
; For External Plugins just add an entry with the path to the program to run
; It doesn't matter what the name of the name=value pair is
Plugin01=C:\Users\Jory\Documents\Visual Studio Projects\munin-node\src\plugins\python\disk_free.py

[PerfCounterPlugin_disktime]
DropTotal=1
Object=LogicalDisk
Counter=% Disk Time
CounterFormat=double
CounterMultiply=1.000000
GraphTitle=Disk Time
GraphCategory=disk
GraphArgs=--base 1000 -l 0
GraphDraw=LINE

[PerfCounterPlugin_uptime]
; This is a section for the Performance Counter plugin
; The Object and Counter settings are used to access the Performance Counter
; For uptime this would result in \System\System Up Time
; The Graph settings are reported to munin
; The DropTotal setting will drop the last instance from the list, which is often _Total
; Has no effect on single instance counters (Uptime)
; The CounterFormat setting controls what format the counter value is read in as a double, int, or large (int64).
; The plugin always outputs doubles, so this shouldn't have that much effect
; The CounterMultiply setting sets a value the counter value is multiplied by, use it to adjust the scale
; 1.1574074074074073e-005 is the result of(1 / 86400.0), the uptime counter reports seconds and we want to report days.
; So we want to divide the counter value by the number of seconds in a day, 86400.
Object=System
Counter=System Up Time
GraphTitle=Uptime
GraphCategory=system
GraphDraw=AREA
GraphArgs=--base 1000 -l 0
DropTotal=0
CounterFormat=large
CounterMultiply=1.1574074074074073e-005

[SpeedFanPlugin]
BroadcastIP=192.168.0.255
UID=FF671100

;\System\Threads
;------------------------------------------------------------------------------
[PerfCounterPlugin_Threads]
Object=System
Counter=Threads
GraphTitle=Number of Threads
GraphCategory=System
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0 --upper-limit 100
DropTotal=0
CounterFormat=int
CounterMultiply=1.000000

[PerfCounterPlugin_ErrorSystem]
Object=Server
Counter=Errors System
GraphTitle=Errors System
GraphCategory=System
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0 --upper-limit 100
DropTotal=0
CounterFormat=int
CounterMultiply=1.000000
CounterType=DERIVE

[PerfCounterPlugin_MemoryAvailableMBytes]
Object=Memory
Counter=Available Bytes
GraphTitle=Memory Available Bytes
GraphCategory=Memory
GraphDraw=AREA
GraphArgs=--base 1024 --lower-limit 0
DropTotal=0
CounterFormat=large
CounterMultiply=1.000000

[PerfCounterPlugin_PageingFileUsage]
Object=Paging File
Counter=% Usage
GraphTitle=Paging File(_Total) % Usage
GraphCategory=Memory
GraphDraw=AREA
GraphArgs=--base 1000 --lower-limit 0 --upper-limit 100
DropTotal=1
CounterFormat=int
CounterMultiply=1.000000

[PerfCounterPlugin_PageFaultsSec]
Object=Memory
Counter=Page Faults/sec
GraphTitle=Page Faults/sec
GraphCategory=Memory
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0 --upper-limit 100
DropTotal=0
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_PagesSec]
Object=Memory
Counter=Pages/sec
GraphTitle=Pages/sec
GraphCategory=Memory
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=0
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_PageInputsSec]
Object=Memory
Counter=Pages Input/sec
GraphTitle=Pages Input/sec
GraphCategory=Memory
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=0
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_CacheBytes]
Object=Memory
Counter=Cache Bytes
GraphTitle=Cache Bytes
GraphCategory=Memory
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=0
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_PhysicalDiskSecRead]
Object=PhysicalDisk
Counter=Avg. Disk sec/Read
GraphTitle=PhysicalDisk(_Total) Avg. Disk sec/Read
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_PhysicalDiskSecWrite]
Object=PhysicalDisk
Counter=Avg. Disk sec/Write
GraphTitle=PhysicalDisk(_Total) Avg. Disk sec/Write
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_FileReadOpSec]
Object=System
Counter=File Read Operations/sec
GraphTitle=File Read Operations/sec
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_FileWriteOpSec]
Object=System
Counter=File Write Operations/sec
GraphTitle=File Write Operations/sec
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_DiskWriteBytes]
Object=PhysicalDisk
Counter=Avg. Disk Bytes/Write
GraphTitle=Avg. Disk Bytes/Write
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_DiskReadBytes]
Object=PhysicalDisk
Counter=Avg. Disk Bytes/Read
GraphTitle=Avg. Disk Bytes/Read
GraphCategory=Disk
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_IOReadOpSec]
Object=Process
Counter=IO Read Operations/sec
GraphTitle=IO Read Operations/sec
GraphCategory=processes
GraphDraw=AREASTACK
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_IOWriteOpSec]
Object=Process
Counter=IO Write Operations/sec
GraphTitle=IO Write Operations/sec
GraphCategory=processes
GraphDraw=AREASTACK
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_IODataOpSec]
Object=Process
Counter=IO Data Operations/sec
GraphTitle=IO Data Operations/sec
GraphCategory=processes
GraphDraw=AREASTACK
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_IOOtherOpSec]
Object=Process
Counter=IO Other Operations/sec
GraphTitle=IO Other Operations/sec
GraphCategory=processes
GraphDraw=AREASTACK
GraphArgs=--base 1000 --lower-limit 0
DropTotal=1
CounterFormat=double
CounterMultiply=1.000000

[PerfCounterPlugin_Processor%IdleTime]
Object=Processor Information
Counter=% Idle Time
GraphTitle=% Idle Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%InterruptTime]
Object=Processor Information
Counter=% Interrupt Time
GraphTitle=% Interrupt Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%MaximumFrequencyTime]
Object=Processor Information
Counter=% Maximum Frequency Time
GraphTitle=% Maximum Frequency Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%PriorityTime]
Object=Processor Information
Counter=% Priority Time
GraphTitle=% Priority Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%PrivilegedTime]
Object=Processor Information
Counter=% Privileged Time
GraphTitle=% Privileged Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%UserTime]
Object=Processor Information
Counter=% User Time
GraphTitle=% User Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%C1Time]
Object=Processor Information
Counter=% C1 Time
GraphTitle=% C1 Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_Processor%DPCTime]
Object=Processor Information
Counter=% DPC Time
GraphTitle=% DPC Time
GraphCategory=processor
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceOutputQueueLength]
Object=Network Interface
Counter=Output Queue Length
GraphTitle=Output Queue Length
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=int
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceOutboundDiscarded]
Object=Network Interface
Counter=Packets Outbound Discarded
GraphTitle=Packets Outbound Discarded
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceOutboundErrors]
Object=Network Interface
Counter=Packets Outbound Errors
GraphTitle=Packets Outbound Errors
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceReceivedDiscarded]
Object=Network Interface
Counter=Packets Received Discarded
GraphTitle=Packets Received Discarded
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceReceivedErrors]
Object=Network Interface
Counter=Packets Received Errors
GraphTitle=Packets Received Errors
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceBytesTotal/sec]
Object=Network Interface
Counter=Bytes Total/sec
GraphTitle=Bytes Total/sec
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceBytesSent/sec]
Object=Network Interface
Counter=Bytes Sent/sec
GraphTitle=Bytes Sent/sec
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1

[PerfCounterPlugin_NetworkInterfaceBytesReceived/sec]
Object=Network Interface
Counter=Bytes Received/sec
GraphTitle=Bytes Received/sec
GraphCategory=network
GraphDraw=LINE
GraphArgs=--base 1000 --lower-limit 0
CounterFormat=double
CounterMultiply=1.000000
DropTotal=1
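
With this many sections, a typo is easy to miss. The following Python sketch parses a munin-node.ini with the standard configparser module and reports PerfCounterPlugin sections that look incomplete. It is a hypothetical helper, not part of Munin: the allowed CounterFormat values are the three listed earlier in this article, and treating Object and Counter as required is an assumption of the sketch.

```python
import configparser

ALLOWED_FORMATS = {"int", "double", "large"}   # per the CounterFormat notes above
REQUIRED_KEYS = ("Object", "Counter")          # assumption of this sketch


def check_perf_sections(ini_text):
    """Return a list of human-readable problems found in the ini text."""
    # interpolation=None keeps '%' in names such as '% C1 Time' literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str                   # keep key case as written
    parser.read_string(ini_text)

    problems = []
    for section in parser.sections():
        if not section.startswith("PerfCounterPlugin_"):
            continue
        for key in REQUIRED_KEYS:
            if key not in parser[section]:
                problems.append(f"{section}: missing {key}")
        fmt = parser[section].get("CounterFormat")
        if fmt is not None and fmt not in ALLOWED_FORMATS:
            problems.append(f"{section}: unknown CounterFormat {fmt!r}")
    return problems


SAMPLE = """\
[PerfCounterPlugin_Threads]
Object=System
Counter=Threads
CounterFormat=int

[PerfCounterPlugin_Broken]
Object=Network Interface
CounterFormat=integer
"""

if __name__ == "__main__":
    for problem in check_perf_sections(SAMPLE):
        print(problem)
```

To check a real file, read it into a string first and pass that to check_perf_sections. The check is purely syntactic; it cannot tell whether a counter name actually exists on the server.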
