
Saturday, November 23, 2013

Nagios/Icinga Installation and Initial Configuration


Nagios/Icinga are the core packages of an entire platform.  This installation will include add-on packages -- IDOUtils for MySQL database support and NagVis for visualization -- but the article will focus primarily on installing Icinga.

Icinga Web CGI


Icinga has a default web-based interface that is very similar to Nagios.  However, there are add-on packages -- notably icinga-web -- that require a database.  The illustration below shows the installation command on Debian Wheezy.  During the installation, a series of pop-up screens will provide configuration prompts.  The first step is to install and configure the Postfix mail server and MySQL database server.  After those are installed, add Icinga, Icinga-Web, Icinga-IDOUtils (for database support) and Nagios-NRPE / Nagios Service Check Acceptor for remote host checks.  The video below illustrates a command-line installation.



The applications are now installed and operational -- accessible at http://<servername>/icinga.  The default installation includes a basic configuration for "localhost" at the loopback address 127.0.0.1.  However, there are several tasks to complete before the monitoring systems are functional.

1) Enable External Commands
  • sed -i -e 's/check_external_commands=0/check_external_commands=1/' /etc/icinga/icinga.cfg
  • dpkg-statoverride --update --add nagios www-data 2710 /var/lib/icinga/rw
  • dpkg-statoverride --update --add nagios nagios 751 /var/lib/icinga
  • service icinga restart
2) Enable IDO2DB database functionality
3) Supply a working Nagios-NRPE configuration file
4) Modify the php.ini timezone setting so that Icinga-Web instance monitoring works correctly
(Example snippets for steps 2 and 4 follow this list.)
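For steps 2 and 4, the following is a minimal sketch of the edits on Debian Wheezy.  The file locations and option names (IDO2DB in /etc/default/icinga, the idomod broker line, the Apache php.ini path) are assumptions drawn from the stock packages and should be verified against your installation; the timezone value is only an example.

    # 2) Enable ido2db and load the broker module (paths and option names assumed)
    sed -i -e 's/IDO2DB=no/IDO2DB=yes/' /etc/default/icinga
    echo 'broker_module=/usr/lib/icinga/idomod.so config_file=/etc/icinga/idomod.cfg' >> /etc/icinga/icinga.cfg
    service ido2db start && service icinga restart

    # 4) Set the PHP timezone used by Icinga-Web (value is an example)
    sed -i -e 's|;date.timezone =|date.timezone = "America/New_York"|' /etc/php5/apache2/php.ini
    service apache2 restart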

Icinga's web interface to external commands will now work properly.  Thus, a basic Icinga installation is now operational.  The video below demonstrates these steps and also includes an initialization of the system by restoring configuration files that describe the network.




Subsequent articles shall describe additional Nagios / Icinga features and their configuration.

Thursday, November 21, 2013

Nagios/Icinga Architecture

Nagios/Icinga are enterprise-class system monitors that can track publicly available services, agent-collected data and SNMP.  Data is presented in web-based formats and, through additional packages, as graphs and user-configurable visualizations.
Icinga CGI Display

The Nagios project commenced in 1996 and was first publicly released -- as NetSaint -- in 1999.  The project was renamed Nagios in 2002.  By 2005, the project was receiving a great deal of attention from the Open Source Community and the developers formed Nagios LLC in 2007.  By 2009, Nagios LLC began to release commercial products and provide support contracts.

Also in 2009, a group of developers forked the Nagios project to Icinga.  The well-developed core project retains a great deal of compatibility with Nagios (particularly the configuration files), but offers independently-developed solutions and interfaces.

Although the projects are forks, this article will discuss the two projects from a common perspective.

Monitored Systems


Publicly Available Services

Publicly available services are those shared to the network, such as HTTP, SMTP and FTP.  Essentially, these are the TCP and UDP services reported by the netstat command.

Host Resources

Host resources are not shared across the network and include items such as hard drive / memory performance and utilization, processor utilization, network interface card statistics, etc.  These performance parameters cannot be accessed directly and rely on other methods -- including agents and SNMP -- to collect data.

Agents

Agents are application-specific software that run on the monitored host.  In the Nagios and Icinga systems, the most common are:


  • Nagios Remote Plugin Executor (NRPE)
  • Nagios Service Check Acceptor (NSCA)
NRPE is polled by the monitoring server and returns results formatted for the Nagios application, while NSCA accepts check results pushed from the monitored hosts.  NRPE, as the name implies, executes the plugins and performs much of the data processing on the monitored host; this architecture relieves the monitoring server of some of the processing load and distributes it among the monitored hosts.
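As a rough sketch of how that division of labor looks, a command is defined in nrpe.cfg on the monitored host and the server invokes it through check_nrpe.  The thresholds, addresses and host name below are placeholders.

    # /etc/nagios/nrpe.cfg on the monitored host
    allowed_hosts=192.168.1.5
    command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

    # On the monitoring server, the service check runs the remote command
    /usr/lib/nagios/plugins/check_nrpe -H monitored-host -c check_load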

Simple Network Management Protocol (SNMP)

SNMP is defined by RFCs and provides an open, publicly available standard for querying monitored hosts and returning standardized data.

Configuration Files

Nagios and Icinga are configured with text files.  There are a few differences between the main configuration file and several of the add-on files (e.g. database configuration), but the object configuration and command files are compatible.

Main Configuration File

The /etc/nagios3/nagios.cfg and /etc/icinga/icinga.cfg files define environment, scheduling, logging and performance parameters for the monitoring daemons.

Object Configuration Files

Objects are defined in a series of files in the /etc/nagios3/conf.d and /etc/icinga/objects directories.  Objects may be divided into the following categories (a sample definition follows the list):


  • Services
  • Service Groups
  • Hosts
  • Host Groups
  • Contacts
  • Contact Groups
  • Commands
  • Time Periods
  • Notification Escalations
  • Notification and Execution Dependencies
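To give a feel for the object format, here is a minimal, hypothetical host and service definition pair; the template names, host name and address are placeholders rather than anything from the stock configuration.

    define host {
        use         generic-host          ; inherit a template
        host_name   www1
        alias       Web Server 1
        address     192.168.1.20
    }

    define service {
        use                 generic-service
        host_name           www1
        service_description HTTP
        check_command       check_http
    }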

Plugins

Nagios and Icinga do not maintain internal processes to perform service monitoring.  Instead, these actions are performed by plugins, external executable files and scripts (perl, shell, etc.) that perform the host queries.  Nagios and Icinga distributions typically ship with a set of plugins, but many more may be added from external sources, such as the Nagios Exchange.
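A plugin is simply an executable that prints one line of status text and signals its state through an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).  A minimal, hypothetical shell plugin might look like this; the name and thresholds are invented for illustration.

    #!/bin/sh
    # check_zombies -- example plugin: warn when zombie processes appear
    ZOMBIES=$(ps -eo stat | grep -c '^Z')
    if [ "$ZOMBIES" -eq 0 ]; then
        echo "PROCS OK - no zombie processes"
        exit 0
    elif [ "$ZOMBIES" -lt 5 ]; then
        echo "PROCS WARNING - $ZOMBIES zombie processes"
        exit 1
    else
        echo "PROCS CRITICAL - $ZOMBIES zombie processes"
        exit 2
    fi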

Host Checks

Host checks only determine if a host is available or not.  The host definitions include parent/child definitions that define host dependencies and reachability logic.  Host checks are performed:
  • At regular intervals, as defined in the host definition.
  • On-demand when a host's service state changes.
  • On-demand and controlled/triggered by host reachability logic.
  • On-demand and controlled/triggered by host dependency logic.

Hosts are reported in an UP, DOWN or UNREACHABLE state.  UP and DOWN are self-explanatory.  UNREACHABLE is a state in which access to a host is not available because an intermediary host is DOWN.  Thus, if a server is behind a router and the router is reported DOWN, the server will be reported as UNREACHABLE.  Each state may be either HARD or SOFT.  The SOFT state is reported at a host's first state change.  The host is then rechecked a specified number of times, after which the state is reported as HARD.  These states are used to control things such as notification, reducing the number of false alarms from the system.
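The reachability logic and the SOFT/HARD behavior are both driven by directives in the host definition.  In this hedged example the backup server declares the router as its parent, and a state only becomes HARD after three consecutive checks; the names and address are placeholders.

    define host {
        use                 generic-host
        host_name           harrisburg-backup
        address             10.2.0.10
        parents             harrisburg-router   ; reachability: router DOWN => backup UNREACHABLE
        max_check_attempts  3                   ; failed checks before the state goes HARD
    }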

Service Checks

Service checks poll the state of specific applications, hardware, software, etc. on individual hosts.  The service definitions include parent/child definitions that define service dependencies and reachability logic.  Service checks are performed:


  • At regular intervals, as defined by the service definition.
  • On-demand and controlled/triggered by predictive service dependency checks.
Services are reported in an OK, WARNING, UNKNOWN or CRITICAL state.  OK means the service has responded as expected.  WARNING means the service has reported back to the polling server, but provided information that indicates it is outside defined optimal performance.  CRITICAL means the service has failed to respond or has responded with information that indicates it is outside defined acceptable performance.  UNKNOWN is less clear-cut; for instance, an SNMP check that receives no response may report UNKNOWN rather than CRITICAL.  Services, like hosts, also report in HARD and SOFT states.
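Check scheduling and the SOFT-to-HARD transition for services are controlled the same way.  The sketch below uses the classic Nagios 3 / Icinga directive names and example values; the host, service and intervals are placeholders.

    define service {
        use                 generic-service
        host_name           www1
        service_description HTTP
        check_command       check_http
        check_interval      5     ; minutes between regular checks
        retry_interval      1     ; minutes between rechecks while in a SOFT state
        max_check_attempts  4     ; rechecks before a non-OK state becomes HARD
    }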

Active Checks

Active checks are configured and controlled by the Nagios and Icinga applications.

Passive Checks

Passive checks are performed on the hosts outside the Nagios and Icinga applications.  The monitoring process accepts and interprets these service checks as they occur.  They are typically asynchronous.  Examples include SNMP Traps and Security Alerts.
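Passive results arrive through the external command file enabled earlier.  As a hedged sketch, a host-side script (or NSCA, or an SNMP trap handler) writes a line such as the following; the host, service and message are placeholders and the command-file path is the Debian Icinga default assumed above.

    # Format: [timestamp] PROCESS_SERVICE_CHECK_RESULT;<host>;<service>;<return_code>;<plugin_output>
    now=$(date +%s)
    echo "[$now] PROCESS_SERVICE_CHECK_RESULT;www1;Backup Job;0;OK - nightly backup completed" \
        > /var/lib/icinga/rw/icinga.cmd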

State Types

State types refer to how reliable a reported host/service state is considered to be.
  • Soft -- the host has returned a changed state, but its reliability has not been verified by repeated rechecks.
  • Hard -- the host's state is considered reliable because it has been repeatedly rechecked.

Time Periods and Notifications

Time periods define when:
  • Scheduled host and service checks are performed
  • Notifications are sent
  • Notification escalations may be used
  • Dependencies are valid
Notifications are messages sent via e-mail, pager, SMS, IM and other media in response to specific, defined events.
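Time periods are themselves objects.  Here is a hedged example defining weekday business hours and referencing them from a contact; the template, names and address are placeholders and the generic-contact template is assumed to exist in the stock configuration.

    define timeperiod {
        timeperiod_name business-hours
        alias           Weekday working hours
        monday          08:00-17:00
        tuesday         08:00-17:00
        wednesday       08:00-17:00
        thursday        08:00-17:00
        friday          08:00-17:00
    }

    define contact {
        use                           generic-contact   ; template assumed from the stock config
        contact_name                  oncall
        email                         oncall@example.com
        host_notification_period      business-hours
        service_notification_period   business-hours
    }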

Event Handlers

Event handlers are external scripts and actions that are triggered by Nagios and Icinga events.  Typical uses include (a sample definition follows the list):
  • Restarting a failed service
  • Entering a trouble ticket into a help desk system
  • Logging event information to a database
  • Power Cycling a host
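As a hedged illustration of the first case, a service can name an event handler command that attempts to restart the daemon.  The script path, command name and service are placeholders; only the macro names come from the standard macro set.

    define service {
        use                 generic-service
        host_name           www1
        service_description HTTP
        check_command       check_http
        event_handler       restart-apache
    }

    define command {
        command_name  restart-apache
        command_line  /usr/local/bin/restart-apache.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
    }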

Distributed, Redundant and Fail Over Monitoring

Distributed, redundant and fail over networking are complex topics; Nagios and Icinga offer a great deal of software and implementations to achieve load distribution and redundancy.

  • DNX
  • Fusion
  • MNTOS
  • NDOutils, IDOutils
  • Gearman
  • Check_MK

The nature and implementations of these services are beyond the scope of this discussion and the reader is referred to project documentation for more detailed descriptions.

Predictive Monitoring

Predictive monitoring optimizes data collection using reachability logic and host and service dependencies.  For instance, if a host is unreachable, there is no point expending resources monitoring its services.  When the host is again reachable, obtaining updated information about its services is a priority.  The following briefly illustrates an example.

In the first image, all hosts and services are available and monitored by a server in Pittsburgh, on the left of the image.
NagVis Enterprise Visualization
Operational Network -- Hosts and Services Available
A system failure occurs -- the Harrisburg Router shuts down -- and predictive monitoring directives override the default host and service checks.  The image below depicts the Harrisburg Router as DOWN (red) and the Harrisburg Backup Server (connected to the WAN through the router) as UNREACHABLE (purple).
NagVis Enterprise Visualization
Harrisburg Router Down, Harrisburg Backup Unreachable
A detailed status of each office may then be reviewed.  Note that the Harrisburg Router host is DOWN and its interfaces are CRITICAL (both red).  The Harrisburg Backup host is UNREACHABLE (purple).  However, the other services on these two hosts are a mix of CRITICAL (red) and OK (green).  The monitoring server, upon determining that the router is DOWN, ceases to check additional services.  Those that report CRITICAL were polled before the router host was reported DOWN.
NagVis Data Center Visualization
Harrisburg Data Center As Seen from Pittsburgh -- Router Down, Backup Unreachable, Several Services Down, Remaining Services not Polled
The Pittsburgh Data Center is local to the monitoring server and the Philadelphia Data Center is reachable through a redundant link.  Those data centers report that the far side of the Harrisburg WAN links is down.
NagVis Data Center Visualization
Pittsburgh Data Center -- Harrisburg WAN Link Down


Performance Tuning

Applications with the scope of Nagios and Icinga must, to be scalable, have performance tuning options.  These generalizations briefly list what is available (a sample of the related configuration directives follows the list):
  • Service Check Latency Monitoring to quickly identify generally poor performance.
  • MRTG Performance Statistics to identify specific bottlenecks.
  • Process tuning through configuration file options.
  • Passive Checks that offload processing from the monitoring server to monitored hosts.
  • Embedded Perl Interpreter that offers better performance than an Operating System-installed Perl Interpreter.
  • Cached Logic and Host Checks that preclude unnecessary checks.
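Several of these options map directly to main configuration file directives.  A hedged sample of icinga.cfg settings (the same directive names exist in nagios.cfg) that enable the large-installation tweaks, check-result caching and the embedded Perl interpreter; the values are only examples.

    # /etc/icinga/icinga.cfg (excerpt -- values are examples)
    use_large_installation_tweaks=1
    # seconds a cached host/service check result stays valid
    cached_host_check_horizon=15
    cached_service_check_horizon=15
    enable_embedded_perl=1
    use_embedded_perl_implicitly=1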

Saturday, October 19, 2013

Installing and Configuring Basic Zabbix Functionality on Debian Wheezy


The Zabbix project began in 1998 when Alexei Vladishev started working on an internal project.  By 2001, it was released in alpha and the first stable release was in 2004.  Six years is a long time for a project to reach stable release, but Zabbix is an ambitious undertaking.



Zabbix uses a variety of mechanisms to collect data.  It supports SNMP gets and also provides an installable host agent.  The host agent supports passive and active checks -- queries that only return data to be processed by the server versus those that require processing by the client prior to returning the check response to the server.  It is also designed to be scalable, providing a data-collecting proxy and a Java JMX application-monitoring proxy -- the Zabbix Java Gateway.

It is reasonably easy to install and configure the basic functionality.  But Zabbix is not an entry-level application.  An experienced administrator is necessary to design and install a full-featured deployment.  For example, implementing SNMP checks is described on the supporting documentation web page, but the administrator configuring the checks needs to have some familiarity with SNMP MIBs to obtain useful (and appropriate) OIDs. Adding functionality beyond the considerable amount available out-of-the-box also requires some regular expression knowledge.

Even the Debian Wheezy installation requires some extra work.  Debian's apt packaging system is normally very good at installing all of the dependencies.  Not so with Zabbix.  It took some hunting around the blogosphere to figure out a workable installation.  Start by installing the Postfix Mail Server and MySQL database.  Zabbix also supports Postgres databases, but the virtual machine test environment in which this is deployed is served well by MySQL.

Zabbix Installation

The command line installation requires three steps:
  1. apt-get install postfix postfix-mysql mysql-client mysql-server
  2. apt-get install apache2 apache2-mpm-prefork apache2-utils libexpat1 libapache2-mod-php5 php5-common php5-gd php5-mysql php5-cli php5-cgi libapache2-mod-fcgid php-pear php-auth php5-mcrypt mcrypt php5-imagick imagemagick php5-curl libcurl4-openssl-dev
  3. apt-get install zabbix-agent zabbix-server-mysql zabbix-frontend-php phpmyadmin
The Debian Postfix installation offers several configuration options.  The test network has a Microsoft Exchange server, so Postfix will use the Exchange Hub as its smarthost.  Provide the mail server name and the DNS name or IP address of the Exchange smarthost.  The Debian MySQL installation will prompt for the password of the MySQL database root user.  The Apache web server installation requires PHP support.  Unfortunately, the Debian package dependencies do not provide all the software Zabbix needs; supply a more explicit list of packages.  Finally, install the Zabbix agent, the MySQL back-end server and the PHP front-end applications.

The video below illustrates these three steps with prompted configurations designed for the test environment.

Upon completing the command line installation, copy the Zabbix Apache configuration file to the Apache conf.d directory and proceed with the web-based installation.  You will have to correct the php.ini file, create and populate a zabbix database, install the zabbix.conf.php configuration file, modify the /etc/default/zabbix-server file and update the /etc/zabbix/zabbix_server.conf file with the correct username and password.  These steps are illustrated in the video below.



Unlike many Debian LAMP applications, which install locked down but otherwise ready to run, Zabbix requires a bit more work to deploy.  First, copy the /usr/share/doc/zabbix-frontend-php/apache.conf file, renaming it as /etc/apache2/conf.d/zabbix.conf.  Use PHPMyAdmin to create and populate a zabbix database.  The installation is now ready to continue with the Zabbix web installer.  Browse to http://<servername>/zabbix.  It first checks the /etc/php5/php.ini file, and the default installation requires some modifications.  These are clearly indicated.  Once the edits are complete, recheck and proceed.  Next, configure Zabbix for the MySQL server.  Then supply the Zabbix server name and review the final check.  Download the zabbix.conf.php file and upload it to the server's /etc/zabbix directory.  The server is now almost ready to use.  Modify the /etc/default/zabbix-server file from the default START=no to START=yes.  Then, provide the mysql user "root" and its password in the /etc/zabbix/zabbix_server.conf file.
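A condensed sketch of those manual steps follows.  The schema file locations are assumptions for the Debian Wheezy zabbix-server-mysql package and should be verified before use, and the dedicated zabbix user and password shown here are placeholders (the walkthrough above simply reuses the MySQL root account).

    # Create the database and an optional zabbix user (password is a placeholder)
    mysql -u root -p -e "CREATE DATABASE zabbix CHARACTER SET utf8;"
    mysql -u root -p -e "GRANT ALL ON zabbix.* TO 'zabbix'@'localhost' IDENTIFIED BY 'secret';"

    # Load the schema shipped with the package (paths assumed)
    zcat /usr/share/zabbix-server-mysql/schema.sql.gz | mysql -u root -p zabbix
    zcat /usr/share/zabbix-server-mysql/images.sql.gz | mysql -u root -p zabbix
    zcat /usr/share/zabbix-server-mysql/data.sql.gz   | mysql -u root -p zabbix

    # Allow the daemon to start, then set DBUser/DBPassword in /etc/zabbix/zabbix_server.conf
    sed -i -e 's/START=no/START=yes/' /etc/default/zabbix-server
    service zabbix-server restart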


Zabbix Configuration

Discovery

Browse to http://<servername>/zabbix and supply the default user name "admin" and password "zabbix".  The default theme is clean and attractive, but I prefer darker colors and change the theme to Black and Blue.

On a production deployment, one of the first things to configure appears near the end of the support site's documentation:  Discovery.  This is a great feature, even when implemented in a basic form.  Zabbix provides a very configurable discovery system that saves a lot of work for an administrator who knows how to set it up.

Agents
The first step is to deploy Zabbix Agents on the monitored hosts.  Many operating systems are supported, but only Linux and Windows are described here.  The Linux client is well documented.  There are only three mandatory changes to apply:  Server, ServerActive and Hostname.  The first two define the IP address of the Zabbix server for passive and active checks.  The third provides the unique host name to the server. The Windows client configuration file is a deplorable mess:  a long string of characters lacking carriage returns.  Just add the three lines above to the file and call it good enough for now.
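A hedged example of the three mandatory zabbix_agentd.conf changes; the server address and host name are placeholders, and on Windows the same directives go into the agent's configuration file.

    # /etc/zabbix/zabbix_agentd.conf (addresses and host name are placeholders)
    # Zabbix server allowed to run passive checks
    Server=192.168.1.5
    # Zabbix server that receives active check results
    ServerActive=192.168.1.5
    # Must be unique and match the host name configured on the server
    Hostname=www1.example.com

    # then restart the agent
    service zabbix-agent restart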



Host Groups
Devices may be logically grouped by type using Host Groups.  "Linux Servers" is included by default.  For this example, add a group "Windows Servers" to display them under one heading.

Templates

Templates provide a configurable set of monitoring, trigger and display items and a large number are included by default.  For this example, modify "Template OS Linux" and "Template OS Windows" to add a default SNMP community string "public" to the OS templates.

Actions

There is much more to Discovery than simply pinging hosts or identifying agents.  The Discovery - Agent interaction may be programmed to take care of a great deal of drudgery using Actions.  Actions specify Operations to perform when specified Conditions are met.  Take this rule, for instance.  The Conditions specify the returned Zabbix Agent OS value is like "Linux" and the Operation is to place the Server in the Host Group "Linux Servers" and apply the monitoring Template "Template OS Linux" to the host.  With no further manual work, discovered hosts are configured with a reasonably comprehensive set of monitoring rules.  Clone this rule and change the OS value to "Windows" and the actions to automatically add Windows hosts to the Windows Host Group and apply the Template OS Windows.

Discovery
The default Discovery rule is disabled and configured to search the 192.168.1.1-255 range.  Modify that value to match your local addressing scheme, but restrict the Discovery Process to one subnet at a time, preferably a 24-bit netmask.  By default, this Discovery Process will look for hosts using Zabbix Agent queries and ICMP pings.  Continue adding subnets to define the set of hosts to be monitored.  How many you initially configure depends upon the hardware on which Zabbix is deployed, but test with three to ten.  It is a bit resource hungry when fully operational.

If you plan to monitor devices for which there are no Zabbix Agents -- such as switches and routers -- you should also add SNMP to the Discovery options.  Provide the read-only community (typically public) and OID ifDescr and SNMP will search for Ethernet interfaces.

The following video depicts the above Discovery and Actions processes.





Zabbix Screens

Screens are a customizable way to aggregate information.  There is a great variety of information that may be presented, and one example does not suffice.  However, the video below shows how to add all of the network's WAN interfaces to a screen, an external URL to a NagVis visualization and the private interfaces for the Philadelphia router and Domain Controller.  Once configured, the Domain Controller begins a download and WAN links begin to fail.  Since the WAN routers use OSPF, they quickly fail over to live routes and the download continues.  Zabbix graphs depict the traffic on WAN links throughout the scenario.




Zabbix Alerts

Monitoring systems need to provide notifications.  Zabbix provides well-implemented notification and tracking.  First, add additional monitoring functionality to the SAN servers by applying Template SNMP Disks.  Then create a user and the User Group SAN Admins.

Unfortunately, Zabbix does not have readily configured topology logic in its alert system.  For instance, if a router fails, Zabbix will alert not only the router has failed, but also any hosts behind it that are now unreachable.  Thus, administrators are flooded with alerts that distract from the root cause of the problem:  a failed router.  This situation leads to confusion and additional difficulty diagnosing problems.

Thus, while Zabbix is very useful and one of the best Debian systems visualization packages available, it is primarily a host monitoring tool and not an enterprise-class systems management tool. It is well worth deploying for host trending and visualizations, but has drawbacks that do not recommend its use as an enterprise-class monitoring tool.

Friday, October 18, 2013

Xymon Host Monitoring


Xymon, like rrdtool, has a long history.  Its oldest antecedent is Big Brother, a project that has become commercial and well worth checking out.  The original Big Brother project lay dormant as the developers concentrated on the Professional edition.  Two notable forks picked up the original project:  Big Sister and Hobbit.  Forks are confusing enough, but the Hobbit team eventually decided to drop the Tolkien-themed names (preferring to avoid copyrighted and trademarked nomenclature) and adopted Xymon.  Working with Xymon can be a bit of a dinosaur dig, with old names sprinkled throughout the software's configuration files.

Xymon gathers information with an installed client application and monitors TCP services.  The original Big Brother project did not provide much more than that, but the Xymon team has added more functionality.  With a bit of extra work, specific processes may also be monitored.

As with most Debian software, installation is easy:  apt-get install xymon xymon-client apache2.


Also like many Debian web-based application installations, the default installation is locked down and needs a bit of configuration file editing.  


Below you see the first indication of the project's history -- the /etc/apache2/conf.d directory's Xymon configuration file is named hobbit.

Simply change the line "Allow from localhost  ::1/128" to "Allow from all."




"Allow from all" by itself is insecure and the web server should be protected with authentication -- the .htaccess file.  The Xymon team has provided documentation of how to use the files that implement their version of Linux/Apache .htaccess authentication -- hobbitpasswd and hobbitgroups located in the /etc/hobbit directory.  Yes, another bit of inherited nomenclature.

 


Once the web server is configured, browse to http://<servername>/hobbit.  The monitoring host is already configured.



The configuration files are kept in the /etc/hobbit directory.  Lo and behold, we are greeted with even older heritage -- the bb-hosts and bb-services files.  Ahhh, that brings back some fond memories.


Most of the editing is in the bb-hosts file.  The format is well explained in the comments, but the general idea is that host lines consist of an IP address, a host name, a # character and a list of services.  These services are in addition to those provided by the xymon client; more on that later.
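A hedged example of that bb-hosts layout; the addresses, host names and service tags below are placeholders rather than anything from the test network.

    # /etc/hobbit/bb-hosts (excerpt)
    # IP-address      hostname         # network-test tags
    192.168.1.5       pitt-monitor     # http://pitt-monitor/
    192.168.1.20      phila-web        # http://phila-web/ ssh
    192.168.1.30      phila-mail       # smtp pop3 imap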


A few quick edits bring up more hosts.


Unfortunately, it is not uncommon for the client to insert the wrong host name in the monitored host's /etc/default/hobbit-client file.  Often (as in this case), the host name is pulled from an old /etc/hosts entry that was never updated (my bad).


The host names on the clients must match those in the /etc/hobbit/bb-hosts file.  Once they do, the default client services (CPU, disk, files, memory and msgs, ports and procs) begin to show up.


There is also a monitoring agent for Windows.  The 32-bit application runs as a service and is configured from a file in the C:\Program Files (x86)\BBWin\etc directory.



The configuration file is fairly well documented, but not as simple as the Linux client.  However, it is easy to add server-side TCP service monitoring as described below.



The information and presentation have improved significantly under the Xymon team.  Although no port checks are defined, clicking on a ports icon brings up a netstat output.




Clicking on a procs icon brings up ps output.



Adding additional services to monitor is very easy.  The bb-services file is well documented and defining a service is as simple as naming it and adding a TCP port number to monitor.  This definition will only return a green (up) or red (down) state.  Adding SEND, EXPECT and OPTIONS lines can distinguish a service running in a degraded state and return yellow (degraded), as long as you know how the service operates.  Below is an illustration of the mysql service running on port 3306.
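A hedged sketch of such a bb-services entry; the stanza name must then appear as a tag on the host's bb-hosts line.  The host name and address are placeholders.

    # /etc/hobbit/bb-services (excerpt)
    [mysql]
       port 3306

    # and in bb-hosts, add the tag to the host line:
    192.168.1.40   phila-db   # mysql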


One host returns green and one red.  The explanation is simple:  the down host is newly installed and has the default bind-address 127.0.0.1 setting in the /etc/mysql/my.cnf file.  Commenting out that line and restarting the mysql service returns the state to green; mysql is now listening on all interfaces.


The Xymon team has incorporated rrdtool reporting into the application.  This is a great benefit for performance trending.




They have also provided a comprehensive host report with the output from Linux systems utilities.




Adding TCP ports and processes to monitor is relatively simple because the application provides netstat and ps outputs.

However, Xymon's notification capability is limited.  If a router or link fails, the monitoring server floods out failure notifications for every monitored service that is unreachable behind the failed router or link, making it difficult to pick out the root cause of the failure.

Since Xymon is primarily a host-based monitoring application, it is not designed to monitor multi-homed hosts.  Thus, in the test network -- with redundant WAN links -- on which it is installed, it may not catch a failed router interface if the configuration monitors hosts, not the individual interfaces.


Even with two failed interfaces, Xymon may not be aware of a problem because all hosts are reachable on at least one interface.


There is a way around this problem: defining interfaces as hosts with unique names.  Thus, each router has an eth0 through eth3 host name configured to monitor the four interfaces.  Simply doing this in the bb-hosts file leads to a rather cluttered monitoring entry page, however.


However, the bb-hosts file supports defining pages below the entry page quite easily.  And since these interface host definitions do not match the host name in the /etc/default/hobbit-client file, they only report connection and trends information.  It is a workable and efficient solution; with experience, the formatting syntax is flexible enough to accommodate most layouts.
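A hedged sketch of how the bb-hosts page and host entries might be arranged for per-interface monitoring; the page name, group title, host names and addresses are placeholders.

    # /etc/hobbit/bb-hosts (excerpt)
    page routers Router Interfaces

    group Harrisburg Router
    10.2.0.1     harrisburg-rtr-eth0   #
    10.2.1.1     harrisburg-rtr-eth1   #
    192.168.2.1  harrisburg-rtr-eth2   #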






Notifications are managed in the hobbit-alerts.cfg file.  Recipients are defined as an e-mail address or a script (e.g. an SMS message sent by a script).  The following snippets of the configuration file provide an overview of some of the options available.
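For reference, a hedged example of one hobbit-alerts.cfg rule; the host pattern, recipient and repeat interval are placeholders, and the exact option names should be checked against the file's own comments.

    # /etc/hobbit/hobbit-alerts.cfg (excerpt)
    HOST=%harrisburg-.* SERVICE=conn
            MAIL netops@example.com REPEAT=30 RECOVERED COLOR=red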






Xymon appears rather limited when first installed, but digging into the well-documented configuration files reveals many more options.  It is simple enough to deploy as an entry-level monitoring tool, with enough features to provide a reasonably comprehensive monitoring, trending, reporting and notification system.  It is not resource-intensive and scales fairly well with the option to deploy multi-server configurations.

Xymon is also a good tool to deploy in an emergency because it sets up fairly quickly and can provide sufficient information to identify problem hosts in a large, complex data center.

Thursday, October 17, 2013

Using Munin on Debian Wheezy for Monitoring and Trend Analysis

The heritage of Tobi Oetiker's rrdtool is covered in a previous post describing Cacti.  As mentioned there, a rich ecosystem of applications incorporates rrdtool -- Munin is the subject of this post.
Windows Memory Counters During Stress Tests
 
Cacti uses SNMP  -- a publicly available monitoring protocol -- to gather system information.  Munin uses its own agent, software that is provided by the application developers and installed on each host.  There is SNMP support described on the Munin site, but its forte is the installable agent.  Linux, FreeBSD, NetBSD, Solaris, AIX, OS/X Darwin, HP-UX and Windows are supported as of May 18, 2013.

Installing and Configuring the Munin Server

Installation on Debian Wheezy is trouble-free:  apt-get install munin munin-node apache2.  Note that the default installation uses Apache MPM-Worker, while many other Debian monitoring applications use Prefork.  If you plan on deploying multiple monitoring systems, you may wish to specify Prefork.  Be forewarned, Munin is resource-intensive and may be better left on a stand-alone system with MPM-Worker.


By default, Munin installs with the web server locked down.



For a testing environment, simply change the /etc/apache2/conf.d/munin definitions from "Allow from localhost 127.0.0.0/8 ::1" to "Allow from all."  This is insecure and a production deployment should use authentication.


After restarting Apache ("service apache2 restart"), the default installation configuration will display with localhost.localdomain the only available host.


Configuring Linux Munin Nodes


Two files control Munin's data collection:  /etc/munin/munin.conf for the server and /etc/munin/munin-node.conf for the monitored hosts.  The only required additions to the munin-node.conf file are "allow <address>" lines for each Munin monitoring server IP address.  This file may then be copied by scp to each monitored host and the node process restarted (service munin-node restart).
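The allow lines are regular expressions anchored to the monitoring server's IP address.  A hedged example for a server at 192.168.1.5, followed by the copy-and-restart step; the host name is a placeholder.

    # /etc/munin/munin-node.conf (excerpt)
    allow ^127\.0\.0\.1$
    allow ^192\.168\.1\.5$

    # push the file out and restart the node (example host)
    scp /etc/munin/munin-node.conf root@phila-web:/etc/munin/
    ssh root@phila-web service munin-node restart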

However, you may customize the Munin Node configuration for each host.  The /etc/munin/plugins/ directory contains symlinks to plugins stored in the /usr/share/munin/plugins directory.  Simply delete unwanted symlinks and add new ones pointing to the plugins available in /usr/share/munin/plugins/.


Configuring Windows Munin Nodes

The Windows Munin Node runs as a service and is configured from its munin-node configuration file.  The default installation is pictured below.
Configuring additional counters for Windows Nodes is very easy and provides access to the large number of Windows Performance Counters.  For example, a large set of Microsoft SQL Server Performance Counters is available through Performance Monitor.  Simply select a desired counter and add a Performance Counter definition to the node configuration file.






 Microsoft Exchange Server offers a similarly large set of Performance Counters.


Configuring the Munin Server to Collect Node Data

The munin.conf server configuration file requires a bit more work, but not much.  Simply add the host name, IP address and (normally) a use_node_name yes statement to define each host, then restart the munin process (service munin restart).  Munin has a variety of configuration and reporting options, but they are beyond the scope of this article; see Munin's extensive online documentation.
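A hedged munin.conf fragment defining two such hosts; the group name in brackets controls how Munin sorts them by domain, and all names and addresses here are placeholders.

    # /etc/munin/munin.conf (excerpt -- names and addresses are placeholders)
    [example.com;phila-web]
        address 192.168.1.20
        use_node_name yes

    [example.com;phila-dc1]
        address 192.168.1.25
        use_node_name yes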


Once the configured server begins to collect data, the hosts appear, grouped by domain name and sorted by host name.  Here, two Windows Server 2008 servers are the first to appear.


Munin runs from a cron job at five minute intervals.  Do NOT attempt to increase the time interval or the server will miss data and expend more system resources trying to interpret what it gets.  If Munin is grinding the server to a standstill, try editing the munin.conf file to tune the number of simultaneous processes.



Make no mistake, Munin can eat up resources.  The server contacts the monitored nodes (the munin-node process on each monitored host) and requests the last five minutes of collected data.  Once the data has been collected on the server, a variety of processes (described in the munin.conf file) begin to process the data.  The number of processes can be configured in a fairly granular fashion.  Munin-html then takes over and formats the HTML output.  Now the real number crunching begins:  the munin-graph process.  This CPU- and disk-intensive process generates the resultant graphs.  This virtual server has two Intel Core-i5 2.6 GHz processors assigned and they are pegged.  Disk IO is also intensive.




Soon, all of the monitored nodes appear on the overview page.


Why would anyone deploy a system as resource-intensive as described?  Take a look at the results.  Even on a laptop -- with data collection gaps when it and the virtual machines are shut down -- the presentation is beautiful.

Munin excels at reporting processor, memory and process data.  Munin has beauty AND brains.









Munin, like Cacti, is not an alerting system.  It is designed for more granular trend analysis and capacity planning than Cacti, at the price of much greater system loads.  It succeeds with presentation-quality output.

The final words: Management-Pleasing Colors.