UNIX/Linux MP Authoring – Discovering and Monitoring Failover Clusters

In my last post, I walked through creation of an MP with dynamic discovery for a UNIX/Linux application. In this post, I’ll continue to demonstrate the use of the UNIX/Linux Authoring Library examples for MP authoring, but I will take the demonstration quite a bit deeper into authoring territory – by creating an MP for monitoring of Linux failover clusters. While the base UNIX and Linux operating system Management Packs don’t have built-in detection/monitoring of failover clusters, MP Authoring can be used to build a robust cluster monitoring solution.

In this post, I will walkthrough authoring of a basic MP for monitoring a Linux failover cluster. I have two goals for this post:

  1. Demonstrate the use of the UNIX/Linux Authoring Library for MP Authoring scenarios
  2. Demonstrate the creation of a basic cluster monitoring MP that can be modified to work with other cluster technologies and monitoring requirement

This is fairly involved MP Authoring, and is intended for the author with a bit of experience.

Note:  the example MP described in this blog post (and the VSAE project) can be found in the \ExampleMPs folder of the UNIX/Linux Authoring Library .zip file.

Background

The MP I am building is intended to perform discovery and monitoring of Linux failover clusters, though it could certainly be adapted to work for other cluster technologies.  Prior to starting on the MP implementation, I think it is useful to conceptually model the implementation.

Regardless of the specific technology, failover clusters tend to have the same general concepts.  Entities that I want to represent are:

  • Cluster nodes – hosts that participate in the failover cluster
    • Monitor for requisite daemons
    • Monitor for quorum state
  • Cluster – a “group,” containing the member nodes as well as clustered resources
    • Roll-up monitors describing the total state of the cluster
  • Service – a clustered service, such as a virtual IP or Web server
    • Monitor each service for availability

These conceptual elements will need to be described in Management Pack ClassTypes and corresponding RelationshipTypes.  A basic diagram of my intended implementation looks like:
MPDiagram

Tools and Commands

For both dynamic discovery of the cluster nodes, as well as monitoring of the cluster resource status, I leveraged the clustat utility.

As an example, the clustat output in my test environment, with two nodes and a single virtual IP address as a service, looks like:

[monuser@clnode1 ~]$ sudo clustat
 Cluster Status for hacluster @ Tue Jul 23 19:24:47 2013
 Member Status: Quorate
Member Name ID Status
 ------ ---- ---- ------
 clnode1 1 Online, Local, rgmanager
 clnode2 2 Online, rgmanager
Service Name Owner (Last) State
 ------- ---- ----- ------ -----
 service:IP clnode1 started

As you can see, the output here can be parsed and used in discovery of cluster nodes, cluster, and services, while also providing health information about the cluster nodes, cluster, and services. Depending on versions, clustat may require privileges (i.e. sudo), but it has the advantage of being a status reporting tool that is not used in cluster configuration, so it is a useful command line tool for monitoring purposes.

Creating the MP

Getting Started

Just like with my previous example, starting off on this MP involves creating a new Management Pack project with the VSAE:

GettingStarted

Read more of this post

UNIX/Linux MP Authoring – Dynamic Discovery of Roles & Applications

In a previous post, I demonstrated the use of Operations Manager’s “Shell Command Templates” to customize monitoring by creating a management pack for monitoring BIND DNS servers. The Shell Command Templates offer a great deal of customization potential through simple wizards, but one limitation is that the templates require the target class to already be defined. In other words, they cannot dynamically discover the presence of an application/role on monitored computers. One way to generate a custom class to represent an application or role is to create a UNIX/Linux Process Monitor with the MP Template and target the Process Monitor to a group containing the UNIX/Linux computers running this service. This will create a custom class representing the process, which can be used as a target for the Shell Command Templates. However, this still requires that the group members are explicitly defined, or dynamically populated based on properties such as the hostname. Such an approach is not a bad option at all, but still not quite dynamic discovery. However, with the UNIX/Linux Authoring Library examples, creating a dynamic discovery of an application or role is a fairly easy undertaking. In this post, I will revisit that prior BIND DNS monitoring walkthrough, but demonstrate how the UNIX/Linux Authoring Library can be used to dynamically discover the DNS servers.

Note:  the example MP described in this blog post (and the VSAE project) can be found in the \ExampleMPs folder of the UNIX/Linux Authoring Library .zip file.

Walkthrough:  Dynamic Discovery for UNIX/Linux, using the Authoring Library examples

I will use the Visual Studio Authoring Extensions for this project, so starting the project entails creating a new MP project:

NewVSAEProject

Following the direction in the Getting Started documentation for the UNIX/Linux Authoring Library, I will then set up MP references and aliases. The suggested references in the Getting Started documentation are mostly sufficient, but I will add Microsoft.Linux.Library.mp (with an alias of Linux) as well in order to target discoveries to the Microsoft.Linux.Computer class.

Management Pack ID Alias
Microsoft.Linux.Library Linux
Microsoft.SystemCenter.DataWarehouse.Library  SCDW
Microsoft.SystemCenter.Library  SC
Microsoft.SystemCenter.WSManagement.Library  WSM
Microsoft.Unix.Library  Unix
Microsoft.Windows.Library  Windows
System.Health.Library  Health
System.Library  System
System.Performance.Library  Perf
Unix.Authoring.Library  UnixAuth

Discovering BIND DNS Servers

I’ll use a basic shell command to detect the installation of BIND on Linux computers in order to dynamically discover it.  This method could be used in multiple ways, such as:

  • Check for the presence of a file or directory (such as an installation directory, init script in /etc/init.d, or configuration file)
  • Check for an installed package with a package manager like rpm
  • Check for a running process with ps

In this case, I’ll look for a config file to indicate that BIND is installed on the computer.  Depending on the UNIX/Linux distro, and version of the application, the artifacts on the system that indicate the installation of the application may vary.  In my example, with BIND 9 on CentOS, I will look for the existence of /etc/named.conf to indicate that BIND is installed. The command I will use is: ls /etc/named.conf |wc –l  This will return a value of 1 if the file exists, and 0 if it does not.

To add the dynamic discovery to the MP, I will add a new Empty Management Pack Fragment to the VS project, named Discoveries.mpx.

mpfragment

Read more of this post

Introducing the UNIX/Linux Authoring Library for SC 2012 – Operations Manager

Over at the TechNet Gallery, I’ve posted a new resource I am calling the UNIX/Linux Authoring Library MP. This is an example library MP project, which features numerous Probe/Write Action and Data Source composite modules and numerous Unit Monitor Types. These modules should cover many common UNIX/Linux authoring scenarios. The library MP is provided as a sealed MP, along with the VSAE source project.

Documentation and examples for the using the Library can be found here.

Some of the key scenarios covered include:

 

The full reference for the library MP can be found on the documentation page here.

Happy authoring!

OpsMgr UNIX/Linux updates

Updates were released this week for UNIX/Linux features multiple versions of OpsMgr, with a good selection of fixes:

CU7 for SCOM 2007 R2 fixes the following issues for UNIX and Linux monitoring:

  • Logical disk performance statistics are not collected for some volume types on Solaris computers.
  • Some Network Adapters on HP-UX computers may not be discovered.
  • Network adapter performance statistics are not collected for HP-UX network adapters.
  • The Solaris 8 and 9 agent may not restart after an ungraceful shutdown.

The HP-UX managements have also been updated for 2007 R2, available in catalog or here.

For 2012 and 2012 SP1, fixes related to UNIX and Linux monitoring are delivered in MP updates. The fixes are described in the KB articles for the Update Rollup releases:

System Center 2012 – Operations Manager

  • When multiple process monitors are used to target the same computer or group, processes may incorrectly monitor some template instances.  Additionally, problems with the monitored processes may not be detected. This issue occurs when each process monitor uses the same name as the process even though different argument filters are used in each process monitor.
  • Logical disk performance statistics are not collected for certain volume kinds of disks on Solaris-based computers.
  • Certain Network Adapters on HP-UX computers may not be discovered.
  • Network adapter performance statistics are not collected for HP-UX Network Adapters.
  • Logical disk performance statistics are not collected for certain kinds of device on Linux-based computers.
  • In Solaris operating systems, memory that is used for the ZFS Adaptive Replacement Cache is considered used memory incorrectly.

UR1 for System Center 2012 SP1 – Operations Manager

  • When multiple process monitors are used to target the same computer or group, processes may incorrectly monitor some template instances.  Additionally, problems with the monitored processes may not be detected. This issue occurs when each process monitor uses the same name as the process even though different argument filters are used in each process monitor.

For both 2012 and SP1, the UNIX/Linux MPs can be found here.

I’d also recommend checking out Bob Cornelisson’s notes on the updates:
http://www.bictt.com/blogs/bictt.php/2013/01/09/ur1-for-system-center-2012

http://www.bictt.com/blogs/bictt.php/2013/01/09/cu7-for-scom-2007-r2

OpsMgr 2012 UNIX/Linux Authoring Templates: More Information on the Shell Command Templates

In my last post, I described the new UNIX/Linux Shell Command templates in OpsMgr 2012, and provided a walkthrough on creating a Management Pack for monitoring BIND DNS servers on Linux. In this post, I will provide more detail on the nuts and bolts of the Shell Command templates, as well as a list of tips & tricks to consider when using the templates.

Using the Shell Command Templates

Read more of this post

OpsMgr 2012 UNIX/Linux Authoring Templates: Shell Command

Many of the OpsMgr authoring examples for UNIX/Linux monitoring that I have described on this blog are based on the use of the WSMan Invoke modules to execute shell commands. This is a really powerful mechanism to extend the capabilities of Operations Manager monitoring, and the 2012 version of Operations Manager includes a new set of templates allowing the creation of rules, monitors, and tasks using UNIX/Linux shell commands directly from the Authoring pane of the console.

The new templates are:

Monitors

  • UNIX/Linux Shell Command Three State Monitor
  • UNIX/Linux Shell Command Two State Monitor

Rules

  • UNIX/Linux Shell Command (Alert)
  • UNIX/Linux Shell Command (Performance)

Tasks

  • Run a UNIX/Linux Shell Command

Note: For the OpsMgr 2012 Release Candidate, the Shell Command Template MP needs to be downloaded and imported.  In the final release, it will be imported by default.

Underneath the covers, all of these templates use the ExecuteShellCommand method of the agent’s script provider with the WSMan Invoke module. This method executes the command and outputs StdOut, StdErr, and ReturnCode. The command can be a path to a simple command, a command or script existing on the managed computer, or a “one-liner” script (a shell script condensed to one line with pipes and semi-colons).  The templates also allow you to select whether to run with the nonprivileged action account, or the privileged account (which also supports sudo elevation).

If you’ve done this kind of UNIX/Linux authoring in 2007 R2, you will quickly see how much easier and faster this can be done in 2012.

To show the use of these templates, I have put together an MP authoring walkthrough for monitoring BIND DNS servers on Linux. This entire MP will be created in the Operations Console, with no XML editing!

Walkthrough: Monitoring BIND on Linux

Read more of this post

Operations Manager – Extending UNIX/Linux Monitoring with MP Authoring – Part IV

Introduction

In Part III of this series, I walked through creation of data sources, a discovery, and a rule for discovering dynamically-named log files and implementing an alert-generating rule for log file monitoring.  In this post, I will continue to expand this Management Pack to implement performance collection rules, using WSMan Invoke methods to collect numerical performance data from a shell command. 

Using Shell Commands to Collect Performance Data

Whether it is system performance data from the /proc or /sys file systems, or application performance metrics in other locations, performance data for UNIX and Linux systems can often be found in flat files.   In this example Management Pack, I wanted to demonstrate using a WSMan Invoke module with the script provider to gather a numeric value from a file and publish the data as performance data.   In many cases, this would be slightly more complex than is represented in this example (e.g. if the performance metric value should be the delta between data points in the file over time), but this example should provide the framework for using the contents of a file to drive performance collection rules.   The root of these workflows is a shell command using the cat command to parse the file, which could be piped to grep, awk, and sed to filter for specific lines and columns.  

Additionally, if the performance data (e.g. hardware temperature or fan speed, current application user or connection count) that you are looking for is not stored in a file, but available in the output of a utility command, the same method could be used by using the utility command instead of cat.

Collecting Performance Data from a File

In this example, the MyApp application stores three performance metrics in flat files in the subdirectory ./perf.   I have built three rules that cat these files, and map the values to performance data.  The three rules are functionally identical, so I will only describe one of them.

Performance Collection Rule:  MyApp.Monitoring.Rule.CollectMyAppMem

Read more of this post