SCOM: A Net-SNMP Management Pack for UNIX Monitoring, Part I – Desiging the MP

The Net-SNMP agent is a cross-platform SNMP agent that is often bundled with UNIX (e.g. Solaris) and Linux distributions (including VMWare and Checkpoint Splat).   The agent implements some relatively standard MIBs, such as the UC Davis (UCD-SNMP-MIB) and HOST-RESOURCES MIBs, exposing a good selection of operating system metrics for monitoring.  Furthermore, the agent is remarkably extensible in a few different ways.  The agent can be extended with “dlMod” module extensions to integrate additional SNMP components, such as the VMWare or HP Insight Management Agent modules.   Additionally, the Net-SNMP agent can be configured to proxy for another SNMP agent listening on a different port, which is how the Sun SunFire Management Agent is typically configured.   Lastly, the Net-SNMP agent provides the ability to define processes, files, extensible objects, load average objects, and disks for specific monitoring.  For each of these, directives are configured in snmpd.conf with parameters and thresholds, and an SNMP table is exposed that will return the monitored value and status value to an SNMP poll, on-demand.   I had previously  posted an article on using the ext (Extensible Object/Arbitrary Extension) capabilities of the Net-SNMP agent to utilize shell scripts for monitoring of Sun hardware.   With this degree of flexibility, it’s easy to understand why the Net-SNMP agent is so widely utilized.

Design Approach for the Net-SNMP Management Pack

Building on the lessons learned from developing the Cisco Management Pack, I wanted to create a management pack with similar capabilities for monitoring of Net-SNMP agents, followed by a set of management packs for monitoring of vendor-specific hardware (e.g. HP Proliant servers running Linux).   After some thought, I decided that it would be advantageous to split the Net-SNMP management pack into two separate management packs, one that defines the base classes and performs discovery of the classes and one for the actual monitoring.   The reason for this is that I wanted a base management pack without any actual monitoring functions so that it could easily (and more flexibly) be reused for future hardware monitoring management packs.   To illustrate this point, if the native UNIX agent capabilities of R2 are being used to monitor OS-level objects, there is still a need to discover the Net-SNMP agent and create base classes for use in SNMP-based hardware monitoring, but in this case, monitoring OS-level objects with SNMP and the OpsMgr agent would be redundant.  With the multiple MP approach, just the base Net-SNMP management pack could be imported along with the hardware monitoring management packs, skipping the Net-SNMP monitoring management pack altogether.  Additionally, I wanted to leave the monitoring management pack unsealed for easy customization, and MP references can only be made to sealed management packs.  In this way, the Net-SNMP Library MP will be sealed, but the monitoring MP can remain unsealed.

The Net-SNMP Library Management Pack

Read more of this post

SCOM: Advanced SNMP Monitoring Part IV: Making the Cisco MP SP1 Compatible

As previously mentioned, the Cisco Management Pack that I have been describing in the past several posts is specific to OpsMgr R2, due to its reliance on the capability of the System.SnmpProbe’s function to return multiple items in an SNMP walk, which is a feature introduced in R2.   My intention was to design the management pack so that it could easily be made SP1 compatible by substituting the discovery methodology, but keeping the classes and monitoring objects intact.   I am still doing more testing to finalize the SP1 version, but in this post, I am going to introduce the discovery methods that I have implemented for SP1 compatibility.

In general, when dealing with SNMP objects stored in tables, options in SCOM 2007 SP1 are a bit limited.  If the table objects are static, rules could be configured targeting specific OID’s, but this is unlikely to be a common scenario.   Another option (and a quite good one) would be to segregate monitoring duties between SCOM (for servers and applications) and a task-specific SNMP monitoring tool like SolarWinds or JalaSoft for SNMP device monitoring.   But for the purposes of this management pack, the option I pursued involves wrapping an external SNMP provider in WSH scripts to perform the discovery.   I chose to use the WMI provider because of its easy use in VBscripts and object-oriented capabilities, but something like the Net-SNMP binaries could be called with ShellExec methods in a vbs script to accomplish the same goal.  

Before I get into the use of the WMI SNMP provider in this MP, there is some background information that should be discussed.  Firstly, when the WMI SNMP provider is installed, it includes support only for the RFC-1213 MIB.   In order to use the WMI SNMP provider for OID’s not in the RFC-1213 MIB, the MIB files must be converted to MOF files with smi2smir.exe and then imported with mofcomp.exe.    This can get a bit tricky, largely depending on how well constructed the vendor’s MIB is and the dependencies of the MIB.   Once these steps are completed, the WMI SNMP provider exposes SNMP objects as collections that can be accessed natively in VBscripts.   However, the WMI provider is not terribly scalable and when overloaded with requests, WMI will cease to function in a timely fashion.   I had done some testing with using the WMI SNMP provider for rules and monitors, but decided against this approach due to the ease with which the provider can be overloaded when monitoring a handful of objects with reasonably aggressive frequencies.   On the other hand, discoveries are not as time-sensitive as monitors/rules and don’t need to run as frequently, so by using the WMI SNMP provider for discoveries, and the System.SnmpProbe provider for monitors and rules, this hybrid approach should remain pretty scalable. 

Implementation Overview

Read more of this post

SCOM: Updates to the Cisco Management Pack

I have uploaded version 1.01.27 of the Cisco Management Pack.   This minor update includes a few fixes and some new features added in response to some helpful comments from readers who have tried this MP out.  The changes in this version are:

  1. Group Population rules have been fixed so that they can be edited in the GUI
  2. Performance data collection has been modified to not collect data when the interface is in a partially discovered state.  This should prevent duplication of performance counters for interfaces going forward.   Because the Interface discovery occurs in two parts: 1) discovering the interfaces and 2) discovering the interface properties, I added a property to the Interface class named: Discovered.  When the Interface discovery runs (which just discovers the index), the discovered property will be set to False.  Once the Interface Property discovery runs, this property will be set to True.   The performance collection rules have been modified to only collect data if this property is set to true. 
  3. Index discovery can now be filtered with a RegEx expression for a device via an override (create the override on the Discover Cisco Interfaces discovery).   By default, the RegEx expression:  (.|..|…) will be used to discover interfaces, which will discover all interfaces with indexes with 1, 2, or 3 digits in length.  To filter by interface index, the expression can be modified to:  (1|2|3|10|15) where the numbers are the index values for the interfaces to be discovered, separated by pipe characters.   The override for this filter can be targeted at all Cisco devices, a group, or specific devices.
  4. I have added a new property to the Interface class named:  “Speed – Override.”  This property allows for manually defining a speed value for the interface to be used in utilization calculations.  There are two scenarios where this is useful: 1) sometimes the value reported in the RFC-1213 ifTable for an interface’s speed isn’t accurate, and 2) if an Ethernet port is the last monitored hop before a lower bandwidth wide-area connection (i.e. the vendor equipment is not monitored), it can be useful to monitor the Ethernet port as if its speed were equivalent to the wide-area connection bandwidth.   For example, if a switch is the last monitored device before connecting to a vendor-managed MetroE connection with a maximum bandwidth of 500Mbit, monitoring utilization on the gigabit switch port will be more accurate if the speed were overridden to 500Mbit.  This property can be set with an override on the Discover Cisco Interface Properties object discovery, by setting the OverrideIFSpeed value to the desired speed, in bits per second.

SCOM: Advanced SNMP Monitoring Part III: The Completed Cisco Management Pack

Note:  This management pack has been replaced by the more feature-rich xSNMP suite.

I still have some more full-scale testing to complete, but I have finished the first version of the Cisco monitoring management pack that I have described in the last few posts.   This version is R2 only, but I hope to have an SP1 version of it complete soon. 

Read more of this post

SCOM: Advanced SNMP Monitoring Part II: Designing the SNMP Monitors

For the Cisco Management Pack, a handful of different approaches were required when designing the monitors and rules and their supporting workflows.  These approaches can generally be placed into three categories:

  • Simple SNMP GET operations
  • Operations that require data manipulation
  • Operations that require access to previously collected values  

For example, monitoring of an interface’s operational status only requires current SNMP GET requests (to retrieve the ifOperStatus and ifAdminStatus values) from the SNMP ifTable.  Condition detection for the monitor type definition can handle the SNMP values as they are presented with no further manipulation.   However, to monitor a value such as the percent free bytes on a Cisco memory pool, data manipulation is required.  In this example, a percentage value is not exposed via SNMP, but free and available values are.  So, to calculate a percentage, the free and available bytes values must be summed and then the free value divided by that sum.  And lastly, in some cases, a value from a previous poll must be referenced to detect the desired condition, such as monitoring for an increase in the collisions reported for a Cisco interface.  In this example, the locIfCollisions values can be retrieved from the locIfTable, but this is a continually augmenting counter, which is difficult to monitor.   By recording the value from the immediately previous poll, and comparing it with the current poll, a delta value can be established for that polling cycle to determine the number of collisions recorded in a single polling cycle. 

Read more of this post

SCOM: Advanced SNMP Monitoring Part I: Discovering Cisco Devices and Hosted Classes

When limited to editing within the Authoring pane of the Operations Console in OpsMgr, SNMP monitoring options are relatively limited – only SNMP GET requests are supported and data cannot be manipulated before calculating health state.   However, with some pretty heavy authoring, very complex SNMP monitoring becomes completely viable, particularly in R2.   In the next several posts, I will be describing some methods to perform advanced SNMP discovery and monitoring in OpsMgr 2007, such as discovering hosting relationships for entities like network interfaces, as well as passing Snmp data to script probes in order to perform state change monitoring and mathematical operations.  When all is said and done, I will post my management packs for SNMP monitoring of Cisco, Checkpoint, and Net-SNMP devices.   In this post, I will describe the methods I used for custom discovery of Cisco SNMP devices and their hosted entities (interfaces, power supplies, etc).

Cisco SNMP Management Pack: Designing the Classes and Groups

In order to facilitate monitoring of the device (e.g. CPU, Memory, up/down status), individual interfaces (e.g. status, utilization, resets), and Cisco EnvMon objects (e.g. Power Supplies, Temperature, etc), each of these entities are best represented as a unique class with hosting relationships.   For the objects stored in tables, there are two feasible options for discovery.  In SCOM 2007 R2, the ‘walkreturnmultipleitems’ attribute of the System.SnmpProbe probe action can be set in order to walk an SNMP table and return each snmpdata item individually.   In SCOM 2007 SP1, this option is not available, requiring the use of a script discovery with an external snmp probe (such as WMI) in order to retrieve the discovery items.  In this post, I will describe the R2 method, and I will address the script option using the WMI SNMP provider in a later post.  In either case, the class design will be similar.

For monitoring of Cisco SNMP devices, I created five classes:

  • CiscoSNMP.Class.CiscoDevice
  • CiscoSNMP.Class.CiscoDevice.Interface
  • CiscoSNMP.Class.CiscoDevice.PowerSupply
  • CiscoSNMP.Class.CiscoDevice.Fan
  • CiscoSNMP.Class.CiscoDevice.TemperatureSensor

For the four hardware entities, I created a hosting relationship so that they are hosted by the CiscoDevice class.   The classes, properties, and relationships are represented on this diagram.

Read more of this post

WebMon: A SCOM Management Pack for Basic Web Site Monitoring, Configured with a Single XML File, Part II

The WebMon URL Monitoring management pack that I described in the previous post, can be downloaded here.    Notes on deploying and configuring the MP are as follows:

Overview:

This management pack for SCOM 2007, is intended to provide basic web monitoring for multiple web sites while being very easy to deploy.   All configuration for each monitored URL is performed by editing a single XML file on the Watcher Node.   The management pack implements three classes:  1) WebMon Watcher Node, which hosts 2) WebMon Request and 3) WebMon Secure Request classes.   The Request and Secure Request classes are identical, except the Secure Request class utilizes NTLM/Integrated authentication through a RunAs profile.   The monitors and rules implemented are as follows (for each of the two request clasess):

  • Monitors:
    • DNS Resolution Failure
    • Status Code (greater than a threshold)
    • Reachable
    • Error Code
    • CA Untrusted
    • Certificate Expired
    • Certificate Invalid
    • Response Time (greater than a threshold)
  • Rules:
    • Response Time Performance Collection

The monitors are rolled-up into an aggregate monitor used for alerting for each Request class.   A set of views are also created under the Web Application folder in the Monitoring section of the SCOM console.  These include state views for the Watcher Nodes, two Request classes, and a performance view displaying the historical response time for all requests. 

The advantage of this management pack over the SCOM the Web Application monitoring implementation for basic web monitoring is that it can be rapidly deployed and configured by simply editing the XML configuration file. 

Deployment and Configuration:

To deploy the WebMon URL Monitoring Management Pack:

  • Copy the sample webmonconfig.xml to a location on a local disk drive of each node intended to be a watcher node (the default path is C:\webmon\webmonconfig.xml)
  • Edit the configuration file with the desired settings (see the XML configuration section below)
  • Import the WebMon.xml file using the Import Management Pack function in the Operations Manager console, under administration
  • Access the Authoring section of the Opeartions Manager console, click the Change Scope link and limit the scope to WebMon Watcher Node. 
  • Right click the “WebMon Discovery” object under “Discovered Type: WebMon Watcher Node,” and choose Overrides->Override the Object Discovery->For a Specific Object of Class: Windows Server.   Select the server designated as Watcher Nodes, and override the object discovery to be “enabled.”
  • If the webmonconfig.xml file is deployed to a non-default location, override the script arguments and update the first parameter (c:\webmon\webmonconfig.xml) to reflect the actual script location.
  • The default interval for the object discovery is 15 minutes.  If this needs to be changes, edit the properties of the WebMon Discovery object, and adjust the schedule accordingly. 
  • If any sites are to be monitored with NTLM/Integrated authentication, a RunAs Profile must be configured.
    • Create a new RunAs account to be used by the watcher node, or determine an existing account to use.  
    • In the Administration section of the Operations Manager console, click RunAs Profiles.  Edit the properties of the WebMon Request RunAs Profile. 
    • Assign a RunAs account for the designated Watcher Node

Configuration

All configurable elements of the WebMon URL Monitoring Management Pack can be set in the webmonconfig.xml on the Watcher Node.   Multiple requests can be defined in the configuration file by adding another <request/> element.   The discovery script performs some basic validation, but the XML configuration should be edited carefully in order to prevent inadvertent errors due to invalid configuration.

The XML configuration file looks like:

 <?xml version=”1.0″ encoding=”utf-8″?>
<webmonconfig>
  <requests>
    <request>
      <requesturl>http://www.google.com</requesturl&gt;
      <responsetimethreshold>10</responsetimethreshold>
      <retrycount>1</retrycount>
      <pollinginterval>300</pollinginterval>
      <statuscodevalue>399</statuscodevalue>
      <usentlm>false</usentlm>
    </request>
    <request>
      <requesturl>http://www.microsoft.com</requesturl&gt;
      <responsetimethreshold>15</responsetimethreshold>
      <retrycount>0</retrycount>
      <pollinginterval>180</pollinginterval>
      <statuscodevalue>399</statuscodevalue>
      <usentlm>false</usentlm>
    </request>
    </request>
  </requests>
</webmonconfig>

The configuration elements are:

requests/request

  • requesturl:  the URL of the request, either http:// or https://
  • responsetimethreshold:  the response time threshold in seconds , if this is surpassed, a warning alert will be generated
  • retrycount:  the number of attempts to retry the request, 0 or greater
  • pollinginterval:  the interval in seconds between requests
  • statuscodevalue;  an error alert will be generated if the reponse status code is greater than this value
  • usentlm:  input true for this value if the site requires NTLM authentication.  The credentials used are defined in the WebMon Request URL RunAsProfile

Using the Management Pack

For both the WebMon Request and WebMon Secure Request classes, aggregate monitors are configured to generate alerts if any of the monitors trigger a warning or error health state.   Health states can be viewed in the Monitoring section of the Operations Manager Console, under the Web Application\WebMon URL Monitoring folder.   The views include state views for the WebMon Watcher Node, WebMon Request, and WebMon Secure Request classes as well as a view to display the collected response time data for all of the URL requests.  While overrides on the individual monitors can be configured, configuration should be performed in the webmonconfig.xml file. 

Notes on Editing the Management Pack

Prior to editing the management pack, please reference the link below to read more about the design of this management pack.   Due to some issues with the way the Authoring Console handles MonitorTypes and the use of variables in some configuration elements, all edits should be made in an XML editor. 

Support and More Info

This management pack is provided as-is, with no implied or explicit warranty.    For more info about the design and development of this management pack, reference:  https://operatingquadrant.com/2009/08/22/webmon-a-scom-management-pack-for-basic-web-site-monitoring-configured-with-a-single-xml-file-part-i/

WebMon: A SCOM Management Pack for Basic Web Site Monitoring, Configured with a Single XML File, Part I

The native Web Application monitoring capabilities of SCOM are impressive to say the least, and provide excellent functionality for in-depth monitoring of complex web application transactions.   However, the administrative effort required to configure web application monitoring makes the implementation less than ideal for wide-scale basic web site monitoring.  In most cases, required web monitoring would entail monitors just for status code, reachability, response time, and perhaps some other checks like certificate validity.  I wanted to create a custom management pack to implement these monitors, with a minimal degree of configuration effort.   While researching available options, I came across a post by Russ Slaten describing a way to utilize the Microsoft SystemCenter WebApplication Library implementation to accomplish a similar goal.  

However, I wanted to take it a step further.  The key decision point in my design approach was that I wanted to be able to deploy a configuration file (in XML format) to each watcher node involved and define the URL’s and monitoring properties in that file.   I have completed this management pack, and I’m quite happy with it thus far.   I’ve described the Management Pack and development process below.

The WebMon MP, Design and Development

Read more of this post