Some Screenshots of the xSNMP Management Packs

Diagram view of an HP ProCurve device:

Diagram view of a Data Domain Restorer:

Health Explorer view for an APC UPS:

Diagram view for an HP ProLiant server:

Performance View for a network interface:

Diagram view for a Brocade Fibre Channel Switch:

Scalability and Performance (Design and Testing) in the xSNMP Management Packs

In this post, I intend to describe some of the challenges in scaling SNMP monitoring in an Operations Manager environment to a large number of monitored objects, as well as my experiences from testing and the approaches that I took to address these challenges with the xSNMP Management Packs.

Background

In spite of the market availability of many task-specific SNMP monitoring applications boasting rich feature sets, I think that a strong case can be made for the use of System Center Operations Manager in this SNMP monitoring role. Using a single product for systems and infrastructure (SNMP) monitoring facilitates unparalleled monitoring integration (e.g. including critical network devices/interfaces or appliances in Distributed Application Models for vital business functions). The rich MP authoring implementation, dynamic discovery capabilities, and object-oriented modeling approach allow a level of depth and flexibility in SNMP monitoring not often found in pure SNMP monitoring tools.

However, Operations Manager is first and foremost a distributed monitoring application, most often depending on agents to run small workloads independently. Inevitably, running centralized monitoring workloads (i.e. SNMP polls) in a distributed monitoring application is going to carry a higher performance load than the same workloads in a task-specific centralized monitoring application that was built from the ground up to handle a very high number of concurrent polls with maximum efficiency. This centralized architecture would likely feature a single scheduler process that distributes execution of polls in an optimized fashion as well as common polling functions implemented in streamlined managed code. With SNMP monitoring in Operations Manager, any optimization of workload scheduling and code optimization more or less falls to the MP author to implement.

While working on the xSNMP Management Packs, I spent a lot of time testing different approaches to maximize efficiency (and thus scalability) in a centralized SNMP monitoring scenario. I’m sure there is always room for continual improvement, but I will try to highlight some of the key points of my experiences in this pursuit.

Designing for Cookdown

Cookdown is one of the most important concepts in MP authoring when considering the performance impact of workflows. A great summary of OpsMgr cookdown can be found here. In effect, the cookdown process looks for modules with identical configurations (including input parameters) and replaces the multiple executions of redundant modules with a single execution. So, if one wanted to monitor and collect historical data on the inbound and outbound percent utilization and Mb/s throughput of an SNMP network interface, a scheduler and SNMP Probe (with VarBinds defined to retrieve the in and out octets counters for the interface) could be configured. As long as each of the rules and monitors provided the same input parameters to these modules for each interface, the scheduler and SNMP Probe would only execute once per interval per interface. Taking this a step further, the SNMP Probe could be configured to gather all SNMP values for objects to monitor in the ifTable for this interface (e.g. Admin Status, Oper Status, In Errors, Out Errors), and these values could be used in even more rules and monitors. The one big catch here is that the SNMP Probe module stops processing SNMP VarBinds once it hits an error. So, it’s typically not a good idea to mix VarBinds for objects that may not be implemented on some agents with OIDs that are implemented on all agents: a single unsupported OID would prevent the values that follow it from being returned.
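To make the interface example concrete, here is a minimal sketch of how one pair of octet-counter samples can feed both the Mb/s and the percent-utilization values. It is written in Python purely for illustration; the management packs themselves do not use this code, and the function names, the 32-bit counter assumption, and the sample numbers are all assumptions of mine rather than anything taken from the MPs.

```python
COUNTER32_MAX = 2 ** 32  # assumes 32-bit octet counters (an assumption, not from the MPs)


def counter_delta(previous, current, modulus=COUNTER32_MAX):
    """Difference between two counter samples, tolerating a single wrap."""
    return (current - previous) % modulus


def interface_rates(prev_octets, curr_octets, interval_seconds, if_speed_bps):
    """Return (megabits_per_second, percent_utilization) for one direction."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    bits_per_second = bits / interval_seconds
    mbps = bits_per_second / 1_000_000
    percent = 100.0 * bits_per_second / if_speed_bps
    return mbps, percent


# One poll result (hypothetical values) serves both calculations,
# which is the kind of sharing that cookdown makes possible.
mbps, pct = interface_rates(1_000_000, 38_500_000, 300, 100_000_000)
print(f"{mbps:.2f} Mb/s, {pct:.2f}% utilization")  # prints "1.00 Mb/s, 1.00% utilization"
```

The point of the sketch is only that both values derive from the same two counter reads, so the rules and monitors that consume them have no reason to poll separately.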

Workflow Scheduling


Introducing the xSNMP Management Pack Suite

Introduction

Over the past several weeks, I’ve been hard at work on some new SNMP management packs for Operations Manager 2007 R2, to replace the Cisco SNMP MP and extend similar functionality to a wide range of SNMP-enabled devices.   In the next few posts, I hope to describe some of the design and development considerations related to these Management Packs, which I am calling the xSNMP Management Pack Suite.   For this post, I hope to give a basic overview of the development effort and resulting management packs.

As I was working on some feature enhancements to the Cisco SNMP Management Pack, and following some really great discussions with others on potential improvements, I concluded that a more efficient and effective design could be realized by aligning the management pack structure along the lines of the SNMP standard itself. To expound on this point, much of the monitoring in the Cisco MP is not specific to Cisco devices, but rather is common to all SNMP devices. The SNMP standard defines a hierarchical set of standard MIBs, and a hierarchical implementation of vendor-specific MIBs, with consideration to the elimination of redundancy. I tried to loosely adapt this model in the xSNMP MP architecture. The first of the MPs, and the one that all of the others depend on, is the root xSNMP Management Pack. This management pack has a few functions:

  1. It performs the base discovery of SNMP devices (the discovery is targeted to the SNMP Network Device class)
  2. It implements monitoring of the SNMP v1/v2 objects for discovered devices and interfaces
  3. It provides a set of standardized and reusable data sources for use in dependent management packs

From there, the remaining management packs implement vendor-specific monitoring. Devices and/or interfaces are discovered for the vendor-specific management packs as derived objects from the xSNMP MP, and most of the discoveries, monitors, and rules utilize the common data sources from the xSNMP MP, which makes the initial and ongoing development of vendor-specific MPs much more efficient.

Controlling Interface Monitoring

One of the topics frequently commented on with the Cisco SNMP Management Pack, and a subject of much deliberation, was that of selecting network interfaces for monitoring. Even determining the optimal default interface monitoring behavior (disabled vs. enabled) isn’t a terribly easy decision. For example, a core network switch in a datacenter may require that nearly all interfaces are monitored, while a user distribution switch may just require some uplink ports to be monitored. In the end, I decided on an approach that seems to work quite well. In the xSNMP Management Pack, all interface monitoring is disabled by default. A second, unsealed management pack is also provided and includes groups to control interface monitoring (e.g. Fully Monitored, Not Monitored, Status Only Monitored). Overrides are pre-configured in this MP to enable/disable the appropriate interface rules and monitors for these groups. So, to enable interface monitoring for all Ethernet interfaces, a dynamic group membership rule can be configured to include objects with an interface type of 6 (ethernetCsmacd), or, if critical interfaces are consistently labeled on switches with an Alias, the Interface Alias can be used in rules for group population.
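The two group-population strategies just described can be sketched as simple filters. This is Python for illustration only, not management pack XML; the `Interface` class, the function names, and the "UPLINK" alias marker are hypothetical, and only the interface type value 6 comes from the text.

```python
from dataclasses import dataclass

IFTYPE_ETHERNET_CSMACD = 6  # the interface type value mentioned above


@dataclass
class Interface:
    name: str
    if_type: int
    alias: str = ""


def members_by_type(interfaces, if_type=IFTYPE_ETHERNET_CSMACD):
    """Mimic a membership rule that includes every interface of one type."""
    return [i for i in interfaces if i.if_type == if_type]


def members_by_alias(interfaces, marker):
    """Mimic a membership rule that matches a label in the interface alias."""
    return [i for i in interfaces if marker.lower() in i.alias.lower()]


# Hypothetical inventory for a single switch.
inventory = [
    Interface("Gi0/1", 6, "UPLINK to core"),
    Interface("Gi0/2", 6, "user port"),
    Interface("Vlan10", 53, ""),
]

print([i.name for i in members_by_type(inventory)])             # ['Gi0/1', 'Gi0/2']
print([i.name for i in members_by_alias(inventory, "uplink")])  # ['Gi0/1']
```

Filtering by type casts a wide net, while filtering by alias only works as well as the labeling discipline on the switches.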

Organizing Hosted Objects

For each of the management packs, I tried to take a standardized approach for hierarchical organization of hosted objects and their relationships. This organization was facilitated primarily through the use of arbitrary classes to contain child objects. So, rather than discover all interfaces of a device with a single hosting relationship to the parent, an intermediary logical class (named “Interfaces”) is discovered with parent and child hosting relationships. This approach has three primary benefits: 1) the graphical Diagram View is easier to navigate, 2) the object hierarchy is more neatly organized for devices that may be monitored by multiple MPs (e.g. a server monitored by three MPs for SNMP hardware monitoring, O/S monitoring, and application monitoring), and 3) the organization of hosted objects is consistent, even for devices with multiple entities exposed through a single SNMP agent.

Scalability

With loads of invaluable help from some volunteer beta testers, a great deal of time has been spent testing and investigating performance and scalability for these management packs. While I will save many of these details for a later post, I can offer a few comments on the topic. In all but the smallest SNMP-monitoring environments, it’s highly advisable to configure SNMP devices to be monitored by a node other than the RMS. For larger environments, one or more dedicated Management Servers or Agent Proxies (Operations Manager agents configured to proxy requests for SNMP devices) are preferred for optimal performance. From our testing with these Management Packs, a dedicated agent proxy can be expected to effectively monitor between 1,500 and 3,500 objects, depending on the number of monitors/rules, the intervals configured, and the processing power of the agent proxy. By object, I am referring to any discovered object that is monitored by SNMP modules, such as devices, interfaces, fans, file systems, power supplies, etc. So, monitoring a switch infrastructure with 4,000 to 6,000 monitored network interfaces should be doable with two dedicated agent proxy systems.
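As a rough planning aid, the sizing guidance above reduces to a ceiling division. The sketch below is Python for illustration; `proxies_needed` is a hypothetical helper, and the per-proxy capacities are just sample points from the observed range, not guarantees.

```python
import math


def proxies_needed(monitored_objects, objects_per_proxy):
    """Smallest number of agent proxies that covers the object count."""
    return math.ceil(monitored_objects / objects_per_proxy)


# Sample capacities taken from within the 1,500 to 3,500 range quoted above.
for capacity in (1500, 3000, 3500):
    print(capacity, proxies_needed(6000, capacity))
# prints:
# 1500 4
# 3000 2
# 3500 2
```

In other words, the two-proxy estimate for 6,000 interfaces assumes each proxy sustains roughly 3,000 objects or more; with many monitors, short intervals, or modest hardware, more proxies would be needed.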

I intend to write in greater detail about these topics in the coming weeks, and hope to post the first public beta version of these management packs soon.

Automating WSS 3.0 Backups with a Script

Although SQL backups of the content databases for a Windows SharePoint Services farm can be used for data recovery, it’s usually a good idea to also perform backups through the stsadm.exe utility to facilitate site and object-level restores.   I recently took on a task to script a more robust solution for the automation of WSS farm backups, which I will describe here.

The stsadm.exe utility can be used to back up in two modes, site collection and catastrophic. The site collection method backs up an individual site collection and its content, specified by URL, and the catastrophic backup method backs up the entire farm or a specified object in a full or differential mode. I opted to go with the catastrophic backup method in this script to support differential backups and eliminate the requirement to enumerate individual sites for backup operations.

WSS Backup Script Overview

The script is a bit too long to post in its entirety, but it can be downloaded here. The script accepts three parameters:

  • The target backup directory path
  • The backup type (full or differential)
  • The number of backups to maintain

The backup operation is relatively simple: the script uses the Exec method of the WScript.Shell object to execute the stsadm.exe command (after querying the registry to determine the install path of WSS).

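' Quote the executable path and backup directory so paths containing spaces are passed intact
Command = """" & sPath & "\STSADM.exe"" -o backup -Directory """ & BackupDir & """ -BackupMethod " & BackupType

Set oExec = WshShell.Exec(Command)
' Drain StdOut so a full output buffer cannot block stsadm
While Not oExec.StdOut.AtEndOfStream
    OutPut = oExec.StdOut.ReadAll
Wend
' Status stays 0 while the process is running; poll until it exits
Do While oExec.Status = 0
    WScript.Sleep 5000
Loop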

To improve monitoring of the operation, the script shells out to the eventcreate.exe utility to log status to the Windows Application Log. (Although WScript.Shell supports basic event logging through its LogEvent method, I wanted to control the event source and ID, so the eventcreate.exe utility seemed to be a better option.)

' Log the outcome to the Application log with a custom source and event ID
If blSuccessful Then
   Command = "Eventcreate /L Application /T INFORMATION /ID 943 /SO ""SharePoint Backups"" /D """ & sMesg & """"
Else
   Command = "Eventcreate /L Application /T WARNING /ID 944 /SO ""SharePoint Backups"" /D """ & sMesg & """"
End If
Set oExec = WshShell.Exec(Command)

The most complex operation of this WSS backup automation script is the maintenance of old backups.  The stsadm backup operation maintains an XML file named spbrtoc.xml in the backup directory with meta-data related to past backups.   While an example of deleting backups older than a certain time interval can be found here, I wanted to maintain past backups based on a count (x number of fulls, x number of differentials).    To implement this, the script loads meta-data from the Table of Contents XML file into an array, determines the number of backups to be purged (correlated to the current backup operation type – full or differential), flags the oldest backups for deletion, and then deletes the related backup directories and XML nodes.  
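The count-based purge selection described above can be sketched as follows. This is Python rather than the script's VBScript and is not the script's actual code: the record fields (`method`, `finished`, `directory`) are hypothetical stand-ins for whatever the script reads from spbrtoc.xml, and the sample directory names are illustrative.

```python
from datetime import datetime


def select_backups_to_purge(history, backup_type, keep):
    """Return the oldest backups of one type beyond the newest `keep` entries."""
    matching = [e for e in history if e["method"].lower() == backup_type.lower()]
    matching.sort(key=lambda e: e["finished"])  # oldest first
    excess = len(matching) - keep
    # Guard against a negative slice when fewer than `keep` backups exist.
    return matching[:excess] if excess > 0 else []


# Hypothetical history: three full backups and two differentials.
history = [
    {"method": "Full", "finished": datetime(2009, 5, 3), "directory": "spbr0000"},
    {"method": "Differential", "finished": datetime(2009, 5, 4), "directory": "spbr0001"},
    {"method": "Full", "finished": datetime(2009, 5, 10), "directory": "spbr0002"},
    {"method": "Differential", "finished": datetime(2009, 5, 11), "directory": "spbr0003"},
    {"method": "Full", "finished": datetime(2009, 5, 17), "directory": "spbr0004"},
]

purge = select_backups_to_purge(history, "full", keep=2)
print([e["directory"] for e in purge])  # ['spbr0000']
```

Each returned entry identifies a backup directory to delete along with its matching node in the table of contents; differentials are counted separately, so a full backup run never purges them.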

Automating With System Center Operations Manager 2007
