xSNMP Reports 1.1.1 Now Available!

The xSNMP Reports version 1.1.1 package is now available at manage-x.net. This suite of management packs adds value to the xSNMP suite by implementing OpsMgr reports for the data collected by performance rules in the xSNMP management packs. Reporting management packs included are:

  • xSNMP for APC Reports
  • xSNMP for APC NetBotz Reports
  • xSNMP for Brocade Reports
  • xSNMP for Check Point Secure Platform Reports
  • xSNMP for Cisco Reports
  • xSNMP for Data Domain Reports
  • xSNMP for Dell PowerEdge Reports
  • xSNMP for HP ProCurve Reports
  • xSNMP for IBM AIX Reports
  • xSNMP for Juniper Networks Reports
  • xSNMP for Juniper-NetScreen Reports
  • xSNMP for NetApp Reports
  • xSNMP for Net-SNMP Reports
  • xSNMP for SonicWALL Reports

Requirements are: OpsMgr 2007 R2, the xSNMP suite, and an OpsMgr reporting implementation. Like the xSNMP suite, the reporting MPs are licensed under the GNU GPL, and unsealed versions are provided.


xSNMP Version 1.1.1 (Alpha) Available Now!

I have posted the 1.1.1 (Alpha) version of the xSNMP suite at manage-x.net. While the updates in this release are relatively minor, I would still consider this an early test version and recommend deploying it in test environments before production. The abbreviated change log for this release is:

New Management Packs:

  • xSNMP for APC NetBotz Management Pack
  • xSNMP for IBM AIX Management Pack
  • xSNMP for Juniper Networks Management Pack
  • xSNMP for SonicWALL Management Pack

New Features:

  • Added support for monitoring of Net-SNMP Extend objects (xSNMP for Net-SNMP Management Pack)
  • Added a three-state monitor for Net-SNMP Exec objects (xSNMP for Net-SNMP Management Pack)
  • New Cisco Firewall Subsystem monitoring for PIX and ASA firewalls
  • Added new data sources to the xSNMP MP to decrease data source redundancy
  • Individual inbound/outbound speeds can now be set through the Speed Override discovery in the xSNMP MP
  • All views are now publicly accessible

Issues Resolved:

  • Fixed an uncommon issue with network interface utilization calculations that could result in invalid values being calculated if a previous poll returned null data
  • Updated the list of Device OIDs for APC UPS devices to include missing UPS models
  • Fixed an issue with an incorrect OID specified for HP Proliant SCSI (IDA) storage health monitoring
  • Fixed an issue with the Cisco Default Gateway Changed alert-generating rule

Development Updates

While this blog has been a bit quiet lately, it has not been for lack of effort. In the next two weeks, a minor update to the xSNMP suite should be ready with a few bug fixes and feature improvements. This update will also include three new management packs:

  • xSNMP for Juniper Networks
  • xSNMP for SonicWALL
  • xSNMP for APC NetBotz

We’re still finishing up testing and fine-tuning on the Oracle Unix/Linux MP as well.

These updates will be described in more detail here and available for download at manage-x.net soon.

xSNMP Management Pack Suite Version 1.1.0 (Beta)

After weeks of pilot testing, the xSNMP Suite version 1.1.0 is ready for public beta release.  In addition to some bug fixes and feature enhancements, this release includes four completely new management packs:

  • xSNMP Juniper-NetScreen MP:   Implements comprehensive monitoring of NetScreen firewall devices
  • xSNMP NetApp MP:  Implements comprehensive monitoring of NetApp storage devices
  • xSNMP Sun Hardware MP: Implements monitoring of Sun Server hardware, through the SunFire Management Agent, Sun ILOM out-of-band management interface, or the Fujitsu XSCF out-of-band management interface
  • xSNMP Dell PowerEdge MP:  Implements monitoring of hardware for Dell PowerEdge devices via the Dell OpenManage agent or DRAC out-of-band management interface

These management packs are in addition to the other 11 management packs previously included in the suite:

  • xSNMP Management Pack – Implements filtered discovery and monitoring of SNMP devices and interfaces that support the standard RFC1213 MIB, IF-MIB, and EtherLike-MIB. This management pack is the core of the xSNMP Suite and contains public data sources that are utilized in the other optional management packs.
  • xSNMP Overrides Management Pack – This unsealed management pack can be used as a container for overrides, but also provides preconfigured groups and overrides for easily controlling interface monitoring through groups of network interfaces.
  • xSNMP APC Management Pack – Implements monitoring for APC Rackmount PDU, UPS, Automatic Transfer Switch, and Environmental Monitor devices.
  • xSNMP Brocade Management Pack – Implements chassis monitoring for Brocade Fibre-Channel switch devices (Fibre-Channel ports are monitored as network interfaces with the xSNMP MP).
  • xSNMP Check Point Secure Platform Management Pack – Implements module health and firewall HA failover monitoring for Check Point Secure Platform firewall devices.
  • xSNMP Cisco Management Pack – Implements additional monitoring for Cisco devices, primarily chassis hardware monitoring for devices that support the EnvMon MIB, Entity-MIB, or Cisco-Stack MIB.
  • xSNMP Data Domain Management Pack – Implements monitoring for the performance, hardware status, and replication status of Data Domain Restorer storage appliances.
  • xSNMP HP ProCurve Management Pack – Implements component health monitoring for HP ProCurve switches and wireless access points.
  • xSNMP HP Proliant Management Pack – Implements hardware health monitoring for SNMP-enabled HP servers that support the Proliant Insight Management Agents.
  • xSNMP Net-SNMP Management Pack – Implements operating system monitoring for Net-SNMP agent devices, such as UNIX/Linux servers, through the UCD and Host-Resources MIBs.
  • xSNMP Syslog Management Pack – Provides warning and critical alert-generating rules that can be enabled and filtered with overrides to alert on incoming syslog messages from discovered SNMP devices.

Behind the scenes, significant improvements have been made in the class structure and shared data source configuration in the root xSNMP MP.  These changes facilitate better organization and easier future development of dependent management packs.  These changes are probably best illustrated with a screenshot of the improved xSNMP base class hierarchy:

I greatly appreciate the testing carried out by the volunteer pilot testers, as well as all of the feedback from those who have tested the xSNMP MPs. I specifically want to thank Chris L. for his assistance in every stage of the development of the NetScreen MP, and Raphael Burri for all of his help in designing the organizational improvements in the xSNMP MP.

The updated xSNMP Suite can be downloaded here.

Note: One impact of the reorganization in the xSNMP MP is that direct upgrade from previous versions is not possible; the xSNMP MPs must be removed prior to importing the new version. To assist in this regard, I have provided a PowerShell script that can be used to automate the upgrade process and preserve configured overrides (in most cases). The prerequisites and procedures for upgrading are covered in detail in the Upgrading_xSNMP_To_1.1.0 document included in the download package.

xSNMP Management Packs – Beta Version 1.0.8 Release

After many weeks of development effort and testing, I have made the latest beta version of the xSNMP Management Packs available for download. Before getting into any more detail, I would be remiss if I did not acknowledge the invaluable help provided by some of the volunteers who tested these management packs through all stages of development. Many thanks in particular to Chris and Davey, who played a huge role in every stage of the development of the xSNMP MPs. Many thanks also to Gary and Björn for their great help in testing the MPs.

The included documentation covers recommendations for deployment and configuration as well as the details of the management packs.  Additional information about performance considerations in large SNMP monitoring environments can be found in this previous post.   At present, the following management packs are included in this suite, and more are currently in the works.

  • xSNMP Management Pack – Implements filtered discovery and monitoring of SNMP devices and interfaces that support the standard RFC1213 MIB, IF-MIB, and EtherLike-MIB. This management pack is the core of the xSNMP Suite and contains public data sources that are utilized in the other optional management packs.
  • xSNMP Overrides Management Pack – This unsealed management pack can be used as a container for overrides, but also provides preconfigured groups and overrides for easily controlling interface monitoring through groups of network interfaces.
  • xSNMP APC Management Pack – Implements monitoring for APC Rackmount PDU, UPS, Automatic Transfer Switch, and Environmental Monitor devices.
  • xSNMP Brocade Management Pack – Implements chassis monitoring for Brocade Fibre-Channel switch devices (Fibre-Channel ports are monitored as network interfaces with the xSNMP MP).
  • xSNMP Check Point Secure Platform Management Pack – Implements module health and firewall HA failover monitoring for Check Point Secure Platform firewall devices.
  • xSNMP Cisco Management Pack – Implements additional monitoring for Cisco devices, primarily chassis hardware monitoring for devices that support the EnvMon MIB, Entity-MIB, or Cisco-Stack MIB.
  • xSNMP Data Domain Management Pack – Implements monitoring for the performance, hardware status, and replication status of Data Domain Restorer storage appliances.
  • xSNMP HP ProCurve Management Pack – Implements component health monitoring for HP ProCurve switches and wireless access points.
  • xSNMP HP Proliant Management Pack – Implements hardware health monitoring for SNMP-enabled HP servers that support the Proliant Insight Management Agents.
  • xSNMP Net-SNMP Management Pack – Implements operating system monitoring for Net-SNMP agent devices, such as UNIX/Linux servers, through the UCD and Host-Resources MIBs.
  • xSNMP Syslog Management Pack – Provides warning and critical alert-generating rules that can be enabled and filtered with overrides to alert on incoming syslog messages from discovered SNMP devices.

Feedback is, of course, welcomed.

Some Screenshots of the xSNMP Management Packs

Diagram view of an HP ProCurve device:

Diagram view of a Data Domain Restorer:

Health Explorer view for an APC UPS:

Diagram View for an HP Proliant server:

Performance View for a network interface:

Diagram view for a Brocade Fibre Channel Switch:

Scalability and Performance (Design and Testing) in the xSNMP Management Packs

In this post, I intend to describe some of the challenges in scaling SNMP monitoring in an Operations Manager environment to a large number of monitored objects, as well as my experiences from testing and the approaches that I took to address these challenges with the xSNMP Management Packs.

Background

In spite of the market availability of many task-specific SNMP monitoring applications boasting rich feature sets, I think that a strong case can be made for the use of System Center Operations Manager in this SNMP monitoring role. Using a single product for systems and infrastructure (SNMP) monitoring facilitates unparalleled monitoring integration (e.g. including critical network devices/interfaces or appliances in Distributed Application Models for vital business functions). The rich MP authoring implementation, dynamic discovery capabilities, and object-oriented modeling approach allow a level of depth and flexibility in SNMP monitoring not often found in pure SNMP monitoring tools.

However, Operations Manager is first and foremost a distributed monitoring application, most often depending on agents to run small workloads independently. Inevitably, running centralized monitoring workloads (i.e. SNMP polls) in a distributed monitoring application is going to carry a higher performance load than the same workloads in a task-specific centralized monitoring application that was built from the ground up to handle a very high number of concurrent polls with maximum efficiency. This centralized architecture would likely feature a single scheduler process that distributes execution of polls in an optimized fashion as well as common polling functions implemented in streamlined managed code. With SNMP monitoring in Operations Manager, any optimization of workload scheduling and code optimization more or less falls to the MP author to implement.

While working on the xSNMP Management Packs, I spent a lot of time testing different approaches to maximize efficiency (and thus scalability) in a centralized SNMP monitoring scenario. I'm sure there is always room for improvement, but I will try to highlight some of the key points of my experience in this pursuit.

Designing for Cookdown

Cookdown is one of the most important concepts in MP authoring when considering the performance impact of workflows. A great summary of OpsMgr cookdown can be found here. In effect, the cookdown process looks for modules with identical configurations (including input parameters) and replaces multiple executions of redundant modules with a single execution. So, if one wanted to monitor and collect historical data on the inbound and outbound percent utilization and Mb/s throughput of an SNMP network interface, a scheduler and SNMP probe (with VarBinds defined to retrieve the in and out octets counters for the interface) could be configured. As long as each of the rules and monitors provided the same input parameters to these modules for each interface, the scheduler and SNMP probe would only execute once per interval per interface. Taking this a step further, the SNMP probe could be configured to gather all SNMP values for monitored objects in the ifTable for this interface (e.g. Admin Status, Oper Status, In Errors, Out Errors), and these values could be used in even more rules and monitors. The one big catch here is that the SNMP probe module stops processing SNMP VarBinds once it hits an error, so it's typically not a good idea to mix VarBinds for objects that may not be implemented on some agents with OIDs that would be implemented on all agents.
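As a concrete sketch of the interface example above, the shared scheduler and probe can be factored into a single public data source along these lines. The IDs, configuration names, and library aliases (System, Snmp) here are hypothetical stand-ins rather than the actual xSNMP implementation, and only the ifInOctets/ifOutOctets VarBinds are shown:

<!-- Hypothetical shared data source: one scheduler and one SNMP probe per interface.
     Every rule/monitor that passes identical configuration values to this module
     will cook down to a single execution per interval. -->
<DataSourceModuleType ID="Example.SnmpInterface.Poll.DataSource" Accessibility="Public">
  <Configuration>
    <xsd:element name="IntervalSeconds" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
    <xsd:element name="IP" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
    <xsd:element name="CommunityString" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
    <xsd:element name="Version" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
    <xsd:element name="InterfaceIndex" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
  </Configuration>
  <ModuleImplementation>
    <Composite>
      <MemberModules>
        <DataSource ID="Scheduler" TypeID="System!System.SimpleScheduler">
          <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
          <SyncTime />
        </DataSource>
        <ProbeAction ID="Probe" TypeID="Snmp!System.SnmpProbe">
          <IsWriteAction>false</IsWriteAction>
          <IP>$Config/IP$</IP>
          <CommunityString>$Config/CommunityString$</CommunityString>
          <Version>$Config/Version$</Version>
          <SnmpVarBinds>
            <!-- ifInOctets and ifOutOctets for the target interface -->
            <SnmpVarBind>
              <OID>.1.3.6.1.2.1.2.2.1.10.$Config/InterfaceIndex$</OID>
              <Syntax>0</Syntax>
              <Value VariantType="8" />
            </SnmpVarBind>
            <SnmpVarBind>
              <OID>.1.3.6.1.2.1.2.2.1.16.$Config/InterfaceIndex$</OID>
              <Syntax>0</Syntax>
              <Value VariantType="8" />
            </SnmpVarBind>
          </SnmpVarBinds>
        </ProbeAction>
      </MemberModules>
      <Composition>
        <Node ID="Probe">
          <Node ID="Scheduler" />
        </Node>
      </Composition>
    </Composite>
  </ModuleImplementation>
  <OutputType>Snmp!System.SnmpData</OutputType>
</DataSourceModuleType>

Every rule or monitor that references this data source with the same interval, address, and interface index shares the one scheduler/probe pair, so adding another utilization or error-rate workflow costs no additional SNMP polls.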


Introducing the xSNMP Management Pack Suite

Introduction

Over the past several weeks, I’ve been hard at work on some new SNMP management packs for Operations Manager 2007 R2, to replace the Cisco SNMP MP and extend similar functionality to a wide range of SNMP-enabled devices.   In the next few posts, I hope to describe some of the design and development considerations related to these Management Packs, which I am calling the xSNMP Management Pack Suite.   For this post, I hope to give a basic overview of the development effort and resulting management packs.

As I was working on some feature enhancements to the Cisco SNMP Management Pack, and following some really great discussions with others on potential improvements, I concluded that a more efficient and effective design could be realized by aligning the management pack structure with the SNMP standard itself. To expound on this point, much of the monitoring in the Cisco MP is not specific to Cisco devices but is common to all SNMP devices. The SNMP standard defines a hierarchical set of standard MIBs and a hierarchical implementation of vendor-specific MIBs, with consideration given to the elimination of redundancy. I tried to loosely adapt this model in the xSNMP MP architecture. The first of the MPs, and the one that all of the others depend on, is the root xSNMP Management Pack. This management pack has a few functions:

  1. It performs the base discovery of SNMP devices (the discovery is targeted at the SNMP Network Device class)
  2. It implements monitoring of the SNMP v1/v2 objects for discovered devices and interfaces
  3. It provides a set of standardized and reusable data sources for use in dependent management packs

From there, the remaining management packs implement vendor-specific monitoring. Devices and/or interfaces are discovered for the vendor-specific management packs as derived objects from the xSNMP MP, and most of the discoveries, monitors, and rules utilize the common data sources from the xSNMP MP, which makes initial and ongoing development of vendor-specific MPs much more efficient.

Controlling Interface Monitoring

One of the topics frequently commented on with the Cisco SNMP Management Pack, and a subject of much deliberation, was that of selecting network interfaces for monitoring. Even determining the optimal default interface monitoring behavior (disabled vs. enabled) isn't a terribly easy decision. For example, a core network switch in a datacenter may require that nearly all interfaces be monitored, while a user distribution switch may only require that some uplink ports be monitored. In the end, I decided on an approach that seems to work quite well. In the xSNMP Management Pack, all interface monitoring is disabled by default. A second, unsealed management pack is also provided and includes groups to control interface monitoring (e.g. Fully Monitored, Not Monitored, Status Only Monitored). Overrides are pre-configured in this MP to enable or disable the appropriate interface rules and monitors for these groups. So, to enable interface monitoring for all Ethernet interfaces, a dynamic group membership rule can be configured to include objects with interface type 6 (Ethernet), or, if critical interfaces are consistently labeled on switches with an alias, the interface Alias can be used in rules for group population.
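As a sketch, the population of a "Fully Monitored" group could be expressed with a membership rule along these lines; the class name, property, and MP aliases are hypothetical stand-ins for the actual xSNMP identifiers:

<!-- Hypothetical rule: include every interface whose ifType is 6 (ethernetCsmacd) -->
<MembershipRule>
  <MonitoringClass>$MPElement[Name="Example!Example.SnmpInterface"]$</MonitoringClass>
  <RelationshipClass>$MPElement[Name="SCIG!Microsoft.SystemCenter.InstanceGroupContainsEntities"]$</RelationshipClass>
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <Property>$MPElement[Name="Example!Example.SnmpInterface"]/ifType$</Property>
      </ValueExpression>
      <Operator>Equal</Operator>
      <ValueExpression>
        <Value>6</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
</MembershipRule>

Pointing the Property reference at the interface Alias instead, with a regular-expression operator, covers the case where critical uplinks are identified by a consistent labeling convention.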

Organizing Hosted Objects

For each of the management packs,  I tried to take a standardized approach for hierarchical organization of hosted objects and their relationships.   This organization was facilitated primarily through the use of arbitrary classes to contain child objects.   So, rather than discover all interfaces of a device with a single hosting relationship to the parent, an intermediary logical class (named “Interfaces”) is discovered with parent and child hosting relationships.   This approach has three primary benefits: 1) the graphical Diagram View is easier to navigate, 2) the object hierarchy is more neatly organized for devices that may be monitored by multiple MP’s (e.g. a server monitored by three MP’s for SNMP hardware monitoring, O/S monitoring, and application monitoring), and 3) the organization of hosted objects is consistent, even for devices with multiple entities exposed through a single SNMP agent. 
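As a skeletal illustration of this pattern (with hypothetical class and relationship names; the actual xSNMP identifiers differ), the intermediary container is just an ordinary hosted class plus a pair of hosting relationships:

<!-- Hypothetical: a device, an organizational "Interfaces" container, and the interfaces themselves -->
<ClassType ID="Example.Device" Accessibility="Public" Abstract="false" Hosted="false" Singleton="false" Base="SC!Microsoft.SystemCenter.NetworkDevice" />
<ClassType ID="Example.Device.Interfaces" Accessibility="Public" Abstract="false" Hosted="true" Singleton="false" Base="System!System.LogicalEntity" />
<ClassType ID="Example.Device.Interface" Accessibility="Public" Abstract="false" Hosted="true" Singleton="false" Base="System!System.LogicalEntity" />

<RelationshipType ID="Example.DeviceHostsInterfaces" Accessibility="Public" Base="System!System.Hosting">
  <Source>Example.Device</Source>
  <Target>Example.Device.Interfaces</Target>
</RelationshipType>
<RelationshipType ID="Example.InterfacesHostsInterface" Accessibility="Public" Base="System!System.Hosting">
  <Source>Example.Device.Interfaces</Source>
  <Target>Example.Device.Interface</Target>
</RelationshipType>

In Diagram View, the device then presents a single "Interfaces" node that expands to the individual interfaces, and sibling containers (e.g. "Fans" or "Power Supplies") can sit alongside it without cluttering the device level.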

Scalability

With loads of invaluable help from some volunteer beta testers, a great deal of time has been spent testing and investigating performance and scalability for these management packs. While I will save many of these details for a later post, I can offer a few comments on the topic. In all but the smallest SNMP-monitoring environments, it's highly advisable to configure SNMP devices to be monitored by a node other than the RMS. For larger environments, one or more dedicated Management Servers or agent proxies (Operations Manager agents configured to proxy requests for SNMP devices) are preferred for optimal performance. From our testing with these management packs, a dedicated agent proxy can be expected to effectively monitor between 1,500 and 3,500 objects, depending on the number of monitors and rules, the intervals configured, and the processing power of the agent proxy. By object, I am referring to any discovered object that is monitored by SNMP modules, such as devices, interfaces, fans, file systems, and power supplies. So, monitoring a switch infrastructure with 4,000-6,000 monitored network interfaces should be doable with two dedicated agent proxy systems.

I intend to write in greater detail about these topics in the coming weeks, and hope to post the first public beta version of these management packs soon.

SCOM: Combining System.SnmpProbe and System.Performance.DeltaValueCondition Modules to Calculate SNMP Counter Delta Values

I have previously written about combining an SnmpProbe and a script probe in Operations Manager workflows to facilitate manipulation of numeric values. While this is currently the only way to perform general numeric operations, there are some cases in which the only required manipulation of a numeric value is the calculation of a delta between two polls, such as calculating the number of interface collisions in an interval (from the ifTable) or the number of interface resets in a polling cycle (from the Cisco locIfTable). In these cases, the SnmpProbe can be combined with a System.Performance.DeltaValueCondition condition detection module to calculate the delta value without having to engage a script probe.

The DeltaValueCondition module expects performance data as its input, so a System.Performance.DataGenericMapper must be used between the SnmpProbe and DeltaValueCondition modules to perform the data conversion. The DeltaValueCondition module accepts two options: NumSamples and Absolute. The NumSamples parameter sets the number of value samples to maintain in memory; the value returned is the difference between the first and last samples in memory. The Absolute parameter, when true, causes the module to return the delta as the raw difference between the samples and, when false, causes it to return the percentage of change.

An example workflow can be represented in this diagram (the expression filter is used to validate the data returned from the SnmpProbe prior to continuing):
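With the diagram in mind, the member modules and composition for such a workflow can be sketched in MP XML roughly as follows; the module IDs, OID reference, counter name, target property, and library aliases (System, Snmp, Perf) are illustrative rather than the exact Cisco MP implementation:

<MemberModules>
  <DataSource ID="Scheduler" TypeID="System!System.SimpleScheduler">
    <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
    <SyncTime />
  </DataSource>
  <ProbeAction ID="SnmpProbe" TypeID="Snmp!System.SnmpProbe">
    <IsWriteAction>false</IsWriteAction>
    <IP>$Config/IP$</IP>
    <CommunityString>$Config/CommunityString$</CommunityString>
    <Version>$Config/Version$</Version>
    <SnmpVarBinds>
      <!-- a single counter, e.g. the collisions column for this interface -->
      <SnmpVarBind>
        <OID>$Config/CounterOID$.$Config/InterfaceIndex$</OID>
        <Syntax>0</Syntax>
        <Value VariantType="8" />
      </SnmpVarBind>
    </SnmpVarBinds>
  </ProbeAction>
  <!-- Validate the polled value before attempting any math on it -->
  <ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery>SnmpVarBinds/SnmpVarBind[1]/Value</XPathQuery>
        </ValueExpression>
        <Operator>NotEqual</Operator>
        <ValueExpression>
          <Value />
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </ConditionDetection>
  <!-- Convert the SNMP data item to System.Performance.Data -->
  <ConditionDetection ID="PerfMapper" TypeID="Perf!System.Performance.DataGenericMapper">
    <ObjectName>Network Interface</ObjectName>
    <CounterName>Collisions</CounterName>
    <InstanceName>$Target/Property[Type="Example.Interface"]/InterfaceName$</InstanceName>
    <Value>$Data/SnmpVarBinds/SnmpVarBind[1]/Value$</Value>
  </ConditionDetection>
  <!-- Keep two samples and return the raw difference: collisions since the last poll -->
  <ConditionDetection ID="Delta" TypeID="Perf!System.Performance.DeltaValueCondition">
    <NumSamples>2</NumSamples>
    <Absolute>true</Absolute>
  </ConditionDetection>
</MemberModules>
<Composition>
  <Node ID="Delta">
    <Node ID="PerfMapper">
      <Node ID="Filter">
        <Node ID="SnmpProbe">
          <Node ID="Scheduler" />
        </Node>
      </Node>
    </Node>
  </Node>
</Composition>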


SCOM: WSH vs. PowerShell Modules in Composite Workflows – Resource Utilization in SNMP Data Manipulation

One of the realities of working with SNMP monitoring is that, more often than not, the monitoring data are presented in a raw form that requires some kind of manipulation in order to render meaningful output. For example, the required data manipulation may be a simple arithmetic operation on two values to calculate a percentage or, in the case of Counter data, mathematical operations based on the delta between values recorded in multiple polling cycles. In Operations Manager, these manipulations require exiting the realm of managed code and utilizing script-based modules to perform the operations or to facilitate temporary storage of values from previous polling cycles. Two sets of modules are available for the Operations Manager-supported scripting engines: WSH and PowerShell. To date, I had opted to use VB scripts when authoring management packs for two reasons: 1) WSH is universally deployed in Windows environments whereas PowerShell is not necessarily so, so by using VB scripts there is no requirement to install PowerShell on proxy agents; and 2) I had assumed that the resource utilization impact of PowerShell was equal to or greater than that of WSH. That assumption was based on the simple observation that, when launching powershell.exe and cscript.exe, powershell.exe consumes more memory and CPU time (assuming WSH 5.7 is installed).

The resource utilization of these script providers becomes a major concern particularly when implementing script-based modules in SNMP monitoring scenarios. To illustrate this point: if a proxy agent were configured to proxy SNMP requests for 10 Cisco switches, with each switch having an average of 20 interfaces discovered, and each interface monitored with two monitors that utilize a script probe action to manipulate the raw SNMP data (e.g. collisions and octets), 400 scripts would be executed in a single polling cycle for just the interface monitors in this small-scale monitoring scenario. This poses a threat to the scalability of SNMP monitoring and could severely limit the number of devices/objects a single proxy agent can handle effectively.

In the course of trying to find a way to address this scalability issue, I was fortunate enough to communicate with someone possessing a great deal of insight into Operations Manager, who helpfully suggested that the PowerShell modules should be more efficient than the WSH-based modules in composite workflows. I rewrote all of the scripts in the Cisco MP to convert them from VBScript to PowerShell and began some testing. I was familiar with the tighter integration of PowerShell in R2 modules (PowerShell scripts no longer have to be launched as external commands), but to be honest, I was expecting to see a large number of powershell.exe processes spawned as the monitors fired. However, this is not the case. Rather, it appears that the modules execute the PowerShell script through the .NET Framework within the context of the monitoringhost.exe process. This does appear to be more efficient overall, as the overhead associated with spawning new processes is effectively eliminated, and my impression thus far is that overall CPU utilization is reduced.
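For reference, this is roughly the shape of the R2 module that enables the in-process execution described above. The module type, Microsoft.Windows.PowerShellPropertyBagProbe, is the R2 Windows library probe being discussed; the script body, parameter names, and data references are a hypothetical sketch of the kind of octet-delta math in question, not a script from the Cisco MP:

<ProbeAction ID="PSProbe" TypeID="Windows!Microsoft.Windows.PowerShellPropertyBagProbe">
  <ScriptName>CalcIfUtilization.ps1</ScriptName>
  <ScriptBody>
    param($DeltaOctets, $IntervalSeconds, $IfSpeed)
    # Runs in-process under monitoringhost.exe rather than as a spawned powershell.exe.
    # $DeltaOctets is assumed to be the change in octets since the previous poll,
    # computed upstream (e.g. by a DeltaValueCondition earlier in the workflow).
    $api = New-Object -ComObject 'MOM.ScriptAPI'
    $bag = $api.CreatePropertyBag()
    # bits transferred in the interval as a percentage of bits possible in the interval
    $pct = ([double]$DeltaOctets * 8 * 100) / ([double]$IfSpeed * [double]$IntervalSeconds)
    $bag.AddValue('PctUtilization', $pct)
    $bag   # emit the property bag to the workflow
  </ScriptBody>
  <Parameters>
    <Parameter>
      <Name>DeltaOctets</Name>
      <Value>$Data/Value$</Value>
    </Parameter>
    <Parameter>
      <Name>IntervalSeconds</Name>
      <Value>$Config/IntervalSeconds$</Value>
    </Parameter>
    <Parameter>
      <Name>IfSpeed</Name>
      <Value>$Config/IfSpeed$</Value>
    </Parameter>
  </Parameters>
  <TimeoutSeconds>60</TimeoutSeconds>
</ProbeAction>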

However, switching from WSH scripts to PowerShell scripts in R2 workflows is a bit of a jump from the frying pan into the fire in that, instead of spawning a large number of processes that each consume relatively small amounts of processor and memory resources, the PowerShell script modules drive a single process (monitoringhost.exe) to consume a large quantity of resources, particularly CPU cycles. Overall, memory utilization looks a lot better with the PowerShell modules, and although CPU utilization does seem to be better, it is still a concern for scalability.

Thus far, I have been doing this performance testing in a development environment, with OpsMgr running on virtual machines on both workstation and older server-class hardware, neither of which provides a good indication of real-world scalability (particularly given that I have these VMs running SQL, all OpsMgr duties, and SNMP simulations to boot). On one of these woefully over-utilized VMs, somewhere around 130-150 interfaces on 10 monitored Cisco devices seemed to be the breaking point, but a more realistic OpsMgr deployment scenario (segregated database, RMS, and MS duties) on physical hardware should be able to handle far more than that. I will report an update once I get a chance to do some broader scalability testing with the PowerShell version of the MP on more appropriate hardware.

In summary, both the WSH and PowerShell probe and write action modules introduce a relatively heavy CPU load when utilized for data manipulation, relative to the very simple operations required to manipulate SNMP data, and a managed code module would be far more desirable if one were available. However, at present, these two providers are the only supported mechanisms for handling data that require processing before being returned to a rule or monitor. My testing thus far appears to support the assertion that R2 implements the PowerShell modules more efficiently than the WSH-based modules, which is welcome news, given the relative ease and impressive flexibility of scripting with PowerShell. I've seen a bit of talk that PowerShell V2 is supposed to bring significant performance improvements over V1, and I hope to do some testing with the CTP version of V2 on an OpsMgr proxy agent in the very near future to see if it helps address any of the scalability challenges in SNMP monitoring with OpsMgr. As for the best approach at present, it looks like PowerShell is the way to go, and the overall impact on the MS/proxy agents can be mitigated by spreading monitored objects across multiple proxy agents, limiting discovery to only those objects that are required to be monitored (e.g. interfaces), and avoiding overly aggressive scheduling of monitors.