Monitoring for SNMP Value Changes with SolarWinds ORION NPM

I had previously described a few example scenarios in which monitoring SNMP values for changes (from the values in previous polling cycles) could be useful.   In this post, I will describe the steps to configure monitoring for these scenarios in SolarWinds ORION NPM. 

Detecting changes in Checkpoint Firewall (Splat) High Availability State

The checkpoint mib includes a good set of SNMP objects exposed for state and performance monitoring of Checkpoint Secure Platform firewalls.   The state of firewall modules can be polled with the xxStatCode (numeric) or xxStatShortDescr (string) objects.  For example, Secure Virtual Networking can be monitored with the svnStatCode (1.3.6.1.4.1.2620.1.6.101) or svnStatShortDescr (1.3.6.1.4.1.2620.1.6.102) objects.  Likewise for the other modules such as HA, DTPS, or WAM (etc) modules.   However, in order to detect HA failovers, I monitor the haState (1.3.6.1.4.1.2620.1.5.6) object for changes (i.e. from “standby” to “active”).  

Detecting Default Gateway (ipRouteNextHop) Changes on Cisco Routers

In some redundant configurations, a change in the device’s default gateway may be the best indicator of a failover to an alternate Wide-Area connection, which could be a problem if the backup WAN link is a slower bandwidth connection.   The ipRouteNextHop (1.3.6.1.2.1.4.21.1.7) object is located in the ipRoute table of the ubiquitous RFC1213 (MIB II) mib.  The device’s default gateway is the first row listed in this table.

Detecting Serial Interface Flapping

Increases in the locIFResets (1.3.6.1.4.1.9.2.2.1.1.17) Cisco counter on a serial interface are a good indicator of flapping on the serial connection.   If the serial interface resets more than two times in a polling cycle, we can probably assume that it is flapping (an administrative shut and start would be one reset, so by monitoring for 2 or more resets, we can avoid alerts when planned maintenance is being performed).    If the reset count doesn’t change for a few polling cycles, it can probably be assumed that the connection has stabilized. 

Read more of this post

Coming Soon: SNMP Monitoring for Changes in Polled Values

Most SNMP monitoring can be facilitated by comparing the value of a specific retrieved SNMP object to an expected string or threshold, but monitoring for some conditions can only really be accomplished by comparing the current value to a previous value.  

Three examples of this are:

1)      Serial Interface Flapping:  If a serial connection is experiencing problems, the interface may bounce up and down rapidly.  If an SNMP poll on that interface is occurring every 1, 3, or 5 minutes, it may not detect any problems (if the interface is up for the poll), meaning that compromised availability could go undetected for several polling cycles.  These conditions can be detected by comparing the Interface Resets (locIFResets) counter in the Cisco local interfaces table  to previous values.   

2)      Default Gateway Changes on Redundant Routers:  In some redundant WAN router deployments, a default gateway change on the routers is indicative of a redundancy failover.  Because all devices and interfaces may be up and reachable before and after a failover, it may be difficult to detect when the failover has occurred, potentially meaning production traffic is routed over a slower backup link.   This can be detected by monitoring the Default Gateway value (ipRouteNextHop in the ipRouteTable) on the routers and detecting changes when compared to previous polling cycles.

3)      High-Availability State Changes on CheckPoint SPLAT Firewalls:  In an HA configuration on CheckPoint firewalls, the haStatus value will return a string value of “active” or “standby.”  The best way to detect an HA failover is to watch this value for a change.

These are just three examples of many potential scenarios where monitoring of an SNMP object is best served by comparing current values to previously polled values.  Unfortunately, this capability is not a common feature in many of the monitoring tools that I am familiar with.  

Over the next week (or more), I will be posting articles about how I have to implemented just such a monitor for the three described scenarios using the two monitoring products that I currently work with:  SolarWinds ORION and System Center Operations Manager.  In the case of ORION, these monitors can be implemented fairly easily with a bit of SQL work.  In the case of SCOM, it’s a little bit more complicated, but ultimately doable.