August 30, 2009
Most SNMP monitoring can be done by comparing the value of a specific retrieved SNMP object to an expected string or threshold, but some conditions can only really be detected by comparing the current value to a previous value.
Three examples of this are:
1) Serial Interface Flapping: If a serial connection is experiencing problems, the interface may bounce up and down rapidly. If an SNMP poll on that interface occurs every 1, 3, or 5 minutes, it may not detect any problem (if the interface happens to be up at the time of the poll), meaning that compromised availability could go undetected for several polling cycles. These conditions can be detected by comparing the Interface Resets (locIfResets) counter in the Cisco local interfaces table to previous values.
2) Default Gateway Changes on Redundant Routers: In some redundant WAN router deployments, a default gateway change on the routers is indicative of a redundancy failover. Because all devices and interfaces may be up and reachable before and after a failover, it may be difficult to detect when the failover has occurred, potentially meaning production traffic is routed over a slower backup link. This can be detected by monitoring the Default Gateway value (ipRouteNextHop in the ipRouteTable) on the routers and detecting changes when compared to previous polling cycles.
3) High-Availability State Changes on CheckPoint SPLAT Firewalls: In an HA configuration on CheckPoint firewalls, the haStatus value will return a string value of “active” or “standby.” The best way to detect an HA failover is to watch this value for a change.
These are just three examples of many potential scenarios where monitoring of an SNMP object is best served by comparing current values to previously polled values. Unfortunately, this capability is not a common feature in many of the monitoring tools that I am familiar with.
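To make the idea concrete, here is a rough sketch of the comparison logic in Python. It is only an illustration, not the ORION or SCOM implementation covered in the later posts. The `PreviousValueMonitor` class, the `device1` name, and the sample values are all hypothetical, and the actual SNMP retrieval is left out (it is assumed that the polling tool supplies the current value each cycle).

```python
class PreviousValueMonitor:
    """Remembers the last polled value per key (hypothetical helper)."""

    def __init__(self):
        self._last = {}

    def poll(self, key, current):
        """Store `current` and return (previous, changed).

        The first poll for a key only sets the baseline, so it never
        reports a change.
        """
        has_previous = key in self._last
        previous = self._last.get(key)
        self._last[key] = current
        return previous, has_previous and previous != current


monitor = PreviousValueMonitor()

# Made-up values standing in for three consecutive polling cycles.
samples = [
    {"resets": 12, "gateway": "10.0.0.1", "ha": "active"},
    {"resets": 12, "gateway": "10.0.0.1", "ha": "active"},
    {"resets": 15, "gateway": "10.0.1.1", "ha": "standby"},
]

alerts = []
for cycle, sample in enumerate(samples, start=1):
    for name, value in sample.items():
        previous, changed = monitor.poll(("device1", name), value)
        if changed:
            alerts.append(f"cycle {cycle}: {name} changed from {previous} to {value}")

for line in alerts:
    print(line)
```

The point of the sketch is that the monitor needs somewhere to keep the last polled value between cycles. Here that is an in-memory dictionary; a real monitoring product would keep it in its own database or state store.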
Over the next week (or more), I will be posting articles about how I have implemented just such a monitor for the three described scenarios using the two monitoring products that I currently work with: SolarWinds ORION and System Center Operations Manager. In the case of ORION, these monitors can be implemented fairly easily with a bit of SQL work. In the case of SCOM, it’s a little bit more complicated, but ultimately doable.