Monitoring for SNMP Value Changes with SolarWinds ORION NPM

In a previous post, I described a few example scenarios in which monitoring SNMP values for changes (relative to the values from previous polling cycles) can be useful. In this post, I describe the steps to configure monitoring for these scenarios in SolarWinds ORION NPM.

Detecting changes in Checkpoint Firewall (Splat) High Availability State

The Checkpoint MIB exposes a good set of SNMP objects for state and performance monitoring of Checkpoint Secure Platform firewalls. The state of firewall modules can be polled with the xxStatCode (numeric) or xxStatShortDescr (string) objects. For example, Secure Virtual Networking can be monitored with the svnStatCode (1.3.6.1.4.1.2620.1.6.101) or svnStatShortDescr (1.3.6.1.4.1.2620.1.6.102) objects, and the other modules (HA, DTPS, WAM, etc.) expose equivalent objects. To detect HA failovers, however, I monitor the haState (1.3.6.1.4.1.2620.1.5.6) object for changes (e.g. from “standby” to “active”).

Detecting Default Gateway (ipRouteNextHop) Changes on Cisco Routers

In some redundant configurations, a change in the device’s default gateway may be the best indicator of a failover to an alternate wide-area connection, which can be a problem if the backup WAN link has lower bandwidth. The ipRouteNextHop (1.3.6.1.2.1.4.21.1.7) object is located in the ipRouteTable of the ubiquitous RFC1213 (MIB-II) MIB. Because this table is indexed by destination address, the default route (0.0.0.0) is normally its first row, so the device’s default gateway is the first ipRouteNextHop value listed.

Detecting Serial Interface Flapping

Increases in the Cisco locIfResets (1.3.6.1.4.1.9.2.2.1.1.17) counter on a serial interface are a good indicator of flapping on the serial connection. If the serial interface resets two or more times in a polling cycle, we can probably assume that it is flapping (an administrative shut and start would be one reset, so monitoring for 2 or more resets avoids alerts while planned maintenance is being performed). If the reset count doesn’t change for a few polling cycles, the connection has probably stabilized.

Step 1:  Create the Universal Poller Definitions

For the CheckPoint HA state change monitor, a GET NEXT poller for the OID 1.3.6.1.4.1.2620.1.5.6 (haState) will return a string value indicating the current High Availability state (e.g. Active or Standby). Configure the poller to keep historical data and assign it to the appropriate CheckPoint devices.

For the default gateway change monitor, create a poller with a GET NEXT request to the OID 1.3.6.1.2.1.4.21.1.7 (ipRouteNextHop). The default gateway for the device will be the first entry in the table, so the GET NEXT request will suffice. Again, keep historical data and assign the poller to the appropriate devices.

For the serial interface flapping monitor, create a universal device poller for the locIfResets object in the Cisco local interfaces table (1.3.6.1.4.1.9.2.2.1.1.17). Use a GET TABLE request, keep historical data, and assign the poller to the appropriate nodes. Note that this poller could be targeted at a single interface to keep the historical data smaller, but for convenience I use GET TABLE and return the values for all interfaces on the selected nodes.

The functions in Step 2 look up each poller by its name, so name the three pollers haState, ipRouteNextHop, and locIFResets respectively.

Step 2:  Create SQL User Defined Functions

For each of the three monitors, a SQL User Defined Function can be defined to calculate changes in the recent historical data by querying the CustomPollerStatistics_Detail table joined with the CustomPollers and CustomPollerAssignment tables.

For the CheckPoint HA status change monitor, this function returns the number of distinct values recorded by the haState universal device poller in the past 30 minutes for a given node id. This is an easy way to check for recent changes, as the count will be greater than 1 if the value has changed within the past 30 minutes.

USE [NetPerfMon]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[udf_HAStateChange30Min]
   (@NodeID int)

RETURNS int
AS
BEGIN

DECLARE @NumHAStatus30min int
SELECT @NumHAStatus30min = COUNT(DISTINCT status)
FROM CustomPollers
INNER JOIN
  CustomPollerAssignment
ON
  (CustomPollers.CustomPollerID = CustomPollerAssignment.CustomPollerID)
INNER JOIN
   CustomPollerStatistics_Detail
ON
   (CustomPollerStatistics_Detail.CustomPollerAssignmentID =
     CustomPollerAssignment.CustomPollerAssignmentID)
WHERE
    CustomPollers.UniqueName='haState'
AND
   NodeID = @NodeID
AND 
   DateTime > DateAdd(n,-30,GetDate())
RETURN
   @NumHAStatus30min
END
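
Before wiring the function into an alert, it can be sanity-checked from a query window. The following is a minimal sketch, assuming the function was created in the dbo schema as above and that the Nodes table exposes a Caption column (an assumption; adjust to your schema):

```sql
-- List nodes whose haState poller recorded more than one distinct value
-- in the past 30 minutes. Nodes without the poller return a count of 0.
SELECT
   n.NodeID,
   n.Caption,  -- assumed column name for the node's display name
   dbo.udf_HAStateChange30Min(n.NodeID) AS DistinctHAStates
FROM
   dbo.Nodes AS n
WHERE
   dbo.udf_HAStateChange30Min(n.NodeID) > 1;
```

An empty result simply means that no assigned node changed HA state in the window.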

For the default gateway change monitor, this function will also check for the distinct count in ipRouteNextHop values in the past 30 minutes, given a node id.

USE [NetPerfMon]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[udf_DefGWCount30Min]
(@NodeID int)
RETURNS int
AS
BEGIN
DECLARE @DefGWCount30Min int

SELECT
   @DefGWCount30Min = COUNT(DISTINCT status)
FROM
   CustomPollers
INNER JOIN
   CustomPollerAssignment
ON
  (CustomPollers.CustomPollerID =
   CustomPollerAssignment.CustomPollerID)
INNER JOIN
  CustomPollerStatistics_Detail
ON
   (CustomPollerStatistics_Detail.CustomPollerAssignmentID =
   CustomPollerAssignment.CustomPollerAssignmentID)
WHERE
   CustomPollers.UniqueName='ipRouteNextHop'
AND
   NodeID = @NodeID
AND
   DateTime > dateadd(n,-30,GetDate())
AND
   DateTime <= GetDate()
RETURN
   @DefGWCount30Min
END
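
When the count is greater than 1, it is often useful to see which next-hop values were actually recorded. The query below is a sketch that reuses the same joins; the node id is a hypothetical placeholder, and it assumes that the DateTime column belongs to CustomPollerStatistics_Detail:

```sql
DECLARE @NodeID int = 1;  -- hypothetical node id; replace with a real one

-- Show each distinct next hop seen in the past 30 minutes,
-- with the first and last time it was recorded.
SELECT
   d.Status        AS NextHop,
   MIN(d.DateTime) AS FirstSeen,
   MAX(d.DateTime) AS LastSeen
FROM
   CustomPollers AS p
INNER JOIN
   CustomPollerAssignment AS a
ON
   p.CustomPollerID = a.CustomPollerID
INNER JOIN
   CustomPollerStatistics_Detail AS d
ON
   d.CustomPollerAssignmentID = a.CustomPollerAssignmentID
WHERE
   p.UniqueName = 'ipRouteNextHop'
AND
   a.NodeID = @NodeID
AND
   d.DateTime > DATEADD(n, -30, GETDATE())
GROUP BY
   d.Status
ORDER BY
   FirstSeen;
```

Two rows in the result correspond to a function value of 2: the old gateway and the new one.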

For the serial interface flapping monitor, we don’t want to just check for a change in values, but rather measure how much the logged counter has increased. The function therefore returns the difference between the maximum and minimum values logged in the past 30 minutes. Additionally, we need to pass both a node id and an interface row id to the function to match both the node and the interface.

USE [NetPerfMon]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[udf_ifResetDelta30Min]
   (@NodeID int, @RowID int)
RETURNS int
AS
BEGIN
DECLARE
   @ifResetDelta30Min int
SELECT
   @ifResetDelta30Min =
   Max(Cast(CustomPollerStatistics_Detail.Status as float)) -
   Min(Cast(CustomPollerStatistics_Detail.Status as float))
FROM        
   CustomPollers
INNER JOIN
   CustomPollerAssignment
ON
   (CustomPollers.CustomPollerID =
    CustomPollerAssignment.CustomPollerID )
INNER JOIN
   CustomPollerStatistics_Detail
ON
   (CustomPollerAssignment.CustomPollerAssignmentID =
    CustomPollerStatistics_Detail.CustomPollerAssignmentID)
INNER JOIN
   Interfaces
ON
   (CustomPollerAssignment.NodeID =
    Interfaces.NodeID)
AND
   (CustomPollerStatistics_Detail.RowID =
    Interfaces.InterfaceIndex)
INNER JOIN
   Nodes
ON
   CustomPollerAssignment.NodeID = Nodes.NodeID
WHERE
   CustomPollers.UniqueName = 'locIFResets'
AND
   CustomPollerStatistics_Detail.RowID = @RowID
AND
   CustomPollerAssignment.NodeID = @NodeID
AND
   DateTime > dateadd(n,-30,GetDate())
AND
   DateTime < GetDate()
RETURN @ifResetDelta30Min
END
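
The casts to float in this function matter. Assuming the Status column is stored as text (which the casts suggest, and the same column holds the haState strings), taking Max and Min without a cast would compare the values as strings. The following self-contained sketch uses made-up counter samples rather than real poller data to show the difference:

```sql
-- Three hypothetical locIfResets samples from one interface, stored as text.
SELECT
   MAX(CAST(v.Status AS float)) -
   MIN(CAST(v.Status AS float)) AS ResetDelta,  -- 12 - 9 = 3
   MAX(v.Status)                AS MaxAsText    -- '9', since '9' sorts after '12' as text
FROM
   (VALUES ('9'), ('9'), ('12')) AS v(Status);
```

Note also that when no samples exist in the window, Max and Min are both NULL, so the function returns NULL rather than 0.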

Step 3:  Create SQL Computed Columns

To facilitate alerting on these custom functions, we will add computed columns to the Nodes and Interfaces tables.

For the CheckPoint HA state change monitor, add a computed column to the Nodes table with the specification:

([dbo].[udf_HAStateChange30Min]([NodeID]))

For the default gateway change monitor, add a computed column to the Nodes table with the specification:

([dbo].[udf_DefGWCount30Min]([NodeID]))

For the Serial Interface flapping monitor, add a computed column to the Interfaces table with the specification:

([dbo].[udf_ifResetDelta30Min]([NodeID],[InterfaceIndex]))

With these computed columns (which show up as ORION custom properties), alerts can be configured on the calculated values returned by the user defined functions.
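
The computed columns can be added in the table designer or with T-SQL. As a rough sketch of the latter, with column names that are my own placeholders and not anything ORION requires:

```sql
USE [NetPerfMon]
GO
-- Column names below are hypothetical; choose names that suit your environment.
ALTER TABLE dbo.Nodes
   ADD HAStateChange30Min AS ([dbo].[udf_HAStateChange30Min]([NodeID]));
GO
ALTER TABLE dbo.Nodes
   ADD DefGWCount30Min AS ([dbo].[udf_DefGWCount30Min]([NodeID]));
GO
ALTER TABLE dbo.Interfaces
   ADD ifResetDelta30Min AS ([dbo].[udf_ifResetDelta30Min]([NodeID],[InterfaceIndex]));
GO
```

Because these columns are not persisted, the function is evaluated each time the column is read, so the value always reflects the most recent 30 minutes of poller history.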

Step 4:  Create Alerts

With the SQL work out of the way, alerting on these state changes is relatively simple.  For each monitor, a new Advanced Alert is required, scoped to the appropriate nodes or interfaces and triggered by the computed column (custom property).  When adding the triggers, the computed columns can be accessed under the Nodes (or Interface for the interface reset function) -> Custom Property menu.

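The CheckPoint HA state change monitor alert trigger can be configured to fire when the HA state computed column is greater than 1, i.e. when more than one distinct haState value was recorded in the past 30 minutes.

For the default gateway change monitor, the trigger can be set up the same way: fire when the default gateway computed column is greater than 1.

And for the serial interface flapping monitor, the trigger can be set up to fire when the interface reset computed column is 2 or greater, with the interface type filtered to serial interfaces.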

In conclusion, these monitors require a good degree of customization to implement in ORION. However, in my experience at least, the value they provide justifies the effort many times over.

About Kristopher Bash
Kris is a Senior Program Manager at Microsoft, working on UNIX and Linux management features in Microsoft System Center. Prior to joining Microsoft, Kris worked in systems management, server administration, and IT operations for nearly 15 years.
