Monitoring for SNMP Value Changes with SolarWinds ORION NPM
September 1, 2009
Detecting changes in Checkpoint Firewall (Splat) High Availability State
The Checkpoint MIB exposes a good set of SNMP objects for state and performance monitoring of Checkpoint Secure Platform firewalls. The state of firewall modules can be polled with the xxStatCode (numeric) or xxStatShortDescr (string) objects. For example, Secure Virtual Networking can be monitored with the svnStatCode (1.3.6.1.4.1.2620.1.6.101) or svnStatShortDescr (1.3.6.1.4.1.2620.1.6.102) objects, and the same pattern applies to the other modules such as HA, DTPS, and WAM. However, in order to detect HA failovers, I monitor the haState (1.3.6.1.4.1.2620.1.5.6) object for changes (e.g. from “standby” to “active”).
Detecting Default Gateway (ipRouteNextHop) Changes on Cisco Routers
In some redundant configurations, a change in the device’s default gateway may be the best indicator of a failover to an alternate Wide-Area connection, which could be a problem if the backup WAN link is a lower-bandwidth connection. The ipRouteNextHop (1.3.6.1.2.1.4.21.1.7) object is located in the ipRoute table of the ubiquitous RFC1213 (MIB II) MIB. Because this table is indexed by destination address, the default route (0.0.0.0), when present, is the first row listed, and its next hop is the device’s default gateway.
Detecting Serial Interface Flapping
Increases in the locIfResets (1.3.6.1.4.1.9.2.2.1.1.17) Cisco counter on a serial interface are a good indicator of flapping on the serial connection. If the serial interface resets two or more times in a polling cycle, we can probably assume that it is flapping (an administrative shut and start would be one reset, so by monitoring for 2 or more resets, we can avoid alerts when planned maintenance is being performed). If the reset count doesn’t change for a few polling cycles, it can probably be assumed that the connection has stabilized.
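To make the rule of thumb above concrete, here is a minimal Python sketch that classifies a link from consecutive locIfResets samples. The function name, the "three unchanged cycles" cutoff, and the result labels are my own assumptions for illustration; only the threshold of 2 resets comes from the text. Counter wraps and device reboots are ignored.

```python
from typing import List

FLAP_THRESHOLD = 2  # resets within one polling cycle treated as flapping (from the text)
STABLE_CYCLES = 3   # assumed value for "a few polling cycles" with no change


def classify_resets(samples: List[int]) -> str:
    """Classify a serial link from consecutive locIfResets counter samples."""
    if len(samples) < 2:
        return "unknown"
    # Increase of the counter between each pair of consecutive polls.
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    if deltas[-1] >= FLAP_THRESHOLD:
        return "flapping"
    recent = deltas[-STABLE_CYCLES:]
    if len(recent) == STABLE_CYCLES and all(d == 0 for d in recent):
        return "stable"
    return "indeterminate"
```

A single reset between two polls (for example a planned shut/no shut) comes back as "indeterminate" rather than "flapping", which is the behaviour the threshold is meant to give.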
Step 1: Create the Universal Poller Definitions
For the CheckPoint HA state change monitor, a GET NEXT poller for the OID 1.3.6.1.4.1.2620.1.5.6 (haState) will return a string value indicating the current High Availability state (e.g. Active or Standby). Configure the poller to keep historical data and assign it to the appropriate CheckPoint devices.
For the default gateway change monitor, create a poller with a GET NEXT request to the OID 1.3.6.1.2.1.4.21.1.7 (ipRouteNextHop). The default gateway for the device will be the first entry in the table, so the GET NEXT request will suffice. Again, keep historical data and assign to the appropriate devices.
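The reason a single GET NEXT is enough can be sketched in Python: the agent returns the first object that sorts after the requested OID, and the ipRouteNextHop instances are indexed by destination, so 0.0.0.0 comes first. The in-memory `mib` dictionary, the addresses in it, and the helper names below are hypothetical stand-ins for a real agent, not part of any SNMP library.

```python
from typing import Dict, Optional, Tuple

Oid = Tuple[int, ...]


def parse_oid(text: str) -> Oid:
    """Turn a dotted OID string into a tuple of integers."""
    return tuple(int(part) for part in text.strip(".").split("."))


def get_next(mib: Dict[Oid, str], requested: Oid) -> Optional[Tuple[Oid, str]]:
    """Return the first (oid, value) that sorts after `requested`, as GET NEXT does."""
    later = sorted(oid for oid in mib if oid > requested)
    if not later:
        return None  # end of the (simulated) MIB view
    return later[0], mib[later[0]]


IP_ROUTE_NEXT_HOP = parse_oid("1.3.6.1.2.1.4.21.1.7")

# Hypothetical agent contents: one ipRouteNextHop instance per destination.
mib = {
    IP_ROUTE_NEXT_HOP + parse_oid("10.1.0.0"): "10.1.0.1",
    IP_ROUTE_NEXT_HOP + parse_oid("0.0.0.0"): "192.0.2.1",
    IP_ROUTE_NEXT_HOP + parse_oid("172.16.0.0"): "10.1.0.254",
}

result = get_next(mib, IP_ROUTE_NEXT_HOP)
```

Here `result` holds the 0.0.0.0 instance and its next hop. If a device has no 0.0.0.0 row, the same request returns whichever route sorts first, so the poller would silently track that route instead.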
For the serial interface flapping monitor, create a universal device poller for the locIfResets object in the Cisco local interfaces table (1.3.6.1.4.1.9.2.2.1.1.17). Use a GET TABLE request, keep historical data, and assign to the appropriate nodes. Note, this could be targeted to an interface to keep historical data smaller, but for convenience, I use the GET TABLE and just return the values for all interfaces on selected nodes.
Step 2: Create SQL User Defined Functions
For each of the three monitors, a SQL User Defined Function can be defined to calculate changes in the recent historical data with queries to the CustomPollerStatistics_Detail table joined with the CustomPollers and CustomPollerAssignment tables.
For the CheckPoint HA status change monitor, this function will retrieve the distinct count of values retrieved by the haState universal device poller in the past 30 minutes for a given node id. This is an easy way to check for recent changes as the count will be greater than 1 if the value has changed within the past 30 minutes.
USE [NetPerfMon]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[udf_HAStateChange30Min]
(@NodeID int)
RETURNS int
AS
BEGIN
    -- Number of distinct haState values polled for this node in the last 30 minutes
    DECLARE @NumHAStatus30min int
    SELECT @NumHAStatus30min = COUNT(DISTINCT CustomPollerStatistics_Detail.Status)
    FROM CustomPollers
    INNER JOIN CustomPollerAssignment
        ON (CustomPollers.CustomPollerID = CustomPollerAssignment.CustomPollerID)
    INNER JOIN CustomPollerStatistics_Detail
        ON (CustomPollerStatistics_Detail.CustomPollerAssignmentID =
            CustomPollerAssignment.CustomPollerAssignmentID)
    WHERE CustomPollers.UniqueName = 'haState'
        AND CustomPollerAssignment.NodeID = @NodeID
        AND DateTime > DateAdd(n, -30, GetDate())
    RETURN @NumHAStatus30min
END
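The distinct-count idea is easy to check outside the database. The Python sketch below applies the same rule to an in-memory list of (timestamp, value) samples; the function name and the sample history are made up for illustration and do not touch the Orion schema.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def distinct_values_in_window(
    samples: Iterable[Tuple[datetime, str]], now: datetime, minutes: int = 30
) -> int:
    """Count distinct polled values newer than `minutes` ago (COUNT(DISTINCT ...))."""
    cutoff = now - timedelta(minutes=minutes)
    return len({value for when, value in samples if when > cutoff})


now = datetime(2009, 9, 1, 12, 0)
# Hypothetical poll history for one firewall node.
history = [
    (now - timedelta(minutes=45), "standby"),
    (now - timedelta(minutes=25), "standby"),
    (now - timedelta(minutes=15), "active"),
    (now - timedelta(minutes=5), "active"),
]

# More than one distinct value in the window means the state changed recently.
changed = distinct_values_in_window(history, now) > 1
```

Note that the count stays above 1 only while both the old and the new value are inside the window; once the older value ages out it drops back to 1, so the alert clears on its own after roughly 30 minutes.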
For the default gateway change monitor, this function will also check for the distinct count in ipRouteNextHop values in the past 30 minutes, given a node id.
USE [NetPerfMon]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[udf_DefGWCount30Min]
(@NodeID int)
RETURNS int
AS
BEGIN
    -- Number of distinct ipRouteNextHop values polled for this node in the last 30 minutes
    DECLARE @DefGWCount30Min int
    SELECT @DefGWCount30Min = COUNT(DISTINCT CustomPollerStatistics_Detail.Status)
    FROM CustomPollers
    INNER JOIN CustomPollerAssignment
        ON (CustomPollers.CustomPollerID = CustomPollerAssignment.CustomPollerID)
    INNER JOIN CustomPollerStatistics_Detail
        ON (CustomPollerStatistics_Detail.CustomPollerAssignmentID =
            CustomPollerAssignment.CustomPollerAssignmentID)
    WHERE CustomPollers.UniqueName = 'ipRouteNextHop'
        AND CustomPollerAssignment.NodeID = @NodeID
        AND DateTime > DateAdd(n, -30, GetDate())
        AND DateTime <= GetDate()
    RETURN @DefGWCount30Min
END
For the serial interface flapping monitor, we don’t want to just check for a change in values, but rather the amount by which the logged counter has increased. Additionally, we will need to pass both a node id and an interface row id to the function to match both the node and the interface.
USE [NetPerfMon]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[udf_ifResetDelta30Min]
(@NodeID int, @RowID int)
RETURNS int
AS
BEGIN
    -- Increase in the locIfResets counter for one interface over the last 30 minutes
    DECLARE @ifResetDelta30Min int
    SELECT @ifResetDelta30Min =
        Max(Cast(CustomPollerStatistics_Detail.Status as float)) -
        Min(Cast(CustomPollerStatistics_Detail.Status as float))
    FROM CustomPollers
    INNER JOIN CustomPollerAssignment
        ON (CustomPollers.CustomPollerID = CustomPollerAssignment.CustomPollerID)
    INNER JOIN CustomPollerStatistics_Detail
        ON (CustomPollerAssignment.CustomPollerAssignmentID =
            CustomPollerStatistics_Detail.CustomPollerAssignmentID)
    INNER JOIN Interfaces
        ON (CustomPollerAssignment.NodeID = Interfaces.NodeID)
        AND (CustomPollerStatistics_Detail.RowID = Interfaces.InterfaceIndex)
    INNER JOIN Nodes
        ON CustomPollerAssignment.NodeID = Nodes.NodeID
    WHERE CustomPollers.UniqueName = 'locIfResets'
        AND CustomPollerStatistics_Detail.RowID = @RowID
        AND CustomPollerAssignment.NodeID = @NodeID
        AND DateTime > DateAdd(n, -30, GetDate())
        AND DateTime < GetDate()
    RETURN @ifResetDelta30Min
END
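The max-minus-min calculation can be sketched the same way in Python. The function name and sample data are hypothetical; the values are kept as strings and cast to float to mirror how the function above treats the Status column, and an empty window returns None in the way the SQL aggregate would return NULL.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def reset_delta_in_window(
    samples: Iterable[Tuple[datetime, str]], now: datetime, minutes: int = 30
) -> Optional[float]:
    """Return max - min of the counter samples inside the window, or None if empty."""
    cutoff = now - timedelta(minutes=minutes)
    values = [float(value) for when, value in samples if cutoff < when < now]
    if not values:
        return None  # no rows in the window, like Max()/Min() over an empty set
    return max(values) - min(values)


now = datetime(2009, 9, 1, 12, 0)
# Hypothetical locIfResets history for one interface; the first sample is too old.
history = [
    (now - timedelta(minutes=40), "3"),
    (now - timedelta(minutes=20), "12"),
    (now - timedelta(minutes=10), "12"),
    (now - timedelta(minutes=5), "15"),
]

delta = reset_delta_in_window(history, now)
```

Because a reset counter normally only increases, max minus min equals the growth across the window. A counter that restarts from zero after a reload would inflate the result, which this sketch makes no attempt to handle.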
Step 3: Create SQL Computed Columns
To facilitate alerting on these custom functions, we will add computed columns to the Nodes and Interfaces tables.
For the CheckPoint HA state change monitor, add a computed column to the Nodes table with the specification:
([dbo].[udf_HAStateChange30Min]([NodeID]))
For the default gateway change monitor, add a computed column to the Nodes table with the specification:
([dbo].[udf_DefGWCount30Min]([NodeID]))
For the Serial Interface flapping monitor, add a computed column to the Interfaces table with the specification:
([dbo].[udf_ifResetDelta30Min]([NodeID],[InterfaceIndex]))
With these computed columns (which show up as ORION custom properties), alerts can be configured on the calculated values returned by the user defined functions.
Step 4: Create Alerts
With the SQL work out of the way, alerting on these state changes is relatively simple. For each monitor, a new Advanced Alert is required, scoped to the appropriate nodes or interfaces and triggered by the computed column (custom property). When adding the triggers, the computed columns can be accessed under the Nodes (or Interface for the interface reset function) -> Custom Property menu.
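The CheckPoint HA state change monitor alert trigger can be configured like:
[Screenshot: alert trigger on the HA state change custom property]
For the default gateway change monitor, the trigger can be set up as follows:
[Screenshot: alert trigger on the default gateway count custom property]
And for the serial interface flapping monitor, the trigger can be set up as follows (filtering the interface type to serial interfaces):
[Screenshot: alert trigger on the interface reset delta custom property]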
In conclusion, these monitors require a good deal of customization to set up in ORION. However, in my experience at least, the value they provide justifies the effort many times over.