SCOM: Updates to the Cisco Management Pack (R2) v1.0.2.6

September 26, 2009

I’m hoping to finish the SP1 version of the Cisco Management Pack fairly soon, but in the meantime I’ve updated the R2 version with several changes. The current version, 1.0.2.6, can be downloaded here.

The changes in this version are:

- Added three new containment classes: Cisco Device Chassis, Cisco Device System Components, and Cisco Device Interfaces. These classes contain the monitored objects and add a level of hierarchical organization.
- Added discovery of the ifAlias property for interfaces.
- Added discovery of the hostname (OLD-CISCO-MIB) and chassis description for the Cisco Device class.
- Updated the properties displayed by default in the Device and Interface views.
- Added a rule to clean up unused temporary XML files once a day. Several of the monitors use temporary XML files written to the %TEMP% path. In the previous version, old files were left on the file system when a previously monitored object was removed. This rule removes those temporary files.
- Modified the discovery intervals for some objects for more balanced timing.
- Added four new monitors for switches that implement the CISCO-STACK-MIB. The monitors are targeted at the Cisco Device Chassis class and include:
  - Fan Alarm
  - Temperature Alarm
  - Minor Alarm
  - Major Alarm
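
The daily cleanup rule described above can be pictured with a short sketch. This is Python rather than the script the management pack actually runs, and the `CiscoSNMP_` file prefix and the one-day age threshold are assumptions made for illustration, not details taken from the MP.

```python
import time
from pathlib import Path


def remove_stale_xml(directory, max_age_days=1.0, prefix="CiscoSNMP_", now=None):
    """Delete prefix*.xml files older than max_age_days and return their names.

    The prefix is hypothetical; the real MP's temporary file names may differ.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    removed = []
    for path in Path(directory).glob(f"{prefix}*.xml"):
        # Only touch regular files whose last modification predates the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Keying the decision on the modification time means files belonging to objects that are still monitored (and therefore still rewritten regularly) are left alone.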

With the new containment classes, the diagram view looks a lot better:


About Kristopher Bash
Kris is a Senior Program Manager at Microsoft, working on UNIX and Linux management features in Microsoft System Center. Prior to joining Microsoft, Kris worked in systems management, server administration, and IT operations for nearly 15 years.

26 Responses to SCOM: Updates to the Cisco Management Pack (R2) v1.0.2.6

Richard says:

September 30, 2009 at 8:56 am

Kris,

First off, thanks for the update!

I do have one concern though. I noticed that the Discovery ID for the Interfaces was changed from:

CiscoSNMP.Discovery.CiscoInterfaces

to:

CiscoSNMP.Discovery.Interface

Isn’t this going to affect any overrides already in place for Interface discovery?

- Kristopher Bash says:
  
  September 30, 2009 at 11:14 am
  
  I’ll look into this. I’m also open to any other feedback you may have on managing the interface discovery.
  
Richard says:

September 30, 2009 at 10:44 am

Kris,

Yes, it does affect any overrides already in place.

I had a little over a hundred overrides for limiting interfaces for devices and they are gone after updating.

- Kristopher Bash says:
  
  September 30, 2009 at 11:10 am
  
  Ouch. This version did introduce significant changes to the class organization. Do you have a backup of your old MP with the overrides?
  
  - Richard says:
    
    September 30, 2009 at 11:27 am
    
    Yes, I have backups. I have been going through your MP, disabling most of the discoveries and then sealing the MP so that I can store my overrides in a separate MP. I had also exported them all to an Excel spreadsheet, so adding them back won’t be that big of a deal. I’m glad I had only done 3 sites so far; we have over 300 in total, and that would have been a lot of work to redo.
Richard says:

September 30, 2009 at 10:54 am

Also, with the new discovery, when creating a new override for “Discover Cisco Interface”, the override is now targeted at “Cisco Device Interfaces”, which shows only an IP address as the path. The hostname/device name is no longer shown, which makes it harder to target the override if you are used to using the hostname/device name instead of the IP address (which I am).

Marnix Wolf says:

October 2, 2009 at 4:16 am

Hi Kris.

I get many events like this one:
Event Type: Error
Event Source: Health Service Modules
Event Category: None
Event ID: 11903
Date: 2-10-2009
Time: 10:11:57
User: N/A
Computer: XXXX
Description:
The Microsoft Operations Manager Expression Filter Module could not convert the received value to the requested type.

Property Expression: Property[@Name='MemoryPctUsed']

Property Value: 82,33

Conversion Type: DataItemElementTypeDouble(3)

Original Error: 0x80FF005A

One or more workflows were affected by this.

Workflow name: CiscoSNMP.Monitor.MemoryPoolPctUtil
Instance name: Processor
Instance ID: {AC2BBD47-FA33-0B10-62AA-F4F9DCBA2CCA}
Management group: XXXX

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Also this event pops up, many times:

Event Type: Error
Event Source: Health Service Modules
Event Category: None
Event ID: 11001
Date: 2-10-2009
Time: 10:08:12
User: N/A
Computer: XXXX
Description:
Error sending an SNMP GET message to IP Address 10.40.1.252, Community String:=XXXX, Status 0x6c.

One or more workflows were affected by this.

Workflow name: CiscoSNMP.Rule.CollectIFCollisions
Instance name: 59
Instance ID: {20030910-0988-FC87-B40E-04FD9E0FD9B3}
Management group: XXX

I have already disabled the Cisco Status Interface Monitor for two ports, since these ports aren’t used and the alerts raised for them aren’t needed.

- Kristopher Bash says:
  
  October 2, 2009 at 7:27 am
  
  I’ll look into these errors.
  
  - Marnix Wolf says:
    
    October 2, 2009 at 7:37 am
    
    Thanks Kris.
    
    Don’t get me wrong. I couldn’t write such an MP myself. I have investigated it further, and the SNMP device is being monitored properly in OpsMgr.
    
    Event ID 11001 occurs many times per second. These workflows are reporting the above-mentioned issue:
    CiscoSNMP.Rule.CollectIFCollisions
    CiscoSNMP.Rule.CollectInterfaceInPctUtil
    CiscoSNMP.Monitor.InterfaceCollisions
    CiscoSNMP.Rule.CollectInterfaceOutPctUtil
    CiscoSNMP.Monitor.InterfacePctUtil
    
    If there is anything else I can do, let me know.
    
    Best regards,
    Marnix
Kristopher Bash says:

October 2, 2009 at 6:39 pm

Marnix,

Thanks for bringing this to my attention. If you’re up for it, I may ask you to collect some more information so that we can get to the root of the issue. How many Cisco devices are you monitoring, and what is the highest number of interfaces per device? Are you able to tell whether these errors are occurring on just one device or on multiple devices? Off the top of my head, I suspect one of three things is happening:
1) Too many SNMP requests are being sent to the device at the same time and some are timing out
2) Too many SNMP requests are being sent from the proxy agent at the same time and some are timing out
3) A malformed OID is being passed to the monitor

To explain item 3: all of the interface monitors use variable replacement to form the OID, so the OID is defined in the monitor as .1.3.6.1.x.x.x.x.x.$Config/Index$. I have used expression filters to try to prevent any SNMP requests from being sent if the $Config/Index$ value is null, but I suppose there could be a problem with this that I am unaware of.
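
The guard described above can be sketched in a few lines. This is a Python illustration, not the MP's actual expression filter; the function name is hypothetical, and the base OID used in the usage note (ifInOctets from the IF-MIB) is only an example.

```python
def build_interface_oid(base_oid, index):
    """Append an interface index to base_oid, or return None if the index is unusable.

    Returning None stands in for "do not send the SNMP request at all".
    """
    if index is None:
        return None
    text = str(index).strip()
    # Reject empty values and unresolved placeholders such as "$Config/Index$".
    if not (text.isascii() and text.isdigit()):
        return None
    return f"{base_oid.rstrip('.')}.{text}"
```

For example, `build_interface_oid(".1.3.6.1.2.1.2.2.1.10", 59)` yields `.1.3.6.1.2.1.2.2.1.10.59`, while a null, empty, or unsubstituted index yields `None`, so no malformed OID reaches the SNMP probe.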

If the SNMP request is successfully sent and no data value is returned, it should not generate an 11001 event; the return value would just be empty.

As for the memory pool utilization error, I believe that is a problem due to regional language settings (or rather, the MP not properly anticipating regional setting differences). I haven’t proven this out, but my hunch is that the performance data mapper may not like the value being returned as 82,33 instead of 82.33. I think this is probably the result of using the VBScript FormatNumber function to trim the decimal places. I’ll see if I can prove this out in a lab setting.
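
To make the suspected failure mode concrete, here is a small sketch in Python (not the MP's VBScript). It assumes the hunch above is right: a value formatted under a comma-decimal regional setting arrives as `82,33` and cannot be converted to a double. The two helper names are hypothetical.

```python
def parse_decimal(text):
    """Parse a number whose decimal separator may be ',' or '.'.

    Assumes no thousands separators are present in the input.
    """
    return float(text.strip().replace(",", "."))


def format_invariant(value, places=2):
    """Round to a fixed number of places, always using '.' as the separator.

    Python's 'f' format type does not consult the process locale.
    """
    return f"{value:.{places}f}"
```

With these, `format_invariant(parse_decimal("82,33"))` gives `82.33`. The equivalent fix in a script-based probe would be to trim decimal places with a locale-independent operation rather than a locale-aware formatting function.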

Thanks again!

- Marnix Wolf says:
  
  October 5, 2009 at 1:39 am
  
  Hi Kris.
  
  I am collecting information. (I am at a different site now.) When I have the information, I’ll let you know.
  
  I am sure it is too many SNMP requests, since Event ID 11001 popped up tens of times per second.
  
  At the moment, only one Cisco device was being monitored. I have removed the Cisco MP for now, since it flooded the OpsMgr event log of the Management Server (which monitors this Cisco device as a network device).
  
  Thanks again for your efforts.
  
  By the way, can we communicate more directly? (If you are interested, that is.)
  
  Best regards,
  Marnix Wolf
  
Marnix Wolf says:

October 5, 2009 at 2:16 am

Hi Kris.

Information about the device: Cisco 6500 with 13 slots and, at this moment, 91 ports.

Best regards,
Marnix Wolf

- Kristopher Bash says:
  
  October 6, 2009 at 9:11 pm
  
  FYI, I sent you a direct email
  
  - mats says:
    
    November 26, 2009 at 6:18 am
    
    Kristopher.
    Did you and Marnix get any further on this issue? I’m having the same at a customer site.
    
    Best regards
    Mats
  - Kristopher Bash says:
    
    November 26, 2009 at 12:26 pm
    
    Are the errors related to a device with a high number of interfaces?
  - mats says:
    
    November 27, 2009 at 5:27 am
    
    Yes, it is. It’s a Catalyst 4705R with 180 ports.
Yury Krylov says:

November 4, 2009 at 9:50 pm

Hi Kris,
Thanks for your great job. The only thing I want to point out is that if SCOM is installed on Windows Server 2008, the file RegCiscoMibs.cmd needs to be modified:
on every command line, change %SYSTEMROOT%\system32\wbem\snmp\smi2smir.exe to %SYSTEMROOT%\system32\wbem\smi2smir.exe. There is no %SYSTEMROOT%\system32\wbem\snmp directory in WS 2008, if I’m not mistaken.

-Yury

Jeroen Mazereel says:

November 25, 2009 at 9:06 am

Hi,

First of all, great job with the management pack!

I get a lot of warning alerts like this one:

Cisco Temperature Sensor Status Monitor

The temperature sensor (chassis) on the device xxxxxxx is in a warning or error state. The sensor state is: 5.

Legend:
normal(1),
warning(2),
critical(3),
shutdown(4),
notPresent(5),
notFunctioning(6)

It probably shouldn’t alert on this state. That or the condition on which the alert triggers should be overridable.
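
One way to picture the suggested behavior is a state-to-health mapping with an overridable set of ignored states. The sketch below is Python, not the MP's monitor definition; the state codes come from the legend above, while the function name, the `ignore` parameter, and the choice to treat every other non-normal state as critical are assumptions.

```python
SENSOR_STATES = {
    1: "normal",
    2: "warning",
    3: "critical",
    4: "shutdown",
    5: "notPresent",
    6: "notFunctioning",
}


def sensor_health(state, ignore=frozenset({5})):
    """Map a sensor state code to 'healthy', 'warning', 'critical', or 'unknown'.

    States listed in `ignore` (notPresent by default) never raise an alert.
    """
    if state not in SENSOR_STATES:
        return "unknown"
    if state == 1 or state in ignore:
        return "healthy"
    if state == 2:
        return "warning"
    return "critical"
```

Here `sensor_health(5)` returns `healthy`, so a chassis without that sensor stays quiet, and passing a different `ignore` set plays the role of an override.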

Again, great job so far with the management pack.

Best Regards,
Jeroen Mazereel

- Kristopher Bash says:
  
  November 25, 2009 at 5:44 pm
  
  Thanks for your comment. That is an issue that I will need to correct. I’ll try to get it updated soon.
  
Chris Taylor says:

December 5, 2009 at 5:35 pm

First, nice work. =) This is extremely useful.

A couple things that I’ve noticed though (I’m testing monitoring with an ASA5505).

1) Interface DMZ is administratively down, and there is no cable plugged in. It is healthy (since its admin status is 2). If I change this to up (1), it shows up in a critical state (since the line protocol is down). If I shut the interface back down, it doesn’t automatically return to healthy. If I reset the health, it does show up correctly.

2) The same thing seems to happen if the device being managed is offline longer than the Discover Cisco Device discovery interval. The device drops out of the Cisco Devices group.

Alex Fischer says:

December 9, 2009 at 9:49 am

Hi Kristopher,

I’m working with the 1.0.2.7 build (SCOM 2007 R2 under W2K8). What could be the reason that Cisco 6500 (Supervisor 720) devices are visible, and I can see the integrity of the chassis, but I cannot see the status of the devices?

Alex

Pierre-Emmanuel says:

December 9, 2009 at 3:43 pm

Hi Kris,

This is a great job you have done so far, I am currently inspecting the nooks and crannies of the mp and I have a question right off the bat:

In the data source module for discovering Cisco devices, you have an OID filter module which, from what I understand, filters out any network devices whose OID does not contain “1.3.6.1.4.1.9.”. Then, in the final mapper module, you use the FilteredClassSnapshotDataMapper, which has an expression that filters again on the OID “1.3.6.1.4.1.9.”. Is this really necessary? Wouldn’t the ClassSnapshotDataMapper module be sufficient, since the OID has already been filtered? The way I see it, if an OID doesn’t match the first OID filter, the workflow stops there, and therefore further filtering isn’t necessary.

Best regards!

Pierre-Emmanuel

François Dufour says:

January 20, 2010 at 4:56 pm

Hi Kristopher,

That is some great work! Really appreciated.

I have just one remark: the Cisco switch I’m monitoring has some 10 Gbit ports, and these are precisely the ones I want to monitor. The Indexfilter parameter works fine. The problem I see now is that the ifInOctets value is limited to 4294967295, as it is a Counter32. With 10 Gbit bandwidth, the counter wraps almost every 5 minutes, which makes the inbound percent-utilization calculation wrong. I found at http://www.cisco.com/en/US/tech/tk648/tk362/technologies_q_and_a_item09186a00800b69ac.shtml that there are 64-bit counters that provide these values. So my question is: is it possible to query such 64-bit SNMP counters in SCOM?
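
The rollover arithmetic behind this can be sketched as follows. This is an illustrative Python calculation, not code from the management pack; the function names are hypothetical, and the delta helper assumes the counter wraps at most once between two samples.

```python
COUNTER32_MODULUS = 2**32  # a Counter32 wraps after 4294967295
COUNTER64_MODULUS = 2**64


def seconds_to_wrap(link_bps, utilization=1.0, modulus=COUNTER32_MODULUS):
    """Seconds until an octet counter wraps at the given average utilization (0-1)."""
    octets_per_second = link_bps * utilization / 8
    return modulus / octets_per_second


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Octets counted between two samples, assuming at most one wrap in between."""
    return (current - previous) % modulus


def percent_utilization(previous, current, interval_seconds, link_bps,
                        modulus=COUNTER32_MODULUS):
    """Average utilization in percent over the sampling interval."""
    bits = counter_delta(previous, current, modulus) * 8
    return 100.0 * bits / (interval_seconds * link_bps)
```

At full 10 Gbit/s line rate, `seconds_to_wrap(10_000_000_000)` is about 3.4 seconds, and a wrap roughly every 5 minutes corresponds to an average of about 115 Mbit/s. Once the polling interval is longer than the wrap time, more than one wrap can occur between samples and no modulo correction can recover the true delta, which is why the 64-bit IF-MIB counters (for example ifHCInOctets, 1.3.6.1.2.1.31.1.1.1.6) matter on fast links.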

- Kristopher Bash says:
  
  January 21, 2010 at 12:04 am
  
  Thanks for your comment. The 32-bit counter rollover is a real problem with 10GbE interfaces and can be a problem with 1Gb interfaces too. The challenge in a monitoring scenario is detecting which interfaces support the 64-bit counters in the IF-MIB, as some vendors/versions support 64-bit octet counters on 1Gb interfaces and others don’t. I am happy to say that this issue is fully addressed in the xSNMP management pack, which I will post for public beta testing this week. In this MP, I configured support for both 32-bit and 64-bit IF octet counters by discovering whether each interface supports 64-bit counters, and then passing the correct OID (32-bit or 64-bit octet counter) as an overridable parameter to the utilization monitors/rules.
  
  - François Dufour says:
    
    January 21, 2010 at 5:54 am
    
    Hi Kris,
    
    You just Rock ! 🙂