Monitoring HP Hardware Status on VMware ESX Servers

HP provides great SCOM management packs for monitoring ProLiant servers, but these management packs support only Windows agents. If you’re running ESX on ProLiant servers, it takes a little more effort to implement hardware status monitoring. Fortunately, HP also offers its Management Agents for ESX. Thus, all that is needed to monitor HP ESX server hardware is a set of custom monitors that poll the SNMP data exposed by the management agent. An overview of the process follows:

Installing the HP Management Agent for ESX

  1. Configure SNMP on the ESX servers and set options: http://thwack.com/blogs/geekspeak/archive/2008/10/30/how-to-enable-snmp-on-a-vmware-esx-server.aspx
  2. Download the HP ESX agent (make sure your server model is supported by the agent) and copy the .tgz file to a temporary location on the ESX server.
  3. Extract the file hpmgmt-8.x.x-vmware3x.tgz with a tar -zxvf command.
  4. In the extracted directory, run the install script. Later versions of the agent include a preinstall_setup.sh script, which must be run manually first and requires a reboot.
  5. Among other configuration prompts, you will be asked whether to use an existing snmpd.conf. If you choose “no,” the install creates a new snmpd.conf, which you must then configure with your SNMP settings.
  6. If you use an existing snmpd.conf, you will have to add one line to it: cd to /etc/snmp/, edit snmpd.conf, and add the following line: dlmod cmaX /usr/lib/libcmaX.so (this extends the SNMP agent to include the HP objects as a module).
  7. Restart the SNMP daemon with: service snmpd restart
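
If you manage several hosts, the snmpd.conf edit in step 6 can be scripted. The following is a minimal Python sketch, not part of the HP agent: the function name ensure_dlmod is hypothetical, and it only transforms the file's text, leaving reading and writing /etc/snmp/snmpd.conf to the caller.

```python
# The line that step 6 asks you to add to an existing snmpd.conf.
DLMOD_LINE = "dlmod cmaX /usr/lib/libcmaX.so"


def ensure_dlmod(conf_text: str) -> str:
    """Return conf_text with the cmaX dlmod line present (hypothetical helper).

    If an uncommented, equivalent line already exists, the text is returned
    unchanged, so the function can be run repeatedly without adding duplicates.
    """
    lines = conf_text.splitlines()
    for line in lines:
        # Compare whitespace-separated tokens so spacing differences are ignored.
        # A commented-out line starts with a '#' token and will not match.
        if line.split() == DLMOD_LINE.split():
            return conf_text
    lines.append(DLMOD_LINE)
    return "\n".join(lines) + "\n"
```

After writing the result back to snmpd.conf, the daemon still has to be restarted as in step 7.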

Testing

The HP agents implement the Compaq MIBs under the OID 1.3.6.1.4.1.232. To test, you can use an SNMP browser to connect remotely and walk this OID, or you can run an snmpwalk command on the ESX server itself:  snmpwalk -v 2c -c <read-only community name> localhost 1.3.6.1.4.1.232
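
To check the result of the walk from a script rather than by eye, the output can be parsed. This is a rough Python sketch that assumes the walk was run with Net-SNMP's -On option, so that each line has the numeric form ".1.3.6.1.4.1.232.3.1.3.0 = INTEGER: 2". The function names and the sample values are illustrative, not taken from a real host.

```python
import re

# Matches numeric Net-SNMP output lines such as:
#   .1.3.6.1.4.1.232.3.1.3.0 = INTEGER: 2
_LINE = re.compile(
    r"^\.?(?P<oid>\d+(?:\.\d+)+)\s*=\s*(?P<type>[A-Za-z0-9-]+):\s*(?P<value>.*)$"
)


def parse_snmpwalk(output: str) -> dict:
    """Map each OID (without leading dot) to a (type, value) tuple."""
    results = {}
    for line in output.splitlines():
        match = _LINE.match(line.strip())
        if match:  # lines without a "TYPE: value" part are skipped
            results[match.group("oid")] = (
                match.group("type"),
                match.group("value").strip(),
            )
    return results


def has_hp_subtree(results: dict, prefix: str = "1.3.6.1.4.1.232") -> bool:
    """True if at least one returned OID lies under the Compaq/HP subtree."""
    return any(oid == prefix or oid.startswith(prefix + ".") for oid in results)


# Illustrative sample only; real output will contain many more lines.
sample = ".1.3.6.1.4.1.232.3.1.3.0 = INTEGER: 2\n"
print(has_hp_subtree(parse_snmpwalk(sample)))
```

If has_hp_subtree returns False, the agent module is probably not loaded, and the dlmod line and the snmpd restart are the first things to recheck.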

Monitoring with SCOM

  1. Discover the ESX servers as Network Devices
  2. Create a group for HP ESX servers (optionally in a new management pack).  You can use dynamic inclusion logic by setting a filter on the Device Description (Contains vmnix)
  3. Create your SNMP monitors and rules, targeting the SNMP Network Device class.  Configure the monitors and rules to be disabled, and then use an override to enable them for the HP ESX server group
  4. Create any required views or console tasks

What to Monitor?

When HP purchased Compaq, they made a smart decision in adopting the Compaq SNMP MIBs for all HP servers, as this is one of the better vendor SNMP implementations out there. It has remained very consistent over the years and, most importantly, it tends to implement a single status value for each group of subcomponents that are represented in SNMP tables, so you don’t have to walk the table to get the overall status. Thus, instead of checking the status of each disk drive, which will vary in number (and identifier in the table), you can just poll cpqDaMibCondition (1.3.6.1.4.1.232.3.1.3) from the CPQIDA MIB to get the overall intelligent drive array health. The agent’s System Management web console can be used to drill into specific problems, so from a monitoring perspective, it is really only necessary to know when there is a problem and what its general nature is.

These are the SNMP objects that I like to alert on for HP servers running UNIX:

Component                    Object Name                      OID
CPU Fans                     cpqHeThermalCpuFanStatus         1.3.6.1.4.1.232.6.2.6.5.0
Drive Array Health           cpqDaMibCondition                1.3.6.1.4.1.232.3.1.3.0
Drive Array Controller (1)   cpqDaCntlrCondition              1.3.6.1.4.1.232.3.2.2.1.1.6.1
Power Supplies               cpqHeFltTolPwrSupplyCondition    1.3.6.1.4.1.232.6.2.9.1.0
System Fans                  cpqHeThermalSystemFanStatus      1.3.6.1.4.1.232.6.2.6.4.0
Temperature (Status)         cpqHeThermalTempStatus           1.3.6.1.4.1.232.6.2.6.3.0
Thermal Conditions           cpqHeThermalCondition            1.3.6.1.4.1.232.6.2.6.1.0
Integrated Management Log    cpqHeEventLogCondition           1.3.6.1.4.1.232.6.2.11.2.0
Critical Errors              cpqHeCritLogCondition            1.3.6.1.4.1.232.6.2.2.2.0
Correctable Memory Errors    cpqHeCorrMemLogStatus            1.3.6.1.4.1.232.6.2.3.1.0
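
When turning these polled values into monitor states, it helps to write the mapping down explicitly. The sketch below is in Python rather than SCOM monitor configuration, and it assumes the usual Compaq condition enumeration of other(1), ok(2), degraded(3), failed(4). The mapping onto Healthy/Warning/Critical is my own choice, and the function names are hypothetical. Not every object uses this enumeration, so check each one against its MIB definition before relying on it.

```python
# Assumed Compaq condition enumeration; verify per object in the MIB.
CONDITION_NAMES = {1: "other", 2: "ok", 3: "degraded", 4: "failed"}

# Assumed mapping from condition name to a three-state monitor health.
HEALTH = {"ok": "Healthy", "degraded": "Warning", "failed": "Critical"}


def health_state(raw_value: int) -> str:
    """Translate a polled integer into a health state.

    Unrecognised values and "other" are reported as "Unknown" rather than
    being silently treated as healthy.
    """
    name = CONDITION_NAMES.get(raw_value, "other")
    return HEALTH.get(name, "Unknown")


def summarize(polled: dict) -> dict:
    """Apply health_state to a {component: polled integer} mapping."""
    return {component: health_state(value) for component, value in polled.items()}


# Illustrative values only, keyed by component names from the table.
print(summarize({"Drive Array Health": 2, "Thermal Conditions": 3}))
```

Treating "other" as Unknown instead of Healthy is deliberate: a value the mapping does not understand should prompt a look at the System Management web console rather than be hidden.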

For reference on SNMP MIBs, ByteSphere provides a great Online MIB Database. The primary Compaq MIBs to look for are: CPQHLTH, CPQIDA, CPQSTSYS, CPQHOST, CPQNIC, CPQTHRSH.

About Kristopher Bash
Kris is a Senior Program Manager at Microsoft, working on UNIX and Linux management features in Microsoft System Center. Prior to joining Microsoft, Kris worked in systems management, server administration, and IT operations for nearly 15 years.

3 Responses to Monitoring HP Hardware Status on VMWare ESX Servers

  1. Emanuel Beunder says:

    I think the OID you recommend for checking ECC-errors is not correct.
    According to the SolarWind website this OID represents:
    “This value specifies whether this system is currently tracking correctable memory errors.”
    The value you are looking probably is:
    cpqHeCorrMemLogCondition
    Oid 1.3.6.1.4.1.232.6.2.3.2

  2. Derv says:

    I recently installed the HP Proliant management pack for xSNMP. It works great for ESX hosts with HP SNMP agents installed. However, I was told that HP doesn’t offer a SNMP agent for ESXi hosts. Unable to get much hardware information from ESXi hosts. This also affects other SNMP based management systems like Solarwinds Orion. The Linux SNMP daemon is responding to queries, but without a HP SNMP agents installed, can’t get much hardware related info. We had to configured HP-SIM to poll these ESXI hosts using WBEM credentials. Is there any open source management packs that can utilize WBEM credentials?

  3. Pingback: Hardware monitoring on HP Proliant DL380 Gx servers | 0xf8.org
