Sun Hardware Monitoring with Net-SNMP and Shell Scripts
August 18, 2009
While Sun, like most server vendors, offers a comprehensive suite of hardware monitoring agents and management tools, it can be frustrating to monitor Sun hardware using the Sun Management Agent and a third-party SNMP tool, such as System Center Operations Manager (SCOM). The Sun Management Agent’s SNMP implementation builds on the ENTITY-MIB (http://docs.sun.com/app/docs/doc/817-3155/6mip4hnov?l=en&a=view) with the SUN-PLATFORM-MIB. All relevant hardware monitoring data are exposed through these MIBs, but deploying wide-scale monitoring of these objects with SNMP get requests is problematic, because the list of entity objects varies by server model and can even vary with the number of installed components, such as hard drives.
To expand on this point: Sun servers that run the SMA list all hardware sensors in the entPhysicalTable of the ENTITY-MIB. The id value for each hardware sensor corresponds to the id value for that sensor’s status (administrative and operational status) in the sunPlatEquipmentTable of the SUN-PLATFORM-MIB. These id values are not stable across hardware, however. On one model, id 15 might correspond to CPU 0 Fan 0, while on another model id 15 corresponds to a different sensor. Likewise, on a server with two CPUs, id 17 might correspond to CPU 1 Fan 0, while on a system with only one CPU, id 17 corresponds to a different sensor.
If you could use an SNMP table or SNMP walk request and return the results to a script that parses the output, this would not be a problem. But SCOM, like many SNMP monitoring tools, implements SNMP gets only, meaning that this variability in the OIDs prevents a single set of get monitors from working across all servers.
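Where a tool can issue a walk, the lookup by sensor name might be sketched as below. This is a minimal sketch, not part of any Sun tooling: the function name find_sensor_index is hypothetical, and it assumes the usual Net-SNMP snmpwalk line format (OID.index = STRING: value). On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of awk.

```shell
#!/bin/sh
# find_sensor_index NAME  (hypothetical helper name)
# Reads snmpwalk output for ENTITY-MIB::entPhysicalName on stdin and prints
# the table index of the first entry whose value equals NAME exactly.
find_sensor_index() {
    awk -v name="$1" '
        {
            # Split "OID.index = TYPE: value" at the first " = ".
            sep = index($0, " = ")
            if (sep == 0) next
            oid = substr($0, 1, sep - 1)
            val = substr($0, sep + 3)
            sub(/^STRING: /, "", val)   # drop the type prefix
            gsub(/^"|"$/, "", val)      # drop surrounding quotes, if any
            if (val == name) {
                sub(/.*\./, "", oid)    # keep only the trailing index
                print oid
                exit
            }
        }'
}
```

For example, with a host named sunhost and the community string public (both placeholders), piping `snmpwalk -v2c -c public sunhost ENTITY-MIB::entPhysicalName` into `find_sensor_index "CPU 0 Fan 0"` would print the id value that the sensor has on that particular server.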
So, what’s a way to work around this without committing to a secondary monitoring tool just for Sun hardware? One solution lies in the extensibility of the Net-SNMP agent, which is the default SNMP agent for Solaris. Net-SNMP allows the agent’s functionality to be extended by assigning commands to OIDs. With this configuration, whenever the OID is polled, the command is run on demand and its output is returned as the SNMP value. For more on this functionality, see the Extending Agent Functionality section at: http://www.net-snmp.org/docs/man/snmpd.conf.html.
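As an illustration, the configuration might look like the fragment below. This is a sketch under assumptions: the script paths and the names hwCpu and hwDisk are hypothetical, and the extend directive is only present in newer Net-SNMP releases. Older agents offer the similar exec directive instead, so check the snmpd.conf man page for the version you run.

```
# snmpd.conf fragment (file location varies by platform)
# Form: extend NAME PROG [ARGS]
# Names and script paths below are hypothetical examples.
extend hwCpu  /opt/monitor/check_cpu.sh
extend hwDisk /opt/monitor/check_disk.sh
```

With extend, the first line of each script’s output should then be readable with a plain get on NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."hwDisk" (or its numeric equivalent), which is the kind of fixed OID a get-only tool can poll.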
To use this for hardware monitoring, a standard set of shell or Perl scripts can be written and deployed to a uniform path on all of your Sun servers, each configured to return a value such as “Pass” if everything checks out, or “Fail: <reason for failure>” if problems are found. The scripts can support different status-checking commands for maximum portability (for example, using one status command on systems with software disk redundancy and another on systems with hardware RAID). A great starting point for example monitoring scripts can be found at Sun’s BigAdmin site: http://www.sun.com/bigadmin/scripts/indexMon.html
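A check of this kind might be sketched as follows. This is only an illustration, not one of the BigAdmin scripts: the function name summarize_states is hypothetical, and it assumes a status command that prints lines of the form "State: Okay" for healthy components, as Solaris Volume Manager’s metastat typically does. On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of awk.

```shell
#!/bin/sh
# summarize_states  (hypothetical helper name)
# Reads status text on stdin. Prints "Pass" when every "State:" line reports
# "Okay"; otherwise prints "Fail: <first state that is not Okay>".
summarize_states() {
    awk '
        /State:/ {
            seen = 1
            state = $0
            sub(/.*State:[ \t]*/, "", state)   # keep the text after "State:"
            sub(/[ \t]+$/, "", state)          # trim trailing whitespace
            if (state != "Okay" && bad == "") bad = state
        }
        END {
            # No state lines at all is treated as a failure, so a missing
            # or broken status command does not report "Pass".
            if (!seen)          print "Fail: no state lines found"
            else if (bad != "") print "Fail: " bad
            else                print "Pass"
        }'
}
```

A wrapper saved at a uniform path, say /opt/monitor/check_disk.sh (a hypothetical path), could then end with `metastat | summarize_states`. On systems with hardware RAID, the wrapper would substitute a different status command together with a parser suited to that command’s output.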
The net result is that with well-written scripts and the Net-SNMP agent, a single monitoring solution can be deployed to all Sun servers, independent of their hardware model. With a consistent configuration in snmpd.conf, the OIDs for each of the scripts (e.g. CPU script, HDD script, etc.) will be the same on every server and can be polled with a single set of SNMP get monitors in SCOM or another utility.