Operations Manager – Extending UNIX/Linux Monitoring with MP Authoring – Part II

Introduction

In Part I of this series, I walked through creating a custom Management Pack for monitoring an application hosted on a UNIX or Linux server, along with some base data sources and application discovery. In this post, I will build on that MP to implement custom process monitoring: checking that the number of running instances of a daemon/process is within a range. While the standard process monitoring provider (SCX_UnixProcess) is the best source for process information in OpsMgr UNIX and Linux monitoring, it does not support this level of customized monitoring.

Advanced Service Monitoring

Continuing this custom application monitoring scenario, our hypothetical app has a single daemon associated with it, but we will build the classes and data sources so that they can easily be extended to monitor more services/daemons. In this example, suppose that we want to monitor a daemon that may have multiple instances running, and raise an alert if too many or too few instances of that process are running. This monitoring will be implemented by using the ps command in a WSMan Invoke module. There are two viable approaches to monitoring a daemon for a discovered custom application:

  1. Define a custom service class, discover an instance of this class for each service to monitor, and configure monitor types and monitors targeting this class
  2. Create a monitor for each service to monitor, targeting the custom application class

Both methods work, and in most cases it is appropriate to take the simpler approach and target the custom monitors to the application, providing static inputs into the monitor. There are some cases where discovering a class instance for the service makes sense, though. Facilitating dynamic discovery of services or thresholds (read from a config file), using the service class in a Distributed Application model in OpsMgr, or maintaining logical separation (in terms of monitoring) between the application and its subsystems are all scenarios that benefit from discovering the monitored services as class instances. For the purpose of illustration, I will discover the daemon to monitor in this example Management Pack as a class instance.

Class Definition

Class:  MyApp.Monitoring.Service

Definition

  • ID:  MyApp.Monitoring.Service
  • Base Class:  Microsoft.Unix.ApplicationComponent
  • Name:  MyApp Service

Properties

  • Name (String) – Key
  • MinRunning (Integer)
  • MaxRunning (Integer)

Discovery

Then we can define the data source to discover a service.   In this case, we know the name of the service and the value of the properties, so we don’t need to actually poll the agent to return data.   We can simply combine a Discovery Scheduler with a Discovery Data Mapper module to implement the data source.  However, we want to be able to override the values of MinRunning and MaxRunning, so these will need to be exposed as overridable configuration parameters.

Therefore, I’ve chosen to implement this data source in two parts. The first data source will simply combine a System.Discovery.Scheduler module and a System.Discovery.ClassSnapshotDataMapper module. This data source will accept Interval, ClassId and InstanceSettings parameters as inputs. The second data source will reference the first data source, but implement parameters for the service name and the minimum and maximum process counts. By breaking this into two data sources, the first data source can be reused for other simple discoveries.

Discovery Data Source:  MyApp.Monitoring.DataSource.DiscoverObject

This is the data source that simply combines a scheduler and a discovery data mapper.  It requires that the MapperSchema be added to the Configuration:

<Configuration>
<IncludeSchemaTypes>
<SchemaType>
 System!System.Discovery.MapperSchema
</SchemaType>
</IncludeSchemaTypes>
…

Configuration Parameters:

  • Interval (integer):  Scheduler interval in seconds
  • ClassId (string):  ID of the Class to discover
  • InstanceSettings (SettingsType):  Discovery Instance Settings

Member Modules:

The first member module is the System.Discovery.Scheduler module, with the configuration:

<Scheduler>
<SimpleReccuringSchedule>
<Interval>$Config/Interval$</Interval>
<SyncTime/>
</SimpleReccuringSchedule>
<ExcludeDates/>
</Scheduler>

This is followed by a System.Discovery.ClassSnapshotDataMapper module, with the configuration:

<ClassId>$Config/ClassId$</ClassId>
<InstanceSettings>
 $Config/InstanceSettings$
</InstanceSettings>

So this data source accepts the arbitrary Instance Settings and Class Id and maps the inputs to Discovery Data.

Discovery Data Source:  MyApp.Monitoring.DataSource.DiscoverService

This data source uses the MyApp.Monitoring.DataSource.DiscoverObject data source that we just created, but supports overridable inputs for the MinRunning and MaxRunning service class properties, by embedding the $Config/$ variables in the Instance Settings definition.

Configuration Parameters:

  • Interval (integer):  Scheduler interval in seconds – overridable
  • TargetSystem (string):  UNIX/Linux agent computer to execute the discovery
  • AppName (string):   The name of the application object (which is the key property for the hosting class instance)
  • ServiceName (string): The name of the service to discover
  • MinProcesses (integer):  The minimum threshold of running processes expected (sets the MinRunning class property) – overridable
  • MaxProcesses (integer):  The maximum threshold of running processes expected (sets the MaxRunning class property) – overridable

Member Modules

This data source only has one member module:  MyApp.Monitoring.DataSource.DiscoverObject, with the configuration:

<Interval>$Config/Interval$</Interval>
<ClassId>
 $MPElement[Name="MyApp.Monitoring.Service"]$
</ClassId>
<InstanceSettings>
<Settings>
<Setting>
<Name>
 $MPElement[Name='MyApp.Monitoring.MyApp']/Name$
</Name>
<Value>$Config/AppName$</Value>
</Setting> 
<Setting>
<Name>
 $MPElement[Name='MyApp.Monitoring.Service']/Name$
</Name>
<Value>$Config/ServiceName$</Value>
</Setting>
<Setting>
<Name>
 $MPElement[Name='MyApp.Monitoring.Service']/MinRunning$
</Name>
<Value>$Config/MinProcesses$</Value>
</Setting>
<Setting>
<Name>
 $MPElement[Name='MyApp.Monitoring.Service']/MaxRunning$
</Name>
<Value>$Config/MaxProcesses$</Value>
</Setting>
<Setting>
<Name>
 $MPElement[Name='MicrosoftUnixLibrary!Microsoft.Unix.Computer']/PrincipalName$
</Name>
<Value>$Config/TargetSystem$</Value>
</Setting>
<Setting>
<Name>
$MPElement[Name='System!System.Entity']/DisplayName$
</Name>
<Value>$Config/ServiceName$</Value>
</Setting>
</Settings>
</InstanceSettings>

Note that the key properties for the hosting class instances (MyApp.Monitoring.MyApp, and Microsoft.Unix.Computer) are included in the Instance Settings so that the relationships can be mapped.  With the data sources in place, the next step is to configure the Discovery Rule (one for each service to discover).

Discovery Rule:  MyApp.Monitoring.Discovery.MyAppDService

This rule discovers the MyAppD daemon, using the data source just created.  It is targeted to instances of the MyApp class, and provides the name of the service and process count thresholds.

Data Source Configuration:

<Interval>28800</Interval>
<TargetSystem>
 $Target/Host/Property[Type="MicrosoftUnixLibrary!Microsoft.Unix.Computer"]/PrincipalName$
</TargetSystem>
<AppName>
 $Target/Property[Type="MyApp.Monitoring.MyApp"]/Name$
</AppName>
<ServiceName>myappd</ServiceName>
<MinProcesses>1</MinProcesses>
<MaxProcesses>3</MaxProcesses>

Once the MP is imported and the discovery has run, the discovered service appears in the Discovered Inventory view in the Ops Console.

Building the Monitor Type

In this example, we are going to monitor the process/daemon and generate an alert if the count of running instances is outside of the threshold range. To do this, we can use the Shell Command monitoring data source to execute a ps h -e | grep <process name> | grep -v grep | wc -l command string. This calls the ps command, with switches to suppress the header row and show all processes. Grep is used to find the process name, and a second grep excludes the grep process itself. Finally, the results are piped to wc -l to return a line count. The result of this command string is that the running count of processes matching the process name is returned as a numeric value.
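To get a feel for what this command string returns before wiring it into the MP, here is a minimal shell sketch that runs the same grep/wc pipeline against canned data. The helper names count_matching and sample_ps_output, and the sample process listing itself, are hypothetical and exist only for illustration; they are not part of the Management Pack.

```shell
#!/bin/sh
# Count lines of ps-style output that mention a process name.
# Reads the process listing from stdin so the logic can be tried on canned data.
# count_matching is a hypothetical helper, not part of the MP.
count_matching() {
  # $1: process name to look for
  grep -- "$1" | grep -v grep | wc -l | tr -d ' '
}

# Hypothetical sample of `ps h -e` output: PID, TTY, TIME, CMD (no header row).
sample_ps_output() {
  cat <<'EOF'
  812 ?        00:00:01 sshd
 1301 ?        00:00:07 myappd
 1302 ?        00:00:05 myappd
 2210 pts/0    00:00:00 bash
 2388 pts/0    00:00:00 grep
EOF
}

sample_ps_output | count_matching myappd   # prints 2
```

On a live agent the equivalent would be ps h -e | count_matching myappd. Note that grep matches substrings, so a name like myappd would also count a process named myappd-helper; if that matters in your environment, a stricter match pattern is worth considering.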

Monitor Type: MyApp.Monitoring.MonitorType.ProcessCount

This monitor type will use the MyApp.Monitoring.DataSource.ShellCommandMonitoring data source to call the ps command string described above. A set of condition detection modules is used to determine whether the process count is below the minimum or above the maximum threshold.

Configuration Parameters

  • Interval (integer):  Scheduler interval in seconds – overridable
  • TargetSystem (string):  UNIX/Linux agent computer to monitor
  • ProcessName (string): The name of the process/daemon to monitor
  • MinCount (integer):  The minimum threshold of running processes expected
  • MaxCount (integer):  The maximum threshold of running processes expected

Health States

  • ProcessCountOK
  • ProcessCountNotOK

Member Modules

The data source for this monitor type is: MyApp.Monitoring.DataSource.ShellCommandMonitoring, with the configuration:

<Interval>$Config/Interval$</Interval>
<TargetSystem>
 $Config/TargetSystem$
</TargetSystem>
<ShellCommand>
 ps h -e | grep $Config/ProcessName$ | grep -v grep | wc -l
</ShellCommand>
<Timeout>120</Timeout>

Two condition detection modules are required to detect the OK and NotOK states, comparing the returned StdOut to the thresholds:

<ConditionDetection ID="CDProcessCountOK"
  TypeID="System!System.ExpressionFilter">
<Expression>
<And>
<Expression>
<SimpleExpression>
<ValueExpression>
 <XPathQuery Type="Double">
 //*[local-name()="StdOut"]
</XPathQuery>
</ValueExpression>
<Operator>GreaterEqual</Operator>
<ValueExpression>
<Value Type="Double">$Config/MinCount$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="Double">
 //*[local-name()="StdOut"]
</XPathQuery>
</ValueExpression>
<Operator>LessEqual</Operator>
<ValueExpression>
<Value Type="Double">$Config/MaxCount$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</And>
</Expression>
</ConditionDetection>

And

<ConditionDetection ID="CDProcessCountNotOK"
  TypeID="System!System.ExpressionFilter">
<Expression>
<Or>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="Double">
 //*[local-name()="StdOut"]
</XPathQuery>
</ValueExpression>
<Operator>Greater</Operator>
<ValueExpression>
<Value Type="Double">$Config/MaxCount$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="Double">
 //*[local-name()="StdOut"]
</XPathQuery>
</ValueExpression>
<Operator>Less</Operator>
<ValueExpression>
<Value Type="Double">$Config/MinCount$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</Or>
</Expression>
</ConditionDetection>

Regular Detections

The regular detections just need to be configured to map the Condition Detection member modules to the defined Health States:

<RegularDetections>
<RegularDetection
  MonitorTypeStateID="ProcessCountOK">
<Node ID="CDProcessCountOK">
<Node ID="DS1"/>
</Node>
</RegularDetection>
<RegularDetection
  MonitorTypeStateID="ProcessCountNotOK">
<Node ID="CDProcessCountNotOK">
<Node ID="DS1"/>
</Node>
</RegularDetection>
</RegularDetections>

In summary, this monitor type executes our ps command for a given service name and compares the count of running/matched process instances to the thresholds defined during service discovery.
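The combined effect of the two expression filters can be sketched in a few lines of shell. This is only an illustration of the logic under stated assumptions: process_count_state is a hypothetical helper name, and it uses integer comparisons where the MP compares the values as Double.

```shell
#!/bin/sh
# Classify a process count against inclusive thresholds, mirroring the
# CDProcessCountOK (min <= count <= max) and CDProcessCountNotOK filters.
# process_count_state is a hypothetical helper, not part of the MP.
process_count_state() {
  count=$1; min=$2; max=$3
  if [ "$count" -ge "$min" ] && [ "$count" -le "$max" ]; then
    echo ProcessCountOK
  else
    echo ProcessCountNotOK
  fi
}

process_count_state 2 1 3   # prints ProcessCountOK
process_count_state 0 1 3   # prints ProcessCountNotOK
process_count_state 4 1 3   # prints ProcessCountNotOK
```

Because both thresholds are inclusive, a count equal to the minimum or the maximum is still healthy, and every possible count falls into exactly one of the two states.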

Building the Custom Process Monitor

In this step, we will create a monitor of the type that we just created: MyApp.Monitoring.MonitorType.ProcessCount

As for the monitor configuration, the service name and the minimum and maximum thresholds were discovered by the service discovery rule, so we can simply pass those target properties into the monitor configuration:

<Configuration>
<Interval>180</Interval>
<TargetSystem>
 $Target/Host/Host/Property[Type="MicrosoftUnixLibrary!Microsoft.Unix.Computer"]/NetworkName$
</TargetSystem>
<ProcessName>
 $Target/Property[Type="MyApp.Monitoring.Service"]/Name$
</ProcessName>
<MinCount>
 $Target/Property[Type="MyApp.Monitoring.Service"]/MinRunning$
</MinCount>
<MaxCount>
 $Target/Property[Type="MyApp.Monitoring.Service"]/MaxRunning$
</MaxCount>
</Configuration>

For the actual alert, we can embed the parameters from the configuration and shell script output to create an alert message detailing the current count of running processes as well as the minimum and maximum expected range:

The MyApp Service: $Target/Property[Type="MyApp.Monitoring.Service"]/Name$, currently has too many or too few processes running.
Current process count:  $Data/Context///*[local-name()="StdOut"]$.
Expected range:
$Target/Property[Type="MyApp.Monitoring.Service"]/MinRunning$ – $Target/Property[Type="MyApp.Monitoring.Service"]/MaxRunning$.
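To preview roughly how this message reads once the variables are resolved, here is a small shell sketch that fills the same template with sample values. The render_alert helper and the values passed to it are hypothetical; in OpsMgr the substitution is done by the alert engine, not by a script.

```shell
#!/bin/sh
# Render the alert text with sample values in place of the $Target/$Data variables.
# render_alert is a hypothetical helper used only to preview the wording.
render_alert() {
  # $1: service name, $2: current count, $3: minimum, $4: maximum
  printf 'The MyApp Service: %s, currently has too many or too few processes running.\n' "$1"
  printf 'Current process count: %s.\n' "$2"
  printf 'Expected range: %s - %s.\n' "$3" "$4"
}

render_alert myappd 5 1 3
```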

Stay tuned for more in this series…

About Kristopher Bash
Kris is a Senior Program Manager at Microsoft, working on UNIX and Linux management features in Microsoft System Center. Prior to joining Microsoft, Kris worked in systems management, server administration, and IT operations for nearly 15 years.
