Operations Manager – Extending UNIX/Linux Monitoring with MP Authoring – Part II
March 24, 2011
Introduction
Advanced Service Monitoring
- Define a custom service class, discover an instance of this class for each service to monitor, and configure monitor types and monitors targeting this class
- Create a monitor for each service to monitor, targeting the custom application class
Both methods are completely viable, and in most cases it is appropriate to take the simpler approach: target the custom monitors to the application and provide static inputs to the monitor. There are some cases, though, where discovering a class instance for the service makes sense. Facilitating dynamic discovery of services or thresholds (read from a config file), using the service class in a Distributed Application model in OpsMgr, or maintaining logical separation (in terms of monitoring) between the application and its subsystems are all scenarios that benefit from discovering the monitored services as class instances. For the purpose of illustration, I will discover the daemon to monitor in this example Management Pack as a class instance.
Class Definition
Class: MyApp.Monitoring.Service
Definition
- ID: MyApp.Monitoring.Service
- Base Class: Microsoft.Unix.ApplicationComponent
- Name: MyApp Service
Properties
- Name (String) – Key
- MinRunning (Integer)
- MaxRunning (Integer)
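As a rough illustration, the class described above might look like the following in the TypeDefinitions section of an OpsMgr 2007 R2 management pack. This is only a sketch under assumptions: the MicrosoftUnixLibrary and System aliases are taken from the references used later in this post, the Accessibility, Hosted, and string-length values are illustrative choices, and the hosting relationship ID (MyApp.Monitoring.MyAppHostsService) is a hypothetical name.

```xml
<EntityTypes>
  <ClassTypes>
    <!-- Service class: Name is the key property, the thresholds are plain integers -->
    <ClassType ID="MyApp.Monitoring.Service" Accessibility="Internal" Abstract="false"
               Base="MicrosoftUnixLibrary!Microsoft.Unix.ApplicationComponent"
               Hosted="true" Singleton="false">
      <Property ID="Name" Type="string" Key="true" CaseSensitive="false" Length="256" MinLength="0" />
      <Property ID="MinRunning" Type="int" Key="false" />
      <Property ID="MaxRunning" Type="int" Key="false" />
    </ClassType>
  </ClassTypes>
  <RelationshipTypes>
    <!-- Hypothetical hosting relationship: the application instance hosts its services -->
    <RelationshipType ID="MyApp.Monitoring.MyAppHostsService" Accessibility="Internal"
                      Abstract="false" Base="System!System.Hosting">
      <Source>MyApp.Monitoring.MyApp</Source>
      <Target>MyApp.Monitoring.Service</Target>
    </RelationshipType>
  </RelationshipTypes>
</EntityTypes>
```

The display name (MyApp Service) is not part of the class definition itself; it would be supplied as a display string in the language pack section.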
Discovery
Then we can define the data source to discover a service. In this case, we know the name of the service and the value of the properties, so we don’t need to actually poll the agent to return data. We can simply combine a Discovery Scheduler with a Discovery Data Mapper module to implement the data source. However, we want to be able to override the values of MinRunning and MaxRunning, so these will need to be exposed as overridable configuration parameters.
Therefore, I’ve chosen to implement this data source in two parts. The first data source simply combines a System.Discovery.Scheduler module and a System.Discovery.ClassSnapshotDataMapper module, and accepts Interval, ClassId, and InstanceSettings parameters as inputs. The second data source references the first, but implements parameters for Service Name, MinRunning, and MaxRunning. Breaking this into two data sources means the first can be reused for other simple discoveries.
Discovery Data Source: MyApp.Monitoring.DataSource.DiscoverObject
This is the data source that simply combines a scheduler and a discovery data mapper. It requires that the MapperSchema be added to the Configuration:
<Configuration>
  <IncludeSchemaTypes>
    <SchemaType>System!System.Discovery.MapperSchema</SchemaType>
  </IncludeSchemaTypes>
  …
Configuration Parameters:
- Interval (integer): Scheduler interval in seconds
- ClassId (string): ID of the Class to discover
- InstanceSettings (SettingsType): Discovery Instance Settings
Member Modules:
The first member module is the System.Discovery.Scheduler module, with the configuration:
<Scheduler>
  <SimpleReccuringSchedule>
    <Interval>$Config/Interval$</Interval>
    <SyncTime/>
  </SimpleReccuringSchedule>
  <ExcludeDates/>
</Scheduler>
This is followed by a System.Discovery.ClassSnapshotDataMapper module, with the configuration:
<ClassId>$Config/ClassId$</ClassId>
<InstanceSettings>$Config/InstanceSettings$</InstanceSettings>
So this data source accepts the arbitrary Instance Settings and Class Id and maps the inputs to Discovery Data.
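Putting the pieces above together, the composite data source might be declared roughly as follows. Treat this as a sketch, not the exact definition from the management pack: the member module IDs (Scheduler, Mapper), the Accessibility and Isolation values, and the choice to expose Interval as overridable are assumptions, and the xsd prefix is assumed to be declared on the ManagementPack root element.

```xml
<DataSourceModuleType ID="MyApp.Monitoring.DataSource.DiscoverObject" Accessibility="Internal" Batching="false">
  <Configuration>
    <IncludeSchemaTypes>
      <SchemaType>System!System.Discovery.MapperSchema</SchemaType>
    </IncludeSchemaTypes>
    <xsd:element name="Interval" type="xsd:integer" />
    <xsd:element name="ClassId" type="xsd:string" />
    <!-- SettingsType comes from the included MapperSchema -->
    <xsd:element name="InstanceSettings" type="SettingsType" />
  </Configuration>
  <OverrideableParameters>
    <OverrideableParameter ID="Interval" Selector="$Config/Interval$" ParameterType="int" />
  </OverrideableParameters>
  <ModuleImplementation Isolation="Any">
    <Composite>
      <MemberModules>
        <!-- "SimpleReccuringSchedule" is the spelling the scheduler schema expects -->
        <DataSource ID="Scheduler" TypeID="System!System.Discovery.Scheduler">
          <Scheduler>
            <SimpleReccuringSchedule>
              <Interval>$Config/Interval$</Interval>
              <SyncTime />
            </SimpleReccuringSchedule>
            <ExcludeDates />
          </Scheduler>
        </DataSource>
        <ConditionDetection ID="Mapper" TypeID="System!System.Discovery.ClassSnapshotDataMapper">
          <ClassId>$Config/ClassId$</ClassId>
          <InstanceSettings>$Config/InstanceSettings$</InstanceSettings>
        </ConditionDetection>
      </MemberModules>
      <!-- Data flows from the inner node (Scheduler) to the outer node (Mapper) -->
      <Composition>
        <Node ID="Mapper">
          <Node ID="Scheduler" />
        </Node>
      </Composition>
    </Composite>
  </ModuleImplementation>
  <OutputType>System!System.Discovery.Data</OutputType>
</DataSourceModuleType>
```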
Discovery Data Source: MyApp.Monitoring.DataSource.DiscoverService
This data source uses the MyApp.Monitoring.DataSource.DiscoverObject data source that we just created, but supports overridable inputs for the MinRunning and MaxRunning service class properties, by embedding the $Config/$ variables in the Instance Settings definition.
Configuration Parameters:
- Interval (integer): Scheduler interval in seconds – overridable
- TargetSystem (string): UNIX/Linux agent computer to execute the discovery
- AppName (string): The name of the application object (which is the key property for the hosting class instance)
- ServiceName (string): The name of the service to discover
- MinProcesses (integer): The minimum threshold of running processes expected (maps to the MinRunning class property) – overridable
- MaxProcesses (integer): The maximum threshold of running processes expected (maps to the MaxRunning class property) – overridable
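To make the "overridable" markers concrete, here is a sketch of how these parameters might be declared on the data source module type. It is a fragment under assumptions: the element names MinProcesses and MaxProcesses follow the $Config/…$ references used in the module configuration in this section, and the xsd prefix is assumed to be declared on the ManagementPack root element.

```xml
<Configuration>
  <xsd:element name="Interval" type="xsd:integer" />
  <xsd:element name="TargetSystem" type="xsd:string" />
  <xsd:element name="AppName" type="xsd:string" />
  <xsd:element name="ServiceName" type="xsd:string" />
  <xsd:element name="MinProcesses" type="xsd:integer" />
  <xsd:element name="MaxProcesses" type="xsd:integer" />
</Configuration>
<!-- Only the parameters listed here can be changed through overrides -->
<OverrideableParameters>
  <OverrideableParameter ID="Interval" Selector="$Config/Interval$" ParameterType="int" />
  <OverrideableParameter ID="MinProcesses" Selector="$Config/MinProcesses$" ParameterType="int" />
  <OverrideableParameter ID="MaxProcesses" Selector="$Config/MaxProcesses$" ParameterType="int" />
</OverrideableParameters>
```

Because the thresholds flow from these parameters into the discovered class properties, an override on the discovery changes the values the monitor later reads from the class instance.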
Member Modules
This data source only has one member module: MyApp.Monitoring.DataSource.DiscoverObject, with the configuration:
<Interval>$Config/Interval$</Interval>
<ClassId>$MPElement[Name="MyApp.Monitoring.Service"]$</ClassId>
<InstanceSettings>
  <Settings>
    <Setting>
      <Name>$MPElement[Name='MyApp.Monitoring.MyApp']/Name$</Name>
      <Value>$Config/AppName$</Value>
    </Setting>
    <Setting>
      <Name>$MPElement[Name='MyApp.Monitoring.Service']/Name$</Name>
      <Value>$Config/ServiceName$</Value>
    </Setting>
    <Setting>
      <Name>$MPElement[Name='MyApp.Monitoring.Service']/MinRunning$</Name>
      <Value>$Config/MinProcesses$</Value>
    </Setting>
    <Setting>
      <Name>$MPElement[Name='MyApp.Monitoring.Service']/MaxRunning$</Name>
      <Value>$Config/MaxProcesses$</Value>
    </Setting>
    <Setting>
      <Name>$MPElement[Name='MicrosoftUnixLibrary!Microsoft.Unix.Computer']/PrincipalName$</Name>
      <Value>$Config/TargetSystem$</Value>
    </Setting>
    <Setting>
      <Name>$MPElement[Name='System!System.Entity']/DisplayName$</Name>
      <Value>$Config/ServiceName$</Value>
    </Setting>
  </Settings>
</InstanceSettings>
Note that the key properties for the hosting class instances (MyApp.Monitoring.MyApp, and Microsoft.Unix.Computer) are included in the Instance Settings so that the relationships can be mapped. With the data sources in place, the next step is to configure the Discovery Rule (one for each service to discover).
Discovery Rule: MyApp.Monitoring.Discovery.MyAppDService
This rule discovers the MyAppD daemon, using the data source just created. It is targeted to instances of the MyApp class, and provides the name of the service and process count thresholds.
Data Source Configuration:
<Interval>28800</Interval>
<TargetSystem>$Target/Host/Property[Type="MicrosoftUnixLibrary!Microsoft.Unix.Computer"]/PrincipalName$</TargetSystem>
<AppName>$Target/Property[Type="MyApp.Monitoring.MyApp"]/Name$</AppName>
<ServiceName>myappd</ServiceName>
<MinProcesses>1</MinProcesses>
<MaxProcesses>3</MaxProcesses>
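For context, the data source configuration above would sit inside a Discovery element along these lines. This is a sketch: the data source ID (DS) and the Enabled, ConfirmDelivery, Remotable, and Priority values are illustrative assumptions, not settings confirmed by the original management pack.

```xml
<Discovery ID="MyApp.Monitoring.Discovery.MyAppDService" Enabled="true"
           Target="MyApp.Monitoring.MyApp" ConfirmDelivery="false"
           Remotable="true" Priority="Normal">
  <Category>Discovery</Category>
  <!-- Declares which class and properties this rule is allowed to discover -->
  <DiscoveryTypes>
    <DiscoveryClass TypeID="MyApp.Monitoring.Service">
      <Property TypeID="MyApp.Monitoring.Service" PropertyID="Name" />
      <Property TypeID="MyApp.Monitoring.Service" PropertyID="MinRunning" />
      <Property TypeID="MyApp.Monitoring.Service" PropertyID="MaxRunning" />
    </DiscoveryClass>
  </DiscoveryTypes>
  <DataSource ID="DS" TypeID="MyApp.Monitoring.DataSource.DiscoverService">
    <Interval>28800</Interval>
    <TargetSystem>$Target/Host/Property[Type="MicrosoftUnixLibrary!Microsoft.Unix.Computer"]/PrincipalName$</TargetSystem>
    <AppName>$Target/Property[Type="MyApp.Monitoring.MyApp"]/Name$</AppName>
    <ServiceName>myappd</ServiceName>
    <MinProcesses>1</MinProcesses>
    <MaxProcesses>3</MaxProcesses>
  </DataSource>
</Discovery>
```

A second service would get its own Discovery element with a different ID, ServiceName, and thresholds, reusing the same data source type.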
Once the MP is imported and the discovery has run, we can see the discovered service in the Discovered Inventory view in the Ops Console.
Building the Monitor Type
In this example, we are going to monitor the process/daemon and generate an alert if the count of running instances is outside of the threshold range. To do this, we can use the Shell Command monitoring data source to execute a ps h -e | grep <process name> | grep -v grep | wc -l command string. This calls the ps command, with switches to suppress the header row and show all processes. The first grep finds the process name, and the second excludes the grep process itself. Finally, the results are piped to wc -l to return a line count. The result of this command string is that the running count of processes matching the process name is returned as a numeric value.
Monitor Type: MyApp.Monitoring.MonitorType.ProcessCount
This monitor type will use the MyApp.Monitoring.DataSource.ShellCommandMonitoring data source, to call the ps command string described above. A set of condition detection modules are used to determine if the process count is below or above the minimum and maximum thresholds.
Configuration Parameters
- Interval (integer): Scheduler interval in seconds – overridable
- TargetSystem (string): UNIX/Linux agent computer to monitor
- ProcessName (string): The name of the service process to monitor
- MinCount (integer): The minimum threshold of running processes expected
- MaxCount (integer): The maximum threshold of running processes expected
Health States
- ProcessCountOK
- ProcessCountNotOK
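The health states and configuration parameters listed above might be declared at the top of the monitor type roughly as shown here. This is a fragment under assumptions: the parameter element names (ProcessName, MinCount, MaxCount) follow the $Config/…$ references used in the member module configuration in this section, the Accessibility value is illustrative, and the xsd prefix is assumed to be declared on the ManagementPack root element.

```xml
<UnitMonitorType ID="MyApp.Monitoring.MonitorType.ProcessCount" Accessibility="Internal">
  <MonitorTypeStates>
    <MonitorTypeState ID="ProcessCountOK" NoDetection="false" />
    <MonitorTypeState ID="ProcessCountNotOK" NoDetection="false" />
  </MonitorTypeStates>
  <Configuration>
    <xsd:element name="Interval" type="xsd:integer" />
    <xsd:element name="TargetSystem" type="xsd:string" />
    <xsd:element name="ProcessName" type="xsd:string" />
    <xsd:element name="MinCount" type="xsd:integer" />
    <xsd:element name="MaxCount" type="xsd:integer" />
  </Configuration>
  <OverrideableParameters>
    <OverrideableParameter ID="Interval" Selector="$Config/Interval$" ParameterType="int" />
  </OverrideableParameters>
  <!-- MonitorImplementation (member modules and regular detections) follows here -->
</UnitMonitorType>
```

The thresholds are deliberately not overridable on the monitor type in this sketch, since they are supplied from the discovered class properties.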
Member Modules
The data source for this monitor type is MyApp.Monitoring.DataSource.ShellCommandMonitoring, with the configuration:
<Interval>$Config/Interval$</Interval>
<TargetSystem>$Config/TargetSystem$</TargetSystem>
<ShellCommand>ps h -e | grep $Config/ProcessName$ | grep -v grep | wc -l</ShellCommand>
<Timeout>120</Timeout>
Two condition detection modules are required to detect the OK and NotOK states, comparing the returned StdOut to the thresholds:
<ConditionDetection ID="CDProcessCountOK" TypeID="System!System.ExpressionFilter">
  <Expression>
    <And>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="Double">//*[local-name()="StdOut"]</XPathQuery>
          </ValueExpression>
          <Operator>GreaterEqual</Operator>
          <ValueExpression>
            <Value Type="Double">$Config/MinCount$</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="Double">//*[local-name()="StdOut"]</XPathQuery>
          </ValueExpression>
          <Operator>LessEqual</Operator>
          <ValueExpression>
            <Value Type="Double">$Config/MaxCount$</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
    </And>
  </Expression>
</ConditionDetection>
And
<ConditionDetection ID="CDProcessCountNotOK" TypeID="System!System.ExpressionFilter">
  <Expression>
    <Or>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="Double">//*[local-name()="StdOut"]</XPathQuery>
          </ValueExpression>
          <Operator>Greater</Operator>
          <ValueExpression>
            <Value Type="Double">$Config/MaxCount$</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="Double">//*[local-name()="StdOut"]</XPathQuery>
          </ValueExpression>
          <Operator>Less</Operator>
          <ValueExpression>
            <Value Type="Double">$Config/MinCount$</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
    </Or>
  </Expression>
</ConditionDetection>
Regular Detections
The regular detections just need to be configured to map the Condition Detection member modules to the defined Health States:
<RegularDetections>
  <RegularDetection MonitorTypeStateID="ProcessCountOK">
    <Node ID="CDProcessCountOK">
      <Node ID="DS1"/>
    </Node>
  </RegularDetection>
  <RegularDetection MonitorTypeStateID="ProcessCountNotOK">
    <Node ID="CDProcessCountNotOK">
      <Node ID="DS1"/>
    </Node>
  </RegularDetection>
</RegularDetections>
In summary, this monitor type will execute our ps command, for a given service name, and compare the count of running/matched process instances to the defined thresholds (defined during service discovery).
Building the Custom Process Monitor
In this step, we will create a monitor of the type that we just created, MyApp.Monitoring.MonitorType.ProcessCount, with the configuration:
<Configuration>
  <Interval>180</Interval>
  <TargetSystem>$Target/Host/Host/Property[Type="MicrosoftUnixLibrary!Microsoft.Unix.Computer"]/NetworkName$</TargetSystem>
  <ProcessName>$Target/Property[Type="MyApp.Monitoring.Service"]/Name$</ProcessName>
  <MinCount>$Target/Property[Type="MyApp.Monitoring.Service"]/MinRunning$</MinCount>
  <MaxCount>$Target/Property[Type="MyApp.Monitoring.Service"]/MaxRunning$</MaxCount>
</Configuration>
The MyApp Service: $Target/Property[Type="MyApp.Monitoring.Service"]/Name$, currently has too many or too few processes running. Current process count: $Data/Context///*[local-name()="StdOut"]$. Expected range: $Target/Property[Type="MyApp.Monitoring.Service"]/MinRunning$ – $Target/Property[Type="MyApp.Monitoring.Service"]/MaxRunning$.
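The configuration and alert text above might be wrapped in a UnitMonitor element roughly like this. It is a sketch under several assumptions: the monitor ID, the alert message string resource ID, the Health alias (for System.Health.Library), the parent aggregate monitor, and the severity, priority, and state mappings are all illustrative choices, not values taken from the original management pack.

```xml
<UnitMonitor ID="MyApp.Monitoring.Monitor.MyAppDProcessCount" Accessibility="Internal"
             Enabled="true" Target="MyApp.Monitoring.Service"
             ParentMonitorID="Health!System.Health.AvailabilityState"
             Remotable="true" Priority="Normal"
             TypeID="MyApp.Monitoring.MonitorType.ProcessCount" ConfirmDelivery="false">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="MyApp.Monitoring.Monitor.MyAppDProcessCount.AlertMessage">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <!-- Substituted into the alert description by position -->
    <AlertParameters>
      <AlertParameter1>$Target/Property[Type="MyApp.Monitoring.Service"]/Name$</AlertParameter1>
      <AlertParameter2>$Data/Context///*[local-name()="StdOut"]$</AlertParameter2>
      <AlertParameter3>$Target/Property[Type="MyApp.Monitoring.Service"]/MinRunning$</AlertParameter3>
      <AlertParameter4>$Target/Property[Type="MyApp.Monitoring.Service"]/MaxRunning$</AlertParameter4>
    </AlertParameters>
  </AlertSettings>
  <!-- Map the monitor type states to health states -->
  <OperationalStates>
    <OperationalState ID="OK" MonitorTypeStateID="ProcessCountOK" HealthState="Success" />
    <OperationalState ID="NotOK" MonitorTypeStateID="ProcessCountNotOK" HealthState="Error" />
  </OperationalStates>
  <Configuration>
    <Interval>180</Interval>
    <TargetSystem>$Target/Host/Host/Property[Type="MicrosoftUnixLibrary!Microsoft.Unix.Computer"]/NetworkName$</TargetSystem>
    <ProcessName>$Target/Property[Type="MyApp.Monitoring.Service"]/Name$</ProcessName>
    <MinCount>$Target/Property[Type="MyApp.Monitoring.Service"]/MinRunning$</MinCount>
    <MaxCount>$Target/Property[Type="MyApp.Monitoring.Service"]/MaxRunning$</MaxCount>
  </Configuration>
</UnitMonitor>
```

In raw management pack XML, the alert description is stored as a display string that refers to these parameters by position ({0}, {1}, and so on), with the message ID also declared as a string resource.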