OpsMgr 2012 UNIX/Linux Authoring Templates: Process Monitoring

In Operations Manager, custom rules and monitors can be used to extensively build on the out-of-the-box Management Pack contents. Unfortunately, this kind of custom authoring for UNIX/Linux monitoring carried a steep learning curve with OpsMgr 2007 R2. However, the 2012 release of Operations Manager has some new features to enable many common UNIX/Linux authoring scenarios using templates, directly from the console.  The first of these new templates I wanted to cover is the new process monitoring template.

UNIX/Linux Process Monitoring Template

Operations Manager 2007 R2 included the Unix Service Monitoring template for custom monitoring of daemons on UNIX and Linux agents. This template has been replaced in the System Center 2012 release of Operations Manager with the far more capable UNIX/Linux Process Monitoring template. The new template allows more flexibility in process/daemon monitoring, including the ability to monitor for minimum and maximum process count thresholds, and the ability to filter processes on arguments in addition to the process name. For this example, I will walk through the use of the UNIX/Linux Process Monitoring template to monitor a Tomcat daemon.

The UNIX/Linux Process Monitoring template is accessible in the Authoring Pane of the Operations Console.   It can be launched with the “Add Monitoring Wizard” task under the Management Pack Templates view.


OpsMgr: UNIX/Linux Heartbeat Failures After Applying KB2585542

The OpsMgr UNIX/Linux monitoring team at Microsoft is currently investigating an issue that results in heartbeat failures on Operations Manager UNIX/Linux agents after the security update KB2585542 is applied to a Management Server or Gateway.  This update fixes a vulnerability in SSL/TLS1.0, but appears to cause WS-Management connections to UNIX/Linux agents to fail. 

The vulnerability is described in bulletin MS12-006, and more information can be found in the KB article. While we continue to investigate options for resolving this issue, there are two viable workarounds (which must be applied to all Management Servers and Gateways that manage UNIX/Linux agents):

  1. Uninstall the update KB2585542 
  2. Make a registry modification to disable the SecureChannel changes implemented in the update

Note: the registry modification described here and in the KB article effectively disables the security fix that the update implements, so the modified system is subject to the same vulnerability as an unpatched system.

Modifying the registry to disable the SecureChannel changes:

  • A “FixIt” package is available in the KB article under the Known Issues section that can be used to disable the security update
  • Alternatively, you can add the following 32-bit DWORD value under the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL:
    SendExtraRecord = 2

These changes take effect immediately and do not require a reboot.

Operations Manager UNIX/Linux Agent Certificates (and using a PKI)

Introduction

UNIX and Linux agent monitoring in Operations Manager requires certificates to secure the SSL communication channel between the Management Servers and agents.  In this post, I will provide some background information on this communication and the certificates, as well as describe an optional approach to replace the default Operations Manager certificate infrastructure with your organization’s Public Key Infrastructure.

The Protocols

The Operations Manager UNIX/Linux agent is a very lightweight agent implementation, comprising a CIM Object Manager (OpenPegasus) and CIM Providers.   Unlike Operations Manager Windows agents, the UNIX/Linux agent doesn’t have a health service, and doesn’t run workflows locally.  Rather, the Management Server (or Gateway) that manages the agent runs the workflows and remotely connects to the UNIX/Linux agent to retrieve current data.  

There are two protocols involved in the communication between the Management Server and the UNIX/Linux agent:  ssh and WS-Management.   

Ssh is used purely for agent maintenance activities, and is not used for any monitoring.   Operations like agent installation, uninstallation, upgrade, or agent daemon restart (through a recovery task) are executed over ssh.    Ssh facilitates the transfer of files and execution of remote commands for these operations when the agent daemon is unavailable.  

WS-Management (or WSMan) is the core protocol used in UNIX/Linux monitoring. WSMan is a SOAP-based protocol for cross-platform management. All monitoring operations (e.g. enumerating CIM providers for data on file systems, memory, etc., executing commands or scripts, and reading log files) are implemented over WSMan. As WSMan is a web service protocol, the OpenPegasus-based CIMOM functions as a secure web server (user credentials are authenticated through PAM). This is where the agent certificate comes into play.

The Certificate

The UNIX/Linux agent certificate is quite simply used to secure the WSMan connection using SSL and provide authentication for the remote agent host.   The requirements for this certificate are:

  • The certificate is a server authentication certificate (Enhanced Key Usage: 1.3.6.1.5.5.7.3.1)
  • The CN of the certificate matches the FQDN that the Management Server uses to connect to the agent
  • The certificate is signed by a trusted authority (and can be checked for revocation)

When the Operations Manager UNIX/Linux agent is installed, it generates a certificate (using openssl) in the directory /etc/opt/microsoft/scx/ssl. The file name of the certificate is scx-host-<hostname>.pem and the corresponding private key is named scx-key.pem. The agent actually looks for the certificate at /etc/opt/microsoft/scx/ssl/scx.pem, which is initially configured as a symbolic link pointing to scx-host-<hostname>.pem.

Upon initial agent installation, the certificate is not signed, and is not usable for securing the WSMan SSL communication.

Note:  when initially creating the certificate, the agent attempts to determine the agent hostname for use as the CN value of the certificate.   In cases where the DNS name known to the local host does not match the FQDN that OpsMgr will use to communicate with the agent, additional steps are required to establish a valid certificate.  More information can be found here: http://technet.microsoft.com/en-us/library/dd891009.aspx
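To see which CN the agent certificate actually carries, it can be inspected with openssl on the agent host. The following is a minimal sketch rather than part of the agent tooling: the function names cn_from_subject and check_agent_cert are hypothetical, the default path assumes the scx.pem symbolic link described above, and the parsing assumes a CN without embedded commas.

```shell
#!/bin/sh
# Hypothetical helper: read an "openssl x509 -subject" line on stdin and
# print the CN value. Assumes RFC2253-style output (CN=...,O=...).
cn_from_subject() {
    sed -n 's/.*CN=\([^,]*\).*/\1/p'
}

# Hypothetical helper: compare the certificate CN with the FQDN that the
# Management Server will use to reach the agent.
# $1 = expected FQDN, $2 = certificate path (optional)
check_agent_cert() {
    expected_fqdn=$1
    cert=${2:-/etc/opt/microsoft/scx/ssl/scx.pem}
    actual_cn=$(openssl x509 -noout -subject -nameopt RFC2253 -in "$cert" | cn_from_subject)
    if [ "$actual_cn" = "$expected_fqdn" ]; then
        echo "OK: CN matches $expected_fqdn"
    else
        echo "MISMATCH: certificate CN is '$actual_cn', expected '$expected_fqdn'"
        return 1
    fi
}
```

Running check_agent_cert with the FQDN used in discovery (for example, a placeholder such as agent01.example.com) reports whether the additional steps in the linked article are likely to be needed.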

Certificates and Management Servers

When a Management Server discovers a UNIX or Linux agent, the server uses its certificate to sign the agent certificate, acting like a standalone Certificate Authority.  In the discovery process, this actually involves securely transferring the certificate from the agent to the Management Server, signing it, copying it back to the agent, and restarting the agent daemon.  

In order to move an agent between Management Servers, the new Management Server must trust the certificate that was used to sign the agent’s certificate.  This becomes particularly important in the 2012 version of Operations Manager, where agents will move automatically between the Management Servers that are members of the Resource Pool managing the agent.  For more information on the procedure to trust a server’s certificate from another server, review this document: http://technet.microsoft.com/en-us/library/hh287152.aspx.

Using a PKI Instead of Management Servers for Signing

Because the certificates used for securing the agent SSL channel are not proprietary, a separate Public Key Infrastructure can be used to manage the agent certificates, if the PKI option is appealing for your organization.  While this requires some additional resources in the environment (a Certificate Authority) and customization, there are a few benefits to using a PKI: 

  • Certificate policies are controlled by the PKI and customizable to meet your organization’s security requirements
  • Migrations of agents between Management Servers (within or between Resource Pools) can be done without exporting/importing Management Server certificates – simplifying the provisioning of Management Servers.
  • More options exist for automation of agent deployment and certificate signing

The procedure to use a PKI instead of Management Server signed certificates varies with different requirements and environments, but I will describe the steps required for one example approach.  This example assumes that the Certificate Authority is a Windows 2008 Certificate Authority. 

Prerequisites:

  1. Configure the certificate template on the Certificate Authority – you can use the “Web Server” template or a copy of it – configure options and permissions, publish the template.
  2. Import the CA certificate from the signing CA to the trusted authorities list on every Management Server that will manage the UNIX/Linux agents:
    1. certutil -f -config "<CAHostname>\<CAName>" -ca.cert <CACertFile>
    2. certutil -addstore Root <CACertFile>

Per-Agent steps:

  1. Install the agent – this can be done through the OpsMgr Discovery Wizard, manually, or with another package distribution tool.  If you use the OpsMgr Discovery Wizard to install the agent, the agent will generate a certificate that is signed by the management server, but this can be replaced with your PKI CA signed certificate.
  2. Generate a certificate signing request (CSR) – either create a new private key with OpenSSL or use the private key generated during the agent install
    a. Command to generate a CSR using the key generated during agent install:
      openssl req -new -key /etc/opt/microsoft/scx/ssl/scx-key.pem -subj /CN=<FQDN of agent host> -text -out <OutputPath>
  3. Copy the CSR back to a Windows machine
  4. Submit the CSR to the CA – this command assumes auto-enrollment is enabled and authorized:
    certreq.exe -submit -config <CAHostName>\<CAName> -attrib "CertificateTemplate:<TemplateName>" <CSR FileName> <OutputCertName>
  5. Copy the signed cert back to the UNIX/Linux agent using a secure copy method. If auto-enrollment was used in step 4, the value for <OutputCertName> specifies the file name of the signed certificate to copy to the agent.
  6. Update the symbolic link /etc/opt/microsoft/scx/ssl/scx.pem to point to your new signed certificate
  7. Restart the agent: /opt/microsoft/scx/bin/tools/scxadmin -restart
  8. Discover the agent using the Operations Console or PowerShell Cmdlet
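
The agent-side parts of the procedure above (generating the CSR, repointing the symbolic link, and restarting the agent) can be sketched as a small shell script. This is an illustration under assumptions, not a tested deployment script: the function names make_csr and repoint_cert, the host name, and the file names in the example sequence are all hypothetical, and the commands are assumed to run as root on the agent host.

```shell
#!/bin/sh
SCX_SSL_DIR=/etc/opt/microsoft/scx/ssl

# Hypothetical helper: create a CSR from an existing private key.
# $1 = private key, $2 = FQDN to use as the CN, $3 = output CSR path
# (-text from the command above is omitted here so the output file
# contains only the PEM-encoded request.)
make_csr() {
    openssl req -new -key "$1" -subj "/CN=$2" -out "$3"
}

# Hypothetical helper: point the symbolic link at the signed certificate.
# $1 = link path (scx.pem), $2 = signed certificate file
# Uses rm + ln -s rather than ln -sfn, because -n behaves differently
# across UNIX platforms.
repoint_cert() {
    [ -f "$2" ] || { echo "certificate not found: $2" >&2; return 1; }
    rm -f "$1" && ln -s "$2" "$1"
}

# Example sequence (commented out; names are placeholders):
#   make_csr "$SCX_SSL_DIR/scx-key.pem" agent01.example.com /tmp/agent01.csr
#   ... sign the CSR on the CA, copy the result back as scx-signed.pem ...
#   repoint_cert "$SCX_SSL_DIR/scx.pem" "$SCX_SSL_DIR/scx-signed.pem"
#   /opt/microsoft/scx/bin/tools/scxadmin -restart
```

Keeping the existing scx-host-<hostname>.pem file in place and only changing the link target makes it easy to fall back to the original certificate if the new one is rejected.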

Automation and Customization Opportunities

All of the per-agent steps described above can be executed from a command line, meaning that this procedure can be automated through scripting.  Using a script on a Windows server, the UNIX/Linux commands and file copying actions can be executed with SSH utilities like PuTTY’s plink and pscp.  For really robust automation capabilities, all of the steps can be implemented in a PowerShell script – I like the plink.exe integration example described on this blog: http://www.christowles.com/2011/06/how-to-ssh-from-powershell-using.html.

Aside from the primary benefits of automating these steps in terms of reducing manual interactions, other customization opportunities are exposed with using this scripting approach.  For example, if your DNS infrastructure and UNIX/Linux agent hostnames don’t neatly correlate, you could modify step 2 of the per-agent steps to also generate a new certificate with openssl using the desired FQDN as the certificate’s CN (http://technet.microsoft.com/en-us/library/dd891009.aspx).  Alternatively, if you are using Operations Manager 2007 R2 and want to implement agent deployment and certificate signing using sudo elevation instead of root credentials, the UNIX/Linux host commands in the per-agent steps could be prepended with the sudo command (this functionality is built into the 2012 version of Operations Manager).


Operations Manager Cross-Platform Authoring: Invoke Action Monitor

When monitoring UNIX/Linux servers, command execution or script-based monitors can provide a great deal of flexibility in many health-checking applications.   The Operations Manager 2007 R2 cross-platform agent facilitates the execution of shell command lines or executable binaries and scripts with the Microsoft.Unix.WSMan.Invoke.Probe module.   In this post, I will walk through the use of this module in an example monitoring scenario:  monitoring UNIX/Linux systems for the count of defunct/zombie processes.   The management pack described in this post can be downloaded here.

Background

The Microsoft.Unix.WSMan.Invoke.Probe is a nicely wrapped implementation of the module Microsoft.SystemCenter.WSManagement.TimedInvoker from the Microsoft.SystemCenter.WsManagement.Library management pack.  The Microsoft.Unix.WSMan.Invoke.Probe facilitates the execution of commands or processes on the agent with two Invoke Actions:  ExecuteCommand and ExecuteShellCommand.   The ExecuteCommand Invoke Action executes a script or binary executable (along with command line parameters), whereas the ExecuteShellCommand executes a command string in a shell environment.   While similar, a key functional difference between the two is that the ExecuteShellCommand Invoke Action supports command-line pipe operations while the ExecuteCommand Invoke Action does not.   So, any output filtering with awk, sed, or grep (for example) will require the use of the ExecuteShellCommand Invoke Action.  An example of using the ExecuteCommand Invoke Action in a discovery and monitor can be found in the Cross Platform MP Authoring Guide.   However, one advantage of using ‘one-liner’ commands with the ExecuteShellCommand Invoke Action in monitoring scenarios as opposed to calling local scripts with ExecuteCommand is that the need to distribute and maintain scripts to agents is eliminated and the monitoring script is thus embedded in the MP to be managed centrally. 

As for monitoring the defunct process count, the UNIX ps command can easily be used to identify defunct/zombie processes. With some output manipulation by grep and awk, the command string can be configured to return just the number of defunct processes to StdOut: ps -eo 's' | grep Z | awk 'END{print NR}'
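
The same count can also be produced with awk alone, which is convenient when testing the logic outside of Operations Manager. The following is a minimal sketch, assuming a ps implementation that accepts -eo s and prints the state code in the first column; the function names count_defunct and defunct_process_count are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin whose first field starts with "Z"
# (the state code for a defunct/zombie process). Always prints a number,
# including 0 when there are no matches.
count_defunct() {
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Feed it the state column from ps; the header line ("S") is not counted.
defunct_process_count() {
    ps -eo s | count_defunct
}
```

Because count_defunct reads from stdin, it can be exercised with canned input, for example printf 'S\nZ\nR\n' | count_defunct, before the equivalent one-liner is embedded in a management pack.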

Turning this command into a functional monitor then just requires a data source to execute the InvokeAction, a monitor type to define the condition detections and health states, and a unit monitor.

Walk Through
