OpsMgr UNIX/Linux – New Update Rollups Released

Update Rollup 5 for System Center 2012 SP1 and Update Rollup 1 for System Center 2012 R2 were released today with important Operations Manager UNIX/Linux monitoring fixes. The UNIX and Linux Management Packs (and updated agents) can be found here.

The fixes for UNIX and Linux include:

System Center 2012 R2 – Operations Manager: Update Rollup 1

Issue 1

On a Solaris-based computer, an error message that resembles the following is logged in the Operations Manager log. This issue occurs if a Solaris-based computer that has many monitored resources runs out of file descriptors and stops monitoring the resources. Monitored resources may include file systems, physical disks, and network adapters.

Note: The Operations Manager log is located at /var/opt/microsoft/scx/log/scx.log.

errno = 24 (Too many open files)
This issue occurs because the default user limit on Solaris is too low to allocate a sufficient number of file descriptors. After the update rollup is installed, the updated agent overrides the default by setting a file descriptor limit of 1,024 for the agent process.
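To check whether an agent has been running into this limit before the rollup is applied, something like the following can help. It is a minimal sketch for a POSIX shell on the monitored host: count_emfile_errors is a hypothetical helper name, and the log path defaults to the one given in the note above. Keep in mind that ulimit -n reports the limit of the current shell, not necessarily that of the running agent process.

```sh
#!/bin/sh
# Hypothetical helper: count agent log lines that report errno 24
# ("Too many open files"). Defaults to the SCX agent log path from the KB note.
count_emfile_errors() {
    log="${1:-/var/opt/microsoft/scx/log/scx.log}"
    # Treat a missing or unreadable log as "no errors found"
    if [ ! -r "$log" ]; then
        echo 0
        return 0
    fi
    # grep -c prints the number of matching lines (0 when there are none);
    # "|| true" keeps the non-zero exit status of a no-match grep from leaking out
    grep -c 'errno = 24' "$log" || true
}

echo "errno 24 entries in agent log: $(count_emfile_errors)"
# Soft limit on open files for this shell (not necessarily the agent process)
echo "open-file soft limit for this shell: $(ulimit -n)"
```

A non-zero count on a host that is still on the old agent is a good hint that it is affected by this issue.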

Issue 2

If Linux Container (cgroup) entries in the /etc/mtab file on a monitored Linux-based computer begin with the “cgroup” string, a warning that resembles the following is logged in the agent log.

Note: When this issue occurs, some physical disks may not be discovered as expected.

Warning [scx.core.common.pal.system.disk.diskdepend:418:29352:139684846989056] Did not find key ‘cgroup’ in proc_disk_stats map, device name was ‘cgroup’.
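To see whether a host has entries that would trigger this warning, you can list the mtab entries whose device field begins with "cgroup". A minimal sketch, assuming awk is available; list_cgroup_mounts is a hypothetical name, and the file argument exists only so the function can be pointed at a copy of /etc/mtab:

```sh
#!/bin/sh
# Hypothetical helper: print device and mount point for every mtab entry
# whose first field (the device name) begins with "cgroup".
list_cgroup_mounts() {
    mtab="${1:-/etc/mtab}"
    # Fail (return 1) if the file cannot be read
    [ -r "$mtab" ] || return 1
    awk '$1 ~ /^cgroup/ { print $1, $2 }' "$mtab"
}

list_cgroup_mounts
```

Any output on a monitored Linux computer suggests the agent may log the warning above until the update is installed.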
Issue 3

Physical disk configurations that cannot be monitored, or failures in physical disk monitoring, cause failures in system monitoring on UNIX and Linux computers. When this issue occurs, logical disk instances are not discovered by Operations Manager for a monitored UNIX-based or Linux-based computer.

Issue 4

A monitored Solaris zone that is configured to use dynamic CPU allocation with dynamic resource pools may log errors in the agent logs as CPUs are removed from the zone, and the agent may not identify the CPUs that are currently in the system. In rare cases, the agent on a Solaris zone with dynamic CPU allocation may hang during routine monitoring.

Note: This issue applies to any monitored Solaris zones that are configured to use dynamic resource pools and a “dedicated-cpu” configuration that involves a range of CPUs.

Issue 5

An error that resembles the following is generated on Solaris 9-based computers when the /opt/microsoft/scx/bin/tools/setup.sh script does not set the library path correctly. When this issue occurs, the omicli tool cannot run.

ld.so.1: omicli: fatal: libssl.so.0.9.7: open failed: No such file or directory
Issue 6

If the agent does not retrieve process arguments from the getargs subroutine on an AIX-based computer, the monitored daemons may be reported incorrectly as offline. An error message that resembles the following is logged in the agent log:

Calling getargs() returned an error
Issue 7

The agent on AIX-based computers considers all file cache to be available memory and does not treat minperm cache as used memory. After this update rollup is installed, available memory on AIX-based computers is calculated as: free memory + (cache - minperm).
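As a worked example of the formula (not the agent's actual code), here it is as shell arithmetic. aix_available_memory is a hypothetical name; all three inputs must be in the same unit, such as 4 KB pages, and how you collect them is left open. With 1,000 free pages, 5,000 pages of file cache, and a minperm of 1,500 pages, the old calculation (free + cache) gives 6,000 pages available, while the updated one gives 4,500.

```sh
#!/bin/sh
# Hypothetical helper applying the post-rollup formula:
#   available = free + (cache - minperm)
# Arguments: free, cache, minperm, all in the same unit (for example, pages).
aix_available_memory() {
    free=$1
    cache=$2
    minperm=$3
    echo $(( free + (cache - minperm) ))
}

aix_available_memory 1000 5000 1500   # prints 4500
```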

Issue 8

The Universal Linux agent fails to install on Linux computers that have OpenSSL versions greater than 1.0.0 if the library file libssl.so.1.0.0 does not exist. An error message that resembles the following is logged:

/opt/microsoft/scx/bin/tools/.scxsslconfig: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory
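To check in advance whether a Linux host has this library file, a minimal sketch like the one below looks for it in a handful of directories. find_libssl_100 is a hypothetical helper, and the default directory list is an assumption that will not cover every distribution (multiarch paths such as /usr/lib/x86_64-linux-gnu, for example). On glibc-based systems, ldconfig -p | grep libssl.so.1.0.0 is another way to ask the dynamic linker cache.

```sh
#!/bin/sh
# Hypothetical helper: print the path of the first libssl.so.1.0.0 found in
# the given directories and return 0, or return 1 if none of them has it.
# With no arguments, a default (assumed, not exhaustive) list is searched.
find_libssl_100() {
    [ $# -gt 0 ] || set -- /lib /lib64 /usr/lib /usr/lib64
    for dir in "$@"; do
        if [ -e "$dir/libssl.so.1.0.0" ]; then
            echo "$dir/libssl.so.1.0.0"
            return 0
        fi
    done
    return 1
}

if libpath=$(find_libssl_100); then
    echo "found: $libpath"
else
    echo "libssl.so.1.0.0 not found in the default directories"
fi
```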
System Center 2012 SP1 – Operations Manager: Update Rollup 5

Issue 1

In rare cases, the agent on a Solaris Zone that is configured to use dynamic CPU allocation with Dynamic Resource Pools may hang during routine monitoring.

Note: This issue can occur on any monitored Solaris Zone that is configured to use Dynamic Resource Pools and a “dedicated-cpu” configuration that involves a range of CPUs.

Issue 2

If the agent fails to retrieve process arguments from the getargs subroutine on an AIX-based computer, the monitored daemons may be incorrectly reported as offline. An error message that resembles the following is logged in the agent log:

Calling getargs() returned an error

Issue 3

The agent on AIX-based computers considers all file cache to be available memory and does not treat minperm cache as used memory. After this update rollup is installed, available memory on AIX-based computers is calculated as: free memory + (cache - minperm).

Issue 4

The Universal Linux agent fails to install on Linux computers with OpenSSL versions greater than 1.0.0 if the library file libssl.so.1.0.0 does not exist. An error message that resembles the following is logged:

/opt/microsoft/scx/bin/tools/.scxsslconfig: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory
Issue 5

Spurious errors that are related to the InstanceID property of the SCX_Endpoint are logged in the System log every four hours.


System Center & UNIX/Linux – Recent Updates and Releases

October has been a month full of releases for System Center UNIX and Linux management features. In case you missed these, here is a recap:

System Center 2012 R2

System Center 2012 SP1

  • Update Rollup 4 is now available. UR4 for Operations Manager has several fixes for UNIX/Linux monitoring, and adds support for Debian GNU/Linux 7. The UNIX/Linux Management Packs and agents for UR4 are available here.

System Center Management Packs for Java EE

  • The Java EE MPs have been updated to add support for IBM WebSphere 8.x and Oracle WebLogic 12c.  Support for these new app server versions requires the updated UR4 agent for discovery on UNIX/Linux computers.

UNIX/Linux MP Authoring – Discovering and Monitoring Failover Clusters

In my last post, I walked through the creation of an MP with dynamic discovery for a UNIX/Linux application. In this post, I’ll continue to demonstrate the use of the UNIX/Linux Authoring Library examples for MP authoring, but I will take the demonstration quite a bit deeper into authoring territory by creating an MP for monitoring Linux failover clusters. While the base UNIX and Linux operating system Management Packs don’t have built-in detection or monitoring of failover clusters, MP authoring can be used to build a robust cluster monitoring solution.

In this post, I will walk through the authoring of a basic MP for monitoring a Linux failover cluster. I have two goals for this post:

  1. Demonstrate the use of the UNIX/Linux Authoring Library for MP Authoring scenarios
  2. Demonstrate the creation of a basic cluster monitoring MP that can be modified to work with other cluster technologies and monitoring requirements

This is fairly involved MP authoring, and is intended for authors with a bit of experience.

Note:  the example MP described in this blog post (and the VSAE project) can be found in the \ExampleMPs folder of the UNIX/Linux Authoring Library .zip file.

Background

The MP I am building is intended to perform discovery and monitoring of Linux failover clusters, though it could certainly be adapted to work for other cluster technologies.  Prior to starting on the MP implementation, I think it is useful to conceptually model the implementation.

Regardless of the specific technology, failover clusters tend to have the same general concepts.  Entities that I want to represent are:

  • Cluster nodes – hosts that participate in the failover cluster
    • Monitor for requisite daemons
    • Monitor for quorum state
  • Cluster – a “group,” containing the member nodes as well as clustered resources
    • Roll-up monitors describing the total state of the cluster
  • Service – a clustered service, such as a virtual IP or Web server
    • Monitor each service for availability

These conceptual elements will need to be described in Management Pack ClassTypes and corresponding RelationshipTypes.  A basic diagram of my intended implementation looks like:
[Image: MP diagram]

Tools and Commands

For both dynamic discovery of the cluster nodes, as well as monitoring of the cluster resource status, I leveraged the clustat utility.

As an example, the clustat output in my test environment, with two nodes and a single virtual IP address as a service, looks like:

[monuser@clnode1 ~]$ sudo clustat
Cluster Status for hacluster @ Tue Jul 23 19:24:47 2013
Member Status: Quorate
 Member Name                        ID   Status
 ------ ----                        ---- ------
 clnode1                               1 Online, Local, rgmanager
 clnode2                               2 Online, rgmanager
 Service Name              Owner (Last)              State
 ------- ----              ----- ------              -----
 service:IP                clnode1                   started

As you can see, the output here can be parsed and used in discovery of cluster nodes, the cluster, and services, while also providing health information about the cluster nodes, cluster, and services. Depending on the version, clustat may require privileges (e.g., sudo), but it has the advantage of being a status reporting tool that is not used in cluster configuration, so it is a useful command-line tool for monitoring purposes.
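To illustrate that parsing, here is a minimal sketch that turns clustat output in the format shown above into one record per line, which a discovery or monitoring script could then consume. It is not taken from the example MP: parse_clustat and its record format (cluster, quorum, node, service) are hypothetical, and it assumes the section headers and column order from my test environment, so other clustat versions may need adjustments. It reads from stdin, so it can be run as sudo clustat | parse_clustat or fed saved output.

```sh
#!/bin/sh
# Hypothetical helper: read clustat text output on stdin and print one record per line:
#   cluster <name>
#   quorum <member status>
#   node <name> <status...>
#   service <name> <owner> <state>
# Assumes the section headers and column order shown in the sample output.
parse_clustat() {
    awk '
        /^ *Cluster Status for / { print "cluster", $4; next }
        /^ *Member Status:/      { print "quorum", $3; next }
        /^ *Member Name/         { section = "node"; next }
        /^ *Service Name/        { section = "service"; next }
        /^ *-+( +-+)* *$/        { next }   # dashed underline rows
        NF == 0                  { next }   # blank lines
        section == "node" {
            # fields: name, numeric ID, then a comma-separated status list
            status = $3
            for (i = 4; i <= NF; i++) status = status " " $i
            print "node", $1, status
            next
        }
        section == "service" { print "service", $1, $2, $NF }
    '
}
```

Against the sample output above, it should print "cluster hacluster", "quorum Quorate", one node line each for clnode1 and clnode2 (with their status lists), and "service service:IP clnode1 started".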

Creating the MP

Getting Started

Just like with my previous example, starting off on this MP involves creating a new Management Pack project with the VSAE:

[Image: creating a new Management Pack project in VSAE]


UNIX/Linux MP Authoring – Dynamic Discovery of Roles & Applications

In a previous post, I demonstrated the use of Operations Manager’s “Shell Command Templates” to customize monitoring by creating a management pack for monitoring BIND DNS servers. The Shell Command Templates offer a great deal of customization potential through simple wizards, but one limitation is that the templates require the target class to already be defined. In other words, they cannot dynamically discover the presence of an application/role on monitored computers. One way to generate a custom class to represent an application or role is to create a UNIX/Linux Process Monitor with the MP Template and target the Process Monitor to a group containing the UNIX/Linux computers running this service. This will create a custom class representing the process, which can be used as a target for the Shell Command Templates. However, this still requires that the group members are explicitly defined, or dynamically populated based on properties such as the hostname. Such an approach is not a bad option at all, but still not quite dynamic discovery. However, with the UNIX/Linux Authoring Library examples, creating a dynamic discovery of an application or role is a fairly easy undertaking. In this post, I will revisit that prior BIND DNS monitoring walkthrough, but demonstrate how the UNIX/Linux Authoring Library can be used to dynamically discover the DNS servers.

Note:  the example MP described in this blog post (and the VSAE project) can be found in the \ExampleMPs folder of the UNIX/Linux Authoring Library .zip file.

Walkthrough:  Dynamic Discovery for UNIX/Linux, using the Authoring Library examples

I will use the Visual Studio Authoring Extensions for this project, so starting the project entails creating a new MP project:

[Image: new VSAE project]

Following the direction in the Getting Started documentation for the UNIX/Linux Authoring Library, I will then set up MP references and aliases. The suggested references in the Getting Started documentation are mostly sufficient, but I will add Microsoft.Linux.Library.mp (with an alias of Linux) as well in order to target discoveries to the Microsoft.Linux.Computer class.

Management Pack ID                              Alias
Microsoft.Linux.Library                         Linux
Microsoft.SystemCenter.DataWarehouse.Library    SCDW
Microsoft.SystemCenter.Library                  SC
Microsoft.SystemCenter.WSManagement.Library     WSM
Microsoft.Unix.Library                          Unix
Microsoft.Windows.Library                       Windows
System.Health.Library                           Health
System.Library                                  System
System.Performance.Library                      Perf
Unix.Authoring.Library                          UnixAuth

Discovering BIND DNS Servers

I’ll use a basic shell command to detect the installation of BIND on Linux computers in order to dynamically discover it.  This method could be used in multiple ways, such as:

  • Check for the presence of a file or directory (such as an installation directory, init script in /etc/init.d, or configuration file)
  • Check for an installed package with a package manager like rpm
  • Check for a running process with ps

In this case, I’ll look for a config file to indicate that BIND is installed on the computer. Depending on the UNIX/Linux distro and the version of the application, the artifacts on the system that indicate the installation of the application may vary. In my example, with BIND 9 on CentOS, I will look for the existence of /etc/named.conf to indicate that BIND is installed. The command I will use is ls /etc/named.conf | wc -l, which returns a value of 1 if the file exists, and 0 if it does not.
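A slightly more defensive variant of that check prints the same 1 or 0, but without the "No such file or directory" message that ls writes to stderr when the file is missing. bind_installed is a hypothetical name, and the path argument is only there so the check can be reused for other config locations; the commented alternatives correspond to the package and process checks listed above.

```sh
#!/bin/sh
# Hypothetical helper: print 1 if the BIND config file exists, 0 otherwise,
# matching the output contract of "ls /etc/named.conf | wc -l".
bind_installed() {
    conf="${1:-/etc/named.conf}"
    if [ -f "$conf" ]; then
        echo 1
    else
        echo 0
    fi
}

# Alternatives from the list above (not used here):
#   rpm -q bind >/dev/null 2>&1 && echo 1 || echo 0   # installed package (RPM-based distros)
#   ps -e | grep -c '[n]amed'                          # count of running named processes

bind_installed
```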

To add the dynamic discovery to the MP, I will add a new Empty Management Pack Fragment to the VS project, named Discoveries.mpx.

[Image: adding an Empty Management Pack Fragment]


Introducing the UNIX/Linux Authoring Library for SC 2012 – Operations Manager

Over at the TechNet Gallery, I’ve posted a new resource I am calling the UNIX/Linux Authoring Library MP. This is an example library MP project, which features numerous Probe/Write Action and Data Source composite modules and numerous Unit Monitor Types. These modules should cover many common UNIX/Linux authoring scenarios. The library MP is provided as a sealed MP, along with the VSAE source project.

Documentation and examples for using the Library can be found here.

Some of the key scenarios covered include:

The full reference for the library MP can be found on the documentation page here.

Happy authoring!

OpsMgr UNIX/Linux updates

Updates were released this week for the UNIX/Linux features of multiple versions of OpsMgr, with a good selection of fixes:

CU7 for SCOM 2007 R2 fixes the following issues for UNIX and Linux monitoring:

  • Logical disk performance statistics are not collected for some volume types on Solaris computers.
  • Some Network Adapters on HP-UX computers may not be discovered.
  • Network adapter performance statistics are not collected for HP-UX network adapters.
  • The Solaris 8 and 9 agent may not restart after an ungraceful shutdown.

The HP-UX Management Packs have also been updated for 2007 R2, and are available in the catalog or here.

For 2012 and 2012 SP1, fixes related to UNIX and Linux monitoring are delivered in MP updates. The fixes are described in the KB articles for the Update Rollup releases:

System Center 2012 – Operations Manager

  • When multiple process monitors are used to target the same computer or group, processes may incorrectly monitor some template instances.  Additionally, problems with the monitored processes may not be detected. This issue occurs when each process monitor uses the same name as the process even though different argument filters are used in each process monitor.
  • Logical disk performance statistics are not collected for certain kinds of volumes on Solaris-based computers.
  • Certain Network Adapters on HP-UX computers may not be discovered.
  • Network adapter performance statistics are not collected for HP-UX Network Adapters.
  • Logical disk performance statistics are not collected for certain kinds of device on Linux-based computers.
  • In Solaris operating systems, memory that is used for the ZFS Adaptive Replacement Cache is incorrectly considered used memory.

UR1 for System Center 2012 SP1 – Operations Manager

  • When multiple process monitors are used to target the same computer or group, processes may incorrectly monitor some template instances.  Additionally, problems with the monitored processes may not be detected. This issue occurs when each process monitor uses the same name as the process even though different argument filters are used in each process monitor.

For both 2012 and SP1, the UNIX/Linux MPs can be found here.

I’d also recommend checking out Bob Cornelisson’s notes on the updates:
http://www.bictt.com/blogs/bictt.php/2013/01/09/ur1-for-system-center-2012

http://www.bictt.com/blogs/bictt.php/2013/01/09/cu7-for-scom-2007-r2

Using PowerShell for automated UNIX/Linux Agent Discovery

PowerShell cmdlets for administration of UNIX/Linux agents were added in the System Center 2012 release of Operations Manager. There is good documentation available on the cmdlet use, but a basic discovery script might look something like this:

$HostName = "lnx-db-007.contoso.com"   # FQDN of the computer to discover
$SSHCredential = Get-SCXSSHCredential
$WSCredential = Get-Credential
$Pool = Get-SCOMResourcePool -DisplayName "All Management Servers Resource Pool"
$DiscResult = Invoke-SCXDiscovery -Name $HostName -ResourcePool $Pool -WsManCredential $WSCredential -SshCredential $SSHCredential
$DiscResult | Install-SCXAgent

In this example, the Invoke-SCXDiscovery cmdlet is provided the following parameters:

  • $Hostname – the FQDN of the computer to discover
  • $Pool – the Resource Pool object used to discover and manage the agent, from Get-SCOMResourcePool
  • $WSCredential – a PSCredential object used for WS-Management authentication, from Get-Credential
  • $SSHCredential – an SSH credential object used for SSH authentication, from Get-SCXSSHCredential

If you have tried a PowerShell discovery like this, you’ll know that both Get-Credential and Get-SCXSSHCredential prompt you for credential input and don’t allow specification of passwords as command-line arguments. This is for good reason, as plain-text scripts are a bad place to store passwords. However, this does limit your ability to truly automate UNIX/Linux agent discovery. Well, with a bit of extra scripting, you can supply credentials to a script in a fairly secure manner.

This article does a great job explaining how to securely write a password to a file, and then retrieve it from a script.  The steps to do this are:

  1. Logged in as the user that will run the script, create a credential object with:  $Credential = Get-Credential
  2. Write this as a secure string to a file:
    $credential.Password | ConvertFrom-SecureString | Set-Content $env:userprofile\password.txt

Now, this password can be read back into a script (but only if the script is run by the same user, on the same computer, that wrote the password to the file) by using the following scriptlet:

$wsmanuser = "monuser"
$wsmanpassword = Get-Content $env:userprofile\password.txt | ConvertTo-SecureString
$WSCredential = New-Object System.Management.Automation.PSCredential ($wsmanuser, $wsmanpassword)

Using this method, we can securely create the credential object to use as our WSManCredential value without a prompt, but Invoke-SCXDiscovery also needs an ssh Credential.  The ssh Credential is a bit more involved, but can be done in a similar fashion.

A function to create the ssh Credential object, using encrypted passwords stored in files is:

function Get-SCXSSHCredentialFromScript {
    [CmdletBinding()]
    param
    (
        [Parameter(Mandatory=$True)]
        [string]$UserName,
        [string]$PassphraseFile,
        [string]$SSHKeyFile,
        [string]$SuPasswordFile,
        [string]$ElevationType
    )

    process {
        # Credential set that is passed to Invoke-SCXDiscovery as -SshCredential
        $SSHCredential = New-Object Microsoft.SystemCenter.CrossPlatform.ClientLibrary.CredentialManagement.Core.CredentialSet

        # Primary PosixHost credential for the monitoring account
        $scred = New-Object Microsoft.SystemCenter.CrossPlatform.ClientLibrary.CredentialManagement.Core.PosixHostCredential
        $scred.Usage = 2
        $scred.PrincipalName = $UserName

        # Password (or key passphrase), decrypted from the file written with ConvertFrom-SecureString
        if ($PassphraseFile.Length -gt 0) {
            $scred.Passphrase = Get-Content $PassphraseFile | ConvertTo-SecureString
        }

        # Optional SSH private key
        if ($SSHKeyFile.Length -gt 0) {
            $scred.KeyFile = $SSHKeyFile
            Write-Host "Validating SSH Key"
            $scred.ReadAndValidateSshKey()
        }

        # Add the PosixHost credential to the credential set
        $SSHCredential.Add($scred)

        if ($ElevationType -eq "su") {
            $sucred = New-Object Microsoft.SystemCenter.CrossPlatform.ClientLibrary.CredentialManagement.Core.PosixHostCredential
            $sucred.Usage = 32 # su elevation
            $sucred.PrincipalName = "root"
            $sucred.Passphrase = Get-Content $SuPasswordFile | ConvertTo-SecureString
            $SSHCredential.Add($sucred)
        }

        if ($ElevationType -eq "sudo") {
            $sudocred = New-Object Microsoft.SystemCenter.CrossPlatform.ClientLibrary.CredentialManagement.Core.PosixHostCredential
            $sudocred.Usage = 16 # sudo elevation
            $SSHCredential.Add($sudocred)
        }

        return $SSHCredential
    }
}

With this function defined, it can be used like this:

PS C:\Users\Administrator> $SSHCredential=Get-SCXSSHCredentialFromScript -username:monuser -PassphraseFile:$env:userprofile\password.txt -ElevationType:sudo

PS C:\Users\Administrator> $SSHCredential
SshUserName      : monuser
SshElevationType : sudo
Credentials      : {, }
Count            : 2
IsSSHKey         : False
Usage            : 0

Of course, this needs to be run in an OpsMgr shell, or the script needs to be prefaced with:

Import-Module "C:\Program Files\System Center 2012\Operations Manager\Powershell\OperationsManager\OperationsManager"
New-SCOMManagementGroupConnection localhost

So, now we have snippets to create WSMan and ssh Credential objects, using an encrypted password stored in a file. Building on this, we can define a function to invoke UNIX/Linux discovery and install the agent:

function DiscoverSCXAgents {
    [CmdletBinding()]
    param
    (
        [Parameter(Mandatory=$True)]
        [string]$Hostname
    )
    # Encrypted password file written earlier with ConvertFrom-SecureString
    $PassphraseFile = "$env:userprofile\password.txt"

    # ssh credential (with sudo elevation) built from the encrypted file
    $SSHCredential = Get-SCXSSHCredentialFromScript -UserName:monuser -PassphraseFile:$PassphraseFile -ElevationType:sudo

    # WSMan credential for the same account, read from the same file
    $wsmanuser = "monuser"
    $wsmanpassword = Get-Content $PassphraseFile | ConvertTo-SecureString
    $WSCredential = New-Object System.Management.Automation.PSCredential ($wsmanuser, $wsmanpassword)

    $Pool = Get-SCOMResourcePool -DisplayName "All Management Servers Resource Pool"
    Write-Output "Attempting discovery of $Hostname"
    $DiscResult = Invoke-SCXDiscovery -Name $Hostname -ResourcePool $Pool -WsManCredential $WSCredential -SshCredential $SSHCredential
    $DiscResult | Format-List Succeeded, ErrorData
    $DiscResult | Install-SCXAgent
}

Testing out the new function:

DiscoverSCXAgents “lnx-db-007.contoso.com”

Attempting discovery of lnx-db-007.contoso.com

Name            AgentVersion  ManagementPackPlatformIdentifier  Id
----            ------------  --------------------------------  --
lnx-db-007.c… 1.4.0-906     Microsoft.Linux.SLES.11           c5944ea1-e4f4-1908-ea15-d5be6ba7d14e

And with that, we can use an Orchestrator runbook to call our discovery script and fully automate UNIX/Linux agent discovery.