Scalability and Performance (Design and Testing) in the xSNMP Management Packs
December 14, 2009
In this post, I intend to describe some of the challenges in scaling SNMP monitoring in an Operations Manager environment to a large number of monitored objects, as well as my experiences from testing and the approaches that I took to address these challenges with the xSNMP Management Packs.
In spite of the market availability of many task-specific SNMP monitoring applications boasting rich feature sets, I think that a strong case can be made for the use of System Center Operations Manager in this SNMP monitoring role. Using a single product for systems and infrastructure (SNMP) monitoring facilitates unparalleled monitoring integration (e.g. including critical network devices/interfaces or appliances in Distributed Application Models for vital business functions). The rich MP authoring implementation, dynamic discovery capabilities, and object-oriented modeling approach allow a level of depth and flexibility in SNMP monitoring not often found in pure SNMP monitoring tools.
However, Operations Manager is first and foremost a distributed monitoring application, most often depending on agents to run small workloads independently. Inevitably, running centralized monitoring workloads (i.e., SNMP polls) in a distributed monitoring application will carry a higher performance load than the same workloads in a task-specific centralized monitoring application built from the ground up to handle a very high number of concurrent polls with maximum efficiency. Such a centralized architecture would likely feature a single scheduler process that distributes the execution of polls in an optimized fashion, as well as common polling functions implemented in streamlined managed code. With SNMP monitoring in Operations Manager, optimizing both workload scheduling and code execution more or less falls to the MP author.
While working on the xSNMP Management Packs, I spent a lot of time testing different approaches to maximize efficiency (and thus scalability) in a centralized SNMP monitoring scenario. I’m sure there is always room for improvement, but I will try to highlight some of the key points of my experiences in this pursuit.
Designing for Cookdown
Cookdown is one of the most important concepts in MP authoring when considering the performance impact of workflows. A great summary of OpsMgr cookdown can be found here. In effect, the cookdown process looks for modules with identical configurations (including input parameters) and replaces the multiple executions of redundant modules with a single execution. So, if one wanted to monitor and collect historical data on the inbound and outbound percent utilization and Mb/s throughput of an SNMP network interface, a scheduler and an SNMP Probe (with VarBinds defined to retrieve the in and out octets counters for the interface) could be configured. As long as each of the rules and monitors provided the same input parameters to these modules for each interface, the scheduler and SNMP Probe would execute only once per interval per interface. Taking this a step further, the SNMP Probe could be configured to gather all of the ifTable values to be monitored for this interface (e.g. Admin Status, Oper Status, In Errors, Out Errors), and these values could feed even more rules and monitors. The one big catch is that the SNMP Probe module stops processing VarBinds once it hits an error. So it’s typically not a good idea to mix VarBinds for objects that may not be implemented on some agents with OIDs that are implemented on all agents.
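To make the cookdown rule concrete, here is a minimal Python sketch that models it as grouping workflows by identical data source configuration. It is only a model under stated assumptions: `ModuleConfig` and `cook_down` are hypothetical names, real cookdown compares the full module configuration inside the Health Service rather than a Python object, and the OIDs are ifInOctets/ifOutOctets for an arbitrary ifIndex of 3.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleConfig:
    """Hypothetical stand-in for a data source module's input parameters."""
    module_type: str
    target: str                 # the object instance the workflow targets
    interval_s: int
    varbinds: tuple[str, ...]   # OIDs requested in a single SNMP Get


def cook_down(workflows: dict[str, ModuleConfig]) -> dict[ModuleConfig, list[str]]:
    """Group workflows whose data source parameters are identical.

    Each key in the result stands for one module execution per interval;
    the value lists every rule/monitor that consumes that single result.
    """
    groups: dict[ModuleConfig, list[str]] = defaultdict(list)
    for name, cfg in workflows.items():
        groups[cfg].append(name)
    return dict(groups)


# ifInOctets.3 and ifOutOctets.3 (ifIndex 3 is an arbitrary example)
IF_OIDS = ("1.3.6.1.2.1.2.2.1.10.3", "1.3.6.1.2.1.2.2.1.16.3")
probe = ModuleConfig("SnmpProbe", "if-3", 300, IF_OIDS)

workflows = {
    "InUtilization.Monitor": probe,
    "OutUtilization.Monitor": probe,
    "InThroughput.Collection": probe,
    "OutThroughput.Collection": probe,
    # A different interval is a different configuration, so it cannot cook down
    "InThroughput.TenMinute.Collection": ModuleConfig("SnmpProbe", "if-3", 600, IF_OIDS),
}

groups = cook_down(workflows)
print(len(groups))          # 2 executions per cycle instead of 5
print(len(groups[probe]))   # 4 workflows share the 300 s probe
```

Changing any single parameter (here the interval) produces a separate execution, which is the same lever used later in this post to break cookdown deliberately.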
Another important concept in designing scalable SNMP management packs for Operations Manager is the scheduling of rule/monitor workflows. As an example, if one were to discover 10 switches with 100 network interfaces apiece, or roughly 1000 objects, and a single monitor were targeted to each interface (e.g. IF Oper Status), 1000 workflows would be scheduled. If a basic System.Scheduler module were used with only an interval configured, say 300 seconds, 1000 SNMP probes would be triggered simultaneously every 5 minutes. This simultaneous firing of probes will likely cause two problems: 1) the monitoring server will be briefly overloaded, and many of the SNMP probes’ async calls are likely to time out; and 2) the SNMP agent devices are likely to be overloaded and fail to respond to all of the SNMP Get requests in time.
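A quick simulation shows the burst described above. This Python sketch (`firing_histogram` is a hypothetical helper, not part of OpsMgr) counts how many workflows fire in each second when every workflow shares the same start time and a 300-second interval.

```python
from collections import Counter


def firing_histogram(offsets_s, interval_s, horizon_s):
    """Count how many workflows fire in each second up to horizon_s.

    offsets_s holds each workflow's first firing time (seconds from t=0);
    every workflow then repeats every interval_s seconds.
    """
    hist = Counter()
    for offset in offsets_s:
        for t in range(offset, horizon_s, interval_s):
            hist[t] += 1
    return hist


# 10 switches x 100 interfaces, one monitor per interface, plain 300 s interval:
# with no offset, every workflow is scheduled at the same instant.
naive = firing_histogram([0] * 1000, interval_s=300, horizon_s=900)
print(sorted(naive))         # [0, 300, 600]
print(max(naive.values()))   # 1000 probes in the same second, every cycle
```

Passing evenly staggered offsets instead, for example `[i * 300 // 1000 for i in range(1000)]`, drops the peak to at most four workflows per second with the same 300-second interval.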
There are two options for the Scheduler module that can be used to work around this scenario. One option is the SyncTime parameter. I did some experimentation with using a script discovery to randomize a SyncTime value (between 00:00 and 00:10) for each SNMP interface object, and using this SyncTime property in the Schedulers to distribute workflow execution over time. This works to an extent, but because SyncTime only accepts HH:MM as input, workflow start times can only be distributed at one-minute granularity. Even with randomly distributed SyncTime values, workflows running on a five-minute polling interval would still have roughly 20% of all interface monitoring workflows executing together each minute. This was still too many simultaneous executions once the number of monitored objects grew past a certain point. The second Scheduler parameter that can be used to distribute workflow scheduling is SpreadInitializationOverInterval, which I had written about previously in this post. This parameter (which is R2-specific) distributes the initial scheduling of the workflow over the defined interval and is intended to address this very issue. It works well, but there is one catch, which I will attempt to illustrate.
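The ~20% figure follows from SyncTime’s minute resolution: with a 300-second interval, any start minute reduces to one of only five phases within the cycle. The sketch below assumes, as a simplification, that SyncTime values are drawn uniformly from 00:00 to 00:10 and that each workflow fires at second 0 of its SyncTime minute; `synctime_phases` is a hypothetical helper.

```python
import random
from collections import Counter


def synctime_phases(n_workflows, interval_s=300, max_minute=10, seed=1):
    """Where, inside one polling interval, workflows with random SyncTimes fire.

    Assumes SyncTime is drawn uniformly from 00:00..00:<max_minute> (minute
    resolution, as HH:MM allows) and that each workflow fires at second 0
    of that minute.
    """
    rng = random.Random(seed)
    sync_minutes = [rng.randint(0, max_minute) for _ in range(n_workflows)]
    # Reduce each start time to its phase within the repeating interval
    return Counter((m * 60) % interval_s for m in sync_minutes)


phases = synctime_phases(1000)
print(sorted(phases))   # [0, 60, 120, 180, 240]: only five firing slots exist
```

Because minutes 0, 5 and 10 all map to phase 0, that slot carries slightly more than the others (about 3/11 versus 2/11 of the workflows), but every slot still receives roughly a fifth of the workload at a single instant.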
If a basic SNMP polling workflow were configured to poll an SNMP value for a device, such as CPU load average or sysUpTime, and the Scheduler module for this workflow was given an interval of 600 seconds and a SpreadInitializationOverInterval value of 600 seconds, the expected behavior is that the monitor would fire every 10 minutes and the initialization of the workflow for each device would be spread out over a 10-minute window, preventing all of the workflows from triggering concurrently. In this scenario, the actual behavior would be exactly as expected. However, if the SNMP probe instead targeted a network interface hosted by a device, the resulting behavior would be different.
In this hosted interface monitoring case, the cookdown process would “cook down” all of these schedulers (which have the same input parameters) for the device, and although the monitors for different devices would not fire concurrently, the interface monitors for a single device would all fire concurrently. This is still a problem when monitoring interfaces on switches, where encountering 100-200 interfaces on a single device is a very real possibility. So this is actually a situation where, instead of designing for cookdown, more scalability can be achieved by designing against cookdown. The easiest way to break cookdown is simply to pass unique input parameters to the module, and the approach I used followed those lines. I configured a property (called DistribVal) on discovered network interface objects and added a script discovery that populates this property with a random number between 300 and 600 (seconds). This property value is then passed as the SpreadInitializationOverInterval parameter for each workflow that targets interfaces. Because the value is a static property of the interface, all workflows for a specific interface have the same scheduler parameters (allowing those workflows to cook down for the interface), but two or more interfaces on the same device are unlikely to share the same parameters, which prevents cookdown between interfaces. The net result is a very even distribution of workload over time, preventing both CPU spikes on the monitoring server(s) and floods of requests to the SNMP agents. The interval remains static, so the monitor/rule still fires every 5 or 10 minutes as configured, but the initial firing varies on a per-interface basis.
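The following Python sketch models why a shared SpreadInitializationOverInterval value still produces per-device bursts, and how a per-interface value breaks them up. It makes several simplifying assumptions: schedulers cook down only when device, interval, and spread value all match; a cooked-down scheduler picks one uniformly random start within its spread window; and `peak_firings`, `distrib_val`, and the 10-switch topology are hypothetical, standing in for the DistribVal property described above.

```python
import random
from collections import Counter, defaultdict


def peak_firings(interfaces, workflows_per_if, interval_s, spread_for, seed=7):
    """Largest number of workflows that fire in the same second.

    interfaces: list of (device, interface_id) pairs.
    spread_for: returns the SpreadInitializationOverInterval value passed to
                the schedulers of a given interface.
    """
    rng = random.Random(seed)
    # Simplified cookdown model: schedulers on the same device with identical
    # parameters collapse into one execution.
    groups = defaultdict(int)
    for device, if_id in interfaces:
        groups[(device, interval_s, spread_for(device, if_id))] += workflows_per_if
    # Each cooked-down scheduler starts at a random point within its spread
    # window and then repeats every interval_s; all its consumers fire together.
    per_second = Counter()
    for (_, _, spread), count in groups.items():
        per_second[rng.randrange(spread) % interval_s] += count
    return max(per_second.values())


interfaces = [(f"switch{d}", i) for d in range(10) for i in range(100)]
rng = random.Random(42)
# Hypothetical DistribVal: a random 300-600 s value stored on each interface
distrib_val = {ifc: rng.randint(300, 600) for ifc in interfaces}

shared = peak_firings(interfaces, 3, 600, lambda d, i: 600)
per_if = peak_firings(interfaces, 3, 600, lambda d, i: distrib_val[(d, i)])
print(shared, per_if)   # shared >= 300 (a whole switch at once); per_if is far lower
```

With a shared spread value, each switch’s 100 interfaces collapse into one scheduler, so 300 workflows fire in the same second; with per-interface values, only the few interfaces whose random starts happen to coincide fire together.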
It’s pretty much inevitable when working with SNMP monitoring in OpsMgr that script modules will be required to manipulate SNMP data, as I described in this post. If .NET managed code offers the most efficient execution, the PowerShell script modules in R2 are certainly the next best thing. As I described in this post, the PowerShell script modules do not require spawning a separate unmanaged process, and compared to the cscript.exe-based script modules, they introduce minimal performance overhead. That said, there are some considerations worth noting when working with the PowerShell modules, particularly related to memory consumption.
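As an example of the kind of SNMP data manipulation these script modules perform, this Python sketch turns two ifInOctets samples into Mb/s and percent utilization. The xSNMP scripts themselves are PowerShell, and the helper names here are hypothetical; the sketch assumes ifSpeed is reported in bits per second and that a 32-bit counter wrapped at most once between polls.

```python
def counter_delta(prev, curr, counter_bits=32):
    """Difference between two SNMP counter samples, tolerating one wrap."""
    return (curr - prev) % (2 ** counter_bits)


def throughput_and_utilization(prev_octets, curr_octets, elapsed_s, if_speed_bps):
    """Return (Mb/s, % utilization) for one direction of an interface.

    if_speed_bps is the interface speed in bits per second (as ifSpeed reports).
    """
    bits_per_s = counter_delta(prev_octets, curr_octets) * 8 / elapsed_s
    return bits_per_s / 1_000_000, 100.0 * bits_per_s / if_speed_bps


# ifInOctets wrapped between two polls 300 s apart on a 100 Mb/s interface
mbps, pct = throughput_and_utilization(4_200_000_000, 280_032_704, 300, 100_000_000)
print(mbps, pct)   # 10.0 10.0
```

A counter that wraps twice within one interval (possible with Counter32 on fast links) cannot be detected this way; the 64-bit ifHCInOctets/ifHCOutOctets counters avoid that problem where agents support them.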
With the classic OpsMgr script modules, a cscript.exe process is spawned to execute the script and, in most cases, return a property bag of XML data. Once the spawned cscript.exe process terminates, any memory consumed by objects or variables in the script is released. The PowerShell modules do not spawn an unmanaged process, so any memory consumed by objects or variables is subject to .NET garbage collection, at least until a Health Service restart. While I have not performed specific before-and-after testing to quantify the benefits, I tried to help .NET garbage collection along in these management packs by explicitly removing variables in each script and scheduling a forced garbage collection every two hours with a rule.
General Performance Considerations
Before going any deeper into the general performance considerations of running a high number of OpsMgr SNMP workflows, I want to reiterate the importance of offloading these workflows to another management server or agent proxy. There are a number of processes specific to the RMS in an OpsMgr Management Group (group calculation, for example), and adding significant CPU, memory, or I/O strain from monitoring can negatively impact the function of the RMS (and the monitoring). In small environments with a single management server, some degree of centralized SNMP monitoring can certainly be handled by the RMS, but optimal functionality will be realized by offloading this monitoring to an agent proxy or another management server, preferably dedicated to the task. Fortunately, it’s very easy to designate an agent system to act as a proxy for specific SNMP Network Devices in OpsMgr. Although the Discovery Wizard requires that SNMP devices be discovered by a management server or gateway, all of the SNMP monitoring in the management packs that I have been working on can be offloaded to agent systems, saving on licensing costs (agent licenses as opposed to management server licenses) in large deployments.
The three primary system resources that come into play with a heavy SNMP monitoring load in an OpsMgr environment are CPU, memory, and disk I/O (no surprise there), and I’ll try to discuss each of these separately. Firstly: CPU. With 20K-35K concurrent workflows on an agent proxy or MS system performing SNMP monitoring (which would equate to just a few thousand monitored objects), quite a heavy CPU load can be generated. During the initial discovery of objects, HealthService.exe on the agent proxy system can hammer the CPU as it runs through discoveries and creates the discovery data instances. Typically, this activity finishes in relatively short order and stabilizes, as subsequent discoveries are spread out over intervals of 4-24 hours. The CPU load then shifts to MonitoringHost.exe, which handles all of the monitoring workflow work. In the most extreme of my lab tests, I configured a single agent proxy (Win2008 x64, 4x2GHz CPU cores, 4GB of RAM) to monitor just over 3500 interfaces on a five-minute polling cycle. Each interface was monitored for status, in and out throughput and utilization, and in and out errors, and the Ethernet interfaces were also monitored for FCS and Late Collision errors (these two on a ten-minute polling cycle). After a number of tests with different monitoring configurations, the best-performing configuration resulted in an average CPU utilization of about 60-70%. Once I had the schedulers worked out to distribute the load as described earlier in this post, CPU utilization was fairly steady, major spikes were avoided, and monitoring proved to be reliable. I was initially hoping to tune the CPU demands of the workflows beyond this, but I was running out of ideas for efficiency improvements, and with this much incoming data on a single system, some other issues started to manifest.
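For a sense of scale, this back-of-the-envelope Python sketch relates the 3500-interface lab test to the 20K-35K workflow figure. The per-metric workflow counts are an assumption (one workflow per monitored metric, with every interface treated as Ethernet), and `workflow_load` is a hypothetical helper.

```python
def workflow_load(interfaces, ethernet_fraction=1.0):
    """Rough workflow count and average executions per second for one proxy.

    Assumes one workflow per metric: status, in/out throughput, in/out
    utilization and in/out errors on a 300 s cycle, plus FCS and Late Collision
    errors on a 600 s cycle for the Ethernet share of the interfaces.
    """
    five_min = 7 * interfaces
    ten_min = 2 * round(interfaces * ethernet_fraction)
    total = five_min + ten_min
    per_second = five_min / 300 + ten_min / 600
    return total, per_second


total, rate = workflow_load(3500)
print(total, round(rate, 1))   # 31500 93.3
```

Even with perfectly spread schedules, that works out to roughly 93 workflow executions per second on a single agent proxy under these assumptions, which puts the sustained CPU load in context.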
An agent system (including a management server) first writes any workflow data to the local Health Service Store (an EDB database) before sending it upstream. This facilitates caching in the event of a communication break between the agent and upstream servers. With 3500+ interfaces, the Health Service Store could get fairly large, and the time to replay logs and create “Global Snapshots” following restarts of the Health Service starts to border on problematic. Some of the beta testers of these management packs (whom I can’t thank enough) have been running with 1800-2000 monitored interfaces per dedicated agent proxy (virtualized systems) with an average CPU utilization in the 40-55% range. To scale beyond 2000-3000 interfaces, a few options exist: 1) adding additional agent proxies, 2) increasing the polling interval, and/or 3) targeting only the required monitoring to interfaces (e.g. some interfaces may need only status monitoring rather than status, performance, and error monitoring).
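The three scaling options can be compared with a rough capacity estimate. This Python sketch assumes that proxy load scales linearly with workflow executions, takes about 2000 fully monitored interfaces per proxy at a five-minute interval from the beta-test figures, and assumes a status-only interface costs one ninth of a fully monitored one; all names and weights are hypothetical.

```python
import math

FULL_WORKFLOWS = 9         # assumed per fully monitored interface
STATUS_ONLY_WORKFLOWS = 1  # assumed per status-only interface


def proxies_needed(full_ifs, status_only_ifs=0, per_proxy_full=2000,
                   interval_s=300, baseline_interval_s=300):
    """Rough agent proxy count, assuming load scales linearly with executions.

    per_proxy_full: fully monitored interfaces one proxy handles comfortably
                    at baseline_interval_s (the beta-test figures suggest ~2000).
    """
    load = full_ifs + status_only_ifs * STATUS_ONLY_WORKFLOWS / FULL_WORKFLOWS
    capacity = per_proxy_full * interval_s / baseline_interval_s
    return math.ceil(load / capacity)


print(proxies_needed(7000))                        # 4: add proxies
print(proxies_needed(7000, interval_s=600))        # 2: poll half as often
print(proxies_needed(3000, status_only_ifs=4000))  # 2: status-only where it suffices
```

Real load will not be perfectly linear (the Health Service Store and memory grow with object count regardless of polling interval), so treat the result as a starting point for testing rather than a sizing rule.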
Memory is another area of concern in big deployments. In my 3500+ monitored interface test, the combined memory utilization of HealthService.exe and MonitoringHost.exe averaged about 1.5GB – 1.75GB, which is high, but sustainable and less of a bottleneck than the CPU utilization. For SNMP workflows that require the calculation of delta values between two polling samples, two general approaches can be used: 1) write the polling data to a local temporary file (with a script probe) or 2) convert the values to performance data and use the System.Performance.DeltaValueCondition module to calculate the delta value in memory. I tested these two approaches with various network interface modules in the hopes of determining which approach was more scalable. From my observations, the I/O hit of writing a few thousand temporary files every polling cycle seems to be less costly than the memory hit of sustaining multiple values in memory for the delta calculation for thousands of interfaces.
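The file-based approach to delta calculation can be sketched as follows: the previous sample is written to a small per-object state file and read back on the next poll, so nothing stays in memory between cycles. This is a minimal Python sketch, not the PowerShell used in the management packs; `delta_from_file` and the JSON file layout are hypothetical.

```python
import json
import os
import tempfile


def delta_from_file(state_dir, key, value, timestamp):
    """Return (delta, elapsed_s) against the previous sample, or None on first poll.

    The previous sample lives in a small per-object file, so nothing needs to
    stay resident in memory between polling cycles. Counter wraps are not
    handled here.
    """
    path = os.path.join(state_dir, f"{key}.json")
    previous = None
    if os.path.exists(path):
        with open(path) as f:
            previous = json.load(f)
    # Overwrite the state file with the current sample for the next cycle
    with open(path, "w") as f:
        json.dump({"value": value, "ts": timestamp}, f)
    if previous is None:
        return None
    return value - previous["value"], timestamp - previous["ts"]


with tempfile.TemporaryDirectory() as state_dir:
    first = delta_from_file(state_dir, "switch1_if3_inOctets", 1000, 0)
    second = delta_from_file(state_dir, "switch1_if3_inOctets", 4000, 300)
print(first, second)   # None (3000, 300)
```

The trade-off matches the observation above: each poll costs a small read and write, but the process holds no per-interface state between cycles.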
While disk I/O (on the agent proxy) is not as obvious a performance concern in this type of monitoring, there is one specific area worth mentioning. I had previously seen a post on Cameron Fuller’s blog describing a trick to improve write throughput to the Health Service Store by mapping a LUN (preferably on another spindle) to the Health Service Store directory and formatting it with an 8KB allocation unit size. I tested this trick and found it to be worth about a 10% reduction in CPU utilization on a heavily loaded agent proxy. It’s certainly not a make-or-break difference, but it might be worth considering if a slight performance gain justifies the effort. I tried to take this a step further by mounting the Health Service Store on a RAM disk to remove the storage subsystem entirely, but this didn’t yield any significant additional improvement.
I hope that these notes help illustrate some of the design and testing considerations involved in developing these management packs and/or are of some help to others working on MPs for centralized monitoring scenarios with OpsMgr.