The issue of closing alerts triggered by monitors – proposing a new solution

The issue is by no means new to the SCOM community, so I will only mention it briefly here: monitoring inconsistencies are introduced when an alert generated by a monitor is closed without the error condition itself being resolved. The monitor remains in an unhealthy state and will never raise another alert for the underlying error condition until it returns to a healthy state.

My solution to this issue is the Management Pack that I uploaded to the TechNet Gallery: https://gallery.technet.microsoft.com/Alerts-Watchdog-Management-d5b3ea77

The MP is actually simple: it implements a single rule, whose DataSource is of type Microsoft.SystemCenter.SubscribedAlertProvider, with the Criteria defined as:
Resolution State = 255 and IsMonitorAlert = True and LastModifiedBy <> System and LastModifiedBy <> Auto-resolve and LastModifiedBy <> Maintenance Mode.
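To make the criteria easier to read, here is a minimal Python sketch of the same filter as a plain predicate. The alert is modeled as a hypothetical dict whose keys mirror the criteria fields. This is not the SCOM SDK alert object, and the sketch is not how the MP evaluates the criteria internally.

```python
# Hypothetical alert record: keys mirror the criteria fields above,
# not the properties of the SCOM SDK alert class.
EXCLUDED_MODIFIERS = {"System", "Auto-resolve", "Maintenance Mode"}
CLOSED = 255  # resolution state of a closed alert


def needs_monitor_reset(alert: dict) -> bool:
    """True when a monitor-generated alert was closed by someone other than
    SCOM itself (system account, auto-resolve or maintenance mode)."""
    return (
        alert["ResolutionState"] == CLOSED
        and alert["IsMonitorAlert"]
        and alert["LastModifiedBy"] not in EXCLUDED_MODIFIERS
    )


manual = {"ResolutionState": 255, "IsMonitorAlert": True, "LastModifiedBy": "CONTOSO\\operator"}
auto = {"ResolutionState": 255, "IsMonitorAlert": True, "LastModifiedBy": "Auto-resolve"}
print(needs_monitor_reset(manual), needs_monitor_reset(auto))  # True False
```

Only the first alert qualifies. An auto-resolved alert means the monitor itself returned to healthy, so there is nothing to reset.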

The rule has 2 write actions:

– The first generates an informational alert, along the lines of the example below:

Alert Name: Logical Disk Free Space is low was closed manually by
Alert Description:
The alert was triggered by monitor {12631e6d-900b-d685-a713-e821d2c06c70}
The Managed Entity Full Name is: Microsoft.Windows.Server.2008.LogicalDisk:;C:
(with Id {d28f408e-1bf7-ffeb-c42c-695b889f8496})
The health state will be now reset to ensure consistent monitoring…

– The second is a write action of type Microsoft.Windows.PowerShellPropertyBagWriteAction that executes a script to reset the associated monitor (the parameters passed are $Data/WorkflowId$ and $Data/ManagedEntity$ from the DataItem provided by the DataSource).


How about Advanced instead of just Simple (System.SimpleScheduler)

Monitoring configurations that must be active only on a specific schedule can be dealt with in various ways.

One possibility is to handle the scheduling in the Monitor Type, by inserting System.SchedulerFilter condition detection modules into the regular detections of the monitor states. Depending on the workflows you are configuring, this might be the right, and only, approach.

But sometimes, if you control your starting point (the data source), you are better off (read: more efficient) implementing the scheduling right there, at the beginning. For example, instead of running a script check at a fixed interval and then dropping the findings because they fall outside the schedule, the proposal is to run the script only while the schedule is active, and not waste resources when the results do not matter anyway.

The System.Library Management Pack does not include such a DataSource module, but it does provide the right building blocks to create one.
Use the following module to start a workflow at a regular interval, on a schedule of your choice.

<DataSourceModuleType ID="System.AdvancedScheduler" Accessibility="Public" Batching="false">
	<Configuration>
	  <IncludeSchemaTypes>
		<SchemaType>System!System.ExpressionEvaluatorSchema</SchemaType>
	  </IncludeSchemaTypes>
	  <xsd:element name="IntervalSeconds" type="xsd:int"/>
	  <xsd:element name="SyncTime" type="xsd:string"/>
	  <xsd:element name="Schedule" type="PublicSchedulerType" />
	</Configuration>
	<OverrideableParameters>
	  <OverrideableParameter ID="IntervalSeconds" ParameterType="int" Selector="$Config/IntervalSeconds$"/>
	  <OverrideableParameter ID="SyncTime" ParameterType="string" Selector="$Config/SyncTime$"/>
	</OverrideableParameters>
	<ModuleImplementation Isolation="Any">
	  <Composite>
		<MemberModules>
		  <DataSource TypeID="System!System.SimpleScheduler" ID="DS1">
			<IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
			<SyncTime>$Config/SyncTime$</SyncTime>
		  </DataSource>
		  <ConditionDetection ID="OnScheduleCD" TypeID="System!System.SchedulerFilter">
			<SchedulerFilter>
			  <ProcessDataMode>OnSchedule</ProcessDataMode>
			  <Schedule>$Config/Schedule$</Schedule>
			  <UseCurrentTime>true</UseCurrentTime>
			</SchedulerFilter>
		  </ConditionDetection>
		</MemberModules>
		<Composition>
		  <Node ID="OnScheduleCD">
			<Node ID="DS1"/>
		  </Node>
		</Composition>
	  </Composite>
	</ModuleImplementation>
	<OutputType>System!System.TriggerData</OutputType>
</DataSourceModuleType> 
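The composite works in two stages. DS1 (System.SimpleScheduler) emits a trigger every IntervalSeconds, and OnScheduleCD (System.SchedulerFilter) passes on only the triggers that fall inside an active window. The Python sketch below models that two-stage flow. Several simplifications are assumptions of this sketch: the window format (Python weekday, start, end), treating the end of a window as exclusive, and ignoring SyncTime and time zones. The real module's boundary handling may differ.

```python
from datetime import datetime, time, timedelta


def triggers(start: datetime, end: datetime, interval_seconds: int):
    """Timestamps a SimpleScheduler-like source would emit in [start, end)."""
    step = timedelta(seconds=interval_seconds)
    t = start
    while t < end:
        yield t
        t += step


def in_schedule(t: datetime, windows) -> bool:
    """True if t is inside any (weekday, begin, finish) window.
    Weekdays use Python's convention (Monday=0); finish is exclusive here."""
    return any(t.weekday() == day and begin <= t.time() < finish
               for day, begin, finish in windows)


def advanced_scheduler(start, end, interval_seconds, windows):
    """Triggers that survive the schedule filter, i.e. when the workflow runs."""
    return [t for t in triggers(start, end, interval_seconds) if in_schedule(t, windows)]


week_start = datetime(2024, 1, 1)  # a Monday
windows = [(0, time(20), time(21)), (1, time(20), time(21)), (4, time(22), time(23))]
runs = advanced_scheduler(week_start, week_start + timedelta(days=7), 300, windows)
print(len(runs))  # 36: twelve 5-minute triggers in each of the three windows
```

Over one week the scheduler emits 2016 triggers, but only 36 reach the rest of the workflow. That difference is the saving the post argues for: anything downstream, such as a script, runs only during the active windows.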

Here is a usage example: the rule below logs an event in the Application log every 5 minutes on Mondays and Tuesdays between 8 and 9 PM, and on Fridays between 10 and 11 PM. It is not useful beyond getting you started in the right direction.

<Rule ID="Testing.System.AdvancedScheduler" Enabled="true" Target="SystemCenter!Microsoft.SystemCenter.RootManagementServer" ConfirmDelivery="false" Remotable="true" Priority="Normal" DiscardLevel="100">
	<Category>Custom</Category>
	<DataSources>
	  <DataSource ID="Scheduler" TypeID="System.AdvancedScheduler">
		<IntervalSeconds>300</IntervalSeconds>
		<SyncTime></SyncTime>
		<Schedule>
		  <WeeklySchedule>
			<Windows>
			  <Daily>
				<Start>20:00</Start>
				<End>21:00</End>
				<DaysOfWeekMask>6</DaysOfWeekMask>
			  </Daily>
			  <Daily>
				<Start>22:00</Start>
				<End>23:00</End>
				<DaysOfWeekMask>32</DaysOfWeekMask>
			  </Daily>
			</Windows>
		  </WeeklySchedule>
		  <ExcludeDates/>
		</Schedule>
	  </DataSource>
	</DataSources>
	<WriteActions>
	  <WriteAction ID="ExecuteCommand" TypeID="System!System.CommandExecuter">
		<ApplicationName>c:\windows\system32\eventcreate.exe</ApplicationName>
		<WorkingDirectory>c:\windows\system32</WorkingDirectory>
		<CommandLine>/T Information /ID 1 /L Application /D "Testing System.AdvancedScheduler"</CommandLine>
		<TimeoutSeconds>15</TimeoutSeconds>
		<RequireOutput>true</RequireOutput>
		<Files />
	  </WriteAction>
	</WriteActions>
</Rule>
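The DaysOfWeekMask values in the rule are bit fields. Each day has its own bit (Sunday=1, Monday=2, Tuesday=4, Wednesday=8, Thursday=16, Friday=32, Saturday=64), and the mask is their sum. This is why Monday plus Tuesday is 6 and Friday alone is 32. The small Python helper below is only an illustration for computing or decoding masks, not part of the MP.

```python
# Day bits used by DaysOfWeekMask, Sunday first.
DAY_BITS = {"Sunday": 1, "Monday": 2, "Tuesday": 4, "Wednesday": 8,
            "Thursday": 16, "Friday": 32, "Saturday": 64}


def days_of_week_mask(*days: str) -> int:
    """OR together the bits of the given day names."""
    mask = 0
    for day in days:
        mask |= DAY_BITS[day]
    return mask


def days_in_mask(mask: int) -> list:
    """Decode a mask back into day names, Sunday first."""
    return [day for day, bit in DAY_BITS.items() if mask & bit]


print(days_of_week_mask("Monday", "Tuesday"))  # 6
print(days_of_week_mask("Friday"))             # 32
print(days_in_mask(6))                         # ['Monday', 'Tuesday']
```

For example, a weekdays-only window would use 2+4+8+16+32 = 62, and every day of the week would use 127.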

Data Mine the Windows Event Log by Using SCOM

I recall a discussion with a colleague a while ago about errors and warnings in the event logs. He insisted that event logs should be immaculate, which sounded so disconnected from reality that I regarded it as utopian.
But he had a point: errors and warnings are there to be audited and reviewed. And fixed.

SCOM is the right tool for this task. The Management Pack that I crafted and uploaded to the TechNet Gallery gets it done for you: https://gallery.technet.microsoft.com/Windows-Event-Log-Data-cc1fe248.
Let’s make it clear from the very beginning what this Management Pack goes after, so there is no confusion.
It does NOT collect ALL the warnings and errors in the Event Logs (Application, System and Operations Manager). That would be a very bad idea.
Instead, once a day, it collects only statistics for the error and warning entries generated in the last 24 hours. By statistics I simply mean a count for each Event ID and Source combination, plus a sample description for convenience. The Management Pack also exposes these statistics in a report.
Two rules perform the collection:
– Windows EventLog Data Mining Event Collection Rule (MS) that targets Management Server – enabled by default
– Windows EventLog Data Mining Event Collection Rule (Agent) that targets the Agent – disabled by default.
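The kind of statistics described above can be sketched in a few lines of Python. The sketch keeps only Error and Warning entries from the last 24 hours, counts each (Event ID, Source) pair, and keeps the first description seen as a sample. The entry dicts and their field names are hypothetical. This is not the MP's auditing script, only an illustration of the aggregation it performs.

```python
from collections import Counter
from datetime import datetime, timedelta


def event_statistics(events, now, hours=24):
    """Count Error/Warning entries per (EventID, Source) in the last `hours`,
    keeping the first description seen for each pair as a sample."""
    since = now - timedelta(hours=hours)
    counts = Counter()
    samples = {}
    for e in events:
        if e["Level"] not in ("Error", "Warning") or e["TimeCreated"] < since:
            continue  # information entries and older entries are ignored
        key = (e["EventID"], e["Source"])
        counts[key] += 1
        samples.setdefault(key, e["Message"])
    return [
        {"EventID": eid, "Source": src, "Count": n, "Sample": samples[(eid, src)]}
        for (eid, src), n in counts.most_common()
    ]


# Hypothetical entries, for illustration only.
now = datetime(2024, 1, 2, 12, 0)


def entry(level, event_id, source, message, hours_ago):
    return {"Level": level, "EventID": event_id, "Source": source,
            "Message": message, "TimeCreated": now - timedelta(hours=hours_ago)}


events = [
    entry("Error", 1000, "Application Error", "first crash", 1),
    entry("Error", 1000, "Application Error", "second crash", 2),
    entry("Warning", 36, "W32Time", "clock not synchronized", 3),
    entry("Information", 7036, "Service Control Manager", "service started", 1),
    entry("Error", 1000, "Application Error", "old crash", 30),
]
for row in event_statistics(events, now):
    print(row)
```

Five entries reduce to two rows. This is why collecting statistics stays cheap where collecting every warning and error would not.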

Here is a screenshot of the report:

[Screenshot of the report]

You have probably noticed that the report parameters allow you to select the Information event type. What? We said only Errors and Warnings.
Yes, and that is not a mistake: the report displays the System.Exceptions thrown by the auditing script as Information events, for example:
– no Error or Warning events found
– the auditing script failed because the target system does not meet the minimum requirements

I hope you’ll appreciate the benefits of having such an auditing tool in place in your environment.
And, as always, please have it tested in a Dev environment first before importing into Prod.

Powershell Grid Widget with Filter

The PowerShell Grid Widget, introduced in UR2 for System Center 2012 R2 Operations Manager, greatly widens the range of what Operations Manager Dashboards can cover.

I was very excited about this widget and started using it right away. If you did the same, it was probably not long before you noticed a shortcoming of its implementation that becomes frustrating when the PowerShell data source (the script) returns a relatively large output.
In such situations, a Filter control (such as the one in the State widget) would be much appreciated.

I could not find such a solution (not for lack of trying), so I decided to create one myself. The result is the library MP that I uploaded to the TechNet Gallery (https://gallery.technet.microsoft.com/Powershell-Grid-Widget-919dc3d6).

This library management pack adds the PowerShell Grid Widget (Filtered), which lets users create PowerShell Grid dashboards using the template-driven wizard. The new widget is used and configured exactly like the PowerShell Grid Widget, except that its output is now searchable (it can be filtered). Cool!

[Screenshot: PowerShell Grid Widget (Filtered)]

Just import the Management Pack and start using the new widget in your dashboard visualizations.

Below is an example of its usage.

I often feel the need to perform fast, yet organized, searches against SCOM. Here is what I want to ask SCOM from my dashboard: start by listing all the classes you are aware of, let me select one class (or maybe several), then show me the instances of that class that you have discovered.
Then let me search for an instance and, when I select it, show me its details, its health and all related alerts.

Steps to create such dashboard:

1. Create a Grid Layout dashboard with 5 cells and name it “Bing SCOM”, just for fun.

2. Add the first widget using “Powershell Grid Widget (Filtered)”, name it Search Class, and enter the following script:


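# Return one grid row per class known to the management group
$classes = Get-SCOMClass

foreach ($class in $classes)
{
	# Create a grid row; its "Id" value is read by the next widget when the row is selected
	$dataObject = $ScriptContext.CreateInstance("xsd://foo!bar/baz")
	$dataObject["Id"]=$class.Id.ToString()
	$dataObject["Name"]=$class.Name
	$dataObject["DisplayName"]=$class.DisplayName
	$ScriptContext.ReturnCollection.Add($dataObject)
}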

3. Add the second widget using “Powershell Grid Widget (Filtered)”, name it Search Instance, and enter the following script:


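Param($globalSelectedItems)
# $globalSelectedItems holds the rows selected in the Search Class widget

foreach ($globalSelectedItem in $globalSelectedItems)
{
	# Resolve the selected class from the Id set by the first script
	$class = Get-SCOMClass -Id $globalSelectedItem["Id"]
	$instances = Get-SCOMClassInstance -Class $class
	foreach ($instance in $instances)
	{
		# Map instance properties to grid columns ("Column=Property" pairs)
		$dataObject = $ScriptContext.CreateFromObject($instance, "Id=Id,State=HealthState,DisplayName=DisplayName,Path=Path", $null)
		# Also record which class this instance belongs to
		$dataObject["ParentRelatedObject"] = $class.DisplayName
		$ScriptContext.ReturnCollection.Add($dataObject)
	}
}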

4. Add the third widget using the Details Widget.

5. Add the fourth widget using the Contextual Health Widget.

6. Add the fifth widget using the Contextual Alert Widget.

Here is a screenshot of the end result.

[Screenshot: the “Bing SCOM” dashboard]

Enjoy your SCOM searches now!

Improving UNIX/Linux Heartbeat Monitor

Anyone running a SCOM infrastructure that handles Cross-Platform (UNIX/Linux) monitoring is surely familiar with the UNIX/Linux Heartbeat Monitor, as implemented by the UNIX/Linux Core Library MP.

For quick reference here is the link to the Unit Monitor Type for this monitor (as it is implemented in version 7.5.1042.0 of the management pack): http://systemcentercore.com/?GetElement=Microsoft.Unix.WSMan.Heartbeat.MonitorType&Type=UnitMonitorType&ManagementPack=Microsoft.Unix.Library&Version=7.5.1042.0.

In my experience, this monitor implementation is quite dry. It is a 2-state monitor that raises a rather laconic alert, “Heartbeat failed”, with the description “The System is not responding to heartbeats”. An alert notification like this reaching the Unix support personnel raises more questions than it gives clues about what is wrong with the Unix system. Of course, one can look at the associated Knowledge, which is verbose and a good starting point for investigation, but who opens Health Explorer in the middle of the night? Health Explorer can also show the outcome of the Diagnostic Task(s) and Recovery (if enabled), but this brings back the question of usability.

With a little work the monitor implementation can be improved in a few areas:

– First, let’s add a separate monitor that checks whether the system responds to ICMP. I am sure the Unix support personnel will be grateful to know whether the system is alive in the first place. This way, the “UNIX/Linux WS-Management Heartbeat ICMP Diagnostic” is no longer needed.

Here is how I implemented such a monitor:

		<UnitMonitor ID="Unix.ICMPMonitor.HostIsICMPResponsive" Accessibility="Public" Enabled="true" Target="Unix!Microsoft.Unix.Computer" ParentMonitorID="SystemHealth!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="NetworkMonitoring!System.NetworkManagement.ICMPMonitorType" ConfirmDelivery="false">
			<Category>AvailabilityHealth</Category>
			<AlertSettings AlertMessage="Unix.ICMPMonitor.HostIsICMPResponsive.AlertMessage">
				<AlertOnState>Error</AlertOnState>
				<AutoResolve>true</AutoResolve>
				<AlertPriority>Normal</AlertPriority>
				<AlertSeverity>Error</AlertSeverity>
			</AlertSettings>
			<OperationalStates>
				<OperationalState ID="Responding" MonitorTypeStateID="ICMPResponding" HealthState="Success" />
				<OperationalState ID="NotResponding" MonitorTypeStateID="ICMPNotResponding" HealthState="Error" />
			</OperationalStates>
			<Configuration>
				<IP>$Target/Property[Type="Unix!Microsoft.Unix.Computer"]/NetworkName$</IP>
				<Interval>180</Interval>
				<NoOfRetries>3</NoOfRetries>
				<NumberOfSamples>3</NumberOfSamples>
				<Timeout>1000</Timeout>
				<PacketSizeBytes>32</PacketSizeBytes>
			</Configuration>
		</UnitMonitor>

– Second, let’s morph the Unit Monitor Type into a 3-state one. The Warning state gives a recovery action the opportunity to run, and the Error state fires the alert stating that the system is not monitored.

Here is how I suggest implementing the Unit Monitor Type:

	  <UnitMonitorType ID="Unix.WSMan.Heartbeat.MonitorType" Accessibility="Public">
        <MonitorTypeStates>
          <MonitorTypeState ID="Available" NoDetection="false" />
		  <MonitorTypeState ID="NeedsRecovery" NoDetection="false" />
          <MonitorTypeState ID="NotAvailable" NoDetection="false" />
        </MonitorTypeStates>
        <Configuration>
          <xsd:element name="Interval" type="xsd:int" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="SyncTime" type="xsd:string" minOccurs="0" maxOccurs="1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
		  <xsd:element name="CorrelateWindowSeconds" type="xsd:integer" minOccurs="0" default="323" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="MissedHeartbeats" type="xsd:integer" minOccurs="0" default="2" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="MissedWindowSeconds" type="xsd:integer" minOccurs="0" default="623" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="Interval" Selector="$Config/Interval$" ParameterType="int" />
          <OverrideableParameter ID="SyncTime" Selector="$Config/SyncTime$" ParameterType="string" />
		  <OverrideableParameter ID="CorrelateWindowSeconds" Selector="$Config/CorrelateWindowSeconds$" ParameterType="int" />
          <OverrideableParameter ID="MissedHeartbeats" Selector="$Config/MissedHeartbeats$" ParameterType="int" />
          <OverrideableParameter ID="MissedWindowSeconds" Selector="$Config/MissedWindowSeconds$" ParameterType="int" />
        </OverrideableParameters>
        <MonitorImplementation>
          <MemberModules>
            <DataSource ID="DS" TypeID="Unix!Microsoft.Unix.WSMan.TimedEnumerator">
              <TargetSystem>$Target/Property[Type="Unix!Microsoft.Unix.Computer"]/NetworkName$</TargetSystem>
              <Uri>http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_Agent?__cimnamespace=root/scx</Uri>
              <Filter />
              <OutputErrorIfAny>true</OutputErrorIfAny>
              <SplitItems>false</SplitItems>
              <Interval>$Config/Interval$</Interval>
              <SyncTime>$Config/SyncTime$</SyncTime>
            </DataSource>
            <ProbeAction ID="EnableMonitoring" TypeID="Unix!Microsoft.Unix.EnableInstanceMonitoringOverrideAction">
              <ManagedEntityId>$Target/Id$</ManagedEntityId>
              <Value>true</Value>
            </ProbeAction>
            <ProbeAction ID="DisableMonitoring" TypeID="Unix!Microsoft.Unix.EnableInstanceMonitoringOverrideAction">
              <ManagedEntityId>$Target/Id$</ManagedEntityId>
              <Value>false</Value>
            </ProbeAction>
            <ConditionDetection ID="RepeatEventCondition" TypeID="System!System.ConsolidatorCondition">
              <Consolidator>
                <ConsolidationProperties />
                <TimeControl>
                  <WithinTimeSchedule>
                    <Interval>$Config/MissedWindowSeconds$</Interval>
                  </WithinTimeSchedule>
                </TimeControl>
                <CountingCondition>
                  <Count>$Config/MissedHeartbeats$</Count>
                  <CountMode>OnNewItemTestOutputRestart_OnTimerRestart</CountMode>
                </CountingCondition>
              </Consolidator>
            </ConditionDetection>
            <ConditionDetection ID="ErrorFilter" TypeID="System!System.ExpressionFilter">
              <Expression>
                <Exists>
                  <ValueExpression>
                    <XPathQuery Type="String">//ErrorCode</XPathQuery>
                  </ValueExpression>
                </Exists>
              </Expression>
            </ConditionDetection>
            <ConditionDetection ID="SuccessFilter" TypeID="System!System.ExpressionFilter">
              <Expression>
                <Not>
                  <Expression>
                    <Exists>
                      <ValueExpression>
                        <XPathQuery Type="String">//ErrorCode</XPathQuery>
                      </ValueExpression>
                    </Exists>
                  </Expression>
                </Not>
              </Expression>
            </ConditionDetection>
			<ConditionDetection TypeID="System!System.CorrelatorAutoCondition" ID="CorrelatedDataCondition">
			  <Correlator>
				<CorrelationExpression>
				  <Expression />
				</CorrelationExpression>
				<Count>1</Count>
				<Interval>$Config/CorrelateWindowSeconds$</Interval>
				<CorrelationOrder>InSequence</CorrelationOrder>
				<CorrelationItemPolicy>First</CorrelationItemPolicy>
			  </Correlator>
			</ConditionDetection>
          </MemberModules>
          <RegularDetections>
            <RegularDetection MonitorTypeStateID="Available">
              <Node ID="EnableMonitoring">
                <Node ID="SuccessFilter">
                  <Node ID="DS" />
                </Node>
              </Node>
            </RegularDetection>
			<RegularDetection MonitorTypeStateID="NeedsRecovery">
				<Node ID="CorrelatedDataCondition">
				  <Node ID="SuccessFilter">
					<Node ID="DS" />
				  </Node>
				  <Node ID="ErrorFilter">
					<Node ID="DS" />
				  </Node>
				</Node>
            </RegularDetection>
            <RegularDetection MonitorTypeStateID="NotAvailable">
              <Node ID="DisableMonitoring">
                <Node ID="RepeatEventCondition">
                  <Node ID="ErrorFilter">
                    <Node ID="DS" />
                  </Node>
                </Node>
              </Node>
            </RegularDetection>
          </RegularDetections>
        </MonitorImplementation>
      </UnitMonitorType>
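To reason about the three states, it helps to play a sequence of polls through a simplified model. The Python sketch below makes several assumptions. A success maps to Available. A success followed by a failure within CorrelateWindowSeconds maps to NeedsRecovery (the CorrelatorAutoCondition path). MissedHeartbeats failures inside a window of MissedWindowSeconds, opened by the first failure, map to NotAvailable (the ConsolidatorCondition path). These are assumed semantics, not the exact runtime behavior of the modules, and the EnableMonitoring/DisableMonitoring override probes are not modeled. The defaults are the ones from the XML.

```python
def simulate(polls, correlate_window=323, missed_heartbeats=2, missed_window=623):
    """Return (timestamp, state) transitions for a list of heartbeat polls.

    polls: (timestamp_seconds, succeeded) tuples in chronological order.
    """
    transitions = []
    current = None        # no state until the first conclusive poll
    last_success = None   # time of the most recent successful poll
    window_start = None   # when the current missed-heartbeat window opened
    failures = 0          # failed polls counted in that window

    for t, ok in polls:
        new_state = current
        if ok:
            # SuccessFilter path: the agent answered the WS-Man enumeration.
            new_state = "Available"
            last_success = t
        else:
            # Consolidator model: only failed polls are counted, in a window
            # opened by the first failure and restarted once it expires.
            if window_start is None or t - window_start > missed_window:
                window_start, failures = t, 0
            failures += 1
            if failures >= missed_heartbeats:
                new_state = "NotAvailable"
                window_start, failures = None, 0  # restart after output
            elif last_success is not None and t - last_success <= correlate_window:
                # Correlator model: a success followed by a failure, in sequence.
                new_state = "NeedsRecovery"
        if new_state != current:
            transitions.append((t, new_state))
            current = new_state
    return transitions


print(simulate([(0, True), (300, False), (600, False), (900, True)]))
# [(0, 'Available'), (300, 'NeedsRecovery'), (600, 'NotAvailable'), (900, 'Available')]
```

With a 300-second interval, the 323-second correlation window covers just one polling interval and the 623-second window just over two. The first missed heartbeat therefore moves the monitor to Warning, where the recovery can run, and the second moves it to Error. In this model a success does not reset the failure counter, because only failed polls reach the consolidator in the XML above.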

Here is what the new Monitor looks like:

		<UnitMonitor ID="Unix.HostIsNotMonitoredByAgent.Monitor" Accessibility="Public" Enabled="true" Target="Unix!Microsoft.Unix.Computer" ParentMonitorID="SystemHealth!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Unix.WSMan.Heartbeat.MonitorType" ConfirmDelivery="false">
			<Category>AvailabilityHealth</Category>
			<AlertSettings AlertMessage="Unix.HostIsNotMonitoredByAgent.AlertMessage">
				<AlertOnState>Error</AlertOnState>
				<AutoResolve>true</AutoResolve>
				<AlertPriority>Normal</AlertPriority>
				<AlertSeverity>Error</AlertSeverity>
				<AlertParameters>
				</AlertParameters>
			</AlertSettings>
			<OperationalStates>
			  <OperationalState ID="Available" MonitorTypeStateID="Available" HealthState="Success" />
			  <OperationalState ID="NeedsRecovery" MonitorTypeStateID="NeedsRecovery" HealthState="Warning" />
			  <OperationalState ID="NotAvailable" MonitorTypeStateID="NotAvailable" HealthState="Error" />
			</OperationalStates>
			<Configuration>
			  <Interval>300</Interval>
			  <SyncTime></SyncTime>
			  <CorrelateWindowSeconds>323</CorrelateWindowSeconds>
			  <MissedHeartbeats>2</MissedHeartbeats>
			  <MissedWindowSeconds>623</MissedWindowSeconds>
			</Configuration>
		</UnitMonitor>

I will leave it to the reader to edit the Alert Message so that it is clear what is going on. It can include “The host is not monitored by agent”, followed by some steps to try in order to fix the agent.

And here is the Recovery action associated with the Warning state:

		<Recovery ID="Unix.SCX.Restart.Recovery" Accessibility="Public" Enabled="true" Target="Unix!Microsoft.Unix.Computer" Monitor="Unix.HostIsNotMonitoredByAgent.Monitor" ResetMonitor="false" ExecuteOnState="Warning" Remotable="true" Timeout="300">
			<Category>Maintenance</Category>
			<WriteAction ID="SSHCommand" TypeID="Unix!Microsoft.Unix.SSHCommand.WriteAction">
			  <Host>$Target/Property[Type="Unix!Microsoft.Unix.Computer"]/PrincipalName$</Host>
			  <Port>$Target/Property[Type="Unix!Microsoft.Unix.Computer"]/SSHPort$</Port>
			  <UserName>$RunAs[Name="Unix!Microsoft.Unix.AgentMaintenanceAccount"]/UserName$</UserName>
			  <Password>$RunAs[Name="Unix!Microsoft.Unix.AgentMaintenanceAccount"]/Password$</Password>
			  <Command>/opt/microsoft/scx/bin/tools/scxadmin -stop provider; /opt/microsoft/scx/bin/tools/scxadmin -stop cimom; /opt/microsoft/scx/bin/tools/scxadmin -start all</Command>
			  <TimeoutSeconds>60</TimeoutSeconds>
			</WriteAction>
		</Recovery>

– Third, disable the original “UNIX/Linux Heartbeat Monitor” using an override.

Hope this helps.

Update: a Management Pack that implements all the above changes is available for download on the TechNet Gallery: https://gallery.technet.microsoft.com/AddOn-Unix-Heartbeat-3fc2a296.

Correlating Two Counters Threshold Breach

I recently received a request to rewrite the “Total CPU Utilization Percentage” unit monitor. The customer was unhappy that the output of the “List Top CPU Consuming processes” diagnostic was not included in the alert when it fired.

While taking a closer look at how this monitor is implemented in the Windows Server Operating System Management Pack, I was surprised by the way the authors chose to correlate the System\Processor Queue Length and Processor\% Processor Time\_Total performance counters exceeding their thresholds. There is no trace of the System.Correlator (https://msdn.microsoft.com/en-us/library/ff458713.aspx) condition detection module type.

The MSDN documentation reads: “The System.Correlator condition detection module type is used to correlate two incoming data item types, to detect a specific counting and/or ordering pattern of data items. A module of this type accepts data of any type and outputs System.CorrelatorData data.” So, if it accepts any type, why not System.Performance.Data?

So I decided to give it a try and implement the monitoring with a new approach, starting from scratch. I kept it generic, so the solution should come in handy whenever you need to correlate threshold breaches of any two performance counters.

To summarize the requirements, the monitoring should:

  • Correlate 2 counters exceeding configurable thresholds over consecutive samples
  • If both counters breach their own thresholds, execute a diagnostic and capture its output in the alert description
  • If either counter returns to normal (below its threshold), turn the monitor healthy

So let’s cook!

Start with the following DataSource Module Type:

      <DataSourceModuleType ID="SamplesInRow.DS" Accessibility="Public" Batching="false">
        <Configuration>
          <IncludeSchemaTypes>
            <SchemaType>System!System.ExpressionEvaluatorSchema</SchemaType>
          </IncludeSchemaTypes>
          <xsd:element name="TargetComputerName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="CounterName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="ObjectName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="InstanceName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="Threshold" type="xsd:double" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="IntervalSeconds" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="NumSamples" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="Direction" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="Threshold" Selector="$Config/Threshold$" ParameterType="double" />
        </OverrideableParameters>
        <ModuleImplementation>
          <Composite>
            <MemberModules>
              <DataSource TypeID="Perf!System.Performance.DataProvider" ID="DS">
                <ComputerName>$Config/TargetComputerName$</ComputerName>
                <CounterName>$Config/CounterName$</CounterName>
                <ObjectName>$Config/ObjectName$</ObjectName>
                <InstanceName>$Config/InstanceName$</InstanceName>
                <AllInstances>false</AllInstances>
                <Frequency>$Config/IntervalSeconds$</Frequency>
              </DataSource>
              <ConditionDetection TypeID="Perf!System.Performance.ConsecutiveSamplesCondition" ID="CDThreshold">
                <Threshold>$Config/Threshold$</Threshold>
                <Direction>$Config/Direction$</Direction>
              </ConditionDetection>
              <ConditionDetection TypeID="System!System.ExpressionFilter" ID="CDSufficientSamples">
                <Expression>
                  <SimpleExpression>
                    <ValueExpression>
                      <XPathQuery Type="Double">Value</XPathQuery>
                    </ValueExpression>
                    <Operator>Greater</Operator>
                    <ValueExpression>
                      <Value Type="Double">$Config/NumSamples$</Value>
                    </ValueExpression>
                  </SimpleExpression>
                </Expression>
              </ConditionDetection>
            </MemberModules>
            <Composition>
              <Node ID="CDSufficientSamples">
                <Node ID="CDThreshold">
                  <Node ID="DS" />
                </Node>
              </Node>
            </Composition>
          </Composite>
        </ModuleImplementation>
        <OutputType>Perf!System.Performance.Data</OutputType>
      </DataSourceModuleType>
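Here is a rough Python model of what SamplesInRow.DS emits. It assumes that System.Performance.ConsecutiveSamplesCondition reports, as Value, the running count of consecutive samples that breach the threshold, and that the count resets on a non-breaching sample. That is how the CDSufficientSamples filter appears to use it, but it remains an assumption of this sketch. The direction strings are also placeholders, not the module's actual Direction values.

```python
def consecutive_breach_counts(values, threshold, direction="greater"):
    """Running count of samples in a row that breach the threshold
    (assumed to reset to 0 whenever a sample does not breach)."""
    count = 0
    out = []
    for v in values:
        breached = v > threshold if direction == "greater" else v < threshold
        count = count + 1 if breached else 0
        out.append(count)
    return out


def samples_in_row(values, threshold, num_samples, direction="greater"):
    """Indexes of samples that would pass CDSufficientSamples (Value Greater NumSamples)."""
    counts = consecutive_breach_counts(values, threshold, direction)
    return [i for i, c in enumerate(counts) if c > num_samples]


cpu = [95, 96, 97, 40, 98, 99, 99, 99]
print(consecutive_breach_counts(cpu, 90))  # [1, 2, 3, 0, 1, 2, 3, 4]
print(samples_in_row(cpu, 90, 2))          # [2, 6, 7]
```

Note that, under this assumption, the filter's Greater operator means an item is emitted only from the (NumSamples + 1)-th consecutive breach onward. GreaterEqual would fire on the NumSamples-th.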

Add two Correlator Condition Detection Module Types:

      <ConditionDetectionModuleType ID="AndItemCountsFilter.CorrelatorCD" Accessibility="Public" Batching="false" Stateful="true" PassThrough="false">
        <Configuration>
          <IncludeSchemaTypes>
            <SchemaType>System!System.ExpressionEvaluatorSchema</SchemaType>
          </IncludeSchemaTypes>
          <xsd:element name="Correlator" type="CorrelatorType" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        </Configuration>
        <ModuleImplementation Isolation="Any">
          <Composite>
            <MemberModules>
              <ConditionDetection ID="Correlator" TypeID="System!System.Correlator">
                <Correlator>$Config/Correlator$</Correlator>
              </ConditionDetection>
              <ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
                <Expression>
                  <And>
                    <Expression>
                      <SimpleExpression>
                        <ValueExpression>
                          <XPathQuery Type="UnsignedInteger">Item0Count</XPathQuery>
                        </ValueExpression>
                        <Operator>Equal</Operator>
                        <ValueExpression>
                          <Value Type="UnsignedInteger">1</Value>
                        </ValueExpression>
                      </SimpleExpression>
                    </Expression>
                    <Expression>
                      <SimpleExpression>
                        <ValueExpression>
                          <XPathQuery Type="UnsignedInteger">Item1Count</XPathQuery>
                        </ValueExpression>
                        <Operator>Equal</Operator>
                        <ValueExpression>
                          <Value Type="UnsignedInteger">1</Value>
                        </ValueExpression>
                      </SimpleExpression>
                    </Expression>
                  </And>
                </Expression>
              </ConditionDetection>
            </MemberModules>
            <Composition>
              <Node ID="Filter">
                <Node ID="Correlator" />
              </Node>
            </Composition>
          </Composite>
        </ModuleImplementation>
        <OutputType>System!System.CorrelatorData</OutputType>
        <InputTypes>
          <InputType>System!System.BaseData</InputType>
          <InputType>System!System.BaseData</InputType>
        </InputTypes>
      </ConditionDetectionModuleType>
      <ConditionDetectionModuleType ID="OrItemCountsFilter.CorrelatorCD" Accessibility="Public" Batching="false" Stateful="true" PassThrough="false">
        <Configuration>
          <IncludeSchemaTypes>
            <SchemaType>System!System.ExpressionEvaluatorSchema</SchemaType>
          </IncludeSchemaTypes>
          <xsd:element name="Correlator" type="CorrelatorType" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        </Configuration>
        <ModuleImplementation Isolation="Any">
          <Composite>
            <MemberModules>
              <ConditionDetection ID="Correlator" TypeID="System!System.Correlator">
                <Correlator>$Config/Correlator$</Correlator>
              </ConditionDetection>
              <ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
                <Expression>
                  <Or>
                    <Expression>
                      <SimpleExpression>
                        <ValueExpression>
                          <XPathQuery Type="UnsignedInteger">Item0Count</XPathQuery>
                        </ValueExpression>
                        <Operator>Equal</Operator>
                        <ValueExpression>
                          <Value Type="UnsignedInteger">1</Value>
                        </ValueExpression>
                      </SimpleExpression>
                    </Expression>
                    <Expression>
                      <SimpleExpression>
                        <ValueExpression>
                          <XPathQuery Type="UnsignedInteger">Item1Count</XPathQuery>
                        </ValueExpression>
                        <Operator>Equal</Operator>
                        <ValueExpression>
                          <Value Type="UnsignedInteger">1</Value>
                        </ValueExpression>
                      </SimpleExpression>
                    </Expression>
                  </Or>
                </Expression>
              </ConditionDetection>
            </MemberModules>
            <Composition>
              <Node ID="Filter">
                <Node ID="Correlator" />
              </Node>
            </Composition>
          </Composite>
        </ModuleImplementation>
        <OutputType>System!System.CorrelatorData</OutputType>
        <InputTypes>
          <InputType>System!System.BaseData</InputType>
          <InputType>System!System.BaseData</InputType>
        </InputTypes>
      </ConditionDetectionModuleType>
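The two correlator filters are easier to reason about as plain predicates, so here is a small Python model of the decision each one makes on a correlated item. It is only a sketch built from the XPath queries above, not part of the MP. It makes two assumptions:

- The correlator output is represented as a dictionary carrying Item0Count and Item1Count.
- AndItemCountsFilter.CorrelatorCD, which is referenced later but not shown in this section, uses an And where the filter above uses an Or.

```python
from typing import Mapping


def or_item_counts_filter(item: Mapping[str, int]) -> bool:
    """Model of OrItemCountsFilter.CorrelatorCD: pass when either
    correlated stream contributed exactly one item."""
    return item.get("Item0Count", 0) == 1 or item.get("Item1Count", 0) == 1


def and_item_counts_filter(item: Mapping[str, int]) -> bool:
    """Assumed model of AndItemCountsFilter.CorrelatorCD: both
    correlated streams must have contributed exactly one item."""
    return item.get("Item0Count", 0) == 1 and item.get("Item1Count", 0) == 1


# Only the first counter fired within the correlation window
partial = {"Item0Count": 1, "Item1Count": 0}
print(or_item_counts_filter(partial), and_item_counts_filter(partial))  # prints: True False
```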

Mix them well in a Unit Monitor Type:

      <UnitMonitorType ID="Correlated2PerfCounters.BothThresholdsAbove.VerboseOut.2State.UMT" Accessibility="Public">
        <MonitorTypeStates>
          <MonitorTypeState ID="CorrelatedPerformanceBreach" />
          <MonitorTypeState ID="CorrelatedPerformanceNormal" />
        </MonitorTypeStates>
        <Configuration>
          <IncludeSchemaTypes>
            <SchemaType>System!System.ExpressionEvaluatorSchema</SchemaType>
            <SchemaType>Windows!Microsoft.Windows.PowerShellSchema</SchemaType>
          </IncludeSchemaTypes>
          <xsd:element name="TargetComputerName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="FirstCorrelatedCounterName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="FirstCorrelatedObjectName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="FirstCorrelatedInstanceName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="FirstThreshold" type="xsd:double" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="SecondCorrelatedCounterName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="SecondCorrelatedObjectName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="SecondCorrelatedInstanceName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="SecondThreshold" type="xsd:double" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="IntervalSeconds" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="NumSamples" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="Correlator" type="CorrelatorType" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="ScriptName" type="NonNullString" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="ScriptBody" type="NonNullString" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="Parameters" type="NamedParametersType" minOccurs="0" maxOccurs="1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="FirstThreshold" Selector="$Config/FirstThreshold$" ParameterType="double" />
          <OverrideableParameter ID="SecondThreshold" Selector="$Config/SecondThreshold$" ParameterType="double" />
        </OverrideableParameters>
        <MonitorImplementation>
          <MemberModules>
            <DataSource TypeID="SamplesInRow.DS" ID="DS1">
              <TargetComputerName>$Config/TargetComputerName$</TargetComputerName>
              <CounterName>$Config/FirstCorrelatedCounterName$</CounterName>
              <ObjectName>$Config/FirstCorrelatedObjectName$</ObjectName>
              <InstanceName>$Config/FirstCorrelatedInstanceName$</InstanceName>
              <Threshold>$Config/FirstThreshold$</Threshold>
              <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
              <NumSamples>$Config/NumSamples$</NumSamples>
              <Direction>GreaterEqual</Direction>
            </DataSource>
            <DataSource TypeID="SamplesInRow.DS" ID="DS2">
              <TargetComputerName>$Config/TargetComputerName$</TargetComputerName>
              <CounterName>$Config/SecondCorrelatedCounterName$</CounterName>
              <ObjectName>$Config/SecondCorrelatedObjectName$</ObjectName>
              <InstanceName>$Config/SecondCorrelatedInstanceName$</InstanceName>
              <Threshold>$Config/SecondThreshold$</Threshold>
              <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
              <NumSamples>$Config/NumSamples$</NumSamples>
              <Direction>GreaterEqual</Direction>
            </DataSource>
            <DataSource TypeID="SamplesInRow.DS" ID="DS3">
              <TargetComputerName>$Config/TargetComputerName$</TargetComputerName>
              <CounterName>$Config/FirstCorrelatedCounterName$</CounterName>
              <ObjectName>$Config/FirstCorrelatedObjectName$</ObjectName>
              <InstanceName>$Config/FirstCorrelatedInstanceName$</InstanceName>
              <Threshold>$Config/FirstThreshold$</Threshold>
              <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
              <NumSamples>$Config/NumSamples$</NumSamples>
              <Direction>Less</Direction>
            </DataSource>
            <DataSource TypeID="SamplesInRow.DS" ID="DS4">
              <TargetComputerName>$Config/TargetComputerName$</TargetComputerName>
              <CounterName>$Config/SecondCorrelatedCounterName$</CounterName>
              <ObjectName>$Config/SecondCorrelatedObjectName$</ObjectName>
              <InstanceName>$Config/SecondCorrelatedInstanceName$</InstanceName>
              <Threshold>$Config/SecondThreshold$</Threshold>
              <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
              <NumSamples>$Config/NumSamples$</NumSamples>
              <Direction>Less</Direction>
            </DataSource>
            <ProbeAction TypeID="System!System.PassThroughProbe" ID="OnDemandReset" />
            <ProbeAction TypeID="Windows!Microsoft.Windows.PowerShellPropertyBagTriggerOnlyProbe" ID="VerboseOut">
              <ScriptName>$Config/ScriptName$</ScriptName>
              <ScriptBody>$Config/ScriptBody$</ScriptBody>
              <Parameters>$Config/Parameters$</Parameters>
              <TimeoutSeconds>30</TimeoutSeconds>
            </ProbeAction>
            <ConditionDetection TypeID="AndItemCountsFilter.CorrelatorCD" ID="CorrelatedDataConditionBreach">
              <Correlator>$Config/Correlator$</Correlator>
            </ConditionDetection>
            <ConditionDetection TypeID="OrItemCountsFilter.CorrelatorCD" ID="CorrelatedDataConditionNormal">
              <Correlator>$Config/Correlator$</Correlator>
            </ConditionDetection>
            <ConditionDetection ID="FilterOut" TypeID="System!System.ExpressionFilter">
              <Expression>
                <Exists>
                  <ValueExpression>
                    <XPathQuery Type="String">Property[@Name='VerboseOut']</XPathQuery>
                  </ValueExpression>
                </Exists>
              </Expression>
            </ConditionDetection>
          </MemberModules>
          <RegularDetections>
            <RegularDetection MonitorTypeStateID="CorrelatedPerformanceBreach">
              <Node ID="FilterOut">
                <Node ID="VerboseOut">
                  <Node ID="CorrelatedDataConditionBreach">
                    <Node ID="DS1" />
                    <Node ID="DS2" />
                  </Node>
                </Node>
              </Node>
            </RegularDetection>
            <RegularDetection MonitorTypeStateID="CorrelatedPerformanceNormal">
              <Node ID="CorrelatedDataConditionNormal">
                <Node ID="DS3" />
                <Node ID="DS4" />
              </Node>
            </RegularDetection>
          </RegularDetections>
          <OnDemandDetections>
            <OnDemandDetection MonitorTypeStateID="CorrelatedPerformanceNormal">
              <Node ID="OnDemandReset" />
            </OnDemandDetection>
          </OnDemandDetections>
        </MonitorImplementation>
      </UnitMonitorType>
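The two regular detections can be summarized as follows:

- **Breach:** DS1 AND DS2, meaning both counters are at or above their thresholds.
- **Normal:** DS3 OR DS4, meaning either counter is below its threshold.

The Python model below shows how these detections decide the state. SamplesInRow.DS is not shown in this section, so its behaviour is an assumption here: it is taken to fire when the last NumSamples samples all satisfy Direction against Threshold. The model also ignores the correlator's timing. Note that the unit monitor that follows samples every 300 seconds, 3 samples in a row, which spans 900 seconds, the same value it uses for the correlator Interval.

```python
from typing import Optional, Sequence


def samples_in_row(samples: Sequence[float], threshold: float,
                   greater_equal: bool, num_samples: int) -> bool:
    """Hypothetical stand-in for SamplesInRow.DS: True when the last
    num_samples samples all satisfy Direction against Threshold."""
    if len(samples) < num_samples:
        return False
    tail = samples[-num_samples:]
    if greater_equal:                           # Direction = GreaterEqual (DS1, DS2)
        return all(s >= threshold for s in tail)
    return all(s < threshold for s in tail)     # Direction = Less (DS3, DS4)


def monitor_state(first: Sequence[float], second: Sequence[float],
                  first_threshold: float, second_threshold: float,
                  num_samples: int) -> Optional[str]:
    # Breach: both counters stay at/above their thresholds (DS1 AND DS2)
    breach = (samples_in_row(first, first_threshold, True, num_samples)
              and samples_in_row(second, second_threshold, True, num_samples))
    # Normal: either counter stays below its threshold (DS3 OR DS4)
    normal = (samples_in_row(first, first_threshold, False, num_samples)
              or samples_in_row(second, second_threshold, False, num_samples))
    if breach:
        return "CorrelatedPerformanceBreach"
    if normal:
        return "CorrelatedPerformanceNormal"
    return None  # neither detection matches, so the monitor keeps its current state


# Thresholds from the unit monitor that follows: 90 % CPU, queue length 5, 3 samples
print(monitor_state([95, 92, 97], [6, 8, 7], 90, 5, 3))  # CorrelatedPerformanceBreach
print(monitor_state([95, 92, 97], [2, 3, 1], 90, 5, 3))  # CorrelatedPerformanceNormal
```

The two detections can never fire together: a breach needs both counters above their thresholds for the whole run, while normal needs at least one of them below.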

Make use of what you cooked so far in a Unit Monitor:

      <UnitMonitor ID="CPUOverloaded.UM" Accessibility="Public" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Correlated2PerfCounters.BothThresholdsAbove.VerboseOut.2State.UMT" ConfirmDelivery="true">
        <Category>Custom</Category>
        <AlertSettings AlertMessage="CPUOverloaded.UM_AlertMessageResourceID">
          <AlertOnState>Warning</AlertOnState>
          <AutoResolve>true</AutoResolve>
          <AlertPriority>Normal</AlertPriority>
          <AlertSeverity>Warning</AlertSeverity>
          <AlertParameters>
            <AlertParameter1>$Data/Context/Property[@Name='VerboseOut']$</AlertParameter1>
          </AlertParameters>
        </AlertSettings>
        <OperationalStates>
          <OperationalState ID="CorrelatedPerformanceNormal" MonitorTypeStateID="CorrelatedPerformanceNormal" HealthState="Success" />
          <OperationalState ID="CorrelatedPerformanceBreach" MonitorTypeStateID="CorrelatedPerformanceBreach" HealthState="Warning" />
        </OperationalStates>
        <Configuration>
          <TargetComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</TargetComputerName>
          <FirstCorrelatedCounterName>% Processor Time</FirstCorrelatedCounterName>
          <FirstCorrelatedObjectName>Processor</FirstCorrelatedObjectName>
          <FirstCorrelatedInstanceName>_Total</FirstCorrelatedInstanceName>
          <FirstThreshold>90</FirstThreshold>
          <SecondCorrelatedCounterName>Processor Queue Length</SecondCorrelatedCounterName>
          <SecondCorrelatedObjectName>System</SecondCorrelatedObjectName>
          <SecondCorrelatedInstanceName></SecondCorrelatedInstanceName>
          <SecondThreshold>5</SecondThreshold>
          <IntervalSeconds>300</IntervalSeconds>
          <NumSamples>3</NumSamples>
          <Correlator>
            <CorrelationExpression>
              <Expression />
            </CorrelationExpression>
            <Count>1</Count>
            <Interval>900</Interval>
            <Latency>0</Latency>
            <DrainWait>0</DrainWait>
            <CorrelationOrder>AnyOrder</CorrelationOrder>
            <CorrelationItemPolicy>First</CorrelationItemPolicy>
          </Correlator>
          <ScriptName>GetProcessInfo.ps1</ScriptName>
          <ScriptBody> 
    
            param($CpuUtil,$CpuQueue)

            $nl = "`r`n"
            $oAPI = new-object -comObject "MOM.ScriptAPI"
            $pb = $oAPI.CreatePropertyBag()
            
            $VerboseOut = "Cpu Utilization: " + $CpuUtil + $nl
            $VerboseOut = $VerboseOut + "Cpu Queue: " + $CpuQueue + $nl + $nl
            $VerboseOut = $VerboseOut + "ProcessName(PID) : CPUTime" + $nl
            
            $top10 = (Get-Process | Sort-Object CPU -desc | Select-Object -first 10)
            ForEach($proc in $top10)
            {
              $VerboseOut = $VerboseOut + $proc.ProcessName + "(" + $proc.Id + ") : " + $proc.CPU + $nl
            }
            $pb.AddValue("VerboseOut",$VerboseOut)

            #$oAPI.Return($pb)
            $pb

          </ScriptBody>
          <Parameters>
            <Parameter>
              <Name>CpuUtil</Name>
              <Value>$Data/Item0Context/DataItem/SampleValue$</Value>
            </Parameter>
            <Parameter>
              <Name>CpuQueue</Name>
              <Value>$Data/Item1Context/DataItem/SampleValue$</Value>
            </Parameter>
          </Parameters>
        </Configuration>
      </UnitMonitor>
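To preview what the alert will contain, the formatting done by GetProcessInfo.ps1 can be reproduced in Python. The sketch also fills the {0} placeholder of the alert description, which SCOM replaces with AlertParameter1. It assumes the process list arrives as (ProcessName, Id, CPU) tuples. It also assumes CPU may be missing for some processes, which this sketch treats as 0 when sorting. The description template only approximates the display string defined next.

```python
from typing import Iterable, Optional, Tuple

NL = "\r\n"  # same line break the script uses ("`r`n")

# Approximation of the alert description; SCOM fills {0} with AlertParameter1
ALERT_DESCRIPTION = ("The processor counters that exceeded the threshold are presented below, "
                     "together with the top hog processes stats:" + NL + NL + "{0}")


def build_verbose_out(cpu_util: float, cpu_queue: float,
                      processes: Iterable[Tuple[str, int, Optional[float]]],
                      top: int = 10) -> str:
    """Mirror of the VerboseOut text: counter values, then the top processes by CPU."""
    out = "Cpu Utilization: " + str(cpu_util) + NL
    out += "Cpu Queue: " + str(cpu_queue) + NL + NL
    out += "ProcessName(PID) : CPUTime" + NL
    # Sort descending by CPU (like Sort-Object CPU -desc), treating a missing value as 0
    ranked = sorted(processes, key=lambda p: p[2] or 0, reverse=True)[:top]
    for name, pid, cpu in ranked:
        out += f"{name}({pid}) : {'' if cpu is None else cpu}" + NL
    return out


verbose = build_verbose_out(95.2, 7, [("svchost", 800, 10.0), ("sqlservr", 1200, 5000.5)])
print(ALERT_DESCRIPTION.format(verbose))
```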

And, of course, add some cosmetics (Display Strings):

        <DisplayString ElementID="CPUOverloaded.UM">
          <Name>CPU Overloaded Unit Monitor</Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="CPUOverloaded.UM" SubElementID="CorrelatedPerformanceNormal">
          <Name>Correlated Performance Cleared</Name>
        </DisplayString>
        <DisplayString ElementID="CPUOverloaded.UM" SubElementID="CorrelatedPerformanceBreach">
          <Name>Correlated Performance Raised</Name>
        </DisplayString>
        <DisplayString ElementID="CPUOverloaded.UM_AlertMessageResourceID">
          <Name>Processor Is Overloaded</Name>
          <Description>
            The processor counters that exceeded the threshold are presented below, together with the top hog processes stats:

            {0}
          </Description>
        </DisplayString>

You can replace the counters with any other two and monitor their correlated threshold breach in the same way. Enjoy!

Fighting Clutter in SCOM

When I started writing this post, I lingered for a while over what the title should be. I wanted it to suggest the continuous fight that all SCOM administrators face to keep their monitoring environment in order.
In particular, I would like to cover a few aspects of this fight and suggest some solutions.

Management Pack changes and interdependencies

You have to agree with me that the quality of your SCOM monitoring environment is heavily driven by the quality of the Management Packs imported at any given time.
And regardless of how intimately you know Management Packs, you do know that they change several times during their lifetime (some of them rather often). So here is the first challenge of staying in control: auditing Management Pack changes.
Each Management Pack also has dependencies: it depends on some MPs, while other MPs depend on it. This becomes evident when, for instance, you try to delete an MP and cannot, because SCOM complains that other MPs depend on it.
Over its lifetime, a SCOM environment will most probably see an ever-increasing number of MPs. Even if naming conventions are in place and some MP authoring guidelines are documented to keep things tidy, chances are that some cleanup will be needed at times.
Cleanup is a tedious task, and the challenge is where to start and how to approach it. If you have ever faced such a situation, you have probably thought how useful a “map” would be, say a diagram showing the MPs and the dependencies between them.
Hence the idea of authoring a management pack that discovers instances of a class representing an MP.

<ClassType ID="MPAudit.MP" Accessibility="Public" Abstract="false" Base="System!System.Entity" Hosted="false" Singleton="false" Extension="false">
          <Property ID="MPName" Type="string" AutoIncrement="false" Key="true" CaseSensitive="true" MaxLength="256" MinLength="0" Required="true" Scale="0" />
          <Property ID="MPDisplayName" Type="string" AutoIncrement="false" Key="false" CaseSensitive="false" MaxLength="256" MinLength="0" Required="false" Scale="0" />
          <Property ID="MPFriendlyName" Type="string" AutoIncrement="false" Key="false" CaseSensitive="false" MaxLength="256" MinLength="0" Required="false" Scale="0" />
          <Property ID="Version" Type="string" AutoIncrement="false" Key="false" CaseSensitive="false" MaxLength="256" MinLength="0" Required="false" Scale="0" />
          <Property ID="TimeCreated" Type="string" AutoIncrement="false" Key="false" CaseSensitive="false" MaxLength="256" MinLength="0" Required="false" Scale="0" />
</ClassType>

And while discovering the MPs, let’s also discover the relationships between MPs.

<RelationshipType ID="MPAudit.MPContainsMP" Accessibility="Public" Abstract="false" Base="System!System.Containment">
          <Source ID="Source" Type="MPAudit.MP" />
          <Target ID="Target" Type="MPAudit.MP" />
</RelationshipType>

You can use the following discovery:

<Discovery ID="MPAudit.MP.Discovery.Direct" Enabled="true" Target="SC!Microsoft.SystemCenter.RootManagementServer" ConfirmDelivery="true" Remotable="true" Priority="Normal">
        <Category>Discovery</Category>
        <DiscoveryTypes>
          <DiscoveryClass TypeID="MPAudit.MP" />
          <DiscoveryRelationship TypeID="MPAudit.MPContainsMP" />
        </DiscoveryTypes>
        <DataSource ID="PSDiscovery" TypeID="Windows!Microsoft.Windows.TimedPowerShell.DiscoveryProvider">
          <IntervalSeconds>14400</IntervalSeconds>
          <SyncTime />
          <ScriptName>MPsDiscovery.ps1</ScriptName>
          <ScriptBody><![CDATA[param($sourceId,$managedEntityId)

$erroractionpreference = "SilentlyContinue"

$api = new-object -comObject 'MOM.ScriptAPI'
$discoveryData = $api.CreateDiscoveryData(0, $SourceId, $ManagedEntityId)

$MS = gc env:computername
$ScriptName = "MPsDiscovery.ps1"
$LogEventID = 1510

$evt = New-Object System.Diagnostics.EventLog("Operations Manager")
$evt.Source = "SCAudit"
$infoevent  = [System.Diagnostics.EventLogEntryType]::Information
$warnevent  = [System.Diagnostics.EventLogEntryType]::Warning
$errorevent = [System.Diagnostics.EventLogEntryType]::Error

Write-Host "INF: Script started on $MS"
$evt.WriteEntry("Script $ScriptName started on $MS",$infoevent,$LogEventID)

$ModuleImportError = $false
Try
{
	#Import-Module OperationsManager

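	$setupKey = Get-Item -Path "HKLM:\Software\Microsoft\Microsoft Operations Manager\3.0\Setup"
	$installDirectory = $setupKey.GetValue("InstallDirectory") | Split-Path
	$psmPath = $installDirectory + '\Powershell\OperationsManager\OperationsManager.psm1'

	# -ErrorAction Stop makes failures terminating, so the Catch block below can run
	# even though the script sets the error action preference to SilentlyContinue
	Import-Module $psmPath -ErrorAction Stop

	$conn = (New-SCOMManagementGroupConnection -ComputerName $MS -ErrorAction Stop)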
}
Catch [system.exception]
{
	$evt.WriteEntry("Errors detected while importing module OperationsManager on $MS; $error",$errorevent,$LogEventID)
	$ModuleImportError = $true
}

If (!$ModuleImportError)
{

	[System.XML.XMLDocument]$oXML = New-Object System.XML.XMLDocument

	[System.XML.XMLElement]$oXMLmps = $oXML.CreateElement("mps")
	[void]$oXML.appendChild($oXMLmps)  # [void] keeps the returned node out of the script output

	$mps = Get-SCOMManagementPack

	Foreach ($mp in $mps)
	{

		[System.XML.XMLElement]$oXMLmp = $oXMLmps.appendChild($oXML.CreateElement("mp"))
		$oXMLmp.SetAttribute("name",$mp.Name)
		$oXMLmp.SetAttribute("displayname",$mp.DisplayName)
		$oXMLmp.SetAttribute("friendlyname",$mp.FriendlyName)
		$oXMLmp.SetAttribute("version",$mp.Version)
		$oXMLmp.SetAttribute("timecreated",$mp.TimeCreated)
		$oXMLmp.SetAttribute("lastmodified",$mp.LastModified)

		foreach ($mpref in $mp.References)
		{

			[System.XML.XMLElement]$oXMLmpref=$oXMLmp.appendChild($oXML.CreateElement("mpref"))
			$oXMLmpref.SetAttribute("name",$mpref.Value.Name)

		}

	}

	Foreach ($MP In $oXML.mps.mp) 
	{ 
		$MPinstance = $discoveryData.CreateClassInstance("$MPElement[Name='MPAudit.MP']$")
		$MPinstance.AddProperty("$MPElement[Name='MPAudit.MP']/MPName$", $MP.name)
		#$MPinstance.AddProperty("$MPElement[Name='System!System.Entity']/DisplayName$", $MP.name)
		$MPinstance.AddProperty("$MPElement[Name='MPAudit.MP']/MPDisplayName$", $MP.displayname)
		$MPinstance.AddProperty("$MPElement[Name='MPAudit.MP']/MPFriendlyName$", $MP.friendlyname)
		$MPinstance.AddProperty("$MPElement[Name='MPAudit.MP']/Version$", $MP.version)
		$MPinstance.AddProperty("$MPElement[Name='MPAudit.MP']/TimeCreated$", $MP.timecreated)
		$discoveryData.AddInstance($MPinstance)
	}
	
	Foreach ($MPRef In $oXML.mps.mp.mpref)
	{
		If ($MPRef -ne $null)
		{
			$MPRefInstance = $discoveryData.CreateClassInstance("$MPElement[Name='MPAudit.MP']$")
			$MPRefInstance.AddProperty("$MPElement[Name='MPAudit.MP']/MPName$", $MPRef.name)
			
			$MPParentInstance = $discoveryData.CreateClassInstance("$MPElement[Name='MPAudit.MP']$")
			$MPParentInstance.AddProperty("$MPElement[Name='MPAudit.MP']/MPName$", $MPRef.ParentNode.name)
			
			$rel = $discoveryData.CreateRelationshipInstance("$MPElement[Name='MPAudit.MPContainsMP']$")
			$rel.source = $MPParentInstance
			$rel.target = $MPRefInstance
			$discoveryData.AddInstance($rel)
		}
	}	
			
}

$evt.WriteEntry("Script $ScriptName finished on $MS",$infoevent,$LogEventID)

$discoveryData]]></ScriptBody>
          <Parameters>
            <Parameter>
              <Name>sourceID</Name>
              <Value>$MPElement$</Value>
            </Parameter>
            <Parameter>
              <Name>managedEntityID</Name>
              <Value>$Target/Id$</Value>
            </Parameter>
          </Parameters>
          <TimeoutSeconds>120</TimeoutSeconds>
        </DataSource>
</Discovery>
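The discovery script first collects the MPs and their references into an intermediate XML document. It then emits one MPAudit.MP instance per MP and one MPAudit.MPContainsMP relationship per reference, with the referencing MP as source and the referenced MP as target. The Python sketch below flattens that logic so the direction of the relationship is easy to see. The input is assumed to be a dictionary of MP name to referenced MP names, and the Contoso names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def discover(mps: Dict[str, Iterable[str]]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """mps maps each imported MP name to the names of the MPs it references.
    Returns the MPAudit.MP keys (MPName) and the MPAudit.MPContainsMP pairs."""
    instances: Set[str] = set(mps)               # one instance per imported MP
    relationships: Set[Tuple[str, str]] = set()
    for parent, references in mps.items():
        for ref in references:
            instances.add(ref)                   # referenced MP, identified by its key only
            relationships.add((parent, ref))     # source = referencing MP, target = referenced MP
    return instances, relationships


instances, rels = discover({
    "Contoso.App.Monitoring": ["Contoso.App.Library"],
    "Contoso.App.Library": [],
})
print(sorted(rels))  # [('Contoso.App.Monitoring', 'Contoso.App.Library')]
```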

Once the MPs are discovered, we just need to attach a simulation monitor to them. It will simulate the impact that a change to one MP has on the other MPs that depend on it; no alert is needed, of course.
An example of how to implement such a test monitor:

<UnitMonitorType ID="MPAudit.SimulateMPChange.UMT" Accessibility="Public">
        <MonitorTypeStates>
          <MonitorTypeState ID="Ok" NoDetection="false" />
          <MonitorTypeState ID="Bad" NoDetection="false" />
        </MonitorTypeStates>
        <Configuration>
          <xsd:element minOccurs="1" name="IntervalSeconds" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
        </OverrideableParameters>
        <MonitorImplementation>
          <MemberModules>
            <DataSource ID="DS1" TypeID="System!System.SimpleScheduler">
              <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
              <SyncTime></SyncTime>
            </DataSource>
            <ProbeAction ID="PassThrough" TypeID="System!System.PassThroughProbe" />
          </MemberModules>
          <RegularDetections>
            <RegularDetection MonitorTypeStateID="Ok">
              <Node ID="DS1" />
            </RegularDetection>
          </RegularDetections>
          <OnDemandDetections>
            <OnDemandDetection MonitorTypeStateID="Bad">
              <Node ID="PassThrough" />
            </OnDemandDetection>
          </OnDemandDetections>
        </MonitorImplementation>
</UnitMonitorType>
<UnitMonitor ID="MPAudit.MPSimulateMPMonitor.UM" Accessibility="Public" Enabled="true" Target="MPAudit.MP" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="MPAudit.SimulateMPChange.UMT" ConfirmDelivery="true">
        <Category>Custom</Category>
        <OperationalStates>
          <OperationalState ID="Green" MonitorTypeStateID="Ok" HealthState="Success" />
          <OperationalState ID="Yellow" MonitorTypeStateID="Bad" HealthState="Warning" />
        </OperationalStates>
        <Configuration>
          <IntervalSeconds>600</IntervalSeconds>
        </Configuration>
</UnitMonitor>

An extra ingredient (a dependency monitor) is also required for our solution to work.

<DependencyMonitor ID="MPAudit.MPSimulationDependencyMonitor" Accessibility="Public" Enabled="true" Target="MPAudit.MP" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" RelationshipType="MPAudit.MPContainsMP" MemberMonitor="Health!System.Health.AvailabilityState">
        <Category>Custom</Category>
        <Algorithm>WorstOf</Algorithm>
        <MemberUnAvailable>Error</MemberUnAvailable>
</DependencyMonitor>

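As you can see, I kept things very basic: the unit monitor periodically resets itself to healthy (every IntervalSeconds, 600 seconds here), and on demand (when the Recalculate Health button is pushed) it turns yellow. The dependency monitor then rolls the worst availability state of the referenced MPs up to the MPs that reference them. Just enough for our simulations to work, right? Here is how it works when implemented as above.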

The example might not fully reflect a real scenario, but I think it makes the point. Let’s say I have the following EMC MPs imported and I am wondering: what if I change or remove the Monitoring MP? Which other MPs will be affected?

[Screenshot: the imported EMC management packs]

In Discovered Inventory, select the EMC.SI.Monitoring MP object, open Health Explorer and, on the Simulate MP Change unit monitor, push the Recalculate Health button.

[Screenshot: Health Explorer for the EMC.SI.Monitoring MP object]

Then check which of the other MPs changed state as well.
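You can cross-check offline what the recalculation shows. Each MP's dependency monitor takes the worst availability state of the MPs it references. As a result, turning one MP yellow propagates up the whole chain of MPs that reference it, directly or indirectly. The Python sketch below models that propagation as a reverse breadth-first search over (source, target) pairs like the ones the discovery creates. The MP names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def affected_by_change(relationships: Iterable[Tuple[str, str]], changed: str) -> Set[str]:
    """Return every MP whose health would roll up the changed MP's state.
    relationships holds (source, target) = (referencing MP, referenced MP)."""
    referenced_by = defaultdict(set)
    for source, target in relationships:
        referenced_by[target].add(source)
    affected: Set[str] = set()
    queue = deque([changed])
    while queue:                                # breadth-first walk up the references
        current = queue.popleft()
        for parent in referenced_by[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    affected.discard(changed)                   # the changed MP itself is not a dependent
    return affected


rels = {
    ("Contoso.Reports", "Contoso.Monitoring"),
    ("Contoso.Dashboards", "Contoso.Reports"),
    ("Contoso.Monitoring", "Contoso.Library"),
}
print(sorted(affected_by_change(rels, "Contoso.Monitoring")))
# ['Contoso.Dashboards', 'Contoso.Reports']
```

Note that Contoso.Library is not affected: it is referenced by Contoso.Monitoring, so a change to Contoso.Monitoring does not reach it.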

[Screenshot: the MPs whose state changed after the recalculation]

You can also draw diagrams for any MP you are interested in.

[Screenshot: diagram view of an MP and its dependencies]

As you have probably anticipated, it is now easy to follow up with some monitoring of MP changes: every so often (every 15 minutes, for example), a data source would return a property bag with the MP(s) changed during the last interval. I will leave the implementation of this monitoring to you as homework.
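Without spoiling the homework entirely, the core filter could look like the Python sketch below. It makes a few assumptions:

- The data source has each MP's LastModified value at hand. The discovery script above already reads it into its intermediate XML.
- Both timestamps use the same time zone.
- The property bag is represented as a dictionary whose key names, ChangedCount and ChangedMPs, are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict


def changed_mps_bag(last_modified: Dict[str, datetime], now: datetime,
                    interval: timedelta = timedelta(minutes=15)) -> Dict[str, object]:
    """Return a property-bag-like dict listing the MPs modified during the last interval.
    The key names ChangedCount and ChangedMPs are hypothetical."""
    since = now - interval
    changed = sorted(name for name, modified in last_modified.items()
                     if since < modified <= now)
    return {"ChangedCount": len(changed), "ChangedMPs": ";".join(changed)}


now = datetime(2024, 1, 1, 12, 0)
bag = changed_mps_bag({
    "Contoso.Monitoring": datetime(2024, 1, 1, 11, 50),  # inside the 15-minute window
    "Contoso.Library": datetime(2024, 1, 1, 9, 0),       # too old
}, now)
print(bag)  # {'ChangedCount': 1, 'ChangedMPs': 'Contoso.Monitoring'}
```

The window is half-open (start excluded, end included), so an MP modified exactly on an interval boundary is reported once, not twice.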

Where are groups stored?

One piece of information missing from the OpsMgr console UI is an easy way (or any way, for that matter) to find where a group is stored. So why not have our MP discover the groups as well?

<Discovery ID="MPAudit.Group.Discovery.Direct" Enabled="true" Target="SC!Microsoft.SystemCenter.RootManagementServer" ConfirmDelivery="true" Remotable="true" Priority="Normal">
        <Category>Discovery</Category>
        <DiscoveryTypes>
          <DiscoveryClass TypeID="MPAudit.Group" />
        </DiscoveryTypes>
        <DataSource ID="PSDiscovery" TypeID="Windows!Microsoft.Windows.TimedPowerShell.DiscoveryProvider">
          <IntervalSeconds>14400</IntervalSeconds>
          <SyncTime />
          <ScriptName>GroupsDiscovery.ps1</ScriptName>
          <ScriptBody><![CDATA[param($sourceId,$managedEntityId)

$erroractionpreference = "SilentlyContinue"

$api = new-object -comObject 'MOM.ScriptAPI'
$discoveryData = $api.CreateDiscoveryData(0, $SourceId, $ManagedEntityId)

$MS = gc env:computername
$ScriptName = "GroupsDiscovery.ps1"
$LogEventID = 510

$evt = New-Object System.Diagnostics.EventLog("Operations Manager")
$evt.Source = "SCAudit"
$infoevent  = [System.Diagnostics.EventLogEntryType]::Information
$warnevent  = [System.Diagnostics.EventLogEntryType]::Warning
$errorevent = [System.Diagnostics.EventLogEntryType]::Error

Write-Host "INF: Script started on $MS"
$evt.WriteEntry("Script $ScriptName started on $MS",$infoevent,$LogEventID)

$ModuleImportError = $false
Try
{
	#Import-Module OperationsManager

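	$setupKey = Get-Item -Path "HKLM:\Software\Microsoft\Microsoft Operations Manager\3.0\Setup"
	$installDirectory = $setupKey.GetValue("InstallDirectory") | Split-Path
	$psmPath = $installDirectory + '\Powershell\OperationsManager\OperationsManager.psm1'

	# -ErrorAction Stop makes failures terminating, so the Catch block below can run
	# even though the script sets the error action preference to SilentlyContinue
	Import-Module $psmPath -ErrorAction Stop

	$conn = (New-SCOMManagementGroupConnection -ComputerName $MS -ErrorAction Stop)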
}
Catch [system.exception]
{
	$evt.WriteEntry("Errors detected while importing module OperationsManager on $MS; $error",$errorevent,$LogEventID)
	$ModuleImportError = $true
}

If (!$ModuleImportError)
{

	$groups = Get-SCOMGroup

	Foreach ($group in $groups)
	{
		
		$groupinstance = $discoveryData.CreateClassInstance("$MPElement[Name='MPAudit.Group']$")
		$groupinstance.AddProperty("$MPElement[Name='MPAudit.Group']/FullName$", $group.FullName)
		$groupinstance.AddProperty("$MPElement[Name='MPAudit.Group']/DisplayName$", $group.DisplayName)
		$groupinstance.AddProperty("$MPElement[Name='MPAudit.Group']/MPWhereIsSaved$", $group.GetMostDerivedClasses().GetManagementPack().Name)
		$groupinstance.AddProperty("$MPElement[Name='MPAudit.Group']/TimeAdded$", $group.TimeAdded.ToString('g'))
		$discoveryData.AddInstance($groupinstance)
	}
			
}

$evt.WriteEntry("Script $ScriptName finished on $MS",$infoevent,$LogEventID)

$discoveryData]]></ScriptBody>
          <Parameters>
            <Parameter>
              <Name>sourceID</Name>
              <Value>$MPElement$</Value>
            </Parameter>
            <Parameter>
              <Name>managedEntityID</Name>
              <Value>$Target/Id$</Value>
            </Parameter>
          </Parameters>
          <TimeoutSeconds>120</TimeoutSeconds>
        </DataSource>
</Discovery>

What you will get is a nice inventory of your groups and a quick reference on where their definition is saved.

[Screenshot: the discovered groups and the MP where each one is saved]
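The same inventory can also be read the other way round: which groups does a given MP hold? That is handy before deleting or reworking an MP. Here is a small Python sketch, assuming the (DisplayName, MPWhereIsSaved) pairs have been exported from the discovered MPAudit.Group instances. The group and MP names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def groups_by_mp(groups: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """groups: (group DisplayName, MPWhereIsSaved) pairs.
    Returns MP name -> sorted list of the groups defined in it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for display_name, mp_name in groups:
        index[mp_name].append(display_name)
    return {mp: sorted(names) for mp, names in index.items()}


inventory = groups_by_mp([
    ("Contoso Web Servers", "Contoso.Groups"),
    ("Contoso SQL Servers", "Contoso.Groups"),
    ("Test Group", "Contoso.Overrides"),
])
print(inventory["Contoso.Groups"])  # ['Contoso SQL Servers', 'Contoso Web Servers']
```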

To summarize, I believe you now have some controls in place to keep your SCOM environment clean. For the greater good…