Monthly Archives: April 2015

Correlating Two Counters Threshold Breach

I recently received a request to rewrite the “Total CPU Utilization Percentage” unit monitor. The customer was unhappy that the output of the “List Top CPU Consuming processes” diagnostic was not included in the alert when it fired.

Looking closer at how this monitor is implemented in the Windows Server Operating System Management Pack, I was surprised by the way the authors chose to correlate the System\Processor Queue Length and Processor\% Processor Time\_Total performance counters exceeding their thresholds. There is no trace of the System.Correlator (https://msdn.microsoft.com/en-us/library/ff458713.aspx) condition detection module type.

The MSDN documentation reads: “The System.Correlator condition detection module type is used to correlate two incoming data item types, to detect a specific counting and/or ordering pattern of data items. A module of this type accepts data of any type and outputs System.CorrelatorData data.” So if it accepts any type, why not System.Performance.Data?

So I decided to give it a try and implement the monitoring from scratch with a new approach. I kept it generic, so the solution should come in handy whenever a threshold breach of any other two performance counters needs to be correlated.

To summarize the requirements, the monitoring should:

  • Correlate two counters exceeding configurable thresholds over consecutive samples
  • When both counters breach their own thresholds, run a diagnostic and capture its output in the alert description
  • When either counter returns to normal (below its threshold), turn the monitor healthy

So let’s cook!

Start with the following DataSource Module Type:

      <DataSourceModuleType ID="SamplesInRow.DS" Accessibility="Public" Batching="false">
        <Configuration>
          <IncludeSchemaTypes>
            <SchemaType>System!System.ExpressionEvaluatorSchema</SchemaType>
          </IncludeSchemaTypes>
          <xsd:element name="TargetComputerName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="CounterName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="ObjectName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="InstanceName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="Threshold" type="xsd:double" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="IntervalSeconds" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="NumSamples" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="Direction" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="Threshold" Selector="$Config/Threshold$" ParameterType="double" />
        </OverrideableParameters>
        <ModuleImplementation>
          <Composite>
            <MemberModules>
              <DataSource TypeID="Perf!System.Performance.DataProvider" ID="DS">
                <ComputerName>$Config/TargetComputerName$</ComputerName>
                <CounterName>$Config/CounterName$</CounterName>
                <ObjectName>$Config/ObjectName$</ObjectName>
                <InstanceName>$Config/InstanceName$</InstanceName>
                <AllInstances>false</AllInstances>
                <Frequency>$Config/IntervalSeconds$</Frequency>
              </DataSource>
              <ConditionDetection TypeID="Perf!System.Performance.ConsecutiveSamplesCondition" ID="CDThreshold">
                <Threshold>$Config/Threshold$</Threshold>
                <Direction>$Config/Direction$</Direction>
              </ConditionDetection>
              <ConditionDetection TypeID="System!System.ExpressionFilter" ID="CDSufficientSamples">
                <Expression>
                  <SimpleExpression>
                    <ValueExpression>
                      <XPathQuery Type="Double">Value</XPathQuery>
                    </ValueExpression>
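                    <!-- Value carries the count of consecutive samples beyond the threshold; Greater requires strictly more than NumSamples -->
                    <Operator>Greater</Operator>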
                    <ValueExpression>
                      <Value Type="Double">$Config/NumSamples$</Value>
                    </ValueExpression>
                  </SimpleExpression>
                </Expression>
              </ConditionDetection>
            </MemberModules>
            <Composition>
              <Node ID="CDSufficientSamples">
                <Node ID="CDThreshold">
                  <Node ID="DS" />
                </Node>
              </Node>
            </Composition>
          </Composite>
        </ModuleImplementation>
        <OutputType>Perf!System.Performance.Data</OutputType>
      </DataSourceModuleType>
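
The System! and Perf! prefixes in the module above, like the Windows! and Health! prefixes used further down, are reference aliases that have to be declared in the management pack manifest. A minimal sketch of that section follows. The alias-to-library mapping is my reading of the types used in this post, and the Version values are placeholders (assumptions) to be replaced with the versions imported in your management group.

```xml
<!-- Sketch of the Manifest references; Version values are assumptions -->
<References>
  <Reference Alias="System">
    <ID>System.Library</ID>
    <Version>7.0.8433.0</Version>
    <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
  </Reference>
  <Reference Alias="Perf">
    <ID>System.Performance.Library</ID>
    <Version>7.0.8433.0</Version>
    <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
  </Reference>
  <Reference Alias="Windows">
    <ID>Microsoft.Windows.Library</ID>
    <Version>7.0.8433.0</Version>
    <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
  </Reference>
  <Reference Alias="Health">
    <ID>System.Health.Library</ID>
    <Version>7.0.8433.0</Version>
    <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
  </Reference>
</References>
```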

Add two Correlator Condition Detection Module Types:

      <ConditionDetectionModuleType ID="AndItemCountsFilter.CorrelatorCD" Accessibility="Public" Batching="false" Stateful="true" PassThrough="false">
        <Configuration>
          <IncludeSchemaTypes>
            <SchemaType>System!System.ExpressionEvaluatorSchema</SchemaType>
          </IncludeSchemaTypes>
          <xsd:element name="Correlator" type="CorrelatorType" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        </Configuration>
        <ModuleImplementation Isolation="Any">
          <Composite>
            <MemberModules>
              <ConditionDetection ID="Correlator" TypeID="System!System.Correlator">
                <Correlator>$Config/Correlator$</Correlator>
              </ConditionDetection>
              <ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
                <Expression>
                  <And>
                    <Expression>
                      <SimpleExpression>
                        <ValueExpression>
                          <XPathQuery Type="UnsignedInteger">Item0Count</XPathQuery>
                        </ValueExpression>
                        <Operator>Equal</Operator>
                        <ValueExpression>
                          <Value Type="UnsignedInteger">1</Value>
                        </ValueExpression>
                      </SimpleExpression>
                    </Expression>
                    <Expression>
                      <SimpleExpression>
                        <ValueExpression>
                          <XPathQuery Type="UnsignedInteger">Item1Count</XPathQuery>
                        </ValueExpression>
                        <Operator>Equal</Operator>
                        <ValueExpression>
                          <Value Type="UnsignedInteger">1</Value>
                        </ValueExpression>
                      </SimpleExpression>
                    </Expression>
                  </And>
                </Expression>
              </ConditionDetection>
            </MemberModules>
            <Composition>
              <Node ID="Filter">
                <Node ID="Correlator" />
              </Node>
            </Composition>
          </Composite>
        </ModuleImplementation>
        <OutputType>System!System.CorrelatorData</OutputType>
        <InputTypes>
          <InputType>System!System.BaseData</InputType>
          <InputType>System!System.BaseData</InputType>
        </InputTypes>
      </ConditionDetectionModuleType>
      <ConditionDetectionModuleType ID="OrItemCountsFilter.CorrelatorCD" Accessibility="Public" Batching="false" Stateful="true" PassThrough="false">
        <Configuration>
          <IncludeSchemaTypes>
            <SchemaType>System!System.ExpressionEvaluatorSchema</SchemaType>
          </IncludeSchemaTypes>
          <xsd:element name="Correlator" type="CorrelatorType" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        </Configuration>
        <ModuleImplementation Isolation="Any">
          <Composite>
            <MemberModules>
              <ConditionDetection ID="Correlator" TypeID="System!System.Correlator">
                <Correlator>$Config/Correlator$</Correlator>
              </ConditionDetection>
              <ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
                <Expression>
                  <Or>
                    <Expression>
                      <SimpleExpression>
                        <ValueExpression>
                          <XPathQuery Type="UnsignedInteger">Item0Count</XPathQuery>
                        </ValueExpression>
                        <Operator>Equal</Operator>
                        <ValueExpression>
                          <Value Type="UnsignedInteger">1</Value>
                        </ValueExpression>
                      </SimpleExpression>
                    </Expression>
                    <Expression>
                      <SimpleExpression>
                        <ValueExpression>
                          <XPathQuery Type="UnsignedInteger">Item1Count</XPathQuery>
                        </ValueExpression>
                        <Operator>Equal</Operator>
                        <ValueExpression>
                          <Value Type="UnsignedInteger">1</Value>
                        </ValueExpression>
                      </SimpleExpression>
                    </Expression>
                  </Or>
                </Expression>
              </ConditionDetection>
            </MemberModules>
            <Composition>
              <Node ID="Filter">
                <Node ID="Correlator" />
              </Node>
            </Composition>
          </Composite>
        </ModuleImplementation>
        <OutputType>System!System.CorrelatorData</OutputType>
        <InputTypes>
          <InputType>System!System.BaseData</InputType>
          <InputType>System!System.BaseData</InputType>
        </InputTypes>
      </ConditionDetectionModuleType>

Mix them well in a Unit Monitor Type:

      <UnitMonitorType ID="Correlated2PerfCounters.BothThresholdsAbove.VerboseOut.2State.UMT" Accessibility="Public">
        <MonitorTypeStates>
          <MonitorTypeState ID="CorrelatedPerformanceBreach" />
          <MonitorTypeState ID="CorrelatedPerformanceNormal" />
        </MonitorTypeStates>
        <Configuration>
          <IncludeSchemaTypes>
            <SchemaType>System!System.ExpressionEvaluatorSchema</SchemaType>
            <SchemaType>Windows!Microsoft.Windows.PowerShellSchema</SchemaType>
          </IncludeSchemaTypes>
          <xsd:element name="TargetComputerName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="FirstCorrelatedCounterName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="FirstCorrelatedObjectName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="FirstCorrelatedInstanceName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="FirstThreshold" type="xsd:double" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="SecondCorrelatedCounterName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="SecondCorrelatedObjectName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="SecondCorrelatedInstanceName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="SecondThreshold" type="xsd:double" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="IntervalSeconds" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="NumSamples" type="xsd:integer" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="Correlator" type="CorrelatorType" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="ScriptName" type="NonNullString" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="ScriptBody" type="NonNullString" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
          <xsd:element name="Parameters" type="NamedParametersType" minOccurs="0" maxOccurs="1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="FirstThreshold" Selector="$Config/FirstThreshold$" ParameterType="double" />
          <OverrideableParameter ID="SecondThreshold" Selector="$Config/SecondThreshold$" ParameterType="double" />
        </OverrideableParameters>
        <MonitorImplementation>
          <MemberModules>
            <DataSource TypeID="SamplesInRow.DS" ID="DS1">
              <TargetComputerName>$Config/TargetComputerName$</TargetComputerName>
              <CounterName>$Config/FirstCorrelatedCounterName$</CounterName>
              <ObjectName>$Config/FirstCorrelatedObjectName$</ObjectName>
              <InstanceName>$Config/FirstCorrelatedInstanceName$</InstanceName>
              <Threshold>$Config/FirstThreshold$</Threshold>
              <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
              <NumSamples>$Config/NumSamples$</NumSamples>
              <Direction>GreaterEqual</Direction>
            </DataSource>
            <DataSource TypeID="SamplesInRow.DS" ID="DS2">
              <TargetComputerName>$Config/TargetComputerName$</TargetComputerName>
              <CounterName>$Config/SecondCorrelatedCounterName$</CounterName>
              <ObjectName>$Config/SecondCorrelatedObjectName$</ObjectName>
              <InstanceName>$Config/SecondCorrelatedInstanceName$</InstanceName>
              <Threshold>$Config/SecondThreshold$</Threshold>
              <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
              <NumSamples>$Config/NumSamples$</NumSamples>
              <Direction>GreaterEqual</Direction>
            </DataSource>
            <DataSource TypeID="SamplesInRow.DS" ID="DS3">
              <TargetComputerName>$Config/TargetComputerName$</TargetComputerName>
              <CounterName>$Config/FirstCorrelatedCounterName$</CounterName>
              <ObjectName>$Config/FirstCorrelatedObjectName$</ObjectName>
              <InstanceName>$Config/FirstCorrelatedInstanceName$</InstanceName>
              <Threshold>$Config/FirstThreshold$</Threshold>
              <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
              <NumSamples>$Config/NumSamples$</NumSamples>
              <Direction>Less</Direction>
            </DataSource>
            <DataSource TypeID="SamplesInRow.DS" ID="DS4">
              <TargetComputerName>$Config/TargetComputerName$</TargetComputerName>
              <CounterName>$Config/SecondCorrelatedCounterName$</CounterName>
              <ObjectName>$Config/SecondCorrelatedObjectName$</ObjectName>
              <InstanceName>$Config/SecondCorrelatedInstanceName$</InstanceName>
              <Threshold>$Config/SecondThreshold$</Threshold>
              <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
              <NumSamples>$Config/NumSamples$</NumSamples>
              <Direction>Less</Direction>
            </DataSource>
            <ProbeAction TypeID="System!System.PassThroughProbe" ID="OnDemandReset" />
            <ProbeAction TypeID="Windows!Microsoft.Windows.PowerShellPropertyBagTriggerOnlyProbe" ID="VerboseOut">
              <ScriptName>$Config/ScriptName$</ScriptName>
              <ScriptBody>$Config/ScriptBody$</ScriptBody>
              <Parameters>$Config/Parameters$</Parameters>
              <TimeoutSeconds>30</TimeoutSeconds>
            </ProbeAction>
            <ConditionDetection TypeID="AndItemCountsFilter.CorrelatorCD" ID="CorrelatedDataConditionBreach">
              <Correlator>$Config/Correlator$</Correlator>
            </ConditionDetection>
            <ConditionDetection TypeID="OrItemCountsFilter.CorrelatorCD" ID="CorrelatedDataConditionNormal">
              <Correlator>$Config/Correlator$</Correlator>
            </ConditionDetection>
            <ConditionDetection ID="FilterOut" TypeID="System!System.ExpressionFilter">
              <Expression>
                <Exists>
                  <ValueExpression>
                    <XPathQuery Type="String">Property[@Name='VerboseOut']</XPathQuery>
                  </ValueExpression>
                </Exists>
              </Expression>
            </ConditionDetection>
          </MemberModules>
          <RegularDetections>
            <RegularDetection MonitorTypeStateID="CorrelatedPerformanceBreach">
              <Node ID="FilterOut">
                <Node ID="VerboseOut">
                  <Node ID="CorrelatedDataConditionBreach">
                    <Node ID="DS1" />
                    <Node ID="DS2" />
                  </Node>
                </Node>
              </Node>
            </RegularDetection>
            <RegularDetection MonitorTypeStateID="CorrelatedPerformanceNormal">
              <Node ID="CorrelatedDataConditionNormal">
                <Node ID="DS3" />
                <Node ID="DS4" />
              </Node>
            </RegularDetection>
          </RegularDetections>
          <OnDemandDetections>
            <OnDemandDetection MonitorTypeStateID="CorrelatedPerformanceNormal">
              <Node ID="OnDemandReset" />
            </OnDemandDetection>
          </OnDemandDetections>
        </MonitorImplementation>
      </UnitMonitorType>

Now use what you have cooked so far in a Unit Monitor:

      <UnitMonitor ID="CPUOverloaded.UM" Accessibility="Public" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Correlated2PerfCounters.BothThresholdsAbove.VerboseOut.2State.UMT" ConfirmDelivery="true">
        <Category>Custom</Category>
        <AlertSettings AlertMessage="CPUOverloaded.UM_AlertMessageResourceID">
          <AlertOnState>Warning</AlertOnState>
          <AutoResolve>true</AutoResolve>
          <AlertPriority>Normal</AlertPriority>
          <AlertSeverity>Warning</AlertSeverity>
          <AlertParameters>
            <AlertParameter1>$Data/Context/Property[@Name='VerboseOut']$</AlertParameter1>
          </AlertParameters>
        </AlertSettings>
        <OperationalStates>
          <OperationalState ID="CorrelatedPerformanceNormal" MonitorTypeStateID="CorrelatedPerformanceNormal" HealthState="Success" />
          <OperationalState ID="CorrelatedPerformanceBreach" MonitorTypeStateID="CorrelatedPerformanceBreach" HealthState="Warning" />
        </OperationalStates>
        <Configuration>
          <TargetComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</TargetComputerName>
          <FirstCorrelatedCounterName>% Processor Time</FirstCorrelatedCounterName>
          <FirstCorrelatedObjectName>Processor</FirstCorrelatedObjectName>
          <FirstCorrelatedInstanceName>_Total</FirstCorrelatedInstanceName>
          <FirstThreshold>90</FirstThreshold>
          <SecondCorrelatedCounterName>Processor Queue Length</SecondCorrelatedCounterName>
          <SecondCorrelatedObjectName>System</SecondCorrelatedObjectName>
          <SecondCorrelatedInstanceName></SecondCorrelatedInstanceName>
          <SecondThreshold>5</SecondThreshold>
          <IntervalSeconds>300</IntervalSeconds>
          <NumSamples>3</NumSamples>
          <Correlator>
            <CorrelationExpression>
              <Expression />
            </CorrelationExpression>
            <Count>1</Count>
            <Interval>900</Interval>
            <Latency>0</Latency>
            <DrainWait>0</DrainWait>
            <CorrelationOrder>AnyOrder</CorrelationOrder>
            <CorrelationItemPolicy>First</CorrelationItemPolicy>
          </Correlator>
          <ScriptName>GetProcessInfo.ps1</ScriptName>
          <ScriptBody> 
    
            param($CpuUtil,$CpuQueue)

            $nl = "`r`n"
            $oAPI = new-object -comObject "MOM.ScriptAPI"
            $pb = $oAPI.CreatePropertyBag()
            
            $VerboseOut = "Cpu Utilization: " + $CpuUtil + $nl
            $VerboseOut = $VerboseOut + "Cpu Queue: " + $CpuQueue + $nl + $nl
            $VerboseOut = $VerboseOut + "ProcessName(PID) : CPUTime" + $nl
            
            # The CPU property is accumulated processor time in seconds, not current utilization
            $top10 = (Get-Process | Sort-Object CPU -desc | Select-Object -first 10)
            ForEach($proc in $top10)
            {
              $VerboseOut = $VerboseOut + $proc.ProcessName + "(" + $proc.Id + ") : " + $proc.CPU + $nl
            }
            $pb.AddValue("VerboseOut",$VerboseOut)

            #$oAPI.Return($pb)
            $pb

          </ScriptBody>
          <Parameters>
            <Parameter>
              <Name>CpuUtil</Name>
              <Value>$Data/Item0Context/DataItem/SampleValue$</Value>
            </Parameter>
            <Parameter>
              <Name>CpuQueue</Name>
              <Value>$Data/Item1Context/DataItem/SampleValue$</Value>
            </Parameter>
          </Parameters>
        </Configuration>
      </UnitMonitor>
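
Because FirstThreshold and SecondThreshold are declared as overrideable parameters in the monitor type, the thresholds can be tuned without touching the monitor itself. Here is a sketch of such an override, assuming it lives in the Overrides section of a management pack that can reference the monitor. The override ID and the value 95 are made up for illustration.

```xml
<Overrides>
  <!-- Hypothetical override: raise the % Processor Time threshold from 90 to 95 -->
  <MonitorConfigurationOverride ID="CPUOverloaded.UM.FirstThreshold.Override" Context="Windows!Microsoft.Windows.Server.OperatingSystem" Enforced="false" Monitor="CPUOverloaded.UM" Parameter="FirstThreshold">
    <Value>95</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

Since the same FirstThreshold value feeds both DS1 (breach) and DS3 (return to normal) in the monitor type, a single override keeps the two detections consistent.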

And, of course, add some cosmetics (Display Strings):

        <DisplayString ElementID="CPUOverloaded.UM">
          <Name>CPU Overloaded Unit Monitor</Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="CPUOverloaded.UM" SubElementID="CorrelatedPerformanceNormal">
          <Name>Correlated Performance Cleared</Name>
        </DisplayString>
        <DisplayString ElementID="CPUOverloaded.UM" SubElementID="CorrelatedPerformanceBreach">
          <Name>Correlated Performance Raised</Name>
        </DisplayString>
        <DisplayString ElementID="CPUOverloaded.UM_AlertMessageResourceID">
          <Name>Processor Is Overloaded</Name>
          <Description>
            The processor counters that exceeded their thresholds are shown below, together with statistics for the top CPU-consuming processes:

            {0}
          </Description>
        </DisplayString>

You can swap in any other two counters to have their correlated threshold breach monitored. Enjoy!
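
As an illustration of swapping the counters, here is a sketch of the counter-specific part of the Unit Monitor configuration for a memory pressure check. The counter pair and both threshold values are my own assumptions, not something taken from the original monitor. Because the monitor type hardcodes the GreaterEqual direction for a breach, the pair should be one where higher values are worse.

```xml
<!-- Only the counter-specific elements are shown; the other Configuration elements stay as in CPUOverloaded.UM -->
<FirstCorrelatedCounterName>% Committed Bytes In Use</FirstCorrelatedCounterName>
<FirstCorrelatedObjectName>Memory</FirstCorrelatedObjectName>
<FirstCorrelatedInstanceName></FirstCorrelatedInstanceName>
<!-- Threshold values are illustrative assumptions -->
<FirstThreshold>80</FirstThreshold>
<SecondCorrelatedCounterName>% Usage</SecondCorrelatedCounterName>
<SecondCorrelatedObjectName>Paging File</SecondCorrelatedObjectName>
<SecondCorrelatedInstanceName>_Total</SecondCorrelatedInstanceName>
<SecondThreshold>70</SecondThreshold>
```

The diagnostic script would need adjusting as well: its CpuUtil and CpuQueue parameters and the text labels are specific to the CPU scenario.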
