RPS Master Controller Design

RPS Master Controller Design Overview

One of the core technologies leveraged by the Rapid Provisioning System (RPS) is Microsoft Service Management Automation (SMA). SMA allows RPS to use runbooks to automate workflow tasks and access global assets via PowerShell. While SMA is a powerful tool for automating tasks, it falls short in handling jobs that fail to execute due to network connectivity issues. One of the core requirements for RPS is to ensure it can operate within environments with low-bandwidth and intermittent connectivity.

To overcome this obstacle, the RPS team developed a Master Controller which ensures SMA jobs that fail to start or fail while executing are restarted and completed properly. The Master Controller enhances built-in SMA job execution by tracking custom RPS job states and dependencies to execute jobs in advanced sequences. It also evaluates all Task Assignments belonging to Task Maps via RPS APIs and progresses them when dependencies are met.


Note

This document provides an overview of the Master Controller and references technical designs that provide additional detail on some of its components. Rather than embedding those technical design documents here, this document points to them so that the most recent versions are always used.

Term	Definition
RPS Node	A running instance of the RPS system.
Configuration Item (CI)	A managed component, device, or appliance.
DSC	Microsoft’s PowerShell Desired State Configuration software package.
SMA	Microsoft Service Management Automation software package.
Cmdlet	A lightweight command that is used in the Windows PowerShell environment.

Task Assignments – How They are Generated from a Task Map

The documents referenced below show how assigning a Task Map generates a set of Task Assignments. The assignments are generated from the Task Map’s definitions and filters. The documents provide a visual representation of how Task Assignments are generated when a Task Map is assigned to a container. The reference documents are available at:

$\Documents\Operations\RPS TaskAssignmentDiagram.vsd
$\Documents\Operations\RPS Tasking Guide.docx

Task States – How the Master Controller Handles Flow of Execution

The document referenced below shows how the Task Assignments generated by assigning a Task Map are executed, with a visual representation of the flow the Master Controller follows to complete a Task Map’s execution. These flows are generally controlled through Task States, which indicate the next step a Task Map will take. The detailed diagram is available at:

$\Documents\Operations\RPS TaskAssignmentDiagram.vsd

Controller "Region"	Description
Invoke-Evaluation	This cmdlet triggers the algorithm that determines, based on TaskState, whether a TaskMap will proceed to the next step.
Gather Active Assignments	The cmdlet “Get-RpsActiveTaskAssignments” gathers all Task Assignments currently assigned to the executing node that are both active and Ready to be launched.
Process Active Assignments in Parallel	Once all active Task Assignments are gathered for execution, they are processed in parallel threads to trigger the steps that follow (health check, execution).
Perform Health Checks as Needed	When a task is ready for execution, it is evaluated to determine whether it was previously started and, if so, whether it is still in a “healthy” state and should be skipped, or should be re-initiated.
Launch Tasks	Once health checks and states have been evaluated, if the Task in question requires an execution, it will be launched within SMA.
Correlating Tasks to Targets	Each runbook authored in support of Task Mapping/Task Item formats will have a default parameter of the TaskAssignmentId. This object correlation gives a relationship back to the Target Item, Container, and other object relationships that provide all context necessary to navigate configuration data.

Executing the Master Controller

The Master Controller is launched through a scheduled job within SMA. This schedule is configured as part of the installation process of RPS leveraging DSC. This DSC configuration maintains the SMA schedule, which launches a “health check” runbook called Check-ControllerStatus.

When started, the Check-ControllerStatus runbook determines if the Master Controller is operational, stuck, or otherwise not executing. If it is found to be unhealthy, or not executing in some way, any existing “stuck” jobs will be shut down and a new copy will be started.
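
The decision made by the health check runbook can be sketched roughly as follows. This is a minimal Python illustration, not the runbook itself: the `ControllerJob` shape and its `responsive` flag (some signal that a job is not stuck) are assumptions introduced here, since the source does not say how "stuck" is detected.

```python
from dataclasses import dataclass


@dataclass
class ControllerJob:
    job_id: str
    running: bool     # the job is currently executing
    responsive: bool  # hypothetical "not stuck" signal; detection is not specified


def check_controller_status(jobs):
    """Return (stuck_job_ids_to_stop, start_new_copy) for existing controller jobs."""
    if any(j.running and j.responsive for j in jobs):
        # A healthy Master Controller already exists: leave everything alone.
        return [], False
    # No healthy copy: shut down stuck jobs and request a fresh controller.
    stuck = [j.job_id for j in jobs if j.running and not j.responsive]
    return stuck, True
```

Under these assumptions, a single stuck job yields `(["<its id>"], True)`, while a healthy job yields `([], False)`, so a second copy is never requested while a healthy one is running.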

Note

It is important that only one copy of the Master Controller runs at any given time on a given node. Multiple running Master Controller jobs can cause job or job state corruption.

Once started, the Master Controller runs in a continuous loop, with a default 60-second delay after each complete pass of the loop. The delay prevents the controller from running too quickly while keeping it constantly in action.
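
The shape of that loop can be sketched in Python as below. The function name and the `evaluate`, `gather`, and `process` callables are hypothetical placeholders for the controller regions described earlier; assignments are handled sequentially here only to keep the sketch short, and `max_passes` and the injectable `sleep` exist purely so the loop can be exercised without waiting.

```python
import time


def run_master_controller(evaluate, gather, process, delay_seconds=60,
                          max_passes=None, sleep=time.sleep):
    """Repeat evaluate -> gather -> process, pausing after each full pass."""
    passes = 0
    while max_passes is None or passes < max_passes:
        evaluate()                   # advance Task Maps whose dependencies are met
        for assignment in gather():  # active assignments for this node
            process(assignment)      # health check, then launch if needed
        passes += 1
        sleep(delay_seconds)         # delay is measured from the end of the pass
    return passes
```

Because the delay starts after a pass completes, a slow pass never overlaps the next one.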

Triggering Task Map Progression

The “Invoke-RpsEvaluateTaskAssignmentStatus” cmdlet evaluates and advances Task Maps based on their definitions and dependencies. Task State evaluation follows the logic shown in the document found at:

$\Documents\Operations\RPS TaskAssignmentDiagram.vsd

This document outlines how Task Assignments are processed and advanced, based on the value of the Task State property.
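
The authoritative evaluation rules are in the diagram referenced above. As a rough illustration of dependency-driven progression only, the Python sketch below moves an assignment forward once everything it depends on is complete. The "Pending" state name and the dictionary layout are assumptions made for this example.

```python
def evaluate_task_states(states, depends_on):
    """Return a new state map with unblocked assignments advanced to Ready.

    states:     {assignment_id: state}
    depends_on: {assignment_id: [assignment_ids it waits for]}
    """
    updated = dict(states)
    for task, state in states.items():
        dependencies = depends_on.get(task, [])
        # "Pending" is a hypothetical name for "not yet eligible to launch".
        if state == "Pending" and all(states[d] == "Completed" for d in dependencies):
            updated[task] = "Ready"
    return updated
```

Each call advances at most one step per assignment, so a chain of dependent assignments progresses over successive passes of the controller loop.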

Gathering Active Task Assignments

The “Get-RpsActiveTaskAssignments” cmdlet is used to gather all active Task Assignments. An assignment is gathered if it meets both of the following conditions.

Task Assignment is assigned to a Target Item on the current RPS Node.
Task Assignment is considered “Active”. See the Active Definition section below for more detail.

Active Definition

For a Task Assignment to be considered “Active” by RPS, all related entities must be explicitly active, meaning their IsActive flags are set to true. This includes:

Task Item
Target Item
Container
Task Map (if used for assignment)
Task Definition (if used for assignment)
Target Group (if used for assignment)
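
The two gathering conditions and the Active definition can be sketched in Python as follows. The `Entity` and `TaskAssignment` classes are hypothetical stand-ins for RPS objects; only the rule itself (every related entity that is present must have its IsActive flag set) comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    """Any RPS entity carrying an IsActive flag (hypothetical shape)."""
    name: str
    is_active: bool = True


@dataclass
class TaskAssignment:
    node: str                                 # RPS Node hosting the Target Item
    task_item: Entity
    target_item: Entity
    container: Entity
    task_map: Optional[Entity] = None         # only if used for assignment
    task_definition: Optional[Entity] = None  # only if used for assignment
    target_group: Optional[Entity] = None     # only if used for assignment


def is_active(assignment: TaskAssignment) -> bool:
    """True only when every related entity that is present is active."""
    related = [
        assignment.task_item,
        assignment.target_item,
        assignment.container,
        assignment.task_map,
        assignment.task_definition,
        assignment.target_group,
    ]
    return all(e.is_active for e in related if e is not None)


def get_active_assignments(assignments, current_node):
    """Keep assignments on the current node whose related entities are all active."""
    return [a for a in assignments if a.node == current_node and is_active(a)]
```

A single inactive entity anywhere in the chain, for example a disabled Task Map, is enough to exclude the assignment.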

Parallel Processing of Active Assignments

Once active assignments are gathered, they are processed in parallel by the Master Controller using .NET Workflow API commands. This improves performance, because many jobs are analyzed and reviewed for execution simultaneously rather than one at a time.
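
The fan-out pattern can be illustrated in Python with a thread pool. This swaps in `concurrent.futures` for the .NET Workflow mechanism the Master Controller actually uses; `process_one` is a hypothetical callable standing in for the per-assignment health check and launch steps.

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_parallel(assignment_ids, process_one, max_workers=8):
    """Run process_one for every assignment concurrently; return {id: result}."""
    ids = list(assignment_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so they line up with ids.
        results = list(pool.map(process_one, ids))
    return dict(zip(ids, results))
```

The `with` block waits for all workers to finish, so the caller sees a complete result set before the next controller pass begins.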

Health Checks Performed on Individual Jobs

Health checks performed by the Master Controller ensure that jobs launched within SMA are actively executing and are not hung or otherwise failed. The health checks and resulting actions are as follows:

Is the job supposed to be running, but is not running? – The job is restarted.
Has the job been started, but failed? – The job is restarted.
Has the job been started, but is queued or otherwise pending execution? – The job is considered healthy.
Is the job actively running/executing? – The job is considered healthy.

Every job is checked on each pass of the Master Controller to confirm that it is healthy, and it is restarted if it has failed. This allows for controlled failures, automated recoveries, and assumed outage scenarios.
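
The checks listed above reduce to a small decision function, sketched here in Python. The status strings and the "skip" outcome for jobs that are not expected to run are assumptions for illustration; the real job status values are not given in this document.

```python
# Hypothetical status names: a queued or running job counts as healthy.
HEALTHY_STATUSES = {"Queued", "Running"}


def health_check(should_be_running, job_status):
    """Return 'healthy', 'restart', or 'skip' for one job.

    job_status is None when the job has never been started.
    """
    if not should_be_running:
        return "skip"      # assumption: nothing to do for jobs not expected to run
    if job_status in HEALTHY_STATUSES:
        return "healthy"   # queued, pending, or actively executing
    return "restart"       # failed, stopped, or missing although expected
```

Treating queued jobs as healthy avoids restarting work that is merely waiting for a worker, which matters in low-bandwidth environments where queues drain slowly.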

Launch Tasks

Once a Task Assignment has passed both the Active checks and the Health Checks, it is launched within SMA. This is done by calling the “Start-SmaJob” command with the Task Item as the reference runbook. At this point, SMA takes over the execution of the task, while the Master Controller and the RPS system monitor its state.

Correlating Tasks to Targets – TaskAssignmentId

Each runbook authored in support of RPS requires a single parameter: the TaskAssignmentId. This parameter correlates the runbook to its TargetItem, Container, Group, and TaskMap objects, and to other related entities. Given the TaskAssignmentId, these entities can be retrieved from the RPS database to provide configuration data on a per-execution basis, as needed and at scale.
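
As a sketch of that correlation, the Python function below starts from nothing but a TaskAssignmentId and walks to the related records. The dictionaries are in-memory stand-ins for the RPS database, and the key names (`target_item_id`, `container_id`) are hypothetical.

```python
def resolve_context(task_assignment_id, assignments, target_items, containers):
    """Follow a TaskAssignmentId to the records a runbook needs.

    The three dictionaries are in-memory stand-ins for the RPS database.
    """
    assignment = assignments[task_assignment_id]
    target = target_items[assignment["target_item_id"]]
    container = containers[target["container_id"]]
    return {"assignment": assignment, "target_item": target, "container": container}


# Illustrative data only; none of these values come from a real RPS system.
assignments = {"ta-1": {"target_item_id": "ti-1", "task_item": "Install-Baseline"}}
target_items = {"ti-1": {"name": "server-01", "container_id": "c-1"}}
containers = {"c-1": {"name": "site-a"}}

context = resolve_context("ta-1", assignments, target_items, containers)
```

Passing a single identifier keeps runbook signatures uniform; everything else is looked up at execution time, so configuration changes are picked up without editing the runbook.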

User Interactions using PendingUserActions

“New-RpsPendingUserAction” is a cmdlet designed to flag a Task Assignment for user interaction. It provides a common way to initiate user-interactive requests on an as-needed basis. The command requires a TaskAssignmentId, button labels for the positive and negative user replies, and a message to display. The UI or API can use this data to request feedback. If the negative action is chosen, the Task Assignment requesting the action is set to “Cancelled”, and any executions that depend on its completion are not executed. If the positive action is chosen, the Task Assignment is set to “Completed”, and any steps that depend on it treat it as complete and proceed.
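
The outcome of a user reply can be modeled in Python as below. The `PendingUserAction` class mirrors the inputs named above (TaskAssignmentId, message, two button labels), but its field names and both helper functions are hypothetical and do not represent the cmdlet's actual interface.

```python
from dataclasses import dataclass


@dataclass
class PendingUserAction:
    task_assignment_id: str
    message: str
    positive_label: str
    negative_label: str


def resolve_pending_action(action, chosen_label):
    """Map the button the user pressed to the resulting Task Assignment state."""
    if chosen_label == action.positive_label:
        return "Completed"
    if chosen_label == action.negative_label:
        return "Cancelled"
    raise ValueError(f"Unknown reply: {chosen_label!r}")


def dependents_may_run(dependency_state):
    """Dependent steps proceed only when the task they wait on is Completed."""
    return dependency_state == "Completed"
```

For example, an action created with labels "Approve" and "Reject" resolves to "Completed" on "Approve", which in turn lets dependent steps proceed; "Reject" resolves to "Cancelled" and blocks them.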
