Health Rule Violation Events

You can configure health rule violation events, for example an event that is triggered when a job fails. To configure these events, create an alerting policy for synthetic jobs.

This diagram shows the retest process for a failed job and the events it triggers. In this example, Job A is scheduled to run every five minutes, starting at 10:00 am. If the job succeeds, no event or message is triggered; if it fails, "Error started" is triggered and the job is retested. If the retest succeeds, no event or message is triggered; if it fails, "Error confirmed after retest" is triggered. At the next scheduled run, "Problem ended" is triggered if the job succeeds, and "Error continues" is triggered if it fails. As long as the job continues to fail, it runs every five minutes and continues to trigger "Error continues".

Note: Immediate retest reruns the job immediately after the first failure. All other retest configurations run the job according to the configured job schedule.
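The timing rule in this note can be sketched in a few lines. This is a minimal Python illustration, not the product's implementation. The function names `next_scheduled_slot` and `next_run` are hypothetical. The one-minute `retest_delay` is an assumption chosen only to match the 10:01 am retest in the walkthrough that follows, because the real delay is not specified here.

```python
from datetime import datetime, timedelta

def next_scheduled_slot(after: datetime, start: datetime, interval: timedelta) -> datetime:
    """First time start + k * interval (k >= 0) that is strictly later than `after`."""
    if after < start:
        return start
    # timedelta // timedelta gives the number of whole intervals elapsed.
    k = (after - start) // interval + 1
    return start + k * interval

def next_run(last_run: datetime, start: datetime, interval: timedelta,
             first_failure: bool, immediate_retest: bool,
             retest_delay: timedelta = timedelta(minutes=1)) -> datetime:
    """Hypothetical: when the job runs next after the run at `last_run`."""
    if immediate_retest and first_failure:
        # Immediate retest: rerun right after the first failure
        # (retest_delay is an assumed value, not a product setting).
        return last_run + retest_delay
    # Every other case follows the configured job schedule.
    return next_scheduled_slot(last_run, start, interval)
```

Take a 10:00 am start and a five-minute interval. A first failure at 10:00 am schedules the retest at 10:01 am. A failed retest at 10:01 am then falls back to the 10:05 am slot rather than 10:06 am, because non-immediate runs stay on the schedule grid.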
  1. Job A is executed at 10:00 am
    • If Job A succeeds, then no event/message is generated
    • If Job A fails, then the "Error started" event is generated
  2. Job A is retested immediately at 10:01 am (previous execution: Job A failed at 10:00 am, and the "Error started" event was generated)

    • If Job A succeeds, then no event/message is generated

    • If Job A fails, then the "Error confirmed after retest" event is generated

  3. Job A is executed at 10:05 am (previous execution: Job A failed during the retest at 10:01 am, and the "Error confirmed after retest" event was generated)

    • If Job A succeeds, then the "Problem ended" event is generated

    • If Job A fails, then the "Error continues" event is generated

  4. Job A is executed at the next scheduled time (previous execution: Job A failed, and the "Error continues" event was generated)

    • If Job A succeeds, then the "Problem ended" event is generated

    • If Job A fails, then the "Error continues" event is generated
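The four steps above amount to a small state machine over consecutive runs of the job. The following is a minimal Python sketch. It assumes that immediate retest is configured and that each run reports only success or failure. The `Phase` values, `event_for`, and `replay` are hypothetical names that model only the event names used in the steps above.

```python
from enum import Enum, auto
from typing import List, Optional, Tuple

class Phase(Enum):
    """Hypothetical tracking state between consecutive runs of one job."""
    OK = auto()         # no open problem
    STARTED = auto()    # first failure seen; the next run is the immediate retest
    CONFIRMED = auto()  # failure confirmed by the retest; problem still open

def event_for(phase: Phase, succeeded: bool) -> Tuple[Phase, Optional[str]]:
    """Return (next phase, event name or None) for one run of the job."""
    if phase is Phase.OK:
        # Step 1: a regular run with no open problem.
        if succeeded:
            return Phase.OK, None
        return Phase.STARTED, "Error started"
    if phase is Phase.STARTED:
        # Step 2: the immediate retest; a success generates no event.
        if succeeded:
            return Phase.OK, None
        return Phase.CONFIRMED, "Error confirmed after retest"
    # Steps 3 and 4: scheduled runs while the problem is still open.
    if succeeded:
        return Phase.OK, "Problem ended"
    return Phase.CONFIRMED, "Error continues"

def replay(outcomes: List[bool]) -> List[Optional[str]]:
    """Feed run results (True = success) through event_for; one entry per run."""
    phase = Phase.OK
    events: List[Optional[str]] = []
    for succeeded in outcomes:
        phase, event = event_for(phase, succeeded)
        events.append(event)
    return events
```

For example, `replay([False, False, False, True])` returns `["Error started", "Error confirmed after retest", "Error continues", "Problem ended"]`, which matches steps 1 through 4. `replay([False, True])` returns `["Error started", None]`, because a successful retest generates no event.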

Note:
  • If immediate retest (based on availability errors or a performance threshold) is configured, the "Error started" event is generated when the job first fails.
  • If the error is confirmed during the retest, the "Error confirmed" event is generated.
  • If the error is not confirmed during the retest, then no event is generated.