Schedule a planned disaster recovery for your Splunk Cloud Platform environment
Up to twice a year, you can schedule a planned failover of your Splunk Cloud Platform environment if it is integrated with Cross-Region Disaster Recovery. A planned failover lets you confirm that the service performs as you expect. It also lets you test that your data collection and forwarding infrastructure keeps sending data to your Splunk Cloud Platform environment properly while the environment is active in the secondary cloud service provider region.
Cross-Region Disaster Recovery protects your environment from regional cloud service provider (CSP) failures in the primary region by failing over to a secondary region. It does not protect against regional failures in the secondary region.
Requirements for performing a planned disaster recovery
Meet the following requirements to perform a planned disaster recovery for your Splunk Cloud Platform environment:
- Your Splunk Cloud Platform environment must have Cross-Region Disaster Recovery implemented and functional.
- You must use Splunk Cloud Platform within specific service limits. See Service limits and constraints in the Splunk Cloud Platform Service Manual.
- The environment must not already be in a maintenance window or undergoing an upgrade.
- The environment must be able to enter a maintenance window of at least four hours for the planned failover, and another of at least four hours for the planned failback.
- Your Splunk Cloud Platform environment must fail back to the primary region no sooner than 24 hours after the failover completes, and no later than 2 weeks after the planned failover.
- You must develop your own Disaster Recovery (DR) test plan in advance, based on your business needs and compliance requirements. Splunk expects you to validate this plan after the planned failover and failback.
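Before you submit a request, you might check a proposed schedule against the timing requirements above. The following Python sketch is illustrative only, and the function name `check_planned_dr_schedule` is hypothetical. It makes three assumptions:

- The failover counts as complete at the end of its window.
- The 2-week limit runs from the start of the failover window to the end of the failback window.
- "Up to twice a year" means a rolling 365-day period.

The schedule that Splunk confirms with you remains authoritative.

```python
from datetime import datetime, timedelta, timezone

# Limits from the requirements above. Interpretations are assumptions (see docstring).
MIN_WINDOW = timedelta(hours=4)   # each maintenance window lasts at least four hours
MIN_GAP = timedelta(hours=24)     # failback no sooner than 24 hours after failover completes
MAX_SPAN = timedelta(weeks=2)     # failback within 2 weeks of the planned failover
YEAR = timedelta(days=365)        # assumption: "twice a year" means a rolling 365 days


def check_planned_dr_schedule(failover_start, failover_end,
                              failback_start, failback_end,
                              previous_failovers=()):
    """Return a list of problems with a proposed schedule (empty if none found).

    Hypothetical helper. Assumes the failover is complete at the end of its
    window and that the 2-week limit runs from failover start to failback end.
    All datetimes should be timezone-aware.
    """
    problems = []
    # Each window must be at least four hours long.
    for name, start, end in (("failover", failover_start, failover_end),
                             ("failback", failback_start, failback_end)):
        if end - start < MIN_WINDOW:
            problems.append(f"{name} window is shorter than 4 hours")
    if failback_start - failover_end < MIN_GAP:
        problems.append("failback starts less than 24 hours after failover completes")
    if failback_end - failover_start > MAX_SPAN:
        problems.append("failback ends more than 2 weeks after the planned failover")
    # Count earlier planned failovers that started in the preceding 365 days.
    recent = [t for t in previous_failovers
              if timedelta(0) <= failover_start - t < YEAR]
    if len(recent) >= 2:
        problems.append("this would exceed two planned failovers in a year")
    return problems


if __name__ == "__main__":
    start = datetime(2024, 3, 5, 14, 0, tzinfo=timezone.utc)
    back = start + timedelta(days=2)
    print(check_planned_dr_schedule(start, start + MIN_WINDOW, back, back + MIN_WINDOW))
```

An empty list means the proposal fits these assumed limits. Pass the dates of earlier planned failovers in `previous_failovers` to include the twice-a-year check.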
Schedule a planned disaster recovery
To schedule a planned disaster recovery, perform the following steps:
- Visit the Splunk Support Portal.
- Select the Need Help? button.
- Select Create a Case, then choose Support.
- Select the Splunk Cloud environment on which you want to perform the test.
- Select Cloud Change Request.
- Select Standard Configuration.
- In the Config file field, enter "CRDR".
- In the Description field, enter the following text:
Request for planned failover testing: Please schedule failover and failback of my Splunk Cloud Platform instance during the following time.
Planned Failover Maintenance Window: {Date, 4-hour time window, Time zone}
Planned Failback Maintenance Window: {Date, 4-hour time window, Time zone}
- Select Submit.
After Splunk receives the request, Splunk contacts you to schedule the maintenance windows for the planned disaster recovery.
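If you schedule planned failovers regularly, you might generate the Description text from your chosen windows instead of editing it by hand. This Python sketch fills in the request template from the steps above. The helper names are hypothetical. The `YYYY-MM-DD, HH:MM-HH:MM, zone` rendering of each placeholder is also an assumption, not a format that Splunk requires.

```python
from datetime import datetime, timedelta, timezone

# Description text from the steps above; {failover} and {failback} are filled in below.
TEMPLATE = (
    "Request for planned failover testing: Please schedule failover and failback "
    "of my Splunk Cloud Platform instance during the following time.\n"
    "Planned Failover Maintenance Window: {failover}\n"
    "Planned Failback Maintenance Window: {failback}"
)


def format_window(start: datetime, hours: int = 4) -> str:
    """Render one window as 'Date, time window, Time zone' (assumed layout).

    Pass a timezone-aware datetime so that tzname() returns a zone name.
    If the window crosses midnight, the end time falls on the following day.
    """
    end = start + timedelta(hours=hours)
    return f"{start:%Y-%m-%d}, {start:%H:%M}-{end:%H:%M}, {start.tzname()}"


def build_description(failover_start: datetime, failback_start: datetime) -> str:
    """Fill the request template with two 4-hour windows (hypothetical helper)."""
    return TEMPLATE.format(failover=format_window(failover_start),
                           failback=format_window(failback_start))


if __name__ == "__main__":
    utc = timezone.utc
    print(build_description(datetime(2024, 3, 5, 14, 0, tzinfo=utc),
                            datetime(2024, 3, 7, 14, 0, tzinfo=utc)))
```

For a local time zone, you could pass datetimes created with `zoneinfo.ZoneInfo`. Each window would then be labeled with that zone's abbreviation.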
Sequence of communications during the maintenance window
When you schedule maintenance windows for a planned disaster recovery, the following sequence of events and communications takes place. Each window includes pre-checks and post-checks in addition to the failover or failback itself.
Before a failover or failback begins
- For planned failovers and failbacks:
  - You must create a Support case to begin scheduling the two maintenance windows for your planned failover and failback.
  - You then receive confirmation of the planned failover and failback dates and times in the Support case.
  - You must communicate any changes to the planned failover or failback dates and times in the Support case.
- For unplanned failovers and failbacks:
  - Splunk notifies you of a regional outage.
During a failover or failback
- The operational contacts listed for your environment receive automated email notifications, with timestamps, for the following failover and failback events:
  - Failover / failback initiated
  - DNS switch completed
  - Ingestion and Indexing started
  - Search availability started
  - Failover / failback completed
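As part of your DR test plan, you might record the timestamp from each notification and confirm that every event arrived. This Python sketch is hypothetical and makes two assumptions:

- You have already transcribed the event names and timestamps from the emails yourself.
- The events occur in the order listed above.

```python
from datetime import datetime, timedelta, timezone

# Notification events in the order listed in this section (assumed chronological).
EXPECTED_ORDER = [
    "Failover / failback initiated",
    "DNS switch completed",
    "Ingestion and Indexing started",
    "Search availability started",
    "Failover / failback completed",
]


def summarize_events(received: dict) -> dict:
    """Summarize notification timestamps recorded during a failover or failback.

    `received` maps event name -> timezone-aware datetime, transcribed by you
    from the automated emails (hypothetical input format).
    """
    missing = [name for name in EXPECTED_ORDER if name not in received]
    if missing:
        return {"ok": False, "missing": missing}
    times = [received[name] for name in EXPECTED_ORDER]
    # True only if each event is no earlier than the one listed before it.
    in_order = all(a <= b for a, b in zip(times, times[1:]))
    # Elapsed time from "initiated" to each event.
    elapsed = {name: t - times[0] for name, t in zip(EXPECTED_ORDER, times)}
    return {"ok": in_order, "missing": [], "elapsed": elapsed,
            "total": times[-1] - times[0]}


if __name__ == "__main__":
    t0 = datetime(2024, 3, 5, 14, 5, tzinfo=timezone.utc)
    log = {name: t0 + timedelta(minutes=15 * i) for i, name in enumerate(EXPECTED_ORDER)}
    summary = summarize_events(log)
    print(summary["ok"], summary["total"])
```

The `total` value gives one simple measure of how long the failover or failback took, which you can compare against the expectations in your own test plan.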
Following is an example of an automated email communication:
What happens if there are issues during failover or failback?
- If Splunk identifies a problem, Splunk creates an incident.
- If you identify a problem, open a Support case through the standard channels (for example, through Salesforce or by calling in).