Create a Runbook
Runbooks in Harness AI SRE enable you to automate incident response workflows, operational procedures, and remediation actions. This comprehensive guide walks you through creating, configuring, and deploying effective runbooks that can significantly reduce mean time to resolution (MTTR) and improve your team's operational efficiency.
Before You Begin
Prerequisites
Ensure you have the following before creating your first runbook:
- Platform Access: Active Harness AI SRE account with appropriate permissions.
- User Permissions: The required Account, Organization, and Project level permissions.
- Integration Access: Configured integrations for the tools you plan to use (Slack, Jira, ServiceNow, etc.).
- Monitoring Setup: Alert sources configured (Datadog, New Relic, PagerDuty, etc.).
Key Concepts
Before diving into runbook creation, familiarize yourself with these core concepts:
- Actions: Individual tasks or operations within a runbook (notifications, API calls, pipeline executions).
- Triggers: Conditions that automatically initiate runbook execution.
- Variables: Dynamic values that can be passed between actions and customized per execution.
- Sequences: The order in which actions are executed within your workflow.
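To make these concepts concrete, the following Python sketch shows one way they could relate to each other. It is an illustration only: the `Action` and `Runbook` classes, their fields, and the sample event keys are assumptions made for this example and are not the Harness AI SRE data model or API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Variables = Dict[str, Any]


@dataclass
class Action:
    """One task in the workflow; returns output variables for later actions."""
    name: str
    run: Callable[[Variables], Variables]


@dataclass
class Runbook:
    name: str
    trigger: Callable[[Variables], bool]  # condition that starts execution
    sequence: List[Action] = field(default_factory=list)  # actions in order

    def execute(self, event: Variables) -> Optional[Variables]:
        if not self.trigger(event):
            return None  # trigger condition not met, so nothing runs
        variables = dict(event)  # input variables come from the event
        for action in self.sequence:
            variables.update(action.run(variables))  # outputs feed later actions
        return variables


# Hypothetical runbook: open a channel, then post a message that uses its name.
runbook = Runbook(
    name="High CPU Alert Response",
    trigger=lambda e: e.get("severity") == "critical",
    sequence=[
        Action("open_channel", lambda v: {"channel": f"incident-{v['incident_id']}"}),
        Action("notify", lambda v: {"message": f"Posted to #{v['channel']}"}),
    ],
)
result = runbook.execute({"incident_id": "INC-42", "severity": "critical"})
```

In this sketch, the second action can read `channel` only because the first action ran earlier in the sequence, which is why ordering and variable passing are treated together in the sections that follow.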
Step 1: Initialize Your Runbook
Create a New Runbook
- Navigate to Runbooks: Go to AI SRE → Runbooks in your Harness platform.
- Start Creation: Click + New Runbook to begin the creation process.
- Basic Information: Provide essential details for your runbook:
  - Name: Use a descriptive name (e.g., "High CPU Alert Response", "Database Connection Recovery").
  - Description: Clearly explain the runbook's purpose and when it should be used.

Design Your Workflow
Once your runbook is created, you'll enter the workflow designer where you can build your automation sequence.
1. Add Actions to Your Workflow
Actions are the building blocks of your runbook. Each action performs a specific task in your incident response or operational workflow.
Common Action Types:
- Communication: Send notifications, create Slack channels, start Zoom or Microsoft Teams meetings.
- Harness: Execute pipelines, run scripts, trigger deployments, or add a feature flag.
- Ticketing: Create Jira or ServiceNow incident tickets, update status, assign teams, or update incident tickets.
- Change: Manage GitHub pull requests, create or revert changes.
- Custom: Build your own actions using custom HTTP actions or custom scripts.

2. Configure Action Parameters
Each action requires specific configuration to function correctly. Parameters vary by action type but typically include:
Example: Slack Channel Creation
- Channel Name: Use variables like ${incident.id} for dynamic naming.
- Channel Privacy: Choose whether the channel is public or private.
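As a rough illustration of how a placeholder such as ${incident.id} could be expanded into a concrete channel name, here is a minimal Python sketch. The `resolve` function and the shape of the context dictionary are assumptions for this example; the platform performs its own substitution, and its exact rules may differ.

```python
import re
from typing import Any, Dict

# Matches ${name} and ${dotted.path} placeholders.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def resolve(template: str, context: Dict[str, Any]) -> str:
    """Replace each ${dotted.path} with the value found in the nested context."""

    def lookup(match: "re.Match[str]") -> str:
        value: Any = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError if the path is missing
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


# Hypothetical incident context; only the ${incident.id} syntax comes from the guide.
channel_name = resolve("incident-${incident.id}", {"incident": {"id": "1234"}})
```

Failing loudly on a missing path, as this sketch does, makes a misspelled variable name visible during testing instead of producing a channel with a literal placeholder in its name.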

3. Arrange Action Sequences
The order of actions is crucial for effective runbook execution. Drag and drop actions in the left panel into the order in which they should execute.

Best Practices for Sequencing:
- Immediate Response: Start with critical notifications and incident creation.
- Information Gathering: Follow with diagnostic and monitoring actions.
- Remediation: Execute fix actions based on gathered information.
- Validation: Verify that remediation was successful.
- Closure: Update stakeholders and close incidents.
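The five phases above can be expressed as a simple ordering rule. The sketch below is a planning aid under stated assumptions: the `Phase` enum and the action names are hypothetical, and in the product you arrange actions by dragging them in the workflow designer.

```python
from enum import IntEnum
from typing import List, Tuple


class Phase(IntEnum):
    IMMEDIATE_RESPONSE = 1
    INFORMATION_GATHERING = 2
    REMEDIATION = 3
    VALIDATION = 4
    CLOSURE = 5


def order_actions(actions: List[Tuple[str, Phase]]) -> List[str]:
    """Sort by phase; sorted() is stable, so same-phase actions keep their order."""
    return [name for name, _ in sorted(actions, key=lambda item: item[1])]


ordered = order_actions([
    ("Close incident", Phase.CLOSURE),
    ("Restart service", Phase.REMEDIATION),
    ("Notify on-call", Phase.IMMEDIATE_RESPONSE),
    ("Create ticket", Phase.IMMEDIATE_RESPONSE),
    ("Fetch CPU metrics", Phase.INFORMATION_GATHERING),
    ("Check health endpoint", Phase.VALIDATION),
])
```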
4. Define Workflow Variables
Variables make your runbooks dynamic and reusable across different scenarios.
Variable Configuration Steps
- Select Context Type: Choose the Incident or Alert Context (Any/No/Custom) for which you want to define the variables.
- Choose Specific Type: For a Custom Incident or Alert Context, select the appropriate Incident or Alert Type from the dropdown.


Variable Types
- Input Variables: Values provided when the runbook is triggered.
- Output Variables: Results from action executions.
Variable Configuration Details
- Input Variables: Must be defined based on the incident or alert context.
- Output Variables: Must be defined based on the action execution.
- Required Fields: Name, Display Name, Description, Type, and Default Value.
- Data Types: String, Integer, Number, Boolean, Object, or Array.
- Requirement Level: Variables can be defined as required or optional based on the use case.
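The configuration details above amount to a small schema: each variable has a name, a data type, a default, and a requirement level. The following Python sketch shows how such definitions could be checked against supplied values. The `VariableDef` class, the `resolve_inputs` function, and the mapping of the listed data types onto Python types are assumptions for illustration, not the platform's validation logic.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumed mapping of the guide's data types onto Python checks.
TYPE_CHECKS = {
    "String": lambda v: isinstance(v, str),
    "Integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "Number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "Boolean": lambda v: isinstance(v, bool),
    "Object": lambda v: isinstance(v, dict),
    "Array": lambda v: isinstance(v, list),
}


@dataclass
class VariableDef:
    name: str
    display_name: str
    description: str
    type: str
    default: Any = None
    required: bool = False


def resolve_inputs(defs: List[VariableDef], supplied: Dict[str, Any]) -> Dict[str, Any]:
    """Apply defaults, enforce required variables, and check data types."""
    resolved = {}
    for d in defs:
        if d.name in supplied:
            value = supplied[d.name]
        elif d.required:
            raise ValueError(f"missing required variable: {d.name}")
        else:
            value = d.default
        if value is not None and not TYPE_CHECKS[d.type](value):
            raise TypeError(f"{d.name} must be of type {d.type}")
        resolved[d.name] = value
    return resolved


# Hypothetical definitions for a CPU alert runbook.
defs = [
    VariableDef("service", "Service", "Affected service", "String", required=True),
    VariableDef("threshold", "CPU Threshold", "Alert threshold in percent", "Integer", default=90),
]
inputs = resolve_inputs(defs, {"service": "checkout"})
```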

Step 2: Available Actions and Integrations
Harness AI SRE provides a comprehensive library of pre-built actions across multiple categories. Choose the right combination of actions to build effective automation workflows.
Communication & Collaboration Tools
Establish immediate communication channels and keep stakeholders informed throughout incident resolution.
Slack Integration
- Send Notifications: Broadcast alerts to channels or direct messages.
- Create Channels: Automatically create incident-specific channels.
- Start Threads: Organize discussions and updates.
- Add Members: Add members to the channel.
- Archive Channels: Clean up after incident resolution.
Microsoft Teams Integration
- Send Messages: Send alerts to specific teams or channels.
- Create Meetings: Automatically create a Teams meeting, optionally attaching an AI transcription bot.
Zoom Integration
- Create Meetings: Instantly set up incident response calls, optionally attaching an AI transcription bot.
- End Meetings: End an active Zoom meeting.
Incident Response & Ticketing Systems
Automate incident tracking, assignment, and resolution workflows across your preferred ticketing platforms.
Jira Integration
- Issue Creation: Automatically create tickets with relevant context.
- Status Updates: Progress incidents through workflow states.
- Update Tickets: Update an existing Jira issue's summary, description, or issue type, or add a comment with relevant context.
ServiceNow Integration
- Incident Management: Create and manage ServiceNow incidents.
- Change Requests: Initiate emergency or standard changes.
- Update Incidents: Update an existing ServiceNow incident's summary, description, or issue type, or add a comment with relevant context.
Automation & Pipeline Execution
Execute remediation actions, deploy fixes, and trigger operational workflows.
Harness Pipelines Integration
- Pipeline Execution: Trigger a deployment or remediation pipeline.
- Feature Flag Management: Deploy specific versions or roll back changes.
- Environment Management: Manage infrastructure scaling or configuration.
Step 3: Configure Triggers
Triggers determine when and how your runbooks execute automatically. Proper trigger configuration ensures your runbooks respond to the right conditions at the right time.
Setting Up Triggers
- Access Trigger Configuration: Click the Triggers tab in your runbook editor.
- Add Trigger: Click + New Trigger to begin the trigger setup process.
- Choose Trigger Template: Select the type from available templates.
- Define Conditions: Set specific conditions for runbook activation based on the frequency of events or changes to specific resources.
- Test Triggers: Validate trigger logic before deployment.
Note: You can add more than one trigger to a runbook, depending on the use case.
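To illustrate what a condition based on the frequency of events can look like, here is a minimal sliding-window sketch in Python. The `FrequencyTrigger` class and its parameters are hypothetical and exist only for this example; in the product, conditions are configured through trigger templates, not code.

```python
from collections import deque
from typing import Deque


class FrequencyTrigger:
    """Fires when at least `threshold` events arrive within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one event and report whether the trigger condition is met."""
        self._timestamps.append(timestamp)
        # Drop events that have fallen out of the window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Three alerts within 60 seconds meet the condition on the third event.
trigger = FrequencyTrigger(threshold=3, window_seconds=60)
fired = [trigger.record(t) for t in (0, 30, 50)]
```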

Trigger Configuration Best Practices
- Avoid Trigger Overlap: Ensure multiple runbooks don't trigger simultaneously for the same event.
- Use Appropriate Delays: Add delays between related triggers to prevent conflicts.
- Test Thoroughly: Validate trigger conditions in non-production environments.
- Monitor Execution: Track trigger effectiveness and adjust conditions as needed.
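One way to check for trigger overlap before deployment is to replay a sample event against every runbook's condition and see how many match. The sketch below assumes conditions can be modeled as Python predicates; the runbook names and event fields are hypothetical.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def matching_runbooks(triggers: Dict[str, Callable[[Event], bool]], event: Event) -> List[str]:
    """Return the names of all runbooks whose trigger condition matches the event."""
    return sorted(name for name, condition in triggers.items() if condition(event))


triggers = {
    "High CPU Alert Response": lambda e: e.get("metric") == "cpu" and e.get("value", 0) > 90,
    "Generic Critical Alert": lambda e: e.get("severity") == "critical",
    "Database Connection Recovery": lambda e: e.get("metric") == "db_connections",
}

# A critical CPU alert matches two runbooks, which signals an overlap to resolve.
overlap = matching_runbooks(triggers, {"metric": "cpu", "value": 97, "severity": "critical"})
```

More than one name in the result means the conditions should be narrowed or a delay added between the related triggers.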
Step 4: Test Your Runbook
Thorough testing is essential before deploying runbooks to production. A well-tested runbook prevents failures during critical incidents and ensures reliable automation.
Testing Steps
- Select an Alert or Incident: Go to AI SRE → Alerts or Incidents in your Harness platform, then select the alert or incident you want to test.
- Select a Runbook: Click the Runbooks tab and select the runbook you want to test.
- Execute Runbook: If no runbook is associated with the alert or incident, click Execute Runbook to begin the testing process.
- Test Runbook: Click Execute to begin the testing process.

Pre-Production Testing
1. Environment Preparation
- Test Environment: Set up a dedicated testing environment that mirrors production.
- Test Data: Prepare realistic test scenarios and data sets.
- Integration Sandboxes: Use test instances of integrated tools (Slack, Jira, etc.).
- Mock Services: Create mock endpoints for external dependencies.
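A mock endpoint for an external dependency can be very small. The following sketch uses only the Python standard library to stand in for a ticketing API during tests. The `/tickets` path, the `MOCK-1` identifier, and the response shape are invented for this example and do not correspond to any real Jira or ServiceNow API.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MockTicketHandler(BaseHTTPRequestHandler):
    """Accepts any POST and answers with a fixed, fake ticket ID."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"id": "MOCK-1", "summary": payload.get("summary")}).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def create_ticket_against_mock(summary: str):
    """Start the mock on a free local port, send one request, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), MockTicketHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        request = urllib.request.Request(
            f"http://127.0.0.1:{server.server_port}/tickets",
            data=json.dumps({"summary": summary}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # Bypass any configured proxy so the request stays on localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=5) as response:
            return response.status, json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()
```

Pointing a custom HTTP action at a mock like this lets you exercise the request and response handling without creating real tickets.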
2. Functional Testing
- Action Validation: Verify each action executes correctly with expected parameters.
- Sequence Testing: Confirm actions execute in the correct order.
- Variable Passing: Validate that variables are correctly passed between actions.
- Error Handling: Test failure scenarios and error recovery mechanisms.
3. Integration Testing
- Notification Delivery: Confirm all notifications reach intended recipients.
- Pipeline Executions: Verify that triggered pipelines complete successfully.
- API Responses: Check that external API calls return expected results.
- Authentication: Ensure all integrations authenticate properly.
4. End-to-End Testing
- Complete Workflows: Execute full runbook scenarios from trigger to completion.
- Multiple Scenarios: Test various input combinations and edge cases.
- Performance Testing: Measure execution times and resource usage.
- Concurrent Execution: Test behavior when multiple instances run simultaneously.
Testing Checklist
- All actions execute without errors.
- Notifications are delivered to correct channels/recipients.
- Variables are properly populated and passed.
- External integrations respond as expected.
- Error conditions are handled gracefully.
- Execution logs provide sufficient detail for troubleshooting.
- Performance meets acceptable thresholds.
- Security permissions are correctly enforced.
Step 5: Deploy and Monitor
Once testing is complete, deploy your runbook to production and establish monitoring to ensure continued effectiveness.
Deployment Process
- Final Review: Conduct a final review of runbook configuration and testing results.
- Stakeholder Approval: Obtain necessary approvals from the team.
- Production Deployment: Activate the runbook in your production environment.
- Documentation Update: Update operational documentation with runbook details.
Best Practices for Runbook Creation
Design Principles
- Start Simple: Begin with basic workflows and gradually add complexity as you gain experience.
- Modular Design: Create reusable actions and workflows that can be combined for different scenarios.
- Clear Naming: Use descriptive names for runbooks, actions, and variables that clearly indicate their purpose.
Operational Excellence
- Regular Updates: Review and update runbooks regularly to reflect changes in infrastructure and processes.
- Timeout Configuration: Set appropriate timeouts to prevent runbooks from hanging indefinitely.
- Conditional Logic: Use conditional statements to avoid unnecessary action execution.
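The idea behind timeout configuration can be sketched in a few lines of Python. This is a generic illustration, not how the platform enforces timeouts: the `run_with_timeout` helper is hypothetical, and it only stops waiting for a slow action; it does not cancel the work already in progress.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_timeout(action, timeout_seconds, *args):
    """Run action(*args) in a worker thread and give up waiting after the timeout."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(action, *args)
    try:
        return ("ok", future.result(timeout=timeout_seconds))
    except FutureTimeout:
        # The worker thread is abandoned, not killed; it finishes in the background.
        return ("timed_out", None)
    finally:
        executor.shutdown(wait=False)


fast = run_with_timeout(lambda: "restarted", 1.0)
slow = run_with_timeout(time.sleep, 0.05, 0.5)  # sleeps 0.5 s, limit is 0.05 s
```

A runbook that records "timed_out" for a step can then move on to an escalation action instead of hanging.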
Troubleshooting Common Issues
Execution Failures
Problem: Runbook actions fail to execute
- Solution: Check integration credentials and network connectivity.
- Prevention: Implement health checks and credential rotation.
Problem: Variables not passing between actions
- Solution: Verify variable names and data types match expectations.
- Prevention: Use consistent naming conventions and validate variable mappings.
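Validating variable mappings can be done before a runbook ever runs. The sketch below walks a sequence in order and reports any variable an action needs that no earlier step provides. The tuple layout `(action name, needed variables, provided variables)` and the sample names are assumptions for this example.

```python
from typing import Iterable, List, Tuple

ActionSpec = Tuple[str, List[str], List[str]]  # (name, needs, provides)


def find_unmapped_inputs(input_names: Iterable[str], actions: List[ActionSpec]) -> List[Tuple[str, str]]:
    """Return (action, variable) pairs where the variable is not yet available."""
    available = set(input_names)
    problems = []
    for name, needs, provides in actions:
        for variable in needs:
            if variable not in available:
                problems.append((name, variable))
        available.update(provides)  # outputs become available to later actions
    return problems


problems = find_unmapped_inputs(
    ["incident_id"],
    [
        ("create_channel", ["incident_id"], ["channel_id"]),
        ("post_summary", ["channel_id", "summary"], []),
    ],
)
```

Here `post_summary` expects a `summary` variable that nothing supplies, which is the kind of mismatch that otherwise shows up only at execution time.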
Performance Issues
Problem: Runbooks execute slowly
- Solution: Optimize action sequences and enable parallel execution where possible.
- Prevention: Run regular performance tests and monitor execution times.
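To show why parallel execution helps, here is a generic Python sketch that runs independent actions concurrently. It assumes the actions do not depend on each other's output; the `run_parallel` helper and the sample actions are hypothetical and are not a Harness feature or API.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


def run_parallel(actions: List[Callable[[], Any]]) -> List[Any]:
    """Run independent zero-argument actions concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max(len(actions), 1)) as pool:
        futures = [pool.submit(action) for action in actions]
        return [future.result() for future in futures]


def slow_action(label: str) -> Callable[[], str]:
    def action() -> str:
        time.sleep(0.1)  # stands in for a network call
        return label
    return action


# Three 0.1 s actions finish in roughly 0.1 s instead of 0.3 s when run together.
results = run_parallel([slow_action("notify"), slow_action("ticket"), slow_action("metrics")])
```

Actions that consume another action's output variables still need to run in sequence, so only the independent steps are candidates for this treatment.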
Next Steps
Advanced Configuration
- Configure Authentication: Set up secure access to integrated tools and services.
- Configure Incident Fields: Customize incident data collection and processing.
- Return to Overview: Explore additional runbook capabilities and features.
Integration Setup Guides
Communication & Collaboration
- Slack Integration: Complete setup guide for Slack automation.
- Microsoft Teams Integration: Configure Teams notifications and collaboration.
- Zoom Integration: Set up automated meeting creation and management.
Incident Management
- Jira Integration: Automate issue tracking and project management.
- ServiceNow Integration: Integrate with enterprise service management.
Automation & Pipelines
- Harness Pipelines Integration: Execute deployment and remediation pipelines.
Need Help? Contact our support team or visit the Harness Documentation for additional resources and troubleshooting guides.