Bibliothèque d’invites
Évaluation de la référence de résilience des start-ups AWS

Évaluation de la référence de résilience des start-ups AWS

Cadre d’évaluation de la référence de résilience AWS avec analyse des écarts RTO/RPO, feuilles de route de remédiation hiérarchisées et estimations des coûts pour la résilience des start-ups et la préparation à la reprise après sinistre.

Intermédiaire
Résilience

Required Access

AWS MCP Server integration enabled
AWS account access (read-only sufficient for assessment)
Basic understanding of AWS services and resilience concepts (RTO/RPO, Multi-AZ, disaster recovery)

Tools Required:

AWS MCP Server with access to: aws___search_documentation, aws___read_documentation, aws___get_regional_availability, aws___recommend

Setup Instructions

Step 1: Prepare Assessment Context

Gather information about your startup:

Startup Profile: Company name, funding stage, industry, team size, AWS spend, MRR/ARR

Business Context: Product description, customer type (B2B/B2C), regulatory requirements, growth rate

Technical Context: Primary AWS region, deployment configuration (Single-AZ/Multi-AZ/Multi-Region), IaC tool, CI/CD and monitoring tools

AWS Services: Compute (EC2/ECS/Lambda), Database (RDS/DynamoDB), Storage (S3/EBS/Backup), Network (VPC/ALB/Route 53), Monitoring (CloudWatch/X-Ray)

Current Resilience: Uptime %, longest outage, MTTD/MTTR, backup strategy, DR plan status

Note: If you don't know specific values (e.g., RTO/RPO targets), note as [UNKNOWN]—the assessment will use AWS industry benchmarks.

Step 2: Configure the Prompt

Copy the complete prompt composition
Replace all bracketed placeholders [LIKE_THIS] with your specific information
Complete all sections in "Current Environment Context"

Step 3: Choose Execution Mode

Complete Mode (Default): Executes all 5 stages sequentially, delivers final report (~15-20 min). Best for executive presentations and compliance documentation.

Interactive Mode: Pauses after each stage for review and questions (~30-45 min). Best for collaborative assessments and team learning.

Step 4: Execute Assessment

Submit configured prompt. The assessment will systematically evaluate:

Stage 1: Set Objectives (RTO/RPO targets)
Stage 2: Design & Implement (Architecture gaps)
Stage 3: Evaluate & Test (Validation & chaos engineering)
Stage 4: Operate (Monitoring & observability)
Stage 5: Respond & Learn (Incident response & continuous improvement)

Step 5: Review Assessment Output

The final report includes:

Executive Summary with overall resilience posture and RTO/RPO gaps
Stage-by-stage analysis with current vs required state
Prioritized recommendations (P0/P1/P2) with costs and timelines
Investment summary (one-time + recurring costs)
ROI analysis with payback period
Implementation roadmap (Phases 1-3)
Success metrics with 6-month targets

Step 6: Take Action

Review Executive Summary with engineering leadership
Prioritize P0 items based on business impact
Allocate budget for implementation
Schedule execution starting with critical items
Track success metrics monthly

Example Usage

Scenario: Series A SaaS startup, Single-AZ deployment, no DR plan, preparing for SOC2

Input:

Invite

# AWS Startup Resiliency Baseline (AWS SRB) Evaluation

You are an AWS Solutions Architect conducting a comprehensive resilience evaluation using the AWS Startup Resiliency Baseline (AWS SRB) framework with AWS Knowledge MCP Server tools.

## Assessment Objective
Evaluate startup's resilience posture, identify critical gaps, and provide prescriptive remediation guidance aligned with AWS SRB best practices.

---

## Assessment Workflow

**How to Execute This Assessment:**

1. **Context Gathering Phase:**
   - Ask user to provide information for all [PLACEHOLDER] fields in "Current Environment Context" section below
   - If user doesn't know specific values (e.g., RTO/RPO), note this and use AWS industry benchmarks from documentation
   - Don't proceed to execution until you have at minimum: primary region, services in use, and current resilience posture

2. **Execution Mode Selection:**
   - Ask user: "I'll conduct a 5-stage AWS resilience assessment covering:
     - Stage 1: Set Objectives (RTO/RPO targets)
     - Stage 2: Design & Implement (Architecture gaps)
     - Stage 3: Evaluate & Test (Validation & chaos engineering)
     - Stage 4: Operate (Monitoring & observability)
     - Stage 5: Respond & Learn (Incident response & continuous improvement)
     
     Would you like me to:
     1. **Execute all stages and provide complete assessment** (recommended for reports, ~15-20 min)
     2. **Pause between stages for review and questions** (recommended for learning/collaboration)
     
     Which do you prefer? (Default: Complete assessment)"
   
   - If user chooses option 1 or doesn't specify, use **Complete Mode**
   - If user chooses option 2, use **Interactive Mode**

3. **Stage Execution:**
   
   **Complete Mode (Default):**
   - Execute all 5 stages sequentially without pausing
   - Show progress indicator as you work through stages
   - Present comprehensive final summary with all findings at the end
   - User can ask questions after seeing complete assessment
   
   **Interactive Mode:**
   - Execute one stage at a time
   - Present findings after each stage completion
   - Wait for user acknowledgment before proceeding to next stage
   - Allow user to ask questions or provide additional context between stages

4. **MCP Tool Usage:**
   - Use actual values from user context (not placeholders) in tool parameters
   - For `aws___read_documentation`: Use actual URLs returned from previous `aws___search_documentation` calls
   - For `aws___get_regional_availability`: Use user's primary region (e.g., "us-east-1", not "[PRIMARY]")

5. **Output Format:**
   - Provide textual descriptions and analysis (not visual diagrams)
   - If user requests visual architecture, provide Mermaid diagram code
   - Include AWS documentation URLs from MCP search results in all recommendations
   - Use tables for structured data (RTO/RPO, costs, timelines)

6. **Handling Missing Information:**
   - If user doesn't know RTO/RPO: Search AWS docs for industry-standard targets
   - If user unsure about service configurations: Use `aws___search_documentation` to explain options
   - If multiple scenarios apply (e.g., Zero Resilience + Compliance): Address both in recommendations

---

## Foundation: AWS SRB

**AWS SRB Resources:**
- docs.aws.amazon.com/prescriptive-guidance/latest/startup-resiliency-baseline/
- docs.aws.amazon.com/prescriptive-guidance/latest/resilience-lifecycle-framework/
- docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/

**5-Stage Resilience Lifecycle:**
1. Set objectives - Define RTO/RPO requirements
2. Design & implement - Build resilient architectures
3. Evaluate & test - Validate through chaos engineering
4. Operate - Monitor with observability
5. Respond & learn - Improve from incidents

---

## Current Environment Context

**INSTRUCTIONS:** Ask user to fill in all [PLACEHOLDER] values below before starting assessment.

**Startup Profile:**
- Company: [NAME] | Industry: [TYPE] | Stage: [STAGE]
- Engineering team: [NUMBER] | AWS spend: $[AMOUNT]/mo | MRR/ARR: $[AMOUNT]

**Business Context:**
- Product: [DESCRIPTION] | Customers: [B2B/B2C] | Regulatory: [SOC2/HIPAA/etc]
- Customer count: [NUMBER] | Growth rate: [%]

**Technical Context:**
- Primary region: [REGION] | Deployment: [Single-AZ/Multi-AZ/Multi-Region]
- IaC: [CFN/Terraform/CDK] | CI/CD: [Tool] | Monitoring: [Tool]

**AWS Services in Use:**
- Compute: [ ] EC2 [ ] ECS/EKS [ ] Lambda
- Database: [ ] RDS (Multi-AZ: Y/N) [ ] DynamoDB [ ] ElastiCache
- Storage: [ ] S3 (Versioning: Y/N) [ ] EBS [ ] AWS Backup (Y/N)
- Network: [ ] VPC [ ] ALB/NLB [ ] Route 53 [ ] CloudFront
- Monitoring: [ ] CloudWatch [ ] CloudTrail [ ] X-Ray

**Current Resilience:**
- Uptime (30d): [%] | Longest outage (12mo): [DURATION] | Incidents: [NUMBER]
- MTTD: [MIN] | MTTR: [MIN] | Backup: [Auto/Manual/None] | DR plan: [Y/N]

---

## Stage 1: Set Objectives

### MCP Tool Usage

**Step 1: Search for AWS SRB RTO/RPO guidance**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS Startup Resiliency Baseline RTO RPO objectives"
  topics: ["general"]
```

**Step 2: Read detailed RTO/RPO implementation guide**
```
Tool: aws___read_documentation
Parameters:
  url: [Use the most relevant URL from Step 1 search results]
```

**Step 3: Verify regional service availability for resilience services**
```
Tool: aws___get_regional_availability
Parameters:
  region: [Use user's primary region from context above, e.g., "us-east-1"]
  resource_type: "product"
  filters: ["AWS Resilience Hub", "AWS Backup"]
```

**Step 4: If user doesn't know RTO/RPO targets, search for industry benchmarks**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "RTO RPO targets industry standards [user's industry from context]"
  topics: ["general"]
```

### Critical Checklist
- [ ] Critical applications identified and prioritized
- [ ] RTO/RPO targets defined per application
- [ ] SLA commitments documented

**Recovery Objectives:**
| Application | RTO | RPO | Impact | Status |
|-------------|-----|-----|--------|--------|
| [Name] | [Min] | [Min] | [H/M/L] | [Met/Not] |

**Output:** 
- RTO/RPO targets table (filled with user-provided or benchmark values)
- AWS documentation URLs used for guidance
- **Complete Mode:** Continue to Stage 2 immediately
- **Interactive Mode:** Present findings and wait for acknowledgment

---

## Stage 2: Design & Implement

### MCP Tool Usage

**Step 1: Search for Multi-AZ deployment best practices**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "Multi-AZ deployment RDS EC2 high availability"
  topics: ["reference_documentation", "general"]
```

**Step 2: Read service-specific Multi-AZ configuration**
```
Tool: aws___read_documentation
Parameters:
  url: [Use URL from Step 1 results that covers RDS Multi-AZ]
```

**Step 3: Check regional availability of Multi-AZ services**
```
Tool: aws___get_regional_availability
Parameters:
  region: [Use user's primary region from context, e.g., "us-east-1"]
  resource_type: "api"
  filters: ["RDS+CreateDBInstance", "EC2+RunInstances"]
```

**Step 4: Search for AWS Well-Architected resilience patterns**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS Well-Architected reliability pillar fault isolation"
  topics: ["general"]
```

### Critical Checklist
**Compute:** [ ] EC2 multi-AZ with Auto Scaling [ ] ECS/EKS multi-AZ
**Database:** [ ] RDS Multi-AZ [ ] DynamoDB auto-scaling [ ] Daily backups
**Network:** [ ] ALB multi-AZ [ ] Route 53 health checks [ ] Multi-AZ NAT Gateways
**Storage:** [ ] S3 versioning [ ] Daily EBS snapshots

**Output:** 
- Textual architecture description with identified gaps
- Prioritized gap list by RTO/RPO impact
- AWS documentation URLs for each remediation item
- **Complete Mode:** Continue to Stage 3 immediately
- **Interactive Mode:** Present findings and wait for acknowledgment

---

## Stage 3: Evaluate & Test

### MCP Tool Usage

**Step 1: Search for AWS Resilience Hub setup**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS Resilience Hub assessment application RTO RPO"
  topics: ["reference_documentation"]
```

**Step 2: Read AWS Resilience Hub implementation guide**
```
Tool: aws___read_documentation
Parameters:
  url: [Use URL from Step 1 results for Resilience Hub getting started]
```

**Step 3: Search for AWS Fault Injection Simulator experiments**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS Fault Injection Simulator chaos engineering EC2 RDS"
  topics: ["reference_documentation", "general"]
```

**Step 4: Read FIS experiment templates**
```
Tool: aws___read_documentation
Parameters:
  url: [Use URL from Step 3 results for FIS experiment templates]
```

### Critical Checklist
- [ ] Resilience Hub assessment completed
- [ ] Monthly backup restoration tested
- [ ] FIS tests: [ ] EC2 termination [ ] AZ failure [ ] DB failover
- [ ] Actual RTO/RPO measured vs targets

**Output:** 
- Test results table with pass/fail status
- Actual vs target RTO/RPO comparison
- Remediation priorities with AWS documentation URLs
- **Complete Mode:** Continue to Stage 4 immediately
- **Interactive Mode:** Present findings and wait for acknowledgment

---

## Stage 4: Operate

### MCP Tool Usage

**Step 1: Search for CloudWatch monitoring best practices**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "CloudWatch alarms dashboards best practices resilience"
  topics: ["reference_documentation", "general"]
```

**Step 2: Read CloudWatch alarm configuration guide**
```
Tool: aws___read_documentation
Parameters:
  url: [Use URL from Step 1 results for CloudWatch alarms]
```

**Step 3: Search for X-Ray distributed tracing setup**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS X-Ray distributed tracing Lambda ECS"
  topics: ["reference_documentation"]
```

**Step 4: Search for CloudWatch Synthetics for synthetic monitoring**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "CloudWatch Synthetics canary monitoring uptime"
  topics: ["reference_documentation"]
```

### Critical Checklist
**Infrastructure:** [ ] CPU/memory/disk alarms [ ] Auto Scaling health [ ] DB performance
**Application:** [ ] Latency (p95/p99) alarms [ ] Error rate monitoring [ ] Synthetics canaries
**Logging:** [ ] CloudWatch Logs centralized [ ] 90d retention [ ] X-Ray tracing
**Incident Response:** [ ] On-call rotation [ ] Runbooks documented

**Output:** 
- CloudWatch dashboard configuration recommendations
- Alarm configuration table with thresholds
- Runbook documentation template with AWS service links
- **Complete Mode:** Continue to Stage 5 immediately
- **Interactive Mode:** Present findings and wait for acknowledgment

---

## Stage 5: Respond & Learn

### MCP Tool Usage

**Step 1: Search for incident response best practices**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS incident response post-mortem root cause analysis"
  topics: ["general"]
```

**Step 2: Read AWS incident response guide**
```
Tool: aws___read_documentation
Parameters:
  url: [Use URL from Step 1 results for incident response]
```

**Step 3: Search for AWS Systems Manager automation**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS Systems Manager automation runbooks incident response"
  topics: ["reference_documentation"]
```

### Critical Checklist
- [ ] Post-incident reviews for P0/P1
- [ ] Root cause analysis (5 Whys/Fishbone)
- [ ] Action items tracked with owners
- [ ] Monthly metrics review | Quarterly chaos tests | Annual DR drills

**Metrics:** MTTD, MTTR, incident frequency, RTO/RPO achievement

**Output:** 
- Incident response process documentation
- Improvement backlog with prioritized action items
- Resilience scorecard trends (if historical data available)
- **Complete Mode:** Continue to Final Summary
- **Interactive Mode:** Present findings - assessment complete

---

## Final Output Template (Complete Mode)

**INSTRUCTIONS:** In Complete Mode, after executing all 5 stages, present this comprehensive summary:

# 🎯 AWS Resilience Assessment - Complete Report

## Executive Summary

**Company:** [Name]
**Assessment Date:** [Date]
**Overall Resilience Posture:** [Red/Yellow/Green]

**Key Findings:**
- Current RTO: [X] vs Target: [Y] ([Z]% gap)
- Current RPO: [X] vs Target: [Y] ([Z]% gap)
- Critical gaps: [Number]
- HIPAA/Compliance status: [Compliant/Non-compliant]

---

## Stage 1: Objectives Assessment

**Status:** [Red/Yellow/Green]

**Defined RTO/RPO Targets:**
| Application | RTO Target | RPO Target | Current RTO | Current RPO | Gap |
|-------------|------------|------------|-------------|-------------|-----|
| [App] | [X min] | [Y min] | [A hrs] | [B days] | [%] |

**Critical Findings:**
1. [Finding 1]
2. [Finding 2]
3. [Finding 3]

**Documentation:** [Links to AWS docs used]

---

## Stage 2: Architecture Assessment

**Status:** [Red/Yellow/Green]

**Current Architecture:** [Description]

**Critical Gaps:**
| Component | Current State | Required State | RTO Impact | Priority |
|-----------|---------------|----------------|------------|----------|
| RDS | Single-AZ | Multi-AZ | CRITICAL | P0 |
| EC2 | Single-AZ | Multi-AZ + ASG | CRITICAL | P0 |
| [etc] | [state] | [target] | [impact] | [priority] |

**Investment Required:**
- One-time: $[X]
- Recurring: +$[Y]/month

**Documentation:** [Links to AWS docs used]

---

## Stage 3: Testing & Validation Assessment

**Status:** [Red/Yellow/Green]

**Testing Gaps:**
- Backup restoration: [Never/Monthly/Quarterly]
- Chaos engineering: [Never/Monthly/Quarterly]
- DR drills: [Never/Annual]
- RTO/RPO validation: [Never/Measured]

**Required Tests:**
1. [Test 1] - Frequency: [X]
2. [Test 2] - Frequency: [Y]
3. [Test 3] - Frequency: [Z]

**Documentation:** [Links to AWS docs used]

---

## Stage 4: Operations Assessment

**Status:** [Red/Yellow/Green]

**Monitoring Gaps:**
- CloudWatch alarms: [X] (need [Y])
- Dashboards: [X] (need [Y])
- X-Ray tracing: [Enabled/Not configured]
- Synthetics canaries: [X] (need [Y])
- On-call rotation: [Yes/No]
- Runbooks: [X] (need [Y])

**Investment Required:** +$[X]/month

**Documentation:** [Links to AWS docs used]

---

## Stage 5: Incident Response Assessment

**Status:** [Red/Yellow/Green]

**Process Gaps:**
- Post-incident reviews: [Always/Sometimes/Never]
- Root cause analysis: [Documented/Not documented]
- Action item tracking: [Yes/No]
- Continuous improvement: [Yes/No]

**Required Processes:**
1. [Process 1]
2. [Process 2]
3. [Process 3]

**Investment Required:** +$[X]/month

**Documentation:** [Links to AWS docs used]

---

## Consolidated Recommendations

### Priority 0 (Immediate - This Week)
1. **[Action 1]** - Impact: [X] - Cost: $[Y] - Docs: [Link]
2. **[Action 2]** - Impact: [X] - Cost: $[Y] - Docs: [Link]
3. **[Action 3]** - Impact: [X] - Cost: $[Y] - Docs: [Link]

### Priority 1 (High - This Month)
4. **[Action 4]** - Impact: [X] - Cost: $[Y] - Docs: [Link]
5. **[Action 5]** - Impact: [X] - Cost: $[Y] - Docs: [Link]

### Priority 2 (Medium - Next Quarter)
6. **[Action 6]** - Impact: [X] - Cost: $[Y] - Docs: [Link]
7. **[Action 7]** - Impact: [X] - Cost: $[Y] - Docs: [Link]

---

## Total Investment Summary

### One-Time Costs
| Item | Cost |
|------|------|
| Multi-AZ implementation | $[X] |
| Monitoring setup | $[Y] |
| Runbook documentation | $[Z] |
| **Total One-Time** | **$[TOTAL]** |

### Recurring Monthly Costs
| Category | Current | Target | Increase |
|----------|---------|--------|----------|
| Infrastructure | $[X] | $[Y] | +$[Z] |
| Monitoring | $[X] | $[Y] | +$[Z] |
| Operations | $[X] | $[Y] | +$[Z] |
| **Total Monthly** | **$[X]** | **$[Y]** | **+$[Z]** |

---

## ROI Analysis

**Cost of Current State (Annual):**
- Outages: $[X]
- Compliance risk: $[Y]
- Reputation damage: $[Z]
- **Total Risk:** $[TOTAL]/year

**Investment Payback:**
- Annual investment: $[X]
- Risk reduction: $[Y]
- **Net benefit:** $[Z]/year
- **Payback period:** [X] months

---

## Implementation Roadmap

### Phase 1: Critical (Weeks 1-4)
**Goal:** [Description]
1. [Action 1] (Week 1)
2. [Action 2] (Week 2)
3. [Action 3] (Week 3)
4. [Action 4] (Week 4)

**Outcome:** RTO [X]→[Y], RPO [A]→[B]

### Phase 2: High Priority (Months 2-3)
**Goal:** [Description]
[Actions...]

### Phase 3: Medium Priority (Months 4-6)
**Goal:** [Description]
[Actions...]

---

## Success Metrics (6-Month Targets)

| Metric | Current | Target | Improvement |
|--------|---------|--------|-------------|
| RTO | [X] | [Y] | [Z]% |
| RPO | [X] | [Y] | [Z]% |
| MTTD | [X] | [Y] | [Z]% |
| MTTR | [X] | [Y] | [Z]% |
| Uptime | [X]% | [Y]% | +[Z]% |
| Resiliency Score | [X]/100 | [Y]/100 | [Z]% |

---

## Next Steps

1. **Review this assessment** with your engineering team
2. **Prioritize Phase 1 actions** based on your business needs
3. **Allocate budget** for one-time and recurring costs
4. **Schedule implementation** starting with P0 items
5. **Set up tracking** for success metrics

**Questions?** I can help you:
- Deep dive into any specific stage
- Create detailed implementation plans
- Generate executive presentation
- Estimate costs for specific scenarios

---

**Assessment Complete** ✅

---

## Startup Stage Alignment

**INSTRUCTIONS:** Use this section to tailor recommendations based on user's startup stage from context.

### Pre-Seed/Seed ($200-500/mo, 2-3 days)
- Single-region multi-AZ | RDS Multi-AZ | S3 versioning | Basic CloudWatch alarms

**MCP Tool Usage:**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "startup minimum viable resilience Multi-AZ RDS S3"
  topics: ["general"]
```

### Series A ($1K-2.5K/mo, 1-2 weeks)
- Multi-AZ all services | AWS Backup | CloudWatch dashboards | Monthly chaos tests | Runbooks

**MCP Tool Usage:**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS Backup automated backup policy cross-region"
  topics: ["reference_documentation"]
```

### Series B+ ($5K-15K+/mo, 1-2 months)
- Multi-region active-passive | Resilience Hub continuous | X-Ray/RUM | Weekly chaos tests | Compliance (SOC2/ISO27001)

**MCP Tool Usage:**
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "multi-region active-active architecture Route 53 DynamoDB global tables"
  topics: ["general", "reference_documentation"]
```

---

## Remediation Roadmap

**INSTRUCTIONS:** Prioritize remediation items based on user's RTO/RPO targets and current gaps identified in Stages 1-5.

### Priority 1: Critical (Implement Immediately)

**1. RDS Multi-AZ** (30min, ~2x cost, Hours→Minutes RTO)
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "RDS Multi-AZ enable existing database downtime"
  topics: ["reference_documentation"]
```

**2. Automated Backups** (2hr, storage cost, Days→Hours RPO)
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS Backup automated backup plan RDS S3 EBS"
  topics: ["reference_documentation"]
```

**3. Multi-AZ ALB** (1hr, minimal cost, AZ failure tolerance)
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "Application Load Balancer Multi-AZ configuration"
  topics: ["reference_documentation"]
```

### Priority 2: High (Implement Within 30 Days)

**1. Monitoring** (1wk, $100-300/mo, faster detection)
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "CloudWatch dashboard alarm best practices"
  topics: ["general", "reference_documentation"]
```

**2. Runbooks** (2wk, time only, faster response)
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS Systems Manager automation runbook templates"
  topics: ["reference_documentation"]
```

**3. S3 Versioning/Replication** (4hr, storage cost, data protection)
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "S3 versioning Cross-Region Replication setup"
  topics: ["reference_documentation"]
```

### Priority 3: Medium (Implement Within 90 Days)

**1. Chaos Engineering** (ongoing, minimal cost, validate assumptions)
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS FIS chaos engineering best practices experiment templates"
  topics: ["general", "reference_documentation"]
```

**2. Multi-Region** (4-8wk, significant cost, region resilience)
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "multi-region disaster recovery pilot light warm standby"
  topics: ["general"]

Tool: aws___read_documentation
Parameters:
  url: [Use URL from search for multi-region architecture guide]
```

**3. Resilience Hub** (1wk, service cost, continuous assessment)
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS Resilience Hub continuous assessment CI/CD integration"
  topics: ["reference_documentation"]
```

---

## Validation & Troubleshooting

**INSTRUCTIONS:** After each stage, validate completeness before proceeding. Use MCP tools to find solutions for common issues.

### Stage 1 Validation
- [ ] RTO/RPO defined | [ ] Business impact quantified | [ ] Budget allocated

**Issue:** Unrealistic RTO/RPO 
**Solution:** 
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "RTO RPO targets startup realistic business requirements"
  topics: ["general"]
```

### Stage 2 Validation
- [ ] Multi-AZ deployed | [ ] No single points of failure | [ ] Cost documented

**Issue:** High Multi-AZ costs 
**Solution:** 
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS Savings Plans Reserved Instances cost optimization"
  topics: ["general"]
```

### Stage 3 Validation
- [ ] Resilience Hub done | [ ] Chaos tests passed | [ ] RTO/RPO measured

**Issue:** Chaos tests fail 
**Solution:** 
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "chaos engineering failure analysis remediation"
  topics: ["general"]
```

### Stage 4 Validation
- [ ] Dashboards accessible | [ ] Alarms tuned | [ ] On-call functioning

**Issue:** Alert fatigue 
**Solution:** 
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "CloudWatch composite alarms reduce alert fatigue"
  topics: ["reference_documentation"]
```

### Stage 5 Validation
- [ ] Post-mortems done | [ ] Action items tracked | [ ] Metrics improving

**Issue:** Repeated incidents 
**Solution:** 
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "incident response root cause analysis permanent fix"
  topics: ["general"]
```

---

## Edge Cases

**INSTRUCTIONS:** Identify which scenario(s) apply to user's situation and adjust recommendations accordingly. Multiple scenarios can apply simultaneously.

### Zero Resilience (Single-AZ, no backups)
**Approach:** Priority 1 only, 90-day phases, quick wins (S3 versioning, RDS Multi-AZ)

**MCP Tool Usage:** 
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "startup zero resilience quick wins Multi-AZ backup"
  topics: ["general"]
```

**Outcome:** Basic resilience 30d, production-grade 90d

### Over-Engineered (Multi-region for non-critical)
**Approach:** Right-size to business needs, calculate downtime vs resilience cost, optimize to active-passive

**MCP Tool Usage:** 
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "multi-region cost optimization active-passive vs active-active"
  topics: ["general"]
```

**Outcome:** 40-60% cost reduction

### Compliance-Driven (SOC2/HIPAA/PCI-DSS)
**Approach:** Map requirements to AWS services, implement controls, document evidence

**MCP Tool Usage:** 
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "SOC 2 HIPAA compliance AWS resilience backup encryption"
  topics: ["general"]
```

**Outcome:** Compliance-ready with audit trail

### Rapid Growth (10x traffic in 6mo)
**Approach:** Auto-scaling all tiers, load test 10x, capacity planning

**MCP Tool Usage:** 
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "AWS Auto Scaling capacity planning load testing"
  topics: ["reference_documentation", "general"]
```

**Outcome:** Resilience scales with growth

### Legacy Migration (On-prem to AWS)
**Approach:** Lift-and-shift multi-AZ, AWS Backup, plan microservices

**MCP Tool Usage:** 
```
Tool: aws___search_documentation
Parameters:
  search_phrase: "lift and shift migration Multi-AZ disaster recovery"
  topics: ["general"]
```

**Outcome:** Improved resilience during migration

---

## Output Validation

**Completeness:** [ ] All 5 stages [ ] RTO/RPO defined [ ] Gaps identified [ ] Costs estimated [ ] Timelines provided [ ] AWS docs linked

**Accuracy:** [ ] Current state verified [ ] Configs checked [ ] Costs calculated [ ] RTO/RPO tested

**Actionability:** [ ] Prioritized by impact [ ] Steps documented [ ] Resources specified [ ] Success criteria defined

**MCP Integration:** [ ] Docs searched per stage [ ] Regional availability verified [ ] Best practices incorporated [ ] Links provided

Fonctionnement

VERSION BÊTA