The Internet of Things on AWS – Official Blog

High availability patterns for AWS IoT Greengrass using Pacemaker

Edge computing downtime in industrial IoT environments can be both inconvenient and costly. Systems at the edge require continuous operation to maintain business continuity. While AWS IoT Greengrass delivers powerful edge computing capabilities, achieving true enterprise-grade high availability requires additional orchestration. This post shows how to use Pacemaker, a cluster resource manager, to build resilient edge infrastructure with automated failover.

In this walkthrough, you’ll learn to implement active/passive and active/active high availability patterns using Pacemaker with AWS IoT Greengrass, complete with automated failover, state replication, and monitoring integration.

The high availability challenge for edge computing

Traditional cloud applications benefit from built-in redundancy and auto-scaling. Applications at the edge, however, face unique challenges:

  • Physical isolation: Edge devices operate in remote locations with limited connectivity
  • Resource constraints: Unlike cloud environments, edge resources are finite and precious
  • Service criticality: Edge failures can halt physical operations immediately
  • Recovery complexity: Manual intervention at remote sites is expensive and slow

AWS IoT Greengrass addresses many edge computing challenges, but high availability requires thoughtful architecture beyond a single device deployment.

How Pacemaker enhances AWS IoT Greengrass

Pacemaker helps you build highly available AWS IoT Greengrass deployments through cluster management capabilities:

Proven reliability

  • Used in mission-critical environments for over a decade
  • Handles complex failure scenarios with sophisticated fencing mechanisms
  • Works in both active/passive and active/active configurations

AWS IoT Greengrass-aware resource management

  • Monitors Greengrass service health and component states
  • Manages shared storage for seamless state transfer
  • Coordinates failover of dependent services and network resources

Enterprise-ready integration

  • Integrates with existing Linux infrastructure management
  • Supports complex dependency chains and resource constraints
  • Provides detailed logging and monitoring for compliance requirements

Together, these tools keep your edge workloads running during hardware failures or network disruptions.

Architecture overview: High availability patterns

AWS IoT Greengrass high availability can be implemented using two primary patterns, each optimized for different use cases.

Active/Passive configuration: Maximizing data consistency

This mode prioritizes data consistency with automated failover, which makes it a good fit for mission-critical applications where data integrity and service continuity are paramount. One node runs Greengrass actively while the other waits in standby. A software-based, block-level replication service such as Distributed Replicated Block Device (DRBD) keeps the Greengrass state synchronized between the nodes, so the standby can take over without data loss while keeping the same device identity.


┌─────────────────┐    ┌─────────────────┐ 
│   Primary Node  │    │  Standby Node   │ 
│                 │    │                 │ 
│ ┌─────────────┐ │    │ ┌─────────────┐ │ 
│ │ Greengrass  │ │    │ │ Greengrass  │ │ 
│ │   ACTIVE    │ │    │ │  STANDBY    │ │ 
│ └─────────────┘ │    │ └─────────────┘ │ 
│                 │    │                 │ 
│ ┌─────────────┐ │    │ ┌─────────────┐ │ 
│ │   DRBD      │◄┼────┼►│   DRBD      │ │ 
│ │  Primary    │ │    │ │ Secondary   │ │ 
│ └─────────────┘ │    │ └─────────────┘ │ 
└─────────────────┘    └─────────────────┘ 
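
As a rough illustration of how the pieces in this diagram could be expressed in Pacemaker, the following sketch uses the `pcs` CLI to define a promotable DRBD resource, a filesystem on top of it, and the Greengrass systemd service, tied together with colocation and ordering constraints. It is not taken from the repository: the resource names (`gg_drbd`, `gg_fs`, `gg_core`), the DRBD resource name `greengrass`, the device `/dev/drbd0`, the `ext4` filesystem, and the choice to mount the Greengrass root at `/greengrass/v2` are all assumptions. The role keyword `Promoted` follows pcs 0.11; older pcs releases use `master` instead.

```shell
# Sketch only: resource names, device, mount point, and filesystem type are assumptions.
# Set PCS="echo pcs" for a dry run that prints the commands instead of running them.
PCS="${PCS:-pcs}"

define_greengrass_ap_stack() {
  # DRBD device managed by the LINBIT resource agent, monitored in both roles.
  $PCS resource create gg_drbd ocf:linbit:drbd drbd_resource=greengrass \
    op monitor interval=29s role=Promoted monitor interval=31s role=Unpromoted
  # Allow one promoted (primary) instance across the two nodes.
  $PCS resource promotable gg_drbd promoted-max=1 promoted-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true

  # Filesystem holding the Greengrass root (certificates, config, Stream Manager data).
  $PCS resource create gg_fs ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/greengrass/v2 fstype=ext4

  # The Greengrass nucleus, run through its systemd unit.
  $PCS resource create gg_core systemd:greengrass op monitor interval=30s

  # Mount only where DRBD is promoted, and only after promotion.
  $PCS constraint colocation add gg_fs with Promoted gg_drbd-clone INFINITY
  $PCS constraint order promote gg_drbd-clone then start gg_fs

  # Start Greengrass only on the node that has the filesystem mounted.
  $PCS constraint colocation add gg_core with gg_fs INFINITY
  $PCS constraint order gg_fs then gg_core
}
```

Running `PCS="echo pcs" define_greengrass_ap_stack` prints the eight commands for review. The ordering constraints are what make failover safe: Greengrass never starts on a node until DRBD has been promoted there and the filesystem is mounted.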

Key benefits:

  • Complete state preservation during failover, with sub-minute downtime
  • Zero data loss for in-flight transactions and critical operations
  • Device identity, certificates, and Stream Manager persistence carried over to the standby node

Real-world use cases:

Active/Passive configurations suit scenarios that require zero or minimal data loss. Examples include in-flight entertainment systems that process payments offline, and battery manufacturing facilities where production lines depend on a continuous flow of data from critical manufacturing sensors and ML model outputs to maintain operational integrity and quality control.

Active/Active: Maximum throughput and scalability

This mode maximizes throughput and provides horizontal scaling for high-volume workloads. Multiple independent Greengrass instances run simultaneously across cluster nodes, with intelligent load balancing distributing work based on node health and capacity. Each node operates with its own unique device credentials and configurations.


┌─────────────────┐    ┌─────────────────┐ 
│   Node 1        │    │   Node 2        │ 
│                 │    │                 │ 
│ ┌─────────────┐ │    │ ┌─────────────┐ │ 
│ │ Greengrass  │ │    │ │ Greengrass  │ │ 
│ │   ACTIVE    │ │    │ │   ACTIVE    │ │ 
│ └─────────────┘ │    │ └─────────────┘ │ 
│                 │    │                 │ 
│ ┌─────────────┐ │    │ ┌─────────────┐ │ 
│ │Load Balancer│◄┼────┼►│Load Balancer│ │ 
│ │ (A/P Mode)  │ │    │ │ (Standby)   │ │ 
│ └─────────────┘ │    │ └─────────────┘ │ 
└─────────────────┘    └─────────────────┘ 
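
One way this layout might be expressed with `pcs` is sketched below: the Greengrass systemd unit is cloned so that it runs on every node, while a floating IP address and the load balancer are grouped so that they run on only one node at a time. This is an assumption-heavy sketch rather than the repository's procedure. HAProxy as the load balancer, the address `192.0.2.10/24`, and the resource names (`gg_core`, `lb_vip`, `lb_service`, `lb_group`) are all hypothetical, and the floating IP assumes the nodes share a local network segment. Each node is expected to already have its own Greengrass installation and credentials.

```shell
# Sketch only: load balancer choice, IP address, and resource names are assumptions.
# Set PCS="echo pcs" for a dry run that prints the commands instead of running them.
PCS="${PCS:-pcs}"

define_greengrass_aa_stack() {
  # Run the local Greengrass systemd unit on every cluster node (one clone instance per node).
  $PCS resource create gg_core systemd:greengrass op monitor interval=30s clone

  # Floating address that clients use to reach the load balancer (documentation-range IP).
  $PCS resource create lb_vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24 \
    op monitor interval=10s

  # Load balancer process; HAProxy is an assumption, not named in this post.
  $PCS resource create lb_service systemd:haproxy op monitor interval=10s

  # Group members start in order and stay together on a single node (active/passive).
  $PCS resource group add lb_group lb_vip lb_service
}
```

If the node hosting `lb_group` fails, Pacemaker moves the address and the load balancer to the surviving node, where the local Greengrass instance is already running.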

Key benefits:

These configurations enable horizontal scaling for high-throughput scenarios, improve resource utilization across nodes, and provide graceful degradation under partial failures.

Real-world use cases:

Active/Active configurations are ideal for high-volume scenarios such as automotive parts manufacturing facilities and large-scale manufacturing operations with multiple production lines, where each node handles different line segments to provide both redundancy and increased processing capacity for real-time analytics and anomaly detection.

Configuration selection guide

Use Active/Passive for applications that require zero data loss, shared state, and device identity preservation. This pattern works well when you need a single point of control and can accept failover times under one minute.

Use Active/Active when you need high throughput and horizontal scaling. This pattern suits applications that can operate independently without shared state, where load distribution provides operational benefits, and where graceful degradation is preferable to complete failover.

How to implement the solution

The complete playbook, including detailed configuration examples and testing procedures, is available in the GitHub repository. It automates an Active/Passive implementation using Ansible, which you can customize for your specific requirements. Active/Active setup steps are available in MANUAL-SETUP-GUIDE within the same repository.

Setup steps

1. Environment setup

Clone the repository and set up the development environment.

git clone https://github.com/aws-samples/sample-greengrass-ha-pacemaker.git
cd sample-greengrass-ha-pacemaker
./scripts/setup-dev-env.sh && source .venv/bin/activate

2. Configure cluster secrets

Generate and encrypt cluster credentials using Ansible Vault.

# Create vault password file
echo "your_secure_password" > .vault_pass
chmod 600 .vault_pass
# Auto-generate encrypted secrets
./scripts/setup-vault.sh

This creates `vars/cluster-vault.yml` with encrypted credentials for cluster authentication and DRBD replication.
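
If you prefer not to leave a literal password in your shell history, the password file can be generated randomly instead, and the output of `setup-vault.sh` can be sanity-checked before you continue. The following is a minimal sketch, assuming `openssl` is installed. The helper names `generate_vault_pass` and `is_vault_encrypted` are hypothetical and not part of the repository. The check relies on the fact that files encrypted by Ansible Vault begin with a `$ANSIBLE_VAULT;` header line.

```shell
# Write a random password to a file (default: .vault_pass).
# The umask makes a newly created file readable only by the current user.
generate_vault_pass() {
  ( umask 077 && openssl rand -base64 32 > "${1:-.vault_pass}" )
}

# Return 0 if the file starts with the Ansible Vault header, non-zero otherwise.
is_vault_encrypted() {
  head -n 1 "$1" | grep -q '^\$ANSIBLE_VAULT;'
}

# Example usage after running ./scripts/setup-vault.sh:
#   is_vault_encrypted vars/cluster-vault.yml && echo "vault file is encrypted"
#   ansible-vault view vars/cluster-vault.yml --vault-password-file .vault_pass
```

The `ansible-vault view` command in the usage comment decrypts the file to the terminal, which confirms that the password file and the vault match.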

3. Prepare Greengrass credentials

Note: This approach is designed for testing and demonstration purposes only.

Download the Greengrass installation files from the AWS IoT console:

  1. Navigate to AWS IoT Core console → Greengrass → Core devices
  2. Click ‘Set up one core device’ → ‘Set up a device with installer download’
  3. Name your device (e.g., ‘greengrass-ha-device’)
  4. Select or create a Thing Group
  5. Download both files and rename them:
    1. Rename hash-setup.sh to greengrass-setup.sh
    2. Rename hash.zip to greengrass-certs.zip
  6. Place files in `files/greengrass/` directory
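
Before moving on, it can help to confirm that both renamed files are where the playbooks expect them. The function below is a small pre-flight sketch. The name `check_greengrass_files` is hypothetical, and the check only verifies that the two files exist and are not empty, not that their contents are valid.

```shell
# Verify that the renamed installer files are present and non-empty.
# Usage: check_greengrass_files [directory]   (default: files/greengrass)
check_greengrass_files() {
  dir="${1:-files/greengrass}"
  missing=0
  for f in greengrass-setup.sh greengrass-certs.zip; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage from the repository root:
#   check_greengrass_files && echo "Greengrass files ready"
```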

4. Deploy and configure

This step deploys Amazon EC2 instances and the supporting resources needed to test the cluster on AWS.

# Deploy infrastructure
make cdk-deploy && make cdk-inventory
# Retrieve SSH private key
./scripts/get-ssh-key.sh
# Configure HA cluster
ansible-playbook playbooks/setup/system-prerequisites.yml -i inventory/cdk-dev-hosts
ansible-playbook playbooks/setup/configure-ha.yml -i inventory/cdk-dev-hosts --vault-password-file .vault_pass

5. Validate and test

Check the cluster status and, optionally, run an automated failover test.

# Check cluster status
ansible node-1 -i inventory/cdk-dev-hosts -m shell -a "pcs status" --become
# Test failover (optional)
ansible-playbook playbooks/testing/test-failover-simulation.yml -i inventory/cdk-dev-hosts --vault-password-file .vault_pass

The automated tests validate resource migration, DRBD promotion, and data consistency during failover.
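
A failover can also be exercised by hand to see how long recovery takes. The sketch below is one hedged way to do that: put the active node into standby, then time how long it takes for Greengrass to become active on the other node. It assumes the second node is named `node-2` (only `node-1` appears in the commands above), that Greengrass runs as the `greengrass` systemd unit, and that you run the commands on the surviving node. The helper name `wait_for` is hypothetical.

```shell
# Poll a command once per second until it succeeds, then print the elapsed seconds.
# Returns 1 if the command has not succeeded within the timeout.
# Usage: wait_for <timeout_seconds> <command> [args...]
wait_for() {
  timeout="$1"; shift
  start="$(date +%s)"
  until "$@" >/dev/null 2>&1; do
    now="$(date +%s)"
    if [ $((now - start)) -ge "$timeout" ]; then
      echo "timed out after ${timeout}s" >&2
      return 1
    fi
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}

# Example, run on node-2 while node-1 is the active node:
#   sudo pcs node standby node-1
#   wait_for 60 systemctl is-active --quiet greengrass   # seconds until Greengrass runs here
#   sudo pcs node unstandby node-1
```

The 60-second timeout mirrors the sub-minute failover target described earlier. `pcs node unstandby` returns the original node to the cluster once the test is complete.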

Cleanup

This will destroy the resources created by CDK.

# Destroy infrastructure
make cdk-destroy

Conclusion: Enterprise-ready edge computing

AWS IoT Greengrass and Pacemaker together provide the high availability needed for mission-critical edge deployments. By using Pacemaker’s cluster management capabilities, organizations can confidently deploy Greengrass where reliability is essential.

Whether you’re managing industrial control systems, processing real-time analytics, or orchestrating edge AI workloads, this architectural pattern provides the foundation for resilient, scalable edge computing that your business can depend on.

Next steps

Ready to implement enterprise-grade high availability for your AWS IoT Greengrass deployments? Here’s your path forward:

Repository: sample-greengrass-ha-pacemaker (https://github.com/aws-samples/sample-greengrass-ha-pacemaker)

About the authors

Yong Ji is a Senior Solutions Architect at Amazon Web Services (AWS), helping enterprises build innovative cloud-based solutions. With over 25 years of experience in cloud architecture, analytics, and data engineering, Yong brings deep technical expertise and a passion for solving complex business challenges. Outside of work, Yong is a passionate table tennis player.

Siddhant Srivastava is a Software Development Engineer with AWS IoT Greengrass. He has 3+ years of experience in edge computing, with a focus on building resilient, scalable distributed systems. Outside of work, Siddhant participates in soccer leagues and billiards tournaments.