The Internet of Things on AWS – Official Blog
High availability patterns for AWS IoT Greengrass using Pacemaker
Edge computing downtime in industrial IoT environments can be both inconvenient and costly. Systems at the edge require continuous operation to maintain business continuity. While AWS IoT Greengrass delivers powerful edge computing capabilities, achieving true enterprise-grade high availability requires additional orchestration. This post shows how to use Pacemaker, a cluster resource manager, to build resilient edge infrastructure with automated failover.
In this walkthrough, you’ll learn to implement active/passive and active/active high availability patterns using Pacemaker with AWS IoT Greengrass, complete with automated failover, state replication, and monitoring integration.
The high availability challenge for edge computing
Traditional cloud applications benefit from built-in redundancy and auto-scaling. Applications at the edge, however, face unique challenges:
- Physical isolation: Edge devices operate in remote locations with limited connectivity
- Resource constraints: Unlike cloud environments, edge resources are finite and precious
- Service criticality: Edge failures can halt physical operations immediately
- Recovery complexity: Manual intervention at remote sites is expensive and slow
AWS IoT Greengrass addresses many edge computing challenges, but high availability requires thoughtful architecture beyond a single device deployment.
How Pacemaker enhances AWS IoT Greengrass
Pacemaker helps you build highly available AWS IoT Greengrass deployments through cluster management capabilities:
Proven reliability
- Used in mission-critical environments for over a decade
- Handles complex failure scenarios with sophisticated fencing mechanisms
- Works in both active/passive and active/active configurations
AWS IoT Greengrass-aware resource management
- Monitors Greengrass service health and component states
- Manages shared storage for seamless state transfer
- Coordinates failover of dependent services and network resources
Enterprise-ready integration
- Integrates with existing Linux infrastructure management
- Supports complex dependency chains and resource constraints
- Provides detailed logging and monitoring for compliance requirements
Together, these tools keep your edge workloads running during hardware failures or network disruptions.
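To make the health-monitoring idea concrete, here is a minimal Python sketch of a monitor check that maps a systemd unit state to the standard OCF exit codes Pacemaker expects from a resource agent. It is an illustration only, not the agent used in the repository. The unit name `greengrass`, the state-to-code mapping, and the function names are assumptions.

```python
import subprocess
from typing import Callable

# Standard OCF resource agent exit codes understood by Pacemaker.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def systemd_state(unit: str) -> str:
    """Return the state printed by `systemctl is-active`, e.g. "active"."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit only means the unit is not active
    )
    return result.stdout.strip()


def monitor(unit: str = "greengrass",
            get_state: Callable[[str], str] = systemd_state) -> int:
    """Translate a unit state into an OCF monitor result.

    Assumed mapping: "active" is healthy, "inactive" is cleanly stopped,
    and every other state (for example "failed") is a generic error.
    """
    state = get_state(unit)
    if state == "active":
        return OCF_SUCCESS
    if state == "inactive":
        return OCF_NOT_RUNNING
    return OCF_ERR_GENERIC
```

On a cluster node, calling `monitor()` queries systemd directly. Passing a different `get_state` callable lets you exercise the mapping without a running service.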
Architecture overview: High availability patterns
AWS IoT Greengrass high availability can be implemented using two primary patterns, each optimized for different use cases.
Active/Passive configuration: Maximizing data consistency
This mode prioritizes data consistency with automated failover, which makes it well suited to mission-critical applications where data integrity and service continuity are paramount. One node runs Greengrass actively while the other stands ready in standby mode. A software-based, block-level data replication service such as Distributed Replicated Block Device (DRBD) keeps state synchronized between nodes, enabling failover with zero data loss while maintaining device identity.
Key benefits:
- Complete state preservation during failover, with sub-minute downtime
- Zero data loss for in-flight transactions and critical operations
- Device identity, certificates, and Stream Manager persistence maintained across failover
Real-world use cases:
Active/Passive configurations are essential in scenarios requiring zero or minimal data loss. Examples include in-flight entertainment systems that handle offline payment processing, and battery manufacturing facilities where production lines depend on continuous data flow from critical manufacturing sensors and ML model outputs to maintain operational integrity and quality control.
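The Active/Passive pattern depends on resources starting and stopping in a strict order. The Python sketch below models that sequence under an assumed three-resource stack (replicated device, filesystem, Greengrass service). The resource names and the `failover_plan` helper are hypothetical. In a real cluster, Pacemaker ordering and colocation constraints express this, not application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    description: str


# Hypothetical stack, listed in the order it must start on a node.
STACK = [
    Resource("drbd", "promote the replicated block device to primary"),
    Resource("filesystem", "mount the replicated volume holding Greengrass state"),
    Resource("greengrass", "start the Greengrass service"),
]


def start_order(stack):
    """Resources start bottom-up: storage first, service last."""
    return [resource.name for resource in stack]


def stop_order(stack):
    """Resources stop in the reverse of their start order."""
    return [resource.name for resource in reversed(stack)]


def failover_plan(stack, failed_node, standby_node, failed_node_reachable):
    """List the steps needed to move the stack to the standby node."""
    steps = []
    if failed_node_reachable:
        # Clean handover: stop everything on the old node first.
        steps += [f"stop {name} on {failed_node}" for name in stop_order(stack)]
    else:
        # The old node cannot confirm it stopped, so it is fenced
        # before the standby is allowed to take over the data.
        steps.append(f"fence {failed_node}")
    steps += [f"start {name} on {standby_node}" for name in start_order(stack)]
    return steps
```

For example, `failover_plan(STACK, "node1", "node2", False)` yields a fence step followed by the three start steps on `node2`. Fencing an unreachable node first is what prevents both nodes from writing to the replicated volume at once.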
Active/Active: Maximum throughput and scalability
This mode maximizes throughput and provides horizontal scaling for high-volume workloads. Multiple independent Greengrass instances run simultaneously across cluster nodes, with intelligent load balancing distributing work based on node health and capacity. Each node operates with its own unique device credentials and configurations.
Key benefits:
- Horizontal scaling for high-throughput scenarios
- Improved resource utilization across nodes
- Graceful degradation under partial failures
Real-world use cases:
Active/Active configurations are ideal for high-volume scenarios such as automotive parts manufacturing facilities and large-scale manufacturing operations with multiple production lines, where each node handles different line segments to provide both redundancy and increased processing capacity for real-time analytics and anomaly detection.
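As a rough illustration of distributing work by node health and capacity, the following Python sketch assigns work items to healthy nodes in proportion to a capacity weight. The `Node` type, the capacity numbers, and the greedy assignment rule are assumptions made for this example, not the load-balancing logic of any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    capacity: int        # relative weight, e.g. 2 means twice the share of 1
    healthy: bool = True


def distribute(items, nodes):
    """Assign each item to the healthy node with the lowest relative load."""
    healthy = [node for node in nodes if node.healthy and node.capacity > 0]
    if not healthy:
        raise RuntimeError("no healthy nodes available")

    assignment = {node.name: [] for node in healthy}
    for item in items:
        # Relative load if this node took one more item; ties go to the
        # node listed first.
        target = min(
            healthy,
            key=lambda node: (len(assignment[node.name]) + 1) / node.capacity,
        )
        assignment[target.name].append(item)
    return assignment
```

With a node of capacity 2 and a node of capacity 1, six production lines split four and two. If the smaller node is marked unhealthy, all six lines go to the remaining node, which is the graceful degradation described above.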
Configuration selection guide
Use Active/Passive for applications that require zero data loss, shared state, and device identity preservation. This pattern works well when you need a single point of control and can accept failover times under one minute.
Use Active/Active when you need high throughput and horizontal scaling. This pattern suits applications that can operate independently without shared state, where load distribution provides operational benefits, and graceful degradation is preferable to complete failover.
How to implement the solution
The complete playbook, including detailed configuration examples and testing procedures, is available in the GitHub repository. It automates an Active/Passive implementation with Ansible, which you can customize for your specific requirements. Active/Active setup steps are available in MANUAL-SETUP-GUIDE within the same repository.
Setup steps
1. Environment setup
Clone the repository and set up the development environment
2. Configure cluster secrets
Generate and encrypt cluster credentials using Ansible Vault
This creates `vars/cluster-vault.yml` with encrypted credentials for cluster authentication and DRBD replication.
3. Prepare Greengrass credentials
Note: This approach is designed for testing and demonstration purposes only.
Download the Greengrass installation files from the AWS IoT console.
- Navigate to AWS IoT Core console → Greengrass → Core devices
- Click ‘Set up one core device’ → ‘Set up a device with installer download’
- Name your device (e.g., ‘greengrass-ha-device’)
- Select or create a Thing Group
- Download both files and rename them:
  - Rename hash-setup.sh to greengrass-setup.sh
  - Rename hash.zip to greengrass-certs.zip
- Place the files in the `files/greengrass/` directory
4. Deploy and configure
This deploys Amazon EC2 instances and the other resources needed to test the setup on AWS.
5. Validate and test
Check cluster status and, optionally, run an automated failover test.
The automated tests validate resource migration, DRBD promotion, and data consistency during failover.
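If you want an additional, independent check of data consistency, one option is to hash a directory of test files before failover and compare it afterward on the new active node. The Python sketch below is a generic illustration and is not part of the repository's test suite. The function names are hypothetical, and you would point it at a directory of marker files on the replicated volume. Avoid hashing a live directory whose logs change constantly.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = str(path.relative_to(root))
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(before, after):
    """Report files that are missing, added, or changed between snapshots."""
    shared = set(before) & set(after)
    return {
        "missing": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "changed": sorted(name for name in shared if before[name] != after[name]),
    }
```

Take one snapshot on the active node before triggering failover and another on the promoted node afterward. An empty `missing` and `changed` list suggests the marker files survived replication intact.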
Cleanup
This destroys the resources created by the AWS CDK.
Conclusion: Enterprise-ready edge computing
AWS IoT Greengrass and Pacemaker together provide the high availability needed for mission-critical edge deployments. By using Pacemaker’s cluster management capabilities, organizations can confidently deploy Greengrass where reliability is essential.
Whether you’re managing industrial control systems, processing real-time analytics, or orchestrating edge AI workloads, this architectural pattern provides the foundation for resilient, scalable edge computing that your business can depend on.
Next steps
Ready to implement enterprise-grade high availability for your AWS IoT Greengrass deployments? Here’s your path forward:
Repository: sample-greengrass-ha-pacemaker
- AWS IoT Greengrass documentation
- Pacemaker documentation
- DRBD user’s guide
- High Availability cluster best practices
About the authors
Yong Ji is a Senior Solutions Architect at Amazon Web Services (AWS), helping enterprises build innovative cloud-based solutions. With over 25 years of experience in cloud architecture, analytics and data engineering, Yong brings deep technical expertise and a passion for solving complex business challenges. Outside of work, Yong is a passionate table tennis player.
Siddhant Srivastava is a Software Development Engineer with AWS IoT Greengrass. He has 3+ years of experience in edge computing with a focus on building resilient, scalable distributed systems. Outside work, Siddhant participates in soccer leagues and billiards tournaments.