Migration & Modernization

Replacing Netflix Conductor with AWS Step Functions: What We Learned

Introduction

Accelerate your business transformation by migrating from Netflix Conductor to AWS Step Functions. This strategic move offers significant benefits:

  • Cost Reduction: Eliminate infrastructure management costs and take advantage of AWS’s pay-per-use model.
  • Increased Reliability: Benefit from AWS’s fully managed service with a 99.9% availability SLA.
  • Enhanced Scalability: Scale up to 1 million open executions per account per Region, with quotas that can be raised on request.
  • Improved Developer Productivity: Use visual workflow design and native AWS service integrations.

Following Netflix’s announcement that it discontinued active development of Conductor in December 2023, organizations need a clear migration path that ensures business continuity while gaining Step Functions’ operational and cost advantages. The open-source project continues under Orkes and community stewardship, but organizations seeking a fully managed AWS-native solution will find Step Functions a strong fit. This guide provides a step-by-step approach to transition your critical workflows.

Organizations currently using Netflix Conductor, whether the open-source version or the commercial Orkes offering, will find a clear migration path to AWS Step Functions in this guide.

By migrating to AWS Step Functions, you can:

  • Use native SDK integrations with 220+ AWS services without custom glue code.
  • Future-proof your orchestration layer with ongoing AWS innovations.

In this post, we’ll share what we learned while helping a customer successfully migrate from Conductor to Step Functions. You’ll learn:

  • A step-by-step migration approach that minimizes disruption.
  • Key architectural decisions that set you up for success.
  • Best practices for evaluating and executing the transition.

We’ve also created a complete Conductor to Step Functions Migration GitHub repository you can use as a starting point.

Prerequisites

Before beginning your migration from Netflix Conductor to AWS Step Functions, ensure you have:

Note: Ensure that sensitive data such as API keys, credentials, and personally identifiable information (PII) are never hardcoded in workflow definitions or task worker code. Use AWS Secrets Manager or AWS Systems Manager Parameter Store for secrets management.

Understanding the Landscape: Netflix Conductor vs. AWS Step Functions

Netflix Conductor Overview

Netflix Conductor is an open-source workflow orchestration engine developed by Netflix to handle its microservice orchestration needs. It is designed to:

  • Coordinate tasks across microservices.
  • Handle complex workflows with branching and dynamic behavior.
  • Provide visibility into workflow execution.
  • Scale to Netflix’s massive operational requirements.

Conductor uses a JSON-based domain-specific language (DSL) to define workflows and tasks, with a focus on flexibility and extensibility.

Orkes: The Commercial Version of Netflix Conductor

Orkes is a commercial offering based on the Netflix Conductor framework, providing:

  • Fully Managed Service: Orkes Cloud provides a SaaS offering with zero infrastructure management.
  • Enhanced Security: Enterprise-grade security features including SSO, RBAC, and audit logs.
  • Advanced UI: Improved workflow designer and monitoring dashboards.
  • Premium Support: 24/7 enterprise support with SLAs.
  • Scalability: Automatic scaling to handle enterprise workloads.
  • Pre-built Integrations: Ready-to-use connectors for popular services and APIs.
  • Compliance: SOC 2 Type II compliance and other enterprise certifications.
  • Hybrid Deployment: Options for cloud, on-premises, or hybrid deployments.

Orkes maintains full compatibility with the open-source Netflix Conductor while adding enterprise features that make it more comparable to AWS Step Functions in terms of managed service capabilities.

AWS Step Functions Overview

AWS Step Functions is a fully managed service for coordinating distributed applications and microservices using visual workflows. Key features include:

  • Visual workflow designer with Amazon States Language (ASL).
  • Native integration with 220+ AWS services.
  • Built-in error handling and retry mechanisms.
  • Pay-per-use pricing model.
  • Automatic scaling and high availability, with the ability to orchestrate other AWS services that scale automatically.
  • Comprehensive observability through Amazon CloudWatch and AWS X-Ray.

Step Functions offers two types of workflows:

  1. Standard: For long-running, durable processes; an execution can run for up to one year.
  2. Express: For high-volume, short-running processes; an execution can run for up to five minutes.
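As a rough aid for choosing between the two types during the workflow inventory, the rule of thumb can be written down as a small function. This is a minimal sketch, not an official decision procedure: the function name `pickWorkflowType` and its parameters are hypothetical, and it considers only maximum duration (five minutes for Express, one year for Standard) and whether a workflow needs exactly-once execution, which Standard workflows provide.

```typescript
type WorkflowType = "STANDARD" | "EXPRESS";

// Documented maximum execution durations for each workflow type.
const EXPRESS_MAX_SECONDS = 5 * 60;
const STANDARD_MAX_SECONDS = 365 * 24 * 60 * 60;

// Hypothetical helper: pick a workflow type from two coarse criteria.
function pickWorkflowType(
  maxDurationSeconds: number,
  needsExactlyOnce: boolean
): WorkflowType {
  if (maxDurationSeconds > STANDARD_MAX_SECONDS) {
    throw new Error("Longer than one year: split the workflow into stages");
  }
  // Anything that can outlive five minutes, or that must not run twice,
  // belongs on Standard in this sketch.
  if (needsExactlyOnce || maxDurationSeconds > EXPRESS_MAX_SECONDS) {
    return "STANDARD";
  }
  return "EXPRESS";
}

// Example: a transcoding job that can take an hour.
console.log(pickWorkflowType(60 * 60, false));
```

Cost and supported integration patterns also differ between the two types, so treat the result as a first filter rather than a final answer.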

Key Differences and Advantages

The following table provides a quick comparison of the three orchestration platforms:

| # | Feature | Netflix Conductor (Open Source) | Orkes (Commercial) | AWS Step Functions |
|---|---------|---------------------------------|--------------------|--------------------|
| 1 | Deployment Model | Self-hosted | Fully managed SaaS / Hybrid | Fully managed |
| 2 | Infrastructure Management | Manual (Amazon EC2, Amazon RDS, Elasticsearch) | Zero | Zero |
| 3 | AWS Service Integration | Custom HTTP workers required | Pre-built connectors | Native (220+ services) |
| 4 | Pricing Model | Infrastructure + operational costs | Subscription-based tiers | Pay-per-use (state transitions) |
| 5 | Scalability | Manual configuration | Automatic | Automatic |
| 6 | Visual Designer | Basic UI | Enhanced workflow designer | AWS Console + Workflow Studio |
| 7 | Error Handling | Manual retry configuration | Enhanced with monitoring | Built-in Retry/Catch/Fallback |
| 8 | Security & Compliance | Basic authentication | SSO, RBAC, SOC 2 Type II | IAM, VPC, AWS CloudTrail, HIPAA |
| 9 | Monitoring | Custom setup required | Advanced dashboards | CloudWatch, X-Ray integration |
| 10 | State Management | Global workflow context | Enhanced with larger payloads | Explicit data passing (InputPath/OutputPath) |
| 11 | Workflow Definition | JSON DSL | JSON DSL (compatible) | Amazon States Language (ASL) |
| 12 | Multi-cloud Support | Yes | Yes | AWS only |
| 13 | Typical Monthly Cost | $1,200+ (small deployment) | Varies by tier | ~$200 (for 1,000 workflows/month, usage-based) |
| 14 | Operational Overhead | High (patching, scaling, monitoring) | Low (managed service) | None (fully managed) |
| 15 | Support Model | Community | 24/7 enterprise support | AWS Support plans |

The key advantages of Step Functions are native service integration (eliminating custom HTTP workers), declarative error handling with Retry/Catch blocks, and zero operational overhead. See the service integration code comparison and error handling example in our reference implementation.
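To make "declarative error handling" concrete, here is a minimal sketch of a Task state with Retry and Catch blocks, written as a TypeScript object so it can be serialized to ASL JSON. It is not taken from the reference implementation: the Lambda function name and the state names `MarkFailed` and `StartTranscode` are hypothetical placeholders. The `retryDelays` helper only illustrates how `IntervalSeconds` and `BackoffRate` combine, ignoring optional settings such as a maximum delay or jitter.

```typescript
// Only the ASL fields used in this sketch, not the full schema.
interface RetryRule {
  ErrorEquals: string[];
  IntervalSeconds: number;
  MaxAttempts: number;
  BackoffRate: number;
}

interface CatchRule {
  ErrorEquals: string[];
  ResultPath: string;
  Next: string;
}

interface TaskState {
  Type: "Task";
  Resource: string;
  Parameters: Record<string, string>;
  Retry: RetryRule[];
  Catch: CatchRule[];
  Next: string;
}

// Hypothetical state: function name and target states are placeholders.
const extractMetadataState: TaskState = {
  Type: "Task",
  Resource: "arn:aws:states:::lambda:invoke",
  Parameters: {
    FunctionName: "media-info-extractor",
    "Payload.$": "$",
  },
  Retry: [
    {
      ErrorEquals: ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
      IntervalSeconds: 2,
      MaxAttempts: 3,
      BackoffRate: 2,
    },
  ],
  // Anything still failing after the retries is routed to a failure state.
  Catch: [{ ErrorEquals: ["States.ALL"], ResultPath: "$.error", Next: "MarkFailed" }],
  Next: "StartTranscode",
};

// Wait before each retry: IntervalSeconds * BackoffRate^(retry index).
function retryDelays(rule: RetryRule): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < rule.MaxAttempts; attempt++) {
    delays.push(rule.IntervalSeconds * Math.pow(rule.BackoffRate, attempt));
  }
  return delays;
}

console.log(JSON.stringify(extractMetadataState, null, 2));
```

In Conductor, the equivalent behavior is typically spread across task definition settings and worker code; here it sits next to the state it protects.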

Note: For organizations with regulatory requirements (e.g., HIPAA, SOC 2, PCI DSS), AWS Step Functions supports compliance through IAM policies, CloudTrail audit logging, and encryption. Review the AWS Compliance Programs page for the latest certifications applicable to your workloads.

Migration Strategy

Step 1: Analyze and Map Workflows

Start by inventorying your Conductor workflows and mapping constructs to Step Functions equivalents. See the complete construct mapping table in our Conductor to Step Functions Migration GitHub repository.

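| # | Conductor Construct | Step Functions Equivalent |
|---|---------------------|---------------------------|
| 1 | Simple Task | Task State |
| 2 | Switch/Decision Task | Choice State |
| 3 | Fork/Join | Parallel State |
| 4 | Sub-workflow | Nested workflow execution |
| 5 | Wait Task | Wait State |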
Step 2: Migrate Task Workers

Convert Conductor HTTP task workers to Lambda functions. The core business logic stays the same; what changes is the wrapper. See the before/after task worker code.

Where possible, replace custom integration code with native Step Functions service integrations.
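The "same logic, different wrapper" idea can be sketched as follows. This is an illustrative example rather than code from the repository: `describeMedia`, `httpWorker`, and `handlerSketch` are hypothetical names, the HTTP wrapper is shown without a web framework, and the business logic is deliberately trivial.

```typescript
interface WorkerInput {
  key: string;
  sizeBytes: number;
}

interface MediaSummary {
  format: string;
  sizeBytes: number;
}

// Business logic: unchanged by the migration.
function describeMedia(key: string, sizeBytes: number): MediaSummary {
  const dot = key.lastIndexOf(".");
  const format = dot >= 0 ? key.slice(dot + 1).toLowerCase() : "unknown";
  return { format, sizeBytes };
}

// Before (sketch): an HTTP worker parses the request body sent by the
// Conductor task and serializes the result into an HTTP response.
function httpWorker(requestBody: string): { statusCode: number; body: string } {
  const input: WorkerInput = JSON.parse(requestBody);
  const result = describeMedia(input.key, input.sizeBytes);
  return { statusCode: 200, body: JSON.stringify(result) };
}

// After (sketch): a Lambda handler receives the state input as its event
// and returns the result directly. In a real Lambda module this function
// would be exported as the handler.
const handlerSketch = async (event: WorkerInput): Promise<MediaSummary> =>
  describeMedia(event.key, event.sizeBytes);

handlerSketch({ key: "raw/clip.MOV", sizeBytes: 1024 }).then((summary) =>
  console.log(summary)
);
```

With the `lambda:invoke` service integration, the returned object appears under a `Payload` field in the task result, which is worth accounting for when wiring up the next state.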

Step 3: Convert Workflow Definitions

Translate Conductor JSON DSL to Amazon States Language (ASL). Key differences to account for:

  • Data flow: Conductor uses a global workflow context (${workflow.input.*}). Step Functions uses explicit InputPath, OutputPath, and ResultPath for precise data routing between states.
  • Error handling: Add retry and catch blocks declaratively in ASL rather than configuring retries per task in Conductor.
  • State management: Conductor accumulates state globally; Step Functions passes data explicitly between states.
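The data-flow difference is where most hand-editing tends to happen, so a small translation helper can be useful. The sketch below rewrites Conductor expressions into JSONPath under one assumed convention, which is not the only option: the workflow input stays at the root of the state data, and each task's result is stored under `$.<taskReferenceName>` through `ResultPath`. The function names are hypothetical, and only string-valued `inputParameters` with whole-value expressions are handled.

```typescript
// Translate one Conductor expression into a JSONPath string.
// Returns null for plain literals; throws for expressions it cannot map.
function toJsonPath(value: string): string | null {
  if (value.indexOf("${") !== 0) {
    return null; // plain literal, keep as-is
  }
  const match = /^\$\{([A-Za-z0-9_]+)\.(input|output)\.([A-Za-z0-9_.]+)\}$/.exec(value);
  if (match !== null) {
    const source = match[1];
    const kind = match[2];
    const path = match[3];
    if (source === "workflow" && kind === "input") {
      return "$." + path; // workflow input lives at the root
    }
    if (source !== "workflow" && kind === "output") {
      return "$." + source + "." + path; // task result stored via ResultPath
    }
  }
  throw new Error("Unsupported Conductor expression: " + value);
}

// Build an ASL Parameters block from Conductor inputParameters.
// Keys that take a JSONPath value get the ".$" suffix that ASL requires.
function toAslParameters(inputParameters: Record<string, string>): Record<string, string> {
  const params: Record<string, string> = {};
  const keys = Object.keys(inputParameters);
  for (let i = 0; i < keys.length; i++) {
    const key = keys[i];
    const value = inputParameters[key];
    const path = toJsonPath(value);
    if (path === null) {
      params[key] = value;
    } else {
      params[key + ".$"] = path;
    }
  }
  return params;
}

console.log(
  toAslParameters({
    bucket: "${workflow.input.bucket}",
    duration: "${media_info_ref.output.duration}",
    preset: "proxy",
  })
);
```

Throwing on unsupported expressions is a deliberate choice here: a silent pass-through would leave a literal `${...}` string in the state input, which is harder to spot than a failed conversion.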

Important: When converting workflow definitions, review the data you pass between states to ensure you do not inadvertently expose sensitive information in execution logs. Use Step Functions’ logging configuration to control the level of detail the service captures and avoid passing secrets or PII directly in state payloads.

See the full workflow conversion example for a side-by-side comparison.

Step 4: Phased Cutover

We recommend a gradual migration:

| # | Phase | Duration | Activities |
|---|-------|----------|------------|
| 1 | Assessment | Week 1–2 | Inventory workflows, map dependencies, prioritize |
| 2 | Pilot | Week 3–6 | Migrate 2–3 non-critical workflows, establish CI/CD |
| 3 | Bulk Migration | Week 7–12 | Migrate remaining workflows, run parallel validation |
| 4 | Production Cutover | Week 13–16 | Shift traffic gradually (10% → 50% → 100%), decommission Conductor |
Total timeline: 3–4 months for organizations with 20–50 workflows. Simpler environments can move faster; complex integrations or regulatory requirements might extend the timeline.
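One way to implement the gradual traffic shift in the cutover phase is a deterministic, percentage-based router in whatever component starts workflows. The sketch below is an assumption about how this could be done, not a description of the customer's setup: `bucketOf` and `chooseEngine` are hypothetical names, and the hash is a simple non-cryptographic rolling hash chosen only for stability.

```typescript
type Engine = "conductor" | "stepfunctions";

// Map an ID to a stable bucket in the range 0..99.
function bucketOf(id: string): number {
  let hash = 0;
  for (let i = 0; i < id.length; i++) {
    // Keep the running value within unsigned 32-bit range.
    hash = (hash * 31 + id.charCodeAt(i)) >>> 0;
  }
  return hash % 100;
}

// Route an asset to Step Functions when its bucket falls below the
// configured percentage; everything else stays on Conductor.
function chooseEngine(assetId: string, stepFunctionsPercent: number): Engine {
  return bucketOf(assetId) < stepFunctionsPercent ? "stepfunctions" : "conductor";
}

// Example: the same asset evaluated at each cutover stage.
const stages = [10, 50, 100];
for (let i = 0; i < stages.length; i++) {
  console.log(stages[i] + "%:", chooseEngine("asset-0001", stages[i]));
}
```

Because the bucket depends only on the ID, an asset routed to Step Functions at 10% stays there at 50% and 100%, which keeps retries and reprocessing of the same asset on one engine.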

Use Case: Media Processing

Media processing workflows automate the ingestion, transformation, and delivery of video and audio content. These workflows typically involve multiple coordinated steps: uploading raw media files, extracting technical metadata (format, duration, resolution), transcoding to multiple output formats for different devices and bandwidths, and delivering processed content to storage or CDNs.

This use case is particularly relevant for demonstrating workflow orchestration migration because it showcases common enterprise requirements: event-driven triggers (Amazon S3 uploads), long-running tasks (transcoding jobs), integration with external services (AWS Elemental MediaConvert), state tracking across multiple stages, and reliable error handling for failed processing attempts.

The following architecture shows how we implemented this media processing workflow in both Conductor and Step Functions, enabling a direct comparison of the two approaches.

Architecture

End-to-end architecture diagram comparing Netflix Conductor and AWS Step Functions implementations of a media processing workflow. The left side shows Conductor with ECS-based HTTP task workers connecting to MediaConvert. The right side shows Step Functions with Lambda functions connecting to MediaConvert. Both share Amazon S3 for storage, Amazon EventBridge for event routing, Amazon SQS for message queuing, and Amazon DynamoDB for state tracking.

Figure 1: End-to-end architecture for media orchestration flow comparing Netflix Conductor (ECS task workers) and AWS Step Functions (Lambda functions) implementations.

  1. Media Asset Upload: Use Amazon S3 for bucket-to-bucket transfer of large-scale media assets, or browser-based direct upload via pre-signed URLs for individual file ingestion.
  2. Event Management and Routing: Amazon EventBridge routes S3 object creation events with granular filtering. EventBridge delivers matching events to Amazon SQS, with a dead-letter queue (DLQ) for reliable error handling and event persistence.
  3. Event Processing and Workflow Initiation: IngestListenerLambda consumes SQS messages and transforms them to match the input format of the target workflow engine. It then starts the workflow through the AWS SDK for Step Functions or through the REST API for Netflix Conductor.
  4. Workflow Orchestration: Each engine implements a two-stage media processing workflow with optional status tracking:
       • Netflix Conductor: Stage 1 runs MediaInfo analysis through an HTTP-based task on Amazon Elastic Container Service (Amazon ECS); Stage 2 runs a MediaConvert job for proxy generation.
       • AWS Step Functions: Stage 1 uses a Lambda function to run MediaInfo analysis; Stage 2 uses a Lambda function to start a MediaConvert job for proxy generation.
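The transformation in the event processing step can be sketched as pure functions that turn one SQS record into a start request for either engine. This is an illustration under stated assumptions, not the repository's IngestListenerLambda: the interface and function names are hypothetical, the workflow input shape (`bucket`, `key`, `sizeBytes`) is invented for the example, and no network call is made. The event fields read here follow the S3 "Object Created" event that EventBridge delivers, and the two request shapes follow the Step Functions StartExecution API and Conductor's start-workflow endpoint (`POST /api/workflow`).

```typescript
// Minimal shapes for the fields this sketch reads; real events carry more.
interface SqsRecord {
  messageId: string;
  body: string; // JSON-encoded EventBridge event
}

interface WorkflowInput {
  bucket: string;
  key: string;
  sizeBytes: number;
}

// Extract the workflow input from an SQS record that wraps an
// EventBridge "Object Created" event from Amazon S3.
function toWorkflowInput(record: SqsRecord): WorkflowInput {
  const evt = JSON.parse(record.body);
  if (evt.source !== "aws.s3" || evt["detail-type"] !== "Object Created") {
    throw new Error("Unexpected event in message " + record.messageId);
  }
  return {
    bucket: evt.detail.bucket.name,
    key: evt.detail.object.key,
    sizeBytes: evt.detail.object.size,
  };
}

// Parameters for the Step Functions StartExecution API.
function toStartExecutionParams(record: SqsRecord, stateMachineArn: string) {
  return {
    stateMachineArn,
    name: record.messageId, // reusing the message ID helps deduplicate redeliveries
    input: JSON.stringify(toWorkflowInput(record)),
  };
}

// Request body for Conductor's start-workflow REST endpoint.
function toConductorStartRequest(record: SqsRecord, workflowName: string) {
  return {
    name: workflowName,
    correlationId: record.messageId,
    input: toWorkflowInput(record),
  };
}
```

Keeping the transformation separate from the call that starts the workflow means the same parsing code serves both engines during the parallel-run period.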

Security Note: Enable AWS CloudTrail for audit logging of all Step Functions API calls and configure Amazon CloudWatch alarms to monitor for unauthorized access attempts or workflow execution anomalies. For production deployments, consider integrating with AWS Security Hub for centralized security monitoring.

Reference Implementation

Both implementations use TypeScript, AWS CDK v2 for infrastructure, and identical business logic, which makes them well suited to side-by-side comparison. The Conductor version requires custom HTTP task workers running on ECS, while the Step Functions version replaces these with Lambda functions and native service integrations that have built-in retry and error handling.

We also built a CLI tool, Workflow Migrator, that automates the structural conversion of Conductor JSON DSL to Step Functions ASL. You can use it as a starting point for your own migrations.
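To give a sense of what "structural conversion" involves, here is an independent minimal sketch for the simplest case, a purely sequential workflow of SIMPLE tasks. It is not the Workflow Migrator's code and makes no claim about how that tool works: the function name `convertSequential`, the `resourceFor` callback, and the sample ARN are hypothetical.

```typescript
interface ConductorTask {
  name: string;
  taskReferenceName: string;
  type: string;
}

interface ConductorWorkflow {
  name: string;
  tasks: ConductorTask[];
}

interface AslTaskState {
  Type: "Task";
  Resource: string;
  Next?: string;
  End?: boolean;
}

interface AslDefinition {
  Comment: string;
  StartAt: string;
  States: Record<string, AslTaskState>;
}

// Convert a linear list of SIMPLE tasks into chained ASL Task states.
// resourceFor maps a Conductor task name to the ARN that replaces its worker.
function convertSequential(
  wf: ConductorWorkflow,
  resourceFor: (taskName: string) => string
): AslDefinition {
  if (wf.tasks.length === 0) {
    throw new Error("Workflow " + wf.name + " has no tasks");
  }
  const states: Record<string, AslTaskState> = {};
  wf.tasks.forEach((task, i) => {
    if (task.type !== "SIMPLE") {
      throw new Error("Only SIMPLE tasks are handled here: " + task.taskReferenceName);
    }
    const state: AslTaskState = { Type: "Task", Resource: resourceFor(task.name) };
    if (i < wf.tasks.length - 1) {
      state.Next = wf.tasks[i + 1].taskReferenceName; // Conductor order becomes Next links
    } else {
      state.End = true; // the last task terminates the state machine
    }
    states[task.taskReferenceName] = state;
  });
  return {
    Comment: "Converted from Conductor workflow " + wf.name,
    StartAt: wf.tasks[0].taskReferenceName,
    States: states,
  };
}

// Example with a placeholder account ID and Region.
const converted = convertSequential(
  {
    name: "media_processing",
    tasks: [
      { name: "media_info", taskReferenceName: "media_info_ref", type: "SIMPLE" },
      { name: "transcode", taskReferenceName: "transcode_ref", type: "SIMPLE" },
    ],
  },
  (taskName) => "arn:aws:lambda:us-east-1:123456789012:function:" + taskName
);
console.log(JSON.stringify(converted, null, 2));
```

Conductor's implicit ordering becomes explicit `Next` and `End` fields. Branching, parallelism, data mapping, and error handling still need the manual review described in the migration steps.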

We used Amazon Q Developer to accelerate development of both implementations. See the complete workflow definitions, Amazon Q prompts used, and the full source in the Conductor to Step Functions Migration GitHub repository.

Results

Based on our reference implementation and architectural comparison, organizations may see the following improvements. Individual results vary based on workflow complexity and scale:

  • 40–60% reduction in infrastructure costs by eliminating self-managed Conductor servers.
  • Up to 75% reduction in deployment complexity by replacing custom HTTP task workers with native service integrations.
  • 80% less integration code when using Step Functions’ 220+ direct service integrations versus custom Conductor workers.
  • Reduced operational overhead by eliminating patching, scaling, and monitoring of Conductor infrastructure.

For a mid-size deployment running 1,000 workflows/month with an average of 8 states each, Step Functions costs approximately $200/month compared to $1,200+/month for self-hosted Conductor (EC2, RDS, Elasticsearch), an 83% cost reduction.
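The 83% figure follows from the two monthly totals: (1,200 - 200) / 1,200 is roughly 0.83. The sketch below reproduces that arithmetic and adds a rough estimator for the state-transition component of a Standard workflow bill, which you can rerun with your own volumes. The per-transition price is an assumption (a commonly listed US Region price of $0.025 per 1,000 state transitions); check the current Step Functions pricing page for your Region. The estimator ignores the free tier, retries, and every other service on the bill, such as Lambda, MediaConvert, S3, and logging, so it does not reproduce the monthly totals quoted above.

```typescript
// ASSUMPTION: Standard workflow price per state transition in USD.
// Verify against the current pricing page for your Region.
const ASSUMED_PRICE_PER_TRANSITION_USD = 0.000025;

// Rough monthly charge for state transitions only, assuming each state
// in a workflow is entered once per execution (no retries or loops).
function estimateTransitionCostUsd(
  workflowsPerMonth: number,
  statesPerWorkflow: number,
  pricePerTransitionUsd: number = ASSUMED_PRICE_PER_TRANSITION_USD
): number {
  return workflowsPerMonth * statesPerWorkflow * pricePerTransitionUsd;
}

// Percentage saved when moving from one monthly total to another.
function costReductionPercent(beforeUsd: number, afterUsd: number): number {
  if (beforeUsd <= 0) {
    throw new Error("beforeUsd must be positive");
  }
  return (1 - afterUsd / beforeUsd) * 100;
}

// Example: the monthly totals quoted above, then a custom volume.
console.log(costReductionPercent(1200, 200).toFixed(1) + "% reduction");
console.log(estimateTransitionCostUsd(250000, 12).toFixed(2) + " USD for transitions");
```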

Cleanup

To avoid ongoing charges from the reference implementation, remove all deployed resources:

cd sample-migrate-netflix-conductor-to-step-functions
cdk destroy --all

This removes the Step Functions state machine, Lambda functions, DynamoDB tables, SQS queues, EventBridge rules, and S3 buckets. Verify in the AWS CloudFormation console that all stacks have been deleted.

Conclusion

Migrating from Netflix Conductor to AWS Step Functions reduces operational overhead, lowers costs, and provides tighter integration with the AWS ecosystem. The key is a phased approach: map your workflows, convert task workers to Lambda functions, translate definitions to ASL, and cut over gradually.

Important: Security is a shared responsibility between AWS and the customer. While AWS Step Functions provides built-in security features such as IAM integration, encryption at rest, and CloudTrail logging, customers are responsible for securing their workflow definitions, managing access controls, and ensuring compliance with their organization’s regulatory requirements. Refer to the AWS Shared Responsibility Model for details.

To get started:

  1. Clone the Conductor to Step Functions Migration GitHub repository and explore both approaches side-by-side.
  2. Inventory your existing Conductor workflows and identify pilot candidates.
  3. Review the AWS Step Functions documentation and Step Functions Workshop.

Additional Resources