Amazon SWF has been applied to use cases in media processing, business process automation, data analytics, migration to the cloud, and batch processing. Some examples are:
Use case #1: Video encoding using Amazon S3 and Amazon EC2. In this use case, large videos are uploaded to Amazon S3 in chunks. The upload of each chunk must be monitored. After a chunk is uploaded, it is encoded by downloading it to an Amazon EC2 instance. The encoded chunk is stored at another Amazon S3 location. After all of the chunks have been encoded in this manner, they are combined into a complete encoded file, which is stored back in its entirety to Amazon S3. Failures can occur during this process if one or more chunks encounter encoding errors. Such failures need to be detected and handled.
With Amazon SWF: The entire application is built as a workflow where each video file is handled as one workflow execution. The tasks that are processed by different workers are: upload a chunk to Amazon S3, download a chunk from Amazon S3 to an Amazon EC2 instance and encode it, store a chunk back to Amazon S3, combine multiple chunks into a single file, and upload a complete file to Amazon S3. The decider initiates concurrent tasks to exploit the parallelism in the use case. It initiates a task to encode an uploaded chunk without waiting for other chunks to be uploaded. If a task for a chunk fails, the decider re-runs it for that chunk only. The application state kept by Amazon SWF helps the decider control the workflow. For example, the decider uses it to detect when all chunks have been encoded and to extract their Amazon S3 locations so that they can be combined. The execution’s progress is continuously tracked in the Amazon SWF Management Console. If there are failures, the specific tasks that failed are identified and used to pinpoint the failed chunks.
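As a rough illustration of that decider loop, the sketch below uses the boto3 SDK to poll for decision tasks, schedule an encode activity for every chunk as soon as the execution starts, and re-schedule only the chunks whose activity tasks failed. The domain, task list, and activity names are illustrative, the workflow input is assumed to be a comma-separated list of chunk keys, and the final combine step and history pagination are omitted.

```python
import boto3

swf = boto3.client("swf")

# Illustrative names; the real domain, task lists, and activity type are
# whatever the application registered with Amazon SWF.
DOMAIN = "VideoEncoding"
DECIDER_TASK_LIST = {"name": "video-decisions"}
ENCODE_TASK_LIST = {"name": "encode-chunk-tasks"}
ENCODE_ACTIVITY = {"name": "EncodeChunk", "version": "1.0"}


def schedule_encode(chunk_key):
    """Build a ScheduleActivityTask decision for one uploaded chunk."""
    return {
        "decisionType": "ScheduleActivityTask",
        "scheduleActivityTaskDecisionAttributes": {
            "activityType": ENCODE_ACTIVITY,
            "activityId": "encode-" + chunk_key,
            "input": chunk_key,
            "taskList": ENCODE_TASK_LIST,
        },
    }


def run_decider():
    """Poll for decision tasks and react to the history events recorded by SWF."""
    while True:
        task = swf.poll_for_decision_task(
            domain=DOMAIN, taskList=DECIDER_TASK_LIST, identity="decider-1")
        if not task.get("taskToken"):
            continue  # long poll returned no work; poll again

        events = task["events"]  # history pagination (nextPageToken) omitted
        # Index ActivityTaskScheduled events so a failure can be traced back
        # to the chunk it belongs to.
        scheduled = {e["eventId"]: e["activityTaskScheduledEventAttributes"]
                     for e in events
                     if e["eventType"] == "ActivityTaskScheduled"}
        # Only act on events added since the last decision.
        new_events = [e for e in events
                      if e["eventId"] > task.get("previousStartedEventId", 0)]

        decisions = []
        for event in new_events:
            if event["eventType"] == "WorkflowExecutionStarted":
                # Input is assumed to be a comma-separated list of chunk keys.
                # Schedule every chunk at once so they are encoded in parallel.
                attrs = event["workflowExecutionStartedEventAttributes"]
                decisions += [schedule_encode(key)
                              for key in attrs["input"].split(",")]
            elif event["eventType"] == "ActivityTaskFailed":
                # Re-run the encode task for the failed chunk only.
                attrs = event["activityTaskFailedEventAttributes"]
                decisions.append(
                    schedule_encode(scheduled[attrs["scheduledEventId"]]["input"]))
            # A complete decider would also count ActivityTaskCompleted events
            # and schedule the combine step once every chunk has been encoded.

        swf.respond_decision_task_completed(
            taskToken=task["taskToken"], decisions=decisions)
```

Because Amazon SWF delivers the workflow history with every decision task, a decider written this way can remain stateless on the host it runs on; the service, not the worker fleet, is the source of truth for what has completed or failed.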
Use case #2: Processing large product catalogs using Amazon Mechanical Turk. To validate data in large catalogs, the products in the catalog are processed in batches. Different batches can be processed concurrently. For each batch, the product data is extracted from servers in the datacenter and transformed into CSV (Comma-Separated Values) files required by Amazon Mechanical Turk's Requester User Interface (RUI). The CSV file is uploaded to populate and run the HITs (Human Intelligence Tasks). When the HITs complete, the resulting CSV file is reverse-transformed to get the data back into the original format. The results are then assessed, and Amazon Mechanical Turk workers are paid for acceptable results. Failures are weeded out and reprocessed, while the acceptable HIT results are used to update the catalog. As batches are processed, the system needs to track the quality of the Amazon Mechanical Turk workers and adjust the payments accordingly. Failed HITs are re-batched and sent through the pipeline again.
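As a sketch of the transform step only, a batch of products might be written out as an RUI-compatible CSV file along these lines; the column names are hypothetical and would need to match the variables in the actual HIT template.

```python
import csv


def products_to_rui_csv(products, path):
    """Write one batch of products as a CSV file for upload through the
    Requester User Interface.  The column names below are hypothetical;
    they must match the variables used in the actual HIT template."""
    fieldnames = ["product_id", "title", "description", "image_url"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for product in products:
            writer.writerow(product)


# Example: a (tiny) batch extracted from the catalog servers.
batch = [{"product_id": "B000123", "title": "Widget",
          "description": "A sample product",
          "image_url": "https://example.com/widget.jpg"}]
products_to_rui_csv(batch, "batch-001.csv")
```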
With Amazon SWF: The use case above is implemented as a set of workflows. A BatchProcess workflow handles the processing for a single batch. It has workers that extract the data, transform it, and send it through Amazon Mechanical Turk. The BatchProcess workflow outputs the acceptable HITs and the failed ones. This output is used as the input for three other workflows: MTurkManager, UpdateCatalogWorkflow, and RerunProducts. The MTurkManager workflow makes payments for acceptable HITs, responds to the human workers who produced failed HITs, and updates its own database for tracking result quality. The UpdateCatalogWorkflow updates the master catalog based on acceptable HITs. The RerunProducts workflow waits until there is a large enough batch of products with failed HITs, then creates a new batch and sends it back to the BatchProcess workflow. The entire end-to-end catalog processing is performed by a CleanupCatalog workflow that initiates child executions of the above workflows. Having a system of well-defined workflows enables this use case to be architected, audited, and run systematically for catalogs with several million products.
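A hedged sketch of how the CleanupCatalog decider might express this composition: it returns StartChildWorkflowExecution decisions that launch the downstream workflows with the BatchProcess output as their input. The workflow type names come from the description above; the versions, workflow IDs, and the acceptedHits/failedHits field names are illustrative.

```python
import json


def start_child(workflow_name, workflow_id, payload):
    """Build a StartChildWorkflowExecution decision for the decider to return."""
    return {
        "decisionType": "StartChildWorkflowExecution",
        "startChildWorkflowExecutionDecisionAttributes": {
            "workflowType": {"name": workflow_name, "version": "1.0"},
            "workflowId": workflow_id,
            "input": json.dumps(payload),
        },
    }


def decisions_for_completed_batch(batch_id, batch_output):
    """When a BatchProcess child completes, route its output (accepted and
    failed HITs) to the three downstream workflows.  The field names
    acceptedHits/failedHits are illustrative."""
    return [
        start_child("MTurkManager", "mturk-" + batch_id, batch_output),
        start_child("UpdateCatalogWorkflow", "update-" + batch_id,
                    {"acceptedHits": batch_output["acceptedHits"]}),
        start_child("RerunProducts", "rerun-" + batch_id,
                    {"failedHits": batch_output["failedHits"]}),
    ]

# These decisions would be returned from respond_decision_task_completed()
# inside the CleanupCatalog decider's poll loop (the loop itself is omitted;
# it has the same shape as the decider shown for use case #1).
```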
Use case #3: Migrating components from the datacenter to the cloud. Business-critical operations are hosted in a private datacenter but need to be moved entirely to the cloud without causing disruptions.
With Amazon SWF: Amazon SWF-based applications can combine workers that wrap components running in the datacenter with workers that run in the cloud. To transition a datacenter worker seamlessly, new workers of the same type are first deployed in the cloud. The workers in the datacenter continue to run as usual, alongside the new cloud-based workers. The cloud-based workers are tested and validated by routing a portion of the load through them. During this testing, the application is not disrupted because the workers in the datacenter continue to run. After successful testing, the workers in the datacenter are gradually stopped and those in the cloud are scaled up, so that eventually the workers run entirely in the cloud. This process can be repeated for all other workers in the datacenter so that the application moves entirely to the cloud. If, for some business reason, certain processing steps must continue to be performed in the private datacenter, those workers can keep running there and still participate in the application.
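The reason this transition can be seamless is that a worker only needs outbound access to the Amazon SWF API; nothing in the worker code depends on where it runs. A minimal sketch of such a location-agnostic activity worker follows, with illustrative domain and task list names and a process() placeholder standing in for the component being migrated.

```python
import socket

import boto3

swf = boto3.client("swf")

# Illustrative names; the point is that the same code runs unchanged whether
# the host sits in the private datacenter or on an Amazon EC2 instance,
# because both poll the same Amazon SWF task list over the public API.
DOMAIN = "BusinessOperations"
TASK_LIST = {"name": "payment-tasks"}


def process(payload):
    """Placeholder for the existing component being migrated."""
    return payload


def run_worker():
    # Including the hostname makes it visible in the workflow history which
    # host (datacenter or cloud) picked up each task.
    identity = socket.gethostname()
    while True:
        task = swf.poll_for_activity_task(
            domain=DOMAIN, taskList=TASK_LIST, identity=identity)
        if not task.get("taskToken"):
            continue  # long poll returned no work; poll again
        result = process(task.get("input", ""))
        swf.respond_activity_task_completed(
            taskToken=task["taskToken"], result=result)
```

Routing a larger or smaller share of the load through the cloud then amounts to starting more or fewer of these pollers on EC2, since Amazon SWF hands each task to whichever worker asks for it first.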
See our case studies for more exciting applications and systems that developers and enterprises are building with Amazon SWF.