AWS Compute Blog
Parallel Processing in Python with AWS Lambda
If you develop an AWS Lambda function with Node.js, you can call multiple web services without waiting for a response due to its asynchronous nature. All requests are initiated almost in parallel, so you can get results much faster than a series of sequential calls to each web service. Considering the maximum execution duration for Lambda, it is beneficial for I/O bound tasks to run in parallel.
If you develop a Lambda function with Python, parallelism doesn’t come by default. Lambda supports Python 2.7 and Python 3.6, both of which have multiprocessing and threading modules. The multiprocessing module supports multiple cores so it is a better choice, especially for CPU intensive workloads. With the threading module, all threads are going to run on a single core though performance difference is negligible for network-bound tasks.
In this post, I demonstrate how the Python multiprocessing module can be used within a Lambda function to run multiple I/O bound tasks in parallel.
Example use case
In this example, you call Amazon EC2 and Amazon EBS API operations to find the total EBS volume size for all your EC2 instances in a region.
This is a two-step process:
- The Lambda function calls EC2 to list all EC2 instances
- The function calls EBS for each instance to find attached EBS volumes
Sequential Execution
If you make these calls sequentially, during the second step, your code has to loop over all the instances and wait for each response before moving to the next request.
The class named VolumesSequential has the following methods:
- __init__ creates an EC2 resource.
- total_size returns all EC2 instances and passes these to the instance_volumes method.
- instance_volumes finds the total size of EBS volumes for the instance.
- total_size adds all sizes from all instances to find total size for the EBS volumes.
Source Code for Sequential Execution
import time
import boto3
class VolumesSequential(object):
    """Finds total volume size for all EC2 instances"""
    def __init__(self):
        self.ec2 = boto3.resource('ec2')
    def instance_volumes(self, instance):
        """
        Finds total size of the EBS volumes attached
        to an EC2 instance
        """
        instance_total = 0
        for volume in instance.volumes.all():
            instance_total += volume.size
        return instance_total
    def total_size(self):
        """
        Lists all EC2 instances in the default region
        and sums result of instance_volumes
        """
        print "Running sequentially"
        instances = self.ec2.instances.all()
        instances_total = 0
        for instance in instances:
            instances_total += self.instance_volumes(instance)
        return instances_total
def lambda_handler(event, context):
    volumes = VolumesSequential()
    _start = time.time()
    total = volumes.total_size()
    print "Total volume size: %s GB" % total
    print "Sequential execution time: %s seconds" % (time.time() - _start)Parallel Execution
The multiprocessing module that comes with Python 2.7 lets you run multiple processes in parallel. Due to the Lambda execution environment not having /dev/shm (shared memory for processes) support, you can’t use multiprocessing.Queue or multiprocessing.Pool.
If you try to use multiprocessing.Queue, you get an error similar to the following:
[Errno 38] Function not implemented: OSError
…
    sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 38] Function not implemented
On the other hand, you can use multiprocessing.Pipe instead of multiprocessing.Queue to accomplish what you need without getting any errors during the execution of the Lambda function.
The class named VolumeParallel has the following methods:
- __init__ creates an EC2 resource
- instance_volumes finds the total size of EBS volumes attached to an instance
- total_size finds all instances and runs instance_volumes for each to find the total size of all EBS volumes attached to all EC2 instances.
Source Code for Parallel Execution
import time
from multiprocessing import Process, Pipe
import boto3
class VolumesParallel(object):
    """Finds total volume size for all EC2 instances"""
    def __init__(self):
        self.ec2 = boto3.resource('ec2')
    def instance_volumes(self, instance, conn):
        """
        Finds total size of the EBS volumes attached
        to an EC2 instance
        """
        instance_total = 0
        for volume in instance.volumes.all():
            instance_total += volume.size
        conn.send([instance_total])
        conn.close()
    def total_size(self):
        """
        Lists all EC2 instances in the default region
        and sums result of instance_volumes
        """
        print "Running in parallel"
        # get all EC2 instances
        instances = self.ec2.instances.all()
        
        # create a list to keep all processes
        processes = []
        # create a list to keep connections
        parent_connections = []
        
        # create a process per instance
        for instance in instances:            
            # create a pipe for communication
            parent_conn, child_conn = Pipe()
            parent_connections.append(parent_conn)
            # create the process, pass instance and connection
            process = Process(target=self.instance_volumes, args=(instance, child_conn,))
            processes.append(process)
        # start all processes
        for process in processes:
            process.start()
        # make sure that all processes have finished
        for process in processes:
            process.join()
        instances_total = 0
        for parent_connection in parent_connections:
            instances_total += parent_connection.recv()[0]
        return instances_total
def lambda_handler(event, context):
    volumes = VolumesParallel()
    _start = time.time()
    total = volumes.total_size()
    print "Total volume size: %s GB" % total
    print "Sequential execution time: %s seconds" % (time.time() - _start)
Performance
There are a few differences between two Lambda functions when it comes to the execution environment. The parallel function requires more memory than the sequential one. You may run the parallel Lambda function with a relatively large memory setting to see how much memory it uses. The amount of memory required by the Lambda function depends on what the function does and how many processes it runs in parallel. To restrict maximum memory usage, you may want to limit the number of parallel executions.
In this case, when you give 1024 MB for both Lambda functions, the parallel function runs about two times faster than the sequential function. I have a handful of EC2 instances and EBS volumes in my account so the test ran way under the maximum execution limit for Lambda. Remember that parallel execution doesn’t guarantee that the runtime for the Lambda function will be under the maximum allowed duration but does speed up the overall execution time.
Sequential Run Time Output
START RequestId: 4c370b12-f9d3-11e6-b46b-b5d41afd648e Version: $LATEST
Running sequentially
Total volume size: 589 GB
Sequential execution time: 3.80066084862 seconds
END RequestId: 4c370b12-f9d3-11e6-b46b-b5d41afd648e
REPORT RequestId: 4c370b12-f9d3-11e6-b46b-b5d41afd648e Duration: 4091.59 ms Billed Duration: 4100 ms  Memory Size: 1024 MB Max Memory Used: 46 MBParallel Run Time Output
START RequestId: 4f1328ed-f9d3-11e6-8cd1-c7381c5c078d Version: $LATEST
Running in parallel
Total volume size: 589 GB
Sequential execution time: 1.89170885086 seconds
END RequestId: 4f1328ed-f9d3-11e6-8cd1-c7381c5c078d
REPORT RequestId: 4f1328ed-f9d3-11e6-8cd1-c7381c5c078d Duration: 2069.33 ms Billed Duration: 2100 ms  Memory Size: 1024 MB Max Memory Used: 181 MB 
Summary
In this post, I demonstrated how to run multiple I/O bound tasks in parallel by developing a Lambda function with the Python multiprocessing module. With the help of this module, you freed the CPU from waiting for I/O and fired up several tasks to fit more I/O bound operations into a given time frame. This might be the trick to reduce the overall runtime of a Lambda function especially when you have to run so many and don’t want to split the work into smaller chunks.