Using the Amazon WorkDocs API with the AWS SDK for Python and AWS Lambda

Amazon WorkDocs is a secure, fully managed file collaboration and management service. In addition to direct user access from various devices, you can use the WorkDocs API to develop applications that offer more documentation functions to meet your business needs. In this blog post, we demonstrate how to use the WorkDocs API with the AWS SDK for Python and AWS Lambda through a WorkDocs use case.

The following diagram illustrates a use case where users collaborate on their documentation work through either a web interface or through the WorkDocs web client. At the core of this application is a set of document processing functions, highlighted by the blue square in the diagram. These processing functions are triggered by WorkDocs events, such as a document update from the application web page or WorkDocs web client. When events occur, the functions access relevant documents stored in WorkDocs, process, merge, or generate document contents in local memory, and then upload the processed or new documents to WorkDocs. This event-driven application automates document processing in seconds.

WorkDocs Application Use Case

To meet the application requirements, we use the WorkDocs linkage to Amazon Simple Notification Service (Amazon SNS) for WorkDocs event notification, AWS Lambda via Amazon API Gateway for event handling, and the AWS SDK for Python for WorkDocs API access. This design is illustrated in the diagram below.

WorkDocs Application Design

Following the design, let’s briefly go through the event-driven application workflow. On a document update event, WorkDocs generates an event notification through Amazon SNS. The Lambda function receives the event through Amazon API Gateway. Then, depending on the event type, the Lambda function runs the application code to parse the WorkDocs folders and download relevant documents, calls the application’s document processing functions and, at the end, uploads the processed or generated documents back to WorkDocs.

We implement this workflow through the following steps:

  • Create an IAM role for accessing the WorkDocs service.
  • Set up a Lambda function.
  • Set up API Gateway.
  • Enable and subscribe to WorkDocs SNS notification.
  • Make WorkDocs API calls using the AWS SDK for Python from the Lambda function.

We describe the implementation of each of the steps below.

Note that WorkDocs manages its association with Amazon SNS internally, so we don’t need to implement it.

Create an IAM role for WorkDocs access

We create an IAM role for two purposes: to enable WorkDocs SNS notification and to enable the Lambda function to access WorkDocs documents. We define the role (named “Lambda-WorkDocs”) by first choosing the Lambda service that uses the role, and then attaching permission policies.

The WorkDocsFullAccessAll policy permits full WorkDocs access. You may restrict the permission scope based on your application’s access requirements.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "workdocs:*"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

AWSLambdaBasicExecutionRole is an AWS managed policy for Lambda function log creation and access. You can use Amazon CloudWatch Logs to track the WorkDocs events that Lambda received or to debug Lambda callback functions.

The role’s Trust relationships show the Lambda service (lambda.amazonaws.com) as the trusted entity that can assume the role.
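As an illustration of the trust relationship described above, here is a minimal sketch of creating the role programmatically instead of through the console. The helper names lambda_trust_policy() and create_lambda_workdocs_role() are hypothetical, and the sketch assumes that iam_client is a Boto3 IAM client (boto3.client('iam')). Attaching the permission policies is left out.

```python
import json


def lambda_trust_policy():
    """Return a trust policy that lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_lambda_workdocs_role(iam_client, role_name="Lambda-WorkDocs"):
    """Create the role; iam_client is assumed to be boto3.client('iam')."""
    return iam_client.create_role(
        RoleName=role_name,
        # The IAM API expects the trust policy as a JSON string
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
    )
```

The trust policy only states who may assume the role. What the role can do still comes from the attached permission policies.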

Set up a Lambda function

In the Lambda dashboard, create a Lambda function from scratch with the following parameters.

The created Lambda function “workdocs-app” appears on the dashboard. Because API Gateway will trigger this Lambda function, we set up API Gateway next.

The Lambda function code is composed of two segments: a lambda_handler() triggered by the WorkDocs update events, and application functions invoked by the handler. Here is a Python code structure layout.

import boto3

workdocs = boto3.client('workdocs')

def app_method_1():
    ...

# ... further application functions ...

def app_method_n():
    ...

def lambda_handler(event, context):
    # determine WorkDocs event type
    # download documents
    # invoke app_method_x() to process the documents
    # upload the processed documents
    return

Because we use the AWS SDK for Python to access the WorkDocs API, we import Boto3 (AWS SDK for Python) and instantiate a Boto3 client. We provide code examples later in this post.

Set up API Gateway

To relay WorkDocs events to the Lambda function, we use API Gateway as the SNS endpoint for WorkDocs. In turn, API Gateway posts the events to its integration endpoint, Lambda.

Let’s use the API Gateway dashboard to create a new API named “workdocs-event-post”, as shown below.

 

For the created “workdocs-event-post” API, we use the resource Actions menu to set up a POST method.

 

At the last step in this exercise, we publish the workdocs-event-post API to the “Production” stage using the API Gateway dashboard. Notice that the published URL, https://xxxxxxxxxx.execute-api.us-west-2.amazonaws.com/Production/, is the API endpoint. Calling it invokes the Lambda function.

Furthermore, publishing the API leads to an update of the API integration endpoint, the Lambda function, as we can see on the Lambda dashboard now.

Enable and subscribe to the WorkDocs notification

Now that we have set up API Gateway, we need to connect the WorkDocs event notification process with API Gateway through two steps:

  • Enable WorkDocs SNS notification
  • Subscribe the SNS notification to API Gateway

To enable the WorkDocs SNS notification, we use the WorkDocs console. On the Manage Your WorkDocs Sites page, choose the WorkDocs site and then choose Actions, Manage Notifications. Choose Enable Notification. Then enter the Amazon Resource Name (ARN) of either the IAM role or user that we created for the WorkDocs application.

To subscribe the SNS notification to API Gateway, we can use the AWS CLI workdocs command or the WorkDocs SDK. Here is the synopsis for the CLI or SDK call.

create-notification-subscription
--organization-id <d-xxxxxxxxxx under AWS Directory Service>
--protocol HTTPS
--subscription-type ALL
--notification-endpoint <API Gateway endpoint URL>
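For the SDK route, the same subscription can be sketched with the AWS SDK for Python. This is a minimal sketch rather than a definitive implementation: the helper name subscribe_endpoint() is hypothetical, and it assumes that workdocs_client is a Boto3 WorkDocs client created with credentials that are allowed to manage notifications for the organization.

```python
def subscribe_endpoint(workdocs_client, organization_id, endpoint_url):
    """Subscribe an HTTPS endpoint (here, the API Gateway URL) to WorkDocs events.

    workdocs_client is assumed to be boto3.client('workdocs').
    """
    return workdocs_client.create_notification_subscription(
        OrganizationId=organization_id,  # d-xxxxxxxxxx from AWS Directory Service
        Endpoint=endpoint_url,           # published API Gateway stage URL
        Protocol="HTTPS",
        SubscriptionType="ALL",
    )
```

Note that the SDK parameter is named Endpoint, whereas the CLI option is --notification-endpoint.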

Make WorkDocs API calls using the AWS SDK for Python

The Lambda function executes application code to process and manage documents when WorkDocs events occur. We have laid out the Lambda function code structure above. Here we provide a few code examples to show you how to use the WorkDocs API with the AWS SDK for Python. To focus on WorkDocs-related code, we omit other code, such as error handling.

For the WorkDocs Python SDK specification, refer to AWS Python SDK Boto 3 Documentation: WorkDocs Service.

Get WorkDocs event data

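import json
import urllib.request  # Python 3 home of urlretrieve()

import boto3
import requests  # third-party HTTP client, used for the upload

workdocs = boto3.client('workdocs')

def lambda_handler(event, context):
    # The WorkDocs notification arrives as a JSON string under 'Message'
    message = event['Message']
    message_dict = json.loads(message)
    workdocs_action = message_dict['action']
    # For document version events, entityId is the version
    # and parentEntityId is the document it belongs to
    version_id = message_dict['entityId']
    document_id = message_dict['parentEntityId']
    ...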
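The handler above reads event['Message'] directly, which presumes that API Gateway passes the SNS payload through unchanged. The following sketch is a slightly more defensive variant under two stated assumptions: with a Lambda proxy integration the SNS payload would arrive as a JSON string in event['body'], and Amazon SNS first sends a SubscriptionConfirmation message (carrying a SubscribeURL) to a new HTTPS endpoint. The function names are hypothetical.

```python
import json


def parse_sns_envelope(event):
    """Return the SNS envelope as a dict.

    Assumption: with a Lambda proxy integration the SNS payload is a JSON
    string under 'body'; otherwise the event itself is the envelope.
    """
    if isinstance(event.get("body"), str):
        return json.loads(event["body"])
    return event


def extract_workdocs_event(event):
    """Return either a confirmation URL or the WorkDocs event fields."""
    envelope = parse_sns_envelope(event)
    if envelope.get("Type") == "SubscriptionConfirmation":
        # The subscription becomes active only after this URL is visited
        return {"confirm_url": envelope["SubscribeURL"]}
    message = json.loads(envelope["Message"])
    return {
        "action": message["action"],
        "version_id": message["entityId"],
        "document_id": message["parentEntityId"],
    }
```

Inside lambda_handler(), the returned dictionary can then drive the choice of application function, or trigger a one-time request to the confirmation URL.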

Get document name

response = workdocs.get_document_version(
    DocumentId=document_id,
    VersionId=version_id
)
document_name = response['Metadata']['Name']
...

Download a document from WorkDocs to the Lambda local file system

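# Fields='SOURCE' asks WorkDocs to include a download URL for the version
response = workdocs.get_document_version(
    DocumentId=document_id,
    VersionId=version_id,
    Fields='SOURCE'
)
url = response['Metadata']['Source']['ORIGINAL']
# Python 3: urlretrieve() lives in urllib.request (import urllib.request)
urllib.request.urlretrieve(url, <local_path/document_name>)
...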
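As a sketch of how the download step might be wrapped, the helper below stores the file under /tmp, which is the writable directory in the Lambda execution environment. The names local_path_for() and download_document() are hypothetical, and workdocs_client is assumed to be a Boto3 WorkDocs client.

```python
import os
import urllib.request


def local_path_for(document_name, workdir="/tmp"):
    """Build a local path, keeping only the base name of the document."""
    return os.path.join(workdir, os.path.basename(document_name))


def download_document(workdocs_client, document_id, version_id, workdir="/tmp"):
    """Download one document version and return the local file path."""
    response = workdocs_client.get_document_version(
        DocumentId=document_id,
        VersionId=version_id,
        Fields="SOURCE",  # include the download URL in the response
    )
    metadata = response["Metadata"]
    path = local_path_for(metadata["Name"], workdir)
    urllib.request.urlretrieve(metadata["Source"]["ORIGINAL"], path)
    return path
```

Taking only the base name guards against document names that contain path separators.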

Search for a document by name (file_name)

response = workdocs.describe_users(
    OrganizationId=organization_id
)
root_folder_id = response['Users'][0]['RootFolderId']

response = workdocs.describe_folder_contents(FolderId=root_folder_id)
for entry in response['Documents']:
    if file_name in entry['LatestVersionMetadata']['Name']:
        version_id = entry['LatestVersionMetadata']['Id']
        document_id = entry['Id']
        break
else:
    # for/else: runs only if the loop finished without hitting break
    raise ValueError('%s not found in documents.' % file_name)
...

Note that this code snippet searches only the root (MyDocs) folder of the first user returned by describe_users(). For files in subfolders, the code needs to iterate over the subfolders by calling describe_folder_contents() on them and performing the lookup.
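The subfolder traversal described above could look like the following sketch. It is one possible approach, not the application's actual code: find_document() is a hypothetical helper, workdocs_client is assumed to be a Boto3 WorkDocs client, and the Marker field is used to page through large folders.

```python
def find_document(workdocs_client, folder_id, file_name):
    """Search a folder and its subfolders for a document by name.

    Returns (document_id, version_id) for the first match, or None.
    """
    kwargs = {"FolderId": folder_id}
    subfolder_ids = []
    while True:
        page = workdocs_client.describe_folder_contents(**kwargs)
        for entry in page.get("Documents", []):
            latest = entry["LatestVersionMetadata"]
            if file_name in latest["Name"]:
                return entry["Id"], latest["Id"]
        subfolder_ids.extend(folder["Id"] for folder in page.get("Folders", []))
        marker = page.get("Marker")
        if not marker:
            break
        kwargs["Marker"] = marker  # fetch the next page of this folder
    # Descend only after the current folder has been checked completely
    for subfolder_id in subfolder_ids:
        found = find_document(workdocs_client, subfolder_id, file_name)
        if found:
            return found
    return None
```

Returning None instead of raising lets the caller decide whether a missing document is an error.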

Upload an updated document from the Lambda local workspace back to WorkDocs

# Get the document parent folder 
response = workdocs.get_document(
    DocumentId=document_id
)
parentfolder_id = response['Metadata']['ParentFolderId']

# Create a latest version object and obtain its upload URL
response = workdocs.initiate_document_version_upload(
    Id=document_id,  # target the existing document so the upload becomes its new version
    Name=file_name,
    ContentType="text/plain",
    ParentFolderId=parentfolder_id
)
version_id_latest = response['Metadata']['LatestVersionMetadata']['Id']
url = response['UploadMetadata']['UploadUrl']
    
# Upload the document
# Build both headers in one dict; a second assignment would discard the first
headers = {
    'Content-Type': 'text/plain',
    'x-amz-server-side-encryption': 'AES256'
}
with open(document_path + compiled_report, 'r') as f:
    data = f.read()
response = requests.put(url, headers=headers, data=data.encode('utf-8'))

# Enable the uploaded document
response = workdocs.update_document_version(
    DocumentId=document_id,
    VersionId=version_id_latest,
    VersionStatus='ACTIVE'
)
... 
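The upload request needs both the Content-Type and the encryption header at the same time. The sketch below shows one way to assemble them. It assumes that the UploadMetadata returned by initiate_document_version_upload() may also carry a SignedHeaders map, in which case those values are merged in. The helper name build_upload_headers() is hypothetical.

```python
def build_upload_headers(upload_metadata, content_type):
    """Combine the headers for the PUT request to the WorkDocs upload URL."""
    headers = {
        "Content-Type": content_type,
        "x-amz-server-side-encryption": "AES256",
    }
    # Assumption: when SignedHeaders is present, its values take precedence
    headers.update(upload_metadata.get("SignedHeaders", {}))
    return headers
```

With this helper, the upload call becomes requests.put(url, headers=build_upload_headers(response['UploadMetadata'], 'text/plain'), data=...).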

Wrap up

This blog post describes a WorkDocs use case that employs WorkDocs, the AWS SDK for Python, and AWS Lambda to automate document processing.

The following related resources can help you as you develop WorkDocs applications.

Jim Huang
