AWS Compute Blog

Adding voice to a CircuitPython project using Amazon Polly

An Adafruit PyPortal displaying a quote while synthesizing and playing speech using Amazon Polly.

As a natural means of communication, voice is a powerful way to humanize an experience. What if you could make anything talk? This guide walks through how to leverage the cloud to add voice to an off-the-shelf microcontroller. Use it to develop more advanced ideas, like a talking toaster that encourages healthy breakfast habits or a house plant that can express its needs.

This project uses an Adafruit PyPortal, an open-source IoT touch display programmed using CircuitPython, a lightweight version of Python that works on embedded hardware. You copy your code to the PyPortal like you would to a thumb drive and it runs. Random quotes from the PaperQuotes API are periodically displayed on the PyPortal LCD.

A microcontroller can’t do speech synthesis on its own, so I use Amazon Polly, a text-to-speech service that produces natural-sounding audio. Adding speech also makes a project more accessible to the visually impaired. This project includes an example for requesting arbitrary speech in addition to random quotes. Use this example to add a voice to any CircuitPython project.

An Adafruit PyPortal, an external speaker, and a microSD card.

I deploy the backend to the AWS Cloud using the AWS Serverless Application Repository. The code on the PyPortal makes a REST call to the backend to fetch a quote and synthesize speech audio for playback on the device.

Prerequisites

You need the following to complete the project:

Deploy the backend application

An architecture diagram of the serverless backend when requesting speech synthesis of a text string.

The serverless backend consists of an Amazon API Gateway endpoint that invokes an AWS Lambda function. If called with a JSON object containing text and voiceId attributes, the function uses Amazon Polly to synthesize speech and uploads an MP3 file as a public object to Amazon S3. Upon completion, it returns the URL for downloading the audio file. It also inserts line breaks into the submitted text so that it appears text-wrapped when displayed on the PyPortal. For a full list of voices, see the Amazon Polly documentation. An example response:
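The response carries the audio URL and the wrapped text. The following is a minimal sketch of that shape and of the wrapping step, not the backend's actual code. It assumes the `url` and `text` keys that the device code reads later in this post; the function names, the 30-character width, and the URL are hypothetical placeholders.

```python
import textwrap


def wrap_for_display(text, width=30):
    """Insert newlines so that no line is longer than `width` characters.

    The default width of 30 is a hypothetical value, not the backend's setting.
    """
    return "\n".join(textwrap.wrap(text, width=width))


def build_response(url, text, width=30):
    """Build a response dict with the keys the PyPortal code reads."""
    # 'url' points at the MP3; 'text' is the wrapped copy for the LCD.
    return {"url": url, "text": wrap_for_display(text, width)}


example = build_response(
    "https://example.com/audio/speech.mp3",  # placeholder URL
    "Hello world! I am an Adafruit PyPortal speaking to you.",
)
print(example["text"])
```

Wrapping on the backend keeps the device code small: the PyPortal only has to print the string it receives.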

To fetch quotes instead of a text field, call the endpoint with a comma-separated list of tags as shown in the following diagram. The Lambda function then calls the PaperQuotes API. It fetches up to 50 quotes per tag and selects a random one to synthesize as speech. As with arbitrary text, it returns a URL and a text-wrapped representation of the quote.
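The selection step can be sketched as follows. This is an illustration under assumptions rather than the deployed function: the names `parse_tags` and `pick_quote` are hypothetical, and pooling the quotes from all tags before choosing is one plausible reading of "selects a random one".

```python
import random


def parse_tags(tags):
    """Split a comma-separated tag string such as 'equality, humanity'."""
    return [tag.strip() for tag in tags.split(",") if tag.strip()]


def pick_quote(quotes_by_tag, rng=random):
    """Pool the quotes fetched for each tag and return one at random.

    `quotes_by_tag` maps a tag to the list of quotes fetched for it
    (up to 50 per tag in the post). Returns None when nothing was fetched.
    """
    pool = [quote for quotes in quotes_by_tag.values() for quote in quotes]
    if not pool:
        return None
    return rng.choice(pool)


fetched = {
    "equality": ["placeholder quote A", "placeholder quote B"],
    "humanity": ["placeholder quote C"],
}
print(parse_tags("equality, humanity"))
print(pick_quote(fetched))
```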

An architecture diagram of the serverless backend when requesting a random quote from the PaperQuotes API to synthesize as speech.

I use the AWS Serverless Application Model (AWS SAM) to create the backend template. While it can be deployed using the AWS SAM CLI, you can also deploy from the AWS Management Console:

  1. Generate a free PaperQuotes API key at paperquotes.com. The serverless backend requires this to fetch quotes.
  2. Navigate to the aws-serverless-pyportal-polly application in the AWS Serverless Application Repository.
  3. Under Application settings, enter your PaperQuotes API key in the PaperQuotesAPIKey parameter.
  4. Choose Deploy.
  5. Once complete, choose View CloudFormation Stack.
  6. Select the Outputs tab and make a note of the SpeechApiUrl. This is required for configuring the PyPortal.
  7. Click the link listed for SpeechApiKey in the Outputs tab.
  8. Click Show to reveal the API key. Make a note of this. This is required for authenticating requests from the PyPortal to the SpeechApiUrl.
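Before configuring the device, it can help to check the endpoint from a desktop Python session. The sketch below only builds the request. It assumes the key is sent in the `x-api-key` header, which is the default for API Gateway API keys; the URL and key values are placeholders, and `build_speech_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_speech_request(api_url, api_key, text, voice_id="Joanna"):
    """Build a POST request asking the backend to synthesize `text`."""
    body = json.dumps({"text": text, "voiceId": voice_id}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumed header for the SpeechApiKey value
        },
        method="POST",
    )


request = build_speech_request(
    "https://example.com/speech",  # replace with your SpeechApiUrl
    "YOUR_SPEECH_API_KEY",         # replace with your SpeechApiKey
    "Hello world!",
)
```

Passing `request` to `urllib.request.urlopen` sends it. If the deployment worked, the JSON body of the reply should contain the audio URL and the wrapped text.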

PyPortal setup

The following instructions walk through installing the latest version of the Adafruit CircuitPython libraries and firmware. They also show how to enable an external speaker module.

  1. Follow these instructions from Adafruit to install the latest version of the CircuitPython bootloader. At the time of writing, the latest version is 5.3.0.
  2. Follow these instructions to install the latest Adafruit CircuitPython library bundle. I use bundle version 5.x.
  3. Insert the microSD card in the slot located on the back of the device.
  4. Cut the jumper pad on the back of the device labeled A0. This enables you to use an external speaker instead of the built-in speaker.
  5. Plug the external speaker connector into the port labeled SPEAKER on the back of the device.
  6. Optionally install the Mu Editor, a multi-platform code editor and serial debugger compatible with Adafruit CircuitPython boards. This can help with troubleshooting issues.
  7. Optionally, if you have a 3D printer, print a case for your PyPortal. This can protect and showcase your project.

Code PyPortal

As with regular Python, CircuitPython does not need to be compiled to execute. You update the program on the PyPortal by copying a Python file and any necessary assets to a mounted volume. CircuitPython runs code.py whenever the device starts or any file is updated.

  1. Use a USB cable to plug the PyPortal into your computer and wait until a new mounted volume CIRCUITPY is available.
  2. Download the project from GitHub. Inside the project, copy the contents of /circuit-python on to the CIRCUITPY volume.
  3. Inside the volume, open and edit the secrets.py file. Include your Wi-Fi credentials along with the SpeechApiKey and SpeechApiUrl API Gateway endpoint. These can be found under Outputs in the AWS CloudFormation stack created by the AWS Serverless Application Repository.
  4. Save the file, and the device restarts. It takes a moment to connect to Wi-Fi and make the first request.
    Optionally, if you installed the Mu Editor, you can click “Serial” to follow the device log.
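As a rough guide, a secrets.py for this project could look like the sketch below. The `endpoint` and `interval` keys appear in the project's code.py, and `ssid` and `password` follow the usual Adafruit convention. The `api_key` name is an assumption, so check the secrets.py shipped with the project for the exact key names.

```python
# secrets.py (sketch): keep this file out of version control.
secrets = {
    "ssid": "YOUR_WIFI_NAME",                  # Wi-Fi network name
    "password": "YOUR_WIFI_PASSWORD",          # Wi-Fi password
    "endpoint": "https://example.com/speech",  # SpeechApiUrl from the stack Outputs
    "api_key": "YOUR_SPEECH_API_KEY",          # SpeechApiKey (hypothetical key name)
    "interval": 5,                             # minutes between quotes
}

# code.py sleeps for 60 * secrets['interval'] seconds between quotes.
sleep_seconds = 60 * secrets["interval"]
```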

Once the PyPortal is connected and its first request succeeds, you hear it greet you and describe itself. By default, it then displays and reads a new quote every five minutes.

Understanding the CircuitPython code

See the bottom of circuit-python/code.py from the GitHub project. When the PyPortal connects to Wi-Fi, the first thing it does is synthesize an arbitrary “hello world” text for display. It then begins periodically displaying and “speaking” quotes.

# Connect to WiFi
print("Connecting to WiFi...")
wifi.connect()
print("Connected!")

displayQuote("Ready!")

speakText('Hello world! I am an Adafruit PyPortal running Circuit Python speaking to you using AWS Serverless', 'Joanna')

while True:
    speakQuote('equality, humanity', 'Joanna')
    time.sleep(60*secrets['interval'])

Both the speakText and speakQuote functions call the synthesizeSpeech function. The difference is whether text or tags are passed to the API.

def speakText(text, voice):
    data = { "text": text, "voiceId": voice }
    synthesizeSpeech(data)

def speakQuote(tags, voice):
    data = { "tags": tags, "voiceId": voice }
    synthesizeSpeech(data)

The synthesizeSpeech function posts the data to the API Gateway endpoint. API Gateway invokes the Lambda function, which returns the MP3 URL and the formatted text. The downloadfile function then fetches the MP3 file and stores it on the SD card, and displayQuote shows the quote on the LCD. Finally, playMP3 opens the file and plays the speech audio using the built-in or external speaker.

def synthesizeSpeech(data):
    response = postToAPI(secrets['endpoint'], data)
    downloadfile(response['url'], '/sd/cache.mp3')
    displayQuote(response['text'])
    playMP3("/sd/cache.mp3")

Modifying the Lambda function

The serverless application includes a Lambda function, SynthesizeSpeechFunction, which can be modified directly in the Lambda console. The AWS SAM template used to deploy the AWS Serverless Application Repository application adds policies for accessing the S3 bucket where audio is stored and for calling Amazon Polly to synthesize speech. It also adds the PaperQuotes API token as an environment variable and sets API Gateway as an event source.

SynthesizeSpeechFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambda_functions/SynthesizeSpeech/
      Handler: app.lambda_handler
      Runtime: python3.8
      Policies:
        - S3FullAccessPolicy:
            BucketName: !Sub "${AWS::StackName}-audio"
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - polly:*
              Resource: '*'
      Environment:
        Variables:
          BUCKET_NAME: !Sub "${AWS::StackName}-audio"
          PAPER_QUOTES_TOKEN: !Ref PaperQuotesAPIKey
      Events:
        Speech:
          Type: Api
          Properties:
            RestApiId: !Ref SpeechApi
            Path: /speech
            Method: post
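The two policies in the template map onto two calls inside the function: one to Amazon Polly and one to Amazon S3. The following is a minimal sketch of those calls, not the project's actual handler. It assumes boto3 clients passed in by the caller; `synthesize_to_s3`, the object key, and the URL format are hypothetical.

```python
def synthesize_to_s3(polly, s3, bucket, key, text, voice_id):
    """Synthesize `text` with Amazon Polly and store the MP3 in S3.

    `polly` and `s3` are boto3 clients, for example boto3.client("polly")
    and boto3.client("s3"). Returns a URL for the uploaded object.
    """
    # Covered by the polly:* statement in the template.
    speech = polly.synthesize_speech(
        Text=text, VoiceId=voice_id, OutputFormat="mp3"
    )
    # Covered by the S3 policy; the post describes the MP3 as a public object.
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=speech["AudioStream"].read(),
        ContentType="audio/mpeg",
        ACL="public-read",
    )
    # Assumed virtual-hosted-style URL; the real function may build it differently.
    return "https://{}.s3.amazonaws.com/{}".format(bucket, key)
```

Inside the Lambda function, the bucket name would come from the BUCKET_NAME environment variable set in the template, for example `os.environ["BUCKET_NAME"]`.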

To edit the Lambda function, navigate back to the CloudFormation stack and click SynthesizeSpeechFunction under the Resources tab.

From here, you can edit the Lambda function code directly. Clicking Save deploys the new code.

The getQuotes function is called to fetch quotes from the PaperQuotes API. You can change this to call from a different source, such as a custom selection of quotes. Try modifying it to fetch social media posts or study questions.
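One way to swap in a custom selection is sketched below. It assumes a replacement that takes the same comma-separated tag string the device sends; the real getQuotes signature and return type may differ, and `CUSTOM_QUOTES`, `get_custom_quotes`, and the sample study questions are purely illustrative.

```python
# Hypothetical local collection, keyed by tag.
CUSTOM_QUOTES = {
    "geography": [
        "What is the capital of France?",
        "Which ocean lies between Africa and Australia?",
    ],
    "math": [
        "What is the square root of 144?",
    ],
}


def get_custom_quotes(tags):
    """Return every entry matching a comma-separated list of tags."""
    wanted = [tag.strip().lower() for tag in tags.split(",") if tag.strip()]
    # Unknown tags contribute nothing instead of raising an error.
    return [quote for tag in wanted for quote in CUSTOM_QUOTES.get(tag, [])]


print(get_custom_quotes("geography, math"))
```

Because the rest of the function only needs a list to choose from, the synthesis and upload steps can stay unchanged.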

Conclusion

I show how to add natural-sounding text-to-speech to a microcontroller using a serverless backend. This is accomplished by deploying an application through the AWS Serverless Application Repository. The deployed API uses API Gateway to securely invoke a Lambda function that fetches quotes from the PaperQuotes API and generates speech using Amazon Polly. The speech audio is uploaded to S3.

I then show how to program a microcontroller, the Adafruit PyPortal, using CircuitPython. The code periodically calls the serverless API to fetch a quote and to download speech audio for playback. The sample code also demonstrates synthesizing arbitrary text to speech, meaning it can be used for any project you can conceive. Check out my previous guide on using the PyPortal to create a Martian weather display for inspiration.