AWS Machine Learning Blog

Use AWS AI and ML services to foster accessibility and inclusion of people with a visual or communication impairment

AWS offers a broad set of artificial intelligence (AI) and machine learning (ML) services, including a suite of pre-trained, ready-to-use services for developers with no prior ML experience. In this post, we demonstrate how to use such services to build an application that fosters the inclusion of people with a visual or communication impairment, which includes difficulties in seeing, reading, hearing, speaking, or having a conversation in a foreign language. With services such as Amazon Transcribe, Amazon Polly, Amazon Translate, Amazon Rekognition and Amazon Textract, you can add features to your projects such as live transcription, text to speech, translation, object detection, and text extraction from images.

Screenshots of the web app showcasing five features of AWS AugmentAbility.

According to the World Health Organization, over 1 billion people—about 15% of the global population—live with some form of disability, and this number is likely to grow because of population ageing and an increase in the prevalence of some chronic diseases. For people with a speech, hearing, or visual impairment, everyday tasks such as listening to a speech or a TV program, expressing a feeling or a need, looking around, or reading a book can feel like impossible challenges. A wide body of research highlights the importance of assistive technologies for the inclusion of people with disabilities in society. According to research by the European Parliamentary Research Service, mainstream technologies such as smartphones provide more and more capabilities suitable for addressing the needs of people with disabilities. In addition, when you design for people with disabilities, you tend to build features that improve the experience for everyone; this is known as the curb-cut effect.

This post demonstrates how you can use the AWS SDK for JavaScript to integrate capabilities provided by AWS AI services into your own solutions. To do that, a sample web application showcases how to use Amazon Transcribe, Amazon Polly, Amazon Translate, Amazon Rekognition, and Amazon Textract to easily implement accessibility features. The source code of this application, AWS AugmentAbility, is available on GitHub to use as a starting point for your own projects.

Solution overview

AWS AugmentAbility is powered by five AWS AI services: Amazon Transcribe, Amazon Translate, Amazon Polly, Amazon Rekognition, and Amazon Textract. It also uses Amazon Cognito user pools and identity pools for managing authentication and authorization of users.

After deploying the web app, you will be able to access the following features:

  • Live transcription and text to speech – The app transcribes conversations and speeches for you in real time using Amazon Transcribe, an automatic speech recognition service. Type what you want to say, and the app says it for you by using Amazon Polly text-to-speech capabilities. This feature also integrates with Amazon Transcribe automatic language identification for streaming transcriptions—with a minimum of 3 seconds of audio, the service can automatically detect the dominant language and generate a transcript without you having to specify the spoken language.
  • Live transcription and text to speech with translation – The app transcribes and translates conversations and speeches for you, in real time. Type what you want to say, and the app translates and says it for you. Translation is available in all of the languages currently supported by Amazon Translate (more than 75).
  • Real-time conversation translation – Select a target language, speak in your language, and the app translates what you said into your target language by combining Amazon Transcribe, Amazon Translate, and Amazon Polly capabilities.
  • Object detection – Take a picture with your smartphone, and the app describes the objects around you by using Amazon Rekognition label detection features.
  • Text recognition for labels, signs, and documents – Take a picture with your smartphone of any label, sign, or document, and the app reads it out loud for you. This feature is powered by Amazon Rekognition and Amazon Textract text extraction capabilities. AugmentAbility can also translate the text into over 75 languages, or make it more readable for users with dyslexia by using the OpenDyslexic font.
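As a rough illustration of how the object detection and text recognition features might turn service responses into text that can be read aloud, the following sketch works on plain objects shaped like Amazon Rekognition DetectLabels and DetectText responses and Amazon Textract DetectDocumentText responses. The function names, the sample data, and the 80% confidence threshold are assumptions made for this example; they are not taken from the AugmentAbility source code.

```javascript
// Hypothetical helpers; names and the default threshold are illustrative only.

// Builds a sentence from a DetectLabels-style response: { Labels: [{ Name, Confidence }] }.
function describeLabels(response, minConfidence = 80) {
  const names = (response.Labels || [])
    .filter((label) => label.Confidence >= minConfidence)
    .sort((a, b) => b.Confidence - a.Confidence) // most confident first
    .map((label) => label.Name);
  if (names.length === 0) {
    return "No objects were detected.";
  }
  return "Detected objects: " + names.join(", ") + ".";
}

// Collects whole lines from a Rekognition DetectText-style response,
// skipping the per-word entries that the service also returns.
function linesFromRekognition(response) {
  return (response.TextDetections || [])
    .filter((detection) => detection.Type === "LINE")
    .map((detection) => detection.DetectedText);
}

// Collects whole lines from a Textract DetectDocumentText-style response.
function linesFromTextract(response) {
  return (response.Blocks || [])
    .filter((block) => block.BlockType === "LINE")
    .map((block) => block.Text);
}

// Hand-written sample data, not real service output.
const sampleLabels = {
  Labels: [
    { Name: "Chair", Confidence: 91.2 },
    { Name: "Table", Confidence: 97.5 },
    { Name: "Plant", Confidence: 55.0 },
  ],
};
console.log(describeLabels(sampleLabels)); // Detected objects: Table, Chair.
```

The resulting sentence or lines could then be passed to a text-to-speech call, or translated first if the user selected a different language.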

Live transcription, text to speech, and real-time conversation translation features are currently available in Chinese, English, French, German, Italian, Japanese, Korean, Brazilian Portuguese, and Spanish. Text recognition features are currently available in Arabic, English, French, German, Italian, Portuguese, Russian, and Spanish. An updated list of the languages supported by each feature is available on the AugmentAbility GitHub repo.

You can build and deploy AugmentAbility locally on your computer or in your AWS account by using AWS Amplify Hosting, a fully managed CI/CD and static web hosting service for fast, secure, and reliable static and server-side rendered apps.

The following diagram illustrates the architecture of the application, assuming that it’s deployed in the cloud using AWS Amplify Hosting.

Architecture diagram including AWS Amplify, Amazon Cognito, Transcribe, Translate, Polly, Rekognition, Textract.

The solution workflow includes the following steps:

  1. A mobile browser is used to access the web app—an HTML, CSS, and JavaScript application hosted by AWS Amplify Hosting. The application has been implemented using the SDK for JavaScript and the AWS Amplify JavaScript library.
  2. The user signs in by entering a user name and a password. Authentication is performed against the Amazon Cognito user pool. After a successful login, the Amazon Cognito identity pool is used to provide the user with the temporary AWS credentials required to access app features.
  3. While the user explores the different features of the app, the mobile browser interacts with Amazon Transcribe (StartStreamTranscriptionWebSocket operation), Amazon Translate (TranslateText operation), Amazon Polly (SynthesizeSpeech operation), Amazon Rekognition (DetectLabels and DetectText operations) and Amazon Textract (DetectDocumentText operation).
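To make the operations in step 3 more concrete, here is a minimal sketch of the request parameters each non-streaming operation expects. The builder function names, the default values, and the example voice are assumptions for illustration; only the parameter field names follow the public API request shapes. The sketch assumes version 2 of the SDK for JavaScript when it mentions client calls in comments.

```javascript
// Hypothetical parameter builders; the field names follow the service APIs,
// everything else (function names, defaults) is illustrative.

// Amazon Translate TranslateText. A source language of "auto" asks the
// service to detect the language of the input text.
function translateParams(text, targetLanguage, sourceLanguage = "auto") {
  return {
    Text: text,
    SourceLanguageCode: sourceLanguage,
    TargetLanguageCode: targetLanguage,
  };
}

// Amazon Polly SynthesizeSpeech, returning MP3 audio.
function speechParams(text, voiceId) {
  return { OutputFormat: "mp3", Text: text, VoiceId: voiceId };
}

// Amazon Rekognition DetectLabels on raw image bytes.
function labelParams(imageBytes, maxLabels = 10) {
  return { Image: { Bytes: imageBytes }, MaxLabels: maxLabels };
}

// Amazon Rekognition DetectText on raw image bytes.
function imageTextParams(imageBytes) {
  return { Image: { Bytes: imageBytes } };
}

// Amazon Textract DetectDocumentText on raw document bytes.
function documentTextParams(documentBytes) {
  return { Document: { Bytes: documentBytes } };
}

// With SDK v2 loaded in the page, the objects above would be passed to calls such as:
//   new AWS.Translate().translateText(translateParams("Ciao", "en"), callback);
//   new AWS.Polly().synthesizeSpeech(speechParams("Hello", "Joanna"), callback);
//   new AWS.Rekognition().detectLabels(labelParams(bytes), callback);
//   new AWS.Textract().detectDocumentText(documentTextParams(bytes), callback);
console.log(translateParams("Ciao", "en"));
```

Keeping parameter construction separate from the client calls makes this part of the app easy to check without AWS credentials.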

AWS services have been integrated in the mobile web app by using the SDK for JavaScript. Generally speaking, the SDK for JavaScript provides access to AWS services in either browser scripts or Node.js; for this sample project, the SDK is used in browser scripts. For additional information about how to access AWS services from a browser script, refer to Getting Started in a Browser Script. The SDK for JavaScript is provided as a JavaScript file supporting a default set of AWS services. This file is typically loaded into browser scripts using a <script> tag that references the hosted SDK package. For this project, a custom browser SDK was built instead, containing only a specified set of services (for instructions, refer to Building the SDK for Browser).

Each service was integrated in the mobile web app following the guidelines and code samples available in the AWS SDK for JavaScript Developer Guide. The implementation of live transcription features required some additional steps because Amazon Transcribe Streaming WebSocket requires developers to encode the audio with event stream encoding and use the Signature Version 4 signing process for adding authentication information to AWS API requests sent by HTTP. For more information about this approach, refer to Transcribe speech to text in real time using Amazon Transcribe with WebSocket.
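One preparatory step that usually comes before event stream encoding is converting the audio captured by the browser into 16-bit PCM, a format that Amazon Transcribe streaming accepts. The following is a minimal sketch of that conversion only, assuming the microphone delivers samples as floating-point values between -1 and 1. The function name is hypothetical, and the event stream encoding and Signature Version 4 signing steps described above are intentionally not shown.

```javascript
// Hypothetical helper: converts float audio samples in the range [-1, 1]
// into 16-bit signed little-endian PCM.
function pcmEncode(samples) {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp out-of-range values so they cannot overflow the 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative samples scale to -32768, positive samples to 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, scaled, true); // true = little-endian
  }
  return buffer;
}

// Example: four samples become eight bytes of PCM data.
const encodedAudio = pcmEncode(new Float32Array([0, 0.5, -0.5, 1]));
console.log(encodedAudio.byteLength); // 8
```

Each encoded chunk would then be wrapped in an event stream message and sent over the signed WebSocket connection.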

The user sign-in webpage has been implemented using authentication features of the AWS Amplify JavaScript library. For more details about the authentication and authorization flow, refer to Accessing AWS services using an identity pool after sign-in.
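To illustrate the link between the user pool and the identity pool, the sketch below builds the parameters that map a user pool ID token to an identity pool. It assumes version 2 of the SDK for JavaScript, where such an object can be passed to AWS.CognitoIdentityCredentials. The function name and all identifiers in the example are placeholders, and the AugmentAbility code may organize this step differently.

```javascript
// Hypothetical helper: builds the parameters that exchange a user pool
// ID token for temporary AWS credentials through an identity pool.
function buildCredentialParams(identityPoolId, region, userPoolId, idToken) {
  // The login provider key identifies the user pool that issued the token.
  const provider = "cognito-idp." + region + ".amazonaws.com/" + userPoolId;
  return {
    IdentityPoolId: identityPoolId,
    Logins: { [provider]: idToken },
  };
}

// Placeholder values only; real IDs come from your own Amazon Cognito pools.
const credentialParams = buildCredentialParams(
  "eu-west-1:00000000-0000-0000-0000-000000000000",
  "eu-west-1",
  "eu-west-1_EXAMPLE",
  "ID_TOKEN_FROM_SIGN_IN"
);
console.log(Object.keys(credentialParams.Logins)[0]);
// cognito-idp.eu-west-1.amazonaws.com/eu-west-1_EXAMPLE

// In a browser with SDK v2 loaded, the object would typically be used as:
//   AWS.config.region = "eu-west-1";
//   AWS.config.credentials = new AWS.CognitoIdentityCredentials(credentialParams);
```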

The following walkthrough shows how to deploy AugmentAbility by using AWS Amplify Hosting; it includes the following steps:

  1. Create the Amazon Cognito user pool and identity pool, and grant permissions for accessing AWS AI services.
  2. Clone the GitHub repository and edit the configuration file.
  3. Deploy the mobile web app to the AWS Amplify console.
  4. Use the mobile web app.

Create the Amazon Cognito user pool and identity pool, and grant permissions for accessing AWS AI services

The first step required for deploying the app consists of creating an Amazon Cognito user pool with the Hosted UI enabled, creating an Amazon Cognito identity pool, integrating the two pools, and finally granting permissions for accessing AWS services to the AWS Identity and Access Management (IAM) role associated with the identity pool. You can either complete this step by manually working on each task, or by deploying an AWS CloudFormation template.

The CloudFormation template automatically provisions and configures the necessary resources, including the Amazon Cognito pools, IAM roles, and IAM policies.

  1. Sign in to the AWS Management Console and launch the CloudFormation template by choosing Launch Stack:

    The template launches in the Europe (Ireland) AWS Region by default. To launch the solution in a different Region, use the Region selector in the console navigation bar. Make sure to select a Region in which all the AWS services in scope (Amazon Cognito, AWS Amplify, Amazon Transcribe, Amazon Polly, Amazon Translate, Amazon Rekognition, and Amazon Textract) are available. The supported Regions are us-east-2, us-east-1, us-west-1, us-west-2, ap-south-1, ap-northeast-2, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-west-1, and eu-west-2.
  2. Choose Next.
  3. For Region, enter the identifier of the Region you want to use (among the supported ones).
  4. For Username, enter the user name you want to use to access the app.
  5. For Email, enter the email address to which the temporary password for your first sign-in should be sent.
  6. Choose Next.
  7. On the Configure stack options page, choose Next.
  8. On the Review page, review and confirm the settings.
  9. Select the check box acknowledging that the template will create IAM resources and may require an AWS CloudFormation capability.
  10. Choose Create stack to deploy the stack.
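Because the Region identifier is typed by hand in step 3, a quick check against the list of supported Regions given in step 1 can catch typos before the stack is created. The following sketch is only an illustration; the constant and function names are hypothetical.

```javascript
// Regions listed in this walkthrough as supporting all services in scope.
const SUPPORTED_REGIONS = [
  "us-east-2", "us-east-1", "us-west-1", "us-west-2",
  "ap-south-1", "ap-northeast-2", "ap-southeast-1", "ap-southeast-2",
  "ca-central-1", "eu-central-1", "eu-west-1", "eu-west-2",
];

// Hypothetical helper: returns true when the identifier is in the list,
// ignoring surrounding whitespace and letter case.
function isSupportedRegion(regionId) {
  return SUPPORTED_REGIONS.includes(String(regionId).trim().toLowerCase());
}

console.log(isSupportedRegion("eu-west-1")); // true
console.log(isSupportedRegion("eu-west-3")); // false
```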

You can view the status of the stack on the AWS CloudFormation console in the Status column. You should receive a CREATE_COMPLETE status in a couple of minutes.

As part of the template deployment, the following permissions are granted to the IAM role that is assumed by the authenticated user:

  • transcribe:StartStreamTranscriptionWebSocket
  • translate:TranslateText
  • comprehend:DetectDominantLanguage
  • polly:SynthesizeSpeech
  • rekognition:DetectText
  • rekognition:DetectLabels
  • textract:DetectDocumentText

Even though Amazon Comprehend is not explicitly used in this web application, permissions are granted for the action comprehend:DetectDominantLanguage. Amazon Translate may automatically invoke Amazon Comprehend to determine the language of the text to be translated if a language code isn’t specified.
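For readers who prefer to grant these permissions manually, the following sketch shows one way the listed actions could be expressed as an IAM policy document. This is an assumption about the general shape of such a policy, not a copy of what the CloudFormation template creates; in particular, the function name and the wildcard resource are illustrative choices.

```javascript
// The seven actions granted to the authenticated role, as listed above.
const ALLOWED_ACTIONS = [
  "transcribe:StartStreamTranscriptionWebSocket",
  "translate:TranslateText",
  "comprehend:DetectDominantLanguage",
  "polly:SynthesizeSpeech",
  "rekognition:DetectText",
  "rekognition:DetectLabels",
  "textract:DetectDocumentText",
];

// Hypothetical helper: wraps a list of actions in a single Allow statement.
// Resource "*" is an assumption made for this sketch.
function buildPolicyDocument(actions) {
  return {
    Version: "2012-10-17",
    Statement: [{ Effect: "Allow", Action: actions, Resource: "*" }],
  };
}

// Print the policy as JSON, ready to paste into the IAM console.
console.log(JSON.stringify(buildPolicyDocument(ALLOWED_ACTIONS), null, 2));
```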

Clone the GitHub repository and edit the configuration file

Now that access to AWS AI services has been configured, you’re ready to clone the GitHub repository and edit the configuration file.

  1. In the AWS AugmentAbility GitHub repo, choose Code and Download ZIP.
    You’re either prompted to choose where to save the ZIP file on your computer, or the file is automatically saved in your Downloads folder.
  2. After you download the file, unzip it and delete the ZIP file.
    You should now have a folder named aws-augmentability-main with some files and subfolders in it.
  3. Create a file named config.js with any text editor, and enter the following content in it:
    var appConfig = {
        "IdentityPoolId": "INSERT_COGNITO_IDENTITY_POOL_ID"
    }
    
    var amplifyConfig = {
        "Auth": {
            "region": "INSERT_AWS_REGION_ID",
            "userPoolId": "INSERT_COGNITO_USER_POOL_ID",
            "userPoolWebClientId": "INSERT_COGNITO_USER_POOL_CLIENT_ID",
            "mandatorySignIn": true,
            "cookieStorage": {
                "domain": window.location.hostname,
                "path": "/",
                "expires": 30,
                "secure": true
            }
        }
    }
  4. In the config.js file you created, replace the four INSERT_ strings with the Amazon Cognito identity pool ID, identifier of your Region of choice, Amazon Cognito user pool ID, and user pool client ID.
    You can retrieve such values by opening the AWS CloudFormation console, choosing the stack named augmentability-stack, and choosing the Outputs tab.
    Screenshot of the CloudFormation stack Outputs tab.
  5. Save the config.js file in the aws-augmentability-main folder, and zip the folder to obtain a new aws-augmentability-main.zip file.
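A forgotten placeholder in config.js is an easy mistake to make before zipping the folder. As a small safeguard, the sketch below walks a configuration object and reports any value that still starts with INSERT_. The helper name and the sample values are hypothetical, and the sample object omits the cookieStorage section for brevity.

```javascript
// Hypothetical helper: returns the dotted paths of all string values
// that still start with "INSERT_".
function findUnfilledPlaceholders(config, path = "") {
  const unfilled = [];
  for (const [key, value] of Object.entries(config)) {
    const keyPath = path ? path + "." + key : key;
    if (typeof value === "string" && value.startsWith("INSERT_")) {
      unfilled.push(keyPath);
    } else if (value !== null && typeof value === "object") {
      // Recurse into nested sections such as Auth.
      unfilled.push(...findUnfilledPlaceholders(value, keyPath));
    }
  }
  return unfilled;
}

// Sample configuration with two placeholders left in on purpose.
const sampleConfig = {
  appConfig: { IdentityPoolId: "eu-west-1:00000000-0000-0000-0000-000000000000" },
  amplifyConfig: {
    Auth: {
      region: "eu-west-1",
      userPoolId: "INSERT_COGNITO_USER_POOL_ID",
      userPoolWebClientId: "INSERT_COGNITO_USER_POOL_CLIENT_ID",
      mandatorySignIn: true,
    },
  },
};

// Logs the two paths that still need real values.
console.log(findUnfilledPlaceholders(sampleConfig));
```

An empty array means every placeholder has been replaced.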

Deploy the mobile web app to the Amplify console

Now that you have downloaded and edited the AugmentAbility project files, you’re ready to build and deploy the mobile web app using the Amplify console.

  1. On the Get started with Amplify Hosting page, choose Deploy without Git provider.
  2. Choose Continue.
  3. In the Start a manual deployment section, for App name, enter the name of your app.
  4. For Environment name, enter a meaningful name for the environment, such as development or production.
  5. For Method, choose Drag and drop.
  6. Either drag and drop the aws-augmentability-main.zip file from your computer onto the drop zone or use Choose files to select the aws-augmentability-main.zip file from your computer.
  7. Choose Save and deploy, and wait for the message Deployment successfully completed.

Use the mobile web app

The mobile web app should now be deployed. Before accessing the app for the first time, you have to set a new password for the user that was automatically created when you deployed the CloudFormation template in the first step of this walkthrough. You can find the link to the temporary login screen on the Outputs tab of the CloudFormation stack (field UserPoolLoginUrl). For this first sign-in, use the user name you set up and the temporary password you received via email.

After you set your new password, you’re ready to test the mobile web app.

In the General section of the Amplify console, you should be able to find a link to the app under the Production branch URL label. Open it or send it to your smartphone, then sign in with your new credentials, and start playing with AugmentAbility.

Animated screenshot showcasing the “Live transcription and text to speech” feature of AWS AugmentAbility.

Animated screenshot showcasing the “Object detection” feature of AWS AugmentAbility.

Animated screenshot showcasing the “Text recognition” feature of AWS AugmentAbility.

Next steps

If you want to make changes to the mobile web app, you can work on the files cloned from the repository, locally build the mobile web app (as explained in the README file), and then redeploy the app by uploading the updated ZIP file via the Amplify console. As an alternative, you can create a GitHub, Bitbucket, GitLab, or AWS CodeCommit repository to store your project files, and connect it to Amplify to benefit from automatic builds on every code commit. To learn more about this approach, refer to Getting started with existing code. If you follow this tutorial, make sure to replace the command npm run build with npm run-script build at Step 2a.

To create additional users on the Amazon Cognito console, refer to Creating a new user in the AWS Management Console. In case you need to recover the password for a user, you should use the temporary login screen you used for changing the temporary password. You can find the link on the Outputs tab of the CloudFormation stack (field UserPoolLoginUrl).

Clean up

When you’re done with your tests, to avoid incurring future charges, delete the resources created during this walkthrough.

  1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  2. Choose the stack augmentability-stack.
  3. Choose Delete and confirm deletion when prompted.
  4. On the Amplify console, select the app you created.
  5. On the Actions menu, choose Delete app and confirm deletion when prompted.

Conclusion

In this post, I showed you how to deploy a code sample that uses AWS AI and ML services to put features such as live transcription, text to speech, object detection, and text recognition in the hands of everyone. Knowing how to build applications that can be used by people with a wide range of abilities and disabilities is key for creating more inclusive and accessible products.

To get started with AugmentAbility, clone or fork the GitHub repository and start experimenting with the mobile web app. If you want to experiment with AugmentAbility before deploying resources in your AWS account, you can check out the live demo (credentials: demo-user, Demo-password-1).


About the Author

Luca Guida is a Solutions Architect at AWS; he is based in Milan and supports Italian ISVs in their cloud journey. With an academic background in computer science and engineering, he started developing his AI/ML passion at university; as a member of the natural language processing (NLP) community within AWS, Luca helps customers be successful while adopting AI/ML services.