This Guidance demonstrates how to set up audio playback for a webpage using Amazon Polly, which can read the content of the webpage aloud for your visitors and highlight the text as it’s being narrated. This text-to-speech capability enhances accessibility for your users, representing a crucial step in your organization's accessibility strategy. Furthermore, audio-enriched content is more impactful and memorable, helping to drive increased traffic to your page and strengthen your brand.

Please note: [Disclaimer]

Architecture Diagram

  • Static Webpages: This architecture diagram shows how to use Amazon Polly to read and highlight content on static webpages.

  • Dynamic Webpages: This architecture diagram shows how to use Amazon Polly to read and highlight content on dynamic webpages.
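
The highlighting described for both kinds of webpage relies on Amazon Polly speech marks, which are returned as line-delimited JSON objects carrying a `time` in milliseconds plus `start` and `end` byte offsets into the input text. The following is a minimal sketch, in Python for brevity rather than the client-side JavaScript this Guidance provides, of how a player might find the word to highlight at a given playback position. The function names and the sample timing values are hypothetical and for illustration only.

```python
import bisect
import json
from typing import Dict, List, Optional

# Illustrative input text and speech marks; the timing values are made up.
SAMPLE_TEXT = "Mary had a little lamb"
SAMPLE_MARKS = "\n".join([
    '{"time": 6, "type": "word", "start": 0, "end": 4, "value": "Mary"}',
    '{"time": 373, "type": "word", "start": 5, "end": 8, "value": "had"}',
    '{"time": 604, "type": "word", "start": 9, "end": 10, "value": "a"}',
    '{"time": 705, "type": "word", "start": 11, "end": 17, "value": "little"}',
    '{"time": 1100, "type": "word", "start": 18, "end": 22, "value": "lamb"}',
])


def parse_speech_marks(raw: str) -> List[Dict]:
    """Parse line-delimited JSON speech marks and keep only word marks, sorted by time."""
    marks = [json.loads(line) for line in raw.splitlines() if line.strip()]
    words = [mark for mark in marks if mark.get("type") == "word"]
    return sorted(words, key=lambda mark: mark["time"])


def active_word(marks: List[Dict], position_ms: int) -> Optional[Dict]:
    """Return the last word mark that started at or before position_ms, or None."""
    times = [mark["time"] for mark in marks]
    index = bisect.bisect_right(times, position_ms) - 1
    return marks[index] if index >= 0 else None


def highlighted_span(text: str, mark: Dict) -> str:
    """Slice the input text using the mark's offsets, which count UTF-8 bytes, not characters."""
    encoded = text.encode("utf-8")
    return encoded[mark["start"]:mark["end"]].decode("utf-8")


if __name__ == "__main__":
    word_marks = parse_speech_marks(SAMPLE_MARKS)
    current = active_word(word_marks, 400)
    if current is not None:
        print(highlighted_span(SAMPLE_TEXT, current))
```

A browser implementation would call the equivalent of `active_word` as the audio element's playback time advances and wrap the returned span in a highlight element.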

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • Operational Excellence: You can adjust and test the speech configuration for static, pre-generated speech in an environment outside of the web application. When you’re ready to deploy changes to your web application, you can link your webpages to the pre-generated speech files as part of your existing web publication process. You can also adjust and test dynamic generation in a test web environment by using a modified version of the client-side JavaScript provided in this Guidance. You can then deploy these changes to your production web server as part of your web application update process. Additionally, Amazon CloudWatch allows you to monitor the use of Amazon Polly and Amazon S3 resources. For example, the Amazon Polly request character count rises as the number of speech-generation requests increases, which makes it a useful indicator of usage.

    Read the Operational Excellence whitepaper 
  • Security: An Amazon Cognito identity pool provides unauthenticated users with sufficient access to Amazon Polly and Amazon S3 resources to generate speech from text on the webpage. You can modify this configuration to provide access only to authenticated users. You can also use Amazon CloudFront to distribute the audio content so that webpage visitors do not have direct access to the Amazon S3 bucket. This lets you secure and restrict access with signed URLs or cookies, geographical restrictions, and AWS WAF protections.

    This Guidance uses 256-bit Advanced Encryption Standard (AES-256) encryption for objects in the Amazon S3 bucket, but you can modify this Guidance to use AWS Key Management Service (AWS KMS) keys instead. For data in transit, access to Amazon Polly occurs through the AWS Command Line Interface (AWS CLI), the AWS SDK for JavaScript, or another HTTPS (TLS) connection. You can use an Amazon S3 bucket policy to require HTTPS for access to the bucket.

    Read the Security whitepaper 
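
As a minimal sketch of the bucket policy mentioned above, the following Python function builds a policy document that denies any request not sent over HTTPS by testing the `aws:SecureTransport` condition key. The function name and the bucket name `example-polly-audio-bucket` are hypothetical, and the ARN assumes the standard `aws` partition. Treat this as a starting point to review against your own access requirements, not as the policy this Guidance deploys.

```python
import json
from typing import Dict


def https_only_bucket_policy(bucket_name: str) -> Dict:
    """Build a bucket policy that denies all S3 actions when the request is not sent over HTTPS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"  # assumes the standard "aws" partition
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    print(json.dumps(https_only_bucket_policy("example-polly-audio-bucket"), indent=2))
```

The printed JSON can be attached to the bucket as its bucket policy. Because the statement is an explicit deny, it takes precedence over any allow statements that grant access.
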
  • Reliability: Amazon S3 provides highly durable storage, and most storage classes store objects redundantly across at least three Availability Zones (AZs), which increases availability and reduces the chance that a visitor cannot access speech files. Additionally, the AWS SDK for JavaScript, which this Guidance uses to synthesize speech dynamically, includes built-in retry handling for throttled requests.

    Read the Reliability whitepaper 
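
To illustrate the kind of retry behavior described above, here is a minimal sketch of retrying a throttled call with capped exponential backoff and jitter. It is a generic illustration in Python, not the AWS SDK's actual implementation. `ThrottledError`, `call_with_retries`, and the default delay values are all hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by a service."""


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry operation on ThrottledError, waiting a random time up to a capped exponential ceiling."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The ceiling doubles on each attempt but never exceeds max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

Passing the `sleep` function as a parameter keeps the sketch easy to test without real delays.
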
  • Performance Efficiency: To optimize this Guidance, first identify static content and use the provided pre-generation capability as outlined in the architecture diagram. This improves performance by eliminating the need for real-time speech synthesis. Static content can be served directly from an Amazon S3 bucket, either as a static website or as the origin for a CloudFront distribution. This avoids the need for additional processing or storage capacity on your web server to serve speech and audio files. It also benefits from low-latency access to this data through caching and the availability of that content at the edge. Next, customize how content is selected for speech generation, using the provided configuration examples, to control which text is synthesized.

    To reduce latency, deploy this Guidance in the same AWS Region as your web application. Alternatively, if you use a CloudFront distribution, you can achieve low latency by caching content and serving it at the edge, closer to your web application’s visitors.

    Read the Performance Efficiency whitepaper 
  • Cost Optimization: To optimize costs, you can identify static content and pre-generate speech files so that Amazon Polly only needs to convert text to speech once. You can also limit the number of voices or languages that Amazon Polly can generate for your website. Data transfer charges depend on the size of the generated MP3 and speech marks files and on how often they are downloaded. You can reduce this cost by hosting pre-generated files in Amazon S3, which also reduces the storage capacity requirement for your web server.

    You can also optimize costs through content caching, which is especially beneficial for popular content whose audio requires frequent access. You can test this approach and monitor costs using AWS Cost and Usage Reports (AWS CUR). Additionally, for content that is dynamic but small (fewer than 6,000 characters long), this Guidance generates speech marks synchronously, so you won’t need to store and download them from an Amazon S3 bucket, thereby reducing traffic costs.

    Read the Cost Optimization whitepaper 
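
The 6,000-character threshold described above can be expressed as a simple routing decision. The sketch below is a hypothetical helper, not code from this Guidance. It assumes that "synchronous" maps to the Amazon Polly `SynthesizeSpeech` operation and "asynchronous" maps to `StartSpeechSynthesisTask`, which writes its output to an Amazon S3 bucket. It also counts Python characters with `len`, which may differ from how Amazon Polly counts characters for its own limits.

```python
SYNC_CHARACTER_LIMIT = 6000  # threshold quoted in this Guidance


def choose_synthesis_mode(text: str, limit: int = SYNC_CHARACTER_LIMIT) -> str:
    """Return "synchronous" for text shorter than the limit, otherwise "asynchronous".

    Assumed mapping (not confirmed by this Guidance):
      synchronous  -> SynthesizeSpeech, result returned directly to the caller
      asynchronous -> StartSpeechSynthesisTask, result stored in Amazon S3
    """
    return "synchronous" if len(text) < limit else "asynchronous"


if __name__ == "__main__":
    print(choose_synthesis_mode("A short paragraph from a dynamic webpage."))
```
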
  • Sustainability: This Guidance allows you to pre-generate speech for static content, a synthesize-once, listen-many approach that minimizes resource and energy use. Additionally, this Guidance uses serverless resources from Amazon Polly, Amazon S3, and Amazon Cognito, so you don’t need to overprovision compute and storage. You can also choose the Amazon S3 Express One Zone storage class rather than the default S3 Standard storage class to avoid replication across AZs and reduce your overall storage footprint.

    Read the Sustainability whitepaper 
Blog

Read webpages and highlight content using Amazon Polly

This blog post demonstrates how to use Amazon Polly—a leading cloud service that converts text into lifelike speech—to read the content of a webpage and highlight the content as it’s being read.

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
