AWS Public Sector Blog

Breaking barriers: How AWS is revolutionizing the accessibility of federal agency communications for people with visual disabilities


Over 7.2 million Americans with visual disabilities face barriers accessing critical government information. Equal access to public communications is both a legal requirement and essential service. AWS enables federal agencies to provide accessible communications through automated document-to-speech conversion. This solution combines Amazon Simple Storage Service (Amazon S3), Amazon Textract, and Amazon Polly to transform written government documents into high-quality audio content.

When agencies upload documents to Amazon S3, Amazon Textract extracts the text while Amazon Polly converts it to natural-sounding speech. This automated process maintains document privacy, while giving visually impaired citizens independent access to important information. The solution we will explore in this post addresses three critical needs: compliance with accessibility regulations, improved service delivery to visually impaired citizens, and efficient use of agency resources.

Federal agencies can implement this solution to meet legal accessibility requirements, serve citizens more effectively, reduce manual processing costs, and scale accessibility services. This post outlines the technical architecture, implementation approach, and expected outcomes for deploying automated document-to-speech conversion using AWS services.

AWS services for accessibility implementation

The solution we will discuss in this post uses AWS services to automate document processing and delivery. Amazon S3 stores and manages source documents, while AWS Lambda functions process documents and coordinate service interactions across the workflow. Amazon Textract extracts text from documents, and Amazon Polly converts this text to natural-sounding speech. AWS Step Functions manages the workflow orchestration, with Amazon Simple Queue Service (Amazon SQS) handling message queuing to ensure reliable processing. Amazon DynamoDB tracks document status and metadata throughout the process. Finally, Amazon Connect delivers the audio content to citizens.

This architecture makes government communications accessible through automated text-to-speech conversion, improving service delivery for all citizens.

Architecture

The following figure illustrates a serverless workflow through which text documents are processed by multiple AWS services. Documents stored in Amazon S3 trigger a processing pipeline that uses AWS Step Functions to coordinate Amazon Textract for text extraction and Amazon Polly for text-to-speech conversion. Amazon Connect provides the interface for citizens to access the audio output, while Amazon DynamoDB tracks the processing status.

Figure 1. Architecture diagram of the solution

Architecture workflow

Our accessibility solution processes documents through the following workflow:

  1. Federal agencies upload PDF notices to Amazon S3, initiating immediate processing. Amazon S3 implements versioning and server-side encryption, with IAM policies restricting bucket access.
  2. An AWS Lambda function, triggered by S3 event notifications, processes document metadata using standard PDF processing libraries. It creates a DynamoDB entry with a “RECEIVED” status and unique ID, then routes document details to an Amazon SQS queue.
  3. AWS Step Functions batches documents, starting processing when 200 documents have accumulated or five minutes have elapsed, and uses a Map state to handle up to 200 documents simultaneously.
  4. AWS Step Functions invokes Amazon Textract’s asynchronous API to extract text, forms, and tables from PDFs, capturing spatial information and confidence scores. Amazon Textract sends completion notifications through Amazon Simple Notification Service (Amazon SNS).
  5. An AWS Lambda function processes the extracted text using natural language processing techniques, detecting sentence boundaries and named entities and normalizing the text. It applies government-specific terminology rules, stores the processed text in Amazon S3, and updates the Amazon DynamoDB status to “PROCESSED.”
  6. AWS Step Functions sends the processed text with SSML (Speech Synthesis Markup Language) tags to Amazon Polly. SSML enhances the audio output quality by controlling aspects like pronunciation, volume, pitch, and pacing, creating more natural-sounding speech. For example, SSML can properly handle abbreviations, numbers, and specialized government terms. The system generates 24kHz MP3 audio using neural TTS voices matched to document language and content.
  7. The system stores audio files in a read-optimized Amazon S3 bucket using document IDs, updates the Amazon DynamoDB status to “AUDIO_READY,” and signals readiness through SQS.
  8. Amazon Connect retrieves recipient lists from Amazon DynamoDB and creates call flows handling busy signals, voicemail, and failures while respecting time zones.
  9. Amazon Connect makes outbound calls using official government agency caller IDs, streaming audio files from Amazon S3 to recipients. Recipients control playback using phone keypad commands.
  10. Amazon Connect records outcomes in DynamoDB, marking successful deliveries as “DELIVERED.” For failures, Lambda analyzes causes and schedules retries using exponential backoff.
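Step 2 creates the initial tracking record in Amazon DynamoDB. A minimal sketch of that record follows; the field names (`doc_id`, `status`, `s3_bucket`, `s3_key`, `received_at`) are illustrative assumptions, not the actual schema used by the solution.

```python
# Hypothetical helper that builds the "RECEIVED" tracking item created in
# step 2 of the workflow. Field names are assumptions for illustration.
import uuid
from datetime import datetime, timezone


def build_received_item(bucket: str, key: str, doc_id: str = "") -> dict:
    """Build the initial DynamoDB tracking item for an uploaded PDF notice."""
    return {
        "doc_id": doc_id or str(uuid.uuid4()),  # unique ID routed on to SQS
        "status": "RECEIVED",
        "s3_bucket": bucket,
        "s3_key": key,
        "received_at": datetime.now(timezone.utc).isoformat(),
    }
```

The same `doc_id` is later used to name the generated audio file in step 7, so a collision-resistant identifier such as a UUID is a natural choice.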
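The batching rule in step 3 (flush when 200 documents accumulate or five minutes pass) can be expressed as a small predicate. This is a sketch of the rule only; in the deployed solution the equivalent logic lives in the Step Functions and SQS configuration. The clock is injected so the rule can be tested without waiting.

```python
# Sketch of the step 3 batching rule: flush a batch once 200 documents have
# accumulated or five minutes have passed since the first document arrived.
BATCH_SIZE = 200
BATCH_WINDOW_SECONDS = 5 * 60


def should_flush(queued_count: int, first_arrival_ts: float, now_ts: float) -> bool:
    """Return True when the queued documents should be sent for processing."""
    if queued_count == 0:
        return False                      # nothing to process
    if queued_count >= BATCH_SIZE:
        return True                       # size threshold reached
    return (now_ts - first_arrival_ts) >= BATCH_WINDOW_SECONDS
```

The time-based trigger matters for low-volume periods: a single urgent notice should not wait indefinitely for 199 more documents.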
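Step 6 wraps the processed text in SSML before synthesis. The sketch below shows the idea with a tiny, assumed abbreviation map (the entries and the `to_ssml` helper are illustrative, not part of the solution's code); Amazon Polly accepts such a string via `TextType="ssml"`.

```python
# Minimal sketch of the SSML preparation in step 6. The abbreviation map is
# an illustrative assumption; a real deployment would carry the agency's
# full government-specific terminology rules.
ABBREVIATIONS = {
    "SSA": "Social Security Administration",
    "IRS": "Internal Revenue Service",
}


def to_ssml(text: str, rate: str = "medium") -> str:
    """Wrap text in SSML, expanding known abbreviations for correct speech."""
    for abbr, expansion in ABBREVIATIONS.items():
        # <sub> makes Polly speak the expansion while the text keeps the
        # abbreviation, so "IRS" is read as "Internal Revenue Service".
        text = text.replace(abbr, f'<sub alias="{expansion}">{abbr}</sub>')
    return f'<speak><prosody rate="{rate}">{text}</prosody></speak>'
```

The resulting string would then be submitted to Amazon Polly (for example through `StartSpeechSynthesisTask` with `Engine="neural"`, `OutputFormat="mp3"`, and a 24 kHz sample rate, matching the output described in step 6).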
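Step 10 schedules retries with exponential backoff. One way to sketch the schedule is below; the base delay, growth factor, attempt limit, and cap are assumptions for illustration, not values prescribed by the solution.

```python
# Sketch of the exponential backoff used for failed call deliveries in
# step 10. All numeric defaults are illustrative assumptions.
def retry_delays(base_seconds: int = 60, factor: int = 2,
                 max_attempts: int = 5, cap_seconds: int = 900):
    """Return the delay in seconds before each retry attempt, capped."""
    return [min(base_seconds * factor ** i, cap_seconds)
            for i in range(max_attempts)]
```

Capping the delay keeps the final attempts from drifting hours apart, while the exponential growth avoids hammering a recipient whose line is persistently busy.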

Conclusion

The solution we discussed in this post uses AWS services to improve government communication accessibility and meet federal accessibility requirements. The serverless architecture automates document-to-speech conversion and delivers information to visually impaired citizens. The key features of the solution include secure document processing, text-to-speech conversion, and delivery tracking. The solution delivers value while maintaining flexibility to adapt to agency needs and citizen feedback.

Looking ahead, we plan to explore enhancements that could improve the solution’s functionality. These enhancements could include:

  • Implementing AWS X-Ray for distributed tracing, CloudWatch for comprehensive monitoring, and AWS CloudTrail for API auditing.
  • Integrating Amazon Comprehend to enable advanced text analysis and Amazon Translate to support multi-language capabilities.
  • Adding Amazon CloudWatch dashboards, cost allocation tags, and automated testing for audio quality to further enhance operational visibility and management.

Natti Swaminathan

Natti is a senior solutions architect on the US federal civilian team at AWS. He works closely with customers to build and architect mission-critical solutions. Natti has extensive experience leading, architecting, and implementing high-impact technology solutions that address diverse business needs. He has a master’s degree in electrical and computer engineering from Wichita State University and an MBA from North Carolina State.

Sri Gudavalli

Sri is a solutions architect with AWS, specializing in enterprise cloud transformations and generative AI implementations. He partners with enterprise customers across the US-East Region to architect and deliver cloud-native solutions, leveraging advanced AWS services including Amazon Bedrock, Amazon CodeWhisperer, and large language models. His expertise spans cloud migration and application modernization, and he helps organizations harness the power of generative AI to drive innovation and business value.