AWS Machine Learning Blog

Using responsible AI principles with Amazon Bedrock Batch Inference

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

The recent announcement of batch inference in Amazon Bedrock enables organizations to process large volumes of data efficiently at 50% less cost compared to On-Demand pricing. It’s especially useful when the use case is not latency sensitive and you don’t need real-time inference. However, as we embrace these powerful capabilities, we must also address a critical challenge: implementing responsible AI practices in batch processing scenarios.

In this post, we explore a practical, cost-effective approach for incorporating responsible AI guardrails into Amazon Bedrock Batch Inference workflows. Although we use a call center’s transcript summarization as our primary example, the methods we discuss are broadly applicable to a variety of batch inference use cases where responsible AI considerations and data protection are a top priority.

Our approach combines two key elements:

  • Responsible prompting – We demonstrate how to embed responsible AI principles directly into the prompts used for batch inference, setting the stage for responsible outputs from the start
  • Postprocessing guardrails – We show how to apply additional safeguards to the batch inference output, making sure that the remaining sensitive information is properly handled

This two-step process offers several advantages:

  • Cost-effectiveness – By applying heavy-duty guardrails to only the typically shorter output text, we minimize processing costs without compromising on ethics
  • Flexibility – The technique can be adapted to various use cases beyond transcript summarization, making it valuable across industries
  • Quality assurance – By incorporating responsible considerations at both the input and output stages, we maintain high standards of responsible AI throughout the process

Throughout this post, we address several key challenges in responsible AI implementation for batch inference. These include safeguarding sensitive information, providing accuracy and relevance of AI-generated content, mitigating biases, maintaining transparency, and adhering to data protection regulations. By tackling these challenges, we aim to provide a comprehensive approach to responsible AI use in batch processing.

To illustrate these concepts, we provide practical step-by-step guidance on implementing this technique.

Solution overview

This solution uses Amazon Bedrock for batch inference to summarize call center transcripts, coupled with the following two-step approach to maintain responsible AI practices. The method is designed to be cost-effective, flexible, and to uphold high standards of responsible AI.

  • Responsible data preparation and batch inference:
    • Use responsible prompting to prepare data for batch processing
    • Store the prepared JSONL file in an Amazon Simple Storage Service (Amazon S3) bucket
    • Use Amazon Bedrock batch inference for efficient and cost-effective call center transcript summarization
  • Postprocessing with Amazon Bedrock Guardrails:
    • After the completion of initial summarization, apply Amazon Bedrock Guardrails to detect and redact sensitive information, filter inappropriate content, and maintain compliance with responsible AI policies
    • By applying guardrails to the shorter output text, you optimize for both cost and compliance with responsible AI policies

This two-step approach combines the efficiency of batch processing with robust responsible AI safeguards, providing a comprehensive solution for responsible AI implementation in scenarios involving sensitive data at scale.

In the following sections, we walk you through the key components of implementing responsible AI practices in batch inference workflows using Amazon Bedrock, with a focus on responsible prompting techniques and guardrails.

Prerequisites

To implement the proposed solution, make sure you have satisfied the following requirements:

  • An active AWS account with access to Amazon Bedrock and model access enabled for the foundation model you plan to use
  • An Amazon S3 bucket to store the batch inference input and output files
  • An AWS Identity and Access Management (IAM) role with permissions for Amazon Bedrock batch inference and read/write access to your S3 bucket
  • A guardrail created with Amazon Bedrock Guardrails

Responsible prompting techniques

When setting up your batch inference job, it’s crucial to incorporate responsible guidelines into your prompts. The following is a concise example of how you might structure your prompt:

prompt = f"""
Summarize the following customer service transcript:

{transcript}

Instructions:
1. Focus on the main issue, steps taken, and resolution.
2. Maintain a professional and empathetic tone.
3. Do not include any personally identifiable information (PII) in the summary.
4. Use gender-neutral language unless gender is explicitly mentioned.
5. Reflect the emotional context accurately without exaggeration.
6. Highlight actionable insights for improving customer service.
7. If any part is unclear or ambiguous, indicate this in the summary.
8. Replace specific identifiers with generic terms like 'the customer' or '{{MASKED}}'.
"""

This prompt sets the stage for responsible summarization by explicitly instructing the model to protect privacy, minimize bias, and focus on relevant information.
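For a batch job, each transcript-plus-prompt pair becomes one line of a JSONL file. The following sketch builds that file’s contents; the record ID scheme is illustrative and the `modelInput` body follows the Anthropic Messages schema, so adapt it to the model you actually use:

```python
import json

# Sketch of building the JSONL input for a batch inference job. The record ID
# scheme and the Anthropic Messages request body are assumptions; adapt the
# modelInput structure to the model you actually use.
def build_batch_jsonl(transcripts, prompt_template, max_tokens=512):
    lines = []
    for i, transcript in enumerate(transcripts):
        record = {
            "recordId": f"CALL{i:07d}",
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user",
                     "content": prompt_template.format(transcript=transcript)},
                ],
            },
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

Write the returned string to a `.jsonl` file and upload it to your S3 bucket as the batch inference input.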

Set up a batch inference job

For detailed instructions on how to set up and run a batch inference job using Amazon Bedrock, refer to Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock. It provides detailed instructions for the following steps:

  • Preparing your data in the required JSONL format
  • Understanding the quotas and limitations for batch inference jobs
  • Starting a batch inference job using either the Amazon Bedrock console or API
  • Collecting and analyzing the output from your batch job

By following the instructions in our previous post and incorporating the responsible prompt provided in the preceding section, you’ll be well-equipped to set up batch inference jobs.
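If you prefer to start the job programmatically rather than from the console, the control-plane API call looks roughly like the following. The job name, role ARN, model ID, and S3 URIs are placeholders:

```python
# Sketch of starting a batch inference job via the CreateModelInvocationJob
# API. All identifiers below are placeholders; supply your own values.
def build_job_request(job_name, role_arn, model_id, input_s3_uri, output_s3_uri):
    # Assemble the request payload for CreateModelInvocationJob
    return {
        "jobName": job_name,
        "roleArn": role_arn,
        "modelId": model_id,
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": input_s3_uri}},
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": output_s3_uri}},
    }

def start_batch_job(**request):
    import boto3  # deferred so the request can be built and inspected offline
    bedrock = boto3.client("bedrock")  # control-plane client, not bedrock-runtime
    return bedrock.create_model_invocation_job(**request)["jobArn"]
```

Note that batch inference uses the `bedrock` control-plane client, whereas the guardrail postprocessing later in this post uses `bedrock-runtime`.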

Amazon Bedrock Guardrails

After the batch inference job has run successfully, apply Amazon Bedrock Guardrails as a postprocessing step. This provides an additional layer of protection against potential responsible AI policy violations or sensitive information disclosure. The following is a simple implementation, but you can update this based on your data volume and SLA requirements:

import boto3, os, json, time

# Initialize Bedrock client and set guardrail details
bedrock_runtime = boto3.client('bedrock-runtime')
guardrail_id = "<Your Guardrail ID>"
guardrail_version = "<Your Guardrail Version>"

# S3 bucket and file details i.e. output of batch inference job
bucket_name = '<S3 bucket with batch inference output>'
prefix = "<prefix>"
filename = '<filename>'

# Set up AWS session and S3 client
session = boto3.Session(
    aws_access_key_id=os.environ.get('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.environ.get('AWS_SECRET_ACCESS_KEY'),
    region_name=os.environ.get('AWS_REGION')
)
s3 = session.client('s3')

# Read and process batch inference output from S3
output_data = []
try:
    object_key = f"{prefix}{filename}"
    json_data = s3.get_object(Bucket=bucket_name, Key=object_key)['Body'].read().decode('utf-8')
    
    for line in json_data.splitlines():
        data = json.loads(line)
        output_entry = {
            'request_id': data['recordId'],
            'output_text': data['modelOutput']['content'][0]['text']
        }
        output_data.append(output_entry)
except Exception as e:
    print(f"Error reading JSON file from S3: {e}")

# Function to apply guardrails and mask PII data
def mask_pii_data(batch_output: str):
    try:
        pii_data = [{"text": {"text": batch_output}}]
        response = bedrock_runtime.apply_guardrail(
            guardrailIdentifier=guardrail_id,
            guardrailVersion=guardrail_version,
            source='OUTPUT',
            content=pii_data
        )
        # Return the redacted text if the guardrail intervened; otherwise return the original text
        return response['outputs'][0]['text'] if response['action'] == 'GUARDRAIL_INTERVENED' else batch_output
    except Exception as e:
        print(f"An error occurred: {str(e)}")

# Set up rate limiting: 20 requests per minute, so one request every 3 seconds
rpm = 20
interval = 60 / rpm

# Apply guardrails to each record
masked_data = []
for record in output_data:
    iteration_start = time.time()
    
    record['masked_data'] = mask_pii_data(record['output_text'])
    masked_data.append(record)
    
    # Implement rate limiting
    time.sleep(max(0, interval - (time.time() - iteration_start)))

Key points about this implementation:

  • We use the apply_guardrail method from the Amazon Bedrock runtime to process each output
  • The guardrail is applied to the ‘OUTPUT’ source, focusing on postprocessing
  • We handle rate limiting by introducing a delay between API calls, making sure that we don’t exceed the quota of 20 requests per minute
  • The function mask_pii_data applies the guardrail and returns the processed text if the guardrail intervened
  • We store the masked version for comparison and analysis
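As a final step, you might persist the masked summaries back to S3 for downstream analysis. A minimal sketch, reusing the S3 client from the previous code; the output key name is a placeholder:

```python
import json

def to_jsonl(records):
    # Serialize a list of dicts (e.g., masked_data from the previous step) to JSONL
    return "\n".join(json.dumps(record) for record in records)

# Example upload, reusing the S3 client and bucket variables from the previous step:
# s3.put_object(
#     Bucket=bucket_name,
#     Key=f"{prefix}masked_output.jsonl",  # placeholder output key
#     Body=to_jsonl(masked_data).encode("utf-8"),
# )
```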

This approach allows you to benefit from the efficiency of batch processing while still maintaining strict control over the AI’s outputs and protecting sensitive information. By addressing responsible considerations at both the input (prompting) and output (guardrails) stages, you’ll have a comprehensive approach to responsible AI in batch inference workflows.

Although this example focuses on call center transcript summarization, you can adapt the principles and methods discussed in this post to various batch inference scenarios across different industries, always prioritizing responsible AI practices and data protection.

Ethical considerations for responsible AI

Although the prompt in the previous section provides a basic framework, there are many additional considerations you can incorporate depending on your specific use case. The following is a more comprehensive list of responsible AI guidelines:

  • Privacy protection – Avoid including any personally identifiable information in the summary. This protects customer privacy and aligns with data protection regulations, making sure that sensitive personal data is not exposed or misused.
  • Factual accuracy – Focus on facts explicitly stated in the transcript, avoiding speculation. This makes sure that the summary remains factual and reliable, providing an accurate representation of the interaction without introducing unfounded assumptions.
  • Bias mitigation – Be mindful of potential biases related to gender, ethnicity, location, accent, or perceived socioeconomic status. This helps prevent discrimination and maintains fair treatment for your customers, promoting equality and inclusivity in AI-generated summaries.
  • Cultural sensitivity – Summarize cultural references or idioms neutrally, without interpretation. This respects cultural diversity and minimizes misinterpretation, making sure that cultural nuances are acknowledged without imposing subjective judgments.
  • Gender neutrality – Use gender-neutral language unless gender is explicitly mentioned. This promotes gender equality and minimizes stereotyping, creating summaries that are inclusive and respectful of all gender identities.
  • Location neutrality – Include location only if relevant to the customer’s issue. This minimizes regional stereotyping and focuses on the actual issue rather than unnecessary generalizations based on geographic information.
  • Accent awareness – If accent or language proficiency is relevant, mention it factually without judgment. This acknowledges linguistic diversity without discrimination, respecting the varied ways in which people communicate.
  • Socioeconomic neutrality – Focus on the issue and resolution, regardless of the product or service tier discussed. This promotes fair treatment regardless of a customer’s economic background, promoting equal consideration of customers’ concerns.
  • Emotional context – Use neutral language to describe emotions accurately. This provides insight into customer sentiment without escalating emotions, allowing for a balanced representation of the interaction’s emotional tone.
  • Empathy reflection – Note instances of the agent demonstrating empathy. This highlights positive customer service practices, encouraging the recognition and replication of compassionate interactions.
  • Accessibility awareness – Include information about any accessibility needs or accommodations factually. This promotes inclusivity and highlights efforts to accommodate diverse needs, fostering a more accessible and equitable customer service environment.
  • Responsible behavior flagging – Identify potentially irresponsible behavior without repeating problematic content. This helps identify issues for review while minimizing the propagation of inappropriate content, maintaining responsible standards in the summarization process.
  • Transparency – Indicate unclear or ambiguous information in the summary. This promotes transparency and helps identify areas where further clarification might be needed, making sure that limitations in understanding are clearly communicated.
  • Continuous improvement – Highlight actionable insights for improving customer service. This turns the summarization process into a tool for ongoing enhancement of service quality, contributing to the overall improvement of customer experiences.

When implementing responsible AI practices in your batch inference workflows, consider which of these guidelines are most relevant to your specific use case. You may need to add, remove, or modify instructions based on your industry, target audience, and specific responsible considerations. Remember to regularly review and update your responsible guidelines as new challenges and considerations emerge in the field of AI ethics.

Clean up

To delete the guardrail you created, follow the steps in Delete a guardrail.

Conclusion

Implementing responsible AI practices, regardless of the specific feature or method, requires a thoughtful balance of privacy protection, cost-effectiveness, and responsible considerations. In our exploration of batch inference with Amazon Bedrock, we’ve demonstrated how these principles can be applied to create a system that not only efficiently processes large volumes of data, but does so in a manner that respects privacy, avoids bias, and provides actionable insights.

We encourage you to adopt this approach in your own generative AI implementations. Start by incorporating responsible guidelines into your prompts and applying guardrails to your outputs. Responsible AI is an ongoing commitment—continuously monitor, gather feedback, and adapt your approach to align with the highest standards of responsible AI use. By prioritizing ethics alongside technological advancement, we can create AI systems that not only meet business needs, but also contribute positively to society.


About the authors

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.